I have a file in which one field is a timestamp like 20141028 20:49:49. I want to get the hour, 20, so I use the system command:
hour=system("date -d\""$5"\" +'%H'")
The timestamp is the fifth field in my file, so I used $5. But when I ran the program, I found that the command above just prints 20 and returns 0, so hour is 0 rather than 20. So my question is: how do I get the hour from the timestamp?
I know a method that calls the split function twice, like this:
split($5, vec, " " )
split(vec[2], vec2, ":")
But this method is a little inefficient and ugly.
Are there any other solutions? Thanks.
Another way using gawk:
gawk 'match($5, " ([0-9]+):", r){print r[1]}' input_file
If you want to know how to capture an external process's output in awk:
awk '{cmd="date -d \""$5"\" +%H";cmd|getline hour;print hour;close(cmd)}' input_file
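As for why hour ended up as 0 in the question: system() runs the command, lets its output go straight to standard output, and returns only the exit status, whereas cmd | getline var reads a line of the command's output into a variable. A minimal sketch of the difference, using echo as a stand-in for date so that it does not depend on GNU date's -d option (the variable names are only for illustration):

```shell
# system() returns the exit status (0 on success), not the command's output.
status=$(awk 'BEGIN { rc = system("echo 20 >/dev/null"); print rc }')

# cmd | getline var stores the first line of the command's output in var.
hour=$(awk 'BEGIN { cmd = "echo 20"; cmd | getline h; close(cmd); print h }')

echo "status=$status hour=$hour"
```

Note that the getline approach still starts one external process per input line, which can be slow on large files.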
You can use the substr function to extract the hour without running an external command.
For example:
awk 'BEGIN{print substr("20:49:49",1,2)}'
will produce the output
20
Or, more specifically, for the timestamp in the question:
$ awk 'BEGIN{print substr("20141028 20:49:49",10,2)}'
20
substr(str, pos, len) extracts the substring of str that starts at position pos and has length len.
If the value of $5 is 20141028 20:49:49,
$ awk '{print substr($5,10,2)}' input_file
20
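If split still feels more natural than counting character positions, a single call is enough when the separator is a regular expression that matches either a space or a colon. A small sketch, assuming the whole timestamp is available as one string (the names ts and t are made up for the example):

```shell
ts='20141028 20:49:49'
# Splitting on [ :] yields t[1]=date, t[2]=hour, t[3]=minute, t[4]=second.
hour=$(printf '%s\n' "$ts" | awk '{ split($0, t, /[ :]/); print t[2] }')
echo "$hour"
```

Inside a larger awk program the same call would be split($5, t, /[ :]/), provided the field separator keeps the date and time together in $5.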
I am trying to split a file column using the awk substr function. The input is as follows (it consists of 4 lines, one of them blank):
#NS500645:122:HYGVMBGX2:4:21402:2606:16446:ACCTAGAAGG:R1
ACCTAGAAGGATATGCGCTTGCGCGTTAGAGATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTATGATCC
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
I want to split the second line at the pattern "GATC", keeping the pattern in the right-hand substring, like:
ACCTAGAAGGATATGCGCTTGCGCGTTAGA GATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTATGATCC
I want the last line to be split into pieces of the same lengths as the split sequence, and the file regenerated like:
ACCTAGAAGGATATGCGCTTGCGCGTTAGA
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEE
GATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTAT
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
GATCC
EEEEE
To split the last column I am using this awk script:
cat prove | paste - - - - | awk 'BEGIN
{FS="\t"; OFS="\t"}\ {gsub("GATC","/tGATC", $2); {split ($2, a, "\t")};\ for
(i in a) print substr($4, length(a[i-1])+1,
length(a[i-1])+length(a[i]))}'
But the output is as follows:
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
The second and third lines are longer than expected.
I checked the calculated values that are passed to substr, and they are correct:
1 30
31 70
41 45
Using these values, the output should be:
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEE
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
EEEEE
But as I showed it is not the case.
Any suggestions?
I guess you're looking for something like this, but your question's formatting is really confusing:
$ awk -v OFS='\t' 'NR==1 {next}
NR==2 {n=index($0,"GATC")}
/^[^+]/ {print substr($0,1,n-1),substr($0,n)}' file
ACCTAGAAGGATATGCGCTTGCGCGTTAGA GATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTATGATCC
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEE EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
I assumed your file is in this format:
dummy header line to be ignored
ACCTAGAAGGATATGCGCTTGCGCGTTAGAGATCACTAGAGCTAAGGAATTTGAGATTACAGTAAGCTATGATCC
+
/AAAAEEEEEEEEEEAAEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
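As for the unexpected lengths in the question: the third argument of substr is a length, not an end position, so substr($4, 31, 70) asks for 70 characters starting at position 31 and simply runs to the end of the string. The following is a sketch, not a drop-in fix, that cuts a sequence before every later occurrence of GATC and cuts the quality string at the same offsets. split_at_gatc is a hypothetical helper name, the strings are passed through the environment (an assumption made here so that backslashes in a quality string are not treated as escapes), and the short strings in the usage line are made-up test data.

```shell
# split_at_gatc SEQ QUAL: print each SEQ segment followed by the matching QUAL segment.
split_at_gatc() {
  SEQ="$1" QUAL="$2" awk 'BEGIN {
    seq = ENVIRON["SEQ"]; qual = ENVIRON["QUAL"]
    start = 1
    # Search after the current start so that a segment beginning with GATC
    # is not matched against itself.
    while ((i = index(substr(seq, start + 1), "GATC")) > 0) {
      print substr(seq, start, i)     # third argument is a length
      print substr(qual, start, i)
      start += i
    }
    print substr(seq, start)          # remainder of the sequence
    print substr(qual, start)
  }'
}

split_at_gatc AAGATCTTGATCC ABCDEFGHIJKLM
```

For the made-up input this prints AA, AB, GATCTT, CDEFGH, GATCC and IJKLM on separate lines.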
I am trying to count the entries that are below an e-value threshold of 1e-5 in my tab-delimited data file, which looks something like the table below.
col1 col2 col3 eval
entry1 - - 1e-10
entry2 - - -
entry3 - - 0.001
I used this code:
$: awk -F"\t" '{print $4}' table.txt | awk '($1 + 0) < 1e-5' | grep [0-9*] | wc -l
This outputs:
$: 1
While this works, I would like to turn the command into something that is pure awk, and I would love to know how to do this. I would also like to know how to print the lines that satisfy the threshold, if that is possible. Thanks for helping!
One way to do this in pure awk:
awk -F"\t" '($4+0==$4) && ($4 < 1E-5){c++}END{print c}' file
This does the following:
($4+0==$4): first conditional to check if $4 is a number.
($4<1E-5): second conditional to check if the value matches the range
&&: If both conditions are satisfied, increment a counter c
at the END, print the value of c
Be aware that the grep in your original command is unreliable: it matches any line that contains a digit or an asterisk. If $4 in the original file read XXX1XXX (i.e. a string with a number in it) or XXX*XXX (i.e. a string with an asterisk in it), it would be counted as a match.
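To also print the lines that pass the threshold, as asked, the same condition can carry a print next to the counter. A sketch along those lines: below_threshold is a hypothetical wrapper name, and c + 0 makes the count print as 0 rather than an empty string when nothing matches.

```shell
below_threshold() {
  awk -F'\t' '
    ($4 + 0 == $4) && ($4 < 1e-5) { print; c++ }   # numeric and below 1e-5
    END { print "count=" (c + 0) }
  ' "$@"
}

# Sample table from the question, fed on standard input.
printf 'col1\tcol2\tcol3\teval\nentry1\t-\t-\t1e-10\nentry2\t-\t-\t-\nentry3\t-\t-\t0.001\n' |
  below_threshold
```

With a real file the call would be below_threshold table.txt; for the sample table it prints the entry1 line followed by count=1.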
I have a text file like this
ID
MQ2427D17-01_1_12
MQ2427D17-01_1_1
MQ2427D17-01_1_2
MQ2427D17-01_1_3
MQ2427D17-01_1_4
MQ2427D17-02_2_5
MQ2427D17-02_2_25
MQ2427D17-02_2_1
MQ2427D17-02_2_2
MQ2427D17-02_2_3
MQ2427D17-02_2_4
MQ2427D17-01_1_28
MQ3427D17-01_1_29
MQ3427D17-01_1_1
MQ3427D17-01_1_2
MQ3427D17-01_3_3
MQ3427D17-01_3_30
MQ3427D17-01_3_33
I want to change the number at the end: 1 becomes 13, 2 becomes 14, 3 becomes 15, 4 becomes 16, 5 becomes 17, 6 becomes 18, 7 becomes 19, ... and 12 becomes 24.
so the output looks like this
ID
MQ2427D17-01_1_24
MQ2427D17-01_1_13
MQ2427D17-01_1_14
MQ2427D17-01_1_15
MQ2427D17-01_1_16
MQ2427D17-02_2_17
MQ2427D17-02_2_25
MQ2427D17-02_2_13
MQ2427D17-02_2_14
MQ2427D17-02_2_15
MQ2427D17-02_2_16
MQ2427D17-01_1_28
MQ3427D17-01_1_29
MQ3427D17-01_1_13
MQ3427D17-01_1_14
MQ3427D17-01_3_15
MQ3427D17-01_3_30
MQ3427D17-01_3_33
I was trying to do it with this
sed 's/1/13/g' myfile.txt > modified.txt
sed = Stream EDitor
The command string:
s = the substitute command
original = a regular expression describing the number to replace
g = global (i.e. replace all and not just the first occurrence)
myfile.txt = mydata
modified.txt = the output
but this will change the number wherever it occurs in the line, not just at the end.
I don't know why the solution below does not work, for example on this example data
ID
MQ3HHD2D17-01_1_1
MQ3HHD2D17-01_1_2
MQ3HHD2D17-01_1_3
MQ3HHD2D17-01_1_4
MQ3HHD2D17-01_1_5
MQ3HHD2D17-01_1_6
MQ3HHD2D17-01_1_7
MQ3HHD2D17-01_1_8
MQ3HHD2D17-01_1_9
MQ3HHD2D17-01_1_10
MQ3HHD2D17-01_1_11
MQ3HHD2D17-01_1_12
MQ4HHD2D17-01_2_1
MQ4HHD2D17-01_2_2
MQ4HHD2D17-01_2_3
MQ4HHD2D17-01_2_4
MQ4HHD2D17-01_2_5
MQ4HHD2D17-01_2_6
MQ4HHD2D17-01_2_7
MQ4HHD2D17-01_2_8
MQ4HHD2D17-01_2_9
MQ4HHD2D17-01_2_10
MQ4HHD2D17-01_2_11
MQ4HHD2D17-01_2_12
It should be
ID
MQ3HHD2D17-01_1_13
MQ3HHD2D17-01_1_14
MQ3HHD2D17-01_1_15
MQ3HHD2D17-01_1_16
MQ3HHD2D17-01_1_17
MQ3HHD2D17-01_1_18
MQ3HHD2D17-01_1_19
MQ3HHD2D17-01_1_20
MQ3HHD2D17-01_1_21
MQ3HHD2D17-01_1_22
MQ3HHD2D17-01_1_23
MQ3HHD2D17-01_1_24
MQ4HHD2D17-01_2_13
MQ4HHD2D17-01_2_14
MQ4HHD2D17-01_2_15
MQ4HHD2D17-01_2_16
MQ4HHD2D17-01_2_17
MQ4HHD2D17-01_2_18
MQ4HHD2D17-01_2_19
MQ4HHD2D17-01_2_20
MQ4HHD2D17-01_2_21
MQ4HHD2D17-01_2_22
MQ4HHD2D17-01_2_23
MQ4HHD2D17-01_2_24
From your description, we can observe a pattern: add 12 to the end-number if the end-number is 12 or less. (Here, end-number refers to the number after the last underscore.)
awk can accomplish this task.
awk -F_ -v OFS=_ '{if($NF <= 12) $NF += 12;}1' myfile.txt >modified.txt
Flags:
-F_: the input field separator is _
-v OFS=_: sets OFS, one of awk's special variables, the Output Field Separator (i.e. the output delimiter)
Others:
NF: another of awk's special variables, holding the Number of Fields
$NF: the value of the last field
{...}1: the 1 at the end is an always-true pattern with no action, so awk prints every line.
I personally wouldn't recommend using sed, since you would need to replace 1 with 13, 2 with 14, 3 with 15 (and so on) individually, which makes it tedious to write and maintain. awk, on the other hand, can perform basic arithmetic (such as the +12 you've seen) while still being able to parse the input.
Output:
ID
MQ2427D17-01_1_24
MQ2427D17-01_1_13
MQ2427D17-01_1_14
MQ2427D17-01_1_15
MQ2427D17-01_1_16
MQ2427D17-02_2_17
MQ2427D17-02_2_25
MQ2427D17-02_2_13
MQ2427D17-02_2_14
MQ2427D17-02_2_15
MQ2427D17-02_2_16
MQ2427D17-01_1_28
MQ3427D17-01_1_29
MQ3427D17-01_1_13
MQ3427D17-01_1_14
MQ3427D17-01_3_15
MQ3427D17-01_3_30
MQ3427D17-01_3_33
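The awk command above leaves the ID header alone only because a non-numeric last field is compared as a string. A slightly more defensive sketch checks that the last field consists of digits before doing any arithmetic; the renumber wrapper name is hypothetical, and the sample lines are a shortened stand-in for the real file.

```shell
renumber() {
  awk -F_ -v OFS=_ '
    $NF ~ /^[0-9]+$/ && $NF + 0 >= 1 && $NF + 0 <= 12 { $NF += 12 }
    1   # always-true pattern: print every line, changed or not
  ' "$@"
}

printf 'ID\nMQ2427D17-01_1_12\nMQ2427D17-01_1_1\nMQ2427D17-02_2_25\n' | renumber
```

Here the header and the _25 line pass through unchanged, while _12 and _1 become _24 and _13.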
Could you please try the following.
awk 'BEGIN{FS=OFS="_"} $NF>=1 && $NF<=12{$NF+=12} 1' Input_file
OR
awk 'BEGIN{FS=OFS="_"} {gsub(/\r/,"")} $NF>=1 && $NF<=12{$NF+=12} 1' Input_file
OR
tr -d '\r' < Input_file > temp && mv temp Input_file
awk 'BEGIN{FS=OFS="_"} $NF>=1 && $NF<=12{$NF+=12} 1' Input_file
After troubleshooting with the OP in a chatroom, it turned out that the input file contains Control-M (carriage return) characters, which the OP does not want. I advised removing them with tr -d '\r' < Input_file > temp && mv temp Input_file and then running the code above.
A generic solution using a Perl one-liner:
perl -pe ' s/(\d+)$/$1<13?$1+12:$1/ge '
With the sample input:
$ perl -pe ' s/(\d+)$/ $1<13 ? $1+12 : $1/ge ' learner.txt
ID
MQ2427D17-01_1_24
MQ2427D17-01_1_13
MQ2427D17-01_1_14
MQ2427D17-01_1_15
MQ2427D17-01_1_16
MQ2427D17-02_2_17
MQ2427D17-02_2_25
MQ2427D17-02_2_13
MQ2427D17-02_2_14
MQ2427D17-02_2_15
MQ2427D17-02_2_16
MQ2427D17-01_1_28
MQ3427D17-01_1_29
MQ3427D17-01_1_13
MQ3427D17-01_1_14
MQ3427D17-01_3_15
MQ3427D17-01_3_30
MQ3427D17-01_3_33
$
000Bxxxxx111118064085vxas - header
10000000001000000000053009-000000000053009-
10000000005000000000000000+000000000000000+
10000000030000000004025404-000000004025404-
10000000039000000000004930-000000000004930-
10000000088000005417665901-000005417665901-
90000060883328364801913 - trailer
In the above file we have a header and a trailer; the records that start with 1 are the detail records.
In each detail record, I want to sum the values from position 28 to 44, including the sign, using an awk/sed command.
Here is sed, with help from bc to do the arithmetic:
sed -rn '
/header|trailer/! {
s/[[:digit:]]*[+-]([[:digit:]]+)([+-])$/\2\1/
H
}
$ {
x
s/\n//gp
}
' file | bc
I assume the +/- sign follows the number.
Using awk we can solve this problem making use of substr:
substr(s, m[, n ]):
Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, or if n specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string s.
This allows us to extract the string that represents the number. Here, I assumed that the sign before the number and the sign after it are the same, so the leading sign can be used as the sign of the number:
$ echo "10000000001000000000053009-000000000053009-" \
| awk '{print length($0); print substr($0,27,43-27)}'
43
-000000000053009
Since awk implicitly converts strings to numbers when you do numeric operations on them, we can write the following awk code to compute the requested sum:
$ awk '/header|trailer/{next}
{s+=substr($0,27,43-27)}
END{print s}' file.dat
-5421749244
Or in a single line:
$ awk '/header|trailer/{next}{s+=substr($0,27,43-27)} END{print s}' file.dat
-5421749244
The examples above work for the single-block example file given by the OP. However, if you have a file containing multiple blocks, each with a header and a trailer, and you only want to use the text inside these blocks (excluding everything outside them), then you should handle it a bit differently:
$ awk '/header/{s=0;c=1;next}
/trailer/{S+=s;c=0;next}
c{s+=substr($0,27,43-27)}
END{print S}' file.dat
Here we do the following:
If a line with header is found, reset the block sum s to zero and set c=1 to indicate that the following lines are taken into account.
If a line with trailer is found, add the block sum s to the overall sum S and set c=0 to indicate that the following lines are ignored.
If c is nonzero, add the value on the current line to the block sum s.
At the END, print the total sum S.
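The block-wise steps can be tried on the sample records from the question. This is a sketch under the same assumptions as above (the sign at position 27, followed by 15 digits); sum_blocks is a hypothetical wrapper name, and printing with printf "%.0f" is an addition made here so that the total is written out in full regardless of how a particular awk formats large numbers with print.

```shell
sum_blocks() {
  awk '
    /header/  { s = 0; c = 1; next }        # start of a block: reset the block sum
    /trailer/ { S += s; c = 0; next }       # end of a block: add it to the total
    c         { s += substr($0, 27, 16) }   # leading sign plus 15 digits
    END       { printf "%.0f\n", S }
  ' "$@"
}

sum_blocks <<'EOF'
000Bxxxxx111118064085vxas - header
10000000001000000000053009-000000000053009-
10000000005000000000000000+000000000000000+
10000000030000000004025404-000000004025404-
10000000039000000000004930-000000000004930-
10000000088000005417665901-000005417665901-
90000060883328364801913 - trailer
EOF
```

For this single block the result matches the total shown earlier, -5421749244.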
I have very limited knowledge of awk.
I have big CSV files (500,000 lines) with the following line format:
'0000011197118123','136',,'35993706', '33745', '22052', 'appsflyer.com'
'0000011194967123','136',,'35282806', '74518', '30317', 'crashlytics.com'
'0000011199022123’,’139',,'01363100', '8776250', '373671', 'whatsapp.com'
............
I need to cut the first 8 digits from the first column and add a date field as a new first column (the date should be the previous day's date), like the following:
'2016/03/12','97118123','136',,'35993706','33745','22052','appsflyer.com'
'2016/03/12','94967123','136',,'35282806','74518','30317','crashlytics.com'
'2016/03/12','99022123’,’139',,'01363100','8776250','373671','whatsapp.com'
Thanks a lot for your time.
M.Tave
You can do something similar to:
awk -F, -v date="2016/03/12" 'BEGIN{OFS=FS}
{sub(/^.{9}/, "'\''", $1)
s="'\''"date"'\''"
$1=s OFS $1
print }' csv_file
The regex matches 9 characters, the opening quote plus the 8 digits to drop, and the replacement puts the quote back.
I did not understand how you are determining your date, so I just used a fixed string.
Based on comments, you can do:
awk -v d="2016/03/12" 'sub(/^.{9}/,"'\''"d"'\'','\''")' csv_file
$ awk -v d='2016/03/12' '{print "\047" d "\047,\047" substr($0,10)}' file
'2016/03/12','97118123','136',,'35993706', '33745', '22052', 'appsflyer.com'
'2016/03/12','94967123','136',,'35282806', '74518', '30317', 'crashlytics.com'
'2016/03/12','99022123’,’139',,'01363100', '8776250', '373671', 'whatsapp.com'