Multiple conditional output from single input - awk
I have a file test.txt. I want to match several patterns, and at the moment I print the matches for each pattern separately with
awk 'substr($1,5,15) ~ /ccc/ { print $0 }' test.txt >test1.txt
awk 'substr($1,5,15) ~ /abb/ { print $0 }' test.txt >test2.txt
awk 'substr($1,5,15) ~ /abc/ { print $0 }' test.txt >test3.txt
Now, can I do this in one go? That is, after running
awk 'substr($1,5,15) ~ /ccc/ { print $0 }' test.txt
on the lines that don't match the above pattern, can I run
awk 'substr($1,5,15) ~ /abb/ { print $0 }'
and similarly, on the lines that are still unmatched,
awk 'substr($1,5,15) ~ /abc/ { print $0 }'
Input file test.txt
NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNaaaabTTTTTTTTTTTTTTTTTTTT
NNNNNabbbbTTTTTTCACTACTGGGTTTC
NNNNNabcaaTTTTTTTTAATGGGTCTCAA
NNNNNabaccTTTTTTTTTCGGGAGGCGGG
NNNNNccaaaTTTTTTTTTTTTTATTTGAG
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcaaTAGAGACGGGGTTTCACCAT
NNNNNabcaaTTTTTGTCGAAGATCTCACC
NNNNNabcabTTGGTAAACAGGCGGGTGTA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNaaabbTTTTTGCAAAAAGTAATTTG
NNNNNabcabTTTTTTTTTCTTTCTGCCTG
NNNNNabcaaTTTTGAGACAGAATCTTGCT
NNNNNaaabbTTTTTTTTTTTTTACTAGTG
NNNNNabcccTAGACAGGGAATACTTTATT
NNNNNabcabGACAGGGAATACTTATATTC
awk 'substr($1,5,15) ~ /ccc/ { print $0 }' test.txt >test1.txt
test1.txt
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcccTAGACAGGGAATACTTTATT
awk 'substr($1,5,15) ~ /abb/ { print $0 }' test.txt >test2.txt
test2.txt
NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNabbbbTTTTTTCACTACTGGGTTTC
NNNNNaaabbTTTTTGCAAAAAGTAATTTG
NNNNNaaabbTTTTTTTTTTTTTACTAGTG
awk 'substr($1,5,15) ~ /abc/ { print $0 }' test.txt >test3.txt
test3.txt
NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcaaTTTTTTTTAATGGGTCTCAA
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcaaTAGAGACGGGGTTTCACCAT
NNNNNabcaaTTTTTGTCGAAGATCTCACC
NNNNNabcabTTGGTAAACAGGCGGGTGTA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcabTTTTTTTTTCTTTCTGCCTG
NNNNNabcaaTTTTGAGACAGAATCTTGCT
NNNNNabcccTAGACAGGGAATACTTTATT
NNNNNabcabGACAGGGAATACTTATATTC
Done this way, the following lines end up in two output files (test1.txt and test3.txt):
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcccTAGACAGGGAATACTTTATT
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcccTTTTTTTTTACACACAATTC
What I want is that once a line has been printed, it is not tested against the remaining patterns. My expected output:
test1.txt
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcccTAGACAGGGAATACTTTATT
test2.txt
NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNabbbbTTTTTTCACTACTGGGTTTC
NNNNNaaabbTTTTTGCAAAAAGTAATTTG
NNNNNaaabbTTTTTTTTTTTTTACTAGTG
test3.txt
NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcaaTTTTTTTTAATGGGTCTCAA
NNNNNabcaaTAGAGACGGGGTTTCACCAT
NNNNNabcaaTTTTTGTCGAAGATCTCACC
NNNNNabcabTTGGTAAACAGGCGGGTGTA
NNNNNabcabTTTTTTTTTCTTTCTGCCTG
NNNNNabcaaTTTTGAGACAGAATCTTGCT
NNNNNabcabGACAGGGAATACTTATATTC
To do all three in one awk process, try:
awk 'substr($1,5,15) ~ /ccc/ { print>"test1.txt"}
substr($1,5,15) ~ /abb/ { print>"test2.txt"}
substr($1,5,15) ~ /abc/ { print>"test3.txt"}' test.txt
Here, print>"test1.txt" prints to file test1.txt.
Note that > means something different in awk than it means in shell. In awk, like in shell, the first print to a file will overwrite the previous contents of the file. However, unlike shell, subsequent awk print statements using > append to the file.
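The difference is easy to see in a small experiment. This is only a sketch: the file name demo.txt and the scratch directory are made up for illustration and are not part of the question.

```shell
# Work in a scratch directory so nothing real gets overwritten (assumed setup).
cd "$(mktemp -d)" || exit 1

# Pre-existing content that the first awk print should wipe out.
echo "old contents" > demo.txt

# Two prints with > inside ONE awk run: the first truncates, the second appends.
printf 'line1\nline2\n' | awk '{ print > "demo.txt" }'
awk_result=$(cat demo.txt)

# The same two writes in shell: every > reopens and truncates the file.
echo line1 > demo.txt
echo line2 > demo.txt
shell_result=$(cat demo.txt)

printf 'awk:\n%s\nshell:\n%s\n' "$awk_result" "$shell_result"
```

After the awk run demo.txt holds both lines and the old contents are gone; after the two shell redirections it holds only line2.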
Variation: Printing only to the first matched output file
awk 'substr($1,5,15) ~ /ccc/ { print>"test1.txt"; next}
substr($1,5,15) ~ /abb/ { print>"test2.txt"; next}
substr($1,5,15) ~ /abc/ { print>"test3.txt"}' test.txt
Here, when a match is found, next tells awk to skip the rest of the tests and jump to start over on the next line.
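One way to check that next really prevents double output is to run the command on a few lines and look for lines that occur in more than one output file. A minimal sketch, assuming the input itself contains no duplicate lines (true for the sample in the question); the four-line input is a cut-down copy of test.txt and the scratch directory is only for illustration.

```shell
cd "$(mktemp -d)" || exit 1

# Four lines from the sample input; the abccc line matches both /ccc/ and /abc/.
cat > test.txt <<'EOF'
NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNaaaabTTTTTTTTTTTTTTTTTTTT
EOF

awk 'substr($1,5,15) ~ /ccc/ { print > "test1.txt"; next }
     substr($1,5,15) ~ /abb/ { print > "test2.txt"; next }
     substr($1,5,15) ~ /abc/ { print > "test3.txt" }' test.txt

# Lines that occur more than once across the outputs; empty means no overlap.
dups=$(sort test1.txt test2.txt test3.txt | uniq -d)
echo "duplicated lines: ${dups:-none}"
```

The last input line matches none of the patterns, so it is written to no file at all.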
awk '
{
    str = substr($1,5,15)
    out = 0
    if (str ~ /ccc/) out=1
    else if (str ~ /abb/) out=2
    else if (str ~ /abc/) out=3
}
out { print > ("test" out ".txt") }
' test.txt
With GNU awk you could use a switch statement instead of nested ifs.
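This golfed version assumes that no line matches more than one pattern: match() reports only the leftmost match, so a line containing abccc is treated as an abc match and goes to test3.txt rather than test1.txt.
gawk '{
    if (match(substr($1,5,15), /(ccc)|(abb)|(abc)/, A)) { # substr is probably unnecessary; lines with no match are skipped
        for (i = 1; i <= 3; i++) if (A[i] != "") n = i # number of the group that matched
        print > ("test" n ".txt") # print to variable filename
    }
}' test.txt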
Related
printing information of two files according specific field
I have two files. When the first field exists in both files and is equal, I need to print the information as in the example.
file 1
20;"aaaaaa";99292929
24;"fsfdfa";42933294
30;"fsdsff";23832299
38;"fjsdjl";62673777
file 2
13;"fsdffsdfs";2272777
20;"ffuiiii";23728877
30;"wdwfsdh";8882817
40;"sfjslll";82371111
Expected result:
file1;20;"aaaaaa";99292929;file2;20;"ffuiiii";23728877
file1;30;"fsdsff";23832299;file2;30;"wdwfsdh";8882817
I tried with:
awk 'FNR==NR{a[$1]=$1;next} $1 in a' file2 file1 > newfile
The logic is fine, but I can't get it to show the fields that I want.
awk will help:
awk -F ';' 'NR==FNR{rec[$1]=FILENAME FS $0} NR>FNR{ if($1 in rec){ print rec[$1] FS FILENAME FS $0 } }' file{1..2}
should do.
$ cat tst.awk
BEGIN { FS=OFS=";" }
{ $0 = FILENAME FS $0 }
NR==FNR { a[$2] = $0; next }
$2 in a { print a[$2], $0 }
$ awk -f tst.awk file1 file2
file1;20;"aaaaaa";99292929;file2;20;"ffuiiii";23728877
file1;30;"fsdsff";23832299;file2;30;"wdwfsdh";8882817
Print columns from two files
How to print columns from various files? I tried according to "Awk: extract different columns from many different files":
paste <(awk '{printf "%.4f %.5f ", $1, $2}' FILE.R ) <(awk '{printf "%.6f %.0f.\n", $3, $4}' FILE_R )
FILE.R == ARGV[1] { one[FNR]=$1 }
FILE.R == ARGV[2] { two[FNR]=$2 }
FILE_R == ARGV[3] { three[FNR]=$3 }
FILE_R == ARGV[4] { four[FNR]=$4 }
END {
    for (i=1; i<=length(one); i++) {
        print one[i], two[i], three[i], four[i]
    }
}
but I don't understand how to use this script.
FILE.R
56604.6017 2.3893 2.2926 2.2033
56605.1562 2.3138 2.2172 2.2033
FILE_R
56604.6017 2.29259 0.006699 42.
56605.1562 2.21716 0.007504 40.
Output desired
56604.6017 2.3893 0.006699 42.
56605.1562 2.3138 0.007504 40.
Thank you
This is one way:
$ awk -v OFS="\t" 'NR==FNR{a[$1]=$2;next}{print $1,a[$1],$3,$4}' file1 file2
Output:
56604.6017 2.3893 0.006699 42.
56605.1562 2.3138 0.007504 40.
Explained:
$ awk -v OFS="\t" '          # set the output field separator to a tab
NR==FNR {                    # process the first file
    a[$1]=$2                 # hash the second field, use first as key
    next
}
{
    print $1,a[$1],$3,$4     # output
}' file1 file2
If the field spacing with tabs is not enough, use printf with modifiers like in your sample.
While Read and AWK to Change Field
I have two files, FileA and FileB. FileA has 10 fields and 100 lines. If Field1 and Field2 match, Field3 should be changed. FileB has 3 fields. I am reading FileB in with a while loop to match the two fields and to get the value that should be used for field 3.
while IFS=$'\t' read hostname interface metric; do
    awk -v var1=${hostname} -v var2=${interface} -v var3=${metric} '{if ($1 ~ var1 && $2 ~ var2) $3=var3; print $0}' OFS="\t" FileA.txt
done < FileB.txt
At each iteration, this prints all of FileA.txt, including the single line that changed. I only want it to print the line that was changed. Please help!
It's a code smell to call awk once for each line of file B. You should be able to accomplish this task with a single pass through each file. Try something like this:
awk -F'\t' -v OFS='\t' '
    # first, read in data from file B
    NR == FNR { values[$1 FS $2] = $3; next }
    # then, output modified lines from matching lines in file A
    ($1 FS $2) in values { $3 = values[$1 FS $2]; print }
' fileB fileA
I'm assuming that you actually want to match with string equality instead of ~ pattern matching.
"I only want it to print the line that was changed."
Simply move your print $0 statement into the body of the if clause:
'{if ($1 ~ var1 && $2 ~ var2) { $3=var3; print $0 }}'
or even shorter:
'$1~var1 && $2~var2{ $3=var3; print $0 }'
awk command to split nth field
I am learning AWK and was trying some exercises on built-in string functions. Here's my exercise: I have a file containing the following
RecordType:83
1,2,3,a|x|y|z,4,5
And my desired output is as below:
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
I wrote an awk command for the above output.
awk -F',' '$1 ~ /RecordType:83/{print $0}
$1 == 1{
    split($4,splt,"|")
    for(i in splt)
    {
        if(i==1)
            print $1,$2,$3,splt[i],$5,$6
        else
            print $1,0,0,splt[i],$5,$6
    }
}' OFS=, file_name
The above command looks so clumsy. Is there any way to shorten the command? Thanks in advance
The shortest possible one-liner I could manage:
awk -F, 'NR>1{n=split($4,a,"|");for(;i++<n;){$4=a[i];print;$2=$3=0}}NR==1' OFS=, file
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
The much more readable script (recommended):
BEGIN {
    FS=OFS="," # Comma delimiter
}
NR==1 { # If the first line in file
    print $0 # Print the whole line
    next # Skip to next line
}
{
    n=split($4,a,"|") # Split field four on |
    for(i=1;i<=n;i++) # For each sub-field
        print $1,i==1?$2OFS$3:"0"OFS"0",a[i],$5,$6 # Print the output
}
Another shorter one-liner:
awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
With your example:
kent$ awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
How to print out a specific field in AWK?
A very simple question that I found no answer to: how do I print out a specific field in awk? awk '/word1/' prints the whole line, when I need just word1. Alternatively, I need only a chain of patterns (word1 + word2) to be printed out of a text.
Well, if the pattern is a single word (which you want to print, and which can't contain FS, the input field separator), why not:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print MYPATTERN }' INPUTFILE
If your pattern is a regex (gensub is a GNU awk function):
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print gensub(".*(" MYPATTERN ").*","\\1","1",$0) }' INPUTFILE
If your pattern must be checked in every single field:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN {
    for (i=1;i<=NF;i++) {
        if ($i ~ MYPATTERN) { print "Field " i " in " NR " row matches: " MYPATTERN }
    }
}' INPUTFILE
Modify any of the above to your taste.
The fields in awk are represented by $1, $2, etc:
$ echo this is a string | awk '{ print $2 }'
is
$0 is the whole line, $1 is the first field, $2 is the next field (or blank), $NF is the last field, $(NF - 1) is the 2nd to last field, etc.
EDIT (in response to comment). You could try:
awk '/crazy/{ print substr( $0, match( $0, "crazy" ), RLENGTH )}'
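The field variables described above can be tried on a throwaway line. This is a small sketch: the sample strings and shell variable names are invented for illustration, and the last command is a variant of the substr/match suggestion that uses RSTART instead of nesting match() inside substr().

```shell
line='alpha beta gamma delta'

first=$(echo "$line" | awk '{ print $1 }')          # first field
last=$(echo "$line" | awk '{ print $NF }')          # last field
penult=$(echo "$line" | awk '{ print $(NF - 1) }')  # second-to-last field
whole=$(echo "$line" | awk '{ print $0 }')          # the whole line

# Print only the matched text: match() sets RSTART and RLENGTH on success.
word=$(echo 'this is a crazy string' | awk 'match($0, /crazy/) { print substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$first" "$last" "$penult" "$whole" "$word"
```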
I know you can do this with awk, but an alternative would be:
sed -nr "s/.*(PATTERN_TO_MATCH).*/\1/p" file
or you can use grep -o.
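Both alternatives can be compared on a single line of text. A minimal sketch with a made-up sample string; it uses sed -E rather than -r, on the assumption that -E is accepted by both GNU and BSD sed.

```shell
text='this is a crazy string'

# grep -o prints only the part of the line that matched the pattern.
with_grep=$(echo "$text" | grep -o 'crazy')

# sed -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, reduced here to the captured group.
with_sed=$(echo "$text" | sed -nE 's/.*(crazy).*/\1/p')

echo "grep: $with_grep"
echo "sed:  $with_sed"
```

One difference to keep in mind: grep -o prints every match on a line, each on its own output line, while the sed command prints at most one result per input line.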
Something like this perhaps:
awk '{split("bla1 bla2 bla3",a," "); print a[1], a[2], a[3]}'