Multiple conditional output from single input - awk

I have a file, test.txt. I want to match multiple patterns; at the moment I print the matches for each pattern separately, one command at a time:
awk 'substr($1,5,15) ~ /ccc/ { print $0 }' test.txt >test1.txt
awk 'substr($1,5,15) ~ /abb/ { print $0 }' test.txt >test2.txt
awk 'substr($1,5,15) ~ /abc/ { print $0 }' test.txt >test3.txt
Can I do this in one go? For example, after running
awk 'substr($1,5,15) ~ /ccc/ { print $0 }' test.txt
can I run the following only on the lines that did not match the above pattern,
awk 'substr($1,5,15) ~ /abb/ { print $0 }'
and then, similarly, on the lines that are still unmatched,
awk 'substr($1,5,15) ~ /abc/ { print $0 }'
Input file test.txt
NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNaaaabTTTTTTTTTTTTTTTTTTTT
NNNNNabbbbTTTTTTCACTACTGGGTTTC
NNNNNabcaaTTTTTTTTAATGGGTCTCAA
NNNNNabaccTTTTTTTTTCGGGAGGCGGG
NNNNNccaaaTTTTTTTTTTTTTATTTGAG
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcaaTAGAGACGGGGTTTCACCAT
NNNNNabcaaTTTTTGTCGAAGATCTCACC
NNNNNabcabTTGGTAAACAGGCGGGTGTA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNaaabbTTTTTGCAAAAAGTAATTTG
NNNNNabcabTTTTTTTTTCTTTCTGCCTG
NNNNNabcaaTTTTGAGACAGAATCTTGCT
NNNNNaaabbTTTTTTTTTTTTTACTAGTG
NNNNNabcccTAGACAGGGAATACTTTATT
NNNNNabcabGACAGGGAATACTTATATTC
awk 'substr($1,5,15) ~ /ccc/ { print $0 }' test.txt >test1.txt
test1.txt
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcccTAGACAGGGAATACTTTATT
awk 'substr($1,5,15) ~ /abb/ { print $0 }' test.txt >test2.txt
test2.txt
NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNabbbbTTTTTTCACTACTGGGTTTC
NNNNNaaabbTTTTTGCAAAAAGTAATTTG
NNNNNaaabbTTTTTTTTTTTTTACTAGTG
awk 'substr($1,5,15) ~ /abc/ { print $0 }' test.txt >test3.txt
test3.txt
NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcaaTTTTTTTTAATGGGTCTCAA
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcaaTAGAGACGGGGTTTCACCAT
NNNNNabcaaTTTTTGTCGAAGATCTCACC
NNNNNabcabTTGGTAAACAGGCGGGTGTA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcabTTTTTTTTTCTTTCTGCCTG
NNNNNabcaaTTTTGAGACAGAATCTTGCT
NNNNNabcccTAGACAGGGAATACTTTATT
NNNNNabcabGACAGGGAATACTTATATTC
Done this way, the following lines end up in two output files (test1.txt and test3.txt):
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcccTAGACAGGGAATACTTTATT
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcccTTTTTTTTTACACACAATTC
What I want is that once a line has been written to an output file, it is not checked against the remaining patterns. My expected output:
test1.txt
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNabcccTTTTTTTTTACACACAATTC
NNNNNabcccTAAGACTGGCCCACAGCTGA
NNNNNabcccTACTTTTTTTAGTGATACAC
NNNNNabcccTAGACAGGGAATACTTTATT
test2.txt
NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNabbbbTTTTTTCACTACTGGGTTTC
NNNNNaaabbTTTTTGCAAAAAGTAATTTG
NNNNNaaabbTTTTTTTTTTTTTACTAGTG
test3.txt
NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcaaTTTTTTTTAATGGGTCTCAA
NNNNNabcaaTAGAGACGGGGTTTCACCAT
NNNNNabcaaTTTTTGTCGAAGATCTCACC
NNNNNabcabTTGGTAAACAGGCGGGTGTA
NNNNNabcabTTTTTTTTTCTTTCTGCCTG
NNNNNabcaaTTTTGAGACAGAATCTTGCT
NNNNNabcabGACAGGGAATACTTATATTC

To do all three in one awk process, try:
awk 'substr($1,5,15) ~ /ccc/ { print>"test1.txt"}
substr($1,5,15) ~ /abb/ { print>"test2.txt"}
substr($1,5,15) ~ /abc/ { print>"test3.txt"}' test.txt
Here, print>"test1.txt" prints to file test1.txt.
Note that > means something different in awk than it means in shell. In awk, like in shell, the first print to a file will overwrite the previous contents of the file. However, unlike shell, subsequent awk print statements using > append to the file.
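A minimal sketch of this behaviour, assuming a POSIX awk and a throwaway directory; out.txt is just a placeholder name:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Existing content that awk's first print > "out.txt" will truncate.
echo "old content" > out.txt

# Within ONE awk run, the file is opened once; later prints append.
printf 'one\ntwo\nthree\n' | awk '{ print > "out.txt" }'
cp out.txt after_first_run.txt   # contains one, two, three; "old content" is gone

# A NEW awk run opens (and truncates) the file again.
printf 'four\n' | awk '{ print > "out.txt" }'
cat out.txt                      # prints: four
```

To add to a file's existing contents instead, use >> in awk, just as in the shell.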
Variation: Printing only to the first matched output file
awk 'substr($1,5,15) ~ /ccc/ { print>"test1.txt"; next}
substr($1,5,15) ~ /abb/ { print>"test2.txt"; next}
substr($1,5,15) ~ /abc/ { print>"test3.txt"}' test.txt
Here, when a match is found, next tells awk to skip the remaining rules and start over with the next input line, so each line goes to at most one output file.
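To check the next variation without the full file, here is a self-contained run over the first five lines of test.txt from the question (a sketch assuming a POSIX awk and a scratch directory):

```shell
#!/bin/sh
# Scratch directory so the real test*.txt files are not overwritten.
cd "$(mktemp -d)" || exit 1

# The first five lines of test.txt from the question.
cat > test.txt <<'EOF'
NNNNNabcabAAAAATCTAATCTGCCAGTT
NNNNNabcccTTTTTCTAGTCACGATAGCC
NNNNNaaabbCTAGTTTGTGTAGTAATTTT
NNNNNaaaabTTTTTTTTTTTTTTTTTTTT
NNNNNabbbbTTTTTTCACTACTGGGTTTC
EOF

# The first rule that matches wins; next keeps later rules from seeing the line.
awk 'substr($1,5,15) ~ /ccc/ { print > "test1.txt"; next }
     substr($1,5,15) ~ /abb/ { print > "test2.txt"; next }
     substr($1,5,15) ~ /abc/ { print > "test3.txt" }' test.txt

cat test1.txt   # only the abccc line, even though it also contains abc
cat test3.txt   # only NNNNNabcabAAAAATCTAATCTGCCAGTT
```

The NNNNNaaaab... line matches none of the three patterns, so it is not written to any file.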

awk '
{
str = substr($1,5,15)
out = 0
if (str ~ /ccc/) out=1
else if (str ~ /abb/) out=2
else if (str ~ /abc/) out=3
}
out { print > ("test" out ".txt") }
' test.txt
With GNU awk you could use a switch statement instead of the if/else if chain.

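This golf presumes no concurrent matches: a line such as NNNNNabcccTTTTTCTAGTCACGATAGCC contains both abc and ccc, and the leftmost match (abc) wins, so it goes to test3.txt. The three-argument match() requires GNU awk.
gawk '{
if (!match(substr($1,5,15), /(ccc)|(abb)|(abc)/, A)) next # skip lines matching no pattern (substr probably unnecessary)
for (n=1; n<=3; n++) if (n in A) break # A[n] exists only for the group that matched
print > ("test" n ".txt") # print to variable filename
}' test.txt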

Related

printing information of two files according to a specific field

I have two files. When the first field exists in both files with the same value, I need to print the information from both files, as in the example below.
file 1
20;"aaaaaa";99292929
24;"fsfdfa";42933294
30;"fsdsff";23832299
38;"fjsdjl";62673777
file 2
13;"fsdffsdfs";2272777
20;"ffuiiii";23728877
30;"wdwfsdh";8882817
40;"sfjslll";82371111
Expected result:
file1;20;"aaaaaa";99292929;file2;20;"ffuiiii";23728877
file1;30;"fsdsff";23832299;file2;30;"wdwfsdh";8882817
I tried with:
awk 'FNR==NR{a[$1]=$1;next} $1 in a' file2 file1 > newfile
The logic is OK, but I can't get it to show the fields I want.
awk will help:
awk -F ';' 'NR==FNR{rec[$1]=FILENAME FS $0}
NR>FNR{
if($1 in rec){
print rec[$1] FS FILENAME FS $0
}
}' file{1..2}
should do.
$ cat tst.awk
BEGIN { FS=OFS=";" }
{ $0 = FILENAME FS $0 }
NR==FNR { a[$2] = $0; next }
$2 in a { print a[$2], $0 }
$ awk -f tst.awk file1 file2
file1;20;"aaaaaa";99292929;file2;20;"ffuiiii";23728877
file1;30;"fsdsff";23832299;file2;30;"wdwfsdh";8882817

Print columns from two files

How can I print columns from several files?
I tried following the approach in "Awk: extract different columns from many different files":
paste <(awk '{printf "%.4f %.5f ", $1, $2}' FILE.R ) <(awk '{printf "%.6f %.0f.\n", $3, $4}' FILE_R )
FILE.R == ARGV[1] { one[FNR]=$1 }
FILE.R == ARGV[2] { two[FNR]=$2 }
FILE_R == ARGV[3] { three[FNR]=$3 }
FILE_R == ARGV[4] { four[FNR]=$4 }
END {
for (i=1; i<=length(one); i++) {
print one[i], two[i], three[i], four[i]
}
}
but I don't understand how to use this script.
FILE.R
56604.6017 2.3893 2.2926 2.2033
56605.1562 2.3138 2.2172 2.2033
FILE_R
56604.6017 2.29259 0.006699 42.
56605.1562 2.21716 0.007504 40.
Output desired
56604.6017 2.3893 0.006699 42.
56605.1562 2.3138 0.007504 40.
Thank you
This is one way:
$ awk -v OFS="\t" 'NR==FNR{a[$1]=$2;next}{print $1,a[$1],$3,$4}' file1 file2
Output:
56604.6017 2.3893 0.006699 42.
56605.1562 2.3138 0.007504 40.
Explained:
$ awk -v OFS="\t" ' # set the output field separator to a tab
NR==FNR { # process the first file
a[$1]=$2 # hash the second field, use first as key
next
}
{
print $1,a[$1],$3,$4 # output
}' file1 file2
If the field spacing with tabs is not enough, use printf with modifiers like in your sample.
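For example, a sketch that reproduces the desired output exactly. It writes the sample data from the question to FILE.R and FILE_R; the printf format string is my guess at the wanted precision, based on the output shown:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Sample files from the question.
printf '%s\n' '56604.6017 2.3893 2.2926 2.2033' '56605.1562 2.3138 2.2172 2.2033' > FILE.R
printf '%s\n' '56604.6017 2.29259 0.006699 42.' '56605.1562 2.21716 0.007504 40.' > FILE_R

# Same NR==FNR lookup as above; printf sets the decimals of each column.
awk 'NR == FNR { a[$1] = $2; next }
     { printf "%.4f %.4f %.6f %.0f.\n", $1, a[$1], $3, $4 }' FILE.R FILE_R > out.txt
cat out.txt
```

Lines are joined on the exact text of the first column, so this assumes both files write the timestamps identically.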

While Read and AWK to Change Field

I have two files, FileA and FileB. FileA has 10 fields and 100 lines. If Field1 and Field2 match, Field3 should be changed. FileB has 3 fields. I am reading FileB with a while loop to match the two fields and to get the value that should be used for field 3.
while IFS=$'\t' read hostname interface metric; do
awk -v var1=${hostname} -v var2=${interface} -v var3=${metric} '{if ($1 ~ var1 && $2 ~ var2) $3=var3; print $0}' OFS="\t" FileA.txt
done < FileB.txt
On each iteration of the loop, this prints all of FileA.txt with the single matching line changed. I only want it to print the line that was changed.
Please Help!
Calling awk once for each line of file B is a code smell. You should be able to accomplish this task with a single pass through each file.
Try something like this:
awk -F'\t' -v OFS='\t' '
# first, read in data from file B
NR == FNR { values[$1 FS $2] = $3; next }
# then, output modified lines from matching lines in file A
($1 FS $2) in values { $3 = values[$1 FS $2]; print }
' fileB fileA
I'm assuming that you actually want to match with string equality instead of ~ pattern matching.
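To see the single pass in action, here is a self-contained sketch. The question shows no data, so the tab-separated contents of fileA and fileB below are made up for illustration:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical data. fileB: hostname, interface, new metric.
printf 'host1\teth0\t10\nhost2\teth1\t20\n' > fileB
# fileA: hostname, interface, metric, then another field.
printf 'host1\teth0\t1\tx\nhost1\teth1\t1\ty\nhost2\teth1\t2\tz\n' > fileA

awk -F'\t' -v OFS='\t' '
NR == FNR { values[$1 FS $2] = $3; next }               # load fileB
($1 FS $2) in values { $3 = values[$1 FS $2]; print }   # patch and print matches
' fileB fileA > changed.txt
cat changed.txt   # host1/eth0 and host2/eth1 only; host1/eth1 has no entry in fileB
```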
I only want it to print the line that was changed.
Simply move your print $0 statement into the body of the if clause:
'{if ($1 ~ var1 && $2 ~ var2) { $3=var3; print $0 }}'
or even shorter:
'$1~var1 && $2~var2{ $3=var3; print $0 }'

awk command to split nth field

I am learning AWK and was trying some exercises on built-in string functions.
Here's my exercise:
I have a file containing the following:
RecordType:83
1,2,3,a|x|y|z,4,5
And my desired output is as below:
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
I wrote an awk command for the above output.
awk -F',' '$1 ~ /RecordType:83/{print $0}
$1 == 1{
split($4,splt,"|")
for(i in splt)
{
if(i==1)
print $1,$2,$3,splt[i],$5,$6
else
print $1,0,0,splt[i],$5,$6
}
}' OFS=, file_name
The above command looks clumsy. Is there any way to shorten it?
Thanks in advance
The shortest possible one-liner I could manage:
awk -F, 'NR>1{n=split($4,a,"|");for(;i++<n;){$4=a[i];print;$2=$3=0}}NR==1' OFS=, file
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
The much more readable script (recommended):
BEGIN {
FS=OFS="," # Comma delimiter
}
NR==1 { # If the first line in file
print $0 # Print the whole line
next # Skip to next line
}
{
n=split($4,a,"|") # Split field four on |
for(i=1;i<=n;i++) # For each sub-field
print $1,i==1?$2OFS$3:"0"OFS"0",a[i],$5,$6 # Print the output
}
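The readable version is meant to be saved in a file and run with awk -f. A self-contained sketch; the script name split4.awk is an arbitrary choice, and the conditional expression is wrapped in parentheses for readability:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf 'RecordType:83\n1,2,3,a|x|y|z,4,5\n' > file_name

# Save the readable script to a file.
cat > split4.awk <<'EOF'
BEGIN { FS = OFS = "," }                 # comma delimiter
NR == 1 { print; next }                  # header line: print unchanged
{
    n = split($4, a, "|")                # split field four on |
    for (i = 1; i <= n; i++)             # one output line per sub-field
        print $1, (i == 1 ? $2 OFS $3 : "0" OFS "0"), a[i], $5, $6
}
EOF

awk -f split4.awk file_name > out.txt
cat out.txt
```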
Another short one-liner:
awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
with your example:
kent$ awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file
RecordType:83
1,2,3,a,4,5
1,0,0,x,4,5
1,0,0,y,4,5
1,0,0,z,4,5
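Both one-liners rely on i starting out unset and never reset it, so on a second data line they continue counting from where the previous line stopped. That is fine for the single data line in the question; for longer files, a sketch that resets i for each record (the second data line is made-up input):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input with TWO data lines.
printf 'RecordType:83\n1,2,3,a|x,4,5\n7,8,9,p|q,4,5\n' > file

# i=0 restarts the sub-field counter for every record.
awk -F, -v OFS="," 'NR>1{n=split($4,a,"|");i=0;while(++i<=n){$4=a[i];print;$2=$3=0}}NR==1' file > out.txt
cat out.txt
```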

How to print out a specific field in AWK?

A very simple question, to which I found no answer: how do I print out a specific field in awk?
awk '/word1/' prints the whole line, when I need just word1. Or I need a chain of patterns (word1 + word2) to be printed from the text, and nothing else.
Well, if the pattern is a single word (which you want to print and which cannot contain FS, the input field separator), why not:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print MYPATTERN }' INPUTFILE
If your pattern is a regex (gensub() is a GNU awk extension):
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN { print gensub(".*(" MYPATTERN ").*","\\1","1",$0) }' INPUTFILE
If your pattern must be checked in every single field:
awk -v MYPATTERN="INSERT_YOUR_PATTERN" '$0 ~ MYPATTERN {
for (i=1;i<=NF;i++) {
if ($i ~ MYPATTERN) { print "Field " i " in " NR " row matches: " MYPATTERN }
}
}' INPUTFILE
Modify any of the above to your taste.
The fields in awk are represented by $1, $2, etc:
$ echo this is a string | awk '{ print $2 }'
is
$0 is the whole line, $1 is the first field, $2 is the next field (or empty if there is none),
$NF is the last field, $(NF-1) is the second-to-last field, and so on.
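A quick sketch of those variables on an arbitrary sample sentence:

```shell
#!/bin/sh
# One output line per expression: $1, $2, $NF, $(NF-1) and the field count NF.
out=$(echo "this is a string" | awk '{ print $1; print $2; print $NF; print $(NF-1); print NF }')
echo "$out"   # this, is, string, a, 4 (one per line)
```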
EDIT (in response to comment).
You could try:
awk '/crazy/{ print substr( $0, match( $0, "crazy" ), RLENGTH )}'
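match() also sets RSTART and RLENGTH, so substr() can use RSTART directly. This matters when the pattern is a regex, because the printed text can then differ from the pattern. A sketch with made-up input:

```shell
#!/bin/sh
# Print only the part of each line matched by the regex; lines without a match print nothing.
out=$(printf 'a crazy idea\nnothing here\nthe craziest one\n' |
      awk 'match($0, /craz(y|iest)/) { print substr($0, RSTART, RLENGTH) }')
echo "$out"   # crazy, then craziest
```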
I know you can do this with awk, but an alternative would be:
sed -nr "s/.*(PATTERN_TO_MATCH).*/\1/p" file
Or you can use grep -o.
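For example, with made-up input (grep -o is supported by GNU and BSD grep, though it is not part of POSIX):

```shell
#!/bin/sh
# -o prints only the matched text, one match per output line;
# -E enables extended regexes so | works as alternation.
out=$(printf 'word1 and word2\nonly word2 here\nnone\n' | grep -oE 'word1|word2')
echo "$out"   # word1, word2, word2 (one per line)
```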
Something like this perhaps:
awk '{split("bla1 bla2 bla3",a," "); print a[1], a[2], a[3]}'