awk + match string after “=” separator - awk

I have a problem with the following awk syntax
echo " param1 param2 param3 = param1 AA , AB , AC , AD " | awk -F"=" '$2~/AA|AB|AC|AD/{print "passed"}'
The awk prints "passed", but it shouldn't be because after "=" I have "param1" and not "AA" or AB", etc.
The target of the awk is to print "passed" only if the string after "=" is AA OR AB OR AC OR AD.
and if I have something else after "=" then its not should print passed
how to fix the awk syntax?
lidia

You need anchors:
awk -F= '$2 ~ /^(AA|AB|AC|AD)$/ {print "passed"}'
If you want to allow spaces:
awk -F= '$2 ~ /^ *(AA|AB|AC|AD) *$/ {print "passed"}'

This should work:
echo " param1 param2 param3 = param1 AA , AB , AC , AD " |
awk -F"=" -v var="passed" '$2~/AA|AB|AC|AD/{printf "%s",var}'

Related

Is awk 2 dimension array or something similar to store value?

Hi all im new in awk can i ask i have a input file like this:
# ABC DEFG
value1 GH
value2 GH
value3 GH
# BCF SQW
value4 GH
value5 GH
# BEC YUW
value6 GH
value7 GH
Desire output:
##### ABC DEFG #####
DEFG_ABC
ABC_DEFG
value1 ABC
value1 DFG
value2 ABC
value2 DFG
value3 ABC
value3 DFG
##### BCF SQW #####
BCF_SQW
SQW_BCF
value4 BCF
value4 SQW
value5 BCF
value5 SQF
##### BEC YUW #####
BEC_YUW
YUW_BEC
value6 BEC
value6 YUW
value7 BEC
value7 YUW
I had seperate the file $2 and $3 in line have character # into array like this
awk '
/^#/ {
a[na++] = $2
b[nb++] = $3
}
END {
for(i = 0; i < na; i ++){
print ("######" a[i] " " b[i] "#####")
print (a[i] "_" b[i])
print (b[i] "_" a[i])
}
}
' input
But i dont know how to store the $1 of all line between "#" line to the array anyone how to make it ? Thank you so much
With your shown samples, please try following awk code. Written and tested in GNU awk should work in any awk.
awk '
/^#/{
sub(/^#/,"##### ")
print $0," #####"
val1=$2
val2=$3
print val1"_"val2 ORS val2"_"val1
next
}
{
print $1,val1 ORS $1,val2
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
/^#/{ ##Checking condition if line starts from #.
sub(/^#/,"##### ") ##Substituting starting # with #####
print $0," #####" ##Printing current line follows by ##### here.
val1=$2 ##Creating val1 which has 2nd field in it.
val2=$3 ##Creating val2 which has 3rd field in it.
print val1"_"val2 ORS val2"_"val1 ##Printing val1 _ val2 newline val2 _ val1.
next ##next will skip further statements from here.
}
{
print $1,val1 ORS $1,val2 ##Printing 1st field val1 ORS 1st field val2 here.
}
' Input_file ##Mentioning Input_file name here.

Compare two columns of two files, print the row if it matches and print zero in third column

I need to compare column 1 and column 2 of my file1.txt and file2.txt. If both columns match, print the entire row of file1.txt, but where a row in file1.txt is not present in file2.txt, also print that missing row in the output and add "0" as its value in third column.
# file1.txt #
AA ZZ
JB CX
CX YZ
BB XX
SU BY
DA XZ
IB KK
XY IK
TY AB
# file2.txt #
AA ZZ 222
JB CX 345
BB XX 3145
DA XZ 876
IB KK 234
XY IK 897
Expected output
# output.txt #
File1.txt
AA ZZ 222
JB CX 345
CX YZ 0
BB XX 3145
SU BY 0
DA XZ 376
IB KK 234
XY IK 897
TY AB 0
I tried this code but couldn't figure out how to add rows that did not match and add "0" to it
awk 'BEGIN { while ((getline <"file2.txt") > 0) {REC[$1]=$0}}{print REC[$1]}' < file1.txt > output.txt
With your shown samples, could you please try following.
awk '
FNR==NR{
arr[$1 OFS $2]
next
}
(($1 OFS $2) in arr){
print
arr1[$1 OFS $2]
}
END{
for(i in arr){
if(!(i in arr1)){
print i,0
}
}
}
' file1.txt file2.txt
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking FNR==NR condition which will be TRUE when file1.txt is being read.
arr[$1 OFS $2] ##Creating array with 1st and 2nd field here.
next ##next will skip all further statements from here.
}
(($1 OFS $2) in arr){ ##Checking condition if 1st and 2nd field of file2.txt is present in arr then do following.
print ##Print the current line here.
arr1[$1 OFS $2] ##Creating array arr1 with index of 1st and 2nd fields here.
}
END{ ##Starting END block of this program from here.
for(i in arr){ ##Traversing through arr all elements from here.
if(!(i in arr1)){ ##Checking if an element/key is NOT present in arr1 then do following.
print i,0 ##Printing index and 0 here.
}
}
}
' file1.txt file2.txt ##Mentioning Input_file names here.
You may try this awk:
awk '
FNR == NR {
map[$1,$2] = $3
next
}
{
print $1, $2, (($1,$2) in map ? map[$1,$2] : 0)
}' file2 file1
AA ZZ 222
JB CX 345
CX YZ 0
BB XX 3145
SU BY 0
DA XZ 876
IB KK 234
XY IK 897
TY AB 0
$ awk '
{ key = $1 FS $2 }
NR==FNR { map[key]=$3; next }
{ print $0, map[key]+0 }
' file2.txt file1.txt
AA ZZ 222
JB CX 345
CX YZ 0
BB XX 3145
SU BY 0
DA XZ 876
IB KK 234
XY IK 897
TY AB 0

manipulate input file and create a new file using awk

Input
1473697,5342715,256,0.3
1473697,7028427,256,0.1
1473697,5342716,256,0.3
1473697,5342715,257,0.3
1473697,7028427,257,0.1
1473610,7028427,256,0.1
1473610,5342715,256,0.3
1473610,7028422,256,0.1
Output
1473697,256,5342715 0.3 7028427 0.1 5342716 0.3
1473697,257,5342715 0.3 7028427 0.1
1473610,256,7028427 0.1 5342715 0.3 7028422 0.1
OFS and FS is = ,
is there a way to search unique lines base on column 1 and 3
then print the line with the details from column 2 and 4
It took awhile to figure out what you want, but I think you're looking for:
awk '!a[$1 $3] {a[$1 $3] = $1","$3","}
{a[$1 $3] = a[$1 $3] " " $2 " " $4}
END {for(i in a) print a[i]}' FS=, input-file
or
awk '{a[$1","$3] = a[$1","$3] " " $2 " " $4}
END {for(i in a) print i","a[i]}' FS=, input-file
There are many variations on the theme.

awk to extract tags between two files

In the awk below which executes as is and results in the current output, I am trying to add a condition that will extract the text or
value after the tags AF=,FR=, HRUN=,LEN=,TYPE= for the lines in each line of file1 compared to file2. As is the lines between
the two files are either a Match, Missing in file 1, or Missing in file2,but I can not add the conditions to extract up to the ; (semi-colon).
There may not always be text after the tags, but they always ends with a ;. The decimal in $6 is also 3 signifigant figures to make it easier to read. It seems
close but there are a couple things I am not quite sure how to do. Thank you :).
file1
chr1 43814978 COSM27286 G A 86.92679999999999 PASS
AF=0;AO=1;DP=5535;FAO=0;FR=.,REALIGNEDx0.008;HRUN=1;LEN=1;TYPE=snp;VARB=0;HS;
chr1 43814981 COSM27287 G A 86.83350000000002 PASS
AF=0;AO=2;DP=5556;FAO=0;FR=.;HRUN=1;LEN=1;TYPE=snp;VARB=0;HS;
chr1 43815008 COSM29008;COSM43212 TGG AAA,AAG 70.3099 PASS
AF=0,0;AO=0,0;DP=5528;FAO=0,0;FR=.,.,;HRUN=1,1;LEN=3,2,;TYPE=mnp,mnp;VARB=0,0;HS;
file2
chr1 43814979 COSM27286 G A 86.92679999999999 PASS
AF=0;AO=1;DP=5535;FAO=0;FR=.,REALIGNEDx0.008;HRUN=1;LEN=1;TYPE=snp;VARB=0;HS;
chr1 43814981 COSM27287 G A 86.83350000000002 PASS
AF=0;AO=2;DP=5556;FAO=0;FR=.;HRUN=1;LEN=1;TYPE=snp;VARB=0;HS;
chr1 43815008 COSM29008;COSM43212 TGG AAA,AAG 70.3099 PASS
AF=0,0;AO=0,0;DP=5528;FAO=0,0;FR=.,.,;HRUN=1,1;LEN=3,2,;TYPE=mnp,mnp;VARB=0,0;HS;
desired output
Match:
chr1 43814981 COSM27287 G A 86.8 PASS
AF=0;FR=.;HRUN=1;LEN=1;TYPE=snp
chr1 43815008 COSM29008;COSM43212 TGG AAA,AAG 70.3099 PASS
AF=0,0;FR=.,.,;HRUN=1,1;LEN=3,2,;TYPE=mnp,mnp
Missing in file1:
chr1 43814979 COSM27286 G A 86.9 PASS
AF=0;FR=.,REALIGNEDx0.008;HRUN=1;LEN=1;TYPE=snp
Missing in file2:
chr1 43814978 COSM27286 G A 86.9 PASS
AF=0;FR=.,REALIGNEDx0.008;HRUN=1;LEN=1;TYPE=snp
awk
awk 'FNR==1 { next }
FNR == NR { file1[$1,$2,$3,$4,$5,$6,$7] = $1 " " $2 " " $3 " " $4 " " $5 " " $6 " "$7 }
FNR != NR { file2[$1,$2,$3,$4,$5,$6,$7] = $1 " " $2 " " $3 " " $4 " " $5 " " $6 " "$7 }
END { print "Match:"; for (k in file1) if (k in file2) print file1[k] # Or file2[k]
print "Missing in file1:"; for (k in file2) if (!(k in file1)) print file2[k]
print "Missing in file2:"; for (k in file1) if (!(k in file2)) print file1[k]
}' file1 file2 > output
Current output
Match:
chr1 43814981 COSM27287 G A 86.83350000000002 PASS
chr1 43815008 COSM29008;COSM43212 TGG AAA,AAG 70.3099 PASS
Missing in File1:
chr1 43814979 COSM27286 G A 86.92679999999999 PASS
Missing in File2:
chr1 43814978 COSM27286 G A 86.92679999999999 PASS
try:
awk 'FNR==NR{
a[$1,$2,$7]=$1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7;
next
}
(($1,$2,$7) in a){
val_match=val_match?val_match ORS a[$1,$2,$7]:a[$1,$2,$7];
delete a[$1,$2,$7];
next
}
{
val_mismatch_in_file1=val_mismatch_in_file1?val_mismatch_in_file1 ORS $1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7:$1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7;
}
END{
for(i in a){
val_missing_in_file2=val_missing_in_file2?a[i]:a[i]};
print "Match:" RS val_match RS "Missing in File1:" RS val_mismatch_in_file1 RS "Missing in File2:" RS val_missing_in_file2
}
' Input_file1 Input_file2
Output will be as follows.
Match:
chr1 43814981 COSM27287 G A 86.83350000000002 PASS
chr1 43815008 COSM29008;COSM43212;COSM19193;COSM27289;COSM28487 TGG AAA,AAG,AGG,CGG,GCG 70.3099 PASS
Missing in File1:
chr1 43814979 COSM27286 G A 86.92679999999999 PASS
Missing in File2:
chr1 43814978 COSM27286 G A 86.92679999999999 PASS

awk: printing lines side by side when the first field is the same in the records

I have a file containing lines like
a x1
b x1
q xq
c x1
b x2
c x2
n xn
c x3
I would like to test on the fist field in each line, and if there is a match I would like to append the matching lines to the first line. The output should look like
a x1
b x1 b x2
q xq
c x1 c x2 c x3
n xn
any help will be greatly appreciated
Using awk you can do this:
awk '{arr[$1]=arr[$1]?arr[$1] " " $0:$0} END {for (i in arr) print arr[i]}' file
n xn
a x1
b x1 b x2
c x1 c x2 c x3
q xq
To preserve input ordering:
$ awk '
{
if ($1 in vals) {
prev = vals[$1] " "
}
else {
prev = ""
keys[++k] = $1
}
vals[$1] = prev $0
}
END {
for (k=1;k in keys;k++)
print vals[keys[k]]
}
' file
a x1
b x1 b x2
q xq
c x1 c x2 c x3
n xn
What I ended up doing. (The answers by Ed Morton and Jonte are obviously more elegant.)
First I saved the 1st column of the input file in a separate file.
awk '{print $1}' input.file.txt > tmp0
Then saved the input file with lines, which has duplicate values at $1 field, removed.
awk 'BEGIN { FS = "\t" }; !x[$1]++ { print $0}' input_file.txt > tmp1
Then saved all the lines with duplicate $1 field.
awk 'BEGIN { FS = "\t" }; x[$1]++ { print $0}' input_file.txt >tmp2
Then saved the $1 fields of the non-duplicate file (tmp1).
awk '{ print $1}' tmp1 > tmp3
I used a for loop to pull in lines from the duplicate file (tmp2) and the duplicates removed file (tmp1) into an output file.
for i in $(cat tmp3)
do
if [ $(grep -w $i tmp0 | wc -l) = 1 ] #test for single instance in the 1st col of input file
then
echo "$(grep -w $i tmp1)" >> output.txt #if single then pull that record from no dupes
else
echo -e "$(grep -w $i tmp1) \t $(grep -w $i tmp2 | awk '{
printf $0"\t" }; END { printf "\n" }')" >> output.txt # if not single then pull that record from no_dupes first then all the records from dupes in a single line.
fi
done
Finally remove the tmp files
rm tmp* # remove all the tmp files