I have a problem with the following awk syntax:
echo " param1 param2 param3 = param1 AA , AB , AC , AD " | awk -F"=" '$2~/AA|AB|AC|AD/{print "passed"}'
The awk prints "passed", but it shouldn't, because after the "=" I have "param1" and not "AA" or "AB", etc.
The goal is to print "passed" only if the string after "=" is AA or AB or AC or AD.
If there is anything else after the "=", it should not print "passed".
How do I fix the awk syntax?
lidia
You need anchors:
awk -F= '$2 ~ /^(AA|AB|AC|AD)$/ {print "passed"}'
If you want to allow spaces:
awk -F= '$2 ~ /^ *(AA|AB|AC|AD) *$/ {print "passed"}'
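To see why the anchors matter, here is a small sketch that runs the unanchored and the anchored pattern against the same input. The function names check_unanchored and check_anchored are made up for this illustration; they are not part of the question or the answer.

```shell
# Hypothetical helper names; each reads lines from stdin.
check_unanchored() { awk -F= '$2 ~ /AA|AB|AC|AD/ {print "passed"}'; }
check_anchored()   { awk -F= '$2 ~ /^ *(AA|AB|AC|AD) *$/ {print "passed"}'; }

line=' param1 param2 param3 = param1 AA , AB , AC , AD '

# Matches, because "AA" occurs somewhere inside $2.
echo "$line" | check_unanchored

# Prints nothing, because $2 is not just one of the four values.
echo "$line" | check_anchored

# Matches, because $2 is " AB " (only spaces around one allowed value).
echo ' param1 = AB ' | check_anchored
```

Without ^ and $, the regex succeeds as soon as one alternative appears anywhere in the field; with them, the whole field has to consist of one alternative plus optional spaces.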
This should work (it prints nothing for the sample line, because param1 follows the "="):
echo " param1 param2 param3 = param1 AA , AB , AC , AD " |
awk -F"=" -v var="passed" '$2 ~ /^ *(AA|AB|AC|AD) *$/ {printf "%s\n", var}'
Hi all, I'm new to awk. I have an input file like this:
# ABC DEFG
value1 GH
value2 GH
value3 GH
# BCF SQW
value4 GH
value5 GH
# BEC YUW
value6 GH
value7 GH
Desired output:
##### ABC DEFG #####
ABC_DEFG
DEFG_ABC
value1 ABC
value1 DEFG
value2 ABC
value2 DEFG
value3 ABC
value3 DEFG
##### BCF SQW #####
BCF_SQW
SQW_BCF
value4 BCF
value4 SQW
value5 BCF
value5 SQW
##### BEC YUW #####
BEC_YUW
YUW_BEC
value6 BEC
value6 YUW
value7 BEC
value7 YUW
I have separated $2 and $3 of the lines that start with # into arrays like this:
awk '
/^#/ {
    a[na++] = $2
    b[nb++] = $3
}
END {
    for (i = 0; i < na; i++) {
        print "##### " a[i] " " b[i] " #####"
        print a[i] "_" b[i]
        print b[i] "_" a[i]
    }
}
' input
But I don't know how to store $1 of all the lines between the "#" lines in an array. Does anyone know how to do it? Thank you so much.
With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk.
awk '
/^#/{
  sub(/^# */,"##### ")
  print $0,"#####"
  val1=$2
  val2=$3
  print val1"_"val2 ORS val2"_"val1
  next
}
{
  print $1,val1 ORS $1,val2
}
' Input_file
Explanation: a detailed explanation of the above.
awk '                                 ##Starting awk program from here.
/^#/{                                 ##Checking condition if line starts with #.
  sub(/^# */,"##### ")                ##Substituting the leading # and any spaces after it with "##### ".
  print $0,"#####"                    ##Printing current line followed by ##### here.
  val1=$2                             ##Creating val1 which has 2nd field in it.
  val2=$3                             ##Creating val2 which has 3rd field in it.
  print val1"_"val2 ORS val2"_"val1   ##Printing val1 _ val2 newline val2 _ val1.
  next                                ##next will skip further statements from here.
}
{
  print $1,val1 ORS $1,val2           ##Printing 1st field with val1, newline, 1st field with val2 here.
}
' Input_file                          ##Mentioning Input_file name here.
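The question also asks how to keep $1 of the lines between the "#" headers in an array. A minimal sketch of that variant follows, assuming the input arrives on stdin; the function name expand_sections and the array names vals and cnt are invented for this example.

```shell
# Hypothetical function name; reads the sectioned input on stdin.
expand_sections() {
  awk '
    /^#/ {                        # header line: start a new section
      n++
      a[n] = $2
      b[n] = $3
      next
    }
    n {                           # data line that belongs to section n
      cnt[n]++
      vals[n, cnt[n]] = $1        # store $1 under (section, position)
    }
    END {
      for (i = 1; i <= n; i++) {
        print "##### " a[i] " " b[i] " #####"
        print a[i] "_" b[i]
        print b[i] "_" a[i]
        for (j = 1; j <= cnt[i]; j++) {
          print vals[i, j], a[i]
          print vals[i, j], b[i]
        }
      }
    }
  '
}

printf '%s\n' '# ABC DEFG' 'value1 GH' 'value2 GH' '# BCF SQW' 'value4 GH' |
  expand_sections
```

Storing everything and printing in END is only needed if the data must be held for later processing; for plain reformatting, printing each line as it is read (as in the answer above) uses less memory.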
I need to compare column 1 and column 2 of my file1.txt and file2.txt. If both columns match, print the row together with its value from file2.txt; where a row in file1.txt is not present in file2.txt, also print that row in the output and add "0" as its value in the third column.
# file1.txt #
AA ZZ
JB CX
CX YZ
BB XX
SU BY
DA XZ
IB KK
XY IK
TY AB
# file2.txt #
AA ZZ 222
JB CX 345
BB XX 3145
DA XZ 876
IB KK 234
XY IK 897
Expected output
# output.txt #
File1.txt
AA ZZ 222
JB CX 345
CX YZ 0
BB XX 3145
SU BY 0
DA XZ 876
IB KK 234
XY IK 897
TY AB 0
I tried this code, but couldn't figure out how to add the rows that did not match and give them a "0":
awk 'BEGIN { while ((getline <"file2.txt") > 0) {REC[$1]=$0}}{print REC[$1]}' < file1.txt > output.txt
With your shown samples, please try the following.
awk '
FNR==NR{
  arr[$1 OFS $2]
  next
}
(($1 OFS $2) in arr){
  print
  arr1[$1 OFS $2]
}
END{
  for(i in arr){
    if(!(i in arr1)){
      print i,0
    }
  }
}
' file1.txt file2.txt
Explanation: a detailed explanation of the above.
awk '                       ##Starting awk program from here.
FNR==NR{                    ##Checking FNR==NR condition which will be TRUE when file1.txt is being read.
  arr[$1 OFS $2]            ##Creating array with 1st and 2nd field here.
  next                      ##next will skip all further statements from here.
}
(($1 OFS $2) in arr){       ##Checking condition if 1st and 2nd field of file2.txt is present in arr then do following.
  print                     ##Print the current line here.
  arr1[$1 OFS $2]           ##Creating array arr1 with index of 1st and 2nd fields here.
}
END{                        ##Starting END block of this program from here.
  for(i in arr){            ##Traversing through all elements of arr from here.
    if(!(i in arr1)){       ##Checking if an element/key is NOT present in arr1 then do following.
      print i,0             ##Printing index and 0 here.
    }
  }
}
' file1.txt file2.txt       ##Mentioning Input_file names here.
You may try this awk:
awk '
FNR == NR {
  map[$1,$2] = $3
  next
}
{
  print $1, $2, (($1,$2) in map ? map[$1,$2] : 0)
}' file2.txt file1.txt
Output:
AA ZZ 222
JB CX 345
CX YZ 0
BB XX 3145
SU BY 0
DA XZ 876
IB KK 234
XY IK 897
TY AB 0
$ awk '
{ key = $1 FS $2 }
NR==FNR { map[key]=$3; next }
{ print $0, map[key]+0 }
' file2.txt file1.txt
AA ZZ 222
JB CX 345
CX YZ 0
BB XX 3145
SU BY 0
DA XZ 876
IB KK 234
XY IK 897
TY AB 0
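The last version is short because it leans on two awk behaviours: a rule without a pattern runs for every line of both files, so key is always set before the other rules see it, and an array element that was never assigned evaluates to an empty string, which +0 turns into 0. The sketch below wraps the same program in a function and runs it on a reduced copy of the sample data; the wrapper name join_with_default and the temporary directory are assumptions made for this example.

```shell
# Hypothetical wrapper: join_with_default LOOKUP_FILE MAIN_FILE
join_with_default() {
  awk '
    { key = $1 FS $2 }                  # runs for every line of both files
    NR == FNR { map[key] = $3; next }   # first file: remember its 3rd column
    { print $0, map[key] + 0 }          # second file: unset entry -> "" -> 0
  ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'AA ZZ 222' 'BB XX 3145' > "$tmp/file2.txt"
printf '%s\n' 'AA ZZ' 'CX YZ' 'BB XX'  > "$tmp/file1.txt"
join_with_default "$tmp/file2.txt" "$tmp/file1.txt"
rm -r "$tmp"
```

Because the output is driven by the second file, the rows come out in file1.txt order. The +0 trick assumes the third column is numeric; a non-numeric value would also be printed as 0.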
Input
1473697,5342715,256,0.3
1473697,7028427,256,0.1
1473697,5342716,256,0.3
1473697,5342715,257,0.3
1473697,7028427,257,0.1
1473610,7028427,256,0.1
1473610,5342715,256,0.3
1473610,7028422,256,0.1
Output
1473697,256,5342715 0.3 7028427 0.1 5342716 0.3
1473697,257,5342715 0.3 7028427 0.1
1473610,256,7028427 0.1 5342715 0.3 7028422 0.1
Both FS and OFS are a comma (,).
Is there a way to find the unique combinations of columns 1 and 3,
and then print each combination followed by the details from columns 2 and 4?
It took a while to figure out what you want, but I think you're looking for:
awk '!a[$1 $3] {a[$1 $3] = $1","$3","}
{a[$1 $3] = a[$1 $3] " " $2 " " $4}
END {for(i in a) print a[i]}' FS=, input-file
or
awk '{a[$1","$3] = a[$1","$3] " " $2 " " $4}
END {for(i in a) print i","a[i]}' FS=, input-file
There are many variations on the theme.
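One such variation, sketched here under the assumption that the groups should appear in the order in which they first occur in the input: the for (i in a) loops above visit keys in an unspecified order, and both versions leave a space after the second comma. The function name group_by_cols_1_3 and the helper arrays order and out are made up for this example.

```shell
# Hypothetical function name; reads comma-separated lines on stdin.
group_by_cols_1_3() {
  awk -F, '
    {
      key  = $1 "," $3          # group on columns 1 and 3
      pair = $2 " " $4          # details from columns 2 and 4
      if (key in out) {
        out[key] = out[key] " " pair
      } else {
        order[++n] = key        # remember first-seen order of the keys
        out[key] = pair
      }
    }
    END {
      for (i = 1; i <= n; i++)
        print order[i] "," out[order[i]]
    }
  '
}

printf '%s\n' \
  '1473697,5342715,256,0.3' \
  '1473697,7028427,256,0.1' \
  '1473697,5342715,257,0.3' \
  '1473610,7028427,256,0.1' |
  group_by_cols_1_3
```

Keeping the keys in a separate numerically indexed array is the usual way to get a stable order in awk, at the cost of one extra array.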
The awk below runs as is and produces the current output. I am trying to add a condition that extracts the text or
value after the tags AF=, FR=, HRUN=, LEN= and TYPE= for each line of file1 compared to file2. As it stands, each line
is reported as a Match, Missing in file1, or Missing in file2, but I cannot work out how to extract each tag's value up to the ; (semicolon).
There may not always be text after a tag, but it always ends with a ;. The decimal in $6 should also be shown to 3 significant figures to make it easier to read. It seems
close, but there are a couple of things I am not quite sure how to do. Thank you :).
file1
chr1 43814978 COSM27286 G A 86.92679999999999 PASS
AF=0;AO=1;DP=5535;FAO=0;FR=.,REALIGNEDx0.008;HRUN=1;LEN=1;TYPE=snp;VARB=0;HS;
chr1 43814981 COSM27287 G A 86.83350000000002 PASS
AF=0;AO=2;DP=5556;FAO=0;FR=.;HRUN=1;LEN=1;TYPE=snp;VARB=0;HS;
chr1 43815008 COSM29008;COSM43212 TGG AAA,AAG 70.3099 PASS
AF=0,0;AO=0,0;DP=5528;FAO=0,0;FR=.,.,;HRUN=1,1;LEN=3,2,;TYPE=mnp,mnp;VARB=0,0;HS;
file2
chr1 43814979 COSM27286 G A 86.92679999999999 PASS
AF=0;AO=1;DP=5535;FAO=0;FR=.,REALIGNEDx0.008;HRUN=1;LEN=1;TYPE=snp;VARB=0;HS;
chr1 43814981 COSM27287 G A 86.83350000000002 PASS
AF=0;AO=2;DP=5556;FAO=0;FR=.;HRUN=1;LEN=1;TYPE=snp;VARB=0;HS;
chr1 43815008 COSM29008;COSM43212 TGG AAA,AAG 70.3099 PASS
AF=0,0;AO=0,0;DP=5528;FAO=0,0;FR=.,.,;HRUN=1,1;LEN=3,2,;TYPE=mnp,mnp;VARB=0,0;HS;
desired output
Match:
chr1 43814981 COSM27287 G A 86.8 PASS
AF=0;FR=.;HRUN=1;LEN=1;TYPE=snp
chr1 43815008 COSM29008;COSM43212 TGG AAA,AAG 70.3099 PASS
AF=0,0;FR=.,.,;HRUN=1,1;LEN=3,2,;TYPE=mnp,mnp
Missing in file1:
chr1 43814979 COSM27286 G A 86.9 PASS
AF=0;FR=.,REALIGNEDx0.008;HRUN=1;LEN=1;TYPE=snp
Missing in file2:
chr1 43814978 COSM27286 G A 86.9 PASS
AF=0;FR=.,REALIGNEDx0.008;HRUN=1;LEN=1;TYPE=snp
awk
awk 'FNR==1 { next }
FNR == NR { file1[$1,$2,$3,$4,$5,$6,$7] = $1 " " $2 " " $3 " " $4 " " $5 " " $6 " "$7 }
FNR != NR { file2[$1,$2,$3,$4,$5,$6,$7] = $1 " " $2 " " $3 " " $4 " " $5 " " $6 " "$7 }
END { print "Match:"; for (k in file1) if (k in file2) print file1[k] # Or file2[k]
print "Missing in file1:"; for (k in file2) if (!(k in file1)) print file2[k]
print "Missing in file2:"; for (k in file1) if (!(k in file2)) print file1[k]
}' file1 file2 > output
Current output
Match:
chr1 43814981 COSM27287 G A 86.83350000000002 PASS
chr1 43815008 COSM29008;COSM43212 TGG AAA,AAG 70.3099 PASS
Missing in File1:
chr1 43814979 COSM27286 G A 86.92679999999999 PASS
Missing in File2:
chr1 43814978 COSM27286 G A 86.92679999999999 PASS
try:
awk 'FNR==NR{
  a[$1,$2,$7]=$1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7
  next
}
(($1,$2,$7) in a){
  val_match=val_match?val_match ORS a[$1,$2,$7]:a[$1,$2,$7]
  delete a[$1,$2,$7]
  next
}
{
  line=$1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7
  val_mismatch_in_file1=val_mismatch_in_file1?val_mismatch_in_file1 ORS line:line
}
END{
  for(i in a){
    val_missing_in_file2=val_missing_in_file2?val_missing_in_file2 ORS a[i]:a[i]
  }
  print "Match:" RS val_match RS "Missing in File1:" RS val_mismatch_in_file1 RS "Missing in File2:" RS val_missing_in_file2
}
' Input_file1 Input_file2
Output will be as follows.
Match:
chr1 43814981 COSM27287 G A 86.83350000000002 PASS
chr1 43815008 COSM29008;COSM43212;COSM19193;COSM27289;COSM28487 TGG AAA,AAG,AGG,CGG,GCG 70.3099 PASS
Missing in File1:
chr1 43814979 COSM27286 G A 86.92679999999999 PASS
Missing in File2:
chr1 43814978 COSM27286 G A 86.92679999999999 PASS
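The answer above does not cover the tag extraction or the rounding of $6 that the question asks about. One possible sketch of those two pieces follows, assuming the AF=...; text is available as a single ;-separated string, one per input line. The helper names extract_tags and round3 are invented for this example, and wiring them into the comparison is left open.

```shell
# Hypothetical helper: keep only the AF, FR, HRUN, LEN and TYPE tags.
extract_tags() {
  awk '
    {
      n = split($0, parts, ";")
      out = ""
      for (i = 1; i <= n; i++) {
        # The anchor keeps FAO= from being mistaken for AF=.
        if (parts[i] ~ /^(AF|FR|HRUN|LEN|TYPE)=/)
          out = (out == "" ? "" : out ";") parts[i]
      }
      print out
    }
  '
}

# Hypothetical helper: print a number with 3 significant figures.
round3() { awk '{ printf "%.3g\n", $1 }'; }

echo 'AF=0;AO=1;DP=5535;FAO=0;FR=.,REALIGNEDx0.008;HRUN=1;LEN=1;TYPE=snp;VARB=0;HS;' |
  extract_tags
echo '86.92679999999999' | round3
```

Splitting on ; and testing each piece avoids one long regular expression, and a tag with nothing after the = is still kept as it is.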
I have a file containing lines like
a x1
b x1
q xq
c x1
b x2
c x2
n xn
c x3
I would like to test on the first field in each line, and if there is a match I would like to append the matching lines to the first line. The output should look like
a x1
b x1 b x2
q xq
c x1 c x2 c x3
n xn
Any help will be greatly appreciated.
Using awk you can do this:
awk '{arr[$1]=arr[$1]?arr[$1] " " $0:$0} END {for (i in arr) print arr[i]}' file
n xn
a x1
b x1 b x2
c x1 c x2 c x3
q xq
To preserve input ordering:
$ awk '
{
if ($1 in vals) {
prev = vals[$1] " "
}
else {
prev = ""
keys[++k] = $1
}
vals[$1] = prev $0
}
END {
for (k=1;k in keys;k++)
print vals[keys[k]]
}
' file
a x1
b x1 b x2
q xq
c x1 c x2 c x3
n xn
What I ended up doing. (The answers by Ed Morton and Jonte are obviously more elegant.)
First I saved the 1st column of the input file in a separate file.
awk '{print $1}' input_file.txt > tmp0
Then I saved the input file with the lines that repeat an earlier $1 value removed.
awk 'BEGIN { FS = "\t" }; !x[$1]++ { print $0}' input_file.txt > tmp1
Then I saved all the lines with a duplicate $1 field.
awk 'BEGIN { FS = "\t" }; x[$1]++ { print $0}' input_file.txt > tmp2
Then I saved the $1 fields of the non-duplicate file (tmp1).
awk '{ print $1}' tmp1 > tmp3
I used a for loop to pull lines from the duplicates file (tmp2) and the duplicates-removed file (tmp1) into an output file.
for i in $(cat tmp3)
do
    if [ "$(grep -cw "$i" tmp0)" = 1 ]   # test for a single instance in the 1st col of the input file
    then
        # if single, pull that record from the no-dupes file
        grep -w "$i" tmp1 >> output.txt
    else
        # if not single, pull the record from the no-dupes file first, then all the records from the dupes file, on a single line
        echo -e "$(grep -w "$i" tmp1) \t $(grep -w "$i" tmp2 | awk '{ printf "%s\t", $0 }; END { printf "\n" }')" >> output.txt
    fi
done
Finally remove the tmp files
rm tmp* # remove all the tmp files