If statement in GAWK gives an error - awk

I have this piece of code:
gawk '{if (match($5,/hola/,a) && $6=="hola") {print $2"\t"$1"\t"$2"\t"$1"\t"$3} else if (match($5,/(_[joxT]+\.[0-9]*)/,a) && match($6,/(_[joxG]+\.[0-9]*)/,b)) {print $2""a[1]"\t"$1""b[1]} else (match($5,/(_[joxT]+\.[0-9]*)/,a) && $6=="hola") {print "hola"}}' pasted
I'm getting this error:
gawk: cmd. line:1: {if (match($5,/hola/,a) && $6=="hola") {print $2"\t"$1"\t"$2"\t"$1"\t"$3} else if (match($5,/(_[joxT]+\.[0-9]*)/,a) && match($6,/(_[joxG]+\.[0-9]*)/,b)) {print $2""a[1]"\t"$1""b[1]} else (match($5,/(_[joxT]+\.[0-9]*)/,a)) {print $1}}
gawk: cmd. line:1: ^ syntax error
Do you know where the error is?
Thanks.

Take pity on the next guy to maintain your code and indent. Not every program needs to be expressed on one line.
gawk '
BEGIN {OFS = "\t"}
{
    if ($5 ~ /hola/ && $6 == "hola") {
        print $2, $1, $2, $1, $3
    }
    else if (match($5, /(_[joxT]+\.[0-9]*)/, a) && match($6, /(_[joxG]+\.[0-9]*)/, b)) {
        print $2 a[1], $1 b[1]
    }
    else if ($5 ~ /(_[joxT]+\.[0-9]*)/ && $6 == "hola") {
        print "hola"
    }
}
' pasted
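Here, match() is used only where part of the match needs to be captured; a plain ~ test is enough everywhere else. The syntax error itself comes from the last branch of your code: else cannot take a condition, so it has to be written as else if (...).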
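To make the else if point concrete, here is a minimal sketch of the same branching shape on made-up input. The classify function name and the three sample lines are only for illustration; they are not from the question.

```shell
# classify is a hypothetical helper name; it reads lines on stdin.
classify() {
    awk '{
        if ($1 == "a") {
            print "first"
        }
        else if ($1 == "b") {   # a condition after else needs "if"
            print "second"
        }
        else {                  # a bare else takes no condition
            print "other"
        }
    }'
}

printf 'a\nb\nc\n' | classify
```

Writing the middle branch as else ($1 == "b") {...} instead should reproduce the kind of syntax error shown in the question.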

gawk '{if (match($5,/hola/,a) && $6=="hola") {print $2"\t"$1"\t"$2"\t"$1"\t"$3} else if (match($5,/(_[joxT]+\.[0-9]*)/,a) && match($6,/(_[joxG]+\.[0-9]*)/,b)) {print $2""a[1]"\t"$1""b[1]} else if (match($5,/(_[joxT]+\.[0-9]*)/,a) && $6=="hola") {print "hola"}}' pasted

Related

Print rows where one column is the same but another is different

Given files test1 and test2:
$ cat test*
alert_name,id,severity
bar,2,1
foo,1,0
alert_name,id,severity
foo,1,9
bar,2,1
I want to find rows where the name is the same but the severity has changed (i.e. foo) and print the change. I have got this far using awk:
awk 'BEGIN{FS=","} FNR>1 && NR==FNR { a[$1]; next } ($1 in a) && ($3 != a[$3]) {printf "Alert %s severity from %s to %s\n", $1, a[$3], $3}' test1 test2
which prints:
Alert foo severity from to 9
Alert bar severity from to 1
So the match is wrong, and I can't print a[$3].
You may try this awk:
awk -F, '$1 in sev && $3 != sev[$1] {
printf "Alert %s severity from %s to %s\n", $1, sev[$1], $3
}
{sev[$1] = $3}' test*
Alert foo severity from 0 to 9
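For what it's worth, the attempt in the question fails for two reasons: a[$1] creates the key without storing a value, and the later lookup a[$3] indexes by severity rather than by name. Below is a sketch of the two-file NR==FNR form with both points fixed, using the same sample data. The temporary directory and the sev array name are just for this example.

```shell
tmp=$(mktemp -d)
printf 'alert_name,id,severity\nbar,2,1\nfoo,1,0\n' > "$tmp/test1"
printf 'alert_name,id,severity\nfoo,1,9\nbar,2,1\n' > "$tmp/test2"

out=$(awk -F, '
    FNR == 1  { next }                  # skip the header of each file
    NR == FNR { sev[$1] = $3; next }    # first file: remember severity by name
    ($1 in sev) && $3 != sev[$1] {      # second file: same name, new severity
        printf "Alert %s severity from %s to %s\n", $1, sev[$1], $3
    }
' "$tmp/test1" "$tmp/test2")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Unlike the version that reads test*, this one compares the second file strictly against the first, so the argument order matters.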
mawk 'BEGIN { _+=(_^=FS=OFS=",")+_ }
FNR == NR || +$_<=+___[__=$!!_] ? !_*(___[$!!_]=$_) : \
$!_ = "Alert "__ " severity from "___[__]" to " $_' files*.txt
Alert foo severity from 0 to 9

Awk syntax error with time layout from logs

I'm getting a syntax error with awk when I run this one-liner:
awk '{ if ($3 == '16' && $4 == '23:59:44') {print $0} }' /var/log/radius/radius.log
It gives me a syntax error at the time field. However, when I run:
awk '{ print $4 }' /var/log/radius/radius.log
this gives me the proper format for the time, hh:mm:ss, so I don't understand why my one-liner doesn't work.
Cheers!
Single quotes (') delimit the awk program itself, so they cannot be used inside it; use double quotes (") for the string comparison instead.
OP's code fix:
awk '{ if ($3 == 16 && $4 == "23:59:44") {print $0} }' Input_file
Or the above could be shortened to (the idiomatic awk way):
awk '($3 == 16 && $4 == "23:59:44")' Input_file
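The reason for the syntax error: the inner single quotes close and reopen the shell string, so awk actually receives $4 == 23:59:44 with no quotes at all. Here is a small sketch on made-up log lines (the real radius.log layout is not shown in the question, so the sample fields are an assumption), also showing -v as one way to pass the values in from the shell:

```shell
# Hypothetical sample lines; field 3 is the day, field 4 the time.
sample='Mon Jun 16 23:59:44 2014 : Auth: Login OK
Mon Jun 16 23:59:45 2014 : Auth: Login OK
Tue Jun 17 23:59:44 2014 : Auth: Login OK'

# Double quotes inside the single-quoted program mark an awk string.
printf '%s\n' "$sample" | awk '$3 == 16 && $4 == "23:59:44"'

# Alternative: hand the values to awk as variables with -v.
day=16
when=23:59:44
matched=$(printf '%s\n' "$sample" | awk -v d="$day" -v t="$when" '$3 == d && $4 == t')
printf '%s\n' "$matched"
```

The -v form avoids mixing shell and awk quoting altogether, which tends to be less error-prone when the values come from shell variables.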

Find "complete cases" with awk

Using awk, how can I output the lines of a file that have all fields non-null without manually specifying each column?
foo.dat
A||B|X
A|A|1|
|1|2|W
A|A|A|B
Should return:
A|A|A|B
In this case we can do:
awk -F"|" -v OFS="|" '$1 != "" && $2 != "" && $3 != "" && $4 != "" { print }' foo.dat
But is there a way to do this without specifying each column?
You can loop over all fields and skip the record if any of the fields are empty:
$ awk -F'|' '{ for (i=1; i<=NF; ++i) { if (!$i) next } }1' foo.dat
A|A|A|B
if (!$i) is "if field i is empty", and the trailing 1 is short for "print the line"; it is only reached if next was not executed for any of the fields of the current line.
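One caveat: !$i is also true for a field that holds 0, so such a line would be dropped as well. If zeros can occur in the data, a sketch of the same loop with an explicit empty-string test might look like this (the A|0|1|Z line is added here just to show the difference, it is not in the question's file, and complete_cases is a made-up function name):

```shell
# complete_cases is a hypothetical helper; it reads |-separated lines on stdin.
complete_cases() {
    awk -F'|' '{
        for (i = 1; i <= NF; ++i) {
            if ($i == "") next    # skip the record at the first empty field
        }
        print
    }'
}

printf 'A||B|X\nA|0|1|Z\n|1|2|W\nA|A|A|B\n' | complete_cases
```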
Another in awk:
$ awk -F\| 'gsub(/[^|]+(\||$)/,"&")==NF' file
A|A|A|B
This prints the record if it contains exactly NF non-empty strings (free of |), each terminated by | or by the end of the line.
awk '!/\|\|/&&!/\|$/&&!/^\|/' file
A|A|A|B

awk to improve command print Match and Non-Match case:

I would like to read and compare the first field of two files, then print:
Matching lines from both files (available in f11.txt and f22.txt) -> Op_Match.txt
Non-matching lines from f11.txt (available in f11.txt, not available in f22.txt) -> Op_NonMatch_f11.txt
Non-matching lines from f22.txt (available in f22.txt, not available in f11.txt) -> Op_NonMatch_f22.txt
I am using the 3 separate commands below to achieve this.
f11.txt
10,03-APR-14,abc
20,02-JUL-13,def
10,19-FEB-14,abc
20,02-AUG-13,def
10,22-JAN-07,abc
10,29-JUN-07,abc
40,11-SEP-13,ghi
f22.txt
50,DL,3000~4332,ABC~XYZ
10,DL,5000~2503,ABC~XYZ
30,AL,2000~2800,DEF~PQZ
To get the matching lines from both files:
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1] = $0; next} ($1 in a) {print $0,a[$1]}' f22.txt f11.txt> Op_Match.txt
10,03-APR-14,abc,10,DL,5000~2503,ABC~XYZ
10,19-FEB-14,abc,10,DL,5000~2503,ABC~XYZ
10,22-JAN-07,abc,10,DL,5000~2503,ABC~XYZ
10,29-JUN-07,abc,10,DL,5000~2503,ABC~XYZ
To get the non-matching lines from f11.txt:
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1] = $0; next} !($1 in a) {print $0}' f22.txt f11.txt > Op_NonMatch_f11.txt
20,02-JUL-13,def
20,02-AUG-13,def
40,11-SEP-13,ghi
To get the non-matching lines from f22.txt:
awk ' BEGIN {FS = OFS = ","} FNR==NR {a[$1] = $0; next} !($1 in a) {print $0}' f11.txt f22.txt > Op_NonMatch_f22.txt
50,DL,3000~4332,ABC~XYZ
30,AL,2000~2800,DEF~PQZ
I am using the 3 separate commands above to achieve this. Is there a simpler way that avoids 3 different commands? Any suggestions?
Something like this, untested:
awk '
BEGIN { FS=OFS="," }
NR==FNR {
    fname1 = FILENAME
    keys[NR] = $1
    recs[NR] = $0
    key2nrs[$1] = ($1 in key2nrs ? key2nrs[$1] RS : "") NR
    next
}
{
    if ($1 in key2nrs) {
        split(key2nrs[$1],nrs,RS)
        for (i=1; i in nrs; i++) {
            print recs[nrs[i]], $0 > "Op_Match.txt"
        }
        matched[$1]
    }
    else {
        # FILENAME already ends in ".txt", so do not append it again
        print > ("Op_NonMatch_" FILENAME)
    }
}
END {
    for (i=1; i in recs; i++) {
        if (! (keys[i] in matched) ) {
            print recs[i] > ("Op_NonMatch_" fname1)
        }
    }
}
' f11.txt f22.txt
The main difference between this and Kent's and Etan's answers is that theirs assume that a given $1 can appear only once in f22.txt, while the above would still work if, say, 10 occurred as the first field on multiple lines of f22.txt.
The other difference is that the above will output lines in the same order that they occurred in the input files while the other answers will output some of them in random order based on how they're stored internally in a hash table.
I haven't checked @EdMorton's answer but he will quite likely have gotten it right.
My solution (which looks slightly less generic than his at first glance) is:
awk -F, -v OFS=, '
FNR==NR {
    a[$1]=$0
    next
}
($1 in a) {
    print $0,a[$1] > "Op_Match.txt"
    am[$1]++
}
!($1 in a) {
    print $0 > "Op_NonMatch_f11.txt"
}
END {
    for (i in a) {
        if (!(i in am)) {
            print a[i] > "Op_NonMatch_f22.txt"
        }
    }
}
' f22.txt f11.txt
Here is one:
awk -F, -v OFS="," 'NR==FNR{a[$1]=$0;next}
$1 in a{print $0,a[$1]>("common.txt");c[$1];next}
{print $0>("NonMatchFromFile1.txt")}
END{for(x in a)
if(!(x in c))
print a[x]>("NonMatchFromFile2.txt")}' f2 f1
With this, you will get 3 files: common.txt, NonMatchFromFile1.txt and NonMatchFromFile2.txt

changing the appearance of awk output

I used the following code to extract protein residues from text files.
awk '{
    if (FNR == 1 ) print ">" FILENAME
    if ($5 == 1 && $4 > 30) {
        printf $3
    }
}
END { printf "\n"}' protein/*.txt > seq.txt
I got the following output when I used the above code.
>1abd
MDEKRRAQHNEVERRRRDKINNWIVQLSKIIPDSSMESTKSGQSKGGILSKASDYIQELRQSNHR>1axc
RQTSMTDFYHSKRRLIFS>1bxc
RQTSMTDFYHSKRRLIFSPRR>1axF
RQTSMTDFYHSKRR>1qqt
ARPYQGVRVKEPVKELLRRKRG
I would like to get the output shown below. How do I change the above code to get it?
>1abd
MDEKRRAQHNEVERRRRDKINNWIVQLSKIIPDSSMESTKSGQSKGGILSKASDYIQELRQSNHR
>1axc
RQTSMTDFYHSKRRLIFS
>1bxc
RQTSMTDFYHSKRRLIFSPRR
>1axF
RQTSMTDFYHSKRR
>1qqt
ARPYQGVRVKEPVKELLRRKRG
This might work for you:
awk '{
    if (FNR == 1) print newline ">" FILENAME
    if ($5 == 1 && $4 > 30) {
        newline = "\n"
        printf "%s", $3
    }
}
END { printf "\n" }' protein/*.txt > seq.txt
With gawk version 4 or later, you can write:
gawk '
BEGINFILE {print ">" FILENAME}
($5 == 1 && $4 > 30) {printf "%s", $3}
ENDFILE {print ""}
' filename ...
http://www.gnu.org/software/gawk/manual/html_node/BEGINFILE_002fENDFILE.html#BEGINFILE_002fENDFILE
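If gawk 4 is not available, here is a portable sketch of the same idea that should work in any POSIX awk: end the previous sequence line whenever a new file starts. The two tiny input files and their column layout are invented for the example; only the $3/$4/$5 tests come from the question.

```shell
tmp=$(mktemp -d)
# Hypothetical input: residue in column 3, a score in column 4, a flag in column 5.
printf 'x x M 31 1\nx x D 40 1\nx x K 10 1\n' > "$tmp/1abd"
printf 'x x R 35 1\nx x Q 50 0\nx x S 50 1\n' > "$tmp/1axc"

seqs=$(cd "$tmp" && awk '
    FNR == 1 {
        if (NR > 1) print ""      # terminate the previous sequence line
        print ">" FILENAME
    }
    $5 == 1 && $4 > 30 { printf "%s", $3 }
    END { if (NR > 0) print "" }  # terminate the last sequence line
' 1abd 1axc)

printf '%s\n' "$seqs"
rm -rf "$tmp"
```

One known limitation of this sketch: an empty input file never triggers FNR == 1, so it gets no header line, whereas BEGINFILE in gawk runs for every file.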