Awk: Print to different files using multiple defined variables


I am opening a file and checking whether the items in columns 1 and 8 match certain specs. If both match, the line is written to file x. If column 1 matches but column 8 does not, the line is written to file y.
I am defining multiple variables (awk -v v=$var,f1=$file,f2=$output), and I believe the way I reference f1 and f2 is the problem. If I remove the quotes:
print $0 >> f2
awk: cmd. line:5: (FILENAME=- FNR=2) fatal: expression for `>>' redirection has null string value
If I put in a $:
print $0 >> $f2
I end up with a bunch of files with odd names that I don't want, and the files I do want are empty (except for the echoed line).
If I put the name in double quotes:
print $0 >> "f2"
The files I want are almost empty, and it creates a file called f2.
#!/bin/bash
output="output.txt"
echo -e "C1\tSeqID\tAminoAcid\tCD1\tCD2\tCD3\tGene\tEnvironment\tFilename" > $output
inputFile="input.txt.gz"
for var in A B C D E F G H I J K L
do
file=$var".txt"
echo -e "C1\tSeqID\tAA\tCD1\tCD2\tCD3\tGene\tEnvironment\tFilename" > $file
#---Wrong, forgot to catch $8 != v
#zcat $inputFile | awk -v v=$var '{
# if ($8 == v && ($1 == "V1" || $1 == "V2" || $1 == "V3" || $1 == "V4" || $1 == "V5" || $1 == "V6" || $1 == "V7" || $1 == "V8" || $1 == "V9" || $1 == "V10"))
# print $0
# }' | tee -a $file $output
zcat $inputFile | awk -v v=$var,f1=$file,f2=$output '{
if ($8 == v && ($1 == "V1" || $1 == "V2" || $1 == "V3" || $1 == "V4" || $1 == "V5" || $1 == "V6" || $1 == "V7" || $1 == "V8" || $1 == "V9" || $1 == "V10"))
print $0 >> "file"
else if ($8 != v && ($1 == "V1" || $1 == "V2" || $1 == "V3" || $1 == "V4" || $1 == "V5" || $1 == "V6" || $1 == "V7" || $1 == "V8" || $1 == "V9" || $1 == "V10"))
print $0 >> "f2"
}'
gzip $file
done
gzip $output
I could run through the loop with two separate awk commands that write to different files. However, the input is very large (4G compressed), so a single pass like my current approach (or something similar) is more efficient. Any guidance on how to reference the 2nd and 3rd variables is greatly appreciated.

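Use a separate -v for each variable; awk does not split one -v argument on commas, so everything after v= ends up in v: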
awk -v v="$var" -v f1="$file" -v f2="$output" '...'

% awk -v v=qw,f1=we,f2=as 'BEGIN{print v, "*", f1, "*", f2}'
qw,f1=we,f2=as * *
% awk -v v=qw -v f1=we -v f2=as 'BEGIN{print v, "*", f1, "*", f2}'
qw * we * as
%
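Applied to the script in the question, the call might look like the sketch below. The sample rows, the temporary directory and the tab field separator are assumptions made for illustration, and the regex is a shorthand swapped in for the chain of $1 == "V1" || ... tests. With each variable passed through its own -v, the unquoted form print >> f2 from the question's first attempt works, because f2 now holds a file name.

```shell
# Sketch only: hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)

# Column 1 is the C1 label, column 8 the environment (tab-separated).
printf 'V1\ts1\taa\t1\t2\t3\tg\tA\tf\n'  > "$tmp/input.txt"
printf 'V2\ts2\taa\t1\t2\t3\tg\tB\tf\n' >> "$tmp/input.txt"
printf 'X9\ts3\taa\t1\t2\t3\tg\tA\tf\n' >> "$tmp/input.txt"

var=A
file="$tmp/$var.txt"
output="$tmp/output.txt"

# One -v per variable; redirect to the awk variables without quotes or $.
awk -F'\t' -v v="$var" -v f1="$file" -v f2="$output" '
    $1 ~ /^V([1-9]|10)$/ {          # C1 is one of V1..V10
        if ($8 == v) print >> f1    # environment matches: per-variable file
        else         print >> f2    # otherwise: shared output file
    }
' "$tmp/input.txt"
```

Here the V1 row lands in A.txt, the V2 row in output.txt, and the X9 row is dropped.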

Related

Compare multiple columns from one file with multiple columns of another file using awk?

I want to compare the first 2 characters of col1 of file1 with col1 of file2 when col3 of file1 is the same as col3 of file2, provided col4 in file2 equals TRUE. I tried something like this:
awk -F'|' 'BEGIN{OFS=FS};(NR==FNR)
{a[substr($1,1,2),$3]=$1;next}(($1,$3)in a) && $4==TRUE ' file1 file2 > outfield
file 1
AE1267453617238|BIDKFXXXX|United Arab Emirates|
PL76UTYVJDYGHU9|ABSFXXJBW|Poland|
GB76UTRTSCLSKJ|FVDGXXXUY|Russia|
file 2
AE|^AE[0-9]{2}[0-9]{24}|United Arab Emirates|TRUE|
PL|^PL[0-9]{2}[A-Z]{10}[0-9]{4}|Poland|FALSE|
GB|^GB[0-9]{2}[A-Z]{5}[0-9]{3}|Europe|TRUE
expected output :-
AE1267453617238|BIDKFXXXX|United Arab Emirates|

You can simply chain the conditions with && as below. Remember that your expected output comes from the first file, so you need to process the second file first:
awk -F'|' '
FNR == NR { if ( $4 == "TRUE" ) m[$1] = $3 ; next }
{ k = substr($1,1,2) }
k in m && m[k] == $3' file2 file1
The part m[$1] = $3 creates a hash-map of the $1 with the value of $3 in the second file, which is then used in the first file to compare against only the first two characters of $1 i.e. substr($1,1,2). To avoid redundant use of substr(..), the value is extracted into a variable k and reused subsequently.

If the matches must be on the same line number in each file:
awk -F \| '
FNR==NR && $4 == "TRUE" {a[NR,1]=$1; a[NR,3]=$3}
FNR!=NR && $3 == a[FNR,3] &&
$1 ~ "^"a[FNR,1]' file2 file1
If the matches can be on any line (every line of file1 is checked against every line of file2, duplicate matches aren't printed):
awk -F \| '
FNR==NR {++l}
FNR==NR && $4 == "TRUE" {a[NR,1]=$1; a[NR,3]=$3}
FNR!=NR {
for (i=1; i<=l; ++i) {
if ($3 == a[i,3] && $1 ~ "^"a[i,1])
c[$0]==0    # referencing c[$0] creates the key, so duplicate matches collapse
}
}
END {
for (i in c)
print i
}' file2 file1
Note the order in which the files are given: file2 (which contains TRUE and FALSE) goes first. I also used a regex match instead of substr, so the prefix characters should be alphanumeric only; if they are not, go back to substr.
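The caveat about non-alphanumeric characters can be seen in a small sketch. The key A. and the two sample rows are made up for illustration; index($1, key) == 1 is one way to test for a literal prefix without substr or a regex.

```shell
# Hypothetical data: only the first row literally starts with "A."
rows=$(printf 'A.1234\nAB1234\n')

# Regex match: "." matches any character, so both rows are printed.
regex_hits=$(printf '%s\n' "$rows" | awk -v key='A.' '$1 ~ "^" key')

# Literal prefix test: index() returns 1 only when key starts at position 1.
literal_hits=$(printf '%s\n' "$rows" | awk -v key='A.' 'index($1, key) == 1')

printf 'regex:\n%s\nliteral:\n%s\n' "$regex_hits" "$literal_hits"
```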

Regarding your code:
awk -F'|' 'BEGIN{OFS=FS};(NR==FNR)
{a[substr($1,1,2),$3]=$1;next}(($1,$3)in a) && $4==TRUE ' file1 file2 > outfield
Newlines matter to awk. This:
NR==FNR
{ print }
is not the same as this:
NR==FNR { print }
The first one is actually the same as:
NR==FNR { print }
1 { print }
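A quick way to see the difference is to run both forms on two lines of made-up input, as in the sketch below. With the pattern on its own line, every line is printed twice: once by the bare pattern's default action and once by the unconditional block.

```shell
# Pattern and action on separate lines: two independent rules.
split_out=$(printf 'a\nb\n' | awk 'NR==FNR
{ print }')

# Pattern and action on one line: a single rule.
joined_out=$(printf 'a\nb\n' | awk 'NR==FNR { print }')

printf 'split:\n%s\njoined:\n%s\n' "$split_out" "$joined_out"
```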
Also when you want to output the contents of a file (file1 in your case) it's usually better to read the OTHER file into memory and then compare the values from the target file against that so you can just print it as you go. So you should be doing awk 'script' file2 file1, not awk 'script' file1 file2, and writing a script based on that.
Try this:
$ cat tst.awk
BEGIN { FS="|" }
NR==FNR {
if ( $4 == "TRUE" ) {
map[$1] = $3
}
next
}
{ key = substr($1,1,2) }
(key in map) && (map[key] == $3)
$ awk -f tst.awk file2 file1
AE1267453617238|BIDKFXXXX|United Arab Emirates|

awk -F\| '
NR==FNR{
a[$3,1]=$1;
a[$3,4]=$4;
next
}
substr($1,1,2) == a[$3,1] && a[$3,4] == "TRUE" { print }
' file2.txt file1.txt
AE1267453617238|BIDKFXXXX|United Arab Emirates|

Printing multiple lines with the same "largest" value using awk

I have a file that looks like this:
3, abc, x
2, def, y
3, ghi, z
I want to find the highest value in $1 and print all rows that contain this highest value in $1.
sort -t, -k1,1n| tail -n1
would just give one of the rows that contain 3 in $1, but I need both.
Any suggestions are appreciated (:

I’m not sure if this is the nicest way to get lines while they have the same value with awk, but:
awk 'NR == 1 { t = $1; print } NR > 1 { if (t != $1) { exit; } print }'
which can be combined with sort as follows:
sort -t, -k1,1nr | awk 'NR == 1 { t = $1; print } NR > 1 { if (t != $1) { exit; } print }'
There’s also this, but it does unnecessary work:
sort -t, -k1,1nr | awk 'NR == 1 { t = $1 } t == $1 { print }'

Here is another approach that does not require sorting, but requires two passes over the data.
max=$(awk -F',' '{if(max < $1) max = $1}END{print max}' Input.txt )
awk -v max=$max -F',' '$1 == max' Input.txt

In awk, only one pass over the data:
$ awk -F, '
$1>m { # when new max is found
delete a; m=$1; i=0 # reset all
}
a[1]=="" || $1==m { # if $1 equals max or we're processing the first record
a[++i]=$0 # store the record to a
}
END { # in the end
for(j=1;j<=i;j++)
print a[j] # print a with stored records
}
' file
3, abc, x
3, ghi, z

awk with egrep filter unable to return null value with condition

I am new to awk; my command is below. When no rows are returned it should print pass, otherwise fail. But when there are no rows, pass is not displayed.
egrep -v "^\+" /etc/passwd | awk -F: '($1!="root" && $1!="sync" && $1!="shutdown" && $1!="halt" && $3<500 && $7!="/sbin/nologin") {print}' | awk '{if(NR==0||NR<=0||'null') print "pass"; else print "fail"}'
The result should be pass, but nothing is printed. Please advise.

Consolidate everything into one awk command, for example:
$ awk -F: '!/^\+/ && $1!="root" && ... {f=1; exit}
END {print (f?"fail":"pass")}' /etc/passwd
Perhaps better, set the exit code instead:
$ awk -F: '!/^\+/ && $1!="root" && ... {exit 1}' /etc/passwd
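The exit status can then drive the pass/fail message from the shell. The sketch below spells out the full condition from the question and runs it on a made-up passwd-style file; check_accounts and the sample accounts are hypothetical names used only for illustration.

```shell
# check_accounts FILE: exit status 1 if any offending account is found, else 0.
check_accounts() {
    awk -F: '
        !/^\+/ && $1 != "root" && $1 != "sync" && $1 != "shutdown" &&
        $1 != "halt" && $3 < 500 && $7 != "/sbin/nologin" { exit 1 }
    ' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:1000:1000::/home/alice:/bin/bash
EOF

if check_accounts "$sample"; then clean_result=pass; else clean_result=fail; fi

# Add a low-UID account with a login shell; the check should now fail.
echo 'svc:x:42:42::/:/bin/sh' >> "$sample"
if check_accounts "$sample"; then dirty_result=pass; else dirty_result=fail; fi

echo "$clean_result $dirty_result"
rm -f "$sample"
```

Because awk produces no output when nothing matches, testing the exit status avoids the problem in the question, where the second awk never saw a record and so never ran its rule.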

This MAY be what you're trying to do:
awk -F: '/^\+/ || $1~/^(root|sync|shutdown|halt)$/ || $3>=500 || $7=="/sbin/nologin"{next} {f=1; exit} END{print (f ? "fail" : "pass")}' /etc/passwd

changing the appearance of awk output

I used the following code to extract protein residues from text files.
awk '{
if (FNR == 1 ) print ">" FILENAME
if ($5 == 1 && $4 > 30) {
printf $3
}
}
END { printf "\n"}' protein/*.txt > seq.txt
I got the following output when I used the above code.
>1abd
MDEKRRAQHNEVERRRRDKINNWIVQLSKIIPDSSMESTKSGQSKGGILSKASDYIQELRQSNHR>1axc
RQTSMTDFYHSKRRLIFS>1bxc
RQTSMTDFYHSKRRLIFSPRR>1axF
RQTSMTDFYHSKRR>1qqt
ARPYQGVRVKEPVKELLRRKRG
I would like to get the output shown below. How do I change the above code to produce it?
>1abd
MDEKRRAQHNEVERRRRDKINNWIVQLSKIIPDSSMESTKSGQSKGGILSKASDYIQELRQSNHR
>1axc
RQTSMTDFYHSKRRLIFS
>1bxc
RQTSMTDFYHSKRRLIFSPRR
>1axF
RQTSMTDFYHSKRR
>1qqt
ARPYQGVRVKEPVKELLRRKRG

This might work for you:
awk '{
if (FNR == 1 ) print newline ">" FILENAME
if ($5 == 1 && $4 > 30) {
newline="\n";
printf "%s", $3
}
}
END { printf "\n"}' protein/*.txt > seq.txt

With gawk version 4, you can write:
gawk '
BEGINFILE {print ">" FILENAME}
($5 == 1 && $4 > 30) {printf "%s", $3}
ENDFILE {print ""}
' filename ...
http://www.gnu.org/software/gawk/manual/html_node/BEGINFILE_002fENDFILE.html#BEGINFILE_002fENDFILE

awk + add print in case of match

I wrote the following awk command (it prints the line if VAL_1 and VAL_2 match in the file):
awk -v VAL_1=$NET -v VAL_2=$NET_SPEED '$1 == VAL_1 && $2 == VAL_2 ' file
How can I add a print command to the awk script so that it prints the word MATCH if $1 equals VAL_1 and $2 equals VAL_2?
lidia

$1 == VAL_1 && $2 == VAL_2 { print "MATCH" }
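Put together with the command from the question, this could look like the sketch below. The interface name, the speed and the sample file contents are made-up values; the shell variables are quoted so that empty or space-containing values do not break the -v assignments.

```shell
# Hypothetical values standing in for the real $NET and $NET_SPEED.
NET=eth0
NET_SPEED=1000
sample=$(mktemp)
printf 'eth0 1000\neth1 100\n' > "$sample"

# Print MATCH for each line whose first two fields equal the two variables.
result=$(awk -v VAL_1="$NET" -v VAL_2="$NET_SPEED" \
    '$1 == VAL_1 && $2 == VAL_2 { print "MATCH" }' "$sample")

echo "$result"
rm -f "$sample"
```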
