awk + add print in case of match - awk

I wrote the following awk command, which prints a line of the file when $1 equals VAL_1 and $2 equals VAL_2:
awk -v VAL_1="$NET" -v VAL_2="$NET_SPEED" '$1 == VAL_1 && $2 == VAL_2' file
How can I add a print command to this awk so that it prints the word MATCH when $1 == VAL_1 and $2 == VAL_2?
lidia

$1 == VAL_1 && $2 == VAL_2 { print "MATCH" }
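For completeness, here is a minimal sketch of that rule dropped into the full command. The sample file contents and the values of NET and NET_SPEED are made up for illustration; only the awk rule itself comes from the answer above.

```shell
# Hypothetical sample data: interface name in column 1, speed in column 2.
NET=eth0
NET_SPEED=1000
tmp=$(mktemp)
printf '%s\n' 'eth0 1000' 'eth1 100' > "$tmp"

# Quote the shell variables so that values containing spaces survive word splitting.
result=$(awk -v VAL_1="$NET" -v VAL_2="$NET_SPEED" \
    '$1 == VAL_1 && $2 == VAL_2 { print "MATCH" }' "$tmp")
echo "$result"
rm -f "$tmp"
```

Adding the action replaces the default one, so the matching line itself is no longer printed; use print "MATCH", $0 if both are wanted.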

Related

Print rows where one column is the same but another is different

Given files test1 and test2:
$ cat test*
alert_name,id,severity
bar,2,1
foo,1,0
alert_name,id,severity
foo,1,9
bar,2,1
I want to find rows where the name is the same but the severity has changed (i.e. foo) and print the change. I have got this far using awk:
awk 'BEGIN{FS=","} FNR>1 && NR==FNR { a[$1]; next } ($1 in a) && ($3 != a[$3]) {printf "Alert %s severity from %s to %s\n", $1, a[$3], $3}' test1 test2
which prints:
Alert foo severity from to 9
Alert bar severity from to 1
So the match is wrong, and I can't print a[$3].
You may try this awk:
awk -F, '$1 in sev && $3 != sev[$1] {
printf "Alert %s severity from %s to %s\n", $1, sev[$1], $3
}
{sev[$1] = $3}' test*
Alert foo severity from 0 to 9
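The original attempt fails because a[$1] only creates a key without storing a value, and a[$3] looks the array up by severity instead of by name. A sketch that keeps the asker's two-file NR==FNR structure might look like the following; the array name old is just an illustrative choice, and the sample files are recreated in a temporary directory.

```shell
dir=$(mktemp -d)
printf '%s\n' 'alert_name,id,severity' 'bar,2,1' 'foo,1,0' > "$dir/test1"
printf '%s\n' 'alert_name,id,severity' 'foo,1,9' 'bar,2,1' > "$dir/test2"

result=$(awk -F, '
    # First file: remember the severity under the alert name, skipping the header.
    NR == FNR { if (FNR > 1) old[$1] = $3; next }
    # Second file: report names seen before whose severity differs.
    FNR > 1 && ($1 in old) && $3 != old[$1] {
        printf "Alert %s severity from %s to %s\n", $1, old[$1], $3
    }' "$dir/test1" "$dir/test2")
echo "$result"
rm -rf "$dir"
```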
mawk 'BEGIN { _+=(_^=FS=OFS=",")+_ }
FNR == NR || +$_<=+___[__=$!!_] ? !_*(___[$!!_]=$_) : \
$!_ = "Alert "__ " severity from "___[__]" to " $_' files*.txt
Alert foo severity from 0 to 9

Compare multiple columns from one file with multiple columns of another file using awk?

I want to compare the first 2 characters of col1 in file1 with col1 in file2, for rows where col3 of file1 is the same as col3 of file2 and col4 in file2 equals TRUE. I tried the following:
awk -F'|' 'BEGIN{OFS=FS};(NR==FNR)
{a[substr($1,1,2),$3]=$1;next}(($1,$3)in a) && $4==TRUE ' file1 file2 > outfield
file 1
AE1267453617238|BIDKFXXXX|United Arab Emirates|
PL76UTYVJDYGHU9|ABSFXXJBW|Poland|
GB76UTRTSCLSKJ|FVDGXXXUY|Russia|
file 2
AE|^AE[0-9]{2}[0-9]{24}|United Arab Emirates|TRUE|
PL|^PL[0-9]{2}[A-Z]{10}[0-9]{4}|Poland|FALSE|
GB|^GB[0-9]{2}[A-Z]{5}[0-9]{3}|Europe|TRUE
expected output :-
AE1267453617238|BIDKFXXXX|United Arab Emirates|
You can simply chain the conditions with && as below. Remember that your expected output comes from the first file, so you need to process the second file first:
awk -F'|' '
FNR == NR { if ($4 == "TRUE") m[$1] = $3; next }
{ k = substr($1,1,2) }
k in m && m[k] == $3' file2 file1
The part m[$1] = $3 creates a hash-map of the $1 with the value of $3 in the second file, which is then used in the first file to compare against only the first two characters of $1 i.e. substr($1,1,2). To avoid redundant use of substr(..), the value is extracted into a variable k and reused subsequently.
If the matches must be on the same line number in each file:
awk -F \| '
FNR==NR && $4 == "TRUE" {a[NR,1]=$1; a[NR,3]=$3}
FNR!=NR && $3 == a[FNR,3] &&
$1 ~ "^"a[FNR,1]' file2 file1
If the matches can be on any line (every line of file1 is checked against every line of file2, duplicate matches aren't printed):
awk -F \| '
FNR==NR {++l}
FNR==NR && $4 == "TRUE" {a[NR,1]=$1; a[NR,3]=$3}
FNR!=NR {
for (i=1; i<=l; ++i) {
if ($3 == a[i,3] && $1 ~ "^"a[i,1])
c[$0]  # referencing c[$0] is enough to create the key
}
}
END {
for (i in c)
print i
}' file2 file1
Note the order in which the files are given: file2 (which contains TRUE and FALSE) goes first. I also used a regex instead of substr, so the prefix characters should be alphanumeric only; if they are not, go back to substr.
Regarding your code:
awk -F'|' 'BEGIN{OFS=FS};(NR==FNR)
{a[substr($1,1,2),$3]=$1;next}(($1,$3)in a) && $4==TRUE ' file1 file2 > outfield
Newlines matter to awk. This:
NR==FNR
{ print }
is not the same as this:
NR==FNR { print }
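The first one is actually two separate rules (a pattern with the default print action, followed by an action that runs for every record), so it is the same as: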
NR==FNR { print }
1 { print }
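The difference is easy to see with a two-line input. This is only a sketch: the input lines a and b and the word "action" are made up for the demonstration.

```shell
# Pattern and action on separate lines: two independent rules.
split=$(printf 'a\nb\n' | awk 'NR == 1
{ print "action" }')

# Pattern and action on the same line: one rule.
joined=$(printf 'a\nb\n' | awk 'NR == 1 { print "action" }')

printf 'split:\n%s\njoined:\n%s\n' "$split" "$joined"
```

In the split form the first record is printed by the bare pattern and "action" is printed for every record; in the joined form "action" is printed once, for the first record only.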
Also when you want to output the contents of a file (file1 in your case) it's usually better to read the OTHER file into memory and then compare the values from the target file against that so you can just print it as you go. So you should be doing awk 'script' file2 file1, not awk 'script' file1 file2, and writing a script based on that.
Try this:
$ cat tst.awk
BEGIN { FS="|" }
NR==FNR {
if ( $4 == "TRUE" ) {
map[$1] = $3
}
next
}
{ key = substr($1,1,2) }
(key in map) && (map[key] == $3)
$ awk -f tst.awk file2 file1
AE1267453617238|BIDKFXXXX|United Arab Emirates|
awk -F\| '
NR==FNR{
a[$3,1]=$1;
a[$3,4]=$4;
next
}
substr($1,1,2) == a[$3,1] && a[$3,4] == "TRUE" { print }
' file2.txt file1.txt
AE1267453617238|BIDKFXXXX|United Arab Emirates|

awk Print Skipping a field

In the case where type is "" print the 3rd field out of sequence and then print the whole line with the exception of the 3rd field.
Given a tab-separated line a b c d e, the idea is to print ba<tab>c<tab>a<tab>b<tab>d<tab>e
Setting $3="" seems to cause the subsequent print statement to lose the tab field separators and so is no good.
# $1 = year $2 = movie
BEGIN {FS = "\t"}
type=="" {printf "%s\t%s\t", $2 $1,$3; $3=""; print}
type!="" {printf "%s\t<%s>\t", $2 $1,type; print}
END {print ""}
Sticking in a for loop, which I like a lot less as a solution, results in a blank file.
# $1 = year $2 = movie
BEGIN {FS = "\t"}
type=="" {printf "%s\t%s\t%s\t%s\t", $2 $1,$3,$1,$2; for (i=4; i<=NF;i++) printf "%s\t",$i}
type!="" {printf "%s\t<%s>\t", $2 $1,type; print}
END {print ""}
You need to set OFS to a tab instead of its default single blank char, and you don't want to just set $3 to an empty string, as then you'll get 2 tabs between $2 and $4.
$ cat tst.awk
BEGIN {FS = OFS = "\t"}
{
if (type == "") {
val = $3
for (i=3; i<NF; i++) {
$i = $(i+1)
}
NF--
}
else {
val = "<" type ">"
}
print $2 $1, val, $0
}
$
$ awk -f tst.awk file | tr '\t' '-'
ba-c-a-b-d-e
$
$ awk -v type="foo" -f tst.awk file | tr '\t' '-'
ba-<foo>-a-b-c-d-e
The |tr '\t' '-' is obviously just added to make visible where the tabs are.
If decrementing NF doesn't work in your awk to delete the last field in the record, replace it with sub(/\t[^\t]+$/,"").
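The separator behaviour described above can be checked with a small experiment. This is a sketch using the five-field sample line from the question; the variable names are illustrative only.

```shell
line=$(printf 'a\tb\tc\td\te')

# Assigning to a field rebuilds the record with OFS, which defaults to one space.
default_ofs=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = "\t" } { $3 = ""; print }')

# With OFS set to a tab the tabs survive, but the emptied field leaves two in a row.
tab_ofs=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = OFS = "\t" } { $3 = ""; print }' | tr '\t' '-')

printf '[%s]\n[%s]\n' "$default_ofs" "$tab_ofs"
```

The first result loses the tabs entirely, and the second shows the doubled separator that the field-shifting loop in tst.awk avoids.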
One way
awk '{$3=""}1' OFS="\t" infile|column -t
Explanation:
{$3=""} sets the third field to an empty string.
1 is an always-true pattern, so the default action (print) prints the line.
OFS="\t" sets the output field separator to a tab. You may not need it, because the next command, column -t, realigns the output anyway.
column -t aligns its input into a table of columns.

Awk: Print to different files using multiple defined variables

I am opening up a file and checking if the items in columns 1 & 8 match certain specs. If yes, write output to a file x. If the items in column 1 match specs but column 8 does not match the specs, write output to file y.
I am defining multiple variables (awk -v v=$var,f1=$file,f2=$output), and I believe how I reference f1 & f2 is the problem. If I remove the quotes:
print $0 >> f2
awk: cmd. line:5: (FILENAME=- FNR=2) fatal: expression for `>>' redirection has null string value
If I put in a $:
print $0 >> $f2
I end up with a bunch of files with odd names that I don't want, and the files I do want are empty (except for the echoed line).
If I put the name in quotes:
print $0 >> "f2"
The files I want are almost empty, and it creates a file called f2.
#!/bin/bash
output="output.txt"
echo -e "C1\tSeqID\tAminoAcid\tCD1\tCD2\tCD3\tGene\tEnvironment\tFilename" > $output
inputFile="input.txt.gz"
for var in A B C D E F G H I J K L
do
file=$var".txt"
echo -e "C1\tSeqID\tAA\tCD1\tCD2\tCD3\tGene\tEnvironment\tFilename" > $file
#---Wrong, forgot to catch $8 != v
#zcat $inputFile | awk -v v=$var '{
# if ($8 == v && ($1 == "V1" || $1 == "V2" || $1 == "V3" || $1 == "V4" || $1 == "V5" || $1 == "V6" || $1 == "V7" || $1 == "V8" || $1 == "V9" || $1 == "V10"))
# print $0
# }' | tee -a $file $output
zcat $inputFile | awk -v v=$var,f1=$file,f2=$output '{
if ($8 == v && ($1 == "V1" || $1 == "V2" || $1 == "V3" || $1 == "V4" || $1 == "V5" || $1 == "V6" || $1 == "V7" || $1 == "V8" || $1 == "V9" || $1 == "V10"))
print $0 >> "file"
else if ($8 != v && ($1 == "V1" || $1 == "V2" || $1 == "V3" || $1 == "V4" || $1 == "V5" || $1 == "V6" || $1 == "V7" || $1 == "V8" || $1 == "V9" || $1 == "V10"))
print $0 >> "f2"
}'
gzip $file
done
gzip $output
I can run through the loop and have two separate awk commands that write to different files. However, it is a very large file (4G compressed) and it is more efficient to use my current approach (or something similar to it). Any guidance on how to reference the 2nd and 3rd variables is greatly appreciated.
Use a separate -v for each variable (a comma does not separate assignments):
awk -v v="$var" -v f1="$file" -v f2="$output" '...'
% awk -v v=qw,f1=we,f2=as 'BEGIN{print v, "*", f1, "*", f2}'
qw,f1=we,f2=as * *
% awk -v v=qw -v f1=we -v f2=as 'BEGIN{print v, "*", f1, "*", f2}'
qw * we * as
%
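Putting that together with the script from the question, a rough sketch of the fixed awk call could look like this. The sample input rows and temporary paths are made up, and the regex /^V([1-9]|10)$/ is my own shorthand for the long chain of $1 == "V1" || ... tests, not something from the original script. Note that f1 and f2 are used without quotes and without $, since they are awk variables.

```shell
dir=$(mktemp -d)
var=A
file="$dir/$var.txt"
output="$dir/output.txt"

# Hypothetical 8-column input; column 1 is the V-number, column 8 the environment.
printf '%s\n' 'V1 s1 aa c1 c2 c3 g A' \
              'V2 s2 aa c1 c2 c3 g B' \
              'X9 s3 aa c1 c2 c3 g A' > "$dir/input.txt"

awk -v v="$var" -v f1="$file" -v f2="$output" '
    $1 ~ /^V([1-9]|10)$/ {
        if ($8 == v) { print >> f1 }   # column 8 matches: per-variable file
        else         { print >> f2 }   # column 8 differs: shared output file
    }' "$dir/input.txt"

matched=$(cat "$file")
others=$(cat "$output")
printf 'f1: %s\nf2: %s\n' "$matched" "$others"
rm -rf "$dir"
```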

changing the appearance of awk output

I used the following code to extract protein residues from text files.
awk '{
if (FNR == 1 ) print ">" FILENAME
if ($5 == 1 && $4 > 30) {
printf $3
}
}
END { printf "\n"}' protein/*.txt > seq.txt
I got the following output when I used the above code.
>1abd
MDEKRRAQHNEVERRRRDKINNWIVQLSKIIPDSSMESTKSGQSKGGILSKASDYIQELRQSNHR>1axc
RQTSMTDFYHSKRRLIFS>1bxc
RQTSMTDFYHSKRRLIFSPRR>1axF
RQTSMTDFYHSKRR>1qqt
ARPYQGVRVKEPVKELLRRKRG
I would like to get the output shown below. How do I change the above code to get it?
>1abd
MDEKRRAQHNEVERRRRDKINNWIVQLSKIIPDSSMESTKSGQSKGGILSKASDYIQELRQSNHR
>1axc
RQTSMTDFYHSKRRLIFS
>1bxc
RQTSMTDFYHSKRRLIFSPRR
>1axF
RQTSMTDFYHSKRR
>1qqt
ARPYQGVRVKEPVKELLRRKRG
This might work for you:
awk '{
if (FNR == 1 ) print newline ">" FILENAME
if ($5 == 1 && $4 > 30) {
newline="\n";
printf "%s", $3  # do not use the data itself as the format string
}
}
END { printf "\n"}' protein/*.txt > seq.txt
With gawk version 4 or later, you can write:
gawk '
BEGINFILE {print ">" FILENAME}
($5 == 1 && $4 > 30) {printf "%s", $3}
ENDFILE {print ""}
' filename ...
http://www.gnu.org/software/gawk/manual/html_node/BEGINFILE_002fENDFILE.html#BEGINFILE_002fENDFILE
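If gawk is not available, roughly the same output can be had in plain awk by ending the previous sequence line whenever a new file starts. This is a sketch under assumptions: the two sample files and their column layout (residue in $3, a number in $4, a flag in $5) are invented to mirror the question.

```shell
dir=$(mktemp -d)
# Hypothetical input files named after the protein IDs.
printf '%s\n' 'x x M 40 1' 'x x D 50 1' 'x x Q 10 1' > "$dir/1abd"
printf '%s\n' 'x x R 35 1' 'x x Q 60 0' 'x x T 31 1' > "$dir/1axc"

result=$(cd "$dir" && awk '
    # New file: close the previous sequence line (except before the first file),
    # then print the header.
    FNR == 1 { if (NR > 1) print ""; print ">" FILENAME }
    $5 == 1 && $4 > 30 { printf "%s", $3 }
    # Close the last sequence line.
    END { print "" }' 1abd 1axc)
echo "$result"
rm -rf "$dir"
```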