AWK Print fields with new values if match, else print line - awk

I have two files - FileA and FileB. FileA will be changed. FileB contains the new values. FileB has 3 fields. The first two fields will be compared with FileA's first two fields. If the fields match, Field3 should be changed. The code below is working in this manner: "If the two values match, change field3 and print the line. If there is no match, next." The behavior I want is, "If there is no match, print the line unchanged." The "else" part of the code is not working and I've tried so many variations.
awk -F'\t' -v OFS='\t' '
# first, read in data from file B
NR == FNR { values[$1 FS $2] = $3; next }
# then, output modified lines from matching lines in file A
($1 FS $2) in values { $3 = values[$1 FS $2]; print } else { print $0 }
' fileB fileA
FileA
PROVDSRJ02.RD.RI ae0.0 16
PROVDSRJ02.RD.RI ae1.1 1000
PROVDSRJ02.RD.RI ae2.0 5000
PROVDSRJ02.RD.RI ae3.0 5000
ASHBBBRJ01.RD.AS ae39.0 16
ASHBBPRJ01.RD.AS ae2.0 16
ASHBBPRJ02.RD.AS ae1.0 16
ASHBBPRJ02.RD.AS ae2.0 16
ASHBBBRJ01.RD.AS ae0.0 16
ASHBBBRJ01.RD.AS ae11.0 16
FileB
ASHBBBRJ01.RD.AS ae10.0 524
ASHBBBRJ01.RD.AS ae11.0 235
ASHBBBRJ01.RD.AS ae39.0 2096
ASHBBBRJ01.RD.AS ae6.0 183
ASHBBBRJ01.RD.AS ae7.0 1141
ASHBBBRJ02.RD.AS ae11.0 88
ASHBBBRJ02.RD.AS ae13.0 333
ASHBBBRJ02.RD.AS ae20.0 374
ASHBBBRJ02.RD.AS ae9.0 1885
Desired Output (** marks the changed lines and should not appear in the actual output)
PROVDSRJ02.RD.RI ae0.0 16
PROVDSRJ02.RD.RI ae1.1 1000
PROVDSRJ02.RD.RI ae2.0 5000
PROVDSRJ02.RD.RI ae3.0 5000
**ASHBBBRJ01.RD.AS ae39.0 2096**
ASHBBPRJ01.RD.AS ae2.0 16
ASHBBPRJ02.RD.AS ae1.0 16
ASHBBPRJ02.RD.AS ae2.0 16
ASHBBBRJ01.RD.AS ae0.0 16
**ASHBBBRJ01.RD.AS ae11.0 235**

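Your syntax is off: in awk, else can only follow an if statement inside an action block; it cannot follow a pattern { action } rule. Check the awk tag info for some learning resources.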
In any case, you don't need an else as such. You can conditionally set $3 to the new value (as you already are doing), and then always print the line (which may have been modified or not).
Here we use the shortcut 1 to always print the line. 1 is an always-true pattern that invokes the default action, which is to print the current line. If that doesn't make sense now, it will soon.
$ awk 'BEGIN {FS=OFS="\t"}
NR == FNR {values[$1 FS $2] = $3; next}
($1 FS $2) in values {$3 = values[$1 FS $2]}1' fileB fileA
PROVDSRJ02.RD.RI ae0.0 16
PROVDSRJ02.RD.RI ae1.1 1000
PROVDSRJ02.RD.RI ae2.0 5000
PROVDSRJ02.RD.RI ae3.0 5000
ASHBBBRJ01.RD.AS ae39.0 2096
ASHBBPRJ01.RD.AS ae2.0 16
ASHBBPRJ02.RD.AS ae1.0 16
ASHBBPRJ02.RD.AS ae2.0 16
ASHBBBRJ01.RD.AS ae0.0 16
ASHBBBRJ01.RD.AS ae11.0 235
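If an explicit else reads more clearly, it has to live inside a single action block as part of an if statement. A minimal sketch of that variant, using two lines from FileA and one from FileB written to temporary files (the temp directory and the key variable are only for illustration):

```shell
# Build tiny tab-separated copies of the sample files in a temp directory.
tmp=$(mktemp -d)
printf 'PROVDSRJ02.RD.RI\tae0.0\t16\nASHBBBRJ01.RD.AS\tae39.0\t16\n' > "$tmp/fileA"
printf 'ASHBBBRJ01.RD.AS\tae39.0\t2096\n' > "$tmp/fileB"

out=$(awk 'BEGIN {FS=OFS="\t"}
# first file (fileB): remember the new value for each field1/field2 pair
NR == FNR {values[$1 FS $2] = $3; next}
# second file (fileA): if/else is legal here because it is inside an action
{
    key = $1 FS $2
    if (key in values) { $3 = values[key]; print }
    else               { print $0 }
}' "$tmp/fileB" "$tmp/fileA")

printf '%s\n' "$out"
rm -rf "$tmp"
```

This should print the PROVDSRJ02 line unchanged and the ASHBBBRJ01 ae39.0 line with 2096 in the third field, the same result as the shorter form with the trailing 1.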

Print sorted output with awk to avoid pipe sort command

I'm trying to match the lines containing (123), then split field 2 by replacing x and + with spaces, which gives 4 columns, and then swap columns 3 and 4.
Finally, I want to print the result sorted first by column 3 and then by column 4.
I can get this output by piping awk's output to sort, like this:
$ echo "
0: 1920x1663+0+0 kpwr(746)
323: 892x550+71+955 kpwr(746)
211: 891x550+1003+410 kpwr(746)
210: 892x451+71+410 kpwr(746)
415: 891x451+1003+1054 kpwr(746)
1: 894x532+70+330 kpwr(123)
324: 894x532+1001+975 kpwr(123)
2: 894x631+1001+330 kpwr(123)
212: 894x631+70+876 kpwr(123)
61: 892x1+71+375 kpwr(0)
252: 892x1+71+921 kpwr(0)" |
awk '/\(123\)/{b = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2); print b}' |
sort -k3 -k4 -n
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
How can I get the same output using only awk without the need to pipe sort? Thanks for any help.
Here is how you can get it from awk (gnu) itself:
awk '/\(123\)/{
$2 = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2)
split($2, a) # split by space and store into array a
# store array by index 3 and 4
rec[a[3]][a[4]] = (rec[a[3]][a[4]] == "" ? "" : rec[a[3]][a[4]] ORS) $2
}
END {
PROCINFO["sorted_in"]="@ind_num_asc" # sort by numeric key ascending
for (i in rec) # print stored array rec
for (j in rec[i])
print rec[i][j]
}' file
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
If you can use GNU awk:
$ gawk '
BEGIN {
PROCINFO["sorted_in"]="@val_num_asc" # for order strategy
}
/\(123\)$/ { # pick records
split($2,t,/[+x]/) # split 2nd field
if((t[4] in a) && (t[3] in a[t[4]])) { # if index collision
n=split(a[t[4]][t[3]],u,ORS) # split stacked element
u[n+1]=t[1] OFS t[2] OFS t[4] OFS t[3] # add new data
delete a[t[4]][t[3]] # del before rebuilding
for(i in u) # sort on whole record
a[t[4]][t[3]]=(a[t[4]][t[3]]=="" ? "" : a[t[4]][t[3]] ORS) u[i] # restack to element, without a leading ORS
} else
a[t[4]][t[3]]=t[1] OFS t[2] OFS t[4] OFS t[3] # no collision, just add
}
END {
PROCINFO["sorted_in"]="@ind_num_asc" # strategy on output
for(i in a)
for(j in a[i])
print a[i][j]
}' file
Output:
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
With colliding data such as:
1: 894x532+70+330 kpwr(123) # this
1: 123x456+70+330 kpwr(123) # and this, notice order
324: 894x532+1001+975 kpwr(123)
2: 894x631+1001+330 kpwr(123)
212: 894x631+70+876 kpwr(123)
output would be:
123 456 330 70 # ordered by the whole record when collision
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
I had almost finished writing this when I saw that my solution is essentially the same as @anubhava's, so I am posting it with a small tweak. This one takes care of multiple lines that share the same values.
awk '
BEGIN{
PROCINFO["sorted_in"]="@ind_num_asc"
}
/\(123\)/{
$2 = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2)
split($2, a," ")
arr[a[3]][a[4]] = (arr[a[3]][a[4]]!=""?arr[a[3]][a[4]] ORS:"")$2
}
END {
for (i in arr){
for (j in arr[i]){ print arr[i][j] }
}
}' Input_file
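The answers above all depend on GNU awk (gensub, arrays of arrays, PROCINFO["sorted_in"]). Where only a POSIX awk is available, one possible fallback is to collect the rows and sort them by hand in the END block. The sketch below swaps in a plain insertion sort, which is not what the answers use; the array names k1, k2 and row are made up for illustration, and the input is a subset of the sample data:

```shell
out=$(printf '%s\n' \
  '1: 894x532+70+330 kpwr(123)' \
  '324: 894x532+1001+975 kpwr(123)' \
  '2: 894x631+1001+330 kpwr(123)' \
  '212: 894x631+70+876 kpwr(123)' \
  '61: 892x1+71+375 kpwr(0)' |
awk '/\(123\)$/ {
    split($2, t, /[x+]/)                  # t[1] x t[2] + t[3] + t[4]
    n++
    k1[n] = t[4] + 0; k2[n] = t[3] + 0    # numeric sort keys: output col 3, then col 4
    row[n] = t[1] " " t[2] " " t[4] " " t[3]
}
END {
    # insertion sort on the pair (k1, k2), ascending
    for (i = 2; i <= n; i++) {
        a = k1[i]; b = k2[i]; r = row[i]
        for (j = i - 1; j >= 1 && (k1[j] > a || (k1[j] == a && k2[j] > b)); j--) {
            k1[j+1] = k1[j]; k2[j+1] = k2[j]; row[j+1] = row[j]
        }
        k1[j+1] = a; k2[j+1] = b; row[j+1] = r
    }
    for (i = 1; i <= n; i++) print row[i]
}')

printf '%s\n' "$out"
```

Insertion sort is quadratic, so this only makes sense for small inputs; for anything large, piping to sort or using GNU awk as shown above is likely the better choice.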

Merge files where some columns matched

Match columns 1, 2 and 3 between the two files.
Where they match, append the value of column 4 from file1 to the line from file2.
If there is no match, append NA instead.
file1
31431 37150 100 10100
31431 37201 100 12100
31431 37471 100 14100
file2
31431 37150 100 14100
31431 37131 100 14100
31431 37201 100 14100
31431 37478 100 14100
31431 37471 100 14100
Desired output:
31431 37150 100 14100 10100
31431 37131 100 14100 NA
31431 37201 100 14100 12100
31431 37478 100 14100 NA
31431 37471 100 14100 14100
I tried
awk '
FNR==NR{
a[$1 $2 $3]=$4
next
}
($1 in a){
$1=a[$1]
found=1
}
{
$0=found==1?$0",":$0",NA"
sub(/^...../,"&,")
$1=$1
found=""
}
1
' FS=" " file1 FS=" " OFS="," file2
$ awk ' {k=$1 FS $2 FS $3}
NR==FNR {a[k]=$4; next}
{$(NF+1)=k in a?a[k]:"NA"}1' file1 file2
31431 37150 100 14100 10100
31431 37131 100 14100 NA
31431 37201 100 14100 12100
31431 37478 100 14100 NA
31431 37471 100 14100 14100
Could you please try the following?
awk 'FNR==NR{a[$1,$2,$3]=$NF;next} {print $0,($1,$2,$3) in a?a[$1,$2,$3]:"NA"}' Input_file1 Input_file2
Or, creating a variable for the key fields as per Ed's comment:
awk '{var=$1 OFS $2 OFS $3} FNR==NR{a[var]=$NF;next} {print $0,var in a?a[var]:"NA"}' Input_file1 Input_file2
Output will be as follows.
31431 37150 100 14100 10100
31431 37131 100 14100 NA
31431 37201 100 14100 12100
31431 37478 100 14100 NA
31431 37471 100 14100 14100
Explanation of the code above:
awk '
{
var=$1 OFS $2 OFS $3 ##Create a variable named var holding the first, second and third fields of the current line (for both Input_file1 and Input_file2).
}
FNR==NR{ ##FNR==NR is true only while the first file (Input_file1) is being read.
a[var]=$NF ##Store the last field ($NF) of the current line in array a, indexed by var.
next ##next skips the remaining rules for the current line and moves on to the next line.
}
{
print $0,var in a?a[var]:"NA" ##Print the current line followed by a[var] if var is present in array a, otherwise NA.
}' Input_file1 Input_file2 ##Mentioning Input_file names here.
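One detail worth noting about the keys: the attempt in the question indexes the array with $1 $2 $3, which concatenates the fields with nothing in between, so different field combinations can produce the same key. The answers avoid this by joining the fields with FS or OFS, or by using the comma form, which joins them with SUBSEP. A tiny sketch with made-up values (1, 23, 12 and 3 are not from the sample data, and the array names are hypothetical):

```shell
out=$(awk 'BEGIN {
    plain["1" "23"] = "x"       # plain concatenation: the key is "123"
    joined["1", "23"] = "x"     # comma form: the key is "1" SUBSEP "23"
    # look up a different pair of fields that concatenates to the same text
    r1 = (("12" "3") in plain)   ? "collision" : "distinct"
    r2 = (("12", "3") in joined) ? "collision" : "distinct"
    print "plain: " r1
    print "joined: " r2
}')

printf '%s\n' "$out"
```

With the sample files the plain concatenation happens to work because the fields have fixed widths, but a separator costs nothing and removes the ambiguity.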

AWK: Printing a Variable Containing a Percentage Sign %

I'm trying to pass awk a variable whose value contains a percent sign, but I get the error below. It fails even when I only define the variable. If I leave it out, my script works fine, but I want the variable's value printed in the 3rd field.
awk -v postutil="$postutil"
Partial Output of Error:
awk: cmd. line:2: postutil=66%
awk: cmd. line:2: ^ unexpected newline or end of string
awk: cmd. line:2: postutil=68%
awk: cmd. line:2: ^ unexpected newline or end of string
awk: cmd. line:2: postutil=63%
awk: cmd. line:2: ^ unexpected newline or end of string
awk: cmd. line:2: postutil=38%
awk: cmd. line:2: ^ unexpected newline or end of string
awk: cmd. line:2: postutil=30%
awk: cmd. line:2: ^ unexpected newline or end of string
awk: cmd. line:2: postutil=29%
awk: cmd. line:2: ^ unexpected newline or end of string
awk: cmd. line:2: postutil=91%
awk: cmd. line:2: ^ unexpected newline or end of string
awk: cmd. line:2: postutil=0%
awk: cmd. line:2: ^ unexpected newline or end of string
awk: cmd. line:2: postutil=0%
awk: cmd. line:2: ^ unexpected newline or end of string
Script:
while IFS=$'\t' read -r hostname interface preutil postutil criticality; do
awk -v hostname="$hostname" -v interface="$interface" -v postutil="$postutil" '$0~ hostname "\t" interface{print hostname, interface, postutil, $0}' OFS='\t' temp/post_lsp_interfaces_02.txt
done < temp/comparison_interfaces_high_med.txt
Partial of post_lsp_interfaces_02.txt
ASHBBPRJ01-CHNDDSRJ01-BE ASHBBPRJ01 ae2.0 ASHBBBRJ02 ae9.0 MCDLBBRJ01 ae9.0 CHNDBBRJ01 ae0.0 CHNDDSRJ01 3740.81
ASHBBPRJ01-DUKEDSRJ02-BE ASHBBPRJ01 ae1.0 ASHBBBRJ01 ae10.0 DUKEBBRJ02 ae6.0 DUKEDSRJ02 8182.02
ASHBBPRJ01-HMRDRCRJ01-BE ASHBBPRJ01 ae2.0 ASHBBBRJ02 ae11.0 MRFDBBRJ02 ae0.0 MRFDBBRJ01 ae4.0 NRFKBBRJ01 ae0.0 NRFKDSRJ01 ae17.0 HMRDRCRJ01 4444.66
ASHBBPRJ01-HMRDRCRJ02-BE ASHBBPRJ01 ae2.0 ASHBBBRJ02 ae11.0 MRFDBBRJ02 ae0.0 MRFDBBRJ01 ae4.0 NRFKBBRJ01 ae6.0 VBCHBBRJ01 ae0.0 VBCHDSRJ01 ae18.0 HMRDRCRJ02 3125.79
ASHBBPRJ01-MCDLDSRJ01-BE ASHBBPRJ01 ae2.0 ASHBBBRJ02 ae9.0 MCDLBBRJ01 ae0.0 MCDLDSRJ01 3862.34
ASHBBPRJ01-MRFDDSRJ02-10-BE ASHBBPRJ01 ae2.0 ASHBBBRJ02 ae11.0 MRFDBBRJ02 ae1.0 MRFDDSRJ02 2110.26
ASHBBPRJ01-MRFDDSRJ02-11-BE ASHBBPRJ01 ae2.0 ASHBBBRJ02 ae11.0 MRFDBBRJ02 ae1.0 MRFDDSRJ02 2110.26
ASHBBPRJ01-MRFDDSRJ02-12-BE ASHBBPRJ01 ae2.0 ASHBBBRJ02 ae11.0 MRFDBBRJ02 ae1.0 MRFDDSRJ02 2110.26
ASHBBPRJ01-MRFDDSRJ02-13-BE ASHBBPRJ01 ae2.0 ASHBBBRJ02 ae11.0 MRFDBBRJ02 ae1.0 MRFDDSRJ02 2110.26
ASHBBPRJ01-MRFDDSRJ02-14-BE ASHBBPRJ01 ae2.0 ASHBBBRJ02 ae11.0 MRFDBBRJ02 ae1.0 MRFDDSRJ02 2110.26
Partial of comparison_interfaces_high_med.txt
ASHBBBRJ02 ae5.0 9% 31% medium_increase
DALSBBRJ02 ae10.0 34% 0% medium_decrease
DALSBBRJ02 ae4.0 3% 44% medium_increase
DUKEBBRJ01 ae0.0 24% 75% high_increase
DUKEBBRJ01 ae5.0 56% 0% high_decrease
DUKEBBRJ02 ae2.0 5% 57% high_increase
DUKEBBRJ02 ae6.0 15% 73% high_increase
I ended up just using sed to remove the percent sign and then re-added it in the awk statement.
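For what it's worth, a percent sign in a value passed with -v is not special to awk by itself; it only becomes significant if the value is used as a printf format string. A small sketch, assuming the value arrives intact in the shell variable (the ae2.0 interface name is taken from the sample data, and out_print and out_printf are names chosen here):

```shell
postutil='66%'

# print treats the value as ordinary data
out_print=$(awk -v postutil="$postutil" 'BEGIN { OFS = "\t"; print "ae2.0", postutil }')

# with printf, keep the value as an argument, never as the format string
out_printf=$(awk -v postutil="$postutil" 'BEGIN { printf "%s\t%s\n", "ae2.0", postutil }')

printf '%s\n%s\n' "$out_print" "$out_printf"
```

If this works but the loop in the question still fails, the error text (awk reporting postutil=66% as a line of the program) hints that the assignment reached awk as program text rather than as a -v option, for example through a broken line continuation or quoting. That is only a guess, since the failing command is not shown in full.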

awk to match field between two files and use conditions on match

I am trying to look up $2 of file1 (skipping the header) in $2 of file2 and, if they match and the value in $10 is > 30 and $11 is > 49, print the line to an output file. The awk below has syntax errors, although shellcheck didn't report any. Both the input and output are tab-delimited. I think it is close, but I'm not sure what is wrong. Thank you :).
awk
awk -F'\t' -v OFS='\t' 'NR==FNR{A[$2];next}$2 in A
{if($10 >.5 OFS $11 > 49)
print ; next
' file1 file2
awk: cmd. line:2: {if($10 >.5 OFS $11 > 49)
awk: cmd. line:2: ^ syntax error
awk: cmd. line:3: print ; next
awk: cmd. line:3: ^ unexpected newline or end of string
file1
Missing in IDP but found in Reference:
2 166848646 G A exonic SCN1A 68 13 16;20 0;0 17;15 0;0 0;0 0;0 c.[5139C>T]+[=] 52.94
file2
chr2 166245425 SCN2A AMPL5155065355 SNP Het C/T C T 54 100 50 23 27
chr2 166848646 SCN1A AMPL1543060606 SNP Het G/A G A 52.9411764706 100 68 32 36
desired output
2 166848646 G A exonic SCN1A 68 13 16;20 0;0 17;15 0;0 0;0 0;0 c.[5139C>T]+[=] 52.94
edit with new awk
awk -F'\t' -v OFS='\t' 'NR==FNR{A[$2];next}$2 in A {
if($10 >.5 OFS $11 > 49) >>> if($10 >.5 && $11 > 49)
print }
' file1 file2 > out
awk: cmd. line:2: if($10 >.5 OFS $11 > 49) >>> if($10 >.5 && $11 > 49)
awk: cmd. line:2: ^ syntax error
here you go...
$ awk 'BEGIN{FS=OFS="\t"} NR==FNR{a[$2]; next}
($2 in a) && $10>30 && $11>49 ' file1 file2
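Note that the command above prints the matching lines of file2, while the desired output in the question is the line from file1. If the file1 line is what is wanted, one possible variant reads file2 first, remembers the keys that pass the thresholds, and then prints the file1 lines with those keys. This is a sketch with file1 abbreviated to its first six fields and an array name (ok) chosen for illustration:

```shell
tmp=$(mktemp -d)
# file1: header line plus one data row (abbreviated to six fields for the demo)
printf 'Missing in IDP but found in Reference:\n' > "$tmp/file1"
printf '2\t166848646\tG\tA\texonic\tSCN1A\n' >> "$tmp/file1"
# file2: the two sample rows, tab-separated
printf 'chr2\t166245425\tSCN2A\tAMPL5155065355\tSNP\tHet\tC/T\tC\tT\t54\t100\t50\t23\t27\n' > "$tmp/file2"
printf 'chr2\t166848646\tSCN1A\tAMPL1543060606\tSNP\tHet\tG/A\tG\tA\t52.9411764706\t100\t68\t32\t36\n' >> "$tmp/file2"

out=$(awk 'BEGIN {FS=OFS="\t"}
# first file (file2): keep only positions whose $10 and $11 pass the thresholds
NR == FNR { if ($10 > 30 && $11 > 49) ok[$2]; next }
# second file (file1): print rows whose position was kept; the header has no $2
$2 in ok' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The header line of file1 is skipped without special handling because, with a tab as the field separator, it has no second field to match.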

remove lines that do not match specific digits in list file using awk

I am trying to use awk to remove the lines in file that do not match the digits after the NM_ but before the . in $2 of list. Thank you :).
file
204 NM_003852 chr7 + 138145078 138270332 138145293
204 NM_015905 chr7 + 138145078 138270332 138145293
list
TRIM24 NM_015905.2
awk
awk -v OFS="\t" '{ sub(/\r/, "") } ; NR==FNR { N=$2 ; sub(/\..*/, "", $2); A[$2]=N; next } ; $2 in A { $2=A[$2] } 1' list file > out
current output
204 NM_003852 chr7 + 138145078 138270332 138145293
204 NM_015905.2 chr7 + 138145078 138270332 138145293
desired output (line 1 removed as that is the line that does not match)
204 NM_015905.2 chr7 + 138145078 138270332 138145293
awk 'NR==FNR{split($2,f2,".");a[f2[1]];next} $2 in a' list file
$ awk -F'[ .]' 'NR==FNR{a[$2];next}$2 in a' list file
204 NM_015905 chr7 + 138145078 138270332 138145293
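Both commands above print the matching line exactly as it appears in file, so without the .2 suffix shown in the desired output. If the suffix should be kept, a small change to the original attempt may be enough: print inside the matching rule instead of relying on the trailing 1, which printed every line. A sketch with the sample data written to temporary files (the variable name full is chosen here):

```shell
tmp=$(mktemp -d)
printf 'TRIM24 NM_015905.2\n' > "$tmp/list"
printf '204 NM_003852 chr7 + 138145078 138270332 138145293\n' > "$tmp/file"
printf '204 NM_015905 chr7 + 138145078 138270332 138145293\n' >> "$tmp/file"

out=$(awk -v OFS='\t' '
{ sub(/\r$/, "") }                                   # tolerate DOS line endings
# list: map the id without its version to the full id (NM_015905 -> NM_015905.2)
NR == FNR { full = $2; sub(/\..*/, "", $2); A[$2] = full; next }
# file: rewrite and print only the lines whose id is in the list
$2 in A { $2 = A[$2]; print }' "$tmp/list" "$tmp/file")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because $2 is reassigned, the printed line is rebuilt with OFS, so the output here is tab-separated.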