Print default value if index is not in awk array - awk

$ cat file1 #It contains ID:Name
5:John
4:Michel
$ cat file2 #It contains ID
5
4
3
I want to Replace the IDs in file2 with Names from file1, output required
John
Michel
NO MATCH FOUND
I need to expand the below code to reult NO MATCH FOUND text.
awk -F":" 'NR==FNR {a[$1]=$2;next} {print a[$1]}' file1 file2
My current result:
John
Michel
<< empty line
Thanks,

You can use a ternary operator for this: print ($1 in a)?a[$1]:"NO MATCH FOUND". That is, if $1 is in the array, print it; otherwise, print the text "NO MATCH FOUND".
All together:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} {print ($1 in a)?a[$1]:"NO MATCH FOUND"}' f1 f2
John
Michel
NO MATCH FOUND

You can test whether the index occurs in the array:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} $1 in a {print a[$1]; next} {print "NOT FOUND"}' file1 file2
John
Michel
NOT FOUND

if file2 has only digit (no space at the end)
awk -F ':' '$1 in A {print A[$1];next}{if($2~/^$/) print "NOT FOUND";else A[$1]=$2}' file1
if not
awk -F '[:[:blank:]]' '$1 in A {print A[$1];next}{if($2~/^$/) print "NOT FOUND";else A[$1]=$2}' file1 file2

Related

Filter column matches from two files

I have two files with the format:
number|string
I want to create a third file with the contents of file 1 and without the matches between both files on the second column.
file1
1|abc
2|bcd
3|cde
file2
1|bcd
2|def
file3
1|abc
3|cde
Is this correct?
awk -F '|' 'NR==FNR{a[$2];next}$2 not in a{print $0}' file1 file2 > file3
You may use this awk:
awk -F '|' 'FNR == NR {skip[$2]; next} !($2 in skip)' file2 file1 > file3
cat file3
1|abc
3|cde

How to exclude lines matching a regex pattern in a column by awk?

I want to exclude lines containing a specific string.
header
1:test
2:test
3:none
4:test
Why don't these commands work?
awk -F: 'FNR>1 {$0 !~ /none/} {print $1}' 1.txt
awk -F: 'FNR>1 {$2 !~ /none/} {print $1}' 1.txt
but this works:
awk '$0 !~ /none/ {print $0}' 1.txt
I intend to get
1
2
4
You need to provide the regex test as condition, not as action, and may use
awk -F: 'FNR>1 && !/none/{print $1}' file
awk -F: 'FNR>1 && $2 !~ /none/{print $1}' file
See an awk online demo
Details
-F: - sets the field separator to a colon
FNR>1 && !/none/ - if number of processed records for current file is more than 1 and there is no none on the line (if $2 !~ /none/ is used, returns true if Field 2 does not contain none pattern)
{print $1} - print Field 1 value.

using awk to match and sum a file of multiple lines

I am trying to combine matching lines in file.txt $1 and then display the sum of `$2 for those matches. Thank you :).
File.txt
ENSMUSG00000000001:001
ENSMUSG00000000001:002
ENSMUSG00000000001:003
ENSMUSG00000000002:003
ENSMUSG00000000002:003
ENSMUSG00000000003:002
Desired output
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
awk -F':' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file > output.txt
$ awk -F':' -v OFS='\t' '{sum[$1]+=$2} END{for (key in sum) print key, sum[key]}' file
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
{x=$1;a[x]=a[x] + $2} END{for(x in a)print x,a[x]}
Just a typo I guess: instead of adding $0 add $2. That gives me the expected output. And the $1="" is not necessary. To make sure that there isn't anything funny with $2 you may consider 1.0*$2.

awk compare two files and merge output

Original question
I have 2 files 1.csv and 2.csv
1.csv:-
AK,BA,Alpha,1095
ALL,SA,Alpha,9592
2.csv:-
AK,BA,SPAM,10
I want to merge files so that it will print output file as below
OUTPUT:-
AK,BA,Alpha,1095,SPAM,10
AL,SA,Alpha,9592,NA,NA
Updated question
I have 2 files alpha1.csv and SPAM1.csv
$ cat alpha1.csv
AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592
B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3
$ cat SPAM1.csv
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111
expected output:
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,**NA,NA**
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,**NA,NA**
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,**NA,NA**
B-MOBILE_BRUNEI,BRUNEI,**NA,NA**,Alphanumeric_A_MSISDN_blocking,3
My command is only printing matched cases of file two with file 1 and not printing non matched cases:
$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (i=a[$1,$2]?a[$1,$2]:"NA,NA")}' alpha1.csv SPAM1.csv
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,NA,NA
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,NA,NA
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,NA,NA
You can use this, for example:
$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")}' f2 f1
AK,BA,Alpha,1095,SPAM,10
ALL,SA,Alpha,9592,NA,NA
Explanation
BEGIN{FS=OFS=","} set input and output field separator as comma.
FNR==NR {a[$1,$2]=$3 FS $4; next} store 3rd and 4th values in an array a[], whose index is the tuple ($1,$2).
{print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")} print the line together with the matched item from the array. If there is no such element, then print NA,NA.

group of columns in awk

The following awk statement is working as expected.
awk '{print $1, $2, $3}' test.txt
But how do I say that I need all the columns after the second column?
awk '{print $1, $2, $3 to $NF}' test.txt
I need all columns from third column till end of that line. There can be 2 to 10 columns and all are considered as a part of the last column.
if you just want $3-$NF fields, standard way would be loop (for/while)
but for your requirement, you could:
awk '{$1=$2="";}sub("^ *","")'
for example:
kent$ seq -s' ' 10|awk '{$1=$2="";}sub("^ *","")'
3 4 5 6 7 8 9 10
if you want to "group" 100 fields into 3 groups: 1,2, 3-100:
awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
same example:
kent$ seq -s' ' 10|awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
1 2 345678910
hope it is what you want.
The intuitive way.
awk 'BEGIN{ORS=""} {for(i=3; i<=NF; i++) if(i != NF){print $i " "} else {print $i "\n"}}' test.txt
Some more:
awk '{$1=$2=x; $0=$0; $1=$1}1' file
awk '{$1=$1; sub($1 FS $2 FS,x)}1' file
To keep spacing in tact:
awk 'sub($1 "[ \t]*" $2 "[ \t]*",x)' file