I have two files with the format:
number|string
I want to create a third file containing the lines of file1 whose second column does not appear in the second column of file2.
file1
1|abc
2|bcd
3|cde
file2
1|bcd
2|def
file3
1|abc
3|cde
Is this correct?
awk -F '|' 'NR==FNR{a[$2];next}$2 not in a{print $0}' file1 file2 > file3
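Not quite: awk has no not in operator (negate the membership test instead, as in !($2 in skip)), and the file whose values you want to exclude (file2) has to be read first. You may use this awk: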
awk -F '|' 'FNR == NR {skip[$2]; next} !($2 in skip)' file2 file1 > file3
cat file3
1|abc
3|cde
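A self-contained run of the answer above, useful for checking the two-pass idiom. The sample files are the ones from the question; the scratch directory created with mktemp is an added assumption so the sketch does not overwrite real files. One caveat of the FNR == NR test: if file2 is empty, it stays true while file1 is read, so every file1 line is stored instead of printed and file3 comes out empty.

```shell
#!/bin/sh
# Work in a scratch directory (assumption) so real files are not touched.
cd "$(mktemp -d)" || exit 1

# Sample input from the question.
printf '1|abc\n2|bcd\n3|cde\n' > file1
printf '1|bcd\n2|def\n' > file2

# Pass 1 (FNR == NR holds only while reading the first file, file2):
#   record every second-column value as a key of the array skip.
# Pass 2 (file1): print the line only if its second column is not a key of skip.
awk -F '|' 'FNR == NR { skip[$2]; next } !($2 in skip)' file2 file1 > file3

cat file3
```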
I want to exclude lines containing a specific string. The file 1.txt looks like this:
header
1:test
2:test
3:none
4:test
Why don't these commands work?
awk -F: 'FNR>1 {$0 !~ /none/} {print $1}' 1.txt
awk -F: 'FNR>1 {$2 !~ /none/} {print $1}' 1.txt
But this works:
awk '$0 !~ /none/ {print $0}' 1.txt
I expect to get:
1
2
4
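You need to provide the regex test as the condition (pattern), not inside the action. In FNR>1 {$0 !~ /none/} {print $1}, the match is evaluated and its result thrown away, and {print $1} is a separate rule without a pattern, so it runs for every line, including the header. You may use either of these: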
awk -F: 'FNR>1 && !/none/{print $1}' file
awk -F: 'FNR>1 && $2 !~ /none/{print $1}' file
Details
-F: - sets the field separator to a colon
FNR>1 && !/none/ - true when the record number within the current file is greater than 1 (so the header is skipped) and the line does not match none. With $2 !~ /none/ instead, the test is whether field 2 does not match none.
{print $1} - prints the value of field 1.
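A small sketch that runs both fixed commands on the sample data from the question, plus the original attempt for contrast, so the difference between a condition and an action is visible in the output. The mktemp scratch directory and the out_*.txt file names are illustrative assumptions.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)
printf 'header\n1:test\n2:test\n3:none\n4:test\n' > 1.txt

# The regex test is part of the pattern, so {print $1} runs only when it is true.
awk -F: 'FNR>1 && !/none/{print $1}' 1.txt > out_line.txt
# Same idea, but only field 2 is tested.
awk -F: 'FNR>1 && $2 !~ /none/{print $1}' 1.txt > out_field.txt

# Original attempt: {$0 !~ /none/} is an action whose result is discarded,
# and {print $1} is a pattern-less rule, so it prints field 1 of every line.
awk -F: 'FNR>1 {$0 !~ /none/} {print $1}' 1.txt > out_broken.txt

cat out_line.txt
```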
I am trying to combine the lines in File.txt that share the same $1 and then display the sum of $2 for each group. Thank you :).
File.txt
ENSMUSG00000000001:001
ENSMUSG00000000001:002
ENSMUSG00000000001:003
ENSMUSG00000000002:003
ENSMUSG00000000002:003
ENSMUSG00000000003:002
Desired output
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
My attempt:
awk -F':' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file > output.txt
Accumulate the sum of $2 for each $1 instead:
$ awk -F':' -v OFS='\t' '{sum[$1]+=$2} END{for (key in sum) print key, sum[key]}' file
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
Just a typo, I guess: instead of appending $0 to a[x] (string concatenation), add $2 numerically. That gives me the expected output, and the $1="" is not necessary:
awk -F':' -v OFS='\t' '{x=$1; a[x]=a[x] + $2} END{for(x in a) print x, a[x]}' file > output.txt
To make sure that there isn't anything funny with $2, you may consider 1.0*$2.
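Either summing command can be checked end to end with the sketch below. One detail worth knowing: the order in which for (key in sum) visits keys is unspecified in awk, so the output is piped through sort here to get a stable, key-ordered result. The scratch directory and the use of LC_ALL=C sort are assumptions added for reproducibility.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)
printf '%s\n' ENSMUSG00000000001:001 ENSMUSG00000000001:002 ENSMUSG00000000001:003 \
  ENSMUSG00000000002:003 ENSMUSG00000000002:003 ENSMUSG00000000003:002 > File.txt

# sum[$1] += $2 converts strings such as "001" to numbers before adding.
# for (key in sum) has no defined order, so sort the lines afterwards.
awk -F':' -v OFS='\t' '{sum[$1] += $2} END {for (key in sum) print key, sum[key]}' File.txt \
  | LC_ALL=C sort > output.txt

cat output.txt
```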
Original question
I have 2 files 1.csv and 2.csv
1.csv:-
AK,BA,Alpha,1095
ALL,SA,Alpha,9592
2.csv:-
AK,BA,SPAM,10
I want to merge the files so that the output is as below:
OUTPUT:-
AK,BA,Alpha,1095,SPAM,10
ALL,SA,Alpha,9592,NA,NA
Updated question
I have 2 files alpha1.csv and SPAM1.csv
$ cat alpha1.csv
AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592
B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3
$ cat SPAM1.csv
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111
expected output:
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,NA,NA
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,NA,NA
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,NA,NA
B-MOBILE_BRUNEI,BRUNEI,NA,NA,Alphanumeric_A_MSISDN_blocking,3
My command prints every line of SPAM1.csv (appending NA,NA when there is no match), but it does not print the lines of alpha1.csv that have no match in SPAM1.csv, such as B-MOBILE_BRUNEI:
$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (i=a[$1,$2]?a[$1,$2]:"NA,NA")}' alpha1.csv SPAM1.csv
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,NA,NA
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,NA,NA
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,NA,NA
For the original question, you can use this, for example (f2 is 2.csv and f1 is 1.csv):
$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")}' f2 f1
AK,BA,Alpha,1095,SPAM,10
ALL,SA,Alpha,9592,NA,NA
Explanation
BEGIN{FS=OFS=","} set input and output field separator as comma.
FNR==NR {a[$1,$2]=$3 FS $4; next} store 3rd and 4th values in an array a[], whose index is the tuple ($1,$2).
{print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")} print the line together with the matched item from the array. If there is no such element, then print NA,NA.
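The answer above covers the original question; the updated question also wants the alpha1.csv rows that never match. One possible extension, sketched here and not taken from the answer: remember which keys were matched while reading SPAM1.csv, then print the unmatched alpha1.csv rows in an END block, with NA,NA in place of the SPAM columns. The array names head, order and seen are hypothetical; order exists only because for-in traversal order is unspecified in awk, and it keeps the leftover rows in their alpha1.csv order.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)
cat > alpha1.csv <<'EOF'
AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592
B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3
EOF
cat > SPAM1.csv <<'EOF'
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111
EOF

awk 'BEGIN { FS = OFS = "," }
# Pass 1 (alpha1.csv): columns 3-4 per (col1,col2) key, plus the key order.
FNR == NR { key = $1 SUBSEP $2; a[key] = $3 FS $4; head[key] = $1 FS $2; order[++n] = key; next }
# Pass 2 (SPAM1.csv): append the match or NA,NA and mark matched keys.
{ key = $1 SUBSEP $2
  if (key in a) { print $0, a[key]; seen[key] = 1 } else print $0, "NA,NA" }
# Finally: alpha1.csv rows never matched, with NA,NA for the SPAM columns.
END { for (i = 1; i <= n; i++) { k = order[i]; if (!(k in seen)) print head[k], "NA,NA", a[k] } }' \
  alpha1.csv SPAM1.csv > merged.csv

cat merged.csv
```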
The following awk statement is working as expected.
awk '{print $1, $2, $3}' test.txt
But how do I print all the columns after the second one, something like this?
awk '{print $1, $2, $3 to $NF}' test.txt
I need all columns from the third one to the end of the line. There can be 2 to 10 columns, and all of them are considered part of the last column.
If you just want fields $3 through $NF, the standard way is a loop (for/while), but for your requirement you could use:
awk '{$1=$2="";}sub("^ *","")'
for example:
kent$ seq -s' ' 10|awk '{$1=$2="";}sub("^ *","")'
3 4 5 6 7 8 9 10
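A quick check of the one-liner above on input with uneven spacing (the sample line is hypothetical). Assigning to $1 and $2 makes awk rebuild $0 from all fields joined by OFS, a single space by default, so runs of spaces between the remaining fields collapse to one.

```shell
#!/bin/sh
# Hypothetical sample with several spaces between fields.
line='a  b   c    d'
# $1=$2="" rebuilds $0 as "  c d"; sub() strips the leading spaces and
# returns a true value, so the line is printed.
result=$(printf '%s\n' "$line" | awk '{$1=$2="";}sub("^ *","")')
printf '%s\n' "$result"
```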
If you want to "group" 100 fields into 3 groups (1, 2, and 3-100):
awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
same example:
kent$ seq -s' ' 10|awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
1 2 345678910
Hope this is what you want.
The intuitive way.
awk 'BEGIN{ORS=""} {for(i=3; i<=NF; i++) if(i != NF){print $i " "} else {print $i "\n"}}' test.txt
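A variant of the loop, sketched under the assumption that every input line should produce exactly one output line. With ORS="" as above, a line with fewer than three fields prints nothing at all, not even a newline, so its neighbours' output can run together. Building the string first and printing it once avoids that; the sample input here is hypothetical.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)
printf '1 2 3 4 5\nx y\n7 8 9\n' > test.txt

awk '{
  out = ""; sep = ""
  # Join fields 3..NF, putting OFS only between fields.
  for (i = 3; i <= NF; i++) { out = out sep $i; sep = OFS }
  print out   # one output line per input line, empty when NF < 3
}' test.txt > cols.txt

cat cols.txt
```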
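Some more (the sub() versions use $1 and $2 as regular expressions, so fields containing regex metacharacters may not match literally):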
awk '{$1=$2=x; $0=$0; $1=$1}1' file
awk '{$1=$1; sub($1 FS $2 FS,x)}1' file
To keep the spacing intact:
awk 'sub($1 "[ \t]*" $2 "[ \t]*",x)' file
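A spacing-preserving alternative, sketched here as an option rather than taken from the answers: match() locates the leading "field, blanks, field, blanks" by its shape, so the field values are never used as a regex. With the hypothetical sample below, field 1 is a+b; as a regex that does not match the literal text a+b, so the sub() one-liner prints nothing for this line, while this version prints the rest with its original spacing. Lines with fewer than two fields are skipped, as with the sub() version.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1   # scratch directory (assumption)
# Hypothetical sample: a regex metacharacter in field 1 and uneven spacing.
printf 'a+b  2   keep    this  spacing\n' > file

# RLENGTH is the length of the anchored match; everything after it is kept as-is.
awk 'match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]*/) { print substr($0, RLENGTH + 1) }' file > rest.txt

cat rest.txt
```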