I have two files. file1 has 16 columns separated by "," and file2 has about 40 columns separated by tabs.
I want to compare the two files. If columns 1, 2, 3, 4 of file1 are the same as columns 1, 2, 4, 5 of file2, the output file should contain all of the information from file2 plus the corresponding column 16 of file1.
file 1: [image]
file2: [image]
awk 'BEGIN {OFS="/t"} NR==FNR{FS=",";a[$1,$2,$3,$4];b[$1,$2,$3,$4]=$16}{FS="/t";if (($1,$2,$4,$5) in a) print $0,b[$1,$2,$4,$5]}' <(sort -k2 file1) <(sort -k2 file2) >output
This should work:
$ awk 'NR==FNR {a[$1,$2,$3,$4]=$16; next}
($1,$2,$4,$5) in a {print $0,a[$1,$2,$4,$5]}' FS=, file1 FS='\t' file2
There is no need to presort the files.
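To see the lookup in action, here is a minimal sketch on made-up data. The file contents (A, B, X16 and so on) and the temporary directory are hypothetical, since the original samples were posted as images. It also assumes tab-separated output is wanted, so it adds OFS='\t'; without that, the appended column is separated by a space.

```shell
# Hypothetical sample data: one 16-column CSV row and two tab-separated rows.
tmp=$(mktemp -d)
printf 'A,B,C,D,5,6,7,8,9,10,11,12,13,14,15,X16\n' > "$tmp/file1"
printf 'A\tB\tzz\tC\tD\tmore\nQ\tB\tzz\tC\tD\tmore\n' > "$tmp/file2"

# First pass (file1): remember column 16 under the key built from columns 1-4.
# Second pass (file2): print the row plus the stored value when columns 1,2,4,5 match.
result=$(awk 'NR==FNR {a[$1,$2,$3,$4]=$16; next}
($1,$2,$4,$5) in a {print $0, a[$1,$2,$4,$5]}' FS=, "$tmp/file1" FS='\t' OFS='\t' "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the first row of the hypothetical file2 is printed, with X16 appended as an extra tab-separated column; the row starting with Q has no matching key and is dropped.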
The purpose is to check whether the values of columns 2 and 3 in file1, joined together, match column 1 in file2. If they match, replace the values of columns 2 and 3 in file2 with the values of columns 4 and 5 from file1.
file1
100,31431,37131,999991.70,2334362.30
100,31431,37471,111113.20,2334363.30
100,31433,36769,777775.60,2334361.90
102,31433,36853,333322.00,2334362.80
file2
3143137113 318512.50 2334387.50 100
3143137131 318737.50 2334387.50 100
3143137201 319612.50 2334387.50 100
3143137219 319837.50 2334387.50 100
3143137471 322987.50 2334387.50 100
3143137491 323237.50 2334387.50 100
3143336687 313187.50 2334412.50 100
3143336723 313637.50 2334412.50 100
3143336769 314212.50 2334412.50 100
3143336825 314912.50 2334412.50 100
3143336853 315262.50 2334412.50 102
Output desired
31431,37113,318512.50,2334387.50,100
31431,37131,999991.70,2334362.30,100
31431,37201,319612.50,2334387.50,100
31431,37219,319837.50,2334387.50,100
31431,37471,111113.20,2334363.30,100
31431,37491,323237.50,2334387.50,100
31433,36687,313187.50,2334412.50,100
31433,36723,313637.50,2334412.50,100
31433,36769,777775.60,2334361.90,100
31433,36825,314912.50,2334412.50,100
31433,36853,333322.00,2334362.80,102
I tried
awk -F[, ] 'FNR==NR{a[$1 $2]=$0;next}$1 in a{print $0 ,a[$1 $2]}' file1 file2
Thanks in advance
Could you please try the following:
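awk '
BEGIN{
  OFS=","
}
# file1: index by columns 2 and 3 joined together
FNR==NR{
  a[$2 $3]=$2 OFS $3
  b[$2 $3]=$4;c[$2 $3]=$5
  next
}
# file2, key found in file1: take columns 2 and 3 from file1
($1 in a){
  $2=b[$1]
  $3=c[$1];$1=a[$1]
  print
  next
}
# file2, no match: rebuild the record with OFS and split the key after 5 characters
{
  $1=$1
  sub(/^...../,"&,",$1)
  print
}
' FS="," file1 FS=" " file2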
Output will be as follows.
31431,37113,318512.50,2334387.50,100
31431,37131,999991.70,2334362.30,100
31431,37201,319612.50,2334387.50,100
31431,37219,319837.50,2334387.50,100
31431,37471,111113.20,2334363.30,100
31431,37491,323237.50,2334387.50,100
31433,36687,313187.50,2334412.50,100
31433,36723,313637.50,2334412.50,100
31433,36769,777775.60,2334361.90,100
31433,36825,314912.50,2334412.50,100
31433,36853,333322.00,2334362.80,102
Try this:
$ awk -F, 'NR==FNR{tmp=$0;sub($1 FS,"",tmp);a[$2 $3]=tmp;next} $1 in a{print a[$1],$NF;next} {$1=substr($1,1,5) OFS substr($1,6,5);} 1' OFS=, file1 FS=' ' file2
31431,37113,318512.50,2334387.50,100
31431,37131,999991.70,2334362.30,100
31431,37201,319612.50,2334387.50,100
31431,37219,319837.50,2334387.50,100
31431,37471,111113.20,2334363.30,100
31431,37491,323237.50,2334387.50,100
31433,36687,313187.50,2334412.50,100
31433,36723,313637.50,2334412.50,100
31433,36769,777775.60,2334361.90,100
31433,36825,314912.50,2334412.50,100
31433,36853,333322.00,2334362.80,102
The above assumes that $1 of file1 contains no regex metacharacters, because sub() treats its first argument as a regular expression. To be accurate and safe, it is better to use this:
awk -F, 'NR==FNR{$1="";a[$2 $3]=substr($0,2);next} $1 in a{print a[$1],$NF;next} {$1=substr($1,1,5) OFS substr($1,6,5);} 1' OFS=, file1 FS=' ' file2
However, this one assumes that the separator of file1 is a single character.
That leads to another change, which is also an efficiency improvement:
awk -F, 'NR==FNR{a[$2 $3]=substr($0,length($1 FS)+1);next} $1 in a{print a[$1],$NF;next} {$1=substr($1,1,5) OFS substr($1,6,5);} 1' OFS=, file1 FS=' ' file2
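The regex caveat mentioned above can be illustrated with a small sketch. The input line a+b,x,y is a made-up example, not data from the question; it was chosen because + is a regex metacharacter.

```shell
# Hypothetical input whose first field contains a regex metacharacter.
line='a+b,x,y'

# sub() interprets "a+b," as a regex (one or more a, then b, then a comma),
# which does not occur in the line, so nothing is removed.
with_sub=$(printf '%s\n' "$line" | awk -F, '{tmp=$0; sub($1 FS,"",tmp); print tmp}')

# substr() works on character positions, so the first field is removed literally.
with_substr=$(printf '%s\n' "$line" | awk -F, '{print substr($0, length($1 FS)+1)}')

printf 'sub:    %s\nsubstr: %s\n' "$with_sub" "$with_substr"
```

The sub() variant leaves the line unchanged, while the substr() variant yields x,y.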
$ cat file1 #It contains ID:Name
5:John
4:Michel
$ cat file2 #It contains ID
5
4
3
I want to Replace the IDs in file2 with Names from file1, output required
John
Michel
NO MATCH FOUND
I need to extend the code below so that it prints the text NO MATCH FOUND.
awk -F":" 'NR==FNR {a[$1]=$2;next} {print a[$1]}' file1 file2
My current result:
John
Michel
<< empty line
Thanks,
You can use a ternary operator for this: print ($1 in a)?a[$1]:"NO MATCH FOUND". That is, if $1 is an index of the array, print the stored name; otherwise, print the text "NO MATCH FOUND".
All together:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} {print ($1 in a)?a[$1]:"NO MATCH FOUND"}' file1 file2
John
Michel
NO MATCH FOUND
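One reason to test membership with "in" rather than comparing a[$1] with the empty string is that merely referencing a[$1] creates that element. A minimal sketch, using a hypothetical one-entry array built in a BEGIN block:

```shell
# Membership test with "in": the array still has one element afterwards.
count_in=$(awk 'BEGIN { a["5"]="John"; if (!("3" in a)) miss=1
                        n=0; for (k in a) n++; print n }')

# Referencing a["3"] creates an empty element, so the array grows to two.
count_ref=$(awk 'BEGIN { a["5"]="John"; if (a["3"] == "") miss=1
                         n=0; for (k in a) n++; print n }')

printf 'in: %s element(s), reference: %s element(s)\n' "$count_in" "$count_ref"
```

With large lookup files this matters, because every unmatched ID would otherwise add an empty entry to the array.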
You can test whether the index occurs in the array:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} $1 in a {print a[$1]; next} {print "NO MATCH FOUND"}' file1 file2
John
Michel
NO MATCH FOUND
If file2 contains only digits (no trailing space):
awk -F ':' '$1 in A {print A[$1];next}{if($2~/^$/) print "NO MATCH FOUND";else A[$1]=$2}' file1 file2
If not:
awk -F '[:[:blank:]]' '$1 in A {print A[$1];next}{if($2~/^$/) print "NO MATCH FOUND";else A[$1]=$2}' file1 file2
I would like to match the IDs of the first file to the IDs of the second file, so that I get, for example, Thijs Al,NED19800616,39. I know this should be possible with AWK, but I'm not really good at it.
file1 (few entries)
NED19800616,Thijs Al
BEL19951212,Nicolas Cleppe
BEL19950419,Ben Boes
FRA19900221,Arnaud Jouffroy
...
file2 (many entries)
38,FRA19920611
39,NED19800616
40,BEL19931210
41,NED19751211
...
Don't use awk, use join. First make sure the input files are sorted:
sort -t, -k1,1 file1 > file1.sorted
sort -t, -k2,2 file2 > file2.sorted
join -t, -1 1 -2 2 file[12].sorted
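Note that join prints the join field first, so the command above would give NED19800616,Thijs Al,39. A possible variation, sketched here on only the rows shown in the question, uses the -o option to request the column order from the question. The temporary directory is hypothetical, and LC_ALL=C is an assumption made so that sort and join agree on the collation order.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'NED19800616,Thijs Al' 'BEL19951212,Nicolas Cleppe' \
              'BEL19950419,Ben Boes' 'FRA19900221,Arnaud Jouffroy' > "$tmp/file1"
printf '%s\n' '38,FRA19920611' '39,NED19800616' \
              '40,BEL19931210' '41,NED19751211' > "$tmp/file2"

# Sort each file on its ID column.
LC_ALL=C sort -t, -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -t, -k2,2 "$tmp/file2" > "$tmp/file2.sorted"

# -o 1.2,0,2.1 prints: name (file1 field 2), the join field, number (file2 field 1).
joined=$(LC_ALL=C join -t, -1 1 -2 2 -o 1.2,0,2.1 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

On these rows only NED19800616 appears in both files, so a single line is printed.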
With awk you can do
$ awk -F, 'NR==FNR{a[$2]=$1;next}{print $2, $1, a[$1] }' OFS=, file2 file1
Thijs Al,NED19800616,39
Nicolas Cleppe,BEL19951212,
Ben Boes,BEL19950419,
Arnaud Jouffroy,FRA19900221,
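The command above prints every row of file1 and leaves the last column empty when there is no match. If only matched IDs are wanted, one possible variation is to add a membership test. This is a sketch on the rows shown in the question, with a hypothetical temporary directory.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'NED19800616,Thijs Al' 'BEL19951212,Nicolas Cleppe' \
              'BEL19950419,Ben Boes' 'FRA19900221,Arnaud Jouffroy' > "$tmp/file1"
printf '%s\n' '38,FRA19920611' '39,NED19800616' \
              '40,BEL19931210' '41,NED19751211' > "$tmp/file2"

# First pass (file2): store the number under its ID.
# Second pass (file1): print name, ID, number only when the ID was seen in file2.
matched=$(awk -F, 'NR==FNR {a[$2]=$1; next} $1 in a {print $2, $1, a[$1]}' \
              OFS=, "$tmp/file2" "$tmp/file1")

printf '%s\n' "$matched"
rm -rf "$tmp"
```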