Filter column matches from two files - awk

I have two files with the format:
number|string
I want to create a third file containing the lines of file1 whose second column does not appear in the second column of file2.
file1
1|abc
2|bcd
3|cde
file2
1|bcd
2|def
file3
1|abc
3|cde
Is this correct?
awk -F '|' 'NR==FNR{a[$2];next}$2 not in a{print $0}' file1 file2 > file3

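You may use this awk. It reads file2 first to collect the second-column values to skip, then prints only the lines of file1 whose second column is not among them: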
awk -F '|' 'FNR == NR {skip[$2]; next} !($2 in skip)' file2 file1 > file3
cat file3
1|abc
3|cde
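To see why the argument order matters, here is a minimal self-contained sketch that runs the same command on the sample data from the question. The temporary directory and the variable names tmp and result are assumptions made for illustration only.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
tmp=$(mktemp -d)
printf '1|abc\n2|bcd\n3|cde\n' > "$tmp/file1"
printf '1|bcd\n2|def\n' > "$tmp/file2"

# file2 is listed first, so FNR == NR holds only while reading it:
# remember every second-column value seen there, then skip to the next line.
# For file1, print only the lines whose second column was not remembered.
awk -F '|' 'FNR == NR { skip[$2]; next } !($2 in skip)' \
    "$tmp/file2" "$tmp/file1" > "$tmp/file3"

result=$(cat "$tmp/file3")
printf '%s\n' "$result"   # prints 1|abc and 3|cde
rm -rf "$tmp"
```

One caveat with this idiom: if file2 is empty, FNR == NR also holds while reading file1, so nothing is printed.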

Related

Compare two files (with different separators) on 4 columns and return the desired output

I have two files. file1 has 16 columns separated by "," and file2 has about 40 columns separated by tabs.
I want to compare the two files. If columns 1,2,3,4 of file1 are the same as columns 1,2,4,5 of file2, the output file should contain the whole line of file2 plus the corresponding column 16 of file1.
file1 and file2: [sample data was posted as images; not available]
awk 'BEGIN {OFS="/t"} NR==FNR{FS=",";a[$1,$2,$3,$4];b[$1,$2,$3,$4]=$16}{FS="/t";if (($1,$2,$4,$5) in a) print $0,b[$1,$2,$4,$5]}' <(sort -k2 file1) <(sort -k2 file2) >output

This should work:
$ awk 'NR==FNR {a[$1,$2,$3,$4]=$16; next}
($1,$2,$4,$5) in a {print $0,a[$1,$2,$4,$5]}' FS=, file1 FS='\t' file2
There is no need to presort the files, because the lookup goes through an array keyed on the four columns.
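Since the sample files were only posted as images, here is a minimal sketch with made-up data. The values k1..k4, x3, x6 and VALUE16 are hypothetical, file2 has only 6 columns instead of about 40, and OFS='\t' is my own addition so that the appended column is tab-separated like the rest of the line.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
tmp=$(mktemp -d)

# Hypothetical file1: 16 comma-separated columns, keys in columns 1-4.
printf 'k1,k2,k3,k4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,VALUE16\n' > "$tmp/file1"

# Hypothetical file2: tab-separated, keys in columns 1,2,4,5.
# The first line matches file1, the second does not (column 4 differs).
printf 'k1\tk2\tx3\tk3\tk4\tx6\n' >  "$tmp/file2"
printf 'k1\tk2\tx3\tzz\tk4\tx6\n' >> "$tmp/file2"

# FS is reassigned on the command line between the two file names,
# so each file is split with its own separator.
result=$(awk 'NR==FNR { a[$1,$2,$3,$4] = $16; next }
              ($1,$2,$4,$5) in a { print $0, a[$1,$2,$4,$5] }' \
         FS=, "$tmp/file1" FS='\t' OFS='\t' "$tmp/file2")

printf '%s\n' "$result"   # the matching file2 line followed by VALUE16
rm -rf "$tmp"
```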

Match values in two files and replace values in specific columns

The purpose is to check whether columns 2 and 3 of file1, concatenated, match column 1 of file2. When they match, replace columns 2 and 3 of that file2 line with columns 4 and 5 of file1.
file1
100,31431,37131,999991.70,2334362.30
100,31431,37471,111113.20,2334363.30
100,31433,36769,777775.60,2334361.90
102,31433,36853,333322.00,2334362.80
file2
3143137113 318512.50 2334387.50 100
3143137131 318737.50 2334387.50 100
3143137201 319612.50 2334387.50 100
3143137219 319837.50 2334387.50 100
3143137471 322987.50 2334387.50 100
3143137491 323237.50 2334387.50 100
3143336687 313187.50 2334412.50 100
3143336723 313637.50 2334412.50 100
3143336769 314212.50 2334412.50 100
3143336825 314912.50 2334412.50 100
3143336853 315262.50 2334412.50 102
Output desired
31431,37113,318512.50,2334387.50,100
31431,37131,999991.70,2334362.30,100
31431,37201,319612.50,2334387.50,100
31431,37219,319837.50,2334387.50,100
31431,37471,111113.20,2334363.30,100
31431,37491,323237.50,2334387.50,100
31433,36687,313187.50,2334412.50,100
31433,36723,313637.50,2334412.50,100
31433,36769,777775.60,2334361.90,100
31433,36825,314912.50,2334412.50,100
31433,36853,333322.00,2334362.80,102
I tried
awk -F[, ] 'FNR==NR{a[$1 $2]=$0;next}$1 in a{print $0 ,a[$1 $2]}' file1 file2
Thanks in advance

Could you please try the following:
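awk '
BEGIN{
  OFS=","                    # join output fields with commas
}
FNR==NR{                     # first file (file1): build the lookup tables
  a[$2 $3]=$2 OFS $3         # key: columns 2 and 3 concatenated
  b[$2 $3]=$4;c[$2 $3]=$5    # replacement values for columns 2 and 3
  next
}
($1 in a){                   # file2 line whose key is present in file1
  $2=b[$1]
  $3=c[$1];$1=a[$1]
  print
  next
}
{                            # file2 line without a match
  $1=$1                      # rebuild the record so OFS takes effect
  sub(/^...../,"&,",$1)      # insert a comma after the first 5 characters
  print
}
' FS="," file1 FS=" " file2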
Output will be as follows.
31431,37113,318512.50,2334387.50,100
31431,37131,999991.70,2334362.30,100
31431,37201,319612.50,2334387.50,100
31431,37219,319837.50,2334387.50,100
31431,37471,111113.20,2334363.30,100
31431,37491,323237.50,2334387.50,100
31433,36687,313187.50,2334412.50,100
31433,36723,313637.50,2334412.50,100
31433,36769,777775.60,2334361.90,100
31433,36825,314912.50,2334412.50,100
31433,36853,333322.00,2334362.80,102

Try this:
$ awk -F, 'NR==FNR{tmp=$0;sub($1 FS,"",tmp);a[$2 $3]=tmp;next} $1 in a{print a[$1],$NF;next} {$1=substr($1,1,5) OFS substr($1,6,5);} 1' OFS=, file1 FS=' ' file2
31431,37113,318512.50,2334387.50,100
31431,37131,999991.70,2334362.30,100
31431,37201,319612.50,2334387.50,100
31431,37219,319837.50,2334387.50,100
31431,37471,111113.20,2334363.30,100
31431,37491,323237.50,2334387.50,100
31433,36687,313187.50,2334412.50,100
31433,36723,313637.50,2334412.50,100
31433,36769,777775.60,2334361.90,100
31433,36825,314912.50,2334412.50,100
31433,36853,333322.00,2334362.80,102
The above assumes $1 of file1 contains no regex metacharacters, because sub() treats its first argument as a regular expression. To be accurate and safe, better use this:
awk -F, 'NR==FNR{$1="";a[$2 $3]=substr($0,2);next} $1 in a{print a[$1],$NF;next} {$1=substr($1,1,5) OFS substr($1,6,5);} 1' OFS=, file1 FS=' ' file2
However, this one assumes the separator of file1 is a single character.
Dropping that assumption leads to another change, which is also more efficient:
awk -F, 'NR==FNR{a[$2 $3]=substr($0,length($1 FS)+1);next} $1 in a{print a[$1],$NF;next} {$1=substr($1,1,5) OFS substr($1,6,5);} 1' OFS=, file1 FS=' ' file2
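To illustrate the regex concern, here is a small sketch comparing the sub() and substr() approaches on a single line. The first field a+b is a hypothetical value chosen because + is a regex metacharacter; it does not come from the question's data.

```shell
# Hypothetical input line whose first field contains a regex metacharacter.
line='a+b,31431,37131,999991.70,2334362.30'

# sub() interprets "a+b," as a regex (one or more "a", then "b,"),
# which does not match the literal text, so nothing is removed.
unsafe=$(printf '%s\n' "$line" | awk -F, '{ tmp = $0; sub($1 FS, "", tmp); print tmp }')

# substr() only counts characters, so the first field and its separator
# are dropped regardless of what they contain.
safe=$(printf '%s\n' "$line" | awk -F, '{ print substr($0, length($1 FS) + 1) }')

printf 'sub():    %s\n' "$unsafe"   # the line is unchanged
printf 'substr(): %s\n' "$safe"     # 31431,37131,999991.70,2334362.30
```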

Replace ids from file1 with that of file2

I have two text files and I want to replace the IDs in file1 with those from file2. The IDs are in the same order in both files.
File1
>12_abc
ghfghfjgfhjgfjf
hgfjfgjgfjfgjgfjf
>13_def
ghfghgfgfgfghfjhf
nmbnmbhjgkjgjhggh
>14_ghi
uytghhuytuytuytuyt
ytrftyfrghfhgfgfgg
File2
>12_abc|10
>13_def|20
>14_ghi|30
Desired Output
>12_abc|10
ghfghfjgfhjgfjf
hgfjfgjgfjfgjgfjf
>13_def|20
ghfghgfgfgfghfjhf
nmbnmbhjgkjgjhggh
>14_ghi|30
uytghhuytuytuytuyt
ytrftyfrghfhgfgfgg
awk '{print} !(NR%2) {if ((getline < "File2.txt") > -1) print}' File1

This looks an awful lot like a FASTA file. This is how I would do it:
If you want to replace the name in order:
awk '(NR==FNR){a[FNR]=$0;next}/^>/{print a[++c]; next}1' File2 File1 > File1.new
If you want to replace the name based on the content:
awk -F '|' '(NR==FNR){a[$1]=$0;next}/^>/{print a[$0]; next}1' File2 File1 > File1.new
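Note that the content-based version prints an empty line for any header that has no entry in File2. A possible variant, sketched here under the assumption that such headers should be kept unchanged, adds an "in" test before the lookup. The header >99_zzz and the shortened sequence lines are made up for the demonstration.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
tmp=$(mktemp -d)

# Hypothetical File1: one header known to File2, one unknown.
printf '>12_abc\nghfghfjgfhjgfjf\n>99_zzz\nuytghhuytuytuytuyt\n' > "$tmp/File1"
printf '>12_abc|10\n>13_def|20\n' > "$tmp/File2"

# First file (File2): map the part before "|" to the full header.
# Second file (File1): replace a header only if it is in the map;
# every other line, including unknown headers, is printed as is.
result=$(awk -F '|' 'NR==FNR { a[$1] = $0; next }
                     /^>/ && ($0 in a) { print a[$0]; next }
                     1' "$tmp/File2" "$tmp/File1")

printf '%s\n' "$result"
rm -rf "$tmp"
```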

Print default value if index is not in awk array

$ cat file1 #It contains ID:Name
5:John
4:Michel
$ cat file2 #It contains ID
5
4
3
I want to replace the IDs in file2 with the names from file1. Required output:
John
Michel
NO MATCH FOUND
I need to extend the code below so that it prints the text NO MATCH FOUND when an ID is missing.
awk -F":" 'NR==FNR {a[$1]=$2;next} {print a[$1]}' file1 file2
My current result:
John
Michel
<< empty line
Thanks,

You can use a ternary operator for this: print ($1 in a)?a[$1]:"NO MATCH FOUND". That is, if $1 is an index of the array, print the value stored under it; otherwise, print the text "NO MATCH FOUND".
All together:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} {print ($1 in a)?a[$1]:"NO MATCH FOUND"}' file1 file2
John
Michel
NO MATCH FOUND
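As a small variation, the default text can be passed in with -v so it is not hard-coded in the program. This is only a sketch: the variable name fallback is my own choice (default is avoided because gawk reserves it as a keyword), and the whole conditional is wrapped in parentheses purely to make the intent explicit.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
tmp=$(mktemp -d)
printf '5:John\n4:Michel\n' > "$tmp/file1"
printf '5\n4\n3\n' > "$tmp/file2"

# First file: map ID to name. Second file: print the name if the ID
# is an index of the array, otherwise print the fallback text.
result=$(awk -F: -v fallback='NO MATCH FOUND' '
    NR==FNR { a[$1] = $2; next }
    { print (($1 in a) ? a[$1] : fallback) }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"   # John, Michel, NO MATCH FOUND (one per line)
rm -rf "$tmp"
```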

You can test whether the index occurs in the array:
$ awk -F":" 'NR==FNR {a[$1]=$2;next} $1 in a {print a[$1]; next} {print "NOT FOUND"}' file1 file2
John
Michel
NOT FOUND

If file2 contains only digits (no trailing space at the end of the line):
awk -F ':' '$1 in A {print A[$1];next}{if($2~/^$/) print "NOT FOUND";else A[$1]=$2}' file1 file2
If not:
awk -F '[:[:blank:]]' '$1 in A {print A[$1];next}{if($2~/^$/) print "NOT FOUND";else A[$1]=$2}' file1 file2

Awk merging of two files on id

I would like to match the IDs of the first file against the IDs of the second file, so that I get, for example, Thijs Al,NED19800616,39. I know this should be possible with awk, but I'm not really good at it.
file1 (few entries)
NED19800616,Thijs Al
BEL19951212,Nicolas Cleppe
BEL19950419,Ben Boes
FRA19900221,Arnaud Jouffroy
...
file2 (many entries)
38,FRA19920611
39,NED19800616
40,BEL19931210
41,NED19751211
...

Don't use awk, use join. First make sure the input files are sorted:
sort -t, -k1,1 file1 > file1.sorted
sort -t, -k2,2 file2 > file2.sorted
join -t, -1 1 -2 2 file[12].sorted
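By default join prints the join field first, so the line would come out as NED19800616,Thijs Al,39. A sketch that uses the -o option to get the column order asked for in the question follows. The sample rows are taken from the question; forcing LC_ALL=C so that sort and join agree on the ordering is my own precaution.

```shell
# Work in a throwaway directory (assumption: mktemp is available).
tmp=$(mktemp -d)
export LC_ALL=C   # same collation for sort and join

printf 'NED19800616,Thijs Al\nBEL19951212,Nicolas Cleppe\nBEL19950419,Ben Boes\nFRA19900221,Arnaud Jouffroy\n' > "$tmp/file1"
printf '38,FRA19920611\n39,NED19800616\n40,BEL19931210\n41,NED19751211\n' > "$tmp/file2"

# Each input must be sorted on its own join field.
sort -t, -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
sort -t, -k2,2 "$tmp/file2" > "$tmp/file2.sorted"

# -o selects the output columns: 1.2 = name from file1,
# 0 = the join field (the ID), 2.1 = number from file2.
result=$(join -t, -1 1 -2 2 -o 1.2,0,2.1 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$result"   # Thijs Al,NED19800616,39
rm -rf "$tmp"
```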

With awk you can do
$ awk -F, 'NR==FNR{a[$2]=$1;next}{print $2, $1, a[$1] }' OFS=, file2 file1
Thijs Al,NED19800616,39
Nicolas Cleppe,BEL19951212,
Ben Boes,BEL19950419,
Arnaud Jouffroy,FRA19900221,
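The command above also prints the file1 entries that have no counterpart in file2, with an empty last field. If only the matched entries are wanted (an assumption about the goal, since the question shows just one example line), a sketch with an "in" test looks like this:

```shell
# Work in a throwaway directory (assumption: mktemp is available).
tmp=$(mktemp -d)
printf 'NED19800616,Thijs Al\nBEL19951212,Nicolas Cleppe\nBEL19950419,Ben Boes\nFRA19900221,Arnaud Jouffroy\n' > "$tmp/file1"
printf '38,FRA19920611\n39,NED19800616\n40,BEL19931210\n41,NED19751211\n' > "$tmp/file2"

# First file (file2): map ID to number.
# Second file (file1): print only the lines whose ID is in the map.
result=$(awk -F, 'NR==FNR { num[$2] = $1; next }
                  $1 in num { print $2, $1, num[$1] }' \
         OFS=, "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"   # Thijs Al,NED19800616,39
rm -rf "$tmp"
```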
