I'm trying to match file1, which contains email addresses, against file2, which contains email:address pairs. How do I go about doing that?
I tried awk 'FNR==NR{a[$1]=$0; next}{print a[$1] $0}' but I don't know what I'm doing wrong.
file1:
email#email.email
email#test.test
test#email.email
file2:
email#email.email:addressotest
email#test.club:clubbingson
test#email.email:addresso2
output:
test#email.email:addresso2
email#email.email:addressotest
The following awk command may help:
awk 'FNR==NR{a[$0];next} ($1 in a)' FILE_1 FS=":" FILE_2
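The attempt in the question never sets a field separator for file2, so (assuming it was run as awk '...' file1 file2, which the question does not show) $1 is the whole email:address line and never matches a key stored from file1. A minimal sketch of the corrected command on the sample data follows; the scratch directory and variable names are illustrative only:

```shell
# Scratch copies of the question's sample data (paths are illustrative).
tmp=$(mktemp -d)
printf '%s\n' 'email#email.email' 'email#test.test' 'test#email.email' > "$tmp/file1"
printf '%s\n' 'email#email.email:addressotest' 'email#test.club:clubbingson' \
              'test#email.email:addresso2' > "$tmp/file2"

# FS=":" on the command line is applied just before file2 is opened, so
# file1 is read with the default separator and file2 is split on colons.
out=$(awk 'FNR==NR{a[$0];next} ($1 in a)' "$tmp/file1" FS=":" "$tmp/file2")
printf '%s\n' "$out"
rm -rf "$tmp"
```

The matching lines come out in file2 order, since file2 is the file being streamed.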
Use join, presorting the input files:
$ join -t: <(sort file1) <(sort file2)
email#email.email:addressotest
test#email.email:addresso2
Why go for an awk solution when you can simply use the following join command:
join -t':' file1 file2
join, as its name indicates, joins two files on a common field. You choose the field separator and, optionally, the join columns and the output columns to display (not necessary here). Both inputs must be sorted on the join field, which is the case for the sample files.
Tested:
$more file{1,2}
::::::::::::::
file1
::::::::::::::
email#email.email
email#test.test
test#email.email
::::::::::::::
file2
::::::::::::::
email#email.email:addressotest
email#test.club:clubbingson
test#email.email:addresso2
$join -t':' file1 file2
email#email.email:addressotest
test#email.email:addresso2
If you need to sort the output as well, change the command to:
join -t':' file1 file2 | sort -t":" -k1
or
join -t':' file1 file2 | sort -t":" -k2
depending on which column you want to sort on. Add the -r option to sort in reverse order:
join -t':' file1 file2 | sort -t":" -k1 -r
or
join -t':' file1 file2 | sort -t":" -k2 -r
Related
I have two files. file1 has 16 columns separated by "," and file2 has about 40 columns separated by tabs.
I want to compare the two files. If columns 1,2,3,4 of file1 are the same as columns 1,2,4,5 of file2, the output file should contain all of the information from file2 plus the corresponding column 16 of file1.
awk 'BEGIN {OFS="/t"} NR==FNR{FS=",";a[$1,$2,$3,$4];b[$1,$2,$3,$4]=$16}{FS="/t";if (($1,$2,$4,$5) in a) print $0,b[$1,$2,$4,$5]}' <(sort -k2 file1) <(sort -k2 file2) >output
This should work:
$ awk 'NR==FNR {a[$1,$2,$3,$4]=$16; next}
($1,$2,$4,$5) in a {print $0,a[$1,$2,$4,$5]}' FS=, file1 FS='\t' file2
no need to presort the files.
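A small sketch of the same multi-column lookup on made-up data, to show how the per-file FS assignments behave. The rows, the scratch directory and the OFS='\t' assignment are assumptions added here (OFS is set so the appended column is tab-separated like the rest of the file2 row):

```shell
tmp=$(mktemp -d)
# file1: 16 comma-separated columns; key is columns 1-4, payload is column 16.
printf '%s\n' 'k1,k2,k3,k4,5,6,7,8,9,10,11,12,13,14,15,P16' > "$tmp/file1"
# file2: tab-separated; key is columns 1,2,4,5. Only the first row matches.
printf 'k1\tk2\tx\tk3\tk4\trest\n'  > "$tmp/file2"
printf 'k1\tk2\tx\tzz\tk4\trest\n' >> "$tmp/file2"

# Each FS=... assignment applies to the file named after it.
out=$(awk 'NR==FNR {a[$1,$2,$3,$4]=$16; next}
           ($1,$2,$4,$5) in a {print $0, a[$1,$2,$4,$5]}' \
      FS=, "$tmp/file1" FS='\t' OFS='\t' "$tmp/file2")
printf '%s\n' "$out"
rm -rf "$tmp"
```

Because the lookup goes through an in-memory array, the order of the rows in either file does not matter.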
I have two ','-separated files as follows:
file1:
A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613
file2:
A,inf
B,inf
C,0.313559
D,189.5
E,38.6735
I want to compare the 2 files and get the common rows based on the 1st column. So, for the files above, the output would look like this:
A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735
I am trying to do that in awk and tried this:
awk ' NR == FNR {val[$1]=$2; next} $1 in val {print $1, val[$1], $2}' file1 file2
this code returns this results:
A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613
which is not what I want. Do you know how I can improve it?
$ awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=$0;next}$1 in a{print a[$1],$2}' file1 file2
A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735
Explained:
$ awk '
BEGIN { FS=OFS="," }    # set separators
NR==FNR {               # first file
    a[$1]=$0            # hash to a, $1 as index
    next                # next record
}
$1 in a {               # second file, if $1 in a
    print a[$1],$2      # print indexed record from a with $2
}' file1 file2
Your awk code basically works; you just need to tell awk to use , as the field delimiter. You can do that by adding BEGIN{FS=OFS=","} to the beginning of the script.
But since the files are sorted, as in the examples in your question, you can simply use the join command:
join -t, file1 file2
This will join the files based on the first column. -t, tells join that columns are separated by commas.
If the files are not sorted, you can sort them on the fly like this:
join -t, <(sort file1) <(sort file2)
I have 2 big files.
file1 has 160 million lines with this format: id:email
file2 has 45 million lines with this format: id:hash
The problem is to find all equal ids and save those to a third file, with the format: email:hash
Tried something like:
awk -F':' 'NR==FNR{a[$1]=$2;next} {print a[$1]":"$2}' test1.in test2.in > res.in
But it's not working :(
Example file1:
9305718:test00#yahoo.com
59287478:login#hotmail.com
file2:
21367509:e90100b1b668142ad33e58c17a614696ec04474c
9305718:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e
Desired result:
test00#yahoo.com:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e
With GNU join and GNU bash:
join -t : -j 1 <(sort -t : -k1,1 file1) <(sort -t : -k1,1 file2) -o 1.2,2.2
Update:
join -t: <(sort file1) <(sort file2) -o 1.2,2.2
In AWK (not considering the amount of resources you have available):
$ awk -F':' 'NR==FNR{a[$1]=$2;next} a[$1] {print a[$1]":"$2}' test1.in test2.in
test00#yahoo.com:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e
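With files this large, which file goes into the array matters. The variant below is a sketch, not a benchmark: it loads the smaller file (file2, the id:hash one) into memory and streams the larger one, and it tests membership with ($1 in h) so that ids missing from the array do not create empty entries as a bare a[$1] test would. Whether 45 million entries fit in memory depends on the machine. The scratch directory and the array name h are illustrative:

```shell
tmp=$(mktemp -d)
printf '%s\n' '9305718:test00#yahoo.com' '59287478:login#hotmail.com' > "$tmp/file1"
printf '%s\n' '21367509:e90100b1b668142ad33e58c17a614696ec04474c' \
              '9305718:d63fff1d21e1a04c066824dd2f83f3aeaa0edf6e' > "$tmp/file2"

# First pass (file2): remember hash by id. Second pass (file1): print
# email:hash only for ids that were seen in file2.
out=$(awk -F: 'NR==FNR {h[$1]=$2; next}
               ($1 in h) {print $2 ":" h[$1]}' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$out"
rm -rf "$tmp"
```

If neither file fits in memory, the sort and join approach above is the safer choice, since sort can spill to disk.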
I would like to match the IDs in the first file to the IDs in the second file, so that I get, for example, Thijs Al,NED19800616,39. I know this should be possible with AWK, but I'm not really good at it.
file1 (few entries)
NED19800616,Thijs Al
BEL19951212,Nicolas Cleppe
BEL19950419,Ben Boes
FRA19900221,Arnaud Jouffroy
...
file2 (many entries)
38,FRA19920611
39,NED19800616
40,BEL19931210
41,NED19751211
...
Don't use awk, use join. First make sure the input files are sorted:
sort -t, -k1,1 file1 > file1.sorted
sort -t, -k2,2 file2 > file2.sorted
join -t, -1 1 -2 2 file[12].sorted
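By default join prints the join field first, so the command above yields NED19800616,Thijs Al,39 rather than the order asked for. A sketch that uses the -o option to pick the output order (0 stands for the join field) is below. The scratch directory is illustrative, and LC_ALL=C is an assumption added so that sort and join agree on collation:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'NED19800616,Thijs Al' 'BEL19951212,Nicolas Cleppe' \
              'BEL19950419,Ben Boes' 'FRA19900221,Arnaud Jouffroy' > "$tmp/file1"
printf '%s\n' '38,FRA19920611' '39,NED19800616' \
              '40,BEL19931210' '41,NED19751211' > "$tmp/file2"

# Sort each file on its own join field.
LC_ALL=C sort -t, -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -t, -k2,2 "$tmp/file2" > "$tmp/file2.sorted"

# -o 1.2,0,2.1: name from file1, then the join field, then the number from file2.
out=$(LC_ALL=C join -t, -1 1 -2 2 -o 1.2,0,2.1 \
      "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$out"
rm -rf "$tmp"
```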
With awk you can do
$ awk -F, 'NR==FNR{a[$2]=$1;next}{print $2, $1, a[$1] }' OFS=, file2 file1
Thijs Al,NED19800616,39
Nicolas Cleppe,BEL19951212,
Ben Boes,BEL19950419,
Arnaud Jouffroy,FRA19900221,
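The command above prints every line of file1, leaving the last field empty when there is no match. If only matched IDs are wanted, a guard on ($1 in a) can be added. A sketch on the sample rows (the scratch directory is illustrative):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'NED19800616,Thijs Al' 'BEL19951212,Nicolas Cleppe' \
              'BEL19950419,Ben Boes' 'FRA19900221,Arnaud Jouffroy' > "$tmp/file1"
printf '%s\n' '38,FRA19920611' '39,NED19800616' \
              '40,BEL19931210' '41,NED19751211' > "$tmp/file2"

# file2 is read first: a[ID] = number. file1 lines are printed only
# when their ID was seen in file2.
out=$(awk -F, 'NR==FNR {a[$2]=$1; next}
               ($1 in a) {print $2, $1, a[$1]}' OFS=, "$tmp/file2" "$tmp/file1")
printf '%s\n' "$out"
rm -rf "$tmp"
```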
I have 2 files:
File 1:
1012055500012221
2011052210011021
3010051501010221
4015051510012201
File 2:
50222111
60202100
75222105
90202125
I want:
1012055500012221
2011052210011021
3010051501010221
4015051510012201
50222111
60202100
75222105
90202125
How can I do that in awk or sed?
Why do you need awk/sed when
cat file2 >> file1
will do just as well? (This appends file2 to the end of file1.)
Or, if you want to leave the original two files alone and produce the joined file as a separate one:
cat file1 file2 > file3
A small (cryptic) awk game :)
$ cat 0
1012055500012221
2011052210011021
3010051501010221
4015051510012201
$ cat 1
50222111
60202100
75222105
90202125
$ awk 42 0 1
1012055500012221
2011052210011021
3010051501010221
4015051510012201
50222111
60202100
75222105
90202125
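Why this works: the whole program is the pattern 42 with no action. A non-zero numeric pattern is always true, and a pattern without an action defaults to printing the record, so every line of every input file is printed. A sketch comparing it with the spelled-out form, on shortened sample data in an illustrative scratch directory:

```shell
tmp=$(mktemp -d)
printf '%s\n' 1012055500012221 2011052210011021 > "$tmp/0"
printf '%s\n' 50222111 60202100 > "$tmp/1"

a=$(awk 42 "$tmp/0" "$tmp/1")           # constant true pattern, default action
b=$(awk '{ print }' "$tmp/0" "$tmp/1")  # explicit action, no pattern
c=$(cat "$tmp/0" "$tmp/1")              # plain concatenation
printf '%s\n' "$a"
rm -rf "$tmp"
```

All three variables hold the same text; awk 1 is the more common spelling of the same trick.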
Since you wanted sed, here it is:
sed '' file{1,2}