I would like to obtain the match the IDs of the first file to the IDs of the second file, so i get, for example, Thijs Al,NED19800616,39. I know this should be possible with AWK, but I'm not really good at it.
file1 (few entries)
NED19800616,Thijs Al
BEL19951212,Nicolas Cleppe
BEL19950419,Ben Boes
FRA19900221,Arnaud Jouffroy
...
file2 (many entries)
38,FRA19920611
39,NED19800616
40,BEL19931210
41,NED19751211
...
Don't use awk, use join. First make sure the input files are sorted:
sort -t, -k1,1 file1 > file1.sorted
sort -t, -k2,2 file2 > file2.sorted
join -t, -1 1 -2 2 file[12].sorted
With awk you can do
$ awk -F, 'NR==FNR{a[$2]=$1;next}{print $2, $1, a[$1] }' OFS=, file2 file1
Thijs Al,NED19800616,39
Nicolas Cleppe,BEL19951212,
Ben Boes,BEL19950419,
Arnaud Jouffroy,FRA19900221,
Related
I have two ',' separated files as follow:
file1:
A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613
file2:
A,inf
B,inf
C,0.313559
D,189.5
E,38.6735
I want to compare 2 files ans get the common rows based on the 1st column. So, for the mentioned files the out put would look like this:
A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735
I am trying to do that in awk and tried this:
awk ' NR == FNR {val[$1]=$2; next} $1 in val {print $1, val[$1], $2}' file1 file2
this code returns this results:
A,inf
B,inf
C,0.135802
D,72.6111
E,42.1613
which is not what I want. do you know how I can improve it?
$ awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=$0;next}$1 in a{print a[$1],$2}' file1 file2
A,inf,inf
B,inf,inf
C,0.135802,0.313559
D,72.6111,189.5
E,42.1613,38.6735
Explained:
$ awk '
BEGIN {FS=OFS="," } # set separators
NR==FNR { # first file
a[$1]=$0 # hash to a, $1 as index
next # next record
}
$1 in a { # second file, if $1 in a
print a[$1],$2 # print indexed record from a with $2
}' file1 file2
Your awk code basically works, you are just missing to tell awk to use , as the field delimiter. You can do it by adding BEGIN{FS=OFS=","} to the beginning of the script.
But having that the files are sorted like in the examples in your question, you can simply use the join command:
join -t, file1 file2
This will join the files based on the first column. -t, tells join that columns are separated by commas.
If the files are not sorted, you can sort them on the fly like this:
join -t, <(sort file1) <(sort file2)
So yeah im trying to match file1 that contains email to file2 that cointains email colons address, how do i go on bout doing that?
tried awk 'FNR==NR{a[$1]=$0; next}{print a[$1] $0}' but idk what im doing wrong
file1:
email#email.email
email#test.test
test#email.email
file2:
email#email.email:addressotest
email#test.club:clubbingson
test#email.email:addresso2
output:
test#email.email:addresso2
email#email.email:addressotest
Following awk may help you in same.
awk 'FNR==NR{a[$0];next} ($1 in a)' FILE_1 FS=":" FILE_2
join with presorting input files
$ join -t: <(sort file1) <(sort file2)
email#email.email:addressotest
test#email.email:addresso2
Hey why going for a awk solution when you can simply use the following join command:
join -t':' file 1 file2
where join as its names indicate is just a file joining command and you chose the field separator and usually the input columns and output to display (here not necessary)
Tested:
$more file{1,2}
::::::::::::::
file1
::::::::::::::
email#email.email
email#test.test
test#email.email
::::::::::::::
file2
::::::::::::::
email#email.email:addressotest
email#test.club:clubbingson
test#email.email:addresso2
$join -t':' file1 file2
email#email.email:addressotest
test#email.email:addresso2
If you need to sort the output as well, change the command into:
join -t':' file 1 file2 | sort -t":" -k1
or
join -t':' file 1 file2 | sort -t":" -k2
depending on which column you want to sort upon (eventually add the -r option to sort in reverse order.
join -t':' file 1 file2 | sort -t":" -k1 -r
or
join -t':' file 1 file2 | sort -t":" -k2 -r
Sort By Third Column And Fourth Column
Input
B,c,3,
G,h,2,
J,k,4,
M,n,,1
Output
M,n,,1
G,h,2,
B,c,3,
J,k,4,
Please help me
UPDATED
awk -F, 'a[$3]<$4{a[$3]=$4;b[$3]=$0}END{for(l in a){print b[l]","l} }' FILE2
i use this command and i obtains
this
M,n,,1,
,2
,3
,4
sort is better choice than awk here
$ sort -t, -k3,3n -k4,4n ip.txt
M,n,,1
G,h,2,
B,c,3,
J,k,4,
-t, use , as delimiter
-k3,3n sort by 3rd column, numerically
-k4,4n then sort by 4th column, numerically
awk to the rescue!
$ awk -F, '{a[$3,$4,NR]=$0}
END {n=asorti(a,ix);
for(k=1;k<=n;k++) print a[ix[k]]}' file
M,n,,1
G,h,2,
B,c,3,
J,k,4,
note that key is constructed in a such a way to handle duplicate rows
if you don't have asorti here's a workaround
$ awk -F, '{a[$0]=$3 FS $4 FS NR RS $0}
END {n=asort(a);
for(k=1;k<=n;k++)
{sub(".*"RS,"",a[k]);
print a[k]}}' file
I used RS as a secondary delimiter to keep the line separate from the sort key. Note duplicate rows will be counted as one (duplicate keys are fine). If you want to support duplicate rows change to a[$0,NR]
I use this command and it works:
nawk -F, '{a[$3]<$4;a[$3]=$4;b[$3]=$0} END{for(i in a){print b[i]}}' FILE2
I am trying to combine matching lines in file.txt $1 and then display the sum of `$2 for those matches. Thank you :).
File.txt
ENSMUSG00000000001:001
ENSMUSG00000000001:002
ENSMUSG00000000001:003
ENSMUSG00000000002:003
ENSMUSG00000000002:003
ENSMUSG00000000003:002
Desired output
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
awk -F':' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file > output.txt
$ awk -F':' -v OFS='\t' '{sum[$1]+=$2} END{for (key in sum) print key, sum[key]}' file
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
{x=$1;a[x]=a[x] + $2} END{for(x in a)print x,a[x]}
Just a typo I guess: instead of adding $0 add $2. That gives me the expected output. And the $1="" is not necessary. To make sure that there isn't anything funny with $2 you may consider 1.0*$2.
All I want is the last two columns printed.
You can make use of variable NF which is set to the total number of fields in the input record:
awk '{print $(NF-1),"\t",$NF}' file
this assumes that you have at least 2 fields.
awk '{print $NF-1, $NF}' inputfile
Note: this works only if at least two columns exist. On records with one column you will get a spurious "-1 column1"
#jim mcnamara: try using parentheses for around NF, i. e. $(NF-1) and $(NF) instead of $NF-1 and $NF (works on Mac OS X 10.6.8 for FreeBSD awkand gawk).
echo '
1 2
2 3
one
one two three
' | gawk '{if (NF >= 2) print $(NF-1), $(NF);}'
# output:
# 1 2
# 2 3
# two three
using gawk exhibits the problem:
gawk '{ print $NF-1, $NF}' filename
1 2
2 3
-1 one
-1 three
# cat filename
1 2
2 3
one
one two three
I just put gawk on Solaris 10 M4000:
So, gawk is the cuplrit on the $NF-1 vs. $(NF-1) issue. Next question what does POSIX say?
per:
http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
There is no direction one way or the other. Not good. gawk implies subtraction, other awks imply field number or subtraction. hmm.
Please try this out to take into account all possible scenarios:
awk '{print $(NF-1)"\t"$NF}' file
or
awk 'BEGIN{OFS="\t"}' file
or
awk '{print $(NF-1), $NF} {print $(NF-1), $NF}' file
try with this
$ cat /tmp/topfs.txt
/dev/sda2 xfs 32G 10G 22G 32% /
awk print last column
$ cat /tmp/topfs.txt | awk '{print $NF}'
awk print before last column
$ cat /tmp/topfs.txt | awk '{print $(NF-1)}'
32%
awk - print last two columns
$ cat /tmp/topfs.txt | awk '{print $(NF-1), $NF}'
32% /