I have two different files with two columns each.
file1.txt
DevId Group
aaa A
bbb B
file2.txt
Group RefId
A 111-222-333
B 444-555-666
I need only need DevId and its corresponding RefId.
Required Output
DevId RefId
aaa 111-222-333
bbb 444-555-666
I tried using this syntax but I can't get it correctly.
awk -F, -v OFS=, 'NR==FNR{a[$1]=$2;next}{print a[$2],$1}' file2.txt file1.txt
I hope someone could help me.
Here:
awk -v RS="\r\n" 'FNR==NR{a[$1]=$2;next}{ print $1, a[$2]}' file2.txt file1.txt
This was modified from Awk multiple files which I suggest you read for the explanation.
Edit: As mentioned by #JamesBrown, added -v RS="\r\n" for line endings
Related
I have a big text file with 2 tab separated fields. as you see in the small example every 2 lines have a number in common. I want to summarize my text file in this way.
1- look for the lines that have the number in common and sum up the second column of those lines.
small example:
ENST00000054666.6 2
ENST00000054666.6_2 15
ENST00000054668.5 4
ENST00000054668.5_2 10
ENST00000054950.3 0
ENST00000054950.3_2 4
expected output:
ENST00000054666.6 17
ENST00000054668.5 14
ENST00000054950.3 4
as you see the difference is in both columns. in the 1st column there is only one repeat of each common and without "_2" and in the 2nd column the values is sum up of both lines (which have common number in input file).
I tried this code but does not return what I want:
awk -F '\t' '{ col2 = $2, $2=col2; print }' OFS='\t' input.txt > output.txt
do you know how to fix it?
Solution 1st: Following awk may help you on same.
awk '{sub(/_.*/,"",$1)} {a[$1]+=$NF} END{for(i in a){print i,a[i]}}' Input_file
Solution 2nd: In case your Input_file is sorted by 1st field then following may help you.
awk '{sub(/_.*/,"",$1)} prev!=$1 && prev{print prev,val;val=""} {val+=$NF;prev=$1} END{if(val){print prev,val}}' Input_file
Use > output.txt at the end of the above codes in case you need the output in a output file too.
If order is not a concern, below may also help :
awk -v FS="\t|_" '{count[$1]+=$NF}
END{for(i in count){printf "%s\t%s%s",i,count[i],ORS;}}' file
ENST00000054668.5 14
ENST00000054950.3 4
ENST00000054666.6 17
Edit :
If the order of the output does matter, below approach using a flag helps :
$ awk -v FS="\t|_" '{count[$1]+=$NF;++i;
if(i==2){printf "%s\t%s%s",$1,count[$1],ORS;i=0}}' file
ENST00000054666.6 17
ENST00000054668.5 14
ENST00000054950.3 4
I have two files. First file is masterlist of IDS. Second file is normal input file.
I'm trying to print only the records of input where it's id (column 3) is NOT in masterlist (column 1).
sample:
masterlist.txt
111
222
333
input.txt
col1,col2,col3,col4,col5,col6
abc,abc,111,xyz,xyz,xyz
abc,abc,222,xyz,xyz,xyz
abc,abc,333,xyz,xyz,xyz
abc,abc,444,xyz,xyz,xyz
desired output:
col3,col4,col5,col6
abc,abc,444,xyz,xyz,xyz
I have come up with this code so far but I'm not getting the correct output.
awk -F\| '!b{a[$0]; next}$3 in a {true; next} {print $3","$4","$11","$12}' masterlist.txt b=1 input.txt
Could you please try following awk and let us know if this helps you.
awk 'FNR==NR{a[$1];next} !($3 in a)' masterlist.txt FS="," input.txt
So yeah im trying to match file1 that contains email to file2 that cointains email colons address, how do i go on bout doing that?
tried awk 'FNR==NR{a[$1]=$0; next}{print a[$1] $0}' but idk what im doing wrong
file1:
email#email.email
email#test.test
test#email.email
file2:
email#email.email:addressotest
email#test.club:clubbingson
test#email.email:addresso2
output:
test#email.email:addresso2
email#email.email:addressotest
Following awk may help you in same.
awk 'FNR==NR{a[$0];next} ($1 in a)' FILE_1 FS=":" FILE_2
join with presorting input files
$ join -t: <(sort file1) <(sort file2)
email#email.email:addressotest
test#email.email:addresso2
Hey why going for a awk solution when you can simply use the following join command:
join -t':' file 1 file2
where join as its names indicate is just a file joining command and you chose the field separator and usually the input columns and output to display (here not necessary)
Tested:
$more file{1,2}
::::::::::::::
file1
::::::::::::::
email#email.email
email#test.test
test#email.email
::::::::::::::
file2
::::::::::::::
email#email.email:addressotest
email#test.club:clubbingson
test#email.email:addresso2
$join -t':' file1 file2
email#email.email:addressotest
test#email.email:addresso2
If you need to sort the output as well, change the command into:
join -t':' file 1 file2 | sort -t":" -k1
or
join -t':' file 1 file2 | sort -t":" -k2
depending on which column you want to sort upon (eventually add the -r option to sort in reverse order.
join -t':' file 1 file2 | sort -t":" -k1 -r
or
join -t':' file 1 file2 | sort -t":" -k2 -r
I would like to obtain the match the IDs of the first file to the IDs of the second file, so i get, for example, Thijs Al,NED19800616,39. I know this should be possible with AWK, but I'm not really good at it.
file1 (few entries)
NED19800616,Thijs Al
BEL19951212,Nicolas Cleppe
BEL19950419,Ben Boes
FRA19900221,Arnaud Jouffroy
...
file2 (many entries)
38,FRA19920611
39,NED19800616
40,BEL19931210
41,NED19751211
...
Don't use awk, use join. First make sure the input files are sorted:
sort -t, -k1,1 file1 > file1.sorted
sort -t, -k2,2 file2 > file2.sorted
join -t, -1 1 -2 2 file[12].sorted
With awk you can do
$ awk -F, 'NR==FNR{a[$2]=$1;next}{print $2, $1, a[$1] }' OFS=, file2 file1
Thijs Al,NED19800616,39
Nicolas Cleppe,BEL19951212,
Ben Boes,BEL19950419,
Arnaud Jouffroy,FRA19900221,
All I want is the last two columns printed.
You can make use of variable NF which is set to the total number of fields in the input record:
awk '{print $(NF-1),"\t",$NF}' file
this assumes that you have at least 2 fields.
awk '{print $NF-1, $NF}' inputfile
Note: this works only if at least two columns exist. On records with one column you will get a spurious "-1 column1"
#jim mcnamara: try using parentheses for around NF, i. e. $(NF-1) and $(NF) instead of $NF-1 and $NF (works on Mac OS X 10.6.8 for FreeBSD awkand gawk).
echo '
1 2
2 3
one
one two three
' | gawk '{if (NF >= 2) print $(NF-1), $(NF);}'
# output:
# 1 2
# 2 3
# two three
using gawk exhibits the problem:
gawk '{ print $NF-1, $NF}' filename
1 2
2 3
-1 one
-1 three
# cat filename
1 2
2 3
one
one two three
I just put gawk on Solaris 10 M4000:
So, gawk is the cuplrit on the $NF-1 vs. $(NF-1) issue. Next question what does POSIX say?
per:
http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
There is no direction one way or the other. Not good. gawk implies subtraction, other awks imply field number or subtraction. hmm.
Please try this out to take into account all possible scenarios:
awk '{print $(NF-1)"\t"$NF}' file
or
awk 'BEGIN{OFS="\t"}' file
or
awk '{print $(NF-1), $NF} {print $(NF-1), $NF}' file
try with this
$ cat /tmp/topfs.txt
/dev/sda2 xfs 32G 10G 22G 32% /
awk print last column
$ cat /tmp/topfs.txt | awk '{print $NF}'
awk print before last column
$ cat /tmp/topfs.txt | awk '{print $(NF-1)}'
32%
awk - print last two columns
$ cat /tmp/topfs.txt | awk '{print $(NF-1), $NF}'
32% /