Comparing fields of two files in awk - awk

I want to compare two fields of two files, as follows:
Compare the 2nd field of file one with the 1st field of file two, and print each match (even if the match is repeated) along with all the columns of file one and file two.
File 1:
G4 b45 3 4
G4 b45 1 3
G3 b23 2 2
G3 b22 2 6
G3 b22 2 4
File 2:
b45 a b c
b64 d e f
b23 g h i
b22 j k l
b20 m n o
Output:
G4 b45 a b c 3 4
G4 b45 a b c 1 3
G3 b23 g h i 2 2
G3 b22 j k l 2 6
G3 b22 j k l 2 4
I have tried this with the following awk command using associative arrays:
awk 'FNR==NR {array1[$2] = $1 ; arrayrest[$2] = substr($0, index($0, $2)); next}($1 in array1) {print array1[$1] "\t" $0 "\t" arrayrest[$1]}' file1 file2
But there are two problems:
It does not print the lines if the match is repeated while I want them to be printed.
It repeats the first field of file two in the output.
How could I make this awk command work nicely? Thanks in advance.

Not quite the exact output formatting you want but the right output contents.
awk 'FNR==NR{seen[$1]=$0; next} ($2 in seen) {$2=seen[$2]}7' file2 file1
Add | column -t to get more consistent column spacing.
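Note that the trailing 7 in the one-liner above is an always-true pattern, so every line of file1 is printed, matched or not. If only matching lines are wanted, a minimal sketch could look like the following. The scratch directory and the extra unmatched row (G9 b99 0 0) are assumptions added purely for illustration; they are not part of the question's data.

```shell
# Scratch files holding the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' 'G4 b45 3 4' 'G4 b45 1 3' 'G3 b23 2 2' 'G3 b22 2 6' 'G3 b22 2 4' \
    'G9 b99 0 0' > "$tmp/file1"   # last row is a hypothetical non-matching line
printf '%s\n' 'b45 a b c' 'b64 d e f' 'b23 g h i' 'b22 j k l' 'b20 m n o' > "$tmp/file2"

# First pass (file2): remember each whole line under its first field.
# Second pass (file1): on a match, replace field 2 with the stored line and print.
joined=$(awk 'FNR==NR { seen[$1] = $0; next }
              ($2 in seen) { $2 = seen[$2]; print }' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$joined"
rm -r "$tmp"
```

Because file2 is loaded first and file1 is streamed, repeated keys in file1 each produce their own output line, and the key appears only once per line.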

This should be simple and clear to you:
awk 'NR==FNR {n[$1]=$2 OFS $3 OFS $4; next} ($2 in n) {print $1,$2,n[$2],$3,$4}' file2 file1

small awk
awk '{x[$1]=$0}$2=x[$2]' f2 f1
If $1 and $2 can contain the same value
awk '{x[$1]=$0}FNR!=NR&&$2=x[$2]' f2 f1
output
G4 b45 a b c 3 4
G4 b45 a b c 1 3
G3 b23 g h i 2 2
G3 b22 j k l 2 6
G3 b22 j k l 2 4

Related

Count number of occurrences of a number larger than x from every row

I have a file with multiple rows and 26 columns. I want to count, for each row, how many values are greater than 0 (counting values different from 0 would presumably work too), excluding the first two columns. The file looks like this:
X Y Sample1 Sample2 Sample3 .... Sample24
a a1 0 7 0 0
b a2 2 8 0 0
c a3 0 3 15 3
d d3 0 0 0 0
I would like to have an output file like this:
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
awk or sed would be good.
I saw a similar question but in that case the columns were summed and the desired output was different.
awk 'NR==1{printf "X\tY\tResult%s",ORS} # Printing the header
NR>1{
count=0; # Initializing count for each row to zero
for(i=3;i<=NF;i++){ #iterating from field 3 to end, NF is #fields
if($i>0){ #$i expands to $3,$4 and so which are the fields
count++; # Incrementing if the condition is true.
}
};
printf "%s\t%s\t%s%s",$1,$2,count,ORS # For each row print o/p
}' file
should do that
another awk
$ awk '{if(NR==1) c="Result";
else for(i=3;i<=NF;i++) c+=($i>0);
print $1,$2,c; c=0}' file | column -t
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
$ awk '{print $1, $2, (NR>1 ? gsub(/ [1-9]/,"") : "Result")}' file
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
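Since the title asks about values larger than x, the threshold can also be passed in as a variable instead of hard-coding 0. This is only a sketch: the variable name x, the scratch directory, and the header shortened to the four sample columns visible in the question are assumptions for illustration.

```shell
# Scratch copy of the sample rows; the header is cut down to the four
# sample columns that the question actually shows.
tmp=$(mktemp -d)
printf '%s\n' 'X Y Sample1 Sample2 Sample3 Sample4' \
    'a a1 0 7 0 0' 'b a2 2 8 0 0' 'c a3 0 3 15 3' 'd d3 0 0 0 0' > "$tmp/file"

counts=$(awk -v x=0 '
    NR == 1 { print $1, $2, "Result"; next }   # header row
    {
        count = 0
        for (i = 3; i <= NF; i++)              # skip the first two columns
            if ($i + 0 > x) count++            # adding 0 forces a numeric comparison
        print $1, $2, count
    }' "$tmp/file")
printf '%s\n' "$counts"
rm -r "$tmp"
```

Changing -v x=0 to another number counts values above that threshold without touching the program text.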

matching rows and fields from two files

I would like to match the record number in one file with the same field number in another file:
file1:
1
3
5
4
3
1
5
file2:
A B C D E F G
H I J J K L M
N O P Q R S T
I would like to use the record numbers corresponding to 5 in the first file to obtain the corresponding fields in the second file. Desired output:
C G
J M
P T
So far, I've done:
awk '{ if ($1=="5") print NR }' file1 > temp
for i in $(cat temp); do
awk '{ print $"'${i}'" }' file2
done
But get the output:
C
J
P
G
M
T
I would like to have this in the format of the desired output above, but can't get it to work. Perhaps using printf or an awk for-loop might work, but I have had no success.
Thank you all.
awk 'NR==FNR{if($1==5)a[NR];next}{for(i in a){printf "%s ", $i}print ""}' file1 file2
C G
J M
P T
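One caveat with for (i in a): awk does not specify the traversal order, so the selected fields are not guaranteed to come out left to right. A sketch that records the matching record numbers in a counter-indexed array keeps the order explicit. The names cols and n and the scratch directory are assumptions for illustration.

```shell
# Scratch files holding the sample data from the question.
tmp=$(mktemp -d)
printf '%s\n' 1 3 5 4 3 1 5 > "$tmp/file1"
printf '%s\n' 'A B C D E F G' 'H I J J K L M' 'N O P Q R S T' > "$tmp/file2"

# First pass: store the record numbers whose value is 5, in order of appearance.
# Second pass: print those fields of each line, joined with OFS.
picked=$(awk 'NR == FNR { if ($1 == 5) cols[++n] = FNR; next }
              { line = ""
                for (k = 1; k <= n; k++)
                    line = line (k > 1 ? OFS : "") $(cols[k])
                print line }' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$picked"
rm -r "$tmp"
```

This also avoids the trailing space that a printf of each field followed by a blank would leave at the end of every line.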

Concatenate/merge columns

File (~50,000 columns)
A1 2 123 f f j j k k
A2 10 789 f o p f m n
Output
A1 2 123 ff jj kk
A2 10 789 fo pf mn
I basically want to concatenate every two columns into one, starting from column 4. How can we do it in awk or sed?
It is possible in awk. See below
:~/t> more test.txt
A1 2 123 f f j j k k
:~/t> awk '{for(i=j=4; i < NF; i+=2) {$j = $i$(i+1); j++} NF=j-1}1' test.txt
A1 2 123 ff jj kk
Sorry just noticed you gave two lines as example...
:~/t> more test.txt
A1 2 123 f f j j k k
A2 10 789 f o p f m n
:~/t> awk '{for(i=j=4; i < NF; i+=2) {$j = $i$(i+1); j++} NF=j-1}1' test.txt
A1 2 123 ff jj kk
A2 10 789 fo pf mn
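An alternative sketch builds the output line in a string instead of reassigning fields and NF, which sidesteps any differences in how awk implementations handle a shrinking NF. The variable name out is an assumption for illustration. Unlike the command above, this version keeps a trailing unpaired column rather than dropping it.

```shell
merged=$(printf '%s\n' 'A1 2 123 f f j j k k' 'A2 10 789 f o p f m n' |
    awk '{ out = $1 OFS $2 OFS $3                 # first three columns unchanged
           for (i = 4; i <= NF; i += 2)           # walk the rest two at a time
               out = out OFS $i $(i + 1)          # glue each pair together
           print out }')
printf '%s\n' "$merged"
```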

sum rows based on unique columns awk

I'm looking for a more elegant way to do this (for more than 100 columns):
awk '{a[$1]+=$4}{b[$1]+=$5}{c[$1]+=$6}{d[$1]+=$7}{e[$1]+=$8}{f[$1]+=$9}{g[$1]+=$10}END{for(i in a) print i,a[i],b[i],c[i],d[i],e[i],f[i],g[i]}'
Here is the input:
a1 1 1 2 2
a2 2 5 3 7
a2 2 3 3 8
a3 1 4 6 1
a3 1 7 9 4
a3 1 2 4 2
and output:
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
Thanks :)
I break the one-liner down into lines, to make it easier to read.
awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}
END{for(x in n){
printf "%s ", x
for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:OFS)
}
}' file
This awk command should work with your 100-column file.
test with your file:
kent$ cat f
a1 1 1 2 2
a2 2 5 3 7
a2 2 3 3 8
a3 1 4 6 1
a3 1 7 9 4
a3 1 2 4 2
kent$ awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}END{for(x in n){printf "%s ", x;for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:OFS)}}' f
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
Using arrays of arrays in gnu awk version 4
awk '{for (i=2;i<=NF;i++) a[$1][i]+=$i}
END{for (i in a)
{ printf "%s", i;
for (j=2; j in a[i]; j++) printf "%s%s", FS, a[i][j]
printf "%s", ORS}
}' file
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
If you care about order of output try this
$ cat file
a1 1 1 2 2
a2 2 5 3 7
a2 2 3 3 8
a3 1 4 6 1
a3 1 7 9 4
a3 1 2 4 2
Awk Code :
$ cat tester
awk 'FNR==NR{
U[$1] # Array U, indexed by field 1
for(i=2;i<=NF;i++) # loop through columns 2 to NF
A[$1,i]+=$i # Array A holds the column sums
next # stop processing the current record and go on to the next record
}
($1 in U){ # Second pass over the same file: runs only the first time each key is seen
for(i=2;i<=NF;i++)
s = (i==2 ? "" : s OFS) A[$1,i] # build the sums in variable s so a single print suffices; printf would also work
print $1,s # print column 1 and variable s
delete U[$1] # done with this key, so delete the array element
s = "" # reset variable s
}' OFS='\t' file{,} # output field separator is tab; a comma also works
Resulting
$ bash tester
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk
--edit--
As requested in a comment, here is the one-liner; the version above is spread over several lines only because I commented it for readability.
$ awk 'FNR==NR{U[$1];for(i=2;i<=NF;i++)A[$1,i]+=$i;next}($1 in U){for(i=2;i<=NF;i++)s = (i==2 ? "" : s OFS) A[$1,i];print $1,s;delete U[$1];s = ""}' OFS='\t' file{,}
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
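If reading the file twice is not desirable (for example when the input comes from a pipe), the first-seen order of the keys can be recorded in a single pass instead. This is a sketch under assumptions: the names seen, order, sum and maxnf and the scratch directory are made up for illustration.

```shell
# Scratch copy of the sample input from the question.
tmp=$(mktemp -d)
printf '%s\n' 'a1 1 1 2 2' 'a2 2 5 3 7' 'a2 2 3 3 8' \
    'a3 1 4 6 1' 'a3 1 7 9 4' 'a3 1 2 4 2' > "$tmp/f"

sums=$(awk '
    !($1 in seen) { seen[$1]; order[++n] = $1 }   # remember keys in first-seen order
    {
        for (i = 2; i <= NF; i++) sum[$1, i] += $i
        if (NF > maxnf) maxnf = NF                # widest row decides the column count
    }
    END {
        for (k = 1; k <= n; k++) {
            line = order[k]
            for (i = 2; i <= maxnf; i++) line = line OFS sum[order[k], i]
            print line
        }
    }' "$tmp/f")
printf '%s\n' "$sums"
rm -r "$tmp"
```

Tracking maxnf rather than relying on NF inside END avoids depending on the last record's field count.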

Looping and merging 2 files

First of all, please pardon me, I am a noob.
My problem is as follows:
I have 2 text files - file1 and file2.
Following are the file samples and the desired output:
file1:
A B C
D E F
G H I
file2:
a1 a2 a3
b1 b2 b3
c1 c2 c3
Desired output:
A B C a1 a2 a3
A B C b1 b2 b3
A B C c1 c2 c3
D E F a1 a2 a3
D E F b1 b2 b3
D E F c1 c2 c3
and so on.
Can anybody please help me out with this?
awk 'FNR == NR {file2[FNR] = $0; c++; next} {for (i = 1; i <= c; i++) {print $0, file2[i]}}' file2 file1
Read all the lines of file2 into an array. For each line of file1, loop through the array and print the line from file1 and the line from file2.
In Bash:
while read -r line
do
file2+=("$line")
done < file2
while read -r line
do
for line2 in "${file2[@]}"
do
echo "$line $line2"
done
done < file1