Common values in 2 columns in 2 files

Suppose I have these two tab-delimited files, where the second column of the first file contains values that match the first column of the second file:
FileA:
1 A
2 B
3 C
FileB:
A Apple
C Cinnabon
B Banana
I would like an output like this:
1 Apple
2 Banana
3 Cinnabon
I can write a script for this, but I would like to know how to do it in awk or perl in one line.

awk 'BEGIN{FS=OFS="\t"}NR==FNR{a[$1]=$2;next}{$2=a[$2]}1' f2 f1
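For readers who want to see what the one-liner above does step by step, here is a spelled-out sketch. It recreates the question's sample files in a scratch directory (the tmp directory and the array name `name` are my own choices, not part of the answer), and it adds one assumption of mine: a key that has no match in fileB is left unchanged rather than blanked.

```shell
# Scratch copies of the sample files from the question (tab-delimited).
tmp=$(mktemp -d)
printf '1\tA\n2\tB\n3\tC\n' > "$tmp/fileA"
printf 'A\tApple\nC\tCinnabon\nB\tBanana\n' > "$tmp/fileB"

result=$(awk '
    BEGIN { FS = OFS = "\t" }
    # First file on the command line (fileB): remember key -> name.
    NR == FNR { name[$1] = $2; next }
    # Second file (fileA): swap in the name only when the key is known.
    ($2 in name) { $2 = name[$2] }
    { print }
' "$tmp/fileB" "$tmp/fileA")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The lookup file is passed first so that the whole key-to-name table is in memory before any line of fileA is processed.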

GNU sed oneliner
sed -r 's:\s*(\S+)\s+(\S+):/\\s*\\S\\+\\s\\+\1/s/\\(\\s*\\S\\+\\s\\+\\)\1/\\1\2/:' fileB | sed -f - fileA
..output:
1 Apple
2 Banana
3 Cinnabon

The command you want is this:
$ awk 'FNR==NR{a[$1]=$2;next}{$2=a[$2]; print}' file2 file1
1 Apple
2 Banana
3 Cinnabon

Related

awk the column of one file from another file

I'm having some trouble with awk. I have two files and am trying to match a column of the second file against the first file and pull out all matching lines.
file1:
1
2
3
4
5
file2:
apples peaches 3
apples peaches 9
oranges pears 7
apricots figs 1
expected output:
apples peaches 3
apricots figs 1
awk -F"|" '
FNR==NR {f1[$1];next}
($3 in f1)
' file1 file2 > output.txt

The format of file2 isn't clear to me (e.g., is that a space or a tab between fields?), nor is it clear whether a line in file2 could have more than 3 whitespace-delimited strings (e.g., apples black raspberries 6), so picking a delimiter for file2 would require more details. Having said that ...
There are no pipes ('|') in the sample files, so the current code (using -F"|") lumps the entire line into the awk variable $1.
We can make this a bit easier by recognizing that we're only interested in the last field of file2.
Adding an entry to file2:
$ cat file2
apples peaches 3
apples peaches 9
oranges pears 7
apricots figs 1
apples black raspberries 2
A couple of small changes to the current awk code:
awk 'FNR==NR {f1[$1]; next} $(NF) in f1' file1 file2
This generates:
apples peaches 3
apricots figs 1
apples black raspberries 2

This is more of a side note; I suggest using awk, as explained by markp-fuso.
You can also use the join command:
join -11 -23 <(sort -k1,1n file1) <(sort -k3,3n file2)
The example above uses join with the help of the shell and the sort command.
Command explanation:
join
-11 # Join based on column 1 of file 1 ...
-23 # and column 3 of file 2
<(sort -k1,1n file1) # sort file 1 based on column 1
<(sort -k3,3n file2) # sort file 2 based on column 3
The <() constructs are so-called process substitutions, provided by the shell in which you run the command (bash, for example). The output of the command in parentheses is treated like a file and can be passed as a parameter to join, so we don't need to create intermediate sorted files.
Note that join expects its inputs in lexical order on the join field, so the numeric sort (the n flag) is only safe here because every key is a single digit. Also, join prints the join field first, so the output lines read 1 apricots figs and 3 apples peaches rather than keeping the column layout of file2.
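To make the role of process substitution concrete, here is a sketch of the same join written with explicit intermediate files, which is what the <() form avoids. It departs from the command above in three ways that are my own assumptions: it sorts lexically (no n flag) under LC_ALL=C, it writes the sorted copies to a scratch directory, and it uses join's -o option to print file2's fields in their original column order.

```shell
export LC_ALL=C          # make sort and join agree on collation order
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$tmp/file1"
printf '%s\n' 'apples peaches 3' 'apples peaches 9' \
              'oranges pears 7' 'apricots figs 1' > "$tmp/file2"

# Explicit intermediate files: what process substitution saves us from writing.
sort -k1,1 "$tmp/file1" > "$tmp/file1.sorted"
sort -k3,3 "$tmp/file2" > "$tmp/file2.sorted"

# -o 2.1,2.2,2.3 prints the three fields of file2 in their original order.
joined=$(join -1 1 -2 3 -o 2.1,2.2,2.3 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

The matching lines come out ordered by the join key (1 before 3), not in the order they appear in file2, which is one more reason the awk approach may be the better fit here.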

Apply a sed command to every column of a specific row

I have a tab separated file:
samplename1/filename1 anotherthing/anotherfile asdfgh/hjklñ
2 3 4
5 6 7
I am trying to remove everything after the / in the header line only, using sed:
sed 's/[/].*//' samplenames.txt
How can I do this for each column of the file? Right now I am removing everything after the first /, but I want to remove only the part after the / in each column.
Actual output:
samplename1
2 3 4
5 6 7
Desired output:
samplename1 anotherthing asdfgh
2 3 4
5 6 7

With GNU sed, you may use
sed -i '1 s,/[^[:space:]]*,,g' samplenames.txt
With FreeBSD sed, you need to add '' after -i (that is, -i '').
The -i option makes sed change the file in place. The 1 means that only the first line of the file is modified.
The s,/[^[:space:]]*,,g command removes every occurrence of / together with the 0 or more non-whitespace characters that follow it.
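Before running the in-place version, it can help to try the same expression on a pipe, where nothing gets modified. The sketch below does that on a small made-up sample (two columns only, and a second line that I deliberately gave a slash) to show that the 1 address really restricts the substitution to the header.

```shell
# Hypothetical sample: the second line also contains a slash, to show it is left alone.
header_only=$(printf 'samplename1/filename1\tanotherthing/anotherfile\n2/x\t3\n' |
    sed '1 s,/[^[:space:]]*,,g')

printf '%s\n' "$header_only"
```

Because -i is omitted, the result only goes to standard output; the tab between the columns survives because a tab is whitespace and so ends each match.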

Given:
printf "samplename1/filename1\tanotherthing/anotherfile\tasdfgh/hjklñ
2\t3\t4
5\t6\t7" >file # ie, note only one tab between fields...
Here is a POSIX awk to do this:
awk -F '\t' 'NR==1{gsub("/[^\t]*",""); print; next} 1' file
Prints:
samplename1 anotherthing asdfgh
2 3 4
5 6 7
You can get those to line up with the column command:
awk -F '\t' 'NR==1{gsub("/[^\t]*",""); print; next} 1' file | column -t
samplename1  anotherthing  asdfgh
2            3             4
5            6             7

print first row of one file in front of each row of second file

fileA
abc
fileB
1
2
3
4
5
Expected output
abc 1
abc 2
abc 3
abc 4
abc 5
I tried:
paste fileA fileB
but my output looks like this:
abc 1
2
3
4
5

Using awk
awk 'FNR==NR {a=$0;next} {print a,$0}' fileA fileB
abc 1
abc 2
abc 3
abc 4
abc 5
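One caveat with the one-liner above: it overwrites a for every line of fileA, so if fileA has more than one line, the last line becomes the prefix. A possible variant that takes only the first row is sketched below; the variable names first and prefix and the two-line fileA are my own assumptions for illustration.

```shell
tmp=$(mktemp -d)
printf 'abc\nignored second line\n' > "$tmp/fileA"   # hypothetical two-line fileA
printf '%s\n' 1 2 3 > "$tmp/fileB"

# Read only the first row of fileA, then hand it to awk as a variable.
# Caveat: awk -v expands backslash escapes in the value.
first=$(head -n 1 "$tmp/fileA")
prefixed=$(awk -v prefix="$first" '{ print prefix, $0 }' "$tmp/fileB")

printf '%s\n' "$prefixed"
rm -rf "$tmp"
```

Passing the prefix with -v also means awk only has to read fileB, so the FNR==NR test is no longer needed.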

This might work for you (GNU sed & bash):
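sed 's/^/'"$(sed 1q fileA)"' /' fileB
This inserts the first line of fileA at the front of every line in fileB. The command substitution is quoted so that a first line containing spaces is not split by the shell; a first line containing / or & would still need escaping.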
Alternative using parallel:
parallel echo :::: <(head -1 fileA) fileB

You can try it this way.
This version is for more than one column (3 in this case):
paste fileA fileB > file
awk 'NF==3 {a =$1;b=$3; print; next} {print a,$0}' file

Counting occurrences between pattern matches in data file and generating a report

I have a file structured like this:
MATCH A and B
001
005
101
MATCH A and C
020
400
MATCH B and C
001
156
807
920
I want to generate a report that looks like:
A and B: 3
A and C: 2
B and C: 4
I imagine the tools to use are sed/awk. I know that sed can print lines between pattern matches, but the following ends up printing out the whole file.
sed -n '/^MATCH/,/^MATCH/p' file.txt | wc -l
This returns the number of lines in the whole file. Any tips on where to look next? It seems that this isn't a very common task, and I haven't been able to find many other suggestions.

This awk should do:
awk -v RS= '{print $2,$3,$4":",NF-4}' file
A and B: 3
A and C: 2
B and C: 4
Since the records are separated by one blank line and RS is set to the empty string (awk's paragraph mode), each block is one record,
so we just have to take the field count NF and subtract the 4 fields of the first line.
This may be better:
awk -v RS= -F"\n" '{print $1":",NF-1}' file
MATCH A and B: 3
MATCH A and C: 2
MATCH B and C: 4
Or remove the MATCH word:
awk -v RS= -F"\n" '{sub("MATCH ","",$1);print $1":",NF-1}' file
A and B: 3
A and C: 2
B and C: 4
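The RS= variants above rely on a blank line between the groups. The sample as shown in the question has none (they may simply have been lost in formatting), so here is a sketch that keys on the MATCH lines instead and works either way. The variable names label and count and the scratch file are my own.

```shell
tmp=$(mktemp -d)
# Sample data from the question, without blank lines between the groups.
printf '%s\n' 'MATCH A and B' 001 005 101 \
              'MATCH A and C' 020 400 \
              'MATCH B and C' 001 156 807 920 > "$tmp/file.txt"

report=$(awk '
    # A header line closes the previous group and opens a new one.
    /^MATCH / {
        if (label != "") print label ": " count
        label = substr($0, 7)     # drop the leading "MATCH "
        count = 0
        next
    }
    NF { count++ }                # count non-empty data lines
    END { if (label != "") print label ": " count }
' "$tmp/file.txt")

printf '%s\n' "$report"
rm -rf "$tmp"
```

The END block is needed because the last group has no following MATCH line to trigger its report.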

Find the difference between two files

I have the following situation:
The file1.dat is like:
1 2
1 3
1 4
2 1
and the file2.dat is like:
1 2
2 1
2 3
3 4
I want to find the lines of the second file that are not in the first. I tried with grep -v -f file1 file2, but my real files are much bigger than these two, and when I ran it on them the command never finished.
The result should be:
2 3
3 4
The files are sorted and they have the same number of elements. Any way to find a solution with awk?

Seems like you want lines in file2 that are not in file1:
$ awk 'FNR==NR{a[$0];next}!($0 in a)' file1 file2
2 3
3 4
However it's simpler to use comm:
$ comm -13 file1 file2
2 3
3 4
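If you would rather stay with grep, the attempt from the question can probably be rescued with two extra flags. This is a sketch rather than a benchmark: -F makes grep treat the lines of file1 as fixed strings instead of regular expressions, which is usually much cheaper, and -x requires a whole-line match so that a pattern such as 1 2 cannot hit a longer line like 11 2. Whether it is fast enough on the real files is something to measure.

```shell
tmp=$(mktemp -d)
printf '%s\n' '1 2' '1 3' '1 4' '2 1' > "$tmp/file1.dat"
printf '%s\n' '1 2' '2 1' '2 3' '3 4' > "$tmp/file2.dat"

# -F: treat each pattern as a fixed string, not a regular expression
# -x: a pattern must match the whole line
# -v: print the lines that do NOT match; -f: read patterns from file1.dat
only_in_file2=$(grep -F -x -v -f "$tmp/file1.dat" "$tmp/file2.dat")

printf '%s\n' "$only_in_file2"
rm -rf "$tmp"
```

Unlike comm, this does not require either file to be sorted.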
