Compare two big files with awk

I used the following answer as a reference for comparing two files:
Compare files with awk
awk 'NR==FNR{a[$1];next}$1 in a{print $2}' file1 file2
It prints the 2nd column of file2 if the 1st column of file2 is found in file1.
But my requirement is a little different: how can I print the 2nd column of file1 if the 1st column of file2 is found in the associative array (built from the 1st column of file1)?

With this:
awk 'NR==FNR{a[$1]=$2;next}$1 in a{print a[$1]}' file1 file2
This assigns a value to each element of array a.
For a line with the fields foo bar, it effectively sets a["foo"]="bar".
A later {print a["foo"]} then prints bar (its assigned value).
The earlier {a[$1];next} merely references a[$1], which creates an element of array a with index $1 and an empty value; it has the same effect on the array as a[$1]="".
The whole thing works because awk has an easy way to test whether an index exists in an array: $1 in a{print something}. This is awk's pattern-action shorthand for an if statement.
It is the same as {if ($1 in a) {print something}}. The nice part is that $1 in a checks the indexes of array a, not its values.
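To make the lookup concrete, here is a small self-contained sketch. The contents of file1 and file2 are made-up sample data, not taken from the question, and the temporary directory is only there so the example cleans up after itself.

```shell
# Hypothetical sample data, created in a temporary directory.
tmp=$(mktemp -d)
printf 'foo bar\nbaz qux\n' > "$tmp/file1"
printf 'baz 1\nnope 2\nfoo 3\n' > "$tmp/file2"

# First file (NR==FNR): store column 2 under the key from column 1.
# Second file: if its column 1 is a key of a, print the value stored from file1.
result=$(awk 'NR==FNR{a[$1]=$2;next} $1 in a{print a[$1]}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"

rm -rf "$tmp"
```

This should print qux and then bar: the output follows the order of file2, and the nope line is skipped because it has no matching key in file1.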

Related

How can I use awk to print a numeric index based on a matching pattern?

I have a comma-separated file with two columns like so:
A,france
B,france
C,germany
D,germany
E,germany
F,spain
G,spain
I want to use awk (or any similar tool) to print a numeric value for each of the different groups (countries in this example). i.e.
A,france,1
B,france,1
C,germany,2
D,germany,2
E,germany,2
F,spain,3
G,spain,3
Is there a straightforward way to achieve this without having to specify every single group manually?

Use an associative array t to hold the group numbers. For each line, test whether the country is not yet a key in the array (its value compares equal to the empty string); if so, increment the counter i and store the new counter value in t. Then print the whole line ($0), followed by the value looked up in the associative array.
The options -F, and -v OFS=, set the field separator to a comma for both input and output.
awk -F, -v OFS=, '{if (t[$2]=="") {t[$2]=++i}; print $0,t[$2]}' filename
gives
A,france,1
B,france,1
C,germany,2
D,germany,2
E,germany,2
F,spain,3
G,spain,3
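A variant of the same idea, sketched with the in operator instead of a comparison against the empty string. The array name id, the counter n, and the unsorted sample input are illustrative choices, not part of the original answer.

```shell
# Hypothetical, deliberately unsorted input: spain appears before and after france.
result=$(printf 'A,spain\nB,france\nC,spain\n' |
  awk -F, -v OFS=, '
    !($2 in id) { id[$2] = ++n }   # first time this group is seen: assign the next number
    { print $0, id[$2] }           # append the group number to every line
  ')
printf '%s\n' "$result"
```

Numbers are handed out in order of first appearance, so this should give spain the number 1 and france the number 2, with the second spain line reusing 1.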

This one-liner works whether or not the countries are sorted in the input file:
awk -F, -v OFS=',' '{a[$2]=a[$2]?a[$2]:++i}$3=a[$2]' file
For example:
$ awk -F, -v OFS=',' '{a[$2]=a[$2]?a[$2]:++i}$3=a[$2]' f
A,france,1
B,france,1
C,germany,2
D,germany,2
E,germany,2
F,spain,3
G,spain,3
H,germany,2
I,germany,2
J,spain,3

Using awk how do I combine data in two files and substitute multiple values from the second file to the first file?

This question is an extension to Using awk how do I combine data in two files and substitute values from the second file to the first file?
data.txt contains some data:
A;1
B;2
A;3
keys.txt contains "key;value1;value2;value3;value4" (in this example "C" is not part of data.txt, but the awk script should still work):
A;30;BC;100;1000
B;20;CD;200;2000
C;10;DE;300;3000
Wanted output:
A;1;30;BC;100;1000
B;2;20;CD;200;2000
A;3;30;BC;100;1000
Hence, each row in data.txt that contains any key from keys.txt should get the corresponding values appended to the row in data.txt.

It's similar to the previous answer referred to in the question.
$ awk 'BEGIN {FS=OFS=";"}
NR==FNR {k=$1; $1=""; a[k]=$0; next}
$1 in a {print $0 a[$1]}' file2 file1
A;1;30;BC;100;1000
B;2;20;CD;200;2000
A;3;30;BC;100;1000
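The step that makes this work is the assignment $1="": changing a field makes awk rebuild $0 using OFS, so the remaining values keep a leading separator and can be concatenated straight onto the data row. A minimal sketch of just that step, using one row from keys.txt (the " -> " marker is only there to make the result visible):

```shell
# Save the key, blank out field 1, and show what is left in $0.
result=$(printf 'A;30;BC;100;1000\n' |
  awk 'BEGIN{FS=OFS=";"} {k=$1; $1=""; print k " -> " $0}')
printf '%s\n' "$result"
```

The rebuilt record should start with ";", which is why the answer prints $0 a[$1] with plain concatenation rather than with a comma.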

Using awk how do I combine data in two files and substitute values from the second file to the first file?

Any ideas how to do the following using awk?
Two input files, data.txt and keys.txt:
data.txt contains some data:
A;1
B;2
A;3
keys.txt contains "key;value" pairs ("C" is in this example not part of data.txt, but the awk script should still work):
A;30
B;20
C;10
The output should be as follows:
A;1;30
B;2;20
A;3;30
Hence, each row in data.txt that contains any key from keys.txt should get the corresponding value appended to the row in data.txt.

awk to the rescue!
This assumes the second file has unique keys, unlike the first file (if not, you need to specify what should happen).
$ awk 'BEGIN {FS=OFS=";"}
NR==FNR {a[$1]=$2; next}
$1 in a {print $0,a[$1]}' file2 file1
A;1;30
B;2;20
A;3;30
P.S. Note the order of the files: the keys file (file2) is read first.
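As a rough illustration of the uniqueness assumption: if a key occurs more than once in the keys file, each assignment to a[$1] overwrites the previous one, so the last value wins. The duplicate A;99 row below is made up for this sketch.

```shell
tmp=$(mktemp -d)
printf 'A;30\nA;99\n' > "$tmp/keys.txt"   # hypothetical duplicate key A
printf 'A;1\n' > "$tmp/data.txt"

# Keys file first, data file second, as in the command above.
result=$(awk 'BEGIN {FS=OFS=";"}
NR==FNR {a[$1]=$2; next}
$1 in a {print $0,a[$1]}' "$tmp/keys.txt" "$tmp/data.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Here the data row should come out as A;1;99, not A;1;30. If a different behaviour is wanted (first value wins, or all values appended), the NR==FNR block is the place to change.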

awk solution:
awk -F';' 'NR==FNR{a[$1]=$2; next}{if($1 in a) $0=$0 FS a[$1]; print}' file2 file1
The output (with keys.txt as file2 and data.txt as file1):
A;1;30
B;2;20
A;3;30
NR==FNR - true only while processing the first file, i.e. file2
a[$1]=$2 - storing the value for each key
if($1 in a) $0=$0 FS a[$1] - appending the value if the first column matches a key

Matching two fields between two files AWK

I'm trying to match fields 1 and 3 of one file against fields 1 and 2 of another file and print the matching line of the second file. The first file is tab-delimited and the second is comma-delimited. I get an unexpected token error.
file1
1 x 12345 x x x
file2
1,12345,x,x,x
script
awk -F',' FNR==NR{a[$1]=$1,$3; next} ($1,$2 in a) {print}' file1 file2 > output.txt

Same idea, but keyed on the pair of fields rather than on the first field alone, so it doesn't depend on the first field being unique:
$ awk 'NR==FNR{a[$1,$3]; next} ($1,$2) in a' file1 FS=, file2
1,12345,x,x,x
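A self-contained sketch of the composite-key approach. In awk, a subscript such as a[$1,$3] joins the two values with the built-in SUBSEP character into a single key, and ($1,$2) in a tests for that combined key. The array name seen and the second, non-matching row of file2 are made up for illustration.

```shell
tmp=$(mktemp -d)
printf '1\tx\t12345\tx\tx\tx\n' > "$tmp/file1"                   # tab-delimited
printf '1,12345,x,x,x\n1,99999,y,y,y\n' > "$tmp/file2"           # comma-delimited

# file1 is split on whitespace (the default FS, which includes tabs);
# FS=, on the command line takes effect just before file2 is read.
result=$(awk 'NR==FNR{seen[$1,$3]; next} ($1,$2) in seen' "$tmp/file1" FS=, "$tmp/file2")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Only the 1,12345 row should be printed. The 1,99999 row shares the first field but not the pair, so it is filtered out.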

You almost nailed it!
awk 'NR==FNR{first[$1]=$3;next} $1 in first{if(first[$1]==$2){print}}' file1 FS="," file2
Output
1,12345,x,x,x
Notes
Since the field separator differs between the two files, it is changed between them on the command line.
This script assumes that the first field of file1 is unique; otherwise a later line overwrites the earlier entry in the array and matches can be missed.
See also: switching the field separator between files.

Using AWK to check column in file1 against file2

I'm having some difficulties with AWK in comparing the contents of one file with another.
File1.txt
142317216-|--|-tree-|-apple-|-|--
150232802-|--|-plant-|-sugar-|-granular|--
153947334-|--|-flower-|-daisy-|-single|--
153188646-|--|-soil-|-earth-|-|--
File2.txt
apple,99817
sugar,75844
daisy,34566
earth,75544
Using "-" as the separator I can pull the information from column 7.
awk 'BEGIN { FS="-";} {print $7;}' file1.txt
Output
apple
sugar
daisy
earth
My full command to check whether column 7 of file1.txt exists in file2.txt:
awk 'BEGIN {FS="-";} NR==FR{a[$1]=$7;next} {FS=",";} $1 in a ' file1.txt file2.txt
The idea is to get column 7, then change the separator to "," and check $1 against the array a.
This shows no results and I'm struggling to get my head around the syntax to understand why. Could anyone perhaps give me some pointers?

You don't show the output you expect and you didn't include non-matching (or duplicate) values in your files so it's a guess but this MAY be what you want:
$ awk 'NR==FNR{file2[$1];next} {print ($7 in file2 ? "present:" : "absent:"), $7}' FS=',' file2 FS='-' file1
present: apple
present: sugar
present: daisy
present: earth
This situation is one reason why it's possible to set variables in the file list - to change their value between files.
Since you're just starting to learn awk - get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
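As for why the command in the question prints nothing, a few things likely contribute: FR is not a built-in variable (the per-file record counter is FNR), so NR==FR compares against an unset variable and is never true; assigning FS inside an action only affects records read afterwards; and the array is keyed on $1 of file1 while the lookup needs the column-7 values. A minimal corrected sketch that prints the matching lines of file2.txt, using two rows of the question's sample data plus one made-up non-matching row (pear); the array name seen is an illustrative choice:

```shell
tmp=$(mktemp -d)
printf '%s\n' '142317216-|--|-tree-|-apple-|-|--' \
              '150232802-|--|-plant-|-sugar-|-granular|--' > "$tmp/file1.txt"
printf '%s\n' 'apple,99817' 'pear,11111' > "$tmp/file2.txt"   # pear is hypothetical

# Pass 1 (file1.txt, FS="-"): remember every column-7 value as a key.
# Pass 2 (file2.txt, FS=","): print lines whose column 1 is such a key.
result=$(awk 'NR==FNR{seen[$7]; next} $1 in seen' \
  FS='-' "$tmp/file1.txt" FS=',' "$tmp/file2.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With this data only apple,99817 should be printed, since pear never appears in column 7 of file1.txt.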
