Trying to read data from two files, and subtract values from both files using awk


I have two files
0.975301988947238963 1.75276754663189283 2.00584
0.0457467532388459441 1.21307648993841410 1.21394
-0.664000617674924687 1.57872850852906366 1.71268
-0.812129324498058969 4.86617859243825635 4.93348
and
1.98005959631337536 -3.78935155011290536 4.27549
-1.04468782080821154 4.99192849476267053 5.10007
-1.47203672235857397 -3.15493073343947694 3.48145
2.68001948430755244 -0.0630730371855307004 2.68076
I want to subtract the two values in column 3 of each file.
My first awk statement was
awk 'BEGIN {print "Test"} FNR>1 && FNR==NR { r[$1]=$3; next} FNR>1 { print $3, r[$1], (r[$1]-$3)}' zzz0.dat zzz1.dat
Test
5.10007 -5.10007
3.48145 -3.48145
2.68076 -2.68076
This suggests it does not recognize r[$1]=$3
I created an additional column xyz by
awk 'NR==1{$(NF+1)="xyz"} NR>1{$(NF+1)="xyz"}1' zzz0.dat
then
awk 'BEGIN {print "Test"} FNR>1 && FNR==NR { xyz[$4]=$3; next} FNR>1 { print $3, xyz[$4], (xyz[$4]-$3)}' zzz00.dat zzz11.dat
Test
5.10007 4.93348 -0.16659
3.48145 4.93348 1.45203
2.68076 4.93348 2.25272
This now shows three columns, but xyz[$4] prints only the last value stored instead of creating an array.
My real files have thousands of lines. How can I resolve this problem?

You can do it relatively easily using a numeric index for your array. For example:
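awk 'NR==FNR {a[++n]=$3; next} o<n{++o; printf "%f - %f = %f\n", a[o], $3, a[o]-$3}' file1 file2
That way you preserve the ordering of the records across files. awk arrays are always associative, so keying by a field value such as $1 only pairs records whose key is identical in both files (yours differ, which is why r[$1] came back empty), and a for (k in a) traversal has no guaranteed order. A counter index pairs the records by position instead.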
Example Use/Output
With your files in file1 and file2 respectively, you would have:
$ awk 'NR==FNR {a[++n]=$3; next} o<n{++o; printf "%f - %f = %f\n", a[o], $3, a[o]-$3}' file1 file2
2.005840 - 4.275490 = -2.269650
1.213940 - 5.100070 = -3.886130
1.712680 - 3.481450 = -1.768770
4.933480 - 2.680760 = 2.252720
Let me know if that is what you intended or if you have any further questions. If I missed your intent, drop a comment and I will help further.
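For reference, the first attempt in the question prints an empty middle column because r is keyed by $1, and no first-column value occurs in both files, so r[$1] is never set for a line of the second file. Its FNR>1 test also skips the first data line, since these files have no header. Below is a minimal sketch of the same positional idea keyed by FNR, assuming both files list their records in the same order. The temporary directory, the file names and the two-line sample are only illustrative.

```shell
tmp=$(mktemp -d)
printf '%s\n' '0.975301988947238963 1.75276754663189283 2.00584' \
              '0.0457467532388459441 1.21307648993841410 1.21394' > "$tmp/file1"
printf '%s\n' '1.98005959631337536 -3.78935155011290536 4.27549' \
              '-1.04468782080821154 4.99192849476267053 5.10007' > "$tmp/file2"

# Pass 1 (NR==FNR): remember column 3 of file1 under its line number.
# Pass 2: look up the value stored for the same line number and subtract.
diffs=$(awk 'NR==FNR { a[FNR] = $3; next }
             FNR in a { printf "%.5f\n", a[FNR] - $3 }' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$diffs"
rm -rf "$tmp"
```

The FNR in a guard simply skips lines of the second file that have no partner in the first, so files of unequal length do not produce bogus differences.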

If the records are aligned in both files, the easiest is:
$ paste file1 file2 | awk '{print $3,$6,$3-$6}'
2.00584 4.27549 -2.26965
1.21394 5.10007 -3.88613
1.71268 3.48145 -1.76877
4.93348 2.68076 2.25272
If you're only interested in the difference, change it to print $3-$6.

Related

selecting columns in awk discarding corresponding header

How to properly select columns in awk after some processing. My file here:
cat foo
A;B;C
9;6;7
8;5;4
1;2;3
I want to add a first column with line numbers and then extract some columns of the result. For the example let's get the new first (line numbers) and third columns. This way:
awk -F';' 'FNR==1{print "linenumber;"$0;next} {print FNR-1,$1,$3}' foo
gives me this unexpected output:
linenumber;A;B;C
1 9 7
2 8 4
3 1 3
but expected is (note B is now the third column as we added linenumber as first):
linenumber;B
1;6
2;5
3;2
[fixed and revised]

To get your expected output, use:
$ awk 'BEGIN {
FS=OFS=";"
}
{
print (FNR==1?"linenumber":FNR-1),$2
}' file
Output:
linenumber;B
1;6
2;5
3;2
To add a column with line number and extract first and last columns, use:
$ awk 'BEGIN {
FS=OFS=";"
}
{
print (FNR==1?"linenumber":FNR-1),$1,$NF
}' file
Output this time:
linenumber;A;C
1;9;7
2;8;4
3;1;3

Why do you print $0 (the complete record) in your header? And if you want only two columns in your output, why do you print three (FNR-1, $1 and $3)? Finally, your output fields are separated by spaces instead of the expected ; simply because you did not specify the output field separator (OFS). You can set it with a command-line variable assignment (OFS=\;), as shown in the second and third versions below, with the -v option (-v OFS=\;), or in a BEGIN block (BEGIN {OFS=";"}). The three methods differ in when the assignment takes effect, but that does not matter here.
[EDIT]: see a generic solution at the end.
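As a side note on the timing difference between those ways of setting OFS, here is a small sketch. A -v assignment is made before the BEGIN block runs, while an assignment given as an operand among the file names is only processed when awk reaches it, after BEGIN. The temporary file and the "x", "y" strings are made up for the demonstration.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'A;B;C' '9;6;7' > "$tmp/foo"

# -v: OFS is already ";" when BEGIN prints.
with_v=$(awk -F';' -v OFS=';' 'BEGIN { print "x", "y" } { print $1, $2 }' "$tmp/foo")

# Operand assignment: BEGIN still sees the default OFS (a space);
# the records of foo, read after the assignment, use ";".
with_operand=$(awk -F';' 'BEGIN { print "x", "y" } { print $1, $2 }' OFS=';' "$tmp/foo")

printf '%s\n' "$with_v" '---' "$with_operand"
rm -rf "$tmp"
```

None of the commands in this answer print from BEGIN, which is why the choice makes no difference for them.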
If the field you want to keep is the second of the input file (the B column), try:
$ awk -F\; 'FNR==1 {print "linenumber;" $2; next} {print FNR-1 ";" $2}' foo
linenumber;B
1;6
2;5
3;2
or
$ awk -F\; 'FNR==1 {print "linenumber",$2; next} {print FNR-1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2
Note that, as long as you don't want to keep the first field of the input file ($1), you could as well overwrite it with the line number:
$ awk -F\; '{$1=FNR==1?"linenumber":FNR-1; print $1,$2}' OFS=\; foo
linenumber;B
1;6
2;5
3;2
Finally, here is a more generic solution to which you can pass the list of indexes of the columns of the input file you want to print (1 and 3 in this example):
$ awk -F\; -v cols='1;3' '
BEGIN { OFS = ";"; n = split(cols, c); }
{ printf("%s", FNR == 1 ? "linenumber" : FNR - 1);
for(i = 1; i <= n; i++) printf("%s", OFS $(c[i]));
printf("\n");
}' foo
linenumber;A;C
1;9;7
2;8;4
3;1;3

unix remove duplicates with swapped names

Here is a file with duplicate data in column1 and 2 swapped at different places.
$ cat partnership.dat
V_Kohli|Yuvraj_Singh|57
PA_Patel|CH_Gayle|5
CH_Gayle|V_Kohli|18
MA_Starc|S_Rana|14
S_Rana|MA_Starc|14
V_Kohli|CH_Gayle|18
CH_Gayle|PA_Patel|5
Yuvraj_Singh|V_Kohli|57
V_Kohli|AB_de_Villiers|61
AB_de_Villiers|V_Kohli|61
S_Rana|AB_de_Villiers|5
AB_de_Villiers|S_Rana|5
I'm trying to remove the duplicates and get the below data
V_Kohli|Yuvraj_Singh|57
PA_Patel|CH_Gayle|5
CH_Gayle|V_Kohli|18
MA_Starc|S_Rana|14
V_Kohli|AB_de_Villiers|61
S_Rana|AB_de_Villiers|5
The below awk command is listing all the records.
awk -F"|" ' NR==FNR {a[$1]=$2;b[$2$1]=$3;next} ($2$1 in b) { print }' partnership.dat partnership.dat
Can this be fixed?

The idiomatic awk approach uses half as much memory as using the fields as 2 different array indices in their different possible orders:
$ awk -F'|' '!seen[$1>$2 ? $1 FS $2 : $2 FS $1]++' file
V_Kohli|Yuvraj_Singh|57
PA_Patel|CH_Gayle|5
CH_Gayle|V_Kohli|18
MA_Starc|S_Rana|14
V_Kohli|AB_de_Villiers|61
S_Rana|AB_de_Villiers|5

You can simply group the file with a hash map whose keys are built from $1 $2 and then from $2 $1. This way a line is printed only if its pair of names has not been seen before in either order:
awk -F'|' '!unique[$1 FS $2]++ && !unique[$2 FS $1]++' partnership.dat
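The command in the question lists every record because its second pass tests exactly the key ($2$1) that the first pass stored for that same line, so the test is always true. As a sketch of what the canonical-key idiom does when written out in full, assuming the names never contain the | separator, the following uses a four-line excerpt of the data. The variable name key and the temporary file are illustrative.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'V_Kohli|Yuvraj_Singh|57' 'MA_Starc|S_Rana|14' \
              'S_Rana|MA_Starc|14' 'Yuvraj_Singh|V_Kohli|57' > "$tmp/partnership.dat"

unique=$(awk -F'|' '
{
    # Build one canonical key per pair: larger name first (string comparison),
    # so A|B and B|A map to the same key.
    key = ($1 > $2) ? $1 FS $2 : $2 FS $1
}
!(key in seen) { seen[key] = 1; print }   # print only the first line for each key
' "$tmp/partnership.dat")
printf '%s\n' "$unique"
rm -rf "$tmp"
```

Because only one key is stored per pair, this keeps the memory advantage mentioned above over storing both orders.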

Awk command to compare specific columns in file1 to file2 and display output

File1
111,222,560,0.7
111,333,560,0.2
111,444,560,0.1
File2
2017,111,560,0.0537
2018,111,560,0.0296
2019,111,560,0.0624
Desired output:
2017,111,560,0.0537,222,0.7
2018,111,560,0.0296,222,0.7
2019,111,560,0.0624,222,0.7
2017,111,560,0.0537,333,0.2
2018,111,560,0.0296,333,0.2
2019,111,560,0.0624,333,0.2
2017,111,560,0.0537,444,0.1
2018,111,560,0.0296,444,0.1
2019,111,560,0.0624,444,0.1
I tried the awk NR==FNR idiom, reading every line and checking whether columns 1 and 3 of file1 exist in file2, but it displays only the last match:
2017,111,560,0.0537,444,0.1
2018,111,560,0.0296,444,0.1
2019,111,560,0.0296,444,0.1

Using awk and sort
awk 'BEGIN{
# set input and output field separator
FS=OFS=","
}
# read first file f1
# index key field1 and field3 of file1 (f1)
{
k=$1 FS $3
}
# save 2nd and last field of file1 (f1) in array a, key being k
FNR==NR{
a[k]=(k in a ? a[k] RS:"") $2 OFS $NF;
# stop processing go to next line
next
}
# read 2nd file f2 from here
# 2nd and 3rd field of file2 (f2) used as key
{
k=$2 FS $3
}
# if key exists in array a
k in a{
# split array value by RS row separator, and put it in array t
split(a[k],t,RS);
# iterate array t, print and sort
for(i=1; i in t; i++)
print $0,t[i] | "sort -t, -nk5"
}
' f1 f2
Test Results:
$ cat f1
111,222,560,0.7
111,333,560,0.2
111,444,560,0.1
$ cat f2
2017,111,560,0.0537
2018,111,560,0.0296
2019,111,560,0.0624
$ awk 'BEGIN{FS=OFS=","}{k=$1 FS $3}FNR==NR{a[k]=(k in a ? a[k] RS:"") $2 OFS $NF; next}{k=$2 FS $3}k in a{split(a[k],t,RS); for(i=1; i in t; i++)print $0,t[i] | "sort -t, -nk5" }' f1 f2
2017,111,560,0.0537,222,0.7
2018,111,560,0.0296,222,0.7
2019,111,560,0.0624,222,0.7
2017,111,560,0.0537,333,0.2
2018,111,560,0.0296,333,0.2
2019,111,560,0.0624,333,0.2
2017,111,560,0.0537,444,0.1
2018,111,560,0.0296,444,0.1
2019,111,560,0.0624,444,0.1

The following awk may help here.
awk -F, '
FNR==NR{
a[FNR]=$0;
next
}
{
for(i=1;i<=length(a);i++){
print a[i] FS $2 FS $NF
}
}' Input_file2 Input_file1
Here is the same code with an explanation.
awk -F, ' ##Set the field separator to a comma for all lines.
FNR==NR{ ##FNR==NR is TRUE only while the first Input_file (File2) is being read.
##FNR and NR both count lines; FNR is reset whenever a new file is opened, while NR keeps increasing until all Input_files are read.
a[FNR]=$0; ##Store the current line in array a, indexed by its line number FNR.
next ##next skips all further statements for this line.
}
{
for(i=1;i<=length(a);i++){ ##Loop over i from 1 to the number of elements in a. This runs while the 2nd Input_file is being read.
print a[i] FS $2 FS $NF ##Print the stored line a[i] followed by the 2nd and last fields of the current line.
}
}' File2 File1 ##The Input_file names.
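Note that the loop above pairs every stored line with every line of the other file without comparing any columns. That gives the desired result for the sample only because all keys happen to match. A sketch that also checks the key (columns 2 and 3 of File2 against columns 1 and 3 of File1) could look like this. The array names row and key are illustrative, and the f1 row with key 999 is made up to show a non-match.

```shell
tmp=$(mktemp -d)
printf '%s\n' '111,222,560,0.7' '999,333,560,0.2' > "$tmp/f1"
printf '%s\n' '2017,111,560,0.0537' '2018,111,560,0.0296' > "$tmp/f2"

joined=$(awk -F, -v OFS=, '
NR == FNR {                      # first operand (f2): keep lines in input order
    row[++n] = $0
    key[n] = $2 FS $3
    next
}
{                                # second operand (f1)
    k = $1 FS $3
    for (i = 1; i <= n; i++)     # counter n avoids calling length() on an array
        if (key[i] == k)
            print row[i], $2, $NF
}' "$tmp/f2" "$tmp/f1")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

Since the stored lines are replayed in input order for each line of f1, the output is already grouped as in the desired output and needs no sort.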

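Another one with join/awk. Joining on a nonexistent field (99) leaves every line with the same empty key, so join pairs every line of file2 with every line of file1, and awk then keeps the pairs whose key columns match: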
$ join -t, -j99 file2 file1 |
awk -F, -v OFS=, '$3==$6 && $4==$8 {print $2,$3,$4,$5,$7,$9}'

awk returning whitespace matches when comparing columns in csv

I am trying to do a file comparison in awk but it seems to be returning all the lines instead of just the lines that match due to whitespace matching
awk -F "," 'NR==FNR{a[$2];next}$6 in a{print $6}' file1.csv file2.csv
How do I instruct awk not to match the whitespaces?
I get something like the following:
cccs
dert
ssss
assak

This should do:
$ awk -F, 'NR==FNR && $2 {a[$2]; next}
$6 in a {print $6}' file1 file2
If your data file includes spaces and numerical fields, as commented below, it is better to change the check from $2 to $2 !~ /^[[:space:]]*$/, which skips empty and all-whitespace keys but keeps a key of 0.

Consider cases like $2=<space>foo<space><space>bar in file1 vs $6=foo<space>bar<space> in file2.
Here's how to robustly compare $6 in file2 against $2 of file1 ignoring whitespace differences, and only printing lines that do not have empty or all-whitespace key fields:
awk -F, '
{
key = (NR==FNR ? $2 : $6)
gsub(/[[:space:]]+/," ",key)
gsub(/^ | $/,"",key)
}
key=="" { next }
NR==FNR { file1[key]; next }
key in file1
' file1 file2
If you want to make the comparison case-insensitive then add key=tolower(key) before the first gsub(). If you want to make it independent of punctuation add gsub(/[[:punct:]]/,"",key) before the first gsub(). And so on...
The above is untested of course since no testable sample input/output was provided.
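Since no sample input was given, here is a small sketch that exercises the same normalization on made-up data. The file contents are hypothetical, with the key in field 2 of file1 and field 6 of file2. It uses [ \t] rather than [[:space:]] on the assumption that some older awk builds lack POSIX character classes, and it prints the normalized key instead of the whole line so the effect is easy to see.

```shell
tmp=$(mktemp -d)
# Hypothetical data: keys with stray blanks, an empty key and an all-blank key.
printf '%s\n' '1, foo  bar,x' '2,,y' '3,   ,z' '4,baz,w' > "$tmp/file1"
printf '%s\n' 'a,b,c,d,e,foo bar ' 'a,b,c,d,e,' 'a,b,c,d,e,   ' \
              'a,b,c,d,e,qux' 'a,b,c,d,e,baz' > "$tmp/file2"

matches=$(awk -F, '
{
    key = (NR == FNR ? $2 : $6)
    gsub(/[ \t]+/, " ", key)     # squeeze runs of blanks to one space
    gsub(/^ | $/, "", key)       # trim the remaining leading/trailing space
}
key == ""    { next }            # ignore empty or all-blank keys
NR == FNR    { file1[key]; next }
key in file1 { print key }       # print the normalized key, not the whole line
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$matches"
rm -rf "$tmp"
```

Only foo bar and baz are reported. The empty and all-blank keys on both sides are skipped rather than matched against each other, which is the behaviour the question asked for.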

awk to compare two file by identifier & output in a specific format

I have 2 large files, both pipe delimited, that I need to compare.
file 1
a||d||f||a
1||2||3||4
file 2
a||d||f||a
1||1||3||4
1||2||r||f
Now I want to compare the files and print the differences: a value updated in file 2 should be printed as updated_value#oldvalue, and any new line added to file 2 should be printed as well.
So the desired output is: (only the updated & new data)
1||1#2||3||4
1||2||r||f
what I have tried so far is to get the separated changed values:
awk -F '[||]+' 'NR==FNR{for(i=1;i<=NF;i++)a[NR,i]=$i;next}{for(i=1;i<=NF;i++)if(a[FNR,i]!=$i)print $i"#"a[FNR,i]}' file1 file2 >output
But I want to print the whole line. How can I achieve that?

I would say:
awk 'BEGIN{FS=OFS="|"}
FNR==NR {for (i=1;i<=NF;i+=2) a[FNR,i]=$i; next}
{for (i=1; i<=NF; i+=2)
if (a[FNR,i] && a[FNR,i]!=$i)
$i=$i"#"a[FNR,i]
}1' f1 f2
This stores the file1 in a matrix a[line number, column]. Then, it compares its values with its correspondence in file2.
Note I am using the field separator | instead of || and looping in steps of two to use the proper data. This is because I for example did gawk -F'||' '{print NF}' f1 and got just 1, meaning that FS wasn't well understood. Will be grateful if someone points the error here!
Test
$ awk 'BEGIN{FS=OFS="|"} FNR==NR {for (i=1;i<=NF;i+=2) a[FNR,i]=$i; next} {for (i=1; i<=NF; i+=2) if (a[FNR,i] && a[FNR,i]!=$i) $i=$i"#"a[FNR,i]}1' f1 f2
a||d||f||b#a
1||1#2||3||4
1||2||r||f
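On the field separator: -F'||' most likely gives NF of 1 because a separator longer than one character is treated as a regular expression, in which | means alternation rather than a literal pipe. A bracket expression such as [|][|] matches the two literal characters. The following sketch uses that separator and prints only the updated and new lines, as the question asks. It assumes lines are compared by position, and the names old, seen_line and changed are illustrative.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'a||d||f||a' '1||2||3||4' > "$tmp/f1"
printf '%s\n' 'a||d||f||a' '1||1||3||4' '1||2||r||f' > "$tmp/f2"

changes=$(awk '
BEGIN { FS = "[|][|]"; OFS = "||" }    # split on a literal "||", rejoin with it
NR == FNR {                            # f1: store every field by line and column
    for (i = 1; i <= NF; i++) old[FNR, i] = $i
    seen_line[FNR] = 1
    next
}
{
    changed = !(FNR in seen_line)      # lines past the end of f1 are new
    for (i = 1; i <= NF; i++)
        if ((FNR, i) in old && old[FNR, i] != $i) {
            $i = $i "#" old[FNR, i]    # updated_value#old_value
            changed = 1
        }
    if (changed) print
}' "$tmp/f1" "$tmp/f2")
printf '%s\n' "$changes"
rm -rf "$tmp"
```

Testing (FNR, i) in old instead of the truth value of a[FNR,i] also keeps fields whose old value was 0 or empty from being silently ignored.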
