How to update one file's column from another file's column in awk

I have two files, the first file:
1 AA
2 BB
3 CC
4 DD
and the second file
15 AA
17 BB
20 CC
25 FF
File 1 should be updated, and the expected output should look like this:
15 AA
17 BB
20 CC
4 DD
I have tried this script from another post, but it didn't work:
awk 'NR==FNR{a[$1]=$2;next}a[$1]{print $2,a[$1]}' file1 file2

$ awk 'NR==FNR{a[$2]=$1; next} $2 in a{$1=a[$2]} 1' file2 file1
15 AA
17 BB
20 CC
4 DD
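For reference, here is the same idea with comments, as a minimal sketch wrapped in a shell function whose name, update_from, is made up for illustration. The key has to be the name in column 2, because the numbers differ between the two files. The attempt in the question keys the array on column 1 of file1 (1 to 4) and then looks up column 1 of file2 (15, 17, ...), so nothing ever matches, and it would only print lines of file2 anyway. Testing with "$2 in num" rather than "num[$2]" also avoids creating empty array entries.

```shell
#!/bin/sh
# update_from (hypothetical name): print FILE1 with column 1 replaced by the
# number FILE2 gives for the same name in column 2; other lines pass through.
update_from() {
  # $1 = file to update (file1), $2 = lookup file (file2)
  awk '
    NR == FNR { num[$2] = $1; next }  # first operand (file2): name -> number
    $2 in num { $1 = num[$2] }        # file1: overwrite column 1 if the name is known
    { print }                         # print every file1 line, changed or not
  ' "$2" "$1"
}

# Demo on the sample data from the question, using temporary files.
dir=$(mktemp -d)
printf '1 AA\n2 BB\n3 CC\n4 DD\n' > "$dir/file1"
printf '15 AA\n17 BB\n20 CC\n25 FF\n' > "$dir/file2"
update_from "$dir/file1" "$dir/file2"
rm -r "$dir"
```

Note the operand order: the lookup file goes first so that NR == FNR is only true while it is being read.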

Here is an awk solution:
awk 'FNR==NR{f2[$2]=$0; next}
$2 in f2 {print f2[$2]; next}
1' file2 file1
Prints:
15 AA
17 BB
20 CC
4 DD

Related

AWK program that can read a second file either from a file specified on the command line or from data received via a pipe

I have an AWK program that does a join of two files, file1 and file2. The files are joined based on a set of columns. I placed the AWK program into a bash script that I named join.sh. See below. Here is an example of how the script is executed:
./join.sh '1,2,3,4' '2,3,4,5' file1 file2
That says this: Do a join of file1 and file2, using columns (fields) 1,2,3,4 of file1 and columns (fields) 2,3,4,5 of file2.
That works great.
Now what I would like to do is to filter file2 and pipe the results to the join tool:
./fetch.sh ident file2 | ./join.sh '1,2,3,4' '2,3,4,5' file1
fetch.sh is a bash script containing an AWK program that fetches the rows in file2 with primary key ident and outputs to stdout the rows that were fetched.
Unfortunately, that pipeline is not working. I get no results.
Recap: I want the join program to be able to read the second file either from a file that I specify on the command line or from data received via a pipe. How to do that?
Here is my bash script, named join.sh
#!/bin/bash
awk -v f1cols=$1 -v f2cols=$2 '
BEGIN {
    FS = OFS = "\t"
    m = split(f1cols, f1, ",")
    n = split(f2cols, f2, ",")
}
{ sub(/\r$/, "") }
NR == 1 { b[0] = $0 }
(NR == FNR) && (NR > 1) {
    idx2 = $(f2[1])
    for (i = 2; i <= n; i++)
        idx2 = idx2 $(f2[i])
    a[idx2] = $0
    next
}
(NR != FNR) && (FNR == 1) { print $0, b[0] }
FNR > 1 {
    idx1 = $(f1[1])
    for (i = 2; i <= m; i++)
        idx1 = idx1 $(f1[i])
    for (idx1 in a)
        print $0, a[idx1]
}' $3 $4
I'm not sure if this is 'correct' as you haven't provided any example input and expected output, but does using - to signify stdin work for your use-case? E.g.
cat file1
1 2 3 4
AA BB CC DD
AA EE FF GG
cat file2
1 2 3 4
AA ZZ YY XX
AA 11 22 33
./join.sh '1' '1' file1 file2
1 2 3 4 1 2 3 4
AA ZZ YY XX AA BB CC DD
AA ZZ YY XX AA EE FF GG
AA 11 22 33 AA BB CC DD
AA 11 22 33 AA EE FF GG
cat file2 | ./join.sh '1' '1' file1 -
1 2 3 4 1 2 3 4
AA ZZ YY XX AA BB CC DD
AA ZZ YY XX AA EE FF GG
AA 11 22 33 AA BB CC DD
AA 11 22 33 AA EE FF GG
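That also likely explains the empty output: in the pipeline, $4 inside join.sh is empty, so awk only receives file1; every line of it satisfies NR == FNR, gets stored, and nothing is printed. One way to make the second file optional is to let it default to "-", e.g. $3 "${4:--}" instead of $3 $4 in join.sh. The sketch below uses a simplified, hypothetical join_demo function with a single key column per file; only the ${4:--} default is the point.

```shell
#!/bin/sh
# join_demo (hypothetical): a cut-down stand-in for join.sh with one key column
# per file, used only to show the stdin fallback for the last operand.
# Usage: join_demo KEY1 KEY2 FILE1 [FILE2]   (FILE2 missing or "-" => stdin)
join_demo() {
  awk -v k1="$1" -v k2="$2" '
    NR == FNR { seen[$k1] = $0; next }     # FILE1: remember each line by its key
    ($k2 in seen) { print $0, seen[$k2] }  # FILE2 or stdin: append the FILE1 match
  ' "$3" "${4:--}"                         # ${4:--} expands to "-" when $4 is unset or empty
}

dir=$(mktemp -d)
printf 'AA BB\nCC DD\n' > "$dir/file1"
# The second input arrives on a pipe, as in: ./fetch.sh ident file2 | ./join.sh ...
printf 'x AA\ny ZZ\n' | join_demo 1 2 "$dir/file1"
rm -r "$dir"
```

One caveat with any NR == FNR join: if FILE1 is empty, NR == FNR stays true for the second input too, so nothing is printed.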
"be able to read (...) from data received via a pipe"
GNU AWK does support this; see "Using getline from a Pipe" in its manual. Consider the following simple example:
awk 'BEGIN{cmd="seq 7";while((cmd | getline) > 0){print $1*7};close(cmd)}' emptyfile
gives the output
7
14
21
28
35
42
49
Explanation: this processes the output of the seq 7 command (the numbers 1 to 7, each on a separate line). The body of the while loop runs once for each line of that output, and the fields are set just as in normal processing, so $1 is the current number.
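The cmd | getline form is also specified by POSIX, so it is not limited to GNU awk. Applied to the join question, a hedged sketch could load the lookup side from a command in BEGIN and then stream the other file normally. The function join_from_cmd and the printf command standing in for fetch.sh are illustrative assumptions.

```shell
#!/bin/sh
# join_from_cmd (hypothetical): load "key value" pairs from a command's output
# with cmd | getline, then annotate the lines of FILE whose column 1 is a key.
join_from_cmd() {
  # $1 = shell command to run, $2 = file to annotate.
  # The command is passed through the environment (ENVIRON) so that awk does
  # not reinterpret backslashes in it the way -v assignments do.
  CMD="$1" awk '
    BEGIN {
      cmd = ENVIRON["CMD"]
      while ((cmd | getline) > 0)   # sets $0, $1, $2 ... for each output line
        val[$1] = $2
      close(cmd)                    # close the pipe once it is exhausted
    }
    ($1 in val) { print $0, val[$1] }
  ' "$2"
}

dir=$(mktemp -d)
printf 'AA x\nBB y\nCC z\n' > "$dir/data"
# printf stands in for a filter such as ./fetch.sh ident file2
join_from_cmd "printf '%s\n' 'AA 1' 'CC 3'" "$dir/data"
rm -r "$dir"
```

The parentheses in (cmd | getline) > 0 matter: without them the > can be read as output redirection rather than a comparison.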

how to append a file to the second column of another tsv file

I have a file first.txt that looks like this:
45
56
74
62
I want to append this file to second.tsv, which looks like this (there are 17 columns):
2 a ...
3 b ...
5 c ...
6 d ...
The desired output is :
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
How can I append to the second column?
I've tried
awk -F, '{getline f1 <"first.txt" ;print $1,f1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17}' second.tsv
but it did not work. It added the contents of first.txt after the last column of second.tsv, and the output was not tab-separated.
Thank you.
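Your code works if you remove the -F, bit. That option tells awk that the file is comma-separated, which it is not. Without it, awk splits fields on whitespace and joins the printed fields with a single space; to get tab-separated output, set OFS as well, e.g. -v OFS='\t'.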
Another option would be to go for a piped version with paste, e.g.:
paste first.txt second.tsv | awk '{ t=$2; $2=$1; $1=t } 1' OFS='\t'
Output:
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
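If you prefer to stay with the getline approach from the question, a sketch that keeps the tabs might look like this. It assumes second.tsv is tab-separated and first.txt has one value per line; instead of listing all 17 fields, it rewrites $1 as "column 1, tab, value" and lets awk rebuild the record with OFS. The function name insert_col2 and the sample rows are hypothetical.

```shell
#!/bin/sh
# insert_col2 (hypothetical): make line N of VALUES the new column 2 of line N
# of a tab-separated file, keeping every other column and the tabs.
insert_col2() {
  # $1 = one value per line (first.txt), $2 = tab-separated file (second.tsv)
  awk -v vals="$1" '
    BEGIN { FS = OFS = "\t" }               # split and join on tabs only
    {
      if ((getline v < vals) <= 0) v = ""   # next value, or empty once exhausted
      $1 = $1 OFS v                         # column 1, a tab, then the value
      print                                 # record is rebuilt with OFS
    }
  ' "$2"
}

dir=$(mktemp -d)
printf '45\n56\n74\n62\n' > "$dir/first.txt"
# Three columns stand in for the 17 in the question.
printf '2\ta\tx\n3\tb\tx\n5\tc\tx\n6\td\tx\n' > "$dir/second.tsv"
insert_col2 "$dir/first.txt" "$dir/second.tsv"
rm -r "$dir"
```

Because FS is a tab, fields that contain spaces stay intact, which the default whitespace splitting would not guarantee.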
$ awk 'NR==FNR{a[FNR]=$0;next} {$1=$1 OFS a[FNR]} 1' file1 file2
2 45 a ...
3 56 b ...
5 74 c ...
6 62 d ...
If your files are tab-separated add BEGIN{FS=OFS="\t"} at the front.

awk, merge two data sets based on column value

I need to combine two data sets stored in variables. The merge needs to be conditional, matching the first column of "$x" against the third column of "$y".
-->echo "$x"
12 hey
23 hello
34 hi
-->echo "$y"
aa bb 12
bb cc 55
ff gg 34
ss ww 23
With the following command I store the first column of $x in a[] and check it against the third column of $y, but I am not getting what I expect. Can someone please help?
awk 'NR==FNR{a[$1]=$1;next} $3 in a{print $0,a[$1]}' <(echo "$x") <(echo "$y")
aa bb 12
ff gg 34
ss ww 23
Expected result:
aa bb 12 hey
ff gg 34 hi
ss ww 23 hello
Your attempt is almost right:
awk 'NR==FNR{a[$1]=$2;next} ($3 in a){print $0,a[$3]}' <(echo "$x") <(echo "$y")
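Note the a[$1]=$2 and the print $0,a[$3]: the array must store the word ($2 of $x) under its number, and the lookup must use $3, the number in the current line of $y.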
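If your shell has no process substitution (it is a bash/ksh/zsh feature, not POSIX sh), a sketch of the same lookup can pass $x through the environment and $y on stdin instead. merge_vars is a hypothetical name.

```shell
#!/bin/sh
# merge_vars (hypothetical): same lookup as the corrected command, but without
# process substitution, so it also runs under a plain POSIX sh.
merge_vars() {
  # $1 = contents of $x ("number word" lines), $2 = contents of $y
  printf '%s\n' "$2" | X="$1" awk '
    BEGIN {
      n = split(ENVIRON["X"], lines, "\n")  # one element per line of $x
      for (i = 1; i <= n; i++) {
        split(lines[i], f, " ")             # f[1] = number, f[2] = word
        a[f[1]] = f[2]
      }
    }
    ($3 in a) { print $0, a[$3] }           # $y lines arrive on stdin
  '
}

x='12 hey
23 hello
34 hi'
y='aa bb 12
bb cc 55
ff gg 34
ss ww 23'
merge_vars "$x" "$y"
```

ENVIRON is used rather than -v because values passed with -v have their backslash escapes interpreted.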
join -1 1 -2 3 <(sort -k 1b,1 a.txt) <(sort -k 3b,3 b.txt) |awk '{print $3, $4, $1, $2 }'
If your input is in two text files, a.txt and b.txt, this uses join on the two number columns.
It does not keep the original order, though (join needs sorted input, so the output comes out sorted by the number); you may have to sort again if the order matters.

Reading from a file and writing to another using Awk

There are two tab-delimited text files. My aim is to change File 1 so that, for each key that appears in File 2, the zero in File 1 is replaced by the corresponding value from the 2nd column of File 2.
To visualize,
File 1:
AA 0
BB 0
CC 0
DD 0
EE 0
File 2:
AA 256
DD 142
EE 26
File 1 - Output:
AA 256
BB 0
CC 0
DD 142
EE 26
I wrote the command below, but as you can see I supply the value from the first row of File 2 by hand. I want to do this automatically. What should I do?
awk -F'\t' 'BEGIN {OFS=FS} {if($1 == "AA") $2="256";print}' test > test.tmp && mv test.tmp test
Thank you in advance.
awk 'BEGIN {FS=OFS="\t"} NR==FNR{a[$1]=$2; next} {print $1, a[$1]+0}' file2 file1
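Note that a[$1]+0 prints 0 for every key missing from File 2, whatever File 1 had there, and drops any further columns. If File 1 may already hold non-zero values, a sketch that only overwrites known keys and writes the result back like the question's && mv step might look like this (update_values is a hypothetical name):

```shell
#!/bin/sh
# update_values (hypothetical): set column 2 of FILE1 to the value FILE2 has
# for the same key; keys missing from FILE2 keep their current value.
update_values() {
  # $1 = file to update (File 1), $2 = lookup file (File 2)
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { val[$1] = $2; next }  # File 2: key -> value
       ($1 in val) { $2 = val[$1] }      # File 1: overwrite only known keys
       { print }' "$2" "$1" > "$1.tmp" &&
  mv "$1.tmp" "$1"                       # replace File 1, as the question does
}

dir=$(mktemp -d)
printf 'AA\t0\nBB\t0\nCC\t0\nDD\t0\nEE\t0\n' > "$dir/file1"
printf 'AA\t256\nDD\t142\nEE\t26\n' > "$dir/file2"
update_values "$dir/file1" "$dir/file2"
cat "$dir/file1"
rm -r "$dir"
```

If File 2 could be empty, NR == FNR would also hold while reading File 1 and the rewritten file would be empty, so check for that before running it.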

Why is awk not printing out newlines?

I have a file which looks like this:
1
2
AA
4
5
AA BB
7
8
AA BB CC
10
11
AA BB CC DD
I am using awk to extract only every nth line where n=3.
>>awk 'NR%3==0' /input/file_foo >> output/file_foobar
The output is appearing in a single line as:
AA AA BB AA BB CC AA BB CC DD
... and so on
I want it to appear as:
AA
AA BB
AA BB CC
AA BB CC DD
I tried using \n, printf with \n, and so on but it doesn't work as I expect. Please advise.
A more verbose way:
awk '{ if (NR%3==0) { print $0} }'
You can also use {printf("%s\n\n", $0)} if a single \n does not work.
If it still does not work, check the line terminator; the file may not use ordinary \n line endings. In that case, set the RS variable in awk to the unusual line terminator.
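A sketch of that check and fix, assuming the file uses either CRLF or bare CR line endings (the function names are hypothetical). od -c shows each line ending as \r \n for CRLF or a lone \r for old Mac-style files; for CRLF you can strip the \r from each record, and for bare CR you can set RS.

```shell
#!/bin/sh
# every_third (hypothetical): CRLF-safe; strip the trailing carriage return
# from each record before printing every 3rd line.
every_third() {
  awk '{ sub(/\r$/, "") } NR % 3 == 0' "$1"
}

# every_third_cr (hypothetical): for files that end lines with a bare \r;
# RS = "\r" splits records on CR, and print still ends each one with "\n".
every_third_cr() {
  awk 'BEGIN { RS = "\r" } NR % 3 == 0' "$1"
}

dir=$(mktemp -d)
printf '1\r2\rAA\r4\r5\rAA BB\r' > "$dir/cr_only"
od -c "$dir/cr_only" | head -n 2   # look for \r where you expect \n
every_third_cr "$dir/cr_only"
rm -r "$dir"
```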
I think the problem is in the way you're showing the data, not in the processing.
$ cat x
1
2
AA
4
5
AA BB
7
8
AA BB CC
10
11
AA BB CC DD
$ awk 'NR%3==0' x
AA
AA BB
AA BB CC
AA BB CC DD
$
I suspect that what you're doing is similar to:
$ awk 'NR%3==0' x > y
$ x=$(<y)
$ echo $x
AA AA BB AA BB CC AA BB CC DD
$ echo "$x"
AA
AA BB
AA BB CC
AA BB CC DD
$
This would confuse you. See also: Capturing multi-line output to a bash variable.
Use the following with print for each line:
awk 'NR%3==0 { print $0 }' /input/file_foo >> output/file_foobar