Reading from a file and writing to another using awk

There are two tab-delimited text files. My aim is to update File 1 so that the zeros in its 2nd column are replaced with the corresponding 2nd-column values from File 2.
To visualize,
File 1:
AA 0
BB 0
CC 0
DD 0
EE 0
File 2:
AA 256
DD 142
EE 26
File 1 - Output:
AA 256
BB 0
CC 0
DD 142
EE 26
I wrote the command below, but as you can see I supply the value for the first row of File 2 by hand. I want this to happen automatically for every row of File 2. What should I do?
awk -F'\t' 'BEGIN {OFS=FS} {if($1 == "AA") $2="256";print}' test > test.tmp && mv test.tmp test
Thank you in advance.

awk 'BEGIN {FS=OFS="\t"} NR==FNR{a[$1]=$2; next} {print $1, a[$1]+0}' file2 file1
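A minimal sketch to verify the one-liner, recreating the question's tab-delimited files (the file names are the question's own):

```shell
# Recreate the tab-delimited sample data from the question.
printf 'AA\t0\nBB\t0\nCC\t0\nDD\t0\nEE\t0\n' > file1
printf 'AA\t256\nDD\t142\nEE\t26\n' > file2
# First pass (NR==FNR) loads file2 into a[]; second pass prints each key of
# file1 with a[$1]+0, which yields 0 for keys absent from file2.
awk 'BEGIN {FS=OFS="\t"} NR==FNR{a[$1]=$2; next} {print $1, a[$1]+0}' file2 file1
```

The +0 coerces the empty string (a key absent from File 2) to numeric 0, which is why BB and CC keep their zeros.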

Related

How to update one file's column from another file's column in awk

I have two files, the first file:
1 AA
2 BB
3 CC
4 DD
and the second file
15 AA
17 BB
20 CC
25 FF
File 1 should be updated, and the expected output should look like this:
15 AA
17 BB
20 CC
4 DD
I have tried this script from another post, but it didn't work:
awk 'NR==FNR{a[$1]=$2;next}a[$1]{print $2,a[$1]}' file1 file2
$ awk 'NR==FNR{a[$2]=$1; next} $2 in a{$1=a[$2]} 1' file2 file1
15 AA
17 BB
20 CC
4 DD
Here is an awk solution:
awk 'FNR==NR{f2[$2]=$0; next}
$2 in f2 {print f2[$2]; next}
1' file2 file1
Prints:
15 AA
17 BB
20 CC
4 DD

AWK program that can read a second file either from a file specified on the command line or from data received via a pipe

I have an AWK program that does a join of two files, file1 and file2. The files are joined based on a set of columns. I placed the AWK program into a bash script that I named join.sh. See below. Here is an example of how the script is executed:
./join.sh '1,2,3,4' '2,3,4,5' file1 file2
That says this: Do a join of file1 and file2, using columns (fields) 1,2,3,4 of file1 and columns (fields) 2,3,4,5 of file2.
That works great.
Now what I would like to do is to filter file2 and pipe the results to the join tool:
./fetch.sh ident file2 | ./join.sh '1,2,3,4' '2,3,4,5' file1
fetch.sh is a bash script containing an AWK program that fetches the rows in file2 with primary key ident and outputs to stdout the rows that were fetched.
Unfortunately, that pipeline is not working. I get no results.
Recap: I want the join program to be able to read the second file either from a file that I specify on the command line or from data received via a pipe. How to do that?
Here is my bash script, named join.sh
#!/bin/bash
awk -v f1cols=$1 -v f2cols=$2 '
BEGIN { FS=OFS="\t"
m=split(f1cols,f1,",")
n=split(f2cols,f2,",")
}
{ sub(/\r$/, "") }
NR == 1 { b[0] = $0 }
(NR == FNR) && (NR > 1) { idx2=$(f2[1])
for (i=2;i<=n;i++)
idx2=idx2 $(f2[i])
a[idx2] = $0
next
}
(NR != FNR) && (FNR == 1) { print $0, b[0] }
FNR > 1 { idx1=$(f1[1])
for (i=2;i<=m;i++)
idx1=idx1 $(f1[i])
for (idx1 in a)
print $0, a[idx1]
}' $3 $4
I'm not sure if this is 'correct' as you haven't provided any example input and expected output, but does using - to signify stdin work for your use-case? E.g.
cat file1
1 2 3 4
AA BB CC DD
AA EE FF GG
cat file2
1 2 3 4
AA ZZ YY XX
AA 11 22 33
./join.sh '1' '1' file1 file2
1 2 3 4 1 2 3 4
AA ZZ YY XX AA BB CC DD
AA ZZ YY XX AA EE FF GG
AA 11 22 33 AA BB CC DD
AA 11 22 33 AA EE FF GG
cat file2 | ./join.sh '1' '1' file1 -
1 2 3 4 1 2 3 4
AA ZZ YY XX AA BB CC DD
AA ZZ YY XX AA EE FF GG
AA 11 22 33 AA BB CC DD
AA 11 22 33 AA EE FF GG
"be able to read (...) from data received via a pipe"
GNU awk does support using getline from a pipe; consider the following simple example:
awk 'BEGIN{cmd="seq 7";while((cmd | getline) > 0){print $1*7};close(cmd)}' emptyfile
gives output
7
14
21
28
35
42
49
Explanation: I process the output of the seq 7 command (the numbers 1 to 7 inclusive, each on a separate line). The body of the while loop is executed once for each line of seq 7's output, and the fields are set just as in normal record processing.
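The - convention suggested above is ordinary awk behavior, not specific to join.sh: a file operand of - means standard input, so piped data and named files can be mixed freely. A minimal sketch:

```shell
# '-' as an awk file operand reads standard input; here the piped numbers
# are summed exactly as if they had come from a named file.
printf '1\n2\n3\n' | awk '{sum += $1} END {print sum}' -
# prints 6
```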

awk, merge two data sets based on column value

I need to combine two data sets stored in variables. The merge needs to be conditional on the value of the 1st column of "$x" matching the 3rd column of "$y".
-->echo "$x"
12 hey
23 hello
34 hi
-->echo "$y"
aa bb 12
bb cc 55
ff gg 34
ss ww 23
With the following command I managed to store the values of the first column of $x in a[] and check the third column of $y against them, but I am not getting what I expect. Can someone please help?
awk 'NR==FNR{a[$1]=$1;next} $3 in a{print $0,a[$1]}' <(echo "$x") <(echo "$y")
aa bb 12
ff gg 34
ss ww 23
Expected result:
aa bb 12 hey
ff gg 34 hi
ss ww 23 hello
Your attempt is almost right:
awk 'NR==FNR{a[$1]=$2;next} ($3 in a){print $0,a[$3]}' <(echo "$x") <(echo "$y")
Note the a[$1]=$2 and the print $0,a[$3].
join -1 1 -2 3 <(sort -k 1b,1 a.txt) <(sort -k 3b,3 b.txt) |awk '{print $3, $4, $1, $2 }'
This might be a solution if your input is in two text files, a.txt and b.txt, using join on the two number columns.
It does not keep the original order, though; you may have to sort again if that matters.
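As a sketch of the join route, with the question's data written to hypothetical files a.txt and b.txt (sorting to temporary files keeps it plain-sh compatible; note the key-sorted output order):

```shell
# Hypothetical files holding the contents of $x and $y from the question.
printf '12 hey\n23 hello\n34 hi\n' > a.txt
printf 'aa bb 12\nbb cc 55\nff gg 34\nss ww 23\n' > b.txt
sort -k 1b,1 a.txt > a.sorted
sort -k 3b,3 b.txt > b.sorted
# Join column 1 of a.txt with column 3 of b.txt, then reorder the fields.
join -1 1 -2 3 a.sorted b.sorted | awk '{print $3, $4, $1, $2}'
```

This prints the matched rows in key order (12, 23, 34), so "hello" comes before "hi".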

How to append a line with sed/awk after specific text

I would like to transform this input file using sed or awk:
input
1 AA
3 BB
5 CC
output
1 AA
3 BB
3 GG
5 CC
The closest syntax I found on this site is sed -i '/^BB:/ s/$/ GG/' file, but it produces 3 BB GG. What I need is similar to a vi yank, paste & regex replace.
Can this be done with sed or awk? Thanks,
Rand
With GNU sed:
sed -r 's/^([^ ]*) BB$/&\n\1 GG/' file
Output:
1 AA
3 BB
3 GG
5 CC
This might work for you (GNU sed):
sed '/BB/p;s//GG/' file
If the line contains the required string print it then substitute another string for it.
awk is a fine choice for this:
awk '{print $0} $2=="BB"{print $1,"GG"}' yourfile.txt
That will print the line ({print $0}). Then, if the second field of the line equals "BB", it prints the first field (the number) and the text "GG".
Example in use:
$ printf '1 AA\n3 BB\n4 RR\n' | awk '{print $0} $2=="BB"{print $1,"GG"}'
1 AA
3 BB
3 GG
4 RR
In awk:
$ awk '1; /BB/ && $2="GG"' input
1 AA
3 BB
3 GG
5 CC
1 prints every record. If the record just printed matched BB, the second field is replaced with GG; the assignment evaluates to the non-empty string "GG", which is true, so the default action prints the modified record as well.
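A quick sanity check of that one-liner on the question's input (assigning $2 rebuilds $0 with OFS, which is why the duplicated line reads "3 GG"):

```shell
# '1' prints every record; on lines matching /BB/, $2 is set to "GG" and the
# assignment's truthy result triggers the default print a second time.
printf '1 AA\n3 BB\n5 CC\n' | awk '1; /BB/ && $2="GG"'
```

This prints 1 AA, 3 BB, 3 GG, 5 CC.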

Concatenate files based off unique titles in their first column

I have many files that are of two column format with a label in the first column and a number in the second column. The number is positive (never zero):
AGS 3
KET 45
WEGWET 12
FEW 56
Within each file, the labels are not repeated.
I would like to concatenate these many files into one file with many+1 columns, such that the first column includes the unique set of all labels across all files, and the last five columns include the number for each label of each file. If the label did not exist in a certain file (and hence there is no number for it), I would like it to default to zero. For instance, if the second file contains this:
AGS 5
KET 14
KJV 2
FEW 3
then the final output would look like:
AGS 3 5
KET 45 14
WEGWET 12 0
KJV 0 2
FEW 56 3
I am new to Linux, and have been playing around with sed and awk, but realize this probably requires multiple steps...
*Edit note: I had to change it from just 2 files to many files. Even though my example only shows 2 files, I would like to do this in case of >2 files as well. Thank you...
Here is one way using awk:
awk '
NR==FNR {a[$1]=$0;next}
{
print (($1 in a)?a[$1] FS $2: $1 FS "0" FS $2)
delete a[$1]
}
END{
for (x in a) print a[x],"0"
}' file1 file2 | column -t
AGS 3 5
KET 45 14
KJV 0 2
FEW 56 3
WEGWET 12 0
Read file1 into an array indexed by column 1, assigning the entire line as its value.
For file2, check whether column 1 is present in the array. If it is, print the value from file1 along with the value from file2; if it is not, print 0 as the value for file1.
Delete the array element as we go along to get only what was unique in file1.
In the END block print what was unique in file1 and print 0 for file2.
Pipe the output to column -t for pretty format.
Assuming that your data are in files named file1 and file2:
$ awk 'FNR==NR {a[$1]=$2; b[$1]=0; next} {a[$1]+=0; b[$1]=$2} END{for (x in b) {printf "%-15s%3s%3s\n",x,a[x],b[x]}}' file1 file2
KJV 0 2
WEGWET 12 0
KET 45 14
AGS 3 5
FEW 56 3
To understand the above, we have to understand an awk trick.
In awk, NR is the number of records (lines) processed so far across all inputs, and FNR is the number of records processed in the current file. Consequently, the condition FNR==NR is true only while we are processing the first file. In this case, the associative array a gets all the values from the first file and the associative array b gets placeholder (zero) values. When we process the second file, its values go into array b, and we make sure that array a at least has a placeholder value of zero. When we are done with the second file, the data is printed.
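The trick is easy to see in a toy run (hypothetical files f1 and f2): FNR resets for each input file while NR keeps counting.

```shell
# NR counts records across all files; FNR restarts at 1 in each file,
# so NR==FNR holds only while the first file is being read.
printf 'a\nb\n' > f1
printf 'c\n' > f2
awk '{print FILENAME, NR, FNR}' f1 f2
```

This prints "f1 1 1", "f1 2 2", then "f2 3 1": NR and FNR diverge as soon as the second file starts.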
More than two files using GNU Awk
I created a file3:
$ cat file3
AGS 3
KET 45
WEGWET 12
FEW 56
AGS 17
ABC 100
The awk program extended to work with any number of files is:
$ awk 'FNR==1 {n+=1} {a[$1][n]=$2} END{for (x in a) {printf "%-15s",x; for (i=1;i<=n;i++) {printf "%5s",a[x][i]};print ""}}' file1 file2 file3
KJV 2
ABC 100
WEGWET 12 12
KET 45 14 45
AGS 3 5 17
FEW 56 3 56
This code creates a file counter. We know that we are in a new file every time FNR is 1, and a counter, n, is incremented. For every line we encounter, we put the data into a 2-D array: the first dimension of a is the label, and the second is the number of the file it was seen in. At the end, we loop over all the labels and all the files, from 1 to n, and print the data. Note that a label missing from a file prints as a blank here (see KJV and ABC above); the next version forces such entries to 0.
More than 2 files without GNU Awk
Without requiring GNU's awk, we can solve the problem using simulated two-dimensional arrays:
$ awk 'FNR==1 {n+=1} {b[$1]=1; a[$1,":",n]=$2} END{for (x in b) {printf "%-15s",x; for (i=1;i<=n;i++) {q=a[x,":",i]+0; printf "%5s",q};print ""}}' file1 file2 file3
KJV 0 2 0
ABC 0 0 100
WEGWET 12 0 12
KET 45 14 45
AGS 3 5 17
FEW 56 3 56
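Because for (x in b) visits keys in an unspecified order, the row order can vary between awk implementations; piping through sort gives a stable view. A self-contained run, recreating the sample files shown above:

```shell
# Recreate the three sample files from the question and answer.
printf 'AGS 3\nKET 45\nWEGWET 12\nFEW 56\n' > file1
printf 'AGS 5\nKET 14\nKJV 2\nFEW 3\n' > file2
printf 'AGS 3\nKET 45\nWEGWET 12\nFEW 56\nAGS 17\nABC 100\n' > file3
# Portable simulated 2-D array; a[x,":",i]+0 turns missing entries into 0.
awk 'FNR==1 {n+=1} {b[$1]=1; a[$1,":",n]=$2}
     END{for (x in b) {printf "%-15s",x
         for (i=1;i<=n;i++) {q=a[x,":",i]+0; printf "%5s",q}
         print ""}}' file1 file2 file3 | sort
```

Sorting by label gives ABC, AGS, FEW, KET, KJV, WEGWET, each with one column per input file.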