Join two tables with AWK, one from stdin, other from file - awk

I have two tab-delimited files:
file tmp1.tsv:
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
file tmp2.tsv:
3
2
4
I want to get this:
3 ccc
2 bbb
4 ddd
Using something like the following:
$ cat tmp2.tsv | awk -F '\t' <magic here> tmp1.tsv
I know how to do it without stdin:
$ awk -F '\t' 'FNR==NR{ a[$1] = $2; next }{ print $1 FS a[$1] }' tmp1.tsv tmp2.tsv
But I have no idea how to do it with stdin. An explanation of the solution would also be appreciated.

Assuming your solution works as desired, it is trivial. Instead of:
awk -F '\t' 'FNR==NR{ a[$1] = $2; next }{ print $1 FS a[$1] }' tmp1.tsv tmp2.tsv
simply do:
< tmp2.tsv awk -F '\t' 'FNR==NR{ a[$1] = $2; next }{ print $1 FS a[$1] }' tmp1.tsv -
(Note that I've replaced cat tmp2.tsv | with a redirect to avoid UUOC.)
That is, specify a filename of - and awk will read from stdin.
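Putting the pieces together with the question's sample data (a quick sketch recreating tmp1.tsv and tmp2.tsv inline, with the two passes commented):

```shell
# Recreate the tab-delimited inputs from the question
printf '1\taaa\n2\tbbb\n3\tccc\n4\tddd\n5\teee\n' > tmp1.tsv
printf '3\n2\n4\n' > tmp2.tsv

# Pass 1 (FNR==NR is true only while reading the first file): map key -> value from tmp1.tsv.
# Pass 2 ("-" means stdin, fed here from tmp2.tsv): print each key plus its mapped value.
< tmp2.tsv awk -F '\t' 'FNR==NR{ a[$1] = $2; next }{ print $1 FS a[$1] }' tmp1.tsv -
```

This prints the three looked-up rows in tmp2.tsv's order, tab-separated.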

Related

Replace word in the last matching line after grep

I need to replace a word after performing a grep and getting the last line of the result.
Here my example file:
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 NONE
What I need is to select all lines containing 'aaa', get the last one in the result and replace NONE.
I tried
cat <file> | grep "aaa" | tail -n 1 | sed -i 's/NONE/ts8/g'
but it doesn't work.
Any suggestion to do that?
Thanks
Here is a tac + awk solution:
tac Input_file | awk '/aaa/ && ++count==1{sub(/NONE/,"ts8")} 1' | tac
Once you are happy with the above command, try the following to save the result back into Input_file:
tac Input_file | awk '/aaa/ && ++count==1{sub(/NONE/,"ts8")} 1' | tac > temp && mv temp Input_file
Explanation: tac first prints Input_file in reverse order and sends its output to awk, which substitutes NONE with ts8 on the very first line containing aaa (which is actually the last such line in the original order) and prints all other lines unchanged. The output is then piped through tac again to restore the file's original line order.
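A runnable check of that pipeline against the sample data, recreated inline (tac is from GNU coreutils):

```shell
# Recreate the question's file
printf 'aaa ts1 ts2\nbbb ts3 ts4\naaa ts5 ts6\naaa ts7 NONE\n' > Input_file

# Reversed, the last aaa line comes first, so ++count==1 targets exactly that line
tac Input_file | awk '/aaa/ && ++count==1{sub(/NONE/,"ts8")} 1' | tac
```

Only the final aaa line changes; the earlier aaa lines pass through untouched.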
For doing this in a single command, this should work in any version of awk:
awk 'FNR==NR {if ($1=="aaa") n=FNR; next} FNR == n {$3="TS7"} 1' file{,}
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 TS7
To save output in same file use:
awk 'FNR==NR {if ($1=="aaa") n=FNR; next}
FNR == n {$3="TS7"} 1' file{,} > file.out && mv file.out file
Or, using GNU sed:
sed -i -Ez 's/(.*\naaa[[:blank:]]+[^[:blank:]]+[[:blank:]]+)NONE/\1ts8/' file
cat file
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 ts8
If you want the last line that matches aaa at the start, you can go through all lines, saving each match, and then in the END block replace NONE with ts8 in the saved line and print it:
awk '$1=="aaa"{last=$0}END{sub(/NONE/,"ts8",last);print last}' file
In parts:
awk '
$1=="aaa" {              # If the first field is aaa
  last=$0                # Save the whole line (overwritten on each match)
}
END {                    # Run once, after all input
  sub(/NONE/,"ts8",last) # Replace NONE with ts8 in the saved line
  print last
}
' file
Output
aaa ts7 ts8

Join two columns from different files with awk

I want to join two columns from two different files using awk. These files look like this (A, B, C, 0, 1, 2, etc. are columns):
file1:
A B C D E F
file2:
0 1 2 3 4 5
And I want to be able to select arbitrary columns in my output. I.e., I want the output to be:
A C E 4 5
I've seen a million answers with the following awk code (and very similar ones), offering no explanation. But none of them address the exact problem I want to solve:
awk 'FNR==NR{a[FNR]=$2;next};{$NF=a[FNR]};1' file2 file1
awk '
NR==FNR {A[$1,$3,$6] = $0; next}
($1 SUBSEP $2 SUBSEP $3) in A {print A[$1,$2,$3], $4}
' A.txt B.txt
But none of them seem to do what I want and I am not able to understand them.
So, how can I achieve the desired output using awk? (and please, offer an explanation, I want to actually learn)
Note:
I know I can do this using something like
paste <(awk '{print $1}' file1) <(awk '{print $2}' file2)
As I said, I'm trying to learn and understand awk.
With GNU awk for true multi-dimensional arrays and ARGIND:
$ awk -v flds='1 1 1 3 1 5 2 5 2 6' '
BEGIN{ nf = split(flds,o) }
{ f[ARGIND][1]; split($0,f[ARGIND]) }
NR!=FNR { for (i=2; i<=nf; i+=2) printf "%s%s", f[o[i-1]][o[i]], (i<nf?OFS:ORS) }
' file1 file2
A C E 4 5
The "flds" string is just a series of <file number> <field number in that file> pairs so you can print the fields from each file in whatever order you like, e.g.:
$ awk -v flds='1 1 2 2 1 3 2 4 1 5 2 6' 'BEGIN{nf=split(flds,o)} {f[ARGIND][1]; split($0,f[ARGIND])} NR!=FNR{for (i=2; i<=nf; i+=2) printf "%s%s",f[o[i-1]][o[i]], (i<nf?OFS:ORS)}' file1 file2
A 1 C 3 E 5
$ awk -v flds='2 1 1 2 2 3 1 4 2 5' 'BEGIN{nf=split(flds,o)} {f[ARGIND][1]; split($0,f[ARGIND])} NR!=FNR{for (i=2; i<=nf; i+=2) printf "%s%s",f[o[i-1]][o[i]], (i<nf?OFS:ORS)}' file1 file2
0 B 2 D 4
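For non-GNU awks (no ARGIND, no true multi-dimensional arrays), here is a sketch of the same idea that caches file1's fields in a plain array; it assumes, as in the question, that file1 is a single line:

```shell
# Recreate the sample inputs
printf 'A B C D E F\n' > file1
printf '0 1 2 3 4 5\n' > file2

# flds is the same <file number> <field number> pair list as in the GNU-awk answer
awk -v flds='1 1 1 3 1 5 2 5 2 6' '
BEGIN { nf = split(flds, o) }
NR==FNR { for (i = 1; i <= NF; i++) f1[i] = $i; next }   # cache file1 fields
{
    for (i = 2; i <= nf; i += 2) {
        v = (o[i-1] == 1) ? f1[o[i]] : $(o[i])           # pick from file1 or file2
        printf "%s%s", v, (i < nf ? OFS : ORS)
    }
}' file1 file2
```

The `$(o[i])` construct is a dynamic field reference: it fetches field number o[i] of the current file2 line.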

Compare (match) command output with lines in file

I'm piping the output of a command to awk and I want to check whether that output has a match in the lines of a file.
Let's say i have the following file:
aaa
bbb
ccc
...etc
Then, let's say I have a command anything that produces some output. My goal is to pipe anything | awk to check whether the output of that command has a match inside the file (if it doesn't, I would like to append it to the file, but that's not difficult). My problem is that I don't know how to read from both the command output and the file at the same time.
Any advice is welcome
My problem is that i don't know how to read from both the command output and the file at the same time.
Use - to represent standard input in the list of files for awk to read:
$ cat file
aaa
bbb
ccc
$ echo xyz | awk '{print}' - file
xyz
aaa
bbb
ccc
EDIT
There are various options for handing each input source separately:
Using FILENAME:
$ echo xyz | awk 'FILENAME=="-" {print "Command output: " $0} FILENAME=="input.txt" {print "from file: " $0}' - input.txt
Command output: xyz
from file: aaa
from file: bbb
from file: ccc
Using ARGIND (gawk only):
$ echo xyz | awk 'ARGIND==1 {print "Command output: " $0} ARGIND==2 {print "from file: " $0}' - input.txt
Command output: xyz
from file: aaa
from file: bbb
from file: ccc
When there are only two files, it is common to see the NR==FNR idiom. See subheading "Two-file processing" here: http://backreference.org/2010/02/10/idiomatic-awk/
$ echo xyz | awk 'FNR==NR {print "Command output: " $0; next} {print "from file: " $0}' - input.txt
Command output: xyz
from file: aaa
from file: bbb
from file: ccc
command | awk 'script' file -
The - represents stdin. Swap the order of the arguments if appropriate. Read Effective Awk Programming, 4th Edition, by Arnold Robbins to learn how to use awk.
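Putting this together for the stated goal (append the command's output to the file when it is not already there), a sketch with echo standing in for the OP's anything command:

```shell
# Recreate the sample file
printf 'aaa\nbbb\nccc\n' > file

# Pass 1: remember every line of file. Pass 2 ("-" = stdin): print only lines
# not already seen; ">>" appends them. awk has finished reading file before it
# prints anything, so appending to the same file here is safe.
echo xyz | awk 'NR==FNR { seen[$0]; next } !($0 in seen)' file - >> file

cat file
```

Running it again with an already-present line (e.g. echo aaa) appends nothing, since the line is in seen.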

linux/ubuntu awk match unique values (instead of bash "sort unique grep" unique values)

My command looks like this:
cut -f 1 dummy_FILE | sort | uniq -c | awk '{print $2}' | for i in $(cat -); do grep -w $i dummy_FILE |
awk -v VAR="$i" '{distance+=$3-$2} END {print VAR, distance}'; done
cat dummy_FILE
Red 13 14
Red 39 46
Blue 45 23
Blue 34 27
Green 31 73
I want to:
For every word in $1 of dummy_FILE (Red, Blue, Green), calculate the sum of the differences between $3 and $2.
To get the output like this:
Red 8
Blue -29
Green 42
My questions are:
Is it possible to replace cut -f 1 dummy_FILE | sort | uniq -c | awk '{print $2}'?
I am using sort | uniq -c to extract every word from the dataset - is it possible to do it with awk?
How can I overcome useless cat in for i in $(cat -)?
grep -w $i dummy_FILE works fine, but I want to replace it with awk (should I?); if so, how can I do this?
When I try awk -v VAR="$i" '/^VAR/ {distance+=$3-$2} END {print VAR, distance}' I get "fatal: division by zero attempted".
I got it using:
awk '{a[$1] = a[$1] + $3 - $2;} END{for (x in a) {print x" "a[x];}}' dummy_FILE
Output:
Blue -29
Green 42
Red 8
If you want the output sorted, just pipe it to sort after the awk command.
Here's one way using awk:
awk '{ a[$1]=a[$1] + $3 - $2 } END { for(i in a) print i, a[i] }' dummy
Results:
Red 8
Blue -29
Green 42
If you require sorted output, you could simply pipe into sort like arutaku suggests:
awk '{ a[$1]=a[$1] + $3 - $2 } END { for(i in a) print i, a[i] }' dummy | sort
You can, however, pipe print into sort from within the awk statement, like this:
awk '{ a[$1]=a[$1] + $3 - $2 } END { for(i in a) print i, a[i] | "sort" }' dummy
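On the failed /^VAR/ attempt from the question: awk does not expand variables inside /…/ regex literals, so that pattern looks for the literal text VAR. Comparing the field to the variable works; a sketch with the sample data:

```shell
# Recreate the sample file
printf 'Red 13 14\nRed 39 46\nBlue 45 23\nBlue 34 27\nGreen 31 73\n' > dummy_FILE

# $1 == VAR compares the first field against the shell-supplied variable,
# instead of matching against the literal regex /^VAR/
awk -v VAR=Red '$1 == VAR { distance += $3 - $2 } END { print VAR, distance }' dummy_FILE
```

For Red, the two differences are 14-13 and 46-39, so this prints Red 8.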

Print all Fields with AWK separated by OFS

Is there a way to print all fields separated by the OFS without typing out each column number?
#Desired style of syntax, undesired result
[kbrandt#glade: ~] echo "1 2 3 4" | gawk 'BEGIN { OFS=" :-( "}; {print $0}'
1 2 3 4
#Desired result, undesired syntax
[kbrandt#glade: ~] echo "1 2 3 4" | gawk 'BEGIN { OFS=" :-) "}; {print $1,$2,$3,$4}'
1 :-) 2 :-) 3 :-) 4
This is a variation on the first style:
echo "1 2 3 4" | gawk 'BEGIN { OFS=" :-( "}; {$1=$1; print $0}'
Results:
1 :-( 2 :-( 3 :-( 4
Explanation:
the $1=$1 is to rebuild the record, using the current OFS (you can also see http://www.gnu.org/software/gawk/manual/gawk.html#Changing-Fields)
Update:
(Suggested by @EdMorton and @steve.) This is a briefer, equivalent version of the awk command that sets OFS on the command line and takes advantage of print $0 being the default action:
awk -v OFS=" :-( " '{$1=$1}1'
Sed equivalent:
$ echo "1 2 3 4" | sed 's/ /:-)/g'
Here's another option with awk (note that \s is a GNU awk extension; portable awks would need [[:blank:]]):
$ echo "1 2 3 4" | awk '{ gsub(/\s/, ":-)")}1'
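Another portable option is an explicit loop over the fields, which sidesteps both the $1=$1 record rebuild and regex substitution:

```shell
echo "1 2 3 4" | awk -v OFS=" :-) " '{
    # print each field followed by OFS, except the last, which gets a newline
    for (i = 1; i <= NF; i++) printf "%s%s", $i, (i < NF ? OFS : ORS)
}'
```

Unlike the gsub approach, this works even when the input's field separator is not a single space, since it operates on awk's parsed fields rather than on the raw text.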