Compare (match) command output with lines in file - awk

I'm piping the output of a command to awk and I want to check whether that output matches any line of a file.
Let's say I have the following file:
aaa
bbb
ccc
...etc
Then, let's say I have a command 'anything' that returns some text. My goal is to pipe anything | awk to check whether the output of that command has a match inside the file (if it doesn't, I would like to append it to the file, but that's not difficult). My problem is that I don't know how to read from both the command output and the file at the same time.
Any advice is welcome

My problem is that i don't know how to read from both the command output and the file at the same time.
Use - to represent standard input in the list of files for awk to read:
$ cat file
aaa
bbb
ccc
$ echo xyz | awk '{print}' - file
xyz
aaa
bbb
ccc
EDIT
There are various options for handling each input source separately:
Using FILENAME:
$ echo xyz | awk 'FILENAME=="-" {print "Command output: " $0} FILENAME=="input.txt" {print "from file: " $0}' - input.txt
Command output: xyz
from file: aaa
from file: bbb
from file: ccc
Using ARGIND (gawk only):
$ echo xyz | awk 'ARGIND==1 {print "Command output: " $0} ARGIND==2 {print "from file: " $0}' - input.txt
Command output: xyz
from file: aaa
from file: bbb
from file: ccc
When there are only two files, it is common to see the NR==FNR idiom. See subheading "Two-file processing" here: http://backreference.org/2010/02/10/idiomatic-awk/
$ echo xyz | awk 'FNR==NR {print "Command output: " $0; next} {print "from file: " $0}' - input.txt
Command output: xyz
from file: aaa
from file: bbb
from file: ccc

command | awk 'script' file -
The - represents stdin. Swap the order of the arguments if appropriate. Read Effective Awk Programming, 4th Edition, by Arnold Robbins to learn how to use awk.
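Putting the pieces together for the original goal (append the command output to the file only if it is not already there), a minimal sketch could look like the following. The anything function is a hypothetical stand-in for the real command, and the temporary file stands in for the real one. Note that the NR==FNR test assumes the file is not empty; with an empty file every stdin line would be treated as file content.

```shell
# Sample file from the question, created in a temporary location.
file=$(mktemp)
printf 'aaa\nbbb\nccc\n' > "$file"

# Hypothetical stand-in for the real command.
anything() { printf 'bbb\nxyz\n'; }

# The file is read first, so its lines are stored in seen[];
# then stdin (-) is read and only lines not in seen[] are printed.
missing=$(anything | awk 'NR==FNR { seen[$0]; next } !($0 in seen)' "$file" -)

# Append whatever was not already present.
if [ -n "$missing" ]; then
    printf '%s\n' "$missing" >> "$file"
fi

cat "$file"
```

The file is listed before - here because its lines must be known before the command output is checked against them.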

Related

Replace word in line by combining after grep

I need to replace a word after performing a grep and getting the last line of the result.
Here my example file:
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 NONE
What I need is to select all lines containing 'aaa', get the last one in the result and replace NONE.
I tried
cat <file> | grep "aaa" | tail -n 1 | sed -i 's/NONE/ts8/g'
but it doesn't work.
Any suggestion to do that?
Thanks
With a tac + awk solution, please try the following.
tac Input_file | awk '/aaa/ && ++count==1{sub(/NONE/,"ts8")} 1' | tac
Once you are happy with the above command, try the following to save the result back into Input_file.
tac Input_file | awk '/aaa/ && ++count==1{sub(/NONE/,"ts8")} 1' | tac > temp && mv temp Input_file
Explanation: tac first prints Input_file in reverse order and sends it to awk, which replaces NONE with ts8 only in the first line containing aaa (which is actually the last such line in the file). All other lines are printed unchanged, and the output is sent through tac again to restore the original line order of Input_file.
For doing this in a single command, this should work in any version of awk (file{,} is bash brace expansion for file file, so awk reads the file twice):
awk 'FNR==NR {if ($1=="aaa") n=FNR; next} FNR == n {$3="ts8"} 1' file{,}
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 ts8
To save output in same file use:
awk 'FNR==NR {if ($1=="aaa") n=FNR; next}
FNR == n {$3="ts8"} 1' file{,} > file.out && mv file.out file
Or, using GNU sed, you may use:
sed -i -Ez 's/(.*\naaa[[:blank:]]+[^[:blank:]]+[[:blank:]]+)NONE/\1ts8/' file
cat file
aaa ts1 ts2
bbb ts3 ts4
aaa ts5 ts6
aaa ts7 ts8
If you want to get the last line that matches aaa at the start, you can go through all lines, and in the END block, print the last occurrence and replace NONE with ts8 using awk:
awk '$1=="aaa"{last=$0}END{sub(/NONE/,"ts8",last);print last}' file
In parts:
awk '
$1=="aaa" { # If the first field is aaa
last=$0 # Set variable last to the whole line (overwrite on each match)
}
END { # Run once at the end
sub(/NONE/,"ts8",last) # Replace NONE with ts8 in the last variable
print last
}
' file
Output
aaa ts7 ts8
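If the whole file should be printed with only that last aaa line changed, and without tac or reading the file twice, one option is to buffer the lines in memory and edit the remembered line in the END block. This is only a sketch: it assumes the file fits in memory, and the temporary file below just reproduces the sample data from the question.

```shell
# Sample data from the question.
file=$(mktemp)
printf 'aaa ts1 ts2\nbbb ts3 ts4\naaa ts5 ts6\naaa ts7 NONE\n' > "$file"

result=$(awk '
    { line[NR] = $0 }              # buffer every line in memory
    $1 == "aaa" { last = NR }      # remember the most recent aaa line
    END {
        if (last) sub(/NONE/, "ts8", line[last])
        for (i = 1; i <= NR; i++) print line[i]
    }' "$file")

printf '%s\n' "$result"
```

Redirecting the output to a temporary file and then moving it over the original, as in the answers above, saves the change in place.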

Query the contents of a file using another file in AWK

I am trying to conditionally filter a file based on values in a second file. File1 contains numbers and File2 contains two columns of numbers. The question is to filter out those rows in file1 which fall within the range denoted in each row of file2.
I have a series of loops which works, but takes >12 hours to run depending on the lengths of both files. This code is shown below. Alternatively, I have tried to use awk, and looked at other questions posted on Stack Overflow, but I cannot figure out how to change the code appropriately.
Loop method:
while IFS= read READ
do
position=$(echo $READ | awk '{print $4}')
while IFS= read BED
do
St=$(echo $BED | awk '{print $2}')
En=$(echo $BED | awk '{print $3}')
if (($position < "$St"))
then
break
else
if (($position >= "$St" && $position <= "$En"));
then
echo "$READ" | awk '{print $0"\t EXON"}' >> outputfile
fi
fi
done < file2
done < file1
Blogs with similar questions:
awk: filter a file with another file
awk 'NR==FNR{a[$1];next} !($2 in a)' d3_tmp FS="[ \t=]" m2p_tmp
Find content of one file from another file in UNIX
awk -v FS="[ =]" 'NR==FNR{rows[$1]++;next}(substr($NF,1,length($NF)-1) in rows)' File1 File2
file1: (tab delimited)
AAA BBB 1500
CCC DDD 2500
EEE FFF 2000
file2: (tab delimited)
GGG 1250 1750
HHH 1950 2300
III 2600 2700
Expected output would retain rows 1 and 3 from file1 (in a new file, file3) because these records fall within the ranges of row 1 columns 2 and 3, and row 2 columns 2 and columns 3 of file2. In the actual files, they're not row restricted i.e. I am not wanting to look at row1 of file1 and compare to row1 of file2, but compare row1 to all rows in file2 to get the hit.
file3 (output)
AAA BBB 1500
EEE FFF 2000
One way:
awk 'NR==FNR{a[i]=$2;b[i++]=$3;next}{for(j=0;j<i;j++){if ($3>=a[j] && $3<=b[j]){print;}}}' i=0 file2 file1
AAA BBB 1500
EEE FFF 2000
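Read the file2 contents and store the range starts in array a and the range ends in array b. When file1 is read, check column 3 of each row against every stored range and print the row when it falls inside one. Note that a row is printed once per matching range, so overlapping ranges produce duplicate lines.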
One more option:
$ awk 'NR==FNR{for(i=$2;i<=$3;i++)a[i];next}($3 in a)' file2 file1
AAA BBB 1500
EEE FFF 2000
File2 is read and every number in each range is stored as a key in the associative array a. When we read file1, we just need to look up column 3 in the array a. This works only for integer values, and wide ranges can use a lot of memory.
Another awk. It may or may not make sense depending on the file sizes:
$ awk '
NR==FNR {
a[$3]=$2 # hash file2 records, $3 is key, $2 value
next
}
{
for(i in a) # for each record in file1 go through every element in a
if($3<=i+0 && $3>=a[i]) { # if it falls between (i+0 forces a numeric comparison, since array keys are strings)
print # output
break # exit loop once match found
}
}' file2 file1
Output:
AAA BBB 1500
EEE FFF 2000
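For reference, here is a self-contained variant of the array approach that stops at the first matching range (so a row is never printed twice) and forces numeric comparison with + 0. It is a sketch under the assumption that both files are tab-delimited as described in the question; the temporary directory and the array names lo and hi are only for illustration.

```shell
# Sample data from the question, tab-delimited.
dir=$(mktemp -d)
printf 'AAA\tBBB\t1500\nCCC\tDDD\t2500\nEEE\tFFF\t2000\n' > "$dir/file1"
printf 'GGG\t1250\t1750\nHHH\t1950\t2300\nIII\t2600\t2700\n' > "$dir/file2"

awk -F '\t' '
    NR == FNR {                    # first file (file2): remember each range
        n++
        lo[n] = $2 + 0
        hi[n] = $3 + 0
        next
    }
    {                              # second file (file1): test column 3
        for (j = 1; j <= n; j++)
            if ($3 + 0 >= lo[j] && $3 + 0 <= hi[j]) {
                print
                break              # one matching range is enough
            }
    }' "$dir/file2" "$dir/file1" > "$dir/file3"

cat "$dir/file3"
```

This is still a nested loop, so the cost grows with rows in file1 times ranges in file2, but it runs inside a single awk process instead of spawning several processes per line.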

Join two tables with AWK, one from stdin, other from file

I have two tab-delimited files:
file tmp1.tsv:
1 aaa
2 bbb
3 ccc
4 ddd
5 eee
file tmp2.tsv:
3
2
4
I want to get this:
3 ccc
2 bbb
4 ddd
Using following routine:
$ cat tmp2.tsv | awk -F '\t' <magic here> tmp1.tsv
I know how to make it without stdin:
$ awk -F '\t' 'FNR==NR{ a[$1] = $2; next }{ print $1 FS a[$1] }' tmp1.tsv tmp2.tsv
But have no idea how to make it with stdin. Also, explanation of the solution will be appreciated.
Assuming your solution works as desired, it is trivial. Instead of:
awk -F '\t' 'FNR==NR{ a[$1] = $2; next }{ print $1 FS a[$1] }' tmp1.tsv tmp2.tsv
simply do:
< tmp2.tsv awk -F '\t' 'FNR==NR{ a[$1] = $2; next }{ print $1 FS a[$1] }' tmp1.tsv -
(Note that I've replaced cat tmp2.tsv | with a redirect to avoid a useless use of cat, UUOC.)
That is, specify a filename of - and awk will read from stdin.
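Since the question also asked for an explanation, here is the same command again with comments, as a runnable sketch. The temporary directory and the array name name are only for illustration; the data is the sample from the question.

```shell
# Sample data from the question, tab-delimited.
dir=$(mktemp -d)
printf '1\taaa\n2\tbbb\n3\tccc\n4\tddd\n5\teee\n' > "$dir/tmp1.tsv"
printf '3\n2\n4\n' > "$dir/tmp2.tsv"

joined=$(awk -F '\t' '
    # While reading the first argument, FNR (per-file line number)
    # equals NR (overall line number): build the id to name table.
    FNR == NR { name[$1] = $2; next }
    # Every later input (here stdin, named -) is looked up in the table.
    { print $1 FS name[$1] }
' "$dir/tmp1.tsv" - < "$dir/tmp2.tsv")

printf '%s\n' "$joined"
```

The lookup table has to be read first, which is why tmp1.tsv comes before - in the argument list.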

Reading a file from line 4 to the end

I want to read a file from line 4 to the very end. Is there any way to do this with awk or something similar?
This sed command will do:
sed -n '4,$p' file.txt
Or using awk:
awk 'NR>=4' file.txt
Or using tail:
tail -n +4 file.txt
awk 'NR >= 4 {print $0}'
For example
$> seq 101 110 | awk 'NR >= 4 {print $0}'
104
105
106
107
108
109
110
tail -n +4 filename will serve your purpose (older tail implementations also accept the obsolete form tail +4 filename).
More on tail: see man tail.
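Here's a method (it can depend on the type of shell you use; bash should work). Lines 4 to the end are the last total-3 lines:
tmpvar=`cat a_file | wc -l `; tail -n $((tmpvar-3)) a_file
Here's another method that should work in more shells. cat -n prefixes each line with its number and a tab, which is stripped again before printing:
cat -n a_file | awk '$1>=4 { sub(/^[ \t]*[0-9]+\t/, ""); print }'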
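A quick way to convince yourself that the sed, awk and tail variants agree is to run them on the same generated input. This sketch uses seq, as in the example above, in place of a real file.

```shell
# Each command should print lines 4 through 10 of the input (104..110).
by_sed=$(seq 101 110 | sed -n '4,$p')
by_awk=$(seq 101 110 | awk 'NR >= 4')
by_tail=$(seq 101 110 | tail -n +4)

printf '%s\n' "$by_tail"
```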

Using awk for a table lookup

I would like to use awk to lookup a value from a text file. The text file has a very simple format:
text \t value
text \t value
text \t value
...
I want to pass the actual text for which the value should be looked up via a shell variable, e.g., $1.
Any ideas how I can do this with awk?
Your help is greatly appreciated.
All the best,
Alberto
You can do this in a pure AWK script without a shell wrapper:
#!/usr/bin/awk -f
BEGIN { key = ARGV[1]; ARGV[1]="" }
$1 == key { print $2 }
Call it like this:
./lookup.awk keyval lookupfile
Example:
$ cat lookupfile
aaa 111
bbb 222
ccc 333
ddd 444
zzz 999
mmm 888
$ ./lookup.awk ddd lookupfile
444
$ ./lookup.awk zzz lookupfile
999
This could even be extended to select the desired field using an argument.
#!/usr/bin/awk -f
BEGIN { key = ARGV[1]; field = ARGV[2]; ARGV[1]=ARGV[2]="" }
$1 == key { print $field }
Example:
$ cat lookupfile2
aaa 111 abc
bbb 222 def
ccc 333 ghi
ddd 444 jkl
zzz 999 mno
mmm 888 pqr
$ ./lookupf.awk mmm 1 lookupfile2
mmm
$ ./lookupf.awk mmm 2 lookupfile2
888
$ ./lookupf.awk mmm 3 lookupfile2
pqr
Something like this would do the job:
#!/bin/sh
awk -v LOOKUPVAL="$1" '$1 == LOOKUPVAL { print $2 }' < inputFile
Essentially you set the lookup value passed into the shell script in $1 to an awk variable, then you can access that within awk itself. To clarify, the first $1 is the shell script argument passed in on the command line, the second $1 (and subsequent $2) are fields 1 and 2 of the input file.
TEXT=`grep value file | cut -f1`
I think grep might actually be a better fit:
$ echo "key value
ambiguous correct
wrong ambiguous" | grep '^ambiguous ' | awk ' { print $2 } '
The ^ on the pattern is to match to the start of the line and ensure that you don't match a line where the value, rather than the key, was the desired text.
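For the tab-separated format in the question, the -v approach can be wrapped in a small shell function. This is a sketch: the function name lookup and the sample table are hypothetical, and it assumes keys contain no backslashes, because awk interprets escape sequences in values passed with -v.

```shell
# Hypothetical sample table: text, a tab, then the value.
table=$(mktemp)
printf 'aaa\t111\nbbb\t222\nddd\t444\n' > "$table"

# lookup KEY FILE: print the value stored next to KEY, then stop reading.
lookup() {
    awk -F '\t' -v key="$1" '$1 == key { print $2; exit }' "$2"
}

value=$(lookup ddd "$table")
echo "$value"
```

Setting -F '\t' keeps keys and values that contain spaces intact, and exit stops at the first match instead of scanning the rest of the file.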