awk, how to pass in a list of files based on a condition?

I was wondering if there is any way to pass a file list to awk. The list has thousands of files, and I am using grep -l to find the subset of files I am interested in passing to awk.
E.g.,
$ grep -l id file-*.csv
file-1.csv
file-2.csv
$ cat file-1.csv
id,col_1,col_2
1,abc,100
2,def,200
$ cat file-2.csv
id,col_1,col_2
3,xyz,1000
4,hij,2000
If I do
$ awk -F, '{print $2,$3}' file-1.csv file-2.csv | grep -v col
abc 100
def 200
xyz 1000
hij 2000
it works how I would want, but there are far too many files to list out manually like this:
file-1.csv file-2.csv
I was wondering if there is a way to pass in the result of the...
grep -l id file-*.csv
Edit:
grep -l id
is the condition. Each file has a header, but only some have 'id' in the header, so I can't use the file-*.csv wildcard in the awk statement.
If I did an ls on file-*.csv I would end up with more than file-1.csv and file-2.csv.
e.g.,
$ cat file-3.csv
name,col,num
a1,hij,3000
b2,lmn,50000
$ ls -l file-*.csv
-rw-r--r-- 1 tp staff 35 20 Sep 18:50 file-1.csv
-rw-r--r-- 1 tp staff 37 20 Sep 18:51 file-2.csv
-rw-r--r-- 1 tp staff 38 20 Sep 18:52 file-3.csv
$ grep -l id file-*.csv
file-1.csv
file-2.csv

Based on the output you show under "If I do", it sounds like this might be what you're trying to do:
awk -F, 'FNR>1{print $2,$3}' file-*.csv
but your question isn't clear so it's a guess.
Given your updated question, all you need with GNU awk (for nextfile) is:
awk -F, 'FNR==1{if ($1 != "id") nextfile; next} {print $2,$3}' file-*.csv
(the next skips the header line of the files that are kept)
and with any awk (but less efficiently than with GNU awk):
awk -F, 'FNR==1{f=($1=="id"?1:0); next} f{print $2,$3}' file-*.csv
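Run against the three sample files above, either version prints only the data rows from the files whose header starts with id:
$ awk -F, 'FNR==1{if ($1 != "id") nextfile; next} {print $2,$3}' file-*.csv
abc 100
def 200
xyz 1000
hij 2000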

awk -F, 'FNR > 1{print $2,$3}' $(grep -l id file-*.csv)
(This will not work if any of your filenames contain whitespace.)
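If the filenames might contain whitespace and you have GNU grep and xargs, a NUL-delimited sketch of the same idea avoids the splitting problem: -Z makes grep terminate each filename with a NUL byte, and xargs -0 reads them back verbatim.
grep -lZ id file-*.csv | xargs -0 awk -F, 'FNR > 1{print $2,$3}'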

To find the files with an id field and merge/output their contents, excluding the header lines themselves:
grep trick:
grep --no-group-separator -hA 1000000 'id' file-*.csv | grep -v 'id'
-h - suppress prefixing the file names on output
-A num - print num lines of trailing context after the matching line(s); 1000000 is used as a maximum line count which, presumably, will not be exceeded (you may adjust it if you really have files with more than 1000000 lines)
The output (for 2 sample files from the question):
1,abc,100
2,def,200
3,xyz,1000
4,hij,2000
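One caveat: both greps match id anywhere in a line, so a data row that happened to contain id (say 5,idle,300) would be caught or dropped by mistake. Anchoring the pattern to the start of the header line keeps the trick safe:
grep --no-group-separator -hA 1000000 '^id,' file-*.csv | grep -v '^id,'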

Related

How to grep the outputs of awk, line by line?

Let's say I have the following text file:
$ cat file1.txt
MarkerName Allele1 Allele2 Freq1 FreqSE P-value Chr Pos
rs2326918 a g 0.8510 0.0001 0.5255 6 130881784
rs2439906 c g 0.0316 0.0039 0.8997 10 6870306
rs10760160 a c 0.5289 0.0191 0.8107 9 123043147
rs977590 a g 0.9354 0.0023 0.8757 7 34415290
rs17278013 t g 0.7498 0.0067 0.3595 14 24783304
rs7852050 a g 0.8814 0.0006 0.7671 9 9151167
rs7323548 a g 0.0432 0.0032 0.4555 13 112320879
rs12364336 a g 0.8720 0.0015 0.4542 11 99515186
rs12562373 a g 0.7548 0.0020 0.6151 1 164634379
Here is an awk command which prints MarkerName if Pos >= 11000000
$ awk '{ if($8 >= 11000000) { print $1 }}' file1.txt
This command outputs the following:
MarkerName
rs2326918
rs10760160
rs977590
rs17278013
rs7323548
rs12364336
rs12562373
Question: I would like to feed this into a grep statement to parse another text file, textfile2.txt. Somehow, one pipes the output from the previous awk command into grep AWKOUTPUT textfile2.txt
I would like each row of the awk command above to be grepped against textfile2.txt, i.e.
grep "rs2326918" textfile2.txt
## and then
grep "rs10760160" textfile2.txt
### and then
...
Naturally, I would save all resulting rows from textfile2.txt into a final file, i.e.
$ awk '{ if($8 >= 11000000) { print $1 }}' file1.txt | grep PIPE_OUTPUT_BY_ROW textfile2.txt > final.txt
How does one grep from a pipe line by line?
EDIT: To clarify, the one constraint I have is that file1.txt is actually the output of a previous pipe. (I'm trying to simplify the question somewhat.) How would that change the answer?
awk + grep solution:
grep -f <(awk '$8 >= 11000000{ print $1 }' file1.txt) textfile2.txt > final.txt
-f file - obtain patterns from file, one per line
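Note that the header sneaks through the awk filter: $8 of the header is the string Pos, and awk's string comparison lets "Pos" >= 11000000 succeed, which is why MarkerName shows up in the output above. A variant that skips the header and matches the IDs as whole fixed strings (so one ID cannot hit a longer one as a substring), assuming GNU grep for the combined -w and -F:
grep -wFf <(awk 'NR > 1 && $8 >= 11000000{ print $1 }' file1.txt) textfile2.txt > final.txt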
You can use bash to do this:
bash-3.1$ echo "rs2326918" > filename2.txt
bash-3.1$ (for i in `awk '{ if($8 >= 11000000) { print $1 }}' file1.txt |
grep -v MarkerName`; do grep $i filename2.txt; done) > final.txt
bash-3.1$ cat final.txt
rs2326918
Alternatively,
bash-3.1$ cat file1.txt | (for i in `awk '{ if($8 >= 11000000) { print $1 }}' |
grep -v MarkerName`; do grep $i filename2.txt; done) > final.txt
The switch grep -v tells grep to reverse its usual activity and print all lines that do not match the pattern. This switch "inVerts" the match.
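For what it's worth, the same loop written with while read avoids the word splitting that backticks introduce; a sketch against the same files:
awk '$8 >= 11000000 && $1 != "MarkerName"{ print $1 }' file1.txt |
while IFS= read -r id; do grep -F -- "$id" textfile2.txt; done > final.txt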
Using only awk, you can do this:
$ awk 'NR>1 && NR==FNR {if ($8 >= 11000000) a[$1]++;next} \
{ for(i in a){if($0~i) print}}' file1.txt textfile2.txt > final.txt
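If the ID is a whole field in textfile2.txt (an assumption about that file's layout), an exact array lookup is both safer and faster than scanning each line with $0~i:
awk 'NR==FNR {if (FNR>1 && $8 >= 11000000) a[$1]; next} $1 in a' file1.txt textfile2.txt > final.txt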

printf usage with awk when printing multiple columns

I am trying the command below:
cat dcl1serrfip_check.csv | grep -Fi 'BANK0_F5_WRDAT_P0[0]' | grep -i setup | grep 'L2H' | grep highv | grep -i low | awk -F ',' -v dev="0.861" -v rc="1.105" -v inte="0.872" '{ print ($10+$11)-(($12+$13)-($14))","($10*dev)+($11*rc)-(($12*dev)+($13*rc)-($14*inte))}'
This gives below output:
-6.93889e-18,0.000288
I want this output to be formatted to 4 decimal places. How to do it? The desired output would be
-0.0000,0.0002
You need %0.4f or %.4f
To test, use:
awk 'BEGIN{ printf("%0.4f\n", -6.93889e-18) }'
So it becomes:
printf("%0.4f,%0.4f\n", ($10+$11)-(($12+$13)-($14)), ($10*dev)+($11*rc)-(($12*dev)+($13*rc)-($14*inte)) )
Actually, you can rewrite your command in awk itself; there is no need for so many grep and cat calls.
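A sketch of that rewrite, with tolower() standing in for grep -i and index() for the case-insensitive fixed-string match of grep -Fi (the patterns mirror the original pipeline):
awk -F, -v dev="0.861" -v rc="1.105" -v inte="0.872" '
index(tolower($0), "bank0_f5_wrdat_p0[0]") && tolower($0) ~ /setup/ &&
/L2H/ && /highv/ && tolower($0) ~ /low/ {
printf("%0.4f,%0.4f\n", ($10+$11)-(($12+$13)-($14)),
($10*dev)+($11*rc)-(($12*dev)+($13*rc)-($14*inte)))
}' dcl1serrfip_check.csv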

match duplicate string before a specified delimiter

cat test.txt
serverabc.test.net
serverabc.qa.net
serverabc01.test.net
serverstag.staging.net
serverstag.test.net
Here I need to match the duplicate strings just before the delimiter '.'.
So the expected output would be as below, because the strings "serverabc" and "serverstag" are found to be duplicates. Please help.
serverabc.test.net
serverabc.qa.net
serverstag.staging.net
serverstag.test.net
awk to the rescue!
$ awk -F\. '{c[$1]++; a[$1]=a[$1]?a[$1]RS$0:$0}
END{for(k in c) if(c[k]>1) print a[k]}' test.txt
serverabc.test.net
serverabc.qa.net
serverstag.staging.net
serverstag.test.net
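Note that for (k in c) visits keys in an unspecified order, so the groups may come out shuffled. If the original line order matters, a two-pass sketch that reads the file twice:
$ awk -F\. 'NR==FNR{c[$1]++; next} c[$1]>1' test.txt test.txt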
If it is not going to be used a lot, I would probably just do something like this:
cut -f1 -d. test.txt | sort | uniq -c | grep -v " 1 " | cut -c9- | sed 's/\(.*\)/^\1\\./' > dup.host
grep -f dup.host test.txt
serverabc.test.net
serverabc.qa.net
serverstag.staging.net
serverstag.test.net

Awk last field string substitution

I am trying to get the last field of the following output, using string substitution in awk:
ps -ef | grep -i "[o]cssd.bin"
Output:
grid 47275 1 1 Sep23 ? 17:49:39 /opt/grid/12.1/bin/ocssd.bin
I used awk as:
$ ps -ef | grep -i "[o]cssd.bin" | awk '{ gsub("/ocssd.bin",""); print $NF}'
Output:
$NF}
/opt/grid/12.1/bin
How to avoid the "$NF}"? I only need "/opt/grid/12.1/bin"!
Try:
ps -ef | grep -i "[o]cssd.bin" | awk '{ if(gsub("/ocssd.bin","")) print $NF}'
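The stray $NF} shows up because ps -ef can list the awk process of this very pipeline: its command line contains the literal string /ocssd.bin inside the gsub() call, so it survives the [o]cssd.bin grep trick, and its last field after the substitution is $NF}. A sketch that sidesteps this by matching inside awk, where the program text never contains a plain ocssd.bin:
ps -ef | awk '/[o]cssd\.bin/{ sub(/\/ocssd\.bin$/, "", $NF); print $NF }'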

How can I add lines in a file from another file using awk or sed

I have 2 files:
File 1:
1012055500012221
2011052210011021
3010051501010221
4015051510012201
File 2:
50222111
60202100
75222105
90202125
I want:
1012055500012221
2011052210011021
3010051501010221
4015051510012201
50222111
60202100
75222105
90202125
How can I do that in awk or sed?
Why do you need awk/sed when
cat file1 >> file2
will do just as well?
or if you want to leave the original two files alone and produce the joined file as a separate one:
cat file1 file2 > file3
A small (cryptic) awk game :)
$ cat 0
1012055500012221
2011052210011021
3010051501010221
4015051510012201
$ cat 1
50222111
60202100
75222105
90202125
$ awk 42 0 1
1012055500012221
2011052210011021
3010051501010221
4015051510012201
50222111
60202100
75222105
90202125
Here 42 is simply an always-true pattern, so awk's default action (print the current line) runs for every line of both files. Since you wanted sed, the same trick with an empty script, which applies no edits and passes every line through:
sed '' file{1,2}