Awk match and for loop requirement

Awk match and for loop requirement - awk

I'm struggling to understand what I'm doing wrong here. The output gets repeated the number of times a disk is seen:
Parted -l |
awk -F '[ :] '/^Disk \//{print $2}' |
while read LINE; do
blkid $LINE1 |
awk -F ' ' '{
for(i=1;i<=NF;i++) {
print $i;
if($i ~ /UUID*/) {
print $i;
exit
}
}
}';
done
If I have three disks in my system e.g. /dev/sda1 /dev/sdb1 /dev/sdc1
Input to above set of commands:
/dev/sda1: LABEL="/" UUID="feg45687-576466-676445-6536-7534-5784326797fbfg" TYPE="ext4"
Current output from above commands:
UUID="feg45687-576466-676445-6536-7534-5784326797fbfg"
UUID="tug45687-576466-676445-6536-7534-68546df"
UUID="feg45687-576466-676445-6536-7534-9908536lk"
UUID="feg45687-576466-676445-6536-7534-5784326797fbfg"
UUID="tug45687-576466-676445-6536-7534-68546df"
UUID="feg45687-576466-676445-6536-7534-9908536lk"
UUID="feg45687-576466-676445-6536-7534-5784326797fbfg"
UUID="tug45687-576466-676445-6536-7534-68546df"
UUID="feg45687-576466-676445-6536-7534-9908536lk"
What is expected output:
UUID="feg45687-576466-676445-6536-7534-5784326797fbfg"
UUID="tug45687-576466-676445-6536-7534-68546df"
UUID="feg45687-576466-676445-6536-7534-9908536lk"
So if I have three disks i get the same UUID line repeated three times.
Any way of moving to the next LINE without repeating?

Your problem (or, at least, one of your problems) is probably in the lines:
while read LINE; do
blkid $LINE1 |
You read into $LINE, but you reference $LINE1, an unrelated and probably uninitialized variable, which probably leads blkid to list all devices each time.
Fix: choose one name or the other — don't use both.

Related

AWK:Convert columns to rows with condition (create list ) [duplicate]

I have a tab-delimited file with three columns (excerpt):
AC147602.5_FG004 IPR000146 Fructose-1,6-bisphosphatase class 1/Sedoheputulose-1,7-bisphosphatase
AC147602.5_FG004 IPR023079 Sedoheptulose-1,7-bisphosphatase
AC148152.3_FG001 IPR002110 Ankyrin repeat
AC148152.3_FG001 IPR026961 PGG domain
and I'd like to get this using bash:
AC147602.5_FG004 IPR000146 Fructose-1,6-bisphosphatase class 1/Sedoheputulose-1,7-bisphosphatase IPR023079 Sedoheptulose-1,7-bisphosphatase
AC148152.3_FG001 IPR023079 Sedoheptulose-1,7-bisphosphatase IPR002110 Ankyrin repeat IPR026961 PGG domain
So if ID in the first column are the same in several lines, it should produce one line for each ID with all other parts of lines joined. In the example it will give two-row file.

give this one-liner a try:
awk -F'\t' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file

For whatever reason, the awk solution does not work for me in cygwin. So I used Perl instead. It joins around a tab character and separates line by \n
cat FILENAME | perl -e 'foreach $Line (<STDIN>) { #Cols=($Line=~/^\s*(\d+)\s*(.*?)\s*$/); push(#{$Link{$Cols[0]}}, $Cols[1]); } foreach $List (values %Link) { print join("\t", #{$List})."\n"; }'

will depend off file size (and awk limitation)
if too big this will reduce the awk need by sorting file first and only keep 1 label in memory for printing
A classical version with post print using a modification of the whole line
sort YourFile \
| awk '
last==$1 { sub( /^[^[:blank:]]*[[:blank:]]+/, ""); C = C " " $0; next}
NR > 1 { print Last C; Last = $1; C = ""}
END { print Last}
'
Another version using field and pre-print but less "human readable"
sort YourFile \
| awk '
last!=$1 {printf( "%s%s", (! NR ? "\n" : ""), Last=$1)}
last==$1 {for( i=2;i<NF;i++) printf( " %s", $i)}
'

A pure bash version. It has no additional dependencies, but requires bash 4.0 or above (2009) for associative array support.
All on one line:
{ declare -A merged; merged=(); while IFS=$'\t' read -r key value; do merged[$key]="${merged[$key]}"$'\t'"$value"; done; for key in "${!merged[#]}"; do echo "$key${merged[$key]}"; done } < INPUT_FILE.tsv
Readable and commented equivalent:
{
# Define `merged` as an empty associative array.
declare -A merged
merged=()
# Read tab-separated lines. Any leftover fields also end up in `value`.
while IFS=$'\t' read -r key value
do
# Append to any value that's already there, separated by a tab.
merged[$key]="${merged[$key]}"$'\t'"$value"
done
# Loop over the input keys. Note that the order is arbitrary;
# pipe through `sort` if you want a predictable order.
for key in "${!merged[#]}"
do
# Each value is prefixed with a tab, so no need for a tab here.
echo "$key${merged[$key]}"
done
} < INPUT_FILE.tsv

How to AWK print only specific item?

I have a log file that looks like this:
RPT_LINKS=1,T1999
RPT_NUMALINKS=1
RPT_ALINKS=1,1999TK,2135,2009,31462,29467,2560
RPT_TXKEYED=1
RPT_ETXKEYED=0
I have used grep to isolate the line I am interested in with the RPT_ALINKS. In that line I want to know how to use AWK to print only the link that ends with a TK.
I am really close running this:
grep -w 'RPT_ALINKS' stats2.log | awk -F 'TK' '{print FS }'
But I am sure those who are smarter than me already know I am getting only the TK back, how do I get the entire field so that I would get a return of 1999TK?

If there is only a single RT in that line and RT is always at the end:
awk '/RPT_ALINKS/{match($0,/[^=,]*TK/); print substr($0,RSTART,RLENGTH)}'
You can also use a double grep
grep -w 'RPT_ALINKS' stats2.log | grep -wo '[^=,]*TK'
The following sed solution also works nicely:
sed '/RPT_ALINKS/s/\(^.*[,=]\)\([^=,]*TK\)\(,.*\)\?/\2/'

It doesn't get any more elegant
awk -F '=' '$1=="RPT_ALINKS" {n=split($2,array,",")
for(i=1; i<=n; i++)
if (array[i] ~ /TK$/)
{print array[i]}}
' stats2.log
n=split($2,array,","): split 1,1999TK,2135,2009,31462,29467,2560 with , to array array. n contains number of array elements, here 7.

Here is a simple solution
awk -F ',|=' '/^RPT_ALINKS/ { for (i=1; i<=NF; i++) if ($i ~ /TK$/) print $i }' stats2.log
It looks only on the record which begins with RPT_ALINKS. And there it check every field. If field ends with TK, then it prints it.

Dang, I was just about to post the double-grep alternative, but got scooped. And all the good awk solutions are taken as well.
Sigh. So here we go in bash, for fun.
$ mapfile a < stats2.log
$ for i in "${a[#]}"; do [[ $i =~ ^RPT_ALINKS=(.+,)*([^,]+TK) ]] && echo "${BASH_REMATCH[2]}"; done
1999TK
This has the disadvantage of running way slower than awk and not using fields. Oh, and it won't handle multiple *TK items on a single line. And like sed, this is processing lines as patterns rather than fields, which saps elegance. And by using mapfile, we limit the size of input you can handle because your whole log is loaded into memory. Of course you don't really need to do that, but if you were going to use a pipe, you'd use a different tool anyway. :-)
Happy Thursday.

With a sed that has -E for EREs, e.g. GNU or OSX/BSD sed:
$ sed -En 's/^RPT_ALINKS=(.*,)?([^,]*TK)(,.*|$)/\2/p' file
1999TK
With GNU awk for the 3rd arg to match():
$ awk 'match($0",",/^RPT_ALINKS=(.*,)?([^,]*TK),.*/,a){print a[2]}' file
1999TK

Instead of looping through it, you can use an other alternative.
This will be fast, loop takes time.
awk -F"TK" '/RPT_ALINKS/ {b=split($1,a,",");print a[b]FS}' stats2.log
1999TK
Here you split the line by setting field separator to TK and search for line that contains RPT_ALINKS
That gives $1=RPT_ALINKS=1,1999 and $2=,2135,2009,31462,29467,2560
$1 will always after last comma have our value.
So split it up using split function by comma. b would then contain number of fields.
Since we know that number would be in last section we do use a[b] and add FS that contains TK

AWK, Comma delimited fields enclosed in quotes [duplicate]

The intent of this question is to provide a canonical answer.
Given a CSV as might be generated by Excel or other tools with embedded newlines and/or double quotes and/or commas in fields, and empty fields like:
$ cat file.csv
"rec1, fld1",,"rec1"",""fld3.1
"",
fld3.2","rec1
fld4"
"rec2, fld1.1
fld1.2","rec2 fld2.1""fld2.2""fld2.3","",rec2 fld4
"""""","""rec3,fld2""",
What's the most robust way efficiently using awk to identify the separate records and fields:
Record 1:
$1=<rec1, fld1>
$2=<>
$3=<rec1","fld3.1
",
fld3.2>
$4=<rec1
fld4>
----
Record 2:
$1=<rec2, fld1.1
fld1.2>
$2=<rec2 fld2.1"fld2.2"fld2.3>
$3=<>
$4=<rec2 fld4>
----
Record 3:
$1=<"">
$2=<"rec3,fld2">
$3=<>
----
so it can be used as those records and fields internally by the rest of the awk script.
A valid CSV would be one that conforms to RFC 4180 or can be generated by MS-Excel.
The solution must tolerate the end of record just being LF (\n) as is typical for UNIX files rather than CRLF (\r\n) as that standard requires and Excel or other Windows tools would generate. It will also tolerate unquoted fields mixed with quoted fields. It will specifically not need to tolerate escaping "s with a preceding backslash (i.e. \" instead of "") as some other CSV formats allow - if you have that then adding a gsub(/\\"/,"\"\"") up front would handle it and trying to handle both escaping mechanisms automatically in one script would make the script unnecessarily fragile and complicated.

If your CSV cannot contain newlines then all you need is (with GNU awk for FPAT):
$ echo 'foo,"field,""with"",commas",bar' |
awk -v FPAT='[^,]*|("([^"]|"")*")' '{for (i=1; i<=NF;i++) print i " <" $i ">"}'
1 <foo>
2 <"field,""with"",commas">
3 <bar>
or the equivalent using any awk:
$ echo 'foo,"field,""with"",commas",bar' |
awk -v fpat='[^,]*|("([^"]|"")*")' -v OFS=',' '{
rec = $0
$0 = ""
i = 0
while ( (rec!="") && match(rec,fpat) ) {
$(++i) = substr(rec,RSTART,RLENGTH)
rec = substr(rec,RSTART+RLENGTH+1)
}
for (i=1; i<=NF;i++) print i " <" $i ">"
}'
1 <foo>
2 <"field,""with"",commas">
3 <bar>
See https://www.gnu.org/software/gawk/manual/gawk.html#More-CSV for info on the specific FPAT setting I use above.
If all you actually want to do is convert your CSV to individual lines by, say, replacing newlines with blanks and commas with semi-colons inside quoted fields then all you need is this, again using GNU awk for multi-char RS and RT:
$ awk -v RS='"([^"]|"")*"' -v ORS= '{gsub(/\n/," ",RT); gsub(/,/,";",RT); print $0 RT}' file.csv
"rec1; fld1",,"rec1"";""fld3.1 ""; fld3.2","rec1 fld4"
"rec2; fld1.1 fld1.2","rec2 fld2.1""fld2.2""fld2.3","",rec2 fld4
"""""","""rec3;fld2""",
Otherwise, though, the general, robust, portable solution to identify the fields that will work with any modern awk* is:
$ cat decsv.awk
function buildRec( fpat,fldNr,fldStr,done) {
CurrRec = CurrRec $0
if ( gsub(/"/,"&",CurrRec) % 2 ) {
# The string built so far in CurrRec has an odd number
# of "s and so is not yet a complete record.
CurrRec = CurrRec RS
done = 0
}
else {
# If CurrRec ended with a null field we would exit the
# loop below before handling it so ensure that cannot happen.
# We use a regexp comparison using a bracket expression here
# and in fpat so it will work even if FS is a regexp metachar
# or a multi-char string like "\\\\" for \-separated fields.
CurrRec = CurrRec ( CurrRec ~ ("[" FS "]$") ? "\"\"" : "" )
$0 = ""
fpat = "([^" FS "]*)|(\"([^\"]|\"\")+\")"
while ( (CurrRec != "") && match(CurrRec,fpat) ) {
fldStr = substr(CurrRec,RSTART,RLENGTH)
# Convert <"foo"> to <foo> and <"foo""bar"> to <foo"bar>
if ( gsub(/^"|"$/,"",fldStr) ) {
gsub(/""/, "\"", fldStr)
}
$(++fldNr) = fldStr
CurrRec = substr(CurrRec,RSTART+RLENGTH+1)
}
CurrRec = ""
done = 1
}
return done
}
# If your input has \-separated fields, use FS="\\\\"; OFS="\\"
BEGIN { FS=OFS="," }
!buildRec() { next }
{
printf "Record %d:\n", ++recNr
for (i=1;i<=NF;i++) {
# To replace newlines with blanks add gsub(/\n/," ",$i) here
printf " $%d=<%s>\n", i, $i
}
print "----"
}
.
$ awk -f decsv.awk file.csv
Record 1:
$1=<rec1, fld1>
$2=<>
$3=<rec1","fld3.1
",
fld3.2>
$4=<rec1
fld4>
----
Record 2:
$1=<rec2, fld1.1
fld1.2>
$2=<rec2 fld2.1"fld2.2"fld2.3>
$3=<>
$4=<rec2 fld4>
----
Record 3:
$1=<"">
$2=<"rec3,fld2">
$3=<>
----
The above assumes UNIX line endings of \n. With Windows \r\n line endings it's much simpler as the "newlines" within each field will actually just be line feeds (i.e. \ns) and so you can set RS="\r\n" (using GNU awk for multi-char RS) and then the \ns within fields will not be treated as line endings.
It works by simply counting how many "s are present so far in the current record whenever it encounters the RS - if it's an odd number then the RS (presumably \n but doesn't have to be) is mid-field and so we keep building the current record but if it's even then it's the end of the current record and so we can continue with the rest of the script processing the now complete record.
*I say "modern awk" above because there's apparently extremely old (i.e. circa 2000) versions of tawk and mawk1 still around which have bugs in their gsub() implementation such that gsub(/^"|"$/,"",fldStr) would not remove the start/end "s from fldStr. If you're using one of those then get a new awk, preferably gawk, as there could be other issues with them too but if that's not an option then I expect you can work around that particular bug by changing this:
if ( gsub(/^"|"$/,"",fldStr) ) {
to this:
if ( sub(/^"/,"",fldStr) && sub(/"$/,"",fldStr) ) {
Thanks to the following people for identifying and suggesting solutions to the stated issues with the original version of this answer:
#mosvy for escaped double quotes within fields.
#datatraveller1 for multiple contiguous pairs of escaped quotes in a field and null fields at the end of records.
Related: also see How do I use awk under cygwin to print fields from an excel spreadsheet? for how to generate CSVs from Excel spreadsheets.

An improvement upon #EdMorton's FPAT solution, which should be able to handle double-quotes(") escaped by doubling ("" -- as allowed by the CSV standard).
gawk -v FPAT='[^,]*|("[^"]*")+' ...
This STILL
isn't able to handle newlines inside quoted fields, which are perfectly legit in standard CSV files.
assumes GNU awk (gawk), a standard awk won't do.
Example:
$ echo 'a,,"","y""ck","""x,y,z"," ",12' |
gawk -v OFS='|' -v FPAT='[^,]*|("[^"]*")+' '{$1=$1}1'
a||""|"y""ck"|"""x,y,z"|" "|12
$ echo 'a,,"","y""ck","""x,y,z"," ",12' |
gawk -v FPAT='[^,]*|("[^"]*")+' '{
for(i=1; i<=NF;i++){
if($i~/"/){ $i = substr($i, 2, length($i)-2); gsub(/""/,"\"", $i) }
print "<"$i">"
}
}'
<a>
<>
<>
<y"ck>
<"x,y,z>
< >
<12>

This is exactly what csvquote is for - it makes things simple for awk and other command line data processing tools.
Some things are difficult to express in awk. Instead of running a single awk command and trying to get awk to handle the quoted fields with embedded commas and newlines, the data gets prepared for awk by csvquote, so that awk can always interpret the commas and newlines it finds as field separators and record separators. This makes the awk part of the pipeline simpler. Once awk is done with the data, it goes back through csvquote -u to restore the embedded commas and newlines inside quoted fields.
csvquote file.csv | awk -f my_awk_script | csvquote -u
EDIT:
For a complete description on csvquote, see: How it works. this also explains the `` characters which are shown in places where there was a carriage return.
csvquote file.csv | awk -f decsv.awk | csvquote -u
(for the source of decsv.awk see answer from Ed Morton )
outut:
Record 1:
$1=<rec1 fld1>
$2=<>
$3=<rec1","fld3.1",
fld3.2>
$4=<rec1
fld4>
----
Record 2:
$1=<rec2, fld1.1
fld1.2>
$2=<rec2 fld2.1"fld2.2"fld2.3>
$3=<>
$4=<rec2 fld4>
----
Record 3:
$1=<"">
$2=<"rec3fld2">
$3=<>
----

I have found csvkit a really useful toolkit to handle with csv files in command line.
line='test,t2,t3,"t5,"'
echo $line | csvcut -c 4
"t5,"
echo 'foo,"field,""with"",commas",bar' | csvcut -c 3
bar
It also contains csvstat, csvstack etc. tools which are also very handy.
cat file.csv
"rec1, fld1",,"rec1"",""fld3.1
"",
fld3.2","rec1
fld4"
"rec2, fld1.1
fld1.2","rec2 fld2.1""fld2.2""fld2.3","",rec2 fld4
"""""","""rec3,fld2""",
csvcut -c 1 file.csv
"rec1, fld1"
"rec2, fld1.1
fld1.2"
""""""
csvcut -c 3 file.csv
"rec1"",""fld3.1
"",
fld3.2"
""
""

Awk (gawk) actually provides extensions, one of which being csv processing, which is the most robust way to do so with gawk in my opinion. The extension takes care of many gotchas and parses the csv for you.
Assuming that extension is installed, you can use awk to show all lines where a specific csv field matches 123.
Assuming test.csv contains the following:
Name,Phone
"Woo, John",425-555-1212
"James T. Kirk",123
The following will print all lines where the Phone (aka the second field) is equal to 123:
gawk -l csv 'csvsplit($0,a) && a[2] == 123 {print a[1]}'
The output is:
James T. Kirk
How does it work?
-l csv asks gawk to load the csv extension by looking for it in $AWKLIBPATH;
csvsplit($0, a) splits the current line, and stores each field into a new array named a
&& a[2] == 123 checks that the second field is 123
if both conditions are true, it { print a[1] }, aka prints first csv field of the line.

If you're using one of the common AWK interpreters (Gawk, onetrueawk, mawk), the other solutions are your best bet. However, if you're able to use a different interpreter, frawk and GoAWK have proper CSV support built-in.
frawk is a very fast AWK implementation written in Rust. Use -i csv to process input in CSV mode. Note that frawk is not quite POSIX compatible (see differences).
GoAWK is a POSIX-compatible AWK implementation written in Go. Also supports -i csv mode, as well as -H (parse header row) with #"named_field" syntax (read more). Disclaimer: I'm the author of GoAWK.
With file.csv as per the question, you can simply use an AWK script with a regular for loop over the fields as follows:
$ cat records.awk
{
printf "Record %d:\n", NR
for (i=1; i<=NF; i++)
printf " $%d=<%s>\n", i, $i
print "----"
}
Then use either frawk -i csv or goawk -i csv to get the expected output. For example:
$ frawk -i csv -f records.awk file.csv
Record 1:
$1=<rec1, fld1>
$2=<>
$3=<rec1","fld3.1
",
fld3.2>
$4=<rec1
fld4>
----
Record 2:
$1=<rec2, fld1.1
fld1.2>
$2=<rec2 fld2.1"fld2.2"fld2.3>
$3=<>
$4=<rec2 fld4>
----
Record 3:
$1=<"">
$2=<"rec3,fld2">
$3=<>
----
$ goawk -i csv -f records.awk file.csv
Record 1:
... same as above ...
----

grep early stop with one match per pattern

Say I have a file where the patterns reside, e.g. patterns.txt. And I know that all the patterns will only be matched once in another file patterns_copy.txt, which in this case to make matters simple is just a copy of patterns.txt.
If I run
grep -m 1 --file=patterns.txt patterns_copy.txt > output.txt
I get only one line. I guess it's because the m flag stopped the whole matching process once the 1st line of the two files match.
What I would like to achieve is to have each pattern in patterns.txt matched only once, and then let grep move to the next pattern.
How do I achieve this?
Thanks.

Updated Answer
I have now had a chance to integrate what I was thinking about awk into the GNU Parallel concept.
I used /usr/share/dict/words as my patterns file and it has 235,000 lines in it. Using BenjaminW's code in another answer, it took 141 minutes, whereas this code gets that down to 11 minutes.
The difference here is that there are no temporary files and awk can stop once it has found all 8 of the things it was looking for...
#!/bin/bash
# Create a bash function that GNU Parallel can call to search for 8 things at once
doit() {
# echo Job: $9
# In following awk script, read "p1s" as a flag meaning "p1 has been seen"
awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" -v p5="$5" -v p6="$6" -v p7="$7" -v p8="$8" '
$0 ~ p1 && !p1s {print; p1s++;}
$0 ~ p2 && !p2s {print; p2s++;}
$0 ~ p3 && !p3s {print; p3s++;}
$0 ~ p4 && !p4s {print; p4s++;}
$0 ~ p5 && !p5s {print; p5s++;}
$0 ~ p6 && !p6s {print; p6s++;}
$0 ~ p7 && !p7s {print; p7s++;}
$0 ~ p8 && !p8s {print; p8s++;}
{if(p1s+p2s+p3s+p4s+p5s+p6s+p7s+p8s==8)exit}
' patterns.txt
}
export -f doit
# Next line effectively uses 8 cores at a time to each search for 8 items
parallel -N8 doit {1} {2} {3} {4} {5} {6} {7} {8} {#} < patterns.txt
Just for fun, here is what it does to my CPU - blue means maxed out, and see if you can see where the job started in the green CPU history!
Other Thoughts
The above benefits from the fact that the input files are relatively well sorted, so it is worth looking for 8 things at a time because they are likely close to each other in the input file, and I can therefore avoid the overhead associated with creating one process per sought term. However, if your data are not well sorted, that may mean that you waste a lot of time looking further through the file than necessary to find the next 7, or 6 other items. In that case, you may be better off with this:
parallel grep -m1 "{}" patterns.txt < patterns.txt
Original Answer
Having looked at the size of your files, I now think awk is probably not the way to go, but GNU Parallel maybe is. I tried parallelising the problem two ways.
Firstly, I search for 8 items at a time in a single pass through the input file so that I have less to search through with the second set of greps that use the -m 1 parameter.
Secondly, I do as many of these "8-at-a-time" greps in parallel as I have CPU cores.
I use the GNU Parallel job number {#} as a unique temporary filename, and only create 16 (or however many CPU cores you have) temporary files at a time. The temporary files are prefixed ss (for sub-search) so they can call be deleted easily enough when testing.
The speedup seems to be a factor of about 4 times on my machine. I used /usr/share/dict/words as my test files.
#!/bin/bash
# Create a bash function that GNU Parallel can call to search for 8 things at once
doit() {
# echo Job: $9
# Make a temp filename using GNU Parallel's job number which is $9 here
TEMP=ss-${9}.txt
grep -E "$1|$2|$3|$4|$5|$6|$7|$8" patterns.txt > $TEMP
for i in $1 $2 $3 $4 $5 $6 $7 $8; do
grep -m1 "$i" $TEMP
done
rm $TEMP
}
export -f doit
# Next line effectively uses 8 cores at a time to each search for 8 items
parallel -N8 doit {1} {2} {3} {4} {5} {6} {7} {8} {#} < patterns.txt

You can loop over your patterns like this (assuming you're using Bash):
while read -r line; do
grep -m 1 "$line" patterns_copy.txt
done < patterns.txt > output.txt
Or, in one line:
while read -r line; do grep -m 1 "$line" patterns_copy.txt; done < patterns.txt > output.txt
For parallel processing, you can start the processes as background jobs:
while read -r line; do
grep -m 1 "$line" patterns_copy.txt &
read -r line && grep -m 1 "$line" patterns_copy.txt &
# Repeat the previous line as desired
wait # Wait for greps of this loop to finish
done < patterns.txt > output.txt
This is not really elegant as for each loop it will wait for the slowest grep to finish, but should still be faster than just one grep per loop.

How to use grep via awk?

I am trying to apply grep on just a few strings from a huge file. But, I'd like to pass that line to the grep command via the awk script. I also want the output redirected to the script.
I've an awk script that reads in records from a file. I want grep to be applied on only a few of the records. The current record, $0, will be the text on which grep is to be used.
How do I do the same? Currently, I'm trying this -
system("grep --count -w 'GOOD' \n" $0)
But, it doesn't seem to work. What should I be using?

In Gnu Awk you could use \< and \> to match beginning and end of a word, so
gawk '/\<GOOD\>/{++i} END{print i}'
will do the same as
grep -wc 'GOOD' file
If you want to count the total number of occurances (not only the number of lines, but also occurances within a given line/record) of the word GOOD you could use FPAT in Gnu Awk version 4, as
gawk 'BEGIN { FPAT="\\<GOOD\\>"; RS="^$" } { print NF }' file
If you want to count the number of exact matches of the phrase GOOD DI in a given record, for instance record number 3, you could use
gawk 'NR==3 { print patsplit($0,a,/GOOD DI/) }' file

Your question is not very clear, and it would help if you showed some of your input file, your entire script that you have so far and also the output you want to achieve.
In the mean time, as there is nothing in your question to suggest anything to the contrary, you could do the following:
awk 'somescript' somefile | grep --count -w 'GOOD DI'

You cannot apply grep on a text string, which is what you are doing. If you really need to use grep/system something like following would be needed:
system("echo '"$0"' | grep --count -w 'foo'")
But this is no good either as count only counts lines on which it occurs not the number of times on a line which is what you are after. Or so it seems.
If you use the regex as a split seperator you get the number of split occurences +1.
So following will work:
awk '{printf FNR; a=split($0,myarray,/.OOD/); print " "a-1}' file.txt
This would print each linenumber with the number of times your regex occured. (in this case ".OOD". Representing GOOD, FOOD, MOOD etc)

you can do it the old fashion way
awk 'BEGIN{count=0} {
for( i=1;i<=NF; i++) {
if( $i == "GOOD" ){
++count
}
}
}END {
print count
}' file

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Awk match and for loop requirement - awk

Related

AWK:Convert columns to rows with condition (create list ) [duplicate]

How to AWK print only specific item?

AWK, Comma delimited fields enclosed in quotes [duplicate]

grep early stop with one match per pattern

How to use grep via awk?

Categories

Resources