grep early stop with one match per pattern - awk

Say I have a file where the patterns reside, e.g. patterns.txt. And I know that all the patterns will only be matched once in another file patterns_copy.txt, which in this case to make matters simple is just a copy of patterns.txt.
If I run
grep -m 1 --file=patterns.txt patterns_copy.txt > output.txt
I get only one line. I guess it's because the m flag stopped the whole matching process once the 1st line of the two files match.
What I would like to achieve is to have each pattern in patterns.txt matched only once, and then let grep move to the next pattern.
How do I achieve this?
Thanks.

Updated Answer
I have now had a chance to integrate what I was thinking about awk into the GNU Parallel concept.
I used /usr/share/dict/words as my patterns file and it has 235,000 lines in it. Using BenjaminW's code in another answer, it took 141 minutes, whereas this code gets that down to 11 minutes.
The difference here is that there are no temporary files and awk can stop once it has found all 8 of the things it was looking for...
#!/bin/bash
# Create a bash function that GNU Parallel can call to search for 8 things at once
doit() {
# echo Job: $9
# In following awk script, read "p1s" as a flag meaning "p1 has been seen"
awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" -v p5="$5" -v p6="$6" -v p7="$7" -v p8="$8" '
$0 ~ p1 && !p1s {print; p1s++;}
$0 ~ p2 && !p2s {print; p2s++;}
$0 ~ p3 && !p3s {print; p3s++;}
$0 ~ p4 && !p4s {print; p4s++;}
$0 ~ p5 && !p5s {print; p5s++;}
$0 ~ p6 && !p6s {print; p6s++;}
$0 ~ p7 && !p7s {print; p7s++;}
$0 ~ p8 && !p8s {print; p8s++;}
{if(p1s+p2s+p3s+p4s+p5s+p6s+p7s+p8s==8)exit}
' patterns.txt
}
export -f doit
# Next line effectively uses 8 cores at a time to each search for 8 items
parallel -N8 doit {1} {2} {3} {4} {5} {6} {7} {8} {#} < patterns.txt
Just for fun, here is what it does to my CPU - blue means maxed out, and see if you can see where the job started in the green CPU history!
Other Thoughts
The above benefits from the fact that the input files are relatively well sorted, so it is worth looking for 8 things at a time because they are likely close to each other in the input file, and I can therefore avoid the overhead associated with creating one process per sought term. However, if your data are not well sorted, that may mean that you waste a lot of time looking further through the file than necessary to find the next 7, or 6 other items. In that case, you may be better off with this:
parallel grep -m1 "{}" patterns.txt < patterns.txt
Original Answer
Having looked at the size of your files, I now think awk is probably not the way to go, but GNU Parallel maybe is. I tried parallelising the problem two ways.
Firstly, I search for 8 items at a time in a single pass through the input file so that I have less to search through with the second set of greps that use the -m 1 parameter.
Secondly, I do as many of these "8-at-a-time" greps in parallel as I have CPU cores.
I use the GNU Parallel job number {#} as a unique temporary filename, and only create 16 (or however many CPU cores you have) temporary files at a time. The temporary files are prefixed ss (for sub-search) so they can call be deleted easily enough when testing.
The speedup seems to be a factor of about 4 times on my machine. I used /usr/share/dict/words as my test files.
#!/bin/bash
# Create a bash function that GNU Parallel can call to search for 8 things at once
doit() {
# echo Job: $9
# Make a temp filename using GNU Parallel's job number which is $9 here
TEMP=ss-${9}.txt
grep -E "$1|$2|$3|$4|$5|$6|$7|$8" patterns.txt > $TEMP
for i in $1 $2 $3 $4 $5 $6 $7 $8; do
grep -m1 "$i" $TEMP
done
rm $TEMP
}
export -f doit
# Next line effectively uses 8 cores at a time to each search for 8 items
parallel -N8 doit {1} {2} {3} {4} {5} {6} {7} {8} {#} < patterns.txt

You can loop over your patterns like this (assuming you're using Bash):
while read -r line; do
grep -m 1 "$line" patterns_copy.txt
done < patterns.txt > output.txt
Or, in one line:
while read -r line; do grep -m 1 "$line" patterns_copy.txt; done < patterns.txt > output.txt
For parallel processing, you can start the processes as background jobs:
while read -r line; do
grep -m 1 "$line" patterns_copy.txt &
read -r line && grep -m 1 "$line" patterns_copy.txt &
# Repeat the previous line as desired
wait # Wait for greps of this loop to finish
done < patterns.txt > output.txt
This is not really elegant as for each loop it will wait for the slowest grep to finish, but should still be faster than just one grep per loop.

Related

Extracting data after a tag and CR with Busybox sed

I have a script that extracts a file from a bash script combined with a binary file. It does so using the following GNU sed syntax
sed -n '/__DATA__/{n;:1;n;p;b1}' /tmp/combined.file > /tmp/binary.file
The files are assembled by cat'ing an ISO file to the end of a bash script. Which is then sent over the network to an embedded device and extracted on the device, piping the ISO file to a temporary dir and executing the bash script to install it.
However, on executing this I get a
sed: unterminated {
Am I missing something here? Is this task possible with BusyBox sed?
It tried the "Second attempt" below with OSX/BSD awk and it failed, just printing up til the first NUL character. So you can't do this job portably with awk or sed.
Here's what should work everywhere given that the POSIX standard says
the input file to tail can be any type
so the input to tail doesn't have to be a POSIX text file (no NULs) and we're exiting from awk before the first NUL is encountered in the input so they should both be happy:
$ tail -n +"$(awk '/^__DATA__$/{print NR+2; exit}' binary.bin)" binary.bin | cat -ev
ER^H^#^#^#M-^PM-^P^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#3��M-^Nռ^#|��f1�f1�fSfQ^FWM-^N�M-^N�R�^#|�^#^F�^#^A��K^F^#^#R�A��U1�0���^Sr^VM-^A�U�u^PM-^C�^At^Kf�^F�^F�B�^U�^B1�ZQ�^H�^S[^O��#PM-^C�?Q��SRP�^#|�^D^#f��^G�D^#^OM-^BM-^#^#f#M-^#�^B��fM-^A>#|��xpu ��{�D|^#^#�M-^C^#isolinux.bin missing or corrupt.^M$
f`f1�f^C^F�{f^S^V�{fRfP^FSj^Aj^PM-^I�f�6�{��^FM-^H�M-^H�M-^R�6�{M-^H�^H�A�^A^BM-^J^V�{�^SM-^Md^Pfa��^^^#Operating system load error.^M$
^��^NM-^J>b^D�^G�^P<$
u��^X���^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#L^D^#^#^#^#^#^#�K�6^#^#M-^#^#^A^#^#?�M-^K^#^#^#^#^#`^\^#^#�������<R^#^#^#^_^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#U�EFI PART^#^#^A^#\^#^#^#]3�.^#^#^#^#^A^#^#^#^#^#^#^#�_^\^#^#^#^#^##^#^#^#^#^#^#^#�_^\^#^#^#^#^#Uc�r^Oqc#M-^Rc^F�$LZ�^L^#^#^#^#^#^#^#�^#^#^#M-^#^#^#^#�t^]F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#$
Second attempt:
Now that I have a better idea what you're trying to do (process a file consisting of POSIX text lines up to a point and then can contain NUL characters afterwards), try this:
$ cat -ev file
echo "I: Installation finished!"$
exit 0$
$
__DATA__$
$
foo^#bar^#etc
$ cat tst.awk
/^__DATA__$/ { n=NR + 1 }
n && (NR == n) { RS="\0"; ORS="" }
n && (NR > n) { print (c++ ? RS : "") $0 }
$ awk -f tst.awk file | cat -ev
foo^#bar^#etc
The above doesn't try to store any input lines containing NUL in memory, instead it reads \n-terminated text lines until it reaches the line after the one containing __DATA__ and then switches to reading NUL-terminated records into memory and printing NULs between them on output.
It's still undefined behavior per POSIX (see my comments below) but in theory it should work since it just relies on being able to set one variable (RS) to NUL rather than trying to store input strings that contain NULs. Also, setting RS to NUL has been a (flawed) workaround for awk scripts for years to be able to read a whole file into memory at once so being able to set RS to NUL should work in any modern awk.
Using the new sample you provided with the missing blank line after the __DATA__ line added:
$ cat -ev file
#!/bin/bash$
$
echo "I: Awesome Things happened here"$
exit 0$
$
__DATA__$
$
ER^H^#^#^#M-^PM-^P^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#3M-mM-zM-^NM-UM-<^#|M-{M-|f1M-[f1M-IfSfQ^FWM-^NM-]M-^NM-ERM->^#|M-?^#^FM-9^#^AM-sM-%M-jK^F^#^#RM-4AM-;M-*U1M-I0M-vM-yM-M^Sr^VM-^AM-{UM-*u^PM-^CM-a^At^KfM-G^FM-s^FM-4BM-k^UM-k^B1M-IZQM-4^HM-M^S[^OM-6M-F#PM-^CM-a?QM-wM-aSRPM-;^#|M-9^D^#fM-!M-0^GM-hD^#^OM-^BM-^#^#f#M-^#M-G^BM-bM-rfM-^A>#|M-{M-#xpu M-zM-<M-l{M-jD|^#^#M-hM-^C^#isolinux.bin missing or corrupt.^M$
f`f1M-Rf^C^FM-x{f^S^VM-|{fRfP^FSj^Aj^PM-^IM-ffM-w6M-h{M-#M-d^FM-^HM-aM-^HM-EM-^RM-v6M-n{M-^HM-F^HM-aAM-8^A^BM-^J^VM-r{M-M^SM-^Md^PfaM-CM-h^^^#Operating system load error.^M$
^M-,M-4^NM-^J>b^DM-3^GM-M^P<$
uM-qM-M^XM-tM-kM-}^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#L^D^#^#^#^#^#^#M-/KM-66^#^#M-^#^#^A^#^#?M-`M-^K^#^#^#^#^#`^\^#^#M-~M-^?M-^?M-oM-~M-^?M-^?<R^#^#^#^_^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#UM-*EFI PART^#^#^A^#\^#^#^#]3M-%.^#^#^#^#^A^#^#^#^#^#^#^#M-^?_^\^#^#^#^#^##^#^#^#^#^#^#^#M-J_^\^#^#^#^#^#UcM-)r^Oqc#M-^Rc^FM-2$LZM-p^L^#^#^#^#^#^#^#M-P^#^#^#M-^#^#^#^#M-{t^]F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#$
.
$ awk -f tst.awk file | cat -ev
ER^H^#^#^#M-^PM-^P^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#3M-mM-zM-^NM-UM-<^#|M-{M-|f1M-[f1M-IfSfQ^FWM-^NM-]M-^NM-ERM->^#|M-?^#^FM-9^#^AM-sM-%M-jK^F^#^#RM-4AM-;M-*U1M-I0M-vM-yM-M^Sr^VM-^AM-{UM-*u^PM-^CM-a^At^KfM-G^FM-s^FM-4BM-k^UM-k^B1M-IZQM-4^HM-M^S[^OM-6M-F#PM-^CM-a?QM-wM-aSRPM-;^#|M-9^D^#fM-!M-0^GM-hD^#^OM-^BM-^#^#f#M-^#M-G^BM-bM-rfM-^A>#|M-{M-#xpu M-zM-<M-l{M-jD|^#^#M-hM-^C^#isolinux.bin missing or corrupt.^M$
f`f1M-Rf^C^FM-x{f^S^VM-|{fRfP^FSj^Aj^PM-^IM-ffM-w6M-h{M-#M-d^FM-^HM-aM-^HM-EM-^RM-v6M-n{M-^HM-F^HM-aAM-8^A^BM-^J^VM-r{M-M^SM-^Md^PfaM-CM-h^^^#Operating system load error.^M$
^M-,M-4^NM-^J>b^DM-3^GM-M^P<$
uM-qM-M^XM-tM-kM-}^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#L^D^#^#^#^#^#^#M-/KM-66^#^#M-^#^#^A^#^#?M-`M-^K^#^#^#^#^#`^\^#^#M-~M-^?M-^?M-oM-~M-^?M-^?<R^#^#^#^_^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#UM-*EFI PART^#^#^A^#\^#^#^#]3M-%.^#^#^#^#^A^#^#^#^#^#^#^#M-^?_^\^#^#^#^#^##^#^#^#^#^#^#^#M-J_^\^#^#^#^#^#UcM-)r^Oqc#M-^Rc^FM-2$LZM-p^L^#^#^#^#^#^#^#M-P^#^#^#M-^#^#^#^#M-{t^]F^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#$
Original answer:
Assuming this question is related to your previous question, this will work using any awk in any shell on every UNIX box:
$ awk '/^__DATA__$/{n=NR+1} n && NR>n' file
3<ED>M-^PM-^PM-^PM-^PM-^
When it finds __DATA__ it sets a variable n to the line number to start printing after and then when n is set prints every line for which the line number is greater than n.
The above was run against this input file from your previous question:
$ cat -ev file
echo "I: Installation finished!"$
exit 0$
$
__DATA__$
$
3<ED>M-^PM-^PM-^PM-^PM-^$

How can i repeat a script?

i've search for command or solution to repeat a script after n times but i can't find it.
This is my rusty script:
#!/bin/csh -f
rm -rf result120
rm -rf result127
rm -rf result126
rm -rf result125
rm -rf result128
rm -rf result129
rm -rf result122
rm -rf output
rm -rf aaa
### Get job id from user name
foreach file ( `cat name` )
echo `bjobs -u $file | awk '$1 ~ /^[0-9]+/ {print $1}' >> aaa`
echo "loading"
end
### Read in job id
foreach file ( `cat aaa` )
echo `bjobs -l $file >> result120`
echo "loading"
end
### Get pattern in < >
awk '{\
gsub(/ /,"",$0)}\
BEGIN {\
RS =""\
FS=","\
}\
{\
s=1\
e=150\
if ($1 ~/Job/){\
for(i=s;i<=e;i++){\
printf("%s", $(i))}\
}\
}' result120 > result126
grep -oE '<[^>]+>' result126 > result125
### Get Current Work Location
awk '$1 ~ /<lsf_login..>/ {getline; print $1}' result125 >result122 #result127
### Get another information and paste it with CWD
foreach file1 ( `cat aaa` )
echo `bjobs $file1 >> result128`
echo "getting data"
end
awk '$1 ~ /JOBID/ {getline; printf "%-15s %-15s %-15s %-15s %-20s\n", $1, $2, $3, $4, $5}' result128 >> result129
paste result129 result122 >> output
### Summary
awk '{count1[$2]++}{count2[$4]++}{count3[$3]++}\
END{\
print "\n"\
print "##########################################################################"\
print "There are: ", NR " Jobs"\
for(name in count1){ print name, count1[name]}\
print "\n"\
for(queqe in count2){ print queqe, count2[queqe]}\
print "\n"\
for(stt in count3){ print stt, count3[stt]}\
}' output >> output
And my desire is run it again per 15 minutes to get report. Someone told me use Wait but i've searched for it in man wait and can't find any
useful example. That's why i need yours help to solve this problem.
Thanks a lot.
run the script every 15 mins
while true; do ./script.sh; sleep 900; done
or set a cron job or use watch
For c shell you have to write
while (1)
./script.sh
sleep 900
end
but why use csh since you have bash? Double check the syntax, since I don't remember it much anymore...
Following #karakfa answer, you have basically 2 options.
1) Your first option, even if you use a sleep implements a kind of busy-waiting strategy (https://en.wikipedia.org/wiki/Busy_waiting), this stragegy uses more CPU/memory than your second option (the cron approach) because you will have in memory your processus footprint even if it is actually doing nothing.
2) On the other hand, in the cron approach your processus will only appear while doing useful activities.
Just Imagine if you implement this kind of approach for many programs running on your machine, a lot of memory will be consume by processus in waiting states, it will also have an impact (memory/CPU usage) on the scheduling algorithm of your OS since it will have more processes in queue to manage.
Therefore, I would absolutely recommend the cron/scheduling approach.
Anyway,your cron daemon will be running in background whether you add the entry or not in the crontab, so why not adding it?
Last but not least, imagine if your busy-waiting processus is killed for any reason, if you go for the first option you will need to restart it manually and you might lose a couple of monitoring entries.
Hope it helps you.

Awk match and for loop requirement

I'm struggling to understand what I'm doing wrong here. The output gets repeated the number of times a disk is seen:
Parted -l |
awk -F '[ :] '/^Disk \//{print $2}' |
while read LINE; do
blkid $LINE1 |
awk -F ' ' '{
for(i=1;i<=NF;i++) {
print $i;
if($i ~ /UUID*/) {
print $i;
exit
}
}
}';
done
If I have three disks in my system e.g. /dev/sda1 /dev/sdb1 /dev/sdc1
Input to above set of commands:
/dev/sda1: LABEL="/" UUID="feg45687-576466-676445-6536-7534-5784326797fbfg" TYPE="ext4"
Current output from above commands:
UUID="feg45687-576466-676445-6536-7534-5784326797fbfg"
UUID="tug45687-576466-676445-6536-7534-68546df"
UUID="feg45687-576466-676445-6536-7534-9908536lk"
UUID="feg45687-576466-676445-6536-7534-5784326797fbfg"
UUID="tug45687-576466-676445-6536-7534-68546df"
UUID="feg45687-576466-676445-6536-7534-9908536lk"
UUID="feg45687-576466-676445-6536-7534-5784326797fbfg"
UUID="tug45687-576466-676445-6536-7534-68546df"
UUID="feg45687-576466-676445-6536-7534-9908536lk"
What is expected output:
UUID="feg45687-576466-676445-6536-7534-5784326797fbfg"
UUID="tug45687-576466-676445-6536-7534-68546df"
UUID="feg45687-576466-676445-6536-7534-9908536lk"
So if I have three disks i get the same UUID line repeated three times.
Any way of moving to the next LINE without repeating?
Your problem (or, at least, one of your problems) is probably in the lines:
while read LINE; do
blkid $LINE1 |
You read into $LINE, but you reference $LINE1, an unrelated and probably uninitialized variable, which probably leads blkid to list all devices each time.
Fix: choose one name or the other — don't use both.

Is there a way to create an awk input inactivity timer?

I have a text source (a log file), which gets new lines appended to it by some third party.
I can output the additions to my source file using tail -f source. I can then pipe that through an awk script awk -f parser.awk to parse and format the output.
My question is: while tail -f source | awk -f parser.awk is running, is there a way to call function foo() inside my parser.awk script every time there is more than 5 seconds elapsed without anything coming through the pipe into the standard input of the awk script?
Edit: Currently using GNU Awk 3.1.6. May be able to upgrade to newer version if required.
If your shell's read supports -t and -u, here's an ugly hack:
{ echo hello; sleep 6; echo world; } | awk 'BEGIN{
while( "while read -t 5 -u 3 line; do echo \"$line\"; done" | getline > 0 )
print
}' 3<&0
You can replace the print in the body of the while loop with your script. However, it would probably make a lot more sense to put the read timeout between tail and awk in the pipeline, and it would make even more sense to re-implement tail to timeout.
Not exactly the answer to your question. However there is a little hack in shell that can do practically what you want:
{ tail -f log.file >&2 | { while : ; do sleep 5; echo SECRET_PHRASE ; done ; } ; } 2>&1 | awk -f script.awk
When awk receives SECRET_PHRASE it will run foo function every 5 seconds. Unfortunately is will run it every 5 second even in case there was some output during this time from tail.
ps. You can replace '{}' with '()' and vice versa. In the first case it won't create subshell, in the second one it will.
The another way is to append this secret phrase dirctly to log file in case nobody wrote there during last five seconds. But looks like it's not good idea due to you will have spoiled log file.

Problem with awk and grep

I am using the following script to get the running process to print the id, command..
if [ "`uname`" = "SunOS" ]
then
awk_c="nawk"
ps_d="/usr/ucb/"
time_parameter=7
else
awk_c="awk"
ps_d=""
time_parameter=5
fi
main_class=RiskEngine
connection_string=db.regression
AWK_CMD='BEGIN{printf "%-15s %-6s %-8s %s\n","ID","PID","STIME","Cmd"} {printf "%-15s %-6s %-8s %s %s %s\n","MY_APP",$2,$time_parameter, main_class, connection_string, port}'
while getopts ":pnh" opt; do
case $opt in
p) AWK_CMD='{ print $2 }'
do_print_message=1;;
n) AWK_CMD='{printf "%-15s %-6s %-8s %s %s %s\n","MY_APP",$2,$time_parameter,main_class, connection_string, port}' ;;
h) print "usage : `basename ${0}` {-p} {-n} : Returns details of process running "
print " -p : Returns a list of PIDS"
print " -n : Returns process list without preceding header"
exit 1 ;
esac
done
ps auxwww | grep $main_class | grep 10348 | grep -v grep | ${awk_c} -v main_class=$merlin_main_class -v connection_string=$merlin_connection_
string -v port=10348 -v time_parameter=$time_parameter "$AWK_CMD"
# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 6)
# uname -a
Linux deapp25v 2.6.9-67.0.4.EL #1 Fri Jan 18 04:49:54 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
When I am executing the following from the script independently or inside script
# ps auxwww | grep $main_class | grep 10348 | grep -v grep | ${awk_c} -v main_class=$merlin_main_class -v connection_string=$merlin_connection_string -v port=10348 -v time_parameter=$time_parameter "$AWK_CMD"
I get two rows on Linux:
ID PID STIME Cmd
MY_APP 6217 2355352 RiskEngine 10348
MY_APP 21874 5316 RiskEngine 10348
I just have one jvm (Java command) running in the background but still I see 2 rows.
I know one of them (Duplicate with pid 21874) comes from awk command that I am executing. It includes again the main class and the port so two rows. Can you please help me to avoid the one that is duplicate row?
Can you please help me?
AWK can do all that grepping for you.
Here is a simple example of how an AWK command can be selective:
ps auxww | awk -v select="$mainclass" '$0 ~ select && /10348/ && ! (/grep/ || /awk/) && {print}'
ps can be made to selectively output fields which will help a little to reduce false positives. However pgrep may be more useful to you since all you're really using is the PID from the result.
pgrep -f "$mainclass.*10348"
I've reformatted the code as code, but you need to learn that the return key is your friend. The monstrously long pipelines should be split over multiple lines - I typically use one line per command in the pipeline. You can also write awk scripts on more than one line. This makes your code more readable.
Then you need to explain to us what you are up to.
However, it is likely that you are using 'awk' as a variant on grep and are finding that the value 10348 (possibly intended as a port number on some command line) is also in the output of ps as one of the arguments to awk (as is the 'main_class' value), so you get the extra information. You'll need to revise the awk script to eliminate (ignore) the line that contains 'awk'.
Note that you could still be bamboozled by a command running your main class on port 9999 (any value other than 10348) if it so happens that it is run by a process with PID or PPID equal to 10348. If you're going to do the job thoroughly, then the 'awk' script needs to analyze only the 'command plus options' part of the line.
You're already using the grep -v grep trick in your code, why not just update it to exclude the awk process as well with grep -v ${awk_c}?
In other words, the last line of your script would be (on one line and with the real command parameters to awk rather than blah blah blah).:
ps auxwww
| grep $main_class
| grep 10348
| grep -v grep
| grep -v ${awk_c}
| ${awk_c} -v blah blah blah
This will ensure the list of processes will not containg any with the word awk in it.
Keep in mind that it's not always a good idea to do it this way (false positives) but, since you're already taking the risk with processes containing grep, you may as well do so with those containing awk as well.
You can add this simple code in front of all your awk args:
'!/awk/ { .... original awk code .... }'
The '!/awk/' will have the effect of telling awk to ignore any line containing the string awk.
You could also remove your 'grep -v' if you extended my awk suggestion into something like:
'!/awk/ && !/grep/ { ... original awk code ... }'.