How can I repeat a script? - awk

I've searched for a command or a solution to repeat a script every n minutes, but I can't find one.
This is my rusty script:
#!/bin/csh -f
rm -rf result120
rm -rf result127
rm -rf result126
rm -rf result125
rm -rf result128
rm -rf result129
rm -rf result122
rm -rf output
rm -rf aaa
### Get job id from user name
foreach file ( `cat name` )
bjobs -u $file | awk '$1 ~ /^[0-9]+/ {print $1}' >> aaa
echo "loading"
end
### Read in job id
foreach file ( `cat aaa` )
bjobs -l $file >> result120
echo "loading"
end
### Get pattern in < >
awk 'BEGIN {\
RS = ""\
FS = ","\
}\
{\
gsub(/ /, "", $0)\
if ($1 ~ /Job/) {\
for (i = 1; i <= 150; i++)\
printf("%s", $i)\
}\
}' result120 > result126
grep -oE '<[^>]+>' result126 > result125
### Get Current Work Location
awk '$1 ~ /<lsf_login..>/ {getline; print $1}' result125 > result122
### Get another information and paste it with CWD
foreach file1 ( `cat aaa` )
bjobs $file1 >> result128
echo "getting data"
end
awk '$1 ~ /JOBID/ {getline; printf "%-15s %-15s %-15s %-15s %-20s\n", $1, $2, $3, $4, $5}' result128 >> result129
paste result129 result122 >> output
### Summary
awk '{count1[$2]++; count2[$4]++; count3[$3]++}\
END {\
print "\n"\
print "##########################################################################"\
print "There are: " NR " Jobs"\
for (name in count1) print name, count1[name]\
print "\n"\
for (queue in count2) print queue, count2[queue]\
print "\n"\
for (stt in count3) print stt, count3[stt]\
}' output >> output
What I want is to run it again every 15 minutes to get a report. Someone told me to use wait, but I've searched man wait and can't find any
useful example. That's why I need your help to solve this problem.
Thanks a lot.

Run the script every 15 minutes:
while true; do ./script.sh; sleep 900; done
or set up a cron job, or use watch.
For the C shell you would have to write:
while (1)
./script.sh
sleep 900
end
but why use csh when you have bash? Double-check the syntax, as I don't remember it well anymore...
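If you go the cron route instead, a crontab entry like this one runs the script every 15 minutes (a sketch; /path/to/script.csh and /path/to/report.log are placeholder paths):
*/15 * * * * /path/to/script.csh >> /path/to/report.log 2>&1
And watch re-runs a command at a fixed interval in the foreground, e.g. every 900 seconds:
watch -n 900 /path/to/script.csh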

Following @karakfa's answer, you basically have two options.
1) The first option, even though it sleeps between runs, implements a kind of busy-waiting strategy (https://en.wikipedia.org/wiki/Busy_waiting). This strategy uses more resources than your second option (the cron approach), because the process footprint stays in memory even while it is doing nothing.
2) With the cron approach, on the other hand, your process only exists while it is doing useful work.
Imagine implementing this kind of approach for many programs running on your machine: a lot of memory would be consumed by processes in waiting states, and it would also have an impact (memory/CPU usage) on your OS's scheduling algorithm, since it would have more processes to manage in its queues.
Therefore, I would absolutely recommend the cron/scheduling approach.
In any case, the cron daemon will be running in the background whether you add the entry to the crontab or not, so why not add it?
Last but not least, imagine your looping process is killed for any reason: with the first option you would need to restart it manually, and you might lose a couple of monitoring entries.
Hope this helps.

Related

Change a string using sed or awk

I have some files whose time and date are wrong, but the filename contains the correct time and date, so I'm trying to write a script that fixes this with the touch command.
Example of a filename:
071212_090537.jpg
I would like this to be converted to the following format:
1712120905.37
Note: the year is listed as 07 in the filename even though it is 17, so I would like the first 0 to be changed to 1.
How can I do this using awk or sed?
I'm quite new to awk and sed, and to programming in general. I've tried to search for a solution, but haven't managed to figure out how to solve this.
Can anyone help me?
Thanks. :)
Take your example:
awk -F'[_.]' '{$0=$1$2;sub(/^./,"1");sub(/..$/,".&")}1'<<<"071212_090537.jpg"
will output:
1712120905.37
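Since the stated goal is to fix the files' timestamps, here is a minimal sketch that feeds the converted stamp to touch -t, which accepts the [[CC]YY]MMDDhhmm[.ss] format (the conversion is the awk above; the loop and variable names are my own):
for f in *.jpg; do
    ts=$(awk -F'[_.]' '{$0=$1$2; sub(/^./,"1"); sub(/..$/,".&")} 1' <<< "$f")
    touch -t "$ts" "$f"   # set the file's modification time from its name
done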
If you want the files to be renamed as well, you can let awk generate a mv old new command for each file and pipe the output to sh, like: (comments inline)
listYourFiles |   # list your files as input to awk (placeholder command)
awk -F'[_.]' '{o=$0; $0=$1$2; sub(/^./,"1"); sub(/..$/,".&");
printf "mv %s %s\n", o, $0}' |   # this prints "mv old new" for each file
sh   # this executes the mv commands
It's completely unnecessary to call awk or sed for this; you can do it in your shell, e.g. with bash:
$ f='071212_090537.jpg'
$ [[ $f =~ ^.(.*)_(.*)(..)\.[^.]+$ ]]
$ echo "1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
1712120905.37
This is probably what you're trying to do:
for old in *.jpg; do
[[ $old =~ ^.(.*)_(.*)(..)\.[^.]+$ ]] || { printf 'Warning, unexpected old file name format "%s"\n' "$old" >&2; continue; }
new="1${BASH_REMATCH[1]}${BASH_REMATCH[2]}.${BASH_REMATCH[3]}"
[[ -f "$new" ]] && { printf 'Warning, new file name "%s" generated from "%s" already exists, skipping.\n' "$new" "$old" >&2; continue; }
mv -- "$old" "$new"
done
You need that test for new already existing, since an old of 071212_090537.jpg or 171212_090537.jpg (or various other values) would create the same new of 1712120905.37.
I think sed really is the easiest solution. You could do this:
▶ for f in *.jpg ; do
new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< "$f")
mv -- "$f" "$new_f"
done
For more info:
You probably need to read an introductory tutorial on regular expressions.
Note that the -E option to sed enables extended regular expressions, which allow a more readable and convenient expression here.
Use of <<< is a Bashism known as a "here-string". If you are using a shell that doesn't support it, cmd <<< "$b" can be rewritten as echo "$b" | cmd.
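For example, the substitution above can be rewritten without the here-string as:
new_f=$(echo "$f" | sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/')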
Testing:
▶ touch 071212_090538.jpg 071212_090539.jpg
▶ ls -1 *.jpg
071212_090538.jpg
071212_090539.jpg
▶ for f in *.jpg ; do
new_f=$(sed -E 's/([0-9]{6})_([0-9]{4})([0-9]{2})\.jpg/\1\2.\3.jpg/' <<< "$f")
mv -- "$f" "$new_f"
done
▶ ls -1
0712120905.38.jpg
0712120905.39.jpg

parallelize awk script - files splitting

I have a small awk script which takes input from a stream and writes to the appropriate file based on the second column value. Here is how it goes:
cat mydir/*.csv | awk -F, '{if(NF==29)print $0 >> "output/"$2".csv"}'
How do I parallelize it, so that it can use multiple cores available in the machine? Right now, this is running on a single core.
You can try this:
I execute one awk per source file. Each process writes to its own series of temporary files, to avoid conflicts on the same final file and/or too many open/close operations on it. At the end of each awk, the temporary files' contents are appended to the final files (the printf "" | Command trick opens a pipe to a shell, which runs the cat-and-remove command) and the temporary files are removed.
You may have to add a batch limiter (a sleep, or smarter grouping) if there are lots of files to treat, to avoid killing the machine with too many concurrent subprocesses.
rm output/*.csv
for File in mydir/*.csv
do
    # shell subprocess
    {
    # basename of the source file, used to make this process's temporary files unique
    FileRef="${File##*/}"
    awk -F ',' -v FR="${FileRef}" '
        NF == 29 {
            # write the line to a temporary file unique to this source file
            ListFiles[ OutTemp = "output/" $2 ".csv_" FR ] = "output/" $2 ".csv"
            print > OutTemp
            }
        END {
            # append each temporary file to its final file, then remove it
            for ( TempFile in ListFiles ) {
                close( TempFile )
                Command = sprintf( "cat \042%s\042 >> \042%s\042; rm \042%s\042", \
                                   TempFile, ListFiles[TempFile], TempFile )
                printf "" | Command
                close( Command )
                }
            }
        ' "$File"
    } &
done
wait
ls -l output/*.csv
Untested:
do_one() {
# Make a workdir only used by this process to ensure no files are added to in parallel
mkdir -p $1
cd $1
cat ../"$2" | awk -F, '{if(NF==29)print $0 >> $2".csv"}'
}
export -f do_one
parallel do_one workdir-{%} {} ::: mydir/*.csv
ls workdir-*/ | sort -u |
parallel 'cat workdir*/{} > output/{}'
rm -rf workdir-*
If you want to avoid the extra cat you can use this instead, though I find the cat version easier to read (performance is normally the same on modern systems http://oletange.blogspot.com/2013/10/useless-use-of-cat.html):
do_one() {
# Make a workdir only used by this process to ensure no files are added to in parallel
mkdir -p $1
cd $1
awk -F, <../"$2" '{if(NF==29)print $0 >> $2".csv"}'
}
export -f do_one
parallel do_one workdir-{%} {} ::: mydir/*.csv
ls workdir-*/ | sort -u |
parallel 'cat workdir*/{} > output/{}'
rm -rf workdir-*
But as @Thor writes, you are most likely I/O starved.

grep early stop with one match per pattern

Say I have a file where the patterns reside, e.g. patterns.txt. And I know that all the patterns will only be matched once in another file patterns_copy.txt, which in this case to make matters simple is just a copy of patterns.txt.
If I run
grep -m 1 --file=patterns.txt patterns_copy.txt > output.txt
I get only one line. I guess that's because the -m flag stopped the whole matching process as soon as the first pattern matched.
What I would like to achieve is to have each pattern in patterns.txt matched only once, and then let grep move to the next pattern.
How do I achieve this?
Thanks.
Updated Answer
I have now had a chance to integrate what I was thinking about awk into the GNU Parallel concept.
I used /usr/share/dict/words as my patterns file and it has 235,000 lines in it. Using BenjaminW's code in another answer, it took 141 minutes, whereas this code gets that down to 11 minutes.
The difference here is that there are no temporary files and awk can stop once it has found all 8 of the things it was looking for...
#!/bin/bash
# Create a bash function that GNU Parallel can call to search for 8 things at once
doit() {
# echo Job: $9
# In following awk script, read "p1s" as a flag meaning "p1 has been seen"
awk -v p1="$1" -v p2="$2" -v p3="$3" -v p4="$4" -v p5="$5" -v p6="$6" -v p7="$7" -v p8="$8" '
$0 ~ p1 && !p1s {print; p1s++;}
$0 ~ p2 && !p2s {print; p2s++;}
$0 ~ p3 && !p3s {print; p3s++;}
$0 ~ p4 && !p4s {print; p4s++;}
$0 ~ p5 && !p5s {print; p5s++;}
$0 ~ p6 && !p6s {print; p6s++;}
$0 ~ p7 && !p7s {print; p7s++;}
$0 ~ p8 && !p8s {print; p8s++;}
{if(p1s+p2s+p3s+p4s+p5s+p6s+p7s+p8s==8)exit}
' patterns.txt
}
export -f doit
# Next line effectively uses 8 cores at a time to each search for 8 items
parallel -N8 doit {1} {2} {3} {4} {5} {6} {7} {8} {#} < patterns.txt
Just for fun, here is what it does to my CPU - blue means maxed out, and see if you can see where the job started in the green CPU history!
Other Thoughts
The above benefits from the fact that the input files are relatively well sorted, so it is worth looking for 8 things at a time: they are likely to be close to each other in the input file, and I can therefore avoid the overhead of creating one process per sought term. However, if your data are not well sorted, you may waste a lot of time looking further through the file than necessary to find the next 7, 6, ... other items. In that case, you may be better off with this:
parallel grep -m1 "{}" patterns.txt < patterns.txt
Original Answer
Having looked at the size of your files, I now think awk is probably not the way to go, but GNU Parallel maybe is. I tried parallelising the problem two ways.
Firstly, I search for 8 items at a time in a single pass through the input file so that I have less to search through with the second set of greps that use the -m 1 parameter.
Secondly, I do as many of these "8-at-a-time" greps in parallel as I have CPU cores.
I use the GNU Parallel job number {#} as a unique temporary filename, and only create 16 (or however many CPU cores you have) temporary files at a time. The temporary files are prefixed ss (for sub-search) so they can all be deleted easily enough when testing.
The speedup seems to be a factor of about 4 times on my machine. I used /usr/share/dict/words as my test files.
#!/bin/bash
# Create a bash function that GNU Parallel can call to search for 8 things at once
doit() {
# echo Job: $9
# Make a temp filename using GNU Parallel's job number which is $9 here
TEMP=ss-${9}.txt
grep -E "$1|$2|$3|$4|$5|$6|$7|$8" patterns.txt > $TEMP
for i in $1 $2 $3 $4 $5 $6 $7 $8; do
grep -m1 "$i" $TEMP
done
rm $TEMP
}
export -f doit
# Next line effectively uses 8 cores at a time to each search for 8 items
parallel -N8 doit {1} {2} {3} {4} {5} {6} {7} {8} {#} < patterns.txt
You can loop over your patterns like this (assuming you're using Bash):
while read -r line; do
grep -m 1 "$line" patterns_copy.txt
done < patterns.txt > output.txt
Or, in one line:
while read -r line; do grep -m 1 "$line" patterns_copy.txt; done < patterns.txt > output.txt
For parallel processing, you can start the processes as background jobs:
while read -r line; do
grep -m 1 "$line" patterns_copy.txt &
read -r line && grep -m 1 "$line" patterns_copy.txt &
# Repeat the previous line as desired
wait # Wait for greps of this loop to finish
done < patterns.txt > output.txt
This is not really elegant, as each iteration waits for the slowest grep to finish, but it should still be faster than just one grep per loop.
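If GNU xargs is available, another way to bound the number of concurrent greps is its -P option (a sketch; note that output lines from concurrent greps may interleave):
# one grep per pattern, at most 8 running at once; -m 1 lets each grep stop early
xargs -a patterns.txt -d '\n' -P 8 -I{} grep -m 1 -- {} patterns_copy.txt > output.txt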

Quoting a complex awk program in tmux.conf

A user on Freenode #tmux asked:
How can we properly escape this shell command using GNU awk for set -g tmux status-right?
sensors | awk '/^Physical id 0:/ { s = $4; sub(/^\+/, "", s); print s; exit }'
The result should be 45.0°C.
Also, how can we make it update every 30 seconds?
The output of sensors:
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +45.0°C (high = +80.0°C, crit = +100.0°C)
...
Setting status-right
Quoting with shell command #( ) in tmux
Quoting is complex in tmux #( ) because the contents are evaluated twice.
For this reason let's simplify the gawk program to:
sensors | awk '/^Physical id 0:/ { sub(/^\+/, "", $4); print $4; exit }'
Now we plug it into .tmux.conf:
set-option -g status-right "#( sensors | awk \\' /Physical id 0:/ { sub\\(/\+/,\"\",$4\\); print \$4; exit } \\')"
But that's terribly complex to read and change next time you go tinkering...
An easier alternative
The easiest solution is to put the shell command into a file and call it from tmux.
~/bin/tmux-status.bash:
#!/bin/bash
sensors | awk '/^Physical id 0:/ { sub(/^\+/, "", $4); print $4; exit }'
~/.tmux.conf:
set-option -g status-right "#(bash ~/bin/tmux-status.bash)"
Make it update every 30 seconds
set-option -g status-interval 30
  
See also
tmux titles-string not executing shell command, Stack Overflow
"What is the proper way to escape characters with #(command)?", tmux FAQ on GitHub

Is there a way to create an awk input inactivity timer?

I have a text source (a log file), which gets new lines appended to it by some third party.
I can output the additions to my source file using tail -f source. I can then pipe that through an awk script awk -f parser.awk to parse and format the output.
My question is: while tail -f source | awk -f parser.awk is running, is there a way to call function foo() inside my parser.awk script every time there is more than 5 seconds elapsed without anything coming through the pipe into the standard input of the awk script?
Edit: I'm currently using GNU Awk 3.1.6, but may be able to upgrade to a newer version if required.
If your shell's read supports -t and -u, here's an ugly hack:
{ echo hello; sleep 6; echo world; } | awk 'BEGIN{
while( "while read -t 5 -u 3 line; do echo \"$line\"; done" | getline > 0 )
print
}' 3<&0
You can replace the print in the body of the while loop with your script. However, it would probably make more sense to put the read timeout between tail and awk in the pipeline, and even more sense to re-implement tail with a timeout.
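Building on that hack, here is a sketch of how parser.awk itself could call foo() after 5 idle seconds; parse() and foo() are placeholders for your own processing and handler, and it needs the same 3<&0 redirection and a shell whose read supports -t/-u (e.g. bash):
# invoke as: tail -f source | awk -f parser.awk 3<&0
BEGIN {
    cmd = "while read -r -t 5 -u 3 line; do echo \"$line\"; done"
    while (1) {
        while ((cmd | getline) > 0)
            parse()        # a line arrived within 5 seconds
        close(cmd)         # the shell read timed out: input went quiet
        foo()              # inactivity handler
        # note: if stdin ever hits EOF this loop spins; with tail -f it never does
    }
}
function parse() { print }                                     # placeholder
function foo()   { print "5s without input" > "/dev/stderr" }  # placeholder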
Not exactly an answer to your question, but there is a little shell hack that does practically what you want:
{ tail -f log.file >&2 | { while : ; do sleep 5; echo SECRET_PHRASE ; done ; } ; } 2>&1 | awk -f script.awk
When awk receives SECRET_PHRASE, it can run the foo function. Unfortunately, the phrase arrives every 5 seconds even if tail did produce output during that interval.
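On the awk side, the marker can be handled like this (a sketch; foo() stands for your handler):
# script.awk -- treat the injected marker as a tick, pass real lines through
/^SECRET_PHRASE$/ { foo(); next }
{ print }   # normal per-line processing goes here
function foo() { print "tick" > "/dev/stderr" }   # placeholder handler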
P.S. You can replace '{}' with '()' and vice versa: in the first case no subshell is created, in the second one it is.
Another way would be to append the secret phrase directly to the log file whenever nobody has written to it during the last five seconds, but that doesn't look like a good idea, since it would spoil the log file.