Quoting a complex awk program in tmux.conf - awk

A user on Freenode #tmux asked:
How can we properly escape this shell command (using GNU awk) for tmux's set -g status-right?
sensors | awk '/^Physical id 0:/ { s = $4; sub(/^\+/, "", s); print s; exit }'
The result should be 45.0°C.
Also, how can we make it update every 30 seconds?
The output of sensors:
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +45.0°C (high = +80.0°C, crit = +100.0°C)
...

Setting status-right
Quoting with shell command #( ) in tmux
Quoting is complex inside tmux's #( ) because the contents are evaluated twice.
For this reason, let's first simplify the gawk program to:
sensors | awk '/^Physical id 0:/ { sub(/^\+/, "", $4); print $4; exit }'
Now we plug it into .tmux.conf:
set-option -g status-right "#( sensors | awk \\' /Physical id 0:/ { sub\\(/\+/,\"\",$4\\); print \$4; exit } \\')"
But that's terribly complex to read and change next time you go tinkering...
An easier alternative
The easiest solution is to put the shell command into a file and call it from tmux.
~/bin/tmux-status.bash:
#!/bin/bash
sensors | awk '/^Physical id 0:/ { sub(/^\+/, "", $4); print $4; exit }'
~/.tmux.conf:
set-option -g status-right "#(bash ~/bin/tmux-status.bash)"
Make it update every 30 seconds
set-option -g status-interval 30
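For reference, the two relevant lines of ~/.tmux.conf end up as the pair below. The "CPU:" label is only an illustrative addition, not part of the original question:
set-option -g status-right "CPU: #(bash ~/bin/tmux-status.bash)"
set-option -g status-interval 30
After editing, reload the configuration with tmux source-file ~/.tmux.conf to see the temperature appear in the status line.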
  
See also
tmux titles-string not executing shell command, Stack Overflow
"What is the proper way to escape characters with #(command)?", tmux FAQ on GitHub

Related

parallelize awk script - files splitting

I have a small awk script which takes input from a stream and writes to the appropriate file based on the second column value. Here is how it goes:
cat mydir/*.csv | awk -F, '{if(NF==29)print $0 >> "output/"$2".csv"}'
How do I parallelize it, so that it can use multiple cores available in the machine? Right now, this is running on a single core.
You can try this approach: run one awk per source file, and have each awk write to per-process temporary files (a distinct set per process, to avoid two processes appending to the same final file, and to limit open/close churn on it). At the end of each awk, the temporary files are appended to the final ones and removed.
You may need a batch limiter (a sleep, or smarter grouping) if there are lots of files to treat, to avoid killing the machine with too many concurrent subprocesses; a sketch of such a limiter follows the script below.
rm -f output/*.csv
for File in mydir/*.csv
do
    # shell subprocess, one per source file
    {
        # suffix that keeps this process's temporary files distinct
        FileRef="${File##*/}"
        awk -F ',' -v FR="${FileRef}" '
            NF == 29 {
                # put the record in a per-process temporary file
                ListFiles[OutTemp = "output/" $2 ".csv_" FR] = "output/" $2 ".csv"
                print > OutTemp
            }
            END {
                # append each temporary file to its final file, then remove it
                for (TempFile in ListFiles) {
                    Command = sprintf("cat \042%s\042 >> \042%s\042; rm \042%s\042", \
                        TempFile, ListFiles[TempFile], TempFile)
                    printf "" | Command
                    close(Command)
                }
            }
        ' "$File"
    } &
done
wait
ls -l output/*.csv
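As a minimal sketch of the batch limiter mentioned above (the limit of 8 concurrent jobs is an arbitrary assumption), the loop can pause whenever too many subprocesses are already running:
for File in mydir/*.csv
do
    # block while 8 or more background jobs are still running
    while [ "$(jobs -rp | wc -l)" -ge 8 ]; do
        sleep 1
    done
    # ... the same { awk subprocess } & as in the script above ...
done
wait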
Untested:
do_one() {
    # Make a workdir used only by this process to ensure no files are appended to in parallel
    mkdir -p "$1"
    cd "$1"
    cat ../"$2" | awk -F, '{if(NF==29)print $0 >> $2".csv"}'
}
export -f do_one
parallel do_one workdir-{%} {} ::: mydir/*.csv
ls workdir-*/ | sort -u |
    parallel 'cat workdir*/{} > output/{}'
rm -rf workdir-*
If you want to avoid the extra cat you can use this instead, though I find the cat version easier to read (performance is normally the same on modern systems http://oletange.blogspot.com/2013/10/useless-use-of-cat.html):
do_one() {
    # Make a workdir used only by this process to ensure no files are appended to in parallel
    mkdir -p "$1"
    cd "$1"
    awk -F, '{if(NF==29)print $0 >> $2".csv"}' < ../"$2"
}
export -f do_one
parallel do_one workdir-{%} {} ::: mydir/*.csv
ls workdir-*/ | sort -u |
    parallel 'cat workdir*/{} > output/{}'
rm -rf workdir-*
But as @Thor writes, you are most likely I/O starved.

How can I repeat a script?

I've searched for a command or solution to repeat a script every n minutes, but I can't find one.
This is my rusty script:
#!/bin/csh -f
rm -rf result120
rm -rf result127
rm -rf result126
rm -rf result125
rm -rf result128
rm -rf result129
rm -rf result122
rm -rf output
rm -rf aaa
### Get job id from user name
foreach file ( `cat name` )
    echo `bjobs -u $file | awk '$1 ~ /^[0-9]+/ {print $1}' >> aaa`
    echo "loading"
end
### Read in job id
foreach file ( `cat aaa` )
    echo `bjobs -l $file >> result120`
    echo "loading"
end
### Get pattern in < >
awk '{\
gsub(/ /,"",$0)}\
BEGIN {\
RS =""\
FS=","\
}\
{\
s=1\
e=150\
if ($1 ~/Job/){\
for(i=s;i<=e;i++){\
printf("%s", $(i))}\
}\
}' result120 > result126
grep -oE '<[^>]+>' result126 > result125
### Get Current Work Location
awk '$1 ~ /<lsf_login..>/ {getline; print $1}' result125 >result122 #result127
### Get another information and paste it with CWD
foreach file1 ( `cat aaa` )
    echo `bjobs $file1 >> result128`
    echo "getting data"
end
awk '$1 ~ /JOBID/ {getline; printf "%-15s %-15s %-15s %-15s %-20s\n", $1, $2, $3, $4, $5}' result128 >> result129
paste result129 result122 >> output
### Summary
awk '{count1[$2]++}{count2[$4]++}{count3[$3]++}\
END{\
print "\n"\
print "##########################################################################"\
print "There are: ", NR " Jobs"\
for(name in count1){ print name, count1[name]}\
print "\n"\
for(queqe in count2){ print queqe, count2[queqe]}\
print "\n"\
for(stt in count3){ print stt, count3[stt]}\
}' output >> output
And my desire is to run it again every 15 minutes to get a report. Someone told me to use wait, but I've searched man wait and can't find any useful example. That's why I need your help to solve this problem.
Thanks a lot.
run the script every 15 mins
while true; do ./script.sh; sleep 900; done
or set a cron job (see the sketch below) or use watch
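For the cron approach, a minimal sketch (the path /home/user/script.sh is an assumption; use the real location of your script): run crontab -e and add
*/15 * * * * /home/user/script.sh
The watch variant would be watch -n 900 ./script.sh, but watch is meant for interactive viewing rather than background reporting.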
For C shell you have to write
while (1)
./script.sh
sleep 900
end
but why use csh when you have bash? Double-check the syntax, since I don't remember it well anymore...
Following @karakfa's answer, you basically have two options.
1) The first option, even though it sleeps, implements a kind of busy-waiting strategy (https://en.wikipedia.org/wiki/Busy_waiting). This strategy uses more CPU/memory than the second option (the cron approach), because your process stays resident in memory even while it is doing nothing.
2) With the cron approach, on the other hand, your process only exists while it is doing useful work.
Imagine implementing this kind of approach for many programs running on your machine: a lot of memory would be consumed by processes in waiting states, and it would also have an impact (memory/CPU usage) on your OS's scheduler, since it would have more processes in its queues to manage.
Therefore, I would absolutely recommend the cron/scheduling approach.
In any case, the cron daemon runs in the background whether you add the entry to the crontab or not, so why not add it?
Last but not least, imagine your busy-waiting process is killed for any reason: with the first option you would need to restart it manually, and you might lose a couple of monitoring entries.
Hope this helps.

Is there a way to create an awk input inactivity timer?

I have a text source (a log file), which gets new lines appended to it by some third party.
I can output the additions to my source file using tail -f source. I can then pipe that through an awk script awk -f parser.awk to parse and format the output.
My question is: while tail -f source | awk -f parser.awk is running, is there a way to call function foo() inside my parser.awk script every time there is more than 5 seconds elapsed without anything coming through the pipe into the standard input of the awk script?
Edit: Currently using GNU Awk 3.1.6. May be able to upgrade to newer version if required.
If your shell's read supports -t and -u, here's an ugly hack:
{ echo hello; sleep 6; echo world; } | awk 'BEGIN{
    while( "while read -t 5 -u 3 line; do echo \"$line\"; done" | getline > 0 )
        print
}' 3<&0
You can replace the print in the body of the while loop with your script. However, it would probably make a lot more sense to put the read timeout between tail and awk in the pipeline (a sketch below), and it would make even more sense to re-implement tail to time out.
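A sketch of that read-timeout-in-the-pipeline idea, assuming bash; the __IDLE__ marker and the foo() wiring are illustrative assumptions, not part of the original question:
tail -f source | {
    while true; do
        if IFS= read -r -t 5 line; then
            printf '%s\n' "$line"    # pass real data through unchanged
        elif (( $? > 128 )); then
            echo "__IDLE__"          # read timed out: 5 seconds with no input
        else
            break                    # EOF: the upstream tail went away
        fi
    done
} | awk -f parser.awk
Inside parser.awk, a rule such as /^__IDLE__$/ { foo(); next } would then trigger the timeout handler while letting all other lines fall through to the normal parsing rules.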
This is not exactly an answer to your question, but there is a little shell hack that does practically what you want:
{ tail -f log.file >&2 | { while : ; do sleep 5; echo SECRET_PHRASE ; done ; } ; } 2>&1 | awk -f script.awk
When awk receives SECRET_PHRASE it will run the foo function. Unfortunately, it will run it every 5 seconds even if there was some output from tail during that time.
P.S. You can replace '{}' with '()' and vice versa: in the first case no subshell is created, in the second one it is.
Another way would be to append the secret phrase directly to the log file whenever nobody has written to it during the last five seconds, but that doesn't look like a good idea, since you would end up with a spoiled log file.
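On the awk side, script.awk would need a guard rule for the marker line. A minimal sketch (the body of foo() is a placeholder, and the plain print stands in for the real parsing rules):
/^SECRET_PHRASE$/ { foo(); next }    # inactivity tick from the sleep loop
{ print }                            # stand-in for the real parsing rules

function foo() {
    print "no output from tail during the last interval" > "/dev/stderr"
}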

Problem with awk and grep

I am using the following script to find the running process and print its ID, command, etc.:
if [ "`uname`" = "SunOS" ]
then
awk_c="nawk"
ps_d="/usr/ucb/"
time_parameter=7
else
awk_c="awk"
ps_d=""
time_parameter=5
fi
main_class=RiskEngine
connection_string=db.regression
AWK_CMD='BEGIN{printf "%-15s %-6s %-8s %s\n","ID","PID","STIME","Cmd"} {printf "%-15s %-6s %-8s %s %s %s\n","MY_APP",$2,$time_parameter, main_class, connection_string, port}'
while getopts ":pnh" opt; do
case $opt in
p) AWK_CMD='{ print $2 }'
do_print_message=1;;
n) AWK_CMD='{printf "%-15s %-6s %-8s %s %s %s\n","MY_APP",$2,$time_parameter,main_class, connection_string, port}' ;;
h) print "usage : `basename ${0}` {-p} {-n} : Returns details of process running "
print " -p : Returns a list of PIDS"
print " -n : Returns process list without preceding header"
exit 1 ;
esac
done
ps auxwww | grep $main_class | grep 10348 | grep -v grep | ${awk_c} -v main_class=$merlin_main_class -v connection_string=$merlin_connection_string -v port=10348 -v time_parameter=$time_parameter "$AWK_CMD"
# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 6)
# uname -a
Linux deapp25v 2.6.9-67.0.4.EL #1 Fri Jan 18 04:49:54 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
When I am executing the following from the script independently or inside script
# ps auxwww | grep $main_class | grep 10348 | grep -v grep | ${awk_c} -v main_class=$merlin_main_class -v connection_string=$merlin_connection_string -v port=10348 -v time_parameter=$time_parameter "$AWK_CMD"
I get two rows on Linux:
ID PID STIME Cmd
MY_APP 6217 2355352 RiskEngine 10348
MY_APP 21874 5316 RiskEngine 10348
I just have one jvm (Java command) running in the background but still I see 2 rows.
I know one of them (the duplicate with PID 21874) comes from the awk command that I am executing: it contains the main class and the port again, hence the second row. Can you please help me avoid the duplicate row?
AWK can do all that grepping for you.
Here is a simple example of how an AWK command can be selective:
ps auxww | awk -v select="$main_class" '$0 ~ select && /10348/ && !(/grep/ || /awk/) {print}'
ps can be made to selectively output fields which will help a little to reduce false positives. However pgrep may be more useful to you since all you're really using is the PID from the result.
pgrep -f "$mainclass.*10348"
I've reformatted the code as code, but you need to learn that the return key is your friend. The monstrously long pipelines should be split over multiple lines - I typically use one line per command in the pipeline. You can also write awk scripts on more than one line. This makes your code more readable.
Then you need to explain to us what you are up to.
However, it is likely that you are using 'awk' as a variant on grep and are finding that the value 10348 (possibly intended as a port number on some command line) is also in the output of ps as one of the arguments to awk (as is the 'main_class' value), so you get the extra information. You'll need to revise the awk script to eliminate (ignore) the line that contains 'awk'.
Note that you could still be bamboozled by a command running your main class on port 9999 (any value other than 10348) if it so happens that it is run by a process with PID or PPID equal to 10348. If you're going to do the job thoroughly, then the 'awk' script needs to analyze only the 'command plus options' part of the line.
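A sketch of that idea (the class name and port are taken from the question; ps -e -o pid= -o args= prints the PID plus the full command line, without headers): match only within the command-plus-arguments portion, and skip the awk process itself:
ps -e -o pid= -o args= | awk -v cls="RiskEngine" -v port=10348 '
{
    cmd = $0
    sub(/^[[:space:]]*[0-9]+[[:space:]]+/, "", cmd)    # strip the leading PID column
    if (cmd ~ cls && cmd ~ port && cmd !~ /awk/)
        print $1                                       # the matching PID
}'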
You're already using the grep -v grep trick in your code, why not just update it to exclude the awk process as well with grep -v ${awk_c}?
In other words, the last line of your script would be (on one line, and with the real command parameters to awk rather than blah blah blah):
ps auxwww |
    grep $main_class |
    grep 10348 |
    grep -v grep |
    grep -v ${awk_c} |
    ${awk_c} -v blah blah blah
This will ensure the list of processes will not contain any with the word awk in it.
Keep in mind that it's not always a good idea to do it this way (false positives) but, since you're already taking the risk with processes containing grep, you may as well do so with those containing awk as well.
You can add this simple code in front of all your awk args:
'!/awk/ { .... original awk code .... }'
The '!/awk/' will have the effect of telling awk to ignore any line containing the string awk.
You could also remove your 'grep -v' if you extended my awk suggestion into something like:
'!/awk/ && !/grep/ { ... original awk code ... }'.

choose the newest file and use getline to read it

Having problems with a small awk script: I'm trying to choose the newest of some log files and then use getline to read it. The problem is that it doesn't work if I don't send the script some input first.
This works:
echo | myprog.awk
This does not:
myprog.awk
myprog.awk
BEGIN {
    # find the newest file
    command = "ls -alrt | tail -1 | cut -c59-100"
    command | getline logfile
    close(command)
}
{
    while ((getline < logfile) > 0) {
        # do the magic
        print $0
    }
}
Your problem is that while your program selects the logfile OK, the {} block is executed once for every line of the input file, and since you have no input file it defaults to standard input. I don't know awk very well myself, so I don't know how to change the input (if possible) from within an awk script, but I would do:
#! /bin/awk -f
BEGIN {
    # find the newest file
    command = "ls -1rt | tail -1"
    command | getline logfile
    close(command)
    # read and process the file entirely within BEGIN
    while ((getline < logfile) > 0) {
        # do the magic
        print $0
    }
}
or maybe
alias myprog.awk="awk '{print $0}' `ls -1rt | tail -1`"
Again, this may be a little dirty. We'll wait for a better answer. :-)
Never parse ls. See this for the reason.
Why do you need to use getline? Let awk do the work for you.
#!/bin/bash
# get the newest file
files=(*) newest=${files[0]}
for f in "${files[@]}"; do
    if [[ $f -nt $newest ]]; then
        newest=$f
    fi
done
# process it with awk
awk '{
    # do the magic
    print $0
}' "$newest"