Is there a way with gawk 5 to launch shell commands in parallel?

Using gawk 5, we are able to launch shell commands with something like:
command = "echo toto"
command | getline
However, this blocks until the command completes. For example:
BEGIN {
command = "{ echo start >> test.log ; sleep 10 ; echo stop >> test.log ;}"
command | getline
print "terminated"
}
I tried appending & to the command, without success:
BEGIN {
command = "{ echo start >> test.log ; sleep 10 ; echo stop >> test.log ;} &"
command | getline
print "terminated"
}
I would like to be able to launch shell commands as new processes and "forget" about them. For example:
BEGIN {
command1 = "dosomething.sh &"
command1 | getline
command2 = "dosomething2.sh &"
command2 | getline
print "terminated"
}
Can this be achieved using only gawk 5?

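| getline waits until it has something to read: a line, or end of file. With a trailing &, the background job still holds the pipe open, so end of file does not arrive until the job finishes. So make the command output something right away that getline can read:
$ awk 'BEGIN { cmd="{ echo something; ( echo start >&2 ; sleep 1; echo stop >&2 ) & }"; cmd | getline; print "Awk end"; }' <<<'' ; echo "After awk"; sleep 2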
Awk end
start
After awk
stop
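Since nothing needs to be read back when the jobs are meant to be forgotten, system() may be a simpler fit than getline. A minimal sketch, assuming a POSIX sh and a date that supports +%s (the timing lines only demonstrate that awk does not wait): the trailing & lets sh -c return at once, and redirecting the job's stdout and stderr stops it from holding awk's output open.

```shell
#!/bin/sh
# Sketch: start a background job from awk with system() and do not wait for it.
rm -f test.log
before=$(date +%s)
out=$(awk 'BEGIN {
  # The trailing & makes "sh -c" return immediately; the redirections stop the
  # job from writing to (and holding open) the standard output of awk.
  system("{ echo start >> test.log; sleep 3; echo stop >> test.log; } >/dev/null 2>&1 &")
  print "terminated"
}')
after=$(date +%s)
echo "$out"                                   # terminated
echo "awk returned after $((after - before))s, before the 3s job finished"
```

For the question's last example, the same pattern would be one system() call per script, e.g. system("dosomething.sh >/dev/null 2>&1 &"), then the same for dosomething2.sh.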

Related

Why don't double quotes work in an echo statement inside cmd in an awk script?

gawk 'BEGIN { FS="|"; OFS="|" }NR ==1 {print} NR >=2 {cmd1="echo -n "$2" | base64 -w 0";cmd1 | getline d1;close(cmd1); print $1,d1 }' dummy2.txt
input:
id|dummy
1|subhashree:1;user=phn
2|subha:2;user=phn
Expected output:
id|dummy
1|c3ViaGFzaHJlZToxO3VzZXI9cGhuCg==
2|c3ViaGE6Mjt1c2VyPXBobgo=
output produced by script:
id|dummy
1|subhashree:1
2|subha:2
I understand that the double quotes around $2 are causing the issue: the string is not encoded properly, and everything from the semicolon onward is stripped off. Quoted the same way in a terminal, the command works and gives the proper output:
echo "subhashree:1;user=phn" | base64
c3ViaGFzaHJlZToxO3VzZXI9cGhuCg==
[root@DERATVIV04 encode]# echo "subha:2;user=phn" | base64
c3ViaGE6Mjt1c2VyPXBobgo=
I have tried different variations with single and double quotes inside awk, but none of them work. Any help will be highly appreciated.
Thanks a lot in advance.
Your existing cmd1 produces:
echo -n subhashree:1;user=phn | base64 -w 0
Note the unquoted semicolon. So if you execute it, you get:
$ echo -n subhashree:1;user=phn | base64 -w 0
subhashree:1
With quotes
$ echo -n 'subhashree:1;user=phn' | base64 -w 0
c3ViaGFzaHJlZToxO3VzZXI9cGhu
The solution is to quote the string inside the shell command: echo -n '<your-string>' | base64 -w 0
$ cat file
id|dummy
1|subhashree:1;user=phn
2|subha:2;user=phn
$ gawk -v q="'" 'BEGIN { FS="|"; OFS="|" }NR ==1 {print} NR >=2 {cmd1="echo -n " q $2 q" | base64 -w 0"; cmd1 | getline d1;close(cmd1); print $1,d1 }' file
id|dummy
1|c3ViaGFzaHJlZToxO3VzZXI9cGhu
2|c3ViaGE6Mjt1c2VyPXBobg==
It can be simplified as follows:
gawk -v q="'" 'BEGIN {
FS=OFS="|"
}
NR==1{
print;
next
}
{
cmd1="echo -n " q $2 q" | base64 -w 0";
print ((cmd1 | getline d1)>0)? $1 OFS d1 : $0;
close(cmd1);
}
' file
Based on Ed Morton's recommendation (http://awk.freeshell.org/AllAboutGetline), always test the return value of getline:
if/while ( (getline var < file) > 0)
if/while ( (command | getline var) > 0)
if/while ( (command |& getline var) > 0)
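The q variable approach breaks if $2 itself contains a single quote. A common remedy is to rewrite each embedded ' as '\'' before wrapping the string in single quotes. A minimal sketch, assuming a POSIX awk and a base64 that writes short input on a single line (so -w 0 is not needed); shquote() and shquote.awk are hypothetical names used only here:

```shell
#!/bin/sh
# Sketch: quote an arbitrary awk string for the shell before building a command.
# shquote() is a hypothetical helper, not an awk builtin.
cat > shquote.awk <<'EOF'
# Wrap s in single quotes; each embedded ' becomes '\'' (close, escaped ', reopen).
function shquote(s) {
    gsub(/'/, "'\\''", s)
    return "'" s "'"
}
BEGIN { FS = OFS = "|" }
NR == 1 { print; next }
{
    cmd = "printf %s " shquote($2) " | base64"
    if ((cmd | getline d1) > 0)   # only replace $2 if getline succeeded
        $2 = d1
    close(cmd)                    # one pipe per line, so close it each time
    print
}
EOF
printf '%s\n' 'id|dummy' '1|subhashree:1;user=phn' "2|it's;odd" > file
awk -f shquote.awk file
```

This prints id|dummy, then 1|c3ViaGFzaHJlZToxO3VzZXI9cGhu, then 2|aXQncztvZGQ= for the line containing the quote.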
The problem is the lack of quotes when the echo command is run by the shell. What you are running is effectively:
echo -n subhashree:1;user=phn | base64 -w 0
which the shell executes as two commands separated by ;. The first, echo -n subhashree:1, just prints subhashree:1, and that is what getline stores in your variable d1. The second, user=phn | base64 -w 0, is a variable assignment piped into base64; the assignment writes nothing to standard output, so base64 has nothing to encode.
The right fix is to quote the string for the shell:
echo -n "subhashree:1;user=phn" | base64 -w 0
You said you were putting quotes around $2, but those double quotes belong to awk: they delimit the string literals "echo -n " and " | base64 -w 0", which are concatenated with $2 to build cmd. The quotes the command needs must be part of the string itself, so that the shell sees them.
With that and a few other fixes, your awk command becomes the one below. I added gsub() to strip whitespace from $2 (your input as shown had trailing spaces) and used printf instead of echo.
awk -v FS="|" '
BEGIN {
OFS = FS
}
NR == 1 {
print
}
NR >= 2 {
gsub(/[[:space:]]+/, "", $2)
cmd = "printf \"%s\" \"" $2 "\" | base64 -w 0"
if ((cmd | getline result) > 0) {
$2 = result
}
close(cmd)
print
}
' file
With the script above, the shell runs the following command, which produces the right result:
printf "%s" "subhashree:1;user=phn" | base64 -w 0
You already have answers explaining how to use awk for this, but you should also consider not using awk for it at all. The tool for sequencing calls to other commands (e.g. base64) is a shell, not awk. In terms of calls, what you're doing is:
shell { awk { loop_on_input { shell { base64 } } } }
whereas if you call base64 directly from shell it'd just be:
shell { loop_on_input { base64 } }
Note that the awk command spawns an extra shell (sh -c) for every line of input just to run the echo | base64 pipeline, while the shell loop calls base64 directly without that intermediate shell.
For example:
#!/usr/bin/env bash
file='dummy2.txt'
head -n 1 "$file"
while IFS='|' read -r id dummy; do
printf '%s|%s\n' "$id" "$(base64 -w 0 <<<"$dummy")"
done < <(tail -n +2 "$file")
Here is the difference in execution speed on an input file in which each of your data lines is duplicated 100 times, created with awk -v n=100 'NR==1{print; next} {for (i=1;i<=n;i++) print}' dummy2.txt > file100
$ ./tst.sh file100
Awk:
real 0m23.247s
user 0m3.755s
sys 0m10.966s
Shell:
real 0m14.512s
user 0m1.530s
sys 0m4.776s
The timings above were produced by running this script (both awk scripts posted in the answers have about the same timing, so I just picked one at random):
#!/usr/bin/env bash
doawk() {
local file="$1"
gawk -v q="'" 'BEGIN {
FS=OFS="|"
}
NR==1{
print;
next
}
{
cmd1="echo -n " q $2 q" | base64 -w 0";
print ((cmd1 | getline d1)>0)? $1 OFS d1 : $0;
close(cmd1);
}
' "$file"
}
doshell() {
local file="$1"
head -n 1 "$file"
while IFS='|' read -r id dummy; do
printf '%s|%s\n' "$id" "$(base64 -w 0 <<<"$dummy")"
done < <(tail -n +2 "$file")
}
# Use the 3rd-run timing to eliminate caching as a factor
doawk "$1" >/dev/null
doawk "$1" >/dev/null
echo "Awk:"
time doawk "$1" >/dev/null
echo ""
doshell "$1" >/dev/null
doshell "$1" >/dev/null
echo "Shell:"
time doshell "$1" >/dev/null

Redirect input for gawk to a system command

Usually a gawk script processes each line of its stdin. Is it possible to instead specify a system command in the script and process each line of that command's output in the rest of the script?
For example consider the following simple interaction:
$ { echo "abc"; echo "def"; } | gawk '{print NR ":" $0; }'
1:abc
2:def
I would like to get the same output without using pipe, specifying instead the echo commands as a system command.
I can of course use the pipe but that would force me to either use two different scripts or specify the gawk script inside the bash script and I am trying to avoid that.
UPDATE
The previous example is not quite representative of my use case; this is somewhat closer:
$ { echo "abc"; echo "def"; } | gawk '/d/ {print NR ":" $0; }'
2:def
UPDATE 2
The shell-script equivalent would be as follows. Without the exec line the script reads from stdin; with it, the script reads the output of the command on that line:
/tmp> cat t.sh
#!/bin/bash
exec 0< <(echo abc; echo def)
while read l; do
echo "line:" $l
done
/tmp> ./t.sh
line: abc
line: def
From all of your comments, it sounds like what you want is:
$ cat tst.awk
BEGIN {
if ( ("mktemp" | getline file) > 0 ) {
system("(echo abc; echo def) > " file)
ARGV[ARGC++] = file
}
close("mktemp")
}
{ print FILENAME, NR, $0 }
END {
if (file!="") {
system("rm -f \"" file "\"")
}
}
$ awk -f tst.awk
/tmp/tmp.ooAfgMNetB 1 abc
/tmp/tmp.ooAfgMNetB 2 def
but honestly, I wouldn't do it. You're mixing what the shell is good at (creating/destroying files and processes) with what awk is good at (manipulating text).
I believe what you're looking for is getline:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ print line} }' <<< ''
abc
def
Adjusting the answer to your second example:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ counter++; if ( line ~ /d/){print counter":"line} } }' <<< ''
2:def
Let's break it down:
awk '{
cmd = "echo abc; echo def"
# the loop below reads each line of output from cmd into the variable line
while ( ( cmd | getline line) > 0){
# we need a counter because NR will not work for us
counter++;
# if the line contains the letter d
if ( line ~ /d/){
print counter":"line
}
}
}' <<< ''
2:def
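A variation on the getline approach: called without a variable, cmd | getline puts each line in $0 and splits it into fields, so a test like /d/ (or a field reference like $1) reads like an ordinary rule. A minimal sketch, assuming any POSIX awk; it keeps its own counter n rather than depending on how a given awk treats NR for this form of getline:

```shell
#!/bin/sh
# Sketch: process a command's output inside BEGIN as if it were the input.
out=$(awk 'BEGIN {
    cmd = "echo abc; echo def"
    # With no variable, "cmd | getline" sets $0 and NF, so /d/ and $1 work
    # as they would in an ordinary pattern-action rule.
    while ((cmd | getline) > 0) {
        n++                        # own line counter instead of NR
        if (/d/)
            print n ":" $0
    }
    close(cmd)
}')
printf '%s\n' "$out"               # 2:def
```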

Frustrated with simple awk command

I am trying to list the contents of field 1 using a function:
help(){
if [[ $# -eq 0 ]] ; then
echo '######################################'
echo ''
echo 'Argument to run run name must be given: ./report.sh Name'
echo 'Report names are:'
ALLNAMES=$(cut -d '|' -f 1 $CONFIGFILE | awk '{printf $0"\n"}')
echo $ALLNAMES
echo '######################################'
exit 0
fi
}
The output I get is :
$ bin/report.sh
######################################
Argument to run run name must be given: ./report.sh Name
Report names are:
ItemA ItemB
######################################
Whereas I want:
$ bin/report.sh
######################################
Argument to run run name must be given: ./report.sh Name
Report names are:
ItemA
ItemB
######################################
If I run the cut command I get:
[david@kallibu]$ cut -d '|' -f 1 conf/report.conf
ItemA
ItemB
What do I need to change to get my newlines?
The problem is:
echo $ALLNAMES
Should be solved with quotes:
echo "$ALLNAMES"
If you're not going to use the variable ALLNAMES anywhere else, just:
help(){
if [[ $# -eq 0 ]] ; then
echo '######################################'
echo ''
echo 'Argument to run run name must be given: ./report.sh Name'
echo 'Report names are:'
cut -d '|' -f 1 "$CONFIGFILE"
echo '######################################'
exit 0
fi
}
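The difference comes from word splitting: unquoted, $ALLNAMES is split on the characters in IFS (space, tab, newline by default) and echo rejoins the words with single spaces, while the quoted form is passed through unchanged. A small sketch with made-up sample data standing in for the real config file:

```shell
#!/bin/sh
# Sketch: unquoted vs quoted expansion of a multi-line variable.
# The sample data is made up; it stands in for the real config file.
ALLNAMES=$(printf 'ItemA|x\nItemB|y\n' | cut -d '|' -f 1)
unquoted=$(echo $ALLNAMES)     # split into words, rejoined with spaces
quoted=$(echo "$ALLNAMES")     # newline preserved
printf '%s\n' "$unquoted"      # ItemA ItemB
printf '%s\n' "$quoted"        # ItemA, then ItemB on its own line
```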
Your code would be,
help(){
if [[ $# -eq 0 ]] ; then
echo '######################################'
echo ''
echo 'Argument to run run name must be given: ./report.sh Name'
echo 'Report names are:'
ALLNAMES=$(awk -F'|' '{print $1}' "$CONFIGFILE")
echo "$ALLNAMES"
echo '######################################'
exit 0
fi
}
The command awk -F'|' '{print $1}' "$CONFIGFILE" prints the first column, using | as the delimiter.
You need to put $ALLNAMES inside double quotes so that it is expanded as a single word; unquoted, it is split on whitespace (including newlines) and echo rejoins the pieces with spaces.
@Tiago provided the answer to your specific problem, but overall your script should either be the shell script @klashxx posted or this awk script:
help(){
if [[ $# -eq 0 ]] ; then
awk '
BEGIN {
FS = "|"
print "######################################\n"
print "Argument to run run name must be given: ./report.sh Name"
print "Report names are:"
}
{ print $1 }
END {
print "######################################"
}
' "$CONFIGFILE"
exit 0
fi
}
or similar.

How do I get the exit status of a command in a getline pipeline?

In POSIX awk, how do I get the exit status (return code) from command after processing its output via command | getline var? I want my awk script to exit 1 if command exited with a non-zero exit status.
For example, suppose I had an awk script named foo.awk that looks like this:
function close_and_get_exit_status(cmd) {
# magic goes here...
}
BEGIN {
cmd = "echo foo; echo bar; echo baz; false"
while ((cmd | getline line) > 0)
print "got a line of text: " line
if (close_and_get_exit_status(cmd) != 0) {
print "ERROR: command '" cmd "' failed" | "cat >&2"
exit 1
}
print "command '" cmd "' was successful"
}
then I want the following to happen:
$ awk -f foo.awk
got a line of text: foo
got a line of text: bar
got a line of text: baz
ERROR: command 'echo foo; echo bar; echo baz; false' failed
$ echo $?
1
According to the POSIX specification for awk, command | getline returns 1 for successful input, zero for end-of-file, and -1 for an error. It's not an error if command exits with a non-zero exit status, so this can't be used to see if command is done and has failed.
Similarly, close() can't be used for this purpose: close() returns non-zero only if the close fails, not if the associated command returns a non-zero exit status. (In gawk, close(command) returns the exit status of command. This is the behavior I'd like, but I think it violates the POSIX spec and not all implementations of awk behave this way.)
The awk system() function returns the exit status of the command, but as far as I can tell there's no way to use getline with it.
The simplest thing to do is to echo the exit status from the shell after the command executes and then read it with getline, e.g.:
$ cat tst.awk
BEGIN {
cmd = "echo foo; echo bar; echo baz; false"
mod = cmd "; echo \"$?\""
while ((mod | getline line) > 0) {
if (numLines++)
print "got a line of text: " prev
prev = line
}
status = line
close(mod)
if (status != 0) {
print "ERROR: command '" cmd "' failed" | "cat >&2"
exit 1
}
print "command '" cmd "' was successful"
}
$ awk -f tst.awk
got a line of text: foo
got a line of text: bar
got a line of text: baz
ERROR: command 'echo foo; echo bar; echo baz; false' failed
$ echo $?
1
In case anyone's reading this and considering using getline, make sure you read http://awk.freeshell.org/AllAboutGetline and FULLY understand all the caveats and implications of doing so first.
Not an ideal solution, but you can do:
"command || echo failure" | getline var; ... if( var == "failure" ) exit;
There is some ambiguity in that you have to select the string "failure" in such a way that command can never generate the same string, but perhaps this is an adequate workaround.
The following is horrifically complicated, but it:
is POSIX conformant (mostly -- fflush() isn't yet in the POSIX standard, but it will be and it's widely available)
is general (it works no matter what kind of output is emitted by the command)
does not introduce any processing delay. The accepted answer to this question makes a line available only after the next line has been printed by the command. If the command slowly outputs lines and responsiveness is important (e.g., occasional events printed by an IDS system that should trigger a firewall change or email notification), this answer might be more appropriate than the accepted answer.
The basic approach is to echo the exit status/return value after the command completes. If this last line is non-zero, exit the awk script with an error. To prevent the code from mistaking a line of text output by the command for the exit status, each line of text output by the command is prepended with a letter that is later stripped off.
function stderr(msg) { print msg | "cat >&2"; }
function error(msg) { stderr("ERROR: " msg); }
function fatal(msg) { error(msg); exit 1; }
# Wrap cmd so that each output line of cmd is prefixed with "d".
# After cmd is done, an additional line of the format "r<ret>" is
# printed where "<ret>" is the integer return code/exit status of the
# command.
function safe_cmd_getline_wrap(cmd) {
return \
"exec 3>&1;" \
"ret=$(" \
" exec 4>&1;" \
" { ( "cmd" ) 4>&-; echo $? >&4; } 3>&- |" \
" awk '{print\"d\"$0;fflush()}' >&3 4>&-;" \
");" \
"exec 3>&-;" \
"echo r${ret};"
}
# like "cmd | getline line" except:
# * if getline fails, the awk script exits with an error
# * if cmd fails (returns non-zero), the awk script exits with an
# error
# * safe_cmd_getline_close(cmd) must be used instead of close(cmd)
function safe_cmd_getline(cmd, wrapped_cmd,ret,type) {
wrapped_cmd = safe_cmd_getline_wrap(cmd)
ret = (wrapped_cmd | getline line)
if (ret == -1) fatal("failed to read line from command: " cmd)
if (ret == 0) return 0
type = substr(line, 1, 1)
line = substr(line, 2)
if (type == "d") return 1
if (line != "0") fatal("command '" cmd "' failed")
return 0
}
function safe_cmd_getline_close(cmd) {
if (close(safe_cmd_getline_wrap(cmd))) fatal("failed to close " cmd)
}
You use the above like this:
cmd = "ls no-such-file"
while (safe_cmd_getline(cmd)) {
print "got a line of text: " line
}
safe_cmd_getline_close(cmd)
If you have the mktemp command, you could store the exit status in a temporary file:
#!/bin/sh
set -e
file=$(mktemp)
finish() {
rm -f "$file"
}
trap 'finish' EXIT
trap 'finish; trap - INT; kill -s INT $$' INT
trap 'finish; trap - TERM; kill $$' TERM
awk -v file="$file" 'BEGIN{
o_cmd="echo foo; echo bar; echo baz; false"
cmd = "("o_cmd "); echo $? >\""file"\""
print cmd
while ((cmd | getline) > 0) {
print "got a line of text: " $0
}
close(cmd)
getline ecode <file; close(file)
print "exit status:", ecode
if(ecode)exit 1
}'

Assigning system command's output to variable

I want to run the system command in an awk script and get its output stored in a variable. I've been trying to do this, but the command's output always goes to the shell and I'm not able to capture it. Any ideas on how this can be done?
Example:
$ date | awk --field-separator=! {$1 = system("strip $1"); /*more processing*/}
It should call the strip system command and, instead of sending the output to the shell, assign the output back to $1 for more processing. Right now, it sends the output to the shell and assigns the command's return code to $1.
Note: coprocesses (the |& operator) are specific to GNU awk.
A portable alternative is getline:
cmd = "strip "$1
while ( ( cmd | getline result ) > 0 ) {
print result
}
close(cmd)
Calling close(cmd) prevents awk from throwing this error after a number of calls:
fatal: cannot open pipe `…' (Too many open files)
To run a system command in awk you can either use system() or cmd | getline.
I prefer cmd | getline because it allows you to catch the value into a variable:
$ awk 'BEGIN {"date" | getline mydate; close("date"); print "returns", mydate}'
returns Thu Jul 28 10:16:55 CEST 2016
More generally, you can set the command into a variable:
awk 'BEGIN {
cmd = "date -j -f %s"
cmd | getline mydate
close(cmd)
}'
Note that it is important to call close() to avoid a "too many open files" error when the command is run many times (thanks mateuscb for pointing this out in comments).
Using system(), the command output is printed automatically and the value you can catch is its return code:
$ awk 'BEGIN {d=system("date"); print "returns", d}'
Thu Jul 28 10:16:12 CEST 2016
returns 0
$ awk 'BEGIN {d=system("ls -l asdfasdfasd"); print "returns", d}'
ls: cannot access asdfasdfasd: No such file or directory
returns 2
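A single cmd | getline var reads only the first line of output. A minimal sketch of collecting all of it into one variable, assuming a POSIX awk; cmd_output() is a hypothetical helper name, and the close() call both frees the pipe and allows the same command string to be run again:

```shell
#!/bin/sh
# Sketch: capture the complete, multi-line output of a command in one awk variable.
out=$(awk '
# cmd_output() is a hypothetical helper that joins all output lines with "\n".
function cmd_output(cmd,    line, res, n) {
    while ((cmd | getline line) > 0)
        res = (n++ ? res "\n" : "") line
    close(cmd)                     # frees the pipe and lets cmd be run again
    return res
}
BEGIN {
    s = cmd_output("echo abc; echo def")
    print "[" s "]"
}')
printf '%s\n' "$out"
```

The bracketed result spans two lines, [abc and def], showing that both lines were captured.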
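Figured out.
We use awk's two-way I/O (a GNU awk extension):
{
cmd = "strip " $1    # build the command from the awk field, outside the quotes
cmd |& getline $1    # read the output of strip back into $1
close(cmd)           # close the coprocess before the next record
}
This passes awk's $1 to strip, and getline reads the output of strip back into $1. (Written as "strip $1" |& getline $1, with $1 inside the quotes, the shell rather than awk would expand $1, to an empty string.)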
gawk '{dt=substr($4,2,11); gsub(/\//," ",dt); cmd="date -d \""dt"\" +%s"; cmd | getline ts; close(cmd); print ts}'
You can use this when you need to process a grep output:
echo "some/path/exex.c:some text" | awk -F: '{ "basename "$1"" |& getline $1; print $1 " ==> " $2}'
option -F: tells awk to use : as the field separator
"basename "$1"" runs the shell command basename on the first field
|& getline $1 reads the output of that command (run as a gawk coprocess) back into $1
output:
exex.c ==> some text
I am using macOS's awk and also needed the exit status of the command, so I extended @ghostdog74's solution to get the exit status too:
Exit if non-zero exit status:
cmd = <your command goes here>
cmd = cmd" ; printf \"\n$?\""
last_res = ""
value = ""
while ( ( cmd | getline res ) > 0 ) {
if (value == "") {
value = last_res
} else {
value = value"\n"last_res
}
last_res = res
}
close(cmd)
# Now `res` has the exit status of the command
# and `value` has the complete output of command
if (res != 0) {
exit 1
} else {
print value
}
So basically I just changed cmd to print the exit status of the command on a new line. After the while loop above, res contains the exit status of the command and value contains the complete output of the command.
Honestly, this is not a very neat approach, and I would like to know if there is a better way.
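One possible tidier variant, sketched under the assumption of a POSIX awk: buffer every line in an array, treat the last line as the exit status, and join the rest back together. run_capture() and the global OUT are hypothetical names, and, like the approach above, it assumes the command's output is empty or ends with a newline; otherwise the status would be glued onto the last output line.

```shell
#!/bin/sh
# Sketch: get both the output and the exit status of a command from awk.
out=$(awk '
# run_capture() is a hypothetical helper: it returns the exit status of cmd
# and stores the output, minus the appended status line, in the global OUT.
# Assumes the output of cmd is empty or ends with a newline.
function run_capture(cmd,    wrapped, line, lines, n, i) {
    wrapped = "(" cmd "); echo \"$?\""
    while ((wrapped | getline line) > 0)
        lines[++n] = line
    close(wrapped)
    OUT = ""
    for (i = 1; i < n; i++)
        OUT = OUT (i > 1 ? "\n" : "") lines[i]
    return lines[n] + 0            # the last line is the exit status
}
BEGIN {
    status = run_capture("echo foo; echo bar; false")
    print OUT
    print "status=" status         # a real script might exit 1 here instead
}')
printf '%s\n' "$out"
```

Here the output is foo and bar on separate lines, followed by status=1 because false exits with 1.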