How does gawk -e 'BEGIN {' -e 'print "hello" }' work? - awk

Gawk 5.0.0 was released on April 12, 2019. Going through the announcement I found this:
Changes from 4.2.1 to 5.0.0
(...) 11. Namespaces have been implemented! See the manual. One consequence of this is that files included with -i, read with -f, and command line program segments must all be self-contained syntactic units. E.g., you can no longer do something like this:
gawk -e 'BEGIN {' -e 'print "hello" }'
I was curious about this behaviour that is no longer supported, but unfortunately my Gawk 4.1.3 did not offer much output out of it:
$ gawk -e 'BEGIN {' -e 'print "hello" }'
gawk: cmd. line:1: BEGIN {
gawk: cmd. line:1: ^ unexpected newline or end of string
From what I see in the manual of GAWK 4.2, the -e option was marked as problematic already:
GNU Awk User's Guide, on Options
-e program-text
--source program-text
Provide program source code in the program-text. This option allows you to mix source code in files with source code that you enter on the command line. This is particularly useful when you have library functions that you want to use from your command-line programs (see AWKPATH Variable).
Note that gawk treats each string as if it ended with a newline character (even if it doesn’t). This makes building the total program easier.
CAUTION: At the moment, there is no requirement that each program-text be a full syntactic unit. I.e., the following currently works:
$ gawk -e 'BEGIN { a = 5 ;' -e 'print a }'
-| 5
However, this could change in the future, so it’s not a good idea to rely upon this feature.
But, again, this fails in my console:
$ gawk -e 'BEGIN {a=5; ' -e 'print a }'
gawk: cmd. line:1: BEGIN {a=5;
gawk: cmd. line:1: ^ unexpected newline or end of string
So what is gawk -e 'BEGIN {' -e 'print "hello" }' doing exactly on Gawk < 5?

It's doing just what you'd expect - concatenating the parts to form gawk 'BEGIN {print "hello" }' and then executing it. You can actually see how gawk is combining the code segments by pretty-printing it:
$ gawk -o- -e 'BEGIN {' -e 'print "hello" }'
BEGIN {
print "hello"
}
There is little point in splitting that particular script into sections, but consider something like:
$ cat usea.awk
{ a++ }
$ echo foo | gawk -e 'BEGIN{a=5}' -f usea.awk -e 'END{print a}'
6
then you can see that the intended functionality is useful for mixing command-line code with scripts stored in files. Pretty-printing shows the combined program:
$ gawk -o- -e 'BEGIN{a=5}' -f usea.awk -e 'END{print a}'
BEGIN {
a = 5
}
{
a++
}
END {
print a
}
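If a program really does have to be assembled from fragments that are not complete on their own, one option that does not depend on the old -e behaviour is to join the fragments in the shell and pass awk a single program text. This is only a sketch: the variable names part1, part2 and program are made up for illustration, and the newline join simply mirrors the manual's note that gawk treats each -e string as if it ended with a newline.

```shell
# Two fragments that are not valid awk on their own.
part1='BEGIN {'
part2='print "hello" }'

# Join them with a newline and hand awk one complete program.
program=$(printf '%s\n' "$part1" "$part2")
awk "$program"    # prints: hello
```

Because awk only ever sees one complete program, this should behave the same before and after Gawk 5.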

Related

Print paragraph if it contains a string stored in a variable (blank lines separate paragraphs)

I am trying to isolate the header of a mail in the /var/spool/mail/mysuser file.
Print a paragraph if it contains AAA (blank lines separate paragraphs)
sed is working when searching with the string "AAA"
$ sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;' /var/spool/mail/mysuser
When using a variable it does not work:
$ MyVar="AAA"
$ sed -e '/./{H;$!d;}' -e 'x;/$MyVar/!d;' /var/spool/mail/mysuser
=> No output, as the single quotes prevent the expansion of the variable
Trying with double quotes
$ sed -e "/./{H;$!d;}" -e "x;/$MyVar/!d;" /var/spool/mail/mysuser
sed: -e expression #2, char 27: extra characters after command
Actually, the first search is also not working with double quotes
$ sed -e "/./{H;$!d;}" -e "x;/AAA/!d;" /var/spool/mail/mysuser
sed -e "/./{H;$!d;}" -e "x;/AAA/date;" /var/spool/mail/mysuser
sed: -e expression #2, char 9: extra characters after command
I am also considering awk without success so far
Any advice?
should be trivial with awk
$ awk -v RS= '/AAA/' file
with a variable, little more needed
$ awk -v RS= -v var='AAA' '$0~var'
or if it's defined elsewhere
$ awk -v RS= -v var="$variable_holding_value" '$0~var'
That is happening because of the single quotes. You need to go out of the single quotes to enable interpolation:
sed -e '/./{H;$!d;}' -e 'x;/'$MyVar'/!d;' /var/spool/mail/mysuser
or, better, put the variable in double quotes:
sed -e '/./{H;$!d;}' -e 'x;/'"$MyVar"'/!d;' /var/spool/mail/mysuser
Thanks to karakfa
It works with :
MyVar="AAA"
awk -v RS= -v X=$MyVar '$0~X' file
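The paragraph-mode approach above can be tried without touching a real mail spool. The following is a rough sketch: the helper name print_paragraphs and the sample text are invented for illustration. It uses index() instead of a regex match, on the assumption that the search string should be taken literally; keep '$0~var' if a regex is what you want.

```shell
# Print every blank-line-separated paragraph of stdin that contains "$1".
# print_paragraphs is a hypothetical helper name.
print_paragraphs() {
    # RS= puts awk in paragraph mode: each record is one paragraph.
    # index() is a plain substring test, so regex metacharacters in the
    # needle are not interpreted. ORS puts a blank line after each hit.
    awk -v RS= -v ORS='\n\n' -v needle="$1" 'index($0, needle)'
}

# Made-up sample input: three paragraphs, two of which mention AAA.
sample='From: alice
Subject: AAA report

From: bob
Subject: nothing here

From: carol
Subject: re: AAA'

MyVar="AAA"
printf '%s\n' "$sample" | print_paragraphs "$MyVar"
```

Quoting "$MyVar" when passing it matters as soon as the value contains spaces or glob characters.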

AWK (igawk) #include statement fails

Here’s what I’m currently trying as a base case with the function definition written manually (which works):
igawk 'function tripleit(x) {return x*3} {print tripleit($1)}' <(echo 5)
Here is a theoretically more practical version calling a function library (which fails):
igawk '@include $HOME/code/thefunc {print tripleit($1)}' <(echo 5)
Here's "thefunc" :
function tripleit(x){return x*3}
If anyone knows HOW or WHY this is failing, and how I can get something like this to work, it would be super-helpful. I love AWK, but I'm not about to type and retype UDFs each and every time I need them.
I have tried to create foo.awk:
function foo(){print "Hello World"}
And call this as suggested:
$ cat foo.awk
function foo(){print "Hello World"}
$ igawk '@include "foo.awk"; BEGIN{foo()}'
igawk:/dev/stdin:0: cannot find "foo.awk";
$ igawk '@include "$PWD/foo.awk"; BEGIN{foo()}'
$ igawk '@include "./foo.awk"; BEGIN{foo()}'
$
No output yet.
awk has no idea what the shell variable $HOME contains, and @include requires a string as its argument.
$ cat foo.awk
function foo() {
print "Hello World"
}
$ gawk '@include $PWD/foo.awk; BEGIN{foo()}'
gawk: cmd. line:1: @include $PWD/foo.awk; BEGIN{foo()}
gawk: cmd. line:1: ^ syntax error
$ gawk '@include "$PWD/foo.awk"; BEGIN{foo()}'
gawk: cmd. line:1: error: can't open source file `$PWD/foo.awk' for reading (No such file or directory)
$ gawk '@include "./foo.awk"; BEGIN{foo()}'
Hello World
You can also use AWKPATH instead of explicitly providing the library directory path every time:
$ echo "$AWKPATH"
$ gawk '@include "foo.awk"; BEGIN{foo()}'
Hello World
$ mkdir blob
$ mv foo.awk blob
$ gawk '@include "foo.awk"; BEGIN{foo()}'
gawk: cmd. line:1: error: can't open source file `foo.awk' for reading (No such file or directory)
$ AWKPATH="$PWD/blob:$AWKPATH" gawk '@include "foo.awk"; BEGIN{foo()}'
Hello World
Hello World
alternatively try:
gawk -f foo.awk -f - <<<'BEGIN{foo()}'
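Another way to pull in a function library, sketched here on the assumption that gawk 4.1 or later is installed, is the -i (--include) option combined with AWKPATH. The temporary directory and the file name thefunc.awk are made up for the demonstration; the function body is the one from the question.

```shell
# Skip quietly when gawk is not available.
if command -v gawk >/dev/null 2>&1; then
    # Hypothetical library directory holding the function from the question.
    libdir=$(mktemp -d)
    printf 'function tripleit(x) { return x * 3 }\n' > "$libdir/thefunc.awk"

    # -i looks the file up along AWKPATH; the main program still comes
    # from the command line, unlike with -f.
    result=$(echo 5 | AWKPATH="$libdir" gawk -i thefunc.awk '{ print tripleit($1) }')
    echo "$result"    # prints: 15

    rm -rf "$libdir"
fi
```

Setting AWKPATH only for the one command keeps the rest of the shell session unaffected.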
(plopping this here in case I run into this again ...)
It took me some fiddling about to get this right, but you can encode AWKPATH (or any other environment variable) into any script like this:
#!/usr/bin/env -S AWKPATH=${HOME}/bin awk -f
@include "utilities.awk"
...
Don't forget to chmod +x the script.
The tricky part was the man page documentation for -S which says
-S, --split-string=S
which seems to imply the following (which fails):
#!/usr/bin/env -S AWKPATH=${HOME}/bin awk -f

Awk replacement pieces size limit

Trying to find a single word and replace it with the contents of a file. Works on MacOS, but not under Linux.
Here is the awk that fails under Linux:
awk -v var="${blah}" '{sub(/%WORD%/,var)}1' file.xml
(file.xml is 122 lines, 4.7K)
Error is:
awk: program limit exceeded: replacement pieces size=255
Same file.xml under MacOS, using a slightly different awk works fine:
awk -v var="${blah//$'\n'/\\n}" '{sub(/%WORD%/,var)1}'
Recompiling awk is not an option. This is Ubuntu 12.04, 32-bit.
You could use sed
FILE=`cat Filename`
sed "s/WORD/${FILE}/g" file.xml > newfile.xml
Turns out that good old 'replace' outperforms awk in this use case--who would have thought?
replace -v "%WORD%" "$blah" -- file.xml
Using GNU Awk version 4 and the readfile extension:
gawk -f a.awk file.xml
where a.awk is:
@load "readfile"
BEGIN{
var = readfile("blah")
if (var == "" && ERRNO != "")
print("problem reading file", ERRNO) > "/dev/stderr"
}
{
sub(/%WORD%/,var)
print
}
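Where the readfile extension is not available, a similar effect can probably be had in plain awk by reading the replacement file with getline and splicing it in by position instead of calling sub(). This is a sketch under assumptions: replace_token is a hypothetical helper name, only the first occurrence of the token on each line is replaced, and the file path is passed with -v, so it must not itself contain backslashes.

```shell
# Usage: replace_token TOKEN REPLACEMENT_FILE < input
# replace_token is a hypothetical helper name.
replace_token() {
    awk -v token="$1" -v repl_file="$2" '
        BEGIN {
            # Read the replacement file line by line; getline copies the
            # text as-is, with no special handling of "&" or backslashes.
            while ((getline line < repl_file) > 0)
                repl = (n++ ? repl "\n" : "") line
            close(repl_file)
        }
        {
            # Splice by position so the replacement is never parsed the
            # way the third argument of sub() would be.
            pos = index($0, token)
            if (pos)
                $0 = substr($0, 1, pos - 1) repl substr($0, pos + length(token))
            print
        }
    '
}

# Made-up demo files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'a & b \ c' > "$tmp/blah"
printf '%s\n' '<root>' '%WORD%' '</root>' > "$tmp/file.xml"
replace_token '%WORD%' "$tmp/blah" < "$tmp/file.xml"
rm -rf "$tmp"
```

Since sub() is not involved, whatever limit it places on the replacement string should not come into play here.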

awk: passing variables from bash

I am getting syntax errors with the following code. Is there an awk version that does not support the "-v" option or am I missing something? Thanks.
#!/usr/local/bin/bash
f_name="crap.stat"
S_Date="2012-02-10"
E_Date="2012-02-13"
awk -F "\t" -v s_date="$S_Date" -v e_date="$E_Date" 'BEGIN {print s_date,e_date}' $f_name
Your code completely works on my awk (GNU Awk 3.1.6).
There is another way, though: if you export your variables, you can read them from the ENVIRON array
$ export f_name="crap.stat"
$ awk '{ print ENVIRON["f_name"] }' anyfile
crap.stat
The default awk program on Solaris 10 (aka oawk) does not seem to support the -v option; the alternative nawk program does support it. Some people switch the name awk so it is a link to nawk, so you can't readily predict which you'll find as awk.
The awk programs on HP-UX 11.x, AIX 6.x and Mac OS X (10.7.x) all support the -v notation, which isn't very surprising since POSIX expects support for -v.
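For comparison, here are the two mechanisms mentioned in the answers side by side, using the dates from the question. This is only a sketch; it assumes an awk that supports -v, which POSIX-conforming implementations should. Since the program consists of a BEGIN rule alone, no input file is needed.

```shell
S_Date="2012-02-10"
E_Date="2012-02-13"

# 1) -v assignments take effect before the BEGIN rule runs.
awk -v s_date="$S_Date" -v e_date="$E_Date" 'BEGIN { print s_date, e_date }'

# 2) Assignments placed before the command name go into the environment
#    of that one awk process and show up in ENVIRON.
s_date="$S_Date" e_date="$E_Date" awk 'BEGIN { print ENVIRON["s_date"], ENVIRON["e_date"] }'
```

Both commands print the two dates separated by a space.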

Awk processing of filenames containing backslash madness

I spent a whole day trying to process some files with backslashes and spaces inside their names. No matter what I do awk (gawk) refuses to print backslashes:
echo "this/pathname/contains/spa ces/and/back\\slashes" | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk "{printf {}}"'
this/pathname/contains/spa ces/and/back\slashes
gawk: {printf this/pathname/contains/spa ces/and/back\slashes}
gawk: ^ syntax error
gawk: {printf this/pathname/contains/spa ces/and/back\slashes}
gawk: ^ backslash not last character on line
This didn't work since the backslash goes directly into the awk code.
echo "this/pathname/contains/spa ces/and/back\\slashes" | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk "{printf \"{}\"}"'
this/pathname/contains/spa ces/and/back\slashes
gawk: warning: escape sequence `\s' treated as plain `s'
this/pathname/contains/spa ces/and/backslashes
This worked, but awk eats the backslash. As you can see above, echo prints it but awk doesn't.
echo "this/pathname/contains/spa ces/and/back\\slashes" | ./escape.sh | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk "{printf \"{}\"}"'
this/pathname/contains/spa\ ces/and/back\slashes
gawk: warning: escape sequence `\ ' treated as plain ` '
gawk: warning: escape sequence `\s' treated as plain `s'
Next I tried escaping the filenames using escape.sh:
#!/bin/bash
xargs -d'\n' -n1 -I{} bash -c 'echo $(printf "%q" "{}")'
Now there's a double backslash in there but awk still complains.
echo "this/pathname/contains/spa ces/and/back\\slashes" | ./escape.sh | xargs -d'\n' -n1 -I{} bash -c 'echo "{}"; echo whatever | gawk -v VAR=$(printf "%q" "{}") "{printf VAR}"'
this/pathname/contains/spa\ ces/and/back\slashes
gawk: ces/and/back\\slashes
gawk: ^ syntax error
gawk: ces/and/back\\slashes
gawk: ^ unterminated regexp
Now awk said some nonsense about some unterminated regexp.
Any ideas? Thanks!
You are solving the wrong problem: regardless of the tool, backslashes and spaces in filenames on UNIX systems will always mean extra work. In my opinion you should sanitize the filenames, then process them.
Try:
sed "s/ /_/g;s/\\\\/-/g"
HTH Chris
The fix is just to double every backslash that is fed into mawk, either in the input or via variables.
Like this:
# awk needs escaped backslashes
VAR=$(echo "$1" | sed -r 's:\\:\\\\:g')
mawk -v VAR="$VAR" -f "script.awk"
Therefore, if a filename containing backslashes is passed inside $1, this is how you obtain the expected result.
I don't understand why you're piping into xargs. Is that a requirement of your process? Can you do something like this:
filename='this/pathname/contains/spa ces/and/back\slashes'
awk -v "fname=$filename" 'BEGIN {print fname}'
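One caveat with -v is that awk processes escape sequences in the assigned value, so a lone backslash can still be lost. The sketch below shows two ways around that, using the filename from the question: passing the value through the environment (ENVIRON values are taken as-is), or doubling the backslashes before using -v. The variable names fname and escaped are illustrative.

```shell
filename='this/pathname/contains/spa ces/and/back\slashes'

# ENVIRON values are not scanned for escape sequences, so the backslash
# arrives intact. Use print (or printf "%s") rather than printf with the
# value as the format string.
fname="$filename" awk 'BEGIN { print ENVIRON["fname"] }'

# With -v, double every backslash first; awk's escape processing then
# turns each pair back into a single backslash.
escaped=$(printf '%s\n' "$filename" | sed 's/\\/\\\\/g')
awk -v fname="$escaped" 'BEGIN { print fname }'
```

Both commands should print the filename unchanged, including the space and the backslash.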