Variable operator for numerical comparison in awk

Can awk use variable operators for numerical comparison? The following code works with a hard coded operator, but not with a variable operator:
awk -v o="$operator" -v c="$comparison" '$1 o c'

No, that cannot work. Awk's -v option defines actual Awk variables, and not token-level macro substitutions.
It doesn't work for the same reason that this doesn't work:
awk 'BEGIN { o = "+"; print 2 o 2 }' # hoping for 2 + 2
Awk is different from the POSIX shell and similar languages; it doesn't evaluate variables by means of textual substitution.
Since you're calling Awk from a shell command line, you can use the shell's substitution to generate the Awk syntax, thereby obtaining that effect:
awk -v c="$comparison" "\$1 $operator c"
We now need a backslash on the $1 because we switched to double quotes, inside of which $1 is now recognized by the shell itself.
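Because the operator text becomes part of the Awk program, it is worth checking it before interpolating. A minimal sketch, assuming a hypothetical shell helper named compare_first_field (not part of the answer above) that accepts only the six comparison operators:

```shell
# compare_first_field is a hypothetical helper, not part of the original answer.
compare_first_field() {
    op=$1
    value=$2
    case $op in
        '=='|'!='|'<'|'>'|'<='|'>=') ;;   # only plain comparison operators pass
        *) echo "unsupported operator: $op" >&2; return 2 ;;
    esac
    # $op is expanded by the shell; \$1 is left alone for awk to interpret
    awk -v c="$value" "\$1 $op c"
}

printf '5\n2\n' | compare_first_field '>' 3    # prints 5
```

Anything other than a listed operator is rejected before awk ever sees it, so the interpolated text cannot smuggle extra Awk code into the program.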

An alternative to the approach proposed by Kaz is to define your own mapping function that takes the two values and the operator string o as arguments:
awk -v o="$operator" -v c="$comparison" '
function operator(arg1, arg2, op) {
if (op == "==") return arg1 == arg2
if (op == "!=") return arg1 != arg2
if (op == "<") return arg1 < arg2
if (op == ">") return arg1 > arg2
if (op == "<=") return arg1 <= arg2
if (op == ">=") return arg1 >= arg2
}
{ print operator($1,c,o) }'
This way you can also define your own operators.
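As a rough illustration of a custom operator, the sketch below adds a made-up "multiple_of" operator to the same kind of mapping function and uses the function as a pattern instead of printing its result. The wrapper name filter_by_op and the operator name are assumptions for this example only.

```shell
# filter_by_op is a hypothetical wrapper; "multiple_of" is a made-up custom operator.
filter_by_op() {
    awk -v o="$1" -v c="$2" '
    function operator(arg1, arg2, op) {
        if (op == ">")  return arg1 > arg2
        if (op == "<")  return arg1 < arg2
        if (op == "multiple_of") return (arg2 != 0) && (arg1 % arg2 == 0)
        return 0    # unknown operator: treat as no match
    }
    operator($1, c, o)    # used as a pattern, so matching lines are printed
    '
}

printf '6\n7\n9\n' | filter_by_op multiple_of 3    # prints 6 and 9
```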

No, but you have a couple of options. The simplest is to let the shell expand one of the variables so that it becomes part of the awk script before awk runs on it:
$ operator='>'; comparison='3'
$ echo 5 | awk -v c="$comparison" '$1 '"$operator"' c'
5
Otherwise you can write your own eval-style function, e.g.:
$ cat tst.awk
cmp($1,o,c)
function cmp(x,y,z, cmd,line,ret) {
cmd = "awk \047BEGIN{print (" x " " y " " z ")}\047"
if ( (cmd | getline line) > 0 ) {
ret = line
}
close(cmd)
return ret
}
$ echo 5 | awk -v c="$comparison" -v o="$operator" -f tst.awk
5
See https://stackoverflow.com/a/54161251/1745001. The latter would work even if your awk program was saved in a file while the former would not. If you want to mix a library of functions with command line scripts then here's one way with GNU awk for -i:
$ cat tst.awk
function cmp(x,y,z, cmd,line,ret) {
cmd = "awk \047BEGIN{print (" x " " y " " z ")}\047"
if ( (cmd | getline line) > 0 ) {
ret = line
}
close(cmd)
return ret
}
$ awk -v c="$comparison" -v o="$operator" -i tst.awk 'cmp($1,o,c)'
5
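Since the eval-style cmp() builds a command line from its arguments, one possible hardening is to validate them first. This is only a sketch under the assumption that both operands are plain decimal numbers; the wrapper name safe_cmp_filter is hypothetical, and it needs no GNU extensions.

```shell
# safe_cmp_filter is a hypothetical wrapper name.
safe_cmp_filter() {
    awk -v o="$1" -v c="$2" '
    function cmp(x, op, y,    cmd, line, ret) {
        # refuse anything that is not a plain comparison of two numbers
        if (op !~ /^(==|!=|<|>|<=|>=)$/) return 0
        if (x !~ /^-?[0-9]+(\.[0-9]+)?$/) return 0
        if (y !~ /^-?[0-9]+(\.[0-9]+)?$/) return 0
        cmd = "awk \047BEGIN{print (" x " " op " " y ")}\047"
        if ((cmd | getline line) > 0) ret = line
        close(cmd)
        return ret + 0    # force a numeric truth value
    }
    cmp($1, o, c)
    '
}

echo 5 | safe_cmp_filter '>' 3    # prints 5
```

Rejected input simply yields no match here; a real script might prefer to report an error instead.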

Related

awk not printing variables as expected

I am trying to get awk to print out the following variables, but no matter what I try,
awk -F, -v x=$CLIENT_ID -v y=$BRANCH -v z=$UUID -v b=$HERMES_GROUP_CSV_ID 'BEGIN {
OFS = ","; ORS = "\n"
} {
if (length($3) == 0) {
printf "\nCLIENT $x at $y Linux System Time: $z Pacific Time: $b #####: Column 3, Row "; printf NR; printf " data missing in the Client $x group input csv. Please check\n"
}
}' ${INPUT_FILE}
it always prints out
CLIENT $x at $y Linux System Time: $z Pacific Time: $b #####: Column 3, Row 249 data missing in the Client $x group input csv. Please check
Could any guru enlighten? Thanks.
You are using $x as a variable reference, but $ in awk is to reference fields in the input. Variables are used without decoration, like x. So:
awk -F, -v x=$CLIENT_ID -v y=$BRANCH -v z=$UUID -v b=$HERMES_GROUP_CSV_ID 'BEGIN {
OFS = ","; ORS = "\n"
} {
if (length($3) == 0) {
printf "\nCLIENT "x" at "y" Linux System Time: "z" Pacific Time: "b" #####: Column 3, Row "; printf NR; printf " data missing in the Client "x" group input csv. Please check\n"
}
}' ${INPUT_FILE}
It looks like x is quoted here, but it is not: the point is to have x appear NOT in the quoted string, so that it can be expanded as a variable.
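An alternative that avoids the concatenation is to keep the variables out of the format string entirely and pass them as printf arguments. A minimal sketch with a shortened message; the helper name report_missing and the sample values are assumptions:

```shell
# report_missing is a hypothetical helper; $1 and $2 stand in for the client and branch.
report_missing() {
    awk -F, -v x="$1" -v y="$2" '
    length($3) == 0 {
        # %s and %d are filled from the awk variables x, y and NR
        printf "CLIENT %s at %s: column 3, row %d data missing\n", x, y, NR
    }'
}

printf 'a,b,c\nd,e,\n' | report_missing acme main
# prints: CLIENT acme at main: column 3, row 2 data missing
```

This form also stays correct if a value happens to contain a % character, which would otherwise be interpreted as part of the format.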

How to evaluate or process if statements in data?

Background
I wrote a bash script that pulls simple user functions from a PostgreSQL database, uses awk to convert PL/pgSQL commands to SQL (for example PERFORM function() to SELECT function(), removing comments --.*, etc.), stores the SQL commands in a file (file.sql), and then reads and executes them in the database:
$ psql ... -f file.sql db
The functions are simple, mostly just calling other user defined functions. But how to "evaluate" or process an IF statement?:
IF $1 = 'customer1' THEN -- THESE $1 MEANS ARGUMENT TO PGPL/SQL FUNCTION
PERFORM subfunction1($1); -- THAT THIS IF STATEMENT IS IN:
ELSE -- SELECT function('customer1');
PERFORM subfunction2($1); -- $1 = 'customer1'
END IF;
Tl;dr:
IFs and such are not SQL, so they should be pre-evaluated using awk. It's safe to assume that the above has already been processed into one record with comments removed:
IF $1 = 'customer1' THEN PERFORM subfunction1($1); ELSE PERFORM subfunction2($1); END IF;
After "evaluating", the above should be replaced with:
SELECT subfunction1('customer1');
if the awk to evaluate it was called:
$ awk -v arg1='customer1' -f program.awk file.sql
or if arg1 is anything else, for example for customer2:
SELECT subfunction2('customer2');
Edit
expr popped into my mind first thing when I woke up:
$ awk -v arg="'customer1'" '
{
gsub(/\$1/,arg) # replace func arg with string
n=split($0,a,"(IF|THEN|ELSE|ELSE?IF|END IF;)",seps) # seps to get ready for SQL CASE
if(seps[1]=="IF") {
# here should be while for ELSEIF
c="expr " a[2]; c|getline r; close(c) # use expr to solve
switch (r) { # expr has 4 return values
case "1": # match
print a[3]
break
case "0": # no match
print a[4]
break
default: # (*) see below
print r
exit # TODO
} } }' file.sql
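(*) expr has four possible exit statuses (0, 1, 2 or 3), but what getline reads is its output, which is 1 for a true comparison and 0 for a false one: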
$ expr 1 = 1
1
$ expr 1 = 2
0
However, if you omit spaces:
$ expr 1=1
1=1
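What expr prints and what it returns as an exit status are two separate channels, and they can be inspected side by side. A small sketch; expr_result is a hypothetical helper name:

```shell
# expr_result is a hypothetical helper that reports both the output and the exit status.
expr_result() {
    out=$(expr "$@") && status=0 || status=$?
    printf 'output=%s status=%s\n' "$out" "$status"
}

expr_result 1 = 1     # output=1 status=0
expr_result 1 = 2     # output=0 status=1
expr_result 1=1       # output=1=1 status=0 (a single string argument, not a comparison)
```

The awk script above reads the output through getline, so the exit status never reaches it.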
Without writing a full language parser, if you're looking for something cheap and cheerful then this might be a decent starting point:
$ cat tst.awk
{ gsub(/\$1/,"\047"arg1"\047") }
match($0,/^IF\s+(\S+)\s+(\S+)\s+(\S+)\s+THEN\s+(\S+)\s+(\S+)\s+ELSE\s+(\S+)\s+(\S+)\s+END\s+IF/,a) {
lhs = a[1]
op = a[2]
rhs = a[3]
trueAct = (a[4] == "PERFORM" ? "SELECT" : a[4]) FS a[5]
falseAct = (a[6] == "PERFORM" ? "SELECT" : a[6]) FS a[7]
if (op == "=") {
print (lhs == rhs ? trueAct : falseAct)
}
}
$ awk -v arg1='customer1' -f tst.awk file
SELECT subfunction1('customer1');
$ awk -v arg1='bob' -f tst.awk file
SELECT subfunction2('bob');
The above uses GNU awk for the 3rd arg to match(). Hopefully it's easy enough to understand that you can massage as needed to handle other constructs or other variations of this construct.

Delete a variable in awk

I wonder if it is possible to delete a variable in awk. For an array, you can say delete a[2] and the index 2 of the array a[] will be deleted. However, for a variable I cannot find a way.
The closest I get is to say var="" or var=0.
But then, it seems that the default value of a non-existing variable is 0 or False:
$ awk 'BEGIN {if (b==0) print 5}'
5
$ awk 'BEGIN {if (!b) print 5}'
5
So I also wonder if it is possible to distinguish between a variable that is set to 0 and a variable that has not been set, because it seems not to:
$ awk 'BEGIN {a=0; if (a==b) print 5}'
5
There is no operation to unset/delete a variable. The only time a variable becomes unset again is at the end of a function call when it's an unused function argument being used as a local variable:
$ cat tst.awk
function foo( arg ) {
if ( (arg=="") && (arg==0) ) {
print "arg is not set"
}
else {
printf "before assignment: arg=<%s>\n",arg
}
arg = rand()
printf "after assignment: arg=<%s>\n",arg
print "----"
}
BEGIN {
foo()
foo()
}
$ awk -f tst.awk file
arg is not set
after assignment: arg=<0.237788>
----
arg is not set
after assignment: arg=<0.291066>
----
so if you want to perform some actions A then unset the variable X and then perform actions B, you could encapsulate A and/or B in functions using X as a local var.
Note though that the default value is zero or null, not zero or false, since its type is "numeric string".
You test for an unset variable by comparing it to both null and zero:
$ awk 'BEGIN{ if ((x=="") && (x==0)) print "y" }'
y
$ awk 'BEGIN{ x=0; if ((x=="") && (x==0)) print "y" }'
$ awk 'BEGIN{ x=""; if ((x=="") && (x==0)) print "y" }'
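The null-and-zero test can also be wrapped in a small function so the intent is visible at the call site. A sketch, with is_unset and show_unset_states as hypothetical names:

```shell
# show_unset_states is a hypothetical wrapper; is_unset applies the null-and-zero test.
show_unset_states() {
    awk '
    function is_unset(v) { return (v == "") && (v == 0) }
    BEGIN {
        zero = 0
        empty = ""
        print "never_assigned:", is_unset(never_assigned)
        print "zero:", is_unset(zero)
        print "empty:", is_unset(empty)
    }'
}

show_unset_states
```

Only the variable that was never assigned should report 1; a numeric 0 fails the comparison with "" and an empty string fails the comparison with 0.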
If you NEED to have a variable you delete then you can always use a single-element array:
$ awk 'BEGIN{ if ((x[1]=="") && (x[1]==0)) print "y" }'
y
$ awk 'BEGIN{ x[1]=""; if ((x[1]=="") && (x[1]==0)) print "y" }'
$ awk 'BEGIN{ x[1]=""; delete x; if ((x[1]=="") && (x[1]==0)) print "y" }'
y
but IMHO that obfuscates your code.
What would be the use case for unsetting a variable? What would you do with it that you can't do with var="" or var=0?
An unset variable expands to "" or 0, depending on the context in which it is being evaluated.
For this reason, I would say that it's a matter of preference and depends on the usage of the variable.
Given that we use a + 0 (or the slightly controversial +a) in the END block to coerce the potentially unset variable a to a numeric type, I guess you could argue that the natural "empty" value would be "".
I'm not sure that there's too much to read in to the cases that you've shown in the question, given the following:
$ awk 'BEGIN { if (!"") print 5 }'
5
("" is false, unsurprisingly)
$ awk 'BEGIN { if (b == "") print 5 }'
5
(unset variable evaluates equal to "", just the same as 0)

Redirect input for gawk to a system command

Usually a gawk script processes each line of its stdin. Is it possible to instead specify a system command in the script and have the rest of the script process each line of that command's output?
For example consider the following simple interaction:
$ { echo "abc"; echo "def"; } | gawk '{print NR ":" $0; }'
1:abc
2:def
I would like to get the same output without using pipe, specifying instead the echo commands as a system command.
I can of course use the pipe but that would force me to either use two different scripts or specify the gawk script inside the bash script and I am trying to avoid that.
UPDATE
The previous example is not quite representative of my usecase, this is somewhat closer:
$ { echo "abc"; echo "def"; } | gawk '/d/ {print NR ":" $0; }'
2:def
UPDATE 2
A shell script parallel would be as follows. Without the exec line the script would read from stdin; with the exec line it uses the output of the command on that line as input:
/tmp> cat t.sh
#!/bin/bash
exec 0< <(echo abc; echo def)
while read l; do
echo "line:" $l
done
/tmp> ./t.sh
line: abc
line: def
From all of your comments, it sounds like what you want is:
$ cat tst.awk
BEGIN {
if ( ("mktemp" | getline file) > 0 ) {
system("(echo abc; echo def) > " file)
ARGV[ARGC++] = file
}
close("mktemp")
}
{ print FILENAME, NR, $0 }
END {
if (file!="") {
system("rm -f \"" file "\"")
}
}
$ awk -f tst.awk
/tmp/tmp.ooAfgMNetB 1 abc
/tmp/tmp.ooAfgMNetB 2 def
but honestly, I wouldn't do it. You're munging what the shell is good at (creating/destroying files and processes) with what awk is good at (manipulating text).
I believe what you're looking for is getline:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ print line} }' <<< ''
abc
def
Adjusting the answer to your second example:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ counter++; if ( line ~ /d/){print counter":"line} } }' <<< ''
2:def
Let's break it down:
awk '{
cmd = "echo abc; echo def"
# the line below reads each line of output from cmd into the variable line
while ( ( cmd | getline line) > 0){
# we need a counter because NR will not work for us
counter++;
# if the line contains the letter d
if ( line ~ /d/){
print counter":"line
}
}
}' <<< ''
2:def
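The same loop can also live in a BEGIN block, in which case awk needs no input at all and the empty here-string can be dropped. A sketch, assuming a hypothetical wrapper number_matching_lines that takes the command and a regex as arguments:

```shell
# number_matching_lines is a hypothetical wrapper: $1 = command to run, $2 = regex to match.
number_matching_lines() {
    awk -v cmd="$1" -v re="$2" 'BEGIN {
        while ((cmd | getline line) > 0) {
            n++                              # private counter, independent of NR
            if (line ~ re) print n ":" line
        }
        close(cmd)                           # release the pipe when done
    }'
}

number_matching_lines 'echo abc; echo def' d    # prints 2:def
```

Because the program consists only of a BEGIN block, awk exits after the loop without waiting on stdin.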

AWK - execute string as command?

This command prints:
$ echo "123456789" | awk '{ print substr ($1,1,4) }'
1234
Is it possible to execute a string as command? For example, this command:
echo "123456789" | awk '{a="substr"; print a ($1,1,4) }'
Result:
$ echo "123456789" | awk '{a="substr"; print a ($1,1,4) }'
awk: {a="substr"; print a ($1,1,4) }
awk: ^ syntax error
EDIT:
$ cat tst.awk
function my_substr(x,y,z) { return substr(x,y,z) }
{ a="my_substr"; print @a($1,1,4) }
bolek@bolek-desktop:~/Pulpit$ echo "123456789" | gawk -f tst.awk
gawk: tst.awk:3: { a="my_substr"; print @a($1,1,4) }
gawk: tst.awk:3: ^ invalid char '@' in expression
bolek@bolek-desktop:~/Pulpit$
I don't think it is possible to do that in awk directly, but you can get a similar effect by using the shell. Recall that the awk program is given as a string, and strings are concatenated in the shell just by writing them next to one another. Thus, you can do this:
a=substr
echo "123456789" | awk '{ print '"$a"'($1, 1, 4) }'
resulting in 1234.
You can call user-defined functions via variables in GNU awk using indirect function calls, see http://www.gnu.org/software/gawk/manual/gawk.html#Indirect-Calls
$ cat tst.awk
function foo() { print "foo() called" }
function bar() { print "bar() called" }
BEGIN {
the_func = "foo"
@the_func()
the_func = "bar"
@the_func()
}
$ gawk -f tst.awk
foo() called
bar() called
Unfortunately due to internal implementation issues, if you want to call builtin functions that way then you need to write a wrapper for each:
$ cat tst.awk
function my_substr(x,y,z) { return substr(x,y,z) }
{ a="my_substr"; print @a($1,1,4) }
$ echo "123456789" | gawk -f tst.awk
1234
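Where GNU awk is not available, a similar effect can be approximated in any POSIX awk with an explicit dispatch function that maps names to calls. This is only a sketch; dispatch and call_by_name are hypothetical names, and the list of supported functions is arbitrary:

```shell
# call_by_name is a hypothetical wrapper: $1 = name of the function to apply to field 1.
call_by_name() {
    awk -v fname="$1" '
    function dispatch(name, s, a, b) {
        if (name == "substr")  return substr(s, a, b)
        if (name == "toupper") return toupper(s)
        if (name == "length")  return length(s)
        return ""    # unknown name: return an empty string
    }
    { print dispatch(fname, $1, 1, 4) }'
}

echo "123456789" | call_by_name substr    # prints 1234
```

The trade-off is that every callable function has to be listed by hand, whereas gawk's indirect calls resolve the name at run time.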