I would like to build a single string (in order to partially answer my earlier question: AWK Assignment and execute operation with variables, using split and concatenation without space).
% awk 'BEGIN { str1 = "foo"; str2 = "bar"; str3 = str1 str2; print str3 }'
foobar
That is easy, but the values above are static.
Taking into account:
% echo $(echo "foo")
foo
Now, I would like to "calculate" the value of str1.
% awk 'BEGIN { str1 = $(echo "foo"); str2 = "bar"; str3 = str1 str2; print str3 }'
awk: illegal field $(foo), name "(null)"
source line number 1
Is it possible to assign the value of str1 dynamically (as the output of another command) using AWK?
With help from @anubhava, I get:
% awk -v str1="$(echo "foo")" 'BEGIN {str2 = "bar"; print str1 str2 }'
foobar
Now, how can I use the first variable when assigning the second variable?
% awk -v str1="$(echo "foo")" -v str2="$(echo str1)bar" 'BEGIN {my operation with str2 }'
But currently I get:
% awk -v str1="$(echo "foo")" -v str2="$(echo str1)bar" 'BEGIN {print str2 }'
str1bar
A partial solution:
% str1="$(echo 'foo')"; str2="$(echo ${str1}'bar')";awk -v result="$str2" 'BEGIN{print result}'
foobar
As I mentioned in my comment, do shell stuff in shell and awk stuff in awk. Don't try to hamfist your shell logic into your awk script.
Consider your attempt:
awk -v str1="$(echo "foo")" -v str2="$(echo str1)bar" 'BEGIN {my operation with str2 }'
You want, in shell, to echo "foo" into a variable. Then, in shell, you want to concatenate "bar" with that prior variable. So... do it in shell before calling your awk script:
str1="$(echo 'foo')"
str2="$(echo ${str1}'bar')"
awk -v str2="$str2" 'BEGIN { my operation with str2 }'
All that -v flag does is say "Set my internal awk variable to this value" so there is no reason to try to hamfist logic into those flags.
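If the value really has to be computed from inside awk, one option is awk's `command | getline var` form, which reads one line of a command's output into a variable. A minimal sketch, assuming the command is the same `echo foo` used in the question (the wrapper name concat_from_command is made up for illustration). For a case this simple, doing the work in the shell first is still less error-prone:

```shell
# Hypothetical wrapper: build "foobar" with str1 coming from a command run by awk.
# usage: concat_from_command    (prints: foobar)
concat_from_command() {
  awk 'BEGIN {
    cmd = "echo foo"                    # command whose output becomes str1
    if ((cmd | getline str1) <= 0) {    # getline returns 1 on success, 0 on EOF, -1 on error
      str1 = ""
    }
    close(cmd)                          # release the pipe
    str2 = str1 "bar"                   # plain awk concatenation
    print str2
  }'
}
```

The command string is passed to the shell by awk, so this only makes sense when that string is under your control.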
You're overcomplicating things or the example you chose is too simplistic. There is no job for awk here, all can be done simply on command line.
$ str1=foo; str2=bar; echo ${str1}${str2}
foobar
I'm processing a Wireshark config file (dfilter_buttons) for display filters and would like to print out the filter of a given name. The content of file is like:
Sample input
"TRUE","test","sip contains \x22Hello, world\x5cx22\x22",""
And the resulting output should have the escape sequences replaced, so I can use them later in my script:
Desired output
sip contains "Hello, world\x22"
My first pass is like this:
Current parser
filter_name=test
awk -v filter_name="$filter_name" 'BEGIN {FS="\",\""} ($2 == filter_name) {print $3}' "$config_file"
And my output is this:
Current output
sip contains \x22Hello, world\x5cx22\x22
I know I can handle these exact two escape sequences by piping to sed and matching those exact two sequences, but is there a generic way to substitute all escape sequences? Future filters I build may use more escape sequences than just " and \, and I would like to handle those scenarios too.
Using gnu-awk you can do this with the split, substr and strtonum functions:
awk -F '","' -v filt='test' '$2 == filt {n = split($3, subj, /\\x[0-9a-fA-F]{2}/, seps); for (i=1; i<n; ++i) printf "%s%c", subj[i], strtonum("0" substr(seps[i], 2)); print subj[i]}' file
sip contains "Hello, world\x22"
A more readable form:
awk -F '","' -v filt='test' '
$2 == filt {
   n = split($3, subj, /\\x[0-9a-fA-F]{2}/, seps)
   for (i=1; i<n; ++i)
      printf "%s%c", subj[i], strtonum("0" substr(seps[i], 2))
   print subj[i]
}' file
Explanation:
Using -F '","' we split input using delimiter ","
$2 == filt we filter input for $2 == "test" condition
Using /\\x[0-9a-fA-F]{2}/ as the regex (which matches \x followed by two hex digits) we split $3, saving the split tokens into array subj and the matched separators into array seps
Using substr we remove the first char, i.e. \, and prepend 0, turning \x22 into 0x22
Using strtonum we convert that hex string to the equivalent ASCII code
Using %c in printf we print corresponding ascii character
Last for loop joins $3 back using subj and seps array elements
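If gawk is not available, roughly the same decoding can be done with POSIX awk features only. The following is a sketch, not a drop-in replacement: it swaps the 4-argument split() and strtonum() for a match() loop and a hand-written hex conversion, reads the config from standard input, and the names print_filter and hex2dec are made up for illustration.

```shell
# Hypothetical helper; usage: print_filter NAME < dfilter_buttons
print_filter() {
  awk -F '","' -v filt="$1" '
  # Convert a string of hex digits to a number (stand-in for strtonum).
  function hex2dec(h,    i, n) {
    h = tolower(h)
    n = 0
    for (i = 1; i <= length(h); i++)
      n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
    return n
  }
  $2 == filt {
    rest = $3
    out = ""
    # Peel off one \xHH escape per iteration and append the decoded character.
    while (match(rest, /\\x[0-9a-fA-F][0-9a-fA-F]/)) {
      out = out substr(rest, 1, RSTART - 1) sprintf("%c", hex2dec(substr(rest, RSTART + 2, 2)))
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}
```

With the sample line from the question on standard input, `print_filter test` should print sip contains "Hello, world\x22". Like the gawk version, it assumes every escape is \x followed by exactly two hex digits.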
Using GNU awk for FPAT, gensub(), strtonum(), and the 3rd arg to match():
$ cat tst.awk
BEGIN { FPAT="([^,]*)|(\"[^\"]*\")"; OFS="," }
$2 == ("\"" filter_name "\"") {
    gsub(/^"|"$/,"",$3)
    while ( match($3,/(\\x[0-9a-fA-F]{2})(.*)/,a) ) {
        printf "%s%c", substr($3,1,RSTART-1), strtonum(gensub(/./,0,1,a[1]))
        $3 = a[2]
    }
    print $3
}
$ awk -v filter_name='test' -f tst.awk file
sip contains "Hello, world\x22"
The above assumes your escape sequences are always \x followed by exactly 2 hex digits. It isolates every \xHH string in the input, replaces \ with 0 in that string so that strtonum() can then convert the string to a number, then uses %c in the printf formatting string to convert that number to a character.
Note that GNU awk has a debugger (see https://www.gnu.org/software/gawk/manual/gawk.html#Debugger) so if you're ever not sure what any part of a program does you can just run it in the debugger (-D) and trace it, e.g. in the following I plant a breakpoint to tell awk to stop at line 1 of the script (b 1), then start running (r) and then step (s) through the script, printing the value of $3 (p $3) at each line so I can see how it changes after the gsub():
$ awk -D -v filter_name='test' -f tst.awk file
gawk> b 1
Breakpoint 1 set at file `tst.awk', line 1
gawk> r
Starting program:
Stopping in BEGIN ...
Breakpoint 1, main() at `tst.awk':1
1 BEGIN { FPAT="([^,]*)|(\"[^\"]*\")"; OFS="," }
gawk> p $3
$3 = uninitialized field
gawk> s
Stopping in Rule ...
2 $2 == "\"" filter_name "\"" {
gawk> p $3
$3 = "\"sip contains \\x22Hello, world\\x5cx22\\x22\""
gawk> s
3 gsub(/^"|"$/,"",$3)
gawk> p $3
$3 = "\"sip contains \\x22Hello, world\\x5cx22\\x22\""
gawk> s
4 while ( match($3,/(\\x[0-9a-fA-F]{2})(.*)/,a) ) {
gawk> p $3
$3 = "sip contains \\x22Hello, world\\x5cx22\\x22"
This awk command:
awk -F ',' 'BEGIN {line=1} {print line "\n0" gsub(/\./, ",", $2) "0 --> 0" gsub(/\./, ",", $3) "0\n" $10 "\n"; line++}' file
is supposed to convert these lines:
Dialogue: 0,1:51:19.56,1:51:21.13,Default,,0000,0000,0000,,Hello!
into these:
1273
01:51:19,560 --> 01:51:21,130
Hello!
But somehow I can't get gsub to replace the . with , and instead get 010 in place of both gsub results. Can anyone spot the issue?
Thanks
The return value from gsub is not the result from the substitution. It returns the number of substitutions it performed.
You want to gsub first, then print the modified string, which is the third argument you pass to gsub.
awk -F ',' 'BEGIN {line=1}
  { gsub(/\./, ",", $2);
    gsub(/\./, ",", $3);
    print line "\n0" $2 "0 --> 0" $3 "0\n" $10 "\n";
    line++}' file
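This also explains the 010 in the question: each gsub call made one replacement and returned 1, which was then concatenated between the two literal 0 strings. A small sketch that prints the return value next to the modified field (the function name gsub_demo is made up for illustration):

```shell
# Hypothetical demo: show the return value of gsub next to the string it modified.
# usage: echo '1:51:19.56' | gsub_demo    (prints: 1 1:51:19,56)
gsub_demo() {
  awk '{
    count = gsub(/\./, ",", $1)   # count = number of replacements; $1 is edited in place
    print count, $1
  }'
}
```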
Another way is to use GNU awk's gensub instead of gsub:
$ awk -F ',' '
{
print NR ORS "0" gensub(/\./, ",","g", $2) "0 --> 0" gensub(/\./, ",","g",$3) "0" ORS $10 ORS
}' file
Output:
1
01:51:19,560 --> 01:51:21,130
Hello!
It's not as readable as the gsub solution by @tripleee, but there is a place for it.
I also replaced the line counter with the built-in NR and the \n's with ORS.
Can awk use variable operators for numerical comparison? The following code works with a hard coded operator, but not with a variable operator:
awk -v o="$operator" -v c="$comparison" '$1 o c'
No, that cannot work. Awk's -v option defines actual Awk variables, and not token-level macro substitutions.
It doesn't work for the same reason that this doesn't work:
awk 'BEGIN { o = "+"; print 2 o 2 }' # hoping for 2 + 2
Awk is different from the POSIX shell and similar languages; it doesn't evaluate variables by means of textual substitution.
Since you're calling Awk from a shell command line, you can use the shell's substitution to generate the Awk syntax, thereby obtaining that effect:
awk -v c="$comparison" "\$1 $operator c"
We now need a backslash on the $1 because we switched to double quotes, inside of which $1 is now recognized by the shell itself.
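One thing to keep in mind with this approach: whatever is in $operator becomes part of the awk program text, so it is worth checking it before use if it can come from outside the script. A minimal sketch, assuming only the six usual comparison operators should be accepted (the function name safe_compare and its exit status 2 are made up for illustration):

```shell
# Hypothetical helper; usage: safe_compare OPERATOR VALUE < input
safe_compare() {
  op=$1
  value=$2
  case $op in
    '=='|'!='|'<'|'>'|'<='|'>=') ;;                       # known comparison operators only
    *) echo "unsupported operator: $op" >&2; return 2 ;;  # refuse anything else
  esac
  # The shell expands $op into the program text; \$1 stays an awk field reference.
  awk -v c="$value" "\$1 $op c"
}
```

For example, `printf '1\n5\n7\n' | safe_compare '>' 3` should print 5 and 7.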
An alternative to the approach proposed by Kaz is to define your own mapping function that takes the two operands and the operator string o as arguments:
awk -v o="$operator" -v c="$comparison" '
function operator(arg1, arg2, op) {
    if (op == "==") return arg1 == arg2
    if (op == "!=") return arg1 != arg2
    if (op == "<") return arg1 < arg2
    if (op == ">") return arg1 > arg2
    if (op == "<=") return arg1 <= arg2
    if (op == ">=") return arg1 >= arg2
}
{ print operator($1,c,o) }'
This way you can also define your own operators.
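The script above prints the result of the comparison (1 or 0) for every line. To select lines instead, the function call can be used as the pattern. A sketch along those lines, with two assumptions that are not in the answer: operands are forced to numbers with + 0 (so only numeric comparison is covered), and an unknown operator selects nothing. The names filter_by_op and compare are made up for illustration.

```shell
# Hypothetical helper; usage: filter_by_op OPERATOR VALUE < input
filter_by_op() {
  awk -v o="$1" -v c="$2" '
  function compare(a, b, op) {
    a += 0; b += 0                     # force numeric comparison
    if (op == "==") return (a == b)
    if (op == "!=") return (a != b)
    if (op == "<")  return (a <  b)
    if (op == ">")  return (a >  b)
    if (op == "<=") return (a <= b)
    if (op == ">=") return (a >= b)
    return 0                           # unknown operator: select nothing
  }
  compare($1, c, o)                    # a pattern with no action prints matching lines
  '
}
```

For example, `printf '10\n9\n2\n' | filter_by_op '>' 9` should print only 10.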
No, but you have a couple of options, the simplest being to let the shell expand one of the variables to become part of the awk script before awk runs on it:
$ operator='>'; comparison='3'
$ echo 5 | awk -v c="$comparison" '$1 '"$operator"' c'
5
Otherwise you can write your own eval-style function, e.g.:
$ cat tst.awk
cmp($1,o,c)
function cmp(x,y,z, cmd,line,ret) {
    cmd = "awk \047BEGIN{print (" x " " y " " z ")}\047"
    if ( (cmd | getline line) > 0 ) {
        ret = line
    }
    close(cmd)
    return ret
}
$ echo 5 | awk -v c="$comparison" -v o="$operator" -f tst.awk
5
See https://stackoverflow.com/a/54161251/1745001. The latter would work even if your awk program was saved in a file while the former would not. If you want to mix a library of functions with command line scripts then here's one way with GNU awk for -i:
$ cat tst.awk
function cmp(x,y,z, cmd,line,ret) {
    cmd = "awk \047BEGIN{print (" x " " y " " z ")}\047"
    if ( (cmd | getline line) > 0 ) {
        ret = line
    }
    close(cmd)
    return ret
}
$ awk -v c="$comparison" -v o="$operator" -i tst.awk 'cmp($1,o,c)'
5
I'm having trouble using toupper() inside a gawk sub(). I'm using the feature that & substitutes for the matched string.
$ gawk '{sub(/abc/, toupper("&")); print $0; }'
xabcx
xabcx
I expected:
xABCx
Variants with toupper() but without & and with & but without toupper() work:
$ gawk '{sub(/abc/, toupper("def")); print $0; }'
xabcx
xDEFx
$ gawk '{sub(/abc/, "-&-"); print $0; }'
xabcx
x-abc-x
It fails similarly with tolower(). Am I misunderstanding something about how & works?
(Tested with gawk 3.1.x and the latest, 4.1.3).
I think I see what's going on: the toupper function is being evaluated first, before sub constructs the replacement string.
So you get
sub(/abc/, toupper("def")) => sub(/abc/, "DEF")
and the not-so-useful
sub(/abc/, toupper("&")) => sub(/abc/, "&")
To get your desired results, you have to extract the match first, upper-case it, and then perform the substitution:
$ echo foobar | gawk '{sub(/o+/, toupper("&")); print}'
foobar
$ echo foobar | gawk '{
    if (match($0, /o+/, m)) {
        replacement = toupper(m[0])
        sub(/o+/, replacement)
    }
    print
}'
fOObar
Alternatively, you don't need the sub, you can reconstruct the record thusly:
echo foobar | gawk '{
    if (match($0, /o+/, m)) {
        $0 = substr($0, 1, RSTART-1) toupper(m[0]) substr($0, RSTART+RLENGTH)
    }
    print
}'
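The third argument to match() used above is a gawk extension. The same reconstruction can be written with the standard two-argument match(), which sets RSTART and RLENGTH. A sketch, with the regex passed in through -v (the function name upcase_match is made up for illustration; note that awk processes backslash escapes in -v values):

```shell
# Hypothetical helper; usage: upcase_match REGEX < input
upcase_match() {
  awk -v re="$1" '{
    if (match($0, re)) {                    # sets RSTART and RLENGTH on success
      before = substr($0, 1, RSTART - 1)
      hit    = substr($0, RSTART, RLENGTH)  # the matched text, i.e. what & stands for
      after  = substr($0, RSTART + RLENGTH)
      $0 = before toupper(hit) after
    }
    print
  }'
}
```

For example, `echo xabcx | upcase_match abc` should print xABCx. Like sub(), this only touches the first match on each line.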
I wonder if it is possible to delete a variable in awk. For an array, you can say delete a[2] and the index 2 of the array a[] will be deleted. However, for a variable I cannot find a way.
The closest I get is to say var="" or var=0.
But then, it seems that the default value of a non-existing variable is 0 or False:
$ awk 'BEGIN {if (b==0) print 5}'
5
$ awk 'BEGIN {if (!b) print 5}'
5
So I also wonder if it is possible to distinguish between a variable that is set to 0 and a variable that has not been set, because it seems not to be:
$ awk 'BEGIN {a=0; if (a==b) print 5}'
5
There is no operation to unset/delete a variable. The only time a variable becomes unset again is at the end of a function call when it's an unused function argument being used as a local variable:
$ cat tst.awk
function foo( arg ) {
    if ( (arg=="") && (arg==0) ) {
        print "arg is not set"
    }
    else {
        printf "before assignment: arg=<%s>\n",arg
    }
    arg = rand()
    printf "after assignment: arg=<%s>\n",arg
    print "----"
}
BEGIN {
    foo()
    foo()
}
$ awk -f tst.awk file
arg is not set
after assignment: arg=<0.237788>
----
arg is not set
after assignment: arg=<0.291066>
----
so if you want to perform some actions A then unset the variable X and then perform actions B, you could encapsulate A and/or B in functions using X as a local var.
Note though that the default value is zero or null, not zero or false, since its type is "numeric string".
You test for an unset variable by comparing it to both null and zero:
$ awk 'BEGIN{ if ((x=="") && (x==0)) print "y" }'
y
$ awk 'BEGIN{ x=0; if ((x=="") && (x==0)) print "y" }'
$ awk 'BEGIN{ x=""; if ((x=="") && (x==0)) print "y" }'
If you NEED to have a variable you delete then you can always use a single-element array:
$ awk 'BEGIN{ if ((x[1]=="") && (x[1]==0)) print "y" }'
y
$ awk 'BEGIN{ x[1]=""; if ((x[1]=="") && (x[1]==0)) print "y" }'
$ awk 'BEGIN{ x[1]=""; delete x; if ((x[1]=="") && (x[1]==0)) print "y" }'
y
but IMHO that obfuscates your code.
What would be the use case for unsetting a variable? What would you do with it that you can't do with var="" or var=0?
An unset variable expands to "" or 0, depending on the context in which it is being evaluated.
For this reason, I would say that it's a matter of preference and depends on the usage of the variable.
Given that we use a + 0 (or the slightly controversial +a) in the END block to coerce the potentially unset variable a to a numeric type, I guess you could argue that the natural "empty" value would be "".
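To make the a + 0 idiom concrete, here is a small sketch comparing the two forms on empty input, where the accumulator is never assigned (the function names sum_first_field and raw_total are made up for illustration):

```shell
# Hypothetical helpers: sum the first field of every input line.
sum_first_field() {
  awk '{ total += $1 } END { print total + 0 }'   # + 0 forces a number: prints 0 on empty input
}

raw_total() {
  awk '{ total += $1 } END { print total }'       # unset total prints as an empty line
}
```

With no input lines, `sum_first_field </dev/null` prints 0 while `raw_total </dev/null` prints an empty line. With real input both print the same sum.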
I'm not sure that there's too much to read into the cases that you've shown in the question, given the following:
$ awk 'BEGIN { if (!"") print 5 }'
5
("" is false, unsurprisingly)
$ awk 'BEGIN { if (b == "") print 5 }'
5
(unset variable evaluates equal to "", just the same as 0)