How to use the eval statement in (g)awk?

Since awk does not seem to have callbacks, I was planning to use the eval statement for this, so I had a look at the GNU user guide,
https://www.gnu.org/software/gawk/manual/html_node/Viewing-And-Changing-Data.html
and then wrote this simple script.
BEGIN {
args[1]="\"a\""
args[2]="\"b\""
args[3]="\"c\""
args[4]="\"\""
run_callback("printargs",args)
print args[4]
}
function run_callback(callback,args)
{
nargs=length(args)
if (nargs>0)
{
argstring=args[1]
for (argn=2;argn<=nargs;argn++)
{
argstring=argstring","args[argn]
}
}
callbackstr = callback"("argstring")"
print callbackstr
eval callbackstr
}
function printargs(arg1,arg2,arg3,res)
{
res=arg1","arg2","arg3
print "res="res
}
However, the printout is not what I expected. I get this,
[~]-> gawk -f callback.awk
printargs(a,b,c,"")
""
And not the expected,
[~]-> gawk -f callback.awk
printargs(a,b,c,"")
res=a,b,c
"Not sure what is supposed to be here, but it is not relevant."
It feels as if nothing actually happens inside the eval statement. Does anyone know what is going on here?
gawk version is 4.1.3
BR
Patrik

That page is part of the documentation for the gawk debugger: eval is a debugger command, not a normal gawk statement or function.
However, gawk (4.0 and later) does support calling a function whose name is stored in a variable, with the @var(args,...) notation, called an indirect function call (more information in the documentation):
BEGIN {
args[1]="a"
args[2]="b"
args[3]="c"
args[4]="\"\""
run_callback("printargs",args[1],args[2],args[3],args[4])
print args[4]
}
function run_callback(callback,arg1,arg2,arg3,res)
{
@callback(arg1,arg2,arg3,res);
}
function printargs(arg1,arg2,arg3,res)
{
res=arg1","arg2","arg3
print "res="res
}
when run will print out
res=a,b,c
""
Note that args[4] isn't modified from this. From the documentation on function argument passing convention:
Instead, the passing convention is determined at runtime when the function is called, according to the following rule: if the argument is an array variable, then it is passed by reference. Otherwise, the argument is passed by value.
If you passed args directly and modified elements of it in the callback, you'd see the changes reflected.
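The indirect call above is gawk-specific. In an awk without indirect function calls (for example mawk, or the BWK awk on macOS), a hand-written dispatcher is a common substitute. This is a minimal sketch under assumptions: the run_demo wrapper and the convention of passing the whole args array to the callback are hypothetical. Because the callback receives the array, its write to args[4] is visible to the caller, which is the pass-by-reference behaviour described just above.

```shell
# run_demo: hypothetical wrapper around a portable (POSIX awk) dispatcher
run_demo() {
  awk '
  # The callee receives the whole array, so writing args[4] is visible
  # to the caller (arrays are passed by reference).
  function printargs(args) {
      args[4] = args[1] "," args[2] "," args[3]
      print "res=" args[4]
  }
  # Plain dispatcher: every callable name has to be listed here by hand,
  # which is what gawk indirect function calls save you from.
  function run_callback(callback, args) {
      if (callback == "printargs") { printargs(args); return 1 }
      return 0    # unknown callback name
  }
  BEGIN {
      args[1] = "a"; args[2] = "b"; args[3] = "c"; args[4] = ""
      run_callback("printargs", args)
      print args[4]
  }'
}
run_demo
```

This prints res=a,b,c followed by a,b,c, showing that the result came back through the array. The cost of the design is that run_callback must be edited whenever a new callback is added.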

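awk doesn't have any eval keyword. In the statement eval callbackstr, eval is simply an uninitialized variable that gets concatenated with callbackstr, and the resulting string is thrown away, so nothing is executed.
This can be checked with gawk's --dump-variables option: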
gawk --dump-variables -f callback.awk
It outputs the file awkvars.out and you'll find in it:
eval: uninitialized scalar

Related

How to write awk function

I have a quick file file.awk with
function attr(attrname,str, a) {
if (!str) str=$0
match(str,"#" attrname "=([^,/]*)",a)
return a[1]
}
I am getting an error
awk: file.awk: line 139: syntax error at or near ,
where line 139 is the line with match()
Any idea what's wrong with the syntax?
You're trying to use the non-POSIX 3rd arg to match() but not using GNU awk which supports it. See https://www.gnu.org/software/gawk/manual/gawk.html#String-Functions.
As always, Ed Morton has it correct. The only reason I'm posting is to point out to macOS users that the awk man page already contains the answer.
man awk yields:
match(s, r)
the position in s where the regular expression r occurs, or 0 if it does not. The variables RSTART and RLENGTH are set to the position and
length of the matched string.
So it is very clear that your call has an extra parameter.
I tried running awk with your function on macOS and got this error:
awk: syntax error at source line 1 in function attr source file
context is
match(str,"#" attrname >>> "=([^,/]*)", <<<
The "<<<" is indicating awk doesn't like that comma. This is similar to your error message:
syntax error at or near ,
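With only the two-argument POSIX match(), the same value can still be extracted using the RSTART and RLENGTH variables that the man page describes. A minimal sketch, assuming the attribute name contains no regex metacharacters; get_attr is a hypothetical shell wrapper, and the attribute name is passed in with -v:

```shell
# get_attr NAME: hypothetical wrapper. Prints the value of "#NAME=..." from
# each input line, using only POSIX match(s, r) plus RSTART/RLENGTH.
get_attr() {
  awk -v name="$1" '
  function attr(attrname, str) {
      if (str == "") str = $0
      # assumes attrname contains no regex metacharacters
      if (!match(str, "#" attrname "=[^,/]*")) return ""
      # RSTART/RLENGTH span "#NAME=value"; skip the "#NAME=" prefix
      return substr(str, RSTART + length(attrname) + 2, RLENGTH - length(attrname) - 2)
  }
  { print attr(name) }'
}
echo 'april#token=sha256hmackey/0.ts' | get_attr token
```

On the sample line this prints sha256hmackey, and an empty line when the attribute is absent. The offset arithmetic replaces the capture group: the "#", the name and the "=" together are length(attrname) + 2 characters.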
You absolutely don't need the match() function's capture array to get the attribute value you're seeking:
echo 'april#token=sha256hmackey/0.ts' \
\
| mawk 'function attr(___,_,__,____) { ____="\32\21"
if(_=="") {
_=$(_<_) }
return \
sub("[#]"(___)"=[^,\\/]*",____"&"____,_) \
\
? substr(_=__[split(_,__,____)-\
(_~_)],index(_,"=")+(_~"")) : ""
} {
print attr("token") }'
sha256hmackey

Awk syntax error due to improper placement of the function definition

It is embarrassing, but could someone briefly explain why this gives a syntax error?
echo why this fails? | gawk '{
function why(fail) {
print fail
}
why($0)
}'
The function definition has to be at the top level. You have it inside the {...}.
echo This works | gawk '
function why(fail) {
print fail
}
{
why($0)
}'

AWK multiple patterns substitution

Using AWK I'd like to process this text:
#replace count 12
in in
#replace in 77
main()
{printf("%d",count+in);
}
Into:
in in
ma77()
{pr77tf("%d",12+77);
}
When a '#replace' declaration occurs, only the code below it is affected. I've got:
/#replace/ { co=$2; czym=$3 }
!/#replace/ { gsub(co,czym); print }
However I'm getting only
in in
ma77()
{pr77tf("%d",count+77);
}
in return. As you can see, only the second gsub works. Is there a simple way to remember all the substitutions?
You just need to use an array to store the substitutions:
$ awk '/#replace/{a[$2]=$3;next}{for(k in a)gsub(k,a[k])}1' file
in in
ma77()
{pr77tf("%d",12+77);
}
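One caveat with the array answer: for (k in a) visits keys in an unspecified order, so if one replacement can create or destroy a match for another, the result may differ between awk implementations. A sketch that also records declaration order in a second array; apply_replaces is a hypothetical wrapper name and, like the answer above, it treats each old string as a regex:

```shell
# apply_replaces: hypothetical wrapper. Applies every "#replace OLD NEW"
# seen so far, in the order the declarations first appeared.
apply_replaces() {
  awk '
  $1 == "#replace" {
      if (!($2 in repl)) order[++n] = $2   # remember first-seen order
      repl[$2] = $3                        # a redeclaration updates the value
      next
  }
  {
      for (i = 1; i <= n; i++) gsub(order[i], repl[order[i]])
      print
  }' "$@"
}
printf '%s\n' '#replace count 12' 'in in' '#replace in 77' 'main()' \
  '{printf("%d",count+in);' '}' | apply_replaces
```

On the question's input this produces the expected output, because "in in" is printed before the in->77 declaration is read.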

Break down JSON string in simple perl or simple unix?

OK, so I have this:
{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}
and at the moment I'm using this shell command to decode it and get the string I need:
echo $x | grep -Po '"utterance":.*?[^\\]"' | sed -e s/://g -e s/utterance//g -e 's/"//g'
but this only works when grep is compiled with Perl regex support (-P), and the script I use to get that JSON string is written in Perl anyway. So is there any way I can do the same decoding in a simple Perl script or a simpler Unix command, or better yet, in C or Objective-C?
The script I'm using to get the JSON is at http://pastebin.com/jBGzJbMk and if you want a file to test with, download http://trevorrudolph.com/a.flac
How about:
perl -MJSON -nE 'say decode_json($_)->{hypotheses}[0]{utterance}'
in script form:
use JSON;
while (<>) {
print decode_json($_)->{hypotheses}[0]{utterance}, "\n"
}
Well, I'm not sure I've understood exactly what you are after, but this is a way to decode that JSON string in Perl.
Of course, you'll need to know the data structure in order to get the data you need. The line that prints the "utterance" string is commented out in the code below.
use strict;
use warnings;
use Data::Dumper;
use JSON;
my $json = decode_json
q#{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}#;
#print $json->{'hypotheses'}[0]{'utterance'};
print Dumper $json;
Output:
$VAR1 = {
'status' => 0,
'hypotheses' => [
{
'utterance' => 'hello how are you',
'confidence' => '0.96311796'
}
],
'id' => '7aceb216d02ecdca7ceffadcadea8950-1'
};
Quick hack:
use feature 'say';
while (<>) {
    say for /"utterance":"?(.*?)(?<!\\)"/;
}
Or as a one-liner:
perl -lnwe 'print for /"utterance":"(.+?)(?<!\\)"/g' inputfile.txt
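The one-liner is troublesome if you happen to be using Windows: cmd.exe does not treat single quotes as quoting characters, so the program has to be wrapped in double quotes, which clash with the " characters inside the regex.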
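Quick hack #2:
This walks any nested hash/array structure and returns the first value it finds under the given key.
use feature 'say';
use JSON;
my $json = decode_json $str;
say find_key($json, 'utterance');
sub find_key {
    my ($ref, $find) = @_;
    if (ref $ref eq 'HASH' and defined $ref->{$find}) {
        return $ref->{$find};
    }
    # recurse into hash values or array elements; plain scalars have no children
    my @children = ref $ref eq 'HASH'  ? values %$ref
                 : ref $ref eq 'ARRAY' ? @$ref
                 : ();
    for (@children) {
        my $found = find_key($_, $find);
        return $found if defined $found;
    }
    return;
}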
Based on the naming, it's possible to have multiple hypotheses. The following prints the utterance of each hypothesis:
echo '{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}' | \
perl -MJSON::XS -n000E'
say $_->{utterance}
for @{ JSON::XS->new->decode($_)->{hypotheses} }'
Or as a script:
use feature qw( say );
use JSON::XS;
my $json = '{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}';
say $_->{utterance}
for @{ JSON::XS->new->decode($json)->{hypotheses} };
If you don't want to use any CPAN modules and would rather use a regex, there are several variants you can try:
# JSON is on a single line:
$json = '{"other":"stuff","hypo":[{"utterance":"hi, this is \"bob\"","moo":0}]}';
# RegEx with negative look behind:
# Match everything up to a double quote without a Backslash in front of it
print "$1\n" if ($json =~ m/"utterance":"(.*?)(?<!\\)"/)
This regex works if there is only one utterance. It doesn't matter what else is in the string around it, since it only searches for the double quoted string following the utterance key.
For a more robust version you could add whitespace where necessary/possible and make the . in the RegEx match newlines: m/"utterance"\s*:\s*"(.*?)(?<!\\)"/s
If you have multiple utterance/confidence objects, inconsistent key case and odd formatting in the JSON string, try this:
# weird JSON:
$json = <<'EOJSON';
{
"status":0,
"id":"an ID",
"hypotheses":[
{
"UtTeraNcE":"hello my name is \"Bob\".",
"confidence":0.0
},
{
'utterance' : 'how are you?',
"confidence":0.1
},
{
"utterance"
: "
thought
so!
",
"confidence" : 0.9
}
]
}
EOJSON
# RegEx with alternatives:
print "$1\n" while ( $json =~ m/["']utterance["']\s*:\s*["'](([^\\"']|\\.)*)["']/gis);
The main part of this RegEx is "(([^\\"']|\\.)*)". Here it is in detail as an extended (/x) regex:
/
["'] # opening quotes
( # start capturing parentheses for $1
( # start of grouping alternatives
[^\\"'] # anything that's not a backslash or a quote
| # or
\\. # a backslash followed by anything
) # end of grouping
* # in any quantity
) # end capturing parentheses
["'] # closing quotes
/xgs
If you have many data sets and speed is a concern, you can add the o modifier to the regex and use character classes instead of the i modifier. You can also stop the alternatives from being captured into $2 by using non-capturing (clustering) parentheses, (?:pattern). Then you get this final result:
m/["'][uU][tT][tT][eE][rR][aA][nN][cC][eE]["']\s*:\s*["']((?:[^\\"']|\\.)*)["']/gos
Yes, sometimes perl looks like a big explosion in a bracket factory ;-)
Just stumbled upon another nice way of doing this: I finally found how to access the Mac OS X JavaScript engine from the command line. Here's the script:
alias jsc='/System/Library/Frameworks/JavaScriptCore.framework/Versions/A/Resources/jsc'
x='{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}'
jsc -e "print(${x}['hypotheses'][0]['utterance'])"
Ugh, yes, I came up with another answer. I'm studying Python, and its dict/list literal syntax is close enough to JSON that this one-liner works when your variable is x (it breaks if the JSON contains true, false or null, which are not Python literals):
python -c "print(${x}['hypotheses'][0]['utterance'])"
Figured it out for Unix, but I would love to see your Perl and C/Objective-C answers...
echo "$x" | sed -e 's/.*utterance//' -e 's/confidence.*//' -e s/://g -e 's/"//g' -e 's/,//g'
:D
A shorter version of the same sed:
echo "$x" | sed -e 's/.*utterance//;s/confidence.*//;s/://g;s/"//g;s/,//g'

Using a variable defined inside AWK

I got this piece of script working. This is what I wanted:
input
3.76023 0.783649 0.307724 8766.26
3.76022 0.764265 0.307646 8777.46
3.7602 0.733251 0.30752 8821.29
3.76021 0.752635 0.307598 8783.33
3.76023 0.79528 0.307771 8729.82
3.76024 0.814664 0.307849 8650.2
3.76026 0.845679 0.307978 8802.97
3.76025 0.826293 0.307897 8690.43
with script
#!/bin/bash
awk -F ', ' '
{
for (i=3; i<=10; i++) {
if (i==NR) {
npc1[i]=sprintf("%s", $1);
npc2[i]=sprintf("%s", $2);
npc3[i]=sprintf("%s", $3);
npRs[i]=sprintf("%s", $4);
print npc1[i],npc2[i],\
npc3[i], npRs[i];
}
}
} ' p_walls.raw
echo "${npc1[100]}"
But now I can't use those arrays npc1[i] outside awk. That last echo prints nothing. Isn't it possible, or am I missing something?
AWK is a separate process, after it finishes all internal data is gone. This is true for all external processes/commands. Bash only sees what bash builtins touch.
i is never 100, so why do you want to access npc1[100]?
What are you really trying to do? If you rewrite the question we might be able to help...
(Cherry on the cake is always good!)
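Because awk's variables disappear when the awk process exits, the usual way to get a value back into the shell is to have awk print it and capture that output with command substitution. A minimal sketch; column_at is a hypothetical helper, and the inline data is the first three lines of the sample input, which is whitespace-separated, so awk's default field splitting is used instead of -F ', '. In bash, a whole column can likewise be loaded into an array with mapfile -t and process substitution.

```shell
# column_at LINE COL: hypothetical helper. awk prints one field and exits;
# the caller captures that output, since awk variables vanish with awk.
column_at() {
  awk -v line="$1" -v col="$2" 'NR == line { print $col; exit }'
}

data='3.76023 0.783649 0.307724 8766.26
3.76022 0.764265 0.307646 8777.46
3.7602 0.733251 0.30752 8821.29'

# field 1 of line 3, now held in an ordinary shell variable
npc1_3=$(printf '%s\n' "$data" | column_at 3 1)
echo "$npc1_3"
```

This prints 3.7602. Each call starts a new awk process, so for many lookups it is cheaper to have a single awk run print everything the shell needs.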
Sorry, but all of @yi_H's answer and comments above are correct.
But there's really no problem loading two sets of data into two separate arrays within awk, e.g.:
awk '
FILENAME == "file1" { arr1[++i] = $0 }   # lines of file1 at 1..i
FILENAME == "file2" { arr2[++j] = $0 }   # lines of file2 at 1..j
END {
    f1max = i; f2max = j
    for (i = 1; i <= f1max; i++) {
        # put what you need here for arr1[i] processing
        #
        # do not forget that you can do things like
        if (arr1[i] in arr2) { print arr1[i] " -> " arr2[arr1[i]] }
    }
    for (j = 1; j <= f2max; j++) {
        # and here for arr2[j]
    }
}' file1 file2
You'll have to fill the actual processing for arr1[i] and arr2[j].
Also, get an awk book for the weekend and be up and running by Monday. It's easy. You can probably figure it out from grymoire.com/Unix/awk.html
I hope this helps.