AWK Curry Function - Can a function return a function? - gawk

I'm working on an AWK script that's quite a big mess at the moment, and I'm trying to improve it (primarily because I want to improve my Awk scripting skills).
I realize that there isn't a way to do Object Oriented Programming within Awk or Gawk, but is there a way to at least curry functions? As in, returning a function from within a function? (Not returning the result of the executed function, but rather returning a function that can be executed.)
I found a Stack Overflow post where @GreenFox showed that it's possible to execute a function with the name of the function stored in a variable. The example he posted is below:
function foo(s){print "Called foo "s}
function bar(s){print "Called bar "s}
{
var = "";
if(today_i_feel_like_calling_foo){
var = "foo";
}else{
var = "bar";
}
@var( "arg" ); # This calls function foo(), or function bar() with "arg"
}
What I'm wondering is if it's possible to return a function from another function.
For example, a function that accepts a string that can be used in awk's printf as a format, and returns a function that accepts two other arguments and essentially executes printf( fmt_from_parent_func, sub_func_arg1, sub_func_arg2 ).
Here's my attempt at accomplishing this:
#! /usr/local/bin/awk -f
function setFmt ( fmt ){
function _print ( var, val ){
printf ( fmt ? fmt : "%-15s: %s\n" ), str1, str2
}
return @_print
}
BEGIN {
fmtA = setFmt("%-5s: %s\n")
@fmtA("ONE","TWO")
}
Which results in the errors:
awk: ./curry.awk:4: function _print ( var, val ){
awk: ./curry.awk:4: ^ syntax error
awk: ./curry.awk:4: function _print ( var, val ){
awk: ./curry.awk:4: ^ syntax error
awk: ./curry.awk:6: printf ( fmt ? fmt : "%-15s: %s\n" ), str1, str2
awk: ./curry.awk:6: ^ unexpected newline or end of string
awk: ./curry.awk:11: fmtA = setFmt("%-5s: %s\n")
awk: ./curry.awk:11: ^ unexpected newline or end of string
awk: ./curry.awk:12: @fmtA("ONE","TWO")
awk: ./curry.awk:12: ^ unexpected newline or end of string
If anyone knows whether this is at all possible (which I'm starting to see myself), and knows a way to accomplish something to this effect, that would be awesome.
Thanks!

With GNU awk you can return the name of a function as a string from another function, but you can't declare a function within another function, nor can you return a function (or an array) in any awk. All you can return from a function in awk is a scalar value (i.e. a number or string).
Is this what you're trying to do:
$ cat tst.awk
function _print ( var, val ){
printf _print_fmt, var, val
}
function setFmt ( fmt ){
_print_fmt = (fmt ? fmt : "%-15s: %s\n" )
return "_print"
}
BEGIN {
fmtA = setFmt("%-5s: %s\n")
@fmtA("ONE","TWO")
}
$ awk -f tst.awk
ONE  : TWO
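If you need several "curried" formats alive at the same time, note that the single global _print_fmt above is overwritten by every setFmt() call. Here is a minimal sketch of one way around that, assuming you are happy to pass a handle around instead of a function name. It stores each captured format in a global array and returns the array index, so it needs no indirect calls and should run in any POSIX awk. The names newFmt, callFmt, _fmts and _nfmts are made up for this illustration:

```shell
curried=$(awk '
# Store the format and hand back a numeric handle that stands in for the closure.
function newFmt(fmt) {
    _fmts[++_nfmts] = (fmt != "" ? fmt : "%-15s: %s\n")
    return _nfmts
}
# "Call" the closure: look up the captured format by handle and apply it.
function callFmt(handle, var, val) {
    printf _fmts[handle], var, val
}
BEGIN {
    fmtA = newFmt("%-5s: %s\n")
    fmtB = newFmt("[%s=%s]\n")
    callFmt(fmtA, "ONE", "TWO")   # uses the first captured format
    callFmt(fmtB, "ONE", "TWO")   # the second format did not clobber the first
}')
printf '%s\n' "$curried"
```

The trade-off is that every call site has to go through callFmt(), but each handle keeps its own format for the life of the script.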

Related

AWK function w/ a variable number of arguments

How to define an AWK function w/ a variable number of arguments? I can emulate this w/ command line arguments:
awk 'BEGIN {for (i in ARGV) printf ARGV[i]" "}' 1 2 3
but BEGIN {for (i in ARGV) printf ARGV[i]" "} isn't a function (in AWK).
Currently I'm using MAWK (but can probably switch to GAWK if it would help)
NOTE: I cannot reveal the task (it's an exercise which I'm supposed to solve by myself)
There are no variadic functions in awk because they're not needed since you can just populate an array and pass that into a function:
$ cat tst.awk
BEGIN {
split("foo 17 bar",a)
foo(a)
}
function foo(arr, i,n) {
n = length(arr) # or loop incrementing n if length(arr) unsupported
for (i=1; i<=n; i++) {
printf "%s%s", arr[i], (i<n ? OFS : ORS)
}
}
$ awk -f tst.awk
foo 17 bar
or just define the function with a bunch of dummy argument names as @triplee mentions.
As per https://rosettacode.org/wiki/Variadic_function you can define a function with more arguments than you pass in; the ones you omit will come out as empty.
It's not clear from your example what you are actually trying to accomplish, but this is the answer to your actual question.
$ awk 'function handlemany(first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, eleventh, twelfth) {
> print first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, eleventh, twelfth
> }
> BEGIN { handlemany("one", "two", "three") }'
one two three
This is less than ideal, of course, but there is no support for proper variadic functions / varargs in the Awk language.
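A third workaround, sketched here under the assumption that no argument ever contains the SUBSEP character, is to pack the values into one string at the call site and unpack them with split() inside the function. joinargs is a hypothetical name; the sketch uses only POSIX awk features, so it should behave the same in mawk and gawk:

```shell
joined=$(awk '
# Accept any number of values packed into one SUBSEP-delimited string.
# Names after the extra spaces are local variables, per awk convention.
function joinargs(packed, sep,    parts, n, i, out) {
    n = split(packed, parts, SUBSEP)   # split() returns the element count
    for (i = 1; i <= n; i++)
        out = out (i > 1 ? sep : "") parts[i]
    return out
}
BEGIN {
    print joinargs("one" SUBSEP "two" SUBSEP "three", "-")
    print joinargs("solo", "-")
}')
printf '%s\n' "$joined"
```

Because split() returns the count, this does not depend on length(arr) being supported for arrays.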

How to evaluate or process if statements in data?

Background
I wrote a bash script that pulls simple user functions from a PostgreSQL database, uses awk to convert PL/pgSQL commands to SQL (like PERFORM function() to SELECT function(), removing comments --.*, etc.), stores the SQL commands in a file (file.sql), and reads and executes them in the database:
$ psql ... -f file.sql db
The functions are simple, mostly just calling other user defined functions. But how to "evaluate" or process an IF statement?:
IF $1 = 'customer1' THEN -- THESE $1 MEANS ARGUMENT TO PGPL/SQL FUNCTION
PERFORM subfunction1($1); -- THAT THIS IF STATEMENT IS IN:
ELSE -- SELECT function('customer1');
PERFORM subfunction2($1); -- $1 = 'customer1'
END IF;
Tl;dr:
IFs and such are not SQL, so they should be pre-evaluated using awk. It's safe to assume that the above is already processed into one record with comments removed:
IF $1 = 'customer1' THEN PERFORM subfunction1($1); ELSE PERFORM subfunction2($1); END IF;
After "evaluating", the above should be replaced with:
SELECT subfunction1('customer1');
if the awk to evaluate it was called:
$ awk -v arg1="customer1" -f program.awk file.sql
or if arg1 is anything else, for example for customer2:
SELECT subfunction2('customer2');
Edit
expr popped into my mind first thing when I woke up:
$ awk -v arg="'customer1'" '
{
gsub(/\$1/,arg) # replace func arg with string
n=split($0,a,"(IF|THEN|ELSE|ELSE?IF|END IF;)",seps) # seps to get ready for SQL CASE
if(seps[1]=="IF") {
# here should be while for ELSEIF
c="expr " a[2]; c|getline r; close(c) # use expr to solve
switch (r) { # expr has 4 return values
case "1": # match
print a[3]
break
case "0": # no match
print a[4]
break
default: # (*) see below
print r
exit # TODO
} } }' file.sql
(*) expr outputs 0,1,2 or 3:
$ expr 1 = 1
1
$ expr 1 = 2
0
However, if you omit spaces:
$ expr 1=1
1=1
Without writing a full language parser, if you're looking for something cheap and cheerful then this might be a decent starting point:
$ cat tst.awk
{ gsub(/\$1/,"\047"arg1"\047") }
match($0,/^IF\s+(\S+)\s+(\S+)\s+(\S+)\s+THEN\s+(\S+)\s+(\S+)\s+ELSE\s+(\S+)\s+(\S+)\s+END\s+IF/,a) {
lhs = a[1]
op = a[2]
rhs = a[3]
trueAct = (a[4] == "PERFORM" ? "SELECT" : a[4]) FS a[5]
falseAct = (a[6] == "PERFORM" ? "SELECT" : a[6]) FS a[7]
if (op == "=") {
print (lhs == rhs ? trueAct : falseAct)
}
}
$ awk -v arg1='customer1' -f tst.awk file
SELECT subfunction1('customer1');
$ awk -v arg1='bob' -f tst.awk file
SELECT subfunction2('bob');
The above uses GNU awk for the 3rd arg to match(). Hopefully it's easy enough to understand that you can massage as needed to handle other constructs or other variations of this construct.
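If GNU awk isn't available, the same idea can be sketched with plain field splitting. This assumes the record has exactly the whitespace-separated token layout shown in the question (IF lhs op rhs THEN action call ELSE action call END IF;) and that the argument contains no & or backslash. The eval_if wrapper and the added <> comparison are illustrative extras, not part of the script above:

```shell
eval_if() {   # usage: eval_if ARGUMENT < records
    awk -v arg1="$1" '
    { gsub(/\$1/, "\047" arg1 "\047") }   # substitute the quoted argument; fields are re-split
    $1 == "IF" && $5 == "THEN" && $8 == "ELSE" && $11 == "END" {
        # $2 op $4 is the condition; only "=" and "<>" are handled in this sketch.
        truth = ($3 == "=" && $2 == $4) || ($3 == "<>" && $2 != $4)
        act  = (truth ? $6 : $9)
        call = (truth ? $7 : $10)
        out  = (act == "PERFORM" ? "SELECT" : act) " " call
        print out
    }'
}

sql_line=$(cat <<'EOF'
IF $1 = 'customer1' THEN PERFORM subfunction1($1); ELSE PERFORM subfunction2($1); END IF;
EOF
)
printf '%s\n' "$sql_line" | eval_if customer1
printf '%s\n' "$sql_line" | eval_if bob
```

Like the match() version, this is only a starting point: anything with nested IFs, ELSIF branches or multi-word expressions would need a real parser.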

gsub for substituting translations not working

I have a dictionary dict with records separated by ":" and data fields by new lines, for example:
:one
1
:two
2
:three
3
:four
4
Now I want awk to substitute all occurrences of each record in the input
file, eg
onetwotwotwoone
two
threetwoone
four
My first awk script looked like this and works just fine:
BEGIN { RS = ":" ; FS = "\n"}
NR == FNR {
rep[$1] = $2
next
}
{
for (key in rep)
gsub(key,rep[key])
print
}
giving me:
12221
2
321
4
Unfortunately another dict file contains some character used by regular expressions, so I have to substitute escape characters in my script. By moving key and rep[key] into a string (which can then be parsed for escape characters), the script will only substitute the second record in the dict. Why? And how to solve?
Here's the current second part of the script:
{
for (key in rep)
orig=key
trans=rep[key]
gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
gsub(orig,trans)
print
}
All scripts are run by awk -f translate.awk dict input
Thanks in advance!
Your fundamental problem is using strings in regexp and backreference contexts when you don't want them and then trying to escape the metacharacters in your strings to disable the characters that you're enabling by using them in those contexts. If you want strings, use them in string contexts, that's all.
You don't want this:
gsub(regexp,backreference-enabled-string)
You want something more like this:
index(...,string) substr(string)
I think this is what you're trying to do:
$ cat tst.awk
BEGIN { FS = ":" }
NR == FNR {
if ( NR%2 ) {
key = $2
}
else {
rep[key] = $0
}
next
}
{
for ( key in rep ) {
head = ""
tail = $0
while ( start = index(tail,key) ) {
head = head substr(tail,1,start-1) rep[key]
tail = substr(tail,start+length(key))
}
$0 = head tail
}
print
}
$ awk -f tst.awk dict file
12221
2
321
4
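The index()/substr() loop can also be pulled out into a reusable helper. This is a minimal sketch, with literal_replace as a made-up name, showing that neither regexp metacharacters in the search string nor "&" in the replacement need any escaping when everything stays in string context:

```shell
replaced=$(awk '
# Replace every literal occurrence of "from" in "str" with "to" using only
# index() and substr(), so nothing is ever interpreted as a regexp.
function literal_replace(str, from, to,    head, start) {
    if (from == "") return str          # nothing to search for
    while ((start = index(str, from)) > 0) {
        head = head substr(str, 1, start - 1) to
        str  = substr(str, start + length(from))
    }
    return head str
}
BEGIN {
    print literal_replace("a.b+a.b axb", "a.b", "[&]")
    print literal_replace("cost: $1 (approx)", "$1", "one")
}')
printf '%s\n' "$replaced"
```

In the first call "axb" is left alone because the dot is matched literally, and the "&" in the replacement is copied through as-is.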
Never mind for asking....
Just some missing parentheses...?!
{
for (key in rep)
{
orig=key
trans=rep[key]
gsub(/[\]\[^$.*?+{}\\()|]/, "\\\\&", orig)
gsub(orig,trans)
}
print
}
works like a charm.

MAWK: Store match() in variable

I'm trying to use MAWK, where the match() built-in function doesn't accept a third argument (an array to store the match in):
match($1, /9f7fde/) {
substr($1, RSTART, RLENGTH);
}
See doc.
How can I store this output into a variable named var when later I want to construct my output like this?
EDIT2 - Complete example:
Input file structure:
<iframe src="https://vimeo.com/191081157" frameborder="0" height="481" width="608" scrolling="no"></iframe>|Random title|Uploader|fun|tag1,tag2,tag3
<iframe src="https://vimeo.com/212192268" frameborder="0" height="481" width="608" scrolling="no"></iframe>|Random title|Uploader|fun|tag1,tag2,tag3
parser.awk:
{
Embed = $1;
Title = $2;
User = $3;
Categories = $4;
Tags = $5;
}
BEGIN {
FS="|";
}
# Regexp without pattern matching for testing purposes
match(Embed, /191081157/) {
Id = substr(Embed, RSTART, RLENGTH);
}
{
print Id"\t"Title"\t"User"\t"Categories"\t"Tags;
}
Expected output:
191081157|Random title|Uploader|fun|tag1,tag2,tag3
I want to call the Id variable outside the match() function.
MAWK version:
mawk 1.3.4 20160930
Copyright 2008-2015,2016, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan
random-funcs: srandom/random
regex-funcs: internal
compiled limits:
sprintf buffer 8192
maximum-integer 2147483647
The obvious answer would seem to be
match($1, /9f7fde/) { var = "9f7fde"; }
But more general would be:
match($1, /9f7fde/) { var = substr($1, RSTART, RLENGTH); }
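Applied to the iframe input from the question, that could look roughly like the sketch below. The regexp and the fixed 10-character prefix length ("vimeo.com/") are my assumptions about the data, and the sample line is shortened. RSTART and RLENGTH are set by match() in every POSIX awk, so this should work in mawk as well:

```shell
video_id=$(printf '%s\n' '<iframe src="https://vimeo.com/191081157" frameborder="0"></iframe>|Random title|Uploader' |
awk -F'|' '
{ Id = "" }                               # reset so a failed match cannot reuse a stale Id
match($1, /vimeo\.com\/[0-9]+/) {
    # Skip the 10-character "vimeo.com/" prefix to keep only the digits.
    Id = substr($1, RSTART + 10, RLENGTH - 10)
}
{ print Id "\t" $2 "\t" $3 }')
printf '%s\n' "$video_id"
```

Since Id is an ordinary global variable, it stays available to every rule that runs after the match() rule for the same record.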
UPDATE : The solution above mine could be simplified to :
from
match($1, /9f7fde/) { var = substr($1, RSTART, RLENGTH) }
to
{ __=substr($!_,match($!_,"9f7fde"),RLENGTH) }
A failed match would have RLENGTH auto set to -1, so nothing gets substring'ed out.
But even that is too verbose : since the matching criteria is a constant string, then simply
mawk '$(_~_)~_{__=_}' \_='9f7fde'
============================================
let's say this line
.....vimeo.com/191081157" frameborder="0" height="481" width="608" scrolling="no"></iframe>|Random title|Uploader|fun|tag1,tag2,tag3
{mawk/mawk2/gawk} 'BEGIN { OFS = "";
FS = "(^.+vimeo[\056]com[\057]|[\042] frameborder.+[\057]iframe[>])" ;
} (NF < 4) || ($2 !~ /191081157/) { next } ( $1 = $1 )'
\056 is the dot ( . ), \057 is the forward slash ( / ), and \042 is the straight double quote ( " ).
If the line can't match at all, move on to the next row. Otherwise, use the power of the field separator to gobble away all the unneeded parts of the line. The $1 = $1 assignment rebuilds the record with the empty OFS, dropping the prefix and the rest of the HTML tags you don't need.
The assignment operation of $1 = $1 will also return true, providing the input for boolean evaluation for it to print. This way, you don't need either match( ) or substr( ) at all.

AWK -- How to assign a variable's value from matching regex which comes later?

While I have this awk script,
/regex2/{
var = $1}
/regex1/{
print var}
which I executed over input file:
regex1
This should be assigned as variable regex2
I got no printed output. The desired output is: "This" to be printed out.
I might then think to utilize BEGIN:
BEGIN{
/regex2/
var = $1}
/regex1/{
print var}
But apparently BEGIN cannot accommodate regex matching function. Any suggestion to this?
This would achieve the desired result:
awk '/regex2/ { print $1 }'
Otherwise, you'll need to read the file twice and perform something like the following. It will store the last occurrence of /regex2/ in var. Upon re-reading the file, it will print var for each occurrence of /regex1/. Note that you'll get an empty line in the output and the keyword 'This' on a newline:
awk 'FNR==NR && /regex2/ { var = $1; next } /regex1/ { print var }' file.txt{,}
Typically this sort of thing is done with a flag:
/regex1/ { f = 1 }
f && /regex2/ { var = $1; f = 0 }
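If it's acceptable to hold the output back until the whole input has been read, a single pass is enough. The sketch below is a different technique from the flag above (deferred printing in END) and assumes the last /regex2/ line should supply the value. pending is a hypothetical counter name:

```shell
deferred=$(printf '%s\n' 'regex1' 'This should be assigned as variable regex2' |
awk '
/regex1/ { pending++ }                    # remember that a print was requested
/regex2/ { var = $1 }                     # the value only becomes known later
END { while (pending-- > 0) print var }   # emit once per earlier regex1 line
')
printf '%s\n' "$deferred"
```

This avoids reading the file twice, at the cost of losing the original position of the printed lines relative to any other output.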