AWK - execute string as command? - awk

This command prints:
$ echo "123456789" | awk '{ print substr ($1,1,4) }'
1234
Is it possible to execute a string as command? For example, this command:
echo "123456789" | awk '{a="substr"; print a ($1,1,4) }'
Result:
$ echo "123456789" | awk '{a="substr"; print a ($1,1,4) }'
awk: {a="substr"; print a ($1,1,4) }
awk: ^ syntax error
EDIT:
$ cat tst.awk
function my_substr(x,y,z) { return substr(x,y,z) }
{ a="my_substr"; print @a($1,1,4) }
bolek@bolek-desktop:~/Pulpit$ echo "123456789" | gawk -f tst.awk
gawk: tst.awk:3: { a="my_substr"; print @a($1,1,4) }
gawk: tst.awk:3: ^ invalid char '@' in expression
bolek@bolek-desktop:~/Pulpit$

I don't think it is possible to do that in awk directly, but you can get a similar effect by using the shell. Recall that the awk program is given as a string, and strings are concatenated in the shell just by writing them next to one another. Thus, you can do this:
a=substr
echo "123456789" | awk '{ print '"$a"'($1, 1, 4) }'
resulting in 1234.

You can call user-defined functions via variables in GNU awk using indirect function calls, see http://www.gnu.org/software/gawk/manual/gawk.html#Indirect-Calls
$ cat tst.awk
function foo() { print "foo() called" }
function bar() { print "bar() called" }
BEGIN {
the_func = "foo"
@the_func()
the_func = "bar"
@the_func()
}
$ gawk -f tst.awk
foo() called
bar() called
Unfortunately due to internal implementation issues, if you want to call builtin functions that way then you need to write a wrapper for each:
$ cat tst.awk
function my_substr(x,y,z) { return substr(x,y,z) }
{ a="my_substr"; print @a($1,1,4) }
$ echo "123456789" | gawk -f tst.awk
1234
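If gawk's indirect calls are not available (as in the error shown in the question), one portable workaround is to dispatch on the name yourself. The sketch below is only an illustration: call() is a hypothetical helper, not something awk provides, and it only knows the two built-ins listed in it.

```shell
# call() is a hypothetical dispatcher: it maps a name held in a string
# to the matching built-in. Extend the if-chain for other functions.
out=$(echo "123456789" | awk '
function call(name, s, i, n) {
    if (name == "substr")  return substr(s, i, n)
    if (name == "toupper") return toupper(s)
    return ""    # unknown name: return an empty string
}
{ a = "substr"; print call(a, $1, 1, 4) }')
printf '%s\n' "$out"    # prints 1234
```

This is the same idea as the wrapper-per-built-in approach above, just without relying on the gawk-only @ syntax.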

Related

Why double quote does not work in echo statement inside cmd in awk script?

gawk 'BEGIN { FS="|"; OFS="|" }NR ==1 {print} NR >=2 {cmd1="echo -n "$2" | base64 -w 0";cmd1 | getline d1;close(cmd1); print $1,d1 }' dummy2.txt
input:
id|dummy
1|subhashree:1;user=phn
2|subha:2;user=phn
Expected output:
id|dummy
1|c3ViaGFzaHJlZToxO3VzZXI9cGhuCg==
2|c3ViaGE6Mjt1c2VyPXBobgo=
output produced by script:
id|dummy
1|subhashree:1
2|subha:2
I understand that the quoting around $2 is causing the issue: the string is not encoded properly, and everything after the semicolon is stripped off. The same string works inside double quotes and gives the proper output in the terminal:
echo "subhashree:1;user=phn" | base64
c3ViaGFzaHJlZToxO3VzZXI9cGhuCg==
[root@DERATVIV04 encode]# echo "subha:2;user=phn" | base64
c3ViaGE6Mjt1c2VyPXBobgo=
I have tried different variations of single and double quotes inside awk, but none of them work. Any help will be highly appreciated.
Thanks a lot in advance.
Your existing cmd1 produces:
echo -n subhashree:1;user=phn | base64 -w 0
^ semicolon is there
So if you execute it as is, it produces:
$ echo -n subhashree:1;user=phn | base64 -w 0
subhashree:1
With quotes
$ echo -n 'subhashree:1;user=phn' | base64 -w 0
c3ViaGFzaHJlZToxO3VzZXI9cGhu
The solution is simply to quote the string, so that the command becomes echo -n '<your-string>' | base64 -w 0
$ cat file
id|dummy
1|subhashree:1;user=phn
2|subha:2;user=phn
$ gawk -v q="'" 'BEGIN { FS="|"; OFS="|" }NR ==1 {print} NR >=2 {cmd1="echo -n " q $2 q" | base64 -w 0"; cmd1 | getline d1;close(cmd1); print $1,d1 }' file
id|dummy
1|c3ViaGFzaHJlZToxO3VzZXI9cGhu
2|c3ViaGE6Mjt1c2VyPXBobg==
It can be simplified as below
gawk -v q="'" 'BEGIN {
FS=OFS="|"
}
NR==1{
print;
next
}
{
cmd1="echo -n " q $2 q" | base64 -w 0";
print ((cmd1 | getline d1)>0)? $1 OFS d1 : $0;
close(cmd1);
}
' file
The getline test is based on Ed Morton's recommendation at http://awk.freeshell.org/AllAboutGetline
if/while ( (getline var < file) > 0)
if/while ( (command | getline var) > 0)
if/while ( (command |& getline var) > 0)
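One caveat with the q="'" approach: it breaks if the field itself contains a single quote. Below is a minimal sketch of a more defensive variant. shquote() is a hypothetical helper (not part of awk), and the sample input line is made up. To keep the example free of external tools it only runs printf %s, which shows the field surviving the trip through the shell; the | base64 -w 0 stage could be appended to cmd in the same way.

```shell
out=$(printf '%s\n' "1|it's;user=phn" | awk -F'|' -v OFS='|' '
# Hypothetical helper: wrap s in single quotes for the shell,
# rewriting each embedded single quote (octal 047) as the 4 chars: 047 \ 047 047
function shquote(s) {
    gsub(/\047/, "\047\\\047\047", s)
    return "\047" s "\047"
}
{
    cmd = "printf %s " shquote($2)      # append " | base64 -w 0" here if wanted
    if ((cmd | getline d) > 0) $2 = d
    close(cmd)
    print
}')
printf '%s\n' "$out"
```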
The problem is the lack of quotes when the echo command runs in the shell. What you are running is effectively
echo -n subhashree:1;user=phn | base64 -w 0
which the shell executes as two commands separated by ;. The second one, user=phn | base64 -w 0, is an assignment followed by a pipeline; the assignment writes nothing to standard output, so base64 receives empty input and encodes nothing. The first one, echo -n subhashree:1, is simply echoed out, and that is what gets stored in your getline variable d1.
The right way to fix your problem is to quote the string:
echo -n "subhashree:1;user=phn" | base64 -w 0
When you said you were putting quotes around $2, that is not actually right: those quotes are awk string delimiters used to build the cmd string, i.e. "echo -n ", $2 and " | base64 -w 0" are just joined together. The proposed double quotes need to be in the context of the shell.
So with that and a few other fixes, your awk command should look like the one below. I added gsub() to remove the trailing spaces that were present in the input you showed, and used printf instead of echo.
awk -v FS="|" '
BEGIN {
OFS = FS
}
NR == 1 {
print
}
NR >= 2 {
gsub(/[[:space:]]+$/, "", $2)
cmd = "printf \"%s\" \"" $2 "\" | base64 -w 0"
if ((cmd | getline result) > 0) {
$2 = result
}
close(cmd)
print
}
' file
With the command above, the shell executes the following, which produces the right result:
printf "%s" "subhashree:1;user=phn" | base64 -w 0
You already got answers explaining how to use awk for this, but you should also consider not using awk for this. The tool for sequencing calls to other commands (e.g. base64) is a shell, not awk. What you're trying to do in terms of calls is:
shell { awk { loop_on_input { shell { base64 } } } }
whereas if you call base64 directly from shell it'd just be:
shell { loop_on_input { base64 } }
Note that the awk command is spawning a new subshell once per line of input while the direct call from shell isn't.
For example:
#!/usr/bin/env bash
file='dummy2.txt'
head -n 1 "$file"
while IFS='|' read -r id dummy; do
printf '%s|%s\n' "$id" "$(base64 -w 0 <<<"$dummy")"
done < <(tail -n +2 "$file")
Here's the difference in execution speed for an input file that has each of your data lines duplicated 100 times created by awk -v n=100 'NR==1{print; next} {for (i=1;i<=n;i++) print}' dummy2.txt > file100
$ ./tst.sh file100
Awk:
real 0m23.247s
user 0m3.755s
sys 0m10.966s
Shell:
real 0m14.512s
user 0m1.530s
sys 0m4.776s
The above timing was produced by running this script (both awk scripts posted in the answers will have about the same timing, so I just picked one at random):
#!/usr/bin/env bash
doawk() {
local file="$1"
gawk -v q="'" 'BEGIN {
FS=OFS="|"
}
NR==1{
print;
next
}
{
cmd1="echo -n " q $2 q" | base64 -w 0";
print ((cmd1 | getline d1)>0)? $1 OFS d1 : $0;
close(cmd1);
}
' "$file"
}
doshell() {
local file="$1"
head -n 1 "$file"
while IFS='|' read -r id dummy; do
printf '%s|%s\n' "$id" "$(base64 -w 0 <<<"$dummy")"
done < <(tail -n +2 "$file")
}
# Use 3rd-run timing to eliminate caching as a factor
doawk "$1" >/dev/null
doawk "$1" >/dev/null
echo "Awk:"
time doawk "$1" >/dev/null
echo ""
doshell "$1" >/dev/null
doshell "$1" >/dev/null
echo "Shell:"
time doshell "$1" >/dev/null

gawk sub() with ampersand and toupper() not working

I'm having trouble using toupper() inside a gawk sub(). I'm using the feature that & substitutes for the matched string.
$ gawk '{sub(/abc/, toupper("&")); print $0; }'
xabcx
xabcx
I expected:
xABCx
Variants with toupper() but without & and with & but without toupper() work:
$ gawk '{sub(/abc/, toupper("def")); print $0; }'
xabcx
xDEFx
$ gawk '{sub(/abc/, "-&-"); print $0; }'
xabcx
x-abc-x
It fails similarly with tolower(). Am I misunderstanding something about how & works?
(Tested with gawk 3.1.x and the latest, 4.1.3).
I think I see what's going on: the toupper function is being evaluated first, before sub constructs the replacement string.
So you get
sub(/abc/, toupper("def")) => sub(/abc/, "DEF")
and the not-so-useful
sub(/abc/, toupper("&")) => sub(/abc/, "&")
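This is easy to confirm in isolation. A quick check, assuming any POSIX awk: toupper() finds no letters to change in "&", so sub() only ever receives a plain "&" as its replacement.

```shell
# toupper() runs on the literal string "&" before sub() is ever involved
out=$(awk 'BEGIN { print toupper("&") }')
printf '%s\n' "$out"    # prints &
```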
To get your desired results, you have to extract the match first, upper-case it, and then perform the substitution:
$ echo foobar | gawk '{sub(/o+/, toupper("&")); print}'
foobar
$ echo foobar | gawk '{
if (match($0, /o+/, m)) {
replacement = toupper(m[0])
sub(/o+/, replacement)
}
print
}'
fOObar
Alternatively, you don't need the sub, you can reconstruct the record thusly:
echo foobar | gawk '{
if (match($0, /o+/, m)) {
$0 = substr($0, 1, RSTART-1) toupper(m[0]) substr($0, RSTART+RLENGTH)
}
print
}'
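Both versions above use the three-argument match(), which is a gawk extension. A sketch of the same reconstruction that should work in other awks too, assuming only the standard two-argument match() with RSTART and RLENGTH:

```shell
out=$(echo foobar | awk '{
    if (match($0, /o+/)) {
        # RSTART/RLENGTH locate the match; upper-case just that slice
        $0 = substr($0, 1, RSTART - 1) \
             toupper(substr($0, RSTART, RLENGTH)) \
             substr($0, RSTART + RLENGTH)
    }
    print
}')
printf '%s\n' "$out"    # prints fOObar
```

Rebuilding the record with substr() also sidesteps the question of what happens when the matched text itself contains & or a backslash, which sub() would interpret in its replacement argument.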

Redirect input for gawk to a system command

Usually a gawk script processes each line of its stdin. Is it possible to instead specify a system command in the script and have the rest of the script process each line of that command's output?
For example consider the following simple interaction:
$ { echo "abc"; echo "def"; } | gawk '{print NR ":" $0; }'
1:abc
2:def
I would like to get the same output without using pipe, specifying instead the echo commands as a system command.
I can of course use the pipe but that would force me to either use two different scripts or specify the gawk script inside the bash script and I am trying to avoid that.
UPDATE
The previous example is not quite representative of my usecase, this is somewhat closer:
$ { echo "abc"; echo "def"; } | gawk '/d/ {print NR ":" $0; }'
2:def
UPDATE 2
A shell script parallel would be as follows. Without the exec line the script would read from stdin; with the exec line it uses the output of the command on that line as input:
/tmp> cat t.sh
#!/bin/bash
exec 0< <(echo abc; echo def)
while read l; do
echo "line:" $l
done
/tmp> ./t.sh
line: abc
line: def
From all of your comments, it sounds like what you want is:
$ cat tst.awk
BEGIN {
if ( ("mktemp" | getline file) > 0 ) {
system("(echo abc; echo def) > " file)
ARGV[ARGC++] = file
}
close("mktemp")
}
{ print FILENAME, NR, $0 }
END {
if (file!="") {
system("rm -f \"" file "\"")
}
}
$ awk -f tst.awk
/tmp/tmp.ooAfgMNetB 1 abc
/tmp/tmp.ooAfgMNetB 2 def
but honestly, I wouldn't do it. You're munging what the shell is good at (creating/destroying files and processes) with what awk is good at (manipulating text).
I believe what you're looking for is getline:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ print line} }' <<< ''
abc
def
Adjusting the answer to your second example:
awk '{ while ( ("echo abc; echo def" | getline line) > 0){ counter++; if ( line ~ /d/){print counter":"line} } }' <<< ''
2:def
Let's break it down:
awk '{
cmd = "echo abc; echo def"
# the line below creates a variable "line" holding each line of the output of cmd
while ( ( cmd | getline line) > 0){
# we need a counter because NR will not work for us
counter++;
# if the line contains the letter d
if ( line ~ /d/){
print counter":"line
}
}
}' <<< ''
2:def
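A variant of the same idea, assuming all of the processing can live in a BEGIN block: awk then never reads stdin at all, so the empty here-string is not needed. The counter name n is arbitrary.

```shell
out=$(awk 'BEGIN {
    cmd = "echo abc; echo def"
    # read the output of cmd one line at a time
    while ((cmd | getline line) > 0) {
        n++                              # stands in for NR
        if (line ~ /d/) print n ":" line
    }
    close(cmd)
}')
printf '%s\n' "$out"    # prints 2:def
```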

How to translate a column value in the file using awk with tr command in unix

Details:
Input file : file.txt
P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2
Expected output:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
If I run the tr commands directly on a value, I get the proper result:
/tmp>echo "P123456789"|tr "0-9" "5-9"|tr "A-Z" "X-Z"
Z678999999
But if I do the same inside an awk command, it gives errors instead of a result:
/tmp>$ awk 'BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }' /tmp/file.txt >/tmp/file.txt.tmp
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
Can anyone help please?
Just do what you wanted, without changing your logic:
awk line:
awk -F, -v OFS="," '{ "echo \""$1"\"|tr \"0-9\" \"5-9\"|tr \"A-Z\" \"X-Z\"" |getline $1}7'
with your data:
kent$ echo "P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2"|awk -F, -v OFS="," '{ "echo \""$1"\"|tr \"0-9\" \"5-9\"|tr \"A-Z\" \"X-Z\"" |getline $1}7'
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
$ cat tst.awk
function tr(old,new,str, oldA,newA,strA,i,j) {
split(old,oldA,"")
split(new,newA,"")
split(str,strA,"")
str = ""
for (i=1;i in strA;i++) {
for (j=1;(j in oldA) && !sub(oldA[j],newA[j],strA[i]);j++)
;
str = str strA[i]
}
return str
}
BEGIN { FS=OFS="," }
{ print tr("P012345678","Z567899999",$1), $2 }
$ awk -f tst.awk file
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
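The function above relies on split() with an empty separator, which not every awk supports, and it passes each character to sub() as a regexp. A sketch of an alternative that only uses index() and substr(), under the assumption that old and new are equal-length strings listing each character and its replacement explicitly (the name tr_chars and the two mapping strings are made up for this example):

```shell
out=$(printf '%s\n' 'P123456789,COLUMN2' 'P123456790,COLUMN2' | awk '
# Hypothetical helper: replace each char of str found in old
# with the char at the same position in new; others pass through.
function tr_chars(old, new, str,    i, c, p, res) {
    res = ""
    for (i = 1; i <= length(str); i++) {
        c = substr(str, i, 1)
        p = index(old, c)
        res = res (p ? substr(new, p, 1) : c)
    }
    return res
}
BEGIN { FS = OFS = "," }
{ print tr_chars("P0123456789", "Z5678999999", $1), $2 }')
printf '%s\n' "$out"
```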
Unfortunately, AWK does not have a built in translation function. You could write one like Ed Morton has done, but I would reach for (and highly recommend) a more powerful tool. Perl, for example, can process fields using the autosplit (-a) command switch:
-a turns on autosplit mode when used with a -n or -p. An implicit split command to the @F array is done as the first thing inside the implicit while loop produced by the -n or -p.
You can type perldoc perlrun for more details.
Here's my solution:
perl -F, -lane '$F[0] =~ tr/0-9/5-9/; $F[0] =~ tr/A-Z/X-Z/; print join (",", @F)' file.txt
Results:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2

awk - how to specify field separator as binary value 0x1

Is it possible to specify the separator field FS in binary for awk?
I have data file with ascii data fields but separated by binary delimiter 0x1.
If it was character '1' it would look like this:
awk -F1 '/FIELD/ { print $1 }'
Or in script:
#!/bin/awk -f
BEGIN { FS = "1" }
/FIELD/ { print $1 }
How can I specify FS/-F to be 0x1?
#!/bin/awk -f
BEGIN { FS = "\x01" }
/FIELD/ { print $1 }
See http://www.gnu.org/manual/gawk/html_node/Escape-Sequences.html.
awk -F '\x01' '/FIELD/ { print $1 }'
This works on mawk, gawk, and nawk:
awk -F'\1' '{ … }'
These awks decode the octal escape themselves, without needing help from the shell, as in:
awk -F$'\1' '{ … }'
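A small self-contained check of the idea, with made-up sample data. It uses the octal form "\001" in a BEGIN block, on the assumption that octal escapes in string literals are the most widely supported spelling across awks:

```shell
# printf emits two 0x01 bytes between the three fields
out=$(printf 'FIELD_A\001second\001third\n' |
      awk 'BEGIN { FS = "\001" } /FIELD/ { print $1 }')
printf '%s\n' "$out"    # prints FIELD_A
```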