Can AWK call an external program during processing? - awk

Is it possible for AWK to call an external program during processing - passing it arguments and getting information returned - only to continue processing after the execution of the external program is complete ?

Yes, there are two ways to call external commands: system() and getline. getline lets you capture the text the command prints, while system() gives you only the exit status of the external command. See this example:
kent$ awk 'BEGIN{"wc -l /etc/passwd"|getline var; print var}'
20 /etc/passwd
This example ran the external command wc -l /etc/passwd and assigned its output to the awk variable var.

Yes, here is an example:
awk '
BEGIN {
"date +%Y" | getline
print "The year is "$0
}'
output:
The year is 2014
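Since the question also asks about passing arguments, here is a minimal sketch of building the command string from the current record. The sample input and the tr command are assumptions chosen for illustration; close() makes awk wait for the command to exit before it moves on to the next record.

```shell
out=$(printf 'hello\nworld\n' | awk '{
    # Build the command line from the current record. Only do this with
    # input you trust: the string is handed to sh unchanged.
    cmd = "printf %s \"" $1 "\" | tr a-z A-Z"
    cmd | getline result      # awk blocks here until the output arrives
    close(cmd)                # wait for the command to exit before continuing
    print $1 " -> " result
}')
echo "$out"
```

Without close(), each distinct command string would stay open as a separate pipe until awk exits.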

The system function can call an external command, but it returns only the exit code. To get the command's output when using system(), redirect that output to a file and then read the file back in:
retcode = system("command > file.out")
file = "file.out"
while ((getline line < file) > 0) {
print line
}
close(file)
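If the temporary file is not wanted, one possible alternative is to read the command's output through a pipe and use system() only where the exit status matters. This is a sketch under assumptions: the two commands are placeholders, not part of the original answer.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"          # hypothetical command, run via sh
    while ((cmd | getline line) > 0)    # read its output line by line
        n++
    close(cmd)                          # finish the command before moving on
    rc = system("false")                # system() yields only the exit status
    print n " lines, exit status " rc
}')
echo "$out"
```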

Related

How can I send the output of an AWK script to a file?

Within an AWK script, I'm needing to send the output of the script to a file while also printing it to the terminal. Is there a nice and tidy way I can do this without having a copy of every print redirect to the file?
I'm not particularly good at making SSCCE examples but here's my attempt at demonstrating my problem;
BEGIN{
print "This is an awk script"
# I don't want to have to do this for every print
print "This is an awk script" > "thisiswhack.out"
}
{
# data manip. stuff here
# ...
printf "%s %s %s\n", blah, blah, blah
# I don't want to have to do this for every print again
printf "%s %s %s\n", blah, blah, blah >> "thisiswhack.out"
}
END{
print "Yay we're done!"
# Seriously, there has to be a better way to do this within the script
print "Yay we're done!" >> "thisiswhack.out"
}
Surely there must be a way to send the entire output of the script to an output file within the script itself, right?
The command to duplicate streams is tee, and we can use it inside awk:
awk '
BEGIN {tee = "tee out.txt"}
{print | tee}' in.txt
This invokes tee with the file argument out.txt, and opens a stream to this command.
The stream (and therefore tee) remains open until awk exits, or close(tee) is called.
Every time print | tee is used, the data is written to that stream. tee then writes this data both to the file out.txt and to stdout. Note that tee truncates out.txt when it starts; use tee -a if you want to append instead.
The print | command feature is part of POSIX awk. The tee variable isn't compulsory either (you can use the string directly).
Of course, we can use tee outside awk too: awk ... | tee out.txt.
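A self-contained way to try the tee approach is sketched below. The scratch directory and the two-line input are assumptions for the demo; the explicit close(tee) in END simply makes awk wait for tee to finish before exiting.

```shell
dir=$(mktemp -d)                      # scratch directory for the demo
printf 'a\nb\n' > "$dir/in.txt"
out=$(awk -v tee="tee $dir/out.txt" '
    { print | tee }                   # every record goes to the tee pipe
    END { close(tee) }                # flush and wait for tee to finish
' "$dir/in.txt")
echo "$out"
```

After this runs, "$dir/out.txt" should hold the same two lines that were printed.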
GNU AWK's redirection allows sending output to a command rather than a file, so I suggest using that feature as follows:
awk 'BEGIN{command="tee output.txt"}{print tolower($0) | command}' input.txt
Note: I use tolower($0) for demonstration purposes. print is redirected into the tee command, which writes to both the named file and standard output, so you should get a lowercase version of input.txt written to output.txt and to standard output.
If you are not restricted to a single awk invocation, you can alternatively use tee outside awk, like so:
awk '{print tolower($0)}' input.txt | tee output.txt
awk '
function prtf(str) {
printf "%s", str > "thisiswhack.out"
printf "%s", str
fflush()
}
function prt(str) {
prtf( str ORS )
}
{
# to print adding a newline at the end:
prt( "foo" )
# to print as-is without adding a newline:
prtf( sprintf("%s, %s, %d", $2, "bar", 17) )
}
' file
In the above we are not spawning a subshell to call any other command so it's efficient, and we're using fflush() after every print to ensure both output streams (stdout and the extra file) don't get out of sync with respect to each other (e.g. stdout displays less text than the file or vice-versa if the command is killed).
The above always overwrites the contents of "thisiswhack.out" with whatever the script outputs. If you want to append instead then change > to >>. If you want the option of doing both, introduce a variable (which I've named prtappend below) to control it which you can set on the command line, e.g. change:
printf "%s", str > "thisiswhack.out"
to:
printf "%s", str >> "thisiswhack.out"
and add:
BEGIN {
if ( !prtappend ) {
printf "" > "thisiswhack.out"
}
}
then if you do awk -v prtappend=1 '...' it'll append to thisiswhack.out instead of overwriting it.
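Putting those pieces together, a minimal sketch of the append/overwrite switch might look like this. The run helper, the temporary file standing in for thisiswhack.out, and the one-line input are all assumptions made for the demo.

```shell
log=$(mktemp)    # stand-in for "thisiswhack.out"
run() {          # hypothetical helper: $1 is the prtappend setting (0 or 1)
    printf 'x\n' | awk -v out="$log" -v prtappend="$1" '
    function prtf(str) {
        printf "%s", str >> out
        printf "%s", str
        fflush()
    }
    function prt(str) { prtf(str ORS) }
    BEGIN { if (!prtappend) printf "" > out }   # truncate unless appending
    { prt("foo") }'
}
run 0 > /dev/null    # overwrite: the file now holds one line
run 1 > /dev/null    # append: a second line is added
wc -l < "$log"
```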
Of course, the better approach if you're on a Unix system is to have your awk script called from a shell script with its output piped to tee, e.g.:
#!/usr/bin/env bash
awk '
{
print "foo"
printf "%s, %s, %d", $2, "bar", 17
}
' "${@:--}" |
tee 'thisiswhack.out'
Note that this is one more example of why you should not call awk from a shebang.

TCL (expect): can't read "NF" (from awk): no such variable

Attempting to get the last word of the first line from a file. Not sure why the following command:
send "cat moo.txt | grep QUACK * | awk 'NF>1{print $NF}' meow.txt >> bark.txt "
is getting the error message can't read "NF": no such variable.
I can run the awk 'NF>1{print $NF}' meow.txt >> bark.txt snippet just fine on my machine. Yet, when it runs in my expect script, it gives me that error.
Anyone know why expect doesn't recognize the awk built-in variable?
I think your script is trying to expand the variable $NF with its value before passing that command to send. $NF isn't set in your Expect (Tcl) script, since it's internal to awk, which hasn't even had a chance to run yet, so Tcl balks.
Try escaping that variable so it is treated as a string literal and awk will be able to use it when it comes time for awk to run:
send "cat moo.txt | grep QUACK * | awk 'NF>1{print \$NF}' meow.txt >> bark.txt "

awk print without a file

How to print using awk without a file.
script.sh
#!/bin/sh
for i in {2..10};do
awk '{printf("%.2f %.2f\n", '$i', '$i'*(log('$i'/('$i'-1))))}'
done
sh script.sh
Desired output
2 value
3 value
4 value
and so on
value indicates the quantity after computation
A BEGIN block is needed if you are not providing any input to awk, either by file or by standard input. This block executes at the very start of awk execution, even before the first file is opened.
So it is like:
awk 'BEGIN{printf.....
From the man page:
Gawk executes AWK programs in the following order. First, all variable assignments specified via the -v option are performed. Next, gawk compiles the program into an internal form. Then, gawk executes the code in the BEGIN block(s) (if any), and then proceeds to read each file named in the ARGV array. If there are no files named on the command line, gawk reads the standard input.
awk structure:
awk 'BEGIN{get initialization data from this block}{execute the logic}' optional_input_file
As PS. correctly pointed out, do use the BEGIN block to print stuff when you don't have a file to read from.
Furthermore, in your case you are looping in Bash and then calling awk on every loop. Instead, loop directly in awk:
$ awk 'BEGIN {for (i=2;i<=10;i++) print i, i*log(i/(i-1))}'
2 1.38629
3 1.2164
4 1.15073
5 1.11572
6 1.09393
7 1.07905
8 1.06825
9 1.06005
10 1.05361
Note I started the loop at 2 because i=1 would mean log(1/(1-1))=log(1/0), which is a division by zero.
I would suggest a different approach:
seq 2 10 | awk '{printf("%.2f %.2f\n", $1, $1*(log($1/($1-1))))}'
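To get the two-decimal formatting the question asked for while still looping inside awk, a sketch along these lines could work. The variable names lo and hi are assumptions introduced here so the range can be set from the command line.

```shell
out=$(awk -v lo=2 -v hi=4 'BEGIN {
    # No input file is needed: everything happens in the BEGIN block.
    for (i = lo; i <= hi; i++)
        printf "%d %.2f\n", i, i * log(i / (i - 1))
}')
echo "$out"
```

lo must stay above 1, otherwise the expression divides by zero.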

Is there a way to create an awk input inactivity timer?

I have a text source (a log file), which gets new lines appended to it by some third party.
I can output the additions to my source file using tail -f source. I can then pipe that through an awk script awk -f parser.awk to parse and format the output.
My question is: while tail -f source | awk -f parser.awk is running, is there a way to call function foo() inside my parser.awk script every time there is more than 5 seconds elapsed without anything coming through the pipe into the standard input of the awk script?
Edit: Currently using GNU Awk 3.1.6. May be able to upgrade to newer version if required.
If your shell's read supports -t and -u, here's an ugly hack:
{ echo hello; sleep 6; echo world; } | awk 'BEGIN{
while( "while read -t 5 -u 3 line; do echo \"$line\"; done" | getline > 0 )
print
}' 3<&0
You can replace the print in the body of the while loop with your script. However, it would probably make a lot more sense to put the read timeout between tail and awk in the pipeline, and it would make even more sense to re-implement tail to timeout.
This is not exactly an answer to your question, but there is a little shell hack that does practically what you want:
{ tail -f log.file >&2 | { while : ; do sleep 5; echo SECRET_PHRASE ; done ; } ; } 2>&1 | awk -f script.awk
Whenever awk receives SECRET_PHRASE it can run the foo function, which happens every 5 seconds. Unfortunately, it will run every 5 seconds even if tail produced some output during that time.
P.S. You can replace '{}' with '()' and vice versa. The former does not create a subshell; the latter does.
Another way is to append this secret phrase directly to the log file whenever nobody has written to it during the last five seconds. That does not look like a good idea, though, because it would pollute the log file.
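The suggestion to put the read timeout between tail and awk can be sketched as follows. This is only an illustration under assumptions: it needs bash 4 or later (where read -t returns a status above 128 on timeout), it uses a 1-second timeout and a short canned input in place of tail -f and 5 seconds, and the IDLE_MARKER string and the body of foo() are made up for the demo.

```shell
out=$(bash <<'EOF'
{ echo hello; sleep 2; echo world; } |
while :; do
    if IFS= read -r -t 1 line; then
        printf '%s\n' "$line"            # pass real input through
    elif [ $? -gt 128 ]; then
        echo IDLE_MARKER                 # read timed out: signal inactivity
    else
        break                            # end of input
    fi
done |
awk '
function foo() { print "foo called: input was idle" }
$0 == "IDLE_MARKER" { foo(); next }      # marker lines trigger foo()
{ print }'
EOF
)
echo "$out"
```

Unlike the fixed 5-second ticker, the marker is emitted only when nothing arrived during the timeout, because the timer restarts after every line. The marker must be a string that can never occur in the real log.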

Shell variable interpreted wrongly in awk

In the following code I am trying to pass a shell variable to awk. But when I run it as a.sh foo_bar, the output printed is "foo is not declared", and when I run it as a.sh bar_bar, the output printed is " foo is declared". Is there a bug in awk, or am I doing something wrong here?
I am using gawk-3.0.3.
#!/bin/awk
model=$1
awk ' {
match("'$model'", /foo/)
ismodel=substr("'$model'", RSTART, RLENGTH)
if ( ismodel != foo ) {
print " foo is not declared"
} else {
print " foo is declared"
}
}
' dummy
dummy is file with single blank line.
Thanks,
You should use AWK's variable passing instead of complex quoting:
awk -v awkvar="$shellvar" 'BEGIN {print awkvar}'
Your script is written as a shell script, but you have an AWK shebang line. You could change that to #!/bin/sh.
This is not a bug, but an error in your code. The problematic line is:
if ( ismodel != foo ) {
Here foo should be "foo". Right now you are comparing with an uninitialized (empty) variable, so the test is true when you have a match and false when you have no match, which is the opposite of what you intended. So the problem is not the way you use the shell variables.
But as the other answerers have said, the preferred way of passing arguments to awk is by using the -v switch. This will also work when you decide to put your awk script in a separate file, and it prevents all kinds of quoting issues.
I'm also not sure about your usage of a dummy file. Is this just for the example? Otherwise you should omit the file and put all your code in the BEGIN {} block.
Use the -v option to pass in a variable from the shell:
awk -v model="$1" '{
match(model, /foo/)
.....
}
' dummy
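Combining the points from the answers above (pass the value with -v, quote the string "foo", and use BEGIN so no dummy file is needed), a complete version might look like the sketch below. The function name check_model is an assumption introduced for the demo.

```shell
check_model() {    # hypothetical wrapper: $1 is the model name to test
    awk -v model="$1" 'BEGIN {
        # match() returns the position of the match, or 0 if there is none
        if (match(model, /foo/))
            print "foo is declared"
        else
            print "foo is not declared"
    }'
}
check_model foo_bar
check_model bar_bar
```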