awk script is not running the middle block - awk

The following script only runs the BEGIN and END blocks:
#!/bin/awk -f
BEGIN {print "Hello, World!"}
{ print "Don't Panic" }
END { print "and we're panicking... I told you not to panic. Did you miss that part?" }
and the output is:
$ awk -f joint.awk .
Hello, World!
and we're panicking... I told you not to panic. Did you miss that part?
the expected output is:
$ awk -f joint.awk .
Hello, World!
Don't panic
and we're panicking... I told you not to panic. Did you miss that part?
what's odd is that when I change the middle block to print $1, instead of printing a piece of text, it runs as expected when I pass a file in.

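The middle block has no explicit condition, so it runs once per line of input on stdin (or in your input file, if one is explicitly named). In the question the only operand is ., which is a directory and typically supplies no lines to read, so the block never runs.
Thus, how many times Don't Panic gets printed depends on how much input there is.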
See this tested by the following code:
awkScript=$(cat <<'EOF'
BEGIN {print "Hello, World!"}
{ print "Don't Panic" }
END { print "and we're panicking... I told you not to panic. Did you miss that part?" }
EOF
)
echo "Testing with no input:"
awk "$awkScript" </dev/null
echo
echo "Testing with one line of input:"
awk "$awkScript" <<<"One line of input"
echo
echo "Testing with two lines of input:"
awk "$awkScript" <<<$'First line\nSecond line'
...which emits as output:
Testing with no input:
Hello, World!
and we're panicking... I told you not to panic. Did you miss that part?
Testing with one line of input:
Hello, World!
Don't Panic
and we're panicking... I told you not to panic. Did you miss that part?
Testing with two lines of input:
Hello, World!
Don't Panic
Don't Panic
and we're panicking... I told you not to panic. Did you miss that part?


Non blocking read from GNU awk coprocess?

I would like to implement incremental execution of scripts using gawk in order to interleave script source and script output in a document.
The idea would be to read script lines into awk to print them and also pipe them into an appropriate interpreter. Then, on a cue from the input file, read any output from the coprocess and print it to standard output. But it seems that I must know how much output has been generated before looping over the coprocess output.
Is there any way to do a non-blocking read from the coprocess?
function script_checkpoint() {
while(("python3" |& getline output) > 0)
print output
}
/^# checkpoint/ { script_checkpoint(); next }
{ print; print $0 |& "python3" }
END { script_checkpoint() }
EDIT: I have tried to implement this without using a coprocess by buffering the input lines until a checkpoint and just letting the interpreter print to standard out itself but the interpreter always buffers its output until the stream closes. I don't want to close it until the program ends to preserve its internal state.
EDIT: made it more clear that my first intended use case is running python scripts. Here is a sample input/output pair.
print('first line')
# checkpoint
print('second line')
should result in
print('first line')
first line
print('second line')
second line
The general issue:
while ((interpreter |& getline output) > 0) runs until it sees an EOF but ...
interpreter does not end/terminate/exit, thus no EOF is sent so ...
awk hangs while waiting for interpreter to send more data so ...
we end up with a deadlock situation (awk waiting for input from interpreter; interpreter waiting for input from awk)
Assumptions:
need to maintain a single invocation of interpreter throughout the run (per a comment from OP); net result: awk cannot depend on interpreter sending an EOF
interpreter can be modified (to generate additional output)
the awk script has no way of knowing how many lines of output will be generated by interpreter
One idea is to set up a handshake between awk and interpreter. Within the while ((interpreter |& getline output) > 0) loop we'll test for our handshake and, when we see it, break out of the loop and return to the main awk script.
For demo purposes I'll use a simple bash script that does some handshake processing otherwise just prints to stdout whatever it reads from stdin:
$ cat interpreter
#!/usr/bin/bash
while read -r line
do
if [[ "${line}" = 'checkpoint' ]] # received 'checkpoint' handshake?
then
echo "CHECKPOINT" # send "CHECKPOINT" handshake/acknowledgement
continue
else
echo "interpreter: $line"
fi
done
Demo awk code with handshake logic:
awk '
function script_checkpoint() {
while (( cmd |& getline output) > 0) {
if ( output == "CHECKPOINT" ) # received "CHECKPOINT" handshake/acknowledgement?
break
print output
}
}
BEGIN { cmd= "./interpreter" }
/^# checkpoint/ { print "checkpoint" |& cmd # send "checkpoint" handshake
script_checkpoint()
next
}
{ print "awk: " $0
print $0 |& cmd
}
END { print "awk: last checkpoint" # in case last line of input is not "# checkpoint" we will ...
print "checkpoint" |& cmd # send one last "checkpoint" handshake
script_checkpoint()
print "awk: done"
}
' test.dat
Sample input file:
$ cat test.dat
line1
line2
# checkpoint
line3
line4
# checkpoint
line5
Output:
awk: line1
awk: line2
interpreter: line1
interpreter: line2
awk: line3
awk: line4
interpreter: line3
interpreter: line4
awk: line5
awk: last checkpoint
interpreter: line5
awk: done
NOTES:
awk will still hang in the event interpreter crashes and/or fails to send back the CHECKPOINT handshake
if the strings checkpoint and/or CHECKPOINT can show up in the 'normal' data streams then update the code to use strings that are not expected in the data streams
It sounds like you're trying to do something like this:
BEGIN { cmd="/my/python/script/path" }
function script_checkpoint( output) {
close(cmd,"to")
while ( (cmd |& getline output) > 0 ) {
print output
}
close(cmd)
}
/^# checkpoint/ {
script_checkpoint()
next
}
{
print
print |& cmd
}
END { script_checkpoint() }

How can I store the length of a line into a var within awk script?

I have this simple awk script with which I attempt to check the number of characters in the first line.
If the first line has more or less than 10 characters I want to store the number of characters in a var.
Somehow the first print statement works but storing that result in a var doesn't.
Please help.
I tried removing the dollar sign, "thelength=(length($0))", and removing the parentheses, "thelength=length($0)", but it doesn't print anything...
Thanks!
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=$(length($0))
print "The length of the first line is: ",$thelength;
exit 1;
}
}
END { print "STOP" }' $1
Two issues, both from mixing ksh and awk syntax ...
$(...) is not command substitution inside awk; there it means "the field whose number is ...", so no sub-shell is needed to obtain the length; use thelength=length($0)
awk variables do not take a leading $ when being referenced ($thelength would again select a field); use print ... ,thelength
So your code becomes:
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=length($0)
print "The length of the first line is: ",thelength;
exit 1;
}
}
END { print "STOP" }' $1
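To see the difference between thelength and $thelength in isolation, here is a minimal sketch. The sample line and the shell variable names first_len and field_n are made up for illustration. In awk, $ applied to a number selects that field, so $n asks for field number 11 of a two-field line, which is empty.

```shell
# Sample input (assumed): one 11-character line with two ';'-separated fields.
line='hello;world'

# Store the length in an awk variable and print the variable itself.
first_len=$(printf '%s\n' "$line" | awk -F';' 'NR == 1 { n = length($0); print n }')

# Prefixing the variable with $ asks for field number n instead,
# which does not exist here, so awk prints an empty string.
field_n=$(printf '%s\n' "$line" | awk -F';' 'NR == 1 { n = length($0); print $n }')

echo "length: $first_len"
echo "field $first_len: '$field_n'"
```

The first command prints the stored length; the second prints nothing for the field, which matches the symptom in the question.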

Trap or evaluate bad regular expression string at runtime in awk script

How can I trap an error if a dynamic regular expression evaluation is bad like:
var='lazy dog'
# a fixed regex here, but the original comes from outside the script
Regex='*.'
# tried this and it failed
if (var ~ Regex) foo
The goal is to manage this error as I cannot test the regex itself (it comes from external source). Using POSIX awk (AIX)
Something like this?
$ echo 'foo' |
awk -v re='*.' '
BEGIN {
cmd="awk --posix \047/" re "/\047 2>&1"
cmd | getline rslt
print "rslt="rslt
close(cmd)
}
{ print "got " $0 " but re was bad" }
'
rslt=awk: cmd. line:1: error: Invalid preceding regular expression: /*./
got foo but re was bad
I use gawk so I had to add --posix to make it not just accept that regexp as a literal * followed by any char. You'll probably have to change the awk command being called in cmd to behave sensibly for your needs with both valid and invalid regexps but you get the idea - to do something like an eval in awk you need to have awk call itself via system() or a pipe to getline. Massage to suit...
Oh, and I don't think you can get the exit status of cmd with the above syntax, and you can't capture the output of a system() call within awk, so you may need to test the re twice: first with system() to find out if it fails, redirecting its output to /dev/null, and then, on a failure, run it again with getline to capture the error message.
Something like:
awk -v re='*.' '
BEGIN {
cmd="awk --posix \047/" re "/\047 2>&1"
if ( system(cmd " > /dev/null") ) {
close(cmd " > /dev/null")
cmd | getline rslt
print "rslt="rslt
close(cmd)
}
}
{ print "got " $0 " but re was bad" }
'
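A variation on the same idea, as a hedged sketch rather than a drop-in solution: do the trial compilation in a separate awk before the real script runs, and hand the pattern over through the environment so it is never pasted into awk source or a shell command string. The function name regex_ok and the variable RE_TO_TEST are hypothetical, and the sketch assumes your awk provides ENVIRON (it is part of POSIX awk) and exits non-zero when a dynamic regex fails to compile. Which patterns count as invalid depends on the awk doing the test, so run it with the same awk that will use the regex.

```shell
# Hypothetical helper: succeeds if the awk on this system can compile the
# regex held in $1. The child awk reads nothing from stdin (</dev/null),
# so it cannot swallow input meant for the main script.
regex_ok() {
    RE_TO_TEST=$1 awk 'BEGIN { hit = ("" ~ ENVIRON["RE_TO_TEST"]); exit 0 }' \
        </dev/null >/dev/null 2>&1
}

if regex_ok '^lazy'; then echo "'^lazy' compiles"; fi
if ! regex_ok '(lazy'; then echo "'(lazy' was rejected"; fi
```

Drop the 2>&1 redirection to /dev/null if you want to keep the error message awk prints for a bad pattern.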

Awk iterating without a loop construct

I was reading a tutorial on awk scripting and observed this strange behaviour: why does this awk script keep asking for a number when executed, even without a loop construct like while or for? If we enter CTRL+D (EOF) it stops prompting for another number.
#!/bin/awk -f
BEGIN {
print "type a number";
}
{
print "The square of ", $1, " is ", $1*$1;
print "type another number";
}
END {
print "Done"
}
Please explain this behaviour of the above awk script
awk keeps processing lines until end of file is reached. Since in this case the input (STDIN) never ends as long as you keep entering numbers or hitting enter, the effect is an endless loop.
When you hit CTRL+D you signal to awk that EOF has been reached, thereby exiting the loop.
Try this and enter 0 to exit:
BEGIN {
print "type a number";
}
{
if($1==0)
exit;
print "The square of ", $1, " is ", $1*$1;
print "type another number";
}
END {
print "Done"
}
From the famous The AWK Programming Language:
If you don't provide a input file to the awk script on the command line, awk will apply the program to whatever you type next on your terminal until you type an end-of-file signal (control-d on Unix systems).
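The same implicit loop can be watched without a terminal. In this small sketch (the input values and the variable name squares are made up), the input comes from a pipe, so end-of-file arrives on its own when printf finishes instead of when you press control-d.

```shell
# printf closes the pipe after two lines; that closure is the end-of-file
# awk is waiting for, so the main block runs twice and END then fires.
squares=$(printf '3\n4\n' | awk '{ print $1 * $1 } END { print "Done" }')
echo "$squares"
```

This prints 9, 16 and Done on separate lines: one pass of the main block per input line, then the END block.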

Using a variable defined inside AWK

I got this piece of script working. This is what i wanted:
input
3.76023 0.783649 0.307724 8766.26
3.76022 0.764265 0.307646 8777.46
3.7602 0.733251 0.30752 8821.29
3.76021 0.752635 0.307598 8783.33
3.76023 0.79528 0.307771 8729.82
3.76024 0.814664 0.307849 8650.2
3.76026 0.845679 0.307978 8802.97
3.76025 0.826293 0.307897 8690.43
with script
#!/bin/bash
awk -F ', ' '
{
for (i=3; i<=10; i++) {
if (i==NR) {
npc1[i]=sprintf("%s", $1);
npc2[i]=sprintf("%s", $2);
npc3[i]=sprintf("%s", $3);
npRs[i]=sprintf("%s", $4);
print npc1[i], npc2[i], npc3[i], npRs[i];
}
}
} ' p_walls.raw
echo "${npc1[100]}"
But now I can't use those arrays, npc1[i] etc., outside awk. That last echo prints nothing. Isn't it possible, or am I missing something?
AWK is a separate process, after it finishes all internal data is gone. This is true for all external processes/commands. Bash only sees what bash builtins touch.
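What does cross the process boundary is awk's output. A minimal sketch, assuming all you need back in the shell is a single value: have awk print it and capture it with command substitution. The temporary file and the variable names data_file and third_c1 are made up for illustration; the rows are the first three from the question.

```shell
# Sample data (assumed): written to a temporary file so the sketch is
# self-contained.
data_file=$(mktemp)
printf '%s\n' \
    '3.76023 0.783649 0.307724 8766.26' \
    '3.76022 0.764265 0.307646 8777.46' \
    '3.7602 0.733251 0.30752 8821.29' > "$data_file"

# awk prints the value; the shell captures it with command substitution.
third_c1=$(awk 'NR == 3 { print $1 }' "$data_file")
rm -f "$data_file"

echo "first column of line 3: $third_c1"
```

The awk arrays themselves still vanish when awk exits; only the text it printed survives in the shell variable.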
i is never 100, so why do you want to access npc1[100]?
What are you really trying to do? If you rewrite the question we might be able to help...
(Cherry on the cake is always good!)
Sorry, but everything in @yi_H's answer and comments above is correct.
But there's really no problem loading 2 sets of data into 2 separate arrays in awk, ie.
awk '{
    if (FILENAME == "file1") arr1[++i] = $0
    else if (FILENAME == "file2") arr2[++j] = $0
}
END {
    f1max = i; f2max = j
    for (i = 1; i <= f1max; i++) {
        # put what you need here for arr1 processing, e.g. use arr1[i]
        #
        # don't forget that you can do things like
        if (arr1[i] in arr2) { print arr1[i] "=arr2[arr1[" i "]]=" arr2[arr1[i]] }
    }
    for (j = 1; j <= f2max; j++) {
        # and here for arr2, e.g. use arr2[j]
    }
}' file1 file2
You'll have to fill the actual processing for arr1[i] and arr2[j].
Also, get an awk book for the weekend and be up and running by Monday. It's easy. You can probably figure it out from grymoire.com/Unix/awk.html
I hope this helps.