Shell variable interpreted wrongly in awk - awk

In the following code I am trying to pass a shell variable to awk. But when I run it as a.sh foo_bar, the output is "foo is not declared", and when I run it as a.sh bar_bar, the output is "foo is declared". Is there a bug in awk, or am I doing something wrong here?
I am using gawk-3.0.3.
#!/bin/awk
model=$1
awk ' {
match("'$model'", /foo/)
ismodel=substr("'$model'", RSTART, RLENGTH)
if ( ismodel != foo ) {
print " foo is not declared"
} else {
print " foo is declared"
}
}
' dummy
dummy is file with single blank line.
Thanks,

You should use AWK's variable passing instead of complex quoting:
awk -v awkvar="$shellvar" 'BEGIN {print awkvar}'
Your script is written as a shell script, but you have an AWK shebang line. You could change that to #!/bin/sh.

This is not a bug, but an error in your code. The problematic line is:
if ( ismodel != foo ) {
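Here foo should be "foo". Right now you are comparing with an uninitialized variable, which evaluates to the empty string. The test ismodel != foo is therefore true when there is a match (ismodel is "foo") and false when there is none (ismodel is empty as well), which is the opposite of what you intended. So the problem is not the way you use the shell variables.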
But as the other answerers have said, the preferred way of passing arguments to awk is the -v switch. This also works when you decide to put your awk script in a separate file, and it prevents all kinds of quoting issues.
I'm also not sure about your usage of a dummy file. Is this just for the example? Otherwise you should omit the file and put all your code in the BEGIN {} block.
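Putting these points together, a minimal sketch of the corrected script could look like this. It assumes a POSIX shell and awk; check_model is a hypothetical helper name used only for illustration, and the dummy file is dropped because everything runs in BEGIN.

```shell
#!/bin/sh
# check_model is a hypothetical helper name, not part of the original script.
check_model() {
    awk -v model="$1" 'BEGIN {
        # match() returns 0 when /foo/ is absent; otherwise it sets RSTART and RLENGTH
        if (match(model, /foo/) && substr(model, RSTART, RLENGTH) == "foo")
            print "foo is declared"
        else
            print "foo is not declared"
    }'
}

check_model foo_bar
check_model bar_bar
```

The string literal "foo" is quoted inside awk, and the shell value only enters awk through -v, so no shell quoting is spliced into the awk program.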

Use the -v option to pass in a variable from the shell:
awk -v model="$1" '{
match(model, /foo/)
.....
}
' dummy


How can I send the output of an AWK script to a file?

Within an AWK script, I need to send the output of the script to a file while also printing it to the terminal. Is there a nice and tidy way to do this without duplicating every print with a redirect to the file?
I'm not particularly good at making SSCCE examples but here's my attempt at demonstrating my problem;
BEGIN{
print "This is an awk script"
# I don't want to have to do this for every print
print "This is an awk script" > "thisiswhack.out"
}
{
# data manip. stuff here
# ...
printf "%s %s %s\n", blah, blah, blah
# I don't want to have to do this for every print again
printf "%s %s %s\n", blah, blah, blah >> "thisiswhack.out"
}
END{
print "Yay we're done!"
# Seriously, there has to be a better way to do this within the script
print "Yay we're done!" >> "thisiswhack.out"
}
Surely there must be a way to send the entire output of the script to an output file within the script itself, right?
The command to duplicate streams is tee, and we can use it inside awk:
awk '
BEGIN {tee = "tee out.txt"}
{print | tee}' in.txt
This invokes tee with the file argument out.txt, and opens a stream to this command.
The stream (and therefore tee) remains open until awk exits, or close(tee) is called.
Every time print | tee is used, the data is written to that stream. tee then writes this data both to the file out.txt and to stdout. Note that tee truncates out.txt when it starts; use tee -a if you want to append instead.
The | command feature is POSIX awk. Also the tee variable isn't compulsory (you can use the string).
Of course, we can use tee outside awk too: awk ... | tee out.txt.
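As a rough sketch of the in-awk variant, the following writes a couple of lines through tee and closes the stream explicitly in END. The scratch directory, the sample input, and the variable name out are assumptions made for the example.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch does not touch real files.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmpdir/in.txt"

awk -v out="$tmpdir/out.txt" '
BEGIN { tee = "tee \"" out "\"" }     # command string; awk also uses it as the stream key
{ print "line " NR ": " $0 | tee }    # every print goes to the same tee process
END { close(tee) }                    # flush the pipe and wait for tee to finish
' "$tmpdir/in.txt"
```

Closing the stream in END is optional, since awk closes it on exit anyway, but it makes the point at which tee finishes explicit.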
GNU AWK's redirection can send output to a command rather than to a file, so I suggest taking advantage of that feature:
awk 'BEGIN{command="tee output.txt"}{print tolower($0) | command}' input.txt
Note: I use tolower($0) for demonstration purposes. I redirect print into the tee command, which writes to the named file and to standard output, so you should get a lowercase version of input.txt written to both output.txt and standard output.
If you are not restricted to a single awk invocation, you can alternatively use tee outside awk, like so:
awk '{print tolower($0)}' input.txt | tee output.txt
awk '
function prtf(str) {
printf "%s", str > "thisiswhack.out"
printf "%s", str
fflush()
}
function prt(str) {
prtf( str ORS )
}
{
# to print adding a newline at the end:
prt( "foo" )
# to print as-is without adding a newline:
prtf( sprintf("%s, %s, %d", $2, "bar", 17) )
}
' file
In the above we are not spawning a subshell to call any other command so it's efficient, and we're using fflush() after every print to ensure both output streams (stdout and the extra file) don't get out of sync with respect to each other (e.g. stdout displays less text than the file or vice-versa if the command is killed).
The above always overwrites the contents of "thisiswhack.out" with whatever the script outputs. If you want to append instead then change > to >>. If you want the option of doing both, introduce a variable (which I've named prtappend below) to control it which you can set on the command line, e.g. change:
printf "%s", str > "thisiswhack.out"
to:
printf "%s", str >> "thisiswhack.out"
and add:
BEGIN {
if ( !prtappend ) {
printf "" > "thisiswhack.out"
}
}
then if you do awk -v prtappend=1 '...' it'll append to thisiswhack.out instead of overwriting it.
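Here is a small sketch that combines the pieces above into one runnable script. The function name run_demo, the scratch directory, and passing the file name in through an awk variable called out are assumptions for illustration only.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
log="$tmpdir/thisiswhack.out"
printf 'old\n' > "$log"            # pre-existing content to show the difference

# run_demo is a hypothetical wrapper; its argument becomes prtappend.
run_demo() {
    printf 'a\nb\n' | awk -v prtappend="$1" -v out="$log" '
    function prtf(str) { printf "%s", str >> out; printf "%s", str; fflush() }
    function prt(str)  { prtf(str ORS) }
    BEGIN { if (!prtappend) { printf "" > out } }   # truncate unless appending
    { prt("got " $0) }
    '
}

run_demo 0     # overwrites: the old line is gone
run_demo 1     # appends to what the first run wrote
```

After both calls the file holds the two lines from the first run followed by the two lines from the second.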
Of course, the better approach if you're on a Unix system is to have your awk script called from a shell script with its output piped to tee, e.g.:
#!/usr/bin/env bash
awk '
{
print "foo"
printf "%s, %s, %d", $2, "bar", 17
}
' "${@:--}" |
tee 'thisiswhack.out'
Note that this is one more example of why you should not call awk from a shebang.

How to expand awk variables within the code?

Assuming that I passed some variables to the awk script:
$AWK -f script.awk -v var01="foo" var02="bar"
And inside the script I match some pattern:
# pattern01 var01
/pattern01/ {
if (??? == "foo") print
}
I want to expand the variable "$2" ("var01") to its given value.
I have been trying with gawk and it seems to be able to expand variables in the following way:
print $$x
But this, for some reason, doesn't work in the first example, and I also need to keep POSIX compatibility. Is it possible to expand the variable in the given example?
(Note: I want specifically this behavior (if possible), so i don't want workarounds with other tools or shell expansion)
Equivalent in shell:
file01:
foobar
some random text
pattern01 var01
more random text...
code.sh:
#!/bin/sh
var01="Hello"
x="$(grep '^pattern01' file01 | awk '{print $2}')"
eval echo "$"$x # prints Hello
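Using POSIX awk, there is no way to look up the value of a variable by its name. Instead, consider using an array to store the values. It is not the most elegant solution, but the array lookup is portable (note that the -e option used here is a gawk extension; with a strictly POSIX awk, put the BEGIN block into script.awk or into a second file given with another -f):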
$AWK -e 'BEGIN { v["var01"] = "foo" ; v["var02"] = "bar" }' -f script.awk
script.awk
# pattern01 var01
/pattern01/ {
if ( v[$2] == "foo") print
}
If you know that you will be using a recent GNU AWK version and are OK with using extensions, you can use the SYMTAB array. From the man page:
SYMTAB An array whose indices are the names of all currently
defined global variables and arrays in the program. The array may be
used for indirect access to read or write the value of a variable:
foo = 5
SYMTAB["foo"] = 4
print foo # prints 4
$AWK -f script.awk -v var01="foo" var02="bar"
script.awk
# pattern01 var01
/pattern01/ {
if ( SYMTAB[$2] == "foo") print
}
Both approaches eliminate the need to create environment variables, which may affect other programs and may be hard to scale.
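For a strictly POSIX awk, one way to fill such an array without any extension is to pass the name=value pairs in a single -v string and split it in BEGIN. This is only a sketch: the variable name pairs, the scratch directory, and the assumption that values contain no whitespace are all mine.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf 'foobar\nsome random text\npattern01 var01\nmore random text...\n' > "$tmpdir/file01"

result=$(awk -v pairs='var01=foo var02=bar' '
BEGIN {
    # Build the lookup table v[name] = value from the "name=value" pairs.
    n = split(pairs, kv, " ")
    for (i = 1; i <= n; i++) {
        eq = index(kv[i], "=")
        v[substr(kv[i], 1, eq - 1)] = substr(kv[i], eq + 1)
    }
}
/pattern01/ { if (v[$2] == "foo") print }
' "$tmpdir/file01")

printf '%s\n' "$result"    # pattern01 var01
```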
I have found one solution, by setting the variable as part of the environment and then calling the special variable "ENVIRON" with the name (as it acts as a dictionary):
# pattern01 var01
/pattern01/ {
if (ENVIRON[$2] == "foo") print
}
I think the same behaviour could be achieved without using the environment by building the dictionary manually in the BEGIN block.
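A minimal sketch of the ENVIRON approach, assuming the sample file01 from the question: prefixing the assignments to the awk command puts them in the environment of that one process only, so nothing leaks into the rest of the shell session.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf 'foobar\nsome random text\npattern01 var01\nmore random text...\n' > "$tmpdir/file01"

# var01 and var02 are exported to this awk invocation only.
result=$(var01=foo var02=bar awk '
/pattern01/ { if (ENVIRON[$2] == "foo") print }
' "$tmpdir/file01")

printf '%s\n' "$result"    # pattern01 var01
```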
Can you try this (it relies on bash's ${!x} indirect expansion, so it needs bash rather than plain sh):
var01="Hello"
x="$(grep '^pattern01' file01 | awk '{print $2}')"
echo ${!x}
hope this helps..
Thanks,

Using an awk variable in the string substitution portion of gsub

I want to use a command line variable to replace text found with a regular expression.
Something like:
awk --lint=fatal -v awk_var=XYZ '{ gsub(/^ABCD=.*$/, "ABCD=<awk_var>"); print}'
Haven't been able to figure out what the awk_var syntax should be.
Since you have not shown sample input, this is based only on the code and description in your question. Could you please try the following:
awk --lint=fatal -v awk_var=XYZ '{ gsub(/^ABCD=.*$/, "ABCD=" awk_var); print}'
Don't put the variable name inside the double quotes, otherwise it will be treated as literal text.
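One caveat worth sketching: in the replacement string of gsub(), an unescaped & stands for the matched text, so a value containing & behaves differently from a plain one. The sample input and the variable names plain, amp_gsub and amp_assign below are assumptions for illustration; --lint=fatal is left out because it is gawk-specific.

```shell
#!/bin/sh
input='ABCD=old
EFGH=keep'

# Concatenating the variable works for plain values.
plain=$(printf '%s\n' "$input" |
    awk -v awk_var=XYZ '{ gsub(/^ABCD=.*$/, "ABCD=" awk_var); print }')

# With & in the value, gsub() substitutes the matched text for it.
amp_gsub=$(printf '%s\n' "$input" |
    awk -v awk_var='A&B' '{ gsub(/^ABCD=.*$/, "ABCD=" awk_var); print }')

# Assigning to $0 sidesteps the replacement-string rules altogether.
amp_assign=$(printf '%s\n' "$input" |
    awk -v awk_var='A&B' '/^ABCD=/ { $0 = "ABCD=" awk_var } { print }')

printf '%s\n---\n%s\n---\n%s\n' "$plain" "$amp_gsub" "$amp_assign"
```

The first and third results replace the line as intended (ABCD=XYZ and ABCD=A&B), while the second becomes ABCD=AABCD=oldB because & expanded to the matched line.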

awk difference between commands from file and from commandline

The following script
#! /bin/bash
B=5
#FILE INPUT
cat <<EOF > awk.in
BEGIN{b=$B;printf("B is %s\n", b)}
EOF
awk -f awk.in sometextfile.txt
#COMMANDLINE INPUT
awk 'BEGIN{b=$B;printf("B is %s\n", b)}' sometextfile.txt
produces the output
B is 5
B is
The commands I am issuing to awk are exactly the same, so why is the variable B interpreted correctly in the first case but not in the latter?
Thanks!
In the line
awk 'BEGIN{b=$B;printf("B is %s\n", b)}' sometextfile.txt
The string literal 'BEGIN{b=$B;printf("B is %s\n", b)}' is single-quoted, so the shell does not expand $B and awk sees it as awk code. In awk code, B is uninitialized, so $B becomes $0, which is empty in the BEGIN block.
In contrast, shell variables in here documents (as in your first example) are expanded, so awk.in ends up containing the value that $B had in the shell script. This, by the way, would have made writing awk code very painful as soon as you'd tried to use a field variable (named $1, $2, and so forth) or the full line (named $0) because you'd have to manually resolve the ambiguity between the awk fields and shell variables of the same name.
Use
awk -v b="$B" 'BEGIN{ printf("B is %s\n", b) }' sometextfile.txt
to make a shell variable known to awk code. Do not try to substitute it directly into awk code; it isn't necessary, you will hate writing awk code that way, and it leads to code injection problems, especially when B comes from an untrusted source.
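If you still want to generate the awk program from a here-document, a sketch like the following keeps the shell out of the awk code by quoting the delimiter and passing the value with -v. The file name prog.awk, the scratch directory, and the sample input are assumptions for the example.

```shell
#!/bin/sh
B=5
tmpdir=$(mktemp -d)

# Quoting the delimiter (<<'EOF') stops the shell from expanding $1 or $B here.
cat <<'EOF' > "$tmpdir/prog.awk"
BEGIN { printf("B is %s\n", b) }
{ print "first field:", $1 }
EOF

result=$(printf 'alpha beta\n' | awk -v b="$B" -f "$tmpdir/prog.awk")
printf '%s\n' "$result"
```

This prints "B is 5" followed by "first field: alpha": $1 reaches awk untouched, and the shell value arrives only through -v.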

Test two variables in the same script

Question :
I'm breaking my teeth on this:
awk -v full=wc -v empty=wp '{
...blablabla...
if ($4==full) stop=yes
if (stop==yes && $4==empty) exit
...blablabla...
}'
The code works well (I mean, I get an output) if I don't declare the two variables full and empty at the beginning, and use instead the values of these in the script. If I only use the first variable in the script, I get the same output. But if I only use the second variable, I get no output at all.
Consider what happens when the variables are expanded. A portion of the awk body goes from:
' ... if ($4=='$full') ...'
to
' ... if ($4==wc) ...'
Since wc is a "bare word", awk thinks it is a variable and substitutes its value (the empty string), so you get this:
' ... if ($4=="") ...'
When you're building your awk script, you need to be aware of quoting strings in awk. You need:
' ... if ($4=="'$full'") ...'
However, it is much more elegant to pass values with awk's -v option as you have done.
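The same bare-word rule applies to yes in the question's code: stop=yes assigns the value of an uninitialized variable, so stop==yes is always true. A minimal sketch with -v and a numeric flag, using made-up input lines purely for illustration:

```shell
#!/bin/sh
# Hypothetical input: the 4th field carries the markers wc ("full") and wp ("empty").
result=$(printf '1 2 3 wp\n1 2 3 wc\n1 2 3 xx\n1 2 3 wp\n1 2 3 zz\n' |
awk -v full=wc -v empty=wp '
$4 == full          { stop = 1 }   # remember that the "full" marker was seen
stop && $4 == empty { exit }       # only after that does "empty" end the run
{ print }
')

printf '%s\n' "$result"
```

With this input the first wp line is printed, because full has not been seen yet, and processing stops at the second wp line, so three lines come out.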