awk set command line options in script - awk

I'm curious about how to set command-line options in an awk script, like -F for the field separator. I tried writing the shebang line like this:
#!/usr/bin/awk -F ":" -f
and get the following error:
awk: 1: unexpected character '.'
For this example, I can make do with
BEGIN {FS=":"}
but I still want to know a way to set all those options. Thanks in advance.
EDIT:
let's use another example that should be easy to test.
inputfile:
1
2
3
4
test.awk:
#!/usr/bin/awk -d -f
{num += $1}
END { print num}
run
/usr/bin/awk -d -f test.awk inputfile
prints 10 and generates a file called awkvars.out with some awk global variables in it.
but
./test.awk inputfile
will get
awk: cmd. line:1: ./test.awk
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1: ./test.awk
awk: cmd. line:1: ^ unterminated regexp
if I remove '-d' from shebang line,
./test.awk inputfile
will normally output 10.
My question is whether there is a way to write "-d" in the test.awk file so that it generates the awkvars.out file.

Answering the OP's question, beyond the setting of FS.
Short Answer: you cannot pass multiple options on a '#!' line, and since awk needs that one slot for -f (so that it reads the program from the script file), you are out of luck.
Long Answer:
When using a shebang (#!), there is a limit of a single argument, which is passed to the named program as its first argument; the path of the script is appended after it. So in general:
#! /path/to/prog arg1
input-1
input-2
will execute /path/to/prog arg1 /path/to/script, and prog then reads the file (including the leading shebang line). This is an oversimplification; the actual rules are more complex and vary between systems, see https://unix.stackexchange.com/questions/87560/does-the-shebang-determine-the-shell-which-runs-the-script
Given this limitation of one argument, when executing awk, the only valid and required parameter is '-f', which tells awk that its program is in the file named by the next argument (the script path). You can prepend a few other options that do NOT take any argument, for example gawk's '-P' ('-Pf' will force POSIX behavior).
As far as I can tell, all the 'interesting' options (setting FS, RS, ORS, ...) need to be separated from the '-f' by a space, making it impossible to embed them in the shebang line; the only alternative is to set them with 'BEGIN { ... }' or similar in the script.
Bottom line: #! /usr/bin/awk -f -F, is run the same as awk '-f -F,' script, which looks for a program file named ' -F,'. Usually not very useful, and it will not set FS. The same thing happens to '-d -f' in the question: awk receives '-d -f' as one argument, no -f option is recognized, and ./test.awk is then taken as the program text, which is why the syntax error quotes './test.awk'.
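The BEGIN alternative mentioned above can be sketched as follows. This is only an illustration: demo.awk and the sample input are made-up names, and the script is run with awk -f rather than through a shebang so that the interpreter path does not matter.

```shell
# Sketch: set the separators inside the script instead of on the shebang line.
# demo.awk and the sample input are hypothetical.
tmpdir=$(mktemp -d)
cat > "$tmpdir/demo.awk" <<'EOF'
BEGIN { FS = ":"; OFS = "-" }   # what -F ":" and -v OFS="-" would have set
{ print $1, $3 }
EOF

begin_out=$(printf 'a:b:c\n1:2:3\n' | awk -f "$tmpdir/demo.awk")
printf '%s\n' "$begin_out"
# a-c
# 1-3
rm -rf "$tmpdir"
```

Any variable that an option would have set (FS, RS, OFS, ORS) can be assigned the same way; options that are not variables, such as gawk's -d, have no BEGIN equivalent.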

Let's say the following is our Input_file, which we are going to use for all the solutions mentioned here.
cat Input_file
a,b,c,d
ab,c
1st way of setting the field separator: the simplest way is to set the FS value in the BEGIN section of the awk program file. The following is our .awk file.
cat file1.awk
BEGIN{
FS=","
}
{
print $1"..."$2
}
Now when we run the code, we get the following output:
/usr/local/bin/awk -f file1.awk Input_file
a...b
ab...c
2nd way of setting the field separator: pass the FS value as an argument before Input_file, as follows.
/usr/local/bin/awk -f file.awk FS="," Input_file
Example: the following is the file.awk file, which contains the awk code.
cat file.awk
{
print $1".."$2
}
Now when we run the awk file with the awk -f .. command as follows, this is the result.
/usr/local/bin/awk -f file.awk FS="," Input_file
a..b
ab..c
This means the program above is picking up , as the field separator.
3rd way of setting the field separator: we can set the field separator for awk -f programs the same way we do for ordinary awk programs, using the -F',' option as follows.
/usr/local/bin/awk -F',' -f file.awk Input_file
a..b
ab..c
4th way of setting the field separator: we can set the field separator as a variable by using the -v option on the command line when running the file.awk script, as follows.
/usr/local/bin/awk -v FS=',' -f file.awk Input_file

Never use a shebang to call awk as it robs you of the ability to separate shell arguments into awk arguments and awk variables and do anything else that's better done in shell (e.g. arg parsing with getopts) before calling awk. Just call awk from inside your shell script.
Also, don't name your shell script test.awk as it's a shell script. The fact it's implemented in awk is irrelevant. There's no reason to create a file that you sometimes call as awk file to have awk interpret and other times as just file to have the shell interpret.
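As a rough sketch of what "call awk from inside your shell script" can look like: the name sum_col and its -s/-c options are invented for this illustration, and it is written as a function so it can be pasted into a shell (the same body works as a standalone script).

```shell
# Sketch of a shell wrapper (hypothetical name: sum_col) that parses its own
# options with getopts and then hands them to awk as -F / -v.
sum_col() {
    sep=' ' col=1 OPTIND=1          # defaults: whitespace-separated, first column
    while getopts 's:c:' opt; do
        case $opt in
            s) sep=$OPTARG ;;       # field separator, passed on as -F
            c) col=$OPTARG ;;       # column number, passed on as an awk variable
            *) echo 'usage: sum_col [-s sep] [-c col] [file ...]' >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))           # what remains are the input files, if any
    awk -F "$sep" -v col="$col" '{ num += $col } END { print num + 0 }' "$@"
}

printf '1:10\n2:20\n3:30\n4:40\n' | sum_col -s : -c 2
# 100
```

Here the shell does the argument parsing and awk only ever sees a fixed program plus explicit -F and -v options, which is the separation the answer above argues for.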

Related

Why AWK program FS variable can be specified with -F flag of gawk (or other awk) interpreter/command?

Let me explain: AWK is a programming language and gawk is one of many interpreters for AWK. The gawk interpreter runs the AWK program that is given to it. So why can the FS (field separator) variable be specified with gawk's -F flag? I find it kind of unnatural... and how does it technically do that?
My best guess as to "why" is as a convenience. FS is probably the most used/manipulated awk variable, so having a short option to set it is helpful.
Consider
awk -F, '...' file.csv
# vs
awk 'BEGIN {FS=","} ...' file.csv
"How does it technically do that" -- see https://git.savannah.gnu.org/cgit/gawk.git/tree/main.c#n1586
Historically -F was implemented in gawk v1.01 so it would have existed in whatever legacy awk that gawk was based on.
Additionally, the POSIX specification mandates -F.
So why the FS (field separator) variable can be specified with gawk's -F flag?
awk man page claims that
Command line variable assignment is most useful for dynamically
assigning values to the variables AWK uses to control how input is
broken into fields and records. It is also useful for controlling
state if multiple passes are needed over a single data file.
So -F comes in handy when the field separator is not set in stone but computed dynamically, as -F lets you use a bash variable easily. Imagine you were tasked with developing part of a bash script that should output the last field of each line of file.txt, using the character stored in the variable sep as the separator; you could do that the following way:
awk -F "${sep}" '{print $NF}' file.txt
find it kind of unnatural
This depends on what you have used before; a cut user who wants to get the 3rd column from a csv file might do that the following way:
cut -d , -f 3 file.csv
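A small sketch of the equivalence between -F and -v FS= when the separator comes from a shell variable. The values of sep and line are made up for the example; the quotes around "$sep" matter whenever the separator is a space or a glob character.

```shell
# Sketch: -F sep and -v FS=sep set the same awk variable before the program runs.
sep=';'
line='a;b;last'

with_F=$(printf '%s\n' "$line" | awk -F "$sep" '{ print $NF }')
with_v=$(printf '%s\n' "$line" | awk -v FS="$sep" '{ print $NF }')

printf '%s %s\n' "$with_F" "$with_v"
# last last
```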

Why is field separator taken into account differently if set before or after the expression?

The code print split("foo:bar", a) prints how many pieces split() produced when cutting the string on the field separator. Since the default field separator is the space and there is none in "foo:bar", the result is 1:
$ awk 'BEGIN{print split("foo:bar",a)}'
1
However, if the field separator is ":" then the result is obviously 2 ("foo" and "bar"):
$ awk 'BEGIN{FS=":"; print split("foo:bar", a)}'
2
$ awk -F: 'BEGIN{print split("foo:bar", a)}'
2
However, this does not happen if FS is set after the awk program:
$ awk 'BEGIN{print split("foo:bar", a)}' FS=":"
1
If I print it not in the BEGIN block but when processing a file, the FS is already taken into account:
$ echo "bla" > file
$ awk '{print split("foo:bar",a)}' FS=":" file
2
So it looks like FS set before the expression is already taken into account in the BEGIN block, while it is not if defined after.
Why is this happening? I could not find details on this in GNU Awk User's Guide → 4.5.4 Setting FS from the Command Line. I am working on GNU Awk 5.
This feature is not inherent to GNU awk but is POSIX.
Calling convention:
The awk calling convention is the following:
awk [-F sepstring] [-v assignment]... program [argument...]
awk [-F sepstring] -f progfile [-f progfile]... [-v assignment]...
[argument...]
This shows that any option (flags -F,-v,-f) passed to awk should occur before the program definition and possible arguments. This shows that:
# this works
$ awk -F: '1' /dev/null
# this fails
$ awk '1' -F: /dev/null
awk: fatal: cannot open file `-F:' for reading (No such file or directory)
Field separators and assignments as options:
The Standard states:
-F sepstring: Define the input field separator. This option shall be equivalent to: -v FS=sepstring
-v assignment:
The application shall ensure that the assignment argument is in the same form as an assignment operand. The specified variable assignment shall occur prior to executing the awk program, including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option can be specified.
source: POSIX awk standard
So, if you define a variable assignment or declare a field separator using the options, BEGIN will know them:
$ awk -F: -v a=1 'BEGIN{print FS,a}'
: 1
What are arguments?:
The Standard states:
argument: Either of the following two types of argument can be intermixed:
file
A pathname of a file that contains the input to be read, which is matched against the set of patterns in the program. If no file operands are specified, or if a file operand is '-', the standard input shall be used.
assignment
An <snip: extremely long sentence to state varname=varvalue>, shall specify a variable assignment rather than a pathname. <snip: some extended details on the meaning of varname=varvalue> Each such variable assignment shall occur just prior to the processing of the following file, if any. Thus, an assignment before the first file argument shall be executed after the BEGIN actions (if any), while an assignment after the last file argument shall occur before the END actions (if any). If there are no file arguments, assignments shall be executed before processing the standard input.
source: POSIX awk standard
Which means that if you do:
$ awk program FS=val file
BEGIN will not know about the new definition of FS but any other part of the program will.
Example:
$ awk -v OFS="|" 'BEGIN{print "BEGIN",FS,a,""}END{print "END",FS,a,""}' FS=: a=1 /dev/null
BEGIN| ||
END|:|1|
$ awk -v OFS="|" 'BEGIN{print "BEGIN",FS,a,""}
{print "ACTION",FS,a,""}
END{print "END",FS,a,""}' FS=: a=1 <(echo 1) a=2
BEGIN| ||
ACTION|:|1|
END|:|2|
See also:
GNU awk manual: Section Other arguments for an understanding how GNU awk interprets the above.
Because you can set the variable individually for each file you process, and BEGIN happens before any of that.
bash$ awk '{ print NF }' <(echo "foo:bar") FS=: <(echo "foo:bar")
1
2
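The same timing can be reproduced without process substitution, for shells that lack it. A minimal sketch, assuming two throwaway files (first.txt and second.txt are made-up names) with identical content:

```shell
# Sketch: the same content is split differently depending on where FS=: appears.
# first.txt and second.txt are hypothetical sample files.
tmpdir=$(mktemp -d)
printf 'foo:bar\n' > "$tmpdir/first.txt"
printf 'foo:bar\n' > "$tmpdir/second.txt"

timing_out=$(awk '
    BEGIN { print "BEGIN sees [" FS "]" }   # runs before any operand is processed
          { print NF }                      # field count of each input line
    END   { print "END sees [" FS "]" }     # runs after every operand
' "$tmpdir/first.txt" FS=: "$tmpdir/second.txt")

printf '%s\n' "$timing_out"
# BEGIN sees [ ]
# 1
# 2
# END sees [:]
rm -rf "$tmpdir"
```

The assignment operand takes effect exactly when awk reaches it in the argument list, so the first file is still split on whitespace and only the second one on the colon.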

Proper way to use variables in awk in a script? [duplicate]

I found some ways to pass external shell variables to an awk script, but I'm confused about ' and ".
First, I tried with a shell script:
$ v=123test
$ echo $v
123test
$ echo "$v"
123test
Then tried awk:
$ awk 'BEGIN{print "'$v'"}'
123test
$ awk 'BEGIN{print '"$v"'}'
123
Why the difference?
Lastly I tried this:
$ awk 'BEGIN{print " '$v' "}'
 123test
$ awk 'BEGIN{print ' "$v" '}'
awk: cmd. line:1: BEGIN{print
awk: cmd. line:1: ^ unexpected newline or end of string
I'm confused about this.
Getting shell variables into awk may be done in several ways. Some are better than others. This should cover most of them.
Using -v (The best way, most portable)
Use the -v option: (P.S. use a space after -v or it will be less portable. E.g., awk -v var= not awk -vvar=)
variable="line one\nline two"
awk -v var="$variable" 'BEGIN {print var}'
line one
line two
This should be compatible with most awk implementations, and the variable is available in the BEGIN block as well.
If you have multiple variables:
awk -v a="$var1" -v b="$var2" 'BEGIN {print a,b}'
Warning: as Ed Morton writes, escape sequences will be interpreted, so \t becomes a real tab rather than the literal characters \t, which matters if the literal is what you are searching for. This can be avoided by using ENVIRON[] or by accessing the value via ARGV[].
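The warning about escape sequences can be seen in a short sketch. The variable names raw, s, via_v, via_env and via_argv are invented for the illustration.

```shell
# Sketch: -v interprets backslash escapes in the value, ENVIRON and ARGV do not.
raw='a\tb'                                    # backslash followed by t, no real tab

via_v=$(awk -v s="$raw" 'BEGIN { print s }')               # \t turned into a tab
via_env=$(s="$raw" awk 'BEGIN { print ENVIRON["s"] }')     # value kept verbatim
via_argv=$(awk 'BEGIN { print ARGV[1] }' "$raw")           # value kept verbatim

printf '%s\n%s\n%s\n' "$via_v" "$via_env" "$via_argv"
```

The first line printed contains a real tab between a and b; the other two print the four characters a\tb unchanged.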
PS: If the separator contains a vertical bar or other regexp metacharacters such as |?( etc., they must be double escaped. For example, three vertical bars ||| become -F'\\|\\|\\|'. You can also use -F"[|][|][|]".
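A quick sketch of the bracket form, which avoids the question of how many backslashes survive the shell and awk. The sample line is made up.

```shell
# Sketch: a multi-character separator made of regexp metacharacters,
# written as bracket expressions so no backslash escaping is needed.
line='a|||b|||c'

second=$(printf '%s\n' "$line" | awk -F '[|][|][|]' '{ print $2 }')
count=$(printf '%s\n' "$line" | awk -F '[|][|][|]' '{ print NF }')

printf '%s %s\n' "$second" "$count"
# b 3
```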
Example of getting data from a program/function into awk (here date is used):
awk -v time="$(date +"%F %H:%M" -d '-1 minute')" 'BEGIN {print time}'
Example of testing the contents of a shell variable as a regexp:
awk -v var="$variable" '$0 ~ var{print "found it"}'
Variable after code block
Here we get the variable after the awk code. This will work fine as long as you do not need the variable in the BEGIN block:
variable="line one\nline two"
echo "input data" | awk '{print var}' var="${variable}"
or
awk '{print var}' var="${variable}" file
Adding multiple variables:
awk '{print a,b,$0}' a="$var1" b="$var2" file
In this way we can also set different Field Separator FS for each file.
awk 'some code' FS=',' file1.txt FS=';' file2.ext
Variable after the code block will not work for the BEGIN block:
echo "input data" | awk 'BEGIN {print var}' var="${variable}"
Here-string
Variable can also be added to awk using a here-string from shells that support them (including Bash):
awk '{print $0}' <<< "$variable"
test
This is the same as:
printf '%s' "$variable" | awk '{print $0}'
P.S. this treats the variable as a file input.
ENVIRON input
As TrueY writes, you can use the ENVIRON to print Environment Variables.
Setting a variable before running AWK, you can print it out like this:
export X=MyVar
awk 'BEGIN{print ENVIRON["X"],ENVIRON["SHELL"]}'
MyVar /bin/bash
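ENVIRON only contains variables that are actually in awk's environment, so a plain shell assignment is not enough. A small sketch of the three cases, with X and Y as example variable names:

```shell
# Sketch: a shell variable reaches ENVIRON only if it is exported
# or set on the awk command line itself.
unset X                         # start clean for the demonstration
X=MyVar                         # plain shell variable, not exported
not_exported=$(awk 'BEGIN { print ENVIRON["X"] }')

export X                        # now part of the environment of child processes
exported=$(awk 'BEGIN { print ENVIRON["X"] }')

one_shot=$(Y=OnlyForAwk awk 'BEGIN { print ENVIRON["Y"] }')   # set for this command only

printf '[%s] [%s] [%s]\n' "$not_exported" "$exported" "$one_shot"
# [] [MyVar] [OnlyForAwk]
```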
ARGV input
As Steven Penny writes, you can use ARGV to get the data into awk:
v="my data"
awk 'BEGIN {print ARGV[1]}' "$v"
my data
To get the data into the code itself, not just the BEGIN:
v="my data"
echo "test" | awk 'BEGIN{var=ARGV[1];ARGV[1]=""} {print var, $0}' "$v"
my data test
Variable within the code: USE WITH CAUTION
You can use a variable within the awk code, but it's messy and hard to read, and as Charles Duffy points out, this version may also be a victim of code injection. If someone adds bad stuff to the variable, it will be executed as part of the awk code.
This works by extracting the variable within the code, so it becomes a part of it.
If you want to make an awk that changes dynamically with use of variables, you can do it this way, but DO NOT use it for normal variables.
variable="line one\nline two"
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
Here is an example of code injection:
variable='line one\nline two" ; for (i=1;i<=1000;++i) print i"'
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
1
2
3
.
.
1000
You can add lots of commands to awk this way, or even make it crash with invalid commands.
One valid use of this approach, though, is when you want to pass a symbol to awk to be applied to some input, e.g. a simple calculator:
$ calc() { awk -v x="$1" -v z="$3" 'BEGIN{ print x '"$2"' z }'; }
$ calc 2.7 '+' 3.4
6.1
$ calc 2.7 '*' 3.4
9.18
There is no way to do that using an awk variable populated with the value of a shell variable, you NEED the shell variable to expand to become part of the text of the awk script before awk interprets it. (see comment below by Ed M.)
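To see why the two quoting variants in the question behave differently, it helps to print the program text that the shell assembles before awk ever sees it. A minimal sketch, reusing the question's v=123test (prog_inner and prog_outer are names made up here):

```shell
# Sketch: show the awk program text produced by each quoting style.
v=123test

prog_inner=$(printf '%s' 'BEGIN{print "'$v'"}')   # value lands inside awk's double quotes
prog_outer=$(printf '%s' 'BEGIN{print '"$v"'}')   # value lands in the code as bare text

printf '%s\n%s\n' "$prog_inner" "$prog_outer"
# BEGIN{print "123test"}
# BEGIN{print 123test}

awk "$prog_outer"
# 123
```

In the second form awk reads 123test as the number 123 followed by an unset variable named test, and concatenating the two prints 123. With spaces around "$v", as in the question's last attempt, the shell passes three separate arguments, so the program awk receives ends right after print.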
Extra info:
Use of double quote
It's always good to double quote variable "$variable"
If not, multiple lines will be added as a long single line.
Example:
var="Line one
This is line two"
echo $var
Line one This is line two
echo "$var"
Line one
This is line two
Other errors you can get without double quote:
variable="line one\nline two"
awk -v var=$variable 'BEGIN {print var}'
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ syntax error
And with single quote, it does not expand the value of the variable:
awk -v var='$variable' 'BEGIN {print var}'
$variable
More info about AWK and variables
Read this faq.
It seems that the good-old ENVIRON awk built-in hash is not mentioned at all. An example of its usage:
$ X=Solaris awk 'BEGIN{print ENVIRON["X"], ENVIRON["TERM"]}'
Solaris rxvt
You could pass in the command-line option -v with a variable name (v) and a value (=) of the environment variable ("${v}"):
% awk -vv="${v}" 'BEGIN { print v }'
123test
Or to make it clearer (with far fewer vs):
% environment_variable=123test
% awk -vawk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
123test
You can utilize ARGV:
v=123test
awk 'BEGIN {print ARGV[1]}' "$v"
Note that if you are going to continue into the body, you will need to adjust
ARGC:
awk 'BEGIN {ARGC--} {print ARGV[2], $0}' file "$v"
I just adapted @Jotne's answer for use in a for loop.
for i in `seq 11 20`; do host myserver-$i | awk -v i="$i" '{print "myserver-"i" " $4}'; done
I had to insert date at the beginning of the lines of a log file and it's done like below:
DATE=$(date +"%Y-%m-%d")
awk '{ print "'"$DATE"'", $0; }' /path_to_log_file/log_file.log
The output can be redirected to another file to save it.
Pro Tip
It can come in handy to create a function that handles this so you don't have to type everything every time. Using the selected solution we get...
awk_switch_columns() {
awk -v a="$1" -v b="$2" '{ t = $a; $a = $b; $b = t; print }'
}
And use it as...
echo 'a b c d' | awk_switch_columns 2 4
Output:
a d c b

Combine grep -f and awk

I am using two commands:
awk '{ print $2 }' SomeFile.txt > Pattern.txt
grep -f Pattern.txt File.txt
With the first command I create a list of desirable patterns. With the second command I extract all lines in File.txt that match the lines in the Pattern.txt
My question is, is there a way to combine awk and grep in a pipeline so that I don't have to generate the intermediate Pattern.txt file?
Thanks!
You can do this all in one invocation of awk:
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i)print}' SomeFile.txt File.txt
Populate keys in the array a from the second column of the first file. NR==FNR identifies the first file (total record number is equal to this file's record number). next skips the second block for the first file.
In the second block, loop through all the keys in the array and if the line matches any of them, print it. To avoid printing the line more than once if it matches more than one pattern, you could add a next here too, i.e. {for(i in a)if($0~i){print;next}}.
If the "patterns" are actually fixed strings that must match the whole line, it is even simpler:
awk 'NR==FNR{a[$2];next}$0 in a' SomeFile.txt File.txt
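If the fixed strings may appear anywhere in the line (closer to what grep -F -f does), one possible variant uses index() instead of a regexp match. This is a sketch with made-up sample data; the array name pat is arbitrary.

```shell
# Sketch: literal substring matching, so a dot in a pattern is just a dot.
tmpdir=$(mktemp -d)
printf 'x a.c\ny foo\n' > "$tmpdir/SomeFile.txt"                        # patterns in column 2
printf 'has a.c here\nhas abc here\nfoo\nnothing\n' > "$tmpdir/File.txt"

matched=$(awk '
    NR == FNR { pat[$2]; next }                              # first file: collect patterns
    { for (p in pat) if (index($0, p)) { print; next } }     # second file: literal search
' "$tmpdir/SomeFile.txt" "$tmpdir/File.txt")

printf '%s\n' "$matched"
# has a.c here
# foo
rm -rf "$tmpdir"
```

The line "has abc here" is not printed because a.c is compared literally, whereas the regexp version ($0~i) would match it.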
If your shell supports it, you can use process substitution:
grep -f <(awk '{ print $2 }' SomeFile.txt) File.txt
bash and zsh support that; others probably do too, but I haven't tested them.
Simpler than the above, and supported by all shells, is to use a pipe:
awk '{ print $2 }' SomeFile.txt | grep -f - File.txt
- is used as the argument to -f. - has a special meaning here and stands for stdin. Thanks to Tom Fenech for mentioning that!

reading a variable in awk from the command line after entering the command

I am trying to search a file using awk. How can I make awk read a value from the command line and use it as the name to search for in the file?
This is the usual way I run the search; the script asks the user to enter a name to search for in file.txt:
awk -f myAwk.awk file.txt
How can I manage it like this :
awk -f myAwk.awk file.txt nameToSearch
How can I use ARGC and ARGV to search the nameToSearch in the file.txt?
What you're probably looking for is
awk [-W option] [-F value] [-v var=value] [--] 'program text' [file ...]
so
awk -v MYVAR=nameToSearch -v OTHERVAR=somethingElse -f myAwk.awk file.txt
Is that it? Of course, the order of the switches (-f, -v) does not matter. Obviously you then need to use MYVAR (OTHERVAR) as a variable identifier inside the awk program itself.
To pass a variable to awk, you can use the -v command.
For example:
awk -v p="stringToSearch" '$0 ~ p' file.txt
In this command, you replace stringToSearch with a pattern (please keep the double quotes; they are useful for preserving spaces). The awk expression $0 ~ p compares the current line to the given pattern.
Another approach is to build the awk command from the shell:
p="stringToSearch"
awk "/$p/" file.txt
You must use double quotes in the command to force expanding $p.
If it's permitted to change the order of arguments, so that we can do this:
awk -f myAwk.awk nameToSearch file.txt
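then you can take the name from ARGV in a BEGIN block and blank that entry, so that awk does not try to open it as an input file:
awk 'BEGIN { nameToSearch = ARGV[1]; ARGV[1] = "" } { ... rest of myAwk.awk here ...}' nameToSearch file.txt
You can of course add the BEGIN {...} block to the beginning of your myAwk.awk file, then continue using: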
awk -f myAwk.awk nameToSearch file.txt
The technique Piotr Wadas describes has the same effect:
awk -v nameToSearch=whatever -f myAwk.awk file.txt
and that's what I'd use myself, rather than passing whatever as an additional argument to the script. Passing whatever as an additional argument is what scripters had to do before the -v facilities were added to awk. If writing -v nameToSearch= is too verbose, then I'd wrap the whole thing up in a shell script, and say:
myShellScript whatever file.txt
But you asked how to do it by passing whatever as an additional argument to the awk script, so that's what I demonstrated.
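For the argument order used in the question (file first, name last), the same idea can be sketched with ARGC and ARGV. The sample data and the use of index() for a literal match are assumptions made for this illustration, not part of the original script.

```shell
# Sketch: take the last command-line argument as the search name.
tmpdir=$(mktemp -d)
printf 'alice 1\nbob 2\nalice 3\n' > "$tmpdir/file.txt"      # hypothetical input

cat > "$tmpdir/myAwk.awk" <<'EOF'
BEGIN {
    nameToSearch = ARGV[ARGC - 1]   # last command-line argument
    ARGV[ARGC - 1] = ""             # empty entries are skipped, so it is not opened as a file
}
index($0, nameToSearch)             # print lines that contain the name literally
EOF

found=$(awk -f "$tmpdir/myAwk.awk" "$tmpdir/file.txt" alice)
printf '%s\n' "$found"
# alice 1
# alice 3
rm -rf "$tmpdir"
```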