reading a variable in awk from the command line after entering the command - awk

I'm trying to search a file using awk. How can I ask awk to read a variable from the command line, as a name to be searched for in the file?
This is the usual way I search the file; I want to be able to ask the user for a name to search for in file.txt:
awk -f myAwk.awk file.txt
How can I manage it like this:
awk -f myAwk.awk file.txt nameToSearch
How can I use ARGC and ARGV to search the nameToSearch in the file.txt?

What you're probably looking for is
awk [-W option] [-F value] [-v var=value] [--] 'program text' [file ...]
so
awk -v MYVAR=nameToSearch -v OTHERVAR=somethingElse -f myAwk.awk file.txt
Is that it? Of course the order of the switches (-f, -v) does not matter. Obviously you then need to use MYVAR (OTHERVAR) as a variable identifier inside the awk program itself.
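For example, a minimal myAwk.awk using such a variable might look like this (a sketch; MYVAR is just the identifier chosen on the command line):
# myAwk.awk - print lines matching the pattern passed in via -v MYVAR=...
$0 ~ MYVAR { print }
which you would then run as awk -v MYVAR=nameToSearch -f myAwk.awk file.txt.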

To pass a variable to awk, you can use the -v option.
For example:
cat file.txt | awk -v p="stringToSearch" '$0 ~ p'
In this command, you replace stringToSearch with a pattern (please keep the double quotes; they are useful for preserving spaces). The awk expression $0 ~ p matches the current line against the given pattern.
Another approach is to build the awk command from the shell:
p="stringToSearch"
awk "/$p/" file.txt
You must use double quotes in the command to force expanding $p.

If it's permitted to change the order of arguments, so that we can do this:
awk -f myAwk.awk nameToSearch file.txt
then you can do:
awk 'NR==1 { nameToSearch = $0; next} { ... rest of myAwk.awk here ...}' nameToSearch file.txt
You can of course add the NR==1 {...} block to the beginning of your myAwk.awk file, then continue using:
awk -f myAwk.awk nameToSearch file.txt
The technique Piotr Wadas describes has the same effect:
awk -v nameToSearch=whatever -f myAwk.awk file.txt
and that's what I'd use myself, rather than passing whatever as an additional argument to the script. Passing whatever as an additional argument is what scripters had to do before the -v facilities were added to awk. If writing -v nameToSearch= is too verbose, then I'd wrap the whole thing up in a shell script, and say:
myShellScript whatever file.txt
But you asked how to do it by passing whatever as an additional argument to the awk script, so that's what I demonstrated.
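Since the question explicitly asks about ARGC and ARGV: a sketch of that approach, keeping the question's original argument order, is to grab the search string from ARGV in the BEGIN block and blank out that slot so awk does not try to open it as a file:
# myAwk.awk - sketch: the last operand is the search string, not a file
BEGIN {
    nameToSearch = ARGV[ARGC-1]   # the trailing nameToSearch argument
    ARGV[ARGC-1] = ""             # blank it so awk won't read it as a file
}
$0 ~ nameToSearch { print }
invoked as awk -f myAwk.awk file.txt nameToSearch. Blanked-out ARGV entries are simply skipped when awk opens its input files.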

Use awk to interpret }{ as RS and output with ORS }\n{

I have data that looks like this:
{"anonymousId":"abc123",{"hello":"world"}}{"anonymousId":"abc456",{"hi": "again"}}
It's as if you took a newline-delimited json file and removed all the newlines.
I'm trying to use awk to convert it to ndjson.
That is, my expected output is this:
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
I don't want to load the entire file into memory (which is why I'm not using sed), so my thought is I should use }{ as row separator. Then, I figure if I use }\n{ as ORS I should get my desired output.
So I tried this:
cat my-file.txt | awk -v RS="}{" -v ORS="}\n{" '{$1=$1}1'
But it doesn't work!
Here's the output I get:
{"anonymousId":"abc123",{"hello":"world"}
{}
{{"anonymousId":"abc456",{"hi": "again"}
{}
{}
{
Apart from the constraint of not loading the entire file into memory, I don't care what bash command is used, but my thinking is awk will be the way. E.g. if tr supported multi-character expressions, that would be fine with me.
Please help me understand why this isn't working as expected and what I need to change.
Thanks!
Update
Following the answers given, I'll add some things I learned.
The TL;DR is: don't use macOS's stock command-line tools if you need to do trickier things like this.
For one, this doesn't work on a Mac: echo -e "a\nb\nc\nd\ne\n" | head -n -2; it complains about an illegal line count, but it's valid on a Linux system.
The other problem was the way awk was working on my (mac) system.
My awk command was close to correct.
On linux it produces this output:
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}}
{
So I just have to find a way to trim the trailing }\n{ (and as pointed out in the answer, the {$1=$1} is not necessary).
But all of those extraneous newlines were due to the screwy implementation of awk on my system (it wasn't gawk, and I'm not sure what it was).
Doing $1=$1 inside awk -v RS='}{' -v ORS='}\n{' '{$1=$1}1' file isn't useful - it tells awk to recompile the current record, replacing all chains of white space with blanks, but the only white space in your example is the \n at the end of the file, and there's no point converting that to a blank. So your script can be reduced to:
awk -v RS='}{' -v ORS='}\n{' '1' file
but RS='}{' means different things to different awk variants.
Use of a multi-char RS with GNU awk (and probably a couple of others now) means that the RS is treated as a regexp to separate the records:
$ awk -v RS='}{' -v ORS='}\n{' '1' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
}
{$
Note the extra }\n{ added at the end because there is no }{ at the end of your input and so the end of input itself indicates the end of a record and so gets replaced with the ORS value.
Use of a multi-char RS with a POSIX awk means that the 2nd and subsequent chars in the RS get ignored and the first char is taken as the RS, hence the output you reported seeing in your question:
$ awk --posix -v RS='}{' -v ORS='}\n{' '1' file
{"anonymousId":"abc123",{"hello":"world"}
{}
{{"anonymousId":"abc456",{"hi": "again"}
{}
{
}
{$
where every } alone gets treated as matching RS and so gets replaced by ORS.
So you are not using an awk that supports multi-char RS. Your choices are to install one (preferably gawk) and do:
$ awk -v RS='}[{\n]' '{ORS=gensub(/}{/,"}\n{",1,RT)} 1' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
otherwise do something like this with any awk:
$ awk --posix -v RS='{' -v ORS= '{print pfx $0; pfx=(/}$/ ? "\n" : "") RS}' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
In the gawk solution above we define the RS as '}[{\n]' to say that records mid-line are terminated by }{, while the record at the end of the line is terminated by }\n. So RT holds }{ for every record except the last one on the line, where it holds }\n if your line ends with \n, or NULL otherwise. We then just set ORS to RT, with }{ converted to }\n{ for those records where RT has that value; otherwise ORS gets set to }\n (when RT has that value), or to NULL if your input didn't have a terminating \n.
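To see what drives this, here is a quick diagnostic (gawk only, since RT is a gawk extension; same sample file as above) that prints what RT holds for each record:
$ awk -v RS='}[{\n]' '{print NR": RT=" (RT=="}{" ? "}{" : (RT=="}\n" ? "}<newline>" : "<end of input>"))}' file
1: RT=}{
2: RT=}<newline>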
An alternative gawk solution that I think I might actually prefer would be:
$ awk -v RS='}{' -v ORS='}\n{' 'NR>1{print prev} {prev=$0} END{printf "%s",prev}' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
EDIT: original answer for posterity before I noticed the OP said they don't want to read the whole file into memory:
Simple substitutions on individual strings like this are what sed is best at:
$ sed 's/}{/}\n{/g' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
otherwise with any awk:
$ awk '{gsub(/}{/,"}\n{")} 1' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
Using the record separator will create an extra delimiter at the end of the file; since it's static, we can just remove it afterwards:
$ echo '{"anonymousId":"abc123",{"hello":"world"}}{"anonymousId":"abc456",{"hi": "again"}}' |
awk -v RS='}{' -v ORS='}\n{' 1 | head -n -2
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
If you don't have gawk for multi-char RS support, you can use this workaround:
$ echo ... |
awk -v RS='}' 'NF{printf "%s", $0 RS} !NF{print RS}' | head -n -1
there will be an extra RS line at the end (echo's trailing newline becomes a final empty record), which is trimmed afterwards.

awk set command line options in script

I'm curious about how to set command-line options in an awk script, like -F for the field separator. I tried writing the shebang line as
#!/usr/bin/awk -F ":" -f
and get the following error:
awk: 1: unexpected character '.'
For this example, I can make do with
BEGIN {FS=":"}
but I still want to know a way to set all those options. Thanks in advance.
EDIT:
let's use another example that should be easy to test.
inputfile:
1
2
3
4
test.awk:
#!/usr/bin/awk -d -f
{num += $1}
END { print num}
run
/usr/bin/awk -d -f test.awk inputfile
will get 10 and generate a file called awkvars.out with some awk global variables in it.
but
./test.awk inputfile
will get
awk: cmd. line:1: ./test.awk
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1: ./test.awk
awk: cmd. line:1: ^ unterminated regexp
If I remove '-d' from the shebang line,
./test.awk inputfile
will normally output 10.
My question is whether there is a way to write "-d" in the test.awk file to generate the awkvars.out file.
Answering the OP's question, beyond the setting of FS.
Short Answer: you cannot pass multiple options with '#!', and since the single option slot is needed by -f (so that awk reads the program from the script file), you are out of luck.
Long Answer:
When using a shebang (#!), there is a limit of a single argument, which is passed to the named program as its first argument, followed by the path of the script file itself. So in general, a script:
#! /path/to/prog arg1
input-1
input-2
will execute /path/to/prog arg1 /path/to/script, and the program is expected to read the script file's content (including the leading shebang line) itself. This is an oversimplification; the actual rules are more complex, see https://unix.stackexchange.com/questions/87560/does-the-shebang-determine-the-shell-which-runs-the-script
Given this limitation of one argument, when executing awk the only valid and required option is '-f', which makes awk take its program from the script file whose path is appended after it. You can prepend a few other options that do NOT take an argument, for example gawk's -P ('posix'): '-Pf' will force POSIX behavior.
As far as I can tell, all the 'interesting' options (setting FS, RS, ORS, ...) take arguments that would need to be separated from the '-f' by a space, making it impossible to embed them in the shebang line, other than using 'BEGIN { ... }' or similar in the script.
Bottom line: trying #! /usr/bin/awk -f- -F, passes the whole string '-f- -F,' as one argument, so awk will look for a program file literally named '- -F,'. Usually not very useful, and it will not set the FS.
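One workaround worth knowing, assuming your env supports the -S flag (GNU coreutils 8.30+ and the BSDs do; plain POSIX env does not), is to let env split the shebang line into separate arguments:
#!/usr/bin/env -S awk -F: -f
{ print $1 }
Here env receives the single string '-S awk -F: -f', splits it itself, and ends up executing awk -F: -f /path/to/script.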
Let's say the following is our Input_file, which we are going to use for all the solutions mentioned here.
cat Input_file
a,b,c,d
ab,c
1st way of setting the field separator: the first, simple way is to set the FS value in the BEGIN section of the awk program file. The following is our .awk file.
cat file1.awk
BEGIN{
FS=","
}
{
print $1"..."$2
}
Now when we run the code, the following output is produced:
/usr/local/bin/awk -f file1.awk Input_file
a...b
ab...c
2nd way of setting the field separator: pass the FS value before the Input_file name, as follows.
/usr/local/bin/awk -f file.awk FS="," Input_file
Example: the following is the file.awk file which has the awk code.
cat file.awk
{
print $1".."$2
}
Now when we run the awk file with an awk -f .. command as follows, this will be the result:
/usr/local/bin/awk -f file.awk FS="," Input_file
a..b
ab..c
Which means it is picking up , as the field separator in the above program.
3rd way of setting the field separator: we can set the field separator in awk -f programs just as we do for usual awk programs, using the -F',' option, as follows.
/usr/local/bin/awk -F',' -f file.awk Input_file
a..b
ab..c
4th way of setting the field separator: we can pass the field separator as a variable, using the -v option on the command line while running the file.awk script, as follows.
/usr/local/bin/awk -v FS=',' -f file.awk Input_file
Never use a shebang to call awk as it robs you of the ability to separate shell arguments into awk arguments and awk variables and do anything else that's better done in shell (e.g. arg parsing with getopts) before calling awk. Just call awk from inside your shell script.
Also, don't name your shell script test.awk as it's a shell script. The fact it's implemented in awk is irrelevant. There's no reason to create a file that you sometimes call as awk file to have awk interpret and other times as just file to have the shell interpret.
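A minimal sketch of that wrapper idea, reusing the summing logic from the question (sum.sh is a hypothetical name, and -d is forwarded on the assumption that your awk is gawk, as in the question):
#!/bin/bash
# sum.sh - parse options in the shell, then call awk
awk_opts=()
while getopts 'd' opt; do
    case $opt in
        d) awk_opts+=(-d) ;;                          # forward gawk's -d (dump variables)
        *) echo "usage: $0 [-d] file..." >&2; exit 1 ;;
    esac
done
shift $((OPTIND - 1))
awk "${awk_opts[@]}" '{ num += $1 } END { print num }' "$@"
Then ./sum.sh inputfile prints 10, and ./sum.sh -d inputfile additionally writes awkvars.out.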

Proper way to use variables in awk in a script? [duplicate]

I found some ways to pass external shell variables to an awk script, but I'm confused about ' and ".
First, I tried with a shell script:
$ v=123test
$ echo $v
123test
$ echo "$v"
123test
Then tried awk:
$ awk 'BEGIN{print "'$v'"}'
123test
$ awk 'BEGIN{print '"$v"'}'
123
Why the difference?
Lastly I tried this:
$ awk 'BEGIN{print " '$v' "}'
123test
$ awk 'BEGIN{print ' "$v" '}'
awk: cmd. line:1: BEGIN{print
awk: cmd. line:1: ^ unexpected newline or end of string
I'm confused about this.
Getting shell variables into awk
may be done in several ways. Some are better than others. This should cover most of them.
Using -v (The best way, most portable)
Use the -v option: (P.S. use a space after -v or it will be less portable. E.g., awk -v var= not awk -vvar=)
variable="line one\nline two"
awk -v var="$variable" 'BEGIN {print var}'
line one
line two
This should be compatible with most awk, and the variable is available in the BEGIN block as well:
If you have multiple variables:
awk -v a="$var1" -v b="$var2" 'BEGIN {print a,b}'
Warning. As Ed Morton writes, escape sequences will be interpreted, so \t becomes a real tab and not a literal \t if that is what you are searching for. This can be solved by using ENVIRON[] or by accessing the value via ARGV[].
PS. If you have a vertical bar or other regexp metacharacters as the separator, like |?( etc., they must be double escaped. Example: 3 vertical bars ||| becomes -F'\\|\\|\\|'. You can also use -F"[|][|][|]".
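A quick demonstration of that warning (the variable name var here is arbitrary):
variable='a\tb'
awk -v var="$variable" 'BEGIN {print var}'           # -v interprets \t: prints a<TAB>b
var="$variable" awk 'BEGIN {print ENVIRON["var"]}'   # ENVIRON keeps it literal: a\tb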
Example of getting data from a program/function into awk (here date is used):
awk -v time="$(date +"%F %H:%M" -d '-1 minute')" 'BEGIN {print time}'
Example of testing the contents of a shell variable as a regexp:
awk -v var="$variable" '$0 ~ var{print "found it"}'
Variable after code block
Here we get the variable after the awk code. This will work fine as long as you do not need the variable in the BEGIN block:
variable="line one\nline two"
echo "input data" | awk '{print var}' var="${variable}"
or
awk '{print var}' var="${variable}" file
Adding multiple variables:
awk '{print a,b,$0}' a="$var1" b="$var2" file
In this way we can also set different Field Separator FS for each file.
awk 'some code' FS=',' file1.txt FS=';' file2.ext
Variable after the code block will not work for the BEGIN block:
echo "input data" | awk 'BEGIN {print var}' var="${variable}"
Here-string
Variable can also be added to awk using a here-string from shells that support them (including Bash):
variable="test"
awk '{print $0}' <<< "$variable"
test
This is the same as:
printf '%s' "$variable" | awk '{print $0}'
P.S. this treats the variable as a file input.
ENVIRON input
As TrueY writes, you can use the ENVIRON to print Environment Variables.
By exporting a variable before running AWK, you can print it out like this:
export X=MyVar
awk 'BEGIN{print ENVIRON["X"],ENVIRON["SHELL"]}'
MyVar /bin/bash
ARGV input
As Steven Penny writes, you can use ARGV to get the data into awk:
v="my data"
awk 'BEGIN {print ARGV[1]}' "$v"
my data
To get the data into the code itself, not just the BEGIN:
v="my data"
echo "test" | awk 'BEGIN{var=ARGV[1];ARGV[1]=""} {print var, $0}' "$v"
my data test
Variable within the code: USE WITH CAUTION
You can use a variable within the awk code, but it's messy and hard to read, and as Charles Duffy points out, this version may also be a victim of code injection. If someone adds bad stuff to the variable, it will be executed as part of the awk code.
This works by expanding the shell variable within the awk code, so it becomes part of the program text.
If you want an awk program that changes dynamically with the use of variables, you can do it this way, but DO NOT use it for normal variables.
variable="line one\nline two"
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
Here is an example of code injection:
variable='line one\nline two" ; for (i=1;i<=1000;++i) print i"'
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
1
2
3
.
.
1000
You can add lots of commands to awk this way, and even make it crash with invalid commands.
One valid use of this approach, though, is when you want to pass a symbol to awk to be applied to some input, e.g. a simple calculator:
$ calc() { awk -v x="$1" -v z="$3" 'BEGIN{ print x '"$2"' z }'; }
$ calc 2.7 '+' 3.4
6.1
$ calc 2.7 '*' 3.4
9.18
There is no way to do that using an awk variable populated with the value of a shell variable; you NEED the shell variable to expand to become part of the text of the awk script before awk interprets it. (See the comment below by Ed M.)
Extra info:
Use of double quotes
It's always good to double-quote a variable: "$variable"
If not, multiple lines will be joined into one long single line.
Example:
var="Line one
This is line two"
echo $var
Line one This is line two
echo "$var"
Line one
This is line two
Other errors you can get without double quotes:
variable="line one\nline two"
awk -v var=$variable 'BEGIN {print var}'
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ syntax error
And with single quotes, the value of the variable is not expanded:
awk -v var='$variable' 'BEGIN {print var}'
$variable
More info about AWK and variables
Read this faq.
It seems that the good-old ENVIRON awk built-in hash is not mentioned at all. An example of its usage:
$ X=Solaris awk 'BEGIN{print ENVIRON["X"], ENVIRON["TERM"]}'
Solaris rxvt
You could pass the command-line option -v with a variable name (v) and the value of the environment variable ("${v}"):
% awk -vv="${v}" 'BEGIN { print v }'
123test
Or to make it clearer (with far fewer vs):
% environment_variable=123test
% awk -vawk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
123test
You can utilize ARGV:
v=123test
awk 'BEGIN {print ARGV[1]}' "$v"
Note that if you are going to continue into the body, you will need to adjust ARGC:
awk 'BEGIN {ARGC--} {print ARGV[2], $0}' file "$v"
I just adapted @Jotne's answer for a for loop.
for i in `seq 11 20`; do host myserver-$i | awk -v i="$i" '{print "myserver-"i" " $4}'; done
I had to insert the date at the beginning of the lines of a log file, and it's done like below:
DATE=$(date +"%Y-%m-%d")
awk '{ print "'"$DATE"'", $0; }' /path_to_log_file/log_file.log
The output can be redirected to another file to save it.
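Given the injection caveats discussed above, the same effect can also be had with -v instead of expanding the shell variable inside the code (an alternative sketch, not what was originally run):
DATE=$(date +"%Y-%m-%d")
awk -v d="$DATE" '{ print d, $0 }' /path_to_log_file/log_file.log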
Pro Tip
It could come in handy to create a function that handles this so you don't have to type everything every time. Using the selected solution we get...
awk_switch_columns() {
cat < /dev/stdin | awk -v a="$1" -v b="$2" " { t = \$a; \$a = \$b; \$b = t; print; } "
}
And use it as...
echo 'a b c d' | awk_switch_columns 2 4
Output:
a d c b

Combine grep -f and awk

I am using two commands:
awk '{ print $2 }' SomeFile.txt > Pattern.txt
grep -f Pattern.txt File.txt
With the first command I create a list of desirable patterns. With the second command I extract all lines in File.txt that match the lines in Pattern.txt.
My question is, is there a way to combine awk and grep in a pipeline so that I don't have to generate the intermediate Pattern.txt file?
Thanks!
You can do this all in one invocation of awk:
awk 'NR==FNR{a[$2];next}{for(i in a)if($0~i)print}' Somefile.txt File.txt
Populate keys in the array a from the second column of the first file. NR==FNR identifies the first file (total record number is equal to this file's record number). next skips the second block for the first file.
In the second block, loop through all the keys in the array and if the line matches any of them, print it. To avoid printing the line more than once if it matches more than one pattern, you could add a next here too, i.e. {for(i in a)if($0~i){print;next}}.
If the "patterns" are actually fixed strings, it is even simpler:
awk 'NR==FNR{a[$2];next}$0 in a' Somefile.txt File.txt
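A quick demonstration of the difference between the two versions (the sample files are made up for illustration):
printf 'k1 foo\nk2 bar\n' > SomeFile.txt
printf 'foo\nfoobar\nbaz\n' > File.txt
awk 'NR==FNR{a[$2];next}$0 in a' SomeFile.txt File.txt
This prints only foo: the $0 in a version matches whole lines exactly, so foobar is not printed, whereas the regexp version ($0~i) would print it too.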
If your shell supports it, you can use process substitution:
grep -f <(awk '{ print $2 }' SomeFile.txt) File.txt
bash and zsh support that; others probably will too, but I didn't test them.
Simpler than the above, and supported by all shells, would be to use a pipe:
awk '{ print $2 }' SomeFile.txt | grep -f - File.txt
- is used as the argument to -f. - has a special meaning here and stands for stdin. Thanks to Tom Fenech for mentioning that!
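If the patterns are fixed strings rather than regexps, it is safer to add grep's standard -F flag (and -x for whole-line matches), mirroring the $0 in a variant shown in the awk answer above:
awk '{ print $2 }' SomeFile.txt | grep -Ff - File.txt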

awk won't print new line characters

I am using the below code to change an existing awk script so that I can add more and more cases with a simple command.
echo `awk '{if(/#append1/){print "pref'"$1"'=0\n" $0 "\n"} else{print $0 "\n"}}' tf.a`
Note that the first print is "pref'"$1"'=0\n", so it refers to the shell script's $1 (its first argument), not awk's $1.
The command ./tfb.a "c" should change the code from:
BEGIN{
#append1
}
...
to:
BEGIN{
prefc=0
#append1
}
...
However, it gives me everything on one line.
Does anyone know why this is?
If you take awk right out of the equation you can see what's going on:
# Use a small test file instead of an awk script
$ cat xxx
hello
there
$ echo `cat xxx`
hello there
$ echo "`cat xxx`"
hello
there
$ echo "$(cat xxx)"
hello
there
$
The backtick operator expands the output into shell "words" too soon. You could play around with the $IFS variable in the shell (yikes), or you could just use double-quotes.
If you're running a modern sh (e.g. ksh or bash, not the "classic" Bourne sh), you may also want to use the $() syntax (it's easier to find the matching start/end delimiter).
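Applied to the command from the question, the fix is just to double-quote the substitution; with the $() form that would look something like this (tf.a and the awk program are unchanged from the question):
echo "$(awk '{if(/#append1/){print "pref'"$1"'=0\n" $0 "\n"} else{print $0 "\n"}}' tf.a)"
(Note that the explicit "\n" in each print still adds an extra blank line after each output line; drop it if that's not wanted.)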
Do it like this: pass the variable from the shell to awk properly using -v.
#!/bin/bash
toinsert="$1"
awk -v toinsert="$toinsert" '
/#append1/{
$0="pref"toinsert"=0\n"$0
}
{print}
' file > temp
mv temp file
output
$ cat file
BEGIN{
#append1
}
$ ./shell.sh c
BEGIN{
prefc=0
#append1
}