How to search using variables in awk

I have this file, test1:
NAME1 04-03-2014
NAME2 04-04-2014
I get the output I need when I run the command from the command line, but not when I use variables. My awk version is 3.1.5.
Works from the command line: awk '$2 = /04-03-2014/' test1
Doesn't work in a bash/ksh script:
export m=04 d=03 y=2014
awk -v m="$m" -v d="$d" -v y="$y" '$2 = /m-d-y/' test1 OR
awk -v m="$m" -v d="$d" -v y="$y" '$2 = m-d-y' test1 ### This replaces the 2nd field with -2013 (4-3-2014, i.e. subtraction)
I've used variables in awk before in printf statements to calculate numbers, but the above example somehow doesn't work for me. As a workaround, I tried echoing the awk command into a file and then executing the file as a script, which gives me the desired output. However, I don't think this is the correct way of doing things and would appreciate it if someone could show me a smarter way.

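It's a guess, but I bet you're trying to print every line where $2 is equal to a specific string constructed from your variables. If so, use == (comparison; = is assignment) and build the string by concatenation, since awk does not expand variables inside a /.../ regexp literal (/m-d-y/ matches the literal text "m-d-y"):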
awk -v m="$m" -v d="$d" -v y="$y" '$2 == m"-"d"-"y' file
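A minimal sketch, assuming a POSIX shell and any awk, that recreates test1 from the question and shows both the exact comparison and, in case a regexp really is wanted, a dynamic regexp built from the same variables (the ^ and $ anchors are an addition of this sketch, so that only the whole field matches):

```shell
# Recreate the sample file from the question.
printf 'NAME1 04-03-2014\nNAME2 04-04-2014\n' > test1

m=04 d=03 y=2014

# Exact comparison: m"-"d"-"y concatenates to the string "04-03-2014".
awk -v m="$m" -v d="$d" -v y="$y" '$2 == m"-"d"-"y' test1             # prints: NAME1 04-03-2014

# Regexp match: a string on the right of ~ is used as a dynamic regexp.
awk -v m="$m" -v d="$d" -v y="$y" '$2 ~ ("^" m "-" d "-" y "$")' test1  # prints: NAME1 04-03-2014
```

Both commands print the NAME1 line unchanged, because == and ~ only test the field instead of assigning to it.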


capture last line of file as integer variable and use in awk command

I am trying to capture the last line of a file as a variable for use in an awk command.
Here is an example of the end of the file:
cat file.txt
....
phylum:Chlorophyta 1
phylum:Mucoromycota 1
column 6:
superkingdom:Eukaryota 99
column 7:
99
I want to use that 99 as an integer in an awk command by saving it as a variable:
tail -n1 file.txt
99
e.g.
div=$(tail -n1 file.txt)
echo $div
99
I want to use it with a second file (conf.txt) to divide the numbers in the 2nd field:
cat conf.txt
Class 88
Family 78
Genus 44
Species 23
BUT when I try to use the $div variable in the awk command (using the -v flag, as suggested here and elsewhere for passing a variable to awk), I get this error:
awk -v a=$div '{print $2/a}' conf.txt
awk: can't open file {print $2/a}
source line number 1
But when I save 99 as a variable directly on the command line, it works just fine:
num=99
awk -v a=$num '{print $2/a}' conf.txt
0.888889
0.787879
0.444444
0.232323
Are there extra spaces/characters in the capture from tail -1? I must be missing something simple but fundamental.
Ultimately, I don't even want to save it as a separate variable first if I don't have to; instead, I'd like to capture that last line (99) and put it directly into an awk command, e.g.:
awk '{print $2/[tail -1 file.txt]}' conf.txt
This is pseudocode (in the brackets), but this is ultimately what I'd want.
Thanks for any help!
There's a space at the beginning of the last line, so after the shell splits the unquoted $div, the command becomes:
awk -v a= 99 '{print $2/a}' conf.txt
This is setting a to an empty string, treating 99 as the awk script, and the rest as filenames.
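To see the split for yourself, here is a small POSIX sh sketch that hard-codes the leading space into div (mimicking what tail captured) and prints each argument the shell would hand to awk inside brackets, one per line:

```shell
div=' 99'   # what $(tail -n1 file.txt) captured, leading space included

# Unquoted: " 99" is split off from "a=", giving 5 arguments:
# [-v] [a=] [99] [{print $2/a}] [conf.txt]
printf '[%s]\n' -v a=$div '{print $2/a}' conf.txt

# Quoted: "a= 99" stays one argument, giving 4 arguments:
# [-v] [a= 99] [{print $2/a}] [conf.txt]
printf '[%s]\n' -v a="$div" '{print $2/a}' conf.txt
```

The unquoted form yields [a=] and [99] as separate arguments, which is why awk ends up treating 99 as its program.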
Remove the spaces from $div.
div=${div// /}
Use quotes as a habit in the shell.
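A minimal end-to-end sketch, assuming a POSIX sh and any awk; the two sample files are hypothetical stand-ins for the question's, with the leading space reproduced. Since ${div// /} is a bash-only expansion, this version strips the blanks with tr -d ' ' instead, and it also shows quoting the command substitution directly, which is what the question ultimately wanted:

```shell
# Hypothetical stand-ins for the question's files (note the leading space on " 99").
printf 'column 7:\n 99\n' > file.txt
printf 'Class 88\nFamily 78\nGenus 44\nSpecies 23\n' > conf.txt

# Option 1: strip the blanks (POSIX alternative to bash's ${div// /}), and quote anyway.
div=$(tail -n 1 file.txt | tr -d ' ')
awk -v a="$div" '{print $2/a}' conf.txt

# Option 2: no shell variable at all; the quotes keep " 99" as one argument,
# and awk ignores the leading blank when it converts a to a number.
awk -v a="$(tail -n 1 file.txt)" '{print $2/a}' conf.txt
```

Both commands print 0.888889, 0.787879, 0.444444 and 0.232323, one per line.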
Given:
cat file
blah blah
 99
The command n=$(tail -n1 file) produces leading spaces in front of the 99:
n=$(tail -n1 file)
printf "\"%s\"\n" "$n"
" 99"
This bug especially bites when you check the value of $n without quotes, because the shell strips the leading spaces (word splitting) before invoking echo.
Consider:
echo $n # no quotes - leading spaces stripped
99
echo "$n" # preserve whitespace...
 99
Now if you pass that argument to awk without quotes, the shell splits it at the space, which changes how the command is interpreted:
awk -v n=$n 'BEGIN{printf "\"%s\", %s\n", n, n+1}'
awk: fatal: cannot open file `BEGIN{printf "\"%s\", %s\n", n, n+1}' for reading: No such file or directory
vs:
awk -v n="$n" 'BEGIN{printf "\"%s\", %s\n", n, n+1}'
" 99", 100
If you want awk to replace tail as well, use the FNR==NR idiom to test whether awk is reading the first file, and $1==$1+0 to test whether awk interprets the field as a number:
awk 'FNR==NR {n=$1+0==$1 ? $1+0 : n; next} # n ends up being the last number seen
$2==$2+0{print $2/n}
' file conf.txt
0.888889
0.787879
0.444444
0.232323
Rather than having the shell call a command to get the last line of file.txt, saving it in a shell variable, and then passing that value to awk as an awk variable, just use one call to awk:
$ awk 'NR==FNR{n=$1; next} {print $2/n}' file.txt conf.txt
0.888889
0.787879
0.444444
0.232323
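One caveat with NR==FNR: if file.txt were ever empty, the condition would also be true for every line of conf.txt. A sketch of a variant, assuming any POSIX awk, that compares FILENAME with ARGV[1] (the first file operand) instead and refuses to divide by zero; the sample files and the error message text are illustrative:

```shell
printf 'column 7:\n 99\n' > file.txt
printf 'Class 88\nFamily 78\nGenus 44\nSpecies 23\n' > conf.txt

awk '
FILENAME == ARGV[1] { n = $1; next }     # keep the first field of the last line of file.txt
n + 0 == 0          { print "no divisor found" > "/dev/stderr"; exit 1 }
                    { print $2 / n }
' file.txt conf.txt
```

With an empty file.txt, n stays unset, so the second rule exits with status 1 instead of dividing conf.txt's numbers by zero.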
Enabling debug mode and running the awk command:
$ set -x
$ awk -v a=$div '{print $2/a}' conf.txt
+ awk -v a= 99 '{print $2/a}' conf.txt
awk: fatal: cannot open file `{print $2/a}' for reading: No such file or directory
Of interest:
-v a= - define awk variable a as being empty
99 - awk code/script
'{print $2/a}' - first file passed to awk script, and the source of the error message
As others have pointed out, you can get around the error by wrapping $div in double quotes:
$ awk -v a="$div" '{print $2/a}' conf.txt
+ awk -v 'a= 99' '{print $2/a}' conf.txt
0.888889
0.787879
0.444444
0.232323
Of interest:
-v 'a= 99' - define awk variable a as the string ' 99'
in this case awk ignores the leading space because the rest of the value can be interpreted as a number
'{print $2/a}' - awk code/script
conf.txt - file passed to awk script
Barmar and dawg have addressed stripping the blanks from div and using awk for the entire process, respectively.

Use awk to interpret }{ as RS and output with ORS }\n{

I have data that looks like this:
{"anonymousId":"abc123",{"hello":"world"}}{"anonymousId":"abc456",{"hi": "again"}}
It's as if you took a newline-delimited JSON file and removed all the newlines.
I'm trying to use awk to convert it to ndjson.
That is, my expected output is this:
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
I don't want to load the entire file into memory (which is why I'm not using sed), so my thought is to use }{ as the record separator (RS). Then I figure that if I use }\n{ as ORS, I should get my desired output.
So I tried this:
cat my-file.txt | awk -v RS="}{" -v ORS="}\n{" '{$1=$1}1'
But it doesn't work!
Here's the output I get:
{"anonymousId":"abc123",{"hello":"world"}
{}
{{"anonymousId":"abc456",{"hi": "again"}
{}
{}
{
Apart from the constraint of not loading the entire file into memory, I don't care what bash command is used, but my thinking is awk will be the way. E.g. if tr supported multi-character expressions, that would be fine with me.
Please help me understand why this isn't working as expected and what I need to change.
Thanks!
Update
Following the answers given, will add some learnings.
The TL;DR is: don't use macOS if you need to do trickier things like this.
For one, this doesn't work on a Mac: echo -e "a\nb\nc\nd\ne\n" | head -n -2; it complains about an illegal line parameter, but it is valid on a Linux system.
The other problem was the way awk behaves on my (Mac) system.
My awk command was close to correct.
On linux it produces this output:
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}}
{
So I just have to find a way to trim the trailing }\n{ (and as pointed out in the answer, the {$1=$1} is not necessary).
But all of those extraneous newlines were due to the screwy awk implementation on my system (it wasn't gawk, and I'm not sure what it was).
Doing $1=$1 inside awk -v RS='}{' -v ORS='}\n{' '{$1=$1}1' file isn't useful: it tells awk to rebuild the current record from its fields, replacing each chain of white space between fields with a single blank and dropping leading/trailing white space. The only white space in your example is the single blank inside {"hi": "again"} and the \n at the end of the file, and there's no point in stripping that \n. So your script can be reduced to:
awk -v RS='}{' -v ORS='}\n{' '1' file
But RS='}{' means different things to different awk variants.
Using a multi-char RS with GNU awk (and some other awks, e.g. mawk) means that the RS is treated as a regexp to separate the records:
$ awk -v RS='}{' -v ORS='}\n{' '1' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
}
{$
Note the extra }\n{ at the end: there is no }{ at the end of your input, so the end of input itself terminates the last record, and that record then gets the ORS value appended like every other record.
With a strictly POSIX awk, where a multi-char RS is unspecified behavior, typically the 2nd and subsequent chars in the RS get ignored and the first char is taken as the RS (as gawk --posix does below), hence the output you reported seeing in your question:
$ awk --posix -v RS='}{' -v ORS='}\n{' '1' file
{"anonymousId":"abc123",{"hello":"world"}
{}
{{"anonymousId":"abc456",{"hi": "again"}
{}
{
}
{$
where every } alone gets treated as matching RS and so gets replaced by ORS.
So you are not using an awk that supports multi-char RS. Your choices are to install one (preferably gawk) and do:
$ awk -v RS='}[{\n]' '{ORS=gensub(/}{/,"}\n{",1,RT)} 1' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
otherwise do something like this with any awk:
$ awk --posix -v RS='{' -v ORS= '{print pfx $0; pfx=(/}$/ ? "\n" : "") RS}' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
In the gawk solution above, we define RS as '}[{\n]' to say that records mid-line are terminated by }{ but the record at the end of the line is terminated by }\n. So RT holds }{ for every record except the last one on the line, for which it holds }\n if your line ends with \n, or the null string otherwise. We then just set ORS to RT with }{ converted to }\n{, so ORS is }\n{ for records mid-line, }\n for the last record on the line, or null if your input didn't have a terminating \n.
An alternative gawk solution that I think I might actually prefer would be:
$ awk -v RS='}{' -v ORS='}\n{' 'NR>1{print prev} {prev=$0} END{printf "%s",prev}' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
EDIT: original answer for posterity before I noticed the OP said they don't want to read the whole file into memory:
Simple substitutions on individual strings like this are what sed is best at (note that \n in the replacement text is a GNU sed extension):
$ sed 's/}{/}\n{/g' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
otherwise with any awk:
$ awk '{gsub(/}{/,"}\n{")} 1' file
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
Using the record separator will create an extra delimiter at the end of the output; since it's static, we can just remove it afterwards (a negative count for head -n needs GNU head):
$ echo '{"anonymousId":"abc123",{"hello":"world"}}{"anonymousId":"abc456",{"hi": "again"}}' |
awk -v RS='}{' -v ORS='}\n{' 1 | head -n -2
{"anonymousId":"abc123",{"hello":"world"}}
{"anonymousId":"abc456",{"hi": "again"}}
If you don't have gawk for multi-char RS support, you can use this workaround:
$ echo ... |
awk -v RS='}' 'NF{printf "%s", $0 RS} !NF{print RS}' | head -n -1
There will be one extra line at the end holding a lone RS (}), which head -n -1 trims (a negative count needs GNU head).
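A negative count for head -n is a GNU extension, which is why it fails on macOS as the update to the question notes. A sketch of the single-character-RS workaround that trims the one extra trailing line with sed '$d' instead (it deletes the last line in both GNU and BSD sed); it assumes the input ends with a newline, as echo's output does:

```shell
# Sample input from the question (echo adds the trailing newline).
echo '{"anonymousId":"abc123",{"hello":"world"}}{"anonymousId":"abc456",{"hi": "again"}}' |
awk -v RS='}' 'NF{printf "%s", $0 RS} !NF{print RS}' |
sed '$d'    # drop the final line, which holds only the extra "}"
```

This prints the two JSON objects on separate lines and needs neither gawk nor GNU head.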

AWK column from a script param?

I want to call awk from a bash script like this:
#!/bin/bash
awk -vFPAT='[^ ]*|"[^"]*"|\\[[^]]*\\]' '{ print $2 }' $1
I want the field number (the 2 in $2) to be one that I specify. So if the script is named get-log-column, I'd like to be able to call it this way: get-log-column /var/log/apache2/access.log 4
In this example, 4 would be the column so the output would be column 4 from access.log.
In other words, if access.log looked like this:
alpha beta orange apple snickers
paris john michael peace world
then the output would be:
apple
peace
Could you please try the following:
#!/bin/bash
file="$1"
col="$2"
awk -v FPAT='[^ ]*|"[^"]*"|\\[[^]]*\\]' -v varcol="$col" '{ print $varcol }' "$file"
Explanation:
The script stores its first argument (the log file) in the shell variable file and its second argument (the column number) in col. awk can't see shell variables directly, so -v varcol="$col" creates an awk variable named varcol holding that value. In awk, $ means "field number", so $varcol prints, for each line, the field whose number is stored in varcol, as the question asks.
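A sketch of the whole script as a shell function, for trying it out; the function name get_log_column, the sample access.log and the digits-only check are illustrative assumptions. FPAT is gawk-specific; other awks treat it as an ordinary variable and fall back to whitespace splitting, which gives the same result for this sample:

```shell
# Hypothetical sample log matching the question's example.
printf 'alpha beta orange apple snickers\nparis john michael peace world\n' > access.log

# Usage: get_log_column FILE COLUMN
get_log_column() {
    case $2 in
        ''|*[!0-9]*) echo "usage: get_log_column FILE COLUMN" >&2; return 1 ;;
    esac
    # col arrives as a string; $col converts it to a field number.
    awk -v FPAT='[^ ]*|"[^"]*"|\\[[^]]*\\]' -v col="$2" '{ print $col }' "$1"
}

get_log_column access.log 4    # prints apple, then peace
```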
This may work
#!/bin/bash
awk -v var="$2" -v FPAT='[^ ]*|"[^"]*"|\\[[^]]*\\]' '{ print $var }' "$1"
See this question on how to use shell variables in awk:
How do I use shell variables in an awk script?

awk use a command line variable

awk -F, -f awkfile.awk -v mysearch="search term"
I am trying to use the above command from a terminal and use mysearch as the search term in the awk program. My awk program runs perfectly fine when I assign the search term inside the program, but how do I get the variable mysearch to be used?
Here is an example of the line where it's used: if($j ~ /mysearch/){. This doesn't use the variable as the search term; it searches for the literal string mysearch.
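Just remove the slashes; /.../ is a regexp literal, while a variable on the right-hand side of ~ is used as a dynamic regexp: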
$j ~ mysearch
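A minimal sketch with hypothetical awkfile.awk and data.csv files (the loop over j is a guess at how the question's program uses $j). Because a variable on the right-hand side of ~ is a dynamic regexp, mysearch can also carry regexp syntax such as ^x:

```shell
# Hypothetical awk program that searches every comma-separated field.
cat > awkfile.awk <<'EOF'
{
    for (j = 1; j <= NF; j++)
        if ($j ~ mysearch) {    # no slashes: mysearch is a dynamic regexp
            print "Found:", $0
            break
        }
}
EOF
printf 'foo,here\nbar,there\nxyz,everywhere\n' > data.csv

awk -F, -v mysearch='ba' -f awkfile.awk data.csv    # prints: Found: bar,there
awk -F, -v mysearch='^x' -f awkfile.awk data.csv    # prints: Found: xyz,everywhere
```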
This is not ideal, but I suggest writing a bash script that takes the search term, substitutes it into the awk script, and then runs the script. For example:
$ cat dosearch.sh
sed "s/XXX/$1/" awktemplate.awk > awkfile.awk
awk -f awkfile.awk data.txt
$ cat awktemplate.awk
{
    j = 1
    if ($j ~ /XXX/) {
        # Do something, such as
        print "Found:", $0
    }
}
$ cat data.txt
foo here
bar there
xyz everywhere
$ ./dosearch.sh foo
Found: foo here
$ ./dosearch.sh bar
Found: bar there
In the above example, the awk template contains "XXX" as the search term; the bash script replaces it with the first parameter and then invokes awk on the modified script.
$ cat input
tinky-winky
dipsy
laa-laa
noo-noo
po
$ teletubby='po'
$ awk -v "regexp=$teletubby" '$0 ~ regexp' input
po
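Note that anything can go into the shell variable,
even a full-blown regexp, e.g. ^d.*y. Just make sure to use single quotes when assigning it,
to prevent the shell from doing any expansion. Also note that -v processes backslash escapes, so any backslash in the regexp must be doubled.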
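A quick check of the full-blown regexp case, recreating the input file above so the sketch is self-contained:

```shell
printf 'tinky-winky\ndipsy\nlaa-laa\nnoo-noo\npo\n' > input

teletubby='^d.*y'    # single quotes: the shell passes ^d.*y through untouched
awk -v "regexp=$teletubby" '$0 ~ regexp' input    # prints: dipsy

teletubby='-'        # any line containing a hyphen
awk -v "regexp=$teletubby" '$0 ~ regexp' input    # prints tinky-winky, laa-laa and noo-noo
```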

awk won't print new line characters

I am using the below code to change an existing awk script so that I can add more and more cases with a simple command.
echo `awk '{if(/#append1/){print "pref'"$1"'=0\n" $0 "\n"} else{print $0 "\n"}}' tf.a`
Note that the first print is "pref'"$1"'=0\n", so it refers to the shell's positional parameter $1 (the script's first argument), not to awk's $1.
The command ./tfb.a "c" should change the code from:
BEGIN{
#append1
}
...
to:
BEGIN{
prefc=0
#append1
}
...
However, it gives me everything on one line.
Does anyone know why this is?
If you take awk right out of the equation you can see what's going on:
# Use a small test file instead of an awk script
$ cat xxx
hello
there
$ echo `cat xxx`
hello there
$ echo "`cat xxx`"
hello
there
$ echo "$(cat xxx)"
hello
there
$
The unquoted backtick substitution splits the output into shell "words" (on spaces, tabs and newlines), and echo then joins those words with single spaces. You could play around with the $IFS variable in the shell (yikes), or you could just use double quotes.
If you're running a modern sh (e.g. ksh or bash, not the "classic" Bourne sh), you may also want to use the $() syntax (it's easier to find the matching start/end delimiter).
Do it like this: pass the variable from the shell to awk properly, using -v:
#!/bin/bash
toinsert="$1"
awk -v toinsert="$toinsert" '
/#append1/{
$0="pref"toinsert"=0\n"$0
}
{print}
' file > temp
mv temp file
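A variant sketch (tf.a is the file name from the question; the tf.a.tmp name is an assumption) that prints the new line as its own output line instead of gluing "\n" into $0, and only replaces the file if awk succeeded:

```shell
# Recreate the question's awk script to be edited.
printf 'BEGIN{\n#append1\n}\n' > tf.a

toinsert=c    # in the real script this would be "$1"
awk -v toinsert="$toinsert" '
/#append1/ { print "pref" toinsert "=0" }   # emit the new line just before the marker
{ print }                                   # then copy every line unchanged
' tf.a > tf.a.tmp && mv tf.a.tmp tf.a

cat tf.a    # BEGIN{ / prefc=0 / #append1 / }
```

Using && means a failed awk run leaves the original tf.a in place rather than overwriting it with a partial file.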
Output:
$ cat file
BEGIN{
#append1
}
$ ./shell.sh c
$ cat file
BEGIN{
prefc=0
#append1
}