capture last line of file as integer variable and use in awk command - awk

I am trying to capture the last line of a file as a variable for use in an awk command.
Here is an example of the file (the end of it) :
cat file.txt
....
phylum:Chlorophyta 1
phylum:Mucoromycota 1
column 6:
superkingdom:Eukaryota 99
column 7:
99
I want to use that '99' as an integer in an awk command, saving it as a variable,
tail -n1 file.txt
99
e.g.
div=$(tail -n1 file.txt)
echo $div
99
To be used in a 2nd file (conf.txt), to divide the numbers in the 2nd field:
cat conf.txt
Class 88
Family 78
Genus 44
Species 23
BUT, when I try to use the $div variable in the awk command (using -v flag as suggested here and elsewhere with awk when taking a variable) I get this error:
awk -v a=$div '{print $2/a}' conf.txt
awk: can't open file {print $2/a}
source line number 1
But when saving 99 as a variable directly on the command line, it works just fine:
num=99
awk -v a=$num '{print $2/a}' conf.txt
0.888889
0.787879
0.444444
0.232323
Are there extra spaces/characters in the capture from tail -1? I am missing something simple, but fundamental.
Ultimately, I don't even want to save it as a separate variable first if I don't have to. Instead, I'd like to capture that last line (99) and put it directly into an awk command, e.g.:
awk '{print $2/[tail -1 file.txt]}' conf.txt
The part in brackets is pseudocode, but this is ultimately what I'd want.
Thanks for any help!

There's a space at the beginning of the last line, so the command is becoming
awk -v a= 99 '{print $2/a}' conf.txt
This is setting a to an empty string, treating 99 as the awk script, and the rest as filenames.
Remove the spaces from $div.
div=${div// /}
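For the asker's end goal of skipping the intermediate variable, a quoted command substitution can feed the output of tail straight to -v. The following is a minimal sketch, assuming made-up sample data in a scratch directory; the a + 0 is an extra precaution (not part of the answer above) that forces the value to a number despite the leading spaces.

```shell
# Scratch copies of the two files (sample data is illustrative).
tmp=$(mktemp -d)
printf 'column 7:\n  99\n' > "$tmp/file.txt"      # last line has leading spaces
printf 'Class 88\nFamily 78\n' > "$tmp/conf.txt"

# The double quotes keep "  99" as one word, so awk sees a single -v argument.
result=$(awk -v a="$(tail -n1 "$tmp/file.txt")" '{ print $2 / (a + 0) }' "$tmp/conf.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Because the substitution is quoted, no separate cleanup of the spaces is needed for this particular use.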

Use quotes as a habit in the shell.
Given:
cat file
blah blah
99
The command n=$(tail -n1 file) produces leading spaces in front of the 99:
n=$(tail -n1 file)
printf "\"%s\"\n" "$n"
" 99"
This bug is especially easy to miss when you check the value of $n without quotes, because the shell's word splitting strips the leading spaces before echo is invoked.
Consider:
echo $n # no quotes - leading spaces stripped
99
echo "$n" # quotes - leading whitespace preserved
 99
Now if you try and pass that argument without quotes to awk, the space has meaning to the shell and screws up how the command is interpreted:
awk -v n=$n 'BEGIN{printf "\"%s\", %s\n", n, n+1}'
awk: fatal: cannot open file `BEGIN{printf "\"%s\", %s\n", n, n+1}' for reading: No such file or directory
vs:
awk -v n="$n" 'BEGIN{printf "\"%s\", %s\n", n, n+1}'
" 99", 100
If you want awk to replace the use of tail, use the FNR==NR idiom to test whether awk is reading the first file, and $1==$1+0 to test whether awk interprets the field as a number:
awk 'FNR==NR {n=$1+0==$1 ? $1+0 : n; next} # n ends up being the last number seen
$2==$2+0{print $2/n}
' file conf.txt
0.888889
0.787879
0.444444
0.232323

Rather than having the shell call a command to get the last line of file.txt, save it in a shell variable, and then pass that variable to awk to populate an awk variable, just use one call to awk:
$ awk 'NR==FNR{n=$1; next} {print $2/n}' file.txt conf.txt
0.888889
0.787879
0.444444
0.232323

Enabling debug mode and running the awk command:
$ set -x
$ awk -v a=$div '{print $2/a}' conf.txt
+ awk -v a= 99 '{print $2/a}'
awk: fatal: cannot open file `{print $2/a}' for reading: No such file or directory
Of interest:
-v a= - define awk variable a as being empty
99 - awk code/script
'{print $2/a}' - first file passed to awk script, and the source of the error message
As others have pointed out you can get around the error by wrapping $div in double quotes:
$ awk -v a="$div" '{print $2/a}' conf.txt
+ awk -v 'a= 99' '{print $2/a}' conf.txt
0.888889
0.787879
0.444444
0.232323
Of interest:
-v 'a= 99' - define awk variable a as the string ' 99'
in this case awk ignores the leading space because the rest of the value can be interpreted as a number
'{print $2/a}' - awk code/script
conf.txt - file passed to awk script
Barmar and dawg have addressed stripping the blanks from div and using awk for the entire process, respectively.
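The word splitting shown in the trace can also be observed without awk. This is a small sketch for a POSIX shell such as bash; count_args is a hypothetical helper that only reports how many arguments it received, and prog and file are placeholder words standing in for the awk script and the input file.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

div=' 99'                                   # value with a leading space, as tail returned it

unquoted=$(count_args -v a=$div prog file)      # a= and 99 become separate words
quoted=$(count_args -v a="$div" prog file)      # a= 99 stays one word

echo "unquoted: $unquoted, quoted: $quoted"
```

The unquoted call receives one more argument than the quoted one, which is exactly the shift that makes awk treat 99 as the script and the real script as a file name.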

Related

How can I print only lines that are immediately preceded by an empty line in a file using sed?

I have a text file with the following structure:
bla1
bla2

bla3
bla4

bla5
So you can see that some lines of text are preceded by an empty line.
I understand that sed has the concept of two buffers, a pattern space buffer and a hold space buffer, so I'm guessing these need to come in to play here, but I'm unclear how to specify them to accomplish what I need.
In my contrived example above, I'd expect to see the following lines outputted:
bla3
bla5
sed is for doing s/old/new on individual lines, that is all. Any time you start talking about buffers or doing anything related to multi-lines comparisons you're using the wrong tool.
You could do this with awk:
$ awk -v RS= -F'\n' 'NR>1{print $1}' file
bla3
bla5
but it would fail to print the first non-empty line if the first line(s) in the file were empty. So this may be what you want if lines consisting only of whitespace should also count as empty lines:
$ awk 'NF && !p{print} {p=NF}' file
bla3
bla5
and this otherwise:
$ awk '($0!="") && (p==""){print} {p=$0}' file
bla3
bla5
All of the above will work even if there are multiple empty lines preceding any given non-empty line.
To see the difference between the 3 approaches (which you won't see given the sample input in the question):
PS1> printf '\nfoo\n \nbar\n\netc\n' | cat -E
$
foo$
$
bar$
$
etc$
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk -v RS= -F'\n' 'NR>1{print $1}'
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk 'NF && !p{print} {p=NF}'
foo
bar
etc
PS1> printf '\nfoo\n \nbar\n\netc\n' | awk '($0!="") && (p==""){print} {p=$0}'
foo
etc
You can use the hold buffer easily to print the line before the blank like this:
sed -n -e '/^$/{x; p;}' -e h input
But I don't see an easy way to use it for your use case. For your case, instead of using the hold buffer, you could do:
sed -n -e '/^$/ba' -e d -e :a -e n -e p input
But I would do this with awk.
awk 'NR!=1{print $1}' RS= FS=\\n input-file
awk 'p;{p=/^$/}' file
The command above does the following for each line:
if p is 1, print the line;
if the line is empty, set p to 1 (otherwise set it to 0).
If lines consisting of one or more spaces should also be considered empty:
awk 'p;{p=!NF}' file
To print only the non-empty lines that come right after an empty line, you can use this:
awk 'p*!(p=/^$/)' file
If p is 1 and this line is not empty (1*!(0) = 1*1 = 1), print this line;
otherwise (1*!(1) = 1*0 = 0, 0*anything = 0), don't print anything.
Note that this one may not work with all awks, since it relies on p being read before the assignment inside the parentheses takes effect. A portable version would look like:
awk 'p*(/./);{p=/^$/}' file
If lines consisting of one or more spaces should also be considered empty:
awk 'p*NF;{p=!NF}' file
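For readers who find the terse forms hard to follow, here is a more verbose sketch of the same flag technique. The variable name prev_empty is illustrative, and the input reproduces the sample from the question with the empty lines in place.

```shell
out=$(printf 'bla1\nbla2\n\nbla3\nbla4\n\nbla5\n' |
awk '
    # Print a non-empty line only when the previous line was empty.
    $0 != "" && prev_empty { print }
    # Remember, for the next line, whether this line was empty.
    { prev_empty = ($0 == "") }
')
printf '%s\n' "$out"
```

Keeping the test and the flag update in separate rules avoids any dependence on evaluation order inside a single expression.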
If sed/awk is not mandatory, you can do it with grep:
grep -A 1 '^$' input.txt | grep -v -E '^$|--'
You can use sed to match a range of lines and do sub-matches inside the matches, like so:
# - use the "-n" option to omit printing of lines
# - match lines between a blank line (/^$/) and a non-blank one (/^./),
# then print only the line that contains at least a character,
# i.e, the non-blank line.
sed -ne '
/^$/,/^./ {
/^./{ p; }
}' input.txt
Tested with GNU sed, with your data in a file named a:
$ sed -nE '/^$/{N;s/\n(.+)/\1/p}' a
bla3
bla5
To edit the file in place, add the -i option before -n.

Proper way to use variables in awk in a script? [duplicate]

I found some ways to pass external shell variables to an awk script, but I'm confused about ' and ".
First, I tried with a shell script:
$ v=123test
$ echo $v
123test
$ echo "$v"
123test
Then tried awk:
$ awk 'BEGIN{print "'$v'"}'
$ 123test
$ awk 'BEGIN{print '"$v"'}'
$ 123
Why the difference?
Lastly I tried this:
$ awk 'BEGIN{print " '$v' "}'
$ 123test
$ awk 'BEGIN{print ' "$v" '}'
awk: cmd. line:1: BEGIN{print
awk: cmd. line:1: ^ unexpected newline or end of string
I'm confused about this.
Getting shell variables into awk
This may be done in several ways. Some are better than others. This should cover most of them.
Using -v (The best way, most portable)
Use the -v option: (P.S. use a space after -v or it will be less portable. E.g., awk -v var= not awk -vvar=)
variable="line one\nline two"
awk -v var="$variable" 'BEGIN {print var}'
line one
line two
This should be compatible with most awk implementations, and the variable is available in the BEGIN block as well.
If you have multiple variables:
awk -v a="$var1" -v b="$var2" 'BEGIN {print a,b}'
Warning: as Ed Morton writes, escape sequences in the value will be interpreted, so \t becomes a real tab rather than a literal \t, which matters if a literal \t is what you are searching for. This can be solved by using ENVIRON[] or by accessing the value via ARGV[].
PS: If you use a vertical bar or other regexp metacharacters such as |?( as the separator, they must be double escaped. For example, three vertical bars ||| become -F'\\|\\|\\|'. You can also use -F"[|][|][|]".
Example of getting data from a program/function into awk (here date is used):
awk -v time="$(date +"%F %H:%M" -d '-1 minute')" 'BEGIN {print time}'
Example of testing the contents of a shell variable as a regexp:
awk -v var="$variable" '$0 ~ var{print "found it"}'
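The escape-sequence warning above can be checked with a short sketch. The path value is invented for illustration, and the awk variable name p is arbitrary; the same value is passed once with -v and once through the environment.

```shell
path='C:\temp\new'          # illustrative value containing backslashes

# With -v, awk processes escape sequences, so \t and \n in the value are converted.
via_v=$(awk -v p="$path" 'BEGIN { print p }')

# Through the environment, the value reaches awk unchanged.
via_env=$(p="$path" awk 'BEGIN { print ENVIRON["p"] }')

printf 'ENVIRON: %s\n' "$via_env"
```

If the data may contain backslashes that must survive, ENVIRON (or ARGV) is the safer route.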
Variable after code block
Here we get the variable after the awk code. This will work fine as long as you do not need the variable in the BEGIN block:
variable="line one\nline two"
echo "input data" | awk '{print var}' var="${variable}"
or
awk '{print var}' var="${variable}" file
Adding multiple variables:
awk '{print a,b,$0}' a="$var1" b="$var2" file
In this way we can also set different Field Separator FS for each file.
awk 'some code' FS=',' file1.txt FS=';' file2.ext
Variable after the code block will not work for the BEGIN block:
echo "input data" | awk 'BEGIN {print var}' var="${variable}"
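A quick way to see the difference is to print the variable between brackets from both places. This is only a sketch; var and hello are placeholder names.

```shell
# In BEGIN the assignment operand has not been processed yet, so var is empty.
in_begin=$(awk 'BEGIN { print "[" var "]" }' var=hello)

# In the main rule the assignment has already happened before input is read.
in_main=$(echo x | awk '{ print "[" var "]" }' var=hello)

echo "BEGIN sees $in_begin, main rule sees $in_main"
```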
Here-string
Variable can also be added to awk using a here-string from shells that support them (including Bash):
awk '{print $0}' <<< "$variable"
test
This is the same as:
printf '%s' "$variable" | awk '{print $0}'
P.S. this treats the variable as a file input.
ENVIRON input
As TrueY writes, you can use the ENVIRON to print Environment Variables.
If you export a variable before running awk, you can print it out like this:
export X=MyVar
awk 'BEGIN{print ENVIRON["X"],ENVIRON["SHELL"]}'
MyVar /bin/bash
ARGV input
As Steven Penny writes, you can use ARGV to get the data into awk:
v="my data"
awk 'BEGIN {print ARGV[1]}' "$v"
my data
To get the data into the code itself, not just the BEGIN:
v="my data"
echo "test" | awk 'BEGIN{var=ARGV[1];ARGV[1]=""} {print var, $0}' "$v"
my data test
Variable within the code: USE WITH CAUTION
You can use a variable within the awk code, but it's messy and hard to read, and as Charles Duffy points out, this version may also be a victim of code injection. If someone adds bad stuff to the variable, it will be executed as part of the awk code.
This works by extracting the variable within the code, so it becomes a part of it.
If you want to make an awk that changes dynamically with use of variables, you can do it this way, but DO NOT use it for normal variables.
variable="line one\nline two"
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
Here is an example of code injection:
variable='line one\nline two" ; for (i=1;i<=1000;++i) print i"'
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
1
2
3
.
.
1000
You can add lots of commands to awk this way, or even make it crash with invalid commands.
One valid use of this approach, though, is when you want to pass a symbol to awk to be applied to some input, e.g. a simple calculator:
$ calc() { awk -v x="$1" -v z="$3" 'BEGIN{ print x '"$2"' z }'; }
$ calc 2.7 '+' 3.4
6.1
$ calc 2.7 '*' 3.4
9.18
There is no way to do that using an awk variable populated with the value of a shell variable; you NEED the shell variable to expand and become part of the text of the awk script before awk interprets it (see comment below by Ed M.).
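The same mechanism explains the quoting puzzle in the question. As a sketch, storing the program text in shell variables (prog1 and prog2 are illustrative names) shows what awk actually receives in each case.

```shell
v=123test

prog1='BEGIN{print "'$v'"}'      # the shell pastes the value between awk double quotes
prog2='BEGIN{print '"$v"'}'      # the shell pastes the value in as bare awk code

printf '%s\n' "$prog1" "$prog2"  # show the two program texts

out1=$(awk "$prog1")             # awk sees a string literal
out2=$(awk "$prog2")             # awk sees the number 123 followed by a variable named test

printf '%s\n' "$out1" "$out2"
```

In the second program awk concatenates the number 123 with the unset variable test, which is an empty string, so only 123 is printed.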
Extra info:
Use of double quote
It's always good to double quote a variable: "$variable"
If you don't, a multi-line value will be joined into one long single line.
Example:
var="Line one
This is line two"
echo $var
Line one This is line two
echo "$var"
Line one
This is line two
Other errors you can get without double quote:
variable="line one\nline two"
awk -v var=$variable 'BEGIN {print var}'
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ syntax error
And with single quote, it does not expand the value of the variable:
awk -v var='$variable' 'BEGIN {print var}'
$variable
It seems that the good-old ENVIRON awk built-in hash is not mentioned at all. An example of its usage:
$ X=Solaris awk 'BEGIN{print ENVIRON["X"], ENVIRON["TERM"]}'
Solaris rxvt
You could pass the command-line option -v with a variable name (v), an equals sign (=), and the value of the shell variable ("${v}"):
% awk -vv="${v}" 'BEGIN { print v }'
123test
Or to make it clearer (with far fewer vs):
% environment_variable=123test
% awk -vawk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
123test
You can utilize ARGV:
v=123test
awk 'BEGIN {print ARGV[1]}' "$v"
Note that if you are going to continue into the body, you will need to adjust ARGC:
awk 'BEGIN {ARGC--} {print ARGV[2], $0}' file "$v"
I just adapted @Jotne's answer for a "for loop".
for i in `seq 11 20`; do host myserver-$i | awk -v i="$i" '{print "myserver-"i" " $4}'; done
I had to insert date at the beginning of the lines of a log file and it's done like below:
DATE=$(date +"%Y-%m-%d")
awk '{ print "'"$DATE"'", $0; }' /path_to_log_file/log_file.log
The output can be redirected to another file to save it.
Pro Tip
It can come in handy to create a function that handles this so you don't have to type everything every time. Using the selected solution we get:
awk_switch_columns() {
    # $1 and $2 are the numbers of the two columns to swap; input is read from stdin
    awk -v a="$1" -v b="$2" '{ t = $a; $a = $b; $b = t; print }'
}
And use it as...
echo 'a b c d' | awk_switch_columns 2 4
Output:
a d c b

Why does awk not filter the first column in the first line of my files?

I've got a file with following records:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt;2;CLI001
depots/import/HDN1YYAA_20102018.txt;32;CLI001
depots/import/HDN1YYAA_25102018.txt;1;CAB001
depots/import/HDN1YYAA_50102018.txt;1;CAB001
depots/import/HDN1YYAA_65102018.txt;1;CAB001
depots/import/HDN1YYAA_80102018.txt;2;CLI001
depots/import/HDN1YYAA_93102018.txt;2;CLI001
When I execute following oneliner awk:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR==1){print $1}}END {}'
the output is not what I expected:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
whereas I am supposed to get only the first column.
If I run it through all the records:
cat lignes_en_erreur.txt | awk 'FS=";"{ if(NR>0){print $1}}END {}'
then it will start filtering only after the second line and I get the following output:
depots/import/HDN1YYAA_15102018.txt;1;CAB001
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_20102018.txt
depots/import/HDN1YYAA_25102018.txt
depots/import/HDN1YYAA_50102018.txt
depots/import/HDN1YYAA_65102018.txt
depots/import/HDN1YYAA_80102018.txt
depots/import/HDN1YYAA_93102018.txt
Does anybody know why awk is skipping the first line only?
I tried deleting first record but the behaviour is the same, it will skip the first line.
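First, FS must be set before the first record is read. An assignment in the pattern position only takes effect from the next record on, which is why line 1 was not split. It should be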
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}END {}' filename
You can omit the END block if it is empty:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}' filename
You can use the -F command line argument to set the field delimiter:
awk -F';' '{if(NR==1){print $1}}' filename
Furthermore, since awk programs consist of a sequence of CONDITION [{ACTIONS}] elements, you can omit the if:
awk -F';' 'NR==1 {print $1}' filename
You need to specify the delimiter either in a BEGIN block or as a command-line option:
awk 'BEGIN{FS=";"}{ if(NR==1){print $1}}'
awk -F ';' '{ if(NR==1){print $1}}'
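The timing issue behind the question can be reproduced on two lines of input. This is a minimal sketch with invented data; the only difference between the two commands is where FS is assigned.

```shell
# FS assigned in a main rule: record 1 was already split with the default FS.
late=$(printf 'a;b\nc;d\n' | awk '{ FS = ";" } { print NR ": " $1 }')

# FS assigned in BEGIN: it is in effect before any record is read.
early=$(printf 'a;b\nc;d\n' | awk 'BEGIN { FS = ";" } { print NR ": " $1 }')

printf 'late:\n%s\nearly:\n%s\n' "$late" "$early"
```

With the late assignment the first record comes out whole (a;b) and only the second is split, which mirrors the output shown in the question.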
cut might be better suited here. For all lines:
$ cut -d';' -f1 file
To skip the first line:
$ sed 1d file | cut -d';' -f1
To get the first line only:
$ sed 1q file | cut -d';' -f1
However, at this point it's better to switch to awk. If you have a large file and are only interested in the first line, it's better to exit early:
$ awk -F';' '{print $1; exit}' file

Exact string match in awk

I have a file test.txt with the following lines:
1997 100 500 2010TJ
2010TJXML 16 20 59
I'm using the following awk line to get information only about the string 2010TJ:
awk -v var="2010TJ" '$0 ~ var {print $0}' test.txt
But the code prints both lines. I want to know how to get only the line containing the exact string:
1997 100 500 2010TJ
the string can be placed in any column of the file.
Several options:
Use a gawk word boundary (not POSIX awk...):
$ gawk '/\<2010TJ\>/' file
Or match the actual space, tab, or whatever separates the columns:
$ awk '/^2010TJ /' file
Or compare the field directly to the string:
$ awk '$1=="2010TJ"' file
You can loop over the fields to test each field if you wish:
$ awk '{for (i=1;i<=NF;i++) if ($i=="2010TJ") {print; next}}' file
Or, given your example of setting a variable, the same commands using a variable:
$ gawk -v s=2010TJ '$0~"\\<" s "\\>"'
$ awk -v s=2010TJ '$0~"^" s " "'
$ awk -v s=2010TJ '$1==s'
Note that the first is a little different from the second and third. The first matches the standalone string 2010TJ anywhere in $0; the second and third match only when the line starts with that string.
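Since the string may sit in any column, the field loop and the -v variable can be combined. The sketch below does that with the two sample lines from the question written to a scratch file; the variable name s follows the commands above.

```shell
tmp=$(mktemp)
printf '1997 100 500 2010TJ\n2010TJXML 16 20 59\n' > "$tmp"

# Compare every field to s as a whole string; print the line at the first hit.
match=$(awk -v s=2010TJ '{ for (i = 1; i <= NF; i++) if ($i == s) { print; next } }' "$tmp")
printf '%s\n' "$match"

rm -f "$tmp"
```

Because == compares whole fields rather than matching a regexp, 2010TJXML is not selected and regexp metacharacters in s need no escaping. One caveat: if s looks like a number, awk may compare numerically instead of as text.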
Try this (for testing only column 1) :
awk '$1 == "2010TJ" {print $0}' test.txt
or grep like (all columns) :
gawk '/\<2010TJ\>/ {print $0}' test.txt
Note:
\< and \> are word boundaries.
Another awk (gawk) with a word boundary:
awk '/\y2010TJ\y/' file
Note that \y, a GNU awk extension, matches at either the beginning or the end of a word.

reading a variable in awk from the command line after entering the command

I am trying to search a file using awk. How can I have awk read a variable from the command line as the name to search for in the file?
This is the regular way I search the file, and I can ask the user to enter a name to search for in file.txt:
awk -f myAwk.awk file.txt
How can I manage it like this :
awk -f myAwk.awk file.txt nameToSearch
How can I use ARGC and ARGV to search the nameToSearch in the file.txt?
What you're probably looking for is
awk [-W option] [-F value] [-v var=value] [--] 'program text' [file ...]
so
awk -v MYVAR=nameToSearch -v OTHERVAR=somethingElse -f myAwk.awk file.txt
Is that it? Of course, the order of the switches (-f, -v) does not matter. Obviously, you then need to use MYVAR (and OTHERVAR) as variable identifiers inside the awk program itself.
To pass a variable to awk, you can use the -v option.
For example:
cat file.txt | awk -v p="stringToSearch" '$0 ~ p'
In this command, you replace stringToSearch with a pattern (please keep the double quotes; they are useful for preserving spaces). The awk expression $0 ~ p matches the current line against the given pattern.
Another approach is to build the awk command from the shell:
p="stringToSearch"
awk "/$p/" file.txt
You must use double quotes in the command to force expanding $p.
If it's permitted to change the order of arguments, so that we can do this:
awk -f myAwk.awk nameToSearch file.txt
then you can do:
awk 'NR==1 { nameToSearch = $0; next} { ... rest of myAwk.awk here ...}' nameToSearch file.txt
You can of course add the NR==1 {...} block to the beginning of your myAwk.awk file, then continue using:
awk -f myAwk.awk nameToSearch file.txt
The technique Piotr Wadas describes has the same effect:
awk -v nameToSearch=whatever -f myAwk.awk file.txt
and that's what I'd use myself, rather than passing whatever as an additional argument to the script. Passing whatever as an additional argument is what scripters had to do before the -v facilities were added to awk. If writing -v nameToSearch= is too verbose, then I'd wrap the whole thing up in a shell script, and say:
myShellScript whatever file.txt
But you asked how to do it by passing whatever as an additional argument to the awk script, so that's what I demonstrated.
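For the argument order in the question (file first, name last), one possible sketch reads the name from ARGV in a BEGIN block and then blanks that element so awk does not try to open it as a file. The sample data and the variable name are made up, and index() is used here for a plain substring search rather than a regexp match.

```shell
tmp=$(mktemp)
printf 'alice 1\nbob 2\n' > "$tmp"

# ARGV[1] is the data file, ARGV[2] is the name to search for.
found=$(awk 'BEGIN { name = ARGV[2]; ARGV[2] = "" }   # empty ARGV elements are skipped
             index($0, name)                          # print lines containing name
            ' "$tmp" bob)
printf '%s\n' "$found"

rm -f "$tmp"
```

The same BEGIN block could go at the top of myAwk.awk, although -v remains the simpler choice when the calling convention can be changed.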