How to control the format of float numbers in gawk? - awk

The following two runs give different output. How can I make the first run print the same as the second (I still want print without any explicit arguments)? Is there a way to control the number of digits in $1 = 1/3?
$ gawk -v OFMT='%.20g' -e 'BEGIN { $1 = 1/3; print }'
0.333333
$ gawk -v OFMT='%.20g' -e 'BEGIN { print 1/3}'
0.33333333333333331483
EDIT: The following comparison is also unexpected. Ideally, if there is just one field, print $1 and print should give the same output. Could this be considered a bug?
$ gawk -v OFMT='%.20g' -e 'BEGIN { $1 = 1/3; print $1}'
0.33333333333333331483
$ gawk -v OFMT='%.20g' -e 'BEGIN { $1 = 1/3; print}'
0.333333

There is a subtlety here. There are two variables, OFMT and CONVFMT. The variable OFMT is used to control how numbers are converted to strings in the print statement while the variable CONVFMT is used to define how numbers are converted to strings in general (outside of the print statement):
Prior to the POSIX standard, awk used the value of OFMT for converting numbers to strings. OFMT specifies the output format to use when printing numbers with print. CONVFMT was introduced in order to separate the semantics of conversion from the semantics of printing. Both CONVFMT and OFMT have the same default value: "%.6g". In the vast majority of cases, old awk programs do not change their behaviour.
source: GNU awk manual
More detailed information about this reasoning can be found in the Rationale section of the awk POSIX standard.
numeric value in print statement:
$ awk 'BEGIN{print 1/3}'
0.333333
$ awk 'BEGIN{OFMT="%.20g"; print 1/3 }'
0.33333333333333331483
$ awk 'BEGIN{CONVFMT="%.20g"; print 1/3 }'
0.333333
variable with a numeric value in print statement:
$ awk 'BEGIN{a=1/3; print a}'
0.333333
$ awk 'BEGIN{OFMT="%.20g"; a=1/3; print a }'
0.33333333333333331483
$ awk 'BEGIN{CONVFMT="%.20g"; a=1/3; print a }'
0.333333
variable with a numeric value converted to string in print statement:
$ awk 'BEGIN{a=1/3; a=a""; print a}'
0.333333
$ awk 'BEGIN{OFMT="%.20g"; a=1/3; a=a""; print a }'
0.333333
$ awk 'BEGIN{CONVFMT="%.20g"; a=1/3; a=a""; print a }'
0.33333333333333331483

I am not sure if it's a bug, but try assigning to a variable instead of the first field:
$ gawk -v OFMT='%.20g' -e 'BEGIN { a = 1/3; print a}'
0.33333333333333331483

Related

using AWK, how do I convert a decimal number to hexadecimal

If I have an input stream of decimal numbers, e.g.
100 2000 599 232
and I pass them to awk, how do I print them in hexadecimal notation?
For example:
0x64 0x7D0 0x257 0xE8
starting script ...
echo "100 2000 599 232" | awk '{ print $1 }' #here print in hexa instead of decimal
You can use printf in awk with a format string to convert to hex:
awk '{ printf "%x\n", $1 }'
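To convert every number on the line and get the 0x prefix asked for in the question, the same printf idea can be applied in a loop over the fields. This is a sketch only; the function name to_hex is hypothetical, and it assumes small non-negative integers as input:

```shell
# Print each field of every input line as upper-case hex with a 0x prefix.
to_hex() {
  awk '{
    for (i = 1; i <= NF; i++)
      printf "0x%X%s", $i, (i < NF ? OFS : ORS)  # separator between fields, newline at the end
  }'
}

echo "100 2000 599 232" | to_hex
```

The ternary picks OFS between fields and ORS after the last one, so the output stays on one line per input line.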
A quick caveat: mawk 1.3.4 has severe limitations when it comes to printing octal and hex codes:
$ gawk 'BEGIN{ printf("%\043.16x\n",8^8*-1-2) }'
0xfffffffffefffffe
$ nawk 'BEGIN{ printf("%\043.16x\n",8^8*-1-2) }'
0xfffffffffefffffe
$ mawk 'BEGIN{ printf("%\043.16x\n",8^8*-1-2) }'
0000000000000000
$ mawk2 'BEGIN{ printf("%\043.16x\n",8^8*-1-2) }'
0xfffffffffefffffe
It's not even that large a value (-16777218), and mawk 1.3.4 completely bellyflops. On the flip side, it can directly decipher some hex constants (only gawk, when in neither POSIX nor traditional mode, can directly decipher octal constants):
$ mawk 'BEGIN { OFMT="%.f"; print +"0xDEADBEEF" }'
3735928559
$ nawk 'BEGIN { OFMT="%.f"; print +"0xDEADBEEF" }'
3735928559
$ mawk2 'BEGIN { OFMT="%.f"; print +"0xDEADBEEF" }'
0
$ gawk --posix 'BEGIN{ OFMT="%.f"; print +"0xDEADBEEF" }'
3735928559
Note the difference: in POSIX mode gawk can only decipher hex values given as strings. The "+" in front is also necessary because gawk will just print it as a string otherwise.
$ gawk -e 'BEGIN { OFMT="%.f"; print 0xDEADBEEF }'
3735928559
In standard mode gawk can only decipher hex values written as plain constants in the program text.
- mawk2 is the only one among those above that even prints anything out with %p in printf(), but it still errors out, like this:
mawk2: line 1: invalid control character 'p'
in [s]printf format ("0x10f0099da
- both gawk and nawk properly print out %a

Proper way to use variables in awk in a script? [duplicate]

I found some ways to pass external shell variables to an awk script, but I'm confused about ' and ".
First, I tried with a shell script:
$ v=123test
$ echo $v
123test
$ echo "$v"
123test
Then tried awk:
$ awk 'BEGIN{print "'$v'"}'
123test
$ awk 'BEGIN{print '"$v"'}'
123
Why the difference?
Lastly I tried this:
$ awk 'BEGIN{print " '$v' "}'
 123test
$ awk 'BEGIN{print ' "$v" '}'
awk: cmd. line:1: BEGIN{print
awk: cmd. line:1: ^ unexpected newline or end of string
I'm confused about this.
Getting shell variables into awk may be done in several ways. Some are better than others. This should cover most of them.
Using -v (The best way, most portable)
Use the -v option: (P.S. use a space after -v or it will be less portable. E.g., awk -v var= not awk -vvar=)
variable="line one\nline two"
awk -v var="$variable" 'BEGIN {print var}'
line one
line two
This should be compatible with most awk implementations, and the variable is available in the BEGIN block as well.
If you have multiple variables:
awk -v a="$var1" -v b="$var2" 'BEGIN {print a,b}'
Warning: as Ed Morton writes, escape sequences in the value will be interpreted, so \t becomes a real tab rather than a literal \t, which matters if a literal \t is what you are searching for. This can be avoided by passing the value through ENVIRON[] or ARGV[] instead.
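The difference between the two routes can be seen side by side. A small sketch, with the shell variable names with_v and with_env chosen only for illustration:

```shell
variable='a\tb'   # a literal backslash followed by t, not a tab

# -v interprets escape sequences, so \t turns into a real tab character.
with_v=$(awk -v var="$variable" 'BEGIN { print var }')

# ENVIRON passes the value through untouched, so the backslash survives.
with_env=$(var="$variable" awk 'BEGIN { print ENVIRON["var"] }')

printf '%s\n' "$with_v" "$with_env"
```

Prefixing the awk command with var="$variable" places the value in awk's environment for that one invocation only.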
PS If you have vertical bar or other regexp meta characters as separator like |?( etc, they must be double escaped. Example 3 vertical bars ||| becomes -F'\\|\\|\\|'. You can also use -F"[|][|][|]".
Example of getting data from a program/function into awk (here date is used):
awk -v time="$(date +"%F %H:%M" -d '-1 minute')" 'BEGIN {print time}'
Example of testing the contents of a shell variable as a regexp:
awk -v var="$variable" '$0 ~ var{print "found it"}'
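Because the variable is used as a regexp in that test, characters such as . match more than themselves. If a literal substring match is what is wanted, index() is one possible alternative. A hedged sketch (the sample input and the names regex_hits and literal_hits are made up for illustration):

```shell
variable='a.c'

# As a regexp, the dot matches any character: both "abc" and "a.c" match.
regex_hits=$(printf 'abc\na.c\n' |
  awk -v var="$variable" '$0 ~ var { n++ } END { print n + 0 }')

# index() does a plain substring search: only "a.c" matches.
literal_hits=$(printf 'abc\na.c\n' |
  awk -v var="$variable" 'index($0, var) { n++ } END { print n + 0 }')

echo "regex: $regex_hits, literal: $literal_hits"
```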
Variable after code block
Here we get the variable after the awk code. This will work fine as long as you do not need the variable in the BEGIN block:
variable="line one\nline two"
echo "input data" | awk '{print var}' var="${variable}"
or
awk '{print var}' var="${variable}" file
Adding multiple variables:
awk '{print a,b,$0}' a="$var1" b="$var2" file
In this way we can also set different Field Separator FS for each file.
awk 'some code' FS=',' file1.txt FS=';' file2.ext
Variable after the code block will not work for the BEGIN block:
echo "input data" | awk 'BEGIN {print var}' var="${variable}"
Here-string
Variable can also be added to awk using a here-string from shells that support them (including Bash):
awk '{print $0}' <<< "$variable"
test
This is the same as:
printf '%s' "$variable" | awk '{print $0}'
P.S. this treats the variable as a file input.
ENVIRON input
As TrueY writes, you can use the ENVIRON to print Environment Variables.
If you export a variable before running awk, you can print it out like this:
export X=MyVar
awk 'BEGIN{print ENVIRON["X"],ENVIRON["SHELL"]}'
MyVar /bin/bash
ARGV input
As Steven Penny writes, you can use ARGV to get the data into awk:
v="my data"
awk 'BEGIN {print ARGV[1]}' "$v"
my data
To get the data into the code itself, not just the BEGIN:
v="my data"
echo "test" | awk 'BEGIN{var=ARGV[1];ARGV[1]=""} {print var, $0}' "$v"
my data test
Variable within the code: USE WITH CAUTION
You can use a variable within the awk code, but it's messy and hard to read, and as Charles Duffy points out, this version may also be a victim of code injection. If someone adds bad stuff to the variable, it will be executed as part of the awk code.
This works by extracting the variable within the code, so it becomes a part of it.
If you want to make an awk that changes dynamically with use of variables, you can do it this way, but DO NOT use it for normal variables.
variable="line one\nline two"
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
Here is an example of code injection:
variable='line one\nline two" ; for (i=1;i<=1000;++i) print i"'
awk 'BEGIN {print "'"$variable"'"}'
line one
line two
1
2
3
.
.
1000
You can add lots of commands to awk this way, or even make it crash with invalid commands.
One valid use of this approach, though, is when you want to pass a symbol to awk to be applied to some input, e.g. a simple calculator:
$ calc() { awk -v x="$1" -v z="$3" 'BEGIN{ print x '"$2"' z }'; }
$ calc 2.7 '+' 3.4
6.1
$ calc 2.7 '*' 3.4
9.18
There is no way to do that using an awk variable populated with the value of a shell variable, you NEED the shell variable to expand to become part of the text of the awk script before awk interprets it. (see comment below by Ed M.)
Extra info:
Use of double quote
It's always good to double quote variable "$variable"
If not, multiple lines will be added as a long single line.
Example:
var="Line one
This is line two"
echo $var
Line one This is line two
echo "$var"
Line one
This is line two
Other errors you can get without double quote:
variable="line one\nline two"
awk -v var=$variable 'BEGIN {print var}'
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ backslash not last character on line
awk: cmd. line:1: one\nline
awk: cmd. line:1: ^ syntax error
And with single quote, it does not expand the value of the variable:
awk -v var='$variable' 'BEGIN {print var}'
$variable
More info about AWK and variables
Read this faq.
It seems that the good-old ENVIRON awk built-in hash is not mentioned at all. An example of its usage:
$ X=Solaris awk 'BEGIN{print ENVIRON["X"], ENVIRON["TERM"]}'
Solaris rxvt
You could pass in the command-line option -v with a variable name (v) and a value (=) of the environment variable ("${v}"):
% awk -vv="${v}" 'BEGIN { print v }'
123test
Or to make it clearer (with far fewer vs):
% environment_variable=123test
% awk -vawk_variable="${environment_variable}" 'BEGIN { print awk_variable }'
123test
You can utilize ARGV:
v=123test
awk 'BEGIN {print ARGV[1]}' "$v"
Note that if you are going to continue into the body, you will need to adjust
ARGC:
awk 'BEGIN {ARGC--} {print ARGV[2], $0}' file "$v"
I just adapted @Jotne's answer for use in a for loop:
for i in $(seq 11 20); do host "myserver-$i" | awk -v i="$i" '{print "myserver-" i " " $4}'; done
I had to insert date at the beginning of the lines of a log file and it's done like below:
DATE=$(date +"%Y-%m-%d")
awk '{ print "'"$DATE"'", $0; }' /path_to_log_file/log_file.log
The output can be redirected to another file to save it.
Pro Tip
It can be handy to create a function that handles this so you don't have to type everything every time. Using the selected solution we get:
awk_switch_columns() {
    # Swap the two columns whose numbers are given as $1 and $2; reads stdin.
    awk -v a="$1" -v b="$2" '{ t = $a; $a = $b; $b = t; print }'
}
And use it as...
echo 'a b c d' | awk_switch_columns 2 4
Output:
a d c b

How to check the type of an awk variable?

The Beta release of gawk 4.2.0, available in http://www.skeeve.com/gawk/gawk-4.1.65.tar.gz is a major release, with many significant new features.
I previously asked about What is the behaviour of FS = " " in GNU Awk 4.2?, and now I noticed the brand new typeof() function to deprecate isarray():
Changes from 4.1.4 to 4.2.0
The new typeof() function can be used to indicate if a variable or array element is an array, regexp, string or number. The isarray() function is deprecated in favor of typeof().
I could cover four cases: string, number, array and unassigned:
$ awk 'BEGIN {print typeof("a")}'
string
$ awk 'BEGIN {print typeof(1)}'
number
$ awk 'BEGIN {print typeof(a[1])}'
unassigned
$ awk 'BEGIN {a[1]=1; print typeof(a)}'
array
However, I struggle to get "regexp" since none of my attempts reach that and always yield "number":
$ awk 'BEGIN {print typeof(/a/)}'
number
$ awk 'BEGIN {print typeof(/a*/)}'
number
$ awk 'BEGIN {print typeof(/a*d/)}'
number
$ awk 'BEGIN {print typeof(!/a*d/)}'
number
$ awk -v var="/a/" 'BEGIN{print typeof(var)}'
string
$ awk -v var=/a/ 'BEGIN{print typeof(var)}'
string
How can I get a variable to be defined as "regexp"?
I noticed the previous bullet:
Gawk now supports strongly typed regexp constants. Such constants look like @/.../. You can assign them to variables, pass them to functions, use them in ~, !~ and the case part of a switch statement. More details are provided in the manual.
And tried a bit, but with no luck:
$ awk -v pat=@/a/ '{print typeof(pat)}' <<< "bla ble"
string
typeof(/a/) is running typeof() on the result of $0 ~ /a/, which is a number. I haven't tried this yet myself, but I'd expect this to be what you're looking for:
typeof(@/a/)
and
var = @/a/
typeof(var)
So this works:
$ awk 'BEGIN {print typeof(@/a/)}'
regexp
$ awk 'BEGIN {var=@/a/; print typeof(var)}'
regexp

Awk print string with variables

How do I print a string with variables?
Trying this
awk -F ',' '{printf /p/${3}_abc/xyz/${5}_abc_def/}' file
Need this at output
/p/APPLE_abc/xyz/MANGO_abc_def/
where ${3} = APPLE
and ${5} = MANGO
printf allows interpolation of variables. With this as the test file:
$ cat file
a,b,APPLE,d,MANGO,f
We can use printf to achieve the output you want as follows:
$ awk -F, '{printf "/p/%s_abc/xyz/%s_abc_def/\n",$3,$5;}' file
/p/APPLE_abc/xyz/MANGO_abc_def/
In printf, the string %s means insert-a-variable-here-as-a-string. We have two occurrences of %s, one for $3 and one for $5.
Not as readable, but the printf isn't necessary here. Awk can insert the variables directly into the strings if you quote the string portion.
$ cat file.txt
1,2,APPLE,4,MANGO,6,7,8
$ awk -F, '{print "/p/" $3 "_abc/xyz/" $5 "_abc_def/"}' file.txt
/p/APPLE_abc/xyz/MANGO_abc_def/

Counting the number of specific values in a column with awk

I have data (data.csv):
"1",5.1,"s"
"2",3.3,"s"
"3",2.7,"c"
and I want to count the number of lines whose 3rd element is "s" or "c" with AWK (count.awk):
BEGIN{FS=","; s_count=0; c_count=0}
($3=="s"){s_count++}
($3=="c"){c_count++}
END{print s_count; print c_count}
then
$ awk -f count.awk data.csv
but this does not work. Its output is:
0
0
This is not what I expected. Why?
$ awk -V
GNU Awk 4.1.0, API: 1.0 (GNU MPFR 3.1.2, GNU MP 5.1.2)
Note: I use Awk on cygwin.
The problem is that your target field has embedded double quotes, so you need to match them too, by including them - \-escaped - in the string to match against:
awk '
BEGIN{FS=","; s_count=0; c_count=0}
($3=="\"s\"") {s_count++}
($3=="\"c\"") {c_count++}
END{ print s_count; print c_count }
' data.csv
As an aside, you can simplify your awk program somewhat:
the parentheses are not needed (have not verified on cygwin, but given that it's awk interpreting the string, I wouldn't expect that to matter)
you don't strictly need to initialize your output variables, because awk treats uninitialized variables as 0 in numeric contexts (and as the empty string in string contexts).
BEGIN{FS=","}
$3 == "\"s\"" {s_count++}
$3 == "\"c\"" {c_count++}
END{ print s_count; print c_count }
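One caveat with dropping the initialization: print is not a numeric context, so a counter that was never incremented prints as an empty line rather than 0. Adding 0 forces a number. A small sketch using a cut-down, hypothetical sample in which no line ends in "c":

```shell
data='"1",5.1,"s"
"2",3.3,"s"'

# c_count is never incremented here; "+ 0" makes it print as 0 instead of an empty line.
counts=$(printf '%s\n' "$data" | awk -F, '
  $3 == "\"s\"" { s_count++ }
  $3 == "\"c\"" { c_count++ }
  END { print s_count + 0; print c_count + 0 }
')

printf '%s\n' "$counts"
```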
This is a job for an array. Here is an awk command:
awk -F, '{gsub(/\"/,"",$3);a[$3]++} END {for (i in a) print i,a[i]}' file
c 1
s 2
It counts the number of c and s occurrences. Also counts other letters if they exist.