Using awk and printf: not rounding correctly [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 6 years ago.
I have an issue where I'm using printf to round a float to the proper number of decimal places, and I'm getting inconsistent results, as shown below.
echo 104.45 | awk '{printf "%.1f\n",$1}'
104.5 <-- seems to be correct behaviour
echo 104.445 | awk '{printf "%.2f\n",$1}'
104.44 (should be 104.45) <-- seems to be INCORRECT behaviour
echo 104.4445 | awk '{printf "%.3f\n",$1}'
104.445 <-- seems to be correct behaviour
I've seen examples where floating-point numbers cause problems in calculations, but I did not expect this with formatting.

The number 104.4445 (like 104.445) cannot be represented exactly as a binary number. In other words, your computer doesn't know such a number; it stores the nearest value it can represent, as %.20f reveals:
# echo 104.4445 | awk '{printf "%.20f\n",$1}'
104.44450000000000500222
# echo 104.445 | awk '{printf "%.20f\n",$1}'
104.44499999999999317879
That's why the former (stored slightly above 104.4445) is rounded to 104.445, while the latter (stored slightly below 104.445) is rounded to 104.44.
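The same effect can be reproduced outside awk. This is a small Python sketch for illustration only: like awk, Python stores these values as IEEE 754 doubles, Decimal(float) converts a double exactly (exposing the stored binary value), and format() rounds that exact stored value, as glibc's printf does.

```python
from decimal import Decimal

# (input text, decimal places) pairs from the question
cases = [("104.45", 1), ("104.445", 2), ("104.4445", 3)]

for text, places in cases:
    value = float(text)        # the double awk works with after reading the field
    exact = Decimal(value)     # Decimal(float) is exact: it shows the stored binary value
    side = "above" if exact > Decimal(text) else "below"
    rounded = format(value, f".{places}f")   # rounds the exact stored value
    print(f"{text} is stored {side} its decimal value -> {rounded}")
```

Only 104.445 is stored below its decimal value, which is why it is the only case that rounds down.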
sjsam's answer is relevant only to numbers that can be represented exactly as a binary number, i.e. m/2**n, where m and n are integers that are not too big. Changing ROUNDMODE to "A" has absolutely no effect on printing 104.45, 104.445, or 104.4445:
# echo 104.4445 | awk -v ROUNDMODE="A" '{printf "%.3f\n",$1}'
104.445
# echo 104.4445 | awk '{printf "%.3f\n",$1}'
104.445
# echo 104.445 | awk -v ROUNDMODE="A" '{printf "%.2f\n",$1}'
104.44
# echo 104.445 | awk '{printf "%.2f\n",$1}'
104.44
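A tie-breaking rule can only change the result for an exact tie, which needs a value of the form m/2**n. Here is a Python sketch contrasting a true tie (0.125 = 1/2**3) with 104.445; it uses the decimal module's tie-breaking constants purely for illustration and is not gawk's ROUNDMODE itself.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

def round_exact(x: float, quantum: str, mode: str) -> Decimal:
    """Round the exact stored value of x to the given quantum (e.g. "0.01")."""
    return Decimal(x).quantize(Decimal(quantum), rounding=mode)

# 0.125 is an exact tie at two places; 104.445 is stored just below one.
for x in (0.125, 104.445):
    print(x,
          round_exact(x, "0.01", ROUND_HALF_EVEN),  # ties go to the even digit
          round_exact(x, "0.01", ROUND_HALF_UP))    # ties go away from zero
```

For 0.125 the two rules disagree (0.12 vs 0.13); for 104.445 both give 104.44, because the stored value is not a tie at all.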

I tried something analogous in Python and got similar results to you:
>>> round(104.445, 2)
104.44
>>> round(104.4445, 3)
104.445
This seems to be run-of-the-mill floating-point wonkiness, especially considering that the floating-point representation of 104.445 is less than the actual mathematical value of 104.445:
>>> 104.445 - 104.44
0.0049999999999954525
>>> 104.445 - 104.44 + 104.44
104.445
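If what you actually need is half-up rounding of the number exactly as written, one option (a sketch, not something awk or round() does) is to keep the value as text and round it with Python's decimal module, so it never passes through a binary float. The helper name round_decimal_string is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_decimal_string(text: str, places: int) -> str:
    """Round a decimal numeral given as text; ties go away from zero."""
    quantum = Decimal(1).scaleb(-places)  # 10**-places, e.g. 1E-2 for places=2
    return str(Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP))

for text, places in [("104.45", 1), ("104.445", 2), ("104.4445", 3)]:
    print(text, "->", round_decimal_string(text, places))
```

Because Decimal("104.445") is exactly 104.445, the tie is real here and rounds up to 104.45, unlike round(104.445, 2).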

I strongly suspect that the reason for this behavior has less to do with awk than with how computers store numbers. As user31264 states: "Your computer doesn't know such a number [as 104.4445]."
Here are the results of an experiment I just did with the JavaScript Scratchpad in the Pale Moon web browser:
(104.45).toFixed(55)
/*
104.4500000000000028421709430404007434844970703125000000000
*/
(104.445).toFixed(55)
/*
104.4449999999999931787897367030382156372070312500000000000
*/
(104.4445).toFixed(55)
/*
104.4445000000000050022208597511053085327148437500000000000
*/
In all probability, your awk interpreter is not dealing with the decimal numbers 104.45, etc., but rather with the "wonky" values shown here. Rounding the first, second, and third of these "wonky" values to, respectively, 1, 2, and 3 decimal places will give the same results as your awk interpreter is giving you.

Related

Generating 10 random numbers in a range in an awk script

I'm trying to write an awk script that generates passwords from names read from a .csv file. I'm aiming for the first 3 letters of the last name, then the number of characters in the fourth field, then, after a space, a random number between 1 and 200. So far I've got the letters and the number of characters fine, but I'm having a hard time getting the syntax in my for loop to work for the random numbers. Here is an example of the input:
Danette,Suche,Female,"Kingfisher, malachite"
Corny,Chitty,Male,"Seal, southern elephant"
And desired output:
Suc21 80
Chi23 101
For 10 rows total. My code looks like this:
BEGIN{
FS=",";OFS=","
}
{print substr($2,0,3)length($4)
for(i=0;i<10;i++){
echo $(( $RANDOM % 200 ))
}}
Then I've been running it like
awk -F"," -f script.awk file.csv
But it only shows the 3 characters and the length of the fourth field, no random numbers. If anyone is able to point out where I'm screwing up, it would be much appreciated. Thanks, guys!
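You can use awk's built-in rand(), which returns a random number from 0 (inclusive) up to 1 (exclusive), and scale it to the range 1 to 200. Calling srand() once seeds the generator, so each run gives different numbers; awk strings are indexed from 1, so substr($2,1,3) takes the first three characters:
awk -F, 'BEGIN{srand()} {print substr($2,1,3) length($4), int(rand()*200)+1}' file.csv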
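Note that with -F, the quoted fourth field is split at its embedded comma, so $4 is only "Kingfisher (11 characters, opening quote included), while the desired output (Suc21, Chi23) counts the whole field without quotes. For comparison, here is a Python sketch using the standard csv module, which honours the quotes; the function name make_passwords is hypothetical, and random.randint is inclusive on both ends.

```python
import csv
import io
import random

def make_passwords(rows, rng):
    """Return '<first 3 letters of field 2><length of field 4> <1-200>' per row."""
    lines = []
    for _first, last, _gender, description in rows:
        number = rng.randint(1, 200)   # inclusive on both ends, like int(rand()*200)+1
        lines.append(f"{last[:3]}{len(description)} {number}")
    return lines

# Sample rows from the question; for a real file use open("file.csv", newline="").
sample = io.StringIO(
    'Danette,Suche,Female,"Kingfisher, malachite"\n'
    'Corny,Chitty,Male,"Seal, southern elephant"\n'
)
for line in make_passwords(csv.reader(sample), random.Random()):
    print(line)
```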
There is no echo function in GNU AWK ($(( ... )) and $RANDOM are shell syntax, not awk). If you wish to use a shell command, you can use the system() function; however, keep in mind that it returns the command's exit status, while whatever the command prints goes straight to the output without any way to alter it, so you need to design the command so that it produces the desired output itself.
Let file.txt content be
A
B
C
then
awk '{printf "%s ",$0;system("echo ${RANDOM}%200 | bc")}' file.txt
might give output
A 95
B 139
C 1
Explanation: first I use printf, so that no newline is appended automatically, to output the whole line followed by a space; then I execute a command which outputs a random value in the range 0 to 199:
echo ${RANDOM}%200 | bc
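It simply feeds the value of RANDOM followed by %200 into the bc calculator, which prints the result. Note that system() runs the command with /bin/sh, so this relies on that shell providing RANDOM (bash does, dash does not).
If you are not dead set on using the RANDOM variable, awk's rand() function can be used without this hassle.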
(tested with gawk 4.2.1 and bc 1.07.1)

Strange behavior of awk when converting hexa to decimal

Can someone explain why two different hex numbers are converted to the same decimal?
$ echo A0000044956EA2 | gawk '{print strtonum("0x" $1)}'
45035997424348832
$ echo A0000044956EA0 | gawk '{print strtonum("0x" $1)}'
45035997424348832
Starting with GNU awk 4.1, you can use --bignum (or -M) for arbitrary-precision arithmetic (this requires a gawk built with MPFR support):
$ awk -M 'BEGIN {print 0xA0000044956EA2}'
45035997424348834
$ awk -M 'BEGIN {print 0xA0000044956EA0}'
45035997424348832
See the "Command-Line Options" section of the gawk manual.
Not so much an answer as a workaround, so as not to bin the strtonum function completely:
It does seem to be the doubles. I found the calculation here: strtonum.
Nothing wrong with it.
However, if you really need this in some awk, you could strip the last digit from the hex number and add it back manually after strtonum has done its calculation on the main part.
So 0xA0000044956EA1, 0xA0000044956EA2 and 0xA0000044956EA"whatever" should all become 0xA0000044956EA0 with a simple regex, and then you add the "whatever".
Edit: Maybe I should delete this altogether, as I have to downgrade it even further. This is not working to satisfaction either: I just tried it, and I actually can't add a number that small to a number this big, i.e. print (45035997424348832 + 4) just comes out as 45035997424348832. So this workaround will have to stay with output like 45035997424348832 + 4 for hex 0xA0000044956EA4.
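Both answers come down to the same limit: an IEEE 754 double has a 53-bit significand, so above 2**53 not every integer is representable. This Python sketch (Python 3.9+ for math.ulp; Python ints are arbitrary precision, playing the role of gawk -M here) shows that near these values adjacent doubles are 8 apart, which also explains why adding 4 is lost.

```python
import math

a = int("A0000044956EA2", 16)   # exact integer, as gawk -M computes it
b = int("A0000044956EA0", 16)

print(a, b)                      # two distinct integers
print(float(a) == float(b))      # both round to the same double
print(math.ulp(float(b)))        # gap between adjacent doubles at this magnitude
print(float(b) + 4 == float(b))  # halfway case: ties-to-even rounds back down
```

So splitting off the last hex digit cannot help while the result is still stored in a double; exact results need gawk -M or another big-integer tool.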

floating point calculations in awk

I am surprised by the behaviour of awk when performing floating-point calculations. It led me to wrong calculations on table data.
$ awk 'BEGIN {print 2.3/0.1}'
23 <-- Ok
$ awk 'BEGIN {print int(2.3/0.1)}'
22 <-- Wrong!
$ awk 'BEGIN {print 2.3-2.2==0.1}'
0 <-- Surprise!
$ awk 'BEGIN {print 2.3-2.2>0.1}' <-- Didn't produce any output :(
$ awk 'BEGIN {print 2.3-2.2<0.1}'
1 <-- Totally confused now ...
Can somebody throw light as to what's happing here?
EDIT 1
As pointed out by fedorqui, the output of the second-to-last command goes to a file named 0.1 because of the redirection operator (>).
Then how am I supposed to perform a greater-than (>) comparison?
The solution to that was also given by fedorqui:
$ awk 'BEGIN {print (2.3-2.2>0.1)}'
0 <-- Wrong!
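These results can be reproduced with the same double arithmetic in Python (used here only for illustration). awk's print formats non-integral numbers with OFMT="%.6g", which hides the error that int() then exposes by truncating. A common workaround, sketched below as an assumption rather than anything awk provides, is to round instead of truncating and to compare against a small tolerance; the EPS value is hypothetical.

```python
q = 2.3 / 0.1
print("%.6g" % q)          # what awk's print shows, via OFMT="%.6g"
print(int(q))              # int() truncates the slightly-too-small quotient
print(round(q))            # rounding to nearest recovers the intended 23

d = 2.3 - 2.2
print(d < 0.1)             # the stored difference is just below 0.1

EPS = 1e-9                 # hypothetical tolerance; pick one that suits your data
print(abs(d - 0.1) < EPS)  # "equal within tolerance"
```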
The following section from the manual should help you understand the issue you're observing:
15.1.1.2 Floating Point Numbers Are Not Abstract Numbers
Unlike numbers in the abstract sense (such as what you studied in high
school or college arithmetic), numbers stored in computers are limited
in certain ways. They cannot represent an infinite number of digits,
nor can they always represent things exactly. In particular,
floating-point numbers cannot always represent values exactly. Here is
an example:
$ awk '{ printf("%010d\n", $1 * 100) }'
515.79
-| 0000051579
515.80
-| 0000051579
515.81
-| 0000051580
515.82
-| 0000051582
Ctrl-d
This shows that some values can be represented exactly, whereas others
are only approximated. This is not a “bug” in awk, but simply an
artifact of how computers represent numbers.
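The manual's example can be mimicked in Python (illustration only). In the manual's output, %d truncates the non-integral product toward zero, like Python's int(), so a product that lands just below a whole number loses one; rounding to the nearest integer first avoids that (in awk, for instance, with a format such as "%010.0f").

```python
# Inputs from the gawk manual example; each is read as a double, then scaled.
for text in ("515.79", "515.80", "515.81", "515.82"):
    cents = float(text) * 100
    truncated = int(cents)        # truncation, as in the manual's %d output
    rounded = round(cents)        # round to nearest before converting
    print(f"{truncated:010d} {rounded:010d}")
```

The first column matches the manual's output; the second gives the intended 51579, 51580, 51581, 51582.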
Highly recommended reading:
What Every Computer Scientist Should Know About Floating-Point Arithmetic

How can I make awk not use scientific notation when printing small values?

In the following awk command
awk '{sum+=$1; ++n} END {avg=sum/n; print "Avg monitoring time = "avg}' file.txt
What should I change to remove the scientific notation from the output (very small values are displayed as 1.5e-05)?
I was not able to succeed with the OMFT variable.
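You should use the printf AWK statement. That way you can specify padding, precision, etc. In your case, the %f control letter seems most appropriate, e.g. printf "Avg monitoring time = %f\n", avg. (OFMT alone would not help in your command, because concatenating avg with a string converts it using CONVFMT, not OFMT.)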
I was not able to succeed with the OMFT variable.
It is actually OFMT (output format), so for example:
awk 'BEGIN{OFMT="%f";print 0.000015}'
will output:
0.000015
as opposed to:
awk 'BEGIN{print 0.000015}'
which output:
1.5e-05
The GNU AWK manual notes that, to be POSIX-compliant, OFMT should be a floating-point conversion specification.
Setting -v OFMT='%f' (without having to embed it into my awk statement) worked for me in the case where all I wanted from awk was to sum columns of arbitrary floating point numbers.
As the OP found, awk produces exponential notation with very small numbers,
$ some_accounting_widget | awk '{sum+=$0} END{print sum+0}'
8.992e-07 # Not useful to me
Setting OFMT fixed that, but also rounded too aggressively,
$ some_accounting_widget | awk -v OFMT='%f' '{sum+=$0} END{print sum+0}'
0.000001 # Oops. Rounded off too much. %f rounds to 6 decimal places by default.
Specifying the number of decimal places got me what I needed,
$ some_accounting_widget | awk -v OFMT='%.10f' '{sum+=$0} END{print sum+0}'
0.0000008992 # Perfect.
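Since OFMT takes C printf-style conversions, the effect of each setting tried in this thread can be previewed with Python's % operator, which follows the same conversion rules for these cases. This is a sketch for illustration; the values are the ones quoted above, and "%.6g" is awk's default OFMT.

```python
# awk's default OFMT is "%.6g"; the other formats are the ones tried above.
samples = [
    ("%.6g", 0.000015),
    ("%f", 0.000015),
    ("%.6g", 8.992e-07),
    ("%f", 8.992e-07),
    ("%.10f", 8.992e-07),
]
for fmt, value in samples:
    print(f"{fmt:<6} {fmt % value}")
```

%g switches to exponential notation for exponents below -4, and plain %f keeps only 6 decimal places, which is why an explicit precision such as %.10f was needed.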

AWK, printf format specifier %x has problems with negative values

It seems AWK has problems with the unsigned hex format specifier:
echo 0x80000000 | awk '{printf("0x%08x\n", $1)}'
gives back: 0x7fffffff
Is this a known problem with awk?
Thanks!
The problem is that awk only converts input fields to numbers automatically if they are decimal. But this should work:
echo 0x80000000 | awk '{printf("0x%08x\n", strtonum($1))}'
It's all explained here, in the strtonum section:
http://www.gnu.org/manual/gawk/html_node/String-Functions.html#String-Functions
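For illustration, the same conversion in Python, where int(text, 16) plays the role of gawk's strtonum(). 0x80000000 is 2**31, one more than 0x7fffffff, the largest signed 32-bit value, so the reported 0x7fffffff looks like the value being clamped to that limit; that reading is an assumption, not something the thread confirms.

```python
field = "0x80000000"             # the input field from the question
value = int(field, 16)           # explicit hex parse, akin to strtonum()
print(value)                     # 2**31
print("0x%08x" % value)          # the format string from the question
print(value > 0x7fffffff)        # larger than the signed 32-bit maximum
```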
I'm not seeing it here. I wasn't able to use the hex input the way you did, but after converting it to decimal there was no problem:
$ echo 2147483648 | awk '{printf("0x%08x\n", $1)}'
0x80000000
If you care to enlighten us what platform you're on (this was GNU awk 3.1.5), we might be able to help you more.