AWK, printf format specifier %x has problems with negative values - awk

It seems AWK has problems with the unsigned hex format specifier:
echo 0x80000000 | awk '{printf("0x%08x\n", $1)}'
gives back: 0x7fffffff
Is this a known problem with awk?
Thanks!

The problem is that awk only converts input parameters to numbers automatically if they are decimal. But this should work:
echo 0x80000000 | awk '{printf("0x%08x\n", strtonum($1))}'
It's all explained in here, in the strtonum section:
http://www.gnu.org/manual/gawk/html_node/String-Functions.html#String-Functions
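The parse-then-format idea behind the strtonum call can be sketched outside awk as well. The Python snippet below is only an illustration, and the helper name format_hex32 is made up for this example: it converts the hex string to an integer explicitly before applying the same %08x format.

```python
def format_hex32(text):
    """Parse a hex string such as '0x80000000' and format it as 0x%08x."""
    # Explicit conversion, comparable to strtonum() in gawk.
    # int(..., 16) accepts an optional 0x prefix.
    value = int(text, 16)
    return "0x%08x" % value


print(format_hex32("0x80000000"))  # 0x80000000
```

The formatting step only behaves once the input has become a number; that conversion is what the original command left to awk's automatic rules.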

I can't reproduce it here. I wasn't able to use hex input the way you do, but with the value converted to decimal there was no problem.
$ echo 2147483648 | awk '{printf("0x%08x\n", $1)}'
0x80000000
If you care to enlighten us what platform you're on (this was GNU awk 3.1.5), we might be able to help you more.


using awk and printf not rounding correctly [duplicate]

I have an issue where I'm using printf to round a float to the proper number of decimal points. I'm getting inconsistent results as shown below.
echo 104.45 | awk '{printf "%.1f\n",$1}'
104.5 <-- seem to be correct behaviour
echo 104.445 | awk '{printf "%.2f\n",$1}'
104.44 (should be 104.45) <-- seems to be INCORRECT behaviour
echo 104.4445 | awk '{printf "%.3f\n",$1}'
104.445 <-- seems to be correct behaviour
I've seen examples where float number in calculations may cause problems, but did not expect this with formatting.
The number 104.4445 cannot be represented exactly as a binary number. In other words, your computer doesn't know such a number.
# echo 104.4445 | awk '{printf "%.20f\n",$1}'
104.44450000000000500222
# echo 104.445 | awk '{printf "%.20f\n",$1}'
104.44499999999999317879
That's why the former is rounded to 104.445, while the latter is rounded to 104.44 .
sjsam's answer is relevant only to numbers which can be represented exactly as a binary number, i.e. m/2**n, where m and n are integers and not too big. Changing ROUNDMODE to "A" has absolutely no effect on printing 104.45, 104.445, or 104.4445:
# echo 104.4445 | awk -v ROUNDMODE="A" '{printf "%.3f\n",$1}'
104.445
# echo 104.4445 | awk '{printf "%.3f\n",$1}'
104.445
# echo 104.445 | awk -v ROUNDMODE="A" '{printf "%.2f\n",$1}'
104.44
# echo 104.445 | awk '{printf "%.2f\n",$1}'
104.44
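The stored values can also be inspected exactly. The following Python sketch is an illustration rather than anything awk-specific, and the helper names stored_value and round_half_up are invented for it. Decimal(float) shows the exact binary value the machine holds, and rounding the decimal string instead of the float gives the "schoolbook" result the question expected.

```python
from decimal import Decimal, ROUND_HALF_UP


def stored_value(x):
    """Return the exact value of the double nearest to x, as a Decimal."""
    return Decimal(x)


def round_half_up(text, places):
    """Round a decimal string half-up, without going through binary floats."""
    quantum = Decimal(10) ** -places  # places=2 gives Decimal('0.01')
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP)


print(stored_value(104.445))        # slightly below 104.445
print(stored_value(104.4445))       # slightly above 104.4445
print("%.2f" % 104.445)             # 104.44, same as awk's printf
print(round_half_up("104.445", 2))  # 104.45
```

The second helper only works because it starts from the text "104.445"; once the value has been read into a double, the information that it was meant to be exactly 104.445 is gone.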
I tried something analogous in Python and got similar results to you:
>>> round(104.445, 2)
104.44
>>> round(104.4445, 3)
104.445
This seems to be run-of-the-mill wonky floating point wonkiness, especially considering that the floating-point representation of 104.445 is less than the actual mathematical value of 104.445:
>>> 104.445 - 104.44
0.0049999999999954525
>>> 104.445 - 104.44 + 104.44
104.445
I strongly suspect that the reason for this behavior has less to do with awk than with how computers store numbers. As user31264 states: "Your computer doesn't know such a number [as 104.4445]."
Here are the results of an experiment I just did with the JavaScript Scratchpad in Pale Moon Web browser:
(104.45).toFixed(55)
/*
104.4500000000000028421709430404007434844970703125000000000
*/
(104.445).toFixed(55)
/*
104.4449999999999931787897367030382156372070312500000000000
*/
(104.4445).toFixed(55)
/*
104.4445000000000050022208597511053085327148437500000000000
*/
In all probability, your awk interpreter is not dealing with the decimal numbers 104.45, etc., but rather with the "wonky" values shown here. Rounding the first, second, and third of these "wonky" values to, respectively, 1, 2, and 3 decimal places will give the same results as your awk interpreter is giving you.

remove decimal places in strings ids using awk

I want to remove the decimal places in strings from a list of identifiers:
ENSG00000166224.12
ENSG00000102897.5
ENSG00000168496.3
ENSG00000010295.15
ENSG00000147533.12
ENSG00000119242.4
My desired output will be
ENSG00000166224
ENSG00000102897
ENSG00000168496
ENSG00000010295
ENSG00000147533
ENSG00000119242
I would like to do it with awk, I have been playing with printf but with no success.
UPDATE:
The awk answer setting the field separator to . works well in files with only one column, but what if the file is composed of different columns (strings and float numbers)?
Here is an example:
ENSG00000166224.12 0.0730716237772557 -0.147970450702234
ENSG00000102897.5 0.156405616866614 -0.0398488625782745
ENSG00000168496.3 -0.110396121325736 -0.0147093758392248
How can I remove only the decimal places in the first field?
Thanks
You can set the field separator to the dot and print the first element:
$ awk -F. '{print $1}' file
ENSG00000166224
ENSG00000102897
ENSG00000168496
ENSG00000010295
ENSG00000147533
ENSG00000119242
In sed you would say sed 's/\.[^\.]*$//' file, which will catch everything from the last dot on and remove it.
You would be able to do it with printf if it were just a number: a format such as %d prints the integer part without decimal places. However, since it is an alphanumeric string, it is best to handle it as a string.
Update
Use gsub to replace everything from . in the first field:
$ awk '{gsub(/\..*$/,"",$1)}1' a
ENSG00000166224 0.0730716237772557 -0.147970450702234
ENSG00000102897 0.156405616866614 -0.0398488625782745
ENSG00000168496 -0.110396121325736 -0.0147093758392248
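You can also use the sub function. With no third argument it operates on the whole line, so this form suits the single-column file: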
awk '{sub(/\..*/, "")}1' file
Using cut:
$ cut -d. -f1 file
ENSG00000166224
ENSG00000102897
ENSG00000168496
ENSG00000010295
ENSG00000147533
ENSG00000119242
If you are looking for a solution in Perl:
perl -pe 's/\..*$//' file.txt
This removes everything from the first dot to the end of each line.
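For the multi-column case from the update, the same "treat it as a string, touch only the first field" approach can be sketched in Python. This is a minimal illustration, with strip_id_version as a made-up helper name, and it assumes whitespace-separated columns as in the sample data.

```python
def strip_id_version(line):
    """Drop everything from the first dot in the first field; keep other fields."""
    fields = line.split()
    if fields:
        fields[0] = fields[0].split(".", 1)[0]
    # Like awk after modifying $1, the line is rebuilt with single spaces.
    return " ".join(fields)


sample = "ENSG00000166224.12 0.0730716237772557 -0.147970450702234"
print(strip_id_version(sample))
# ENSG00000166224 0.0730716237772557 -0.147970450702234
```

The float columns are never converted to numbers here, so their digits pass through unchanged.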

Strange behavior of awk when converting hexa to decimal

Can someone explain why two different hex values are converted to the same decimal number?
$ echo A0000044956EA2 | gawk '{print strtonum("0x" $1)}'
45035997424348832
$ echo A0000044956EA0 | gawk '{print strtonum("0x" $1)}'
45035997424348832
Starting with GNU awk 4.1 you can use --bignum or -M
$ awk -M 'BEGIN {print 0xA0000044956EA2}'
45035997424348834
$ awk -M 'BEGIN {print 0xA0000044956EA0}'
45035997424348832
See the Command-Line Options section of the gawk manual.
Not so much an answer as a workaround, so that the strtonum function need not be abandoned completely:
It does indeed seem to be down to doubles. I found the calculation here: strtonum.
Nothing wrong with it.
However, if you really need this in awk, you could strip the last digit from the hex number and add it back manually after strtonum has converted the main part.
So 0xA0000044956EA1, 0xA0000044956EA2 and 0xA0000044956EA"whatever" should all become 0xA0000044956EA0 with a simple regex, and then you add the "whatever".
Edit: Maybe I should delete this altogether, as I have to downgrade it even further. It does not work satisfactorily either: I just tried it, and a number that small cannot be added to a number this big, i.e. print (45035997424348832 + 4) just comes out as 45035997424348832. So this workaround is left with output like 45035997424348832 + 4 for hex 0xA0000044956EA4.
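Why the two inputs collapse can be checked with a short Python sketch (an illustration only; as_double is a name invented here). A double carries 53 significant bits, while these 14-digit hex values need 56, so near this magnitude only multiples of 8 are representable.

```python
def as_double(hex_digits):
    """Return (exact integer, integer after a round trip through a double)."""
    exact = int(hex_digits, 16)  # Python integers are arbitrary precision
    rounded = int(float(exact))  # what a double-based awk ends up storing
    return exact, rounded


for digits in ("A0000044956EA0", "A0000044956EA2"):
    print(digits, *as_double(digits))
# A0000044956EA0 45035997424348832 45035997424348832
# A0000044956EA2 45035997424348834 45035997424348832

# Adding 4 lands exactly halfway between two doubles and rounds back down.
print(float(45035997424348832) + 4 == float(45035997424348832))  # True
```

This matches both observations in the thread: plain strtonum returns ...832 for either input, and only arbitrary-precision arithmetic (-M in gawk) keeps the trailing digit.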

Does AWK understand numbers written in E notation?

I have a tab-separated file with several columns, where one column contains numbers written in a format like this
4.07794484177529E-293
I wonder if AWK understands this notation? I.e. I want to get only the lines where the numbers in that column are smaller than 0.1.
But I am not sure if AWK will understand what "4.07794484177529E-293" is - can it do arithmetic comparisons on this format?
Yes, to answer your question, awk does understand E notation.
You can confirm that by:
awk '{printf "%f\n", $1}' <<< "4.07794484177529E-3"
0.004078
In general, awk uses double-precision floating-point numbers to represent all numeric values. This gives you a range of roughly 2.2E-308 (the smallest normal value) to 1.8E+308 to work with, so you are okay with 4.07794484177529E-293.
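The filtering task from the question can be sketched in Python, which parses E notation into the same kind of double. This is a hedged illustration: below_threshold, the sample rows and the column index are all made up for the example.

```python
def below_threshold(lines, column, limit=0.1):
    """Keep tab-separated lines whose numeric column is smaller than limit."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # float() understands E notation such as 4.07794484177529E-293
        if float(fields[column]) < limit:
            kept.append(line)
    return kept


rows = [
    "gene_a\t4.07794484177529E-293",  # hypothetical sample data
    "gene_b\t2.5E-1",
    "gene_c\t1e-2",
]
for row in below_threshold(rows, 1):
    print(row)
```

In awk the equivalent comparison is a pattern such as $2 < 0.1 with the field separator set to a tab.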
Aside: you can specify how to format the print of floating point number with awk as follows:
awk '{printf "%5.8f\n", $1}' <<< "1.2345678901234556E+4"
12345.67890123
Explanation:
%5.8f is what formats the float
the 5 before the . is the minimum total field width (the output is padded with spaces if it is shorter; here the number is longer, so the 5 has no effect)
the 8 after the . specifies how many digits to print after the decimal point
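Python's % operator follows the same C-style conversion rules as awk's printf, so the effect of the two numbers in the format can be tried out there. A small sketch, with show as a made-up helper:

```python
def show(spec, value):
    """Format value with a printf-style spec and return the resulting string."""
    return spec % value


# Width 5 is only a minimum; the 14-character result is not truncated.
print(repr(show("%5.8f", 1.2345678901234556e4)))  # '12345.67890123'

# A width larger than the number pads on the left with spaces.
print(repr(show("%12.2f", 3.14159)))              # '        3.14'
```

The precision after the dot controls rounding of the fractional digits, while the width before it only ever adds padding.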

How can I make awk not use scientific notation when printing small values?

In the following awk command
awk '{sum+=$1; ++n} END {avg=sum/n; print "Avg monitoring time = "avg}' file.txt
what should I change to remove scientific notation output (very small values displayed as 1.5e-05) ?
I was not able to succeed with the OMFT variable.
You should use the printf AWK statement. That way you can specify padding, precision, etc. In your case, the %f control letter seems the most appropriate.
"I was not able to succeed with the OMFT variable."
The variable is actually called OFMT (output format), so for example:
awk 'BEGIN{OFMT="%f";print 0.000015}'
will output:
0.000015
as opposed to:
awk 'BEGIN{print 0.000015}'
which outputs:
1.5e-05
The GNU AWK manual says that, if you want to be POSIX-compliant, the value of OFMT should be a floating-point conversion specification.
Setting -v OFMT='%f' (without having to embed it into my awk statement) worked for me in the case where all I wanted from awk was to sum columns of arbitrary floating point numbers.
As the OP found, awk produces exponential notation with very small numbers,
$ some_accounting_widget | awk '{sum+=$0} END{print sum+0}'
8.992e-07 # Not useful to me
Setting OFMT fixed that, but also rounded too aggressively,
$ some_accounting_widget | awk -v OFMT='%f' '{sum+=$0} END{print sum+0}'
0.000001 # Oops. Rounded off too much. %f rounds to 6 decimal places by default.
Specifying the number of decimal places got me what I needed,
$ some_accounting_widget | awk -v OFMT='%.10f' '{sum+=$0} END{print sum+0}'
0.0000008992 # Perfect.
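The three outputs above come down to which conversion specification is applied to the sum. awk's default OFMT is "%.6g", and Python's % operator implements the same conversions, so the behaviour can be reproduced in a short sketch (the helper name render is invented for this illustration):

```python
def render(value, ofmt):
    """Format value the way awk's print would with OFMT set to ofmt."""
    return ofmt % value


total = 8.992e-07
print(render(total, "%.6g"))   # 8.992e-07     (awk's default OFMT)
print(render(total, "%f"))     # 0.000001      (%f means 6 decimal places)
print(render(total, "%.10f"))  # 0.0000008992
```

%g switches to exponential notation for small magnitudes, which is why a fixed-point format with enough decimal places is needed to keep both the notation and the digits.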