I have a tab-separated file with several columns, where one column contains numbers written in a format like this
4.07794484177529E-293
I wonder if AWK understands this notation? I.e. I want to get only the lines where the numbers in that column are smaller than 0.1.
But I am not sure if AWK will understand what "4.07794484177529E-293" is - can it do arithmetic comparisons on this format?
Yes, to answer your question, awk does understand E notation.
You can confirm that by:
awk '{printf "%f\n", $1}' <<< "4.07794484177529E-3"
0.004078
In general, awk uses double-precision floating-point numbers to represent all numeric values. On typical IEEE 754 systems this covers magnitudes from roughly 2.2E-308 up to roughly 1.8E+308, so 4.07794484177529E-293 is comfortably within range.
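To get back to the actual question (keeping only the lines whose value is below 0.1), a minimal sketch could look like the following. The sample rows and the choice of column 2 are assumptions for illustration; adjust the field number to your file.

```shell
# Hypothetical three-line sample; column 2 holds the E-notation values.
sample=$(printf 'a\t4.07794484177529E-293\nb\t2.5E-01\nc\t9.9E-02\n')

# Adding 0 forces a numeric comparison; keep only rows whose column 2 is below 0.1.
small=$(printf '%s\n' "$sample" | awk -F'\t' '($2 + 0) < 0.1')
printf '%s\n' "$small"
```

Rows a and c pass the filter, row b (0.25) does not.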
Aside: you can control how awk prints a floating-point number with a printf format, for example:
awk '{printf "%5.8f\n", $1}' <<< "1.2345678901234556E+4"
12345.67890123
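Explanation:
%5.8f is the format specification for the float
the 5 before the . is the minimum field width, i.e. the minimum total number of characters printed (decimal point included); shorter output is padded with spaces, longer output is not truncated
the 8 after the . is the precision, i.e. how many digits to print after the decimal point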
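A small sketch to make the width/precision distinction visible; the values and the brackets around the output are only there for illustration.

```shell
# Width 10, precision 2: the result is left-padded with spaces to 10 characters.
padded=$(awk 'BEGIN { printf "[%10.2f]", 3.14159 }')
# Width 5 is smaller than the printed number, so it has no visible effect.
wide=$(awk 'BEGIN { printf "[%5.8f]", 12345.67890123 }')
printf '%s\n%s\n' "$padded" "$wide"
```

The first line comes out as [      3.14], the second as [12345.67890123].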
Related
I have this problem,
awk 'BEGIN{ x = 0.703125; p = x/2; print p }' somefile
and the output is 0.351562. But a decimal digit is missing: it should be 0.3515625. So is there a way to improve the division to get all decimals stored into a variable and not only printed? So that p really holds the value 0.3515625?
This happens because the default value of both CONVFMT (in newer versions of awk) and OFMT gives only six significant digits. The variable p itself still holds the full value; the digits are lost only when the number is turned into text. Either change that variable when using print, or use printf with the precision you need.
From Awk Strings and Numbers
CONVFMT's default value is "%.6g", which creates a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, 17 digits is usually enough to capture a floating-point number’s value exactly.
Prior to the POSIX standard, awk used the value of OFMT for converting numbers to strings. OFMT specifies the output format to use when printing numbers with print. CONVFMT was introduced in order to separate the semantics of conversion from the semantics of printing. Both CONVFMT and OFMT have the same default value: "%.6g". In the vast majority of cases, old awk programs do not change their behavior. However, these semantics for OFMT are something to keep in mind if you must port your new style program to older implementations of awk.
awk 'BEGIN{ x = 0.703125; p = x/2; printf "%.7f", p }'
In addition, the OP wants this high-precision number as text in a variable, for which sprintf should be used:
awk 'BEGIN{ x = 0.703125; p = x/2; pcustom=sprintf("%.7f", p) }'
or by changing the OFMT variable on GNU Awk 4.1.4,
awk 'BEGIN{ OFMT = "%0.7f"; x = 0.703125; p = x/2; print p; }'
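To check that only the conversion to text is affected, here is a minimal sketch, assuming a POSIX awk that honours CONVFMT. The variable names same and s are made up for the example.

```shell
out=$(awk 'BEGIN {
    x = 0.703125; p = x / 2
    same = (p == 0.3515625)   # numeric comparison, no string conversion involved
    CONVFMT = "%.17g"         # format used when a number is converted to a string
    s = p ""                  # concatenation forces that conversion
    print same, s
}')
printf '%s\n' "$out"
```

This prints 1 0.3515625: the comparison is true, so p held the exact value all along, and raising CONVFMT makes the string form show it.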
Related to another post I had...
parsing a sql string for integer values with multiple delimiters,
In which I say I can easily accomplish the same with UNIX tools (ahem). I found it a bit more messy than expected. I'm looking for an awk solution. Any suggestions on the following?
Here is my original post, paraphrased:
#
I want to use awk to parse data sourced from a flat file that is pipe delimited. One of the fields is sub-formatted as follows. My end state is to sum the integers within the field, but my question here is about ways to use awk to extract the numeric values from the field. The pattern of the sub-formatting will always be that the desired integers are preceded by a tilde (~) and followed by an asterisk (*), except for the last one in the field. The number of subfields may vary too (my example has 5, but there could be more or fewer). The 4-character TAG name is of no importance.
So here is a sample:
|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0|
From this example, all I would want for processing is the final number of 219. Again, I can work on the sum part as a further step; just interested in getting the numbers.
#
My solution currently entails two awk statements. First using gsub to replace the '~' with a '*' delimiter in my target field, 77:
awk -F'|' 'BEGIN {OFS="|"} { gsub("~", "*", $77) ; print }' file_1 > file_2
My second awk statement is to calculate the numeric sums on the target field, 77, which is the last field, and replace it with the calculated value. It is built on the assumption that there will be no other asterisks (*) anywhere else in the file. I'm okay with that. It is working for most examples, but not others, and my gut tells me this isn't that robust of an answer. Any ideas? The suggestions on my other post for SQL were great, but I couldn't implement them for unrelated silly reasons.
awk -F'*' '{if (NF>=2) {s=0; for (i=1; i<=NF; i++) s=s+$i; print substr($1, 1, length($1)-4) s;} else print}' file_2 > file_3
To get the sum (219) from your example, you can use this (s is reset on every line so that the sum does not accumulate across records):
awk -F'[^0-9.]+' '{s=0;for(i=1;i<=NF;i++)s+=$i;print s}' file
or the following for 219.00:
awk -F'[^0-9.]+' '{s=0;for(i=1;i<=NF;i++)s+=$i;printf "%.2f\n", s}' file
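If the file has other fields that may contain digits, a single pass that only touches the target field might be safer. This is a sketch under the assumption from the question that the target is the last pipe-delimited field; the sample record and its first two fields are invented.

```shell
# Hypothetical record: the target is the last pipe-delimited field.
line='id1|foo|GADS~55.0*BILK~0.0*BOBB~81.0*HETT~32.0*IGGR~51.0'

summed=$(printf '%s\n' "$line" | awk -F'|' 'BEGIN { OFS = "|" } {
    n = split($NF, part, /[~*]/)                 # TAG, value, TAG, value, ...
    s = 0                                        # reset for every record
    for (i = 2; i <= n; i += 2) s += part[i]     # values sit at even positions
    $NF = s                                      # replace the field with its sum
    print
}')
printf '%s\n' "$summed"
```

For the sample record this prints id1|foo|219, and it replaces the two-step gsub-then-sum approach with one awk call.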
I have input data using scientific notation as in (TAB-separated)
-2.60000000E-001 -2.84200000E-011 1.00000000E+000 2.45060000E-010 0.00000000E+000 -1.98000000E-012
Using awk, I extract one column and perform a mathematical operation on another. To make sure that the format is as needed, printf is applied:
awk '{ printf "%9.8E\t%9.8E\n", $1,sqrt($4) }' infile.dat
However in my output the number of digits for the exponent changes from 3 to 2:
-3.00000000E-01 1.90446843E-05
How do I define these in the printf statement, so that I get the desired output:
-3.00000000E-001 1.90446843E-005
printf relies on the C stdio library, which provides no way to set the number of exponent digits, so you need to roll your own:
awk 'BEGIN{
v="-3.00000000E-01 "
v=gensub("E([+-])([0-9][0-9]) ","E\\10\\2","",v )
print v
exit}'
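This puts the value into variable v, then applies a substitution that searches for the exponent: if it has two digits, a 0 is inserted after the sign; if it already has three digits, nothing is changed. The trailing space in the pattern is what tells a two-digit exponent apart from the first two digits of a three-digit one, and it is consumed by the substitution.
Note that gensub is only available in gawk.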
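For awks without gensub, the same idea can be sketched with match and substr, which are in POSIX awk. The helper name e3 and the sample values are assumptions for illustration, and the sketch assumes the platform printf emits at least two exponent digits.

```shell
padded=$(awk '
function e3(x,    s) {
    s = sprintf("%.8E", x)
    # Two-digit exponent at the end of the string: insert a 0 after the sign.
    if (match(s, /E[+-][0-9][0-9]$/))
        s = substr(s, 1, RSTART + 1) "0" substr(s, RSTART + 2)
    return s
}
BEGIN { printf "%s\t%s\t%s\n", e3(-0.26), e3(1.5e-05), e3(4e-293) }')
printf '%s\n' "$padded"
```

Values whose exponent already has three digits (the last one here) pass through unchanged.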
I am surprised by the behaviour of awk when performing floating-point calculations. It led me to wrong calculations on table data.
$ awk 'BEGIN {print 2.3/0.1}'
23 <-- Ok
$ awk 'BEGIN {print int(2.3/0.1)}'
22 <-- Wrong!
$ awk 'BEGIN {print 2.3-2.2==0.1}'
0 <-- Surprise!
$ awk 'BEGIN {print 2.3-2.2>0.1}' <-- Didn't produce any output :(
$ awk 'BEGIN {print 2.3-2.2<0.1}'
1 <-- Totally confused now ...
Can somebody shed light on what's happening here?
EDIT 1
As pointed out by #fedorqui, the output of the second-to-last command goes to a file named 0.1 because of the redirection operator (>).
Then how am I supposed to perform a greater-than (>) comparison?
The solution is also given by #fedorqui:
$ awk 'BEGIN {print (2.3-2.2>0.1)}'
0 <-- Wrong!
The following section from the manual should help you understand the issue you're observing:
15.1.1.2 Floating Point Numbers Are Not Abstract Numbers
Unlike numbers in the abstract sense (such as what you studied in high
school or college arithmetic), numbers stored in computers are limited
in certain ways. They cannot represent an infinite number of digits,
nor can they always represent things exactly. In particular,
floating-point numbers cannot always represent values exactly. Here is
an example:
$ awk '{ printf("%010d\n", $1 * 100) }'
515.79
-| 0000051579
515.80
-| 0000051579
515.81
-| 0000051580
515.82
-| 0000051582
Ctrl-d
This shows that some values can be represented exactly, whereas others
are only approximated. This is not a “bug” in awk, but simply an
artifact of how computers represent numbers.
A highly recommended reading:
What every computer scientist should know about floating-point arithmetic
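In practice, the usual workaround is to compare within a small tolerance and to round rather than truncate. A minimal sketch; the abs helper and the tolerance of 1e-9 are assumptions, so pick a tolerance that suits your data.

```shell
checks=$(awk '
function abs(x) { return x < 0 ? -x : x }
BEGIN {
    eps = 1e-9                               # tolerance (an assumption)
    print (abs((2.3 - 2.2) - 0.1) < eps)     # "equal" within tolerance
    print int(2.3 / 0.1 + 0.5)               # round to nearest instead of truncating
}')
printf '%s\n' "$checks"
```

This prints 1 and then 23, the results the question expected from the == test and from int(2.3/0.1).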
In the following awk command
awk '{sum+=$1; ++n} END {avg=sum/n; print "Avg monitoring time = "avg}' file.txt
what should I change to remove scientific notation output (very small values displayed as 1.5e-05) ?
I was not able to succeed with the OMFT variable.
You should use the printf AWK statement. That way you can specify padding, precision, etc. In your case, the %f control letter seems the most appropriate.
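Applied to the command from the question, that could look like the sketch below. The two input values and the precision of 10 decimals are assumptions; the n > 0 guard just avoids a division by zero on empty input.

```shell
avg_line=$(printf '1e-05\n2e-05\n' | awk '
    { sum += $1; ++n }
    END { if (n > 0) printf "Avg monitoring time = %.10f\n", sum / n }')
printf '%s\n' "$avg_line"
```

For this input the line reads Avg monitoring time = 0.0000150000.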
I was not able to succeed with the OMFT variable.
It is actually OFMT (output format), so for example:
awk 'BEGIN{OFMT="%f";print 0.000015}'
will output:
0.000015
as opposed to:
awk 'BEGIN{print 0.000015}'
which outputs:
1.5e-05
The GNU AWK manual says that, if you want to be POSIX-compliant, the value of OFMT should be a floating-point conversion specification.
Setting -v OFMT='%f' (without having to embed it into my awk statement) worked for me in the case where all I wanted from awk was to sum columns of arbitrary floating point numbers.
As the OP found, awk produces exponential notation with very small numbers,
$ some_accounting_widget | awk '{sum+=$0} END{print sum+0}'
8.992e-07 # Not useful to me
Setting OFMT fixed that, but also rounded too aggressively,
$ some_accounting_widget | awk -v OFMT='%f' '{sum+=$0} END{print sum+0}'
0.000001 # Oops. Rounded off too much. %f rounds to 6 decimal places by default.
Specifying the number of decimal places got me what I needed,
$ some_accounting_widget | awk -v OFMT='%.10f' '{sum+=$0} END{print sum+0}'
0.0000008992 # Perfect.