awk - define number of digits in exponent when using scientific notation

I have input data using scientific notation as in (TAB-separated)
-2.60000000E-001 -2.84200000E-011 1.00000000E+000 2.45060000E-010 0.00000000E+000 -1.98000000E-012
Using awk, I'm extracting one column and doing a mathematical operation on another. To make sure the format is as needed, I apply printf:
awk '{ printf "%9.8E\t%9.8E\n", $1,sqrt($4) }' infile.dat
However in my output the number of digits for the exponent changes from 3 to 2:
-3.00000000E-01 1.90446843E-05
How do I specify the exponent width in the printf statement so that I get the desired output:
-3.00000000E-001 1.90446843E-005

printf relies on the C stdio library, which does not provide a way to set the exponent length, so you need to roll your own.
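awk 'BEGIN{
v="-3.00000000E-01 1.90446843E-05"
# pad every 2-digit exponent with a leading 0; 3-digit exponents do not match
v=gensub(/E([+-])([0-9][0-9])([^0-9]|$)/,"E\\10\\2\\3","g",v)
print v
exit}'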
This puts the value into the variable v, then runs a substitution that looks for the exponent: if it has 2 digits, a 0 is prepended; if it already has 3 digits, nothing is added.
gensub is only available in gawk.
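For awks without gensub, the same padding can be applied to the sprintf result using only index and substr. This is a minimal sketch, assuming finite values and TAB-separated input as in the question. The names fmt_e3 and e3 are hypothetical, not anything built in.

```shell
# Hypothetical wrapper: print column 1 and sqrt(column 4) with 3-digit exponents.
fmt_e3() {
    awk -F'\t' '
    # e3(x): format x as %.8E, then left-pad the exponent digits to 3 places.
    function e3(x,    s, pos, digits) {
        s = sprintf("%.8E", x)
        pos = index(s, "E")
        if (pos == 0) return s               # inf/nan: no exponent to pad
        digits = substr(s, pos + 2)          # exponent digits after the sign
        while (length(digits) < 3) digits = "0" digits
        return substr(s, 1, pos + 1) digits  # mantissa, "E", sign, padded digits
    }
    { print e3($1) "\t" e3(sqrt($4)) }' "$@"
}
```

It would be called as fmt_e3 infile.dat. Because the padding is done on the string that sprintf returns, it works whether the C library emits 2 or 3 exponent digits.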

Related

awk - Rounding all floating-point numbers in multi-line text file

Assume a multi-line text file that contains multiple floating-point numbers as well as alphanumeric strings and special characters per line. The only consistency is that all floats are separated from any other string by a single whitespace. Further, assume that we wish to round each floating-point number to a maximum of n digits after the decimal point. All strings other than the floats shall remain in place and as is. Let us assume that n=5.
I know this can be implemented via awk easily. My current code (below) only rounds the last float of each line and swallows all strings that precede it. How do I improve it?
echo -e "\textit{foo} & 1234.123456 & -1234.123456\n1234.123456" |\
awk '{for(i=1;i<=NF;i++);printf("%.05f\n",$NF)}'
# -1234.12346
# 1234.12346
Using perl :
perl -i -pe 's/(\d+\.\d+)/sprintf "%.05f", $1/eg' file
One solution :
$ echo -e "\textit{foo} & 1234.123456 & -1234.123456\n1234.123456" |
awk '{for(i=1;i<=NF;i++){if ($i ~ /[0-9]+\.[0-9]+/){printf "%.05f\n", $i}}}'
Output :
1234.12346
-1234.12346
1234.12346
Is this what you're trying to do?
$ printf '\textit{foo} & 1234.123456 & -1234.123456\n1234.123456\n' |
awk -F'[ ]' '{for(i=1;i<=NF;i++) if ($i+0 == $i) $i = sprintf("%.05f",$i)} 1'
extit{foo} & 1234.12346 & -1234.12346
1234.12346
if ($i+0 == $i) is the idiomatic awk way to test for a value being a number since only a number could have the same value on the left and right side of that comparison.
I'm setting FS to a literal, single blank character ('[ ]') instead of its default, which, confusingly, is also a single blank character (' '). The default is treated specially: every run of contiguous white space acts as one separator and leading/trailing blanks are ignored, so when $0 is recompiled (e.g. because you assigned to any field) the original spacing is lost. With '[ ]' each blank is its own separator, which allows your formatting to be maintained in the output.
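The difference is easy to see on a line with runs of blanks. This is a small sketch with made-up sample data (the variable line is only for illustration):

```shell
line='a   1.123456789  b'

# Default FS: assigning to a field rebuilds $0 with single blanks between fields.
printf '%s\n' "$line" | awk '{ $2 = sprintf("%.5f", $2) } 1'

# FS="[ ]": every blank is its own separator, so the original spacing survives.
printf '%s\n' "$line" |
awk -F'[ ]' '{ for (i = 1; i <= NF; i++) if ($i+0 == $i) $i = sprintf("%.5f", $i) } 1'
```

The first command prints the line with the blank runs collapsed, and the second keeps them. In the second command the empty fields between consecutive blanks fail the $i+0 == $i test, so they are left untouched.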

Getting more decimal positions as a result of division when using awk

I have this problem,
awk 'BEGIN{ x = 0.703125; p = x/2; print p }' somefile
and the output is 0.351562. But a decimal digit is missing: it should be 0.3515625. Is there a way to improve the division so that all decimals are stored in a variable and not only printed, i.e. so that p really holds the value 0.3515625?
This is because the default value of both CONVFMT (in newer versions of awk) and OFMT gives only 6 significant digits. Either change the relevant variable before using print, or use printf with the precision you need.
From Awk Strings and Numbers
CONVFMT's default value is "%.6g", which creates a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, 17 digits is usually enough to capture a floating-point number’s value exactly.
Prior to the POSIX standard, awk used the value of OFMT for converting numbers to strings. OFMT specifies the output format to use when printing numbers with print. CONVFMT was introduced in order to separate the semantics of conversion from the semantics of printing. Both CONVFMT and OFMT have the same default value: "%.6g". In the vast majority of cases, old awk programs do not change their behavior. However, these semantics for OFMT are something to keep in mind if you must port your new style program to older implementations of awk.
awk 'BEGIN{ x = 0.703125; p = x/2; printf "%.7f", p }'
In addition, the OP wants this high-precision number in a variable, for which sprintf should be used:
awk 'BEGIN{ x = 0.703125; p = x/2; pcustom=sprintf("%.7f", p) }'
or by changing the OFMT variable on GNU Awk 4.1.4,
awk 'BEGIN{ OFMT = "%0.7f"; x = 0.703125; p = x/2; print p; }'
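To make the distinction concrete: the division itself loses nothing here, since 0.3515625 is exactly representable as a double. Only the number-to-string conversion truncates, whether done by print (OFMT) or by concatenation (CONVFMT). Here is a small sketch, where show_precision is a hypothetical wrapper name and s is just an illustration variable:

```shell
show_precision() {
    awk 'BEGIN {
        x = 0.703125; p = x / 2
        print p                  # printed via OFMT, default "%.6g"
        print (p == 0.3515625)   # numeric comparison uses the stored value
        s = p ""                 # number-to-string conversion uses CONVFMT
        print s
        CONVFMT = "%.17g"        # ask for up to 17 significant digits
        s = p ""
        print s
    }'
}
show_precision
```

With the default settings this should print 0.351562, then 1, then 0.351562 again, and finally 0.3515625 once CONVFMT is widened.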

awk: fatal: 1 is invalid as number of arguments for match

myfile looks like
Split level: train 67.0%
importance 0.17
Score metric (accuracy_score): 0.986
Score metric (precision_score): 0.903
I want to extract the accuracy score (here it is 0.986) by awk:
$ awk '/Score metric \(accuracy_score\):/ { match(/[0-9]+\.[0-9]+ *$/); substr($0, RSTART, RLENGTH) }' myfile
awk: fatal: 1 is invalid as number of arguments for match
What does the error mean here? I don't have 1 in my awk program.
How can I correct my program to make it work?
What is your better solution?
Others have stated the issue with the number of match parameters... this can be found in the awk manual. The following answer is quick and easy -- avoiding the match() and substr() functions. It outputs the last field when your pattern is found. LC_ALL=C is used because your match criterion and number are all representable in ASCII -- the script will run faster in this mode.
LC_ALL=C awk '/Score metric \(accuracy_score\):/ { print $NF }'
If you stick to awk, I would do:
awk -F':\\s*' '/Score metric \(accuracy_score\)/{print $2}' file
In your code you used the match() function. From the man page:
match(s, r [, a]): s is the string, r is the regex, and the optional a is an array.
You passed only a regex /.../ to match(), i.e. a single argument, while match() requires at least two: the string and the regex. The 1 in the error message is that argument count, not a value from your program.
(A bare /.../ used as an expression is evaluated as $0 ~ /.../ and yields 1 or 0, but that is not what the message reports.)
* By awk above I mean GNU awk.
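For completeness, here is a sketch of the asker's own approach with the string passed explicitly to match() and the result actually printed (the original substr() call computed a value but never printed it). The wrapper name extract_accuracy is hypothetical.

```shell
extract_accuracy() {
    awk '/Score metric \(accuracy_score\):/ {
        # match() needs the string to search as its first argument
        if (match($0, /[0-9]+\.[0-9]+ *$/)) {
            val = substr($0, RSTART, RLENGTH)
            sub(/ +$/, "", val)   # drop any trailing blanks the regex allowed
            print val
        }
    }' "$@"
}
```

Called as extract_accuracy myfile, this prints only the number on the accuracy_score line.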

Does AWK understand numbers written in E notation?

I have a tab-separated file with several columns, where one column contains numbers written in a format like this
4.07794484177529E-293
I wonder if AWK understands this notation? I.e. I want to get only the lines where the numbers in that column are smaller than 0.1.
But I am not sure if AWK will understand what "4.07794484177529E-293" is - can it do arithmetic comparisons on this format?
Yes, to answer your question, awk does understand E notation.
You can confirm that by:
awk '{printf "%f\n", $1}' <<< "4.07794484177529E-3"
0.004078
In general, awk uses double-precision floating-point numbers to represent all numeric values. Normal doubles cover magnitudes from roughly 2.2E-308 to 1.8E+308, so you are okay with 4.07794484177529E-293.
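The filter asked about in the question could then look like the sketch below. The column number (2) and the wrapper name small_only are assumptions, since the question does not say which column holds the values.

```shell
# Keep rows whose 2nd TAB-separated column is numerically below 0.1.
small_only() {
    awk -F'\t' '($2 + 0) < 0.1' "$@"
}

printf 'a\t4.07794484177529E-293\nb\t5.0E-01\nc\t2.5E+03\n' | small_only
```

Adding 0 forces a numeric comparison. The side effect is that non-numeric cells such as a header count as 0 and pass the test, so a header line would need to be skipped separately.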
Aside: you can specify how to format the print of floating point number with awk as follows:
awk '{printf "%5.8f\n", $1}' <<< "1.2345678901234556E+4"
12345.67890123
Explanation:
%5.8f is what formats the float
the 5 before the . is the minimum total field width (shorter output is padded with blanks; longer output is never truncated)
the 8 after the . specifies how many digits to print after the decimal point
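A quick way to see what the two numbers in the format do is to print the same value with different widths. This is a small sketch; the brackets are only there to make the padding visible.

```shell
awk 'BEGIN {
    x = 12345.67890123456
    printf "[%5.8f]\n", x     # width 5 is too small, so nothing is padded or cut
    printf "[%16.8f]\n", x    # width 16: padded on the left with blanks
    printf "[%10.2f]\n", x    # 2 decimals, right-aligned in 10 columns
}'
```

The first two lines show the same 14-character number, the second one with two leading blanks. The third rounds to 12345.68 and pads it to 10 columns.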

How can I make awk not use scientific notation when printing small values?

In the following awk command
awk '{sum+=$1; ++n} END {avg=sum/n; print "Avg monitoring time = "avg}' file.txt
what should I change to remove scientific notation output (very small values displayed as 1.5e-05) ?
I was not able to succeed with the OMFT variable.
You should use the AWK printf statement. That way you can specify padding, precision, etc. In your case, the %f control letter seems the most appropriate.
Regarding "I was not able to succeed with the OMFT variable":
It is actually OFMT (output format), so for example:
awk 'BEGIN{OFMT="%f";print 0.000015}'
will output:
0.000015
as opposed to:
awk 'BEGIN{print 0.000015}'
which outputs:
1.5e-05
The GNU AWK manual says that, to be POSIX-compliant, the value of OFMT should be a floating-point conversion specification.
Setting -v OFMT='%f' (without having to embed it into my awk statement) worked for me in the case where all I wanted from awk was to sum columns of arbitrary floating point numbers.
As the OP found, awk produces exponential notation with very small numbers,
$ some_accounting_widget | awk '{sum+=$0} END{print sum+0}'
8.992e-07 # Not useful to me
Setting OFMT fixed that, but also rounded too aggressively,
$ some_accounting_widget | awk -v OFMT='%f' '{sum+=$0} END{print sum+0}'
0.000001 # Oops. Rounded off too much. %f rounds to 6 decimal places by default.
Specifying the number of decimal places got me what I needed,
$ some_accounting_widget | awk -v OFMT='%.10f' '{sum+=$0} END{print sum+0}'
0.0000008992 # Perfect.
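One caveat worth spelling out: OFMT only applies when print is handed a number directly. In the original command the average is concatenated to a string first, and that conversion is governed by CONVFMT, so setting OFMT alone has no effect there. Here is a hedged sketch using printf instead, where avg_time is a hypothetical wrapper name and the %.10f precision is an arbitrary choice:

```shell
avg_time() {
    awk '{ sum += $1; ++n }
         END { if (n > 0) printf "Avg monitoring time = %.10f\n", sum / n }' "$@"
}
printf '0.00001\n0.00002\n' | avg_time

# OFMT is ignored for the concatenation in the first print; the second print uses it.
awk 'BEGIN { OFMT = "%f"; x = 0.000015; print "x = " x; print x }'
```

The first command reports the average in fixed notation. The second prints x = 1.5e-05 followed by 0.000015, which shows that only the bare print x honours OFMT.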