awk floating point comparison not working

I have an input file with x1, x2 and x values, and I want to check whether x is the midpoint between x1 and x2.
But the comparison is failing.
sample input file
x1=20.9280 x2=20.9600 x=20.9440
x1=20.9280 x2=20.9600 x=20.9440
x1=22.7840 x2=22.8160 x=22.8000
Awk command
awk -F'[ =]' '{ if(($2 + $4)/2 != ($6)) print ($2 + $4)/2, " ", $6;}' sample
OUTPUT
20.944 20.9440
20.944 20.9440
22.8 22.8000
The comparison is failing because of the extra zeros after the decimal point. Please help me fix it.

This is happening due to the floating point comparison issue common to all platforms.
You may use this awk for floating point comparison, formatting the computed value to 4 decimal places before comparing:
awk -F'[ =]+' '{avg = sprintf("%.4f", ($2 + $4) / 2)} avg != $6 { print avg, $6 }' file
If you have gnu awk then you can set the working precision to a lower number of bits:
awk -M -v PREC=30 -F'[ =]+' '{avg = ($2 + $4) / 2; $6 += 0} avg != $6 { print avg, $6 }' file
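An alternative that avoids formatting entirely is to compare with a small tolerance instead of testing exact equality. A sketch (the 1e-9 epsilon is an arbitrary choice suited to 4-decimal data):

```shell
awk -F'[ =]+' '
  function abs(x) { return x < 0 ? -x : x }
  # flag only lines where the midpoint differs from x by more than the tolerance
  abs(($2 + $4) / 2 - $6) > 1e-9 { print ($2 + $4) / 2, $6 }
' sample
```

On the sample input this prints nothing, confirming every x is the midpoint.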

Not really an answer, but a demonstration. You are comparing floating point numbers, and they are not equal. I replaced print with printf using format modifiers with enough decimals to show it (20, %.20f):
$ awk -F'[ =]' '{
if(($2 + $4)/2 != ($6))
printf "%.20f %.20f\n", ($2 + $4)/2, $6
}' file
Output:
20.94400000000000261480 20.94399999999999906208
20.94400000000000261480 20.94399999999999906208
22.79999999999999715783 22.80000000000000071054
So use sprintf and appropriate modifiers (see the printf I used) to control the values.
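To turn the demonstration into a fix, format both sides to the same precision before comparing, along the lines of the sprintf approach above (a sketch):

```shell
awk -F'[ =]' '{
  # render both values with 4 decimal places so equal values compare equal as strings
  a = sprintf("%.4f", ($2 + $4) / 2)
  b = sprintf("%.4f", $6)
}
a != b { print a, b }' sample
```

With the sample input from the question this prints nothing.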

As others have pointed out, you're probably just tripping over the common floating point arithmetic issue. But since all of your input values have the same precision, you can simply remove the .s to treat the input numbers as integers, and multiply by 2 instead of dividing by 2 so the comparison itself stays an integer comparison too:
$ awk -F'[ =]' '{o=$0; gsub(/\./,"")} ($6*2) == ($2+$4){$0=o; print ($2+$4)/2, $6}' file
20.944 20.9440
20.944 20.9440
22.8 22.8000
$ awk -F'[ =]' '{o=$0; gsub(/\./,"")} ($6*2) != ($2+$4){$0=o; print ($2+$4)/2, $6}' file
$

Related

AWK print specific column if chosen column is only numeric else print another column AIX

Is it possible to use an awk if condition to print one column or another, depending on whether the chosen column contains only numbers or only characters?
Below is an example:
echo "This is example test 1 for VAL1 value = int or VAL2 = string" | awk '{if ($5 == [0-9]) print $10 else print $14}'
OR:
echo "This is example test one for VAL1 value = int or VAL2 = string" | awk '{if ($5 == [A-Z]) print $14; else print $10}'
The two examples above try to determine in awk whether column 5 is all numbers, and print a specific column based on whether it contains only numbers or string characters. In my example it can only be one or the other, never numbers and characters mixed.
How is it possible to do this using an awk?
Here is the problem:
$5 == [0-9]
In awk the == operator is for equality, not for regex matching. We must use ~ for regex matching and also enclose the regex in /.../ notation.
So all of these awk solutions should work for you:
# check presence of a digit anywhere in the fifth field
awk '{print ($5 ~ /[0-9]/ ? $10 : $14)}'
# check if fifth field contains 1+ digits only
awk '{print ($5 ~ /^[0-9]+$/ ? $10 : $14)}'
# awk shorthand to check if $5 is numeric value
awk '{print ($5+0 == $5 ? $10 : $14)}'
Similarly to check an uppercase character use:
awk '{print ($5 ~ /[A-Z]/ ? $14 : $10)}'
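A quick demonstration against the two sample lines from the question, where $5 is "1" in the first and "one" in the second:

```shell
echo "This is example test 1 for VAL1 value = int or VAL2 = string" |
  awk '{ print ($5 ~ /^[0-9]+$/ ? $10 : $14) }'    # prints "int"

echo "This is example test one for VAL1 value = int or VAL2 = string" |
  awk '{ print ($5 ~ /^[0-9]+$/ ? $10 : $14) }'    # prints "string"
```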

awk match pattern and convert number to different unit

I have a csv file that contains this kind of values:
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
vm48,8,32794384Ki,16257312Ki
vm48,8,30223304245,15223072Ki
vm49,8,32794384Ki,16257320Ki
vm49,8,30223304245,15223080Ki
Columns 3 and 4 are memory values expressed either in bytes or in kibibytes. The problem is that the "Ki" suffix appears inconsistently throughout the CSV file, particularly in column 3.
So to make the file consistent, I need to convert everything to bytes: any value with a trailing "Ki" needs its numeric part multiplied by 1024, replacing the corresponding XXXXXKi match.
The reason why I want to do it with awk is because I am already using awk to generate that csv format, but I am happy to do it with sed too.
This is my code so far but obviously it's wrong as it's multiplying any value in columns 3 and 4 by 1024 even though it does not match "Ki". I am not sure at this point how to ask awk "if you see Ki at the end, then multiply by 1024".
kubectl describe node --context=$context| sed -E '/Name:|cpu:|ephemeral-storage:|memory:/!d' | sed 's/\s//g' | awk '
BEGIN {FS = ":"; OFS = ","}
{record[$1] = $2}
$1 == "memory" {print record["Name"], record["cpu"], record["ephemeral-storage"], record["memory"]}
' | awk -F, '{print $1,$2,$3,$3*1024,$4,$4*1024}' >> describe_nodes.csv
Edit: I made a mistake, you need to multiply by 128 to convert KiB to bytes, not 1024.
"if you see Ki at the end, then multiply by 1024"
You may use:
awk 'BEGIN{FS=OFS=","} $3 ~ /Ki$/ {$3 *= 1024} $4 ~ /Ki$/ {$4 *= 1024} 1' file
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920
vm48,8,33581449216,16647487488
vm48,8,30223304245,15588425728
vm49,8,33581449216,16647495680
vm49,8,30223304245,15588433920
Or a bit shorter:
awk 'BEGIN{FS=OFS=","} {
for (i=3; i<=4; ++i) $i ~ /Ki$/ && ($i *= 1024)} 1' file
With your shown samples/attempts, please try the following awk code. Briefly: traverse the fields from the 3rd field onwards, and whenever a value ends in Ki (case-insensitively), multiply it by 128; then print every line, edited or not.
awk 'BEGIN{FS=OFS=","} {for(i=3;i<=NF;i++){if($i~/[Kk][Ii]$/){$i *= 128}}} 1' Input_file
You could try numfmt:
$ numfmt -d, --field 3,4 --from=auto --to=none <<EOF
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
EOF
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920

awk / gawk printf when variable format string, changing zero to dash

I have a table of numbers I am printing in awk using printf.
The printf accomplishes some truncation for the numbers.
(cat <<E\OF
Name,Where,Grade
Bob,Sydney,75.12
Sue,Sydney,65.2475
George,Sydney,84.6
Jack,Sydney,35
Amy,Sydney,
EOF
)|gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3=0}
printf("%s,%s,%d \n",$1,$2,$3)}'
This produces:
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,0
Amy,Sydney,0
What I want is to display scores which are less than 50, or missing, as a dash ("-").
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
This requires the 3rd format specifier in printf to change from %d to %s.
So in some rows, the third column should be a value, and in some rows, the third column should be a string. How can I tell this to GAWK? Or should I just pipe through another awk to re-format?
$ gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3="-"} else {$3=sprintf("%d", $3)}
printf("%s,%s,%s \n",$1,$2,$3)}' ip.txt
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
Use if-else to assign the value to $3 as needed.
sprintf allows you to assign the result of the formatting to a variable.
For this case, you could use the int function as well.
Now printf will have %s for $3 as well.
Assuming you simply missed the commas in the header and the space after the third column is not needed, you could do this with a simple one-liner:
$ awk -F, -v OFS=, 'NR>1{$3 = $3 < 50 ? "-" : int($3)} 1' ip.txt
Name,Where,Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
The ?: ternary operator is an alternative to if-else.
1 is an awk idiom to print the contents of $0.

awk not rounding with OFMT and $0

I'm printing an array with 100 columns and I would like all columns to have 2 decimals. I would like to use print $0 and not have to individually specify the format for all columns.
OFMT doesn't seem to work with $0:
echo '0.77767686 0.76555555 0.6667667 0.77878878' |awk '{CONVFMT="%.2g";OFMT="%.2g";print ($0+0);print ($0+0)"";print $0}'
Results:
0.78
0.78
0.77767686 0.76555555 0.6667667 0.77878878
Note that all input is treated as strings until implicitly converted by how it is used.
OFMT is used when numbers are printed, e.g.:
<<< 0.77767686 awk '{ print 0+$0 }' OFMT='%.2g'
CONVFMT is used when numbers are explicitly converted to strings, e.g.:
<<< 0.77767686 awk '{ print "" 0+$0 }' CONVFMT='%.2g'
Output in both cases:
0.78
The latter converts $0 into a number and then concatenates it with the empty string.
To achieve this for every column I would suggest using a sensible setting of the input and output record separators:
<<< '0.77767686 0.76555555 0.6667667 0.77878878' \
awk '{ print 0+$0 RT }' CONVFMT='%.2g' RS='[ \t\n]+' ORS=''
Note the two conversions, first to a number with 0+$0 then back to a string by concatenating it with RT. RT will be set to the matched record separator. Note that this is GNU awk specific, for a more portable solution, use a loop, e.g.:
<<< '0.77767686 0.76555555 0.6667667 0.77878878' \
awk '{ for (i=1; i<=NF; i++) $i+=0 } 1' CONVFMT='%.2g'
Output in both cases:
0.78 0.77 0.67 0.78
Edit - Responding to @BeeOnRope
@BeeOnRope is correct: OFMT is used as the format specifier when the print function converts a number to a string, while CONVFMT is used in other conversions. Here is an example that illustrates the difference:
<<< 0.77767686 awk '{ n=0+$1; s=""n; print n, s }' OFMT='%.2g' CONVFMT='%.3g'
Output:
0.78 0.778
Two relevant sections from the GNU awk manual:
https://www.gnu.org/software/gawk/manual/html_node/OFMT.html
https://www.gnu.org/software/gawk/manual/html_node/Strings-And-Numbers.html
Why don't you use a for loop?
echo '0.77767686 0.76555555 0.6667667 0.77878878' |awk '{ for (i=1; i<=NF; i++) printf "%.2f\n", $i }'
Results:
0.78
0.77
0.67
0.78
As others have mentioned you need to treat the field as a number to get a conversion. To combine some other ideas you can try:
awk '{ for (i=1; i<=NF; i++) $i += 0; print }'
That will convert every field to a number. You can just convert individual fields with $7 += 0 and so on. You could get fancier by using if (match($i, ...)) with some regexp to select only the numbers you want to convert.
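A sketch of that last idea — convert only the fields that look like plain decimal numbers and leave everything else alone (the regexp here is an assumption about which fields count as "numbers you want"):

```shell
echo '0.77767686 abc 0.6667667' |
  awk '{ for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9]+\.?[0-9]*$/)   # plain decimals only
             $i += 0                       # force numeric conversion via CONVFMT
         print }' CONVFMT='%.2g'
# prints: 0.78 abc 0.67
```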

how to get rid of awk fatal division by zero error

Whenever I try to calculate the mean and standard deviation using awk, I get an "awk: fatal: division by zero attempted" error.
My commands are:
awk '{s+=$3} END{print $2"\t"s/(NR)}' >> mean;
awk '{sum+=$3;sumsq+=$3*$3} END {print $2"\t"sqrt(sumsq/NR - (sum/NR)^2)}' >>sd
Does anyone know how to solve this?
Your trouble is that ... you are dividing by zero.
You have two commands:
awk '{s+=$3} END{print $2"\t"s/(NR)}' >> mean;
awk '{sum+=$3;sumsq+=$3*$3} END {print $2"\t"sqrt(sumsq/NR - (sum/NR)^2)}' >>sd
The first command reads from standard input to EOF. The second command is then run, tries to read standard input, but finds that it is empty, so it has zero records read, so NR is zero, and you are dividing by 0, and crashing.
You will need to deal with both the mean and the standard deviation in a single command.
awk '{s1 += $3; s2 += $3*$3}
END { if (NR > 0){
print $2 "\t" s1 / NR;
print $2 "\t" sqrt(s2 / NR - (s1/NR)^2);
}
}'
This avoids divide-by-zero errors.
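A quick sanity check of the combined command with made-up input (the values 2, 4, 6 are arbitrary: mean 4, population standard deviation sqrt(8/3) ≈ 1.633):

```shell
printf 'a b 2\na b 4\na b 6\n' |
  awk '{ s1 += $3; s2 += $3 * $3 }
       END { if (NR > 0) {
               print $2 "\t" s1 / NR                        # mean
               print $2 "\t" sqrt(s2 / NR - (s1 / NR)^2)    # standard deviation
             } }'
```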