awk not rounding with OFMT and $0 - awk

I'm printing an array with 100 columns and I would like all columns to have 2 decimals. I would like to use print $0 and not have to individually specify the format for all columns.
OFMT does not seem to work with $0:
echo '0.77767686 0.76555555 0.6667667 0.77878878' |awk '{CONVFMT="%.2g";OFMT="%.2g";print ($0+0);print ($0+0)"";print $0}'
Results:
0.78
0.78
0.77767686 0.76555555 0.6667667 0.77878878

Note that all input is treated as strings until implicitly converted by how it is used.
OFMT is used when numbers are printed, e.g.:
<<< 0.77767686 awk '{ print 0+$0 }' OFMT='%.2g'
CONVFMT is used when numbers are converted to strings, for instance by concatenation, e.g.:
<<< 0.77767686 awk '{ print "" 0+$0 }' CONVFMT='%.2g'
Output in both cases:
0.78
The latter converts $0 into a number and then concatenates it with the empty string.
To achieve this for every column I would suggest using a sensible setting of the input and output record separators:
<<< '0.77767686 0.76555555 0.6667667 0.77878878' \
awk '{ print 0+$0 RT }' CONVFMT='%.2g' RS='[ \t\n]+' ORS=''
Note the two conversions: first to a number with 0+$0, then back to a string by concatenating it with RT, which is set to the text that matched the record separator. RT is specific to GNU awk; for a more portable solution, use a loop, e.g.:
<<< '0.77767686 0.76555555 0.6667667 0.77878878' \
awk '{ for (i=1; i<=NF; i++) $i+=0 } 1' CONVFMT='%.2g'
Output in both cases:
0.78 0.77 0.67 0.78
Edit - Responding to @BeeOnRope
@BeeOnRope is correct: OFMT is used as the format specifier when the print function calls sprintf(), while CONVFMT is used in other conversions. Here is an example that illustrates the difference:
<<< 0.77767686 awk '{ n=0+$1; s=""n; print n, s }' OFMT='%.2g' CONVFMT='%.3g'
Output:
0.78 0.778
Two relevant sections from the GNU awk manual:
https://www.gnu.org/software/gawk/manual/html_node/OFMT.html
https://www.gnu.org/software/gawk/manual/html_node/Strings-And-Numbers.html
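One detail worth keeping in mind when experimenting with these two variables: values that are whole numbers are converted as integers, so OFMT and CONVFMT only affect numbers with a fractional part. A minimal sketch (the function and variable names are made up for illustration):

```shell
# Hypothetical demo: integers ignore OFMT/CONVFMT, fractional values do not.
show_formats() {
    awk 'BEGIN {
        OFMT = "%.2g"; CONVFMT = "%.2g"
        whole = 17; frac = 3.14159
        print whole, frac        # numbers printed directly: OFMT applies to frac only
        s = whole " " frac       # concatenation: CONVFMT applies to frac only
        print s
    }'
}

show_formats    # prints "17 3.1" twice
```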

Why don't you use a for loop?
echo '0.77767686 0.76555555 0.6667667 0.77878878' |awk '{ for (i=1; i<=NF; i++) printf "%.2f\n", $i }'
Results:
0.78
0.77
0.67
0.78
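If the rounded values should stay on one line, as in the question, one option is to rewrite each field with sprintf and let awk rebuild the record. This is only a sketch: the name round_fields is made up, and the rebuilt record is joined with the default OFS (a single space), so the original spacing is not preserved.

```shell
# Hypothetical helper: round every field to two decimals, one record per line.
round_fields() {
    awk '{
        for (i = 1; i <= NF; i++)
            $i = sprintf("%.2f", $i)   # assigning a field makes awk rebuild $0
    } 1'
}

printf '%s\n' '0.77767686 0.76555555 0.6667667 0.77878878' | round_fields
# prints: 0.78 0.77 0.67 0.78
```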

As others have mentioned you need to treat the field as a number to get a conversion. To combine some other ideas you can try:
awk '{ for (i=1; i<=NF; i++) $i += 0; print }'
That will convert every field to a number. You can just convert individual fields with $7 += 0 and so on. You could get fancier by using if (match($i, ...)) with some regexp to select only the numbers you want to convert.
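A possible shape for the "fancier" variant, assuming that only fields written as plain decimals (digits, a dot, digits) should be touched. It uses the ~ operator rather than match(), and the name round_numeric_fields is made up for illustration:

```shell
# Hypothetical helper: round only the fields that look like decimal numbers.
round_numeric_fields() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^-?[0-9]+\.[0-9]+$/)    # optional sign, digits, dot, digits
                $i = sprintf("%.2f", $i)
    } 1'
}

printf '%s\n' 'id42 0.77767686 n/a 3 1.5' | round_numeric_fields
# prints: id42 0.78 n/a 3 1.50
```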

Related

awk match pattern and convert number to different unit

I have a csv file that contains this kind of values:
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
vm48,8,32794384Ki,16257312Ki
vm48,8,30223304245,15223072Ki
vm49,8,32794384Ki,16257320Ki
vm49,8,30223304245,15223080Ki
Columns 3 and 4 are memory values expressed either in bytes or in kibibytes. The problem is that the "Ki" suffix appears inconsistently throughout the CSV file, particularly in column 3.
So to make the file consistent, I need to convert everything in bytes. So basically, any value matching a trailing "Ki" needs to have its numeric value multiplied by 1024, and then replace the corresponding XXXXXKi match.
The reason why I want to do it with awk is because I am already using awk to generate that csv format, but I am happy to do it with sed too.
This is my code so far but obviously it's wrong as it's multiplying any value in columns 3 and 4 by 1024 even though it does not match "Ki". I am not sure at this point how to ask awk "if you see Ki at the end, then multiply by 1024".
kubectl describe node --context=$context| sed -E '/Name:|cpu:|ephemeral-storage:|memory:/!d' | sed 's/\s//g' | awk '
BEGIN {FS = ":"; OFS = ","}
{record[$1] = $2}
$1 == "memory" {print record["Name"], record["cpu"], record["ephemeral-storage"], record["memory"]}
' | awk -F, '{print $1,$2,$3,$3*1024,$4,$4*1024}' >> describe_nodes.csv
Edit: I made a mistake, you need to multiply by 128 to convert KiB in bytes, not 1024.
"if you see Ki at the end, then multiply by 1024"
You may use:
awk 'BEGIN{FS=OFS=","} $3 ~ /Ki$/ {$3 *= 1024} $4 ~ /Ki$/ {$4 *= 1024} 1' file
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920
vm48,8,33581449216,16647487488
vm48,8,30223304245,15588425728
vm49,8,33581449216,16647495680
vm49,8,30223304245,15588433920
Or a bit shorter:
awk 'BEGIN{FS=OFS=","} {
for (i=3; i<=4; ++i) $i ~ /Ki$/ && $i *= 1024} 1' file
With your shown samples, please try the following awk code. In short: loop over the fields from the 3rd onwards and, if a value ends with Ki (matched case-insensitively), multiply it by 128; then print every line, edited or not.
awk 'BEGIN{FS=OFS=","} {for(i=3;i<=NF;i++){if($i~/[Kk][Ii]$/){$i *= 128}}} 1' Input_file
You could try numfmt:
$ numfmt -d, --field 3,4 --from=auto --to=none <<EOF
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
EOF
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920
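The awk answers above rely on awk taking the leading digits of a string such as 32794384Ki when it is used as a number. A slightly more explicit sketch strips the suffix first and formats the result without an exponent. It assumes Ki means kibibytes (1024 bytes), which matches the numfmt output shown above; the name ki_to_bytes is made up.

```shell
# Hypothetical helper: convert "<digits>Ki" in columns 3..NF to plain bytes.
ki_to_bytes() {
    awk 'BEGIN { FS = OFS = "," }
    {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^[0-9]+Ki$/) {
                sub(/Ki$/, "", $i)                # drop the suffix explicitly
                $i = sprintf("%.0f", $i * 1024)   # assumed factor: 1 KiB = 1024 bytes
            }
    } 1'
}

printf '%s\n' 'vm47,8,32794384Ki,16257320Ki' 'vm47,8,30223304245,15223080Ki' | ki_to_bytes
# prints:
# vm47,8,33581449216,16647495680
# vm47,8,30223304245,15588433920
```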

Trying to read data from two files, and subtract values from both files using awk

I have two files
0.975301988947238963 1.75276754663189283 2.00584
0.0457467532388459441 1.21307648993841410 1.21394
-0.664000617674924687 1.57872850852906366 1.71268
-0.812129324498058969 4.86617859243825635 4.93348
and
1.98005959631337536 -3.78935155011290536 4.27549
-1.04468782080821154 4.99192849476267053 5.10007
-1.47203672235857397 -3.15493073343947694 3.48145
2.68001948430755244 -0.0630730371855307004 2.68076
I want to subtract the two values in column 3 of each file.
My first awk statement was
awk 'BEGIN {print "Test"} FNR>1 && FNR==NR { r[$1]=$3; next} FNR>1 { print $3, r[$1], (r[$1]-$3)}' zzz0.dat zzz1.dat
Test
5.10007 -5.10007
3.48145 -3.48145
2.68076 -2.68076
This suggests it does not recognize r[$1]=$3
I created an additional column xyz by
awk 'NR==1{$(NF+1)="xyz"} NR>1{$(NF+1)="xyz"}1' zzz0.dat
then
awk 'BEGIN {print "Test"} FNR>1 && FNR==NR { xyz[$4]=$3; next} FNR>1 { print $3, xyz[$4], (xyz[$4]-$3)}' zzz00.dat zzz11.dat
Test
5.10007 4.93348 -0.16659
3.48145 4.93348 1.45203
2.68076 4.93348 2.25272
This now shows three columns, but xyz[$4] only ever holds the value from the last line of the first file, instead of building an array.
My real files have thousands of lines. How can I resolve this problem ?
You can do it relatively easily using a numeric index for your array. For example:
awk 'NR==FNR {a[++n]=$3; next} o<n{++o; printf "%lf - %lf = %lf\n", a[o], $3, a[o]-$3}' file1 file2
That way you preserve the ordering of the records across files. Without a numeric index, the arrays are associative and there is no specific ordering preserved.
Example Use/Output
With your files in file1 and file2 respectively, you would have:
$ awk 'NR==FNR {a[++n]=$3; next} o<n{++o; printf "%lf - %lf = %lf\n", a[o], $3, a[o]-$3}' file1 file2
2.005840 - 4.275490 = -2.269650
1.213940 - 5.100070 = -3.886130
1.712680 - 3.481450 = -1.768770
4.933480 - 2.680760 = 2.252720
Let me know if that is what you intended or if you have any further questions. If I missed your intent, drop a comment and I will help further.
If the records are aligned in both files, the easiest way is
$ paste file1 file2 | awk '{print $3,$6,$3-$6}'
2.00584 4.27549 -2.26965
1.21394 5.10007 -3.88613
1.71268 3.48145 -1.76877
4.93348 2.68076 2.25272
If you're only interested in the difference, change it to print $3-$6.
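Another way to pair line k of one file with line k of the other is to key the array by FNR, which avoids keying on $1 (the values in column 1 differ between the two files, so r[$1] never finds a match). A minimal sketch, with a made-up function name and two small temporary files standing in for the real data:

```shell
# Hypothetical helper: print (column 3 of file $1) - (column 3 of file $2), line by line.
subtract_col3() {
    awk 'NR == FNR { first[FNR] = $3; next }      # first file: store column 3 by line number
         (FNR in first) { printf "%.5f\n", first[FNR] - $3 }' "$1" "$2"
}

# Demo: third-column values taken from the question, other columns shortened.
dir=$(mktemp -d)
printf '%s\n' '0.97 1.75 2.00584' '0.04 1.21 1.21394' > "$dir/file1"
printf '%s\n' '1.98 -3.78 4.27549' '-1.04 4.99 5.10007' > "$dir/file2"
subtract_col3 "$dir/file1" "$dir/file2"
rm -rf "$dir"
# prints:
# -2.26965
# -3.88613
```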

awk floating point comparison not working

I have an input file with x1, x2 and x values, and I want to check whether x is the midpoint between x1 and x2.
But the comparison is failing.
sample input file
x1=20.9280 x2=20.9600 x=20.9440
x1=20.9280 x2=20.9600 x=20.9440
x1=22.7840 x2=22.8160 x=22.8000
Awk command
awk -F'[ =]' '{ if(($2 + $4)/2 != ($6)) print ($2 + $4)/2, " ", $6;}' sample
OUTPUT
20.944 20.9440
20.944 20.9440
22.8 22.8000
Comparison is failing due to extra zeros after decimal points. Please help to fix it.
This is the floating-point comparison issue common to all platforms: the computed average and the parsed value of x can differ in their last binary digits.
You can make the comparison reliable by formatting the average as a string with 4 decimal places, so that it is compared with $6 as text:
awk -F'[ =]+' '{avg = sprintf("%.4f", ($2 + $4) / 2)} avg != $6 { print avg, $6 }' file
If you have gnu awk then you can set precision to a lower number:
awk -M -v PREC=30 -F'[ =]+' '{avg = ($2 + $4) / 2; $6 += 0} avg != $6 { print avg, $6 }' file
Not really an answer, but to demonstrate: you are comparing floating-point numbers, and they are not equal. I replaced print with printf and used a format with enough decimals (20, %.20f):
$ awk -F'[ =]' '{
if(($2 + $4)/2 != ($6))
printf "%.20f %.20f\n", ($2 + $4)/2, $6
}' file
Output:
20.94400000000000261480 20.94399999999999906208
20.94400000000000261480 20.94399999999999906208
22.79999999999999715783 22.80000000000000071054
So use sprintf and appropriate modifiers (see the printf I used) to control the values.
As others have pointed out, you are most likely tripping over the common floating-point arithmetic issue. Since all of your input values have the same number of decimal places, you can remove the dots to treat the inputs as integers, and multiply x by 2 instead of dividing the sum by 2 so that the comparison stays in integers:
$ awk -F'[ =]' '{o=$0; gsub(/\./,"")} ($6*2) == ($2+$4){$0=o; print ($2+$4)/2, $6}' file
20.944 20.9440
20.944 20.9440
22.8 22.8000
$ awk -F'[ =]' '{o=$0; gsub(/\./,"")} ($6*2) != ($2+$4){$0=o; print ($2+$4)/2, $6}' file
$
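A further option, not taken from the answers above, is to compare with a small tolerance instead of testing for exact equality. The sketch below assumes a tolerance of 1e-9 is acceptable for values with four decimal places; the names check_midpoint, abs and eps are made up for illustration.

```shell
# Hypothetical helper: flag lines where x is not the midpoint of x1 and x2.
check_midpoint() {
    awk -F'[ =]+' '
    function abs(v) { return v < 0 ? -v : v }
    BEGIN { eps = 1e-9 }                    # assumed tolerance
    {
        mid = ($2 + $4) / 2
        status = (abs(mid - $6) <= eps) ? "ok" : "MISMATCH"
        print status, $0
    }'
}

printf '%s\n' 'x1=20.9280 x2=20.9600 x=20.9440' 'x1=1.0 x2=3.0 x=2.5' | check_midpoint
# prints:
# ok x1=20.9280 x2=20.9600 x=20.9440
# MISMATCH x1=1.0 x2=3.0 x=2.5
```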

awk / gawk printf when variable format string, changing zero to dash

I have a table of numbers I am printing in awk using printf.
The printf accomplishes some truncation for the numbers.
(cat <<E\OF
Name,Where,Grade
Bob,Sydney,75.12
Sue,Sydney,65.2475
George,Sydney,84.6
Jack,Sydney,35
Amy,Sydney,
EOF
)|gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3=0}
printf("%s,%s,%d \n",$1,$2,$3)}'
This produces:
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,0
Amy,Sydney,0
What I want is to display scores which are less than 50, or missing, as a dash ("-").
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
This requires the 3rd string format in printf change from %d to %s.
So in some rows, the third column should be a value, and in some rows, the third column should be a string. How can I tell this to GAWK? Or should I just pipe through another awk to re-format?
$ gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3="-"} else {$3=sprintf("%d", $3)}
printf("%s,%s,%s \n",$1,$2,$3)}' ip.txt
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
use if-else to assign value to $3 as needed
sprintf lets you assign the result of formatting to a variable
for this case, you could use int function as well
now printf will have %s for $3 as well
Assuming you missed the commas for the header and space after third column is not needed, you could do this with a simple one-liner
$ awk -F, -v OFS=, 'NR>1{$3 = $3 < 50 ? "-" : int($3)} 1' ip.txt
Name,Where,Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
?: ternary operator is alternate for if-else
1 is an awk idiom to print contents of $0
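The one-liner treats a missing grade as "less than 50" because of how the empty field compares. If you prefer to spell that case out, a sketch along these lines may read more clearly; the function name mask_low_grades is made up, and the header is assumed to be comma-separated:

```shell
# Hypothetical helper: show grades below 50, or missing, as a dash.
mask_low_grades() {
    awk 'BEGIN { FS = OFS = "," }
    NR == 1 { print; next }                                # pass the header through
    { $3 = ($3 == "" || $3 + 0 < 50) ? "-" : int($3) }     # explicit missing-value test
    1'
}

printf '%s\n' 'Name,Where,Grade' 'Bob,Sydney,75.12' 'Jack,Sydney,35' 'Amy,Sydney,' | mask_low_grades
# prints:
# Name,Where,Grade
# Bob,Sydney,75
# Jack,Sydney,-
# Amy,Sydney,-
```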

How to split a delimited string into an array in awk?

How to split the string when it contains pipe symbols | in it.
I want to split them to be in array.
I tried
echo "12:23:11" | awk '{split($0,a,":"); print a[3] a[2] a[1]}'
Which works fine. If my string is like "12|23|11" then how do I split them into an array?
Have you tried:
echo "12|23|11" | awk '{split($0,a,"|"); print a[3],a[2],a[1]}'
To split a string to an array in awk we use the function split():
awk '{split($0, array, ":")}'
# \/ \___/ \_/
# | | |
# string | delimiter
# |
# array to store the pieces
If no separator is given, it uses the FS, which defaults to the space:
$ awk '{split($0, array); print array[2]}' <<< "a:b c:d e"
c:d
We can give a separator, for example a colon (:):
$ awk '{split($0, array, ":"); print array[2]}' <<< "a:b c:d e"
b c
Which is equivalent to setting it through the FS:
$ awk -F: '{split($0, array); print array[2]}' <<< "a:b c:d e"
b c
In GNU Awk you can also provide the separator as a regexp:
$ awk '{split($0, array, ":*"); print array[2]}' <<< "a:::b c::d e"   # note multiple :
b c
And even see what the delimiter was on every step by using its fourth parameter:
$ awk '{split($0, array, ":*", sep); print array[2]; print sep[1]}' <<< "a:::b c::d e"
b c
:::
Let's quote the man page of GNU awk:
split(string, array [, fieldsep [, seps ] ])
Divide string into pieces separated by fieldsep and store the pieces in array and the separator strings in the seps array. The first piece is stored in array[1], the second piece in array[2], and so forth. The string value of the third argument, fieldsep, is a regexp describing where to split string (much as FS can be a regexp describing where to split input records). If fieldsep is omitted, the value of FS is used. split() returns the number of elements created. seps is a gawk extension, with seps[i] being the separator string between array[i] and array[i+1]. If fieldsep is a single space, then any leading whitespace goes into seps[0] and any trailing whitespace goes into seps[n], where n is the return value of split() (i.e., the number of elements in array).
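Because split() returns the number of pieces, the result can be processed without knowing in advance how many items there are. A small sketch for the pipe-separated case, using the bracket expression [|] so the pipe is unambiguously literal; the name reverse_pieces is made up for illustration:

```shell
# Hypothetical helper: split on "|" and print the pieces in reverse order.
reverse_pieces() {
    awk '{
        n = split($0, part, "[|]")    # n is the number of pieces
        out = ""
        for (i = n; i >= 1; i--)
            out = out part[i]
        print out
    }'
}

printf '%s\n' '12|23|11' | reverse_pieces
# prints: 112312
```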
Please be more specific! What do you mean by "it doesn't work"?
Post the exact output (or error message), your OS and awk version:
% awk -F\| '{
for (i = 0; ++i <= NF;)
print i, $i
}' <<<'12|23|11'
1 12
2 23
3 11
Or, using split:
% awk '{
n = split($0, t, "|")
for (i = 0; ++i <= n;)
print i, t[i]
}' <<<'12|23|11'
1 12
2 23
3 11
Edit: on Solaris you'll need to use the POSIX awk (/usr/xpg4/bin/awk) in order to process 4000 fields correctly.
I do not like the echo "..." | awk ... solution, as it makes unnecessary fork and exec system calls.
I prefer Dimitre's solution with a little twist
awk -F\| '{print $3 $2 $1}' <<<'12|23|11'
Or a bit shorter version:
awk -F\| '$0=$3 $2 $1' <<<'12|23|11'
Here the assignment builds the output record; its value is non-empty and non-zero, which counts as a true condition, so the record gets printed.
In this specific case the stdin redirection can be spared with setting an awk internal variable:
awk -v T='12|23|11' 'BEGIN{split(T,a,"|");print a[3] a[2] a[1]}'
I used ksh quite a while, but in bash this could be managed by internal string manipulation. In the first case the original string is split by internal terminator. In the second case it is assumed that the string always contains digit pairs separated by a one character separator.
T='12|23|11';echo -n ${T##*|};T=${T%|*};echo ${T#*|}${T%|*}
T='12|23|11';echo ${T:6}${T:3:2}${T:0:2}
The result in all cases is
112312
Actually awk has a feature called 'Input Field Separator Variable'. This is how to use it. It's not really an array, but it uses the internal $ variables. For splitting a simple string it is easier.
echo "12|23|11" | awk 'BEGIN {FS="|";} { print $1, $2, $3 }'
I know this is kind of an old question, but I thought someone might like my trick, especially since this solution is not limited to a specific number of items.
# Convert to an array
_ITEMS=($(echo "12|23|11" | tr '|' '\n'))
# Output array items
for _ITEM in "${_ITEMS[@]}"; do
echo "Item: ${_ITEM}"
done
The output will be:
Item: 12
Item: 23
Item: 11
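A variation on the same idea that avoids the echo and tr subprocesses is to let the shell split on | through IFS. This is a sketch in plain POSIX sh rather than a bash array; the name split_pipe is made up, and it assumes filename expansion was enabled beforehand, since it switches globbing off and back on.

```shell
# Hypothetical helper: split "$1" on "|" and print one item per line.
split_pipe() {
    old_ifs=$IFS
    IFS='|'
    set -f            # no filename expansion while splitting
    set -- $1         # unquoted on purpose: word splitting on "|"
    set +f
    IFS=$old_ifs
    for item in "$@"; do
        printf 'Item: %s\n' "$item"
    done
}

split_pipe '12|23|11'
# prints:
# Item: 12
# Item: 23
# Item: 11
```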
Joke? :)
How about echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
This is my output:
p2> echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
112312
so I guess it's working after all..
echo "12|23|11" | awk '{split($0,a,"|"); print a[3] a[2] a[1]}'
should work.
code
awk -F"|" '{split($0,a); print a[1],a[2],a[3]}' <<< '12|23|11'
output
12 23 11
The challenge: parse and store split strings with spaces and insert them into variables.
Solution: best and simple choice for you would be convert the strings list into array and then parse it into variables with indexes. Here's an example how you can convert and access the array.
Example: parse disk space statistics on each line:
sudo df -k | awk 'NR>1' | while read -r line; do
#convert into array:
array=($line)
#variables:
filesystem="${array[0]}"
size="${array[1]}"
capacity="${array[4]}"
mountpoint="${array[5]}"
echo "filesystem:$filesystem|size:$size|usage:$capacity|mountpoint:$mountpoint"
done
#output:
filesystem:/dev/dsk/c0t0d0s1|size:4000|usage:40%|mountpoint:/
filesystem:/dev/dsk/c0t0d0s2|size:5000|usage:50%|mountpoint:/usr
filesystem:/proc|size:0|usage:0%|mountpoint:/proc
filesystem:mnttab|size:0|usage:0%|mountpoint:/etc/mnttab
filesystem:fd|size:1000|usage:10%|mountpoint:/dev/fd
filesystem:swap|size:9000|usage:9%|mountpoint:/var/run
filesystem:swap|size:1500|usage:15%|mountpoint:/tmp
filesystem:/dev/dsk/c0t0d0s3|size:8000|usage:80%|mountpoint:/export
awk -F'[|]' '{print $1"\t"$2"\t"$3}' <<<'12|23|11'