awk / gawk printf with a variable format string: changing zero to a dash

I have a table of numbers I am printing in awk using printf.
The printf accomplishes some truncation for the numbers.
(cat <<E\OF
Name,Where,Grade
Bob,Sydney,75.12
Sue,Sydney,65.2475
George,Sydney,84.6
Jack,Sydney,35
Amy,Sydney,
EOF
)|gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3=0}
printf("%s,%s,%d \n",$1,$2,$3)}'
This produces:
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,0
Amy,Sydney,0
What I want is to display scores which are less than 50, or missing, as a dash ("-").
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
This requires the third conversion in the printf format string to change from %d to %s.
So in some rows, the third column should be a value, and in some rows, the third column should be a string. How can I tell this to GAWK? Or should I just pipe through another awk to re-format?

$ gawk 'BEGIN{FS=","}
FNR==1 {print("Name","Where","Grade");next}
{if ($3<50) {$3="-"} else {$3=sprintf("%d", $3)}
printf("%s,%s,%s \n",$1,$2,$3)}' ip.txt
Name Where Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
Use if-else to assign the value of $3 as needed.
sprintf lets you assign the result of formatting to a variable.
For this case, you could use the int function as well.
Now printf can use %s for $3 in every row.
Assuming you missed the commas in the header and the space after the third column is not needed, you could do this with a simple one-liner:
$ awk -F, -v OFS=, 'NR>1{$3 = $3 < 50 ? "-" : int($3)} 1' ip.txt
Name,Where,Grade
Bob,Sydney,75
Sue,Sydney,65
George,Sydney,84
Jack,Sydney,-
Amy,Sydney,-
The ?: ternary operator is an alternative to if-else.
1 is an awk idiom that prints the contents of $0.
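Since the title asks about a variable format string: printf accepts any string expression as its format, so the format itself can be chosen per row. A minimal sketch, assuming the header should keep its commas; the function name mask_low_grades and the variables fmt and val are made up for illustration:

```shell
# mask_low_grades: hypothetical helper name; reads CSV on stdin
mask_low_grades() {
    awk -F, '
    NR == 1 { print; next }               # pass the header through unchanged
    {
        if ($3 == "" || $3 + 0 < 50) {    # missing or below 50: print a dash
            fmt = "%s,%s,%s\n"; val = "-"
        } else {                          # otherwise truncate with %d
            fmt = "%s,%s,%d\n"; val = $3
        }
        printf fmt, $1, $2, val           # the format is an ordinary variable
    }'
}

printf 'Name,Where,Grade\nBob,Sydney,75.12\nJack,Sydney,35\nAmy,Sydney,\n' | mask_low_grades
# prints:
# Name,Where,Grade
# Bob,Sydney,75
# Jack,Sydney,-
# Amy,Sydney,-
```

In practice, converting the value to a string first and keeping a single %s format is usually simpler; this only shows that switching the format is possible.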

Related

AWK print specific column if chosen column is only numeric else print another column AIX

Is it possible to use an awk if condition to print a specific column depending on whether a chosen column contains only digits or only letters?
Below is an example:
echo "This is example test 1 for VAL1 value = int or VAL2 = string" | awk '{if ($5 == [0-9]) print $10 else print $14}'
OR:
echo "This is example test one for VAL1 value = int or VAL2 = string" | awk '{if ($5 == [A-Z]) print $14; else print $10}'
The two examples above try to determine whether column 5 is all digits or all letters, and print a specific column based on the result. In my data it can only be one or the other, never a mix of digits and letters.
How can I do this with awk?
Here is the problem:
$5 == [0-9]
In awk, the == operator tests equality; it does not evaluate a regex. For a regex match we must use the ~ operator and enclose the regex in /.../ notation.
So all of these awk solutions should work for you:
# check presence of a digit anywhere in the fifth field
awk '{print ($5 ~ /[0-9]/ ? $10 : $14)}'
# check if fifth field contains 1+ digits only
awk '{print ($5 ~ /^[0-9]+$/ ? $10 : $14)}'
# awk shorthand to check if $5 is numeric value
awk '{print ($5+0 == $5 ? $10 : $14)}'
Similarly, to check for an uppercase character, use the following; for lowercase input such as "one", widen the class to [A-Za-z]:
awk '{print ($5 ~ /[A-Z]/ ? $14 : $10)}'
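To try the check on the two sample lines from the question, here is a small sketch using the anchored variant; pick_column is a hypothetical wrapper name, not part of the original answer:

```shell
# pick_column: hypothetical name; prints $10 when field 5 is all digits, else $14
pick_column() {
    awk '{ print ($5 ~ /^[0-9]+$/ ? $10 : $14) }'
}

echo "This is example test 1 for VAL1 value = int or VAL2 = string" | pick_column
# prints: int
echo "This is example test one for VAL1 value = int or VAL2 = string" | pick_column
# prints: string
```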

awk match pattern and convert number to different unit

I have a csv file that contains this kind of values:
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
vm48,8,32794384Ki,16257312Ki
vm48,8,30223304245,15223072Ki
vm49,8,32794384Ki,16257320Ki
vm49,8,30223304245,15223080Ki
Columns 3 and 4 are memory values expressed either in bytes or in kibibytes. The problem is that the "Ki" suffix appears inconsistently throughout the CSV file, particularly in column 3.
So to make the file consistent, I need to convert everything to bytes. Basically, any value with a trailing "Ki" needs to have its numeric part multiplied by 1024, and the result should replace the corresponding XXXXXKi match.
The reason why I want to do it with awk is because I am already using awk to generate that csv format, but I am happy to do it with sed too.
This is my code so far but obviously it's wrong as it's multiplying any value in columns 3 and 4 by 1024 even though it does not match "Ki". I am not sure at this point how to ask awk "if you see Ki at the end, then multiply by 1024".
kubectl describe node --context=$context| sed -E '/Name:|cpu:|ephemeral-storage:|memory:/!d' | sed 's/\s//g' | awk '
BEGIN {FS = ":"; OFS = ","}
{record[$1] = $2}
$1 == "memory" {print record["Name"], record["cpu"], record["ephemeral-storage"], record["memory"]}
' | awk -F, '{print $1,$2,$3,$3*1024,$4,$4*1024}' >> describe_nodes.csv
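Edit: I made a mistake, you need to multiply by 128 to convert KiB to bytes, not 1024. (Note: 1 KiB is 1024 bytes, so 1024 is the right factor for kibibytes; 128 would only apply if the values were kibibits.)
"if you see Ki at the end, then multiply by 1024"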
You may use:
awk 'BEGIN{FS=OFS=","} $3 ~ /Ki$/ {$3 *= 1024} $4 ~ /Ki$/ {$4 *= 1024} 1' file
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920
vm48,8,33581449216,16647487488
vm48,8,30223304245,15588425728
vm49,8,33581449216,16647495680
vm49,8,30223304245,15588433920
Or a bit shorter:
awk 'BEGIN{FS=OFS=","} {
for (i=3; i<=4; ++i) $i ~ /Ki$/ && ($i *= 1024)} 1' file
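The commands above rely on awk using the leading numeric part of a string such as 32794384Ki when it appears in arithmetic. A sketch that strips the suffix explicitly before multiplying might look like this; ki_to_bytes is a hypothetical helper name and the sample values are made up so the arithmetic is easy to check:

```shell
# ki_to_bytes: hypothetical helper; converts a trailing Ki in fields 3..NF to bytes
ki_to_bytes() {
    awk 'BEGIN { FS = OFS = "," }
    {
        for (i = 3; i <= NF; i++)
            if (sub(/Ki$/, "", $i))    # sub() returns 1 only when the suffix was removed
                $i = $i * 1024         # 1 KiB = 1024 bytes
        print
    }'
}

# made-up small values, not taken from the question
printf 'vm1,8,2Ki,3Ki\nvm1,8,4096,1Ki\n' | ki_to_bytes
# prints:
# vm1,8,2048,3072
# vm1,8,4096,1024
```

Values without the suffix are left untouched because sub() returns 0 for them.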
With your shown samples, please try the following awk code. It loops over the fields from the third onwards, multiplies any value ending in Ki (matched case-insensitively) by 128, as the question's edit asks, and then prints every line, edited or not. Use 1024 instead if the values are kibibytes.
awk 'BEGIN{FS=OFS=","} {for(i=3;i<=NF;i++){if($i~/[Kk][Ii]$/){$i *= 128}}} 1' Input_file
You could try numfmt:
$ numfmt -d, --field 3,4 --from=auto --to=none <<EOF
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
EOF
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920

Awk Field number of matched pattern

I was wondering if there's a built in command in awk to get the field number of the phrase that you just matched.
Banana is yellow.
awk {
/yellow/{ for (i=1;i<=NF;i++) if($i ~/yellow/) print $i}'
Is there a way to avoid writing the loop?
Your command doesn't work when I test it. Here's my version:
echo "banana is yellow" | awk '{for (i=1;i<=NF;i++) if($i ~/yellow/) print i}'
The output is:
3
As far as I know, there is no such built-in feature. To improve your command: the /yellow/ pattern at the beginning is not necessary, and $i prints the matching field rather than the field number that you need.
Alternatively, you can use an array to store each field and its corresponding index number, and then print the field number with arr["yellow"].
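A rough sketch of that array idea follows; field_number and pos are hypothetical names. Note that it still loops once to fill the array, and the lookup needs the exact field text, so "yellow." with a trailing period would not be found under "yellow":

```shell
# field_number: hypothetical helper; $1 is the exact word to look up
field_number() {
    awk -v word="$1" '{
        split("", pos)                           # empty the lookup table for each record
        for (i = NF; i >= 1; i--) pos[$i] = i    # walk backwards so the first occurrence wins
        if (word in pos) print pos[word]
    }'
}

echo "banana is yellow" | field_number yellow
# prints: 3
```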
If the input is a single-line string, you can set the record separator to the field separator. Each word then becomes its own record, so NR gives the position:
awk 'BEGIN{RS=FS}/yellow/{print NR}' <<< 'banana is yellow'
3

Awk editing with field delimiter

Imagine if you have a string like this
Amazon.com Inc.:181,37:184,22
and you do awk -F':' '{print $1 ":" $2 ":" $3}' then it will output the same thing.
But can you declare $2 in this example so it only outputs 181 and not ,37?
Thanks in advance!
You can change the field separator so that it contains either : or ,, using a bracket expression:
awk -F'[:,]' '{ print $2 }' file
If you are worried that , may appear in the first field (which will break this approach), you could use split:
awk -F: '{ split($2, a, /,/); print a[1] }' file
This splits the second field on the comma and then prints the first part. Any other fields containing a comma are unaffected.
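If the goal is to reprint the whole line with only the second field shortened, one possible sketch is to edit $2 in place and let awk rebuild the record; keep_before_comma is a hypothetical helper name:

```shell
# keep_before_comma: hypothetical helper; shortens field 2 and reprints the record
keep_before_comma() {
    awk 'BEGIN { FS = OFS = ":" } { sub(/,.*/, "", $2); print }'
}

echo "Amazon.com Inc.:181,37:184,22" | keep_before_comma
# prints: Amazon.com Inc.:181:184,22
```

Setting OFS to the same colon matters here: modifying a field makes awk rejoin the record with OFS, which would otherwise be a space.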

Adding columns with awk. What is wrong with this awk command?

I want to add two columns to a file with ~10,000 columns. I want to insert the number 22 as the first column on each row. Then I want the original first column as the second column, then the line number (NR) as the third column, and after that the rest of the original columns. I thought I could do that with the following awk line:
awk '{print 22, $1, NR; for(i=2;i<=NF;++i) print $i}' file
It prints the first three columns (22, $1, NR) well, but after that, there is a new line started for each value, so the file is printed like this:
22 $1 NR
$2
$3
$4
etc...
instead of:
22 $1 NR $2 $3 $4 etc...
What did I do wrong?
How about using printf instead, since print adds a newline after every call?
awk '{printf("%s %s %s", 22, $1, NR); for(i=2;i<=NF;++i) printf(" %s", $i); printf("\n")}' file
Or you can play with ORS and OFS, the Output Record Separator and the Output Field Separator. Normally you set those in a BEGIN block like this:
awk 'BEGIN { ORS = " " } {print 22, $1, NR; for(i=2;i<=NF;++i) print $i; printf "\n"}' file
Note that the extra printf "\n" at the end of each record is needed, else everything ends up on one line. Each output line will also end with a trailing space.
Read more in the gawk manual, in the section on output separators.
For more precise control over the output format than print provides (print adds a newline by default), use printf.
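With ~10,000 columns, another option that avoids the loop entirely is to rewrite $1 and let awk rebuild the record. This is only a sketch, assuming whitespace-separated input where collapsing runs of whitespace to single spaces is acceptable; add_columns is a hypothetical helper name:

```shell
# add_columns: hypothetical helper; prepends 22 and inserts NR after the first column
add_columns() {
    awk '{ $1 = 22 OFS $1 OFS NR; print }'
}

printf 'a b c\nd e f\n' | add_columns
# prints:
# 22 a 1 b c
# 22 d 2 e f
```

Assigning to $1 makes awk rejoin all fields with OFS, so the remaining columns follow on the same line without any explicit printing.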