Is it possible to use an awk if condition to print a specific column depending on whether the chosen column contains only numbers or only characters?
below is an example:
echo "This is example test 1 for VAL1 value = int or VAL2 = string" | awk '{if ($5 == [0-9]) print $10 else print $14}'
OR:
echo "This is example test one for VAL1 value = int or VAL2 = string" | awk '{if ($5 == [A-Z]) print $14; else print $10}'
The two examples above are trying to determine, in awk, whether column 5 is all numbers or all characters, and then print a specific column depending on which it is. In my example column 5 can only be one or the other, never a mix of numbers and characters.
How is it possible to do this using awk?
Here is the problem:
$5 == [0-9]
In awk the == operator is used for equality, not for regex evaluation. For a regex match we must use ~, and the regex must be enclosed in /.../ notation.
So all of these awk solutions should work for you:
# check presence of a digit anywhere in the fifth field
awk '{print ($5 ~ /[0-9]/ ? $10 : $14)}'
# check if fifth field contains 1+ digits only
awk '{print ($5 ~ /^[0-9]+$/ ? $10 : $14)}'
# awk shorthand to check if $5 is numeric value
awk '{print ($5+0 == $5 ? $10 : $14)}'
Similarly, to check for an uppercase character, use:
awk '{print ($5 ~ /[A-Z]/ ? $14 : $10)}'
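For example, applied to the first sample line from your question, the whole-field numeric check should print int (column 10), since the fifth field is the number 1:
echo "This is example test 1 for VAL1 value = int or VAL2 = string" | awk '{print ($5 ~ /^[0-9]+$/ ? $10 : $14)}'
int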
I have a CSV file that contains values like these:
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
vm48,8,32794384Ki,16257312Ki
vm48,8,30223304245,15223072Ki
vm49,8,32794384Ki,16257320Ki
vm49,8,30223304245,15223080Ki
Columns 3 and 4 are memory values expressed either in bytes or in kibibytes. The problem is that the "Ki" suffix appears inconsistently throughout the CSV file, particularly in column 3.
So to make the file consistent, I need to convert everything to bytes. Basically, any value with a trailing "Ki" needs its numeric part multiplied by 1024, with the result replacing the corresponding XXXXXKi match.
The reason I want to do it with awk is that I am already using awk to generate that CSV, but I am happy to do it with sed too.
This is my code so far, but it's obviously wrong: it multiplies every value in columns 3 and 4 by 1024 even when it does not end in "Ki". I am not sure how to tell awk "if you see Ki at the end, then multiply by 1024".
kubectl describe node --context=$context| sed -E '/Name:|cpu:|ephemeral-storage:|memory:/!d' | sed 's/\s//g' | awk '
BEGIN {FS = ":"; OFS = ","}
{record[$1] = $2}
$1 == "memory" {print record["Name"], record["cpu"], record["ephemeral-storage"], record["memory"]}
' | awk -F, '{print $1,$2,$3,$3*1024,$4,$4*1024}' >> describe_nodes.csv
Edit: I made a mistake, you need to multiply by 128 to convert KiB in bytes, not 1024.
"if you see Ki at the end, then multiply by 1024
You may use:
awk 'BEGIN{FS=OFS=","} $3 ~ /Ki$/ {$3 *= 1024} $4 ~ /Ki$/ {$4 *= 1024} 1' file
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920
vm48,8,33581449216,16647487488
vm48,8,30223304245,15588425728
vm49,8,33581449216,16647495680
vm49,8,30223304245,15588433920
Or a bit shorter:
awk 'BEGIN{FS=OFS=","} {
for (i=3; i<=4; ++i) $i ~ /Ki$/ && ($i *= 1024)} 1' file
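If you want to fold this into the pipeline from your question, the same check can replace the final awk stage there (a sketch, assuming the intermediate CSV has the four columns name,cpu,ephemeral-storage,memory shown in your samples):
kubectl describe node --context=$context | sed -E '/Name:|cpu:|ephemeral-storage:|memory:/!d' | sed 's/\s//g' | awk '
BEGIN {FS = ":"; OFS = ","}
{record[$1] = $2}
$1 == "memory" {print record["Name"], record["cpu"], record["ephemeral-storage"], record["memory"]}
' | awk 'BEGIN{FS=OFS=","} $3 ~ /Ki$/ {$3 *= 1024} $4 ~ /Ki$/ {$4 *= 1024} 1' >> describe_nodes.csv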
With your shown samples/attempts, please try the following awk code. A simple explanation: traverse the fields from the 3rd field onwards, and if a value ends in Ki (matched case-insensitively), multiply it by 128 (per your edit); finally print all lines, edited or not.
awk 'BEGIN{FS=OFS=","} {for(i=3;i<=NF;i++){if($i~/[Kk][Ii]$/){$i *= 128}}} 1' Input_file
You could try numfmt:
$ numfmt -d, --field 3,4 --from=auto --to=none <<EOF
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
EOF
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920
I was wondering if there's a built-in command in awk to get the field number of the phrase that you just matched.
Banana is yellow.
awk '/yellow/{ for (i=1;i<=NF;i++) if($i ~ /yellow/) print $i}'
Is there a way to avoid writing the loop?
Your command doesn't work when I test it. Here's my version:
echo "banana is yellow" | awk '{for (i=1;i<=NF;i++) if($i ~/yellow/) print i}'
The output is :
3
As far as I know, there's no such built-in feature. To improve your command: the pattern match /yellow/ at the beginning is not necessary, and $i will print the matching field rather than the field number that you need.
Alternatively, you can use an array to store each field and its corresponding index number, and then print the field number with arr["yellow"].
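A minimal sketch of that array approach (assuming the word is matched as a whole field, as in the sample line):
echo "banana is yellow" | awk '{for (i=1;i<=NF;i++) arr[$i]=i; print arr["yellow"]}'
3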
If the input is a one-line string, you can set the record separator to the field separator. Doing so, you can use NR to print the position:
awk 'BEGIN{RS=FS}/yellow/{print NR}' <<< 'banana is yellow'
3
Imagine if you have a string like this
Amazon.com Inc.:181,37:184,22
and you do awk -F':' '{print $1 ":" $2 ":" $3}' then it will output the same thing.
But can you declare $2 in this example so it only outputs 181 and not ,37?
Thanks in advance!
You can change the field separator so that it matches either : or , by using a bracket expression:
awk -F'[:,]' '{ print $2 }' file
If you are worried that , may appear in the first field (which will break this approach), you could use split:
awk -F: '{ split($2, a, /,/); print a[1] }' file
This splits the second field on the comma and then prints the first part. Any other fields containing a comma are unaffected.
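For example, run against the sample string from the question, the bracket-expression version prints just the value before the comma:
echo 'Amazon.com Inc.:181,37:184,22' | awk -F'[:,]' '{print $2}'
181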
I want to add two columns to a file with ~10,000 columns. I want to insert the number 22 as the first column on each row, then the original first column as the second column, then the line number (NR) as the third column, and after that the rest of the original columns. I thought I could do that with the following awk line:
awk '{print 22, $1, NR; for(i=2;i<=NF;++i) print $i}' file
It prints the first three columns (22, $1, NR) well, but after that, there is a new line started for each value, so the file is printed like this:
22 $1 NR
$2
$3
$4
etc...
instead of:
22 $1 NR $2 $3 $4 etc...
What did I do wrong?
How about using printf instead, since print adds a newline?
awk '{printf("%d, %d, %d, ", 22, $1, NR); for(i=2;i<=NF;++i) printf("%d, ", i)}' file
Or you can play with the ORS and OFS, the Output Record Separator and the Output Field Separator. Normally you add those in a BEGIN statement like this:
awk 'BEGIN { ORS = " " } {print 22, $1, NR; for(i=2;i<=NF;++i) print $i}{printf "\n"}' file
Note that an extra printf "\n" is needed, else everything ends up on one line...
Read more in the gawk manual under Output Separators.
For more precise control over the output format than what is provided by print (which prints a newline by default), use printf.
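A minimal illustration of the difference, using a made-up three-field line:
echo "a b c" | awk '{print $1; print $2}'            # two output lines
echo "a b c" | awk '{printf "%s %s\n", $1, $2}'      # one output line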