awk and log2 divisions

I have a tab-delimited file that looks something like this:
foo 0 4
boo 3 2
blah 4 0
flah 1 1
I am trying to calculate the log2 of the ratio between the two numeric columns for each row. My problem is with division by zero.
What I have tried is this:
cat file.txt | awk -v OFS='\t' '{print $1, log($3/$2)/log(2)}'
When there is a zero as the denominator, awk crashes. What I want is some sort of conditional statement that prints "inf" as the result when the denominator is equal to 0.
I am really not sure how to go about this.
Any help would be appreciated.
Thanks

You can implement that as follows (with a few additional tweaks):
awk 'BEGIN{OFS="\t"} {if ($2==0) {print $1, "inf"} else {print $1, log($3/$2)/log(2)}}' file.txt
Explanation:
if ($2==0) {print $1, "inf"} else {...} - First check to see if the 2nd field ($2) is zero. If so, print $1 and inf and move on to the next line; otherwise proceed as usual.
BEGIN{OFS="\t"} - Set OFS inside the awk script; mostly a preference thing.
... file.txt - awk can read files named as arguments, which saves spawning a separate cat process (see the "Useless Use of Cat Award", UUCA).

awk -F'\t' '{print $1,($2 ? log($3/$2)/log(2) : "inf")}' file.txt
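For reuse, the guarded division can be wrapped in a small shell function. This is only a sketch: the function name log2_ratio is hypothetical, and the extra branch that prints -inf when the numerator is 0 is an assumption added because the output of log(0) is not consistent across awk implementations.

```shell
# log2_ratio: hypothetical helper name wrapping the guarded awk program.
# Input: tab-delimited lines "name<TAB>denominator<TAB>numerator".
log2_ratio() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    if ($2 == 0)             # zero denominator: skip the division entirely
      print $1, "inf"
    else if ($3 == 0)        # assumption: report log2(0) explicitly as -inf
      print $1, "-inf"
    else                     # log2(x) = log(x) / log(2)
      print $1, log($3 / $2) / log(2)
  }' "$@"
}

# Usage with the file from the question:
#   log2_ratio file.txt
```

Non-integer results are printed through awk's default OFMT ("%.6g"), so log2(2/3) comes out as -0.584963.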

Related

awk match pattern and convert number to different unit

I have a csv file that contains this kind of values:
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
vm48,8,32794384Ki,16257312Ki
vm48,8,30223304245,15223072Ki
vm49,8,32794384Ki,16257320Ki
vm49,8,30223304245,15223080Ki
Columns 3 and 4 are memory values expressed either in bytes or in kibibytes. The problem is that the "Ki" suffix appears inconsistently throughout the CSV file, particularly in column 3.
So to make the file consistent, I need to convert everything to bytes: any value with a trailing "Ki" needs its numeric value multiplied by 1024, and the result should replace the original XXXXXKi value.
I want to do it with awk because I am already using awk to generate that CSV format, but I am happy to use sed too.
This is my code so far, but it's obviously wrong: it multiplies every value in columns 3 and 4 by 1024, even those that do not end in "Ki". I am not sure how to tell awk "if you see Ki at the end, then multiply by 1024".
kubectl describe node --context=$context| sed -E '/Name:|cpu:|ephemeral-storage:|memory:/!d' | sed 's/\s//g' | awk '
BEGIN {FS = ":"; OFS = ","}
{record[$1] = $2}
$1 == "memory" {print record["Name"], record["cpu"], record["ephemeral-storage"], record["memory"]}
' | awk -F, '{print $1,$2,$3,$3*1024,$4,$4*1024}' >> describe_nodes.csv
Edit: I made a mistake, you need to multiply by 128 to convert KiB to bytes, not 1024.
"if you see Ki at the end, then multiply by 1024"
You may use:
awk 'BEGIN{FS=OFS=","} $3 ~ /Ki$/ {$3 *= 1024} $4 ~ /Ki$/ {$4 *= 1024} 1' file
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920
vm48,8,33581449216,16647487488
vm48,8,30223304245,15588425728
vm49,8,33581449216,16647495680
vm49,8,30223304245,15588433920
Or a bit shorter:
awk 'BEGIN{FS=OFS=","} {
for (i=3; i<=4; ++i) $i ~ /Ki$/ && $i *= 1024} 1' file
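If the output feeds other tools, it can help to format the converted numbers explicitly, because some awk implementations (older mawk releases, for example) may convert large computed values with the "%.6g" format and print them in exponent notation. A sketch, assuming 1 KiB = 1024 bytes and using a hypothetical helper name ki_to_bytes:

```shell
# ki_to_bytes: hypothetical helper; converts fields 3 and 4 that end in "Ki"
# (kibibytes) to bytes, assuming 1 KiB = 1024 bytes. Other fields are untouched.
ki_to_bytes() {
  awk 'BEGIN { FS = OFS = "," }
  {
    for (i = 3; i <= 4; i++)
      if ($i ~ /Ki$/) {
        sub(/Ki$/, "", $i)                 # strip the unit suffix
        $i = sprintf("%.0f", $i * 1024)    # plain integer text, never exponent form
      }
    print
  }' "$@"
}

# Usage:
#   ki_to_bytes file
```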
With your shown samples/attempts, please try the following awk code. It traverses the fields from the 3rd onwards, checks whether a value ends in Ki (case-insensitively) and, if so, multiplies it by 128 as requested in the question's edit (note that 1 KiB is 1024 bytes; 128 is the factor for kibibits), then prints every line, edited or not.
awk 'BEGIN{FS=OFS=","} {for(i=3;i<=NF;i++){if($i~/[Kk][Ii]$/){$i *= 128}}} 1' Input_file
You could try GNU numfmt (from coreutils), whose --from=auto option understands IEC suffixes such as Ki:
$ numfmt -d, --field 3,4 --from=auto --to=none <<EOF
vm47,8,32794384Ki,16257320Ki
vm47,8,30223304245,15223080Ki
EOF
vm47,8,33581449216,16647495680
vm47,8,30223304245,15588433920

awk conditional statement based on a value between colon

I was just introduced to awk and I'm trying to retrieve rows from my file based on the value in column 10.
I need to filter the data based on the third value of column 10 (the last column) when that column is split on ":".
Here is an example value from column 10: 0/1:1,9:10:15:337,0,15
I was able to extract the third value using this command: awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
This returns the value 10, but how can I return the whole rows (not just the value in column 10) when this third value is less than or greater than a specific number?
I tried this: awk '{if($10 -F ":" "/1/ ($3<10))" print $0;}' file.txt but it returns a syntax error.
Thanks!
Your code:
awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
should be just 1 awk script:
awk '$10 ~ /1/ { split($10,f,/:/); print f[3] }' file.txt
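but I'm not sure that code is doing what you think it does: /1/ selects every $10 that contains the character 1 anywhere, not values that meet some numeric condition. If you want to print the 3rd value of every $10 that contains a :, as it sounds like from your text, that'd be: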
awk 'split($10,f,/:/) > 1 { print f[3] }' file.txt
and to print the rows where that value is less than 7 would be:
awk '(split($10,f,/:/) > 1) && (f[3] < 7)' file.txt
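The threshold can also be passed in from the shell with -v instead of being hard-coded. A sketch with a hypothetical filter_by_subfield function; it requires at least three colon-separated values (> 2 rather than > 1) so that f[3] always exists before it is compared, otherwise an empty f[3] would compare as 0 and slip through:

```shell
# filter_by_subfield: hypothetical helper name.
# Prints whole rows whose 10th column, split on ":", has a 3rd value
# below the threshold given as the first argument.
filter_by_subfield() {
  threshold=$1
  shift
  awk -v max="$threshold" '
    (split($10, f, /:/) > 2) && (f[3] + 0 < max + 0)   # +0 forces a numeric comparison
  ' "$@"
}

# Usage: rows whose third value is below 7
#   filter_by_subfield 7 file.txt
```

For "greater than", change < to > in the condition.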

How to use awk script to generate a file

I have a very large compressed file (dataFile.gz) from which I want to generate another file using cat and awk: cat to view the contents, piped into awk to generate the new file.
The contents of the compressed file look like this:
Time,SequenceNumber,MsgType,MsgLength,CityOrign,RTime
7:20:13,1,A,34,Tokyo,0
7:20:13,2,C,35,Nairobi,7:20:14
7:20:14,3,E,30,Berlin,7:20:15
7:20:16,4,A,34,Berlin,7:20:17
7:20:17,5,C,35,Denver,0
7:20:17,6,D,33,Helsinki,7:20:18
7:20:18,7,F,37,Tokyo,0
….
….
….
In the new file I only want Time, MsgType and RTime, meaning columns 0, 2 and 5. For column 5, if the value is 0, replace it with the value in column 0, i.e. replace RTime with Time. The desired output is:
Time,MsgType,RTime
7:20:13,A,7:20:13
7:20:13,C,7:20:14
7:20:14,E,7:20:15
7:20:16,A,7:20:17
7:20:17,C,7:20:17
7:20:17,D,7:20:18
7:20:18,F,7:20:18
This is my script so far:
#!/usr/bin/awk -f
BEGIN {FS=","
print %0,%2,
if ($5 == "0") {
print $0
} else {
print $5
}
}
My question is: will this script work, and how do I call it? Can I call it from the terminal like this?
zcat dataFile.gz | <awk script> > generatedFile.csv
awk field numbering starts at 1, and $0 represents the full record, so the column numbers would be 1, 3 and 6.
You may use this awk:
awk 'BEGIN{FS=OFS=","} !$6{$6=$1} {print $1, $3, $6}' file
Time,MsgType,RTime
7:20:13,A,7:20:13
7:20:13,C,7:20:14
7:20:14,E,7:20:15
7:20:16,A,7:20:17
7:20:17,C,7:20:17
7:20:17,D,7:20:18
7:20:18,F,7:20:18
Could you please try the following? It is a slightly shorter version of @anubhava's solution: rather than assigning to the 6th field, it only checks whether that field is zero and prints the values accordingly.
awk 'BEGIN{FS=OFS=","} {print $1, $3, ($6==0 ? $1 : $6)}' Input_file
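To answer the "how do I call it" part, one option is to wrap the program in a shell function and feed it from zcat. This is a sketch: the name extract_times is hypothetical, and comparing $6 to the string "0" (rather than testing !$6) is a design choice that treats only the literal value 0 as missing. The same program body could instead be saved to a file and run with awk -f.

```shell
# extract_times: hypothetical helper name. Keeps Time, MsgType and RTime
# (awk fields 1, 3 and 6) and substitutes Time when RTime is "0".
extract_times() {
  awk 'BEGIN { FS = OFS = "," }
  {
    rtime = ($6 == "0") ? $1 : $6   # string comparison: matches the literal 0 only
    print $1, $3, rtime
  }' "$@"
}

# Usage, reading the compressed file through zcat as in the question:
#   zcat dataFile.gz | extract_times > generatedFile.csv
```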

awk: print each column of a file into separate files

I have a file with 100 columns of data. I want to print the first column and the i-th column into 99 separate files. I am trying to use:
for i in {2..99}; do awk '{print $1" " $i }' input.txt > data${i}; done
But I am getting errors
awk: illegal field $(), name "i"
input record number 1, file input.txt
source line number 1
How to correctly use $i inside the {print }?
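The following single awk command may also help. It appends with >> because, once a file has been closed, reopening it with > would truncate it and keep only the last line:
awk -v start=2 -v end=99 '{for(i=start;i<=end;i++){f="file" i; print $1,$i >> f; close(f)}}' Input_file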
An all-awk solution. First, the test data:
$ cat foo
11 12 13
21 22 23
Then the awk:
$ awk '{for(i=2;i<=NF;i++) print $1,$i > ("data" i)}' foo
and results:
$ ls data*
data2 data3
$ cat data2
11 12
21 22
The for loop iterates from 2 to the last field. If there are more fields than you want to process, change NF to the number you'd like. If, for some reason, a hundred open files would be a problem on your system, put the print into a block and add a close call; note the switch to >>, since reopening a closed file with > would truncate it:
$ awk '{for(i=2;i<=NF;i++){f=("data" i); print $1,$i >> f; close(f)}}' foo
If you want to keep your original approach:
for i in {2..99}; do
awk -v x=$i '{print $1" " $x }' input.txt > data${i}
done
Notes:
the -v switch of awk passes a shell variable into awk
$x is the column whose number is stored in the awk variable x
Note 2: this is not the fastest solution (a single awk call is fastest), but I am just trying to correct your logic. Ideally, take the time to learn awk; it's never time wasted.
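When the per-line close() is needed, one way to avoid both problems, truncation with > and leftover lines from an earlier run with >>, is to truncate each output file once on first use and append afterwards. A sketch with a hypothetical split_columns function whose first argument is the output file prefix:

```shell
# split_columns: hypothetical helper. For every input line, writes column 1
# plus column i to <prefix><i>, for i = 2..NF, in a single awk pass.
split_columns() {
  prefix=$1
  shift
  awk -v prefix="$prefix" '
  {
    for (i = 2; i <= NF; i++) {
      f = prefix i
      if (!(f in seen)) {       # first use in this run: empty any stale file
        printf "" > f
        close(f)
        seen[f] = 1
      }
      print $1, $i >> f         # append, so closing cannot lose earlier lines
      close(f)                  # at most one output file open at a time
    }
  }' "$@"
}

# Usage: writes data2 .. data100 for a 100-column input.txt
#   split_columns data input.txt
```

Closing after every print keeps the number of open files at one, at the cost of reopening each file for every line.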

using awk to match and sum a file of multiple lines

I am trying to combine the lines of file.txt that have a matching $1 and then display the sum of $2 for those matches. Thank you :)
File.txt
ENSMUSG00000000001:001
ENSMUSG00000000001:002
ENSMUSG00000000001:003
ENSMUSG00000000002:003
ENSMUSG00000000002:003
ENSMUSG00000000003:002
Desired output
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
What I have tried:
awk -F':' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file > output.txt
$ awk -F':' -v OFS='\t' '{sum[$1]+=$2} END{for (key in sum) print key, sum[key]}' file
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
{x=$1;a[x]=a[x] + $2} END{for(x in a)print x,a[x]}
Just a typo, I guess: instead of appending $0, add $2. That gives me the expected output. And the $1="" is not necessary. To make sure that there isn't anything funny with $2, you may consider 1.0*$2.
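The order in which for (key in sum) visits keys is unspecified in awk, so the totals may come out in any order. If the order of first appearance matters, one option is to record it separately. A sketch with a hypothetical sum_by_key function:

```shell
# sum_by_key: hypothetical helper. Sums field 2 per distinct field 1 of a
# colon-separated file and prints the keys in the order they first appear.
sum_by_key() {
  awk -F':' -v OFS='\t' '
  !($1 in sum) { order[++n] = $1 }   # remember first-seen order ("in" creates nothing)
  { sum[$1] += $2 }                  # "001" converts to the number 1
  END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }
  ' "$@"
}

# Usage:
#   sum_by_key file.txt > output.txt
```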