Calculate exponential value of a record - awk

I need to calculate and print, for each record, the exponential of field $2 after multiplying it by a factor of -0.05.
The data looks like this:
101 205 560
101 200 530
107 160 480
110 95 600
I need the output to look like this:
101 205 560 0.000035
101 200 530 0.000045
107 160 480 0.00033
110 95 600 0.00865

This should work:
$ awk '{ print $0, sprintf("%f", exp($2 * -0.05)) }' infile
101 205 560 0.000035
101 200 530 0.000045
107 160 480 0.000335
110 95 600 0.008652
This just prints the whole line $0, followed by the exponential of the second field multiplied by -0.05. The sprintf formatting makes sure that the result is not printed in scientific notation (which would happen otherwise).
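Equivalently, printf can format and print in one step, avoiding the sprintf call entirely (a minor variant of the same idea):
$ awk '{ printf "%s %f\n", $0, exp($2 * -0.05) }' infile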
If the input data is tab separated and you need tabs in the output as well, you have to set the output field separator first:
$ awk 'BEGIN{OFS="\t"} { print $0, sprintf("%f", exp($2 * -0.05)) }' infile
101 205 560 0.000035
101 200 530 0.000045
107 160 480 0.000335
110 95 600 0.008652

Sum of a block of duplicated rows, if not duplicated add 1 and calculate difference

I need to parse several sorted files that have the following structure:
1_0.91 10
1_0.91 20
1_0.91 30
1_0.91 40
2_0.89 50
1_0.91 60
1_0.91 70
1_0.91 80
2_0.89 90
2_0.89 100
2_0.89 110
3_0.86 120
The first column is a feature in the genome, and the second column is their location. Each feature is interspaced with others. I want to find the size of each feature or blocks of each feature. For this example, my desired output is the following:
1_0.91 10 40 30
2_0.89 50 51 1
1_0.91 60 80 20
2_0.89 90 110 20
3_0.86 120 121 1
The feature 1_0.91 starts at 10 and is found at locations 20, 30, and 40. I want to create new columns with the start and the end: here it starts at 10 and ends at 40. Then output the size in another column (end minus start, here 30). There are several places where a feature appears only once; in my example, 2_0.89 sits between blocks of feature 1_0.91. In that case, I want to add 1 to the current value for the end and compute the size the same way, which then equals 1.
I have tried to use awk, but I am stuck with the features that appear only once.
Here is what I used so far. It is a bit convoluted:
Let's call the first file file1.txt:
cat file1.txt | awk '$1!=prev{if (pline){print pline;}print;}{prev=$1;pline=$0;}END{print pline;}' > file2.txt
The output:
1_0.91 10
1_0.91 40
2_0.89 50
2_0.89 50
1_0.91 60
1_0.91 80
2_0.89 90
2_0.89 110
3_0.86 120
3_0.86 120
Now, I print the odd and even lines with sed, then I use paste to place the files together:
paste <(cat file2.txt | sed 'n; d') <(cat file2.txt | sed '1d; n; d' ) | awk 'BEGIN{OFS="\t"} {print $1,$2,$4}' > file3.txt
The output:
1_0.91 10 40
2_0.89 50 50
1_0.91 60 80
2_0.89 90 110
3_0.86 120 120
Next, I estimate the size of each feature:
cat file3.txt | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$3-$2}' > file4.txt
The output:
1_0.91 10 40 30
2_0.89 50 50 0
1_0.91 60 80 20
2_0.89 90 110 20
3_0.86 120 120 0
Next, I replace zeros in column 4 with 1:
cat file4.txt | awk 'BEGIN{OFS="\t"} { $4 = ($4 == "0" ? 1 : $4) } 1' > file5.txt
The output:
1_0.91 10 40 30
2_0.89 50 50 1
1_0.91 60 80 20
2_0.89 90 110 20
3_0.86 120 120 1
Finally, I fix the end of each feature with awk:
cat file5.txt | awk 'BEGIN{OFS="\t"} { $3 = ($2 == $3 ? $3+1 : $3) } 1' > file6.txt
The output:
1_0.91 10 40 30
2_0.89 50 51 1
1_0.91 60 80 20
2_0.89 90 110 20
3_0.86 120 121 1
I wonder if there is a faster and easier way to do this.
Thank you.
Assumptions:
consecutive lines with the same feature (field #1) are sorted by location (field #2) in ascending order
input/output field delimiters are \t
location (field #2) values are always positive integers (otherwise we could tweak the code)
One awk idea:
awk '
function print_feature() {
    if (feature != "")
        print feature, min, max, (max - min)
}
BEGIN { FS = OFS = "\t" }
$1 != feature {                  # row contains a new/different feature, so print previous feature details
    print_feature()
    feature = $1
    min = $2
    max = min + 1
    next
}
{ max = $2 }                     # row contains a repeated/duplicate feature
END { print_feature() }          # flush last feature details to stdout
' feature.dat
This generates:
1_0.91 10 40 30
2_0.89 50 51 1
1_0.91 60 80 20
2_0.89 90 110 20
3_0.86 120 121 1
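If consecutive lines of a feature were ever not sorted by location (the first assumption above), a sketch of the tweak hinted at: track each block's min and max explicitly and count its rows, so singleton blocks still get end = start+1 and size 1:
awk '
function print_feature() {
    if (cnt == 1)
        print feature, min, min + 1, 1        # singleton block: end = start+1, size 1
    else if (cnt > 1)
        print feature, min, max, (max - min)
}
BEGIN { FS = OFS = "\t" }
$1 != feature { print_feature(); feature = $1; min = max = $2; cnt = 1; next }
{ if ($2 < min) min = $2                      # extend the current block in either direction
  if ($2 > max) max = $2
  cnt++ }
END { print_feature() }
' feature.dat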

awk getting good distribution of random integer values between 2 inputs

How can I get a good distribution of random integer values between two inputs using awk?
I'm trying the following:
$ awk -v min=200 -v max=500 ' BEGIN { srand();for(i=0;i<10;i++) print int(min+rand()*100*(max/min)) } '
407
406
360
334
264
365
303
417
249
225
$
Is there a better way to do this in awk?
Sorry to inform you, but your code is not even correct: try it with min=max=10 and you will get values far outside the range.
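For instance (the output varies run to run, but with min = max = 10 the expression reduces to int(10 + rand()*100), i.e. anything from 10 to 109 instead of always 10):
$ awk -v min=10 -v max=10 'BEGIN{srand(); for(i=0;i<5;i++) print int(min+rand()*100*(max/min))}'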
Something like this will work. You can verify the uniformity as well.
$ awk -v min=200 -v max=210 'BEGIN{srand();
      for(i=0;i<10000;i++) a[int(min+rand()*(max-min))]++;
      for(k in a) print k,a[k]}' | sort
200 1045
201 966
202 1014
203 1016
204 985
205 1010
206 988
207 1027
208 986
209 963
Note also that min is inclusive but max is not.
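If you want max to be inclusive as well, a common tweak (a sketch; it relies on rand() returning values in [0,1), so int(rand()*(max-min+1)) ranges over 0..max-min):
$ awk -v min=200 -v max=210 'BEGIN{srand(); for(i=0;i<10;i++) print min + int(rand()*(max-min+1))}'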

conditional awk statement to create a new field with additive value

Question
How would I use awk to create a new field that holds $2 plus a constant value?
I am planning to cycle through a list of values, but I wouldn't mind using a one-liner for each command.
PseudoCode
awk '$1 == Bob {$4 = $2 + 400}' file
Sample Data
Philip 13 2
Bob 152 8
Bob 4561 2
Bob 234 36
Bob 98 12
Rey 147 152
Rey 15 1547
Expected Output
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Just quote Bob; also, you want to add the third field, not the second:
$ awk '$1=="Bob" {$4=$3+400}1' file | column -t
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Here we check whether $1 equals Bob and, if so, rebuild the record ($0) by appending FS and $2 + 400 to it. FS is the field separator placed between the 3rd and 4th fields. The 1 at the end tells awk to take the default action, which is to print.
awk '$1=="Bob"{$0=$0 FS $2 + 400}1' file
Philip 13 2
Bob 152 8 552
Bob 4561 2 4961
Bob 234 36 634
Bob 98 12 498
Rey 147 152
Rey 15 1547
Or, if you want to keep the name (Bob) as a variable:
awk -vname="Bob" '$1==name{$0=$0 FS $2 + 400}1' file
1st solution: You could also try the following. It uses awk's built-in NF variable: $NF is the value of the last field of the current line, and assigning to $(NF+1) creates an additional field when the condition (the 1st field contains the string Bob) is TRUE.
awk '{$(NF+1)=$1=="Bob"?400+$NF:""} 1' OFS="\t" Input_file
2nd solution: In case we don't want to create a new field and simply want to print the values according to the condition, try the following; it should be faster, I believe.
awk 'BEGIN{OFS="\t"}{$1=$1;print $0,$1=="Bob"?400+$NF:""}' Input_file
Output will be as follows.
Philip 13 2
Bob 152 8 408
Bob 4561 2 402
Bob 234 36 436
Bob 98 12 412
Rey 147 152
Rey 15 1547
Explanation: Adding an explanation for the above code now.
awk '                            ##Starting awk program here.
{
  $(NF+1)=$1=="Bob"?400+$NF:""   ##Create a new last field whose value depends on the condition check:
                                 ##if the 1st field contains the string Bob, add 400 to the last field's
                                 ##value; otherwise make the new field NULL.
}
1                                ##awk works on the condition-then-action model; mentioning 1 makes the
                                 ##condition TRUE, and with no action defined the default action, printing
                                 ##the current line, happens.
' OFS="\t" Input_file            ##Setting OFS (the output field separator) to TAB and naming the Input_file.
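Since the question mentions cycling through a list of values, a minimal shell sketch building on the -v variant above (the names, the fixed +400 offset, and the output filenames are assumptions for illustration):
for name in Bob Rey Philip; do
    # hypothetical per-name pass; appends 400 + last field on matching lines
    awk -v name="$name" '$1==name{$(NF+1)=400+$NF}1' OFS="\t" file > "out.$name"
done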

exchange columns based on some conditions

I have a text file with 5 columns. If the number in the 5th column is less than the one in the 3rd column, swap the 2nd and 3rd columns with the 4th and 5th columns; otherwise, leave the line as it is.
1EAD A 396 B 311
1F3B A 118 B 171
1F5V A 179 B 171
1G73 A 162 C 121
1BS0 E 138 G 230
Desired output
1EAD B 311 A 396
1F3B A 118 B 171
1F5V B 171 A 179
1G73 C 121 A 162
1BS0 E 138 G 230
$ awk '{ if ($5 >= $3) print $0; else print $1"\t"$4"\t"$5"\t"$2"\t"$3; }' foo.txt
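An alternative that swaps the fields in place instead of rebuilding the line by hand (a sketch; it assumes the default space output separator is acceptable, since assigning to any field makes awk rebuild $0 with OFS):
$ awk '$5 < $3 { t1 = $2; t2 = $3; $2 = $4; $3 = $5; $4 = t1; $5 = t2 } 1' foo.txt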

How to append the count of numbers in each line of text using awk?

I have several very large text files and would like to prepend to each line the count of numbers on it, followed by a space. Could anyone kindly suggest how to do this efficiently using awk?
Input:
10 109 120 200 1148 1210 1500 5201
9 139 1239 1439 6551
199 5693 5695
Desired Output:
8 10 109 120 200 1148 1210 1500 5201
5 9 139 1239 1439 6551
3 199 5693 5695
You can use
awk '{print NF,$0}' input.txt
This prints NF, the number of fields on the current line, followed by the output field separator (a space by default), and then the current line itself, $0.
This will also work for you (the 7, like 1, is simply a true condition, so the default print action fires):
awk '{$0=NF FS $0}7' file
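If the lines could ever contain non-numeric tokens, a sketch that counts only the fields that look like non-negative integers (the pattern is an assumption about what counts as a number here):
$ awk '{ n = 0; for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+$/) n++; print n, $0 }' input.txt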