percentage of numbers in a range - awk

I have a file with input values ranging from 0 to 100. I want to compute the percentage of numbers that fall below each of several cutoff values.
Is it possible to do this in awk? Can someone show me an example?
Eg input file: 1, 4, 6, 7, 7, 7, 8, 9, 2, 4
output:
below 2: 20%
below 4: 40%
below 6: 50%
below 8: 90%
** "below" includes values equal to the cutoff

Assuming the data is in a.txt, one number per line:
awk -v limit=16 '$1 < limit {count++} END {print 100*count/NR}' a.txt
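Here limit is the cutoff value. The program counts the lines whose first field is smaller than limit ($1 < limit) and, after all lines are read, prints 100 times that count divided by the number of records (NR). Use $1 <= limit if values equal to the cutoff should be counted, as in the expected output of the question.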
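To report all the cutoffs from the question in a single pass, a minimal sketch along the same lines could look like this. It assumes one number per line on standard input; the pct_below wrapper, its space-separated list of limits, and the array names are illustrative and not part of the original answer. It compares with <= so that values equal to the cutoff are counted.

```shell
# Hypothetical helper: $1 is a space-separated list of cutoffs,
# the numbers are read from standard input, one per line.
pct_below() {
  awk -v limits="$1" '
    BEGIN { n = split(limits, lim, " ") }
    {
      # Count this record once for every cutoff it does not exceed.
      for (i = 1; i <= n; i++)
        if ($1 + 0 <= lim[i] + 0) count[i]++
    }
    END {
      if (NR == 0) exit            # avoid dividing by zero on empty input
      for (i = 1; i <= n; i++)
        printf "below %s: %d%%\n", lim[i], 100 * count[i] / NR
    }'
}

printf '%s\n' 1 4 6 7 7 7 8 9 2 4 | pct_below "2 4 6 8"
# below 2: 20%
# below 4: 40%
# below 6: 50%
# below 8: 90%
```

Each record is tested against every cutoff, so the cost grows with the number of cutoffs, which should be fine for a handful of them.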

Related

Calculating the average length of a column

I have a task where I have to compute the average length of the words in a column with awk.
awk -F'\t' '{print length ($8) } END { print "Average = ",sum/NR}' file
In the output I get the length of the column on each line, but the average is not computed: the output just says Average = 0, which cannot be right because the lines printed before it contain numbers.
For better understanding I will copy and paste the last lines of the output here:
4
4
3
4
4
2
5
7
6
5
Average = 0
How do I need to change my code to get the average number of letters over the whole column as output?
Thank you very much for your time and help :)
The output says Average = 0 because the code never adds the column lengths to sum, so sum is still unset (and treated as 0) when the END block runs. Do it like this instead:
awk -F'\t' '{
print length($8)
sum += length($8)
}
END {
print "Average =", sum/NR
}' file
Alternatively, initialise a sum variable in a BEGIN section (optional in awk, since an unset variable counts as 0 in arithmetic) and accumulate the length of the column on each record.
I don't have your original file so I did a similar exercise for the 1st column of my /etc/passwd file:
awk -F':' 'BEGIN{sum=0} {sum += length($1); print length($1)} END{print "Average = " sum/NR}' /etc/passwd

associative arrays limit in awk. memory usage

I have a file with more than 20 million records and want to sum the 5th column for every unique value of the 1st column. I have used the code below.
cat test.txt |awk 'BEGIN{FS="|"}{a[$1]+=$5;}END{for(i in a) print i"|"a[i];}'
The maximum value of a[i] in the output is limited to 9999.
Is there any solution? Kindly help.
$ cat > file
1||||a|NOTICE A LETTER IN FIFTH
1||||5
2||||57
2||||34535
3||||34535353
3||||1
1||||1
$ cat file|awk 'BEGIN{FS="|"}{a[$1]+=$5;}END{for(i in a) print i"|"a[i];}'
1|6
2|34592
3|34535354
What do you get with my data above?
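One thing that may be worth ruling out is the number-to-string conversion on output. A minimal sketch, assuming the same |-separated layout, that formats each total explicitly with printf instead of relying on the default conversion; the sum_by_key name and the trailing sort (added only to make the order predictable, since for (k in sum) visits keys in an unspecified order) are illustrative additions:

```shell
# Hypothetical helper: sums field 5 per distinct value of field 1.
sum_by_key() {
  awk -F'|' '
    { sum[$1] += $5 }              # a non-numeric $5 such as "a" counts as 0
    END {
      for (k in sum)
        printf "%s|%.0f\n", k, sum[k]   # print totals without exponent notation
    }
  ' "$@" | sort
}

printf '%s\n' '1||||a|NOTICE A LETTER IN FIFTH' '1||||5' '2||||57' \
  '2||||34535' '3||||34535353' '3||||1' '1||||1' | sum_by_key
# 1|6
# 2|34592
# 3|34535354
```

Passing the file name directly (sum_by_key test.txt) also avoids the extra cat process.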

awk compare two files gives unexpected output when swapping the position of argument files

Below are my two files' content:
cat f1
9
5
3
cat f2
1
2
3
This is my code, which works perfectly and gives output as per my understanding:
awk 'NR==FNR {a[$0]; next} FNR in a' f1 f2
3
But, when I swap the position of these 2 argument files, the output is different than what I expected.
awk 'NR==FNR {a[$0]; next} FNR in a' f2 f1
9
5
3
I expected the output to be 3 again, as before, because f2 and f1 both have exactly 3 lines and the key 3 is stored in the array either way. Please explain how the second command works.
The output from the second example is, of course, correct.
Since f2 contains the values 1, 2, 3, the array a ends up with elements a[1], a[2], and a[3]. When it is processing f1, line 1 has FNR == 1, and 1 is an index in a, so line 1 (containing 9) is printed; similarly for lines 2 and 3, hence the output you see.
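If the intent was to print the lines of the second file whose content also appears in the first file, regardless of line position, the test would be on $0 rather than on FNR. A minimal sketch, where common_lines is a hypothetical wrapper name and the temporary files simply recreate f1 and f2 from the question:

```shell
# Hypothetical helper: print lines of file $2 that also occur in file $1.
common_lines() {
  awk 'NR == FNR { seen[$0]; next }   # first file: remember each line
       $0 in seen                     # second file: print lines seen before
  ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 9 5 3 > "$tmp/f1"
printf '%s\n' 1 2 3 > "$tmp/f2"
common_lines "$tmp/f1" "$tmp/f2"   # prints 3
common_lines "$tmp/f2" "$tmp/f1"   # prints 3
rm -r "$tmp"
```

With this test the result no longer depends on the order of the arguments for this data, because the lookup uses the line content and not the line number.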

awk substr several times in a row

I am using awk substr() to extract a sub string from the string.
For example if my string looks like this:
qwertyuiop
And I want to extract (1-3) & (6-9) characters I use this:
awk '{print (substr($1, 1, 3) substr($1, 6, 4))}'
qweyuio
How can I repeat a specific substring extraction several times?
For example, I want to extract characters (1-3) & (6-9)(6-9)(6-9) to get a result like this:
qweyuioyuioyuio
Of course I can use a command like this:
awk '{print (substr($1, 1, 3) substr($1, 6, 4) substr($1, 6, 4) substr($1, 6, 4))}'
Is there a simpler way?
Provided you want to extract non overlapping substrings, you can use the fixed column width option of gawk:
echo "qwertyuiop" | gawk -v FIELDWIDTHS="3 2 4" '{ print $1 $3 $3 $3 }'
You define 3 columns. The first one is 3 characters wide (this is the same as substr($1, 1, 3)). The second one is 2 characters wide (and we will ignore it). The 3rd is your second substring (substr($1, 6, 4)).
Then you can directly print the fields you have defined.
See https://www.gnu.org/software/gawk/manual/gawk.html#Constant-Size
There is a delightful post explaining various ways of repeating string in awk.
I'll quote the most obvious:
function rep1(s,n, r) {
# O(n) allocate/appends
# 2 lines of code
# This is the simplest possible solution that will work:
# just repeatedly append the input string onto the value
# that will be passed back, decrementing the input count
# until it reaches zero.
while (n-->0) r = r s;
return r;
}
PS: The extra whitespace before the last function parameter is an awk convention: it marks that parameter as a temporary local variable rather than an argument the caller is expected to pass.
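Applied to the question, the repeat function can be combined with substr() roughly as follows. This is only a sketch: the repeat_slice wrapper and the variable n are made-up names, and the column positions are the ones from the question.

```shell
# Hypothetical helper: $1 is the number of times to repeat characters 6-9.
repeat_slice() {
  awk -v n="$1" '
    # r is a local variable; it starts out empty on every call.
    function rep(s, count,    r) {
      while (count-- > 0) r = r s
      return r
    }
    { print substr($1, 1, 3) rep(substr($1, 6, 4), n) }'
}

echo qwertyuiop | repeat_slice 3
# qweyuioyuioyuio
```

Because r is local to the function, nothing carries over from one input line to the next.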
Yes. You can simply save the substring to a variable, then re-print it as needed. Don't forget to set a null OFS:
awk '{ print substr($1, 1, 3), x = (substr($1, 6, 4)), x, x }' OFS=
Testing:
echo "qwertyuiop" | awk '{ print substr($1, 1, 3), x = (substr($1, 6, 4)), x, x }' OFS=
Results:
qweyuioyuioyuio
If you need to print something more than three or four times, it may be worthwhile using a for loop:
echo "qwertyuiop" | awk '{ x = ""; for(i=1;i<=5;i++) x = x substr($1, 6, 4); print substr($1, 1, 3), x }' OFS=
Results:
qweyuioyuioyuioyuioyuio
This is one of the solutions to such a problem (messy but works).
echo qwertyuiop | awk '{ m = substr($1, 6, 4); string = ""; count = 0
while (count++ < 3) string = string m
print substr($1, 1, 3) string }'
qweyuioyuioyuio

Find values greater than or equal to 0.021 to 0 using awk

I have a large file with three sections. Part of it only has 3 columns, then it moves to the image section and it has 5 columns, then the rotation section which has 7 columns (I'm interested in changing the rotation section only).
I'm trying to get awk to produce a file that changes all negative values to 0 and all positive numbers greater than 0.2 to 0.
The values that I'm concerned with are only in column 7 and must be in the line containing the word ROTATION.
Here is my attempt.
awk BEGIN '/ROTATION/ {if (function abs($7) > = 0.021) $7=0; print}' awktest.tlt > awktest1.tlt
I need awk to keep the rest of the data in there as well and not just produce the change from X to 0.
As far as executing it I am using awk -f fix.awk awktest.tlt
This will do it:
awk 'NF==7&&/ROTATION/{if($7<0||$7>0.2)$7=0}1' file
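Spelled out, the one-liner reads as below. This is the same logic as a sketch with the cutoff passed in as a variable; the zero_rotation wrapper and the max variable are illustrative names, and the default of 0.2 is taken from the question text. Note that assigning to $7 makes awk rebuild the line with single spaces between fields, so the original spacing of modified lines is not preserved.

```shell
# Hypothetical helper: optional $1 is the upper cutoff (default 0.2);
# the data is read from standard input.
zero_rotation() {
  awk -v max="${1:-0.2}" '
    # Only touch 7-column lines that mention ROTATION.
    NF == 7 && /ROTATION/ {
      if ($7 + 0 < 0 || $7 + 0 > max + 0) $7 = 0
    }
    { print }                      # every line is printed, changed or not
  '
}

printf '%s\n' 'ROTATION 1 2 3 4 5 0.5' 'ROTATION 1 2 3 4 5 0.1' \
  'ROTATION 1 2 3 4 5 -0.3' 'IMAGE 1 2 3 4' | zero_rotation
# ROTATION 1 2 3 4 5 0
# ROTATION 1 2 3 4 5 0.1
# ROTATION 1 2 3 4 5 0
# IMAGE 1 2 3 4
```

Usage on the file from the question would be along the lines of zero_rotation < awktest.tlt > awktest1.tlt.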