How to sum up every 10 lines and calculate average using AWK? - awk

I have a file containing N*10 lines, each line consisting of a number. I need to sum up every 10 lines and then print out an average for every such group. I know it's doable in awk, I just don't know how.

Try something like this:
$ cat input
1
2
3
4
5
6
2.5
3.5
4
$ awk '{sum+=$1} (NR%3)==0{print sum/3; sum=0;}' input
2
5
3.33333
(Adapt for 10-line blocks, obviously.)
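For the 10-line blocks in the question, the block size can be passed in as a variable instead of being hard-coded. A minimal sketch, assuming a POSIX awk; the function name avg_blocks and the variable n are only illustrative:

```shell
# avg_blocks N: read one number per line on stdin and print the
# average of every complete block of N lines (name is illustrative).
avg_blocks() {
    awk -v n="$1" '
        { sum += $1 }                           # accumulate the current block
        NR % n == 0 { print sum / n; sum = 0 }  # block complete: print average, reset
    '
}

seq 20 | avg_blocks 10
```

With seq 20 as input this prints 5.5 and 15.5. A trailing block shorter than N lines is silently dropped, which is fine for a file of exactly N*10 lines.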

Maybe something like this:
[jaypal:~/Temp] seq 20 > test.file
[jaypal:~/Temp] awk '
{sum+=$1}
(NR%10==0){avg=sum/10;print $1"\nTotal: "sum "\tAverage: "avg;sum=0;next}1' test.file
1
2
3
4
5
6
7
8
9
10
Total: 55 Average: 5.5
11
12
13
14
15
16
17
18
19
20
Total: 155 Average: 15.5
If you don't want every line to be printed, the following works:
[jaypal:~/Temp] awk '
{sum+=$1}
(NR%10==0){avg=sum/10;print "Total: "sum "\tAverage: "avg;sum=0;next}' test.file
Total: 55 Average: 5.5
Total: 155 Average: 15.5

Related

Transform a 1xA table into a BxC table in awk

I am trying to turn a 1xA table into a BxC table. Let's say A is 15, B is 3 and C is 5, hence after each 5 entries I want it to start a new row in the same table.
I have a rather tedious approach that gets close, but it misses some values after every 5 entries. I think the issue is with RS, since the newline at the end is not the space that RS expects, but I tried changing this to something else in file.sum and still had no luck. Perhaps there is a better way to do it, but I feel this should work.
awk -v RS=" " '{getline a1; getline a2; getline a3; getline a4; getline a5; print a1,a2,a3,a4,a5}' OFS='\t' file.sum
file.sum (my 1xA):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Expected results (my BxC):
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
Actual results:
1 2 3 4 5
7 8 9 10 11
13 14 15 10 11
This should be one of the simplest solutions:
xargs -n5 <file
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
To follow up on your awk: I do not like getline, so I always try to avoid it. Loops also slow awk down somewhat.
But using RS=" " you can do it like this:
awk -v RS=" " '{$1=$1} {printf NR%5==0?"%s\n":"%s ",$0}' file
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
You can remove the {$1=$1}, but you will then get a blank line at the end.
The NR%5==0 test checks whether the record is a fifth one and inserts a newline when needed.
A tab version:
awk -v RS=" " '{$1=$1} {printf NR%5==0?"%s\n":"%s\t",$0}' file
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
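A variant that keeps the default record separator and loops over the fields instead, so it also copes with values spread over several input lines. This is only a sketch; the function name reshape and the variable c (columns per row) are made up for illustration:

```shell
# reshape C: read whitespace-separated values on stdin and print
# them C per row (function and variable names are illustrative).
reshape() {
    awk -v c="$1" '
        {
            for (i = 1; i <= NF; i++) {
                printf "%s%s", (n % c ? " " : ""), $i   # separator before every value except the first in a row
                if (++n % c == 0) print ""              # row is full: end the line
            }
        }
        END { if (n % c) print "" }                     # terminate a partial last row
    '
}

echo "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15" | reshape 5
```

For the 15-value file.sum this prints the three expected rows of five.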

Transforming data by a constant

I have a file like this
3 4 5 6 7 1
4 5 4 4 4 4
2 2 3 3 2 1
and I want to multiply every data point by a constant (e.g. 10) to get the following output
30 40 50 60 70 10
40 50 40 40 40 40
20 20 30 30 20 10
I have been trying to do it like this without success
awk '{i=1; while (i<=NF) print $i*10; i++}'
You are using a while loop, whereas a simple for would suffice:
$ awk '{for (i=1; i<=NF; i++) $i*=10}1' file
30 40 50 60 70 10
40 50 40 40 40 40
20 20 30 30 20 10
A while loop needs a terminating condition, and yours never reaches one: without braces, only the print belongs to the loop body, so i++ is never executed. Also, when the number of iterations is known in advance, it is simpler to use for.
Note that you were calling print inside the loop, so every field would be printed on a separate line. By modifying each field in place and using 1 to print the record afterwards, you keep the format.
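For comparison, the while loop from the question does work once braces put the increment inside the loop body. A small sketch; the wrapper function scale_fields and the variable k are illustrative names, not part of the original answer:

```shell
# scale_fields K: multiply every field of every input line by K
# (function and variable names are illustrative).
scale_fields() {
    awk -v k="$1" '{
        i = 1
        while (i <= NF) {   # braces keep both statements inside the loop
            $i *= k
            i++
        }
        print               # print the modified record on a single line
    }'
}

printf '3 4 5 6 7 1\n4 5 4 4 4 4\n2 2 3 3 2 1\n' | scale_fields 10
```

This produces the same three lines as the for version above.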

Calculating sum and average for a selected data set only

I have a dataset as below:
col-1 col-2 col-3 col-4 col-5 col-6 col-7 col-8
0 17 215 55.7059 947 BMR_42 O22-BMR_1 O23-H23
1 1 1 1.0000 1 BMR_42 O23-BMR_1 O23-H23
2 31 3 1.0968 34 BMR_31 O22-BMR_1 O26-H26
3 11 2 1.0909 12 BMR_31 O13-BMR_1 O26-H26
4 20 5 1.8500 37 BMR_49 O22-BMR_1 O26-H26
5 24 4 1.7917 43 BMR_49 O23-BMR_1 O26-H26
6 41 2 1.0488 43 BMR_49 O12-BMR_1 O12-H12
7 28 2 1.0357 29 BMR_49 O22-BMR_1 O13-H13
8 1 1000 1000.0000 1000 BMR_49 O13-BMR_1 O13-H13
9 1 1 1.0000 1 BMR_22 O12-BMR_2 O22-H22
10 50 62 18.9400 947 BMR_59 O13-BMR_2 O22-H22
11 1 1 1.0000 1 BMR_59 O25-BMR_2 O23-H23
12 34 5 1.1471 39 BMR_59 O13-BMR_2 O23-H23
13 7 6 2.1429 15 BMR_59 O26-BMR_2 O24-H24
14 6 8 3.6667 22 BMR_59 O25-BMR_2 O24-H24
15 28 2 1.1071 31 BMR_10 O26-BMR_2 O26-H26
16 52 121 15.1346 787 BMR_10 O25-BMR_2 O26-H26
17 65 9 1.9231 125 BMR_10 O13-BMR_2 O26-H26
18 4 4 2.2500 9 BMR_59 O26-BMR_2 O26-H26
19 9 1 1.0000 9 BMR_22 O15-BMR_2 O13-H13
20 1 1 1.0000 1 BMR_10 O11-BMR_2 O16-H16
21 7 2 1.1429 8 BMR_53 O13-BMR_2 O16-H16
22 2 3 2.5000 5 BMR_33 O13-BMR_3 O22-H22
23 97 54 6.8247 662 BMR_61 O26-BMR_3 O22-H22
24 1 1 1.0000 1 BMR_29 O26-BMR_3 O23-H23
25 31 36 3.3226 103 BMR_29 O16-BMR_3 O23-H23
(The real file contains over 2000 lines).
I want to select data under certain criteria and find the sum and average of that. For example I want to select lines containing O22 in column $7 and $8 and calculate the sum and average of the values in column $4.
I tried a script as below:
awk '$7 ~ /O22/ && $8 ~ /O22/ {sum += $4} END {print sum, (sum/NR) }' hhsolute.lifetime2.dat
This code selects the lines correctly, but when I calculate the average (sum/NR), I don't get the correct value.
I wish to get some help on this. How could I get the sum and average values only for the data lines I want?
Appreciate any help in advance.
awk -v tgt="O22" '
$7 ~ tgt && $8 ~ tgt { sum+=$4; cnt++ }
END { print sum+0, (cnt ? sum/cnt : 0) }
' file
Try this:
awk '$7~/O22/ && $8~/O22/{++n;sum+=$4}END{if(n) print "Sum = " (sum), "Average= "(sum/n)}' File
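If the 7th and 8th fields both contain the pattern O22, add the 4th field's value to the variable sum and increment n. Within the END block, print the sum and the average. Dividing by n rather than NR matters because NR in the END block is the total number of lines read, not the number of matching lines.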

AWK: print columns of a matrix using first column as reference

I want to read the first column in a matrix, and then print columns of this matrix using this first column as reference. An example:
mat.txt
2 10 6 12 3
4 11 1 22 6
5 15 3 18 9
Using the first column as reference, I would like to get columns 2, 4 and 5, and also put the value of the first column at the beginning.
2 10 12 3
4 11 22 6
5 15 18 9
I tried this, but it doesn't work well:
awk 'FNR==NR{c++;cols[c]=$1;end}
{for(i=1;i<=c;i++) printf("%s%s",$(cols[i]+1),i<c ? OFS : "\n")}' mat.txt mat.txt
This may do:
awk 'FNR==NR {a[NR]=$1;next} {printf "%s ",a[FNR];for (i in a) printf "%s ",$(a[i]);print ""}' mat.txt{,}
2 10 12 3
4 11 22 6
5 15 18 9
The {,} is shell brace expansion: mat.txt{,} expands to mat.txt mat.txt, so the file is read twice.
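Note that for (i in a) visits array elements in an unspecified order, so the columns could come out in a different order on some awk implementations. A sketch that walks the stored column numbers in file order instead; the function name pick_cols and the array name col are illustrative:

```shell
# pick_cols FILE: the first pass collects the column numbers found in
# column 1, the second pass prints column 1 followed by those columns
# (function and array names are illustrative).
pick_cols() {
    awk '
        FNR == NR { col[++n] = $1; next }   # first pass: remember column numbers in file order
        {
            printf "%s", $1
            for (i = 1; i <= n; i++) printf " %s", $(col[i])
            print ""
        }
    ' "$1" "$1"
}

printf '2 10 6 12 3\n4 11 1 22 6\n5 15 3 18 9\n' > mat.txt
pick_cols mat.txt
```

Passing the file name twice explicitly also avoids relying on brace expansion, which not every shell supports.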

Sum of every N lines ; awk

I have a file containing data in a single column. I have to find the sum of every 4 lines and print it.
That is, I have to compute the sum of lines 0 to 3, the sum of lines 4 to 7, the sum of lines 8 to 11, and so on.
awk '{s+=$1}NR%4==0{print s;s=0}' file
If your file has leftover lines (the line count is not a multiple of 4):
$ cat file
1
2
3
4
5
6
7
8
9
10
$ awk '{s+=$1}NR%4==0{print s;t+=s;s=0}END{print "total: ",t;if(s) print "left: " s}' file
10
26
total: 36
left: 19
$ cat file
1
2
3
4
5
6
7
8
$ awk '{subtotal+=$1} NR % 4 == 0 { print "subtotal", subtotal; total+=subtotal; subtotal=0} END {print "TOTAL", total}' file
subtotal 10
subtotal 26
TOTAL 36