Average of a fixed number of rows - awk

Given the following input:
256 1 4 1 130.363
256 1 4 2 128.332
256 1 4 3 130.262
256 1 4 4 128.395
256 1 4 5 128.484
64 2 4 1 95.227
64 2 4 2 96.582
64 2 4 3 95.785
64 2 4 4 93.944
64 2 4 5 97.398
64 4 4 1 143.519
64 4 4 2 143.579
64 4 4 3 143.937
64 4 4 4 142.292
64 4 4 5 143.304
I am trying to obtain the average of a given number of rows. In this case, I've got 5 samples indicated by the 4th column. So the expected output should be:
256 1 4 129.167
64 2 4 95.787
64 4 4 143.326
To loop over the rows, I tried something like:
awk 'BEGIN {i = 1; while (s[$4] <= 5) { print $4 } }'
But it does not print what I want. I also tried this:
awk '{array[$1" "$2]+=$5} END { for (i in array) {print i" " array[i]/length(array)}}'

$ awk '{curr = $1 OFS $2 OFS $3} curr!=prev {if (cnt) print prev, sum/cnt; prev=curr; sum=cnt=0} {sum+=$5; cnt++} END{if (cnt) print prev, sum/cnt}' file
256 1 4 129.167
64 2 4 95.7872
64 4 4 143.326
The differences between this and @NinjaGaiden's solution are:
This relies on all the rows for a given key being contiguous, as shown in your sample input, while NinjaGaiden's does not.
This keeps only the current group's running sum and count in memory, while NinjaGaiden's keeps an array entry for every key until the end of the input.
This prints the output in the same order the keys occurred in the input, while NinjaGaiden's prints it in the unspecified (hash) order of a for-in loop over the array.
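
If the rows for a key might not be contiguous but you still want the output in input order, a middle ground is possible. The following is only a minimal sketch: the function name avg_by_key and the array names order, sum and cnt are my own choices, not taken from either answer. It records each key the first time it is seen and replays the keys in that order at the end.

```shell
# avg_by_key: hypothetical helper; reads rows on stdin, groups on columns 1-3.
avg_by_key() {
    awk '
    {
        key = $1 OFS $2 OFS $3
        if (!(key in sum)) order[++n] = key   # remember first-seen order
        sum[key] += $5
        cnt[key]++
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i], sum[order[i]] / cnt[order[i]]
    }'
}

# Rows for the two keys are deliberately interleaved.
printf '%s\n' '64 2 4 1 10' '256 1 4 1 1' '64 2 4 2 20' '256 1 4 2 2' | avg_by_key
```

The trade-off is memory: like the array-based answers, this holds one entry per distinct key until END.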

Try this
awk '{k=$1" "$2" "$3; j[k]+=$5;z[k]+=1} END { for (x in j) { print x,j[x]/z[x] }} ' f

awk '{array[$1" "$2" "$3]+=$5; count[$1" "$2" "$3]++} END { for (i in array) {print i" " array[i]/count[i]}}'

This also works with a variable number of rows:
awk '
{
a[$1" "$2" "$3]+=$5
b[$1" "$2" "$3]++
} END { for (i in a) { print i, (a[i] / b[i]) }}
' file

Related

To sum adjacent lines from the same column in AWK

I have a file:
A 1 20
B 2 21
C 3 22
D 4 23
I have to find the sum of the values from line 0 to line 3, then the sum of lines 1 to 3, and finally the sum of lines 2 to 3. The last value simply has to be 0. In other words, I want an output file with two columns where the values are the sums of adjacent lines, something like this:
10 86
9 66
7 45
0 0
The last row has to have two zeros as values. How can I do this in AWK?

This might be what you want:
$ tac file | awk 'NR==1{ print 0, 0; a=$2; b=$3; next} { print a+=$2, b+=$3 }' | tac
10 86
9 66
7 45
0 0

Avoid two tacs by accumulating the sums in two arrays:
$ awk '{
for (i = 1; i <= NR; ++i) { sum2[i] += $2; sum3[i] += $3 }
}
END {
sum2[NR] = sum3[NR] = 0
for (i = 1; i <= NR; ++i) print sum2[i], sum3[i]
}' file
10 86
9 66
7 45
0 0
The value of each row is added to the sums for that row and for every previous row. Once all rows have been processed, the sums for the last row are zeroed out and everything is printed.
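
The nested loop above does work proportional to the square of the number of rows. For large files, one alternative (a sketch only, not part of the answer above; suffix_sums and the array names v2, v3, o2, o3 are made up for illustration) is to store the two columns and build the sums backwards in END, so each row is touched once:

```shell
# suffix_sums: hypothetical helper; reads "label value value" rows on stdin.
suffix_sums() {
    awk '
    { v2[NR] = $2; v3[NR] = $3 }
    END {
        s2 = v2[NR]; s3 = v3[NR]      # start from the last row
        o2[NR] = o3[NR] = 0           # the last output row is defined as 0 0
        for (i = NR - 1; i >= 1; i--) {
            s2 += v2[i]; s3 += v3[i]
            o2[i] = s2; o3[i] = s3    # sum of rows i..NR
        }
        for (i = 1; i <= NR; i++) print o2[i], o3[i]
    }'
}

printf '%s\n' 'A 1 20' 'B 2 21' 'C 3 22' 'D 4 23' | suffix_sums
```

Like the two-array version, this still holds every row's values in memory; it only avoids the repeated additions.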

Finding NR of row with specific conditions (using next line)

I have a file like this:
NR column
1 1
2 1
3 0
4 0
5 0
6 1
7 1
8 1
9 1
10 0
11 0
12 0
13 1
14 1
What I need is to find the NR values that tell me where the 1s are.
So my ideal output should tell me that there are 1s from NR=1 to 2, from NR=6 to 9, and from NR=13 to 14
or
1
2
6
9
13
14
Since I think it is easier not to include the first and last rows in the output, I expect the output to be:
2
6
9
13
I've been trying to use getline, but without success.
I am sure there is an easy way to do this. Can anyone help?
Thanks

Assuming your output above was incorrect (and it should really be the line numbers where a 0/1 or 1/0 transition happens, so the lines would be 1, 3, 6, 10, 13), then an awk one-liner is:
awk 'prev!=$0{print NR};{prev=$0}' file
which says:
for every line that doesn't match the prev line, print the line number, and
for every line, save the prev line

$ awk 'NR>1 && $0!=prev{print NR} {prev=$0}' file
3
6
10
13
or for your updated requirements:
$ awk '$1!=prev{print NR-prev} {prev=$1} END{if (prev) print NR}' file
1
2
6
9
13
14

awk to the rescue!
$ awk '!p&&$2==1{p=$1}
p&&!$2{print p"-"($1-1);p=0}
END{if(p) print p"-"$1}' file
1-2
6-9
13-14
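
If the file holds only the 0/1 column, with no line-number column, NR can stand in for $1. This is a sketch under that assumption; runs_of_ones and the variable start are hypothetical names:

```shell
# runs_of_ones: hypothetical helper; expects one 0/1 value per line on stdin.
runs_of_ones() {
    awk '
    $1 == 1 && !start { start = NR }                          # a run of 1s begins
    $1 != 1 && start  { print start "-" (NR - 1); start = 0 } # the run ended on the previous line
    END { if (start) print start "-" NR }                     # a run reaching the last line
    '
}

printf '%s\n' 1 1 0 0 0 1 1 1 1 0 0 0 1 1 | runs_of_ones
```

The parentheses around NR - 1 are there only to make the grouping explicit next to the string concatenation.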

{
if (NR > 1 && last != $0) {
print NR;
}
last = $0;
}

Another way
awk '$2!=x{x=$2;print NR-!($2)}END{if(x)print NR}' file
1
2
6
9
13
14

print a line from every 5 elements of a column

I am looking for a way to select a column (e.g. the eighth column) of a data file and write the first five numbers of that column in a row, the next five numbers in a second row, and so on.
I have been testing with awk and printf without success.

The awk way to do this is to choose between OFS and ORS as the separator after each value, using the modulus operator:
$ seq 1 20 | awk '{printf "%s", $1 (NR % 5 ? OFS : ORS)}'
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Change $1 to $8 for the eighth column, for example, and NR % 5 to NR % 10 for rows of 10 instead of 5. The seq command just generates a single column of numbers from 1 to 20, used here for demonstration.
I also find using xargs useful for this kind of thing:
$ seq 1 20 | awk '{print $1}' | xargs -n5
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
The awk isn't necessary for this example, as seq only produces a single column. For your question, however, change $1 to $8 to select only the eighth column of your input. With this approach you could also replace awk with cut.
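
One detail of the printf approach: when the number of input lines is not a multiple of 5, the last row ends with OFS and no newline. A possible variant, offered as a sketch (wrap_column, col and n are hypothetical names, passed in with -v), writes the separator before each value instead and closes a partial last row in END:

```shell
# wrap_column: hypothetical helper; usage: wrap_column COL N < file
wrap_column() {
    awk -v col="$1" -v n="$2" '
    {
        sep = ((NR - 1) % n == 0) ? "" : OFS   # no separator before the first item of a row
        printf "%s%s", sep, $col
        if (NR % n == 0) printf "%s", ORS      # row is full
    }
    END { if (NR % n) printf "%s", ORS }       # terminate a partial last row
    '
}

seq 1 7 | wrap_column 1 5
```

With seven input values this gives one full row of five and a second row holding the remaining two.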

This will also produce the requested format:
seq 1 20 | awk '{printf("%s ", $1); if (NR % 5 == 0) printf("\n")}'
where $1 is the column to print, which can be changed when passing a file to the awk command. Note that each output row ends with a trailing space.

Get identical rows

I have a file like this: (data.dat)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 7
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 9
6 6 6 6 6 6 6 6 6 6 6 7 6 7
7 9 7 7 7 7 7 7 7 7 7 8 7 9
8 10 8 9 8 9 8 8 8 8 8 9
9 11 9 10 9 9 9 9 9 10
10 12 10 11 10 10 10 11
The odd columns are simple line counters (NR); the even columns are simple values. I would like to get the values that are present in all of the even columns, i.e. I should get this output:
1
2
3
9
I have already tried this one-liner, but something is wrong:
awk '{arr1[$1]=$2;arr2[$3]=$4;arr3[$5]=$6;arr4[$7]=$8;arr5[$9]=$10;arr6[$11]=$12;arr7[$13]=$14;arr8[$15]=$16;}END{for(x in arr1) if(x in arr2 && x in arr3 && x in arr4 && x in arr5 && x in arr6 && x in arr7 && x in arr8) print arr1[x];}' data.dat | sort -n
Is there a better way, by the way?
UPDATE: The real problem is that the array indices are different. So, the arr[...] method does not work... :(

This would work -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' INPUT_FILE
Explanation:
We set a variable x=0 in the BEGIN statement.
We use this variable to find the maximum number of fields (this is useful later).
We store the value of every second column in an array and count its occurrences.
We divide x by 2 to get the number of even columns, which is the count a value reaches when it occurs once in every second column.
If the number of occurrences of a value in the array matches this variable, the value is treated as present in every second column.
Test: with your sample file
[jaypal:~/Temp] awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' file
2
3
9
1
You can either pipe the output to sort -n to get it in order or, if the values are the consecutive integers 1 to n as in your sample, use this -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (i=1;i<=length(a);i++) if (x==a[i]) print i}' INPUT_FILE

Your example works with just a simple:
awk '{if($2==$4 && $2==$6 && $2==$8 && $2==$10 && $2==$12 && $2==$14 && $2==$16) print $1}' test.txt | sort -n
Any other requirements I'm missing?
EDIT: Apparently with the missing columns you added :) Try
awk '{if(NF>1) { found=1; for(i=4; i<NF+1; i+=2) { if($2!=$i) { found=0; } } } if(found) print $1}' test.txt | sort -n

In your input data, row 9 doesn't have the same value in all even columns, so I'm not sure why 9 appears in your desired output. You can try the following awk command, which prints the first column of each row whose even columns are all equal:
awk '{same=0; prev=-1; for(i=2;i<=NF;i+=2) {if (prev != -1 && prev != $i) {same=1; break;} else prev=$i;} if (same==0) print $1;}'

Sum of every N lines ; awk

I have a file containing data in a single column. I have to find the sum of every 4 lines and print it.
That is, I have to compute the sum of the values in lines 0 to 3, the sum of lines 4 to 7, the sum of lines 8 to 11, and so on.

awk '{s+=$1}NR%4==0{print s;s=0}' file
If the number of lines in your file is not a multiple of 4, you can also print the total and the leftover:
$ cat file
1
2
3
4
5
6
7
8
9
10
$ awk '{s+=$1}NR%4==0{print s;t+=s;s=0}END{print "total:",t;if(s) print "left: " s}' file
10
26
total: 36
left: 19

$ cat file
1
2
3
4
5
6
7
8
$ awk '{subtotal+=$1} NR % 4 == 0 { print "subtotal", subtotal; total+=subtotal; subtotal=0} END {print "TOTAL", total}' file
subtotal 10
subtotal 26
TOTAL 36
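
To avoid hard-coding the group size, it can be passed in with -v. A minimal sketch, assuming a single numeric column on stdin (sum_every and n are hypothetical names):

```shell
# sum_every: hypothetical helper; usage: sum_every N < file
sum_every() {
    awk -v n="$1" '
    { s += $1 }
    NR % n == 0 { print s; s = 0 }   # a full group of n lines has been read
    END { if (NR % n) print s }      # leftover lines that did not fill a group
    '
}

seq 1 10 | sum_every 4
```

The END test uses NR % n rather than the value of s, so a leftover group is still printed when its values happen to sum to zero.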
