Sum of every N lines ; awk - scripting

I have a file containing data in a single column. I have to find the sum of every 4 lines and print it.
That is, I have to compute the sum of lines 0 to 3, the sum of lines 4 to 7, the sum of lines 8 to 11, and so on.

awk '{s+=$1}NR%4==0{print s;s=0}' file
If the number of lines in your file is not a multiple of 4, you can also print what is left over:
$ cat file
1
2
3
4
5
6
7
8
9
10
$ awk '{s+=$1}NR%4==0{print s;t+=s;s=0}END{print "total: ",t;if(s) print "left: " s}' file
10
26
total:  36
left: 19

$ cat file
1
2
3
4
5
6
7
8
$ awk '{subtotal+=$1} NR % 4 == 0 { print "subtotal", subtotal; total+=subtotal; subtotal=0} END {print "TOTAL", total}' file
subtotal 10
subtotal 26
TOTAL 36
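The block size can also be passed in rather than hard-coded. The following is only a sketch of that idea: the function name sum_every_n and the awk variable n are illustrative and do not come from the answers above. It prints a trailing partial block as one more sum, and it tests NR % n in END rather than the sum itself, so a partial block that happens to sum to zero is still reported.

```shell
# sum_every_n N: read one number per line on stdin and print the sum of
# each block of N lines. Name and variable are illustrative assumptions.
sum_every_n() {
    awk -v n="$1" '
        { s += $1 }                       # accumulate the current block
        NR % n == 0 { print s; s = 0 }    # block complete: print and reset
        END { if (NR % n) print s }       # trailing partial block, if any
    '
}

seq 1 10 | sum_every_n 4
```

With the ten-line input shown earlier this prints 10, 26 and then 19 for the two leftover lines.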

Related

Select current and previous line if certain value is found

I subtract the previous row's column 3 from the current row's column 3 and store the difference in a new column 5. I then want to print the previous and current line whenever the value in column 5 equals 25.
Input file
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
I tried
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' file
output
1 1 35 1 35
2 5 50 1 15
2 6 75 1 25
4 7 85 1 10
5 8 100 1 15
6 9 125 1 25
4 1 200 1 75
Desired Output
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Thanks in advance
You're almost there. In addition to the previous $3, keep the previous $0, and only print when the condition is satisfied.
$ awk '{$5=$3-p3} $5==25{print p0; print} {p0=$0;p3=$3}' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
This can be further golfed to:
$ awk '25==($5=$3-p3){print p0; print} {p0=$0;p3=$3}' file
Check whether the newly computed field $5 equals 25. If so, print the previous line and the current line. Then save the current line and its $3 as the "previous" values for the computation on the next line.
You are close to the answer; just pipe the output into another awk and print from there:
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
with Inputs:
$ cat oxxo.txt
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
$ awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Could you please try the following.
awk '$3-prev==25{print line ORS $0,$3-prev} {$(NF+1)=$3-prev;prev=$3;line=$0}' Input_file | column -t
Here's one:
$ awk '{$5=$3-q;t=p;p=$0;q=$3;$0=t ORS $0}$10==25' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Explained:
$ awk '{
    $5=$3-q         # subtract
    t=p             # previous to temp
    p=$0            # store previous for next round
    q=$3            # store subtract value for next round
    $0=t ORS $0     # prepare record for output
}
$10==25             # output if equals
' file
There is no check for duplicates, so the same record may be printed twice when two consecutive rows both match. The easiest fix is to pipe the output to uniq.
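The duplicate case can also be handled inside awk by remembering the number of the last line printed. This is a sketch only: the function name pairs_with_diff and the variables want and printed are made up for illustration, and it assumes the same four-column input as the question. A side effect is that a match on the very first line does not print an empty "previous" line.

```shell
# pairs_with_diff VALUE: print previous+current row whenever column 3
# rose by VALUE, without repeating a row that was already printed.
# Function and variable names are illustrative assumptions.
pairs_with_diff() {
    awk -v want="$1" '
        { $5 = $3 - p3 }                        # difference from previous row
        $5 == want {
            if (printed != NR - 1) print p0     # previous row, unless already printed
            print                               # current row
            printed = NR                        # remember what was printed last
        }
        { p0 = $0; p3 = $3 }                    # save current row for next round
    '
}

printf '%s\n' '1 1 35 1' '2 5 50 1' '2 6 75 1' '4 7 100 1' | pairs_with_diff 25
```

Here rows 3 and 4 both have a difference of 25, and row 3 is printed once rather than twice.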

Average of a fixed number of rows

Given the following input:
256 1 4 1 130.363
256 1 4 2 128.332
256 1 4 3 130.262
256 1 4 4 128.395
256 1 4 5 128.484
64 2 4 1 95.227
64 2 4 2 96.582
64 2 4 3 95.785
64 2 4 4 93.944
64 2 4 5 97.398
64 4 4 1 143.519
64 4 4 2 143.579
64 4 4 3 143.937
64 4 4 4 142.292
64 4 4 5 143.304
I am trying to obtain the average of a given number of rows. In this case, I've got 5 samples indicated by the 4th column. So the expected output should be:
256 1 4 129.167
64 2 4 95.787
64 4 4 143.326
To loop over, I have tried something like
awk 'BEGIN {i = 1; while (s[$4] <= 5) { print $4 } }'
But it is not even printing what I want. Also tried this
awk '{array[$1" "$2]+=$5} END { for (i in array) {print i" " array[i]/length(array)}}'
$ awk '{curr = $1 OFS $2 OFS $3} curr!=prev {if (cnt) print prev, sum/cnt; prev=curr; sum=cnt=0} {sum+=$5; cnt++} END{if (cnt) print prev, sum/cnt}' file
256 1 4 129.167
64 2 4 95.7872
64 4 4 143.326
The differences between this and @NinjaGaiden's solution are that:
This relies on all the data associated with key values being contiguous, as shown in your sample input, while NG's does not.
This does not save the contents of the input file in memory, while NG's does.
This will print the output in the same order it occurred in the input, while NG's will print it in random (hash) order.
Try this
awk '{k=$1" "$2" "$3; j[k]+=$5;z[k]+=1} END { for (x in j) { print x,j[x]/z[x] }} ' f
awk '{k=$1" "$2" "$3; array[k]+=$5; count[k]++} END { for (i in array) {print i" " array[i]/count[i]}}' file
This also works with a variable number of rows:
awk '
{
a[$1" "$2" "$3]+=$5
b[$1" "$2" "$3]++
} END { for (i in a) { print i, (a[i] / b[i]) }}
' file
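The array-based answers print keys in hash order. If input order matters and the rows for a key are not guaranteed to be contiguous, one option is to record each key the first time it is seen. The sketch below assumes the same five-column layout as the question; avg_by_key, order, sum and cnt are illustrative names, and the three-decimal format is a choice made here to match the expected output in the question.

```shell
# avg_by_key [FILE...]: average column 5 per (col1, col2, col3) key,
# printed in first-seen order. Names are illustrative assumptions.
avg_by_key() {
    awk '
        {
            k = $1 OFS $2 OFS $3
            if (!(k in cnt)) order[++n] = k    # remember keys in first-seen order
            sum[k] += $5
            cnt[k]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s %.3f\n", order[i], sum[order[i]] / cnt[order[i]]
        }
    ' "$@"
}

printf '%s\n' '64 2 4 1 10' '256 1 4 1 3' '64 2 4 2 20' | avg_by_key
```

Only one sum and one count per key are kept in memory, not the input lines themselves.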

Finding NR of row with specific conditions (using next line)

Guys I have a file like this
NR column
1 1
2 1
3 0
4 0
5 0
6 1
7 1
8 1
9 1
10 0
11 0
12 0
13 1
14 1
What I need is to find the NRs that tell me where the 1s are.
So my ideal output would tell me that there are 1s from NR=1 to 2, then NR=6 to 9, then NR=13 to 14,
or
1
2
6
9
13
14
Since I think it is easier not to include the first and last rows in the output, I expect the output to be
2
6
9
13
I've been trying to find a way to use getline, but without success.
I am sure there is an easy way to do this. Any help?
Thanks
Assuming your output above was incorrect (and it should really be the line number where the 0/1 or 1/0 transition happens - so the lines would be: "1, 3, 6, 10, 13"), then an awk oneliner is:
awk 'prev!=$0{print NR};{prev=$0}' file
which says:
for every line that doesn't match the previous line, print the line number, and
for every line, save the current line as prev for the next comparison
$ awk 'NR>1 && $0!=prev{print NR} {prev=$0}' file
3
6
10
13
or for your updated requirements:
$ awk '$1!=prev{print NR-prev} {prev=$1} END{if (prev) print NR}' file
1
2
6
9
13
14
awk to the rescue!
$ awk '!p&&$2==1{p=$1}
p&&!$2{print p"-"($1-1);p=0}
END{if(p) print p"-"$1}' file
1-2
6-9
13-14
{
    if (NR > 1 && last != $0) {
        print NR;
    }
    last = $0;
}
Another way
awk '$2!=x{x=$2;print NR-!($2)}END{if(x)print NR}' file
1
2
6
9
13
14
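The answers above disagree on whether the file has one column or two. As a sketch for the one-column case only, assuming each line holds just a 0 or a 1, there is no header line, and awk's NR serves as the row number, the runs of 1s can be printed as start-end ranges. The function name runs_of_ones and the variable start are illustrative.

```shell
# runs_of_ones [FILE...]: print "start-end" line ranges of consecutive 1s.
# Assumes a single 0/1 column with no header (an assumption, see above).
runs_of_ones() {
    awk '
        $1 == 1 && !start { start = NR }                           # a run of 1s begins
        $1 != 1 && start  { print start "-" (NR - 1); start = 0 }  # run ended on previous line
        END { if (start) print start "-" NR }                      # run reaching end of input
    ' "$@"
}

printf '%s\n' 1 1 0 0 0 1 1 1 1 0 0 0 1 1 | runs_of_ones
```

For the sample data this prints 1-2, 6-9 and 13-14.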

print a line from every 5 elements of a column

I am looking for a way to select a column (e.g. the eighth column) of a data file and write the first five numbers of that column in one row, the next five numbers in a second row, and so on.
I have been testing with awk and printf without success.
The awk way to do this is to choose between OFS and ORS as the output separator using the modulus operator:
$ seq 1 20 | awk '{printf "%s", $1 (NR % 5 ? OFS : ORS)}'
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
Change $1 to $8 for the eighth column, for example, and NR % 5 to NR % 10 for rows of 10 instead of 5. The seq command just generates a single column of numbers from 1 to 20 for demonstration.
I also find using xargs useful for this kind of thing:
$ seq 1 20 | awk '{print $1}' | xargs -n5
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
The awk isn't necessary for this example, as seq only produces a single column; for your question, however, change $1 to $8 to select only the eighth column from your input. With this approach you could also replace awk with cut.
This will also produce the requested format:
seq 1 20 | awk '{printf("%s ", $1); if (NR % 5 == 0) printf("\n")}'
where $1 is the column to print; change it to the column you want when passing a file to the awk command.
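When the number of values is not a multiple of the row length, the printf-based one-liners either leave the last row without a newline or leave a trailing space on each row. One way around both, offered as a sketch, is to print the separator before each value instead of after it. The function name rows_of and the variables n and col are illustrative, not from the answers.

```shell
# rows_of N [COL]: print column COL (default 1) of stdin, N values per row.
# Function and variable names are illustrative assumptions.
rows_of() {
    awk -v n="$1" -v col="${2:-1}" '
        {
            # separator goes before the value: nothing for the first value,
            # ORS when a new row starts, OFS otherwise
            printf "%s%s", (NR > 1 ? ((NR - 1) % n ? OFS : ORS) : ""), $col
        }
        END { if (NR) printf "%s", ORS }    # terminate the last row
    '
}

seq 1 12 | rows_of 5
```

With twelve input values this gives two full rows of five and a final row "11 12", each ending in a newline.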

How to append columns of data using awk

I have a file in this format:-
1 2 3 4
5 6 7 8
9 10 11 12
I need assistance to append the columns in a loop like this
1
5
9
2
6
10
...
This line should work with any number of rows and columns (the a[NR][i] arrays of arrays need GNU awk 4.0 or later):
awk '{for(i=1;i<=NF;i++)a[NR][i]=$i}END{for(i=1;i<=NF;i++){for(j=1;j<=NR;j++)print a[j][i]; print ""}}' file
It looks better in this format:
awk '{for(i=1;i<=NF;i++)a[NR][i]=$i}
END{
    for(i=1;i<=NF;i++){
        for(j=1;j<=NR;j++)
            print a[j][i]
        print ""
    }
}' file
with your example:
kent$ awk '{for(i=1;i<=NF;i++)a[NR][i]=$i}END{for(i=1;i<=NF;i++){for(j=1;j<=NR;j++)print a[j][i]; print ""}}' file
1
5
9
2
6
10
3
7
11
4
8
12
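For awks without arrays of arrays, the same idea can be sketched with the traditional a[row, col] subscript form, which joins the indices with SUBSEP. The names columns_to_rows, cell and maxnf are illustrative. This version tracks the widest row instead of relying on NF in END, and it does not print the blank line that the command above emits between columns.

```shell
# columns_to_rows [FILE...]: print column 1 top to bottom, then column 2, etc.
# Uses cell[row, col] keys so it does not need GNU awk. Names are illustrative.
columns_to_rows() {
    awk '
        {
            for (i = 1; i <= NF; i++) cell[NR, i] = $i   # SUBSEP-joined key
            if (NF > maxnf) maxnf = NF                   # remember the widest row
        }
        END {
            for (i = 1; i <= maxnf; i++)
                for (j = 1; j <= NR; j++)
                    if ((j, i) in cell) print cell[j, i] # skip cells of short rows
        }
    ' "$@"
}

printf '1 2 3 4\n5 6 7 8\n9 10 11 12\n' | columns_to_rows
```

Like the original, it holds the whole file in memory, so it suits inputs of moderate size.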