Find average of numbers from a specific line - awk

I have a text file with 2 columns of numbers.
10 2
20 3
30 4
40 5
50 6
60 7
70 8
80 9
90 10
100 11
110 12
120 13
130 14
I would like to find the average of the 2nd column data from the 6th line. That is ( (7+8+9+10+11+12+13+14)/8 = 10.5 )
I could find this post Scripts for computing the average of a list of numbers in a data file
and used the following:
awk'{s+=$2}END{print "ave:",s/NR}' fileName
but I get an average of entire second column data.
Any hint here.

This one-liner should do:
awk -v s=6 'NR<s{next} {c++; t+=$2} END{printf "%.2f (%d samples)\n", t/c, c}' file
This awk script has three pattern/action pairs. The first is responsible for skipping the first s lines. The second executes on every line (from s onwards); it increments a counter and adds column 2 to a running total. The third runs after all data have been processed, and prints your results.

Below script should do the job
awk 'NR>=6{avg+=$2}END{printf "Average of field 2 starting from 6th line %.1f\n",avg/(NR-5)}' file
Output
Average of field 2 starting from 6th line 10.5

Related

Match and add exactly line

folks! I am new in programming and stuck in some moments. So, I have two files:
file1 contains:
s145 12 32 56
s430 48 56 20
s76 45 arg in
file2 contains only the name of s values:
s145 protos
s430 retus
s76 cosess
I want to add name of s values in new line, after the last column. Could someone show me how I can solve this issue. Thanks.

Distance between two lines

I have a set of points for which I need to calculate the distance between lines.
Especially for the range 70:80. Can it be possible via awk ? or any other method
sample data
70.9247 24
73.6148 24
70.9231 25
73.6144 25
70.9216 26
73.6141 26
70.9201 27
73.6138 27
70.9187 28
73.6136 28
Few points
1) Data sorted on y. So each value of y has 2 points.
2) I want the distance between x points for every y. i.e. y(new) = y(n+1)-y(n)
expected output:
2.6901 24
2.6912 25
...........
2.6949 28
thanks
What you are after is something like:
awk 'NR%2{t=$1;next}{print $1-t,$2}'
This does something like:
If the record/line number NR is an odd number, store the value of the first field in t and skip to the next record/line
Otherwise, print the expected output.
A similar way of writing this is:
awk '{if(NR%2){t=$1}else{print $1-t,$2}}'
but this is less awk-ish!

gnuplot: Spurious data points in plots when using index

I'm trying to use gnuplot 4.6 patchlevel 6 to visualize some data from a file test.dat which looks like this:
#Pkg 1
type min max avg
small 1 10 5
medium 5 15 7
large 10 20 15
#Pkg 2
small 3 9 5
medium 5 13 6
large 11 17 13
(Note that the values are actually separated by tabs even though it shows as spaces here.)
My gnuplot commands are
reset
set datafile separator "\t"
plot 'test.dat' index 0 using 2:xticlabels(1) title col, '' using 3 title col, '' using 4 title col
This works fine as long as there is only a single data block in test.dat. When I add the second block spurious data points appear. Why is that and how can it be fixed?
YFTR: Using stat on the file yields only expected results. It reports two data blocks for the full file and correct values (for min, max and sum) when I specify one of the two using index
as mentioned in the comment to the question, one has to explicitly repeat the index 0 specification within all parts of the plot command as
plot 'test.dat' index 0 using 2, '' index 0 using 3, ...
otherwise '' refers to all blocks in the data file

How to do multi-row calculations using awk on a large file

I have a big file that is sorted on the first word. I need to add a new column for each line with the proportional value: line value/total value for that group; group is determined by the first column. In the below example, the total of group "a" = 100 and hence each line gets a proportion. The total of group "the" is 1000 and hence each line gets the proprotion value of the total of that group.
I need an awk script to do this.
Sample File:
a lot 10
a few 20
a great 20
a little 40
a good 10
the best 250
the dog 750
zisty cool 20
Output:
a lot 10 0.1
a few 20 0.2
a great 20 0.1
a little 40 0.4
a good 10 0.1
the best 25 .25
the dog 75 .75
zisty cool 20 1
You describe this as a "big file." Consequently, this solution tries to save memory: it holds no more than one group in memory at a time. When we are done with that group, we print it out before starting on the next group:
$ awk -v i=0 'NR==1{name=$1} $1==name{a[i]=$0;b[i++]=$3;tot+=$3+0;next} {for (j=0;j<i;j++){print a[j],b[j]/tot} name=$1;a[0]=$0;tot=b[0]=$3;i=1} END{for (j=0;j<i;j++){print a[j],b[j]/tot}}' file
a lot 10 0.1
a few 20 0.2
a great 20 0.2
a little 40 0.4
a good 10 0.1
the best 250 0.25
the dog 750 0.75
zisty cool 20 1
How it works
-v i=0
This initializes the variable i to zero.
NR==1{name=$1}
For the first line, set the variable name to the first field, $1. This is the name of the group.
$1==name {a[i]=$0; b[i++]=$3; tot+=$3+0; next}
If the first field matches name, then save the whole line into array a and save the value of column (field) three into array b. Increment the variable tot by the value of the third field. Then, skip the rest of the commands and jump to the next line.
for (j=0;j<i;j++){print a[j],b[j]/tot} name=$1;a[0]=$0;tot=b[0]=$3;i=1
If we get to this line, then we are at the start of a new group. Print out all the values for the old group and initialize the variables for the start of the next group.
END{for (j=0;j<i;j++){print a[j],b[j]/tot}}
After we get to the last line, print out what we have for the last group.
awk '{a[$1]+=$3; b[i++]=$0; c[j++]=$1; d[k++]=$3} END{for(i=0;i<NR;i++) {print b[i], d[i]/a[c[i]]}}' File
Example:
sdlcb#Goofy-Gen:~/AMD$ cat ff
a lot 10
a few 20
a great 20
a little 40
a good 10
the best 250
the dog 750
zisty cool 20
sdlcb#Goofy-Gen:~/AMD$ awk '{a[$1]+=$3; b[i++]=$0; c[j++]=$1; d[k++]=$3} END{for(i=0;i<NR;i++) {print b[i], d[i]/a[c[i]]}}' ff
a lot 10 0.1
a few 20 0.2
a great 20 0.2
a little 40 0.4
a good 10 0.1
the best 250 0.25
the dog 750 0.75
zisty cool 20 1
Logic: update an array (a[]) with first column as index for each line. save array b[] with complete line for each line, to be used in the end for printing. similarly, update arrays c[] and d[] with first and third column values for each line. at the end, use these arrays to get the results using a for loop, looping through all the lines processed. First printing the line as itself, then the proportion value.

Circle Summation (30 Points) InterviewStree Puzzle

The following is the problem from Interviewstreet I am not getting any help from their site, so asking a question here. I am not interested in an algorithm/solution, but I did not understand the solution given by them as an example for their second input. Can anybody please help me to understand the second Input and Output as specified in the problem statement.
Circle Summation (30 Points)
There are N children sitting along a circle, numbered 1,2,...,N clockwise. The ith child has a piece of paper with number ai written on it. They play the following game:
In the first round, the child numbered x adds to his number the sum of the numbers of his neighbors.
In the second round, the child next in clockwise order adds to his number the sum of the numbers of his neighbors, and so on.
The game ends after M rounds have been played.
Input:
The first line contains T, the number of test cases. T cases follow. The first line for a test case contains two space seperated integers N and M. The next line contains N integers, the ith number being ai.
Output:
For each test case, output N lines each having N integers. The jth integer on the ith line contains the number that the jth child ends up with if the game starts with child i playing the first round. Output a blank line after each test case except the last one. Since the numbers can be really huge, output them modulo 1000000007.
Constraints:
1 <= T <= 15
3 <= N <= 50
1 <= M <= 10^9
1 <= ai <= 10^9
Sample Input:
2
5 1
10 20 30 40 50
3 4
1 2 1
Sample Output:
80 20 30 40 50
10 60 30 40 50
10 20 90 40 50
10 20 30 120 50
10 20 30 40 100
23 7 12
11 21 6
7 13 24
Here is an explanation of the second test case. I will use a notation (a, b, c) meaning that child one has number a, child two has number b and child three has number c. In the beginning, the position is always (1,2,1).
If the first child is the first to sum its neighbours, the table goes through the following situations (I will put an asterisk in front of the child that just added its two neighbouring numbers):
(1,2,1)->(*4,2,1)->(4,*7,1)->(4,7,*12)->(*23,7,12)
If the second child is the first to move:
(1,2,1)->(1,*4,1)->(1,4,*6)->(*11,4,6)->(11,*21,6)
And last if the third child is first to move:
(1,2,1)->(1,2,*4)->(*7,2,4)->(7,*13,4)->(7,13,*24)
And as you notice the output to the second case are exactly the three triples computed that way.
Hope that helps.