I am trying to read a CSV text file and find the average of weekly hours (columns 3 through 7) spent by all user IDs (column 2) ending with an even number (2, 4, 6, ...).
The input sample is as below:
Computer ID,User ID,M,T,W,T,F
Computer1,User3,5,7,3,5,2
Computer2,User5,8,8,8,8,8
Computer3,User4,0,8,0,8,4
Computer4,User1,5,4,5,5,8
Computer5,User2,9,8,10,0,0
Computer6,User7,4,7,8,2,5
Computer7,User6,8,8,8,0,0
Computer8,User9,5,2,0,6,8
Computer9,User8,2,5,7,3,6
Computer10,User10,8,9,9,9,10
I have written the following script:
awk -F, '$2~/[24680]$/{for(i=3;i<=7;i++){a+=$i};printf "%s\t%.2g\n",$2,a/5;a=0}' user-list.txt > superuser.txt
The output of this script is:
User4 4
User2 5.4
User6 4.8
User8 4.6
User10 9
However, I want to change the script to print only one average for all user IDs ending with an even number.
The desired output would be as below (technically, the average of all hours for the IDs ending with even numbers):
5.56
Any help would be appreciated.
TIA
This fixes the OP's attempt and adds logic to compute the average of the per-user averages once the whole file has been read. I wrote it on mobile and could not test it, but it should work if I understood the OP's description correctly.
awk -F, '
$2~/[24680]$/{
count++
for(i=3;i<=7;i++){
sum+=$i
}
tot+=sum/5
sum=0
}
END{
print "Average of averages is: " (count?tot/count:"NaN")
}
' user-list.txt > superuser.txt
You may try:
awk -F, '$2 ~ /[02468]$/ {
for(i=3; i<=7; i++) {
s += $i
++n
}
}
END {
if (n)
printf "%.2f\n", s/n
}' cust.csv
5.56
awk -F, 'NR == 1 { next } { match($2,/[[:digit:]]+/);num=substr($2,RSTART,RLENGTH);if(num%2==0) { av+=($3+$4+$5+$6+$7)/5; n++ } } END { if (n) printf "%.2f\n",av/n}' user-list.txt
Ignore the first (header) line. Pick the number out of the user ID with awk's match function and store it in the num variable. Check whether the number is even with num%2. If it is, add that row's average to av and increment the counter n. At the end, print av divided by n to 2 decimal places.
Print the daily average, for all even numbered user IDs:
#!/bin/sh
awk -F , '
(NR>1) &&
($2 ~ /[02468]$/) {
hours += ($3 + $4 + $5 + $6 + $7)
(users++)
}
END {
print (hours/users/5)
}' \
"$1"
Usage example:
$ script user-list
5.56
One way to test whether an integer is even or odd is to use modulus (%), as in N % 2. For even values of N, this expression evaluates to zero, and for odd values, it evaluates to 1.
However, in this case a string operation would be required to extract the number anyway, so you may as well use a single string match to determine odd or even.
Also, IMO, for 5 fields that are not going to change (days of the week), it's more succinct to add them directly instead of using a loop. (NR>1) also skips the title line, in case there's a conflict.
Finally, you can of course swap /[02468]$/ for /[13579]$/ to get the same figure for odd-numbered users.
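As a rough illustration of the modulus approach mentioned above, the sketch below strips everything except the digits from the user ID and then tests num % 2. The temporary file and the variable names (tmp, num, result) are my own assumptions; the data is the sample from the question.

```shell
# Sample data from the question, written to a temporary file (the name is arbitrary)
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Computer ID,User ID,M,T,W,T,F
Computer1,User3,5,7,3,5,2
Computer2,User5,8,8,8,8,8
Computer3,User4,0,8,0,8,4
Computer4,User1,5,4,5,5,8
Computer5,User2,9,8,10,0,0
Computer6,User7,4,7,8,2,5
Computer7,User6,8,8,8,0,0
Computer8,User9,5,2,0,6,8
Computer9,User8,2,5,7,3,6
Computer10,User10,8,9,9,9,10
EOF

result=$(awk -F, '
NR > 1 {
    num = $2
    gsub(/[^0-9]/, "", num)            # keep only the digits of the user ID
    if (num != "" && num % 2 == 0) {   # even user number
        hours += $3 + $4 + $5 + $6 + $7
        users++
    }
}
END {
    if (users) printf "%.2f\n", hours / (users * 5)
}' "$tmp")

echo "$result"
rm -f "$tmp"
```

On the sample data this should print 5.56 (139 hours spread over 5 users and 5 days).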
I have a task where I have to compute the average length of the words in a column with awk.
awk -F'\t' '{print length ($8) } END { print "Average = ",sum/NR}' file
In the output I get the total length of each line, but it does not count the average length, the output just says Average = 0 which can not be the case because the printed lines before have numbers.
For better understanding, I will paste the last lines of the output here:
4
4
3
4
4
2
5
7
6
5
Average = 0
How do I need to change my code to get the average number of letters in the whole column as output?
Ty very much for your time and help :)
In the output I get the total length of each line, but it does not count the average length, the output just says Average = 0 which can not be the case because the printed lines before have numbers.
Because you're not adding lengths of columns to sum. Do it like this instead:
awk -F'\t' '{
print length($8)
sum += length($8)
}
END {
print "Average =", sum/NR
}' file
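Initialise a sum variable in a BEGIN section and accumulate the length of the column on each line. (awk treats an unset variable as 0, so the BEGIN block is optional, but it makes the intent explicit.)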
I don't have your original file so I did a similar exercise for the 1st column of my /etc/passwd file:
awk -F':' 'BEGIN{sum=0} {sum += length($1); print length($1)} END{print "Average = " sum/NR}' /etc/passwd
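To see that only the accumulation matters, here is a small sketch on made-up tab-separated input (the data and the choice of column 2 are assumptions for illustration, since the original file is not available). An unset awk variable counts as 0 in arithmetic, which is why the original command printed Average = 0.

```shell
# Hypothetical tab-separated input; column 2 holds the words "bb" and "dddd"
avg=$(printf 'a\tbb\nccc\tdddd\n' |
    awk -F'\t' '
        { sum += length($2) }              # accumulate the length of column 2
        END { if (NR) print sum / NR }     # guard against empty input
    ')
echo "$avg"
```

With lengths 2 and 4 over two lines, this should print 3.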
I have this file:
- - - Results from analysis of weight - - -
Akaike Information Criterion 307019.66 (assuming 2 parameters).
Bayesian Information Criterion 307036.93
Approximate stratum variance decomposition
Stratum Degrees-Freedom Variance Component Coefficients
id 39892.82 490.360 0.7 0.6 1.0
damid 0.00 0.00000 0.0 0.0 1.0
Residual Variance 1546.46 320.979 0.0 0.0 1.0
Model_Term Gamma Sigma Sigma/SE % C
id NRM_V 17633 0.18969 13.480 4.22 0 P
damid NRM_V 17633 0.07644 13.845 2.90 0 P
ide(damid) IDV_V 17633 0.00000 32.0979 1.00 0 S
Residual SCA_V 12459 1.0000 320.979 27.81 0 P
I would like to print the value of Sigma for id. Note that id occurs in more than one row of the file, so I added a condition on NRM_V too.
I tried this code:
tac myfile | awk '(/id/ && /NRM_V/){print $5}'
but the results printed were:
13.480
13.845
and I need just the first one
Could you please try the following? I have added awk's exit statement, which stops processing as soon as the first matching line is found. This also saves time, since awk no longer reads the whole Input_file.
awk '(/id/ && /NRM_V/){print $5;exit}' Input_file
OR with columns:
awk '($1=="id" && $2=="NRM_V"){print $5;exit}' Input_file
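If you want to read the file from the last line towards the first and take the first match, try the following (note that /id/ also matches damid, so on the sample data this regex form hits the damid row first):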
tac Input_file | awk '(/id/ && /NRM_V/){print $5;exit}'
OR with columns comparisons:
tac Input_file | awk '($1=="id" && $2=="NRM_V"){print $5;exit}'
The problem is that /id/ also matches damid. You could use the following to print the Sigma value only if the first field is id and the second field is NRM_V:
awk '$1=="id" && $2=="NRM_V"{ print $5 }' myfile
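The difference between the two kinds of match can be checked on two rows copied from the question's file. This is only a sketch; the variable names (rows, regex_hits, sigma) are my own.

```shell
# Two rows taken from the question's file
rows='id NRM_V 17633 0.18969 13.480 4.22 0 P
damid NRM_V 17633 0.07644 13.845 2.90 0 P'

# Regex match: /id/ also hits "damid", so both rows are counted
regex_hits=$(printf '%s\n' "$rows" | awk '/id/ && /NRM_V/ { n++ } END { print n + 0 }')

# Field comparison: only the row whose first field is exactly "id"
sigma=$(printf '%s\n' "$rows" | awk '$1 == "id" && $2 == "NRM_V" { print $5; exit }')

echo "regex matches: $regex_hits, sigma: $sigma"
```

Here the regex form counts 2 rows, while the field comparison returns the single value 13.480.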
I have data:
1 82 0.20971070
2 7200 13659.50038631
3 7443 15389.87972458
and I want to print the quotient of the sum of column 3 and the sum of column 2. How can I do that?
I tried:
print((sum+=$3)/(sum+=$2))
and the result is 3 numbers, one computed per row. The desired result is 1,972807...
EDIT
One more question, please. I have this code:
/Curve No./ { in_f_format=1; next }
/^[[:space:]]*$/ { in_f_format=0; next }
{sum2+=$2; sum3+=$3} END{printf("%.6f\n",sum3/sum2)}
How can I get a column of results, one per file, for several files? I wrote
awk -f program.awk file??.txt
and I get only one result - for file01.txt
awk '{sum2+=$2; sum3+=$3} END{print sum3/sum2}' file
Output:
1.97281
or
awk '{sum2+=$2; sum3+=$3} END{printf("%.20f\n",sum3/sum2)}' file
Output:
1.97280745817249592022
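For the follow-up about several files, one possible approach is to report and reset the sums whenever a new file starts (FNR == 1) and once more in END. The sketch below assumes each file contains only data rows; the second file's contents, the temporary directory, and the report() helper are hypothetical. GNU awk also offers an ENDFILE pattern for this, but the version here sticks to portable awk.

```shell
# Two hypothetical input files; the first holds the data from the question
dir=$(mktemp -d)
printf '1 82 0.20971070\n2 7200 13659.50038631\n3 7443 15389.87972458\n' > "$dir/file01.txt"
printf '1 2 4\n2 2 6\n' > "$dir/file02.txt"

ratios=$(awk '
function report() {                 # print the ratio for the file just finished
    if (sum2) printf "%.6f\n", sum3 / sum2
    sum2 = sum3 = 0                 # reset the sums for the next file
}
FNR == 1 && NR > 1 { report() }     # first line of a later file: flush the previous one
{ sum2 += $2; sum3 += $3 }
END { if (NR) report() }            # flush the last file
' "$dir"/file??.txt)

echo "$ratios"
rm -rf "$dir"
```

This should print one line per file: 1.972807 for the question's data and 2.500000 for the made-up second file.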
I have an input file that contains, per row, a value and two weights.
I would like to generate two output files - where the value in the first column is repeated once per line, according to the weights. This is probably best explained with a short example. If the input file is:
file.in:
35 2 0
37 2 3
38 0 4
Then I would like to generate two output files:
file.out1:
35
35
37
37
file.out2:
37
37
37
38
38
38
38
I will then use these output files to calculate the average and median of first column according to the weights in the second and third column.
This is pretty easy in awk.
awk '{for(i=0;i<$2;i++) print $1;}' file.in > file.out1
generates the first file, and
awk '{for(i=0;i<$3;i++) print $1;}' file.in > file.out2
generates the second.
It is not clear from your question whether you know how to compute the mean and median from these files; it seems you just wanted to create the output files. Let me know if the rest is giving you trouble, or if the above scripts are not clear (I think they are pretty self-explanatory).
If I understood correctly, you need the average and the median.
Average:
awk '{a+=$1}END{print a/NR}' file.in
36.6667
Median:
awk '{print $1}' file.in | sort -n | awk '{a[NR]=$1}END{ b=NR/2; b=b%1?int(b)+1:b; print a[b] }'
37
Explanation:
Put simply, NR is a variable that holds the number of lines read so far; for the average you want the sum of the values on every line divided by the number of lines.
For the median, you sort the input and pick the middle value. That is not quite so simple for your input: if you divide the number of lines (3) by 2 you get 1.5, so you need a ceiling function, which awk does not have. I emulate it with b=NR/2; b=b%1?int(b)+1:b;
I hope this helps.
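If the goal is the average weighted by columns 2 and 3, as described in the question, it can also be computed directly, without the intermediate files. This is a minimal sketch under that assumption; the variable names (s2, w2, s3, w3, means) are my own.

```shell
# file.in from the question: value, first weight, second weight
tmp=$(mktemp)
printf '35 2 0\n37 2 3\n38 0 4\n' > "$tmp"

means=$(awk '
{
    s2 += $1 * $2; w2 += $2     # weighted sum and total weight for column 2
    s3 += $1 * $3; w3 += $3     # same for column 3
}
END {
    if (w2) printf "%.4f\n", s2 / w2
    if (w3) printf "%.4f\n", s3 / w3
}' "$tmp")

echo "$means"
rm -f "$tmp"
```

For the sample input this should give 36.0000 (144/4) for the first weight column and 37.5714 (263/7) for the second, which match the plain averages of file.out1 and file.out2.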