How to calculate quotient of sums of columns as one number? - awk

I have data:
1 82 0.20971070
2 7200 13659.50038631
3 7443 15389.87972458
and I want to print the quotient of the sum of column 3 and the sum of column 2. How to do that?
I tried:
print((sum+=$3)/(sum+=$2))
and the result is 3 numbers, one computed for each row. The desired result is 1,972807...
EDIT
One more question, please. I have this code:
/Curve No./ { in_f_format=1; next }
/^[[:space:]]*$/ { in_f_format=0; next }
{sum2+=$2; sum3+=$3} END{printf("%.6f\n",sum3/sum2)}
How do I get a column of results, one per file, for several files? I wrote
awk -f program.awk file??.txt
and I get only one result - for file01.txt

awk '{sum2+=$2; sum3+=$3} END{print sum3/sum2}' file
Output:
1.97281
or
awk '{sum2+=$2; sum3+=$3} END{printf("%.20f\n",sum3/sum2)}' file
Output:
1.97280745817249592022
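For the follow-up about several files: the END block runs only once, after the last file, so a single invocation prints a single number. A minimal sketch of one way to get one line per file is to report and reset the sums whenever FNR returns to 1. The two sample files and their contents are made up for illustration, and the /Curve No./ filtering from the question is left out here.

```shell
# Hypothetical sample files standing in for file01.txt, file02.txt, ...
dir=$(mktemp -d)
printf '1 2 4\n2 2 6\n' > "$dir/file01.txt"
printf '1 4 2\n2 4 2\n' > "$dir/file02.txt"

per_file=$(cd "$dir" && awk '
    # first line of every file except the first: report the previous file, then reset
    FNR == 1 && NR > 1 { printf("%s %.6f\n", prev, sum3 / sum2); sum2 = sum3 = 0 }
    { sum2 += $2; sum3 += $3; prev = FILENAME }
    # the last file has no successor, so report it here
    END { if (NR > 0) printf("%s %.6f\n", prev, sum3 / sum2) }
' file??.txt)

printf '%s\n' "$per_file"
rm -r "$dir"
```

With these sample files the sketch prints file01.txt 2.500000 and file02.txt 0.500000. In GNU awk the same effect can be had with an ENDFILE block, but that is a gawk extension and not portable.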

Related

Create new columns in specific positions by row-wise summing other columns

I would like to transform this file by adding two columns, each holding the result of adding a value to another column. I would like these new columns to be located next to the corresponding summed column.
A B C
2000 33000 2
2000 153000 1
2000 178000 1
2000 225000 1
2000 252000 1
I would like to get the following data
A A1 B B1 C
2000 2999 33000 33999 2
2000 2999 153000 153999 1
2000 2999 178000 178999 1
2000 2999 225000 225999 1
2000 2999 252000 252999 1
I have found how to sum a column: awk '{{$2 += 999}; print $0}' myFile but this transforms second column, instead of creating a new one. In addition, I am not sure about how to append that column in the desired positions.
Thanks!
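awk '{
    # shift columns 2..NF one place to the right, note - from the back!
    # (assigning to $(NF + 1) appends the extra column)
    for (i = NF; i >= 2; --i) {
        $(i + 1) = $i
    }
    # new second column: first column plus 999
    # (the header line needs separate handling)
    $2 = $1 + 999
    # print it
    print
}
' myfile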
And similar for 4th column.
Sample-specific answer: try the following, written and tested with the shown samples in GNU awk.
awk '
FNR==1{
  $1=$1 OFS "A1"
  $2=$2 OFS "B1"
  print
  next
}
{
  $1=$1 OFS $1+999
  $2=$2 OFS $2+999
}
1
' Input_file | column -t
Generic solution: a more generic version, where you do not need to write the field logic for each field. Pass the field numbers, comma-separated, in the variable fieldsChange; the header names are taken care of as well. The variable valAdd holds the value to add in the new columns.
awk -v valAdd="999" -v fieldsChange="1,2" '
BEGIN{
  num=split(fieldsChange,arr,",")
  for(i=1;i<=num;i++){ value[arr[i]] }
}
FNR==1{
  for(i=1;i<=NF;i++) {
    if(i in value) { $i=$i OFS $i"1" }
  }
}
FNR>1{
  for(i=1;i<=NF;i++) {
    if(i in value) { $i=$i OFS $i+valAdd }
  }
}
1
' Input_file | column -t

Calculating the average length of a column

I have a task, where I have to count the average length of each word in a column with awk.
awk -F'\t' '{print length ($8) } END { print "Average = ",sum/NR}' file
In the output I get the length for each line, but it does not compute the average length; the output just says Average = 0, which cannot be the case because the printed lines before have numbers.
For better understanding I will copy and paste the last lines of the output here:
4
4
3
4
4
2
5
7
6
5
Average = 0
How do I need to change my code to get the average number of letters for the whole column as output?
Thank you very much for your time and help :)
In the output I get the length for each line, but it does not compute the average length; the output just says Average = 0, which cannot be the case because the printed lines before have numbers.
Because you're not adding lengths of columns to sum. Do it like this instead:
awk -F'\t' '{
    print length($8)
    sum += length($8)
}
END {
    print "Average =", sum/NR
}' file
Initialise a sum variable in a BEGIN section and accumulate the length of a column at each iteration.
I don't have your original file so I did a similar exercise for the 1st column of my /etc/passwd file:
awk -F':' 'BEGIN{sum=0} {sum += length($1); print length($1)} END{print "Average = " sum/NR}' /etc/passwd

how to extract lines which have no duplicated values in first column?

For some statistics research, I want to separate out the rows of my data that have a duplicated value in the first column. I work with vim.
Suppose that part of my data looks like this:
Item_ID Customer_ID
123 200
104 134
734 500
123 345
764 347
1000 235
734 546
As you can see, some lines have equal values in the first column.
I want to generate two separate files: one containing the lines whose first-column value is repeated, and the other containing the lines whose first-column value occurs only once.
For the above example I want these two files.
The first one contains:
Item_ID Customer_ID
123 200
734 500
123 345
734 546
and second one contains:
Item_ID Customer_ID
104 134
764 347
1000 235
can anybody help me?
I think awk would be a better option here.
$ awk 'FNR == NR { seen[$1]++; next } seen[$1] == 1' input.txt input.txt > uniq.txt
$ awk 'FNR == NR { seen[$1]++; next } seen[$1] > 1' input.txt input.txt > dup.txt
Prettier version of awk code:
FNR == NR {
seen[$1]++;
next
}
seen[$1] == 1
Overview
We loop over the text twice: by supplying the same file to our awk script twice, we effectively read it twice. The first time through, we count the number of times we see each first-field value. The second time through, we output only the records whose first-field value has a count of 1. For the duplicate-line case, we output only the lines whose count is greater than 1.
Awk primer
awk loops over lines (or records) in a text file/input and splits each line into fields. $1 for the first field, $2 for the second field, etc. By default fields are separated by whitespaces (this can be configured).
awk runs each line through a series of rules in the form of condition { action }. Any time a condition matches then action is taken.
Example that prints the first field of every line matching foo:
awk '/foo/ { print $1 }' input.txt
Glory of Details
Let's take a look at finding only the lines whose first field appears just once.
$ awk 'FNR == NR { seen[$1]++; next } seen[$1] == 1' input.txt input.txt > uniq.txt
Prettier version for readability:
FNR == NR {
seen[$1]++;
next
}
seen[$1] == 1
awk 'code' input > output - run code over the input file, input, and then redirect the output to file, output
awk can take more than one input. e.g. awk 'code' input1.txt input2.txt.
Use the same input file, input.txt, twice to loop over the input twice
awk 'FNR == NR { code1; next } code2' file1 file2 is a common awk idiom which will run code1 for file1 and run code2 for file2
NR is the current record (line) number. This increments after each record
FNR is the current file's record number. e.g. FNR will be 1 for the first line in each file
next will stop executing any more actions and go to the next record/line
FNR == NR will only be true for the first file
$1 is the first field's data
seen[$1]++ - seen is an array/dictionary where we use the first field, $1, as our key and increment the value so we can get a count
$0 is the entire line
print ... prints out the given fields
print $0 will print out the entire line
just print is short for print $0
condition { print $0 } can be shortened to condition { print } which can be shortened further to just condition
seen[$1] == 1 checks whether the first field's value count is equal to 1 and, if so, prints the line
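Building on the idiom above, both output files can be written in the same run by choosing the target file during the second pass. This is only a sketch: it uses the sample data from the question, and the temporary directory, the awk variables u and d, and the choice to copy the header into both files are my own assumptions.

```shell
dir=$(mktemp -d)
printf '%s\n' 'Item_ID Customer_ID' '123 200' '104 134' '734 500' \
              '123 345' '764 347' '1000 235' '734 546' > "$dir/input.txt"

awk -v u="$dir/uniq.txt" -v d="$dir/dup.txt" '
    FNR == NR { seen[$1]++; next }             # first pass: count each key
    FNR == 1  { print > u; print > d; next }   # second pass: header goes to both files
    { print > (seen[$1] == 1 ? u : d) }        # route each row by its key count
' "$dir/input.txt" "$dir/input.txt"

uniq_out=$(cat "$dir/uniq.txt")
dup_out=$(cat "$dir/dup.txt")
printf '%s\n' "$uniq_out" "$dup_out"
rm -r "$dir"
```

The parentheses around the conditional after > matter: without them, how the redirection target is parsed differs between awk implementations.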
Here is an awk solution:
awk 'NR>1{a[$1]++;b[NR]=$1;c[NR]=$2} END {for (i=2;i<=NR;i++) print b[i],c[i] > (a[b[i]]==1?"single":"multiple")}' file
cat single
104 134
764 347
1000 235
cat multiple
123 200
734 500
123 345
734 546
PS I skipped the first line, but it could be implemented.
This way you get one file for single hits, one for double, one for triple etc.
awk 'NR>1{a[$1]++;b[NR]=$1;c[NR]=$2} END {for (i=2;i<=NR;i++) print b[i],c[i] > ("file" a[b[i]])}' file
That would require some filtering of the list of lines in the buffer. If you're really into statistics research, I'd go search for a tool that is better suited than a general-purpose text editor, though.
That said, my PatternsOnText plugin has some commands that can do the job:
:2,$DeleteUniqueLinesIgnoring /\s\+\d\+$/
:w first
:undo
:2,$DeleteAllDuplicateLinesIgnoring /\s\+\d\+$/
:w second
As you want to filter on the first column, the commands' /{pattern}/ has to filter out the second column; \s\+\d\+$ matches the final number and its preceding whitespace.
:DeleteUniqueLinesIgnoring (from the plugin) gives you just the duplicates, :DeleteAllDuplicateLinesIgnoring just the unique lines. I simply :write them to separate files and :undo in between.

How to average columns over an ID?

I have a large file of data with multiple ID's followed by several columns of observations. I need to average over one of the columns of ID's. I think this can be done using awk, but I'm not sure of how to set it up.
Data:
ID1 ID2 Observation
1 15_24 -0.00002649
2 15_24 0.00001584
3 15_24 -0.00003168
1 16_2 0.00002649
2 16_2 -0.00002014
3 16_2 -0.00003058
1 12_25 0.00009636
2 12_25 -0.00007514
3 12_25 0.00003021
Need the observations averaged over ID2 like this:
1 15_24 -0.00001411
2 16_2 -0.00000808
3 12_25 0.00001714
Thank you.
Maybe so:
awk 'BEGIN{ FS=" " } { cnt[$2] += $3; lincnt[$2] +=1; } END{i=1; for (x in cnt){print i++, x, (cnt[x] /lincnt[x] ) } }' file
If ordering is relevant, this awk script could help:
#!/usr/bin/awk -f
lastItem==$2{
    observation+=$3
    observationCounter+=1
    next
}
observationCounter>0{
    print ++i" "lastItem" - "observation/observationCounter
}
{
    lastItem=$2
    observation=$3
    observationCounter=1
}
END{
    print ++i" "lastItem" - "observation/observationCounter
}
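The first one-liner prints the groups in whatever order for (x in cnt) happens to visit them, and the second script assumes rows with the same ID2 are adjacent. A possible middle ground, sketched here with the sample data from the question, is to remember the order in which each ID2 first appears. The array names order, sum and cnt and the %.8f format are my own choices.

```shell
data='ID1 ID2 Observation
1 15_24 -0.00002649
2 15_24 0.00001584
3 15_24 -0.00003168
1 16_2 0.00002649
2 16_2 -0.00002014
3 16_2 -0.00003058
1 12_25 0.00009636
2 12_25 -0.00007514
3 12_25 0.00003021'

avg=$(printf '%s\n' "$data" | awk '
    NR == 1 { next }                      # skip the header line
    !($2 in cnt) { order[++n] = $2 }      # remember keys in first-seen order
    { sum[$2] += $3; cnt[$2]++ }
    END {
        for (i = 1; i <= n; i++)
            printf("%d %s %.8f\n", i, order[i], sum[order[i]] / cnt[order[i]])
    }
')
printf '%s\n' "$avg"
```

On the sample data this reproduces the three lines requested in the question.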

Calculate Average of Column Data with Headers

I have data for example that looks like this:
Flats 2b
01/1991, 3.45
01/1992, 4.56
01/1993, 4.21
01/1994, 5.21
01/1995, 7.09
01/2013, 6.80
Eagle 2
01/1991, 4.22
01/1992, 6.32
01/1993, 5.21
01/1994, 8.09
01/1995, 7.92
01/2013, 6.33
I'm trying to calculate an average of column 2 so that my desired output looks like this preferably:
Flats 2b
Avg = 4.67
Eagle 2
Avg = 5.26
or even simpler that looks like this without the header:
Avg = 4.67
Avg = 5.26
and so on...since the input file is full of many headers with data like that shown above.
I have tried to do pattern matching options and using NR with something like this without success as an awk one-liner:
awk '/01/1991,/01/1993 {sum+=$2; cnt+=1} {print "Avg =" sum/cnt}' myfile.txt
I get averages but not my desired average for JUST the years 1991, 1992 and 1993 separately for each met tower.
Your help is much appreciated!
If you want to consider only the years 1991-1993
#! /usr/bin/awk -f
# new header, print average if exists, reset values
/[a-zA-Z]/ {
    if (cnt > 0) {
        print header;
        printf("Avg = %.2f\n", sum/cnt);
    }
    header=$0; sum=0; cnt=0;
}
# calculate average
/^01\/199[123]/ { sum+=$2; cnt++; }
# print last average
END {
    if (cnt > 0) {
        print header;
        printf("Avg = %.2f\n", sum/cnt);
    }
}
This awk script looks for a header, prints the average if there is one, and then resets all variables for the next average calculation. If it finds a data row, it adds to the sum needed for the average later. Once the last line has been read, it prints the final average.
The script considers only the years 1991 through 1993 inclusive. If you want to include more years, you can either duplicate the calculation line or combine several patterns with the or operator ||:
# calculate average
/^01\/199[0-9]/ || /^01\/200[0-9]/ { sum+=$2; cnt++; }
This takes all the 1990s and 2000s into account.
If you don't want to print the headers, remove the two print header; lines.
You call this awk script as
awk -f script.awk myfile.txt
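If the year range changes often, an alternative to editing the regular expressions is to extract the year as a number and compare it against bounds passed on the command line. This is a sketch under my own assumptions: the variables first and last and the helper function report are hypothetical names, and the data is a shortened copy of the sample from the question.

```shell
data='Flats 2b
01/1991, 3.45
01/1992, 4.56
01/1993, 4.21
01/1994, 5.21
Eagle 2
01/1991, 4.22
01/1992, 6.32
01/1993, 5.21
01/1994, 8.09'

avgs=$(printf '%s\n' "$data" | awk -v first=1991 -v last=1993 '
    # print the pending average, if any rows were counted
    function report() {
        if (cnt > 0) { print header; printf("Avg = %.2f\n", sum / cnt) }
    }
    # header line: flush the previous block and start a new one
    /[a-zA-Z]/ { report(); header = $0; sum = cnt = 0; next }
    # data line: $1 looks like 01/1991, so the part after the slash is the year
    { split($1, d, "/"); year = d[2] + 0 }
    year >= first && year <= last { sum += $2; cnt++ }
    END { report() }
')
printf '%s\n' "$avgs"
```

Adding 0 to d[2] converts a string such as 1991, to the number 1991, so the trailing comma does not get in the way of the comparison.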