average of specific rows in a file - awk

I have 6 rows in files. I need to find average only of specific rows in a file and the others should be left as they are. The average should be calculated for A1 and A2, B1 and B2, other lines should stay as they are
Input:
A1 1 1 2
A2 5 6 1
A3 1 1 1
B1 10 12 12
B2 10 12 10
B3 100 200 300
Output:
A1A2 3 3.5 1.5
A3 1 1 1
B1B2 10 12 11
B3 100 200 300
EDIT: There are n columns in total

awk to the rescue!
$ awk '/[AB][12]/{a=substr($1,1,1);
k=a"1"a"2";
c1[k]+=$2; c2[k]+=$3; c3[k]+=$4; n[k]++; next}
1;
END{for(k in c1)
print k, c1[k]/n[k], c2[k]/n[k], c3[k]/n[k]}' file | sort | column -t
A1A2 3 3.5 1.5
A3 1 1 1
B1B2 10 12 11
B3 100 200 300
pattern match grouped rows, create a key, calculate sum of all fields and count of rows per key; print unmatched rows; when done print the averaged rows, since order is not preserved sort and pipe to column for easy formatting.

$ cat tst.awk
$1 ~ /^[AB]1$/ { for (i=2;i<=NF;i++) val[$1,i]=$i; next }
$1 ~ /^[AB]2$/ { p=$1; sub(2,1,p); $1=p $1; for (i=2;i<=NF;i++) $i=($i + val[p,i])/2 }
{ print }
$ awk -f tst.awk file | column -t
A1A2 3 3.5 1.5
A3 1 1 1
B1B2 10 12 11
B3 100 200 300

Related

Select current and previous line if certain value is found

To figure out my problem, I subtract column 3 and create a new column 5 with new values, then I print the previous and current line if the value found is equal to 25 in column 5.
Input file
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
I tried
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' file
output
1 1 35 1 35
2 5 50 1 15
2 6 75 1 25
4 7 85 1 10
5 8 100 1 15
6 9 125 1 25
4 1 200 1 75
Desired Output
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Thanks in advance
you're almost there, in addition to previous $3, keep the previous $0 and only print when condition is satisfied.
$ awk '{$5=$3-p3} $5==25{print p0; print} {p0=$0;p3=$3}' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
this can be further golfed to
$ awk '25==($5=$3-p3){print p0; print} {p0=$0;p3=$3}' file
check the newly computed field $5 whether equal to 25. If so print the previous line and current line. Save the previous line and previous $3 for the computations in the next line.
You are close to the answer, just pipe it another awk and print it
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
with Inputs:
$ cat oxxo.txt
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
$ awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
$
Could you please try following.
awk '$3-prev==25{print line ORS $0,$3} {$(NF+1)=$3-prev;prev=$3;line=$0}' Input_file | column -t
Here's one:
$ awk '{$5=$3-q;t=p;p=$0;q=$3;$0=t ORS $0}$10==25' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Explained:
$ awk '{
$5=$3-q # subtract
t=p # previous to temp
p=$0 # store previous for next round
q=$3 # store subtract value for next round
$0=t ORS $0 # prepare record for output
}
$10==25 # output if equals
' file
No checking for duplicates so you might get same record printed twice. Easiest way to fix is to pipe the output to uniq.

Count number of occurrences of a number larger than x from every raw

I have a file with multiple rows and 26 columns. I want to count the number of occurrences of values that are higher than 0 (I guess is also valid different from 0) in each row (excluding the first two columns). The file looks like this:
X Y Sample1 Sample2 Sample3 .... Sample24
a a1 0 7 0 0
b a2 2 8 0 0
c a3 0 3 15 3
d d3 0 0 0 0
I would like to have an output file like this:
X Y Result
a a1 1
b b1 2
c c1 3
d d1 0
awk or sed would be good.
I saw a similar question but in that case the columns were summed and the desired output was different.
awk 'NR==1{printf "X\tY\tResult%s",ORS} # Printing the header
NR>1{
count=0; # Initializing count for each row to zero
for(i=3;i<=NF;i++){ #iterating from field 3 to end, NF is #fields
if($i>0){ #$i expands to $3,$4 and so which are the fields
count++; # Incrementing if the condition is true.
}
};
printf "%s\t%s\t%s%s",$1,$2,count,ORS # For each row print o/p
}' file
should do that
another awk
$ awk '{if(NR==1) c="Result";
else for(i=3;i<=NF;i++) c+=($i>0);
print $1,$2,c; c=0}' file | column -t
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
$ awk '{print $1, $2, (NR>1 ? gsub(/ [1-9]/,"") : "Result")}' file
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0

use awk for printing selected rows

I have a text file and i want to print only selected rows of it. Below is the dummy format of the text file:
Name Sub Marks percentage
A AB 50 50
Name Sub Marks percentage
b AB 50 50
Name Sub Marks percentage
c AB 50 50
Name Sub Marks percentage
d AB 50 50
I need the output as:(Don't need heading before every record and need only 3 columns omitting "MARKS")
Name Sub percentage
A AB 50
b AB 50
c AB 50
d AB 50
Please Suggest me a form of awk command using which I can achieve this, and thanks for supporting.
You can use:
awk '(NR == 1) || ((NR % 2) == 0) {print $1" "$2" "$4}' inputFile
This will print columns one, two and four but only if the record number is one or even. The results are:
Name Sub percentage
A AB 50
b AB 50
c AB 50
d AB 50
If you want it nicely formatted, you can use printf instead:
awk '(NR == 1) || ((NR % 2) == 0) {printf "%-10s %-10s %10s\n", $1, $2, $4}' inputFile
Name Sub percentage
A AB 50
b AB 50
c AB 50
d AB 50
awk solution:
awk 'NR==1 || !(NR%2){ print $1,$2,$4 }' OFS='\t' file
NR==1 || !(NR%2) - considering only the 1st and each even line
OFS='\t' - output field separator
The output:
Name Sub percentage
A AB 50
b AB 50
c AB 50
d AB 50
In case that input file has a slight different format above solution will fail. For example:
Name Sub Marks percentage
A AB 50 50
Name Sub Marks percentage
b AB 50 50
c AB 50 50
Name Sub Marks percentage
d AB 50 50
In such a case, something like this will work in all cases:
$ awk '$0!=h;NR==1{h=$0}' file1
Name Sub Marks percentage
A AB 50 50
b AB 50 50
c AB 50 50
d AB 50 50

Sorting a column based on the value of another

I have a file with the following structure:
Input
1 30923 2 300 G:0.503333 T:0.496667 T
1 51476 2 300 T:0.986667 C:0.0133333 C
1 51479 2 300 T:0.966667 A:0.0333333 T
What I would like to do is to change the position of the fifth and sixth column in a way that one column gets the order identical as of the seventh column. You can see in the example. In the seventh column, we have T, C, T and after the change, the sixth column from T, C, A has changed into T, C, T in the output, that is in the third line, the position of the fifth and sixth columns have switched when compared to the seventh column.
Output
1 30923 2 300 G:0.503333 T:0.496667 T
1 51476 2 300 T:0.986667 C:0.0133333 C
1 51479 2 300 A:0.0333333 T:0.966667 T
I hope I could explain clearly, I have not been able to find a solution, could you please give me a hint as how to do this?
Thank you in advance.
Using output as tab delimiters and all columns justified.
awk -F'[ :]*' '{if($7 == $9 ) print $1,$2,$3,$4,$5,$6,$7,$8,$9; else print $1,$2,$3,$4,$7,$8,$5,$6,$9}' input.txt|column -t
Output:
1 30923 2 300 G 0.503333 T 0.496667 T
1 51476 2 300 T 0.986667 C 0.0133333 C
1 51479 2 300 A 0.0333333 T 0.966667 T
If I understand correctly, maybe this will work for you?
: file a.awk
substr($6,1,1) == $7 { print }
substr($6,1,1) != $7 { print $1, $2, $3, $4, $6, $5, $7 }
: file a.txt
1 30923 2 300 G:0.503333 T:0.496667 T
1 51476 2 300 T:0.986667 C:0.0133333 C
1 51479 2 300 T:0.966667 A:0.0333333 T
bash-3.2$ awk -f a.awk a.txt
1 30923 2 300 G:0.503333 T:0.496667 T
1 51476 2 300 T:0.986667 C:0.0133333 C
1 51479 2 300 A:0.0333333 T:0.966667 T

sum rows based on unique columns awk

I'm looking for a more elegant way to do this (for more than >100 columns):
awk '{a[$1]+=$4}{b[$1]+=$5}{c[$1]+=$6}{d[$1]+=$7}{e[$1]+=$8}{f[$1]+=$9}{g[$1]+=$10}END{for(i in a) print i,a[i],b[i],c[i],d[i],e[i],f[i],g[i]}'
Here is the input:
a1 1 1 2 2
a2 2 5 3 7
a2 2 3 3 8
a3 1 4 6 1
a3 1 7 9 4
a3 1 2 4 2
and output:
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
Thanks :)
I break the one-liner down into lines, to make it easier to read.
awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}
END{for(x in n){
printf "%s ", x
for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:FS)
}
}' file
this awk command should work with your 100 columns file.
test with your file:
kent$ cat f
a1 1 1 2 2
a2 2 5 3 7
a2 2 3 3 8
a3 1 4 6 1
a3 1 7 9 4
a3 1 2 4 2
kent$ awk '{n[$1];for(i=2;i<=NF;i++)a[$1,i]+=$i}END{for(x in n){printf "%s ", x;for(y=2;y<=NF;y++)printf "%s%s", a[x,y],(y==NF?ORS:OFS)}}' f
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
Using arrays of arrays in gnu awk version 4
awk '{for (i=2;i<=NF;i++) a[$1][i]+=$i}
END{for (i in a)
{ printf i FS;
for (j in a[i]) printf a[i][j] FS
printf RS}
}' file
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
If you care about order of output try this
$ cat file
a1 1 1 2 2
a2 2 5 3 7
a2 2 3 3 8
a3 1 4 6 1
a3 1 7 9 4
a3 1 2 4 2
Awk Code :
$ cat tester
awk 'FNR==NR{
U[$1] # Array U with index being field1
for(i=2;i<=NF;i++) # loop through columns thats is column2 to NF
A[$1,i]+=$i # Array A holds sum of columns
next # stop processing the current record and go on to the next record
}
($1 in U){ # Here we read same file once again,if field1 is found in array U, then following statements
for(i=1;i<=NF;i++)
s = s ? s OFS A[$1,i] : A[$1,i] # I am writing sum to variable s since I want to use only one print statement, here you can use printf also
print $1,s # print column1 and variable s
delete U[$1] # We have done, so delete array element
s = "" # reset variable s
}' OFS='\t' file{,} # output field separator is tab you can set comma also
Resulting
$ bash tester
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk
--edit--
As requested in comment here is one liner, in above post for better reading purpose I had commented and it became several lines.
$ awk 'FNR==NR{U[$1];for(i=2;i<=NF;i++)A[$1,i]+=$i;next}($1 in U){for(i=1;i<=NF;i++)s = s ? s OFS A[$1,i] : A[$1,i];print $1,s;delete U[$1];s = ""}' OFS='\t' file{,}
a1 1 1 2 2
a2 4 8 6 15
a3 3 13 19 7