For each unique value in one field, transpose each unique value of another field into its own column - awk

I have a file
splice_region_variant,intron_variant A1CF 1
3_prime_UTR_variant A1CF 18
intron_variant A1CF 204
downstream_gene_variant A1CF 22
synonymous_variant A1CF 6
missense_variant A1CF 8
5_prime_UTR_variant A2M 1
stop_gained A2M 1
missense_variant A2M 15
splice_region_variant,intron_variant A2M 2
synonymous_variant A2M 2
upstream_gene_variant A2M 22
intron_variant A2M 308
missense_variant A4GNT 1
intron_variant A4GNT 21
5_prime_UTR_variant A4GNT 3
3_prime_UTR_variant A4GNT 7
This file is sorted by $2.
For each unique value in $2, I want one output row that lists, in columns, the value of $3 for each unique value of $1, or 0 if that combination is not in the file. So that I have:
splice_region_variant,intron_variant 3_prime_UTR_variant intron_variant downstream_gene_variant synonymous_variant missense_variant 5_prime_UTR_variant stop_gained upstream_gene_variant
A1CF 1 18 204 22 6 8 0 0 0
A2M 2 0 308 0 2 15 1 1 22
A4GNT 0 7 21 0 0 22 3 0 0
test file:
a x 2
b,c x 4
dd x 3
e,e,t x 5
a b 1
cc b 2
e,e,t b 1
This is what I'm getting:
a b,c dd e,e,t cc
x 5 2 4 3
b 1 2 1
EDIT: This might be doing it, but it doesn't output 0s in the blank fields:
awk 'BEGIN {FS = OFS = "\t"}
NR > 1 {data[$2][$1] = $3; blocks[$1]}
END {
    PROCINFO["sorted_in"] = "#ind_str_asc"
    # header
    printf "gene"
    for (block in blocks) {
        printf "%s%s", OFS, block
    }
    print ""
    # data
    for (ts in data) {
        printf "%s", ts
        for (block in blocks) {
            printf "%s%s", OFS, data[ts][block]
        }
        print ""
    }
}' file
modified from https://unix.stackexchange.com/questions/424642/dynamic-transposing-rows-to-columns-using-awk-based-on-row-value

If you want to print 0 when a certain value is absent, you could do something like this:
val = data[ts][block] ? data[ts][block] : 0;
printf "%s%s", OFS, val

Related

Counting max and min per row across columns and outputting associated column names

I'm trying to find both the max and the min (excluding 0s) per row across columns and output the associated column names.
I'm trying this:
BEGIN{OFS="\t"}
NR==1{print $1,$2,"ref","max","ref","min";
for(i=3;i<=6;++i)BASES[i]=$(i);
}
NR>1{l=1;basemax=BASES[3];basemin=BASES[3]; max=$3; min=$3;
for(i=4;i<=6;++i){
if($i>max){basemax=BASES[i];max=$i;}
else if($i==max){basemax=basemax","BASES[i];++l}
}
for(i=4;i<=6;++i){
if($i<min && $i !=0){basemmin=BASES[i];mim=$i}
else if($i==min){basemin=basemin","BASES[i];++l}
}
print $1,$2,basemax,max,basemin,min
}
With an input that looks like this:
chr pos C T A G
NC_044998.1 3732 22 0 7 0
NC_044998.1 3733 22 0 0 0
NC_044998.1 3734 22 3 3 0
NC_044998.1 3735 22 0 0 3
NC_044998.1 3736 0 7 22 3
NC_044998.1 3737 0 0 0 25
NC_044998.1 3738 22 7 0 0
NC_044998.1 3739 7 3 22 25
NC_044998.1 3740 0 22 22 0
NC_044998.1 3741 22 0 0 0
The desired output is
chr pos ref max ref min
NC_044998.1 3732 C 22 A 7
NC_044998.1 3733 C 22 C 22
NC_044998.1 3734 C 22 T,A 3
NC_044998.1 3735 C 22 G 3
NC_044998.1 3736 A 22 G 3
NC_044998.1 3737 G 25 G 25
NC_044998.1 3738 C 22 C 22
NC_044998.1 3739 G 25 C 7
NC_044998.1 3740 T,A 22 T,A 22
NC_044998.1 3741 C 22 C 22
But it outputs this instead
chr pos ref max ref min
NC_044998.1 3732 C 22 C 22
NC_044998.1 3733 C 22 C 22
NC_044998.1 3734 C 22 C 22
NC_044998.1 3735 C 22 C 22
NC_044998.1 3736 A 22 C 0
NC_044998.1 3737 G 25 C,T,A 0
NC_044998.1 3738 C 22 C 22
NC_044998.1 3739 G 25 C 7
NC_044998.1 3740 T 22 C,A,G 0
NC_044998.1 3741 C 22 C 22
With your shown samples, please try the following awk code, written and tested in GNU awk.
awk -v startField="3" -v endField="6" '
BEGIN{ OFS="\t"; print "chr pos ref max ref min"}
FNR==1{
    for(i=startField;i<=endField;i++){
        heading[i]=$i
    }
    next
}
{
    min=max2=maxInd2=minInd=max=maxInd=minAllInd=maxAllInd=maxAllInd2=""
    for(i=startField;i<=endField;i++){
        if($i!=0){
            minInd=(min>$i?i:(min==$i?minInd","i:(minInd!=""?minInd:i)))
            min=(min>$i?$i:(min!=""?min:$i))
        }
        maxInd=(max<$i?i:(max==$i?maxInd","i:(maxInd!=""?maxInd:i)))
        max=(max<$i?$i:(max!=""?max:$i))
    }
    for(i=startField+1;i<=endField;i++){
        maxInd2=(max2<$i?i:(max2==$i?maxInd2","i:(maxInd2!=""?maxInd2:i)))
        max2=(max2<$i?$i:(max2!=""?max2:$i))
    }
    num1=split(maxInd,arr1,",")
    num2=split(minInd,arr2,",")
    num3=split(maxInd2,arr3,",")
    if(num1>1){
        for(k=1;k<=num1;k++){
            maxAllInd = (maxAllInd?maxAllInd ",":"") heading[arr1[k]]
        }
    }
    else{
        maxAllInd = heading[maxInd]
    }
    if(num2>1){
        for(k=1;k<=num2;k++){
            minAllInd = (minAllInd?minAllInd ",":"") heading[arr2[k]]
        }
    }
    else{
        minAllInd = heading[minInd]
    }
    if(num3>1){
        for(k=1;k<=num3;k++){
            maxAllInd2 = (maxAllInd2?maxAllInd2 ",":"") heading[arr3[k]]
        }
    }
    else{
        maxAllInd2 = heading[maxInd2]
    }
    if(startField>1){
        NF=(startField-1)
        if(min !=0 ){
            print $0,maxAllInd,max,minAllInd,min
        }
        if(min == 0 && max2 != 0){
            print $0,maxAllInd,max,maxAllInd2,max2
        }
        if(min == 0 && max2 == 0){
            print $0,maxAllInd,max,maxAllInd,max
        }
    }
    else{
        if(min !=0 ){
            print maxAllInd,max,minAllInd,min
        }
        if(min == 0 && max2 != 0){
            print maxAllInd,max,maxAllInd2,max2
        }
        if(min == 0 && max2 == 0){
            print maxAllInd,max,maxAllInd,max
        }
    }
}
' Input_file
This awk script should work for you:
cat maxmin.awk
NR == 1 {
    for (i=b; i<=NF; ++i)
        hdr[i] = $i
    print $1, $2, "ref", "max", "ref", "min"
    next
}
{
    for (i=b; i<=NF; ++i) {
        max = ($i > max ? $i : max)
        min = ($i && (min == "" || $i < min) ? $i : min)
    }
    for (i=b; i<=NF; ++i) {
        if ($i == min)
            rmin = (rmin ? rmin "," : "") hdr[i]
        if ($i == max)
            rmax = (rmax ? rmax "," : "") hdr[i]
    }
    print $1, $2, rmax, max, rmin, min
    max = min = rmax = rmin = ""
}
And use it as:
awk -v b=3 -f maxmin.awk gg | column -t
chr pos ref max ref min
NC_044998.1 3732 C 22 A 7
NC_044998.1 3733 C 22 C 22
NC_044998.1 3734 C 22 T,A 3
NC_044998.1 3735 C 22 G 3
NC_044998.1 3736 A 22 G 3
NC_044998.1 3737 G 25 G 25
NC_044998.1 3738 C 22 T 7
NC_044998.1 3739 G 25 T 3
NC_044998.1 3740 T,A 22 T,A 22
NC_044998.1 3741 C 22 C 22
column -t has been used for tabular output only.
You have some typos in variable names such as basemmin and mim.
If the count of C is 0, the min value has no chance to be updated.
You can combine the two for loops into one.
The variable l is not used.
Then would you please try the following:
awk -v OFS="\t" '
NR==1 {
print $1, $2, "ref", "max", "ref", "min"
for (i = 3; i <= 6; i++) bases[i] = $i
}
NR>1 {
basemax = bases[3]; basemin = bases[3]; max = $3; min = $3
for (i = 4; i <= 6; i++) {
if ($i > max) {basemax = bases[i]; max = $i}
else if ($i == max) {basemax = basemax "," bases[i]}
if ($i < min && $i != 0 || min == 0) {basemin = bases[i]; min = $i}
else if ($i == min) {basemin = basemin "," bases[i]}
}
print $1, $2, basemax, max, basemin, min
}' input_file
Output:
chr pos ref max ref min
NC_044998.1 3732 C 22 A 7
NC_044998.1 3733 C 22 C 22
NC_044998.1 3734 C 22 T,A 3
NC_044998.1 3735 C 22 G 3
NC_044998.1 3736 A 22 G 3
NC_044998.1 3737 G 25 G 25
NC_044998.1 3738 C 22 T 7
NC_044998.1 3739 G 25 T 3
NC_044998.1 3740 T,A 22 T,A 22
NC_044998.1 3741 C 22 C 22
Please note the output slightly differs from your desired output, which may contain typos.

Print sorted output with awk to avoid pipe sort command

I'm trying to match the lines containing (123) and then manipulate field 2, replacing x and + with spaces, which gives 4 columns. Then I swap column 3 with column 4.
Finally I want to print the result sorted first by column 3 and then by column 4.
I'm able to get the desired output by piping the awk output to sort, like this:
$ echo "
0: 1920x1663+0+0 kpwr(746)
323: 892x550+71+955 kpwr(746)
211: 891x550+1003+410 kpwr(746)
210: 892x451+71+410 kpwr(746)
415: 891x451+1003+1054 kpwr(746)
1: 894x532+70+330 kpwr(123)
324: 894x532+1001+975 kpwr(123)
2: 894x631+1001+330 kpwr(123)
212: 894x631+70+876 kpwr(123)
61: 892x1+71+375 kpwr(0)
252: 892x1+71+921 kpwr(0)" |
awk '/\(123\)/{b = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2); print b}' |
sort -k3 -k4 -n
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
How can I get the same output using only awk without the need to pipe sort? Thanks for any help.
Here is how you can get it from awk (gnu) itself:
awk '/\(123\)/{
    $2 = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2)
    split($2, a)   # split by space and store into array a
    # store array by index 3 and 4
    rec[a[3]][a[4]] = (rec[a[3]][a[4]] == "" ? "" : rec[a[3]][a[4]] ORS) $2
}
END {
    PROCINFO["sorted_in"]="#ind_num_asc"   # sort by numeric key ascending
    for (i in rec)                         # print stored array rec
        for (j in rec[i])
            print rec[i][j]
}' file
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
Can you handle GNU awk?:
$ gawk '
BEGIN {
    PROCINFO["sorted_in"]="#val_num_asc"              # for order strategy
}
/\(123\)$/ {                                          # pick records
    split($2,t,/[+x]/)                                # split 2nd field
    if((t[4] in a) && (t[3] in a[t[4]])) {            # if index collision
        n=split(a[t[4]][t[3]],u,ORS)                  # split stacked element
        u[n+1]=t[1] OFS t[2] OFS t[4] OFS t[3]        # add new data
        delete a[t[4]][t[3]]                          # del before rebuilding
        for(i in u)                                   # sort on whole record
            a[t[4]][t[3]]=a[t[4]][t[3]] ORS u[i]      # restack to element
    } else
        a[t[4]][t[3]]=t[1] OFS t[2] OFS t[4] OFS t[3] # no collision, just add
}
END {
    PROCINFO["sorted_in"]="#ind_num_asc"              # strategy on output
    for(i in a)
        for(j in a[i])
            print a[i][j]
}' file
Output:
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
With colliding data like:
1: 894x532+70+330 kpwr(123) # this
1: 123x456+70+330 kpwr(123) # and this, notice order
324: 894x532+1001+975 kpwr(123)
2: 894x631+1001+330 kpwr(123)
212: 894x631+70+876 kpwr(123)
output would be:
123 456 330 70 # ordered by the whole record when collision
894 532 330 70
894 631 330 1001
894 631 876 70
894 532 975 1001
I was almost done writing when I saw my solution was the same as anubhava's, so I'm adding a small tweak to his solution :) This one will take care of multiple lines with the same values.
awk '
BEGIN{
    PROCINFO["sorted_in"]="#ind_num_asc"
}
/\(123\)/{
    $2 = gensub(/(.+)x(.+)\+(.+)\+(.+)/, "\\1 \\2 \\4 \\3", "g", $2)
    split($2, a," ")
    arr[a[3]][a[4]] = (arr[a[3]][a[4]]!=""?arr[a[3]][a[4]] ORS:"")$2
}
END {
    for (i in arr){
        for (j in arr[i]){ print arr[i][j] }
    }
}' Input_file

How would I print the elements matching a regex pattern in order?

I have a text file that has text in this format:
ptr[0] = Alloc(1) returned 1000 (searched 1 elements)
Free List [ Size 1 ]: [ addr:1001 sz:99 ]
Free(ptr[0]) returned 0
Free List [ Size 2 ]: [ addr:1000 sz:1 ] [ addr:1001 sz:99 ]
ptr[1] = Alloc(7) returned 1001 (searched 2 elements)
Free List [ Size 2 ]: [ addr:1000 sz:1 ] [ addr:1008 sz:92 ]
Free(ptr[1]) returned 0
Free List [ Size 3 ]: [ addr:1000 sz:1 ] [ addr:1001 sz:7 ] [ addr:1008 sz:92 ]
ptr[2] = Alloc(5) returned 1001 (searched 3 elements)
Free List [ Size 3 ]: [ addr:1000 sz:1 ] [ addr:1006 sz:2 ] [ addr:1008 sz:92 ]
Free(ptr[2]) returned 0
Free List [ Size 5 ]: [ addr:1000 sz:1 ] [ addr:1001 sz:5 ] [ addr:1006 sz:2 ] [ addr:1008 sz:8 ] [ addr:1016 sz:84 ]
I am trying to print out only the values that follow sz: in the text file, in the order they appear, as one list per line. Like so:
$ awk -f list.awk file.txt | head
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84
I've tried the following, but it only prints the lines that contain sz:. How could I break it down further to get the output I want?
/Free List/{
s = $0
split(s, a, /sz:/)
print s
}
The following awk solutions may help.
Solution 1: when you want only the numeric value associated with the string sz:, the following may help.
awk '{while(match($0,/sz:[0-9]+/)){val=(val?val FS:"") substr($0,RSTART+3,RLENGTH-3);$0=substr($0,RSTART+RLENGTH)}}val!=""{print val;val=""}' Input_file
Here is a non-one-liner form of the same solution:
awk '
{
    while(match($0,/sz:[0-9]+/)){
        val=(val?val FS:"") substr($0,RSTART+3,RLENGTH-3);
        $0=substr($0,RSTART+RLENGTH)
    }
}
val!=""{
    print val;
    val=""
}
' Input_file
Solution 2: in case you need to keep the string sz: together with the values, the following may help.
awk '{while(match($0,/sz:[0-9]+/)){val=(val?val FS:"") substr($0,RSTART,RLENGTH);$0=substr($0,RSTART+RLENGTH)}}val!=""{print val;val=""}' Input_file
Here is a non-one-liner form of this solution:
awk '
{
    while(match($0,/sz:[0-9]+/)){
        val=(val?val FS:"") substr($0,RSTART,RLENGTH);
        $0=substr($0,RSTART+RLENGTH)
    }
}
val!=""{
    print val;
    val=""
}
' Input_file
NOTE: In case you want to perform this operation only on those lines which have the string Free List, simply add the pattern /Free List/ in front of the main { ... } block in the solutions above.
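For example, restricting the first one-liner that way might look like this (a sketch of the change described in the note above, nothing else altered):
awk '/Free List/{while(match($0,/sz:[0-9]+/)){val=(val?val FS:"") substr($0,RSTART+3,RLENGTH-3);$0=substr($0,RSTART+RLENGTH)}}val!=""{print val;val=""}' Input_file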
if perl is okay:
$ perl -lne 'print join " ", /sz:(\d+)/g if /Free List/' ip.txt
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84
if /Free List/: only if the line contains Free List
/sz:(\d+)/g: match all digits that follow sz:
print join " ": print those matches separated by spaces
see https://perldoc.perl.org/perlrun.html#Command-Switches for details on -lne options
Using bash and grep:
while IFS= read -r line; do
    x=$(grep -oP '(sz:\K\d+)' <<< "$line")
    [[ $x ]] && echo $x
done < file
Output :
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84
With GNU awk for FPAT:
$ awk -v FPAT='sz:[0-9]+' '{for (i=1;i<=NF;i++) printf "%s%s", substr($i,4), (i<NF?OFS:ORS)}' file
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84
With any awk:
$ awk '{out=""; while (match($0,/sz:[0-9]+/)) { out = (out=="" ? "" : out OFS) substr($0,RSTART+3,RLENGTH-3); $0=substr($0,RSTART+RLENGTH) } $0=out } NF' file
99
1 99
1 92
1 7 92
1 2 92
1 5 2 8 84

AWK - Select lines to print according to score

I have a tab-separated file containing a series of lemmas with associated scores.
The file contains 5 columns; the first column is the lemma and the third is the one that contains the score. What I need to do is print the line as it is when the lemma is not repeated, and print the line with the highest score when the lemma is repeated.
IN
Lemma --- Score --- ---
cserép 06a 55 6 bueno
darázs 05 38 1 bueno
dél 06a 34 1 bueno
dér 06a 29 1 bueno
díj 05 14 89 malo
díj 06a 2 101 malo
díj 06b 2 101 malo
díj 07 90 13 bueno
díj 08a 2 101 malo
díj 08b 2 101 malo
egér 06a 66 5 bueno
fonal 05 12 1 bueno
fonal 07 52 4 bueno
Desired output
Lemma --- Score --- ---
cserép 06a 55 6 bueno
darázs 05 38 1 bueno
dél 06a 34 1 bueno
dér 06a 29 1 bueno
díj 07 90 13 bueno
egér 06a 66 5 bueno
fonal 07 52 4 malo
This is what I have done, but it only works when the lemma is repeated once.
BEGIN {
    OFS=FS="\t";
    flag="";
}
{
    id=$1;
    if (id != flag)
    {
        if (line != "")
        {
            sub("^;","",line);
            z=split(line,A,";");
            if ((A[3] > A[8]) && (A[8] != ""))
            {
                print A[1]"\t"A[2]"\t"A[3]"\t"A[4]"\t"A[5];
            }
            else if ((A[8] > A[3]) && (A[8] != ""))
            {
                print A[6]"\t"A[7]"\t"A[8]"\t"A[9]"\t"A[10]
            }
            else
            {
                print A[1]"\t"A[2]"\t"A[3]"\t"A[4]"\t"A[5];
            }
        }
        delete line;
        flag=id;
    }
    line[$1]=line[$1]";"$2";"$3";"$4";"$5;
}
END {
    line=line ";"$1";"$2";"$3";"$4";"$5
    sub("^;","",line);
    z=split(line,A,";");
    if ((A[3] > A[8]) && (A[8] != ""))
    {
        print A[1]"\t"A[2]"\t"A[3]"\t"A[4]"\t"A[5];
    }
    else if ((A[8] > A[3]) && (A[8] != ""))
    {
        print A[6]"\t"A[7]"\t"A[8]"\t"A[9]"\t"A[10]
    }
    else
    {
        print A[1]"\t"A[2]"\t"A[3]"\t"A[4]"\t"A[5]
    }
}
This one doesn't require the file to be sorted by lemma, but it keeps all the lines to be printed in memory (one for each lemma), so it may not be appropriate for a file with millions of different lemmas.
It also does not respect the order of the original file.
Finally, it assumes that all scores are non-negative!
$ cat lemma.awk
BEGIN { FS = OFS = "\t" }
NR == 1 { print }
NR > 1 {
    if ($3 > score[$1]) {
        score[$1] = $3
        line[$1] = $0
    }
}
END { for (lemma in line) print line[lemma] }
$ awk -f lemma.awk lemma.txt
Lemma --- Score --- ---
cserép 06a 55 6 bueno
díj 07 90 13 bueno
fonal 07 52 4 bueno
darázs 05 38 1 bueno
egér 06a 66 5 bueno
dél 06a 34 1 bueno
dér 06a 29 1 bueno
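If keeping the original order of the lemmas matters, one possible variant along the same lines (a sketch with the same assumptions about the scores; the file name lemma_ordered.awk is just an example) remembers the order in which each lemma is first seen:
$ cat lemma_ordered.awk
BEGIN { FS = OFS = "\t" }
NR == 1 { print; next }                              # keep the header line
!seen[$1]++ { order[++n] = $1 }                      # remember first-seen order of each lemma
$3 > score[$1] { score[$1] = $3; line[$1] = $0 }     # keep the highest-scoring line per lemma
END { for (i = 1; i <= n; i++) print line[order[i]] }
It is used the same way: awk -f lemma_ordered.awk lemma.txt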
Tested with gnu awk:
prevLemma != $1 {
    if( prevLemma ) {
        print line;
    }
    prevLemma = $1;
    prevScore = $3;
    line = $0;
}
prevLemma == $1 {
    if( prevScore < $3 ) {
        prevScore = $3;
        line = $0;
    }
}
END { print line;}
assumption is: the file is sorted by lemma
when the lemma changes (or at the very beginning when the var is empty) the lemma, score and line are saved
when the lemma changes (or in the END), the line for the previous lemma is printed
when the current line belongs to the same lemma and has a higher score the values are saved again
$ cat tst.awk
$1 != prev { printf "%s", maxLine; maxLine=""; max=$3; prev=$1 }
$3 >= max { max=$3; maxLine=$0 ORS }
END { printf "%s", maxLine }
$ awk -f tst.awk file
Lemma --- Score --- ---
cserép 06a 55 6 bueno
darázs 05 38 1 bueno
dél 06a 34 1 bueno
dér 06a 29 1 bueno
díj 07 90 13 bueno
egér 06a 66 5 bueno
fonal 07 52 4 bueno
Use a script:
if ($1 != $5) print $0
else
{
    score[NR] = $3
    print $0
}
Actually, this might be better done with perl.

Divide a column into periods and print min and max for each in awk

I have a data file which contains two columns. One of them has a periodic variation whose max and min are different in each period:
a 3
b 4
c 5
d 4
e 3
f 2
g 1
h 2
i 3
j 4
k 5
l 6
m 5
n 4
o 3
p 2
q 1
r 0
s 1
t 2
u 3
We can see that in the 1st period (from a to i): max = 5, min = 1. In the 2nd period (from i to u): max = 6, min = 0.
Using awk, I can only print the max and min of the whole second column, but I cannot print the min and max for each period. That means I wish to obtain results like this:
period min max
1 1 5
2 0 6
Here is what I did:
{
    nb_lignes = 21
    period = 9
    nb_periodes = int(nb_lignes/period)
}
{
    for (j = 0; j <= nb_periodes; j++)
    {
        if (NR == (1 + period*j)) {{max=$2 ; min=$2}}
        for (i = (period*j); i <= (period*(j+1)); i++)
        {
            if (NR == i)
            {
                if ($2 >= max) {max = $2}
                if ($2 <= min) {min = $2}
                {print "Min: "min,"Max: "max,"Ligne: " NR}
            }
        }
    }
}
#END { print "Min: "min,"Max: "max }
However, the result is far from what I am looking for:
Min: 3 Max: 3 Ligne: 1
Min: 3 Max: 4 Ligne: 2
Min: 3 Max: 5 Ligne: 3
Min: 3 Max: 5 Ligne: 4
Min: 3 Max: 5 Ligne: 5
Min: 2 Max: 5 Ligne: 6
Min: 1 Max: 5 Ligne: 7
Min: 1 Max: 5 Ligne: 8
Min: 1 Max: 5 Ligne: 9
Min: 1 Max: 5 Ligne: 9
Min: 4 Max: 4 Ligne: 10
Min: 4 Max: 5 Ligne: 11
Min: 4 Max: 6 Ligne: 12
Min: 4 Max: 6 Ligne: 13
Min: 4 Max: 6 Ligne: 14
Min: 3 Max: 6 Ligne: 15
Min: 2 Max: 6 Ligne: 16
Min: 1 Max: 6 Ligne: 17
Min: 0 Max: 6 Ligne: 18
Min: 0 Max: 6 Ligne: 18
Min: 1 Max: 1 Ligne: 19
Min: 1 Max: 2 Ligne: 20
Min: 1 Max: 3 Ligne: 21
Thank you in advance for your help.
Try something like:
$ awk '
BEGIN{print "period", "min", "max"}
!f{min=$2; max=$2; ++f; next}
{max = ($2>max)?$2:max; min = ($2<min)?$2:min; f++}
f==9{print ++a, min, max; f=0}' file
period min max
1 1 5
2 0 6
When the flag f is not set, the second column is assigned to the max and min variables and the flag starts incrementing.
For each line, check the second column. If it is bigger than our max variable, assign that column to max. Likewise, if it is smaller than our min variable, assign it to min. Keep incrementing the flag.
Once the flag reaches 9, print the period number and the min and max variables. Reset the flag to 0 and start afresh from the next line.
I've started, so I'll finish. I chose to create an array which contains the minimum and maximum for each period:
awk -v period=9 '
BEGIN { print "period", "min", "max" }
NR % period == 1 { ++i }
!min[i] || $2 < min[i] { min[i] = $2 }
$2 > max[i] { max[i] = $2 }
END { for (i in min) print i, min[i], max[i] }' input
The index i increases every period lines (in this case 9). If no value has been set yet, or a new minimum/maximum has been found, the array is updated.
Edit: if max[i] has not yet been set then $2 > max[i] is true, so there is no need to check !max[i].
awk 'BEGIN{print "Period","min","max"}
NR==1||(NR%10==0){mi=ma=$2}
{$2<mi?mi=$2:0;$2>ma?ma=$2:0}
NR%9==0{print ++i,mi,ma}' your_file