How to print the N-column (7-column) header row that precedes a matched number in a long text file - awk

I have a text file with a special format.
After the first "N" rows, the file has a 7-column row followed by "X" rows (X is the value in column 6 of that 7-column row). Then comes another 7-column row followed by "Y" sub-rows (Y is the value in column 6 of that row), and this pattern repeats up to some fixed number of blocks, say 40.
Here is an example (I am skipping the first few rows):
2.857142857143E-01 2.857142857143E-01-2.857142857143E-01 1 1533 9 1.0
1 -3.52823873905418
2 -3.52823873905417
3 -1.77620635653680
4 -1.77620635653680
5 -1.77620570068355
6 -1.77620570068354
7 -1.77620570066112
8 -1.77620570066112
9 -1.60388273192418
1.428571428571E-01 1.428571428571E-01-1.428571428571E-01 2 1506 14 8.0
1 -3.52823678441811
2 -3.52823678441810
3 -1.77620282216865
4 -1.77620282216865
5 -1.77619365786042
6 -1.77619365786042
7 -1.77619324280126
8 -1.77619324280125
9 -1.60387130881086
10 -1.60387130881086
11 -1.60387074066972
12 -1.60387074066972
13 -1.51340357895078
14 -1.51340357895078
1.000000000000E+00 4.285714285714E-01-1.428571428571E-01 20 1524 51 24.0
1 -3.52823580096110
2 -3.52823580096109
3 -1.77624472106293
4 -1.77624472106293
5 -1.77623455229910
6 -1.77623455229909
7 -1.77620473017160
8 -1.77620473017159
9 -1.60387169115834
10 -1.60387169115834
11 -1.60386634866654
12 -1.60386634866654
13 -1.51340851656332
14 -1.51340851656332
15 -1.51340086887553
16 -1.51340086887553
17 -1.51321967923767
18 -1.51321967923766
19 -1.40212716813452
20 -1.40212716813451
21 -1.40187887062753
22 -1.40187887062753
23 -0.749391485667459
24 -0.749391485667455
25 -0.740712218931955
26 -0.740712218931954
27 -0.714030906779278
28 -0.714030906779278
29 -0.689087278411268
30 -0.689087278411265
31 -0.687054399753234
32 -0.687054399753233
33 -0.677686868127079
34 -0.677686868127075
35 -0.405343895324740
36 -0.405343895324739
37 -0.404786479693490
38 -0.404786479693488
39 -0.269454266134757
40 -0.269454266134755
41 -0.267490250650300
42 -0.267490250650296
43 -0.262198373307171
44 -0.262198373307170
45 -0.260912148881762
46 -0.260912148881761
47 -9.015623907768122E-002
48 -9.015623907767983E-002
49 0.150591609452852
50 0.150591609452856
51 0.201194203960446
I want to find a particular number in my text file. To do so, I use:
awk -v c=2 -v t=$GREP 'NR==1{d=$c-t;d=d<0?-d:d;v=$c;next}{m=$c-t;m=m<0?-m:m}m<d{d=m;v=$c}END{print v}' case.dat
Here $GREP is 0.2011942, and the command prints the last row ($GREP changes from file to file):
51 0.201194203960446
I want to print the header row along with this number, i.e., my script should print:
51 0.201194203960446
1.000000000000E+00 4.285714285714E-01-1.428571428571E-01 20 1524 51 24.0
How can I print the header row of the matched number?
I have an idea, but I could not implement it as a script:
simply find the number using my script, then print the first row before it that has 7 columns.

This may be what you're looking for:
awk -v t="$GREP" '
BEGIN { sub("\\.", "\\.", t) }
NF > 2 { header=$0; next }
NF == 2 && $2 ~ t { printf("%s %s\n%s\n", $1, $2, header) }
' file
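You can replace NF > 2 with NF == 7 if you want strictly seven-column headers to be printed, but note that awk sees only 6 fields in those headers in your sample data, because the second and third numbers are run together with no space between them (e.g. 4.285714285714E-01-1.428571428571E-01).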
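If you do want to rely on NF == 7, one option is to put the missing space back first. This is a sketch, not part of the answer above; it assumes that a digit immediately followed by "-" and another digit only occurs where two numbers are run together (exponents look like E-01, so they are not affected). The function name split_glued is hypothetical.

```shell
# Hypothetical helper: split_glued [FILE...]
# Re-inserts the missing space where a number runs straight into a negative
# number ("...E-01-2.85..."), then prints every line.
# Assumption: "digit-digit" only appears at such glued boundaries.
split_glued() {
  awk '{
    # RSTART points at the digit before "-": keep it, add a space, keep the rest
    while (match($0, /[0-9]-[0-9]/))
      $0 = substr($0, 1, RSTART) " " substr($0, RSTART + 1)
    print
  }' "$@"
}
```

For example, split_glued case.dat | awk 'NF == 7' should then list only the header rows.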
Update, after a comment asking to match the number only up to 13 decimal places:
awk -v t="$GREP" '
BEGIN { if (match(t, "\\.")) {
t = substr(t, 1, RSTART + 13)
sub("\\.", "\\.", t)
}
}
NF > 2 { header=$0; next }
NF == 2 && $2 ~ t { printf("%s %s\n%s\n", $1, $2, header) }
' file
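The answers above match the number as a regular expression. Your original command instead looks for the value closest to $GREP, so here is a sketch that combines both ideas: it keeps the nearest-value logic and remembers the latest header row (any row with more than 2 fields, because of the run-together numbers noted above). It assumes data rows always have exactly 2 fields; closest_with_header is a hypothetical name.

```shell
# Hypothetical helper: closest_with_header TARGET [FILE...]
# Prints the 2-field data row whose column 2 is numerically closest to
# TARGET, then the latest header row (more than 2 fields) seen before it.
closest_with_header() {
  target=$1; shift
  awk -v t="$target" '
    NF > 2 { header = $0; next }          # remember the most recent header row
    NF == 2 {
      d = $2 - t; if (d < 0) d = -d       # absolute distance to the target
      if (!found || d < best) {           # first data row, or a closer one
        found = 1; best = d; row = $0; hdr = header
      }
    }
    END { if (found) print row ORS hdr }
  ' "$@"
}
```

Usage would be closest_with_header "$GREP" case.dat; because it compares numerically, it does not depend on how many decimal places $GREP has.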

Related

Select current and previous line if certain value is found

To solve my problem, I subtract the previous line's column 3 from the current line's column 3 to create a new column 5. Then I want to print the previous and current lines whenever column 5 equals 25.
Input file:
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
I tried:
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' file
Output:
1 1 35 1 35
2 5 50 1 15
2 6 75 1 25
4 7 85 1 10
5 8 100 1 15
6 9 125 1 25
4 1 200 1 75
Desired output:
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Thanks in advance
You're almost there: in addition to the previous $3, keep the previous $0, and print only when the condition is satisfied.
$ awk '{$5=$3-p3} $5==25{print p0; print} {p0=$0;p3=$3}' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
This can be golfed further to:
$ awk '25==($5=$3-p3){print p0; print} {p0=$0;p3=$3}' file
This checks whether the newly computed field $5 equals 25; if so, it prints the previous line and the current line. It then saves the current line and its $3 for the computation on the next line.
You are close to the answer; just pipe the output into another awk and print there:
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
With input:
$ cat oxxo.txt
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
$ awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
$
Could you please try the following:
awk '$3-prev==25{print line ORS $0, $3-prev} {$(NF+1)=$3-prev;prev=$3;line=$0}' Input_file | column -t
Here's one:
$ awk '{$5=$3-q;t=p;p=$0;q=$3;$0=t ORS $0}$10==25' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Explained:
$ awk '{
$5=$3-q # subtract
t=p # previous to temp
p=$0 # store previous for next round
q=$3 # store subtract value for next round
$0=t ORS $0 # prepare record for output
}
$10==25 # output if equals
' file
There is no check for duplicates, so the same record may be printed twice (when two consecutive lines both match). The easiest fix is to pipe the output through uniq.
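Piping through uniq works here because the duplicated lines are adjacent, but it would also merge two genuinely identical consecutive lines. As an alternative (a sketch, not taken from the answers above), you can remember the number of the last line printed and skip the previous line when it has already been output. prev_and_match is a hypothetical name; the first argument is the target value.

```shell
# Hypothetical helper: prev_and_match TARGET [FILE...]
# Column 5 = $3 minus the previous line's $3. Prints the previous and the
# current line when column 5 equals TARGET, never printing a line twice.
prev_and_match() {
  want=$1; shift
  awk -v want="$want" '
    { $5 = $3 - p3 }
    $5 == want + 0 {
      # print the previous line unless there is none or it was just printed
      if (NR > 1 && NR - 1 > last) print p0
      print
      last = NR
    }
    { p0 = $0; p3 = $3 }
  ' "$@"
}
```

With the sample input, prev_and_match 25 file should give the same four lines as the desired output.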

Swapping the last two columns, but getting an unwanted line break

I have a tab-separated (\t) file:
header text with many lines
V F A B
10 30 26 42
14 33 25 45
16 32 23 43
18 37 22 48
I want to swap the 3rd and 4th columns. I'm using:
awk '
BEGIN {
RS = "\n";
OFS="\t";
record=0;
};
record {
a = $4;
$4 = $3;
$3 = a;
};
$1=="V" {
record=1
};
{
print $0
};
'
Instead of just swapping the columns, the new column 3 also carries the line break of the original 4th column:
header text with many lines
V F A B
10 30 42
26
14 33 45
25
16 32 43
23
18 37 48
22
How can I prevent this and get:
header text with many lines
V F A B
10 30 42 26
14 33 45 25
16 32 43 23
18 37 48 22
Could you please try the following. It uses the usual method of swapping two values through a temporary variable: store the 3rd field in a variable, copy the 4th field into the 3rd, and finally put the stored value into the 4th field.
awk 'FNR==1{print;next} {val=$3;$3=$4;$4=val} 1' OFS="\t" Input_file
Or, this messy sed:
sed -E 's/([[:digit:]]+)([[:blank:]]+)([[:digit:]]+)([[:space:]]*)$/\3\2\1\4/' file
# ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
# 3rd column tab 4th column optional whitespace
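Neither answer says where the line break comes from. One likely cause (an assumption, not confirmed by the question) is Windows CRLF line endings: the 4th field then ends in an invisible \r, which travels with the value when it is moved to column 3. The sketch below strips a trailing \r and swaps only the rows after the V F A B line, so the header lines stay as in the desired output (FNR==1 in the awk answer above protects only the first line). swap_3_4 is a hypothetical name.

```shell
# Hypothetical helper: swap_3_4 [FILE...]
# Swaps tab-separated fields 3 and 4 on the rows after the "V ..." line,
# leaving the header text and the V line itself untouched.
# Assumption: the stray break is a trailing \r; drop the sub() if not needed.
swap_3_4() {
  awk 'BEGIN { FS = OFS = "\t" }
    { sub(/\r$/, "") }                       # remove a trailing carriage return
    data { tmp = $3; $3 = $4; $4 = tmp }     # swap only after the V line
    $1 == "V" { data = 1 }
    { print }' "$@"
}
```

The V line is tested after the swap rule, so the header row itself is printed unchanged.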

Calculating the sum and average for a selected subset of the data only

I have a dataset as below:
col-1 col-2 col-3 col-4 col-5 col-6 col-7 col-8
0 17 215 55.7059 947 BMR_42 O22-BMR_1 O23-H23
1 1 1 1.0000 1 BMR_42 O23-BMR_1 O23-H23
2 31 3 1.0968 34 BMR_31 O22-BMR_1 O26-H26
3 11 2 1.0909 12 BMR_31 O13-BMR_1 O26-H26
4 20 5 1.8500 37 BMR_49 O22-BMR_1 O26-H26
5 24 4 1.7917 43 BMR_49 O23-BMR_1 O26-H26
6 41 2 1.0488 43 BMR_49 O12-BMR_1 O12-H12
7 28 2 1.0357 29 BMR_49 O22-BMR_1 O13-H13
8 1 1000 1000.0000 1000 BMR_49 O13-BMR_1 O13-H13
9 1 1 1.0000 1 BMR_22 O12-BMR_2 O22-H22
10 50 62 18.9400 947 BMR_59 O13-BMR_2 O22-H22
11 1 1 1.0000 1 BMR_59 O25-BMR_2 O23-H23
12 34 5 1.1471 39 BMR_59 O13-BMR_2 O23-H23
13 7 6 2.1429 15 BMR_59 O26-BMR_2 O24-H24
14 6 8 3.6667 22 BMR_59 O25-BMR_2 O24-H24
15 28 2 1.1071 31 BMR_10 O26-BMR_2 O26-H26
16 52 121 15.1346 787 BMR_10 O25-BMR_2 O26-H26
17 65 9 1.9231 125 BMR_10 O13-BMR_2 O26-H26
18 4 4 2.2500 9 BMR_59 O26-BMR_2 O26-H26
19 9 1 1.0000 9 BMR_22 O15-BMR_2 O13-H13
20 1 1 1.0000 1 BMR_10 O11-BMR_2 O16-H16
21 7 2 1.1429 8 BMR_53 O13-BMR_2 O16-H16
22 2 3 2.5000 5 BMR_33 O13-BMR_3 O22-H22
23 97 54 6.8247 662 BMR_61 O26-BMR_3 O22-H22
24 1 1 1.0000 1 BMR_29 O26-BMR_3 O23-H23
25 31 36 3.3226 103 BMR_29 O16-BMR_3 O23-H23
(The real file contains over 2000 lines).
I want to select data by certain criteria and compute their sum and average. For example, I want to select lines containing O22 in both columns $7 and $8 and calculate the sum and average of the values in column $4.
I tried a script as below:
awk '$7 ~ /O22/ && $8 ~ /O22/ {sum += $4} END {print sum, (sum/NR) }' hhsolute.lifetime2.dat
This code selects the lines correctly, but the average (sum/NR) is wrong.
I would appreciate some help with this. How can I get the sum and average only for the data lines I selected?
Appreciate any help in advance.
NR counts every line read, not only the selected ones, so count the matching lines separately:
awk -v tgt="O22" '
$7 ~ tgt && $8 ~ tgt { sum+=$4; cnt++ }
END { print sum+0, (cnt ? sum/cnt : 0) }
' file
Try this:
awk '$7~/O22/ && $8~/O22/{++n;sum+=$4}END{if(n) print "Sum = " (sum), "Average= "(sum/n)}' File
If the 7th and 8th fields both contain the pattern O22, add the 4th field to sum and increment n. In the END block, print the sum and the average (only if at least one line matched).
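The key fix in both answers is dividing by the number of selected rows instead of NR. A further refinement, sketched here as an assumption about what you want: ~ /O22/ is a substring match, so it would also accept labels such as O220-... or XO22-... if they ever appear. Using index() to require that the field starts with the label followed by "-" avoids that. sum_avg is a hypothetical name.

```shell
# Hypothetical helper: sum_avg LABEL [FILE...]
# Sum and average of column 4 over rows where $7 and $8 both begin with
# "LABEL-". The average divides by the number of matches, not by NR.
sum_avg() {
  label=$1; shift
  awk -v tgt="$label" '
    index($7, tgt "-") == 1 && index($8, tgt "-") == 1 { sum += $4; n++ }
    END {
      if (n) print "Sum = " sum, "Average = " (sum / n)
      else print "no matching rows"
    }
  ' "$@"
}
```

For example, sum_avg O22 hhsolute.lifetime2.dat. In the 26 sample rows shown, no row has O22 in both $7 and $8, so all three commands report no match or a zero sum for that excerpt.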

AWK: print columns of a matrix using the first column as reference

I want to read the first column of a matrix and then print columns of the matrix using that first column as a reference. An example:
mat.txt
2 10 6 12 3
4 11 1 22 6
5 15 3 18 9
Using the first column as the reference (its values 2, 4 and 5 are column numbers), I would like to get columns 2, 4 and 5 of every row, and also put the value of the first column at the beginning.
2 10 12 3
4 11 22 6
5 15 18 9
I tried this, but it doesn't work well:
awk 'FNR==NR{c++;cols[c]=$1;end}
{for(i=1;i<=c;i++) printf("%s%s",$(cols[i]+1),i<c ? OFS : "\n")}' mat.txt mat.txt
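This may do (a numeric loop keeps the columns in order, since for (i in a) does not guarantee any iteration order):
awk 'FNR==NR {a[NR]=$1; n=NR; next} {printf "%s", a[FNR]; for (i=1; i<=n; i++) printf " %s", $(a[i]); print ""}' mat.txt{,}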
2 10 12 3
4 11 22 6
5 15 18 9
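The {,} is bash brace expansion: it expands to mat.txt mat.txt, so the file is read twice (first to collect the column numbers, then to print the selected columns).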
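A single-pass alternative, sketched under the assumption that the whole matrix fits in memory: store every row, then print the selected columns in the END block, where NR still holds the total number of rows. pick_cols is a hypothetical name.

```shell
# Hypothetical helper: pick_cols [FILE]
# Reads the matrix once; column 1 of row i names the i-th column to print.
pick_cols() {
  awk '
    { row[NR] = $0; col[NR] = $1 }           # keep each row and its column number
    END {
      for (r = 1; r <= NR; r++) {
        split(row[r], f, " ")                # re-split the stored row
        out = f[1]                           # first column goes first
        for (i = 1; i <= NR; i++) out = out " " f[col[i] + 0]
        print out
      }
    }' "$@"
}
```

This trades memory for reading the file only once; the two-pass mat.txt mat.txt approach keeps memory use low for large files.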

AWK: reading several files and operating between their columns

I have three files:
coord.xvg
veloc.xvg
force.xvg
Each of these files has many lines of numbers (say 10000).
I would like to write a script that opens the three files, reads their columns, and performs arithmetic operations between them on every line.
For example, if every line of each file has 4 words:
coord.xvg >> Time x y z
veloc.xvg >> Time vx vy vz
force.xvg >> Time fx fy fz
where c, v, f stand for coord.xvg, veloc.xvg, force.xvg, then for the operation 2*v*v+c*f*c the output should be:
column1 Column2 Column3 Column4
Time 2*vx*vx+cx*fx*cx 2*vy*vy+cy*fy*cy 2*vz*vz+cz*fz*cz
I found the following on the internet:
awk '{
{ getline < "coord.xvg" ; if (FNR==90307) for(i=1;i<=2;i+=1) c=$i}
{ getline < "veloc.xvg" ; if (FNR==90307) for(i=1;i<=2;i+=1) v=$i}
{ getline < "force.xvg" ; if (FNR==90307) for(i=1;i<=2;i+=1) f=$i}
}
END {print c+v+f}' coord.xvg
Here 90307 is the line after which I want to start reading my files.
But it didn't help me much, as it returns only the last value of each variable.
Any thoughts?
Something to get you started, if I understood you correctly:
$ cat *.xvg
Time 1 2 3
Time 4 5 6
Time 7 8 9
Time 10 11 12
Time 13 14 15
Time 16 17 18
Time 19 20 21
Time 22 23 24
Time 25 26 27
The following awk script (saved as s.awk):
{ if (FNR>=1) {
{ getline < "coord.xvg" ; c1=$2;c2=$3;c3=$4}
{ getline < "veloc.xvg" ; v1=$2;v2=$3;v3=$4}
{ getline < "force.xvg" ; f1=$2;f2=$3;f3=$4}
print c1,c2,c3,v1,v2,v3,f1,f2,f3
print $1, c1+v1+f1, c2+v2+f2, c3+v3+f3
}}
reads one line from each of the files and puts the fields into variables, as can be seen here:
$ awk -f s.awk coord.xvg
1 2 3 19 20 21 10 11 12
Time 30 33 36
4 5 6 22 23 24 13 14 15
Time 39 42 45
7 8 9 25 26 27 16 17 18
Time 48 51 54
The if (FNR>=1) part controls which lines are displayed. Counting starts at 1; change this to match your needs. The actual calculations I leave to you :-)
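Building on that, here is a sketch that performs the operation from the question, 2*v*v+c*f*c per component. Assumptions: all three files have the same number of lines with the layout Time x y z, and the optional fourth argument is the number of leading lines to skip (90307 in your case). One design point: the other two files are read with getline on every line, before the skip test, so they stay in step with the main input; if the getline calls sat inside a condition such as if (FNR>90307), the other files would start from their own line 1. combine is a hypothetical name.

```shell
# Hypothetical helper: combine COORD VELOC FORCE [SKIP]
# Prints "Time ax ay az" where a = 2*v*v + c*f*c for each component.
combine() {
  awk -v vf="$2" -v ff="$3" -v skip="${4:-0}" '
    {
      # advance veloc and force in lockstep with coord (the main input)
      if ((getline vline < vf) <= 0 || (getline fline < ff) <= 0) exit
      if (FNR <= skip + 0) next           # ignore the first SKIP lines
      split(vline, v, " "); split(fline, f, " ")
      out = $1                            # the Time column
      for (i = 2; i <= 4; i++)
        out = out " " (2 * v[i] * v[i] + $i * f[i] * $i)
      print out
    }
  ' "$1"
}
```

Usage would be combine coord.xvg veloc.xvg force.xvg 90307 > result.txt; processing stops as soon as either of the other two files runs out of lines.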