Awk extract data block between text strings - awk

I'm fighting with awk again to pull data out of a log file. The area in question of my log file looks like this; however, there are a few thousand lines above and below this block:
4C*DJ - (B-C)*DJK + 2*(2A+B+C)*D1 - 4*(4A+B-3C)*D2 = 0
Value = 0.5293955920D-22
Alpha Matrix in cm-1
Axis Mode Inertia Coriol. Anharm. Total
x 1 -0.37699D-03 -0.36413D-02 0.10830D-01 0.68121D-02
x 2 -0.83656D-03 -0.53163D-02 0.14483D-01 0.83306D-02
x 3 -0.15253D-02 -0.10512D-01 0.20064D-01 0.80264D-02
x 4 -0.17103D-03 -0.73492D-03 0.14953D-01 0.14047D-01
x 5 -0.96312D-03 -0.11748D-01 0.15825D-02 -0.11128D-01
x 6 -0.46095D-03 -0.94225D-02 0.44165D-02 -0.54669D-02
x 7 -0.26926D-01 -0.10167D-01 0.29406D-01 -0.76866D-02
x 8 -0.17827D-02 -0.21079D-01 0.74564D-02 -0.15405D-01
x 9 -0.55840D-02 0.84897D-01 -0.29596D-02 0.76354D-01
x 10 -0.50287D-24 0.36312D-01 -0.44078D-02 0.31904D-01
x 11 -0.48777D-24 -0.63320D-01 0.18876D-02 -0.61432D-01
x 12 -0.35364D-24 0.42877D-01 0.62352D-03 0.43500D-01
y 1 -0.23141D-05 -0.13777D-03 0.53278D-03 0.39270D-03
y 2 -0.62128D-05 -0.87905D-04 0.36602D-03 0.27190D-03
y 3 -0.55613D-05 -0.33722D-04 0.28874D-03 0.24946D-03
y 4 -0.47995D-04 -0.60863D-03 0.17426D-02 0.10860D-02
y 5 -0.36076D-04 -0.20493D-03 0.12026D-03 -0.12075D-03
y 6 -0.12725D-03 -0.61930D-03 -0.15830D-03 -0.90485D-03
y 7 -0.19917D-03 -0.55423D-04 0.10520D-02 0.79740D-03
y 8 -0.48978D-03 -0.13733D-02 0.54899D-03 -0.13141D-02
y 9 -0.11432D-02 0.62058D-03 -0.20074D-04 -0.54272D-03
y 10 -0.16078D-24 0.20852D-02 -0.88466D-04 0.19967D-02
y 11 -0.63877D-25 0.18274D-03 -0.13682D-03 0.45922D-04
y 12 -0.43257D-25 0.92039D-03 -0.61669D-03 0.30370D-03
z 1 -0.69174D-07 -0.23737D-03 0.59290D-03 0.35547D-03
z 2 -0.60773D-05 -0.18704D-03 0.53271D-03 0.33960D-03
z 3 -0.46425D-05 -0.29722D-03 0.57403D-03 0.27217D-03
z 4 -0.22234D-04 -0.47670D-03 0.15748D-02 0.10759D-02
z 5 -0.20254D-04 0.24124D-03 0.11848D-03 0.33947D-03
z 6 -0.42788D-04 0.99264D-04 -0.40246D-04 0.16230D-04
z 7 -0.10941D-03 0.30020D-03 0.13135D-02 0.15043D-02
z 8 -0.19997D-03 0.32196D-03 0.54501D-03 0.66699D-03
z 9 -0.20819D-03 0.45666D-03 -0.67765D-04 0.18071D-03
z 10 -0.55249D-25 0.00000D+00 -0.14491D-03 -0.14491D-03
z 11 -0.55828D-26 0.00000D+00 -0.69139D-04 -0.69139D-04
z 12 -0.26265D-26 0.00000D+00 -0.45200D-03 -0.45200D-03
Vibro-Rot alpha Matrix (cm-1)
a(z) b(x) c(y)
Q( 1) 0.00681 0.00039 0.00036
I need to extract the data from (in this case) " x 1 -0.37..." through "z 12 -0.262..."
I can head and tail the file if I can just get awk to extract the data up to some known point. I have about 300 of these files, and each has a different number of lines, so I can't just count lines, but they all start with "Axis Mode Inertia..." and end with "Vibro-Rot alpha Matrix".
I'm currently trying to use:
awk '$1=="Axis"&&$2=="Mode"{t=1};t;/[0-9]+ "Vibro-Rot alpha Matrix"/{exit}' file.log
This works for finding the start of the block (though it includes the header, which I can cut off afterwards). But the end part of the awk command doesn't work. I've tried to end it with /^Vib/{exit} and other things, but nothing seems to work; I just get a few thousand lines of the log file when I run it.
In case it matters, there is a single space before "Axis" at the top of the block and before "Vibro-Rot" at the bottom. The $1=="Axis"&&$2=="Mode" part doesn't seem to care about that single leading space, though.
What am I missing to cut until the line that has "Vibro-Rot alpha Matrix" in it?
Thanks in advance!
Ben

It worked for me:
awk '$1 == "Axis" && $2 == "Mode" {t = 1;} $1 == "Vibro-Rot" && $2 == "alpha" && $3 == "Matrix" {t = 0;} t == 1 && NF == 6 {print $0}' file.log
In case you do not want the header, try:
awk '$1 == "Vibro-Rot" && $2 == "alpha" && $3 == "Matrix" {t = 0;} t == 1 && NF == 6 {print $0} $1 == "Axis" && $2 == "Mode" {t = 1;}' file.log
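The flag-based answer above can be stretched to cover all ~300 logs in a single awk run. A minimal sketch, assuming (as in the excerpt) that every data row has exactly six fields; extract_alpha and the temporary file are hypothetical names introduced here, not part of the original answer. Resetting the flag when FNR == 1 keeps one file's unterminated block from leaking into the next file:

```shell
#!/bin/sh
# Hypothetical helper: print the data rows of the "Alpha Matrix" block in every file given.
extract_alpha() {
  awk '
    FNR == 1 { t = 0 }                             # new input file: not inside a block yet
    $1 == "Vibro-Rot" && $2 == "alpha" { t = 0 }   # end marker; field tests ignore the leading space
    t && NF == 6 { print }                         # data rows such as "x 1 -0.37699D-03 ..."
    $1 == "Axis" && $2 == "Mode" { t = 1 }         # start marker, set after printing so the header is skipped
  ' "$@"
}

# Usage on a trimmed copy of the excerpt from the question
log=$(mktemp) || exit 1
printf '%s\n' \
  ' Value = 0.5293955920D-22' \
  ' Axis Mode Inertia Coriol. Anharm. Total' \
  ' x 1 -0.37699D-03 -0.36413D-02 0.10830D-01 0.68121D-02' \
  ' z 12 -0.26265D-26 0.00000D+00 -0.45200D-03 -0.45200D-03' \
  ' Vibro-Rot alpha Matrix (cm-1)' \
  ' Q( 1) 0.00681 0.00039 0.00036' > "$log"
extract_alpha "$log" "$log"   # the same file twice stands in for two of the ~300 logs
```

To see which file each row came from, print FILENAME, $0 instead of a bare print.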

Try something like:
awk '!NF{p=0}p; /Axis Mode/{p=1}' file.log
--
Using your original approach:
How about:
awk '/Vibro-Rot alpha Matrix/{exit}t; $1=="Axis"&&$2=="Mode"{t=1}' file.log
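Why the end condition in the question never fired can be checked in isolation. This sketch feeds only the marker line (with the single leading space described in the question) to four patterns; end_marker_demo is a hypothetical name. Only the last two print anything:

```shell
#!/bin/sh
# The end-marker line as described in the question: one leading space
line=' Vibro-Rot alpha Matrix (cm-1)'

end_marker_demo() {
  # Anchored at column 1: no match, because the first character is a space
  printf '%s\n' "$line" | awk '/^Vib/ { print "anchored: match" }'
  # Pattern from the question: needs digits, a space and literal double quotes, so no match
  printf '%s\n' "$line" | awk '/[0-9]+ "Vibro-Rot alpha Matrix"/ { print "original: match" }'
  # Optional leading blanks allowed: match
  printf '%s\n' "$line" | awk '/^[ \t]*Vibro-Rot alpha Matrix/ { print "whitespace-tolerant: match" }'
  # Field comparison: match, since default field splitting drops leading blanks
  printf '%s\n' "$line" | awk '$1 == "Vibro-Rot" { print "field test: match" }'
}
end_marker_demo
```

An unanchored regex, as in the answer above, sidesteps the leading-space problem as well.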

Huh? Use grep:
egrep "^x|^y|^z" yourfile
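egrep (equivalent to grep -E) keeps every line in the whole log that starts with x, y or z, not only those inside the block. A sketch that keeps the same x/y/z filter but restricts it to the block with awk's range pattern (start, end), assuming the marker text is exactly as shown in the question; extract_xyz is a hypothetical name:

```shell
#!/bin/sh
# Range pattern: every line from a match of the start regex through the next
# match of the end regex; inside it, keep only rows whose first field is x, y or z.
extract_xyz() {
  awk '/Axis +Mode/, /Vibro-Rot alpha Matrix/ { if ($1 ~ /^[xyz]$/) print }' "$@"
}

printf '%s\n' \
  ' x stray line above the block' \
  ' Axis Mode Inertia Coriol. Anharm. Total' \
  ' x 1 -0.37699D-03 -0.36413D-02 0.10830D-01 0.68121D-02' \
  ' y 1 -0.23141D-05 -0.13777D-03 0.53278D-03 0.39270D-03' \
  ' Vibro-Rot alpha Matrix (cm-1)' \
  ' z stray line below the block' | extract_xyz
```

Testing $1 rather than anchoring the regex at the start of the line also tolerates indented rows.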

Related

Changing a list of values in Awk

I am trying to change values in the following list:
A 0.702
B 0.868
C 3.467
D 2.152
If the second column is less than 0.5, I would like to change it to -2; between 0.5 and 1, to -1; between 1 and 1.5, to 1; and if it is greater than 1.5, to 2.
When I try the following:
awk '$2<0.9 || $2>2' | awk '{if ($2 < 0.5) print $1,-2;}{if($2>0.5 || $2<1) print $1,-1;}{if($2>1 || $2<1.5) print $1,1;}{if($2>2) print $1,2;}'
I get the following:
A -1
A 1
B -1
B 1
C 1
C 2
D 1
D 2
I know I am missing something, but for the life of me I can't figure out what. Any help gratefully received.
If you have multiple if statements and the current value can match more than one of them, you get multiple lines of output.
If you only want the output of the first match, you have to prevent the if statements that follow from running.
You can use a single awk and define non-overlapping ranges by combining a greater-than test and a less-than test with &&.
Note that using only > and < you will not match a value that sits exactly on a boundary, for example 0.5.
awk '{
if($2 < 0.5) print($1, -2)
if($2 > 0.5 && $2<1) print($1,-1)
if($2 > 1 && $2<1.5) print($1, 1)
if($2 > 1.5) print($1 ,2)
}
' file
Output
A -1
B -1
C 2
D 2
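The note about boundaries is easy to check: feeding values that sit exactly on 0.5, 1 and 1.5 to the four strict tests above prints nothing for them. A sketch, with classify_strict as a hypothetical wrapper around the answer's awk program:

```shell
#!/bin/sh
# The four strict tests from the answer, wrapped so they can be fed test values
classify_strict() {
  awk '{
    if ($2 < 0.5) print($1, -2)
    if ($2 > 0.5 && $2 < 1) print($1, -1)
    if ($2 > 1 && $2 < 1.5) print($1, 1)
    if ($2 > 1.5) print($1, 2)
  }'
}
# P, Q and R sit exactly on a boundary; only S falls strictly inside a range
printf 'P 0.5\nQ 1\nR 1.5\nS 0.75\n' | classify_strict
```

Turning one side of each comparison into <= or >= closes the gaps.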
With your shown samples only: adding one more solution that uses ternary operators for the condition checks (for fun :) ).
awk '{print (NF?($2>1.5?($1 OFS 2):($2>1?($1 OFS 1):($2>0.5?($1 OFS "-1"):($1 OFS "-2")))):"")}' Input_file
A more readable form of the above awk code: since it's a one-liner, here it is broken up across multiple lines for readability.
awk '
{
print \
(\
NF\
?\
($2>1.5\
?\
($1 OFS 2)\
:\
($2>1\
?\
($1 OFS 1)\
:\
($2>0.5\
?\
($1 OFS "-1")\
:\
($1 OFS "-2")\
)\
)\
)\
:\
""\
)
}
' Input_file
Explanation: the ternary operators perform the condition checks and, depending on the result, select the value to print (all of this happens inside the print statement).
Another one. Replace < with <= where needed:
$ awk '{
if($2<0.5) # from low to higher sets the lower limit
$2=-2
else if($2<1) # so only upper limit needs to be tested
$2=-1
else if($2<1.5)
$2=1
else
$2=2
}1' file
Output:
A -1
B -1
C 2
D 2
Probably overkill for your needs, but here's a data-driven approach using GNU awk for arrays of arrays and +/-inf:
$ cat tst.awk
BEGIN {
range["-inf"][0.5] = -2
range[0.5][1] = -1
range[1][1.5] = 1
range[1.5]["+inf"] = 2
}
{
val = ""
for ( beg in range ) {
for ( end in range[beg] ) {
if ( (beg+0 < $2) && ($2 <= end+0) ) {
val = range[beg][end]
}
}
}
print $1, val
}
$ awk -f tst.awk file
A -1
B -1
C 2
D 2
I'm assuming above that "between" excludes the start of the range but includes the end of it. You could make it slightly more efficient with:
for ( beg in range ) {
if ( beg+0 < $2 ) {
for ( end in range[beg] ) {
if ( $2 <= end+0 ) {
val = range[beg][end]
}
}
}
}
but I just like having the range comparison all on 1 line and there's only 1 end for every begin so it doesn't make much difference.
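The arrays-of-arrays version needs GNU awk. The same data-driven idea can be sketched in POSIX awk with two parallel arrays, one of upper bounds and one of values, scanned in order so that the first bound that is not exceeded wins. The names upper, val, last and bin_values are assumptions of this sketch; like the answer, it treats each range as excluding its start and including its end:

```shell
#!/bin/sh
# Hypothetical POSIX-awk variant of the data-driven binning
bin_values() {
  awk '
    BEGIN {
      n = 0
      upper[++n] = 0.5; val[n] = -2   # (-inf, 0.5]
      upper[++n] = 1;   val[n] = -1   # (0.5, 1]
      upper[++n] = 1.5; val[n] = 1    # (1, 1.5]
      last = 2                        # (1.5, +inf)
    }
    {
      v = last
      for (i = 1; i <= n; i++)
        if ($2 <= upper[i]) { v = val[i]; break }   # first bound not exceeded wins
      print $1, v
    }
  ' "$@"
}
printf 'A 0.702\nB 0.868\nC 3.467\nD 2.152\nE 0.5\n' | bin_values
```

Adding a bin means adding one upper[]/val[] pair in BEGIN, in increasing order of bound.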
UPDATE 1: the new equation should cover nearly all scenarios:
The first half of the equation handles the sign (+/-).
The second half handles the magnitude of the binning.
mawk '$NF = (-++_)^(+(__=$NF)<_) * ++_^(int(__+_--^-_)!=_--)'
X -1.25 -2
X -1.00 -2
X -0.75 -2
X -0.50 -2
X -0.25 -2
X 0.00 -2
X 0.25 -2
X 0.50 -1
X 0.75 -1
X 1.00 1
X 1.25 1
X 1.50 2
X 1.75 2
X 2.00 2
X 2.25 2
X 2.50 2
==============================
This may not cover every possible scenario, but if you want a one-liner that covers the samples shown:
mawk '$NF = 4 < (_=int(2*$NF)-2)^2 ? 1+(-3)^(_<-_) :_'
A -1
B -1
C 2
D 2

How to add numbers from files to computation?

I need to get results of this formula - a column of numbers
{x = ($1-T1)/Fi; print (x-int(x))}
from inputs file1
4 4
8 4
7 78
45 2
file2
0.2
3
2
1
From these files there should be 4 outputs.
$1 is the first column of file1, T1 is the first column of the first line of file1 (the number 4), and it is always this number; Fi, where i = 1, 2, 3, 4, are the numbers from the second file. So I need a loop for i from 1 to 4: compute the first output with F1=0.2, the second with F2=3, the third with F3=2 and the last with F4=1. How do I express T1 and Fi this way, and how do I write the loop?
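A worked instance of the formula for one divisor makes the intended output concrete. Here c_j (a symbol introduced for this sketch) stands for the first-column value on line j of file1, and int truncates toward zero as awk's int() does:

```latex
x_{j,i} = \frac{c_j - T_1}{F_i}, \qquad y_{j,i} = x_{j,i} - \operatorname{int}(x_{j,i})

% file1 column 1: c = (4, 8, 7, 45), so T_1 = c_1 = 4; take F_3 = 2
x_{\cdot,3} = \left(\tfrac{4-4}{2},\ \tfrac{8-4}{2},\ \tfrac{7-4}{2},\ \tfrac{45-4}{2}\right) = (0,\ 2,\ 1.5,\ 20.5)
y_{\cdot,3} = (0,\ 0,\ 0.5,\ 0.5)
```

So each F_i yields one column of four numbers, four columns in total.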
awk 'FNR == NR { F[++n] = $1; next } FNR == 1 { T1 = $1 } { for (i = 1; i <= n; ++i) { x = ($1 - T1)/F[i]; print x - int(x) >"output" FNR} }' file2 file1
This gives more than 4 outputs. What is wrong please?
FNR == 1 { T1 = $1 } is being run twice: when file2 starts being read, T1 is set to 0.2.
>"output" FNR is problematic; you should enclose the output file-name expression in parentheses.
Here's how I'd do it:
awk '
NR==1 {t1=$1}
NR==FNR {f[++n]=$1; next}
{
fn="output"FNR
# a numeric loop keeps the order of file1; "for (i in f)" does not guarantee any order
for(i=1; i<=n; i++) {
x=(f[i]-t1)/$1
print x-int(x) >fn
}
close(fn)
}
' file1 file2
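The parentheses advice can be seen in a tiny, self-contained run (the scratch directory and the out1/out2 file names are illustrative only): with the concatenation wrapped in parentheses, each record goes to its own file, and close() keeps the number of open files bounded:

```shell
#!/bin/sh
# Work in a scratch directory so the demo files land somewhere harmless
cd "$(mktemp -d)" || exit 1
# The parentheses make ("out" NR) the whole file-name expression: out1, out2, ...
printf 'a\nb\n' | awk '{ print "line", NR > ("out" NR); close("out" NR) }'
cat out1 out2
```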

AWK to print min max values of unique value in column

I am trying to use awk to do the following:
Input file:
6:28866209 NA NA NA 8.51368e-06 Y
6:28856689 1 0.007828 1 1.50247e-06 X
6:28856740 2 0.007828 1 1.50247e-06 Y
6:28856889 3 7.51E-08 3 1.50247e-06 X
I want to:
Get the min and max of column 5 for each distinct value in column 6
Print that min and max at the end of each line of the file
The file can have different N columns, but all have at least columns 1-8, which are the same in each of my files.
Output:
6:28866209 NA NA NA 8.51368e-06 Y 8.51368e-06 1.50247e-06
6:28856689 1 0.007828 1 1.50247e-06 X 1.50247e-06 1.50247e-06
6:28856740 2 0.007828 1 1.50247e-06 Y 8.51368e-06 1.50247e-06
6:28856889 3 7.51E-08 3 1.50247e-06 X 1.50247e-06 1.50247e-06
I have attempted this using the following awk command, but I am only getting back the first value in column 6...
awk 'BEGIN{OFS="\t";FS="\t"}{if (a[$6] == "") a[$6]=$5; if (a[$6] > $5) {a[$6]=$5}} {if (b[$6] == "") b[$6]=$5; if (b[$6] < $5) {b[$6]=$5}} END {if (i=$6) print $0,i,a[i],b[i]}' FILE
I believe the easiest approach is to make two passes over the file:
awk '(NR==FNR) && !($6 in min) { min[$6] = $5; max[$6] = $5; next }
(NR==FNR) { m=min[$6]; M=max[$6];
min[$6] = $5<m ? $5 : m;
max[$6] = $5>M ? $5 : M;
next; }
{print $0,min[$6],max[$6] }' <file> <file>
Your original code has the following flaws. The END block runs only once, after the whole file has been read: you attempt to print the full file there, but you did not store any lines while parsing. Also, if (i=$6) is an assignment, not a loop over the keys.
A correction of your original idea is:
awk 'BEGIN{OFS="\t";FS="\t"}
{if (a[$6] == "") a[$6]=$5;
if (a[$6] > $5) {a[$6]=$5}
}
{if (b[$6] == "") b[$6]=$5;
if (b[$6] < $5) {b[$6]=$5}
}
{ c[NR]=$0; d[NR]=$6 }
END { for (i=1;i<=NR;i++) print c[i],a[d[i]],b[d[i]] }' FILE
Here, I store the full FILE in array c, indexed by the line number NR. I also store the key $6 in array d. At the end, I loop through all the stored lines and print what is expected.
The downside of this approach is that you have to keep the full file in memory.
The downside of my first proposal is that you have to read the full file twice from disk.
awk 'FNR<NR{$7=m[$6];$8=M[$6];print;next} (!M[$6])||$5>M[$6]{M[$6]=$5}(!m[$6])||$5<m[$6]{m[$6]=$5}' file file
With comments:
awk '
# optional format
BEGIN { OFS=FS="\t"}
# for second pass (second file read)
FNR<NR{
# add columns 7 and 8 with the min and max of column 5 for this line's column-6 value
$7=m[$6];$8=M[$6]
# print it and read the next line (don't go further in the script)
print;next}
# this point is reached only during the first read of the file
# if max is unknown or column 5 is bigger than max
(!M[$6])||$5>M[$6]{
# set new max
M[$6]=$5}
# do the same for min
(!m[$6])||$5<m[$6]{m[$6]=$5}
# read the same file twice (first to find min/max, second to print it)
' sample.txt sample.txt
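One caveat with (!M[$6]) and (!m[$6]): a stored value of 0 also counts as unknown, so a key whose column 5 contains 0 can end up with a wrong min or max. A sketch of the same two-pass approach that uses the in operator instead; minmax_by_key and the sample rows are made up for illustration:

```shell
#!/bin/sh
# Same two-pass idea, but "($6 in M)" tests whether a key has been seen,
# so a genuine value of 0 in column 5 is not mistaken for "unknown".
minmax_by_key() {
  awk '
    FNR < NR { print $0, m[$6], M[$6]; next }   # second pass: append min and max
    !($6 in M) || $5 > M[$6] { M[$6] = $5 }     # first pass: track the max per key
    !($6 in m) || $5 < m[$6] { m[$6] = $5 }     # first pass: track the min per key
  ' "$1" "$1"
}

data=$(mktemp) || exit 1
printf '%s\n' 'r1 NA NA NA 0 Y' 'r2 1 0.007828 1 5 Y' 'r3 2 0.007828 1 -2 X' 'r4 3 7.51E-08 3 3 X' > "$data"
minmax_by_key "$data"
```

With the (!m[$6]) test, the Y rows above would get 5 as their min, because the stored 0 looks like "not set yet".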

how to skip specific lines in awk and print the remaining

How can I read only lines 3, 9, 12 and 15 from a file containing the following lines?
The idea is that whenever I get x and y, I want to print the last line of each run of consecutive lines containing x and y.
What I mean is: for example, if I have an awk script like BEGIN { name = $2; value=$3; } { if(name == x && value==y && the scan reaches lines 3, 9, 12 and 15) printf("hello world") }, what expression can I use instead of "the scan reaches lines 3, 9, 12 and 15"?
1 x y
2 x y
3 x y
4 a d
5 e f
6 x y
7 x y
8 x y
9 x y
10 g f
11 x y
12 x y
13 p r
14 w c
15 x y
16 a z
One way with awk:
$ awk '/^[0-9]+ x y$/{a=$0;f=1;next}f{print a;f=0}' file
3 x y
9 x y
12 x y
15 x y
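The awk one-liner prints a run only when a non-matching line follows it, so a run that reaches the last line of the file (for example, if line 16 were also x y) is never printed. A sketch that adds an END block for that case; it uses field tests instead of the answer's regex, and last_of_runs is a hypothetical name:

```shell
#!/bin/sh
# Print the last line of every run of consecutive "x y" lines,
# including a run that reaches the end of the file.
last_of_runs() {
  awk '
    $2 == "x" && $3 == "y" { a = $0; f = 1; next }   # inside a run: remember the latest line
    f { print a; f = 0 }                             # first line after a run: emit the run end
    END { if (f) print a }                           # file ended while still inside a run
  ' "$@"
}
printf '%s\n' '1 x y' '2 a d' '3 x y' '4 x y' | last_of_runs
```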
One way without awk:
$ tac file | uniq -f1 | fgrep -w 'x y' | tac
3 x y
9 x y
12 x y
15 x y
Something like this?
awk 'a=="xy" && $2$3!="xy" {print b} {a=$2$3;b=$0}' file
3 x y
9 x y
12 x y
15 x y
You need two while loops here: one to check the line and another to iterate over the following lines. Something like this (hope that helps):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LastXYLine {
    // True when the 2nd and 3rd whitespace-separated fields are "x" and "y", e.g. "3 x y"
    static boolean isXY(String line) {
        String[] f = line.trim().split("\\s+");
        return f.length >= 3 && f[1].equals("x") && f[2].equals("y");
    }

    public static void main(String[] args) {
        String line;
        int i = 0; // number of lines read so far
        try (BufferedReader in = new BufferedReader(new FileReader("D:\\readline.txt"))) {
            while ((line = in.readLine()) != null) {
                i++;
                if (isXY(line)) {
                    String searchline = line;
                    int lineNo = i;
                    // Iterate until the run of "x y" lines ends (or the file does)
                    while ((line = in.readLine()) != null) {
                        i++; // keep counting the lines read
                        if (!isXY(line)) {
                            break;
                        }
                        searchline = line;
                        lineNo = i;
                    }
                    System.out.println("Printing line " + lineNo + " containing x and y: " + searchline);
                }
            }
        } catch (IOException e) {
            System.out.println("Exception caught: " + e.getMessage());
        }
    }
}

find the Max and Min with AWK in specific range

I have a file with three columns. I want to get the max of $3 and the min of $2, but only for a specific value of $1, using awk:
Col1 Col2 Col3
==============
X 1 2
X 3 4
Y 5 6
Y 7 8
E.g. I want to get the minimum value of Col2 and the maximum value of Col3 where Col1=X.
I can handle the max and min values, but I can't figure out how to restrict them to a specific range.
This is my code:
awk ' min=="" || $2 < min {min=$2; minline=$0} $3 > max {max=$3; maxline=$0};END {print $1,min,max}'
I tried to add {If ($1==X)}, but it doesn't work well.
kent$ echo "X 1 2
X 3 4
Y 5 6
Y 7 8
"|awk '$1=="X"{min=$2<min||min==""?$2:min;max=$3>max||max==""?$3:max}END{print min,max}'
1 4
Is this what you want?
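Inside the awk program, X without quotes is an (empty) awk variable, which is one reason a test like $1==X does not select the X rows; the key has to be quoted or passed in. A sketch that takes the key as a shell argument via -v, assuming the three-column layout above; minmax_for is a hypothetical name:

```shell
#!/bin/sh
# Min of column 2 and max of column 3 over the rows whose column 1 equals KEY.
# An unquoted X inside awk would be a variable, so the key is passed with -v.
minmax_for() {
  awk -v key="$1" '
    $1 == key {
      if (!seen++) { min = $2; max = $3 }   # first matching row initialises both
      if ($2 < min) min = $2
      if ($3 > max) max = $3
    }
    END { if (seen) print min, max }
  ' "$2"
}

tab=$(mktemp) || exit 1
printf '%s\n' 'Col1 Col2 Col3' '==============' 'X 1 2' 'X 3 4' 'Y 5 6' 'Y 7 8' > "$tab"
minmax_for X "$tab"
minmax_for Y "$tab"
```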
What about:
awk 'BEGIN { c=1 }
$1 == "X" { if (c==1) { mmin=$2; mmax=$3 ;c++ }
if ($2<mmin) { mmin=$2 }
if ($3>mmax) { mmax=$3 }
}
END { print "X min: " mmin ", max: " mmax }' INPUTFILE
If you want to collect all the minima and maxima:
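awk '
# the first value seen for a key initialises it; an unset element would compare as 0
!($1 in min) || $2 < min[$1] {min[$1] = $2}
!($1 in max) || $3 > max[$1] {max[$1] = $3}
{col1[$1] = 1}
END {for (c in col1) {print c, min[c], max[c]}}
' file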