How to skip specific lines in awk and print the remaining ones

How can I read only lines 3, 9, 12 and 15 from a file containing the following lines?
The idea is: whenever I get a run of lines containing x and y, I want to print the last line of that run.
What I mean is, for example, if I have an awk script like: BEGIN { name = $2; value=$3; } { if(name == x && value==y && the scan reaches lines 3, 9, 12 and 15) printf("hello world") }, what expression can I use instead of "the scan reaches lines 3, 9, 12 and 15"?
1 x y
2 x y
3 x y
4 a d
5 e f
6 x y
7 x y
8 x y
9 x y
10 g f
11 x y
12 x y
13 p r
14 w c
15 x y
16 a z

One way with awk:
$ awk '/^[0-9]+ x y$/{a=$0;f=1;next}f{print a;f=0}' file
3 x y
9 x y
12 x y
15 x y
One way without awk:
$ tac file | uniq -f1 | grep -Fw 'x y' | tac
3 x y
9 x y
12 x y
15 x y

Something like this?
awk 'a=="xy" && $2$3!="xy" {print b} {a=$2$3;b=$0}' file
3 x y
9 x y
12 x y
15 x y
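Both awk one-liners above print the remembered line only when a non-matching line follows it, so a run of x y lines at the very end of the file would be dropped (the sample ends with "16 a z", so it is not affected). A minimal sketch that also flushes an open run in END, assuming the same three whitespace-separated columns; the function name last_of_runs is hypothetical:

```shell
# last_of_runs [FILE...]: print the last line of every run of consecutive
# lines whose 2nd and 3rd fields are "x" and "y", including a run that
# reaches the end of the input. Hypothetical helper name.
last_of_runs() {
    awk '
        $2 == "x" && $3 == "y" { last = $0; inrun = 1; next }  # remember newest match
        inrun                  { print last; inrun = 0 }       # the run just ended
        END                    { if (inrun) print last }       # run still open at EOF
    '  "$@"
}
```

Usage: last_of_runs file prints 3 x y, 9 x y, 12 x y and 15 x y for the sample above.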

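You need two nested while loops here: the outer one looks for a line containing x and y, and the inner one keeps reading until that run of x/y lines ends. Note that each line starts with a line number, so x and y are the second and third fields, not characters 0 and 2. Something like this; hope that helps:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LastXYLine {
    public static void main(String[] args) {
        String line;
        int i = 0; // number of the line read most recently
        try (BufferedReader in = new BufferedReader(new FileReader("D:\\readline.txt"))) {
            while ((line = in.readLine()) != null) {
                i++;
                if (isXY(line)) {
                    String searchline = line; // last x/y line of the current run
                    int searchNo = i;
                    // Iterate until the run of x/y lines ends (or the file ends)
                    while ((line = in.readLine()) != null) {
                        i++;
                        if (!isXY(line)) {
                            break;
                        }
                        searchline = line;
                        searchNo = i;
                    }
                    System.out.println("Printing the line ::" + searchNo + ":: containing X and Y::::::::" + searchline);
                }
            }
        } catch (IOException e) {
            System.out.println("Exception Caught:: " + e.getMessage());
        }
    }

    // Lines look like "3 x y": the second and third fields must be x and y
    private static boolean isXY(String line) {
        String[] f = line.trim().split("\\s+");
        return f.length >= 3 && f[1].equals("x") && f[2].equals("y");
    }
}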

Related

Using awk to count the rows within a range of each value

I have a data set (file.txt):
X Y
1 a
2 b
3 c
10 d
11 e
12 f
15 g
20 h
25 i
30 j
35 k
40 l
41 m
42 n
43 o
46 p
I want to add two columns, Up10 and Down10:
Up10: the number of rows whose X lies between X-10 and X (inclusive).
Down10: the number of rows whose X lies between X and X+10 (inclusive).
For example:
X Y Up10 Down10
35 k 3 5
For Up10 (35-10 = 25): X=35, X=30, X=25, total = 3 rows
For Down10 (35+10 = 45): X=35, X=40, X=41, X=42, X=43, total = 5 rows
Desired Output:
X Y Up10 Down10
1 a 1 5
2 b 2 5
3 c 3 4
10 d 4 5
11 e 5 4
12 f 5 3
15 g 4 3
20 h 5 3
25 i 3 3
30 j 3 3
35 k 3 5
40 l 3 5
41 m 3 4
42 n 4 3
43 o 5 2
46 p 5 1
This is Pierre François' solution (thanks again, Pierre François):
awk '
BEGIN{OFS="\t"; print "X\tY\tUp10\tDown10"}
(NR == FNR) && (FNR > 1){a[$1] = $1 + 0}
(NR > FNR) && (FNR > 1){
up = 0; upl = $1 - 10
down = 0; downl = $1 + 10
for (i in a) { i += 0 # tricky: convert i to integer
if ((i >= upl) && (i <= $1)) {up++}
if ((i >= $1) && (i <= downl)) {down++}
}
print $1, $2, up, down;
}
' file.txt file.txt > file-2.txt
But when I run this command on 13 GB of data, it takes too long.
I then tried this approach on the 13 GB data:
awk 'BEGIN{ FS=OFS="\t" }
NR==FNR{a[NR]=$1;next} {x=y=FNR;while(--x in a&&$1-10<a[x]){} while(++y in a&&$1+10>a[y]){} print $0,FNR-x,y-FNR}
' file.txt file.txt > file-2.txt
When file-2.txt reaches 1.1 GB, it stops growing. I have waited several hours, but the command never finishes and I never get the final output file.
Note: I am working on Google Cloud, machine type e2-highmem-8 (8 vCPUs, 64 GB memory).
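To spot-check individual rows of a large output against the definition of Up10 and Down10, a small helper can count straight from the definition for one value of X. This is a sketch, not one of the answers below; the name window_counts is hypothetical, and it assumes a one-line header and X in column 1. Each call streams the file once, so it is slow on 13 GB but uses constant memory:

```shell
# window_counts X [FILE...]: print "X Up10 Down10" for a single value X,
# counting straight from the definition (both bounds inclusive).
# Hypothetical helper; assumes a one-line header and X in column 1.
window_counts() {
    x=$1
    shift
    awk -v x="$x" '
        BEGIN   { x += 0 }                       # force numeric comparisons
        NR == 1 { next }                         # skip the header line
        $1 >= x - 10 && $1 <= x      { up++ }    # X-10 <= X_i <= X
        $1 >= x      && $1 <= x + 10 { down++ }  # X <= X_i <= X+10
        END { print x, up + 0, down + 0 }
    ' "$@"
}
```

Usage: window_counts 35 file.txt prints 35 3 5 for the sample, matching the worked example in the question.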
A single-pass awk that keeps a sliding window of the last 10 records and uses it to count the ups and downs (it assumes no more than 10 rows fall within any 10-unit range of X, which holds for the sample). For symmetry's sake there should be deletes in the END block too, but a few extra array elements in memory shouldn't make a difference:
$ awk '
BEGIN {
FS=OFS="\t"
}
NR==1 {
print $1,$2,"Up10","Down10"
}
NR>1 {
a[NR]=$1
b[NR]=$2
for(i=NR-9;i<=NR;i++) {
if(a[i]>=a[NR]-10&&i>=2)
up[NR]++
if(a[i]<=a[NR-9]+10&&i>=2)
down[NR-9]++
}
}
NR>10 {
print a[NR-9],b[NR-9],up[NR-9],down[NR-9]
delete a[NR-9]
delete b[NR-9]
delete up[NR-9]
delete down[NR-9]
}
END {
for(nr=NR+1;nr<=NR+9;nr++) {
for(i=nr-9;i<=nr;i++)
if(a[i]<=a[nr-9]+10&&i>=2&&i<=NR)
down[nr-9]++
print a[nr-9],b[nr-9],up[nr-9],down[nr-9]
}
}' file
Output:
X Y Up10 Down10
1 a 1 5
2 b 2 5
...
35 k 3 5
...
43 o 5 2
46 p 5 1
Another single-pass approach with a sliding window:
awk '
NR == 1 { next } # skip the header
NR == 2 { min = max = cur = 1; X[cur] = $1; Y[cur] = $2; next }
{ X[++max] = $1; Y[max] = $2
if (X[cur] >= $1 - 10) next
for (; X[cur] + 10 < X[max]; ++cur) {
for (; X[min] < X[cur] - 10; ++min) {
delete X[min]
delete Y[min]
}
print X[cur], Y[cur], cur - min + 1, max - cur
}
}
END {
for (; cur <= max; ++cur) {
for (; X[min] < X[cur] - 10; ++min);
for (i = max; i > cur && X[cur] + 10 < X[i]; --i);
print X[cur], Y[cur], cur - min + 1, i - cur + 1
}
}
' file
The script assumes the X column is ordered numerically.
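Since both sliding-window scripts rely on X being sorted, it may be worth verifying that before a multi-hour run. A minimal one-pass check, assuming a one-line header and X in column 1; check_sorted is a hypothetical name:

```shell
# check_sorted [FILE...]: exit status 0 if column 1 (after a one-line
# header) never decreases, 1 at the first out-of-order row.
# Hypothetical helper; the sliding-window scripts need this property.
check_sorted() {
    awk '
        NR == 1 { next }                              # skip the header
        NR > 2 && $1 + 0 < prev { bad = 1; exit }     # order violated: stop
        { prev = $1 + 0 }                             # remember this X
        END { exit bad + 0 }                          # 0 = sorted, 1 = not
    ' "$@"
}
```

If check_sorted file.txt fails, the data can be sorted while keeping the header, e.g. { head -n 1 file.txt; tail -n +2 file.txt | sort -k1,1n; } > sorted.txt.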

Chapel iterators with conditionals

I am trying to write an iterator with a conditional in Chapel. This works:
var x = [1,4,5,2,6,3];
iter dx(x) {
for y in x do yield 2*y;
}
for y in dx(x) {
writeln("y -> ", y);
}
returning:
y -> 2
y -> 8
y -> 10
y -> 4
y -> 12
y -> 6
Suppose I want to only return the ones that are greater than 3. None of these will compile. What is the proper syntax?
var x = [1,4,5,2,6,3];
iter dx(x) {
//for y in x do {if x > 3} yield 2*y; // Barf
//for y in x do {if x > 3 yield 2*y }; // Barf
//for y in x do if x > 3 yield 2*y ; // Barf
}
for y in dx(x) {
writeln("y -> ", y);
}
The error is that you are checking against the iterator argument x instead of the current element y in the conditional. Try:
iter dx(x) {
for y in x {
if y > 3 {
yield 2*y;
}
}
}
or in the more concise form:
iter dx(x) {
for y in x do if y > 3 then yield 2*y;
}
Note that when the body of an if statement is a single statement, you may use a then keyword to introduce the body rather than enclosing it in braces { }. Unlike C, the then keyword is required (due to syntactic ambiguities that would occur otherwise).

Output the result of each loop in different columns

The price.txt file has two columns (name and value):
Mary 134
Lucy 56
Jack 88
The range.txt file has three columns (fruit, min_value and max_value):
apple 57 136
banana 62 258
orange 88 99
blueberry 98 121
My aim is to test whether each value in price.txt lies between min_value and max_value in range.txt. If yes, output 1; if not, output "x".
I tried:
awk 'FNR == NR { name=$1; price[name]=$2; next} {
for (name in price) {
if ($2<=price[name] && $3>=price[name]) {print 1} else {print "x"}
}
}' price.txt range.txt
But my results are all in one column, like this:
1
1
x
x
x
x
x
x
1
1
1
x
What I actually want is one column per name:
1 x 1
1 x 1
x x 1
x x x
I need this because I want to use paste to join the output file with range.txt. The final result should look like:
apple 57 136 1 x 1
banana 62 258 1 x 1
orange 88 99 x x 1
blueberry 98 121 x x x
So, how can I get the result of each loop in a different column? And is there any way to output the final result without paste, based on my current code? Thank you.
This builds on what you provided:
awk '
# load prices by index to maintain read order;
# names ends up as the number of prices, so the
# non-standard length(array) is not needed
FNR == NR {
price[names++]=$2
next
}
{
l = $1 " " $2 " " $3
for (i=0; i < names; i++) {
if ($2 <= price[i] && $3 >= price[i]) {
l = l " 1"
} else {
l = l " x"
}
}
print l
}' price.txt range.txt
which generates this output:
apple 57 136 1 x 1
banana 62 258 1 x 1
orange 88 99 x x 1
blueberry 98 121 x x x
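However, you don't have the person's name next to each score (anonymous results); maybe that's intentional?
The key change is to index the array populated in the first block explicitly (0, 1, 2, ...) so the read order is kept: for (name in price) visits the elements in an unspecified order.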
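If the anonymous results are not intentional, the same idea can also print a header row naming each person, so the output needs neither paste nor guesswork. A minimal sketch, assuming range.txt has no header line of its own; the function name in_range and the header labels (taken from the question's column names) are hypothetical:

```shell
# in_range PRICES RANGES: for every line of RANGES (fruit min max), append one
# column per person in PRICES: 1 if that price lies in [min, max], else x.
# A header row names the people, so the results are no longer anonymous.
# Hypothetical helper; the header labels come from the question wording.
in_range() {
    awk '
        BEGIN     { n = 0 }                                        # numeric index from the start
        FNR == NR { name[n] = $1; price[n] = $2 + 0; n++; next }  # keep read order
        FNR == 1 {                                                 # first range line: header
            hdr = "fruit min_value max_value"
            for (i = 0; i < n; i++) hdr = hdr " " name[i]
            print hdr
        }
        {
            line = $1 " " $2 " " $3
            for (i = 0; i < n; i++)
                line = line ((price[i] >= $2 && price[i] <= $3) ? " 1" : " x")
            print line
        }
    ' "$1" "$2"
}
```

Usage: in_range price.txt range.txt prints the header "fruit min_value max_value Mary Lucy Jack" followed by the same four rows shown above.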

Awk extract data block between text strings

I'm fighting with awk again to pull data out of a log file. The area of my log file in question looks like this; however, there are a few thousand lines above and below this block:
4C*DJ - (B-C)*DJK + 2*(2A+B+C)*D1 - 4*(4A+B-3C)*D2 = 0
Value = 0.5293955920D-22
Alpha Matrix in cm-1
Axis Mode Inertia Coriol. Anharm. Total
x 1 -0.37699D-03 -0.36413D-02 0.10830D-01 0.68121D-02
x 2 -0.83656D-03 -0.53163D-02 0.14483D-01 0.83306D-02
x 3 -0.15253D-02 -0.10512D-01 0.20064D-01 0.80264D-02
x 4 -0.17103D-03 -0.73492D-03 0.14953D-01 0.14047D-01
x 5 -0.96312D-03 -0.11748D-01 0.15825D-02 -0.11128D-01
x 6 -0.46095D-03 -0.94225D-02 0.44165D-02 -0.54669D-02
x 7 -0.26926D-01 -0.10167D-01 0.29406D-01 -0.76866D-02
x 8 -0.17827D-02 -0.21079D-01 0.74564D-02 -0.15405D-01
x 9 -0.55840D-02 0.84897D-01 -0.29596D-02 0.76354D-01
x 10 -0.50287D-24 0.36312D-01 -0.44078D-02 0.31904D-01
x 11 -0.48777D-24 -0.63320D-01 0.18876D-02 -0.61432D-01
x 12 -0.35364D-24 0.42877D-01 0.62352D-03 0.43500D-01
y 1 -0.23141D-05 -0.13777D-03 0.53278D-03 0.39270D-03
y 2 -0.62128D-05 -0.87905D-04 0.36602D-03 0.27190D-03
y 3 -0.55613D-05 -0.33722D-04 0.28874D-03 0.24946D-03
y 4 -0.47995D-04 -0.60863D-03 0.17426D-02 0.10860D-02
y 5 -0.36076D-04 -0.20493D-03 0.12026D-03 -0.12075D-03
y 6 -0.12725D-03 -0.61930D-03 -0.15830D-03 -0.90485D-03
y 7 -0.19917D-03 -0.55423D-04 0.10520D-02 0.79740D-03
y 8 -0.48978D-03 -0.13733D-02 0.54899D-03 -0.13141D-02
y 9 -0.11432D-02 0.62058D-03 -0.20074D-04 -0.54272D-03
y 10 -0.16078D-24 0.20852D-02 -0.88466D-04 0.19967D-02
y 11 -0.63877D-25 0.18274D-03 -0.13682D-03 0.45922D-04
y 12 -0.43257D-25 0.92039D-03 -0.61669D-03 0.30370D-03
z 1 -0.69174D-07 -0.23737D-03 0.59290D-03 0.35547D-03
z 2 -0.60773D-05 -0.18704D-03 0.53271D-03 0.33960D-03
z 3 -0.46425D-05 -0.29722D-03 0.57403D-03 0.27217D-03
z 4 -0.22234D-04 -0.47670D-03 0.15748D-02 0.10759D-02
z 5 -0.20254D-04 0.24124D-03 0.11848D-03 0.33947D-03
z 6 -0.42788D-04 0.99264D-04 -0.40246D-04 0.16230D-04
z 7 -0.10941D-03 0.30020D-03 0.13135D-02 0.15043D-02
z 8 -0.19997D-03 0.32196D-03 0.54501D-03 0.66699D-03
z 9 -0.20819D-03 0.45666D-03 -0.67765D-04 0.18071D-03
z 10 -0.55249D-25 0.00000D+00 -0.14491D-03 -0.14491D-03
z 11 -0.55828D-26 0.00000D+00 -0.69139D-04 -0.69139D-04
z 12 -0.26265D-26 0.00000D+00 -0.45200D-03 -0.45200D-03
Vibro-Rot alpha Matrix (cm-1)
a(z) b(x) c(y)
Q( 1) 0.00681 0.00039 0.00036
I need to extract the data from (in this case) " x 1 -0.37..." through "z 12 -0.262..."
I can head and tail the file if I can just get awk to extract the data up to some known point. I have about 300 of these files, each with a different number of lines, so I can't just count lines, but they all start with "Axis Mode Inertia..." and end with "Vibro-Rot alpha Matrix".
I'm currently trying to use:
awk '$1=="Axis"&&$2=="Mode"{t=1};t;/[0-9]+ "Vibro-Rot alpha Matrix"/{exit}' file.log
This works for the start of the block (though it includes the header, which I can cut off afterwards). But the end part of the awk command doesn't work. I've tried to end it with ^Vib/{exit} and other things, but nothing seems to work; I just get a few thousand lines of the log file.
In case it matters, there is a single space before "Axis" at the top and before "Vibro-Rot" at the bottom. The $1=="Axis"&&$2=="Mode" part doesn't seem to care about that single leading space, though.
What am I missing to stop at the line that contains "Vibro-Rot alpha Matrix"?
Thanks in advance!
Ben
This worked for me:
awk '$1 == "Axis" && $2 == "Mode" {t = 1;} $1 == "Vibro-Rot" && $2 == "alpha" && $3 == "Matrix" {t = 0;} t == 1 && NF == 6 {print $0}' file.log
In case you do not want the header, try:
awk '$1 == "Vibro-Rot" && $2 == "alpha" && $3 == "Matrix" {t = 0;} t == 1 && NF == 6 {print $0} $1 == "Axis" && $2 == "Mode" {t = 1;}' file.log
Try something like:
awk '!NF{p=0}p; /Axis Mode/{p=1}' file.log
--
Using your original approach, how about:
awk '/Vibro-Rot alpha Matrix/{exit}t; $1=="Axis"&&$2=="Mode"{t=1}' file.log
Huh? Just use grep:
grep -E "^x|^y|^z" yourfile
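As for why the original end pattern never fires: inside /.../ the double quotes in /[0-9]+ "Vibro-Rot alpha Matrix"/ are literal characters, and the log line contains no digits followed by a quote, so the regex cannot match; /^Vib/ fails because of the leading space mentioned in the question. Comparing fields sidesteps both problems. A minimal sketch, with the hypothetical name alpha_block; it stops at the end marker, so it handles one file per call:

```shell
# alpha_block [FILE]: print the rows between the "Axis Mode ..." header and
# the "Vibro-Rot alpha Matrix" line, excluding both marker lines.
# Fields are compared instead of whole lines, so leading spaces do not matter.
alpha_block() {
    awk '
        inblock && $1 == "Vibro-Rot" && $2 == "alpha" && $3 == "Matrix" { exit }  # end marker: stop
        inblock                                                  # inside: print row
        $1 == "Axis" && $2 == "Mode" { inblock = 1 }             # start after header
    ' "$@"
}
```

For the 300 files, a loop such as for f in *.log; do alpha_block "$f" > "$f.alpha"; done calls it once per file.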

Transpose columns and rows using gawk

I am trying to transpose a really long file and I am concerned that it will not be transposed entirely.
My data looks something like this:
Thisisalongstring12345678 1 AB abc 937 4.320194
Thisisalongstring12345678 1 AB efg 549 0.767828
Thisisalongstring12345678 1 AB hi 346 -4.903441
Thisisalongstring12345678 1 AB jk 193 7.317946
I want my data to look like this:
Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678
1 1 1 1
AB AB AB AB
abc efg hi jk
937 549 346 193
4.320194 0.767828 -4.903441 7.317946
Would the length of the first string be an issue? My file is much longer than this, approximately 2000 lines. Also, is it possible to change the first string to Thisis234 and then transpose?
I don't see why it would be, unless you run out of memory. Try the code below and see if you run into problems.
Input:
$ cat inf.txt
a b c d
1 2 3 4
. , + -
A B C D
Awk program:
$ cat mkt.sh
awk '
{
for(c = 1; c <= NF; c++) {
a[c, NR] = $c
}
if(max_nf < NF) {
max_nf = NF
}
}
END {
for(r = 1; r <= NR; r++) {
for(c = 1; c <= max_nf; c++) {
printf("%s ", a[r, c])
}
print ""
}
}
' inf.txt
Run:
$ ./mkt.sh
a 1 . A
b 2 , B
c 3 + C
d 4 - D
Credits:
http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_12.html#SEC121
Hope this helps.
This can be done with the rs BSD command:
http://www.unix.com/man-page/freebsd/1/rs/
Check out the -T option.
I tried icyrock.com's answer, but found that I had to change:
for(r = 1; r <= NR; r++) {
for(c = 1; c <= max_nf; c++) {
to
for(r = 1; r <= max_nf; r++) {
for(c = 1; c <= NR; c++) {
to get NR columns and max_nf rows. So icyrock's code becomes:
$ cat mkt.sh
awk '
{
for(c = 1; c <= NF; c++) {
a[c, NR] = $c
}
if(max_nf < NF) {
max_nf = NF
}
}
END {
for(r = 1; r <= max_nf; r++) {
for(c = 1; c <= NR; c++) {
printf("%s ", a[r, c])
}
print ""
}
}
' inf.txt
If you don't do that and use an asymmetrical input, like:
a b c d
1 2 3 4
. , + -
You get:
a 1 .
b 2 ,
c 3 +
i.e. still 3 rows and 4 columns (the last of which is blank).
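For @ScubaFishi's and @icyrock's code:
"if (max_nf < NF)" seems unnecessary. I deleted the condition (keeping max_nf = NF), and the code works fine, at least when every line has the same number of fields; otherwise max_nf ends up as the field count of the last line.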
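The question's second part (renaming the first string to Thisis234 before transposing) is not covered above. A minimal sketch that does both in one pass, reading the renamed value from an argument; the function name transpose_rename is hypothetical, and it keeps the widest-line check so that non-square input still transposes completely:

```shell
# transpose_rename NEWNAME [FILE...]: transpose rows and columns. If NEWNAME
# is not empty, it first replaces field 1 of every line (hypothetical helper;
# the question asks to rename the long first string, e.g. to Thisis234).
# Short rows are padded with empty cells, so non-square input works too.
transpose_rename() {
    newname=$1
    shift
    awk -v newname="$newname" '
        {
            if (newname != "") $1 = newname          # optional rename of field 1
            for (c = 1; c <= NF; c++) cell[c, NR] = $c
            if (NF > maxnf) maxnf = NF               # widest line = number of output rows
        }
        END {
            for (r = 1; r <= maxnf; r++) {           # one output row per input field
                out = cell[r, 1]
                for (c = 2; c <= NR; c++) out = out " " cell[r, c]
                print out
            }
        }
    ' "$@"
}
```

Usage: transpose_rename Thisis234 data.txt renames and transposes; transpose_rename '' inf.txt only transposes. Unlike the printf "%s " loop above, it does not leave a trailing space on each line.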