Print lines matching a pattern only if the next line does not match the pattern - awk

I want to use awk to print lines that match a pattern only if the following line does not match the pattern. In this case, the pattern is that the line begins with O. This is what I tried:
awk '!/^O/ {print x}; /^O/ {x=$0}' myfile.txt
This is printing far too many lines though, including printing the lines I specifically do not want to print.

Not tested, but this should work:
awk '/^O/{if(seen==0){seen=1};c=$0} !/^O/{if (seen==1) {print c; seen=0;}}' myfile.txt
Shortened version
awk '/^O/{x=$0} !/^O/{if(x!=0) {print x; x=0;}}' myfile.txt
More shortening
awk '/^O/{x=$0} !/^O/{if(x){print x;x=0;}}' myfile
I think this is about as short as it can go:
awk '/^O/{x=$0} !/^O/&&x{print x;x=0;}' myfile
Edit: I changed all of the versions above because the original printed the wrong lines. I also made it shorter:
awk 'a=/^O/{x=$0} !a&&x{print x;x=0;}' myfile
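To check the final one-liner, here is a small sketch run against made-up sample data (the fruit names are only an illustration and are not from the question). A line starting with O should be printed only when the line after it does not start with O:

```shell
# Hypothetical input: two runs of O-lines, each followed by a non-O line.
result=$(printf '%s\n' Apple Orange Olive Banana Oat Cherry |
    awk 'a=/^O/{x=$0} !a&&x{print x;x=0;}')

printf '%s\n' "$result"
# prints:
# Olive
# Oat
```

Note that a matching line at the very end of the file has no following line, so none of these versions print it. If it should count, an END rule that prints x when it is still set would be one way to handle that.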

Match regexp at the end of the string with AWK

I am trying to match two different regexps against long strings with awk, removing the part of the string that matches within a 35-character window.
The problem is that the same code works when I am looking for the first one (which matches at the beginning) but fails to match the second one (at the end of the string).
Input:
Regexp1(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)Regexp2
Desired output
(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)
So far I have used this code, which correctly extracts Regexp1 but, unfortunately, cannot extract Regexp2, since the RSTART and RLENGTH indexes for Regexp2 are incorrect.
Code for extracting Regexp1 (correct output):
awk -v F="Regexp1" '{if (match(substr($1,1,35),F)) print substr($1,RSTART,RLENGTH)}' file
Code for extracting Regexp2 (wrong output)
awk -v F="Regexp2" '{if (match(substr($1,length($1)-35,35),F)) print substr($1,RSTART,RLENGTH)}' file
Although the indexes for Regexp1 are correct, the indexes for Regexp2 are wrong (RSTART=13). I cannot figure out how to extract the second Regexp.
Assuming your actual Input_file looks like the samples shown, could you please try the following? (A recent awk is preferable, since older versions may not support interval expressions such as {5} in regexes.)
awk '
match($0,/(\([0-9]+\)){5}.*(\([0-9]+\)){4}/){
print substr($0,RSTART,RLENGTH)
}' Input_file
If the number of parenthesized values is not fixed, you could use the following instead:
awk '
match($0,/(\([0-9]+\)){1,}.*(\([0-9]+\)){1,}/){
print substr($0,RSTART,RLENGTH)
}' Input_file
If this isn't all you need:
$ sed 's/Regexp1\(.*\)Regexp2/\1/' file
(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)
or using GNU awk for gensub():
$ awk '{print gensub(/Regexp1(.*)Regexp2/,"\\1",1)}' file
(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)
then edit your question to be far clearer with your requirements and example.
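On the RSTART problem described in the question: match() reports RSTART relative to the string it was given, so when it is called on a substr() of $1 the offset of that substring has to be added back before indexing into $1. The following is only a sketch of that idea using the sample line from the question; the variable names W, off and pos are chosen here for illustration.

```shell
body='Regexp1(1)(2)(3)(4)(5)xxxxxxxxxxxxxxx(20)(21)(22)(23)'

trimmed=$(printf '%s\n' "${body}Regexp2" | awk -v F='Regexp2' -v W=35 '{
    off = length($1) - W + 1           # 1-based start of the last W characters
    if (off < 1) off = 1               # line shorter than the window
    if (match(substr($1, off), F)) {
        pos = off + RSTART - 1         # translate RSTART back to a position in $1
        print substr($1, 1, pos - 1) substr($1, pos + RLENGTH)
    } else {
        print $1                       # no match in the window: leave the line alone
    }
}')

printf '%s\n' "$trimmed"
```

Here the trailing Regexp2 is removed and the rest of the line, including Regexp1, is left untouched.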

Merge the next line if the current line contains a pattern at the end

I have a very big text file. I want to merge the next line into the current line if the current line ends with the word OR.
For example, given the lines below:
somerandomstring OR
someotherrandomstring
The above 2 lines should become
somerandomstring OR someotherrandomstring
Only those lines should change. Rest of the lines must be kept as they are. Thanks in advance.
Allow me to extend the question a bit further.
I also want to handle the case where the next line starts with OR and the current line does not end with OR. How can I handle both cases together?
With GNU sed:
sed '/ OR$/{N;s/\n/ /}' file
Search for white space followed by OR at the end of the line ($); if found, read the next line (N) into the pattern space and replace the newline in the pattern space (with s///) with a single space.
If you want to edit your file "in place" use sed's option -i.
You can do this in awk by assigning the "OR" row to a variable and printing whatever is stored in the variable when there is no "OR" found.
awk '$NF=="OR"{buffer=buffer$0" "} $NF!="OR"{print buffer$0;buffer=""}' test.txt
This also works on multiple rows that may have "OR" by concatenating the row to the buffer variable until it finds a non-OR row, prints it and clears the buffer variable.
Another option in awk is to use printf on the OR rows and print on the non-OR rows (which is similar to the GNU sed example by @cyrus, but in awk):
awk '$NF=="OR"{printf "%s ", $0} $NF!="OR"{print $0}' test.txt
And this is the same beast, but using a ternary operator within printf:
awk '{printf $NF=="OR"?"%s ":"%s\n", $0}' test.txt
Another awk approach:
$ awk -v RS='^$' -v k='OR' '{gsub(k ORS,k FS); gsub(ORS k,FS k); printf "%s",$0}' file
x OR y
x OR y
for test input file
$ cat file
x OR
y
x
OR y
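For the extended question (a trailing OR on the current line, or a leading OR on the next line), a line-by-line alternative that does not read the whole file into memory could look like the sketch below. It holds back one line in a variable named prev (a name chosen here) and only prints it once it is known that the next line should not be joined to it. The trailing z line in the sample input is an addition for illustration.

```shell
merged=$(printf '%s\n' 'x OR' 'y' 'x' 'OR y' 'z' | awk '
NR > 1 {
    # Join when the held line ends with " OR" or the new line starts with "OR ".
    if (prev ~ / OR$/ || $0 ~ /^OR /) { prev = prev " " $0; next }
    print prev
}
{ prev = $0 }                 # hold the current line until the next one is seen
END { if (NR) print prev }    # flush the last held line
')

printf '%s\n' "$merged"
# prints:
# x OR y
# x OR y
# z
```

This sketch assumes OR is separated from the rest of the line by a single space; a line consisting of only OR is not treated specially.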

print last two words of last line

I have a script which returns a few lines of output, and I am trying to print the last two words of the last line (irrespective of the number of lines in the output).
$ ./test.sh
service is running..
check are getting done
status is now open..
the test is passed
I tried running as below but it prints last word of each line.
$ ./test.sh | awk '{ print $NF }'
running..
done
open..
passed
how do I print the last two words "is passed" using awk or sed?
Just say:
awk 'END {print $(NF-1), $NF}'
Most awks keep the last line available in the END block (though not all of them do), so its fields can still be accessed there.
Then it is just a matter of printing the penultimate and last fields, which $(NF-1) and $NF provide.
For robustness, if your last line may contain only 1 field and your awk doesn't retain the field values in the END section:
awk '{split($0,a)} END{print (NF>1?a[NF-1]OFS:"") a[NF]}'
This might work for you (GNU sed):
sed '$s/.*\(\<..*\<.*\)/\1/p;d' file
This deletes all lines in the file but on the last line it replaces all words by the last two words and prints them if successful.
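As a quick check against the sample output from the question, here is a sketch that does not rely on $0 or NF surviving into END at all: it saves both the fields and their count (in a and n, names chosen here) on every line. The printf stands in for ./test.sh.

```shell
last_two=$(printf '%s\n' 'service is running..' 'check are getting done' \
    'status is now open..' 'the test is passed' |
    awk '{ n = split($0, a) }   # remember the fields and field count of each line
         END { if (n > 1) print a[n-1], a[n]; else if (n == 1) print a[1] }')

printf '%s\n' "$last_two"   # prints: is passed
```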

How to force AWK to stop applying rules?

I want AWK to process my file and change only some lines. But it prints only the rule-matched lines. So I added a /*/ {print $0} rule, but then I got duplicates.
awk '/rule 1/ {actions 1;} /*/ {print $0}' file.txt
I want all lines in the output, with 'rule 1' lines changed.
Adding a 1 to the end of the script adds a pattern that is always true and has no action, so awk applies its default action and prints the line. For example, the following prints every line in the file; however, if a line contains the words rule 1, only the first field of that line is printed.
awk '/rule 1/ { print $1; next }1' file
The word next skips processing the rest of the code for that particular line. You can apply whatever action you'd like. I just chose to print $1. HTH.
I have made a substantial edit following Ed Morton's latest explanations.
As Ed Morton pointed out, this can be simplified to changing the lines that match specific patterns and then printing all lines:
awk '/regex1/ {actions_1} 1' file.txt
(see his comments below for the reason why it's preferable to the one I proposed)
For the record, there exist ways to skip the rest of the processing for the current line, such as : continue or break if it is in a loop, or next if it is in the main loop.
See for example : http://www.gnu.org/software/gawk/manual/html_node/Next-Statement.html#Next-Statement
Or assign the result of actions 1 to $0:
awk '/rule 1/{$0=actions 1}1' file.txt
for example:
awk '/rule 1/{$0=$1}1' file.txt
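To make the difference between these approaches concrete, here is a small sketch on made-up input, with toupper() standing in for "actions 1". It shows the duplication from the question, the next fix, and the modify-$0-then-print form side by side.

```shell
input=$(printf '%s\n' 'first line' 'rule 1 here' 'last line')

# Two rules that both print: the matching line appears twice.
dup=$(printf '%s\n' "$input" | awk '/rule 1/ {print toupper($0)} {print}')

# next stops rule processing for the current line, so it is printed once.
once=$(printf '%s\n' "$input" | awk '/rule 1/ {print toupper($0); next} {print}')

# Modifying $0 and letting the always-true pattern 1 print it gives the same result.
same=$(printf '%s\n' "$input" | awk '/rule 1/ {$0 = toupper($0)} 1')

printf '%s\n---\n%s\n' "$dup" "$once"
# prints:
# first line
# RULE 1 HERE
# rule 1 here
# last line
# ---
# first line
# RULE 1 HERE
# last line
```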

Unable to match regex in string using awk

I am trying to fetch the lines in which the second part of the line contains a pattern from the first part of the line.
$ cat file.txt
String1 is a big string|big
$ awk -F'|' ' { if ($2 ~ /$1/) { print $0 } } ' file.txt
But it is not working.
I am not able to find out what is the mistake here.
Can someone please help?
Two things: No slashes, and your numbers are backwards.
awk -F\| '$1~$2' file.txt
I guess what you mean is that some word from the first part should appear in the second part. If that is what you want, then:
awk -F'|' '{n=split($1,a," ");for(i=1;i<=n;i++){if($2~a[i])print $0}}' your_file
There are surprisingly many things wrong with your command line:
1) You aren't using the awk condition/action syntax but instead needlessly embedding a condition within an action,
2) You aren't using the default awk action but instead needlessly hand-coding a print $0.
3) You have your RE operands reversed.
4) You are using RE comparison but it looks like you really want to match strings.
You can fix the first 3 of the above by modifying your command to:
awk -F'|' '$1~$2' file.txt
but I think what you really want is "4" which would mean you need to do this instead:
awk -F'|' 'index($1,$2)' file.txt
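The difference between the regex match and the string match shows up as soon as the second field contains a regex metacharacter. In this sketch the second input line is made up for illustration: as a regex, a.c matches abc, but as a literal string it does not occur in the first field.

```shell
input=$(printf '%s\n' 'String1 is a big string|big' 'abc def|a.c')

# $2 used as a regex: the "." matches any character, so both lines print.
as_regex=$(printf '%s\n' "$input" | awk -F'|' '$1 ~ $2')

# $2 used as a literal string: only the first line contains it.
as_string=$(printf '%s\n' "$input" | awk -F'|' 'index($1, $2)')

printf 'regex:\n%s\nstring:\n%s\n' "$as_regex" "$as_string"
# prints:
# regex:
# String1 is a big string|big
# abc def|a.c
# string:
# String1 is a big string|big
```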