regex with look around to match following - regex-lookarounds

I need to write a regex that matches both quoted and unquoted item names. My attempt so far:
\"[a-z,A-Z]+.*\"|\b[a-z,A-Z,0-9]+.*(?<=[\n+\r+\s+]})
Sample data for matching:
"visitor block" {
option {
disable-server-response no;
}
source any;
category any;
tag guest-in;
}
visitor_Internet {
option {
disable-server-response no;
}
source-user any;
action allow;
}
"Deny guest" {
option {
disable-server-response no;
}
action deny;
tag guest;
}
With the regex I need to select the strings in double quotes (visitor block, Deny guest) and also the strings not in quotes, such as visitor_Internet. Only the name string should be captured.

I assume you want
to capture a substring at the start of a line
that is inside double quotes and starts with an ASCII letter
and then capture all text from the { until the } that is followed by a line break and any whitespace followed by the first group pattern, or by the end of the string.
You may use
(?ms)^\s*"([a-zA-Z][^"]*)"\s*{(.*?)}(?=\h*\n\s*"[a-zA-Z][^"]*"\s*{|\s*\z)
NOTE: This won't work if there are non-matching lines between the blocks.
Details
(?ms) - multiline and dotall modes on
^ - start of a line
\s* - 0 or more whitespaces
" - a "
([a-zA-Z][^"]*) - Group 1: an ASCII letter and then 0+ chars other than "
" - a " char
\s* - 0+ whitespaces
{ - a { char
(.*?) - Group 2: any 0+ chars, as few as possible
} - a } char that is followed with
(?=\h*\n\s*"[a-zA-Z][^"]*"\s*{|\s*\z) - either
\h*\n\s*"[a-zA-Z][^"]*"\s*{ - 0+ horizontal whitespaces, a newline, 0+ whitespaces, ", an ASCII letter, 0+ chars other than ", " and then 0+ whitespaces and {
| - or
\s*\z - 0+ whitespaces at the very end of the string.
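The pattern above only covers quoted names, and \h and \z are PCRE features. As a rough sketch of one way to also pick up unquoted names such as visitor_Internet, the Python snippet below tracks brace depth and only looks for a name at depth 0. The names top_level_names and HEADER are illustrative and not part of the original answer, and the sketch assumes braces never appear inside quoted names or values.

```python
import re

# A block header: a quoted name or a bare identifier, then "{" ending the line.
# Group 1 holds a quoted name, group 2 an unquoted one (assumed layout).
HEADER = re.compile(r'^\s*(?:"([^"]+)"|([A-Za-z][\w-]*))\s*\{\s*$')


def top_level_names(text):
    """Return the names of the outermost blocks, quoted or not."""
    names = []
    depth = 0  # current brace nesting level
    for line in text.splitlines():
        if depth == 0:
            m = HEADER.match(line)
            if m:
                names.append(m.group(1) or m.group(2))
        # Update nesting after inspecting the line.
        depth += line.count("{") - line.count("}")
    return names
```

Because nested headers such as "option {" are only seen at depth 1 or deeper, they are skipped even when the input has lost its indentation.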

Related

LIKE operator with a bindvar pattern which works also for special characters

I have such a query:
WHERE x LIKE $1
, where $1 is a bindvar string built in the backend:
$1 = "%" + PATTERN + "%"
Is it possible to build the LIKE pattern in such a way that the special characters (% and _) are escaped, so that I keep the same functionality but it works for all possible PATTERN values?
You would want to escape the literal % and _ with a backslash. For example, in PHP we might try:
$pattern = "something _10%_ else";
$pattern = preg_replace("/([%_])/", "\\\\$1", $pattern);
echo $pattern; // something \_10\%\_ else
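One detail the replacement above leaves out is the backslash itself: if PATTERN can contain a backslash, it should be escaped too, and it has to be escaped first. The Python sketch below shows that ordering. The helper names escape_like, contains_pattern and search are assumptions for illustration, and sqlite3 (with ? placeholders and an explicit ESCAPE clause) stands in for the real database only so that the example runs on its own.

```python
import sqlite3


def escape_like(term, escape="\\"):
    """Escape LIKE metacharacters; the escape character itself goes first."""
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def contains_pattern(term):
    """Build the bind value for a 'contains' search."""
    return "%" + escape_like(term) + "%"


def search(conn, term):
    # The SQL text stays constant; only the bind value changes.
    # Table t and column x are hypothetical.
    sql = "SELECT x FROM t WHERE x LIKE ? ESCAPE '\\'"
    return [row[0] for row in conn.execute(sql, (contains_pattern(term),))]
```

Whether an explicit ESCAPE clause is required depends on the database, so check the documentation of the one you use.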

Is Perl 6's uncuddled else a special case for statement separation?

From the syntax doc:
A closing curly brace followed by a newline character implies a statement separator, which is why you don't need to write a semicolon after an if statement block.
if True {
say "Hello";
}
say "world";
That's fine, and it explains what was going on in Why is this Perl 6 feed operator a “bogus statement”?.
However, how does this rule work for an uncuddled else? Is this a special case?
if True {
say "Hello";
}
else {
say "Something else";
}
say "world";
Or, how about the with-orwith example:
my $s = "abc";
with $s.index("a") { say "Found a at $_" }
orwith $s.index("b") { say "Found b at $_" }
orwith $s.index("c") { say "Found c at $_" }
else { say "Didn't find a, b or c" }
The documentation you found was not completely correct. The documentation has been updated and is now correct. It now reads:
Complete statements ending in bare blocks can omit the trailing semicolon, if no additional statements on the same line follow the block's closing curly brace }.
...
For a series of blocks that are part of the same if/elsif/else (or similar) construct, the implied separator rule only applies at the end of the last block of that series.
Original answer:
Looking at the grammar for if in nqp and Rakudo, it seems that an if/elsif/else set of blocks gets parsed out together as one control statement.
Rule for if in nqp
rule statement_control:sym<if> {
    <sym>\s
    <xblock>
    [ 'elsif'\s <xblock> ]*
    [ 'else'\s <else=.pblock> ]?
}
(https://github.com/perl6/nqp/blob/master/src/NQP/Grammar.nqp#L243, as of August 5, 2017)
Rule for if in Rakudo
rule statement_control:sym<if> {
    $<sym>=[if|with]<.kok> {}
    <xblock(so ~$<sym>[0] ~~ /with/)>
    [
        [
        | 'else'\h*'if' <.typed_panic: 'X::Syntax::Malformed::Elsif'>
        | 'elif' { $/.typed_panic('X::Syntax::Malformed::Elsif', what => "elif") }
        | $<sym>='elsif' <xblock>
        | $<sym>='orwith' <xblock(1)>
        ]
    ]*
    {}
    [ 'else' <else=.pblock(so ~$<sym>[-1] ~~ /with/)> ]?
}
(https://github.com/rakudo/rakudo/blob/nom/src/Perl6/Grammar.nqp#L1450 as of August 5, 2017)

Perl6 grammars: match full line

I've just started exploring perl6 grammars. How can I make up a token "line" that matches everything between the beginning of a line and its end? I've tried the following without success:
my $txt = q:to/EOS/;
row 1
row 2
row 3
EOS
grammar sample {
token TOP {
<line>
}
token line {
^^.*$$
}
}
my $match = sample.parse($txt);
say $match<line>[0];
I can see two problems in your grammar. The first is the token line: ^^ and $$ anchor to the start and end of a line, but . also matches newlines, so the match can span several lines. To illustrate, let's first use a simple regex without a grammar:
my $txt = q:to/EOS/;
row 1
row 2
row 3
EOS
if $txt ~~ m/^^.*$$/ {
say "match";
say $/;
}
Running that, the output is:
match
「row 1
row 2
row 3」
You can see that the regex matches more than desired. That is not the main problem, however: because of ratcheting, the same pattern fails to match at all when it is declared as a token:
my $txt = q:to/EOS/;
row 1
row 2
row 3
EOS
my regex r {^^.*$$};
if $txt ~~ &r {
say "match regex";
say $/;
} else {
say "does not match regex";
}
my token t {^^.*$$};
if $txt ~~ &t {
say "match token";
say $/;
} else {
say "does not match token";
}
Running that, the output is:
match regex
「row 1
row 2
row 3」
does not match token
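I am not entirely sure why, but a token and the $$ anchor do not seem to work well together here. The most likely reason is that a token does not backtrack: .* consumes the whole string, including the trailing newline, and $$ does not match after that final newline. What you want instead is to match everything except a newline, which is \N*.
The following grammar mostly solves your issue: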
grammar sample {
token TOP {<line>}
token line {\N+}
}
However, it only matches the first occurrence, because TOP asks for a single line. What you probably want is a line followed by an optional vertical whitespace, repeated several times. (In your case the string ends with a newline, but I guess you would also like to capture the last line when there is no newline at the end.)
my $txt = q:to/EOS/;
row 1
row 2
row 3
EOS
grammar sample {
token TOP {[<line>\v?]*}
token line {\N+}
}
my $match = sample.parse($txt);
for $match<line> -> $l {
say $l;
}
The output of that script is:
「row 1」
「row 2」
「row 3」
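The difference between "any character" and "any character except a newline" is not specific to grammars. As a loose analogy only, here it is in Python's re module, whose anchors and backtracking rules differ from Raku's; the function names greedy_span and per_line are made up for this sketch.

```python
import re

text = "row 1\nrow 2\nrow 3\n"


def greedy_span(text):
    # With DOTALL, "." also matches newlines, so ".*" runs across
    # every line instead of stopping at the end of the first one.
    return re.search(r"^.*$", text, re.MULTILINE | re.DOTALL).group()


def per_line(text):
    # "[^\n]+" plays the role of \N+: one or more non-newline characters.
    return re.findall(r"[^\n]+", text)
```

Calling per_line(text) yields one entry per row, while greedy_span(text) returns the whole input as a single match.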
Also, to help you use and debug grammars, there are two really useful modules: Grammar::Tracer and Grammar::Debugger. Just load them at the beginning of the script. Tracer shows a colorful tree of the matching done by your grammar. Debugger lets you watch the matching step by step in real time.
Your original approach can be made to work via
grammar sample {
token TOP { <line>+ %% \n }
token line { ^^ .*? $$ }
}
Personally, I would not try to anchor line and use \N instead as already suggested.
my $txt = q:to/EOS/;
row 1
row 2
row 3
EOS
grammar sample {
token TOP {
<line>+
}
token line {
\N+ \n
}
}
my $match = sample.parse($txt);
say $match<line>[0];
Or if you can be specific about the line:
grammar sample {
token TOP {
<line>+
}
rule line {
\w+ \d
}
}
my $txt = q:to/EOS/;
row 1
row 2
row 3
EOS
grammar sample {
token TOP { <line> }
token line { .* }
}
for $txt.lines -> $line {
## A single line of text....
say $line;
## Parse line of text to find match obj...
my $match = sample.parse($line);
say $match<line>;
}

Need help to understand a small piece of a lex/flex hello world example

I'm a beginner with lex/flex and yacc. I'm reading a book that gives a hello world example of a lex/flex input file, which implements the lexer for a simple calculator.
The code is here:
%{
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#include "y.tab.h"

int
yywrap(void)
{
    return 1;
}
%}
%%
"+"     return ADD;
"-"     return SUB;
"*"     return MUL;
"/"     return DIV;
"\n"    return CR;
([1-9][0-9]*)|0|([0-9]+\.[0-9]+) {
    double temp;
    sscanf(yytext, "%lf", &temp);
    yylval.double_value = temp;
    return DOUBLE_LITERAL;
}
[ \t]   ;
.       {
    fprintf(stderr, "lexical error.\n");
    exit(1);
}
%%
I don't quite understand what the line [ \t] ; does here. Could anybody help me? Thanks.
The brackets indicate a "character class." Any character that appears within the brackets is considered a match. Here we have two characters, space and horizontal tab (\t). These characters are often called "whitespace."
The bare semicolon says "do nothing."
So the rule says, "whenever you see either a space or a tab (a whitespace character), do nothing and get the next character."
Since the input to the lexer might have multiple whitespace characters repeated together, this lexer rule could be applied multiple times. As a simplification, it is common to see a quantifier like + (1 or more) or * (zero or more) after the character class. This rule means, "whenever you see one or more whitespace characters, do nothing and get the next character."
[ \t]+ ;
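To see the "match whitespace, do nothing" idea outside of flex, here is a minimal Python sketch of a comparable tokenizer. It is an approximation, not a translation: flex picks the longest match, whereas Python's alternation takes the first alternative that matches, so the decimal form is listed before the integer form. TOKEN_SPEC, MASTER and tokenize are names invented for this example.

```python
import re

# (token name, pattern) pairs, tried in this order.
TOKEN_SPEC = [
    ("DOUBLE_LITERAL", r"[0-9]+\.[0-9]+|[1-9][0-9]*|0"),
    ("ADD", r"\+"),
    ("SUB", r"-"),
    ("MUL", r"\*"),
    ("DIV", r"/"),
    ("CR", r"\n"),
    ("SKIP", r"[ \t]+"),  # counterpart of the rule "[ \t]+ ;"
    ("ERROR", r"."),      # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue  # whitespace: produce no token, move on
        if kind == "ERROR":
            raise ValueError(f"lexical error at offset {m.start()}")
        value = float(m.group()) if kind == "DOUBLE_LITERAL" else m.group()
        tokens.append((kind, value))
    return tokens
```

The SKIP branch is the whole point: the whitespace is consumed by the match, but nothing is handed to the caller.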

Lex : line with one character but spaces

I have sentences like :
" a"
"a "
" a "
I would like to catch all of these examples (with lex), but I don't know how to express the beginning of the line.
I'm not totally sure what exactly you're looking for, but the regex symbol to specify matching the beginning of a line in a lex definition is the caret:
^
If I understand correctly, you're trying to pull the "a" out as the token, but you don't want to grab any of the whitespace? If this is the case, then you just need something like the following:
[\n\t\r ]+ {
// do nothing
}
"a" {
assignYYText( yylval );
return aToken;
}
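If the goal is instead to recognize a whole line that holds a single a surrounded only by optional spaces or tabs, the caret can be combined with an end-of-line anchor. A small Python sketch of that reading of the question follows; the names LONE_A and lines_with_lone_a are hypothetical.

```python
import re

# Start of line, optional blanks, a single "a", optional blanks, end of line.
LONE_A = re.compile(r"^[ \t]*a[ \t]*$", re.MULTILINE)


def lines_with_lone_a(text):
    """Return every line that contains only 'a' plus surrounding blanks."""
    return [m.group() for m in LONE_A.finditer(text)]
```

With the three sample sentences as input lines, each one is returned with its surrounding spaces intact, while a line such as "ab" is not matched.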