I see the use of flip-flop in doc.perl6.org, see the code below :
my $excerpt = q:to/END/;
Here's some unimportant text.
=begin code
This code block is what we're after.
We'll use 'ff' to get it.
=end code
More unimportant text.
=begin code
I want this line.
and this line as well.
HaHa
=end code
More unimport text.
=begin code
Let's to go home.
=end code
END
my #codelines = gather for $excerpt.lines {
take $_ if "=begin code" ff "=end code"
}
# this will print four lines, starting with "=begin code" and ending with
# "=end code"
.say for #codelines;
=begin code
This code block is what we're after.
We'll use 'ff' to get it.
=end code
=begin code
I want this line.
and this line as well.
HaHa
=end code
=begin code
Let's to go home.
=end code
I want to save the lines between =begin code and =end code into separate arrays, such as this:
['This code block is what we're after.', 'We'll use 'ff' to get it.']
['I want this line.', 'and this line as well.', 'HaHa']
['Let's to go home.']
I know grammar can do this, but I want to know if there is a beter way?
You need to specify that you want the matched values to not be included. You do this by adding ^ to the side of the operator you want excluded. In this case it is both sides of the operator.
You also need to collect the values up to have the grouped together. The simplest way in this case is to put that off between matches.
(If you wanted the endpoints included it would require more thought to get it right)
my #codelines = gather {
my #current;
for $excerpt.lines {
if "=begin code" ^ff^ "=end code" {
# collect the values between matches
push #current, $_;
} else {
# take the next value between matches
# don't bother if there wasn't any values matched
if #current {
# you must do something so that you aren't
# returning the same instance of the array
take #current.List;
#current = ();
}
}
}
}
If you need the result to be an array of arrays (mutable).
if #current {
take #current;
#current := []; # bind it to a new array
}
An alternative would be to use do for with sequences that share the same iterator.
This works because for is more eager than map would be.
my $iterator = $excerpt.lines.iterator;
my #codelines = do for Seq.new($iterator) {
when "=begin code" {
do for Seq.new($iterator) {
last when "=end code";
$_<> # make sure it is decontainerized
}
}
# add this because `when` will return False if it doesn't match
default { Empty }
}
map takes one sequence and turns it into another, but doesn't do anything until you try to get the next value from the sequence.
for starts iterating immediately, only stopping when you tell it to.
So map would cause race conditions even when running on a single thread, but for won't.
You can also use good old regular expressions:
say ( $excerpt ~~ m:s:g{\=begin code\s+(.+?)\s+\=end code} ).map( *.[0] ).join("\n\n")
s for significant whitespace (not really needed), g for extracting all matches (not the first one), .map goes through the returned Match object and extracts the first element (it's a data structure that contains the whole matched code). This creates a List that is ultimatelly printed with every element separated by two CRs.
another answer in reddit by bobthecimmerian, i copy it here for completeness:
my $excerpt = q:to/END/;
Here's some unimportant text.
=begin code
This code block is what we're after.
We'll use 'ff' to get it.
=end code
More unimportant text.
=begin code
I want this line.
and this line as well.
HaHa
=end code
More unimport text.
=begin code
Let's to go home.
=end code
END
sub doSomething(Iterator $iter) {
my #lines = [];
my $item := $iter.pull-one;
until ($item =:= IterationEnd || $item.Str ~~ / '=end code' /) {
#lines.push($item);
$item := $iter.pull-one;
}
say "Got #lines[]";
}
my Iterator $iter = $excerpt.lines.iterator;
my $item := $iter.pull-one;
until ($item =:= IterationEnd) {
if ($item.Str ~~ / '=begin code' /) {
doSomething($iter);
}
$item := $iter.pull-one;
}
the output is:
Got This code block is what we're after. We'll use 'ff' to get it.
Got I want this line. and this line as well. HaHa
Got Let's to go home.
use grammar:
#use Grammar::Tracer;
#use Grammar::Debugger;
my $excerpt = q:to/END/;
Here's some unimportant text.
=begin code
This code block is what we're after.
We'll use 'ff' to get it.
=end code
More unimportant text.
=begin code
I want this line.
and this line as well.
HaHa
=end code
More unimport text.
=begin code
Let's to go home.
=end code
END
grammar ExtractSection {
rule TOP { ^ <section>+ %% <.comment> $ }
token section { <line>+ % <.ws> }
token line { <?!before <comment>> \N+ \n }
token comment { ['=begin code' | '=end code' ] \n }
}
class ExtractSectionAction {
method TOP($/) { make $/.values».ast }
method section($/) { make ~$/.trim }
method line($/) { make ~$/.trim }
method comment($/) { make Empty }
}
my $em = ExtractSection.parse($excerpt, :actions(ExtractSectionAction)).ast;
for #$em -> $line {
say $line;
say '-' x 35;
}
Output:
Here's some unimportant text.
-----------------------------------
This code block is what we're after.
We'll use 'ff' to get it.
-----------------------------------
More unimportant text.
-----------------------------------
I want this line.
and this line as well.
HaHa
-----------------------------------
More unimport text.
-----------------------------------
Let's to go home.
-----------------------------------
But it included unrelated lines, based on #Brad Gilbert's solution, I updated the above answer as follows(and thanks again):
#use Grammar::Tracer;
#use Grammar::Debugger;
my $excerpt = q:to/END/;
Here's some unimportant text.
=begin code
This code block is what we're after.
We'll use 'ff' to get it.
=end code
More unimportant text.
=begin code
I want this line.
and this line as well.
HaHa
=end code
More unimport text.
=begin code
Let's to go home.
=end code
END
grammar ExtractSection {
token start { ^^ '=begin code' \n }
token finish { ^^ '=end code' \n }
token line { ^^ \N+)> \n }
token section { <start> ~ <finish> <line>+? }
token comment { ^^\N+ \n }
token TOP { [<section> || <comment>]+ }
}
class ExtractSectionAction {
method TOP($/) { make #<section>».ast.List }
method section($/) { make ~«#<line>.List }
method line($/) { make ~$/.trim }
method comment($/) { make Empty }
}
my $em = ExtractSection.parse($excerpt, :actions(ExtractSectionAction)).ast;
for #$em -> $line {
say $line.perl;
say '-' x 35;
}
Output is:
$("This code block is what we're after.", "We'll use 'ff' to get it.")
-----------------------------------
$("I want this line.", "and this line as well.", "HaHa")
-----------------------------------
$("Let's to go home.",)
-----------------------------------
So it works as expected.
It could be another solution, use rotor
my $excerpt = q:to/END/;
Here's some unimportant text.
=begin code
This code block is what we're after.
We'll use 'ff' to get it.
=end code
More unimportant text.
=begin code
I want this line.
and this line as well.
HaHa
=end code
More unimport text.
=begin code
Let's to go home.
=end code
END
my #sections =
gather for $excerpt.lines -> $line {
if $line ~~ /'=begin code'/ ff $line ~~ /'end code'/ {
take $line.trim;
}
}
my #idx = # gather take the indices of every `=begin code` and `=end code`
gather for #sections.kv -> $k, $v {
if $v ~~ /'=begin code'/ or $v ~~ /'end code'/ {
take $k;
}
}
my #r = # gather take the lines except every line of `=begin code` and `=end code`
gather for #sections.kv -> $k, $v {
if $v !~~ /'=begin code' | '=end code'/ {
take $v;
}
}
my #counts = #idx.rotor(2)».minmax».elems »-» 2;
say #r.rotor(|#counts).perl;
Output:
(("This code block is what we're after.", "We'll use 'ff' to get it."), ("I want this line.", "and this line as well.", "HaHa"), ("Let's to go home.",)).Seq
Another answer:
my $excerpt = q:to/END/;
Here's some unimportant text.
=begin code
This code block is what we're after.
We'll use 'ff' to get it.
=end code
More unimportant text.
=begin code
I want this line.
and this line as well.
HaHa
=end code
More unimport text.
=begin code
Let's to go home.
=end code
END
for $excerpt.comb(/'=begin code' \s* <( .+? )> \s+ '=end code' /) -> $c {
say $c;
say '-' x 15;
}
use the comb operator:
my $str = q:to/EOS/;
Here's some unimportant text.
=begin code
This code block is what we're after.
We'll use 'ff' to get it.
=end code
More unimportant text.
=begin code
I want this line.
and this line as well.
HaHa.
=end code
More unimport text.
=begin code
Let's go home.
=end code
EOS
my token separator { '=begin code' \n | '=end code' \n }
my token lines { [<!separator> .]+ }
say $str.comb(
/
<lines> # match lines that not start with
# =begin code or =end code
<separator> # match lines that start with
# =begin code or =end code
<( # start capture
<lines>+ # match lines between
# =begin code and =end code
)> # end capture
<separator> # match lines that start with
# =begin code or =end code
/).raku;
Output:
("This code block is what we're after.\nWe'll use 'ff' to get it.\n", "I want this line.\nand this line as well.\nHaHa.\n", "Let's go home.\n").Seq
Related
Let's say I have the following module:
module Simple-Mod;
#| Calculate the nth fibonacci number.
multi fib( 0 ) { 1 }
multi fib( 1 ) { 1 }
multi fib( Int $n where * > 1 ) {
fib($n - 2 ) + fib($n - 1);
}
#| Say hello to a person.
sub hello( $person ) { say "Hello, $person!" }
=begin pod
=head1 SYNOPSIS
A really simple module.
=head1 Example
=begin code
use Simple-Mod;
say fib(3); #=> 2
hello("Gina"); #=> Hello, Gina!
=end code
=head1 Subroutines
=end pod
At the moment, when I extract the Pod from this module, I obtain this:
sub fib(
Int $ where { ... },
)
Calculate the nth fibonacci number.
sub hello(
$person,
)
Say hello to a person.
SYNOPSIS
A really simple module.
Example
use Simple-Mod;
say fib(3); #=> 2
hello("Gina"); #=> Hello, Gina!
Subroutines
Is it possible to instruct the Pod parsing process to place the
subroutine definitions and comments after the Subroutines header? Like this:
SYNOPSIS
A really simple module.
Example
use Simple-Mod;
say fib(3); #=> 2
hello("Gina"); #=> Hello, Gina!
Subroutines
sub fib(
Int $ where { ... },
)
Calculate the nth fibonacci number.
sub hello(
$person,
)
Say hello to a person.
I could probably place everything from the =begin pod to the =head1 Subroutines (followed by =end pod) directive at the top of file and then the usual code with the declarator blocks. However I'd like to keep all the Pod at the bottom of the file if possible.
By tinkering with the Pod::To::Text module, I came up a somewhat hackish solution
that is far from robust. It solely depends on a new subroutine and some changes to the render, pod2text and heading2text routines:
unit class Pod::To::Textx;
my $top-pod = Any;
method render($pod, Bool $declarator-displacement = False) {
$top-pod = $pod if $declarator-displacement;
pod2text($pod)
}
sub pod2text($pod) is export {
# other code
when Pod::Block::Declarator { if $top-pod { succeed }
else { declarator2text($pod) }
}
# remaining code
}
sub add-code-info($pod) {
return pod2text($pod.contents) unless $top-pod;
if $pod.contents.head.contents.lc.contains("routines") {
pod2text($pod.contents) ~
#($top-pod).grep({ $_ ~~ Pod::Block::Declarator })
.map({ "\n\n" ~ declarator2text($_) })
}
}
sub heading2text($pod) {
given $pod.level {
when 1 { add-code-info($pod) }
when 2 { ' ' ~ pod2text($pod.contents) }
default { ' ' ~ pod2text($pod.contents) }
}
}
# rest of code
To render the Pod in a .p6 file and place the declarator blocks underneath a heading level 1 titled Subroutines/Routines, use:
use Pod::To::Textx;
say Text.new().render($=pod, True);
inside the file.
In perl6 grammars, as explained here (note, the design documents are not guaranteed to be up-to-date as the implementation is finished), if an opening angle bracket is followed by an identifier then the construct is a call to a subrule, method or function.
If the character following the identifier is an opening paren, then it's a call to a method or function eg: <foo('bar')>. As explained further down the page, if the first char after the identifier is a space, then the rest of the string up to the closing angle will be interpreted as a regex argument to the method - to quote:
<foo bar>
is more or less equivalent to
<foo(/bar/)>
What's the proper way to use this feature? In my case, I'm parsing line oriented data and I'm trying to declare a rule that will instigate a seperate search on the current line being parsed:
#!/usr/bin/env perl6
# use Grammar::Tracer ;
grammar G {
my $SOLpos = -1 ; # Start-of-line pos
regex TOP { <line>+ }
method SOLscan($regex) {
# Start a new cursor
my $cur = self."!cursor_start_cur"() ;
# Set pos and from to start of the current line
$cur.from($SOLpos) ;
$cur.pos($SOLpos) ;
# Run the given regex on the cursor
$cur = $regex($cur) ;
# If pos is >= 0, we found what we were looking for
if $cur.pos >= 0 {
$cur."!cursor_pass"(self.pos, 'SOLscan')
}
self
}
token line {
{ $SOLpos = self.pos ; say '$SOLpos = ' ~ $SOLpos }
[
|| <word> <ws> 'two' { say 'matched two' } <SOLscan \w+> <ws> <word>
|| <word>+ %% <ws> { say 'matched words' }
]
\n
}
token word { \S+ }
token ws { \h+ }
}
my $mo = G.subparse: q:to/END/ ;
hello world
one two three
END
As it is, this code produces:
$ ./h.pl
$SOLpos = 0
matched words
$SOLpos = 12
matched two
Too many positionals passed; expected 1 argument but got 2
in method SOLscan at ./h.pl line 14
in regex line at ./h.pl line 32
in regex TOP at ./h.pl line 7
in block <unit> at ./h.pl line 41
$
Line 14 is $cur.from($SOLpos). If commented out, line 15 produces the same error. It appears as though .pos and .from are read only... (maybe :-)
Any ideas what the proper incantation is?
Note, any proposed solution can be a long way from what I've done here - all I'm really wanting to do is understand how the mechanism is supposed to be used.
It does not seem to be in the corresponding directory in roast, so that would make it a "Not Yet Implemented" feature, I'm afraid.
In Edit distance: Ignore start/end, I offered a Perl 6 solution to a fuzzy fuzzy matching problem. I had a grammar like this (although maybe I've improved it after Edit #3):
grammar NString {
regex n-chars { [<.ignore>* \w]**4 }
regex ignore { \s }
}
The literal 4 itself was the length of the target string in the example. But the next problem might be some other length. So how can I tell the grammar how long I want that match to be?
Although the docs don't show an example or using the $args parameter, I found one in S05-grammar/example.t in roast.
Specify the arguments in :args and give the regex an appropriate signature. Inside the regex, access the arguments in a code block:
grammar NString {
regex n-chars ($length) { [<.ignore>* \w]**{ $length } }
regex ignore { \s }
}
class NString::Actions {
method n-chars ($/) {
put "Found $/";
}
}
my $string = 'The quick, brown butterfly';
loop {
state $from = 0;
my $match = NString.subparse(
$string,
:rule('n-chars'),
:actions(NString::Actions),
:c($from++),
:args( \(5) )
);
last unless ?$match;
}
I'm still not sure about the rules for passing the arguments though. This doesn't work:
:args( 5 )
I get:
Too few positionals passed; expected 2 arguments but got 1
This works:
:args( 5, )
But that's enough thinking about this for one night.
I was playing around with Interpolating into names. I was mostly interested in this colon syntax feature to turn a variable into a pair where the identifier is the key.
my %Hamadryas = map { slip $_, 0 }, <
februa
honorina
velutina
>;
{
my $pair = :%Hamadryas;
say $pair; # Hamadryas => { ... }
}
put '-' x 50;
But, just for giggles, I wanted to try it with variable name interpolation too. I know this is stupid because if I know the name I don't need the colon syntax to get it. But, I also thought that it should work by accident:
{
my $name = 'Hamadryas';
# Since I already have the name, I could just:
# my $pair = $name => %::($name)
# But, couldn't I just line up the syntax?
my $pair = :%::($name); # does not work
say $pair;
}
Why doesn't that :%::($name) syntax work? That's more a question of when the parser decides that it's not parsing something it wants to understand. I figured it would see the : and start processing a colon pair, then see the % and know it had a hash, even though there's the :: after the %.
Is there a way to make it work with tricks and grammar mutations?
I'm writing a "compiler" of sorts: it reads a description of a game (with rooms, characters, things, etc.) Think of it as a visual version of an Adventure-style game, but with much simpler problems.
When I run my "compiler" I'm getting a syntax error on my input, and I can't figure out why. Here's the relevant section of my yacc input:
character
: char-head general-text character-insides { PopChoices(); }
;
character-insides
: LEFTBRACKET options RIGHTBRACKET
;
char-head
: char-namesWT opt-imgsWT char-desc opt-cond
;
char-desc
: general-text { SetText($1); }
;
char-namesWT
: DOTC ID WORD { AddCharacter($3, $2); expect(EXP_TEXT); }
;
opt-cond
: %empty
| condition
;
condition
: condition-reason condition-main general-text
{ AddCondition($1, $2, $3); }
;
condition-reason
: DOTU { $$ = 'u'; }
| DOTV { $$ = 'v'; }
;
condition-main
: money-conditionWT
| have-conditionWT
| moves-conditionWT
| flag-conditionWT
;
have-conditionWT
: PERCENT_SLASH opt-bang ID
{ $$ = MkCondID($1, $2, $3) ; expect(EXP_TEXT); }
;
opt-bang
: %empty { $$ = TRUE; }
| BANG { $$ = FALSE; }
;
ID: WORD
;
Things in all caps are terminal symbols, things in lower or mixed case are non-terminals. If a non-terminal ends in WT, then it "wants text". That is, it expects that what comes after it may be arbitrary text.
Background: I have written my own token recognizer in C++ because(*) I want the syntax to be able to change the way the lexer's behavior. Two types of tokens should be matched only when the syntax expects them: FILENAME (with slashes and other non-alphameric characters) and TEXT, which means "all the text from here to the end of the line" (but not starting with certain keywords).
The function "expect" tells the lexer when to look for these two symbols. The expectation is reset to EXP_NORMAL after each token is returned.
I have added code to yylex that prints out the tokens as it recognizes them, and it looks to me like the tokenizer is working properly -- returning the tokens I expect.
(*) Also because I want to be able to ask the tokenizer for the column where the error occurred, and get the contents of the line being scanned at the time so I can print out a more useful error message.
Here is the relevant part of the input:
.c Wendy wendy
OK, now you caught me, what do you want to do with me?
.u %/lasso You won't catch me like that.
[
Here is the last part of the debugging output from yylex:
token: 262: DOTC/
token: 289: WORD/Wendy
token: 289: WORD/wendy
token: 292: TEXT/OK, now you caught me, what do you want to do with me?
token: 286: DOTU/
token: 274: PERCENT_SLASH/%/
token: 289: WORD/lasso
token: 292: TEXT/You won't catch me like that.
token: 269: LEFTBRACKET/
here's my error message:
: line 124, columns 3-4: syntax error, unexpected LEFTBRACKET, expecting TEXT
[
To help you understand the equations above, here is the relevant part of the description of the input syntax that I wrote the yacc code from.
// Character:
// .c id charactername,[imagename,[animationname]]
// description-text
// .u condition on the character being usable [optional]
// .v condition on the character being visible [optional]
// [
// (options)
// ]
// Conditions:
// %$[-]n Must [not] have at least n dollars
// %/[-]name Must [not] have named thing
// %t-nnn At/before specified number of moves
// %t+nnn At/after specified number of moves
// %#[-]name named flag must [not] be set
// Condition-char: $, /, t, or #, as described above
//
// Condition:
// % condition-char (identifier/int) ['/' text-if-fail ]
// description-text: Can be either on-line text or multi-line text
// On-line text is the rest of the line
brackets mark optional non-terminals, but a bracket standing alone (represented by LEFTBRACKET and RIGHTBRACKET in the yacc) is an actual token, e.g.
// [
// (options)
// ]
above.
What am I doing wrong?
To debug parsing problems in your grammar, you need to understand the shift/reduce machine that yacc/bison produces (described in the .output file produced with the -v option), and you need to look at the trail of states that the parser goes through to reach the problem you see.
To enable debugging code in the parser (which can print the states and the shift and reduce actions as they occur), you need to compile with -DYYDEBUG or put #define YYDEBUG 1 in the top of your grammar file. The debugging code is controlled by the global variable yydebug -- set to non-zero to turn on the trace and zero to turn it off. I often use the following in main:
#ifdef YYDEBUG
extern int yydebug;
if (char *p = getenv("YYDEBUG"))
yydebug = atoi(p);
#endif
Then you can include -DYYDEBUG in your compiler flags for debug builds and turn on the debugging code by something like setenv YYDEBUG 1 to set the envvar prior to running your program.
I suppose your syntax error message was generated by bison. What is striking is that it claims to have found a LEFTBRACKET when it expects a [. Naively, you might expect it to be satisfied with the LEFTBRACKET it found, but of course bison knows nothing about LEFTBRACKET except its numeric value, which will be some integer larger than 256.
The only reason bison might expect [ is if your grammar includes the terminal '['. But since your scanner seems to return LEFTBRACKET when it sees a [, the parser will never see '['.