Perl 6 regex variable and capturing groups - raku

When I make a regex variable with capturing groups, the whole match is OK, but capturing groups are Nil.
my $str = 'nn12abc34efg';
my $atom = / \d ** 2 /;
my $rgx = / ($atom) \w+ ($atom) /;
$str ~~ / $rgx / ;
say ~$/; # 12abc34
say $0; # Nil
say $1; # Nil
If I modify the program to avoid $rgx, everything works as expected:
my $str = 'nn12abc34efg';
my $atom = / \d ** 2 /;
my $rgx = / ($atom) \w+ ($atom) /;
$str ~~ / ($atom) \w+ ($atom) /;
say ~$/; # 12abc34
say $0; # 「12」
say $1; # 「34」

With your code, the compiler gives the following warning:
Regex object coerced to string (please use .gist or .perl to do that)
That tells us something is wrong: regexes shouldn't be treated as strings. There are two more proper ways to nest regexes. First, you can include sub-regexes within assertions (<>):
my $str = 'nn12abc34efg';
my Regex $atom = / \d ** 2 /;
my Regex $rgx = / (<$atom>) \w+ (<$atom>) /;
$str ~~ $rgx;
Note that I'm not matching / $rgx /. That is putting one regex inside another. Just match $rgx.
The nicer way is to use named regexes. Defining atom and the regex as follows will let you access the match groups as $<atom>[0] and $<atom>[1]:
my regex atom { \d ** 2 };
my $rgx = / <atom> \w+ <atom> /;
$str ~~ $rgx;
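To make the named-regex approach concrete, here is a minimal, self-contained sketch that reuses the question's input string and reads the two captures back. The variable names $match, $first and $second are illustrative only and do not come from the original answer.

```raku
my $str = 'nn12abc34efg';

# A lexically scoped named regex: exactly two digits.
my regex atom { \d ** 2 }
my $rgx = / <atom> \w+ <atom> /;

# Match $rgx directly; do not wrap it in another regex.
my $match = $str ~~ $rgx;

# Both <atom> calls are collected under the same name, in order.
my $first  = ~$match<atom>[0];
my $second = ~$match<atom>[1];

say ~$match;   # 12abc34
say $first;    # 12
say $second;   # 34
```

Because both captures share the name atom, they arrive as a list; giving each call its own alias would be an alternative if separate names are preferred.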

The key observation is that $str ~~ / $rgx /; is a "regex inside of a regex". $rgx matched as it should and set $0 and $1 within its own Match object, but there was nowhere within the surrounding Match object to store that information, so you couldn't see it. An example may make this clearer. Try this:
my $str = 'nn12abc34efg';
my $atom = / \d ** 2 /;
my $rgx = / ($atom) \w+ ($atom) /;
$str ~~ / $0=$rgx /;
say $/;
Note the contents of $0. Or as another example, let's give it a proper name:
my $str = 'nn12abc34efg';
my $atom = / \d ** 2 /;
my $rgx = / ($atom) \w+ ($atom) /;
$str ~~ / $<bits-n-pieces>=$rgx /;
say $/;

Related

How to flatten or stringify an object (esp. Match)?

How do we flatten or stringify a Match (or other) object so that it becomes a Str, especially when there are many of them, e.g. as array elements? For example:
'foobar' ~~ m{ (foo) };
say $0.WHAT;
my $foo = $0;
say $foo.WHAT
(Match)
(Match)
How to end up with (Str)?
~ is the Str contextualizer:
'foobar' ~~ m{ (foo) };
say ~$0
will directly coerce it to a Str. You can use the same approach when you have many matches, e.g.:
'foobar' ~~ m{ (f)(o)(o) };
say $/.map: ~*; # (f o o)
Just treat the objects as if they were strings.
If you apply a string operation to a value/object Raku will almost always just automatically coerce it to a string.
String operations include functions such as print and put, operators such as infix eq and ~ concatenation, methods such as .starts-with or .chop, interpolation such as "A string containing a $variable", and dedicated coercers such as .Str and Str(...).
A Match object contains an overall match. Any "children" (sub-matches) are just captures of substrings of that overall match. So there's no need to flatten anything because you can just deal with the single overall match.
A list of Match objects is a list. And a list is itself an object. If you apply a string operation to a list, you get the elements of the list stringified with a space between each element.
So:
'foobar' ~~ m{ (f) (o) (o) };
put $/; # foo
put $/ eq 'foo'; # True
put $/ ~ 'bar'; # foobar
put $/ .chop; # fo
put "[$/]"; # [foo]
put $/ .Str; # foo
my Str() $foo = $/;
say $foo.WHAT; # (Str)
put 'foofoo' ~~ m:g{ (f) (o) (o) }; # foo foo
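For the "array elements" case from the question, a small sketch along the same lines might look as follows. It assumes you want a plain array of Str values rather than Match objects; the names @parts and @all are made up for this illustration.

```raku
# Positional captures of a single match, each coerced to Str.
'foobar' ~~ m{ (f)(o)(o) };
my @parts = $/.list.map(*.Str);
say @parts[0].WHAT;     # (Str)
say @parts.join(',');   # f,o,o

# A list of overall matches from :g, each coerced to Str.
my @all = ('foofoo' ~~ m:g{ (f) (o) (o) }).map(*.Str);
say @all.join(',');     # foo,foo
```

The explicit .Str is only needed when the stored values themselves must be strings; for printing or comparing, the automatic coercion described above is enough.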
The constructor for Str takes any Cool value as argument, including a regex Match object.
'foobar' ~~ m{ (foo) };
say $0.WHAT; # (Match)
say $0.Str.WHAT; # (Str)

How to pass params to token referenced by variable?

I can easily use token signatures by using token name directly:
my token t ( $x ) { $x };
'axb' ~~ / 'a' <t: 'x'> 'b' /; # match
'axb' ~~ / 'a' <t( 'x' )> 'b' /; # match
However I haven't found a way to do this, when token is stored in variable:
my $t = token ( $x ) { $x };
'axb' ~~ / 'a' <$t: 'x'> 'b' /;
'axb' ~~ / 'a' <$t( 'x' )> 'b' /;
Both give:
===SORRY!=== Error while compiling ...
Unable to parse expression in metachar:sym<assert>; couldn't find final '>'
What is the magic syntax to do that?
BTW: I've even browsed Raku test suite and it does not include such case in roast/S05-grammar/signatures.t.
Place an & before the variable:
my $t = token ( $x ) { $x };
say 'axb' ~~ / 'a' <&$t: 'x'> 'b' /;
say 'axb' ~~ / 'a' <&$t( 'x' )> 'b' /;
The parser looks for the &, and then delegates to the Raku variable parse rule, which will happily parse a contextualizer like this.
Either:
Use the solution in jnthn's answer to let Raku explicitly know you wish to use your $ sigil'd token variable as a Callable.
Declare the variable as explicitly being Callable in the first place and make the corresponding change in the call:
my &t = token ( $x ) { $x };
say 'axb' ~~ / 'a' <&t: 'x'> 'b' /; # 「axb」
say 'axb' ~~ / 'a' <&t( 'x' )> 'b' /; # 「axb」

How to match the same number of different atoms in Perl 6 regex?

Should be very simple, but I can't cope with it.
I want to match exactly the same number of a's as b's. So, the following
my $input = 'aaabbbb';
$input ~~ m:ex/ ... /;
should produce:
aaabbb
aabb
ab
UPD: The following variants don't work, perhaps because of the :ex bug mentioned in @smls's answer (but more likely because I made some mistakes?):
> my $input = "aaabbbb";
> .put for $input ~~ m:ex/ (a) * (b) * <?{ +$0 == +$1 }> /;
Nil
> .put for $input ~~ m:ex/ (a) + (b) + <?{+$0 == +$1}> /;
Nil
This one, with :ov and ?, works:
> my $input = "aaabbbb";
> .put for $input ~~ m:ov/ (a)+ (b)+? <?{+$0 == +$1}> /;
aaabbb
aabb
ab
UPD2: The following solution works with :ex as well, but I had to do it without <?...> assertion.
> $input = 'aaabbbb'
> $input ~~ m:ex/ (a) + (b) + { put $/ if +$0 == +$1 } /;
aaabbb
aabb
ab
my $input = "aaabbbb";
say .Str for $input ~~ m:ov/ (a)+ b ** {+$0} /;
Output:
aaabbb
aabb
ab
It's supposed to work with :ex instead of :ov, too - but Rakudo bug #130711 currently prevents that.
my $input = "aaabbbb";
say .Str for $input ~~ m:ov/ a <~~>? b /;
It works with :ex too:
my $input = "aaabbbb";
say .Str for $input ~~ m:ex/ a <~~>? b /;
Upd: explanation
<~~> means "call myself recursively"; see Extensible metasyntax. (It is not yet fully implemented.)
Following (longer, but maybe clearer) example works too:
my $input = "aaabbbb";
my token anbn { a <&anbn>? b}
say .Str for $input ~~ m:ex/ <&anbn> /;

Combining regexes using a loop in Perl 6

Here I make a regex manually from Regex elements of an array.
my Regex @reg =
/ foo /,
/ bar /,
/ baz /,
/ pun /
;
my $r0 = @reg[0];
my $r1 = @reg[1];
my Regex $r = / 0 $r0 | 1 $r1 /;
"0foo_1barz" ~~ m:g/<$r>/;
say $/; # (「0foo」 「1bar」)
How to do it with for @reg {...}?
If a variable contains a regex, you can use it without further ado inside another regex.
The second trick is to use an array variable inside a regex, which is equivalent to the disjunction of the array elements:
my @reg =
/foo/,
/bar/,
/baz/,
/pun/
;
my @transformed = @reg.kv.map(-> $i, $rx { rx/ $i $rx /});
my @match = "0foo_1barz" ~~ m:g/ @transformed /;
.say for @match;
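As a minimal illustration of the second trick, the sketch below compares an interpolated array with the alternation written out by hand. The array of plain strings and the names @words, $joined and $spelled are assumptions made for this example; the answer above uses an array of Regex objects instead.

```raku
my @words = <foo bar baz>;

# Interpolating the array behaves like an alternation of its elements...
my $joined  = 'xx_bar_yy' ~~ / @words /;

# ...so it should find the same text as the alternation spelled out.
my $spelled = 'xx_bar_yy' ~~ / foo | bar | baz /;

say ~$joined;    # bar
say ~$spelled;   # bar
```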
my @reg =
/foo/,
/bar/,
/baz/,
/pun/
;
my $i = 0;
my $reg = @reg
.map({ $_ = .perl; $_.substr(1, $_.chars - 2); })
.map({ "{$i++}{$_}" })
.join('|');
my @match = "foo", "0foo_1barz" ~~ m:g/(<{$reg}>) /;
say @match[1][0].Str;
say @match[1][1].Str;
# 0foo
# 2baz
See the docs
Edit: Actually read the docs myself. Changed implicit eval to $() construct.
Edit: Rewrote answer to something that actually works
Edit: Changed answer to a terrible, terrible hack

How to interpolate variables into Perl 6 regex character class?

I want to make all the consonants in a word uppercase:
> my $word = 'camelia'
camelia
> $word ~~ s:g/<-[aeiou]>/{$/.uc}/
(「c」 「m」 「l」)
> $word
CaMeLia
To make the code more general, I store the list of all the vowels in a string variable
my $vowels = 'aeiou';
or in an array
my @vowels = $vowels.comb;
How to solve the original problem with $vowels or @vowels variables?
Maybe the trans method would be more appropriate than the subst sub or operator.
Try this:
my $word = "camelia";
my @consonants = keys ("a".."z") (-) <a e i o u>;
say $word.trans(@consonants => @consonants>>.uc);
# => CaMeLia
With the help of moritz's explanation, here is the solution:
my constant $vowels = 'aeiou';
my regex consonants {
<{
"<-[$vowels]>"
}>
}
my $word = 'camelia';
$word ~~ s:g/<consonants>/{$/.uc}/;
say $word; # CaMeLia
You can use <!before …> along with <{…}>, and . to actually capture the character.
my $word = 'camelia';
$word ~~ s:g/
<!before # negated lookahead
<{ # use result as Regex code
$vowels.comb # the vowels as individual characters ($vowels as declared in the question)
}>
>
. # any character (that doesn't match the lookahead)
/{$/.uc}/;
say $word; # CaMeLia
You can do away with the <{…}> with @vowels:
I think it is also important to realize you can use .subst
my $word = 'camelia';
say $word.subst( :g, /<!before @vowels>./, *.uc ); # CaMeLia
say $word; # camelia
I would recommend storing the regex in a variable instead.
my $word = 'camelia';
my $vowel-regex = /<-[aeiou]>/;
say $word.subst( :g, $vowel-regex, *.uc ); # CaMeLia
$word ~~ s:g/<$vowel-regex>/{$/.uc}/;
say $word # CaMeLia