Building a regex with sub in Perl 6 - raku

After learning how to pass regexes as arguments, I've tried to build my first regex using a sub, and I'm stuck once more. Sorry for the complex rules below; I've done my best to simplify them. I need at least some clues on how to approach this problem.
The regex should consist of alternations, each of them consisting of left, middle and right, where left and right should come in pairs and the variant of middle depends on which right is chosen.
An array of Pairs contains pairs of left and right:
my Pair @leftright =
A => 'a',
...
Z => 'z',
;
Middle variants are read from a hash:
my Regex %middle =
z => / foo /,
a => / bar /,
m => / twi /,
r => / bin /,
...
;
%middle<z> should be chosen if right is z, %middle<a> if right is a, and so on.
So, the resulting regex should be
my token word {
| A <{ %middle<a> }> a
| Z <{ %middle<z> }> z
| ...
}
or, more generally
my token word {
| <left=@leftright[0].key>
<middle=%middle{@leftright[0].value}>
<right=@leftright[0].value>
| (the same for index == 1)
| (the same for index == 2)
| (the same for index == 3)
...
}
and it should match Abara and Zfooz.
How can I build token word (which could then be used, e.g., in a grammar) with a sub that takes every pair from @leftright, inserts the matching %middle{} depending on the value of right, and then combines it all into one regex?
my Regex sub sub_word(Pair @l_r, Regex %m) {
...
}
my token word {
<{sub_word(@leftright, %middle)}>
}
After the match I need to know the values of left, middle, and right:
"Abara" ~~ &word;
say join '|', $<left>, $<middle>, $<right> # A|bar|a

I haven't managed to do this with a token yet, but here is a solution using EVAL and a Regex (note that here %middle is a hash of Str, not a hash of Regex):
my Regex sub build_pattern(%middle, @leftright) {
    # One alternative per pair: $<left>='A' $<middle>='bar' $<right>='a'
    my $str = join '|', @leftright.map({
        join ' ', "\$<left>='{$_.key}'", "\$<middle>='{%middle{$_.value}}'", "\$<right>='{$_.value}'"
    });
    my Regex $regex = "rx/$str/".EVAL;
    return $regex;
}
my Regex $pat = build_pattern(%middle, @leftright);
say $pat;
my $res = "Abara" ~~ $pat;
say $res;
Output:
rx/$<left>='A' $<middle>='bar' $<right>='a'|$<left>='Z' $<middle>='foo' $<right>='z'/
「Abara」
left => 「A」
middle => 「bar」
right => 「a」
For more information on why I chose to use EVAL, see How can I interpolate a variable into a Perl 6 regex?
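An EVAL-free alternative might look like the sketch below. It makes some assumptions. Only the A/Z pairs from the question are filled in, `build-word` is a hypothetical name, and instead of generating one alternation per pair, the returned Regex matches any left and then picks the paired right and its middle at match time. It relies on a `:my` variable, a `{ }` code block that can read `$<left>` from the match so far, and `<{ ... }>` to call the Regex object stored in `%middle` (here a hash of Regex, as in the question). This sketch does not cover how the captures behave once the Regex is embedded in a grammar token.

```raku
my Pair  @leftright = A => 'a', Z => 'z';
my Regex %middle    = a => / bar /, z => / foo /;

# Hypothetical helper: returns one Regex covering every left/right pair,
# choosing middle and right at match time instead of EVAL-ing a string.
sub build-word(@l_r, %m --> Regex) {
    my %right-for = @l_r;              # left => right, e.g. A => 'a'
    my @lefts     = %right-for.keys;   # every possible left
    rx{
        :my $right;                                # per-match lexical
        $<left>=[ @lefts ]                         # any left, matched literally
        { $right = %right-for{~$<left>} }          # look up its paired right
        $<middle>=[ <{ %m{$right} }> ]             # middle Regex picked by right
        $<right>=[ $right ]                        # right, matched literally
    }
}

my $word = build-word(@leftright, %middle);
for <Abara Zfooz Afooa> -> $s {
    say $s ~~ $word ?? join('|', $<left>, $<middle>, $<right>) !! "$s: no match";
}
# A|bar|a, then Z|foo|z, then "Afooa: no match"
```

Because right is derived from left, a string like Afooa fails: A forces right to be a, so middle must be /bar/.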

Related

Why can't I pass a manually created Pair to a method without a slip?

:5hours is a Pair, hours => 5 is also a Pair:
> DateTime.now.truncated-to('day').later(:5hours)
2022-02-14T05:00:00+08:00
> :5hours.WHAT
(Pair)
> DateTime.now.truncated-to('day').later(hours => 5)
2022-02-14T05:00:00+08:00
> (hours => 5).WHAT
(Pair)
However, when I create a Pair manually, it doesn't match the signatures of later:
> DateTime.now.truncated-to('day').later(Pair.new('hours', 5))
Cannot resolve caller later(DateTime:D: Pair:D); none of these signatures match:
(Dateish:D: *%unit --> Dateish:D)
(Dateish:D: @pairs, *%_)
in block <unit> at <unknown file> line 1
But putting a vertical bar (|) before the Pair argument works:
> DateTime.now.truncated-to('day').later(|Pair.new('hours', 5))
2022-02-14T05:00:00+08:00
So what's the difference between :5hours, Pair.new('hours', 5) and hours => 5? Why can't I pass a manually created Pair such as Pair.new('hours', 5) to the later method?
Aren't they all the same thing?
> :5hours === Pair.new('hours', 5) === hours => 5
True
> :5hours eqv Pair.new('hours', 5) eqv hours => 5
True
> my $pair1 = Pair.new('hours', 5); dd $pair1; # Pair $pair1 = :hours(5)
> my $pair2 = :5hours; dd $pair2; # Pair $pair2 = :hours(5)
> my $pair3 = hours => 5; dd $pair3; # Pair $pair3 = :hours(5)
> my $pair4 = 'hours' => 5; dd $pair4; # Pair $pair4 = :hours(5)
Although :5hours, hours => 5, :hours(5), Pair.new("hours",5) and Pair.new(key => "hours", value => 5) are all different ways to create a Pair object, only the first three are syntactic sugar to indicate a named argument.
When you pass Pair.new("hours",5) as an argument, it is considered to be a Positional argument. Observe:
sub foo(*@_, *%_) {
dd @_, %_
}
foo hours => 5;
# []
# {:hours(5)}
foo Pair.new("hours",5);
# [:hours(5)]
# {}
Why does it work this way? Sometimes you want to pass a Pair as a positional argument. If a Pair were always considered to be a named argument, you wouldn't be able to do so.
And why does |Pair.new("hours",5) work as a named argument? In this context, the | flattens the given object (which is usually a Capture or a Hash/Map) into the arguments of the given subroutine. Here the Pair is seen as a degenerate case of a Map: an immutable Map with a single key/value pair. Observe:
foo |Pair.new("hours",5);
# []
# {:hours(5)}
In fact, the object being flattened can probably be any Associative :-)
say Pair ~~ Associative; # True
.say for (:5hours).keys; # hours
.say for (:5hours).values; # 5
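The sketch below makes the positional/named distinction concrete. `show-hours` is a hypothetical sub with a single named parameter. The last line applies the same `|` flattening to the real `DateTime.later` call from the question, using a fixed UTC timestamp so the result does not depend on the local time zone.

```raku
# Hypothetical sub with only a named parameter, to show how arguments bind.
sub show-hours(:$hours = 0) { "hours: $hours" }

my $p = Pair.new('hours', 5);

say show-hours(hours => 5);       # hours: 5  (fat-arrow syntax is already a named argument)
say show-hours(|$p);              # hours: 5  (| flattens the Pair into a named argument)
say show-hours(|%(hours => 3));   # hours: 3  (a Hash flattens the same way)
# show-hours($p) would fail: the Pair is bound positionally and the sub takes no positionals

say DateTime.new('2022-02-14T00:00:00Z').later(|$p).hour;   # 5
```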
Finally, the | in this context is technically not a Slip, but syntactic sugar to flatten the given value into the arguments of a call.
The syntax predates the concept of a Slip (which was introduced in 2015 during the Great List Refactor). It was only very late in 2015 that | was OK'd by @Larry to also be used to indicate a Slip, as they conceptually do similar things.

Perl6 Regex Match Num

I would like to match any Num within part of a text string. So far, this (stolen from https://docs.perl6.org/language/regexes.html#Best_practices_and_gotchas) does the job...
my token sign { <[+-]> }
my token decimal { \d+ }
my token exponent { 'e' <sign>? <decimal> }
my regex float {
<sign>?
<decimal>?
'.'
<decimal>
<exponent>?
}
my regex int {
<sign>?
<decimal>
}
my regex num {
<float>?
<int>?
}
$str ~~ s/( <num>? \s*) ( .* )/$1/;
This seems like a lot of (error-prone) reinvention of the wheel. Is there a Perl 6 trick to match built-in types (Num, Real, etc.) in a grammar?
If you can make reasonable assumptions about the number, like that it's delimited by word boundaries, you can do something like this:
regex number {
« # left word boundary
\S+ # actual "number"
» # right word boundary
<?{ defined +"$/" }>
}
The final line in this regex stringifies the Match ("$/"), and then tries to convert it to a number (+). If it works, it returns a defined value, otherwise a Failure. This string-to-number conversion recognizes the same syntax as the Perl 6 grammar. The <?{ ... }> construct is an assertion, so it makes the match fail if the expression on the inside returns a false value.
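The following is a usage sketch with a made-up sample string. Declaring the regex with `my regex` lets another pattern call it as `<number>`. Here `.comb` uses it to pull every number out of a string. Radix literals such as 0x1F are accepted because the numification follows Perl 6's own number syntax. Under these word-boundary assumptions, a leading sign is not included, because `«` needs a word character to its right, so -7 would come back as 7.

```raku
# The answer's regex, declared lexically so it can be called as <number>.
my regex number {
    «                       # left word boundary
    \S+                     # candidate "number"
    »                       # right word boundary
    <?{ defined +"$/" }>    # keep it only if the text numifies
}

my $str = 'a 3.5e2 b 0x1F c 42';
my @found = $str.comb(/<number>/);
say @found;               # [3.5e2 0x1F 42]
say @found.map(+*);       # (350 31 42)
```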

Why doesn't Perl 6's colon pair and name interpolation work together?

I was playing around with Interpolating into names. I was mostly interested in this colon syntax feature to turn a variable into a pair where the identifier is the key.
my %Hamadryas = map { slip $_, 0 }, <
februa
honorina
velutina
>;
{
my $pair = :%Hamadryas;
say $pair; # Hamadryas => { ... }
}
put '-' x 50;
But, just for giggles, I wanted to try it with variable name interpolation too. I know this is stupid because if I know the name I don't need the colon syntax to get it. But, I also thought that it should work by accident:
{
my $name = 'Hamadryas';
# Since I already have the name, I could just:
# my $pair = $name => %::($name)
# But, couldn't I just line up the syntax?
my $pair = :%::($name); # does not work
say $pair;
}
Why doesn't that :%::($name) syntax work? That's more a question of when the parser decides that it's not parsing something it wants to understand. I figured it would see the : and start processing a colon pair, then see the % and know it had a hash, even though there's the :: after the %.
Is there a way to make it work with tricks and grammar mutations?

Should there be an indicesWhere method on Scala's List class?

Scala's List classes have indexWhere methods, which return a single index for a List element which matches the supplied predicate (or -1 if none exists).
I recently found myself wanting to gather all indices in a List which matched a given predicate, and found myself writing an expression like:
list.zipWithIndex.filter({case (elem, _) => p(elem)}).map({case (_, index) => index})
where p here is some predicate function for selecting matching elements. This seems a bit of an unwieldy expression for such a simple requirement (but I may be missing a trick or two).
I was half expecting to find an indicesWhere function on List which would allow me to write instead:
list.indicesWhere(p)
Should something like this be part of Scala's List API, or is there a much simpler expression than the one I've shown above for doing the same thing?
Well, here's a shorter expression that removes some of the syntactic noise you have in yours (modified to use Travis's suggestion):
list.zipWithIndex.collect { case (x, i) if p(x) => i }
Or alternatively:
for ((x,i) <- list.zipWithIndex if p(x)) yield i
But if you use this frequently, you should just add it as an implicit method:
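import scala.collection.generic.CanBuildFrom  // pre-2.13 collections API

// Wrapper that adds indicesWhere to any Seq-like collection CC[T]
class EnrichedWithIndicesWhere[T, CC[X] <: Seq[X]](xs: CC[T]) {
  // Collects the indices of matching elements into the same collection type (e.g. List[Int])
  def indicesWhere(p: T => Boolean)(implicit bf: CanBuildFrom[CC[T], Int, CC[Int]]): CC[Int] = {
    val b = bf()
    for ((x, i) <- xs.zipWithIndex if p(x)) b += i
    b.result
  }
}
implicit def enrichWithIndicesWhere[T, CC[X] <: Seq[X]](xs: CC[T]) = new EnrichedWithIndicesWhere(xs)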
val list = List(1, 2, 3, 4, 5)
def p(i: Int) = i % 2 == 1
list.indicesWhere(p) // List(0, 2, 4)
You could use unzip to replace the map:
list.zipWithIndex.filter({case (elem, _) => p(elem)}).unzip._2

simple math expression parser

I have a simple math expression parser and I want to build the AST myself (i.e. without an AST-building parser). But every node can only hold two operands, so 2+3+4 should result in a tree like this:
+
/ \
2 +
/ \
3 4
The problem is that I can't get my grammar to do the recursion. Here is just the "add" part:
add returns [Expression e]
: op1=multiply { $e = $op1.e; Print.ln($op1.text); }
( '+' op2=multiply { $e = new AddOperator($op1.e, $op2.e); Print.ln($op1.e.getClass(), $op1.text, "+", $op2.e.getClass(), $op2.text); }
| '-' op2=multiply { $e = null; } // new MinusOperator
)*
;
But at the end of the day this will produce a single tree like:
+
/ \
2 4
I know where the problem is: an "add" can occur zero or more times (*), but I do not know how to solve this. I thought of something like:
"add" part:
add returns [Expression e]
: op1=multiply { $e = $op1.e; Print.ln($op1.text); }
( '+' op2=(multiply|add) { $e = new AddOperator($op1.e, $op2.e); Print.ln($op1.e.getClass(), $op1.text, "+", $op2.e.getClass(), $op2.text); }
| '-' op2=multiply { $e = null; } // new MinusOperator
)?
;
But this gives me a recursion error. Any ideas?
I don't have the full grammar to test this solution, but consider replacing this (from the first add rule in the question):
$e = new AddOperator($op1.e, $op2.e);
With this:
$e = new AddOperator($e, $op2.e); //$e instead of $op1.e
This way each iteration over ('+' multiply)* extends $e rather than replacing it.
It may require a little playing around to get it right, or you may need a temporary Expression in the rule to keep things managed. Just make sure that the last expression created by the loop is somewhere on the right-hand side of the = operator, as in $e = new XYZ($e, $rhs.e);.