parse string with pairs of values into hash the Perl6 way - raku

I have a string which looks like this:
width=13
height=15
name=Mirek
I want to turn it into a hash (using Perl 6). Currently I do it like this:
my $text = "width=13\nheight=15\nname=Mirek";
my @lines = split("\n", $text);
my %params;
for @lines {
    (my $k, my $v) = split('=', $_);
    %params{$k} = $v;
}
say %params.perl;
But I feel there should be a more concise, more idiomatic way to do that. Is there one?

In Perl, there's generally more than one way to do it, and as your problem involves parsing, one solution is, of course, regexes:
my $text = "width=13\nheight=15\nname=Mirek";
$text ~~ / [(\w+) \= (\N+)]+ %% \n+ /;
my %params = $0>>.Str Z=> $1>>.Str;
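(As an aside, here's a minimal sketch of what the Z=> metaoperator does on its own: it zips two lists, pairing corresponding elements.)
say <width height name> Z=> <13 15 Mirek>;  # (width => 13 height => 15 name => Mirek)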
Another useful tool for data extraction is comb(), which yields the following one-liner:
my %params = $text.comb(/\w+\=\N+/)>>.split("=").flat;
You can also write your original approach that way:
my %params = $text.split("\n")>>.split("=").flat;
or even simpler:
my %params = $text.lines>>.split("=").flat;
In fact, I'd probably go with that one as long as your data format does not become any more complex.
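For reference, a quick check of the result (a sketch assuming the same $text as above):
my %params = $text.lines>>.split("=").flat;
say %params<width>;  # 13
say %params<name>;   # Mirek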

If you have a more complex data format, you can use a grammar.
grammar Conf {
    rule TOP { ^ <pair> + $ }
    rule pair {<key> '=' <value>}
    token key { \w+ }
    token value { \N+ }
    token ws { \s* }
}
class ConfAct {
    method TOP ($/) { make (%).push: $/.<pair>».made}
    method pair ($/) { make $/.<key>.made => $/.<value>.made }
    method key ($/) { make $/.lc }
    method value ($/) { make $/.trim }
}
my $text = " width=13\n\theight = 15 \n\n nAme=Mirek";
dd Conf.parse($text, actions => ConfAct.new).made;
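As a usage sketch (assuming the grammar and action class above), the .made result can then be read like an ordinary hash:
my %conf = Conf.parse($text, actions => ConfAct.new).made;
say %conf<width>;  # 13
say %conf<name>;   # Mirek (key lowercased from nAme, value trimmed)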

Related

Passing data to form grammar rules in Perl 6

I'm not sure whether grammars are meant to do such things: I want tokens to be defined at runtime (in the future, with data from a file). So I wrote some simple test code, and as expected it wouldn't even compile.
grammar Verb {
    token TOP {
        <root>
        <ending>
    }
    token root {
        (\w+) <?{ ~$0 (elem) @root }>
    }
    token ending {
        (\w+) <?{ ~$0 (elem) @ending }>
    }
}
my @root = <go jump play>;
my @ending = <ing es s ed>;
my $string = "going";
my $match = Verb.parse($string);
.Str.say for $match<root>;
What's the best way of doing such things in Perl 6?
To match any of the elements of an array, just write the name of the array variable (starting with the @ sigil) in the regex:
my @root = <go jump play>;
say "jumping" ~~ / @root /; # Matches 「jump」
say "jumping" ~~ / @root 'ing' /; # Matches 「jumping」
So in your use-case, the only tricky part is passing the arrays from the code that creates them (e.g. by parsing data files), to the grammar tokens that need them.
The easiest way would probably be to make them dynamic variables (signified by the * twigil):
grammar Verb {
    token TOP {
        <root>
        <ending>
    }
    token root {
        @*root
    }
    token ending {
        @*ending
    }
}
my @*root = <go jump play>;
my @*ending = <ing es s ed>;
my $string = "going";
my $match = Verb.parse($string);
say $match<root>.Str;
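Because dynamic variables are resolved through the caller chain at parse time, they can also be set up by whatever code loads your data; here's a sketch using a hypothetical parse-verb helper around the grammar above:
sub parse-verb(Str $s, @roots, @endings) {
    my @*root   = @roots;       # visible to the grammar's tokens during .parse
    my @*ending = @endings;
    Verb.parse($s);
}
say parse-verb("going", <go jump play>, <ing es s ed>)<root>.Str;  # go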
Another way would be to pass a Capture with the arrays to the args adverb of method .parse, which will pass them on to token TOP, from where you can in turn pass them on to the sub-rules using the <foo(...)> or <foo: ...> syntax:
grammar Verb {
    token TOP (@known-roots, @known-endings) {
        <root: @known-roots>
        <ending: @known-endings>
    }
    token root (@known) {
        @known
    }
    token ending (@known) {
        @known
    }
}
my @root = <go jump play>;
my @ending = <ing es s ed>;
my $string = "going";
my $match = Verb.parse($string, args => \(@root, @ending));
say $match<root>.Str; # go
The approach you were taking could have worked but you made three mistakes.
Scoping
Lexical variable declarations need to appear textually before the compiler encounters their use:
my $foo = 42; say $foo; # works
say $bar; my $bar = 42; # compile time error
Backtracking
say .parse: 'going' for
grammar using-token {token TOP { \w+ ing}}, # Nil
grammar using-regex-with-ratchet {regex TOP {:ratchet \w+ ing}}, # Nil
grammar using-regex {regex TOP { \w+ ing}}; # 「going」
The regex declarator has exactly the same effect as the token declarator except that it defaults to doing backtracking.
Your first use of \w+ in the root token matches the entire input 'going', which then fails to match any element of @root. And then, because there's no backtracking, the overall parse immediately fails.
(Don't take this to mean that you should default to using regex. Relying on backtracking can massively slow down parsing and there's typically no need for it.)
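To see the difference in isolation, here's a small sketch reusing @root from the question (the :ratchet adverb gives a plain match the same no-backtracking behaviour a token has):
my @root = <go jump play>;
say 'going' ~~ m:ratchet/ (\w+) <?{ ~$0 (elem) @root }> /;  # Nil: \w+ keeps all of 'going'
say 'going' ~~ m/ (\w+) <?{ ~$0 (elem) @root }> /;          # matches 「go」 after backtracking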
Debugging
See https://stackoverflow.com/a/19640657/1077672
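One concrete option is rule-by-rule tracing; the following is a sketch assuming the ecosystem module Grammar::Debugger (which provides Grammar::Tracer) is installed:
use Grammar::Tracer;              # must be loaded before the grammar is compiled

grammar Greeting {                # hypothetical toy grammar, purely for illustration
    token TOP { <hello> \s <name> }
    token hello { 'hello' }
    token name { \w+ }
}
Greeting.parse("hello world");    # prints a tree showing how each token matched or failed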
This works:
my @root = <go jump play>;
my @ending = <ing es s ed>;
grammar Verb {
    token TOP {
        <root>
        <ending>
    }
    regex root {
        (\w+) <?{ ~$0 (elem) @root }>
    }
    token ending {
        (\w+) <?{ ~$0 (elem) @ending }>
    }
}
my $string = "going";
my $match = Verb.parse($string);
.Str.say for $match<root>;
outputs:
go

How can I pass arguments to a Perl 6 grammar?

In Edit distance: Ignore start/end, I offered a Perl 6 solution to a fuzzy matching problem. I had a grammar like this (although maybe I've improved it after Edit #3):
grammar NString {
    regex n-chars { [<.ignore>* \w]**4 }
    regex ignore { \s }
}
The literal 4 itself was the length of the target string in the example. But the next problem might be some other length. So how can I tell the grammar how long I want that match to be?
Although the docs don't show an example of using the $args parameter, I found one in S05-grammar/example.t in roast.
Specify the arguments in :args and give the regex an appropriate signature. Inside the regex, access the arguments in a code block:
grammar NString {
    regex n-chars ($length) { [<.ignore>* \w]**{ $length } }
    regex ignore { \s }
}
class NString::Actions {
    method n-chars ($/) {
        put "Found $/";
    }
}
my $string = 'The quick, brown butterfly';
loop {
    state $from = 0;
    my $match = NString.subparse(
        $string,
        :rule('n-chars'),
        :actions(NString::Actions),
        :c($from++),
        :args( \(5) )
    );
    last unless ?$match;
}
I'm still not sure about the rules for passing the arguments though. This doesn't work:
:args( 5 )
I get:
Too few positionals passed; expected 2 arguments but got 1
This works:
:args( 5, )
But that's enough thinking about this for one night.

Can I insert named captures in the Match tree without actually matching anything?

I was curious if I could insert things into the Match tree without actually matching anything. There's no associated problem I'm trying to solve.
In this example, I have a token market that checks that its match is a key in a hash. I was trying to then insert the value from that hash into the match tree somehow. I figured I could have a token that always matches, long_market_string, and then look into the tree somehow to see what market had matched.
grammar OrderNumber::Grammar {
    token TOP {
        <channel> <product> <market> <long_market_string> '/' <revision>
    }
    token channel { <[ M F P ]> }
    token product { <[ 0..9 A..Z ]> ** 4 }
    token market {
        (<[ A..Z ]>** 1..2) <?{ %Market_Shortcode{$0}:exists }>
    }
    # this should figure out what market matched
    # I don't particularly care how this happens as long as
    # I can insert this into the match tree
    token long_market_string { <?> }
    token revision { <[ A..C ]> }
}
Is there some way to mess with the Match tree as it is being created?
I could do something that inverts things:
grammar AppleOrderNumber::Grammar {
    token TOP {
        <channel> <product> <long_market_string> '/' <revision>
    }
    token channel { <[ M F P ]> }
    token product { <[ 0..9 A..Z ]> ** 4 }
    token market {
        (<[ A..Z ]>** 1..2) <?{ %Market_Shortcode{$0}:exists }>
    }
    token long_market_string { <market> }
    token revision { <[ A..C ]> }
}
But that only handles this one case. I'm more interested in inserting an arbitrary number of things.
Tokens are a type of method, so if you wrote a method that did all of the setup work that a token does for you, you could do almost anything.
This is not specced, and is currently not easy.
( I only have a vague idea of where to start looking in the source code to figure it out )
What you can do easily is add to the .made/.ast of the result
( .made and .ast are synonyms )
$/ = grammar {
    token TOP {
        .*
        {
            make 'World'
        }
    }
}.parse('Hello');
say "$/ $/.made()"; # Hello World
It doesn't even have to be inside of a grammar
'asdf' ~~ /{make 42}/;
say $/; # 「」
say $/.made # 42
Most of the time you would use an Actions class for this type of thing
grammar example-grammar {
    token TOP {
        [ <number> | <word> ]+ % \s*
    }
    token word {
        <.alpha>+
    }
    token number {
        \d+
        { make +$/ }
    }
}
class example-actions {
    method TOP ($/) { make $/.pairs.map:{ .key => .value».made} }
    method number ($/) { #`( already done in grammar, so this could be removed ) }
    method word ($/) { make ~$/ }
}
.say for example-grammar.parse(
    'Hello 123 World',
    :actions(example-actions)
).made».perl
# :number([123])
# :word(["Hello", "World"])
It sounds like you want to subvert the match tree into doing something the match tree isn't really supposed to do. The match tree tracks what substrings were matched where in the input string, not arbitrary data generated by the parser. If you want to track arbitrary data, what's wrong with the AST tree?
Sure, in one sense the AST tree has to mirror the parse tree, since it's constructed in a bottom-up fashion as the match methods complete successfully. But the AST itself, in the sense of "the object attached to any given node" is not so restricted. Consider for example:
grammar G {
    token TOP { <foo> <bar> {make "TOP is: " ~ $<foo> ~ $<bar>} }
    token foo { foo {make "foo"} }
    token bar { bar {make "bar"} }
}
G.parse("foobar");
Here $/.made will simply be the string "TOP is: foobar", while the match tree has child nodes with the components that were used to construct the top-level AST. If we then return to your example, we can make it:
grammar G {
    my %Market_Shortcode = :AA('Double A');
    token TOP {
        <channel> <product> <market>
        {} # Force the computation of the $/ object. Note that this will also terminate LTM here.
        <long_market_string(~$<market>)> '/' <revision>
    }
    token channel { <[ M F P ]> }
    token product { <[ 0..9 A..Z ]> ** 4 }
    token market {
        (<[ A..Z ]>** 1..2) <?{ %Market_Shortcode{$0}:exists }>
    }
    token long_market_string($shortcode) { <?> { say 'c='~$shortcode; make %Market_Shortcode{$shortcode} } }
    token revision { <[ A..C ]> }
}
G.parse('M0000AA/A');
$<long_market_string>.ast will now be 'Double A'. Of course, I'd probably dispense with token long_market_string and just make the AST of token market whatever is in %Market_Shortcode (or a Market object with both short and long name, if you want to track both at once).
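For instance, a sketch of that simplification (assuming the same %Market_Shortcode hash as above), attaching the long name directly as the AST of token market:
token market {
    (<[ A..Z ]>** 1..2) <?{ %Market_Shortcode{$0}:exists }>
    { make %Market_Shortcode{~$0} }   # $<market>.made is then e.g. 'Double A'
}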
A less trivial example of this kind of thing would be something like a grammar of Python. Since Python's block level structure is line-based, your grammar (and thus match tree) needs to reflect this in some way. But you can also chain several simple statements together on a single line by separating them with semi-colons. Now, you'll probably want the AST of a block to be a list of statements, while the AST of a single line may itself be a list of several statements. Thus you'd construct the AST of the block by (for example) flatmapping together the lists of the lines (or something along those lines, depending on how you represent block statements like if and while).
Now, if you really, really, really want to do nasty things to the match tree I'm pretty sure it can be done, of course. You'll have to implement the parsing code yourself as method long_market_string, the API for which is undocumented and internal, and will likely involve at least some dropping down into nqp::ops. The stuff pointed to here will probably be useful. Other relevant files are src/core/{Match,Cursor}.pm in the Rakudo repo. Note also that the stringification of Matches is computed by extracting the matched substring from the input string, so if you want it to stringify usefully, you'll have to subclass Match.

Lex: Gather all text not defined in rules

I'm trying to gather all text that is not defined by a previous rule into a string and prefix it with a formatting string using lex. I'm wondering if there's a standard way of doing this.
For example, say I have the rules:
word1|word2|word3|word4 {printf("%s%s", "<>", yytext);}
[0-9]+ {printf("%s%s", "{}", yytext);}
everything else {printf("%s%s", "[]", yytext);}
And I attempt to lex the string:
word1 this is some other text ; word2 98 foo bar .
I would want this to produce the following when run through the lexer:
<>word1[] this is some other text ; <>word2[] {}98[] foo bar .
I attempted to do this using states, but realized I can't determine when to stop the check, for example:
%x OTHER
%%
. {yymore(); BEGIN OTHER;}
<OTHER>.|\n yymore();
<OTHER>how to determine when to end? {printf("%s%s", "[]", yytext); BEGIN INITIAL;}
What is a good way to do this? Is there someway to continue as long as another rule isn't met?
AFAIK, there is no "standard" solution, but a simple one is to keep a bit of context (the prefix last printed) and use that to decide whether or not to print a new prefix. For example, you could use a custom printer like this:
/* headers needed by this helper */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum OutputType { NO_TOKEN = 0, WORD, NUMBER, OTHER };

void print_with_prefix(enum OutputType type, const char* token) {
    static enum OutputType prev = NO_TOKEN;   /* type of the previously printed token */
    const char* prefix = "";
    switch (type) {
        case WORD:   prefix = "<>"; break;
        case NUMBER: prefix = "{}"; break;
        case OTHER:  if (prev != OTHER) prefix = "[]"; break;  /* only prefix the first of a run */
        default:     assert(false);
    }
    prev = type;
    printf("%s%s", prefix, token);
}
Then you just need to change the calls to printf to invoke print_with_prefix instead (and, as written, to supply an enum value instead of a string).
For the OTHER case, you then don't need to do anything special to accumulate the token. Just
. { print_with_prefix(OTHER, yytext); }
(I'm skating over the handling of whitespace and newlines, but it's just conceptual.)

perl6/rakudo: Unable to parse postcircumfix:sym<( )>

Why do I get this error message?
#!perl6
use v6;
my @a = 1..3;
my @b = 7..10;
my @c = 'a'..'d';
for zip(@a;@b;@c) -> $nth_a, $nth_b, $nth_c { ... };
# Output:
# ===SORRY!===
# Unable to parse postcircumfix:sym<( )>, couldn't find final ')' at line 9
Rakudo doesn't implement the lol ("list of lists") form yet, and so cannot parse @a;@b;@c. For the same reason, zip doesn't have a form which takes three lists yet. Clearly the error message is less than awesome.
There isn't really a good workaround yet, but here's something that will get the job done:
sub zip3(@a, @b, @c) {
    my $a-list = flat(@a.list);
    my $b-list = flat(@b.list);
    my $c-list = flat(@c.list);
    my ($a, $b, $c);
    gather while ?$a-list && ?$b-list && ?$c-list {
        $a = $a-list.shift unless $a-list[0] ~~ ::Whatever;
        $b = $b-list.shift unless $b-list[0] ~~ ::Whatever;
        $c = $c-list.shift unless $c-list[0] ~~ ::Whatever;
        take ($a, $b, $c);
    }
}
for zip3(@a,@b,@c) -> $nth_a, $nth_b, $nth_c {
    say $nth_a ~ $nth_b ~ $nth_c;
}
The multi-dimensional syntax (the use of ; inside parens) and zip across more than two lists both work, so the code originally posted now works (if you provide some real code rather than the { ... } stub block).
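For reference, here's a sketch of one way to write it on current Rakudo, destructuring each zipped triple in the block signature (zip yields one three-element list per iteration):
my @a = 1..3;
my @b = 7..10;
my @c = 'a'..'d';
for zip(@a, @b, @c) -> ($nth_a, $nth_b, $nth_c) {
    say $nth_a ~ $nth_b ~ $nth_c;   # 17a, 28b, 39c
}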