Scope of nested regexes in Perl 6 - raku

Is it possible to define nested regexes in arbitrary sequence?
The following program works as expected:
my regex letter { <[a b]> }
my regex word { <letter> + }
my $string = 'abab';
$string ~~ &word;
put $/; # abab
If I swap the first two lines, compiler produces an error.
Is there a way to override this restriction (without using grammars)?

You can put the regex in a variable you declare up front but later set:
my $letter;
my regex word { <$letter> + }
$letter = regex { <[a b]> }
my $string = 'abab';
$string ~~ &word;
put $/; # abab

Related

How to flatten or stringify an object (esp. Match)?

How do we flatten or stringify Match (or else) object to be string data type (esp. in multitude ie. as array elements)? e.g.
'foobar' ~~ m{ (foo) };
say $0.WHAT;
my $foo = $0;
say $foo.WHAT
(Match)
(Match)
How to end up with (Str)?
~ is the Str contextualizer:
'foobar' ~~ m{ (foo) };
say ~$0
will directly coerce it to a Str. You can use that if you have many matches, i. e.:
'foobar' ~~ m{ (f)(o)(o) };
say $/.map: ~*; # (f o o)
Just treat the objects as if they were strings.
If you apply a string operation to a value/object Raku will almost always just automatically coerce it to a string.
String operations include functions such as print and put, operators such as infix eq and ~ concatenation, methods such as .starts-with or .chop, interpolation such as "A string containing a $variable", and dedicated coercers such as .Str and Str(...).
A Match object contains an overall match. Any "children" (sub-matches) are just captures of substrings of that overall match. So there's no need to flatten anything because you can just deal with the single overall match.
A list of Match objects is a list. And a list is itself an object. If you apply a string operation to a list, you get the elements of the list stringified with a space between each element.
So:
'foobar' ~~ m{ (f) (o) (o) };
put $/; # foo
put $/ eq 'foo'; # True
put $/ ~ 'bar'; # foobar
put $/ .chop; # fo
put "[$/]"; # [foo]
put $/ .Str; # foo
my Str() $foo = $/;
say $foo.WHAT; # (Str)
put 'foofoo' ~~ m:g{ (f) (o) (o) }; # foo foo
The constructor for Str takes any Cool value as argument, including a regex Match object.
'foobar' ~~ m{ (foo) };
say $0.WHAT; # (Match)
say $0.Str.WHAT; # (Str)

Combining regexes using a loop in Perl 6

Here I make a regex manually from Regex elements of an array.
my Regex #reg =
/ foo /,
/ bar /,
/ baz /,
/ pun /
;
my $r0 = #reg[0];
my $r1 = #reg[1];
my Regex $r = / 0 $r0 | 1 $r1 /;
"0foo_1barz" ~~ m:g/<$r>/;
say $/; # (「0foo」 「1bar」)
How to do it with for #reg {...}?
If a variable contains a regex, you can use it without further ado inside another regex.
The second trick is to use an array variable inside a regex, which is equivalent to the disjunction of the array elements:
my #reg =
/foo/,
/bar/,
/baz/,
/pun/
;
my #transformed = #reg.kv.map(-> $i, $rx { rx/ $i $rx /});
my #match = "0foo_1barz" ~~ m:g/ #transformed /;
.say for #match;
my #reg =
/foo/,
/bar/,
/baz/,
/pun/
;
my $i = 0;
my $reg = #reg
.map({ $_ = .perl; $_.substr(1, $_.chars - 2); })
.map({ "{$i++}{$_}" })
.join('|');
my #match = "foo", "0foo_1barz" ~~ m:g/(<{$reg}>) /;
say #match[1][0].Str;
say #match[1][1].Str;
# 0foo
# 2baz
See the docs
Edit: Actually read the docs myself. Changed implicit eval to $() construct.
Edit: Rewrote answer to something that actually works
Edit: Changed answer to a terrible, terrible hack

How to interpolate variables into Perl 6 regex character class?

I want to make all the consonants in a word uppercase:
> my $word = 'camelia'
camelia
> $word ~~ s:g/<-[aeiou]>/{$/.uc}/
(「c」 「m」 「l」)
> $word
CaMeLia
To make the code more general, I store the list of all the consonants in a string variable
my $vowels = 'aeiou';
or in an array
my #vowels = $vowels.comb;
How to solve the original problem with $vowels or #vowels variables?
Maybe the trans method would be more appropriate than the subst sub or operator.
Try this:
my $word = "camelia";
my #consonants = keys ("a".."z") (-) <a e i o u>;
say $word.trans(#consonants => #consonants>>.uc);
# => CaMeLia
With the help of moritz's explanation, here is the solution:
my constant $vowels = 'aeiou';
my regex consonants {
<{
"<-[$vowels]>"
}>
}
my $word = 'camelia';
$word ~~ s:g/<consonants>/{$/.uc}/;
say $word; # CaMeLia
You can use <!before …> along with <{…}>, and . to actually capture the character.
my $word = 'camelia';
$word ~~ s:g/
<!before # negated lookahead
<{ # use result as Regex code
$vowel.comb # the vowels as individual characters
}>
>
. # any character (that doesn't match the lookahead)
/{$/.uc}/;
say $word; # CaMeLia
You can do away with the <{…}> with #vowels
I think it is also important to realize you can use .subst
my $word = 'camelia';
say $word.subst( :g, /<!before #vowels>./, *.uc ); # CaMeLia
say $word; # camelia
I would recommend storing the regex in a variable instead.
my $word = 'camelia'
my $vowel-regex = /<-[aeiou]>/;
say $word.subst( :g, $vowel-regex, *.uc ); # CaMeLia
$word ~~ s:g/<$vowel-regex>/{$/.uc}/;
say $word # CaMeLia

Is Perl 6's uncuddled else a special case for statement separation?

From the syntax doc:
A closing curly brace followed by a newline character implies a statement separator, which is why you don't need to write a semicolon after an if statement block.
if True {
say "Hello";
}
say "world";
That's fine and what was going on with Why is this Perl 6 feed operator a “bogus statement”?.
However, how does this rule work for an uncuddled else? Is this a special case?
if True {
say "Hello";
}
else {
say "Something else";
}
say "world";
Or, how about the with-orwith example:
my $s = "abc";
with $s.index("a") { say "Found a at $_" }
orwith $s.index("b") { say "Found b at $_" }
orwith $s.index("c") { say "Found c at $_" }
else { say "Didn't find a, b or c" }
The documentation you found was not completely correct. The documentation has been updated and is now correct. It now reads:
Complete statements ending in bare blocks can omit the trailing semicolon, if no additional statements on the same line follow the block's closing curly brace }.
...
For a series of blocks that are part of the same if/elsif/else (or similar) construct, the implied separator rule only applies at the end of the last block of that series.
Original answer:
Looking at the grammar for if in nqp and Rakudo, it seems that an if/elsif/else set of blocks gets parsed out together as one control statement.
Rule for if in nqp
rule statement_control:sym<if> {
<sym>\s
<xblock>
[ 'elsif'\s <xblock> ]*
[ 'else'\s <else=.pblock> ]?
}
(https://github.com/perl6/nqp/blob/master/src/NQP/Grammar.nqp#L243, as of August 5, 2017)
Rule for if in Rakudo
rule statement_control:sym<if> {
$<sym>=[if|with]<.kok> {}
<xblock(so ~$<sym>[0] ~~ /with/)>
[
[
| 'else'\h*'if' <.typed_panic: 'X::Syntax::Malformed::Elsif'>
| 'elif' { $/.typed_panic('X::Syntax::Malformed::Elsif', what => "elif") }
| $<sym>='elsif' <xblock>
| $<sym>='orwith' <xblock(1)>
]
]*
{}
[ 'else' <else=.pblock(so ~$<sym>[-1] ~~ /with/)> ]?
}
(https://github.com/rakudo/rakudo/blob/nom/src/Perl6/Grammar.nqp#L1450 as of August 5, 2017)

Break down JSON string in simple perl or simple unix?

ok so i have have this
{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}
and at the moment i'm using this shell command to decode it to get the string i need,
echo $x | grep -Po '"utterance":.*?[^\\]"' | sed -e s/://g -e s/utterance//g -e 's/"//g'
but this only works when you have a grep compiled with perl and plus the script i use to get that JSON string is written in perl, so is there any way i can do this same decoding in a simple perl script or a simpler unix command, or better yet, c or objective-c?
the script i'm using to get the json is here, http://pastebin.com/jBGzJbMk and if you want a file to use then download http://trevorrudolph.com/a.flac
How about:
perl -MJSON -nE 'say decode_json($_)->{hypotheses}[0]{utterance}'
in script form:
use JSON;
while (<>) {
print decode_json($_)->{hypotheses}[0]{utterance}, "\n"
}
Well, I'm not sure if I can deduce what you are after correctly, but this is a way to decode that JSON string in perl.
Of course, you'll need to know the data structure in order to get the data you need. The line that prints the "utterance" string is commented out in the code below.
use strict;
use warnings;
use Data::Dumper;
use JSON;
my $json = decode_json
q#{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}#;
#print $json->{'hypotheses'}[0]{'utterance'};
print Dumper $json;
Output:
$VAR1 = {
'status' => 0,
'hypotheses' => [
{
'utterance' => 'hello how are you',
'confidence' => '0.96311796'
}
],
'id' => '7aceb216d02ecdca7ceffadcadea8950-1'
};
Quick hack:
while (<>) {
say for /"utterance":"?(.*?)(?<!\\)"/;
}
Or as a one-liner:
perl -lnwe 'print for /"utterance":"(.+?)(?<!\\)"/g' inputfile.txt
The one-liner is troublesome if you happen to be using Windows, since " is interpreted by the shell.
Quick hack#2:
This will hopefully go through any hash structure and find keys.
my $json = decode_json $str;
say find_key($json, 'utterance');
sub find_key {
my ($ref, $find) = #_;
if (ref $ref) {
if (ref $ref eq 'HASH' and defined $ref->{$find}) {
return $ref->{$find};
} else {
for (values $ref) {
my $found = find_key($_, $find);
if (defined $found) {
return $found;
}
}
}
}
return;
}
Based on the naming, it's possible to have multiple hypotheses. The prints the utterance of each hypothesis:
echo '{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}' | \
perl -MJSON::XS -n000E'
say $_->{utterance}
for #{ JSON::XS->new->decode($_)->{hypotheses} }'
Or as a script:
use feature qw( say );
use JSON::XS;
my $json = '{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}';
say $_->{utterance}
for #{ JSON::XS->new->decode($json)->{hypotheses} };
If you don't want to use any modules from CPAN and try a regex instead there are multiple variants you can try:
# JSON is on a single line:
$json = '{"other":"stuff","hypo":[{"utterance":"hi, this is \"bob\"","moo":0}]}';
# RegEx with negative look behind:
# Match everything up to a double quote without a Backslash in front of it
print "$1\n" if ($json =~ m/"utterance":"(.*?)(?<!\\)"/)
This regex works if there is only one utterance. It doesn't matter what else is in the string around it, since it only searches for the double quoted string following the utterance key.
For a more robust version you could add whitespace where necessary/possible and make the . in the RegEx match newlines: m/"utterance"\s*:\s*"(.*?)(?<!\\)"/s
If you have multiple entries for the utterance confidence hash/object, changing case and weird formatting of the JSON string try this:
# weird JSON:
$json = <<'EOJSON';
{
"status":0,
"id":"an ID",
"hypotheses":[
{
"UtTeraNcE":"hello my name is \"Bob\".",
"confidence":0.0
},
{
'utterance' : 'how are you?',
"confidence":0.1
},
{
"utterance"
: "
thought
so!
",
"confidence" : 0.9
}
]
}
EOJSON
# RegEx with alternatives:
print "$1\n" while ( $json =~ m/["']utterance["']\s*:\s*["'](([^\\"']|\\.)*)["']/gis);
The main part of this RegEx is "(([^\\"]|\\.)*)". Description in detail as extended regex:
/
["'] # opening quotes
( # start capturing parentheses for $1
( # start of grouping alternatives
[^\\"'] # anything that's not a backslash or a quote
| # or
\\. # a backslash followed by anything
) # end of grouping
* # in any quantity
) # end capturing parentheses
["'] # closing quotes
/xgs
If you have many data sets and speed is a concern you can add the o modifier to the regex and use character classes instead of the i modifier. You can suppress the capturing of the alternatives to $2 with clustering parenthesis (?:pattern). Then you get this final result:
m/["'][uU][tT][tT][eE][rR][aA][nN][cC][eE]["']\s*:\s*["']((?:[^\\"']|\\.)*)["']/gos
Yes, sometimes perl looks like a big explosion in a bracket factory ;-)
Just stubmled upon another nice method of doing this, i finaly found how to acsess the Mac OS X JavaScript engine form commandline, heres the script,
alias jsc='/System/Library/Frameworks/JavaScriptCore.framework/Versions/A/Resources/jsc'
x='{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}'
jsc -e "print(${x}['hypotheses'][0]['utterance'])"
Ugh, yes i came up with another answer, im strudying python and it reads arrays in both its python format and the same format as a json so, i jsut made this one liner when your variable is x
python -c "print ${x}['hypotheses'][0]['utterance']"
figured it out for unix but would love to see your perl and c, objective-c answers...
echo $X | sed -e 's/.*utterance//' -e 's/confidence.*//' -e s/://g -e 's/"//g' -e 's/,//g'
:D
shorter copy of the same sed:
echo $X | sed -e 's/.*utterance//;s/confidence.*//;s/://g;s/"//g;s/,//g'