How do we flatten or stringify Match (or else) object to be string data type (esp. in multitude ie. as array elements)? e.g.
'foobar' ~~ m{ (foo) };
say $0.WHAT;
my $foo = $0;
say $foo.WHAT
(Match)
(Match)
How to end up with (Str)?
~ is the Str contextualizer:
'foobar' ~~ m{ (foo) };
say ~$0
will directly coerce it to a Str. You can use that if you have many matches, i. e.:
'foobar' ~~ m{ (f)(o)(o) };
say $/.map: ~*; # (f o o)
Just treat the objects as if they were strings.
If you apply a string operation to a value/object Raku will almost always just automatically coerce it to a string.
String operations include functions such as print and put, operators such as infix eq and ~ concatenation, methods such as .starts-with or .chop, interpolation such as "A string containing a $variable", and dedicated coercers such as .Str and Str(...).
A Match object contains an overall match. Any "children" (sub-matches) are just captures of substrings of that overall match. So there's no need to flatten anything because you can just deal with the single overall match.
A list of Match objects is a list. And a list is itself an object. If you apply a string operation to a list, you get the elements of the list stringified with a space between each element.
So:
'foobar' ~~ m{ (f) (o) (o) };
put $/; # foo
put $/ eq 'foo'; # True
put $/ ~ 'bar'; # foobar
put $/ .chop; # fo
put "[$/]"; # [foo]
put $/ .Str; # foo
my Str() $foo = $/;
say $foo.WHAT; # (Str)
put 'foofoo' ~~ m:g{ (f) (o) (o) }; # foo foo
The constructor for Str takes any Cool value as argument, including a regex Match object.
'foobar' ~~ m{ (foo) };
say $0.WHAT; # (Match)
say $0.Str.WHAT; # (Str)
Should be very simple, but I can't cope with it.
I want to match exactly the same number of as as bs. So, the following
my $input = 'aaabbbb';
$input ~~ m:ex/ ... /;
should produce:
aaabbb
aabb
ab
UPD: The following variants don't work, perhaps because of the :ex bug , mentioned in #smls's answer (but more likely because I made some mistakes?):
> my $input = "aaabbbb";
> .put for $input ~~ m:ex/ (a) * (b) * <?{ +$0 == +$1 }> /;
Nil
> .put for $input ~~ m:ex/ (a) + (b) + <?{+$0 == +$1}> /;
Nil
This one, with :ov and ?, works:
> my $input = "aaabbbb";
> .put for $input ~~ m:ov/ (a)+ (b)+? <?{+$0 == +$1}> /;
aaabbb
aabb
ab
UPD2: The following solution works with :ex as well, but I had to do it without <?...> assertion.
> $input = 'aaabbbb'
> $input ~~ m:ex/ (a) + (b) + { put $/ if +$0 == +$1 } /;
aaabbb
aabb
ab
my $input = "aaabbbb";
say .Str for $input ~~ m:ov/ (a)+ b ** {+$0} /;
Output:
aaabbb
aabb
ab
It's supposed to work with :ex instead of :ov, too - but Rakudo bug #130711 currently prevents that.
my $input = "aaabbbb";
say .Str for $input ~~ m:ov/ a <~~>? b /;
Works with ex too
my $input = "aaabbbb";
say .Str for $input ~~ m:ex/ a <~~>? b /;
Upd: explanation
<~~> means call myself recursively see Extensible metasyntax. (It is not yet fully implemented.)
Following (longer, but maybe clearer) example works too:
my $input = "aaabbbb";
my token anbn { a <&anbn>? b}
say .Str for $input ~~ m:ex/ <&anbn> /;
Here I make a regex manually from Regex elements of an array.
my Regex #reg =
/ foo /,
/ bar /,
/ baz /,
/ pun /
;
my $r0 = #reg[0];
my $r1 = #reg[1];
my Regex $r = / 0 $r0 | 1 $r1 /;
"0foo_1barz" ~~ m:g/<$r>/;
say $/; # (「0foo」 「1bar」)
How to do it with for #reg {...}?
If a variable contains a regex, you can use it without further ado inside another regex.
The second trick is to use an array variable inside a regex, which is equivalent to the disjunction of the array elements:
my #reg =
/foo/,
/bar/,
/baz/,
/pun/
;
my #transformed = #reg.kv.map(-> $i, $rx { rx/ $i $rx /});
my #match = "0foo_1barz" ~~ m:g/ #transformed /;
.say for #match;
my #reg =
/foo/,
/bar/,
/baz/,
/pun/
;
my $i = 0;
my $reg = #reg
.map({ $_ = .perl; $_.substr(1, $_.chars - 2); })
.map({ "{$i++}{$_}" })
.join('|');
my #match = "foo", "0foo_1barz" ~~ m:g/(<{$reg}>) /;
say #match[1][0].Str;
say #match[1][1].Str;
# 0foo
# 2baz
See the docs
Edit: Actually read the docs myself. Changed implicit eval to $() construct.
Edit: Rewrote answer to something that actually works
Edit: Changed answer to a terrible, terrible hack
Is it possible to define nested regexes in arbitrary sequence?
The following program works as expected:
my regex letter { <[a b]> }
my regex word { <letter> + }
my $string = 'abab';
$string ~~ &word;
put $/; # abab
If I swap the first two lines, compiler produces an error.
Is there a way to override this restriction (without using grammars)?
You can put the regex in a variable you declare up front but later set:
my $letter;
my regex word { <$letter> + }
$letter = regex { <[a b]> }
my $string = 'abab';
$string ~~ &word;
put $/; # abab
ok so i have have this
{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}
and at the moment i'm using this shell command to decode it to get the string i need,
echo $x | grep -Po '"utterance":.*?[^\\]"' | sed -e s/://g -e s/utterance//g -e 's/"//g'
but this only works when you have a grep compiled with perl and plus the script i use to get that JSON string is written in perl, so is there any way i can do this same decoding in a simple perl script or a simpler unix command, or better yet, c or objective-c?
the script i'm using to get the json is here, http://pastebin.com/jBGzJbMk and if you want a file to use then download http://trevorrudolph.com/a.flac
How about:
perl -MJSON -nE 'say decode_json($_)->{hypotheses}[0]{utterance}'
in script form:
use JSON;
while (<>) {
print decode_json($_)->{hypotheses}[0]{utterance}, "\n"
}
Well, I'm not sure if I can deduce what you are after correctly, but this is a way to decode that JSON string in perl.
Of course, you'll need to know the data structure in order to get the data you need. The line that prints the "utterance" string is commented out in the code below.
use strict;
use warnings;
use Data::Dumper;
use JSON;
my $json = decode_json
q#{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}#;
#print $json->{'hypotheses'}[0]{'utterance'};
print Dumper $json;
Output:
$VAR1 = {
'status' => 0,
'hypotheses' => [
{
'utterance' => 'hello how are you',
'confidence' => '0.96311796'
}
],
'id' => '7aceb216d02ecdca7ceffadcadea8950-1'
};
Quick hack:
while (<>) {
say for /"utterance":"?(.*?)(?<!\\)"/;
}
Or as a one-liner:
perl -lnwe 'print for /"utterance":"(.+?)(?<!\\)"/g' inputfile.txt
The one-liner is troublesome if you happen to be using Windows, since " is interpreted by the shell.
Quick hack#2:
This will hopefully go through any hash structure and find keys.
my $json = decode_json $str;
say find_key($json, 'utterance');
sub find_key {
my ($ref, $find) = #_;
if (ref $ref) {
if (ref $ref eq 'HASH' and defined $ref->{$find}) {
return $ref->{$find};
} else {
for (values $ref) {
my $found = find_key($_, $find);
if (defined $found) {
return $found;
}
}
}
}
return;
}
Based on the naming, it's possible to have multiple hypotheses. The prints the utterance of each hypothesis:
echo '{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}' | \
perl -MJSON::XS -n000E'
say $_->{utterance}
for #{ JSON::XS->new->decode($_)->{hypotheses} }'
Or as a script:
use feature qw( say );
use JSON::XS;
my $json = '{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}';
say $_->{utterance}
for #{ JSON::XS->new->decode($json)->{hypotheses} };
If you don't want to use any modules from CPAN and try a regex instead there are multiple variants you can try:
# JSON is on a single line:
$json = '{"other":"stuff","hypo":[{"utterance":"hi, this is \"bob\"","moo":0}]}';
# RegEx with negative look behind:
# Match everything up to a double quote without a Backslash in front of it
print "$1\n" if ($json =~ m/"utterance":"(.*?)(?<!\\)"/)
This regex works if there is only one utterance. It doesn't matter what else is in the string around it, since it only searches for the double quoted string following the utterance key.
For a more robust version you could add whitespace where necessary/possible and make the . in the RegEx match newlines: m/"utterance"\s*:\s*"(.*?)(?<!\\)"/s
If you have multiple entries for the utterance confidence hash/object, changing case and weird formatting of the JSON string try this:
# weird JSON:
$json = <<'EOJSON';
{
"status":0,
"id":"an ID",
"hypotheses":[
{
"UtTeraNcE":"hello my name is \"Bob\".",
"confidence":0.0
},
{
'utterance' : 'how are you?',
"confidence":0.1
},
{
"utterance"
: "
thought
so!
",
"confidence" : 0.9
}
]
}
EOJSON
# RegEx with alternatives:
print "$1\n" while ( $json =~ m/["']utterance["']\s*:\s*["'](([^\\"']|\\.)*)["']/gis);
The main part of this RegEx is "(([^\\"]|\\.)*)". Description in detail as extended regex:
/
["'] # opening quotes
( # start capturing parentheses for $1
( # start of grouping alternatives
[^\\"'] # anything that's not a backslash or a quote
| # or
\\. # a backslash followed by anything
) # end of grouping
* # in any quantity
) # end capturing parentheses
["'] # closing quotes
/xgs
If you have many data sets and speed is a concern you can add the o modifier to the regex and use character classes instead of the i modifier. You can suppress the capturing of the alternatives to $2 with clustering parenthesis (?:pattern). Then you get this final result:
m/["'][uU][tT][tT][eE][rR][aA][nN][cC][eE]["']\s*:\s*["']((?:[^\\"']|\\.)*)["']/gos
Yes, sometimes perl looks like a big explosion in a bracket factory ;-)
Just stubmled upon another nice method of doing this, i finaly found how to acsess the Mac OS X JavaScript engine form commandline, heres the script,
alias jsc='/System/Library/Frameworks/JavaScriptCore.framework/Versions/A/Resources/jsc'
x='{"status":0,"id":"7aceb216d02ecdca7ceffadcadea8950-1","hypotheses":[{"utterance":"hello how are you","confidence":0.96311796}]}'
jsc -e "print(${x}['hypotheses'][0]['utterance'])"
Ugh, yes i came up with another answer, im strudying python and it reads arrays in both its python format and the same format as a json so, i jsut made this one liner when your variable is x
python -c "print ${x}['hypotheses'][0]['utterance']"
figured it out for unix but would love to see your perl and c, objective-c answers...
echo $X | sed -e 's/.*utterance//' -e 's/confidence.*//' -e s/://g -e 's/"//g' -e 's/,//g'
:D
shorter copy of the same sed:
echo $X | sed -e 's/.*utterance//;s/confidence.*//;s/://g;s/"//g;s/,//g'