What scope does ":my $foo" have and what is it used for? - raku

With a regex, token or rule, its possible to define a variable like so;
token directive {
:my $foo = "in command";
<command> <subject> <value>?
}
There is nothing about it in the language documentation here, and very little in S05 - Regexes and Rules, to quote;
Any grammar regex is really just a kind of method, and you may declare variables in such a routine using a colon followed by any scope declarator parsed by the Perl 6 grammar, including my, our, state, and constant. (As quasi declarators, temp and let are also recognized.) A single statement (up through a terminating semicolon or line-final closing brace) is parsed as normal Perl 6 code:
token prove-nondeterministic-parsing {
:my $threshold = rand;
'maybe' \s+ <it($threshold)>
}
I get that regexen within grammars are very similar to methods in classes; I get that you can start a block anywhere within a rule and if parsing successfully gets to that point, the block will be executed - but I don't understand what on earth this thing is for.
Can someone clearly define what it's scope is; explain what need it fulfills and give the typical use case?

What scope does :my $foo; have?
:my $foo ...; has the lexical scope of the rule/token/regex in which it appears.
(And :my $*foo ...; -- note the extra * signifying a dynamic variable -- has both the lexical and dynamic scope of the rule/token/regex in which it appears.)
What this is used for
Here's what happens without this construct:
regex scope-too-small { # Opening `{` opens a regex lexical scope.
{ my $foo = / bar / } # Block with its own inner lexical scope.
$foo # ERROR: Variable '$foo' is not declared
}
grammar scope-too-large { # Opening `{` opens lexical scope for gramamr.
my $foo = / bar / ;
regex r1 { ... } # `$foo` is recognized inside `r1`...
...
regex r999 { ... } # ...but also inside r999
}
So the : ... ; syntax is used to get exactly the desired scope -- neither too broad nor too narrow.
Typical use cases
This feature is typically used in large or complex grammars to avoid lax scoping (which breeds bugs).
For a suitable example of precise lexical only scoping see the declaration and use of #extra_tweaks in token babble as defined in a current snapshot of Rakudo's Grammar.nqp source code.
P6 supports action objects. These are classes with methods corresponding one-to-one with the rules in a grammar. Whenever a rule matches, it calls its corresponding action method. Dynamic variables provide precisely the right scoping for declaring variables that are scoped to the block (method, rule, etc.) they're declared in both lexically and dynamically -- which latter means they're available in the corresponding action method too. For an example of this, see the declaration of #*nibbles in Rakudo's Grammar module and its use in Rakudo's Actions module.

Related

How to die on undefined values?

I am attempting to work with a hash in Raku, but when I put some fake values into it (intentionally) like
say %key<fake_key>;
I get
(Any)
but I want the program to die in such occurrences, as Perl does, because this implies that important data is missing.
For example,
#!/usr/bin/env perl
use strict;
use warnings 'FATAL' => 'all';
use autodie ':all';
my %hash;
print "$hash{y}\n";
as of 5.26.1 produces
Use of uninitialized value $hash{"y"} in concatenation (.) or string at undefined.pl line 8.
Command exited with non-zero status 255
How can I get the equivalent of use warnings 'FATAL' => 'all' and use autodie qw(:all) in Raku?
Aiui your question is:
I'm looking for use autodie qw(:all) & use warnings 'FATAL' => 'all' in Raku
The equivalent of Perl's autodie in Raku
Aiui the equivalent of use autodie qw(:all) in Perl is use fatal; in Raku. This is a lexically scoped effect (at least it is in Raku).
The autodie section in the Perl-to-Raku nutshell guide explains that routines now return Failures to indicate errors.
The fatal pragma makes returning a Failure from a routine automatically throw an exception that contains the Failure. Unless you provide code that catches them, these exceptions that wrap Failures automatically die.
The equivalent of Perl's use warnings 'FATAL' in Raku
Aiui the equivalent of use warnings 'FATAL' => 'all' in Perl is CONTROL { when CX::Warn { note $_; exit 1 } } in Raku. This is a lexically scoped effect (at least it is in Raku).
CONTROL exceptions explains how these work.
CONTROL exceptions are a subset of all exceptions that are .resume'd by default -- the program stays alive by default when they're thrown.
The Raku code I've provided (which is lifted from How could I make all warnings fatal? which you linked to) instead makes CONTROL exceptions die (due to the exit routine).
Returning to your current question text:
say %key<fake_key>; # (Any)
I want the program to die in such occurrences ...
Use either Jonathan++'s answer (use put, which, unlike say, isn't trying to keep your program alive) or Scimon++'s KeyRequired answer which will make accessing of a non-existent key fatal.
... as Perl does ...
Only if you use use warnings 'FATAL' ..., just as Raku does if you use the equivalent.
... because this implies that important data is missing.
Often it implies unimportant data is missing, or even important data that you don't want defined some of the time you attempt to access it, so both Perls and Raku default to keeping your program alive and require that you tell it what you want if you want something different.
You can use the above constructs to get the precise result you want and they'll be limited to a given variable (if you use the KeyRequired role) or statement (using put instead of say) or lexical scope (using a pragma or the CONTROL block).
You can create a role to do this. A simple version would be :
role KeyRequired {
method AT-KEY( \key ) {
die "Key {key} not found" unless self.EXISTS-KEY(key);
nextsame;
}
};
Then you create your hash with : my %key does KeyRequired; and it will die if you request a non existent key.
The simple answer is to not use say.
say %key<fake_key>;
# (Any)
put %key<fake_key>;
# Use of uninitialized value of type Any in string context.
# Methods .^name, .perl, .gist, or .say can be used to stringify it to something
# meaningful.
# in block <unit> at <unknown file> line 1
say calls .gist which prints enough information for a human to understand what was printed.
put just tries to turn it into a Str and print that, but in this case it produces an error.
Perl 5 warns, not dies, in this case. Perl 6 will do the same if equivalent code is used:
my %key;
print "%key<fake_key>\n"; # Gives a warning
Or, more neatly, use put:
my %key;
put %key<fake_key>;
The put routine ("print using terminator") will stringify the value, which is what triggers the warning about use of an undefined value in string context.
By contrast, say does not stringify, but instead calls .gist on the object, and prints whatever it returns. In the case of an undefined value, the gist of it is the name of its type, wrapped in parentheses. In general, say - which uses .gist underneath - gives more information. For example, consider an array:
my #a = 1..5;
put #a; # 1 2 3 4 5
say #a; # [1 2 3 4 5]
Where put just joins the elements with spaces, but say represents the structure and that it's an array.

Using a variable in a Perl 6 program before assigning to it

I want to assign literals to some of the variables at the end of the file with my program, but to use these variables earlier. The only method I've come up with to do it is the following:
my $text;
say $text;
BEGIN {
$text = "abc";
}
Is there a better / more idiomatic way?
Just go functional.
Create subroutines instead:
say text();
sub text { "abc" }
UPDATE (Thanks raiph! Incorporating your feedback, including reference to using term:<>):
In the above code, I originally omitted the parentheses for the call to text, but it would be more maintainable to always include them to prevent the parser misunderstanding our intent. For example,
say text(); # "abc"
say text() ~ text(); # "abcabc"
say text; # "abc", interpreted as: say text()
say text ~ text; # ERROR, interpreted as: say text(~text())
sub text { "abc" };
To avoid this, you could make text a term, which effectively makes the bareword text behave the same as text():
say text; # "abc", interpreted as: say text()
say text ~ text; # "abcabc", interpreted as: say text() ~ text()
sub term:<text> { "abc" };
For compile-time optimizations and warnings, we can also add the pure trait to it (thanks Brad Gilbert!). is pure asserts that for a given input, the function "always produces the same output without any additional side effects":
say text; # "abc", interpreted as: say text()
say text ~ text; # "abcabc", interpreted as: say text() ~ text()
sub term:<text> is pure { "abc" };
Unlike Perl 5, in Perl 6 a BEGIN does not have to be a block. However, the lexical definition must be seen before it can be used, so the BEGIN block must be done before the say.
BEGIN my $text = "abc";
say $text;
Not sure whether this constitutes an answer to your question or not.
First, a rephrase of your question:
What options are there for succinctly referring to a variable (or constant etc.) whose initialization code appears further down in the same source file?
Post declare a routine
say foo;
sub foo { 'abc' }
When a P6 compiler parses an identifier that has no sigil, it checks to see if it has already seen a declaration of that identifier. If it hasn't, then it assumes that the identifier corresponds to a routine which will be declared later as a "listop" routine (which takes zero or more arguments) and moves on. (If its assumption turns out to be wrong, it fails the compilation.)
So you can use routines as if they were variables as described in Christopher Bottom's answer.
Autodeclare a variable on first use
strict is a "pragma" that controls how a P6 compiler reacts when it parses an as yet undeclared variable/constant that starts with a sigil.
P6 starts programs with strict mode switched on. This means that the compiler will insist on predeclaration of any sigil'd variable/constant. (By predeclaration I mean an explicit declaration that appears textually before the variable/constant is used.)
But you can write use strict or no strict to control whether the strict pragma is on or off in a given lexical scope, so this will work:
no strict;
say $text;
BEGIN {
$text = "abc";
}
Warning Having no strict in effect (which is unfortunately how most programming languages work) makes accidental misspelling of variable names a bigger nuisance than it is with use strict mode on.
Declare a variable explicitly in the same statement as its first use
You don't have to write a declaration as a separate statement. You can instead declare and use a variable in the same statement or expression:
say my $text;
BEGIN {
$text = "abc";
}
Warning If you repeat my $bar in the exact same lexical scope, the compiler will emit a warning. In contrast, say my $bar = 42; if foo { say my $bar = 99 } creates two distinct $bar variables without warning.
Initialize at run-time
The BEGIN phaser shown above runs at compile-time (after the my declaration, which also happens at compile-time, but before the say, which happens at run-time).
If you want to initialize variables/constants at run-time instead, use INIT instead:
say my $text;
INIT {
$text = "abc";
}
INIT code runs before any other run-time code, so the initialization still happens before the say gets executed.
Use a positronic (ym) variable
Given a literal interpretation of your question a "positronic" or ym variable would be yet another solution. (This feature is not built-in. I'm including it mostly because I encountered it after answering this question and think it belongs here, at the very least for entertainment value.)
Initialization and calculation of such a variable starts in the last statement using it and occurs backwards relative to the textual order of the code.
This is one of the several crazy sounding but actually working and useful concepts that Damian "mad scientist" Conway discusses in his 2011 presentation Temporally Quaquaversal Virtual Nanomachine Programming In Multiple Topologically Connected Quantum-Relativistic Parallel Spacetimes... Made Easy!.
Here's a link to the bit where he focuses on these variables.
(The whole presentation is a delight, especially if you're interested in physics; programming techniques; watching highly creative wunderkinds; and/or enjoy outstanding presentation skills and humor.)
Create a PS pragma?
In terms of coolness, the following pales in comparison to Damian's positronic variable feature that I just covered, but it's an idea I had while pondering this question.
Someone could presumably implement something like the following pragma:
use PS;
say $text;
BEGIN $text = 'abc';
This PS would lexically apply no strict and in addition require that, to avoid a compile-time error:
An auto-declared variable/constant must match up with a post declaration in a BEGIN or INIT phaser;
The declaration must include initialization if the first use (textually) of a variable/constant is not a binding or assignment.

antlr global rule scope declaration vs #members declaration

Which one would you prefer to declare a variable in which case, global scope or #members declaration? It seems to me that they can serve for same purpose?
UPDATE here is a grammar to explain what i mean.
grammar GlobalVsScope;
scope global{
int i;
}
#lexer::header{package org.inanme.antlr;}
#parser::header{package org.inanme.antlr;}
#parser::members {
int j;
}
start
scope global;
#init{
System.out.println($global::i);
System.out.println(j);
}:R EOF;
R:'which one';
Note that besides global (ANTLR) scopes, you can also have local rule-scopes, like this:
grammar T;
options { backtrack=true; }
parse
scope { String x; }
parse
: 'foo'? ID {$parse::x = "xyz";} rule*
| 'foo' ID
;
rule
: ID {System.out.println("x=" + $parse::x);}
;
The only time I'd consider using local rule-scopes is when there are a lot of predicates, or global backtracking is enabled (resulting in all rules to have predicates in front of them). In that case, you could create a member variable String x (or define it in a global scope) and set it in the parse rule, but you might be changing this instance/scope variable after which the parser could backtrack, and this backtracking will not cause the global variable to be set to it's original form/state! The local scoped variable will also not be "unset", but that will likely be less of a risk: them being local to a single rule.
To summarize: yes, you're right, global scopes and member/instance variables are much alike. But I'd sooner opt for members-variables because of the friendlier syntax.

NullPointerException with ANTLR text attribute

I have a problem that I've been stuck on for a while and I would appreciate some help if possible.
I have a few rules in an ANTLR tree grammar:
block
: compoundstatement
| ^(VAR declarations) compoundstatement
;
declarations
: (^(t=type idlist))+
;
idlist
: IDENTIFIER+
;
type
: REAL
| i=INTEGER
;
I have written a Java class VarTable that I will insert all of my variables into as they are declared at the beginning of my source file. The table will also hold their variable types (ie real or integer). I'll also be able to use this variable table to check for undeclared variables or duplicate declarations etc.
So basically I want to be able to send the variable type down from the 'declarations' rule to the 'idlist' rule and then loop through every identifier in the idlist rule, adding them to my variable table one by one.
The major problem I'm getting is that I get a NullPointerException when I try and access the 'text' attribute if the $t variable in the 'declarations' rule (This is one one which refers to the type).
And yet if I try and access the 'text' attribute of the $i variable in the 'type' rule, there's no problem.
I have looked at the place in the Java file where the NullPointerException is being generated and it still makes no sense to me.
Is it a problem with the fact that there could be multiple types because the rule is
(^(typeidlist))+
??
I have the same issue when I get down to the idlist rule, becasue I'm unsure how I can write an action that will allow me to loop through all of the IDENTIFIER Tokens found.
Grateful for any help or comments.
Cheers
You can't reference the attributes from production rules like you tried inside tree grammars, only in parser (or combined) grammars (they're different objects!). Note that INTEGER is not a production rule, just a "simple" token (terminal). That's why you can invoke its .text attribute.
So, if you want to get a hold the text of the type rule in your tree grammar and print it in your declarations rule, your could do something like this:
tree grammar T;
...
declarations
: (^(t=type idlist {System.out.println($t.returnValue);}))+
;
...
type returns [String returnValue]
: i=INTEGER {returnValue = "[" + $i.text + "]";}
;
...
But if you really want to do it without specifying a return object, you could do something like this:
declarations
: (^(t=type idlist {System.out.println($t.start.getText());}))+
;
Note that type returns an instance of a TreeRuleReturnScope which has an attribute called start which in its turn is a CommonTree instance. You could then call getText() on that CommonTree instance.

Interpreting a variable number of tree nodes in ANTLR Tree Grammar

Whilst creating an inline ANTLR Tree Grammar interpreter I have come across an issue regarding the multiplicity of procedure call arguments.
Consider the following (faulty) tree grammar definition.
procedureCallStatement
: ^(PROCEDURECALL procedureName=NAME arguments=expression*)
{
if(procedureName.equals("foo")) {
callFooMethod(arguments[0], arguments[1]);
}elseif(procedureName.equals("bar")) {
callBarMethod(arguments[0], arguments[1], arguments[2]);
}
}
;
My problem lies with the retrieval of the given arguments. If there would be a known quantity of expressions I would just assign the values coming out of these expressions to their own variable, e.g.:
procedureCallStatement
: ^(PROCEDURECALL procedureName=NAME argument1=expression argument2=expression)
{
...
}
;
This however is not the case.
Given a case like this, what is the recommendation on interpreting a variable number of tree nodes inline within the ANTLR Tree Grammar?
Use the += operator. To handle any number of arguments, including zero:
procedureCallStatement
: ^(PROCEDURECALL procedureName=NAME argument+=expression*)
{
...
}
;
See the tree construction documentation on the antlr website.
The above will change the type of the variable argument from typeof(expression) to a List (well, at least when you're generating Java code). Note that the list types are untyped, so it's just a plain list.
If you use multiple parameters with the same variable name, they will also create a list, for example:
twoParameterCall
: ^(PROCEDURECALL procedureName=NAME argument=expression argument=expression)
{
...
}
;