Using a variable in a Perl 6 program before assigning to it - raku

I want to assign literals to some of the variables at the end of the file with my program, but to use these variables earlier. The only method I've come up with to do it is the following:
my $text;
say $text;
BEGIN {
$text = "abc";
}
Is there a better / more idiomatic way?

Just go functional.
Create subroutines instead:
say text();
sub text { "abc" }
UPDATE (Thanks raiph! Incorporating your feedback, including reference to using term:<>):
In the above code, I originally omitted the parentheses for the call to text, but it would be more maintainable to always include them to prevent the parser misunderstanding our intent. For example,
say text(); # "abc"
say text() ~ text(); # "abcabc"
say text; # "abc", interpreted as: say text()
say text ~ text; # ERROR, interpreted as: say text(~text())
sub text { "abc" };
To avoid this, you could make text a term, which effectively makes the bareword text behave the same as text():
say text; # "abc", interpreted as: say text()
say text ~ text; # "abcabc", interpreted as: say text() ~ text()
sub term:<text> { "abc" };
For compile-time optimizations and warnings, we can also add the pure trait to it (thanks Brad Gilbert!). is pure asserts that for a given input, the function "always produces the same output without any additional side effects":
say text; # "abc", interpreted as: say text()
say text ~ text; # "abcabc", interpreted as: say text() ~ text()
sub term:<text> is pure { "abc" };

Unlike Perl 5, in Perl 6 a BEGIN does not have to be a block. However, the lexical definition must be seen before it can be used, so the BEGIN block must be done before the say.
BEGIN my $text = "abc";
say $text;
Not sure whether this constitutes an answer to your question or not.

First, a rephrase of your question:
What options are there for succinctly referring to a variable (or constant etc.) whose initialization code appears further down in the same source file?
Post declare a routine
say foo;
sub foo { 'abc' }
When a P6 compiler parses an identifier that has no sigil, it checks to see if it has already seen a declaration of that identifier. If it hasn't, then it assumes that the identifier corresponds to a routine which will be declared later as a "listop" routine (which takes zero or more arguments) and moves on. (If its assumption turns out to be wrong, it fails the compilation.)
So you can use routines as if they were variables as described in Christopher Bottom's answer.
Autodeclare a variable on first use
strict is a "pragma" that controls how a P6 compiler reacts when it parses an as yet undeclared variable/constant that starts with a sigil.
P6 starts programs with strict mode switched on. This means that the compiler will insist on predeclaration of any sigil'd variable/constant. (By predeclaration I mean an explicit declaration that appears textually before the variable/constant is used.)
But you can write use strict or no strict to control whether the strict pragma is on or off in a given lexical scope, so this will work:
no strict;
say $text;
BEGIN {
$text = "abc";
}
Warning Having no strict in effect (which is unfortunately how most programming languages work) makes accidental misspelling of variable names a bigger nuisance than it is with use strict mode on.
Declare a variable explicitly in the same statement as its first use
You don't have to write a declaration as a separate statement. You can instead declare and use a variable in the same statement or expression:
say my $text;
BEGIN {
$text = "abc";
}
Warning If you repeat my $bar in the exact same lexical scope, the compiler will emit a warning. In contrast, say my $bar = 42; if foo { say my $bar = 99 } creates two distinct $bar variables without warning.
Initialize at run-time
The BEGIN phaser shown above runs at compile-time (after the my declaration, which also happens at compile-time, but before the say, which happens at run-time).
If you want to initialize variables/constants at run-time instead, use INIT instead:
say my $text;
INIT {
$text = "abc";
}
INIT code runs before any other run-time code, so the initialization still happens before the say gets executed.
Use a positronic (ym) variable
Given a literal interpretation of your question a "positronic" or ym variable would be yet another solution. (This feature is not built-in. I'm including it mostly because I encountered it after answering this question and think it belongs here, at the very least for entertainment value.)
Initialization and calculation of such a variable starts in the last statement using it and occurs backwards relative to the textual order of the code.
This is one of the several crazy sounding but actually working and useful concepts that Damian "mad scientist" Conway discusses in his 2011 presentation Temporally Quaquaversal Virtual Nanomachine Programming In Multiple Topologically Connected Quantum-Relativistic Parallel Spacetimes... Made Easy!.
Here's a link to the bit where he focuses on these variables.
(The whole presentation is a delight, especially if you're interested in physics; programming techniques; watching highly creative wunderkinds; and/or enjoy outstanding presentation skills and humor.)
Create a PS pragma?
In terms of coolness, the following pales in comparison to Damian's positronic variable feature that I just covered, but it's an idea I had while pondering this question.
Someone could presumably implement something like the following pragma:
use PS;
say $text;
BEGIN $text = 'abc';
This PS would lexically apply no strict and in addition require that, to avoid a compile-time error:
An auto-declared variable/constant must match up with a post declaration in a BEGIN or INIT phaser;
The declaration must include initialization if the first use (textually) of a variable/constant is not a binding or assignment.

Related

Is it possible to introspect into the scope of a Scalar at runtime?

If I have the following variables
my $a = 0;
my $*b = 1;
state $c = 2;
our $d = 3;
I can easily determine that $*b is dynamic but $a is not with the following code
say $a.VAR.dynamic;
say $*b.VAR.dynamic;
Is there any way to similarly determine that $c is a state variable and $d is a package-scoped variable? (I know that I could do so with a will trait on each variable declaration, but I'm hopping there is a way that doesn't require annotating every declaration. Maybe something with ::(...) interpolation?)
In the case of the package-scoped variable, not too hard:
our $foo = 'bar';
say $foo.VAR.name ∈ OUR::.keys
where we're using the OUR pseudopackage. However, there's no such thing as a STATE pseudopackage. They obviously show up in the LEXICAL pseudopackage, but I can't find a way to check if they're a state variable or not. Sorry.
To my knowledge, there is no way to recognize a state variable. Like any lexical, it lives in the lexpad. The only thing different about it, is that it effectively has code generated to do the initialization the first time the scope is entered.
As Elizabeth Mattijsen correctly noted, it is currently not possible to see whether a variable is a state variable at run time. ... at least technically at runtime.
However, as Jonathan Worthington's comment implies, it is possible to check this at compile time. And, absent deep meta-programming shenanigans, whether a variable is a state variable is immutable after compile-time. And, of course, it's possible to make note of some info at compile time and then use it at runtime.
Thus, it's possible to know, at runtime, whether a variable is a state one with (compile-time) code along the following lines, which provides a list-state-vars trait that lists all the state variables in a function:
multi trait_mod:<is>(Sub \f, :$list-state-vars) {
use nqp;
given f.^attributes.first({.name eq '#!compstuff'}).get_value(f)[0] {
say .list[0].list.grep({try .decl ~~ 'statevar'}).map({.name});
}
};
This code is obviously pretty fragile/dependent on the Rakudo implementation details of QAST. Hopefully this will be much easier with RAST, but this basic approach is already workable and, in the meantime, this guide to QAST hacking is a helpful resource for this sort of meta programming.

Alter how arguments are processed before they're passed to sub MAIN

Given the documentation and the comments on an earlier question, by request I've made a minimal reproducible example that demonstrates a difference between these two statements:
my %*SUB-MAIN-OPTS = :named-anywhere;
PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True;
Given a script file with only this:
#!/usr/bin/env raku
use MyApp::Tools::CLI;
and a module file in MyApp/Tools called CLI.pm6:
#PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True;
my %*SUB-MAIN-OPTS = :named-anywhere;
proto MAIN(|) is export {*}
multi MAIN( 'add', :h( :$hostnames ) ) {
for #$hostnames -> $host {
say $host;
}
}
multi MAIN( 'remove', *#hostnames ) {
for #hostnames -> $host {
say $host;
}
}
The following invocation from the command line will not result in a recognized subroutine, but show the usage:
mre.raku add -h=localhost -h=test1
Switching my %*SUB-MAIN-OPTS = :named-anywhere; for PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True; will print two lines with the two hostnames provided, as expected.
If however, this is done in a single file as below, both work identical:
#!/usr/bin/env raku
#PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True;
my %*SUB-MAIN-OPTS = :named-anywhere;
proto MAIN(|) is export {*}
multi MAIN( 'add', :h( :$hostnames )) {
for #$hostnames -> $host {
say $host;
}
}
multi MAIN( 'remove', *#hostnames ) {
for #hostnames -> $host {
say $host;
}
}
I find this hard to understand.
When reproducing this, be alert of how each command must be called.
mre.raku remove localhost test1
mre.raku add -h=localhost -h=test1
So a named array-reference is not recognized when this is used in a separate file with my %*SUB-MAIN-OPTS = :named-anywhere;. While PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True; always works. And for a slurpy array, both work identical in both cases.
The problem is that it isn't the same variable in both the script and in the module.
Sure they have the same name, but that doesn't mean much.
my \A = anon class Foo {}
my \B = anon class Foo {}
A ~~ B; # False
B ~~ A; # False
A === B; # False
Those two classes have the same name, but are separate entities.
If you look at the code for other built-in dynamic variables, you see something like:
Rakudo::Internals.REGISTER-DYNAMIC: '$*EXECUTABLE-NAME', {
PROCESS::<$EXECUTABLE-NAME> := $*EXECUTABLE.basename;
}
This makes sure that the variable is installed into the right place so that it works for every compilation unit.
If you look for %*SUB-MAIN-OPTS, the only thing you find is this line:
my %sub-main-opts := %*SUB-MAIN-OPTS // {};
That looks for the variable in the main compilation unit. If it isn't found it creates and uses an empty Hash.
So when you try do it in a scope other than the main compilation unit, it isn't in a place where it could be found by that line.
To test if adding that fixes the issue, you can add this to the top of the main compilation unit. (The script that loads the module.)
BEGIN Rakudo::Internals.REGISTER-DYNAMIC: '%*SUB-MAIN-OPTS', {
PROCESS::<%SUB-MAIN-OPTS> := {}
}
Then in the module, write this:
%*SUB-MAIN-OPTS = :named-anywhere;
Or better yet this:
%*SUB-MAIN-OPTS<named-anywhere> = True;
After trying this, it seems to work just fine.
The thing is, that something like that used to be there.
It was removed on the thought that it slows down every Raku program.
Though I think that any slowdown it caused would still be an issue as the line that is still there has to look to see if there is a dynamic variable of that name.
(There are more reasons given, and I frankly disagree with all of them.)
May a cuppa bring enlightenment to future SO readers pondering the meaning of things.[1]
Related answers by Liz
I think Liz's answer to an SO asking a similar question may be a good read for a basic explanation of why a my (which is like a lesser our) in the mainline of a module doesn't work, or at least confirmation that core devs know about it.
Her later answer to another SO explains how one can use my by putting it inside a RUN-MAIN.
Why does a slurpy array work by default but not named anywhere?
One rich resource on why things are the way they are is the section Declaring a MAIN subroutine of S06 (Synopsis on Subroutines)[2].
A key excerpt:
As usual, switches are assumed to be first, and everything after the first non-switch, or any switches after a --, are treated as positionals or go into the slurpy array (even if they look like switches).
So it looks like this is where the default behavior, in which nameds can't go anywhere, comes from; it seems that #Larry[3] was claiming that the "usual" shell convention was as described, and implicitly arguing that this should dictate that the default behavior was as it is.
Since Raku was officially released RFC: Allow subcommands in MAIN put us on the path to todays' :named-anywhere option. The RFC presented a very powerful 1-2 punch -- an unimpeachable two line hackers' prose/data argument that quickly led to rough consensus, with a working code PR with this commit message:
Allow --named-switches anywhere in command line.
Raku was GNU-like in that it has '--double-dashes' and that it stops interpreting named parameters when it encounters '--', but unlike GNU-like parsing, it also stopped interpreting named parameters when encountering any positional argument. This patch makes it a bit more GNU-like by allowing named arguments after a positional, to prepare for allowing subcommands.
> Alter how arguments are processed before they're passed to sub MAIN
In the above linked section of S06 #Larry also wrote:
Ordinarily a top-level Raku "script" just evaluates its anonymous mainline code and exits. During the mainline code, the program's arguments are available in raw form from the #*ARGS array.
The point here being that you can preprocess #*ARGS before they're passed to MAIN.
Continuing:
At the end of the mainline code, however, a MAIN subroutine will be called with whatever command-line arguments remain in #*ARGS.
Note that, as explained by Liz, Raku now has a RUN-MAIN routine that's called prior to calling MAIN.
Then comes the standard argument processing (alterable by using standard options, of which there's currently only the :named-anywhere one, or userland modules such as SuperMAIN which add in various other features).
And finally #Larry notes that:
Other [command line parsing] policies may easily be introduced by calling MAIN explicitly. For instance, you can parse your arguments with a grammar and pass the resulting Match object as a Capture to MAIN.
A doc fix?
Yesterday you wrote a comment suggesting a doc fix.
I now see that we (collectively) know about the coding issue. So why is the doc as it is? I think the combination of your SO and the prior ones provide enough anecdata to support at least considering filing a doc issue to the contrary. Then again Liz hints in one of the SO's that a fix might be coming, at least for ours. And SO is itself arguably doc. So maybe it's better to wait? I'll punt and let you decide. At least you now have several SOs to quote if you decide to file a doc issue.
Footnotes
[1] I want to be clear that if anyone perceives any fault associated with posting this SO then they're right, and the fault is entirely mine. I mentioned to #acw that I'd already done a search so they could quite reasonably have concluded there was no point in them doing one as well. So, mea culpa, bad coffee inspired puns included. (Bad puns, not bad coffee.)
[2] Imo these old historical speculative design docs are worth reading and rereading as you get to know Raku, despite them being obsolete in parts.
[3] #Larry emerged in Raku culture as a fun and convenient shorthand for Larry Wall et al, the Raku language team led by Larry.

Sigilless variables, constants, bindings: when to choose what

So, from the documentation is almost clear what the three entities from the title are, but it is not very clear what their purpose is.
Constants are common in many languages. You don't want to write 3.14 all over your code and that's why you define a constant PI that, by the nature of the thing it represents, cannot be changed and that's why its value is constant in time.
Defining a variable bound to another entity with := is also almost clear but not really. If I bind a variable to 3.14, isn't it the same as defining a PI constant? Is $a := $b actually defining $a as an alias to $b or, as some other languages call it, a reference? Also, references are generally used in some languages to make it clear, in object construction or function calls, that you don't want to copy an object around, but why would it be useful, in the same scope, to have
$number_of_cakes = 4;
$also_the_number_of_cakes := $number_of_cakes;
?
Finally, the documentation explains how one can define a sigilless variable (that cannot be varied, so, actually, yet another kind of constant) but not why one would do that. What problems do sigilless variables solve? What's a practical scenario when a variable, a constant, a binding variable do not solve the problem that a sigilless variable solve?
Constants are common in many languages. You don't want to write 3.14 all over your code and that's why you define a constant PI that, by the nature of the thing it represents, cannot be changed and that's why its value is constant in time.
For that you use constant.
A constant is only initialized once, when first encountered during compilation, and never again, so its value remains constant throughout a program run. The compiler can rely on that. So can the reader -- provided they clearly understand what "value" means in this context.1
For example:
BEGIN my $bar = 42;
loop {
constant foo = $bar;
$bar++;
last if $++ > 4;
print foo; # 4242424242
}
Note that if you didn't have the BEGIN then the constant would be initialized and be stuck with a value of (Any).
I tend to leave off the sigil to somewhat reinforce the idea it's a constant but you can use a sigil if you prefer.
Defining a variable bound to another entity with := is also almost clear but not really. If I bind a variable to 3.14, isn't it the same as defining a PI constant? Is $a := $b actually defining $a as an alias to $b or, as some other languages call it, a reference?
Binding just binds the identifiers. If either changes, they both do:
my $a = 42;
my $b := $a;
$a++;
$b++;
say $a, $b; # 4444
Finally, the documentation explains how one can define a sigilless variable (that cannot be varied, so, actually, another kind of constant) but not why one would do that.
It can be varied if the thing bound is a variable. For example:
my $variable = 42; # puts the value 42 INTO $variable
my \variable = $variable; # binds variable to $variable
say ++variable; # 43
my \variable2 = 42; # binds variable2 to 42
say ++variable2; # Cannot resolve caller prefix:<++>(Int:D);
# ... candidates ... require mutable arguments
I personally prefer to slash sigils if an identifier is bound to an immutable basic scalar value (eg 42) or other entirely immutable value (eg a typical List) and use a sigil otherwise. YMMV.
What problems do sigilless variables solve? What's a practical scenario when a variable, a constant, a binding variable do not solve the problem that a sigilless variable solve?
Please add comments if what's already in my answer (or another; I see someone else has posted one) leaves you with remaining questions.
Foonotes
1 See my answer to JJ's SO about use of constant with composite containers.
That's not one but at 5 questions?
If I bind a variable to 3.14, isn't it the same as defining a PI constant?
Well, technically it would be except for the fact that 3.14 is a Rat, and pi (aka π) is a Num.
Is $a := $b actually defining $a as an alias to $b or, as some other languages call it, a reference?
It's an alias.
$number_of_cakes = 4; $also_the_number_of_cakes := $number_of_cakes;
There would be little point. However, sometimes it can be handy to alias something to an element in an array, or a key in a hash, to prevent repeated lookups:
my %foo;
my $bar := %foo<bar>;
++$bar for ^10;
What problems do sigilless variables solve?
Sigilless variables only solve a programming style issue, as far as I know. Some people, for whatever reason, prefer sigilless varriables. It makes it harder to interpolate, but others might find the extra curlies actually a good thing:
my answer := my $ = 42;
say "The answer is {answer}";
What's a practical scenario when a variable, a constant, a binding variable do not solve the problem that a sigilless variable solve?
In my view, sigilless variables do not solve a problem, but rather try to cater different programming styles. So I'm not sure what a correct answer to this question would be.
Also constant is our scoped by default.

How to die on undefined values?

I am attempting to work with a hash in Raku, but when I put some fake values into it (intentionally) like
say %key<fake_key>;
I get
(Any)
but I want the program to die in such occurrences, as Perl does, because this implies that important data is missing.
For example,
#!/usr/bin/env perl
use strict;
use warnings 'FATAL' => 'all';
use autodie ':all';
my %hash;
print "$hash{y}\n";
as of 5.26.1 produces
Use of uninitialized value $hash{"y"} in concatenation (.) or string at undefined.pl line 8.
Command exited with non-zero status 255
How can I get the equivalent of use warnings 'FATAL' => 'all' and use autodie qw(:all) in Raku?
Aiui your question is:
I'm looking for use autodie qw(:all) & use warnings 'FATAL' => 'all' in Raku
The equivalent of Perl's autodie in Raku
Aiui the equivalent of use autodie qw(:all) in Perl is use fatal; in Raku. This is a lexically scoped effect (at least it is in Raku).
The autodie section in the Perl-to-Raku nutshell guide explains that routines now return Failures to indicate errors.
The fatal pragma makes returning a Failure from a routine automatically throw an exception that contains the Failure. Unless you provide code that catches them, these exceptions that wrap Failures automatically die.
The equivalent of Perl's use warnings 'FATAL' in Raku
Aiui the equivalent of use warnings 'FATAL' => 'all' in Perl is CONTROL { when CX::Warn { note $_; exit 1 } } in Raku. This is a lexically scoped effect (at least it is in Raku).
CONTROL exceptions explains how these work.
CONTROL exceptions are a subset of all exceptions that are .resume'd by default -- the program stays alive by default when they're thrown.
The Raku code I've provided (which is lifted from How could I make all warnings fatal? which you linked to) instead makes CONTROL exceptions die (due to the exit routine).
Returning to your current question text:
say %key<fake_key>; # (Any)
I want the program to die in such occurrences ...
Use either Jonathan++'s answer (use put, which, unlike say, isn't trying to keep your program alive) or Scimon++'s KeyRequired answer which will make accessing of a non-existent key fatal.
... as Perl does ...
Only if you use use warnings 'FATAL' ..., just as Raku does if you use the equivalent.
... because this implies that important data is missing.
Often it implies unimportant data is missing, or even important data that you don't want defined some of the time you attempt to access it, so both Perls and Raku default to keeping your program alive and require that you tell it what you want if you want something different.
You can use the above constructs to get the precise result you want and they'll be limited to a given variable (if you use the KeyRequired role) or statement (using put instead of say) or lexical scope (using a pragma or the CONTROL block).
You can create a role to do this. A simple version would be :
role KeyRequired {
method AT-KEY( \key ) {
die "Key {key} not found" unless self.EXISTS-KEY(key);
nextsame;
}
};
Then you create your hash with : my %key does KeyRequired; and it will die if you request a non existent key.
The simple answer is to not use say.
say %key<fake_key>;
# (Any)
put %key<fake_key>;
# Use of uninitialized value of type Any in string context.
# Methods .^name, .perl, .gist, or .say can be used to stringify it to something
# meaningful.
# in block <unit> at <unknown file> line 1
say calls .gist which prints enough information for a human to understand what was printed.
put just tries to turn it into a Str and print that, but in this case it produces an error.
Perl 5 warns, not dies, in this case. Perl 6 will do the same if equivalent code is used:
my %key;
print "%key<fake_key>\n"; # Gives a warning
Or, more neatly, use put:
my %key;
put %key<fake_key>;
The put routine ("print using terminator") will stringify the value, which is what triggers the warning about use of an undefined value in string context.
By contrast, say does not stringify, but instead calls .gist on the object, and prints whatever it returns. In the case of an undefined value, the gist of it is the name of its type, wrapped in parentheses. In general, say - which uses .gist underneath - gives more information. For example, consider an array:
my #a = 1..5;
put #a; # 1 2 3 4 5
say #a; # [1 2 3 4 5]
Where put just joins the elements with spaces, but say represents the structure and that it's an array.

What scope does ":my $foo" have and what is it used for?

With a regex, token or rule, its possible to define a variable like so;
token directive {
:my $foo = "in command";
<command> <subject> <value>?
}
There is nothing about it in the language documentation here, and very little in S05 - Regexes and Rules, to quote;
Any grammar regex is really just a kind of method, and you may declare variables in such a routine using a colon followed by any scope declarator parsed by the Perl 6 grammar, including my, our, state, and constant. (As quasi declarators, temp and let are also recognized.) A single statement (up through a terminating semicolon or line-final closing brace) is parsed as normal Perl 6 code:
token prove-nondeterministic-parsing {
:my $threshold = rand;
'maybe' \s+ <it($threshold)>
}
I get that regexen within grammars are very similar to methods in classes; I get that you can start a block anywhere within a rule and if parsing successfully gets to that point, the block will be executed - but I don't understand what on earth this thing is for.
Can someone clearly define what it's scope is; explain what need it fulfills and give the typical use case?
What scope does :my $foo; have?
:my $foo ...; has the lexical scope of the rule/token/regex in which it appears.
(And :my $*foo ...; -- note the extra * signifying a dynamic variable -- has both the lexical and dynamic scope of the rule/token/regex in which it appears.)
What this is used for
Here's what happens without this construct:
regex scope-too-small { # Opening `{` opens a regex lexical scope.
{ my $foo = / bar / } # Block with its own inner lexical scope.
$foo # ERROR: Variable '$foo' is not declared
}
grammar scope-too-large { # Opening `{` opens lexical scope for gramamr.
my $foo = / bar / ;
regex r1 { ... } # `$foo` is recognized inside `r1`...
...
regex r999 { ... } # ...but also inside r999
}
So the : ... ; syntax is used to get exactly the desired scope -- neither too broad nor too narrow.
Typical use cases
This feature is typically used in large or complex grammars to avoid lax scoping (which breeds bugs).
For a suitable example of precise lexical only scoping see the declaration and use of #extra_tweaks in token babble as defined in a current snapshot of Rakudo's Grammar.nqp source code.
P6 supports action objects. These are classes with methods corresponding one-to-one with the rules in a grammar. Whenever a rule matches, it calls its corresponding action method. Dynamic variables provide precisely the right scoping for declaring variables that are scoped to the block (method, rule, etc.) they're declared in both lexically and dynamically -- which latter means they're available in the corresponding action method too. For an example of this, see the declaration of #*nibbles in Rakudo's Grammar module and its use in Rakudo's Actions module.