How are dynamically scoped variables implemented in Rakudo/MoarVM? - raku

That is, variables like $*scalar, #*array and %*hash.
I'm asking this question mainly because I want to have an idea of how much of a burden they are on the overall performance of the grammar/regex engine.
To add a bit of precision to my question, I would like to know what happens :
on lookup,
my $var-1 = $*scalar
my $var-2 = #*array[3]
my $var-3 = %*hash<key>
and on insertion,
$*scalar = "new value 1"
#*array[3] = "new value 2"
%*hash{"existing key"} = "new value 3"
%*hash{"new key"} = "new value 3"
in relationship with the symbol table, the call stack, or any data structure involved like a special stack for handling dynamical scoping, or maybe a hash table of stacks.
My main interest is with array and hashes, and if they are both entirely duplicated each time we push/pop/modify/insert an array element or insert/modify/delete a key/value pair of a hash.
EDIT:
My assumptions about dynamical scoping was wrong. I thought that, when modifying a dynamically scoped hash, the changes would be undone at the end of the lexical scope in which the change has been made, similar to what happens in Perl with the local keyword and its associated "temporaries stack", hence my "Is the data structure duplicated ?" question. But this is not what dynamically scoped means it seems.
My assumption was that this following Raku code was equivalent to the following Perl code.
Instead, to replicate the behavior of the Perl code, we must do what the 2nd piece of Raku code do.
Raku 1:
my %*hash = (key => "value");
sub test() {
say "test() start";
say "\t", %*hash;
%*hash<key>="new value";
say "\t", %*hash;
say "test() end";
}
say %*hash;
test();
say %*hash;
OUTPUT:
{key => value}
test() start
{key => value}
{key => new value}
test() end
{key => new value}
Perl
our %hash=(key => "value");
sub test {
local %hash=%hash;
say "test() start";
say "\t", %hash;
$hash{key}="new value";
say "\t", %hash;
say "test() end"
}
say %hash;
test();
say %hash;
OUTPUT:
keyvalue
test() start
keyvalue
keynew value
test() end
keyvalue
Raku 2:
my %*hash = (key => "value");
sub test() {
my %*hash=CALLERS::<%*hash>;
say "test() start";
say "\t", %*hash;
%*hash<key>="new value";
say "\t", %*hash;
say "test() end";
}
say %*hash;
test();
say %*hash;
OUTPUT:
{key => value}
test() start
{key => value}
{key => new value}
test() end
{key => value}

The costs of looking up a dynamically scoped variable are independent of the sigil. For the purposes of lookup, the sigil is just part of the variable name to locate. The compilation of #a[3] = $x and #*a[3] = $x are identical except for the lookup of the #a/#*a - that is, they both result in a call to postcircumfix:<[ ]>, passing the resolved variable, the index, and the value to assign, and that leads to the assignment.
Arrays and hashes are implemented as mutable data structures; the only significant copying that would take place is if the dynamic array has to grow or the hash table needs its backing storage to grow to continue to provide efficient lookup.
Putting aside the fallback to GLOBAL and PROCESS, dynamic variables are stored in lexical symbol tables (thus their declaration with my). In MoarVM, the term "frame" is used for a record on the callstack, conceptually [1] created at the entry to a sub or block and destroyed at its exit. The term "static frame" is used for a data structure that represents all the things that are invariant over invocations of a sub or block. (So, if one writes a recursive sub and call it, there are many frames, but only one static frame). So far as lexical storage goes, the static frame contains a hash table mapping lexical variable names (strings) into integers. The values of a lexical are stored in the frame, as an array with the indices mapped by the static frame's hash table.
Most lexical variable lookups ($a) entirely bypass doing named lookups, being resolved at compile time to the applicable index.
By contrast, dynamic variable lookups ($*a) do use the static frame's lookup table. Dynamic variable lookup proceeds by walking the callstack, doing a lookup in the static frame's hash of lexicals to see if such a variable is declared, and if so the index is used to resolve it in the frame. There are some generic mechanisms and one special-case mechanism to improve performance here.
The generic ones are:
Strings have their hash codes cached, so the string $*a won't have to be repeatedly hashed; the integer hash code is just there and ready for the hash table lookup.
Strings are interned within a compilation unit, so if the declaration and usage of a dynamic variable occur within the same file, then the string comparison can short-circuit at pointer equality.
The special case mechanism consists of a dynamic variable lookup cache that can be established on any frame. This stores a name together with a pointer to where the dynamic variable is stored on the callstack. This provides a "shortcut" that can be especially valuable for frequently accessed dynamic variables that are declared a huge number of frames down the callstack. See this code for more details on how the cache entries are installed. (They are invalidated inside of the sliced frames when a continuation is taken; they used to also be broadly invalidated during deoptimization, but that changed with the adoption of lazy deoptimization on stack unwind a year or so ago.)
[1] Many Raku programs running on MoarVM enjoy a relatively high rate of inlining, which eliminates the cost of creation/destruction of a frame.

Related

Terraform module as "custom function"

It is possible to use some i.e. local module to return let' say same calculated output. But how can you pass some parameters? So each time you will ask for the output value you will get different value according to the parameter(ie different prefix)
Is it possible to pass resource to module and enhance it with tags?
I can imagine that both cases are more likely to be case for providers, but for some simple case it should work maybe. The best would be if they implemented some custom function that you will be able to call at will.
It is possible in principle to write a Terraform module that only contains "named values", which is the broad term for the three module features Input Variables (analogous to function arguments), Local Values (analogous to local declarations inside your function), and Output Values (analogous to return values).
Such a module would not contain any resource or data blocks at all and would therefore be a "computation-only" module, which therefore has all of the same capabilities as a function in a functional programming language.
variable "a" {
type = number
}
variable "b" {
type = number
}
locals {
sum = var.a + var.b
}
output "sum" {
value = local.sum
}
The above example is contrived just to show the principle. A "function" this simple doesn't really need the local value local.sum, because its expression could just be written inline in the value of output "sum", but I wanted to show examples of all three of the relevant constructs here.
You would "call the function" by declaring a module call referring to the directory containing the file with the above source code in it:
module "example" {
source = "./modules/sum"
a = 1
b = 2
}
output "result" {
value = module.example.sum
}
I included the output "result" block here to show how you can refer to the result of the "function" elsewhere in your module, as module.example.sum.
Of course, this syntax is much more "chunky" than a typical function call, so in practice Terraform module authors will use this approach only when the factored out logic is significant enough to justify it. Verbosity aside though, you can include as many module blocks referring to that same module as you like if you need to call the "function" with different sets of arguments. Each call to the module can take a different set of input variable values and therefore produce a different result.

Is the term immutable variable just a convention?

In Rust variables are immutable by default, i.e., they don't vary but are not constants (as noted here).
Do they retain the name "variable" just by convention, or is there another reason why the term "variable" is maintained?
It should be noted that the term mut in Rust was hotly debated before stabilization with some arguing that it should be called excl or uniq. The matter is that the mut in in let mut x and &mut x are two completely different things.
let mut x declares that x is mutable, in the sense that it can be re-assigned, but also that one can take a &mut reference of it; which is best called an exclusive or unique reference. It is quite possible in Rust in some cases to mutate through a shared reference in the case of std::cell::Cell, for instance, and not all operations that require an exclusive reference involve mutation. An operation that requires an exclusive reference is simply one that would be unsafe with a shared one; Cell is designed in such a way that it is not, by strictly controlling under what conditions mutation can occur.
In theory, the two functions of let mut x could have different keywords, but they are compressed into one for simplicity. Rust could in theory be designed with mut and excl being different keywords, and allowing for let excl x, which would be a variable wherefrom one could take an exclusive reference, but not mutate.
One can also have variables that are not declared with mut, in particular in function calls. In a signature like fn func ( x : u32 ), x is not mutable, but it is variable, because it a different x can be passed every single time.
The let mut x type of "mutable" is purely a lint and, in theory, unnecessary for Rust to work — any currently working Rust program will continue to work if all non-mutable variables be made mutable. It's simply considered bad practice to do so and the compiler will warn the programmer whenever he make a variable mutable that isn't necessary to be mutable; this helps catching unintended bugs. This is absolutely not the case with exclusive and shared references, which are necessary to be distinguished and more than just a lint.
Here "variable" means "factor involved in computation" not "varying". This is from the mathematical principle where expressions like f(x) include x, a variable, as a part of the equation.
In Rust, like with other languages, you'll need variables (e.g. input) that affects how the program runs, otherwise your program would only ever behave in a singular, specific way, producing the same output each time.
You'll need to think of what variables change during processing and which do not. Those that do not need to change do not need to be declared mutable.
Regardless of if or when they change, they're still considered variables.
In C++ you'll have things like const int x which is a constant (read-only) variable, so the term can take on all sorts of specific meanings.
Is the term immutable variable just a convention?
By definition every... definition of a word is a convention, language, meaning of the word, change by time, is unique for every people that live, you can take 100 peoples and end with 100 difference definition of 1 word. That why we often start scientific paper by defining word that could be miss understand in the paper. Trying to clarify as much as possible. Rust does not differs that why we have The Reference
We have a specific section for variable
A variable is a component of a stack frame, either a named function
parameter, an anonymous temporary, or a named local variable.
A local variable (or stack-local allocation) holds a value directly,
allocated within the stack's memory. The value is a part of the stack
frame.
Local variables are immutable unless declared otherwise. For example:
let mut x = ....
Function parameters are immutable unless declared with mut. The mut
keyword applies only to the following parameter. For example: |mut x,
y| and fn f(mut x: Box, y: Box) declare one mutable variable
x and one immutable variable y.
Local variables are not initialized when allocated. Instead, the
entire frame worth of local variables are allocated, on frame-entry,
in an uninitialized state. Subsequent statements within a function may
or may not initialize the local variables. Local variables can be used
only after they have been initialized through all reachable control
flow paths.
So there is not much to add, variable in rust is clearly defined, it doesn't matter if your definition doesn't match or you find a definition of variable that doesn't match Rust one. In the context of Rust, variable is that. If you want to ask about opinion about this choice then it's off topic as opinion oriented. But, wiki definition make Rust definition quite standard both from mathematics view than computer science:
Variable (computer science), a symbolic name associated with a value and whose associated value may be changed
Variable (mathematics), a symbol that represents a quantity in a mathematical expression, as used in many sciences

Array of objects

Let's say I want to connect to two package repositories, make a query for a package name, combine the result from the repos and process it (filter, unique, prioritize,...), What is a good way to do that?
What I though about is creating Array of two Cro::HTTP::Client objects (with base-uri specific to each repo), and when I need to make HTTP request I call #a>>.get, then process the result from the repos together.
I have attached a snippet of what I'm trying to do. But I would like to see if there is a better way to do that. or if the approach mention in the following link is suitable for this use case! https://perl6advent.wordpress.com/2013/12/08/day-08-array-based-objects/
use Cro::HTTP::Client;
class Repo {
has $.name;
has Cro::HTTP::Client $!client;
has Cro::Uri $.uri;
has Bool $.disable = False;
submethod TWEAK () {
$!client = Cro::HTTP::Client.new(base-uri => $!uri, :json);
}
method get (:$package) {
my $path = <x86_64?>;
my $resp = await $!client.get($path ~ $package);
my $json = await $resp.body;
return $json;
}
}
class AllRepos {
has Repo #.repo;
method get (:$package) {
# check if some repos are disabled
my #candidate = #!repo>>.get(:$package).unique(:with(&[eqv])).flat;
# do furthre processign of the data then return it;
return #candidate;
}
}
my $repo1 = Repo.new: name => 'repo1', uri => Cro::Uri.new(:uri<http://localhost:80>);
my $repo2 = Repo.new: name => 'repo2', uri => Cro::Uri.new(:uri<http://localhost:77>);
my #repo = $repo1, $repo2;
my $repos = AllRepos.new: :#repo;
#my #packages = $repos.get: package => 'rakudo';
Let's say I want to connect to two package repositories, make a query for a package name, combine the result from the repos and process it (filter, unique, prioritize,...), What is a good way to do that?
The code you showed looks like one good way in principle but not, currently, in practice.
The hyperoperators such as >>:
Distribute an operation (in your case, connect and make a query) ...
... to the leaves of one or two input composite data structures (in your case the elements of one array #!repo) ...
... with logically parallel semantics (by using a hyperoperator you are declaring that you are taking responsibility for thinking that the parallel invocations of the operation will not interfere with each other, which sounds reasonable for connecting and querying) ...
... and then return a resulting composite data structure with the same shape as the original structure if the hyperoperator is a unary operator (which applies in your case, because you applied >>, which is an unary operator which takes a single argument on its left, so the result of the >>.get is just a new array, just like the input #!repo) or whose shape is the hyper'd combination of the shapes of the pair of structures if the hyperoperator is a binary operator, such as >>op<< ...
... which can then be further processed (in your case it is, with .unique, which will produce a resulting Seq) ...
... whose elements you then assign back into another array (#candidate).
So your choice is a decent fit in principle, but the commitment to parallelism is only semantic and right now the Rakudo compiler never takes advantage of it, so it will actually run your code sequentially, which presumably isn't a good fit in practice.
Instead I suggest you consider:
Using map to distribute an operation over multiple elements (in a shallow manner; map doesn't recursively descend into a deep structure like the hyperoperators, deepmap etc., but that's OK for your use case) ...
... in combination with the race method which parallelizes the method it proceeds.
So you might write:
my #candidate =
#!repo.hyper.map(*.get: :$package).unique(:with(&[eqv])).flat;
Alternatively, check out task 94 in Using Perl 6.
if the approach mention in the following link is suitable for this use case! https://perl6advent.wordpress.com/2013/12/08/day-08-array-based-objects/
I don't think so. That's about constructing a general purpose container that's like an array but with some differences to the built in Array that are worth baking into a new type.
I can just about imagine such things that are vaguely related to your use case -- eg an array type that automatically hyper distributes method calls invoked on it, if they're defined on Any or Mu (rather than Array or List), i.e. does what I described above but with the code #!repo.get... instead of hyper #!repo.map: *.get .... But would it be worth it (assuming it would work -- I haven't thought about it beyond inventing the idea for this answer)? I doubt it.
More generally...
It seems like what you are looking for is cookbook like material. Perhaps a question posted at the reddit sub /r/perl6 is in order?

Mixing-in roles in traits apparently not working

This example is taken from roast, although it's been there for 8 years:
role doc { has $.doc is rw }
multi trait_mod:<is>(Variable $a, :$docced!) {
$a does doc.new(doc => $docced);
}
my $dog is docced('barks');
say $dog.VAR;
This returns Any, without any kind of role mixed in. There's apparently no way to get to the "doc" part, although the trait does not error. Any idea?
(This answer builds on #guifa's answer and JJ's comment.)
The idiom to use in variable traits is essentially $var.var.VAR.
While that sounds fun when said aloud it also seems crazy. It isn't, but it demands explanation at the very least and perhaps some sort of cognitive/syntactic relief.
Here's the brief version of how to make some sense of it:
$var makes sense as the name of the trait parameter because it's bound to a Variable, a compiler's-eye view of a variable.
.var is needed to access the user's-eye view of a variable given the compiler's-eye view.
If the variable is a Scalar then a .VAR is needed as well to get the variable rather than the value it contains. (It does no harm if it isn't a Scalar.)
Some relief?
I'll explain the above in more detail in a mo, but first, what about some relief?
Perhaps we could introduce a new Variable method that does .var.VAR. But imo this would be a mistake unless the name for the method is so good it essentially eliminates the need for the $var.var.VAR incantation explanation that follows in the next section of this answer.
But I doubt such a name exists. Every name I've come up with makes matters worse in some way. And even if we came up with the perfect name, it would still barely be worth it at best.
I was struck by the complexity of your original example. There's an is trait that calls a does trait. So perhaps there's call for a routine that abstracts both that complexity and the $var.var.VAR. But there are existing ways to reduce that double trait complexity anyway, eg:
role doc[$doc] { has $.doc is rw = $doc}
my $dog does doc['barks'];
say $dog.doc; # barks
A longer explanation of $var.var.VAR
But $v is already a variable. Why so many var and VARs?
Indeed. $v is bound to an instance of the Variable class. Isn't that enough?
No, because a Variable:
Is for storing metadata about a variable while it's being compiled. (Perhaps it should have been called Metadata-About-A-Variable-Being-Compiled? Just kidding. Variable looks nice in trait signatures and changing its name wouldn't stop us needing to use and explain the $var.var.VAR idiom anyway.)
Is not the droid we are looking for. We want a user's-eye view of the variable. One that's been declared and compiled and is then being used as part of user code. (For example, $dog in the line say $dog.... Even if it were BEGIN say $dog..., so it ran at compile-time, $dog would still refer to a symbol that's bound to a user's-eye view container or value. It would not refer to the Variable instance that's only the compiler's-eye view of data related to the variable.)
Makes life easier for the compiler and those writing traits. But it requires that a trait writer accesses the user's-eye view of the variable to access or alter the user's-eye view. The .var attribute of the Variable stores that user's-eye view. (I note the roast test has a .container attribute that you omitted. That's clearly now been renamed .var. My guess is that that's because a variable may be bound to an immutable value rather than a container so the name .container was considered misleading.)
So, how do we arrive at $var.var.VAR?
Let's start with a variant of your original code and then move forward. I'll switch from $dog to #dog and drop the .VAR from the say line:
multi trait_mod:<is>(Variable $a, :$docced!) {
$a does role { has $.doc = $docced }
}
my #dog is docced('barks');
say #dog.doc; # No such method 'doc' for invocant of type 'Array'
This almost works. One tiny change and it works:
multi trait_mod:<is>(Variable $a, :$docced!) {
$a.var does role { has $.doc = $docced }
}
my #dog is docced('barks');
say #dog.doc; # barks
All I've done is add a .var to the ... does role ... line. In your original, that line is modifying the compiler's-eye view of the variable, i.e. the Variable object bound to $a. It doesn't modify the user's-eye view of the variable, i.e. the Array bound to #dog.
As far as I know everything now works correctly for plural containers like arrays and hashes:
#dog[1] = 42;
say #dog; # [(Any) 42]
say #dog.doc; # barks
But when we try it with a Scalar variable:
my $dog is docced('barks');
we get:
Cannot use 'does' operator on a type object Any.
This is because the .var returns whatever it is that the user's-eye view variable usually returns. With an Array you get the Array. But with a Scalar you get the value the Scalar contains. (This is a fundamental aspect of P6. It works great but you have to know it in these sorts of scenarios.)
So to get this to appear to work again we have to add a couple .VAR's as well. For anything other than a Scalar a .VAR is a "no op" so it does no harm to cases other than a Scalar to add it:
multi trait_mod:<is>(Variable $a, :$docced!) {
$a.var.VAR does role { has $.doc = $docced }
}
And now the Scalar case also appears to work:
my $dog is docced('barks');
say $dog.VAR.doc; # barks
(I've had to reintroduce the .VAR in the say line for the same reason I had to add it to the $a.var.VAR ... line.)
If all were well that would be the end of this answer.
A bug
But something is broken. If we'd attempted to initialize the Scalar variable:
my $dog is docced('barks') = 42;
we'd see:
Cannot assign to an immutable value
As #guifa noted, and I stumbled on a while back:
It seems that a Scalar with a mixin no longer successfully functions as a container and the assignment fails. This currently looks to me like a bug.
Not a satisfactory answer but maybe you can progress from it
role doc {
has $.doc is rw;
}
multi trait_mod:<is>(Variable:D $v, :$docced!) {
$v.var.VAR does doc;
$v.var.VAR.doc = $docced;
}
say $dog; # ↪︎ Scalar+{doc}.new(doc => "barks")
say $dog.doc;  # ↪︎ barks
$dog.doc = 'woofs'; #
say $dog; # ↪︎ Scalar+{doc}.new(doc => "woofs")
Unfortunately, there is something off with this, and applying the trait seems to cause the variable to become immutable.

Using a variable in a Perl 6 program before assigning to it

I want to assign literals to some of the variables at the end of the file with my program, but to use these variables earlier. The only method I've come up with to do it is the following:
my $text;
say $text;
BEGIN {
$text = "abc";
}
Is there a better / more idiomatic way?
Just go functional.
Create subroutines instead:
say text();
sub text { "abc" }
UPDATE (Thanks raiph! Incorporating your feedback, including reference to using term:<>):
In the above code, I originally omitted the parentheses for the call to text, but it would be more maintainable to always include them to prevent the parser misunderstanding our intent. For example,
say text(); # "abc"
say text() ~ text(); # "abcabc"
say text; # "abc", interpreted as: say text()
say text ~ text; # ERROR, interpreted as: say text(~text())
sub text { "abc" };
To avoid this, you could make text a term, which effectively makes the bareword text behave the same as text():
say text; # "abc", interpreted as: say text()
say text ~ text; # "abcabc", interpreted as: say text() ~ text()
sub term:<text> { "abc" };
For compile-time optimizations and warnings, we can also add the pure trait to it (thanks Brad Gilbert!). is pure asserts that for a given input, the function "always produces the same output without any additional side effects":
say text; # "abc", interpreted as: say text()
say text ~ text; # "abcabc", interpreted as: say text() ~ text()
sub term:<text> is pure { "abc" };
Unlike Perl 5, in Perl 6 a BEGIN does not have to be a block. However, the lexical definition must be seen before it can be used, so the BEGIN block must be done before the say.
BEGIN my $text = "abc";
say $text;
Not sure whether this constitutes an answer to your question or not.
First, a rephrase of your question:
What options are there for succinctly referring to a variable (or constant etc.) whose initialization code appears further down in the same source file?
Post declare a routine
say foo;
sub foo { 'abc' }
When a P6 compiler parses an identifier that has no sigil, it checks to see if it has already seen a declaration of that identifier. If it hasn't, then it assumes that the identifier corresponds to a routine which will be declared later as a "listop" routine (which takes zero or more arguments) and moves on. (If its assumption turns out to be wrong, it fails the compilation.)
So you can use routines as if they were variables as described in Christopher Bottom's answer.
Autodeclare a variable on first use
strict is a "pragma" that controls how a P6 compiler reacts when it parses an as yet undeclared variable/constant that starts with a sigil.
P6 starts programs with strict mode switched on. This means that the compiler will insist on predeclaration of any sigil'd variable/constant. (By predeclaration I mean an explicit declaration that appears textually before the variable/constant is used.)
But you can write use strict or no strict to control whether the strict pragma is on or off in a given lexical scope, so this will work:
no strict;
say $text;
BEGIN {
$text = "abc";
}
Warning Having no strict in effect (which is unfortunately how most programming languages work) makes accidental misspelling of variable names a bigger nuisance than it is with use strict mode on.
Declare a variable explicitly in the same statement as its first use
You don't have to write a declaration as a separate statement. You can instead declare and use a variable in the same statement or expression:
say my $text;
BEGIN {
$text = "abc";
}
Warning If you repeat my $bar in the exact same lexical scope, the compiler will emit a warning. In contrast, say my $bar = 42; if foo { say my $bar = 99 } creates two distinct $bar variables without warning.
Initialize at run-time
The BEGIN phaser shown above runs at compile-time (after the my declaration, which also happens at compile-time, but before the say, which happens at run-time).
If you want to initialize variables/constants at run-time instead, use INIT instead:
say my $text;
INIT {
$text = "abc";
}
INIT code runs before any other run-time code, so the initialization still happens before the say gets executed.
Use a positronic (ym) variable
Given a literal interpretation of your question a "positronic" or ym variable would be yet another solution. (This feature is not built-in. I'm including it mostly because I encountered it after answering this question and think it belongs here, at the very least for entertainment value.)
Initialization and calculation of such a variable starts in the last statement using it and occurs backwards relative to the textual order of the code.
This is one of the several crazy sounding but actually working and useful concepts that Damian "mad scientist" Conway discusses in his 2011 presentation Temporally Quaquaversal Virtual Nanomachine Programming In Multiple Topologically Connected Quantum-Relativistic Parallel Spacetimes... Made Easy!.
Here's a link to the bit where he focuses on these variables.
(The whole presentation is a delight, especially if you're interested in physics; programming techniques; watching highly creative wunderkinds; and/or enjoy outstanding presentation skills and humor.)
Create a PS pragma?
In terms of coolness, the following pales in comparison to Damian's positronic variable feature that I just covered, but it's an idea I had while pondering this question.
Someone could presumably implement something like the following pragma:
use PS;
say $text;
BEGIN $text = 'abc';
This PS would lexically apply no strict and in addition require that, to avoid a compile-time error:
An auto-declared variable/constant must match up with a post declaration in a BEGIN or INIT phaser;
The declaration must include initialization if the first use (textually) of a variable/constant is not a binding or assignment.