Hash with Array values in Perl 6 - raku

What's going on here?
Why are %a{3} and %a{3}.Array different if %a has Array values and %a{3} is an Array?
> my Array %a
{}
> %a{3}.push("foo")
[foo]
> %a{3}.push("bar")
[foo bar]
> %a{3}.push("baz")
[foo bar baz]
> .say for %a{3}
[foo bar baz]
> %a{3}.WHAT
(Array)
> .say for %a{3}.Array
foo
bar
baz

The difference being observed here is the same as with:
my $a = [1,2,3];
.say for $a; # [1 2 3]
.say for $a.Array; # 1\n2\n3\n
The $ sigil can be thought of as meaning "a single item". Thus, when given to for, it will see that and say "aha, a single item" and run the loop once. This behavior is consistent across for and operators and routines. For example, here's the zip operator given arrays and them itemized arrays:
say [1, 2, 3] Z [4, 5, 6]; # ((1 4) (2 5) (3 6))
say $[1, 2, 3] Z $[4, 5, 6]; # (([1 2 3] [4 5 6]))
By contrast, method calls and indexing operations will always be called on what is inside of the Scalar container. The call to .Array is actually a no-op since it's being called on an Array already, and its interesting work is actually in the act of the method call itself, which is unwrapping the Scalar container. The .WHAT is like a method call, and is telling you about what's inside of any Scalar container.
The values of an array and a hash are - by default - Scalar containers which in turn hold the value. However, the .WHAT used to look at the value was hiding that, since it is about what's inside the Scalar. By contrast, .perl [1] makes it clear that there's a single item:
my Array %a;
%a{3}.push("foo");
%a{3}.push("bar");
say %a{3}.perl; $["foo", "bar"]
There are various ways to remove the itemization:
%a{3}.Array # Identity minus the container
%a{3}.list # Also identity minus the container for Array
#(%a{3}) # Short for %a{3}.cache, which is same as .list for Array
%a{3}<> # The most explicit solution, using the de-itemize op
|%a{3} # Short for `%a{3}.Slip`; actually makes a Slip
I'd probably use for %a{3}<> { } in this case; it's both shorter than the method calls and makes clear that we're doing this purely to remove the itemization rather than a coercion.
While for |%a{3} { } also works fine and is visually nice, it is the only one that doesn't optimize down to simply removing something from its Scalar container, and instead makes an intermediate Slip object, which is liable to slow the iteration down a bit (though depending on how much work is being done by the loop, that could well be noise).
[1] Based on what I wrote, one may wonder why .perl can recover the fact that something was itemized. A method call $foo.bar is really doing something like $foo<>.^find_method('bar')($foo). Then, in a method bar() { self }, the self is bound to the thing the method was invoked on, removed from its container. However, it's possible to write method bar(\raw-self:) { } to recover it exactly as it was provided.

The issue is Scalar containers do DWIM indirection.
%a{3} is bound to a Scalar container.
By default, if you refer to the value or type of a Scalar container, you actually access the value, or type of the value, contained in the container.
In contrast, when you refer to an Array container as a single entity, you do indeed access that Array container, no sleight of hand.
To see what you're really dealing with, use .VAR which shows what a variable (or element of a composite variable) is bound to rather than allowing any container it's bound to to pretend it's not there.
say %a{3}.VAR ; # $["foo", "bar", "baz"]
say %a{3}.Array.VAR ; # [foo bar baz]
This is a hurried explanation. I'm actually working on a post specifically focusing on containers.

Related

How to use hyperoperators with Scalars that aren't really scalar?

I want to make a hash of sets. Well, SetHashes, since they need to be mutable.
In fact, I would like to initialize my Hash with multiple identical copies of the same SetHash.
I have an array containing the keys for the new hash: #keys
And I have my SetHash already initialized in a scalar variable: $set
I'm looking for a clean way to initialize the hash.
This works:
my %hash = ({ $_ => $set.clone } for #keys);
(The parens are needed for precedence; without them, the assignment to %hash is part of the body of the for loop. I could change it to a non-postfix for loop or make any of several other minor changes to get the same result in a slightly different way, but that's not what I'm interested in here.)
Instead, I was kind of hoping I could use one of Raku's nifty hyper-operators, maybe like this:
my %hash = #keys »=>» $set;
That expression works a treat when $set is a simple string or number, but a SetHash?
Array >>=>>> SetHash can never work reliably: order of keys in SetHash is indeterminate
Good to know, but I don't want it to hyper over the RHS, in any order. That's why I used the right-pointing version of the hyperop: so it would instead replicate the RHS as needed to match it up to the LHS. In this sort of expression, is there any way to say "Yo, Raku, treat this as a scalar. No, really."?
I tried an explicit Scalar wrapper (which would make the values harder to get at, but it was an experiment):
my %map = #keys »=>» $($set,)
And that got me this message:
Lists on either side of non-dwimmy hyperop of infix:«=>» are not of the same length while recursing
left: 1 elements, right: 4 elements
So it has apparently recursed into the list on the left and found a single key and is trying to map it to a set on the right which has 4 elements. Which is what I want - the key mapped to the set. But instead it's mapping it to the elements of the set, and the hyperoperator is pointing the wrong way for that combination of sizes.
So why is it recursing on the right at all? I thought a Scalar container would prevent that. The documentation says it prevents flattening; how is this recursion not flattening? What's the distinction being drawn?
The error message says the version of the hyperoperator I'm using is "non-dwimmy", which may explain why it's not in fact doing what I mean, but is there maybe an even-less-dwimmy version that lets me be even more explicit? I still haven't gotten my brain aligned well enough with the way Raku works for it to be able to tell WIM reliably.
I'm looking for a clean way to initialize the hash.
One idiomatic option:
my %hash = #keys X=> $set;
See X metaoperator.
The documentation says ... a Scalar container ... prevents flattening; how is this recursion not flattening? What's the distinction being drawn?
A cat is an animal, but an animal is not necessarily a cat. Flattening may act recursively, but some operations that act recursively don't flatten. Recursive flattening stops if it sees a Scalar. But hyperoperation isn't flattening. I get where you're coming from, but this is not the real problem, or at least not a solution.
I had thought that hyperoperation had two tests controlling recursing:
Is it hyperoperating a nodal operation (eg .elems)? If so, just apply it like a parallel shallow map (so don't recurse). (The current doc quite strongly implies that nodal can only be usefully applied to a method, and only a List one (or augmentation thereof) rather than any routine that might get hyperoperated. That is much more restrictive than I was expecting, and I'm sceptical of its truth.)
Otherwise, is a value Iterable? If so, then recurse into that value. In general the value of a Scalar automatically behaves as the value it contains, and that applies here. So Scalars won't help.
A SetHash doesn't do the Iterable role. So I think this refusal to hyperoperate with it is something else.
I just searched the source and that yields two matches in the current Rakudo source, both in the Hyper module, with this one being the specific one we're dealing with:
multi method infix(List:D \left, Associative:D \right) {
die "{left.^name} $.name {right.^name} can never work reliably..."
}
For some reason hyperoperation explicitly rejects use of Associatives on either the right or left when coupled with the other side being a List value.
Having pursued the "blame" (tracking who made what changes) I arrived at the commit "Die on Associative <<op>> Iterable" which says:
This can never work due to the random order of keys in the Associative.
This used to die before, but with a very LTA error about a Pair.new()
not finding a suitable candidate.
Perhaps this behaviour could be refined so that the determining factor is, first, whether an operand does the Iterable role, and then if it does, and is Associative, it dies, but if it isn't, it's accepted as a single item?
A search for "can never work reliably" in GH/rakudo/rakudo issues yields zero matches.
Maybe file an issue? (Update I filed "RFC: Allow use of hyperoperators with an Associative that does not do Iterable role instead of dying with "can never work reliably".)
For now we need to find some other technique to stop a non-Iterable Associative being rejected. Here I use a Capture literal:
my %hash = #keys »=>» \($set);
This yields: {a => \(SetHash.new("b","a","c")), b => \(SetHash.new("b","a","c")), ....
Adding a custom op unwraps en passant:
sub infix:« my=> » ($lhs, $rhs) { $lhs => $rhs[0] }
my %hash = #keys »my=>» \($set);
This yields the desired outcome: {a => SetHash(a b c), b => SetHash(a b c), ....
my %hash = ({ $_ => $set.clone } for #keys);
(The parens seem to be needed so it can tell that the curlies are a block instead of a Hash literal...)
No. That particular code in curlies is a Block regardless of whether it's in parens or not.
More generally, Raku code of the form {...} in term position is almost always a Block.
For an explanation of when a {...} sequence is a Hash, and how to force it to be one, see my answer to the Raku SO Is that a Hash or a Block?.
Without the parens you've written this:
my %hash = { block of code } for #keys
which attempts to iterate #keys, running the code my %hash = { block of code } for each iteration. The code fails because you can't assign a block of code to a hash.
Putting parens around the ({ block of code } for #keys) part completely alters the meaning of the code.
Now it runs the block of code for each iteration. And it concatenates the result of each run into a list of results, each of which is a Pair generated by the code $_ => $set.clone. Then, when the for iteration has completed, that resulting list of pairs is assigned, once, to my %hash.

Assign a Seq(Seq) into arrays

What is it the correct syntax to assign a Seq(Seq) into multiple typed arrays without assign the Seq to an scalar first? Has the Seq to be flattened somehow? This fails:
class A { has Int $.r }
my A (#ra1, #ra2);
#create two arrays with 5 random numbers below a certain limit
#Fails: Type check failed in assignment to #ra1; expected A but got Seq($((A.new(r => 3), A.n...)
(#ra1, #ra2) =
<10 20>.map( -> $up_limit {
(^5).map({A.new( r => (^$up_limit).pick ) })
});
TL;DR Binding is faster than assignment, so perhaps this is the best practice solution to your problem:
:(#ra1, #ra2) := <10 20>.map(...);
While uglier than the solution in the accepted answer, this is algorithmically faster because binding is O(1) in contrast to assignment's O(N) in the length of the list(s) being bound.
Assigning / copying
Simplifying, your non-working code is:
(#listvar1, #listvar2) = list1, list2;
In Raku infix = means assignment / copying from the right of the = into one or more of the container variables on the left of the =.
If a variable on the left is bound to a Scalar container, then it will assign one of the values on the right. Then the assignment process starts over with the next container variable on the left and the next value on the right.
If a variable on the left is bound to an Array container, then it uses up all remaining values on the right. So your first array variable receives both list1 and list2. This is not what you want.
Simplifying, here's Christoph's answer:
#listvar1, #listvar2 Z= list1, list2;
Putting the = aside for a moment, Z is an infix version of the zip routine. It's like (a physical zip pairing up consecutive arguments on its left and right. When used with an operator it applies that operator to the pair. So you can read the above Z= as:
#listvar1 = list1;
#listvar2 = list2;
Job done?
Assignment into Array containers entails:
Individually copying as many individual items as there are in each list into the containers. (In the code in your example list1 and list2 contain 5 elements each, so there would be 10 copying operations in total.)
Forcing the containers to resize as necessary to accommodate the items.
Doubling up the memory used by the items (the original list elements and the duplicates copied into the Array elements).
Checking that the type of each item matches the element type constraint.
Assignment is in general much slower and more memory intensive than binding...
Binding
:(#listvar1, #listvar2) := list1, list2;
The := operator binds whatever's on its left to the arguments on its right.
If there's a single variable on the left then things are especially simple. After binding, the variable now refers precisely to what's on the right. (This is especially simple and fast -- a quick type check and it's done.)
But that's not so in our case.
Binding also accepts a standalone signature literal on its left. The :(...) in my answer is a standalone Signature literal.
(Signatures are typically attached to a routine without the colon prefix. For example, in sub foo (#var1, #var2) {} the (#var1, #var2) part is a signature attached to the routine foo. But as you can see, one can write a signature separately and let Raku know it's a signature by prefixing a pair of parens with a colon. A key difference is that any variables listed in the signature must have already been declared.)
When there's a signature literal on the left then binding happens according to the same logic as binding arguments in routine calls to a receiving routine's signature.
So the net result is that the variables get the values they'd have inside this sub:
sub foo (#listvar1, #listvar2) { }
foo list1, list2;
which is to say the effect is the same as:
#listvar1 := list1;
#listvar2 := list2;
Again, as with Christoph's answer, job done.
But this way we'll have avoided assignment overhead.
Not entirely sure if it's by design, but what seems to happen is that both of your sequences are getting stored into #ra1, while #ra2 remains empty. This violates the type constraint.
What does work is
#ra1, #ra2 Z= <10 20>.map(...);

perl6 placeholder variable and topic variable

There are both placeholder variables and topic variables in Perl 6. For example, the following two statements are the same
say ( $_ * 2 for 3, 9 ); # use topic variables
say ( { $^i * 2 } for 3, 9 ); # use placeholder variables
It seems to me, the only benefit one gets from topic variables is saving some keyboard strokes.
My question is: Is there a use case, where a topic variable can be much more convenient than placeholder variables?
The topic can have method calls on it:
say ( .rand for 3,9);
Compared to a placeholder:
say ( {$^i.rand} for 3,9);
Saves on typing a variable name and the curly braces for the block.
Also the topic variable is the whole point of the given block to my understanding:
my #anArrayWithALongName=[1,2,3];
#anArrayWithALongName[1].say;
#anArrayWithALongName[1].sqrt;
#OR
given #anArrayWithALongName[1] {
.say;
.sqrt;
}
That's a lot less typing when there are a lot of operations on the same variable.
There are several topic variables, one for each sigil: $, #, %_ and even &_ (yep, routines are first-class citizens in Perl6). To a certain point, you can also use Whatever (*) and create a WhateverCode in a expression, saving even more typing (look, ma! No curly braces!).
You can use the array form for several variables:
my &block = { sum #_ }; say block( 2,3 )
But the main problem they have is that they are single variables, unable to reflect the complexity of block calls. The code above can be rewritten using placeholder variables like this:
my &block = { $^a + $^b }; say block( 2,3 )
But imagine you've got some non-commutative thing in your hands. Like here:
my &block = { #_[1] %% #_[0] }; say block( 3, 9 )
That becomes clumsy and less expressive than
my &block = { $^divi %% $^divd }; say block( 3, 9 ); # OUTPUT: «True␤»
The trick being here that the placeholder variables get assigned in alphabetical order, with divd being before divi, and divi being short for divisible and divd for divided (which you chould have used if you wanted).
At the end of the day, there are many ways to do it. You can use whichever you want.

How to assign the .lines Seq to a variable and iterate over it?

Assigning an iterator to variable changes apparently how the Seq behaves. E.g.
use v6;
my $i = '/etc/lsb-release'.IO.lines;
say $i.WHAT;
say '/etc/lsb-release'.IO.lines.WHAT;
.say for $i;
.say for '/etc/lsb-release'.IO.lines;
results in:
(Seq)
(Seq)
(DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.04 DISTRIB_CODENAME=bionic DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS")
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
So once assigned I get only the string representation of the sequence. I know I can use .say for $i.lines to get the same output but I do not understand the difference between the assigned and unassigned iterator/Seq.
Assignment in Perl 6 is always about putting something into something else.
Assignment into a Scalar ($ sigil) stores the thing being assigned into a Scalar container object, meaning it will be treated as a single item; this is why for $item { } will not do an iteration. There are various ways to overcome this; the most conceptually simple way is to use the <> postfix operator, which strips away any Scalar container:
my $i = '/etc/lsb-release'.IO.lines;
.say for $i<>;
There's also the slip operator ("flatten into"), which will achieve the same:
my $i = '/etc/lsb-release'.IO.lines;
.say for |$i;
Assignment into an Array will - unless the right-hand side is marked lazy - iterate it and store each element into the Array. Thus:
my #i = '/etc/lsb-release'.IO.lines;
.say for #i;
Will work, but it will eagerly read all the lines into #i before the loop starts. This is OK for a small file, but less ideal for a large file, where we might prefer to work lazily (that is, only pulling a bit of the file into memory at a time). One might try:
my #i = lazy '/etc/lsb-release'.IO.lines;
.say for #i;
But that won't help with the retention problem; it just means the array will be populated lazily from the file as the iteration takes place. Of course, sometimes we might want to go through the lines multiple times, in which case assignment into an Array would be the best choice.
By contrast, declaring a symbol and binding it to that:
my \i = '/etc/lsb-release'.IO.lines;
.say for i;
Is not a "put into" operation at all; it just makes the symbol i refer to exactly what lines returns. This is rather clearer than putting it into a Scalar container only to take it out again. It's also a little easier on the reader, since a my \foo = ... can never be rebound, and so the reader doesn't need to be on the lookup for any potential changes later on in the code.
As a final note, it's worth knowing that the my \foo = ... form is actually a binding, rather than an assignment. Perl 6 allows us to write it with the = operator rather than forcing :=, even if in this case the semantics are := semantics. This is just one of a number of cases where a declaration with an initializer differs a bit from a normal assignment, e.g. has $!foo = rand actually runs the assignment on every object instantiation, while state $foo = rand only runs it only if we're on the first entry to the current closure clone.
If you want to be able to iterate over the sequence you need to either assign it to a positional :
my #i = '/etc/lsb-release'.IO.lines;
.say for #i;
Or you can tell the iterator that you want to treat the given thing as iterable :
.say for #$i
Or you can Slip it into a list for the iterator :
.say for |$i

Rebol: Dynamic binding of block words

In Rebol, there are words like foreach that allow "block parametrization" over a given word and a series, e.g., foreach w [1 2 3] [print w]. Since I find that syntax very convenient (as opposed to passing func blocks), I'd like to use it for my own words that operate on lazy lists, e.g map/stream x s [... x ... ].
How is that syntax idiom called? How is it properly implemented?
I was searching the docs, but I could not find a straight answer, so I tried to implement foreach on my own. Basically, my implementation comes in two parts. The first part is a function that binds a specific word in a block to a given value and yields a new block with the bound words.
bind-var: funct [block word value] [
qw: load rejoin ["'" word]
do compose [
set (:qw) value
bind [(block)] (:qw)
[(block)] ; This shouldn't work? see Question 2
]
]
Using that, I implemented foreach as follows:
my-foreach: func ['word s block] [
if empty? block [return none]
until [
do bind-var block word first s
s: next s
tail? s
]
]
I find that approach quite clumsy (and it probably is), so I was wondering how the problem can be solved more elegantly. Regardless, after coming up with my contraption, I am left with two questions:
In bind-var, I had to do some wrapping in bind [(block)] (:qw) because (block) would "dissolve". Why?
Because (?) of 2, the bind operation is performed on a new block (created by the [(block)] expression), not the original one passed to my-foreach, with seperate bindings, so I have to operate on that. By mistake, I added [(block)] and it still works. But why?
Great question. :-) Writing your own custom loop constructs in Rebol2 and R3-Alpha (and now, history repeating with Red) has many unanswered problems. These kinds of problems were known to the Rebol3 developers and considered blocking bugs.
(The reason that Ren-C was started was to address such concerns. Progress has been made in several areas, though at time of writing many outstanding design problems remain. I'll try to just answer your questions under the historical assumptions, however.)
In bind-var, I had to do some wrapping in bind [(block)] (:qw) because (block) would "dissolve". Why?
That's how COMPOSE works by default...and it's often the preferred behavior. If you don't want that, use COMPOSE/ONLY and blocks will not be spliced, but inserted as-is.
qw: load rejoin ["'" word]
You can convert WORD! to LIT-WORD! via to lit-word! word. You can also shift the quoting responsibility into your boilerplate, e.g. set quote (word) value, and avoid qw altogether.
Avoiding LOAD is also usually preferable, because it always brings things into the user context by default--so it loses the binding of the original word. Doing a TO conversion will preserve the binding of the original WORD! in the generated LIT-WORD!.
do compose [
set (:qw) value
bind [(block)] (:qw)
[(block)] ; This shouldn't work? see Question 2
]
Presumably you meant COMPOSE/DEEP here, otherwise this won't work at all... with regular COMPOSE the embedded PAREN!s cough, GROUP!s for [(block)] will not be substituted.
By mistake, I added [(block)] and it still works. But why?
If you do a test like my-foreach x [1] [print x probe bind? 'x] the output of the bind? will show you that it is bound into the "global" user context.
Fundamentally, you don't have any MAKE OBJECT! or USE to create a new context to bind the body into. Hence all you could potentially be doing here would be stripping off any existing bindings in the code for x and making sure they are into the user context.
But originally you did have a USE, that you edited to remove. That was more on the right track:
bind-var: func [block word value /local qw] [
qw: load rejoin ["'" word]
do compose/deep [
use [(qw)] [
set (:qw) value
bind [(block)] (:qw)
[(block)] ; This shouldn't work? see Question 2
]
]
]
You're right to suspect something is askew with how you're binding. But the reason this works is because your BIND is only redoing the work that USE itself does. USE already deep walks to make sure any of the word bindings are adjusted. So you could omit the bind entirely:
do compose/deep [
use [(qw)] [
set (:qw) value
[(block)]
]
]
the bind operation is performed on a new block (created by the [(block)] expression), not the original one passed to my-foreach, with separate bindings
Let's adjust your code by taking out the deep-walking USE to demonstrate the problem you thought you had. We'll use a simple MAKE OBJECT! instead:
bind-var: func [block word value /local obj qw] [
do compose/deep [
obj: make object! [(to-set-word word) none]
qw: bind (to-lit-word word) obj
set :qw value
bind [(block)] :qw
[(block)] ; This shouldn't work? see Question 2
]
]
Now if you try my-foreach x [1 2 3] [print x]you'll get what you suspected... "x has no value" (assuming you don't have some latent global definition of x it picks up, which would just print that same latent value 3 times).
But to make you sufficiently sorry you asked :-), I'll mention that my-foreach x [1 2 3] [loop 1 [print x]] actually works. That's because while you were right to say a bind in the past shouldn't affect a new block, this COMPOSE only creates one new BLOCK!. The topmost level is new, any "deeper" embedded blocks referenced in the source material will be aliases of the original material:
>> original: [outer [inner]]
== [outer [inner]]
>> composed: compose [<a> (original) <b>]
== [<a> outer [inner] <b>]
>> append original/2 "mutation"
== [inner "mutation"]
>> composed
== [<a> outer [inner "mutation"] <b>]
Hence if you do a mutating BIND on the composed result, it can deeply affect some of your source.
until [
do bind-var block word first s
s: next s
tail? s
]
On a general note of efficiency, you're running COMPOSE and BIND operations on each iteration of your loop. No matter how creative new solutions to these kinds of problems get (there's a LOT of new tech in Ren-C affecting your kind of problem), you're still probably going to want to do it only once and reuse it on the iterations.