As far as I understand it, the hyper operator » is a shortcut for map(). Why do the following two examples return different results, with .sum seemingly not applied in the second?
say ([1,2], [2, 2], [3, 3]).map({.sum});
# (3 4 6)
say ([1,2], [2, 2], [3, 3])».sum;
# ([1 2] [2 2] [3 3])
Hyperops descend recursively into sublists. They are also candidates for autothreading (not yet implemented), which means their operations may be executed out of order.
Also, there was a bug, which is corrected by https://github.com/rakudo/rakudo/commit/c8c27e93d618bdea7de3784575d867d9e7a2f6cb .
say ([1,2], [2, 2], [3, 3])».sum;
# (3 4 6)
TL;DR You've almost certainly encountered a bug. That said, map and the » hyperop have major differences.
map returns a Seq. This Seq yields the results of applying user supplied code to each of the elements of a user supplied data structure:
one level deep (traversal of the data structure is shallow -- map doesn't recursively descend into sub structures of the data structure's top level)
one at a time (everything is done sequentially, nothing in parallel)
lazily (map returns immediately; the user supplied code is applied to the user supplied data structure to generate results later as needed to pull values from the Seq)
The » hyperop returns the data structure operand on its left after first applying the unary operation on its right to the elements of that data structure:
only one level deep or descending to leaves as dictated by the unary operation
in parallel batches, at least semantically (it's the programmer's responsibility to pick a unary operation that will yield correct results when applied to multiple elements in parallel in arbitrary order)
eagerly (unlike a map call, a hyperoperation only returns when the unary operator has been applied to the entire data structure)
If you're applying a unary operator that is "nodal" (so hyperoperation will choose not to descend) or the data structure being operated on is only one level deep (so there are no lower level leaves for the hyperoperation to descend to), then the differences between hyperoperation and map with a unary operator are just the sequential/parallel and lazy/eager aspects.
It seems pretty clear to me that sum ought to be a nodal operator; otherwise it will descend into sub structures until it arrives at individual leaves and will thus end up being applied to a bunch of single values, which is pointless. ETA: Looks like it's fixed now.
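To make the nodal distinction concrete, here's a small sketch: elems is declared nodal, so hyperoperation applies it to each sub-list instead of descending, while a non-nodal operation like abs descends all the way to the leaves:
say ([1, 2], [3, 4])».elems; # (2 2)         -- elems is nodal, applied per sub-list
say ([1, 2], [3, 4])».abs;   # ([1 2] [3 4]) -- abs is applied to each individual leaf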
I want to generate the sequence
1, 1/2, 1/3, 1/4 ... *
using a functional programming approach in Raku. In my head it should look like:
(1,{1/$_} ...*)[0..5]
but the output is: 1,1,1,1,1
The idea is simple, but it seems powerful enough to me to use for generating other, more complex lists and working with them.
Another thing I tried was calling one lazy list from inside another lazy list. That doesn't work either, because the output is a repeating sequence: 1, 0.5, 1, 0.5 ...
my @list = 0 ... *;
(1, {1/@list[$_]} ...*)[0..5]
See @wamba's wonderful answer for solutions to the question in your title. They showcase a wide range of applicable Raku constructs.
This answer focuses on Raku's sequence operator (...) and the details in the body of your question, explaining what went wrong in your attempts and covering some working sequences.
TL;DR
The value of the Nth term is 1 / N.
# Generator ignoring prior terms, incrementing an N stored in the generator:
{ 1 / ++$ } ... * # idiomatic
{ state $N; $N++; 1 / $N } ... * # longhand
# Generator extracting denominator from prior term and adding 1 to get N:
1/1, 1/2, 1/3, 1/(*.denominator+1) ... * # idiomatic (@jjmerelo++)
1/1, 1/2, 1/3, {1/(.denominator+1)} ... * # longhand (@user0721090601++)
What's wrong with {1/$_}?
1, 1/2, 1/3, 1/4 ... *
What is the value of the Nth term? It's 1/N.
1, {1/$_} ...*
What is the value of the Nth term? It's 1/$_.
$_ is a generic parameter/argument/operand analogous to the English pronoun "it".
Is it set to N?
No.
So your generator (lambda/function) doesn't encode the sequence you're trying to reproduce.
What is $_ set to?
Within a function, $_ is bound either to (Any), or to an argument passed to the function.
If a function explicitly specifies its parameters (a "parameter" specifies an argument that a function expects to receive; this is distinct from the argument that a function actually ends up getting for any given call), then $_ is bound, or not bound, per that specification.
If a function does not explicitly specify its parameters -- and yours doesn't -- then $_ is bound to the argument, if any, that is passed as part of the call of the function.
For a generator function, any value(s) passed as arguments are values of preceding terms in the sequence.
Given that your generator doesn't explicitly specify its parameters, the immediately prior term, if any, is passed and bound to $_.
In the first call of your generator, when 1/$_ gets evaluated, the $_ is bound to the 1 from the first term. So the second term is 1/1, i.e. 1.
Thus the second call, producing the third term, has the same result. So you get an infinite sequence of 1s.
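A quick check of that reasoning (output shown as Rakudo's default gist):
say (1, {1/$_} ... *)[^5]; # (1 1 1 1 1)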
What's wrong with {1/@list[$_+1]}?
For your last example you presumably meant:
my @list = 0 ... *;
(1, {1/@list[$_+1]} ...*)[0..5]
In this case the first call of the generator returns 1/@list[1+1] which is 1/2 (0.5).
So the second call is 1/@list[0.5+1]. This specifies a fractional index into @list, asking for the 1.5th element. Indexes into standard Positionals are rounded down to the nearest integer. So 1.5 is rounded down to 1. And @list[1] evaluates to 1. So the value returned by the second call of the generator is back to 1.
Thus the sequence alternates between 1 and 0.5.
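Here's that alternation made concrete (a minimal check, using the corrected sigils):
my @list = 0 ... *;
say (1, {1/@list[$_+1]} ... *)[^6]; # (1 0.5 1 0.5 1 0.5)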
What arguments are passed to a generator?
Raku passes the value of zero or more prior terms in the sequence as the arguments to the generator.
How many? Well, a generator is an ordinary Raku lambda/function. Raku uses the implicit or explicit declaration of parameters to determine how many arguments to pass.
For example, in:
{42} ... * # 42 42 42 ...
the lambda doesn't declare what parameters it has. For such functions Raku presumes a signature including $_?, and thus passes the prior term, if any. (The above lambda ignores it.)
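To see the implicit $_? in action, here's a one-liner where each call's $_ is bound to the immediately prior term:
say (1, { $_ * 2 } ... *)[^5]; # (1 2 4 8 16)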
Which arguments do you need/want your generator to be passed?
One could argue that, for the sequence you're aiming to generate, you don't need/want to pass any of the prior terms. Because, arguably, none of them really matter.
From this perspective all that matters is that the Nth term computes 1/N. That is, its value is independent of the values of prior terms and just dependent on counting the number of calls.
State solutions such as {1/++$}
One way to compute this is something like:
{ state $N; $N++; 1/$N } ... *
The lambda ignores the previous term. The net result is just the desired 1 1/2 1/3 ....
(Except that you'll have to fiddle with the stringification because by default it'll use gist which will turn the 1/3 into 0.333333 or similar.)
Or, more succinctly/idiomatically:
{ 1 / ++$ } ... *
(An anonymous $ in a statement/expression is a simultaneous declaration and use of an anonymous state scalar variable.)
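Putting it together, a minimal check (Rats stringify as decimals in the default gist):
say ({ 1/++$ } ... *)[^5]; # (1 0.5 0.333333 0.25 0.2)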
Solutions using the prior term
As @user0721090601++ notes in a comment below, one can write a generator that makes use of the prior value:
1/1, 1/2, 1/3, {1/(.denominator+1)} ... *
For a generator that doesn't explicitly specify its parameters, Raku passes the value of the prior term in the sequence as the argument, binding it to the "it" argument $_.
And given that there's no explicit invocant for .denominator, Raku presumes you mean to call the method on $_.
As @jjmerelo++ notes, an idiomatic way to express many lambdas is to use the explicit pronoun "whatever" instead of "it" (implicit or explicit) to form a WhateverCode lambda:
1/1, 1/2, 1/3, 1/(*.denominator+1) ... *
You drop the braces for this form, which is one of its advantages. (You can also use multiple "whatevers" in a single expression rather than just one "it", another part of this construct's charm.)
This construct typically takes some getting used to; perhaps the biggest hurdle is that a * must be combined with a "WhateverCodeable" operator/function for it to form a WhateverCode lambda.
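For example, a bare * stays a Whatever, but combined with an infix operator or a method call it becomes a callable lambda (a small illustrative check):
say (* + 1)(41);          # 42 -- * plus an infix forms a one-argument lambda
say (*.denominator)(1/3); # 3  -- a method call on * works the same way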
TIMTOWTDI
routine map
(1..*).map: 1/*
List repetition operator xx
1/++$ xx *
The cross metaoperator X or the zip metaoperator Z
1 X/ 1..*
1 xx * Z/ 1..*
(Control flow) gather/take
gather for 1..* { take 1/$_ }
(Seq) method from-loop
Seq.from-loop: { 1/++$ }
(Operators) infix ...
1, 1/(1+1/*) ... *
{1/++$} ... *
Is there an equivalent of TTree::AddFriend() with uproot?
I have two parallel trees in two different files which I'd need to read with uproot.iterate, using interpretations (setting the 'branches' option of uproot.iterate).
Maybe I can do that by manually obtaining several iterators from iterate() calls on the files, and then calling next() on each iterator... but maybe there's a simpler way, akin to AddFriend?
Thanks for any hint!
edit: I'm not sure I've been clear, so here are a few more details. My question is not about the usage of arrays, but about how to read them from different files. Here's a mockup of what I'm doing:
# I will fill this array and give it as input to my DNN;
# it's very big, so I will fill it in place
bigarray = ndarray((2, numentries), ...)

# get a handle on a tree, just to be able to build interpretations:
t0 = ...  # first tree in input_files

interpretations = dict(
    a=t0['a'].interpretation.toarray(bigarray[0]),
    b=t0['b'].interpretation.toarray(bigarray[1]),
)

# iterate with:
uproot.iterate(input_files, treename,
               branches=interpretations)
So what if a and b belong to 2 trees in 2 different files ?
In array-based programming, friends are implicit: you can JOIN any two columns after the fact—you don't have to declare them as friends ahead of time.
In the simplest case, if your arrays a and b have the same length and the same order, you can just use them together, like a + b. It doesn't matter whether a and b came from the same file or not. Even if one of these is jagged (like jets.phi) and the other is not (like met.phi), you're still fine, because the non-jagged array will be broadcasted to match the jagged one.
Note that awkward.Table and awkward.JaggedArray.zip can combine arrays into a single Table or jagged Table for bookkeeping.
If the two arrays are not in the same order, possibly because each writer was individually parallelized, then you'll need some column to act as the key associating rows of one array with different rows of the other. This is a classic database-style JOIN, and although Uproot and Awkward don't provide routines for it, Pandas does. (Look up "merging, joining, and concatenating" in the Pandas documentation—there's a lot!) You can maintain an array's jaggedness in Pandas by preparing the column with the awkward.topandas function.
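For illustration, here's a minimal Pandas sketch of such a key-based JOIN (the column names event_id, a, and b are hypothetical, not from your files):

import pandas as pd

# Two flat tables whose rows are in different orders; event_id is the join key.
df_a = pd.DataFrame({"event_id": [1, 2, 3], "a": [10.0, 20.0, 30.0]})
df_b = pd.DataFrame({"event_id": [3, 1, 2], "b": [0.3, 0.1, 0.2]})

# merge aligns rows by key, regardless of their order in either table.
joined = df_a.merge(df_b, on="event_id", how="inner")
print(joined)  # columns event_id, a, b, with a and b correctly paired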
The following issue talks about a lot of these things, though the users there had to join sets of files, rather than just a single tree. (In principle, a process would have to look ahead to all the files to see which contain which keys: a distributed database problem.) Even if that's not your case, you might find more hints there to see how to get started.
https://github.com/scikit-hep/uproot/issues/314
This is how I have "friended" (befriended?) two TTree's in different files with uproot/awkward.
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"]) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"]) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays
    for field in array2.fields:
        array1 = awkward.with_field(array1, getattr(array2, field), where=field)
    # array1 now has branches "a" and "b"
    print(array1.a)
    print(array1.b)
Alternatively, if it is acceptable to "name" the trees,
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"]) # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"]) # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays
    zippedArray = awkward.zip({"tree1": array1, "tree2": array2})
    # zippedArray now has branches "tree1.a" and "tree2.b"
    print(zippedArray.tree1.a)
    print(zippedArray.tree2.b)
Of course, you can use array1 and array2 together without merging them like this. But if you have already written code that expects a single Array, this can be useful.
After reading this article: https://v8.dev/blog/elements-kinds, I'm wondering whether null and object are considered the same type by V8 in terms of internal optimizations.
e.g.
[{}, null, {}] vs [{}, {}, {}]
Yes. The only types considered for elements kinds are "small integer", "double", and "anything". null is not an integer or a double, so it's "anything".
Note that elements kinds are tracked per array, not per element. An array's elements kind is the most generic elements kind required for any of its elements:
[1, 2, 3] // "integer" elements (stored as integers internally)
[1, 2, 3.5] // "double" elements (stored as doubles: [1.0, 2.0, 3.5])
[1, 2, {}] // "anything" elements
[1, 2, null] // "anything" elements
[1, 2, "3"] // "anything" elements
The reason is that the benefit of tracking elements kinds in the first place is that some checks can be avoided. That has significant impact (in relative terms) for operations that are otherwise cheap. For example, if you wanted to sum up an array's elements, which are all integers:
for (let i = 0; i < array.length; i++) result += array[i];
adding integers is really fast (one instruction + overflow check), so checking for every element "is this element an integer (so I can do an integer addition)?" (another instruction + conditional jump) adds a relatively large overhead. Knowing up front that every element in this array is an integer lets the engine skip those checks inside the loop.
Whereas if the array contained strings and you wanted to concatenate them all, string concatenation is a much slower operation (you have to allocate a new string object for the result, and then decide whether you want to copy the characters or just refer to the input strings), so the overhead added by an additional "is this element a string (so I can do a string concatenation)?" check is probably barely measurable. So tracking "strings" as an elements kind wouldn't provide much benefit, but would add complexity to the implementation and probably a small performance cost in some situations, so V8 doesn't do it.
Similarly, if you knew up front "this array contains only null", there isn't anything obvious that you could speed up with that knowledge.
Also: as a JavaScript developer, don't worry about elements kinds. See that blog post as a (hopefully interesting) story about the lengths to which V8 goes to squeeze every last bit of performance out of your code; don't specifically contort your code to make better use of it (or spend time worrying about it). The difference is usually small, and in the cases where it does matter, it'll probably happen without you having to think about it.
I'm trying to modify a value already pushed to the stack of a PLY/Yacc parser. I'm using PLY on Python 3.
Basically I want to invert the previous two values when a token SWAP is used.
Imagine we have this stack:
1, 2, 3, 4, SWAP
I need it to reduce to:
1, 2, 4, 3
The value you write to p[0] will be pushed to the stack, but how can I push more than one value?
# this fails because it consumes two values and pushes only one;
# results in: `1`, `2`, `4`
def p_swap(p):
    'value : value value SWAP'
    p[0] = p[2]
# this was just a try... fails as well
def p_swap(p):
    'value : value value SWAP'
    p[0] = p[2]
    p[1] = p[1]
# this looked like a good idea since it consumes only one value and modifies the second in place;
# it fails because the stack (negative indexes) is immutable:
# https://github.com/dabeaz/ply/blob/master/ply/yacc.py#L234
# results in: `1`, `2`, `3`, `3`
def p_swap(p):
    'value : value SWAP'
    p[0] = p[-1]
    p[-1] = p[1]  # this is a NOP
p is an instance of this class
I guess it was designed to be immutable to enforce that the parsing is done a certain way (the correct way), but I'm missing it: what's the correct way to modify the stack, or to design such a parser?
It sounds like you're trying to create a stack-based language like Forth or Joy. If so, you shouldn't need a bottom-up parser, and you shouldn't be surprised that a bottom-up parser-generator doesn't work the way you want it to.
Stack-based languages are mostly simply streams of tokens. Each token has some kind of stack effect, and they are just applied in sequence; there's usually little or no syntactic structure beyond that. Consequently, the languages really aren't parsed; at best, they are tokenised.
Most stack-based languages contain some kind of nested control structure that doesn't strictly conform to the above (though not all do; see PostScript, for example). But even these are so simple that a real parser is unnecessary.
Of course, nothing stops you from using a generated parser to parse a trivial language. But if you do that, you should certainly not expect to be able to gain access to the parser's internal data structures. The parser stack is used by the parser in ways which might not be fully obvious, and which certainly must not be interfered with. If you want to implement a stack-based language interpreter, you need to use your own value stack. (Or stacks; many stack-based languages have several different stacks, each with its own semantics.)
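For instance, here's a minimal sketch of that approach (no PLY involved; the token stream is assumed to come from a tokeniser): each token either pushes a value or applies a stack effect.

def run(tokens):
    stack = []  # your own value stack, separate from any parser machinery
    for tok in tokens:
        if tok == "SWAP":
            # stack effect: exchange the two topmost values
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            stack.append(tok)  # plain values are just pushed
    return stack

print(run([1, 2, 3, 4, "SWAP"]))  # [1, 2, 4, 3]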
Yesterday I had occasion to want to iterate over a collection in reverse order. I found the reverse function, but this does not return an iterator; it actually creates a reversed collection.
Apparently, there used to be a Reverse iterator, which was removed several years ago. I can also find reference to something (a type?) called Order.Reverse, but that does not seem to apply to my question.
The Iterators.jl package has many interesting iteration patterns, but apparently not reverse iteration.
I can of course use the reverse function, and in some cases, for example reverse(eachindex(c)), which returns a reversed iterator, but I would prefer a general reverse iterator.
Is there such a thing?
UPDATE FOR JULIA 1.0+
You can now use Iterators.reverse to reverse a limited subset of types. For example:
julia> Iterators.reverse(1:10)
10:-1:1
julia> Iterators.reverse(CartesianIndices(zeros(2,3))) |> collect
2×3 Array{CartesianIndex{2},2}:
CartesianIndex(2, 3) CartesianIndex(2, 2) CartesianIndex(2, 1)
CartesianIndex(1, 3) CartesianIndex(1, 2) CartesianIndex(1, 1)
julia> Iterators.reverse([1,1,2,3,5]) |> collect
5-element Array{Int64,1}:
5
3
2
1
1
# But not all iterables support it (and new iterables don't support it by default):
julia> Iterators.reverse(Set(1:2)) |> collect
ERROR: MethodError: no method matching iterate(::Base.Iterators.Reverse{Set{Int64}})
Note that this only works for types that have specifically defined reverse iteration. That is, they have specifically defined Base.iterate(::Iterators.Reverse{T}, ...) where T is the custom type. So it's not fully general purpose (for the reasons documented below), but it does work for any type that supports it.
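To opt a custom type in, you define iterate on the Iterators.Reverse wrapper. A minimal sketch (the Countdown type is hypothetical):

struct Countdown
    n::Int
end
Base.iterate(c::Countdown, i = c.n) = i < 1 ? nothing : (i, i - 1)
Base.length(c::Countdown) = c.n

# Opting in: define iteration for the Reverse wrapper around our type.
Base.iterate(r::Iterators.Reverse{Countdown}, i = 1) =
    i > r.itr.n ? nothing : (i, i + 1)

collect(Countdown(3))                    # [3, 2, 1]
collect(Iterators.reverse(Countdown(3))) # [1, 2, 3]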
ORIGINAL ANSWER
Jeff's comment when he removed the reverse iterator three years ago (in the issue you linked) is just as relevant today:
I am highly in favor of deleting this since it simply does not work. Unlike everything else in iterator.jl it depends on indexing, not iteration, and doesn't even work on everything that's indexable (for example UTF8String). I hate having landmines like this in Base.
At the most basic level, iterators only know how to do three things: start iteration, get the next element, and check if the iteration is done. In order to create an iterator that doesn't allocate using these primitives, you'd need an O(n^2) algorithm: walk through the whole iterator, counting as you go, until you find the last element. Then walk the iterator again, only this time stopping at the penultimate element. Sure it doesn't allocate, but it'll be way slower than just collecting the iterator into an array and then indexing backwards. And it'll be completely broken for one-shot iterators (like eachline). So it is simply not possible to create an efficient general reverse iterator.
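To make the complexity argument concrete, here's a sketch of that allocation-free strategy (illustrative only; it also re-consumes itr, so it's broken for one-shot iterators like eachline):

# For each position from the end, re-walk the iterator from the start.
function reverse_each(f, itr)
    n = count(_ -> true, itr)   # first pass just to learn the length
    for k in n:-1:1             # then one full re-walk per element: O(n^2)
        for (i, x) in enumerate(itr)
            if i == k
                f(x)
                break
            end
        end
    end
end

reverse_each(println, 1:5)  # prints 5 4 3 2 1, one per line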
Note that reverse(eachindex(c)) does not work in general:
julia> reverse(eachindex(sprand(5,5,.2)))
ERROR: MethodError: no method matching reverse(::CartesianRange{CartesianIndex{2}})
One alternative that will still work with offset arrays is reverse(linearindices(c)).