Binding a scalar to a sigilless variable (Perl 6) - raku

Let me start by saying that I understand that what I'm asking about in the title is dubious practice (as explained here), but my lack of understanding concerns the syntax involved.
When I first tried to bind a scalar to a sigilless symbol, I did this:
my \a = $(3);
thinking that $(...) would package the Int 3 in a Scalar (as seemingly suggested in the documentation), which would then be bound to symbol a. This doesn't seem to work though: the Scalar is nowhere to be found (a.VAR.WHAT returns (Int), not (Scalar)).
In the above-referenced post, raiph mentions that the desired binding can be performed using a different syntax:
my \a = $ = 3;
which works. Given the result, I suspect that the statement can be phrased equivalently, though less concisely, as: my \a = (my $ = 3), which I could then understand.
That leaves the question: why does the attempt with $(...) not work, and what does it do instead?

What $(…) does is turn a value into an item.
(A value in a scalar variable ($a) also gets marked as being an item)
say flat (1,2, (3,4) );
# (1 2 3 4)
say flat (1,2, $((3,4)) );
# (1 2 (3 4))
say flat (1,2, item((3,4)) );
# (1 2 (3 4))
Basically it is there to prevent a value from flattening. The reason for its existence is that Perl 6 does not flatten lists as much as most other languages, and sometimes you need a little more control over flattening.
The following only sort-of does what you want it to do
my \a = $ = 3;
A bare $ is an anonymous state variable.
my \a = (state $) = 3;
The problem shows up when you run that same bit of code more than once.
sub foo ( $init ) {
my \a = $ = $init; # my \a = (state $) = $init;
(^10).map: {
sleep 0.1;
++a
}
}
.say for await (start foo(0)), (start foo(42));
# (43 44 45 46 47 48 49 50 51 52)
# (53 54 55 56 57 58 59 60 61 62)
# If foo(42) beat out foo(0) instead it would result in:
# (1 2 3 4 5 6 7 8 9 10)
# (11 12 13 14 15 16 17 18 19 20)
Note that variable is shared between calls.
The first Promise halts at the sleep call, and then the second sets the state variable before the first runs ++a.
If you use my $ instead, it now works properly.
sub foo ( $init ) {
my \a = my $ = $init;
(^10).map: {
sleep 0.1;
++a
}
}
.say for await (start foo(0)), (start foo(42));
# (1 2 3 4 5 6 7 8 9 10)
# (43 44 45 46 47 48 49 50 51 52)
The thing is that sigiless “variables” aren't really variables (they don't vary), they are more akin to lexically scoped (non)constants.
constant \foo = (1..10).pick; # only pick one value and never change it
say foo;
for ^5 {
my \foo = (1..10).pick; # pick a new one each time through
say foo;
}
Basically the whole point of them is to be as close as possible to referring to the value you assign to it. (Static Single Assignment)
# these work basically the same
-> \a {…}
-> \a is raw {…}
-> $a is raw {…}
# as do these
my \a = $i;
my \a := $i;
my $a := $i;
Note that above I wrote the following:
my \a = (state $) = 3;
Normally in the declaration of a state var, the assignment only happens the first time the code gets run. Bare $ doesn't have a declaration as such, so I had to prevent that behaviour by putting the declaration in parens.
# bare $
for (5 ... 1) {
my \a = $ = $_; # set each time through the loop
say a *= 2; # 15 12 9 6 3
}
# state in parens
for (5 ... 1) {
my \a = (state $) = $_; # set each time through the loop
say a *= 2; # 15 12 9 6 3
}
# normal state declaration
for (5 ... 1) {
my \a = state $ = $_; # set it only on the first time through the loop
say a *= 2; # 15 45 135 405 1215
}

Sigilless variables are not actually variables, they are more of an alias, that is, they are not containers but bind to the values they get on the right hand side.
my \a = $(3);
say a.WHAT; # OUTPUT: «(Int)␤»
say a.VAR.WHAT; # OUTPUT: «(Int)␤»
Here, by doing $(3) you are actually putting in scalar context what is already in scalar context:
my \a = 3; say a.WHAT; say a.VAR.WHAT; # OUTPUT: «(Int)␤(Int)␤»
However, the second form in your question does something different. You're binding to an anonymous variable, which is a container:
my \a = $ = 3;
say a.WHAT; # OUTPUT: «(Int)␤»
say a.VAR.WHAT;# OUTPUT: «(Scalar)␤»
In the first case, a was an alias for 3 (or $(3), which is the same); in the second, a is an alias for $, which is a container, whose value is 3. This last case is equivalent to:
my $anon = 3; say $anon.WHAT; say $anon.VAR.WHAT; # OUTPUT: «(Int)␤(Scalar)␤»
(If you have some suggestion on how to improve the documentation, I'd be happy to follow up on it)

Related

Multiply all values in a %hash and return a %hash with the same structure

I have some JSON stored in a database column that looks like this:
pokeapi=# SELECT height FROM pokeapi_pokedex WHERE species = 'Ninetales';
-[ RECORD 1 ]------------------------------------------
height | {"default": {"feet": "6'07\"", "meters": 2.0}}
As part of a 'generation' algorithm I'm working on I'd like to take this value into a %hash, multiply it by (0.9..1.1).rand (to allow for a 'natural 10% variance in the height), and then create a new %hash in the same structure. My select-height method looks like this:
method select-height(:$species, :$form = 'default') {
my %heights = $.data-source.get-height(:$species, :$form);
my %height = %heights * (0.9..1.1).rand;
say %height;
}
Which actually calls my get-height routine to get the 'average' heights (in both metric and imperial) for that species.
method get-height (:$species, :$form) {
my $query = dbh.prepare(qq:to/STATEMENT/);
SELECT height FROM pokeapi_pokedex WHERE species = ?;
STATEMENT
$query.execute($species);
my %height = from-json($query.row);
my %heights = self.values-or-defaults(%height, $form);
return %heights;
}
However I'm given the following error on execution (I assume because I'm trying to multiple the hash as a whole rather than the individual elements of the hash):
$ perl6 -I lib/ examples/height-weight.p6
{feet => 6'07", meters => 2}
Odd number of elements found where hash initializer expected:
Only saw: 1.8693857987465123e0
in method select-height at /home/kane/Projects/kawaii/p6-pokeapi/lib/Pokeapi/Pokemon/Generator.pm6 (Pokeapi::Pokemon::Generator) line 22
in block <unit> at examples/height-weight.p6 line 7
Is there an easier (and working) way of doing this without duplicating my code for each element? :)
Firstly, there is an issue with logic of your code. Initially, you are getting a hash of values, "feet": "6'07\"", "meters": 2.0 parsed out of json, with meters being a number and feet being a string. Next, you are trying to multiply it on a random value... And while it will work for a number, it won't for a string. Perl 6 allomorphs allow you to do that, actually: say "5" * 3 will return 15, but X"Y' pattern is complex enough for Perl 6 to not naturally understand it.
So you likely need to convert it before processing, and to convert it back afterwards.
The second thing is exact line that leads to the error you are observing.
Consider this:
my %a = a => 5;
%a = %a * 10 => 5; # %a becomes a hash with a single value of 10 => 5
# It happens because when a Hash is used in math ops, its size is used as a value
# Thus, if you have a single value, it'll become 1 * 10, thus 10
# And for %a = a => 1, b => 2; %a * 5 will be evaluated to 10
%a = %a * 10; # error, the key is passed, but not a value
To work directly on hash values, you want to use map method and process every pair, for example: %a .= map({ .key => .value * (0.9..1.1).rand }).
Of course, it can be golfed or written in another manner, but the main issue is resolved this way.
You've accepted #Takao's answer. That solution requires manually digging into %hash to get to leaf hashes/lists and then applying map.
Given that your question's title mentions "return ... same structure" and the body includes what looks like a nested structure, I think it's important there's an answer providing some idiomatic solutions for automatically descending into and duplicating a nested structure:
my %hash = :a{:b{:c,:d}}
say my %new-hash = %hash».&{ (0.9 .. 1.1) .rand }
# {a => {b => {c => 1.0476391741359872, d => 0.963626602773474}}}
# Update leaf values of original `%hash` in-place:
%hash».&{ $_ = (0.9 .. 1.1) .rand }
# Same effect:
%hash »*=» (0.9..1.1).rand;
# Same effect:
%hash.deepmap: { $_ = (0.9..1.1).rand }
Hyperops (eg ») iterate one or two data structures to get to their leaves and then apply the op being hypered:
say %hash».++ # in-place increment leaf values of `%hash` even if nested
.&{ ... } calls the closure in braces using method call syntax. Combining this with a hyperop one can write:
%hash».&{ $_ = (0.9 .. 1.1) .rand }
Another option is .deepmap:
%hash.deepmap: { $_ = (0.9..1.1).rand }
A key difference between hyperops and deepmap is that the compiler is allowed to iterate data structures and run hyperoperations in parallel in any order whereas deepmap iteration always occurs sequentially.

Reading fields in previous lines for moving average

Main Question
What is the correct syntax for recursively calling AWK inside of another AWK program, and then saving the output to a (numeric) variable?
I want to call AWK using 2/3 variables:
N -> Can be read from Bash or from container AWK script.
Linenum -> Read from container AWK program
J -> Field that I would like to read
This is my attempt.
Container AWk program:
BEGIN {}
{
...
# Loop in j
...
k=NR
# Call to other instance of AWK
var=(awk -f -v n="$n_steps" linenum=k input-file 'linenum-n {printf "%5.4E", $j}'
...
}
END{}
Background for more general questions:
I have a file for which I would like to calculate a moving average of n (for example 2280) steps.
Ideally, for the first n rows the average is of the values 1 to k,
where k <= n.
For rows k > n the average would be of the last n values.
I will eventually execute the code in many large files, with several columns, and thousands to millions of rows, so I'm interested in streamlining the code as much as possible.
Code Excerpt and Description
The code I'm trying to develop looks something like this:
NR>1
{
# Loop over fields
for (j in columns)
{
# Rows before full moving average is done
if ( $1 <= n )
{
cumsum[j]=cumsum[j]+$j #Cumulative sum
$j=cumsum[j]/$1 # Average
}
#moving average
if ( $1 > n )
{
k=NR
last[j]=(awk -f -v n="$n_steps" ln=k input-file 'ln-n {printf "%5.4E", $j}') # Obtain value that will get ubstracted from moving average
cumsum[j]=cumsum[j]+$j-last[j] # Cumulative sum adds last step and deleted unwanted value
$j=cumsum[j]/n # Moving average
}
}
}
My input file contains several columns. The first column contains the row number, and the other columns contain values.
For the cumulative sum of the moving average: If I am in row k, I want to add it to the cumulative sum, but also start subtracting the first value that I don't need (k-n).
I don't want to have to create an array of cumulative sums for the last steps, because I feel it could impact performance. I prefer to directly select the values that I want to substract.
For that I need to call AWK once again (but on a different line). I attempt to do it in this line:
k=NR
last[j]=(awk -f -v n="$n_steps" ln=k input-file 'ln-n {printf "%5.4E", $j}'
I am sure that this code cannot be correct.
Discussion Questions
What is the best way to obtain information about a field in a previous line to the one that AWK is working on? Can it be then saved into a variable?
Is this recursive use of AWK allowed or even recommended?
If not, what could be the most efficient way to update the cumulative sum values so that I get an efficient enough code?
Sample input and Output
Here is a sample of the input (second column) and the desired output (third column). I'm using 3 as the number of averaging steps (n)
N VAL AVG_VAL
1 1 1
2 2 1.5
3 3 2
4 4 3
5 5 4
6 6 5
7 7 6
8 8 7
9 9 8
10 10 9
11 11 10
12 12 11
13 13 12
14 14 13
14 15 14
If you want to do a running average of a single column, you can do it this way:
BEGIN{n=2280; c=7}
{ s += $c - a[NR%n]; a[NR%n] = $c }
{ print $0, s /(NR < n : NR ? n) }
Here we store the last n values in an array a and keep track of the cumulative sum s. Every time we update the sum we correct by first removing the last value from it.
If you want to do this for a couple of columns, you have to be a bit handy with keeping track of your arrays
BEGIN{n=2280; c[0]=7; c[1]=8; c[2]=9}
{ for(i in c) { s[i] += $c[i] - a[n*i + NR%n]; a[n*i + NR%n] = $c[i] } }
{ printf $0
for(i=0;i<length(c);++i) printf OFS (s[i]/(NR < n : NR ? n))
printf ORS
}
However, you mentioned that you have to add millions of entries. That is where it becomes a bit more tricky. Summing a lot of values will introduce numeric errors as you loose precision bit by bit (when you add floats). So in this case, I would suggest implementing the Kahan summation.
For a single column you get:
BEGIN{n=2280; c=7}
{ y = $c - a[NR%n] - k; t = s + y; k = (t - s) - y; s = t; a[NR%n] = $c }
{ print $0, s /(NR < n : NR ? n) }
or a bit more expanded as:
BEGIN{n=2280; c=7}
{ y = $c - k; t = s + y; k = (t - s) - y; s = t; }
{ y = -a[NR%n] - k; t = s + y; k = (t - s) - y; s = t; }
{ a[NR%n] = $c }
{ print $0, s /(NR < n : NR ? n) }
For a multi-column problem, it is now straightforward to adjust the above script. All you need to know is that y and t are temporary values and k is the compensation term which needs to be stored in memory.

Getting a positional slice using a Range variable as a subscript

my #numbers = <4 8 15 16 23 42>;
this works:
.say for #numbers[0..2]
# 4
# 8
# 15
but this doesn't:
my $range = 0..2;
.say for #numbers[$range];
# 16
the subscript seems to be interpreting $range as the number of elements in the range (3). what gives?
Working as intended. Flatten the range object into a list with #numbers[|$range] or use binding on Range objects to hand them around. https://docs.perl6.org will be updated shortly.
On Fri Jul 22 15:34:02 2016, gfldex wrote:
> my #numbers = <4 8 15 16 23 42>; my $range = 0..2; .say for
> #numbers[$range];
> # OUTPUT«16␤»
> # expected:
> # OUTPUT«4␤8␤15␤»
>
This is correct, and part of the "Scalar container implies item" rule.
Changing it would break things like the second evaluation here:
> my #x = 1..10; my #y := 1..3; #x[#y]
(2 3 4)
> #x[item #y]
4
Noting that since a range can bind to #y in a signature, then Range being a
special case would make an expression like #x[$(#arr-param)]
unpredictable in its semantics.
> # also binding to $range provides the expected result
> my #numbers = <4 8 15 16 23 42>; my $range := 0..2; .say for
> #numbers[$range];
> # OUTPUT«4␤8␤15␤»
> y
This is also expected, since with binding there is no Scalar container to
enforce treatment as an item.
So, all here is working as designed.
A symbol bound to a Scalar container yields one thing
Options for getting what you want include:
Prefix with # to get a plural view of the single thing: numbers[#$range]; OR
declare the range variable differently so it works directly
For the latter option, consider the following:
# Bind the symbol `numbers` to the value 1..10:
my \numbers = [0,1,2,3,4,5,6,7,8,9,10];
# Bind the symbol `rangeA` to the value 1..10:
my \rangeA := 1..10;
# Bind the symbol `rangeB` to the value 1..10:
my \rangeB = 1..10;
# Bind the symbol `$rangeC` to the value 1..10:
my $rangeC := 1..10;
# Bind the symbol `$rangeD` to a Scalar container
# and then store the value 1..10 in it:`
my $rangeD = 1..10;
# Bind the symbol `#rangeE` to the value 1..10:
my #rangeE := 1..10;
# Bind the symbol `#rangeF` to an Array container and then
# store 1 thru 10 in the Scalar containers 1 thru 10 inside the Array
my #rangeF = 1..10;
say numbers[rangeA]; # (1 2 3 4 5 6 7 8 9 10)
say numbers[rangeB]; # (1 2 3 4 5 6 7 8 9 10)
say numbers[$rangeC]; # (1 2 3 4 5 6 7 8 9 10)
say numbers[$rangeD]; # 10
say numbers[#rangeE]; # (1 2 3 4 5 6 7 8 9 10)
say numbers[#rangeF]; # (1 2 3 4 5 6 7 8 9 10)
A symbol that's bound to a Scalar container ($rangeD) always yields a single value. In a [...] subscript that single value must be a number. And a range, treated as a single number, yields the length of that range.

Printing out a binary search tree with slashes

http://pastebin.com/dN9a9xfs
That's my code to print out the elements of a binary search tree. The goal is to display it in level order, with slashes connecting the parent to each child. So for instance, the sequence 15 3 16 2 1 4 19 17 28 31 12 14 11 0 would display after execution as:
15
/ \
3 16
/ \ \
2 4 19
/ \ / \
1 12 17 28
/ / \ \
0 11 14 31
I've been working on it for a long time now, but I just can't seem to get the spacing/indentation right. I know I wrote the proper algorithm for displaying the nodes in the proper order, but the slashes are just off. This is the result of my code as is: http://imgur.com/sz8l1
I know I'm so close to the answer, since my display is not that far off from what I need, and I have a feeling it's a really simple solution, but for some reason I just seem to get it right.
I'm out of time for now, but here's a quick version. I did not read your code (don't know C++), so I don't know how close our solutions are.
I changed the output format slightly. Instead of / for the left node, I used | so I didn't have to worry about left spacing at all.
15
| \
3 16
|\ \
2 4 19
| \ | \
1 | 17 28
| | \
0 12 31
| \
11 14
Here's the code. I hope you're able to take what you need from it. There are definitely some Pythonisms which I hope map to what you're using. The main idea is to treat each row of numbers as a map of position to node object, and at each level, sort the map by key and print them to the console iteratively based on their assigned position. Then generate a new map with positions relative to their parents in the previous level. If there's a collision, generate a fake node to bump the real node down a line.
from collections import namedtuple
# simple node representation. sorry for the mess, but it does represent the
# tree example you gave.
Node = namedtuple('Node', ('label', 'left', 'right'))
def makenode(n, left=None, right=None):
return Node(str(n), left, right)
root = makenode(
15,
makenode(
3,
makenode(2, makenode(1, makenode(0))),
makenode(4, None, makenode(12, makenode(11), makenode(14)))),
makenode(16, None, makenode(19, makenode(17),
makenode(28, None, makenode(31)))))
# takes a dict of {line position: node} and returns a list of lines to print
def print_levels(print_items, lines=None):
if lines is None:
lines = []
if not print_items:
return lines
# working position - where we are in the line
pos = 0
# line of text containing node labels
new_nodes_line = []
# line of text containing slashes
new_slashes_line = []
# args for recursive call
next_items = {}
# sort dictionary by key and put them in a list of pairs of (position,
# node)
sorted_pos_and_node = [
(k, print_items[k]) for k in sorted(print_items.keys())]
for position, node in sorted_pos_and_node:
# add leading whitespace
while len(new_nodes_line) < position:
new_nodes_line.append(' ')
while len(new_slashes_line) < position:
new_slashes_line.append(' ')
# update working position
pos = position
# add node label to string, as separate characters so list length
# matches string length
new_nodes_line.extend(list(node.label))
# add left child if any
if node.left is not None:
# if we're close to overlapping another node, push that node down
# by adding a parent with label '|' which will make it look like a
# line dropping down
for collision in [pos - i for i in range(3)]:
if collision in next_items:
next_items[collision] = makenode(
'|', next_items[collision])
# add the slash and the node to the appropriate places
new_slashes_line.append('|')
next_items[position] = node.left
else:
new_slashes_line.append(' ')
# update working position
len_num = len(node.label)
pos += len_num
# add some more whitespace
while len(new_slashes_line) < position + len_num:
new_slashes_line.append(' ')
# and take care of the right child
if node.right is not None:
new_slashes_line.append('\\')
next_items[position + len_num + 1] = node.right
else:
new_slashes_line.append(' ')
# concatenate each line's components and append them to the list
lines.append(''.join(new_nodes_line))
lines.append(''.join(new_slashes_line))
# do it again!
return print_levels(next_items, lines)
lines = print_levels({0: root})
print '\n'.join(lines)

Formula in gawk

I have a problem that I’m trying to work out in gawk. This should be so simple, but my attempts ended up with a divide by zero error.
What I trying to accomplish is as follows –
maxlines = 22 (fixed value)
maxnumber = > max lines (unknown value)
Example:
maxlines=22
maxnumber=60
My output should look like the following:
print lines:
1
2
...
22
print lines:
23
24
...
45
print lines:
46 (remainder of 60 (maxnumber))
47
...
60
It's not clear what you're asking, but I assume you want to loop through input lines and print a new header (page header?) after every 22 lines. Using a simple counter and check for
count % 22 == 1
which tells you it's time to print the next page.
Or you could keep two counters, one for the absolute line number and another for the line number within the current page. When the second counter exceeds 22, reset it to zero and print the next page heading.
Worked out gawk precedence with some help and this works -
maxlines = 22
maxnumber = 60
for (i = 1; i <= maxnumber; i++){
if ( ! ( (i-1) % maxlines) ){
print "\nprint lines:"
}
print i
}