Multiply all values in a %hash and return a %hash with the same structure - raku

I have some JSON stored in a database column that looks like this:
pokeapi=# SELECT height FROM pokeapi_pokedex WHERE species = 'Ninetales';
-[ RECORD 1 ]------------------------------------------
height | {"default": {"feet": "6'07\"", "meters": 2.0}}
As part of a 'generation' algorithm I'm working on, I'd like to take this value into a %hash, multiply it by (0.9..1.1).rand (to allow for a 'natural' 10% variance in the height), and then create a new %hash with the same structure. My select-height method looks like this:
method select-height(:$species, :$form = 'default') {
my %heights = $.data-source.get-height(:$species, :$form);
my %height = %heights * (0.9..1.1).rand;
say %height;
}
Which actually calls my get-height routine to get the 'average' heights (in both metric and imperial) for that species.
method get-height (:$species, :$form) {
my $query = dbh.prepare(qq:to/STATEMENT/);
SELECT height FROM pokeapi_pokedex WHERE species = ?;
STATEMENT
$query.execute($species);
my %height = from-json($query.row);
my %heights = self.values-or-defaults(%height, $form);
return %heights;
}
However, I get the following error on execution (I assume because I'm trying to multiply the hash as a whole rather than the individual elements of the hash):
$ perl6 -I lib/ examples/height-weight.p6
{feet => 6'07", meters => 2}
Odd number of elements found where hash initializer expected:
Only saw: 1.8693857987465123e0
in method select-height at /home/kane/Projects/kawaii/p6-pokeapi/lib/Pokeapi/Pokemon/Generator.pm6 (Pokeapi::Pokemon::Generator) line 22
in block <unit> at examples/height-weight.p6 line 7
Is there an easier (and working) way of doing this without duplicating my code for each element? :)

Firstly, there is an issue with the logic of your code. Initially, you are getting a hash of values, "feet": "6'07\"", "meters": 2.0, parsed out of JSON, with meters being a number and feet being a string. Next, you are trying to multiply it by a random value. While that will work for a number, it won't for a string. Perl 6 does coerce numeric-looking strings for you: say "5" * 3 prints 15, but the X'Y" pattern is too complex for Perl 6 to interpret as a number.
So you likely need to convert it before processing, and to convert it back afterwards.
The second thing is the exact line that leads to the error you are observing.
Consider this:
my %a = a => 5;
%a = %a * 10 => 5; # %a becomes a hash with a single value of 10 => 5
# It happens because when a Hash is used in math ops, its size is used as a value
# Thus, if you have a single value, it'll become 1 * 10, thus 10
# And for %a = a => 1, b => 2; %a * 5 will be evaluated to 10
%a = %a * 10; # error, the key is passed, but not a value
To work directly on hash values, you want to use the map method and process every pair, for example: %a .= map({ .key => .value * (0.9..1.1).rand }).
Of course, it can be golfed or written in another manner, but the main issue is resolved this way.

You've accepted @Takao's answer. That solution requires manually digging into %hash to get to leaf hashes/lists and then applying map.
Given that your question's title mentions "return ... same structure" and the body includes what looks like a nested structure, I think it's important there's an answer providing some idiomatic solutions for automatically descending into and duplicating a nested structure:
my %hash = :a{:b{:c,:d}};
say my %new-hash = %hash».&{ (0.9 .. 1.1) .rand }
# {a => {b => {c => 1.0476391741359872, d => 0.963626602773474}}}
# Update leaf values of original `%hash` in-place:
%hash».&{ $_ = (0.9 .. 1.1) .rand }
# Same effect:
%hash »*=» (0.9..1.1).rand;
# Same effect:
%hash.deepmap: { $_ = (0.9..1.1).rand }
Hyperops (eg ») iterate one or two data structures to get to their leaves and then apply the op being hypered:
say %hash».++ # in-place increment leaf values of `%hash` even if nested
.&{ ... } calls the closure in braces using method call syntax. Combining this with a hyperop one can write:
%hash».&{ $_ = (0.9 .. 1.1) .rand }
Another option is .deepmap:
%hash.deepmap: { $_ = (0.9..1.1).rand }
A key difference between hyperops and deepmap is that the compiler is allowed to iterate data structures and run hyperoperations in parallel in any order whereas deepmap iteration always occurs sequentially.
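The idea of descending into a nested structure and rebuilding it with transformed leaves can also be sketched outside Raku. The Python sketch below is an illustration only: the name scale_leaves and the decision to copy non-numeric leaves (such as the feet string) unchanged are assumptions, not something the answers above prescribe.

```python
import random
from numbers import Real


def scale_leaves(data, factor):
    """Return a copy of `data` with every numeric leaf multiplied by `factor`.

    Nested dicts and lists are rebuilt with the same shape; non-numeric
    leaves (for example the feet string) are copied unchanged.
    """
    if isinstance(data, dict):
        return {key: scale_leaves(value, factor) for key, value in data.items()}
    if isinstance(data, list):
        return [scale_leaves(item, factor) for item in data]
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(data, Real) and not isinstance(data, bool):
        return data * factor
    return data


# Same shape as the JSON in the question.
height = {"default": {"feet": "6'07\"", "meters": 2.0}}
varied = scale_leaves(height, random.uniform(0.9, 1.1))
print(varied)
```

Note that this applies one random factor to the whole structure; drawing a fresh factor per leaf would mean calling random.uniform inside the numeric branch instead.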

Clean up code and keep null values from crashing read.csv.sql

I am using read.csv.sql to conditionally read in data (my data set is extremely large so this was the solution I chose to filter it and reduce it in size prior to reading the data in). I was running into memory issues by reading in the full data and then filtering it so that is why it is important that I use the conditional read so that the subset is read in, versus the full data set.
Here is a small data set so my problem can be reproduced:
write.csv(iris, "iris.csv", row.names = F)
library(sqldf)
csvFile <- "iris.csv"
I am finding that the notation you have to use with read.csv.sql is extremely awkward. The following is how I am reading in the file:
# Step 1 (Assume these values are coming from UI)
spec <- 'setosa'
petwd <- 0.2
# Add quotes and make comma-separated:
spec <- toString(sprintf("'%s'", spec))
petwd <- toString(sprintf("'%s'", petwd))
# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile, sql='select * from file where
                     "Species" in ($spec)
                     and "Petal.Width" in ($petwd)',
                     filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
My main problem is that if any of the values above (from UI) are null then it won't read in the data properly, because this chunk of code is all hard coded.
I would like to change this into: Step 1 - check which values are null and do not filter off of them, then filter using read.csv.sql for all non-null values on corresponding columns.
Note: I am reusing the code from this similar question within this question.
UPDATE
I want to clear up what I am asking. This is what I am trying to do:
If a field, say spec comes through as NA (meaning the user did not pick input) then I want it to filter as such (default to spec == EVERY SPEC):
# Step 2 - Conditionally read in the data, store in 'd'
d <- fn$read.csv.sql(csvFile, sql='select * from file where
"Petal.Width" in ($petwd)',
filter = list('gawk -f prog', prog = '{ gsub(/"/, ""); print }'))
Since spec is NA, if you try to filter/read in a file matching spec == NA it will read in an empty data set since there are no NA values in my data, hence breaking the code and program. Hope this clears it up more.
There are several problems:
some of the simplifications provided in the link in the question were not followed.
spec is a scalar so one can just use '$spec'
petwd is a numeric scalar and SQL does not require quotes around numbers so just use $petwd
the question states you want to handle empty fields but not how so we have used csvfix to map them to -1 and also strip off quotes. (Alternately let them enter and do it in R. Empty numerics will come through as 0 and empty character fields will come through as zero length character fields.)
you can use [...] in place of "..." in SQL
The code below worked for me in both Windows and Ubuntu Linux with the bash shell.
library(sqldf)
spec <- 'setosa'
petwd <- 0.2
d <- fn$read.csv.sql(
"iris.csv",
sql = "select * from file where [Species] = '$spec' and [Petal.Width] = $petwd",
verbose = TRUE,
filter = 'csvfix map -smq -fv "" -tv -1'
)
Update
Regarding the update at the end of the question it was clarified that the NA could be in spec as opposed to being in the data being read in and that if spec is NA then the condition involving spec should be regarded as TRUE. In that case just expand the SQL where condition to handle that as follows.
spec <- NA
petwd <- 0.2
d <- fn$read.csv.sql(
"iris.csv",
sql = "select * from file
where ('$spec' == 'NA' or [Species] = '$spec') and [Petal.Width] = $petwd",
verbose = TRUE,
filter = 'csvfix echo -smq'
)
The above will return all rows for which Petal.Width is 0.2.
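The underlying idea, dropping a condition entirely when the user made no selection, can be sketched independently of sqldf. The Python sketch below is a hedged illustration using the standard sqlite3 module; build_query, the in-memory table and its three rows are hypothetical stand-ins, and column names are assumed to come from trusted code rather than user input.

```python
import sqlite3


def build_query(table, filters):
    """Build a SELECT with one condition per filter whose value is not None.

    `filters` maps column names to a value or None; None means the user
    made no selection, so that column is not filtered at all.
    """
    conditions = []
    params = []
    for column, value in filters.items():
        if value is None:
            continue
        conditions.append(f'"{column}" = ?')
        params.append(value)
    sql = f'SELECT * FROM "{table}"'
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# Tiny in-memory stand-in for the iris data (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE iris ("Species" TEXT, "Petal.Width" REAL)')
conn.executemany(
    "INSERT INTO iris VALUES (?, ?)",
    [("setosa", 0.2), ("setosa", 0.4), ("virginica", 0.2)],
)

# Species was not selected, so only Petal.Width is filtered.
sql, params = build_query("iris", {"Species": None, "Petal.Width": 0.2})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Building the WHERE clause from only the non-empty filters avoids the 'NA' comparison inside the SQL, at the cost of assembling the statement in the host language.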

What's the equivalent in Perl 6 to star expressions in Python?

In Python 3, suppose you run a course and decide at the end of the semester that you’re going to drop the first and last homework grades, and only average the rest of them:
def drop_first_last(grades):
    first, *middle, last = grades
    return avg(middle)
print(drop_first_last([100, 68, 67, 66, 23]))
In Perl 6:
sub drop_first_last(@grades) {
    my ($first, *@middle, $last) = @grades;
    return avg(@middle);
}
say drop_first_last(100,68,67,66,23);
Leads to the error "Cannot put required parameter $last after variadic parameters".
So, what is the Perl 6 equivalent of star expressions in Python?
sub drop_first_last(Seq() \seq, $n = 1) { seq.skip($n).head(*-$n) };
say drop_first_last( 1..10 ); # (2 3 4 5 6 7 8 9)
say drop_first_last( 1..10, 2 ); # (3 4 5 6 7 8)
The way it works: convert whatever the first argument is to a Seq, then skip $n elements, and then keep all except the last $n elements.
Perl5:
sub drop_first_last { avg( @_[ 1 .. $#_-1 ] ) } #this
sub drop_first_last { shift;pop;avg@_ } #or this
Perl6:
sub drop_first_last { avg( @_[ 1 .. @_.end-1 ] ) }
Use a slice.
sub drop_first_last (@grades) {
    return avg(@grades[1..*-2])
}
Workarounds such as those shown in the other answers are correct, but the short answer to your question is that there is no equivalent expression in Perl 6 to the * in Python.
Arguments of this kind are called, in general, variadic, and slurpy in Perl 6, because they slurp the rest of the arguments. And that's the key: the rest. No argument can be declared after a slurpy argument in a subroutine's signature. This example below also uses a workaround:
sub avg( @grades ) {
    return ([+] @grades) / +@grades;
}
sub drop_first_last($first, *@other-grades) {
    # $first already took the first grade; 0..*-2 also leaves out the last one
    return avg(@other-grades[0..*-2]);
}
my @grades = <10 4 8 9 10 8>;
say drop_first_last( |@grades );
This example first uses the slurpy * in the signature to show how it works, and then, by calling the sub with |@grades, flattens the array instead of binding it to a single array argument. So the long answer is that there is actually a * or variadic symbol in Perl 6 signatures, and it works similarly to Python's, but it can only be placed last in a signature since it captures the rest of the arguments.
In case the first and last values are needed for some other reason,
unflattened list structure inside slices maps across to the results
in most cases, so you can do this (you have to use a $ sigil on
$middle to prevent autoflattening):
my @grades = (1,2,3,4,5,6);
my ($first, $middle, $last) = @grades[0,(0^..^*-1),*-1];
$first.say; $middle.say; $last.say;
#1
#(2 3 4 5)
#6
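For comparison, here is a runnable Python sketch of both styles side by side: the star expression from the question and the slice that the Perl 6 answers above translate it into. The question's avg is not defined there, so statistics.mean stands in for it here, and the two function names are made up for this illustration.

```python
from statistics import mean


def drop_first_last_star(grades):
    # Star expression: `middle` collects everything between first and last.
    first, *middle, last = grades
    return mean(middle)


def drop_first_last_slice(grades):
    # Slice equivalent, analogous to @grades[1..*-2] in Perl 6.
    return mean(grades[1:-1])


grades = [100, 68, 67, 66, 23]
print(drop_first_last_star(grades))
print(drop_first_last_slice(grades))
```

Both functions average the same middle elements; the slice form is the one that carries over directly to Perl 6.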

Error while returning output of Pig macro via tuple

The error is in the function below, I'm trying to generate 2 measures of entropy (the latter removes all events with <5 frequency).
My error:
ERROR 1200: Cannot expand macro 'TOTUPLE'. Reason: Macro must be defined before expansion.
Which is weird, because TOTUPLE is a built-in function. Other pig scripts use TOTUPLE with no problems.
Code:
define dual_entropies (search, field) returns entropies {
summary = summary_total($search, $field);
entr1 = count_sum_entropy(summary, $field);
summary = filter summary by events >= 5L;
entr2 = count_sum_entropy(summary, $field);
$entropies = TOTUPLE(entr1, entr2);
};
Note that entr1 and entr2 are both single numbers, not vectors of numbers - I suspect that's part of the issue.
I ran into similar confusion. I'm not sure if it's true in general, but Pig only accepted TOTUPLE when it was part of a FOREACH operation. I worked around it by doing a group by all, which returns a bag with a single tuple in it, followed by a FOREACH .. GENERATE such as:
B = group A ALL;
C = foreach B generate 'x', 2, TOTUPLE('a', 'b', 'c');
dump C;
...
(x,2,(a,b,c))
Perhaps this will help

Convert a Dynamic[] construct to a numerical list

I have been trying to put together something that allows me to extract points from a ListPlot in order to use them in further computations. My current approach is to select points with a Locator[]. This works fine for displaying points, but I cannot figure out how to extract numerical values from a construct with head Dynamic[]. Below is a self-contained example. By dragging the gray locator, you should be able to select points (indicated by the pink locator and stored in q, a list of two elements). This is the second line below the plot. Now I would like to pass q[[2]] to a function, or perhaps simply display it. However, Mathematica treats q as a single entity with head Dynamic, and thus taking the second part is impossible (hence the error message). Can anyone shed light on how to convert q into a regular list?
EuclideanDistanceMod[p1_List, p2_List, fac_: {1, 1}] /;
Length[p1] == Length[p2] :=
Plus @@ (fac.MapThread[Abs[#1 - #2]^2 &, {p1, p2}]) // Sqrt;
test1 = {{1.`, 6.340196001221532`}, {1.`,
13.78779876355869`}, {1.045`, 6.2634018978377295`}, {1.045`,
13.754947081416544`}, {1.09`, 6.178367702583522`}, {1.09`,
13.72055251752498`}, {1.135`, 1.8183153704413153`}, {1.135`,
6.082497198000075`}, {1.135`, 13.684582525399742`}, {1.18`,
1.6809452373465104`}, {1.18`, 5.971583107298081`}, {1.18`,
13.646996905469383`}, {1.225`, 1.9480537697339537`}, {1.225`,
5.838386922625636`}, {1.225`, 13.607746407088161`}, {1.27`,
2.1183174369679234`}, {1.27`, 5.669799095595362`}, {1.27`,
13.566771130126131`}, {1.315`, 2.2572975468163463`}, {1.315`,
5.444014254828522`}, {1.315`, 13.523998701347882`}, {1.36`,
2.380307009155079`}, {1.36`, 5.153024664297602`}, {1.36`,
13.479342200528283`}, {1.405`, 2.4941312539733285`}, {1.405`,
4.861423833512566`}, {1.405`, 13.432697814928654`}, {1.45`,
2.6028066447609426`}, {1.45`, 4.619367407525507`}, {1.45`,
13.383942212133244`}};
DynamicModule[{p = {1.2, 10}, q = {1.3, 11}},
 q := Dynamic@
   First@test1[[
      Ordering[{#, EuclideanDistanceMod[p, #, {1, .1}]} & /@ test1,
       1, #1[[2]] < #2[[2]] &]]];
 Grid[{{Show[{ListPlot[test1, Frame -> True, ImageSize -> 300],
      Graphics@Locator[Dynamic[p]],
      Graphics@
       Locator[q, Appearance -> {Small},
        Background -> Pink]}]}, {Dynamic@p}, {q}, {q[[2]]}}]]
There are several ways to extract values from a dynamic expression. What you probably want is Setting, which resolves all dynamic expressions into their values at the time Setting is evaluated.
In[75]:= Slider[Dynamic[x]] (* evaluate then move the slider *)
In[76]:= FullForm[Dynamic[x]]
Out[76]//FullForm= Dynamic[x]
In[77]:= FullForm[Setting[Dynamic[x]]]
Out[77]//FullForm= 0.384`
Here's a slightly more complicated example:
DynamicModule[{x},
{Dynamic[x], Slider[Dynamic[x]],
Button["Set y to the current value of x", y = Setting[Dynamic[x]]]}
]
If you evaluate the above expression, move the slider and then click the button, the current value of x as set by the slider is assigned to y. If you then move the slider again, the value of y doesn't change until you update it again by clicking the button.
Instead of assigning to a variable, you can of course paste values into the notebook, call a function, export a file, etc.
After a little more research, it appears that the answer revolves around the fact that Dynamic[] is a wrapper for updating and displaying the expression. Any computations that you want dynamically updated must be placed inside the wrapper: for instance, instead of doing something like q = Dynamic[p] + 1 one should use something like Dynamic[q = p + 1; q]. For my example, where I wanted to split q into two parts, here's the updated code:
DynamicModule[{p = {1.2, 10}, q = {1.3, 11}, qq, q1, q2},
 q := Dynamic[
   qq = First@
     test1[[Ordering[{#, EuclideanDistanceMod[p, #, {1, .1}]} & /@
         test1, 1, #1[[2]] < #2[[2]] &]]];
   {q1, q2} = qq;
   qq
   ];
 Grid[{{Show[{ListPlot[test1, Frame -> True, ImageSize -> 300],
      Graphics@Locator[Dynamic[p]],
      Graphics@
       Locator[q, Appearance -> {Small},
        Background -> Pink]}]}, {Dynamic@p}, {Dynamic@q}, {Dynamic@
     q1}}]]
If I am still missing something, or if there's a cleaner way to do this, I welcome any suggestions...

How do I use Perl to parse the output of the sqlplus command?

I have an SQL file which will give me an output like below:
10|1
10|2
10|3
11|2
11|4
.
.
.
I am using this in a Perl script like below:
my @tmp_cycledef = `sqlplus -s $connstr \@DLCycleState.sql`;
After the above statement, since @tmp_cycledef has all the output of the SQL query,
I want to show the output as:
10 1,2,3
11 2,4
How could I do this using Perl?
EDIT:
I am using the following code:
foreach my $row (@tmp_cycledef)
{
chomp $row;
my ($cycle_code,$cycle_month)= split /\s*\|\s*/, $row;
print "$cycle_code, $cycle_month\n";
$hash{$cycle_code}{$cycle_month}=1
}
foreach my $num ( sort keys %hash )
{
my $h = $hash{$num};
print join(',',sort keys %$h),"\n";
}
The first print statement prints:
2, 1
2, 10
2, 11
2, 12
3, 1
3, 10
3, 11
but the output is always
1,10,11,12
1,10,11,12
1,10,11,12
1,10,11,12
1,10,11,12
1,10,11,12
1,10,11,12
Well, this one is actually how you might do it in perl:
# two must-have pragmas for perl development
use strict;
use warnings;
Perl allows for variables to be created as they are used: $feldman = some_function() means that you now have the variable $feldman in your local namespace. But the bad part about this is that you can type $fldman and take a long time finding out why what you thought was $feldman has no value. Turning on strictures means that your code fails to compile if it encounters an undeclared variable. You declare a variable with a my or our statement (or in older Perl code a use vars statement).
Turning on warnings just warns you when you're not getting values you expect. Often warnings tend to be too touchy, but they are generally a good thing to develop code with.
my %hash; # the base object for the data
Here, I've declared a hash variable that I creatively called %hash. The sigil (pronounced "sijil") "%" tells you that it is a map of name-value pairs. This my statement declares the variable and makes it legal for the compiler. The compiler will complain about any use of %hsh.
The next item is a foreach loop (which can be abbreviated "for"). The loop will process the list of lines in @tmp_cycledef, assigning each one in turn to $row (my $row).
We chomp the line first, removing the end-of-line character for that platform.
We split the line on the '|' character, creating a list of strings that had been separated by a pipe.
And then we store it in a two-layered hash, since we want to group them by at least the first number. We could do this with an array, creating an array at the location in the hash like so: push @{$hash{$key}}, $val, but I typically want to collapse duplicates (not that there were any duplicates in your sample).
Here:
foreach my $row ( @tmp_cycledef ) {
chomp $row; # removes the end-of-line character when present.
my ( $key, $val ) = split /\|/, $row;
# One of the best ways to merge lists is a presence-of idea
# with the hash holding whether the value is present
$hash{$key}{$val} = 1;
}
Once we have the data in the structure, we need to iterate both levels of hash keys. You wanted to separate the "top level" numbers by lines, but you wanted the second numbers concatenated on the same line. So we print a line for each of the first numbers and join the list of strings stored for each number on the same line, delimited by commas. We also sort the list: { $a <=> $b } just takes two keys and compares them numerically. So you get a numeric order.
# If the keys were alphabetic, we would likely just say: sort keys %hash
foreach my $num ( sort { $a <=> $b } keys %hash ) {
my $h = $hash{$num};
print "$num ", join( ',', sort { $a <=> $b } keys %$h ), "\n";
}
As I said in the comments, sort, by default, sorts in character order so you can just say sort keys %hash.
To help you out, you really need to read some of these:
strictures
warnings
perldata
perlfunc -- especially my, foreach, chomp, split, keys, sort and join
And the data structure tutorial
Use a hash of arrays to collect all the values for a single key together, then print them out:
init hash
for each line:
parse into key|value
append value to hash[key]
for each key in hash: # you can sort it, if needed
print out key, list of values
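The pseudocode above can be made concrete in a few lines. The sketch below uses Python rather than Perl purely as an illustration; group_lines and the sample list are hypothetical names, and the input is assumed to be 'key|value' lines as in the question.

```python
from collections import defaultdict


def group_lines(lines):
    """Group 'key|value' lines into {key: [values...]}, keeping input order."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the command output
        key, value = (part.strip() for part in line.split("|", 1))
        groups[key].append(value)
    return groups


sample = ["10|1", "10|2", "10|3", "11|2", "11|4"]
# Sort the keys numerically, then print each key with its joined values.
for key, values in sorted(group_lines(sample).items(), key=lambda kv: int(kv[0])):
    print(key, ",".join(values))
```

Unlike the hash-of-hashes approach, a list per key keeps duplicates and preserves the order in which values arrived.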
If your input is sorted (as it is in the provided sample), you don't actually need to bother with the hash of arrays/hashes. The code is a bit longer, but doesn't require you to understand references and should run faster for large datasets:
#!/usr/bin/perl
use strict;
use warnings;
my @tmp_cycledef = <DATA>;
my $last_key;
my @values;
for (@tmp_cycledef) {
    chomp;
    my ($key, $val) = split '\|';
    # Seed $last_key with the first key value on the first pass
    $last_key = $key unless defined $last_key;
    # The key has changed, so it's time to print out the values associated
    # with the previous key, then reset everything for the new one
    if ($key != $last_key) {
        print "$last_key " . join(',', @values) . "\n";
        $last_key = $key;
        @values = ();
    }
    # Add the current value to the list of values for this key
    push @values, $val;
}
# Don't forget to print out the final key when you're done!
print "$last_key " . join(',', @values) . "\n";
__DATA__
10|1
10|2
10|3
11|2
11|4