Repeated and unique in hash-like things - raku

The repeated method takes a function as an argument for normalizing the elements before finding out which ones are repeated. However, I can't seem to make it work with values. For instance:
%(:a(3),:b(3),:c(2)).repeated( as=> *.values ).say
Returns an empty list, while I was expecting the pairs :a(3) and :b(3), same as
%(:a(3),:b(3),:c(2)).repeated( as=> .values ).say
In this case, for instance, it seems to work as expected:
(3+3i, 3+2i, 2+1i).unique(as => *.re).say # OUTPUT: «(3+3i 2+1i)␤»
Any idea of what I'm missing here?

.values is a method for returning all of the values of a container.
Since it is a List method, if you call it on a singular value it pretends it is a List containing only that value.
say 5.values.perl;
# (5,)
The as named parameter of .repeated gets called on all of the singular values.
%(:a(3),:b(3),:c(2)).repeated( as=> *.perl.say );
# :a(3)
# :b(3)
# :c(2)
So by giving it the *.values lambda, it is effectively not doing anything useful.
The method you were looking for is .value. Which is a method on a Pair.
%(:a(3),:b(3),:c(2)).repeated( as=> *.value ).say
# (a => 3)
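For readers more at home in Python, here is a rough analogy of what the as parameter does. It is only a sketch, not Raku's implementation: the repeated function and its key parameter are hypothetical names, and pairs are modelled as tuples. The point is that the normalizer is called on each element separately, so it has to extract the value from one pair at a time.

```python
def repeated(items, key=lambda item: item):
    """Return the items whose normalized form was already seen earlier."""
    seen = set()
    result = []
    for item in items:
        normalized = key(item)  # called once per element, like `as`
        if normalized in seen:
            result.append(item)
        else:
            seen.add(normalized)
    return result


pairs = [("a", 3), ("b", 3), ("c", 2)]

# Normalizing to the whole pair: every pair is distinct, nothing repeats.
by_whole_pair = repeated(pairs)

# Normalizing to the value of each pair: the second 3 counts as repeated.
by_value = repeated(pairs, key=lambda pair: pair[1])

print(by_whole_pair)  # []
print(by_value)       # [('b', 3)]
```

A Python list keeps insertion order, so the sketch always reports the later pair; a Raku hash is unordered, which is why the Raku output can show either of the two pairs.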

Related

List comprehension- Multiple inputs

I am a beginner , trying to understand how list comprehension for multiple input works.
Can someone explain how the below code works?
x,y = [int(x) for x in input("Enter the value ").split()]
print(x,y)
Thanks in advance!
This is actually not directly related to list comprehensions; it relies on a concept called "sequence unpacking", which applies to any sequence type (list, tuple, range). The user input is expected to be two whitespace-separated values. The split call splits the input on whitespace, returning a list of size 2. The list comprehension then loops over each element of that list and converts it to an int. Thus, the list comprehension returns a list of length 2, and its elements are "unpacked" separately into the x and y variables on the left-hand side of the assignment operator. Here is an excerpt from the Data Structures section of the Python tutorial that explains sequence unpacking:
The statement t = 12345, 54321, 'hello!' is an example of tuple packing: the values 12345, 54321 and 'hello!' are packed together in a tuple. The reverse operation is also possible:
>>> x, y, z = t
This is called, appropriately enough, sequence unpacking and works for
any sequence on the right-hand side. Sequence unpacking requires that
there are as many variables on the left side of the equals sign as
there are elements in the sequence. Note that multiple assignment is
really just a combination of tuple packing and sequence unpacking.
Note that this only works if the user input is of length 2, else the
sequence unpacking will not work and will result in an error.
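The failure case can be sketched as follows. The helper name parse_two_ints is made up for illustration, and the input strings are hard-coded so the example runs without a prompt.

```python
def parse_two_ints(text):
    """Split on whitespace, convert to int, and unpack exactly two values."""
    parts = [int(token) for token in text.split()]
    if len(parts) != 2:
        raise ValueError(f"expected 2 values, got {len(parts)}")
    x, y = parts  # sequence unpacking: one name per element
    return x, y


print(parse_two_ints("3 4"))  # (3, 4)

# Without the explicit length check, the unpacking itself raises ValueError.
try:
    x, y = [int(token) for token in "1 2 3".split()]
except ValueError as err:
    print("unpacking failed:", err)
```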

How to efficiently append a dataframe column with a vector?

Working with Julia 1.1:
The following minimal code works and does what I want:
using DataFrames
function test()
    df = DataFrame(NbAlternative = Int[], NbMonteCarlo = Int[], Similarity = Float64[])
    append!(df.NbAlternative, ones(Int, 5))
    df
end
Appending a vector to one column of df. Note: in my whole code, I add a more complicated Vector{Int} than ones' return.
However, @code_warntype test() does return:
%8 = invoke DataFrames.getindex(%7::DataFrame, :NbAlternative::Symbol)::AbstractArray{T,1} where T
Which, I suppose, means this isn't efficient. I can't work out what this @code_warntype output means. More generally, how can I understand the issues reported by @code_warntype and fix them? This is a recurring point of confusion for me.
EDIT: @BogumiłKamiński's answer
Then how would one write the following code?
for na in arr_nb_alternative
    @show na
    for mt in arr_nb_montecarlo
        println("...$mt")
        append!(df.NbAlternative, ones(Int, nb_simulations)*na)
        append!(df.NbMonteCarlo, ones(Int, nb_simulations)*mt)
        append!(df.Similarity, compare_smaa(na, nb_criteria, nb_simulations, mt))
    end
end
compare_smaa returns a nb_simulations length vector.
You should never do such things as it will cause many functions from DataFrames.jl to stop working properly. Actually such code will soon throw an error, see https://github.com/JuliaData/DataFrames.jl/issues/1844 that is exactly trying to patch this hole in DataFrames.jl design.
What you should do is append a data frame-like object to a DataFrame using the append! function (this guarantees that the result has consistent column lengths) or use push! to add a single row to a DataFrame.
Now the reason you have type instability is that a DataFrame can hold vectors of any type (technically columns are held in a Vector{AbstractVector}), so it is not possible to determine at compile time what the type of the vector under a given name will be.
EDIT
What you ask for is a typical scenario that DataFrames.jl supports well and I do it almost every day (as I do a lot of simulations). As I have indicated - you can use either push! or append!. Use push! to add a single run of a simulation (this is not your case, but I add it as it is also very common):
for na in arr_nb_alternative
    @show na
    for mt in arr_nb_montecarlo
        println("...$mt")
        for i in 1:nb_simulations
            # here you have to make sure that compare_smaa returns a scalar
            # if it is passed 1 in nb_simulations
            push!(df, (na, mt, compare_smaa(na, nb_criteria, 1, mt)))
        end
    end
end
And this is how you can use append!:
for na in arr_nb_alternative
    @show na
    for mt in arr_nb_montecarlo
        println("...$mt")
        # here you have to make sure that compare_smaa returns a vector
        append!(df, (NbAlternative=ones(Int, nb_simulations)*na,
                     NbMonteCarlo=ones(Int, nb_simulations)*mt,
                     Similarity=compare_smaa(na, nb_criteria, nb_simulations, mt)))
    end
end
Note that I append here a NamedTuple. As I have written earlier you can append a DataFrame or any data frame-like object this way. What "data frame-like object" means is a broad class of things - in general anything that you can pass to DataFrame constructor (so e.g. it can also be a Vector of NamedTuples).
Note that append! adds columns to a DataFrame using name matching so column names must be consistent between the target and appended object.
This is different from push!, which also allows pushing a row that does not specify column names (in my example above I show that a Tuple can be pushed).

What is the lambda function doing in the info_dict parameter of the summary_col in this code?

I'm running summary statistics for a group of standard OLS regressions. The code was written by my professor and I'm trying to figure out what's going on specifically in a portion of the code.
summary_col(
[reg0,reg1,reg2,reg3],
stars=True,
float_format='%0.2f',
info_dict = {
'N':lambda x: "{0:d}".format(int(x.nobs)),
'R2':lambda x: "{:.2f}".format(x.rsquared)
})
I looked up lambda functions. I have a fairly decent understanding of how they work. Aspects of the code that I do understand:
info_dict is a dictionary of values that can be called if you wish to include them in your summary statistics
lambda function work by calling an anonymous function "lambda x" then you place the : and list what operation you want to take place (i.e. x + 5) and then if you already know what parameters you want it to run you can put in a list after a second ":".
{0:d} will round to integers which makes perfect sense for observations. Although I don't know why you can't just say {%.f}. Maybe it's because the former returns an explicit int and the latter returns a float that looks like an int.
{:.2f} will return a float with 2 decimal places
What I don't fully understand is what somestring.format() does. Somehow x is getting defined as the results from the regression I believe and x.nobs is the variable "number of observations". Similar for x.rsquared.
Could someone fill in the gaps for me about what's going on in the formula? What exactly about the lambda function is enabling it to fetch data for each individual regression?
Let's break this out a little bit to make it obvious what is happening:
summary_col(
[reg0,reg1,reg2,reg3],
stars=True,
float_format='%0.2f',
info_dict={
'N':lambda x: "{0:d}".format(int(x.nobs)),
'R2':lambda x: "{:.2f}".format(x.rsquared)
}
)
The summary_col function is taking in some input, the first argument being a list of regression objects, [reg0,reg1,reg2,reg3]. Then there are three named arguments, stars, float_format, and info_dict. info_dict is a dictionary with two keys, N and R2, each of which maps to a lambda that returns a formatted string. The lambdas do not fetch anything by themselves; I believe summary_col calls each one once per regression object in the list, passing that object in as x. That is why x.nobs and x.rsquared resolve against the regression objects: x is simply whichever object the function was called with.
If you try to use lambda in that line of code on something that does not exist in the regression objects, you'll almost certainly get an error. The key is in the context against which the lambda is applied.
A good example on the context of lambda functions is iterating over a dictionary and sorting by key and value.
# sort the dict by value first, and key second...
# x is inferred from the context (my_dict.items())
for key, value in sorted(my_dict.items(), key=lambda x: (x[1], x[0])):
    print(key, value)
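To make the "context" point concrete, here is a minimal sketch of how a function could apply the info_dict lambdas to each regression result. It is not the statsmodels source: FakeResult and build_info_rows are hypothetical names, and the numbers are invented for illustration. It also shows what str.format does, namely substituting the argument into the {} placeholder using the given format spec.

```python
class FakeResult:
    """Hypothetical stand-in for a fitted regression result object."""

    def __init__(self, nobs, rsquared):
        self.nobs = nobs
        self.rsquared = rsquared


info_dict = {
    "N": lambda x: "{0:d}".format(int(x.nobs)),    # integer formatting
    "R2": lambda x: "{:.2f}".format(x.rsquared),   # two decimal places
}


def build_info_rows(results, info_dict):
    """Call every lambda once per result; x is bound to that result."""
    rows = {}
    for label, func in info_dict.items():
        rows[label] = [func(result) for result in results]
    return rows


results = [FakeResult(100.0, 0.4567), FakeResult(250.0, 0.8)]
rows = build_info_rows(results, info_dict)
print(rows)  # {'N': ['100', '250'], 'R2': ['0.46', '0.80']}
```

The lambda never looks anything up on its own; whoever calls it decides what x is.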

Accessing the last element in Perl6

Could someone explain why this accesses the last element in Perl 6
@array[*-1]
and why we need the asterisk *?
Isn't it more logical to do something like this:
@array[-1]
The user documentation explains that *-1 is just a code object, which could also be written as
-> $n { $n - 1 }
When passed to [ ], it will be invoked with the array size as argument to compute the index.
So instead of just being able to start counting backwards from the end of the array, you could use it to eg count forwards from its center via
@array[* div 2] #=> middlemost element
@array[* div 2 + 1] #=> next element after the middlemost one
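The idea of passing a code object to [ ] and having it invoked with the array size can be imitated in Python. This is only an analogy under stated assumptions: WhateverList is a made-up class, and Python's real lists accept negative indices, which the sketch deliberately rejects to mirror the behaviour described above.

```python
class WhateverList(list):
    """A list whose index may be a callable that receives the list length."""

    def __getitem__(self, index):
        if callable(index):
            index = index(len(self))  # compute the effective index from the size
        if isinstance(index, int) and index < 0:
            raise IndexError(f"effective index out of range: {index}")
        return super().__getitem__(index)


letters = WhateverList(["a", "b", "c", "d", "e"])

last = letters[lambda n: n - 1]      # counts back from the end
middle = letters[lambda n: n // 2]   # counts forward from the center

print(last)    # e
print(middle)  # c
```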
According to the design documents, the reason for outlawing negative indices (which could have been accepted even with the above generalization in place) is this:
The Perl 6 semantics avoids indexing discontinuities (a source of subtle runtime errors), and provides ordinal access in both directions at both ends of the array.
If you don't like the whatever-star, you can also do:
my $last-elem = @array.tail;
or even
my ($second-last, $last) = @array.tail(2);
Edit: Of course, there's also a head method:
my ($first, $second) = @array.head(2);
The other two answers are excellent. My only reason for answering was to add a little more explanation about the Whatever Star * array indexing syntax.
The equivalent of Perl 6's @array[*-1] syntax in Perl 5 would be $array[ scalar @array - 1]. In Perl 5, in scalar context an array returns the number of items it contains, so scalar @array gives you the length of the array. Subtracting one from this gives you the last index of the array.
Since in Perl 6 indices can be restricted to never be negative, if they are negative then they are definitely out of range. But in Perl 5, a negative index may or may not be "out of range". If it is out of range, then it only gives you an undefined value which isn't easy to distinguish from simply having an undefined value in an element.
For example, the Perl 5 code:
use v5.10;
use strict;
use warnings;
my @array = ('a', undef, 'c');
say $array[-1]; # 'c'
say $array[-2]; # undefined
say $array[-3]; # 'a'
say $array[-4]; # out of range
say "======= FINISHED =======";
results in two nearly identical warnings, but still finishes running:
c
Use of uninitialized value $array[-2] in say at array.pl line 7.
a
Use of uninitialized value in say at array.pl line 9.
======= FINISHED =======
But the Perl 6 code
use v6;
my @array = 'a', Any, 'c';
put @array[*-1]; # evaluated as @array[2] or 'c'
put @array[*-2]; # evaluated as @array[1] or Any (i.e. undefined)
put @array[*-3]; # evaluated as @array[0] or 'a'
put @array[*-4]; # evaluated as @array[-1], which is out of range
put "======= FINISHED =======";
will likewise warn about the undefined value being used, but it fails upon the use of an index that comes out less than 0:
c
Use of uninitialized value @array of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
in block <unit> at array.p6 line 5
a
Effective index out of range. Is: -1, should be in 0..Inf
in block <unit> at array.p6 line 7
Actually thrown at:
in block <unit> at array.p6 line 7
Thus your Perl 6 code can be more robust by not allowing negative indices, but you can still index from the end using the Whatever Star syntax.
Last word of advice
If you just need the last few elements of an array, I'd recommend using the tail method mentioned in mscha's answer. @array.tail(3) is much more self-explanatory than @array[*-3 .. *-1].

Why does fold have the following type in Scala?

I was looking at the way fold is defined for immutable.Set:
def fold [A1 >: A] (z: A1)(op: (A1, A1) ⇒ A1): A1
yet foldLeft is defined as:
def foldLeft [B] (z: B)(op: (B, A) ⇒ B): B
This looks weird to me, at least at first glance, since I was expecting fold to be able to change the type of the collection it returned, much like foldLeft does.
I imagine this is because foldLeft and foldRight guarantee something about the order in which the elements are folded. What is the guarantee given by fold?
When you're applying foldLeft, your start value is combined with the first list element. The result is combined with the second list element, that result with the third, and so on. Eventually, the list has collapsed to one element of the same type as your start value. Therefore you just need some type that your function can combine with a list element.
For foldRight the same applies but in reverse order.
fold does not guarantee the order in which the combinations are done, and it does not guarantee that it starts off at only one position. The folds might happen in parallel. Because of this possible parallelism, it is required that any 2 list elements or return values can be combined, which adds a constraint to the types.
Regarding your comment that you have yet to see a case where order has an effect: Assume you're using folds to concatenate a list of characters and you want to have a text as result. If your input is A, B, C, you probably would like to preserve the order to receive ABC instead of ACB (for example).
On the other hand if you're, say, just adding up numbers, the order does not matter. Summing up 1, 2, 3 gives 6 independent of the additions' order. In such cases using fold instead of foldLeft or foldRight could lead to faster execution.
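The type constraint can be illustrated with a small Python sketch. It is not how Scala implements fold; chunked_fold is a hypothetical helper that folds each chunk separately (as parallel workers might) and then combines the partial results. Because the same op must combine two partial results as well as a result with an element, both operands need the same type, which is what (A1, A1) ⇒ A1 expresses. The start value is reused for every chunk, so it has to be a neutral element for op.

```python
from functools import reduce
import operator


def chunked_fold(items, zero, op, chunk_size=2):
    """Fold each chunk on its own, then fold the partial results together."""
    partials = [
        reduce(op, items[i:i + chunk_size], zero)  # zero is reused per chunk
        for i in range(0, len(items), chunk_size)
    ]
    return reduce(op, partials, zero)  # op now combines result with result


numbers = [1, 2, 3, 4, 5]

# Addition is associative and 0 is neutral, so the grouping does not matter.
total_sequential = reduce(operator.add, numbers, 0)
total_chunked = chunked_fold(numbers, 0, operator.add)

# A foldLeft-style op: accumulator (str) and element (int) differ in type,
# which is fine sequentially because op only ever sees (str, int).
joined = reduce(lambda acc, n: acc + str(n), numbers, "")

print(total_sequential, total_chunked)  # 15 15
print(joined)                           # 12345
```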
It seems that foldLeft must return B. The method takes a B argument, which is an accumulator. Values of A are used to "add more to" a B. The final accumulated value is returned. I think foldLeft and foldRight are the same in this respect.