How would you write R.compose using R.o? - ramda.js

It seems useful to know a good pattern for making an n-step composition or pipeline from a binary function. Maybe it's obvious or common knowledge.
What I was trying to do was R.either(predicate1, predicate2, predicate3, ...) but R.either is one of these binary functions. I thought R.composeWith might be part of a good solution but didn't get it to work right. Then I think R.o is at the heart of it, or perhaps R.chain somehow.
Maybe there's a totally different way to make an n-ary either that could be better than a "compose-with"(R.either)... I'd be interested if so, but I'm trying to ask a more general question than that.

One common way of converting a binary function into one that takes many arguments is to use R.reduce. This requires the binary function's two arguments and its return value to all have the same type.
For your example with R.either, it would look like:
const eithers = R.reduce(R.either, R.F)
const fooOr42 = eithers([ R.equals("foo"), R.equals(42) ])
This accepts a list of predicate functions that will each be given as arguments to R.either.
The fooOr42 example above is equivalent to:
const fooOr42 = R.either(R.either(R.F, R.equals("foo")), R.equals(42))
You can also make use of R.unapply if you want to convert the function from accepting a list of arguments, to a variable number of arguments.
const eithers = R.unapply(R.reduce(R.either, R.F))
const fooOr42 = eithers(R.equals("foo"), R.equals(42))
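The same reduce pattern answers the title question directly: R.o is the binary composition function and R.identity can serve as its empty value. A minimal sketch (note that, unlike R.compose, this happily accepts zero functions, in which case it just returns R.identity):
const compose = (...fns) => R.reduce(R.o, R.identity, fns)
const shout = compose(R.toUpper, R.trim)
shout('  hi  ') //=> "HI"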
The approach above can be used for any type whose values can be combined to produce a value of the same type, i.e. any type with a "monoid" instance. This just means that we have a binary function that combines two values of the type and some "empty" value, which together satisfy some simple laws:
Associativity: combine(a, combine(b, c)) == combine(combine(a, b), c)
Left identity: combine(empty, a) == a
Right identity: combine(a, empty) == a
Some examples of common types with a monoid instance include:
arrays, where the empty list is the empty value and concat is the binary function.
numbers, where 1 is the empty value and multiply is the binary function.
numbers, where 0 is the empty value and add is the binary function.
In the case of your example, we have predicates (functions returning a boolean value), where the empty value is R.F (a.k.a. (_) => false) and the binary function is R.either. You can also combine predicates using R.both with an empty value of R.T (a.k.a. (_) => true), which will ensure the resulting predicate satisfies all of the combined predicates.
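For example, a sketch of the R.both version:
const boths = R.reduce(R.both, R.T)
const positiveEven = boths([ x => x > 0, x => x % 2 === 0 ])
positiveEven(4)  //=> true
positiveEven(-2) //=> false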
It is probably also worth mentioning that you could alternatively just use R.anyPass :)
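For completeness, the R.anyPass version of the earlier example (R.allPass is the analogue for R.both):
const fooOr42 = R.anyPass([ R.equals("foo"), R.equals(42) ])
fooOr42(42) //=> true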


Raku: return Type

I want to write a function returning an array, all of whose subarrays have a length of two.
For example, the return value will be [[1, 2], [3, 4]].
I define:
(1) subset TArray of Array where { .all ~~ subset :: where [Int, Int] };
and
sub fcn(Int $n) of TArray is export(:fcn) {
    [[1, 2], [3, 4]];
}
I find (1) over-complicated. Is there something simpler?
Stepping back first
subset TArray of Array where { .all ~~ subset :: where [Int, Int] };
Is there something simpler?
Before we go there, let's step back. Even ignoring your code's "overly-complicated" nature based on just looking at it, it's also potentially problematic and complicated for various reasons that may not be so obvious. I'll highlight three:
This subset will accept an Array containing Arrays, with each of those arrays containing two Ints. But it doesn't mandate an Array[Array[Int]]. The outer Array's type may be just a generic Array, rather than being an Array[Array] let alone an Array[Array[Int]]. Indeed it will be unless you deliberately introduce strongly typed values. I will cover strong typing in the last section of this answer.
What about an empty Array? Your subset will accept that. Is that your intent? If not, what about requiring at least one pair of Ints?
The outer where clause uses a common Raku idiom of the form .all ~~ ..., with a junction on the left hand side of the ~~ smart match operator. Astonishingly, per an issue I just filed, this may be a problem. What alternatives are there?
Starting simple
Raku does a decent job of keeping simple things simple. If we put aside any artificial desire for strong typing, and focus on simple tools for tightening code up, a simple subset I would have suggested in the past would be:
subset TArray where .all == 2; # BAD despite being idiomatic???
This has all of the problems your original code has, and in addition it accepts data that has non-integers where integers belong.
But it does have the redeeming qualities that it does a useful check (that the inner arrays each have two elements) and it's significantly simpler than your code.
Now I've reminded myself that I need to view .all on the left hand side of ~~ as possibly a problem, I'll instead write it as:
subset TArray where 2 == .all; # Potentially the new idiomatic.
This version reads more poorly, but, while readability is important, basic correctness is more important.
Still fairly simple, and fewer problems
Here are two variants I came up with:
subset TArray where all .map: * ~~ (Int,Int);
subset TArray where .elems == .grep: (Int,Int);
These both avoid the junction/smartmatch problem. (The first where expression does have a junction to the left of a smart match, but it's not an example of the problem.)
The second version isn't so obviously correct (think of it as checking that the count of subarrays is the same as the count of subarrays that match (Int,Int)) but it nicely lends itself to fixing the problem of matching if there are zero subarrays, if that were to need fixing:
subset TArray where 0 < .elems == .grep: (Int,Int);
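Some illustrative checks of that final subset (assuming I've read the semantics right):
say [[1,2],[3,4]] ~~ TArray; # True
say [] ~~ TArray;            # False (no subarrays at all)
say [[1,2,3]] ~~ TArray;     # False (subarray isn't two Ints)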
Strong typing solutions
The solutions thus far don't deal with strong typing. Perhaps that's desirable. Perhaps not.
To understand what I mean by this, let's first look at literals:
say WHAT 1; # (Int)
say WHAT [1,2]; # (Array)
say WHAT [[1,2],[3,4]]; # (Array)
These values have types determined by their literal constructors.
The last two are just Arrays, generic over their elements.
(The second is not an Array[Int], which might be expected. Similarly the last one is not an Array[Array[Int]].)
Current built in Raku literal forms for composite types (arrays and hashes) all construct generic Arrays which do not constrain the types of their elements.
See the PR Introduce [1,2,3]:Int syntax #4406 for a proposal/PR regarding element typed composite literals and a related issue I just posted in response to your Q here about an alternative and/or complementary approach to that PR. (There have been discussions over the years about this aspect of the type system but it seems like it's time for Rakoons to look at addressing it.)
What if you wanted to build a strongly typed data structure as the value to return from your routine, and to have the return type check that?
Here's one way one might build such a strongly typed value:
my Array[Array[Int]] $result .= new: Array[Int].new(1,2), Array[Int].new(3,4);
Super verbose! But now you could write the following for your sub's return type check and it'll work:
subset TArray of Array[Array[Int]] where 0 < .elems == .grep: (Int,Int);
sub fcn(Int $n) of TArray is export(:fcn) {
    my Array[Array[Int]] $result .= new: Array[Int].new(1,2), Array[Int].new(3,4);
}
Another way to build a strongly typed value is to specify not only the strong typing in a variable's type constraint, but also coercion typing to bridge from a loosely typed value to a strongly typed target.
We keep the exact same subset (that establishes the strongly typed target data structure and adds "refinement typing" checks):
subset TArray of Array[Array[Int]] where 0 < .elems == .grep: (Int,Int);
But instead of using a verbose correct-by-construction initialization value, using full type names and news, we introduce additional coercion typing and then just use ordinary literal syntax:
constant TArrayInitialization = TArray(Array[Array[Int]()]());
sub fcn(Int $n) of TArray is export(:fcn) {
    my TArrayInitialization $result = [[1,2],[3,4]];
}
(I could have written the TArrayInitialization declaration as another subset, but it would be a slight overkill to have done so. A constant does the job with less fuss.)
I gather that the aim is to restrict the type of the inner Array to [Int,Int] ... the closest I can get to this is to declare two subsets, one based on the other...
subset IArray where * ~~ [Int, Int];
subset TArray where .all ~~ IArray;
Otherwise, the anonymous subset form you use seems to be the briefest, although as @raiph points out you can drop the 'of Array' piece.
If you wanted to impose this sort of constraint on a function's parameter (rather than its return type) you could do so with something like:
sub fcn(@a where {all .map: * ~~ [Int, Int]}) {...}
As the other answers have mentioned, there currently isn't great syntax for similarly constraining the return type, but there's a proposal to add support for similar syntax for return types. In fact, as mentioned in that issue, someone has volunteered to work on an implementation but hasn't yet made any progress as far as I know. (And I guess I should know, since I was that volunteer… oops)
So, for now, a subset is the best option – but hopefully the future will have even better ways to write that.

What are the advantages of returning -1 instead of null in indexOf(...)?

When calling List.indexOf(...), what are the advantages of returning -1 rather than null if the value isn't present?
For example:
val list = listOf("a", "b", "c")
val index = list.indexOf("d")
print(index) // Prints -1
Wouldn't it be a cleaner result if index was null instead? If it had a nullable return type, then it would be compatible with the elvis operator ?: as well as with things such as index?.let { ... }.
What are the advantages of returning -1 instead of null when there are no matches?
Just speculation, but I can think of two reasons:
The first reason is to be compatible with Java and its List.indexOf
As the documentation states:
Returns:
the index of the first occurrence of the specified element in this list, or -1 if this list does not contain the element
The second reason is to have the same datatype as Kotlin's binarySearch.
Return the index of the element, if it is contained in the list within the specified range; otherwise, the inverted insertion point (-insertion point - 1). The insertion point is defined as the index at which the element should be inserted, so that the list (or the specified subrange of list) still remains sorted.
Here the negative values actually hold additional information about where to insert the element if it is absent. Since the normal indexOf method works on unsorted collections, you cannot infer the insertion position that way.
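For example:
val sorted = listOf(1, 3, 5)
val pos = sorted.binarySearch(4) // -3: 4 is not in the list
val insertionPoint = -(pos + 1)  // 2: inserting at index 2 keeps the list sorted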
To add to the definitive answer of @Burdui, another reason for such behavior is that the -1 return value can be expressed with the same primitive Int type as the other possible results of the indexOf function.
If indexOf returned null, it would require making its return type nullable, Int?, and that would cause a primitive return value being boxed into an object. indexOf is often used in a tight loop, for example, when searching for all occurrences of a substring in a string, and having boxing on that hot path could make the cost of using indexOf prohibitive.
On the other hand, there definitely can be situations where performance doesn't matter so much, and returning null from indexOf would make code more expressive. There's a request KT-8133 to introduce an indexOfOrNull extension for such situations.
Meanwhile, calling .takeIf { it >= 0 } on the result of indexOf is a workaround that achieves the same.
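For example, reusing the list from the question:
val list = listOf("a", "b", "c")
val index = list.indexOf("d").takeIf { it >= 0 }      // null instead of -1
println(index?.let { "found at $it" } ?: "not found") // prints "not found"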

What is the lambda function doing in the info_dict parameter of the summary_col in this code?

I'm running summary statistics for a group of standard OLS regressions. The code was written by my professor and I'm trying to figure out what's going on specifically in a portion of the code.
summary_col(
[reg0,reg1,reg2,reg3],
stars=True,
float_format='%0.2f',
info_dict = {
'N':lambda x: "{0:d}".format(int(x.nobs)),
'R2':lambda x: "{:.2f}".format(x.rsquared)
})
I looked up lambda functions. I have a fairly decent understanding of how they work. Aspects of the code that I do understand:
info_dict is a dictionary of values that can be called if you wish to include them in your summary statistics
lambda functions work by declaring an anonymous function with "lambda x", then a ":" followed by the operation you want to take place (e.g. x + 5), and if you already know what parameters you want it to run with, you can put them in a list after a second ":".
{0:d} will round to integers which makes perfect sense for observations. Although I don't know why you can't just say {%.f}. Maybe it's because the former returns an explicit int and the latter returns a float that looks like an int.
{:.2f} will return a float with 2 decimal places
What I don't fully understand is what somestring.format() does. Somehow x is getting defined as the results from the regression I believe and x.nobs is the variable "number of observations". Similar for x.rsquared.
Could someone fill in the gaps for me about what's going on in the formula? What exactly about the lambda function is enabling it to fetch data for each individual regression?
Let's break this out a little bit to make it obvious what is happening:
summary_col(
    [reg0, reg1, reg2, reg3],
    stars=True,
    float_format='%0.2f',
    info_dict={
        'N': lambda x: "{0:d}".format(int(x.nobs)),
        'R2': lambda x: "{:.2f}".format(x.rsquared)
    }
)
The summary_col object is taking in some input, the first argument being a list of regression objects, [reg0,reg1,reg2,reg3]. Then there are three named arguments, stars, float_format, and info_dict. When we pass in the list of regression objects as the first argument, I believe summary_col knows to apply each anonymous function to each of those objects. So all info_dict is doing is creating a dictionary with two keys, N and R2, which map to functions that produce strings. When the members x.nobs and x.rsquared are referenced in the lambda functions, they are resolved against the regression objects due to the context in which they are used.
If you try to use lambda in that line of code on something that does not exist in the regression objects, you'll almost certainly get an error. The key is in the context against which the lambda is applied.
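You can see this by calling one of the lambdas yourself; x is simply whichever regression result object gets passed in (reg0 here is one of the fitted results from the question, and the printed values are only illustrative):
n_formatter = lambda x: "{0:d}".format(int(x.nobs))
print(n_formatter(reg0))               # e.g. "100" -- x is bound to reg0 for this call
print("{:.2f}".format(reg0.rsquared))  # e.g. "0.87"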
A good example on the context of lambda functions is iterating over a dictionary and sorting by key and value.
# sort the dict by value first, and key second...
# x is inferred from the context (my_dict.items())
for key, value in sorted(my_dict.items(), key=lambda x: (x[1], x[0])):
    print(key, value)

Using a hash with object keys in Perl 6

I'm trying to make a Hash with non-string keys, in my case arrays or lists.
> my %sum := :{(1, 3, 5) => 9, (2, 4, 6) => 12}
{(1 3 5) => 9, (2 4 6) => 12}
Now, I don't understand the following.
How to retrieve an existing element?
> %sum{(1, 3, 5)}
((Any) (Any) (Any))
> %sum{1, 3, 5}
((Any) (Any) (Any))
How to add a new element?
> %sum{2, 4} = 6
(6 (Any))
Several things are going on here: first of all, if you use (1,2,3) as a key, Rakudo Perl 6 will consider this to be a slice of 3 keys: 1, 2 and 3. Since neither of these exist in the object hash, you get ((Any) (Any) (Any)).
So you need to indicate that you want the list to be seen as single key of which you want the value. You can do this with $(), so %sum{$(1,3,5)}. This however does not give you the intended result. The reason behind that is the following:
> say (1,2,3).WHICH eq (1,2,3).WHICH
False
Object hashes internally key the object to its .WHICH value. At the moment, Lists are not considered value types, so each List has a different .WHICH. Which makes them unfit to be used as keys in object hashes, or in other cases where they are used by default (e.g. .unique and Sets, Bags and Mixes).
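You can see the difference with a value type like Int; the exact number after List| is essentially a memory address and varies from run to run:
> say 42.WHICH
Int|42
> say (1, 2, 3).WHICH
List|140245134985984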
I'm actually working on making the above eq return True before long: this should make it to the 2018.01 compiler release, on which also a Rakudo Star release will be based.
BTW, any time you're using object hashes and integer values, you will probably be better off using Bags. Alas, not yet in this case either, for the above reason.
You could actually make this work by using augment class List and adding a .WHICH method on that, but I would recommend against that as it will interfere with any future fixes.
Elizabeth's answer is solid, but until that feature is created, I don't see why you can't create a Key class to use as the hash key, which will have an explicit hash function based on its values rather than its location in memory. This hash function, used for both placement in the hash and equality testing, is .WHICH. This function must return an ObjAt object, which is basically just a string.
class Key does Positional {
    has Int @.list handles <elems AT-POS EXISTS-POS ASSIGN-POS BIND-POS push>;
    method new(*@list) { self.bless(:@list); }
    method WHICH() { ObjAt.new(@!list.join('|')); }
}
my %hsh{Key};
%hsh{Key.new(1, 3)} = 'result';
say %hsh{Key.new(1, 3)}; # output: result
Note that I only allowed the key to contain Int. This is an easy way of being fairly confident no element's string value contains the '|' character, which could make two keys look the same despite having different elements. However, this is not hardened against naughty users: 4 but role :: { method Str() { '|' } } is an Int that stringifies to the illegal value. You can make the code stronger if you use .WHICH recursively, but I'll leave that as an exercise.
This Key class is also a little fancier than you strictly need. It would be enough to have a @.list member and define .WHICH. I defined AT-POS and friends so the Key can be indexed, pushed to, and otherwise treated as an Array.
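For instance, a quick illustration of that delegation:
my $k = Key.new(1, 3);
say $k.elems; # 2
say $k[1];    # 3, via the delegated AT-POS
$k.push(5);   # also delegated to @.list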

Generating Random String of Numbers and Letters Using Go's "testing/quick" Package

I've been breaking my head over this for a few days now and can't seem to be able to figure it out. Perhaps it's glaringly obvious, but I don't seem to be able to spot it. I've read up on all the basics of unicode, UTF-8, UTF-16, normalisation, etc, but to no avail. Hopefully somebody's able to help me out here...
I'm using Go's Value function from the testing/quick package to generate random values for the fields in my data structs, in order to implement the Generator interface for the structs in question. Specifically, given a Metadata struct, I've defined the implementation as follows:
func (m *Metadata) Generate(r *rand.Rand, size int) (value reflect.Value) {
    value = reflect.ValueOf(m).Elem()
    for i := 0; i < value.NumField(); i++ {
        if t, ok := quick.Value(value.Field(i).Type(), r); ok {
            value.Field(i).Set(t)
        }
    }
    return
}
Now, in doing so, I'll end up with both the receiver and the return value being set with random generated values of the appropriate type (strings, ints, etc. in the receiver and reflect.Value in the returned reflect.Value).
Now, the implementation for the Value function states that it will return something of type []rune converted to type string. As far as I know, this should allow me to then use the functions in the runes, unicode and norm packages to define a filter which filters out everything which is not part of 'Latin', 'Letter' or 'Number'. I defined the following filter which uses a transform to filter out letters which are not in those character rangetables (as defined in the unicode package):
func runefilter(in reflect.Value) (out reflect.Value) {
    out = in // Make sure you return something
    if in.Kind() == reflect.String {
        instr := in.String()
        t := transform.Chain(norm.NFD, runes.Remove(runes.NotIn(rangetable.Merge(unicode.Letter, unicode.Latin, unicode.Number))), norm.NFC)
        outstr, _, _ := transform.String(t, instr)
        out = reflect.ValueOf(outstr)
    }
    return
}
Now, I think I've tried just about anything, but I keep ending up with a series of strings which are far from the Latin range, e.g.:
𥗉똿穊
𢷽嚶
秓䝏小𪖹䮋
𪿝ท솲
𡉪䂾
ʋ𥅮ᦸ
堮𡹯憨𥗼𧵕ꥆ
𢝌𐑮𧍛併怃𥊇
鯮
𣏲𝐒
⓿ꐠ槹𬠂黟
𢼭踁퓺𪇖
俇𣄃𔘧
𢝶
𝖸쩈𤫐𢬿詢𬄙
𫱘𨆟𑊙
欓
So, can anybody explain what I'm overlooking here and how I could instead define a transformer which removes/replaces non-letter/number/latin characters so that I can use the Value function as intended (but with a smaller subset of 'random' characters)?
Thanks!
Confusingly, the Generator interface needs the method defined on the type itself, not on the pointer to the type. You want your type signature to look like
func (m Metadata) Generate(r *rand.Rand, size int) (value reflect.Value)
You can play with this here. Note: the most important thing to do in that playground is to switch the type of the generate function from m Metadata to m *Metadata and see that Hi Mom! never prints.
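Since the playground isn't reproduced here, a minimal self-contained sketch of the same point, using a hypothetical Point type: as far as I can tell, testing/quick looks up Generate on the zero value of the argument type, so a pointer-receiver method is invisible to it.
package main

import (
    "fmt"
    "math/rand"
    "reflect"
    "testing/quick"
)

type Point struct{ X, Y int }

// Value receiver: quick.Check finds and uses this generator. With a
// pointer receiver (func (p *Point) Generate(...)) it would silently
// fall back to the default generator instead.
func (p Point) Generate(r *rand.Rand, size int) reflect.Value {
    return reflect.ValueOf(Point{X: r.Intn(size + 1), Y: r.Intn(size + 1)})
}

func main() {
    property := func(p Point) bool { return p.X >= 0 && p.Y >= 0 }
    if err := quick.Check(property, nil); err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println("ok: every Point came from the custom Generate")
}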
In addition, I think you would be better served using your own type and writing a generate method for that type using a list of all of the characters you want to use. For example:
type LatinString string
const latin = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
and then use the generator
func (l LatinString) Generate(rand *rand.Rand, size int) reflect.Value {
    var buffer bytes.Buffer
    for i := 0; i < size; i++ {
        buffer.WriteString(string(latin[rand.Intn(len(latin))]))
    }
    s := LatinString(buffer.String())
    return reflect.ValueOf(s)
}
playground
Edit: also this library is pretty cool, thanks for showing it to me
The answer to my own question is, it seems, a combination of the answers provided in the comments by @nj_ and @jimb and the answer provided by @benjaminkadish.
In short, the answer boils down to:
"Not such a great idea as you thought it was", or "Bit of an ill-posed question"
"You were using the union of 'Letter', 'Latin' and 'Number' (Letter || Number || Latin), instead of the intersection of 'Latin' with the union of 'Letter' and 'Number' ((Letter || Number) && Latin))
Now for the longer version...
The idea behind me using the testing/quick package is that I wanted random data for (fuzzy) testing of my code. In the past, I've always written the code for doing things like that myself, again and again. This meant a lot of the same code across different projects. Now, I could of course have written my own package for it, but it turns out that, even better than that, there's actually a standard package which does just about exactly what I want.
Now, it turns out the package does exactly what I want very well. The codepoints in the strings which it generates are actually random and not just restricted to what we're accustomed to using in everyday life. Now, this is of course exactly the thing which you want in doing fuzzy testing in order to test the code with values outside the usual assumptions.
In practice, that means I'm running into two problems:
There are some limits on what I would consider reasonable input for a string. Meaning that, in testing the processing of a Name field or a URL field, I can reasonably assume there's not going to be a value like 'James Mc⌢' (let alone 'James Mc🙁') or 'www.🕸site.com', but just 'James McFrown' and 'www.website.com'. Hence, I can't expect a reasonable system to be able to support it. Of course, things shouldn't completely break down, but it also can't be expected to handle the former examples without any problems.
When I filter the generated string for values which one might consider reasonable, the chance of ending up with a valid string is very small. The set of possible characters used by testing/quick is just so large (0x10FFFF) and the set of reasonable characters so small, you end up with empty strings most of the time.
So, what do we need to take away from this?
So, whilst I hoped to use the standard testing/quick package to replace my often repeated code to generate random data for fuzzy testing, it does this so well that it provides data outside the range of what I would consider reasonable for the code to be able to handle. It seems that the choice, in the end, is to:
Either be able to actually handle all fuzzy options, meaning that if somebody's name is 'Arnold 💰💰' ('Arnold Moneybags'), it shouldn't go arse over end. Or...
Use custom/derived types with their own Generator. This means you're going to have to use the derived type instead of the basic type throughout the code. (Comparable to defining a string as wchar_t instead of char in C++ and working with those by default.). Or...
Don't use testing/quick for fuzzy testing, because as soon as you ask for a generated string value, you can (and should) expect a very random string.
As always, further comments are of course welcome, as it's quite possible I overlooked something.