Unsinking sunk calls via CALL-ME - raku

While trying to come up with an example of maps in sunk context, I bumped into this code:
my $a = -> { 42 };
my $b = -> { "foo" };
$a;
$a();
($a,$b).map: { $_ };
The first call to $a by itself returns:
WARNINGS for /home/jmerelo/Code/raku/my-raku-examples/sunk-map.p6:
Useless use of $a in sink context (line 6)
However, putting .() or () behind, or using them in (what I though it was) sink context in a map didn't result in any warning. Probably it's not sink context, but I would like to know why.

$a;
$a();
($a,$b).foo ...
The first call to $a by itself
That's not a "call". That's just a mention of it aka "use". (Note: this, er, use of "use" has nothing to do with the keyword use in a use statement.)
WARNINGS for /home/jmerelo/Code/raku/my-raku-examples/sunk-map.p6:
Useless use of $a in sink context (line 6)
Right. Just mentioning it like that is useless. You don't do anything with the value of $a (which has been assigned a function). You just drop it on the floor.
However, putting .() or () behind, or using them in (what I though it was) sink context in a map didn't result in any warning.
Right. Those cause .CALL-ME to be invoked on their left hand side. Those are understood to be potentially useful regardless of whether any value returned by the call is used by the code. So there's no warning despite being in sink context (though perhaps sink context is itself context dependent and at the compiler level they aren't even considered to be in sink context; I don't know).
cf my comment about a similar situation in perl.

Related

Alter how arguments are processed before they're passed to sub MAIN

Given the documentation and the comments on an earlier question, by request I've made a minimal reproducible example that demonstrates a difference between these two statements:
my %*SUB-MAIN-OPTS = :named-anywhere;
PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True;
Given a script file with only this:
#!/usr/bin/env raku
use MyApp::Tools::CLI;
and a module file in MyApp/Tools called CLI.pm6:
#PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True;
my %*SUB-MAIN-OPTS = :named-anywhere;
proto MAIN(|) is export {*}
multi MAIN( 'add', :h( :$hostnames ) ) {
for #$hostnames -> $host {
say $host;
}
}
multi MAIN( 'remove', *#hostnames ) {
for #hostnames -> $host {
say $host;
}
}
The following invocation from the command line will not result in a recognized subroutine, but show the usage:
mre.raku add -h=localhost -h=test1
Switching my %*SUB-MAIN-OPTS = :named-anywhere; for PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True; will print two lines with the two hostnames provided, as expected.
If however, this is done in a single file as below, both work identical:
#!/usr/bin/env raku
#PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True;
my %*SUB-MAIN-OPTS = :named-anywhere;
proto MAIN(|) is export {*}
multi MAIN( 'add', :h( :$hostnames )) {
for #$hostnames -> $host {
say $host;
}
}
multi MAIN( 'remove', *#hostnames ) {
for #hostnames -> $host {
say $host;
}
}
I find this hard to understand.
When reproducing this, be alert of how each command must be called.
mre.raku remove localhost test1
mre.raku add -h=localhost -h=test1
So a named array-reference is not recognized when this is used in a separate file with my %*SUB-MAIN-OPTS = :named-anywhere;. While PROCESS::<%SUB-MAIN-OPTS><named-anywhere> = True; always works. And for a slurpy array, both work identical in both cases.
The problem is that it isn't the same variable in both the script and in the module.
Sure they have the same name, but that doesn't mean much.
my \A = anon class Foo {}
my \B = anon class Foo {}
A ~~ B; # False
B ~~ A; # False
A === B; # False
Those two classes have the same name, but are separate entities.
If you look at the code for other built-in dynamic variables, you see something like:
Rakudo::Internals.REGISTER-DYNAMIC: '$*EXECUTABLE-NAME', {
PROCESS::<$EXECUTABLE-NAME> := $*EXECUTABLE.basename;
}
This makes sure that the variable is installed into the right place so that it works for every compilation unit.
If you look for %*SUB-MAIN-OPTS, the only thing you find is this line:
my %sub-main-opts := %*SUB-MAIN-OPTS // {};
That looks for the variable in the main compilation unit. If it isn't found it creates and uses an empty Hash.
So when you try do it in a scope other than the main compilation unit, it isn't in a place where it could be found by that line.
To test if adding that fixes the issue, you can add this to the top of the main compilation unit. (The script that loads the module.)
BEGIN Rakudo::Internals.REGISTER-DYNAMIC: '%*SUB-MAIN-OPTS', {
PROCESS::<%SUB-MAIN-OPTS> := {}
}
Then in the module, write this:
%*SUB-MAIN-OPTS = :named-anywhere;
Or better yet this:
%*SUB-MAIN-OPTS<named-anywhere> = True;
After trying this, it seems to work just fine.
The thing is, that something like that used to be there.
It was removed on the thought that it slows down every Raku program.
Though I think that any slowdown it caused would still be an issue as the line that is still there has to look to see if there is a dynamic variable of that name.
(There are more reasons given, and I frankly disagree with all of them.)
May a cuppa bring enlightenment to future SO readers pondering the meaning of things.[1]
Related answers by Liz
I think Liz's answer to an SO asking a similar question may be a good read for a basic explanation of why a my (which is like a lesser our) in the mainline of a module doesn't work, or at least confirmation that core devs know about it.
Her later answer to another SO explains how one can use my by putting it inside a RUN-MAIN.
Why does a slurpy array work by default but not named anywhere?
One rich resource on why things are the way they are is the section Declaring a MAIN subroutine of S06 (Synopsis on Subroutines)[2].
A key excerpt:
As usual, switches are assumed to be first, and everything after the first non-switch, or any switches after a --, are treated as positionals or go into the slurpy array (even if they look like switches).
So it looks like this is where the default behavior, in which nameds can't go anywhere, comes from; it seems that #Larry[3] was claiming that the "usual" shell convention was as described, and implicitly arguing that this should dictate that the default behavior was as it is.
Since Raku was officially released RFC: Allow subcommands in MAIN put us on the path to todays' :named-anywhere option. The RFC presented a very powerful 1-2 punch -- an unimpeachable two line hackers' prose/data argument that quickly led to rough consensus, with a working code PR with this commit message:
Allow --named-switches anywhere in command line.
Raku was GNU-like in that it has '--double-dashes' and that it stops interpreting named parameters when it encounters '--', but unlike GNU-like parsing, it also stopped interpreting named parameters when encountering any positional argument. This patch makes it a bit more GNU-like by allowing named arguments after a positional, to prepare for allowing subcommands.
> Alter how arguments are processed before they're passed to sub MAIN
In the above linked section of S06 #Larry also wrote:
Ordinarily a top-level Raku "script" just evaluates its anonymous mainline code and exits. During the mainline code, the program's arguments are available in raw form from the #*ARGS array.
The point here being that you can preprocess #*ARGS before they're passed to MAIN.
Continuing:
At the end of the mainline code, however, a MAIN subroutine will be called with whatever command-line arguments remain in #*ARGS.
Note that, as explained by Liz, Raku now has a RUN-MAIN routine that's called prior to calling MAIN.
Then comes the standard argument processing (alterable by using standard options, of which there's currently only the :named-anywhere one, or userland modules such as SuperMAIN which add in various other features).
And finally #Larry notes that:
Other [command line parsing] policies may easily be introduced by calling MAIN explicitly. For instance, you can parse your arguments with a grammar and pass the resulting Match object as a Capture to MAIN.
A doc fix?
Yesterday you wrote a comment suggesting a doc fix.
I now see that we (collectively) know about the coding issue. So why is the doc as it is? I think the combination of your SO and the prior ones provide enough anecdata to support at least considering filing a doc issue to the contrary. Then again Liz hints in one of the SO's that a fix might be coming, at least for ours. And SO is itself arguably doc. So maybe it's better to wait? I'll punt and let you decide. At least you now have several SOs to quote if you decide to file a doc issue.
Footnotes
[1] I want to be clear that if anyone perceives any fault associated with posting this SO then they're right, and the fault is entirely mine. I mentioned to #acw that I'd already done a search so they could quite reasonably have concluded there was no point in them doing one as well. So, mea culpa, bad coffee inspired puns included. (Bad puns, not bad coffee.)
[2] Imo these old historical speculative design docs are worth reading and rereading as you get to know Raku, despite them being obsolete in parts.
[3] #Larry emerged in Raku culture as a fun and convenient shorthand for Larry Wall et al, the Raku language team led by Larry.

Mixing-in roles in traits apparently not working

This example is taken from roast, although it's been there for 8 years:
role doc { has $.doc is rw }
multi trait_mod:<is>(Variable $a, :$docced!) {
$a does doc.new(doc => $docced);
}
my $dog is docced('barks');
say $dog.VAR;
This returns Any, without any kind of role mixed in. There's apparently no way to get to the "doc" part, although the trait does not error. Any idea?
(This answer builds on #guifa's answer and JJ's comment.)
The idiom to use in variable traits is essentially $var.var.VAR.
While that sounds fun when said aloud it also seems crazy. It isn't, but it demands explanation at the very least and perhaps some sort of cognitive/syntactic relief.
Here's the brief version of how to make some sense of it:
$var makes sense as the name of the trait parameter because it's bound to a Variable, a compiler's-eye view of a variable.
.var is needed to access the user's-eye view of a variable given the compiler's-eye view.
If the variable is a Scalar then a .VAR is needed as well to get the variable rather than the value it contains. (It does no harm if it isn't a Scalar.)
Some relief?
I'll explain the above in more detail in a mo, but first, what about some relief?
Perhaps we could introduce a new Variable method that does .var.VAR. But imo this would be a mistake unless the name for the method is so good it essentially eliminates the need for the $var.var.VAR incantation explanation that follows in the next section of this answer.
But I doubt such a name exists. Every name I've come up with makes matters worse in some way. And even if we came up with the perfect name, it would still barely be worth it at best.
I was struck by the complexity of your original example. There's an is trait that calls a does trait. So perhaps there's call for a routine that abstracts both that complexity and the $var.var.VAR. But there are existing ways to reduce that double trait complexity anyway, eg:
role doc[$doc] { has $.doc is rw = $doc}
my $dog does doc['barks'];
say $dog.doc; # barks
A longer explanation of $var.var.VAR
But $v is already a variable. Why so many var and VARs?
Indeed. $v is bound to an instance of the Variable class. Isn't that enough?
No, because a Variable:
Is for storing metadata about a variable while it's being compiled. (Perhaps it should have been called Metadata-About-A-Variable-Being-Compiled? Just kidding. Variable looks nice in trait signatures and changing its name wouldn't stop us needing to use and explain the $var.var.VAR idiom anyway.)
Is not the droid we are looking for. We want a user's-eye view of the variable. One that's been declared and compiled and is then being used as part of user code. (For example, $dog in the line say $dog.... Even if it were BEGIN say $dog..., so it ran at compile-time, $dog would still refer to a symbol that's bound to a user's-eye view container or value. It would not refer to the Variable instance that's only the compiler's-eye view of data related to the variable.)
Makes life easier for the compiler and those writing traits. But it requires that a trait writer accesses the user's-eye view of the variable to access or alter the user's-eye view. The .var attribute of the Variable stores that user's-eye view. (I note the roast test has a .container attribute that you omitted. That's clearly now been renamed .var. My guess is that that's because a variable may be bound to an immutable value rather than a container so the name .container was considered misleading.)
So, how do we arrive at $var.var.VAR?
Let's start with a variant of your original code and then move forward. I'll switch from $dog to #dog and drop the .VAR from the say line:
multi trait_mod:<is>(Variable $a, :$docced!) {
$a does role { has $.doc = $docced }
}
my #dog is docced('barks');
say #dog.doc; # No such method 'doc' for invocant of type 'Array'
This almost works. One tiny change and it works:
multi trait_mod:<is>(Variable $a, :$docced!) {
$a.var does role { has $.doc = $docced }
}
my #dog is docced('barks');
say #dog.doc; # barks
All I've done is add a .var to the ... does role ... line. In your original, that line is modifying the compiler's-eye view of the variable, i.e. the Variable object bound to $a. It doesn't modify the user's-eye view of the variable, i.e. the Array bound to #dog.
As far as I know everything now works correctly for plural containers like arrays and hashes:
#dog[1] = 42;
say #dog; # [(Any) 42]
say #dog.doc; # barks
But when we try it with a Scalar variable:
my $dog is docced('barks');
we get:
Cannot use 'does' operator on a type object Any.
This is because the .var returns whatever it is that the user's-eye view variable usually returns. With an Array you get the Array. But with a Scalar you get the value the Scalar contains. (This is a fundamental aspect of P6. It works great but you have to know it in these sorts of scenarios.)
So to get this to appear to work again we have to add a couple .VAR's as well. For anything other than a Scalar a .VAR is a "no op" so it does no harm to cases other than a Scalar to add it:
multi trait_mod:<is>(Variable $a, :$docced!) {
$a.var.VAR does role { has $.doc = $docced }
}
And now the Scalar case also appears to work:
my $dog is docced('barks');
say $dog.VAR.doc; # barks
(I've had to reintroduce the .VAR in the say line for the same reason I had to add it to the $a.var.VAR ... line.)
If all were well that would be the end of this answer.
A bug
But something is broken. If we'd attempted to initialize the Scalar variable:
my $dog is docced('barks') = 42;
we'd see:
Cannot assign to an immutable value
As #guifa noted, and I stumbled on a while back:
It seems that a Scalar with a mixin no longer successfully functions as a container and the assignment fails. This currently looks to me like a bug.
Not a satisfactory answer but maybe you can progress from it
role doc {
has $.doc is rw;
}
multi trait_mod:<is>(Variable:D $v, :$docced!) {
$v.var.VAR does doc;
$v.var.VAR.doc = $docced;
}
say $dog; # ↪︎ Scalar+{doc}.new(doc => "barks")
say $dog.doc;  # ↪︎ barks
$dog.doc = 'woofs'; #
say $dog; # ↪︎ Scalar+{doc}.new(doc => "woofs")
Unfortunately, there is something off with this, and applying the trait seems to cause the variable to become immutable.

Specifying lifetimes in nested iterators for flattening

I'm writing a function that should take a couple of vectors and produce their Cartesian product (all of their pair combinations) as a vector in row-major order. In other words, if I have
let x_coords = vec![1, 2, 3];
let y_coords = vec![4, 5, 6];
I want to produce
vec![ [1,4], [1,5], [1,6], [2,4], [2,5], [2,6], [3,4], [3,5], [3,6] ]
This seemed like a perfect job for .flat_map():
fn main() {
let x_coords = vec![1, 2, 3];
let y_coords = vec![4, 5, 6];
let coord_vec: Vec<[isize; 2]> =
x_coords.iter().map(|&x_coord| {
y_coords.iter().map(|&y_coord| {
[x_coord, y_coord]
})
}).flat_map(|s| s).collect();
// Expecting to get: vec![ [1,4], [1,5], [1,6], [2,4], [2,5], [2,6], [3,4], [3,5], [3,6] ]
println!("{:?}", &coord_vec);
}
But this doesn't work, because &x_coord doesn't live long enough. According to the compiler, it ends up inside the y_coords map and then never makes it back out.
I tried using .clone() and move in the closures, but got a strangely long and unclear lecture by the compiler in the form of multiple Note: lines.
Am I just completely off base with flat_map, or can this be saved?
Solution
You were really close! This works:
let coord_vec: Vec<_> = x_coords.iter()
.flat_map(|&x_coord| {
y_coords.iter().map(move |&y_coord| {
// ^^^^
[x_coord, y_coord]
})
})
.collect();
The only thing I added is the move keyword in front of the inner closure. Why? Let's try to understand what the compiler is thinking below!
A few notes, though:
I renamed map to flat_map and removed the second flat_map call... You made your life more complicated than it has to be ;-)
I omitted part of the type annotation of coord_vec, because it's not necessary
Explanation
The type x_coord is i32 (or any other integer). Not a reference or anything but the value directly. This means that x_coord is owned by the enclosing function, which happens to be a closure, specifically the "outer" closure, that you pass to flat_map. Thus x_coord lives just inside the closure, not longer. That's what the compiler is telling you. So far so good.
When you define the second closure (the "inner" one), you access the environment, specifically x_coord. The important question now is: how does a closure access its environment? It can do so with an immutable reference, with a mutable reference and by value. The Rust compiler determines what kind of access to the environment is needed and chooses the "least intrusive" option that works. Let's look at your code: the compiler figures out that the closure only needs to borrow the environment immutably (because i32 is Copy and thus the closure can transform its &i32 into a i32 easily).
But in this case, the compiler's reasoning is wrong! The closure borrowing its environment leads to a limited lifetime. And in this case we need the closure to live for longer than it can, hence the error.
By adding move, we force the compiler to pass the environment to the closure by value (transferring ownership). This way the closure borrows nothing and can live forever (satisfies 'static).

Alternative to the try (?) operator suited to iterator mapping

In the process of learning Rust, I am getting acquainted with error propagation and the choice between unwrap and the ? operator. After writing some prototype code that only uses unwrap(), I would like to remove unwrap from reusable parts, where panicking on every error is inappropriate.
How would one avoid the use of unwrap in a closure, like in this example?
// todo is VecDeque<PathBuf>
let dir = fs::read_dir(&filename).unwrap();
todo.extend(dir.map(|dirent| dirent.unwrap().path()));
The first unwrap can be easily changed to ?, as long as the containing function returns Result<(), io::Error> or similar. However, the second unwrap, the one in dirent.unwrap().path(), cannot be changed to dirent?.path() because the closure must return a PathBuf, not a Result<PathBuf, io::Error>.
One option is to change extend to an explicit loop:
let dir = fs::read_dir(&filename)?;
for dirent in dir {
todo.push_back(dirent?.path());
}
But that feels wrong - the original extend was elegant and clearly reflected the intention of the code. (It might also have been more efficient than a sequence of push_backs.) How would an experienced Rust developer express error checking in such code?
How would one avoid the use of unwrap in a closure, like in this example?
Well, it really depends on what you wish to do upon failure.
should failure be reported to the user or be silent
if reported, should one failure be reported or all?
if a failure occur, should it interrupt processing?
For example, you could perfectly decide to silently ignore all failures and just skip the entries that fail. In this case, the Iterator::filter_map combined with Result::ok is exactly what you are asking for.
let dir = fs::read_dir(&filename)?;
let todos.extend(dir.filter_map(Result::ok));
The Iterator interface is full of goodies, it's definitely worth perusing when looking for tidier code.
Here is a solution based on filter_map suggested by Matthieu. It calls Result::map_err to ensure the error is "caught" and logged, sending it further to Result::ok and filter_map to remove it from iteration:
fn log_error(e: io::Error) {
eprintln!("{}", e);
}
(|| {
let dir = fs::read_dir(&filename)?;
todo.extend(dir
.filter_map(|res| res.map_err(log_error).ok()))
.map(|dirent| dirent.path()));
})().unwrap_or_else(log_error)

How to do error handling in Rust and what are the common pitfalls?

I noticed that Rust does not have exceptions. How to do error handling in Rust and what are the common pitfalls? Are there ways to control flow with raise, catch, reraise and other stuff? I found inconsistent information on this.
Rust generally solves errors in two ways:
Unrecoverable errors. Once you panic!, that's it. Your program or thread aborts because it encounters something it can't solve and its invariants have been violated. E.g. if you find invalid sequences in what should be a UTF-8 string.
Recoverable errors. Also called failures in some documentation. Instead of panicking, you emit a Option<T> or Result<T, E>. In these cases, you have a choice between a valid value Some(T)/Ok(T) respectively or an invalid value None/Error(E). Generally None serves as a null replacement, showing that the value is missing.
Now comes the hard part. Application.
Unwrap
Sometimes dealing with an Option is a pain in the neck, and you are almost guaranteed to get a value and not an error.
In those cases it's perfectly fine to use unwrap. unwrap turns Some(e) and Ok(e) into e, otherwise it panics. Unwrap is a tool to turn your recoverable errors into unrecoverable.
if x.is_some() {
y = x.unwrap(); // perfectly safe, you just checked x is Some
}
Inside the if-block it's perfectly fine to unwrap since it should never panic because we've already checked that it is Some with x.is_some().
If you're writing a library, using unwrap is discouraged because when it panics the user cannot handle the error. Additionally, a future update may change the invariant. Imagine if the example above had if x.is_some() || always_return_true(). The invariant would changed, and unwrap could panic.
? operator / try! macro
What's the ? operator or the try! macro? A short explanation is that it either returns the value inside an Ok() or prematurely returns error.
Here is a simplified definition of what the operator or macro expand to:
macro_rules! try {
($e:expr) => (match $e {
Ok(val) => val,
Err(err) => return Err(err),
});
}
If you use it like this:
let x = File::create("my_file.txt")?;
let x = try!(File::create("my_file.txt"));
It will convert it into this:
let x = match File::create("my_file.txt") {
Ok(val) => val,
Err(err) => return Err(err),
};
The downside is that your functions now return Result.
Combinators
Option and Result have some convenience methods that allow chaining and dealing with errors in an understandable manner. Methods like and, and_then, or, or_else, ok_or, map_err, etc.
For example, you could have a default value in case your value is botched.
let x: Option<i32> = None;
let guaranteed_value = x.or(Some(3)); //it's Some(3)
Or if you want to turn your Option into a Result.
let x = Some("foo");
assert_eq!(x.ok_or("No value found"), Ok("foo"));
let x: Option<&str> = None;
assert_eq!(x.ok_or("No value found"), Err("No value found"));
This is just a brief skim of things you can do. For more explanation, check out:
http://blog.burntsushi.net/rust-error-handling/
https://doc.rust-lang.org/book/ch09-00-error-handling.html
http://lucumr.pocoo.org/2014/10/16/on-error-handling/
If you need to terminate some independent execution unit (a web request, a video frame processing, a GUI event, a source file to compile) but not all your application in completeness, there is a function std::panic::catch_unwind that invokes a closure, capturing the cause of an unwinding panic if one occurs.
let result = panic::catch_unwind(|| {
panic!("oh no!");
});
assert!(result.is_err());
I would not grant this closure write access to any variables that could outlive it, or any other otherwise global state.
The documentation also says the function also may not be able to catch some kinds of panic.