Why does it look like I can consume a value from an immutable BTreeMap? - iterator

I'm doing an Exercism exercise wherein I need to invert a map of scrabble scores; map of scores => letters becomes map of letters => scores. I came up with the following Rust code:
use std::collections::BTreeMap;
pub fn transform(h: &BTreeMap<i32, Vec<char>>) -> BTreeMap<char, i32> {
h.iter()
.flat_map(|(&score, letters)| {
letters
.into_iter()
.map(move |l| (l.to_ascii_lowercase(), score))
})
.collect()
}
This works, but afterwards I realized it shouldn't. h is an immutable reference. h.iter() iterates over references to its entries, which should have the type (&i32, &Vec<char>), but then I'm taking the Vec and consuming it with into_iter(). Rust does not complain, despite it being an immutable reference.
What am I missing here?

After typing out the question, I figured out the answer. IntoIterator is implemented for Vec, but also for &Vec and &mut Vec. Rather than consuming the Vec, it consumes the reference and iterates over references to its elements. Very simple behavior, but I managed to do it in the most confusing way possible.

Related

Why does `EffectScope.shift` need the type parameter `B`?

The move to the new continuations API in Arrow brought with it a handy new function: shift, in theory letting me get rid of ensure(false) { NewError() } or NewError().left().bind() constructs.
But I'm not sure how to properly use it. The documentation states that it is intended to short-circuit the continuation, and there are no conditionals, so it should always take the parameter, and (in either parlance) "make it a left value", and exit the scope.
So what is the type parameter B intended to be used for? It determines the return type of shift, but shift will not return. Given no more context, B can not be inferred, leading to this kind of code:
val res = either {
val intermediate = mayReturnNull()
if (intermediate == null) {
shift<Nothing>(IntermediateWasNull())
}
process(intermediate)
}
Note the <Nothing> (and ignore the contrived example, the main point is that shifts return type can not be inferred – the actual type parameter does not even matter).
I could wrap shift like this:
suspend fun <L> EffectScope<L>.fail(left: L): Nothing = shift(left)
But I feel like that is missing the point. Any explanations/hints would be greatly appreciated.
That is a great question!
This is more a matter of style, ideally we'd have both but they conflict so we cannot have both APIs available.
So shift always returns Nothing in its implementation, and so the B parameter is completely artificial.
This is something that is true for a lot of other things in Kotlin, such as object EmptyList : List<Nothing>. The Kotlin Std however exposes it as fun <A> emptyList(): List<A> = EmptyList.
For Arrow to stay consistent with APIs found in Kotlin Std, and to remain as Kotlin idiomatic as possible we also require a type argument just like emptyList. This has been up for discussion multiple times, and the Kotlin languages authors have stated that it was decided too explicitly require A for emptyList since that results in the best and most consistent ergonomics in Kotlin.
In the example you shared I would however recommend using ensureNotNull which will also smart-cast intermediate to non-null.
Arrow attempts to build the DSL so that you don't need to rely on shift in most cases, and you should prefer ensure and ensureNotNull when possible.
val res = either {
val intermediate = mayReturnNull()
ensureNotNull(intermediate) { IntermediateWasNull() }
process(intermediate) // <-- smart casted to non-null
}

Specifying lifetimes in nested iterators for flattening

I'm writing a function that should take a couple of vectors and produce their Cartesian product (all of their pair combinations) as a vector in row-major order. In other words, if I have
let x_coords = vec![1, 2, 3];
let y_coords = vec![4, 5, 6];
I want to produce
vec![ [1,4], [1,5], [1,6], [2,4], [2,5], [2,6], [3,4], [3,5], [3,6] ]
This seemed like a perfect job for .flat_map():
fn main() {
let x_coords = vec![1, 2, 3];
let y_coords = vec![4, 5, 6];
let coord_vec: Vec<[isize; 2]> =
x_coords.iter().map(|&x_coord| {
y_coords.iter().map(|&y_coord| {
[x_coord, y_coord]
})
}).flat_map(|s| s).collect();
// Expecting to get: vec![ [1,4], [1,5], [1,6], [2,4], [2,5], [2,6], [3,4], [3,5], [3,6] ]
println!("{:?}", &coord_vec);
}
But this doesn't work, because &x_coord doesn't live long enough. According to the compiler, it ends up inside the y_coords map and then never makes it back out.
I tried using .clone() and move in the closures, but got a strangely long and unclear lecture by the compiler in the form of multiple Note: lines.
Am I just completely off base with flat_map, or can this be saved?
Solution
You were really close! This works:
let coord_vec: Vec<_> = x_coords.iter()
.flat_map(|&x_coord| {
y_coords.iter().map(move |&y_coord| {
// ^^^^
[x_coord, y_coord]
})
})
.collect();
The only thing I added is the move keyword in front of the inner closure. Why? Let's try to understand what the compiler is thinking below!
A few notes, though:
I renamed map to flat_map and removed the second flat_map call... You made your life more complicated than it has to be ;-)
I omitted part of the type annotation of coord_vec, because it's not necessary
Explanation
The type x_coord is i32 (or any other integer). Not a reference or anything but the value directly. This means that x_coord is owned by the enclosing function, which happens to be a closure, specifically the "outer" closure, that you pass to flat_map. Thus x_coord lives just inside the closure, not longer. That's what the compiler is telling you. So far so good.
When you define the second closure (the "inner" one), you access the environment, specifically x_coord. The important question now is: how does a closure access its environment? It can do so with an immutable reference, with a mutable reference and by value. The Rust compiler determines what kind of access to the environment is needed and chooses the "least intrusive" option that works. Let's look at your code: the compiler figures out that the closure only needs to borrow the environment immutably (because i32 is Copy and thus the closure can transform its &i32 into a i32 easily).
But in this case, the compiler's reasoning is wrong! The closure borrowing its environment leads to a limited lifetime. And in this case we need the closure to live for longer than it can, hence the error.
By adding move, we force the compiler to pass the environment to the closure by value (transferring ownership). This way the closure borrows nothing and can live forever (satisfies 'static).

How does slice indexing work in Rust?

I wondered what slices were in Rust. As it turned out, it's just a struct with a data pointer and the size inside. I've looked at the source for indexing and I've found this:
impl<T> ops::Index<usize> for [T] {
type Output = T;
fn index(&self, index: usize) -> &T {
// NB built-in indexing
&(*self)[index]
}
}
I'm not a Rust expert but &(*self) seems like a pointer for me and there is no pointer indexing in Rust as far as I know. So how does this indexing thing work? Is it just a compiler built-in thing?
Is it just a compiler built-in thing?
Yes. The source code comment also says that same. [T] is an unsized type and needs some extra rules anyway. For example, unsized types can't live on the stack (are pretty difficult to handle). But references to unsized types consist of a pointer and a size (specifically "something that completes the type").
Note, however, that the expression is evaluated like this: &((*self)[index]). This means that self (type &[T]) is dereferenced to be type [T] and then indexed. This returns a T, but we only want a reference to it, thus the &.

How to mutate another item in a vector, but not the vector itself, while iterating over the vector?

It is quite clear to me that iterating over a vector shouldn't let the loop body mutate the vector arbitrarily. This prevents iterator invalidation, which is prone to bugs.
However, not all kinds of mutation lead to iterator invalidation. See the following example:
let mut my_vec: Vec<Vec<i32>> = vec![vec![1,2], vec![3,4], vec![5,6]];
for inner in my_vec.iter_mut() { // <- or .iter()
// ...
my_vec[some_index].push(inner[0]); // <-- ERROR
}
Such a mutation does not invalidate the iterator of my_vec, however it is disallowed. It could invalidate any references to the specific elements in my_vec[some_index] but we do not use any such references anyway.
I know that these questions are common, and I'm not asking for an explanation. I am looking for a way to refactor this so that I can get rid of this loop. In my actual code I have a huge loop body and I can't modularize it unless I express this bit nicely.
What I have thought of so far:
Wrapping the vector with Rc<RefCell<...>>. I think this would still fail at runtime, since the RefCell would be borrowed by the iterator and then will fail when the loop body tries to borrow it.
Using a temporary vector to accumulate the future pushes, and push them after the loop ends. This is okay, but needs more allocations than pushing them on the fly.
Unsafe code, and messing with pointers.
Anything listed in the Iterator documentation does not help. I checked out itertools and it looks like it wouldn't help either.
Using a while loop and indexing instead of using an iterator making use of a reference to the outer vector. This is okay, but does not let me use iterators and adapters. I just want to get rid of this outer loop and use my_vec.foreach(...).
Are there any idioms or any libraries which would let me do this nicely Unsafe functions would be okay as long as they don't expose pointers to me.
You can wrap each of the inner vectors in a RefCell.
use std::cell::RefCell;
fn main() {
let my_vec : Vec<RefCell<Vec<i32>>> = vec![
RefCell::new(vec![1,2]),
RefCell::new(vec![3,4]),
RefCell::new(vec![5,6])];
for inner in my_vec.iter() {
// ...
let value = inner.borrow()[0];
my_vec[some_index].borrow_mut().push(value);
}
}
Note that the value binding here is important if you need to be able to push to the vector that inner refers to. value happens to be a type that doesn't contain references (it's i32), so it doesn't keep the first borrow active (it ends by the end of the statement). Then, the next statement may borrow the same vector or another vector mutably and it'll work.
If we wrote my_vec[some_index].borrow_mut().push(inner.borrow()[0]); instead, then both borrows would be active until the end of the statement. If both my_vec[some_index] and inner refer to the same RefCell<Vec<i32>>, this will panic with RefCell<T> already mutably borrowed.
Without changing the type of my_vec, you could simply use access by indexing and split_at_mut:
for index in 0..my_vec.len() {
let (first, second) = my_vec.split_at_mut(index);
first[some_index].push(second[0]);
}
Note: beware, the indices in second are off by index.
This is safe, relatively easy, and very flexible. It does not, however, work with iterator adaptors.

Non-destructively iterating over a Rust collection, but not by reference

I can write the following two ways, the second is inspired by What is the idiomatic way to create a collection of references to methods that take self?:
channels.iter().flat_map(|c|c.to_uppercase()).collect(),
channels.clone().into_iter().flat_map(char::to_uppercase).collect(),
The second line has to clone the collection because char::to_uppercase doesn't accept a reference as it's argument and .iter() provides references and .into_iter() moves the collection.
Is there a way to do this that doesn't need to clone the collection or create a closure? I don't hate closures, I promise, and I know they're just turned into (usually inline) function calls in LLVM anyway, but I like the cleanness of referring to a function like in the second line and would prefer to use it if it can be done without the clone.
Iterator has a cloned method which is equivalent to .map(|x| x.clone()) which, in case of Copy types is equivalent to .map(|&x| x). This way you can write
channels.iter().cloned().flat_map(char::to_uppercase).collect()
You can define a function that takes a reference. You can even put it inside another function, if you want to keep it close to its usage.
fn foobar() {
fn to_uppercase(c: &char) -> ::std::char::ToUppercase {
c.to_uppercase()
}
// [...]
let channels_upper = channels.iter().flat_map(to_uppercase).collect();
}