Pass an iterator of structs to a function accepting references in Rust?

I have a function that takes an iterator over references to structs. Sometimes I'm iterating over a vector, which works fine, but sometimes I create an iterator that produces new structs, and I'm having trouble figuring that one out. I get that when I create a value in a closure, it goes away when the closure does. Rust is always trying to move values out of things when I don't want it to; why doesn't it here?
struct Thing {
    value: u32,
}
fn consume<'a, I: IntoIterator<Item = &'a Thing>>(things: I) {
    for thing in things {
        println!("{}", thing.value);
    }
}
fn main() {
    let solid = vec![Thing { value: 0 }];
    let ephemeral = (1..5).map(|i| &Thing { value: i }); // Boxing Thing does not work either
    consume(solid.iter());
    consume(ephemeral);
}
But this fails to compile:
error[E0515]: cannot return reference to temporary value
  --> src/main.rs:13:36
   |
13 |     let ephemeral = (1..5).map(|i| &Thing { value: i }); // Boxing Thing does not work either
   |                                    ^------------------
   |                                    ||
   |                                    |temporary value created here
   |                                    returns a reference to data owned by the current function
I have the sense I either need to move the struct out of the closure and iterator, or store it somewhere. But Boxing the struct doesn't work and returning a struct rather than a pointer doesn't type check (and I can't find the opposite of .cloned()). What's the approach here?

Short answer: you can't.
Longer explanation:
Here is "an iterator that produces new structs":
let iterator_of_structs = (1..5).map(|value| Thing { value });
The main trick to figuring this out is to always ask "who owns the data?".
Each time we call next, the closure takes ownership of an integer (via value) and constructs a new Thing. The closure returns the Thing, transferring ownership to the code that called next.
While you are borrowing a value (a.k.a. taking a reference), the ownership of the value cannot change hands and the value must last longer than the borrow lasts.
Let's turn to the concept of an iterator of references and ask our question: "who owns the data?".
map(|value| &Thing { value })
Here, we create a Thing and take a reference to it. No variable owns the Thing, so the scope owns it and the value will be destroyed when the scope ends. The closure tries to return the reference, but that violates the axiom that borrowed items must outlive their borrows.
So, how do you fix it? The easiest thing is to change your function to be more accepting:
use std::borrow::Borrow;
struct Thing {
    value: u32,
}
fn consume(things: impl IntoIterator<Item = impl Borrow<Thing>>) {
    for thing in things {
        let thing = thing.borrow();
        println!("{}", thing.value);
    }
}
fn main() {
    let iterator_of_structs = (1..5).map(|value| Thing { value });
    consume(iterator_of_structs);
    let vector_of_structs: Vec<_> = (1..5).map(|value| Thing { value }).collect();
    let iterator_of_references_to_structs = vector_of_structs.iter();
    consume(iterator_of_references_to_structs);
}
Here, we accept any type that can be converted into an iterator whose items let us borrow a Thing. Borrow<T> is implemented for T itself and for references to T, so both owned Things and &Things qualify.

An iterator of references allows the consumer to keep all the references that the iterator yielded for as long as they want (at least while the iterator itself remains alive). To support that, all objects the iterator creates references to need to be in memory at the same time. There is no way around this with the iterator protocol as-is. So your best course of action is to collect() the iterator into a vector and create a reference iterator from that (as you do with solid). Unfortunately, this means losing the laziness.
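A minimal sketch of the collect-then-borrow approach. Thing and consume come from the question; here consume returns the sum of the values instead of printing them, which is an assumption made only so the result is easy to check.

```rust
struct Thing {
    value: u32,
}

// Same signature shape as the question's `consume`, except that it returns
// the sum of the values (an assumption for illustration) instead of printing.
fn consume<'a, I: IntoIterator<Item = &'a Thing>>(things: I) -> u32 {
    things.into_iter().map(|thing| thing.value).sum()
}

fn main() {
    // Materialise the generated structs so that the vector owns them...
    let owned: Vec<Thing> = (1..5).map(|value| Thing { value }).collect();
    // ...then hand out references that borrow from the vector.
    let total = consume(owned.iter());
    println!("{}", total);
}
```

The vector keeps every Thing alive for as long as the references are in use, at the cost of building the whole collection up front.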
There is an alternative iterator abstraction, called streaming iterator, which would support this. With streaming iterators, the consumer may only hold onto the reference until it gets the next one. I am not aware of any crates implementing this though, and it would be a completely different trait which no function using std::iter::Iterator supports. In many cases it may even be impossible to use streaming iterators, because the algorithm needs the freedom to reference several values at once.

Related

Trying to implement an iterator: cannot infer an appropriate lifetime due to conflicting requirements

I'm trying to implement an iterator on my own struct. My general approach is to generate and store an iterator the first time next is invoked, and then call that iterator each time I need a value.
My minimal failing example looks like this, and the heart of it is:
if !self.vals.is_some() {
    self.vals = Some(Box::new({
        self.display.chars().filter(|&i| i == self.look_for)
    }) as Box<std::iter::Iterator<Item = _>>);
}
My code fails to compile, producing the following message:
help: consider using an explicit lifetime parameter as shown: fn next(self: &'a mut Self) -> Option<<Self>::Item>
Following the advice doesn't help (it just leads to more compile errors saying that my implementation is incompatible with the Iterator trait definition).
I'd appreciate help understanding what's going wrong and how I can fix it.
The problem is that the closure you pass to filter needs to borrow self, but you can't store a reference to self in the struct itself.
In this case, we can work around it by storing a copy of the value in the closure instead. This is done in two steps:
Assign self.look_for to a local variable, and use the local variable in the closure instead. This way, the closure is not tied to self.
Add move to the closure. The closure will thus capture the local variable by value.
Here's the final code:
impl<'a> Iterator for StatefulCounter<'a> {
    type Item = bool;
    fn next(&mut self) -> Option<Self::Item> {
        if self.vals.is_none() {
            // Copy the field into a local so the closure does not borrow `self`.
            let look_for = self.look_for;
            self.vals = Some(Box::new({
                self.display.chars().filter(move |&i| i == look_for)
            }));
        }
        if let Some(v) = self.vals.as_mut().unwrap().next() {
            Some(expensive(v))
        } else {
            None
        }
    }
}
The explicit cast on the Box is not necessary, so I removed it.
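For reference, here is a self-contained sketch of the same idea. The struct layout and the body of expensive are assumptions, since the question only links to its full example; the point is only that the closure captures a copied local by value rather than borrowing self.

```rust
// Hypothetical stand-in for the struct in the question's linked example.
struct StatefulCounter<'a> {
    display: &'a str,
    look_for: char,
    vals: Option<Box<dyn Iterator<Item = char> + 'a>>,
}

// Placeholder for the question's `expensive` function.
fn expensive(c: char) -> bool {
    c.is_alphabetic()
}

impl<'a> Iterator for StatefulCounter<'a> {
    type Item = bool;

    fn next(&mut self) -> Option<bool> {
        if self.vals.is_none() {
            // Copy the fields out so the closure captures plain values, not `self`.
            let look_for = self.look_for;
            let display: &'a str = self.display;
            self.vals = Some(Box::new(display.chars().filter(move |&c| c == look_for)));
        }
        // Pull the next matching character and run it through `expensive`.
        self.vals.as_mut()?.next().map(expensive)
    }
}

fn main() {
    let counter = StatefulCounter { display: "banana", look_for: 'a', vals: None };
    let results: Vec<bool> = counter.collect();
    println!("{:?}", results);
}
```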

How to make a Rust singleton's destructor run?

These are the ways I know of to create singletons in Rust:
#[macro_use]
extern crate lazy_static;
use std::sync::{Mutex, Once, ONCE_INIT};
#[derive(Debug)]
struct A(usize);
impl Drop for A {
    fn drop(&mut self) {
        // This is never executed automatically.
        println!(
            "Dropping {:?} - Important stuff such as release file-handles etc.",
            *self
        );
    }
}
// ------------------ METHOD 0 -------------------
static PLAIN_OBJ: A = A(0);
// ------------------ METHOD 1 -------------------
lazy_static! {
    static ref OBJ: Mutex<A> = Mutex::new(A(1));
}
// ------------------ METHOD 2 -------------------
fn get() -> &'static Mutex<A> {
    static mut OBJ: *const Mutex<A> = 0 as *const Mutex<A>;
    static ONCE: Once = ONCE_INIT;
    ONCE.call_once(|| unsafe {
        OBJ = Box::into_raw(Box::new(Mutex::new(A(2))));
    });
    unsafe { &*OBJ }
}
fn main() {
    println!("Obj = {:?}", PLAIN_OBJ); // A(0)
    println!("Obj = {:?}", *OBJ.lock().unwrap()); // A(1)
    println!("Obj = {:?}", *get().lock().unwrap()); // A(2)
}
None of these call A's destructor (drop()) at program exit. This is expected behaviour for Method 2 (which is heap allocated), but I hadn't looked into the implementation of lazy_static! to know it was going to be similar.
There is no RAII here. I could achieve the behaviour of an RAII singleton in C++ (I coded in C++ until about a year ago, so most of my comparisons relate to it - I don't know many other languages) using function-local statics:
A& get() {
    static A obj; // thread-safe creation with C++11 guarantees
    return obj;
}
This is probably allocated/created (lazily) in implementation defined area and is valid for the lifetime of the program. When the program terminates, the destructor is deterministically run. We need to avoid accessing it from destructors of other statics, but I have never run into that.
I might need to release resources and I want drop() to be run. Right now, I end up doing it manually just before program termination (towards the end of main after all threads have joined etc.).
I don't even know how to do this using lazy_static! so I have avoided using it and only go for Method 2 where I can manually destroy it at the end.
I don't want to do this; is there a way I can have such a RAII behaved singleton in Rust?
Singletons in particular, and global constructors/destructors in general, are a bane (especially in languages such as C++).
I would say the main (functional) issues they cause are known as the static initialization order fiasco and the static destruction order fiasco. That is, it is easy to accidentally create a dependency cycle between those globals, and even without such a cycle it is not immediately clear to the compiler in which order they should be built or destroyed.
They may also cause other issues: slower start-up, accidentally shared memory, ...
In Rust, the attitude adopted has been No life before/after main. As such, attempting to get the C++ behavior is probably not going to work as expected.
You will get much greater language support if you:
drop the global aspect
drop the attempt at having a single instance
(and as a bonus, it'll be so much easier to test in parallel, too)
My recommendation, thus, is to simply stick with local variables. Instantiate it in main, pass it by value/reference down the call stack, and not only do you avoid those tricky initialization order issues, you also get destruction.
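A small sketch of that recommendation. Resource, use_resource and run are hypothetical names, and the log vector stands in for real cleanup work such as closing file handles; it is only there so the drop can be observed.

```rust
use std::cell::RefCell;

// Hypothetical resource; the log stands in for "release file handles etc."
struct Resource<'a> {
    id: usize,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Resource<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.id));
    }
}

// Code further down the call stack receives the instance by reference.
fn use_resource(resource: &Resource<'_>) -> usize {
    resource.id + 1
}

fn run(log: &RefCell<Vec<String>>) -> usize {
    // Created as an ordinary local instead of a global.
    let resource = Resource { id: 1, log };
    let result = use_resource(&resource);
    result
    // `resource` goes out of scope here, so `drop` runs deterministically.
}

fn main() {
    let log = RefCell::new(Vec::new());
    let result = run(&log);
    println!("result = {}, log = {:?}", result, log.borrow());
}
```

Because the value is owned by a stack frame, its destructor runs when that frame ends, with no manual cleanup step at the end of main.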

Is there an Iterator-like trait which returns references that must fall out of scope before the next access?

This would make it possible to safely iterate over the same element twice, or to hold some state for the global thing being iterated over in the item type.
Something like:
trait IterShort<Iter>
where
    Self: Borrow<Iter>,
{
    type Item;
    fn next(self) -> Option<Self::Item>;
}
then an implementation could look like:
impl<'a, MyIter> IterShort<MyIter> for &'a mut MyIter {
    type Item = &'a mut MyItem;
    fn next(self) -> Option<Self::Item> {
        // ...
    }
}
I realize I could write my own (I just did), but I'd like one that works with the for-loop notation. Is that possible?
The std::iter::Iterator trait cannot do this, but you can write a different trait:
trait StreamingIterator {
    type Item;
    fn next<'a>(&'a mut self) -> Option<&'a mut Self::Item>;
}
Note that the return value of next borrows the iterator itself, whereas in Vec::iter for example it only borrows the vector.
The downside is that &mut is hard-coded. Making it generic would require higher-kinded types (so that StreamingIterator::Item could itself be generic over a lifetime parameter).
Alexis Beingessner gave a talk about this and more titled Who Owns This Stream of Data? at RustCamp.
As to for loops, they’re really tied to std::iter::IntoIterator which is tied to std::iter::Iterator. You’d just have to implement both.
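Since this answer was written, generic associated types have been stabilized (Rust 1.65), which lets the item type itself carry the borrow's lifetime instead of hard-coding &mut. A hedged sketch follows; LendingIterator and WindowsMut are names chosen for this example, not standard library items.

```rust
// A "lending" iterator: each item may borrow from the iterator itself,
// so it has to be released before `next` can be called again.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>>;
}

// Yields overlapping mutable windows over a slice, which
// `std::iter::Iterator` cannot express. Assumes `size` is non-zero.
struct WindowsMut<'s, T> {
    slice: &'s mut [T],
    start: usize,
    size: usize,
}

impl<'s, T> LendingIterator for WindowsMut<'s, T> {
    type Item<'a> = &'a mut [T] where Self: 'a;

    fn next(&mut self) -> Option<Self::Item<'_>> {
        let end = self.start.checked_add(self.size)?;
        // `get_mut` returns None once the window would run past the end.
        let window = self.slice.get_mut(self.start..end)?;
        self.start += 1;
        Some(window)
    }
}

fn main() {
    let mut data = [1, 2, 3, 4];
    let mut windows = WindowsMut { slice: &mut data, start: 0, size: 2 };
    // `for` only understands `Iterator`, so a `while let` loop is used instead.
    while let Some(window) = windows.next() {
        let first = window[0];
        window[1] += first;
    }
    println!("{:?}", data);
}
```

The trade-off mentioned above still applies: this is a separate trait, so none of the std::iter adapters or for loops work with it directly.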
The standard iterators can't do this as far as I can see. The very definition of an iterator is that the outside has control over the elements while the inside has control over what produces the elements.
From what I understand of what you are trying to do, I'd flip the concept around and instead of returning elements from an iterator to a surrounding environment, pass the environment to the iterator. That is, you create a struct with a constructor function that accepts a closure and implements the iterator trait. On each call to next, the passed-in closure is called with the next element and the return value of that closure or modifications thereof are returned as the current element. That way, next can handle the lifetime of whatever would otherwise be returned to the surrounding environment.
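One way to read that suggestion, as a rough sketch: Chunked is a hypothetical adapter that owns a scratch buffer, and the caller's closure only sees the buffer for the duration of one step. The element type (u32) and the chunking behaviour are assumptions picked to keep the example short.

```rust
// Hypothetical adapter: refills its own buffer on every step and lets the
// caller's closure inspect or modify it only while that step is running.
struct Chunked<F> {
    source: Vec<u32>,
    pos: usize,
    size: usize,
    buffer: Vec<u32>,
    visit: F,
}

impl<F: FnMut(&mut [u32]) -> u32> Chunked<F> {
    fn new(source: Vec<u32>, size: usize, visit: F) -> Self {
        // A chunk size of zero would never make progress, so clamp it to 1.
        Chunked { source, pos: 0, size: size.max(1), buffer: Vec::new(), visit }
    }
}

impl<F: FnMut(&mut [u32]) -> u32> Iterator for Chunked<F> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.pos >= self.source.len() {
            return None;
        }
        let end = (self.pos + self.size).min(self.source.len());
        // Reuse the same buffer for every chunk.
        self.buffer.clear();
        self.buffer.extend_from_slice(&self.source[self.pos..end]);
        self.pos = end;
        // The closure's return value, not the borrowed buffer, leaves `next`.
        Some((self.visit)(&mut self.buffer))
    }
}

fn main() {
    let maxima: Vec<u32> = Chunked::new(vec![3, 1, 4, 1, 5, 9, 2], 3, |chunk| {
        chunk.sort_unstable();
        chunk[chunk.len() - 1]
    })
    .collect();
    println!("{:?}", maxima);
}
```

Because the adapter implements the ordinary Iterator trait, it works with for loops, while the short-lived reference never escapes the closure call.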

How can I return an iterator over a slice?

fn main() {
    let vec: Vec<_> = (0..5).map(|n| n.to_string()).collect();
    for item in get_iterator(&vec) {
        println!("{}", item);
    }
}
fn get_iterator(s: &[String]) -> Box<Iterator<Item=String>> {
    Box::new(s.iter())
}
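This does not compile, because s.iter() yields &String rather than String and the returned box must not outlive the slice. A version with the item type and the lifetimes spelled out:
fn get_iterator<'a>(s: &'a [String]) -> Box<dyn Iterator<Item=&'a String> + 'a> {
    Box::new(s.iter())
}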
The trick here is that we start with a slice of items and that slice has the lifetime 'a. slice::iter returns a slice::Iter with the same lifetime as the slice. The implementation of Iterator likewise returns references with that lifetime. We need to connect all of the lifetimes together.
That explains the 'a in the arguments and in the Item=&'a part. So what does the + 'a mean? There's a complete answer about that, and another with more detail. The short version is that an object with references inside of it may implement a trait, so we need to account for those lifetimes when talking about a trait object. By default, that lifetime is 'static, as it was determined that was the usual case.
The Box is not strictly required, but is a normal thing you'll see when you don't want to deal with the complicated types that might underlie the implementation (or just don't want to expose the implementation). In this case, the function could be
fn get_iterator<'a>(s: &'a [String]) -> std::slice::Iter<'a, String> {
    s.iter()
}
But if you add .skip(1), the type would be:
std::iter::Skip<std::slice::Iter<'a, String>>
And if you involve a closure, the type cannot be written out at all, as closures are unique, anonymous, auto-generated types! Those cases need a Box or, on Rust 1.26 and later, an impl Iterator return type.
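A sketch of the closure case using impl Trait in return position (stable since Rust 1.26), which avoids the Box. The skip and filter steps are arbitrary and only serve to produce a type that would be awkward or impossible to name.

```rust
// `impl Iterator` means "some concrete type implementing Iterator";
// the compiler knows the real type, the caller does not need to.
fn get_iterator<'a>(s: &'a [String]) -> impl Iterator<Item = &'a String> + 'a {
    s.iter().skip(1).filter(|item| !item.is_empty())
}

fn main() {
    let vec: Vec<String> = (0..5).map(|n| n.to_string()).collect();
    for item in get_iterator(&vec) {
        println!("{}", item);
    }
}
```

Unlike the boxed version, this involves no heap allocation or dynamic dispatch, but every return path has to produce the same concrete type.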

How do I apply an explicit lifetime bound to a returned trait?

Returning an iterator from a function in Rust is an exercise of Sisyphean dimensions, but I am told it's possible to return one as a trait without quite so much pain. Unfortunately, it isn't working: apparently, I need an explicit lifetime bound? Which is apparently not the same thing as adding a lifetime parameter. Which means I have no idea how to do that.
Here's my (tiny, test) code:
fn main() {
    let args = get_args();
    for arg in args {
        println!("{}", arg);
    }
}
fn get_args() -> Iterator {
    std::env::args().filter_map(|arg| arg.into_string().ok())
}
What is the appropriate way to make this actually work?
Edit: rust version rustc 1.0.0-nightly (00df3251f 2015-02-08 23:24:33 +0000)
You can't return a bare Iterator from a function, because it is a trait, thus not a sized type.
In your situation, you'll need to put the iterator object inside a box, in order to make it into a sized object that can be returned from the function.
To do so, you can change your code like this:
fn get_args() -> Box<Iterator<Item=String> + 'static> {
    Box::new(std::env::args().filter_map(|arg| arg.into_string().ok()))
}
Here I've added a lifetime specifier 'static for the trait object, meaning that it is completely self-owned (a function taking no arguments will almost always return something valid for the 'static lifetime in this sense).
You also need the <Item=String> part to make explicit the type of data yielded by your iterator. In this case: Strings.
In this specific case you can manage to return a concrete type from your get_args, like so:
fn get_args() -> FilterMap<Args, fn(OsString) -> Option<String>> {
    fn arg_into_string(arg: OsString) -> Option<String> { arg.into_string().ok() }
    args().filter_map(arg_into_string as fn(OsString) -> Option<String>)
}
Basically, this applies to all the cases where the closure you use in the iterator adapter (in your case filter_map) is not really a closure, in that it does not capture any environment and can be modeled by a plain old function.
In general, if you do need to return a type that does contain a closure, you will indeed need to box it and return a trait object. In your case:
fn get_args() -> Box<Iterator<Item=String> + 'static> {
    Box::new(std::env::args().filter_map(|arg| arg.into_string().ok()))
}