Implementing a "cautious" take_while using Peekable

I'd like to use Peekable as the basis for a new cautious_take_while operation that acts like take_while from IteratorExt but without consuming the first failed item. (There's a side question of whether this is a good idea, and whether there are better ways to accomplish this goal in Rust -- I'd be happy for hints in that direction, but mostly I'm trying to understand where my code is breaking).
The API I'm trying to enable is basically:
let mut chars = "abcdefg.".chars().peekable();
let abc: String = chars.by_ref().cautious_take_while(|&x| x != 'd').collect();
let defg: String = chars.by_ref().cautious_take_while(|&x| x != '.').collect();
// yielding (abc = "abc", defg = "defg")
I've taken a crack at creating a MCVE here, but I'm getting:
:10:5: 10:19 error: cannot move out of borrowed content
:10 chars.by_ref().cautious_take_while(|&x| x != '.');
As far as I can tell, I'm following the same pattern as Rust's own TakeWhile in terms of my function signatures, but I'm seeing different behavior from the borrow checker. Can someone point out what I'm doing wrong?

The funny thing with by_ref() is that it returns a mutable reference to itself:
pub trait IteratorExt: Iterator + Sized {
fn by_ref(&mut self) -> &mut Self { self }
}
It works because the Iterator trait is implemented for mutable references to iterators. Smart!
impl<'a, I> Iterator for &'a mut I where I: Iterator, I: ?Sized { ... }
The standard take_while function works because it is defined on the Iterator trait, so the call automatically resolves to the implementation for &mut Peekable<T>.
But your code does not work because Peekable is a struct, not a trait, so your CautiousTakeWhileable must name the concrete type. You are then trying to take ownership of the Peekable, which you cannot do, because all you have is a mutable reference.
The solution is to take a &mut Peekable<T> instead of a Peekable<T>. You will need to specify the lifetime too:
impl <'a, T: Iterator, P> Iterator for CautiousTakeWhile<&'a mut Peekable<T>, P>
where P: FnMut(&T::Item) -> bool {
//...
}
impl <'a, T: Iterator> CautiousTakeWhileable for &'a mut Peekable<T> {
fn cautious_take_while<P>(self, f: P) -> CautiousTakeWhile<&'a mut Peekable<T>, P>
where P: FnMut(&T::Item) -> bool {
CautiousTakeWhile{inner: self, condition: f,}
}
}
A curious side effect of this solution is that now by_ref is not needed, because cautious_take_while() takes a mutable reference, so it does not steal ownership. The by_ref() call is needed for take_while() because it can take either Peekable<T> or &mut Peekable<T>, and it defaults to the first one. With the by_ref() call it will resolve to the second one.
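To see the by_ref() behaviour described above with the standard take_while(), here is a small sketch. The function name split_with_take_while is made up for illustration. It shows that chars stays usable after by_ref(), but that take_while() has already swallowed the first failing item, which is the problem the question is trying to avoid.

```rust
// Hypothetical helper, only here to demonstrate the behaviour.
fn split_with_take_while() -> (String, String) {
    let mut chars = "abcdefg.".chars().peekable();

    // by_ref() yields &mut Peekable<Chars>, which is itself an Iterator,
    // so take_while() consumes the reference rather than `chars`.
    let abc: String = chars.by_ref().take_while(|&x| x != 'd').collect();

    // `chars` is still usable, but take_while() had to pull 'd' out of the
    // iterator in order to test it, so 'd' is not part of the remainder.
    let rest: String = chars.collect();

    (abc, rest)
}

fn main() {
    let (abc, rest) = split_with_take_while();
    println!("{}, {}", abc, rest);
}
```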
And now that I finally understand it, I think it might be a good idea to change the definition of struct CautiousTakeWhile to include the peekable bit into the struct itself. The difficulty is that the lifetime has to be specified manually, if I'm right. Something like:
struct CautiousTakeWhile<'a, T: Iterator + 'a, P>
where T::Item : 'a {
inner: &'a mut Peekable<T>,
condition: P,
}
trait CautiousTakeWhileable<'a, T>: Iterator {
fn cautious_take_while<P>(self, P) -> CautiousTakeWhile<'a, T, P> where
P: FnMut(&Self::Item) -> bool;
}
and the rest is more or less straightforward.
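For reference, here is one way the pieces could fit together on a current compiler. Treat it as a sketch rather than the definitive version: it assumes a Rust release where Peekable::next_if is available, the trait carries extra 'a and I parameters that the snippets above do not have, and split_example is a made-up helper.

```rust
use std::iter::Peekable;

/// Yields items while `condition` holds, leaving the first failing item
/// in the underlying Peekable.
struct CautiousTakeWhile<'a, I: Iterator, P> {
    inner: &'a mut Peekable<I>,
    condition: P,
}

impl<'a, I, P> Iterator for CautiousTakeWhile<'a, I, P>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        let condition = &mut self.condition;
        // next_if() peeks first and only advances if the predicate accepts.
        self.inner.next_if(|item| condition(item))
    }
}

/// Extension trait implemented for `&mut Peekable<I>`, as suggested above.
trait CautiousTakeWhileable<'a, I: Iterator + 'a> {
    fn cautious_take_while<P>(self, condition: P) -> CautiousTakeWhile<'a, I, P>
    where
        P: FnMut(&I::Item) -> bool;
}

impl<'a, I: Iterator + 'a> CautiousTakeWhileable<'a, I> for &'a mut Peekable<I> {
    fn cautious_take_while<P>(self, condition: P) -> CautiousTakeWhile<'a, I, P>
    where
        P: FnMut(&I::Item) -> bool,
    {
        CautiousTakeWhile { inner: self, condition }
    }
}

// Hypothetical helper that exercises the adapter twice on one iterator.
fn split_example() -> (String, String, Option<char>) {
    let mut chars = "abcdefg.".chars().peekable();
    // No by_ref() needed: method lookup takes `&mut chars` automatically.
    let abc: String = chars.cautious_take_while(|&x| x != 'd').collect();
    let defg: String = chars.cautious_take_while(|&x| x != '.').collect();
    (abc, defg, chars.next())
}

fn main() {
    let (abc, defg, next) = split_example();
    println!("{}, {}, {:?}", abc, defg, next);
}
```

Because the adapter only ever holds a mutable borrow, the '.' that stopped the second call is still the next item of chars afterwards.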

This was a tricky one! I'll lead with the meat of the code, then attempt to explain it (if I understand it...). It's also the ugly, unsugared version, as I wanted to reduce incidental complexity.
use std::iter::Peekable;
fn main() {
let mut chars = "abcdefg.".chars().peekable();
let abc: String = CautiousTakeWhile{inner: chars.by_ref(), condition: |&x| x != 'd'}.collect();
let defg: String = CautiousTakeWhile{inner: chars.by_ref(), condition: |&x| x != '.'}.collect();
println!("{}, {}", abc, defg);
}
struct CautiousTakeWhile<'a, I, P>
where I::Item: 'a,
I: Iterator + 'a,
P: FnMut(&I::Item) -> bool,
{
inner: &'a mut Peekable<I>,
condition: P,
}
impl<'a, I, P> Iterator for CautiousTakeWhile<'a, I, P>
where I::Item: 'a,
I: Iterator + 'a,
P: FnMut(&I::Item) -> bool
{
type Item = I::Item;
fn next(&mut self) -> Option<I::Item> {
let return_next =
match self.inner.peek() {
Some(ref v) => (self.condition)(v),
_ => false,
};
if return_next { self.inner.next() } else { None }
}
}
Actually, Rodrigo seems to have a good explanation, so I'll defer to that, unless you'd like me to explain something specific.

Related

How do I use the Option::ok_or() method correctly?

I'm trying to understand how to use the question mark operator for error handling in Rust. I have this code:
fn main() -> Result<(), &'static str> {
let foo: i32 = Some("1")
.ok_or(Err("error 1"))?
.parse()
.or(Err("error 2"))?;
Ok(())
}
This code fails to compile:
error[E0277]: the trait bound `&str: std::convert::From<std::result::Result<_, &str>>` is not satisfied
--> src/main.rs:2:20
|
2 | let foo: i32 = Some("1")
| ____________________^
3 | | .ok_or(Err("error 1"))?
| |_______________________________^ the trait `std::convert::From<std::result::Result<_, &str>>` is not implemented for `&str`
|
= note: required by `std::convert::From::from`
The Rust book has an example usage of the question mark operator:
use std::io;
use std::io::Read;
use std::fs::File;
fn read_username_from_file() -> Result<String, io::Error> {
let mut s = String::new();
File::open("hello.txt")?.read_to_string(&mut s)?;
Ok(s)
}
In my opinion, it doesn't differ much from my example in terms of error handling. I cannot see why my code is invalid. If the From trait has to be implemented for all kinds of Result, why does the code from the Rust book work fine?
Unlike or, ok_or takes an E, not a full Result<T, E> (because it wouldn't have anything to do if passed an Ok). Just pass the error string directly:
fn main() -> Result<(), &'static str> {
let foo: i32 = Some("1")
.ok_or("error 1")?
.parse()
.or(Err("error 2"))?;
Ok(())
}
The reason the error message mentions the From trait is that ? implicitly uses From to convert the expression's error type into the return value's error type. If it worked, .ok_or(Err("error 1")) would return a value of Result<&'static str, Result<_, &'static str>> (_ could be almost anything, since Err doesn't specify). The ? operator attempts to find an implementation of From that would convert Result<_, &'static str> (the expression's error type) into &'static str (the return value's error type). Since no such From implementation exists, the compiler emits an error.
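A small sketch of the corrected pattern, pulled out into a function so both failure paths can be exercised. The name parse_number is made up for this example.

```rust
// Hypothetical helper: turns an optional string into an i32, mapping both
// failure modes onto &'static str errors.
fn parse_number(input: Option<&str>) -> Result<i32, &'static str> {
    let n = input
        // ok_or takes the bare error value E, not a Result.
        .ok_or("error 1")?
        .parse::<i32>()
        // or takes a whole Result, so the error is wrapped in Err here.
        .or(Err("error 2"))?;
    Ok(n)
}

fn main() {
    println!("{:?}", parse_number(Some("1")));
    println!("{:?}", parse_number(None));
    println!("{:?}", parse_number(Some("not a number")));
}
```

In both uses of ? the error type is already &'static str, so the From conversion is the trivial identity one.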

What's the appropriate way to return a reference to the value from a generic associated function?

Imagine a tiny map that stores 3 values, the first two for known keys. I'd like to implement an iterator for this map, but I'm running into lifetime issues. What's the appropriate way to return a reference to the value from a generic associated function (K::zero() in the example below)?
FYI, I own the trait, so I tried changing it to the new RFC195 associated const, which didn't help.
I've boiled down my problem to the following code:
extern crate num;
use num::*;
pub struct TinyMap<K: Num, V> {
v0: Option<V>, // value for K::zero()
v1: Option<V>, // value for K::one()
k2: K, // arbitrary K
v2: Option<V>, // value for k2
}
pub struct Iter<'a, K: 'a + Num, V: 'a> {
k0: K,
v0: &'a Option<V>,
v1: &'a Option<V>,
k2: &'a K,
v2: &'a Option<V>,
}
impl<K: Num, V> TinyMap<K, V> {
pub fn iter(&self) -> Iter<K, V> {
Iter {
k0: K::zero(),
v0: &self.v0,
v1: &self.v1,
k2: &self.k2,
v2: &self.v2,
}
}
}
impl<'a, K: 'a + Num, V: 'a> Iterator for Iter<'a, K, V> {
type Item = (&'a K, &'a V);
fn next(&mut self) -> Option<(&'a K, &'a V)> {
if (*self.v0).is_some() {
// code removed that remembers we did this once.
return Some((&self.k0, ((*self.v0).as_ref()).unwrap()));
}
// if (*self.v1).is_some() {
// code removed that remembers we did this once.
// return Some((&K::one(), &((*self.v1).unwrap())));
// }
None
}
}
error[E0495]: cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
--> src/lib.rs:38:26
|
38 | return Some((&self.k0, ((*self.v0).as_ref()).unwrap()));
| ^^^^^^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the method body at 35:5...
--> src/lib.rs:35:5
|
35 | / fn next(&mut self) -> Option<(&'a K, &'a V)> {
36 | | if (*self.v0).is_some() {
37 | | // code removed that remembers we did this once.
38 | | return Some((&self.k0, ((*self.v0).as_ref()).unwrap()));
... |
44 | | None
45 | | }
| |_____^
note: ...so that reference does not outlive borrowed content
--> src/lib.rs:38:26
|
38 | return Some((&self.k0, ((*self.v0).as_ref()).unwrap()));
| ^^^^^^^^
note: but, the lifetime must be valid for the lifetime 'a as defined on the impl at 32:6...
--> src/lib.rs:32:6
|
32 | impl<'a, K: 'a + Num, V: 'a> Iterator for Iter<'a, K, V> {
| ^^
= note: ...so that the expression is assignable:
expected std::option::Option<(&'a K, &'a V)>
found std::option::Option<(&K, &V)>
It's not possible to do that with the Iterator trait, because of the lifetime of the self reference (which is elided away in your code, but can be explicitly written like this):
type Item = (&'a K, &'a V);
fn next<'s>(&'s mut self) -> Self::Item;
Since 's doesn't appear in the function's return value (and can't appear in there, because Self::Item can't use type parameters of the function), the output is not allowed to hold a reference to any of the iterator's member variables.
That's the mechanics of the mistake, now here's the why part:
Consider a function that does include a reference to a member of self, with all the lifetimes set up correctly:
struct SomeMember;
struct SomeObject {
some_member: SomeMember,
}
impl SomeObject {
fn some_function<'s>(&'s mut self) -> &'s SomeMember {
&self.some_member
}
}
This is the same thing you're trying to do when returning &self.k0, but without anything else going on, and with the lifetimes fixed so that it's allowed. However, if I then try to do this:
fn main() {
let mut some_object = SomeObject{some_member: SomeMember};
let _item_1 = some_object.some_function();
let _item_2 = some_object.some_function();
}
error[E0499]: cannot borrow `some_object` as mutable more than once at a time
--> src/main.rs:15:23
|
14 | let _item_1 = some_object.some_function();
| ----------- first mutable borrow occurs here
15 | let _item_2 = some_object.some_function();
| ^^^^^^^^^^^ second mutable borrow occurs here
16 | }
| - first borrow ends here
The second call wasn't allowed, because it borrows some_object twice, mutably, a classic Rust no-no! But if I had tried to implement an iterator with an Item type that borrowed the iterator itself, then Iterator::collect() would be impossible, because it tries to pull more than one item out at once!
So, no, an iterator can't return an item that borrows its contents. That's an explicit, and intentional, part of the trait contract for iterators.
The consensus appears to be that as of this time (Rust 1.29), the only sensible way is to put K::zero() inside TinyMap. Thanks to @SvenMarnach for confirming my suspicions.
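A minimal sketch of that approach, under some assumptions: the Zero trait below is a hypothetical stand-in for the zero() part of num::Num so the example needs no external crate, the v1 slot is left out for brevity, and the pos counter and collect_pairs helper are made up. The point is that the zero key lives in the map, so the iterator can hand out references with lifetime 'a that do not borrow the iterator itself.

```rust
/// Hypothetical stand-in for the `zero()` part of `num::Num`.
trait Zero {
    fn zero() -> Self;
}

impl Zero for u32 {
    fn zero() -> Self {
        0
    }
}

struct TinyMap<K, V> {
    k0: K, // K::zero(), stored in the map so iterators can borrow it
    v0: Option<V>,
    k2: K, // arbitrary K
    v2: Option<V>,
}

impl<K: Zero, V> TinyMap<K, V> {
    fn new(k2: K) -> Self {
        TinyMap { k0: K::zero(), v0: None, k2, v2: None }
    }

    fn iter(&self) -> Iter<'_, K, V> {
        Iter { map: self, pos: 0 }
    }
}

struct Iter<'a, K, V> {
    map: &'a TinyMap<K, V>,
    pos: u8, // which slot to look at next
}

impl<'a, K, V> Iterator for Iter<'a, K, V> {
    type Item = (&'a K, &'a V);

    fn next(&mut self) -> Option<Self::Item> {
        // Copy the shared reference out of `self`, so the borrows below
        // live for 'a instead of being tied to `&mut self`.
        let map: &'a TinyMap<K, V> = self.map;
        while self.pos < 2 {
            let slot = self.pos;
            self.pos += 1;
            let (k, v) = match slot {
                0 => (&map.k0, &map.v0),
                _ => (&map.k2, &map.v2),
            };
            if let Some(v) = v.as_ref() {
                return Some((k, v));
            }
        }
        None
    }
}

// Hypothetical helper: collects every entry, copying keys and values out.
fn collect_pairs() -> Vec<(u32, &'static str)> {
    let mut map: TinyMap<u32, &'static str> = TinyMap::new(7);
    map.v0 = Some("zero");
    map.v2 = Some("seven");
    let pairs: Vec<(u32, &'static str)> = map.iter().map(|(k, v)| (*k, *v)).collect();
    pairs
}

fn main() {
    println!("{:?}", collect_pairs());
}
```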

What does Rhs refer to in a compiler error message about PartialEq?

I am trying to create a function that takes two iterators as parameters and iterates over the items by reference. Each Iterator item should implement PartialEq.
My first attempt was:
fn compute<T: Iterator>(first: T, second: T, len: usize) -> usize
where
T::Item: std::cmp::PartialEq,
{
// ...
}
This compiles but iterates (as far as I understand) not by reference but by value and the compiler complains about a move when iterating.
My second attempt was something like:
fn compute<'a, T>(first: T, second: T, len: usize) -> usize
where
T: Iterator<Item = &'a std::cmp::PartialEq>,
{
//...
}
resulting in a compiler error:
error[E0393]: the type parameter `Rhs` must be explicitly specified
--> src/main.rs:3:28
|
3 | T: Iterator<Item = &'a std::cmp::PartialEq>,
| ^^^^^^^^^^^^^^^^^^^ missing reference to `Rhs`
|
= note: because of the default `Self` reference, type parameters must be specified on object types
What does the Rhs (Right hand side?) the compiler refers to here mean? Why do I need a reference to it? How do I pass a bounded reference-based Iterator into a function?
PartialEq is a trait that allows you to compare two values. Those two values do not have to be of the same type! The generic type Rhs is used to specify what type we are comparing with. By default, the value of Rhs is the same as the type that is being compared to:
pub trait PartialEq<Rhs = Self>
where
Rhs: ?Sized,
In this case, you are actually requesting that the iterator value be the trait object &PartialEq. As the error message states:
because of the default Self reference, type parameters must be specified on object types
We could specify it:
fn compute<'a, T>(first: T, second: T, len: usize) -> usize
where
T: Iterator<Item = &'a std::cmp::PartialEq<i32>>,
or
fn compute<'a, T: 'a>(first: T, second: T, len: usize) -> usize
where
T: Iterator<Item = &'a std::cmp::PartialEq<&'a T>>,
You said that it "iterates (as far as I understand) not by reference but by value".
It's quite possible for it to iterate by reference. Remember that T is any type and that i32, &i32, and &mut i32 are all types. Your first example is the formulation of the signature I would use:
fn compute<T: Iterator>(first: T, second: T, len: usize) -> usize
where
T::Item: std::cmp::PartialEq,
{
42
}
fn main() {
let a = [1, 2, 3];
let b = [4, 5, 6];
compute(a.iter(), b.iter(), 1);
compute(a.iter(), b.iter(), 2);
compute(a.iter(), b.iter(), 3);
}
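To make this concrete, here is a sketch with an actual body. What compute should calculate is not stated in the question, so counting matching positions is purely an assumption, and count_equal is a made-up second function that shows Rhs being something other than Self.

```rust
// Assumed behaviour: count the positions, among the first `len`, at which
// both iterators yield equal items.
fn compute<T: Iterator>(first: T, second: T, len: usize) -> usize
where
    T::Item: PartialEq,
{
    first.zip(second).take(len).filter(|(a, b)| a == b).count()
}

// Hypothetical variant where the two item types differ: the bound spells
// out Rhs explicitly as B::Item.
fn count_equal<A, B>(first: A, second: B) -> usize
where
    A: Iterator,
    B: Iterator,
    A::Item: PartialEq<B::Item>,
{
    first.zip(second).filter(|(a, b)| a == b).count()
}

fn main() {
    let a = [1, 2, 3];
    let b = [1, 5, 3];
    // a.iter() yields &i32, so the arrays are only borrowed and can be
    // iterated again afterwards.
    println!("{}", compute(a.iter(), b.iter(), 3));
    println!("{}", compute(a.iter(), b.iter(), 1));

    // String implements PartialEq<&str>, so the item types may differ.
    let owned = vec![String::from("x"), String::from("y")];
    let borrowed = ["x", "z"];
    println!("{}", count_equal(owned.into_iter(), borrowed.iter().copied()));
}
```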

Return lazy iterator that depends on data allocated within the function

I am new to Rust and reading The Rust Programming Language, and in the Error Handling section there is a "case study" describing a program to read data from a CSV file using the csv and rustc-serialize libraries (using getopts for argument parsing).
The author writes a function search that steps through the rows of the csv file using a csv::Reader object, collects those entries whose 'city' field matches a specified value into a vector, and returns it. I've taken a slightly different approach than the author, but this should not affect my question. My (working) function looks like this:
extern crate csv;
extern crate rustc_serialize;
use std::path::Path;
use std::fs::File;
fn search<P>(data_path: P, city: &str) -> Vec<DataRow>
where P: AsRef<Path>
{
let file = File::open(data_path).expect("Opening file failed!");
let mut reader = csv::Reader::from_reader(file).has_headers(true);
reader.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city)
.collect()
}
where the DataRow type is just a record,
#[derive(Debug, RustcDecodable)]
struct DataRow {
country: String,
city: String,
accent_city: String,
region: String,
population: Option<u64>,
latitude: Option<f64>,
longitude: Option<f64>
}
Now, the author poses, as the dreaded "exercise to the reader", the problem of modifying this function to return an iterator instead of a vector (eliminating the call to collect). My question is: How can this be done at all, and what are the most concise and idiomatic ways of doing it?
A simple attempt that I think gets the type signature right is
fn search_iter<'a,P>(data_path: P, city: &'a str)
-> Box<Iterator<Item=DataRow> + 'a>
where P: AsRef<Path>
{
let file = File::open(data_path).expect("Opening file failed!");
let mut reader = csv::Reader::from_reader(file).has_headers(true);
Box::new(reader.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city))
}
I return a trait object of type Box<Iterator<Item=DataRow> + 'a> so as not to have to expose the internal Filter type, and where the lifetime 'a is introduced just to avoid having to make a local clone of city. But this fails to compile because reader does not live long enough; it's allocated on the stack and so is deallocated when the function returns.
I guess this means that reader has to be allocated on the heap (i.e. boxed) from the beginning, or somehow moved off the stack before the function ends. If I were returning a closure, this is exactly the problem that would be solved by making it a move closure. But I don't know how to do something similar when I'm not returning a function. I've tried defining a custom iterator type containing the needed data, but I couldn't get it to work, and it kept getting uglier and more contrived (don't make too much of this code, I'm only including it to show the general direction of my attempts):
fn search_iter<'a,P>(data_path: P, city: &'a str)
-> Box<Iterator<Item=DataRow> + 'a>
where P: AsRef<Path>
{
struct ResultIter<'a> {
reader: csv::Reader<File>,
wrapped_iterator: Option<Box<Iterator<Item=DataRow> + 'a>>
}
impl<'a> Iterator for ResultIter<'a> {
type Item = DataRow;
fn next(&mut self) -> Option<DataRow>
{ self.wrapped_iterator.unwrap().next() }
}
let file = File::open(data_path).expect("Opening file failed!");
// Incrementally initialise
let mut result_iter = ResultIter {
reader: csv::Reader::from_reader(file).has_headers(true),
wrapped_iterator: None // Uninitialised
};
result_iter.wrapped_iterator =
Some(Box::new(result_iter.reader
.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|&row: &DataRow| row.city == city)));
Box::new(result_iter)
}
This question seems to concern the same problem, but the author of the answer solves it by making the concerned data static, which I don't think is an alternative for this question.
I am using Rust 1.10.0, the current stable version from the Arch Linux package rust.
CSV 1.0
As I alluded to in the answer for older versions of the crate, the best way of solving this is for the CSV crate to have an owning iterator, which it now does: DeserializeRecordsIntoIter
use csv::ReaderBuilder; // 1.1.1
use serde::Deserialize; // 1.0.104
use std::{fs::File, path::Path};
#[derive(Debug, Deserialize)]
struct DataRow {
country: String,
city: String,
accent_city: String,
region: String,
population: Option<u64>,
latitude: Option<f64>,
longitude: Option<f64>,
}
fn search_iter(data_path: impl AsRef<Path>, city: &str) -> impl Iterator<Item = DataRow> + '_ {
let file = File::open(data_path).expect("Opening file failed");
ReaderBuilder::new()
.has_headers(true)
.from_reader(file)
.into_deserialize::<DataRow>()
.map(|row| row.expect("Failed decoding row"))
.filter(move |row| row.city == city)
}
Before version 1.0
The straightest path to convert the original function would be to simply wrap the iterator. However, doing so directly will lead to problems because you cannot return an object that refers to itself, and the result of decode refers to the Reader. Even if you could surmount that, you cannot have an iterator return references to itself.
One solution is to simply re-create the DecodedRecords iterator for each call to your new iterator:
fn search_iter<'a, P>(data_path: P, city: &'a str) -> MyIter<'a>
where
P: AsRef<Path>,
{
let file = File::open(data_path).expect("Opening file failed!");
MyIter {
reader: csv::Reader::from_reader(file).has_headers(true),
city: city,
}
}
struct MyIter<'a> {
reader: csv::Reader<File>,
city: &'a str,
}
impl<'a> Iterator for MyIter<'a> {
type Item = DataRow;
fn next(&mut self) -> Option<Self::Item> {
let city = self.city;
self.reader
.decode()
.map(|row| row.expect("Failed decoding row"))
.filter(|row: &DataRow| row.city == city)
.next()
}
}
This could have overhead associated with it, depending on the implementation of decode. Additionally, this might "rewind" back to the beginning of the input — if you substituted a Vec instead of a csv::Reader, you would see this. However, it happens to work in this case.
Beyond that, I'd normally open the file and create the csv::Reader outside of the function and pass in the DecodedRecords iterator and transform it, returning a newtype / box / type alias around the underlying iterator. I prefer this because the structure of your code mirrors the lifetimes of the objects.
I'm a little surprised that there isn't an implementation of IntoIterator for csv::Reader, which would also solve the problem because there would not be any references.
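The "no references" point can be shown without the csv crate at all. In the sketch below a Vec stands in for the reader (that substitution is an assumption made purely for illustration, and DataRow is trimmed to two fields): into_iter() gives an owning iterator, and a move closure carries city along, so nothing in the returned value points back into the function's stack frame.

```rust
/// Trimmed-down stand-in for a decoded CSV record.
#[derive(Debug, PartialEq)]
struct DataRow {
    country: String,
    city: String,
}

// The Vec is moved into the iterator and `city` is moved into the closure,
// so the only borrow left in the return type is the caller's `city`.
fn search_iter(rows: Vec<DataRow>, city: &str) -> impl Iterator<Item = DataRow> + '_ {
    rows.into_iter().filter(move |row| row.city == city)
}

fn main() {
    let rows = vec![
        DataRow { country: "no".into(), city: "Oslo".into() },
        DataRow { country: "se".into(), city: "Malmo".into() },
    ];
    for row in search_iter(rows, "Oslo") {
        println!("{:?}", row);
    }
}
```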
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?
What is the correct way to return an Iterator (or any other trait)?

Implementing Decodable for a wrapper around a fixed size vector

Background: the serialize crate is undocumented, deriving Decodable doesn't work. I've also looked at existing implementations for other types and find the code difficult to follow.
How does the decoding process work, and how do I implement Decodable for this struct?
pub struct Grid<A> {
data: [[A,..GRIDW],..GRIDH]
}
The reason why #[deriving(Decodable)] doesn't work is that [A,..GRIDW] doesn't implement Decodable, and it's impossible to implement a trait for a type when both are defined outside of this crate, which is the case here. So the only solution I can see is to manually implement Decodable for Grid.
And this is as far as I've gotten
impl <A: Decodable<D, E>, D: Decoder<E>, E> Decodable<D, E> for Grid<A> {
fn decode(decoder: &mut D) -> Result<Grid<A>, E> {
decoder.read_struct("Grid", 1u, ref |d| Ok(Grid {
data: match d.read_struct_field("data", 0u, ref |d| Decodable::decode(d)) {
Ok(e) => e,
Err(e) => return Err(e)
},
}))
}
}
Which gives an error at Decodable::decode(d)
error: failed to find an implementation of trait
serialize::serialize::Decodable for [[A, .. 20], .. 20]
It's not really possible to do this nicely at the moment for a variety of reasons:
We can't be generic over the length of a fixed length array (the fundamental issue)
The current trait coherence restrictions mean we can't write a custom trait MyDecodable<D, E> { ... } with impl MyDecodable<D, E> for [A, .. GRIDW] (and one for GRIDH) and a blanket implementation impl<A: Decodable<D, E>> MyDecodable<D, E> for A. This forces a trait-based solution into using an intermediary type, which then makes the compiler's type inference rather unhappy and AFAICT impossible to satisfy.
We don't have associated types (aka "output types"), which I think would allow the type inference to be slightly sane.
Thus, for now, we're left with a manual implementation. :(
extern crate serialize;
use std::default::Default;
use serialize::{Decoder, Decodable};
static GRIDW: uint = 10;
static GRIDH: uint = 5;
fn decode_grid<E, D: Decoder<E>,
A: Copy + Default + Decodable<D, E>>(d: &mut D)
-> Result<Grid<A>, E> {
// mirror the Vec implementation: try to read a sequence
d.read_seq(|d, len| {
// check it's the required length
if len != GRIDH {
return Err(
d.error(format!("expecting length {} but found {}",
GRIDH, len).as_slice()));
}
// create the array with empty values ...
let mut array: [[A, .. GRIDW], .. GRIDH]
= [[Default::default(), .. GRIDW], .. GRIDH];
// ... and fill it in progressively ...
for (i, outer) in array.mut_iter().enumerate() {
// ... by reading each outer element ...
try!(d.read_seq_elt(i, |d| {
// ... as a sequence ...
d.read_seq(|d, len| {
// ... of the right length,
if len != GRIDW { return Err(d.error("...")) }
// and then read each element of that sequence as the
// elements of the grid.
for (j, inner) in outer.mut_iter().enumerate() {
*inner = try!(d.read_seq_elt(j, Decodable::decode));
}
Ok(())
})
}));
}
// all done successfully!
Ok(Grid { data: array })
})
}
pub struct Grid<A> {
data: [[A,..GRIDW],..GRIDH]
}
impl<E, D: Decoder<E>, A: Copy + Default + Decodable<D, E>>
Decodable<D, E> for Grid<A> {
fn decode(d: &mut D) -> Result<Grid<A>, E> {
d.read_struct("Grid", 1, |d| {
d.read_struct_field("data", 0, decode_grid)
})
}
}
fn main() {}
It's also possible to write a more "generic" [T, .. n] decoder by using macros to instantiate each version, with special control over how the recursive decoding is handled to allow nested fixed-length arrays to be handled (as required for Grid); this requires somewhat less code (especially with more layers, or a variety of different lengths), but the macro solution:
may be harder to understand, and
the one I give there may be less efficient (there's a new array variable created for every fixed length array, including new Defaults, while the non-macro solution above just uses a single array and thus only calls Default::default once for each element in the grid). It may be possible to expand to a similar set of recursive loops, but I'm not sure.
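For readers on a current compiler: the code above predates Rust 1.0 and will not build today, but the first limitation in the list (being generic over array length) has since been lifted by const generics. The sketch below is not a Decodable implementation and does not use the serialize crate. It only mirrors the "check the length, then fill a default-initialised array progressively" strategy, with nested Vecs standing in for the decoder's sequences. The function grid_from_rows and the String error type are made up for illustration.

```rust
pub struct Grid<A, const W: usize, const H: usize> {
    data: [[A; W]; H],
}

// Hypothetical helper: builds a grid from nested vectors, checking the
// outer and inner lengths the same way the manual decoder above does.
fn grid_from_rows<A, const W: usize, const H: usize>(
    rows: Vec<Vec<A>>,
) -> Result<Grid<A, W, H>, String>
where
    A: Copy + Default,
{
    // check the outer sequence is the required length
    if rows.len() != H {
        return Err(format!("expecting length {} but found {}", H, rows.len()));
    }
    // create the array with default values ...
    let mut data = [[A::default(); W]; H];
    // ... and fill it in progressively
    for (outer, row) in data.iter_mut().zip(rows) {
        if row.len() != W {
            return Err(format!("expecting length {} but found {}", W, row.len()));
        }
        for (inner, value) in outer.iter_mut().zip(row) {
            *inner = value;
        }
    }
    Ok(Grid { data })
}

fn main() {
    let grid = grid_from_rows::<u8, 2, 2>(vec![vec![1, 2], vec![3, 4]]).unwrap();
    println!("{:?}", grid.data);
}
```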