How to compose mutable Iterators? - iterator

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I would like to make an iterator that generates a stream of prime numbers. My general thought process was to wrap an iterator with successive filters so for example you start with
let mut n = (2..N)
Then for each prime number you mutate the iterator and add on a filter
let p1 = n.next()
n = n.filter(|&x| x%p1 !=0)
let p2 = n.next()
n = n.filter(|&x| x%p2 !=0)
I am trying to use the following code, but I cannot get it to work:
struct Primes {
base: Iterator<Item = u64>,
}
impl<'a> Iterator for Primes<'a> {
type Item = u64;
fn next(&mut self) -> Option<u64> {
let p = self.base.next();
match p {
Some(n) => {
let prime = n.clone();
let step = self.base.filter(move |&: &x| {x%prime!=0});
self.base = &step as &Iterator<Item = u64>;
Some(n)
},
_ => None
}
}
}
I have toyed with variations of this, but I can't seem to get lifetimes and types to match up. Right now the compiler is telling me
I can't mutate self.base
the variable prime doesn't live long enough
Here is the error I am getting
solution.rs:16:17: 16:26 error: cannot borrow immutable borrowed content `*self.base` as mutable
solution.rs:16 let p = self.base.next();
^~~~~~~~~
solution.rs:20:28: 20:37 error: cannot borrow immutable borrowed content `*self.base` as mutable
solution.rs:20 let step = self.base.filter(move |&: &x| {x%prime!=0});
^~~~~~~~~
solution.rs:21:30: 21:34 error: `step` does not live long enough
solution.rs:21 self.base = &step as &Iterator<Item = u64>;
^~~~
solution.rs:15:39: 26:6 note: reference must be valid for the lifetime 'a as defined on the block at 15:38...
solution.rs:15 fn next(&mut self) -> Option<u64> {
solution.rs:16 let p = self.base.next();
solution.rs:17 match p {
solution.rs:18 Some(n) => {
solution.rs:19 let prime = n.clone();
solution.rs:20 let step = self.base.filter(move |&: &x| {x%prime!=0});
...
solution.rs:20:71: 23:14 note: ...but borrowed value is only valid for the block suffix following statement 1 at 20:70
solution.rs:20 let step = self.base.filter(move |&: &x| {x%prime!=0});
solution.rs:21 self.base = &step as &Iterator<Item = u64>;
solution.rs:22 Some(n)
solution.rs:23 },
error: aborting due to 3 previous errors
Why won't Rust let me do this?

Here is a working version:
struct Primes<'a> {
base: Option<Box<dyn Iterator<Item = u64> + 'a>>,
}
impl<'a> Iterator for Primes<'a> {
type Item = u64;
fn next(&mut self) -> Option<u64> {
let p = self.base.as_mut().unwrap().next();
p.map(|n| {
let base = self.base.take();
let step = base.unwrap().filter(move |x| x % n != 0);
self.base = Some(Box::new(step));
n
})
}
}
impl<'a> Primes<'a> {
#[inline]
pub fn new<I: Iterator<Item = u64> + 'a>(r: I) -> Primes<'a> {
Primes {
base: Some(Box::new(r)),
}
}
}
fn main() {
for p in Primes::new(2..).take(32) {
print!("{} ", p);
}
println!("");
}
I'm using a boxed Iterator trait object. Boxing is unavoidable because the internal iterator must be stored somewhere between next() calls, and a reference trait object would need some other owner to keep the borrowed iterator alive; there is none here.
I made the internal iterator an Option. This is necessary because you need to replace it with a value which consumes it, so it is possible that the internal iterator may be "absent" from the structure for a short time. Rust models absence with Option. Option::take replaces the value it is called on with None and returns whatever was there. This is useful when shuffling non-copyable objects around.
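As a minimal sketch of the Option::take pattern described above, separate from the Primes code: the helper names append_suffix and consume_and_extend are hypothetical and exist only for illustration. The point is that a function which consumes its argument by value can still be applied to a field behind &mut, as long as the field is an Option that may briefly hold None.

```rust
// Hypothetical helper that consumes a String by value and returns a new one.
fn consume_and_extend(mut s: String, suffix: &str) -> String {
    s.push_str(suffix);
    s
}

// Applies a by-value transformation to a value that sits behind `&mut`.
fn append_suffix(slot: &mut Option<String>, suffix: &str) {
    // take() leaves None in the slot and hands back the owned String, if any.
    if let Some(owned) = slot.take() {
        // The slot is "absent" only until this assignment restores it.
        *slot = Some(consume_and_extend(owned, suffix));
    }
}

fn main() {
    let mut slot = Some(String::from("ab"));
    append_suffix(&mut slot, "c");
    println!("{:?}", slot);

    // take() on its own: the old value comes out, None stays behind.
    let mut n = Some(1);
    let old = n.take();
    println!("{:?} {:?}", old, n);
}
```

The Primes::next implementation above follows the same shape: take the boxed iterator out, wrap it in a filter, and store the result back.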
Note, however, that this sieve implementation is going to be both memory and computationally inefficient - for each prime you're creating an additional layer of iterators which takes heap space. Also the depth of stack when calling next() grows linearly with the number of primes, so you will get a stack overflow on a sufficiently large number:
fn main() {
println!("{}", Primes::new(2..).nth(10000).unwrap());
}
Running it:
% ./test1
thread '<main>' has overflowed its stack
zsh: illegal hardware instruction (core dumped) ./test1
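One way to avoid the growing stack depth is to drop the layered filters altogether. The sketch below is a different technique from the one in this answer: it keeps the primes found so far in a Vec and tests each candidate by trial division, so next() runs in a flat loop. The field names found and candidate are assumptions made for this example.

```rust
// Prime iterator based on trial division against previously found primes.
// This is not the layered-filter design from the question.
struct Primes {
    found: Vec<u64>, // primes produced so far, in increasing order
    candidate: u64,  // next number to test
}

impl Primes {
    fn new() -> Self {
        Primes { found: Vec::new(), candidate: 2 }
    }
}

impl Iterator for Primes {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        loop {
            let n = self.candidate;
            self.candidate += 1;
            // Only primes p with p * p <= n need to be checked.
            let is_prime = self
                .found
                .iter()
                .take_while(|&&p| p * p <= n)
                .all(|&p| n % p != 0);
            if is_prime {
                self.found.push(n);
                return Some(n);
            }
        }
    }
}

fn main() {
    for p in Primes::new().take(32) {
        print!("{} ", p);
    }
    println!();
    // No nested iterator layers, so this does not deepen the call stack.
    println!("{}", Primes::new().nth(10000).unwrap());
}
```

Memory still grows with the number of primes produced, but only by one u64 per prime instead of one boxed iterator layer.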

Related

Why are the strings in my iterator being concatenated?

My original goal is to fetch a list of words, one on each line, and to put them in a HashSet, while discarding comment lines and raising I/O errors properly. Given the file "stopwords.txt":
a
# this is actually a comment
of
the
this
I managed to make the code compile like this:
fn stopword_set() -> io::Result<HashSet<String>> {
let words = Result::from_iter(
BufReader::new(File::open("stopwords.txt")?)
.lines()
.filter(|r| match r {
&Ok(ref l) => !l.starts_with('#'),
_ => true
}));
Ok(HashSet::from_iter(words))
}
fn main() {
let set = stopword_set().unwrap();
println!("{:?}", set);
assert_eq!(set.len(), 4);
}
Here's a playground that also creates the file above.
I would expect to have a set of 4 strings at the end of the program. To my surprise, the function actually returns a set containing a single string with all words concatenated:
{"aofthethis"}
thread 'main' panicked at 'assertion failed: `(left == right)` (left: `1`, right: `4`)'
Led by a piece of advice in the docs for FromIterator, I got rid of all calls to from_iter and used collect instead (Playground), which has indeed solved the problem.
fn stopword_set() -> io::Result<HashSet<String>> {
BufReader::new(File::open("stopwords.txt")?)
.lines()
.filter(|r| match r {
&Ok(ref l) => !l.starts_with('#'),
_ => true
}).collect()
}
Why are the previous calls to from_iter leading to unexpected inferences, while collect() works just as intended?
A simpler reproduction:
use std::collections::HashSet;
use std::iter::FromIterator;
fn stopword_set() -> Result<HashSet<String>, u8> {
let input: Vec<Result<_, u8>> = vec![Ok("foo".to_string()), Ok("bar".to_string())];
let words = Result::from_iter(input.into_iter());
Ok(HashSet::from_iter(words))
}
fn main() {
let set = stopword_set().unwrap();
println!("{:?}", set);
assert_eq!(set.len(), 2);
}
The problem is that here, we are collecting from an iterator twice. The type of words is Result<_, u8>. However, Result also implements IntoIterator itself (yielding its Ok value, if any), so when we call HashSet::from_iter on it at the end, the compiler sees that the Ok type must be String due to the method signature. Working backwards, you can construct a String from an iterator of Strings, so that's what the compiler picks for the first from_iter.
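The two inference steps can be looked at separately. In this sketch the function names concat_ok and set_from_result are made up for illustration, and the types the compiler inferred in the reproduction are written out explicitly.

```rust
use std::collections::HashSet;
use std::iter::FromIterator;

// Step 1: an iterator of Result<String, u8> collected into Result<String, u8>.
// String implements FromIterator<String>, so the Ok values are concatenated.
fn concat_ok(input: Vec<Result<String, u8>>) -> Result<String, u8> {
    Result::from_iter(input)
}

// Step 2: a Result can be iterated (zero or one item), so a HashSet can be
// built directly from it.
fn set_from_result(words: Result<String, u8>) -> HashSet<String> {
    HashSet::from_iter(words)
}

fn main() {
    let input = vec![Ok("foo".to_string()), Ok("bar".to_string())];
    let words = concat_ok(input);
    println!("{:?}", words);
    println!("{:?}", set_from_result(words));
}
```

This prints Ok("foobar") followed by {"foobar"}, matching the single concatenated string seen in the question.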
Removing the second from_iter would solve it:
fn stopword_set() -> Result<HashSet<String>, u8> {
let input: Vec<Result<_, u8>> = vec![Ok("foo".to_string()), Ok("bar".to_string())];
Result::from_iter(input.into_iter())
}
Or for your original:
fn stopword_set() -> io::Result<HashSet<String>> {
Result::from_iter(
BufReader::new(File::open("stopwords.txt")?)
.lines()
.filter(|r| match r {
&Ok(ref l) => !l.starts_with('#'),
_ => true
}))
}
Of course, I'd normally recommend using collect instead, as I prefer the chaining:
fn stopword_set() -> io::Result<HashSet<String>> {
BufReader::new(File::open("stopwords.txt")?)
.lines()
.filter(|r| match r {
&Ok(ref l) => !l.starts_with('#'),
_ => true,
})
.collect()
}

Why does .flat_map() with .chars() not work with std::io::Lines, but does with a vector of Strings?

I am trying to iterate over characters in stdin. The Read.chars() method achieves this goal, but is unstable. The obvious alternative is to use Read.lines() with a flat_map to convert it to a character iterator.
This seems like it should work, but doesn't, resulting in borrowed value does not live long enough errors.
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let mut lines = stdin.lock().lines();
let mut chars = lines.flat_map(|x| x.unwrap().chars());
}
This is mentioned in Read file character-by-character in Rust, but it doesn't really explain why.
What I am particularly confused about is how this differs from the example in the documentation for flat_map, which uses flat_map to apply .chars() to a vector of strings. I don't really see how that should be any different. The main difference I see is that my code needs to call unwrap() as well, but changing the last line to the following does not work either:
let mut chars = lines.map(|x| x.unwrap());
let mut chars = chars.flat_map(|x| x.chars());
It fails on the second line, so the issue doesn't appear to be the unwrap.
Why does this last line not work, when the very similar line in the documentation does? Is there any way to get this to work?
Start by figuring out what the type of the closure's variable is:
let mut chars = lines.flat_map(|x| {
let () = x;
x.unwrap().chars()
});
This shows it's a Result<String, io::Error>. After unwrapping it, it will be a String.
Next, look at str::chars:
fn chars(&self) -> Chars
And the definition of Chars:
pub struct Chars<'a> {
// some fields omitted
}
From that, we can tell that calling chars on a string returns an iterator that has a reference to the string.
Whenever we have a reference, we know that the reference cannot outlive the thing that it is borrowed from. In this case, x.unwrap() is the owner. The next thing to check is where that ownership ends. In this case, the closure owns the String, so at the end of the closure, the value is dropped and any references are invalidated.
Except the code tried to return a Chars that still referred to the string. Oops. Thanks to Rust, the code didn't segfault!
The difference with the example that works is all in the ownership. In that case, the strings are owned by a vector outside of the closure and they do not get dropped before the iterator is consumed. Thus there are no lifetime issues.
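A small sketch of the working case, assuming the strings live in a slice owned by the caller (the function name chars_borrowed is made up for this example). The closure receives a &String, so the Chars iterator borrows from data that outlives the whole iterator chain.

```rust
// The closure receives `&String`; `chars()` borrows from the caller's data,
// which stays alive until after `collect()` has finished.
fn chars_borrowed(words: &[String]) -> Vec<char> {
    words.iter().flat_map(|s| s.chars()).collect()
}

fn main() {
    let words = vec!["alpha".to_string(), "beta".to_string()];
    let chars = chars_borrowed(&words);
    println!("{:?}", chars);
}
```

With stdin lines, by contrast, each String is handed to the closure by value, so nothing outside the closure keeps it alive.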
What this code really wants is an into_chars method on String. That iterator could take ownership of the value and return characters.
Not the maximum efficiency, but a good start:
struct IntoChars {
s: String,
offset: usize,
}
impl IntoChars {
fn new(s: String) -> Self {
IntoChars { s, offset: 0 }
}
}
impl Iterator for IntoChars {
type Item = char;
fn next(&mut self) -> Option<Self::Item> {
let remaining = &self.s[self.offset..];
match remaining.chars().next() {
Some(c) => {
self.offset += c.len_utf8();
Some(c)
}
None => None,
}
}
}
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let lines = stdin.lock().lines();
let chars = lines.flat_map(|x| IntoChars::new(x.unwrap()));
for c in chars {
println!("{}", c);
}
}
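If an extra allocation per line is acceptable, a shorter alternative is to collect each line's characters into a Vec<char> inside the closure. The Vec owns its data, so nothing borrowed escapes. This is a sketch under that assumption, not a replacement for IntoChars; the name chars_owned is hypothetical, and a vector of strings stands in for stdin so the example runs without input.

```rust
// Each line is turned into an owned Vec<char> before the String is dropped,
// so the flattened iterator never refers to freed data.
fn chars_owned<I>(lines: I) -> impl Iterator<Item = char>
where
    I: Iterator<Item = String>,
{
    lines.flat_map(|line| line.chars().collect::<Vec<char>>())
}

fn main() {
    // Stand-in for `stdin.lock().lines().map(|x| x.unwrap())`.
    let lines = vec!["héllo".to_string(), "x".to_string()];
    for c in chars_owned(lines.into_iter()) {
        println!("{}", c);
    }
}
```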
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?

Convert vector of enum values into an another vector

I have the following code which generates a vector of bytes from the passed vector of enum values:
#[derive(Debug, PartialEq)]
pub enum BertType {
SmallInteger(u8),
Integer(i32),
Float(f64),
String(String),
Boolean(bool),
Tuple(BertTuple),
}
#[derive(Debug, PartialEq)]
pub struct BertTuple {
pub values: Vec<BertType>
}
pub struct Serializer;
pub trait Serialize<T> {
fn to_bert(&self, data: T) -> Vec<u8>;
}
impl Serializer {
fn enum_value_to_binary(&self, enum_value: BertType) -> Vec<u8> {
match enum_value {
BertType::SmallInteger(value_u8) => self.to_bert(value_u8),
BertType::Integer(value_i32) => self.to_bert(value_i32),
BertType::Float(value_f64) => self.to_bert(value_f64),
BertType::String(string) => self.to_bert(string),
BertType::Boolean(boolean) => self.to_bert(boolean),
BertType::Tuple(tuple) => self.to_bert(tuple),
}
}
}
// some functions for serialize bool/integer/etc. into Vec<u8>
// ...
impl Serialize<BertTuple> for Serializer {
fn to_bert(&self, data: BertTuple) -> Vec<u8> {
let mut binary: Vec<u8> = data.values
.iter()
.map(|&item| self.enum_value_to_binary(item)) // <-- what is the issue here?
.collect();
let arity = data.values.len();
match arity {
0...255 => self.get_small_tuple(arity as u8, binary),
_ => self.get_large_tuple(arity as i32, binary),
}
}
}
But when compiling, I receive an error with iterating around map:
error: the trait bound `std::vec::Vec<u8>: std::iter::FromIterator<std::vec::Vec<u8>>` is not satisfied [E0277]
.collect();
^~~~~~~
help: run `rustc --explain E0277` to see a detailed explanation
note: a collection of type `std::vec::Vec<u8>` cannot be built from an iterator over elements of type `std::vec::Vec<u8>`
error: aborting due to previous error
error: Could not compile `bert-rs`.
How can I fix this issue with std::iter::FromIterator?
The problem is that enum_value_to_binary returns a Vec<u8> for each element in values. So you end up with an Iterator<Item=Vec<u8>> and you call collect::<Vec<u8>>() on that, but it doesn't know how to flatten the nested vectors. If you want all the values to be flattened into one Vec<u8>, then you should use flat_map instead of map:
let mut binary: Vec<u8> = data.values
.iter()
.flat_map(|item| self.enum_value_to_binary(item).into_iter())
.collect();
Or, slightly more idiomatic and performant, you can just have enum_value_to_binary return an iterator directly.
Also, the iter method returns an Iterator<Item=&'a T>, which means you are just borrowing the elements, but self.enum_value_to_binary wants to take ownership of the value. There are a couple of ways to fix that. One option would be to use into_iter instead of iter, which will give you the elements by value. If you do that, you'll need to move the arity variable up to before the binary variable, since creating the binary variable will take ownership of (move) data.values.
The other option would be to change self.enum_value_to_binary to take its argument by reference.
It's also possible that you meant for the type of binary to actually be Vec<Vec<u8>>.
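A reduced sketch of the by-reference option combined with flat_map. The Value enum, the encode function and the tag bytes 97 and 100 are placeholders invented for this example; they are not the question's types and not a claim about the real BERT format.

```rust
// Simplified stand-in for the question's enum.
#[derive(Debug, PartialEq)]
enum Value {
    Small(u8),
    Flag(bool),
}

// Takes the value by reference, so it works with `.iter()`.
// The leading tag bytes are arbitrary placeholders.
fn encode(value: &Value) -> Vec<u8> {
    match value {
        Value::Small(n) => vec![97, *n],
        Value::Flag(b) => vec![100, *b as u8],
    }
}

// flat_map flattens each per-element Vec<u8> into one byte stream.
fn encode_all(values: &[Value]) -> Vec<u8> {
    values.iter().flat_map(encode).collect()
}

// map keeps one Vec<u8> per element, which needs Vec<Vec<u8>> as the target.
fn encode_nested(values: &[Value]) -> Vec<Vec<u8>> {
    values.iter().map(encode).collect()
}

fn main() {
    let values = vec![Value::Small(7), Value::Flag(true)];
    println!("{:?}", encode_all(&values));
    println!("{:?}", encode_nested(&values));
}
```

Because the values are only borrowed, values.len() can still be read after the bytes are collected, so the arity computation does not have to move.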

Chaining iterators of different types

I get type errors when chaining different types of Iterator.
let s = Some(10);
let v = (1..5).chain(s.iter())
.collect::<Vec<_>>();
Output:
<anon>:23:20: 23:35 error: type mismatch resolving `<core::option::Iter<'_, _> as core::iter::IntoIterator>::Item == _`:
expected &-ptr,
found integral variable [E0271]
<anon>:23 let v = (1..5).chain(s.iter())
^~~~~~~~~~~~~~~
<anon>:23:20: 23:35 help: see the detailed explanation for E0271
<anon>:24:14: 24:33 error: no method named `collect` found for type `core::iter::Chain<core::ops::Range<_>, core::option::Iter<'_, _>>` in the current scope
<anon>:24 .collect::<Vec<_>>();
^~~~~~~~~~~~~~~~~~~
<anon>:24:14: 24:33 note: the method `collect` exists but the following trait bounds were not satisfied: `core::iter::Chain<core::ops::Range<_>, core::option::Iter<'_, _>> : core::iter::Iterator`
error: aborting due to 2 previous errors
But it works fine when zipping:
let s = Some(10);
let v = (1..5).zip(s.iter())
.collect::<Vec<_>>();
Output:
[(1, 10)]
Why is Rust able to infer the correct types for zip but not for chain and how can I fix it? n.b. I want to be able to do this for any iterator, so I don't want a solution that just works for Range and Option.
First, note that the iterators yield different types. I've added an explicit u8 to the numbers to make the types more obvious:
fn main() {
let s = Some(10u8);
let mut r = 1..5u8;
let () = s.iter().next(); // Option<&u8>
let () = r.next(); // Option<u8>
}
When you chain two iterators, both iterators must yield the same type. This makes sense as the iterator cannot "switch" what type it outputs when it gets to the end of one and begins on the second:
fn chain<U>(self, other: U) -> Chain<Self, U::IntoIter>
where U: IntoIterator<Item=Self::Item>
// ^~~~~~~~~~~~~~~ This means the types must match
So why does zip work? Because it doesn't have that restriction:
fn zip<U>(self, other: U) -> Zip<Self, U::IntoIter>
where U: IntoIterator
// ^~~~ Nothing here!
This is because zip returns a tuple with one value from each iterator; a new type, distinct from either source iterator's type. One iterator could be an integral type and the other could return your own custom type for all zip cares.
Why is Rust able to infer the correct types for zip but not for chain
There is no type inference happening here; that's a different thing. This is just plain-old type mismatching.
and how can I fix it?
In this case, your inner iterator yields a reference to an integer, a Clone-able type, so you can use cloned to make a new iterator that clones each value and then both iterators would have the same type:
fn main() {
let s = Some(10);
let v: Vec<_> = (1..5).chain(s.iter().cloned()).collect();
}
If you are done with the option, you can also use a consuming iterator with into_iter:
fn main() {
let s = Some(10);
let v: Vec<_> = (1..5).chain(s.into_iter()).collect();
}
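Since the question asks for something that is not tied to Range and Option, the same fix can be written generically. This is a sketch; the function name chain_cloned is an assumption, and it only covers the case where one side yields T and the other yields &T with T: Clone.

```rust
// Chains an iterator of owned values with an iterator of references by
// cloning the referenced values, so both sides yield the same Item type.
fn chain_cloned<'a, T, A, B>(owned: A, borrowed: B) -> Vec<T>
where
    T: Clone + 'a,
    A: IntoIterator<Item = T>,
    B: IntoIterator<Item = &'a T>,
{
    owned
        .into_iter()
        .chain(borrowed.into_iter().cloned())
        .collect()
}

fn main() {
    let s = Some(10);
    let v = chain_cloned(1..5, s.iter());
    println!("{:?}", v);

    let extra = vec![String::from("c")];
    let w = chain_cloned(vec![String::from("a"), String::from("b")], &extra);
    println!("{:?}", w);
}
```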

Implementing Decodable for a wrapper around a fixed size vector

Background: the serialize crate is undocumented, deriving Decodable doesn't work. I've also looked at existing implementations for other types and find the code difficult to follow.
How does the decoding process work, and how do I implement Decodable for this struct?
pub struct Grid<A> {
data: [[A,..GRIDW],..GRIDH]
}
The reason why #[deriving(Decodable)] doesn't work is that [A,..GRIDW] doesn't implement Decodable, and it's impossible to implement a trait for a type when both are defined outside of this crate, which is the case here. So the only solution I can see is to manually implement Decodable for Grid.
And this is as far as I've gotten
impl <A: Decodable<D, E>, D: Decoder<E>, E> Decodable<D, E> for Grid<A> {
fn decode(decoder: &mut D) -> Result<Grid<A>, E> {
decoder.read_struct("Grid", 1u, ref |d| Ok(Grid {
data: match d.read_struct_field("data", 0u, ref |d| Decodable::decode(d)) {
Ok(e) => e,
Err(e) => return Err(e)
},
}))
}
}
Which gives an error at Decodable::decode(d)
error: failed to find an implementation of trait
serialize::serialize::Decodable for [[A, .. 20], .. 20]
It's not really possible to do this nicely at the moment for a variety of reasons:
We can't be generic over the length of a fixed length array (the fundamental issue)
The current trait coherence restrictions mean we can't write a custom trait MyDecodable<D, E> { ... } with impl MyDecodable<D, E> for [A, .. GRIDW] (and one for GRIDH) and a blanket implementation impl<A: Decodable<D, E>> MyDecodable<D, E> for A. This forces a trait-based solution into using an intermediary type, which then makes the compiler's type inference rather unhappy and AFAICT impossible to satisfy.
We don't have associated types (aka "output types"), which I think would allow the type inference to be slightly sane.
Thus, for now, we're left with a manual implementation. :(
extern crate serialize;
use std::default::Default;
use serialize::{Decoder, Decodable};
static GRIDW: uint = 10;
static GRIDH: uint = 5;
fn decode_grid<E, D: Decoder<E>,
A: Copy + Default + Decodable<D, E>>(d: &mut D)
-> Result<Grid<A>, E> {
// mirror the Vec implementation: try to read a sequence
d.read_seq(|d, len| {
// check it's the required length
if len != GRIDH {
return Err(
d.error(format!("expecting length {} but found {}",
GRIDH, len).as_slice()));
}
// create the array with empty values ...
let mut array: [[A, .. GRIDW], .. GRIDH]
= [[Default::default(), .. GRIDW], .. GRIDH];
// ... and fill it in progressively ...
for (i, outer) in array.mut_iter().enumerate() {
// ... by reading each outer element ...
try!(d.read_seq_elt(i, |d| {
// ... as a sequence ...
d.read_seq(|d, len| {
// ... of the right length,
if len != GRIDW { return Err(d.error("...")) }
// and then read each element of that sequence as the
// elements of the grid.
for (j, inner) in outer.mut_iter().enumerate() {
*inner = try!(d.read_seq_elt(j, Decodable::decode));
}
Ok(())
})
}));
}
// all done successfully!
Ok(Grid { data: array })
})
}
pub struct Grid<A> {
data: [[A,..GRIDW],..GRIDH]
}
impl<E, D: Decoder<E>, A: Copy + Default + Decodable<D, E>>
Decodable<D, E> for Grid<A> {
fn decode(d: &mut D) -> Result<Grid<A>, E> {
d.read_struct("Grid", 1, |d| {
d.read_struct_field("data", 0, decode_grid)
})
}
}
fn main() {}
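For comparison only: the code above targets a pre-1.0 compiler and the old serialize crate. In later Rust versions that have const generics, the core idea (create a default-filled array, check each length, then fill it in) can be sketched without any decoder. The function grid_from_rows and its nested-Vec input are hypothetical stand-ins for the decoder callbacks, not part of any serialization API.

```rust
// Builds a fixed-size H x W grid from nested rows, checking every length,
// mirroring the "default-fill, then overwrite" approach of decode_grid.
fn grid_from_rows<A, const W: usize, const H: usize>(
    rows: &[Vec<A>],
) -> Result<[[A; W]; H], String>
where
    A: Copy + Default,
{
    // check the outer sequence is the required length
    if rows.len() != H {
        return Err(format!("expecting length {} but found {}", H, rows.len()));
    }
    // create the array with default values ...
    let mut array = [[A::default(); W]; H];
    // ... and fill it in row by row
    for (outer, row) in array.iter_mut().zip(rows) {
        if row.len() != W {
            return Err(format!("expecting length {} but found {}", W, row.len()));
        }
        outer.copy_from_slice(row);
    }
    Ok(array)
}

fn main() {
    let rows = vec![vec![1u8, 2, 3], vec![4, 5, 6]];
    let grid: [[u8; 3]; 2] = grid_from_rows(&rows).unwrap();
    println!("{:?}", grid);
}
```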
It's also possible to write a more "generic" [T, .. n] decoder by using macros to instantiate each version, with special control over how the recursive decoding is handled to allow nested fixed-length arrays to be handled (as required for Grid); this requires somewhat less code (especially with more layers, or a variety of different lengths), but the macro solution:
may be harder to understand, and
the one I give there may be less efficient (there's a new array variable created for every fixed length array, including new Defaults, while the non-macro solution above just uses a single array and thus only calls Default::default once for each element in the grid). It may be possible to expand to a similar set of recursive loops, but I'm not sure.