Conflicting lifetime requirement for iterator returned from function

This may be a duplicate. I don't know. I couldn't understand the other answers well enough to know that. :)
Rust version: rustc 1.0.0-nightly (b47aebe3f 2015-02-26) (built 2015-02-27)
Basically, I'm passing a bool to this function that's supposed to build an iterator that filters one way for true and another way for false. Then it kind of craps itself because it doesn't know how to keep that boolean value handy, I guess. I don't know. There are actually multiple lifetime problems here, which is discouraging because this is a really common pattern for me, since I come from a .NET background.
fn main() {
    for n in values(true) {
        println!("{}", n);
    }
}
fn values(even: bool) -> Box<Iterator<Item=usize>> {
    Box::new([3usize, 4, 2, 1].iter()
        .map(|n| n * 2)
        .filter(|n| if even {
            n % 2 == 0
        } else {
            true
        }))
}
Is there a way to make this work?

You have two conflicting issues, so let's break down a few representative pieces:
[3usize, 4, 2, 1].iter()
.map(|n| n * 2)
.filter(|n| n % 2 == 0)
Here, we create an array in the stack frame of the method, then get an iterator to it. Since we aren't allowed to consume the array, the iterator item is &usize. We then map from the &usize to a usize. Then we filter against a &usize - we aren't allowed to consume the filtered item, otherwise the iterator wouldn't have it to return!
The problem here is that we are ultimately rooted to the stack frame of the function. We can't return this iterator, because the array won't exist after the call returns!
To work around this for now, let's just make it static. Now we can focus on the issue with even.
filter takes a closure. Closures capture any variable used that isn't provided as an argument to the closure. By default, these variables are captured by reference. However, even is again a variable located on the stack frame. This time however, we can give it to the closure by using the move keyword. Here's everything put together:
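fn main() {
    for n in values(true) {
        println!("{}", n);
    }
}
static ITEMS: [usize; 4] = [3, 4, 2, 1];
// Current Rust spells the trait object as `dyn Iterator`.
fn values(even: bool) -> Box<dyn Iterator<Item = usize>> {
    Box::new(ITEMS.iter()
        .map(|n| n * 2)
        // `move` copies `even` into the closure instead of borrowing it.
        .filter(move |n| if even {
            n % 2 == 0
        } else {
            true
        }))
}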
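If a static isn't an option, another way around the stack-frame problem is to let the iterator own its data. This is a minimal sketch, assuming current Rust (where trait objects are written with `dyn`); `values_owned` is a hypothetical name, not part of the original code:

```rust
// Hypothetical variant of `values`: the Vec is moved into the iterator,
// so nothing borrows from the function's stack frame.
fn values_owned(even: bool) -> Box<dyn Iterator<Item = usize>> {
    Box::new(
        vec![3usize, 4, 2, 1]
            .into_iter() // yields usize by value and owns the Vec
            .map(|n| n * 2)
            // `move` copies `even` into the closure.
            .filter(move |n| !even || n % 2 == 0),
    )
}

fn main() {
    for n in values_owned(true) {
        println!("{}", n);
    }
}
```

The trade-off is a heap allocation for the Vec on every call, which the static version avoids.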

Related

Why does .flat_map() with .chars() not work with std::io::Lines, but does with a vector of Strings?

I am trying to iterate over characters in stdin. The Read.chars() method achieves this goal, but is unstable. The obvious alternative is to use Read.lines() with a flat_map to convert it to a character iterator.
This seems like it should work, but doesn't, resulting in borrowed value does not live long enough errors.
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let mut lines = stdin.lock().lines();
let mut chars = lines.flat_map(|x| x.unwrap().chars());
}
This is mentioned in Read file character-by-character in Rust, but it doesn't really explain why.
What I am particularly confused about is how this differs from the example in the documentation for flat_map, which uses flat_map to apply .chars() to a vector of strings. I don't really see how that should be any different. The main difference I see is that my code needs to call unwrap() as well, but changing the last line to the following does not work either:
let mut chars = lines.map(|x| x.unwrap());
let mut chars = chars.flat_map(|x| x.chars());
It fails on the second line, so the issue doesn't appear to be the unwrap.
Why does this last line not work, when the very similar line in the documentation does? Is there any way to get this to work?
Start by figuring out what the type of the closure's variable is:
let mut chars = lines.flat_map(|x| {
let () = x;
x.unwrap().chars()
});
This shows it's a Result<String, io::Error>. After unwrapping it, it will be a String.
Next, look at str::chars:
fn chars(&self) -> Chars
And the definition of Chars:
pub struct Chars<'a> {
// some fields omitted
}
From that, we can tell that calling chars on a string returns an iterator that has a reference to the string.
Whenever we have a reference, we know that the reference cannot outlive the thing that it is borrowed from. In this case, x.unwrap() is the owner. The next thing to check is where that ownership ends. In this case, the closure owns the String, so at the end of the closure, the value is dropped and any references are invalidated.
Except the code tried to return a Chars that still referred to the string. Oops. Thanks to Rust, the code didn't segfault!
The difference with the example that works is all in the ownership. In that case, the strings are owned by a vector outside of the loop and they do not get dropped before the iterator is consumed. Thus there are no lifetime issues.
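For comparison, here is a small sketch of the case that does work, where the strings are owned by a vector that outlives the iteration. The helper name `all_chars` is made up for illustration:

```rust
// Hypothetical helper: joins the characters of every string in `words`.
fn all_chars(words: &[String]) -> String {
    // `words.iter()` yields `&String`, so each `Chars` borrows from `words`,
    // which stays alive for the whole iteration.
    words.iter().flat_map(|s| s.chars()).collect()
}

fn main() {
    let words = vec![String::from("alpha"), String::from("beta")];
    println!("{}", all_chars(&words));
}
```

Nothing is dropped inside the closure here, because the closure only ever receives a reference to a string owned elsewhere.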
What this code really wants is an into_chars method on String. That iterator could take ownership of the value and return characters.
Not the maximum efficiency, but a good start:
struct IntoChars {
    s: String,
    offset: usize,
}
impl IntoChars {
    fn new(s: String) -> Self {
        IntoChars { s: s, offset: 0 }
    }
}
impl Iterator for IntoChars {
    type Item = char;
    fn next(&mut self) -> Option<Self::Item> {
        let remaining = &self.s[self.offset..];
        match remaining.chars().next() {
            Some(c) => {
                self.offset += c.len_utf8();
                Some(c)
            }
            None => None,
        }
    }
}
use std::io::BufRead;
fn main() {
    let stdin = std::io::stdin();
    let lines = stdin.lock().lines();
    let chars = lines.flat_map(|x| IntoChars::new(x.unwrap()));
    for c in chars {
        println!("{}", c);
    }
}
See also:
How can I store a Chars iterator in the same struct as the String it is iterating on?
Is there an owned version of String::chars?
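If a custom iterator feels like too much, a simpler (and less efficient) option is to collect each line's characters into a Vec<char>, which owns its contents. This is only a sketch: `chars_of` is a hypothetical helper, and a `Cursor` stands in for stdin so the result is deterministic.

```rust
use std::io::{BufRead, Cursor};

// Hypothetical helper: returns every character read from `reader`, line by line.
fn chars_of<R: BufRead>(reader: R) -> Vec<char> {
    reader
        .lines()
        // Collecting into a Vec<char> hands flat_map an owned value, so nothing
        // borrows from the String that is dropped at the end of the closure.
        .flat_map(|line| line.unwrap().chars().collect::<Vec<char>>())
        .collect()
}

fn main() {
    // A Cursor over a string literal plays the role of stdin here.
    let input = Cursor::new("ab\ncd\n");
    println!("{:?}", chars_of(input));
}
```

This allocates one Vec per line, so the IntoChars approach is likely the better fit for large inputs.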

Generate a tree of structs with testing/quick, respecting invariants

I have a tree of structs which I'd like to test using testing/quick, but constraining it to within my invariants.
This example code works:
var rnd = rand.New(rand.NewSource(time.Now().UnixNano()))
type X struct {
HasChildren bool
Children []*X
}
func TestSomething(t *testing.T) {
x, _ := quick.Value(reflect.TypeOf(X{}), rnd)
_ = x
// test some stuff here
}
But we hold HasChildren = true whenever len(Children) > 0 as an invariant, so it'd be better to ensure that whatever quick.Value() generates respects that (rather than finding "bugs" that don't actually exist).
I figured I could define a Generate function which uses quick.Value() to populate all the variable members:
func (X) Generate(rand *rand.Rand, size int) reflect.Value {
x := X{}
throwaway, _ := quick.Value(reflect.TypeOf([]*X{}), rand)
x.Children = throwaway.Interface().([]*X)
if len(x.Children) > 0 {
x.HasChildren = true
} else {
x.HasChildren = false
}
return reflect.ValueOf(x)
}
But this is panicking:
panic: value method main.X.Generate called using nil *X pointer [recovered]
And when I change Children from []*X to []X, it dies with a stack overflow.
The documentation is very thin on examples, and I'm finding almost nothing in web searches either.
How can this be done?
Looking at the testing/quick source code, it seems that you can't create recursive custom generators and at the same time reuse the quick library facilities to generate the array part of the struct, because the size parameter, which is designed to limit the number of recursive calls, cannot be passed back into quick.Value(...).
https://golang.org/src/testing/quick/quick.go (see around line 50)
In your case this leads to an infinite tree that quickly "explodes" with 1..50 leaves at each level (that's the reason for the stack overflow).
If the function quick.sizedValue() had been public we could have used it to accomplish your task, but unfortunately this is not the case.
BTW since HasChildren is an invariant, can't you simply make it a struct method?
type X struct {
Children []*X
}
func (me *X) HasChildren() bool {
return len(me.Children) > 0
}
func main() {
.... generate X ....
if x.HasChildren() {
.....
}
}

What's the most efficient way to reuse an iterator in Rust?

I'd like to reuse an iterator I made, so as to avoid paying to recreate it from scratch. But iterators don't seem to be cloneable and collect moves the iterator so I can't reuse it.
Here's more or less the equivalent of what I'm trying to do.
let my_iter = my_string.unwrap_or("A").chars().flat_map(|c|c.to_uppercase()).map(|c| Tag::from(c).unwrap() );
let my_struct = {
one: my_iter.collect(),
two: my_iter.map(|c|{(c,Vec::new())}).collect(),
three: my_iter.filter_map(|c|if c.predicate(){Some(c)}else{None}).collect(),
four: my_iter.map(|c|{(c,1.0/my_float)}).collect(),
five: my_iter.map(|c|(c,arg_time.unwrap_or(time::now()))).collect(),
//etc...
}
You should profile before you optimize something, otherwise you might end up making things both slower and more complex than they need to be.
The iterators in your example
let my_iter = my_string.unwrap_or("A").chars().flat_map(|c|c.to_uppercase()).map(|c| Tag::from(c).unwrap() );
are thin structures allocated on the stack. Cloning them isn't going to be much cheaper than building them from scratch.
Constructing an iterator with .chars().flat_map(|c| c.to_uppercase()) takes only a single nanosecond when I benchmark it.
According to the same benchmark, wrapping iterator creation in a closure takes more time than simply building the iterator in-place.
Cloning a Vec iterator is not much faster than building it in-place, both are practically instant.
test construction_only ... bench: 1 ns/iter (+/- 0)
test inplace_construction ... bench: 249 ns/iter (+/- 20)
test closure ... bench: 282 ns/iter (+/- 18)
test vec_inplace_iter ... bench: 0 ns/iter (+/- 0)
test vec_clone_iter ... bench: 0 ns/iter (+/- 0)
Iterators in general are Clone-able if all their "pieces" are Clone-able. You have a couple of them in my_iter that are not: the anonymous closures (like the one in flat_map) and the ToUppercase struct returned by to_uppercase.
What you can do is:
rebuild the whole thing (as @ArtemGr suggests). You could use a macro to avoid repetition. A bit ugly but should work.
collect my_iter into a Vec before populating my_struct (since you seem to collect it anyway in there): let my_iter: Vec<char> = my_string.unwrap_or("A").chars().flat_map(|c|c.to_uppercase()).map(|c| Tag::from(c).unwrap() ).collect();
create your own custom iterator. Without your definitions of my_string (since you call unwrap_or on it I assume it's not a String) and Tag it's hard to help you more concretely with this.
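The second option (collect once, then re-iterate the Vec) might look roughly like this. Since the question's `Tag` and `my_string` definitions are unknown, this sketch substitutes plain `char` and an `Option<&str>`; `MyStruct` and `build` are hypothetical stand-ins as well.

```rust
// Hypothetical stand-in for the question's struct; plain `char` replaces `Tag`.
#[derive(Debug)]
struct MyStruct {
    one: Vec<char>,
    two: Vec<(char, Vec<u8>)>,
    three: Vec<char>,
}

fn build(input: Option<&str>) -> MyStruct {
    // Run the adapter chain exactly once and keep the result.
    let tags: Vec<char> = input
        .unwrap_or("A")
        .chars()
        .flat_map(|c| c.to_uppercase())
        .collect();

    MyStruct {
        // Each field re-iterates the Vec; `cloned()` turns `&char` into `char`.
        one: tags.clone(),
        two: tags.iter().cloned().map(|c| (c, Vec::new())).collect(),
        three: tags.iter().cloned().filter(|c| c.is_alphabetic()).collect(),
    }
}

fn main() {
    println!("{:?}", build(Some("ab1")));
}
```

Whether this beats rebuilding the iterator each time depends on the input size, so it is still worth benchmarking.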
You may use a closure to get identical iterators:
#[derive(Debug)]
struct MyStruct {
    one: Vec<char>,
    two: Vec<char>,
    three: String,
}
fn main() {
    let my_string: String = "ABCD1234absd".into();
    let my_iter = || my_string.chars();
    let my_struct = MyStruct {
        one: my_iter().collect(),
        two: my_iter().filter(|x| x.is_numeric()).collect(),
        three: my_iter().filter(|x| x.is_lowercase()).collect(),
    };
    println!("{:?}", my_struct);
}
See also the question "Correct way to return an Iterator?".
You may also clone the iterator (see @Paolo Falabella's answer about when iterators are cloneable):
fn main() {
    let v = vec![1, 2, 3, 4, 5, 6, 7, 8, 9];
    let i = v.iter().skip(2);
    let j = i.clone();
    println!("{:?}", i.take(3).collect::<Vec<_>>());
    println!("{:?}", j.filter(|&x| x % 2 == 0).collect::<Vec<_>>());
}
Unfortunately, I can't tell which way is more efficient.
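On newer compilers, cloning also works for adapter chains that contain closures. As far as I know, closures implement Clone when everything they capture does (since Rust 1.26), and ToUppercase is Clone as well. A sketch under that assumption, with a hypothetical helper `split`:

```rust
// Hypothetical helper returning two views built from the same adapter chain.
fn split(input: &str) -> (String, Vec<char>) {
    let upper = input.chars().flat_map(|c| c.to_uppercase());
    // `clone()` copies the small iterator state, not the underlying string.
    let all: String = upper.clone().collect();
    let letters: Vec<char> = upper.filter(|c| c.is_alphabetic()).collect();
    (all, letters)
}

fn main() {
    println!("{:?}", split("ab1"));
}
```

If this fails to compile on your toolchain, the closure-returning-an-iterator approach above remains a safe fallback.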

Iterate over copy types

It is clear that iterators pass around references to avoid moving objects into the iterator or its closure argument, but what about Copy types? Let me show you a small snippet:
fn is_odd(x: &&i32) -> bool { *x & 1 == 1 }
// [1] fn is_odd(x: &i32) -> bool { x & 1 == 1 }
// [2] fn is_odd(x: i32) -> bool { x & 1 == 1 }
fn main() {
    let xs = &[ 10, 20, 13, 14 ];
    for x in xs.iter().filter(is_odd) {
        assert_eq!(13, *x);
    }
    // [1] ...is slightly better, but not ideal
    // for x in xs.iter().cloned().filter(is_odd) {
    //     assert_eq!(13, x);
    // }
}
Am I right that .cloned() is preferred when we iterate over something like &[i32] or &[u8], where extra indirection is involved instead of just copying the tiny data unit?
But it looks like I can not avoid references passed into is_odd function.
Is there a way to make [2] function from above snippet work for higher-level functions like filter?
Assume that I understand that moving non-Copy type into predicate function is silly. But not all types use move semantics by default, right?
It is clear that iterators pass around references
This blanket statement is not true; iterators are more than capable of yielding a non-reference. filter will provide a reference to the closure because it doesn't want to give ownership of the item to the closure. In your example, your iterated value is a &i32, and then filter provides a &&i32.
Is there a way to make [2] function from above snippet work for higher-level functions like filter?
Certainly, just provide a closure that does the dereferencing:
fn is_odd(x: i32) -> bool { x & 1 == 1 }
fn main() {
    let xs = &[ 10, 20, 13, 14 ];
    for x in xs.iter().filter(|&&x| is_odd(x)) {
        assert_eq!(13, *x);
    }
}
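If you would rather iterate over values than references, a variation is to copy the items first. Note that filter still hands its closure a reference, so one level of pattern matching remains. A minimal sketch; `odds` is a hypothetical helper, and `copied()` assumes a reasonably recent standard library (`cloned()` works the same way for Copy types):

```rust
fn is_odd(x: i32) -> bool {
    x & 1 == 1
}

// Hypothetical helper collecting the odd values by value.
fn odds(xs: &[i32]) -> Vec<i32> {
    xs.iter()
        .copied() // the iterator now yields i32 instead of &i32
        .filter(|&x| is_odd(x)) // filter still passes a &i32 to the closure
        .collect()
}

fn main() {
    let xs = [10, 20, 13, 14];
    println!("{:?}", odds(&xs));
}
```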

Counter as variable in for-in-loops

When normally using a for-in-loop, the counter (in this case number) is a constant in each iteration:
for number in 1...10 {
// do something
}
This means I cannot change number in the loop:
for number in 1...10 {
if number == 5 {
++number
}
}
// doesn't compile, since the prefix operator '++' can't be performed on the constant 'number'
Is there a way to declare number as a variable, without declaring it before the loop, or using a normal for-loop (with initialization, condition and increment)?
Understanding why i can’t be mutable involves knowing what for…in is shorthand for. for i in 0..<10 is expanded by the compiler to the following:
var g = (0..<10).generate()
while let i = g.next() {
// use i
}
Every time around the loop, i is a freshly declared variable, the value of unwrapping the next result from calling next on the generator.
Now, that while can be written like this:
while var i = g.next() {
// here you _can_ increment i:
if i == 5 { ++i }
}
but of course, it wouldn’t help – g.next() is still going to generate a 5 next time around the loop. The increment in the body was pointless.
Presumably for this reason, for…in doesn’t support the same var syntax for declaring its loop counter; it would be very confusing if you didn’t realize how it worked.
(Unlike with while, where you can see what is going on, so the var functionality is occasionally useful, similarly to how func f(var i) can be.)
If what you want is to skip certain iterations of the loop, your best bet (without resorting to C-style for or while) is to use a generator that skips the relevant values:
// iterate over every other integer
for i in 0.stride(to: 10, by: 2) { print(i) }
// skip a specific number
for i in (0..<10).filter({ $0 != 5 }) { print(i) }
let a = ["one","two","three","four"]
// ok so this one’s a bit convoluted...
let everyOther = a.enumerate().filter { $0.0 % 2 == 0 }.map { $0.1 }.lazy
for s in everyOther {
print(s)
}
The answer is "no", and that's a good thing. Otherwise, a grossly confusing behavior like this would be possible:
for number in 1...10 {
if number == 5 {
// This does not work
number = 5000
}
println(number)
}
Imagine the confusion of someone looking at the number 5000 in the output of a loop that is supposedly bound to a range of 1 through 10, inclusive.
Moreover, what would Swift pick as the next value after 5000? Should it stop? Should it continue to the next number in the range before the assignment? Should it throw an exception on out-of-range assignment? All three choices have some validity to them, so there is no clear winner.
To avoid situations like that, Swift designers made loop variables in range loops immutable.
Update Swift 5
for var i in 0...10 {
print(i)
i+=1
}