Why does Iterator::next return an Option instead of just an item?

It seems to me that whether an option is the right type to return should be up to the implementor.
I notice that it goes away when I try to filter or use other collection methods on the items. Is this simply a replacement for has_next? Won't it have potential performance/memory implications?

Because it needs some way to communicate to the caller that there's nothing left to output.
fn main() {
    let mut it = vec![1, 2, 3].into_iter();
    assert_eq!(it.next(), Some(1));
    assert_eq!(it.next(), Some(2));
    assert_eq!(it.next(), Some(3));
    assert_eq!(it.next(), None); // End of iterator.
}
As for a hypothetical has_next, that can complicate some iterator designs because it requires the iterator to know whether there is another element. This might require the iterator to compute the next element, then store it somewhere. It's also possible to forget to call has_next, or call it but ignore the result.
With next returning an Option, none of this is an issue; an iterator can compute the next item and return it whilst making it impossible for a caller to forget to ensure the returned value actually has something in it.
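As a minimal sketch of that point, here is a hand-written iterator. Countdown is a hypothetical type made up for illustration; it computes each item inside next and signals exhaustion with None, so it never has to look ahead or cache anything.

```rust
// Hypothetical iterator, for illustration only: counts down to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    // Computing the item and reporting "no more items" happen in one call.
    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            let current = self.remaining;
            self.remaining -= 1;
            Some(current)
        }
    }
}

fn main() {
    let mut it = Countdown { remaining: 2 };
    assert_eq!(it.next(), Some(2));
    assert_eq!(it.next(), Some(1));
    assert_eq!(it.next(), None);

    // Adapters and `for` loops stop on their own when they see None.
    let collected: Vec<u32> = Countdown { remaining: 3 }.collect();
    assert_eq!(collected, vec![3, 2, 1]);
}
```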
One thing this does not let you do is "peek" at the iterator to see if there's something more and then change logic based on that answer, without actually consuming the next item. However, that's what the peekable combinator is for, which gives you what amounts to a traditional has_next: peek().is_some().
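A short sketch of peekable used as a stand-in for has_next. The helper join_with_commas is a made-up name for illustration; it needs to know whether another item follows so it can decide whether to emit a separator.

```rust
// Hypothetical helper: joins numbers with ", " using peek() as "has_next".
fn join_with_commas(items: Vec<i32>) -> String {
    let mut out = String::new();
    let mut it = items.into_iter().peekable();
    while let Some(item) = it.next() {
        out.push_str(&item.to_string());
        // peek() looks at the next item without consuming it.
        if it.peek().is_some() {
            out.push_str(", ");
        }
    }
    out
}

fn main() {
    assert_eq!(join_with_commas(vec![1, 2, 3]), "1, 2, 3");
    assert_eq!(join_with_commas(vec![]), "");
}
```

Note that Peekable has to store the peeked item internally, which is exactly the caching cost that a plain iterator avoids.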
On your concerns about performance: I've never seen anything to suggest there is any penalty. Anything using an iterator correctly has to check to see if it's reached the end. As for space, a Rust iterator doesn't need to cache the next item, so they're likely to be the same size or smaller than an iterator for a language that uses has_next.
Finally, as noted in comments, Option is not heap allocated. A None is equivalent to a false followed by some uninitialised space (since there's nothing in it), and a Some(v) is equivalent to a true followed by v.
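The layout can be checked with std::mem::size_of. This is only a sketch: exact sizes depend on the type and target, so the assertions below stick to what can be relied on. A u32 has no spare bit pattern, so Option<u32> needs room for the tag, while a reference can never be null, so Option<&u32> reuses the null pattern for None and costs nothing extra.

```rust
use std::mem::size_of;

fn main() {
    // Every bit pattern of u32 is a valid value, so the tag needs extra space.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());

    // References are never null, so None is stored as the null pattern.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());

    println!("Option<u32>:  {} bytes", size_of::<Option<u32>>());
    println!("Option<&u32>: {} bytes", size_of::<Option<&u32>>());
}
```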

Related

Invalid Iterator Fix

So, looking for advice on how to fix a situation or maybe a better way to program it.
I'm using iteration to build a complicated string from key:value pairs in an unordered_map. To make this work, I'm iterating through the map to find specific items, then sending a search term to an outside function to create the string. The outside function uses its own iterator to search the same unordered_map for the passed search term, then creates the string, then erases the entries that it referenced. The problem, I believe, is that although the outside function's iterator is still valid because it called the erase function, the iterators in the main function are now invalidated and throwing an out of range error. Is there a way to reset the iterators or send them to the next valid key:value pair when they become invalidated in order to avoid the error?
The code is a mess (mostly because I'm still discovering C++) and it might be possible to use recursion to accomplish this, but I wasn't able to get recursion to work correctly.
I can post the code, but without understanding the inputs and required outputs, it's likely not going to help explain anything, so for now, I'll just leave the question as-is: is there a way to "re-validate" invalidated iterators?
I was able to resolve the issue by redefining each of the iterators once the scope of control returned back to them. For the last iterator (in the outside function) that deleted individual key:value pairs from the unordered_map, I used:
if (it != map.end()) it = map.erase(it);
This forces the iterator to move to the next valid key:value pair after the erasure.
That worked for the end of the line, but didn't work once control was returned to each of the previous iterators. In those cases, the iterators were invalidated when the outside function erased a key:value pair. So as control returned to an iterator, I included the following line before it looped back for increment:
if (it != map.end()) it = map.begin();
It seems to have resolved all of the issues, though I'm sure there's a better way to handle it.

Alternative to the try (?) operator suited to iterator mapping

In the process of learning Rust, I am getting acquainted with error propagation and the choice between unwrap and the ? operator. After writing some prototype code that only uses unwrap(), I would like to remove unwrap from reusable parts, where panicking on every error is inappropriate.
How would one avoid the use of unwrap in a closure, like in this example?
// todo is VecDeque<PathBuf>
let dir = fs::read_dir(&filename).unwrap();
todo.extend(dir.map(|dirent| dirent.unwrap().path()));
The first unwrap can be easily changed to ?, as long as the containing function returns Result<(), io::Error> or similar. However, the second unwrap, the one in dirent.unwrap().path(), cannot be changed to dirent?.path() because the closure must return a PathBuf, not a Result<PathBuf, io::Error>.
One option is to change extend to an explicit loop:
let dir = fs::read_dir(&filename)?;
for dirent in dir {
    todo.push_back(dirent?.path());
}
But that feels wrong - the original extend was elegant and clearly reflected the intention of the code. (It might also have been more efficient than a sequence of push_backs.) How would an experienced Rust developer express error checking in such code?
How would one avoid the use of unwrap in a closure, like in this example?
Well, it really depends on what you wish to do upon failure.
Should failure be reported to the user, or be silent?
If reported, should one failure be reported, or all of them?
If a failure occurs, should it interrupt processing?
For example, you could perfectly well decide to silently ignore all failures and just skip the entries that fail. In this case, Iterator::filter_map combined with Result::ok is exactly what you are asking for.
let dir = fs::read_dir(&filename)?;
todo.extend(dir.filter_map(Result::ok).map(|dirent| dirent.path()));
The Iterator interface is full of goodies, it's definitely worth perusing when looking for tidier code.
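For the other end of the spectrum, where the first failure should interrupt processing, one possible sketch is to collect the items into a Result before extending. The helper extend_or_fail and the sample paths are assumptions made for illustration; the input stands in for something like fs::read_dir(&filename)?.map(|res| res.map(|dirent| dirent.path())).

```rust
use std::collections::VecDeque;
use std::io;
use std::path::PathBuf;

// Hypothetical helper: extends `todo` only if every entry is Ok.
fn extend_or_fail<I>(todo: &mut VecDeque<PathBuf>, entries: I) -> io::Result<()>
where
    I: Iterator<Item = io::Result<PathBuf>>,
{
    // Collecting into a Result stops at the first Err and returns it.
    let paths = entries.collect::<io::Result<Vec<PathBuf>>>()?;
    todo.extend(paths);
    Ok(())
}

fn main() {
    let mut todo: VecDeque<PathBuf> = VecDeque::new();

    let good: Vec<io::Result<PathBuf>> =
        vec![Ok(PathBuf::from("a")), Ok(PathBuf::from("b"))];
    assert!(extend_or_fail(&mut todo, good.into_iter()).is_ok());
    assert_eq!(todo.len(), 2);

    let bad: Vec<io::Result<PathBuf>> = vec![
        Ok(PathBuf::from("c")),
        Err(io::Error::new(io::ErrorKind::Other, "unreadable entry")),
    ];
    // The error is propagated and nothing from the failed batch is added.
    assert!(extend_or_fail(&mut todo, bad.into_iter()).is_err());
    assert_eq!(todo.len(), 2);
}
```

The trade-off is an intermediate Vec: extend is kept, but nothing is pushed until the whole directory listing has been read successfully.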
Here is a solution based on filter_map suggested by Matthieu. It calls Result::map_err to ensure the error is "caught" and logged, sending it further to Result::ok and filter_map to remove it from iteration:
fn log_error(e: io::Error) {
    eprintln!("{}", e);
}
// The explicit return type lets `?` be used inside the closure.
(|| -> io::Result<()> {
    let dir = fs::read_dir(&filename)?;
    todo.extend(dir
        .filter_map(|res| res.map_err(log_error).ok())
        .map(|dirent| dirent.path()));
    Ok(())
})().unwrap_or_else(log_error)

How to mutate another item in a vector, but not the vector itself, while iterating over the vector?

It is quite clear to me that iterating over a vector shouldn't let the loop body mutate the vector arbitrarily. This prevents iterator invalidation, which is prone to bugs.
However, not all kinds of mutation lead to iterator invalidation. See the following example:
let mut my_vec: Vec<Vec<i32>> = vec![vec![1,2], vec![3,4], vec![5,6]];
for inner in my_vec.iter_mut() { // <- or .iter()
    // ...
    my_vec[some_index].push(inner[0]); // <-- ERROR
}
Such a mutation does not invalidate the iterator of my_vec, however it is disallowed. It could invalidate any references to the specific elements in my_vec[some_index] but we do not use any such references anyway.
I know that these questions are common, and I'm not asking for an explanation. I am looking for a way to refactor this so that I can get rid of this loop. In my actual code I have a huge loop body and I can't modularize it unless I express this bit nicely.
What I have thought of so far:
Wrapping the vector with Rc<RefCell<...>>. I think this would still fail at runtime, since the RefCell would be borrowed by the iterator and then will fail when the loop body tries to borrow it.
Using a temporary vector to accumulate the future pushes, and push them after the loop ends. This is okay, but needs more allocations than pushing them on the fly.
Unsafe code, and messing with pointers.
Anything listed in the Iterator documentation does not help. I checked out itertools and it looks like it wouldn't help either.
Using a while loop and indexing into the outer vector instead of iterating over it by reference. This is okay, but does not let me use iterators and adapters. I just want to get rid of this outer loop and use my_vec.foreach(...).
Are there any idioms or libraries which would let me do this nicely? Unsafe functions would be okay as long as they don't expose pointers to me.
You can wrap each of the inner vectors in a RefCell.
use std::cell::RefCell;
fn main() {
    let my_vec: Vec<RefCell<Vec<i32>>> = vec![
        RefCell::new(vec![1,2]),
        RefCell::new(vec![3,4]),
        RefCell::new(vec![5,6])];
    for inner in my_vec.iter() {
        // ...
        let value = inner.borrow()[0];
        my_vec[some_index].borrow_mut().push(value);
    }
}
Note that the value binding here is important if you need to be able to push to the vector that inner refers to. value happens to be a type that doesn't contain references (it's i32), so it doesn't keep the first borrow active (it ends by the end of the statement). Then, the next statement may borrow the same vector or another vector mutably and it'll work.
If we wrote my_vec[some_index].borrow_mut().push(inner.borrow()[0]); instead, then both borrows would be active until the end of the statement. If both my_vec[some_index] and inner refer to the same RefCell<Vec<i32>>, this will panic with RefCell<T> already mutably borrowed.
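To make the borrow timing concrete, here is a runnable sketch of the same idea. The function name push_firsts_onto and the choice of index 0 are assumptions for illustration. It also uses try_borrow_mut, which returns an Err instead of panicking, to show that a mutable borrow is refused while a shared borrow is still alive.

```rust
use std::cell::RefCell;

// Hypothetical helper: pushes the first element of every inner vector
// onto the inner vector at `some_index`.
fn push_firsts_onto(my_vec: &[RefCell<Vec<i32>>], some_index: usize) {
    for inner in my_vec.iter() {
        // The shared borrow ends at the end of this statement.
        let value = inner.borrow()[0];
        // So this mutable borrow is fine even when it is the same cell.
        my_vec[some_index].borrow_mut().push(value);
    }
}

fn main() {
    let my_vec = vec![
        RefCell::new(vec![1, 2]),
        RefCell::new(vec![3, 4]),
        RefCell::new(vec![5, 6]),
    ];
    push_firsts_onto(&my_vec, 0);
    assert_eq!(*my_vec[0].borrow(), vec![1, 2, 1, 3, 5]);

    // While a shared borrow is held, a mutable borrow is rejected.
    let shared = my_vec[1].borrow();
    assert!(my_vec[1].try_borrow_mut().is_err());
    drop(shared);
    assert!(my_vec[1].try_borrow_mut().is_ok());
}
```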
Without changing the type of my_vec, you could simply use access by indexing and split_at_mut:
for index in 0..my_vec.len() {
    let (first, second) = my_vec.split_at_mut(index);
    // second[0] is my_vec[index]; some_index must be less than index here.
    first[some_index].push(second[0][0]);
}
Note: beware, the indices in second are off by index.
This is safe, relatively easy, and very flexible. It does not, however, work with iterator adaptors.
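A self-contained sketch of the split_at_mut approach, under the assumption that the target vector sits before the ones being read. The helper push_later_firsts is a made-up name. It starts the loop just past some_index so that the target always falls in the first half of the split.

```rust
// Hypothetical helper: pushes the first element of every inner vector
// after `some_index` onto the inner vector at `some_index`.
fn push_later_firsts(my_vec: &mut [Vec<i32>], some_index: usize) {
    for index in (some_index + 1)..my_vec.len() {
        // `first` covers [0, index) and `second` covers [index, len),
        // so the two halves can be borrowed mutably at the same time.
        let (first, second) = my_vec.split_at_mut(index);
        // second[0] is the element that was at my_vec[index].
        first[some_index].push(second[0][0]);
    }
}

fn main() {
    let mut my_vec = vec![vec![1, 2], vec![3, 4], vec![5, 6]];
    push_later_firsts(&mut my_vec, 0);
    assert_eq!(my_vec, vec![vec![1, 2, 3, 5], vec![3, 4], vec![5, 6]]);
}
```

If the target can sit on either side of the current element, the split point and the index arithmetic need to be adjusted for each case.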

Why use a single incrementer class

The code below is found in WebKit:
    RefPtr<Element> element = pendingScript.releaseElementAndClear();
    if (ScriptElement* scriptElement = toScriptElement(element.get())) {
        NestingLevelIncrementer nestingLevelIncrementer(m_scriptNestingLevel);
        IgnoreDestructiveWriteCountIncrementer ignoreDestructiveWriteCountIncrementer(m_document);
        // Do something else...
    }
}
NestingLevelIncrementer is a simple class which increments the counter on construction and decrements it on destruction. You can check the implementation here.
In this snippet, I think this is equivalent to incrementing and decrementing the counter directly. Perhaps the only benefit is that there is no need to remember to decrement it afterwards, but it comes at the cost of introducing a new class.
Any other reason to use this pattern?
The intent is for the increment to be reversed no matter how the something else concludes; the stack variable will be destroyed when the method returns or an exception is thrown.
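The same scope-guard idea can be sketched in Rust, the language used for the other examples on this page, where Drop plays the role of the C++ destructor. NestingLevelGuard and run_script are hypothetical names, not WebKit code. The point is that the early return takes the same cleanup path as the normal one.

```rust
use std::cell::Cell;

// Hypothetical guard: increments on creation, decrements when dropped.
struct NestingLevelGuard<'a> {
    level: &'a Cell<u32>,
}

impl<'a> NestingLevelGuard<'a> {
    fn new(level: &'a Cell<u32>) -> Self {
        level.set(level.get() + 1);
        NestingLevelGuard { level }
    }
}

impl Drop for NestingLevelGuard<'_> {
    fn drop(&mut self) {
        self.level.set(self.level.get() - 1);
    }
}

// Returns the nesting level seen while "running", or an error.
fn run_script(level: &Cell<u32>, fail: bool) -> Result<u32, String> {
    let _guard = NestingLevelGuard::new(level);
    if fail {
        // Early exit: the guard is still dropped, so the count is restored.
        return Err("script failed".to_string());
    }
    Ok(level.get())
}

fn main() {
    let level = Cell::new(0);
    assert_eq!(run_script(&level, false), Ok(1));
    assert_eq!(level.get(), 0);
    assert!(run_script(&level, true).is_err());
    assert_eq!(level.get(), 0);
}
```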
An alternative approach in other languages would use try...finally; see this for more discussion on RAII in C++ vs. finally:
Does C++ support 'finally' blocks? (And what's this 'RAII' I keep hearing about?)

Optional blocks in the middle of a method: safe?

I have a method looking something like:
- (void)doSomething:(void(^)(MyItem *item))optionalBlock {
    // 1. Do the regular stuff with the item
    item.regularStuff = @"something_regular";
    // 2. Run the optional block
    // which may or may not make some extra modifications to the item
    if (optionalBlock) optionalBlock(item);
    // 3. Save the modified item into the Core Data
    // etc
}
Which I’m planning to call either like this
[self doSomething:nil];
Or:
[self doSomething:^(MyItem *item) {
    // Make some extra modifications to the item before it’s saved
    item.custom = @"custom";
}];
Is it safe to assume that at step three I’ll always get the item already modified by both the method and (potentially) the optional block, or do I need to implement some way to find out exactly when the block has finished executing so I can continue from there?
It's safe. You don't need any special check.
Yes and no.
Yes, it's safe, in that if the block contains only sequential code for modifying your item, all those modifications will be complete by the time control returns to your doSomething method.
However, if you allow the caller of your method to pass in an arbitrary block, there's no telling what it might do and when. It could set up timers, spawn threads, use dispatch_async, or do any number of other things which could cause it to in some sense not be truly "done" by the time it returns. You're handing over the car keys here -- there's nothing preventing the caller from taking a joyride.
Really, this is the sort of thing that's beyond the scope of the language and more about what sort of contract you define in the documentation of your API: if you want your callers to only modify the object during execution of that block, just tell them that's what you expect them to do and to not expect your API to work otherwise.