How to mutate another item in a vector, but not the vector itself, while iterating over the vector? - iterator

It is quite clear to me that iterating over a vector shouldn't let the loop body mutate the vector arbitrarily. This prevents iterator invalidation, which is prone to bugs.
However, not all kinds of mutation lead to iterator invalidation. See the following example:
let mut my_vec: Vec<Vec<i32>> = vec![vec![1,2], vec![3,4], vec![5,6]];
for inner in my_vec.iter_mut() { // <- or .iter()
// ...
my_vec[some_index].push(inner[0]); // <-- ERROR
}
Such a mutation does not invalidate the iterator of my_vec, however it is disallowed. It could invalidate any references to the specific elements in my_vec[some_index] but we do not use any such references anyway.
I know that these questions are common, and I'm not asking for an explanation. I am looking for a way to refactor this so that I can get rid of this loop. In my actual code I have a huge loop body and I can't modularize it unless I express this bit nicely.
What I have thought of so far:
Wrapping the vector with Rc<RefCell<...>>. I think this would still fail at runtime, since the RefCell would be borrowed by the iterator and then will fail when the loop body tries to borrow it.
Using a temporary vector to accumulate the future pushes, and push them after the loop ends. This is okay, but needs more allocations than pushing them on the fly.
Unsafe code, and messing with pointers.
Anything listed in the Iterator documentation does not help. I checked out itertools and it looks like it wouldn't help either.
Using a while loop and indexing instead of using an iterator making use of a reference to the outer vector. This is okay, but does not let me use iterators and adapters. I just want to get rid of this outer loop and use my_vec.foreach(...).
Are there any idioms or any libraries which would let me do this nicely Unsafe functions would be okay as long as they don't expose pointers to me.

You can wrap each of the inner vectors in a RefCell.
use std::cell::RefCell;
fn main() {
let my_vec : Vec<RefCell<Vec<i32>>> = vec![
RefCell::new(vec![1,2]),
RefCell::new(vec![3,4]),
RefCell::new(vec![5,6])];
for inner in my_vec.iter() {
// ...
let value = inner.borrow()[0];
my_vec[some_index].borrow_mut().push(value);
}
}
Note that the value binding here is important if you need to be able to push to the vector that inner refers to. value happens to be a type that doesn't contain references (it's i32), so it doesn't keep the first borrow active (it ends by the end of the statement). Then, the next statement may borrow the same vector or another vector mutably and it'll work.
If we wrote my_vec[some_index].borrow_mut().push(inner.borrow()[0]); instead, then both borrows would be active until the end of the statement. If both my_vec[some_index] and inner refer to the same RefCell<Vec<i32>>, this will panic with RefCell<T> already mutably borrowed.

Without changing the type of my_vec, you could simply use access by indexing and split_at_mut:
for index in 0..my_vec.len() {
let (first, second) = my_vec.split_at_mut(index);
first[some_index].push(second[0]);
}
Note: beware, the indices in second are off by index.
This is safe, relatively easy, and very flexible. It does not, however, work with iterator adaptors.

Related

Invalid Iterator Fix

So, looking for advice on how to fix a situation or maybe a better way to program it.
I'm using iteration to build a complicated string from key:value pairs in an unordered_map. To make this work, I'm iterating through the map to find specific items, then sending a search term to an outside function to create the string. The outside function uses its own iterator to search the same unordered_map for the passed search term, then creates the string, then erases the entries that it referenced. The problem, I believe, is that although the outside function's iterator is still valid because it called the erase function, the iterators in the main function are now invalidated and throwing an out of range error. Is there a way to reset the iterators or send them to the next valid key:value pair when they become invalidated in order to avoid the error?
The code is a mess (mostly because I'm still discovering C++) and it might be possible to use recursion to accomplish this, but I wasn't able to get recursion to work correctly.
I can post the code, but without understanding the inputs and required outputs, it's likely not going to help explain anything, so for now, I'll just leave the question as-is: is there a way to "re-validate" invalidated iterators?
I was able to resolve the issue by redefining each of the iterators once the scope of control returned back to them. For the last iterator (in the outside function) that deleted individual key:value pairs from the unordered_map, I used:
if (it != map.end()) it = map.erase(it);
This forces the iterator to move to the next valid key:value pair after the erasure.
That worked for the end of the line, but didn't work once control was returned to each of the previous iterators. In those case, the iterators were invalidated when the outside function erased a key:value pair. So as control returned to an iterator, I included the following line before it looped back for increment:
if (it != map.end()) it = map.begin();
It seems to have resolved all of the issues, though I'm sure there's a better way to handle it.

Why does Iterator::next return an Option instead of just an item?

It seems to me that whether an option is the right type to return should be up to the implementor.
I notice that it goes away when I try to filter or using other collection methods on the items. Is this simply a replacement for has_next? Won't it have potential performance/memory implications?
Because it needs some way to communicate to the caller that there's nothing left to output.
fn main() {
let mut it = vec![1, 2, 3].into_iter();
assert_eq!(it.next(), Some(1));
assert_eq!(it.next(), Some(2));
assert_eq!(it.next(), Some(3));
assert_eq!(it.next(), None); // End of iterator.
}
As for a hypothetical has_next, that can complicate some iterator designs because it requires the iterator to know whether there is another element. This might require the iterator to compute the next element, then store it somewhere. It's also possible to forget to call has_next, or call it but ignore the result.
With next returning an Option, none of this is an issue; an iterator can compute the next item and return it whilst making it impossible for a caller to forget to ensure the returned value actually has something in it.
One thing this does not let you do is "peek" at the iterator to see if there's something more and then change logic based on that answer, without actually consuming the next item. However, that's what the peekable combinator is for, which gives you what amounts to a traditional has_next: peek().is_some().
On your concerns about performance: I've never seen anything to suggest there is any penalty. Anything using an iterator correctly has to check to see if it's reached the end. As for space, a Rust iterator doesn't need to cache the next item, so they're likely to be the same size or smaller than an iterator for a language that uses has_next.
Finally, as noted in comments, Option is not heap allocated. A None is equivalent to a false followed by some uninitialised space (since there's nothing in it), and a Some(v) is equivalent to a true followed by v.

Alternative to the try (?) operator suited to iterator mapping

In the process of learning Rust, I am getting acquainted with error propagation and the choice between unwrap and the ? operator. After writing some prototype code that only uses unwrap(), I would like to remove unwrap from reusable parts, where panicking on every error is inappropriate.
How would one avoid the use of unwrap in a closure, like in this example?
// todo is VecDeque<PathBuf>
let dir = fs::read_dir(&filename).unwrap();
todo.extend(dir.map(|dirent| dirent.unwrap().path()));
The first unwrap can be easily changed to ?, as long as the containing function returns Result<(), io::Error> or similar. However, the second unwrap, the one in dirent.unwrap().path(), cannot be changed to dirent?.path() because the closure must return a PathBuf, not a Result<PathBuf, io::Error>.
One option is to change extend to an explicit loop:
let dir = fs::read_dir(&filename)?;
for dirent in dir {
todo.push_back(dirent?.path());
}
But that feels wrong - the original extend was elegant and clearly reflected the intention of the code. (It might also have been more efficient than a sequence of push_backs.) How would an experienced Rust developer express error checking in such code?
How would one avoid the use of unwrap in a closure, like in this example?
Well, it really depends on what you wish to do upon failure.
should failure be reported to the user or be silent
if reported, should one failure be reported or all?
if a failure occur, should it interrupt processing?
For example, you could perfectly decide to silently ignore all failures and just skip the entries that fail. In this case, the Iterator::filter_map combined with Result::ok is exactly what you are asking for.
let dir = fs::read_dir(&filename)?;
let todos.extend(dir.filter_map(Result::ok));
The Iterator interface is full of goodies, it's definitely worth perusing when looking for tidier code.
Here is a solution based on filter_map suggested by Matthieu. It calls Result::map_err to ensure the error is "caught" and logged, sending it further to Result::ok and filter_map to remove it from iteration:
fn log_error(e: io::Error) {
eprintln!("{}", e);
}
(|| {
let dir = fs::read_dir(&filename)?;
todo.extend(dir
.filter_map(|res| res.map_err(log_error).ok()))
.map(|dirent| dirent.path()));
})().unwrap_or_else(log_error)

Non-destructively iterating over a Rust collection, but not by reference

I can write the following two ways, the second is inspired by What is the idiomatic way to create a collection of references to methods that take self?:
channels.iter().flat_map(|c|c.to_uppercase()).collect(),
channels.clone().into_iter().flat_map(char::to_uppercase).collect(),
The second line has to clone the collection because char::to_uppercase doesn't accept a reference as it's argument and .iter() provides references and .into_iter() moves the collection.
Is there a way to do this that doesn't need to clone the collection or create a closure? I don't hate closures, I promise, and I know they're just turned into (usually inline) function calls in LLVM anyway, but I like the cleanness of referring to a function like in the second line and would prefer to use it if it can be done without the clone.
Iterator has a cloned method which is equivalent to .map(|x| x.clone()) which, in case of Copy types is equivalent to .map(|&x| x). This way you can write
channels.iter().cloned().flat_map(char::to_uppercase).collect()
You can define a function that takes a reference. You can even put it inside another function, if you want to keep it close to its usage.
fn foobar() {
fn to_uppercase(c: &char) -> ::std::char::ToUppercase {
c.to_uppercase()
}
// [...]
let channels_upper = channels.iter().flat_map(to_uppercase).collect();
}

How a programmers solve the dilemma of using old variables instead of new variables?

For example:
... some code
int sizeOfSomeObject = someObject.length();
... some code, sizeOfSomeObject is not need anymore
now I need other int variable for other action(for example, for position in some object), and i have the dilemma: create a new variable or use sizeOfSomeObject for this. In the first case I will keep readability, but lose performance. In the second case - on the contrary. What usually do programmers in this situation?
In the first case I will keep readability, but lose performance. In the second case - on the contrary.
So did you benchmark it? I suspect no, you didn't. Most modern compilers do a lot of agressive analysis during register allocation, so if the optimizer perceives that there's a variable that's not used anymore, but there's a new variable of the same type, it will just merge the two variables to the same memory region or processor register. No need to worry about performance penalties.
And anyway, don't do premature optimization (which this is). In 90% of the cases, readability is more important than "performance".
All in all, go ahead and create a new variable with an appropriate, different, descriptive name. And just for fun, compile this version and the version in which you used the same variable name, and look at the generated assembly (or bytecode, or...) - and find out that they're identical.
I would use different named variables for different things.
In terms of something like this, I don't think just one variable would cause a massive performance hit. In most languages you have the option to clear variables from memory in some way when they are no longer in use, so I would recommend doing that so that the code means something to you or others when read at a later date.
In C++, you can use blocks for objects to be destroyed as soon as they are not needed anymore:
void some_function () {
{
MyClass c;
// ... here we use c ...
}
// now c has been destroyed
{
MyClass d;
// ... here we use d ...
}
// now d has been destroyed
}
In your example (with int variables), there is no reason to worry about performance. The worst thing that could probably happen is memory for two variables being used instead of one, but (i) that's negligible and (ii) int's will probably live in a CPU register, anyway. If you really worry, use the block approach for your int example.
It depends how often such an int would be initialized. If it's not in some hugely nested for loop, most (all) programmers will go for the first. Besides, most modern programming languages have a garbage collector, which cleans up left over objects.
Decent compiler will optimize out your second variable, so that shouldn't be an issue.
That said, there are situations where variable reuse makes sense. E.g., you might have some variable that holds a generic output populated from call to some external API. According to the context and parameters passed to the API you'll process the data differently but it's probably better (more readable etc.) to reuse the same data variable.
For example, something like this:
void* data = getSomeData(params);
//process data
//change params
data = getSomeData(params);
//process data
//change params
data = getSomeData(params);