Rust Inspect Iterator: cannot borrow `*` as immutable because it is also borrowed as mutable

Why can't I push to this vector during inspect and do contains on it during skip_while?
I've implemented my own iterator for my own struct Chain like this:
struct Chain {
n: u32,
}
impl Chain {
fn new(start: u32) -> Chain {
Chain { n: start }
}
}
impl Iterator for Chain {
type Item = u32;
fn next(&mut self) -> Option<u32> {
self.n = digit_factorial_sum(self.n);
Some(self.n)
}
}
Now what I'd like to do is take values while the iterator is producing unique ones. So I'm inspect-ing the chain, pushing each value to a vector, and then checking the vector in a skip_while closure:
let mut v = Vec::with_capacity(terms);
Chain::new(i)
.inspect(|&x| {
v.push(x)
})
.skip_while(|&x| {
return v.contains(&x);
})
However, the Rust compiler spits out this error:
error: cannot borrow `v` as immutable because it is also borrowed as mutable [E0502]
...
borrow occurs due to use of `v` in closure
return v.contains(&x);
^
previous borrow of `v` occurs here due to use in closure; the mutable borrow prevents subsequent moves, borrows, or modification of `v` until the borrow ends
.inspect(|&x| {
v.push(x)
})
Obviously I don't understand the concept of "borrowing". What am I doing wrong?

The problem here is that you're attempting to create both a mutable and an immutable reference to the same variable, which is a violation of Rust borrowing rules. And rustc actually does say this to you very clearly.
let mut v = Vec::with_capacity(terms);
Chain::new(i)
.inspect(|&x| {
v.push(x)
})
.skip_while(|&x| {
return v.contains(&x);
})
Here you're trying to use v in two closures: first in the argument to inspect(), then in the argument to skip_while(). Non-move closures capture their environment by reference, so the environment of the first closure contains &mut v, and that of the second closure contains &v. Both closures are created in the same expression, so even if it were guaranteed that inspect() ran and dropped the borrow before skip_while() (which is not actually the case, because these are iterator adapters and they won't run at all until the iterator is consumed), lexical borrowing rules prohibit this.
Unfortunately, this is one of those examples when the borrow checker is overly strict. What you can do is to use RefCell, which allows mutation through a shared reference but introduces some run-time cost:
use std::cell::RefCell;
let v = RefCell::new(Vec::with_capacity(terms));
Chain::new(i)
.inspect(|x| v.borrow_mut().push(*x))
.skip_while(|x| v.borrow().contains(x))
I think it may be possible to avoid the runtime penalty of RefCell and use UnsafeCell instead, because when the iterator is consumed, these closures will only run one after another, not at the same time, so there should never be a mutable and an immutable reference outstanding at the same time. It could look like this:
use std::cell::UnsafeCell;
let v = UnsafeCell::new(Vec::with_capacity(terms));
Chain::new(i)
.inspect(|x| unsafe { (&mut *v.get()).push(*x) })
.skip_while(|x| unsafe { (&*v.get()).contains(x) })
But I may be wrong, and anyway, the overhead of RefCell is not that high unless this code is running in a really tight loop, so you should only use UnsafeCell as a last resort, only when nothing else works, and exercise extreme caution when working with it.
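If the goal is simply to stop at the first repeated value, one way to sidestep the double borrow altogether is to let a single closure do both the check and the push. The sketch below assumes a take_while-style cutoff (the question mentions both take_while and skip_while). It also supplies a hypothetical digit_factorial_sum, since the question does not show one, and unique_prefix is a name invented here for illustration.

```rust
/// Hypothetical helper (not shown in the question): sum of the factorials
/// of the decimal digits of `n`.
fn digit_factorial_sum(mut n: u32) -> u32 {
    const FACT: [u32; 10] = [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880];
    if n == 0 {
        return FACT[0];
    }
    let mut sum = 0;
    while n > 0 {
        sum += FACT[(n % 10) as usize];
        n /= 10;
    }
    sum
}

struct Chain {
    n: u32,
}

impl Chain {
    fn new(start: u32) -> Chain {
        Chain { n: start }
    }
}

impl Iterator for Chain {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        self.n = digit_factorial_sum(self.n);
        Some(self.n)
    }
}

/// Hypothetical wrapper: collects chain values until one repeats.
fn unique_prefix(start: u32) -> Vec<u32> {
    let mut seen = Vec::new();
    // Only one closure captures `seen`, so there is a single mutable borrow.
    Chain::new(start)
        .take_while(|&x| {
            if seen.contains(&x) {
                false
            } else {
                seen.push(x);
                true
            }
        })
        .for_each(drop); // drive the iterator; the values are already in `seen`
    seen
}
```

Calling unique_prefix(69) drives the chain until a value repeats and returns the values seen so far. Note that, as in the question's Chain, the starting number itself is not yielded.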

Related

How to match custom Fails with the failure crate

I'm trying to understand how to use the failure crate. It works splendidly as a unification of different types of standard errors, but when creating custom errors (Fails), I do not understand how to match for custom errors. For example:
use failure::{Fail, Error};
#[derive(Debug, Fail)]
pub enum Badness {
#[fail(display = "Ze badness")]
Level(String)
}
pub fn do_badly() -> Result<(), Error> {
Err(Badness::Level("much".to_owned()).into())
}
#[test]
pub fn get_badness() {
match do_badly() {
Err(Badness::Level(level)) => panic!("{:?} badness!", level),
_ => (),
};
}
fails with
error[E0308]: mismatched types
--> barsa-nagios-forwarder/src/main.rs:74:9
|
73 | match do_badly() {
| ---------- this match expression has type `failure::Error`
74 | Err(Badness::Level(level)) => panic!("{:?} badness!", level),
| ^^^^^^^^^^^^^^^^^^^^^ expected struct `failure::Error`, found enum `Badness`
|
= note: expected type `failure::Error`
found type `Badness`
How can I formulate a pattern which matches a specific custom error?
You need to downcast the Error
When you create a failure::Error from some type that implements the Fail trait (via from or into, as you do), you temporarily hide the information about the type you're wrapping from the compiler. It doesn't know that Error is a Badness - because it can also be any other Fail type, that's the point. You need to remind the compiler of this, the action is called downcasting. The failure::Error has three methods for this: downcast, downcast_ref and downcast_mut. After you've downcast it, you can pattern match on the result as normal - but you need to take into account the possibility that downcasting itself may fail (if you try to downcast to a wrong type).
Here's how it'd look with downcast:
pub fn get_badness() {
if let Err(wrapped_error) = do_badly() {
if let Ok(bad) = wrapped_error.downcast::<Badness>() {
panic!("{:?} badness!", bad);
}
}
}
(two if lets can be combined in this case).
This quickly gets very unpleasant if more than one error type needs to be tested, since downcast consumes the failure::Error it was called on (so you can't try another downcast on the same variable if the first one fails). I sadly couldn't figure out an elegant way to do this. Here's a variant one shouldn't really use (panic! in map is questionable, and doing anything else there would be plenty awkward, and I don't even want to think about more cases than two):
#[derive(Debug, Fail)]
pub enum JustSoSo {
#[fail(display = "meh")]
Average,
}
pub fn get_badness() {
if let Err(wrapped_error) = do_badly() {
let e = wrapped_error.downcast::<Badness>()
.map(|bad| panic!("{:?} badness!", bad))
.or_else(|original| original.downcast::<JustSoSo>());
if let Ok(so) = e {
println!("{}", so);
}
}
}
An or_else chain should work OK if you actually want to produce a value of the same type from all of the possible/relevant errors. Also consider the non-consuming methods (downcast_ref and downcast_mut) if a reference to the original error is fine for you, as this would allow you to just write a series of if let blocks, one for each downcast attempt.
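To illustrate the "series of if let blocks" idea without pulling in an external crate, here is a sketch that swaps failure::Error for the standard library's Box<dyn Error>, which offers a downcast_ref of the same flavour. The Display and Error impls are written by hand instead of derived, and describe is a hypothetical helper name.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
enum Badness {
    Level(String),
}

impl fmt::Display for Badness {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Ze badness")
    }
}

impl Error for Badness {}

#[derive(Debug)]
enum JustSoSo {
    Average,
}

impl fmt::Display for JustSoSo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "meh")
    }
}

impl Error for JustSoSo {}

fn do_badly() -> Result<(), Box<dyn Error>> {
    Err(Box::new(Badness::Level("much".to_owned())))
}

/// Hypothetical helper: tries each concrete type in turn by reference,
/// so a failed downcast does not consume the error.
fn describe(err: &(dyn Error + 'static)) -> String {
    if let Some(Badness::Level(level)) = err.downcast_ref::<Badness>() {
        return format!("{} badness!", level);
    }
    if let Some(so) = err.downcast_ref::<JustSoSo>() {
        return format!("just so-so: {}", so);
    }
    format!("unknown error: {}", err)
}
```

With a boxed error in hand, describe(&*err) picks the first matching branch. Because each attempt only borrows the error, adding a third or fourth type is just another if let.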
An alternative
Don't put your errors into failure::Error, put them in a custom enum as variants. It's more boilerplate, but you get painless pattern matching, which the compiler also will be able to check for sanity. If you choose to do this, I'd recommend derive_more crate which is capable of deriving From for such enums; snafu looks very interesting as well, but I have yet to try it. In its most basic form this approach looks like this:
pub enum SomeError {
Bad(Badness),
NotTooBad(JustSoSo),
}
pub fn do_badly_alt() -> Result<(), SomeError> {
Err(SomeError::Bad(Badness::Level("much".to_owned())))
}
pub fn get_badness_alt() {
if let Err(wrapper) = do_badly_alt() {
match wrapper {
SomeError::Bad(bad) => panic!("{:?} badness!", bad),
SomeError::NotTooBad(so) => println!("{}", so),
}
}
}
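The From impls that a derive would generate can also be written by hand, which is what makes ? convert the inner errors automatically. This is a minimal sketch under that assumption; check_level and check_mood are hypothetical fallible steps invented for the example, and do_badly_alt takes parameters here only so that both paths can be exercised.

```rust
#[derive(Debug, PartialEq)]
pub enum Badness {
    Level(String),
}

#[derive(Debug, PartialEq)]
pub enum JustSoSo {
    Average,
}

#[derive(Debug, PartialEq)]
pub enum SomeError {
    Bad(Badness),
    NotTooBad(JustSoSo),
}

// Hand-written conversions; a derive macro would generate equivalent code.
impl From<Badness> for SomeError {
    fn from(e: Badness) -> Self {
        SomeError::Bad(e)
    }
}

impl From<JustSoSo> for SomeError {
    fn from(e: JustSoSo) -> Self {
        SomeError::NotTooBad(e)
    }
}

// Hypothetical step that fails with `Badness`.
fn check_level(level: &str) -> Result<(), Badness> {
    if level == "much" {
        Err(Badness::Level(level.to_owned()))
    } else {
        Ok(())
    }
}

// Hypothetical step that fails with `JustSoSo`.
fn check_mood(happy: bool) -> Result<(), JustSoSo> {
    if happy {
        Ok(())
    } else {
        Err(JustSoSo::Average)
    }
}

pub fn do_badly_alt(level: &str, happy: bool) -> Result<(), SomeError> {
    check_level(level)?; // Badness -> SomeError via From
    check_mood(happy)?; // JustSoSo -> SomeError via From
    Ok(())
}
```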

How to use the same iterator twice, once for counting and once for iteration?

It seems that an iterator is consumed when counting. How can I use the same iterator for counting and then iterate on it?
I'm trying to count the lines in a file and then print them. I am able to read the file content and to count the lines, but then I'm no longer able to iterate over the lines, as if the internal cursor was at the end of the iterator.
use std::fs::File;
use std::io::prelude::*;
fn main() {
let log_file_name = "/home/myuser/test.log";
let mut log_file = File::open(log_file_name).unwrap();
let mut log_content: String = String::from("");
//Reads the log file.
log_file.read_to_string(&mut log_content).unwrap();
//Gets all the lines in a Lines struct.
let mut lines = log_content.lines();
//Uses by_ref() in order to not take ownership
let count = lines.by_ref().count();
println!("{} lines", count); //Prints the count
//Doesn't enter in the loop
for value in lines {
println!("{}", value);
}
}
Iterator doesn't have a reset method, but it seems the internal cursor is at the end of the iterator after the count. Is it mandatory to create a new Lines by calling log_content.lines() again or can I reset the internal cursor?
For now, the workaround that I found is to create a new iterator:
use std::fs::File;
use std::io::prelude::*;
fn main() {
let log_file_name = "/home/myuser/test.log";
let mut log_file = File::open(log_file_name).unwrap();
let mut log_content: String = String::from("");
//Reads the log file.
log_file.read_to_string(&mut log_content).unwrap();
//Counts all and consume the iterator
let count = log_content.lines().count();
println!("{} lines", count);
//Creates a pretty new iterator
let lines = log_content.lines();
for value in lines {
println!("{}", value);
}
}
Calling count consumes the iterator, because it actually iterates until it is done (i.e. next() returns None).
You can prevent consuming the iterator by using by_ref, but the iterator is still driven to its completion (by_ref actually just returns the mutable reference to the iterator, and Iterator is also implemented for the mutable reference: impl<'a, I> Iterator for &'a mut I).
This still can be useful if the iterator contains other state you want to reuse after it is done, but not in this case.
You could simply try forking the iterator (they often implement Clone if they don't have side effects), although in this case recreating it is just as good (most of the time creating an iterator is cheap; the real work is usually only done when you drive it by calling next directly or indirectly).
So no, (in this case) you can't reset it, and yes, you need to create a new one (or clone it before using it).
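As a small sketch of the cloning route: str::Lines implements Clone, so the clone can be consumed by count while the original stays at the start. The function name count_then_collect is made up for this example.

```rust
/// Hypothetical helper: counts the lines via a clone of the iterator,
/// then consumes the untouched original.
fn count_then_collect(text: &str) -> (usize, Vec<&str>) {
    let lines = text.lines();
    // `count` consumes the clone; `lines` itself has not advanced.
    let count = lines.clone().count();
    (count, lines.collect())
}
```

The string is still scanned twice here, once per iterator, exactly as when calling lines() a second time.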
The other answers have already explained well that you can either recreate your iterator or clone it.
If the act of iteration is overly expensive or it's impossible to do multiple times (such as reading from a network socket), an alternative solution is to create a collection of the iterator's values that will allow you to get the length and the values.
This does require storing every value from the iterator; there's no such thing as a free lunch!
use std::fs;
fn main() {
let log_content = fs::read_to_string("/home/myuser/test.log").unwrap();
let lines: Vec<_> = log_content.lines().collect();
println!("{} lines", lines.len());
for value in lines {
println!("{}", value);
}
}
Iterators can generally not be iterated twice because there might be a cost to their iteration. In the case of str::lines, each iteration needs to find the next end of line, which means scanning through the string, which has some cost. You could argue that the iterator could save those positions for later reuse, but the cost of storing them would be even bigger.
Some Iterators are even more expensive to iterate, so you really don't want to do it twice.
Many iterators can be recreated easily (here calling str::lines a second time) or be cloned. Whichever way you recreate an iterator, the two iterators are generally completely independent, so iterating will mean you'll pay the price twice.
In your specific case, it is probably fine to just iterate the string twice, as strings that fit in memory shouldn't be so long that merely counting lines would be a very expensive operation. If you believe it is expensive, first benchmark it; second, write your own algorithm, as Lines::count is probably not optimized as much as it could be, since the primary goal of Lines is to iterate over lines.
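A hand-rolled count could look like the sketch below, which counts newline bytes and adds one for a final line that lacks a trailing newline. Whether this is actually faster than Lines::count is an assumption that only a benchmark on your data can confirm; count_lines is a hypothetical name.

```rust
/// Hypothetical helper: counts lines by scanning for b'\n' bytes.
/// A non-empty text whose last line has no trailing newline still
/// counts that last line.
fn count_lines(text: &str) -> usize {
    let newlines = text.bytes().filter(|&b| b == b'\n').count();
    if text.is_empty() || text.ends_with('\n') {
        newlines
    } else {
        newlines + 1
    }
}
```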

Preferable pattern for getting around the "moving out of borrowed self" checker

Consider the pattern where there are several states registered with a dispatcher and each state knows what state to transition to when it receives an appropriate event. This is a simple state transition pattern.
struct Dispatcher {
states: HashMap<Uid, Rc<RefCell<State>>>,
}
impl Dispatcher {
pub fn insert_state(&mut self, state_id: Uid, state: Rc<RefCell<State>>) -> Option<Rc<RefCell<State>>> {
self.states.insert(state_id, state)
}
fn dispatch(&mut self, state_id: Uid, event: Event) {
if let Some(state) = self.states.get(&state_id).cloned() {
state.borrow_mut().handle_event(self, event);
}
}
}
trait State {
fn handle_event(&mut self, dispatcher: &mut Dispatcher, event: Event);
}
struct S0 {
state_id: Uid,
move_only_field: Option<MOF>,
// This is pattern that concerns me.
}
impl State for S0 {
fn handle_event(&mut self, dispatcher: &mut Dispatcher, event: Event) {
if event == Event::SomeEvent {
// Do some work
if let Some(mof) = self.move_only_field.take() {
let next_state = Rc::new(RefCell::new(S1 {
state_id: self.state_id,
move_only_field: mof,
}));
let _ = dispatcher.insert_state(self.state_id, next_state);
} else {
// log an error: BUGGY Logic somewhere
let _ = dispatcher.remove_state(&self.state_id);
}
} else {
// Do some other work, maybe transition to State S2 etc.
}
}
}
struct S1 {
state_id: Uid,
move_only_field: MOF,
}
impl State for S1 {
fn handle_event(&mut self, dispatcher: &mut Dispatcher, event: Event) {
// Do some work, maybe transition to State S2/S3/S4 etc.
}
}
With reference to the inline comment above saying:
// This is pattern that concerns me.
S0::move_only_field needs to be an Option in this pattern because self is borrowed in handle_event, but I am not sure that this is best way to approach it.
Here are the ways I can think of with demerits of each one:
1. Put it into an Option as I have done: this feels hacky, and every time I need to check the invariant that the Option is always Some and otherwise panic!, or make it a NOP with if let Some() = and ignore the else clause, but this causes code bloat. Doing an unwrap or bloating the code with if let Some() feels a bit off.
2. Get it into a shared ownership Rc<RefCell<>>: need to heap allocate all such variables or construct another struct called Inner or something that has all these non-clonable types and put that into an Rc<RefCell<>>.
3. Pass stuff back to Dispatcher indicating it to basically remove us from the map and then move things out of us to the next State, which will also be indicated via our return value: too much coupling, breaks OOP, does not scale as Dispatcher needs to know about all the States and needs frequent updating. I don't think this is a good paradigm, but could be wrong.
4. Implement Default for MOF above: now we can mem::replace it with the default while moving out the old value. The burden of panicking OR returning an error OR doing a NOP is now hidden in the implementation of MOF. The problem here is we don't always have access to the MOF type, and for those that we do, it again moves the point of bloat from user code to the code of MOF.
5. Let the function handle_event take self by move as fn handle_event(mut self, ...) -> Option<Self>: now instead of Rc<RefCell<>> you will need to have Box<State> and move it out each time in the dispatcher, and if the return is Some you put it back. This almost feels like a sledgehammer and makes many other idioms impossible; for instance, if I wanted to share self further in some registered closure/callback I would normally put a Weak<RefCell<>> previously, but now sharing self in callbacks etc. is impossible.
Are there any other options? Is there any that is considered the "most idiomatic" way of doing this in Rust?
Let the function handle_event take self by move as fn handle_event(mut self, ...) -> Option<Self>: Now instead of Rc<RefCell<>> you will need to have Box<State> and move it out each time in the dispatcher and if the return is Some you put it back.
This is what I would do. However, you don't need to switch from Rc to Box if there is only one strong reference: Rc::try_unwrap can move out of an Rc.
Here's part of how you might rewrite Dispatcher:
struct Dispatcher {
states: HashMap<Uid, Rc<State>>,
}
impl Dispatcher {
fn dispatch(&mut self, state_id: Uid, event: Event) {
if let Some(state_ref) = self.states.remove(&state_id) {
let state = Rc::try_unwrap(state_ref)
.expect("Unique strong reference required");
if let Some(next_state) = state.handle_event(event) {
self.states.insert(state_id, next_state);
}
} else {
// handle state_id not found
}
}
}
(Note: dispatch takes state_id by value. In the original version, this wasn't necessary -- it could have been changed to pass by reference. In this version, it is necessary, since state_id gets passed to HashMap::insert. It looks like Uid is Copy though, so it makes little difference.)
It's not clear whether state_id actually needs to be a member of the struct that implements State anymore, since you don't need it inside handle_event -- all the insertion and removal happens inside impl Dispatcher, which makes sense and reduces coupling between State and Dispatcher.
impl State for S0 {
fn handle_event(self, event: Event) -> Option<Rc<State>> {
if event == Event::SomeEvent {
// Do some work
let next_state = Rc::new(S0 {
state_id: self.state_id,
move_only_field: self.move_only_field,
});
Some(next_state)
} else {
// Do some other work
}
}
}
Now you don't have to handle a weird, should-be-impossible corner case where the Option is None.
This almost feels like a sledgehammer and makes many other idioms impossible, for instance if I wanted to share self further in some registered closure/callback I would normally put a Weak<RefCell<>> previously but now sharing self in callbacks etc is impossible.
Because you can move out of an Rc if you have the only strong reference, you don't have to sacrifice this technique.
"Feels like a sledgehammer" might be subjective, but to me, what a signature like fn handle_event(mut self, ...) -> Option<Self> does is encode an invariant. With the original version, each impl State for ... had to know when to insert and remove itself from the dispatcher, and whether it did or not was uncheckable. For example, if somewhere deep in the logic you forgot to call dispatcher.insert(state_id, next_state), the state machine wouldn't transition, and might get stuck or worse. When handle_event takes self by-value, that's not possible anymore -- you have to return the next state, or the code simply won't compile.
(Aside: both the original version and mine do at least two hashtable lookups each time dispatch is called: once to get the current state, and again to insert the new state. If you wanted to get rid of the second lookup, you could combine approaches: store Option<Rc<State>> in the HashMap, and take from the Option instead of removing it from the map entirely.)
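For readers who want something that compiles end to end, here is a minimal sketch of the by-value transition idea. It makes several assumptions that are not in the original: states are stored as Box<dyn State> (a boxed trait object can be consumed via self: Box<Self>, whereas Rc::try_unwrap needs a sized type), Uid is a plain u32, and Event, Mof and describe are invented stand-ins.

```rust
use std::collections::HashMap;

type Uid = u32; // assumption: the real Uid type is not shown

#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    Advance,
    Stop,
}

/// Stand-in for the move-only field (deliberately neither Clone nor Default).
struct Mof(String);

trait State {
    /// Consumes the current state and returns the next one, if any.
    fn handle_event(self: Box<Self>, event: Event) -> Option<Box<dyn State>>;
    /// Hypothetical helper so the example can observe the current state.
    fn describe(&self) -> String;
}

struct S0 {
    move_only_field: Mof,
}

struct S1 {
    move_only_field: Mof,
}

impl State for S0 {
    fn handle_event(self: Box<Self>, event: Event) -> Option<Box<dyn State>> {
        let this = *self; // move out of the box: no Option juggling needed
        match event {
            Event::Advance => Some(Box::new(S1 {
                move_only_field: this.move_only_field,
            })),
            Event::Stop => None,
        }
    }
    fn describe(&self) -> String {
        format!("S0({})", self.move_only_field.0)
    }
}

impl State for S1 {
    fn handle_event(self: Box<Self>, event: Event) -> Option<Box<dyn State>> {
        match event {
            Event::Advance => {
                let same: Box<dyn State> = self; // stay in S1
                Some(same)
            }
            Event::Stop => None,
        }
    }
    fn describe(&self) -> String {
        format!("S1({})", self.move_only_field.0)
    }
}

struct Dispatcher {
    states: HashMap<Uid, Box<dyn State>>,
}

impl Dispatcher {
    fn new() -> Self {
        Dispatcher { states: HashMap::new() }
    }
    fn insert_state(&mut self, id: Uid, state: Box<dyn State>) {
        self.states.insert(id, state);
    }
    fn dispatch(&mut self, id: Uid, event: Event) {
        // Remove the state, let it consume itself, re-insert the successor.
        if let Some(state) = self.states.remove(&id) {
            if let Some(next) = state.handle_event(event) {
                self.states.insert(id, next);
            }
        }
    }
    fn describe(&self, id: Uid) -> Option<String> {
        self.states.get(&id).map(|s| s.describe())
    }
}
```

Inserting an S0 under some id and dispatching Event::Advance replaces it with an S1 that carries the same Mof; dispatching Event::Stop removes the entry. Forgetting to return a next state is a type error rather than a silent stall.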

How do I handle errors from libc functions in an idiomatic Rust manner?

libc's error handling is usually to return something < 0 in case of an error. I find myself doing this over and over:
let pid = fork();
if pid < 0 {
// Please disregard the fact that `Err(pid)`
// should be a `&str` or an enum
return Err(pid);
}
I find it ugly that this needs 3 lines of error handling, especially considering that these tests are quite frequent in this kind of code.
Is there a way to return an Err in case fork() returns < 0?
I found two things which are close:
assert_eq!. This needs another line and it panics so the caller cannot handle the error.
Using traits like these:
pub trait LibcResult<T> {
fn to_option(&self) -> Option<T>;
}
impl LibcResult<i64> for i32 {
fn to_option(&self) -> Option<i64> {
if *self < 0 { None } else { Some(i64::from(*self)) }
}
}
I could write fork().to_option().expect("could not fork"). This is now only one line, but it panics instead of returning an Err. I guess this could be solved using ok_or.
Some functions of libc have < 0 as sentinel (e.g. fork), while others use > 0 (e.g. pthread_attr_init), so this would need another argument.
Is there something out there which solves this?
As indicated in the other answer, use pre-made wrappers whenever possible. Where such wrappers do not exist, the following guidelines might help.
Return Result to indicate errors
The idiomatic Rust return type that includes error information is Result (std::result::Result). For most functions from POSIX libc, the specialized type std::io::Result is a perfect fit because it uses std::io::Error to encode errors, and it includes all standard system errors represented by errno values. A good way to avoid repetition is using a utility function such as:
use std::io::{Result, Error};
fn check_err<T: Ord + Default>(num: T) -> Result<T> {
if num < T::default() {
return Err(Error::last_os_error());
}
Ok(num)
}
Wrapping fork() would look like this:
pub fn fork() -> Result<u32> {
check_err(unsafe { libc::fork() }).map(|pid| pid as u32)
}
The use of Result allows idiomatic usage such as:
let pid = fork()?; // ? means return if Err, unwrap if Ok
if pid == 0 {
// child
...
}
Restrict the return type
The function will be easier to use if the return type is modified so that only "possible" values are included. For example, if a function logically has no return value, but returns an int only to communicate the presence of error, the Rust wrapper should return nothing:
pub fn dup2(oldfd: i32, newfd: i32) -> Result<()> {
check_err(unsafe { libc::dup2(oldfd, newfd) })?;
Ok(())
}
Another example are functions that logically return an unsigned integer, such as a PID or a file descriptor, but still declare their result as signed to include the -1 error return value. In that case, consider returning an unsigned value in Rust, as in the fork() example above. nix takes this one step further by having fork() return Result<ForkResult>, where ForkResult is a real enum with methods such as is_child(), and from which the PID is extracted using pattern matching.
Use options and other enums
Rust has a rich type system that allows expressing things that have to be encoded as magic values in C. To return to the fork() example, that function returns 0 to indicate the child return. This would be naturally expressed with an Option and can be combined with the Result shown above:
pub fn fork() -> Result<Option<u32>> {
let pid = check_err(unsafe { libc::fork() })? as u32;
if pid != 0 {
Ok(Some(pid))
} else {
Ok(None)
}
}
The user of this API would no longer need to compare with the magic value, but would use pattern matching, for example:
if let Some(child_pid) = fork()? {
// execute parent code
} else {
// execute child code
}
Return values instead of using output parameters
C often returns values using output parameters, pointer parameters into which the results are stored. This is either because the actual return value is reserved for the error indicator, or because more than one value needs to be returned, and returning structs was badly supported by historical C compilers.
In contrast, Rust's Result supports return value independent of error information, and has no problem whatsoever with returning multiple values. Multiple values returned as a tuple are much more ergonomic than output parameters because they can be used in expressions or captured using pattern matching.
Wrap system resources in owned objects
When returning handles to system resources, such as file descriptors or Windows handles, it is good practice to return them wrapped in an object that implements Drop to release them. This will make it less likely that a user of the wrapper will make a mistake, and it makes the use of return values more idiomatic, removing the need for awkward invocations of close() and resource leaks coming from failing to do so.
Taking pipe() as an example:
use std::fs::File;
use std::os::unix::io::FromRawFd;
pub fn pipe() -> Result<(File, File)> {
let mut fds = [0 as libc::c_int; 2];
check_err(unsafe { libc::pipe(fds.as_mut_ptr()) })?;
Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}
// Usage:
// let (r, w) = pipe()?;
// ... use r and w as normal File objects
This pipe() wrapper returns multiple values and uses a wrapper object to refer to a system resource. Also, it returns the File objects defined in the Rust standard library and accepted by Rust's IO layer.
The best option is to not reimplement the universe. Instead, use nix, which wraps everything for you and has done the hard work of converting all the error types and handling the sentinel values:
pub fn fork() -> Result<ForkResult>
Then just use normal error handling like try! or ?.
Of course, you could rewrite all of nix by converting your trait to returning Results and including the specific error codes and then use try! or ?, but why would you?
There's nothing magical in Rust that converts negative or positive numbers into a domain specific error type for you. The code you already have is the correct approach, once you've enhanced it to use a Result either by creating it directly or via something like ok_or.
An intermediate solution would be to reuse nix's Errno struct, perhaps with your own trait sugar on top.
so this would need another argument
I'd say it would be better to have different methods: one for negative sentinel values and one for positive sentinel values.
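A sketch of what the two-method version might look like, building on the question's trait idea but returning std::io::Result. The method names neg_is_err and nonzero_is_errno are invented here. The second method assumes the pthread-style convention where the return value itself is the error number, so it does not consult errno.

```rust
use std::io::{Error, Result};

/// Hypothetical extension trait for raw libc-style return codes.
trait LibcResult: Sized {
    /// For calls like fork(): a negative value signals failure and the
    /// cause is read from errno.
    fn neg_is_err(self) -> Result<Self>;
    /// For pthread-style calls: zero is success, anything else is the
    /// error number itself.
    fn nonzero_is_errno(self) -> Result<()>;
}

impl LibcResult for i32 {
    fn neg_is_err(self) -> Result<i32> {
        if self < 0 {
            Err(Error::last_os_error())
        } else {
            Ok(self)
        }
    }

    fn nonzero_is_errno(self) -> Result<()> {
        if self == 0 {
            Ok(())
        } else {
            Err(Error::from_raw_os_error(self))
        }
    }
}
```

A call site would then read something like `let pid = unsafe { libc::fork() }.neg_is_err()?;`, assuming the libc crate is available.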

Iterating over a slice's values instead of references in Rust?

When looping over a slice of structs, the value I get is a reference (which is fine), however in some cases it's annoying to have to write var as (*var) in many places.
Is there a better way to avoid re-declaring the variable?
fn my_fn(slice: &[MyStruct]) {
for var in slice {
let var = *var; // <-- how to avoid this?
// Without the line above, errors in comments occur:
other_fn(var); // <-- expected struct `MyStruct`, found reference
if var != var.other {
// ^^ trait `&MyStruct: std::cmp::PartialEq<MyStruct>` not satisfied
foo();
}
}
}
See: actual error output (more cryptic).
You can remove the reference by destructuring in the pattern:
// |
// v
for &var in slice {
other_fn(var);
}
However, this only works for Copy-types! If you have a type that doesn't implement Copy but does implement Clone, you could use the cloned() iterator adapter; see Chris Emerson's answer for more information.
In some cases you can iterate directly on values if you can consume the iterable, e.g. using Vec::into_iter().
With slices, you can use cloned or copied on the iterator:
fn main() {
let v = vec![1, 2, 3];
let slice = &v[..];
for u in slice.iter().cloned() {
let u: usize = u; // prove it's really usize, not &usize
println!("{}", u);
}
}
This relies on the item implementing Clone or Copy, but if it doesn't you probably do want references after all.
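The three options side by side, as a small sketch: a & pattern and copied() for a Copy item type, and cloned() for a type that is only Clone. Label, total_by_pattern, total_by_copied and owned_labels are hypothetical names used for illustration.

```rust
/// Hypothetical non-Copy type that does implement Clone.
#[derive(Clone, Debug, PartialEq)]
struct Label(String);

/// Destructures each `&u32` in the pattern, so `v` is a plain `u32`.
fn total_by_pattern(values: &[u32]) -> u32 {
    let mut total = 0;
    for &v in values {
        total += v;
    }
    total
}

/// Same result using the `copied` adapter (requires `Copy` items).
fn total_by_copied(values: &[u32]) -> u32 {
    let mut total = 0;
    for v in values.iter().copied() {
        total += v;
    }
    total
}

/// `cloned` yields owned values for any `Clone` type, at the cost of a
/// clone per element.
fn owned_labels(labels: &[Label]) -> Vec<Label> {
    let mut out = Vec::new();
    for label in labels.iter().cloned() {
        out.push(label); // `label` is a `Label`, not a `&Label`
    }
    out
}
```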