This question already has answers here:
How to satisfy the Iterator trait bound in order to use Rayon here?
(1 answer)
Is there a way to directly consume a Rayon chain without collecting it first?
(1 answer)
Closed 3 years ago.
I'm trying process a file line by line in Rust and parallelise this using Rayon. It complains of the following
rayon::str::Lines<'_>` is not an iterator
= help: the trait `std::iter::Iterator` is not implemented for
= note: required by `std::iter::IntoIterator::into_iter`
This is what the code looks like so far
use rayon::prelude::*;
use std::fs;
fn main() {
let file_content = match fs::read_to_string("input.txt") {
Ok(s) => s,
Err(e) => {
eprintln!("Error: {}", e);
std::process::exit(1);
}
};
file_content = 5;
for line in file_content.par_lines() {
println!("{}", line);
}
std::process::exit(0);
}
Am I missing a trait definition? How could I fix this error?
You can't use parallel iterators with the (non-parallel) for loop.
Instead, use .for_each(|| …) callback on the parallel iterator.
Alternatively, first call .collect::<Vec<_>>(), and then non-parallel for.
For more advanced cases you can also send results to a channel in parallel, and then use non-parallel for to read from the channel.
Related
In Rust the main function is defined like this:
fn main() {
}
This function does not allow for a return value though. Why would a language not allow for a return value and is there a way to return something anyway? Would I be able to safely use the C exit(int) function, or will this cause leaks and whatnot?
As of Rust 1.26, main can return a Result:
use std::fs::File;
fn main() -> Result<(), std::io::Error> {
let f = File::open("bar.txt")?;
Ok(())
}
The returned error code in this case is 1 in case of an error. With File::open("bar.txt").expect("file not found"); instead, an error value of 101 is returned (at least on my machine).
Also, if you want to return a more generic error, use:
use std::error::Error;
...
fn main() -> Result<(), Box<dyn Error>> {
...
}
std::process::exit(code: i32) is the way to exit with a code.
Rust does it this way so that there is a consistent explicit interface for returning a value from a program, wherever it is set from. If main starts a series of tasks then any of these can set the return value, even if main has exited.
Rust does have a way to write a main function that returns a value, however it is normally abstracted within stdlib. See the documentation on writing an executable without stdlib for details.
As was noted by others, std::process::exit(code: i32) is the way to go here
More information about why is given in RFC 1011: Process Exit. Discussion about the RFC is in the pull request of the RFC.
The reddit thread on this has a "why" explanation:
Rust certainly could be designed to do this. It used to, in fact.
But because of the task model Rust uses, the fn main task could start a bunch of other tasks and then exit! But one of those other tasks may want to set the OS exit code after main has gone away.
Calling set_exit_status is explicit, easy, and doesn't require you to always put a 0 at the bottom of main when you otherwise don't care.
Try:
use std::process::ExitCode;
fn main() -> ExitCode {
ExitCode::from(2)
}
Take a look in doc
or:
use std::process::{ExitCode, Termination};
pub enum LinuxExitCode { E_OK, E_ERR(u8) }
impl Termination for LinuxExitCode {
fn report(self) -> ExitCode {
match self {
LinuxExitCode::E_OK => ExitCode::SUCCESS,
LinuxExitCode::E_ERR(v) => ExitCode::from(v)
}
}
}
fn main() -> LinuxExitCode {
LinuxExitCode::E_ERR(3)
}
You can set the return value with std::os::set_exit_status.
This question already has answers here:
How to mock specific methods but not all of them in Rust?
(2 answers)
How to mock external dependencies in tests? [duplicate]
(1 answer)
How can I test stdin and stdout?
(1 answer)
Is there a way of detecting whether code is being called from tests in Rust?
(1 answer)
What is the proper way to use the `cfg!` macro to choose between multiple implementations?
(1 answer)
Closed 3 years ago.
I have a function generates a salted hash digest for some data. For the salt, it uses a random u32 value. It looks something like this:
use rand::RngCore;
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
fn hash(msg: &str) -> String {
let salt = rand::thread_rng().next_u32();
let mut s = DefaultHasher::new();
s.write_u32(salt);
s.write(msg.as_bytes());
format!("{:x}{:x}", &salt, s.finish())
}
In a test, I'd like to validate that it produces expected values, given a known salt and string. How do I mock (swizzle?) rand::thread_rng().next_u32() in the test to generate a specific value? In other words, what could replace the comment in this example to make the test pass?
mod tests {
#[test]
fn test_hashes() {
// XXX How to mock ThreadRng::next_u32() to return 3892864592?
assert_eq!(hash("foo"), "e80866501cdda8af09a0a656");
}
}
Some approaches I've looked at:
I'm aware that the ThreadRng returned by rand::thread_rng() implements RngCore, so in theory I could set a variable somewhere to store a reference to a RngCore, and implement my own mocked variant to set during testing. I've taken this sort of approach in Go and Java, but I couldn't get the Rust type checker to allow it.
I looked at the list of mock frameworks, such as MockAll, but they appear to be designed to mock a struct or trait to pass to a method, and this code doesn't pass one, and I wouldn't necessarily want users of the library to be able to pass in a RngCore.
Use the #[cfg(test)] macro to call a different function specified in the tests module, then have that function read the value to return from elsewhere. This I got to work, but had to use an unsafe mutable static variable to set the value for the mocked method to find, which seems gross. Is there a better way?
As a reference, I'll post an answer using the #[cfg(test)] + unsafe mutable static variable technique, but hope there's a more straightforward way to do this sort of thing.
In the test module, use lazy-static to add a static variable with a Mutex for thread safety, create a function like next_u32() to return its value, and have tests set the static variable to a known value. It should fall back on returning a properly random number if it's not set, so here I've made it Vec<u32> so it can tell:
mod tests {
use super::*;
use lazy_static::lazy_static;
use std::sync::Mutex;
lazy_static! {
static ref MOCK_SALT: Mutex<Vec<u32>> = Mutex::new(vec![]);
}
// Replaces random salt generation when testing.
pub fn mock_salt() -> u32 {
let mut sd = MOCK_SALT.lock().unwrap();
if sd.is_empty() {
rand::thread_rng().next_u32()
} else {
let ret = sd[0];
sd.clear();
ret
}
}
#[test]
fn test_hashes() {
MOCK_SALT.lock().unwrap().push(3892864592);
assert_eq!(hash("foo"), "e80866501cdda8af09a0a656");
}
}
Then modify hash() to call tests::mock_salt() instead of rand::thread_rng().next_u32() when testing (the first three lines of the function body are new):
fn hash(msg: &str) -> String {
#[cfg(test)]
let salt = tests::mock_salt();
#[cfg(not(test))]
let salt = rand::thread_rng().next_u32();
let mut s = DefaultHasher::new();
s.write_u32(salt);
s.write(msg.as_bytes());
format!("{:x}{:x}", &salt, s.finish())
}
Then use of the macros allows Rust to determine, at compile time, which function to call, so there's no loss of efficiency in non-test builds. It does mean that there's some knowledge of the tests module in the source code, but it's not included in the binary, so should be relatively safe. I suppose there could be a custom derive macro to automate this somehow. Something like:
#[mock(rand::thread_rng().next_u32())]
let salt = rand::thread_rng().next_u32();
Would auto-generate the mocked method in the tests module (or elsewhere?), slot it in here, and provide functions for the tests to set the value --- only when testing, of course. Seems like a lot, though.
Playground.
I'm learning Rust and trying to solve some basic algorithm problems with it. In many cases, I want to read lines from stdin, perform some transformation on each line and return a vector of resulting items. One way I did this was like this:
// Fully working Rust code
let my_values: Vec<u32> = stdin
.lock()
.lines()
.filter_map(Result::ok)
.map(|line| line.parse::<u32>())
.filter_map(Result::ok)
.map(|x|x*2) // For example
.collect();
This works but of course silently ignores any errors that may occur. Now what I woud like to do is something along the lines of:
// Pseudo-ish code
let my_values: Result<Vec<u32>, X> = stdin
.lock()
.lines() // Can cause std::io::Error
.map(|line| line.parse::<u32>()) // Can cause std::num::ParseIntError
.map(|x| x*2)
.collect();
Where X is some kind of error type that I can match on afterwards. Preferably I want to perform the whole operation on one line at a time and immediately discard the string data after it has been parsed to an int.
I think I need to create some kind of Enum type to hold the various possible errors, possibly like this:
#[derive(Debug)]
enum InputError {
Io(std::io::Error),
Parse(std::num::ParseIntError),
}
However, I don't quite understand how to put everything together to make it clean and avoid having to explicitly match and cast everywhere. Also, is there some way to automatically create these enum error types or do I have to explicilty enumerate them every time I do this?
You're on the right track.
The way I'd approach this is by using the enum you've defined,
then add implementations of From for the error types you're interested in.
That will allow you to use the ? operator on your maps to get the kind of behaviour you want.
#[derive(Debug)]
enum MyError {
IOError(std::io::Error),
ParseIntError(std::num::ParseIntError),
}
impl From<std::io::Error> for MyError {
fn from(e:std::io::Error) -> MyError {
return MyError::IOError(e)
}
}
impl From<std::num::ParseIntError> for MyError {
fn from(e:std::num::ParseIntError) -> MyError {
return MyError::ParseIntError(e)
}
}
Then you can implement the actual transform as either
let my_values: Vec<_> = stdin
.lock()
.lines()
.map(|line| -> Result<u32,MyError> { Ok(line?.parse::<u32>()?*2) } )
.collect();
which will give you one entry for each input, like: {Ok(x), Err(MyError(x)), Ok(x)}.
or you can do:
let my_values: Result<Vec<_>,MyError> = stdin
.lock()
.lines()
.map(|line| -> Result<u32,MyError> { Ok(line?.parse::<u32>()?*2) } )
.collect();
Which will give you either Err(MyError(...)) or Ok([1,2,3])
Note that you can further reduce some of the error boilerplate by using an error handling crate like snafu, but in this case it's not too much.
This question already has answers here:
How do I collect into an array?
(10 answers)
Closed 5 years ago.
I'm implementing FromIterator for [MyStruct;4] where MyStruct is a small Copy struct. My current implementation is
fn from_iter<I: IntoIterator<Item=MyStruct>>(iter: I) -> Self {
let mut retval = [Default::default();4];
for (ret, src) in retval.iter_mut().zip(iter) {
*ret = src;
}
retval
}
This works just fine, however I'm not sure that the for loop is as idiomatic as it could be. Is there perhaps a method like Slice::fill(iter) that could accomplish this more cleanly (and perhaps more efficiently)?
Loops are OK and they generally optimize very well.
Another solution may be to collect() into an ArrayVec. It avoids having to fill the array with a default value first.
In Rust I've started writing iterators, converting them from code which took a callback function.
I ran into the problem where the code that used a callback in multiple branches of the function didn't convert so cleanly into a Rust iterator.
To give some pseudo-code.
// function using callbacks where the caller can exit at any time,
// can be used in a similar way to an iterator.
fn do_stuff(args, callback_fn(cb_args)) {
// define a, b, c... args
if callback_fn(a, b, 0) == false { return; }
for i in 0..n {
if callback_fn(c, d, i) == false { return; }
}
if callback_fn(e, f, -1) == false { return; }
}
Converting this to an iterator was rather awkward since I needed to store some state representing each branch.
impl Iterator for MyStruct {
fn next(&mut self) -> Option<MyResult> {
let out = match (self.state) {
0 => {
self.state += 1;
Some(MyResult(self.a, self.b, 0))
},
1 => {
self.i += 1;
if self.i == self.n {
self.state += 1;
}
Some(MyResult(self.c, self.d, self.i - 1))
},
2 => {
self.state += 1;
Some(MyResult(self.e, self.f, -1))
},
_ => {
None
},
}
return out;
}
// --- snip
With the example above, this is arguably acceptable, (if a little awkward). Consider cases with multiple for loops, variable scopes, where its much harder to track state.
While I didn't try these, I imagine there are some ways to achieve this which in most cases are less-then-ideal workarounds:
Using the callback version, building a vector, then iterating over it... (works but defeats the purpose of using an iterator, no way to early exit and avoid creating the entire data set for eg).
Writing an iterator which communicates with a thread that uses similar logic to the callback version.(while possible, the overhead of creating OS threads makes it a poor choice in many cases).
Besides the workarounds above:
Are there ways to write iterators like the example given, with less convoluted logic?Ideally more like the example that uses callbacks.
Otherwise are there other ways to handle this?
Or is this simply not supported in Rust?
Note, the same logic applies coming from Python generators (using yield instead of a callback, using callbacks as an example here since they're ubiquitous with first class functions).
Languages like C# and Python provide a way to generate iterators from methods written using a special yield keyword. As of Rust 1.11, there is no such feature in the language. However, such a feature is planned (see RFC) (indeed, yield is a reserved keyword!) and would likely work as in C# (i.e. the compiler would generate a struct with the necessary state and implementation for Iterator).
In the meantime, you could try Stateful, a project that attempts to provide this feature. (This blog post explains how Stateful works, and the challenges involved.)