What's the best way to write an iterator supporting multiple logic branches?

In Rust, I've started writing iterators, converting them from code that took a callback function.
I ran into a problem: code that invoked the callback from multiple branches of a function didn't convert cleanly into a Rust iterator.
Here is a sketch (the values a..f are placeholders):
// Function using a callback where the caller can stop at any time
// (by returning false), so it can be used much like an iterator.
fn do_stuff<F: FnMut(i32, i32, i32) -> bool>(n: i32, mut callback_fn: F) {
    // define a, b, c, ... (placeholder values)
    let (a, b, c, d, e, f) = (1, 2, 3, 4, 5, 6);
    if !callback_fn(a, b, 0) { return; }
    for i in 0..n {
        if !callback_fn(c, d, i) { return; }
    }
    callback_fn(e, f, -1); // last call: nothing left to skip
}
Converting this to an iterator was rather awkward since I needed to store some state representing each branch.
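// State for the iterator version; `state` records which branch we are in.
struct MyStruct {
    a: i32, b: i32, c: i32, d: i32, e: i32, f: i32,
    i: i32,    // loop index (state 1)
    n: i32,    // loop bound
    state: u8, // 0 = first item, 1 = loop, 2 = last item, 3 = done
}

struct MyResult(i32, i32, i32);

impl Iterator for MyStruct {
    type Item = MyResult;

    fn next(&mut self) -> Option<MyResult> {
        match self.state {
            0 => {
                self.state = 1;
                Some(MyResult(self.a, self.b, 0))
            }
            1 if self.i < self.n => {
                self.i += 1;
                Some(MyResult(self.c, self.d, self.i - 1))
            }
            1 => {
                // Loop finished (this also covers n == 0): move on to state 2.
                self.state = 2;
                self.next()
            }
            2 => {
                self.state = 3;
                Some(MyResult(self.e, self.f, -1))
            }
            _ => None,
        }
    }
}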
For the example above this is arguably acceptable (if a little awkward). But consider cases with multiple for loops and nested variable scopes, where it's much harder to track the state.
While I didn't try these, I imagine there are some ways to achieve this, which in most cases are less-than-ideal workarounds:
Using the callback version to build a vector, then iterating over it (this works, but it defeats the purpose of using an iterator: for example, there is no way to exit early and avoid creating the entire data set).
Writing an iterator that communicates with a thread running logic similar to the callback version (while possible, the overhead of creating OS threads makes it a poor choice in many cases).
Besides the workarounds above:
Are there ways to write iterators like the example given with less convoluted logic? Ideally, more like the example that uses callbacks.
Otherwise, are there other ways to handle this?
Or is this simply not supported in Rust?
Note: the same applies when coming from Python generators (which use yield instead of a callback; I'm using callbacks as the example here since they're ubiquitous in languages with first-class functions).

Languages like C# and Python provide a way to generate iterators from methods written using a special yield keyword. As of Rust 1.11, there is no such feature in the language. However, such a feature is planned (see RFC) (indeed, yield is a reserved keyword!) and would likely work as in C# (i.e. the compiler would generate a struct with the necessary state and implementation for Iterator).
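To make the idea of a compiler-generated state struct concrete, here is a hand-written sketch of roughly what such a generator could desugar to for the question's example. It is an illustration, not the output of any actual compiler: the i32 types, the tuple fields standing in for a..f, and the names MyIter and State are assumptions. An enum whose Loop variant carries the loop index keeps each piece of state next to the branch that uses it, which tends to scale better than a bare integer state counter.

```rust
/// Output item; mirrors the question's MyResult with assumed i32 fields.
#[derive(Debug, PartialEq)]
pub struct MyResult(pub i32, pub i32, pub i32);

/// One variant per branch of the callback version. The loop index lives
/// inside the variant that uses it instead of in a separate field.
#[derive(Clone, Copy)]
enum State {
    Start,
    Loop { i: i32 },
    End,
    Done,
}

pub struct MyIter {
    first: (i32, i32),  // (a, b)
    middle: (i32, i32), // (c, d)
    last: (i32, i32),   // (e, f)
    n: i32,             // loop bound
    state: State,
}

impl MyIter {
    pub fn new(first: (i32, i32), middle: (i32, i32), last: (i32, i32), n: i32) -> Self {
        MyIter { first, middle, last, n, state: State::Start }
    }
}

impl Iterator for MyIter {
    type Item = MyResult;

    fn next(&mut self) -> Option<MyResult> {
        // Loop so that a state with nothing to yield can hand over to the next one.
        loop {
            match self.state {
                State::Start => {
                    self.state = State::Loop { i: 0 };
                    return Some(MyResult(self.first.0, self.first.1, 0));
                }
                State::Loop { i } if i < self.n => {
                    self.state = State::Loop { i: i + 1 };
                    return Some(MyResult(self.middle.0, self.middle.1, i));
                }
                // Loop exhausted (also the n == 0 case): go on to the last item.
                State::Loop { .. } => self.state = State::End,
                State::End => {
                    self.state = State::Done;
                    return Some(MyResult(self.last.0, self.last.1, -1));
                }
                State::Done => return None,
            }
        }
    }
}
```

Early exit comes for free: take, find, or simply dropping the iterator stops the work, just as returning false does in the callback version.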
In the meantime, you could try Stateful, a project that attempts to provide this feature. (This blog post explains how Stateful works, and the challenges involved.)
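A different technique, not from the answer above: when each branch is itself a simple loop or a single value, the standard iterator adapters std::iter::once, Iterator::chain and Iterator::map can express the branches in the same order as the callback version, with no hand-written state at all. A minimal sketch, assuming i32 values and a plain tuple in place of the question's MyResult:

```rust
use std::iter;

/// Same sequence as the callback version: one item, then one item per
/// loop index, then one final item. Nothing is computed until pulled.
pub fn do_stuff(
    (a, b): (i32, i32),
    (c, d): (i32, i32),
    (e, f): (i32, i32),
    n: i32,
) -> impl Iterator<Item = (i32, i32, i32)> {
    iter::once((a, b, 0))
        .chain((0..n).map(move |i| (c, d, i)))
        .chain(iter::once((e, f, -1)))
}
```

Because the adapters are lazy, a caller can still stop early (for example with take or find) without producing the rest. This only goes so far: once branches share mutable state or depend on each other's results, a hand-written state machine or a generator is still needed.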

Using require() later in a function: should one handle the exceptions it throws?

I have a Kotlin class with a method
fun loadElements(e: Iterable<Int>) {
}
This then constructs a new copy of that Iterable as an ArrayList<Int> within the object.
It is a requirement that all the elements in that ArrayList<Int> be non-negative; it is considered a breach of contract by the caller if that is not met. I've been led to believe that a "breach of contract" is something to be tested with require(), whereas check() is for testing logic internal to that method. Is this correct?
All the examples I have seen have require() in the very first lines of the method. Is it, however, acceptable to call require() in a loop, like this?
public fun loadElements(e: Iterable<Int>) {
    elementArray.clear()
    e.forEach {
        require(it >= 0)
        elementArray.add(it)
        moduleCount += it
    }
    if (elementArray.size % 2 == 1)
        elementArray.add(0)
    check(elementArray.size % 2 == 0)
    computeInternalSizes()
}
The thing is, this means that part of the object's internals may already be set up by the time the require() breach is detected: i.e., moduleCount will be wrong and computeInternalSizes() will never get called.
Now, of course I could just use a separate pass: the first one checking the require() condition, then doing all the real work thereafter. But this would mean that if the input came in as a Sequence<Int>, it would have to be traversed twice, so it would need to support being iterated more than once.
If require() throws, I would like to assume that the program cannot continue because a design error has occurred somewhere. However, if someone catches the resulting exception and continues, I will end up with an incoherent object state.
What is best practice for handling conditions where incoming parameter breaches won't be noticed until some significant, unrewindable work has been done?
I tried using a separate pass for checking for non-negativity. This worked perfectly well but, given that it could be coming from a Sequence or similar, I don't want to have to build the whole sequence, and then traverse that sequence again.
I tried using check(). This works, but the failure then shows up as an inconsistency in the object's state rather than as a failed validation of the incoming parameter. That makes a breach of contract look like an internal design fault, and just delays the inevitable.
I've tried putting try/catch/finally all over the place, but this is an excessive amount of code for such a simple thing.
I'm not even sure if a program should attempt recovery if a require() fails.
In general, you avoid situations like this by reducing the scope of mutability in your code.
The difference between require and check is mostly a convention. They throw different exceptions, namely IllegalArgumentException and IllegalStateException respectively. As the exception types suggest, the former is suited for validating the (user) input to a method, whereas the latter is designed to check intermediate states at runtime.
Exceptions in Kotlin should be treated as exceptional: they signal conditions that should not occur during regular operation. See also the Kotlin documentation on why there are no checked exceptions in Kotlin.
You did not give the name of your surrounding Kotlin class, so I'll call it Foo for the time being.
Rather than providing a function on Foo that mutates the internal state of Foo, you could create new instances of Foo based on the Iterable<Int> / Sequence<Int>. This way, you only ever have a Foo object when it's in a valid state.
private class Foo(source: Iterable<Int>) {
    private val elementArray = ArrayList<Int>()
    private val moduleCount: Int

    init {
        var internalCount = 0
        for (count in source) {
            require(count >= 0) // elements must be non-negative
            elementArray.add(count)
            internalCount += count
        }
        moduleCount = internalCount
        if (elementArray.size % 2 == 1) {
            elementArray.add(0)
        }
        check(elementArray.size % 2 == 0)
        // ...
    }
}
Alternatively, if you want / need to keep the interface as described in your question but also avoid the invalid state, you could make use of an intermediate copy.
As you're copying the incoming Iterable<Int> / Sequence<Int> into an ArrayList<Int> I assume you're not working with very large collections.
private class Foo {
    private val elementArray = ArrayList<Int>()
    private var moduleCount = 0

    public fun loadElements(source: Iterable<Int>) {
        // First pass: validate while copying, before touching any state.
        val internalCopy = ArrayList<Int>()
        for (count in source) {
            require(count >= 0)
            internalCopy.add(count)
        }
        // Second pass: the input is known to be valid, so commit it.
        elementArray.clear()
        moduleCount = 0
        for (count in internalCopy) {
            elementArray.add(count)
            moduleCount += count
        }
        if (elementArray.size % 2 == 1) {
            elementArray.add(0)
        }
        check(elementArray.size % 2 == 0)
        // ...
    }
}

How do I handle errors from libc functions in an idiomatic Rust manner?

libc functions usually signal an error by returning something < 0. I find myself doing this over and over:
let pid = unsafe { libc::fork() };
if pid < 0 {
    // Please disregard the fact that `Err(pid)`
    // should be a `&str` or an enum
    return Err(pid);
}
I find it ugly that this needs 3 lines of error handling, especially considering that these tests are quite frequent in this kind of code.
Is there a way to return an Err in case fork() returns < 0?
I found two things which are close:
assert_eq!. This needs another line, and it panics, so the caller cannot handle the error.
Using traits like these:
pub trait LibcResult<T> {
    fn to_option(&self) -> Option<T>;
}
impl LibcResult<i64> for i32 {
    fn to_option(&self) -> Option<i64> {
        // widen the i32 return value to i64 on success
        if *self < 0 { None } else { Some(i64::from(*self)) }
    }
}
I could write fork().to_option().expect("could not fork"). This is now only one line, but it panics instead of returning an Err. I guess this could be solved using ok_or.
Some libc functions use < 0 as the error sentinel (e.g. fork), while others use > 0 (e.g. pthread_attr_init), so this would need another argument.
Is there something out there which solves this?
As indicated in the other answer, use pre-made wrappers whenever possible. Where such wrappers do not exist, the following guidelines might help.
Return Result to indicate errors
The idiomatic Rust return type that includes error information is Result (std::result::Result). For most functions from POSIX libc, the specialized type std::io::Result is a perfect fit because it uses std::io::Error to encode errors, and it includes all standard system errors represented by errno values. A good way to avoid repetition is using a utility function such as:
use std::io::{Result, Error};
fn check_err<T: Ord + Default>(num: T) -> Result<T> {
if num < T::default() {
return Err(Error::last_os_error());
}
Ok(num)
}
Wrapping fork() would look like this:
pub fn fork() -> Result<u32> {
check_err(unsafe { libc::fork() }).map(|pid| pid as u32)
}
The use of Result allows idiomatic usage such as:
let pid = fork()?; // ? means return if Err, unwrap if Ok
if pid == 0 {
// child
...
}
Restrict the return type
The function will be easier to use if the return type is modified so that only "possible" values are included. For example, if a function logically has no return value, but returns an int only to communicate the presence of error, the Rust wrapper should return nothing:
pub fn dup2(oldfd: i32, newfd: i32) -> Result<()> {
check_err(unsafe { libc::dup2(oldfd, newfd) })?;
Ok(())
}
Another example is functions that logically return an unsigned integer, such as a PID or a file descriptor, but still declare their result as signed to include the -1 error return value. In that case, consider returning an unsigned value in Rust, as in the fork() example above. nix takes this one step further by having fork() return Result<ForkResult>, where ForkResult is a real enum with methods such as is_child(), and from which the PID is extracted using pattern matching.
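A sketch of that enum-based design, written without calling fork() so it can be tried anywhere: ForkOutcome and classify_fork_return are hypothetical names, and the enum is modeled on the description of nix's ForkResult above rather than copied from nix. The conversion maps the three kinds of raw return value onto an error, a Child variant, and a Parent variant carrying the PID.

```rust
use std::io::{Error, Result};

/// Hypothetical enum in the spirit of nix's ForkResult (not nix's actual
/// definition): the two successful outcomes of fork() become two variants.
#[derive(Debug, PartialEq)]
pub enum ForkOutcome {
    Parent { child: u32 },
    Child,
}

/// Converts a raw fork()-style return value: negative means error (details
/// read from errno), 0 means "this is the child", positive is the child's PID.
pub fn classify_fork_return(ret: i32) -> Result<ForkOutcome> {
    match ret {
        r if r < 0 => Err(Error::last_os_error()),
        0 => Ok(ForkOutcome::Child),
        pid => Ok(ForkOutcome::Parent { child: pid as u32 }),
    }
}
```

A wrapper built on this lets callers write match fork()? { ForkOutcome::Parent { child } => ..., ForkOutcome::Child => ... } without ever comparing against 0.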
Use options and other enums
Rust has a rich type system that allows expressing things that have to be encoded as magic values in C. To return to the fork() example, that function returns 0 in the child process. This would be naturally expressed with an Option and can be combined with the Result shown above:
pub fn fork() -> Result<Option<u32>> {
    let pid = check_err(unsafe { libc::fork() })? as u32;
    if pid != 0 {
        Ok(Some(pid)) // parent: pid is the child's PID
    } else {
        Ok(None) // child
    }
}
The user of this API would no longer need to compare with the magic value, but would use pattern matching, for example:
if let Some(child_pid) = fork()? {
// execute parent code
} else {
// execute child code
}
Return values instead of using output parameters
C often returns values using output parameters, pointer parameters into which the results are stored. This is either because the actual return value is reserved for the error indicator, or because more than one value needs to be returned, and returning structs was badly supported by historical C compilers.
In contrast, Rust's Result carries the return value separately from the error information, and Rust has no problem whatsoever with returning multiple values. Multiple values returned as a tuple are much more ergonomic than output parameters because they can be used in expressions or captured using pattern matching.
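As an illustration of turning output parameters into a returned tuple, here is a sketch built around a hypothetical C-style function, c_style_divmod (not a real libc function). It mimics the C convention (a status code as the return value, results written through output parameters), but uses &mut references instead of raw pointers so the example runs without libc. The wrapper divmod returns both results together inside a Result.

```rust
use std::io::{Error, ErrorKind, Result};

/// Hypothetical C-style function: returns 0 on success and writes its two
/// results through output parameters, or returns -1 on error.
fn c_style_divmod(a: i32, b: i32, quot: &mut i32, rem: &mut i32) -> i32 {
    match (a.checked_div(b), a.checked_rem(b)) {
        (Some(q), Some(r)) => {
            *quot = q;
            *rem = r;
            0
        }
        _ => -1, // division by zero or overflow
    }
}

/// Idiomatic wrapper: both results are returned together as a tuple,
/// and the status code becomes the Err case.
pub fn divmod(a: i32, b: i32) -> Result<(i32, i32)> {
    let (mut quot, mut rem) = (0, 0);
    if c_style_divmod(a, b, &mut quot, &mut rem) < 0 {
        return Err(Error::new(ErrorKind::InvalidInput, "division by zero or overflow"));
    }
    Ok((quot, rem))
}
```

Callers can then destructure directly, e.g. let (q, r) = divmod(17, 5)?; with no mutable locals declared in advance.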
Wrap system resources in owned objects
When returning handles to system resources, such as file descriptors or Windows handles, it is good practice to return them wrapped in an object that implements Drop to release them. This makes it less likely that a user of the wrapper will make a mistake, and it makes the use of return values more idiomatic: there is no need for awkward invocations of close(), and no resource leaks from forgetting them.
Taking pipe() as an example:
use std::fs::File;
use std::os::unix::io::FromRawFd;
pub fn pipe() -> Result<(File, File)> {
let mut fds = [0 as libc::c_int; 2];
check_err(unsafe { libc::pipe(fds.as_mut_ptr()) })?;
Ok(unsafe { (File::from_raw_fd(fds[0]), File::from_raw_fd(fds[1])) })
}
// Usage:
// let (r, w) = pipe()?;
// ... use r and w as normal File objects
This pipe() wrapper returns multiple values and uses a wrapper object to refer to a system resource. Also, it returns the File objects defined in the Rust standard library and accepted by Rust's IO layer.
The best option is to not reimplement the universe. Instead, use nix, which wraps everything for you and has done the hard work of converting all the error types and handling the sentinel values:
pub fn fork() -> Result<ForkResult>
Then just use normal error handling like try! or ?.
Of course, you could rewrite all of nix by converting your trait to return Results that include the specific error codes, and then use try! or ?, but why would you?
There's nothing magical in Rust that converts negative or positive numbers into a domain specific error type for you. The code you already have is the correct approach, once you've enhanced it to use a Result either by creating it directly or via something like ok_or.
An intermediate solution would be to reuse nix's Errno struct, perhaps with your own trait sugar on top.
so this would need another argument
I'd say it would be better to have different methods: one for negative sentinel values and one for positive sentinel values.
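A minimal sketch of that two-method approach, using only std::io::Error: check_negative (the same idea as check_err above, specialized to i32) is for functions that return a negative value and set errno, while check_error_number is for functions such as pthread_attr_init that return 0 on success or the error number itself, without setting errno. Both names are hypothetical.

```rust
use std::io::{Error, Result};

/// For functions like fork() or dup2(): negative means failure and the
/// details are in errno.
pub fn check_negative(ret: i32) -> Result<i32> {
    if ret < 0 {
        Err(Error::last_os_error())
    } else {
        Ok(ret)
    }
}

/// For pthread-style functions: 0 means success, any other value *is* the
/// error number, so it is converted directly instead of reading errno.
pub fn check_error_number(ret: i32) -> Result<()> {
    if ret == 0 {
        Ok(())
    } else {
        Err(Error::from_raw_os_error(ret))
    }
}
```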

What's the best way to count indirect function calls conditionally using cfg attributes?

When writing a test, I would like to know how many times a function is called, since bad logic may yield a correct result even when excessive and unnecessary function calls are performed.
To give some context, this is a tree-search function running a test on a fixed data set, however that isn't important to the answer.
I'm currently using a static mutable variable; however, this means every access needs to be marked as unsafe:
#[cfg(test)]
static mut TOTAL_CALLS: usize = 0;

fn function_to_count() {
    #[cfg(test)]
    unsafe {
        TOTAL_CALLS += 1;
    }
    // do stuff
}

#[test]
fn some_test() {
    // do stuff, indirectly call function_to_count().
    // Even reading a `static mut` requires `unsafe`.
    assert!(unsafe { TOTAL_CALLS } < 100);
}
It would be good to avoid having to put unsafe into the code.
Is there a better way to count indirect function calls in Rust?
Mutable statics are unsafe because they're global, and could be accessed from any thread at any time. The simplest solution is to change the definition of the function in question to take some kind of "counter" interface that keeps track of calls. You can avoid performance problems by using generics plus a "dummy" implementation that does nothing.
// Use a callable because I'm feeling lazy.
fn function_to_count<Count: FnMut()>(count: &mut Count) {
count();
// ...
}
#[cfg(test)]
#[test]
fn some_test() {
let mut count = 0;
for _ in 0..10 {
function_to_count(&mut || count += 1);
}
assert_eq!(count, 10);
}
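The "counter interface plus dummy implementation" idea can also be written with a trait instead of a closure. A sketch with hypothetical names (CallCounter, NoCount, Counter, and a toy recursive visit_depth standing in for the tree search): production code passes NoCount, whose empty record_call should usually be optimized away after monomorphization, while tests pass a Counter and inspect it afterwards. No statics are involved, so parallel tests cannot interfere with each other.

```rust
/// Hypothetical counter interface (not from any crate).
pub trait CallCounter {
    fn record_call(&mut self);
}

/// Production implementation: does nothing.
pub struct NoCount;

impl CallCounter for NoCount {
    fn record_call(&mut self) {}
}

/// Test implementation: counts calls.
#[derive(Default)]
pub struct Counter {
    pub calls: usize,
}

impl CallCounter for Counter {
    fn record_call(&mut self) {
        self.calls += 1;
    }
}

/// The instrumented function, generic over the counter.
pub fn function_to_count<C: CallCounter>(counter: &mut C) {
    counter.record_call();
    // ... do stuff
}

/// Toy recursive search: calls the instrumented function once per node of
/// a complete binary tree of the given depth, threading the counter through.
pub fn visit_depth<C: CallCounter>(depth: u32, counter: &mut C) {
    function_to_count(counter);
    if depth > 0 {
        visit_depth(depth - 1, counter);
        visit_depth(depth - 1, counter);
    }
}
```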
You should really, seriously do that, and not what I'm about to describe:
The other solution is to use a thread-safe construct.
A word of warning: do not use this if you have more than one test! The test runner will, by default, run tests in parallel. As such, if you have more than one test calling into the instrumented function, you will get corrupt results. You'd have to write some kind of exclusive locking mechanism and somehow teach the function to "know" which run it's a part of, and at that point, you should just use the previously described solution instead. You could also disable parallel tests, but I believe you can only do that from outside the code, and that's just asking for someone to forget and run into weird failures as a result.
But anyway...
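#[cfg(test)]
use std::sync::atomic::{AtomicUsize, Ordering};

// AtomicUsize::new is a const fn, so it can initialize a static directly
// (ATOMIC_USIZE_INIT is deprecated).
#[cfg(test)]
static TOTAL_CALLS: AtomicUsize = AtomicUsize::new(0);

fn function_to_count() {
    // An attribute rather than `if cfg!(test)`: cfg! still compiles the
    // branch, which fails in non-test builds where TOTAL_CALLS doesn't exist.
    #[cfg(test)]
    TOTAL_CALLS.fetch_add(1, Ordering::SeqCst);
    // ...
}

#[cfg(test)]
#[test]
fn some_test() {
    for _ in 0..10 {
        function_to_count();
    }
    assert_eq!(TOTAL_CALLS.load(Ordering::SeqCst), 10);
}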
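Another option, not mentioned above, is a thread-local counter instead of a global one. The default test runner runs each test on its own thread, so as long as the instrumented function is called on the test's own thread (not, say, from a thread pool), parallel tests don't corrupt each other's counts, and no unsafe is needed. A sketch under those assumptions; the helper calls_on_this_thread is hypothetical, and the counter is left ungated so the sketch runs in any build, whereas real code would put #[cfg(test)] on the thread_local! block, the increment, and the helper.

```rust
use std::cell::Cell;

thread_local! {
    // One counter per thread; each thread starts at 0.
    static TOTAL_CALLS: Cell<usize> = Cell::new(0);
}

pub fn function_to_count() {
    TOTAL_CALLS.with(|c| c.set(c.get() + 1));
    // ... do stuff
}

/// Hypothetical helper: number of calls made so far on the current thread.
pub fn calls_on_this_thread() -> usize {
    TOTAL_CALLS.with(|c| c.get())
}
```

Reading the counter before and after the code under test, rather than assuming it starts at zero, keeps the assertion valid even if a thread happens to be reused.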

Why do generators not support map()?

It seems utterly natural to me that generators, which function very much like Arrays, should support the very basic list operations, like map(), filter(), and reduce(). Am I missing something?
I wrote the code for map and it seems simple enough, but it would be much better to have all the functions embedded in all the generators:
let fancyGen = g => {
let rv = function*() {
for (let x of g)
yield x;
}
rv.map = function*(p) {
for (let x of g)
yield p(x);
}
return rv;
}
I'm new to generators, so any comments on the code are welcome. In particular, is that the best way to write "the identity generator"?
Why do generators not support map()?
Because it's too easy to fill in as a userland implementation. ES3 didn't include Array iteration methods either; maybe we'll see transformers for iterators in ES7 :-)
generators, which function very much like Arrays
No, please stop and distinguish iterators from generators:
An iterator is an object with a .next() method that conforms to the iterator protocol.
A generator is an iterator created by a generator function (function*). Its .next() method takes an argument which is the result of each yield inside the generator function. It also has .return() and .throw() methods.
You'll mostly be interested in iterators, where we don't pass values to next and don't care about the end result, just like for...of loops do. We can easily extend them with the desired methods:
var IteratorPrototype = Object.getPrototypeOf(Object.getPrototypeOf([][Symbol.iterator]()));
IteratorPrototype.map = function*(f) {
for (var x of this)
yield f(x);
};
IteratorPrototype.filter = function*(p) {
for (var x of this)
if (p(x))
yield x;
};
IteratorPrototype.scan = function*(f, acc) {
for (var x of this)
yield acc = f(acc, x);
return acc;
};
IteratorPrototype.reduce = function(f, acc) {
for (var x of this)
acc = f(acc, x);
return acc;
};
These should suffice for the start, and most common use cases. A proper library will extend this to generators so that values are passed through appropriately, and also will deal with the problem that iterators can be used only once before they are exhausted (in contrast to arrays).

How do I exit a Rust program early from outside the main function?

I am in the process of writing a bash clone in Rust. I need my program to exit when the user types exit. In previous iterations of my program, before I added more complicated features, I used return to get out of the loop that was prompting the user for input. This logic is now in a function, because of the way I am implementing built-in shell functions, so when I return, it just jumps out of the function back into the control loop instead of short-circuiting the control loop and ending the program.
I realize that I could probably return a boolean when the user types exit and exit the loop, but I would like to at least know if Rust has a way to terminate programs early, similar to Java's System.exit(), as this is useful for certain types of programs.
Rust 1.0 stable
std::process::exit() does exactly that - it terminates the program with the specified exit code:
use std::process;
fn main() {
for i in 0..10 {
if i == 5 {
process::exit(1);
}
println!("{}", i);
}
}
This function causes the program to terminate immediately, without unwinding and running destructors, so it should be used sparingly.
Alternative (not recommended) solution
You can use the C API directly. Add libc = "0.2" to the [dependencies] section of Cargo.toml, and:
fn main() {
for i in 0..10 {
if i == 5 {
unsafe { libc::exit(1); }
}
println!("{}", i);
}
}
Calls to C functions cannot be verified by the Rust compiler, so this requires the unsafe block. Rust destructors will not run, so resources used by the program will not be freed properly. This may cause problems such as hanging sockets.
As far as I understand, the proper way to exit from the program is to terminate all threads somehow, then the process will exit automatically.
panic!("Oh no something bad has happened!")
Example:
if a * g < 0f32 { panic!("The arithmetic-geometric mean is undefined for numbers less than zero!"); }
In older documentation, you will see this as fail!("Oh no something bad here has happened.")
For some reason, this macro was changed from fail to panic. Panic is the way to fail, if you must.
[edit] I am sorry. It looks like you should be testing input for the string "exit," which would depend on how you are taking input (by line or by args). Then you can have the program break out of the loop on the condition that the exit is detected.
Example:
loop {
if exit_found { break }
else {
// your thing, which also looks for exit_found
}
}
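The "return a value instead of exiting" approach from the question and the edit above can be sketched like this: built-ins report what the control loop should do via a small enum, the loop returns the exit code, and only main calls std::process::exit, after everything the loop owned has been dropped. Flow, run_builtin and control_loop are hypothetical names; only exit is handled, and an exit with no or an unparsable argument uses code 0 here, which is a choice of this sketch rather than bash's exact behavior.

```rust
/// What a built-in asks the control loop to do next (hypothetical type).
#[derive(Debug, PartialEq)]
enum Flow {
    Continue,
    Exit(i32),
}

/// Hypothetical built-in dispatcher: only `exit [code]` is handled here.
fn run_builtin(line: &str) -> Flow {
    let mut words = line.split_whitespace();
    match words.next() {
        Some("exit") => {
            // Missing or unparsable argument falls back to 0 in this sketch.
            let code = words.next().and_then(|w| w.parse().ok()).unwrap_or(0);
            Flow::Exit(code)
        }
        _ => Flow::Continue,
    }
}

/// Runs until a built-in asks to exit, then returns the exit code instead
/// of terminating the process itself. End of input behaves like `exit 0`.
fn control_loop<I: IntoIterator<Item = String>>(lines: I) -> i32 {
    for line in lines {
        if let Flow::Exit(code) = run_builtin(&line) {
            return code;
        }
    }
    0
}

// In a real shell, main would read lines from stdin and exit only at the end:
//
// use std::io::BufRead;
// fn main() {
//     let stdin = std::io::stdin();
//     let code = control_loop(stdin.lock().lines().map_while(Result::ok));
//     std::process::exit(code);
// }
```

On Rust 1.61 and later, main can instead return std::process::ExitCode (for example ExitCode::from(code as u8)), so that main itself returns normally and no explicit exit call is needed.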