Does Rust optimize for loops over calculated ranges?

As an exercise I'm trying to micro-optimize code in Rust 1.3.0. I have a loop nested inside another loop, iterating over an array. Something like this:
loop {
for i in 0..arr.len() {
// something happens here
}
}
Since arrays are fixed size in Rust, will the compiler optimize the code by evaluating arr.len() just once and reusing the value, or will the expression be evaluated with each pass of the top-level loop? The question can be expanded to more calculation-heavy functions without side-effects, other than arr.len().
In other words, would the above code be equivalent to this:
let arr_len = arr.len();
loop {
for i in 0..arr_len {
// something happens here
}
}

The .. is a range operator, which forms a Range<Idx> object (or a derivative: RangeFrom, RangeFull or RangeTo). Those objects only contain indexes (the Idx type), so you can rest assured that .len() is evaluated only once, when the range is constructed at the start of the for loop, and not again on each iteration.
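To make the point about range construction concrete, here is a minimal sketch. The helpers `counted_len` and `run` are hypothetical names introduced only for illustration. The sketch shows language semantics only (how often the bound expression runs), not what the optimizer does afterwards.

```rust
use std::cell::Cell;

/// Hypothetical helper: returns `n` and records that it was called.
fn counted_len(calls: &Cell<u32>, n: usize) -> usize {
    calls.set(calls.get() + 1);
    n
}

/// Runs `outer` passes of a `for` loop over `0..counted_len(..)` and returns
/// (number of times the bound was evaluated, number of body executions).
fn run(outer: u32, n: usize) -> (u32, u32) {
    let calls = Cell::new(0);
    let mut body_runs = 0;
    for _ in 0..outer {
        // The range is built once here, each time the inner loop starts.
        for _i in 0..counted_len(&calls, n) {
            body_runs += 1;
        }
    }
    (calls.get(), body_runs)
}

fn main() {
    let (calls, body_runs) = run(3, 5);
    println!("bound evaluated {} times, body ran {} times", calls, body_runs);
}
```

So the bound is evaluated once per pass of the outer loop, not once per inner iteration. Whether it is then hoisted out of the outer loop as well is up to the optimizer.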
In general, it is a good idea to inspect the LLVM IR. If you have a synthetic example, you can use the playground easily enough. For example:
// A black-box prevents optimization, and its calls are easy to spot.
extern {
fn doit(i: i32) -> ();
}
fn main() {
let arr = [1, 2, 3, 4, 5];
for i in 0..arr.len() {
unsafe { doit(arr[i]); }
}
}
Yields the following function:
; Function Attrs: uwtable
define internal void @_ZN4main20hd87dea49c835fe43laaE() unnamed_addr #1 {
entry-block:
  tail call void @doit(i32 1)
  tail call void @doit(i32 2)
  tail call void @doit(i32 3)
  tail call void @doit(i32 4)
  tail call void @doit(i32 5)
  ret void
}
In this case, with a fixed length, there is no loop at all: it has been unrolled.

At least in a quick check using arr.len() nested inside another loop, no code seems to be generated for the "call" to arr.len() at all. The size of the array is simply hard-coded into the generated code.
In other words, I would not expect your second snippet to execute any faster than the first one.

Iterating over a slice's values instead of references in Rust?

When looping over a slice of structs, the value I get is a reference (which is fine), however in some cases it's annoying to have to write var as (*var) in many places.
Is there a better way to avoid re-declaring the variable?
fn my_fn(slice: &[MyStruct]) {
for var in slice {
let var = *var; // <-- how to avoid this?
// Without the line above, errors in comments occur:
other_fn(var); // <-- expected struct `MyStruct`, found reference
if var != var.other {
// ^^ trait `&MyStruct: std::cmp::PartialEq<MyStruct>` not satisfied
foo();
}
}
}
See: actual error output (more cryptic).
You can remove the reference by destructuring in the pattern:
// |
// v
for &var in slice {
other_fn(var);
}
However, this only works for Copy-types! If you have a type that doesn't implement Copy but does implement Clone, you could use the cloned() iterator adapter; see Chris Emerson's answer for more information.
In some cases you can iterate directly on values if you can consume the iterable, e.g. using Vec::into_iter().
With slices, you can use cloned or copied on the iterator:
fn main() {
let v = vec![1, 2, 3];
let slice = &v[..];
for u in slice.iter().cloned() {
let u: usize = u; // prove it's really usize, not &usize
println!("{}", u);
}
}
This relies on the item implementing Clone or Copy, but if it doesn't you probably do want references after all.
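For reference, a small sketch putting the three options from the answers above side by side. `Point`, `sum_by_pattern`, `sum_by_copied` and `shout` are made-up names for illustration; `copied()` assumes a reasonably recent Rust (it was stabilized well after the original question).

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Destructuring with `&p` in the pattern: `p` is a `Point`, not a `&Point`.
/// Only compiles because `Point` is `Copy`.
fn sum_by_pattern(points: &[Point]) -> i32 {
    let mut total = 0;
    for &p in points {
        total += p.x + p.y;
    }
    total
}

/// Same result using the `copied()` adapter on the slice iterator.
fn sum_by_copied(points: &[Point]) -> i32 {
    points.iter().copied().map(|p| p.x + p.y).sum()
}

/// `String` is `Clone` but not `Copy`, so `cloned()` is needed to get owned values.
fn shout(names: &[String]) -> Vec<String> {
    names
        .iter()
        .cloned()
        .map(|mut s| {
            s.push('!'); // mutating is fine: `s` is an owned clone
            s
        })
        .collect()
}

fn main() {
    let points = [Point { x: 1, y: 2 }, Point { x: 3, y: 4 }];
    println!("{}", sum_by_pattern(&points));
    println!("{}", sum_by_copied(&points));
    println!("{:?}", shout(&["a".to_string(), "b".to_string()]));
}
```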

What's the best way to write an iterator supporting multiple logic branches?

In Rust I've started writing iterators, converting them from code which took a callback function.
I ran into the problem where the code that used a callback in multiple branches of the function didn't convert so cleanly into a Rust iterator.
To give some pseudo-code.
// function using callbacks where the caller can exit at any time,
// can be used in a similar way to an iterator.
fn do_stuff(args, callback_fn(cb_args)) {
// define a, b, c... args
if callback_fn(a, b, 0) == false { return; }
for i in 0..n {
if callback_fn(c, d, i) == false { return; }
}
if callback_fn(e, f, -1) == false { return; }
}
Converting this to an iterator was rather awkward since I needed to store some state representing each branch.
impl Iterator for MyStruct {
    type Item = MyResult;
    fn next(&mut self) -> Option<MyResult> {
        // `state` records which branch of the original function we are in.
        let out = match self.state {
            0 => {
                self.state += 1;
                Some(MyResult(self.a, self.b, 0))
            },
            1 => {
                self.i += 1;
                if self.i == self.n {
                    self.state += 1;
                }
                Some(MyResult(self.c, self.d, self.i - 1))
            },
            2 => {
                self.state += 1;
                Some(MyResult(self.e, self.f, -1))
            },
            _ => {
                None
            },
        };
        return out;
    }
// --- snip
With the example above, this is arguably acceptable (if a little awkward). Consider cases with multiple for loops and variable scopes, where it's much harder to track state.
While I didn't try these, I imagine there are some ways to achieve this which in most cases are less-than-ideal workarounds:
Using the callback version, building a vector, then iterating over it (works, but defeats the purpose of using an iterator: there is no way to exit early and avoid creating the entire data set, for example).
Writing an iterator which communicates with a thread that uses similar logic to the callback version (while possible, the overhead of creating OS threads makes it a poor choice in many cases).
Besides the workarounds above:
Are there ways to write iterators like the example given, with less convoluted logic? Ideally more like the example that uses callbacks.
Otherwise are there other ways to handle this?
Or is this simply not supported in Rust?
Note, the same logic applies coming from Python generators (using yield instead of a callback, using callbacks as an example here since they're ubiquitous with first class functions).
Languages like C# and Python provide a way to generate iterators from methods written using a special yield keyword. As of Rust 1.11, there is no such feature in the language. However, such a feature is planned (see RFC) (indeed, yield is a reserved keyword!) and would likely work as in C# (i.e. the compiler would generate a struct with the necessary state and implementation for Iterator).
In the meantime, you could try Stateful, a project that attempts to provide this feature. (This blog post explains how Stateful works, and the challenges involved.)
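Not from the answer above, but for simple shapes like the one in the question, the standard iterator adapters can sometimes stand in for a hand-written state machine. The following is only a sketch: `MyResult` here is a stand-in for the question's type, the tuple parameters are an assumption about what a, b, c, d, e and f look like, and returning `impl Iterator` needs a newer Rust than the version mentioned above.

```rust
use std::iter::once;

/// Stand-in for the question's `MyResult` (assumed to hold three integers).
#[derive(Debug, Clone, Copy, PartialEq)]
struct MyResult(i32, i32, i32);

/// Yields the same sequence the callback version would pass to `callback_fn`:
/// one leading item, `n` items from the loop, one trailing item.
fn do_stuff(
    first: (i32, i32),
    middle: (i32, i32),
    last: (i32, i32),
    n: i32,
) -> impl Iterator<Item = MyResult> {
    once(MyResult(first.0, first.1, 0))
        .chain((0..n).map(move |i| MyResult(middle.0, middle.1, i)))
        .chain(once(MyResult(last.0, last.1, -1)))
}

fn main() {
    // The caller can stop at any time, just like returning `false` from the callback.
    for item in do_stuff((1, 2), (3, 4), (5, 6), 2) {
        if item.2 == 1 {
            break;
        }
        println!("{:?}", item);
    }
}
```

This stays lazy (nothing is collected up front), but it only helps when each branch can itself be written as an iterator. Deeply nested control flow with shared mutable state still ends up needing something like the explicit state machine.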

What's the best way to count indirect function calls conditionally using cfg attributes?

When writing a test, I would like to know how many times a function is called, since bad logic may yield a correct result even when excessive and unnecessary function calls are performed.
To give some context, this is a tree-search function running a test on a fixed data set, however that isn't important to the answer.
I'm currently using a static mutable variable, however this means every access needs to be marked as unsafe:
#[cfg(test)]
static mut total_calls: usize = 0;
fn function_to_count() {
#[cfg(test)]
unsafe {
total_calls += 1;
}
// do stuff
}
#[test]
fn some_test() {
// do stuff, indirectly call function_to_count().
assert!(unsafe { total_calls } < 100); // reading a mutable static also needs unsafe
}
It would be good to avoid having to put unsafe into the code.
Is there a better way to count indirect function calls in Rust?
Mutable statics are unsafe because they're global, and could be accessed from any thread at any time. The simplest solution is to change the definition of the function in question to take some kind of "counter" interface that keeps track of calls. You can avoid performance problems by using generics plus a "dummy" implementation that does nothing.
// Use a callable because I'm feeling lazy.
fn function_to_count<Count: FnMut()>(count: &mut Count) {
count();
// ...
}
#[cfg(test)]
#[test]
fn some_test() {
let mut count = 0;
for _ in 0..10 {
function_to_count(&mut || count += 1);
}
assert_eq!(count, 10);
}
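The "counter interface plus dummy implementation" idea mentioned above could also be spelled out with a trait instead of a closure. This is only a sketch; `CallCounter`, `NoCount`, `Tally` and `visit` are hypothetical names, and `visit` is a toy recursive function standing in for the real tree search.

```rust
/// Hypothetical counter interface passed into the instrumented function.
trait CallCounter {
    fn record(&mut self);
}

/// Dummy implementation for production use: does nothing, so the call can be
/// optimized away once the generic function is monomorphized.
struct NoCount;
impl CallCounter for NoCount {
    #[inline]
    fn record(&mut self) {}
}

/// Counting implementation for tests.
struct Tally(usize);
impl CallCounter for Tally {
    fn record(&mut self) {
        self.0 += 1;
    }
}

/// Toy recursive function: returns the number of leaves of a full binary
/// tree of the given depth, recording every call along the way.
fn visit<C: CallCounter>(counter: &mut C, depth: u32) -> u32 {
    counter.record();
    if depth == 0 {
        1
    } else {
        visit(counter, depth - 1) + visit(counter, depth - 1)
    }
}

fn main() {
    let mut tally = Tally(0);
    let leaves = visit(&mut tally, 3);
    println!("{} leaves, {} calls", leaves, tally.0);
    // Production callers pass the no-op counter instead.
    println!("{} leaves", visit(&mut NoCount, 3));
}
```

Since each test owns its own `Tally`, tests running in parallel cannot disturb each other's counts.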
You should really, seriously do that, and not what I'm about to describe:
The other solution is to use a thread-safe construct.
A word of warning: do not use this if you have more than one test! The test runner will, by default, run tests in parallel. As such, if you have more than one test calling into the instrumented function, you will get corrupt results. You'd have to write some kind of exclusive locking mechanism and somehow teach the function to "know" which run it's a part of, and at that point, you should just use the previously described solution instead. You could also disable parallel tests, but I believe you can only do that from outside the code, and that's just asking for someone to forget and run into weird failures as a result.
But anyway...
#[cfg(test)]
use std::sync::atomic::{AtomicUsize, Ordering};
#[cfg(test)]
static TOTAL_CALLS: AtomicUsize = AtomicUsize::new(0);
fn function_to_count() {
    // `if cfg!(test)` would still need TOTAL_CALLS to exist in non-test
    // builds, so the statement itself is gated instead.
    #[cfg(test)]
    TOTAL_CALLS.fetch_add(1, Ordering::SeqCst);
    // ...
}
#[cfg(test)]
#[test]
fn some_test() {
for _ in 0..10 {
function_to_count();
}
assert_eq!(TOTAL_CALLS.load(Ordering::SeqCst), 10);
}
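One more variation, not part of the answer above and offered only as a sketch: if the instrumented function always runs on the same thread as the test that calls it, a thread-local counter sidesteps the parallel-test problem, because the default test runner gives each test its own thread. The `#[cfg(test)]` gating is left out here so the example runs on its own, and `calls_on_this_thread` is a made-up helper name.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread gets its own independent counter, starting at 0.
    static TOTAL_CALLS: Cell<usize> = Cell::new(0);
}

fn function_to_count() {
    TOTAL_CALLS.with(|c| c.set(c.get() + 1));
    // ...
}

/// Hypothetical helper: how many calls have been recorded on the current thread.
fn calls_on_this_thread() -> usize {
    TOTAL_CALLS.with(|c| c.get())
}

fn main() {
    for _ in 0..10 {
        function_to_count();
    }
    // A second thread does not see (or disturb) the first thread's count.
    let other = thread::spawn(|| {
        function_to_count();
        calls_on_this_thread()
    })
    .join()
    .unwrap();
    println!("this thread: {}, other thread: {}", calls_on_this_thread(), other);
}
```

The obvious limitation is that calls made from worker threads spawned inside the function are not counted by the test's thread, so this is no substitute for passing a counter in explicitly.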

Iterate over dlang struct

I have a struct that looks something like this:
struct MultipartMessage {
ubyte[] mime, data;
Header header;
void setSender(string sender) {
header.sender = sender;
}
void setId(int id) {
header.id = id;
}
}
and I would like to iterate over it, in another class with something like this:
struct Socket {
...
void send(MultipartMessage msg) {
foreach (part; msg) {
sendPart(part);
}
}
...
}
Is this possible? I'd like to use something analogous to Python's __iter__ in the MultipartMessage that can return the fields in a specific order, and ideally even run some additional code, like header.serialize().
Ideally I would add a function to MultipartMessage that would look something like this (pseudocode):
ubyte[] __iter__() {
yield mime;
yield data;
yield header.serialize(); //header.serialize returns a ubyte[]
}
Use tupleof:
foreach (ref part; msg.tupleof)
sendPart(part);
This will call sendPart with mime, data and header (the struct's fields, in the order they were declared). You can filter fields by checking their type with e.g. static if (!is(typeof(part) == Header)).
To get the field's name, you can use __traits(identifier):
foreach (i, ref part; msg.tupleof)
writeln(__traits(identifier, msg.tupleof[i]));
(__traits(identifier, part) would return part.)
There's also __traits(allMembers), which also returns methods.
The closest thing to what you want is probably opApply.
See http://dlang.org/spec/statement.html, section "Foreach over Structs and Classes with opApply".
This will work:
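int opApply(int delegate(ref ubyte[]) dg) {
    // A non-zero result means the loop body hit break/return/goto:
    // stop iterating and pass the value back to the caller.
    int result = dg(mime);
    if (result) return result;
    result = dg(data);
    if (result) return result;
    ubyte[] header_bytes = header.serialize();
    return dg(header_bytes);
}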
There are several ways to iterate over objects in D.
One is to implement the InputRange API. Input ranges are similar to iterators, but have a different API. Implementing a range interface means that you can use all of the std.range/std.algorithm functions on your object, such as map, array, joiner and so on.
D doesn't have an __iter__ function to get an iterator from arbitrary collections, so you will need to implement a function that returns an input range.
import std.range;
auto bytes() {
return chain(mime, data, header.serialize);
}
This will return a ubyte input range, consisting of the bytes in mime, followed by the bytes in data, then in header.serialize.
You can also implement the opApply method on your struct. opApply will only work with foreach, so you can't use range methods with it, but it allows you to do things like execute the loop body in separate threads.
The gist of opApply is that D passes the loop body to opApply as a function; that is, foreach(x; myObj) { body } is transformed into myObj.opApply((x) { body }).
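int opApply(int delegate(ubyte[] part) loopbody) {
    // opApply and its delegate must return int; non-zero means "stop early".
    if (auto r = loopbody(mime)) return r;
    if (auto r = loopbody(data)) return r;
    return loopbody(header.serialize());
}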
However, instead of either of those options, I recommend that you implement a function on your object that takes an output range and writes the data to it.
An output range is an object that accepts other objects, and does something to them. In this case, the output range should accept ubytes, making it similar to an output stream.
void serialize(Range)(ref Range outRange) if(isOutputRange!(Range, ubyte)) {
put(outRange, mime); // `put` simply feeds data into the output range
put(outRange, data);
header.serialize(outRange); // No longer have to allocate/return a ubyte array
}
Example usage, that stores the output into an Appender, which can be converted into an array:
import std.array;
auto serializedDataAppender = appender!(ubyte[])();
myMsg.serialize(serializedDataAppender);
auto serializedData = serializedDataAppender.data;
If you implement an output range on top of your socket, then that means that the output range solution does not have to allocate any memory from the heap.
Check out the Programming in D book (specifically, the Ranges and More Ranges sections) for info on how to implement your own ranges.

Can you append code to an Objective-C block variable?

I want to dynamically add code to a block variable, or merge or concatenate a block with another block. Is this possible?
One way of doing it is creating a block that calls the block to be "expanded" before performing its own functions.
For example, consider the example below that adds logging functionality to an arbitrary block passed into it:
typedef void (^MyBlock)(int);
-(MyBlock) expand:(MyBlock)nested {
return ^(int x) {
nested(x);
NSLog(@"The value of x = %d", x);
};
}
The cumulative effect of calling the block produced by expand: is that of invoking the original block, followed by an operation from the expanded block. You can take it further, to create an appendBlock method:
-(MyBlock) appendBlock:(MyBlock)second toBlock:(MyBlock)first {
return ^(int x) {
first(x);
second(x);
};
}
Is this possible?
No, but you can create a collection of blocks and execute them sequentially.
Sure - just create a new block, which makes use of the original in whatever compositional way you'd like. If you've got block1 and block2, you might create:
// Assumes block1 and block2 are void (^)(void) blocks already in scope.
void (^someCodeBefore)(void) = ^{
    someCode();
    thatIWantBefore();
    block1();
};
void (^someCodeAfter)(void) = ^{
    block1();
    someCode();
    thatIWantAfterBlock1();
};
void (^composedBlocks)(void) = ^{
    block1();
    block2();
};
Just make sure you're copying the blocks correctly.