Using conditionally compiled functions in rust benchmarks - testing

Before I begin, let me note that I think there have been many related questions and answers, all of which proved useful to me. However, I couldn't find anyone who thoroughly described a method to do everything I wanted, so I thought I would document the problem and my solution and ask if there were better approaches.
Let's suppose that I have some slow--but definitely correct--code to perform a certain task that I keep in my project to test a faster implementation. For concreteness, define:
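pub fn fast(x: T) -> U {
    // does stuff and eventually returns a value `out`
    // ...
    // The cfg gate is needed because debug_assert_eq! is still type-checked
    // in release builds, where `slow` is compiled out.
    #[cfg(any(test, debug_assertions))]
    debug_assert_eq!(out, slow(x));
    out
}
#[cfg(any(test, debug_assertions))]
pub fn slow(x: T) -> U { ... }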
This is all fine and good. However, now suppose that I would like to add some benchmarks to demonstrate how good my fast implementation is...
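To make the setup concrete, here is a minimal sketch with hypothetical stand-ins that are not from my actual project: T and U are both u64, fast is a closed-form sum, and slow is the obvious loop. The check carries the same cfg as slow, so it disappears together with slow in builds where slow is compiled out.

```rust
/// Fast implementation (hypothetical stand-in): closed-form sum of 1..=n.
pub fn fast(n: u64) -> u64 {
    let out = n * (n + 1) / 2;
    // Compare against the reference only when `slow` is actually compiled in.
    #[cfg(any(test, debug_assertions))]
    assert_eq!(out, slow(n));
    out
}

/// Slow but obviously correct reference (hypothetical stand-in).
#[cfg(any(test, debug_assertions))]
pub fn slow(n: u64) -> u64 {
    (1..=n).sum()
}
```

In a debug or test build every call to fast is cross-checked against slow; in a plain release build neither the check nor slow exists.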
Attempt 1: Criterion
I think a standard way to set up benchmarking is to put a benches/ directory in the project, add a [[bench]] section to Cargo.toml with the harness disabled, and use the criterion crate. However, if I understand correctly, when we then run cargo bench, the benchmark is compiled as a separate crate that links against the library like any other user, so it cannot access items that are only compiled under cfg(test). Benchmarks are also built with the release-like bench profile, where debug_assertions is off by default. Thus, slow will not be resolved and the command will fail.
A Quick Aside: Another thing that derailed me for a while is that I kept wanting to use bench as a cfg flag but couldn't find anything about this. As it turns out, I think that test also covers the benching case. (I think all the seasoned rustaceans will be laughing at me, but this seems like a useful thing to note for anyone in a similar situation).
Attempt 2: The Nightly Test Crate
Since the previous method didn't seem fruitful, another popular option seemed to be to use the unstable test crate. This results in a project structure that looks like:
Cargo.toml
src/
    lib.rs
    bench.rs
Our original file is then revised to be:
// Include the unstable feature
#![feature(test)]
pub fn fast(x: T) -> U { ... }
#[cfg(any(test, debug_assertions))]
pub fn slow(x: T) -> U { ... }
#[cfg(test)]
mod bench;
And then bench.rs should look something like:
extern crate test;
use test::Bencher;
#[bench]
fn bench_it(b: &mut Bencher) {
b.iter(|| {}) // gotta go fast
}
This seemed to do everything I wanted upon running cargo +nightly bench. However, it is also super desirable for the project to be compilable outside of testing without the use of nightly or extra feature flags. That is, I still want to be able to run cargo build and cargo test and not get yelled at for requesting unstable features on a stable channel.
Attempt 2.5: Enter Build Scripts
(Once again, each of the parts is well-documented in other questions, I'm just collecting everything here for fun). Using a bunch of other posts, I learned that we can check for nightly and conditionally enable features by way of a build script. Our project now looks like this:
Cargo.toml
build.rs
src/
    lib.rs
    bench.rs
And we need to add rustc_version to our [build-dependencies] in Cargo.toml. We then add the following build script:
use rustc_version::{version_meta, Channel};
fn main() {
    // Set feature flags based on the detected compiler version
    match version_meta().unwrap().channel {
        Channel::Stable => {
            println!("cargo:rustc-cfg=RUSTC_IS_STABLE");
        }
        Channel::Beta => {
            println!("cargo:rustc-cfg=RUSTC_IS_BETA");
        }
        Channel::Nightly => {
            println!("cargo:rustc-cfg=RUSTC_IS_NIGHTLY");
        }
        Channel::Dev => {
            println!("cargo:rustc-cfg=RUSTC_IS_DEV");
        }
    }
}
Finally, if we update lib.rs to be the following:
// Include the unstable feature
#![cfg_attr(RUSTC_IS_NIGHTLY, feature(test))] // <-- Note the change here!
pub fn fast(x: T) -> U { ... }
#[cfg(any(test, debug_assertions))]
pub fn slow(x: T) -> U { ... }
#[cfg(all(RUSTC_IS_NIGHTLY, test))] // <-- Note the change here!
mod bench;
I think we get everything we want.
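For what it's worth, the same flag can probably be emitted without the rustc_version dependency. The following is only a sketch under two assumptions: that Cargo passes the compiler path to build scripts in the RUSTC environment variable, and that nightly toolchains report a version string containing "-nightly". The helper names is_nightly and emit_channel_cfg are made up for illustration; build.rs would call emit_channel_cfg() from its main function.

```rust
use std::env;
use std::process::Command;

/// Returns true if a `rustc --version` line looks like a nightly toolchain.
/// Assumption: nightly version strings contain "-nightly".
pub fn is_nightly(version: &str) -> bool {
    version.contains("-nightly")
}

/// Hypothetical helper intended to be called from `main` in build.rs.
pub fn emit_channel_cfg() {
    // Cargo sets RUSTC for build scripts; fall back to whatever `rustc` is on PATH.
    let rustc = env::var("RUSTC").unwrap_or_else(|_| String::from("rustc"));
    if let Ok(output) = Command::new(rustc).arg("--version").output() {
        let version = String::from_utf8_lossy(&output.stdout);
        if is_nightly(&version) {
            // Same flag name as in the build script above.
            println!("cargo:rustc-cfg=RUSTC_IS_NIGHTLY");
        }
    }
}
```

This only covers the nightly case, which is the only flag lib.rs actually checks.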
So... thanks for joining me on this adventure. Would appreciate commentary on whether or not this was the right approach. Also, you might ask "why keep the benchmark around once we know it's slower?" I suppose this might be fair, but perhaps the test could be changed or I'd like to prove the new implementation is faster to a third party that won't just trust me.

Related

Does Rust have hooks for early return on errors?

panic! allows the setting of a custom (albeit global) hook. Is there anything comparable for early returns with the ? operator? I have a function that needs to close some resources in a special way before exiting. I could write a function ok_or_close() that closes the resources before returning the error:
fn opens_resources() -> Result<(), MyError> {
//Opens some stuff.
//Now a bunch of functions that might raise errors.
ok_or_close(foo(), local variables)?;
ok_or_close(bar(), local variables)?;
ok_or_close(baz(), local variables)?;
ok_or_close(Ok(()), local variables)
}
But that seems verbose. What I'd really like to do is this:
fn opens_resources() -> Result<(), MyError> {
//Opens some stuff.
//Now a bunch of functions that might raise errors.
foo()?;
bar()?;
baz()?;
on_err:
//Closes some stuff. Would prefer not to make
// this a function, uses many local variables.
Ok(())
}
Is there a way to do this or a pattern of programming that gets around this?
The closest thing to this would be the Try trait, which lets you define how ? behaves for a specific type, but sadly it is still an unstable, nightly-only feature.
If you're interested in this feature, I'd recommend giving a +1 on the corresponding issue.
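On stable, one pattern that gets close to the on_err: label is to run the fallible steps inside a closure, so that ? only returns from the closure, and then do the cleanup in the enclosing function where all the local variables are still in scope. This is a sketch rather than a definitive answer; MyError, foo, bar, baz and the log vector standing in for "resources" are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub struct MyError(pub &'static str);

// Hypothetical fallible steps.
fn foo() -> Result<(), MyError> {
    Ok(())
}
fn bar(fail: bool) -> Result<(), MyError> {
    if fail { Err(MyError("bar failed")) } else { Ok(()) }
}
fn baz() -> Result<(), MyError> {
    Ok(())
}

pub fn opens_resources(fail_in_bar: bool, log: &mut Vec<&'static str>) -> Result<(), MyError> {
    // Opens some stuff.
    log.push("open");

    // `?` inside the closure returns from the closure, not from the function.
    let result = (|| -> Result<(), MyError> {
        foo()?;
        bar(fail_in_bar)?;
        baz()?;
        Ok(())
    })();

    // Closes some stuff. Runs on both the success and the error path,
    // with access to the function's local variables.
    log.push("close");
    result
}
```

If the cleanup should also run when a step panics, a guard type with a Drop implementation would be the more robust choice, at the cost of moving the cleanup out of the function body.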

How to run Rust tests with an explicit time zone using chrono?

I am building some tests that do timestamp conversions in Rust using the chrono crate. I need to make sure they take into account the local time zone but the tests will be run in multiple time zones and so will fail for most testers. How can I force Rust or chrono within the code to use a specific time zone when running tests?
I know about setting env TZ=CST or similar. Since I cannot control that part of the execution environment for all those running cargo test, I don't think this works for us.
If all tests should run in the same timezone, you can use std::sync::Once to initialize the TZ-environment variable as pointed out in the comments. Technically, since there is no race condition, all tests could initialize the env to that timezone.
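A minimal sketch of the std::sync::Once variant, assuming every test wants the same zone ("CET" here, as an example) and calls the hypothetical helper init_test_tz first:

```rust
use std::sync::Once;

static INIT_TZ: Once = Once::new();

/// Hypothetical helper: call at the start of every test that depends on the time zone.
pub fn init_test_tz() {
    INIT_TZ.call_once(|| {
        // `set_var` is an unsafe fn in the 2024 edition because other threads
        // may read the environment concurrently. `call_once` guarantees a
        // single write, but it does not synchronize with unrelated readers.
        unsafe { std::env::set_var("TZ", "CET") };
    });
}
```

Calling the helper more than once is harmless: only the first call writes the variable, and later calls return once that write has completed.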
If tests need to set their own time-zone - valid only for that one test - it's probably safest to still modify the timezone for the entire process (including chrono, yet somewhere down in libc dragons may access the tz as well). As you pointed out yourself, multiple tests need to synchronize over their shared environment. You can do that with a lazy_static:
#[macro_use]
extern crate lazy_static;
lazy_static! {
static ref TZ_LOCK: std::sync::Mutex<()> = std::sync::Mutex::new(());
}
fn with_tz<R, F: FnOnce() -> Result<(), R>>(tz: &str, f: F) -> Result<(), R> {
    // Hold the lock until `f` returns so that tests cannot race on TZ.
    let _tz_lock = TZ_LOCK.lock();
    std::env::set_var("TZ", tz);
    f()
}
#[test]
fn foobar() -> Result<(), ()> {
with_tz("CET", || {
Ok(())
})
}
You can get more fancy with this by using a more complex TZ_LOCK where all threads which currently want to run under the same timezone get to run simultaneously.

Can I write tests for invalid lifetimes?

I'm writing some Rust code that manipulates raw pointers. These raw pointers are then exposed to users through structures that use ContravariantLifetime to tie the lifetime of the struct to my object.
I'd like to be able to write tests that validate that the user-facing structures cannot live longer than my object. I have code like the following:
fn element_cannot_outlive_parts() {
let mut z = {
let p = Package::new();
p.create() // returns an object that cannot live longer than p
};
}
This fails to compile, which is exactly what I want. However, I'd like to have some automated check that this behavior is true even after whatever refactoring I do to the code.
My best idea at the moment is to write one-off Rust files with this code and rig up bash scripts to attempt to compile them and look for specific error messages, which all feels pretty hacky.
The Rust project has a special set of tests called "compile-fail" tests that do exactly what you want.
The compiletest crate is an extraction of this idea that allows other libraries to do the same thing:
fn main() {
let x: (u64, bool) = (true, 42u64);
//~^ ERROR mismatched types
//~^^ ERROR mismatched types
}
One idea that gets halfway there is to use Cargo's "features".
Specify tests with a feature flag:
#[test]
#[cfg(feature = "compile_failure")]
fn bogus_test() {}
Add this to Cargo.toml:
[features]
compile_failure = []
And run tests as
cargo test --features compile_failure
The obvious thing missing from this is the automatic checking of "was it the right failure". If nothing else, this allows me to have tests that are semi-living in my codebase.
You are able to annotate a test that you expect to fail.
#[should_panic]
As such, you can write a test that is expected to fail, and that failure would actually be a pass.
For an example of a test for 'index out of bounds' see below (adapted from the Rust guides):
#[test]
#[should_panic]
fn test_out_of_bounds_failure() {
    let v: Vec<i32> = Vec::new();
    v[0];
}
Note, however, that this example panics at run time rather than failing to compile, and #[should_panic] (formerly #[should_fail]) only catches run-time panics, so a compile-time lifetime violation would not be caught by it.

MicrosoftAjaxMinifier doesn't seem to remove "unreachable code"

I'm using this with BundleTransformer from NuGet and System.Web.Optimization in an ASP.NET app. According to various docs this minifier is supposed to "remove unreachable code". I know it's not as aggressive as Google Closure (which I can't use presently), but I can't get even the simplest cases to work, e.g.:
function foo() {
}
where foo isn't called from anywhere. I can appreciate the argument that says this might be an exported function but I can't see a way to differentiate that. All my JS code is concatenated so it would be able to say for sure whether that function was needed or not if I can find the right switches.
The only way I've found to omit unnecessary code is to use the debugLookupList property in the web.config for BundleTransformer but that seems like a sledgehammer to crack a nut. It's not very granular.
Does anyone have an example of how to write so-called 'unreachable code' that this minifier will recognise?
Here's a place to test online
I doubt the minifier has any way of knowing if a globally defined function can be removed safely (as it doesn't know the full scope). On the other hand it might not remove any unused functions and might only be interested in unreachable code (i.e. code after a return).
Using the JavaScript Module Pattern, your unused private functions would most likely get hoovered up correctly (although I've not tested this). In the example below, the minifier should only be confident about removing the function called privateFunction. Whether it considers unused functions as unreachable code is another matter.
var AmazingModule = (function() {
var module = {};
function privateFunction() {
// ..
}
module.otherFunction = function() {
// ..
};
return module;
}());
function anotherFunction() {
// ..
}

Split a module across several files

I want to have a module with multiple structs in it, each in its own file. Using a Math module as an example:
Math/
    Vector.rs
    Matrix.rs
    Complex.rs
I want each struct to be in the same module, which I would use from my main file, like so:
use Math::Vector;
fn main() {
// ...
}
However Rust's module system (which is a bit confusing to begin with) does not provide an obvious way to do this. It seems to only allow you to have your entire module in one file. Is this un-rustic? If not, how do I do this?
Rust's module system is actually incredibly flexible and will let you expose whatever kind of structure you want while hiding how your code is structured in files.
I think the key here is to make use of pub use, which will allow you to re-export identifiers from other modules. There is precedent for this in Rust's std::io module, where some types from sub-modules are re-exported for use in std::io.
Edit (2019-08-25): the following part of the answer was written quite some time ago. It explains how to setup such a module structure with rustc alone. Today, one would usually use Cargo for most use cases. While the following is still valid, some parts of it (e.g. #![crate_type = ...]) might seem strange. This is not the recommended solution.
To adapt your example, we could start with this directory structure:
src/
    lib.rs
    vector.rs
main.rs
Here's your main.rs:
extern crate math;
use math::vector;
fn main() {
println!("{:?}", vector::VectorA::new());
println!("{:?}", vector::VectorB::new());
}
And your src/lib.rs:
#![crate_name = "math"]
#![crate_type = "lib"]
pub mod vector; // exports the module defined in vector.rs
And finally, src/vector.rs:
// exports identifiers from private sub-modules in the current
// module namespace
pub use self::vector_a::VectorA;
pub use self::vector_b::VectorB;
mod vector_b; // private sub-module defined in vector_b.rs
mod vector_a { // private sub-module defined in place
#[derive(Debug)]
pub struct VectorA {
xs: Vec<i64>,
}
impl VectorA {
pub fn new() -> VectorA {
VectorA { xs: vec![] }
}
}
}
And this is where the magic happens. We've defined a sub-module math::vector::vector_a which has some implementation of a special kind of vector. But we don't want clients of your library to care that there is a vector_a sub-module. Instead, we'd like to make it available in the math::vector module. This is done with pub use self::vector_a::VectorA, which re-exports the vector_a::VectorA identifier in the current module.
But you asked how to do this so that you could put your special vector implementations in different files. This is what the mod vector_b; line does. It instructs the Rust compiler to look for a vector_b.rs file for the implementation of that module. And sure enough, here's our src/vector_b.rs file:
#[derive(Debug)]
pub struct VectorB {
xs: Vec<i64>,
}
impl VectorB {
pub fn new() -> VectorB {
VectorB { xs: vec![] }
}
}
From the client's perspective, the fact that VectorA and VectorB are defined in two different modules in two different files is completely opaque.
If you're in the same directory as main.rs, you should be able to run it with:
rustc src/lib.rs
rustc -L . main.rs
./main
In general, the "Crates and Modules" chapter in the Rust book is pretty good. There are lots of examples.
Finally, the Rust compiler also looks in sub-directories for you automatically. For example, the above code will work unchanged with this directory structure:
src/
    lib.rs
    vector/
        mod.rs
        vector_b.rs
main.rs
The commands to compile and run remain the same as well.
The Rust module rules are:
A source file is just its own module (except the special files main.rs, lib.rs and mod.rs).
A directory is just a module path component.
The file mod.rs is just the directory's module.
The file matrix.rs [1] in the directory math is just the module math::matrix. It's easy. What you see on your filesystem you also find in your source code. This is a one-to-one correspondence of file paths and module paths [2].
So you can import a struct Matrix with use math::matrix::Matrix, because the struct is inside the file matrix.rs in a directory math. Not happy? You'd much prefer use math::Matrix; instead, wouldn't you? It's possible. Re-export the identifier math::matrix::Matrix in math/mod.rs with:
pub use self::matrix::Matrix;
There's another step to get this working. Rust needs a module declaration to load the module. Add a mod math; in main.rs. If you don't do that, you get an error message from the compiler when importing like this:
error: unresolved import `math::Matrix`. Maybe a missing `extern crate math`?
The hint is misleading here. There's no need for additional crates, except of course you really intend to write a separate library.
Add this at the top of main.rs:
mod math;
pub use math::Matrix;
The module declaration is also necessary for the submodules vector, matrix and complex, because math needs to load them to re-export them. A re-export of an identifier only works if you have loaded the module of the identifier. This means that to re-export the identifier math::matrix::Matrix you need to write mod matrix;. You can do this in math/mod.rs. Therefore create the file with this content:
mod vector;
pub use self::vector::Vector;
mod matrix;
pub use self::matrix::Matrix;
mod complex;
pub use self::complex::Complex;
Aaaand you are done.
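The same re-export structure can be tried out in a single file by writing the modules inline, which is handy for experimenting before splitting things into math/mod.rs, math/vector.rs and math/matrix.rs. This is only a sketch: the struct fields and the origin function are made up for illustration.

```rust
mod math {
    // Private submodules. In the file-based layout these would be
    // `mod vector;` and `mod matrix;`, loaded from math/vector.rs and math/matrix.rs.
    mod vector {
        #[derive(Debug, PartialEq)]
        pub struct Vector {
            pub x: f64,
            pub y: f64,
        }
    }
    mod matrix {
        #[derive(Debug, PartialEq)]
        pub struct Matrix {
            pub rows: usize,
            pub cols: usize,
        }
    }

    // Re-export the types so that callers see math::Vector and math::Matrix
    // without knowing about the private submodules.
    pub use self::matrix::Matrix;
    pub use self::vector::Vector;
}

use self::math::Vector;

/// Hypothetical caller that only knows about the flattened path.
pub fn origin() -> Vector {
    Vector { x: 0.0, y: 0.0 }
}
```

Trying to write math::vector::Vector from outside math is rejected by the compiler because vector is private, which is exactly the hiding the re-export is meant to provide.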
[1] Source file names usually start with a lowercase letter in Rust. That's why I use matrix.rs and not Matrix.rs.
[2] Java's different. You declare the path with package, too. It's redundant. The path is already evident from the source file location in the filesystem. Why repeat this information in a declaration at the top of the file? Of course sometimes it's easier to have a quick look at the source code instead of finding out the filesystem location of the file. I can understand people who say it's less confusing.
Rust purists will probably call me a heretic and hate this solution, but this is much simpler: just put each thing in its own file, then use the include! macro in mod.rs:
include!("math/Matrix.rs");
include!("math/Vector.rs");
include!("math/Complex.rs");
That way you get no added nested modules, and avoid complicated export and rewrite rules.
Simple, effective, no fuss.
Alright, fought my compiler for a while and finally got it to work (thanks to BurntSushi for pointing out pub use).
main.rs:
use math::Vec2;
mod math;
fn main() {
let a = Vec2{x: 10.0, y: 10.0};
let b = Vec2{x: 20.0, y: 20.0};
}
math/mod.rs:
pub use self::vector::Vec2;
mod vector;
math/vector.rs:
use std::num::sqrt;
pub struct Vec2 {
x: f64,
y: f64
}
impl Vec2 {
pub fn len(&self) -> f64 {
sqrt(self.x * self.x + self.y * self.y)
}
// other methods...
}
Other structs could be added in the same manner. NOTE: compiled with 0.9, not master.
I'd like to add in here how you include Rust files when they are deeply nested. I have the following structure:
|-----main.rs
|-----home/
|---------bathroom/
|-----------------sink.rs
|-----------------toilet.rs
How do you access sink.rs or toilet.rs from main.rs?
As others have mentioned, Rust has no knowledge of files. Instead it sees everything as modules and submodules. To access the files inside the bathroom directory you need to export them, or barrel them, to the top. You do this by creating a file with the same name as the directory you'd like to access and writing pub mod filename_inside_the_dir_without_rs_ext inside that file.
Example.
// sink.rs
pub fn run() {
println!("Wash my hands for 20 secs!");
}
// toilet.rs
pub fn run() {
println!("Ahhh... This is sooo relaxing.")
}
Create a file called bathroom.rs inside the home directory:
Export the filenames:
// bathroom.rs
pub mod sink;
pub mod toilet;
Create a file called home.rs next to main.rs
Declare the bathroom module with pub mod:
// home.rs
pub mod bathroom;
Within main.rs
// main.rs
// Note: If you mod something, you just specify the
// topmost module, in this case, home.
mod home;
fn main() {
home::bathroom::sink::run();
}
use statements can be also used:
// main.rs
// Note: the module still has to be declared with mod before
// its items can be brought into scope with use.
mod home;
use home::bathroom::{sink, toilet};
fn main() {
    sink::run();
    toilet::run();
}
Including other sibling modules (files) within submodules
In case you'd like to use sink.rs from toilet.rs, you can refer to the sibling module through its parent by using the super keyword.
// inside toilet.rs
use super::sink;
pub fn run() {
sink::run();
println!("Ahhh... This is sooo relaxing.")
}
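The whole hierarchy, including the sibling access, can be checked in one file by writing the modules inline. This is a sketch of the same structure, with the run functions changed to return strings instead of printing so that the result is easy to inspect; that change is mine and not part of the layout above.

```rust
mod home {
    pub mod bathroom {
        pub mod sink {
            pub fn run() -> &'static str {
                "Wash my hands for 20 secs!"
            }
        }

        pub mod toilet {
            // `sink` is a sibling, so it is reached through the parent module.
            use super::sink;

            pub fn run() -> String {
                format!("{} Ahhh... This is sooo relaxing.", sink::run())
            }
        }
    }
}

/// Hypothetical entry point standing in for what main.rs would call.
pub fn visit() -> String {
    home::bathroom::toilet::run()
}
```

Replacing use super::sink; with use self::sink; makes this fail to compile, because self refers to the toilet module itself, which has no sink inside it.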
Final Directory Structure
You'd end up with something like this:
|-----main.rs
|-----home.rs
|-----home/
|---------bathroom.rs
|---------bathroom/
|-----------------sink.rs
|-----------------toilet.rs
The structure above only works with Rust 2018 onwards. The following directory structure is also valid for 2018, but it's how 2015 used to work.
|-----main.rs
|-----home/
|---------mod.rs
|---------bathroom/
|-----------------mod.rs
|-----------------sink.rs
|-----------------toilet.rs
In which home/mod.rs is the same as ./home.rs and home/bathroom/mod.rs is the same as home/bathroom.rs. Rust made this change because the compiler would get confused if you included a file with the same name as the directory. The 2018 version (the one shown first) fixes that structure.
See this repo for more information and this YouTube video for an overall explanation.
One last thing... avoid hyphens! Use snake_case instead.
Important Note
You must barrel all the files to the top, even if deep files aren't required by top-level ones.
This means, that for sink.rs to discover toilet.rs, you'd need to barrel them by using the methods above all the way up to main.rs!
In other words, doing pub mod sink; or use super::sink; inside toilet.rs will not work unless you have exposed them all the way up to main.rs!
Therefore, always remember to barrel your files to the top!
A more Rust-like way to export modules, which I picked up from GitHub:
mod foo {
//! inner docstring comment 1
//! inner docstring comment 2
mod a;
mod b;
pub use a::*;
pub use b::*;
}
Adjusting the question's example directory and file names to conform to Rust naming conventions:
main.rs
math.rs
math/
    vector.rs
    matrix.rs
    complex.rs
Make sure to export the public symbols (types, functions, etc.) in each of the files in the math directory by preceding them with the keyword pub.
Define math.rs:
mod vector;
pub use vector::*;
mod matrix;
pub use matrix::*;
mod complex;
pub use complex::*;
The above file keeps the sub-modules of math private but the submodules' public symbols are exported from module math. This effectively flattens the module structure.
Use math::Vector in main.rs:
mod math;
use crate::math::Vector;
fn main() {
// ...
}