How to benchmark if I need a reset in each iteration? - testing

I've written a small Sudoku solver using backtracking. Now I want to benchmark the speed of this function. Here is my current code:
type Board struct {
    Cells [9][9]int
}

func BenchmarkBacktrack(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        // prevent modification of the original board
        copy := &Board{
            Cells: exampleBoard.Cells,
        }
        b.StartTimer()
        copy.Backtrack()
    }
}
Since &Board is a pointer, the Sudoku would be solved in the first iteration, and in the next one I would backtrack an already-solved board. Therefore, I reset the board at the beginning of each iteration. exampleBoard is filled with sample values.
Is there a better way to benchmark the function without stopping and restarting the timer over and over?
And wouldn't the function calls cost a small amount of time that impacts the benchmark?

And wouldn't the function calls cost a small amount of time that impacts the benchmark?
Of course they would. So does the for loop, which is included in the benchmark, plus the overhead of calling the copy.Backtrack function. But all of this should be irrelevant, unless you're benchmarking a single operation taking nanoseconds (in which case you shouldn't). Creating a board is probably a trivial operation, so I wouldn't touch the timers at all. If it's not trivial, then you're doing it right – call StopTimer. This is exactly why it was invented:
StopTimer stops timing a test. This can be used to pause the timer while performing complex initialization that you don't want to measure.

You could try providing a func NewBoard([9][9]int) *Board method, which just initializes a board from the example data. Then write a benchmark for Backtrack() on a new board and a separate benchmark for NewBoard().
Subtracting the two numbers should give you an idea of the speed of your Backtrack method alone.
type Board struct {
    Cells [9][9]int
}

var scratch *Board

func NewBoard(cells [9][9]int) *Board {
    return &Board{Cells: cells}
}

func BenchmarkBacktrack(b *testing.B) {
    for i := 0; i < b.N; i++ {
        scratch = NewBoard(exampleBoard.Cells)
        scratch.Backtrack()
    }
}

func BenchmarkNewBoard(b *testing.B) {
    for i := 0; i < b.N; i++ {
        scratch = NewBoard(exampleBoard.Cells)
    }
}
Also note the use of the package-level scratch variable. Using a loop-local variable inside the benchmark loop could lead the compiler to optimise away the call to NewBoard(), depending on the presence or absence of side effects. For parity, you need to use the scratch variable in both benchmarks.

Related

What's the best way to count indirect function calls conditionally using cfg attributes?

When writing a test, I would like to know how many times a function is called, since bad logic may yield a correct result even when excessive and unnecessary function calls are performed.
To give some context, this is a tree-search function running a test on a fixed data set, however that isn't important to the answer.
I'm currently using a static mutable variable, however this means every access needs to be marked as unsafe:
#[cfg(test)]
static mut total_calls: usize = 0;

fn function_to_count() {
    #[cfg(test)]
    unsafe {
        total_calls += 1;
    }
    // do stuff
}

#[test]
fn some_test() {
    // do stuff, indirectly call function_to_count().
    assert!(total_calls < 100);
}
It would be good to avoid having to put unsafe into the code.
Is there a better way to count indirect function calls in Rust?
Mutable statics are unsafe because they're global, and could be accessed from any thread at any time. The simplest solution is to change the definition of the function in question to take some kind of "counter" interface that keeps track of calls. You can avoid performance problems by using generics plus a "dummy" implementation that does nothing.
// Use a callable because I'm feeling lazy.
fn function_to_count<Count: FnMut()>(count: &mut Count) {
    count();
    // ...
}

#[cfg(test)]
#[test]
fn some_test() {
    let mut count = 0;
    for _ in 0..10 {
        function_to_count(&mut || count += 1);
    }
    assert_eq!(count, 10);
}
You should really, seriously do that, and not what I'm about to describe:
The other solution is to use a thread-safe construct.
A word of warning: do not use this if you have more than one test! The test runner will, by default, run tests in parallel. As such, if you have more than one test calling into the instrumented function, you will get corrupt results. You'd have to write some kind of exclusive locking mechanism and somehow teach the function to "know" which run it's a part of, and at that point you should just use the previously described solution instead. You could also disable parallel tests (e.g. by running cargo test -- --test-threads=1), but that has to be done from outside the code, and that's just asking for someone to forget and run into weird failures as a result.
But anyway...
use std::sync::atomic::{AtomicUsize, Ordering};

#[cfg(test)]
static TOTAL_CALLS: AtomicUsize = AtomicUsize::new(0);

fn function_to_count() {
    // A cfg attribute (rather than cfg!) is needed here so that
    // non-test builds never reference the test-only static.
    #[cfg(test)]
    TOTAL_CALLS.fetch_add(1, Ordering::SeqCst);
    // ...
}

#[cfg(test)]
#[test]
fn some_test() {
    for _ in 0..10 {
        function_to_count();
    }
    assert_eq!(TOTAL_CALLS.load(Ordering::SeqCst), 10);
}

How to limit JProfiler to a subtree

I have a method called com.acmesoftware.shared.AbstractDerrivedBean.getDerivedUniqueId(). When I profile the application with JProfiler, this method, getDerivedUniqueId(), is essentially buried 80 methods deep, as expected. The method is invoked on behalf of every bean in the application. I'm trying to record the CPU call tree starting with this method down to a leaf node (i.e., one of the excluded classes).
I tried the following but it didn't produce the expected outcome:
Find a method above the method targeted for profiling, eg, markForDeletion().
set trigger to start recording at getDerivedUniqueId()
set trigger to STOP recording at markForDeletion()
I was expecting to only see everything below markForDeletion(), but I saw everything up to but not INCLUDING getDerivedUniqueId(), which is the opposite of my intended goal. Worse yet, even with 5 ms sampling, this trigger increased the running time from 10 minutes to "I terminated it after 3 hours of running". It seems the trigger adds a giant amount of overhead on top of the existing overhead. Hence, even if I figure out how to enable the trigger correctly, the added overhead would seem to render it ineffective.
The reason I need to limit the recording to just this method is: When running in 5ms sampling mode, the application completes in 10 minutes. When I run it in full instrumentation, I've waited 3 hours and it still hasn't completed. Hence, I need to turn on full instrumentation ONLY after getDerivedUniqueId() is invoked and pause profiling when getDerivedUniqueId() is exited.
-- Updated/Edit:
Thank you Ingo Kegel for your assistance.
I am likely not clear on how to use triggers. In the code below, I set triggers as shown after the code. My expectation is that when I profile the application (both sampling and full instrumentation) with the triggers configured this way, if the boolean isCollectMetrics is false, I should see 100% or 99.9% of CPU in filtered classes. However, that is not the case. The CPU tree seems not to take the triggers into account.
Secondly, when isCollectMetrics is true, I expect the JProfiler call tree to start with startProfiling() and end at stopProfiling(). Again, this is not the case either.
The method contains() is the bottleneck. It eventually calls one of 150 getDerivedUniqueId() implementations. I am trying to pinpoint which getDerivedUniqueId() is causing the performance degradation.
public static final AtomicLong doEqualContentTime = new AtomicLong();
public static final AtomicLong instCount = new AtomicLong();

protected boolean contentsEqual(final InstanceSetValue that) {
    if (isCollectMetrics) {
        // initialization code removed for clarity
        // ..........
        final Set<Instance> c1 = getReferences();
        final Set<Instance> c2 = that.getReferences();
        long st = startProfiling(); /// <------- start here
        for (final Instance inst : c1) {
            instCount.incrementAndGet();
            if (!c2.contains(inst)) {
                long et = stopProfiling(); /// <------- stop here
                doEqualContentTime.addAndGet(et - st);
                return false;
            }
        }
        long et = stopProfiling(); /// <------- stop here
        doEqualContentTime.addAndGet(et - st);
        return true;
    } else {
        // same code path as above but w/o the profiling; code removed for brevity.
        // ......
        return true;
    }
}

public long startProfiling() {
    return System.nanoTime();
}

public long stopProfiling() {
    return System.nanoTime();
}

public static void reset() {
    doEqualContentTime.set(0);
    instCount.set(0);
}
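The manual AtomicLong bookkeeping above can be sanity-checked standalone, independently of JProfiler. This is a minimal sketch; the class name TimingDemo and the recordedWork method are illustrative stand-ins, not part of the original bean code.

```java
import java.util.concurrent.atomic.AtomicLong;

public class TimingDemo {
    // Same counter pattern as the question's contentsEqual method.
    static final AtomicLong doEqualContentTime = new AtomicLong();
    static final AtomicLong instCount = new AtomicLong();

    static void recordedWork() {
        long st = System.nanoTime();
        instCount.incrementAndGet();      // one "instance visited"
        long et = System.nanoTime();
        doEqualContentTime.addAndGet(et - st);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            recordedWork();
        }
        // The call count is deterministic; the accumulated time is not.
        System.out.println("calls=" + instCount.get());
    }
}
```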
The enabled triggers:
startProfiling trigger:
stopProfiling trigger:
I've tried the 'Start Recordings' and 'Record CPU' actions separately to capture just the call tree.
If the overhead with instrumentation is large, you should refine your filters. With good filters, the instrumentation overhead can be very small.
As for the trigger setup, the correct actions are:
"Start recording" with CPU data selected
"Wait for the event to finish"
"Stop recording" with CPU data selected

How do I fix yield not working in while loop?

I get no console errors, but all of the enemies instantiate at the same time, so they act as one unit, and I want a delay between their spawns (they are enemies traveling a path).
#pragma strict

// Instantiate a rigidbody then set the velocity
var projectile : Transform;
var cnt : int = 0;

function Update () {
    if (buttonFeatures.started) {
        while (cnt < 4) {
            // Instantiate the projectile at the position and rotation of this transform
            wait();
            var clone : Transform;
            clone = Instantiate(projectile, transform.position, transform.rotation);
            cnt++;
        }
    }
}

function wait(){
    yield WaitForSeconds (10);
}
Your problem is that you're trying to do something similar to yield from Update(), which can't be done.
You obviously can't block Update() for 10 seconds, but calling a method that uses yield will return immediately and start a coroutine with that method, so what you're seeing is:
Update calls wait()
Update keeps going without waiting for wait() to return
wait() starts waiting for 10 seconds on its own as a coroutine
Update continues through the loop, calling wait() 3 more times, not waiting each time.
To confirm this you can change wait():
function wait(){
    yield WaitForSeconds (10);
    Debug.Log("Done Waiting"); // You'll see 4 of these pop up in your console 10 seconds later
}
You have two main options here. Either replace wait() with your logic:
function Update () {
    if (buttonFeatures.started) {
        while (cnt < 4) {
            // Instantiate the projectile at the position and rotation of this
            // transform in X seconds. This call returns immediately.
            InstantiateProjectile(cnt * 10);
            cnt++;
        }
    }
}

function InstantiateProjectile(delay : int){
    yield WaitForSeconds(delay);
    var clone : Transform;
    clone = Instantiate(projectile, transform.position, transform.rotation);
}
Or start a co-routine in Start():
function Start(){
    UpdateProjectiles();
}

function UpdateProjectiles (){
    while(true){
        if (buttonFeatures.started) {
            while (cnt < 4) {
                yield WaitForSeconds (10); // This works because it's not in Update
                var clone : Transform;
                // Instantiate the projectile at the position and rotation of this transform
                clone = Instantiate(projectile, transform.position, transform.rotation);
                cnt++;
            }
        }
        yield; // This causes this coroutine to behave the way Update would (running once a frame)
    }
}
The while(true) in the second example might be a little alarming, but its performance is no different from using Update(), because of the yield. In fact, many people put the majority of their "traditional" Update() logic in coroutines, as they allow better encapsulation of state-management logic and are great for periodic tasks.
Note: I didn't want to distort your code too much and obscure its meaning, but you might want to reconsider certain parts of your code:
You have a while loop with a count, that can easily be a for loop.
You have a loop that seems to be replacing objects. Instantiate can be one of the most expensive calls you can make, because for every call to it, you'll eventually pay the cost of the GC cleaning up an object.
You don't want to destroy lots of objects and call Instantiate to replace them with identical ones, because the GC will slow your game to a standstill trying to keep up. As your code stands, if projectiles start being spawned in a situation where they're destroyed very quickly, I wouldn't be surprised if the game completely froze.
Instead, prefer reusing objects, with object pools. There are tutorials on how to do it, such as this one. At the most basic level, it can be something as simple as giving your projectiles a Reset method. Then you could replace Destroy with a method that calls Reset and stores them in a Stack or List, where they can be accessed again. But you'll want to encapsulate all that logic, so look at that tutorial for specifics.
Replace that line with the yield itself, instead of making the function call:
yield return new WaitForSeconds(10);

How do I exit a Rust program early from outside the main function?

I am in the process of writing a bash clone in Rust. I need my program to exit when the user types exit. In previous iterations of my program, before I added more complicated features, I used return to get out of the loop that was prompting the user for input. This logic is now in a function, because of the way I am implementing built-in shell functions, so when I return it just jumps out of the function back into the control loop, instead of short-circuiting the control loop and ending the program.
I realize that I could probably return a boolean when the user types exit and exit the loop, but I would like to at least know if Rust has a way to terminate programs early, similar to Java's System.exit(), as this is useful for certain types of programs.
Rust 1.0 stable
std::process::exit() does exactly that - it terminates the program with the specified exit code:
use std::process;

fn main() {
    for i in 0..10 {
        if i == 5 {
            process::exit(1);
        }
        println!("{}", i);
    }
}
This function causes the program to terminate immediately, without unwinding and running destructors, so it should be used sparingly.
Alternative (not recommended) solution
You can use the C API directly. Add libc = "0.2" to Cargo.toml, and:
fn main() {
    for i in 0..10 {
        if i == 5 {
            unsafe { libc::exit(1); }
        }
        println!("{}", i);
    }
}
Calling C functions cannot be verified by the Rust compiler, so this requires the unsafe block. Resources used by the program will not be freed properly. This may cause problems such as hanging sockets.
As far as I understand, the proper way to exit from the program is to terminate all threads somehow, then the process will exit automatically.
panic!("Oh no something bad has happened!")
Example:
if a * g < 0f32 { panic!("The arithmetic-geometric mean is undefined for numbers less than zero!"); }
In older documentation, you will see this as fail!("Oh no something bad here has happened.")
For some reason, this macro was changed from fail! to panic!. panic! is the way to fail, if you must.
[edit] I am sorry. It looks like you should be testing input for the string "exit", which would depend on how you are taking input (by line or by args). Then you can have the program break out of the loop when exit is detected.
Example:
loop {
    if exit_found {
        break;
    } else {
        // your thing, which also looks for exit_found
    }
}

Pre-processing a loop in Objective-C

I am currently writing a program to help me control complex lights installations. The idea is I tell the program to start a preset, then the app has three options (depending on the preset type)
1) the lights go to one position (so only one group of data sent when the preset starts)
2) the lights follow a mathematical equation (e.g. a sine wave with a timer to make smooth circles)
3) the lights respond to a flow of data (ex midi controller)
So I decided to go with an object I call the AppBrain, which receives data from the controllers and the templates, but is also able to send processed data to the lights.
Now, I come from non-native programming, and I kind of have trust issues about working with a lot of processing, events and timing, as well as trouble understanding the Cocoa logic 100%.
This is where the actual question starts, sorry.
What I want to do is: when I load the preset, I parse it to prepare the timer/data-receive event so it doesn't have to go through every option for 100 lights 100 times per second.
To explain more deeply, here's how I would do it in Javascript (crappy pseudo code, of course)
var lightsFunctions = {};

function prepareTemplate(theTemplate){
    // Let's assume here the template is just an array, and I won't show all the processing
    switch(theTemplate.typeOfTemplate){
        case "simpledata":
            sendAllDataToLights(); // Simple here
            break;
        case "periodic":
            for(light in theTemplate.lights){
                switch(light.typeOfEquation){
                    case "sin":
                        lightsFunctions[light.id] = doTheSinus; // doTheSinus being an existing function
                        break;
                    case "cos":
                        ...
                }
            }
            function onFrame(){
                for(light in lightsFunctions){
                    lightsFunctions[light]();
                }
            }
            var theTimer = setTimeout(onFrame, theTemplate.delay);
            break;
        case "controller":
            // do the same pre-processing without the timer, to know which function to execute for which light
            break;
    }
}
So my idea is to store the processing functions I need in an NSArray, so I don't need to test the type on each frame and lose time/CPU.
I don't know if I'm clear, or if my idea is possible/the right way to go. I'm mostly looking for algorithm ideas, and if you have some code that might point me in the right direction... (I know of performSelector, but I don't know if it is the best fit for this situation.)
Thanks;
I_
First of all, don't spend time optimizing what you don't know to be a performance problem. 100 iterations of a type check are nothing in the native world, even on the weaker mobile CPUs.
Now, to your problem. I take it you are writing some kind of configuration / DSL to specify the light control sequences. One way of doing it is to store blocks in your NSArray. A block is the equivalent of a function object in JavaScript. So for example:
typedef void (^LightFunction)(void);

- (NSArray*) parseProgram ... {
    NSMutableArray* result = [NSMutableArray array];
    if(...) {
        LightFunction simpleData = ^{ sendDataToLights(); };
        [result addObject:simpleData];
    } else if(...) {
        Light* light = [self getSomeLight:...];
        LightFunction periodic = ^{
            // Note how you can access the local scope of the outside function.
            // Make sure you use automatic reference counting for this.
            [light doSomethingWithParam:someParam];
        };
        [result addObject:periodic];
    }
    return result;
}

...

NSArray* program = [self parseProgram:...];
// To run your program
for(LightFunction func in program) {
    func();
}