Fast evaluation of a decision forest - optimization

I have some decision trees (1000-3000) which need to be evaluated as fast as possible. They all access the same set of double values. There are no categorical values at all (so all values are just numerical).
What is the fastest way to do this? At the moment I generate some C-code at runtime and compile it with the heaviest optimizations for the local architecture. The generated code looks like this (similar, but much larger):
static inline double eval_tree0() {
if (*(const double *)0x12345 < 1.2345) {
if (*(const double *)0x4563456 < 2.2243) {
return 1.2111;
}
else {
return 5.2111;
}
}
else {
return 1.234;
}
}
double eval() {
return eval_tree0() + eval_tree1() + ...;
}
Is there something more performant? I was thinking about using AVX to evaluate multiple trees at once, but this seems to be tricky and I'm not sure if the performance is that much better.
Has anyone an idea what's the very fastest possibility to evaluate a bunch of decision trees for a given input (batch size is 1)? Maybe even something with AVX?
Thanks

Related

refactor if to takeIf and return "return" without label kotlin

I am trying to write more idiomatic Kotlin code and I am stuck with the best way to refactor this if condition. Basically when the condition if true (fragment is GenericActionsBottomSheetDialog instance in a list of Fragments) I return the funcion itself.
Here is what I had and how I refactored it. Is there better way to achieve it? After my refactoring it get worse:
Before refactor:
supportFragmentManager.fragments.iterator().forEach {
if (it is GenericActionsBottomSheetDialog)
return
After refactor:
supportFragmentManager.fragments.iterator().forEach { it ->
it.apply {
takeIf { it is GenericActionsBottomSheetDialog }?.apply { return }}}
If this forEach is the only thing in your current function (which it should IMO), you could get rid of the non-local return by using takeWhile instead:
supportFragmentManager.fragments
.takeWhile { it !is GenericActionsBottomSheetDialog }
.forEach {
// do stuff
}
/!\ be careful that this changes semantics if there is other stuff after the forEach in the same function declared with fun.
If you expect many fragments in the list, you could also use asSequence() before takeWhile so you don't create an intermediate list.
Here's one possibility, which separates the decision from the action:
if (supportFragmentManager.fragments.any{ it is GenericActionsBottomSheetDialog })
return
I think this approach makes the intent clearest. (It's also about the most efficient.)
any() simply checks each item in turn, stopping when it finds a match (or when it reaches the end of the list). Kotlin has many functions like this (inspired by functional programming languages) that use lambdas to operate on lists and other structures. They tend to be named for what they do, rather than how they do it — which makes code using them both short and easy to read. (You should be writing code for people to read, as much as for computers to execute!)
For completeness, here's another approach, which uses filterIsInstance():
if (supportFragmentManager.fragments
.filterIsInstance<GenericActionsBottomSheetDialog>)
.isNotEmpty())
return
There are bound to be many other ways. But I agree with the commenter that your ‘refactored’ approach, while using many more Kotlin functions, has little else to recommend it!
This is an opinion based question, and answers cannot be any different.
That being said: there is nothing wrong with if clauses. From what I can see from your current question, I'd leave it with an if.
Now, if you really do not want to use it, filter elements that are not of type GenericActionsBottomSheetDialog and apply whatever function you want on them (the part that is in your else clause, which we do not see).
EDIT:
In case you only want to check if the object of the GenericActionsBottomSheetDialog exists in the collection, you can perhaps do it like this:
val dialogExists = supportFragmentManager.fragments
.firstOrNull { it is GenericActionsBottomSheetDialog} != null
if (dialogExists) {
return
}
#gidds solution is IMO the most idiomatic one:
if (supportFragmentManager.fragments
.any { it is GenericActionsBottomSheetDialog }) return
I would like to add this solution eliminating the if:
supportFragmentManager.fragments
.firstOrNull { it is GenericActionsBottomSheetDialog }
?.run { return }
It's a matter of taste which one you pick, I prefer the first one.
I was wondering why you use the iterator? You could simply do:
supportFragmentManager.fragments.forEach {

Whats the best way to write an iterator supporting multiple logic branches?

In Rust I've started writing iterators, converting them from code which took a callback function.
I ran into the problem where the code that used a callback in multiple branches of the function didn't convert so cleanly into a Rust iterator.
To give some pseudo-code.
// function using callbacks where the caller can exit at any time,
// can be used in a similar way to an iterator.
fn do_stuff(args, callback_fn(cb_args)) {
// define a, b, c... args
if callback_fn(a, b, 0) == false { return; }
for i in 0..n {
if callback_fn(c, d, i) == false { return; }
}
if callback_fn(e, f, -1) == false { return; }
}
Converting this to an iterator was rather awkward since I needed to store some state representing each branch.
impl Iterator for MyStruct {
fn next(&mut self) -> Option<MyResult> {
let out = match (self.state) {
0 => {
self.state += 1;
Some(MyResult(self.a, self.b, 0))
},
1 => {
self.i += 1;
if self.i == self.n {
self.state += 1;
}
Some(MyResult(self.c, self.d, self.i - 1))
},
2 => {
self.state += 1;
Some(MyResult(self.e, self.f, -1))
},
_ => {
None
},
}
return out;
}
// --- snip
With the example above, this is arguably acceptable, (if a little awkward). Consider cases with multiple for loops, variable scopes, where its much harder to track state.
While I didn't try these, I imagine there are some ways to achieve this which in most cases are less-then-ideal workarounds:
Using the callback version, building a vector, then iterating over it... (works but defeats the purpose of using an iterator, no way to early exit and avoid creating the entire data set for eg).
Writing an iterator which communicates with a thread that uses similar logic to the callback version.(while possible, the overhead of creating OS threads makes it a poor choice in many cases).
Besides the workarounds above:
Are there ways to write iterators like the example given, with less convoluted logic?Ideally more like the example that uses callbacks.
Otherwise are there other ways to handle this?
Or is this simply not supported in Rust?
Note, the same logic applies coming from Python generators (using yield instead of a callback, using callbacks as an example here since they're ubiquitous with first class functions).
Languages like C# and Python provide a way to generate iterators from methods written using a special yield keyword. As of Rust 1.11, there is no such feature in the language. However, such a feature is planned (see RFC) (indeed, yield is a reserved keyword!) and would likely work as in C# (i.e. the compiler would generate a struct with the necessary state and implementation for Iterator).
In the meantime, you could try Stateful, a project that attempts to provide this feature. (This blog post explains how Stateful works, and the challenges involved.)

How to convert valueForKeyPath in swift

My old function in objective c is
+ (NSUInteger)getNumberOfDistinctUsers:(NSArray *)users {
NSArray* usersAfterPredicate = [users valueForKeyPath:#"#distinctUnionOfObjects.userName"];
return [usersAfterPredicate count]; }
How do I convert this in swift, I was trying to something like this but its crashing "Could not cast value of type 'Swift.Array'to 'Swift.AnyObject'"
static func getNumberOfDistinctUsers(users: [ICEPKReferenceDataUser]) -> Int {
var retval : Int = 0
if let usersAfterPredicate = (users as! AnyObject).valueForKeyPath("#distinctUnionOfObjects.userName") {
retval = usersAfterPredicate.count
}
return retval
}
Can I solve this problem using filter, map or Reduce? I am just trying to find out distint users in users array using the property username.
Edit* brute force way
static func getNumberOfDistinctUsers(users: [ICEPKReferenceDataUser]) -> Int {
var retvalSet : Set<String> = []
for user in users {
retvalSet.insert(user.userName)
}
return retvalSet.count
}
As you suspect, you can simplify the code with a simple map:
static func getNumberOfDistinctUsers(users: [ICEPKReferenceDataUser]) -> Int {
return Set(users.lazy.map{$0.userName}).count
}
This uses the fact that you can initialize a Set using any other sequence.
I added lazy in there to avoid creating an extra copy of the array. It'll work with or without, but I expect it to be much more memory efficient this way. Array.map creates another Array. array.lazy.map return a lazy collection that computes values as requested.
That said, I don't know that my approach is dramatically better than your "brute-force" way. It's not obvious which is easer to read or maintain. I have a fondness for the map approach, but it can be a tradeoff (I had to know to add lazy for instance, or I could have allocated significant memory if this were a large array). Your code makes it very clear what's going on, so I don't think there's any problem that has to be solved there.
If you really wanted to use KVC, you'd need to convert your array to an NSArray, not an AnyObject, but I suspect that the above code is much faster, and is clearer and simpler, too, IMO.

How to avoid if else or switch case whe dealing with enums?

I have a member variable that tells units for a value I have measured like centimeters,kilometers,seconds,hours etc.
Now these are enums,
When I display a corresponding string, I have created a method that returns corresponding string for these enums.
Unlike Java, enums here cant have other properties associated with them.
So I have to explicitly do a if-else-if chain or a switch case to return the correct string.
I am new to Objective C. any good practice that I should be following in such scenarios ?
afaik Objective-C enums are just old-school C enums... so maybe you can use an integer value for them?
I guess if your enum values started at 0 and increased you could use some sort of array access:
const char *distanceUnitToString2(enum DistanceUnit unit)
{
const char *values[] = {
"cm",
"m",
"km"
};
// do some sanity checking here
// ...
return values[unit];
}
But this feels a little flaky to me. What if you have negative values, or you are using bitmask-style enum values like 1 << 8? You are going to end up using a very large array.
You also could use a switch and improve it a little with a macro. Something like this:
const char *distanceUnitToString(enum DistanceUnit unit)
{
#define CASE(UNIT, STRING) case (UNIT): return (STRING)
switch (unit) {
CASE(kCentimeters, "cm");
CASE(kMeters, "m");
CASE(kKiloMeters, "km");
default:
// should not get here
assert(0);
break;
}
#undef CASE
}
But you don't really save that much vs. not using the macro.
Martin James's comment is the right answer. And use a definition of the enum like:
enum units { cm = 0, m, km };
that way you can be sure that your enum translates to the correct index values.

Object oriented design patterns for parsing text files?

As part of a software package I'm working on, I need to implement a parser for application specific text files. I've already specified the grammar for these file on paper, but am having a hard time translating it into easily readable/updatable code (right now just it passes each line through a huge number of switch statements).
So, are there any good design patterns for implementing a parser in a Java style OO environment?
Any easy way to break a massive switch into an OO design would be to have
pseudo code
class XTokenType {
public bool isToken(string data);
}
class TokenParse {
public void parseTokens(string data) {
for each step in data {
for each tokenType in tokenTypess {
if (tokenType.isToken(step)) {
parsedTokens[len] = new tokenType(step);
}
...
}
}
...
}
}
Here your breaking each switch statement into a method on that token object to detect whether the next bit of the string is of that token type.
Previously:
class TokenParse {
public void parseTokens(string data) {
for each step in data {
switch (step) {
case x:
...
case y:
...
...
}
}
...
}
}
One suggestion is to create property file where you define rules. Load it during run time and use if else loop (since switch statements also does the same internally). This way if you want to change some parsing rules you have to change .property file not code. :)
You need to learn how to express context free grammars. You should be thinking about the GoF Interpreter and parser/generators like Bison, ANTRL, lex/yacc, etc.