I'm translating a chunk (2000 lines) of proprietary C code into Rust. In C, it is common to run a pointer, array index, etc. down, for as long as it is non-negative. In Rust, simplified to the bone, it would look something like:
while i >= 0 && more_conditions {
more_work;
i -= 1;
}
Of course, when i is usize, you get an under-overflow from subtraction. I have learned to work around this by using for loops with .rev(), offsetting my indexes by one, or using a different type and casting with as usize, etc.
Usually it works, and usually I can make it legible, but the code I'm modifying is chock-full of indexes running towards each other, and eventually tested with i_low > i_high
Something like (in Rust)
loop {
while condition1(i_low) { i_low += 1; }
while condition2(i_high) { j_high -= 1; }
if i_low > i_high { return something; }
do_something_else;
}
Every now and then this panics, as i_high runs past 0.
I have been inserting a lot of j_high >= 0 && in the code, and it become a lot less readable.
How do experienced Rust programmers avoid usize variables going to -1?
for loops? for i in (0..size).rev()
casting? i as usize, after checking for i < 0
offsetting your variable by one, and using i-1 when safe?
extra conditionals?
catching exceptions?
Or do you just eventually learn to write programs around these situations?
Clarification: The C code is not broken - it has been supposedly in production for ten years, structuring video segments on multiple servers 24/7. It is just not following Rust conventions - it often returns -1 as an index, it recurses with -1 for the low index of an array to process, and indexes go negative all the time. All of these are handled before problems occurs - ugly, but functional. Something like:
incident_segment = detect_incident(array, start, end);
attach(array, incident_segment);
store(array, start, incident_segment - 1);
process(array, incident_segment + 1, end);
In the above code, every single of the three resulting calls may be getting a segment index that's -1 (attach, store) or out of bounds (process) It's handled, but after the call.
My Rust code appears to be working as well. As a matter of fact, in order to deal with the negative usize, I added additional logic that pruned a number of recursions, so it runs about as fast as the C code (apparently faster, but that's also because I distributed the output on multiple drives)
The issue is that the client does not not want a full rewrite, and wants the 'native' programmers to be able to check the two programs against each other. Based on the answers so far, I'm thinking that using i64 and casting/shadowing as needed may be the best way to produce code that's easy to read for the 'natives'. Which I personally do not have to like...
If you want to do it idiomatically:
for j in (0..=i).rev() {
if conditions {
break;
}
//use j as your new i here
}
Note the use of ..=i here in the iterator, this means that it'll actually iterate including i: [0, 1, 2, ..., i-1, i], otherwise, you end up with [0, 1, 2, ..., i-2, i-1]
Otherwise, here is the code:
while (i as isize - 1) != -2 && more_conditions {
more_work;
i -= 1;
}
playground
I'd probably start by using saturating_sub (and _add for parallel structure):
while condition1(i_low) { i_low = i_low.saturating_add(1); }
while condition2(i_high) { j_high = j_high.saturating_sub(1); }
You need to be careful to ensure that your logic handles the value saturating at zero. You could also use more C-like semantics with wrapping_sub.
Truthfully, there's no one-size-fits-all solution. Many times, complicated logic becomes simpler if you abstract it a bit, or turn it slightly sideways. You haven't provided any concrete examples, so we cannot give any useful advice. I solve way too many problems with iterators, so that's often my first solution.
catching exceptions
Absolutely not. That's exceedingly inefficient and non-idiomatic.
Related
I wrote the following code:
val src = (0 until 1000000).toList()
val dest = ArrayList<Double>(src.size / 2 + 1)
for (i in src)
{
if (i % 2 == 0) dest.add(Math.sqrt(i.toDouble()))
}
IntellJ (in my case AndroidStudio) is asking me if I want to replace the for loop with operations from stdlib. This results in the following code:
val src = (0 until 1000000).toList()
val dest = ArrayList<Double>(src.size / 2 + 1)
src.filter { it % 2 == 0 }
.mapTo(dest) { Math.sqrt(it.toDouble()) }
Now I must say, I like the changed code. I find it easier to write than for loops when I come up with similar situations. However upon reading what filter function does, I realized that this is a lot slower code compared to the for loop. filter function creates a new list containing only the elements from src that match the predicate. So there is one more list created and one more loop in the stdlib version of the code. Ofc for small lists it might not be important, but in general this does not sound like a good alternative. Especially if one should chain more methods like this, you can get a lot of additional loops that could be avoided by writing a for loop.
My question is what is considered good practice in Kotlin. Should I stick to for loops or am I missing something and it does not work as I think it works.
If you are concerned about performance, what you need is Sequence. For example, your above code will be
val src = (0 until 1000000).toList()
val dest = ArrayList<Double>(src.size / 2 + 1)
src.asSequence()
.filter { it % 2 == 0 }
.mapTo(dest) { Math.sqrt(it.toDouble()) }
In the above code, filter returns another Sequence, which represents an intermediate step. Nothing is really created yet, no object or array creation (except a new Sequence wrapper). Only when mapTo, a terminal operator, is called does the resulting collection is created.
If you have learned java 8 stream, you may found the above explaination somewhat familiar. Actually, Sequence is roughly the kotlin equivalent of java 8 Stream. They share similiar purpose and performance characteristic. The only difference is Sequence isn't designed to work with ForkJoinPool, thus a lot easier to implement.
When there is multiple steps involved or the collection may be large, it's suggested to use Sequence instead of plain .filter {...}.mapTo{...}. I also suggest you to use the Sequence form instead of your imperative form because it's easier to understand. Imperative form may become complex, thus hard to understand, when there are 5 or more steps involved in the data processing. If there is just one step, you don't need a Sequence, because it just creates garbage and gives you nothing useful.
You're missing something. :-)
In this particular case, you can use an IntProgression:
val progression = 0 until 1_000_000 step 2
You can then create your desired list of squares in various ways:
// may make the list larger than necessary
// its internal array is copied each time the list grows beyond its capacity
// code is very straight forward
progression.map { Math.sqrt(it.toDouble()) }
// will make the list the exact size needed
// no copies are made
// code is more complicated
progression.mapTo(ArrayList(progression.last / 2 + 1)) { Math.sqrt(it.toDouble()) }
// will make the list the exact size needed
// a single intermediate list is made
// code is minimal and makes sense
progression.toList().map { Math.sqrt(it.toDouble()) }
My advice would be to choose whichever coding style you prefer. Kotlin is both object-oriented and functional language, meaning both of your propositions are correct.
Usually, functional constructs favor readability over performance; however, in some cases, procedural code will also be more readable. You should try to stick with one style as much as possible, but don't be afraid to switch some code if you feel like it's better suited to your constraints, either readability, performance, or both.
The converted code does not need the manual creation of the destination list, and can be simplified to:
val src = (0 until 1000000).toList()
val dest = src.filter { it % 2 == 0 }
.map { Math.sqrt(it.toDouble()) }
And as mentioned in the excellent answer by #glee8e you can use a sequence to do a lazy evaluation. The simplified code for using a sequence:
val src = (0 until 1000000).toList()
val dest = src.asSequence() // change to lazy
.filter { it % 2 == 0 }
.map { Math.sqrt(it.toDouble()) }
.toList() // create the final list
Note the addition of the toList() at the end is to change from a sequence back to a final list which is the one copy made during the processing. You can omit that step to remain as a sequence.
It is important to highlight the comments by #hotkey saying that you should not always assume that another iteration or a copy of a list causes worse performance than lazy evaluation. #hotkey says:
Sometimes several loops. even if they copy the whole collection, show good performance because of good locality of reference. See: Kotlin's Iterable and Sequence look exactly same. Why are two types required?
And excerpted from that link:
... in most cases it has good locality of reference thus taking advantage of CPU cache, prediction, prefetching etc. so that even multiple copying of a collection still works good enough and performs better in simple cases with small collections.
#glee8e says that there are similarities between Kotlin sequences and Java 8 streams, for detailed comparisons see: What Java 8 Stream.collect equivalents are available in the standard Kotlin library?
Let's say I'd like to iterate through a generic iterator in reverse, without knowing about the internals of the iterator and essentially not cheating via untyped magic and assuming this could be any type of iterable, which serves a iterator; can we optimise the reverse of a iterator at runtime or even via macros?
Forwards
var a = [1, 2, 3, 4].iterator();
// Actual iteration bellow
for(i in a) {
trace(i);
}
Backwards
var a = [1, 2, 3, 4].iterator();
// Actual reverse iteration bellow
var s = [];
for(i in a) {
s.push(i);
}
s.reverse();
for(i in s) {
trace(i);
}
I would assume that there has to be a simpler way, or at least fast way of doing this. We can't know a size because the Iterator class doesn't carry one, so we can't invert the push on to the temp array. But we can remove the reverse because we do know the size of the temp array.
var a = [1,2,3,4].iterator();
// Actual reverse iteration bellow
var s = [];
for(i in a) {
s.push(i);
}
var total = s.length;
var totalMinusOne = total - 1;
for(i in 0...total) {
trace(s[totalMinusOne - i]);
}
Is there any more optimisations that could be used to remove the possibility of the array?
It bugs me that you have to duplicate the list, though... that's nasty. I mean, the data structure would ALREADY be an array, if that was the right data format for it. A better thing (less memory fragmentation and reallocation) than an Array (the "[]") to copy it into might be a linked List or a Hash.
But if we're using arrays, then Array Comprehensions (http://haxe.org/manual/comprehension) are what we should be using, at least in Haxe 3 or better:
var s = array(for (i in a) i);
Ideally, at least for large iterators that are accessed multiple times, s should be cached.
To read the data back out, you could instead do something a little less wordy, but quite nasty, like:
for (i in 1-s.length ... 1) {
trace(s[-i]);
}
But that's not very readable and if you're after speed, then creating a whole new iterator just to loop over an array is clunky anyhow. Instead I'd prefer the slightly longer, but cleaner, probably-faster, and probably-less-memory:
var i = s.length;
while (--i >= 0) {
trace(s[i]);
}
First of all I agree with Dewi Morgan duplicating the output generated by an iterator to reverse it, somewhat defeats its purpose (or at least some of its benefits). Sometimes it's okay though.
Now, about a technical answer:
By definition a basic iterator in Haxe can only compute the next iteration.
On the why iterators are one-sided by default, here's what we can notice:
if all if iterators could run backwards and forwards, the Iterator classes would take more time to write.
not all iterators run on collections or series of numbers.
E.g. 1: an iterator running on the standard input.
E.g. 2: an iterator running on a parabolic or more complicated trajectory for a ball.
E.g. 3: slightly different but think about the performance problems running an iterator on a very large single-linked list (eg the class List). Some iterators can be interrupted in the middle of the iteration (Lambda.has() and Lambda.indexOf() for instance return as soon as there is a match, so you normally don't want to think of what's iterated as a collection but more as an interruptible series or process iterated step by step).
While this doesn't mean you shouldn't define two-ways iterators if you need them (I've never done it in Haxe but it doesn't seem impossible), in the absolute having two-ways iterators isn't that natural, and enforcing Iterators to be like that would complicate coding one.
An intermediate and more flexible solution is to simply have ReverseXxIter where you need, for instance ReverseIntIter, or Array.reverseIter() (with using a custom ArrayExt class). So it's left for every programmer to write their own answers, I think it's a good balance; while it takes more time and frustration in the beginning (everybody probably had the same kind of questions), you end up knowing the language better and in the end there are just benefits for you.
Complementing the post of Dewi Morgan, you can use for(let i = a.length; --i >= 0;) i; if you wish to simplify the while() method. if you really need the index values, I think for(let i=a.length, k=keys(a); --i in k;) a[k[i]]; is the best that give to do keeping the performance. There is also for(let i of keys(a).reverse()) a[i]; which has cleaner writing, but its iteration rate increases 1n using .reduce()
I remember many years back, when I was in school, one of my computer science teachers taught us that it was better to check for 'trueness' or 'equality' of a condition and not the negative stuff like 'inequality'.
Let me elaborate - If a piece of conditional code can be written by checking whether an expression is true or false, we should check the 'trueness'.
Example: Finding out whether a number is odd - it can be done in two ways:
if ( num % 2 != 0 )
{
// Number is odd
}
or
if ( num % 2 == 1 )
{
// Number is odd
}
(Please refer to the marked answer for a better example.)
When I was beginning to code, I knew that num % 2 == 0 implies the number is even, so I just put a ! there to check if it is odd. But he was like 'Don't check NOT conditions. Have the practice of checking the 'trueness' or 'equality' of conditions whenever possible.' And he recommended that I use the second piece of code.
I am not for or against either but I just wanted to know - what difference does it make? Please don't reply 'Technically the output will be the same' - we ALL know that. Is it a general programming practice or is it his own programming practice that he is preaching to others?
NOTE: I used C#/C++ style syntax for no reason. My question is equally applicable when using the IsNot, <> operators in VB etc. So readability of the '!' operator is just one of the issues. Not THE issue.
The problem occurs when, later in the project, more conditions are added - one of the projects I'm currently working on has steadily collected conditions over time (and then some of those conditions were moved into struts tags, then some to JSTL...) - one negative isn't hard to read, but 5+ is a nightmare, especially when someone decides to reorganize and negate the whole thing. Maybe on a new project, you'll write:
if (authorityLvl!=Admin){
doA();
}else{
doB();
}
Check back in a month, and it's become this:
if (!(authorityLvl!=Admin && authorityLvl!=Manager)){
doB();
}else{
doA();
}
Still pretty simple, but it takes another second.
Now give it another 5 to 10 years to rot.
(x%2!=0) certainly isn't a problem, but perhaps the best way to avoid the above scenario is to teach students not to use negative conditions as a general rule, in the hopes that they'll use some judgement before they do - because just saying that it could become a maintenance problem probably won't be enough motivation.
As an addendum, a better way to write the code would be:
userHasAuthority = (authorityLvl==Admin);
if (userHasAuthority){
doB();
else{
doA();
}
Now future coders are more likely to just add "|| authorityLvl==Manager", userHasAuthority is easier to move into a method, and even if the conditional is reorganized, it will only have one negative. Moreover, no one will add a security hole to the application by making a mistake while applying De Morgan's Law.
I will disagree with your old professor - checking for a NOT condition is fine as long as you are checking for a specific NOT condition. It actually meets his criteria: you would be checking that it is TRUE that a value is NOT something.
I grok what he means though - mostly the true condition(s) will be orders of magnitude smaller in quantity than the NOT conditions, therefore easier to test for as you are checking a smaller set of values.
I've had people tell me that it's to do with how "visible" the ping (!) character is when skim reading.
If someone habitually "skim reads" code - perhaps because they feel their regular reading speed is too slow - then the ! can be easily missed, giving them a critical mis-understanding of the code.
On the other hand, if a someone actually reads all of the code all of the time, then there is no issue.
Two very good developers I've worked with (and respect highily) will each write == false instead of using ! for similar reasons.
The key factor in my mind is less to do with what works for you (or me!), and more with what works for the guy maintaining the code. If the code is never going to be seen or maintained by anyone else, follow your personal whim; if the code needs to be maintained by others, better to steer more towards the middle of the road. A minor (trivial!) compromise on your part now, might save someone else a week of debugging later on.
Update: On further consideration, I would suggest factoring out the condition as a separate predicate function would give still greater maintainability:
if (isOdd(num))
{
// Number is odd
}
You still have to be careful about things like this:
if ( num % 2 == 1 )
{
// Number is odd
}
If num is negative and odd then depending on the language or implementation num % 2 could equal -1. On that note, there is nothing wrong with checking for the falseness if it simplifies at least the syntax of the check. Also, using != is more clear to me than just !-ing the whole thing as the ! may blend in with the parenthesis.
To only check the trueness you would have to do:
if ( num % 2 == 1 || num % 2 == -1 )
{
// Number is odd
}
That is just an example obviously. The point is that if using a negation allows for fewer checks or makes the syntax of the checks clear then that is clearly the way to go (as with the above example). Locking yourself into checking for trueness does not suddenly make your conditional more readable.
I remember hearing the same thing in my classes as well. I think it's more important to always use the more intuitive comparison, rather than always checking for the positive condition.
Really a very in-consequential issue. However, one negative to checking in this sense is that it only works for binary comparisons. If you were for example checking some property of a ternary numerical system you would be limited.
Replying to Bevan (it didn't fit in a comment):
You're right. !foo isn't always the same as foo == false. Let's see this example, in JavaScript:
var foo = true,
bar = false,
baz = null;
foo == false; // false
!foo; // false
bar == false; // true
!bar; // true
baz == false; // false (!)
!baz; // true
I also disagree with your teacher in this specific case. Maybe he was so attached to the generally good lesson to avoid negatives where a positive will do just fine, that he didn't see this tree for the forest.
Here's the problem. Today, you listen to him, and turn your code into:
// Print black stripe on odd numbers
int zebra(int num) {
if (num % 2 == 1) {
// Number is odd
printf("*****\n");
}
}
Next month, you look at it again and decide you don't like magic constants (maybe he teaches you this dislike too). So you change your code:
#define ZEBRA_PITCH 2
[snip pages and pages, these might even be in separate files - .h and .c]
// Print black stripe on non-multiples of ZEBRA_PITCH
int zebra(int num) {
if (num % ZEBRA_PITCH == 1) {
// Number is not a multiple of ZEBRA_PITCH
printf("*****\n");
}
}
and the world seems fine. Your output hasn't changed, and your regression testsuite passes.
But you're not done. You want to support mutant zebras, whose black stripes are thicker than their white stripes. You remember from months back that you originally coded it such that your code prints a black stripe wherever a white strip shouldn't be - on the not-even numbers. So all you have to do is to divide by, say, 3, instead of by 2, and you should be done. Right? Well:
#define DEFAULT_ZEBRA_PITCH 2
[snip pages and pages, these might even be in separate files - .h and .c]
// Print black stripe on non-multiples of pitch
int zebra(int num, int pitch) {
if (num % pitch == 1) {
// Number is odd
printf("*****\n");
}
}
Hey, what's this? You now have mostly-white zebras where you expected them to be mostly black!
The problem here is how think about numbers. Is a number "odd" because it isn't even, or because when dividing by 2, the remainder is 1? Sometimes your problem domain will suggest a preference for one, and in those cases I'd suggest you write your code to express that idiom, rather than fixating on simplistic rules such as "don't test for negations".
I just want to know what the difference between all the conditional statements in objective-c and which one is faster and lighter.
One piece of advice: stop worrying about which language constructs are microscopically faster or slower than which others, and instead focus on which ones let you express yourself best.
If and case statements described
While statement described
Since these statements do different things, it is unproductive to debate which is faster.
It's like asking whether a hammer is faster than a screwdriver.
The language-agnostic version (mostly, obviously this doesn't count for declarative languages or other weird ones):
When I was taught programming (quite a while ago, I'll freely admit), a language consisted of three ways of executing instructions:
sequence (doing things in order).
selection (doing one of many things).
iteration (doing something zero or more times).
The if and case statements are both variants on selection. If is used to select one of two different options based on a condition (using pseudo-code):
if condition:
do option 1
else:
do option 2
keeping in mind that the else may not be needed in which case it's effectively else do nothing. Also remember that option 1 or 2 may also consist of any of the statement types, including more if statements (called nesting).
Case is slightly different - it's generally meant for more than two choices like when you want to do different things based on a character:
select ch:
case 'a','e','i','o','u':
print "is a vowel"
case 'y':
print "never quite sure"
default:
print "is a consonant"
Note that you can use case for two options (or even one) but it's a bit like killing a fly with a thermonuclear warhead.
While is not a selection variant but an iteration one. It belongs with the likes of for, repeat, until and a host of other possibilities.
As to which is fastest, it doesn't matter in the vast majority of cases. The compiler writers know far more than we mortal folk how to get the last bit of performance out of their code. You either trust them to do their job right or you hand-code it in assembly yourself (I'd prefer the former).
You'll get far more performance by concentrating on the macro view rather than the minor things. That includes selection of appropriate algorithms, profiling, and targeting of hot spots. It does little good to find something that take five minutes each month and get that running in two minutes. Better to get a smaller improvement in something happening every minute.
The language constructs like if, while, case and so on will already be as fast as they can be since they're used heavily and are relative simple. You should be first writing your code for readability and only worrying about performance when it becomes an issue (see YAGNI).
Even if you found that using if/goto combinations instead of case allowed you to run a bit faster, the resulting morass of source code would be harder to maintain down the track.
while isn't a conditional it is a loop. The difference being that the body of a while-loop can be executed many times, the body of a conditional will only be executed once or not at all.
The difference between if and switch is that if accepts an arbitrary expression as the condition and switch just takes values to compare against. Basically if you have a construct like if(x==0) {} else if(x==1) {} else if(x==2) ..., it can be written much more concisely (and effectively) by using switch.
A case statement could be written as
if (a)
{
// Do something
}
else if (b)
{
// Do something else
}
But the case is much more efficient, since it only evaluates the conditional once and then branches.
while is only useful if you want a condition to be evaluated, and the associated code block executed, multiple times. If you expect a condition to only occur once, then it's equivalent to if. A more apt comparison is that while is a more generalized for.
Each condition statement serves a different purpose and you won't use the same one in every situation. Learn which ones are appropriate for which situation and then write your code. If you profile your code and find there's a bottleneck, then you go ahead and address it. Don't worry about optimizing before there's actually a problem.
Are you asking whether an if structure will execute faster than a switch statement inside of a large loop? If so, I put together a quick test, this code was put into the viewDidLoad method of a new view based project I just created in the latest Xcode and iPhone SDK:
NSLog(#"Begin loop");
NSDate *loopBegin = [NSDate date];
int ctr0, ctr1, ctr2, ctr3, moddedNumber;
ctr0 = 0;
ctr1 = 0;
ctr2 = 0;
ctr3 = 0;
for (int i = 0; i < 10000000; i++) {
moddedNumber = i % 4;
// 3.34, 1.23s in simulator
if (moddedNumber == 0)
{
ctr0++;
}
else if (moddedNumber == 1)
{
ctr1++;
}
else if (moddedNumber == 2)
{
ctr2++;
}
else if (moddedNumber == 3)
{
ctr3++;
}
// 4.11, 1.34s on iPod Touch
/*switch (moddedNumber)
{
case 0:
ctr0++;
break;
case 1:
ctr1++;
break;
case 2:
ctr2++;
break;
case 3:
ctr3++;
break;
}*/
}
NSTimeInterval elapsed = [[NSDate date] timeIntervalSinceDate:loopBegin];
NSLog(#"End loop: %f seconds", elapsed );
This code sample is by no means complete, because as pointed out earlier if you have a situation that comes up more times than the others, you would of course want to put that one up front to reduce the total number of comparisons. It does show that the if structure would execute a bit faster in a situation where the decisions are more or less equally divided among the branches.
Also, keep in mind that the results of this little test varied widely in performance between running it on a device vs. running it in the emulator. The times cited in the code comments are running on an actual device. (The first time shown is the time to run the loop the first time the code was run, and the second number was the time when running the same code again without rebuilding.)
There are conditional statements and conditional loops. (If Wikipedia is to be trusted, then simply referring to "a conditional" in programming doesn't cover conditional loops. But this is a minor terminology issue.)
Shmoopty said "Since these statements do different things, it is nonsensical to debate which is faster."
Well... it may be time poorly spent, but it's not nonsensical. For instance, let's say you have an if statement:
if (cond) {
code
}
You can transform that into a loop that executes at most one time:
while (cond) {
code
break;
}
The latter will be slower in pretty much any language (or the same speed, because the optimizer turned it back into the original if behind the scenes!) Still, there are occasions in computer programming where (due to bizarre circumstances) the convoluted thing runs faster
But those incidents are few and far between. The focus should be on your code--what makes it clearest, and what captures your intent.
loops and branches are hard to explain briefly, to get the best code out of a construct in any c-style language depends on the processor used and the local context of the code. The main objective is to reduce the breaking of the execution pipeline -- primarily by reducing branch mispredictions.
I suggest you go here for all your optimization needs. The manuals are written for the c-style programmer and relatively easy to understand if you know some assembly. These manuals should explain to you the subtleties in modern processors, the strategies used by top compilers, and the best way to structure code to get the most out of it.
I just remembered the most important thing about conditionals and branching code. Order your code as follows
if(x==1); //80% of the time
else if(x==2); // 10% of the time
else if(x==3); //6% of the time
else break;
You must use an else sequence... and in this case the prediction logic in your CPU will predict correctly for x==1 and avoid the breaking of your pipeline for 80% of all execution.
More information from intel. Particularly:
In order to effectively write your code to take advantage of these rules, when writing if-else or switch statements, check the most common cases first and work progressively down to the least common. Loops do not necessarily require any special ordering of code for static branch prediction, as only the condition of the loop iterator is normally used.
By following this rule you are flat-out giving the CPU hints about how to bias its prediction logic towards your chained conditionals.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I know that you should only optimize things when it is deemed necessary. But, if it is deemed necessary, what are your favorite low level (as opposed to algorithmic level) optimization tricks.
For example: loop unrolling.
gcc -O2
Compilers do a lot better job of it than you can.
Picking a power of two for filters, circular buffers, etc.
So very, very convenient.
-Adam
Why, bit twiddling hacks, of course!
One of the most useful in scientific code is to replace pow(x,4) with x*x*x*x. Pow is almost always more expensive than multiplication. This is followed by
for(int i = 0; i < N; i++)
{
z += x/y;
}
to
double denom = 1/y;
for(int i = 0; i < N; i++)
{
z += x*denom;
}
But my favorite low level optimization is to figure out which calculations can be removed from a loop. Its always faster to do the calculation once rather than N times. Depending on your compiler, some of these may be automatically done for you.
Inspect the compiler's output, then try to coerce it to do something faster.
I wouldn't necessarily call it a low level optimization, but I have saved orders of magnitude more cycles through judicious application of caching than I have through all my applications of low level tricks combined. Many of these methods are applications specific.
Having an LRU cache of database queries (or any other IPC based request).
Remembering the last failed database query and returning a failure if re-requested within a certain time frame.
Remembering your location in a large data structure to ensure that if the next request is for the same node, the search is free.
Caching calculation results to prevent duplicate work. In addition to more complex scenarios, this is often found in if or for statements.
CPUs and compilers are constantly changing. Whatever low level code trick that made sense 3 CPU chips ago with a different compiler may actually be slower on the current architecture and there may be a good chance that this trick may confuse whoever is maintaining this code in the future.
++i can be faster than i++, because it avoids creating a temporary.
Whether this still holds for modern C/C++/Java/C# compilers, I don't know. It might well be different for user-defined types with overloaded operators, whereas in the case of simple integers it probably doesn't matter.
But I've come to like the syntax... it reads like "increment i" which is a sensible order.
Using template metaprogramming to calculate things at compile time instead of at run-time.
Years ago with a not-so-smart compilier, I got great mileage from function inlining, walking pointers instead of indexing arrays, and iterating down to zero instead of up to a maximum.
When in doubt, a little knowledge of assembly will let you look at what the compiler is producing and attack the inefficient parts (in your source language, using structures friendlier to your compiler.)
precalculating values.
For instance, instead of sin(a) or cos(a), if your application doesn't necessarily need angles to be very precise, maybe you represent angles in 1/256 of a circle, and create arrays of floats sine[] and cosine[] precalculating the sin and cos of those angles.
And, if you need a vector at some angle of a given length frequently, you might precalculate all those sines and cosines already multiplied by that length.
Or, to put it more generally, trade memory for speed.
Or, even more generally, "All programming is an exercise in caching" -- Terje Mathisen
Some things are less obvious. For instance traversing a two dimensional array, you might do something like
for (x=0;x<maxx;x++)
for (y=0;y<maxy;y++)
do_something(a[x,y]);
You might find the processor cache likes it better if you do:
for (y=0;y<maxy;y++)
for (x=0;x<maxx;x++)
do_something(a[x,y]);
or vice versa.
Don't do loop unrolling. Don't do Duff's device. Make your loops as small as possible, anything else inhibits x86 performance and gcc optimizer performance.
Getting rid of branches can be useful, though - so getting rid of loops completely is good, and those branchless math tricks really do work. Beyond that, try never to go out of the L2 cache - this means a lot of precalculation/caching should also be avoided if it wastes cache space.
And, especially for x86, try to keep the number of variables in use at any one time down. It's hard to tell what compilers will do with that kind of thing, but usually having less loop iteration variables/array indexes will end up with better asm output.
Of course, this is for desktop CPUs; a slow CPU with fast memory access can precalculate a lot more, but in these days that might be an embedded system with little total memory anyway…
I've found that changing from a pointer to indexed access may make a difference; the compiler has different instruction forms and register usages to choose from. Vice versa, too. This is extremely low-level and compiler dependent, though, and only good when you need that last few percent.
E.g.
for (i = 0; i < n; ++i)
*p++ = ...; // some complicated expression
vs.
for (i = 0; i < n; ++i)
p[i] = ...; // some complicated expression
Optimizing cache locality - for example when multiplying two matrices that don't fit into cache.
Allocating with new on a pre-allocated buffer using C++'s placement new.
Counting down a loop. It's cheaper to compare against 0 than N:
for (i = N; --i >= 0; ) ...
Shifting and masking by powers of two is cheaper than division and remainder, / and %
#define WORD_LOG 5
#define SIZE (1 << WORD_LOG)
#define MASK (SIZE - 1)
uint32_t bits[K]
void set_bit(unsigned i)
{
bits[i >> WORD_LOG] |= (1 << (i & MASK))
}
Edit
(i >> WORD_LOG) == (i / SIZE) and
(i & MASK) == (i % SIZE)
because SIZE is 32 or 2^5.
Jon Bentley's Writing Efficient Programs is a great source of low- and high-level techniques -- if you can find a copy.
Eliminating branches (if/elses) by using boolean math:
if(x == 0)
x = 5;
// becomes:
x += (x == 0) * 5;
// if '5' was a base 2 number, let's say 4:
x += (x == 0) << 2;
// divide by 2 if flag is set
sum >>= (blendMode == BLEND);
This REALLY speeds things out especially when those ifs are in a loop or somewhere that is being called a lot.
The one from Assembler:
xor ax, ax
instead of:
mov ax, 0
Classical optimization for program size and performance.
In SQL, if you only need to know whether any data exists or not, don't bother with COUNT(*):
SELECT 1 FROM table WHERE some_primary_key = some_value
If your WHERE clause is likely return multiple rows, add a LIMIT 1 too.
(Remember that databases can't see what your code's doing with their results, so they can't optimise these things away on their own!)
Recycling the frame-pointer all of a sudden
Pascal calling-convention
Rewrite stack-frame tail call optimizarion (although it sometimes messes with the above)
Using vfork() instead of fork() before exec()
And one I am still looking for, an excuse to use: data driven code-generation at runtime
Liberal use of __restrict to eliminate load-hit-store stalls.
Rolling up loops.
Seriously, the last time I needed to do anything like this was in a function that took 80% of the runtime, so it was worth trying to micro-optimize if I could get a noticeable performance increase.
The first thing I did was to roll up the loop. This gave me a very significant speed increase. I believe this was a matter of cache locality.
The next thing I did was add a layer of indirection, and put some more logic into the loop, which allowed me to only loop through the things I needed. This wasn't as much of a speed increase, but it was worth doing.
If you're going to micro-optimize, you need to have a reasonable idea of two things: the architecture you're actually using (which is vastly different from the systems I grew up with, at least for micro-optimization purposes), and what the compiler will do for you.
A lot of the traditional micro-optimizations trade space for time. Nowadays, using more space increases the chances of a cache miss, and there goes your performance. Moreover, a lot of them are now done by modern compilers, and typically better than you're likely to do them.
Currently, you should (a) profile to see if you need to micro-optimize, and then (b) try to trade computation for space, in the hope of keeping as much as possible in cache. Finally, run some tests, so you know if you've improved things or screwed them up. Modern compilers and chips are far too complex for you to keep a good mental model, and the only way you'll know if some optimization works or not is to test.
In addition to Joshua's comment about code generation (a big win), and other good suggestions, ...
I'm not sure if you would call it "low-level", but (and this is downvote-bait) 1) stay away from using any more levels of abstraction than absolutely necessary, and 2) stay away from event-driven notification-style programming, if possible.
If a computer executing a program is like a car running a race, a method call is like a detour. That's not necessarily bad except there's a strong temptation to nest those things, because once you're written a method call, you tend to forget what that call could cost you.
If your're relying on events and notifications, it's because you have multiple data structures that need to be kept in agreement. This is costly, and should only be done if you can't avoid it.
In my experience, the biggest performance killers are too much data structure and too much abstraction.
I was amazed at the speedup I got by replacing a for loop adding numbers together in structs:
const unsigned long SIZE = 100000000;
typedef struct {
int a;
int b;
int result;
} addition;
addition *sum;
void start() {
unsigned int byte_count = SIZE * sizeof(addition);
sum = malloc(byte_count);
unsigned int i = 0;
if (i < SIZE) {
do {
sum[i].a = i;
sum[i].b = i;
i++;
} while (i < SIZE);
}
}
void test_func() {
unsigned int i = 0;
if (i < SIZE) { // this is about 30% faster than the more obvious for loop, even with O3
do {
addition *s1 = &sum[i];
s1->result = s1->b + s1->a;
i++;
} while ( i<SIZE );
}
}
void finish() {
free(sum);
}
Why doesn't gcc optimise for loops into this? Or is there something I missed? Some cache effect?