This question concerns optimization. Suppose I need the array length of an array A at two places in my code. Should I use the function a.length() in the two places, or is it faster to assign a local variable the value of a.length() and use it at the two places.
By "faster" I mean in terms of running time. Moreover, i am talking asymptotically.
The asymptotic complexity of calling the function twice is the same - any constant number of calls to the same (pure) function on the same arguments has the same asymptotic complexity as a single call to that function, since you can just roll the constant number of calls into the big-O's hidden constant.
As for what will be faster, there's no guarantee which one will be faster. It depends on the language and compiler. I'd suggest just writing it both ways and timing the result to see if there's an appreciable difference. That said, if you are writing something that is so performance-critical that you can't afford to call .length() twice, you may need to reconsider your approach in general to see if there's a better global solution to the problem. Microoptimizations are rarely worth the effort unless you have a compelling reason to believe that your program is markedly slower in the unoptimized version.
If you have to ask the question, you're not at a point where it matters yet. If you were, you'd already have code that you've profiled, and you could just try it and see. This kind of thing depends heavily on your language and compiler, and the only results that matter are the ones you see.
Don't worry about micro-optimizations til you find you need to shave cycles, and even then the algorithm is the first thing to check.
What language? In many languages, such calls are optimized away (either at compile time or by a JIT compiler) into direct access to the length field of the array object.
Related
I am in the second year of my bachelor study in information technology. Last year in one of my courses they taught me to write clean code so other programmers have an easier time working with your code. I learned a lot about writing clean code from a video ("clean code") on pluralsight (paid website for learning which my school uses). There was an example in there about assigning if conditions to boolean variables and using them to enhance readability. In my course today my teacher told me it's very bad code because it decreases performance (in bigger programs) due to increased tests being executed. I was wondering now whether I should continue using boolean variables for readability or not use them for performance. I will illustrate in an example (I am using python code for this example):
example boolean variable
Let's say we need to check whether somebody is legal to drink alcohol we get the persons age and we know the legal drinking age is 21.
is_old_enough = persons_age >= legal_drinking_age
if is_old_enough:
do something
My teacher told me today that this would be very bad for performance since 2 tests are performed first persons_age >= legal_drinking_age is tested and secondly in the if another test occurs whether the person is_old_enough.
My teacher told me that I should just put the condition in the if, but in the video they said that code should be read like natural language to make it clear for other programmers. I was wondering now which would be the better coding practice.
example condition in if:
if persons_age >= legal_drinking_age:
do something
In this example only 1 test is tested whether persons_age >= legal_drinking_age. According to my teacher this is better code.
Thank you in advance!
yours faithfully
Jonas
I was wondering now which would be the better coding practice.
The real safe answer is : Depends..
I hate to use this answer, but you won't be asking unless you have faithful doubt. (:
IMHO:
If the code will be used for long-term use, where maintainability is important, then a clearly readable code is preferred.
If the program speed performance crucial, then any code operation that use less resource (smaller dataSize/dataType /less loop needed to achieve the same thing/ optimized task sequencing/maximize cpu task per clock cycle/ reduced data re-loading cycle) is better. (example keyword : space-for-time code)
If the program minimizing memory usage is crucial, then any code operation that use less storage and memory resource to complete its operation (which may take more cpu cycle/loop for the same task) is better. (example: small devices that have limited data storage/RAM)
If you are in a race, then you may what to code as short as possible, (even if it may take a slightly longer cpu time later). example : Hackathon
If you are programming to teach a team of student/friend something.. Then readable code + a lot of comment is definitely preferred .
If it is me.. I'll stick to anything closest to assembly language as possible (as much control on the bit manipulation) for backend development. and anything closest to mathematica-like code (less code, max output, don't really care how much cpu/memory resource is needed) for frontend development. ( :
So.. If it is you.. you may have your own requirement/preference.. from the user/outsiders/customers point of view.. it is just a working/notWorking program. YOur definition of good program may defer from others.. but this shouldn't stop us to be flexible in the coding style/method.
Happy exploring. Hope it helps.. in any way possible.
Performance
Performance is one of the least interesting concerns for this question, and I say this as one working in very performance-critical areas like image processing and raytracing who believes in effective micro-optimizations (but my ideas of effective micro-optimization would be things like improving memory access patterns and memory layouts for cache efficiency, not eliminating temporary variables out of fear that your compiler or interpreter might allocate additional registers and/or utilize additional instructions).
The reason it's not so interesting is, because, as pointed out in the comments, any decent optimizing compiler is going to treat those two you wrote as equivalent by the time it finishes optimizing the intermediate representation and generates the final results of the instruction selection/register allocation to produce the final output (machine code). And if you aren't using a decent optimizing compiler, then this sort of microscopic efficiency is probably the last thing you should be worrying about either way.
Variable Scopes
With performance aside, the only concern I'd have with this convention, and I think it's generally a good one to apply liberally, is for languages that don't have a concept of a named constant to distinguish it from a variable.
In those cases, the more variables you introduce to a meaty function, the more intellectual overhead it can have as the number of variables with a relatively wide scope increases, and that can translate to practical burdens in maintenance and debugging in extreme cases. If you imagine a case like this:
some_variable = ...
...
some_other_variable = ...
...
yet_another_variable = ...
(300 lines more code to the function)
... in some function, and you're trying to debug it, then those variables combined with the monstrous size of the function starts to multiply the difficulty of trying to figure out what went wrong. That's a practical concern I've encountered when debugging codebases spanning millions of lines of code written by all sorts of people (including those no longer on the team) where it's not so fun to look at the locals watch window in a debugger and see two pages worth of variables in some monstrous function that appears to be doing something incorrectly (or in one of the functions it calls).
But that's only an issue when it's combined with questionable programming practices like writing functions that span hundreds or thousands of lines of code. In those cases it will often improve everything just focusing on making reasonable-sized functions that perform one clear logical operation and don't have more than one side effect (or none ideally if the function can be programmed as a pure function). If you design your functions reasonably then I wouldn't worry about this at all and favor whatever is readable and easiest to comprehend at a glance and maybe even what is most writable and "pliable" (to make changes to the function easier if you anticipate a future need).
A Pragmatic View on Variable Scopes
So I think a lot of programming concepts can be understood to some degree by just understanding the need to narrow variable scopes. People say avoid global variables like the plague. We can go into issues with how that shared state can interfere with multithreading and how it makes programs difficult to change and debug, but you can understand a lot of the problems just through the desire to narrow variable scopes. If you have a codebase which spans a hundred thousand lines of code, then a global variable is going to have the scope of a hundred thousands of lines of code for both access and modification, and crudely speaking a hundred thousand ways to go wrong.
At the same time that pragmatic sort of view will find it pointless to make a one-shot program which only spans 100 lines of code with no future need for extension avoid global variables like the plague, since a global here is only going to have 100 lines worth of scope, so to speak. Meanwhile even someone who avoids those like the plague in all contexts might still write a class with member variables (including some superfluous ones for "convenience") whose implementation spans 8,000 lines of code, at which point those variables have a much wider scope than even the global variable in the former example, and this realization could drive someone to design smaller classes and/or reduce the number of superfluous member variables to include as part of the state management for the class (which can also translate to simplified multithreading and all the similar types of benefits of avoiding global variables in some non-trivial codebase).
And finally it'll tend to tempt you to write smaller functions as well, since a variable towards the top of some function spanning 500 lines of code is going to also have a fairly wide scope. So anyway, my only concern when you do this is to not let the scope of those temporary, local variables get too wide. And if they do, then the general answer is not necessarily to avoid those variables but to narrow their scope.
When you have a function that accepts an array as an argument and calls another function with that array and that calls another function with it and so forth the stack will contain many copies of the pointer to that array. I just thought of an interesting way to alleviate this problem but I'm wondering whether or not it is worth implementing.
Does anyone have any idea how often stacks contain duplicate pointers in practice?
EDIT
Just to clarify, I am not optimizing a given program but, rather, am considering writing a new kind of optimization pass for my VM. My benchmarks have indicated that my current solution causes up to 70% of the total running time to be spent in stack manipulations. The optimization pass I am thinking of would generate code at compile time that would perform the same actions but pointers would (potentially) be duplicated on the stack less often. I am interested in any prior studies that have measured the number of duplicates on the stack because this would help me to quantify my optimization's potential. For example, if it is known that real programs do not push pointers already on the stack in practice then my optimization is worthless.
Moreover, these stack manipulations are due to the code generated by my VM making sure locally-held pointers are visible to the garbage collector and not due only to function parameters as both answerers have currently assumed. And they are actually operations on a shadow stack rather than the main stack.
First of all, the answer will depend on your application.
Secondly, even with high duplication, I doubt there is much sense in implementing the mechanism you describe, or even that it is possible in a general case. If you call a method and you pass it parameters, you must do it either one way or another.
There may be advantages to doing it in some specific way - for example there are several function calling conventions and many C/C++ compilers (e.g. gcc) let you choose between passing parameters on the stack or via registers. In certain cases, the latter may be faster - you can try and benchmark if it helps your application.
But in a general case, the cost of detecting duplicated values on the stack and "reusing" them would probably much exceed any gains from having a smaller stack. The code for pushing and popping values is really simple (just a few CPU instructions in an optimized case), code for finding and reusing duplicates - hardly so. You would also have to somehow store the information about which values are already on the stack and how to find them - a nontrivial data structure. Except for some really weird cases, I don't think this would be smaller than the actual copied data itself.
What you could do, would be to rewrite your algorithm in such way that some function calls are eliminated. For example, if your function's result only depends on the input arguments, you could somehow cache or memoize the results, thus avoiding repeated calls with the same values. This may indeed bring some gains, though it's usually a memory vs CPU time tradeoff. Getting an advantage both in memory and in CPU time is rarely possible. Also, rewriting your algorithm is not really "avoiding duplication of data on the stack".
Any way, for the original question, I think the idea is not viable and you should look at optimizations elsewhere.
PS: You use case may somewhat resemble tail-call optimization, so perhaps that's a direction worth looking at - but if you implement it yourself, I would also consider this to fall into the "change your algorithm" category. Maybe changing from a recursive algorithm to an iterative one could help also.
Can I suggest getting some exposure to actual performance tuning?
(Here's my canonical example.)
Between the time a program starts and the time it ends, of the cycles it uses, it obviously uses 100% of those cycles.
If it goes in and out of functions, and passes pointers to an array, but does nothing else, then there's no surprise that a high percent of time goes into function entry and exit, and passing arguments.
If a program P is written to do task T, there are a multitude of other programs P' which could also do task T. Some of them take fewer cycles than all the others, and those are the optimal ones.
The way the optimal ones differ from the non-optimal ones is that the non-optimal ones are doing things that can be done without.
So, to optimize any program, find out what cycles are being spent that don't have to be, and get rid of those activities. That link shows in great detail how I do it.
Trying to pass fewer arguments to functions might or might not be necessary, depending on what your diagnostics tell you.
I'd rather not use code since it's common concept:
Say we have the scenario of a function which is neither too big or too small and also can't easily in itself be optimized with OpenMP for-loop optimizations.
However, it is a function which is called millions of times throughout the project's run in a few hundred unrelated circumstances in the code.
[inline in itself doesn't seem to do much (on by default on optimized gcc outcomes) and making it into a macro while not parallel either, it would be an undertaking to be compatible.]
OpenMP is for "making things run in parallel" - in general. Not only for loops... Well, you don't even need to have any loops at all to make some good use of OpenMP and speed up your code.
The only thing which matters is: "do I have a several independent operations which run one after one, and which could work at the same time instead?". If so, then you've found an easy spot for optimization with OpenMP.
When the function is called, is it called multiple times, particularly in a loop? The question is a little vague -- maybe yes (it's called thousands of times in each of a few hundred unrelated places -> millions) or maybe no (it's called once in each of a hundred unrelated places, and you hit those sections of code thousands of times -> millions).
In the first case, then yes, parallelizing the `map' -- that is, applying the function independantly to a bunch of cases -- is easy and OpenMPs very well.
In the second case, if the function is called a million times but each time once, then no. There's repetition of execution there, but no exposed concurrency; there's no list of tasks that have to be done at the same time that can be done independantly. All that you can do there, if the function is likely to be called with repeated parameters, is to use memoization, which is a memory/compute time tradeoff, not a parallelization technique.
In the second case, it may be the case that you can restructure the code so that a bunch of those function calls are made at once, thus exposing the concurrency and allowing parallelization -- but its not something that OpenMP (or any parallel programming model) can automatically do for you.
The idea is somewhat similar to what Apple has done in the OpenGL stack. I want to have that a bit more general.
Basically, I want to have specialised and optimised variants of some code for some specific cases.
In other words: I have given an algorithm/code for a function (let B = {0,1})
f : B^n -> B^m
Now, I special a specific case by a function (which predefines part of the input of f)
preset : {1..n} -> {0,1,unset}
The amount of predefinitions (∈ {0..n}) is then given by
pn := |preset⁻¹({0,1})|
Canonically, we now get a specialised function
f_preset : B^(n-pn) -> B^m
Also canonically, we get the code/algorithm for this specialised function. Naturally, the code for f_preset will be somewhat more fast than f with pn > 0. Then, you also can optimise this code further (there might be some dead code now, some loops can be unpacked now, some calculations can be precalculated, etc). In some cases, it can have noteable improvements.
Apple does roughly this for their OpenGL stack (from what I have read / know): They try to find a good preset at runtime after everything is setup for variables which will not change anymore, then make an optimised version of the specialised function and only use that one instead of the original function.
Initially, I thought about a way to optimise the physics simulation of some own game. There I have a lot of particle objects and a set of particle types (which is unknown at compile time). A particle type is a set of attributes. The particle types are fixed and constant once they are loaded. Each particle object is of one of theye particle types. The physic simulation for a particle object is some very heavy peace of code with many many branches and very heavily depends on the particle type. My idea was now to have an optimised physics simulation function for each particle type.
After thinking a bit about this, I wanted to go a bit further:
I want to automatically calculate a set of such presets at runtime and maintain the optimised code for each. And I want to automatically add or remove presets when the circumstances change.
There are several questions now:
Is there an easy way to calculate a good preset? How do I know what variables are constant for a given situation?
Is there an easy way to check how good a preset is? 'Good' refers to the performance of the resulting optimised code.
How to compare two algorithms/codes for performance? Via some heuristic? Or by testing with random input?
How many presets (and optimised code variants) should there be for a function? A fixed limit for all functions? Or is this different for every function? Is it maybe even depending on the current computer state?
How to maintain the different optimised code variants? A wrapper function around f which chooses automatically the best optimised variant doesn't seem to be very nice as this maybe not so easy check would be needed for every single call. A solution to this problem might also be deeply related to the question about how to find the set/amount of good presets. (In the particle type case, the optimised code would be attached to / saved together with the particle type. The amount of particle types also define the amount of presets.)
For my initial case, most of these questions are kind of obsolete but am really interested now in how to do this in a more general way. Of course, most/all of these questions are also uncalculateable but I wonder to what degree you may still get good results.
This whole topic is also very important for optimisations in JIT compilers. Are they doing these kind of optimisations already? To what degree?
Are there good recent research works which answers some of my questions? Or maybe also some results which say that it is just too hard to do this in such a general way?
It seems to me you are asking about partial evaluation.
I actually have a bit of a problem with that concept, because it is usually couched in terms that are over-academic and over-difficult.
The way it is usually expressed is that you have some general function F(Islow, Ifast) having arguments that can take different values at different times. The Islow arguments change seldom, and the Ifast arguments can be different every time it is called.
Then the problem is to write some kind of partial-evaluator function G(F, Islow) -> F1(Ifast) that takes function F and the Islow arguments, and generates a new (simpler) function F1 that only takes the Ifast arguments.
The problem with this is 1) somebody has to write the general function F, and 2) somebody has to write the general partial evaluator G.
What makes more sense to me is to write from scratch a function H(Islow) -> F1(Ifast), that is, write a code-generator specifically for F1, rather than writing two functions F and G, especially where G is very difficult to write.
H is usually much easier to write than F, and G need not be written at all! The result function F1 usually is smaller and has much higher performance than F, so it's a win-win situation.
When people write code generators, that is what they are doing, and it is a very effective programming technique.
This is a dangerous question, so let me try to phrase it correctly. Premature optimization is the root of all evil, but if you know you need it, there is a basic set of rules that should be considered. This set is what I'm wondering about.
For instance, imagine you got a list of a few thousand items. How do you look up an item with a specific, unique ID? Of course, you simply use a Dictionary to map the ID to the item.
And if you know that there is a setting stored in a database that is required all the time, you simply cache it instead of issuing a database request hundred times a second.
Or even something as simple as using a release instead of a debug build in prod.
I guess there are a few even more basic ideas.
I am specifically not looking for "don't do it, for experts: don't do it yet" or "use a profiler" answers, but for really simple, general hints. If you feel this is an argumentative question, you probably misunderstood my intention.
I am also not looking for concrete advice in any of my projects nor any sophisticated low level tricks. Think of it as an overview of how to avoid the most important performance mistakes you made as a very beginner.
Edit: This might be a good description of what I am looking for: Create a presentation (not a practical example) of common optimization rules for people who have a basic technical understanding (let's say they got a CS degree) but for some reason never wrote a single line of code. Point out the most important aspects. Pseudocode is fine. Do not assume specific languages or even architectures.
Two rules:
Use the right data structures.
Use the right algorithms.
I think that covers it.
Minimize the number of network roundtrips
Minimize the number of harddisk seeks
These are several orders of magnitude slower than anything else your program is likely to do, so avoiding them can be very important indeed. Typical methods to achieve this are:
Caching
Increasing the granularity of network and HD accesses
For example, B-Trees are absolutely ubiquitous in DB systems because the reduce the granularity of HD access for on-disk index lookups.
I think something extremely important is to be very carefully on all code that is frequently executed. This is normally the code in critical inner loops.
Rule 1: Know this code
For this code avoid all overhead. Small differences in runtime can make a big impact on the overall performance. E.g. if you implement an image filter a difference of 0.001ms per pixel will make a difference in 1s in the filter runtime on a image with size 1000x1000 (which is not big).
Things to avoid/do in inner loops are:
don't go through interfaces (e.g DB queries, RPC calls etc)
don't jump around in the RAM, try to access it linearly
if you have to read from disk then read large chunks outside the inner loop (paging)
avoid virtual function calls
avoid function calls / use inline functions
use float instead of double if possible
avoid numerical casts if possible
use ++a instead of a++ in C++
iterate directly on pointers if possible
The second general advice: Each layer/interface costs, try to avoid large stacks of different technologies, the system will spend more time in data transformation then in doing the actual job, keep things simple.
And as the others said, use the right algorithm, try to optimize the algorithm complexity first before you optimize the algorithm implementation.
I know you're looking for specific coding hints, but those are easy to find: cacheing, loop unrolling, code hoisting, data & code locality, blah, blah...
The biggest hint of all is don't use them.
Would it help to make this point if I said "This is the secret that the almighty Powers That Be don't want you to know!!"? Pick your Powers: Microsoft, Google, Sun, etc. etc.
Don't Use Them
Until you know, with dead certainty, what the problems are, and then the coding hints are obvious.
Here's an example where many coding tricks were used, but the heart and soul of the exercise is not the coding techniques, but the diagnostic technique.
Are your algorithms correct for the situation or are there better ones available?