I've seen sample code (most notably in iOS GL projects) where the current GL texture name is cached and a comparison is done before calling glBindTexture. The goal is to avoid unnecessary calls to glBindTexture.
e.g.
if (textureName != cachedTextureName) {
glBindTexture(GL_TEXTURE_2D, textureName);
cachedTextureName = textureName;
}
This approach requires that ALL calls to glBindTexture be handled similarly, preferably through a wrapper. But the inclusion of third-party code makes this problematic, as well as being Yet Another Thing to Remember.
Q. Is this a useful optimisation? Or is the OpenGL implementation smart enough to disregard calls to glBindTexture where the texture name has not changed?
This isn't called out in the spec as far as I've seen. In practice it appears many implementations actually do some processing if you rebind the same texture, as you'll see a relative performance drop if you don't test the current texture first. I'd recommend using the test if possible, just to ensure that implementation details don't have an adverse effect on your program.
Generally speaking it is quite useful to abstract over the OpenGL state machine with your own state handling, so you can query state such as the bound texture or the active matrices without calling glGet, which will almost always be slower than your own bookkeeping. This also allows you to prohibit invalid behavior and generally makes your program easier to reason about.
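For illustration, here is a minimal sketch of that kind of wrapper in C++, assuming a single texture unit and only the GL_TEXTURE_2D target; the glstate namespace and function names are made up for this example, and the include path depends on your platform.

#include <OpenGLES/ES2/gl.h>   // or <GLES2/gl2.h> / <GL/gl.h> on other platforms

namespace glstate {

// Cached name of the texture currently bound to GL_TEXTURE_2D (0 = default texture).
static GLuint g_boundTexture2D = 0;

// All binds must go through here for the cache to stay valid.
inline void bindTexture2D(GLuint textureName) {
    if (textureName != g_boundTexture2D) {
        glBindTexture(GL_TEXTURE_2D, textureName);
        g_boundTexture2D = textureName;
    }
}

// Call this whenever third-party code may have changed the binding behind your back.
inline void invalidateTextureCache() {
    glBindTexture(GL_TEXTURE_2D, 0);
    g_boundTexture2D = 0;
}

} // namespace glstate

A complete version would also track the active texture unit and other bind targets, since the cache above is only correct as long as glActiveTexture is never changed outside the wrapper.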
When looking at Arrow's documentation about functional error handling, one of the reasons listed for avoiding thrown exceptions is their performance cost (referencing "The hidden performance costs of instantiating Throwables").
So it is suggested to model errors/failures as an Effect.
When building an Effect to interrupt a computation, the shift() method should be used (and under the hood it is also what is used to "unwrap" effects through the bind() method).
Looking at the shift() method implementation, it seems that its magic is done by... throwing an exception. That means exceptions are created not only when we want to signal an error, but also to "unwrap" a missing Option, the Left case of an Either, and all the other effect types exposed by the library.
What I'm not getting is whether some optimization is done to avoid "the hidden performance costs of instantiating Throwables", or whether, in the end, they are not a real problem.
The argument that this is the biggest reason for using typed errors on the JVM is probably an overstatement; there are better reasons for using typed errors. Exceptions are not typed, so they are not tracked by the compiler. This is what we want to avoid if we care about type-safety, or purity. This will be better reflected in the documentation for 2.x.x.
Avoiding the performance penalty can be a benefit in hot-loops, but in general application programming it can probably be neglected.
However, to answer your question about how this is dealt with in Kotlin and Arrow:
In Kotlin, cancellation of coroutines works through CancellationException, so it's required to use this mechanism to work correctly within the Kotlin language. You can find more details in the Arrow 2.x.x Raise design document.
It's possible to remove the performance penalty of exceptions, which is also what Arrow is doing (except for a small regression in a single version, which was fixed in the next release).
An example of this can also be found in the official KotlinX Coroutines library, which applies the same technique of disabling stack traces for JobCancellationException.
I'm currently developing an audio application, and performance is one of my main concerns.
There are really good articles like Four common mistakes in audio development or Real-time audio programming 101: time waits for nothing.
I understand that C++ is the way to go for audio processing, but I still have a question: does Objective-C++ slow down performance?
For example, with code like this:
@implementation MyObjectiveCppClass

- (float *)objCMethodWithOnlyCpp:(float *)input {
    // Process with pure C++ code here
    return input; // placeholder for the processed buffer
}

@end
Will this code be less efficient than the same code in a .cpp file?
Bonus question: what will happen if I use Grand Central Dispatch inside this method in order to parallelize the processing?
Calling an Objective-C method is slower than calling a pure C or C++ function, as the Objective-C runtime is invoked at every call. Whether it matters in your case depends on the number of samples processed in each call. If you are only processing one sample at a time, this might be a problem. If you process a large buffer, I wouldn't worry too much.
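This isn't Objective-C, but the same principle can be sketched in plain C++ with virtual dispatch standing in for objc_msgSend: the per-call overhead is paid once per sample in the first variant and only once per buffer in the second. The Processor type and function names are made up for this illustration.

#include <cstddef>

struct Processor {
    virtual ~Processor() = default;
    virtual float processSample(float in) = 0;                    // dynamic dispatch per sample
    virtual void  processBuffer(float *buf, std::size_t n) = 0;   // dynamic dispatch per buffer
};

void perSample(Processor &p, float *buf, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = p.processSample(buf[i]);   // pay the call overhead n times
}

void perBuffer(Processor &p, float *buf, std::size_t n) {
    p.processBuffer(buf, n);                // pay it once; the inner loop is plain C++
}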
The best thing to do is to profile it and then evaluate the results against your requirements for performance.
And for your bonus question the answer is much the same. GCD comes at a cost, and if that cost is larger than what you gain by parallelising the work, then it is not worth it. So again, it depends on the amount of work you plan to do per call.
To simplify, in the end ObjC and C++ code goes through the same compile and optimize chain as C. So the performance characteristics of identical code inside an ObjC or C++ method are identical.
That said, calling ObjC or C++ methods has different performance characteristics. ObjC provides a dynamically modifiable, binary stable ABI with its methods. These additional guarantees (which are great if you are exposing methods via public API from e.g. a framework) come at a slight performance cost.
If you are doing lots of calls to the same method (e.g. per sample or per pixel), that tiny performance penalty may add up. The same applies to ivar access, which (on the modern runtime) carries an extra load on top of what ivar access would cost in C++, in order to guarantee binary stability.
Similar considerations apply to GCD. GCD parallelizes operations, so you pay a penalty for thread switches (like with regular threads) and for each new block you dispatch. But the actual code in them runs at just the same speed as it would anywhere else. Also, different from starting your own threads, GCD will re-use threads from a pool, so you don't pay the overhead for creating new threads repeatedly.
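As a rough sketch of that trade-off, here is one way to split a buffer into large chunks with GCD from C++; the GainJob type, the chunk size, and the function names are made up for this example, and the left-over samples are ignored for brevity.

#include <dispatch/dispatch.h>
#include <cstddef>

// Context passed to each chunk; any per-sample work could go where the gain is applied.
struct GainJob {
    float      *samples;
    std::size_t chunkSize;
    float       gain;
};

// Plain C-style callback: dispatch_apply_f invokes it once per chunk, possibly in parallel.
static void processChunk(void *context, std::size_t chunkIndex) {
    GainJob *job = static_cast<GainJob *>(context);
    float *chunk = job->samples + chunkIndex * job->chunkSize;
    for (std::size_t i = 0; i < job->chunkSize; ++i)
        chunk[i] *= job->gain;
}

void applyGainParallel(float *samples, std::size_t count, float gain) {
    const std::size_t chunkSize = 4096;   // large chunks amortize the per-block dispatch cost
    GainJob job = { samples, chunkSize, gain };
    dispatch_apply_f(count / chunkSize,
                     dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0),
                     &job, processChunk);
    // In real code the remaining count % chunkSize samples would be processed serially.
}

dispatch_apply_f waits for all chunks to complete before returning, so whether it wins depends on the chunk size versus the per-block cost mentioned above, and it is not something you would do inside a real-time render callback that must never block.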
It's a trade-off. Depending on what your code does and how it does it and how long individual operations take, either approach may be faster. You'll just have to profile and see.
The worst thing you can probably do is do things that don't need to be real-time (like updating your volume meter) in one of the real-time CoreAudio threads, or make calls that block.
PS - In most cases, performance differences will be negligible. So another aspect to focus on would be readability and easy maintenance of your code. E.g. using blocks can make code that has lots of asynchronicity easier to read because you can set up the blocks in the correct order in one method of your source file, making the flow clearer than if you split them across several methods that all just do a tiny thing and then start an asynchronous process.
I have a very CPU intensive F# program that depends on persistent data-structures - about 40% of the total CPU time is spent in the Map module. So I thought I'd try out the PersistentHashMap in FSharpX collections. (BTW, this is already a big improvement over the previous version of F# in VS2013 where the same program spent 70% of its time in Map. I also notice that running programs with the debugger attached doesn't have the huge penalty it did before - good work guys...) There is also a hot-spot where I'm re-sorting all the time, where instead I should be adding to a Heap, so I thought I'd give that a go as well.
Two issues became immediately apparent:
(1) Swapping out one for the other from an interface perspective proved harder than it seems it should be - i.e., making a shim that let me switch from a Map to a PersistentMap, preserving both the needed module-based let-bound functions and the types necessary to use each map. I know that having full HM type inference (and no type classes) is orthogonal to LSP-style referential transparency for the most part - but maybe I was missing some way to do this better with a minimal amount of code.
(2) The biggest problem (which I'd like to focus on here) is the reliance of the F# functional data structures on OO-style dispatched equality and comparison via the IComparable (when 't : comparison), etc., family of interfaces.
Even for OO programs ISTM that the idea of dispatching equality and comparison is a bad idea -- an object "knows" how to perform its own domain-specific tasks, but it doesn't "know" for the most part what notion of equality is going to be necessary at various points in the program for various purposes -- so equality/comparison should not be part of the object's interface, but when these concepts are needed, they should always be mentioned explicitly. For example, there should never be a .Sort(), only a .SortWith(...). One could argue that even something as basic as structural equality in F# could be explicit a.StructEq(b) or a ~= b - otherwise you always get object.Equals -- but even stipulating that doing things this way is the best for a multi-paradigm language that's a first-class .Net citizen, it seems like there should at least be the option of using passed-in comparison and equality functions, but this is not the case.
This means that: (a) type constraints are enforced even if you don't want them, causing ripples of broken inferred typing (and hundreds of wavy red lines with it being unclear where the actual "problem" is) and (b), that by implementing a notion of equality or comparison that makes one container type happy in one part of your program (and in my case I want to use the same container and item type with two different notions of ordering in two different places), it is likely to silently break (or cause inefficiency, if one subsumes the other) in other parts of the code that depended on the default/previous implementation.
The only way around this that I could think of is wrapping each item in an adapter object using a new ... with object expression - but I really don't want to create so much garbage just to get the code to work.
So, ISTM that we could have a "pure" version of each persistent data structure that could be loaded if desired (even basics like List, etc.) that does not depend on dispatched equality/comparison/hashing and does not impose type constraints - all such needs would be met by functions passed in at the time of the call. (Dispatched eq/cmp would only be used for interop with BCL collections that don't accept delegates.) Then we could have a [EqCmpHashThrowNotImplemented] attribute, and I could be sure that no default operations were happening at all, and I would feel better about the efficiency and predictability of my code. (This would also let one change from a Record to a Class or vice versa without worrying about changes in behavior due to default implementations.) Again, this would be optional, but enabled with a simple import. (Which does mean that each base core collection type would have to be broken out into its own module, which isn't really a bad idea anyway.)
If I've overlooked a better way to do things or there are some patterns people are using here, I'd be interested.
What's the purpose of the if statement in a custom setter? I see this routine a lot in sample code. Given that I'm using ARC, why bother checking for equality?
- (void)setPhotoDatabase:(UIManagedDocument *)photoDatabase
{
if (_photoDatabase != photoDatabase) {
_photoDatabase = photoDatabase;
...
}
}
The important part is typically what follows the change (what's in ...): side effects after assigning the new value, which can be very costly.
It's a good idea to restrict those changes to avoid triggering unnecessary and potentially very costly side effects. Say you change a document: you will likely need to update a good percentage of the UI related to that document, as well as make model changes.
When the condition is checked, a significant amount of unnecessary work may be short-circuited.
Such unnecessary side effects could easily eclipse your app's real work in terms of CPU, drawing, object creation, writes to disk - pretty much anything.
Believe it or not, a lot of apps do perform significant amounts of unnecessary work, even if they are very well designed. Drawing and UI updates in view-based rendering systems are probably the best example I can think of. In that domain, there are a ton of details one could implement to minimize redundant drawing.
One of the main reasons to override and implement custom setters is to execute additional code in response to changes of the property. If the property doesn't actually change, why execute that code?
The answer is usually in the ... section that you have commented out: when there is nothing there, the code makes no sense. However, a typical thing to have in that spot is some sort of notification of your own delegate, like this:
[myDelegate photoDatabaseDidChanged:photoDatabase];
This should not be called unless the photoDatabase has indeed changed. The call may be costly, anywhere from "expensive" to "very expensive", depending on what the delegate really does. It could be updating a screen with the images from the new library, or it could be saving new images to the cloud. If there is no need to report a change, you could be wasting CPU cycles, along with battery and network bandwidth. Your code has no way of knowing what the delegate is going to do, so you need to avoid calling back unless the change did happen.
If you check for equality you can prevent the redundant assignment of the parameter that is passed into the method.
This way you can avoid the cost (even if it's small) of running all the code within the braces when there is no change to the photoDatabase in your sample method.
Ex (Extending your example):
- (void)setPhotoDatabase:(UIManagedDocument *)photoDatabase
{
if (_photoDatabase != photoDatabase)
{
_photoDatabase = photoDatabase;
// do stuff
// do more stuff
// do even more stuff
// do something really expensive
}
}
As you can see from the example, if you check first and the value passed in equals what's already stored, you can just exit the method and skip the additional code that isn't necessary.
I keep hitting this problem when building game engines where my classes want to look like this:
interface Entity {
draw();
}
class World {
draw() {
for (e in entities)
e.draw();
}
}
That's just pseudo-code to show roughly how the drawing happens. Each entity subclass implements its own drawing. The world loops through all the entities in no particular order and tells them to draw themselves one by one.
But with shader based graphics, this tends to be horribly inefficient or even infeasible. Each entity type is probably going to have its own shader program. To minimize program changes, all entities of each particular type need to be drawn together. Simple types of entities, like particles, may also want to aggregate their drawing in other ways, like sharing one big vertex array. And it gets really hairy with blending and such where some entity types need to be rendered at certain times relative to others, or even at multiple times for different passes.
What I normally end up with is some sort of renderer singleton for each entity class that keeps a list of all instances and draws them all at once. That's not so bad since it separates the drawing from the game logic. But the renderer needs to figure out which subset of entities to draw and it needs access to multiple different parts of the graphics pipeline. This is where my object model tends to get messy, with lots of duplicate code, tight coupling, and other bad things.
So my question is: what is a good architecture for this kind of game drawing that is efficient, versatile, and modular?
Use a two-stage approach: first loop through all entities, but instead of drawing, let them insert references to themselves into a (the) drawing batch list. Then sort the list by OpenGL state and shader use; after sorting, insert state-changer objects at every state transition.
Finally, iterate through the list, executing the drawing routine of each object it references.
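As a minimal sketch of that idea in C++ (the RenderItem fields, the sort-key layout, and the RenderQueue name are all made up for this illustration):

#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative item: a real engine would carry richer state in the key and payload.
struct RenderItem {
    std::uint64_t sortKey;   // e.g. shader id in the high bits, texture/material below
    const void   *entity;    // whatever the draw callback needs
    void        (*draw)(const void *entity);
};

class RenderQueue {
public:
    // Stage 1: entities submit themselves instead of drawing immediately.
    void submit(const RenderItem &item) { items.push_back(item); }

    // Stage 2: sort by state, apply state changes only at key transitions, then draw.
    void flush() {
        std::sort(items.begin(), items.end(),
                  [](const RenderItem &a, const RenderItem &b) { return a.sortKey < b.sortKey; });

        std::uint64_t currentKey = ~0ull;
        for (const RenderItem &item : items) {
            if (item.sortKey != currentKey) {
                // Bind the shader/texture/blend state for this key here
                // (the "state changer" step from the answer above).
                currentKey = item.sortKey;
            }
            item.draw(item.entity);
        }
        items.clear();
    }

private:
    std::vector<RenderItem> items;
};

Ordering constraints such as blended or multi-pass entities can be folded into the high bits of the key so the sort preserves the required draw order.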
This is not an easy question to answer, since there are many ways to deal with the problem. A good idea is to look into some game/rendering engines and see how this is handled there. A good starting point would be Ogre, since it's well documented and open source.
As far as I know, it separates the vertex data from the material components (shaders) through the built-in material scripts. The renderer itself knows what mesh is to be drawn in what order and with what shader (and its passes).
I know this answer is a bit vague, but I hope I could give you a useful hint.