Perfomance in audio processing with objective-c ++ - objective-c

I'm currently developing an audio application and the performance is one of my main concerns.
There are really good articles like Four common mistakes in audio development or Real-time audio programming 101: time waits for nothing.
I understood that the c++ is the way to go for audio processing but I still have a question: Does Objective-C++ slow down the performance?
For example with a code like this
#implementation MyObjectiveC++Class
- (float*) objCMethodWithOnlyC++:(float*) input {
// Process in full c++ code here
}
#end
Will this code be less efficient than the same one in a cpp file?
Bonus question: What will happen if I use GrandCentralDispatch inside this method in order to to parallelize the process?

Calling an obj C method is slower than calling a pure c or c++ method as the obj C runtime is invoked at every call. If it matters in your case dependent on the number of samples processed in each call. If you are only processing one sample at a time, then this might be a problem. If you process a large buffer then I wouldn't worry too much.
The best thing to do is to profile it and then evaluate the results against your requirements for performance.
And for your bonus question the answer is somewhat the same. GCD comes at a cost, and if that cost is larger than what you gain by parallelising it, then it is not worth. so again it depends on the amount of work you plan to do per call.
Regards
Klaus

To simplify, in the end ObjC and C++ code goes through the same compile and optimize chain as C. So the performance characteristics of identical code inside an ObjC or C++ method are identical.
That said, calling ObjC or C++ methods has different performance characteristics. ObjC provides a dynamically modifiable, binary stable ABI with its methods. These additional guarantees (which are great if you are exposing methods via public API from e.g. a framework) come at a slight performance cost.
If you are doing lots of calls to the same method (e.g. per sample or per pixel), that tiny performance penalty may add up. The same applies to ivar access, which carries another load on top of how ivar access would be in C++ (on the modern runtime) to guarantee binary stability.
Similar considerations apply to GCD. GCD parallelizes operations, so you pay a penalty for thread switches (like with regular threads) and for each new block you dispatch. But the actual code in them runs at just the same speed as it would anywhere else. Also, different from starting your own threads, GCD will re-use threads from a pool, so you don't pay the overhead for creating new threads repeatedly.
It's a trade-off. Depending on what your code does and how it does it and how long individual operations take, either approach may be faster. You'll just have to profile and see.
The worst thing you can probably do is do things that don't need to be real-time (like updating your volume meter) in one of the real-time CoreAudio threads, or make calls that block.
PS - In most cases, performance differences will be negligible. So another aspect to focus on would be readability and easy maintenance of your code. E.g. using blocks can make code that has lots of asynchronicity easier to read because you can set up the blocks in the correct order in one method of your source file, making the flow clearer than if you split them across several methods that all just do a tiny thing and then start an asynchronous process.

Related

Vulkan: Any downsides to creating pipelines, command-buffers, etc one at a time?

Some Vulkan objects (eg vkPipelines, vkCommandBuffers) are able to be created/allocated in arrays (using size + pointer parameters). At a glance, this appears to be done to make it easier to code using common usage patterns. But in some cases (eg: when creating a C++ RAII wrapper), it's nicer to create them one at a time. It is, of course, simple to achieve this.
However, I'm wondering whether there are any significant downsides to doing this?
(I guess this may vary depending on the actual object type being created - but I didn't think it'd be a good idea to ask the same question for each object)
Assume that, in both cases, objects are likely to be created in a first-created-last-destroyed manner, and that - while the objects are individually created and destroyed - this will likely happen in a loop.
Also note:
vkCommandBuffers are also deallocated in arrays.
vkPipelines are destroyed individually.
Are there any reasons I should modify my RAII wrapper to allow for array-based creation/destruction? For example, will it save memory (significantly)? Will single-creation reduce performance?
Remember that vkPipeline creation does not require external synchronization. That means that the process is going to handle its own mutexes and so forth. As such, it makes sense to avoid locking those internal mutexes whenever possible.
Also, the process is slow. So being able to batch it up and execute it into another thread is very useful.
Command buffer creation doesn't have either of these concerns. So there, you should feel free to allocate whatever CBs you need. However, multiple creation will never harm performance, and it may help it. So there's no reason to avoid it.
Vulkan is an API designed around modern graphics hardware. If you know you want to create a certain number of objects up front, you should use the batch functions if they exist, as the driver may be able to optimize creation/allocation, resulting in potentially better performance.
There may (or may not) be better performance (depending on driver and the type of your workload). But there is obviously potential for better performance.
If you create one or ten command buffers in you application then it does not matter.
For most cases it will be like less than 5 %. So if you do not care about that (e.g. your application already runs 500 FPS), then it does not matter.
Then again, C++ is a versatile language. I think this is a non-problem. You would simply have a static member function or a class that would construct/initialize N objects (there's probably a pattern name for that).
The destruction may be trickier. You can again have static member function that would destroy N objects. But it would not be called automatically and it is annoying to have null/husk objects around. And the destructor would still be called on VK_NULL_HANDLE. There is also a problem, that a pool reset or destruction would invalidate all the command buffer C++ objects, so there's probably no way to do it cleanly/simply.

Am I missing any points in my argument in favor of atomic properties?

I read this question (and several others):
What's the difference between the atomic and nonatomic attributes?
I fully understand (at least I hope so :-D ) how the atomic/nonatomic specifier for properties works:
Atomic guarantees that a "read" operation won't be interrupted by a "write" operation.
Nonatomic doesn't guarantee this.
Neither atomic nor nonatomic solve race conditions, where one thread is reading and two threads are writing. There is no way to predict what result the read operation will return. This needs to be solved by additional synchronization.
Neither atomic nor nonatomic guarantee overall data integrity; one thread could set one property while another thread sets a second property in a state which is inconsistent with the state of the first property. This also needs to be solved by additional synchronization.
What make my eyebrow raise is that people are divided into two camps:
Pro atomic: It make sense to use nonatomic only for performance optimization.
And if you are not optimizing, then you should always use atomic because of point 1. This way you won't get some complete crap when reading this property in a multi-threaded application. And sure, if you care about points 2 and 3, you need to add more synchronizaion on top of it.
Against atomic: It doesn't make sense to use atomic at all.
Since atomic doesn't solve all the problems in a multi-threaded application, it doesn't make sense to use it at all, since you will need to add more synchronization code on top of it anyway. It will just make things slower.
I am leaning to the pro-atomic camp, but I want to do a sanity check that I didn't miss anything.
Lacking a very specific question (though still a good question), I'll answer with personal experience, FWIW.
In general, concurrency design is hard. With modern conveniences like GCD and ARC, the tools for implementing concurrent systems have certainly improved. However, the architecture of concurrency is still very hard.
And, generally, the hard part has nothing to do with individual properties; individual getters and setters. Concurrency is something that is implemented at a higher level.
The current state of the art is concurrency in isolation. That is, the parts of your app that are running concurrently are doing so using isolated graphs of objects that have extremely minimal connection to the rest of your application (typically, the "connections" are via callbacks that bundle up a bit of state and toss it over to some other queue, often the main queue for updating the UI).
By keeping the concurrency surface area -- the # of entry points into your code that must be concurrency safe -- to an absolute minimum, you reduce both complexity and the amount of time you'll spend debugging really weird, oft irreproducible, concurrency problems (that'll eat at your sanity).
Given all that, the value of atomic properties is pretty minimal. Sure, they can be useful along what should be the very very small set of interfaces -- of API -- that might be banged upon from multiple threads, but that is about it.
If you have objects for which the accessors are being banged on rapidly, making them atomic can be a significant performance hit, but premature optimization is the devil's fingers at play.

Slow Deletion of Handle Object in MATLAB

I used MATLAB to write a simulation engine for the simulation of product flows in a production environment. I inherited all used class from handle and used these handles (quite excessively, I guess) to link between e.g. products and work systems, orders, etc.
Now, to run multiple instances of my model, I create a simulation object that contains all other objects and their relations, run the model and free the simulation variable.
Creating and running the model takes ~50 seconds (this including the generation of all objects, their relations and of course the calculation over the course of the simulation run). Freeing the variable before the next run, currently takes ~3-4 minutes!
I tried clear, delete and plain overwriting of the old simulation object, without notifying significant differences in performance.
Is there a way to improve the performance without rewriting the code?
It is hard to say anything particular about your code without seeing it, or at least some high level design.
A short advice before optimizing the OO aspects :
Are you sure that the bottleneck is in the objects creation? Verify it with the profiler.
If the OO is indeed the bottleneck, here are some guesses:
You have used circular references. Matlab does not use garbage collector, but rather a smart reference counting mechanism, which can be quite slow in this case. Change the references between the objects to be tree-like instead.
You have created an enormous amount of objects. Matlab has a significant overhead for each object, much more than the traditional languages (c++, java). Re-design the system to have a smaller amount of objects.
Do you happen to use cell arrays to store other handle objects from within a handle object? This can cause serious slowdowns prior to Matlab R2011A. See http://www.mathworks.com/support/solutions/en/data/1-6VVMS0/index.html?product=ML
A workaround is to use a temp local variable to manipulate cell array, then assign this tmp variable back to your handle object property. I saw ~ 100X improvement in performance after doing this in one case.

Obj-C iPhone architecture regarding instantiation of classes

I have a noob question regarding how to architect my iPhone app.
I have two main components, a map and an audio player, both self contained classes (MapController and AudioController). I read this great article and am trying to apply it to my app:
http://blog.shinetech.com/2011/06/14/delegation-notification-and-observation/
I have a third class, WebServices, that handles uploading POST data to my server as well as making queries to external API's.
My question:
Do I import header files, and create a new instance of WebServices in both the map controller, and the audio player? And then each controller can reference it's own WebServices for queries?
Or, should I create one WebServices instance on the RootController, and then pass this to the map and audio controllers on init?
In particular, I'm interested in which approach consumes memory more efficiently. Or if it doesn't matter at all.
Thanks!
Consider creating a singleton for your WebServices class and use a shared instance. It seems this is the design pattern you require here.
With regard to the efficiency part of the question, the second option is more efficient purely because you're not storing as much data in RAM. The difference however, assuming your classes are not storing too much internally, is likely to be unnoticable.
#interface WebServices: NSObject
{
}
+ (WebServices*)sharedInstance;
#end
static WebServices *sharedInstance;
#implementation WebSerivces
+ (WebServices*)sharedInstance
{
#synchronized(self)
{
if (!sharedInstance)
sharedInstance = [[WebServices alloc] init];
return sharedInstance;
}
}
#end
Point Zero
Do not resort to singletons, and especially not in multithreaded contexts. You can search this site if you need explanations (hint: the reasons against them are quite similar in OO languages similar to ObjC. e.g. C++, Java, C#...).
To share, or not?
It ultimately depends on the behavior your design needs, but I favor Option #1 as the default.
If you need multiple instances, do not be afraid create multiple (unless you have a really, really, really good reason not to).
If they are heavy to construct, consider what immutable data they can share.
Favor smaller, specialized objects (WebServices may be huge). Smaller objects allow more flexibility and reduce the chances that your program will compete for resources.
What will you gain in this case by sharing? Often, this just increases runtime and implementation complexity... but there can be gains. Without identifying those gains, you should stick to multiple instances to reduce complexity.
Multiple instances allow your program to perform multiple asynchronous requests, based on what they need. Multiple async requests is fine (but it will obviously be slow if you have too many active requests).
Most programs are more logical and easy to maintain when the objects they depend on allow multiple instances (they do not enforce sharing).
Assume multiple instances (1) until have identified a problem. Even after the problem is identified, you can usually avoid sharing.
To determine which consumes less memory requires much more information. If the objects you are considering sharing are not huge (very unlikely), they themselves are not problems wrt memory consumption.
Either approach proposed can result in more memory consumption - it depends entirely on the context and execution. We'll need some more information to answer this specifically.
This is not to imply that sharing is bad, but sharing usually complicates things because you need to design objects to behave predictably when shared (especially since you are operating in a multithreaded context).
Ask yourself:
What significant benefit(s) would sharing WebServices provide?
How would sharing WebServices improve your program and reduce complexity?
It's likely the case that the WebServices map stuff does not interact with the audio stuff. In that case, WebServices may be a good candidate for a base class, where the audio and map services are separate subclasses (assuming the WebServices class is already complex).
I would suggest an NSOperationQueue with a number of NSOperations, instead of a WebServices singleton.
Look at this sweet example from Apple where they do this for concurrent image-downloading. It's a Mac example, but all the classes shown are available on iOS. They use NSInvocationOperation, but you can use an NSOperation subclass, or an NSBlockOperation for cleaner code.
You can add these from anywhere to a single queue, by calling [NSOperationQueue mainQueue]; to fetch the main queue singleton. The singleton's job, in this case, is only to manage the queue (so not too many downloads happen at once), not to do any data processing (that's all handled by your NSOperation).

objective-c block vs selector. which one is better?

In objective-c when you are implementing a method that is going to perform a repetitive operations, for example, you need to choice in between the several options that the language brings you:
#interface FancyMutableCollection : NSObject { }
-(void)sortUsingSelector:(SEL)comparator;
// or ...
-(void)sortUsingComparator:(NSComparator)cmptr;
#end
I was wondering which one is better?
Objective-c provides many options: selectors, blocks, pointers to functions, instances of a class that conforms a protocol, etc.
Some times the choice is clear, because only one method suits your needs, but what about the rest? I don't expect this to be just a matter of fashion.
Are there any rules to know when to use selectors and when to use blocks?
The main difference I can think of is that with blocks, they act like closures so they capture all of the variables in the scope around them. This is good for when you already have the variables there and don't want to create an instance variable just to hold that variable temporarily so that the action selector can access it when it is run.
With relation to collections, blocks have the added ability to be run concurrently if there are multiple cores in the system. Currently in the iPhone there isn't, but the iPad 2 does have it and it is probable that future iPhone models will have multiple cores. Using blocks, in this case, would allow your app to scale automatically in the future.
In some cases, blocks are just easier to read as well because the callback code is right next to the code that's calling it back. This is not always the case of course, but when sometimes it does simply make the code easier to read.
Sorry to refer you to the documentation, but for a more comprehensive overview of the pros/cons of blocks, take a look at this page.
As Apple puts it:
Blocks represent typically small, self-contained pieces of code. As such, they’re particularly useful as a means of encapsulating units of work that may be executed concurrently, or over items in a collection, or as a callback when another operation has finished.
Blocks are a useful alternative to traditional callback functions for two main reasons:
They allow you to write code at the point of invocation that is executed later in the context of the method implementation.
Blocks are thus often parameters of framework methods.
They allow access to local variables.
Rather than using callbacks requiring a data structure that embodies all the contextual information you need to perform an operation, you simply access local variables directly.
On this page
The one that's better is whichever one works better in the situation at hand. If your objects all implement a comparison selector that supports the ordering you want, use that. If not, a block will probably be easier.