Does coding many small methods have performance ramifications for Objective-C?

I come from Ruby, and have sort of adopted the methodology of the single responsibility principle, encapsulation, loose coupling, small testable methods, etc., so my code tends to jump from method to method frequently. That's the way I am used to working in the Ruby world. I argue that this is the best way to work, mainly for BDD, since once you start having "large" methods that do multiple things, they become very difficult to test.
I am wondering if there are any drawbacks to this approach in terms of noticeable differences in performance.

Yes, there will always be some performance impact unless you have a compiler that inlines things, and with dynamic method lookup (as in Ruby and Obj-C) you can't inline, so there will be some impact. However, it really depends on the language and the calling conventions. For a call in Objective-C, you know the overhead will be the cost of using the C calling convention once (calling objc_msgSend), then a method lookup, and then some sort of jump (most likely also the C calling convention, but it could be anything). You have to remember, though, that unless you're writing C and assembly, it's almost impossible to see any difference.
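If a profiler ever does show message dispatch dominating a tight loop, a common Objective-C workaround is to cache the IMP once and call it as a plain C function pointer. A minimal sketch, using NSMutableArray's -addObject: purely as a stand-in for your own hot method:

#import <Foundation/Foundation.h>

void fillArray(NSMutableArray *array, NSNumber *value)
{
    // Look the implementation up once, outside the loop...
    SEL sel = @selector(addObject:);
    typedef void (*AddObjectIMP)(id, SEL, id);
    AddObjectIMP add = (AddObjectIMP)[array methodForSelector:sel];

    // ...then call it as a C function pointer, skipping the dynamic
    // lookup that objc_msgSend would otherwise repeat each iteration.
    for (NSUInteger i = 0; i < 1000000; i++) {
        add(array, sel, value);
    }
}

In ordinary code this is rarely worth the loss of readability; it only matters once the dispatch itself is the measured bottleneck.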

It is true that there is some performance impact. However, the performance of most systems is dominated by I/O, and so method dispatch overhead is a small part of the performance picture. If you are doing an app where method dispatch overhead is significant, you might look into coding in a language with more dispatch options.

While not necessarily Objective-C specific, too many method calls that are not inlined by the compiler (in non-interpreted, non-dynamic-dispatch languages/runtimes) will create performance penalties, since every call to a new function requires pushing a return address on the stack as well as setting up a new stack frame that must be cleaned up when the callee returns to the caller. Additionally, functions that take arguments by reference can incur performance penalties, since the function call is opaque to the compiler and removes its ability to make certain optimizations: the compiler cannot re-order read/write operations across a function call on memory addresses that are passed as modifiable arguments (i.e., pointers to non-const objects), since there could be side effects on those addresses that would create hazards if an operation were re-ordered.
I would think, though, that in the end you would only really notice these performance losses in a tight CPU-bound loop. For instance, any function call that makes an OS syscall could cause your program to lose its time slice in the OS scheduler or be pre-empted, which will take exorbitantly longer than the function call itself.
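To make the inlining point concrete, here is a plain-C sketch (hypothetical names): the static inline helper can be expanded into its caller, so no call or stack frame is set up, and the optimizer can still reason about the accumulator across iterations.

#include <stddef.h>

// Small helper the compiler is free to expand inline: no call
// overhead, and no opaque call that blocks re-ordering of the
// reads/writes on *sum.
static inline void addTo(double *sum, double value)
{
    *sum += value;
}

double total(const double *values, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        addTo(&sum, values[i]);   // typically compiled as if written in place
    }
    return sum;
}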

Method calls in Objective-C are relatively expensive (probably at least 10x slower than, say, C++ or Java). (In fact, I don't know if any standard Objective-C method calls are, or can be, inlined, due to "duck typing", et al.)
But in most cases the performance of Objective-C is dominated by UI operations (and IO operations, and network operations), and the efficiency of run-of-the-mill code is inconsequential. Only if you were doing some intense computation would call performance be of concern.

Related

Why does the Objective-C compiler need to know method signatures?

Why does the Objective-C compiler need to know at compile time the signature of the methods that will be invoked on objects, when it could defer that to runtime (i.e., dynamic binding)? For example, if I write [foo someMethod], why is it necessary for the compiler to know the signature of someMethod?
Because of calling conventions at a minimum (with ARC, there are more reasons, but calling conventions have always been a problem).
You may have been told that [foo someMethod] is converted into a function call:
objc_msgSend(foo, @selector(someMethod))
This, however, isn't exactly true. It may be converted to a number of different function calls depending on what it returns (and what's returned matters whether you use the result or not). For instance, if it returns an object or an integer, it'll use objc_msgSend, but if it returns a structure (on both ARM and Intel) it'll use objc_msgSend_stret, and if it returns a floating point on Intel (but not ARM I believe), it'll use objc_msgSend_fpret. This is all because on different processors the calling conventions (how you set up the stack and registers, and where the result is stored) are different depending on the result.
It also matters what the parameters are and how many there are (the number can be inferred from ObjC method names unless they're varargs... right, you have to deal with varargs, too). On some processors, the first several parameters may be put in registers, while later parameters may be put on the stack. If your function takes varargs, then the calling convention may be different still. All of that has to be known in order to compile the function call.
ObjC could be implemented as a more pure object model to avoid all of this (as other, more dynamic languages do), but it would be at the cost of performance (both space and time). ObjC can make method calls surprisingly cheap given the level of dynamic dispatch, and can easily work with pure C machine types, but the cost of that is that we have to let the compiler know more specifics about our method signatures.
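One way to see why the signature matters is to call objc_msgSend by hand: you must cast it to a function-pointer type with the method's exact return and argument types before the compiler can emit the right convention at the call site. A sketch (the helper function is made up; -length is just a convenient integer-returning method):

#import <Foundation/Foundation.h>
#import <objc/message.h>

// Hypothetical helper used only to illustrate the point.
NSUInteger lengthOf(id string)
{
    // Cast objc_msgSend to the method's exact signature so the
    // compiler sets up the correct calling convention here.
    typedef NSUInteger (*LengthMsgSend)(id, SEL);
    return ((LengthMsgSend)objc_msgSend)(string, @selector(length));
}
// A method returning a large struct would instead have to go through
// objc_msgSend_stret on Intel -- exactly the distinction described above.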
BTW, this can (and every so often does) lead to really horrible bugs. If you have a couple of methods:
- (MyPointObject *)point;
- (CGPoint)point;
Maybe they're defined in completely different files as methods on different classes. But if the compiler chooses the wrong definition (such as when you're sending a message to id), then the result you get back from -point can be complete garbage. This is a very, very hard bug to figure out when it happens (and I've had it happen to me).
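Sketched at the call site (fetchShape is hypothetical and could hand back an instance of either class):

id shape = fetchShape();   // hypothetical: could be either class above

// The compiler has to pick one of the two -point signatures to compile
// the next line (and will typically warn that multiple methods named
// 'point' were found). If it picks the object-returning one but the
// receiver actually returns a CGPoint, or vice versa, the value read
// back uses the wrong calling convention and is garbage.
CGPoint p = [shape point];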
For a bit more background, you may enjoy Greg Parker's article explaining objc_msgSend_stret and objc_msgSend_fpret. Mike Ash also has an excellent introduction to this topic. And if you want to go deep down this rabbit hole, you can see bbum's instruction-by-instruction investigation of objc_msgSend. It's outdated now, pre-ARC, and only covers x86_64 (since every architecture needs its own implementation), but is still highly educational and recommended.

How does Apple's Objective-C runtime do multithreaded reference counting without degraded performance?

So I was reading this article about an attempt to remove the global interpreter lock (GIL) from the Python interpreter to improve multithreading performance and saw something interesting.
It turns out that one of the places where removing the GIL actually made things worse was in memory management:
With free-threading, reference counting operations lose their thread-safety. Thus, the patch introduces a global reference-counting mutex lock along with atomic operations for updating the count. On Unix, locking is implemented using a standard pthread_mutex_t lock (wrapped inside a PyMutex structure) and the following functions...
...On Unix, it must be emphasized that simple reference count manipulation has been replaced by no fewer than three function calls, plus the overhead of the actual locking. It's far more expensive...
...Clearly fine-grained locking of reference counts is the major culprit behind the poor performance, but even if you take away the locking, the reference counting performance is still very sensitive to any kind of extra overhead (e.g., function call, etc.). In this case, the performance is still about twice as slow as Python with the GIL.
and later:
Reference counting is a really lousy memory-management technique for free-threading. This was already widely known, but the performance numbers put a more concrete figure on it. This will definitely be the most challenging issue for anyone attempting a GIL removal patch.
So the question is, if reference counting is so lousy for threading, how does Objective-C do it? I've written multithreaded Objective-C apps, and haven't noticed much of an overhead for memory management. Are they doing something else? Like some kind of per object lock instead of a global one? Is Objective-C's reference counting actually technically unsafe with threads? I'm not enough of a concurrency expert to really speculate much, but I'd be interested in knowing.
There is overhead and it can be significant in rare cases (like, for example, micro-benchmarks ;), regardless of the optimizations that are in place (of which, there are many). The normal case, though, is optimized for un-contended manipulation of the reference count for the object.
So the question is, if reference counting is so lousy for threading, how does Objective-C do it?
There are multiple locks in play and, effectively, a retain/release on any given object selects an essentially arbitrary lock (but always the same lock for that object). This reduces lock contention while not requiring one lock per object.
(And what Catfish_man said; some classes will implement their own reference counting scheme to use class-specific locking primitives to avoid contention and/or optimize for their specific needs.)
The implementation details are more complex.
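Not Apple's actual code, but a toy sketch of that "many locks, same lock per object" shape, with the lock chosen by hashing the object's address:

#include <pthread.h>
#include <stdint.h>

#define REFCOUNT_LOCK_COUNT 64   // small, fixed pool of locks

static pthread_mutex_t refcount_locks[REFCOUNT_LOCK_COUNT] =
    { [0 ... REFCOUNT_LOCK_COUNT - 1] = PTHREAD_MUTEX_INITIALIZER };  // clang/GCC range initializer

// Hash the address so a given object always maps to the same lock,
// while unrelated objects usually land on different ones.
static pthread_mutex_t *lock_for_object(const void *obj)
{
    uintptr_t h = (uintptr_t)obj >> 4;   // drop alignment bits
    return &refcount_locks[h % REFCOUNT_LOCK_COUNT];
}

void toy_retain(const void *obj, uintptr_t *refcount)
{
    pthread_mutex_t *lock = lock_for_object(obj);
    pthread_mutex_lock(lock);
    (*refcount)++;
    pthread_mutex_unlock(lock);
}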
Is Objective-C's reference counting actually technically unsafe with threads?
Nope -- it is safe with regard to threads.
In reality, typical code will call retain and release quite infrequently, compared to other operations. Thus, even if there were significant overhead on those code paths, it would be amortized across all the other operations in the app (where, say, pushing pixels to the screen is really expensive, by comparison).
If an object is shared across threads (bad idea, in general), then the locking overhead protecting the data access and manipulation will generally be vastly greater than the retain/release overhead because of the infrequency of retaining/releasing.
As far as Python's GIL overhead is concerned, I would bet that it has more to do with how often the reference count is incremented and decremented as a part of normal interpreter operations.
In addition to what bbum said, a lot of the most frequently thrown around objects in Cocoa override the normal reference counting mechanisms and store a refcount inline in the object, which they manipulate with atomic add and subtract instructions rather than locking.
(edit from the future: Objective-C now automatically does this optimization on modern Apple platforms, by mixing the refcount in with the 'isa' pointer)
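In spirit, the inline-refcount trick looks something like this (a sketch using C11 atomics, not the actual CoreFoundation or runtime code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Toy object header with the retain count stored inline.
typedef struct {
    uintptr_t   isa_bits;        // real objects pack much more in here
    atomic_uint retain_count;    // starts at 1 when the object is created
} ToyObject;

static void toy_retain(ToyObject *obj)
{
    // A single lock-free instruction; no mutex and no shared lock to contend on.
    atomic_fetch_add_explicit(&obj->retain_count, 1, memory_order_relaxed);
}

static bool toy_release(ToyObject *obj)
{
    // fetch_sub returns the previous value; if it was 1, this was the
    // last reference and the caller should deallocate the object.
    return atomic_fetch_sub_explicit(&obj->retain_count, 1,
                                     memory_order_acq_rel) == 1;
}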

@synchronized vs lock/unlock

I'm new to Objective-C. When should I use @synchronized, and when should I use lock/unlock? My background is mainly in Java. I know that in Java, obtaining explicit locks allows you to do more complex, extensive, and flexible operations (vis-à-vis release order, etc.), whereas the synchronized keyword forces locks to be used in a block-structured way and they also have to be released in the reverse order of how they were acquired. Does the same rationale hold in Objective-C?
Many would consider locking/unlocking in arbitrary order to be a bug, not a feature. Namely, it quite easily leads to deadlocks.
In any case, there is little difference between @synchronized(), -lock/-unlock, or any other mutex, save for the details of scope. They are expensive, fragile, error-prone, and, often, getting them absolutely correct leads to performance akin to a single-threaded solution anyway (but with the complexity of threads).
The new hotness is queues. Queues tend to be a lot lighter weight in that they don't generally require system calls for the "fast path" operations that account for most calls. They also tend to be much more expressive of intent.
Grand Central Dispatch or NSOperationQueue, specifically. The latter is built upon the former in current OS releases. GCD's APIs tend to be lower level, but they are very powerful and surprisingly simple. NSOperationQueue is higher level and allows for directly expressing dependencies and the like.
I would suggest starting with the Cocoa concurrency guide.
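For a flavor of the queue-based approach, here is a minimal sketch of a counter whose state is owned by a private serial queue (the class and ivars are made up):

#import <Foundation/Foundation.h>

@interface Counter : NSObject
- (void)increment;
- (NSInteger)value;
@end

@implementation Counter {
    dispatch_queue_t _queue;   // serial queue that "owns" _count
    NSInteger _count;
}

- (instancetype)init
{
    if ((self = [super init])) {
        _queue = dispatch_queue_create("com.example.counter", DISPATCH_QUEUE_SERIAL);
    }
    return self;
}

- (void)increment
{
    // All mutation funnels through one serial queue, so no explicit lock
    // is needed, and the intent ("this state belongs to this queue") is
    // clearer than a scattering of lock/unlock pairs.
    dispatch_async(_queue, ^{ self->_count += 1; });
}

- (NSInteger)value
{
    __block NSInteger result;
    dispatch_sync(_queue, ^{ result = self->_count; });
    return result;
}
@end

The usual caveat applies: calling dispatch_sync on the same serial queue from within one of its own blocks will deadlock.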
You are generally correct. @synchronized takes a lock that's "attached" to a given object (the object referred to in the @synchronized directive does not have to be a lock). As in Java, it defines a new block, and the lock is taken at the beginning and released at exit, whether by leaving the block normally or through an exception. So, as you guessed, the locks are released in reverse order from acquisition.
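Side by side, the two forms look roughly like this (a sketch; _items and _lock are hypothetical ivars):

// Block-structured: the lock associated with self is taken on entry and
// released on every exit path, including exceptions.
- (void)addItem:(id)item
{
    @synchronized (self) {
        [_items addObject:item];
    }
}

// Explicit NSLock: you pair lock/unlock yourself, typically with
// @try/@finally if exceptions are a possibility.
- (void)addItemUsingLock:(id)item
{
    [_lock lock];
    @try {
        [_items addObject:item];
    } @finally {
        [_lock unlock];
    }
}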

What is the performance difference between blocks and callbacks?

One of the things that block objects, introduced in Snow Leopard, are good for is situations that would previously have been handled with callbacks. The syntax is much cleaner for passing context around. However, I haven't seen any information on the performance implications of using blocks in this manner. What, if any, performance pitfalls should I look out for when using blocks, particularly as a replacement for a C-style callback?
The blocks runtime looks pretty tight. Block descriptors and functions are statically allocated, so they could enlarge the working set of your program, but you only "pay" in storage for the variables you reference from the enclosing scope. Non-global block literals and __block variables are constructed on the stack without any branching, so you're unlikely to run into much of a slowdown from that. Calling a block is just result = (*b->__FuncPtr)(b, arg1, arg2); this is comparable to result = (*callback_func_ptr)(callback_ctx, arg1, arg2).
If you think of blocks as "callbacks that write their own context structure and handle the ugly packing, memory management, casting, and dereferencing for you," I think you'll realize that blocks are a small cost at runtime and a huge savings in programming time.
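To make the comparison concrete, here is the same tiny API sketched both ways (all names are made up):

#import <Foundation/Foundation.h>

// C-style callback: the caller packs its own context and casts it back.
typedef void (*WorkCallback)(void *context, int result);

void doWorkWithCallback(WorkCallback callback, void *context)
{
    int result = 42;              // stand-in for real work
    callback(context, result);    // one indirect call
}

// Block-based: the "context" is whatever the block captures.
void doWorkWithBlock(void (^completion)(int result))
{
    int result = 42;              // stand-in for real work
    completion(result);           // also essentially one indirect call
}

The invocation costs about the same in both cases; what the block buys you is that capturing and managing the context happens automatically.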
You might want to check out this blog post and this one. Blocks are implemented as Objective-C objects, except they can be put on the stack, so they don't necessarily have to be malloc'd (if you retain a reference to a block, it will be copied onto the heap, though). They will thus probably perform better than most Objective-C objects, but will have a slight performance hit compared to a simple callback--I'd guess it shouldn't be a problem 95% of the time.

Objective-C two-phase construction of objects

I've been reading up on RAII and single vs. two-phase construction/initialization. For whatever reason, I was in the two-phase camp up until recently, because at some point I must have heard that it's bad to do error-prone operations in your constructor. However, I think I'm now convinced that single-phase is preferable, based on questions I've read on SO and other articles.
My question is: Why does Objective C use the two-phase approach (alloc/init) almost exclusively for non-convenience constructors? Is there any specific reason in the language, or was it just a design decision by the designers?
I have the enviable situation of working for the guy who wrote +alloc back in 1991, and I happened to ask him a very similar question a few months ago. The addition of +alloc was in order to provide +allocWithZone:, which was in order to add memory pools in NeXTSTEP 2.0 where memory was very tight (4M). This allowed the caller to control where objects were allocated in memory. It was a replacement for +new and its kin, which was (and continues to be, though no one uses it) a 1-phase constructor, based on Smalltalk's new. When Cocoa came over to Apple, the use of +alloc was already entrenched, and there was no going back to +new, even though actually picking your NSZone is seldom of significant value.
So it isn't a big 1-phase/2-phase philosophical question. In practice, Cocoa has single-phase construction, because you always do (and always should) call these back-to-back in a single expression without a test on the +alloc. You can think of it as an elaborate way of typing "new".
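Concretely (MyWidget and its initializer are hypothetical):

// The idiomatic spelling: two phases, written and used as one expression.
MyWidget *widget = [[MyWidget alloc] initWithName:@"gear"];

// +new still exists and is equivalent to [[MyWidget alloc] init],
// but only covers the no-argument initializer.
MyWidget *other = [MyWidget new];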
My experience is with C++, but one downside of C++'s one-phase initialization is the handling of inheritance/virtual functions. In C++, you can't call virtual functions during construction or destruction (well, you can, it just won't do what you expect). A two-phase init could partially solve this: from what I understand, the call would get routed to the right class, but the init might not have finished yet, so you could still end up operating on a half-initialized object. (I'm still in favor of one-phase.)