What is the performance difference between blocks and callbacks?

One of the things that block objects, introduced in Snow Leopard, are good for is situations that would previously have been handled with callbacks. The syntax is much cleaner for passing context around. However, I haven't seen any information on the performance implications of using blocks in this manner. What, if any, performance pitfalls should I look out for when using blocks, particularly as a replacement for a C-style callback?

The blocks runtime looks pretty tight. Block descriptors and functions are statically allocated, so they could enlarge the working set of your program, but you only "pay" in storage for the variables you reference from the enclosing scope. Non-global block literals and __block variables are constructed on the stack without any branching, so you're unlikely to run into much of a slowdown from that. Calling a block is just result = (*b->__FuncPtr)(b, arg1, arg2); this is comparable to result = (*callback_func_ptr)(callback_ctx, arg1, arg2).
If you think of blocks as "callbacks that write their own context structure and handle the ugly packing, memory management, casting, and dereferencing for you," I think you'll realize that blocks are a small cost at runtime and a huge savings in programming time.
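To make the comparison concrete, here is a rough sketch of the two styles; the functions and names are invented purely for illustration:

#import <Foundation/Foundation.h>

// C-style callback: the caller packs its own context and the callee hands it back.
typedef void (*completion_cb)(void *ctx, int result);

static void run_with_callback(completion_cb cb, void *ctx) {   // hypothetical API
    cb(ctx, 42);
}

static void on_done(void *ctx, int result) {
    int *total = (int *)ctx;    // manual unpacking and casting
    *total += result;
}

// Block-based equivalent: the block captures `total` itself, so there is no
// context struct, no casting, and no manual lifetime bookkeeping.
static void run_with_block(void (^completion)(int result)) {    // hypothetical API
    completion(42);
}

int main(void) {
    int total_c = 0;
    run_with_callback(on_done, &total_c);

    __block int total_b = 0;
    run_with_block(^(int result) { total_b += result; });

    NSLog(@"callback total: %d, block total: %d", total_c, total_b);
    return 0;
}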

You might want to check out this blog post and this one. Blocks are implemented as Objective-C objects, except they can be put on the stack, so they don't necessarily have to be malloc'd (if you retain a reference to a block, it will be copied onto the heap, though). They will thus probably perform better than most Objective-C objects, but will have a slight performance hit compared to a simple callback--I'd guess it shouldn't be a problem 95% of the time.

Related

How does Apple's Objective-C runtime do multithreaded reference counting without degraded performance?

So I was reading this article about an attempt to remove the global interpreter lock (GIL) from the Python interpreter to improve multithreading performance and saw something interesting.
It turns out that one of the places where removing the GIL actually made things worse was in memory management:
With free-threading, reference counting operations lose their thread-safety. Thus, the patch introduces a global reference-counting mutex lock along with atomic operations for updating the count. On Unix, locking is implemented using a standard pthread_mutex_t lock (wrapped inside a PyMutex structure) and the following functions...
...On Unix, it must be emphasized that simple reference count manipulation has been replaced by no fewer than three function calls, plus the overhead of the actual locking. It's far more expensive...
...Clearly fine-grained locking of reference counts is the major culprit behind the poor performance, but even if you take away the locking, the reference counting performance is still very sensitive to any kind of extra overhead (e.g., function call, etc.). In this case, the performance is still about twice as slow as Python with the GIL.
and later:
Reference counting is a really lousy memory-management technique for free-threading. This was already widely known, but the performance numbers put a more concrete figure on it. This will definitely be the most challenging issue for anyone attempting a GIL removal patch.
So the question is, if reference counting is so lousy for threading, how does Objective-C do it? I've written multithreaded Objective-C apps, and haven't noticed much of an overhead for memory management. Are they doing something else? Like some kind of per object lock instead of a global one? Is Objective-C's reference counting actually technically unsafe with threads? I'm not enough of a concurrency expert to really speculate much, but I'd be interested in knowing.
There is overhead and it can be significant in rare cases (like, for example, micro-benchmarks ;), regardless of the optimizations that are in place (of which, there are many). The normal case, though, is optimized for un-contended manipulation of the reference count for the object.
So the question is, if reference counting is so lousy for threading, how does Objective-C do it?
There are multiple locks in play and, effectively, a retain/release on any given object selects a random lock (but always the same lock) for that object. Thus, reducing lock contention while not requiring one lock per object.
(And what Catfish_man said; some classes will implement their own reference counting scheme to use class-specific locking primitives to avoid contention and/or optimize for their specific needs.)
The implementation details are more complex.
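To make the idea concrete, here is a toy sketch of lock striping for externally stored retain counts. This is not Apple's implementation; the table size, hash, and every name in it are invented:

#import <Foundation/Foundation.h>
#import <libkern/OSAtomic.h>
#include <stdint.h>

// A small fixed table of locks; each object address hashes to one slot, so
// unrelated objects rarely contend on the same lock.
#define RC_LOCK_COUNT 64
static OSSpinLock rc_locks[RC_LOCK_COUNT];   // zero-initialized == unlocked

static OSSpinLock *lock_for_object(id obj) {
    uintptr_t addr = (uintptr_t)obj;
    return &rc_locks[(addr >> 4) % RC_LOCK_COUNT];  // same object always maps to the same lock
}

static void toy_retain(id obj, NSUInteger *externalCount) {
    OSSpinLock *lock = lock_for_object(obj);
    OSSpinLockLock(lock);        // contention only with objects that hash to this slot
    (*externalCount)++;
    OSSpinLockUnlock(lock);
}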
Is Objective-C's reference counting actually technically unsafe with threads?
Nope -- it is safe in regards to threads.
In reality, typical code will call retain and release quite infrequently, compared to other operations. Thus, even if there were significant overhead on those code paths, it would be amortized across all the other operations in the app (where, say, pushing pixels to the screen is really expensive, by comparison).
If an object is shared across threads (bad idea, in general), then the locking overhead protecting the data access and manipulation will generally be vastly greater than the retain/release overhead because of the infrequency of retaining/releasing.
As far as Python's GIL overhead is concerned, I would bet that it has more to do with how often the reference count is incremented and decremented as a part of normal interpreter operations.
In addition to what bbum said, a lot of the most frequently thrown around objects in Cocoa override the normal reference counting mechanisms and store a refcount inline in the object, which they manipulate with atomic add and subtract instructions rather than locking.
(edit from the future: Objective-C now automatically does this optimization on modern Apple platforms, by mixing the refcount in with the 'isa' pointer)
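For illustration, here is a hedged sketch of that inline-count idea (manual retain/release only, since you cannot override these methods under ARC). The class is invented and this is not the frameworks' actual code:

#import <Foundation/Foundation.h>
#import <libkern/OSAtomic.h>

@interface InlineCountedObject : NSObject {
    volatile int32_t _rc;   // inline reference count
}
@end

@implementation InlineCountedObject
- (id)init {
    if ((self = [super init])) _rc = 1;   // the implicit reference from +alloc
    return self;
}
- (id)retain {
    OSAtomicIncrement32Barrier(&_rc);     // lock-free increment
    return self;
}
- (oneway void)release {
    if (OSAtomicDecrement32Barrier(&_rc) == 0) {
        [self dealloc];                   // last reference dropped
    }
}
- (NSUInteger)retainCount {
    return (NSUInteger)_rc;
}
@end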

Does coding many small methods have performance ramifications for Objective-C?

I come from Ruby, and have sort of adopted the methodology of single responsibility principle, encapsulation, loose coupling, small testable methods, etc., so my code tends to jump from method to method frequently. That's the way I am used to working in the Ruby world. I argue that this is the best way to work, mainly for BDD, as once you start having "large" methods that do multiple things, it becomes very difficult to test.
I am wondering if there are any drawbacks to this approach as far as noticeable differences in performance.
Yes, there will always be some performance impact unless you have a compiler that inlines things, and with dynamic method lookup (as in Ruby and Objective-C) you can't inline, so there will be some impact. However, it really depends on the language and the calling conventions. For a method call in Objective-C, you know that the overhead will be that of using the C calling convention once (to call objc_msgSend), then a method lookup, and then some sort of jump (most likely also using the C calling convention, but it could be anything). You have to remember, though, that unless you're writing C and assembly, it's almost impossible to see any difference.
It is true that there is some performance impact. However, the performance of most systems is dominated by I/O, and so method dispatch overhead is a small part of the performance picture. If you are doing an app where method dispatch overhead is significant, you might look into coding in a language with more dispatch options.
While not necessarily Objective-C specific, too many method calls that are not inlined by the compiler (in non-interpreted, non-dynamic-dispatch languages/runtimes) carry a performance penalty, since every call to a new function requires pushing a return address onto the stack as well as setting up a new stack frame that must be cleaned up when the callee returns to the caller. Additionally, functions that take variables by reference can incur performance penalties, because the call is opaque to the compiler and removes its ability to optimize: the compiler cannot reorder reads and writes across a function call to memory addresses that are passed as modifiable arguments (i.e., pointers to non-const objects), since there could be side effects on those addresses that would create hazards if an operation were reordered.
I would think, though, that in the end you would only really notice these performance losses in a tight CPU-bound loop. For instance, any function call that makes an OS syscall could cause your program to lose its time slice in the OS scheduler or be preempted, which takes exorbitantly longer than the function call itself.
Method calls in Objective-C are relatively expensive (probably at least 10x slower than, say, C++ or Java). (In fact, I don't know if any standard Objective-C method calls are, or can be, inlined, due to "duck typing", et al.)
But in most cases the performance of Objective-C is dominated by UI operations (and IO operations, and network operations), and the efficiency of run-of-the-mill code is inconsequential. Only if you were doing some intense computation would call performance be of concern.
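If dispatch overhead ever does show up in a profile, the conventional workaround is to cache the method's IMP and call it as a plain C function inside the hot loop. A hedged sketch, assuming the array is homogeneous so every element shares the same -length implementation:

#import <Foundation/Foundation.h>

static NSUInteger totalLength(NSArray *strings) {
    if ([strings count] == 0) return 0;

    SEL sel = @selector(length);
    typedef NSUInteger (*LengthIMP)(id, SEL);
    // Cache the implementation once, from the first element.
    LengthIMP lengthIMP = (LengthIMP)[[strings objectAtIndex:0] methodForSelector:sel];

    NSUInteger total = 0;
    for (NSString *s in strings) {
        total += lengthIMP(s, sel);   // direct call, no per-iteration objc_msgSend
    }
    return total;
}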

How does the new automatic reference counting mechanism work?

Can someone briefly explain to me how ARC works? I know it's different from Garbage Collection, but I was just wondering exactly how it worked.
Also, if ARC does what GC does without hindering performance, then why does Java use GC? Why doesn't it use ARC as well?
Every new developer who comes to Objective-C has to learn the rigid rules of when to retain, release, and autorelease objects. These rules even specify naming conventions that imply the retain count of objects returned from methods. Memory management in Objective-C becomes second nature once you take these rules to heart and apply them consistently, but even the most experienced Cocoa developers slip up from time to time.
With the Clang Static Analyzer, the LLVM developers realized that these rules were reliable enough that they could build a tool to point out memory leaks and overreleases within the paths that your code takes.
Automatic reference counting (ARC) is the next logical step. If the compiler can recognize where you should be retaining and releasing objects, why not have it insert that code for you? Rigid, repetitive tasks are what compilers and their brethren are great at. Humans forget things and make mistakes, but computers are much more consistent.
However, this doesn't completely free you from worrying about memory management on these platforms. I describe the primary issue to watch out for (retain cycles) in my answer here, which may require a little thought on your part to mark weak pointers. However, that's minor when compared to what you're gaining in ARC.
When compared to manual memory management and garbage collection, ARC gives you the best of both worlds by cutting out the need to write retain / release code, yet not having the halting and sawtooth memory profiles seen in a garbage collected environment. About the only advantages garbage collection has over this are its ability to deal with retain cycles and the fact that atomic property assignments are inexpensive (as discussed here). I know I'm replacing all of my existing Mac GC code with ARC implementations.
As to whether this could be extended to other languages, it seems geared around the reference counting system in Objective-C. It might be difficult to apply this to Java or other languages, but I don't know enough about the low-level compiler details to make a definitive statement there. Given that Apple is the one pushing this effort in LLVM, Objective-C will come first unless another party commits significant resources of their own to this.
The unveiling of this shocked developers at WWDC, so people weren't aware that something like this could be done. It may appear on other platforms over time, but for now it's exclusive to LLVM and Objective-C.
ARC is just plain old retain/release (MRC) with the compiler figuring out when to call retain/release. It will tend to have higher performance, lower peak memory use, and more predictable performance than a GC system.
On the other hand some types of data structure are not possible with ARC (or MRC), while GC can handle them.
As an example, suppose you have a class named Node, and each Node has an NSArray of children and a single reference to its parent. That structure "just works" with GC. With ARC (and manual reference counting as well) you have a problem: any given node is referenced by its children and also by its parent.
Like:
A -> [B1, B2, B3]
B1 -> A, B2 -> A, B3 -> A
All is fine while you are using A (say via a local variable).
When you are done with it (and B1/B2/B3), a GC system will eventually decide to look at everything it can find starting from the stack and CPU registers. It will never find A,B1,B2,B3 so it will finalize them and recycle the memory into other objects.
When you use ARC or MRC and finish with A, it has a refcount of 3 (B1, B2, and B3 all reference it), and B1/B2/B3 each have a reference count of 1 (A's NSArray holds one reference to each). So all of those objects remain live even though nothing can ever use them.
The common solution is to decide that one of those references needs to be weak (i.e., not contribute to the reference count). That works for some usage patterns, for example if you reference B1/B2/B3 only via A. However, in other patterns it fails: for example, if you sometimes hold onto just B1 and expect to climb back up via the parent pointer to find A. With a weak reference, if you only hold onto B1, A can (and normally will) evaporate, taking B2 and B3 with it.
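A minimal sketch of that structure with the conventional fix, under ARC; the class and property names are invented, and the caveat above still applies (if the only thing you hold is B1, its parent A can be deallocated out from under you):

@interface Node : NSObject
@property (nonatomic, strong) NSMutableArray *children;  // A -> [B1, B2, B3], owning
@property (nonatomic, weak)   Node *parent;              // B -> A, non-owning, zeroing
@end

@implementation Node
- (id)init {
    if ((self = [super init])) {
        _children = [NSMutableArray array];
    }
    return self;
}
- (void)addChild:(Node *)child {
    child.parent = self;              // weak assignment, does not retain self
    [self.children addObject:child];  // strong reference to the child
}
@end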
Sometimes this isn't an issue, but some very useful and natural ways of working with complex structures of data are very difficult to use with ARC/MRC.
So ARC targets the same sort of problems GC targets. However, ARC works on a more limited set of usage patterns than GC, so if you took a GC language (like Java) and grafted something like ARC onto it, some programs wouldn't work any more (or at least would generate tons of abandoned memory, and might cause serious swapping issues or run out of memory or swap space).
You can also say ARC puts a bigger priority on performance (or maybe predictability) while GC puts a bigger priority on being a generic solution. As a result GC has less predictable CPU/memory demands, and a lower performance (normally) than ARC, but can handle any usage pattern. ARC will work much better for many many common usage patterns, but for a few (valid!) usage patterns it will fall over and die.
Magic
But more specifically, ARC works by doing exactly what you would do with your code (with certain minor differences). ARC is a compile-time technology, unlike GC, which works at runtime and impacts your performance negatively. ARC tracks the references to objects for you and synthesizes the retain/release/autorelease calls according to the normal rules. Because of this, ARC can also release things as soon as they are no longer needed, rather than throwing them into an autorelease pool purely for convention's sake.
Some other improvements include zeroing weak references, automatic copying of blocks to the heap, speedups across the board (6x for autorelease pools!).
More detailed discussion about how all this works is found in the LLVM Docs on ARC.
It differs greatly from garbage collection. Have you seen the warnings that tell you that you may be leaking objects on particular lines? Those statements even tell you on what line you allocated the object. ARC takes this a step further and can insert retain/release statements at the proper locations, better than most programmers, almost 100% of the time. Occasionally there are some weird instances of retained objects that you need to help it out with.
This is explained very well in Apple's developer documentation; read "How ARC Works":
To make sure that instances don’t disappear while they are still needed, ARC tracks how many properties, constants, and variables are currently referring to each class instance. ARC will not deallocate an instance as long as at least one active reference to that instance still exists.
To understand the difference between garbage collection and ARC, read this.
ARC is a compiler feature that provides automatic memory management of objects.
Instead of you having to remember when to use retain, release, and autorelease, ARC evaluates the lifetime requirements of your objects and automatically inserts appropriate memory management calls for you at compile time. The compiler also generates appropriate dealloc methods for you.
The compiler inserts the necessary retain/release calls at compile time, but those calls are executed at runtime, just like any other code.
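As a rough illustration of what "inserting the calls for you" means, here is a simple setter written under ARC, with the manual retain/release pattern it roughly corresponds to shown in comments. The class is invented, and the real compiler output uses runtime entry points such as objc_storeStrong rather than literal retain/release messages:

@interface Person : NSObject {
    NSString *_name;
}
- (void)setName:(NSString *)name;
@end

@implementation Person
- (void)setName:(NSString *)name {
    // Under ARC you just write the assignment:
    _name = name;
    // ...and the compiler emits, roughly, the manual pattern you used to write:
    //     if (_name != name) {
    //         [name retain];     // take ownership of the new value
    //         [_name release];   // relinquish ownership of the old value
    //         _name = name;
    //     }
}
@end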
If you're new to iOS development and don't have prior experience with Objective-C, refer to Apple's Advanced Memory Management Programming Guide for a better understanding of memory management.

Calling -retainCount Considered Harmful

Or, Why I Didn't Use retainCount On My Summer Vacation
This post is intended to solicit detailed write-ups about the whys and wherefores of that infamous method, retainCount, in order to consolidate the relevant information floating around SO.*
The basics: What are the official reasons to not use retainCount? Is there ever any situation at all when it might be useful? What should be done instead?** Feel free to editorialize.
Historical/explanatory: Why does Apple provide this method in the NSObject protocol if it's not intended to be used? Does Apple's code rely on retainCount for some purpose? If so, why isn't it hidden away somewhere?
For deeper understanding: What are the reasons that an object may have a different retain count than would be assumed from user code? Can you give any examples*** of standard procedures that framework code might use which cause such a difference? Are there any known cases where the retain count is always different than what a new user might expect?
Anything else you think is worth mentioning about retainCount?
*
Coders who are new to Objective-C and Cocoa often grapple with, or at least misunderstand, the reference-counting scheme. Tutorial explanations may mention retain counts, which (according to these explanations) go up by one when you call retain, alloc, copy, etc., and down by one when you call release (and at some point in the future when you call autorelease).
A budding Cocoa hacker, Kris, could thus quite easily get the idea that checking an object's retain count would be useful in resolving some memory issues, and, lo and behold, there's a method available on every object called retainCount! Kris calls retainCount on a couple of objects, and this one is too high, and that one's too low, and what the heck is going on?! So Kris makes a post on SO, "What's wrong with my memory management?" and then a swarm of <bold>, <large> letters descend saying "Don't do that! You can't rely on the results.", which is well and good, but our intrepid coder may want a deeper explanation.
I'm hoping that this will turn into an FAQ, a page of good informational essays/lectures from any of our experts who are inclined to write one, that new Cocoa-heads can be pointed to when they wonder about retainCount.
** I don't want to make this too broad, but specific tips from experience or the docs on verifying/debugging retain and release pairings may be appropriate here.
***In dummy code; obviously the general public don't have access to Apple's actual code.
The basics: What are the official reasons to not use retainCount?
Autorelease management is the most obvious -- you have no way to be sure how many of the references represented by the retainCount are in a local or external (on a secondary thread, or in another thread's local pool) autorelease pool.
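A tiny manual-retain/release demonstration of that point; the pending release sitting in the pool is simply invisible to retainCount:

NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

NSObject *obj = [[NSObject alloc] init];
NSLog(@"%lu", (unsigned long)[obj retainCount]);   // 1

[obj retain];
[obj autorelease];
NSLog(@"%lu", (unsigned long)[obj retainCount]);   // still 2: the deferred release
                                                   // waiting in the pool is not counted

[pool drain];   // now the deferred release actually happens
[obj release];  // final release; obj is deallocated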
Also, some people have trouble with leaks and, at a higher level, with how reference counting and autorelease pools work at a fundamental level. They will write a program without (much) regard to proper reference counting, or without learning reference counting properly. This makes their program very difficult to debug, test, and improve, and rectifying it is very time consuming.
The reason for discouraging its use (at the client level) is twofold:
The value may vary for so many reasons. Threading alone is reason enough to never trust it.
You still have to implement correct reference counting. retainCount will never save you from imbalanced reference counting.
Is there ever any situation at all when it might be useful?
You could in fact use it in a meaningful way if you wrote your own allocators or reference counting scheme, or if your object lived on one thread and you had access to any and all autorelease pools it could exist in. This also implies you would not share it with any external APIs. The easy way to simulate this is to create a program with one thread, zero autorelease pools, and do your reference counting the 'normal' way. It's unlikely that you'll ever need to solve this problem/write this program for anything other than "academic" reasons.
As a debugging aid: you could use it to verify that the retain count is not unusually high. If you take this approach, be mindful of the implementation variances (some are cited in this post), and don't rely on it. Don't even commit the tests to your SCM repository.
This may be a useful diagnostic in extremely rare circumstances. It can be used to detect:
Over-retaining: An allocation with a positive imbalance in retain count would not show up as a leak if the allocation is reachable by your program.
An object which is referenced by many other objects: One illustration of this problem is a (mutable) shared resource or collection which operates in a multithreaded context - frequent access or changes to this resource/collection can introduce a significant bottleneck in your program's execution.
Autorelease levels: Autoreleasing, autorelease pools, and retain/autorelease cycles all come with a cost. If you need to minimize or reduce memory use and/or growth, you could use this approach to detect excessive cases.
From commentary with Bavarious (below): a high value may also indicate an invalidated allocation (a dealloc'd instance). This is entirely an implementation detail and, again, not usable in production code. Messaging such an allocation would result in an error when zombies are enabled.
What should be done instead?
If you're not responsible for returning the memory at self (that is, you did not write an allocator), leave it alone - it is useless.
You have to learn proper reference counting.
For a better understanding of release and autorelease usage, set up some breakpoints and understand how they are used, in what cases, etc. You'll still have to learn to use reference counting correctly, but this can aid your understanding of why it's useless.
Even simpler: use Instruments to track allocs and ref counts, then analyze the ref counting and callstacks of several objects in an active program.
Historical/explanatory: Why does Apple provide this method in the NSObject protocol if it's not intended to be used? Does Apple's code rely on retainCount for some purpose? If so, why isn't it hidden away somewhere?
We can assume that it is public for two primary reasons:
Reference counting proper in managed environments. It's fine for the allocators to use retainCount -- really. It's a very simple concept: when -[NSObject release] is called, the reference count (unless the method is overridden) is decremented, and the object can be deallocated once the count reaches 0 (by calling dealloc). This is all fine at the allocator level. Allocators and zones are (largely) abstracted, though, so the result is meaningless for ordinary clients. See the commentary with bbum (below) for details on why retainCount can never be 0 at the client level, on object deallocation, on deallocation sequences, and more.
To make it available to subclassers who want a custom behavior, and because the other reference counting methods are public. It may be handy in a few cases, but it's typically used for the wrong reasons (e.g. immortal singletons). If you need your own reference counting scheme, then this family may be worth overriding.
For deeper understanding: What are the reasons that an object may have a different retain count than would be assumed from user code? Can you give any examples*** of standard procedures that framework code might use which cause such a difference? Are there any known cases where the retain count is always different than what a new user might expect?
Again, custom reference counting schemes and immortal objects. NSCFString literals fall into the latter category:
NSLog(@"%qu", [@"MyString" retainCount]);
// Logs: 1152921504606846975
Anything else you think is worth mentioning about retainCount?
It's useless as a debugging aid. Learn to use leak and zombie analyses, and use them often -- even after you have a handle on reference counting.
Update: bbum has posted an article entitled retainCount is useless. The article contains a thorough discussion of why -retainCount isn’t useful in the vast majority of cases.
The general rule of thumb is if you're using this method, you better be damn sure you know what you're doing. If you are using it for debugging a memory leak you're doing it wrong, if you're doing it to see what is going on with an object, you're doing it wrong.
There is one case where I have used it and found it useful: a shared object cache where I wanted to flush an object when nothing had a reference to it anymore. In that situation I waited until retainCount was equal to 1, and then released it knowing that nothing else was holding onto it. This will obviously not work properly in garbage-collected environments, and there are better ways to do it. But this is still the only 'valid' use case I've seen for it, and it isn't something a lot of people will be doing.
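For concreteness, a sketch of that cache-flush pattern (manual retain/release era, with a hypothetical _cache dictionary); again, this is shown only to illustrate the described use, not to recommend it:

- (void)flushUnreferencedObjects {
    for (id key in [_cache allKeys]) {          // snapshot of keys, safe while mutating _cache
        id obj = [_cache objectForKey:key];
        if ([obj retainCount] == 1) {           // only the cache itself still holds it
            [_cache removeObjectForKey:key];    // the final release happens here
        }
    }
}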

What is the cost of using autorelease in Cocoa?

Most of Apple's documentation seems to avoid autoreleased objects, especially when creating GUI views, but I want to know what the cost of using autoreleased objects is.
UIScrollView *timeline = [[UIScrollView alloc] initWithFrame:CGRectMake(0, 20, 320, 34)];
[self addSubview:timeline];
[timeline release];
Ultimately should I use a strategy where everything is autoreleased and using retain/release should be the exception to the rule for specific cases? Or should I generally be using retain/release with autorelease being the exception for returned objects from convenience methods like [NSString stringWithEtc...] ?
There are two costs:
(Assuming you have an option to avoid autoreleased objects.) You effectively unnecessarily extend the lifetime of your objects. This can mean that your memory footprint grows -- unnecessarily. On a constrained platform, this can mean that your application is terminated if it exceeds a limit. Even if you don't exceed a limit, it may cause your system to start swapping, which is very inefficient.
The additional overhead of finding the current autorelease pool, adding the autoreleased object to it, and then releasing the object at the end (an extra method call). This may not be a large overhead, but it can add up.
Best practice on any platform is to try to avoid autorelease if you can.
To answer the questions:
Ultimately should I use a strategy where everything is autoreleased and using retain/release should be the exception to the rule for specific cases?
Quite the opposite.
Or should I generally be using retain/release with autorelease being the exception for returned objects from convenience methods like [NSString stringWithEtc...] ?
You should always use retain/release if you can -- in the case of NSString there is typically no need to use stringWithEtc methods as there are initWithEtc equivalents.
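For example (manual retain/release, with row as an illustrative variable):

// Convenience constructor: the returned string is autoreleased and lives
// until the current autorelease pool drains.
NSString *title = [NSString stringWithFormat:@"Row %d", row];

// alloc/init equivalent: you own the object and can release it the moment
// you are done, keeping its lifetime (and your memory footprint) tight.
NSString *title2 = [[NSString alloc] initWithFormat:@"Row %d", row];
// ... use title2 ...
[title2 release];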
See also this question.
I have to disagree with Jim Puls: I think that not using autorelease makes debugging more difficult, because you are more likely to find yourself accidentally leaking memory. Of course the Clang static analyzer can pick up some of these instances, but for me, the slight overhead cost of habitually using autorelease is far outweighed by my code being less likely to be buggy.
And then, only if I have a tight loop I need to optimize will I start looking at performance. Otherwise this is all just premature optimization, which is generally considered to be a bad thing.
I'm surprised nobody has mentioned this yet. The biggest reason to avoid autoreleased objects when you can has nothing to do with performance. Yes, all of the performance concerns mentioned here are absolutely valid, but the biggest downside to autorelease is that it makes debugging significantly more difficult.
If you have an overreleased object that's never autoreleased, it's trivially easy to track down. If you have a user-reported crash that happens intermittently with a backtrace somewhere south of NSPopAutoreleasePool, good luck...
I generally use autoreleased objects these days because they tend to result in simpler, easier to read code. You declare and initialize them, then let them drop out of scope. Mechanically they exist quite a bit longer, but from the perspective of the person writing the code it is equivalent to a stack-declared object in C++ being automatically destructed when the function returns and its frame is destroyed.
While there is an efficiency loss, in most cases it is not significant. The bigger issue is the more extant objects and later memory recovery can lead to a more fragmented address space. If that is an issue it usually is fairly simple to go in and switch to manual retain/release in a few hot methods and improve it.
As others have said, readability trumps performance in non-performance-sensitive code. There are a number of cases where using autoreleased objects leads to more memory fragmentation, but in any case where the object will outlive the pool it will not. In those cases the only price you are paying is the cost of finding the correct autorelease pool.
One benefit to using autorelease pools is that they are exception safe without using @try/@finally. Greg Parker ('Mr. Objective-C') has a great post explaining the details of this.
I tend to use autorelease a lot as its less code and makes it more readable, IMO. The downside, as others have pointed out, is that you extend the lifetime of objects, thus temporarily using more memory. In practice, I have yet to find overuse of autorelease to be a significant issue in any Mac app I've written. If high memory usage does seem to be an issue (that isn't caused by a genuine leak), I just add in more autorelease pools (after profiling to show me where I need them). But, in general, this is quite rare. As Mike Ash's post shows (Graham Lee linked to it), autorelease pools have very little overhead and are fast. There's almost zero cost to adding more autorelease pools.
Granted, this is all for Mac apps. In iPhone apps, where memory is more tight, you may want to be conservative in your use of autorelease. But as always, write readable code first, and then optimize later, by measuring where the slow/memory intensive parts are.
The costs are:
The time to locate the current thread's autorelease pool and add the object to it.
The memory occupied by the object until it is released at some later point.
If you want to be very conservative with your memory usage, you should avoid autorelease. However, it is a useful technique that can make code more readable. Obsessively using retain/release falls under the umbrella of "premature optimization."
If you are in Cocoa's main event handling thread (which you are most of the time), the autorelease pool is emptied when control returns to the event handler. If your method is short and doesn't loop over large amounts of data, using autorelease to defer deallocation to the end of the run loop is fine.
The time to be wary of autorelease is when you are in a loop. For example, you are iterating over a user's address book and perhaps loading an image file for each entry. If all of those image objects are autoreleased, they will accumulate in memory until you have visited the entire address book. If the address book is large enough, you may run out of memory. If you release the images as soon as you are done with them, within the loop, your app can recycle the memory.
If you can't avoid autoreleasing inside a loop (it's being done by code you didn't write and can't change), you can also manage an NSAutoreleasePool within the loop yourself if needed.
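A sketch of that per-iteration pool, in the pre-ARC style this thread dates from; people, loadImageForPerson:, and processImage: are hypothetical:

for (id person in people) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    UIImage *photo = [self loadImageForPerson:person];  // returns an autoreleased image
    [self processImage:photo];

    [pool drain];   // releases this iteration's autoreleased objects immediately
}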
So, be mindful of using autorelease inside loops (or methods that may be called from loops), but don't avoid it when it can make code more readable.
As I understand it, the main downside to using autorelease is that you don't know when the object will finally be released and destroyed. This could potentially cause your app to use a lot more memory than it needs to if you have a lot of autoreleased objects hanging around but not yet released.
Others have answered whether you should autorelease, but when you must autorelease, drain early and drain often: http://www.mikeash.com/?page=pyblog/autorelease-is-fast.html
I notice the code sample you provided is for the iPhone. Apple specifically recommends avoiding autoreleased objects for iPhone apps. I can't find the specific reasoning, but they were hammering this point at WWDC.
One side note to keep in mind: if you are spawning a new thread, you must set up a new autorelease pool on that thread before you do anything else. Even if you are not using autoreleased objects yourself, chances are that something in the Cocoa APIs is.
Old thread, but chipping in for the benefit of newer readers.
I use autorelease vs retain/release depending on the risk of autorelease bugs specific to an object and the size of the object. If I'm just adding some tiny UIImageViews, or a couple of UILabels to my view, autorelease keeps the code readable and manageable. And when the view is removed and dealloced, these subviews should get released soon enough.
If on the other hand we're talking about a UIWebView (high risk of autorelease bugs), or of course some data that needs to be persistent till the 'death' of the object, retain/release is the way to go.
Honestly, my projects have not gotten that big yet, where the additional 'staying-time' of autoreleased objects would create a memory problem. For complex apps, that concern is legitimate.
In any case, I don't think a one-size-fits-all approach would be right. You use whatever approach - or combination of approaches - is right for the project, keeping all the factors mentioned above in mind.