How can doing tasks in multiple threads be 100 times slower than doing sequentially on the main thread? - objective-c

I have this other question of mine where I have asked about converting a code from sequential to parallel processing using Grand Central Dispatch.
I will copy the question text to make things easy...
I have an array of NSNumbers that have to pass through 20 tests. If one test fails, then the array is invalid; if all tests pass, then the array is valid. I am trying to do it in such a way that as soon as the first failure happens, it stops doing the remaining tests. If a failure happens on the 3rd test, then the other tests should not be evaluated.
Every individual test returns YES when it fails and NO when it is ok.
I am trying to convert the code I have that is serial processing, to parallel processing with grand central dispatch, but I cannot wrap my head around it.
This is what I have.
First the definition of the tests to be done. This array is used to run the tests.
#define TESTS @[ \
@"averageNotOK:", \
@"numbersOverRange:", \
@"numbersUnderRange:",\
@"numbersForbidden:", \
// ... etc etc
@"numbersNotOnCurve:"]
- (BOOL) numbersPassedAllTests:(NSArray *)numbers {
NSInteger count = [TESTS count];
for (int i=0; i<count; i++) {
NSString *aMethodName = TESTS[i];
SEL selector = NSSelectorFromString(aMethodName);
BOOL failed = NO;
NSMethodSignature *signature = [[self class] instanceMethodSignatureForSelector:selector];
NSInvocation *invocation = [NSInvocation invocationWithMethodSignature:signature];
[invocation setSelector:selector];
[invocation setTarget:self];
[invocation setArgument:&numbers atIndex:2];
[invocation invoke];
[invocation getReturnValue:&failed];
if (failed) {
return NO;
}
}
return YES;
}
This works perfectly but performs the tests sequentially.
After working on the code with the help of a user, I got this code using Grand Central Dispatch:
- (BOOL) numbersPassedAllTests:(NSArray *)numbers {
volatile __block int32_t hasFailed = 0;
NSInteger count = [TESTS count];
__block NSArray *numb = [[NSArray alloc] initWithArray:numbers];
dispatch_apply(
count,
dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0),
^(size_t index)
{
// do no computation if somebody else already failed
if(hasFailed) {
return;
}
SEL selector = NSSelectorFromString(TESTS[index]);
BOOL failed = NO;
NSMethodSignature *signature = [[self class] instanceMethodSignatureForSelector:selector];
NSInvocation *invocation = [NSInvocation invocationWithMethodSignature:signature];
[invocation setSelector:selector];
[invocation setTarget:self];
[invocation setArgument:&numb atIndex:2];
[invocation invoke];
[invocation getReturnValue:&failed];
if(failed)
OSAtomicIncrement32(&hasFailed);
});
return !hasFailed;
}
Activity Monitor shows what appears to be the cores being used with more intensity but this code is at least 100 times slower than the older one working sequentially!
How can that be?

If your methods that you're calling are simple, the overhead of creating all of these threads could offset any advantage gained by concurrency. As the Performing Loop Iterations Concurrently section of the Concurrency Programming Guide says:
You should make sure that your task code does a reasonable amount of work through each iteration. As with any block or function you dispatch to a queue, there is overhead to scheduling that code for execution. If each iteration of your loop performs only a small amount of work, the overhead of scheduling the code may outweigh the performance benefits you might achieve from dispatching it to a queue. If you find this is true during your testing, you can use striding to increase the amount of work performed during each loop iteration. With striding, you group together multiple iterations of your original loop into a single block and reduce the iteration count proportionately. For example, if you perform 100 iterations initially but decide to use a stride of 4, you now perform 4 loop iterations from each block and your iteration count is 25. For an example of how to implement striding, see “Improving on Loop Code.”
That link to Improving on Loop Code walks through a sample implementation of striding, whereby you balance the number of threads with the amount of work done by each. It will take some experimentation to find the right balance with your methods, so play around with different striding values until you achieve the best performance.
In my experiments with a CPU-bound process, I found that I achieved a huge gain when doing two threads, but it diminished after that point. It may vary based upon what is in your methods that you're calling.
By the way, what are these methods that you're calling doing? If you're doing anything that requires the main thread (e.g. UI updates), that will also skew the results. For the sake of comparison, I'd suggest you take your serial example and dispatch that to a background queue (as a single task), and see what sort of performance you get that way. This way you can differentiate between main vs. background queue related issues, and the too-many-threads overhead issue I discuss above.
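The striding arithmetic from the quoted guide can be sketched in plain C (which also compiles inside an Objective-C file). The helper names `chunk_count` and `chunk_range` are hypothetical, not part of GCD. The assumption is that you would pass `chunk_count(total, stride)` to `dispatch_apply` as the iteration count and loop over the returned range inside the block.

```c
#include <stddef.h>

/* Number of blocks needed to cover `total` iterations when each block
   runs `stride` of them. Rounds up so leftover iterations are not lost.
   Assumes stride > 0. */
static size_t chunk_count(size_t total, size_t stride) {
    return (total + stride - 1) / stride;
}

/* Half-open range [*start, *end) of original iterations that the block
   with index `chunk` should run. The last block may be shorter. */
static void chunk_range(size_t chunk, size_t total, size_t stride,
                        size_t *start, size_t *end) {
    *start = chunk * stride;
    *end = *start + stride;
    if (*end > total) {
        *end = total;
    }
}
```

With 20 tests and a stride of 8, this gives 3 blocks covering tests 0-7, 8-15 and 16-19. Which stride is fastest still has to be found by measurement.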

Parallel computing only makes sense if you have enough tasks for each node to do. Otherwise, the extra overhead of setting up/managing the parallel nodes takes up more time than the problem itself.
Example of bad parallelization:
void function(){
    for(int i = 0; i < 1000000; ++i){
        for(int j = 0; j < 1000000; ++j){
            ParallelAction{ //Turns the following code into a thread to be done concurrently.
                print(i + ", " + j)
            }
        }
    }
}
Problem: every print() statement has to be turned into a thread, where a worker node has to initialize, acquire the thread, finish, and find a new thread.
Essentially, you've got 1 000 000 * 1 000 000 threads waiting for a node to work on them.
How to make the above better:
void function(){
    for(int i = 0; i < 1000000; ++i){
        ParallelAction{ //Turns the following code into a thread to be done concurrently.
            for(int j = 0; j < 1000000; ++j){
                print(i + ", " + j)
            }
        }
    }
}
This way, every node can start up, do a sizeable amount of work (print 1 000 000 things), finish up, and find a new job.
http://en.wikipedia.org/wiki/Granularity
The above link talks about granularity: how finely you break a problem up into separate tasks.

Related

Objective-C Reusing NSString Memory Leak

I have written a very simple test application to try to help with a larger project I'm working on.
Simply put, the test app loops a predetermined number of times and appends "1" to a string on each loop. When the loop hits a multiple of 1000, the string is reset and the process starts over again.
The code is below; but what I am finding is that the memory usage is much higher than I would expect. Each iteration adds about .5MB.
It appears that the newString is not reused, but is discarded and a new instance created, without recovering the memory it was using.
Ultimately, the software needs to count much much higher than 100000.
As a test, if I change the iteration to 10million, it takes over 5GB memory!
Has anybody got any suggestions? So far I have tried various ways of clearing the string, as well as turning off ARC and releasing/recreating it manually, but none seem to reclaim the amount of memory I would expect.
Thank you for any help!
*PS. Yes, this actual software is totally pointless, but as I say, it's a test app that will be migrated into a useful piece of code once fixed.
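#import <Foundation/Foundation.h>
int targetCount = 100000;
NSString * newString;
void process(void);
void calledFunction(int count);
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        process();
        return 0;
    }
}
void process(void) {
    for (int i=0; i<targetCount; i++) {
        calledFunction(i);
    }
}
void calledFunction(int count) {
    if ((count % 1000) == 0) {
        // Reset the string on every multiple of 1000.
        newString = nil;
        newString = @"";
    } else {
        // stringWithFormat: returns a new autoreleased string each time.
        newString = [NSString stringWithFormat:@"%@1", newString];
    }
}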
Your calledFunction function creates an autoreleased NSString that won't be released until the current autorelease pool gets drained.
Your process function calls calledFunction 100,000 times in a loop. For the duration of this loop, the current autorelease pool is not given a chance to drain. By the time the end of the process function is reached, all of the NSString objects created in calledFunction are still in memory.
A common solution to avoid a build-up of autoreleased objects when using a large loop is to add an additional autorelease pool as shown below:
void process() {
    for (int i=0; i<targetCount; i++) {
        @autoreleasepool {
            calledFunction(i);
        }
    }
}
Your problem stems from the auto release pool, a somewhat anachronistic feature in the days of ARC.
When an object is created with an alloc/init combination the resultant object is owned by the caller. The same is true for the standard new method, it is defined as alloc followed by init.
For each init... method a class may have a matching <type>... method; this is defined as alloc/init/autorelease and returns an unowned object to the caller. For example, your code uses stringWithFormat:, which is defined as alloc/initWithFormat:/autorelease.
The unowned returned object is in the auto release pool, and unless ownership is taken it will be released automatically the next time that pool is emptied, which for the main autorelease pool is once per iteration of the main event loop. In many programs iterations of the event loop are frequent enough to reclaim objects from the auto release pool quickly enough that memory usage does not climb. However, if a lot of objects are created and then discarded within a single iteration of the event loop, as in your example of a large for loop, the auto release pool can end up holding a lot of unneeded objects before it is emptied.
A Common Solution
A common solution to this problem is to use a local auto release pool, which is emptied as soon as it is exited. By judicious placement of such local auto release pools memory use can be minimised. A common place for them is wrapping the body of loops which generate a lot of garbage, in your code this would be:
void process()
{
    for (int i=0; i<targetCount; i++)
    { @autoreleasepool
        {
            calledFunction(i);
        }
    }
}
Here all auto released and discarded objects created by calledFunction() will be reclaimed on every iteration of the loop.
A disadvantage of this approach is determining the best placement of the @autoreleasepool constructs. Surely in these days of automatic reference counting (ARC) this process can be simplified? Of course...
Another Solution: Avoid The Auto Release Pool
The problem you face is objects ending up in the auto release pool for too long, a simple solution to this is to never put the objects in the pool in the first place.
Objective-C has a third object creation pattern new..., it is similar to the <type>... but without the autorelease. Originating from the days of manual memory management and heavy auto release pool use most classes only implement new - which is just alloc/init - and no other members of the family, but you can easily add them with a category. Here is newWithFormat:
@interface NSString (ARC)
+ (instancetype)newWithFormat:(NSString *)format, ... NS_FORMAT_FUNCTION(1,2);
@end
@implementation NSString (ARC)
+ (instancetype)newWithFormat:(NSString *)format, ...
{
    va_list args;
    va_start(args, format);
    id result = [[self alloc] initWithFormat:format arguments:args];
    va_end(args);
    return result;
}
@end
(Due to the variable arguments this is more involved than most new family methods would be.)
Add the above to your application and then replace calls to stringWithFormat: with newWithFormat:. The returned strings will be owned by the caller, ARC will manage them, they won't fill up the auto release pool (they will never enter it), and you won't need to figure out where to place @autoreleasepool constructs. Win, win, win :-)
HTH

Objective-c pendulum modelling memory issues

I am trying to implement a modelling class for a Physics project with finite difference methods for simulating a simple pendulum. I want to be able to make this class as generic as possible so I can do whatever I want with the values on each iteration of the method. For this reason I have given my methods callback blocks which can also be used to stop the method if we want to.
For example my Euler method loop looks like so:
for (NSInteger i = 0; i < n; i++) {
if (callBack) {
if(!callBack(NO, currentTheta, currentThetaDot, currentT, (CGFloat)i/n)) break;
}
currentTheta += self.dt*f_single_theta(currentThetaDot);
currentThetaDot += self.dt*f_single_thetaDot(currentTheta, currentThetaDot, gamma);
currentT += self.dt;
}
And in the callBack block I run the code
^BOOL (BOOL complete, double theta, double thetaDot, CGFloat timeElapsed, CGFloat percentComplete){
eulerT = [eulerT stringByAppendingFormat:@"%.8f\n",timeElapsed];
eulerTheta = [eulerTheta stringByAppendingFormat:@"%.8f\n",theta];
if ((currentThetaDot*currentThetaDot + cos(currentTheta)) > 0.5) {
return 0; // stops running if total E > 0.5
}
return 1;
}];
Here eulerT and eulerTheta are strings which I later save to a file. This callback quickly results in a massive build-up of memory: even for n of order 10,000 I end up with about 1 GB of RAM usage. As soon as I comment out the call to the callBack block, this drops right off. Is there any way I can keep this nice functionality without the massive memory problems?
Many people who are new to Objective-C do not realize the difference between [NSArray array] and [[NSArray alloc] init]. In the days before ARC, the difference was much more obvious than it is now. Both create a new object, but the former returns an autoreleased object: it is registered with the current NSAutoreleasePool and the caller does not own it. The latter returns an object that the caller owns, with a retain count of 1.
Objects that are assigned to an NSAutoreleasePool do not get deallocated as soon as you stop using them. Instead, they are released when the pool is drained. Generally this can be assumed to happen when control returns to the current run loop, but it also happens when drain is called on the NSAutoreleasePool.
With ARC, the difference is less obvious, but still significant. Many, if not most, of the objects you allocate are assigned to an autorelease pool. This means that you don't get their memory back just because you're done using them. That leads to the memory usage spiking in tight loops, such as the one you posted. The solution is to explicitly drain your autorelease pool, like this:
for (NSInteger i = 0; i < n; i++) {
    if (callBack) {
        @autoreleasepool {
            if(!callBack(NO, currentTheta, currentThetaDot, currentT, (CGFloat)i/n))
                break;
        }
    }
    currentTheta += self.dt*f_single_theta(currentThetaDot);
    currentThetaDot += self.dt*f_single_thetaDot(currentTheta, currentThetaDot, gamma);
    currentT += self.dt;
}
You should wrap the inside of your loop in @autoreleasepool{} to clean up temporary objects.

NSOperation & Singleton: Correct concurrency design

I need some advice on the design of my app; basically, I would like to know whether it will work as I expect. As multi-threading is quite a tricky thing, I would like to hear from you.
Basically my task is very simple: I have SomeBigSingletonClass, a big singleton class, which has two methods, someMethodOne and someMethodTwo.
These methods should be invoked periodically (timer based) and in separate threads.
But there should be only one instance of each thread at the moment, e.g. there should be only one running someMethodOne at any time and the same for someMethodTwo.
What I've tried
GCD - I did an implementation with GCD, but it lacks a very important feature: it does not provide a means to check whether there is any running task at the moment, i.e. I was not able to check that there is only one running instance of, let's say, the someMethodOne method.
NSThread - It does provide good functionality, but I'm pretty sure that newer high-level technologies like NSOperation and GCD will make my code simpler to maintain. So I decided to give up on NSThread.
My Solution with NSOperation
How I plan to implement the two-thread invocation
@implementation SomeBigSingletonClass
- (id)init
{
...
// queue is an iVar
queue = [[NSOperationQueue alloc] init];
// As I'll have maximum two running threads
[queue setMaxConcurrentOperationCount:2];
...
}
+ (SomeBigSingletonClass *)sharedInstance
{
static SomeBigSingletonClass *sharedInstance = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
sharedInstance = [[SomeBigSingletonClass alloc] init];
});
return sharedInstance;
}
- (void)someMethodOne
{
SomeMethodOneOperation *one = [[SomeMethodOneOperation alloc] init];
[queue addOperation:one];
}
- (void)someMethodTwo
{
SomeMethodTwoOperation *two = [[SomeMethodTwoOperation alloc] init];
[queue addOperation:two];
}
@end
And finally my NSOperation inherited class will look like this
@implementation SomeMethodOneOperation
- (id)init
{
if (![super init]) return nil;
return self;
}
- (void)main {
// Check if the operation is not running
if (![self isExecuting]) {
[[SomeBigSingletonClass sharedInstance] doMethodOneStuff];
}
}
@end
And the same for SomeMethodTwoOperation operation class.
If you are using NSOperation, you can achieve what you want by creating your own NSOperationQueue and setting maxConcurrentOperationCount to 1.
You could perhaps also have used a @synchronized scope with your class as the lock object.
EDIT: clarification---
What I am proposing:
Queue A (1 concurrent operation--used to perform SomeMethodOneOperation SomeMethodTwoOperation once at a time)
Queue B (n concurrent operations--used for general background operation performing)
EDIT 2: Updated code illustrating an approach that lets operation one and operation two run concurrently with each other, with at most one of each executing at any given time.
-(void)enqueueMethodOne
{
    static NSOperationQueue * queue = nil ;
    static dispatch_once_t onceToken ;
    dispatch_once(&onceToken, ^{
        queue = [ [ NSOperationQueue alloc ] init ] ;
        // Serial queue: at most one "method one" operation runs at a time.
        queue.maxConcurrentOperationCount = 1 ;
    });
    [ queue addOperation:[ NSBlockOperation blockOperationWithBlock:^{
        ... do method one ...
    } ] ];
}
-(void)enqueueMethodTwo
{
    static NSOperationQueue * queue = nil ;
    static dispatch_once_t onceToken ;
    dispatch_once(&onceToken, ^{
        queue = [ [ NSOperationQueue alloc ] init ] ;
        // Separate serial queue, so "method two" can overlap with "method one".
        queue.maxConcurrentOperationCount = 1 ;
    });
    [ queue addOperation:[ NSBlockOperation blockOperationWithBlock:^{
        ... do method two ...
    } ] ];
}
EDIT 3:
per our discussion:
I pointed out that isExecuting is a property of an individual operation and refers only to the state of the operation being queried, not to whether any instance of that class is executing.
Therefore Deimus' solution won't work to keep multiple instances of operation one from running simultaneously, for example.
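To make the point about isExecuting concrete: the "is one already running?" state has to live outside any single operation instance. The following is a minimal sketch in plain C11 (usable from an Objective-C file), not something from the answer above. `run_guard` and its functions are hypothetical names and are not part of Foundation or GCD.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One guard per method that must never overlap with itself. */
typedef struct {
    atomic_flag running;
} run_guard;

#define RUN_GUARD_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the caller claimed the guard and may run.
   Returns false if another run is still in progress. */
static bool run_guard_try_enter(run_guard *g) {
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set(&g->running);
}

/* Must be called by the run that successfully entered, once it is done. */
static void run_guard_leave(run_guard *g) {
    atomic_flag_clear(&g->running);
}
```

Note the design difference from a serial queue: a guard like this skips a timer tick while the previous run is still going, whereas a queue with maxConcurrentOperationCount set to 1 lets the ticks pile up and run one after another.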
Sorry, I'm late to the party. If your methods are called back based on timers, and you want them to execute concurrently with respect to one another, but synchronous with respect to themselves, might I suggest using GCD timers.
Basically, you have two timers, one which executes methodOne, and the other executes methodTwo. Since you pass blocks to the GCD timers, you don't even have to use methods, especially if you want to make sure other code does not call those methods when they are not supposed to run.
If you schedule the timers onto a concurrent queue, then both timers could possibly be running at the same time on different threads. However, the timer itself will only run when it is scheduled. Here is an example I just hacked up... you can easily use it with a singleton...
First, a helper function to create a timer that takes a block which will be called when the timer fires. The block passes the object, so it can be referenced by the block without creating a retain cycle. If we use self as the parameter name, the code in the block can look just like other code...
static dispatch_source_t setupTimer(Foo *fooIn, NSTimeInterval timeout, void (^block)(Foo * self)) {
// Create a timer that uses the default concurrent queue.
// Thus, we can create multiple timers that can run concurrently.
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_source_t timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue);
uint64_t timeoutNanoSeconds = timeout * NSEC_PER_SEC;
dispatch_source_set_timer(timer,
dispatch_time(DISPATCH_TIME_NOW, timeoutNanoSeconds),
timeoutNanoSeconds,
0);
// Prevent reference cycle
__weak Foo *weakFoo = fooIn;
dispatch_source_set_event_handler(timer, ^{
// It is possible that the timer is running in another thread while Foo is being
// destroyed, so make sure it is still there.
Foo *strongFoo = weakFoo;
if (strongFoo) block(strongFoo);
});
return timer;
}
Now, the basic class implementation. If you don't want to expose methodOne and methodTwo, there is no reason to even create them, especially if they are simple, as you can just put that code directly in the block.
@implementation Foo {
    dispatch_source_t timer1_;
    dispatch_source_t timer2_;
}
- (void)methodOne {
    NSLog(@"methodOne");
}
- (void)methodTwo {
    NSLog(@"methodTwo");
}
- (id)initWithTimeout1:(NSTimeInterval)timeout1 timeout2:(NSTimeInterval)timeout2 {
if (self = [super init]) {
timer1_ = setupTimer(self, timeout1, ^(Foo *self) {
// Do "methodOne" work in this block... or call it.
[self methodOne];
});
timer2_ = setupTimer(self, timeout2, ^(Foo *self) {
// Do "methodTwo" work in this block... or call it.
[self methodTwo];
});
dispatch_resume(timer1_);
dispatch_resume(timer2_);
}
return self;
}
- (void)dealloc {
dispatch_source_cancel(timer2_);
dispatch_release(timer2_);
dispatch_source_cancel(timer1_);
dispatch_release(timer1_);
}
@end
EDIT
In response to the comments (with more detail to hopefully explain why the block will not be executed concurrently, and why missed timers are coalesced into one).
You do not need to check for it being run multiple times. Straight from the documentation...
Dispatch sources are not reentrant. Any events received while the
dispatch source is suspended or while the event handler block is
currently executing are coalesced and delivered after the dispatch
source is resumed or the event handler block has returned.
That means when a GCD dispatch_source timer block is dispatched, it will not be dispatched again until the one that is already running completes. You do nothing, and the library itself will make sure the block is not executed multiple times concurrently.
If that block takes longer than the timer interval, then the "next" timer call will wait until the one that is running completes. Also, all the events that would have been delivered are coalesced into one single event.
You can call
unsigned long numEventsFired = dispatch_source_get_data(timer);
from within your handler to get the number of events that have fired since the last time the handler was executed (e.g., if your handler ran through 4 timer firings, this would be 4, but you would still get all these firings in this one event; you would not receive separate events for them).
For example, let's say your timer interval is 1 second, and your handler happens to take 5 seconds to run. That timer will not fire again until the current block is done. Furthermore, all those missed firings will be coalesced into one, so you will get one call into your block, not 5.
Now, having said all that, I should caution you about what I think may be a bug. Now, I rarely lay bugs at the feet of library code, but this one is repeatable, and seems to go against the documentation. So, if it's not a bug, it's an undocumented feature. However, it is easy to get around.
When using timers, I have noticed that coalesced timers will most certainly be coalesced. That means, if your timer handler is running, and 5 timers fired while it was running, the block will be called immediately, representing those missed 5 events. However, as soon as that one is done, the block will be executed again, just once, no matter how many timer events were missed before.
It's easy to identify these, though, because dispatch_source_get_data(timer) will return 0, which means that no timer events have fired since the last time the block was called.
Thus, I have grown accustomed to adding this code as the first line of my timer handlers...
if (dispatch_source_get_data(timer) == 0) return;
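The behaviour described above can be illustrated with a toy model in plain C. This is not libdispatch; `timer_model` and its functions are hypothetical names, and the only assumption carried over is that the handler sees a count of firings since it last ran. It shows why the guard turns the extra invocation into a no-op.

```c
/* Toy model of a coalescing timer source (illustration only). */
typedef struct {
    unsigned long pending;    /* firings since the handler last ran */
    unsigned long work_runs;  /* how many times real work was done */
    unsigned long last_batch; /* firings covered by the last real run */
} timer_model;

/* The timer fires while the handler is busy: just count it. */
static void timer_model_fire(timer_model *t) {
    t->pending++;
}

/* The handler: read and reset the count, then bail out if it is zero. */
static void timer_model_handler(timer_model *t) {
    unsigned long fired = t->pending;
    t->pending = 0;
    if (fired == 0) {
        return; /* spurious invocation: no firings to handle */
    }
    t->last_batch = fired;
    t->work_runs++;
}
```

Five firings followed by two handler invocations result in a single unit of work covering all five firings; the second invocation returns immediately.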

Is it more efficient to schedule a method to spawn enemies or use the update method of an Enemy cache?

I am using Cocos2d for iPhone and I am wondering if it is more efficient to structure the logic of my code to spawn enemies using this method:
-(void) schedule:(SEL)selector interval:(ccTime)interval
or using the update in an EnemyCache class and verify each time if the time interval is met. Here is the code snippet that is called in the update method of the EnemyCache class (the relative time is an integer value that is updated by the GameScene at each update in the GameScene class - the GameScene update method call is scheduled with an interval of 1 second):
-(void) checkForPlayerCollisionsAndSpwanTime
{
int count = [elements count];
//CCLOG(@"count %i", count);
Element* element;
for(int i=0; i<count;i++){
element = [elements objectAtIndex:i];
NSAssert(element!=nil, @"Nil enemy");
if (element.visible)
{
[element justComeDown];
ShipEntity * ship = [[GameScene sharedGameScene]defaultShip];
CGRect rect = [ship boundingBox];
if (CGRectIntersectsRect([element boundingBox], rect)){
[element doWhatever];
element.visible=FALSE;
[element stopAllActions];
}
}
else{
if(element.spawnTime == relativeTime) {
[self addChild:element];
element.visible=TRUE;
}
}
}
}
The difference is that in this way at each update the checkForPlayerCollisionsAndSpwanTime method goes through the array of enemies. In the first way, via scheduling a selector to call a similar method, I could reduce the time spent by the CPU to look through the array and conditions.
I am not sure how costly this call is:
[self schedule:selector interval:interval repeat:kCCRepeatForever delay:0];
Looking through the source, I see that it calls this method (see below), but I wanted to ask in general what your approach to this problem is and whether I should keep using the EnemyCache update method or use the scheduleSelector methods.
-(void) scheduleSelector:(SEL)selector forTarget:(id)target interval:(ccTime)interval paused:(BOOL)paused repeat:(uint) repeat delay:(ccTime) delay
{
NSAssert( selector != nil, @"Argument selector must be non-nil");
NSAssert( target != nil, @"Argument target must be non-nil");
tHashSelectorEntry *element = NULL;
HASH_FIND_INT(hashForSelectors, &target, element);
if( ! element ) {
element = calloc( sizeof( *element ), 1 );
element->target = [target retain];
HASH_ADD_INT( hashForSelectors, target, element );
// Is this the 1st element ? Then set the pause level to all the selectors of this target
element->paused = paused;
} else
NSAssert( element->paused == paused, @"CCScheduler. Trying to schedule a selector with a pause value different than the target");
if( element->timers == nil )
element->timers = ccArrayNew(10);
else
{
for( unsigned int i=0; i< element->timers->num; i++ ) {
CCTimer *timer = element->timers->arr[i];
if( selector == timer->selector ) {
CCLOG(@"CCScheduler#scheduleSelector. Selector already scheduled. Updating interval from: %.4f to %.4f", timer->interval, interval);
timer->interval = interval;
return;
}
}
ccArrayEnsureExtraCapacity(element->timers, 1);
}
CCTimer *timer = [[CCTimer alloc] initWithTarget:target selector:selector interval:interval repeat:repeat delay:delay];
ccArrayAppendObject(element->timers, timer);
[timer release];
}
Do you have a performance problem in your app? If not, the answer is: it doesn't matter. If you do, did you measure it and did the issue come from the method in question? If not, the answer is: you're looking in the wrong place.
In other words: premature optimization is the root of all evil.
If you still want to know, there's just one way to find out: measure both variants of the code and pick the one that's faster. If the speed difference is minimal (which I suspect it will be), favor the version that's easier for you to work with. There's a different kind of performance you should consider: you, as a human being, reading, understanding, changing code. Code readability and maintainability is way more important than performance in almost all situations.
No one can (or will) look at this amount of code and conclude "Yes, A is definitely about 30-40% faster, use A". If you are concerned about the speed of the method, don't let anyone tell you which is faster. Measure it. It's the only way you can be sure.
The reason is this: programmers are notorious for making assumptions about code performance. Many times they're wrong, because the language, the hardware or the understanding of the topic has made big leaps since the last time they measured it. But more likely they're just repeating what they once learned, because they asked a question just like yours, and someone else gave them an answer which they accepted as fact from then on.
But coming back to your specific example: it really doesn't matter. You're much, much, much, much, much more likely to run into performance issues due to rendering too many enemies than the code that determines when to spawn one. And then it really, really, really, really, really doesn't matter if that code is run in a scheduled selector or a scheduled update method that increases a counter every frame. This boils down to being a subjective coding style preference issue a lot more than it is a decision about performance.
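As a rough illustration of "measure it", here is a minimal timing helper in plain C. `cpu_seconds` and `sample_workload` are hypothetical names; the workload is a stand-in for whichever variant you want to compare. Note the assumption that CPU time is what you care about: `clock()` reports processor time used by the process, not wall-clock time, and a profiler such as Instruments will tell you far more.

```c
#include <time.h>

/* Stand-in workload; replace with the code path being compared. */
static volatile unsigned long sink;
static void sample_workload(void) {
    for (unsigned long i = 0; i < 1000; i++) {
        sink += i;
    }
}

/* CPU seconds consumed by calling fn() `repeats` times. */
static double cpu_seconds(void (*fn)(void), int repeats) {
    clock_t start = clock();
    for (int i = 0; i < repeats; i++) {
        fn();
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Run each variant many times and compare the totals; a single run is usually too noisy to tell two cheap code paths apart.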

Parallel reduce algorithm implementation

I've been investigating implementations of reduce [inject, fold, whatever you want to call it] functions in Objective-C using blocks and was wondering if there were any techniques for parallelizing the computation where the function applied is associative (e.g. sum of a collection of integers)?
i.e. is it possible to parallelize or improve on something like this on NSArray:
- (id)reduceWithBlock:(id (^)(id memo, id obj))block andAccumulator:(id)accumulator
{
id acc = [[accumulator copy] autorelease];
for (id obj in self) {
acc = block(acc, obj);
}
return acc;
}
Using grand-central dispatch?
EDIT: I've made a second attempt, partitioning the array into smaller chunks and reducing them in separate dispatch queues but there's no discernable performance gain in my testing: (gist here)
You can use dispatch_apply with a global dispatch queue to parallelize it, but your code does not look well suited to concurrent work. The accumulator object requires exclusive access and is used by every invocation of the block, so it effectively needs one giant lock around the accumulator.
For example, this code does almost no work concurrently, even though it uses dispatch_apply with a global dispatch queue.
dispatch_semaphore_t sema = dispatch_semaphore_create(1);
dispatch_queue_t queue =
dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_apply([array count], queue, ^(size_t index) {
dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER);
acc = block(acc, [array objectAtIndex:index]);
dispatch_semaphore_signal(sema);
});
dispatch_release(sema);
You need to split the block and the accumulator implementation for efficient parallelization.
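One way to read "split the block and the accumulator" is to give each chunk its own private accumulator and combine the partial results at the end, which is valid when the operation is associative (as with a sum). The sketch below is plain C and runs the chunks serially; `partial_sum` and `chunked_sum` are hypothetical names, and the chunk loop is the part that could, as an assumption, be handed to dispatch_apply because each iteration writes only to its own slot.

```c
#include <stddef.h>

/* Sum values[start..end) into a private accumulator: nothing is shared,
   so no lock is needed. */
static long partial_sum(const long *values, size_t start, size_t end) {
    long acc = 0;
    for (size_t i = start; i < end; i++) {
        acc += values[i];
    }
    return acc;
}

/* Reduce in `chunks` independent pieces, then combine the partial results. */
static long chunked_sum(const long *values, size_t count, size_t chunks) {
    enum { MAX_CHUNKS = 64 };
    long partials[MAX_CHUNKS];
    if (chunks == 0) chunks = 1;
    if (chunks > MAX_CHUNKS) chunks = MAX_CHUNKS;
    size_t per_chunk = (count + chunks - 1) / chunks;

    /* Each iteration touches only partials[c]. */
    for (size_t c = 0; c < chunks; c++) {
        size_t start = c * per_chunk;
        size_t end = start + per_chunk;
        if (start > count) start = count;
        if (end > count) end = count;
        partials[c] = partial_sum(values, start, end);
    }

    /* Cheap serial combine step over a handful of partial results. */
    long total = 0;
    for (size_t c = 0; c < chunks; c++) {
        total += partials[c];
    }
    return total;
}
```

The number of chunks plays the same role as the stride: too many chunks and the scheduling overhead dominates, too few and some cores sit idle.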
EDITED:
(I haven't checked the algorithm of your code.)
dispatch_queue_t result_queue = dispatch_queue_create(NULL, NULL);
You are using a serial queue. A serial queue executes one block at a time. Thus, it might instead be
dispatch_queue_t result_queue =
dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
or
dispatch_queue_t result_queue = dispatch_queue_create(NULL, DISPATCH_QUEUE_CONCURRENT);
/* DISPATCH_QUEUE_CONCURRENT is only available on OS X 10.7/iOS 4.3 or later. */
I implemented a parallel divide & conquer algorithm which works with associative functions here. Unfortunately I couldn't get any discernible speedup from it, so I'm sticking with a simple serial version for now. I believe my base case needs optimising: I read somewhere that the inequality n >= p^2 should hold, where n is the number of jobs and p the number of processors.
Obviously a lot of time is being lost on array-splitting and recursing. If anybody has suggestions, they'd be much appreciated.