How to ensure FIFO execution in a concurrent NSOperationQueue? - objective-c

I'm working on a framework, and in order to keep the public methods non-blocking, I'm using an NSOperationQueue: every public method call is put onto the queue as an operation and the call returns immediately.
There are no relations or dependencies between the different operations; the only thing that matters is that the operations are started in FIFO order, that is, in the same order in which they were added to the queue.
Here is an example of my current implementation (sample project here):
@implementation Executor

- (instancetype)init {
    self = [super init];
    if (self) {
        _taskQueue = [[NSOperationQueue alloc] init];
        _taskQueue.name = @"com.d360.tasks";
    }
    return self;
}

- (void)doTask:(NSString *)taskName
{
    NSOperation *operation = [NSBlockOperation blockOperationWithBlock:^{
        NSLog(@"executing %@", taskName);
    }];
    [self.taskQueue addOperation:operation];
}
I realised though that the order in which the operations are started is not necessarily the order in which they were added to the queue. For instance, if I call
[self.executor doTask:@"Task 1"];
[self.executor doTask:@"Task 2"];
Sometimes Task 2 is started before Task 1.
The question is how can I ensure a FIFO execution start?
I could achieve it using _taskQueue.maxConcurrentOperationCount = 1;, but that would allow only one operation at a time, which I don't want. One operation should not block any other operation; they can run concurrently as long as they are started in the correct order.
I also looked into the NSOperationQueuePriority property, which would work if I knew the priorities of the calls, which I don't. In fact, even if I set the earlier-added operation to NSOperationQueuePriorityHigh and the second to NSOperationQueuePriorityNormal, the order is not guaranteed either.
[self.executor doTask:@"Task 1" withQueuePriority:NSOperationQueuePriorityHigh];
[self.executor doTask:@"Task 2" withQueuePriority:NSOperationQueuePriorityNormal];
Output is sometimes
executing Task 2
executing Task 1
Any ideas?
thanks,
Jan

When you create each task you could add a dependency on the previous task with NSOperation's -addDependency: method. The complication is that dependencies aren't satisfied until the dependent task completes, which probably isn't what you want. You could work around that by creating another NSOperation inside each task, and make the next queued task depend on that. This inner task can just set a flag or something that says "hey, I've started!". Then when that inner task completes it will satisfy the dependency for the next task in the queue and allow it to start.
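A minimal sketch of that idea, building on the Executor class from the question (the previousStartedMarker property is added purely for illustration, and this assumes doTask: is always called from a single thread, as in the example):

@property (nonatomic, strong) NSOperation *previousStartedMarker; // hypothetical bookkeeping property

- (void)doTask:(NSString *)taskName
{
    // Empty "I've started" marker; it is only enqueued once the real
    // operation's block begins executing.
    NSOperation *startedMarker = [NSBlockOperation blockOperationWithBlock:^{}];

    NSOperation *operation = [NSBlockOperation blockOperationWithBlock:^{
        // First thing the task does: enqueue the marker, which satisfies the
        // dependency of the next task and allows it to start.
        [self.taskQueue addOperation:startedMarker];
        NSLog(@"executing %@", taskName);
    }];

    // The new task may not start until the previous task has started.
    if (self.previousStartedMarker) {
        [operation addDependency:self.previousStartedMarker];
    }
    self.previousStartedMarker = startedMarker;

    [self.taskQueue addOperation:operation];
}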
Seems like a convoluted way to do things, though, and I'm not sure the benefit is worth the extra complication - why does it matter what order the operations are started in, if they truly are independent operations? Once they've started, the OS decides which task gets CPU time, and you don't have much control over it anyway, so why not just queue them up and let the OS manage the start order?

Related

Return value from asynchronous SQL method

I have this code:
- (NSString *)obtenerDatosUsuario
{
    __block NSString *result = @"";
    [self obtenerDatosUsuarioSQL:^(NSString *resultadoSQL){
        result = resultadoSQL;
    }];
    return result;
}
And I want the method to return the content of resultadoSQL.
If my guess is correct about what happens inside your method -obtenerDatosUsuarioSQL: (i.e., it performs a lengthy operation in a background thread, and gives the result to the passed block argument), then your code runs in the following order:
1. You call -obtenerDatosUsuario.
2. You call -obtenerDatosUsuarioSQL:, passing a completion handler block.
3. Execution proceeds forward and reaches the return statement at the end of -obtenerDatosUsuario, and exits the method body. The returned variable result hasn't been set yet!
4. Sometime later, the SQL query completes and the block is executed. But it is too late to return the result, because execution already exited the method -obtenerDatosUsuario.
There are ways to make this asynchronous method behave synchronously (e.g. semaphores), but it generally is a very, very bad idea. Most likely, obtenerDatosUsuarioSQL is asynchronous because there is a chance (even if only a small chance) that the result won't be returned immediately. Maybe it's possible that the SQL will be slow. Or maybe you'll eventually be doing queries from multiple threads, so this query might have to wait for queries in other threads to finish. Or there might be other reasons. But whatever the reason, this method was implemented as asynchronous method, and you should embrace that, rather than fight it. If you change obtenerDatosUsuario to return synchronously, you open yourself to a wide variety of possible problems.
Instead, you should just adopt asynchronous pattern in your code. For example, let's imagine that you have some code that was planning on using the result of obtenerDatosUsuario for some other purpose, e.g.:
NSString *resultadoSQL = [self obtenerDatosUsuario];
// use `resultadoSQL` here
Just change that to:
[self obtenerDatosUsuarioSQL:^(NSString *resultadoSQL){
// use `resultadoSQL` here
}];
// but not here
And, if you're using obtenerDatosUsuarioSQL in some method from which you're currently trying to return a value immediately, then change that method to behave asynchronously, too. For example, let's assume you had something like:
- (NSString *)someOtherMethod {
    NSString *resultadoSQL = [self obtenerDatosUsuario];
    // let's assume you're doing something else with `resultadoSQL` to build some other string
    NSString *string = ... // some expression using `resultadoSQL`
    return string;
}
Then, you'd change that to also adopt asynchronous pattern:
- (void)someOtherMethod:(void (^)(NSString *))completionHandler {
    [self obtenerDatosUsuarioSQL:^(NSString *resultadoSQL){
        NSString *string = ... // some expression using `resultadoSQL`
        completionHandler(string);
    }];
}
When you first encounter this, it may seem unnecessarily complicated, but asynchronous programming is so critical, such a fundamental part of Cocoa programming, that one really must gain some familiarity with these common asynchronous patterns, such as blocks. Personally, I use block syntax so much that I've created code snippets in Xcode's "Code Snippet Library" for typical block patterns, which simplifies life a lot and frees you from memorizing the unintuitive block syntax.
But don't be tempted to wrap asynchronous method in another method that makes it behave synchronously. You open yourself up to many types of problems if you do that.

Is it possible to use runMode:beforeDate: with [NSDate dateWithTimeIntervalSinceNow:0]?

I'm trying to use the pattern described in: Grand Central Dispatch and unit testing, Pattern for unit testing async queue that calls main queue on completion and mainly in: https://github.com/AFNetworking/AFNetworking/issues/466#issuecomment-7917445.
In my unit tests I need to "straighten" the asynchronous flow of some of the methods I have (e.g. AFNetworking request operations):
This is what I use inside my tests for such "forced synchronous operations":
@property (nonatomic, assign) dispatch_semaphore_t semaphore;

// ...

- (void)runTestWithBlock:(void (^)(void))block {
    self.semaphore = dispatch_semaphore_create(0);
    block();
    while (dispatch_semaphore_wait(self.semaphore, DISPATCH_TIME_NOW))
        [[NSRunLoop currentRunLoop] runMode:NSDefaultRunLoopMode
                                 beforeDate:[NSDate dateWithTimeIntervalSinceNow:2]];
    dispatch_release(self.semaphore);
}

- (void)blockTestCompletedWithBlock:(void (^)(void))block {
    dispatch_semaphore_signal(self.semaphore);
    if (block) {
        block();
    }
}
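For reference, a test built on these helpers looks roughly like this (just a sketch; objectUnderTest and someAsyncMethodWithCompletion: stand in for whatever asynchronous API is being exercised):

- (void)testSomethingAsynchronous {
    [self runTestWithBlock:^{
        [self.objectUnderTest someAsyncMethodWithCompletion:^(id result) {
            [self blockTestCompletedWithBlock:^{
                // assertions on `result` go here
            }];
        }];
    }];
}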
If I use, for example, dateWithTimeIntervalSinceNow:10, i.e. a value greater than zero (as the quoted SO topics and GitHub comment suggest), I get correspondingly superfluous, harmful delays in my tests. If I set ...SinceNow:0..., all my tests work without any delays and I don't see any problems with this 0 value.
Apple doc says:
Runs the loop once, blocking for input in the specified mode until a given date.
If no input sources or timers are attached to the run loop, this method exits immediately and returns NO; otherwise, it returns after either the first input source is processed *or* limitDate is reached.
This "or" makes me guessing whether I may use 0 seconds without affecting my code in any bad way.
I would also be thankful for any alternative to using exactly this variation of the runMode... method, or for a completely different solution to the problem described in the quoted links.
Just to note, I still don't know the answer to the original question.
The intermediate solution to this problem (inspired by the code GHUnit uses to test asynchronous methods) was to use a small time interval like [runLoop runMode:NSDefaultRunLoopMode beforeDate:[NSDate dateWithTimeIntervalSinceNow:0.05]].
Currently I use the following loop:
while (dispatch_semaphore_wait(_semaphore, DISPATCH_TIME_NOW))
    CFRunLoopRunInMode(kCFRunLoopDefaultMode, 0.05, YES);
Delays have gone.

How to prevent an action while a thread is being executed - objective-c

I've been using multithreading for a while and I thought I had it figured out, but my program is crashing now.
I have a method that has to download data from the server and access memory depending on the data. That process takes a long time, so I execute it on a secondary thread like this:
- (void)showPeople {
    dispatch_queue_t pintaOcupantes = dispatch_queue_create("Pinta Ocupantes", NULL);
    dispatch_async(pintaOcupantes, ^{
        // BUNCH OF CODE
        [self isPersonIn:jid];
        // MORE CODE that includes methods calling isPersonIn
    });
}
Inside that block there's isPersonIn. It crashes if I press the button that executes showPeople too quickly. isPersonIn is something like:
- (int)isPersonIn:(XMPPJID *)jid {
    int i = 0;
    for (NSDictionary *card in self.listaGente) {
        NSLog(@"la jid es: %@", [card objectForKey:@"jid"]);
        NSLog(@"la jid del usuario es: %@", jid.user);
        if ([[card objectForKey:@"jid"] isEqualToString:jid.user]) {
            return i;
        }
        i++;
    }
    return -1;
}
It compares an XMPPJID against an array that is an instance variable.
isPersonIn is called several times from different methods, but all the methods that call it belong to the block, so as I understand it all the executions of isPersonIn should be serialized, FIFO, right?
But if I press the button that executes showPeople (the one containing the block) many times very quickly, the app crashes in isPersonIn, sometimes without any message. I can see the threads when it crashes, and I see at least 2 threads with isPersonIn last in the stack, which doesn't make sense, since the block should be executed one at a time, not on several threads at the same time, right?
Any help will be very much appreciated.
Thanks!
[EDIT]
Also the instance array, self.listaGente, is modified outside the block.
I'm not a GCD expert, but I suspect the reason you're getting multiple threads is that you're creating a new dispatch queue each time showPeople is called.
So rather than having a single serial queue with multiple blocks, I think you are ending up with multiple queues each executing a single block.
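A minimal sketch of that fix, assuming ARC manages dispatch objects (iOS 6 / OS X 10.8 and later) and a hypothetical property to hold the one serial queue:

@property (nonatomic, strong) dispatch_queue_t pintaOcupantesQueue; // hypothetical property

- (void)showPeople {
    // Create the serial queue once and reuse it, so every call enqueues onto
    // the same queue instead of creating a new queue per call.
    if (self.pintaOcupantesQueue == nil) {
        self.pintaOcupantesQueue = dispatch_queue_create("Pinta Ocupantes", DISPATCH_QUEUE_SERIAL);
    }
    dispatch_async(self.pintaOcupantesQueue, ^{
        // BUNCH OF CODE, including the calls to -isPersonIn:
    });
}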
[EDIT] If the collection is modified outside of the block but during execution of the block, this could be the source of your crash. From Fast Enumeration Documentation:
Enumeration is “safe”—the enumerator has a mutation guard so that if you attempt to modify the collection during enumeration, an exception is raised.
In this case, protecting the array, which was what made my app crash, fixed the problem,
using:
@synchronized(theArray) {
    // CODE THAT WILL ACCESS OR WRITE TO THE ARRAY
}
This way a thread will wait if another thread is already executing that code, like a mutex or semaphore.
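For completeness, a sketch of taking the same lock on both sides, assuming self.listaGente is the shared NSMutableArray and is assigned once and then only mutated (names follow the question; the mutation method is hypothetical):

- (int)isPersonIn:(XMPPJID *)jid {
    @synchronized(self.listaGente) {
        int i = 0;
        for (NSDictionary *card in self.listaGente) {
            if ([[card objectForKey:@"jid"] isEqualToString:jid.user]) {
                return i;
            }
            i++;
        }
        return -1;
    }
}

// ...and wherever the array is modified, take the same lock:
- (void)addCard:(NSDictionary *)card {
    @synchronized(self.listaGente) {
        [self.listaGente addObject:card];
    }
}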

How to get hold of the currently executing NSOperation?

Is there an equivalent to [NSOperationQueue currentQueue] or [NSThread currentThread] for NSOperation?
I have a fairly complex domain model where the heavy processing happens quite deep down in the call stack. In order to cancel an operation in a timely manner, I would need to pass the NSOperation as a parameter to every method until I get to the point where I want to interrupt a longer-running loop. Using threads I could use [[NSThread currentThread] isCancelled], so it would seem convenient if there were an equivalent for NSOperation; unfortunately, there is only the seemingly useless [NSOperationQueue currentQueue].
Came up with an extension in Swift that returns the running operations:
extension NSOperationQueue {
    public var runningOperations: [NSOperation] {
        return operations.filter { $0.executing && !$0.finished && !$0.cancelled }
    }
}
You can then pick up the first one
if let operation = aQueue.runningOperations.first {}
No, there's no method to find the currently executing operation.
Two ways to solve your problem:
1. Operations are objects. If you need object A to talk to object B, you'll need to arrange for A to have a reference to B. There are lots of ways to do that. One way is to pass the operation along to each object that needs to know about it. Another is to use delegation. A third is to make the operation part of some larger "context" that's passed along to each method or function. If you find that you need to pass a reference from one object through several others just to get it to the object that will finally use it, that's a clue that you should think about rearranging your code.
2. Have the "heavy lifting" method return some value that gets passed up the call chain. You don't necessarily need the heavy lifting method to call [currentOperation cancel] to accomplish your goal. In fact, it would be better to have it return some value that the operation will understand to mean "work is done, stop now" because it can check that return value and exit immediately rather than having to call -isCancelled once in a while to find out whether it has been cancelled.
This isn't a good idea. Operations are usually canceled by their queue. Within the operation's main() method, you can periodically check if self is cancelled (say, every n trips through a loop, or at the start of every major block of commands) and abort if so.
To respond to a cancellation (say, some UI element tied to the operation's or queue's status), you use key-value observing (KVO) to have your controller observe the operations' isExecuting, isFinished, and isCancelled properties (as needed), then set your UI's state (always on the main thread) when those keys are updated. Per JeremyP's comments, it's important to note the KVO notifications come from the op's thread, and UI should (almost) always be manipulated on the main thread, so you'll need to use the -performSelectorOnMainThread... methods (or dispatch to the main queue) to update your actual UI when you receive a state-change KVO notification about your operations.
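A sketch of that observation pattern, assuming a controller object that owns the operation (the main-queue dispatch here stands in for -performSelectorOnMainThread:...):

static void *OperationStateContext = &OperationStateContext;

- (void)startObservingOperation:(NSOperation *)operation {
    for (NSString *keyPath in @[@"isExecuting", @"isFinished", @"isCancelled"]) {
        [operation addObserver:self forKeyPath:keyPath options:0 context:OperationStateContext];
    }
    // Remember to call -removeObserver:forKeyPath: for each key before the
    // operation goes away.
}

- (void)observeValueForKeyPath:(NSString *)keyPath ofObject:(id)object
                        change:(NSDictionary *)change context:(void *)context {
    if (context != OperationStateContext) {
        [super observeValueForKeyPath:keyPath ofObject:object change:change context:context];
        return;
    }
    // KVO fires on the operation's thread; hop to the main thread for UI work.
    dispatch_async(dispatch_get_main_queue(), ^{
        // update progress indicators, buttons, etc. here
    });
}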
What are you really trying to do? That is, why do you feel other parts of your app need to know directly about the current operation?
You could store the current operation in the thread dictionary. Just remember to get rid of it before you exit. You can safely use the thread dict if you created the object.
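A minimal sketch of that idea, assuming an NSOperation subclass that overrides -main (the dictionary key is made up for illustration):

static NSString * const kCurrentOperationKey = @"com.example.currentOperation"; // hypothetical key

- (void)main {
    NSMutableDictionary *threadDict = [[NSThread currentThread] threadDictionary];
    [threadDict setObject:self forKey:kCurrentOperationKey];
    @try {
        // ... long-running work; code deep in the call stack can look the
        // operation up and check -isCancelled ...
    }
    @finally {
        // Get rid of it before the thread goes back to the pool.
        [threadDict removeObjectForKey:kCurrentOperationKey];
    }
}

+ (NSOperation *)currentOperation {
    return [[[NSThread currentThread] threadDictionary] objectForKey:kCurrentOperationKey];
}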
You can use a combination of [NSOperationQueue currentQueue] & [NSThread currentThread] to accomplish this.
Essentially, you need to loop through the operations on the currentQueue and find the operation running on the currentThread.
NSOperation doesn't provide access to the thread it is running on, so you need to add that property yourself and assign it.
You're probably already subclassing NSOperation and providing a main, so add a 'thread' property to that subclass:
@interface MyOperation : NSOperation
@property (nonatomic, strong) NSThread *thread;
@end
Then, in your 'main' assign the current thread to that property
self.thread = [NSThread currentThread];
You can then add a 'currentOperation' method:
+ (MyOperation *)currentOperation
{
    NSOperationQueue *opQueue = [NSOperationQueue currentQueue];
    NSThread *currentThread = [NSThread currentThread];
    for (MyOperation *op in opQueue.operations) {
        if ([op isExecuting] && [op respondsToSelector:@selector(thread)]) {
            if (op.thread == currentThread) {
                return op;
            }
        }
    }
    return nil;
}
How do you know which operation you want to cancel?
When you get to the point where you want to cancel, just call [myQueue operations] and go through the operations until you find the ones you now want to cancel. I guess if you have millions of operations (or thousands) this might not work.
[myQueue operations] is thread-safe - a snapshot of the queue's contents. You can go through it pretty quickly, cancelling at will.
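For example (a fragment; -shouldCancelOperation: is a hypothetical predicate deciding which jobs you no longer need):

for (NSOperation *op in [myQueue operations]) {
    if ([self shouldCancelOperation:op]) {
        [op cancel];
    }
}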
Another way:
NSOperationQueue is not a singleton, so you can create a queue that has, say, 200 jobs on it, and then cancel all 200 by just getting that queue and cancelling them all. Store the queues in a dictionary on the main thread, and then you can get the jobs you want cancelled from the dictionary and cancel them all. I.e. if you have 1000 kinds of operations, then at the point in the code where you realize you don't need a certain task, you just get the queue for that kind and look through it for jobs to cancel.

Grand Central Strategy for Opening Multiple Files

I have a working implementation using Grand Central dispatch queues that (1) opens a file and computes an OpenSSL DSA hash on "queue1", (2) writing out the hash to a new "side car" file for later verification on "queue2".
I would like to open multiple files at the same time, but based on some logic that doesn't "choke" the OS by having 100s of files open and exceeding the hard drive's sustainable output. Photo browsing applications such as iPhoto or Aperture seem to open multiple files and display them, so I'm assuming this can be done.
I'm assuming the biggest limitation will be disk I/O, as the application can (in theory) read and write multiple files simultaneously.
Any suggestions?
TIA
You are correct in that you'll be I/O bound, most assuredly. And it will be compounded by the random access nature of having multiple files open and being actively read at the same time.
Thus, you need to strike a bit of a balance. More likely than not, one file is not the most efficient, as you've observed.
Personally?
I'd use a dispatch semaphore.
Something like:
@property (nonatomic, assign) dispatch_queue_t dataQueue;
@property (nonatomic, assign) dispatch_semaphore_t execSemaphore;
And:
- (void)process:(NSData *)d {
    dispatch_async(self.dataQueue, ^{
        if (!dispatch_semaphore_wait(self.execSemaphore, DISPATCH_TIME_FOREVER)) {
            dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
                ... do calculation work here on d ...
                dispatch_async(dispatch_get_main_queue(), ^{
                    .... update main thread w/new data here ....
                });
                dispatch_semaphore_signal(self.execSemaphore);
            });
        }
    });
}
Where it is kicked off with:
self.dataQueue = dispatch_queue_create("com.yourcompany.dataqueue", NULL);
self.execSemaphore = dispatch_semaphore_create(3);
[self process: ...];
[self process: ...];
[self process: ...];
[self process: ...];
[self process: ...];
.... etc ....
You'll need to determine how best you want to handle the queueing. If there are many items and there is a notion of cancellation, enqueueing everything is likely wasteful. Similarly, you'll probably want to enqueue URLs to the files to process, and not NSData objects like the above.
In any case, the above will process three things simultaneously, regardless of how many have been enqueued.
I'd use NSOperation for this because of the ease of handling both dependencies and cancellation.
I'd create one operation each for reading the data file, computing the data file's hash, and writing the sidecar file. I'd make each write operation dependent on its associated compute operation, and each compute operation dependent on its associated read operation.
Then I'd add the read and write operations to one NSOperationQueue, the "I/O queue," with a restricted width. The compute operations I'd add to a separate NSOperationQueue, the "compute queue," with a non-restricted width.
The reason for the restricted width on the I/O queue is that your work will likely be I/O bound; you may want it to have a width greater than 1, but it's very likely to be directly related to the number of physical disks on which your input files reside. (Probably something like 2x; you'll want to determine this experimentally.)
The code would wind up looking something like this:
@implementation FileProcessor

static NSOperationQueue *FileProcessorIOQueue = nil;
static NSOperationQueue *FileProcessorComputeQueue = nil;

+ (void)initialize
{
    if (self == [FileProcessor class]) {
        FileProcessorIOQueue = [[NSOperationQueue alloc] init];
        [FileProcessorIOQueue setName:@"FileProcessorIOQueue"];
        [FileProcessorIOQueue setMaxConcurrentOperationCount:2]; // limit width

        FileProcessorComputeQueue = [[NSOperationQueue alloc] init];
        [FileProcessorComputeQueue setName:@"FileProcessorComputeQueue"];
    }
}

- (void)processFilesAtURLs:(NSArray *)URLs
{
    for (NSURL *URL in URLs) {
        __block NSData *fileData = nil;     // set by readOperation
        __block NSData *fileHashData = nil; // set by computeOperation

        // Create operations to do the work for this URL
        NSBlockOperation *readOperation =
            [NSBlockOperation blockOperationWithBlock:^{
                fileData = CreateDataFromFileAtURL(URL);
            }];

        NSBlockOperation *computeOperation =
            [NSBlockOperation blockOperationWithBlock:^{
                fileHashData = CreateHashFromData(fileData);
                [fileData release]; // created in readOperation
            }];

        NSBlockOperation *writeOperation =
            [NSBlockOperation blockOperationWithBlock:^{
                WriteHashSidecarForFileAtURL(fileHashData, URL);
                [fileHashData release]; // created in computeOperation
            }];

        // Set up dependencies between operations
        [computeOperation addDependency:readOperation];
        [writeOperation addDependency:computeOperation];

        // Add operations to appropriate queues
        [FileProcessorIOQueue addOperation:readOperation];
        [FileProcessorComputeQueue addOperation:computeOperation];
        [FileProcessorIOQueue addOperation:writeOperation];
    }
}

@end
It's pretty straightforward; rather than deal with multiply-nested layers of sync/async as you would with the dispatch_* APIs, NSOperation allows you to define your units of work and your dependencies between them independently. For some situations this can be easier to understand and debug.
You have received excellent answers already, but I wanted to add a couple points. I have worked on projects that enumerate all the files in a file system and calculate MD5 and SHA1 hashes of each file (in addition to other processing). If you are doing something similar, where you are searching a large number of files and the files may have arbitrary content, then some points to consider:
As noted, you will be I/O bound. If you read more than one file simultaneously, you will have a negative impact on the performance of each calculation. Obviously, the goal of scheduling calculations in parallel is to keep the disk busy between files, but you may want to consider structuring your work differently. For example, set up one thread that enumerates and opens the files, and a second thread that gets open file handles from the first thread one at a time and processes them. The file system will cache catalog information, so the enumeration won't have a severe impact on reading the data, which will actually have to hit the disk.
If the files can be arbitrarily large, Chris' approach may not be practical since the entire content is read into memory.
If you have no other use for the data than calculating the hash, then I suggest disabling file system caching before reading the data.
If using NSFileHandles, a simple category method will do this per-file:
@interface NSFileHandle (NSFileHandleCaching)
- (BOOL)disableFileSystemCache;
@end

#include <fcntl.h>

@implementation NSFileHandle (NSFileHandleCaching)

- (BOOL)disableFileSystemCache {
    return (fcntl([self fileDescriptor], F_NOCACHE, 1) != -1);
}

@end
If the sidecar files are small, you may want to collect them in memory and write them out in batches to minimize disruption of the processing.
The file system (HFS, at least) stores file records for files in a directory sequentially, so traverse the file system breadth-first (i.e., process each file in a directory before entering subdirectories).
The above is just suggestions, of course. You will want to experiment and measure performance to confirm the actual impact.
libdispatch actually provides APIs explicitly for this! Check out dispatch_io; it will handle parallelizing IO when appropriate, and otherwise serializing it to avoid thrashing the disk.
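A hedged sketch of what that can look like (plain C with blocks; the hashing step is left as a comment since it depends on whatever digest code you already have):

#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <dispatch/dispatch.h>

void HashFileAtPath(const char *path, void (^completion)(void)) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        completion();
        return;
    }

    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_io_t channel = dispatch_io_create(DISPATCH_IO_STREAM, fd, queue, ^(int error) {
        close(fd); // cleanup handler runs once the channel is done with the fd
    });

    dispatch_io_read(channel, 0, SIZE_MAX, queue,
                     ^(bool done, dispatch_data_t data, int error) {
        if (data != NULL) {
            // feed `data` into your running DSA/SHA digest here
        }
        if (done) {
            dispatch_io_close(channel, 0);
            completion();
        }
    });
}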
The following link is to a BitBucket project I set up using NSOperation and Grand Central Dispatch to build a primitive file-integrity application.
https://bitbucket.org/torresj/hashar-cocoa
I hope it is of help/use.