iPhone use of mutexes with asynchronous URL requests - objective-c

My iPhone client involves a lot of asynchronous requests, many of which consistently modify static collections of dictionaries or arrays. As a result, with larger data structures that take longer to retrieve from the server, I commonly see the following error:
*** Terminating app due to uncaught exception 'NSGenericException', reason: '*** Collection <NSCFArray: 0x3777c0> was mutated while being enumerated.'
This typically means that two requests to the server came back with data that tries to modify the same collection. What I'm looking for is a tutorial/example/understanding of how to properly structure my code to avoid this detrimental error. I do believe the correct answer is mutexes, but I've never personally used them yet.
This is the result of making asynchronous HTTP requests with NSURLConnection and then using NSNotificationCenter as a means of delegation once requests are complete. When firing off requests that mutate the same collections, we get these collisions.

There are several ways to do this. The simplest in your case would probably be to use the @synchronized directive. This will allow you to create a mutex on the fly using an arbitrary object as the lock.
@synchronized(sStaticData) {
    // Do something with sStaticData
}
Another way would be to use the NSLock class. Create the lock you want to use, and then you will have a bit more flexibility when it comes to acquiring the mutex (with respect to blocking if the lock is unavailable, etc).
NSLock *lock = [[NSLock alloc] init];
// ... later ...
[lock lock];
// Do something with shared data
[lock unlock];
// Much later
[lock release], lock = nil;
If you decide to take either of these approaches, it will be necessary to acquire the lock for both reads and writes, since you are using NSMutableArray/Set/whatever as a data store. As you've seen, NSFastEnumeration prohibits mutation of the object being enumerated.
But I think another issue here is the choice of data structures in a multi-threaded environment. Is it strictly necessary to access your dictionaries/arrays from multiple threads? Or could the background threads coalesce the data they receive and then pass it to the main thread which would be the only thread allowed to access the data?

If it's possible that any data (including classes) will be accessed from two threads simultaneously, you must take steps to keep them synchronized.
Fortunately, Objective-C makes it ridiculously easy to do this using the @synchronized directive. It takes as an argument any Objective-C object. Any other thread that specifies the same object in a @synchronized section will halt until the first finishes.
-(void) doSomethingWith:(NSArray*)someArray
{
// @synchronized blocks any other thread that synchronizes on the same object
@synchronized(someArray)
{
// modify array
}
}
If you need to protect more than just one variable you should consider using a semaphore that represents access to that set of data.
// Get the semaphore.
id groupSemaphore = [Group semaphore];
@synchronized(groupSemaphore)
{
// Critical group code.
}

In response to the sStaticData and NSLock answer (comments are limited to 600 chars), don't you need to be very careful about creating the sStaticData and the NSLock objects in a thread safe way (to avoid the very unlikely scenario of multiple locks being created by different threads)?
I think there are two workarounds:
1) You can mandate those objects get created at the start of day in the single root thread.
2) Define a static object that is automatically created at the start of day to use as the lock, e.g. a static NSString can be created inline:
static NSString *sMyLock1 = @"Lock1";
Then I think you can safely use
@synchronized(sMyLock1)
{
// Stuff
}
Otherwise I think you'll always end up in a 'chicken and egg' situation with creating your locks in a thread safe way?
Of course, you are very unlikely to hit any of these problems as most iPhone apps run in a single thread.
I don't know about the [Group semaphore] suggestion earlier, that might also be a solution.

N.B. If you are using synchronisation, don't forget to add -fobjc-exceptions to your GCC flags:
Objective-C provides support for thread synchronization and exception handling, which are explained in this article and “Exception Handling.” To turn on support for these features, use the -fobjc-exceptions switch of the GNU Compiler Collection (GCC) version 3.3 and later.
http://developer.apple.com/library/ios/#documentation/cocoa/Conceptual/ObjectiveC/Articles/ocThreading.html

Use a copy of the object to modify it. Since you are trying to modify an array (collection) while someone else might also be modifying it (multiple access), creating a copy will work for you: create the copy, then enumerate over that copy.
NSMutableArray *originalArray = [NSMutableArray arrayWithArray:@[@"A", @"B", @"C"]];
NSMutableArray *arrayToEnumerate = [originalArray mutableCopy];
Now modify arrayToEnumerate. Since it is a copy of originalArray, not a reference to it, modifying it won't cause an issue.

There are other ways, if you don't want the overhead of locking, which has its cost. Instead of using a lock to protect a shared resource (in your case a dictionary or array), you can create a serial queue to serialize the tasks that access your critical-section code.
A queue doesn't pay the same penalty as a lock, because it doesn't require trapping into the kernel to acquire a mutex.
Simply put:
dispatch_async(serial_queue, ^{
    <#critical code#>
});
If you want the current execution to wait until the task completes, you can use:
dispatch_sync(serial_or_concurrent_queue, ^{
    <#critical code#>
});
Generally, if execution doesn't need to wait, asynchronous dispatch is the preferred way.

Related

Enforcing one-at-a-time access to a pointer from a primitive wrapper

I've read a fair amount on thread-safety, and have been using GCD to keep the math-heavy code off the main thread for a while now (I learned about it before NSOperation, and it seems to still be the easier option). However, I wonder if I could improve part of my code that currently uses a lock.
I have an Objective-C++ class that is a wrapper for a C++ vector. (Reasons: primitive floats are added constantly without knowing a limit beforehand, the container must be contiguous, and the reason for using a vector vs NSMutableData is "just cause" it's what I settled on; NSMutableData would still suffer from the same "expired" pointer when it goes to resize itself.)
The class has instance methods to add data points that are processed and added to the vector (vector.push_back). After new data is added I need to analyze it (by a different object). That processing happens on a background thread, and it uses a pointer directly to the vector. Currently the wrapper has a getter method that will first lock the instance (it suspends a local serial queue for the writes) and then return the pointer. For those that don't know, this is done because when the vector runs out of space, push_back causes the vector to move in memory to make room for the new entries, invalidating the pointer that was passed. Upon completion, the math-heavy code will call unlock on the wrapper, and the wrapper will resume the queue so the queued writes finish.
I don't see a way to pass the pointer along (for an unknown length of time) without using some type of lock or making a local copy, which would be prohibitively expensive.
Basically: is there a better way to pass a primitive pointer to a vector (or NSMutableData, for those getting hung up on the vector) such that, while the pointer is in use, any additions to the vector are queued, and when the consumer of the pointer is done, the vector is automatically "unlocked" and the write queue processed?
Current Implementation
Classes:
DataArray: a wrapper for a C++ vector
DataProcessor: Takes the most raw data and cleans it up before sending it to the 'DataArray'
DataAnalyzer: Takes the 'DataArray' pointer and does analysis on array
Worker: owns and initializes all 3, it also coordinates the actions (it does other stuff as well that is beyond the scope here). it is also a delegate to the processor and analyzer
What happens:
Worker is listening for new data from another class that handles external devices
When it receives a NSNotification with the data packet, it passes that onto DataProcessor by -(void)checkNewData:(NSArray*)data
DataProcessor, working in a background thread cleans up the data (and keeps partial data) and then tells DataArray to -(void)addRawData:(float)data (shown below)
DataArray then stores that data
When DataProcessor is done with the current chunk it tells Worker
When Worker is notified processing is done it tells DataAnalyzer to get started on the new data by -(void)analyzeAvailableData
DataAnalyzer does some prep work, including asking DataArray for the pointer by - (float*)dataPointer (shown below)
DataAnalyzer does a dispatch_async to a global thread and starts the heavy-lifting. It needs access to the dataPointer the entire time.
When done, it does a dispatch_async to the main thread to tell DataArray to unlock the array.
DataArray is accessed by other objects for read-only purposes as well, but those other reads are super quick.
Code snips from DataArray
-(void)addRawData:(float)data {
//quick sanity check
dispatch_async(addDataQueue, ^{
rawVector.push_back(data);
});
}
- (float*)dataPointer {
[self lock];
return &rawVector[0];
}
- (void)lock {
if (!locked) {
locked = YES;
dispatch_suspend(addDataQueue);
}
}
- (void)unlock {
if (locked) {
dispatch_resume(addDataQueue);
locked = NO;
}
}
Code snip from DataAnalyzer
-(void)analyzeAvailableData {
//do some prep work
const float *rawArray = [self.dataArray dataPointer];
dispatch_async(global_queue, ^{
//lots of analysis
//done
dispatch_async(main_queue, ^{
//tell `Worker` analysis is done
[self.dataArray unlock];
    });
});
}
If you have a shared resource (your vector) which will be concurrently accessed through reads and writes from different tasks, you may associate a dedicated dispatch queue with this resource, where these tasks will exclusively run.
That is, every access to this resource (read or write) will be executed on that dispatch queue exclusively. Let's name this queue "sync_queue".
This "sync_queue" may be a serial queue or a concurrent queue.
If it's a serial queue, it should be immediately obvious that all accesses are thread-safe.
If it's a concurrent queue, you can allow read accesses to happen simultaneously; that is, you simply call dispatch_async(sync_queue, block):
dispatch_async(sync_queue, ^{
if (_shared_value == 0) {
dispatch_async(otherQueue, block);
}
});
If that read access "moves" the value to a call-site executing on a different execution context, you should use the synchronous version:
__block int x;
dispatch_sync(sync_queue, ^{
x = _shared_value;
});
return x;
Any write access requires exclusive access to the resource. Having a concurrent queue, you accomplish this through using a barrier:
dispatch_barrier_async(sync_queue, ^{
_shared_value = 0;
dispatch_async(mainQueue, ^{
NSLog(@"value %d", _shared_value);
});
});
It really depends what you're doing, most of the time I drop back to the main queue (or a specifically designated queue) using dispatch_async() or dispatch_sync().
Async is obviously better, if you can do it.
It's going to depend on your specific use case but there are times when dispatch_async/dispatch_sync is multiple orders of magnitude faster than creating a lock.
The entire point of Grand Central Dispatch (and NSOperationQueue) is to take away many of the bottlenecks found in traditional threaded programming, including locks.
Regarding your comment about NSOperation being harder to use... that's true, I don't use it very often either. But it does have useful features, for example if you need to be able to terminate a task half way through execution or before it's even started executing, NSOperation is the way to go.
There is a simple way to get what you need, even without locking. The idea is that you have either shared, immutable data or exclusive, mutable data. The reason you don't need a lock for shared, immutable data is that it is simply read-only, so no write races can occur.
All you need to do is to switch between both depending on what you currently need:
When you are adding samples to your storage, you need exclusive access to the data. If you already have a "working copy" of the data, you can just extend it as you need. If you only have a reference to the shared data, you create a working copy which you then keep for later exclusive access.
When you want to evaluate your samples, you need read-only access to the shared data. If you already have a shared copy, you just use that. If you only have an exclusive-access working copy, you convert that to a shared one.
Both of these operations are performed on demand. Assuming C++, you could use std::shared_ptr<vector const> for the shared, immutable data and std::unique_ptr<vector> for the exclusive-access, mutable data. For the older C++ standard those would be boost::shared_ptr<..> and std::auto_ptr<..> instead. Note the use of const in the shared version, and that you can convert from the exclusive version to the shared one easily, but the inverse is not possible: to get a mutable vector from an immutable one, you have to copy.
Note that I'm assuming that copying the sample data is not possible and doesn't explode the complexity of your algorithm. If that doesn't work, your approach with the scrap space that is used while the background operations are in progress is probably the best way to go. You can automate a few things using a dedicated structure that works similar to a smart pointer though.

Thoughts in accessing read only objects from different threads

Based on a previous discussion I had on SO (see Doubts on concurrency with objects that can be used multiple times like formatters), here I'm asking a more theoretical question about objects that are created once during the application lifetime (and never modified, hence read-only) and can be accessed from different threads. A simple use case is the Core Data one. Formatters can be used on different threads (main thread, importing thread, etc.).
NSFormatters, for example, are extremely expensive to create. Based on that, they can be created once and then reused. A typical pattern that can be followed (also highlighted by @mattt in his NSFormatter article) is the following.
+ (NSNumberFormatter *)numberFormatter {
static NSNumberFormatter *_numberFormatter = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
_numberFormatter = [[NSNumberFormatter alloc] init];
[_numberFormatter setNumberStyle:NSNumberFormatterDecimalStyle];
});
return _numberFormatter;
}
Even if I'm sure this is a very good approach to follow (a sort of read-only/immutable object is created), formatters are not thread safe, so using them this way could be dangerous. I found a discussion of the topic in NSDateFormatter crashes when used from different threads, where the author noticed that a crash could happen.
NSDateFormatters are not thread safe; there was a background thread attempting to use the same formatter at the same time (hence the randomness).
So, what could be the problem in accessing a formatter from different threads? Any secure pattern to follow?
Specific answer for formatters:
Prior to iOS 7/OSX 10.9, even read-only access to formatters was not thread-safe. ICU has a ton of lazy computation that it does in response to requests, and that can crash or produce incorrect results if done concurrently.
In iOS 7/OSX 10.9, NSDateFormatter and NSNumberFormatter use locks internally to serialize access to the underlying ICU code, preventing this issue.
General answer:
Read-only access/immutable objects are indeed generally thread-safe, but it's difficult to impossible to tell which things actually are internally immutable, and which ones are merely presenting an immutable interface to the outside world.
In your own code, you can know this. When using other people's classes, you'll have to rely on what they document about how to use their classes safely.
(edit, since an example of serializing access to formatters was requested)
// in the dispatch_once where you create the formatter
dispatch_queue_t formatterQueue = dispatch_queue_create("date formatter queue", 0);
// where you use the formatter
dispatch_sync(formatterQueue, ^{ (do whatever you wanted to do with the formatter) });
I'm asking a more theoretical question about objects that during the application lifetime are created once (and never modified, hence read-only)
That is not technically correct: for an object to be truly read-only, it must also be immutable. For example, one could create an NSMutableArray once, but since the object allows changes after creation, it cannot be considered read-only, and is therefore unsafe for concurrent use without additional synchronization.
Moreover, logically immutable objects can have mutating implementations, making them non-thread-safe. For example, objects that perform lazy initialization on first use, or that cache state in their instance variables without synchronization, are not thread-safe. NSDateFormatter appears to be one such class: it appears you could get a crash even when you are not calling methods that mutate the state of the class.
One solution is thread-local storage: rather than creating one NSDateFormatter per application, you create one per thread. That can save some CPU cycles as well: one report mentions shaving 5-10% off start-up time by using this simple trick.

Locking an object from being accessed by multiple threads - Objective-C

I have a question regarding thread safety in Objective-C. I've read a couple of other answers, some of the Apple documentation, and still have some doubts regarding this, so thought I'd ask my own question.
My question is three fold:
Suppose I have an array, NSMutableArray *myAwesomeArray;
Fold 1:
Now correct me if I'm mistaken, but from what I understand, using @synchronized(myAwesomeArray){...} will prevent two threads from accessing the same block of code. So, basically, if I have something like:
-(void)doSomething {
@synchronized(myAwesomeArray) {
//some read/write operation on myAwesomeArray
}
}
then, if two threads access the same method at the same time, that block of code will be thread safe. I'm guessing I've understood this part properly.
Fold 2:
What do I do if myAwesomeArray is being accessed by multiple threads from different methods?
If I have something like:
- (void)readFromArrayAccessedByThreadOne {
//thread 1 reads from myAwesomeArray
}
- (void)writeToArrayAccessedByThreadTwo {
//thread 2 writes to myAwesomeArray
}
Now, both the methods are accessed by two different threads at the same time. How do I ensure that myAwesomeArray won't have problems? Do I use something like NSLock or NSRecursiveLock?
Fold 3:
Now, in the above two cases, myAwesomeArray was an iVar in memory. What if I have a database file, that I don't always keep in memory. I create a databaseManagerInstance whenever I want to perform database operations, and release it once I'm done. Thus, basically, different classes can access the database. Each class creates its own instance of DatabaseManger, but basically, they are all using the same, single database file. How do I ensure that data is not corrupted due to race conditions in such a situation?
This will help me clear out some of my fundamentals.
Fold 1
Generally your understanding of what @synchronized does is correct. However, technically, it doesn't make any code "thread-safe". It prevents different threads from acquiring the same lock at the same time, but you need to ensure that you always use the same synchronization token when performing critical sections. If you don't, you can still find yourself in the situation where two threads perform their critical sections at the same time. Check the docs.
Fold 2
Most people would probably advise you to use NSRecursiveLock. If I were you, I'd use GCD. Here is a great document showing how to migrate from thread programming to GCD programming, I think this approach to the problem is a lot better than the one based on NSLock. In a nutshell, you create a serial queue and dispatch your tasks into that queue. This way you ensure that your critical sections are handled serially, so there is only one critical section performed at any given time.
Fold 3
This is the same as Fold 2, only more specific. A database is a resource; in many ways it's the same as the array or any other thing. If you want to see the GCD-based approach in a database-programming context, take a look at the fmdb implementation. It does exactly what I described in Fold 2.
As a side note to Fold 3, I don't think that instantiating DatabaseManager each time you want to use the database and then releasing it is the correct approach. I think you should create one single database connection and retain it through your application session. This way it's easier to manage it. Again, fmdb is a great example on how this can be achieved.
Edit
If you don't want to use GCD, then yes, you will need some kind of locking mechanism, and yes, NSRecursiveLock will prevent deadlocks if you use recursion in your methods, so it's a good choice (it is what @synchronized uses). However, there may be one catch: if many threads can wait on the same resource and the order in which they get access matters, then NSRecursiveLock is not enough. You may still manage this situation with NSCondition, but trust me, you will save a lot of time using GCD in this case. If the order of the threads is not relevant, you are safe with locks.
As shown in WWDC 2016 Session 720, Concurrent Programming With GCD in Swift 3, you should use a queue:
class MyObject {
    private var internalState: Int = 0
    private let internalQueue = DispatchQueue(label: "internal")
    var state: Int {
        get {
            return internalQueue.sync { internalState }
        }
        set (newValue) {
            internalQueue.sync { internalState = newValue }
        }
    }
}
Subclass NSMutableArray to provide locking for the accessor (read and write) methods. Something like:
@interface MySafeMutableArray : NSMutableArray { NSRecursiveLock *lock; } @end

@implementation MySafeMutableArray
- (instancetype)init {
    if ((self = [super init])) { lock = [[NSRecursiveLock alloc] init]; }
    return self;
}
- (void)addObject:(id)obj {
    [lock lock];
    [super addObject:obj];
    [lock unlock];
}
// ...
@end
This approach encapsulates the locking as part of the array. Users don't need to change their calls (but may need to be aware that they could block/wait for access if the access is time critical). A significant advantage to this approach is that if you decide that you prefer not to use locks you can re-implement MySafeMutableArray to use dispatch queues - or whatever is best for your specific problem. For example, you could implement addObject as:
- (void)addObject:(id)obj {
dispatch_sync(self.queue, ^{ [super addObject:obj]; });
}
Note: if using locks, you'll surely need NSRecursiveLock, not NSLock, because you don't know whether the Objective-C implementations of addObject, etc., are themselves recursive.

use NSOperationQueue as a LIFO stack?

I need to do a series of URL calls (fetching WMS tiles). I want to use a LIFO stack so the newest URL call is the most important: I want to display the tile on the screen now, not a tile that was on the screen 5 seconds ago, after a pan.
I can create my own stack from an NSMutableArray, but I'm wondering if an NSOperationQueue can be used as a LIFO stack?
You can set the priority of operations in an operation queue using -[NSOperation setQueuePriority:]. You'll have to rejigger the priorities of existing operations each time you add an operation, but you can achieve something like what you're looking for. You'd essentially demote all of the old ones and give the newest one highest priority.
Sadly I think NSOperationQueues are, as the name suggests, only usable as queues — not as stacks. To avoid having to do a whole bunch of manual marshalling of tasks, probably the easiest thing is to treat your queues as though they were immutable and mutate by copying. E.g.
- (NSOperationQueue *)addOperation:(NSOperation *)operation toHeadOfQueue:(NSOperationQueue *)queue
{
// suspending a queue prevents it from issuing new operations; it doesn't
// pause any already ongoing operations. So we do this to prevent a race
// condition as we copy operations from the queue
queue.suspended = YES;
// create a new queue
NSOperationQueue *mutatedQueue = [[NSOperationQueue alloc] init];
// add the new operation at the head
[mutatedQueue addOperation:operation];
// copy in all the preexisting operations that haven't yet started
for(NSOperation *existingOperation in [queue operations])
{
    if(!existingOperation.isExecuting)
        [mutatedQueue addOperation:existingOperation];
}
// the caller should now ensure the original queue is disposed of...
return mutatedQueue;
}
/* ... elsewhere ... */
NSOperationQueue *newQueue = [self addOperation:newOperation toHeadOfQueue:operationQueue];
[operationQueue release];
operationQueue = newQueue;
It seems at present that releasing a queue that is still working (as will happen to the old operation queue) doesn't cause it to cancel all operations, but that's not documented behaviour so probably not trustworthy. If you want to be really safe, key-value observe the operationCount property on the old queue and release it when it goes to zero.
I'm not sure if you're still looking for a solution, but the same problem has been bugging me for a while, so I went ahead and implemented an operation stack here: https://github.com/cbrauchli/CBOperationStack. I've used it with a few hundred download operations and it has held up well.
Sadly, you cannot do that without running into some tricky issues, because:
Important You should always configure dependencies before running your operations or adding them to an operation queue. Dependencies added afterward may not prevent a given operation object from running. (From: Concurrency Programming Guide: Configuring Interoperation Dependencies)
Take a look at this related question: AFURLConnectionOperation 'start' method gets called before it is ready and never gets called again afterwards
Found a neat implementation of stack/LIFO features on top of NSOperationQueue. It can be used as a category that extends NSOperationQueue or an NSOperationQueue LIFO subclass.
https://github.com/nicklockwood/NSOperationStack
The easiest way would be to separate your operations from the data you will be processing, so you can add operations to NSOperationQueue as usual and then take data from a stack or any other data structure you need.
var tasks: [MyTask]
...
func startOperation() {
myQueue.addOperation {
guard let task = tasks.last else {
return
}
tasks.removeLast()
task.perform()
}
}
Now, obviously, you might need to ensure that tasks collection can be used concurrently, but it's a much more common problem with lots of pre-made solutions than hacking your way around NSOperationQueue execution order.

Changing the locking object inside #synchronized section

Can I do any of the following? Will they properly lock/unlock the same object? Why or why not? Assume there are many identical threads using global variable "obj", which was initialized before all threads started.
1.
@synchronized(obj) {
[obj release];
obj = nil;
}
2.
@synchronized(obj) {
obj = [[NSObject new] autorelease];
}
Short answer: no, they won't properly lock/unlock, and such approaches should be avoided.
My first question is why you'd want to do something like this, since these approaches nullify the purpose and benefits of using a @synchronized block in the first place.
In your second example, once a thread changes the value of obj, every subsequent thread that reaches the @synchronized block will synchronize on the new object, not the original object. For N threads, you'd be explicitly creating N autoreleased objects, and the runtime may create up to N recursive locks associated with those objects. Swapping out the object on which you synchronize within the critical section is a fundamental no-no of thread-safe concurrency. Don't do it. Ever. If multiple threads can safely access a block concurrently, just omit the @synchronized entirely.
In your first example, the results may be undefined, and certainly not what you want either. If the runtime only uses the object pointer to find the associated lock, the code may run fine, but synchronizing on nil has no perceptible effect in my simple tests, so again you're using @synchronized in a pointless way, since it offers no protection whatsoever.
I'm honestly not trying to be harsh, since I figure you're probably just curious about the construct. I'm just wording this strongly to (hopefully) prevent you and others from writing code that is fatally flawed, especially if under the assumption that it synchronizes properly. Good luck!