Locking an object from being accessed by multiple threads - Objective-C

I have a question regarding thread safety in Objective-C. I've read a couple of other answers, some of the Apple documentation, and still have some doubts regarding this, so thought I'd ask my own question.
My question is threefold:
Suppose I have an array, NSMutableArray *myAwesomeArray;
Fold 1:
Now correct me if I'm mistaken, but from what I understand, using @synchronized(myAwesomeArray) {...} will prevent two threads from accessing the same block of code. So, basically, if I have something like:
- (void)doSomething {
    @synchronized(myAwesomeArray) {
        // some read/write operation on myAwesomeArray
    }
}
then, if two threads access the same method at the same time, that block of code will be thread safe. I'm guessing I've understood this part properly.
Fold 2:
What do I do if myAwesomeArray is being accessed by multiple threads from different methods?
If I have something like:
- (void)readFromArrayAccessedByThreadOne {
    // thread 1 reads from myAwesomeArray
}
- (void)writeToArrayAccessedByThreadTwo {
    // thread 2 writes to myAwesomeArray
}
Now, both the methods are accessed by two different threads at the same time. How do I ensure that myAwesomeArray won't have problems? Do I use something like NSLock or NSRecursiveLock?
Fold 3:
Now, in the above two cases, myAwesomeArray was an ivar in memory. What if I have a database file that I don't always keep in memory? I create a databaseManagerInstance whenever I want to perform database operations, and release it once I'm done. Thus, basically, different classes can access the database. Each class creates its own instance of DatabaseManager, but basically, they are all using the same, single database file. How do I ensure that data is not corrupted due to race conditions in such a situation?
This will help me clear out some of my fundamentals.

Fold 1
Generally your understanding of what @synchronized does is correct. However, technically, it doesn't make any code "thread-safe". It prevents different threads from acquiring the same lock at the same time; however, you need to ensure that you always use the same synchronization token when performing critical sections. If you don't, you can still find yourself in the situation where two threads perform their critical sections at the same time. Check the docs.
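For instance, both of these methods lock on the same token, so their critical sections exclude each other; if one of them synchronized on a different object (say, self) instead, the two bodies could run at the same time. A minimal sketch (the method names are illustrative):
// Both methods use the same token (myAwesomeArray), so only one of
// these critical sections can run at any given moment.
- (void)appendItem:(id)item {
    @synchronized(myAwesomeArray) {
        [myAwesomeArray addObject:item];
    }
}
- (void)removeLastItem {
    @synchronized(myAwesomeArray) {
        [myAwesomeArray removeLastObject];
    }
}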
Fold 2
Most people would probably advise you to use NSRecursiveLock. If I were you, I'd use GCD. Here is a great document showing how to migrate from thread programming to GCD programming, I think this approach to the problem is a lot better than the one based on NSLock. In a nutshell, you create a serial queue and dispatch your tasks into that queue. This way you ensure that your critical sections are handled serially, so there is only one critical section performed at any given time.
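Applied to Fold 2, a minimal sketch of that serial-queue approach could look like this (the queue label and ivar name are mine):
// Created once, e.g. in init:
// _arrayQueue = dispatch_queue_create("com.example.arrayQueue", DISPATCH_QUEUE_SERIAL);

- (void)readFromArrayAccessedByThreadOne {
    dispatch_sync(_arrayQueue, ^{
        // thread 1 reads from myAwesomeArray
        NSUInteger count = [myAwesomeArray count];
        NSLog(@"%lu items", (unsigned long)count);
    });
}
- (void)writeToArrayAccessedByThreadTwo {
    dispatch_async(_arrayQueue, ^{
        // thread 2 writes to myAwesomeArray
        [myAwesomeArray addObject:@"newValue"];
    });
}
Because the queue is serial, the read and the write can never overlap, no matter which threads call these methods.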
Fold 3
This is the same as Fold 2, only more specific. A database is a resource; in many ways it's the same as the array or any other shared thing. If you want to see the GCD-based approach in a database programming context, take a look at the fmdb implementation. It does exactly what I described in Fold 2.
As a side note to Fold 3, I don't think that instantiating DatabaseManager each time you want to use the database and then releasing it is the correct approach. I think you should create one single database connection and retain it through your application session. This way it's easier to manage. Again, fmdb is a great example of how this can be achieved.
Edit
If you don't want to use GCD then yes, you will need to use some kind of locking mechanism, and yes, NSRecursiveLock will prevent deadlocks if you use recursion in your methods, so it's a good choice (it is what @synchronized uses). However, there may be one catch. If it's possible that many threads will wait for the same resource and the order in which they get access is relevant, then NSRecursiveLock is not enough. You may still manage this situation with NSCondition, but trust me, you will save a lot of time using GCD in this case. If the order of the threads is not relevant, you are safe with locks.
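For reference, the lock-based version of Fold 2 uses one lock per resource; a minimal sketch, assuming a single NSRecursiveLock ivar shared by every method that touches the array:
// Created once, e.g. in init:
// _arrayLock = [[NSRecursiveLock alloc] init];

- (void)readFromArrayAccessedByThreadOne {
    [_arrayLock lock];
    // thread 1 reads from myAwesomeArray
    [_arrayLock unlock];
}
- (void)writeToArrayAccessedByThreadTwo {
    [_arrayLock lock];
    // thread 2 writes to myAwesomeArray
    [_arrayLock unlock];
}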

As shown in WWDC 2016 Session 720, Concurrent Programming with GCD in Swift 3, you should use a queue:
class MyObject {
    private var internalState: Int = 0
    private let internalQueue = DispatchQueue(label: "internalQueue")

    var state: Int {
        get {
            return internalQueue.sync { internalState }
        }
        set (newValue) {
            internalQueue.sync { internalState = newValue }
        }
    }
}

Subclass NSMutableArray to provide locking for the accessor (read and write) methods. Something like:
@interface MySafeMutableArray : NSMutableArray {
    NSRecursiveLock *lock; // create in init: lock = [[NSRecursiveLock alloc] init];
}
@end

@implementation MySafeMutableArray
- (void)addObject:(id)obj {
    [lock lock];
    [super addObject:obj];
    [lock unlock];
}
// ... similarly for the other read/write overrides
@end
This approach encapsulates the locking as part of the array. Users don't need to change their calls (but may need to be aware that they could block/wait for access if the access is time critical). A significant advantage to this approach is that if you decide that you prefer not to use locks you can re-implement MySafeMutableArray to use dispatch queues - or whatever is best for your specific problem. For example, you could implement addObject as:
- (void)addObject:(id)obj {
    dispatch_sync(self.queue, ^{ [super addObject:obj]; });
}
Note: if using locks, you'll surely need NSRecursiveLock, not NSLock, because you don't know whether the Objective-C implementations of addObject, etc. are themselves recursive.

Related

Enforcing one-at-a-time access to a pointer from a primitive wrapper

I've read a fair amount on thread-safety, and have been using GCD to keep the math-heavy code off the main thread for a while now (I learned about it before NSOperation, and it seems to still be the easier option). However, I wonder if I could improve part of my code that currently uses a lock.
I have an Objective-C++ class that is a wrapper for a C++ vector. (Reasons: primitive floats are added constantly without knowing a limit beforehand, the container must be contiguous, and the reason for using a vector vs. NSMutableData is "just cause" it's what I settled on; NSMutableData would still suffer from the same "expired" pointer when it goes to resize itself.)
The class has instance methods to add data points that are processed and added to the vector (vector.push_back). After new data is added I need to analyze it (by a different object). That processing happens on a background thread, and it uses a pointer directly to the vector. Currently the wrapper has a getter method that will first lock the instance (it suspends a local serial queue for the writes) and then return the pointer. For those that don't know, this is done because when the vector runs out of space, push_back causes the vector to move in memory to make room for the new entries, invalidating the pointer that was passed. Upon completion, the math-heavy code will call unlock on the wrapper, and the wrapper will resume the queue so the queued writes finish.
I don't see a way to pass the pointer along (for an unknown length of time) without using some type of lock or making a local copy, which would be prohibitively expensive.
Basically: is there a better way to pass a primitive pointer to a vector (or NSMutableData, for those getting hung up on the vector) so that while the pointer is being used, any additions to the vector are queued, and when the consumer of the pointer is done, the vector is automatically "unlocked" and the write queue is processed?
Current Implementation
Classes:
DataArray: a wrapper for a C++ vector
DataProcessor: Takes the most raw data and cleans it up before sending it to the 'DataArray'
DataAnalyzer: Takes the 'DataArray' pointer and does analysis on array
Worker: owns and initializes all 3; it also coordinates the actions (it does other stuff as well that is beyond the scope here). It is also a delegate to the processor and analyzer
What happens:
Worker is listening for new data from another class that handles external devices
When it receives a NSNotification with the data packet, it passes that onto DataProcessor by -(void)checkNewData:(NSArray*)data
DataProcessor, working in a background thread cleans up the data (and keeps partial data) and then tells DataArray to -(void)addRawData:(float)data (shown below)
DataArray then stores that data
When DataProcessor is done with the current chunk it tells Worker
When Worker is notified processing is done it tells DataAnalyzer to get started on the new data by -(void)analyzeAvailableData
DataAnalyzer does some prep work, including asking DataArray for the pointer by - (float*)dataPointer (shown below)
DataAnalyzer does a dispatch_async to a global thread and starts the heavy-lifting. It needs access to the dataPointer the entire time.
When done, it does a dispatch_async to the main thread to tell DataArray to unlock the array.
DataArray is accessed by other objects for read-only purposes as well, but those other reads are super quick.
Code snips from DataArray
- (void)addRawData:(float)data {
    //quick sanity check
    dispatch_async(addDataQueue, ^{
        rawVector.push_back(data);
    });
}
- (float*)dataPointer {
    [self lock];
    return &rawVector[0];
}
- (void)lock {
    if (!locked) {
        locked = YES;
        dispatch_suspend(addDataQueue);
    }
}
- (void)unlock {
    if (locked) {
        dispatch_resume(addDataQueue);
        locked = NO;
    }
}
Code snip from DataAnalyzer
- (void)analyzeAvailableData {
    //do some prep work
    const float *rawArray = [self.dataArray dataPointer];
    dispatch_async(global_queue, ^{
        //lots of analysis
        //done
        dispatch_async(main_queue, ^{
            //tell `Worker` analysis is done
            [self.dataArray unlock];
        });
    });
}
If you have a shared resource (your vector) which will be concurrently accessed through reads and writes from different tasks, you may associate a dedicated dispatch queue with this resource where these tasks will exclusively run.
That is, every access to this resource (read or write) will be executed on that dispatch queue exclusively. Let's name this queue "sync_queue".
This "sync_queue" may be a serial queue or a concurrent queue.
If it's a serial queue, it should be immediately obvious that all accesses are thread-safe.
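For example (a sketch; the label is arbitrary):
// A serial queue created once for this resource:
dispatch_queue_t sync_queue = dispatch_queue_create("sync_queue", DISPATCH_QUEUE_SERIAL);

// Every access, read or write, is enqueued on it; no two blocks ever overlap:
dispatch_async(sync_queue, ^{
    _shared_value += 1;
});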
If it's a concurrent queue, you can allow read accesses to happen simultaneously, that is you simply call dispatch_async(sync_queue, block):
dispatch_async(sync_queue, ^{
    if (_shared_value == 0) {
        dispatch_async(otherQueue, block);
    }
});
If that read access "moves" the value to a call-site executing on a different execution context, you should use the synchronous version:
__block int x;
dispatch_sync(sync_queue, ^{
    x = _shared_value;
});
return x;
Any write access requires exclusive access to the resource. Having a concurrent queue, you accomplish this through using a barrier:
dispatch_barrier_async(sync_queue, ^{
    _shared_value = 0;
    dispatch_async(mainQueue, ^{
        NSLog(@"value %d", _shared_value);
    });
});
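For completeness, the concurrent flavor of "sync_queue" used above would be created once, like this:
// Reads run simultaneously; writes use dispatch_barrier_async for exclusivity.
dispatch_queue_t sync_queue = dispatch_queue_create("sync_queue", DISPATCH_QUEUE_CONCURRENT);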
It really depends on what you're doing; most of the time I drop back to the main queue (or a specifically designated queue) using dispatch_async() or dispatch_sync().
Async is obviously better, if you can do it.
It's going to depend on your specific use case but there are times when dispatch_async/dispatch_sync is multiple orders of magnitude faster than creating a lock.
The entire point of Grand Central Dispatch (and NSOperationQueue) is to take away many of the bottlenecks found in traditional threaded programming, including locks.
Regarding your comment about NSOperation being harder to use... that's true, I don't use it very often either. But it does have useful features, for example if you need to be able to terminate a task half way through execution or before it's even started executing, NSOperation is the way to go.
There is a simple way to get what you need even without locking. The idea is that you have either shared, immutable data or exclusive, mutable data. The reason why you don't need a lock for shared, immutable data is that it is simply read-only, so no race conditions during writing can occur.
All you need to do is to switch between both depending on what you currently need:
When you are adding samples to your storage, you need exclusive access to the data. If you already have a "working copy" of the data, you can just extend it as you need. If you only have a reference to the shared data, you create a working copy which you then keep for later exclusive access.
When you want to evaluate your samples, you need read-only access to the shared data. If you already have a shared copy, you just use that. If you only have an exclusive-access working copy, you convert that to a shared one.
Both of these operations are performed on demand. Assuming C++, you could use std::shared_ptr<vector const> for the shared, immutable data and std::unique_ptr<vector> for the exclusive-access, mutable data. For the older C++ standard those would be boost::shared_ptr<..> and std::auto_ptr<..> instead. Note the use of const in the shared version, and that you can convert from the exclusive pointer to the shared one easily, but the inverse is not possible; in order to get a mutable vector from an immutable one, you have to copy.
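To make that concrete, here is a minimal Objective-C++ sketch of the idea, assuming all adds and publishes happen on a single writer thread (or are otherwise serialized); the class and method names are mine:
#include <memory>
#include <vector>

@interface SampleStore : NSObject
- (void)addSample:(float)sample;                       // exclusive, mutable access
- (std::shared_ptr<const std::vector<float>>)snapshot; // shared, immutable access
@end

@implementation SampleStore {
    std::shared_ptr<const std::vector<float>> _shared; // read-only, safe to hand out
    std::unique_ptr<std::vector<float>> _working;      // exclusive working copy
}

- (void)addSample:(float)sample {
    if (!_working) {
        // Immutable -> mutable requires a copy.
        _working = _shared ? std::make_unique<std::vector<float>>(*_shared)
                           : std::make_unique<std::vector<float>>();
    }
    _working->push_back(sample);
}

- (std::shared_ptr<const std::vector<float>>)snapshot {
    if (_working) {
        // Mutable -> immutable is a cheap ownership transfer.
        _shared = std::shared_ptr<const std::vector<float>>(std::move(_working));
    }
    return _shared; // readers on other threads can hold this safely
}
@end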
Note that I'm assuming that copying the sample data is possible and doesn't explode the complexity of your algorithm. If that doesn't work, your approach with the scrap space that is used while the background operations are in progress is probably the best way to go. You can automate a few things using a dedicated structure that works similarly to a smart pointer, though.

Multi-threaded Objective-C accessors: GCD vs locks

I'm debating whether or not to move to a GCD-based pattern for multi-threaded accessors. I've been using custom lock-based synchronization in accessors for years, but I've found some info (Intro to GCD) and there seems to be pros of a GCD-based approach. I'm hoping to start a dialog here to help myself and others weigh the decision.
The pattern looks like:
- (id)something
{
    __block id localSomething;
    dispatch_sync(queue, ^{
        localSomething = [something retain];
    });
    return [localSomething autorelease];
}
- (void)setSomething:(id)newSomething
{
    dispatch_async(queue, ^{
        if (newSomething != something)
        {
            [something release];
            something = [newSomething retain];
            [self updateSomethingCaches];
        }
    });
}
On the pro side: you get the benefit of, possibly, non-blocking write access; lower overhead than locks (maybe?); safety from forgetting to unlock before returning from critical code sections; others?
Cons: Exception handling is non-existent so you have to code this into every block in which you might need it.
Is this pattern potentially the recommended method of writing multithreaded accessors?
Are there standard approaches for creating dispatch queues for this purpose? In other words, best practices for trading off granularity? With locks, for example, locking on each attribute is finer-grained than locking on the entire object. With dispatch queues, I could imagine that creating a single queue for all objects would create performance bottlenecks, so are per-object queues appropriate? Obviously, the answer is highly dependent on the specific application, but are there known performance tradeoffs to help gauge the feasibility of the approach?
Any information / insight would be appreciated.
Is this pattern potentially the recommended method of writing multithreaded accessors?
I guess you wrote that with a serial queue in mind, but there is no reason for it. Consider this:
dispatch_queue_t queue = dispatch_queue_create("com.example", DISPATCH_QUEUE_CONCURRENT);

// same thing as your example
- (NSString*)something {
    __block NSString *localSomething;
    dispatch_sync(queue, ^{
        localSomething = _something;
    });
    return localSomething;
}
- (void)setSomething:(NSString*)something {
    dispatch_barrier_async(queue, ^{
        _something = something;
    });
}
It reads concurrently but uses a dispatch barrier to disable concurrency while the write is happening. A big advantage of GCD is that it allows concurrent reads instead of locking the whole object the way @property (atomic) does.
Both asyncs (dispatch_async, dispatch_barrier_async) are faster from the client's point of view, but slower to execute than a sync because they have to copy the block, and when the block is such a small task, the time it takes to copy becomes meaningful. I'd rather have the client return fast, so I'm OK with it.

Risks of using performSelectorInBackground?

I have been doing some testing with ObjectiveResource (iOS->Rails bridge). Things seem to work, but the library is synchronous (or maybe not, but the mailing list that supports it is a mess).
I'm wondering what the pitfalls are to just running all calls in a performSelectorInBackground... in small tests it seems to work fine, but that's the case with many things that are wrong.
The only caveat I've noticed is that you have to create an Autorelease Pool in the method called by performSelectorInBackground (and then you should only call drain and not release?).
performSelectorInBackground: uses threads behind the scenes, and the big thing with threads is that any piece of code touched by more than one is a minefield for race conditions and other subtle bugs. This obviously means drawing to the screen is off-limits outside the main thread. But there are a lot of other libraries that are also not threadsafe, and any code using those is also tainted.
Basically, thread-safety is something you have to intentionally put in your code or it's probably not there. ObjectiveResource doesn't make any claims to it, so already I would be nervous. Glancing at the source, it looks like it mainly uses the Foundation URL loading machinery, which is threadsafe IIRC. But the ObjectiveResource code itself is not. Just at a glance, all of the class methods use static variables, which means they're all subject to race conditions if you performSelectorInBackground: more than once with code that uses them.
It looks like the 1.1 branch on their Github has explicit support for async through a ConnectionManager class. Probably better off using that (though this is essentially unmaintained code, so caveat emptor).
So are you actually experiencing any issues? Or are you just anticipating them?
Running on a background thread shouldn't give you any issues, unless you try to update a UI element from that same background thread. Be sure to forward any UI-related activities to the main thread. For example (pseudo):
- (void)viewWillAppear:(BOOL)animated {
    [self performSelectorInBackground:@selector(refreshTableView)];
    [super viewWillAppear:animated];
}
- (void)refreshTableView {
    // Where _listOfObjects is used to populate your UITableView
    @synchronized(self) {
        self._listOfObjects = [MyDataType findAllRemote];
    }
    [self.tableView performSelectorOnMainThread:@selector(reloadData) withObject:nil waitUntilDone:YES];
}
Note also (as above) that if you are changing the value of any instance variables on the background thread, it's important that you synchronize on self to prevent any other threads (like the main thread) from accessing objects in the _listOfObjects array while it is being updated or set. (Or you may "get" an incomplete object.)
I'm not 100% positive (comments are welcome), but I believe that if you declare the _listOfObjects property as atomic, you won't need to worry about the synchronized block. Though, you would need the synchronized block regardless of the @property declaration if, instead of reassigning the value of the property, you were instead making changes to a single, persistent instance. (e.g. adding/removing objects from a static NSMutableArray.)
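To spell out the distinction: atomic only makes the getter and setter themselves atomic, so it covers reassigning the property, not mutating the object it points to. A sketch (mutableList is a hypothetical mutable ivar):
// Atomic protects reads/writes of the pointer itself:
@property (atomic, retain) NSArray *listOfObjects;

// Safe without @synchronized, because the whole array is swapped in one step:
self.listOfObjects = [MyDataType findAllRemote];

// Still needs synchronization, because the object behind the pointer mutates:
@synchronized(self) {
    [self.mutableList addObject:newObject];
}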

Changing the locking object inside #synchronized section

Can I do any of the following? Will they properly lock/unlock the same object? Why or why not? Assume there are many identical threads using global variable "obj", which was initialized before all threads started.
1.
@synchronized(obj) {
    [obj release];
    obj = nil;
}
2.
@synchronized(obj) {
    obj = [[NSObject new] autorelease];
}
Short answer: no, they won't properly lock/unlock, and such approaches should be avoided.
My first question is why you'd want to do something like this, since these approaches nullify the purposes and benefits of using a @synchronized block in the first place.
In your second example, once a thread changes the value of obj, every subsequent thread that reaches the @synchronized block will synchronize on the new object, not the original object. For N threads, you'd be explicitly creating N autoreleased objects, and the runtime may create up to N recursive locks associated with those objects. Swapping out the object on which you synchronize within the critical section is a fundamental no-no of thread-safe concurrency. Don't do it. Ever. If multiple threads can safely access a block concurrently, just omit the @synchronized entirely.
In your first example, the results may be undefined, and certainly not what you want, either. If the runtime only uses the object pointer to find the associated lock, the code may run fine, but synchronizing on nil has no perceptible effect in my simple tests, so again you're using @synchronized in a pointless way, since it offers no protection whatsoever.
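If you genuinely need to swap obj out from under other threads, a safer pattern is to synchronize on a separate token that is never reassigned, for example (a sketch; sObjLock is my name for it):
// Created once, before any threads start, and never reassigned:
static NSObject *sObjLock;

@synchronized(sObjLock) {
    [obj release];
    obj = [[NSObject alloc] init]; // safe: the lock token itself never changes
}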
I'm honestly not trying to be harsh, since I figure you're probably just curious about the construct. I'm just wording this strongly to (hopefully) prevent you and others from writing code that is fatally flawed, especially if under the assumption that it synchronizes properly. Good luck!

iPhone use of mutexes with asynchronous URL requests

My iPhone client has a lot of involvement with asynchronous requests, a lot of the time consistently modifying static collections of dictionaries or arrays. As a result, it's common for me to see larger data structures which take longer to retrieve from a server with the following errors:
*** Terminating app due to uncaught exception 'NSGenericException', reason: '*** Collection <NSCFArray: 0x3777c0> was mutated while being enumerated.'
This typically means that two requests to the server come back with data which are trying to modify the same collection. What I'm looking for is a tutorial/example/understanding of how to properly structure my code to avoid this detrimental error. I do believe the correct answer is mutexes, but I've never personally used them yet.
This is the result of making asynchronous HTTP requests with NSURLConnection and then using NSNotificationCenter as a means of delegation once requests are complete. When firing off requests that mutate the same collections, we get these collisions.
There are several ways to do this. The simplest in your case would probably be to use the #synchronized directive. This will allow you to create a mutex on the fly using an arbitrary object as the lock.
@synchronized(sStaticData) {
    // Do something with sStaticData
}
Another way would be to use the NSLock class. Create the lock you want to use, and then you will have a bit more flexibility when it comes to acquiring the mutex (with respect to blocking if the lock is unavailable, etc).
NSLock *lock = [[NSLock alloc] init];
// ... later ...
[lock lock];
// Do something with shared data
[lock unlock];
// Much later
[lock release], lock = nil;
If you decide to take either of these approaches it will be necessary to acquire the lock for both reads and writes since you are using NSMutableArray/Set/whatever as a data store. As you've seen NSFastEnumeration prohibits the mutation of the object being enumerated.
But I think another issue here is the choice of data structures in a multi-threaded environment. Is it strictly necessary to access your dictionaries/arrays from multiple threads? Or could the background threads coalesce the data they receive and then pass it to the main thread which would be the only thread allowed to access the data?
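That last suggestion tends to eliminate the need for mutexes altogether: each background request builds an immutable result and hands it off to the main thread, which becomes the only place the shared collection is ever touched. A sketch (parseResponse: and sharedItems are illustrative names):
// Background: build a private, immutable snapshot; never touch shared state here.
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    NSArray *parsed = [self parseResponse:responseData]; // hypothetical parsing step
    dispatch_async(dispatch_get_main_queue(), ^{
        // Main thread: the only thread that mutates or enumerates sharedItems.
        [self.sharedItems setArray:parsed];
        [self.tableView reloadData];
    });
});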
If it's possible that any data (including classes) will be accessed from two threads simultaneously you must take steps to keep these synchronized.
Fortunately, Objective-C makes it ridiculously easy to do this using the @synchronized directive, which takes any Objective-C object as its argument. Any other thread that specifies the same object in a @synchronized section will halt until the first finishes.
- (void)doSomethingWith:(NSArray*)someArray
{
    // @synchronized prevents two threads from entering this block
    // with the same lock object at the same time
    @synchronized(someArray)
    {
        // modify array
    }
}
If you need to protect more than just one variable you should consider using a semaphore that represents access to that set of data.
// Get the semaphore.
id groupSemaphore = [Group semaphore];
@synchronized(groupSemaphore)
{
    // Critical group code.
}
In response to the sStaticData and NSLock answer (comments are limited to 600 chars), don't you need to be very careful about creating the sStaticData and the NSLock objects in a thread safe way (to avoid the very unlikely scenario of multiple locks being created by different threads)?
I think there are two workarounds:
1) You can mandate those objects get created at the start of day in the single root thread.
2) Define a static object that is automatically created at the start of day to use as the lock, e.g. a static NSString can be created inline:
static NSString *sMyLock1 = @"Lock1";
Then I think you can safely use
@synchronized(sMyLock1)
{
    // Stuff
}
Otherwise I think you'll always end up in a 'chicken and egg' situation with creating your locks in a thread safe way?
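If GCD is available, dispatch_once is another way out of the chicken-and-egg problem; it guarantees the creation block runs exactly once, no matter how many threads race to it (a sketch):
static NSLock *sLock = nil;
static dispatch_once_t sOnceToken;
dispatch_once(&sOnceToken, ^{
    sLock = [[NSLock alloc] init]; // runs exactly once, thread-safely
});
[sLock lock];
// Stuff
[sLock unlock];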
Of course, you are very unlikely to hit any of these problems as most iPhone apps run in a single thread.
I don't know about the [Group semaphore] suggestion earlier, that might also be a solution.
N.B. If you are using synchronisation, don't forget to add -fobjc-exceptions to your GCC flags:
"Objective-C provides support for thread synchronization and exception handling, which are explained in this article and 'Exception Handling.' To turn on support for these features, use the -fobjc-exceptions switch of the GNU Compiler Collection (GCC) version 3.3 and later."
http://developer.apple.com/library/ios/#documentation/cocoa/Conceptual/ObjectiveC/Articles/ocThreading.html
Use a copy of the collection. Since you are enumerating an array (collection) that someone else might modify at the same time (multiple access), creating a copy will work for you. Create a copy and then enumerate over that copy.
NSMutableArray *originalArray = [@[@"A", @"B", @"C"] mutableCopy];
NSArray *arrayToEnumerate = [originalArray copy];
Now enumerate over arrayToEnumerate. Since it's a snapshot and not a reference to originalArray, mutations to originalArray won't cause the mutated-while-enumerating exception.
There are other ways if you don't want the overhead of locking, as it has its cost. Instead of using a lock to protect a shared resource (in your case it might be a dictionary or array), you can create a queue to serialize the tasks that access your critical-section code.
A queue doesn't carry the same penalty as a lock, because it doesn't require trapping into the kernel to acquire a mutex.
Simply put:
dispatch_async(serial_queue, ^{
    <#critical code#>
});
If you want the current execution to wait until the task completes, you can use (on a serial or a concurrent queue):
dispatch_sync(serial_or_concurrent_queue, ^{
    <#critical code#>
});
Generally, if the caller doesn't need to wait for the critical section to finish, asynchronous dispatch is the preferred way.
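For reference, the serial queue itself would be created once, up front; a sketch (the label is arbitrary):
dispatch_queue_t serial_queue = dispatch_queue_create("com.example.critical", DISPATCH_QUEUE_SERIAL);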