Thoughts in accessing read only objects from different threads - objective-c

Based on a previous discussion I had in SO (see Doubts on concurrency with objects that can be used multiple times like formatters), here I'm asking a more theoretical question about objects that during the application lifetime are created once (and never modified, hence read-only) and they can be accessed from different threads. A simple use case it's the Core Data one. Formatters can be used in different threads (main thread, importing thread, etc.).
NSFormatters, for example, are extremely expensive to create. Based on that they can created once and then reused. A typical pattern that can be follow (also highlighted by #mattt in NSFormatter article) is the following.
+ (NSNumberFormatter *)numberFormatter {
static NSNumberFormatter *_numberFormatter = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
_numberFormatter = [[NSNumberFormatter alloc] init];
[_numberFormatter setNumberStyle:NSNumberFormatterDecimalStyle];
});
return _numberFormatter;
}
Even if I'm sure that is a very good approach to follow (a sort of read-only/immutable object is created), formatters are not thread safe and so using them in a thread safe manner could be dangerous. I found a discussion on the argument in NSDateFormatter crashes when used from different threads where the author has noticed that a crash could happen.
NSDateFormatters are not thread safe; there was a background thread
attempting to use the same formatter at the same time (hence the
randomness).
So, what could be the problem in accessing a formatter from different threads? Any secure pattern to follow?

Specific answer for formatters:
Prior to iOS 7/OSX 10.9, even read-only access to formatters was not thread-safe. ICU has a ton of lazy computation that it does in response to requests, and that can crash or produce incorrect results if done concurrently.
In iOS 7/OSX 10.9, NSDateFormatter and NSNumberFormatter use locks internally to serialize access to the underlying ICU code, preventing this issue.
General answer:
Real-only access/immutable objects are indeed generally thread-safe, but it's difficult-to-impossible to tell which things actually are internally immutable, and which ones are merely presenting an immutable interface to the outside world.
In your own code, you can know this. When using other people's classes, you'll have to rely on what they document about how to use their classes safely.
(edit, since an example of serializing access to formatters was requested)
// in the dispatch_once where you create the formatter
dispatch_queue_t formatterQueue = dispatch_queue_create("date formatter queue", 0);
// where you use the formatter
dispatch_sync(formatterQueue, ^{ (do whatever you wanted to do with the formatter) });

I'm asking a more theoretical question about objects that during the application lifetime are created once (and never modified, hence read-only)
That is not technically correct: in order for an object to be truly read-only, it must also be immutable. For example, one could create NSMutableArray once, but since the object allows changes after creation, it cannot be considered read-only, and is therefore unsafe for concurrent use without additional synchronization.
Moreover, logically immutable object could have mutating implementation, making them non-thread-safe. For example, objects that perform lazy initialization on first use, or cache state in their instance variables without synchronization, are not thread-safe. It appears that NSDateFormatter is one such class: it appears that you could get a crash even when you are not calling methods to mutate the state of the class.
One solution to this could be using thread-local storage: rather than creating one NSDateFormatter per application you would be creating one per thread, but that would let you save some CPU cycles as well: one report mentions that they managed to shave 5..10% off their start-up time by using this simple trick.

Related

Can a single privateManagedContext be accessed by multiple threads?

I am implementing some core data sample project. I have defined a mainManagedContext and a privateManagedContext.
Now my question is, "Is it right if I access privateManagedContext using multiple threads?"
Thank you in advance.
In my experience with accessing at the same time the static context lead me to a crashes.
So the only solution that I've implemented is to make a copy of the managedObjectContext for each tread where i have to work and make all my function( add/edit/save) with context as parameter:
so in some thread make a copy of context:
NSManagedObjectContext *localContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
[localContext setPersistentStoreCoordinator:YourStaticManagedObjectContext.persistentStoreCoordinator];
and use that context in current thread (add/edit)
then save context:
[YourManager saveContext:localContext withBlock:nil];
be careful if you need data at the same time when write in one thread and read on other thread better to implement some synchronized access to that staticManagedObjectContext.
You'll want to refer to this guide:
https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/CoreData/Concurrency.html
In short, you generally can't refer to the same managed object context from multiple threads. Unless you set the concurrency policy accordingly. But, even then, you aren't going to get truly concurrent I/O operations (which is as designed because concurrent I/O is generally slower anyway, among other reasons).
What are you trying to do?

When to use dispatch_once versus reallocation? (Cocoa/CocoaTouch)

I often use simple non compile-time immutable objects: like an array #[#"a", #"b"] or a dictionary #{#"a": #"b"}.
I struggle between reallocating them all the time:
- (void)doSomeStuff {
NSArray<NSString *> *fileTypes = #[#"h", #"m"];
// use fileTypes
}
And allocating them once:
- (void)doSomeStuff {
static NSArray<NSString *> * fileTypes;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
fileTypes = #[#"h", #"m"];
});
// use fileTypes
}
Are there recommendations on when to use each construct? Like:
depending on the size of the allocated object
depending on the frequency of the allocation
depending on the device (iPhone 4 vs iMac 2016)
...
How do I figure it out?
Your bullet list is a good start. Memory would be another consideration. A static variable will stay in memory from the time it is actually initialized until the termination of the app. Obviously a local variable will be deallocated at the end of the method (assuming no further references).
Readability is something to consider too. The reallocation line is much easier to read than the dispatch_once setup.
For a little array, I'd reallocate as the first choice. The overhead is tiny. Unless you are creating the array in a tight loop, performance will be negligible.
I would use dispatch_once and a static variable for things that take more overhead such as creating a date formatter. But then there is the overhead of reacting to the user changing the device's locale.
In the end, my thought process is to first use reallocation. Then I consider whether there is tangible benefit to using static and dispatch_once. If there isn't a worthwhile reason to use a static, I leave it a local variable.
Use static if the overhead (speed) of reallocation is too much (but not if the permanent memory hit is too large).
Your second approach is more complex. So you should use it, if it is needed, but not as a default.
Typically this is done when the object creating is extreme expensive (almost never) or if you need a single identity of the instance object (shared instance, sometimes called singleton, what is incorrect.) In such a case you will recognize that you need it.
Although this question could probably be closed as "primarily opinion-based", I think this question addresses a choice that programmers frequently make.
The obvious answer is: don't optimize, and if you do, profile first - because most of the time you'll intuitively address the wrong part of your code.
That being said, here's how I address this.
If the needed object is expensive to build (like the formatters per documentation) and lightweight on resources, reuse / cache the object.
If the object is cheap to build, create a new instance when needed.
For crucial things that run on the main thread (UI stuff) I tend to draw the line more strict and cache earlier.
Depending on what you need those objects for, sometimes there are valid alternatives that are cheaper but offer a similar programming comfort. For example, if you wanted to look up some chessboard coordinates, you could either build a dictionary {#"a1" : 0, #"b1" : 1, ...}, use an array alone and the indices, take a plain C array (which now has a much less price tag attached to it), or do a small integer-based calculation.
Another point to consider is where to put the cached object. For example, you could store it statically in the method as in your example. Or you could make it an instance variable. Or a class property / static variable. Sometimes caching is only the half way to the goal - if you consider a table view with cells including a date formatter, you can think about reusing the very same formatter for all your cells. Sometimes you can refactor that reused object into a helper object (be it a singleton or not), and address some issues there.
So there is really no golden bullet here, and each situation needs an individual approach. Just don't fall into the trap of premature optimization and trade clear code for bug-inviting, hardly readable stuff that might not matter to your performance but comes with sure drawbacks like increased memory footprint.
dispatch_once_t
This allows two major benefits: 1) a method is guaranteed to be called only once, during the lifetime of an application run, and 2) it can be used to implement lazy initialization, as reported in the Man Page below.
From the OS X Man Page for dispatch_once():
The dispatch_once() function provides a simple and efficient mechanism to run an initializer exactly once, similar to pthread_once(3). Well designed code hides the use of lazy initialization.
Some use cases for dispatch_once_t
Singleton
Initialization of a file system resource, such as a file handle
Any static variable that would be shared by a group of instances, and takes up a lot of memory
static, without dispatch_once_t
A statically declared variable that is not wrapped in a dispatch_once block still has the benefit of being shared by many instances. For example, if you have a static variable called defaultColor, all instances of the object see the same value. It is therefore class-specific, instead of instance-specific.
However, any time you need a guarantee that a block will be called only once, you will need to use dispatch_once_t.
Immutability
You also mentioned immutability. Immutability is independent of the concern of running something once and only once--so there are cases for both static immutable variables and instance immutable variables. For instance, there may be times when you need to have an immutable object initialized, but it still may be different for each instance (in cases where it's value depends on other instance variables). In that case, an immutable object is not static, and still may be initialized with different values from multiple instances. In this case, a property is derived from other instance variables, and therefore should not be allowed to be changed externally.
A note on immutability vs mutability, from Concepts in Objective-C Programming:
Consider a scenario where all objects are capable of being mutated. In your application you invoke a method and are handed back a reference to an object representing a string. You use this string in your user interface to identify a particular piece of data. Now another subsystem in your application gets its own reference to that same string and decides to mutate it. Suddenly your label has changed out from under you. Things can become even more dire if, for instance, you get a reference to an array that you use to populate a table view. The user selects a row corresponding to an object in the array that has been removed by some code elsewhere in the program, and problems ensue. Immutability is a guarantee that an object won’t unexpectedly change in value while you’re using it.
Objects that are good candidates for immutability are ones that encapsulate collections of discrete values or contain values that are stored in buffers (which are themselves kinds of collections, either of characters or bytes). But not all such value objects necessarily benefit from having mutable versions. Objects that contain a single simple value, such as instances of NSNumber or NSDate, are not good candidates for mutability. When the represented value changes in these cases, it makes more sense to replace the old instance with a new instance.
A note on performance, from the same reference:
Performance is also a reason for immutable versions of objects representing things such as strings and dictionaries. Mutable objects for basic entities such as strings and dictionaries bring some overhead with them. Because they must dynamically manage a changeable backing store—allocating and deallocating chunks of memory as needed—mutable objects can be less efficient than their immutable counterparts.

Doubts on concurrency with objects that can be used multiple times like formatters

Maybe a stupid question to ask but I need some confirmations on it.
Usually, when I deal with objects that can be used multiple times within my application I use an approach like the following.
Create an extension, say for example NSDecimalNumber+Extension, or a class utility where a number formatter is created like the following.
+ (NSNumberFormatter*)internal_sharedNumberFormatter
{
static NSNumberFormatter* _internal_numberFormatter = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
_internal_numberFormatter = [[NSNumberFormatter alloc] init];
// other configurations here...
});
return _internal_numberFormatter;
}
+ (NSString*)stringRepresentationOfDecimalNumber:(NSDecimalNumber*)numberToFormat
{
NSString *stringRepresentation = [[self class] internal_sharedNumberFormatter] stringFromNumber:numberToFormat];
return stringRepresentation;
}
This approach is quite good since, for example, formatters are expensive to create. But it could be applied to other situations as well.
Now, my questions is the following.
Does this approach is also valid in situations where different path of execution (different threads) are involved?
So, if I call first stringRepresentationOfDecimalNumber on the main thread and then in a different thread, what could happen?
I think is valid to perform different calls to stringRepresentationOfDecimalNumber in different threads since the shared formatter, in this case, is reading only, but I would like to have a reply from experts.
Thanks in advance.
NSNumberFormatter is mutable, so it is generally not thread safe and cited in Thread Safety Summary (see the "Thread-Unsafe Classes" section) in the non thread safe classes list.
But if you treat your object as an immutable object, you don't have to worry about race conditions. So for example, you cannot change the format if there are multiple threads accessing the formatter. If _internal_numberFormatter isn't altered in any way, and you have just these two methods in the category, you should consider it thread safe.

Changing the locking object inside #synchronized section

Can I do any of the following? Will they properly lock/unlock the same object? Why or why not? Assume there are many identical threads using global variable "obj", which was initialized before all threads started.
1.
#synchronized(obj) {
[obj release];
obj = nil;
}
2.
#synchronized(obj) {
obj = [[NSObject new] autorelease];
}
Short answer: no, they won't properly lock/unlock, and such approaches should be avoided.
My first question is why you'd want to do something like this, since these approaches nullify the purposes and benefits of using a #synchronized block in the first place.
In your second example, once a thread changes the value of obj, every subsequent thread that reaches the #synchronized block will synchronize on the new object, not the original object. For N threads, you'd be explicitly creating N autoreleased objects, and the runtime may create up to N recursive locks associated with those objects. Swapping out the object on which you synchronize within the critical section is a fundamental no-no of thread-safe concurrency. Don't do it. Ever. If multiple threads can safely access a block concurrently, just omit the #synchronized entirely.
In your first example, the results may be undefined, and certainly not what you want, either. If the runtime only uses the object pointer to find the associated lock, the code may run fine, but synchronizing on nil has no perceptible effect in my simple tests, so again you're using #synchronized in a pointless way, since it offers no protection whatsoever.
I'm honestly not trying to be harsh, since I figure you're probably just curious about the construct. I'm just wording this strongly to (hopefully) prevent you and others from writing code that is fatally flawed, especially if under the assumption that it synchronizes properly. Good luck!

iPhone use of mutexes with asynchronous URL requests

My iPhone client has a lot of involvement with asynchronous requests, a lot of the time consistently modifying static collections of dictionaries or arrays. As a result, it's common for me to see larger data structures which take longer to retrieve from a server with the following errors:
*** Terminating app due to uncaught exception 'NSGenericException', reason: '*** Collection <NSCFArray: 0x3777c0> was mutated while being enumerated.'
This typically means that two requests to the server come back with data which are trying to modify the same collection. What I'm looking for is a tutorial/example/understanding of how to properly structure my code to avoid this detrimental error. I do believe the correct answer is mutexes, but I've never personally used them yet.
This is the result of making asynchronous HTTP requests with NSURLConnection and then using NSNotification Center as a means of delegation once requests are complete. When firing off requests that mutate the same collection sets, we get these collisions.
There are several ways to do this. The simplest in your case would probably be to use the #synchronized directive. This will allow you to create a mutex on the fly using an arbitrary object as the lock.
#synchronized(sStaticData) {
// Do something with sStaticData
}
Another way would be to use the NSLock class. Create the lock you want to use, and then you will have a bit more flexibility when it comes to acquiring the mutex (with respect to blocking if the lock is unavailable, etc).
NSLock *lock = [[NSLock alloc] init];
// ... later ...
[lock lock];
// Do something with shared data
[lock unlock];
// Much later
[lock release], lock = nil;
If you decide to take either of these approaches it will be necessary to acquire the lock for both reads and writes since you are using NSMutableArray/Set/whatever as a data store. As you've seen NSFastEnumeration prohibits the mutation of the object being enumerated.
But I think another issue here is the choice of data structures in a multi-threaded environment. Is it strictly necessary to access your dictionaries/arrays from multiple threads? Or could the background threads coalesce the data they receive and then pass it to the main thread which would be the only thread allowed to access the data?
If it's possible that any data (including classes) will be accessed from two threads simultaneously you must take steps to keep these synchronized.
Fortunately Objective-C makes it ridiculously easy to do this using the synchronized keyword. This keywords takes as an argument any Objective-C object. Any other threads that specify the same object in a synchronized section will halt until the first finishes.
-(void) doSomethingWith:(NSArray*)someArray
{
// the synchronized keyword prevents two threads ever using the same variable
#synchronized(someArray)
{
// modify array
}
}
If you need to protect more than just one variable you should consider using a semaphore that represents access to that set of data.
// Get the semaphore.
id groupSemaphore = [Group semaphore];
#synchronized(groupSemaphore)
{
// Critical group code.
}
In response to the sStaticData and NSLock answer (comments are limited to 600 chars), don't you need to be very careful about creating the sStaticData and the NSLock objects in a thread safe way (to avoid the very unlikely scenario of multiple locks being created by different threads)?
I think there are two workarounds:
1) You can mandate those objects get created at the start of day in the single root thread.
2) Define a static object that is automatically created at the start of day to use as the lock, e.g. a static NSString can be created inline:
static NSString *sMyLock1 = #"Lock1";
Then I think you can safely use
#synchronized(sMyLock1)
{
// Stuff
}
Otherwise I think you'll always end up in a 'chicken and egg' situation with creating your locks in a thread safe way?
Of course, you are very unlikely to hit any of these problems as most iPhone apps run in a single thread.
I don't know about the [Group semaphore] suggestion earlier, that might also be a solution.
N.B. If you are using synchronisation don't forget to add -fobjc-exceptions to your GCC flags:
Objective-C provides support for
thread synchronization and exception
handling, which are explained in this
article and “Exception Handling.” To
turn on support for these features,
use the -fobjc-exceptions switch of
the GNU Compiler Collection (GCC)
version 3.3 and later.
http://developer.apple.com/library/ios/#documentation/cocoa/Conceptual/ObjectiveC/Articles/ocThreading.html
Use a copy of the object to modify it. Since you are trying to modify the reference of an array (collection), while someone else might also modify it (multiple access), creating a copy will work for you. Create a copy and then enumerate over that copy.
NSMutableArray *originalArray = #[#"A", #"B", #"C"];
NSMutableArray *arrayToEnumerate = [originalArray copy];
Now modify the arrayToEnumerate. Since it's not referenced to originalArray, but is a copy of the originalArray, it won't cause an issue.
There are other ways if you don't want the overhead of Locking as it has its cost. Instead of using a lock to protect on shared resource (in your case it might be dictionary or array), you can create a queue to serialise the task that is accessing your critical section code.
Queue doesn't take same amount of penalty as locks as it doesn't require trapping into the kernel to acquire mutex.
simply put
dispatch_async(serial_queue, ^{
<#critical code#>
})
In case if you want current execution to wait until task complete, you can use
dispatch_sync(serial_queue Or concurrent, ^{
<#critical code#>
})
Generally if execution doest need not to wait, asynchronous is a preferred way of doing.