Context pattern? Why does Core Data need it? - objective-c

I'm still fairly new to Core Data and am trying to understand why it requires passing around an NSManagedObjectContext. As I understand it, passing the context is needed so that multiple threads don't affect the same context, but I was also under the impression that this pattern is sometimes considered an antipattern, as noted here.
Could Core Data theoretically be implemented in a thread-safe way that would avoid using this pattern? How do other ORMs (such as Ruby's ActiveRecord, for example) avoid this pattern? For example, couldn't Core Data implement a per-NSManagedObject saving method, such as in this extension? This light framework doesn't handle multithreading, but couldn't NSManagedObjects use some kind of internal GCD queue(s) to support it, with an internal context they don't expose?
Sorry if I'm missing anything major.

The NSManagedObjectContext is the in-memory container of your application's object graph, just as the persistent store (XML, SQLite, etc.) usually represents the on-disk container of your object graph.
There are some advantages to this approach:
Faulting can be applied to a set of objects, or in the case of CoreData the entire object graph
It's a convenient abstraction for forcing the application to batch its I/O.
It provides a single point of contact for efficiently performing operations over the entire object graph (NSFetchRequests, etc.)
Undo can be applied to the object graph, not just individual objects.
It's also important to remember that CoreData is not an ORM framework; it's an object persistence framework. The primary responsibility of CoreData is to make accessing data stored in a persistent format on disk more efficient. However, it doesn't attempt to emulate the functionality of relational databases.
To your point about concurrency, new concurrency models have been introduced in the upcoming release of Mac OS X. You can read more about that at developer.apple.com.
In the abstract though, the concurrency model chosen for a managed object context has more to do with the specifics of an individual application than the context pattern itself. Instances of NSManagedObjectContext should generally never be shared between threads.
In the same way that each thread requires its own instance of NSAutoreleasePool, each thread should also have its own MOC. That way, when the thread is done executing, it can commit its changes to the store on disk and then release the context, freeing up all the memory consumed by objects processed on the thread.
This is a much more efficient paradigm than allowing a single context to continuously consume system resources during the lifecycle of a given application. Of course, this can be done by invoking -reset on the context as well, which will cause all of the NSManagedObjects in use by the context to be turned back into faults.

You need one NSManagedObjectContext per thread: one to fill your UI on the main thread and, for longer operations, another for each background thread. If you want results from those other threads merged in, you can subscribe to NSManagedObjectContextDidSaveNotification, which carries what you need to quickly merge the changes into your main MOC.
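A minimal sketch of that merge step, assuming ARC and a main context created with NSMainQueueConcurrencyType (OS X 10.7 / iOS 5 or later) so that it can be reached through performBlock:. StartMergingSaves is a hypothetical helper name, not part of Core Data; the notification and the merge method are the framework's own.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: whenever `backgroundContext` saves, merge its changes
// into `mainContext`. Returns the observer token so the caller can later pass
// it to -[NSNotificationCenter removeObserver:].
// Assumption: `mainContext` uses a queue-based concurrency type.
static id StartMergingSaves(NSManagedObjectContext *backgroundContext,
                            NSManagedObjectContext *mainContext) {
    NSNotificationCenter *center = [NSNotificationCenter defaultCenter];
    return [center addObserverForName:NSManagedObjectContextDidSaveNotification
                               object:backgroundContext
                                queue:nil
                           usingBlock:^(NSNotification *note) {
        // The notification is delivered on the thread that saved, so hop to
        // the main context's own queue before touching it.
        [mainContext performBlock:^{
            [mainContext mergeChangesFromContextDidSaveNotification:note];
        }];
    }];
}
```

Both contexts are assumed to share one NSPersistentStoreCoordinator; the background thread only ever touches its own context.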

Related

Alternative for concurrency:: data structures in macOS

For my macOS application, I'd like to use concurrent map and queue data structures that can be shared between multiple threads and support parallel operations.
After some research I've found what I need, but unfortunately those are only implemented on Windows.
concurrency::concurrent_unordered_map<key,value> concurrency::concurrent_queue<key>
Perhaps there are synonyms internal implementations in macOS in CoreFoundation or other framework that comes with Xcode SDK (disregarding the language implementation) ?
thanks,
Perhaps there are synonyms internal implementations in macOS in CoreFoundation or other framework that comes with Xcode SDK (disregarding the language implementation) ?
Nope. You must roll-your-own or source elsewhere.
The Apple-provided collections are not thread-safe; however, the common recommendation is to combine them with Grand Central Dispatch (GCD) to provide lightweight thread-safe wrappers, and this is quite easy to do.
Here is an outline of one way to do it for NSMutableDictionary, which you would use for your concurrent map:
Write a subclass, say ThreadSafeDictionary, of NSMutableDictionary. This will allow your thread-safe version to be passed anywhere an NSMutableDictionary is accepted.
The subclass should have a private instance of a standard NSMutableDictionary, say actualDictionary
To subclass NSMutableDictionary you just need to override 2 methods from NSMutableDictionary and 4 methods from NSDictionary. Each of these methods should invoke the same method on actualDictionary after meeting any concurrency requirements.
To handle concurrency requirements the subclass should first create a private concurrent dispatch queue using dispatch_queue_create() and save this in an instance variable, say operationQueue.
For operations which read from the dictionary the override method uses a dispatch_sync() to schedule the read on actualDictionary and return the value. As operationQueue is concurrent this allows multiple concurrent readers.
For operations which write to the dictionary the override method uses a dispatch_barrier_async() to schedule the write on actualDictionary. The async means the writer is allowed to continue without waiting for the write to complete, and the barrier ensures that no other operation, read or write, runs on the queue while the dictionary is being mutated.
You repeat the above outline to implement the concurrent queue based on one of the other collection types.
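The outline above could be sketched roughly as follows. This is a minimal, hedged version assuming ARC on macOS: it overrides the two NSMutableDictionary primitives and three of the NSDictionary ones, and leaves out the initWithObjects:forKeys:count: override to keep the sketch short. The queue label is arbitrary, and keyEnumerator returns a snapshot of the keys, which is a design choice of this sketch rather than a requirement.

```objc
#import <Foundation/Foundation.h>

// Sketch only: names follow the outline (ThreadSafeDictionary,
// actualDictionary, operationQueue).
@interface ThreadSafeDictionary : NSMutableDictionary
@end

@implementation ThreadSafeDictionary {
    NSMutableDictionary *_actualDictionary;  // the real storage
    dispatch_queue_t _operationQueue;        // private concurrent queue
}

- (instancetype)initWithCapacity:(NSUInteger)capacity {
    if ((self = [super init])) {
        _actualDictionary = [[NSMutableDictionary alloc] initWithCapacity:capacity];
        _operationQueue = dispatch_queue_create("ThreadSafeDictionary.operations",
                                                DISPATCH_QUEUE_CONCURRENT);
    }
    return self;
}

- (instancetype)init {
    return [self initWithCapacity:0];
}

// Reads: dispatch_sync on a concurrent queue, so readers may overlap.
- (NSUInteger)count {
    __block NSUInteger result = 0;
    dispatch_sync(_operationQueue, ^{ result = self->_actualDictionary.count; });
    return result;
}

- (id)objectForKey:(id)aKey {
    __block id result = nil;
    dispatch_sync(_operationQueue, ^{
        result = [self->_actualDictionary objectForKey:aKey];
    });
    return result;
}

- (NSEnumerator *)keyEnumerator {
    __block NSEnumerator *result = nil;
    dispatch_sync(_operationQueue, ^{
        // Enumerate a snapshot so later writes cannot disturb the caller.
        result = [[self->_actualDictionary allKeys] objectEnumerator];
    });
    return result;
}

// Writes: barrier blocks run alone, after earlier blocks have finished.
- (void)setObject:(id)anObject forKey:(id<NSCopying>)aKey {
    id keyCopy = [aKey copyWithZone:NULL];  // copy now; the write runs later
    dispatch_barrier_async(_operationQueue, ^{
        [self->_actualDictionary setObject:anObject forKey:keyCopy];
    });
}

- (void)removeObjectForKey:(id)aKey {
    dispatch_barrier_async(_operationQueue, ^{
        [self->_actualDictionary removeObjectForKey:aKey];
    });
}
@end
```

Because blocks on the queue start in FIFO order, a read issued after a write from the same thread waits behind that write's barrier and so sees its result.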
If after studying the documentation you get stuck on the design or implementation ask a new question, show what you have, describe the issue, include a link back to this question so people can follow the thread, and someone will undoubtedly help you take the next step.
HTH

Tracking model object attributes changes (dirty) in Cocoa

I'm trying to gain insight into the least overhead solution to tracking model object changes in Cocoa.
As I see it there are 3 options:
Use Core Data: lots of functionality exists for monitoring model object changes (Core Data NSManagedObject - tracking if attribute was changed). I don't know what the overhead of Core Data's management infrastructure is compared to other approaches, but its well-established architecture for multi-threading support is a plus. For cross-platform devs there is some downside in not having a readily accessible schema, but there are ways around that issue.
Write custom accessors that mark the object as dirty when updating a field with a new value. I've been using this technique with mixed success for quite some time. There are some sticky issues to deal with when sharing objects across threads. You also don't get the benefits of enhancements to automatic synthesis of attributes, etc. You do, however, have greater control of your data store than when using Core Data, which can be of benefit (e.g. certain operations can be done in a SQL store across many objects in a much more efficient way). Note: There could be a lot of variation here depending on how you write the accessors. For the sake of conversation let's assume setters make a check of the new value against the old one, make appropriate calls to KVO (willChange / didChange), and set a boolean flag (all within synchronization of course).
Use KVO to monitor object fields (à la keyPathsForValuesAffectingValueForKey:) and mark the object as dirty in the KVO callout. I have yet to use this method but it seems like a decent approach. The obvious downside would be the callout every time a setter is called.
I am inclined to think that option 2 has the lowest overhead (in terms of raw processing requirements) given that Core Data and KVO both have some additional overhead either in the generated accessors or in the KVO callouts. The question is, how substantial is the overhead?
And lastly, did I miss an option?
Thanks.
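For concreteness, a setter of the kind described in option 2 might look roughly like this. It is a sketch assuming ARC; TrackedPerson, name, and markClean are hypothetical names chosen for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical model class illustrating option 2 (custom dirty-marking setter).
@interface TrackedPerson : NSObject
@property (nonatomic, copy) NSString *name;
@property (nonatomic, readonly, getter=isDirty) BOOL dirty;
- (void)markClean;  // call after the object has been written to the store
@end

@implementation TrackedPerson {
    NSString *_name;
    BOOL _dirty;
}

// KVO notifications for `name` are sent manually in the setter below.
+ (BOOL)automaticallyNotifiesObserversForKey:(NSString *)key {
    if ([key isEqualToString:@"name"]) return NO;
    return [super automaticallyNotifiesObserversForKey:key];
}

- (NSString *)name {
    @synchronized (self) { return _name; }
}

- (void)setName:(NSString *)name {
    @synchronized (self) {
        // Compare against the old value; an unchanged value is not a change.
        if (_name == name || [_name isEqualToString:name]) return;
        [self willChangeValueForKey:@"name"];
        _name = [name copy];
        _dirty = YES;
        [self didChangeValueForKey:@"name"];
    }
}

- (BOOL)isDirty {
    @synchronized (self) { return _dirty; }
}

- (void)markClean {
    @synchronized (self) { _dirty = NO; }
}
@end
```

Note that the KVO callouts here run while the lock is held, which is one of the sticky threading issues mentioned above: an observer that calls back into the object from another thread can block.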

"Delegates or NSNotifications" Adjudging performance of code?

In my application, I have to display image files as a list in a table view, present them in full size, and present them as multiple thumbnails. So I developed three separate classes to handle these three views. Now, to perform any file operations, I can think of two approaches:
Create app delegate objects for all these classes and handle them accordingly. When an operation on a photo file is performed in one class, all other classes are notified using NSNotification, keeping the observer as the app delegate object.
Create objects for these classes locally as and when required, and assign delegates for performing file operations from one class to another by calling the relevant methods.
However, I was not able to judge which approach would be better in terms of memory usage and performance. Thanks in advance.
Using a one-to-one relationship with direct messaging is the simpler relationship and means of communication/messaging. Favor the delegate callback -- Number 2.
It is also easy to make this design bidirectional -- if the view goes offscreen, you could perform a cancellation. If the load fails, it is easier to inform the controller.
NSNotifications are comparatively heavyweight. Not necessary.
Storing a bunch of stuff in a singleton (app delegate) can result in several unnecessarily retained objects. If your program is concurrent, then that can add even more complexity. There's no need for any of this complexity or introduction of mutable global state, and there is no reason presented whereby the objects should have a much larger scope of access and lifetime.
You can optimize for specific needs beyond that, but I don't see any at this time.
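A rough sketch of the delegate approach, assuming ARC. ImageFileLoader, its delegate protocol, and ThumbnailController are hypothetical names, and the load is synchronous only to keep the example short.

```objc
#import <Foundation/Foundation.h>

@class ImageFileLoader;

// Hypothetical one-to-one callback interface between a loader and its owner.
@protocol ImageFileLoaderDelegate <NSObject>
- (void)imageFileLoader:(ImageFileLoader *)loader didLoadData:(NSData *)data;
- (void)imageFileLoader:(ImageFileLoader *)loader didFailWithError:(NSError *)error;
@end

@interface ImageFileLoader : NSObject
// Weak, so the loader never keeps its controller alive.
@property (nonatomic, weak) id<ImageFileLoaderDelegate> delegate;
@property (nonatomic, copy, readonly) NSString *path;
- (instancetype)initWithPath:(NSString *)path;
- (void)load;    // synchronous here for brevity
- (void)cancel;  // e.g. when the view goes offscreen
@end

@implementation ImageFileLoader {
    BOOL _cancelled;
}

- (instancetype)initWithPath:(NSString *)path {
    if ((self = [super init])) {
        _path = [path copy];
    }
    return self;
}

- (void)cancel {
    _cancelled = YES;
}

- (void)load {
    if (_cancelled) return;
    NSError *error = nil;
    NSData *data = [NSData dataWithContentsOfFile:_path options:0 error:&error];
    if (data) {
        [self.delegate imageFileLoader:self didLoadData:data];
    } else {
        [self.delegate imageFileLoader:self didFailWithError:error];
    }
}
@end

// Hypothetical controller that creates a loader locally and acts as its delegate.
@interface ThumbnailController : NSObject <ImageFileLoaderDelegate>
@property (nonatomic, strong) NSData *lastData;
@property (nonatomic, strong) NSError *lastError;
@end

@implementation ThumbnailController
- (void)imageFileLoader:(ImageFileLoader *)loader didLoadData:(NSData *)data {
    self.lastData = data;
}
- (void)imageFileLoader:(ImageFileLoader *)loader didFailWithError:(NSError *)error {
    self.lastError = error;
}
@end
```

The controller owns the loader and the loader only holds a weak reference back, so nothing needs to live in the app delegate.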
It depends a lot on the code and how you are structuring your app. I generally use delegates in the following situations:
Where the delegate object exists before and after the main object that needs it. In other words, the main object does not need to worry about the lifecycle of its delegate.
Where the relationship between an object and its delegate object is strictly one-to-one. In other words, only one delegate object needs to interact with the main object. I have seen situations where delegates are swapped in and out, and I would not recommend such code.
Where the main object needs information from the delegate.
I would use notifications where:
Multiple objects need to know about things happening in another class.
Where the main class does not need to interact with the other classes or even know they exist.
Whichever you choose, I would not have more than one file management object for each image. The simple reason is that having multiple objects means you need to ensure that they all have the same state and are therefore communicating with each other. Otherwise bugs will creep in.

Copying pending changes between NSManagedObjectContexts with shared persistent store?

I have two instances of NSManagedObjectContext: one is used in main thread and the other is used in a background thread (via an NSOperation.) For thread safety, these two contexts only share an NSPersistentStoreCoordinator.
The problem I'm having is with pending changes in the first context (on the main thread) not being available to the second context until a -save is performed. This is understandable, since the shared persistent store won't have copies of the NSManagedObjects being tracked by -insertedObjects, -updatedObjects, and -deletedObjects until they are persisted.
Unfortunately, this presents a problem with the user experience: any unsaved changes won't appear in the (time consuming) reports that are generated in the background thread.
The only solution I can think of is nasty: take the inserted, updated and deleted objects from the first context and graft them onto the object graph of the second context. There are some pretty complex relations in the dataset, so I'm hesitant to go in this direction. I'm hoping someone here has a better solution.
If this is under 10.7 there are some solutions: one is you can have nested ManagedObjectContexts, so you can “save” in the one being modified and it won’t save all the way to the disk, but it will make the changes available to other children of the master context.
Before 10.7 you will probably have to copy the changes over yourself. This isn’t super-hard since you can just have a single object listen for NSManagedObjectContextObjectsDidChangeNotification and then just re-apply the changes exactly from the main context. (Should be about 20 lines of code.) You never have to save this second context I assume?
Not sure if you have any OS constraints, but in iOS 5 / Mac OS 10.7 you can use nested managed object contexts to accomplish this. I believe a child context is able to pull in unsaved changes in the parent by simply doing a new fetch.
Edit: Looks like Wil beat me to it but yeah, prior to iOS 5 / Mac OS 10.7 you'll have to listen for the NSManagedObjectContextDidSaveNotification and take a look at the userInfo dictionary for the added/updated/deleted objects.
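A minimal sketch of the nested-context setup, assuming ARC and a main context created with NSMainQueueConcurrencyType (a parent must use a queue-based concurrency type). MakeReportContext is a hypothetical helper name.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: creates a background context whose "store" is the main
// context, so its fetches go through the parent instead of straight to the
// persistent store coordinator.
// Assumption: `mainContext` was created with NSMainQueueConcurrencyType.
static NSManagedObjectContext *MakeReportContext(NSManagedObjectContext *mainContext) {
    NSManagedObjectContext *child = [[NSManagedObjectContext alloc]
        initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    child.parentContext = mainContext;
    return child;
}
```

All work on the child then goes through performBlock: or performBlockAndWait:. Because the child's fetches are serviced by the main context, avoid blocking the main thread while waiting on the report, or the two can deadlock.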
An alternate solution might involve using a single managed object context and providing your own thread safety over access to it, or use the context's lock and unlock methods.
I would try to make the main thread do a normal save so the second context can just merge the changes into its own context. "Fighting" an API's intended use is never a good idea.
You could mark the newly saved record as intermediate with an attribute, and delete it later if the user finally cancels the edit.
Solving those problems with attributes in your entities, and querying in the background thread with a matching predicate, would be easy...
And that would be a stable solution as well. I am coming from a database-driven world (Oracle), where we often use such patterns (status attributes in records) to make data visible or invisible to other DB sessions (which would be the equivalent of threads in a Cocoa app). It always works without problems. Other threads/sessions only ever see committed changes; that's how most RDBMSs work.

How to model a HPC queueing system with Objective-C

I am trying to program an application for the Mac to query a high performance computing cluster about its running and queued calculation jobs. The aim is to be able to monitor the submitted jobs: whether they are still queued and waiting for execution, or whether they are running and on which node or host in the cluster.
On the GUI side I would like to be able to display an NSTableView showing all submitted jobs and, alternatively, a second option to see all hosts in the cluster and how many and which jobs are running on each node.
The model objects themselves are not so hard to do, what bothers me most is the lifecycle and the ownership relations between the host and the job objects. This has to be well designed otherwise I will run into memory management problems.
Please note that I would like to program it without using CoreData if possible.
1. Possibility
The yellow queue object is the root object of my object graph and it owns all the host objects (has an NSArray of custom host objects). Each host object owns all the job objects which are running on this host (also by having an NSArray of custom job objects). I think that there are two major problems with this approach:
Where are the job objects stored that are still queued and not yet running on a host? They lack a parent host object.
How would one implement a NSTableView containing all the job objects?
2. Possibility
The yellow root object directly holds references to all job objects by storing them in an NSArray. Each job has an instance variable retaining a host object. Again, there are some problems:
I would also have the hosts in the model which are currently idle, so no job is currently executed on them.
How would one implement the data source for a NSTableView showing all the hosts.
How does one make sure that there are no duplicate host objects, so that each host in the cluster is represented by one host object only.
My questions are:
1. Which of the two possibilities makes the most sense? Are there alternatives?
2. Would one better implement it with CoreData?
3. How would one manage the object lifecycle so that there are no retain cycles or dangling pointers?
Thank you
If you're concerned about memory management, Core Data is the way to go. It's much more efficient memory-wise and it manages memory for you in the vast majority of cases.
In general you manage memory by making sure that each individual class's instances clean up after themselves. Then you put the objects in a hierarchy such that as each level deallocates, it automatically cleans up the objects under it.
As to the specific structure, it depends on the logic of the situation you are modeling. If the organization logically goes:
queue{
jobs{
host
}
}
... then you should mimic that in your data structure.
I strongly recommend you use Core Data. You will end up duplicating a lot of Core Data functionality anyway if you implement this all by hand. Core Data was designed specifically to manage object graphs like this. That is its primary role. All the database stuff was tacked on as an afterthought. There's no need to reinvent the wheel.