Core data : how to undo operations once managed objects are saved with context - cocoa-touch

I am trying to implement downloading of bulk data from several tables on the server.
In my case there are 16 tables. For all these tables I will be firing 10 requests to the server. This means I have done a bit of logical groupings for related tables, but it is like all tables are inter-related with each other through one or the other relationship.
I need to consider three cases while doing downloading:
Saving data to each table at local.
Managing relationships between inserted objects.
Handling situation when one of the requests fails during download, say 8th request failed.
I will be following this approach for each response:
Inserting data in managed object context.
Managing relationships by firing NSPredicate and associating the related objects.
Saving the context.
In case of a response failure, I have two options:
Next time continue from the failed response.
Revert all saved data to its previous state.
1st approach may lead to some data inconsistency, so I am going with 2nd approach.
I know that if a managed object context is not saved, we can revert the changes, but
is it possible to revert the changes, if the managed object context is
saved?
I require some useful answers from the community.
Please suggest.

Is it possible to revert the changes, if the managed object context is saved?
After saving? Maybe, but it could be tricky. If you set up a separate managed object context for your network operations, and give it an NSUndoManager, you could later on tell the undo manager to roll everything back to the previous state.
It would be simpler to just not save changes until you're finished, though. Using an undo manager doesn't really help much-- the memory needed to store up all the undo actions will at least match the memory use from keeping all of the unsaved changes around until you're finished. If you're working on a separate managed object context (whether a child context or a completely separate context), handling the error case is as simple as letting the MOC get deallocated without saving changes first.

Related

Fix inconsistent state right away or lazily when data is requested

Our users go through several steps of workflow - the further they go the more objects we create. We also allow users to go back to Step#1 and change one of the existing objects. Which may cause inconsistencies so we must update/delete some of the objects at Step#2. I see 2 options:
Update/delete objects from Step#2 right away. This leads to:
Operation that's supposed to be a simple PATCH of an entity field becomes complicated. And it's a shared object between multiple workflows - so we'll have to add if-statements and do different things depending on the workflow.
Circular dependencies. Operations on Step#1 have to know about objects/operations on Step#2.
On each request in Step#1 we'd have to load data for Step#2 in order to determine whether Step#2 really needs to be updated. Which slows down operations on Step#1. So to change 1 record in DB we'll have to load hundreds (or even thousands) records for Step#2.
Many actions on Step#1 may need fixing state at Step#2. So we have to ensure we don't forget anything today and in the future.
Fix Step#2 lazily - when user goes there (our current approach). Step#2 will recognize that objects are inconsistent and fix them. Which leads to just 1 place where we need to care, but:
Until user opens Step#2 - DB will contain inconsistent objects. This hasn't resulted in any problems so far. But I can imagine it may complicate future SQL migrations.
We update DB state on GET request. This one doesn't seem like that big of a deal since GET stays idempotent anyway. But still it feels awkward.
Anyone knows better approaches? Or maybe improvements to these two?
Update
I haven't found perfect solution, but eventually we implemented an improved version of #1. When updating state on Step#1 we also set a flag "need to rebuild Step#2", when UI opens Step#2 it first checks this flag and issues a PUT to rebuild the state, and only then it GETs Step#2.
This still means that DB state is inconsistent for some period of time. But at least we'll know this for sure from the flag in DB. And if needed - we could write migrations taking this flag into account. This also allows (if needed in the future) to create an async job to fix the state.
I think it is more flexible to separate the state and the context where the objects are stored. Any creation of a new object at any step is accompanied by the preservation of the invariant and consistency of context.
There are separate rules of states - these are rules for transition from one to another and available objects for creation and separate rules for the context, rules for its consistency, which is ensured every time it changes.
What about dirty data asynchronous cleanup?
Whenever user goes back to Step #1 and changes something, mark all related data as "dirty" (e.g. add links to it in "DirtyData" table) and be done for now.
Have a DataCleanup worker (e.g. separate thread or smth) that constantly looks for data to be cleaned up.
Before editing data for Step #2, check if the data is not dirty.
Depending on your logic, 3) might result in user error (e.g. user would need to repeat Step #2). If DataCleanup worker has enough resources (i.e. it processes DirtyData table almost instantaneously), that should happen only on very rare occasions. If that is not OK, you could opt for checking for dirty data on each fetch, but that could be expensive.
It sounds like you're familiar with the HTTP spec regarding GET requests, but for future readers:
Why shouldn't a GET request change data on the server?
Why is using a HTTP GET to update state on the server in a RESTful call incorrect?
For the other bullet under 2, we probably don't need a specification to agree that persisting valid data is preferable to persisting invalid data.
So what can we do for the bullets under 1 to avoid complex branching logic in a particular step and also circular dependencies? My suggestion is an event-driven design. When step #2 changes it should fire a change event. In this scenario, step #2 has no knowledge of the concrete listener(s) who may receive its events, so it remains decoupled from any complex handling logic.
There's probably no way to guarantee you don't forget anything in the future; but if every step in the workflow is defined as a listener, it forces you to consider change events to some extent every time you implement a new step.
One side note on granularity: if a step has many changes, it can batch up its events rather than fire each one individually. You can adjust the size for efficiency.
In summary, I would strongly consider the Observer design pattern.

Asynchronous Object Construction in Core Data

I'm currently working on an app with a reasonably complex Core Data model. The data model currently has 10 tables in it, with a bunch of relationships set between them. The data for the model is obtained piecemeal from a remote server. In order to minimize the amount of traffic to/from the server, the server API passes object ID's first, giving me a chance to discover if I already have stored the objects. If not, then I can ask the server for the full objects and store them. However, those objects can have references to other objects, for which I will need to check follow the same process: check if I have the object(s) and, if not, grab the objects from the server. The Core Data model includes fields for the server IDs which I use to validate and construct Core Data's object graph.
This creates a situation where objects will have been instantiated in Core Data, but won't have been completely constructed as they may be waiting for referenced objects to be returned by the server (which may, in turn, need to wait for their own reference objects).
So my first attempt to deal with this was to create a semaphore that would not allow the object context to be saved (I only save the context in one place) until all objects are downloaded and the object graph is constructed. The problem I ran into was that the context was being saved anyway, without me asking. This results in a ton of changes propagating through NSFetchedResultsController as objects are downloaded from the server and the object graph is being constructed. Moreover, the propagated objects may not be complete.
Has any dealt with anything like this? I think this could all work if I could explicitly control when Core Data saves, but that does not appear to be possible. Or am I missing something?
UPDATE
I was missing something. I was under the impression that NSFetchedResultsController received updates when the Context is saved. This is not true. It receives updates whenever processPendingChanges is called in the context, which occurs at the end of an event cycle. In the past, I've always used two contexts to keep updates separate from the UI, but this project had a deadline and existing code that kept me from refactoring. Given this new information, I think the separate context will fix my problem.
That is an extremely expensive way to sync with a server. Is there a reason your server can't respond to "changed since X" calls and give you everything? In your current design you are spending more time opening and closing sockets than you are receiving data.
Be that as it may, you want to do all of this processing in a secondary context that is connected directly to the NSPersistentStoreCoordinator. When it saves you want to capture the NSManagedObjectContextDidSaveNotification and then have your UI context consume that notification. That will update your UI when your server sync is complete.
This will keep your syncing 100% isolated from the UI and allow the UI to save or do whatever else it needs to do while you are working with the server. I would not use a parent/child design here. There is no reason to.
You access a core data database via the NSManagedObjectContext class.
Each context object must belong to a single thread, and any NSManagedObjects that context creates belong to the same thread.
Do not read or write any managed object from a thread other than the one that created it. If you do, you'll end up with unpredictable and impossible to debug data corruption problems.
However, you can have multiple NSManagedObjectContext instances for a single core data database, each one on a different thread, and you can merge any changes made to the context in one thread over to a context on another thread.
So, basically, you have a "main" NSManagedObjectContext which is used on the main thread, and used for almost all your operations. And then when you need to do something on another thread you create a "child" context for that thread, make all your changes, then merge those changes back to the main context on the main thread.
You can find specific details how to implement this from Apple's official documentation. Start reading here:
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/CoreData/Articles/cdConcurrency.html#//apple_ref/doc/uid/TP40003385-SW1

NSManagedObjectContext confusion

I am learning about CoreData. Obviously, one of the main classes you entouer is NSManagedObjectContext. I am unclear about the exact role of this. From the articles i've read, it seems that you can have multiple NSManagedObjectContexts. Does this mean that NSManagedObjectContext is basically a copy of the backend?
How would this resolve into a consistent backend when there is multiple different copies lying around?
So, 2 questions basically:
Is NSManagedContext a copy of the backend database?
and...
For example, say I make a change in context A and make some other change in context B. Then I call save on A first, then B? will B prevail?
Thanks
The NSManagedObjectContext is not a copy of the backend database. The documentation describes it as a scratch pad
An instance of NSManagedObjectContext represents a single “object
space” or scratch pad in an application. Its primary responsibility is
to manage a collection of managed objects. These objects form a group
of related model objects that represent an internally consistent view
of one or more persistent stores. A single managed object instance
exists in one and only one context, but multiple copies of an object
can exist in different contexts. Thus object uniquing is scoped to a
particular context.
The NSManagedObjectContext is just a temporary place to make changes to your managed objects in a transactional way. When you make changes to objects in a context it does not effect the backend database until and if you save the context, and as you know you can have multiple context that you can make changes to which is really important for concurrency.
For question number 2, the answer for who prevails will depend on the merge policy you set for your context and which one is called last which would be B. Here are the merge policies that can be set that will effect the second context to be saved.
NSErrorMergePolicyType
Specifies a policy that causes a save to fail
if there are any merge conflicts.
NSMergeByPropertyStoreTrumpMergePolicyType
Specifies a policy that
merges conflicts between the persistent store’s version of the object
and the current in-memory version, giving priority to external
changes.
NSMergeByPropertyObjectTrumpMergePolicyType
Specifies a policy that merges conflicts between the persistent store’s version
of the object and the current in-memory version, giving priority to
in-memory changes.
NSOverwriteMergePolicyType
Specifies a policy that
overwrites state in the persistent store for the changed objects in
conflict.
NSRollbackMergePolicyType
Specifies a policy that
discards in-memory state changes for objects in conflict.
An NSManagedObjectContext is specific representation of your data model. Each context maintains its own state (e.g. context) so changes in one context will not directly affect other contexts. When you work with multiple contexts it is your responsibility to keep them consistent by merging changes when a context saves its changes to the store.
Your question is regarding this process and may also involve merge conflicts. Whenever you save a context its changes are committed to the store and a merge policy is used to resolve conflicts.
When you save a context, it will post various notifications regarding progress. In your case, if [contextA save:&error] succeeds, the context will post the notification NSManagedObjectContextDidSaveNotification. When you have multiple contexts, you typically observe this notification and call:
[contextB mergeChangesFromContextDidSaveNotification:notification];
This will merge the changes saved on contextA into contextB.
EDIT: removed the 'thread-safe' comment. NSManagedObjectContext is not thread safe.

NSManagedObjectContext with different processes

I have two processes that are talking to the same persistent store. I save the context on one process, and I post a distributed notification. The other process sees the distributed notification, and fetches its data again, but still receives the old data. Is there some kind of "flushing" I need to do to get the other process to get the correct data from the store?
EDIT: So, it turns out that I was flushing the data correctly. NSManagedObjects have a "refreshObject:mergeChanges" method that you use to do this. The issue appears to be timing related. Let's say I have two processes, A and B. Process A is the main process and does a save to the database. Then Process B does a save to the database and sends a notification to Process A that it has done so, and Process A fetches the new data. I've found that if Process A's save and Process B's save are too close together, the old data is fetched by Process A even if I refresh. If I force there to be some time between the two saves, then it works out correctly.
Obviously this seems like some kind of race condition, where perhaps the notification is getting sent before the data is actually getting saved to the database, however, the notification gets sent in the didSave method of the managed object, at which point the context has already saved.
You should check the merge policy concept, to manage, get and communicate the correct values of a persistent store coordinator between different contexts.
Here -> http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/CoreData/Articles/cdChangeManagement.html#//apple_ref/doc/uid/TP30001201-CJBDBHCB
That should resolve the problem.
Hope this can help.

NHibernate concurrency / cache problem

I have two applications running on a machine, where NHibernate is used as an ORM. One app is managing objects (CRUD operations), while the other is processing the objects (get, process, set status and save).
First I let the processing app process an object and set the status to processed. Then I change a text property manually in the database and reset the status (to make it process it again). The manual DB edit is to simulate the managing app. Then I start to see problems:
The read object still has the old text property, event though I've changed it in the DB. I guess NHibernate caching is the problem here.
When I set the object's status to processed, it uses all properties in the where clause when updating, which means it doesn't get updated in the database. This is because it has the wrong text in a property. I would guess this also has to do with caching.
The consequence of the status not being updated is that the same object (with wrong text) is processed over and over and over...
Anyone out there who can help me with how I should set up NHibernate to make this problem disappear?
Better call refresh method on the object you want, because flush can have unwanted side-effects.