NHibernate concurrency / cache problem - nhibernate

I have two applications running on a machine, where NHibernate is used as an ORM. One app is managing objects (CRUD operations), while the other is processing the objects (get, process, set status and save).
First I let the processing app process an object and set the status to processed. Then I change a text property manually in the database and reset the status (to make it process it again). The manual DB edit is to simulate the managing app. Then I start to see problems:
The read object still has the old text property, event though I've changed it in the DB. I guess NHibernate caching is the problem here.
When I set the object's status to processed, it uses all properties in the where clause when updating, which means it doesn't get updated in the database. This is because it has the wrong text in a property. I would guess this also has to do with caching.
The consequence of the status not being updated is that the same object (with wrong text) is processed over and over and over...
Anyone out there who can help me with how I should set up NHibernate to make this problem disappear?

Better call refresh method on the object you want, because flush can have unwanted side-effects.

Related

Fix inconsistent state right away or lazily when data is requested

Our users go through several steps of workflow - the further they go the more objects we create. We also allow users to go back to Step#1 and change one of the existing objects. Which may cause inconsistencies so we must update/delete some of the objects at Step#2. I see 2 options:
Update/delete objects from Step#2 right away. This leads to:
Operation that's supposed to be a simple PATCH of an entity field becomes complicated. And it's a shared object between multiple workflows - so we'll have to add if-statements and do different things depending on the workflow.
Circular dependencies. Operations on Step#1 have to know about objects/operations on Step#2.
On each request in Step#1 we'd have to load data for Step#2 in order to determine whether Step#2 really needs to be updated. Which slows down operations on Step#1. So to change 1 record in DB we'll have to load hundreds (or even thousands) records for Step#2.
Many actions on Step#1 may need fixing state at Step#2. So we have to ensure we don't forget anything today and in the future.
Fix Step#2 lazily - when user goes there (our current approach). Step#2 will recognize that objects are inconsistent and fix them. Which leads to just 1 place where we need to care, but:
Until user opens Step#2 - DB will contain inconsistent objects. This hasn't resulted in any problems so far. But I can imagine it may complicate future SQL migrations.
We update DB state on GET request. This one doesn't seem like that big of a deal since GET stays idempotent anyway. But still it feels awkward.
Anyone knows better approaches? Or maybe improvements to these two?
Update
I haven't found perfect solution, but eventually we implemented an improved version of #1. When updating state on Step#1 we also set a flag "need to rebuild Step#2", when UI opens Step#2 it first checks this flag and issues a PUT to rebuild the state, and only then it GETs Step#2.
This still means that DB state is inconsistent for some period of time. But at least we'll know this for sure from the flag in DB. And if needed - we could write migrations taking this flag into account. This also allows (if needed in the future) to create an async job to fix the state.
I think it is more flexible to separate the state and the context where the objects are stored. Any creation of a new object at any step is accompanied by the preservation of the invariant and consistency of context.
There are separate rules of states - these are rules for transition from one to another and available objects for creation and separate rules for the context, rules for its consistency, which is ensured every time it changes.
What about dirty data asynchronous cleanup?
Whenever user goes back to Step #1 and changes something, mark all related data as "dirty" (e.g. add links to it in "DirtyData" table) and be done for now.
Have a DataCleanup worker (e.g. separate thread or smth) that constantly looks for data to be cleaned up.
Before editing data for Step #2, check if the data is not dirty.
Depending on your logic, 3) might result in user error (e.g. user would need to repeat Step #2). If DataCleanup worker has enough resources (i.e. it processes DirtyData table almost instantaneously), that should happen only on very rare occasions. If that is not OK, you could opt for checking for dirty data on each fetch, but that could be expensive.
It sounds like you're familiar with the HTTP spec regarding GET requests, but for future readers:
Why shouldn't a GET request change data on the server?
Why is using a HTTP GET to update state on the server in a RESTful call incorrect?
For the other bullet under 2, we probably don't need a specification to agree that persisting valid data is preferable to persisting invalid data.
So what can we do for the bullets under 1 to avoid complex branching logic in a particular step and also circular dependencies? My suggestion is an event-driven design. When step #2 changes it should fire a change event. In this scenario, step #2 has no knowledge of the concrete listener(s) who may receive its events, so it remains decoupled from any complex handling logic.
There's probably no way to guarantee you don't forget anything in the future; but if every step in the workflow is defined as a listener, it forces you to consider change events to some extent every time you implement a new step.
One side note on granularity: if a step has many changes, it can batch up its events rather than fire each one individually. You can adjust the size for efficiency.
In summary, I would strongly consider the Observer design pattern.

Asynchronous Object Construction in Core Data

I'm currently working on an app with a reasonably complex Core Data model. The data model currently has 10 tables in it, with a bunch of relationships set between them. The data for the model is obtained piecemeal from a remote server. In order to minimize the amount of traffic to/from the server, the server API passes object ID's first, giving me a chance to discover if I already have stored the objects. If not, then I can ask the server for the full objects and store them. However, those objects can have references to other objects, for which I will need to check follow the same process: check if I have the object(s) and, if not, grab the objects from the server. The Core Data model includes fields for the server IDs which I use to validate and construct Core Data's object graph.
This creates a situation where objects will have been instantiated in Core Data, but won't have been completely constructed as they may be waiting for referenced objects to be returned by the server (which may, in turn, need to wait for their own reference objects).
So my first attempt to deal with this was to create a semaphore that would not allow the object context to be saved (I only save the context in one place) until all objects are downloaded and the object graph is constructed. The problem I ran into was that the context was being saved anyway, without me asking. This results in a ton of changes propagating through NSFetchedResultsController as objects are downloaded from the server and the object graph is being constructed. Moreover, the propagated objects may not be complete.
Has any dealt with anything like this? I think this could all work if I could explicitly control when Core Data saves, but that does not appear to be possible. Or am I missing something?
UPDATE
I was missing something. I was under the impression that NSFetchedResultsController received updates when the Context is saved. This is not true. It receives updates whenever processPendingChanges is called in the context, which occurs at the end of an event cycle. In the past, I've always used two contexts to keep updates separate from the UI, but this project had a deadline and existing code that kept me from refactoring. Given this new information, I think the separate context will fix my problem.
That is an extremely expensive way to sync with a server. Is there a reason your server can't respond to "changed since X" calls and give you everything? In your current design you are spending more time opening and closing sockets than you are receiving data.
Be that as it may, you want to do all of this processing in a secondary context that is connected directly to the NSPersistentStoreCoordinator. When it saves you want to capture the NSManagedObjectContextDidSaveNotification and then have your UI context consume that notification. That will update your UI when your server sync is complete.
This will keep your syncing 100% isolated from the UI and allow the UI to save or do whatever else it needs to do while you are working with the server. I would not use a parent/child design here. There is no reason to.
You access a core data database via the NSManagedObjectContext class.
Each context object must belong to a single thread, and any NSManagedObjects that context creates belong to the same thread.
Do not read or write any managed object from a thread other than the one that created it. If you do, you'll end up with unpredictable and impossible to debug data corruption problems.
However, you can have multiple NSManagedObjectContext instances for a single core data database, each one on a different thread, and you can merge any changes made to the context in one thread over to a context on another thread.
So, basically, you have a "main" NSManagedObjectContext which is used on the main thread, and used for almost all your operations. And then when you need to do something on another thread you create a "child" context for that thread, make all your changes, then merge those changes back to the main context on the main thread.
You can find specific details how to implement this from Apple's official documentation. Start reading here:
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/CoreData/Articles/cdConcurrency.html#//apple_ref/doc/uid/TP40003385-SW1

Core data : how to undo operations once managed objects are saved with context

I am trying to implement downloading of bulk data from several tables on the server.
In my case there are 16 tables. For all these tables I will be firing 10 requests to the server. This means I have done a bit of logical groupings for related tables, but it is like all tables are inter-related with each other through one or the other relationship.
I need to consider three cases while doing downloading:
Saving data to each table at local.
Managing relationships between inserted objects.
Handling situation when one of the requests fails during download, say 8th request failed.
I will be following this approach for each response:
Inserting data in managed object context.
Managing relationships by firing NSPredicate and associating the related objects.
Saving the context.
In case of a response failure, I have two options:
Next time continue from the failed response.
Revert all saved data to its previous state.
1st approach may lead to some data inconsistency, so I am going with 2nd approach.
I know that if a managed object context is not saved, we can revert the changes, but
is it possible to revert the changes, if the managed object context is
saved?
I require some useful answers from the community.
Please suggest.
Is it possible to revert the changes, if the managed object context is saved?
After saving? Maybe, but it could be tricky. If you set up a separate managed object context for your network operations, and give it an NSUndoManager, you could later on tell the undo manager to roll everything back to the previous state.
It would be simpler to just not save changes until you're finished, though. Using an undo manager doesn't really help much-- the memory needed to store up all the undo actions will at least match the memory use from keeping all of the unsaved changes around until you're finished. If you're working on a separate managed object context (whether a child context or a completely separate context), handling the error case is as simple as letting the MOC get deallocated without saving changes first.

Repeating a query does not refresh the properties of the returned objects

When doing a criteria query with NHibernate, I want to get fresh results and not old ones from a cache.
The process is basically:
Query persistent objects into NHibernate application.
Change database entries externally (another program, manual edit in SSMS / MSSQL etc.).
Query persistence objects (with same query code), previously loaded objects shall be refreshed from database.
Here's the code (slightly changed object names):
public IOrder GetOrderByOrderId(int orderId)
{
...
IList result;
var query =
session.CreateCriteria(typeof(Order))
.SetFetchMode("Products", FetchMode.Eager)
.SetFetchMode("Customer", FetchMode.Eager)
.SetFetchMode("OrderItems", FetchMode.Eager)
.Add(Restrictions.Eq("OrderId", orderId));
query.SetCacheMode(CacheMode.Ignore);
query.SetCacheable(false);
result = query.List();
...
}
The SetCacheMode and SetCacheable have been added by me to disable the cache. Also, the NHibernate factory is set up with config parameter UseQueryCache=false:
Cfg.SetProperty(NHibernate.Cfg.Environment.UseQueryCache, "false");
No matter what I do, including Put/Refresh cache modes, for query or session: NHibernate keeps returning me outdated objects the second time the query is called, without the externally committed changes. Info btw.: the outdated value in this case is the value of a Version column (to test if a stale object state can be detected before saving). But I need fresh query results for multiple reasons!
NHibernate even generates an SQL query, but it is never used for the values returned.
Keeping the sessions open is neccessary to do dynamic updates on dirty columns only (also no stateless sessions for solution!); I don't want to add Clear(), Evict() or such everywhere in code, especially since the query is on a lower level and doesn't remember the objects previously loaded. Pessimistic locking would kill performance (multi-user environment!)
Is there any way to force NHibernate, by configuration, to send queries directly to the DB and get fresh results, not using unwanted caching functions?
First of all: this doesn't have anything to do with second-level caching (which is what SetCacheMode and SetCacheable control). Even if it did, those control caching of the query, not caching of the returned entities.
When an object has already been loaded into the current session (also called "first-level cache" by some people, although it's not a cache but an Identity Map), querying it again from the DB using any method will never override its value.
This is by design and there are good reasons for it behaving this way.
If you need to update potentially changed values in multiple records with a query, you will have to Evict them previously.
Alternatively, you might want to read about Stateless Sessions.
Is this code running in a transaction? Or is that external process running in a transaction? If one of those two is still in a transaction, you will not see any updates.
If that is not the case, you might be able to find the problem in the log messages that NHibernate is creating. These are very informative and will always tell you exactly what it is doing.
Keeping the sessions open is neccessary to do dynamic updates on dirty columns only
This is either the problem or it will become a problem in the future. NHibernate is doing all it can to make your life better, but you are trying to do as much as possible to prevent NHibernate to do it's job properly.
If you want NHibernate to update the dirty columns only, you could look at the dynamic-update-attribute in your class mapping file.

lazy-loading fails when session got disconnected

We use one (read-only) session which we disconnect as soon as we retrieve the data from the database. The data retrieved, often has lazy-loaded properties which are not initialized yet.
When we try to access the properties, the following exception gets thrown:
NHibernate.LazyInitializationException
Initializing[NHibernateTest.AppUser#16]-failed to lazily initialize a collection of role: NHibernateTest.AppUser.Permissions, session is disconnected
Is there a way (interceptor) to automatically detect that the application is trying to access an uninitialized property, so that the interceptor can quickly open the connection and close it after the unit of work?
Fetching everything at once would nullify the usage of laziness.
There is no efficient way to do that. The idea is that you keep the session open until your done with the session. There should be one session per unit of work. (a session is kind of unit of work actually).
Fetching everything your need in one query is more efficient than fetching everything you need in multiple queries, so I don't agree with your last statement. Lazy loading is useful for lazy programmers (like me) but is never more efficient than eager loading. Lazy loading can save you some programming time, but you still have to watch out for to many queries being executed (select N+1)