Strict vs NonStrict NHibernate cache concurrency strategies - nhibernate

This question is about the difference between ReadWrite and NonStrictReadWrite cache concurrency strategies for NHibernate's second level cache.
As I understand it, the difference between these two strategies is relevant when you have a distributed replicated cache - nonstrict won't guarantee that one cache has the exact same value as another cache, while strict read/write should - assuming the cache provider does the appropriate distributed locking.
The part I don't understand is how the strict vs nonstrict distinction is relevant when you have a single cache, or a distributed partitioned (non replicated) cache. Can it be relevant? It seems to me that in non replicated scenarios, the timestamps cache will ensure that stale results are not served. If it can be relevant, I would like to see an example.

What you assume is right: in a single target/thread environment there's little difference. However, if you look at the cache providers, there is a bit going on even in a multi-threaded scenario.
How an object is re-cached from its modified state differs between the two. Suppose an object is expensive to reload and you would rather pay that cost right after an update than pass the bill to the next user; then you'll see different performance with strict vs non-strict. Non-strict simply dumps the object from the cache after an update is performed, so the price is paid for the fetch on the next access instead of in a post-update event handler. In the strict model, the re-cache is taken care of automatically. A similar thing happens with inserts: non-strict does nothing, whereas strict goes behind and loads the newly inserted object into the cache.
In non-strict you also have the possibility of a dirty read: because the cache isn't locked at the time of the read, you would not see the result of another thread's change to the item. In strict, the cache key for that item would be locked, so you would be held up but would see the absolute latest result.
So, even in a single target environment, if there is a large amount of concurrent reads/edits on objects then you have a chance to see data that isn't really accurate.
This of course becomes a problem when a save is performed and an edit screen is loading: the person thinking they're editing the latest version of the object really isn't, and they're in for a nasty surprise when they try to save the edits to the stale data they loaded.
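To make the difference described in this answer easier to picture, here is a toy model in Python. It is not NHibernate code: the class names, the `after_update` hook and the single cache-wide lock are all invented for illustration, and it only mirrors the behaviour as this answer describes it (non-strict evicts on update, strict locks and re-caches).

```python
import threading


class NonStrictCache:
    """Toy model: an update just evicts the entry; nothing is locked."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def put(self, key, value):
        self._items[key] = value

    def after_update(self, key, new_value):
        # The new value is ignored: the next reader pays for the reload.
        self._items.pop(key, None)


class StrictCache(NonStrictCache):
    """Toy model: reads wait on a lock while an update is applied, and the
    updated value goes straight back into the cache.

    One lock guards the whole cache here for brevity; the answer above
    talks about locking per cache key.
    """

    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:  # a reader is held up during an update
            return super().get(key)

    def after_update(self, key, new_value):
        with self._lock:
            self._items[key] = new_value


def read_through(cache, key, load_from_db):
    """Return the cached value, falling back to the (slow) database load."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.put(key, value)
    return value
```

With the non-strict model, the first `read_through` after an update calls `load_from_db`; with the strict model it is served from the cache.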

I have created a post here explaining the differences. Please have a look and feel free to comment.

Related

cloudflare Durable Objects update object value

Halo! I'm recently diving into cloudflare Workers, especially Durable Objects. I could make a simple request which put a js object into the assigned key. Let's say the key is key0, and the put object value is {"fieldA": "val0", "fieldB": "val1"}. In this case, how can i update the field-value of fieldA without removing fieldB? I've tried simply executing put("key0", {"fieldA": "newVal0"}) and it has kept removing {"fieldB": "val1"}.
Of course it is a common behaviour in js operations, but i cannot find out anything like ~["key0"]["fieldA"] = "newVal0" in docs(maybe i'm missing sth). OTL
Hope this question reach to the gurus in the community! Thanks in advance [:
EDIT after the answers:
In theory, it would be wonderful if Cloudflare Durable Objects supported and worked just like a normal JS object. Such a Workers feature feels like a killer app for cloud DB services, since the average CPU time is quite fast and Cloudflare also has super low pricing compared to the other big players. If it happens, I would be eager to migrate everything to the Cloudflare platform [:
Durable Objects' KV storage only supports get and put operations -- it doesn't have any sort of "update". So, you have two options:
1) get() the key, modify it, and then write the modified version back. This may sound inefficient, but keep in mind that commonly-accessed keys will likely be in in-memory cache. In fact, this get/modify/put implemented in your JavaScript is probably about as fast as any modification operation that Durable Objects itself could possibly implement built-in. That said, you probably don't want to use this approach with large objects, since the whole object has to be written to disk again after every update.
2) Split your object across multiple keys. E.g. instead of having the key foo map to {"fieldA": "val0", "fieldB": "val1"}, you could have separate keys foo:fieldA and foo:fieldB. Note that you can fetch all the keys at once using storage.list({prefix: "foo:"}). This approach is not as convenient but allows each field to be written separately to disk.
get and put deal with whole JS objects, so if you want to change part of the object you should get it, update it using normal JS, and then put the entire object back.
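The two approaches from the answers can be sketched against a small in-memory stand-in. This is a language-neutral illustration written in Python, not the Workers API: `InMemoryStorage`, `update_fields`, `put_field` and `read_object` are hypothetical names, and the stand-in only imitates a store that offers get, put and a prefix listing.

```python
class InMemoryStorage:
    """Stand-in for a key-value store that only offers get, put and list."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        value = self._data.get(key)
        # Return a copy so callers cannot change stored state in place.
        return dict(value) if isinstance(value, dict) else value

    def put(self, key, value):
        self._data[key] = dict(value) if isinstance(value, dict) else value

    def list(self, prefix=""):
        return {k: v for k, v in sorted(self._data.items()) if k.startswith(prefix)}


def update_fields(storage, key, changes):
    """Approach 1: get the whole object, merge the changes, put it back."""
    current = storage.get(key) or {}
    current.update(changes)
    storage.put(key, current)
    return current


def put_field(storage, key, field, value):
    """Approach 2: one storage key per field, e.g. "foo:fieldA"."""
    storage.put(f"{key}:{field}", value)


def read_object(storage, key):
    """Reassemble a split object from a single prefix listing."""
    prefix = f"{key}:"
    return {k[len(prefix):]: v for k, v in storage.list(prefix).items()}
```

Calling `update_fields(storage, "key0", {"fieldA": "newVal0"})` leaves `fieldB` untouched, which is the behaviour the question asks for. In a real Durable Object the same steps would be written in JavaScript against the object's own storage.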

Use cases of Event Sourcing, when we don't care about past states

I have been reading about Event Sourcing pattern, I have seen it used in the projects I have worked on, but I am still yet to see any benefit of it, while it makes the design much more complicated.
That is, many sources mention that Event Sourcing is good if you want to see Audit Log, be able to reconstruct the state of 15 days ago and I see that Event Sourcing solves all of that beautifully. But apart from that, what is the point?
Yes, I can imagine that if you are in the relational world, then writes are comparatively slow as they lock the data and so on. But it is much easier to solve this problem by going NoSQL and using something like Cassandra. Cassandra's writes are super fast, as they are append-only (kind of a temporary event source), and it scales beautifully as well. Sources also mention that Event Sourcing helps scaling - how on earth can it help you scale when, instead of storing ~1 row of data per user, you now have 9000, and instead of retrieving that single row, you are now replaying 9000 rows (or fewer, if you complicate the design even more and add some temporal snapshots of state and replay the current state from the last snapshot)?
Any examples of real life problems that Event Sourcing solves or links would be much appreciated.
While I haven't implemented a distributed, event-sourced sub-system as yet (so I'm no expert), I have been researching and evaluating the approach. Event sourcing provides a number of key benefits:
Reliability
Scalability
Evolvability
Audit
I'm sure there are more. To a large extent, the benefits of event sourcing depend on the baseline you are comparing it against (CRUD, event-driven DDD, CQRS, or whatever), and the domain.
Let's look at each of those in turn:
Reliability
With event driven systems that fire events whenever the system is updated, you often have a problem: how do you both update the system state and fire the event in one go? If the 2nd operation fails, your system is in a broken, inconsistent state. Event sourcing provides a neat solution to this, since the system only requires a single operation for the state change, which will either succeed or fail atomically: the writing of the event. Other solutions tend to be more complex and less scalable - 2 phase commit, etc.
This is a big benefit in a large, high transaction system, where components are failing, being updated or replaced all the time while transactions are going on. The ability to terminate a process at any time without any worry about data corruption or consistency is a big benefit and helps you sleep at night.
In many domains you won't have concurrent writes to the same entities, or you won't require events since a state change has no knock-on effects, in which case event sourcing is unlikely to be a good approach, and simpler approaches like CRUD may be fine.
Scalability
First of all, event streams make consistent writes very efficient - it's just an append only log, which makes replication and 'compare and set' simple to optimise. Something like Cassandra is quite slow in the scenario where you need to protect your invariants - that is, you need to validate a command against the current state of a 'row', and reject the update if the row changes before you have a chance to update it. You either need to use 'lightweight transactions' to ensure consistency, or have a single writer thread per partition, so that you can be sure that you can successfully validate a command against the current state of the system before allowing the update. Of course you can implement an event store in Cassandra, using either of these approaches (single thread/lightweight transactions).
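As a rough illustration of the 'compare and set' append mentioned above, here is a minimal in-memory sketch in Python. The `EventStore` class, the `withdraw` command and the event shapes are assumptions made up for this example; a real event store would perform the version check and the append as one atomic operation, which this single-threaded toy does not attempt.

```python
class ConcurrencyError(Exception):
    """Raised when the stream changed since the caller last read it."""


class EventStore:
    """In-memory append-only streams with a compare-and-set style append."""

    def __init__(self):
        self._streams = {}

    def read(self, stream_id):
        events = list(self._streams.get(stream_id, []))
        return events, len(events)  # the events plus the current version

    def append(self, stream_id, event, expected_version):
        stream = self._streams.setdefault(stream_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}"
            )
        stream.append(event)
        return len(stream)


def withdraw(store, account_id, amount):
    """Validate a command against current state, then append one event."""
    events, version = store.read(account_id)
    balance = sum(e["amount"] for e in events)
    if amount > balance:
        raise ValueError("insufficient funds")  # the invariant is protected
    return store.append(
        account_id, {"type": "Withdrawn", "amount": -amount}, version
    )
```

If another writer appends to the same stream between `read` and `append`, the version no longer matches and the command is rejected rather than applied against stale state.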
Read scalability is the biggest performance benefit though - since you can build as many different eventually consistent projections (views) on the data as you want by reading from event streams, and horizontally scale query services on these views as much as you want. These views can use custom databases (Cassandra, graph databases) as necessary to allow queries to be optimised as much as you want. They can store denormalised data, to allow all required data to be fetched in a single (non-joined) database query. They can even store the projected state in memory, for maximum performance. While this can potentially be achieved without event sourcing, it is much more complex to implement.
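A small sketch of what "different projections on the same events" can look like, in Python. The event types and field names (`OrderPlaced`, `OrderCancelled`, `customer`, `total`) are invented for the example, and the views are plain in-memory values rather than separate databases.

```python
from collections import defaultdict

ORDER_EVENTS = [
    {"type": "OrderPlaced", "customer": "ann", "total": 40},
    {"type": "OrderPlaced", "customer": "bob", "total": 15},
    {"type": "OrderPlaced", "customer": "ann", "total": 10},
    {"type": "OrderCancelled", "customer": "bob", "total": 15},
]


def spend_per_customer(events):
    """Projection 1: a denormalised view of net spend per customer."""
    view = defaultdict(int)
    for event in events:
        if event["type"] == "OrderPlaced":
            view[event["customer"]] += event["total"]
        elif event["type"] == "OrderCancelled":
            view[event["customer"]] -= event["total"]
    return dict(view)


def open_order_count(events):
    """Projection 2: a different view computed from the same events."""
    count = 0
    for event in events:
        if event["type"] == "OrderPlaced":
            count += 1
        elif event["type"] == "OrderCancelled":
            count -= 1
    return count
```

Each projection only reads the stream, so each can live in its own query service and be scaled or replaced independently of the others.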
If you don't have complex querying and high scalability requirements, event sourcing may not be the right solution.
Evolvability
If you need to look at your data in a new way, say you create a new client app or screen in an app, it's very easy to add new projections of the event streams as new, independent services. If you need to add some data to an existing read view that you missed, or fix a bug in the read view, you can just rebuild the views using the event streams and throw away the old ones. The advantages here vs. the non-event sourced case are:
You don't need to write both DB migration code and then code to keep the view up to date as events come in. Instead, you just write the code to keep it up to date, and run it on the events from the start of time.
Related to this, you can do the update without having to bring down the query service to do a schema change - instead, just leave the old service version running against the old DB, generate a new DB with the new service version, and when it's caught up with the event streams, just atomically switch over then clean up the old service and DB once you're happy the new one is stable (noting that the old service will be keeping itself up to date in the meantime, if you need to roll back!). This is likely to be extremely difficult to achieve without event sourcing.
If you need any temporal information to be added to your views (e.g. when was the last update, when was this created), that's already available and easy to add, but impossible to add retrospectively without event sourcing.
Note that the above isn't about modifying event streams (which is trickier, see my comment on challenges below) - it's about using the existing event streams to enhance a view or create a new one.
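Here is a minimal sketch, in Python, of enhancing a read view by replaying the existing events. The event shapes and the `at` timestamp field are assumptions for the example; the point is only that the second view gets its temporal columns from events that were already stored.

```python
ITEM_EVENTS = [
    {"type": "ItemCreated", "id": "a", "name": "Widget", "at": "2024-01-05T09:00:00Z"},
    {"type": "ItemRenamed", "id": "a", "name": "Gadget", "at": "2024-02-01T12:30:00Z"},
    {"type": "ItemCreated", "id": "b", "name": "Doohickey", "at": "2024-02-03T08:15:00Z"},
]


def build_view_v1(events):
    """Original read view: just the current name of each item."""
    view = {}
    for event in events:
        view[event["id"]] = {"name": event["name"]}
    return view


def build_view_v2(events):
    """New read view: also records when each item was created and last updated.

    The v1 view is not migrated; v2 is rebuilt from the first event onwards.
    """
    view = {}
    for event in events:
        row = view.setdefault(event["id"], {"created_at": event["at"]})
        row["name"] = event["name"]
        row["updated_at"] = event["at"]
    return view
```

The old and new views can be built side by side from the same stream, which is what makes the switch-over described above possible.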
There are simple ways to do this without event sourcing, such as using database views (with an RDBMS), but they aren't as scalable.
Event sourcing also has some challenges for evolvability - you need to take care of event versioning, probably using a combination of weak event schema (so you can add properties with default values) and stream replacement (when you want to do a bigger change to your events). Greg Young is writing a good book on this.
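One possible reading of "weak event schema" is sketched below in Python: older events are read through a function that fills in properties added later with default values. The `DEFAULTS` table, the `upcast` helper and the `CustomerRegistered` event are hypothetical; this only illustrates the adding-properties case, not stream replacement.

```python
DEFAULTS = {
    # Properties added after the first events were written, with the value
    # to assume for older events that lack them.
    "CustomerRegistered": {"newsletter": False, "country": "unknown"},
}


def upcast(event):
    """Return a copy of the event with missing properties filled in."""
    filled = dict(DEFAULTS.get(event["type"], {}))
    filled.update(event)  # values actually stored always win over defaults
    return filled
```

Projections then consume `upcast(event)` instead of the raw event, so they can rely on the newer properties being present without rewriting the stored stream.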
Audit
As you mentioned, you're not interested in this.

nhibernate lazy loading uses implicit transaction

This seems to be a pretty common problem: I load an NHibernate object that has a lazily loaded collection.
At some later point, I access the collection to do something.
I still have the nhibernate session open (as it's managed per view or whatever) so it does actually work but the transaction is closed so in NHprof I get 'use of implicit transactions is discouraged'.
I understand this message and since I'm using a unit of work implementation, I can fix it simply by creating a new transaction and wrapping the call to the lazy loaded collection within it.
My problem is that this doesn't feel right...
I have this great NHibernate framework that gives me nice lazy loading but I can't use it without wrapping every property access in a transaction.
I've googled this a lot, read plenty of blog posts, questions on SO, etc, but can't seem to find a complete solution.
This is what I've considered:
1) Turn off lazy loading. I think this is silly, it's like getting a full on sports car and then only ever driving it in eco mode. Eager loading everything would hurt performance and if I just had ids instead of references then why bother with NHibernate at all?
2) Keep the transaction open longer. Transactions should not be long lived and keeping one open as long as a view is open would just be asking for trouble.
3) Wrap every lazy load property access in a transaction. Works but is bloaty and error prone. (i.e. if I forget to wrap an accessor then it will still work fine. Only using NHProf will tell me the problem)
4) Always load all the data for the properties I might need when I load the initial object. Again, this is error prone, both with loading data that you don't need (because the later call to access it has been removed at some point) and with not loading data that you do need.
So is there a better way?
Any help/thoughts appreciated.
I had the same feelings when I first encountered this warning in NHProf. In web applications I think the most popular way is to keep a transaction (and unit of work) open for the whole duration of the request. For desktop applications, managing transactions (as well as sessions) may be painful. You can use an automatic transaction management framework (e.g. Castle) and declare with attributes which service methods should run within a transaction. With this approach you can wrap multiple operations into a single transaction depending on your requirements. I have also used a session-per-view approach with one open session per view and manual transaction management (in this case I just ignored the profiler warnings about implicit transactions).
As for your considerations: I strongly advise against 2) and 3). 1) and 4) are points to consider. But the general advice is: think, then try different approaches and find the solution that best suits your particular situation.

(Fluent) NHibernate progress events for lengthy transactions?

We've hooked up the ISaveOrUpdateEventListener event and hoped we could tie it to a progress bar update for each node being visited during the save traversal of a pretty big model, BUT the event only fires once, when the save operation starts (only on the node on which the Save( ) was initiated and not on any subnodes).
Are there any other events that are more appropriate to listen to for this?
We've also tried breaking up the save operation (of a hierarchical model) by doing the traversal ourselves, but that seems to degrade the performance even further.
Perhaps we're trying to solve a problem for which FNH wasn't aimed to be used. We're new to it.
We've also set up an alternative solution using SqlBulkCopy, as recommended elsewhere.
We've seen the comments that FNH is primarily intended for smaller transactions (OLTP) and not the type of exhaustive model we're bound to by our problem (signal processing of huge data volumes).
Background:
We're trying to use Fluent NHibernate on a larger database project with data gathered from fairly complex real time analysis (high frequency, multiple input signals, long experiment times etc). In a prototype we've built we see pretty scary wait times for the moment, and need to hook in some sort of reliable progress indicator.
Yes, now confirmed - as mentioned in my comment above. One (possible) solution to this is to simply turn off Cascades, do the model traversal manually and make explicit Save( ) calls.
This works, although it's not as neat as just handling an event. Still, given the genuine design of NHibernate, I bet there's certainly an event somewhere that could be intercepted - the question is just under what name. ... I bet someone on here knows more.
Also, to improve performance we used a Stateless Session, experimented with different batch sizes, and periodically/explicitly called Flush() and Clear(). See the articles below for further details:
http://davybrion.com/blog/2008/10/bulk-data-operations-with-nhibernates-stateless-sessions/
http://ideas-net.blogspot.com/2009/03/nhibernate-update-performance-issue.html
Hope this helps.
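The manual traversal with explicit saves, progress reporting and periodic flush/clear might be structured roughly like this. The sketch is in Python with a recording stand-in instead of a real NHibernate session: `RecordingSession`, `walk`, `save_tree` and the dictionary-based node shape are all hypothetical, and whether clearing the session mid-traversal is safe depends on how the real entities reference each other.

```python
class RecordingSession:
    """Stand-in for an ORM session; it only records the calls made to it."""

    def __init__(self):
        self.calls = []

    def save(self, node):
        self.calls.append(("save", node["name"]))

    def flush(self):
        self.calls.append(("flush",))

    def clear(self):
        self.calls.append(("clear",))


def walk(node):
    """Yield a node and then all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def save_tree(session, root, batch_size, on_progress):
    """Save every node explicitly, reporting progress after each one."""
    nodes = list(walk(root))
    for done, node in enumerate(nodes, start=1):
        session.save(node)
        if done % batch_size == 0 or done == len(nodes):
            session.flush()  # push the pending batch to the database
            session.clear()  # drop already-saved objects from the session
        on_progress(done, len(nodes))
    return len(nodes)
```

The `on_progress(done, total)` callback is where a progress bar update would be hooked in, since it runs once per node rather than once per top-level save.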

Is there any reason I shouldn't cache in nHibernate?

I've just discovered the joy of Cache.ReadWrite() in fluent nHibernate, and have been analyzing the results with nhprof extensively.
It seems to be quite useful, but that seems a bit deceptive. Is there any particular reason I wouldn't want to cache a very frequently used object from a query? I mean, I have to presume I should not just go around decorating every single Mapping with a Cache property ... or should I?
As usual, it depends :)
If something has potential to be updated by background processes that don't use the second level cache, or changed directly in the database, caching will cause problems.
Entities that are infrequently accessed may not be good candidates for second level caching either, as they will just take up space.
Also, you may see some weirdness if you have collections mapped as Inverse - the changes will not be picked up by the second level cache correctly and you'll need to manually evict the collection.
As sJhonny points out below, if you have a web farm scenario (or anywhere your app is running on several servers), you'll need to use a distributed cache (like memcached) instead of the built-in ASP.NET cache.