I have an entity with multiple fields. There are two types of actions that may be performed on it: a long one, usually initiated by the user, and a short one, which is periodically run by the system. Both of these update the entity, but they touch different fields.
There can't be two concurrent long operations or two concurrent short operations. But the system may schedule a short operation while a long operation is in progress, and the two should execute concurrently. Since they touch different fields, I believe this should be possible.
I think NHibernate's change tracking should do the trick here - i.e., if one session loads an entity and updates some fields, and another session loads the same entity and updates different fields, then the two will not collide. However, I feel I shouldn't be relying on this, because it sounds like an "optimization" or an "implementation detail". I tend to think of change tracking as an optimization to reduce database traffic, and I don't want the functionality of the system to depend on it. Also, if I ever decide to implement optimistic concurrency for this entity, I risk getting a StaleObjectStateException, even though I can guarantee that there is no actual collision.
What would be the best way to achieve this? Should I split the entity into two? Can't this affect database consistency (e.g. what if only one "half" of the entity is in the DB)? Can I use NHibernate to explicitly set only a single field of an entity? Am I wrong in not wanting to rely on change tracking to achieve functionality?
If it matters, I'm using Fluent NHibernate.
You could map the entity using dynamic update.
dynamic-update (optional, defaults to false): Specifies that UPDATE SQL should be generated at runtime and contain only those columns whose values have changed.
If you enable dynamic-update, you will have a choice of optimistic locking strategies:
version: check the version/timestamp columns
all: check all columns
dirty: check the changed columns
none: do not use optimistic locking
More information is available in the NHibernate documentation.
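For example, since the question mentions Fluent NHibernate, a minimal mapping sketch might look like this (the entity and its fields are invented for illustration):

```csharp
using System;
using FluentNHibernate.Mapping;

public class Job
{
    public virtual int Id { get; protected set; }
    public virtual string UserComment { get; set; }   // touched by the long, user-initiated operation
    public virtual DateTime LastPolled { get; set; }  // touched by the short, periodic operation
}

public class JobMap : ClassMap<Job>
{
    public JobMap()
    {
        Id(x => x.Id);
        Map(x => x.UserComment);
        Map(x => x.LastPolled);

        // Generate UPDATE statements at runtime containing only changed columns...
        DynamicUpdate();
        // ...and check only those changed columns for optimistic concurrency.
        OptimisticLock.Dirty();
    }
}
```

With the dirty strategy, two sessions that update disjoint sets of columns produce non-overlapping UPDATE statements, so they shouldn't collide or trip the optimistic lock.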
What can we use to reduce the time taken to get data from the database using Entity Framework? Any suggestions, such as caching or some other approach?
I would recommend you check the tracking options (especially .AsNoTracking()) provided by EF.
A sneak peek to begin your research:
When we use the AsNoTracking() method, we are explicitly telling Entity Framework that the entities are not tracked by the context. This can be especially useful when retrieving large amounts of data from your data store. If you want to make changes to un-tracked entities, however, you must remember to attach them before calling SaveChanges.
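A minimal sketch of both sides, assuming a hypothetical AppDbContext with a Users set:

```csharp
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// AppDbContext and the User entity are hypothetical stand-ins.
public static async Task RenameFirstActiveUserAsync(AppDbContext context)
{
    // Read-only query: entities are not tracked by the context,
    // which cuts overhead when retrieving large amounts of data.
    var users = await context.Users
        .AsNoTracking()
        .Where(u => u.IsActive)
        .ToListAsync();

    // To change an un-tracked entity, attach it before SaveChanges.
    var user = users.First();
    user.DisplayName = "New name";
    context.Users.Attach(user);
    context.Entry(user).State = EntityState.Modified;
    await context.SaveChangesAsync();
}
```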
I was reading about DDD and CQRS (using ASP.NET Core and MSSQL) and their different approaches, and then I came across the topic of separating the read and write databases. I started searching the web for how to do this and how to keep those databases in sync, but sadly (maybe I was searching wrong) I didn't find any good source explaining how.
So here is my question:
How should I separate those databases, and how should I sync the data between them? For example, I have a table called "User" that exists in both the read and the write database. If I add a new row to the table in the write database, I have to tell the read database to sync itself with the write database so that the new data is there to query and use later. But how? I also read about the Event Sourcing pattern and event-driven architecture, but they didn't help me figure out how to sync.
So does anyone know how to do this, or have any good resources about this topic that can help a dummy? :)
(Assume you're explaining it to someone who is learning it for the first time!)
Thanks!
I have a related answer that may provide some background on how to approach CQRS.
The main point to keep in mind is that the "write" side is concerned with changes/transactions (OLTP) and the "read" side is concerned with queries (OLAP).
How you update your "read" side (read model) is going to depend on how you make the "write" side changes. When using an Event Store things may be easier, in that each event has a global sequence number and each projection (read model) tracks where it is in terms of that global sequence number. So when new events arrive (the projection polls for them), they can be actioned if they apply to the projection.
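A rough sketch of that polling loop, with hypothetical event store and event types standing in for whatever client library you use:

```csharp
using System.Collections.Generic;

// Hypothetical event store abstraction; adapt to your client library.
public interface IEventStore
{
    IReadOnlyList<StoredEvent> ReadAfter(long globalSequenceNumber, int batchSize);
}

public record StoredEvent(long GlobalSequenceNumber, object Payload);
public record UserRegistered(string UserId, string Name);

public class UserListProjection
{
    private long _checkpoint; // last global sequence number this projection applied

    public void Poll(IEventStore store)
    {
        foreach (var e in store.ReadAfter(_checkpoint, batchSize: 100))
        {
            // Action only the events this projection cares about.
            if (e.Payload is UserRegistered u)
                InsertUserRow(u);

            _checkpoint = e.GlobalSequenceNumber; // persist this durably in real code
        }
    }

    private void InsertUserRow(UserRegistered u)
    {
        // Write to the denormalized read store (SQL table, document DB, ...).
    }
}
```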
If you simply update the "write" side with, say, a SQL query, then things are going to be a bit different, and possibly tricky, since you don't have any mechanism to replay those changes into the read model should you wish to change it. In such a case you could use messaging (and possibly store the messages), or make the changes to the "read" side together with the "write" side... which isn't ideal, unless you need 100% consistency.
As mentioned by @Levi Ramsey, the read model is usually quite a bit different from the write model in that it is optimised for reading, so it may include denormalized data or simply be in a data store that is more suited to read models.
The main benefit of CQRS is being able to use different data models and/or different databases for queries vs. updates. If both sides use the same data model, there's often not much benefit to CQRS (at least not with a DB like SQL Server, which is, at most scales, reasonable for both).
This in turn implies that it's generally not possible to just have the two databases automatically be in sync, because there's going to be some model translation involved (e.g. from a relational DB (with a normalized schema) like SQL Server to a denormalized document DB like Mongo).
One fairly common pattern is to have the software which writes to the DB also publish events describing what was updated to some event bus. Another piece of software subscribes to those events and performs the appropriate updates to the read DB. Note that this implies the existence of a period of time where queries against the read DB and the write DB will give different results.
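In outline, and with hypothetical bus and store interfaces, that pattern could look something like this:

```csharp
public record UserRegistered(string UserId, string Name);

// Hypothetical abstractions over your bus and the two databases.
public interface IEventBus { void Publish(object @event); }
public interface IUserRepository { void Insert(string id, string name); }
public interface IUserReadStore { void UpsertUserSummary(string id, string name); }

// Write side: persist the change, then publish an event describing it.
public class UserService
{
    private readonly IUserRepository _writeDb;
    private readonly IEventBus _bus;

    public UserService(IUserRepository writeDb, IEventBus bus)
    {
        _writeDb = writeDb;
        _bus = bus;
    }

    public void Register(string id, string name)
    {
        _writeDb.Insert(id, name);                  // normalized write model
        _bus.Publish(new UserRegistered(id, name)); // notify the read side
    }
}

// Read side: a separate subscriber keeps the read DB up to date.
public class UserReadModelUpdater
{
    private readonly IUserReadStore _readDb;

    public UserReadModelUpdater(IUserReadStore readDb) => _readDb = readDb;

    public void Handle(UserRegistered e)
    {
        // Denormalize however the queries need it. This runs asynchronously,
        // so reads may briefly lag behind writes (eventual consistency).
        _readDb.UpsertUserSummary(e.UserId, e.Name);
    }
}
```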
I'm looking over the DynamoDB documentation and it looks like they have optimistic locking. I'm wondering if this is used by default or not.
From the documentation, it looks like you need to code the Java application to use the @DynamoDBVersionAttribute annotation and get and set the versions. Without doing this, it looks like you can write to DynamoDB without any sort of locking.
Is that correct?
On a side note, I'm not too familiar with DBs without some sort of locking, so what would happen if two people wrote to the same item at the same time in DynamoDB without any locking? Say the item we're writing to has 4 fields: would one write completely fail, or is it possible that DynamoDB updates 2 of the 4 fields with one write, and the other 2 fields with the other write?
You are correct. DynamoDB does NOT have optimistic locking by default. There are various SDKs for DynamoDB and as far as I am aware the only one which provides optimistic locking functionality is the Java SDK.
Here's what the Java SDK optimistic locking actually supports:
Creates an attribute in your table called version
You must load an item from the database before updating it
When you try to save an item, the SDK tests that the client item's version number matches the one in the table; if it does, the save is completed and the version number is incremented
This is pretty simple to implement yourself if you are using a different SDK. You would create the version attribute yourself, create a wrapper for the putItem operation (and any other required save/update operations), and use a condition expression to test that the version number in the database is one less than the version you are saving.
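For example, a rough sketch of such a conditional put using the AWS SDK for .NET (the table and attribute names are invented for illustration; the same idea works in any SDK):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static async Task SaveWithVersionAsync(
    IAmazonDynamoDB client, string id, string payload, long versionWhenLoaded)
{
    var request = new PutItemRequest
    {
        TableName = "MyTable",
        Item = new Dictionary<string, AttributeValue>
        {
            ["Id"] = new AttributeValue { S = id },
            ["Payload"] = new AttributeValue { S = payload },
            ["Version"] = new AttributeValue { N = (versionWhenLoaded + 1).ToString() }
        },
        // Succeed only if the stored version is still the one we loaded.
        // ("Version" is a DynamoDB reserved word, hence the #v placeholder.)
        ConditionExpression = "#v = :expected",
        ExpressionAttributeNames = new Dictionary<string, string> { ["#v"] = "Version" },
        ExpressionAttributeValues = new Dictionary<string, AttributeValue>
        {
            [":expected"] = new AttributeValue { N = versionWhenLoaded.ToString() }
        }
    };

    try
    {
        await client.PutItemAsync(request);
    }
    catch (ConditionalCheckFailedException)
    {
        // Another writer won the race: reload the item and retry.
        throw;
    }
}
```

For brand-new items you would extend the condition to something like attribute_not_exists(Id) OR #v = :expected.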
To answer the second part of your question, both updates would succeed (assuming you had put no conditions on your update). The first one would make any updates specified, and the second one would come along and overwrite them.
DynamoDB doesn't support optimistic locking by default. As you mentioned, you need to use the annotation in the Java model class in order to use optimistic locking.
If two threads write to the same item, the DynamoDB item will hold the data from the last write (i.e. the last thread to write wins).
I'm rather used to using one database alone (say PostgreSQL or Elasticsearch).
But currently I'm using a mix (PG and ES) in a prototype app and may throw other kinds of DBs into the mix (e.g. Redis).
Say some piece of data needs to be persisted to each database in a different way.
How do you keep a system consistent in the event of a failure of one of the components/databases?
Example scenario that I'm facing:
Data is updated in PostgreSQL, but Elasticsearch is unavailable.
At this point, the system is inconsistent, as I should have updated both databases.
Since I'm using an SQL DB, I can simply abort the transaction to put the system back in its previous consistent state.
But what is the best way to keep the system consistent?
Check every time that the value has been persisted in all databases?
In case of failure, restore the previous state? But some NoSQL databases have no transaction/ACID mechanism, so I can't revert to the previous state as easily.
Additionally, if multiple databases must be kept in sync, is there any good practice to follow, like adding some kind of "version" metadata (whether a timestamp or a home-made incrementing version number) so you can put your databases back in sync? (Not talking about CouchDB, where it is built in!)
Moreover, the databases are not all updated atomically, so some parts are inconsistent for a short period. I think it depends on the business of the app, but does anyone have thoughts about the problems that may occur, or ways to fix them? I guess it must be tough and depend a lot on the configuration (for maybe very few real benefits).
I guess this may be a common architecture issue, but I'm having trouble finding information on the subject.
Keep things simple.
The search engine can and will lag behind sometimes. You may fight it. You may embrace it. It's fine, and most of the time it's acceptable.
Don't mix the data. If you use Redis for sessions, good. Don't store stuff from database A in B and vice versa.
Select a proper database with ACID guarantees and strong consistency for your Super Important Business Data™®.
Again, do not mix the data.
Using more than one database technology in one product is a decision one shouldn't make lightly. The more technologies you use, the more complex your project will become in development, deployment, maintenance, and administration. Also, every database technology becomes an individual point of failure. That means it is often much wiser to stick to one technology, even when it means you need to make some compromises.
But when you have good(!) reasons to use multiple DBMS, you should try to keep them as separated as possible. Avoid related data spanning multiple databases. Where possible, no feature should require more than one DBMS to work (preferably a failure of one DBMS would only affect the features that use it). Storing redundant data in two different DBMS should also be avoided.
When you can't avoid redundancies and relationships spanning multiple DBMS, you should decide on one system to be the single source of truth (preferably one which you trust most regarding consistency). When there are inconsistencies between systems, they should be resolved by synchronizing the data with the SSOT.
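For example, if PostgreSQL is the SSOT and Elasticsearch holds a derived index, a periodic reconciliation job can repair drift after a failure. A minimal sketch with hypothetical interfaces, using the "version" metadata idea from the question:

```csharp
using System.Collections.Generic;

// Hypothetical abstractions over the two stores.
public interface IDocumentSource    // the SSOT, e.g. PostgreSQL
{
    IEnumerable<Document> LoadModifiedSince(long version);
}

public interface ISearchIndexer     // the derived store, e.g. Elasticsearch
{
    void Index(Document doc);       // idempotent upsert keyed by doc.Id
}

public record Document(string Id, long Version, string Body);

public class ReconciliationJob
{
    private long _lastSyncedVersion; // persist this durably in real code

    public void Run(IDocumentSource ssot, ISearchIndexer index)
    {
        // Replay everything the SSOT has seen since the last successful sync.
        // Because indexing is an idempotent upsert, re-running after a failure
        // is safe: the index converges back to the SSOT's state.
        foreach (var doc in ssot.LoadModifiedSince(_lastSyncedVersion))
        {
            index.Index(doc);
            _lastSyncedVersion = doc.Version;
        }
    }
}
```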
What do I need to know when setting up caching with NHibernate in the case where I have two applications running on different servers, but only one database? Are table dependencies generally sufficient to make sure that weird caching problems don't arise? If so, what sort of poll time should I look at?
Well, in order for NHibernate to check for concurrency issues, you can add a version field to your entities. That will cause NHibernate to throw a concurrency exception when trying to update an entity that has been modified by someone else.
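In Fluent NHibernate that is a one-line addition to the mapping; a sketch with an invented entity:

```csharp
using FluentNHibernate.Mapping;

public class Order
{
    public virtual int Id { get; protected set; }
    public virtual int Version { get; protected set; } // managed by NHibernate
    public virtual string Status { get; set; }
}

public class OrderMap : ClassMap<Order>
{
    public OrderMap()
    {
        Id(x => x.Id);
        // NHibernate increments this column on every update and throws
        // StaleObjectStateException if another session changed the row first.
        Version(x => x.Version);
        Map(x => x.Status);
    }
}
```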
If you want to use the second-level cache with multiple servers, I can recommend a distributed implementation of the NHibernate second-level cache, for example NCache:
http://www.alachisoft.com/ncache/nhibernate_index.html