I want to make a sequence of in-memory operations atomic. I presume there is no framework supplied functionality for this and that I would have to implement my own rollback functionality using memento (or something)?
If it needs to be really atomic there is no such thing AFAIK in the Framework itself - an interesting link discussing this issue.
What you ask is called STM (Software Transactional Memory) and is an inherent part for example of Haskell.
Basically any implementation uses some sort of copy meachnism - either keeping the old data till the transaction is commited OR makring a copy first and then do all "changes" on the copy and switch references on commit... anyway always some log and/or copying mechanism involved...
For C# check these links out:
http://research.microsoft.com/en-us/downloads/6cfc842d-1c16-4739-afaf-edb35f544384/default.aspx
http://download.microsoft.com/download/9/5/6/9560741A-EEFC-4C02-822C-BB0AFE860E31/STM_User_Guide.pdf
http://blogs.msdn.com/b/stmteam/
IF F# is an option then check these links out:
http://cs.hubfs.net/blogs/hell_is_other_languages/archive/2008/01/16/4565.aspx
http://geekswithblogs.net/Podwysocki/archive/2008/02/07/119387.aspx
Another option could be to use an "in-memory-Database" - there are several out there with transaction support thus providing atomic operation via the DB... as long as the DB is "in-memory" it should perform well
Related
In an earlier blog written by members of CockroachDB: https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mapping-table-data-to-key-value-storage/, the author states that CockroachDB's key-value API supports a ConditionalPut(key, value, expected-value). Given that CockroachDB was built on RocksDB, how were they able to support conditional put?
CockroachDB implements ConditionalPut using the same mechanism it uses for ACID read-write transactions. Key-values are stored along with a multi-version concurrency control timestamp. To do a ConditionalPut, the storage client reads the existing value "as of" the same timestamp it's going to write the new value at. Since the write being discussed here is the write to a secondary index, there's already an implicit or explicit transaction happening, so there's no extra overhead beyond the read to check the precondition.
After going through multiple stack overflow posts and blog articles, I came to this decision that we need UnitOfWork design pattern to maintain transactional integrity while writing the domain objects to their respective repositories.
However, we do not need such integrity while reading/searching the repository. Given that, is it a good design to separate the purposes of repositories and unit of works, with the former to be used only for reading domain objects and the later to be used only to create/write/refresh/delete domain objects?
Eric Evans, Domain Driven Design:
Implementation (of a repository) will vary greatly, depending on the technology being used for persistence and the infrastructure you have. The ideal is to hide all the inner workings from the client (although not from the developer of the client), so that the client code ill be the same whether the data is stored in an object database, a relational database, or simply held in memory....
The possibilities of implementation are so diverse that I can only list some concerns to keep in mind....
Leave transaction control to the client. Although the REPOSITORY will insert and delete from the database, it will ordinarily not commit anything. It is tempting to commit after saving, for example, but the client presumably has the context to correctly initiate and commit units of work. Transaction management will be simpler if the REPOSITORY keeps its hands off.
That said; I call your attention in particular to an important phrase in the above discussion: the database. The underlying assumption here being that all of the aggregates being modified are stored in such a way that the unit of work can be committed atomically.
When that's not the case -- for example, if you are storing aggregates in a document store that doesn't promise atomic updates of multiple documents, then you may want to consider making this separation explicit in your model, rather than trying to disguise the fact that you are trying to coordinate multiple commits.
It is entirely reasonable to use one set of repositories for your read use cases, which are distinct from those used in your write use cases. In other words, when we have different semantics, then we should have a different interface, the implementations of which can be tuned as necessary.
I'm looking over the dynamo documentation and it looks like they have optimistic. I'm wondering if this is used by default or not.
From the documentation, it looks like you need to code up the java application to use the #DynamoDBVersionAttribute annotation and get and set the versions. Without doing this, it looks like you can write to DynamoDB without any sort of locking.
Is that correct?
On a side note, I'm not too familiar with DBs without some sort of locking so what would happen if 2 people wrote to the same item at the same time in DynamoDB without any locking? Say the item we're writing to has 4 fields, would one write completely fail or is it possible that DynamoDB updates 2/4 fields with 1 write, and the other 2 fields with the other write?
You are correct. DynamoDB does NOT have optimistic locking by default. There are various SDKs for DynamoDB and as far as I am aware the only one which provides optimistic locking functionality is the Java SDK.
Here's what the Java SDK optimistic locking actually supports:
Creates an attribute in your table called version
You must load an item from the database before updating it
When you try and save an item the SDK tests that the client item version number matches the one in the table, and if it does, the save is completed and the version number is incremented
This is pretty simple to implement yourself if you are using a different SDK. You would create the version attribute yourself. You would create a wrapper for the putItem method (and any other required save/update operations). You would use the Condition Expression to test that the version number in the database is one less than the version you saving.
To answer the second part of your question, both updates would succeed (assuming you had put no conditions on your update). The first one would make any updates specified, and the second one would come along and overwrite them.
Dynamodb doesn't support optimistic locking by default. As you mentioned, you need to use the annotation in the Java model class in order to use the optimistic locking.
If two threads write to the same item, the Dynamodb item will have the last write data (i.e. last thread which writes the data).
In one of my process I have this SQL query that take 10-20% of the total execution time. This SQL query does a filter on my Database, and load a list of PricingGrid object.
So I want to improve these performance.
So far I guessed 2 solutions :
Use a NoSQL solution, AFAIK these are good solutions for improving reading process.
But the migration seems hard and needs a lot of work (like import the data from sql server to nosql in a regular basis)
I don't have any knowledge , I even don't know which one I should use (the first I'd use is Ravendb because I follow ayende and it's done by the .net community).
I might have some stuff to change in my model to make my object ok for a nosql database
Load all my PricingGrid object in memory (in a static IEnumerable)
This might be a problem when my server won't have enough memory to load everything
I might reinvent the wheel (indexes...) invented by the NoSQL providers
I think I'm not the first one wondering this, so what would be the best solution ? Is there any tools that could help me ?
.net 3.5, SQL Server 2005, windows server 2005
Migrating your data from SQL is only the first step.
Moving to a document store (like RavenDB or MongoDB) also means that you need to:
Denormalize your data
Perform schema validation in your code
Handle concurrency of complex operations in your code since you no longer have transactions (at least not the same way)
Perform rollbacks in the event of partial commits (changes)
Depending on your updates, reads and network model you might also need to handle conflicts
You provided very limited information but it sounds like your needs include a single database server and that your data fits well in the relational model.
In such a case I would vote against a NoSQL solution, it is more likely that you can speed up your queries with database optimizations and still retain all the added value of a RDBMS.
Non-relational databases are tools for a specific job (no matter how they sell them), if you need them it is usually because your data doesn't fit well in the relational model or if you have a need to distribute your data over multiple machines (size or availability). For instance, I use MongoDB for a write-intensive high throughput job management application. It is centralized and the data is very transient so the "cost" of having low durability is acceptable. This doesn't sound like the case for you.
If prefer to use a NoSQL solution perhaps you should try using Memcached+MySQL (InnoDB) this will allow you to get the speed benefits of an in-memory cache (in the form of a memcached daemon plugin) with the underlying protection and capabilities of an RDBMS (MySQL). It should also ease data migration and somewhat reduce the amount of changes required in your code.
I myself have never used it, I find that I either need NoSQL for the reasons I stated above or that I can optimize the RDBMS using stored procedures, indexes and table views in a way which is sufficient for my needs.
Asaf has provided great information in regards to the usage of NoSQL and when it is most appropriate. Given that your main concern was performance, I would tend to agree with his opinion - it would take you much more time and effort to adopt a completely new (and very different) data persistence platform than it would to trick out your SQL Server cluster. That said, my answer is mainly to address the "how" part of your question.
Addressing misunderstandings:
Denormalizing Data - You do not need to manually denormalize your existing data. This will be done for you when it is migrated over. More than anything you need to simply think about your data in a different fashion - root aggregates, entity and value types, etc.
Concurrency/Transactions - Transactions are possible in both Mongo and Raven, they are simply done in a different fashion. One of the inherent ways Raven does this is by using an ORM-like "unit of work" pattern with its RavenSession objects. Yes, your data validation needs to be done in code, but you already should be doing it there anyway. In my experience this is an over-hyped con.
How:
Install Raven or Mongo on a primary server, run it as a service.
Create or extend an existing application that uses the database you intend to port. This application needs all the model classes/libraries that your SQL database provides persistence for.
a. In your "data layer" you likely have a repository class somewhere. Extract an interface form this, and use it to build another repository class for your Raven/Mongo persistence. Both DB's have plenty good documentation for using their APIs to push/pull/update changes in the document graphs. It's pretty damn simple.
b. Load your SQL data into C# objects in memory. Pull back your top-level objects (just the entities) and load their inner collections and related data in memory. Your repository is probably already doing this (ex. when fetching an Order object, ensure not only its properties but associated collections like Items are loaded in memory.
c. Instantiate your Raven/Mongo repository and push the data to it. Primary entities become "top level documents" or "root aggregates" serialized in JSON, and their collections' data nested within. Save changes and close the repository. Note: You may break this step down into as many little pieces as your data deems necessary.
Once your data is migrated, play around with it and ensure you are satisfied. You may want to modify your application Models a little to adjust the way they are persisted to Raven/Mongo - for instance you may want to make both Orders and Items top-level documents and simply use reference values (much like relationships in RDBMS systems). Watch out here though, as doing so sort-of goes against the principal and performance behind NoSQL as now you have to tap the DB twice to get the Order and the Items.
If satisfied, shard/replicate your mongo/raven servers across your remaining available server boxes.
Obviously there are tons of little details I did not explain, but that is the general process, and much of it depends on the applications already consuming the database and may be tricky if more than one app/system talks to it.
Lastly, just to reiterate what Asaf said... learn as much as you can about NoSQL and its best use-cases. It is an amazing tool, but not golden solution for all data persistence. In your case try to really find the bottlenecks in your current solution and see if they are solvable. As one of my systems guys says, "technology for technology's sake is bullshit"
We are setting up a Jboss cluster and we are building an own distributed cache solution built upon Jboss cache (Cant use it as 2nd level cache to ORM layer in our case). We want to use invalidation and not replication as cache mode. As far as i can see after (very) little testing both solutions seem to work, objects are put into the cache and objects seem to be evicted when they are updated on any of the servers.
This leads me to believe that PojoCache with AOP instrumentation is only needed when using replication so that you can replicate only updated field values and not whole objects. Am I correct here or are there any other advantages with using PojoCache over TreeCache in our scenario? And if PojoCache have advantages, do we still need AOP instrumentation and to annotate our entities with #PojoCacheable (yes, we are using JBCache 1.4.1) since we are not using relication?
Regards
Jonas Heineson
PoJoCache has the ability through AOP to:
only replicate changed fields and not whole objects. Makes a difference if e.g. your person object containes a huge image of the person and you only change the password
detect changes and thus can automatically put them on the list to be replicated.
TreeCache (plain) does not need AOP, but can thus not replicate individual fields or detect what has changed so that you need to trigger replication yourself.
If you don't replicate, those points are probably irrelevant.
IIrc, you don't need the #PojocaCacheable annotation for Pojo cache - without it, you need to specify the classes to be enhanced in a different way.
I have the feeling that if you are not replicating, the plain TreeCache will be enough.