How to implement conditional put in RocksDB?

In an earlier blog written by members of CockroachDB: https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mapping-table-data-to-key-value-storage/, the author states that CockroachDB's key-value API supports a ConditionalPut(key, value, expected-value). Given that CockroachDB was built on RocksDB, how were they able to support conditional put?

CockroachDB implements ConditionalPut using the same mechanism it uses for ACID read-write transactions. Key-values are stored along with a multi-version concurrency control timestamp. To do a ConditionalPut, the storage client reads the existing value "as of" the same timestamp it's going to write the new value at. Since the write being discussed here is the write to a secondary index, there's already an implicit or explicit transaction happening, so there's no extra overhead beyond the read to check the precondition.
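A minimal sketch of that mechanism, assuming a hypothetical transaction handle over an MVCC key-value store (IKvTransaction, Get, and Put are illustrative names, not CockroachDB's actual API):

```csharp
using System.Linq;

// Hypothetical transaction handle over an MVCC key-value store:
// reads and writes both happen at the transaction's timestamp.
public interface IKvTransaction
{
    byte[]? Get(string key);            // read "as of" the txn timestamp
    void Put(string key, byte[] value); // buffer a write at that timestamp
}

public static class ConditionalOps
{
    // ConditionalPut: write only if the current value matches the expectation.
    // The surrounding transaction makes the read-check-write atomic.
    public static bool ConditionalPut(IKvTransaction txn, string key,
                                      byte[] value, byte[]? expected)
    {
        byte[]? current = txn.Get(key);
        bool matches = (current == null && expected == null)
                    || (current != null && expected != null
                        && current.SequenceEqual(expected));
        if (!matches)
            return false; // precondition failed; the caller can abort the txn
        txn.Put(key, value);
        return true;
    }
}
```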

Related

How to speed up database operations?

What can we use to reduce the time taken to get data from the database using Entity Framework? Any suggestions, with caching or any other way?
I would recommend you check the tracking options (especially .AsNoTracking) provided by EF.
A sneak peek to begin your research:
When we use the AsNoTracking() method we are explicitly telling Entity Framework that the entities are not tracked by the context. This can be especially useful when retrieving large amounts of data from your data store. If you want to make changes to untracked entities, however, you must remember to attach them before calling SaveChanges.
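A short sketch with EF Core (the EF6 API is similar); BlogContext and its Users set are assumed example types, not anything from the question:

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

using var context = new BlogContext();

// Read-only query: the change tracker skips these entities,
// which reduces overhead when loading large result sets.
var activeUsers = context.Users
    .AsNoTracking()
    .Where(u => u.IsActive)
    .ToList();

// To modify an untracked entity later, re-attach it first:
var user = activeUsers.First();
user.Name = "Updated";
context.Users.Update(user); // attaches and marks the entity as Modified
context.SaveChanges();

// Assumed example entity and context.
public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public bool IsActive { get; set; }
}

public class BlogContext : DbContext
{
    public DbSet<User> Users => Set<User>();
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlite("Data Source=app.db"); // illustrative connection
}
```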

SQL: Synchronize Read/Write Databases in CQRS, ASP.NET Core

I was reading about DDD and CQRS (using ASP.NET Core and MSSQL) and their different approaches, and then I read about separating the read and write databases. I started searching the web for how to do this and how to sync those databases, but sadly (maybe I was searching wrong) I didn't find any good source on how to do so.
So here is my question:
How should I separate those databases, and then how should I sync the data between them? For example, I have a table called "User" that exists in both the separated read and write databases. If I add a new row to the table in the write db, I have to tell the read db to sync itself with the write db so that I have the new data there to query and use later. But how? I also read about the Event Sourcing pattern and event-driven architecture, but they didn't help me figure out how to do the sync.
So, does anyone know how to do this, or have any good resources on this topic that could help a beginner? :)
(Consider that you're explaining to someone who is learning this for the first time!)
Thanks!
I have a related answer that may provide some background on how to approach CQRS.
The main point to keep in mind is that the "write" side is concerned with changes/transactions (OLTP) and the "read" side is concerned with queries (OLAP).
How you update your "read" side (read model) is going to depend on how you make the "write" side changes. When using an event store, things may be easier in that each event has a global sequence number and each projection (read model) tracks where it is in terms of that global sequence number. So when new events arrive (the projection polls for them), they can be actioned if they apply to the projection.
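A minimal sketch of such a polling projection, assuming a hypothetical event store that hands out events ordered by a global sequence number (all names here are illustrative):

```csharp
using System.Collections.Generic;

// A stored event with its position in the global event log.
public record StoredEvent(long GlobalSequence, string Type, string Payload);

// Hypothetical event store interface.
public interface IEventStore
{
    // Returns events with GlobalSequence greater than 'after', in order.
    IReadOnlyList<StoredEvent> ReadAfter(long after, int batchSize);
}

public class UserProjection
{
    private long _checkpoint; // last global sequence number this projection applied

    public void CatchUp(IEventStore store)
    {
        foreach (var e in store.ReadAfter(_checkpoint, batchSize: 100))
        {
            if (e.Type == "UserAdded")
            {
                // Upsert the corresponding row/document in the read database here.
            }
            _checkpoint = e.GlobalSequence; // persist with the read model in practice
        }
    }
}
```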
If you simply update the "write" side with, say, a SQL query, then things are going to be a bit different, and possibly tricky, since you don't have any mechanism to replay those changes into the read model should you wish to rebuild or change it. In such a case you could use messaging (and possibly store those messages), or make the changes to the "read" side together with the "write" side... which isn't ideal, unless you need 100% consistency.
As mentioned by @Levi Ramsey, the read model is usually quite a bit different from the write model, in that it is optimised for reading; it may include denormalized data or simply live in a data store that is better suited to read models.
The main benefit of CQRS is around being able to use different data models and/or different databases for queries vs. updates. If they are using the same data model, there's often not much benefit (at least not with a DB like SQL Server which is, at most scales, reasonable for both) to CQRS.
This in turn implies that it's generally not possible to just have the two databases automatically be in sync, because there's going to be some model translation involved (e.g. from a relational DB (with a normalized schema) like SQL Server to a denormalized document DB like Mongo).
One fairly common pattern is to have the software which writes to the DB also publish events describing what was updated to some event bus. Another piece of software subscribes to those events and performs the appropriate updates to the read DB. Note that this implies the existence of a period of time where queries against the read DB and the write DB will give different results.
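An illustrative sketch of that pattern; IEventBus, UserCreated, and the updater are assumed names rather than any specific library:

```csharp
using System;

// Event describing what was updated on the write side.
public record UserCreated(Guid Id, string Name);

// Hypothetical event bus abstraction.
public interface IEventBus
{
    void Publish<T>(T @event);
    void Subscribe<T>(Action<T> handler);
}

public class UserWriteService
{
    private readonly IEventBus _bus;
    public UserWriteService(IEventBus bus) => _bus = bus;

    public void CreateUser(Guid id, string name)
    {
        // 1. Insert into the normalized "write" database (omitted).
        // 2. Publish an event describing the change.
        _bus.Publish(new UserCreated(id, name));
    }
}

// A separate subscriber denormalizes into the read store. Until it has run,
// queries against the read DB lag behind the write DB (eventual consistency).
public class UserReadModelUpdater
{
    public UserReadModelUpdater(IEventBus bus) =>
        bus.Subscribe<UserCreated>(e =>
        {
            // Upsert into the read database (e.g. a document store) here.
        });
}
```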

Is it okay to have more than one repository for an aggregate in DDD?

I've read this question about something similar but it didn't quite solve my problem.
I have an application where I'm required to use data from an API. Problem is there are performance and technical limitations to doing this. The performance limitations are obvious. The technical limitations lie in the fact that the API does not support some of the more granular queries I need to make.
I decided to use MySQL as a queryable cache.
Since the data I needed to retrieve from the API did not change very often, I settled on refreshing the cache once a day, so I didn't need any complicated mapper that checked whether we had the data in the cache and, if not, fell back to the API. That was my first design, but I realized it wasn't very practical when the API couldn't support most of the queries I needed to make anyway.
Now I have a set of two mappers for every aggregate. One for MySQL and one for the API.
My problem is now how I hide the complexities of persistence from the domain, and the fact that it seems that I need multiple repositories.
Ideally I would have an interface that both mappers adhered to, but as previously disclosed that's not possible.
Is it okay to have multiple repositories, one for each mapper?
Is it okay to have more than one repository for an aggregate in DDD?
Short answer: yes.
Longer answer: you won't find any suggestion of multiple repositories in the original book by Evans. As he described things, the domain model would have one representation of the aggregate, and the repository abstraction provided consumers with the illusion that the aggregate was stored in an in-memory collection.
Largely, this makes sense -- you are trying to ensure that writes to data within the aggregate boundary are consistent, so you need a single authority for change.
But... there's no particular reason that reads need to travel through the same code path as writes. Welcome to the world of CQRS. What that gives you immediately is the idea that the in-memory representation for reads might need to be optimized differently from the in-memory representation used for writes.
In its more general form, you get the idea that the concept that you are modeling might have different representations for each use case.
For your case, where it is sometimes appropriate to read from the RDBMS, sometimes from the API, sometimes both, this isn't quite an exact match -- the repository interface hides the implementation details from the consumer, but you still have to bother with the implementation.
One thing you might look at is your requirements; how fresh does the data need to be in each use case? A constraint that is often relaxed in the CQRS pattern is the idea that the effects of writes are immediately available for reading. The important question to ask would be, if the data hasn't been cached yet, can you simply report "data not available" without hitting the API?
If so, then use cases that access the cached data need only a single repository implementation.
If you are using an external API to read and modify data, you can cache it locally for faster reads, but I would avoid having a domain repository for it.
From the domain perspective, it seems that you need a service to query for some data (or just a Query in a CQRS implementation); that service can internally call a remote API or read from a local cache (MySQL, whatever).
When you read from your local cache you can develop a repository to decouple your logic from the db implementation, but this is a different concept from a domain repository; it is just a detail of your technical implementation that has nothing to do with your domain.
If the remote service starts offering the query you need, you can change how your query is executed, calling the remote API instead of the db, but your domain model should not change.
A domain repository is used to load and persist your aggregates; if you are working with external aggregates (in a different context or subdomain), you need to interact with them through services.
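A small sketch of that idea (all names are illustrative): the domain depends on a query abstraction, and the implementation decides whether to read the local cache or call the remote API.

```csharp
// Read-side DTO returned to the domain/application layer.
public record UserDto(string Id, string Name);

// The abstraction the domain code depends on.
public interface IUserQueries
{
    UserDto? FindById(string id);
}

// Implementation backed by the local MySQL cache.
public class CachedUserQueries : IUserQueries
{
    public UserDto? FindById(string id)
    {
        // SELECT id, name FROM cached_users WHERE id = @id (omitted)
        return null;
    }
}

// If the remote API later supports this query, swap in this implementation;
// nothing that depends on IUserQueries has to change.
public class ApiUserQueries : IUserQueries
{
    public UserDto? FindById(string id)
    {
        // HTTP call to the remote API (omitted)
        return null;
    }
}
```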

Does DynamoDB have locking by default?

I'm looking over the DynamoDB documentation and it looks like they have optimistic locking. I'm wondering if this is used by default or not.
From the documentation, it looks like you need to code the Java application to use the @DynamoDBVersionAttribute annotation and get and set the versions. Without doing this, it looks like you can write to DynamoDB without any sort of locking.
Is that correct?
On a side note, I'm not too familiar with DBs without some sort of locking so what would happen if 2 people wrote to the same item at the same time in DynamoDB without any locking? Say the item we're writing to has 4 fields, would one write completely fail or is it possible that DynamoDB updates 2/4 fields with 1 write, and the other 2 fields with the other write?
You are correct. DynamoDB does NOT have optimistic locking by default. There are various SDKs for DynamoDB and as far as I am aware the only one which provides optimistic locking functionality is the Java SDK.
Here's what the Java SDK optimistic locking actually supports:
Creates an attribute in your table called version
You must load an item from the database before updating it
When you try to save an item, the SDK tests that the client item's version number matches the one in the table; if it does, the save completes and the version number is incremented
This is pretty simple to implement yourself if you are using a different SDK. You would create the version attribute yourself. You would create a wrapper for the putItem method (and any other required save/update operations). You would use a condition expression to test that the version number in the database is one less than the version you are saving.
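For example, a sketch with the AWS SDK for .NET (a "different SDK" in the sense above; the table and attribute names here are illustrative):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public static class VersionedWrites
{
    // Update an item only if its stored version matches the version we read.
    public static async Task<bool> SaveWithVersionCheckAsync(
        IAmazonDynamoDB client, string id, string newValue, long readVersion)
    {
        var request = new UpdateItemRequest
        {
            TableName = "Items", // illustrative table name
            Key = new Dictionary<string, AttributeValue>
            {
                ["Id"] = new AttributeValue { S = id }
            },
            UpdateExpression = "SET #val = :newValue, #v = :newVersion",
            // Fail the write unless the stored version is the one we read.
            ConditionExpression = "#v = :expectedVersion",
            ExpressionAttributeNames = new Dictionary<string, string>
            {
                ["#val"] = "Value",
                ["#v"] = "Version"
            },
            ExpressionAttributeValues = new Dictionary<string, AttributeValue>
            {
                [":newValue"] = new AttributeValue { S = newValue },
                [":newVersion"] = new AttributeValue { N = (readVersion + 1).ToString() },
                [":expectedVersion"] = new AttributeValue { N = readVersion.ToString() }
            }
        };

        try
        {
            await client.UpdateItemAsync(request);
            return true;
        }
        catch (ConditionalCheckFailedException)
        {
            return false; // someone else updated the item first; reload and retry
        }
    }
}
```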
To answer the second part of your question, both updates would succeed (assuming you had put no conditions on your update). The first one would make any updates specified, and the second one would come along and overwrite them.
DynamoDB doesn't support optimistic locking by default. As you mentioned, you need to use the annotation in the Java model class in order to use optimistic locking.
If two threads write to the same item, the DynamoDB item will hold the data from the last write (i.e. the last thread to write wins).

Transactional behavior across in-memory objects

I want to make a sequence of in-memory operations atomic. I presume there is no framework-supplied functionality for this, and that I would have to implement my own rollback functionality using a memento (or something)?
If it needs to be really atomic, there is no such thing AFAIK in the Framework itself - an interesting link discussing this issue.
What you are asking for is called STM (Software Transactional Memory) and is, for example, an inherent part of Haskell.
Basically any implementation uses some sort of copy mechanism - either keeping the old data until the transaction is committed, OR making a copy first, doing all "changes" on the copy, and switching references on commit... anyway, there is always some log and/or copying mechanism involved...
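If you only need rollback over a known sequence of operations (rather than full STM), a memento-style undo log is a simple starting point; this is a sketch with illustrative names, not a framework API:

```csharp
using System;
using System.Collections.Generic;

// Each action records an undo delegate; on failure the undos run in reverse.
public sealed class InMemoryTransaction
{
    private readonly Stack<Action> _undoLog = new Stack<Action>();

    public void Execute(Action apply, Action undo)
    {
        apply();             // perform the change
        _undoLog.Push(undo); // remember how to reverse it
    }

    public void Rollback()
    {
        while (_undoLog.Count > 0)
            _undoLog.Pop()(); // undo in reverse order of application
    }

    public void Commit() => _undoLog.Clear();
}

// Usage: wrap a sequence of in-memory mutations.
// var txn = new InMemoryTransaction();
// try
// {
//     txn.Execute(() => account.Balance -= 100, () => account.Balance += 100);
//     txn.Execute(() => log.Add("debit"),      () => log.RemoveAt(log.Count - 1));
//     txn.Commit();
// }
// catch { txn.Rollback(); throw; }
```

Note that this only gives atomicity with respect to failures on a single thread; it does not isolate concurrent readers from partial state the way real STM does.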
For C# check these links out:
http://research.microsoft.com/en-us/downloads/6cfc842d-1c16-4739-afaf-edb35f544384/default.aspx
http://download.microsoft.com/download/9/5/6/9560741A-EEFC-4C02-822C-BB0AFE860E31/STM_User_Guide.pdf
http://blogs.msdn.com/b/stmteam/
If F# is an option, then check these links out:
http://cs.hubfs.net/blogs/hell_is_other_languages/archive/2008/01/16/4565.aspx
http://geekswithblogs.net/Podwysocki/archive/2008/02/07/119387.aspx
Another option could be to use an "in-memory database" - there are several out there with transaction support, thus providing atomic operations via the DB... as long as the DB is in-memory it should perform well.
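For example, with Microsoft.Data.Sqlite and an in-memory SQLite database, a transaction gives a sequence of operations all-or-nothing semantics (a sketch; the table and values are illustrative):

```csharp
using Microsoft.Data.Sqlite;

using var conn = new SqliteConnection("Data Source=:memory:");
conn.Open();

using (var create = conn.CreateCommand())
{
    create.CommandText = "CREATE TABLE state (k TEXT PRIMARY KEY, v TEXT)";
    create.ExecuteNonQuery();
}

using var txn = conn.BeginTransaction();
try
{
    var cmd = conn.CreateCommand();
    cmd.Transaction = txn;
    cmd.CommandText = "INSERT INTO state (k, v) VALUES ($k, $v)";
    cmd.Parameters.AddWithValue("$k", "balance");
    cmd.Parameters.AddWithValue("$v", "100");
    cmd.ExecuteNonQuery();

    txn.Commit();   // all statements in the transaction become visible together
}
catch
{
    txn.Rollback(); // any failure reverts every change made in the transaction
    throw;
}
```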