SQL : Synchronize Read/Write Databases in CQRS , Asp.Net Core - sql

I was reading about DDD and CQRS (using Asp.Net Core ,MSSQL), and their different approaches, then I read a topic about separating Read and Write Database ,so I started to search web about how to do so and how to sync those databases, but sadly(maybe I was searching wrong) I didn't find any good source to find how to do so.
So here is my question :
How should I separate those databases, and then how should I sync the data between them, e.g. I have a table called "User" which is in read and write separated dbs,now if I add a new row to the write table in write db, I have to tell the read db to sync itself with write db so I can have the new data there to query and use later,but how? I also read something about Event Sourcing Pattern or Event-Driven Architecture,but they didn't help me find out how to sync.
so anyone know how to do so or have any good resources about this topic which can help a dummy :)
(consider you're explaining for a guy who is learning it for the first time!).
Thanks!

I have a related answer that may provide some background on how to approach CQRS.
The main point to keep in mind is that the "write" side is concerned with changes/transaction (OLTP) and the "read" side is concerned with queries (OLAP).
How you update your "read" side (read model) is going to depend on how you make the "write" side changes. When using an Event Store things may be easier in that each event has a global sequence number and each projection (read model) tracks where it is in terms of the global sequence number. So when new events arrive (projection polls) then they can be actioned if the event applies to the projection.
If you simply update the "write" side with, say, a SQL query then things are going to be a bit different, and possibly tricky, since you don't have any mechanism to replay those changes into the read model should you wish to make changes. In such a case you could use messaging, and possibly store those, or make the changes to the "read" side together with the "write" side... which isn't ideal; unless you need 100% consistency.
As mentioned by #Levi Ramsey, the read model is usually quite a bit different from the write model in that it is optimised for reading so it may include denormalized data or simply be in a data store that is more suited to read models.

The main benefit of CQRS is around being able to use different data models and/or different databases for queries vs. updates. If they are using the same data model, there's often not much benefit (at least not with a DB like SQL Server which is, at most scales, reasonable for both) to CQRS.
This in turn implies that it's generally not possible to just have the two databases automatically be in sync, because there's going to be some model translation involved (e.g. from a relational DB (with a normalized schema) like SQL Server to a denormalized document DB like Mongo).
One fairly common pattern is to have the software which writes to the DB also publish events describing what was updated to some event bus. Another piece of software subscribes to those events and performs the appropriate updates to the read DB. Note that this implies the existence of a period of time where queries against the read DB and the write DB will give different results.

Related

What design patterns for marshalling JSON APIs to/from SQL

I'm working on a first JSON-RPC/JSON-REST API. One of the conveniences of JSON is that it can easily represent structured data (a user may have multiple email addresses, multiple addresses), etc...
For example, the Facebook Graph API nicely represents the kind of thing that's handy to return as JSON objects:
https://fbcdn-dragon-a.akamaihd.net/hphotos-ak-ash3/851559_339008529558010_1864655268_n.png
However, in implementing an API such as this with a relational database, we end up shattering structured objects into very many tables (at least one for each list in the JSON object), and un-shattering them when responding to requests. So:
requires a lot of modelling (separate models for JSON object and SQL tables).
inconsistencies creep in between the models: e.g. user_id (in SQL) vs. userID (in JSON)
marshaling stuff between one model and the other is very time consuming (tedious, error-prone and pointless boilerplate).
What design-patterns exist to help in this situation?
I'm not sure you are looking for design patterns. I would look for tools that handle this better.
I assume that you want to be able to query these objects, and not just store them in TEXT fields. Many databases support XML fairly well, so I would convert the JSON to XML (with a library) and then store that in the database.
You may also want to consider a JSON document based database. That will definitely get you where you want to go.
If you don't need to be able to query these, or only need to query a very small subset of fields, just store the objects as text, and extract those query-able fields into actual columns. This way you don't need to touch the majority of the data, but you can still query the few fields you care about. (Plus you can index them for speedier lookup.)
I have always chosen to implement this functionality in a facade pattern. Since the point of the facade is to simplify (abstract) an underlying complexity as a boundary between two or more systems, it seemed like the perfect place to handle this.
I realize however that this does not quite answer the question. I am talking about the container for the marshalling while the question is about how to better manage the contents (the code that does the job).
My approach here is somewhat old fashioned, but since this an old question maybe that’s okay. I employ (as much as possible) stored procedures in the dB. This promotes better encapsulation than one typically finds with a code layer outside of the dB that has to “know about” dB structure. What inevitably happens in the latter case is that more than one system will be written to do this (one large company I worked at had at least 6 competing ESBs) and there will be conflicts. Also, usually the stored procedure scripting will benefit from some sort of IDE that will helps maintain contextual awareness of the dB structure.
So this approach - even though it is not a pattern per se - makes managing the ORM a lot easier.

Database Type Agnostic Select Query Encapsulation class

I am upgrading a webapp that will be using two different database types. The existing database is a MySQL database, and is tightly integrated with the current systems, and a MongoDB database for the extended functionality. The new functionality will also be relying pretty heavily on the MySQL database for environmental variables such as information on the current user, content, etc.
Although I know I can just assemble the queries independently, it got me thinking of a way that might make the construction of queries much simpler (only for easier legibility while building, once it's finished, converting back to hard coded queries) that would entail an encapsulation object that would contain:
what data is being selected (including functionally derived data)
source (including joined data, I know that join's are not a good idea for non-relational db's, but it would be nice to have the facility just in case, which can be re-written into two queries later for performance times)
where and having conditions (stored as their own object types so they can be processed later, potentially including other select queries that can be interpreted by whatever db is using it)
orders
groupings
limits
This data can then be passed to an interface adapter that can build and execute the query, returning it in an array, or object or whatever is desired.
Although this sounds good, I have no idea if any code like this exists. If so, can anybody point it out to me, if not, are there any resources on similar projects undertaken that might allow me to continue the work and build a basic version?
I know this is a complicated library, but I have been working on this update for the last few days, and constantly switching back and forth has been getting me muddled at times and allowing for mistakes to occur
I would study things like the SQL grammar: http://www.h2database.com/html/grammar.html
Gives you an idea of how queries should be constructed.
You can study existing libraries around LINQ (C#): https://code.google.com/p/linqbridge/
Maybe even check out this link about FQL (Facebook's query language): https://code.google.com/p/mockfacebook/issues/list?q=label:fql
Like you already know, this is a hard problem. It will be a big challenge to make it run efficiently. Maybe consider moving all data from MySQL and Mongo to a third data store that has a copy of all the data and then running queries against that? Replicating all writes to something like Redis or Elastic Search and then write your queries there?
Either way, good luck!

Database access: one master database object or have objects call queries themselves?

For a hobby project I am building an application to keep track of my money. Register everything that comes in and goes out. I am using sqlite as a database backend.
I have two data access models in mind.
Creating one master object as a sort of database connector, which contains methods which execute the queries and provide the required sets of data as a list of objects
Have objects who need data execute the queries themselves
Which one of these is 'the best' and why? Or are there different, better models out there?
The latter option is better. In the first option, you would end up having to touch your universal data access object for just about any update to the code that wasn't purely a change in display logic. If you have different data access objects, then you will have much more testable, maintainable code.
I suggest you read up a bit on the model-view-controller paradigm. The wikipedia article on it is a good start: http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller.
Also, you didn't say which language/platform you were coding in, but most platforms have numerous options for auto-generating a starting point your data access classes from your database. You may find something like that useful.
Much of a muchness really, the thing to avoid is having the "same" sql sprinkled all over your code base.
The key point is. You've just added a new column to Table1. When you do Find In Files "Table1", how many hits are you going to get and where.
If you use one class and there's a lot of db operations, it's going to get very messy very quickly, but if you have one interface (say IModel) with one implementation, you can swap backends very easily.
So how many db operations, and how likely is it you will move away from SqlLite.

How to go from a full SQL querying to something like a NoSQL?

In one of my process I have this SQL query that take 10-20% of the total execution time. This SQL query does a filter on my Database, and load a list of PricingGrid object.
So I want to improve these performance.
So far I guessed 2 solutions :
Use a NoSQL solution, AFAIK these are good solutions for improving reading process.
But the migration seems hard and needs a lot of work (like import the data from sql server to nosql in a regular basis)
I don't have any knowledge , I even don't know which one I should use (the first I'd use is Ravendb because I follow ayende and it's done by the .net community).
I might have some stuff to change in my model to make my object ok for a nosql database
Load all my PricingGrid object in memory (in a static IEnumerable)
This might be a problem when my server won't have enough memory to load everything
I might reinvent the wheel (indexes...) invented by the NoSQL providers
I think I'm not the first one wondering this, so what would be the best solution ? Is there any tools that could help me ?
.net 3.5, SQL Server 2005, windows server 2005
Migrating your data from SQL is only the first step.
Moving to a document store (like RavenDB or MongoDB) also means that you need to:
Denormalize your data
Perform schema validation in your code
Handle concurrency of complex operations in your code since you no longer have transactions (at least not the same way)
Perform rollbacks in the event of partial commits (changes)
Depending on your updates, reads and network model you might also need to handle conflicts
You provided very limited information but it sounds like your needs include a single database server and that your data fits well in the relational model.
In such a case I would vote against a NoSQL solution, it is more likely that you can speed up your queries with database optimizations and still retain all the added value of a RDBMS.
Non-relational databases are tools for a specific job (no matter how they sell them), if you need them it is usually because your data doesn't fit well in the relational model or if you have a need to distribute your data over multiple machines (size or availability). For instance, I use MongoDB for a write-intensive high throughput job management application. It is centralized and the data is very transient so the "cost" of having low durability is acceptable. This doesn't sound like the case for you.
If prefer to use a NoSQL solution perhaps you should try using Memcached+MySQL (InnoDB) this will allow you to get the speed benefits of an in-memory cache (in the form of a memcached daemon plugin) with the underlying protection and capabilities of an RDBMS (MySQL). It should also ease data migration and somewhat reduce the amount of changes required in your code.
I myself have never used it, I find that I either need NoSQL for the reasons I stated above or that I can optimize the RDBMS using stored procedures, indexes and table views in a way which is sufficient for my needs.
Asaf has provided great information in regards to the usage of NoSQL and when it is most appropriate. Given that your main concern was performance, I would tend to agree with his opinion - it would take you much more time and effort to adopt a completely new (and very different) data persistence platform than it would to trick out your SQL Server cluster. That said, my answer is mainly to address the "how" part of your question.
Addressing misunderstandings:
Denormalizing Data - You do not need to manually denormalize your existing data. This will be done for you when it is migrated over. More than anything you need to simply think about your data in a different fashion - root aggregates, entity and value types, etc.
Concurrency/Transactions - Transactions are possible in both Mongo and Raven, they are simply done in a different fashion. One of the inherent ways Raven does this is by using an ORM-like "unit of work" pattern with its RavenSession objects. Yes, your data validation needs to be done in code, but you already should be doing it there anyway. In my experience this is an over-hyped con.
How:
Install Raven or Mongo on a primary server, run it as a service.
Create or extend an existing application that uses the database you intend to port. This application needs all the model classes/libraries that your SQL database provides persistence for.
a. In your "data layer" you likely have a repository class somewhere. Extract an interface form this, and use it to build another repository class for your Raven/Mongo persistence. Both DB's have plenty good documentation for using their APIs to push/pull/update changes in the document graphs. It's pretty damn simple.
b. Load your SQL data into C# objects in memory. Pull back your top-level objects (just the entities) and load their inner collections and related data in memory. Your repository is probably already doing this (ex. when fetching an Order object, ensure not only its properties but associated collections like Items are loaded in memory.
c. Instantiate your Raven/Mongo repository and push the data to it. Primary entities become "top level documents" or "root aggregates" serialized in JSON, and their collections' data nested within. Save changes and close the repository. Note: You may break this step down into as many little pieces as your data deems necessary.
Once your data is migrated, play around with it and ensure you are satisfied. You may want to modify your application Models a little to adjust the way they are persisted to Raven/Mongo - for instance you may want to make both Orders and Items top-level documents and simply use reference values (much like relationships in RDBMS systems). Watch out here though, as doing so sort-of goes against the principal and performance behind NoSQL as now you have to tap the DB twice to get the Order and the Items.
If satisfied, shard/replicate your mongo/raven servers across your remaining available server boxes.
Obviously there are tons of little details I did not explain, but that is the general process, and much of it depends on the applications already consuming the database and may be tricky if more than one app/system talks to it.
Lastly, just to reiterate what Asaf said... learn as much as you can about NoSQL and its best use-cases. It is an amazing tool, but not golden solution for all data persistence. In your case try to really find the bottlenecks in your current solution and see if they are solvable. As one of my systems guys says, "technology for technology's sake is bullshit"

I need advise choosing a NoSQL database for a project with a lot of minute related information

I am currently working on a private project that is going to use Google's GTFS spec to get information about 100s of Public Transit agencies, their routers, stations, times, and other related information. I will be getting my information from here and the google code wiki page with similar info. There is a lot of data and its partitioned into multiple CSV formatted text files. These can be huge, some ranging in 80-100mb of data.
With the data I have, I want to translate it all into a nice solid database that I can build layers on top of to use for my project. I will be using GPS positioning to pinpoint a location and all surrounding stations/stops.
My goal is to access all the information for all these stops and stations with as few calls as possible, while keeping datasets small for queried results.
I am currently leaning towards MongoDB and CouchDB for their GeoSpatial support that can really optimize getting small datasets. But I also need to be sure to link all the stops on a route because I will be propagating information along a transit route for that line. In this case I have found that I can benefit from a Graph DB like Neo4j and OrientDB, but from what I know, neither has GeoSpatial support nor am I 100% sure that a Graph DB would be what I need.
The perfect solution might not exist, but I come here asking for help on finding the best possible for my situation. I know I will possible have to work around limitations of whatever I choose, but I want to at least have done my research and know that its the best I can get at the moment.
I have also been suggested to splinter the data into multiple DBs, but that could get very messy because all the information is very tightly interconnected through IDs.
Any help would be appreciated.
Obviously a graph database fits 100% your problem. My advice here is to go for some geo spatial module over neo4j or orientdb, althought you have some others free and open source implementation.
I think the best one right now, with all the geo spatial thing implemented is neo4j-spatial package. But as far as I know, you can also reproduce most of the geo spatial thing on your own if necessary.
BTW talking about splitting, if the amount of data/queries will be high, I strongly recommend you to share the load and think the model in this terms. Sure you can do something.
I've used Mongo's GeoSpatial features and can offer some guidance if you need help with a C# or javascript implementation - I would recommend it to start because it's super easy to use. I'm learning all about Neo4j right now and I am working on a hybrid approach that takes advantage of both Mongo and Neo4j. You might want to cross reference the documents in Mongo to the nodes in Neo4j using the Mongo object id.
For my hybrid implementation, I'm storing profiles and any other large static data in Mongo. In Neo4j, I'm storing relationships like friend and friend-of-friend. If I wanted to analyze movies two friends are most likely to want to watch together (or really any other relationship I hadn't thought of initially), by keeping that object id reference I can simply add some code instructing each node go out and grab a list of movies from the related profile.
Added 2011-02-12:
Just wanted to follow up on this "hybrid" idea as I created prototypes for and implemented a few more solutions recently where I ended up using more than one database. Martin Fowler refers to this as "Polyglot Persistence."
I'm finding that I am often using a combination of a relational database, document database and a graph database (in my case this is generally SQL Server, MongoDB and Neo4j). Since the question is related to data modeling as much as it is to geospatial, I thought I would touch on that here:
I've used Neo4j for site organization (similar to the idea of hypermedia in the REST model), modeling social data and building recommendations (often based on social data). As a result, I will generally model this part of the application before I begin programming.
I often end up using MongoDB for prototyping the rest of the application because it provides such a simple persistence mechanism. I like to start developing an application with the user interface, so this ends up working well.
When I start moving entities from Mongo to SQL Server, the context is usually important - for instance, if I have an application that allows users to build daily reports based on periodically collected data, it may make sense to run a procedure that builds those reports each night and stores daily report objects in Mongo that may be combined into larger aggregate reports as needed (obviously this doesn't consider a few special cases, but that is not relevant to the point)...on the other hand, if users need to pull on-demand reports limited to very specific time periods, it may make sense to keep everything in SQL server and build those reports as needed.
That said, and this deserves more intense thought, here are some considerations that may be helpful:
I generally try to store entities in a relational database if I find that pulling an entity from the database [in other words(in the context of a relational database) - querying data from the database that provides the data required to generate an entity or list of entities that fulfills the requested parameters] does not require significant processing (multiple joins, for instance)
Do you require ACID compliance(aside:if you have a graph problem, you can leverage Neo4j for this)? There are document databases with ACID compliance, but there's a reason Mongo is not: What does MongoDB not being ACID compliant really mean?
One use of Mongo I saw in the wild that I thought was worthy of mention - Hadoop was being used to compute massive hash tables that were then stored in Mongo. I believe a similar approach is used by TripAdvisor for user based customization in terms of targeting offers, advertising, etc..
NoSQL only exists because MySQL users assume that all databases have their performance problems when their database grows large and/or becomes complex.
I suggest that you use PostGIS. You can use the same database for the rest of your data needs as well.
http://postgis.refractions.net/