Context:
Microservices architecture, DDD, CQRS, Event driven.
SQL database.
I have a use case where I have to store a record every time an entity's state is updated. I'm afraid the quantity of records could be huge, and I was thinking that maybe an SQL table is not the right place to store them. Also, these records are only used every now and then, and probably not by this service's domain.
Could it be good practice to store them in another database (Firestore, Mongo, Cassandra...) so it doesn't affect the performance and scope of this service?
Thanks!
Could it be good practice to store them in another database (Firestore, Mongo, Cassandra...) so it doesn't affect the performance and scope of this service?
Part of the benefit of using microservices is that you are hiding implementation details. As such, you are able to store/process data by whatever means is required or available without the need to broadcast that implementation to external services.
That said, from a technical standpoint, it is worth considering transaction boundaries. When writing to a single database, it is possible to commit transactions easily. Once you are writing to different databases within the same transaction, you can run into situations where one write might succeed while another one might fail.
My recommendation is to write directly to only one of those databases at a time, and use a two-phase commit to ensure that the second database is also written to. In this way you can avoid losing data and still get the benefit of using a more efficient data store.
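Since not every data store supports two-phase commit, a common way to approximate that guarantee is to record the pending history entry in the same SQL transaction as the entity update, and hand it off to the secondary store afterwards. Below is a minimal sketch of that idea; the PendingRecord, IOutboxTable and IHistoryStore types are assumptions made up for illustration, not part of any library.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Assumed types for illustration only; they are not part of any real library.
public record PendingRecord(long Id, string EntityId, string Payload);

public interface IOutboxTable
{
    Task<IReadOnlyList<PendingRecord>> ReadPendingAsync(int batchSize);
    Task MarkDispatchedAsync(long id);
}

public interface IHistoryStore
{
    Task AppendAsync(string entityId, string payload); // must be idempotent so retries are harmless
}

// The entity update and the PendingRecord row are committed in the same SQL transaction;
// this worker then copies each pending row into the secondary store and marks it dispatched.
public class OutboxWorker
{
    private readonly IOutboxTable _outbox;
    private readonly IHistoryStore _historyStore;

    public OutboxWorker(IOutboxTable outbox, IHistoryStore historyStore)
    {
        _outbox = outbox;
        _historyStore = historyStore;
    }

    public async Task PumpOnceAsync()
    {
        foreach (var pending in await _outbox.ReadPendingAsync(batchSize: 100))
        {
            await _historyStore.AppendAsync(pending.EntityId, pending.Payload);
            await _outbox.MarkDispatchedAsync(pending.Id);
        }
    }
}
```

The hand-off is eventually consistent: if the worker crashes mid-batch it simply replays the undispatched rows, which is why the append into the history store has to be idempotent.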
Related
I was reading about DDD and CQRS (using ASP.NET Core and MSSQL) and their different approaches, and then I read about separating the read and write databases. I started searching the web for how to do that and how to keep those databases in sync, but sadly (maybe I was searching wrong) I didn't find any good source on how to do so.
So here is my question:
How should I separate those databases, and then how should I sync the data between them? For example, I have a table called "User" that exists in both the read and the write database. If I add a new row to the table in the write database, I have to tell the read database to sync itself with the write database so I can have the new data there to query and use later. But how? I also read about the Event Sourcing pattern and Event-Driven Architecture, but they didn't help me figure out how to do the syncing.
So, does anyone know how to do this, or have any good resources on the topic that can help a dummy :)
(consider you're explaining to a guy who is learning it for the first time!)
Thanks!
I have a related answer that may provide some background on how to approach CQRS.
The main point to keep in mind is that the "write" side is concerned with changes/transaction (OLTP) and the "read" side is concerned with queries (OLAP).
How you update your "read" side (read model) is going to depend on how you make the "write" side changes. When using an Event Store, things may be easier in that each event has a global sequence number and each projection (read model) tracks how far it has processed in terms of that global sequence number. When new events arrive (the projection polls for them), they are actioned if they apply to the projection.
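As a rough sketch of that polling loop, assuming a hypothetical event store and projection interface (none of these names come from a specific library):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Assumed abstractions for illustration.
public record StoredEvent(long GlobalSequenceNumber, string Type, string Payload);

public interface IEventStore
{
    // Returns events with a global sequence number greater than the given one, in order.
    Task<IReadOnlyList<StoredEvent>> ReadAfterAsync(long sequenceNumber, int maxCount);
}

public interface IProjection
{
    Task<long> LoadCheckpointAsync();               // last global sequence number processed
    bool Handles(string eventType);                 // does this event affect the read model?
    Task ApplyAsync(StoredEvent @event);            // update the denormalized read model
    Task SaveCheckpointAsync(long sequenceNumber);
}

public class ProjectionPoller
{
    private readonly IEventStore _store;
    private readonly IProjection _projection;

    public ProjectionPoller(IEventStore store, IProjection projection)
    {
        _store = store;
        _projection = projection;
    }

    public async Task PollOnceAsync()
    {
        var checkpoint = await _projection.LoadCheckpointAsync();
        foreach (var e in await _store.ReadAfterAsync(checkpoint, maxCount: 500))
        {
            if (_projection.Handles(e.Type))
                await _projection.ApplyAsync(e);

            // Move the checkpoint forward even for events this projection ignores.
            await _projection.SaveCheckpointAsync(e.GlobalSequenceNumber);
        }
    }
}
```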
If you simply update the "write" side with, say, a SQL query, then things are going to be a bit different, and possibly tricky, since you don't have any mechanism to replay those changes into the read model should you wish to rebuild it or make changes. In such a case you could use messaging, and possibly store those messages, or make the changes to the "read" side together with the "write" side... which isn't ideal, unless you need 100% consistency.
As mentioned by @Levi Ramsey, the read model is usually quite a bit different from the write model in that it is optimised for reading, so it may include denormalized data or simply live in a data store that is better suited to read models.
The main benefit of CQRS is around being able to use different data models and/or different databases for queries vs. updates. If they are using the same data model, there's often not much benefit (at least not with a DB like SQL Server which is, at most scales, reasonable for both) to CQRS.
This in turn implies that it's generally not possible to just have the two databases automatically be in sync, because there's going to be some model translation involved (e.g. from a relational DB (with a normalized schema) like SQL Server to a denormalized document DB like Mongo).
One fairly common pattern is to have the software which writes to the DB also publish events describing what was updated to some event bus. Another piece of software subscribes to those events and performs the appropriate updates to the read DB. Note that this implies the existence of a period of time where queries against the read DB and the write DB will give different results.
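A hedged sketch of that pattern, with made-up bus, event and read-store types standing in for whatever messaging and document-store technology you pick:

```csharp
using System;
using System.Threading.Tasks;

// Assumed event and infrastructure types for illustration.
public record UserRegistered(string UserId, string Name, string Email);

public interface IEventBus
{
    Task PublishAsync<TEvent>(TEvent @event);
    void Subscribe<TEvent>(Func<TEvent, Task> handler);
}

public interface IReadModelStore
{
    Task UpsertUserSummaryAsync(string userId, string name, string email); // e.g. a denormalized document
}

// Write side: after committing to the relational "write" database, publish what happened.
public class UserWriteService
{
    private readonly IEventBus _bus;
    public UserWriteService(IEventBus bus) => _bus = bus;

    public async Task RegisterUserAsync(string userId, string name, string email)
    {
        // ... insert the row into the normalized write database here ...
        await _bus.PublishAsync(new UserRegistered(userId, name, email));
    }
}

// Read side: a separate subscriber keeps the denormalized read database up to date.
public class UserReadModelUpdater
{
    public UserReadModelUpdater(IEventBus bus, IReadModelStore store)
    {
        bus.Subscribe<UserRegistered>(e =>
            store.UpsertUserSummaryAsync(e.UserId, e.Name, e.Email));
    }
}
```

Because the subscriber runs after the write has committed, queries against the read database can briefly lag behind the write database, which is the window of inconsistency mentioned above.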
After going through multiple Stack Overflow posts and blog articles, I came to the conclusion that we need the Unit of Work design pattern to maintain transactional integrity while writing domain objects to their respective repositories.
However, we do not need such integrity while reading/searching the repository. Given that, is it a good design to separate the purposes of repositories and units of work, with the former used only for reading domain objects and the latter used only to create/write/refresh/delete domain objects?
Eric Evans, Domain Driven Design:
Implementation (of a repository) will vary greatly, depending on the technology being used for persistence and the infrastructure you have. The ideal is to hide all the inner workings from the client (although not from the developer of the client), so that the client code will be the same whether the data is stored in an object database, a relational database, or simply held in memory....
The possibilities of implementation are so diverse that I can only list some concerns to keep in mind....
Leave transaction control to the client. Although the REPOSITORY will insert and delete from the database, it will ordinarily not commit anything. It is tempting to commit after saving, for example, but the client presumably has the context to correctly initiate and commit units of work. Transaction management will be simpler if the REPOSITORY keeps its hands off.
That said, I call your attention in particular to an important phrase in the above discussion: the database. The underlying assumption here is that all of the aggregates being modified are stored in such a way that the unit of work can be committed atomically.
When that's not the case -- for example, if you are storing aggregates in a document store that doesn't promise atomic updates of multiple documents, then you may want to consider making this separation explicit in your model, rather than trying to disguise the fact that you are trying to coordinate multiple commits.
It is entirely reasonable to use one set of repositories for your read use cases, which are distinct from those used in your write use cases. In other words, when we have different semantics, then we should have a different interface, the implementations of which can be tuned as necessary.
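For illustration, such a split might look like the sketch below (the Order type and interface names are made up for the example, not taken from the question):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public class Order { public string Id { get; set; } = ""; /* ... */ }

// Read side: query-only, no transactional concerns.
public interface IOrderReadRepository
{
    Task<Order?> FindAsync(string id);
    Task<IReadOnlyList<Order>> SearchByCustomerAsync(string customerId);
}

// Write side: changes are registered with a unit of work and committed atomically by the caller.
public interface IOrderWriteRepository
{
    void Add(Order order);
    void Remove(Order order);
}

public interface IUnitOfWork
{
    IOrderWriteRepository Orders { get; }
    Task CommitAsync();   // the client, not the repository, decides when to commit
}
```

The application service opens the unit of work, registers changes through the write repository, and commits once, which lines up with Evans' advice to leave transaction control to the client; the read repository never participates in that transaction.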
I'm used to using a single database (say PostgreSQL or Elasticsearch).
But currently I'm using a mix (PG and ES) in a prototype app, and I may throw other kinds of dbs into the mix (e.g. Redis).
Say some piece of data needs to be persisted to each database in a different way.
How do you keep the system consistent in the event of a failure in one of the components/databases?
Example scenario that I'm facing:
Data is updated in PostgreSQL while Elasticsearch is unavailable.
At this point the system is inconsistent, as I should have updated both databases.
As I'm using an SQL db, I can simply abort the transaction to put the system back in its previous consistent state.
But what is the best way to keep the system consistent?
Check every time that the value has been persisted in all databases?
In case of failure, restore the previous state? But some NoSQL databases have no transaction/ACID mechanism, so I can't revert to the previous state as easily.
Additionally, if multiple databases must be kept in sync, is there any good practice to follow, like adding some kind of "version" metadata (whether a timestamp or a home-made incrementing version number) so you can put your databases back in sync? (Not talking about CouchDB, where it is built in!)
Moreover, the databases are not all updated atomically, so some parts are inconsistent for a short period. I think it depends on the business of the app, but does anyone have any thoughts about the problems that may occur or ways to fix them? I guess it must be tough and depend a lot on the configuration (for maybe very few real benefits).
I guess this may be a common architecture issue, but I'm having trouble finding information on the subject.
Keep things simple.
The search engine can and will lag behind sometimes. You may fight it. You may embrace it. It's fine, and most of the time it's acceptable.
Don't mix the data. If you use Redis for sessions - good. Don't store stuff from database A in B and vice versa.
Select proper database with ACID and strong consistency for your Super Important Business Data™®.
Again, do not mix the data.
Using more than one database technology in one product is a decision one shouldn't take lightly. The more technologies you use, the more complex your project will become in development, deployment, maintenance and administration. Also, every database technology becomes an individual point of failure. That means it is often much wiser to stick to one technology, even when it means that you need to make some compromises.
But when you have good(!) reason to use multiple DBMS, you should try to keep them as separated as possible. Avoid placing related data spanning multiple databases. When possible, no feature should require more than one DBMS to work (preferably a failure of the DBMS would only affect those features which use it). Storing redundant data in two different DBMS should also be avoided.
When you can't avoid redundancies and relationships spanning multiple DBMS, you should decide on one system to be the single source of truth (preferably one which you trust most regarding consistency). When there are inconsistencies between systems, they should be resolved by synchronizing the data with the SSOT.
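As a hedged sketch of that reconciliation step, assuming the relational database is the SSOT and every row carries an increasing version number (all names below are illustrative, not a real driver's API):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public record ProductRow(string Id, long Version, string Json);

// Assumed abstractions: the SSOT exposes rows changed since a version (ordered by version),
// and the secondary store (e.g. a search index) is simply overwritten from it.
public interface ISourceOfTruth
{
    Task<IReadOnlyList<ProductRow>> ReadChangedSinceAsync(long version);
}

public interface ISecondaryStore
{
    Task<long> LoadLastSyncedVersionAsync();
    Task UpsertAsync(ProductRow row);              // idempotent: safe to re-apply
    Task SaveLastSyncedVersionAsync(long version);
}

public class Reconciler
{
    private readonly ISourceOfTruth _ssot;
    private readonly ISecondaryStore _secondary;

    public Reconciler(ISourceOfTruth ssot, ISecondaryStore secondary)
    {
        _ssot = ssot;
        _secondary = secondary;
    }

    // Run periodically, or after the secondary store comes back from an outage.
    public async Task SyncOnceAsync()
    {
        var last = await _secondary.LoadLastSyncedVersionAsync();
        foreach (var row in await _ssot.ReadChangedSinceAsync(last))
        {
            await _secondary.UpsertAsync(row);
            await _secondary.SaveLastSyncedVersionAsync(row.Version);
        }
    }
}
```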
In one of my processes I have an SQL query that takes 10-20% of the total execution time. This SQL query filters my database and loads a list of PricingGrid objects.
So I want to improve its performance.
So far I have come up with 2 solutions:
Use a NoSQL solution; AFAIK these are good for improving read performance.
But the migration seems hard and needs a lot of work (like importing the data from SQL Server to NoSQL on a regular basis).
I don't have any knowledge of NoSQL; I don't even know which one I should use (the first I'd try is RavenDB, because I follow Ayende and it's made by the .NET community).
I might have some things to change in my model to make my objects fit a NoSQL database.
Load all my PricingGrid objects in memory (in a static IEnumerable).
This might be a problem when my server doesn't have enough memory to load everything.
I might be reinventing the wheel (indexes...) already invented by the NoSQL providers.
I think I'm not the first one wondering about this, so what would be the best solution? Are there any tools that could help me?
.NET 3.5, SQL Server 2005, Windows Server 2005
Migrating your data from SQL is only the first step.
Moving to a document store (like RavenDB or MongoDB) also means that you need to:
Denormalize your data
Perform schema validation in your code
Handle concurrency of complex operations in your code since you no longer have transactions (at least not the same way)
Perform rollbacks in the event of partial commits (changes)
Depending on your updates, reads and network model you might also need to handle conflicts
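On the concurrency point, a common substitute for cross-document transactions is optimistic concurrency on a per-document version field; here is a hedged sketch against a hypothetical store interface (not a specific driver's API):

```csharp
using System;
using System.Threading.Tasks;

public class OrderDocument
{
    public string Id { get; set; } = "";
    public long Version { get; set; }     // incremented on every successful write
    public decimal Total { get; set; }
}

// Assumed store interface: the replace only succeeds if the stored version still matches.
public interface IDocumentStore
{
    Task<OrderDocument> LoadAsync(string id);
    Task<bool> TryReplaceAsync(OrderDocument doc, long expectedVersion);
}

public static class OrderUpdater
{
    public static async Task ApplyDiscountAsync(IDocumentStore store, string orderId, decimal discount)
    {
        for (var attempt = 0; attempt < 3; attempt++)
        {
            var doc = await store.LoadAsync(orderId);
            var expected = doc.Version;
            doc.Total -= discount;
            doc.Version = expected + 1;

            // Retry if someone else modified the document between our read and write.
            if (await store.TryReplaceAsync(doc, expected))
                return;
        }
        throw new InvalidOperationException("Concurrent updates kept winning; giving up.");
    }
}
```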
You provided very limited information but it sounds like your needs include a single database server and that your data fits well in the relational model.
In such a case I would vote against a NoSQL solution, it is more likely that you can speed up your queries with database optimizations and still retain all the added value of a RDBMS.
Non-relational databases are tools for a specific job (no matter how they sell them), if you need them it is usually because your data doesn't fit well in the relational model or if you have a need to distribute your data over multiple machines (size or availability). For instance, I use MongoDB for a write-intensive high throughput job management application. It is centralized and the data is very transient so the "cost" of having low durability is acceptable. This doesn't sound like the case for you.
If you prefer to use a NoSQL-style solution, perhaps you should try Memcached + MySQL (InnoDB). This will allow you to get the speed benefits of an in-memory cache (in the form of the memcached daemon plugin) with the underlying protection and capabilities of an RDBMS (MySQL). It should also ease data migration and somewhat reduce the amount of changes required in your code.
I myself have never used it, I find that I either need NoSQL for the reasons I stated above or that I can optimize the RDBMS using stored procedures, indexes and table views in a way which is sufficient for my needs.
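Whichever store ends up behind it, the read path of the Memcached + MySQL suggestion boils down to a cache-aside lookup; a minimal sketch, where ICache and IPricingGridSqlRepository are stand-ins rather than real client APIs:

```csharp
using System.Threading.Tasks;

public record PricingGrid(string Id, decimal Price);

// Stand-ins for a memcached client and the existing SQL-backed repository.
public interface ICache
{
    Task<PricingGrid?> GetAsync(string key);
    Task SetAsync(string key, PricingGrid value, int ttlSeconds);
}

public interface IPricingGridSqlRepository
{
    Task<PricingGrid?> LoadAsync(string id);
}

public class CachedPricingGridReader
{
    private readonly ICache _cache;
    private readonly IPricingGridSqlRepository _sql;

    public CachedPricingGridReader(ICache cache, IPricingGridSqlRepository sql)
    {
        _cache = cache;
        _sql = sql;
    }

    public async Task<PricingGrid?> GetAsync(string id)
    {
        // Cache-aside: try the cache first, fall back to SQL, then populate the cache.
        var cached = await _cache.GetAsync("pricing-grid:" + id);
        if (cached != null) return cached;

        var fromDb = await _sql.LoadAsync(id);
        if (fromDb != null) await _cache.SetAsync("pricing-grid:" + id, fromDb, ttlSeconds: 300);
        return fromDb;
    }
}
```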
Asaf has provided great information in regards to the usage of NoSQL and when it is most appropriate. Given that your main concern was performance, I would tend to agree with his opinion - it would take you much more time and effort to adopt a completely new (and very different) data persistence platform than it would to trick out your SQL Server cluster. That said, my answer is mainly to address the "how" part of your question.
Addressing misunderstandings:
Denormalizing Data - You do not need to manually denormalize your existing data. This will be done for you when it is migrated over. More than anything you need to simply think about your data in a different fashion - root aggregates, entity and value types, etc.
Concurrency/Transactions - Transactions are possible in both Mongo and Raven, they are simply done in a different fashion. One of the inherent ways Raven does this is by using an ORM-like "unit of work" pattern with its RavenSession objects. Yes, your data validation needs to be done in code, but you already should be doing it there anyway. In my experience this is an over-hyped con.
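For reference, the session-based unit of work mentioned above looks roughly like this with a recent RavenDB client (the server URL, database name and document id are placeholders, and older client versions differ slightly):

```csharp
using Raven.Client.Documents;

public class Order
{
    public string Id { get; set; } = "";
    public string Status { get; set; } = "";
}

public static class RavenUnitOfWorkExample
{
    public static void Run()
    {
        using var store = new DocumentStore
        {
            Urls = new[] { "http://localhost:8080" },  // placeholder server URL
            Database = "Shop"                          // placeholder database name
        };
        store.Initialize();

        // The session tracks loaded documents and flushes all changes in one SaveChanges call,
        // which is the ORM-like unit-of-work behaviour described above.
        using var session = store.OpenSession();
        var order = session.Load<Order>("orders/1-A");  // assumes the document exists
        order.Status = "Shipped";
        session.SaveChanges();
    }
}
```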
How:
Install Raven or Mongo on a primary server, run it as a service.
Create or extend an existing application that uses the database you intend to port. This application needs all the model classes/libraries that your SQL database provides persistence for.
a. In your "data layer" you likely have a repository class somewhere. Extract an interface from this, and use it to build another repository class for your Raven/Mongo persistence (a sketch of this appears after the walkthrough below). Both DBs have plenty of good documentation for using their APIs to push/pull/update changes in the document graphs. It's pretty damn simple.
b. Load your SQL data into C# objects in memory. Pull back your top-level objects (just the entities) and load their inner collections and related data in memory. Your repository is probably already doing this (e.g. when fetching an Order object, ensure not only its properties but also associated collections like Items are loaded in memory).
c. Instantiate your Raven/Mongo repository and push the data to it. Primary entities become "top level documents" or "root aggregates" serialized in JSON, and their collections' data nested within. Save changes and close the repository. Note: You may break this step down into as many little pieces as your data deems necessary.
Once your data is migrated, play around with it and ensure you are satisfied. You may want to modify your application models a little to adjust the way they are persisted to Raven/Mongo - for instance you may want to make both Orders and Items top-level documents and simply use reference values (much like relationships in RDBMS systems). Watch out here though, as doing so sort of goes against the principle and performance behind NoSQL, since now you have to hit the DB twice to get the Order and the Items.
If satisfied, shard/replicate your mongo/raven servers across your remaining available server boxes.
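A rough sketch of steps (a) through (c), using hypothetical repository interfaces and type names rather than the actual Raven/Mongo client APIs:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public class Order
{
    public string Id { get; set; } = "";
    public List<OrderItem> Items { get; set; } = new();  // nested collection becomes part of the document
}

public class OrderItem
{
    public string Product { get; set; } = "";
    public int Quantity { get; set; }
}

// (a) The interface extracted from the existing SQL repository...
public interface IOrderRepository
{
    Task<IReadOnlyList<Order>> LoadAllWithItemsAsync();  // (b) pull entities with their collections
    Task SaveAsync(Order order);
    Task SaveChangesAsync();
}

// ...with a second implementation over Raven/Mongo (not shown) behind the same interface.
public static class OrderMigration
{
    // (c) Read from the SQL-backed repository and push root aggregates into the document-backed one.
    public static async Task RunAsync(IOrderRepository sqlSource, IOrderRepository documentTarget)
    {
        foreach (var order in await sqlSource.LoadAllWithItemsAsync())
        {
            await documentTarget.SaveAsync(order);   // serialized as one JSON document per Order
        }
        await documentTarget.SaveChangesAsync();
    }
}
```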
Obviously there are tons of little details I did not explain, but that is the general process, and much of it depends on the applications already consuming the database and may be tricky if more than one app/system talks to it.
Lastly, just to reiterate what Asaf said... learn as much as you can about NoSQL and its best use cases. It is an amazing tool, but not a golden solution for all data persistence. In your case try to really find the bottlenecks in your current solution and see if they are solvable. As one of my systems guys says, "technology for technology's sake is bullshit".
If there are any DBAs out there: I'm making a fairly large piece of software, and one of the biggest issues at present is where to put the business logic. While stored procedures would be easier to fix on the fly, the processing requirements would probably slow the DB down tremendously. I also don't want to have all of the business logic handled by the application, because I want it to be a "self-sustaining entity" that doesn't require the user front-end to operate.
My idea, is to create a service to run on a central server somewhere, and have the clients connect through that. The service would maintain all the business logic and serve as a front-end for all the database operations.
Ideas? Yes? No?
I'm willing to accept that I'm also missing some key concepts and need to read some literature.
I would suggest that you keep a keen eye on the difference between what you think of as business logic and what are really referential integrity constraints.
Make sure all constraints that keep the data meaningfully related are in place at the database layer, i.e. if you need to cascade some deletes or inserts, and when you need to validate some basic data values in order for everything to make sense... these should all be in the database.
Then decide if the Client, or the middle layer server, or the database is appropriate for any additional business logic.
What do you mean by "business logic"?
I've seen cases where aggregations and other set-based operations have been done in client code, as well as horrible RBAR (row-by-agonizing-row) operations in SQL that should be somewhere else.
SQL is one tool that has its place: if you're working through large datasets, JOINs, aggregations etc. then SQL is the place to do it. Anything else is slavish obedience to an SOA ideal.
My approach is to consider what the stored proc or SQL is doing: is it part of the middle tier to avoid set based operations in procedural code, or is it lower as pure data integrity/persistence?
If your business logic is 100% set-based then arguably you don't need a middle tier (edit: client-code-based), unless it's very thin.
Over the years, I've seen client applications come and go, but the database is still there.
So nowadays I use stored procedures for most of the business logic. Three big advantages:
Bug fix deployment takes an instant, with no downtime
Multi-user by default
Far less plumbing code (no data access layer)
Having all business logic on the server side is fine.
Not having it on the server side is fine, too.
In fact, it's up to you.
If a stored procedure starts to look not very SQL-ish, you can make it a CLR stored procedure.
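For example, a minimal CLR stored procedure is just a static .NET method decorated for SQL Server to host; the table and column names below are made up, and deployment (CREATE ASSEMBLY / CREATE PROCEDURE ... EXTERNAL NAME) is omitted:

```csharp
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public static class OrderProcedures
{
    // Exposed to SQL Server as a stored procedure once the assembly is registered.
    [SqlProcedure]
    public static void RecalculateOrderTotal(int orderId)
    {
        // "context connection=true" reuses the caller's connection and transaction.
        using (var connection = new SqlConnection("context connection=true"))
        {
            connection.Open();
            using (var command = new SqlCommand(
                "UPDATE Orders SET Total = " +
                "(SELECT SUM(Price * Quantity) FROM OrderItems WHERE OrderId = @id) " +
                "WHERE Id = @id", connection))
            {
                command.Parameters.AddWithValue("@id", orderId);
                command.ExecuteNonQuery();
            }
        }
    }
}
```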
Here's a similar question.
I'd highly recommend a traditional n-layer approach, where you have at least a UI layer, a business layer (like a C# assembly or the Java equivalent), and data access. See: http://en.wikipedia.org/wiki/Multitier_architecture.
I worked for a company where all the business logic was in the procs, and maintenance costs were much higher than they had to be; it limited us to a specific version of SQL Server, it wasn't scalable, etc. In short, unless your application is a simple throw-away kind of thing, I'd not put any business logic in the database.