After going through multiple Stack Overflow posts and blog articles, I have come to the conclusion that we need the Unit of Work design pattern to maintain transactional integrity while writing domain objects to their respective repositories.
However, we do not need such integrity while reading/searching the repository. Given that, is it good design to separate the purposes of repositories and units of work, with the former used only for reading domain objects and the latter used only to create/write/refresh/delete them?
Eric Evans, Domain Driven Design:
Implementation (of a repository) will vary greatly, depending on the technology being used for persistence and the infrastructure you have. The ideal is to hide all the inner workings from the client (although not from the developer of the client), so that the client code will be the same whether the data is stored in an object database, a relational database, or simply held in memory....
The possibilities of implementation are so diverse that I can only list some concerns to keep in mind....
Leave transaction control to the client. Although the REPOSITORY will insert and delete from the database, it will ordinarily not commit anything. It is tempting to commit after saving, for example, but the client presumably has the context to correctly initiate and commit units of work. Transaction management will be simpler if the REPOSITORY keeps its hands off.
That said, I call your attention in particular to an important phrase in the above discussion: the database. The underlying assumption here is that all of the aggregates being modified are stored in such a way that the unit of work can be committed atomically.
When that's not the case -- for example, when you are storing aggregates in a document store that doesn't promise atomic updates of multiple documents -- you may want to consider making this separation explicit in your model, rather than trying to disguise the fact that you are coordinating multiple commits.
It is entirely reasonable to use one set of repositories for your read use cases, which are distinct from those used in your write use cases. In other words, when we have different semantics, then we should have a different interface, the implementations of which can be tuned as necessary.
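A minimal sketch of what that separation could look like (all names here are hypothetical, not from any particular framework): read use cases get a plain repository interface, while write use cases go through a unit of work that commits atomically.

import java.util.List;
import java.util.Optional;

record Order(String id, String status) {}

// Read side: no transactional integrity required, just queries.
interface OrderReadRepository {
    Optional<Order> byId(String id);
    List<Order> byStatus(String status);
}

// Write side: changes are registered and committed as one atomic unit.
interface OrderUnitOfWork {
    void registerNew(Order order);
    void registerDirty(Order order);
    void registerDeleted(Order order);
    void commit(); // a single transaction covers everything registered
}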
Context:
Microservices architecture, DDD, CQRS, event-driven.
SQL database.
I have a use case where I have to store a record whenever an entity's state is updated. I'm afraid that the number of records could be huge, and I was thinking that maybe an SQL table is not the right place to store them. Also, these records are used only every now and then, and probably not by this service's domain.
Could it be good practice to store them in another database (Firestore, Mongo, Cassandra...) so they don't affect the performance and the scope of this service?
Thanks!
Could it be good practice to store them in another database (Firestore, Mongo, Cassandra...) so they don't affect the performance and the scope of this service?
Part of the benefit of using microservices is that you are hiding implementation details. As such, you are able to store/process data by whatever means is required or available without the need to broadcast that implementation to external services.
That said, from a technical standpoint, it is worth considering transaction boundaries. When writing to a single database, it is possible to commit transactions easily. Once you are writing to different databases within the same transaction, you can run into situations where one write might succeed while another one might fail.
My recommendation is to make sure each transaction writes to only one of those databases. If the second database must also be updated, use a two-phase commit to ensure that the write to it actually happens. In this way, you can avoid losing data and still get the benefit of using a more efficient data store.
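As an illustrative sketch of keeping the transaction boundary around a single database (the DocumentStore interface is a hypothetical stand-in for whichever secondary store you pick):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical client for the secondary store (Firestore, Mongo, Cassandra...).
interface DocumentStore {
    void append(String collection, String json);
}

class EntityUpdateHandler {
    private final String jdbcUrl;
    private final DocumentStore history;

    EntityUpdateHandler(String jdbcUrl, DocumentStore history) {
        this.jdbcUrl = jdbcUrl;
        this.history = history;
    }

    void updateEntity(long id, String newState) throws SQLException {
        // 1. The SQL transaction covers only this service's own database.
        try (Connection con = DriverManager.getConnection(jdbcUrl)) {
            con.setAutoCommit(false);
            try (PreparedStatement ps =
                     con.prepareStatement("UPDATE entity SET state = ? WHERE id = ?")) {
                ps.setString(1, newState);
                ps.setLong(2, id);
                ps.executeUpdate();
            }
            con.commit();
        }
        // 2. The history record goes to the secondary store outside that
        //    transaction; guaranteeing it is never lost is exactly where a
        //    two-phase commit (or an equivalent hand-off) comes in.
        history.append("entity_history",
                "{\"entityId\":" + id + ",\"state\":\"" + newState + "\"}");
    }
}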
I've read this question about something similar but it didn't quite solve my problem.
I have an application where I'm required to use data from an API. Problem is there are performance and technical limitations to doing this. The performance limitations are obvious. The technical limitations lie in the fact that the API does not support some of the more granular queries I need to make.
I decided to use MySQL as a queryable cache.
Since the data I needed to retrieve from the API did not change very often, I settled on refreshing the cache once a day, so I didn't need a complicated mapper that checked whether we had the data in the cache and, if not, fell back to the API. That was my first design, but I realized it wasn't very practical, since the API couldn't support most of the queries I needed to make anyway.
Now I have a set of two mappers for every aggregate. One for MySQL and one for the API.
My problem now is how to hide the complexities of persistence from the domain, given that it seems I need multiple repositories.
Ideally I would have an interface that both mappers adhered to, but as previously disclosed that's not possible.
Is it okay to have multiple repositories, one for each mapper?
Is it okay to have more than one repository for an aggregate in DDD?
Short answer: yes.
Longer answer: you won't find any suggestion of multiple repositories in Evans's original book. As he described things, the domain model would have one representation of the aggregate, and the repository abstraction gave consumers the illusion that the aggregate was stored in an in-memory collection.
Largely, this makes sense -- you are trying to ensure that writes to data within the aggregate boundary are consistent, so you need a single authority for change.
But... there's no particular reason that reads need to travel through the same code path as writes. Welcome to the world of CQRS. What that gives you immediately is the idea that the in-memory representation for reads might need to be optimized differently from the in-memory representation used for writes.
In its more general form, you get the idea that the concept that you are modeling might have different representations for each use case.
For your case, where it is sometimes appropriate to read from the RDBMS, sometimes from the API, sometimes both, this isn't quite an exact match -- the repository interface hides the implementation details from the consumer, but you still have to bother with the implementation.
One thing you might look at is your requirements: how fresh does the data need to be in each use case? A constraint that is often relaxed in the CQRS pattern is the idea that the effects of writes are immediately available for reading. The important question to ask is: if the data hasn't been cached yet, can you simply report "data not available" without hitting the API?
If so, then use cases that access the cached data need only a single repository implementation.
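For instance, a minimal sketch of that single implementation (names hypothetical): the cache-backed read repository simply reports "data not available" instead of falling back to the API.

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

record Product(String sku, String name) {}

// Read-side repository for the cached data.
interface ProductReadRepository {
    Optional<Product> findBySku(String sku);
}

// Backed by the MySQL cache only; the map stands in for the cache table.
// If the row hasn't been refreshed into the cache yet, the empty Optional
// means "data not available" -- the API is never consulted on this path.
class CachedProductRepository implements ProductReadRepository {
    private final Map<String, Product> cache = new HashMap<>();

    @Override
    public Optional<Product> findBySku(String sku) {
        return Optional.ofNullable(cache.get(sku));
    }
}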
If you are using an external API to read and modify data, you can cache it locally to make reads faster, but I would avoid having a domain repository for it.
From the domain perspective it seems that you need a service to query for some data (or just a Query in a CQRS implementation). You can do that with a service that internally calls some remote API or reads from a local cache (MySQL, whatever).
When you read your local cache you can write a repository to decouple your logic from the db implementation, but this is a different concept from a domain repository; it is just a detail of your technical implementation that has nothing to do with your domain.
If the remote service starts offering the query you need, you will change how the query is executed, calling the remote API instead of the db, but your domain model should not change.
A domain repository is used to load and persist your aggregates; if you are working with external aggregates (in a different context or subdomain) you need to interact with them through services.
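A minimal sketch of that idea (names hypothetical): the domain depends only on the query abstraction, so swapping the cache-backed implementation for one that calls the remote API changes nothing in the model.

import java.util.List;

// The domain sees only this query abstraction.
interface CustomerOrdersQuery {
    List<String> orderIdsFor(String customerId);
}

// Today: answered from the local cache (e.g. a MySQL table).
class CachedCustomerOrdersQuery implements CustomerOrdersQuery {
    public List<String> orderIdsFor(String customerId) {
        // SELECT order_id FROM cached_orders WHERE customer_id = ?
        return List.of();
    }
}

// Tomorrow: if the remote API starts offering this query, swap in this
// implementation -- the domain model is untouched.
class RemoteApiCustomerOrdersQuery implements CustomerOrdersQuery {
    public List<String> orderIdsFor(String customerId) {
        // GET /customers/{id}/orders on the remote API
        return List.of();
    }
}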
Consider the following situation:
There is an update request on entity A to create sub-entity A.B. There might be many B's on A, and each B has a unique email address.
The entity A is a shared entity, and the same request can happen on multiple servers in parallel (scalable microservice).
In order to create A.B, we have to verify that B does not already exist as a sub-entity of A (according to B's email address).
The service which handles this update request should lock A (by its unique id) in order to make the update safe.
My questions are more conceptual than technical:
Is locking the resource A in this case part of the business logic of this update task?
Would you consider putting the resource lock in a separate middleware from the one which handles the verify-and-update procedure?
(the other option is to treat the lock as part of the business logic and put it directly in the middleware responsible for the business logic.)
The technical implementation of the chosen solution to contention problems is obviously not business logic, but choosing the right solution requires business knowledge.
What I mean by this is that you must understand how the business works in order to determine the right approach to protecting the integrity of the data in concurrency scenarios. How often will concurrency conflicts occur? Can conflicts be resolved automatically? What should count as a conflict? Not only that, but the business may very well accept eventual consistency over strong consistency.
In short, the mechanisms put in place to protect data integrity in concurrency scenarios shouldn't be part of the domain. These would probably go either in the application service layer or in the infrastructure layer, but the business experts must be involved in the discussions about how concurrency conflicts should be resolved and how they affect the business.
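As one illustration of keeping the mechanism outside the domain, here is a minimal optimistic-concurrency sketch in the application service layer (table, column, and class names are hypothetical); the version check detects a conflicting update, and how to resolve it is the business conversation:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class AddSubEntityService {

    // Thrown when another server updated A first; whether to retry,
    // merge, or reject is a decision for the business experts.
    static class ConcurrentUpdateException extends RuntimeException {}

    void addB(Connection con, long aId, long expectedVersion, String bEmail)
            throws SQLException {
        // Domain rule (omitted): verify no B with this email exists on A.

        // Infrastructure concern: bump the version only if nobody else has.
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE entity_a SET version = version + 1 " +
                "WHERE id = ? AND version = ?")) {
            ps.setLong(1, aId);
            ps.setLong(2, expectedVersion);
            if (ps.executeUpdate() == 0) {
                throw new ConcurrentUpdateException();
            }
        }
        // ... insert the new B row in the same transaction.
    }
}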
Locking is not a business-related issue (unless your business is building distributed databases), and so should never be considered part of the business logic.
Further, you should not be implementing distributed locking yourself, but should be relying on a packaged solution, that is preferably part of your data persistence solution.
Here's an article on how to do this with Redis, discussing an algorithm called Redlock. Here's a blog post linking to articles about building consensus in Cassandra. And here's a link about concurrency in Mongo. As you'll see from these articles, distributed locking is a big and complex issue that you probably don't want to tackle yourself.
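For example, a sketch using the Redisson client, one packaged Redlock implementation for Java (the lock name, address, and timeouts are illustrative):

import java.util.concurrent.TimeUnit;
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class LockedUpdate {
    public static void main(String[] args) throws InterruptedException {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);

        // One lock per shared entity, keyed by its unique id.
        RLock lock = redisson.getLock("entity-a:42");
        if (lock.tryLock(5, 30, TimeUnit.SECONDS)) { // wait 5s, auto-release after 30s
            try {
                // verify B doesn't exist on A, then create it
            } finally {
                lock.unlock();
            }
        }
        redisson.shutdown();
    }
}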
In DDD, aggregate roots are persisted via repositories. But are repositories the only classes that can touch persistence in a bounded context?
I am using CQRS alongside DDD. On the query side, things like view counts and upvotes need to be persisted, but I feel it is awkward to model them as aggregate roots. I am limiting DDD aggregate-root modeling to the command side. The query side is not allowed to use repositories, but it often asks for a small amount of persistence capability.
Also, I am using domain events, and certain domain events also need to be persisted. I need something like event storage, but I have only heard such terms in event sourcing (ES), and I am not using ES.
If such persistence classes are indeed needed, what should I call them, and which layer should they belong to?
[Update]
When I read the answers below, I realized my question is a bit ambiguous. By "touch", I mainly mean write (but also read).
Thanks.
On the query side, things like view counts and upvotes need to be persisted
Not necessarily. CQRS doesn't specify
whether the read model should be materialized in its own database
how the read model is updated
The simplest CQRS implementation is one where the query side and command side use the same tables. The persistent source for Read Models could also be SQL (materialized) views based on these tables. If you do have a separate database for reads, it can be kept up-to-date by additional Command Handlers, sub-handlers, or Event Handlers that operate after the command has been executed.
You can see a minimalist - yet perfectly CQRS compliant - implementation here: https://github.com/gregoryyoung/m-r/tree/master/SimpleCQRS
But are repositories the only classes that can touch persistence in a bounded context?
No. In a CQRS context, Read Model Facades (a.k.a. read-side repositories) can also read from it, and your read-model update mechanism can write to it.
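A minimal sketch of both roles (table and class names hypothetical): a facade that only reads the denormalized table, and an update handler that is its only writer.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Read Model Facade: thin, query-only access to a denormalized table.
class PostStatsFacade {
    int viewCount(Connection con, long postId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT view_count FROM post_stats WHERE post_id = ?")) {
            ps.setLong(1, postId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt(1) : 0;
            }
        }
    }
}

// Update mechanism: runs after the command/event and is the only writer.
class PostViewedHandler {
    void on(Connection con, long postId) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE post_stats SET view_count = view_count + 1 WHERE post_id = ?")) {
            ps.setLong(1, postId);
            ps.executeUpdate();
        }
    }
}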
Also, I am using domain events, and certain domain events also need to be persisted. I need something like event storage, but I have only heard such terms in event sourcing (ES), and I am not using ES.
Event stores are the main storage technology of event-sourced systems. You could use one to store a few domain events on the side in a non-ES application, but it may be overkill and too complex for the task. It depends on whether you need all the guarantees they offer in terms of delivery, consistency, concurrency/versioning, etc. Otherwise, a regular RDBMS or NoSQL store can do the trick.
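If a plain store is enough, persisting the occasional domain event can be as simple as appending rows to a table (the schema here is a hypothetical sketch):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

// Appends domain events to a plain SQL table: no event-sourcing
// machinery, just an audit-style log of what happened.
class DomainEventLog {
    void append(Connection con, String type, String payloadJson)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO domain_event (type, payload, occurred_at) VALUES (?, ?, ?)")) {
            ps.setString(1, type);
            ps.setString(2, payloadJson);
            ps.setTimestamp(3, Timestamp.from(Instant.now()));
            ps.executeUpdate();
        }
    }
}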
First, you need to think about your object model independently of how you will store it in the database. You're designing an object model. Forget about the database for a moment.
You're saying that you don't want view counts or upvotes to be aggregate roots. That means you want to put them in an aggregate with some other objects. One of those objects is the aggregate root.
Without knowing more about your model it's hard to be specific, but the basic approach would be to persist the aggregate root with the corresponding repository. The repository is responsible for storing not only the aggregate root but the entire aggregate, following the relationships.
Think about the other side, when you are using the repository to retrieve an entity. You get an instance of your aggregate root, but if you follow the relationships, you also have all those other objects. It's perfectly logical that when you save an entity, all those other objects are saved too.
I don't know which technology you're using, but you should write your repository so that it does that.
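A minimal sketch of that idea, assuming a Post aggregate whose root owns its view count and upvotes (all names hypothetical):

import java.util.ArrayList;
import java.util.List;

// The aggregate: Post is the root; the stats live inside it.
class Post {
    final long id;
    int viewCount;
    final List<String> upvoterIds = new ArrayList<>();

    Post(long id) { this.id = id; }
}

// The repository persists the whole aggregate, not just the root row.
class PostRepository {
    void save(Post post) {
        // UPDATE post SET view_count = ? WHERE id = ?
        // then rewrite the upvote rows for post.upvoterIds...
        // Following the relationships from the root, every owned
        // object is stored in the same operation.
    }

    Post byId(long id) {
        // SELECT the root, then load its view count and upvotes with it.
        return new Post(id);
    }
}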
Also, why is the query side not allowed to use repositories? Repositories are not only used to save data; they are also used to retrieve it. How are you retrieving objects without repositories (even if you don't modify them)?
I'm building an ORM and am trying to find out the exact responsibilities of each pattern. Let's say I want to transfer money between two accounts, using the Unit of Work to manage the updates in a single database transaction.
Is the following approach correct?
Get them from the Repository
Attach them to my Unit Of Work
Do the business transaction & commit?
Example:
// load the aggregates through the repository
from = accountRepository.find(fromAccountId);
to = accountRepository.find(toAccountId);
// register them with the unit of work, then run the business
// transaction inside a single commit
unitOfWork.attach(from);
unitOfWork.attach(to);
unitOfWork.begin();
from.withdraw(amount);
to.deposit(amount);
unitOfWork.commit();
Should, as in this example, the Unit Of Work and the Repository be used independently, or:
Should the Unit of Work internally use a Repository and have the ability to load objects?
... or should the Repository internally use a Unit of Work and automatically attach any loaded entity?
All comments are welcome!
The short answer would be that the Repository would be using the UoW in some way, but I think the relationship between these patterns is less concrete than it would initially seem. The goal of the Unit of Work is to lump a group of database-related operations together so they can be executed as an atomic unit. There is often a relationship between the boundaries created when using UoW and the boundaries created by transactions, but this relationship is more coincidental than essential.
The Repository pattern, on the other hand, is a way to create an abstraction resembling a collection over an Aggregate Root. More often than not the sorts of things you see in a repository are related to querying or finding instances of the Aggregate Root. A more interesting question (and one which doesn't have a single answer) is whether it makes sense to add methods that deal with something other than querying for Aggregates. On the one hand there could be some valid cases where you have operations that would apply to multiple Aggregates. On the other it could be argued that if you're performing operations on more than one Aggregate you are actually performing a single action on another Aggregate. If you are only querying data I don't know if you really need to create the boundaries implied by the UoW. It all comes down to the domain and how it is modeled.
The two patterns deal with very different levels of abstraction, and the involvement of the Unit of Work is going to depend on how the Aggregates are modeled as well. The Aggregates may want to delegate work related to persistence to the Entities they manage, or there could be another layer of abstraction between the Aggregates and the actual ORM. If your Aggregates/Entities are dealing with persistence themselves, then it may be appropriate for the Repositories to also manage that persistence. If not, then it doesn't make sense to include the UoW in your Repository.
If you're creating something for general public consumption outside of your organization, then I would suggest designing your Repository interfaces/base implementations so that they can interact directly with your ORM or not, depending on the needs of the user of your ORM. If this is internal, and you are doing the persistence work in your Aggregates/Entities, then it makes sense for your Repository to make use of your UoW. For a generic Repository, it would make sense to provide access to the UoW object from within Repository implementations, which can make sure it is initialized and disposed of appropriately. On that note, there will also be times when you will want to use multiple Repositories within a single UoW boundary, so you would want to be able to pass an already primed UoW into the Repository in that case.
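A minimal sketch of that last point (the interfaces are hypothetical): several repositories enlist in one externally primed Unit of Work, and the caller commits once.

// Hypothetical Unit of Work boundary shared by several repositories.
interface UnitOfWork {
    void registerDirty(Object entity); // track a changed entity
    void begin();
    void commit(); // one transaction for everything registered
}

// The UoW is passed in already primed rather than created internally,
// so a single boundary can span multiple repositories.
class AccountRepository {
    private final UnitOfWork uow;
    AccountRepository(UnitOfWork uow) { this.uow = uow; }

    void save(Object account) {
        uow.registerDirty(account); // the actual write happens at commit
    }
}

class AuditRepository {
    private final UnitOfWork uow;
    AuditRepository(UnitOfWork uow) { this.uow = uow; }

    void save(Object entry) { uow.registerDirty(entry); }
}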
I recommend the approach where the repository uses the UoW internally. This approach has some advantages, especially for web applications.
In a web application, the recommended pattern is one Unit of Work (session) per HTTP request. If your repositories share the UoW, you can use a first-level cache (via an identity map) for objects that were already requested by other repositories (such as data dictionaries referenced by multiple aggregates). You will also commit only one transaction instead of several, which performs much better.
You could take a look at the source code of Hibernate/NHibernate, which are mature ORMs in the Java/.NET world.
Good Question!
It depends on what your work boundaries are going to be. If they are going to span multiple repositories, then you might have to create another abstraction to ensure that multiple repositories are covered. It would be like the small "service" layer defined in Domain-Driven Design.
If your unit of work is going to be pretty much per Repository then I would go with the second option.
My question to you, however, would be: how can you worry about repositories when writing an ORM? They are going to be defined and used by the consumers of your Unit of Work, right? If so, you have no option but to provide just the Unit of Work; your consumers will have to enlist their repositories with your Unit of Work and will also be responsible for controlling its boundaries, won't they?