I'm building an ORM and am trying to pin down the exact responsibilities of each pattern. Let's say I want to transfer money between two accounts, using the Unit of Work to manage the updates in a single database transaction.
Is the following approach correct?
1. Get them from the Repository
2. Attach them to my Unit of Work
3. Do the business transaction & commit?
Example:
    from = accountRepository.find(fromAccountId);
    to = accountRepository.find(toAccountId);
    unitOfWork.attach(from);
    unitOfWork.attach(to);
    unitOfWork.begin();
    from.withdraw(amount);
    to.deposit(amount);
    unitOfWork.commit();
Should, as in this example, the Unit of Work and the Repository be used independently, or:
Should the Unit of Work internally use a Repository and have the ability to load objects?
...or should the Repository internally use a Unit of Work and automatically attach any loaded entity?
All comments are welcome!
The short answer would be that the Repository would use the UoW in some way, but I think the relationship between these patterns is less concrete than it initially seems. The goal of the Unit of Work is to lump a group of database-related operations together so they can be executed as an atomic unit. There is often a relationship between the boundaries created by a UoW and the boundaries created by transactions, but that relationship is more coincidental than essential.
The Repository pattern, on the other hand, is a way to create an abstraction resembling a collection over an Aggregate Root. More often than not the sorts of things you see in a repository are related to querying or finding instances of the Aggregate Root. A more interesting question (and one which doesn't have a single answer) is whether it makes sense to add methods that deal with something other than querying for Aggregates. On the one hand there could be some valid cases where you have operations that would apply to multiple Aggregates. On the other it could be argued that if you're performing operations on more than one Aggregate you are actually performing a single action on another Aggregate. If you are only querying data I don't know if you really need to create the boundaries implied by the UoW. It all comes down to the domain and how it is modeled.
The two patterns operate at very different levels of abstraction, and the involvement of the Unit of Work will also depend on how the Aggregates are modeled. The Aggregates may want to delegate persistence-related work to the Entities they manage, or there could be another layer of abstraction between the Aggregates and the actual ORM. If your Aggregates/Entities deal with persistence themselves, then it may be appropriate for the Repositories to also manage that persistence. If not, it doesn't make sense to include the UoW in your Repository.
If you want to create something for general consumption outside of your organization, then I would suggest designing your Repository interfaces/base implementations so that they can interact directly with your ORM or not, depending on the needs of the user of your ORM. If this is internal, and you are doing the persistence work in your Aggregates/Entities, then it makes sense for your Repository to make use of your UoW. For a generic Repository, it would make sense to provide access to the UoW from within Repository implementations, which can make sure it is initialized and disposed of appropriately. On that note, there will also be times when you will want to use multiple Repositories within a single UoW boundary, so you will want to be able to pass an already-primed UoW into a Repository, along the lines of the sketch below.
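A minimal sketch of that last point, with a constructor-injected UoW. All of the type and method names here (UnitOfWork, registerDirty, AccountRepository) are invented for illustration, not taken from any real ORM:

    // Hypothetical shape: the caller owns the UoW, so several repositories
    // can share one commit boundary.
    interface UnitOfWork {
        void registerDirty(Object entity); // remember an entity to flush on commit
        void commit();                     // flush every tracked change in one transaction
    }

    class Account {
        final long id;
        Account(long id) { this.id = id; }
    }

    class AccountRepository {
        private final UnitOfWork uow;

        // The repository receives an already-primed UoW instead of creating its own.
        AccountRepository(UnitOfWork uow) { this.uow = uow; }

        Account find(long id) {
            Account account = new Account(id); // stand-in for a real ORM load
            uow.registerDirty(account);        // track the loaded entity
            return account;
        }
    }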
I recommend the approach where the repository uses the UoW internally. This approach has some advantages, especially for web applications.
In web applications, the recommended pattern is one Unit of Work (session) per HTTP request. If your repositories share a UoW, you will be able to use the first-level cache (via the identity map) for objects that were already requested by other repositories (like data dictionaries referenced by multiple aggregates). You will also commit a single transaction instead of several, which performs much better.
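To make the identity-map point concrete, here is a minimal sketch of a per-request Unit of Work; the names are invented, and a real implementation (as in Hibernate) keys the map by entity type plus id, not by id alone:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.LongFunction;

    // One instance of this is created per HTTP request and shared by all
    // repositories in that request.
    class RequestUnitOfWork {
        private final Map<Long, Object> identityMap = new HashMap<>();

        // An entity loaded once (e.g. a shared data dictionary) is served
        // from the map on every later lookup in the same request.
        @SuppressWarnings("unchecked")
        <T> T findOrLoad(long id, LongFunction<T> loader) {
            return (T) identityMap.computeIfAbsent(id, loader::apply);
        }

        void commit() {
            // flush all tracked changes in a single database transaction
        }
    }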
You could take a look at the source code of Hibernate/NHibernate, which are mature ORMs in the Java/.NET world.
Good Question!
It depends on what your work boundaries are going to be. If they span multiple repositories, then you might have to create another abstraction to ensure that multiple repositories are covered. It would be like the small "service" layer defined in Domain-Driven Design.
If your unit of work is going to be pretty much per Repository then I would go with the second option.
My question to you, however, would be: how can you worry about repositories when writing an ORM? They are going to be defined and used by the consumers of your Unit of Work, right? If so, you have no option but to just provide a Unit of Work, and your consumers will have to enlist their repositories with it and will also be responsible for controlling the boundaries of the Unit of Work, as in the sketch below.
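For illustration, that consumer-driven flow might look like this; every name here is hypothetical, standing in for whatever your ORM actually exposes:

    // The ORM hands out only the Unit of Work; the consumer defines the
    // repository, enlists it, and controls the boundary.
    class UnitOfWork {
        void begin()  { /* open the transaction */ }
        void commit() { /* flush and commit */ }
        void track(Object entity) { /* register for change tracking */ }
    }

    class Account {
        long balance;
        void withdraw(long amount) { balance -= amount; }
        void deposit(long amount)  { balance += amount; }
    }

    class AccountRepository {
        private final UnitOfWork uow;
        AccountRepository(UnitOfWork uow) { this.uow = uow; } // enlisted at construction

        Account find(long id) {
            Account account = new Account(); // stand-in for a real load keyed by id
            uow.track(account);
            return account;
        }
    }

    class TransferExample {
        public static void main(String[] args) {
            UnitOfWork uow = new UnitOfWork();
            AccountRepository accounts = new AccountRepository(uow);
            uow.begin();
            accounts.find(1L).withdraw(100);
            accounts.find(2L).deposit(100);
            uow.commit(); // the consumer controls the boundary
        }
    }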
Related
Is it possible to query the ORM of a microservice through its API, and use it as the ORM of another microservice?
E.g. let's say I have microservice A with its API (let's call it API_A), its DB (DB_A) and its internal Object-Relational Mapper instances (ORM_A) defining the correspondence between the classes belonging to the microservice and the structure of the relational DB, and managing access to it.
Now imagine I want a microservice B with different functionality from A, but with the same ORM as A (and so a DB with the same structure as DB_A, although not necessarily with the same data, as the different functionality may produce different data).
How do I query/copy/mirror ORM_A into microservice B in a smart way, so that I have no code duplication and, when A changes, ORM_B changes accordingly with no manual intervention?
Is there an option to query ORM_A from B via its API and recreate it in microservice B?
The idea that code changes inside API_A could yield code changes inside API_B creates a coupling between the services and their data that would suggest they shouldn't be two different services.
If API_B does in fact perform wildly different functions than API_A and only needs a few pieces of data from structures surfaced by API_A, you should consider a couple different options to ensure the relevant data is accessible to API_B from API_A:
Surface the data from API_A in an endpoint that is accessible to API_B. This creates an API contract that is easier to enforce and test. This solution is relatively easy to implement, but creates some dependency relationships between the two APIs.
Set up an event topic that you notify whenever API_A writes data that API_B (or other services) might want to consume. By reading these events, API_B can write the relevant data to its own DB in its own format, avoiding coupling with A through either data structures or contracts. This solution requires setting up event queues, but would be the best for the performance of API_B; a sketch follows below.
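A minimal in-memory sketch of the second option; the bus, event, and service names are all invented (in production the bus would be Kafka, RabbitMQ, or similar):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    // API_A publishes an event after writing; API_B's subscriber copies the
    // fields it cares about into DB_B in its own format.
    class EventBus {
        private final List<Consumer<UserUpdated>> subscribers = new ArrayList<>();

        void subscribe(Consumer<UserUpdated> handler) { subscribers.add(handler); }

        void publish(UserUpdated event) {
            subscribers.forEach(handler -> handler.accept(event));
        }
    }

    record UserUpdated(String userId, String displayName) {}

    class ServiceB {
        void register(EventBus bus) {
            // B keeps its own copy of the data; duplication is acceptable here.
            bus.subscribe(event -> saveLocally(event.userId(), event.displayName()));
        }

        private void saveLocally(String userId, String displayName) {
            // write to DB_B in B's own schema
        }
    }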
One thing that I've seen people struggle with when adopting microservices (I struggled with it myself) was the idea that data duplication is ok. Try not to get stuck thinking of data as relational across multiple services because that's how you'll naturally create the kind of coupling that you'll want to avoid in microservices. Good luck!
I've read this question about something similar but it didn't quite solve my problem.
I have an application where I'm required to use data from an API. Problem is there are performance and technical limitations to doing this. The performance limitations are obvious. The technical limitations lie in the fact that the API does not support some of the more granular queries I need to make.
I decided to use MySQL as a queryable cache.
Since the data I needed to retrieve from the API did not change very often, I settled on refreshing the cache once a day, so I didn't need a complicated mapper that checked whether we had the data in the cache and fell back to the API if not. That was my first design, but I realized it wasn't very practical when the API couldn't support most of the queries I needed to make anyway.
Now I have a set of two mappers for every aggregate. One for MySQL and one for the API.
My problem is now how I hide the complexities of persistence from the domain, and the fact that it seems that I need multiple repositories.
Ideally I would have an interface that both mappers adhered to, but as previously disclosed that's not possible.
Is it okay to have multiple repositories, one for each mapper?
Is it okay to have more than one repository for an aggregate in DDD?
Short answer: yes.
Longer answer: you won't find any suggestion of multiple repositories in the original book by Evans. As he described things, the domain model would have one representation of the aggregate, and the repository abstraction provided consumers with the illusion that the aggregate was stored in an in-memory collection.
Largely, this makes sense -- you are trying to ensure that writes to data within the aggregate boundary are consistent, so you need a single authority for change.
But... there's no particular reason that reads need to travel through the same code path as writes. Welcome to the world of CQRS. What that gives you immediately is the idea that the in-memory representation for reads might need to be optimized differently from the in-memory representation used for writes.
In its more general form, you get the idea that the concept that you are modeling might have different representations for each use case.
For your case, where it is sometimes appropriate to read from the RDBMS, sometimes from the API, sometimes both, this isn't quite an exact match -- the repository interface hides the implementation details from the consumer, but you still have to bother with the implementation.
One thing you might look at is your requirements; how fresh does the data need to be in each use case? A constraint that is often relaxed in the CQRS pattern is the idea that the effects of writes are immediately available for reading. The important question to ask would be, if the data hasn't been cached yet, can you simply report "data not available" without hitting the API?
If so, then use cases that access the cached data need only a single repository implementation, along the lines of the sketch below.
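A minimal sketch of that single implementation, assuming the freshness requirement can be relaxed: a cache miss is reported as "not available" instead of falling back to the API. All names here are hypothetical:

    import java.util.Optional;

    interface ReportReadRepository {
        Optional<Report> findById(String id); // empty means "not cached yet"
    }

    class MySqlReportReadRepository implements ReportReadRepository {
        @Override
        public Optional<Report> findById(String id) {
            Report cached = queryCache(id);     // SELECT against the MySQL cache
            return Optional.ofNullable(cached); // no API fallback on a miss
        }

        private Report queryCache(String id) {
            return null; // stand-in for the real MySQL lookup
        }
    }

    record Report(String id) {}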
If you are using an external API to read and modify data, you can cache the data locally for faster reads, but I would avoid having a domain repository.
From the domain perspective, it seems that you need a service to query for some data (or just a Query in a CQRS implementation); internally that service can call some remote API or read from a local cache (MySQL, whatever).
When you read your local cache you can develop a repository to decouple your logic from the db implementation, but this is a different concept from a domain repository; it is just a detail of your technical implementation, and has nothing to do with your domain.
If the remote service starts offering the query you need, you will change the implementation of how your query is executed, calling the remote API instead of the db, but your domain model should not change.
A domain repository is used to load and persist your aggregates; if you are working with external aggregates (in a different context or subdomain), you need to interact with them using services.
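A sketch of that idea, with invented names: the domain depends only on the query interface, and the implementation decides whether to read the local cache or call the remote API, so swapping one for the other later never touches the domain model:

    import java.util.List;

    interface CustomerQuery {
        List<String> findCustomerNamesByRegion(String region);
    }

    class CachedCustomerQuery implements CustomerQuery {
        @Override
        public List<String> findCustomerNamesByRegion(String region) {
            return List.of(); // stand-in: SELECT from the local MySQL cache
        }
    }

    class RemoteCustomerQuery implements CustomerQuery {
        @Override
        public List<String> findCustomerNamesByRegion(String region) {
            return List.of(); // stand-in: call the remote API once it supports this query
        }
    }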
After going through multiple Stack Overflow posts and blog articles, I came to the conclusion that we need the Unit of Work design pattern to maintain transactional integrity while writing domain objects to their respective repositories.
However, we do not need such integrity while reading/searching the repository. Given that, is it good design to separate the purposes of repositories and units of work, with the former used only for reading domain objects and the latter used only to create/write/refresh/delete domain objects?
Eric Evans, Domain Driven Design:
Implementation (of a repository) will vary greatly, depending on the technology being used for persistence and the infrastructure you have. The ideal is to hide all the inner workings from the client (although not from the developer of the client), so that the client code will be the same whether the data is stored in an object database, a relational database, or simply held in memory....
The possibilities of implementation are so diverse that I can only list some concerns to keep in mind....
Leave transaction control to the client. Although the REPOSITORY will insert and delete from the database, it will ordinarily not commit anything. It is tempting to commit after saving, for example, but the client presumably has the context to correctly initiate and commit units of work. Transaction management will be simpler if the REPOSITORY keeps its hands off.
That said, I call your attention in particular to an important phrase in the discussion above: the database. The underlying assumption here is that all of the aggregates being modified are stored in such a way that the unit of work can be committed atomically.
When that's not the case -- for example, if you are storing aggregates in a document store that doesn't promise atomic updates of multiple documents -- then you may want to consider making this separation explicit in your model, rather than trying to disguise the fact that you are coordinating multiple commits.
It is entirely reasonable to use one set of repositories for your read use cases, which are distinct from those used in your write use cases. In other words, when we have different semantics, then we should have a different interface, the implementations of which can be tuned as necessary.
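A sketch of that separation, with invented names; each interface carries its own semantics and can be tuned independently, and note that the write side leaves the commit to the client, per the Evans quote above:

    import java.util.List;
    import java.util.Optional;

    interface OrderWriteRepository {
        Optional<Order> load(String orderId); // load an aggregate for modification
        void save(Order order);               // register changes; the client commits the UoW
    }

    interface OrderReadRepository {
        List<OrderSummary> findByCustomer(String customerId); // denormalized, read-optimized
    }

    record Order(String id) {}
    record OrderSummary(String orderId, String status) {}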
In DDD, aggregate roots are persisted via repositories. But are repositories the only classes that can touch persistence in a bounded context?
I am using CQRS alongside DDD. On the query side, things like view counts and upvotes need to be persisted, but I feel it is awkward to model them as aggregate roots. I am limiting DDD aggregate-root modeling to the command side. The query side is not allowed to use repositories, but it often asks for a small amount of persistence capability.
Also, I am using domain events, and certain domain events also need to be persisted. I need something called event storage, but I have only heard such terms in event sourcing (ES), and I am not using ES.
If such persistence classes are indeed needed, what do I call them, and which layer should they belong to?
[Update]
When I read the answers below, I realized my question is a bit ambiguous. By touch, I mainly mean write (but also read).
Thanks.
In the query side, things like view count, upvotes, these things need to be persisted
Not necessarily. CQRS doesn't specify:
whether the read model should be materialized in its own database
how the read model is updated
The simplest CQRS implementation is one where the query side and command side use the same tables. The persistent source for Read Models could also be SQL (materialized) views based on those tables. If you do have a separate database for reads, it can be kept up to date by additional Command Handlers, sub-handlers, or Event Handlers that run after the command has been executed.
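For example, an upvote count could be kept fresh by a handler that runs after the command side has executed; this is only a sketch, and the event, class, and table names are all invented:

    // Projects a domain event into a denormalized read table.
    class PostUpvoted {
        final String postId;
        PostUpvoted(String postId) { this.postId = postId; }
    }

    class PostSummaryProjection {
        // Runs as an event handler (or command sub-handler) after the command
        // has been executed, keeping the read table up to date.
        void on(PostUpvoted event) {
            execute("UPDATE post_summary SET upvotes = upvotes + 1 WHERE post_id = ?",
                    event.postId);
        }

        private void execute(String sql, Object... args) {
            // stand-in for JDBC or your data access of choice
        }
    }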
You can see a minimalist - yet perfectly CQRS compliant - implementation here : https://github.com/gregoryyoung/m-r/tree/master/SimpleCQRS
But are repositories the only classes that can touch persistence in a bounded context?
No. In a CQRS context, Read Model Facades (a.k.a. read-side repos) can also read from persistence, and your read model update mechanism writes to it.
Also, I am using domain events, certain domain events also need to be persisted. I need something called event storage, but I only heard such terms appear in event sourcing (ES) and I am not using ES.
Event stores are the main storage technology of event-sourced systems. You could use one to store a few domain events on the side in a non-ES application, but it may be overkill and too complex for the task. It depends on whether you need all the guarantees an event store offers in terms of delivery, consistency, concurrency/versioning, etc. Otherwise, a regular RDBMS or NoSQL store can do the trick.
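If a plain store is enough, persisting domain events can be as simple as an append-only table; this is a sketch with invented class, table, and column names:

    import java.time.Instant;

    // Appends each domain event as a row; no full event store required.
    class DomainEventLog {
        void append(String eventType, String payloadJson) {
            execute("INSERT INTO domain_events (event_type, payload, occurred_at) VALUES (?, ?, ?)",
                    eventType, payloadJson, Instant.now());
        }

        private void execute(String sql, Object... args) {
            // stand-in for JDBC or your data access of choice
        }
    }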
First, you need to think about your object model independently of how you will store it in the database. You're designing an object model; forget about the database for a moment.
You're saying that you don't want view counts or upvotes to be aggregate roots. That means you want to put them in an aggregate with some other objects. One of those objects is the aggregate root.
Without knowing more about your model it's hard to say more, but the basic approach would be to persist the aggregate root with the corresponding repository. The repository is responsible for storing not only the aggregate root but the entire aggregate, following the relationships.
Think about the other side, when you are using the repository to retrieve an entity. You get an instance of your aggregate root, but if you follow the relationships, you also have all those other objects. It's perfectly logical that when you save an entity, all those other objects are saved too.
I don't know which technology you're using, but you should write your repository so that it does that.
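Roughly, the shape is the following; the Post/Vote model is invented here just to show the root carrying its children through the repository:

    import java.util.ArrayList;
    import java.util.List;

    class Post {                    // aggregate root
        final String id;
        final List<Vote> votes = new ArrayList<>();
        int viewCount;
        Post(String id) { this.id = id; }
    }

    class Vote {                    // lives inside the Post aggregate
        final String userId;
        Vote(String userId) { this.userId = userId; }
    }

    class PostRepository {
        // Saving the root persists the whole aggregate, following the relationships.
        void save(Post post) {
            persistRoot(post);                      // the post row, including viewCount
            post.votes.forEach(this::persistVote);  // and every vote reached through it
        }

        private void persistRoot(Post post) { /* INSERT/UPDATE the post row */ }
        private void persistVote(Vote vote) { /* INSERT/UPDATE the vote rows */ }
    }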
Also, why is the query side not allowed to use repositories? Repositories are not only used to save data; they are also used to retrieve it. How are you retrieving objects without repositories (even if you don't modify them)?
I have been looking through a few different threads all over the interwebs and either I can't see that the proposed solution will work for me, or their particular situations are not the same as the one I am in.
Currently we have about 8 or so different self-contained databases, each sitting behind its own self-contained website (ASP.NET WebForms). All of the databases are very small and serve a very particular purpose. That said, none of the schemas and designs really match up in a reasonable way. There are various GUIDs that identify a user, all of which will EVENTUALLY map to a single GUID for that user, but the mappings between them are different, and sometimes you have to hop between a couple of databases to get what you need from a third. It's just a bit messy.
I would like to refactor completely to put them all in one database to reduce confusion and duplicate data, but due to push-back the solution will have to be use what we have.
What I want to do, then, is create a layer where all these databases will be accessible in ONE application (maybe fire up a new site and put it there). A few questions regarding this:
Is there a simple solution where I can leverage Entity Framework on each database and have them all sit in the same application?
Would something like a WCF service where I map every CRUD operation that needs to be done for each database be an efficient solution?
Each DbContext in Entity Framework uses a single database, so your solution here must use one for each database. What I usually do is create an IContext around the DbContext that implements the CRUD operations, then build a Repository that performs the "business logic" (like RegisterUser) by manipulating multiple contexts. The IContext should represent a set of related "database objects" (such as IProductContext for the product database). For now, you would probably just have one context per database. As databases are consolidated, just change the connection string of the affected context(s) to the new database and you can get back up and running without much (if any) code change.
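The same shape, sketched in Java for illustration (the answer itself is about Entity Framework in C#); every name here is invented:

    // One context interface per database, plus a repository that coordinates
    // several contexts for a single business operation.
    interface ProductContext {      // wraps the connection to the product DB
        void insertProduct(String name);
    }

    interface UserContext {         // wraps the connection to the user DB
        void insertUser(String email);
    }

    class RegistrationRepository {
        private final UserContext users;
        private final ProductContext products;

        RegistrationRepository(UserContext users, ProductContext products) {
            this.users = users;
            this.products = products;
        }

        // "Business logic" spanning two databases; when the databases are later
        // consolidated, only the context implementations need to change.
        void registerUserWithTrialProduct(String email, String productName) {
            users.insertUser(email);
            products.insertProduct(productName);
        }
    }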
Here's a good intro I found on what I'm talking about, though I think it calls my "Contexts" "Repositories" and my "Repositories" "Units of Work".
http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/implementing-the-repository-and-unit-of-work-patterns-in-an-asp-net-mvc-application
That said, you could probably write one WCF service that abstracts away multiple EF DbContexts; then you can change things on the backend while continuing to use the same web service.