Why are repositories only used for aggregates in Domain-Driven Design? - oop

In DDD, repositories are used to perform serialization and de-serialization of aggregates, e.g. by reading and writing to a database. That way, aggregates can contain purer business logic, and won't be coupled to non-domain-specific persistence strategies.
However, I wonder why repositories are always described as being used for aggregates specifically. Isn't it equally motivated to use it for all entities?
(If this is only a matter of the fact that all plain entities can be seen as aggregate roots with zero children, please notify me of this, and the question can be buried.)

I wonder why repositories are always described as being used for aggregates specifically. Isn't it equally motivated to use it for all entities?
Because aggregates are the consistency boundaries exposed to the application layer.
Which is to say that, yes, the repositories are responsible for taking the snapshot of state from the data store, and building from it the graph of entities and values that make up the aggregate.
The API of the repository only exposes an aggregate root, because that defines the consistency boundary. Instead of allowing the application to reach into an arbitrary location in the graph and make changes, we force the application to communicate with the root object exclusively. With this constraint in place, we only need to look in one place to ensure that all changes satisfy the business invariant.
So there's no need to develop a repository for each type of entity in your model, because the application isn't allowed to interact directly with the model on that fine a grain.
Put another way, the entities within the aggregate are private data structures. We don't allow the client code to manipulate the entities directly for the same reason that we don't implement lists that allow the clients to reach past the api and manipulate the pointers directly.
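The point about the root guarding the invariant can be sketched in a few lines of Java. This is only an illustration under assumptions: `Order`, `OrderLine`, the `MAX_ITEMS` rule, and `InMemoryOrderRepository` are hypothetical names that do not come from the question or the answer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Nested entity: only ever changed through the aggregate root.
final class OrderLine {
    private final String sku;
    private int quantity;

    OrderLine(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }

    String sku() { return sku; }
    int quantity() { return quantity; }
    void increase(int by) { quantity += by; }
}

// Aggregate root: the single place where the invariant is checked.
final class Order {
    static final int MAX_ITEMS = 10; // hypothetical business rule

    private final String id;
    private final List<OrderLine> lines = new ArrayList<>();

    Order(String id) { this.id = id; }

    String id() { return id; }

    int totalItems() {
        int total = 0;
        for (OrderLine line : lines) total += line.quantity();
        return total;
    }

    // Every change goes through the root, so the rule cannot be bypassed.
    void addItem(String sku, int quantity) {
        if (quantity <= 0) throw new IllegalArgumentException("quantity must be positive");
        if (totalItems() + quantity > MAX_ITEMS) {
            throw new IllegalStateException("order may not exceed " + MAX_ITEMS + " items");
        }
        for (OrderLine line : lines) {
            if (line.sku().equals(sku)) { line.increase(quantity); return; }
        }
        lines.add(new OrderLine(sku, quantity));
    }
}

// The repository speaks only in terms of the root; there is no OrderLineRepository.
interface OrderRepository {
    Optional<Order> findById(String id);
    void save(Order order);
}

// In-memory stand-in for a real data store (assumption for the sketch).
final class InMemoryOrderRepository implements OrderRepository {
    private final Map<String, Order> store = new HashMap<>();

    public Optional<Order> findById(String id) { return Optional.ofNullable(store.get(id)); }
    public void save(Order order) { store.put(order.id(), order); }
}
```

Because client code can only obtain an `Order`, never an `OrderLine`, the item limit is enforced in exactly one method.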
In CQRS, you do see "repositories" that are used for things other than aggregates -- repositories can also be used to look up cached views of the state of the model. The trick is that the views don't support modification. In the approach that Evans describes, each entity has one single representation that fulfills all of its roles. In CQRS, an entity may have different representations in each role, but typically only a single role that supports modifying the entity.

In DDD there are two kinds of entities: Aggregate roots and nested entities. As @VoiceOfUnreason answered, you are not allowed to modify the nested entities from outside an Aggregate, so there is no need to have a repository for them (by "repository" I'm referring to an interface for loading and persisting an entity's state). If you were allowed to, it would break the Aggregate's encapsulation, one of the most important things in OOP. Encapsulation helps in rich domains with lots and lots of models, where DDD is a perfect fit.


What should repositories in DDD return

I researched repositories in DDD and found too many different opinions. Everyone says different things about repositories, and that left me confused.
I want to know:
What methods should repositories contain?
What should repositories definitely (or as close to that as possible) return?
For each aggregate root (AR) you should have a repository. As a minimum the repository would probably have a void Save(Aggregate aggregate) and an Aggregate Get(Guid id) method. The returned aggregate would always be fully constituted.
I sometimes add methods for specific use cases in order to update only certain bits of data. For instance, something like void Activate(Guid id) or some such. This is simply to avoid manipulating more data than is necessary.
Querying on a repository is usually problematic since you should typically avoid querying your domain. For such scenarios my recommendation is to use a query mechanism that is closer to the data and in a more raw format than a domain object or object graph. The query mechanism would more than likely return primitives, such as int Count(Query.Specification specification), or perhaps return a list of read model instances.
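A rough Java rendering of that split might look like the following. It is a sketch under assumptions: `Customer`, `CustomerRow`, and `InMemoryCustomerStore` are invented for illustration, `UUID` stands in for `Guid`, and a `Predicate` over raw rows stands in for the `Query.Specification` type mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;
import java.util.function.Predicate;

// Aggregate root (hypothetical).
final class Customer {
    private final UUID id;
    private final String name;
    private boolean active;

    Customer(UUID id, String name, boolean active) {
        this.id = id;
        this.name = name;
        this.active = active;
    }

    UUID id() { return id; }
    String name() { return name; }
    boolean isActive() { return active; }
    void activate() { active = true; }
}

// Write side: the minimal repository, always returning a fully constituted aggregate.
interface CustomerRepository {
    void save(Customer customer);
    Customer get(UUID id);
}

// Raw shape of a stored record; the read side works on this, not on Customer.
final class CustomerRow {
    final UUID id;
    final String name;
    final boolean active;

    CustomerRow(UUID id, String name, boolean active) {
        this.id = id;
        this.name = name;
        this.active = active;
    }
}

// Read side: stays close to the data and returns a primitive.
interface CustomerQuery {
    int count(Predicate<CustomerRow> specification);
}

// In-memory stand-in for a database table (assumption for the sketch).
final class InMemoryCustomerStore implements CustomerRepository, CustomerQuery {
    private final Map<UUID, CustomerRow> rows = new HashMap<>();

    public void save(Customer c) {
        rows.put(c.id(), new CustomerRow(c.id(), c.name(), c.isActive()));
    }

    public Customer get(UUID id) {
        CustomerRow row = rows.get(id);
        if (row == null) throw new NoSuchElementException("no customer " + id);
        return new Customer(row.id, row.name, row.active);
    }

    public int count(Predicate<CustomerRow> specification) {
        int matches = 0;
        for (CustomerRow row : rows.values()) {
            if (specification.test(row)) matches++;
        }
        return matches;
    }
}
```

A caller would load with `get`, call `activate()`, and `save` again, while a screen that only needs a number would call `count(row -> row.active)` without building any aggregate.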
You are right, a repository has different meanings in different contexts - and many authors have their own interpretation. The way I understand them is from multiple perspectives:
They abstract away the underlying storage type.
They can introduce an interface closer to the domain model.
They represent a collection of objects and thus serve as in-memory aggregate storage (a collection of related objects).
They represent a transaction boundary for related objects.
They can't contain duplicates, like sets.
It is valid for a repository to contain only one object, without complex relations internally.
So to answer your questions, repositories should contain collection-related methods like add, remove, addAll, findByCriteria, instead of save, update, delete. They can return a whole aggregate, parts of aggregates, or some internal aggregate relation; it depends on your domain model and the way you want to represent objects.
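As a sketch of that collection-oriented style in Java, assuming a hypothetical `Product` type identified by its SKU (none of these names come from the answer):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical domain object, identified by its SKU.
final class Product {
    private final String sku;
    private final int priceCents;

    Product(String sku, int priceCents) {
        this.sku = sku;
        this.priceCents = priceCents;
    }

    String sku() { return sku; }
    int priceCents() { return priceCents; }
}

// Collection-style vocabulary instead of save/update/delete.
interface ProductRepository {
    boolean add(Product product);             // false if already present, like Set.add
    void addAll(Collection<Product> products);
    boolean remove(Product product);
    List<Product> findByCriteria(Predicate<Product> criteria);
    int size();
}

// In-memory stand-in; a real implementation would translate these calls to the data store.
final class InMemoryProductRepository implements ProductRepository {
    private final Map<String, Product> bySku = new LinkedHashMap<>();

    public boolean add(Product product) {
        // Set semantics: a second product with the same identity is not stored.
        return bySku.putIfAbsent(product.sku(), product) == null;
    }

    public void addAll(Collection<Product> products) {
        for (Product product : products) add(product);
    }

    public boolean remove(Product product) {
        return bySku.remove(product.sku()) != null;
    }

    public List<Product> findByCriteria(Predicate<Product> criteria) {
        List<Product> matches = new ArrayList<>();
        for (Product product : bySku.values()) {
            if (criteria.test(product)) matches.add(product);
        }
        return matches;
    }

    public int size() { return bySku.size(); }
}
```

Keying the map by identity is what gives the "no duplicates, like sets" behaviour listed above.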
Eric Evans coined "domain driven design" in 2003; so the right starting point for any definitions in that context is his book. He defines the repository pattern in chapter 6 ("Lifecycle of a Domain Object").
A REPOSITORY represents all objects of a type as a conceptual set (usually emulated). It acts like a collection, except with more elaborate querying ability. Objects of the appropriate type are added and removed, and the machinery behind the repository inserts them or deletes them from the database.
For each type of object that requires global access, create an object that can provide the illusion of an in-memory collection of all objects of that type.
The primary use case of a repository: given a key, return the correct root entity. The repository implementation acts as a module, which hides your choice of persistence strategy (see: Parnas 1971).

Usage of data entities to exchange between client and microservices (asp.core)

Usually I create DTOs to get data from a microservice (WebApi) to the client (MVC).
But sometimes it's cumbersome to duplicate the structure of a data entity in a DTO, especially if the entity has multiple fields and many embedded relationships.
So I have to duplicate fields and relations.
Can I use the data entity instead of a DTO?
I use a special assembly for DTOs to be exchanged between the client (MVC) and a given microservice. Should my data entities live in this assembly?
This is a common complaint that derives from not understanding the concept of bounded contexts. Because you're deep in the code, you just see two things that look like the same thing, and you have had, like all developers, the idea beaten into your brain that you should not repeat yourself (DRY).
However, the key word above is that the two things look the same. They are in fact not the same, and that is the critical salient point. They are representations of domain objects from different contexts (data store and application layer, for example). If you use the same object, you are tightly coupling those contexts to the point where they're now inseparable. As such, the very concept of having multiple layers becomes moot.
A related concept here is anti-corruption layers. This is a layer you add to your application to facilitate communication between two different application contexts or domains. An API is a form of anti-corruption layer. Again, because you're building all the apps, it seems like they're all the same thing. However, imagine your MVC app as a third-party application being built by someone else to consume your API. Should they use your entities directly? No. They probably would have their own entity classes and their own data store. The DTO your API uses provides a way for these two different applications to communicate via a common language. If you use your entity classes directly, then any change to your data necessitates a change to your API, which in turn necessitates a change to any consumers of your API. Imagine if Google changed a database column, and because of that, every single developer using their API(s) had to immediately make changes to their own applications or they would break.
In short, just because two classes look the same, doesn't mean they are the same. Your entity and your DTO are each representations of a concept in different contexts, and therefore you need and should have both.
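To make the distinction concrete, here is a small sketch. It is written in Java for consistency with the other examples, although the question concerns ASP.NET Core, and `UserEntity`, `UserDto`, and `UserMapper` are hypothetical names. The entity mirrors the data store, the DTO is the contract shared with the client, and a mapper is the only place that knows both.

```java
// Persistence-side entity: mirrors the table and may change whenever the schema does.
final class UserEntity {
    final long id;
    final String email;
    final String passwordHash; // must never leave the service
    final String displayName;

    UserEntity(long id, String email, String passwordHash, String displayName) {
        this.id = id;
        this.email = email;
        this.passwordHash = passwordHash;
        this.displayName = displayName;
    }
}

// Contract shared with the client: only what consumers are meant to see.
final class UserDto {
    final long id;
    final String displayName;

    UserDto(long id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }
}

// The single translation point between the two contexts.
final class UserMapper {
    private UserMapper() {}

    static UserDto toDto(UserEntity entity) {
        return new UserDto(entity.id, entity.displayName);
    }
}
```

If a column is later renamed or a field added to `UserEntity`, only `UserMapper` needs to change; `UserDto` and every client compiled against it stay as they are.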

Relations between Repositories for Aggregates in DDD

I'm building a Repository for an Aggregate. It's constructed out of 3 different Entities, one of which is the root.
Data for all 3 is persisted in a SQL database. Each has its own table.
Let's consider the simple case of getting the full list of those Aggregates. I need to fetch data from all 3 tables. Should I build one optimised query to fetch this data set, or rather encapsulate the logic for each Entity in its own Repository and assemble them in the Aggregate's repo? (The Aggregate repo would then call the respective repos and assemble the results.)
I'm leaning towards the first solution, however it means stronger coupling. The latter seems nicer from an OOP point of view, but seems overcomplicated and could potentially cause problems with cache invalidation for subsequent sets of data, etc.
For each type of object requiring global access, create an object that provides the illusion of all objects of this type stored in memory. Configure access through the global interface. [..] Define methods for adding and removing objects. [..] Define repositories only for aggregates. ~ Evans, about repositories
You should create one repository, for the Aggregate only. There is no reason to create separate repositories. What is more, creating separate repositories would cause some additional problems, as you mentioned.
I'm leaning towards the first solution, however it means stronger coupling.
To answer that, please take a look at the definition of Aggregate from Martin Fowler:
Aggregate is a pattern in Domain-Driven Design. A DDD aggregate is a
cluster of domain objects that can be treated as a single unit. An
example may be an order and its line-items, these will be separate
objects, but it's useful to treat the order (together with its line
items) as a single aggregate.
By definition, an Aggregate couples the Entities that it is constructed out of.
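A sketch of what the single repository could do internally: run one joined query and fold the flat result rows into aggregates. To stay self-contained, the example below uses an in-memory list of rows in place of a real result set and shows only two of the three tables. `JoinedRow`, `PurchaseOrder`, `LineItem`, and `PurchaseOrderAssembler` are hypothetical names borrowed from Fowler's order and line-items example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One row of a hypothetical "orders JOIN order_lines" result set.
final class JoinedRow {
    final long orderId;
    final String customer;
    final String sku;
    final int quantity;

    JoinedRow(long orderId, String customer, String sku, int quantity) {
        this.orderId = orderId;
        this.customer = customer;
        this.sku = sku;
        this.quantity = quantity;
    }
}

// Nested entity of the aggregate.
final class LineItem {
    final String sku;
    final int quantity;

    LineItem(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }
}

// Aggregate root.
final class PurchaseOrder {
    private final long id;
    private final String customer;
    private final List<LineItem> items = new ArrayList<>();

    PurchaseOrder(long id, String customer) {
        this.id = id;
        this.customer = customer;
    }

    long id() { return id; }
    String customer() { return customer; }
    List<LineItem> items() { return Collections.unmodifiableList(items); }

    // Used only while rebuilding the aggregate from stored rows.
    void attach(LineItem item) { items.add(item); }
}

// The part of the repository that turns flat rows into whole aggregates.
final class PurchaseOrderAssembler {
    private PurchaseOrderAssembler() {}

    static List<PurchaseOrder> assemble(List<JoinedRow> rows) {
        // LinkedHashMap keeps the order in which roots first appear in the result set.
        Map<Long, PurchaseOrder> byId = new LinkedHashMap<>();
        for (JoinedRow row : rows) {
            PurchaseOrder order = byId.computeIfAbsent(
                    row.orderId, id -> new PurchaseOrder(id, row.customer));
            order.attach(new LineItem(row.sku, row.quantity));
        }
        return new ArrayList<>(byId.values());
    }
}
```

The coupling between the tables lives in one query and one assembler, which matches the coupling the aggregate already implies.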

DDD: Where to put persistence logic, and when to use ORM mapping

We are taking a long, hard look at our (Java) web application patterns. In the past, we've suffered from an overly anaemic object model and overly procedural separation between controllers, services and DAOs, with simple value objects (basically just bags of data) travelling between them. We've used declarative (XML) managed ORM (Hibernate) for persistence. All entity management has taken place in DAOs.
In trying to move to a richer domain model, we find ourselves struggling with how best to design the persistence layer. I've spent a lot of time reading and thinking about Domain Driven Design patterns. However, I'd like some advice.
First, the things I'm more confident about:
We'll have "thin" controllers at the front that deal only with HTTP and HTML - processing forms, validation, UI logic.
We'll have a layer of stateless business logic services that implements common algorithms or logic, unaware of the UI, but very much aware of (and delegating to) the domain model.
We'll have a richer domain model which contains state, relationships, and logic inherent to the objects in that domain model.
The question comes around persistence. Previously, our services would be injected (via Spring) with DAOs, and would use DAO methods like find() and save() to perform persistence. However, a richer domain model would seem to imply that objects should know how to save and delete themselves, and perhaps that higher level services should know how to locate (query for) domain objects.
Here, a few questions and uncertainties arise:
Do we want to inject DAOs into domain objects, so that they can do "this.someDao.save(this)" in a save() method? This is a little awkward since domain objects are not singletons, so we'll need factories or post-construction setting of DAOs. When loading entities from a database, this gets messy. I know Spring AOP can be used for this, but I couldn't get it to work (using Play! framework, another line of experimentation) and it seems quite messy and magical.
Do we instead keep DAOs (repositories?) completely separate, on par with stateless business logic services? This can make some sense, but it means that if "save" or "delete" are inherent operations of a domain object, the domain object can't express those.
Do we just dispense with DAOs entirely and use JPA to let entities manage themselves?
Herein lies the next subtlety: It's quite convenient to map entities using JPA. The Play! framework gives us a nice entity base class, too, with operations like save() and delete(). However, this means that our domain model entities are quite closely tied to the database structure, and we are passing objects around with a large amount of persistence logic, perhaps all the way up to the view layer. If nothing else, this will make the domain model less re-usable in other contexts.
If we want to avoid this, then we'd need some kind of mapping DAO - either using simple JDBC (or at least Spring's JdbcTemplate), or using a parallel hierarchy of database entities and "business" entities, with DAOs forever copying information from one hierarchy to another.
What is the appropriate design choice here?
Your questions and doubts ring an interesting alarm here. I think you went a bit too far in your interpretation of a "rich domain model". Richness doesn't go as far as implying that persistence logic must be handled by the domain objects. In other words, no, they shouldn't know how to save and delete themselves (at least not explicitly, though Hibernate actually adds some persistence logic transparently). This is often referred to as persistence ignorance.
I suggest that you keep the existing DAO injection system (a nice thing to have for unit testing) and leave the persistence layer as is, while trying to move some business logic to your entities where it fits. A good starting point is to identify Aggregates and establish your Aggregate Roots. They'll often contain more business logic than the other entities.
However, this is not to say domain objects should contain all logic (especially not logic needed by many other objects across the application, which often belongs in Services).
I am not a Java expert, but I use NHibernate in my .NET code so my experience should be directly translatable to the Java world.
When using an ORM (like Hibernate, which you mentioned) to build a Domain-Driven Design application, one good (I won't say best) practice is to create so-called application services between the UI and the Domain. They are similar to the stateless business objects you mentioned, but should contain almost no logic. They should look like this:
public void sayHello(int id, String helloString) {
    SomeDomainObject target = domainObjectRepository.findById(id); // This uses Hibernate to load the object.
    target.sayHello(helloString); // There is a single domain object method invocation per application service method.
    domainObjectRepository.save(target); // This one is optional. Hibernate should already know that this object needs saving because it tracks changes.
}
Any changes to objects contained by DomainObject (also adding objects to collections) will be handled by Hibernate.
You will also need some kind of AOP to intercept application service method invocations and create Hibernate's session before the method executes and save changes after method finishes with no exceptions.
There is a really good sample of how to do DDD in Java here. It is based on the sample problem from Eric Evans' 'Blue Book'. The application logic class sample code is here.

Repository, Service or Domain object - where does logic belong?

Take this simple, contrived example:
Inevitably, I will have more complex "queries", such as:
//returns users where active=true, deleted=false, and confirmed = true
GetActiveUsers();
I'm having trouble determining where the responsibility of the repository ends. GetActiveUsers() represents a simple "query". Does it belong in the repository?
How about something that involves a bit of logic, such as:
//activate the user, set the activationCode to "used", etc.
ActivateUser(string activationCode);
Repositories are responsible for the application-specific handling of sets of objects. This naturally covers queries as well as set modifications (insert/delete).
ActivateUser operates on a single object. That object needs to be retrieved, then modified. The repository is responsible for retrieving the object from the set; another class would be responsible for invoking the query and using the object.
These are all excellent questions to be asking. Being able to determine which of these you should use comes down to your experience and the problem you are working on.
I would suggest reading a book such as Fowler's Patterns of Enterprise Application Architecture. In this book he discusses the patterns you mention. Most importantly, though, he assigns each pattern a responsibility. For instance, domain logic can be put in either the Service or Domain layers. There are pros and cons associated with each.
If I decide to use a Service layer I assign the layer the role of handling Transactions and Authorization. I like to keep it 'thin' and have no domain logic in there. It becomes an API for my application. I keep all business logic with the domain objects. This includes algorithms and validation for the object. The repository retrieves and persists the domain objects. This may be a one to one mapping between database columns and domain properties for simple systems.
I think GetActiveUsers is OK for the Repository. You wouldn't want to retrieve all users from the database and figure out which ones are active in the application, as this would lead to poor performance. If ActivateUser has business logic as you suggest, then that logic belongs in the domain object. Persisting the change is the responsibility of the Repository layer.
Hope this helps.
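A minimal sketch of that division of labour, written in Java rather than the question's C#, and with assumed details: the `User` fields, the `findByActivationCode` lookup, and the in-memory implementation are all hypothetical. The query lives on the repository, and the activation rule lives on the domain object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Domain object: owns the activation rule.
final class User {
    private final String name;
    private final String activationCode;
    private final boolean confirmed;
    private boolean active;
    private boolean deleted;
    private boolean codeUsed;

    User(String name, String activationCode, boolean confirmed) {
        this.name = name;
        this.activationCode = activationCode;
        this.confirmed = confirmed;
    }

    String name() { return name; }
    boolean isActive() { return active; }
    boolean isDeleted() { return deleted; }
    boolean isConfirmed() { return confirmed; }
    boolean hasActivationCode(String code) { return activationCode.equals(code); }

    // Business logic: activate the user and mark the code as used.
    void activate(String code) {
        if (codeUsed) throw new IllegalStateException("activation code already used");
        if (!activationCode.equals(code)) throw new IllegalArgumentException("wrong activation code");
        active = true;
        codeUsed = true;
    }
}

// Repository: retrieval, persistence and simple set-level queries.
interface UserRepository {
    void save(User user);
    Optional<User> findByActivationCode(String code);
    List<User> getActiveUsers(); // active = true, deleted = false, confirmed = true
}

// In-memory stand-in; a database-backed version would push the filter into the query.
final class InMemoryUserRepository implements UserRepository {
    private final List<User> users = new ArrayList<>();

    public void save(User user) {
        if (!users.contains(user)) users.add(user);
    }

    public Optional<User> findByActivationCode(String code) {
        for (User user : users) {
            if (user.hasActivationCode(code)) return Optional.of(user);
        }
        return Optional.empty();
    }

    public List<User> getActiveUsers() {
        List<User> result = new ArrayList<>();
        for (User user : users) {
            if (user.isActive() && !user.isDeleted() && user.isConfirmed()) result.add(user);
        }
        return result;
    }
}
```

A thin service method would then do little more than look the user up, call `activate(code)`, and save.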
When building DDD projects I like to differentiate two responsibilities: a Repository and a Finder.
A Repository is responsible for storing aggregate roots and for retrieving them, but only for use in command processing. By command processing I mean executing any action a user invoked.
A Finder is responsible for querying domain objects for purposes of UI, like grid views and details views.
I don't consider finders to be a part of the domain model. The particular IXxxFinder interfaces are placed in the presentation layer, not in the domain layer. Implementations of both IXxxRepository and IXxxFinder are placed in the data access layer, possibly even in the same class.
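The Repository/Finder split might be sketched like this in Java. The `Invoice` example, the `InvoiceListItem` view row, and the class names are assumptions made for illustration, and the layering is only indicated in comments since everything sits in one file here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Domain layer: aggregate root used during command processing.
final class Invoice {
    private final String id;
    private final String customer;
    private final int amountCents;
    private boolean paid;

    Invoice(String id, String customer, int amountCents) {
        this.id = id;
        this.customer = customer;
        this.amountCents = amountCents;
    }

    String id() { return id; }
    String customer() { return customer; }
    int amountCents() { return amountCents; }
    boolean isPaid() { return paid; }
    void markPaid() { paid = true; }
}

// Domain layer: repository, used only when executing commands.
interface InvoiceRepository {
    Invoice load(String id); // returns null when absent, kept simple for the sketch
    void store(Invoice invoice);
}

// Presentation layer: flat row for a grid view, not a domain object.
final class InvoiceListItem {
    final String id;
    final String label;

    InvoiceListItem(String id, String label) {
        this.id = id;
        this.label = label;
    }
}

// Presentation layer: finder, used only for display purposes.
interface InvoiceFinder {
    List<InvoiceListItem> listUnpaid();
}

// Data access layer: one class may implement both interfaces.
final class InvoiceDataAccess implements InvoiceRepository, InvoiceFinder {
    private final Map<String, Invoice> invoices = new LinkedHashMap<>();

    public Invoice load(String id) { return invoices.get(id); }

    public void store(Invoice invoice) { invoices.put(invoice.id(), invoice); }

    public List<InvoiceListItem> listUnpaid() {
        List<InvoiceListItem> items = new ArrayList<>();
        for (Invoice invoice : invoices.values()) {
            if (!invoice.isPaid()) {
                items.add(new InvoiceListItem(
                        invoice.id(), invoice.customer() + ": " + invoice.amountCents()));
            }
        }
        return items;
    }
}
```

Command handlers would depend on `InvoiceRepository` alone and views on `InvoiceFinder` alone, so neither sees the other's methods even though one class serves both.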