How to avoid creating too large DDD aggregates?

How to avoid creating too large DDD aggregates? - repository

I have the root of the aggregate Film, which contains all the information about a movie and a list of objects of the Comments type (text, link to the user, and so on).
Since Film is the root, then, if necessary, I must receive absolutely all information about the film, including comments, although very often I do not need it at all. For example, if I want to get a list of all the films, then I absolutely do not need comments on each, especially all at once.
There was an idea to put comments in a separate aggregate, but comments cannot exist without the film they were written for, therefore they are part of the Film aggregate.
What to do in such cases? Is it possible to receive them in portions or separately from the Film unit according to DDD?

Remember that an aggregate is all about state consistency.
Thinking about the domain model in terms of operations (which for this purpose could encompass both commands and queries, since we're not necessarily doing CQRS...) is useful. If there exist two operations A and B such that we want operation B to always see state changes made in operation A, then that's a very strong signal that A and B should be on the same aggregate. Conversely, if there's an operation C for which no operation on a given aggregate requires that C see that operation's state changes or that operation see C's state changes, that's a sign that operation C shouldn't be an operation on that aggregate.
After assigning operations, the aggregates should only have the state necessary to support the operations: any other state in the aggregate is superfluous and should be removed.
In the context of your question, then, do you want a "get comments for film" operation to see the changes introduced by "update year of film" immediately?
Note that you can (within the limits of your infrastructure and application layers: it's possible that something (e.g. choice of framework) in those layers makes this impractical if not impossible) take two high-level-operations which have a "requires consistency" relationship and implement the operations in terms of a process which ensures their consistency in terms of operations against different aggregates. The saga pattern may prove useful for this. This adds complexity (because you'll typically end up introducing some concurrency-control operations into your domain model, e.g. locks or rollbacks), but if the coordination overhead of a too-large aggregate is causing the system to fail to meet non-functional requirements, well, sometimes "ya gotta do what ya gotta do".
For example, if the requirement is that a "post comment about film" operation must see the result of a "create film" or "delete film" operation but no other operations on film, you can have CommentAboutFilm be a separate aggregate from Film, as is a FilmLifecycle aggregate; all three operations go through the lifecycle aggregate which records the intention to create/delete a film (updating its state) before actually creating/deleting the film aggregate and then recording that the operation was performed; similarly, the post comment operation goes through the lifecycle aggregate: if the film creation operation hasn't yet succeeded or there's an intent to delete, the post comment operation is rejected.
Note that if the concern about aggregate size isn't around concurrency/consistency but loading, there are some technical/implementation tricks that are outside the scope of the domain model such as lazy loading. These tricks may require custom support in the infrastructure layer (e.g. if you rely on some sort of optimistic concurrency control from your infrastructure, you may need to implement logic there to allow a change strictly to the comments to a film to be concurrent with adding a cast member).

Related

DDD: How many aggregates should have a single bounded context?

How many aggregates should have a single bounded context?
I'm asking this question due the reason, that the information from books and other resources are too broad/abstract.
I suppose, that it depends on certain domain model and its structure. How many bounded contexts do have a domain model? How many entities there are in each bounded context. I suppose, that all these questions make dependency on that fact, how many aggregates should be in a single bounded context.
Also, if to recall the SOLID principles and the common idea to have the small loosely coupled pieces of code. I suppose, that it's fine to have maximum 3-4 aggregates per single bounded context. If there are more aggregates in single bounded context, then there are probably some issues with the software design.
I'm reading the Vernon's book right now about DDD, but it's rather difficult to understand how to design certainly such things.

The trite answer is “just enough, but not too many”. There is no real guidance on how many aggregates to put in a bounded context.
The thing that drives aggregates and entities is the Ubiquitous Language that is used to describe the context. The Ubiquitous Language is different for each context, and the entities and aggregate roots needed in the context can be found in the nouns used in the language. Once you have the domain fully described by the language, count up the nouns that have a special meaning in that language and you have a count of the entities necessary.
Bear in mind, though, that I've rarely come across a bounded context that was "fully described". The goal is "described fully enough for this release". Therefore for any release the number of entities won't be "enough" and you'll probably have plans of adding more. Whether those plans ever rise to the top of the priority queue is another question.

How many aggregates should have a single bounded context?
All aggregates should have a single bounded context. You can almost work that out backwards - an aggregate is going to be stored in a single database, a database is going to belong to a single (micro) service, a service is going to serve a single bounded context; therefore it follows that an aggregate is going to belong to a single bounded context.
Where things can get messy: it's easy to take some broad business concept, like "order", and try to create a single representation for order that works for every bounded context. That's not the goal though -- the goal is for each context to have a representation of order that works in that context.
Common example: sales, billing, fulfillment may all care about "order", but the information that they need to share is largely just the order id, which acts as a correlation identifier so that they can coordinate conversations.
See Mauro Servienti: All of Our Aggregates Are Wrong

DDD: Aggregates and Deletes

It's taken awhile but I feel I've started to build a good understanding of aggregates in DDD. Keep them small (An Entity with Value Objects whenever possible) and when containing multiple entities, ensure their reason to exist is to enforce some (transactional) invariant.
Where I come a bit undone is when it comes to the Remove or Delete side of things. Imagine a:
Thread, with Posts
For a long time I would mistake the 'has-a' relationship for an aggregate, but...
The requirement that a Post must have a Thread can be enforced via a factory method on the Thread to add a Post.
Then in lieu of any business rules that require it, they can be separate aggregates. For instance, if you were loading a list of threads, it doesn't make much sense to have to also load all the posts for each thread as well.
What about deleting a Thread though? It makes sense that removing a Thread means the Posts for that thread should go as well. But enforcing that a Post must be removed when its Thread is removed leads to them becoming a single Aggregate with Thread as the aggregate root.
This is just a representative example, but in many cases any 'has-a' relationship often implies something like the above. ie. the child should no longer exists if the parent is removed.
So, any advice on a situation when the only reason to seem to need an aggregate relationship between two entities is for delete/remove purposes?
My thinking at the moment?
You don't really delete a Thread. You make it inactive.
When a thread is made inactive, you obviously can't add any new posts (enforced through the factory method). Any posts that belong to the now inactive thread are also made inactive through eventual consistency?
Any other pearls of wisdom learned to ensure not mixing up a 'has-a' relationship with an aggregate root / child entity aggregate?

You don't really delete a Thread. You make it inactive.
See also Don't Delete, Just Don't.
Any other pearls of wisdom learned to ensure not mixing up a 'has-a' relationship with an aggregate root / child entity aggregate?
I'd say that the most important lesson is this: if two pieces of information have to be kept immediately consistent with each other, then they have to be stored together <-- same database. In other words, the need for immediate consistency puts constraints not only on your domain model, but also on your data model.
In business systems, "have to be consistent" is less frequent than you might expect, because the key motivation for "have to be" is "what is the cost to the business if they are not?"
The classic example used here is orders vs inventory; we don't need to have reservable stock on the floor in order to accept a new order -- "backorder" is a real thing in the domain, and is often a better way of doing business than keeping everything immediately consistent.

Domain Driven Design - Creating general purpose entities vs. Context specific Entities

Situation
Suppose you have Orders and Clients as entities in your application. In one aggregate, the Order entity is considered to be the root but you also want to make use of the Client entity for simple things. In another the Client is the root entity and the Order entity is touched ever so lightly.
An example:
Let's say that in the Order aggregate I use the Client only to read details like name, address, build order history and not to make the client do client specific business logic. (like persistence, passwords resets and back flips..).
On the other hand, in the Client aggregate I use the Order entity to report on the client's buying habbits, order totals, order counting, without requiring advanced order functionality like order processing, updating, status changes, etc.
Possible solution
I believe the better solution is to create the entities for each aggregate specific to the aggregate context, because making them full featured (general purpose) and ready for any situation and usage seems like overkill and could potentially become a maintenance nightmare. (and potentially memory intensive)
Question
What is the DDD recommended way of handling this situation?
What is your take on the matter?

The basic driver for these decisions should be the ubiquitous language, and consequently the real world domain you're modeling. If both works in a specific domain, I'd favor separation over god-classes for maintainability reasons.
Apart from separating behavior into different aggregates, you should also take care that you don't mix different bounded contexts. Depending on the requirements of your domain, it could make sense to separate the Purchase Context from the Reporting Context (to extend on your example).
To decide on a context design, context maps are a helpful tool.

You are one the right track. In DDD, entities are not merely containers encapsulating all attributes related to a "subject" (for example: a customer, or an order). This is a very important concept that eludes a lot of people. An entity in DDD represents an operation boundary, thus only the data necessary to perform the operation is considered to be a part of the entity. Exactly which data to include in an entity can be difficult to consider because some data is relevant in a different use-cases. Here are some tips when analyzing data:
Analyze invariants, things that must be considered when applying validation rules and that can not be out of sync should be in the same aggregate.
Drop the database-thinking, normalization is not a concern of DDD
Just because things look the same, it doesn't mean that they are. For example: the current shipping address registered on a customer is different from the shipping address which a specific order was shipped to.
Don't look at reads. Reading, like creating a report or populating av viewmodel/dto/whatever has nothing to do with operation boundaries and can typically be a 360 deg view of the data. In fact don't event use your domain model when returning reads, use a different architectural stack.

Should the rule "one transaction per aggregate" be taken into consideration when modeling the domain?

Taking into consideration the domain events pattern and this post , why do people recomend keeping one aggregate per transaction model ? There are good cases when one aggregate could change the state of another one . Even by removing an aggregate (or altering it's identity) will lead to altering the state of other aggregates that reference it. Some people say that keeping one transaction per aggregates help scalability (keeping one aggregate per server) . But doesn't this type of thinking break the fundamental characteristic about DDD : technology agnostic ?
So based on the statements above and on your experience, is it bad to design aggregates, domain events, that lead to changes in other aggregates and this will lead to having 2 or more aggregates per transaction (ex. : when a new order is placed with 100 items change the customer's state from normal to V.I.P. )?

There are several things at play here and even more trade-offs to be made.
First and foremost, you are right, you should think about the model first. Afterall, the interplay of language, model and domain is what we're doing this all for: coming up with carefully designed abstractions as a solution to a problem.
The tactical patterns - from the DDD book - are a means to an end. In that respect we shouldn't overemphasize them, eventhough they have served us well (and caused major headaches for others). They help us find "units of consistency" in the model, things that change together, a transactional boundary. And therein lies the problem, I'm afraid. When something happens and when the side effects of it happening should be visible are two different things. Yet all too often they are treated as one, and thus cause this uncomfortable feeling, to which we respond by trying to squeeze everything within the boundary, without questioning. Still, we're left with that uncomfortable feeling. There are a lot of things that logically can be treated as a "whole change", whereas physically there are multiple small changes. It takes skill and experience, or even blunt trying to know when that is the case. Not everything can be solved this way mind you.
To scale or not to scale, that is often the question. If you don't need to scale, keep things on one box, be content with a certain backup/restore strategy, you can bend the rules and affect multiple aggregates in one go. But you have to be aware you're doing just that and not take it as a given, because inevitably change is going to come and it might mess with this particular way of handling things. So, fair warning. More subtle is the question as to why you're changing multiple aggregates in one go. People often respond to that with the "your aggregate boundaries are wrong" answer. In reality it means you have more domain and model exploration to do, to uncover the true motivation for those synchronous, multi-aggregate changes. Often a UI or service is the one that has this "unreasonable" expectation. But there might be other reasons and all it might take is a different set of abstractions to solve the same problem. This is a pretty essential aspect of DDD.
The example you gave seems like something I could handle as two separate transactions: an order was placed, and as a reaction to that, because the order was placed with a 100 items, the customer was made a VIP. As MikeSW hinted at in his answer (I started writing mine after he posted his), the question is when, who, how, and why should this customer status change be observed. Basically it's the "next" behavior that dictates the consistency requirements of the previous behavior(s).

An aggregate groups related business objects while an aggregate root (AR) is the 'representative' of that aggregate. Th AR itself is an entity modeling a (bigger, more complex) domain concept. In DDD a model is always relative to a context (the bounded context - BC) i.e that model is valid only in that BC.
This allows you to define a model representative of the specific business context and you don't need to shove everything in one model only. An Order is an AR in one context, while in another is just an id.
Since an AR pretty much encapsulates all the lower concepts and business rules, it acts as a whole i.e as a transaction/unit of work. A repository always works with AR because 1) a repo always deals with business objects and 2) the AR represents the business object for a given context.
When you have a use case involving 2 or more AR the business workflow and the correct modelling of that use case is paramount. In a lot of cases those AR can be modified independently (one doesn't care about other) or an AR changes as a result of other AR behaviour.
In your example, it's pretty trivial: when the customer places an order for 100 items, a domain event is generated and published. Then you have a handler which will check if the order complies with the customer promotions rules and if it does, a command is issued which will have the result of changing the client state to VIP.
Domain events are very powerful and allows you to implement transactions but in an eventual consistent environment. The old db transaction is an implementation detail and it's usually used when persisting one AR (remember AR are treated as a logical unit but persisting one may involve multiple tables hence db transaction).
Eventual consistency is a 'feature' of domain events which fits naturally a rich domain (and the real world actually). For some cases you might need instant consistency however those are particular cases and they are related to UI rather than how Domain works. Of course, it really depends from one domain to another. In your example, the customer won't mind it became a VIP 2 seconds or 2 minutes after the order was placed instead of the same milisecond.

Should an aggregate root's behaviour be dependent on other aggregate root's attributes?

I'm reading a book about DDD and i see an example domain that involves cars, engines, wheels and tires .
Above is the model as it is in the book . Customer is also aggregate root .
Having that model , there might be a case where the engine could have height, width and length attributes.
What happens when you need to attach a big engine on a small car ? The engine could not fit .
Is it a problem if the car checks for the engine attributes and allows it or not to be a part of the car ?
The engine has global identity (like you know each engine has a serial/manufacturer number). Maybe the engines need to be tracked by the manufacturer .
So I'm asking again , is it a problem if the car is using the engine's attributes to fit it inside (to allow it or not to be part of it) ?

Is it a problem if the car checks for the engine attributes and allows it or not to be a part of the car?
No.
That being said, your validation may be complex enough to introduce a domain service. Since two aggregates are involved you could have this:
car.Fit(engine)
Or this:
engine.Fit(car)
However, you probably want to be checking against a car model anyway :)
Since the rules are going to be somewhat more advanced and involve some data you probably want to introduce a domain service and possibly use double-dispatch on the objects:
So rather than car.Fit(engine) you could have this:
car.Fit(engine, IModelServiceImplementation)
And in the Fit method call:
if (!IModelServiceImplementation.CanFit(car, engine)) { throw new Exception(); }
The service could possibly load the correct model and rather check that against the engine. Depending on the domain one may even have modification levels and other rules to deal with.
Since a Car instance would not contain the actual Engine instance but only the EngineId or possibly some value object there would be no real assignment of engine to car. You could still pass the engine instance to the car and have it create the association however it sees fit.
The solution proposed by 'Enrico S.' is possibly more relevant to scenarios where changes are effected on the aggregate roots where one may not have all the aggregate roots available or even where aggregate roots live in separate bounded contexts. Even if Car and Engine were in separate BCs one would probably be able to query the validity somehow. Some things are fine for eventual consistency but others may not be.
As usual there are many things to consider :)

From DDD book, p128:
Any rule that spans AGGREGATES will not be expected to be up-to-date at all times. Through event processing, batch processing, or other update mechanisms, other dependencies can be resolved within some specific time.
So, it really depend on what Car aggregate is deigned for: if it requires strong consistency with the Engine, then the Engine should be part of the Car aggregate.
Other way, if it require only "eventual consistency", you might put that validation logic inside a Domain Event.
See this Udi Dahan's post

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas