DDD: Aggregates and Deletes - entity

It's taken a while, but I feel I've started to build a good understanding of aggregates in DDD. Keep them small (an Entity with Value Objects whenever possible) and, when they contain multiple entities, ensure their reason to exist is to enforce some (transactional) invariant.
Where I come a bit undone is when it comes to the Remove or Delete side of things. Imagine a:
Thread, with Posts
For a long time I would mistake the 'has-a' relationship for an aggregate, but...
The requirement that a Post must have a Thread can be enforced via a factory method on the Thread to add a Post.
Then in lieu of any business rules that require it, they can be separate aggregates. For instance, if you were loading a list of threads, it doesn't make much sense to have to also load all the posts for each thread as well.
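A minimal Python sketch of this idea, assuming hypothetical `Thread` and `Post` classes that are not part of the original question: the factory method on `Thread` is the only intended way to obtain a `Post`, yet the `Post` holds just the thread's id, so each can be loaded and saved as its own aggregate.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class Post:
    # Separate aggregate: refers to its thread by identity only.
    id: UUID
    thread_id: UUID
    body: str


@dataclass
class Thread:
    id: UUID = field(default_factory=uuid4)
    title: str = ""

    def add_post(self, body: str) -> Post:
        # Factory method: a Post is created from an existing Thread,
        # but the Thread keeps no collection of posts.
        return Post(id=uuid4(), thread_id=self.id, body=body)


thread = Thread(title="Aggregates and deletes")
post = thread.add_post("First!")
```

Loading a list of threads then never touches posts, because `Thread` has no posts collection to populate.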
What about deleting a Thread though? It makes sense that removing a Thread means the Posts for that thread should go as well. But enforcing that a Post must be removed when its Thread is removed leads to them becoming a single Aggregate with Thread as the aggregate root.
This is just a representative example, but in many cases a 'has-a' relationship implies something like the above, i.e. the child should no longer exist if the parent is removed.
So, any advice for a situation where the only apparent reason to need an aggregate relationship between two entities is delete/remove purposes?
My thinking at the moment?
You don't really delete a Thread. You make it inactive.
When a thread is made inactive, you obviously can't add any new posts (enforced through the factory method). Any posts that belong to the now inactive thread are also made inactive through eventual consistency?
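One way to picture this, as a rough Python sketch under assumed names (`ThreadInactive`, `pending_events`, and `handle_thread_deactivated` are all hypothetical): deactivating the thread takes effect immediately for new posts, while existing posts are deactivated later by a handler reacting to an event.

```python
from dataclasses import dataclass, field
from typing import Iterable


class ThreadInactive(Exception):
    """Raised when a post is added to an inactive thread."""


@dataclass
class Post:
    id: int
    thread_id: int
    active: bool = True


@dataclass
class Thread:
    id: int
    active: bool = True
    pending_events: list = field(default_factory=list)

    def deactivate(self) -> None:
        # Immediately consistent: only the Thread's own state changes here.
        self.active = False
        self.pending_events.append(("ThreadDeactivated", self.id))

    def add_post(self, post_id: int) -> Post:
        if not self.active:
            raise ThreadInactive(f"thread {self.id} is inactive")
        return Post(id=post_id, thread_id=self.id)


def handle_thread_deactivated(thread_id: int, posts: Iterable[Post]) -> None:
    # Eventually consistent: runs later, outside the Thread's transaction.
    for post in posts:
        if post.thread_id == thread_id:
            post.active = False
```

Between `deactivate()` and the handler running, posts are still marked active; whether that window is acceptable is a business question.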
Any other pearls of wisdom learned to ensure not mixing up a 'has-a' relationship with an aggregate root / child entity aggregate?

You don't really delete a Thread. You make it inactive.
See also Don't Delete, Just Don't.
Any other pearls of wisdom learned to ensure not mixing up a 'has-a' relationship with an aggregate root / child entity aggregate?
I'd say that the most important lesson is this: if two pieces of information have to be kept immediately consistent with each other, then they have to be stored together, i.e. in the same database. In other words, the need for immediate consistency puts constraints not only on your domain model, but also on your data model.
In business systems, "have to be consistent" is less frequent than you might expect, because the key motivation for "have to be" is "what is the cost to the business if they are not?"
The classic example used here is orders vs inventory; we don't need to have reservable stock on the floor in order to accept a new order -- "backorder" is a real thing in the domain, and is often a better way of doing business than keeping everything immediately consistent.
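As a small illustration of the backorder idea, here is a hedged Python sketch; the `Order` class, the `accept_order` function, and the status strings are assumptions made up for this example. The order is accepted using a possibly stale stock figure, and a shortfall becomes a domain state rather than a rejected transaction.

```python
from dataclasses import dataclass


@dataclass
class Order:
    sku: str
    quantity: int
    status: str = "accepted"


def accept_order(sku: str, quantity: int, last_known_stock: int) -> Order:
    # last_known_stock may be stale: no lock is taken on inventory,
    # so orders and inventory need not be immediately consistent.
    order = Order(sku, quantity)
    if quantity > last_known_stock:
        order.status = "backordered"
    return order
```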

Related

How to avoid creating too large DDD aggregates?

I have the root of the aggregate Film, which contains all the information about a movie and a list of objects of the Comments type (text, link to the user, and so on).
Since Film is the root, then, if necessary, I must receive absolutely all information about the film, including comments, although very often I do not need it at all. For example, if I want to get a list of all the films, then I absolutely do not need comments on each, especially all at once.
There was an idea to put comments in a separate aggregate, but comments cannot exist without the film they were written for, therefore they are part of the Film aggregate.
What to do in such cases? Is it possible to receive them in portions or separately from the Film unit according to DDD?
Remember that an aggregate is all about state consistency.
Thinking about the domain model in terms of operations (which for this purpose could encompass both commands and queries, since we're not necessarily doing CQRS...) is useful. If there exist two operations A and B such that we want operation B to always see state changes made in operation A, then that's a very strong signal that A and B should be on the same aggregate. Conversely, if there's an operation C for which no operation on a given aggregate requires that C see that operation's state changes or that operation see C's state changes, that's a sign that operation C shouldn't be an operation on that aggregate.
After assigning operations, the aggregates should only have the state necessary to support the operations: any other state in the aggregate is superfluous and should be removed.
In the context of your question, then, do you want a "get comments for film" operation to see the changes introduced by "update year of film" immediately?
Note that you can (within the limits of your infrastructure and application layers: it's possible that something (e.g. choice of framework) in those layers makes this impractical if not impossible) take two high-level-operations which have a "requires consistency" relationship and implement the operations in terms of a process which ensures their consistency in terms of operations against different aggregates. The saga pattern may prove useful for this. This adds complexity (because you'll typically end up introducing some concurrency-control operations into your domain model, e.g. locks or rollbacks), but if the coordination overhead of a too-large aggregate is causing the system to fail to meet non-functional requirements, well, sometimes "ya gotta do what ya gotta do".
For example, if the requirement is that a "post comment about film" operation must see the result of a "create film" or "delete film" operation but no other operations on film, you can have CommentAboutFilm be a separate aggregate from Film, as is a FilmLifecycle aggregate; all three operations go through the lifecycle aggregate which records the intention to create/delete a film (updating its state) before actually creating/deleting the film aggregate and then recording that the operation was performed; similarly, the post comment operation goes through the lifecycle aggregate: if the film creation operation hasn't yet succeeded or there's an intent to delete, the post comment operation is rejected.
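The lifecycle idea can be sketched roughly as follows in Python. The state names, method names, and the `post_comment` function are assumptions for illustration; only `FilmLifecycle` and `CommentAboutFilm` come from the text above, and the persistence of each aggregate is left out.

```python
from dataclasses import dataclass
from enum import Enum


class LifecycleState(Enum):
    CREATING = "creating"   # intent to create recorded, film not yet created
    ACTIVE = "active"       # film creation confirmed
    DELETING = "deleting"   # intent to delete recorded
    DELETED = "deleted"     # film deletion confirmed


@dataclass
class FilmLifecycle:
    film_id: str
    state: LifecycleState = LifecycleState.CREATING

    def creation_succeeded(self) -> None:
        if self.state is LifecycleState.CREATING:
            self.state = LifecycleState.ACTIVE

    def request_delete(self) -> None:
        if self.state is LifecycleState.ACTIVE:
            self.state = LifecycleState.DELETING

    def deletion_succeeded(self) -> None:
        if self.state is LifecycleState.DELETING:
            self.state = LifecycleState.DELETED

    def can_accept_comment(self) -> bool:
        return self.state is LifecycleState.ACTIVE


@dataclass
class CommentAboutFilm:
    film_id: str
    text: str


def post_comment(lifecycle: FilmLifecycle, text: str) -> CommentAboutFilm:
    # The comment operation consults only the lifecycle aggregate,
    # never the (potentially large) Film aggregate itself.
    if not lifecycle.can_accept_comment():
        raise ValueError(f"film {lifecycle.film_id} cannot accept comments")
    return CommentAboutFilm(lifecycle.film_id, text)
```

Updating the film's year, cast, and so on never touches `FilmLifecycle`, so those operations do not contend with posting comments.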
Note that if the concern about aggregate size isn't around concurrency/consistency but loading, there are some technical/implementation tricks that are outside the scope of the domain model such as lazy loading. These tricks may require custom support in the infrastructure layer (e.g. if you rely on some sort of optimistic concurrency control from your infrastructure, you may need to implement logic there to allow a change strictly to the comments to a film to be concurrent with adding a cast member).

Discriminator field and data modeling

I have the following case.
I have a reservation. It can be newly created, confirmed, canceled, or rejected.
There might be different reasons for cancellation. Let's say the reservation has expired, or it was not processed within a certain time limit, or there was some other reason.
In order for a reservation to be confirmed, multiple sub-transactions must be performed. This means that there is a flow within the confirmation itself. The solution my team came up with is some sort of work table holding many different statuses, which is fine. I felt the need to uniquely identify the state of a reservation by declaring a field ReservationStatus that summarizes certain combinations of the statuses already defined in the table. In this case the reservation status would be NEW, CONFIRMED, CANCELED, or REJECTED. Each state would correspond to a certain combination of statuses in the work table.
My team was convinced that this adds complexity. I think the opposite: it simplifies the flow. It also declares a natural discriminator and enables polymorphism. We are supposed to use queues and asynchronous processes.
How can I actually justify that we should have such a column? It appears the arguments I already mentioned were not enough, and deep down inside I know I am right :)
Wanted this to be a comment but it came out too long so here it goes.
@AlexandarPetrov I would add the following questions:
Do all the Statuses concretely represent every State a Reservation could have?
Are there clear rules for all status transition paths? E.g. Expired -> CONFIRMED and so forth.
Do you need to model the state changes? And is it a finite state machine?
I'd personally expose the status field, but only if it is concrete enough by itself to define state. For example, I've seen cases where there are 2 layers of statuses: status and sub-status. In a case like that, boundaries are lost, state becomes a complex VO rather than a simple field, and state transition rules can become blurry.
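To make the "finite state machine" question concrete, here is a minimal Python sketch. The four statuses come from the question; the transition table is purely an assumption for illustration, since the real rules would have to come from the business.

```python
from enum import Enum


class ReservationStatus(Enum):
    NEW = "NEW"
    CONFIRMED = "CONFIRMED"
    CANCELED = "CANCELED"
    REJECTED = "REJECTED"


# Assumed transition rules (hypothetical, not stated in the question).
ALLOWED_TRANSITIONS = {
    ReservationStatus.NEW: {
        ReservationStatus.CONFIRMED,
        ReservationStatus.CANCELED,
        ReservationStatus.REJECTED,
    },
    ReservationStatus.CONFIRMED: {ReservationStatus.CANCELED},
    ReservationStatus.CANCELED: set(),
    ReservationStatus.REJECTED: set(),
}


def transition(current: ReservationStatus, target: ReservationStatus) -> ReservationStatus:
    # Reject any move that the table does not explicitly allow.
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

If a table like this cannot be written down without sub-statuses creeping in, that is a hint the single status field is not concrete enough.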
Additionally:
To me it seems like Event Sourcing and CQRS could be a good fit for all those Reservations, especially given the complex flows you mention. Then transitions would be events being applied, and the statuses a simple way to expose state. Tracking status changes separately would also be unnecessary, as the Event Stream holds all historical data.
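A very small Python sketch of that idea, with made-up event names (`ReservationCreated`, `PaymentAuthorized`, and so on are assumptions): the status is not stored as an independent column but derived by folding the event stream, and events from the confirmation sub-flow leave it unchanged.

```python
from functools import reduce
from typing import Iterable, Optional

# Hypothetical mapping from lifecycle events to the exposed status.
STATUS_BY_EVENT = {
    "ReservationCreated": "NEW",
    "ReservationConfirmed": "CONFIRMED",
    "ReservationCanceled": "CANCELED",
    "ReservationRejected": "REJECTED",
}


def apply_event(status: Optional[str], event: str) -> Optional[str]:
    # Events that belong to the confirmation sub-flow do not change the status.
    return STATUS_BY_EVENT.get(event, status)


def current_status(events: Iterable[str]) -> Optional[str]:
    # Fold the whole stream, starting from "no reservation yet".
    return reduce(apply_event, events, None)
```

For instance, `current_status(["ReservationCreated", "PaymentAuthorized"])` stays at `"NEW"` until a `ReservationConfirmed` event is appended.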
Finally:
How can I actually justify that we should have such a column? It appears the arguments I already mentioned were not enough, and deep down inside I know I am right :)
Well, in the end you can always put your foot down and take responsibility. And if it turns out in time to be the wrong decision, bear the responsibility and admit the mistake.

Should the rule "one transaction per aggregate" be taken into consideration when modeling the domain?

Taking into consideration the domain events pattern and this post, why do people recommend keeping to a one-aggregate-per-transaction model? There are good cases where one aggregate could change the state of another one. Even removing an aggregate (or altering its identity) will lead to altering the state of other aggregates that reference it. Some people say that keeping one transaction per aggregate helps scalability (keeping one aggregate per server). But doesn't this type of thinking break a fundamental characteristic of DDD: being technology agnostic?
So, based on the statements above and on your experience, is it bad to design aggregates and domain events that lead to changes in other aggregates, resulting in 2 or more aggregates per transaction (e.g. when a new order is placed with 100 items, change the customer's state from normal to V.I.P.)?
There are several things at play here and even more trade-offs to be made.
First and foremost, you are right, you should think about the model first. After all, the interplay of language, model and domain is what we're doing this all for: coming up with carefully designed abstractions as a solution to a problem.
The tactical patterns - from the DDD book - are a means to an end. In that respect we shouldn't overemphasize them, even though they have served us well (and caused major headaches for others). They help us find "units of consistency" in the model, things that change together, a transactional boundary. And therein lies the problem, I'm afraid. When something happens and when the side effects of it happening should be visible are two different things. Yet all too often they are treated as one, and thus cause this uncomfortable feeling, to which we respond by trying to squeeze everything within the boundary, without questioning. Still, we're left with that uncomfortable feeling. There are a lot of things that logically can be treated as a "whole change", whereas physically there are multiple small changes. It takes skill and experience, or even plain trial and error, to know when that is the case. Not everything can be solved this way, mind you.
To scale or not to scale, that is often the question. If you don't need to scale, keep things on one box, be content with a certain backup/restore strategy, you can bend the rules and affect multiple aggregates in one go. But you have to be aware you're doing just that and not take it as a given, because inevitably change is going to come and it might mess with this particular way of handling things. So, fair warning. More subtle is the question as to why you're changing multiple aggregates in one go. People often respond to that with the "your aggregate boundaries are wrong" answer. In reality it means you have more domain and model exploration to do, to uncover the true motivation for those synchronous, multi-aggregate changes. Often a UI or service is the one that has this "unreasonable" expectation. But there might be other reasons and all it might take is a different set of abstractions to solve the same problem. This is a pretty essential aspect of DDD.
The example you gave seems like something I could handle as two separate transactions: an order was placed, and as a reaction to that, because the order was placed with 100 items, the customer was made a VIP. As MikeSW hinted at in his answer (I started writing mine after he posted his), the question is when, who, how, and why should this customer status change be observed. Basically it's the "next" behavior that dictates the consistency requirements of the previous behavior(s).
An aggregate groups related business objects, while an aggregate root (AR) is the 'representative' of that aggregate. The AR itself is an entity modeling a (bigger, more complex) domain concept. In DDD a model is always relative to a context (the bounded context - BC), i.e. that model is valid only in that BC.
This allows you to define a model representative of the specific business context, and you don't need to shove everything into one model. An Order is an AR in one context, while in another it is just an id.
Since an AR pretty much encapsulates all the lower concepts and business rules, it acts as a whole i.e as a transaction/unit of work. A repository always works with AR because 1) a repo always deals with business objects and 2) the AR represents the business object for a given context.
When you have a use case involving 2 or more ARs, the business workflow and the correct modelling of that use case are paramount. In a lot of cases those ARs can be modified independently (one doesn't care about the other), or one AR changes as a result of another AR's behaviour.
In your example, it's pretty trivial: when the customer places an order for 100 items, a domain event is generated and published. Then you have a handler which will check if the order complies with the customer promotions rules and if it does, a command is issued which will have the result of changing the client state to VIP.
Domain events are very powerful and allow you to implement transactions, but in an eventually consistent environment. The old db transaction is an implementation detail, and it's usually used when persisting one AR (remember, ARs are treated as a logical unit, but persisting one may involve multiple tables, hence the db transaction).
Eventual consistency is a 'feature' of domain events which naturally fits a rich domain (and the real world, actually). For some cases you might need instant consistency; however, those are particular cases, and they are related to the UI rather than to how the Domain works. Of course, it really depends from one domain to another. In your example, the customer won't mind becoming a VIP 2 seconds or 2 minutes after the order was placed instead of in the same millisecond.
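The order/VIP flow described in this answer could look roughly like this in Python. All names (`OrderPlaced`, `Customer`, `VIP_ITEM_THRESHOLD`, `on_order_placed`) are assumptions for illustration, the promotion rule is reduced to a single threshold, and the "command" is collapsed into a direct method call for brevity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OrderPlaced:
    # Domain event published after the order transaction has committed.
    customer_id: str
    item_count: int


@dataclass
class Customer:
    id: str
    status: str = "normal"

    def promote_to_vip(self) -> None:
        self.status = "VIP"


VIP_ITEM_THRESHOLD = 100  # assumed promotion rule, taken from the question's example


def on_order_placed(event: OrderPlaced, customers: Dict[str, Customer]) -> None:
    # Handler: runs in its own transaction, some time after the order was saved.
    if event.item_count >= VIP_ITEM_THRESHOLD:
        customers[event.customer_id].promote_to_vip()
```

Each transaction here touches exactly one aggregate: first the order, then (possibly seconds later) the customer.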

Domain driven design and aggregate references

I am designing the domain model, but there is something that doesn't seem to be ok.
I start with a main aggregate. It has references to other aggregates, and those other aggregates reference more aggregates too. I can traverse the whole domain model starting from the main aggregate.
The problem I see is that I will be holding all instances of aggregates in memory.
Is that a good design? I can solve the memory problem with lazy loading but I think that I have a deeper problem.
I have another question regarding aggregate references. Should I load the references to other aggregates lazily? If that is the case, I would almost never use their repositories. Is that ok?
Having direct references between aggregate roots (ARs) can lead to problems that cannot be solved by lazy loading. Moreover, it forces all connected ARs to be in the same database and makes it more difficult to reason about and enforce invariants, which is the primary purpose of an AR in the first place. It is better to limit or eliminate direct references between ARs. A great resource to learn about aggregate design is a series of articles by Vaughn Vernon. The basic idea is to make your ARs lean and focused, all while keeping in mind their function: enforcing business constraints and forging a boundary around the root entity. If an AR needs data from another AR to perform its work, this data can be provided to it by an application service via a repository. Also, if references are only needed to fulfill UI requirements, then consider using the read-model pattern.
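A hedged Python sketch of "reference by identity plus an application service": the `Order`, `Customer`, credit-limit rule, `InMemoryRepository`, and `add_order_line` names are all hypothetical and chosen only to show the shape. The `Order` aggregate stores a customer id, and the application service fetches the other aggregate through its repository and passes in just the data that is needed.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: str
    credit_limit: int


@dataclass
class Order:
    id: str
    customer_id: str  # reference by identity, not an object reference
    total: int = 0

    def add_line(self, amount: int, credit_limit: int) -> None:
        # The invariant is enforced with data handed in by the caller.
        if self.total + amount > credit_limit:
            raise ValueError("credit limit exceeded")
        self.total += amount


class InMemoryRepository:
    """Stand-in for a real repository; one instance per aggregate type."""

    def __init__(self) -> None:
        self._items = {}

    def save(self, aggregate) -> None:
        self._items[aggregate.id] = aggregate

    def get(self, aggregate_id: str):
        return self._items[aggregate_id]


def add_order_line(order_id: str, amount: int,
                   orders: InMemoryRepository,
                   customers: InMemoryRepository) -> None:
    # Application service: loads each aggregate separately via its repository.
    order = orders.get(order_id)
    customer = customers.get(order.customer_id)
    order.add_line(amount, customer.credit_limit)
    orders.save(order)
```

Nothing here can be traversed from `Order` to `Customer`, so loading an order never drags the rest of the model into memory.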

DDD/NHibernate Use of Aggregate root and impact on web design - ex. Editing children of aggregate root

Hopefully, this fictitious example will illustrate my problem:
Suppose you are writing a system which tracks complaints for a software product, as well as many other attributes about the product. In this case the SoftwareProduct is our aggregate root and Complaints are entities that only can exist as a child of the product. In other words, if the software product is removed from the system, so shall the complaints.
In the system, there is a dashboard like web page which displays many different aspects of a single SoftwareProduct. One section in the dashboard, displays a list of Complaints in a grid like fashion, showing only some very high level information for each complaint. When an admin type user chooses one of these complaints, they are directed to an edit screen which allows them to edit the detail of a single Complaint.
The question is: what is the best way for the edit screen to retrieve the single Complaint, so that it can be displayed for editing purposes? Keep in mind we have already established the SoftwareProduct as an aggregate root, therefore direct access to a Complaint should not be allowed. Also, the system is using NHibernate, so eager loading is an option, but my understanding is that even if a single Complaint is eager loaded via the SoftwareProduct, as soon as the Complaints collection is accessed the rest of the collection is loaded. So, how do you get the single Complaint through the SoftwareProduct without incurring the overhead of loading the entire Complaints collection?
This gets a bit into semantics and religiosity, but within the context of editing a complaint, the complaint is the root object. When you are editing a complaint, the parent object (software product) is unimportant. It is obviously an entity with a unique identity. Therefore you would have a service/repository devoted to saving the updated complaint, etc.
Also, I think you're being a bit too negative. Complaints? How about "Comments"? Or "ConstructiveCriticisms"?
@Josh,
I don't agree with what you are saying even though I have noticed some people design their "Web" applications this way just for the sake of performance, and not based on the domain model itself.
I'm not a DDD expert either, but I'm sure you have read the traditional Order and OrderItem example. All DDD books say OrderItem belongs to the Order aggregate with Order being the aggregate root.
Based on what you are saying, OrderItem doesn't belong to the Order aggregate anymore, since the user may want to directly edit an OrderItem with Order being unimportant (just like editing a Complaint with its parent SoftwareProduct being unimportant). And you know that if this approach is followed, none of the Order invariants can be enforced, which are extremely important when it comes to e-commerce systems.
Does anyone have a better approach?
Mosh
To answer your question:
Aggregates are used for the purpose of consistency. For example, if adding/modifying/deleting a child object from a parent (aggregate root) causes an invariant to break, then you need an aggregate there.
However, in your problem, I believe SoftwareProduct and Complaint belong to two separate aggregates, each being the root of its own aggregate. You don't need to load the SoftwareProduct and all N Complaints assigned to it just to add a new Complaint. To me, it doesn't seem that you have any business rules to be evaluated when adding a new Complaint.
So, in summary, create 2 different Repositories: SoftwareProductRepository and ComplaintRepository.
Also, when you delete a SoftwareProduct, you can use database relationships to cascade deletes and remove the associated Complaints. This should be done in your database for the purpose of data integrity. You don't need to control that in your Domain Model, unless you had other invariants apart from deleting linked objects.
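As a rough sketch of this suggestion, here is a self-contained Python example using the standard-library `sqlite3` module with an in-memory database. The table and column names and the repository methods are assumptions; the original question uses NHibernate, so this only illustrates the shape of "two repositories plus a database-level cascade".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE software_product (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE complaint (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL
            REFERENCES software_product(id) ON DELETE CASCADE,
        detail TEXT NOT NULL
    )
""")


class SoftwareProductRepository:
    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection

    def add(self, name: str) -> int:
        cur = self._conn.execute("INSERT INTO software_product (name) VALUES (?)", (name,))
        return cur.lastrowid

    def remove(self, product_id: int) -> None:
        # The database cascade removes the product's complaints as well.
        self._conn.execute("DELETE FROM software_product WHERE id = ?", (product_id,))


class ComplaintRepository:
    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection

    def add(self, product_id: int, detail: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO complaint (product_id, detail) VALUES (?, ?)", (product_id, detail))
        return cur.lastrowid

    def get(self, complaint_id: int):
        # Loads a single complaint directly, without touching the product.
        return self._conn.execute(
            "SELECT id, product_id, detail FROM complaint WHERE id = ?",
            (complaint_id,)).fetchone()
```

The edit screen would call `ComplaintRepository.get` with the complaint's id, and the domain model contains no code for the cascade at all.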
Hope this helps,
Mosh
I am using NH for another business context, but with entity relationships similar to yours. I do not understand why you say:
Keep in mind we have already established the SoftwareProduct as an aggregate root, therefore direct access to a Complaint should not be allowed
I have this in mine: Article and Publisher entities. If a Publisher ceases to exist, so do all the dependent Article entities. I allow myself direct access to the Article collection of each Publisher and to individual entities. In the DB/mapping of the Article class, Publisher is one of the members and cannot be null.
Care to elaborate the difference between yours and mine?
Sorry this is not a direct answer but too long to be added as a comment.
I agree with Mosh. Each of these two entities is the root of its own aggregate. Let me explain with a real-life example. Suppose that a company has developed a piece of software. There are some bugs in this software that annoy you. You go to the company to make them aware of the problem. The company gives you a form to fill in.
This form has a field (or section) for the software's name and description. Additionally, it has some parts for your complaint. Is this form the same as the software manual? No. It is a form related to the software. It is not the software. Does this form have an ID? Yes, it does. In other words, you can call the company the next day and ask the operator about your letter of complaint. Obviously the operator will ask you for the ID.
This shows that the form is an entity in its own right and should not be confused with the software itself. A relationship between two different entities does not mean that one of them belongs to the other.