DDD: How many aggregates should have a single bounded context?

How many aggregates should have a single bounded context?
I'm asking this question because the information in books and other resources is too broad/abstract.
I suppose it depends on the particular domain model and its structure. How many bounded contexts does a domain model have? How many entities are there in each bounded context? I suppose all of these questions bear on how many aggregates should be in a single bounded context.
Also, recalling the SOLID principles and the common idea of having small, loosely coupled pieces of code, I suppose it's fine to have at most 3-4 aggregates per bounded context. If there are more aggregates in a single bounded context, then there are probably some issues with the software design.
I'm reading Vernon's book on DDD right now, but it's rather difficult to understand how to design such things with confidence.

The trite answer is “just enough, but not too many”. There is no real guidance on how many aggregates to put in a bounded context.
The thing that drives aggregates and entities is the Ubiquitous Language that is used to describe the context. The Ubiquitous Language is different for each context, and the entities and aggregate roots needed in the context can be found in the nouns used in the language. Once you have the domain fully described by the language, count up the nouns that have a special meaning in that language and you have a count of the entities necessary.
Bear in mind, though, that I've rarely come across a bounded context that was "fully described". The goal is "described fully enough for this release". Therefore for any release the number of entities won't be "enough" and you'll probably have plans of adding more. Whether those plans ever rise to the top of the priority queue is another question.

How many aggregates should have a single bounded context?
All aggregates should have a single bounded context. You can almost work that out backwards - an aggregate is going to be stored in a single database, a database is going to belong to a single (micro) service, a service is going to serve a single bounded context; therefore it follows that an aggregate is going to belong to a single bounded context.
Where things can get messy: it's easy to take some broad business concept, like "order", and try to create a single representation for order that works for every bounded context. That's not the goal though -- the goal is for each context to have a representation of order that works in that context.
Common example: sales, billing, fulfillment may all care about "order", but the information that they need to share is largely just the order id, which acts as a correlation identifier so that they can coordinate conversations.
See Mauro Servienti: All of Our Aggregates Are Wrong
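As a minimal Java illustration of that point (the type and field names here are hypothetical, not taken from the talk): each context keeps its own representation of "order", and only the id crosses the boundary.

    import java.math.BigDecimal;
    import java.util.List;

    // The only thing the three contexts share is the order id, which
    // acts as a correlation identifier between their conversations.
    record OrderId(String value) {}

    // Sales context: its "order" is about what was sold.
    record SalesOrder(OrderId id, List<String> lineItems) {}

    // Billing context: its "order" is about what to charge.
    record BillingOrder(OrderId id, BigDecimal amountDue) {}

    // Fulfillment context: its "order" is about where to ship it.
    record ShippingOrder(OrderId id, String deliveryAddress) {}

Each representation can then evolve independently; a change to billing rules never touches the sales model.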

Related

Best Practices for Overlapping Object/Data Entity Types

Occasionally I run into situations where all of the conditions below are true for two highly similar, but not quite identical entities or objects. This makes it difficult for me to decide how to model them, either on the database end or in terms of object modeling. I'm going to try to spell out the issue and my questions in detail, because I've found it to be a really difficult modeling problem to define. I'm trying to do both data and object modeling with these entities, so I'm going to use the terminology of both disciplines a little loosely.
1) Both entities share many identical properties, but have a few unique ones not found in the other.
2) One is not a supertype or subtype of another.
3) The overlap is not due to object inheritance.
4) The objects are used for different purposes in the same domain, but often in close proximity in any workflow. This frequently leads those with even moderate domain knowledge to confuse the entities. On the other hand, this fine separation in purposes leads to greater differences between the methods of the associated objects than their properties.
5) In some situations it may be possible to create bridge tables on the database side to express M2M relationships between the entities. Nevertheless, they have so many properties (or columns, on the database side) in common that it might make sense to store them in the same table.
Some cases in point I've run into include:
1) "Product vs. Project confusion" - especially in software marketing, where Products and Projects share many of the same properties. Normally a product will have multiple projects associated with it, but it is also unusual yet conceivable for a project to be used in multiple products.
2) The subtle differences between Features and Components in software development. A feature is a means of supplying a benefit from the customer's point of view, while a component is a means of implementing features on the developer's side. This is a really subtle distinction which nevertheless counts for a lot. For further discussion see Rod Maupin's post at http://www.installationdeveloper.com/347/features-and-components-101/
3) Templates vs. Types in a lot of different problem domains. For example, when identifying types of guitars through a TypeID column, the TypeTable it refers to would probably have columns corresponding to colors, string sizes, body shapes, etc. A template, on the other hand, is something you'd build a guitar from, so it would have different methods than a Type, perhaps linked to an "Apply Template" or "Make Item from Template" menu command. Nevertheless, it would have many of the same columns or properties as a Type, such as color, shape, string size etc. This distinction raises its head in thousands of different object types and templates in many problem domains, not just this narrow example. To complicate matters further, in some situations it might be helpful to associate multiple Templates with a particular Type, and/or vice-versa.
I haven't run into this problem of overlapping entities often, but when it does occur, it becomes a real bottleneck and leads to a lot of wasted time refactoring the data and object models. I've read books on both topics and done a lot of searches of data/object modeling webpages about the issue, but have yet to see it discussed. The only hits for "overlap" and "data model" I could find on StackOverflow were for differentiating between similar columns in one table or entity, not across tables or entities. My questions are:
1) Is there a formal name for this issue?
2) Is there a simple shortcut or trick of the trade to identify such overlapping entities at the beginning of the modeling process, rather than much further down the line, when late recognition makes refactoring an issue?
3) How should such overlapping entities be handled? I assume that in terms of OOP, they ought to have separate objects since their methods tend to be different. Inheriting one from the other would be awkward though. A more difficult question would be whether or not it would make sense to use separate tables on the database end. Combining them might require a complex series of views plus waste storage space when the properties/columns they don't have in common are left null. Storing them in separate tables might also be wasteful though, if the common properties could be stored in single columns.
It's a tricky issue to even recognize, let alone handle. I have only a moderate amount of experience with data/object modeling, so the input of someone who really knows what they're doing would be helpful. Thanks :)
Your question concerns both database modeling aspects and object-oriented (programming) modeling aspects. Let's start from an abstract point of view.
You say:
1) Both entities share many identical properties, but have a few unique ones not found in the other.
2) One is not a supertype or subtype of another.
and:
3) The overlap is not due to object inheritance.
But note that inheritance should not be confused with subtyping, even though the two are often tied together! See for instance Inheritance (object-oriented programming) in Wikipedia, where this statement is supported by two citations [1,2].
In other words, even if A is not a subtype of B, and B is not a subtype of A, you can find a C from which both A and B inherit attributes.
So you may or may not think of this C as an "abstract supertype" of both A and B; but in any case it is convenient to consider it a common ancestor, at least from a database point of view, so that the common attributes can be factored out into a "supertable".
Then, from the object-oriented programming side, you can see A and B as subtypes of C or simply as two different things, depending on the characteristics of your Object-Relational Mapping tools, on the problem at hand, etc.
Of course, this way of modelling things does not prevent A and B, in addition to inheriting from C, from having one or more relations between them, as in the Products-Projects example you gave.
So, here is my answer to your three final questions:
1) Yes, it is called inheritance.
2) You can check if two entities have a significant number of common attributes.
3) You can model them in the database with a common table, which perhaps carries shared properties such as integrity constraints, and with two tables that have a foreign key to it. Of course this rule is not to be applied blindly; like all human rules, it can have exceptions. From the programming point of view, on the other hand, you can decide whether or not to model them both with a supertype. This depends on many factors and should be decided on a case-by-case basis (a sketch follows).
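Here is a minimal Java sketch of the common-ancestor idea, using the Product/Project case from the question (the WorkItem name and all attributes are assumptions for illustration): A and B inherit shared attributes from C without either being a subtype of the other.

    import java.time.LocalDate;

    // "C": an abstract common ancestor factoring out the shared
    // attributes, mirroring the "supertable" on the database side.
    abstract class WorkItem {
        protected final String id;
        protected String name;
        protected String description;

        protected WorkItem(String id, String name, String description) {
            this.id = id;
            this.name = name;
            this.description = description;
        }
    }

    // "A": the shared attributes plus a few of its own.
    class Product extends WorkItem {
        private String marketSegment; // not found in Project

        Product(String id, String name, String description, String marketSegment) {
            super(id, name, description);
            this.marketSegment = marketSegment;
        }
    }

    // "B": overlaps heavily with Product, yet is neither its
    // supertype nor its subtype - they only share the ancestor.
    class Project extends WorkItem {
        private LocalDate deadline; // not found in Product

        Project(String id, String name, String description, LocalDate deadline) {
            super(id, name, description);
            this.deadline = deadline;
        }
    }

On the database side the same shape would become a work_item table plus product and project tables holding foreign keys to it.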

Domain Driven Design - Creating general purpose entities vs. Context specific Entities

Situation
Suppose you have Orders and Clients as entities in your application. In one aggregate, the Order entity is considered to be the root but you also want to make use of the Client entity for simple things. In another the Client is the root entity and the Order entity is touched ever so lightly.
An example:
Let's say that in the Order aggregate I use the Client only to read details like name and address and to build order history, not to make the client do client-specific business logic (like persistence, password resets and back flips..).
On the other hand, in the Client aggregate I use the Order entity to report on the client's buying habits, order totals, and order counts, without requiring advanced order functionality like order processing, updating, status changes, etc.
Possible solution
I believe the better solution is to create the entities for each aggregate specific to the aggregate context, because making them full featured (general purpose) and ready for any situation and usage seems like overkill and could potentially become a maintenance nightmare. (and potentially memory intensive)
Question
What is the DDD recommended way of handling this situation?
What is your take on the matter?
The basic driver for these decisions should be the ubiquitous language, and consequently the real-world domain you're modeling. If both work in a specific domain, I'd favor separation over god-classes for maintainability reasons.
Apart from separating behavior into different aggregates, you should also take care that you don't mix different bounded contexts. Depending on the requirements of your domain, it could make sense to separate the Purchase Context from the Reporting Context (to extend on your example).
To decide on a context design, context maps are a helpful tool.
You are on the right track. In DDD, entities are not merely containers encapsulating all attributes related to a "subject" (for example: a customer, or an order). This is a very important concept that eludes a lot of people. An entity in DDD represents an operation boundary, thus only the data necessary to perform the operation is considered to be a part of the entity. Exactly which data to include in an entity can be difficult to decide, because some data is relevant in different use-cases. Here are some tips when analyzing data (a small sketch follows the list):
Analyze invariants: things that must be considered when applying validation rules, and that cannot be out of sync, should be in the same aggregate.
Drop the database thinking; normalization is not a concern of DDD.
Just because things look the same, it doesn't mean that they are. For example: the current shipping address registered on a customer is different from the shipping address which a specific order was shipped to.
Don't look at reads. Reading, like creating a report or populating a viewmodel/DTO/whatever, has nothing to do with operation boundaries and can typically be a 360-degree view of the data. In fact, don't even use your domain model when returning reads; use a different architectural stack.
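A minimal Java sketch of the context-specific approach (all names are hypothetical): the Order aggregate carries only the slim client data it needs, copied at order time, while the full-featured Client aggregate lives elsewhere.

    // Inside the Order aggregate: just the client data the order needs.
    // The shipping address is a copy taken when the order was placed,
    // not a live link to the customer's current address.
    final class ClientDetails {
        final String clientId;
        final String name;
        final String shippingAddress;

        ClientDetails(String clientId, String name, String shippingAddress) {
            this.clientId = clientId;
            this.name = name;
            this.shippingAddress = shippingAddress;
        }
    }

    class Order {
        private final String orderId;
        private final ClientDetails client; // slim, order-specific view

        Order(String orderId, ClientDetails client) {
            this.orderId = orderId;
            this.client = client;
        }
    }

    // In the Client aggregate: the client-specific behavior (password
    // resets, etc.) that the Order aggregate never needs to see.
    class Client {
        private final String clientId;
        private String passwordHash;

        Client(String clientId, String passwordHash) {
            this.clientId = clientId;
            this.passwordHash = passwordHash;
        }

        void resetPassword(String newPasswordHash) {
            this.passwordHash = newPasswordHash;
        }
    }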

Should the rule "one transaction per aggregate" be taken into consideration when modeling the domain?

Taking into consideration the domain events pattern and this post, why do people recommend keeping one aggregate per transaction? There are good cases where one aggregate could change the state of another one. Even removing an aggregate (or altering its identity) will lead to altering the state of other aggregates that reference it. Some people say that keeping one transaction per aggregate helps scalability (keeping one aggregate per server). But doesn't this type of thinking break a fundamental characteristic of DDD: being technology agnostic?
So, based on the statements above and on your experience, is it bad to design aggregates and domain events that lead to changes in other aggregates, so that a transaction touches 2 or more aggregates (e.g.: when a new order is placed with 100 items, change the customer's state from normal to V.I.P.)?
There are several things at play here and even more trade-offs to be made.
First and foremost, you are right, you should think about the model first. After all, the interplay of language, model and domain is what we're doing this all for: coming up with carefully designed abstractions as a solution to a problem.
The tactical patterns - from the DDD book - are a means to an end. In that respect we shouldn't overemphasize them, even though they have served us well (and caused major headaches for others). They help us find "units of consistency" in the model, things that change together, a transactional boundary. And therein lies the problem, I'm afraid. When something happens and when the side effects of it happening should be visible are two different things. Yet all too often they are treated as one, and thus cause this uncomfortable feeling, to which we respond by trying to squeeze everything within the boundary, without questioning. Still, we're left with that uncomfortable feeling. There are a lot of things that logically can be treated as a "whole change", whereas physically there are multiple small changes. It takes skill and experience, or even blunt trial and error, to know when that is the case. Not everything can be solved this way, mind you.
To scale or not to scale, that is often the question. If you don't need to scale, keep things on one box, be content with a certain backup/restore strategy, you can bend the rules and affect multiple aggregates in one go. But you have to be aware you're doing just that and not take it as a given, because inevitably change is going to come and it might mess with this particular way of handling things. So, fair warning. More subtle is the question as to why you're changing multiple aggregates in one go. People often respond to that with the "your aggregate boundaries are wrong" answer. In reality it means you have more domain and model exploration to do, to uncover the true motivation for those synchronous, multi-aggregate changes. Often a UI or service is the one that has this "unreasonable" expectation. But there might be other reasons and all it might take is a different set of abstractions to solve the same problem. This is a pretty essential aspect of DDD.
The example you gave seems like something I could handle as two separate transactions: an order was placed, and as a reaction to that, because the order was placed with a 100 items, the customer was made a VIP. As MikeSW hinted at in his answer (I started writing mine after he posted his), the question is when, who, how, and why should this customer status change be observed. Basically it's the "next" behavior that dictates the consistency requirements of the previous behavior(s).
An aggregate groups related business objects, while an aggregate root (AR) is the 'representative' of that aggregate. The AR itself is an entity modeling a (bigger, more complex) domain concept. In DDD a model is always relative to a context (the bounded context - BC), i.e. that model is valid only in that BC.
This allows you to define a model representative of the specific business context, and you don't need to shove everything into one model only. An Order is an AR in one context, while in another it is just an id.
Since an AR pretty much encapsulates all the lower-level concepts and business rules, it acts as a whole, i.e. as a transaction/unit of work. A repository always works with ARs because 1) a repo always deals with business objects and 2) the AR represents the business object for a given context.
When you have a use case involving 2 or more ARs, the business workflow and the correct modelling of that use case are paramount. In a lot of cases those ARs can be modified independently (one doesn't care about the other), or an AR changes as a result of another AR's behaviour.
In your example, it's pretty trivial: when the customer places an order for 100 items, a domain event is generated and published. Then you have a handler which will check if the order complies with the customer promotions rules and if it does, a command is issued which will have the result of changing the client state to VIP.
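A minimal Java sketch of that flow (the names and the in-memory event bus are assumptions for illustration; a real system would publish through messaging infrastructure):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    // Domain event raised by the Order aggregate in transaction 1.
    record OrderPlaced(String orderId, String customerId, int itemCount) {}

    // A toy in-memory publisher standing in for real messaging infrastructure.
    class EventBus {
        private final List<Consumer<OrderPlaced>> handlers = new ArrayList<>();
        void subscribe(Consumer<OrderPlaced> handler) { handlers.add(handler); }
        void publish(OrderPlaced event) { handlers.forEach(h -> h.accept(event)); }
    }

    // The handler reacts in its own transaction 2, eventually consistent
    // with transaction 1: it checks the promotion rule and issues a
    // command against the Customer aggregate.
    class PromotionHandler {
        void handle(OrderPlaced event) {
            if (event.itemCount() >= 100) {
                // e.g. customers.load(event.customerId()).makeVip(),
                // saved in its own transaction (hypothetical repository)
            }
        }
    }

The point is that each transaction still touches exactly one aggregate; the coordination happens through the event, not through a shared commit.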
Domain events are very powerful and allow you to implement transactions in an eventually consistent environment. The old db transaction is an implementation detail, usually used when persisting one AR (remember, an AR is treated as a logical unit, but persisting one may involve multiple tables, hence the db transaction).
Eventual consistency is a 'feature' of domain events which fits a rich domain (and the real world, actually) naturally. For some cases you might need instant consistency, but those are particular cases, and they are related to the UI rather than to how the Domain works. Of course, it really depends on the domain. In your example, the customer won't mind becoming a VIP 2 seconds or 2 minutes after the order was placed instead of in the same millisecond.

Is structure (graph) of objects an Aggregate Root worthy of a Repository?

Philosophical DDD question here...
I've seen a lot of Entity vs. Value Object discussions here, but mine is slightly different. Forgive me if this has been covered before.
I'm working in the financial domain at the moment. We have funds (hedge variety). Those funds often invest into other funds. This results in a tree structure of sorts with one fund at the top anchoring it all together.
Obviously, a fund is an Entity (Aggregate Root, even). Things like trades and positions are most likely Value Objects.
My question is: Should the tree structure itself be considered an Aggregate Root?
Some thoughts:
The tree structure is stored in the DB by storing the components and the positions they hold in each other. We currently have no coded concept of the tree. The domain is very weak.
The tree structure has no "uniqueness" or identifier.
There is logic needed in many places to "walk" the tree to find the relationships to each other, either top-down, or sometimes bottom-up. This logic needs to be encapsulated somewhere.
There is a lot of logic to compute leverage, exposure, etc., and roll it up the tree.
Is it good enough to treat the Fund as a Composite Fund object that acts as the Aggregate Root with built-in invariants? Or is a more formal tree structure useful in this case?
I usually take a more functional/domain approach to designing my aggregates and aggregate roots.
This results in a tree structure of sorts
Maybe you can talk with your domain expert to see if that notion deserves to be a first-class citizen with a name of its own in the ubiquitous language (FundTree, FundComposition... ?)
Once that is done, making it an aggregate root will basically depend on whether you consider the entity to be one of the main entry points in the application, i.e. whether you will sometimes need a reference to a FundTree before even having any reference to a Fund, or whether you can afford to obtain it only by traversal of a Fund.
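A minimal Java sketch of the composite idea (the names are hypothetical, and position sizes are ignored to keep the roll-up logic visible):

    import java.math.BigDecimal;
    import java.util.ArrayList;
    import java.util.List;

    // A fund may invest in other funds, forming a tree anchored at one root.
    class Fund {
        private final String fundId;
        private final BigDecimal ownExposure;
        private final List<Fund> investments = new ArrayList<>();

        Fund(String fundId, BigDecimal ownExposure) {
            this.fundId = fundId;
            this.ownExposure = ownExposure;
        }

        void investIn(Fund other) { investments.add(other); }

        // Walk the tree bottom-up and roll exposure up to this node.
        BigDecimal totalExposure() {
            BigDecimal total = ownExposure;
            for (Fund f : investments) {
                total = total.add(f.totalExposure());
            }
            return total;
        }
    }

    // The tree as a first-class concept: one natural home for the
    // walking logic and for invariants such as forbidding cycles.
    class FundTree {
        private final Fund root;
        FundTree(Fund root) { this.root = root; }
        BigDecimal totalExposure() { return root.totalExposure(); }
    }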
This is really more a decision of whether you want to load full trees at all times.
If you are anal about what you define as an aggregate root, then you will find a lot of bloat as you will be loading full object trees any time you load them.
There is no one size fits all approach to this, but in my opinion, you should have your relationships all mapped to your aggregate roots where possible, but in some cases a part of that tree can be treated as an aggregate root when needed.
If you're in a web environment, this is a different decision to a desktop application.
In the web, you are starting again on every page load, so I tend to have a good MODEL to map the relationships and a repository for pretty much every entity (as I always need to save just a small part of something from some popup somewhere) and pull it together with services that work per aggregate root. It makes the code predictable and stops those... "umm.... is this a root?" moments, or repositories that become unmanageable.
Then I will have mappers that can give me summary and/or listitem views of large trees as needed and when needed.
On a desktop app, you keep things in memory a lot more, so you will write less code by just working out what your aggregate roots are and loading them when you need them.
There is no right or wrong to this. I doubt you could build a big app of any sort without making compromises on what is considered an aggregate root, and you'll always end up in a situation where 2 roots end up joining each other somewhere.

Purpose and effect of SSAS hierarchies?

Firstly, I feel comfortable with what a hierarchy is in terms of the concept and how it impacts the design of a DW's star schema. I have some dimensions with lots of attributes, and I could create lots of hierarchies within SSAS. I would like a better understanding of how the OLAP engine uses the hierarchies that I create so that I can make a more informed decision on how I design my hierarchies (that's a tough word to type the first few times). There are also limitations with SSAS regarding attributes appearing in multiple hierarchies, so sometimes I have to do extra work to get around those limitations or decide which hierarchy is more important.
I also wonder what negative impacts a hierarchy might have, such as making the dimension more confusing for users. I might hide the attributes which are included in hierarchies to eliminate the duplicate attribute and make the dimension less confusing. But then a user wants to see which months of the year they typically get more sales. If I've hidden the month attribute so that it is only available through a Year->Month hierarchy, are they forced to always include the Year part of the hierarchy, preventing them from doing such analysis?
A few articles on hierarchies have stated something to the effect of "allowing the user to drill down to detailed data". That is misleading, because you can simply drag the separate year and month attributes to a report and accomplish just that without the use of a hierarchy, so such an explanation is a little superficial. I feel like there must be a lot more to it than that.
Some articles seem to suggest it determines whether or not attributes are considered for aggregation. This seems counterintuitive, because I thought that already happens when you include an attribute in a cube. I mean, the whole point of creating a cube consisting of attributes is to have an intersection of all of the attributes so that you can quickly aggregate on any combination of them, so it confuses me when something implies the opposite by saying only attributes in hierarchies are considered for aggregation:
"Attributes only exposed in attribute hierarchies [as opposed to user hierarchies] are not automatically considered for aggregation by the Aggregation Design Wizard. Queries involving these attributes are satisfied by summarizing data from the primary key. Without the benefit of aggregations, query performance against these attribute hierarchies can be slow."
-- SSAS 2008 Performance Guide
Can someone explain how the engine uses my hierarchies in contrast with just including the attribute in the cube? (besides the aesthetics of grouping attributes together)
Unnatural hierarchies are confusing as heck to me in particular. In the SSAS 2008 Performance Guide they show one example as a Gender->Education hierarchy. I think my users would mumble "stupid programmer" every time they had to drill through Gender just to get to Education.
What rationale do you follow on when and when not to create a hierarchy?
I'm not 100% sure my comments apply to SSAS, but since we're both 100% MDX/XMLA compatible it should be similar.
You may start by reading this and the many-to-many documentation.
The first difference between using hierarchies with levels and attributes is performance. You've two different scenarios for a drilldown (take [Asia] as a particular member and let's find all countries of [Asia]):
Using a hierarchy with levels: [Asia].children()
Using attributes: ([Asia], [Countries])
The first option is trivial and very fast (the structure is in memory). The second one implies iterating through all countries and 'checking' whether each exists (i.e. is a country of [Asia]). This can be a pain for huge attributes (>100k members). Once done, we need to go to our fact tables, where each member has a set of associated fact rows. The version with a single hierarchy is again direct. The one with two attributes might imply some additional internal operations -> all rows of [Asia] minus the ones of a particular country. The simplified version is also handier for the cache.
Second, you define a 'natural' drilldown path that can be directly used in the GUI.
On top of that, you can add special aggregation types (First, Last, Min, Max...) that will take into account the structure of a given hierarchy.
There are successful OLAP solutions that work without hierarchical structures, but you have fewer features to play with when building a solution.
I hope it helps you understand these concepts better.