DDD aggregate boundaries: one-to-some, not one-to-many, between entities in one aggregate

I've watched a tutorial about DDD which says that if an aggregate root SnackMachine has more than 30 child elements, the child elements should go into a separate aggregate. For example, a SnackMachine has lots of PurchaseLog entries (more than 30), so it is better for PurchaseLog to be in a separate aggregate. Why is that?

The reason for limiting the overall size of an aggregate is that you always load the full aggregate into memory and you always store the full aggregate transactionally. A very large aggregate would cause technical problems.
That said, there is no such "30 child elements" rule in aggregate design; it sounds arbitrary as a rule. For example, a few very large child elements could be technically worse than 30 very light ones. A good way of storing aggregates is as JSON documents, given that you always read and write the documents as atomic operations. If you think of it this way, you'll realise that an aggregate design that implies a very large or even ever-growing child collection will eventually cause problems. A PurchaseLog sounds like an ever-growing collection.
The second part of the rule that says "put it in a separate aggregate" is also not correct. You don't create aggregates because you need to store some data and it doesn't fit into an existing aggregate. You create aggregates because you need to implement some business logic and this business logic will need some data, so you put both things together in an aggregate.
So, although the points you raise are things to take into consideration when designing aggregates to avoid technical problems, I'd suggest you focus on the actual responsibilities of the aggregate.
In your example, what are the responsibilities of the SnackMachine? Does it really need the (full) list of PurchaseLogs? What operations will the SnackMachine expose? Let's say it exposes PurchaseProduct(productId) and LoadProduct(productId, quantity). To execute its business logic, this aggregate needs a list of products and a count of their available quantity, but it doesn't need to store the purchase log. Instead, on every purchase it could publish an event ProductPurchased(SnackMachineId, ProductId, Date, AvailableQuantity). External systems could then subscribe to this event: one subscriber could record the PurchaseLog for reporting purposes, and another could send someone to reload the machine when the stock drops below X.
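For illustration, here is a minimal C# sketch of that event-based design; the type and member names (ProductPurchased, DequeuePendingEvents, and so on) are assumptions made for the example, not something prescribed by the tutorial:
using System;
using System.Collections.Generic;
using System.Linq;

// The aggregate keeps only the state it needs for its own invariants (available quantities)
// and records what happened as events instead of holding an ever-growing purchase log.
public record ProductPurchased(int SnackMachineId, int ProductId, DateTime Date, int AvailableQuantity);

public class SnackMachine
{
    private readonly Dictionary<int, int> _availableQuantityByProductId = new();
    private readonly List<ProductPurchased> _pendingEvents = new();

    public int Id { get; init; }

    public void LoadProduct(int productId, int quantity)
    {
        _availableQuantityByProductId.TryGetValue(productId, out var current);
        _availableQuantityByProductId[productId] = current + quantity;
    }

    public void PurchaseProduct(int productId)
    {
        if (!_availableQuantityByProductId.TryGetValue(productId, out var available) || available == 0)
            throw new InvalidOperationException("Product is out of stock.");

        _availableQuantityByProductId[productId] = available - 1;

        // Subscribers (a purchase-log reporter, a restocking service) react to this event;
        // the SnackMachine itself never stores the log.
        _pendingEvents.Add(new ProductPurchased(Id, productId, DateTime.UtcNow, available - 1));
    }

    // The infrastructure publishes these after the aggregate is saved.
    public IReadOnlyList<ProductPurchased> DequeuePendingEvents()
    {
        var events = _pendingEvents.ToList();
        _pendingEvents.Clear();
        return events;
    }
}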

If PurchaseLog is not its own aggregate then it implies that it can only be retrieved or added as part of the child collection of SnackMachine.
Therefore, each time you want to add a PurchaseLog, you'd retrieve the SnackMachine with its child PurchaseLogs, add the PurchaseLog to its collection, and then save changes on your unit of work.
Did you really need to retrieve 30+ purchase logs which are redundant for the purpose of the use case of creating a new purchase log?
Application Layer - Option 1 (PurchaseLog is an owned entity of SnackMachine)
// Retrieve the snack machine from repo, along with child purchase logs
// Assuming 30 logs, this would retrieve 31 entities from the database that
// your unit of work will start tracking.
SnackMachine snackMachine = await _snackMachineRepository.GetByIdAsync(snackMachineId);
// Ask snack machine to add a new purchase log to its collection
snackMachine.AddPurchaseLog(date, quantity);
// Update
await _unitOfWork.SaveChangesAsync();
Application Layer - Option 2 (PurchaseLog is an aggregate root)
// Get a snack machine from the repo to make sure that one exists
// for the provided id. (Only 1 entity retrieved.)
SnackMachine snackMachine = await _snackMachineRepository.GetByIdAsync(snackMachineId);
// Create purchase log
PurchaseLog purchaseLog = new(
    snackMachine,
    date,
    quantity);
await _purchaseLogRepository.AddAsync(purchaseLog);
await _unitOfWork.SaveChangesAsync();
PurchaseLog - Option 2
class PurchaseLog
{
    private readonly int _snackMachineId;
    private readonly DateTime _date;
    private readonly int _quantity;

    public PurchaseLog(
        SnackMachine snackMachine,
        DateTime date,
        int quantity)
    {
        _snackMachineId = snackMachine?.Id ?? throw new ArgumentNullException(nameof(snackMachine));
        _date = date;
        _quantity = quantity;
    }
}
The second option follows the contours of your use case more accurately and also results in far less I/O against the database.

Related

How to handle concurrency in faunadb

I have some backend APIs which connect to FaunaDB; I'm able to do everything I need with the data, but I have some serious doubts about concurrent modifications (which may not be strictly specific to FaunaDB, but I'd like to understand how to deal with them using this technology).
One example above all: I want to create a new document (A) in a collection (X) which is linked (via reference or other fields) to other documents (B and C) in another collection (Y); in order to be linked, these documents (B and C) must satisfy a condition (e.g. field F = "V"). Once A has been created, B and C cannot be modified (or the condition will be invalidated!).
Of course the API to create the document A can run concurrently with the API used to modify documents B and C.
Here comes the doubt: what if, while creating the document A linked to document B and C, someone else changes field F of document B to something different from "V"?
I could end up with A linked to a wrong document, because neither API knows what the other one is doing.
Do I need to use the "Do" function in both APIs to create atomic transactions? So I can:
Check if B and C are valid and, if yes, create A in a single transaction
Check if B is linked to A and, if it isn't, modify it in a single transaction
Thanks everyone.
Fauna tries to present a consistent view of the data no matter where or when your clients need to ask. Isolation of transaction effects is what matters on short time scales (typically less than 10ms).
The Do function merely lets you combine multiple disparate FQL expressions into a single query. There is no conditional processing aspect to Do.
You can certainly check conditions before undertaking operations, and all Fauna queries are atomic transactions: all of the query succeeds or none of it does.
Arranging for intermediate query values in order to perform conditional logic does tend to make FQL queries more complex, but they are definitely possible:
The query for your first API might look something like this:
Let(
  {
    document_b: Get(<reference to B document>),
    document_c: Get(<reference to C document>),
    required_b: Select(["data", "required_field"], Var("document_b")),
    required_c: Select(["data", "other_required"], Var("document_c")),
    condition: And(Var("required_b"), Var("required_c"))
  },
  If(
    Var("condition"),
    Create(Collection("A"), { data: { <document A data> } }),
    Abort("Cannot create A because the conditions have not been met.")
  )
)
The Let function allows you to compose named values for intermediate expressions, which can read or write whatever they need, along with logical operations that determine which conditions need to be tested. The value composition is followed by an expression which, in this example, tests the conditions and only creates the document in the A collection when the conditions are met. When the conditions are not met, the transaction is aborted with an appropriate error message.
Let can nest Lets as much as required, provided the query fits within the maximum query length of 16MB, so you can embed a significant amount of logic into your queries. When a single query is not sufficient, you can define UDFs that can be called, which allow you to store business logic that you can use any number of times.
See the E-commerce tutorial for a UDF that performs all of the processing required to submit an order, check if there is sufficient product in stock, deduct requested quantities from inventory, set backordered status, and create the order.

Efficient Querying Data With Shared Conditions

I have multiple sets of data which are sourced from an Entity Framework code-first context (SQL CE). There's a GUI which displays the number of records in each query set, and upon changing some set condition (e.g. Date), the sets all need to recalculate their "count" value.
While every set's query is slightly different in some way, most of them share common conditions in some way. A simple example:
RelevantCustomers = People.Where(P => P.Transactions.Where(T => T.Date > SelectedDate).Count() > 0 && P.Type == "Customer");
RelevantSuppliers = People.Where(P => P.Transactions.Where(T => T.Date > SelectedDate).Count() > 0 && P.Type == "Supplier");
So the thing is, there's enough of these demanding queries, that each time the user changes some condition (e.g. SelectedDate), it takes a really long time to recalculate the number of records in each set.
I realise that part of the reason for this is the need to query through, for example, the transactions each time to check what is really the same condition for both RelevantCustomers and RelevantSuppliers.
So my question is: given that these sets share common "base conditions" which depend on the same sets of data, is there some more efficient way I could be calculating these sets?
I was thinking something with custom generic classes like this:
QueryGroup<People>(P => P.Transactions.Where(T => T.Date > SelectedDate).Count() > 0)
{
    new Query<People>("Customers", P => P.Type == "Customer"),
    new Query<People>("Suppliers", P => P.Type == "Supplier")
}
I can structure this just fine, but what I'm finding is that it makes basically no difference to the efficiency as it still needs to repeat the "shared condition" for each set.
I've also tried pulling the base condition data out as a static "ToList()" first, but this causes issues when running into navigation entities (i.e. People.Addresses don't get loaded).
Is there some method I'm not aware of here in terms of efficiency?
Thanks in advance!
Give something like this a try: combine "similar" values into fewer queries, then separate the results afterwards. Also, use Any() rather than Count() for an existence check. Your updated attempt goes part-way, but will still result in two hits to the database. It also helps to ensure that you are querying against indexed fields, and those indexes will be more efficient with numeric IDs rather than strings (i.e. a TypeID of 1 vs. 2 for "Customer" vs. "Supplier"). Normalized values are better for indexing and lead to smaller records, at the cost of more verbose queries.
var types = new string[] {"Customer", "Supplier"};
var people = People.Where(p => types.Contains(p.Type)
&& p.Transactions.Any(t => t.Date > selectedDate)).ToList();
var relevantCustomers = people.Where(p => p.Type == "Customer").ToList();
var relevantSuppliers = people.Where(p => p.Type == "Supplier").ToList();
This results in just one hit to the database, and the Any() should be more performant than fetching an entire count. We split the customers and suppliers after the fact from the in-memory set. The caveat here is that any attempt to access details such as transactions on those customers and suppliers would trigger lazy-load hits, since we didn't eager-load them. If you need entire entity graphs then be sure to .Include() the relevant details, or be more selective about the data extracted from the first query, i.e. select anonymous types with the applicable details rather than the whole entity.
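If the GUI only needs the record counts, you can go a step further and let the database return just the numbers in one round trip. A minimal sketch, assuming People is the EF set and the property names are as in the question:
var types = new[] { "Customer", "Supplier" };
// Group by type on the server and return only the counts; no entities are materialized.
var counts = People
    .Where(p => types.Contains(p.Type)
        && p.Transactions.Any(t => t.Date > selectedDate))
    .GroupBy(p => p.Type)
    .Select(g => new { Type = g.Key, Count = g.Count() })
    .ToList();

int customerCount = counts.FirstOrDefault(c => c.Type == "Customer")?.Count ?? 0;
int supplierCount = counts.FirstOrDefault(c => c.Type == "Supplier")?.Count ?? 0;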

How to query multiple aggregates efficiently with DDD?

When I need to invoke some business method, I need to get all aggregate roots related to the operation, even if the operation is as primitive as the one given below (just adding an item into a collection). What am I missing? Or is a CRUD-based approach, where you run one single query that includes the table joins, selects and the insert at the end, and the database engine does all the work for you, actually better in terms of performance?
In the code below I need to query a separate aggregate root (which creates another database connection and sends another select query). In real-world applications I have been querying a lot more than one single aggregate, up to 8 for a single business action. How can I improve performance/query overhead?
Domain aggregate roots:
class Device
{
    Set<ParameterId> parameters;

    void AddParameter(Parameter parameter)
    {
        parameters.Add(parameter.Id);
    }
}

class Parameter
{
    ParameterId Id { get; }
}
Application layer:
class DeviceApplication
{
    private DeviceRepository _deviceRepo;
    private ParameterRepository _parameterRepo;

    void AddParameterToDevice(string deviceId, string parameterId)
    {
        var aParameterId = new ParameterId(parameterId);
        var aDeviceId = new DeviceId(deviceId);

        var parameter = _parameterRepo.FindById(aParameterId);
        if (parameter == null) throw new InvalidOperationException("Parameter not found");

        var device = _deviceRepo.FindById(aDeviceId);
        if (device == null) throw new InvalidOperationException("Device not found");

        device.AddParameter(parameter);
        _deviceRepo.Save(device);
    }
}
Possible solution
I've been told that you can pass just an Id of another aggregate like this:
class Device
{
    void AddParameter(ParameterId parameterId)
    {
        parameters.Add(parameterId);
    }
}
But IMO it breaks encapsulation (by explicitly pushing the term ID into the business language), and it also doesn't prevent passing a wrong or otherwise invalid identity (created by the user).
And Vaughn Vernon gives examples of application services that use the first approach (passing the whole aggregate instance).
The short answer is - don't query your aggregates at all.
An aggregate is a model that exposes behaviour, not data. Generally, it is considered a code smell to have getters on aggregates (ID is the exception). This makes querying a little tricky.
Broadly speaking there are 2 related ways to go about solving this. There are probably more but at least these don't break the encapsulation.
Option 1: Use domain events -
By getting your domain (aggregate roots) to emit events which illustrate the changes to internal state you can build up tables in your database specifically designed for querying. Done right you will have highly performant, denormalised queryable data, which can be linearly scaled if necessary. This makes for very simple queries. I have an example of this on this blog post: How to Build a Master-Details View when using CQRS and Event Sourcing
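To make option 1 a bit more concrete, here is a minimal C# sketch in terms of the Device/Parameter example from the question; the event, handler and IReadModelStore names are illustrative assumptions, not something taken from the linked article:
using System;
using System.Threading.Tasks;

// Raised by the Device aggregate when a parameter is added.
public record ParameterAddedToDevice(Guid DeviceId, Guid ParameterId, DateTime OccurredOn);

// A denormalised row shaped purely for querying; no joins needed at read time.
public class DeviceParameterRow
{
    public Guid DeviceId { get; set; }
    public Guid ParameterId { get; set; }
    public DateTime AddedOn { get; set; }
}

// Hypothetical persistence abstraction for the read side.
public interface IReadModelStore
{
    Task InsertAsync(DeviceParameterRow row);
}

// Subscribes to the domain event and keeps the query table up to date.
public class DeviceParameterProjection
{
    private readonly IReadModelStore _store;

    public DeviceParameterProjection(IReadModelStore store) => _store = store;

    public Task Handle(ParameterAddedToDevice e) =>
        _store.InsertAsync(new DeviceParameterRow
        {
            DeviceId = e.DeviceId,
            ParameterId = e.ParameterId,
            AddedOn = e.OccurredOn
        });
}
Queries then read DeviceParameterRow directly instead of loading aggregates.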
Option 2: Infer query tables -
I'm a huge fan of option 1 but if you don't have an event sourced approach you will still need to persist the state of your aggregates at some point. There are all sorts of ways to do this but you could plug into the persistence pipeline for your aggregates a process whereby you extract queryable data into a read model for use with your queries.
I hope that makes sense.
If you have figured out that an RDBMS query with joins would work in this case, you probably have the wrong aggregate boundaries.
For example, why would you need to load the Parameter in order to add it to the Device? You already have the identity of this Parameter; all you need to do is add this id to the list of referenced Parameters in the Device. If you do it only to satisfy your ORM, you're most probably doing something wrong.
Also remember that your aggregate is the transactional boundary. You really would want to complete all database operations inside one transaction and one connection.
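As a minimal sketch of that, the application-service method could add only the id and complete inside one transaction; the TransactionScope usage here is an assumption about the infrastructure (repositories enlisting in the ambient transaction), not something stated in the question:
using System.Transactions;

using (var scope = new TransactionScope())
{
    var device = _deviceRepo.FindById(aDeviceId);
    device.AddParameter(aParameterId); // only the id, no second aggregate loaded
    _deviceRepo.Save(device);

    // Everything commits or rolls back together.
    scope.Complete();
}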

NHibernate Filtering data best practices

I have the following situation:
A user logs in and opens an overview of all products; they can only see the products that match an added condition, and this condition is variable. Example: WHERE category IN ('catA', 'CatB')
An administrator logs in and opens an overview of all products; he can see all products, with no filter applied.
I need to make this as dynamic as possible. My data access classes use generics most of the time.
I've seen filters, but my conditions are very variable, so I don't see this as scalable enough.
We use NH filters for something similar, and it works fine. If no filter needs to be applied, you can omit setting any value for the filter. We use these filters for more basic stuff, filters that are applied nearly 100% of the time, e.g. deleted-objects filters, client data segregation, etc. I'm not sure what scalability aspect you're looking for.
For more high level and complex filtering, we use a custom class that manipulates a repository root. Something like the following:
public IQueryOver<TIn, TOut> Apply(IQueryOver<TIn, TOut> query)
{
    return query.Where(x => ... );
}
If you have an IoC container integrated with your NH usage, something like this can easily be generalized and plugged into your stack. We have these repository manipulators that add simple where clauses, others that generate complex where clauses that reference domain logic, and others that join a second table on and filter on that.
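Generalized through an IoC-friendly abstraction, it could look something like the sketch below; the interface and class names (IQueryManipulator, CategoryFilter, Product) are illustrative assumptions rather than the actual types we use:
using NHibernate;

public class Product
{
    public virtual string Category { get; set; }
}

public interface IQueryManipulator<TRoot>
{
    IQueryOver<TRoot, TRoot> Apply(IQueryOver<TRoot, TRoot> query);
}

// Restricts non-admin users to the categories they are allowed to see;
// an empty array means "no restriction" (e.g. for administrators).
public class CategoryFilter : IQueryManipulator<Product>
{
    private readonly string[] _allowedCategories;

    public CategoryFilter(string[] allowedCategories) => _allowedCategories = allowedCategories;

    public IQueryOver<Product, Product> Apply(IQueryOver<Product, Product> query) =>
        _allowedCategories == null || _allowedCategories.Length == 0
            ? query
            : query.WhereRestrictionOn(x => x.Category).IsIn(_allowedCategories);
}
The repository then resolves every registered IQueryManipulator<TRoot> from the container and folds them over the root query before calling .List().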
You could save all allowed categories in a category list and pass this list to the query. If the list is not null and contains elements, you can work with the following:
List<string> allowedCategoriesList = new List<string>();
allowedCategoriesList.Add(...);
...
.WhereRestrictionOn(x => x.category).IsIn(allowedCategoriesList)
It's important to skip this restriction when you do not have any filters (i.e. you want to see all entries without filtering), as you would otherwise not see a single result.
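In code, that guard can be as simple as only applying the restriction when the list is non-empty; a short sketch, assuming a QueryOver query over a Product entity with a category property as in the snippet above:
var query = session.QueryOver<Product>();

// No allowed-categories list (e.g. administrators): no restriction is added at all.
if (allowedCategoriesList != null && allowedCategoriesList.Count > 0)
{
    query = query.WhereRestrictionOn(x => x.category).IsIn(allowedCategoriesList);
}

var products = query.List();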

Searching Authorize.net CIM Records

Has anyone come up with an elegant way to search data stored on Authorize.net's Customer Information Manager (CIM)?
Based on their XML Guide there doesn't appear to be any search capability at all. That's a huge shortcoming.
As I understand it, the selling point for CIM is that the merchant doesn't need to store any customer information. They merely store a unique identifier for each and retrieve the data as needed. This may be great from a PCI Compliance perspective, but it's horrible from a flexibility standpoint.
A simple search like "Show me all orders from Texas" suddenly becomes very complicated.
How are the rest of you handling this problem?
The short answer is, you're correct: There is no API support for searching CIM records. And due to the way it is structured, there is no easy way to use CIM alone for searching all records.
To search them in the manner you describe:
Use getCustomerProfileIdsRequest to get all the customer profile IDs you have stored.
For each of the CustomerProfileIds returned by that request, use getCustomerProfileRequest to get the specific record for that client.
Examine each record at that time, looking for the criterion you want, storing the pertinent records in some other structure; a class, a multi-dimensional array, an ADO DataTable, whatever.
Yes, that's onerous. But it is literally the only way to proceed.
The previously mentioned reporting API applies only to transactions, not the Customer Information Manager.
Note that you can collect the kind of data you want at the time of recording a transaction, and as long as you don't make it personally identifiable, you can store it locally.
For example, you could run a request for all your CIM customer profile records, and store the state each customer is from in a local database.
If all you store is the state, then you can work with those records, because nothing ties the state to a specific customer record. Going forward, you could write logic to update the local state record store at the same time customer profile records are created / updated, too.
I realize this probably isn't what you wanted to hear, but them's the breaks.
This is likely to be VERY slow and inefficient, but here is one method: request an array of all the customer IDs, and then check each one for the field you want... in my case I wanted a search-by-email function in PHP:
$cimData = new AuthorizeNetCIM;
$profileIds = $cimData->getCustomerProfileIds();
$array = $profileIds->xpath('ids');
$authnet_cid = null;
/*
this seems ridiculously inefficient...
gotta be a better way to lookup a customer based on email
*/
foreach ( $array[0]->numericString as $ids ) { // loop over all the profile ids
    $response = $cimData->getCustomerProfile($ids); // fetch an individual profile to check for a match
    // put the kettle on
    if ($response->xml->profile->email == $email) {
        $authnet_cid = $ids;
        $oldCustomerProfile = $response->xml->profile;
    }
}
// now that the tea is ready, cream, sugar, biscuits, you might have your search result!
CIM's primary purpose is to take PCI compliance issues out of your hands by allowing you to store customer data, including credit cards, on their server and then access them using only a unique ID. If you want to do reporting you will need to keep track of that kind of information yourself. Since there are no PCI compliance issues with storing customer addresses, etc., it's realistic to do this yourself. Basically, this is the kind of stuff that needs to get fleshed out during the design phase of the project.
They do have a new reporting API which may offer you this functionality. If it does not, it's very possible it will be offered in the near future, as Authnet is currently actively rolling out lots of new features to their APIs.