DDD: Do item counts belong in domain model?

Say you are modeling a forum and you are doing your best to make use of DDD and CQRS (just the separate read model part). You have:
Category {
    int id;
    string name;
}
Post {
    int id;
    int categoryId;
    string content;
}
Every time a new post is created, a PostCreated domain event is raised.
Now, our view wants to project the count of posts for each category. My domain doesn't care about the count. I think I have two options:
Listen for PostCreated on the read-model side and increment the count using something like CategoryQueryHandler.incrementCount(categoryId).
Listen for PostCreated on the domain side and increment the count using something like CategoryRepo.incrementCount(categoryId).
The same question goes for all the other counts, like the number of posts by a user, the number of comments on a post, etc. If I don't use these counts anywhere except my views, should I just have my query handlers take care of persisting them?
And finally, if one of my domain services ever wants the count of posts in a category, do I have to implement a count property on the Category domain model, or can that service simply use a read-model query to get the count, or alternatively a repository query such as CategoryRepo.getPostCount(categoryId)?

My domain doesn't care about count.
This is equivalent to saying that you don't have any invariant that requires or manages the count, which means there isn't an aggregate where the count makes sense, so the count shouldn't be in your domain model.
Implement it as a count of PostCreated events, as you suggest, or by running a query against the Post store, or... whatever works for you.
If I don't use these counts anywhere except my views should I just have my query handlers take care of persisting them?
That, or anything else in the read model -- but you don't even need that much if your read model supports something like select categoryId, count(*) from posts...
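If you do decide to materialize the count, the read-model handler from your first option might look roughly like the sketch below. This assumes plain ADO.NET, an illustrative CategoryCounts table, and a PostCreated event that carries the category id - all assumptions, not part of the question.

using System.Data.SqlClient;

// Hypothetical read-model handler: maintains a denormalized per-category
// counter. Assumes a CategoryCounts row is created when the category is.
public class CategoryCountProjection
{
    private readonly string _connectionString;

    public CategoryCountProjection(string connectionString) =>
        _connectionString = connectionString;

    public void Handle(PostCreated @event)
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "UPDATE CategoryCounts SET PostCount = PostCount + 1 WHERE CategoryId = @id",
            connection))
        {
            command.Parameters.AddWithValue("@id", @event.CategoryId);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}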
domain services will ever want to have a count of posts in category
That's a pretty strange thing for a domain service to want to do. Domain services are generally stateless query support - typically they are used by an aggregate to answer some question during command processing. They don't actually enforce any business invariant themselves, they just support an aggregate in doing so.
Querying the read model for counts to be used by the write model doesn't make sense, on two levels. First, the data in the read model is stale - any answer you get from that query can change between the moment you complete the query and the moment you attempt to commit the current transaction. Second, once you've determined that stale data is useful, there's no particular reason to prefer stale data observed during the transaction to stale data observed before it. Which is to say, if the data is stale anyway, you might as well pass it to the aggregate as a command argument rather than hiding it in a domain service.
OTOH, if your domain needs it -- if there is some business invariant that constrains the count, or one that uses the count to constrain something else -- then that invariant needs to be captured in some aggregate that controls the count state.
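For contrast, a sketch of what that looks like when the domain does care -- here with an invented invariant ("a category holds at most maxPosts posts") purely for illustration:

using System;

// Hypothetical: because an invariant constrains the count, the count is
// state that the aggregate itself controls.
public class CategoryAggregate
{
    private readonly int _maxPosts;
    private int _postCount;

    public CategoryAggregate(int maxPosts) => _maxPosts = maxPosts;

    public void AddPost(string content)
    {
        if (_postCount >= _maxPosts)
            throw new InvalidOperationException("Category post limit reached.");
        _postCount++;
        // ... record the post and raise PostCreated here
    }
}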
Edit
Consider two transactions running concurrently. In transaction A, Aggregate id:1 running a command that requires the count of objects, but the aggregate doesn't control that count. In transaction B, Aggregate id:2 is being created, which changes the count.
Simple case: the two transactions happen, by luck, to occur in contiguous blocks
A: beginTransaction
A: aggregate(id:1).validate(repository.readCount())
A: repository.save(aggregate(id:1))
A: commit
// aggregate(id:1) is currently valid
B: beginTransaction
B: aggregate(id:2) = aggregate.new
B: repository.save(aggregate(id:2))
B: commit
// Is aggregate(id:1) still in a valid state?
I submit that, if aggregate(id:1) is still in a valid state, then its validity doesn't depend on the timeliness of repository.readCount() -- using the count from before the beginning of the transaction would have been just as good.
If aggregate(id:1) is not in a valid state, then its validity depends on data outside its own boundary, which means that the domain model is wrong.
In the more complicated case, the two transactions can be running concurrently, which means that we might see the save of aggregate(id:2) happen between the read of the count and the save of aggregate(id:1), like so
A: beginTransaction
A: aggregate(id:1).validate(repository.readCount())
// aggregate(id:1) is valid
B: beginTransaction
B: aggregate(id:2) = aggregate.new
B: repository.save(aggregate(id:2))
B: commit
A: repository.save(aggregate(id:1))
A: commit
It may be useful to consider also why having a single aggregate that controls the state fixes the problem. Let's change this example up, so that we have a single aggregate with two entities:
A: beginTransaction
A: aggregate(version:0).entity(id:1).validate(aggregate(version:0).readCount())
// entity(id:1) is valid
B: beginTransaction
B: entity(id:2) = entity.new
B: aggregate(version:0).add(entity(id:2))
B: repository.save(aggregate(version:0))
B: commit
A: repository.save(aggregate(version:0))
A: commit
// throws VersionConflictException
Edit
The notion that the commit (or the save, if you prefer) can throw is an important one. It highlights that the model is a separate entity from the system of record. In the easy cases, the model prevents invalid writes and the system of record prevents conflicting writes.
The pragmatic answer may be to allow this distinction to blur. Trying to apply a constraint to the count is an example of Set Validation. The domain model is going to have trouble with that unless a representation of the set lies within an aggregate boundary. But relational databases tend to be good at sets - if your system of record happens to be a relational store, you may be able to maintain the integrity of the set by using database constraints/triggers.
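For instance, the classic set-validation case of unique names can be delegated to a relational store. A minimal EF Core sketch, assuming the Category entity from the question (with a Name property); the unique-name rule itself is invented for illustration:

using Microsoft.EntityFrameworkCore;

// Illustrative only: a set invariant (unique category names) guarded by
// the relational store via a unique index, not by the domain model.
public class ForumContext : DbContext
{
    public DbSet<Category> Categories { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Category>()
            .HasIndex(c => c.Name)
            .IsUnique();
    }
}

On a conflicting insert, SaveChanges throws a DbUpdateException: as above, the system of record, not the model, rejects the conflicting write.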
Greg Young on Set Validation and Eventual Consistency
How you approach any problem like this should be based on an understanding of the business impact of the particular failure. Mitigation, rather than prevention, may be more appropriate.

When it comes to counts of things, I think one has to consider whether you actually need to save the count to the DB at all.
In my view, in most cases you do not need to save counts unless their calculation is very expensive. So I would not have a CategoryQueryHandler.incrementCount or a CategoryRepo.incrementCount.
I would just have a PostService.getPostCount(categoryId) that runs a query like
SELECT COUNT(*)
FROM Post
WHERE CategoryId = @categoryId
and then call it when your PostCreated event fires.
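A sketch of that service method using plain ADO.NET; the class name comes from the answer, while the connection handling is an assumption:

using System.Data.SqlClient;

public class PostService
{
    private readonly string _connectionString;

    public PostService(string connectionString) =>
        _connectionString = connectionString;

    // Computes the count on demand instead of persisting it anywhere.
    public int GetPostCount(int categoryId)
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "SELECT COUNT(*) FROM Post WHERE CategoryId = @categoryId", connection))
        {
            command.Parameters.AddWithValue("@categoryId", categoryId);
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}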

Related

How to express pagination in attribute based access control?

Based on my coarse reading, ABAC, i.e. attribute-based access control, boils down to attaching attributes to subjects, resources, and other related entities (such as actions to be performed on the resources), and then evaluating a set of boolean-valued functions to grant or deny access.
To be concrete, let's consider XACML.
This is fine when the resource to be accessed is known before the request hits the decision engine (the PDP, in the case of XACML), e.g. viewing the mobile number of some account, in which case the attributes of the resource to be accessed can probably be retrieved easily with a single SELECT statement.
However, consider the function of listing one's bank account transaction history, 10 entries per page. Let's assume that only the account owner can view this history, and that transactions are stored in the database in a table transaction like:
transaction_id, from_account_id, to_account_id, amount, time_of_transaction
This function, without access control, is usually written with SQL like this:
select to_account_id, amount, time_of_transaction
from transaction
where from_account_id = $current_user_account_id
The question: how can one express this in XACML? Obviously, the following approach is not practical for performance reasons:
Attach the from_account_id attribute to each transaction in the transaction table
Attach the account_id attribute to the request (to list the transaction history)
The decision rule, R, is: if from_account_id == account_id then grant else deny
The decision engine loops over the transaction table, evaluates each row according to R, emits the row if access is granted, and stops once 10 rows have been emitted.
I assume that there will be some preprocessing step to fetch the transactions first (without consulting the decision engine), and that the decision engine is then consulted for each fetched transaction to see whether the user has access?
What you are referring to is known as 'open-ended' or data-centric authorization, i.e. access control on an unknown (or very large) number of items, such as a bank account's transaction history. Typically, ABAC (and XACML or ALFA) has a decision model that is transactional (i.e. can Alice view record #123?).
It's worth noting that the policy in XACML/ALFA doesn't change in either scenario. You'd still write something along the lines of:
A user can view a transaction history item if the owner is XXX and the date is less than YYY...
What you need to consider is how to ask the question (the request that goes from the PEP to the PDP). There are two ways to do this:
Use the Multiple Decision Profile to bundle your request e.g. Can Alice view items #1, #2, #3...
Use an open-ended request. This is known as partial evaluation or reverse querying. Axiomatics has a product (ARQ) that addresses this use case.
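To make the contrast concrete, here is a hypothetical C# sketch; IPdpClient and both of its methods are invented for illustration and do not correspond to any real XACML SDK:

using System.Collections.Generic;

// Invented interface, for illustration only.
public interface IPdpClient
{
    // Style 1, Multiple Decision Profile: one bundled request,
    // one permit/deny verdict per item.
    IDictionary<long, bool> CanView(string userId, IEnumerable<long> transactionIds);

    // Style 2, reverse querying / partial evaluation: the PDP returns a
    // residual condition that the application turns into a SQL predicate.
    string GetFilterCondition(string userId, string action, string resourceType);
}

public class TransactionHistoryReader
{
    private readonly IPdpClient _pdp;

    public TransactionHistoryReader(IPdpClient pdp) => _pdp = pdp;

    public string BuildPageQuery(string userId)
    {
        // The PDP might return e.g. "from_account_id = '12345'", obtained by
        // partially evaluating the policy against the user's attributes.
        string filter = _pdp.GetFilterCondition(userId, "view", "transaction");
        return "SELECT to_account_id, amount, time_of_transaction " +
               "FROM transaction WHERE " + filter + " " +
               "ORDER BY time_of_transaction DESC LIMIT 10"; // 10 entries per page
    }
}

With the second style the database does the filtering and paging, so only rows the user may see are ever fetched -- which is exactly what the naive row-by-row loop above cannot do efficiently.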
I actually wrote about a similar use case in this SO post.
HTH,
David

Repository Pattern Dilemma: Redundant Queries vs. Database Round Trips

This is the situation:
Say I have an application in which two entity types exist:
Company
Person
Moreover, Person has a reference to Company via Person.employer, which denotes the company a person is employed at.
In my application I am using repositories to separate the database operations from my business-model related services: I have a PersonRepository.findOne(id) method to retrieve a Person entity and a CompanyRepository.findOne(id) method to retrieve a Company. So far so good.
This is the dilemma:
Now if I make a call to PersonRepository.findOne(id) to fetch a Person entity, I also need to have a fully resolved Company included inline via the Person.employer property – and this is where I am facing the dilemma of having two implementation options that are both suboptimal:
Option A) Redundant queries throughout my repositories but less database round trips:
Within the PersonRepository I can build a query which selects the person and also selects the company in a single query – however, the select expression for the company is complex and includes some joins in order to assemble the company correctly. The CompanyRepository already contains this logic to select the company, and rewriting it in the PersonRepository is redundant. Hence, ideally I want only the CompanyRepository to take care of the company-selection logic, to avoid coding the same query expression redundantly in two repositories.
Option B) Separation of concerns without query-code redundancy, but at the price of additional DB round trips and repo dependencies:
Within the PersonRepository I could reference the CompanyRepository to take care of fetching the Company object, and then I would assign this entity to the Person.employer property in the PersonRepository. This way, the logic to query the company stays encapsulated inside the CompanyRepository, achieving a clean separation of concerns. The downside is that I make additional round trips to the database, as two separate queries are executed by two repositories.
So generally speaking, what is the preferred way to deal with this dilemma?
Also, what is the preferred way to handle this situation in ASP.NET Core and EF Core?
Edit: To avoid opinion based answers I want to stress: I am not looking for a pros and cons of the two options presented above but rather striving for a solution that integrates the good parts of both options – because maybe I am just on the wrong track here with my two listed options. I am also fine with an answer that explains why there is no such integrative solution, so I can sleep better and move on.
In order to retrieve the company, you first need to read the Person's data and fetch the company ID from it. Hence, if you would like to keep the company-querying logic in a single place, you end up with two round trips - one to get the company ID (along with whatever other attributes a Person has) and one more to get the company itself.
You could reuse the code that makes a Company from a DbDataReader, but the person+company query would presumably require a join to "forward" the person's companyId to the company query, so the text of these queries would have to differ.
You could have it both ways (one round trip, no repeated queries) if you move the querying logic into stored procedures. This way your person_sp would execute company_sp and return all the relevant data. If necessary, your C# code can harvest the multi-part result set using reader.NextResult(). Now the "hand-off" of the company ID happens on the RDBMS side, eliminating the second round trip. However, this approach requires maintaining stored procedures on the RDBMS side, effectively shipping some repository logic out of your C# code base.
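A sketch of the C# side of that approach; person_sp, its parameter, and the MapPerson/MapCompany helpers are assumptions standing in for your existing mapping code:

using System.Data;
using System.Data.SqlClient;

public class PersonRepository
{
    private readonly string _connectionString;

    public PersonRepository(string connectionString) =>
        _connectionString = connectionString;

    public Person FindOne(int personId)
    {
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand("person_sp", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@personId", personId);
            connection.Open();

            using (var reader = command.ExecuteReader())
            {
                reader.Read();
                var person = MapPerson(reader);        // your existing person mapping

                reader.NextResult();                   // advance to the company result set
                reader.Read();
                person.Employer = MapCompany(reader);  // reuse the CompanyRepository's mapping

                return person;
            }
        }
    }

    // Stand-ins for mapping code you already have elsewhere.
    private static Person MapPerson(IDataRecord record) { /* ... */ return new Person(); }
    private static Company MapCompany(IDataRecord record) { /* ... */ return new Company(); }
}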

EF: Inserting already present record in many to many relationship

From what I've searched, there are two ways to insert an already-present record into an ICollection list:
group.Users.Add(db.Users.FirstOrDefault(x => x.Id == 1));
var to_add = new User { Id = 1 }; db.Users.Attach(to_add); group.Users.Add(to_add);
The problem with both of the above approaches is that they make a DB call every time we want to add a record, while we already know the user's ID and the group's ID - and that's all that is needed to create the relationship.
Imagine a long list to be added; both of the above methods would make multiple calls to the DB.
So you have Groups and Users. Every Group has zero or more Users; every User has zero or more Groups. A traditional many-to-many relationship.
Normally one would add a User to a Group, or a Group to a User. However, you don't have a Group or a User; you only have a GroupId and a UserId, and because of the large number of insertions you don't want to fetch the Users and Groups for which you want to create relations.
The problem is, if you could add the GroupId-UserId combination directly to your junction table, how would you know that you weren't adding a Group-User relation that already exists? If you didn't care, you'd end up with the relation twice. This would lead to problems: would you want a User shown twice when you ask for the Users of a Group? Which relation should be removed when it ends, or should they all be removed?
If you really want to implement the possibility of double relations, then you'd need to implement a custom junction table as described here. The extra field would be the number of relations.
This would not help you with your large batch, because you would still need to fetch the field from the custom junction table to increment the NrOfRelations value.
On the other hand, if you don't want double relations, you'd have to check whether the combination already exists - and you didn't want to fetch data before inserting.
Usually the number of additions to a database is far less than the number of queries. If you have a large batch of data to be inserted, it is usually only during the initialization phase of the database. I wouldn't bother optimizing initialization too much.
Consider remembering already-fetched Groups and Users in a dictionary, preventing them from being fetched twice. However, if your list is really huge, this is not a practical solution.
If you really need this functionality for a prolonged period of time, consider creating a stored procedure that checks whether the GroupId/UserId combination already exists in the junction table, and if not, adds it.
See here for SQL code on how to do an add-or-update:
Entity Framework call stored procedure
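If you go that route, the call from EF6 is a one-liner per pair. A sketch, where AddUserToGroup is a hypothetical procedure assumed to insert the pair only if it is not already in the junction table (MyDbContext and pairsToAdd are illustrative names):

// Hypothetical: the duplicate check lives inside the stored procedure,
// next to the data, where it can be enforced atomically.
using (var db = new MyDbContext())
{
    foreach (var pair in pairsToAdd)   // e.g. a list of (GroupId, UserId) values
    {
        db.Database.ExecuteSqlCommand(
            "EXEC AddUserToGroup @p0, @p1", pair.GroupId, pair.UserId);
    }
}

No Group or User entity is fetched; each pair costs one round trip, which a table-valued parameter could batch further if needed.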

LogiQL: Use of Transaction ID and Unique Identifiers

I am just getting started with LogiQL and LogicBlox. While looking at the different operators, I came across the transaction ID and the unique identifiers. What are the use cases for these two operators?
These two operators are not very commonly used. Because the identifiers for both the uid p2p and transaction:id are only unique within the lifetime of a database, you can't use them as UUIDs (we'll add that functionality at some point). If you need to export the data and re-import it into a different workspace, you will eventually get conflicts.
The transaction identifier can be helpful for debugging issues. For example, in LogiQL you can write a delta rule that records a log of changes to a predicate. Instead of using datetime:now (whose resolution might not be sufficient), you can use the transaction ID to keep a log of changes per transaction.

Is it acceptable to have multiple aggregation that can theoretically be inconsistent?

I have a question about the modelling of classes and the underlying database design.
Simply put, the situation is as follows: at the moment we have Positions and Accounts objects and tables and the relationship between them is that a Position 'has an' Account (an Account can have multiple Positions). This is simple aggregation and is handled in the DB by the Position table holding an Account ID as a foreign key.
We now need to extend this 'downwards' with Trades and Portfolios. One or more Trades make up a Position (but a Trade is not a Position in itself) and one or more Portfolios make up an Account (but a Portfolio is not an Account in itself). Trades are associated with Portfolios just like Positions are associated with Accounts ('has a'). Note that it is still possible to have a Position without Trades and an Account without Portfolios (i.e. it is not mandatory to have all the existing objects broken down in subcomponents).
My first idea was to go simply for the following (the first two classes already exist):
class Account;
class Position {
    Account account;
}
class Portfolio {
    Account account;
}
class Trade {
    Position position;
    Portfolio portfolio;
}
I think the (potential) problem is clear: starting from a Trade, you might end up in different Accounts depending on whether you take the Position route or the Portfolio route. Of course this is never supposed to happen, and the code that creates and stores the objects should never be able to create such an inconsistency. I wonder, though, whether the fact that it is theoretically possible to have an inconsistent database implies a flawed design?
Looking forward to your feedback.
The design is not flawed just because there are two ways to get from class A to class D, one via B and one via C. Such "squares" appear often in OOP class models, sometimes not so obviously, especially when more classes lie on the paths. But as Dan mentioned, it is always the business semantics that determine whether such a square must commute (in the mathematical sense).
Personally, I draw an = sign inside such a square in the UML diagram to indicate that it must commute. I also note the precise formula in a UML comment; in my example it would be
For every object a of class A: a.B.D = a.C.D
If such a predicate holds, then you have basically two options:
Trust all programmers not to break the rule in any code, since it is very well documented.
Implement some error handling (as Dan and algirdas mentioned) or, if you don't want to have such code in your model, create a Checker controller which checks all conditions in a given model instance, as sketched below.
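A minimal sketch of such a checker, written in C# for concreteness; the class shapes mirror the example above, and the equality test is an assumption about how Accounts are compared:

using System.Collections.Generic;

// Verifies that the Trade "square" commutes: both routes from a Trade
// (via Position and via Portfolio) must reach the same Account.
public static class ConsistencyChecker
{
    public static IEnumerable<string> Check(IEnumerable<Trade> trades)
    {
        foreach (var trade in trades)
        {
            if (!Equals(trade.Position.Account, trade.Portfolio.Account))
                yield return "Trade reaches different Accounts via its " +
                             "Position and via its Portfolio.";
        }
    }
}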