Can I Read/Write from separate actors with same PersistenceId? - akka.net

The Petabridge blog's Akka.Persistence intro makes it clear that you can't have multiple actors with the same PersistenceId:
The PersistenceId field is important - it uniquely identifies an entity that is persisting its state using Akka.Persistence, and there should be exactly one persistent actor at any given time for a single PersistenceId.
[...] so imagine if you have two actors with the same PersistenceId but different sequence numbers writing to the same store. It will be chaos and will inevitably error out - so that’s why it’s crucial that every PersistenceId be globally unique within your ActorSystem (at least for all actors writing to that store.)
I can think of a scenario where you would have two separate actors: one that takes care of saving persistence state to database (i.e. calls Persist()), and another one that replays messages from the journal when manually requested to do so (i.e. calls Recover()). The read and write operations would occur from different actors. Only one ever writes, and only one ever reads. However, both need the same PersistenceId.
I believe that in this scenario it should be safe to have two actors using the same PersistenceId. But given the warnings quoted above, is there any reason why such an approach could be dangerous in practice?

I can think of a scenario where you would have two separate actors:
one that takes care of saving persistence state to database (i.e.
calls Persist()), and another one that replays messages from the
journal when manually requested to do so (i.e. calls Recover()). The
read and write operations would occur from different actors. Only one
ever writes, and only one ever reads. However, both need the same
PersistenceId.
The behaviour you require is already exposed as Persistent Actors and Persistent Views. From the docs:
While a persistent actor may be used to produce and persist events,
views are used only to read internal state based on them. Like the
persistent actor, a view has a PersistenceId to specify a collection
of events to be resent to current view. This value should however be
correlated with the PersistentId of an actor who is the producer of
the events.
Edit: updated to provide more info on how to access events in the Persistent View.
You can load from a journal by overriding the Receive method of a Persistent View. The argument for this method is an object, so you'll need to cast that object to whatever event(s) you have persisted via the Persistent Actor.
The Receive method also handles any other messages you pass to the View - e.g. a read request from the presentation layer. I usually store a list of events internally in the View and return a custom view model from these.
protected override bool Receive(object message)
{
    // If the message is a previously persisted event, update our internal list.
    var e = message as MyEvent;
    if (e != null)
    {
        _events.Add(e);
        return true;
    }

    // If the message is a request for a view model, reply with a model built
    // from the stored events.
    var r = message as ReadRequest;
    if (r == null) return false;
    Sender.Tell(new ViewModel(_events));
    return true;
}
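To tie this together, here is a minimal sketch of how such a view might be declared (the class name and id strings are illustrative, and MyEvent, ReadRequest and ViewModel are the same assumed types as above). The view's PersistenceId must match the writer's, while ViewId identifies the view itself:

using System.Collections.Generic;
using Akka.Persistence;

// Hypothetical view over another actor's journal; the id values are invented.
public class MyEventView : PersistentView
{
    private readonly List<MyEvent> _events = new List<MyEvent>();

    // Must match the PersistenceId of the actor that writes the events.
    public override string PersistenceId => "my-entity-1";

    // Identifies this view itself (e.g. for its own snapshot stream).
    public override string ViewId => "my-entity-1-view";

    protected override bool Receive(object message)
    {
        var e = message as MyEvent;
        if (e != null)
        {
            _events.Add(e);
            return true;
        }

        var r = message as ReadRequest;
        if (r == null) return false;
        Sender.Tell(new ViewModel(_events));
        return true;
    }
}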

Related

DDD: Do item counts belong in domain model?

Say you are modeling a forum and you are doing your best to make use of DDD and CQRS (just the separate read model part). You have:
Category {
    int id;
    string name;
}

Post {
    int id;
    int categoryId;
    string content;
}
Every time a new post is created, a domain event PostCreated is raised.
Now, our view wants to project the count of posts for each category. My domain doesn't care about count. I think I have two options:
Listen for PostCreated on the read-model side and increment the count using something like CategoryQueryHandler.incrementCount(categoryId).
Listen for PostCreated on the domain side and increment the count using something like CategoryRepo.incrementCount(categoryId).
The same question goes for all the other counts, like the number of posts by user, the number of comments in a post, etc. If I don't use these counts anywhere except my views, should I just have my query handlers take care of persisting them?
And finally, if one of my domain services ever wants a count of posts in a category, do I have to implement a count property on the Category domain model, or can that service simply use a read-model query to get that count, or alternatively a repository query such as CategoryRepo.getPostCount(categoryId)?
My domain doesn't care about count.
This is equivalent to saying that you don't have any invariant that requires or manages the count. Which means that there isn't an aggregate where count makes sense, so the count shouldn't be in your domain model.
Implement it as a count of PostCreated events, as you suggest, or by running a query against the Post store, or... whatever works for you.
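As a rough sketch of the first option (all names here are invented for illustration, and PostCreated is assumed to expose the CategoryId), a read-side handler can simply fold PostCreated events into a counter:

using System.Collections.Generic;

// Hypothetical read-side projection; deliberately not part of the domain model.
public class CategoryCountsProjection
{
    private readonly Dictionary<int, int> _postCounts = new Dictionary<int, int>();

    // Invoked by whatever dispatches domain events to the read side.
    public void Handle(PostCreated e)
    {
        _postCounts.TryGetValue(e.CategoryId, out var count);
        _postCounts[e.CategoryId] = count + 1;
    }

    public int GetPostCount(int categoryId) =>
        _postCounts.TryGetValue(categoryId, out var count) ? count : 0;
}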
If I don't use these counts anywhere except my views should I just have my query handlers take care of persisting them?
That, or anything else in the read model -- but you don't even need that much if your read model supports something like select categoryId, count(*) from posts...
domain services will ever want to have a count of posts in category
That's a pretty strange thing for a domain service to want to do. Domain services are generally stateless query support - typically they are used by an aggregate to answer some question during command processing. They don't actually enforce any business invariant themselves, they just support an aggregate in doing so.
Querying the read model for counts to be used by the write model doesn't make sense, on two levels. First, that the data in the read model is stale - any answer you get from that query can change between the moment that you complete the query and the moment when you attempt to commit the current transaction. Second, once you've determined that stale data is useful, there's no particular reason to prefer the stale data observed during the transaction to stale data prior. Which is to say, if the data is stale anyway, you might as well pass it to the aggregate as a command argument, rather than hiding it in a domain service.
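To make that last point concrete, a hedged sketch (the command shape and the limit rule are invented): the caller observes the admittedly stale count up front and hands it to the aggregate as a command argument, instead of the aggregate pulling it from a domain service mid-transaction:

using System;

// Hypothetical command carrying the externally observed (stale) count.
public class CreatePost
{
    public int CategoryId { get; set; }
    public string Content { get; set; }
    public int ObservedPostCount { get; set; }
}

public class Category
{
    // The aggregate validates against the count it was handed;
    // it never reaches outside its own boundary to fetch one.
    public void Handle(CreatePost cmd)
    {
        if (cmd.ObservedPostCount >= 10000)
            throw new InvalidOperationException("Category is full (as observed).");
        // ... raise PostCreated ...
    }
}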
OTOH, if your domain needs it -- if there is some business invariant that constrains the count, or one that uses the count to constrain something else -- then that invariant needs to be captured in some aggregate that controls the count state.
Edit
Consider two transactions running concurrently. In transaction A, aggregate(id:1) is running a command that requires the count of objects, but the aggregate doesn't control that count. In transaction B, aggregate(id:2) is being created, which changes the count.
In the simple case, the two transactions happen by luck to occur in contiguous blocks:
A: beginTransaction
A: aggregate(id:1).validate(repository.readCount())
A: repository.save(aggregate(id:1))
A: commit
// aggregate(id:1) is currently valid
B: beginTransaction
B: aggregate(id:2) = aggregate.new
B: repository.save(aggregate(id:2))
B: commit
// Is aggregate(id:1) still in a valid state?
I submit that, if aggregate(id:1) is still in a valid state, then its validity doesn't depend on the timeliness of repository.readCount() -- using the count prior to the beginning of the transaction would have been just as good.
If aggregate(id:1) is not in a valid state, then its validity depends on data outside its own boundary, which means that the domain model is wrong.
In the more complicated case, the two transactions can be running concurrently, which means that we might see the save of aggregate(id:2) happen between the read of the count and the save of aggregate(id:1), like so
A: beginTransaction
A: aggregate(id:1).validate(repository.readCount())
// aggregate(id:1) is valid
B: beginTransaction
B: aggregate(id:2) = aggregate.new
B: repository.save(aggregate(id:2))
B: commit
A: repository.save(aggregate(id:1))
A: commit
// A validated against a count that B's commit has already made stale,
// yet nothing stops A's commit from succeeding.
It may be useful to consider also why having a single aggregate that controls the state fixes the problem. Let's change this example up, so that we have a single aggregate with two entities....
A: beginTransaction
A: aggregate(version:0).entity(id:1).validate(aggregate(version:0).readCount())
// entity(id:1) is valid
B: beginTransaction
B: entity(id:2) = entity.new
B: aggregate(version:0).add(entity(id:2))
B: repository.save(aggregate(version:0))
B: commit
A: repository.save(aggregate(version:0))
A: commit
// throws VersionConflictException
Edit
The notion that the commit (or the save, if you prefer) can throw is an important one. It highlights that the model is a separate entity from the system of record. In the easy cases, the model prevents invalid writes and the system of record prevents conflicting writes.
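A toy sketch of that separation (all names invented): the repository, standing in for the system of record, is the thing that rejects the conflicting write, not the model:

using System;
using System.Collections.Generic;

public class VersionConflictException : Exception { }

// In-memory stand-in for the system of record.
public class Repository<T>
{
    private readonly Dictionary<int, (int Version, T State)> _store =
        new Dictionary<int, (int Version, T State)>();

    public (int Version, T State) Load(int id) => _store[id];

    // Succeeds only if nobody else committed since expectedVersion was loaded.
    public void Save(int id, int expectedVersion, T state)
    {
        if (_store.TryGetValue(id, out var current) && current.Version != expectedVersion)
            throw new VersionConflictException();
        _store[id] = (expectedVersion + 1, state);
    }
}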
The pragmatic answer may be to allow this distinction to blur. Trying to apply a constraint to the count is an example of Set Validation. The domain model is going to have trouble with that unless a representation of the set lies within an aggregate boundary. But relational databases tend to be good at sets - if your system of record happens to be a relational store, you may be able to maintain the integrity of the set by using database constraints/triggers.
Greg Young on Set Validation and Eventual Consistency
How you approach any problem like this should be based on an understanding of the business impact of the particular failure. Mitigation, rather than prevention, may be more appropriate.
When it comes to counts of things, I think one has to consider whether you actually need to save the count to the DB at all.
In my view, in most cases you do not need to save counts unless their calculation is very expensive. So I would not have a CategoryQueryHandler.incrementCount or a CategoryRepo.incrementCount.
I would just have a PostService.getPostCount(categoryId) that runs a query like
SELECT COUNT(*)
FROM Post
WHERE CategoryId = @categoryId
and then call it when your PostCreated event fires.

Is it acceptable to have multiple aggregations that can theoretically be inconsistent?

I have a question about the modelling of classes and the underlying database design.
Simply put, the situation is as follows: at the moment we have Positions and Accounts objects and tables and the relationship between them is that a Position 'has an' Account (an Account can have multiple Positions). This is simple aggregation and is handled in the DB by the Position table holding an Account ID as a foreign key.
We now need to extend this 'downwards' with Trades and Portfolios. One or more Trades make up a Position (but a Trade is not a Position in itself) and one or more Portfolios make up an Account (but a Portfolio is not an Account in itself). Trades are associated with Portfolios just like Positions are associated with Accounts ('has a'). Note that it is still possible to have a Position without Trades and an Account without Portfolios (i.e. it is not mandatory to have all the existing objects broken down into subcomponents).
My first idea was to go simply for the following (the first two classes already exist):
class Account;

class Position {
    Account account;
}

class Portfolio {
    Account account;
}

class Trade {
    Position position;
    Portfolio portfolio;
}
I think the (potential) problem is clear: starting from a Trade, you might end up in different Accounts depending on whether you take the Position route or the Portfolio route. Of course this is never supposed to happen, and the code that creates and stores the objects should never be able to create such an inconsistency. I wonder, though, whether the fact that it is theoretically possible to have an inconsistent database implies a flawed design?
Looking forward to your feedback.
The design is not flawed just because there are two ways to get from class A to class D, one over B and one over C. Such "squares" appear often in OOP class models, sometimes not so obviously, especially if more classes lie along the paths. But as Dan mentioned, it is always the business semantics that determine whether such a square must commute (in the mathematical sense).
Personally, I draw an = sign inside such a square in the UML diagram to indicate that it must commute. I also note the precise formula in a UML comment; in my example it would be
For every object a of class A: a.B.D = a.C.D
or, in terms of the question's model: for every Trade t, t.position.account = t.portfolio.account.
If such a predicate holds, then you have basically two options:
Trust all programmers to not break the rule in any code, since it is very well documented
Implement some error handling (like Dan and algirdas mentioned) or, if you don't want such code in your model, create a Checker controller that checks all conditions in a given model instance.
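For the error-handling option, one possible sketch (the constructor-based check is my illustration, using the question's classes) is to enforce the commuting square whenever a Trade is built:

using System;

public class Account { }

public class Position
{
    public Account Account { get; set; }
}

public class Portfolio
{
    public Account Account { get; set; }
}

public class Trade
{
    public Position Position { get; }
    public Portfolio Portfolio { get; }

    public Trade(Position position, Portfolio portfolio)
    {
        // The commuting-square invariant: both routes must lead to the same Account.
        if (!ReferenceEquals(position.Account, portfolio.Account))
            throw new ArgumentException(
                "The Position and Portfolio of a Trade must belong to the same Account.");
        Position = position;
        Portfolio = portfolio;
    }
}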

Larman's System Operation Contracts - CRUD example

I have some confusion about applying Larman's system operation contracts (OO analysis from the book Applying UML and Patterns) to CRUD-like operations. More precisely, I'm confused by the postcondition part.
For example, if I have CRUD system operations looking as follows:
createEmployee(employee:Employee),
readEmployee(employeeId:int),
updateEmployee(employee:Employee),
deleteEmployee(employeeId:int)
what would be postcondition on, for example, readEmployee system operation, or some other operation like searchEmployees etc?
For example: for the read operation, the system needs to read a record from the database, instantiate a domain object, and set attribute values on the domain object (set relations too), and that's it. Does that mean the postconditions are the ones mentioned above (instance creation, changes to attributes, etc.)? Or does the read operation not have any postcondition? Neither option sounds logical to me.
My confusion is about the relation between domain model (state) and database (state). I just don't get what implications the above operations have on the domain model. I always think of the database as the place that preserves the state of the system. After I create an employee, the object's state will be persisted in the database... But what happens to the domain model state?
The post-condition defines what the state of your application (or object, depending on the level of abstraction) should be after the operation for it to be considered as successful. For the readEmployee operation, for example, the post-condition would be that:
a new Employee instance is created.
the Employee instance contains attributes matching the database values.
the database connection is closed.
I like to think of "pre-condition" and "post-condition" as the "state of mind" of your application before and after an operation has executed, respectively. As you can imagine, it's more a thought process than a coding exercise when you do DbC.
(If you do unit-testing, states make it clear what needs to be covered by your tests. Basically, you end up testing the "state of mind" of your application.)
Interestingly, if you consider the reverse of DbC, you realise that to identify what operations your application (or object) should expose, it is simply a matter of listing what states it can have and how it transitions between these states. The actions that you need to take to make these transitions then become your operations, and you do not have to bother with implementing operations that do not lead to any desired states. So, for example, you probably want the following states for your application.
Employee details added (S1)
Employee details loaded (S2)
Employee details updated (S3)
Employee details deleted (S4)
The following state transitions are possible.
S1 -> S3 (add new employee, update the details)
S1 -> S4 (add new employee, delete the employee)
S2 -> S3 (load employee details, update employee details)
S2 -> S4 (load employee details, delete employee)
S4 -> S1 (delete employee, add new employee)
S2 -> S1 (load employee details, add new employee)
S3 -> S1 (update employee details, add new employee)
S3 -> S2 (update employee details, load employee details)
Based on the above, you can write your operations in such a way that only valid transitions are allowed, with anything else giving rise to errors.
Impossible state transitions:
S4 -> S2 (cannot delete an employee, then load their details)
S4 -> S3 (cannot delete an employee, then update their details)
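As a sketch of how the transition lists above might be enforced (the enum and table encoding are my own illustration), every operation can consult a single lookup before acting:

using System;
using System.Collections.Generic;

public enum EmployeeState { Added, Loaded, Updated, Deleted }  // S1..S4

public static class EmployeeTransitions
{
    // The valid transitions listed above; anything absent (e.g. Deleted -> Loaded) is illegal.
    private static readonly HashSet<(EmployeeState, EmployeeState)> Allowed =
        new HashSet<(EmployeeState, EmployeeState)>
        {
            (EmployeeState.Added,   EmployeeState.Updated),  // S1 -> S3
            (EmployeeState.Added,   EmployeeState.Deleted),  // S1 -> S4
            (EmployeeState.Loaded,  EmployeeState.Updated),  // S2 -> S3
            (EmployeeState.Loaded,  EmployeeState.Deleted),  // S2 -> S4
            (EmployeeState.Deleted, EmployeeState.Added),    // S4 -> S1
            (EmployeeState.Loaded,  EmployeeState.Added),    // S2 -> S1
            (EmployeeState.Updated, EmployeeState.Added),    // S3 -> S1
            (EmployeeState.Updated, EmployeeState.Loaded),   // S3 -> S2
        };

    public static void Check(EmployeeState from, EmployeeState to)
    {
        if (!Allowed.Contains((from, to)))
            throw new InvalidOperationException($"Illegal transition: {from} -> {to}");
    }
}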
State modeling is probably the most important part of designing objects, so you're asking the right questions. If you want a good resource on state modeling, get Object Lifecycles: Modeling the World in States by Sally Shlaer and Stephen Mellor. It is quite an old book and costs almost nothing on Amazon, but the principles it introduces form the basis of modern UML -- incidentally, the notation used in the book looks nothing like UML.
I realise I did not touch on database state, but at the conceptual level, the database layer is just another system of states and the same principles apply.
I hope this was useful.
My interpretation of Larman's contracts is always with respect to the domain model. Larman clearly states there are only five types of postconditions:
Instance creation
Instance deletion
Attribute change of value
Associations formed
Associations broken
Therefore, a Read (or search) operation would have no post conditions, at least not on the elements that are being read or searched. For example, if 10,000 users performed reads/searches in one day, but never did any of the other operations (C, U, D), there would be no change to the objects in the domain.
There is an exception to this, however, in domains where searches/reads are remembered. For example, Google surely keeps track of searches. In this case, doing a search has the postcondition of creating a new object in their domain model, e.g., A Search instance s was created (instance creation).
The confusion comes from mentioning data-model relations within the contract template that Larman provided, as in:
Contract CO2: enterItem
Operation: enterItem(itemID : ItemID, quantity : integer)
...
sli was associated with a ProductSpecification, based on itemID match (association formed).
The detailed referential properties of the database should not be mentioned in the operation contract. It is better to leave it as: "sli was associated with a ProductSpecification".
In fact, this is one of the things that Larman's operation contracts do not address in much detail. Think about a contract for an operation that calculates the total number of items and returns that total: it seems it cannot be written as an operation contract, since none of the five postcondition types applies.

NHibernate HiLo ID Generator. Generating an ID before saving

I'm trying to use the 'adonet.batch_size' property in NHibernate. I'm creating entities across multiple sessions at a high rate (hence the batch inserting). So what I'm doing is creating a buffer where I keep these entities and then flush them out all at once periodically.
However, I need the IDs as soon as I create the entities. So I want to create an entity (in any session) and then have its ID generated (I'm using the HiLo generator). Then at a later time (and in another session) I want to flush that buffer and ensure that those IDs do not change.
Is there any way to do this?
Thanks
Guido
I find it odd that you need many sessions to do a single job. Normally a single session is enough to do all the work.
That said, the HiLo generator sets the id property on the entity when you call nhSession.Save(object), without necessarily requiring a round-trip to the database, and nhSession.Flush() will flush the pending inserts to the database.
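A minimal sketch of that behaviour, assuming a mapped entity MyEnt with a HiLo-generated Id property and an already-configured sessionFactory:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var entity = new MyEnt();
    session.Save(entity);   // HiLo assigns the id here, usually without hitting the DB
    var id = entity.Id;     // the id is already usable at this point
    // ... create and Save more entities ...
    tx.Commit();            // inserts are flushed, batched per adonet.batch_size
}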
UPDATE ===========================================================================
This is a method I used in a specific case that made pure-SQL inserts while maintaining NHibernate compatibility.
// This will get the next value and update the hi-lo value repository in the datastore.
public static void GenerateIdentifier(object target)
{
    var targetType = target.GetType();
    var classMapping = NHibernateSessionManager.Instance.Configuration.GetClassMapping(targetType);
    var impl = NHibernateSessionManager.Instance.GetSession().GetSessionImplementation();

    // Build the identifier generator configured in the mapping and ask it for a new id.
    var newId = classMapping.Identifier
        .CreateIdentifierGenerator(impl.Factory.Dialect, classMapping.Table.Catalog,
                                   classMapping.Table.Schema, classMapping.RootClazz)
        .Generate(impl, target);

    // Inject the generated id into the entity's identifier property.
    classMapping.IdentifierProperty.GetSetter(targetType).Set(target, newId);
}
So, this method takes your newly constructed entity like
var myEnt = new MyEnt(); //has default identifier
GenerateIdentifier(myEnt); //now has identifier injected based on nhibernate's mapping
Note that this call does not place the entity in any kind of NHibernate-managed space, so you still have to keep your objects somewhere and save each one yourself. Also note that I used this with pure SQL inserts; unless you specify generator="assigned" in your entity mapping (which will then require some custom hi-lo generator), NHibernate may require a different mechanism to persist it.
All in all, what you want is to generate an id for an object that will be persisted at some time in the future. This brings up some problems, such as handling non-existent entries due to rollbacks and failed commits. Additionally, IMO NHibernate is not the tool for this particular job; you don't need NHibernate to do your bulk insert unless there is some complex entity logic that is too costly (in dev time) to implement on your own.
Also note that you are implying that you need transient detached entities, which cannot be used unless you call nhSes.Save(obj) on the first session and flush its contents, so that when the second session calls Load on the transient object there is an existing row in the database; that contradicts what you want to achieve.
IMO, don't be afraid of storming the database; just optimise the procedure top-to-bottom to handle the volume. Using NHibernate just to do an insert seems counter-productive when you can achieve the same result with four times the performance using ADO.NET or even an ISQLQuery-wrapped query (and use the method I provided above).
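For illustration only, a bare ADO.NET insert loop along those lines might look like this (the connection string, table and columns are invented, and GenerateIdentifier is the method shown above):

using System.Collections.Generic;
using System.Data.SqlClient;

public static void BulkInsert(IEnumerable<MyEnt> entities, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            foreach (var entity in entities)
            {
                GenerateIdentifier(entity);  // assign the id up front, as discussed
                using (var cmd = new SqlCommand(
                    "INSERT INTO MyEnt (Id, Name) VALUES (@id, @name)", conn, tx))
                {
                    cmd.Parameters.AddWithValue("@id", entity.Id);
                    cmd.Parameters.AddWithValue("@name", entity.Name);
                    cmd.ExecuteNonQuery();
                }
            }
            tx.Commit();
        }
    }
}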

In which class "should" the responsiblity lie in fetching the data from the Data access layer (RDD)

Let's say I have a class representing a course. The course has its own attributes like subject name, description, start and end dates, and so on.
Then, the course has attributes like a list of participants. In the database this is obviously represented in two tables: a course table and a participant table in a one-to-many relationship.
My question is about how to set the list of participants in the course class: should it be the course class itself that fetches the data (through a data access layer, or a layer above), or should one delegate the fetching and setting of the participants to some kind of helper class, making the course class itself more or less a dumb object, only holding data?
RDD (Responsibility-Driven Design) tells us to make smart objects and abstract away the difference between data and behaviour. In this regard it sounds obvious that the course class should handle the fetching of the participants. Doing this, however, creates a direct dependency on a data access object (or a layer above), coupling it up more.
Any thoughts on this would be helpful.
Should it be the course class itself that is fetching the data
This is a pattern known as Active Record. Be aware that many feel that active record is an anti-pattern.
or should one delegate the fetching and the setting of the participants to some kind of helper class
This is a pattern known as Repository.
making the course class it's self more or less a dumb object, only holding data?
Removing the responsibility of saving and retrieving data from the entity doesn't make that entity a dumb object that only holds data. The entity can still hold domain logic, which is common practice in Domain-Driven Design. In DDD, entities that are simple data containers without behavior are said to form an anemic domain model.
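A small sketch of that distinction (the enrollment rule and names are invented): the entity keeps domain behaviour, while persistence sits behind a repository abstraction:

using System;
using System.Collections.Generic;

public class Participant { }

// Entity with domain behaviour but no data-access code.
public class Course
{
    private readonly List<Participant> _participants = new List<Participant>();

    public void Enroll(Participant participant)
    {
        // Invented business rule, purely to show logic living in the entity.
        if (_participants.Count >= 30)
            throw new InvalidOperationException("Course is full.");
        _participants.Add(participant);
    }
}

// Persistence is delegated instead of living inside the entity.
public interface ICourseRepository
{
    Course GetById(int id);
    void Save(Course course);
}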
The Course class shouldn't be responsible for fetching the participants; in fact, it shouldn't be responsible for fetching Courses either. You bring up the correct point about data access layers, but the course class itself should not work with this layer; it must just represent a course.
You create a class which has the responsibility of fetching the data, i.e. the data access layer. You can name it something like CourseDao, but the important part is that it simply gets the data from the database and returns it to the client as Courses.
This class has methods like Create, Read, Update and Delete.
Now, you want to do the same for participants, with one small difference. Since your Participant table has a foreign key to Course, your ParticipantDao will have an overloaded Read.
Example:
public class ParticipantDao {
    public void create(Participant participant) {
        // insert the participant into the db
    }

    public Participant read(int id) {
        // read one participant from the db
    }

    public List<Participant> read() {
        // read all participants from the db
    }

    public List<Participant> read(Course course) {
        // read all participants in this course from the db
    }

    public void update(Participant participant) {
        // update, and so on
    }
}
And your CourseDao can use this ParticipantDao by going something like:
foreach (Course course in read()) {
    course.setParticipants(this.participantDao.read(course));
}
In short, you have an object to access the data in the database, which is not the same object that represents said data. When you have a One-To-Many relation, these access objects can work together to retrieve the correct data.