Value objects in event sourcing

Is there a place for value objects in an event sourced domain model?
Let's define a value object as an object with immutable state that guards its invariants and has no particular identifier.
An event sourced domain model in this context is a domain model that is entirely or partially event sourced, meaning that its current state can be derived by applying all events that have occurred in the past. Events themselves are considered immutable, even over time.
Debate has taken place about the validity of using value objects within events - this question goes slightly further: Do value objects have a place in event sourced domains at all?
The (potential) problem with using value objects is that it becomes rather tricky to alter the domain in such a way that invariants are tightened.
An example of this scenario would be to have a Username value object, with the sole constraint that the name must be anywhere between 2 and 16 characters.
While this has been working well for some time, the business decides to only allow usernames of at least 5 characters.
A migration period begins and users with names of less than 5 characters are asked to update their names.
Let's say the process was successful, correction events are applied and everyone is happy.
We tighten the constraints on our Username value object to require at least 5 characters.
For a while everyone is happy, but then we discover a problem with the snapshots and replay all events.
We now face an exception from our Username object: by loading the historic data, we're breaking an invariant of our domain.
The rules of a value object apply retroactively - does this make value objects inherently unsuitable for event sourcing? Would it be worth versioning value objects? Is there a simpler way of avoiding such problems?

I would say that the moment you redefine what Username means, without somehow migrating the historical data, you have essentially created two different meanings of Username.
Because there are two different meanings of the word, you have to make that explicit in the code somehow. "Versioning" is one way, although I wouldn't use such a generic solution; there are other modeling options.
You could make it explicit that the history of a "username" is just that: a history. So, for example, create a HistoricUsername, which is the event-sourced object (even a value object if you want), and a Username, which at all times satisfies the most current rules, is not persisted at all, and is created from a HistoricUsername if it qualifies.
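As a minimal sketch of that split (the TryUpgrade helper and the exact rules are my own illustration, not part of the original design):

using System;

// Holds whatever was valid at the time the event was recorded.
public sealed class HistoricUsername
{
    public string Value { get; }

    public HistoricUsername(string value)
    {
        // Only constraints that have held throughout history belong here.
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("Username cannot be empty.");
        Value = value;
    }

    // Produces a current-rules Username if the historic value still qualifies.
    public bool TryUpgrade(out Username current)
    {
        if (Value.Length >= 5 && Value.Length <= 16)
        {
            current = new Username(Value);
            return true;
        }
        current = null;
        return false;
    }
}

// Never persisted; always satisfies the most current rules.
public sealed class Username
{
    public string Value { get; }
    internal Username(string value) { Value = value; }
}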
Some people sometimes suggest extracting the "rules" from the object and re-applying them later. That way the object itself is valid at all times, and you can ask it to validate itself against rules that might change. I don't really prefer this kind of solution, but it's an option, and the Username would still be a value object.
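A rough sketch of that option, with an invented IUsernameRules abstraction (the object stays valid at all times and is checked against whichever rules currently apply):

// The rules live outside the value object and can change over time.
public interface IUsernameRules
{
    bool IsSatisfiedBy(string value);
}

public sealed class CurrentUsernameRules : IUsernameRules
{
    public bool IsSatisfiedBy(string value) =>
        value != null && value.Length >= 5 && value.Length <= 16;
}

// The value object itself never throws; it can only be asked to validate.
public sealed class Username
{
    public string Value { get; }
    public Username(string value) { Value = value; }
    public bool Satisfies(IUsernameRules rules) => rules.IsSatisfiedBy(Value);
}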
So the problem is not really that value-objects don't fit into event-sourcing, it's just that the modeling has to be more accurate.

Do value objects have a place in event sourced domains at all?
Yes.
Is there a simpler way of avoiding such problems?
"Don't do that."
The problem you are describing is really one about messaging - if we make backwards incompatible changes to our messages, then things break.
(More precisely, you have a "Username" message, and you are trying to re-use that message with a new set of constraints that reject some previously valid uses of the message).
The answer is that you don't introduce backwards incompatible changes - instead, you introduce new names that match the new requirements and deprecate the old ones.
Which is to say, adding support for new messages, and removing support for the old messages, become two separately managed options.
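A hedged illustration of what that separation can look like; the message names (UsernameChosen, UsernameChosenV2) are invented for the example:

using System;

// Old message: no longer written, but support for reading it is kept
// until every consumer has moved on.
[Obsolete("Superseded by UsernameChosenV2; retained so old events can still be replayed.")]
public sealed class UsernameChosen
{
    public string Username { get; set; } // historically 2..16 characters
}

// New message: carries the tightened constraint from day one.
public sealed class UsernameChosenV2
{
    public string Username { get; set; } // 5..16 characters
}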
Greg Young's book Versioning in an Event Sourced System dedicates some chapters to this idea. Also, Rich Hickey ends up touching on these important ideas in most of his talks -- I'd suggest starting from Spec-ulation.
The "value object", meaning that the type that the current implementation of the domain model uses to move the information around, is a separate concern from the messages. The data structures we use in memory don't need to be coupled to our serialization formats.
The representation of the information on the wire is distinct from the representation of information in memory, and that in turn is distinct from the abstractions that manipulate the information in memory.
The challenging thing is that, at the beginning of a project, you have the least amount of information about when the different representations are going to diverge.

We've solved this in a slightly different way. By separating the public API of our value objects from the internal (domain only) API, we are able to evolve one without affecting the other.
For example:
public class Username
{
    private readonly string value;

    // Domain-only (internal) constructor.
    // Does not enforce constraints and can only be called within the domain.
    internal Username(string value)
    {
        this.value = value;
    }

    // Public factory method.
    // Enforces business constraints. Used by consumers of the domain
    // (application layer etc.) to create new instances of the value object.
    public static Username Create(string value)
    {
        // Business constraints. These will evolve and grow over time.
        if (value == null)
        {
            throw new ArgumentNullException(nameof(value));
        }
        if (value.Length < 2)
        {
            throw new ArgumentException("Username must be at least 2 characters.", nameof(value));
        }
        return new Username(value);
    }
}
Consumers of the domain must use the static Create method to create a new instance of the value object. This factory method contains all of our business constraints and prevents an instance being created in an invalid state.
Inside the domain, classes have access to the internal (constraint-less) constructor. Since this does not enforce any business constraints, an instance of the value object can always be created in this way (regardless of its value). By using this constructor when replaying events, we ensure that loading historical data always succeeds.
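Usage then looks roughly like this (the aggregate and event names here are hypothetical):

// Hypothetical aggregate showing the two creation paths.
public class User
{
    private Username username;

    // New input from outside the domain goes through the guarded factory.
    public void ChangeUsername(string requested)
    {
        this.username = Username.Create(requested);
    }

    // Replaying a historical event uses the unguarded internal constructor,
    // so values that were valid under older rules still load.
    internal void Apply(UsernameChanged e)
    {
        this.username = new Username(e.Username);
    }
}

public class UsernameChanged
{
    public string Username { get; set; }
}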
The benefits of this design are:
A single class is used to represent the domain concept (no need for multiple classes, versioning etc.).
Business rules are free to evolve over time.
Historical data always works. A Username from a year ago is still a user name, even if our rules have changed.

Although already answered I do find this an interesting situation.
I agree with others that the event data should be record-based and, therefore, nothing more than a data container that may be used to reconstitute the aggregate.
That being said, when the rules change, so does the domain. A major portion of domain-driven design is to capture as much of the domain (rules/structure) as is required. If this is the case, should the changes in the rules not also be kept?
For instance, if we have a Username value object and it starts out with the 2-to-16-characters rule, then that is coded as such:
public class Username
{
    public string Value { get; }

    public Username(string value)
    {
        if (value.Length < 2 || value.Length > 16)
        {
            throw new DomainException("Username must be between 2 and 16 characters");
        }

        Value = value;
    }
}
Now we get to 1 March 2018 and the rule changes. We can keep the rule around:
public class Username
{
    public string Value { get; }

    public Username(string value, DateTime registrationDate)
    {
        if (registrationDate < new DateTime(2018, 3, 1) &&
            (value.Length < 2 || value.Length > 16))
        {
            throw new DomainException("Username must be between 2 and 16 characters");
        }

        if (registrationDate >= new DateTime(2018, 3, 1) &&
            (value.Length < 5 || value.Length > 16))
        {
            throw new DomainException("Username must be between 5 and 16 characters");
        }

        Value = value;
    }
}
That is the basic idea. In this way we keep our "old" rules around as well. This may become quite a hassle, but I don't have enough experience to say. Changing rules retroactively may introduce some pretty tricky situations, so one would need to evaluate this on a case-by-case basis.
Just a thought.

Related

DDD - Invariant enforcement using instance methods and a factory method

I'm designing a system using Domain-Driven Design principles.
I have an aggregate named Album.
It contains a collection of Tracks.
Album instances are created using a factory method named create(props).
Rule 1: An Album must contain at least one Track.
This rule must be checked upon creation (in Album.create(props)).
Also, there must be a method named addTrack(track: Track) so that a new Track can be added after the instance is created. That means addTrack(track: Track) must check the rule too.
How can I avoid this logic code duplication?
Well, if Album makes sure it has at least one Track upon instantiation, I don't see how addTrack could ever violate that rule. Did you perhaps mean removeTrack?
In that case you could go for something as simple as the following:
class Album {
    constructor(tracks) {
        this._tracks = [];
        this._assertWillHaveOneTrack(tracks.length);
        this._tracks.push(...tracks);
    }

    removeTrack(trackId) {
        this._assertWillHaveOneTrack(-1);
        // Assumes tracks expose an id property.
        this._tracks = this._tracks.filter(track => track.id !== trackId);
    }

    _assertWillHaveOneTrack(change) {
        if (this._tracks.length + change <= 0) {
            throw new Error('Album must have a minimum of one track.');
        }
    }
}
Please note that you could also have mutated the state first and checked the rule afterwards, which seems simpler at first glance, but it's usually a bad practice: the model could be left in an invalid state if the exception is handled, unless the model reverts the change, which gets even more complex.
Also note that if Track is an entity, it's probably a better idea not to let the client code create the Track to preserve encapsulation, but rather pass a TrackInfo value object or something similar.

Reference Semantics in Google Protocol Buffers

I have a slightly peculiar program which deals with cases very similar to this (in C#-like pseudocode):
class CDataSet
{
    int m_nID;
    string m_sTag;
    float m_fValue;

    void PrintData()
    {
        // Blah Blah
    }
}

class CDataItem
{
    int m_nID;
    string m_sTag;
    CDataSet m_refData;
    CDataSet m_refParent;

    void Print()
    {
        if (null == m_refData)
        {
            m_refParent.PrintData();
        }
        else
        {
            m_refData.PrintData();
        }
    }
}
Members m_refData and m_refParent are initialized to null and used as follows:
m_refData -> Used when a new data set is added
m_refParent -> Used to point to an existing data set.
A new data set is added only if the field m_nID doesn't match an existing one.
Currently this code is managing around 500 objects with around 21 fields per object and the format of choice as of now is XML, which at 100k+ lines and 5MB+ is very unwieldy.
I am planning to modify the whole shebang to use ProtoBuf, but currently I'm not sure how I can handle the reference semantics. Any thoughts would be much appreciated.
Out of the box, protocol buffers does not have any reference semantics. You would need to cross-reference them manually, typically using an artificial key. Essentially, on the DTO layer you would add a key to CDataSet (that you simply invent, perhaps just an increasing integer), store the key instead of the item in m_refData/m_refParent, and run fixup manually during serialization/deserialization. You could also just store the index into the set of CDataSet, but that may make insertion etc. more difficult. Up to you; since this is serialization, you could argue that you won't insert (etc.) outside of initial population, and hence the raw index is fine and reliable.
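A rough sketch of that manual approach; the DTO names and the Key/DataKey/ParentKey fields are invented for illustration, while ProtoContract/ProtoMember are real protobuf-net attributes:

using ProtoBuf;

[ProtoContract]
public class DataSetDto
{
    [ProtoMember(1)] public int Key;     // artificial key, e.g. an increasing integer
    [ProtoMember(2)] public int Id;
    [ProtoMember(3)] public string Tag;
    [ProtoMember(4)] public float Value;
}

[ProtoContract]
public class DataItemDto
{
    [ProtoMember(1)] public int Id;
    [ProtoMember(2)] public string Tag;
    // Store keys instead of object references; fix the references up
    // manually after deserialization.
    [ProtoMember(3)] public int? DataKey;
    [ProtoMember(4)] public int? ParentKey;
}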
This is, however, a very common scenario - so as an implementation-specific feature I've added optional (opt-in) reference tracking to my implementation (protobuf-net), which essentially automates the above under the covers (so you don't need to change your objects or expose the key outside of the binary stream).
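With protobuf-net that opt-in looks roughly like this; AsReference is a real protobuf-net option (note that newer versions of the library discourage or restrict it, so check the version you're on):

using ProtoBuf;

[ProtoContract]
public class CDataSet
{
    [ProtoMember(1)] public int m_nID;
    [ProtoMember(2)] public string m_sTag;
    [ProtoMember(3)] public float m_fValue;
}

[ProtoContract]
public class CDataItem
{
    [ProtoMember(1)] public int m_nID;
    [ProtoMember(2)] public string m_sTag;
    // Opt-in reference tracking: a given CDataSet instance is serialized
    // once and subsequently referred to by an internal marker.
    [ProtoMember(3, AsReference = true)] public CDataSet m_refData;
    [ProtoMember(4, AsReference = true)] public CDataSet m_refParent;
}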

Business Entity - should lists be exposed only as ReadOnlyCollections?

In trying to centralize how items are added to or removed from my business entity classes, I have moved to a model where all lists are exposed only as ReadOnlyCollections, and I provide Add and Remove methods to manipulate the objects in the list.
Here is an example:
public class Course
{
    public string Name { get; set; }
}

public class Student
{
    private List<Course> _courses = new List<Course>();

    public string Name { get; set; }

    public ReadOnlyCollection<Course> Courses
    {
        get { return _courses.AsReadOnly(); }
    }

    public void Add(Course course)
    {
        if (course != null && _courses.Count <= 3)
        {
            _courses.Add(course);
        }
    }

    public bool Remove(Course course)
    {
        bool removed = false;
        if (course != null && _courses.Count <= 3)
        {
            removed = _courses.Remove(course);
        }
        return removed;
    }
}
Part of my objective in doing the above is to avoid ending up with an anemic domain model (an anti-pattern), and also to avoid having the logic that adds and removes courses scattered all over the place.
Some background: I am working on an ASP.NET application where the lists used to be exposed directly, which resulted in all kinds of ways in which Courses were added to the Student (in some places a check was made, and in others it was not).
But my question is: is the above a good idea?
Yes, this is a good approach. In my opinion you're not doing anything other than decorating your list, and it's better than implementing your own IList (you save many lines of code, even though you lose the more elegant way to iterate through your Course objects).
You may consider receiving a validation strategy object, as in the future you might have a new requirement; for example, a new kind of student that can have more than 3 courses.
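As a sketch of that strategy idea (the ICourseLimitPolicy name is invented for the example):

using System.Collections.Generic;

public interface ICourseLimitPolicy
{
    bool CanAdd(int currentCount);
}

public class DefaultCourseLimit : ICourseLimitPolicy
{
    public bool CanAdd(int currentCount) => currentCount < 3;
}

public class Student
{
    private readonly List<Course> _courses = new List<Course>();
    private readonly ICourseLimitPolicy _limit;

    // The limit policy is injected, so new student kinds only need a new policy.
    public Student(ICourseLimitPolicy limit)
    {
        _limit = limit;
    }

    public void Add(Course course)
    {
        if (course != null && _limit.CanAdd(_courses.Count))
        {
            _courses.Add(course);
        }
    }
}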
I'd say this is a good idea when adding/removing needs to be controlled in the manner you suggest, such as for business rule validation. Otherwise, as you know from previous code, there's really no way to ensure that the validation is performed.
The balance that you'll probably want to reach, however, is when to do this and when not to. Doing this for every collection of every kind seems like overkill. However, if you don't do this and then later need to add this kind of gate-keeping code then it would be a breaking change for the class, which may or may not be a headache at the time.
I suppose another approach could be to have a custom implementation of IList<T> which has generic gate-keeping code for its Add() and Remove() methods and notifies the system of what's happening - something like exposing an event which is raised before the internal logic of those methods is called. Then the Student class would supply a delegate or something (sorry for being vague, I'm very coded-out today) when instantiating _courses, to apply business logic to the event and cancel the operation (throw an exception, I imagine) if the business validation fails.
That could be overkill as well, depending on the developer's disposition. But at least with something a little more engineered like this you get a single generic implementation for everything with the option to add/remove business validation as needed over time without breaking changes.
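To make that vague description a little more concrete, one possible shape (all names invented) could be:

using System;
using System.Collections.Generic;

// A generic guarded list that consults a supplied rule before each add.
public class GuardedList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly Func<IReadOnlyList<T>, T, bool> _beforeAdd;

    public GuardedList(Func<IReadOnlyList<T>, T, bool> beforeAdd)
    {
        _beforeAdd = beforeAdd;
    }

    public void Add(T item)
    {
        if (!_beforeAdd(_items, item))
        {
            throw new InvalidOperationException("Business rule rejected the add.");
        }
        _items.Add(item);
    }
}

// The owning class supplies its rule when creating the collection:
// var courses = new GuardedList<Course>((items, c) => items.Count < 3);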
I've done that in the past and regretted it: a better option is to use different classes to read domain objects than the ones you use to modify them.
For example, use a behavior-rich Student domain class that jealously guards its ownership of courses - it shouldn't expose them at all if student is responsible for them - and a StudentDataTransferObject (or ViewModel) that provides a simple list of strings of courses (or a dictionary when you need IDs) for populating interfaces.

BDD/DDD Where to put specifications for basic entity validation?

Alternatively, is basic entity validation considered a specification(s)?
In general, is it better to keep basic entity validation (name cannot be null or empty, date must be greater than xxx) in the actual entity, or outside of it in a specification?
If in a specification, what would that look like? Would you have a spec for each field, or wrap it all up in one EntityIsValid type spec?
It seems to me that once people have learned a little about DDD, they pick up the Specification pattern and look to apply it everywhere. That is really the Golden Hammer anti-pattern.
The way I see a place for the Specification pattern, and the way I understood Domain-Driven Design, is that it is a design pattern you can choose to apply when you need to vary a business rule independently of an Entity.
Remember that DDD is an iterative approach, so you don't have to get it 'right' in the first take. I would start out with putting basic validation inside Entities. This fits well with the basic idea about OOD because it lets the object that represents a concept know about the valid ranges of data.
In most cases, you shouldn't even need explicit validation because Entities should be designed so that constraints are represented as invariants, making it impossible to create an instance that violates a constraint.
If you have a rule that says that Name cannot be null or empty, you can actively enforce it directly in your Entity:
public class MyEntity
{
    private string name;

    public MyEntity(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            throw new ArgumentException();
        }
        this.name = name;
    }

    public string Name
    {
        get { return this.name; }
        set
        {
            if (string.IsNullOrEmpty(value))
            {
                throw new ArgumentException();
            }
            this.name = value;
        }
    }
}
The rule that name cannot be null is now an invariant for the class: it is now impossible for the MyEntity class to get into a state where that rule is broken.
If later on you discover that the rule is more complex, or shared between many different concepts, you can always extract it into a Specification.
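A minimal sketch of such an extraction, using the classic Specification shape (the concrete class name is invented):

public interface ISpecification<T>
{
    bool IsSatisfiedBy(T candidate);
}

public class NonEmptyNameSpecification : ISpecification<string>
{
    public bool IsSatisfiedBy(string candidate) =>
        !string.IsNullOrEmpty(candidate);
}

// The entity can now delegate, and the rule can be shared or varied
// independently of it:
// if (!spec.IsSatisfiedBy(name)) throw new ArgumentException();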
Entities have both data and behavior, so letting your entities ensure their invariants is the way to go IMHO. Else, you might end up with an anemic domain model [Fowler].
If your context allows you to enforce the rules in the setters as Mark Seemann suggests, it would be great since you don't have all the "IsValid" and/or "BrokenRules" logic in your model.
I've been in two contexts where we found ourselves needing the aforementioned solution though:
A classic request/response web solution where the web page displays all the broken rules of an entity upon a failing save.
The model is read from a database which is updated externally (hence it's not impossible for the entity to be invalid despite the setter logic, unless you let your ORM use the setters, but the whole point for us was to find out about the validity).

Handle model objects always or allow bits of information to travel?

A question about the flow of information in an object-oriented construction, e.g. from controller to repository.
Should the objects passed always be model objects, or should we allow smaller pieces of information to travel?
What would you recommend? What factors decide the approach?
E.g. something like
Controller:
string alias = "alpha";
bool aliasExists = Repository.CheckIfAliasExists(alias);
Repository:
bool CheckIfAliasExists(string alias);
or something like
Controller:
string alias = "alpha";
Member member = Repository.GetMemberByAlias(alias);
bool aliasExists = member != null;
Repository:
Member GetMemberByAlias(string alias);
This is a pretty subjective subject, but I think the decision boils down to two ideas: the cost of retrieving an entire object only to determine existence, and the question of whether object-specific information should be allowed to reach a greater scope.
Some will argue that allowing the application to make greater use of this identifying information increases your chances of bypassing the object model altogether, but I generally err on the side of performance in these instances.
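To illustrate the performance point, an existence check can avoid materializing the entity at all. A sketch, assuming a LINQ-queryable data source (the repository and Member shapes are invented):

using System.Linq;

public class Member
{
    public string Alias { get; set; }
}

public class MemberRepository
{
    private readonly IQueryable<Member> _members;

    public MemberRepository(IQueryable<Member> members)
    {
        _members = members;
    }

    // Typically translates to an EXISTS-style query; no Member is materialized.
    public bool CheckIfAliasExists(string alias) =>
        _members.Any(m => m.Alias == alias);

    // Pulls the whole row and hydrates the object just to test for null.
    public Member GetMemberByAlias(string alias) =>
        _members.FirstOrDefault(m => m.Alias == alias);
}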
My specific advice is to go with the former approach (though don't invalidate the latter, either).