DDD: Repositories are in-memory collections of objects? - repository

I've noticed Repository is usually implemented in either of the following ways:
Method 1
void Add(object obj);
void Remove(object obj);
object GetBy(int id);
Method 2
void Save(object obj); // Used both for Insert and Update scenarios
void Remove(object obj);
object GetBy(int id);
Method 1 has collection semantics (which is how repositories are defined). We can get an object from a repository and modify it. But we don't tell the collection to update it. Implementing a repository this way requires another mechanism for persisting the changes made to an in-memory object. As far as I know, this is done using Unit of Work. However, some argue that UoW is only required when you need transaction control in your system.
Method 2 eliminates the need to have UoW. You can call the Save() method and it determines if the object is new and should be Inserted or is modified and should be Updated. It then uses the data mappers to persist the changes to the database. Whilst this makes life much easier, a repository modeled doesn't have collection semantics. This model has DAO semantics.
I'm really confused about this. If repositories mimic in-memory collection of objects, then we should model them according to Method 1.
What are your thoughts on this?
Mosh

I personally have no issue with the Unit of Work pattern being a part of the solution. Obviously, you only need it for the CUD in CRUD. The fact that you are implementing a UoW pattern, though, does nothing more than dictate that you have a set of operations that need to go as a batch. That is slightly different than saying it needs to be a part of a transaction. If you abstract your repositories well enough, your UoW implementation can be agnostic to the backing mechanism that you are using - whether it is database, XML, etc.
As to the specific question, I think the difference between method one and method two are trivial, if for no other reason than most instances of method two contain a check to see if the identifier is set. If set, treat as update, otherwise, treat as insert. This logic is often built into the repository and is more for simplification of the exposed interface, in my opinion. The repository's purpose is to broker objects between a consumer and a data source and to remove having to have knowledge of the data source directly. I go with method two, because I trust the simple logic of detecting an identifier than having to rely on tracking object states all over the application.
The fact that the terminology for repository usage is so similar to both data access and object collections lend to the confusion. I just treat them as their own first class citizen and do what is best for the domain. ;-)

Maybe you want to have:
T Persist(T entityToPersist);
void Remove(T entityToRemove);
"Persist" being the same as "Save Or Update" or "Add Or Update" - ie. the Repo encapsulates creating new identities (the db may do this) but always returns the new instance with the identity reference.

Related

Service access Entity attributes

In oop we seek to encapsulation. We try not to expose internal state via getters or by public fields, only expose methods.
So far so good.
In situation when we would like to operate on multiple Entities we introduce Service.
But how this service can operate freely on these entities?
If all (both Service and Entities) were in the same package, Entities could expose package private methods or fields and Service could use them, preserving encapsulation. But what when Entities and Service are from different packages? It seems that Entities should either expose public getters (first step to anemic model and leackage of logic from Entities), or public methods executing logic that is specific to the needs of service, possibly introduced only by requirements of this service - also seems bad. How to tackle this?
In the context of OO, the most important thing for you to understand is that objects respond to messages, and that in OOP in particular, methods are how these responses are implemented.
For example, imagine you have a Person object to which you (as the programmer) have assigned the responsibility to respond to the "grow" message. Generally, you would implement that as a Person.grow() method, like this.
class Person {
int age;
public void grow() { this.age++; }
}
This seems fairly obvious, but you must note that from the message sender's perspective, how Person object reacts is meaningless. For all it cares, the method Person.grow() could be triggering a missile launch, and it would not matter because some other object (or objects) could be responding in the right way (for example, a UI component updating itself on the screen). However, you decided that when the Person object handles the "grow" message, it must increment the value of its age attribute. This is encapsulation.
So, to address your concern, "public methods executing logic that is specific to the needs of service, possibly introduced only by requirements of this service - also seems bad", it is not bad at all because you are designing the entities to respond to messages from the services in specific ways to match the requirements of your application. The important thing to bear in mind is that the services do not dictate how the entities behave, but rather the entities respond in their own way to requests from the services.
Finally, you might be asking yourself: how do entities know that they need to respond to certain messages? This is easy to answer: YOU decide how to link messages to responses. In other words, you think about the requirements of your application (what "messages" will be sent by various objects) and how they will be satisfied (how and which objects will respond to messages).
In situation when we would like to operate on multiple Entities we introduce Service.
No we don't. Well, I guess some people do, but the point is they shouldn't.
In object-orientation, we model a particular problem domain. We don't (again, shouldn't) discriminate based on what amount of other objects a single object operates. If I have to model an Appointment and a collection of Appointment I don't introduce an AppointmentService, I introduce a Schedule or Timetable, or whatever is appropriate for the domain.
The distinction of Entity and Service is not domain-conform. It is purely technical and most often a regression into procedural thinking, where an Entity is data and the Service is a procedure to act on it.
DDD as is practiced today is not based on OOP, it just uses object syntax. One clear indication is that in most projects entities are directly persisted, even contain database ids or database-related annotations.
So either do OOP or do DDD, you can't really do both. Here is a talk of mine (talk is german but slides are in english) about OO and DDD.
I don't see the usage of getters as a step towards an anaemic model. Or at least, as everything in programming, it depends.
Downside of anaemic model is that every component accessing the object can mutate it without any enforcing of its invariants (opening to possible inconsistency in data), it can be done easily using the setter methods.
(I will use the terms command and query to indicate methods that modify the state of the objects and methods that just return data without changing anything)
The point of having an aggregate/entity is to enforce the object invariants, so it exposes "command" methods that don't reflect the internal structure of the object, but instead are "domain oriented" (using the "ubiquitous language" for their naming), exposing its "domain behavior" (an avoidance of get/set naming is suggested because they are standard naming for representing the object internal structure).
This is for what concern the set methods, what about get?
As set methods can be seen as "command" of the aggregate, you can see the getters as "query" methods used to ask data to the aggregate. Asking data to an aggregate is totally fine, if this doesn't break the responsability of the aggregate of enforcing invariants. This means that you should watch out to what the query method returns.
If the query method result is a value object, so, immutable, it is totally fine to have it. In this way who query the aggregate has in return something that can be only read.
So you could have query methods doing calculation using the object internal state (eg. A method int missingStudents() that calculate the number of missing student for a Lesson entity that has the totalNumber of students and a List<StudentId> in its internal state), or simple methods like List<StudentId> presentStudent() that just returns the list in its internal state, but what change from a List<StudentId> getStudents() its just the name).
So if the get method return something that is immutable who use it can't break the invariants of the aggregate.
If the method returns a mutable object that is part of the aggregate state, whoever access the object can query for that object and now can mutate something that stays inside the aggregate without passing for the right command methods, skipping invariants check (unless it is something wanted and managed).
Other possibility is that the object is created on the fly during the query and is not part of the aggregate state, so if someone access it, also if it is mutable, the aggregate is safe.
In the end, get and set methods are seen as an ugly thing if you are a ddd extremist, but sometimes they can also be useful being a standard naming convention and some libraries work on this naming convention, so I don't see them bad, if they don't break the aggregate/entity responsibilities.
As last thing, when you say In situation when we would like to operate on multiple Entities we introduce Service., this is true, but also a service should operate (mutate, save) on a single aggregate, but this is another topic 😊.

Is Martin Fowler's POEAA implementation of Unit of Work an anti-pattern?

Well from the book POEAA, Martin Fowler introduced this idea of Unit of Work. It works very well if you want to have auto-commit system, in which your domain model uses Unit of work to label itself as new, dirty, removed or clean. Then you only need to call UnitofWork.commit() and all changes of models will be saved. Below is a domain model class with such methods:
public abstract class DomainModel{
protected void markNew(){
UnitOfWork.getCurrent().registerNew(this);
}
protected void markDirty(){
UnitOfWork.getCurrent().registerDirty(this);
}
protected void markRemoved(){
UnitOfWork.getCurrent().registerRemoved(this);
}
protected void markClean(){
UnitOfWork.getCurrent().registerClean(this);
}
}
With this implementation, you can mark a domain model as any save state through business logic method:
public class Message extends DomainModel{
public void updateContent(User user, string content){
// This method update message content if the the message posted time is not longer than 24 hrs, and the user has permission to update messate content.
if(!canUpdateContent(user) && timeExpired()) throw new IllegalOperationException("An error occurred, cannot update content.");
this.content = content;
markDirty();
}
}
At first glance, it looks marvelous, since you dont have to manually call insert, save and delete method on your repository/data mapper. However, I see two problems with this approach:
Tight coupling of domain model with Unit of work: This implementation of Unit of Work will make domain models dependent on UnitOfWork class. UnitOfWork has to come from somewhere, the implementation of static class/method is bad. To improve this, we need to switch to dependency injection, and pass an instance of UnitOfWork to the constructor of Domain Model. But this still couples domain model with Unit of work. Also ideally a domain model should only accept parameters for its data fields(ie. Message domain model's constructor should only accept whats relevant to message, such as title, content, dateposted, etc). If it will need to accept a parameter of UnitOfWork, it will pollute the constructor.
The domain model now becomes persistent-aware: In modern application design, especially DDD, we strive for persistent-ignorant model. The domain model shouldnt care about whether it is being persisted or not, it should not even care about whether there's persistence layer at all. By having those markNew(), markDirty(), etc methods on domain model, our domain models now have the responsibility of informing the rest of our application that it needs to be persisted. Although it does not handle the persistence logic, the model still is aware of the existence of persistence layer. I am not sure if this is a good idea, to me it seems to have violate the single responsibility principle. There's also an article talking about this:
http://blog.sapiensworks.com/post/2014/06/04/Unit-Of-Work-is-the-new-Singleton.aspx/
So what do you think? Does the original Unit of Work pattern described in Martin Fowler violate good OO design principles? If so, do you consider it an antipattern?
To be entirely accurate, there is no one "Martin Fowler's implementation of Unit of Work". In the book he distinguishes between two types of registration of a modified object into a UoW.
Caller registration where only the calling object knows about the UoW and has to mark the (callee) domain object as dirty with it. No anti pattern or bad practice here as far as I can tell.
Object registration where the domain object registers itself with the UoW. Here again there are two options :
For this scheme to work the Unit of Work needs either to be passed to
the object or to be in a well-known place. Passing the Unit of Work
around is tedious but usually no problem to have it present in some
kind of session object.
The code sample is using UnitOfWork.GetCurrent() which is closer to the latter option and admittedly widely considered an anti-pattern today because of the tightly coupled, implicit dependency (Service Locator style).
However, if the first option was chosen, i.e. passing the UoW over to the domain object, and let's assume a Unit of Work abstraction, would it be bad practice ? From a dependency management perspective, clearly not.
Now remains the persistence ignorance aspect. Can we say about an object which can signal another object it's just been edited/created/removed that it is persistence-aware ? Highly debatable.
In comparison, if we look at more recent domain object implementations out there, for instance ones in Event Sourcing, we can see that aggregates can be responsible for keeping a list of their own uncommitted changes which is more or less the same idea. Does this violate persistence ignorance ? I don't think so.
Bottom line : the specific code Fowler chose to illustrate one of many UoW possibilities would clearly be considered bad practice now, but much more so with regard to problem #1 you pointed out and not really problem #2. And this doesn't disqualify other implementations he writes about, nor the whole UoW pattern whose change-tracking mechanics are anyway most of the time hidden away in third party library magic (read: ORM) nowadays and not hardcoded as in the book's example.
From a DDD perspective, this is something you shouldn't do.
DDD contains the following rule:
An application service should only modify one aggregate per transaction.
If you follow this rule, it's clear which aggregate changed during an app service operation. This aggregate then in turn needs to be passed to a repository for saving to the DB:
repository.update(theAggregate);
No other call is required. This defeats the gain from the pattern in the form you describe it.
On the other hand, the pattern you describe introduces a dependency from the domain to the persistence mechanism (depending on the design either a real dependency or just a conceptual dependency). Now this is something you should avoid, because it increases the complexity of your model a lot (not only internally, also for clients).
As a result, you shouldn't use the pattern in this form together with DDD.
Outside of DDD
Having that said, I think the pattern is one of many solutions to a certain problem. That solution has pros and cons, some of which you describe in the question. In some situations, the pattern may be the best trade-off, so
No, this is not an anti-pattern.
I don't think the model should not have a dependency on the UoW. It would be more like a repository that would depend on the UoW and, in turn, the repository would depend on the model.
If your repositories only depend on an abstract UoW, then the only piece of the puzzle that knows about the persistence technology is the concrete UoW.
The only classes I tend to allow the model to depend on are other pieces of the model: domain services, factories, etc.

Object persistence terminology: 'repository' vs. 'store' vs. 'context' vs. 'retriever' vs. (...)

I'm not sure how to name data store classes when designing a program's data access layer (DAL).
(By data store class, I mean a class that is responsible to read a persisted object into memory, or to persist an in-memory object.)
It seems reasonable to name a data store class according to two things:
what kinds of objects it handles;
whether it loads and/or persists such objects.
⇒ A class that loads Banana objects might be called e.g. BananaSource.
I don't know how to go about the second point (ie. the Source bit in the example). I've seen different nouns apparently used for just that purpose:
repository: this sounds very general. Does this denote something read-/write-accessible?
store: this sounds like something that potentially allows write access.
context: sounds very abstract. I've seen this with LINQ and object-relational mappers (ORMs).
P.S. (several months later): This is probably appropriate for containers that contain "active" or otherwise supervised objects (the Unit of Work pattern comes to mind).
retriever: sounds like something read-only.
source & sink: probably not appropriate for object persistence; a better fit with data streams?
reader / writer: quite clear in its intention, but sounds too technical to me.
Are these names arbitrary, or are there widely accepted meanings / semantic differences behind each? More specifically, I wonder:
What names would be appropriate for read-only data stores?
What names would be appropriate for write-only data stores?
What names would be appropriate for mostly read-only data stores that are occasionally updated?
What names would be appropriate for mostly write-only data stores that are occasionally read?
Does one name fit all scenarios equally well?
As noone has yet answered the question, I'll post on what I have decided in the meantime.
Just for the record, I have pretty much decided on calling most data store classes repositories. First, it appears to be the most neutral, non-technical term from the list I suggested, and it seems to be well in line with the Repository pattern.
Generally, "repository" seems to fit well where data retrieval/persistence interfaces are something similar to the following:
public interface IRepository<TResource, TId>
{
int Count { get; }
TResource GetById(TId id);
IEnumerable<TResource> GetManyBySomeCriteria(...);
TId Add(TResource resource);
void Remove(TId id);
void Remove(TResource resource);
...
}
Another term I have decided on using is provider, which I'll be preferring over "repository" whenever objects are generated on-the-fly instead of being retrieved from a persistence store, or when access to a persistence store happens in a purely read-only manner. (Factory would also be appropriate, but sounds more technical, and I have decided against technical terms for most uses.)
P.S.: Some time has gone by since writing this answer, and I've had several opportunities at work to review someone else's code. One term I've thus added to my vocabulary is Service, which I am reserving for SOA scenarios: I might publish a FooService that is backed by a private Foo repository or provider. The "service" is basically just a thin public-facing layer above these that takes care of things like authentication, authorization, or aggregating / batching DTOs for proper "chunkiness" of service responses.
Well so to add something to you conclusion:
A repository: is meant to only care about one entity and has certain patterns like you did.
A store: is allowed to do a bit more, also working with other entities.
A reader/writer: is separated to allow semantically show and inject only reading and wrting functionality into other classes. It's coming from the CQRS pattern.
A context: is more or less bound to a ORM mapper as you mentioned and is usually used under the hood of a repository or store, some use it directly instead of making a repository on top. But it's harder to abstract.

What classes should I map against with NHibernate?

Currently, we use NHibernate to map business objects to database tables. Said business objects enforce business rules: The set accessors will throw an exception on the spot if the contract for that property is violated. Also, the properties enforce relationships with other objects (sometimes bidirectional!). Well, whenever NHibernate loads an object from the database (e.g. when ISession.Get(id) is called), the set accessors of the mapped properties are used to put the data into the object.
What's good is that the middle tier of the application enforces business logic. What's bad is that the database does not. Sometimes crap finds its way into the database. If crap is loaded into the application, it bails (throws an exception). Sometimes it clearly should bail because it cannot do anything, but what if it can continue working? E.g., an admin tool that gathers real-time reports runs a high risk of failing unnecessarily instead of allowing an admin to even fix a (potential) problem.
I don't have an example on me right now, but in some instances, letting NHibernate use the "front door" properties that also enforce relationships (especially bidi) leads to bugs.
What are the best solutions?
Currently, I will, on a per-property basis, create a "back door" just for NHibernate:
public virtual int Blah {get {return _Blah;} set {/*enforces BR's*/}}
protected virtual int _Blah {get {return blah;} set {blah = value;}}
private int blah;
I showed the above in C# 2 (no default properties) to demonstrate how this gets us basically 3 layers of, or views, to blah!!! While this certainly works, it does not seem ideal as it requires the BL to provide one (public) interface for the app-at-large, and another (protected) interface for the data access layer.
There is an additional problem: To my knowledge, NHibernate does not give you a way to distinguish between the name of the property in the BL and the name of the property in the entity model (i.e. the name you use when you query, e.g. via HQL--whenever you give NHibernate the name (string) of a property). This becomes a problem when, at first, the BR's for some property Blah are no problem, so you refer to it in your O/R mapping... but then later, you have to add some BR's that do become a problem, so then you have to change your O/R mapping to use a new _Blah property, which breaks all existing queries using "Blah" (common problem with programming against strings).
Has anyone solved these problems?!
While I found most of your architecture problematic, the usual way to deal with this stuff is having NHibernate use the backing field instead of the setter.
In your example above, you don't need to define an additional protected property. Just use this in the mapping:
<property name="Blah" access="nosetter.lowercase"/>
This is described in the docs, http://nhibernate.info/doc/nh/en/index.html#mapping-declaration-property (Table 5.1. Access Strategies)

Object-oriented programming & transactions

A little intro:
Class contains fields and methods (let me skip properties this time).
Fields represent a state of the class.
Methods describe behavior of the class.
In a well-designed class, a method won't change the class's state if it throws an exception, right? (In other words, whatever happens, class's state shouldn't be corrupted)
Question:
Is there a framework, a design pattern, best practice or a programming language to call a sequence of methods in a transactional style, so that either class's state don't get changed (in case of exception), or everything succeeds?
E.g.:
// the class foo is now in the state S1
foo.MoveToState2();
// it is now (supposed to be) in the state S2
foo.MoveToFinalState();
// it is now (supposed to be) in the state, namely, S3
Surely, an exception might occur both in MoveToState2() and MoveToFinalState(). But from this block of code I want the class foo to be either in the state S1 or S3.
This is a simple scenario with a single class involved, no if's, no while's, no side effects, but I hope the idea is clear.
Take a look at the Memento pattern
The memento pattern is a software design pattern that provides the ability to restore an object to its previous state (undo via rollback).
Not the most efficient method, but you could have an object that represents your transactional data. When you start a transaction, make a copy of the data and perform all operations on that. When the transaction ends successfully, move the copy to your real data - this can be done using pointers, so need not be too inefficient.
Functional programming is a paradigm that seems to fit well to transactional computations. Since no side-effects are allowed without explicit declaration, you have full control of all data flow.
Therefore software transactional memory can be expressed easily in functional terms - See STM for F#
The key idea is the concept of monads. A monad can be used to model an arbitrary computation through two primitives: Return to return a value and Bind to sequence two computations. Using these two, you can model a transactional monad that controls and saves all state in form of continuations.
One could try to model these in an object-oriented way through a State+Memento pattern, but generally, transactions in imperative languages (like the common OO-ones) are much more difficult to implement since you can perform arbitrary side-effects. But of course you can think of an object defining a transaction scope, that saves, validates and restores data as needed, given they expose a suitable interface for this (the patterns I mentioned above).
The simplest and most reliable "pattern" to use here is an immutable data structure.
Instead of writing:
foo.MoveToState2();
foo.MoveToFinalState();
You write:
MyFoo foo2 = foo.MoveToState2();
MyFoo finalFoo = foo2.MoveToFinalState();
And implement the methods accordingly - that is, MoveToState2 does not actually change anything about MyFoo, it creates a new MyFoo that is in state 2. Similarly with the final state.
This is how the string classes in most OO languages work. Many OO languages are also starting to implement (or have already implemented) immutable collections. Once you have the building blocks, it's fairly straightforward to create an entire immutable "entity".
This would be pretty ugly to implement everywhere, but just saving the state locally, then restoring it in the case of an exception would work in simple scenarios. You'd have to catch and rethrow the exception, which may lose some context in some languages. It might be better to wrap it if possible to retain the context.
try {
save state in local variables
move to new state
} catch (innerException) {
restore state from local variables
throw new exception( innerException )
}
When using object copy approach, you have to watch out that the statements to be rolled-back are only affecting the object's or data itself (and aggregates).
But things are getting really difficult if the side-effects of the statements are "more external". For example I/O operations, network calls. You always have to analyze the overall state-changes of your statements.
It gets also really tricky if you touch static data (or evil mutable singletons). Reverting this data isolated is difficult, because other threads could have modified them in between (you could face lost updates).
Reverting/rollback to the past is often not so trivial ;)
I would also consider the saga pattern, you could pass a copy of the objects current state into MoveToState2 and if it throws an exception you could catch that internally and use the copy of the original state to rollback. You would have to do the same with MoveToState3 too. If however the server crashed during a rollback you might still get corrupted state, that's why databases are so good.
Transactional memory fits here the best.
An option could be a transactional storage. Sample implementation you can find here:
http://www.codeproject.com/KB/dotnet/Transactional_Repository.aspx
Memento pattern
Also let me describe a possible pattern on how to implement such behavior:
Define a base class TransactionalEntity. This class contains dictionary of properties.
All your transactional classes inherit from the TransactionalEntity and should operate over some sort of Dependency Properties/Fields, i.e. properties(getters/setters) which store it's values not in local class fields, but in dictionary, which is stored in the base class.
Then you define TransactionContext class. TransactionContext class internally contains a set of dictionaries (one dictionary for each entity that participates in the transaction) and when a transactional entity participates in transaction, it writes all data to the dictionary in the transaction context. Then all you need is basically four methods:
TransactionContext.StartTransaction();
TransactionalEntity.JoinTransaction(TransactionContext context); //if your language/framework supports Thread Static fields, then you do not need this method
TransactionContext.CommitTransaction();
TransactionContext.RollbackTransaction();
To sum up, you need to store state in base class TransactionalEntity and during transaction TransactionalEntity will cooperate with TransactionContext.
I hope, I've explained it well enough.
I think a Command Pattern could be well suited to this problem.
Linky.
I was astonished that no one suggested explicitly the simplest pattern to use .. the State Pattern
In this way you can also eliminate that 'finalState' method and just use 'handle()'.
How do you know which final state is?
The memento pattern is best used with the Command pattern, and usually applies to GUI operations to implement the undo/redo feature.
Fields represent a state of the class
Fields represents the state of the instanced object. You use many times wrong definitions of the OOP terms. Review and correct.