Object persistence terminology: 'repository' vs. 'store' vs. 'context' vs. 'retriever' vs. (...) - naming-conventions

I'm not sure how to name data store classes when designing a program's data access layer (DAL).
(By data store class, I mean a class that is responsible to read a persisted object into memory, or to persist an in-memory object.)
It seems reasonable to name a data store class according to two things:
what kinds of objects it handles;
whether it loads and/or persists such objects.
⇒ A class that loads Banana objects might be called e.g. BananaSource.
I don't know how to go about the second point (ie. the Source bit in the example). I've seen different nouns apparently used for just that purpose:
repository: this sounds very general. Does this denote something read-/write-accessible?
store: this sounds like something that potentially allows write access.
context: sounds very abstract. I've seen this with LINQ and object-relational mappers (ORMs).
P.S. (several months later): This is probably appropriate for containers that contain "active" or otherwise supervised objects (the Unit of Work pattern comes to mind).
retriever: sounds like something read-only.
source & sink: probably not appropriate for object persistence; a better fit with data streams?
reader / writer: quite clear in its intention, but sounds too technical to me.
Are these names arbitrary, or are there widely accepted meanings / semantic differences behind each? More specifically, I wonder:
What names would be appropriate for read-only data stores?
What names would be appropriate for write-only data stores?
What names would be appropriate for mostly read-only data stores that are occasionally updated?
What names would be appropriate for mostly write-only data stores that are occasionally read?
Does one name fit all scenarios equally well?

As noone has yet answered the question, I'll post on what I have decided in the meantime.
Just for the record, I have pretty much decided on calling most data store classes repositories. First, it appears to be the most neutral, non-technical term from the list I suggested, and it seems to be well in line with the Repository pattern.
Generally, "repository" seems to fit well where data retrieval/persistence interfaces are something similar to the following:
public interface IRepository<TResource, TId>
{
int Count { get; }
TResource GetById(TId id);
IEnumerable<TResource> GetManyBySomeCriteria(...);
TId Add(TResource resource);
void Remove(TId id);
void Remove(TResource resource);
...
}
Another term I have decided on using is provider, which I'll be preferring over "repository" whenever objects are generated on-the-fly instead of being retrieved from a persistence store, or when access to a persistence store happens in a purely read-only manner. (Factory would also be appropriate, but sounds more technical, and I have decided against technical terms for most uses.)
P.S.: Some time has gone by since writing this answer, and I've had several opportunities at work to review someone else's code. One term I've thus added to my vocabulary is Service, which I am reserving for SOA scenarios: I might publish a FooService that is backed by a private Foo repository or provider. The "service" is basically just a thin public-facing layer above these that takes care of things like authentication, authorization, or aggregating / batching DTOs for proper "chunkiness" of service responses.

Well so to add something to you conclusion:
A repository: is meant to only care about one entity and has certain patterns like you did.
A store: is allowed to do a bit more, also working with other entities.
A reader/writer: is separated to allow semantically show and inject only reading and wrting functionality into other classes. It's coming from the CQRS pattern.
A context: is more or less bound to a ORM mapper as you mentioned and is usually used under the hood of a repository or store, some use it directly instead of making a repository on top. But it's harder to abstract.

Related

Implementing similar UseCases looks like code duplication

I have the following case. User can export several object types (transaction, invoice, etc) to external accounting system.
Export algorithm has steps:
fetch objects by some filter
export objects one by one to the accounting system (web service method per object type)
register the fact that given document was exported, so it wont be exported again
prepare summary for user (num of exported documents, error messages etc)
The algorithm is the same for all object types but there are some important differences which must be handled:
different types
different target web service method, different object to DTO mappings
different filters per object type
I've considered a few solutions:
don't treat the export algorithm as code duplication and implement an algorithm per object type. Export of any data to any external system may be described by such algorithm - does it mean that we should always have one general class to export anything to anywhere?:)
move the differences to strategies (one strategy interface to create abstraction for all differences) - I even implemented it.
use generics - unfortunately I'm coding in PHP and it's not possible
The question:
Is creating a separate export algorithm per object type a code duplication?
Maybe all of them should be treated as separate Use Cases?
If it's a duplication then what techniques to avoid it should I consider?
Description of my first implementation:
In first approach I defined an Exportable abstraction but I was not happy about it. Each object has completely different payload.
An Exportable interface defined only one method getId and it was used to register that object was exported (and thanks to that wont be exported again).
For this purpose the abstraction was fine, but the problem was moved to exportService which had to check the concrete instance to choose DTO mapper and endpoint. So the exportService broke SOLID.
None of the things you have described above are domain-specific logic (and in fact you don't even mention the problem domain in your question), so I don't think it really falls under domain-driven design. Because it's not domain-specific logic I wouldn't worry too much about code duplication, especially considering that the solution doesn't seem obvious.
Keep it simple and just write out each use case separately. If you find that there's common code that's easily refactored, do so after you get everything working smooth. Don't overthink it or add patterns before they are obviously necessary.

NHibernate and repositories design pattern

I've been working with NHibernate for quite a while and have come to realize that my architecture might be a bit...dated. This is for an NHibernate library that is living behind several apps that are related to each other.
First off, I have a DomainEntity base class with an ID and an EntityID (which is a guid that I use when I expose an item to the web or through an API, instead of the internal integer id - I think I had a reason for it at one point, but I'm not sure it's really valid now). I have a Repository base class where T inherits from DomainEntity that provides a set of generalized search methods. The inheritors of DomainEntity may also implement several interfaces that track things like created date, created by, etc., that are largely a log for the most recent changes to the object. I'm not fond of using a repository pattern for this, but it wraps the logic of setting those values when an object is saved (provided that object is saved through the repository and not saved as part of saving something else).
I would like to rid myself of the repositories. They don't make me happy and really seem like clutter these days, especially now that I'm not querying with hql and now that I can use extension methods on the Session object. But how do I hook up this sort of functionality cleanly? Let's assume for the purposes of discussion that I have something like structuremap set up and capable of returning an object that exposes context information (current user and the like), what would be a good flexible way of providing this functionality outside of the repository structure? Bonus points if this can be hooked up along with a convention-based mapping setup (I'm looking into replacing the XML files as well).
If you dislike the fact that repositories can become bloated over time then you may want to use something like Query Objects.
The basic idea is that you break down a single query into an individual object that you can then apply it to the database.
Some example implementation links here.

What design pattern is used by IProject.setDescription in Eclipse

I'm designing an API with a specific pattern in mind, but don't know if this pattern has a name. It's similar to the Command pattern in GoF (Gang of Four) but not exactly.
One simple example of it I can find is in Eclipse where you manipulate a project (IProject), not by calling methods on the project that change its state, but by this 3 step process:
extracting its state into a descriptor object (IProjectDescription) with getDescription
setting properties on the descriptor. E.g. setName
applying the descriptor back to the original project with setDescription
The general principle seems to be that you have a complex object as part of a framework with many potentially interdependent properties, and rather than working directly on that object, one property at a time, you extract the properties into a simple data object, manipulate that, and apply it back.
It has some of the attributes of the Command pattern, in that the data object encapsulates all of the changes like a Command would - but it's not really a Command, because you don't execute it on the object, it's simply a representation of the state of the object.
It also has some attributes of a Transactional API, in that, by making the changes all in one hit with the set... call, you allow for the entire modification to effectively "roll back" if any one property changes fails. But while that's an advantage of the approach, it's not really the main purpose of it. And what's more, you can achieve the transactional nature without this approach, by simply adding transactional methods to the API (like commit and rollback)
There are two advantages in this pattern that I do want to exploit - although I don't see them being exploited by the eclipse example above:
You can represent the meaningful state of the underlying object while its implementation changes. This is useful for upgrading, or copying state from different types of representations. Say I release a new version of my API where I create an object Foo2 which is a totally new form of my old Foo1, but both have the same basic properties. To upgrade a Foo1 to a Foo2, I can extract those properties as a FooState. foo2.setFooState(foo1.getFooState) as simply as that. The way in which the properties are interpreted and represented is encapsulated in the Foos and can be totally different.
I can persist and transmit the state of the underlying object with my simple data object, where persisting the object itself would be much more complex. So I can extract the state of Foo as a FooState, and persist it as a simple XML document then later apply it to some new object by "loading" it and applying it. Or I can transmit the FooState simply to a webservice as a JSON object whereas the Foo itself is too big and complex to transmit. (Or the objects on each end of the service call are entirely different, like Foo1 and Foo2)
Anyway, I can't find an name or example of this pattern anywhere, neither in the Gang of Four design patterns, nor even in Martin Fowler's comprehensive "bliki"
Data Transfer Object(DTO) that Martin Fowler describes in his book Principles of Enterprise Application Architecture seems to be for the purpose you describe in point 2.
A DTO is a fairly simple extraction of the more complex Domain Model that it represents.
Fowler describes that the usage of a DTO in combination with an assembler can be used to keep the DTO independent from the actual Domain Object(or Objects) that it is supposed to represent. The assembler knows how to create a DTO from the Domain Object and vice versa. Also he mentions that the DTO needs to be serializable to persist/transmit its state. What you describe in point 2 seems to match this description.
What you've described in point 1 though does not seem to be an intended purpose, but definitely seems achievable using this pattern.
I'm not sure if you went through the Pattern catalog of his book or the book itself. The book itself describes this in much greater detail.
You may also want to have a look at Transfer Object definition from Oracle which Fowler says here is what he describes as DTO.
Not every design is documented as a single Design Pattern, in fact most system designs are combinations of multiple patterns.
However one part of what you're doing, with IProjectDescription is using a Memento, however yours seems to be a Polymorphic variation. Consider Patterns as they appear in Pattern Catalogues to be the pared down to the essential starting point not the end result. Patterns are by there very nature supposed to be extended and combined.
The Command pattern can give you Commit and RollBack (Do/Undo) and combining it with Memento in that way is a quite common approach. The same thing is seen in the Java Servlet API with HttpRequest & HttpResponse.

Single Responsibility Principle vs Anemic Domain Model anti-pattern

I'm in a project that takes the Single Responsibility Principle pretty seriously. We have a lot of small classes and things are quite simple. However, we have an anemic domain model - there is no behaviour in any of our model classes, they are just property bags. This isn't a complaint about our design - it actually seems to work quite well
During design reviews, SRP is brought out whenever new behaviour is added to the system, and so new behaviour typically ends up in a new class. This keeps things very easily unit testable, but I am perplexed sometimes because it feels like pulling behaviour out of the place where it's relevant.
I'm trying to improve my understanding of how to apply SRP properly. It seems to me that SRP is in opposition to adding business modelling behaviour that shares the same context to one object, because the object inevitably ends up either doing more than one related thing, or doing one thing but knowing multiple business rules that change the shape of its outputs.
If that is so, then it feels like the end result is an Anemic Domain Model, which is certainly the case in our project. Yet the Anemic Domain Model is an anti-pattern.
Can these two ideas coexist?
EDIT: A couple of context related links:
SRP - http://www.objectmentor.com/resources/articles/srp.pdf
Anemic Domain Model - http://martinfowler.com/bliki/AnemicDomainModel.html
I'm not the kind of developer who just likes to find a prophet and follow what they say as gospel. So I don't provide links to these as a way of stating "these are the rules", just as a source of definition of the two concepts.
Rich Domain Model (RDM) and Single Responsibility Principle (SRP) are not necessarily at odds. RDM is more at odds with a very specialised subclassof SRP - the model advocating "data beans + all business logic in controller classes" (DBABLICC).
If you read Martin's SRP chapter, you'll see his modem example is entirely in the domain layer, but abstracting the DataChannel and Connection concepts as separate classes. He keeps the Modem itself as a wrapper, since that is useful abstraction for client code. It's much more about proper (re)factoring than mere layering. Cohesion and coupling are still the base principles of design.
Finally, three issues:
As Martin notes himself, it's not always easy to see the different 'reasons for change'. The very concepts of YAGNI, Agile, etc. discourage the anticipation of future reasons for change, so we shouldn't invent ones where they aren't immediately obvious. I see 'premature, anticipated reasons for change' as a real risk in applying SRP and should be managed by the developer.
Further to the previous, even correct (but unnecessary anal) application of SRP may result in unwanted complexity. Always think about the next poor sod who has to maintain your class: will the diligent abstraction of trivial behaviour into its own interfaces, base classes and one-line implementations really aid his understanding of what should simply have been a single class?
Software design is often about getting the best compromise between competing forces. For example, a layered architecture is mostly a good application of SRP, but what about the fact that, for example, the change of a property of a business class from, say, a boolean to an enum has a ripple effect across all the layers - from db through domain, facades, web service, to GUI? Does this point to bad design? Not necessarily: it points to the fact that your design favours one aspect of change to another.
I'd have to say "yes", but you have to do your SRP properly. If the same operation applies to only one class, it belongs in that class, wouldn't you say? How about if the same operation applies to multiple classes? In that case, if you want to follow the OO model of combining data and behavior, you'd put the operation into a base class, no?
I suspect that from your description, you're ending up with classes which are basically bags of operations, so you've essentially recreated the C-style of coding: structs and modules.
From the linked SRP paper:
"The SRP is one of the simplest of the principle, and one of the hardest to get right."
The quote from the SRP paper is very correct; SRP is hard to get right. This one and OCP are the two elements of SOLID that simply must be relaxed to at least some degree in order to actually get a project done. Overzealous application of either will very quickly produce ravioli code.
SRP can indeed be taken to ridiculous lengths, if the "reasons for change" are too specific. Even a POCO/POJO "data bag" can be thought of as violating SRP, if you consider the type of a field changing as a "change". You'd think common sense would tell you that a field's type changing is a necessary allowance for "change", but I've seen domain layers with wrappers for built-in value types; a hell that makes ADM look like Utopia.
It's often good to ground yourself with some realistic goal, based on readability or a desired cohesion level. When you say, "I want this class to do one thing", it should have no more or less than what is necessary to do it. You can maintain at least procedural cohesion with this basic philosophy. "I want this class to maintain all the data for an invoice" will generally allow SOME business logic, even summing subtotals or calculating sales tax, based on the object's responsibility to know how to give you an accurate, internally-consistent value for any field it contains.
I personally do not have a big problem with a "lightweight" domain. Just having the one role of being the "data expert" makes the domain object the keeper of every field/property pertinent to the class, as well as all calculated field logic, any explicit/implicit data type conversions, and possibly the simpler validation rules (i.e. required fields, value limits, things that would break the instance internally if allowed). If a calculation algorithm, perhaps for a weighted or rolling average, is likely to change, encapsulate the algorithm and refer to it in the calculated field (that's just good OCP/PV).
I don't consider such a domain object to be "anemic". My perception of that term is a "data bag", a collection of fields that has no concept whatsoever of the outside world or even the relation between its fields other than that it contains them. I've seen that too, and it's not fun tracking down inconsistencies in object state that the object never knew was a problem. Overzealous SRP will lead to this by stating that a data object is not responsible for any business logic, but common sense would generally intervene first and say that the object, as the data expert, must be responsible for maintaining a consistent internal state.
Again, personal opinion, I prefer the Repository pattern to Active Record. One object, with one responsibility, and very little if anything else in the system above that layer has to know anything about how it works. Active Record requires the domain layer to know at least some specific details about the persistence method or framework (whether that be the names of stored procedures used to read/write each class, framework-specific object references, or attributes decorating the fields with ORM information), and thus injects a second reason to change into every domain class by default.
My $0.02.
I've found following the solid principles did in fact lead me away from DDD's rich domain model, in the end, I found I didn't care. More to the point, I found that the logical concept of a domain model, and a class in whatever language weren't mapped 1:1, unless we were talking about a facade of some sort.
I wouldn't say this is exactly a c-style of programming where you have structs and modules, but rather you'll probably end up with something more functional, I realise the styles are similar, but the details make a big difference. I found my class instances end up behaving like higher order functions, partial functions application, lazily evaluated functions, or some combination of the above. It's somewhat ineffable for me, but that's the feeling I get from writing code following TDD + SOLID, it ended up behaving like a hybrid OO/Functional style.
As for inheritance being a bad word, i think that's more due to the fact that the inheritance isn't sufficiently fine grained enough in languages like Java/C#. In other languages, it's less of an issue, and more useful.
I like the definition of SRP as:
"A class has only one business reason to change"
So, as long as behaviours can be grouped into single "business reasons" then there is no reason for them not to co-exist in the same class. Of course, what defines a "business reason" is open to debate (and should be debated by all stakeholders).
Before I get into my rant, here's my opinion in a nutshell: somewhere everything has got to come together... and then a river runs through it.
I am haunted by coding.
=======
Anemic data model and me... well, we pal around a lot. Maybe it's just the nature of small to medium sized applications with very little business logic built into them. Maybe I am just a bit 'tarded.
However, here's my 2 cents:
Couldn't you just factor out the code in the entities and tie it up to an interface?
public class Object1
{
public string Property1 { get; set; }
public string Property2 { get; set; }
private IAction1 action1;
public Object1(IAction1 action1)
{
this.action1 = action1;
}
public void DoAction1()
{
action1.Do(Property1);
}
}
public interface IAction1
{
void Do(string input1);
}
Does this somehow violate the principles of SRP?
Furthermore, isn't having a bunch of classes sitting around not tied to each other by anything but the consuming code actually a larger violation of SRP, but pushed up a layer?
Imagine the guy writing the client code sitting there trying to figure out how to do something related to Object1. If he has to work with your model he will be working with Object1, the data bag, and a bunch of 'services' each with a single responsibility. It'll be his job to make sure all those things interact properly. So now his code becomes a transaction script, and that script will itself contain every responsibility necessary to properly complete that particular transaction (or unit of work).
Furthermore, you could say, "no brah, all he needs to do is access the service layer. It's like Object1Service.DoActionX(Object1). Piece of cake." Well then, where's the logic now? All in that one method? Your still just pushing code around, and no matter what, you'll end up with data and the logic being separated.
So in this scenario, why not expose to the client code that particular Object1Service and have it's DoActionX() basically just be another hook for your domain model? By this I mean:
public class Object1Service
{
private Object1Repository repository;
public Object1Service(Object1Repository repository)
{
this.repository = repository;
}
// Tie in your Unit of Work Aspect'ing stuff or whatever if need be
public void DoAction1(Object1DTO object1DTO)
{
Object1 object1 = repository.GetById(object1DTO.Id);
object1.DoAction1();
repository.Save(object1);
}
}
You still have factored out the actual code for Action1 from Object1 but for all intensive purposes, have a non-anemic Object1.
Say you need Action1 to represent 2 (or more) different operations that you would like to make atomic and separated into their own classes. Just create an interface for each atomic operation and hook it up inside of DoAction1.
That's how I might approach this situation. But then again, I don't really know what SRP is all about.
Convert your plain domain objects to ActiveRecord pattern with a common base class to all domain objects. Put common behaviour in the base class and override the behaviour in derived classes wherever necessary or define the new behaviour wherever required.

Should entities have behavior or not?

Should entities have behavior? or not?
Why or why not?
If not, does that violate Encapsulation?
If your entities do not have behavior, then you are not writing object-oriented code. If everything is done with getters and setters and no other behavior, you're writing procedural code.
A lot of shops say they're practicing SOA when they keep their entities dumb. Their justification is that the data structure rarely changes, but the business logic does. This is a fallacy. There are plenty of patterns to deal with this problem, and they don't involve reducing everything to bags of getters and setters.
Entities should not have behavior. They represent data and data itself is passive.
I am currently working on a legacy project that has included behavior in entities and it is a nightmare, code that no one wants to touch.
You can read more on my blog post: Object-Oriented Anti-Pattern - Data Objects with Behavior .
[Preview] Object-Oriented Anti-Pattern - Data Objects with Behavior:
Attributes and Behavior
Objects are made up of attributes and behavior but Data Objects by definition represent only data and hence can have only attributes. Books, Movies, Files, even IO Streams do not have behavior. A book has a title but it does not know how to read. A movie has actors but it does not know how to play. A file has content but it does not know how to delete. A stream has content but it does not know how to open/close or stop. These are all examples of Data Objects that have attributes but do not have behavior. As such, they should be treated as dumb data objects and we as software engineers should not force behavior upon them.
Passing Around Data Instead of Behavior
Data Objects are moved around through different execution environments but behavior should be encapsulated and is usually pertinent only to one environment. In any application data is passed around, parsed, manipulated, persisted, retrieved, serialized, deserialized, and so on. An entity for example usually passes from the hibernate layer, to the service layer, to the frontend layer, and back again. In a distributed system it might pass through several pipes, queues, caches and end up in a new execution context. Attributes can apply to all three layers, but particular behavior such as save, parse, serialize only make sense in individual layers. Therefore, adding behavior to data objects violates encapsulation, modularization and even security principles.
Code written like this:
book.Write();
book.Print();
book.Publish();
book.Buy();
book.Open();
book.Read();
book.Highlight();
book.Bookmark();
book.GetRelatedBooks();
can be refactored like so:
Book book = author.WriteBook();
printer.Print(book);
publisher.Publish(book);
customer.Buy(book);
reader = new BookReader();
reader.Open(Book);
reader.Read();
reader.Highlight();
reader.Bookmark();
librarian.GetRelatedBooks(book);
What a difference natural object-oriented modeling can make! We went from a single monstrous Book class to six separate classes, each of them responsible for their own individual behavior.
This makes the code:
easier to read and understand because it is more natural
easier to update because the functionality is contained in smaller encapsulated classes
more flexible because we can easily substitute one or more of the six individual classes with overridden versions.
easier to test because the functionality is separated, and easier to mock
It depends on what kind of entity they are -- but the term "entity" implies, to me at least, business entities, in which case they should have behavior.
A "Business Entity" is a modeling of a real world object, and it should encapsulate all of the business logic (behavior) and properties/data that the object representation has in the context of your software.
If you're strictly following MVC, your model (entities) won't have any inherent behavior. I do however include whatever helper methods allow the easiest management of the entities persistence, including methods that help with maintaining its relationship to other entities.
If you plan on exposing your entities to the world, you're better off (generally) keeping behavior off of the entity. If you want to centralize your business operations (i.e. ValidateVendorOrder) you wouldn't want the Order to have an IsValid() method that runs some logic to validate itself. You don't want that code running on a client (what if they fudge it. i.e. akin to not providing any client UI to set the price on an item being placed in a shopping cart, but posting a a bogus price on the URL. If you don't have server-side validation, that's not good! And duplicating that validation is...redundant...DRY (Don't Repeat Yourself).
Another example of when having behaviors on an entity just doesn't work is the notion of lazy loading. Alot of ORMs today will allow you to lazy load data when a property is accessed on an entities. If you're building a 3-tier app, this just doesn't work as your client will ultimately inadvertantly try to make database calls when accessing properties.
These are my off-the-top-of-my-head arguments for keeping behavior off of entities.