Architecture for a business objects / database access layer - orm

For various reasons, we are writing a new business objects/data storage library. One of the requirements of this layer is to separate the logic of the business rules, and the actual data storage layer.
It is possible to have multiple data storage layers that implement access to the same object - for example, a main "database" data storage source that implements most objects, and another "ldap" source that implements a User object. In this scenario, User can optionally come from an LDAP source, perhaps with slightly different functionality (eg, not possible to save/update the User object), but otherwise it is used by the application the same way. Another data storage type might be a web service, or an external database.
There are two main ways we are looking at implementing this, and me and a co-worker disagree on a fundamental level which is correct. I'd like some advice on which one is the best to use. I'll try to keep my descriptions of each as neutral as possible, as I'm looking for some objective view points here.
Business objects are base classes, and data storage objects inherit business objects. Client code deals with data storage objects.
In this case, common business rules are inherited by each data storage object, and it is the data storage objects that are directly used by the client code.
This has the implication that client code determines which data storage method to use for a given object, because it has to explicitly declare an instance to that type of object. Client code needs to explicitly know connection information for each data storage type it is using.
If a data storage layer implements different functionality for a given object, client code explicitly knows about it at compile time because the object looks different. If the data storage method is changed, client code has to be updated.
Business objects encapsulate data storage objects.
In this case, business objects are directly used by client application. Client application passes along base connection information to business layer. Decision about which data storage method a given object uses is made by business object code. Connection information would be a chunk of data taken from a config file (client app does not really know/care about details of it), which may be a single connection string for a database, or several pieces connection strings for various data storage types. Additional data storage connection types could also be read from another spot - eg, a configuration table in a database that specifies URLs to various web services.
The benefit here is that if a new data storage method is added to an existing object, a configuration setting can be set at runtime to determine which method to use, and it is completely transparent to the client applications. Client apps do not need to be modified if data storage method for a given object changes.
Business objects are base classes, data source objects inherit from business objects. Client code deals primarily with base classes.
This is similar to the first method, but client code declares variables of the base business object types, and Load()/Create()/etc static methods on the business objects return the appropriate data source-typed objects.
The architecture of this solution is similar to the first method, but the main difference is the decision about which data storage object to use for a given business object is made by the business layer, not the client code.
I know there are already existing ORM libraries that provide some of this functionality, but please discount those for now (there is the possibility that a data storage layer is implemented with one of these ORM libraries) - also note I'm deliberately not telling you what language is being used here, other than that it is strongly typed.
I'm looking for some general advice here on which method is better to use (or feel free to suggest something else), and why.

might i suggest another alternative, with possibly better decoupling: business objects use data objects, and data objects implement storage objects. This should keep the business rules in the business objects but without any dependence on the storage source or format, while allowing the data objects to support whatever manipulations are required, including changing the storage objects dynamically (e.g. for online/offline manipulation)
this falls into the second category above (business objects encapsulate data storage objects), but separates data semantics from storage mechanisms more clearly

You can also have a facade to keep from your client to call the business directly. Also it creates common entry points to your business.
As said, your business should not be exposed to anything but your DTO and Facade.
Yes. Your client can deal with DTOs. It's the ideal way to pass data through your application.

I generally prefer the "business object encapsulates data object/storage" best. However, in the short you may find high redundancy with your data objects and your business objects that may seem not worthwhile. This is especially true if you opt for an ORM as the basis of your data-access layer (DAL). But, in the long term is where the real pay off is: application life cycle. As illustrated, it isn't uncommon for "data" to come from one or more storage subsystems (not limited to RDBMS), especially with the advent of cloud computing, and as commonly the case in distributed systems. For example, you may have some data that comes from a Restful service, another chunk or object from a RDBMS, another from an XML file, LDAP, and so on. With this realization, this implies the importance of very good encapsulation of the data access from the business. Take care what dependencies you expose (DI) through your c-tors and properties, too.
That said, an approach I've been toying with is to put the "meat" of the architecture in a business controller. Thinking of contemporary data-access more as a resource than traditional thinking, the controller then accepts in a URI or other form of metadata that can be used to know what data resources it must manage for the business objects. Then, the business objects DO NOT themselves encapsulate the data access; rather the controller does. This keeps your business objects lightweight and specific and allows your controller to provide optimization, composability, transaction ambiance, and so forth. Note that your controller would then "host" your business object collections, much like the controller piece of many ORMs do.
Additionally, also consider business rule management. If you squint hard at your UML (or the model in your head like I do :D ), you will notice that your business rules model are actually another model, sometimes even persistent (if you are using a business rules engine, for example). I'd consider letting the business controller also actually control your rules subsystem too, and let your business object reference the rules through the controller. The reason is because, inevitably, rule implementations often need to perform lookups and cross-checking, in order to determine validity. Often, it might require both hydrated business object lookups, as well as back-end database lookups. Consider detecting duplicate entities, for example, where only the "new" one is hydrated. Leaving your rules to be managed by your business controller, you can then do most anything you need without sacrificing that nice clean abstraction in your "domain model."
In pseudo-code:
using(MyConcreteBusinessContext ctx = new MyConcreteBusinessContext("datares://model1?DataSource=myserver;Catalog=mydatabase;Trusted_Connection=True ruleres://someruleresource?type=StaticRules&handler=My.Org.Business.Model.RuleManager")) {
User user = ctx.GetUserById("SZE543");
user.IsLogonActive = false;
ctx.Save();
}
//a business object
class User : BusinessBase {
public User(BusinessContext ctx) : base(ctx) {}
public bool Validate() {
IValidator v = ctx.GetValidator(this);
return v.Validate();
}
}
// a validator
class UserValidator : BaseValidator, IValidator {
User userInstance;
public UserValidator(User user) {
userInstance = user;
}
public bool Validate() {
// actual validation code here
return true;
}
}

Clients should never deal with storage objects directly. They can deal with DTO's directly, but any object that has any logic for storage that is not wrapped in your business object should not be called by the client directly.

Check out CSLA.net by Rocky Lhotka.

Well, here I am, the co-worker Greg mentioned.
Greg described the alternatives we have been considering with great accuracy. I just want to add some additional considerations to the situation description.
Client code can be unaware about datastorage where business objects are stored, but it is possible either in case when there is only one datastorage, or there are multiple datastorages for the same business object type (users stored in local database and in external LDAP) but the client does not create these business objects. In terms of system analysis, it means that there should be no use cases in which existence of two datastorages of objects of the same type can affect use case flow.
As soon as the need in distinguishing objects created in different data storages arise, the client component must become aware about multiplicity of data storages in its universe, and it will inevitably become responsible for the decision which data storage to use on the moment of object creation (and, I think, object loading from a data storage). Business layer can pretend it is making this decisions, but the algorithm of decision making will be based on type and content of the information coming from the Client component, making the client effectively responsible for the decision.
This responsibility can be implemented in numerous ways: it can be a connection object of specific type for each data storage; it can be segregared methods to call to create new BO instances etc.
Regards,
Michael

CLSA has been around a long time.
However I like the approach that is discussed in Eric Evans book
http://dddcommunity.org/

Related

Bidirectional communication between model objects in Objective C

I have a question regarding programming concepts rather than a specific question relating to some specific code.
I have two model objects, one relating to Core Data and one relating to Twitter.
They need to interact with each other. The Twitter object might want some tweets from the database, whereas the Core Data object might want to write some tweets into the database.
I could write public methods on each of the classes and have each class call those methods.
However, I feel that this is quite a tight coupling and I want to some other method of communication between the objects.
Would a protocol-delegate system be more appropriate in this scenario?
For example, with the Twitter class declaring a TwitterDataSource protocol and the Core Data class acting as the delegate for that protocol. And vica-versa.
Thanks very much,
Vazzyb
You are correct, that coupling would be tight. If you would like to loosen the coupling consider using a Mediator design pattern. As things change you need to only change how the mediator handles the communication between the two objects but not the two individual objects themselves.
(source: devlake.com)
They need to interact with each other. The Twitter object might want some tweets from the database, whereas the Core Data object might want to write some tweets into the database.
Let me stop you right there. That is a terrible design pattern no matter who you are. Your separation of powers, instead of making your life easier, has created a divide in your project that you now have to remedy by making each object reference each other. Both of the activities of these objects falls under the concept of a controller, anyhow. The first can be refactored out into an asynchronous operation, especially if it requires informing the database controller that it's finished. Instead of considering a delegate, write an NSOperation subclass that writes to the database (serially, of course), and make the database controller mediate both the result of the operation, and the tweets that come in from the other object that get written into the database. No more mutual references (this is not bidirectionality), no more dual-controller objects, no more hassle.

DDD: Where to put persistence logic, and when to use ORM mapping

We are taking a long, hard look at our (Java) web application patterns. In the past, we've suffered from an overly anaemic object model and overly procedural separation between controllers, services and DAOs, with simple value objects (basically just bags of data) travelling between them. We've used declarative (XML) managed ORM (Hibernate) for persistence. All entity management has taken place in DAOs.
In trying to move to a richer domain model, we find ourselves struggling with how best to design the persistence layer. I've spent a lot of time reading and thinking about Domain Driven Design patterns. However, I'd like some advice.
First, the things I'm more confident about:
We'll have "thin" controllers at the front that deal only with HTTP and HTML - processing forms, validation, UI logic.
We'll have a layer of stateless business logic services that implements common algorithms or logic, unaware of the UI, but very much aware of (and delegating to) the domain model.
We'll have a richer domain model which contains state, relationships, and logic inherent to the objects in that domain model.
The question comes around persistence. Previously, our services would be injected (via Spring) with DAOs, and would use DAO methods like find() and save() to perform persistence. However, a richer domain model would seem to imply that objects should know how to save and delete themselves, and perhaps that higher level services should know how to locate (query for) domain objects.
Here, a few questions and uncertainties arise:
Do we want to inject DAOs into domain objects, so that they can do "this.someDao.save(this)" in a save() method? This is a little awkward since domain objects are not singletons, so we'll need factories or post-construction setting of DAOs. When loading entities from a database, this gets messy. I know Spring AOP can be used for this, but I couldn't get it to work (using Play! framework, another line of experimentation) and it seems quite messy and magical.
Do we instead keep DAOs (repositories?) completely separate, on par with stateless business logic services? This can make some sense, but it means that if "save" or "delete" are inherent operations of a domain object, the domain object can't express those.
Do we just dispense with DAOs entirely and use JPA to let entities manage themselves.
Herein lies the next subtlety: It's quite convenient to map entities using JPA. The Play! framework gives us a nice entity base class, too, with operations like save() and delete(). However, this means that our domain model entities are quite closely tied to the database structure, and we are passing objects around with a large amount of persistence logic, perhaps all the way up to the view layer. If nothing else, this will make the domain model less re-usable in other contexts.
If we want to avoid this, then we'd need some kind of mapping DAO - either using simple JDBC (or at least Spring's JdbcTemplate), or using a parallel hierarchy of database entities and "business" entities, with DAOs forever copying information from one hierarchy to another.
What is the appropriate design choice here?
Martin
Your questions and doubts ring an interesting alarm here, I think you went a bit too far in your interpretation of a "rich domain model". Richness doesn't go as far as implying that persistence logic must be handled by the domain objects, in other words, no, they shouldn't know how to save and delete themselves (at least not explicitely, though Hibernate actually adds some persistence logic transparently). This is often referred to as persistence ignorance.
I suggest that you keep the existing DAO injection system (a nice thing to have for unit testing) and leave the persistence layer as is while trying to move some business logic to your entities where it's fit. A good starting point to do that is to identify Aggregates and establish your Aggregate Roots. They'll often contain more business logic than the other entities.
However, this is not to say domain objects should contain all logic (especially not logic needed by many other objects across the application, which often belongs in Services).
I am not a Java expert, but I use NHibernate in my .NET code so my experience should be directly translatable to the Java world.
When using ORM (like Hibernate you mentioned) to build Domain-Driven Design application, one of good (I won't say best) practices is to create so-called application services between the UI and the Domain. They are similar to stateless business objects you mentioned, but should contain almost no logic. They should look like this:
public void SayHello(int id, String helloString)
{
SomeDomainObject target = domainObjectRepository.findById(id); //This uses Hibernate to load the object.
target.sayHello(helloString); //There is a single domain object method invocation per application service method.
domainObjectRepository.Save(target); //This one is optional. Hibernate should already know that this object needs saving because it tracks changes.
}
Any changes to objects contained by DomainObject (also adding objects to collections) will be handled by Hibernate.
You will also need some kind of AOP to intercept application service method invocations and create Hibernate's session before the method executes and save changes after method finishes with no exceptions.
There is a really good sample how to do DDD in Java here. It is based on the sample problem from Eric Evans' 'Blue Book'. The application logic class sample code is here.

Object persistence terminology: 'repository' vs. 'store' vs. 'context' vs. 'retriever' vs. (...)

I'm not sure how to name data store classes when designing a program's data access layer (DAL).
(By data store class, I mean a class that is responsible to read a persisted object into memory, or to persist an in-memory object.)
It seems reasonable to name a data store class according to two things:
what kinds of objects it handles;
whether it loads and/or persists such objects.
⇒ A class that loads Banana objects might be called e.g. BananaSource.
I don't know how to go about the second point (ie. the Source bit in the example). I've seen different nouns apparently used for just that purpose:
repository: this sounds very general. Does this denote something read-/write-accessible?
store: this sounds like something that potentially allows write access.
context: sounds very abstract. I've seen this with LINQ and object-relational mappers (ORMs).
P.S. (several months later): This is probably appropriate for containers that contain "active" or otherwise supervised objects (the Unit of Work pattern comes to mind).
retriever: sounds like something read-only.
source & sink: probably not appropriate for object persistence; a better fit with data streams?
reader / writer: quite clear in its intention, but sounds too technical to me.
Are these names arbitrary, or are there widely accepted meanings / semantic differences behind each? More specifically, I wonder:
What names would be appropriate for read-only data stores?
What names would be appropriate for write-only data stores?
What names would be appropriate for mostly read-only data stores that are occasionally updated?
What names would be appropriate for mostly write-only data stores that are occasionally read?
Does one name fit all scenarios equally well?
As noone has yet answered the question, I'll post on what I have decided in the meantime.
Just for the record, I have pretty much decided on calling most data store classes repositories. First, it appears to be the most neutral, non-technical term from the list I suggested, and it seems to be well in line with the Repository pattern.
Generally, "repository" seems to fit well where data retrieval/persistence interfaces are something similar to the following:
public interface IRepository<TResource, TId>
{
int Count { get; }
TResource GetById(TId id);
IEnumerable<TResource> GetManyBySomeCriteria(...);
TId Add(TResource resource);
void Remove(TId id);
void Remove(TResource resource);
...
}
Another term I have decided on using is provider, which I'll be preferring over "repository" whenever objects are generated on-the-fly instead of being retrieved from a persistence store, or when access to a persistence store happens in a purely read-only manner. (Factory would also be appropriate, but sounds more technical, and I have decided against technical terms for most uses.)
P.S.: Some time has gone by since writing this answer, and I've had several opportunities at work to review someone else's code. One term I've thus added to my vocabulary is Service, which I am reserving for SOA scenarios: I might publish a FooService that is backed by a private Foo repository or provider. The "service" is basically just a thin public-facing layer above these that takes care of things like authentication, authorization, or aggregating / batching DTOs for proper "chunkiness" of service responses.
Well so to add something to you conclusion:
A repository: is meant to only care about one entity and has certain patterns like you did.
A store: is allowed to do a bit more, also working with other entities.
A reader/writer: is separated to allow semantically show and inject only reading and wrting functionality into other classes. It's coming from the CQRS pattern.
A context: is more or less bound to a ORM mapper as you mentioned and is usually used under the hood of a repository or store, some use it directly instead of making a repository on top. But it's harder to abstract.

Repository, Service or Domain object - where does logic belong?

Take this simple, contrived example:
UserRepository.GetAllUsers();
UserRepository.GetUserById();
Inevitably, I will have more complex "queries", such as:
//returns users where active=true, deleted=false, and confirmed = true
GetActiveUsers();
I'm having trouble determining where the responsibility of the repository ends. GetActiveUsers() represents a simple "query". Does it belong in the repository?
How about something that involves a bit of logic, such as:
//activate the user, set the activationCode to "used", etc.
ActivateUser(string activationCode);
Repositories are responsible for the application-specific handling of sets of objects. This naturally covers queries as well as set modifications (insert/delete).
ActivateUser operates on a single object. That object needs to be retrieved, then modified. The repository is responsible for retrieving the object from the set; another class would be responsible for invoking the query and using the object.
These are all excellent questions to be asking. Being able to determine which of these you should use comes down to your experience and the problem you are working on.
I would suggest reading a book such as Fowler's patterns of enterprise architecture. In this book he discusses the patterns you mention. Most importantly though he assigns each pattern a responsibility. For instance domain logic can be put in either the Service or Domain layers. There are pros and cons associated with each.
If I decide to use a Service layer I assign the layer the role of handling Transactions and Authorization. I like to keep it 'thin' and have no domain logic in there. It becomes an API for my application. I keep all business logic with the domain objects. This includes algorithms and validation for the object. The repository retrieves and persists the domain objects. This may be a one to one mapping between database columns and domain properties for simple systems.
I think GetAtcitveUsers is ok for the Repository. You wouldnt want to retrieve all users from the database and figure out which ones are active in the application as this would lead to poor performance. If ActivateUser has business logic as you suggest, then that logic belongs in the domain object. Persisting the change is the responsibility of the Repository layer.
Hope this helps.
When building DDD projects I like to differentiate two responsibilities: a Repository and a Finder.
A Repository is responsible for storing aggregate roots and for retrieving them, but only for usage in command processing. By command processing I meant executing any action a user invoked.
A Finder is responsible for querying domain objects for purposes of UI, like grid views and details views.
I don't consider finders to be a part of domain model. The particular IXxxFinder interfaces are placed in presentation layer, not in the domain layer. Implementation of both IXxxRepository and IXxxFinder are placed in data access layer, possibly even in the same class.

What functionality to build into business objects?

What functionality do you think should be built into a persistable business object at bare minimum?
For example:
validation
a way to compare to another object of the same type
undo capability (the ability to roll-back changes)
The functionality dictated by the domain & business.
Read Domain Driven Design.
A persistable business object should consist of the following:
Data
New
Save
Delete
Serialization
Deserialization
Often, you'll abstract the functionality to retrieve them into a repository that supports:
GetByID
GetAll
GetByXYZCriteria
You could also wrap this type of functionality into collection classes (e.g. BusinessObjectTypeCollection), however there's a lot of movement towards using the Repository Pattern in Domain Driven Design to provide these type of accessors (e.g. InvoicingRepository.GetAllCustomers, InvoicingRepository.GetAllInvoices).
You could put the business rules in the New, Save, Update, Delete ... but sometimes you could have an external business rules engine that you pass off the objects to.
This is just one piece of an answer, but I would say that you need a way to get to all objects with which this object has a relationship. In the beginning you may try to be smart and only include one-way navigability for some relationships, but I have found that this is usually more trouble than it's worth.
All persistent frameworks also include finders, ways to do cascading deletes... sorts....
Once you start modeling, all business objects should know how to manage themselves. Whenever you find another class referring TO your business object too much, it's usually time to push that behavior into the business object itself.
Of the three things noted in the question, I would say that validation is the only one that is truly required. The others depend on the overall archetecture of the application.
Also, the business rules should be in the business objects.
Whether an object should do its own serialization is an interesting question. I have had great success in the past by having each object handle its own serialization, but I can also see merit in having a serialization module load and save the business objects just the same way as the GUI writes to and reads from the objects. Then your validation will protect against errors in the database or files too.
I can't think of anything else that is required in general.