Design for persistent objects with decoupled storage

Design for persistent objects with decoupled storage - oop

An application uses several type of data coded as objects. These objects require to be persistent, and the storage back-end may change (file system, sqlite, nedb are likely options).
What are the best way to design the related code to minimize the hassle to change storage ? A specific Store object to which I would pass my objects ? Have my objects inherit from on that does the storage ? Should my object "self-store" or not ?
For information, the practical case is for a local webapp using node-webkit (javascript) but the answer should probably not be language dependent, as long as it is object oriented.

Whenever you have a functionality that may change, abstract it by introducing an interface, and provide an implementation. In your specific example, you can define an interface IRepository, which would have create/read/update/delete methods for manipulating your objects. Then, you can provide implementations that work with specific storage. In your unit tests, for example, you may use a memory-based implementation that does not even deal with files/DBs/etc, but keeps data in lists.

Related

How do I refactor a class with lots of operations which all require its internal data?

My Problem
I have a class with just a few fields but which represents a relatively complicated data structure. This class is central in my program and over time I found myself adding more and more functionality into it, making things a mess. Since (almost) all of its methods rely on its internal fields, I could not think of a way to move some of the methods elsewhere, even though most methods are independent of each other. How can I refactor this class to make it simpler and reduce the number of methods which are directly implemented in it?
More Information
The class in question represents a sort of automaton. It supports a ton of operations such as retrieving information about it, performing various binary operations between it and other automata, querying for specific information stored inside it, saving it to file, etc. Almost all of these operations depend on the precise implementation of the class - in my specific case I maintain an edge-set-based implementation, but other implementations were also used in the past and might be used again in the future.
Except for a narrow set of basic helper methods which are commonly used, most methods are independent of each other.
The language I am using is Java, but I'm hoping for general answers which could be applied to any statically-typed, object-oriented language.
What I've Tried
I tried refactoring it somehow to multiple types, but each of its operations require access to most of its fields, and I'm hesitant about migrating these operations elsewhere because I can't think of a way to do that without exposing the class's implementation.
I'm also not sure where I should migrate the operations to, assuming they are indeed independent of the implementation. An external utility class? An abstract base type? Will appreciate any input about this.

Perhaps you could remodel the data that your class holds, so that instead of holding the data directly, it holds objects that hold the data? Then you could move the methods that manipulate that data into the new classes, leaving the original class as a sort of container / dispatcher class.

What functionality to build into business objects?

What functionality do you think should be built into a persistable business object at bare minimum?
For example:
validation
a way to compare to another object of the same type
undo capability (the ability to roll-back changes)

The functionality dictated by the domain & business.
Read Domain Driven Design.

A persistable business object should consist of the following:
Data
New
Save
Delete
Serialization
Deserialization
Often, you'll abstract the functionality to retrieve them into a repository that supports:
GetByID
GetAll
GetByXYZCriteria
You could also wrap this type of functionality into collection classes (e.g. BusinessObjectTypeCollection), however there's a lot of movement towards using the Repository Pattern in Domain Driven Design to provide these type of accessors (e.g. InvoicingRepository.GetAllCustomers, InvoicingRepository.GetAllInvoices).
You could put the business rules in the New, Save, Update, Delete ... but sometimes you could have an external business rules engine that you pass off the objects to.

This is just one piece of an answer, but I would say that you need a way to get to all objects with which this object has a relationship. In the beginning you may try to be smart and only include one-way navigability for some relationships, but I have found that this is usually more trouble than it's worth.
All persistent frameworks also include finders, ways to do cascading deletes... sorts....
Once you start modeling, all business objects should know how to manage themselves. Whenever you find another class referring TO your business object too much, it's usually time to push that behavior into the business object itself.

Of the three things noted in the question, I would say that validation is the only one that is truly required. The others depend on the overall archetecture of the application.
Also, the business rules should be in the business objects.
Whether an object should do its own serialization is an interesting question. I have had great success in the past by having each object handle its own serialization, but I can also see merit in having a serialization module load and save the business objects just the same way as the GUI writes to and reads from the objects. Then your validation will protect against errors in the database or files too.
I can't think of anything else that is required in general.

How to pass global values around (singleton vs. ???)

I've inherited a project that stores various parameters either in a config file, the registry and a database. Whoever needs one of these parameters just reads (and in some cases writes) it directly from the store. This, or course, is stupid, so my first thought was to refactor the existing code so that the client doesn't know where the parameter is stored in. I created a classic AppSettings class that has a property for each parameter. Since the store has to have global scope I made a thread-safe singleton. The class doesn't store the parameter values in fields but rather acts as an access point by reading and writing them to and from the actual store, be it config file, registry or database. These days it's hard to avoid all the talk about the dangers of singletons and global state. I will take a proper look at dependency injection and Spring etc later, but for now, I just have a couple of questions.
What type of problems, other than testability, can you see with my solution?
What would be a light weight alternative? Creating a factory for each object that uses the parameters is not an option (too much work).
Wouldn't using a singleton serve as an acceptable compromise until I have a chance to do some heavier refactoring?
If the properties in my singleton class only had getters, would that make it OK?
I can anticipate that the store for some of the parameters will change in the future (eg. from registry to database), so that was my motivation for hiding the store behind a singleton class.

This is a bit of a non-answer, but I highly recommend the c2wiki's pages on Singletons as a reference http://c2.com/cgi/wiki?search=Singleton
And also the page http://c2.com/cgi/wiki?GlobalVariablesAreBad
I think the general verdict is that global state creates coupling between vastly different parts of your system that must be thought about and designed around very carefully. The question is, are all of those settings truly global and needed by disparate parts of the system? If not, then is there any way to separate them into smaller parts that can live inside different modules at a lower access level?
If it's a small project I wouldn't worry too much about it, but there is a lot of wisdom on those c2wiki pages about global state and singletons being a pain for larger projects.

I would challenge the assumption that since the config data is global, that you need a global singleton to access it, especially for reading. Consider creating an AppSettings class that can be invoked as needed to read your config settings.
If you need to write in a thread-safe manner, you can create a static (or singleton) private member of the AppSettings class to control writing only. Thus any instance of AppSettings can write, but the "global" access is actually restricted to the AppSettings class.

Thanks, guys. I would consider this a medium-size project (about 200KLOC) and it's C#. The problem is that the project has a long and troubled history and a lot coders have worked on it. As much as I'd like to properly learn dependency injection (as I do understand and subscribe to the concept), the deadline is closing fast so now is not the time for it. After looking at my current singleton class I decided to split it into two instance classes. Some of the parameters are used all over but some only in a single assembly. And like Doug said, I can achieve thread-safety just as easily with an instance class.
As for the various dependency injection frameworks, the problem is there are too many. I've briefly looked at Spring and Unity. I wish I could find a summary of the differences.
Thanks again!

Architecture for a business objects / database access layer

For various reasons, we are writing a new business objects/data storage library. One of the requirements of this layer is to separate the logic of the business rules, and the actual data storage layer.
It is possible to have multiple data storage layers that implement access to the same object - for example, a main "database" data storage source that implements most objects, and another "ldap" source that implements a User object. In this scenario, User can optionally come from an LDAP source, perhaps with slightly different functionality (eg, not possible to save/update the User object), but otherwise it is used by the application the same way. Another data storage type might be a web service, or an external database.
There are two main ways we are looking at implementing this, and me and a co-worker disagree on a fundamental level which is correct. I'd like some advice on which one is the best to use. I'll try to keep my descriptions of each as neutral as possible, as I'm looking for some objective view points here.
Business objects are base classes, and data storage objects inherit business objects. Client code deals with data storage objects.
In this case, common business rules are inherited by each data storage object, and it is the data storage objects that are directly used by the client code.
This has the implication that client code determines which data storage method to use for a given object, because it has to explicitly declare an instance to that type of object. Client code needs to explicitly know connection information for each data storage type it is using.
If a data storage layer implements different functionality for a given object, client code explicitly knows about it at compile time because the object looks different. If the data storage method is changed, client code has to be updated.
Business objects encapsulate data storage objects.
In this case, business objects are directly used by client application. Client application passes along base connection information to business layer. Decision about which data storage method a given object uses is made by business object code. Connection information would be a chunk of data taken from a config file (client app does not really know/care about details of it), which may be a single connection string for a database, or several pieces connection strings for various data storage types. Additional data storage connection types could also be read from another spot - eg, a configuration table in a database that specifies URLs to various web services.
The benefit here is that if a new data storage method is added to an existing object, a configuration setting can be set at runtime to determine which method to use, and it is completely transparent to the client applications. Client apps do not need to be modified if data storage method for a given object changes.
Business objects are base classes, data source objects inherit from business objects. Client code deals primarily with base classes.
This is similar to the first method, but client code declares variables of the base business object types, and Load()/Create()/etc static methods on the business objects return the appropriate data source-typed objects.
The architecture of this solution is similar to the first method, but the main difference is the decision about which data storage object to use for a given business object is made by the business layer, not the client code.
I know there are already existing ORM libraries that provide some of this functionality, but please discount those for now (there is the possibility that a data storage layer is implemented with one of these ORM libraries) - also note I'm deliberately not telling you what language is being used here, other than that it is strongly typed.
I'm looking for some general advice here on which method is better to use (or feel free to suggest something else), and why.

might i suggest another alternative, with possibly better decoupling: business objects use data objects, and data objects implement storage objects. This should keep the business rules in the business objects but without any dependence on the storage source or format, while allowing the data objects to support whatever manipulations are required, including changing the storage objects dynamically (e.g. for online/offline manipulation)
this falls into the second category above (business objects encapsulate data storage objects), but separates data semantics from storage mechanisms more clearly

You can also have a facade to keep from your client to call the business directly. Also it creates common entry points to your business.
As said, your business should not be exposed to anything but your DTO and Facade.
Yes. Your client can deal with DTOs. It's the ideal way to pass data through your application.

I generally prefer the "business object encapsulates data object/storage" best. However, in the short you may find high redundancy with your data objects and your business objects that may seem not worthwhile. This is especially true if you opt for an ORM as the basis of your data-access layer (DAL). But, in the long term is where the real pay off is: application life cycle. As illustrated, it isn't uncommon for "data" to come from one or more storage subsystems (not limited to RDBMS), especially with the advent of cloud computing, and as commonly the case in distributed systems. For example, you may have some data that comes from a Restful service, another chunk or object from a RDBMS, another from an XML file, LDAP, and so on. With this realization, this implies the importance of very good encapsulation of the data access from the business. Take care what dependencies you expose (DI) through your c-tors and properties, too.
That said, an approach I've been toying with is to put the "meat" of the architecture in a business controller. Thinking of contemporary data-access more as a resource than traditional thinking, the controller then accepts in a URI or other form of metadata that can be used to know what data resources it must manage for the business objects. Then, the business objects DO NOT themselves encapsulate the data access; rather the controller does. This keeps your business objects lightweight and specific and allows your controller to provide optimization, composability, transaction ambiance, and so forth. Note that your controller would then "host" your business object collections, much like the controller piece of many ORMs do.
Additionally, also consider business rule management. If you squint hard at your UML (or the model in your head like I do :D ), you will notice that your business rules model are actually another model, sometimes even persistent (if you are using a business rules engine, for example). I'd consider letting the business controller also actually control your rules subsystem too, and let your business object reference the rules through the controller. The reason is because, inevitably, rule implementations often need to perform lookups and cross-checking, in order to determine validity. Often, it might require both hydrated business object lookups, as well as back-end database lookups. Consider detecting duplicate entities, for example, where only the "new" one is hydrated. Leaving your rules to be managed by your business controller, you can then do most anything you need without sacrificing that nice clean abstraction in your "domain model."
In pseudo-code:
using(MyConcreteBusinessContext ctx = new MyConcreteBusinessContext("datares://model1?DataSource=myserver;Catalog=mydatabase;Trusted_Connection=True ruleres://someruleresource?type=StaticRules&handler=My.Org.Business.Model.RuleManager")) {
User user = ctx.GetUserById("SZE543");
user.IsLogonActive = false;
ctx.Save();
}
//a business object
class User : BusinessBase {
public User(BusinessContext ctx) : base(ctx) {}
public bool Validate() {
IValidator v = ctx.GetValidator(this);
return v.Validate();
}
}
// a validator
class UserValidator : BaseValidator, IValidator {
User userInstance;
public UserValidator(User user) {
userInstance = user;
}
public bool Validate() {
// actual validation code here
return true;
}
}

Clients should never deal with storage objects directly. They can deal with DTO's directly, but any object that has any logic for storage that is not wrapped in your business object should not be called by the client directly.

Check out CSLA.net by Rocky Lhotka.

Well, here I am, the co-worker Greg mentioned.
Greg described the alternatives we have been considering with great accuracy. I just want to add some additional considerations to the situation description.
Client code can be unaware about datastorage where business objects are stored, but it is possible either in case when there is only one datastorage, or there are multiple datastorages for the same business object type (users stored in local database and in external LDAP) but the client does not create these business objects. In terms of system analysis, it means that there should be no use cases in which existence of two datastorages of objects of the same type can affect use case flow.
As soon as the need in distinguishing objects created in different data storages arise, the client component must become aware about multiplicity of data storages in its universe, and it will inevitably become responsible for the decision which data storage to use on the moment of object creation (and, I think, object loading from a data storage). Business layer can pretend it is making this decisions, but the algorithm of decision making will be based on type and content of the information coming from the Client component, making the client effectively responsible for the decision.
This responsibility can be implemented in numerous ways: it can be a connection object of specific type for each data storage; it can be segregared methods to call to create new BO instances etc.
Regards,
Michael

CLSA has been around a long time.
However I like the approach that is discussed in Eric Evans book
http://dddcommunity.org/

Dealing with "global" data structures in an object-oriented world

This is a question with many answers - I am interested in knowing what others consider to be "best practice".
Consider the following situation: you have an object-oriented program that contains one or more data structures that are needed by many different classes. How do you make these data structures accessible?
You can explicitly pass references around, for example, in the constructors. This is the "proper" solution, but it means duplicating parameters and instance variables all over the program. This makes changes or additions to the global data difficult.
You can put all of the data structures inside of a single object, and pass around references to this object. This can either be an object created just for this purpose, or it could be the "main" object of your program. This simplifies the problems of (1), but the data structures may or may not have anything to do with one another, and collecting them together in a single object is pretty arbitrary.
You can make the data structures "static". This lets you reference them directly from other classes, without having to pass around references. This entirely avoids the disadvantages of (1), but is clearly not OO. This also means that there can only ever be a single instance of the program.
When there are a lot of data structures, all required by a lot of classes, I tend to use (2). This is a compromise between OO-purity and practicality. What do other folks do? (For what it's worth, I mostly come from the Java world, but this discussion is applicable to any OO language.)

Global data isn't as bad as many OO purists claim!
After all, when implementing OO classes you've usually using an API to your OS. What the heck is this if it isn't a huge pile of global data and services!
If you use some global stuff in your program, you're merely extending this huge environment your class implementation can already see of the OS with a bit of data that is domain specific to your app.
Passing pointers/references everywhere is often taught in OO courses and books, academically it sounds nice. Pragmatically, it is often the thing to do, but it is misguided to follow this rule blindly and absolutely. For a decent sized program, you can end up with a pile of references being passed all over the place and it can result in completely unnecessary drudgery work.
Globally accessible services/data providers (abstracted away behind a nice interface obviously) are pretty much a must in a decent sized app.

I must really really discourage you from using option 3 - making the data static. I've worked on several projects where the early developers made some core data static, only to later realise they did need to run two copies of the program - and incurred a huge amount of work making the data non-static and carefully putting in references into everything.
So in my experience, if you do 3), you will eventually end up doing 1) at twice the cost.
Go for 1, and be fine-grained about what data structures you reference from each object. Don't use "context objects", just pass in precisely the data needed. Yes, it makes the code more complicated, but on the plus side, it makes it clearer - the fact that a FwurzleDigestionListener is holding a reference to both a Fwurzle and a DigestionTract immediately gives the reader an idea about its purpose.
And by definition, if the data format changes, so will the classes that operate on it, so you have to change them anyway.

You might want to think about altering the requirement that lots of objects need to know about the same data structures. One reason there does not seem to be a clean OO way of sharing data is that sharing data is not very object-oriented.
You will need to look at the specifics of your application but the general idea is to have one object responsible for the shared data which provides services to the other objects based on the data encapsulated in it. However these services should not involve giving other objects the data structures - merely giving other objects the pieces of information they need to meet their responsibilites and performing mutations on the data structures internally.

I tend to use 3) and be very careful about the synchronisation and locking across threads. I agree it is less OO, but then you confess to having global data, which is very un-OO in the first place.
Don't get too hung up on whether you are sticking purely to one programming methodology or another, find a solution which fits your problem. I think there are perfectly valid contexts for singletons (Logging for instance).

I use a combination of having one global object and passing interfaces in via constructors.
From the one main global object (usually named after what your program is called or does) you can start up other globals (maybe that have their own threads). This lets you control the setting up of program objects in the main objects constructor and tearing them down again in the right order when the application stops in this main objects destructor. Using static classes directly makes it tricky to initialize/uninitialize any resources these classes use in a controlled manner. This main global object also has properties for getting at the interfaces of different sub-systems of your application that various objects may want to get hold of to do their work.
I also pass references to relevant data-structures into constructors of some objects where I feel it is useful to isolate those objects from the rest of the world within the program when they only need to be concerned with a small part of it.
Whether an object grabs the global object and navigates its properties to get the interfaces it wants or gets passed the interfaces it uses via its constructor is a matter of taste and intuition. Any object you're implementing that you think might be reused in some other project should definately be passed data structures it should use via its constructor. Objects that grab the global object should be more to do with the infrastructure of your application.
Objects that receive interfaces they use via the constructor are probably easier to unit-test because you can feed them a mock interface, and tickle their methods to make sure they return the right arguments or interact with mock interfaces correctly. To test objects that access the main global object, you have to mock up the main global object so that when they request interfaces (I often call these services) from it they get appropriate mock objects and can be tested against them.

I prefer using the singleton pattern as described in the GoF book for these situations. A singleton is not the same as either of the three options described in the question. The constructor is private (or protected) so that it cannot be used just anywhere. You use a get() function (or whatever you prefer to call it) to obtain an instance. However, the architecture of the singleton class guarantees that each call to get() returns the same instance.

We should take care not to confuse Object Oriented Design with Object Oriented Implementation. Al too often, the term OO Design is used to judge an implementation, just as, imho, it is here.
Design
If in your design you see a lot of objects having a reference to exactly the same object, that means a lot of arrows. The designer should feel an itch here. He should verify whether this object is just commonly used, or if it is really a utility (e.g. a COM factory, a registry of some kind, ...).
From the project's requirements, he can see if it really needs to be a singleton (e.g. 'The Internet'), or if the object is shared because it's too general or too expensive or whatsoever.
Implementation
When you are asked to implement an OO Design in an OO language, you face a lot of decisions, like the one you mentioned: how should I implement all the arrows to the oft used object in the design?
That's the point where questions are addressed about 'static member', 'global variable' , 'god class' and 'a-lot-of-function-arguments'.
The Design phase should have clarified if the object needs to be a singleton or not. The implementation phase will decide on how this singleness will be represented in the program.

Option 3) while not purist OO, tends to be the most reasonable solution. But I would not make your class a singleton; and use some other object as a static 'dictionary' to manage those shared resources.

I don't like any of your proposed solutions:
You are passing around a bunch of "context" objects - the things that use them don't specify what fields or pieces of data they are really interested in
See here for a description of the God Object pattern. This is the worst of all worlds
Simply do not ever use Singleton objects for anything. You seem to have identified a few of the potential problems yourself

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas