ORM with or without DAL wrapper - orm

In all the examples I have seen, ORM's tend to be used directly or behind some kind of DAL repository (presumably so that they can be swapped out in the future).
I am no fan of direct ORM use as it will be hard to swap out, but i am equally no fan of losing the full domain change tracking it provides!
In the past I would have written a data mapper class (Fowler) for each object in my domain, but I have learnt through experience that this CRUD coding drains around 1/3 of my time.
I a realize that changing my data access strategy is rather unlikely (I have never had to do so before) but I am really concerned that by using an ORM directly I will be locking myself into using it until the end of time.
I have been thinking about wrapping the ORM (no decision on the ORM itself yet) in a generic ORM container and passing this around to finder classes for each of the domain objects. However, I am totally unsure what a generic ORM wrapper class would look like!
Has anyone got any real life advise here? Please don't feel it nessecary to sugar coat your answers!!

The repository has a number of functions:
It allows for unit testing with a mock implementation
It allows you to hide the full implementation of the ORM from the consumer, and implement security functions
It provides a layer of abstraction for business logic (although some people use a separate service layer for this), and
It allows you to change the ORM implementation, if necessary.
Another container to genericize your ORM feels like overengineering to me. As you pointed out, it is unlikely that you will ever change your underlying implementation, but if you do, your repositories seem like the sensible place to do it.

To point you in the direction of someone much wiser than me on these matters, one of the issues with having a generic ORM wrapper as highlighted by Ayende in his blog post The false myth of encapsulating data access in the DAL is that different ORMs are inherently too different to encapsulate effectively, having different methods for transaction handling, etc.
And on top of that, there's really not much point in switching ORMs anyway - one of the main reasons for encapsulating the DAL in case of change was to cope with switching databases, but most modern ORMs are able to work with many different databases anyway.

Related

Are ORM's counterproductive to OO design?

In OOD, design of an object is said to be characterized by its identity and behavior.
Having used ORM's in the past, the primary purpose, in my opinion, revolves around the ability to store/retrieve data. That is to say, ORM objects are not design by behavior, but rather data (i.e. database tables). Case and point: Many ORM tools come with a point-to-a-database-table-and-click-object-generator.
If objects are no longer characterized by behavior this will, in my opinion, muddy the identity and responsibility of the objects. Subsequently, if objects are not defined by a responsibility this could lend a hand to having tightly coupled classes and overall poor design.
Furthermore, I would think that in an application setting, you would be heading towards scalability issues.
So, my question is, do you think that ORM's are counterproductive to OO design? Perhaps the underlying question would be whether or not they are counterproductive to application development.
There's a well-known and oft-ignored impedance mismatch between the requirements of good database design and the requirements of good OO design. Most developers (in my experience) either do not understand this impedance mismatch or do not care. Since it's more common to start with the database and generate the objects from it (rather than the reverse), then yes, you'll end up with objects that are great as a persistence layer but sub-optimal from an OO perspective. (The reverse, generating the database from the object model, makes me want to stab my eyes out.)
Why are they sub-optimal from an OO perspective? Because the objects produced by an ORM are not business objects, even with partial classes and the like. Business objects model behavior. ORM objects model persistence. I'm not going to spend ten paragraphs arguing this distinction. It's something Rocky Lhotka has covered quite well in his books on Business Objects and his CSLA framework. Whether or not you like or use CSLA, I think his arguments are solid ones.
If objects are no longer characterized by behavior this will, in my opinion, muddy the
identity and responsibility of the objects.
The objects in question do have database reading and writing as defined behavior. They just don't have much other than that.
The reality of the situation is pretty simple: object orientation isn't an end in itself, it's a means to an end -- but in some cases it just doesn't do much to improve the end result. A lot of uses for ORM form a case in point -- they are thousands of variations of CRUD applications that don't need or want to attach any real behavior to most of the data they process.
The application as a whole gains flexibility by not encoding much (if any) of the data's "behavior" into the code of the application itself. Instead, they're often better off with that as "dumb" data, that they simply pass through from UI to database and back out to reports and such. With a bit of care, this can allow a substantial level of user customization that's almost impossible to match when you try to treat the data as real objects with real behavior encoded into the application proper.
Of course, there's another side to that: it can make it substantially more difficult to ensure the integrity of the data or that the data is only used appropriately -- I've seen code that accidentally used the wrong field in a calculation, so they were averaging the office numbers instead of office sizes in square feet. Both were user-defined fields that just said the contents should be numeric. The application had no way to know that one made perfect sense, and the other didn't at all.
Case and point: Many OR/M tools come with a point-to-a-database-table-and-click-object-generator.
Yes but there are equal, if not greater numbers or ORM solutions that base themselves off your objects and generate your database tables.
If you start with data and then tramp down forcing auto-generated objects to have object behaviours, yes, you might get confused... But if you start with the object and generate the database as a secondary layer, you end up with something a lot more usable, even if the database isn't perhaps as optimised as it could be.
If you're looking for an excuse not to use ORM, don't use it. I personally find it saves me thousands of lines of code doing trivial things that the ORM does just great.
I don't believe ORM are counterproductive to OO design, unless you want to insist that persistence is an integral part of behavior.
I'd separate persistence from business behavior. You're certainly free to add busines behavior to any object that an ORM generates for you.
I've also seen ORM systems which tend to go from the OO model and generate the database.
If anything, I would say ORMs are more biased towards producing good OO code than they are to producing good database code.
Ideally, a successful ORM bridges the two worlds and your application code would be great from a business domain problem-solving and implementation perspective and your database code and model would be great from a normalization, performance and ETL/reporting/replication whatever perspective.
On systems where it separates your data from behavior it's absolutely counterproductive.
Orm systems tend to analyze existing database tables to create stupid "Objects" in your language. These things are not true objects because they do not contain business behavior, but since people have these structures, they tend to want to use them.
Ruby on Rails (Active Record) actually binds your data to a "Live" class--this is much better.
I've never seen a system I really liked--ActiveRecord is close but it makes a few rubiesque assumptions that I'm not quite comfortable with--the biggest being supplying public setters & getters by default.
But to sum up--I've seen a lot of good OO programmers write screwed up code because of ORM.
As with most questions of this type, it depends on your usage.
The main ways to use ORM tools are:
Define object by data, use this object throughout application code (BAD BAD BAD)
Use ORM objects only for data access, define your own objects with an interpretive layer between (Much Better)
If starting from scratch a 3rd method is to design data from your object model. (Best if possible)
So yes, if you define the object by the data tables and use that throughout your code you will not be using OOD and introducing very poor design and maintenance issues.
But if you only use the ORM objects as a data access tool (replacing ADO) then you are free to use good OOD and ORM together. Sure, more code is required to build the interpretation layer, but enables much better practices, with not much more code required than old ADO code.
I'm answering this from a C# perspective, since that is where I do the majority of my development....
I see your point, but I think with the ability to create partial classes you can still create objects with any behavior you like and still get the power that an OR/M brings to the table for data retrieval.
From what I've seen of ORMs they do not go against OO principles - fairly orthogonal to them in fact. (for info - pretty new to ORM technology, Java perspective)
My reasoning is that ORMs help you store the data members of a class to a persistent store without having to couple to that store and write that code yourself. You still decide on the data members and write the behaviours of a class.
I guess you could abuse ORMs to break OO principles, but then you can do that with anything. You might use tooling to create skeletal data classes from a pre-existing table, but you would still create methods etc.

.NET Dataset vs Business Object : Why the debate? Why not combine the two?

I read a debate in the comments here (current live site, without comments).
Why the debate? A Dataset for me is like a relational database, an Object is a hierarchical-like model. Why do people absolutely want a "pure" Object model, whereas we still deal with relational databases, so why not combine the two?
And if we should, is there any lightweight, comprehensive framework that allows us to do that (not a heavy mammoth, like NHibernate, which has a huge learning curve)?
"Pure objects" are a lot easier to work with, the typed object gives you intellisense and compile-time type checking.
Bare datasets are very cumbersome and annoying to work with - you need to know the column names, there's no type checking possible, so if you mistype a column name, you're out of luck and won't discover the error until runtime (the worst possible scenario).
Typed datasets are a step in the right direction, but the "things" you work with in your .NET code are still tied very closely and tightly to your database implementation - not typically a good thing, since any change in the underlying database might affect your app all the way up to your UI and cause a lot of changes being necessary.
Using an ORM like NHibernate allows you to better abstract and decouple the database (physical storage) layer from your logical business model - only in the simplest of scenarios will those two be an exact 1:1 match, so you'll need some kind of "translation" or mapping between the two anyway.
So all in all - using typed datasets might be okay for small, simple apps, but for a challenging, larger-scale, enterprise-level business app, I would never recommend coupling your business object model so closely and tightly to the database.
Marc
why do people absolutly want "pure" Object model
Because you don't want your application to depend on the database schema
Well, all the reasons you give were the same as the academical reasons that were given for EJB in Java which was a mess in the past. So arent't people falling into another fashionable hype ?
As I read here:
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
the promise is one thing, the reality is other thing.
Where is the proof upon the claims ?
Scientifically, Complexity is tight to the Concept of Entropy, you cannot reduce the inherent complexity of things, you can just move it somewhere else, so for me there is something fundamentally irational.
Ted Newards is highly controversial because it seems to me that everybody is herding like in the old EJB days: nobody dared to say EJB suck until Rod Johnson gets out with Hibernate.
Now it seems nobody cares to say ORM frameworks like Hibernate, Entity Framework, etc. are too complex, because there isn't yet another Rod Johnson II maybe :)
You pretend that adding a new layer solves the problem, it's not always the case even theorcially, like adding more team members when a project becomes a mess because adding more programmers also mean add to coordination and communication problem.
And in practice, what it seems, is that the layers that should be independant at least from the GUI viewpoint, aren't really. I see many people struggle to do simple stuff in the GUI when they use an ORM.

Can Persistence Ignorance Scale?

I've been having a brief look at NHibernate and Linq2Sql. I'm also intending to have a peek at Entity Framework.
The question that gets raised, when I talk of these ORM's is "They can't scale", so can they? From Google I get the impression they're able to scale well, but ultimately I suppose there must be a price to pay, Is it worth paying for a richer simpler business layer.
This is a good question, and IMHO they can scale just as well as any custom DAL. I've only used nHibernate so I will focus only on it and the features it has which can help scale a system.
Lazy Loading - Since it supports lazy loading you can avoid loading any unnessecary items. Of course you need to watch out for the Select n+1 problem however there are things in the system to prevent this.
Eager Fetching - There are various ways to eagerly fetch objects which you might need allowing you to avoid extra trips to SQL.
Second Level Cache - nHibernate has support for a second level cache which can be used to increase the scalability by reducing trips to the DB. There are various backing providers available which give you some flexibility.
Write your own SQL - In nHibernate you can call stored procedures, or provide the SQL query inline that will return your entities. This will let you use your own SQL when the generated sql doesn't cut it. For example eager loading a self joining tree using a recursive query.
Now with that said, I think it is easier to initially tweak a custom DAL layer because your are intimate with its construction and can fine tune it; however, a good ORM will provide plenty of hooks that allow you to optimize quite a bit. You just need to spend some time learning it.
I also feel that if you have a performance critical area of code and you can't get your ORM to work within your requirements then for that tiny area of your application you can custom build your own DAL layer. If you're using a decent design pattern such as a Repository created by a factory, then all you need to do is swap out the implementation of your repository
Hibernate Shards is being ported to NHibernate, which will allow for horizontal scaling.
There are also some very cool hacks like this one to implement sharding.
So the answer is yes, NHibernate can scale, in a persistance-ignorant and fully-transparent way.
It's simply incorrect to say that apps built in an ORM do not scale well. Certainly it has happened before that careless or lazy devs abuse an ORM by writing code that generates horribly inefficient SQL. Building performant apps means understanding something about what all the lovely abstractions actually do under the hood. It does not take much to stay out of this trap however. Using an ORM doesn't mean never opening SQL profiler or NHibernate Profiler.
And regarding the claim that SPs are just a whole lot faster, read this and this. And besides, ORMs (NHibernate, at least) give you pretty easy ways to use SPs if you ever need to.

traversing object graph from n-tier client

I'm a student currently dabbling in a .Net n-tier app that uses Nhibernate+WCF+WPF.
One of the things that is done quite terribly is object graph serialisation, In fact it isn't done at all, currently associations are ignored and we are using DTOs everywhere.
As far as I can tell one method to proceed is to predefine which objects and collections should be loaded and serialised to go across the wire, thus being able to present some associations to the client, however this seems limited, inflexible and inconsistent (can you tell that I don't like this idea).
One option that occurred to me was to simply replace the NHProxies that lazy load collection on the client tier with a "disconnectedProxy" that would retrieve the associated stuff over the wire. This would mean that we'd have to expand our web service signature a little and do some hackery on our generated proxies but this seemed like a good T4/other code gen experiment.
As far as I can tell this seems to be a common stumbling block but after doing a lot of reading I haven't been able to figure out any good/generally accepted solutions. I'm looking for a bit of direction as much as any particular solution, but if there is an easy way to make the client "feel" connected please let me know.
You ask a very good question that unfortunately does not have a very clean answer. Even if you were able to get lazy loading to work over WCF (which we were able to do) you still would have issues using the proxy interceptor. Trust me on this one, you want POCO objects on the client tier!
What you really need to consider...what has been conceived as the industry standard approach to this problem from the research I have seen, is called persistence vs. usage or persistence ignorance. In other words, your object model and mappings represent your persistence domain but it does not match your ideal usage scenarios. You don't want to bring the whole database down to the client just to display a couple properties right??
It seems like such a simple problem but the solution is either very simple, or very complex. On one hand you can design your entities around your usage scenarios but then you end up with proliferation of your object domain making it difficult to maintain. On the other, you still want the rich object model relationships in order to write granular business logic.
To simplify this problem let’s examine the two main gaps we need to fill…between the database and the database/service layer and the service to client gap. NHibernate fills the first one just fine by providing an ORM to load data into your objects. It does a decent job, but in order to achieve great performance it needs to be tweaked using custom loading strategies. I digress…
The second gap, between the server and client, is where things get dicey. To simplify, imagine if you did not send any mapped entities over the wire to the client? Try creating a mechanism that exchanges business entities into DTO objects and likewise DTO objects into business entities. That way your client deals with only DTOs (POCO of course), and your business logic can maintain its rich structure. This allows you to leverage not only NHibernate’s lazy loading mechanism, but other benefits from the session such as L1 cache.
For brevity and intellectual property reasons I will not go into the design of said mechanism, but hopefully this is enough information to point you in the right direction. If you don’t care about performance or latency at all…just turn lazy loading off all together and work through the serialization issues.
It has been a while for me but the injection/disconnected proxies may not be as bad as it sounds. Since you are a student I am going to assume you have some time and want to muck around a bit.
If you want to inject your own custom serialization/deserialization logic you can use IDataContractSurrogate which can be applied using DataContractSerializerOperationBehavior. I have only done a few basic things with this but it may be worth looking into. By adding some fun logic (read: potentially hackish) at this layer you might be able to make it more connected.
Here is an MSDN post about someone who came to the same realization, DynamicProxy used by NHibernate makes it not possible to directly serialize NHibernate objects doing lazy loading.
If you are really determined to transport the object graph across the network and preserve lazy loading functionality. Take a look at some code I produced over here http://slagd.com/?page_id=6 . Basically it creates a fake session on the other side of the wire and allows the nhibernate proxies to retain their functionality. Not saying it's the right way to do things, but it might give you some ideas.

Code generators or ORMs?

What do you suggest for Data Access layer? Using ORMs like Entity Framework and Hibernate OR Code Generators like Subsonic, .netTiers, T4, etc.?
For me, this is a no-brainer, you generate the code.
I'm going to go slightly off topic here because there's a bigger underlying fallacy at play. The fallacy is that these ORM frameworks solve the object/relational impedence mismatch. This claim is a barefaced lie.
I find the best way to resolve the object/relational impedance mismatch is to either use OOP exclusively and use an object database or use the idioms of the relational database exclusively and ignore OOP.
The abstraction "everything is a table" is to me, much more powerful than the abstraction "everything is a class." It takes less code, less intellectual effort and leads to faster code when you code to the database rather than to an object model.
To me this seems obvious. If your application is data driven then surely your code should be data driven too? Yet to say this is hugely controversial.
The central problem here is that OOP becomes a really leaky abstraction when used in conjunction with a database. Code that look perfectly sensible when written to the idioms of OOP looks completely insane when you see the traffic that code generates at the database. When that messiness becomes a performance problem, OOP is the first casualty.
There is really no way to resolve this. Databases work with sets of data. OOP focus on instances of classes. Trying to marry the two is always going to end in divorce.
So to answer your question, I believe you should generate your classes and try and make them map the underlying database structure as closely as possible.
Perhaps controversially, I've always felt that using code generators for the ADO.NET plumbing is fundamentally solving the wrong problem.
At some point, hopefully not too long after learning about Connection Strings, SqlCommands, DataAdapters, and all that, we notice that:
Such code is ugly
It is very boring to write
It's very easy to miss something if you're doing it by hand
It has to be repeated every time you want to access the database
So, the problem to solve is "how to do the same thing lots of times fast"?
I say no.
Using code generators to make this process quick still means that you have a ton of code, all the same, all over your business classes (or your data access layer, if you separate that from your business logic).
And then, if you want to do something generically (like track stored procedure usage, for instance), you end up having to customise your code generator if it doesn't already have the feature you want. And even if it does, you still have to regenerate everything all the time.
I like to do things once, not many times, no matter how fast I can do them.
So I rolled my own Data Access class that knows how to add parameters, set up and close connections, manage transactions, and other cool stuff. It only had to be written once, and calling its methods from a Business object that needs some database stuff done consists of one line of code.
When I needed to make the application support multithreaded database accesses, it required a change to the Data Access class only, and all the business classes just do what they already did.
There is no right answer it all depends on your project. As Simon points out if your application is all data driven, then it might make sense depending on the size and complexity of the domain to use non oop paradigm. I had a lot of success building a system using a Transaction Script pattern, which passed XML Messages around the system.
However this system started to break down after five or six years as the application grew in size and complexity (5 or 6 webs, several web services, tons of COM+ components, legacy and .net code, 8+ databases with 800+ tables 4,000+ procedures). No one knew what anything was, and duplication was running rampant.
There are other ways to alleviate the maintance then OOP; however, if you have a very complex domain then hainvg a rich domain model is ideal IMHO, as it allows for the business rules to be expressed in nice encapsulated components.
To answer your question, avoid code generators if you can. Code generators are a recipe for disaster, but if you do go with code generation do not modify the generated code. Also be sure to have a good process in place that is easy for developers to get the new generated code.
I recommend using either the following: ORM or hand roll a lightweight DAL. I am currently transitioning a project over to nHibernate off my hand rolled DAL and am having a lot of success; however, I like having the option of using either option. Further if you properly seperate your concerns (Data Access from Business Layer from Presentation) you can have a single service layer that might talk to a Dao (Data Access Object) that for one object is an ORM but for another is hand rolled). I like this flexibility as it allows to apply the best tool to the job.
I like nHibernate over a hand rolled DAL because while my DAL does abstract away most of the ADO.Net code you still have to write the code that takes a data reader to an object or an object and creates the parameters.
I've always preferred to go the code generator route, especially in C# where you can make use of extended classes to add functionality to the basic data objects.
Hate to say this, but it depends. If you find an ORM tool that fits your needs go for it. We wrote our own system in small steps while developing the application. We are using C++ and there are not that many tools out there anyway. Ours ended up being a XML description of the database, from that the SQL generation script and the basic object layer and metadata were generated.
Do your homework and evaluate theses tools and you will find a good fit for your needs.