Can Persistence Ignorance Scale? - NHibernate

I've been having a brief look at NHibernate and Linq2Sql. I'm also intending to have a peek at Entity Framework.
The question that gets raised when I talk about these ORMs is "They can't scale", so can they? From Google I get the impression they're able to scale well, but ultimately I suppose there must be a price to pay. Is it worth paying for a richer, simpler business layer?

This is a good question, and IMHO they can scale just as well as any custom DAL. I've only used NHibernate, so I will focus on it and on the features it has that can help a system scale.
Lazy Loading - Since it supports lazy loading, you can avoid loading any unnecessary items. Of course, you need to watch out for the Select N+1 problem; however, NHibernate gives you ways to prevent it, such as the eager fetching below.
Eager Fetching - There are various ways to eagerly fetch objects you know you will need, which lets you avoid extra round trips to the database.
Second Level Cache - NHibernate supports a second-level cache, which can improve scalability by reducing trips to the DB. Various backing providers are available, which gives you some flexibility.
Write your own SQL - In NHibernate you can call stored procedures, or provide an inline SQL query that returns your entities. This lets you use your own SQL when the generated SQL doesn't cut it, for example to eagerly load a self-joining tree with a recursive query.
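To make the Select N+1 problem and the eager-fetching fix concrete, here is a minimal, ORM-agnostic sketch in Python. It does not use NHibernate's API: `FakeDatabase`, its method names, and the order/line-item data are hypothetical stand-ins that only count round trips to the database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDatabase:
    """Hypothetical in-memory 'database' that counts round trips."""
    orders: Dict[int, str] = field(default_factory=dict)         # order id -> customer
    lines: Dict[int, List[str]] = field(default_factory=dict)    # order id -> line items
    round_trips: int = 0

    def select_order_ids(self) -> List[int]:
        self.round_trips += 1
        return list(self.orders)

    def select_lines_for(self, order_id: int) -> List[str]:
        self.round_trips += 1
        return self.lines.get(order_id, [])

    def select_orders_with_lines(self) -> Dict[int, List[str]]:
        # One joined query: the analogue of an eager fetch.
        self.round_trips += 1
        return {oid: self.lines.get(oid, []) for oid in self.orders}


def load_lazily(db: FakeDatabase) -> Dict[int, List[str]]:
    """Lazy loading in a loop: 1 query for the orders + 1 per order (N+1)."""
    return {oid: db.select_lines_for(oid) for oid in db.select_order_ids()}


def load_eagerly(db: FakeDatabase) -> Dict[int, List[str]]:
    """Eager fetching: orders and their lines arrive in one round trip."""
    return db.select_orders_with_lines()


if __name__ == "__main__":
    db = FakeDatabase(orders={1: "alice", 2: "bob", 3: "carol"},
                      lines={1: ["book"], 2: ["pen", "ink"]})
    load_lazily(db)
    print("lazy round trips:", db.round_trips)    # 4 (1 + 3 orders)
    db.round_trips = 0
    load_eagerly(db)
    print("eager round trips:", db.round_trips)   # 1
```

With N orders the lazy version costs N + 1 round trips while the eager one stays at one; the trade-off is that an eager query may pull back data you never use, so neither strategy is right everywhere.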
Now, with that said, I think it is easier to tweak a custom DAL initially, because you are intimate with its construction and can fine-tune it; however, a good ORM will provide plenty of hooks that let you optimize quite a bit. You just need to spend some time learning it.
I also feel that if you have a performance-critical area of code and you can't get your ORM to meet your requirements, then you can custom-build your own DAL for that tiny area of your application. If you're using a decent design pattern, such as a Repository created by a factory, then all you need to do is swap out the implementation of your repository.
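A minimal sketch of that swap, in Python rather than C#: callers depend only on an abstract repository, and a factory decides which implementation to hand out. `CustomerRepository`, `OrmCustomerRepository`, `HandTunedCustomerRepository`, and `repository_factory` are hypothetical names, and both implementations are dictionary-backed placeholders rather than real data access code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class CustomerRepository(ABC):
    """The contract the rest of the application depends on."""

    @abstractmethod
    def get_name(self, customer_id: int) -> Optional[str]: ...


class OrmCustomerRepository(CustomerRepository):
    """Placeholder for the default, ORM-backed implementation."""

    def __init__(self, rows: Dict[int, str]):
        self._rows = rows

    def get_name(self, customer_id: int) -> Optional[str]:
        return self._rows.get(customer_id)


class HandTunedCustomerRepository(CustomerRepository):
    """Placeholder for a hand-written DAL in a performance-critical area."""

    def __init__(self, rows: Dict[int, str]):
        self._rows = rows

    def get_name(self, customer_id: int) -> Optional[str]:
        # A custom SQL query or stored procedure call would go here.
        return self._rows.get(customer_id)


def repository_factory(rows: Dict[int, str], hand_tuned: bool = False) -> CustomerRepository:
    """Only this function changes when the implementation is swapped."""
    if hand_tuned:
        return HandTunedCustomerRepository(rows)
    return OrmCustomerRepository(rows)


def greeting(repo: CustomerRepository, customer_id: int) -> str:
    # Business code sees only the abstract repository.
    name = repo.get_name(customer_id)
    return f"Hello, {name}" if name else "Hello, stranger"
```

Because `greeting` never names a concrete class, moving one hot spot to the hand-tuned DAL touches the factory and nothing else.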

Hibernate Shards is being ported to NHibernate, which will allow for horizontal scaling.
There are also some very cool hacks like this one to implement sharding.
So the answer is yes, NHibernate can scale, in a persistence-ignorant and fully transparent way.
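At its core, horizontal scaling via sharding means routing each entity to one of several databases. The sketch below illustrates only that routing step; it is not the Hibernate Shards or NHibernate Shards API, and the connection strings and `shard_for` function are hypothetical. It uses a stable hash (CRC32 from Python's `zlib`) rather than the built-in `hash()`, because Python randomizes string hashing per process and the same key must always land on the same shard.

```python
import zlib
from typing import List

# Hypothetical connection strings, one per shard.
SHARDS: List[str] = [
    "Server=db0;Database=app",
    "Server=db1;Database=app",
    "Server=db2;Database=app",
]


def shard_for(key: str, shards: List[str] = SHARDS) -> str:
    """Pick a shard deterministically from the entity's sharding key."""
    index = zlib.crc32(key.encode("utf-8")) % len(shards)
    return shards[index]
```

Simple modulo routing is easy to reason about, but adding a shard remaps most keys; schemes such as consistent hashing or a lookup table reduce that cost.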

It's simply incorrect to say that apps built on an ORM do not scale well. Certainly, careless or lazy devs have abused ORMs before by writing code that generates horribly inefficient SQL. Building performant apps means understanding something about what all the lovely abstractions actually do under the hood. It does not take much to stay out of this trap, however: using an ORM doesn't mean never opening SQL Server Profiler or NHibernate Profiler.
And regarding the claim that stored procedures (SPs) are just a whole lot faster, read this and this. Besides, ORMs (NHibernate, at least) give you pretty easy ways to use SPs if you ever need to.

Related

ORM with or without DAL wrapper

In all the examples I have seen, ORMs tend to be used either directly or behind some kind of DAL repository (presumably so that they can be swapped out in the future).
I am no fan of using an ORM directly, as it will be hard to swap out, but I am equally no fan of losing the full domain change tracking it provides!
In the past I would have written a data mapper class (Fowler) for each object in my domain, but I have learnt through experience that this CRUD coding eats up around a third of my time.
I realize that changing my data access strategy is rather unlikely (I have never had to do so before), but I am really concerned that by using an ORM directly I will be locking myself into it until the end of time.
I have been thinking about wrapping the ORM (no decision on the ORM itself yet) in a generic ORM container and passing this around to finder classes for each of the domain objects. However, I am totally unsure what a generic ORM wrapper class would look like!
Has anyone got any real-life advice here? Please don't feel it necessary to sugar-coat your answers!!
The repository has a number of functions:
It allows for unit testing with a mock implementation
It allows you to hide the full implementation of the ORM from the consumer, and implement security functions
It provides a layer of abstraction for business logic (although some people use a separate service layer for this), and
It allows you to change the ORM implementation, if necessary.
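The first two points can be sketched together: a repository interface lets a unit test substitute an in-memory fake, and the repository is also a natural place to enforce a security rule before anything reaches the ORM. Everything here (`DocumentRepository`, `InMemoryDocumentRepository`, `SecuredDocumentRepository`, and the owner-only rule) is hypothetical and written in Python for brevity.

```python
from typing import Dict, Optional, Protocol


class DocumentRepository(Protocol):
    def get(self, doc_id: int) -> Optional[dict]: ...


class InMemoryDocumentRepository:
    """Test double: no ORM, no database, just a dictionary."""

    def __init__(self, docs: Dict[int, dict]):
        self._docs = docs

    def get(self, doc_id: int) -> Optional[dict]:
        return self._docs.get(doc_id)


class SecuredDocumentRepository:
    """Wraps any repository and enforces a hypothetical owner-only rule."""

    def __init__(self, inner: DocumentRepository, current_user: str):
        self._inner = inner
        self._user = current_user

    def get(self, doc_id: int) -> Optional[dict]:
        doc = self._inner.get(doc_id)
        if doc is not None and doc["owner"] != self._user:
            raise PermissionError(f"{self._user} may not read document {doc_id}")
        return doc
```

In production the inner repository would be ORM-backed; in tests it is the dictionary fake, and the security wrapper is exercised identically in both cases.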
Another container to genericize your ORM feels like overengineering to me. As you pointed out, it is unlikely that you will ever change your underlying implementation, but if you do, your repositories seem like the sensible place to do it.
To point you in the direction of someone much wiser than me on these matters: as Ayende highlights in his blog post The false myth of encapsulating data access in the DAL, one of the issues with a generic ORM wrapper is that ORMs are inherently too different from one another to encapsulate effectively; they have different approaches to transaction handling, etc.
And on top of that, there's really not much point in switching ORMs anyway - one of the main reasons for encapsulating the DAL in case of change was to cope with switching databases, but most modern ORMs are able to work with many different databases anyway.

Starting on ORMs - NHibernate

I am starting to delve into the realm of ORMs, particularly NHibernate, for developing .NET data-aware applications. I must say that the learning curve is pretty steep and that there is a lot to take note of. It seems to change the way you build data-aware applications: your manner of coding, your development process, and just about everything else.
Anyway, I want to ask: what criteria do you use when deciding whether or not to use an ORM in your applications? And how do you decide on the approach that will make it valuable to your organization?
The organization I work for now has a lot of SQL and data access code running in the back end, and I must say that these classes, methods, and procedures have successfully done their job of providing the data that is needed, when it is needed. I think it would be a tremendous effort just to map some of this to an ORM and derive the same business value the company has had for the last few years.
Nevertheless, I know that an ORM, if properly implemented, paves the way for applications to talk to database servers. I must admit that I am at a learning stage and will probably need all the help, resources, and guidance I can get to make this transition. I was also thinking of buying the book from Manning, but with so many changes to NHibernate I feel the book may be a bit outdated. Perhaps waiting for the Packt book on NHibernate (release in May 2010?) would help me get up and running better.
Kindly share your thoughts. By the way, if you could also point me to a small sample web app that uses NHibernate + Visual Web Developer 2008 Express and SQL Server, that would be highly appreciated.
Thanks.
For me, the short of it is the following:
If you don't use an established ORM, and you develop correctly (meaning you refactor out duplication and look to simplify where you can), you'll wind up building your own ORM through the evolution of your data access layer.
The question then becomes:
"Do I want my developers spending time learning the idiosyncrasies of my home-grown ORM or learning those of a well-documented and well-tested ORM?"
Furthermore:
"If I'm hiring a new developer, wouldn't it be nicer to bring in a developer that knows the established ORM tool we're using rather than having to train someone up on this thing I built?"
I use NHibernate, particularly Fluent NHibernate, and it's great; given the choice, I wouldn't develop on an RDBMS any other way.
To be successful with an ORM, you must make sure to normalize correctly and use the database for its designed purpose: storing data.
I don't use an ORM when:
I don't use a relational database (relational databases are not the best choice of database for every application).
The database has a very small number of tables (I might need less code without an ORM).
I use a very simple database that can map to code with simple naming conventions (mapping to dumb DTO classes, with all queries like select * from tablename where id=#id).
Learning a good ORM is worth the time and effort; it will save you from writing a lot of code if you use relational databases a lot.
You can find example apps, tutorials, and videos about NHibernate with a Stack Overflow search. There is another book in progress from Manning; you may be able to read it through the early access program.

O/R Mappers - Good or bad

I am really torn right now between using O/R mappers or just sticking to traditional data access. For some reason, every time I bring up O/R mappers, fellow developers cringe and speak about performance issues or how they're just bad in general. What am I missing here? I'm looking at LINQ to SQL and Microsoft Entity Framework. Is there any basis to any of these claims? What kinds of things do I have to compromise on if I want to use an O/R mapper? Thanks.
This will seem like an unrelated answer at first, but: one of my side interests is WWII-era fighter planes. All of the combatant nations (US, Great Britain, Germany, USSR, Japan etc.) built a bunch of different fighters during the war. Some of them used radial engines (P47, Corsair, FW-190, Zero); some used inline liquid-cooled engines (Bf-109, Mustang, Yak-7, Spitfire); and some used two engines instead of one (P38, Do-335). Some used machine guns, some used cannons, and some used both. Some were even made out of plywood, if you can imagine.
In the end, they all went really really fast, and in the hands of a competent, experienced pilot, they would shoot your rookie ass down in a heartbeat. I don't imagine many pilots flew around thinking "oh, that idiot is flying something with a radial engine - I don't have to worry about him at all". Everyone understood that there were many different ways of achieving the ultimate goal, and each approach had its particular advantages and disadvantages, depending on the circumstances.
The debate between ORM and traditional data access is just like this, and it behooves any programmer to become competent with both approaches, and choose the option that is right for the job at hand.
I struggled with this decision for a long time. I think I was hesitant for two primary reasons. First, O/R mappers represented a lack of control over what was happening in a critical part of the app and, second, because so many times I've been disappointed by solutions that are awesome for the 90% case but miserable for the last 10%. Everything works for select * from authors, of course, but when you crank up the complexity and have a high-volume, critical system and your career is on the line, you feel you need to have complete control to tune every query pattern and byte over the wire. Most developers, including me, get frustrated the first time the tool fails us, and we cannot do what we need to do, or our need deviates from the established pattern supported by the tool. I'll probably get flamed for mentioning specific flaws in tools, so I'll leave it at that.
Fortunately, Anderson Imes finally convinced me to try CodeSmith with the netTiers template. (No, I don't work for them.) After more than a year using this, I can't believe we didn't do it sooner. My team uses Visual Studio DB Pro, and on every check-in our continuous integration build drops out a new set of data access layer assemblies. This handles all the common, low-risk stuff automatically, yet we can still write custom sprocs for the tricky bits and have them included as methods on the generated classes, and we can customize the templates for the generated code as well. I highly recommend this approach. There may be other tools that allow this level of control as well, and there is a newer CodeSmith template called PLINQO that uses LINQ to SQL under the hood. We haven't examined that yet (haven't needed to), but this overall approach has a lot of merit.
Jerry
O/RM tools are designed to perform very well in most situations. They will cache entities for you, execute queries in batches, and use very low-level optimised access to objects that is way faster than manually assigning values to properties. They give you a very easy way to incorporate variations of aspect-oriented programming using modern techniques like interceptors, they manage entity state for you and help resolve conflicts, and much more.
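"They will cache entities for you" usually refers, at the most basic level, to the session-level cache, an identity map: within one session, loading the same entity twice returns the same object and hits the database only once. A minimal sketch of the idea in Python, assuming a hypothetical `load_row` callback in place of a real SQL SELECT; `IdentityMapSession` is an illustrative name, not NHibernate's `ISession`.

```python
from typing import Callable, Dict, Tuple


class IdentityMapSession:
    """Toy session with an identity map (first-level cache)."""

    def __init__(self, load_row: Callable[[str, int], dict]):
        self._load_row = load_row                        # stands in for a SQL SELECT
        self._identity_map: Dict[Tuple[str, int], dict] = {}
        self.queries = 0

    def get(self, entity: str, entity_id: int) -> dict:
        key = (entity, entity_id)
        if key not in self._identity_map:
            self.queries += 1                            # only a miss touches the DB
            self._identity_map[key] = self._load_row(entity, entity_id)
        return self._identity_map[key]
```

Besides saving queries, returning the same instance means two parts of the code that load "Customer 1" in one session see each other's changes instead of holding conflicting copies.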
The cons of this approach usually lie in a lack of understanding of how things work at a very low level. The most classic problem is "SELECT N+1".
I've been working with NHibernate for 2.5 years now, and I'm still discovering something new about it almost on a daily basis...
Good. In most cases.
The productivity benefit of using an ORM will in most cases outweigh the loss of control over how the data is accessed.
Not many people would avoid C# in order to program in MSIL or assembly, although that would give them more control.
The problem that I see with a lot of OR mappers is that you get bloated domain objects, which are usually highly coupled with the rest of your data access framework. Our developers cringe at that as well :) It's just harder to port these objects to another data access technology. If you use L2S, take a look at the generated code: it looks like a complete mess. NHibernate is probably one of the best at this. Your entities are completely unaware of your data access layer, if you design them right.
It really depends on the situation.
I went from a company that used a tweaked-out ORM to a company that did not use an ORM and wrote SQL queries all the time. When I asked about using an ORM to simplify the code, I got a blank look followed by a list of all the negatives:
It's highly bloated
you don't have fine control over your queries and execute unnecessary ones
there is a heavy object-to-table mapping
it's not DRY code because you have to repeat yourself
and on and on
Well, after working there for a few weeks, I had noticed that:
we had several queries that were almost identical, and a lot of the time, if there was a bug, only a handful would get fixed
instead of caching queries on common tables, we would end up reading a table multiple times
We were repeating ourselves all over the place
We had developers at several skill levels, so some queries were not written as efficiently as they could be.
After I pointed most of this out, they wrote a "DBO" because they didn't want to call it an ORM. They decided to write one from scratch instead of tweaking an existing one.
Also, I feel that a lot of the arguments against ORMs come from ignorance. Every ORM that I have seen allows for custom queries, and even when following the ORM's conventions you can write very complex and detailed queries that are normally more human-readable. ORMs also tend to be very DRY: you give them your schema, and they figure out the rest, down to relationship mapping.
Modern ORMs have a lot of tools to help you out, like migration scripts and access to multiple DB types through the same objects, so you can leverage the advantages of both NoSQL and SQL DBs. But you have to pick the right ORM for your project if you're going to use one.
I first got into ORM mapping and Data Access Layers from reading Rockford Lhotka's book, C# Business Objects. He's spent years working on a framework for DALs. While his framework out of the box is quite bloated and in some cases overkill, he has some excellent ideas. I highly recommend the book for anyone looking at ORM mappers. His book influenced me enough that I took a lot of his ideas and built them into my own framework and code generation.
There is no simple answer to this, since each ORM provider will have its own particular pluses and minuses. Some ORM solutions are more flexible than others. The onus is on the developer to understand these before using one.
However, take LinqToSql: if you are sure you are not going to need to switch away from SQL Server, then it solves a lot of the common problems seen in ORM mappers. It allows you to easily add stored procedures (as methods on the generated DataContext), so you aren't limited to generated SQL. It uses deferred execution, so you can chain queries together efficiently. It uses partial classes, so you can easily add custom logic to generated classes without worrying about what happens when you regenerate them. There is also nothing stopping you from using LINQ to create your own abstracted DAL; it just speeds up the process. The main thing, though, is that it alleviates the tedium and time required to create a basic CRUD layer.
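Deferred execution means that composing a query does not touch the database; only enumerating it does. LINQ to SQL achieves this with expression trees translated to SQL; the Python sketch below imitates only the observable behaviour, using a hypothetical `DeferredQuery` class over an in-memory list, with a counter standing in for SQL round trips.

```python
from typing import Callable, Dict, Iterator, List, Optional


class DeferredQuery:
    """Collects filters lazily; runs only when iterated."""

    def __init__(self, source: List[dict], stats: Dict[str, int],
                 filters: Optional[List[Callable[[dict], bool]]] = None):
        self._source = source
        self._stats = stats                          # shared execution counter
        self._filters = filters or []

    def where(self, predicate: Callable[[dict], bool]) -> "DeferredQuery":
        # Returns a new, longer query; nothing is executed yet.
        return DeferredQuery(self._source, self._stats, self._filters + [predicate])

    def __iter__(self) -> Iterator[dict]:
        # Generator body runs on first next(): one "round trip" per enumeration.
        self._stats["executions"] += 1
        for row in self._source:
            if all(f(row) for f in self._filters):
                yield row
```

Chaining two `where` calls costs nothing; the combined filter is applied in one pass when the results are actually consumed, which is the property that makes chaining efficient.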
But there are downsides, too. There will be a tight coupling between your tables and classes, there will be a slight performance drop, and you may occasionally generate queries that are not as efficient as you expected. And you are tied to SQL Server (though some other ORM technologies are database-agnostic).
As I said, the main thing is to be aware of the pros and cons before pinning your colours to a particular methodology.

Advantages and Disadvantages of NHibernate

What are the advantages/disadvantages of using NHibernate ?
What kind of applications should be (& should not be) built using NHibernate ?
Since other people have listed advantages, I will just list the disadvantages.
Disadvantages
Increased startup time due to metadata preparation (not good for desktop-like apps).
Huge learning curve without an ORM background.
Comparatively hard to fine-tune the generated SQL.
Hard to get session management right if used in non-typical environments (read: non-web apps).
Not suited for apps without a clean domain object model (not all apps in the world need clean domain object models).
Have to jump through hoops if you have a badly designed (legacy) DB schema.
Advantages:
Flexible and very powerful mapping capabilities.
Caching.
Very polished UnitOfWork implementation.
Future queries.
Model classes are POCOs, which effectively means you can easily implement the anemic domain antipattern.
Interceptors: you can do a kind of aspect-oriented programming, like very easily implementing auditing, logging, authorization, validation, etc. for your domain.
Lucene.NET and NHibernate are well integrated with each other - gives you a very fast and effective implementation of full-text indexing.
It's very mature and popular in enterprise environment.
Big community.
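Two of the advantages above, the UnitOfWork implementation and interceptors, can be illustrated together. The Python sketch below snapshots entities when they are registered, detects at flush time which ones changed (dirty checking), and calls an optional interceptor-style hook for each change, which is where auditing or logging would go. `UnitOfWork`, `register`, `flush`, and `on_update` are hypothetical names; NHibernate's own session and interceptor support are considerably richer.

```python
import copy
from typing import Callable, Dict, List, Optional


class UnitOfWork:
    """Toy unit of work: tracks entities and 'writes' only the changed ones."""

    def __init__(self, on_update: Optional[Callable[[str, dict, dict], None]] = None):
        self._tracked: Dict[str, dict] = {}      # key -> live entity
        self._snapshots: Dict[str, dict] = {}    # key -> state at last sync
        self._on_update = on_update              # interceptor-style hook
        self.updates: List[str] = []             # stands in for UPDATE statements

    def register(self, key: str, entity: dict) -> dict:
        self._tracked[key] = entity
        self._snapshots[key] = copy.deepcopy(entity)
        return entity

    def flush(self) -> None:
        for key, entity in self._tracked.items():
            before = self._snapshots[key]
            if entity != before:                 # dirty check against the snapshot
                if self._on_update:
                    self._on_update(key, before, entity)
                self.updates.append(key)
                self._snapshots[key] = copy.deepcopy(entity)
```

Application code just mutates objects; the unit of work decides what to write, and the hook sees the old and new state without the entities knowing anything about auditing.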
Disadvantages:
The already-mentioned learning curve. You can start using NHibernate very fast, but it will take you months to master it. I'd highly recommend reading the Manning NHibernate book.
Writing XML mappings can be very tedious, especially for big databases with hundreds or thousands of tables, views, and stored procedures. Yes, there are tools that will help you by generating those mappings, but you will still have to do quite a lot of manual work. Fluent NHibernate seems to simplify this process by getting rid of XML mappings, as does Castle ActiveRecord (AR, though, is impossible to use for an anemic domain, as you define mappings in attributes on your model classes).
Performance may be low in certain scenarios, for instance large bulk operations. For those you might have to use IStatelessSession, but it's an awkward experience, to say the least...
Advantages:
Open source
Based on widely approved patterns
NH is not a code generator :)
Disadvantages:
Half-done LINQ support
Low performance
(see for example performance and LINQ tests on ormbattle.net)
Advantages:
Caching
Simplicity in your code
Power
Flexibility
Multi-database support
Disadvantages:
Stops you having to write your own persistence code
May reduce your knowledge of SQL
Applications you should use it for:
Any that use a database
A few more specific reasons to like NHibernate
Disadvantages: NHibernate is not a Microsoft product and therefore will face some resistance from coworkers who haven't heard of it. Especially FOSS bigots. Configuring the mapping files and lazy/eager loading behavior can be time-consuming. If your database has a bizarre naming convention, atypical design or very strict performance requirements, more work may be required than expected.
I say this a lot but ActiveRecord is a great layer over NHibernate. It uses attributes to map the data points to class members right in the classes themselves. People are not using this thing enough.
The high level answer is that NHibernate is in a class by itself and there is no near competition.
If you need CRUD against a database from a .NET application, you should be using NHibernate, for at least two reasons:
1) You get Linq support (which requires something like an ORM)
2) NHibernate is very mature
There are no significant disadvantages. There are other options, but those other options have significant disadvantages.
I wrote some more on this a while ago:
.NET and ORM - Decisions, decisions

Would NHibernate be used in large-scale projects like, say, Facebook? (for argument's sake)

For those who know the inner workings of NHibernate: do you think a large-scale web application like, say, Facebook or MySpace would use NHibernate?
Or is NHibernate better suited to lower-traffic sites like company sites, i.e. not enterprise-ready because of its chatty nature?
NHibernate is not chatty at all. About scalability, there was already a question on NH's groups which was more about the complexity of the database than traffic, but it might still be interesting for you.
Even though there are always complaints about unnecessary queries with every ORM, because of the generic nature of an ORM, that doesn't mean it is chatty. On the other hand, it optimizes situations that would be too complex to optimize in hand-written DALs, e.g. query batches or lazy loading.
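Query batching, mentioned here as something hard to hand-roll, can be illustrated with a hypothetical `QueryBatch`: queries are registered as deferred results, and the first time any result is requested, all pending queries go to the database in a single round trip. NHibernate's future queries follow this idea, but the Python below only mimics the behaviour with a hypothetical `run_batch` callback and is not that API.

```python
from typing import Callable, List


class QueryBatch:
    """Defers queries and sends all pending ones in one round trip."""

    def __init__(self, run_batch: Callable[[List[str]], List[object]]):
        self._run_batch = run_batch              # hypothetical: runs many SQL strings at once
        self._pending: List["FutureResult"] = []
        self.round_trips = 0

    def future(self, sql: str) -> "FutureResult":
        fut = FutureResult(self, sql)
        self._pending.append(fut)
        return fut

    def _flush(self) -> None:
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        self.round_trips += 1
        results = self._run_batch([f.sql for f in batch])
        for fut, result in zip(batch, results):
            fut._result, fut._done = result, True


class FutureResult:
    """A query result that is fetched, with its siblings, on first access."""

    def __init__(self, batch: QueryBatch, sql: str):
        self._batch, self.sql = batch, sql
        self._result, self._done = None, False

    @property
    def value(self):
        if not self._done:
            self._batch._flush()
        return self._result
```

A typical use is a paged list page: the row count and the current page are registered together and cost one round trip instead of two.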
NHibernate is quite lightweight compared to other ORMs, especially given its powerful features.
NHibernate (as any other ORM) could be considered to be overkill if there is no object oriented business model but you need to optimize for highest performance. I don't think that Google could make use of NHibernate for its search engine, for instance.
Edit:
The performance and power of NHibernate do not come entirely for free. It requires that developers understand at least the basics of relational databases. Other ORMs try to hide the relational side entirely, which leads to much less optimized behaviour.
nHibernate is a professional joke.
In my company, its use has been prohibited for several reasons.
As a tool it is quite unproductive; you'll spend countless hours trying to figure things out, or finding alternative strategies in scarce documentation.
It is much better to use your own generated DAL and SPs to achieve high performance. You'll have a cached execution plan, and in the end that's what really matters.
NHibernate has no advanced support for memcached, which is especially what you are going to use if you want to build a scalable web solution like Facebook.
I work for a social gaming company, and we have specifically forbidden the use of NHibernate.
NHibernate supports query caching, 2nd level caching based on primary keys, and also session cache for repeated hits on the same entity within the same session.
That's all a great help, but as long as you are hitting a database with a large load, you are going to have scaling problems. The best way to scale a database is to minimise the amount of time you actually have to use it. A distributed cache such as memcached, and caching your output (either post-data-crunched views or HTML), are the best ways to scale an application. If clients are regularly hitting the database, you are doing it wrong, ORM or not. A .NET application, like a typical MVC app, has the advantage of being able to use VaryBy output caching, donut and donut-hole caching, and memcache clients with NHibernate and for your ViewModels.
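The caching advice here is usually implemented as cache-aside: check the distributed cache first, fall back to the database on a miss, then populate the cache with an expiry. In the Python sketch below a dictionary stands in for memcached and `load_from_db` is a hypothetical callback; a real memcached client would replace `TtlCache`, and the clock is injectable only so that expiry can be tested.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Dictionary stand-in for memcached: values expire after ttl seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}    # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            return None                                  # miss or expired
        return entry[1]

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (self._clock() + ttl, value)


def cached(cache: TtlCache, key: str, load_from_db: Callable[[], Any],
           ttl: float = 60.0) -> Any:
    """Cache-aside: serve from cache, hit the database only on a miss.

    Note: a stored None is indistinguishable from a miss in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_db()
        cache.set(key, value, ttl)
    return value
```

The TTL bounds how stale a cached view can get; choosing it is the main design decision, trading database load against freshness.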