When should one avoid using NHibernate's lazy-loading feature? - nhibernate

Most of what I hear about NHibernate's lazy-loading, is that it's better to use it, than not to use it. It seems like it just makes sense to minimize database access, in an effort to reduce bottlenecks. But few things come without trade-offs, certainly it slightly limits design by forcing you to have virtual properties. But I've also noticed that some developers turn lazy-loading off on certain often-used objects.
This makes me wonder if there are some definite situations where data-access performance is hurt by using lazy-loading.
So I wonder, when and in what situations should I avoid lazy-loading one of my NHibernate-persisted objects?
Is the downside to lazy-loading merely in additional processing time, or can nhibernate lazy-loading also increase the data-access time (for instance, by making additional round-trips to the database)?
Thanks!

There are clear performance tradeoffs between eager and lazy loading objects from a database.
If you use eager loading, you suck a ton of data in a single query, which you can then cache. This is most common on application startup. You are trading memory consumption for database round trips.
If you use lazy loading, you suck a minimal amount of data in a single query, but any time you need more information related to that initial data it requires more queries to the database and database performance hits are very often the major performance bottleneck in most applications.
So, in general, you always want to retrieve exactly the data you will need for the entire "unit of work", no more, no less. In some cases, you may not know exactly what you need (because the user is working through a wizard or something similar) and in that case it probably makes sense to lazy load as you go.
If you are using an ORM and focused on adding features quickly and will come back and optimize performance later (which is extremely common and a good way to do things), having lazy loading being the default is the correct way to go. If you later find (through performance profiling/analysis) that you have one query to get an object and then N queries to get the N objects related to that original object, you can change that piece of code to use eager loading to only hit the database once instead of N+1 times (the N+1 problem is a well known downside of using lazy loading).

The usual tradeoff for lazy loading is that you make a smaller hit on the database up front, but you end up making more hits on it long-term.
Without lazy loading, you'll grab an entire object graph up front, sucking down a large chunk of data at once. This could, potentially, cause lag in your UI, and so it is often discouraged. However, if you have a common object graph (not just single object - otherwise it wouldn't matter!) that you know will be accessed frequently, and top to bottom, then it makes sense to pull it down at once.
As an example, if you're doing an order management system, you probably won't pull down all the lines of every order, or all the customer information, on a summary screen. Lazy loading prevents this from happening.
I can't think of a good example for not using it offhand, but I'm sure there are cases where you'd want to do a big load of an object graph, say, on application initialization, in order to avoid lags in processing further down the line.

The short version is this:
Development is simpler if you use lazy loading. You just traverse object relationships in a natural OO way, and you get what you need when you ask for it.
Performance is generally better if you figure out what you need before you ask for it, and ask for it in one trip to the database.
For the past few years we've been focusing on quick development times. Now that we have a solid app and userbase, we're optimizing our data access.

If you are using a webservice between the client and server handling the database access using nhibernate it might be problematic using lazy loading since the object will be serialized and sent over the webservice and subsequent usage of "objects" further down in the object relationship needs a new trip to the database server using additional webservices. In such an instance it might not be too good using lazy loading. A word of caution, be careful in what you fetch if you turn lazy loading of, its way to easy to not think this through and through and end up fetching almost the whole database...

I have seen many performance problems aring from wrong loading behaviour configuration in Hibernate. The situation is quite the same with NHibernate I think. My recommendation is to always use lazy relations and then use eager fetching statemetns in your query - like fetch joins - . This ensures you are not loading to much data and you can avoid to many SQL queries.
It is easy to make a lazy releation eager by a query. It is nearly impossible the other way round.

Related

How to decide between EAGER and LAZY loading in Hibernate

In hibernate or OpenJPA, if I do FetchType.EAGER, I risk loading unnecessary data and hurting performance.
If I do FetchType.LAZYloading, I risk running into N + 1 problem.
Are there any guidelines what fetch mode to use when?
I agree with the general guidelines suggested by #D.R.:
Lazy loading on one hand imply memory saving, on the other hand imply increasing the number of queries to the db. Eager loading is the opposite.
You have to choose your poison.
Besides, I think it is worth to mention the possibility to override fetching strategies with hibernate fetch profiles (if you plan to use hibernate). When a predefined lazy approach isn't enough flexible, it is a good solution.
Using fetch profiles you tell hibernate to fetch object in a "different way" just for that transaction. Very handy when you have to get object lazily, but sometimes you need a different approach.
If you adopt a second level cache optimization, you should check for compatibility since the current fetch profile implementation supports a JOIN strategy.
In general you should use eager fetching in all cases where you immediately afterwards need the data. If you run into the N+1 problem, just re-execute the query with eager fetching.
There are of course much more opinions for more specific situations, however, I guess, SO is not the best place to discuss things.

Raw SQL vs OOP based queries (ORM)?

I was doing a project that requires frequent database access, insertions and deletions. Should I go for Raw SQL commands or should I prefer to go with an ORM technique? The project can work fine without any objects and using only SQL commands? Does this affect scalability in general?
EDIT: The project is one of the types where the user isn't provided with my content, but the user generates content, and the project is online. So, the amount of content depends upon the number of users, and if the project has even 50000 users, and additionally every user can create content or read content, then what would be the most apt approach?
If you have no ( or limited ) experience with ORM, then it will take time to learn new API. Plus, you have to keep in mind, that the sacrifice the speed for 'magic'. For example, most ORMs will select wildcard '*' for fields, even when you just need list of titles from your Articles table.
And ORMs will aways fail in niche cases.
Most of ORMs out there ( the ones based on ActiveRecord pattern ) are extremely flawed from OOP's point of view. They create a tight coupling between your database structure and class/model.
You can think of ORMs as technical debt. It will make the start of project easier. But, as the code grows more complex, you will begin to encounter more and more problems caused by limitations in ORM's API. Eventually, you will have situations, when it is impossible to to do something with ORM and you will have to start writing SQL fragments and entires statements directly.
I would suggest to stay away from ORMs and implement a DataMapper pattern in your code. This will give you separation between your Domain Objects and the Database Access Layer.
I'd say it's better to try to achieve the objective in the most simple way possible.
If using an ORM has no real added advantage, and the application is fairly simple, I would not use an ORM.
If the application is really about processing large sets of data, and there is no business logic, I would not use an ORM.
That doesn't mean that you shouldn't design your application property though, but again: if using an ORM doesn't give you any benefit, then why should you use it ?
For speed of development, I would go with an ORM, in particular if most data access is CRUD.
This way you don't have to also develop the SQL and write data access routines.
Scalability should't suffer, though you do need to understand what you are doing (you could hurt scalability with raw SQL as well).
If the project is either oriented :
- data editing (as in viewing simple tables of data and editing them)
- performance (as in designing the fastest algorithm to do a simple task)
Then you could go with direct sql commands in your code.
The thing you don't want to do, is do this if this is a large software, where you end up with many classes, and lot's of code. If you are in this case, and you scatter sql everywhere in your code, you will clearly regret it someday. You will have a hard time making changes to your domain model. Any modification would become really hard (except for adding functionalities or entites independant with the existing ones).
More information would be good, though, as :
- What do you mean by frequent (how frequent) ?
- What performance do you need ?
EDIT
It seems you're making some sort of CMS service. My bet is you don't want to start stuffing your code with SQL. #teresko's pattern suggestion seems interesting, seperating your application logic from the DB (which is always good), but giving the possiblity to customize every queries. Nonetheless, adding a layer that fills in memory objects can take more time than simply using the database result to write your page, but I don't think that small difference should matter in your case.
I'd suggest to choose a good pattern that seperates your business logique and dataAccess, like what #terekso suggested.
It depends a bit on timescale and your current knowledge of MySQL and ORM systems. If you don't have much time, just do whatever you know best, rather than wasting time learning a whole new set of code.
With more time, an ORM system like Doctrine or Propel can massively improve your development speed. When the schema is still changing a lot, you don't want to be spending a lot of time just rewriting queries. With an ORM system, it can be as simple as changing the schema file and clearing the cache.
Then when the design settles down, keep an eye on performance. If you do use ORM and your code is solid OOP, it's not too big an issue to migrate to SQL one query at a time.
That's the great thing about coding with OOP - a decision like this doesn't have to bind you forever.
I would always recommend using some form of ORM for your data access layer, as there has been a lot of time invested into the security aspect. That alone is a reason to not roll your own, unless you feel confident about your skills in protecting against SQL injection and other vulnerabilities.

NHibernate latency is very high

I am using NHibernate for ORM and have consolidated the loading of lots of entities into one big query.
I am actually loading a word dictionary, around 500K entries, and each word relates to others. Running the loading process in the background could be very tricky in our application, as we would have to manually load an entry that has not been loaded on time, as any word could be asked for at any time. Our only requirements are that all the data be loaded as fast as possible.
I also tried using a stateless session, but got an exception that stateless sessions can't fetch collections (for some reason, maybe it has to do with the fact there is no cache for stateless sessions?)
The problem is that although the query takes no more than 25 seconds in SQLServer, it takes well over 3 minutes for ICriteria.List().
I used NHProf to profile the loading process and found that the creation of the entities is a costly affair, which takes up most of the loading time in NHibernate.
Is there anything I could do to reduce this latency? Is the memory allocation expensive, or is it the "filling in" of the data?
Thanks!
Perhaps you should consider the fact that NHibernate (like most ORMs) is not particularly suited (or intended) for these types of bulk-loading scenarios. How many rows are you trying to load, give or take? What are you trying to do? Pre-populate a cache? Do batch-like processing?
My gut feeling is that you should seriously consider the purpose of your app and choose the underlying technologies accordingly. Perhaps you can shed some light on your intentions/requirements?
EDIT OK, from your comments I understand what it is you're trying to do here. The first thing I'd do is create a simple prototype using raw ADO.NET to load the same data, to get a feel for the best performance attainable using standard data access and in-memory collections. Next, fiddle around with different collection types to see what performs well when populating and searching. If loading data like this is still too slow, it's time to start looking at other methods of loading the data: file-based from a local data file, hydrating pre-serialized objects, some form of fast on-demand loading, etc.
Loading 500k entities into an NHibernate session is not a good idea. The session is made to be short lived and hold a relatively small number of entities.
If you want to do this kind of batch processing in NHibernate you should take a look at the StatelessSession instead of the ordinary session. Using a stateless session would most likely drastically improve performance in this scenario. However, when using a stateless session you lose the benefits of the NHibernate first level cache, such as change tracking.
More information about the StatelessSession can be found in this article and in the NH docs at nhibernate.info.
In this scenario I would also recommend that you consider using straight ADO.NET instead of NHibernate. I am not saying that you should switch you whole data access strategy to ADO.NET but you might want to consider using ADO.NET for the batch operations and using NHibernate for the other cases.
Profiling the creation process (for example with the VS performance analyser) should tell you exactly what is the costly operation. If you have played already with lazy loading tuning then I think the only good solution is to encapsulate the returned list to enable paging an return smaller chunks in a few iterations. I am not sure whether NHibernate support lazy result lists like JPA does (i.e. not loading entities from data reader until needed).

Is Lazy Loading really bad?

I hear a lot about performance problems about lazy loading, no matter if at NHibernate, Linq....
The problem is N+1 selects. Example, I want all posts and its users, at foreach I lazy Load Users, them I need one select for posts, plus N select for each user.
Lazy Loading:
1 - select ....from post
N - select ....from user
The "good" approach is do a join:
1 - select .....from post inner join user on post.UserId = user.Id
But seeing EF generated SQL, I realized that a lot of data is wasted. Imagine that all posts are the same User. Inner Join will bring all users columns for each post row.
In performance, which approach is best?
Lazy loading is neither good nor bad. See this for a more lengthy explanation:
When should one avoid using NHibernate's lazy-loading feature?
In general, lazy loading is a good default behavior for an ORM, but as an ORM user you need to be conscious of when to override the default and load data eagerly. Profiling the performance of your application is the best way to make decisions about using lazy loading or not using it. Be wary of spending too much effort on premature optimization.
The issue with Lazy Loading is being aware of what it is and when it can bite you. You need to be aware of how many potential trips could be made to the database, and how to work around that. I don't view LL as being bad. I just need to be aware of the ramifications of it.
Most of my applications involve a service boundary (web service, WCF, etc) and at that point lazy loading at the OR/M is pointless, and implementing lazy loading in your entities that sit on top of your service is kind of a bad idea (the entities now must know about the service).
There's no bad and good for lazy loading.
You have to decide if you prefer to load resources on run time or application loading times.
For example - Real time usually uses a buffer to avoid allocating resources on runtime. That's the opposite of lazy loading and is beneficial for Real Time software.
Lazy loading is beneficial if you have an application that runs for long duration and you don't want to allocate resources on startup.
Old thread, but search turned it up so I am adding my two cents. In addition to having to be aware of potential performance issues, the issue of accessing fields after a data context has been disposed stops me from ever using LL now. If you return an instance of an entity from a method where a data context was created and disposed, which is how they are designed to be used, accessing those virtual fields will exception fault. The solutions to this are to either include the fields in the queries (i.e. .Include), never return entity classes from your data layer/service, or keep data contexts alive for much longer. Including the fields is the best option, and that is just as easy without lazy loading enabled.

Large volume database updates with an ORM

I like ORM tools, but I have often thought that for large updates (thousands of rows), it seems inefficient to load, update and save when something like
UPDATE [table] set [column] = [value] WHERE [predicate]
would give much better performance.
However, assuming one wanted to go down this route for performance reasons, how would you then make sure that any objects cached in memory were updated correctly.
Say you're using LINQ to SQL, and you've been working on a DataContext, how do you make sure that your high-performance UPDATE is reflected in the DataContext's object graph?
This might be a "you don't" or "use triggers on the DB to call .NET code that drops the cache" etc etc, but I'm interested to hear common solutions to this sort of problem.
You're right, in this instance using an ORM to load, change and then persist records is not efficient. My process goes something like this
1) Early implementation use ORM, in my case NHibernate, exclusively
2) As development matures identify performance issues, which will include large updates
3) Refactor those out to sql or SP approach
4) Use Refresh(object) command to update cached objects,
My big problem has been informing other clients that the update has occured. In most instances we have accepted that some clients will be stale, which is the case with standard ORM usage anyway, and then check a timestamp on update/insert.
Most ORMs also have facilities for performing large or "bulk" updates efficiently. The Stateless Session is one such mechanism available in Hibernate for Java which apparently will be available in NHibernate 2.x:
http://ayende.com/Blog/archive/2007/11/13/What-is-going-on-with-NHibernate-2.0.aspx
ORMs are great for rapid development, but you're right -- they're not efficient. They're great in that you don't need to think about the underlying mechanisms which convert your objects in memory to rows in tables and back again. However, many times the ORM doesn't pick the most efficient process to do that. If you really care about the performance of your app, it's best to work with a DBA to help you design the database and tune your queries appropriately. (or at least understand the basic concepts of SQL yourself)
Bulk updates are a questionable design. Sometimes they seems necessary; in many cases, however, a better application design can remove the need for bulk updates.
Often, some other part of the application already touched each object one at a time; the "bulk" update should have been done in the other part of the application.
In other cases, the update is a prelude to processing elsewhere. In this case, the update should be part of the later processing.
My general design strategy is to refactor applications to eliminate bulk updates.
ORMs just won't be as efficient as hand-crafted SQL. Period. Just like hand-crafted assembler will be faster than C#. Whether or not that performance difference matters depends on lots of things. In some cases the higher level of abstraction ORMs give you might be worth more than potentailly higher performance, in other cases not.
Traversing relationships with object code can be quite nice but as you rightly point out there are potential problems.
That being said, I personally view ORms to be largely a false economy. I won't repeat myself here but just point to Using an ORM or plain SQL?