NHibernate latency is very high

I am using NHibernate for ORM and have consolidated the loading of lots of entities into one big query.
I am actually loading a word dictionary, around 500K entries, and each word relates to others. Running the loading process in the background would be very tricky in our application, as we would have to manually load any entry that had not been loaded in time, since any word can be asked for at any moment. Our only requirement is that all the data be loaded as fast as possible.
I also tried using a stateless session, but got an exception that stateless sessions can't fetch collections (for some reason; maybe it has to do with the fact that there is no cache for stateless sessions?).
The problem is that although the query takes no more than 25 seconds in SQLServer, it takes well over 3 minutes for ICriteria.List().
I used NHProf to profile the loading process and found that the creation of the entities is a costly affair, which takes up most of the loading time in NHibernate.
Is there anything I could do to reduce this latency? Is the memory allocation expensive, or is it the "filling in" of the data?
Thanks!

Perhaps you should consider the fact that NHibernate (like most ORMs) is not particularly suited (or intended) for these types of bulk-loading scenarios. How many rows are you trying to load, give or take? What are you trying to do? Pre-populate a cache? Do batch-like processing?
My gut feeling is that you should seriously consider the purpose of your app and choose the underlying technologies accordingly. Perhaps you can shed some light on your intentions/requirements?
EDIT OK, from your comments I understand what it is you're trying to do here. The first thing I'd do is create a simple prototype using raw ADO.NET to load the same data, to get a feel for the best performance attainable using standard data access and in-memory collections. Next, fiddle around with different collection types to see what performs well when populating and searching. If loading data like this is still too slow, it's time to start looking at other methods of loading the data: file-based from a local data file, hydrating pre-serialized objects, some form of fast on-demand loading, etc.
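For reference, a minimal sketch of such a raw ADO.NET prototype, assuming a hypothetical WordEntry table with Id and Text columns (the real schema and connection string will differ):
using System.Collections.Generic;
using System.Data.SqlClient;

public static Dictionary<int, string> LoadAllWords(string connectionString)
{
    // Plain data reader into a plain dictionary: no ORM, no change tracking.
    var words = new Dictionary<int, string>(500000);
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT Id, Text FROM WordEntry", connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                words.Add(reader.GetInt32(0), reader.GetString(1));
            }
        }
    }
    return words;
}
Timing this against the ICriteria.List() call gives you a baseline for the best performance attainable before tuning NHibernate further.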

Loading 500k entities into an NHibernate session is not a good idea. The session is meant to be short-lived and to hold a relatively small number of entities.
If you want to do this kind of batch processing in NHibernate you should take a look at the StatelessSession instead of the ordinary session. Using a stateless session would most likely drastically improve performance in this scenario. However, when using a stateless session you lose the benefits of the NHibernate first level cache, such as change tracking.
More information about the StatelessSession can be found in this article and in the NH docs at nhibernate.info.
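For illustration, a minimal sketch of listing entities through a stateless session, assuming an existing ISessionFactory and a mapped Word entity (hypothetical names); note the original poster's caveat that stateless sessions cannot fetch collections:
using (IStatelessSession session = sessionFactory.OpenStatelessSession())
{
    // No first-level cache, no change tracking, no lazy loading of collections.
    IList<Word> words = session.CreateCriteria<Word>()
        .SetFetchSize(1000) // optional ADO.NET fetch-size hint
        .List<Word>();
}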
In this scenario I would also recommend that you consider using straight ADO.NET instead of NHibernate. I am not saying that you should switch your whole data access strategy to ADO.NET, but you might want to consider using ADO.NET for the batch operations and NHibernate for the other cases.

Profiling the creation process (for example with the VS performance analyser) should tell you exactly which operation is the costly one. If you have already played with lazy loading tuning, then I think the only good solution is to encapsulate the returned list to enable paging and return smaller chunks over a few iterations. I am not sure whether NHibernate supports lazy result lists the way JPA does (i.e. not loading entities from the data reader until needed).
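As a rough sketch of that chunked approach, assuming an open NHibernate ISession and a mapped Word entity (hypothetical name), ICriteria paging could look like this:
const int pageSize = 10000;
var allWords = new List<Word>();

for (int page = 0; ; page++)
{
    // Pull one page at a time instead of materializing 500K entities in one call.
    IList<Word> chunk = session.CreateCriteria<Word>()
        .SetFirstResult(page * pageSize)
        .SetMaxResults(pageSize)
        .List<Word>();

    allWords.AddRange(chunk);

    if (chunk.Count < pageSize)
        break; // last page reached
}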

Related

WCF data serialization: can it go faster?

This question is sort of a sequel to that question.
When we want to build a WCF service which works with some kind of data, it's natural that we want it to be fast and efficient. In order to achieve that, we have to make sure all segments of the data's round trip work as fast as they can, from the data storage back-end such as SQL Server, to the WCF client that requested the data.
While seeking an answer to that previous question, we learned, thanks to Slauma and others who contributed through comments, that the time-consuming part of Entity Framework's (first) large query is object materialization and attaching entities to the context when the result from the database is returned. We have seen that everything works much faster on subsequent queries.
Assuming those large queries are used as read-only operations, we came to the conclusion that we could set the EF MergeOption to NoTracking, yielding better first-query performance. What we did with NoTracking was tell EF to create a separate object for each record retrieved from the database, even when they have the same key. This causes additional processing if we have an .Include() statement in our query, which leads to data of a much larger size being returned.
The data may be so big that we could easily ask ourselves: did we really help our cause by using the NoTracking option, even if we made the query faster? And maybe only the first query at that, depending on the number of .Include() statements, because subsequent queries without the NoTracking option and with multiple .Include() statements run faster, simply because the NoTracking option causes a lot more objects to be created when data returns from the server.
The biggest problem is how to efficiently serialize this amount of data, and deserialize it on the client. With serialization already as slow as it is (I am using DataContractSerializer with PreserveObjectReferences set to true, because I am sending EF 4.x generated POCOs to my client and vice versa), do we want to generate even more data (thanks to NoTracking)? To be honest, I haven't yet seen the data originating from the query with the NoTracking option on ~11.000 objects, not including navigation properties obtained via .Include(), arrive at the client side. Last time I tried to pull this off, the timeout of 00:10:00 was triggered (!)
So if you are still reading this wall of text, tell me how to solve this situation. Which serializer should I use in order to achieve acceptable results? Currently, if I don't use the NoTracking option, the serialization, transport and deserialization of ~11.000 objects via a wsHttpBinding-like custom binding on the local machine take ~5 seconds. What's scary to me is that this large table is most likely going to contain ~500.000 records eventually.
Have you considered creating a view model for your object and doing a projection in the select statement? That should be a lot faster, so:
var result = from person in DB.Entities.Persons
                                .Include("District")
                                .Include("District.City")
                                .Include("District.City.State")
                                .Include("Nationality")
             select new PersonViewModel()
             {
                 Name = person.Name,
                 City = person.District.City,
                 State = person.District.City.State,
                 Nationality = person.Nationality.Name
             };
This would require you to create a ViewModel class to hold the flattened data for the PersonViewModel.
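A possible shape for that class, with the caveat that the property types are assumptions (City and State could just as well be flattened to strings):
public class PersonViewModel
{
    public string Name { get; set; }
    public City City { get; set; }
    public State State { get; set; }
    public string Nationality { get; set; }
}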
You might be able to further speed up things by creating a database view and letting Entity Framework select directly from there.
If you really want the front-end to populate a grid with 500.000 records, then I'd remove the web service layer altogether and use a DataReader to speed up the process. Entity Framework and WCF aren't suited to transforming that much data with adequate performance. What you're basically doing here is:
Database -> TDS -> .NET objects -> XML -> Plain text -> XML -> .NET Objects -> UI
While this could easily be reduced to:
Database -> TDS -> UI
Then use Entity Framework to handle the changes to the entities in your business logic. This is in line with the Command and Query Separation pattern: use a technology suitable for high-performance querying of data and link that directly to your app, then use a command strategy to implement your business logic.
OData services might also provide a better way to link your UI directly to the data, as it can be used to quickly query your data allowing you to implement quick filtering without the user really noticing.
If the security settings are prohibiting direct querying through OData or direct access to the SQL database, consider materializing the objects yourself. Select the data directly from either a view or a query and use an IDataReader to populate your ViewModel directly. That will probably give you the highest performance.
There are a lot of alternatives to Entity Framework created especially because EF isn't cut out for large datasets. See FluentData, Dapper, Massive or PetaPoco. You might want to use these side by side with Entity Framework to handle your large, flat data queries.
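As an illustration of the micro-ORM route, a minimal Dapper sketch; the SQL and the flat PersonDto type (a DTO with string properties) are hypothetical:
using Dapper;
using System.Data.SqlClient;

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    // Dapper materializes each row straight into the flat DTO, with no change tracking.
    var people = connection.Query<PersonDto>(
        @"SELECT p.Name, c.Name AS City, s.Name AS State, n.Name AS Nationality
          FROM Person p
          JOIN District d ON d.Id = p.DistrictId
          JOIN City c ON c.Id = d.CityId
          JOIN State s ON s.Id = c.StateId
          JOIN Nationality n ON n.Id = p.NationalityId");
}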
I use Json.Net's implementation of Bson in my RIA application. More info here.
I yield return an IEnumerable, as I read from the database and serialize the rows. I find the speed to be acceptable and I return Entities with roughly 20 properties. This approach should minimize the concurrent memory use on the server.
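A rough sketch of that streaming pattern, with a hypothetical PersonDto and query; the key point is that rows are yielded to the serializer as they are read rather than buffered in full:
using System.Collections.Generic;
using System.Data.SqlClient;

public IEnumerable<PersonDto> StreamPeople(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT Id, Name FROM Person", connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // Each row is handed to the caller as soon as it is read.
                yield return new PersonDto
                {
                    Id = reader.GetInt32(0),
                    Name = reader.GetString(1)
                };
            }
        }
    }
}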
Based on what I have gathered by looking at various reviews and performance benchmarks, I would choose protobuf-net as a serializer. It's just a matter of design whether it can be plugged into my service configuration. More info about that here.
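For completeness, a minimal protobuf-net sketch; the PersonDto contract below is a hypothetical example, and whether it can be wired into the WCF configuration is the design question mentioned above:
using System.IO;
using ProtoBuf;

[ProtoContract]
public class PersonDto
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

public static class PersonSerialization
{
    // Round-trip through a byte array to gauge payload size and speed.
    public static byte[] ToBytes(PersonDto person)
    {
        using (var stream = new MemoryStream())
        {
            Serializer.Serialize(stream, person);
            return stream.ToArray();
        }
    }

    public static PersonDto FromBytes(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        {
            return Serializer.Deserialize<PersonDto>(stream);
        }
    }
}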
Although not completely an answer to this question, jessehouwing had the best answer and I am marking it as accepted.

How to go from a full SQL querying to something like a NoSQL?

In one of my processes I have a SQL query that takes 10-20% of the total execution time. This SQL query does a filter on my database and loads a list of PricingGrid objects.
So I want to improve its performance.
So far I have thought of 2 solutions:
Use a NoSQL solution; AFAIK these are good for improving the reading process.
But the migration seems hard and needs a lot of work (like importing the data from SQL Server to NoSQL on a regular basis).
I don't have any NoSQL experience; I don't even know which one I should use (the first I'd try is RavenDB, because I follow Ayende and it's built by the .NET community).
I might have some things to change in my model to make my objects suitable for a NoSQL database.
Load all my PricingGrid objects in memory (in a static IEnumerable).
This might be a problem when my server doesn't have enough memory to load everything.
I might end up reinventing the wheel (indexes...) that the NoSQL providers have already built.
I think I'm not the first one wondering this, so what would be the best solution? Are there any tools that could help me?
.NET 3.5, SQL Server 2005, Windows Server 2005
Migrating your data from SQL is only the first step.
Moving to a document store (like RavenDB or MongoDB) also means that you need to:
Denormalize your data
Perform schema validation in your code
Handle concurrency of complex operations in your code since you no longer have transactions (at least not the same way)
Perform rollbacks in the event of partial commits (changes)
Depending on your updates, reads and network model you might also need to handle conflicts
You provided very limited information but it sounds like your needs include a single database server and that your data fits well in the relational model.
In such a case I would vote against a NoSQL solution; it is more likely that you can speed up your queries with database optimizations and still retain all the added value of an RDBMS.
Non-relational databases are tools for a specific job (no matter how they sell them), if you need them it is usually because your data doesn't fit well in the relational model or if you have a need to distribute your data over multiple machines (size or availability). For instance, I use MongoDB for a write-intensive high throughput job management application. It is centralized and the data is very transient so the "cost" of having low durability is acceptable. This doesn't sound like the case for you.
If you prefer to use a NoSQL-style solution, perhaps you should try Memcached+MySQL (InnoDB); this will allow you to get the speed benefits of an in-memory cache (in the form of a memcached daemon plugin) with the underlying protection and capabilities of an RDBMS (MySQL). It should also ease data migration and somewhat reduce the amount of changes required in your code.
I myself have never used it; I find that I either need NoSQL for the reasons I stated above, or that I can optimize the RDBMS using stored procedures, indexes and views in a way which is sufficient for my needs.
Asaf has provided great information in regards to the usage of NoSQL and when it is most appropriate. Given that your main concern was performance, I would tend to agree with his opinion - it would take you much more time and effort to adopt a completely new (and very different) data persistence platform than it would to trick out your SQL Server cluster. That said, my answer is mainly to address the "how" part of your question.
Addressing misunderstandings:
Denormalizing Data - You do not need to manually denormalize your existing data. This will be done for you when it is migrated over. More than anything you need to simply think about your data in a different fashion - root aggregates, entity and value types, etc.
Concurrency/Transactions - Transactions are possible in both Mongo and Raven; they are simply done in a different fashion. One of the inherent ways Raven does this is by using an ORM-like "unit of work" pattern with its RavenSession objects. Yes, your data validation needs to be done in code, but you should already be doing that anyway. In my experience this is an over-hyped con.
How:
Install Raven or Mongo on a primary server, run it as a service.
Create or extend an existing application that uses the database you intend to port. This application needs all the model classes/libraries that your SQL database provides persistence for.
a. In your "data layer" you likely have a repository class somewhere. Extract an interface from this, and use it to build another repository class for your Raven/Mongo persistence. Both DBs have plenty of good documentation for using their APIs to push/pull/update changes in the document graphs. It's pretty damn simple.
b. Load your SQL data into C# objects in memory. Pull back your top-level objects (just the entities) and load their inner collections and related data in memory. Your repository is probably already doing this (e.g. when fetching an Order object, ensure not only its properties but also associated collections like Items are loaded in memory).
c. Instantiate your Raven/Mongo repository and push the data to it (see the sketch after these steps). Primary entities become "top level documents" or "root aggregates" serialized in JSON, with their collections' data nested within. Save changes and close the repository. Note: You may break this step down into as many little pieces as your data deems necessary.
Once your data is migrated, play around with it and ensure you are satisfied. You may want to modify your application models a little to adjust the way they are persisted to Raven/Mongo - for instance you may want to make both Orders and Items top-level documents and simply use reference values (much like relationships in RDBMS systems). Watch out here though, as doing so somewhat goes against the principle and performance behind NoSQL, since you now have to tap the DB twice to get the Order and the Items.
If satisfied, shard/replicate your mongo/raven servers across your remaining available server boxes.
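As a rough sketch of step (c) for the RavenDB case: the server URL and the ordersLoadedFromSql collection (the objects hydrated in step (b)) are assumptions, and the Mongo path would use its own driver instead:
using Raven.Client.Document;

var store = new DocumentStore { Url = "http://localhost:8080" }; // hypothetical server address
store.Initialize();

using (var session = store.OpenSession())
{
    foreach (var order in ordersLoadedFromSql) // root aggregates built in step (b)
    {
        session.Store(order); // each Order becomes a top-level document, Items nested inside
    }
    session.SaveChanges(); // pending stores are pushed to the server in one batch
}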
Obviously there are tons of little details I did not explain, but that is the general process, and much of it depends on the applications already consuming the database and may be tricky if more than one app/system talks to it.
Lastly, just to reiterate what Asaf said... learn as much as you can about NoSQL and its best use-cases. It is an amazing tool, but not a golden solution for all data persistence. In your case, try to really find the bottlenecks in your current solution and see if they are solvable. As one of my systems guys says, "technology for technology's sake is bullshit".

When should one avoid using NHibernate's lazy-loading feature?

Most of what I hear about NHibernate's lazy-loading is that it's better to use it than not to use it. It seems like it just makes sense to minimize database access in an effort to reduce bottlenecks. But few things come without trade-offs; it certainly limits design somewhat by forcing you to have virtual properties. I've also noticed that some developers turn lazy-loading off on certain often-used objects.
This makes me wonder if there are some definite situations where data-access performance is hurt by using lazy-loading.
So I wonder, when and in what situations should I avoid lazy-loading one of my NHibernate-persisted objects?
Is the downside to lazy-loading merely in additional processing time, or can NHibernate lazy-loading also increase the data-access time (for instance, by making additional round-trips to the database)?
Thanks!
There are clear performance tradeoffs between eager and lazy loading objects from a database.
If you use eager loading, you suck a ton of data in a single query, which you can then cache. This is most common on application startup. You are trading memory consumption for database round trips.
If you use lazy loading, you suck a minimal amount of data in a single query, but any time you need more information related to that initial data, it requires more queries to the database, and those database hits are very often the major performance bottleneck in most applications.
So, in general, you always want to retrieve exactly the data you will need for the entire "unit of work", no more, no less. In some cases, you may not know exactly what you need (because the user is working through a wizard or something similar) and in that case it probably makes sense to lazy load as you go.
If you are using an ORM and focused on adding features quickly and will come back and optimize performance later (which is extremely common and a good way to do things), having lazy loading being the default is the correct way to go. If you later find (through performance profiling/analysis) that you have one query to get an object and then N queries to get the N objects related to that original object, you can change that piece of code to use eager loading to only hit the database once instead of N+1 times (the N+1 problem is a well known downside of using lazy loading).
The usual tradeoff for lazy loading is that you make a smaller hit on the database up front, but you end up making more hits on it long-term.
Without lazy loading, you'll grab an entire object graph up front, sucking down a large chunk of data at once. This could, potentially, cause lag in your UI, and so it is often discouraged. However, if you have a common object graph (not just single object - otherwise it wouldn't matter!) that you know will be accessed frequently, and top to bottom, then it makes sense to pull it down at once.
As an example, if you're doing an order management system, you probably won't pull down all the lines of every order, or all the customer information, on a summary screen. Lazy loading prevents this from happening.
I can't think of a good example for not using it offhand, but I'm sure there are cases where you'd want to do a big load of an object graph, say, on application initialization, in order to avoid lags in processing further down the line.
The short version is this:
Development is simpler if you use lazy loading. You just traverse object relationships in a natural OO way, and you get what you need when you ask for it.
Performance is generally better if you figure out what you need before you ask for it, and ask for it in one trip to the database.
For the past few years we've been focusing on quick development times. Now that we have a solid app and userbase, we're optimizing our data access.
If you are using a web service between the client and the server, with NHibernate handling the database access on the server, lazy loading can be problematic: the object will be serialized and sent over the web service, and any subsequent use of "objects" further down the object relationship needs a new trip to the database server via additional web service calls. In such a setup it might not be a good idea to use lazy loading. A word of caution: be careful about what you fetch if you turn lazy loading off; it's way too easy not to think it through and end up fetching almost the whole database...
I have seen many performance problems arising from incorrect loading-behaviour configuration in Hibernate, and the situation is much the same with NHibernate, I think. My recommendation is to always map relations as lazy and then use eager fetching statements in your queries, like fetch joins. This ensures you are not loading too much data, and you can avoid too many SQL queries.
It is easy to make a lazy relation eager in a query. It is nearly impossible the other way round.
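For example, with lazy mappings you can still eager-fetch per query; a minimal sketch assuming an open ISession and a mapped Post entity with a User association (hypothetical names):
using System.Linq;
using NHibernate.Linq;

// LINQ provider (NHibernate 3+): fetch the association for this query only.
var posts = session.Query<Post>()
    .Fetch(p => p.User)
    .ToList();

// The HQL equivalent of the same fetch join:
var postsHql = session
    .CreateQuery("from Post p left join fetch p.User")
    .List<Post>();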

Is Lazy Loading really bad?

I hear a lot about performance problems with lazy loading, no matter whether in NHibernate, LINQ, etc.
The problem is N+1 selects. Example: I want all posts and their users; in the foreach I lazy-load Users, so I need one select for the posts plus N selects, one for each user.
Lazy loading:
1 - select ... from post
N - select ... from user
The "good" approach is to do a join:
1 - select ... from post inner join user on post.UserId = user.Id
But looking at the EF-generated SQL, I realized that a lot of data is wasted. Imagine that all posts belong to the same user: the inner join will bring back all the user's columns for each post row.
In terms of performance, which approach is best?
Lazy loading is neither good nor bad. See this for a more lengthy explanation:
When should one avoid using NHibernate's lazy-loading feature?
In general, lazy loading is a good default behavior for an ORM, but as an ORM user you need to be conscious of when to override the default and load data eagerly. Profiling the performance of your application is the best way to make decisions about using lazy loading or not using it. Be wary of spending too much effort on premature optimization.
The issue with Lazy Loading is being aware of what it is and when it can bite you. You need to be aware of how many potential trips could be made to the database, and how to work around that. I don't view LL as being bad. I just need to be aware of the ramifications of it.
Most of my applications involve a service boundary (web service, WCF, etc) and at that point lazy loading at the OR/M is pointless, and implementing lazy loading in your entities that sit on top of your service is kind of a bad idea (the entities now must know about the service).
There's no universal good or bad with lazy loading.
You have to decide whether you prefer to load resources at run time or at application load time.
For example, real-time software usually uses a pre-allocated buffer to avoid allocating resources at runtime. That's the opposite of lazy loading and is beneficial for real-time software.
Lazy loading is beneficial if you have an application that runs for a long duration and you don't want to allocate resources at startup.
Old thread, but search turned it up, so I am adding my two cents. In addition to having to be aware of potential performance issues, the issue of accessing fields after a data context has been disposed stops me from ever using lazy loading now. If you return an instance of an entity from a method where a data context was created and disposed, which is how they are designed to be used, accessing those virtual fields will throw an exception. The solutions to this are to either include the fields in the queries (i.e. .Include), never return entity classes from your data layer/service, or keep data contexts alive for much longer. Including the fields is the best option, and that is just as easy with lazy loading disabled.
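A minimal sketch of that .Include approach, assuming an EF DbContext with a Posts set and a User navigation property (hypothetical names):
using System.Collections.Generic;
using System.Data.Entity; // lambda Include extension (EF 4.1+)
using System.Linq;

List<Post> posts;
using (var context = new BlogContext())
{
    posts = context.Posts
        .Include(p => p.User) // loaded up front, so nothing is lazy-loaded later
        .ToList();
}
// posts[0].User is safe to access here even though the context is disposed.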

Which approach to create the data access layer has the highest performance?

I have to create a very high-performance application. Currently, I am using Entity Framework for my data access layer. My application has to insert some communication data almost every second. I found that Entity Framework is slow; there is about a 2-second delay to finish the SaveChanges() method.
I was thinking I have the following options:
1. Create the data access layer myself using ADO.NET; using stored procedures or ad-hoc queries
2. Use Enterprise Library Data access Layer
3. Use NHibernate
4. Use Repository Factory: http://pooyakhamooshi.blogspot.com/search?q=repository
What do you think? Which one is quicker for inserting data? Which one is quicker to set up?
If it's only a question of performance, it's impossible to beat using ADO.NET directly, because every framework that you use will use ADO.NET under the covers. The performance gain has to be worth it, though, and unless you're inserting millions upon millions of records, it's not likely to be worth it.
I would suggest profiling your application to see why it is taking 2 seconds to save information; it shouldn't be that slow. Maybe you've got an N+1 performance problem. Fixing this will probably give you the performance you want using Entity Framework (or any other standard DAL for that matter). Focus your efforts on that.
Plain ADO.NET again depends on how you implement it, but performance-wise it should be the best; it would just take longer to develop.
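For comparison, a minimal sketch of the plain ADO.NET insert path; the table, columns, parameter names and local variables are hypothetical:
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "INSERT INTO CommunicationLog (DeviceId, Payload, ReceivedAt) VALUES (@deviceId, @payload, @receivedAt)",
    connection))
{
    command.Parameters.AddWithValue("@deviceId", deviceId);
    command.Parameters.AddWithValue("@payload", payload);
    command.Parameters.AddWithValue("@receivedAt", DateTime.UtcNow);

    connection.Open();
    command.ExecuteNonQuery(); // single round trip, no change tracker involved
}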
I found this site very helpful: http://ormbattle.net/
BLToolkit seems to be the best free ORM tool performance-wise; it's the first time I've heard of it!