How to speed up database operations? - sql

What can we use to reduce the time taken to get data from the database using Entity Framework? Any suggestions, with caching or any other way?

I would recommend you check the tracking options (especially .AsNoTracking) provided by EF.
A sneak peek to begin your research:
When we use the AsNoTracking() method we are explicitly telling Entity Framework that the entities are not tracked by the context. This can be especially useful when retrieving large amounts of data from your data store. If you want to make changes to un-tracked entities, however, you must remember to attach them before calling SaveChanges.
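A minimal sketch, assuming the DbContext API (EF 4.1 or later) and a hypothetical Customers set (property names are illustrative):

// Read-only query: AsNoTracking() returns entities the context does not track.
var customers = context.Customers
                       .AsNoTracking()
                       .Where(c => c.IsActive)
                       .ToList();

// To persist changes to an un-tracked entity, attach it and mark it modified
// before calling SaveChanges.
var customer = customers.First();
customer.Name = "New name";
context.Customers.Attach(customer);
context.Entry(customer).State = EntityState.Modified;
context.SaveChanges();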

Related

WCF data serialization : can it go faster?

This question is sort of a sequel to that question.
When we want to build a WCF service which works with some kind of data, it's natural that we want it to be fast and efficient. In order to achieve that, we have to make sure all segments of the data's round trip work as fast as they can, from the data storage back end, such as SQL Server, to the WCF client that requested the data.
While seeking an answer to that previous question, we learned, thanks to Slauma and others who contributed through comments, that the time-consuming part of Entity Framework's (first) large query is object materialization and attaching entities to the context when the result from the database is returned. We have seen that everything works much faster on subsequent queries.
Assuming those large queries are used as read-only operations, we came to the conclusion that we could set the EF MergeOption to NoTracking, yielding better first-query performance. By using NoTracking we are telling EF to create a separate object for each record retrieved from the database, even when records share the same key. This causes additional processing if we have .Include() statements in our query, which leads to much larger amounts of data being returned.
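For reference, this is roughly what setting NoTracking looks like with the EF 4 ObjectContext API we are using (a minimal sketch; the context and entity set names are illustrative):

using (var context = new MyEntities())
{
    // NoTracking tells EF not to attach the results to the context;
    // each row coming back from the database becomes a separate object.
    context.Persons.MergeOption = MergeOption.NoTracking;

    var persons = context.Persons
                         .Include("District.City.State")
                         .ToList();
}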
The data may be so big that we can easily ask ourselves: did we really help our cause by using the NoTracking option, even if it made the query faster? And maybe only the first query benefits, depending on the number of .Include() statements, because subsequent queries without the NoTracking option, even with multiple .Include() statements, run faster simply because the NoTracking option causes a lot more objects to be created when data comes back from the server.
The biggest problem is how to efficiently serialize this amount of data - and deserialize it on the client. With serialization already as slow as it is (I am using DataContractSerializer with PreserveObjectReferences set to true, because I am sending EF 4.x generated POCOs to my client and vice versa), do we want to generate even more data (thanks to NoTracking)? To be honest, I haven't yet seen the data originating from a query with the NoTracking option on ~11.000 objects (not counting navigation properties obtained via .Include()) arrive at the client side. The last time I tried to pull this off, a timeout of 00:10:00 was triggered (!)
So if you are still reading this wall of text, tell me how to solve this situation. Which serializer should I use in order to achieve acceptable results? Currently, if I don't use the NoTracking option, the serialization, transport, and deserialization of ~11.000 objects via a wsHttpBinding-like custom binding on the local machine takes ~5 seconds. What's scary to me is that this large table will most likely contain ~500.000 records eventually.
Have you considered creating a view model for your object and doing a projection in the select statement? That should be a lot faster, so:
var result = from person in DB.Entities.Persons
                 .Include("District")
                 .Include("District.City")
                 .Include("District.City.State")
                 .Include("Nationality")
             select new PersonViewModel()
             {
                 Name = person.Name,
                 City = person.District.City.Name,
                 State = person.District.City.State.Name,
                 Nationality = person.Nationality.Name
             };
This requires you to create a PersonViewModel class to hold the flattened data.
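A minimal sketch of such a flattened view model (the property names are illustrative and match the projection above):

public class PersonViewModel
{
    public string Name { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Nationality { get; set; }
}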
You might be able to further speed up things by creating a database view and letting Entity Framework select directly from there.
If you really want the front-end to populate a grid with 500.000 records, then I'd remove the web service layer altogether and use a DataReader to speed up the process. Entity Framework and WCF aren't suited to transforming data at that scale with acceptable performance. What you're basically doing here is:
Database -> TDS -> .NET objects -> XML -> Plain text -> XML -> .NET Objects -> UI
While this could easily be reduced to:
Database -> TDS -> UI
Then use Entity Framework to handle the changes to the entities in your business logic. This is in line with the Command and Query Separation pattern: use a technology suited to high-performance querying of data and link that directly to your app, then use a command strategy to implement your business logic.
OData services might also provide a better way to link your UI directly to the data, as they can be used to quickly query your data, allowing you to implement quick filtering without the user really noticing.
If the security settings prohibit direct querying through OData or direct access to the SQL database, consider materializing the objects yourself. Select the data directly from either a view or a query and use an IDataReader to populate your ViewModel directly. That will probably give you the highest performance.
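A minimal sketch of that approach, assuming a SQL Server view named PersonView and the PersonViewModel above (the view, column names, and connection string are illustrative):

var results = new List<PersonViewModel>();
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT Name, City, State, Nationality FROM PersonView", connection))
{
    connection.Open();
    using (IDataReader reader = command.ExecuteReader())
    {
        // Materialize the view models directly from the reader,
        // skipping EF's object materialization and change tracking.
        while (reader.Read())
        {
            results.Add(new PersonViewModel
            {
                Name = reader.GetString(0),
                City = reader.GetString(1),
                State = reader.GetString(2),
                Nationality = reader.GetString(3)
            });
        }
    }
}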
There are a lot of alternatives to Entity Framework, created especially because EF isn't cut out for large datasets. See FluentData, DapperDotNet, Massive, or PetaPoco. You might want to use these side by side with Entity Framework to handle your large, flat data queries.
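For example, a minimal sketch of the same query with Dapper (the SQL and view name are illustrative assumptions):

// Dapper adds Query<T> as an extension method on IDbConnection (using Dapper;).
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    var persons = connection.Query<PersonViewModel>(
        "SELECT Name, City, State, Nationality FROM PersonView").ToList();
}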
I use Json.Net's implementation of Bson in my RIA application. More info here.
I yield return an IEnumerable as I read from the database and serialize the rows. I find the speed acceptable, and I return entities with roughly 20 properties. This approach should minimize the concurrent memory use on the server.
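A minimal sketch of that streaming pattern, assuming a hypothetical MyEntity class and MapEntity helper that builds an entity from the current row:

public IEnumerable<MyEntity> ReadEntities(IDbCommand command)
{
    using (var reader = command.ExecuteReader())
    {
        // Each row is yielded as soon as it is read, so only one entity at a
        // time needs to be held in memory before it is serialized.
        while (reader.Read())
        {
            yield return MapEntity(reader); // MapEntity is a hypothetical mapping helper
        }
    }
}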
Based on what I have gathered from various reviews and performance benchmarks, I would choose protobuf-net as the serializer. It's just a matter of design whether it can be plugged into my service configuration. More info about that here.
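A minimal sketch of what the serialization step might look like with protobuf-net, assuming the DTOs are decorated with its attributes (the PersonDto class and values are illustrative):

[ProtoContract]
public class PersonDto
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public string City { get; set; }
}

var personDtos = new List<PersonDto>
{
    new PersonDto { Name = "John", City = "Springfield" }
};

using (var stream = new MemoryStream())
{
    // Serializer.Serialize writes the list in protobuf wire format.
    ProtoBuf.Serializer.Serialize(stream, personDtos);
}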
Although not completely an answer to this question, jessehouwing had the best answer and I am marking it as accepted.

Flexible Persistence Layer

I am designing an ASP.NET MVC 2 application. Currently I am leveraging Entity Framework 4 with switchable SQLServer and MySQL datastores.
A requirement recently surfaced for the application to allow user-defined models/entities to be manipulated. Now I'm unsure if a SQL/relational database is appropriate at all; instead of adding/removing 'Employee' objects, for example, the user should be able to define an 'Employee' and what properties it has - effectively adding/removing tables and columns on the fly, at runtime.
Is SQL unsuitable for this? Are there options which allow me to stay within a relational database structure and still satisfy this requirement? Within the Entity Framework, can I regenerate .edmx files 'on the fly' or are there alternatives which achieve similar goals?
I've looked briefly at other options like 'document-based' dbs and 'schema-free/no-sql' dbs, such as MongoDb. I've also looked at some serialization formats such as Google's Protocol Buffers, JSON, and XML. From your experience, are any of these particularly suitable for this purpose? Serialization performance is not a big concern.
The application is in its infancy and I have no time constraints. Essentially I am free to rewrite it as I please, so if scrapping and starting over is a better alternative, I am very open to this. What are your suggestions? Thanks in advance!
Before looking at options, I'd suggest (if you have not already done it :-) that you get a clear definition of exactly what users will be able to define. Once you have that, you can deduce the level of flexibility needed and therefore the type of data store needed to do the job.
One other word of advice: if the clients demand to be able to create anything any way they want - walk away. I've dealt with clients and users at all levels, and one thing that is guaranteed is that users have no interest in the effective and efficient design of data, and will therefore always reduce the data to a pile of poo through sheer neglect.
You need to set some boundaries so that the data store behind the system maintains some integrity.

What will be the benefits of NHibernate in a data retrieval only scenario?

It has been suggested that we use NHibernate for our project. But the point is that our application will only be retrieving data from the database, so are there any benefits to using NHibernate in this scenario?
One more thing: what about the query execution plan in NHibernate - does it support something like prepared statements, or so-called pre-compiled statements?
I agree with the answers above, but there is one more reason for using NHibernate: you are completely independent of the underlying database system. You can switch from MySQL to Oracle and the only thing you have to do is change the settings; the access to the database stays exactly the same.
NHibernate is useful if you need to map data from a database table into a .NET class. Even if you're only doing select queries, it might still be useful if you need to pass the data objects to a client tier (web page, desktop app, etc.) for display. Working with plain objects can be easier than working with a DataSet or another ADO.NET data class in a presentation layer.
NHibernate does have the ability to parse/pre-compile queries if you put them in the mapping file.
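For instance, an illustrative sketch of calling such a named query (the query itself would be declared in the .hbm.xml mapping file, e.g. <query name="AllWords">from Word</query>, and the Word class is an assumption):

// Named queries in the mapping file are parsed and validated when the
// session factory is built, and can then be reused by name.
var words = session.GetNamedQuery("AllWords")
                   .List<Word>();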
The benefit of using NHibernate in a read-only scenario is that you would not need to map the results of queries back to .NET objects, as the runtime does this for you. It also provides a more object-oriented query syntax (you can use LINQ as well), and you can take advantage of lazy loading.
I don't believe NHibernate can use prepared statements unless you are having it call stored procedures.

NHibernate latency is very high

I am using NHibernate for ORM and have consolidated the loading of lots of entities into one big query.
I am actually loading a word dictionary of around 500K entries, and each word relates to others. Running the loading process in the background would be very tricky in our application, as we would have to manually load any entry that has not been loaded in time, since any word can be asked for at any moment. Our only requirement is that all the data be loaded as fast as possible.
I also tried using a stateless session, but got an exception saying that stateless sessions can't fetch collections (perhaps because there is no cache for stateless sessions?).
The problem is that although the query takes no more than 25 seconds in SQL Server, it takes well over 3 minutes for ICriteria.List().
I used NHProf to profile the loading process and found that the creation of the entities is a costly affair, which takes up most of the loading time in NHibernate.
Is there anything I could do to reduce this latency? Is the memory allocation expensive, or is it the "filling in" of the data?
Thanks!
Perhaps you should consider the fact that NHibernate (like most ORMs) is not particularly suited (or intended) for these types of bulk-loading scenarios. How many rows are you trying to load, give or take? What are you trying to do? Pre-populate a cache? Do batch-like processing?
My gut feeling is that you should seriously consider the purpose of your app and choose the underlying technologies accordingly. Perhaps you can shed some light on your intentions/requirements?
EDIT: OK, from your comments I understand what it is you're trying to do here. The first thing I'd do is create a simple prototype using raw ADO.NET to load the same data, to get a feel for the best performance attainable using standard data access and in-memory collections. Next, fiddle around with different collection types to see what performs well when populating and searching. If loading data like this is still too slow, it's time to start looking at other methods of loading the data: file-based from a local data file, hydrating pre-serialized objects, some form of fast on-demand loading, etc.
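A minimal sketch of such a prototype, assuming a Word table/class with Id and Text columns (names and connection string are illustrative) and a dictionary for fast in-memory lookup:

var words = new Dictionary<string, Word>(StringComparer.OrdinalIgnoreCase);
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("SELECT Id, Text FROM Word", connection))
{
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            var word = new Word { Id = reader.GetInt32(0), Text = reader.GetString(1) };
            words[word.Text] = word; // dictionary gives fast lookup by word text
        }
    }
}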
Loading 500k entities into an NHibernate session is not a good idea. The session is meant to be short-lived and to hold a relatively small number of entities.
If you want to do this kind of batch processing in NHibernate you should take a look at the StatelessSession instead of the ordinary session. Using a stateless session would most likely drastically improve performance in this scenario. However, when using a stateless session you lose the benefits of the NHibernate first level cache, such as change tracking.
More information about the StatelessSession can be found in this article and in the NH docs at nhibernate.info.
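A minimal sketch of a bulk read through a stateless session, assuming a Word entity (and keeping in mind the collection-fetching limitation mentioned in the question):

using (IStatelessSession statelessSession = sessionFactory.OpenStatelessSession())
{
    // No first-level cache and no change tracking: entities are materialized
    // and handed back as cheaply as NHibernate allows.
    IList<Word> words = statelessSession
        .CreateCriteria<Word>()
        .List<Word>();
}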
In this scenario I would also recommend that you consider using straight ADO.NET instead of NHibernate. I am not saying that you should switch your whole data access strategy to ADO.NET, but you might want to consider using ADO.NET for the batch operations and NHibernate for the other cases.
Profiling the creation process (for example with the VS performance analyser) should tell you exactly which operation is the costly one. If you have already played with lazy-loading tuning, then I think the only good solution is to encapsulate the returned list to enable paging and return smaller chunks over a few iterations. I am not sure whether NHibernate supports lazy result lists like JPA does (i.e. not loading entities from the data reader until needed).

Which approach to create the data access layer has the highest performance?

I have to create a very high-performance application. Currently I am using Entity Framework for my data access layer. My application has to insert some communication data almost every second. I have found that Entity Framework is slow; it takes about 2 seconds to finish the SaveChanges() method.
I was thinking I have the following options:
1. Create the data access layer myself using ADO.NET; using stored procedures or ad-hoc queries
2. Use Enterprise Library Data access Layer
3. Use NHibernate
4. Use Repository Factory: http://pooyakhamooshi.blogspot.com/search?q=repository
What do you think? Which one is quicker for inserting data? Which one is quicker to set up?
If it's only a question of performance, it's impossible to beat using ADO.NET directly, because every framework you could use will use ADO.NET under the covers. The performance gain has to be worth it, though, and unless you're inserting millions upon millions of records, it's not likely to be.
I would suggest you profile your application to see why it is taking 2 seconds to save information; it shouldn't be that slow. Maybe you've got an N+1 performance problem. Fixing that will probably give you the performance you want using Entity Framework (or any other standard DAL, for that matter). Focus your efforts on that.
Plain ADO.NET again depends on how you implement it, but performance-wise it should be the best option; it would, however, take longer to develop.
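A minimal sketch of such an insert with a parameterized ADO.NET command (the table, columns, connection string, and payload variable are illustrative assumptions):

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "INSERT INTO CommunicationData (RecordedAt, Payload) VALUES (@recordedAt, @payload)",
    connection))
{
    command.Parameters.AddWithValue("@recordedAt", DateTime.UtcNow);
    command.Parameters.AddWithValue("@payload", payload);
    connection.Open();
    command.ExecuteNonQuery(); // single round trip, no change-tracking overhead
}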
I found this site very helpful: http://ormbattle.net/
BLToolkit seems to be the best free ORM tool performance-wise; it's the first time I've heard of it!