How can I speed up my Entity Framework code?

My SQL and Entity Framework knowledge is somewhat limited. In one Entity Framework (4) application, I notice it takes forever (about two minutes) to complete one of my method calls. The initial queries do not take much time, but when I loop through the Entity Framework objects returned by those queries, it takes forever to finish the nested loops, even though I am only reading (not modifying) the data I supposedly already have, there are only dozens of entries in each list, and there are only a few levels of looping.
I expect the example below could be rewritten as a single, fancier query that includes all of the filtering I currently do in my loops, using SQL constructs I don't really know how to use. If someone could show me what the equivalent SQL expression would be, that would be extremely educational and would probably solve my current performance problem.
Moreover, since other parts of this and other applications I develop often need to do more complex computations on SQL data, I would also like to know a good way to retrieve data from Entity Framework into local in-memory objects that do not have huge delays when read. In my LINQ to SQL project there was a similar performance problem, and I solved it by refactoring the whole application to load all the SQL data into parallel objects in RAM, which I had to write myself. I wonder whether there isn't a better way to either tell Entity Framework to stop doing whatever high-latency communication it is doing, or to load the data into local RAM objects.
In the example below, the code gets a list of food menu items for a member (i.e. a person) on a certain date via a SQL query, and then I use further queries and loops to filter out menu items on two criteria: 1) the member has a rating of zero for any group ID the recipe belongs to (a many-to-many relationship), and 2) the member has a rating of zero for the recipe itself.
Example:
List<PFW_Member_MenuItem> MemberMenuForCookDate =
    (from item in _myPfwEntities.PFW_Member_MenuItem
     where item.MemberID == forMemberId
     where item.CookDate == onCookDate
     select item).ToList();

// Now filter out recipes in recipe groups rated zero by the member:
List<PFW_Member_Rating_RecipeGroup> ExcludedGroups =
    (from grpRating in _myPfwEntities.PFW_Member_Rating_RecipeGroup
     where grpRating.MemberID == forMemberId
     where grpRating.Rating == 0
     select grpRating).ToList();

foreach (PFW_Member_Rating_RecipeGroup grpToExclude in ExcludedGroups)
{
    List<PFW_Member_MenuItem> rcpsToRemove = new List<PFW_Member_MenuItem>();
    foreach (PFW_Member_MenuItem rcpOnMenu in MemberMenuForCookDate)
    {
        PFW_Recipe rcp = GetRecipeById(rcpOnMenu.RecipeID);
        foreach (PFW_RecipeGroup group in rcp.PFW_RecipeGroup)
        {
            if (group.RecipeGroupID == grpToExclude.RecipeGroupID)
            {
                rcpsToRemove.Add(rcpOnMenu);
                break;
            }
        }
    }
    foreach (PFW_Member_MenuItem rcpToRemove in rcpsToRemove)
        MemberMenuForCookDate.Remove(rcpToRemove);
}

// Now filter out recipes rated zero by the member:
List<PFW_Member_Rating_Recipe> ExcludedRecipes =
    (from rcpRating in _myPfwEntities.PFW_Member_Rating_Recipe
     where rcpRating.MemberID == forMemberId
     where rcpRating.Rating == 0
     select rcpRating).ToList();

foreach (PFW_Member_Rating_Recipe rcpToExclude in ExcludedRecipes)
{
    List<PFW_Member_MenuItem> rcpsToRemove = new List<PFW_Member_MenuItem>();
    foreach (PFW_Member_MenuItem rcpOnMenu in MemberMenuForCookDate)
    {
        if (rcpOnMenu.RecipeID == rcpToExclude.RecipeID)
            rcpsToRemove.Add(rcpOnMenu);
    }
    foreach (PFW_Member_MenuItem rcpToRemove in rcpsToRemove)
        MemberMenuForCookDate.Remove(rcpToRemove);
}

You can use EFProf (http://www.hibernatingrhinos.com/products/EFProf) to see exactly what EF is sending to SQL Server. It can also show you how many queries you are sending and how many of them are unique, and it provides some analysis of each query (e.g. whether it is unbounded). With Entity Framework's navigation properties it is quite easy not to realise you are making a database request; when you use a navigation property inside a loop, you run into the N+1 problem.
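In the code from the question, GetRecipeById and the rcp.PFW_RecipeGroup navigation property are both likely to hit the database inside the loops, which is exactly that N+1 pattern. A minimal sketch of eager loading instead, assuming PFW_Member_MenuItem has a PFW_Recipe navigation property and that the path names match your model (adjust them if not):

// one SQL statement loads the menu items, their recipes and the recipe groups
var menuWithRecipes =
    (from item in _myPfwEntities.PFW_Member_MenuItem
                      .Include("PFW_Recipe.PFW_RecipeGroup")
     where item.MemberID == forMemberId
     where item.CookDate == onCookDate
     select item).ToList();
// the in-memory filtering loops can now run without further round trips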

You could use the virtual keyword on the collection properties of your model if you are using Code First; that enables proxying, so you will not have to get all the data back at once, only as you need it.
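A minimal sketch of what that looks like in a Code First class, reusing the entity names from the question as an assumption (proxy creation and lazy loading are on by default for DbContext):

public class PFW_Recipe
{
    public int RecipeID { get; set; }

    // "virtual" lets EF create a proxy that loads the groups
    // lazily on first access instead of all up front
    public virtual ICollection<PFW_RecipeGroup> PFW_RecipeGroup { get; set; }
}

Bear in mind that lazily loading inside a loop is what produces the N+1 pattern, so prefer eager loading where you know you will touch every child.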
Also consider NoTracking for read-only data:
context.bigTable.MergeOption = MergeOption.NoTracking;
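For the concrete case in the question, the whole filter can also be pushed into a single LINQ to Entities query so the database does the work in one round trip. This is only a sketch: it assumes PFW_Member_Rating_Recipe has a RecipeID column and that PFW_Recipe.PFW_RecipeGroup is the many-to-many navigation used in the question. EF should translate the Any calls into NOT EXISTS subqueries:

List<PFW_Member_MenuItem> menu =
    (from item in _myPfwEntities.PFW_Member_MenuItem
     where item.MemberID == forMemberId
           && item.CookDate == onCookDate
           // exclude recipes the member rated zero
           && !_myPfwEntities.PFW_Member_Rating_Recipe.Any(r =>
                  r.MemberID == forMemberId
                  && r.Rating == 0
                  && r.RecipeID == item.RecipeID)
           // exclude recipes in any group the member rated zero
           && !_myPfwEntities.PFW_Recipe
                  .Where(rcp => rcp.RecipeID == item.RecipeID)
                  .SelectMany(rcp => rcp.PFW_RecipeGroup)
                  .Any(g => _myPfwEntities.PFW_Member_Rating_RecipeGroup.Any(gr =>
                         gr.MemberID == forMemberId
                         && gr.Rating == 0
                         && gr.RecipeGroupID == g.RecipeGroupID))
     select item).ToList();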

Related

Should I use unique tables for every user?

I'm working on a web app that collects traffic information for websites that use my service. Think Google Analytics, but far more visual. I'm using SQL Server 2012 for the backbone of my app and am considering using MongoDB for the data-gathering/analytics side of the site.
If I have 100 users with an average of 20,000 hits a month on their site, that's 2,000,000 records in a single collection that will be getting queried.
Should I use MongoDB to store this information (I'm new to it and new things are intimidating)?
Should I dynamically create new collections/tables for every new user?
Thanks!
With MongoDB the collection (the equivalent of a SQL table) can get quite big without much issue; that is largely what it is designed for ("Mongo" comes from "humongous"). This is a great use for MongoDB, which is very good at storing point-in-time information.
Options:
1. A new collection for each client
This is very easy to do; I use a GetCollectionSafe method for it:
public class MongoStuff
{
    private static MongoDatabase GetDatabase()
    {
        var databaseName = "dbName";
        var connectionString = "connStr";
        var client = new MongoClient(connectionString);
        var server = client.GetServer();
        return server.GetDatabase(databaseName);
    }

    public static MongoCollection<T> GetCollection<T>(string collectionName)
    {
        return GetDatabase().GetCollection<T>(collectionName);
    }

    public static MongoCollection<T> GetCollectionSafe<T>(string collectionName)
    {
        var db = GetDatabase();
        if (!db.CollectionExists(collectionName))
        {
            db.CreateCollection(collectionName);
        }
        return db.GetCollection<T>(collectionName);
    }
}
Then you can call it with:
var collection = MongoStuff.GetCollectionSafe<Record>("ClientName");
Running this script
static void Main(string[] args)
{
    var times = new List<long>();
    for (int i = 0; i < 1000; i++)
    {
        Stopwatch watch = new Stopwatch();
        watch.Start();
        MongoStuff.GetCollectionSafe<Person>(String.Format("Mark{0:000}", i));
        watch.Stop();
        Console.WriteLine(watch.ElapsedMilliseconds);
        times.Add(watch.ElapsedMilliseconds);
    }
    Console.WriteLine(String.Format("Max : {0} \nMin : {1} \nAvg : {2}",
        times.Max(f => f), times.Min(f => f), times.Average(f => f)));
    Console.ReadKey();
}
Gave me (on my laptop)
Max : 180
Min : 1
Avg : 6.635
Benefits:
Easy to split off the data if one client needs to go their own way
Might match your mental map of the problem
Cons:
Almost impossible to aggregate data across all collections
Hard to find collections in management tools (like Robomongo)
2. One large collection
Use one collection for everything and access it this way:
var coll = MongoStuff.GetCollection<Record>("Records");
Put an index on the collection (the index will make reads orders of magnitude quicker):
coll.EnsureIndex(new IndexKeysBuilder().Ascending("ClientId"));
This only needs to be run once (per collection, per index).
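A small sketch of a read that uses that index, assuming Record has a ClientId field (Query comes from MongoDB.Driver.Builders in the same 1.x driver used above):

var clientId = 42; // illustrative
var clientRecords = coll.Find(Query.EQ("ClientId", clientId)).ToList();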
Benefits:
One simple place to find data
Aggregating over all clients is possible
A more traditional MongoDB setup
Cons:
All clients' data is intermingled
May not map as well to how you think about the problem
Just for reference, the MongoDB size limits are here:
http://docs.mongodb.org/manual/reference/limits/
3. Store only aggregated data
If you never intend to drill down to an individual record, just save the aggregates themselves.
Page loads:
#     Page            Total Time    Average Time
15    Default.html    1545          103
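A rough sketch of maintaining such an aggregate with the same 1.x driver; the collection name and field names here are made up for illustration:

var aggregates = MongoStuff.GetCollection<BsonDocument>("PageAggregates");
long elapsedMilliseconds = 103; // measured load time (illustrative)
aggregates.Update(
    Query.And(Query.EQ("ClientId", 42), Query.EQ("Page", "Default.html")),
    Update.Inc("Count", 1).Inc("TotalTime", elapsedMilliseconds),
    UpdateFlags.Upsert); // creates the aggregate document on the first hit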
I will let someone else tackle the MongoDB side of your question, as I don't feel I'm the best person to comment on it, but I would point out that MongoDB is a very different animal and you'll lose a lot of the referential integrity you enjoy in SQL.
In terms of SQL design, I would not use a different schema for each customer. Your database schema and backups could grow uncontrollably, and maintaining a dynamically growing schema will be a nightmare.
I would suggest one of two approaches:
Either you can create a new database for each customer:
This is more secure, as users cannot access each other's data (just use different credentials), and users are easier to manage, migrate and separate.
However, many hosting providers charge per database, it will cost more to run and maintain, and should you wish to compare data across users it becomes much more challenging.
The second approach is simply to host all users in a single database. Your tables will grow large (although 2 million rows is not over the top for a well-maintained SQL database), and you would simply use a UserID column to discriminate.
The emphasis will be on you to get the performance you need through proper indexing.
Users' data will live in the same system, and there is no SQL defence against users accessing each other's data, so your code will have to be good!
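A tiny sketch of the discriminator approach; the Hits DbSet, column name and EF usage here are illustrative assumptions rather than anything from the question:

// every query is scoped to the calling user via the indexed UserID column
var userHits = db.Hits
    .Where(h => h.UserID == currentUserId)
    .ToList();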

Symfony2, Doctrine: best solution for add/insert/update with a large number of queries

Let's imagine we have this code:
while (true)
{
    foreach ($array as $row)
    {
        $needPersist = false;
        $item = $em->getRepository('reponame')->findOneBy(array('filter'));
        if (!$item)
        {
            $needPersist = true;
            $item = new Item();
        }
        $item->setItemName();
        // and so on ...
        if ($needPersist)
        {
            $em->persist($item);
        }
    }
    $em->flush();
}
So, the point is that this code will be executed many times (for as long as the server doesn't die :) ), and we want to optimize it. Every time we:
Select an existing entry from the repository.
If the entry does not exist, create it.
Set new (updated) values on it.
Apply the actions (flush).
So the question is: how do we avoid unnecessary queries and optimize the "check whether the entry exists" step? When there are 100-500 queries it's not so scary... but when it gets up to 1000-10000 per while loop, it's too much.
PS: Each entry in the DB is unique across several columns (not only by ID).
Instead of fetching results one by one, load all the results with one query.
E.g. let's say your filter wants to load IDs 1, 2 and 10. The query builder would be something like:
// assuming 'o' is the alias used in the repository's query builder
$allResults = $em->getRepository('reponame')->createQueryBuilder('o')
    ->where('o.id IN (:ids)')->setParameter('ids', $ids)
    ->getQuery()
    ->getResult();
Foreach over these results, do your job of updating them, and flush.
While doing that loop, save the IDs of the fetched objects in a new array.
Compare that array with the original one using array_diff; now you have the IDs that were not fetched the first time.
Rinse and repeat :)
And don't forget $em->clear() to free memory.
While this can still be slow when working with 10,000 records (I haven't tested it), it is much faster to run 2 big queries than 10,000 small ones.
Regardless of whether you need them to persist after the update, retrieving 10k+ entries from the database and hydrating them into PHP objects is going to need too much memory. In such cases you are better off falling back to the Doctrine DBAL layer and firing pure SQL queries.

Fluent NHibernate: selective loading for collections

I was just wondering whether, when loading an entity which contains a collection (e.g. a Post which may contain 0 to n Comments), you can define how many comments to return.
At the moment I have this:
public IList<Post> GetNPostsWithNCommentsAndCreator(int numOfPosts, int numOfComments)
{
    var posts = Session.Query<Post>()
        .OrderByDescending(x => x.CreationDateTime)
        .Take(numOfPosts)
        .Fetch(z => z.Comments)
        .Fetch(z => z.Creator)
        .ToList();
    ReleaseCurrentSession();
    return posts;
}
Is there a way of adding a Skip and Take to Comments to allow a kind of paging functionality on the collection, so you don't end up loading lots of things you don't need?
I'm aware of lazy loading, but I don't really want to use it: I'm using the MVC pattern and want my objects to come back from the repositories fully loaded so I can then cache them. I don't really want my views causing select statements.
Is the only real way around this to not fetch Comments at all, but instead to perform a separate select on Comments ordered by creation date/time, take the top 5 (for example), and then place the returned result into the Post object?
Any thoughts / links on this would be appreciated.
Thanks,
Jon
A fetch simply does a left outer join on the associated table so that it can hydrate the collection entities with data. What you are looking to do will require a separate query on the specific entities. From there you can use any number of constructs to limit your result set (Skip/Take, SetMaxResults, etc.).
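A rough sketch of that separate query, assuming Comment has Post and CreationDateTime properties (the names and the paging variables are placeholders borrowed from the question, so adjust them to your mappings):

// one page of comments for a single post, newest first
var comments = Session.Query<Comment>()
    .Where(c => c.Post.Id == post.Id)
    .OrderByDescending(c => c.CreationDateTime)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();

You can then attach that list to the Post object you cache instead of relying on the mapped collection.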

How do I wrap an EF 4.1 DbContext in a repository?

All,
I have a requirement to hide my EF implementation behind a repository. My simple question: is there a way to execute a 'find' across both a DbSet AND the DbSet.Local without having to deal with them both separately?
For example, I have a standard repository implementation with Add/Update/Remove/FindById. I break the generic pattern by adding a FindByName method (for demo purposes only :). This gives me the following code:
Client App:
ProductCategoryRepository categoryRepository = new ProductCategoryRepository();
categoryRepository.Add(new ProductCategory { Name = "N" });
var category1 = categoryRepository.FindByName("N");
Implementation
public ProductCategory FindByName(string s)
{
    // Assume name is unique for demo
    return _legoContext.Categories.Where(c => c.Name == s).SingleOrDefault();
}
In this example, category1 is null.
However, if I implement the FindByName method as:
public ProductCategory FindByName(string s)
{
    var t = _legoContext.Categories.Local.Where(c => c.Name == s).SingleOrDefault();
    if (t == null)
    {
        t = _legoContext.Categories.Where(c => c.Name == s).SingleOrDefault();
    }
    return t;
}
In this case, I get what I expect when querying against both a new entry and one that is only in the database. But this presents a few issues that I am confused over:
1) I would assume (as a user of the repository) that cat2 below is not found. But it is found, and the great part is that cat2.Name is "Goober".
ProductCategoryRepository categoryRepository = new ProductCategoryRepository();
var cat = categoryRepository.FindByName("Technic");
cat.Name = "Goober";
var cat2 = categoryRepository.FindByName("Technic");
2) I would like to return a generic IQueryable from my repository.
It just seems like a lot of work to wrap the calls to the DbSet in a repository. Typically, this means that I've screwed something up. I'd appreciate any insight.
With older versions of EF you had very complicated situations that could arise quite quickly due to the required references. In this version I would recommend not exposing IQueryable, but ICollection or IList instead. This will contain EF in your repository and create a good separation.
Edit: furthermore, by sending back ICollection, IEnumerable, or IList you are restraining and controlling the queries sent to the database. This will also allow you to fine-tune and maintain the system with greater ease. By exposing IQueryable you expose yourself to the side effects that occur when people add more to the query (.Take(), .Where(), .SelectMany(), and so on); EF will see these additions and generate SQL to reflect these uncontrolled queries. Not confining the queries can result in queries being executed from the UI, and leads to more complicated tests and maintenance issues in the long run.
Since the point of the repository pattern is to be able to swap repositories out at will, the details of DbSets should be completely hidden.
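A minimal sketch of that suggestion, reusing the names from the question (illustrative only, not the poster's actual repository):

public IList<ProductCategory> FindAllByName(string s)
{
    // materialise inside the repository so neither IQueryable nor EF leaks out
    return _legoContext.Categories
        .Where(c => c.Name == s)
        .ToList();
}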
I think that you're on a good path. The only thing I would probably ask myself is:
Is the context long-lived? If not, then do not worry about querying Local; an object that has been inserted or deleted should only be accessible once it has been committed.
If this is a long-lived context and you need access to deleted and inserted objects, then querying Local is a good idea, but as you've pointed out you may run into difficulties at some point.

EF: How to do effective lazy-loading (not 1+N selects)?

Starting with a list of entities and needing all dependent entities through an association, is there a way to use the corresponding navigation property to load all child entities with one database round trip? I.e. generate a single WHERE fkId IN (...) statement via a navigation property?
More details
I've found these ways to load the children:
1. Keep the set of parent entities as IQueryable<T>
Not good, since the DB will have to find the main set every time and join to it to get the requested data.
2. Put the parent objects into an array or list, then get the related data through navigation properties.
var children = parentArray.Select(p => p.Children).Distinct()
This is slow, since it generates a select for every parent entity.
It also creates duplicate objects, since each set of children is created independently.
3. Put the foreign keys from the parent entities into an array, then filter the entire dependent ObjectSet.
var foreignKeyIds = parentArray.Select(p => p.Id).ToArray();
var children = Children.Where(d => foreignKeyIds.Contains(d.Id))
LINQ then generates the desired "WHERE foreignKeyId IN (...)" clause.
This is fast, but only possible for 1:* relations, since linking tables are mapped away.
It removes the readability advantage of EF by falling back to IDs.
The navigation properties of type EntityCollection<T> are not populated.
4. Eager loading through the .Include() methods, included for completeness (the question is about lazy loading).
Allegedly joins everything included together and returns one giant flat result.
You have to decide up front which data to use.
Is there some way to get the simplicity of 2 with the performance of 3?
You could attach the parent objects to your context and get the children when needed.
foreach (T parent in parents)
{
    _context.Attach(parent);
}
var children = parents.Select(p => p.Children);
Edit: for attaching multiple, just iterate.
I think finding a good answer is not possible, or at least not worth the trouble. Instead, a micro-ORM like Dapper gives the big benefit of removing the need to map between SQL columns and object properties, and it does so without requiring a model first. Also, you simply write the desired SQL instead of working out what LINQ to write to have it generated. IQueryable<T> will be missed, though.
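As a short sketch of what option 3's single IN query looks like with Dapper (the table, column and class names are assumptions for illustration; Dapper expands the list parameter into the IN clause):

using Dapper;
using System.Data.SqlClient;

var parentIds = parents.Select(p => p.Id).ToArray();
using (var connection = new SqlConnection(connectionString))
{
    // one round trip: WHERE ParentId IN (@ParentIds1, @ParentIds2, ...)
    var children = connection.Query<Child>(
        "SELECT * FROM Children WHERE ParentId IN @ParentIds",
        new { ParentIds = parentIds }).ToList();
}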