NHibernate Lazy Loading Behaviour

I was reading this article on NHibernate lazy loading, http://nhforge.org/wikis/howtonh/lazy-loading-eager-loading.aspx, which uses an example class structure of an Order containing a collection of OrderLines.
The article then shows the following code:
using (ISession session = SessionFactory.OpenSession())
{
    var fromDb = session.Get<Order>(_order.Id);

    int sum = 0;
    foreach (var line in fromDb.OrderLines)
    {
        // just some dummy code to force loading of the order line
        sum += line.Amount;
    }
}
It then goes on to talk about:
the n+1 select statements problem. If we access the order line items
after loading the order we generate a select statement for each line
item we access.
This is the behaviour I remembered of lazy loading: when I first get an order, the order lines collection is a proxy of a collection of order lines, and as I iterate through the order lines, each one is loaded on demand.
However, this is not the behaviour I am observing. When I try this in my application, the collection of order lines is indeed a proxy when I get the order, but as soon as I access the first OrderLine using:
fromDb.OrderLines.First()
the entire collection is loaded into memory. This is a problem for me, as the collection contains a lot of items and I only want to change one; if I load all the items into memory, change one, and try to save the order, I obviously get very poor performance.
So did the behaviour change since this article was written? Am I simply misunderstanding how lazy loading works? Or is there some way I can configure NHibernate to load only the items from the collection that it needs?

"the n+1 select statements problem. If we access the order line items after loading the order we generate a select statement for each line item we access." is not corect. The Order lines are all loaded together because this is most of the time much more efficient. Select N+1 is mostly code like this:
var orders = session.QueryOver<Order>().List();
var usersWithOrders = orders.Select(o => o.User);
because you have one select for the orders and N selects for the users (in reality only for distinct users, because of the session cache).
If you know you have large collections and only want to process some items, or need Count or Contains, there is <bag lazy="extra"> / HasMany(x => x.Lines).ExtraLazyLoad(), which results in a collection proxy that issues queries for Count, Contains, and this[] instead of loading everything; see the mapping sketch below.
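As a sketch of what the extra-lazy variant looks like in Fluent NHibernate (the Order/OrderLines names come from the question above; the OrderId key column is an assumption):

public class OrderMap : ClassMap<Order>
{
    public OrderMap()
    {
        Id(x => x.Id);
        // Extra-lazy: Count, Contains and the indexer each issue a small
        // query instead of initializing the whole collection.
        HasMany(x => x.OrderLines)
            .KeyColumn("OrderId")
            .ExtraLazyLoad();
    }
}

With this mapping, order.OrderLines.Count issues a select count(*) without initializing the collection.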
Or you can query for the specific lines you want to process, e.g. session.QueryOver<OrderLine>().Where(line => line.Order == order && ...), as sketched below.
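A minimal sketch of that, assuming OrderLine has Order, Id and Amount properties as in the question (lineId is a hypothetical variable):

// Load only the single line we want to modify, not the whole collection
var line = session.QueryOver<OrderLine>()
    .Where(l => l.Order == order && l.Id == lineId)
    .SingleOrDefault();

line.Amount += 1;   // change just this one item
// the change is persisted on the usual session flush/commit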


Symfony2, Doctrine: best solution for add/insert/update with a large number of queries

Let's imagine we have this code:
while (true)
{
    foreach ($array as $row)
    {
        $needPersist = false;
        $item = $em->getRepository('reponame')->findOneBy(array('filter'));
        if (!$item)
        {
            $needPersist = true;
            $item = new Item();
        }
        $item->setItemName(/* ... */);
        // and so on ...
        if ($needPersist)
        {
            $em->persist($item);
        }
    }
    $em->flush();
}
So the point is that this code will be executed many times (as long as the server doesn't die :) ), and we want to optimize it. Every time we:
Select an existing entry from the repository.
If the entry does not exist, create it.
Set new (updated) values on it.
Apply the actions (flush).
So the question is: how do we avoid unnecessary queries and optimize the "check if entry exists" step? When there are 100-500 queries it's not so scary... but when it comes to 1000-10000 per while loop, it's too much.
PS: Each entry in DB is unique by several columns (not only by ID).
Instead of fetching results one by one, load all results with one query.
E.g. let's say your filter wants to load IDs 1, 2 and 10. The query builder would be something like:
$allResults = $em->getRepository('reponame')
    ->createQueryBuilder('o')
    ->where('o.id IN (:ids)')->setParameter('ids', $ids)
    ->getQuery()
    ->getResult();
"foreach" of these results, do your job of updating them and flushing
While doing that loop, save ids of those fetched objects in new array
Compare that array with original one using array_diff. Now you have ids that were not fetched the first time
Rinse and repeat :)
And don't forget $em->clear() to free memory
While this can still be slow when working with 10,000 records (dunno, never tested), it will be much faster to run 2 big queries than 10,000 small ones.
Regardless of whether you need them to persist after the update, retrieving 10k+ entries from the database and hydrating them into PHP objects is going to need too much memory. In such cases you should fall back to the Doctrine DBAL layer and fire pure SQL queries, as sketched below.
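As a sketch, assuming MySQL and hypothetical table/column names ($newName and $filterValue are placeholders; the multi-column unique key from the PS would back the ON DUPLICATE KEY clause):

// Upsert directly through the DBAL connection; no entities are hydrated.
$conn = $em->getConnection();
$conn->executeUpdate(
    'INSERT INTO items (item_name, filter_col) VALUES (:name, :filter)
     ON DUPLICATE KEY UPDATE item_name = VALUES(item_name)',
    array('name' => $newName, 'filter' => $filterValue)
);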

NHibernate QueryOver doesn't get latest database changes

I am trying to get an updated record from the database with QueryOver.
My code initially creates an entity and saves it to the database. The same record is then updated externally (by another program, manually, or by the same program running on another machine). When I call QueryOver filtering by the changed field, the query returns the record, but without the latest changes.
This is my code:
// create the entity and save it in the database
MyEntity myEntity = CreateDummyEntity();
myEntity.Name = "new_name";
MyService.SaveEntity(myEntity);

// now the entity is updated externally, changing the name property to the
// "modified_name" value (for example manually in TOAD, SQL Server, etc.)

// get the entity with QueryOver
var result = NhibernateHelper.Session
    .QueryOver<MyEntity>()
    .Where(param => param.Name == "modified_name")
    .List<MyEntity>();
The previous statement returns a collection with only one record (good), BUT with the name property set to the old value instead of "modified_name".
How can I fix this behaviour? Is the first-level cache getting in my way? The same problem occurs with CreateCriteria<T>().
The session in my NhibernateHelper is never closed, due to application framework requirements; a transaction is only created for each commit associated with a session.Save().
If I open a new session to execute the query I do, of course, get the latest changes from the database, but this approach is not allowed by the design requirements.
I have also checked in the NHibernate SQL output that a SELECT with a WHERE clause is being executed (so NHibernate does hit the database), but it doesn't update the returned object!
UPDATE
Here's the code in SaveEntity after the call to session.Save; a call to the Commit method is made:
public virtual void Commit()
{
    try
    {
        this.session.Flush();
        this.transaction.Commit();
    }
    catch
    {
        this.transaction.Rollback();
        throw;
    }
    finally
    {
        this.transaction = this.session.BeginTransaction();
    }
}
The SQL generated by NHibernate for SaveEntity:
NHibernate: INSERT INTO MYCOMPANY.MYENTITY (NAME) VALUES (:p0);:p0 = 'new_name'.
The SQL generated by NHibernate for QueryOver:
NHibernate: SELECT this_.NAME as NAME26_0_
FROM MYCOMPANY.MYENTITY this_
WHERE this_.NAME = :p0;:p0 = 'modified_name' [Type: String (0)].
The queries have been modified due to company confidentiality policies.
Any help is very much appreciated.
As far as I know, you have several options :
have your session as an IStatelessSession, by calling sessionFactory.OpenStatelessSession() instead of sessionFactory.OpenSession()
perform Session.Evict(myEntity) after persisting an entity in DB
perform Session.Clear() before your QueryOver
set the CacheMode of your Session to Ignore, Put or Refresh before your QueryOver (never tested that)
I guess the choice will depend on how you use your long-running sessions (which, IMHO, seem to bring more problems than solutions). The first option is sketched below.
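A minimal sketch of the stateless-session option, reusing the MyEntity name from the question (the sessionFactory variable is assumed):

// A stateless session has no first-level cache, so results always
// reflect the current database state.
using (IStatelessSession statelessSession = sessionFactory.OpenStatelessSession())
{
    var result = statelessSession
        .QueryOver<MyEntity>()
        .Where(e => e.Name == "modified_name")
        .List<MyEntity>();
}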
Calling session.Save(myEntity) does not cause the changes to be persisted to the DB immediately*. The changes are persisted when session.Flush() is called, either by the framework itself or by yourself. More information about flushing and when it is invoked can be found on this question and in the NHibernate documentation about flushing.
Also, performing a query will not hit the first-level cache. The first-level cache only works with Get and Load, i.e. session.Get<MyEntity>(1) would hit the first-level cache if a MyEntity with an id of 1 had already been loaded, whereas session.QueryOver<MyEntity>().Where(x => x.Id == 1) would not.
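A tiny illustration of the difference (a sketch; the Id property name is assumed):

var a = session.Get<MyEntity>(1);      // first call: SELECT issued, entity cached
var b = session.Get<MyEntity>(1);      // no SQL: served from the first-level cache
var c = session.QueryOver<MyEntity>()  // always issues a SELECT...
    .Where(x => x.Id == 1)
    .SingleOrDefault();                // ...but resolves to the already-cached instance

Note the last point: the query hits the database, yet the row is resolved to the instance already loaded in the session, which is exactly why stale values can come back.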
Further information about NHibernate's caching functionality can be found in this post by Ayende Rahien.
In summary you have two options:
Use a transaction within the SaveEntity method, i.e.
using (var transaction = Helper.Session.BeginTransaction())
{
    Helper.Session.Save(myEntity);
    transaction.Commit();
}
Call session.Flush() within the SaveEntity method, i.e.
Helper.Session.Save(myEntity);
Helper.Session.Flush();
The first option is the best in pretty much all scenarios.
*The only exception I know to this rule is when using Identity as the id generator type.
Try changing your last query to:
var result = NhibernateHelper.Session
    .QueryOver<MyEntity>()
    .CacheMode(CacheMode.Refresh)
    .Where(param => param.Name == "modified_name")
    .List<MyEntity>();
If that still doesn't work, try adding this after the query:
NhibernateHelper.Session.Refresh(result);
After much searching and thinking... I've found the solution.
The fix consists of opening a new session and calling QueryOver<T>() in that session; the data is then successfully refreshed. If you get child collections that are not initialized, you can call NHibernateUtil.Initialize(entity) or set lazy="false" in your mappings. Take special care with lazy="false" on large collections, because you can get poor performance. To fix that performance problem when loading large collections, set lazy="true" in your collection mappings and call the mentioned NHibernateUtil.Initialize(collection) on the affected collection to get the child records from the database; for example, you can get all records from a table and, if you need access to all child records of a specific entity, call NHibernateUtil.Initialize(collection) only for the objects you are interested in.
Note: as @martin ernst says, the update problem may be a bug in NHibernate, and my solution is only a temporary fix; it should be solved in NHibernate itself.
People here do not want to call Session.Clear(), since it is too strong.
On the other hand, Session.Evict() may seem inapplicable when the objects are not known beforehand.
Actually, it is still usable.
First retrieve the cached objects using the query, call Evict() on each of them, and then retrieve fresh objects by running the same query again.
This approach is slightly inefficient if the object was not cached to begin with, since there are then two "fresh" queries, but there seems to be not much to do about that shortcoming...
By the way, Evict() accepts a null argument too, without throwing; this is useful in case the queried object is actually not present in the DB.
var cachedObjects = NhibernateHelper.Session
    .QueryOver<MyEntity>()
    .Where(param => param.Name == "modified_name")
    .List<MyEntity>();

foreach (var obj in cachedObjects)
    NhibernateHelper.Session.Evict(obj);

var freshObjects = NhibernateHelper.Session
    .QueryOver<MyEntity>()
    .Where(param => param.Name == "modified_name")
    .List<MyEntity>();
I'm getting something very similar and have tried debugging NHibernate.
In my scenario, the session creates an object with a couple of children in a related collection (cascade: all), and then calls ISession.Flush().
The records are written to the DB, and the session needs to continue without closing. Meanwhile, another two child records are written to the DB and committed.
When the original session then attempts to re-load the graph using QueryOver with JoinAlias, the generated SQL statement looks perfectly fine, and the rows are returned correctly. However, the collection that should receive the new children is found to have already been initialized within the session (as it should be), and based on that, NH decides for some reason to completely ignore the respective rows.
I think NH makes an invalid assumption here that if the collection is already marked "Initialized" it does not need to be re-loaded from the query.
It would be great if someone more familiar with NHibernate internals could chime in on this.

How can I speed up my Entity Framework code?

My SQL and Entity Framework knowledge is somewhat limited. In one Entity Framework (4) application, I notice it takes forever (about 2 minutes) to complete one of my method calls. The first queries do not take much time, but when I loop through the Entity Framework objects returned by the queries, it takes forever to complete the nested loops, even though I am only reading (not modifying) the data I supposedly already have, and even though there are only dozens of entries in each list and a few levels of looping.
I expect the example below could be re-written with a fancier query that could probably include all of the filtering I am doing in my loops with some SQL words I don't really know how to use, so if someone could show me what the equivalent SQL expression would be, that would be extremely educational to me and probably solve my current performance problem.
Moreover, since other parts of this and other applications I develop often want to do more complex computations on SQL data, I would also like to know a good way to retrieve data from Entity Framework into local memory objects that do not have huge delays when read. In an earlier LINQ-to-SQL project there was a similar performance problem, which I solved by refactoring the whole application to load all the SQL data into parallel objects in RAM, which I had to write myself. I wonder whether there is a better way to either tell Entity Framework to stop doing whatever high-latency communication it is doing, or to load the data into local RAM objects.
In the example below, the code gets a list of food menu items for a member (i.e. a person) on a certain date via a SQL query, and then I use other queries and loops to filter out the menu items on two criteria: 1) if the member has a rating of zero for any group ID the recipe is a member of (a many-to-many relationship), and 2) if the member has a rating of zero for the recipe itself.
Example:
List<PFW_Member_MenuItem> MemberMenuForCookDate =
    (from item in _myPfwEntities.PFW_Member_MenuItem
     where item.MemberID == forMemberId
     where item.CookDate == onCookDate
     select item).ToList();

// Now filter out recipes in recipe groups rated zero by the member:
List<PFW_Member_Rating_RecipeGroup> ExcludedGroups =
    (from grpRating in _myPfwEntities.PFW_Member_Rating_RecipeGroup
     where grpRating.MemberID == forMemberId
     where grpRating.Rating == 0
     select grpRating).ToList();

foreach (PFW_Member_Rating_RecipeGroup grpToExclude in ExcludedGroups)
{
    List<PFW_Member_MenuItem> rcpsToRemove = new List<PFW_Member_MenuItem>();
    foreach (PFW_Member_MenuItem rcpOnMenu in MemberMenuForCookDate)
    {
        PFW_Recipe rcp = GetRecipeById(rcpOnMenu.RecipeID);
        foreach (PFW_RecipeGroup group in rcp.PFW_RecipeGroup)
        {
            if (group.RecipeGroupID == grpToExclude.RecipeGroupID)
            {
                rcpsToRemove.Add(rcpOnMenu);
                break;
            }
        }
    }
    foreach (PFW_Member_MenuItem rcpToRemove in rcpsToRemove)
        MemberMenuForCookDate.Remove(rcpToRemove);
}

// Now filter out recipes rated zero by the member:
List<PFW_Member_Rating_Recipe> ExcludedRecipes =
    (from rcpRating in _myPfwEntities.PFW_Member_Rating_Recipe
     where rcpRating.MemberID == forMemberId
     where rcpRating.Rating == 0
     select rcpRating).ToList();

foreach (PFW_Member_Rating_Recipe rcpToExclude in ExcludedRecipes)
{
    List<PFW_Member_MenuItem> rcpsToRemove = new List<PFW_Member_MenuItem>();
    foreach (PFW_Member_MenuItem rcpOnMenu in MemberMenuForCookDate)
    {
        if (rcpOnMenu.RecipeID == rcpToExclude.RecipeID)
            rcpsToRemove.Add(rcpOnMenu);
    }
    foreach (PFW_Member_MenuItem rcpToRemove in rcpsToRemove)
        MemberMenuForCookDate.Remove(rcpToRemove);
}
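For reference, both zero-rating filters can plausibly be folded into one LINQ-to-Entities query so the filtering happens in SQL instead of nested loops. This is only a sketch using the entity names above; it assumes PFW_Member_MenuItem has a PFW_Recipe navigation property (the original code goes through GetRecipeById instead):

// Subqueries of excluded ids; EF translates Contains into SQL IN / EXISTS
var excludedGroupIds =
    from grpRating in _myPfwEntities.PFW_Member_Rating_RecipeGroup
    where grpRating.MemberID == forMemberId && grpRating.Rating == 0
    select grpRating.RecipeGroupID;

var excludedRecipeIds =
    from rcpRating in _myPfwEntities.PFW_Member_Rating_Recipe
    where rcpRating.MemberID == forMemberId && rcpRating.Rating == 0
    select rcpRating.RecipeID;

List<PFW_Member_MenuItem> menu =
    (from item in _myPfwEntities.PFW_Member_MenuItem
     where item.MemberID == forMemberId
           && item.CookDate == onCookDate
           && !excludedRecipeIds.Contains(item.RecipeID)
           && !item.PFW_Recipe.PFW_RecipeGroup
                  .Any(g => excludedGroupIds.Contains(g.RecipeGroupID))
     select item).ToList();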
You can use EFProf, http://www.hibernatingrhinos.com/products/EFProf, to see exactly what EF is sending to SQL. It can also show you how many queries you are sending and how many of them are unique, and it provides some analysis of each query (e.g. whether it is unbounded, etc.). With Entity Framework's navigation properties it is quite easy not to realize you are making a DB request; when you are in a loop and touch a navigation property, you run into the N+1 problem.
If you are using Code First, you could use the virtual keyword on the collection properties of your model to enable proxying; that way you will not have to get all the data back at once, only as you need it (see the sketch below).
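A minimal sketch of that, with hypothetical class names:

public class Recipe
{
    public int RecipeID { get; set; }
    // virtual lets EF create a lazy-loading proxy for this collection
    public virtual ICollection<RecipeGroup> RecipeGroups { get; set; }
}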
Also consider NoTracking for read-only data:
context.bigTable.MergeOption = MergeOption.NoTracking;

Fluent NHibernate selective loading for collections

I was just wondering whether, when loading an entity which contains a collection (e.g. a Post which may contain 0..n Comments), you can define how many comments to return.
At the moment I have this:
public IList<Post> GetNPostsWithNCommentsAndCreator(int numOfPosts, int numOfComments)
{
    var posts = Session.Query<Post>()
        .OrderByDescending(x => x.CreationDateTime)
        .Take(numOfPosts)
        .Fetch(z => z.Comments)
        .Fetch(z => z.Creator)
        .ToList();

    ReleaseCurrentSession();
    return posts;
}
Is there a way of adding a Skip and Take to Comments to allow a kind of paging functionality on the collection, so you don't end up loading lots of things you don't need?
I'm aware of lazy loading, but I don't really want to use it: I'm using the MVC pattern and want my objects to come back from the repositories already loaded so I can cache them. I don't really want my views causing select statements.
Is the only real way around this not to perform a fetch on Comments, but instead to perform a separate select on Comments, order by created date-time, select the top 5 (for example), and then place the returned result into the Post object?
Any thoughts / links on this would be appreciated.
Thanks,
Jon
A fetch simply does a left outer join on the associated table so that it can hydrate the collection entities with data. What you are looking to do will require a separate query on the specific entities; from there you can use any number of constructs to limit your result set (Skip/Take, SetMaxResults, etc.), as sketched below.
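A minimal sketch of that separate query, assuming a Comment entity with Post and CreationDateTime properties:

// Page the comments with a separate query instead of fetching them all
var topComments = Session.Query<Comment>()
    .Where(c => c.Post == post)
    .OrderByDescending(c => c.CreationDateTime)
    .Skip(0)      // page offset
    .Take(5)      // page size
    .ToList();
// then attach topComments to the Post (or a view model) before caching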

EF: How to do effective lazy-loading (not 1+N selects)?

Starting with a list of entities and needing all dependent entities through an association, is there a way to use the corresponding navigation property to load all child entities with one DB round-trip? I.e., generate a single WHERE fkId IN (...) statement via a navigation property?
More details
I've found these ways to load the children:
Keep the set of parent entities as IQueryable<T>
Not good, since the DB will have to find the main set every time and join to get the requested data.
Put the parent objects into an array or list, then get the related data through navigation properties.
var children = parentArray.Select(p => p.Children).Distinct();
This is slow, since it will generate a select for every parent entity.
It creates duplicate objects, since each set of children is created independently.
Put the foreign keys from the main entities into an array, then filter the entire dependent ObjectSet
var foreignKeyIds = parentArray.Select(p => p.Id).ToArray();
var children = Children.Where(d => foreignKeyIds.Contains(d.Id));
LINQ then generates the desired "WHERE foreignKeyId IN (...)" clause.
This is fast, but only possible for 1:* relations, since linking tables are mapped away.
It removes the readability advantage of EF by using IDs after all.
The navigation properties of type EntityCollection<T> are not populated.
Eager loading through the .Include() methods, included for completeness (the question asks about lazy loading).
Allegedly joins everything included together and returns one giant flat result.
You have to decide up front which data to use.
Is there some way to get the simplicity of 2 with the performance of 3?
You could attach the parent object to your context and get the children when needed.
foreach (T parent in parents)
{
    _context.Attach(parent);
}
var children = parents.Select(p => p.Children);
Edit: for attaching multiple, just iterate.
I think finding a good answer is not possible, or at least not worth the trouble. Instead, a micro-ORM like Dapper gives the big benefit of removing the need to map between SQL columns and object properties, and does so without the need to create a model first. Also, one simply writes the desired SQL instead of working out which LINQ will generate it. IQueryable<T> will be missed, though. A sketch of the IN query with Dapper follows.
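A minimal sketch of the single-round-trip child load in Dapper (the connection variable, the Child class, and the table/column names are hypothetical; Dapper expands a collection parameter used with IN into the id list):

using Dapper;

// One query: WHERE ParentId IN (...) for all parents at once
var parentIds = parentArray.Select(p => p.Id).ToArray();
var children = connection.Query<Child>(
    "SELECT * FROM Children WHERE ParentId IN @ParentIds",
    new { ParentIds = parentIds }).ToList();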