I made a lot of examples to check when a bag collection is recreated while adding or removing an item from the collection. I read the following in http://knol.google.com/k/nhibernate-chapter-16-improving-performance, section 16.5.1. Taxonomy:
Bags are the worst case. Since a bag permits duplicate element values and has no index column, no primary key may be defined. NHibernate has no way of distinguishing between duplicate rows. NHibernate resolves this problem by completely removing (in a single DELETE) and recreating the collection whenever it changes. This might be very inefficient.
I made a bidirectional one-to-many mapping (Person -> Addresses) and ran the following tests:
Test 1: Inverse = false; actions = insert, update, remove, count; collection types: Set, Bag
Result: The collections behave exactly the same!
Test 2: Inverse = true; actions = insert, update, remove, count; collection types: Set, Bag
Result: The collections behave almost the same! The only difference I see is when adding a new item to the bag collection: when I do that, the collection is not filled with data from the database.
I was using NHibernate Profiler and the session statistics to analyze the changes in the session object and in the database, but I did not see the collection being recreated anywhere. When does it happen? In memory?
Recreating collections applies only to entities loaded from the database. When the tests run in the same session in which the entities were created, NHibernate knows that the collections started out empty, manipulates them in memory, and saves only the final state to the database on transaction commit/session flush.
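To see the recreation described in the quote, the entity has to come from the database first. Here is a minimal sketch (it assumes the non-inverse Person -> Addresses bag from Test 1; factory and personId are placeholders):

using (var session = factory.OpenSession())
using (var tx = session.BeginTransaction())
{
    // person comes from the database, so NHibernate cannot tell
    // duplicate bag rows apart and falls back to recreating the bag
    var person = session.Get<Person>(personId);
    person.Addresses.RemoveAt(0);
    tx.Commit(); // flush: the whole bag is cleared in one statement and re-created
}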
I've done similar tests - see this blog entry for an example of re-creating a bag collection.
I'm relatively new to using NHibernate and I'm running into a shortcoming I can't seem to work around. I have an object tree that I wish to retrieve from the database in a single roundtrip, but I end up with a cartesian product.
The objects I'm trying to retrieve are called 'AccountGroup', 'Concern', 'Advertiser' and 'Product', and I only wish to get those objects for which the active user has permissions.
My initial query looked like this:
using (var session = OpenSession())
{
    return session.Query<AccountGroupEntity>()
        .FetchMany(a => a.Planners)
        .Where(a => a.Planners.Any(p => p.Id == userId))
        .FetchMany(a => a.Concerns)
        .ThenFetchMany(c => c.Advertisers)
        .ThenFetch(a => a.Products)
        .ToList();
}
This won't work as it will return a cartesian product and the resulting entities will contain many duplicates.
However, I have NO idea how to fix this. I've seen the ToFuture() method that will allow me to execute more than one query in the same roundtrip, but I have no clue how to configure my ToFuture() query in such a way that it populates all the child collections properly.
Could anyone shine some light on how I can use ToFuture to fetch the entire tree in a single query without duplicates?
I do have an answer to this topic, a solution which I use myself. But in the end it means "do not use Fetch" - do it differently. So please, take it at least as a suggestion.
Check this Q & A:
How to Eager Load Associations without duplication in NHibernate?
A short quote:
Fetching collections is a difficult operation. It has many side effects (as you realized, when more collections are fetched). But even when fetching one collection, we are loading many duplicated rows.
In other words, fetching is a fragile feature and should be used wisely, in very few scenarios I'd say. So what to use? How to solve that?
Profit from a built-in NHibernate feature:
19.1.5. Using batch fetching
NHibernate can make efficient use of batch fetching, that is, NHibernate can load several uninitialized proxies if one proxy is accessed (or collections). Batch fetching is an optimization of the lazy select fetching strategy. There are two ways you can tune batch fetching: on the class and the collection level.
Batch fetching for classes/entities is easier to understand. Imagine you have the following situation at runtime: You have 25 Cat instances loaded in an ISession, each Cat has a reference to its Owner, a Person. The Person class is mapped with a proxy, lazy="true". If you now iterate through all cats and call cat.Owner on each, NHibernate will by default execute 25 SELECT statements, to retrieve the proxied owners. You can tune this behavior by specifying a batch-size in the mapping of Person:
<class name="Person" batch-size="10">...</class>
NHibernate will now execute only three queries, the pattern is 10, 10, 5.
You may also enable batch fetching of collections. For example, if each Person has a lazy collection of Cats, and 10 persons are currently loaded in the ISession, iterating through all persons will generate 10 SELECTs, one for every call to person.Cats. If you enable batch fetching for the Cats collection in the mapping of Person, NHibernate can pre-fetch collections:
<class name="Person">
    <set name="Cats" batch-size="3">
        ...
    </set>
</class>
In my experience, this approach is priceless. The setting working for us is batch-size="25".
If you ask for any kind of entity (via session.Get() or .QueryOver()...), then while the session is open, the first time we touch a related reference or collection it is loaded in a few batches... No 1 + N SELECT issue...
Summary: Mark all your classes and all collections with batch-size="x" (x could be 25). That will support clean queries over root entities - while the session is open, all related stuff is loaded in a few SELECTs. The x can be adjusted; for some it could be much higher...
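For illustration, a sketch of what that gives you at runtime (a hypothetical Person/Cats model with batch-size="25" on the class and the collection):

using (var session = factory.OpenSession())
{
    // session.Query<T>() comes from the NHibernate.Linq namespace
    var persons = session.Query<Person>().ToList(); // 1 SELECT, no Fetch needed

    foreach (var person in persons)
    {
        // the first touch initializes up to 25 Cats collections in a single
        // batched SELECT, instead of one SELECT per person (the 1 + N issue)
        var count = person.Cats.Count;
    }
}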
I tried to do a lot of research, but I'm more of a DB guy - so even the explanation in the MSDN doesn't make any sense to me. Can anyone please explain, and provide some examples of, what the Include() statement does in terms of the SQL query?
Let's say for instance you want to get a list of all your customers:
var customers = context.Customers.ToList();
And let's assume that each Customer object has a reference to its set of Orders, and that each Order has references to LineItems which may also reference a Product.
As you can see, selecting a top-level object with many related entities could result in a query that needs to pull in data from many sources. As a performance measure, Include() allows you to indicate which related entities should be read from the database as part of the same query.
Using the same example, this might bring in all of the related order headers, but none of the other records:
var customersWithOrderDetail = context.Customers.Include("Orders").ToList();
As a final point, since you asked for SQL, the first statement without Include() could generate a simple statement:
SELECT * FROM Customers;
The final statement which calls Include("Orders") may look like this:
SELECT *
FROM Customers JOIN Orders ON Customers.Id = Orders.CustomerId;
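Include() also accepts a dotted path if you want a deeper graph in one go. A sketch reusing the same hypothetical Customer/Orders/LineItems/Product model:

// Eagerly load orders, their line items, and each line item's product:
var fullGraph = context.Customers
    .Include("Orders.LineItems.Product")
    .ToList();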
I just wanted to add that "Include" is part of eager loading. It is described in the Entity Framework 6 tutorial by Microsoft. Here is the link:
https://learn.microsoft.com/en-us/aspnet/mvc/overview/getting-started/getting-started-with-ef-using-mvc/reading-related-data-with-the-entity-framework-in-an-asp-net-mvc-application
Excerpt from the linked page:
Here are several ways that the Entity Framework can load related data into the navigation properties of an entity:
Lazy loading. When the entity is first read, related data isn't retrieved. However, the first time you attempt to access a navigation property, the data required for that navigation property is automatically retrieved. This results in multiple queries sent to the database — one for the entity itself and one each time that related data for the entity must be retrieved. The DbContext class enables lazy loading by default.
Eager loading. When the entity is read, related data is retrieved along with it. This typically results in a single join query that retrieves all of the data that's needed. You specify eager loading by using the Include method.
Explicit loading. This is similar to lazy loading, except that you explicitly retrieve the related data in code; it doesn't happen automatically when you access a navigation property. You load related data manually by getting the object state manager entry for an entity and calling the Collection.Load method for collections or the Reference.Load method for properties that hold a single entity. (In the following example, if you wanted to load the Administrator navigation property, you'd replace Collection(x => x.Courses) with Reference(x => x.Administrator).) Typically you'd use explicit loading only when you've turned lazy loading off.
Because they don't immediately retrieve the property values, lazy loading and explicit loading are also both known as deferred loading.
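For comparison, a minimal sketch of explicit loading using the API named in the excerpt (the Department/Courses/Administrator model is the tutorial's):

var department = context.Departments.First();

// Explicitly load a collection navigation property on demand:
context.Entry(department).Collection(d => d.Courses).Load();

// Explicitly load a reference (single entity) navigation property:
context.Entry(department).Reference(d => d.Administrator).Load();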
Think of it as enforcing eager loading in a scenario where your sub-items would otherwise be lazy loaded.
The query EF sends to the database will yield a larger result at first, but no follow-up queries will be made when the included items are accessed.
On the other hand, without it, EF would execute separate queries later, when you first access the sub-items.
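A short sketch of the difference (same hypothetical Customer/Orders model as above):

// Lazy: one query now, one more per customer when Orders is first touched.
var lazyCustomers = context.Customers.ToList();
var firstOrderCount = lazyCustomers[0].Orders.Count; // triggers a second SELECT

// Eager: one larger joined query up front, no follow-up queries on access.
var eagerCustomers = context.Customers.Include("Orders").ToList();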
The Include() method simply includes the related entities. What happens in SQL depends on the relationships between the entities you are including and on the data you are going to fetch.
Your LINQ query decides what type of joins to use; there could be left outer joins, there could be inner joins, etc...
@Corey Adler
Remember that you should use .Include() and .ThenInclude() only when returning the object (NOT THE QUERYABLE) with the "other table property".
As a result, it should only be used when returning objects from an API, not for queries that stay inside your application.
I have a web application I've developed that has a fairly complex save routine. The user builds and modifies a series of plans and then chooses to save the data. All of the additions, deletions, and modifications are saved in one go at this point (all inside a single transaction).
Plan
    Collection of Child
        Collection of ChildDetail
The bulk of the save is performed by calling SaveOrUpdate on a plan object and letting this save manage the plan and its children. I use zero as the unsaved value when I want to insert a new record, and use cascade=all-delete-orphan to ensure that if a child object or child detail object is removed on the client side, the object is deleted.
I am receiving an exception, however, when the following happens: a user creates a plan with child objects and saves them. This will save fine.
Plan(id=0)
Child[0](id=0), Child[1](id=0), Child[2](id=0)
The user then removes the child objects, adds new child objects in their place, and attempts to save the changes:
Plan(id=123)
Child[0](id=0), Child[1](id=0), Child[2](id=0)
This throws a GenericADOException ("unable to insert Child") with the inner exception "SQL0803 Duplicate Key Value Specified".
The behavior I'm looking for is for NHibernate to delete the previous Child objects then insert the new ones when SaveOrUpdate is called on the Plan. How can I achieve this while still letting the parent manage the relationships?
A solution I found for the time being is to pass the IDs of the removed children to the save routine. I then set the IDs of the new children to the IDs of the removed children, effectively turning a remove/add into an update.
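A sketch of what that workaround can look like (the names are illustrative, not from the original code):

// IDs of the children the client removed, passed into the save routine:
var reusableIds = new Queue<int>(removedChildIds);

foreach (var child in plan.Children.Where(c => c.Id == 0))
{
    if (reusableIds.Count == 0) break;
    child.Id = reusableIds.Dequeue(); // the remove + add becomes an update on flush
}

session.SaveOrUpdate(plan);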
I would like to refresh an entity and all its child collections. What is the best way to do this? I'm talking about NHibernate :)
I've read about session.Evict, session.Refresh...
But I'm still not sure if doing like:
void RefreshEntity<T>(T entity)
{
    session.Evict(entity);
    session.Refresh(entity);
}
would work exactly how I want it to work.
Is it going to work? If not, what else can I do?
Refresh after Evict probably won't work.
Theoretically, Refresh alone should be enough. However, it has known issues when elements of child collections have been deleted.
Evict followed by Get usually gets things done.
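A minimal sketch of the Evict-then-Get approach (it assumes you know the entity's id):

public T RefreshEntity<T>(ISession session, T entity, object id) where T : class
{
    session.Evict(entity);     // detach the stale instance from the session
    return session.Get<T>(id); // reload fresh state; child collections are re-fetched on access
}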
Refresh(parentObject) would be a good option, but for me, it first fetched all children one by one with single requests. No batching, no subquery, no join. Very bad!
It helped to .Clear() the child collection of the parent object; I also evicted the child objects before.
(These had been changed by an HQL update before, where multiple inserts via parent/children SaveOrUpdate would cause expensive clustered index rebuilds.)
EDIT: I removed the HQL update again, since the query (decrementing the index by a unique, large number) was more expensive than hundreds of single-row updates in a batch. So I ended up with a simple SaveOrUpdate(parentObject), with no need to refresh.
The reason was a child collection with a unique constraint on ParentID and Index (a sequential number), which would cause uniqueness violations while updating the changed child items. So the index was first incremented by 1000000 (or some arbitrarily high number) for all children, then, after the changes, decremented again.
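For reference, the temporary-offset trick I removed looked roughly like this (entity and property names are assumptions):

// shift all indexes out of the constrained range before applying the changes
session.CreateQuery("update Child set Index = Index + :offset where Parent.Id = :parentId")
       .SetParameter("offset", 1000000)
       .SetParameter("parentId", parent.Id)
       .ExecuteUpdate();
// ... save the changed children, flush, then decrement by the same offset again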
I am wondering how one can delete an entity having just its ID and type (as in the mapping), using NHibernate 2.1.
If you are using lazy loading, Load only creates a proxy.
session.Delete(session.Load(type, id));
With NH 2.1 you can use HQL. I'm not sure exactly what it looks like, but something like this (note that this is subject to SQL injection - if possible, use parameterized queries with SetParameter() instead):
session.Delete(string.Format("from {0} where id = {1}", type, id));
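A parameterized variant (a sketch; it assumes the entity name is known at compile time, e.g. Person):

// DML-style HQL delete; no string concatenation, no injection risk
session.CreateQuery("delete from Person where id = :id")
       .SetParameter("id", id)
       .ExecuteUpdate();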
Edit:
For Load, you don't need to know the name of the Id column.
If you need to know it, you can get it from the NH metadata:
sessionFactory.GetClassMetadata(type).IdentifierPropertyName
Another edit.
session.Delete() is instantiating the entity
When using session.Delete(), NH loads the entity anyway. At the beginning I didn't like it. Then I realized the advantages. If the entity is part of a complex structure using inheritance, collections or "any"-references, it is actually more efficient.
For instance, if class A and B both inherit from Base, it doesn't try to delete data in table B when the actual entity is of type A. This wouldn't be possible without loading the actual object. This is particularly important when there are many inherited types which also consist of many additional tables each.
The same situation is given when you have a collection of Bases, which happen to be all instances of A. When loading the collection in memory, NH knows that it doesn't need to remove any B-stuff.
If the entity A has a collection of Bs, which contains Cs (and so on), it doesn't try to delete any Cs when the collection of Bs is empty. This is only possible when reading the collection. This is particularly important when C is complex of its own, aggregating even more tables and so on.
The more complex and dynamic the structure is, the more efficient it is to load actual data instead of "blindly" deleting it.
HQL Deletes have pitfalls
HQL deletes do not load data into memory. But HQL deletes aren't that smart. They basically translate the entity name to the corresponding table name and remove that from the database. Additionally, they delete some aggregated collection data.
In simple structures, this may work well and efficiently. In complex structures, not everything is deleted, leading to constraint violations or "database memory leaks".
Conclusion
I also tried to optimize deletion with NH. I gave up in most cases, because NH is still smarter; it "just works" and is usually fast enough. One of the most complex deletion algorithms I wrote analyzed the NH mapping definitions and built delete statements from them. And - no surprise - it was not possible without reading data from the database before deleting. (I just reduced it to loading only the primary keys.)