NHibernate: removing from collection vs association and cascading styles

I'm having trouble understanding how NHibernate knows anything about objects removed from an association (and then executes a cascading style like delete-orphan). I mean, at the database level, if I wanted to remove an association I'd have to physically log on and remove some FK. How does this happen in the NH world? Do I remap my classes and remove the previously established parent/child association (relationship), after which NH does a comparative analysis, figures out that something has changed and takes the appropriate action? In this post Ayende talks about different cascading styles, and delete-orphan is described as "... In addition to that, when an object is removed from the association and not associated with another object (orphaned), also delete it ..." How does this removal happen?

NHibernate watches all the mapped collections that are owned by objects in the NHibernate session. As you make changes (adding/removing items), NHibernate marks the collections as dirty. When it is time to flush the changes, it compares the elements in the dirty collections against their loaded state and can identify which items have been added and removed. Depending on the cascade options for the collection, NHibernate might then persist those changes to the database.
This is why you should always declare collection properties using interfaces (IList, ISet, etc.) and never replace a collection property on an object that has been loaded using NHibernate.
Additional info requested in comments:
There is a useful discussion of collection mapping by Fabio Maulo (NHibernate lead developer) here, which I strongly recommend reading. But to try and provide a brief answer to your questions:
But how does NH know that association between objects has been removed?
Generally when working in the OO model with many associations we manage relationships at the parent. That is, a child is considered to be associated with the parent when it is in a parent's collection. E.g.
child.Parent = parent;
parent.Children.Add(child); // This is the critical bit
session.Save(parent); // to have an INSERT generated here
Similarly, removing an item from a collection breaks the association (assuming the correct mapping attributes have been used):
child.Parent = null;
parent.Children.Remove(child); // This is the critical bit
session.Save(parent); // To have DELETE or UPDATE statement generated depending on cascade settings.
This is the opposite of how things work in the relational world where we manage the relationship at the child via the foreign key on the child row.
For a more detailed understanding there is nothing like downloading the NHibernate source code, creating a simple test case and then stepping through in the debugger.
What's the reason behind the "This is why..."
There are a number of things NHibernate takes care of in managing association collections. It does this by using its own collection classes, which keep track of whether they are dirty, what state they were in when they were loaded from the db, and a number of other cool things. If you replace those objects then NHibernate loses that capability. So, for instance, if you want to get rid of all the items in a collection you should do:
parent.Children.Clear(); // The collection object is preserved and NHibernate knows you want them all deleted.
You should NEVER do:
parent.Children = new List<X>(); // NHibernate will not track changes to this collection.
For further reading you might also want to take a look at this.
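To make the snapshot comparison described in this answer more concrete, here is a minimal sketch in Python (not C#, and not NHibernate's actual implementation). TrackedList, flush and the statement strings are hypothetical names invented for illustration. The sketch only shows the idea: a wrapper remembers what was loaded, and a flush diffs the current contents against that snapshot.

```python
class TrackedList:
    """Hypothetical stand-in for a persistent collection wrapper."""

    def __init__(self, loaded_items):
        self._items = list(loaded_items)
        self._snapshot = list(loaded_items)  # state as loaded from the database

    def add(self, item):
        self._items.append(item)

    def remove(self, item):
        self._items.remove(item)

    def clear(self):
        self._items.clear()

    def is_dirty(self):
        return self._items != self._snapshot

    def diff(self):
        """Return (added, removed) relative to the loaded snapshot."""
        added = [i for i in self._items if i not in self._snapshot]
        removed = [i for i in self._snapshot if i not in self._items]
        return added, removed


def flush(collection, delete_orphans):
    """Return the statements a flush might issue in this toy model."""
    if not isinstance(collection, TrackedList):
        # A plain replacement list carries no snapshot to diff against.
        return ["<untracked collection: nothing to compare against>"]
    added, removed = collection.diff()
    statements = [f"INSERT {item}" for item in added]
    verb = "DELETE" if delete_orphans else "UPDATE (null the FK of)"
    statements += [f"{verb} {item}" for item in removed]
    return statements


children = TrackedList(["child-1", "child-2"])
children.remove("child-1")
children.add("child-3")
print(flush(children, delete_orphans=True))
# ['INSERT child-3', 'DELETE child-1']
```

In this toy model, calling clear() keeps the wrapper and its snapshot, so every loaded item shows up as removed. Swapping in a plain list leaves nothing to compare against. What NHibernate actually does when a collection is replaced depends on the mapping, so treat the last point as an illustration only.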

Related

Retrieve all unsaved (detached) entities

I am creating new entities but leaving them detached because I want to attach and save them later.
manager.createEntity("Employee", null, EntityState.Detached)
How can I retrieve all the added but detached entities from my entity manager? That is the entities I added that are in the cache but have not been saved?
You can't ask an EntityManager for detached entities because they are ... detached.
"Detached" means they don't belong to an EntityManager.
It is generally not a good idea to be modifying detached entities. You'll discover they don't behave like attached entities. For example, none of their navigation properties work ... for the simple reason that navigation properties look for related entities in the same EntityManager and this detached entity doesn't have an EntityManager.
I think you need to explain what motivated you to create these entities in a detached state. Why not leave them as "Added" (the default state)?
Perhaps you're worried about saving them prematurely? We can talk about how to guard against that.
Perhaps you're creating them but don't really want to save them until the user has made at least ONE change? We can talk about patterns to cover that.

Will NHibernate SaveOrUpdate be able to work thru entity's related objects as well?

I have an NHibernate entity that has 2 relationships of many to many.
Suppose I have a detached version of this entity, will SaveOrUpdate be able to decide Saving or Updating thru the related objects as well?
It depends on how the relationship is configured.
The main things that come to mind are cascading and whether it is inverse or not.
If you want things to automatically cascade (~work thru) then just set that in your hbm collection association node
cascade="all"
There are more cascading options worth reading and understanding

Difference between lazy loading in NHibernate and Entity Framework

While playing around with the Entity Framework and NHibernate using POCO entities, I made the following observation which I find a bit unusual:
I have two POCO entities, an 'Order' and a 'Product'. There is a many-to-many relationship between the two. When I add a product to an Order, I use a 'FixUp' method to ensure that the opposing side of the relationship is also updated i.e. - that the products collection of 'Orders' is also updated.
My Order POCO entity has the following method to do the 'FixUp':
private void FixupProducts(object sender, NotifyCollectionChangedEventArgs e)
{
    if (e.NewItems != null)
    {
        foreach (Product p in e.NewItems)
        {
            p.Order.Add(this);
        }
    }
}
While profiling this scenario with EFProf and NHProf, I observed that the Entity Framework generates one more SQL statement than NHibernate, the cause of which seems to be this line:
p.Order.Add(this);
With Entity Framework, the above line causes a select to be executed on the database to return all the orders for the product 'p'. I don't expect this to happen, since I'm using lazy loading and don't actually want to access the products 'Order' collection. I just want to add an order to it.
With NHibernate no attempt is made to load the products collection of orders unless I explicitly try to access it. For example, if I say:
foreach (Order o in product.Orders)
{
    Console.WriteLine(o.Id);
}
So ultimately my question is, why does Entity Framework generate the extra SQL statement? Is there a difference in the implementation of lazy loading for the two frameworks that I'm not aware of?
***EDIT TO ORIGINAL
It seems that Entity Framework doesn't behave in a lazy fashion once a method of any kind is called on the collection. Any attempt to add or count (or presumably any other operation on the collection) results in that collection being loaded into memory.
What's interesting is that my NHibernate mapping (which is a bag of 'Products', shown below) appears to behave in an 'extra-lazy' fashion, even though my mapping is configured as just being lazy:
<bag name="Products" cascade ="all" table="OrderProduct" mutable="true" lazy="true">
I can 'Add' to the collection without it being loaded into memory. I think that calling 'Count' will result in the orders being loaded unless I configure it as being 'extra-lazy'.
Can anyone comment on whether this is correct?
That is how the EF POCO template and its FixUp methods behave. The only ways to avoid this are:
Removing the FixUp methods from the POCO template
Turning off lazy loading temporarily when assigning a Product to an Order
This follows from the way lazy loading is implemented. Every access to the property / collection itself triggers lazy loading, regardless of the operation you want to perform on the collection. You will have to avoid the built-in lazy loading completely to fully avoid it. Here is an example of how to achieve it for the Count method; you can think about a similar approach for other methods.
I think that calling 'Count' will result in the orders being loaded
unless I configure it as being 'extra-lazy'.
This is correct. With lazy="extra", calling Count (as well as Contains) is optimized and does not load the whole collection in NHibernate. You may also find this EF vs NHibernate comparison interesting:
Collection with lazy=”extra” – Lazy extra means that NHibernate adapts
to the operations that you might run on top of your collections. That
means that blog.Posts.Count will not force a load of the entire
collection, but rather would create a “select count(*) from Posts
where BlogId = 1” statement, and that blog.Posts.Contains() will
likewise result in a single query rather than paying the price of
loading the entire collection to memory.
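As a rough illustration of what the quote describes, here is a small Python sketch using the standard sqlite3 module. ExtraLazyPosts and its method names are made up for this example and are not NHibernate API. The table and column names are borrowed from the quoted query.

```python
import sqlite3


class ExtraLazyPosts:
    """Hypothetical collection wrapper that answers simple questions with SQL."""

    def __init__(self, conn, blog_id):
        self._conn = conn
        self._blog_id = blog_id
        self.items = None  # stays None until the full collection is needed

    def count(self):
        # One aggregate query instead of loading every row.
        row = self._conn.execute(
            "SELECT COUNT(*) FROM Posts WHERE BlogId = ?", (self._blog_id,)
        ).fetchone()
        return row[0]

    def contains(self, post_id):
        # One existence check instead of loading every row.
        row = self._conn.execute(
            "SELECT 1 FROM Posts WHERE BlogId = ? AND Id = ?",
            (self._blog_id, post_id),
        ).fetchone()
        return row is not None

    def load(self):
        # Only iteration-style access pays for the full load.
        if self.items is None:
            rows = self._conn.execute(
                "SELECT Id FROM Posts WHERE BlogId = ? ORDER BY Id",
                (self._blog_id,),
            ).fetchall()
            self.items = [r[0] for r in rows]
        return self.items


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (Id INTEGER PRIMARY KEY, BlogId INTEGER)")
conn.executemany(
    "INSERT INTO Posts (Id, BlogId) VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)]
)

posts = ExtraLazyPosts(conn, blog_id=1)
print(posts.count())      # 2
print(posts.contains(3))  # False, post 3 belongs to blog 2
print(posts.items)        # None, nothing was loaded
```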
Adding a new item to an uninitialized lazy collection will not load the collection. It does not have to be mapped as extra lazy for this. Take a look at this article:
Now, on collections mapped as lazy-loading, Add() operations are
allowed even if the collection is not initialized (i.e. the collection
just acts as a proxy). When adding an object to a collection in this
state, a QueueAdd() method is called that stores the added object in a
secondary collection. Once a lazy initialization is performed, this
secondary collection is merged into the main one (I believe it’s the
DelayedAddAll() method that does this). This can be hard to debug
because lazy load is transparently triggered if you just touch the
collection with the debugger (providing the session is connected at
that moment), and everything gets initialized properly.
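The queued-add behaviour described in that quote can be sketched roughly as follows. This is a simplified Python model, not NHibernate code. LazyBag and load_orders are hypothetical names, and the real QueueAdd()/DelayedAddAll() methods mentioned above are only mimicked in spirit.

```python
class LazyBag:
    """Hypothetical lazy collection that queues adds until it is initialized."""

    def __init__(self, loader):
        self._loader = loader  # callable that fetches the rows from the database
        self._items = None     # None means "not initialized yet"
        self._queued = []      # adds made before initialization

    @property
    def initialized(self):
        return self._items is not None

    def add(self, item):
        if self.initialized:
            self._items.append(item)
        else:
            self._queued.append(item)  # no database access needed

    def _initialize(self):
        if not self.initialized:
            self._items = list(self._loader())
            self._items.extend(self._queued)  # merge the queued adds
            self._queued.clear()

    def __iter__(self):
        self._initialize()
        return iter(self._items)

    def __len__(self):
        self._initialize()
        return len(self._items)


queries = []


def load_orders():
    queries.append("SELECT ... FROM Orders")
    return ["order-1", "order-2"]


orders = LazyBag(load_orders)
orders.add("order-3")
print(queries)       # [] because the add did not touch the database
print(list(orders))  # ['order-1', 'order-2', 'order-3']
print(len(queries))  # 1, the load happened once, on first real access
```

In this sketch len() forces initialization, which corresponds to a plain lazy mapping. An extra-lazy mapping would answer it with a count query instead.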

When can I use Unidirectional relationships in NHibernate?

I'm trying to port a large graph of .NET entities to use NHibernate, but I'm encountering an issue that most of the relationships are only defined unidirectionally - in most cases, the child class contains a reference to the parent, but the parent does not contain the collection of refs to its children. It would be quite a bit of work to add all the collections to turn the relationships into bidirectional ones, so I'm wondering what the consequences for NHibernate would be of not doing so?
One consequence I've noticed is that cascading deletes seem to fail (child doesn't get deleted in the DB, causing a referential integrity violation). Is that the only consequence or are there other issues I need to be aware of?
Are there any guidelines for when relationships should be uni or bi-directional?
Thanks
I think that not being able to cascade the deletes will be the only issue with NHibernate per se.
But you will not be able to easily walk the graph. You can do it from child to parent, but obviously not from parent to child. So you would have to issue a query each time you want all the children of a parent.
So if you are using NH for a persisted domain model, where you have a root object from which you need to use the child objects for certain operations, you would have to issue queries from within the model to get the children. So your model will be coupled to your data access.
Or you would have to pass the children to the parent object as collections, but then it might be just as easy to have the collections on the model to begin with so NH could fill them for you.

Does every Core Data Relationship have to have an Inverse?

Let's say I have two Entity classes: SocialApp and SocialAppType
In SocialApp I have one Attribute: appURL and one Relationship: type.
In SocialAppType I have three Attributes: baseURL, name and favicon.
The destination of the SocialApp relationship type is a single record in SocialAppType.
As an example, for multiple Flickr accounts, there would be a number of SocialApp records, with each record holding a link to a person's account. There would be one SocialAppType record for the "Flickr" type, that all SocialApp records would point to.
When I build an application with this schema, I get a warning that there is no inverse relationship between SocialAppType and SocialApp.
/Users/username/Developer/objc/TestApp/TestApp.xcdatamodel:SocialApp.type: warning: SocialApp.type -- relationship does not have an inverse
Do I need an inverse, and why?
Apple's documentation has a great example that suggests a situation where you might have problems by not having an inverse relationship. Let's map it onto this case.
Assume you modeled it as follows:
Note you have a to-one relationship called "type", from SocialApp to SocialAppType. The relationship is non-optional and has a "deny" delete rule.
Now consider the following:
SocialApp *socialApp;
SocialAppType *appType;
NSError *error = nil;
// assume entity instances correctly instantiated
[socialApp setSocialAppType:appType];
[managedObjectContext deleteObject:appType];
BOOL saved = [managedObjectContext save:&error];
We expect this context save to fail, since we have set the delete rule to Deny and the relationship is non-optional.
But here the save succeeds.
The reason is that we haven't set an inverse relationship. Because of that, the socialApp instance does not get marked as changed when appType is deleted, so no validation happens for socialApp before saving (Core Data assumes none is needed since nothing changed). A change did happen, but it is not reflected.
If we recall appType by
SocialAppType *appType = [socialApp socialAppType];
appType is nil.
Weird, isn't it? We get nil for a non-optional relationship?
So you are in no trouble if you have set up the inverse relationship.
Otherwise you have to force validation by writing the code as follows.
SocialApp *socialApp;
SocialAppType *appType;
NSError *error = nil;
// assume entity instances correctly instantiated
[socialApp setSocialAppType:appType];
[managedObjectContext deleteObject:appType];
// touching the relationship marks socialApp as changed, so it gets validated
[socialApp setValue:nil forKey:@"socialAppType"];
BOOL saved = [managedObjectContext save:&error];
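The reasoning above (validation only runs for objects the context knows have changed, and the inverse is how the context finds them) can be modelled with a toy sketch. This is Python, not Objective-C, and it is a deliberately simplified model rather than how Core Data is implemented. Context, App, AppType and try_delete are hypothetical names.

```python
class AppType:
    def __init__(self):
        self.apps = []  # inverse side; stays empty when no inverse is modeled


class App:
    def __init__(self, app_type, model_inverse):
        self.type = app_type  # non-optional in the example above
        if model_inverse:
            app_type.apps.append(self)


class Context:
    def __init__(self):
        self.changed = set()
        self.deleted = set()

    def delete(self, obj):
        self.deleted.add(obj)
        # The context can only reach objects that the inverse points at.
        for app in obj.apps:
            app.type = None
            self.changed.add(app)

    def save(self):
        # Validation runs only for objects marked as changed.
        return all(app.type is not None for app in self.changed)


def try_delete(model_inverse):
    ctx = Context()
    app_type = AppType()
    App(app_type, model_inverse)
    ctx.delete(app_type)
    return ctx.save()


print(try_delete(model_inverse=True))   # False, the broken App is validated
print(try_delete(model_inverse=False))  # True, nothing was marked as changed
```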
In practice, I haven't had any data loss due to not having an inverse - at least that I am aware of. A quick Google suggests you should use them:
An inverse relationship doesn't just
make things more tidy, it's actually
used by Core Data to maintain data
integrity.
-- Cocoa Dev Central
You should typically model
relationships in both directions, and
specify the inverse relationships
appropriately. Core Data uses this
information to ensure the consistency
of the object graph if a change is
made (see “Manipulating Relationships
and Object Graph Integrity”). For a
discussion of some of the reasons why
you might want to not model a
relationship in both directions, and
some of the problems that might arise
if you don’t, see “Unidirectional
Relationships.”
-- Core Data Programming Guide
I'll paraphrase the definitive answer I found in More iPhone 3 Development by Dave Mark and Jeff LaMarche.
Apple generally recommends that you always create and specify the inverse, even if you don't use the inverse relationship in your app. For this reason, it warns you when you fail to provide an inverse.
Relationships are not required to have an inverse, because there are a few scenarios in which the inverse relationship could hurt performance. For example, suppose the inverse relationship contains an extremely large number of objects. Removing the inverse requires iterating over the set that represents the inverse, weakening performance.
But unless you have a specific reason not to, model the inverse. It helps Core Data ensure data integrity. If you run into performance issues, it's relatively easy to remove the inverse relationship later.
There is at least one scenario where a good case can be made for a Core Data relationship without an inverse: when there is already another Core Data relationship between the two objects, which will handle maintaining the object graph.
For instance, a book contains many pages, while a page is in one book. This is a two-way many-to-one relationship. Deleting a page just nullifies the relationship, whereas deleting a book will also delete the page.
However, you may also wish to track the current page being read for each book. This could be done with a "currentPage" property on Page, but then you need other logic to ensure that only one page in the book is marked as the current page at any time. Instead, making a currentPage relationship from Book to a single page will ensure that there will always only be one current page marked, and furthermore that this page can be accessed easily with a reference to the book with simply book.currentPage.
What would the reciprocal relationship be in this case? Something largely nonsensical. "myBook" or similar could be added back in the other direction, but it contains only the information already contained in the "book" relationship for the page, and so creates its own risks. Perhaps in the future, the way you are using one of these relationships is changed, resulting in changes in your core data configuration. If page.myBook has been used in some places where page.book should have been used in the code, there could be problems. Another way to proactively avoid this would also be to not expose myBook in the NSManagedObject subclass that is used to access page. However, it can be argued that it is simpler to not model the inverse in the first place.
In the example outlined, the delete rule for the currentPage relationship should be set to "No Action" or "Cascade", since there is no reciprocal relationship to "Nullify". (Cascade implies you are ripping every page out of the book as you read it, but that might be true if you're particularly cold and need fuel.)
When it can be demonstrated that object graph integrity is not at risk, as in this example, and code complexity and maintainability is improved, it can be argued that a relationship without an inverse may be the correct decision.
An alternative solution, as discussed in the comments, is to create your own UUID property on the target (in the example here, every Page would have an id that is a UUID), store that as a property (currentPage just stores a UUID as an Attribute in Book, rather than being a relationship), and then write a method to fetch the Page with the matching UUID when needed. This is probably a better approach than using a relationship without an inverse, not the least because it avoids the warning messages discussed.
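A minimal sketch of that UUID approach, written in Python purely to show the shape of the idea. Book, Page, current_page_id and current_page are hypothetical names, and the generator lookup stands in for what would be a fetch request with a predicate on the id in Core Data.

```python
import uuid


class Page:
    def __init__(self, number):
        self.id = uuid.uuid4()  # own identifier stored as a plain attribute
        self.number = number


class Book:
    def __init__(self, pages):
        self.pages = list(pages)
        self.current_page_id = None  # an attribute, not a relationship

    def set_current_page(self, page):
        self.current_page_id = page.id

    def current_page(self):
        # Stand-in for fetching the Page whose id matches the stored UUID.
        return next(
            (p for p in self.pages if p.id == self.current_page_id), None
        )


book = Book([Page(1), Page(2), Page(3)])
book.set_current_page(book.pages[1])
print(book.current_page().number)  # 2
```

Since only one id is stored on the book, there can be at most one current page, which is the same guarantee the relationship-based version was after.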
The better question is, "is there a reason not to have an inverse"? Core Data is really an object graph management framework, not a persistence framework. In other words, its job is to manage the relationships between objects in the object graph. Inverse relationships make this much easier. For that reason, Core Data expects inverse relationships and is written for that use case. Without them, you will have to manage the object graph consistency yourself. In particular, to-many relationships without an inverse relationship are very likely to be corrupted by Core Data unless you work very hard to keep things working. The cost in terms of disk size for the inverse relationships really is insignificant in comparison to the benefit it gains you.
While the docs don't seem to require an inverse, I just resolved a scenario that did in fact result in "data loss" by not having an inverse. I have a report object that has a to-many relationship on reportable objects. Without the inverse relationship, any changes to the to-many relationship were lost upon relaunch. After inspecting the Core Data debug it was apparent that even though I was saving the report object, the updates to the object graph (relationships) were never being made. I added an inverse, even though I don't use it, and voila, it works. So it might not say it's required but relationships without inverses can definitely have strange side effects.
Inverses are also used for Object Integrity (for other reasons, see the other answers):
The recommended approach is to model relationships in both directions
and specify the inverse relationships appropriately. Core Data uses
this information to ensure the consistency of the object graph if a
change is made
From: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/CoreData/HowManagedObjectsarerelated.html#//apple_ref/doc/uid/TP40001075-CH17-SW1
The provided link explains why you should have an inverse set. Without it, you can lose data/integrity. You are also more likely to access an object that is nil.
There is generally no need for an inverse relationship. But there are a few quirks/bugs in Core Data where you need one. There are cases where relationships/objects go missing, even though there is no error while saving the context, if an inverse relationship is missing. Check this example, which I created to demonstrate objects going missing and how to work around it while working with Core Data.