I'm trying to figure out why NHibernate handles one-to-many cascading (using cascade=all-delete-orphan) the way it does. I ran into the same issue as this guy:
Forcing NHibernate to cascade delete before inserts
As far as I can tell NHibernate always performs inserts first, then updates, then deletes. There may be a very good reason for this, but I can't for the life of me figure out what that reason is. I'm hoping that a better understanding of this will help me come up with a solution that I don't hate :)
Are there any good theories on this behavior? In what scenario would deleting orphans first not work? Do all ORMs work this way?
EDIT: After saying there is no reason, here is a reason.
Lets say you have the following scenario:
public class Dog {
public DogLeg StrongestLeg {get;set;}
public IList<DogLeg> Legs {get;set;
}
If you were to delete first, and lets say you delete all of Dog.Legs, then you may delete the StrongestLeg which would cause a reference violation. Hence you cannot DELETE before you UPDATE.
Lets say you add a new leg, and that new leg is also the StrongestLeg. Then you must INSERT before you UPDATE so that the Leg has an Id that can be inserted into Dog.StrongestLegId.
So you must INSERT, UPDATE, then DELETE.
Also as nHibernate is based on Hibernate, I had a look into Hibernate and found several people talking about the same issue.
Support one-to-many list associations with constraints on both (owner_id, position) and (child_id)
Non lazy loaded List updates done in wrong order, cause exception
wrong insert/delete order when updating record-set
Why does Hibernate perform Inserts before Deletes?
Unidirection OneToMany causes duplicate key entry violation when removing from list
And here is the best answer from them:
Gail Badner added a comment - 21/Feb/08 2:30 PM: The problem arises when a new
association entity with a generated ID
is added to the collection. The first
step, when merging an entity
containing this collection, is to
cascade save the new association
entity. The cascade must occur before
other changes to the collection.
Because the unique key for this new
association entity is the same as an
entity that is already persisted, a
ConstraintViolationException is
thrown. This is expected behavior.
Related
I've learned that SQLAlchemy implements some foreign key handling such as setting them null when the parent is deleted, separately from the database, meaning they can be set to different behaviors, and different ways of doing the same thing could get either one. Example:
I have Comment and Subscription, where Subscription has a foreign key relationship to Comment. I started by setting backref = backref('subs', cascade = 'delete, delete-orphan') on the Subscription's relationship, and the result was that session.delete(comment) would delete the subscriptions on the comment properly, but session.query(Comment).filter_by(id = id).delete() would fail with a foreign key violation, because the cascade was set at the ORM level instead of the Postgres level.
Needless to say I find this very confusing, and I want to disable it so that all foreign key handling on deletes is done by Postgres. I'm not finding an obvious way to do it looking at the docs. I read about passive deletes, which sounds like it does what I want except that it doesn't apply to objects already loaded into the session.
Is there a way to disable all ORM-level foreign key handling on deletes? And is there a good reason not to?
Huh, I tried out passive deletes and it seems like it does what I need after all. The words "The cascade="all, delete-orphan" will take effect for instances of MyOtherClass which are currently present in the session" made me think that it would still manually issue deletes for dependent objects if they were added to the session, but if I fetch the depended object and then the dependent one and then modify the dependent, verifying that it's in session.dirty, and then delete the depended and commit, SQLAlchemy doesn't manually issue a delete for the dependent object.
I have an entity which contains several sets/bags.
is there any recommendation from NHibernate on how to delete it?
of course I can do foreach on each list and delete each child, but that will create many delete statements. is it better to create HQL for each child table, or maybe other approach?
I've also seen on another thread to use IStatelessSession. is it wise here?
Personally I think HQL works well in this instance.
Or if your stomach can take it use cascading deletion at the database level then delete the parent and the children precede automatically.
Let's say we have two entities, A and B. B has a many-to-one relationship to A like follows:
#Entity
public class A {
#OneToMany(mappedBy="a_id")
private List<B> children;
}
#Entity
public class B {
private String data;
}
Now, I want to delete the A object and cascade the deletions to all its children B. There are two ways to do this:
Add cascade=CascadeType.ALL, orphanRemoval=true to the OneToMany annotation, letting JPA remove all children before removing the A-object from the database.
Leave the classes as they are and simply let the database cascade the deletion.
Is there any problem with using the later option? Will it cause the Entity Manager to keep references to already deleted objects? My reason for choosing option two over one is that option one generates n+1 SQL queries for a removal, which can take a prolonged time when object A contains a lot of children, while option two only generates a single SQL query and then moves on happily. Is there any "best practice" regarding this?
I'd prefer the database. Why?
The database is probably a lot faster doing this
The database should be the primary place to hold integrity and relationship information. JPA is just reflecting that information
If you're connecting with a different application / platform (i.e. without JPA), you can still cascadingly delete your records, which helps increase data integrity
In EclipseLink you can use both if you use the #CascadeOnDelete annotation. EclipseLink will also generate the cascade DDL for you.
See,
http://wiki.eclipse.org/EclipseLink/Examples/JPA/DeleteCascade
This optimizes the deletion by letting the database do it, but also maintains the cache and the persistence unit by removing the objects.
Note that orphanRemoval=true will also delete objects removed from the collection, which the database cascade constraint will not do for you, so having the rules in JPA is still necessary. There are also some relationships that the database cannot handle deletion for, as the database can only cascade in the inverse direction of the constraint, a OneToOne with a foreign key, or a OneToMany with a join table cannot be cascaded on the database.
This answer raises some really strong arguments about why it should be JPA that handles the cascade, not the database.
Here's the relevant quote:
...if you would make cascades on database, and not declare them in
Hibernate (for performance issues) you could in some circumstances get
errors. This is because Hibernate stores entities in its session
cache, so it would not know about database deleting something in
cascade.
When you use second-level cache, your situation is even worse, because
this cache lives longer than session and such changes on db-side will
be invisible to other sessions as long old values are stored in this
cache.
I'm using NHibernate (fluent) to access an old third-party database with a bunch of tables, that are not related in any explicit way. That is a child tables does have parentID columns which contains the primary key of the parent table, but there are no foreign key relations ensuring these relations. Ideally I would like to add some foreign keys, but cannot touch the database schema.
My application works fine, but I would really like impose a referential integrity rule that would prohibit deletion of parent objects if they have children, e.i. something similar 'ON DELETE RESTRICT' but maintained by NHibernate.
Any ideas on how to approach this would be appreciated. Should I look into the OnDelete() method on the IInterceptor interface, or are there other ways to solve this?
Of course any solution will come with a performance penalty, but I can live with that.
I can't think of a way to do this in NHibernate because it would require that NHibernate have some knowledge of the relationships. I would handle this in code using the sepecification pattern. For example (using a Company object with links to Employee objects):
public class CanDeleteCompanySpecification
{
bool IsSatisfiedBy(Company candidate)
{
// Check for related Employee records by loading collection
// or using COUNT(*).
// Return true if there are no related records and the Company can be deleted.
// Hope that no linked Employee records are created before the delete commits.
}
}
Can anyone provide a clear explanation / example of what these functions do, and when it's appropriate to use them?
Straight from the manual...
We know that the foreign keys disallow creation of orders that do not relate to any products. But what if a product is removed after an order is created that references it? SQL allows you to handle that as well. Intuitively, we have a few options:
Disallow deleting a referenced product
Delete the orders as well
Something else?
CREATE TABLE order_items (
product_no integer REFERENCES products ON DELETE RESTRICT,
order_id integer REFERENCES orders ON DELETE CASCADE,
quantity integer,
PRIMARY KEY (product_no, order_id)
);
Restricting and cascading deletes are the two most common options. RESTRICT prevents deletion of a referenced row. NO ACTION means that if any referencing rows still exist when the constraint is checked, an error is raised; this is the default behavior if you do not specify anything. (The essential difference between these two choices is that NO ACTION allows the check to be deferred until later in the transaction, whereas RESTRICT does not.) CASCADE specifies that when a referenced row is deleted, row(s) referencing it should be automatically deleted as well. There are two other options: SET NULL and SET DEFAULT. These cause the referencing columns to be set to nulls or default values, respectively, when the referenced row is deleted. Note that these do not excuse you from observing any constraints. For example, if an action specifies SET DEFAULT but the default value would not satisfy the foreign key, the operation will fail.
Analogous to ON DELETE there is also ON UPDATE which is invoked when a referenced column is changed (updated). The possible actions are the same.
edit: You might want to take a look at this related question: When/Why to use Cascading in SQL Server?. The concepts behind the question/answers are the same.
I have a PostGreSQL database and I use On Delete when I have a user that I delete from the database and I need to delete it's information from other table. This ways I need to do only 1 delete and FK that has ON delete will delete information from other table.
You can do the same with ON Update. If you update the table and the field have a FK with On Update, if a change is made on the FK you will be noticed on the FK table.
What Daok says is true... it can be rather convenient. On the other hand, having things happen automagically in the database can be a real problem, especially when it comes to eliminating data. It's possible that in the future someone will count on the fact that FKs usually prevent deletion of parents when there are children and not realize that your use of On Delete Cascade not only doesn't prevent deletion, it makes huge amounts of data in dozens of other tables go away thanks to a waterfall of cascading deletes.
#Arthur's comment.
The more frequently "hidden" things happen in the database the less likely it becomes that anyone will ever have a good handle on what is going on. Triggers (and this is essentially a trigger) can cause my simple action of deleting a row, to have wide ranging consequences throughout my database. I issue a Delete statement and 17 tables are affected with cascades of triggers and constraints and none of this is immediately apparent to the issuer of the command. OTOH, If I place the deletion of the parent and all its children in a procedure then it is very easy and clear for anyone to see EXACTLY what is going to happen when I issue the command.
It has absolutely nothing to do with how well I design a database. It has everything to do with the operational issues introduced by triggers.
Instead of writing the method to do all the work, of the cascade delete or cascade update, you could simply write a warning message instead. A lot easier than reinventing the wheel, and it makes it clear to the client (and new developers picking up the code)