Constraint for one-to-many relationship - sql

We have two tables with a one-to-many relationship. We would like to enforce a constraint that at least one child record exists for a given parent record.
Is this possible?
If not, would you make the schema a bit more complex to support such a constraint? If so, how would you do it?
Edit: I'm using SQL Server 2005

Such a constraint isn't possible from a schema perspective, because you run into a "chicken or the egg" type of scenario. Under this sort of scenario, when I insert into the parent table I have to have a row in the child table, but I can't have a row in the child table until there's a row in the parent table.
This is something better enforced client-side.

It's possible if your back-end supports deferrable constraints, as does PostgreSQL.
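As a rough sketch of what deferrable constraints buy you, here is the idea tried out in SQLite through Python's sqlite3 module. SQLite is used only because it is self-contained and also supports DEFERRABLE INITIALLY DEFERRED foreign keys; the table and column names (parent, child, first_child_id) are made up for illustration, and this is not SQL Server syntax.

```python
import sqlite3

# Autocommit mode so that BEGIN/COMMIT are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE parent (
    parent_id      INTEGER PRIMARY KEY,
    first_child_id INTEGER NOT NULL,   -- hypothetical "proof of child" column
    FOREIGN KEY (first_child_id) REFERENCES child (child_id)
        DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE child (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL,
    FOREIGN KEY (parent_id) REFERENCES parent (parent_id)
        DEFERRABLE INITIALLY DEFERRED
);
""")

# Parent and first child in one transaction: both FKs are checked at COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO parent (parent_id, first_child_id) VALUES (1, 10)")
conn.execute("INSERT INTO child (child_id, parent_id) VALUES (10, 1)")
conn.execute("COMMIT")

# A parent without any child is rejected when the transaction tries to commit.
conn.execute("BEGIN")
conn.execute("INSERT INTO parent (parent_id, first_child_id) VALUES (2, 20)")
try:
    conn.execute("COMMIT")
    childless_parent_rejected = False
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")  # the failed COMMIT leaves the transaction open
    childless_parent_rejected = True

parent_ids = [row[0] for row in conn.execute("SELECT parent_id FROM parent")]
```

The circular pair of foreign keys is what makes "at least one child" hold: the parent cannot be committed without pointing at a real child row.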

How about a simple non-nullable column?
CREATE TABLE ParentTable
(
    ParentID INT NOT NULL,  -- column types are assumed; the original omitted them
    ChildID INT NOT NULL,
    PRIMARY KEY (ParentID),
    FOREIGN KEY (ChildID) REFERENCES ChildTable (ChildID)
);
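If your business logic allows it and there are default values you can query from the database for each new parent record, you can use a BEFORE INSERT trigger on the parent table to populate the non-nullable child column. (The syntax below is Oracle-style; SQL Server has no BEFORE triggers, so an INSTEAD OF trigger would be the closest equivalent there.)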
CREATE or REPLACE TRIGGER trigger_name
BEFORE INSERT
ON ParentTable
FOR EACH ROW
BEGIN
-- ( insert new row into ChildTable )
-- update childID column in ParentTable
END;

This isn't really something 'better enforced on the client side' so much as it is something that is impractical to enforce within certain database implementations. Realistically the job DOES belong in the database and at least one of the workarounds below should work.
Ultimately what you want is to constrain the parent to a child. This guarantees that a child exists. Unfortunately this causes a chicken-egg problem because the children must point to the same parent causing a constraint conflict.
Getting around the problem without visible side-effects in the rest of your system requires one of two abilities - neither of which is found in SQL Server.
1) Deferred constraint validation - This causes constraints to be validated at the end of the transaction. Normally they are validated at the end of each statement, which is the root of the chicken-egg problem: you cannot insert either the first child or the parent row for lack of the other. Deferring validation resolves it.
2) You can use a CTE to insert the first child where the CTE hangs off of the statement that inserts the parent (or vice versa). This inserts both rows in the same statement, causing an effect similar to deferred constraint validation.
3) Without either, you have no choice but to allow nulls in one of the references so you can insert that row without the dependency check. Then you must go back and update the null with the reference to the second row. If you use this technique, you need to be careful to make the rest of the system refer to the parent table through a view that hides all rows with null in the child reference column.
In any case your deletes of children are just as complicated because you cannot delete the child that proves at least one exists unless you update the parent first to point to a child that won't be deleted.
When you are about to delete the last child either you must throw an error or delete the parent at the same time. The error will occur automatically if you don't set the parent pointer to null first (or defer validation). If you do defer (or set the child pointer to null) your delete of the child will be possible and the parent can then be deleted as well.
I literally researched this for years and I watch every version of SQL Server for relief from this problem since it's so common.
PLEASE As soon as anyone has a practical solution please post!
P.S. You need to use either a compound key when referring to your proof-of-child row from the parent, or a trigger, to ensure that the child providing proof actually considers that row to be its parent.
P.P.S. Although it's true that null should never be visible to the rest of your system if you do both inserts and the update in the same transaction, this relies on behavior that could fail. The point of the constraint is to ensure that a logic failure won't leave your database in an invalid state. By protecting the table with a view that hides nulls, any illegal row will not be visible. Clearly your insert logic must account for the possibility that such a row can exist, but it needs inside knowledge anyway and nothing else needs to know.
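For what it's worth, workaround 3 together with the compound key from the P.S. can be sketched like this. It is run against SQLite via Python's sqlite3 purely so the example is self-contained; the names parent, child, first_child_id and valid_parent are hypothetical, and the same shape should carry over to SQL Server with ordinary (non-deferred) foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE parent (
    parent_id      INTEGER PRIMARY KEY,
    first_child_id INTEGER,  -- nullable only so the parent can be inserted first
    -- Compound reference: the proof-of-child row must name this parent.
    FOREIGN KEY (first_child_id, parent_id)
        REFERENCES child (child_id, parent_id)
);
CREATE TABLE child (
    child_id  INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent (parent_id),
    UNIQUE (child_id, parent_id)
);
-- The rest of the system reads this view, which hides incomplete parents.
CREATE VIEW valid_parent AS
    SELECT parent_id, first_child_id FROM parent
    WHERE first_child_id IS NOT NULL;
""")

# Normal path: insert parent, insert child, then fill in the reference.
conn.execute("BEGIN")
conn.execute("INSERT INTO parent (parent_id) VALUES (1)")
conn.execute("INSERT INTO child (child_id, parent_id) VALUES (10, 1)")
conn.execute("UPDATE parent SET first_child_id = 10 WHERE parent_id = 1")
conn.execute("COMMIT")

# A parent whose insert logic stopped half-way stays invisible through the view.
conn.execute("INSERT INTO parent (parent_id) VALUES (2)")

# Pointing parent 2 at a child that belongs to parent 1 is refused.
try:
    conn.execute("UPDATE parent SET first_child_id = 10 WHERE parent_id = 2")
    hijack_refused = False
except sqlite3.IntegrityError:
    hijack_refused = True

visible = [row[0] for row in conn.execute("SELECT parent_id FROM valid_parent")]
```

Deleting the proof-of-child row (10) fails for as long as parent 1 points at it, which is the delete complication described above.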

I am encountering this issue, and have a solution implemented in Oracle rel.11.2.4.
To ensure that every child has a parent, I applied a typical foreign-key constraint from the FK of the child to the PK of the parent. -- no wizardry here
To ensure that every parent has at least one child, I did as follows:
Create a function which accepts a parent PK, and returns a COUNT of children for that PK. -- I ensure that NO_DATA_FOUND exceptions return 0.
Create a virtual column CHILD_COUNT on the parent table and calculate it to the function result.
Create a deferrable CHECK constraint on the CHILD_COUNT virtual column with the criteria of CHILD_COUNT > 0
It works as follows:
If a parent row is inserted and no children exist yet, then when that row is committed the CHILD_COUNT > 0 CHECK constraint fails and the transaction rolls back.
If a parent row is inserted and a corresponding child row is inserted in the same transaction, then all integrity constraints are satisfied when a COMMIT is issued.
If a child row is inserted corresponding to an existing parent row, then the CHILD_COUNT virtual column is recalculated on COMMIT and no integrity violation occurs.
Any deletes of parent rows must cascade to the children otherwise orphaned child rows will violate the FOREIGN KEY constraint when the delete transaction is committed. (as usual)
Any deletes of child rows must leave at least one child for each parent; otherwise the CHILD_COUNT check constraint will be violated when the transaction commits.
NOTE: I would not need a virtual column if Oracle allowed user-function-based CHECK constraints at rel.11.2.4.

Related

Delete records from relational database with no cascade involved

I need to delete records from relational database, where I attempt to start from the lowest children in the database.
I'm not very strong on how to approach the task. I don't want to do CASCADE delete, I actually want to do the opposite of CASCADE.
Is it correct that I have to find the entity that does not have a child and start deleting the records there? And what if an entity has more than one foreign key: how do I decide which parent table to start deleting from?
You have to delete the child records first. If you try to delete a record that is referenced with a foreign key, you will get an error which should indicate which key has a conflict. You can then see which child table is impacted and delete the records that are referencing the foreign key, then try again.
You simply work your way up the chain. If more than one child record references a parent record, you simply delete all the child records first. If more than one parent record is referenced by a child record, it doesn't matter which parent is deleted first (or if they are deleted at all).
You don't give what database and what tools you have at hand.
You could manually diagram the database based on foreign keys, or you could use a tool such as Visual Studio to diagram your database.
As long as the multiple foreign relationships don't depend on one another it shouldn't matter where you start.
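If you would rather compute the order than diagram it, the "work your way up the chain" rule is a topological sort of the foreign key graph. Below is a small sketch that reads the foreign keys from a throwaway SQLite database and orders the tables with Python's graphlib; the four tables are invented for the example, and on another database you would read the same information from its own catalog views.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer   (id INTEGER PRIMARY KEY);
CREATE TABLE product    (id INTEGER PRIMARY KEY);
CREATE TABLE orders     (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer (id));
CREATE TABLE order_item (id INTEGER PRIMARY KEY,
                         order_id   INTEGER REFERENCES orders (id),
                         product_id INTEGER REFERENCES product (id));
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

sorter = TopologicalSorter()
for table in tables:
    sorter.add(table)  # register tables that have no foreign keys at all
    # Column 2 of PRAGMA foreign_key_list is the referenced (parent) table.
    for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
        parent_table = fk[2]
        if parent_table != table:            # ignore self-references
            sorter.add(parent_table, table)  # the child must be emptied first

# Children come out before the parents they reference.
delete_order = list(sorter.static_order())
```

static_order() raises graphlib.CycleError if tables reference each other in a loop; in that case no pure ordering exists and one of the references has to be nulled out or deferred first.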

What a BEFORE trigger can do that an AFTER trigger can't and conversely

I've read a lot about these types of triggers, and what I have understood so far is that they can do exactly the same things; the difference is more about "taste" than about what they are really able to do.
So what I'm asking is: is there something that a BEFORE trigger can't do compared to an AFTER trigger and conversely? Am I missing something?
The before trigger allows you to set action before any other action is done.
Just imagine a table with a foreign key column which is not nullable. You might insert this depending row before you insert the row which needs this FK to be set.
You could prohibit any action at all (no changes allowed to key tables...)
You could check for existence and do an update instead of an insert
and many more...
To expand a little on @Shnugo's answer: note that my comments are more specific to sql-server, but I believe the principles hold true for other RDBMSs that have these triggers.
What can BEFORE (or instead of) do that AFTER cannot
Say you have a BEFORE trigger on a blank table with an identity column that does nothing/no insert. To get a similar result in an AFTER trigger you would have to delete the inserted records.
Let's walk through inserting records with the different triggers. If BEFORE trigger is enabled and you do 100 inserts, but trigger doesn't actually insert them, then you disable the trigger and do an insert you will be at identity 1.
Do the same thing for an AFTER trigger and your next insert will be at 101, because the records were actually inserted and then deleted.
So a BEFORE trigger can stop an action completely, whereas an AFTER trigger has to try to undo an action to get a similar result in the data. Complex validation is one use. Another, more common one, as in Shnugo's example, is to insert a parent record before inserting a child of that parent so that a foreign key constraint error doesn't occur.
What can an AFTER trigger do that a BEFORE cannot
Use the identity column of an inserted row. In sql-server, the special table inserted in the same BEFORE trigger as above will return Identity = 0 instead of Identity = 1, whereas an AFTER trigger will have Identity = 1. So in the BEFORE you can avoid a foreign key constraint error by inserting a parent, and in the AFTER you can do the opposite: insert a child record with the proper foreign key.
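To make the two cases concrete, here is a small sketch in SQLite (driven from Python's sqlite3), which does have real BEFORE and AFTER row triggers. The tables parent, child and locked and both trigger names are invented for the example, and identity handling in SQL Server differs in its details.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE child  (id INTEGER PRIMARY KEY,
                     parent_id INTEGER NOT NULL REFERENCES parent (id));
CREATE TABLE locked (id INTEGER PRIMARY KEY);

-- BEFORE: veto the statement so the row is never written.
CREATE TRIGGER locked_no_insert BEFORE INSERT ON locked
BEGIN
    SELECT RAISE(ABORT, 'locked is read-only');
END;

-- AFTER: the generated key is known, so a dependent row can use it.
CREATE TRIGGER parent_default_child AFTER INSERT ON parent
BEGIN
    INSERT INTO child (parent_id) VALUES (NEW.id);
END;
""")

# The AFTER trigger creates a child that carries the new parent's key.
conn.execute("INSERT INTO parent (name) VALUES ('first')")
child_rows = conn.execute("SELECT parent_id FROM child").fetchall()

# The BEFORE trigger stops the insert outright.
try:
    conn.execute("INSERT INTO locked DEFAULT VALUES")
    insert_blocked = False
except sqlite3.DatabaseError:
    insert_blocked = True
locked_count = conn.execute("SELECT COUNT(*) FROM locked").fetchone()[0]
```

The vetoed insert leaves nothing behind to clean up, whereas the child row only makes sense once the parent's key exists.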

actual working of primary, foreign key and unique constraint , order/steps of their working

How do primary key, foreign key and unique constraints work? I mean, in what sequence?
For example, when a child table has a FK and a record is inserted into it whose key value doesn't exist in the parent table, is the record first inserted into the child table, after which the constraint checks whether the value exists in the parent table and, if it doesn't find it, rolls back and removes the record from the child table? Is this the order of working?
Or does SQL first take the FK value from the insert query, match it against the parent table records, and stop the insert when no matching record is found, so that the row is never inserted into the child table?
Similarly, for the primary key, if a duplicate record is inserted into a table, is it first inserted and then checked, or is it matched against existing records before insertion, with the query stopped if it is a duplicate?
Logically speaking, all constraints are supposed to be checked simultaneously against the entire result of an UPDATE, INSERT or DELETE statement. The constraints are evaluated as if the modification to all rows had already happened and if any constraint would be violated then the modification is not permitted.
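The observable effect is easy to check for yourself. In this sketch (SQLite via Python's sqlite3, with made-up parent and child tables) a single INSERT carries one valid row and one row that violates the foreign key; whatever order the engine works in internally, the statement fails as a whole and neither row remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child  (id INTEGER PRIMARY KEY,
                     parent_id INTEGER NOT NULL REFERENCES parent (id));
INSERT INTO parent (id) VALUES (1);
""")

try:
    # The first row is valid, the second references a missing parent.
    conn.execute("INSERT INTO child (parent_id) VALUES (1), (99)")
    statement_failed = False
except sqlite3.IntegrityError:
    statement_failed = True

# The valid row was not kept either: the statement is all-or-nothing.
child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```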
You need a basic RDBMS reference. Here is a free resource: http://msdn.microsoft.com/en-us/library/aa933098%28v=SQL.80%29.aspx
Consider the logical (conceptual) tables deleted and inserted that are accessible to a TRIGGER. Even these are only concepts. Who knows what's going on under the covers? ...well, someone is bound to know... but do you care what's going on under the covers? At the conceptual level, it either succeeds or fails or you can manipulate the outcome in a trigger. What more do you need to know? ;)

How to check that a record of master table is being used/referenced by child table

In my ERP application, I have an extra boolean field named 'IsRemoved'. Whenever a user deletes a record, the record is not physically deleted; only its "IsRemoved" column is set to 'true'. This lets us recover data whenever we want, and it's working fine.
The problem is that whenever a user deletes a record of a master table, how can we check that none of its child tables refer to this record (because we do not perform physical deletion, we just mark the "IsRemoved" field as true)?
Please provide a query or stored procedure with which I can check whether a master record is used in any of its child tables.
From experience I have to tell you this design is horrible to work with. Consider changing the design to copy the data to an 'audit trail' table then physically remove it from the main table.
If you won't consider this, at the very least bury this in a VIEW and do everything you can to avoid exposing this to anyone wanting to query the database, using INSTEAD OF triggers on the VIEW if necessary. Otherwise, expect applications to have frequent bugs because someone forgot to add the AND isremoved = 0 predicate required by every query that uses this table.
But this 'answer' doesn't address the real question.
Yes. Sorry 'bout that. But sometimes you have to cure the disease rather than merely treat the symptoms.
The design is compromised: a table should model a single entity type, whereas this is modelling two. How can I tell? Because the OP has stated that once 'removed' the entity has different data requirements, by saying "The problem is ... how can we check that all its child table not referring this record".
So the 'real' answer is: move the entity to another distinct table.
But if you are in the business of treating symptoms then here's an answer:
Add the IsRemoved column to your so-called 'child' tables, with DEFAULT false and ensure it is NOT NULL.
Add a CHECK constraint to each so-called 'child' table to test isremoved = false (or whatever 'boolean' means in your SQL product).
Add a compound key (e.g. using UNIQUE or PRIMARY KEY) to your so-called 'master' table on (IsRemoved, <existing key columns here>), or alter an existing key accordingly.
Add FOREIGN KEY constraints to each so-called 'child' table to reference the compound key created above.
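Those four steps can be sketched as follows. The example uses SQLite through Python's sqlite3 so that it runs anywhere, with 0/1 standing in for the boolean; master, child, is_removed and the soft_delete helper are illustrative names, not anything taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE master (
    id         INTEGER PRIMARY KEY,
    is_removed INTEGER NOT NULL DEFAULT 0 CHECK (is_removed IN (0, 1)),
    UNIQUE (is_removed, id)               -- compound key for the children
);
CREATE TABLE child (
    id         INTEGER PRIMARY KEY,
    master_id  INTEGER NOT NULL,
    is_removed INTEGER NOT NULL DEFAULT 0 CHECK (is_removed = 0),
    FOREIGN KEY (is_removed, master_id) REFERENCES master (is_removed, id)
);
INSERT INTO master (id) VALUES (1), (2);
INSERT INTO child (master_id) VALUES (1);   -- only master 1 is referenced
""")

def soft_delete(master_id):
    """Return True if the master row could be marked as removed."""
    try:
        conn.execute("UPDATE master SET is_removed = 1 WHERE id = ?",
                     (master_id,))
        return True
    except sqlite3.IntegrityError:
        return False  # a child still references (0, master_id)

referenced_removed = soft_delete(1)
unreferenced_removed = soft_delete(2)

# A removed master cannot gain new children either.
try:
    conn.execute("INSERT INTO child (master_id) VALUES (2)")
    new_child_allowed = True
except sqlite3.IntegrityError:
    new_child_allowed = False
```

The database itself now answers "is this master record still in use?": the UPDATE that marks it removed simply fails while any child row refers to it.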
I think the comments on the question itself are quite pertinent. The question, at this point, is vague.
But, assuming that the child table also has a IsRemoved field - after all what would be the point of the child records remaining available if the master record is marked as removed? - why don't you implement a trigger on Master that, if IsRemoved is changed, also changes the IsRemoved flag on the Child?
This way the need to check the status of the master on the child is completely eliminated as they will be in sync as it pertains to the active or inactive status.
Can you, within a transaction, insert a replacement record with all values the same other than IsRemoved='true', then remove the original record? This will generate a new primary key for the replacement record, so that old references cannot remain.
I assume you wish to detect the condition that child references a deleted record, and treat this as an error.
If proper foreign key constraints are defined on all the child tables, you can find the tables and columns linked to a master data table with the following query:
SELECT uc.table_name       AS MAIN_TABLE_NAME,
       ucc.column_name     AS MAIN_TABLE_COLUMN_NAME,
       ucc_ref.table_name  AS REFERENCED_TABLE_NAME,
       ucc_ref.column_name AS REFERENCED_COLUMN_NAME,
       ucc_ref.position
FROM   USER_CONSTRAINTS  uc,
       USER_CONS_COLUMNS ucc_ref,
       USER_CONS_COLUMNS ucc
WHERE  uc.constraint_type = 'R'                        -- 'R' = referential (foreign key)
AND    uc.r_constraint_name = ucc_ref.constraint_name  -- key being referenced
AND    ucc.constraint_name = uc.constraint_name
AND    ucc.table_name = uc.table_name
AND    ucc_ref.position = ucc.position                 -- pair up multi-column keys
AND    uc.table_name = ?
ORDER BY ucc_ref.table_name, ucc_ref.column_name
This query works in oracle. I am not sure about other databases.
Once you have found the referenced table, you need to find whether the master data record you are trying to delete exists in the referenced table or not. If an active record exists in the child table, you can throw an exception. The key thing here is that finding the referenced record alone is not sufficient. We might have cases where we find the master record reference in the child table, but the child record has also been cancelled. We have written a utility for the delete pre-check of master data using the above concept.
USING THIS LOGIC:
SELECT t1.*
FROM Table1 AS t1
WHERE NOT EXISTS
( SELECT *
FROM Table2 AS t2
WHERE t2.FKcolumn = t1.PKcolumn
AND t2.columnX IS NULL
)
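A pre-check along those lines might look like the sketch below, which treats a master record as removable only when no child row that is still active refers to it. It assumes the child table carries its own is_removed flag, and it uses SQLite via Python's sqlite3 with invented names (master, child, has_active_children) just to be runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master (id INTEGER PRIMARY KEY,
                     is_removed INTEGER NOT NULL DEFAULT 0);
CREATE TABLE child  (id INTEGER PRIMARY KEY,
                     master_id INTEGER NOT NULL REFERENCES master (id),
                     is_removed INTEGER NOT NULL DEFAULT 0);
INSERT INTO master (id) VALUES (1), (2), (3);
-- master 1 has an active child, master 2 only a removed one, master 3 none
INSERT INTO child (master_id, is_removed) VALUES (1, 0), (2, 1);
""")

def has_active_children(master_id):
    """True if at least one non-removed child row references the master row."""
    row = conn.execute(
        """SELECT EXISTS (SELECT 1 FROM child
                          WHERE child.master_id = ?
                            AND child.is_removed = 0)""",
        (master_id,)).fetchone()
    return bool(row[0])

master_ids = [row[0] for row in
              conn.execute("SELECT id FROM master ORDER BY id").fetchall()]
removable = [m for m in master_ids if not has_active_children(m)]
```

A check like this is only advisory unless it runs in the same transaction as the update that sets the flag, since another session could add a child in between.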

When to use "ON UPDATE CASCADE"

I use ON DELETE CASCADE regularly but I never use ON UPDATE CASCADE as I am not so sure in what situation it will be useful.
For the sake of discussion let's see some code.
CREATE TABLE parent (
    id INT NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (id)
);
CREATE TABLE child (
    id INT NOT NULL AUTO_INCREMENT,
    parent_id INT,
    PRIMARY KEY (id),  -- MySQL requires an AUTO_INCREMENT column to be a key
    INDEX par_ind (parent_id),
    FOREIGN KEY (parent_id)
        REFERENCES parent(id)
        ON DELETE CASCADE
);
For ON DELETE CASCADE, if a parent with an id is deleted, a record in child with parent_id = parent.id will be automatically deleted. This should be no problem.
1) Does this mean that ON UPDATE CASCADE will do the same thing when the id of the parent is updated?
2) If (1) is true, it means that there is no need to use ON UPDATE CASCADE if parent.id is not updatable (or will never be updated), like when it is AUTO_INCREMENT or always set to be TIMESTAMP. Is that right?
3) If (2) is not true, in what other kind of situation should we use ON UPDATE CASCADE?
4) What if I (for some reason) update the child.parent_id to be something not existing, will it then be automatically deleted?
Well, I know some of the questions above can be tested programmatically, but I also want to know whether any of this is database-vendor dependent.
Please shed some light.
It's true that if your primary key is just an identity value auto incremented, you would have no real use for ON UPDATE CASCADE.
However, let's say that your primary key is a 10 digit UPC bar code and because of expansion, you need to change it to a 13-digit UPC bar code. In that case, ON UPDATE CASCADE would allow you to change the primary key value and any tables that have foreign key references to the value will be changed accordingly.
In reference to #4, if you change the child ID to something that doesn't exist in the parent table (and you have referential integrity), you should get a foreign key error.
1) Yes, it means that, for example, if you do UPDATE parent SET id = 20 WHERE id = 10, all children with a parent_id of 10 will also be updated to 20.
2) If you don't update the field the foreign key refers to, this setting is not needed.
3) Can't think of any other use.
4) You can't do that, as the foreign key constraint would fail.
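Points 1 and 4 are quick to verify. The sketch below uses SQLite through Python's sqlite3 rather than MySQL, simply because it needs no server; the parent and child tables mirror the ones in the question, and the behaviour shown is what the answers above describe.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent (id)
        ON UPDATE CASCADE ON DELETE CASCADE
);
INSERT INTO parent (id) VALUES (10);
INSERT INTO child (id, parent_id) VALUES (1, 10), (2, 10);
""")

# Point 1: changing the parent key is propagated to the children.
conn.execute("UPDATE parent SET id = 20 WHERE id = 10")
after_update = conn.execute(
    "SELECT parent_id FROM child ORDER BY id").fetchall()

# Point 4: pointing a child at a missing parent is an error, not a delete.
try:
    conn.execute("UPDATE child SET parent_id = 999 WHERE id = 1")
    orphan_update_allowed = True
except sqlite3.IntegrityError:
    orphan_update_allowed = False
child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```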
I think you've pretty much nailed the points!
If you follow database design best practices and your primary key is never updatable (which I think should always be the case anyway), then you never really need the ON UPDATE CASCADE clause.
Zed made a good point: if you use a natural key (e.g. a regular field from your database table) as your primary key, then there might be certain situations where you need to update your primary keys. Another recent example would be the ISBN (International Standard Book Number), which changed from 10 to 13 digits+characters not too long ago.
This is not the case if you choose to use surrogate (e.g. artificially system-generated) keys as your primary key (which would be my preferred choice in all but the rarest occasions).
So in the end: if your primary key never changes, then you never need the ON UPDATE CASCADE clause.
Marc
A few days ago I had an issue with triggers, and I figured out that ON UPDATE CASCADE can be useful. Take a look at this example (PostgreSQL):
CREATE TABLE club
(
key SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
CREATE TABLE band
(
key SERIAL PRIMARY KEY,
name TEXT UNIQUE
);
CREATE TABLE concert
(
key SERIAL PRIMARY KEY,
club_name TEXT REFERENCES club(name) ON UPDATE CASCADE,
band_name TEXT REFERENCES band(name) ON UPDATE CASCADE,
concert_date DATE
);
In my issue, I had to define some additional operations (a trigger) for updating the concert table. Those operations had to modify club_name and band_name. I was unable to do it because of the references: I couldn't modify concert first and then deal with the club and band tables, and I couldn't do it the other way around either. ON UPDATE CASCADE was the key to solving the problem.
The ON UPDATE and ON DELETE specify which action will execute when a row in the parent table is updated and deleted. The following are permitted actions : NO ACTION, CASCADE, SET NULL, and SET DEFAULT.
Delete actions of rows in the parent table
If you delete one or more rows in the parent table, you can set one of the following actions:
ON DELETE NO ACTION: SQL Server raises an error and rolls back the delete action on the row in the parent table.
ON DELETE CASCADE: SQL Server deletes the rows in the child table that correspond to the row deleted from the parent table.
ON DELETE SET NULL: SQL Server sets the foreign key columns of the rows in the child table to NULL if the corresponding rows in the parent table are deleted. To execute this action, the foreign key columns must be nullable.
ON DELETE SET DEFAULT: SQL Server sets the foreign key columns of the rows in the child table to their default values if the corresponding rows in the parent table are deleted. To execute this action, the foreign key columns must have default definitions. Note that a nullable column has a default value of NULL if no default value is specified.
By default, SQL Server applies ON DELETE NO ACTION if you don’t explicitly specify any action.
Update action of rows in the parent table
If you update one or more rows in the parent table, you can set one of the following actions:
ON UPDATE NO ACTION: SQL Server raises an error and rolls back the update action on the row in the parent table.
ON UPDATE CASCADE: SQL Server updates the corresponding rows in the child table when the rows in the parent table are updated.
ON UPDATE SET NULL: SQL Server sets the rows in the child table to NULL when the corresponding row in the parent table is updated. Note that the foreign key columns must be nullable for this action to execute.
ON UPDATE SET DEFAULT: SQL Server sets the default values for the rows in the child table that have the corresponding rows in the parent table updated.
FOREIGN KEY (foreign_key_columns)
REFERENCES parent_table(parent_key_columns)
ON UPDATE <action>
ON DELETE <action>;
See the reference tutorial.
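As a quick illustration of how two of these actions differ in practice, here is a sketch that contrasts ON DELETE NO ACTION with ON DELETE SET NULL. It runs on SQLite via Python's sqlite3 instead of SQL Server so that it is self-contained; the table names strict_child and loose_child are invented, and the two engines agree on the behaviour shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE strict_child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent (id) ON DELETE NO ACTION
);
CREATE TABLE loose_child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent (id) ON DELETE SET NULL
);
INSERT INTO parent (id) VALUES (1), (2);
INSERT INTO strict_child (id, parent_id) VALUES (1, 1);
INSERT INTO loose_child  (id, parent_id) VALUES (1, 2);
""")

# NO ACTION: the delete of a referenced parent row is rejected.
try:
    conn.execute("DELETE FROM parent WHERE id = 1")
    strict_delete_allowed = True
except sqlite3.IntegrityError:
    strict_delete_allowed = False

# SET NULL: the delete succeeds and the child's reference is cleared.
conn.execute("DELETE FROM parent WHERE id = 2")
loose_parent_id = conn.execute(
    "SELECT parent_id FROM loose_child WHERE id = 1").fetchone()[0]
remaining_parents = [row[0] for row in conn.execute("SELECT id FROM parent")]
```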
It's an excellent question; I had the same question yesterday. I thought about this problem, specifically searched for whether something like "ON UPDATE CASCADE" existed, and fortunately the designers of SQL had also thought about that. I agree with Ted.strauss, and I also commented on Noran's case.
When did I use it? As Ted pointed out, when you are dealing with several databases at one time, and a modification in one table of one of them has to be reproduced in what Ted calls a "satellite database", the row sometimes cannot keep its original ID and you have to create a new one. This happens when you can't update the data under the old ID (for example due to permissions), or when you are after speed in a case so ephemeral that it doesn't deserve strict respect for the rules of normalization, simply because it will be a very short-lived utility.
So, I agree in two points:
(A.) Yes, in many times a better design can avoid it; BUT
(B.) In cases of migrations, replicating databases, or solving emergencies, it's a GREAT TOOL that fortunately was there when I went to search if it existed.
My comment is mainly in reference to point #3: under what circumstances is ON UPDATE CASCADE applicable if we're assuming that the parent key is not updateable? Here is one case.
I am dealing with a replication scenario in which multiple satellite databases need to be merged with a master. Each satellite is generating data on the same tables, so merging of the tables to the master leads to violations of the uniqueness constraint. I'm trying to use ON UPDATE CASCADE as part of a solution in which I re-increment the keys during each merge. ON UPDATE CASCADE should simplify this process by automating part of the process.
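Roughly, the re-keying step could look like the sketch below. It is only an outline under several assumptions: the satellite rows have already been copied into staging tables (sat_parent and sat_child, names invented here), keys are plain integers, and SQLite via Python's sqlite3 stands in for the real servers.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE child  (id INTEGER PRIMARY KEY,
                     parent_id INTEGER REFERENCES parent (id));
-- Hypothetical staging copies of one satellite's rows.
CREATE TABLE sat_parent (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sat_child  (id INTEGER PRIMARY KEY,
                         parent_id INTEGER REFERENCES sat_parent (id)
                             ON UPDATE CASCADE);
INSERT INTO parent VALUES (1, 'master-a'), (2, 'master-b');
INSERT INTO child  VALUES (1, 1), (2, 2);
INSERT INTO sat_parent VALUES (1, 'sat-a'), (2, 'sat-b');
INSERT INTO sat_child  VALUES (1, 2);
""")

# Shift past every key in use on either side so no update can collide.
offset = conn.execute("""
    SELECT MAX((SELECT COALESCE(MAX(id), 0) FROM parent),
               (SELECT COALESCE(MAX(id), 0) FROM sat_parent))
""").fetchone()[0]

conn.execute("BEGIN")
# ON UPDATE CASCADE rewrites sat_child.parent_id along with the parent keys.
conn.execute("UPDATE sat_parent SET id = id + ?", (offset,))
conn.execute("INSERT INTO parent (id, name) SELECT id, name FROM sat_parent")
# Child keys are regenerated by the master; only the parent link is kept.
conn.execute("INSERT INTO child (parent_id) SELECT parent_id FROM sat_child")
conn.execute("COMMIT")

merged = conn.execute("""
    SELECT parent.name FROM child
    JOIN parent ON parent.id = child.parent_id
    ORDER BY child.id
""").fetchall()
```

Without the cascade, the satellite's child rows would have to be re-pointed by hand before the parent keys could be changed.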
To add to other great answers here, it is important to use ON UPDATE CASCADE (or ON DELETE CASCADE...) cautiously. Operations on tables with this specification require an exclusive lock on the underlying relations.
If you have multiple CASCADE definitions in one table (as in another answer), and especially multiple tables using the same definitions with multiple users updating, this can create a deadlock: one process acquires an exclusive lock on the first underlying table, another acquires an exclusive lock on the second, and they block each other because neither can get both (all) exclusive locks needed to perform the operation.