Delete parent if it's not referenced by any other child - sql

I have an example situation: parent table has a column named id, referenced in child table as a foreign key.
When deleting a child row, how to delete the parent as well if it's not referenced by any other child?

In PostgreSQL 9.1 or later you can do this with a single statement using a data-modifying CTE. This is generally less error-prone. It minimizes the time frame between the two DELETEs in which race conditions could lead to surprising results with concurrent operations:
WITH del_child AS (
DELETE FROM child
WHERE child_id = 1
RETURNING parent_id, child_id
)
DELETE FROM parent p
USING del_child x
WHERE p.parent_id = x.parent_id
AND NOT EXISTS (
SELECT 1
FROM child c
WHERE c.parent_id = x.parent_id
AND c.child_id <> x.child_id -- !
);
The child is deleted in any case. I quote the manual:
Data-modifying statements in WITH are executed exactly once, and
always to completion, independently of whether the primary query reads
all (or indeed any) of their output. Notice that this is different
from the rule for SELECT in WITH: as stated in the previous section,
execution of a SELECT is carried only as far as the primary query
demands its output.
The parent is only deleted if it has no other children.
Note the last condition. Contrary to what one might expect, this is necessary, since:
The sub-statements in WITH are executed concurrently with each other
and with the main query. Therefore, when using data-modifying
statements in WITH, the order in which the specified updates actually
happen is unpredictable. All the statements are executed with the same
snapshot (see Chapter 13), so they cannot "see" each others' effects
on the target tables.
Bold emphasis mine.
I used the column name parent_id in place of the non-descriptive id.
Eliminate race condition
To completely eliminate the race conditions mentioned above, lock the parent row first. Of course, all similar operations must follow the same procedure for this to work.
WITH lock_parent AS (
SELECT p.parent_id, c.child_id
FROM child c
JOIN parent p ON p.parent_id = c.parent_id
WHERE c.child_id = 12 -- provide child_id here once
FOR NO KEY UPDATE -- locks parent row.
)
, del_child AS (
DELETE FROM child c
USING lock_parent l
WHERE c.child_id = l.child_id
)
DELETE FROM parent p
USING lock_parent l
WHERE p.parent_id = l.parent_id
AND NOT EXISTS (
SELECT 1
FROM child c
WHERE c.parent_id = l.parent_id
AND c.child_id <> l.child_id -- !
);
This way only one transaction at a time can lock the same parent. So it cannot happen that multiple transactions delete children of the same parent, still see other children and spare the parent, while all of the children are gone afterwards. (Updates on non-key columns are still allowed with FOR NO KEY UPDATE.)
If such cases never occur or you can live with it (hardly ever) happening - the first query is cheaper. Else, this is the secure path.
FOR NO KEY UPDATE was introduced with Postgres 9.3. Details in the manual. In older versions use the stronger lock FOR UPDATE instead.

delete from child
where parent_id = 1
After deleting from the child, do the same in the parent:
delete from parent
where
id = 1
and not exists (
select 1 from child where parent_id = 1
)
The not exists condition makes sure the parent is only deleted if no row in the child still references it. You can wrap both delete commands in a transaction:
begin;
first_delete;
second_delete;
commit;
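As a rough, runnable sketch of the two-statement approach above, here is the same idea using Python's sqlite3 module. SQLite is my substitution, not the asker's database: it has no data-modifying CTEs and no row locks, so this only illustrates the "delete child, then delete parent if orphaned, all in one transaction" logic. The table layout and the helper name delete_child are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (parent_id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(parent_id)
    );
    INSERT INTO parent VALUES (1), (2);
    INSERT INTO child VALUES (10, 1), (11, 1), (20, 2);
""")

def delete_child(conn, child_id):
    """Hypothetical helper: delete one child, then its parent if orphaned."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT parent_id FROM child WHERE child_id = ?", (child_id,)
        ).fetchone()
        if row is None:
            return  # nothing to delete
        parent_id = row[0]
        conn.execute("DELETE FROM child WHERE child_id = ?", (child_id,))
        # The parent goes only if no remaining child references it.
        conn.execute(
            """DELETE FROM parent
               WHERE parent_id = ?
                 AND NOT EXISTS (SELECT 1 FROM child WHERE parent_id = ?)""",
            (parent_id, parent_id),
        )

delete_child(conn, 10)  # parent 1 still has child 11, so it stays
delete_child(conn, 20)  # parent 2 had only this child, so it goes too

remaining_parents = [r[0] for r in conn.execute(
    "SELECT parent_id FROM parent ORDER BY parent_id")]
remaining_children = [r[0] for r in conn.execute(
    "SELECT child_id FROM child ORDER BY child_id")]
print(remaining_parents, remaining_children)
```

On a server database with concurrent writers, the locking discussion in the first answer still applies; this sketch does not address it.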

Related

SQL get leaves of directed acyclic graph

I am quite new to SQL and have a rather basic question. Suppose I'm dealing with the following table structure:
CREATE TABLE nodes (
id INTEGER NOT NULL PRIMARY KEY,
parent INTEGER REFERENCES nodes(id)
);
If we hold an invariant that says, the parent of a node cannot be equivalent to any of its children, then by definition we will not have any loops in our graph. Now we are left with a disjoint directed acyclic graph.
The two questions I have then are:
If we cannot change the structure of the database: What select statement would I have to write to efficiently get all of the leaves in my database? I.e. the ids that don't have any children.
If we can change the structure of the tables: What could we change or add to make this select statement more efficient?
An example of output for the graph with five nodes whose parents were 3->2, 2->1, and 5->4 would be 3 and 5, because they are the only nodes that don't have children.
You can use NOT EXISTS and a correlated subquery that checks for a node where the current node is the parent. For leaves no such record can exist.
SELECT *
FROM nodes n1
WHERE NOT EXISTS (SELECT *
FROM nodes n2
WHERE n2.parent = n1.id);
Another option is a left join to the possible children of each node. If the id on the "children's side" of the join is null, no child exists for the current node, so it's a leaf.
SELECT *
FROM nodes n1
LEFT JOIN nodes n2
ON n2.parent = n1.id
WHERE n2.id IS NULL;
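To check both variants against the five-node example from the question, here is a small sketch using Python's sqlite3 module (SQLite is my choice for a self-contained test, not the asker's database). Nodes 1 and 4 are assumed to be roots with a NULL parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        id     INTEGER NOT NULL PRIMARY KEY,
        parent INTEGER REFERENCES nodes(id)
    );
    -- 3->2, 2->1 and 5->4; nodes 1 and 4 are roots (assumed NULL parent)
    INSERT INTO nodes (id, parent)
    VALUES (1, NULL), (2, 1), (3, 2), (4, NULL), (5, 4);
""")

# Variant 1: correlated NOT EXISTS subquery.
leaves_not_exists = [r[0] for r in conn.execute("""
    SELECT n1.id
    FROM nodes n1
    WHERE NOT EXISTS (SELECT 1 FROM nodes n2 WHERE n2.parent = n1.id)
    ORDER BY n1.id
""")]

# Variant 2: left join, keeping rows where no child matched.
leaves_left_join = [r[0] for r in conn.execute("""
    SELECT n1.id
    FROM nodes n1
    LEFT JOIN nodes n2 ON n2.parent = n1.id
    WHERE n2.id IS NULL
    ORDER BY n1.id
""")]

print(leaves_not_exists, leaves_left_join)  # [3, 5] [3, 5]
```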
And, leaving denormalization aside, I don't think there's much to change in the table's structure. Indexes could help though. One should be on id (but that's already the case because of the primary key constraint) and one on parent (but again such an index already exists because MySQL creates indexes for foreign key columns).
For more complex graph queries, you may use Common Table Expressions (CTEs), standardized in SQL:1999 and supported in MySQL since 8.0.1.
But as others pointed out, for the query you're interested in, a simple NOT EXISTS subquery or equivalent is enough. Yet another equivalent to those already posted would be using the EXCEPT set operation:
SELECT id FROM nodes
EXCEPT SELECT parent FROM nodes
I would do:
select *
from nodes
where id not in (select parent from nodes where parent is not null)
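The "where parent is not null" filter in the NOT IN version matters: root nodes have a NULL parent, and a NULL in the subquery result makes NOT IN evaluate to unknown for every row. A quick sketch with Python's sqlite3 module (my choice for a runnable check; the sample rows are the question's five-node example with assumed NULL parents for the roots) shows the difference, along with the EXCEPT variant:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        id     INTEGER NOT NULL PRIMARY KEY,
        parent INTEGER REFERENCES nodes(id)
    );
    INSERT INTO nodes (id, parent)
    VALUES (1, NULL), (2, 1), (3, 2), (4, NULL), (5, 4);
""")

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))

# Without the NULL filter, no row qualifies at all.
not_in_unfiltered = ids(
    "SELECT id FROM nodes WHERE id NOT IN (SELECT parent FROM nodes)")

# With the filter, the leaves come back as expected.
not_in_filtered = ids(
    "SELECT id FROM nodes "
    "WHERE id NOT IN (SELECT parent FROM nodes WHERE parent IS NOT NULL)")

# EXCEPT is not affected, because id itself is never NULL.
via_except = ids("SELECT id FROM nodes EXCEPT SELECT parent FROM nodes")

print(not_in_unfiltered, not_in_filtered, via_except)  # [] [3, 5] [3, 5]
```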

Could this SQL query be made more efficient?

I have a very large table of nodes (cardinality of about 600,000), each record in this table can have one or more types associated with it. There is a node_types table that contains these (30 or so) type definitions.
To connect the two, I have a third table called node_type_relations that simply links node ids to type ids.
I am trying to clean up orphaned node_type_relation entries after a cull of the node table. My query to delete any type relations for which the node no longer exists is;
DELETE FROM node_type_relations WHERE node_id NOT IN (SELECT id FROM nodes)
But judging by the speed at which this is running (one record being deleted per 10 seconds or so), it looks like Postgres is loading up the entire nodes table once for every record in the node_type_relations table (which is about 1.4million records in size).
I was about to dive in and write some code to do it more sensibly when I thought I'd ask here if the query could be turned inside-out somehow. Anything to avoid loading the nodes table more than once.
Thanks as always.
Edit with solution
Executing the query;
DELETE FROM node_type_relations WHERE NOT EXISTS (SELECT 1 FROM nodes WHERE nodes.id=node_type_relations.node_id)
appears to have had the desired effect and deleted all orphaned records (some 170,000) in a matter of seconds.
Maybe do a left join, and then delete where null.
So:
DELETE ntr
FROM node_type_relations ntr
LEFT JOIN nodes n
ON n.id = ntr.node_id
WHERE n.id IS NULL
@lynks found the optimal query for his case himself - with an EXISTS anti-join:
DELETE FROM node_type_relations ntr
WHERE NOT EXISTS (
SELECT 1
FROM nodes n
WHERE n.id = ntr.node_id
);
A solution with JOIN syntax would have to be constructed like this in PostgreSQL:
DELETE FROM node_type_relations d
USING node_type_relations ntr
LEFT JOIN nodes n ON n.id = ntr.node_id
WHERE ntr.node_id = d.node_id
AND n.id IS NULL;
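For a quick way to try the NOT EXISTS cleanup on toy data, here is a sketch using Python's sqlite3 module. SQLite stands in for PostgreSQL here, the sample rows are invented, and it says nothing about the timings reported in the question; it only checks that exactly the orphaned rows are removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY);
    CREATE TABLE node_type_relations (node_id INTEGER, type_id INTEGER);
    INSERT INTO nodes VALUES (1), (2);
    -- nodes 3 and 4 no longer exist, so their relations are orphans
    INSERT INTO node_type_relations
    VALUES (1, 10), (2, 10), (3, 10), (3, 11), (4, 12);
""")

with conn:
    cur = conn.execute("""
        DELETE FROM node_type_relations
        WHERE NOT EXISTS (
            SELECT 1
            FROM nodes n
            WHERE n.id = node_type_relations.node_id
        )
    """)
    deleted = cur.rowcount  # number of orphaned rows removed

remaining = [tuple(r) for r in conn.execute(
    "SELECT node_id, type_id FROM node_type_relations ORDER BY node_id")]
print(deleted, remaining)  # 3 [(1, 10), (2, 10)]
```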

SQLite3 and "cascade" SELECTion

I have a parent table and a child table related to the parent table by some REFERENCE.
Suppose I exec a SELECT statement on the child and that it returns at least one result. Can I arrange for my search to automatically yield all the content of all related parents of this child too?
Or must I always take the reference from the child and put this in a second SELECT statement and exec this myself?
You can use subqueries:
SELECT *
FROM Parent
WHERE Parent.Id IN (SELECT ParentId
FROM Child
WHERE Whatever_was_your_original_query)
Or a good old join:
SELECT Parent.*
FROM Parent INNER JOIN Child ON Parent.Id = Child.ParentId
WHERE Whatever_you_want_to_query
This is the very basic purpose of SQL. You will JOIN the two tables together to create one set of result rows with some or all columns from BOTH tables included.
For more info, see this page.
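One practical difference between the two forms: the join returns a parent once per matching child, while the IN subquery returns each parent at most once. A small sketch with Python's sqlite3 module illustrates this; the Name and Color columns and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parent (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Child (
        Id       INTEGER PRIMARY KEY,
        ParentId INTEGER REFERENCES Parent(Id),
        Color    TEXT
    );
    INSERT INTO Parent VALUES (1, 'a'), (2, 'b');
    INSERT INTO Child VALUES (1, 1, 'red'), (2, 1, 'red'), (3, 2, 'blue');
""")

# Subquery form: parent 1 appears once, although two children match.
via_in = conn.execute("""
    SELECT Id, Name
    FROM Parent
    WHERE Id IN (SELECT ParentId FROM Child WHERE Color = 'red')
    ORDER BY Id
""").fetchall()

# Join form: one output row per matching child.
via_join = conn.execute("""
    SELECT Parent.Id, Parent.Name
    FROM Parent INNER JOIN Child ON Parent.Id = Child.ParentId
    WHERE Child.Color = 'red'
    ORDER BY Parent.Id
""").fetchall()

print(via_in)    # [(1, 'a')]
print(via_join)  # [(1, 'a'), (1, 'a')]
```

If the join form is preferred and duplicates are unwanted, SELECT DISTINCT is one way to collapse them.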

Delete all records that have no foreign key constraints

I have a SQL 2005 table with millions of rows in it that is being hit by users all day and night. This table is referenced by 20 or so other tables that have foreign key constraints. What I am needing to do on a regular basis is delete all records from this table where the "Active" field is set to false AND there are no other records in any of the child tables that reference the parent record. What is the most efficient way of doing this short of trying to delete each one at a time and letting it cause SQL errors on the ones that violate constraints? Also it is not an option to disable the constraints and I cannot cause locks on the parent table for any significant amount of time.
If it's not likely that inactive rows which are not linked will become linked, you can run (or even dynamically build, based on the foreign key metadata):
SELECT k.*
FROM k WITH(NOLOCK)
WHERE k.Active = 0
AND NOT EXISTS (SELECT * FROM f_1 WITH(NOLOCK) WHERE f_1.fk = k.pk)
AND NOT EXISTS (SELECT * FROM f_2 WITH(NOLOCK) WHERE f_2.fk = k.pk)
...
AND NOT EXISTS (SELECT * FROM f_n WITH(NOLOCK) WHERE f_n.fk = k.pk)
And you can turn it into a DELETE pretty easily. But a large delete could hold a lot of locks, so you might want to put this in a table and then delete in batches - a batch shouldn't fail unless a record got linked.
For this to be efficient, you really need to have indexes on the FK columns in the related tables.
You can also do this with left joins, but then you (sometimes) have to de-dupe with a DISTINCT or GROUP BY and the execution plan isn't really usually any better and it's not as conducive to code-generation:
SELECT k.*
FROM k WITH(NOLOCK)
LEFT JOIN f_1 WITH(NOLOCK) ON f_1.fk = k.pk
LEFT JOIN f_2 WITH(NOLOCK) ON f_2.fk = k.pk
...
LEFT JOIN f_n WITH(NOLOCK) ON f_n.fk = k.pk
WHERE k.Active = 0
AND f_1.fk IS NULL
AND f_2.fk IS NULL
...
AND f_n.fk IS NULL
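The "collect candidates, then delete in batches" idea from the NOT EXISTS variant can be sketched as follows. This uses Python's sqlite3 module rather than SQL Server, so there are no NOLOCK hints, and the table names k, f_1, f_2 plus the tiny batch size are placeholders. The NOT EXISTS conditions are generated from a list of child tables and re-checked at delete time, so a row that became linked in the meantime is skipped rather than causing an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE k (pk INTEGER PRIMARY KEY, Active INTEGER NOT NULL);
    CREATE TABLE f_1 (id INTEGER PRIMARY KEY, fk INTEGER REFERENCES k(pk));
    CREATE TABLE f_2 (id INTEGER PRIMARY KEY, fk INTEGER REFERENCES k(pk));
    CREATE INDEX f_1_fk ON f_1(fk);  -- indexes on the FK columns keep
    CREATE INDEX f_2_fk ON f_2(fk);  -- the NOT EXISTS probes cheap
    INSERT INTO k VALUES (1, 0), (2, 0), (3, 1), (4, 0), (5, 0);
    INSERT INTO f_1 (fk) VALUES (2);
    INSERT INTO f_2 (fk) VALUES (3), (5);
""")

CHILD_TABLES = ["f_1", "f_2"]  # in practice, derived from FK metadata
BATCH_SIZE = 1                 # unrealistically small, for illustration

# One NOT EXISTS clause per referencing table.
not_referenced = " AND ".join(
    f"NOT EXISTS (SELECT 1 FROM {t} WHERE {t}.fk = k.pk)"
    for t in CHILD_TABLES
)

# Step 1: collect candidate keys once.
candidates = [r[0] for r in conn.execute(
    f"SELECT pk FROM k WHERE Active = 0 AND {not_referenced} ORDER BY pk")]

# Step 2: delete in small batches, one short transaction per batch.
deleted = 0
for start in range(0, len(candidates), BATCH_SIZE):
    batch = candidates[start:start + BATCH_SIZE]
    placeholders = ",".join("?" * len(batch))
    with conn:
        cur = conn.execute(
            f"DELETE FROM k WHERE pk IN ({placeholders}) "
            f"AND Active = 0 AND {not_referenced}",
            batch,
        )
        deleted += cur.rowcount

remaining = [r[0] for r in conn.execute("SELECT pk FROM k ORDER BY pk")]
print(candidates, deleted, remaining)  # [1, 4] 2 [2, 3, 5]
```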
Let's say we have a parent table named Parent with an "id" field of any type and an "Active" field of type bit. We also have a second table, Child, with its own "id" field and an "fk" field that references the "id" field of the Parent table. Then you can use the following statement:
DELETE p
FROM Parent AS p LEFT OUTER JOIN Child AS c ON p.id=c.fk
WHERE c.id IS NULL AND p.Active=0
I'm slightly confused by your question, but you can do a left outer join from your main table to a table that should have a foreign key to it. You can then use a WHERE clause to check for null values in the joined table.
Check here for outer joins : http://en.wikipedia.org/wiki/Join_%28SQL%29#Left_outer_join
You should also write up triggers to do all this for you when a record is deleted or set to false etc.

Deleting hierarchical data in SQL table

I have a table with hierarchical data.
A column "ParentId" holds the Id ("ID" - key column) of its parent.
When deleting a row, I want to delete all children (all levels of nesting).
How to do it?
Thanks
On SQL Server: Use a recursive query. Given CREATE TABLE tmp(Id int, Parent int), use
WITH x(Id) AS (
SELECT @Id
UNION ALL
SELECT tmp.Id
FROM tmp
JOIN x ON tmp.Parent = x.Id
)
DELETE tmp
FROM x
JOIN tmp ON tmp.Id = x.Id
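The same recursive idea can be tried out in SQLite through Python's sqlite3 module. This is a sketch under my own assumptions, not the SQL Server statement above: SQLite spells it WITH RECURSIVE and has no DELETE ... FROM ... JOIN, so the delete uses IN over the CTE instead. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp (Id INTEGER PRIMARY KEY, Parent INTEGER);
    -- two trees: 1 -> (2 -> 3, 4) and 5 -> 6
    INSERT INTO tmp VALUES
        (1, NULL), (2, 1), (3, 2), (4, 1), (5, NULL), (6, 5);
""")

root_id = 1  # the row to delete together with all its descendants

with conn:
    conn.execute("""
        WITH RECURSIVE x(Id) AS (
            SELECT ?                      -- start at the given row
            UNION ALL
            SELECT tmp.Id                 -- then walk down to the children
            FROM tmp
            JOIN x ON tmp.Parent = x.Id
        )
        DELETE FROM tmp WHERE Id IN (SELECT Id FROM x)
    """, (root_id,))

remaining = [r[0] for r in conn.execute("SELECT Id FROM tmp ORDER BY Id")]
print(remaining)  # [5, 6]
```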
Add a foreign key constraint. The following example works for MySQL (syntax reference):
ALTER TABLE yourTable
ADD CONSTRAINT makeUpAConstraintName
FOREIGN KEY (ParentID) REFERENCES yourTable (ID)
ON DELETE CASCADE;
This will operate on the database level, the dbms will ensure that once a row is deleted, all referencing rows will be deleted, too.
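A self-referencing ON DELETE CASCADE can be tried out quickly with Python's sqlite3 module. This is a sketch in SQLite rather than MySQL, so the constraint is declared inline in CREATE TABLE instead of via ALTER TABLE, and foreign key enforcement has to be switched on per connection. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign keys unless this is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE yourTable (
        ID       INTEGER PRIMARY KEY,
        ParentID INTEGER REFERENCES yourTable (ID) ON DELETE CASCADE
    );
    -- two trees: 1 -> 2 -> 3 and 4 -> 5
    INSERT INTO yourTable VALUES
        (1, NULL), (2, 1), (3, 2), (4, NULL), (5, 4);
""")

with conn:
    # Deleting the root removes its children and grandchildren as well.
    conn.execute("DELETE FROM yourTable WHERE ID = ?", (1,))

remaining = [r[0] for r in conn.execute(
    "SELECT ID FROM yourTable ORDER BY ID")]
print(remaining)  # [4, 5]
```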
When the number of rows is not too large, erikkallen's recursive approach works.
Here's an alternative that uses a temporary table to collect all children:
create table #nodes (id int primary key)
insert into #nodes (id) values (@delete_id)
while @@rowcount > 0
insert into #nodes
select distinct child.id
from table child
inner join #nodes parent on child.parentid = parent.id
where child.id not in (select id from #nodes)
delete
from table
where id in (select id from #nodes)
It starts with the row with @delete_id and descends from there. The where clause protects against cycles; if you are sure there are none, you can leave it out.
It depends on how you store your hierarchy. If you only have ParentID, then it may not be the most effective approach. For ease of subtree manipulation you should have an additional column Parents that would store all parent IDs like:
/1/20/25/40
This way you'll be able to get all sub-nodes simply by:
where Parents like @NodeParents + '%'
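A small sketch of the path-column idea, using Python's sqlite3 module. The details are my assumptions rather than the answer's exact design: the column (called Path here) includes the node's own id and ends with a slash, so that the prefix for node 20 ('/1/20/') cannot accidentally match node 200 ('/1/200/'). SQLite concatenates with || instead of +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tree (Id INTEGER PRIMARY KEY, Path TEXT NOT NULL);
    INSERT INTO tree VALUES
        (1,   '/1/'),
        (20,  '/1/20/'),
        (25,  '/1/20/25/'),
        (40,  '/1/20/25/40/'),
        (200, '/1/200/');
""")

node_path = "/1/20/"  # path of the node whose subtree should go

# All rows whose path starts with the node's path: the node and its subtree.
subtree = [r[0] for r in conn.execute(
    "SELECT Id FROM tree WHERE Path LIKE ? || '%' ORDER BY Id",
    (node_path,))]

with conn:
    conn.execute("DELETE FROM tree WHERE Path LIKE ? || '%'", (node_path,))

remaining = [r[0] for r in conn.execute("SELECT Id FROM tree ORDER BY Id")]
print(subtree, remaining)  # [20, 25, 40] [1, 200]
```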
Second approach
Instead of just ParentID you could also have left and right values. Inserts doing it this way are slower, but select operations are extremely fast. Especially when dealing with sub-tree nodes... http://en.wikipedia.org/wiki/Tree_traversal
Third approach
Look into recursive CTEs if you use SQL Server 2005 or later.
Fourth approach
If you use SQL 2008, check HierarchyID type. It gives enough possibilities for your case.
http://msdn.microsoft.com/en-us/magazine/cc794278.aspx
Add a trigger to the table like this
create trigger TD_MyTable on myTable for delete as
-- Delete one level of children
delete M from deleted D inner join myTable M
on M.ParentID = D.ID
Each delete will call a delete on the same table, repeatedly calling the trigger. Check books online for additional rules. There may be a restriction to the number of times a trigger can nest.
Depends on your database. If you are using Oracle, you could do something like this:
DELETE FROM Table WHERE ID IN (
SELECT ID FROM Table
START WITH ID = id_to_delete
CONNECT BY PRIOR ID = ParentID
)
ETA:
Without CONNECT BY, it gets a bit trickier. As others have suggested, a trigger or cascading delete constraint would probably be easiest.
Triggers can only be used for hierarchies 32 levels deep or less:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/05/11/defensive-database-programming-fun-with-triggers.aspx
What you want is referential integrity between these tables.