Deleting hierarchical data in SQL table


I have a table with hierarchical data.
A column "ParentId" holds the Id ("ID" is the key column) of its parent.
When deleting a row, I want to delete all of its children (all levels of nesting).
How can I do it?
Thanks

On SQL Server: Use a recursive query. Given CREATE TABLE tmp(Id int, Parent int), use
WITH x(Id) AS (
SELECT @Id
UNION ALL
SELECT tmp.Id
FROM tmp
JOIN x ON tmp.Parent = x.Id
)
DELETE tmp
FROM x
JOIN tmp ON tmp.Id = x.Id
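To try the recursive query without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the T-SQL above: SQLite needs WITH RECURSIVE and has no DELETE ... FROM join syntax, so the delete uses IN instead. The helper names delete_subtree and recursive_delete_demo and the sample rows are assumptions made for illustration.

```python
import sqlite3


def delete_subtree(conn: sqlite3.Connection, root_id: int) -> int:
    """Delete root_id and all of its descendants; return how many rows were removed."""
    before = conn.total_changes
    conn.execute(
        """
        WITH RECURSIVE x(Id) AS (
            SELECT ?                       -- anchor: the row being deleted
            UNION ALL
            SELECT tmp.Id                  -- recursive step: children of rows found so far
            FROM tmp
            JOIN x ON tmp.Parent = x.Id
        )
        DELETE FROM tmp WHERE Id IN (SELECT Id FROM x)
        """,
        (root_id,),
    )
    return conn.total_changes - before


def recursive_delete_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tmp(Id int, Parent int)")
    # Hypothetical sample tree: 1 -> (2 -> (3, 4), 5), plus an unrelated root 6.
    conn.executemany(
        "INSERT INTO tmp VALUES (?, ?)",
        [(1, None), (2, 1), (3, 2), (4, 2), (5, 1), (6, None)],
    )
    removed = delete_subtree(conn, 2)
    remaining = [r[0] for r in conn.execute("SELECT Id FROM tmp ORDER BY Id")]
    return removed, remaining
```

Deleting node 2 in the sample tree should remove nodes 2, 3 and 4 and leave the rest untouched.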

Add a foreign key constraint. The following example works for MySQL (syntax reference):
ALTER TABLE yourTable
ADD CONSTRAINT makeUpAConstraintName
FOREIGN KEY (ParentID) REFERENCES yourTable (ID)
ON DELETE CASCADE;
This operates at the database level: the DBMS ensures that once a row is deleted, all rows referencing it are deleted too.
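The cascading behaviour can be observed in a small sketch. It uses SQLite through Python's sqlite3 module rather than MySQL, which is an assumption made purely so the example is self-contained. The constraint is declared inline instead of via ALTER TABLE, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. The function name cascade_demo and the sample rows are hypothetical.

```python
import sqlite3


def cascade_demo():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE yourTable (
            ID INTEGER PRIMARY KEY,
            ParentID INTEGER REFERENCES yourTable (ID) ON DELETE CASCADE
        )
        """
    )
    # Hypothetical chain 1 -> 2 -> 3, plus an unrelated root 4.
    conn.executemany(
        "INSERT INTO yourTable VALUES (?, ?)",
        [(1, None), (2, 1), (3, 2), (4, None)],
    )
    # Deleting the root cascades through every level of the chain.
    conn.execute("DELETE FROM yourTable WHERE ID = 1")
    return [r[0] for r in conn.execute("SELECT ID FROM yourTable ORDER BY ID")]
```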

When the number of rows is not too large, erikkallen's recursive approach works.
Here's an alternative that uses a temporary table to collect all children:
create table #nodes (id int primary key)
insert into #nodes (id) values (@delete_id)
while @@rowcount > 0
insert into #nodes
select distinct child.id
from table child
inner join #nodes parent on child.parentid = parent.id
where child.id not in (select id from #nodes)
delete
from table
where id in (select id from #nodes)
It starts with the row with @delete_id and descends from there. The where clause protects against cycles in the data; if you are sure there are none, you can leave it out.
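The same loop can be sketched with Python's sqlite3 module. SQLite has no @@rowcount, so this version compares conn.total_changes before and after each insert to decide when to stop. The table name items, the function names and the sample data (including a deliberate 5 <-> 6 cycle) are assumptions for illustration.

```python
import sqlite3


def delete_with_worklist(conn: sqlite3.Connection, delete_id: int) -> None:
    """Collect delete_id and its descendants in a temp table, then delete them."""
    conn.execute("CREATE TEMP TABLE nodes (id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO nodes (id) VALUES (?)", (delete_id,))
    while True:
        before = conn.total_changes
        conn.execute(
            """
            INSERT INTO nodes (id)
            SELECT DISTINCT child.id
            FROM items AS child
            JOIN nodes AS parent ON child.parentid = parent.id
            WHERE child.id NOT IN (SELECT id FROM nodes)  -- guards against cycles
            """
        )
        if conn.total_changes == before:  # no new rows found: the subtree is complete
            break
    conn.execute("DELETE FROM items WHERE id IN (SELECT id FROM nodes)")
    conn.execute("DROP TABLE nodes")


def worklist_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, parentid INTEGER)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(1, None), (2, 1), (3, 2), (4, None), (5, 6), (6, 5)],
    )
    delete_with_worklist(conn, 1)
    after_first = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]
    delete_with_worklist(conn, 5)  # rows 5 and 6 reference each other
    after_second = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]
    return after_first, after_second
```

The second call shows why the NOT IN filter matters: without it, the loop would keep re-adding rows 5 and 6 and would fail on the primary key or never terminate.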

It depends on how you store your hierarchy. If you only have ParentID, that may not be the most effective approach. For ease of subtree manipulation, you could add a column Parents that stores all parent IDs as a path, like:
/1/20/25/40
This way you can get all sub-nodes simply with:
where Parents like @NodeParents + '%'
Second approach
Instead of just ParentID you could also store left and right values (the nested set model). Inserts are slower this way, but select operations are extremely fast, especially when dealing with sub-tree nodes: http://en.wikipedia.org/wiki/Tree_traversal
Third approach
Check recursive CTEs if you use SQL Server 2005 or later.
Fourth approach
If you use SQL Server 2008, check the HierarchyID type. It offers enough functionality for your case.
http://msdn.microsoft.com/en-us/magazine/cc794278.aspx
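As a rough illustration of the first approach (the Parents path column), the sketch below uses Python's sqlite3 module. It assumes, beyond what the answer states, that each row's Parents value ends with the row's own ID, and it matches on the path followed by '/' so that /1/20 does not accidentally match /1/200. The table name tree, the function names and the sample rows are hypothetical.

```python
import sqlite3


def delete_by_path(conn: sqlite3.Connection, node_id: int) -> int:
    """Delete a node and every row whose path starts with the node's path."""
    row = conn.execute("SELECT Parents FROM tree WHERE ID = ?", (node_id,)).fetchone()
    if row is None:
        return 0
    path = row[0]
    before = conn.total_changes
    conn.execute(
        # The node itself, plus anything strictly below it in the path hierarchy.
        "DELETE FROM tree WHERE Parents = ? OR Parents LIKE ? || '/%'",
        (path, path),
    )
    return conn.total_changes - before


def path_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tree (ID INTEGER PRIMARY KEY, Parents TEXT)")
    conn.executemany(
        "INSERT INTO tree VALUES (?, ?)",
        [(1, "/1"), (20, "/1/20"), (25, "/1/20/25"), (200, "/1/200"), (2, "/2")],
    )
    removed = delete_by_path(conn, 20)
    remaining = [r[0] for r in conn.execute("SELECT ID FROM tree ORDER BY ID")]
    return removed, remaining
```

Node 200 survives even though its path shares the textual prefix /1/20, which is the reason for appending the separator before the wildcard.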

Add a trigger to the table like this:
create trigger TD_MyTable on myTable for delete as
-- Delete one level of children
delete M from deleted D inner join myTable M
on M.ParentID = D.ID
Each delete will call a delete on the same table, repeatedly calling the trigger. Check Books Online for additional rules. There may be a restriction on the number of times a trigger can nest.
ST
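A self-contained way to watch a trigger walk down the hierarchy is the following sketch in SQLite via Python's sqlite3 module. This is an analogue, not the SQL Server trigger above: SQLite triggers fire per row and expose the deleted row as OLD, and recursion has to be switched on with PRAGMA recursive_triggers. The function name trigger_demo and the sample rows are assumptions.

```python
import sqlite3


def trigger_demo():
    conn = sqlite3.connect(":memory:")
    # Without this pragma the trigger would not fire for rows deleted by the trigger itself.
    conn.execute("PRAGMA recursive_triggers = ON")
    conn.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, ParentID INTEGER)")
    conn.execute(
        """
        CREATE TRIGGER TD_MyTable AFTER DELETE ON myTable
        BEGIN
            -- Delete one level of children; each deleted child fires the trigger again.
            DELETE FROM myTable WHERE ParentID = OLD.ID;
        END
        """
    )
    # Hypothetical chain 1 -> 2 -> 3 -> 4, plus an unrelated root 5.
    conn.executemany(
        "INSERT INTO myTable VALUES (?, ?)",
        [(1, None), (2, 1), (3, 2), (4, 3), (5, None)],
    )
    conn.execute("DELETE FROM myTable WHERE ID = 1")
    return [r[0] for r in conn.execute("SELECT ID FROM myTable ORDER BY ID")]
```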

Depends on your database. If you are using Oracle, you could do something like this:
DELETE FROM Table WHERE ID IN (
SELECT ID FROM Table
START WITH ID = id_to_delete
CONNECT BY PRIOR ID = ParentID
)
ETA:
Without CONNECT BY, it gets a bit trickier. As others have suggested, a trigger or cascading delete constraint would probably be easiest.

Triggers can only be used for hierarchies 32 levels deep or less:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/05/11/defensive-database-programming-fun-with-triggers.aspx

What you want is referential integrity between these tables.

Related

Left join not being optimized

On a SQL Server database, consider a classical parent-child relation like the following:
create table Parent(
p_id uniqueidentifier primary key,
p_col1 int,
p_col2 int
);
create table Child(
c_id uniqueidentifier primary key,
c_p uniqueidentifier foreign key references Parent(p_id)
);
declare @Id int
set @Id = 1
while @Id <= 10000
begin
insert into Parent(p_id, p_col1, p_col2) values (NEWID(), @Id, @Id);
set @Id=@Id+1;
end
insert into Child(c_id, c_p) select NEWID(), p_id from Parent;
insert into Child(c_id, c_p) select NEWID(), p_id from Parent;
insert into Child(c_id, c_p) select NEWID(), p_id from Parent;
Now I have these two equivalent queries, one using inner and the other using left join:
Inner query:
select *
from Child c
inner join Parent p
on p.p_id=c.c_p
where p.p_col1=1 or p.p_col2=2;
Left Join query:
select *
from Child c
left join Parent p
on p.p_id=c.c_p
where p.p_col1=1 or p.p_col2=2;
I thought that the SQL optimizer would be smart enough to figure out the same execution plan for these two queries, but that's not the case.
The plan for the inner query is this: [execution plan image]
The plan for the left join query is this: [execution plan image]
The optimizer works nicely, choosing the same plan, if I have only one condition like:
where p.p_col1=1
But if I add an "or" on a second, different column, then it no longer chooses the best plan:
where p.p_col1=1 or p.p_col2=2;
Am I missing something, or is it just the optimizer that is missing this improvement?

Clearly, it is the optimizer.
When you have one condition in the WHERE clause (and "condition" could be several conditions connected with ANDs, but not ORs), the optimizer can easily peek and say "yes, the condition references columns from the second table and is not a NULL comparison, so this is really an inner join".
That logic gets harder when the conditions are connected by OR. I think you have observed that the optimizer does not do this for more complex conditions.
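The reason the two queries are logically equivalent can be checked on a tiny data set. For a Child row with no matching Parent, the left join yields NULLs for p_col1 and p_col2, so "p_col1 = 1 OR p_col2 = 2" evaluates to unknown and the row is filtered out anyway. The sketch below uses SQLite through Python's sqlite3 module with integer keys instead of uniqueidentifier. It only demonstrates result equivalence and says nothing about the plans SQL Server picks. The function name and the sample rows are assumptions.

```python
import sqlite3


def join_results():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Parent(p_id INTEGER PRIMARY KEY, p_col1 INT, p_col2 INT);
        CREATE TABLE Child(c_id INTEGER PRIMARY KEY, c_p INT REFERENCES Parent(p_id));
        INSERT INTO Parent VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3);
        -- Child 13 has no parent, so a left join pads it with NULLs.
        INSERT INTO Child VALUES (10, 1), (11, 2), (12, 3), (13, NULL);
        """
    )
    template = """
        SELECT c.c_id, p.p_id
        FROM Child c
        {join} JOIN Parent p ON p.p_id = c.c_p
        WHERE p.p_col1 = 1 OR p.p_col2 = 2
        ORDER BY c.c_id
    """
    inner = conn.execute(template.format(join="INNER")).fetchall()
    left = conn.execute(template.format(join="LEFT")).fetchall()
    return inner, left
```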

Sometimes if you change the order of the conditions the generated plans are different. The optimizer won't check all possible implementation scenarios (unfortunately). That is why sometimes you have to use hints for optimization.

How to auto increment a value in one table when inserted a row in another table

I currently have two tables:
Table 1 has a unique ID and a count.
Table 2 has some data columns and one column where the value of the unique ID of Table 1 is inside.
When I insert a row of data in Table 2, the count for the row with the referenced unique ID in Table 1 should be incremented.
Hope I made myself clear. I am very new to PostgreSQL and SQL in general, so I would appreciate any help how to do that. =)

You could achieve that with triggers.
Be sure to cover all kinds of write access appropriately if you do. INSERT, UPDATE, DELETE.
Also be aware that TRUNCATE on Table 2 or manual edits in Table 1 could break data integrity.
I suggest you consider a VIEW instead to return aggregated results that are automatically up to date. Like:
CREATE VIEW tbl1_plus_ct AS
SELECT t1.*, t2.ct
FROM tbl1 t1
LEFT JOIN (
SELECT tbl1_id, count(*) AS ct
FROM tbl2
GROUP BY 1
) t2 USING (tbl1_id)
If you use a LEFT JOIN, all rows of tbl1 are included, even if there is no reference in tbl2. With a regular JOIN, those rows would be omitted from the VIEW.
For all or much of the table, it is fastest to aggregate tbl2 first in a subquery, then join to tbl1, as demonstrated above.
Instead of creating a view, you could also just use the query directly, and if you only fetch a single row, or only few, this alternative form would perform better:
SELECT t1.*, count(t2.tbl1_id) AS ct
FROM tbl1 t1
LEFT JOIN tbl2 t2 USING (tbl1_id)
WHERE t1.tbl1_id = 123 -- for example
GROUP BY t1.tbl1_id -- being the primary key of tbl1!
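To see that a view stays up to date without any trigger, here is a small sketch. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL, purely so it is self-contained. The tbl2_id column, the COALESCE that turns a missing count into 0, the function name view_demo and the sample rows are my assumptions, not part of the answer above.

```python
import sqlite3


def view_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tbl1 (tbl1_id INTEGER PRIMARY KEY);
        CREATE TABLE tbl2 (
            tbl2_id INTEGER PRIMARY KEY,
            tbl1_id INTEGER REFERENCES tbl1 (tbl1_id)
        );
        CREATE VIEW tbl1_plus_ct AS
        SELECT t1.tbl1_id, COALESCE(t2.ct, 0) AS ct  -- 0 when nothing references the row
        FROM tbl1 t1
        LEFT JOIN (
            SELECT tbl1_id, count(*) AS ct
            FROM tbl2
            GROUP BY 1
        ) t2 USING (tbl1_id);
        INSERT INTO tbl1 VALUES (1), (2);
        INSERT INTO tbl2 (tbl1_id) VALUES (1), (1);
        """
    )
    query = "SELECT tbl1_id, ct FROM tbl1_plus_ct ORDER BY tbl1_id"
    first = conn.execute(query).fetchall()
    # A new row in tbl2 is reflected the next time the view is queried.
    conn.execute("INSERT INTO tbl2 (tbl1_id) VALUES (2)")
    second = conn.execute(query).fetchall()
    return first, second
```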

how to check if a key of a record is used in other tables as foreign key (sql)?

I have a table whose primary key field "ID" is used in many other tables as a foreign key.
How can I find out whether a record from this table (for example the first record, "ID = 1") is used in another table?
I don't want to select from all the other tables to find out, because there are so many tables and relations. I searched for a solution, but found no working ones, or I misunderstood them. Please help.

For a generic way, use this to get all the tables that have the foreign key; you can then loop over that list and check each table. This way you can add foreign keys later and no change in code will be needed:
SELECT
sys.sysobjects.name,
sys.foreign_keys.*
FROM
sys.foreign_keys
inner join sys.sysobjects on
sys.foreign_keys.parent_object_id = sys.sysobjects.id
WHERE
referenced_object_id = OBJECT_ID(N'[dbo].[TableName]')
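The "look up the foreign keys, then loop over the tables" idea can be sketched as follows. This is an analogue for SQLite via Python's sqlite3 module: it reads the metadata with PRAGMA foreign_key_list instead of SQL Server's sys.foreign_keys. The function names and the Customers/Orders/Invoices schema are hypothetical, and table names are assumed to come from a trusted source because they are interpolated into the SQL text.

```python
import sqlite3


def referencing_columns(conn: sqlite3.Connection, table: str):
    """Return (child_table, child_column, parent_column) for every FK pointing at table."""
    refs = []
    tables = [
        r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    for child in tables:
        # Row layout: id, seq, referenced table, from column, to column, ...
        for fk in conn.execute(f'PRAGMA foreign_key_list("{child}")'):
            if fk[2].lower() == table.lower():
                refs.append((child, fk[3], fk[4]))
    return refs


def is_referenced(conn: sqlite3.Connection, table: str, key_value) -> bool:
    """True if any referencing table contains key_value in its FK column."""
    for child, child_col, _parent_col in referencing_columns(conn, table):
        hit = conn.execute(
            f'SELECT 1 FROM "{child}" WHERE "{child_col}" = ? LIMIT 1', (key_value,)
        ).fetchone()
        if hit is not None:
            return True
    return False


def fk_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Customers (ID INTEGER PRIMARY KEY);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                             CustomerID INTEGER REFERENCES Customers (ID));
        CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY,
                               CustomerID INTEGER REFERENCES Customers (ID));
        INSERT INTO Customers VALUES (1), (2);
        INSERT INTO Orders (CustomerID) VALUES (1);
        """
    )
    return conn
```

Because the list of referencing tables is read from metadata at run time, adding a new foreign key later requires no change to this code.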

You need to join all other tables. Like this:
select *
from Parents
where
exists(select * from Children1 where ...)
or exists(select * from Children2 where ...)
or exists(select * from Children3 where ...)
If all your FK columns are indexed this will be extremely efficient. You will get nice merge joins.

Delete all records that have no foreign key constraints

I have a SQL 2005 table with millions of rows in it that is being hit by users all day and night. This table is referenced by 20 or so other tables that have foreign key constraints. What I am needing to do on a regular basis is delete all records from this table where the "Active" field is set to false AND there are no other records in any of the child tables that reference the parent record. What is the most efficient way of doing this short of trying to delete each one at a time and letting it cause SQL errors on the ones that violate constraints? Also it is not an option to disable the constraints and I cannot cause locks on the parent table for any significant amount of time.

If it's not likely that inactive rows which are not linked will become linked, you can run (or even dynamically build, based on the foreign key metadata):
SELECT k.*
FROM k WITH(NOLOCK)
WHERE k.Active = 0
AND NOT EXISTS (SELECT * FROM f_1 WITH(NOLOCK) WHERE f_1.fk = k.pk)
AND NOT EXISTS (SELECT * FROM f_2 WITH(NOLOCK) WHERE f_2.fk = k.pk)
...
AND NOT EXISTS (SELECT * FROM f_n WITH(NOLOCK) WHERE f_n.fk = k.pk)
And you can turn it into a DELETE pretty easily. But a large delete could hold a lot of locks, so you might want to put this in a table and then delete in batches - a batch shouldn't fail unless a record got linked.
For this to be efficient, you really need to have indexes on the FK columns in the related tables.
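Deleting in batches, as suggested above, might look roughly like this. The sketch uses SQLite via Python's sqlite3 module with only two referencing tables, and it selects each batch directly with LIMIT instead of first staging the keys in a separate table. NOLOCK has no SQLite equivalent and is omitted. The function names, the batch size and the sample rows are assumptions.

```python
import sqlite3


def purge_inactive(conn: sqlite3.Connection, batch_size: int = 2) -> int:
    """Delete inactive, unreferenced rows from k in small batches; return the total."""
    total = 0
    while True:
        before = conn.total_changes
        conn.execute(
            """
            DELETE FROM k WHERE pk IN (
                SELECT k.pk
                FROM k
                WHERE k.Active = 0
                  AND NOT EXISTS (SELECT 1 FROM f_1 WHERE f_1.fk = k.pk)
                  AND NOT EXISTS (SELECT 1 FROM f_2 WHERE f_2.fk = k.pk)
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # end the transaction after each batch to keep locks short
        deleted = conn.total_changes - before
        if deleted == 0:
            return total
        total += deleted


def purge_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE k (pk INTEGER PRIMARY KEY, Active INTEGER);
        CREATE TABLE f_1 (fk INTEGER REFERENCES k (pk));
        CREATE TABLE f_2 (fk INTEGER REFERENCES k (pk));
        INSERT INTO k VALUES (1, 0), (2, 0), (3, 1), (4, 0), (5, 0), (6, 0);
        INSERT INTO f_1 VALUES (2);
        INSERT INTO f_2 VALUES (5);
        """
    )
    deleted = purge_inactive(conn, batch_size=2)
    remaining = [r[0] for r in conn.execute("SELECT pk FROM k ORDER BY pk")]
    return deleted, remaining
```

Rows 2 and 5 are inactive but still referenced, and row 3 is active, so only rows 1, 4 and 6 are removed.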
You can also do this with left joins, but then you (sometimes) have to de-dupe with a DISTINCT or GROUP BY, the execution plan usually isn't any better, and it's not as conducive to code generation:
SELECT k.*
FROM k WITH(NOLOCK)
LEFT JOIN f_1 WITH(NOLOCK) ON f_1.fk = k.pk
LEFT JOIN f_2 WITH(NOLOCK) ON f_2.fk = k.pk
...
LEFT JOIN f_n WITH(NOLOCK) ON f_n.fk = k.pk
WHERE k.Active = 0
AND f_1.fk IS NULL
AND f_2.fk IS NULL
...
AND f_n.fk IS NULL

Suppose we have a parent table named Parent with an "id" field of any type and an "Active" field of type bit. We also have a second table, Child, with its own "id" field and an "fk" field that references the "id" field of the Parent table. Then you can use the following statement:
DELETE p
FROM Parent AS p LEFT OUTER JOIN Child AS c ON p.id=c.fk
WHERE c.id IS NULL AND p.Active=0

I'm slightly confused by your question, but you can do a left outer join from your main table to a table that should have a foreign key to it. You can then use a WHERE clause to check for null values in the joined table.
Check here for outer joins: http://en.wikipedia.org/wiki/Join_%28SQL%29#Left_outer_join
You should also write triggers to do all this for you when a record is deleted or set to false, etc.

SQL - Temp Table: Storing all columns in temp table versus only Primary key

I would need to create a temp table for paging purposes. I would be selecting all records into a temp table and then do further processing with it.
I am wondering which of the following is a better approach:
1) Select all the columns of my Primary Table into the Temp Table and then being able to select the rows I would need
OR
2) Select only the primary key of the Primary Table into the Temp Table and then joining with the Primary Table later on?
Is there any size consideration when working with approach 1 versus approach 2?
[EDIT]
I am asking because I would have done the first approach but looking at PROCEDURE [dbo].[aspnet_Membership_FindUsersByName], that was included with ASP.NET Membership, they are doing Approach 2
[EDIT2]
For people without access to the stored procedure:
-- Insert into our temp table
INSERT INTO #PageIndexForUsers (UserId)
SELECT u.UserId
FROM dbo.aspnet_Users u, dbo.aspnet_Membership m
WHERE u.ApplicationId = @ApplicationId AND m.UserId = u.UserId AND u.LoweredUserName LIKE LOWER(@UserNameToMatch)
ORDER BY u.UserName
SELECT u.UserName, m.Email, m.PasswordQuestion, m.Comment, m.IsApproved,
m.CreateDate,
m.LastLoginDate,
u.LastActivityDate,
m.LastPasswordChangedDate,
u.UserId, m.IsLockedOut,
m.LastLockoutDate
FROM dbo.aspnet_Membership m, dbo.aspnet_Users u, #PageIndexForUsers p
WHERE u.UserId = p.UserId AND u.UserId = m.UserId AND
p.IndexId >= @PageLowerBound AND p.IndexId <= @PageUpperBound
ORDER BY u.UserName

If you have a non-trivial number of rows (more than 100), then a table variable's performance is generally going to be worse than a temp table equivalent. But test it to make sure.
Option 2 would use less resources, because there is less data duplication.
Tony's points about this being a dirty read are really something you should be considering.

With approach 1, the data in the temp table may be out of step with the real data, i.e. if other sessions make changes to the real data. This may be OK if you are just viewing a snapshot of the data taken at a certain point, but would be dangerous if you were also updating the real table based on changes made to the temporary copy.

This is exactly the approach I use for paging on the server:
Create a table variable (why incur the overhead of transaction logging?) with just the key values. Create the table with an auto-numbered identity column as primary key; this will be RowNum.
Insert keys into the table based on the user's sort/filter criteria. The identity column is now a row number which can be used for paging.
Select from the table variable joined to the other tables with the real data required, joined on the key value,
Where RowNum Between ((PageNumber-1) * PageSize) + 1 And PageNumber * PageSize
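The steps above can be sketched with Python's sqlite3 module. SQLite has no table variables, so a temp table with an INTEGER PRIMARY KEY column stands in for the identity column here. The Users table, the function names and the sample names are assumptions for illustration.

```python
import sqlite3


def fetch_page(conn: sqlite3.Connection, page_number: int, page_size: int):
    """Return the user names on one page, using a key-only temp table for numbering."""
    # RowNum is assigned automatically (1, 2, 3, ...) in insertion order.
    conn.execute("CREATE TEMP TABLE page_keys (RowNum INTEGER PRIMARY KEY, UserId INTEGER)")
    conn.execute("INSERT INTO page_keys (UserId) SELECT UserId FROM Users ORDER BY UserName")
    rows = conn.execute(
        """
        SELECT u.UserName
        FROM page_keys p
        JOIN Users u ON u.UserId = p.UserId   -- fetch the real data only for the page
        WHERE p.RowNum BETWEEN ? AND ?
        ORDER BY p.RowNum
        """,
        ((page_number - 1) * page_size + 1, page_number * page_size),
    ).fetchall()
    conn.execute("DROP TABLE page_keys")
    return [r[0] for r in rows]


def paging_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, UserName TEXT)")
    conn.executemany(
        "INSERT INTO Users VALUES (?, ?)",
        [(1, "dave"), (2, "alice"), (3, "carol"), (4, "bob"),
         (5, "erin"), (6, "frank"), (7, "grace")],
    )
    return fetch_page(conn, 2, 3), fetch_page(conn, 3, 3)
```

Only the keys are copied into the temp table; the wider columns are read from the base table for just the rows on the requested page.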

Think about it this way. Suppose your query would return enough records to populate 1000 pages. How many users do you think would really look at all those pages? By returning only the ids, you aren't returning a lot of information you may or may not need to see. So it should save on network and server resources. And if they really do go through a lot of pages, it would take enough time that the data details might indeed need to be refreshed.

An alternative for paging (the way my company does it) is to use CTEs.
Check out this example from http://softscenario.blogspot.com/2007/11/sql-2005-server-side-paging-using-cte.html
CREATE PROC GetPagedEmployees (@NumbersOnPage INT=25,@PageNumb INT = 1)
AS BEGIN
WITH AllEmployees AS
(SELECT ROW_NUMBER() OVER (Order by [Person].[Contact].[LastName]) AS RowID,
[FirstName],[MiddleName],[LastName],[EmailAddress] FROM [Person].[Contact])
SELECT [FirstName],[MiddleName],[LastName],[EmailAddress]
FROM AllEmployees WHERE RowID BETWEEN
((@PageNumb - 1) * @NumbersOnPage) + 1 AND @PageNumb * @NumbersOnPage
ORDER BY RowID
END
