Summary
In an Azure database (using SQL Server Management Studio 17, so T-SQL) I seek to concatenate multiple parent-child relationships of different lengths.
Base Table
My table is of this form:
ID parent
1 2
2 NULL
3 2
4 3
5 NULL
Feel free to use this code to generate and fill it:
DECLARE #t TABLE (
ID int,
parent int
)
INSERT #t VALUES
( 1, 2 ),
( 2, NULL ),
( 3, 2 ),
( 4, 3 ),
( 5, NULL )
Issue
How do I receive a table with the path concatenation as shown in the following table?
ID path parentcount
1 2->1 1
2 2 0
3 2->3 1
4 2->3->4 2
5 5 0
Detail
The real table has many more rows and the longest path should contain ~15 IDs. So it would be ideal to find a solution that is dynamic in the aspect of parent count definition.
Also: I do not necessarily need the column 'parentcount', so feel free to skip that in answers.
select ##version:
Microsoft SQL Azure (RTM) - 12.0.2000.8
You can use a recursive CTE for this:
with cte as (
select id, parent, convert(varchar(max), concat(id, '')) as path, 0 as parentcount
from #t t
union all
select cte.id, t.parent, convert(varchar(max), concat(t.id, '->', path)), parentcount + 1
from cte join
#t t
on cte.parent = t.id
)
select top (1) with ties *
from cte
order by row_number() over (partition by id order by parentcount desc);
Clearly Gordon nailed it with a recursive CTE, but here is another option using the HierarchyID data type.
Example
Declare #YourTable Table ([ID] int,[parent] int)
Insert Into #YourTable Values
(1,2)
,(2,NULL)
,(3,2)
,(4,3)
,(5,NULL)
;with cteP as (
Select ID
,Parent
,HierID = convert(hierarchyid,concat('/',ID,'/'))
From #YourTable
Where Parent is Null
Union All
Select ID = r.ID
,Parent = r.Parent
,HierID = convert(hierarchyid,concat(p.HierID.ToString(),r.ID,'/'))
From #YourTable r
Join cteP p on r.Parent = p.ID
)
Select ID
,Parent
,[Path] = HierID.GetDescendant ( null , null ).ToString()
,ParentCount = HierID.GetLevel() - 1
From cteP A
Order By A.HierID
Returns
I am having a difficult time with this one. I have seen a few examples on how to obtain all child records from a self referencing table given a parent and even how to get the parents of child records.
What I am trying to do is return a record and all child records given the ID.
To put this into context - I have a corporate hierarchy. Where:
#Role Level#
--------------------
Corporate 0
Region 1
District 2
Rep 3
What I need is a procedure that (1) figures out what level the record is and (2) retrieves that record and all children records.
The idea being a Region can see all districts and reps in a district, Districts can see their reps. Reps can only see themselves.
I have table:
ID ParentId Name
-------------------------------------------------------
1 Null Corporate HQ
2 1 South Region
3 1 North Region
4 1 East Region
5 1 West Region
6 3 Chicago District
7 3 Milwaukee District
8 3 Minneapolis District
9 6 Gold Coast Dealer
10 6 Blue Island Dealer
How do I do this:
CREATE PROCEDURE GetPositions
#id int
AS
BEGIN
--What is the most efficient way to do this--
END
GO
For example the expected result for #id = 3, I would want to return:
3, 6, 7, 8, 9, 10
I'd appreciate any help or ideas on this.
You could do this via a recursive CTE:
DECLARE #id INT = 3;
WITH rCTE AS(
SELECT *, 0 AS Level FROM tbl WHERE Id = #id
UNION ALL
SELECT t.*, r.Level + 1 AS Level
FROM tbl t
INNER JOIN rCTE r
ON t.ParentId = r.ID
)
SELECT * FROM rCTE OPTION(MAXRECURSION 0);
ONLINE DEMO
Assuming that you're on a reasonably modern version of SQL Server, you can use the hierarchyid datatype with a little bit of elbow grease. First, the setup:
alter table [dbo].[yourTable] add [path] hierarchyid null;
Next, we'll populate the new column:
with cte as (
select *, cast(concat('/', ID, '/') as varchar(max)) as [path]
from [dbo].[yourTable]
where [ParentID] is null
union all
select child.*,
cast(concat(parent.path, child.ID, '/') as varchar(max)) as [path]
from [dbo].[yourTable] as child
join cte as parent
on child.ParentID = parent.ID
)
update t
set path = c.path
from [dbo].[yourTable] as t
join cte as c
on t.ID = c.ID;
This is just a bog standard recursive table expression with one calculated column that represents the hierarchy. That's the hard part. Now, your procedure can look something like this:
create procedure dbo.GetPositions ( #id int ) as
begin
declare #h hierarchyid
set #h = (select Path from [dbo].[yourTable] where ID = #id);
select ID, ParentID, Name
from [dbo].[yourTable]
where Path.IsDescendentOf(#h) = 1;
end
So, to wrap up, all you're doing with the hierarchyid is storing the lineage for a given row so that you don't have to calculate it on the fly at select time.
I'm not a SQL expert, but if anybody can help me.
I use a recursive CTE to get the values as below.
Child1 --> Parent 1
Parent1 --> Parent 2
Parent2 --> NULL
If data population has gone wrong, then I'll have something like below, because of which CTE may go to infinite recursive loop and gives max recursive error. Since the data is huge, I cannot check this bad data manually. Please let me know if there is a way to find it out.
Child1 --> Parent 1
Parent1 --> Child1
or
Child1 --> Parent 1
Parent1 --> Parent2
Parent2 --> Child1
With Postgres it's quite easy to prevent this by collecting all visited nodes in an array.
Setup:
create table hierarchy (id integer, parent_id integer);
insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3),
(5, 4),
(3, 5); -- endless loop
Recursive query:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;
To do this for multiple trees at the same time, you need to carry over the ID of the root node to the children:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents,
id as root_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id,
p.root_id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
and c.root_id = p.root_id
)
select *
from tree;
Update for Postgres 14
Postgres 14 introduced the (standard compliant) CYCLE option to detect cycles:
with recursive tree as (
select id,
parent_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id
from hierarchy c
join tree p
on c.parent_id = p.id
)
cycle id -- track cycles for this column
set is_cycle -- adds a boolean column is_cycle
using path -- adds a column that contains all parents for the id
select *
from tree
where not is_cycle
You haven't specified the dialect or your column names, so it is difficult to make the perfect example...
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 2, 'SubChild')
-- End random data
;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
UNION ALL
SELECT R.StartingID, R.Level + 1,
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.*
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description
FROM RecursiveCTE
ORDER BY StartingID, Level
Something like this will show if/where there are loops in the recursive cte. Look at the column Loop. With the data as is, there is no loops. In the comments there are examples on how to change the values to cause a loop.
In the end the recursive cte creates a VARCHAR(MAX) of ids in the form |id1|id2|id3| (called Parents) and then checks if the current ID is already in that "list". If yes, it sets the Loop column to 1. This column is checked in the recursive join (the ABD R.Loop = 0).
The ending query uses a MAX() OVER (PARTITION BY ...) to set to 1 the Loop column for a whole "block" of chains.
A little more complex, that generates a "better" report:
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 3, 'SubChild')
-- End random data
-- The "terminal" childrens (that are elements that don't have childrens
-- connected to them)
;WITH WithoutChildren AS
(
SELECT MT1.* FROM #MyTable MT1
WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)
, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
SELECT ID, -- StartingID
1, -- Level
'|' + CAST(ID AS VARCHAR(MAX)) + '|',
'|' + CAST(Description AS VARCHAR(MAX)) + '|',
0, -- Loop
ParentID
FROM WithoutChildren
UNION ALL
SELECT R.StartingID, -- StartingID
R.Level + 1, -- Level
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.ParentID
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT * FROM RecursiveCTE
WHERE ParentID IS NULL OR Loop = 1
This query should return all the "last child" rows, with the full parent chain. The column Loop is 0 if there is no loop, 1 if there is a loop.
Here's an alternate method for detecting cycles in adjacency lists (parent/child relationships) where nodes can only have one parent which can be enforced with a unique constraint on the child column (id in the table below). This works by computing the closure table for the adjacency list via a recursive query. It starts by adding every node to the closure table as its own ancestor at level 0 then iteratively walks the adjacency list to expand the closure table. Cycles are detected when a new record's child and ancestor are the same at any level other than the original level zero (0):
-- For PostgreSQL and MySQL 8 use the Recursive key word in the CTE code:
-- with RECURSIVE cte(ancestor, child, lev, cycle) as (
with cte(ancestor, child, lev, cycle) as (
select id, id, 0, 0 from Table1
union all
select cte.ancestor
, Table1.id
, case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end
, case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
from Table1
join cte
on cte.child = Table1.PARENT_ID
where cte.cycle = 0
) -- In oracle uncomment the next line
-- cycle child set isCycle to 'Y' default 'N'
select distinct
ancestor
, child
, lev
, max(cycle) over (partition by ancestor) cycle
from cte
Given the following adjacency list for Table1:
| parent_id | id |
|-----------|----|
| (null) | 1 |
| (null) | 2 |
| 1 | 3 |
| 3 | 4 |
| 1 | 5 |
| 2 | 6 |
| 6 | 7 |
| 7 | 8 |
| 9 | 10 |
| 10 | 11 |
| 11 | 9 |
The above query which works on SQL Sever (and Oracle, PostgreSQL and MySQL 8 when modified as directed) rightly detects that nodes 9, 10, and 11 participate in a cycle of length 3.
SQL(/DB) Fiddles demonstrating this in various DBs can be found below:
Oracle 11gR2
SQL Server 2017
PostgeSQL 9.5
MySQL 8
You can use the same approach described by Knuth for detecting a cycle in a linked list here. In one column, keep track of the children, the children's children, the children's children's children, etc. In another column, keep track of the grandchildren, the grandchildren's grandchildren, the grandchildren's grandchildren's grandchildren, etc.
For the initial selection, the distance between Child and Grandchild columns is 1. Every selection from union all increases the depth of Child by 1, and that of Grandchild by 2. The distance between them increases by 1.
If you have any loop, since the distance only increases by 1 each time, at some point after Child is in the loop, the distance will be a multiple of the cycle length. When that happens, the Child and the Grandchild columns are the same. Use that as an additional condition to stop the recursion, and detect it in the rest of your code as an error.
SQL Server sample:
declare #LinkTable table (Parent int, Child int);
insert into #LinkTable values (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1);
with cte as (
select lt1.Parent, lt1.Child, lt2.Child as Grandchild
from #LinkTable lt1
inner join #LinkTable lt2 on lt2.Parent = lt1.Child
union all
select cte.Parent, lt1.Child, lt3.Child as Grandchild
from cte
inner join #LinkTable lt1 on lt1.Parent = cte.Child
inner join #LinkTable lt2 on lt2.Parent = cte.Grandchild
inner join #LinkTable lt3 on lt3.Parent = lt2.Child
where cte.Child <> cte.Grandchild
)
select Parent, Child
from cte
where Child = Grandchild;
Remove one of the LinkTable records that causes the cycle, and you will find that the select no longer returns any data.
Try to limit the recursive result
WITH EMP_CTE AS
(
SELECT
0 AS [LEVEL],
ManagerId, EmployeeId, Name
FROM Employees
WHERE ManagerId IS NULL
UNION ALL
SELECT
[LEVEL] + 1 AS [LEVEL],
ManagerId, EmployeeId, Name
FROM Employees e
INNER JOIN EMP_CTE c ON e.ManagerId = c.EmployeeId
AND s.LEVEL < 100 --RECURSION LIMIT
)
SELECT * FROM EMP_CTE WHERE [Level] = 100
Here is the solution for SQL Server:
Table Insert script:
CREATE TABLE MyTable
(
[ID] INT,
[ParentID] INT,
[Name] NVARCHAR(255)
);
INSERT INTO MyTable
(
[ID],
[ParentID],
[Name]
)
VALUES
(1, NULL, 'A root'),
(2, NULL, 'Another root'),
(3, 1, 'Child of 1'),
(4, 3, 'Grandchild of 1'),
(5, 4, 'Great grandchild of 1'),
(6, 1, 'Child of 1'),
(7, 8, 'Child of 8'),
(8, 7, 'Child of 7'), -- This will cause infinite recursion
(9, 1, 'Child of 1');
Script to find the exact records which are the culprit:
;WITH RecursiveCTE
AS (
-- Get all parents:
-- Any record in MyTable table could be an Parent
-- We don't know here yet which record can involve in an infinite recursion.
SELECT ParentID AS StartID,
ID,
CAST(Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM MyTable
UNION ALL
-- Recursively try finding all the childrens of above parents
-- Keep on finding it until this child become parent of above parent.
-- This will bring us back in the circle to parent record which is being
-- keep in the StartID column in recursion
SELECT RecursiveCTE.StartID,
t.ID,
CAST(RecursiveCTE.[ParentChildRelationPath] + ' -> ' + t.Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM RecursiveCTE
INNER JOIN MyTable AS t
ON t.ParentID = RecursiveCTE.ID
WHERE RecursiveCTE.StartID != RecursiveCTE.ID)
-- FInd the ones which causes the infinite recursion
SELECT StartID,
[ParentChildRelationPath],
RecursiveCTE.ID
FROM RecursiveCTE
WHERE StartID = ID
OPTION (MAXRECURSION 0);
Output of above query:
I use the following query to retrieve the parent-child relationship data, from a table which is self referencing to the parent.
-- go down the hierarchy and get the childs
WITH ChildLocations(LocationId, FkParentLocationId, [Level])
AS
(
(
-- Start CTE off by selecting the home location of the user
SELECT l.LocationId, l.FkParentLocationId, 0 as [Level]
FROM Location l
WHERE l.LocationId = #locationId
)
UNION ALL
-- Recursively add locations that are children of records already found in previous iterations.
SELECT l2.LocationId, l2.FkParentLocationId, [Level] + 1
FROM ChildLocations tmp
INNER JOIN Location l2
ON l2.FkParentLocationId = tmp.LocationId
)
INSERT INTO #tmp
SELECT * from ChildLocations;
The table has the following fields:
LocationId, FkParentLocationId, FkLocationTypeId, etc...
This works fine, but how I want to retrieve it is as follows:
Parent 1
Child 1
Child 2
Child 21
Child 3
Child 31
Parent 2
Child 4
Child 5
Child 6
What is currently gives is like:
Parent 1
Parent 2
Child 1
Child 2
Child 3
Child 4
etc....
How can I modify the above to get it in the order I want.
What about to append an 'order' field? This may be an approach:
WITH ChildLocations(LocationId, FkParentLocationId, [Level])
AS
(
(
-- Start CTE off by selecting the home location of the user
SELECT l.LocationId, l.FkParentLocationId, 0 as [Level],
cast( str( l.locationId ) as varchar(max) ) as orderField
FROM Location l
WHERE l.LocationId = #locationId
)
UNION ALL
-- Recursively add locations that are children ...
SELECT l2.LocationId, l2.FkParentLocationId, [Level] + 1,
tmp.orderField + '-' +
str(tmp.locationId) as orderField
FROM ChildLocations tmp
INNER JOIN Location l2
ON l2.FkParentLocationId = tmp.LocationId
)
SELECT * from ChildLocations order by orderField;
Remember than Order by in an Insert is not allowed.
Take a look a sample
I have a tree in my database that is stored using parent id links.
A sample of what I have for data in the table is:
id | name | parent id
---+-------------+-----------
0 | root | NULL
1 | Node 1 | 0
2 | Node 2 | 0
3 | Node 1.1 | 1
4 | Node 1.1.1 | 3
5 | Node 1.1.2 | 3
Now I would like to get a list of all the direct descendants of a given node but if none exist I would like to have it just return the node itself.
I want the return for the query for children of id = 3 to be:
children
--------
4
5
Then the query for the children of id = 4 to be:
children
--------
4
I can change the way I am storing the tree to a nested set but I don't see how that would make the query I want possible.
In new PostgreSQL 8.4 you can do it with a CTE:
WITH RECURSIVE q AS
(
SELECT h, 1 AS level, ARRAY[id] AS breadcrumb
FROM t_hierarchy h
WHERE parent = 0
UNION ALL
SELECT hi, q.level + 1 AS level, breadcrumb || id
FROM q
JOIN t_hierarchy hi
ON hi.parent = (q.h).id
)
SELECT REPEAT(' ', level) || (q.h).id,
(q.h).parent,
(q.h).value,
level,
breadcrumb::VARCHAR AS path
FROM q
ORDER BY
breadcrumb
See this article in my blog for details:
PostgreSQL 8.4: preserving order for hierarchical query
In 8.3 or earlier, you'll have to write a function:
CREATE TYPE tp_hierarchy AS (node t_hierarchy, level INT);
CREATE OR REPLACE FUNCTION fn_hierarchy_connect_by(INT, INT)
RETURNS SETOF tp_hierarchy
AS
$$
SELECT CASE
WHEN node = 1 THEN
(t_hierarchy, $2)::tp_hierarchy
ELSE
fn_hierarchy_connect_by((q.t_hierarchy).id, $2 + 1)
END
FROM (
SELECT t_hierarchy, node
FROM (
SELECT 1 AS node
UNION ALL
SELECT 2
) nodes,
t_hierarchy
WHERE parent = $1
ORDER BY
id, node
) q;
$$
LANGUAGE 'sql';
and select from this function:
SELECT *
FROM fn_hierarchy_connect_by(4, 1)
The first parameter is the root id, the second should be 1.
See this article in my blog for more detail:
Hierarchical queries in PostgreSQL
Update:
To show only the first level children, or the node itself if the children do not exist, issue this query:
SELECT *
FROM t_hierarchy
WHERE parent = #start
UNION ALL
SELECT *
FROM t_hierarchy
WHERE id = #start
AND NOT EXISTS
(
SELECT NULL
FROM t_hierarchy
WHERE parent = #start
)
This is more efficient than a JOIN, since the second query will take but two index scans at most: the first one to make sure to find out if a child exists, the second one to select the parent row if no children exist.
Found a query that works the way I wanted.
SELECT * FROM
( SELECT id FROM t_tree WHERE name = '' ) AS i,
t_tree g
WHERE
( ( i.id = g.id ) AND
NOT EXISTS ( SELECT * FROM t_tree WHERE parentid = i.id ) ) OR
( ( i.id = g.parentid ) AND
EXISTS ( SELECT * FROM t_tree WHERE parentid = i.id ) )