Finding depth of a multi-parent hierarchy - SQL

I have a table with two columns: mother and node.
This table forms the base of a hierarchy. A mother refers to a node.
Every node can have multiple mothers, and every mother can have multiple children. This is accomplished with multiple rows.
If mother = NULL, the node is a top-level node. There can be several top-level nodes, and a node can be both a top-level node AND a child of another node.
e.g.:
INSERT INTO MYTABLE VALUES(NULL, 2)
INSERT INTO MYTABLE VALUES(1, 2)
I'm now building a procedure that needs to know the maximum depth of the hierarchy. Let's say node E is a child of node D, which is a child of node C.
Node C is a top-level node and also a child of node B, which is a child of node A.
Node A is only a top-level node.
If we say node A has depth = 0, then in this case the depth of node E should be 4.
Does anyone have a clue how I could build a statement that would find this depth for me?
It would have to find the maximum depth of every node in the table and then return the max value of those.
Thanks!
Using SQL Server 2008, by the way.
EDIT:
It is ONLY the absolute maximum depth of the table that is of interest, not the depth of individual nodes.

Try this, it will find all the bottom levels of the hierarchy
create table #mytable (id int, parent_id int)
INSERT INTO #MYTABLE VALUES(1, NULL)
INSERT INTO #MYTABLE VALUES(2, 1)
INSERT INTO #MYTABLE VALUES(3, 1) --*
INSERT INTO #MYTABLE VALUES(4, 2)
INSERT INTO #MYTABLE VALUES(5, 4) --*
;with a as
(
select
id,
parent_id,
1 lvl
from #mytable
where parent_id is null
union all
select
b.id,
b.parent_id,
lvl+1
from #mytable b
join a on a.id = b.parent_id
)
select
a.id,
a.parent_id,
a.lvl
from a
left join a b on a.id = b.parent_id
where b.id is null
option (maxrecursion 0)
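If only the overall maximum depth is needed, the same recursive walk can be aggregated directly. The following is a minimal sketch, not a tested production query: it assumes the question's two-column layout (mother, node), uses a hypothetical table variable @MyTable, and maps the nodes A to E from the question onto the ids 1 to 5. Because a node reachable by several routes appears once per route, MAX picks up the longest one. It also assumes the data contains no cycles; with a cycle the recursion would stop with an error at the default limit of 100 levels.

```sql
-- Hypothetical sample data: A=1, B=2, C=3, D=4, E=5.
-- C (3) is both a top-level node and a child of B (2).
DECLARE @MyTable TABLE (mother INT NULL, node INT NOT NULL);
INSERT INTO @MyTable (mother, node) VALUES
    (NULL, 1), (1, 2), (NULL, 3), (2, 3), (3, 4), (4, 5);

WITH walk AS (
    -- Anchor: every top-level row starts at depth 0.
    SELECT node, 0 AS depth
    FROM @MyTable
    WHERE mother IS NULL
    UNION ALL
    -- Recursive step: a child sits one level below the row that reached it.
    SELECT t.node, w.depth + 1
    FROM @MyTable AS t
    JOIN walk AS w ON t.mother = w.node
)
SELECT MAX(depth) AS MaxDepth
FROM walk;
```

With this sample data the longest route is 1 -> 2 -> 3 -> 4 -> 5, so the query returns 4, which matches the depth the question expects for node E.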

Ok. I found a solution.
First I created a temp table with two extra columns:
CREATE TABLE #ParentChildTempTable
(
Node NVARCHAR(50),
Parent NVARCHAR(50),
Depth TINYINT,
Measure_Depth BIT
)
I'm then setting the depth of true-top-level nodes to 0. False top-level nodes will have Measure_Depth set to 0.
UPDATE #ParentChildTempTable SET Depth = 0 WHERE Parent IS NULL AND node not in
(select Node from #ParentChildTempTable temp2 where temp2.Parent is not null)
UPDATE #ParentChildTempTable SET Measure_Depth = 0 WHERE Parent IS NULL AND node in
(select Node from #ParentChildTempTable temp2 where temp2.Parent is not null)
I then loop down from the top level until every node has a depth (except the false top-level nodes):
WHILE EXISTS (SELECT * FROM #ParentChildTempTable WHERE Depth IS NULL AND Measure_Depth IS NULL)
UPDATE T SET T.Depth = P.Depth + 1
FROM #ParentChildTempTable AS T
INNER JOIN #ParentChildTempTable AS P ON (T.Parent=P.Node)
WHERE P.Depth>=0
AND T.Depth IS NULL
And voila! Max depth is found:
DECLARE @MaxDepth INT = (SELECT MAX(Depth) FROM #ParentChildTempTable)

Related

SQL Hierarchy - Resolve full path for all ancestors of a given node

I have a hierarchy described by an adjacency list. There is not necessarily a single root element, but I do have data to identify the leaf (terminal) items in the hierarchy. So, a hierarchy that looked like this ...
1
- 2
- - 4
- - - 7
- 3
- - 5
- - 6
8
- 9
... would be described by a table, like this. NOTE: I don't have the ability to change this format.
id parentid isleaf
--- -------- ------
1 null 0
2 1 0
3 1 0
4 2 0
5 3 1
6 3 1
7 4 1
8 null 0
9 8 1
here is the sample table definition and data:
CREATE TABLE [dbo].[HiearchyTest](
[id] [int] NOT NULL,
[parentid] [int] NULL,
[isleaf] [bit] NOT NULL
)
GO
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (1, NULL, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (2, 1, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (3, 1, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (4, 2, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (5, 3, 1)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (6, 3, 1)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (7, 4, 1)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (8, NULL, 0)
INSERT [dbo].[HiearchyTest] ([id], [parentid], [isleaf]) VALUES (9, 8, 1)
GO
From this, I need to provide any id and get a list of all ancestors, including all descendants of each. So, if I provided the input of id = 6, I would expect the following:
id descendentid
-- ------------
1 1
1 3
1 6
3 3
3 6
6 6
id 6 just has itself
its parent, id 3, would have descendants of 3 and 6
its parent, id 1, would have descendants of 1, 3, and 6
I will be using this data to provide roll-up calculations at each level in the hierarchy. This works well, assuming I can get the dataset above.
I have accomplished this using two recursive CTEs: one to get the "terminal" item for each node in the hierarchy, then a second one where I get the full ancestry of my selected node (so 6 resolves to 6, 3, 1) to walk up and get the full set. I'm hoping that I'm missing something and that this can be accomplished in one round. Here is the example double-recursion code:
declare @test int = 6;
with cte as (
-- leaf nodes
select id, parentid, id as terminalid
from HiearchyTest
where isleaf = 1
union all
-- walk up - preserve "terminal" item for all levels
select h.id, h.parentid, c.terminalid
from HiearchyTest as h
inner join
cte as c on h.id = c.parentid
)
, cte2 as (
-- get all ancestors of our test value
select id, parentid, id as descendentid
from cte
where terminalid = @test
union all
-- and walkup from each to complete the set
select h.id, h.parentid, c.descendentid
from HiearchyTest h
inner join cte2 as c on h.id = c.parentid
)
-- final selection - order by is just for readability of this example
select id, descendentid
from cte2
order by id, descendentid
Additional detail: the "real" hierarchy will be much larger than the example. It can technically have infinite depth, but realistically it would rarely go more than 10 levels deep.
In summary, my question is if I can accomplish this with a single recursive cte instead of having to recurse over the hierarchy twice.
Because your data is a tree structure, we can use the hierarchyid data type to meet your needs (despite your saying that you can't in the comments). First, the easy part - generating the hierarchyid with a recursive cte
with cte as (
select id, parentid,
cast(concat('/', id, '/') as varchar(max)) as [path]
from [dbo].[HiearchyTest]
where ParentID is null
union all
select child.id, child.parentid,
cast(concat(parent.[path], child.id, '/') as varchar(max))
from [dbo].[HiearchyTest] as child
join cte as parent
on child.parentid = parent.id
)
select id, parentid, cast([path] as hierarchyid) as [path]
into h
from cte;
Next, a little table-valued function I wrote:
create function dbo.GetAllAncestors(@h hierarchyid, @ReturnSelf bit)
returns table
as return
select @h.GetAncestor(n.n) as h
from dbo.Numbers as n
where n.n <= @h.GetLevel()
or (@ReturnSelf = 1 and n.n = 0)
union all
select @h
where @ReturnSelf = 1;
Armed with that, getting your desired result set isn't too bad:
declare @h hierarchyid;
set @h = (
select path
from h
where id = 6
);
with cte as (
select *
from h
where [path].IsDescendantOf(@h) = 1
or @h.IsDescendantOf([path]) = 1
)
select h.id as parent, c.id as descendentid
from cte as c
cross apply dbo.GetAllAncestors([path], 1) as a
join h
on a.h = h.[path]
order by h.id, c.id;
Of course, you're missing out on a lot of the benefit of using a hierarchyid by not persisting it (you'll either have to keep it up to date in the side table or generate it every time). But there you go.
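The function above reads from dbo.Numbers, which this answer does not define. A minimal sketch of such a helper table follows, under two assumptions of mine: that 1000 rows are more than enough for the hierarchy depths involved, and that the numbers start at 1 rather than 0, so that the node itself comes only from the function's separate "select @h" branch.

```sql
-- Hypothetical helper table: consecutive integers 1..1000.
CREATE TABLE dbo.Numbers (n INT NOT NULL PRIMARY KEY);

-- Number the rows of a cross join of a system view, then keep the first 1000.
INSERT INTO dbo.Numbers (n)
SELECT x.n
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
) AS x
WHERE x.n <= 1000;
```

Any other way of producing a dense integer sequence would serve equally well; the cross join is only there to guarantee enough source rows.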
Okay, this has been bothering me since I read the question, and I just came back to think about it again. Why do you need to recurse back down to get all of the descendants? You have asked for ancestors, not descendants, and your result set is not trying to get other siblings, grandchildren, etc. It is getting a parent and a grandparent in this case. Your first CTE gives you everything you need to know except when an ancestor id is also the parentid. So with a union and a little magic to set up the originating ancestor, you have everything you need without a second recursion.
declare @test int = 6;
with cte as (
-- leaf nodes
select id, parentid, id as terminalid
from HiearchyTest
where isleaf = 1
union all
-- walk up - preserve "terminal" item for all levels
select h.id, h.parentid, c.terminalid
from HiearchyTest as h
inner join
cte as c on h.id = c.parentid
)
, cteAncestors AS (
SELECT DISTINCT
id = IIF(parentid IS NULL, @test, id)
,parentid = IIF(parentid IS NULL,id,parentid)
FROM
cte
WHERE
terminalid = @test
UNION
SELECT DISTINCT
id
,parentid = id
FROM
cte
WHERE
terminalid = @test
)
SELECT
id = parentid
,DecendentId = id
FROM
cteAncestors
ORDER BY
id
,DecendentId
The result set from your first CTE gives you your two ancestors and self related to their ancestor, except in the case of the originating ancestor, whose parentid is null. That null is a special case I will deal with in a minute.
Remember, at this point your query is producing ancestors, not descendants, but what it doesn't give you is self-references, meaning grandparent = grandparent, parent = parent, self = self. All you have to do to get those is add a row for every id with the parentid equal to the id; hence the union. Now your result set is almost totally shaped up.
The special case of the null parentid: the null parentid identifies the originating ancestor, meaning that ancestor has no other ancestor in your dataset. Here is how you use that to your advantage. Because you started your initial recursion at the leaf level, there is no direct tie from the id that you started with to the originating ancestor, but there is at every other level. Simply hijack that null parentid and flip the values around, and you now have an ancestor for your leaf.
Then, in the end, if you want it to be a descendants table, switch the columns and you are finished. One last note: the DISTINCTs are there in case the id is repeated with an additional parentid, e.g. 6 | 3 and another record for 6 | 4.
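Another way to reach the same result with one recursive CTE is to walk up from the selected id only, record how far each row is from the start, and then pair the chain with itself by level. This is a sketch under the assumption that every id has at most one parent row, as in the sample data; the names chain and lvl are introduced here for illustration.

```sql
DECLARE @test INT = 6;

WITH chain AS (
    -- Anchor: the selected node itself, at distance 0.
    SELECT id, parentid, 0 AS lvl
    FROM dbo.HiearchyTest
    WHERE id = @test
    UNION ALL
    -- Recursive step: move to the parent, one level further from the start.
    SELECT h.id, h.parentid, c.lvl + 1
    FROM dbo.HiearchyTest AS h
    JOIN chain AS c ON h.id = c.parentid
)
-- Each chain member is paired with itself and everything below it on the chain.
SELECT a.id, d.id AS descendentid
FROM chain AS a
JOIN chain AS d ON d.lvl <= a.lvl
ORDER BY a.id, d.id;
```

For id 6 the chain is 6, 3, 1, so the pairs are (1,1), (1,3), (1,6), (3,3), (3,6), (6,6), which is the result set the question asks for. Only the path to the selected node is visited, rather than every leaf in the table.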
I'm not sure if this performs better, or even produces the proper results in all cases, but you could capture a node list, then use xml functionality to parse it out and cross apply to the id list:
declare @test int = 6;
;WITH cte AS (SELECT id, parentid, CAST(id AS VARCHAR(MAX)) as IDlist
FROM HiearchyTest
WHERE isleaf = 1
UNION ALL
SELECT h.id, h.parentid , CAST(CONCAT(c.IDlist,',',h.id) AS VARCHAR(MAX))
FROM HiearchyTest as h
JOIN cte as c
ON h.id = c.parentid
)
,cte2 AS (SELECT *, CAST ('<M>' + REPLACE(IDlist, ',', '</M><M>') + '</M>' AS XML) AS Data
FROM cte
WHERE IDlist LIKE '%'+CAST(@test AS VARCHAR(50))+'%'
)
SELECT id,Split.a.value('.', 'VARCHAR(100)') AS descendentid
FROM cte2 a
CROSS APPLY Data.nodes ('/M') AS Split(a);

SQL Tree / Hierarchical Data

This is my first post. I am trying to make a SQL tree table that can be traversed. For example, if a person clicks on a drop-down list called Categories, it will display Electric and InterC. Then, if the user clicks on Electric, it will drop down Relays and Switches; next, if the person clicks on Relays it will drop down X Relays, and if the person clicks on Switches it will drop down Y Switches. I have attempted this below, but the part I don't understand is: if I have another category, InterC, how do I make that another level of drop-downs?
Table Category
create table test(id int,parentId int,name varchar(50))
insert test select 1, 0,'Electric'
insert test select 2, 1,'Relays'
insert test select 3, 1,'Switches'
insert test select 5, 2,'X Relays'
insert test select 6, 2,'Y Switches'
insert test select 7, 0,'InterC'
insert test select 8, 1,'x Sockets'
insert test select 9, 1,'y Sockets'
insert test select 10, 2,'X Relays'
insert test select 11, 2,'Y Relays'
;
WITH tree (id, parentid, level, name) as (
SELECT id, parentid, 0 as level, name
FROM test WHERE parentid = 0
UNION ALL
SELECT c2.id, c2.parentid, tree.level + 1, c2.name
FROM test c2
INNER JOIN tree ON tree.id = c2.parentid
)
SELECT *
FROM tree
order by parentid
Your hierarchical T-SQL query should return all the records in the table, both those under Electric and InterC.
However, you should make parentId nullable and have the root records have a null rather than 0. That will let you add a foreign key that protects your data integrity (it won't be possible to add orphaned records by mistake).
Your hierarchy query returns all of your records. I'm guessing that you want to return just one hierarchy at a time; for that, add a where condition to the anchor query.
WITH tree (id, parentid, level, name) as (
SELECT id, parentid, 0 as level, name
FROM test
WHERE name = @category AND
parentId is null
UNION ALL
SELECT c2.id, c2.parentid, tree.level + 1, c2.name
FROM test c2
INNER JOIN tree ON tree.id = c2.parentid
)
SELECT *
FROM tree
order by parentid
Then set @category to 'Electric' or 'InterC' to get one or the other hierarchy.
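The nullable parent with a foreign key suggested above could look roughly like the sketch below. The table name dbo.Category is hypothetical, and placing 'x Sockets' under InterC is my guess at the intended data, not something the question states. The last query shows one way to fill a single drop-down level: fetch only the direct children of the item that was clicked.

```sql
-- Hypothetical table: same columns as the question, but roots use NULL
-- and parentId must point at an existing row.
CREATE TABLE dbo.Category (
    id       INT         NOT NULL PRIMARY KEY,
    parentId INT         NULL REFERENCES dbo.Category (id),
    name     VARCHAR(50) NOT NULL
);

-- Roots first, then children, so every parentId already exists.
INSERT INTO dbo.Category (id, parentId, name) VALUES (1, NULL, 'Electric');
INSERT INTO dbo.Category (id, parentId, name) VALUES (7, NULL, 'InterC');
INSERT INTO dbo.Category (id, parentId, name) VALUES (2, 1, 'Relays');
INSERT INTO dbo.Category (id, parentId, name) VALUES (3, 1, 'Switches');
INSERT INTO dbo.Category (id, parentId, name) VALUES (8, 7, 'x Sockets');

-- This would be rejected by the foreign key, because no row has id 99:
-- INSERT INTO dbo.Category (id, parentId, name) VALUES (20, 99, 'Orphan');

-- One drop-down level: the direct children of the clicked category.
DECLARE @selected INT = 7;  -- id of InterC in this sample
SELECT id, name
FROM dbo.Category
WHERE parentId = @selected
ORDER BY name;
```

For the top-level drop-down, the same query with "WHERE parentId IS NULL" lists the root categories.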

To find infinite recursive loop in CTE

I'm not a SQL expert, so I'm hoping somebody can help me.
I use a recursive CTE to get the values as below.
Child1 --> Parent1
Parent1 --> Parent2
Parent2 --> NULL
If data population has gone wrong, I'll have something like the rows below, which can send the CTE into an infinite recursive loop and raise the max recursion error. Since the data is huge, I cannot check for this bad data manually. Please let me know if there is a way to find it.
Child1 --> Parent1
Parent1 --> Child1
or
Child1 --> Parent1
Parent1 --> Parent2
Parent2 --> Child1
With Postgres it's quite easy to prevent this by collecting all visited nodes in an array.
Setup:
create table hierarchy (id integer, parent_id integer);
insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3),
(5, 4),
(3, 5); -- endless loop
Recursive query:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;
To do this for multiple trees at the same time, you need to carry over the ID of the root node to the children:
with recursive tree as (
select id,
parent_id,
array[id] as all_parents,
id as root_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id,
p.all_parents||c.id,
p.root_id
from hierarchy c
join tree p
on c.parent_id = p.id
and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;
Update for Postgres 14
Postgres 14 introduced the (standard compliant) CYCLE option to detect cycles:
with recursive tree as (
select id,
parent_id
from hierarchy
where parent_id is null
union all
select c.id,
c.parent_id
from hierarchy c
join tree p
on c.parent_id = p.id
)
cycle id -- track cycles for this column
set is_cycle -- adds a boolean column is_cycle
using path -- adds a column that contains all parents for the id
select *
from tree
where not is_cycle
You haven't specified the dialect or your column names, so it is difficult to make the perfect example...
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 2, 'SubChild')
-- End random data
;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
UNION ALL
SELECT R.StartingID, R.Level + 1,
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.*
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description
FROM RecursiveCTE
ORDER BY StartingID, Level
Something like this will show if and where there are loops in the recursive CTE. Look at the column Loop. With the data as is, there are no loops. The comments give examples of how to change the values to cause a loop.
In the end, the recursive CTE builds a VARCHAR(MAX) of ids in the form |id1|id2|id3| (called Parents) and then checks whether the current ID is already in that "list". If it is, it sets the Loop column to 1. This column is checked in the recursive join (the AND R.Loop = 0).
The final query uses MAX() OVER (PARTITION BY ...) to set the Loop column to 1 for a whole "block" of chains.
A little more complex, that generates a "better" report:
-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
DROP TABLE #MyTable
CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2
(3, 3, 'SubChild')
-- End random data
-- The "terminal" children (that is, elements that don't have children
-- connected to them)
;WITH WithoutChildren AS
(
SELECT MT1.* FROM #MyTable MT1
WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)
, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
SELECT ID, -- StartingID
1, -- Level
'|' + CAST(ID AS VARCHAR(MAX)) + '|',
'|' + CAST(Description AS VARCHAR(MAX)) + '|',
0, -- Loop
ParentID
FROM WithoutChildren
UNION ALL
SELECT R.StartingID, -- StartingID
R.Level + 1, -- Level
R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|',
CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
MT.ParentID
FROM #MyTable MT
INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)
SELECT * FROM RecursiveCTE
WHERE ParentID IS NULL OR Loop = 1
This query should return all the "last child" rows, with the full parent chain. The column Loop is 0 if there is no loop, 1 if there is a loop.
Here's an alternate method for detecting cycles in adjacency lists (parent/child relationships) where nodes can only have one parent which can be enforced with a unique constraint on the child column (id in the table below). This works by computing the closure table for the adjacency list via a recursive query. It starts by adding every node to the closure table as its own ancestor at level 0 then iteratively walks the adjacency list to expand the closure table. Cycles are detected when a new record's child and ancestor are the same at any level other than the original level zero (0):
-- For PostgreSQL and MySQL 8 use the Recursive key word in the CTE code:
-- with RECURSIVE cte(ancestor, child, lev, cycle) as (
with cte(ancestor, child, lev, cycle) as (
select id, id, 0, 0 from Table1
union all
select cte.ancestor
, Table1.id
, case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end
, case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
from Table1
join cte
on cte.child = Table1.PARENT_ID
where cte.cycle = 0
) -- In oracle uncomment the next line
-- cycle child set isCycle to 'Y' default 'N'
select distinct
ancestor
, child
, lev
, max(cycle) over (partition by ancestor) cycle
from cte
Given the following adjacency list for Table1:
| parent_id | id |
|-----------|----|
| (null) | 1 |
| (null) | 2 |
| 1 | 3 |
| 3 | 4 |
| 1 | 5 |
| 2 | 6 |
| 6 | 7 |
| 7 | 8 |
| 9 | 10 |
| 10 | 11 |
| 11 | 9 |
The above query, which works on SQL Server (and on Oracle, PostgreSQL and MySQL 8 when modified as directed), correctly detects that nodes 9, 10, and 11 participate in a cycle of length 3.
SQL(/DB) Fiddles demonstrating this were provided for Oracle 11gR2, SQL Server 2017, PostgreSQL 9.5, and MySQL 8.
You can use the same approach described by Knuth for detecting a cycle in a linked list. In one column, keep track of the children, the children's children, the children's children's children, etc. In another column, keep track of the grandchildren, the grandchildren's grandchildren, the grandchildren's grandchildren's grandchildren, etc.
For the initial selection, the distance between Child and Grandchild columns is 1. Every selection from union all increases the depth of Child by 1, and that of Grandchild by 2. The distance between them increases by 1.
If you have any loop, since the distance only increases by 1 each time, at some point after Child is in the loop, the distance will be a multiple of the cycle length. When that happens, the Child and the Grandchild columns are the same. Use that as an additional condition to stop the recursion, and detect it in the rest of your code as an error.
SQL Server sample:
create table #LinkTable (Parent int, Child int);
insert into #LinkTable values (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1);
with cte as (
select lt1.Parent, lt1.Child, lt2.Child as Grandchild
from #LinkTable lt1
inner join #LinkTable lt2 on lt2.Parent = lt1.Child
union all
select cte.Parent, lt1.Child, lt3.Child as Grandchild
from cte
inner join #LinkTable lt1 on lt1.Parent = cte.Child
inner join #LinkTable lt2 on lt2.Parent = cte.Grandchild
inner join #LinkTable lt3 on lt3.Parent = lt2.Child
where cte.Child <> cte.Grandchild
)
select Parent, Child
from cte
where Child = Grandchild;
Remove one of the LinkTable records that causes the cycle, and you will find that the select no longer returns any data.
Try to limit the recursive result
WITH EMP_CTE AS
(
SELECT
0 AS [LEVEL],
ManagerId, EmployeeId, Name
FROM Employees
WHERE ManagerId IS NULL
UNION ALL
SELECT
c.[LEVEL] + 1 AS [LEVEL],
e.ManagerId, e.EmployeeId, e.Name
FROM Employees e
INNER JOIN EMP_CTE c ON e.ManagerId = c.EmployeeId
AND c.[LEVEL] < 100 --RECURSION LIMIT
)
SELECT * FROM EMP_CTE WHERE [Level] = 100
Here is the solution for SQL Server:
Table Insert script:
CREATE TABLE MyTable
(
[ID] INT,
[ParentID] INT,
[Name] NVARCHAR(255)
);
INSERT INTO MyTable
(
[ID],
[ParentID],
[Name]
)
VALUES
(1, NULL, 'A root'),
(2, NULL, 'Another root'),
(3, 1, 'Child of 1'),
(4, 3, 'Grandchild of 1'),
(5, 4, 'Great grandchild of 1'),
(6, 1, 'Child of 1'),
(7, 8, 'Child of 8'),
(8, 7, 'Child of 7'), -- This will cause infinite recursion
(9, 1, 'Child of 1');
Script to find the exact records which are the culprit:
;WITH RecursiveCTE
AS (
-- Get all parents:
-- Any record in MyTable table could be an Parent
-- We don't know here yet which record can involve in an infinite recursion.
SELECT ParentID AS StartID,
ID,
CAST(Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM MyTable
UNION ALL
-- Recursively try finding all the childrens of above parents
-- Keep on finding it until this child become parent of above parent.
-- This will bring us back in the circle to parent record which is being
-- keep in the StartID column in recursion
SELECT RecursiveCTE.StartID,
t.ID,
CAST(RecursiveCTE.[ParentChildRelationPath] + ' -> ' + t.Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
FROM RecursiveCTE
INNER JOIN MyTable AS t
ON t.ParentID = RecursiveCTE.ID
WHERE RecursiveCTE.StartID != RecursiveCTE.ID)
-- FInd the ones which causes the infinite recursion
SELECT StartID,
[ParentChildRelationPath],
RecursiveCTE.ID
FROM RecursiveCTE
WHERE StartID = ID
OPTION (MAXRECURSION 0);

SQL Server Recursion select

I have two tables:
entreprises(id, name)
entreprises_struct(id,entreprise_id, entreprise_child_id)
Let's say I have this data:
entreprises:
(1,canada)
(2,ontario)
(3,quebec)
(4,ottawa)
(5,toronto)
(6,montreal)
(7,laval)
entreprises_struct
(1,1,1)
(1,1,2)
(1,1,3)
(1,2,4)
(1,2,5)
(1,3,6)
(1,3,7)
I want a query that will sort the data in this way:
montreal (child level 3)
laval (child level 3)
quebec (child level 2 and parent of those children from level 3)
ottawa (child level 3)
toronto (child level 3)
ontario (child level 2 and parent of those children from level 3)
canada (child level 1 and parent of those children from level 2)
If I started from level 7, the select must list those values all the way up to level one.
I cannot use a CTE because the number of recursions is too limited.
You can use option(maxrecursion 0) to get around the CTE recursion limit. As for the sorting part of the query, see below.
Sample data
create table entreprises(id int, name varchar(max));
create table entreprises_struct(id int, entreprise_id int, entreprise_child_id int);
insert entreprises values
(1,'canada'),
(2,'ontario'),
(3,'quebec'),
(4,'ottawa'),
(5,'toronto'),
(6,'montreal'),
(7,'laval');
insert entreprises_struct values
(1,1,1),
(1,1,2),
(1,1,3),
(1,2,4),
(1,2,5),
(1,3,6),
(1,3,7);
The query
;with cte as (
select entreprise_id, level=0,
path=convert(varchar(max),entreprise_id) + '/'
from entreprises_struct
where entreprise_id =entreprise_child_id -- root
union all
select s.entreprise_child_id, cte.level+1,
path=cte.path + convert(varchar(max),s.entreprise_child_id) + '/'
from entreprises_struct s
inner join cte on s.entreprise_id = cte.entreprise_id
where s.entreprise_child_id != cte.entreprise_id
)
select e.name
from cte
inner join entreprises e on e.id = cte.entreprise_id
order by path desc
option (maxrecursion 0)
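On the recursion limit itself: by default SQL Server stops a recursive CTE after 100 levels and raises an error. The MAXRECURSION query hint accepts a value from 0 to 32767, where 0 removes the cap. A small sketch, using a made-up counter CTE named n purely to show the hint in isolation:

```sql
-- Counts 1..500, which needs 499 recursive steps: more than the default cap of 100.
WITH n AS (
    SELECT 1 AS i
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 500
)
SELECT MAX(i) AS last_i
FROM n
OPTION (MAXRECURSION 500);  -- 0 would remove the cap entirely
```

An explicit finite cap such as 500 is a reasonable middle ground when the data might contain a cycle, since a value of 0 lets a cyclic query run without end.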

Prevent recursive CTE visiting nodes multiple times

Consider the following simple DAG:
1->2->3->4
And a table, #bar, describing this (I'm using SQL Server 2005):
parent_id child_id
1 2
2 3
3 4
//... other edges, not connected to the subgraph above
Now imagine that I have some other arbitrary criteria that select the first and last edges, i.e. 1->2 and 3->4. I want to use these to find the rest of my graph.
I can write a recursive CTE as follows (I'm using terminology from MSDN):
with foo(parent_id,child_id) as (
-- anchor member that happens to select first and last edges:
select parent_id,child_id from #bar where parent_id in (1,3)
union all
-- recursive member:
select #bar.* from #bar
join foo on #bar.parent_id = foo.child_id
)
select parent_id,child_id from foo
However, this results in edge 3->4 being selected twice:
parent_id child_id
1 2
3 4
2 3
3 4 // 2nd appearance!
How can I prevent the query from recursing into subgraphs that have already been described? I could achieve this if, in my "recursive member" part of the query, I could reference all data that has been retrieved by the recursive CTE so far (and supply a predicate indicating in the recursive member excluding nodes already visited). However, I think I can access data that was returned by the last iteration of the recursive member only.
This will not scale well when there is a lot of such repetition. Is there a way of preventing this unnecessary additional recursion?
Note that I could use "select distinct" in the last line of my statement to achieve the desired results, but this seems to be applied after all the (repeated) recursion is done, so I don't think this is an ideal solution.
Edit - hainstech suggests stopping recursion by adding a predicate to exclude recursing down paths that were explicitly in the starting set, i.e. recurse only where foo.child_id not in (1,3). That works for the case above only because it is simple: all the repeated sections begin within the anchor set of nodes. It doesn't solve the general case where they may not. E.g., consider adding edges 1->4 and 4->5 to the above set. Edge 4->5 will be captured twice, even with the suggested predicate. :(
The CTE's are recursive.
When your CTE's have multiple initial conditions, that means they also have different recursion stacks, and there is no way to use information from one stack in another stack.
In your example, the recursion stacks will go as follows:
(1) - first IN condition
(1, 2)
(1, 2, 3)
(1, 2, 3, 4)
(1, 2, 3) - no more children
(1, 2) - no more children
(1) - no more children, going to second IN condition
(3) - second condition
(3, 4)
(3) - no more children, returning
As you can see, these recursion stacks do not intersect.
You could probably record the visited values in a temporary table, JOIN each value with the temp table, and not follow the value if it's found there, but SQL Server does not support these things.
So you just use SELECT DISTINCT.
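A minimal sketch of that SELECT DISTINCT fallback, using the three edges from the question in a temp table named #bar. The recursion still visits edge 3->4 twice; DISTINCT only removes the duplicate from the final result.

```sql
CREATE TABLE #bar (parent_id INT, child_id INT);
INSERT INTO #bar (parent_id, child_id) VALUES (1, 2);
INSERT INTO #bar (parent_id, child_id) VALUES (2, 3);
INSERT INTO #bar (parent_id, child_id) VALUES (3, 4);

WITH foo (parent_id, child_id) AS (
    -- Anchor member: the first and last edges.
    SELECT parent_id, child_id
    FROM #bar
    WHERE parent_id IN (1, 3)
    UNION ALL
    -- Recursive member: follow each edge to the edges that start at its child.
    SELECT b.parent_id, b.child_id
    FROM #bar AS b
    JOIN foo ON b.parent_id = foo.child_id
)
-- DISTINCT collapses the second copy of edge 3 -> 4.
SELECT DISTINCT parent_id, child_id
FROM foo;

DROP TABLE #bar;
```

This returns the three edges (1,2), (2,3), (3,4) once each.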
This is the approach I used. It has been tested against several methods and was the most performant. It combines the temp table idea suggested by Quassnoi and the use of both distinct and a left join to eliminate redundant paths to the recursion. The level of the recursion is also included.
I left the failed CTE approach in the code so you could compare results.
If someone has a better idea, I'd love to know it.
create table #bar (unique_id int identity(10,10), parent_id int, child_id int)
insert #bar (parent_id, child_id)
SELECT 1,2 UNION ALL
SELECT 2,3 UNION ALL
SELECT 3,4 UNION ALL
SELECT 2,5 UNION ALL
SELECT 2,5 UNION ALL
SELECT 5,6
SET NOCOUNT ON
;with foo(unique_id, parent_id,child_id, ord, lvl) as (
-- anchor member that happens to select first and last edges:
select unique_id, parent_id, child_id, row_number() over(order by unique_id), 0
from #bar where parent_id in (1,3)
union all
-- recursive member:
select b.unique_id, b.parent_id, b.child_id, row_number() over(order by b.unique_id), foo.lvl+1
from #bar b
join foo on b.parent_id = foo.child_id
)
select unique_id, parent_id,child_id, ord, lvl from foo
/***********************************
Manual Recursion
***********************************/
Declare @lvl as int
Declare @rows as int
DECLARE @foo as Table(
unique_id int,
parent_id int,
child_id int,
ord int,
lvl int)
--Get anchor condition
INSERT @foo (unique_id, parent_id, child_id, ord, lvl)
select unique_id, parent_id, child_id, row_number() over(order by unique_id), 0
from #bar where parent_id in (1,3)
set @rows=@@ROWCOUNT
set @lvl=0
--Do recursion
WHILE @rows > 0
BEGIN
set @lvl = @lvl + 1
INSERT @foo (unique_id, parent_id, child_id, ord, lvl)
SELECT DISTINCT b.unique_id, b.parent_id, b.child_id, row_number() over(order by b.unique_id), @lvl
FROM #bar b
inner join @foo f on b.parent_id = f.child_id
--might be multiple paths to this recursion so eliminate duplicates
left join @foo dup on dup.unique_id = b.unique_id
WHERE f.lvl = @lvl-1 and dup.child_id is null
set @rows=@@ROWCOUNT
END
SELECT * from @foo
DROP TABLE #bar
Do you happen to know which of the two edges is on a deeper level in the tree? Because in that case, you could make edge 3->4 the anchor member and start walking up the tree until you find edge 1->2.
Something like this:
with foo(parent_id, child_id)
as
(
select parent_id, child_id
from #bar
where parent_id = 3
union all
select b.parent_id, b.child_id
from #bar b
inner join foo f on b.child_id = f.parent_id
where b.parent_id <> 1
)
select *
from foo
(I'm no expert on graphs, just exploring a bit)
The DISTINCT will guarantee each row is distinct, but it won't eliminate graph routes that don't end up in your last edge. Take this graph:
insert into #bar (parent_id,child_id) values (1,2)
insert into #bar (parent_id,child_id) values (1,5)
insert into #bar (parent_id,child_id) values (2,3)
insert into #bar (parent_id,child_id) values (2,6)
insert into #bar (parent_id,child_id) values (6,4)
The results of the query here include (1,5), which is not part of the route from the first edge (1,2) to the last edge (6,4).
You could try something like this, to find only routes that start with (1,2) and end with (6,4):
with foo(parent_id, child_id, route) as (
select parent_id, child_id,
cast(cast(parent_id as varchar) +
cast(child_id as varchar) as varchar(128))
from #bar
union all
select #bar.parent_id, #bar.child_id,
cast(route + cast(#bar.child_id as varchar) as varchar(128))
from #bar
join foo on #bar.parent_id = foo.child_id
)
select * from foo where route like '12%64'
Is this what you want to do?
create table #bar (parent_id int, child_id int)
insert #bar values (1,2)
insert #bar values (2,3)
insert #bar values (3,4)
declare @start_node table (parent_id int)
insert @start_node values (1)
insert @start_node values (3)
;with foo(parent_id,child_id) as (
select
parent_id
,child_id
from #bar where parent_id in (select parent_id from @start_node)
union all
select
#bar.*
from #bar
join foo on #bar.parent_id = foo.child_id
where foo.child_id not in (select parent_id from @start_node)
)
select parent_id,child_id from foo
Edit - @bacar - I don't think this is the temp table solution Quassnoi was proposing. I believe they were suggesting basically duplicating the entire recursion member contents during each recursion and using that as a join to prevent reprocessing (and that this is not supported in SQL Server 2005). My approach is supported, and the only change to your original is in the predicate in the recursion member to exclude recursing down paths that were explicitly in your starting set. I only added the table variable so that you would define the starting parent_ids in one location; you could just as easily have used this predicate with your original query:
where foo.child_id not in (1,3)
EDIT -- This doesn't work at all. This is a method to stop chasing triangle routes. It doesn't do what the OP wanted.
Or you can use a recursive token separated string.
I'm at home on my laptop (no SQL Server), so this might not be completely right, but here goes:
; WITH NodeNetwork AS (
-- Anchor Definition
SELECT
b.[parent_Id] AS [Parent_ID]
, b.[child_Id] AS [Child_ID]
, CAST(b.[Parent_Id] AS VARCHAR(MAX)) AS [NodePath]
FROM
#bar AS b
-- Recursive Definition
UNION ALL SELECT
b.[Parent_Id]
, b.[child_Id]
, CAST(nn.[NodePath] + '-' + CAST(b.[Parent_Id] AS VARCHAR(MAX)) AS VARCHAR(MAX))
FROM
NodeNetwork AS nn
JOIN #bar AS b ON b.[Parent_Id] = nn.[Child_ID]
WHERE
nn.[NodePath] NOT LIKE '%[-]' + CAST(b.[Parent_Id] AS VARCHAR(MAX)) + '%'
)
SELECT * FROM NodeNetwork
Or similar. Sorry, it's late and I can't test it. I'll check on Monday morning. Credit for this must go to Peter Larsson (Peso).
The idea was generated here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=115290