Multiple parents tree (or digraph) implementation sql server 2005 - sql-server-2005

I need to implement a multi-parented tree (or digraph) onto SQL Server 2005.
I've read several articles, but most of them uses single-parented trees with a unique root like the following one.
-My PC
-Drive C
-Documents and Settings
-Program Files
-Adobe
-Microsoft
-Folder X
-Drive D
-Folder Y
-Folder Z
In this one, everything derives from a root element (My PC).
In my case, a child could have more than 1 parent, like the following:
G A
\ /
B
/ \
X C
/ \
D E
\ /
F
So I have the following code:
create table #ObjectRelations
(
Id varchar(20),
NextId varchar(20)
)
insert into #ObjectRelations values ('G', 'B')
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('B', 'X')
insert into #ObjectRelations values ('C', 'E')
insert into #ObjectRelations values ('C', 'D')
insert into #ObjectRelations values ('E', 'F')
insert into #ObjectRelations values ('D', 'F')
declare #id varchar(20)
set #id = 'A';
WITH Objects (Id, NextId) AS
( -- This is the 'Anchor' or starting point of the recursive query
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
WHERE rel.Id = #id
UNION ALL -- This is the recursive portion of the query
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)
ON rel.Id = Objects.NextId
)
SELECT o.*
FROM Objects o
drop table #ObjectRelations
Which returns the following SET:
Id NextId
-------------------- --------------------
A B
B C
B X
C E
C D
D F
E F
Expected result SET:
Id NextId
-------------------- --------------------
G B
A B
B C
B X
C E
C D
D F
E F
Note that the relation G->B is missing, because it asks for an starting object (which doesn't work for me also, because I don't know the root object from the start) and using A as the start point will ignore the G->B relationship.
So, this code doesn't work in my case because it asks for a starting object, which is obvious in a SINGLE-parent tree (will always be the root object). But in multi-parent tree, you could have more than 1 "root" object (like in the example, G and A are the "root" objects, where root is an object which doesn't have a parent (ancestor)).
So I'm kind of stucked in here... I need to modify the query to NOT ask for a starting object and recursively traverse the entire tree.
I don't know if that's possible with the (Id, NextId) implementation... may be I need to store it like a graph using some kind of Incidence matrix, adjacency matrix or whatever (see http://willets.org/sqlgraphs.html).
Any help? What do you think guys?
Thank you very much for your time =)
Cheers!
Sources:
Source 1
Source 2
Source 3

Well, I finally came up with the following solution.
It's the way I found to support multi-root trees and also cycling digraphs.
create table #ObjectRelations
(
Id varchar(20),
NextId varchar(20)
)
/* Cycle */
/*
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('C', 'A')
*/
/* Multi root */
insert into #ObjectRelations values ('G', 'B')
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('B', 'X')
insert into #ObjectRelations values ('C', 'E')
insert into #ObjectRelations values ('C', 'D')
insert into #ObjectRelations values ('E', 'F')
insert into #ObjectRelations values ('D', 'F')
declare #startIds table
(
Id varchar(20) primary key
)
;WITH
Ids (Id) AS
(
SELECT Id
FROM #ObjectRelations
),
NextIds (Id) AS
(
SELECT NextId
FROM #ObjectRelations
)
INSERT INTO #startIds
/* This select will not return anything since there are not objects without predecessor, because it's a cyclic of course */
SELECT DISTINCT
Ids.Id
FROM
Ids
LEFT JOIN
NextIds on Ids.Id = NextIds.Id
WHERE
NextIds.Id IS NULL
UNION
/* So let's just pick anyone. (the way I will be getting the starting object for a cyclic doesn't matter for the regarding problem)*/
SELECT TOP 1 Id FROM Ids
;WITH Objects (Id, NextId, [Level], Way) AS
( -- This is the 'Anchor' or starting point of the recursive query
SELECT rel.Id,
rel.NextId,
1,
CAST(rel.Id as VARCHAR(MAX))
FROM #ObjectRelations rel
WHERE rel.Id IN (SELECT Id FROM #startIds)
UNION ALL -- This is the recursive portion of the query
SELECT rel.Id,
rel.NextId,
[Level] + 1,
RecObjects.Way + ', ' + rel.Id
FROM #ObjectRelations rel
INNER JOIN Objects RecObjects -- Note the reference to CTE table name (Recursive Join)
ON rel.Id = RecObjects.NextId
WHERE RecObjects.Way NOT LIKE '%' + rel.Id + '%'
)
SELECT DISTINCT
Id,
NextId,
[Level]
FROM Objects
ORDER BY [Level]
drop table #ObjectRelations
Could be useful for somebody. It is for me =P
Thanks

If you want to use all root objects as starting objects, you should first update your data to include information about the root objects (and the leaves). You should add the following inserts:
insert into #ObjectRelations values (NULL, 'G')
insert into #ObjectRelations values (NULL, 'A')
insert into #ObjectRelations values ('X', NULL)
insert into #ObjectRelations values ('F', NULL)
Of course you could also write your anchor query in such a way that you select as root nodes the records that have an Id that does not occur as a NextId, but this is easier.
Next, modify your anchor query to look like this:
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
WHERE rel.Id IS NULL
If you run this query, you'll see that you get a lot of duplicates, a lot of arcs occur multiple times. This is because you now have two results from your anchor query and therefore the tree is traversed two times.
This can be fixed by changing your select statement to this (note the DISTINCT):
SELECT DISTINCT o.*
FROM Objects o

If you dont want to do the inserts suggested by Ronald,this would do!.
WITH CTE_MultiParent (ID, ParentID)
AS
(
SELECT ID, ParentID FROM #ObjectRelations
WHERE ID NOT IN
(
SELECT DISTINCT ParentID FROM #ObjectRelations
)
UNION ALL
SELECT ObjR.ID, ObjR.ParentID FROM #ObjectRelations ObjR INNER JOIN CTE_MultiParent
ON CTE_MultiParent.ParentID = ObjR.Id
)
SELECT DISTINCT * FROM CTE_MultiParent

Related

Matching multiple key/value pairs in SQL

I have metadata stored in a key/value table in SQL Server. (I know key/value is bad, but this is free-form metadata supplied by users, so I can't turn the keys into columns.) Users need to be able to give me an arbitrary set of key/value pairs and have me return all DB objects that match all of those criteria.
For example:
Metadata:
Id Key Value
1 a p
1 b q
1 c r
2 a p
2 b p
3 c r
If the user says a=p and b=q, I should return object 1. (Not object 2, even though it also has a=p, because it has b=p.)
The metadata to match is in a table-valued sproc parameter with a simple key/value schema. The closest I have got is:
select * from [Objects] as o
where not exists (
select * from [Metadata] as m
join #data as n on (n.[Key] = m.[Key])
and n.[Value] != m.[Value]
and m.[Id] = o.[Id]
)
My "no rows exist that don't match" is an attempt to implement "all rows match" by forming its contrapositive. This does eliminate objects with mismatching metadata, but it also returns objects with no metadata at all, so no good.
Can anyone point me in the right direction? (Bonus points for performance as well as correctness.)
; WITH Metadata (Id, [Key], Value) AS -- Create sample data
(
SELECT 1, 'a', 'p' UNION ALL
SELECT 1, 'b', 'q' UNION ALL
SELECT 1, 'c', 'r' UNION ALL
SELECT 2, 'a', 'p' UNION ALL
SELECT 2, 'b', 'p' UNION ALL
SELECT 3, 'c', 'r'
),
data ([Key], Value) AS -- sample input
(
SELECT 'a', 'p' UNION ALL
SELECT 'b', 'q'
),
-- here onwards is the actual query
data2 AS
(
-- cnt is to count no of input rows
SELECT [Key], Value, cnt = COUNT(*) OVER()
FROM data
)
SELECT m.Id
FROM Metadata m
INNER JOIN data2 d ON m.[Key] = d.[Key] AND m.Value= d.Value
GROUP BY m.Id
HAVING COUNT(*) = MAX(d.cnt)
The following SQL query produces the result that you require.
SELECT *
FROM #Objects m
WHERE Id IN
(
-- Include objects that match the conditions:
SELECT m.Id
FROM #Metadata m
JOIN #data d ON m.[Key] = d.[Key] AND m.Value = d.Value
-- And discount those where there is other metadata not matching the conditions:
EXCEPT
SELECT m.Id
FROM #Metadata m
JOIN #data d ON m.[Key] = d.[Key] AND m.Value <> d.Value
)
Test schema and data I used:
-- Schema
DECLARE #Objects TABLE (Id int);
DECLARE #Metadata TABLE (Id int, [Key] char(1), Value char(2));
DECLARE #data TABLE ([Key] char(1), Value char(1));
-- Data
INSERT INTO #Metadata VALUES
(1, 'a', 'p'),
(1, 'b', 'q'),
(1, 'c', 'r'),
(2, 'a', 'p'),
(2, 'b', 'p'),
(3, 'c', 'r');
INSERT INTO #Objects VALUES
(1),
(2),
(3),
(4); -- Object with no metadata
INSERT INTO #data VALUES
('a','p'),
('b','q');

Getting nodes under a given node?

Given a source node, I would like to get all nodes "under" it where under means that all nodes whose level is less than the given node's level and are reachable from the given node. I remember this can be done using common table expressions and am currently working on it. However, is there a way to do this fast on a large graph (consisting of about 100K nodes)?
Sample data:
CREATE TABLE #TEMP(Source VARCHAR(50), SourceLevel INT, Sink VARCHAR(50), SinkLevel INT);
INSERT INTO #TEMP VALUES('A', 1, 'B', 2);
INSERT INTO #TEMP VALUES('A', 1, 'C', 2);
INSERT INTO #TEMP VALUES('B', 2, 'C', 2);
INSERT INTO #TEMP VALUES('B', 2, 'D', 3);
INSERT INTO #TEMP VALUES('B', 2, 'E', 3);
INSERT INTO #TEMP VALUES('C', 2, 'D', 3);
INSERT INTO #TEMP VALUES('C', 2, 'F', 3);
INSERT INTO #TEMP VALUES('C', 2, 'G', 3);
SELECT *
FROM #TEMP
GO
DROP TABLE #TEMP
GO
Graph:
A Level - 1
/ \
B---C Level - 2
/ \ /|\
E D F G Level - 3
Example:
Given B, I want to get: E,D
Given A, I want to get: B,C,E,D,F,G
Given C, I want to get: D,F,G
One possible solution here, using Recursive CTEs but there is still some problem with this (the answer does not exactly match yet but its getting there). I'll update this query once I get it close to my final requirement.
EDIT 1: The following satisfied all outputs except the one that asks for the root node 'A' in which case, it goes only one level deep for some reason.
EDIT 2: This gives the expected output. I'll check with other use cases now.
;WITH HIERARCHY AS (
SELECT T.Source, T.SourceLevel, T.Sink, T.SinkLevel
FROM #TEMP T
WHERE T.Source = 'A'
AND T.SourceLevel < T.SinkLevel
UNION ALL
SELECT X.Source, X.SourceLevel, X.Sink, X.SinkLevel
FROM #TEMP X
JOIN HIERARCHY H ON H.Sink = X.Source
)
SELECT DISTINCT H.Sink
FROM HIERARCHY H

Sql HierarchyId How do I get the last descendants?

Using t-sql hierarchy Id how do I get all of the rows that have no children (that is the last decendants)?
Say my table is structured like this:
Id,
Name,
HierarchyId
And has these rows:
1, Craig, /
2, Steve, /1/
3, John, /1/1/
4, Sam, /2/
5, Matt, /2/1/
6, Chris, /2/1/1/
What query would give me John and Chris?
Perhaps there are better ways but this seams to do the job.
declare #T table
(
ID int,
Name varchar(10),
HID HierarchyID
)
insert into #T values
(1, 'Craig', '/'),
(2, 'Steve', '/1/'),
(3, 'John', '/1/1/'),
(4, 'Sam', '/2/'),
(5, 'Matt', '/2/1/'),
(6, 'Chris', '/2/1/1/')
select *
from #T
where HID.GetDescendant(null, null) not in (select HID
from #T)
Result:
ID Name HID
----------- ---------- ---------------------
3 John 0x5AC0
6 Chris 0x6AD6
Update 2012-05-22
Query above will fail if node numbers is not in an unbroken sequence. Here is another version that should take care of that.
declare #T table
(
ID int,
Name varchar(10),
HID HierarchyID
)
insert into #T values
(1, 'Craig', '/'),
(2, 'Steve', '/1/'),
(3, 'John', '/1/1/'),
(4, 'Sam', '/2/'),
(5, 'Matt', '/2/1/'),
(6, 'Chris', '/2/1/2/') -- HID for this row is changed compared to above query
select *
from #T
where HID not in (select HID.GetAncestor(1)
from #T
where HID.GetAncestor(1) is not null)
Since you only need leafs and you don't need to get them from a specific ancestor, a simple non-recursive query like this should do the job:
SELECT * FROM YOUR_TABLE PARENT
WHERE
NOT EXISTS (
SELECT * FROM YOUR_TABLE CHILD
WHERE CHILD.HierarchyId = PARENT.Id
)
In plain English: select every row without a child row.
This assumes your HierarchyId is a FOREIGN KEY towards the Id, not the whole "path" as presented in your example. If it isn't, this is probably the first thing you should fix in your database model.
--- EDIT ---
OK, here is the MS SQL Server-specific query that actually works:
SELECT * FROM YOUR_TABLE PARENT
WHERE
NOT EXISTS (
SELECT * FROM YOUR_TABLE CHILD
WHERE
CHILD.Id <> PARENT.Id
AND CHILD.HierarchyId.IsDescendantOf(PARENT.HierarchyId) = 1
)
Note that the IsDescendantOf considers any row a descendant of itself, so we also need the CHILD.Id <> PARENT.Id in the condition.
Hi I use this one and works perfectly for me.
CREATE TABLE [dbo].[Test]([Id] [hierarchyid] NOT NULL, [Name] [nvarchar](50) NULL)
DECLARE #Parent AS HierarchyID = CAST('/2/1/' AS HierarchyID) -- Get Current Parent
DECLARE #Last AS HierarchyID
SELECT #Last = MAX(Id) FROM Test WHERE Id.GetAncestor(1) = #Parent -- Find Last Id for this Parent
INSERT INTO Test(Id,Name) VALUES(#Parent.GetDescendant(#Last, NULL),'Sydney') -- Insert after Last Id

Algorithm to select nodes and their parents in a tree

Let's say you have a tree structure as follows:
a [Level 0]
/ | \
b c d [Level 1]
/ \ |
e f g [Level 2]
| / \
h i j [Level 3]
I have represented this in a database like so:
node parent
------------
a null
b a
c a
d a
[...]
h f
i g
I'd like to write a function that, given a level, it will return all nodes at that level and their parents.
For example:
f(0) => { a }
f(1) => { a, b, c, d }
f(2) => { a, b, c, d, e, f, g }
Any thoughts?
Here I iterate through the levels, adding each one to the table with the level it is on.
create table mytable (
node varchar(80),
parent varchar(80),
constraint PK_mytable primary key nonclustered (node)
)
-- index for speed selecting on parent
create index IDX_mytable_parent on mytable (parent, node)
insert into mytable values ('a', null)
insert into mytable values ('b', 'a')
insert into mytable values ('c', 'a')
insert into mytable values ('d', 'a')
insert into mytable values ('e', 'b')
insert into mytable values ('f', 'b')
insert into mytable values ('g', 'd')
insert into mytable values ('h', 'f')
insert into mytable values ('i', 'g')
insert into mytable values ('j', 'g')
create function fn_level (#level int) returns #nodes table (Node varchar(80), TreeLevel int)
as begin
declare #current int
set #current = 0
while #current <= #level begin
if (#current = 0)
insert #nodes (Node, TreeLevel)
select node, #current
from mytable
where parent is null
else
insert #nodes (Node, TreeLevel)
select mt.node, #current
from #nodes n
inner join mytable mt on mt.parent = n.Node
where n.TreeLevel = (#current - 1)
set #current = #current + 1
end
return
end
select * from fn_level(2)
The usual way to do this, unless your flavour of SQL has a special non-standard function for it, is to build a path table that has these columns:
parent_key
child_key
path_length
To fill this table, you use a recursive or procedural loop to find all of the parents, grand-parents, great-grand-parents, etc for each item in your list of items. The recursion or looping needs to continue until you stop finding longer paths which return new pairs.
At the end, you'll have a list of records that tell you things like (a,b,1), (a,f,2), (a,h,3) etc. Then, to get everything that is at level x and above, you do a distinct select on all of the children with a path_length <= x (unioned with the root, unless you included a record of (null, root, 0) when you started your recursion/looping.
It would be nice if SQL were better at handling directed graphs (trees) but unfortunately you have to cheat it with extra tables like this.
A solution for MySQL is less than ideal.
Assuming that the maximum depth of the tree is known:
SELECT
nvl(e.node, nvl(d.node, nvl(c.node, nvl(b.node, a.node)))) item
, nvl2(e.node, 5, nvl2(d.node, 4, nvl2(c.node, 3, nvl2(b.node, 2, 1)))) depth
FROM table t AS a
LEFT JOIN table t AS b ON (a.node = b.parent)
LEFT JOIN table t AS c ON (b.node = c.parent)
LEFT JOIN table t AS d ON (c.node = d.parent)
LEFT JOIN table t AS e ON (d.node = e.parent)
WHERE a.parent IS NULL
This will give you every node and it's depth. After that it's trivial to select every item that has depth less that X.
If the depth of the tree is not known, or is significantly large then the solution is iterative as another poster has said.
Shamelessly copying from Jason, I made a function-less solution which I tested with postgresql (which has functions - maybe it would have worked out of the box).
create table tree (
node char(1),
parent char(1)
);
insert into tree values ('a', null);
insert into tree values ('b', 'a');
insert into tree values ('c', 'a');
insert into tree values ('d', 'a');
insert into tree values ('e', 'b');
insert into tree values ('f', 'b');
insert into tree values ('g', 'd');
insert into tree values ('h', 'f');
insert into tree values ('i', 'g');
insert into tree values ('j', 'g');
ALTER TABLE tree ADD level int2;
--
-- node parent level
-- a - 1
-- b a a.(level + 1)
-- c a a.(level + 1)
-- e b b.(level + 1)
-- root is a:
UPDATE tree SET level = 0 WHERE node = 'a';
-- every level else is parent + 1:
UPDATE tree tout -- outer
SET level = (
SELECT ti.level + 1
FROM tree ti -- inner
WHERE tout.parent = ti.node
AND ti.level IS NOT NULL)
WHERE tout.level IS NULL;
The update statement is pure sql, and has to be repeated for every level, to fill the table up.
kram=# select * from tree;
node | parent | level
------+--------+-------
a | | 1
b | a | 2
c | a | 2
d | a | 2
e | b | 3
f | b | 3
g | d | 3
h | f | 4
i | g | 4
j | g | 4
(10 Zeilen)
I started with 'level=1', not '0' for a, therefore the difference.
SQL doesn't always handle these recursive problems very well.
Some DBMS platforms allow you to use Common Table Expressions which are effectively queries that call themselves, allowing you to recurse through a data structure. There's no support for this in MySQL, so I'd recommend you use multiple queries constructed and managed by a script written in another language.
I do not know much about databases, or their terminology, but would it work if you performed a joint product of a table with itself N times in order to find all elements at level N?
In other words, perform a query in which you search for all entries that have parent A. That will return to you a list of all its children. Then, repeat the query to find the children of each of these children. Repeat this procedure until you find all children at level N.
In this way you would not have to pre-compute the depth of each item.

SQL - Ordering by multiple criteria

I have a table of categories. Each category can either be a root level category (parent is NULL), or have a parent which is a root level category. There can't be more than one level of nesting.
I have the following table structure:
Categories Table Structure http://img16.imageshack.us/img16/8569/categoriesi.png
Is there any way I could use a query which produced the following output:
Free Stuff
Hardware
Movies
CatA
CatB
CatC
Software
Apples
CatD
CatE
So the results are ordered by top level category, then after each top level category, subcategories of that category are listed?
It's not really ordering by Parent or Name, but a combo of the two. I'm using SQL Server.
It seems to me like you are looking to flatten and order your hierarchy, the cheapest way to get this ordering would be to store an additional column in the table that has the full path.
So for example:
Name | Full Path
Free Stuff | Free Stuff
aa2 | Free Stuff - aa2
Once you store the full path, you can order on it.
If you only have a depth of one you can auto generate a string to this effect with a single subquery (and order on it), but this solution does not work that easily when it gets deep.
Another option, is to move this all over to a temp table and calculate the full path there, on demand. But it is fairly expensive.
You could make the table look at itself, ordering by the parent Name then the child Name.
select categories.Name AS DisplayName
from categories LEFT OUTER JOIN
categories AS parentTable ON categories.Parent = parentTable.ID
order by parentTable.Name, DisplayName
Ok, here we go :
with foo as
(
select 1 as id, null as parent, 'CatA' as cat from dual
union select 2, null, 'CatB' from dual
union select 3, null, 'CatC' from dual
union select 4, 1, 'SubCatA_1' from dual
union select 5, 1, 'SubCatA_2' from dual
union select 6, 2, 'SubCatB_1' from dual
union select 7, 2, 'SubCatB_2' from dual
)
select child.cat
from foo parent right outer join foo child on parent.id = child.parent
order by case when parent.id is not null then parent.cat else child.cat end,
case when parent.id is not null then 1 else 0 end
Result :
CatA
SubCatA_1
SubCatA_2
CatB
SubCatB_1
SubCatB_2
CatC
Edit - Solution change inspire from van's order by ! Much simpler that way.
Not entirely sure of your questions but it sounds like PARTITION BY might be useful for you. There's a good introductory post on PARTITION BY here.
Here you have a complete working example using a resursive common table expression.
DECLARE #categories TABLE
(
ID INT NOT NULL,
[Name] VARCHAR(50),
Parent INT NULL
);
INSERT INTO #categories VALUES (4, 'Free Stuff', NULL);
INSERT INTO #categories VALUES (1, 'Hardware', NULL);
INSERT INTO #categories VALUES (3, 'Movies', NULL);
INSERT INTO #categories VALUES (2, 'Software', NULL);
INSERT INTO #categories VALUES (10, 'a', 0);
INSERT INTO #categories VALUES (12, 'apples', 2);
INSERT INTO #categories VALUES (8, 'catD', 2);
INSERT INTO #categories VALUES (9, 'catE', 2);
INSERT INTO #categories VALUES (5, 'catA', 3);
INSERT INTO #categories VALUES (6, 'catB', 3);
INSERT INTO #categories VALUES (7, 'catC', 3);
INSERT INTO #categories VALUES (11, 'aa2', 4);
WITH categories(ID, Name, Parent, HierarchicalName)
AS
(
SELECT
c.ID
, c.[Name]
, c.Parent
, CAST(c.[Name] AS VARCHAR(200)) AS HierarchicalName
FROM #categories c
WHERE c.Parent IS NULL
UNION ALL
SELECT
c.ID
, c.[Name]
, c.Parent
, CAST(pc.HierarchicalName + c.[Name] AS VARCHAR(200))
FROM #categories c
JOIN categories pc ON c.Parent = pc.ID
)
SELECT c.*
FROM categories c
ORDER BY c.HierarchicalName
SELECT
ID,
Name,
Parent,
RIGHT(
'000000000000000' +
CASE WHEN Parent IS NULL
THEN CONVERT(VARCHAR, Id)
ELSE CONVERT(VARCHAR, Parent)
END, 15
)
+ '_' + CASE WHEN Parent IS NULL THEN '0' ELSE '1' END
+ '_' + Name
FROM
categories
ORDER BY
4
The long padding is to account for the fact that SQL Server's INT data type goes from 2,147,483,648 through 2,147,483,647.
You can ORDER BY the expression directly, no need to use ORDER BY 4. It was just to show what it is sorting on.
It is worth noting that this expression cannot use any index. This means sorting a large table will be slow.