Getting nodes under a given node? - sql

Given a source node, I would like to get all nodes "under" it where under means that all nodes whose level is less than the given node's level and are reachable from the given node. I remember this can be done using common table expressions and am currently working on it. However, is there a way to do this fast on a large graph (consisting of about 100K nodes)?
Sample data:
CREATE TABLE #TEMP(Source VARCHAR(50), SourceLevel INT, Sink VARCHAR(50), SinkLevel INT);
INSERT INTO #TEMP VALUES('A', 1, 'B', 2);
INSERT INTO #TEMP VALUES('A', 1, 'C', 2);
INSERT INTO #TEMP VALUES('B', 2, 'C', 2);
INSERT INTO #TEMP VALUES('B', 2, 'D', 3);
INSERT INTO #TEMP VALUES('B', 2, 'E', 3);
INSERT INTO #TEMP VALUES('C', 2, 'D', 3);
INSERT INTO #TEMP VALUES('C', 2, 'F', 3);
INSERT INTO #TEMP VALUES('C', 2, 'G', 3);
SELECT *
FROM #TEMP
GO
DROP TABLE #TEMP
GO
Graph:
A Level - 1
/ \
B---C Level - 2
/ \ /|\
E D F G Level - 3
Example:
Given B, I want to get: E,D
Given A, I want to get: B,C,E,D,F,G
Given C, I want to get: D,F,G

One possible solution here, using Recursive CTEs but there is still some problem with this (the answer does not exactly match yet but its getting there). I'll update this query once I get it close to my final requirement.
EDIT 1: The following satisfied all outputs except the one that asks for the root node 'A' in which case, it goes only one level deep for some reason.
EDIT 2: This gives the expected output. I'll check with other use cases now.
;WITH HIERARCHY AS (
SELECT T.Source, T.SourceLevel, T.Sink, T.SinkLevel
FROM #TEMP T
WHERE T.Source = 'A'
AND T.SourceLevel < T.SinkLevel
UNION ALL
SELECT X.Source, X.SourceLevel, X.Sink, X.SinkLevel
FROM #TEMP X
JOIN HIERARCHY H ON H.Sink = X.Source
)
SELECT DISTINCT H.Sink
FROM HIERARCHY H

Related

Storing active row in a bit column vs. storing the logic in a view

I have a query where multiple factors determine the actual, active row. Can I do this in real-time and still be performant, or is an approach with a bit field the generally recommended approach, where the currently active field is indicated, index and queried?
My real-time solution involves an intermediate step in a view (temporary table in my example below). Therefore I am concerned about performance, because I will have to deal with hundreds of thousands to millions of records.
To illustrate:
DECLARE #grades TABLE (
person int,
grade int,
attempt int,
correction int)
INSERT #grades VALUES (1, 80, 1, 0)
INSERT #grades VALUES (1, 90, 2, 0)
INSERT #grades VALUES (1, 100, 3, 0)
INSERT #grades VALUES (2, 95, 1, 0)
INSERT #grades VALUES (2, 80, 1, 1)
INSERT #grades VALUES (2, 90, 1, 2)
INSERT #grades VALUES (2, 89, 1, 3)
SELECT b.*
INTO #grades_corrected
FROM #grades AS b
RIGHT JOIN (
SELECT person, attempt, MAX(correction) AS last_correction
FROM #grades as b
GROUP BY person, attempt
)
AS last_corrections
ON (b.attempt = last_corrections.attempt
AND b.correction = last_corrections.last_correction
AND b.person = last_corrections.person
)
SELECT g.*
FROM #grades_corrected g
LEFT OUTER JOIN #grades_corrected g2 ON (
g.person = g2.person
AND g.grade < g2.grade)
WHERE g2.grade is null
DROP TABLE #grades_corrected
Performance is going to be much more nuanced than just the sql you have. In any case the sql you have above will breakdown most quickly on the group by with the max and the temp table copy. Both of these will depend heavily on how many records are in the table and how much power (cpu and ram mostly) on your sql server. If you copy "millions" of records into the temp table, it will most likely be slow by most standards.

SQL 'arrays' for repeated use with IN comparisons

It's entirely possible to do a SELECT statement like so:
SELECT *
FROM orders
WHERE order_id in (10000, 10001, 10003, 10005);
However, is it possible to create a variable which stores that 'array' (10000, ...) for repeated use in multiple statements like so?
SELECT *
FROM orders
WHERE order_id in #IDarray;
Apologies if this is a painfully simple question - we've all gotta ask them once!
Edit: Hmm, perhaps I should clarify. In my exact situation, I have a load of IDs (let's use the array above as an example) that are hard coded but might change.
These should be re-usable for multiple INSERT statements, so that we can insert things into multiple tables for each of the IDs. Two such end results might be:
INSERT INTO table1 VALUES (10000, 1, 2, 3);
INSERT INTO table1 VALUES (10001, 1, 2, 3);
INSERT INTO table1 VALUES (10003, 1, 2, 3);
INSERT INTO table1 VALUES (10005, 1, 2, 3);
INSERT INTO table2 VALUES (10000, a, b, c);
INSERT INTO table2 VALUES (10001, a, b, c);
INSERT INTO table2 VALUES (10003, a, b, c);
INSERT INTO table2 VALUES (10005, a, b, c);
Obviously here being able to specify the array saves room and also allows it to be changed in one location instead of the INSERTs having to be modified.
With Microsoft SQL Server you could use an table variable:
CREATE TABLE #IDTable(id INT PRIMARY KEY);
-- insert IDs into table
SELECT *
FROM orders o
INNER JOIN #IDTable i ON i.id = o.order_id;
INSERT INTO table2
SELECT id, 1, 2, 3
FROM #IDTable
INSERT INTO table2
SELECT id, 'a', 'b', 'c'
FROM #IDTable
With the use of Declare table variable you can achieve what you want to do.
For example :
Declare #tbl table(orderID int,Orders varchar(max));
insert into #tbl
SELECT * FROM orders WHERE order_id in (10000, 10001, 10003, 10005);
Select orderID ,Orders from #tbl

Sql HierarchyId How do I get the last descendants?

Using t-sql hierarchy Id how do I get all of the rows that have no children (that is the last decendants)?
Say my table is structured like this:
Id,
Name,
HierarchyId
And has these rows:
1, Craig, /
2, Steve, /1/
3, John, /1/1/
4, Sam, /2/
5, Matt, /2/1/
6, Chris, /2/1/1/
What query would give me John and Chris?
Perhaps there are better ways but this seams to do the job.
declare #T table
(
ID int,
Name varchar(10),
HID HierarchyID
)
insert into #T values
(1, 'Craig', '/'),
(2, 'Steve', '/1/'),
(3, 'John', '/1/1/'),
(4, 'Sam', '/2/'),
(5, 'Matt', '/2/1/'),
(6, 'Chris', '/2/1/1/')
select *
from #T
where HID.GetDescendant(null, null) not in (select HID
from #T)
Result:
ID Name HID
----------- ---------- ---------------------
3 John 0x5AC0
6 Chris 0x6AD6
Update 2012-05-22
Query above will fail if node numbers is not in an unbroken sequence. Here is another version that should take care of that.
declare #T table
(
ID int,
Name varchar(10),
HID HierarchyID
)
insert into #T values
(1, 'Craig', '/'),
(2, 'Steve', '/1/'),
(3, 'John', '/1/1/'),
(4, 'Sam', '/2/'),
(5, 'Matt', '/2/1/'),
(6, 'Chris', '/2/1/2/') -- HID for this row is changed compared to above query
select *
from #T
where HID not in (select HID.GetAncestor(1)
from #T
where HID.GetAncestor(1) is not null)
Since you only need leafs and you don't need to get them from a specific ancestor, a simple non-recursive query like this should do the job:
SELECT * FROM YOUR_TABLE PARENT
WHERE
NOT EXISTS (
SELECT * FROM YOUR_TABLE CHILD
WHERE CHILD.HierarchyId = PARENT.Id
)
In plain English: select every row without a child row.
This assumes your HierarchyId is a FOREIGN KEY towards the Id, not the whole "path" as presented in your example. If it isn't, this is probably the first thing you should fix in your database model.
--- EDIT ---
OK, here is the MS SQL Server-specific query that actually works:
SELECT * FROM YOUR_TABLE PARENT
WHERE
NOT EXISTS (
SELECT * FROM YOUR_TABLE CHILD
WHERE
CHILD.Id <> PARENT.Id
AND CHILD.HierarchyId.IsDescendantOf(PARENT.HierarchyId) = 1
)
Note that the IsDescendantOf considers any row a descendant of itself, so we also need the CHILD.Id <> PARENT.Id in the condition.
Hi I use this one and works perfectly for me.
CREATE TABLE [dbo].[Test]([Id] [hierarchyid] NOT NULL, [Name] [nvarchar](50) NULL)
DECLARE #Parent AS HierarchyID = CAST('/2/1/' AS HierarchyID) -- Get Current Parent
DECLARE #Last AS HierarchyID
SELECT #Last = MAX(Id) FROM Test WHERE Id.GetAncestor(1) = #Parent -- Find Last Id for this Parent
INSERT INTO Test(Id,Name) VALUES(#Parent.GetDescendant(#Last, NULL),'Sydney') -- Insert after Last Id

Algorithm to select nodes and their parents in a tree

Let's say you have a tree structure as follows:
a [Level 0]
/ | \
b c d [Level 1]
/ \ |
e f g [Level 2]
| / \
h i j [Level 3]
I have represented this in a database like so:
node parent
------------
a null
b a
c a
d a
[...]
h f
i g
I'd like to write a function that, given a level, it will return all nodes at that level and their parents.
For example:
f(0) => { a }
f(1) => { a, b, c, d }
f(2) => { a, b, c, d, e, f, g }
Any thoughts?
Here I iterate through the levels, adding each one to the table with the level it is on.
create table mytable (
node varchar(80),
parent varchar(80),
constraint PK_mytable primary key nonclustered (node)
)
-- index for speed selecting on parent
create index IDX_mytable_parent on mytable (parent, node)
insert into mytable values ('a', null)
insert into mytable values ('b', 'a')
insert into mytable values ('c', 'a')
insert into mytable values ('d', 'a')
insert into mytable values ('e', 'b')
insert into mytable values ('f', 'b')
insert into mytable values ('g', 'd')
insert into mytable values ('h', 'f')
insert into mytable values ('i', 'g')
insert into mytable values ('j', 'g')
create function fn_level (#level int) returns #nodes table (Node varchar(80), TreeLevel int)
as begin
declare #current int
set #current = 0
while #current <= #level begin
if (#current = 0)
insert #nodes (Node, TreeLevel)
select node, #current
from mytable
where parent is null
else
insert #nodes (Node, TreeLevel)
select mt.node, #current
from #nodes n
inner join mytable mt on mt.parent = n.Node
where n.TreeLevel = (#current - 1)
set #current = #current + 1
end
return
end
select * from fn_level(2)
The usual way to do this, unless your flavour of SQL has a special non-standard function for it, is to build a path table that has these columns:
parent_key
child_key
path_length
To fill this table, you use a recursive or procedural loop to find all of the parents, grand-parents, great-grand-parents, etc for each item in your list of items. The recursion or looping needs to continue until you stop finding longer paths which return new pairs.
At the end, you'll have a list of records that tell you things like (a,b,1), (a,f,2), (a,h,3) etc. Then, to get everything that is at level x and above, you do a distinct select on all of the children with a path_length <= x (unioned with the root, unless you included a record of (null, root, 0) when you started your recursion/looping.
It would be nice if SQL were better at handling directed graphs (trees) but unfortunately you have to cheat it with extra tables like this.
A solution for MySQL is less than ideal.
Assuming that the maximum depth of the tree is known:
SELECT
nvl(e.node, nvl(d.node, nvl(c.node, nvl(b.node, a.node)))) item
, nvl2(e.node, 5, nvl2(d.node, 4, nvl2(c.node, 3, nvl2(b.node, 2, 1)))) depth
FROM table t AS a
LEFT JOIN table t AS b ON (a.node = b.parent)
LEFT JOIN table t AS c ON (b.node = c.parent)
LEFT JOIN table t AS d ON (c.node = d.parent)
LEFT JOIN table t AS e ON (d.node = e.parent)
WHERE a.parent IS NULL
This will give you every node and it's depth. After that it's trivial to select every item that has depth less that X.
If the depth of the tree is not known, or is significantly large then the solution is iterative as another poster has said.
Shamelessly copying from Jason, I made a function-less solution which I tested with postgresql (which has functions - maybe it would have worked out of the box).
create table tree (
node char(1),
parent char(1)
);
insert into tree values ('a', null);
insert into tree values ('b', 'a');
insert into tree values ('c', 'a');
insert into tree values ('d', 'a');
insert into tree values ('e', 'b');
insert into tree values ('f', 'b');
insert into tree values ('g', 'd');
insert into tree values ('h', 'f');
insert into tree values ('i', 'g');
insert into tree values ('j', 'g');
ALTER TABLE tree ADD level int2;
--
-- node parent level
-- a - 1
-- b a a.(level + 1)
-- c a a.(level + 1)
-- e b b.(level + 1)
-- root is a:
UPDATE tree SET level = 0 WHERE node = 'a';
-- every level else is parent + 1:
UPDATE tree tout -- outer
SET level = (
SELECT ti.level + 1
FROM tree ti -- inner
WHERE tout.parent = ti.node
AND ti.level IS NOT NULL)
WHERE tout.level IS NULL;
The update statement is pure sql, and has to be repeated for every level, to fill the table up.
kram=# select * from tree;
node | parent | level
------+--------+-------
a | | 1
b | a | 2
c | a | 2
d | a | 2
e | b | 3
f | b | 3
g | d | 3
h | f | 4
i | g | 4
j | g | 4
(10 Zeilen)
I started with 'level=1', not '0' for a, therefore the difference.
SQL doesn't always handle these recursive problems very well.
Some DBMS platforms allow you to use Common Table Expressions which are effectively queries that call themselves, allowing you to recurse through a data structure. There's no support for this in MySQL, so I'd recommend you use multiple queries constructed and managed by a script written in another language.
I do not know much about databases, or their terminology, but would it work if you performed a joint product of a table with itself N times in order to find all elements at level N?
In other words, perform a query in which you search for all entries that have parent A. That will return to you a list of all its children. Then, repeat the query to find the children of each of these children. Repeat this procedure until you find all children at level N.
In this way you would not have to pre-compute the depth of each item.

Multiple parents tree (or digraph) implementation sql server 2005

I need to implement a multi-parented tree (or digraph) onto SQL Server 2005.
I've read several articles, but most of them uses single-parented trees with a unique root like the following one.
-My PC
-Drive C
-Documents and Settings
-Program Files
-Adobe
-Microsoft
-Folder X
-Drive D
-Folder Y
-Folder Z
In this one, everything derives from a root element (My PC).
In my case, a child could have more than 1 parent, like the following:
G A
\ /
B
/ \
X C
/ \
D E
\ /
F
So I have the following code:
create table #ObjectRelations
(
Id varchar(20),
NextId varchar(20)
)
insert into #ObjectRelations values ('G', 'B')
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('B', 'X')
insert into #ObjectRelations values ('C', 'E')
insert into #ObjectRelations values ('C', 'D')
insert into #ObjectRelations values ('E', 'F')
insert into #ObjectRelations values ('D', 'F')
declare #id varchar(20)
set #id = 'A';
WITH Objects (Id, NextId) AS
( -- This is the 'Anchor' or starting point of the recursive query
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
WHERE rel.Id = #id
UNION ALL -- This is the recursive portion of the query
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)
ON rel.Id = Objects.NextId
)
SELECT o.*
FROM Objects o
drop table #ObjectRelations
Which returns the following SET:
Id NextId
-------------------- --------------------
A B
B C
B X
C E
C D
D F
E F
Expected result SET:
Id NextId
-------------------- --------------------
G B
A B
B C
B X
C E
C D
D F
E F
Note that the relation G->B is missing, because it asks for an starting object (which doesn't work for me also, because I don't know the root object from the start) and using A as the start point will ignore the G->B relationship.
So, this code doesn't work in my case because it asks for a starting object, which is obvious in a SINGLE-parent tree (will always be the root object). But in multi-parent tree, you could have more than 1 "root" object (like in the example, G and A are the "root" objects, where root is an object which doesn't have a parent (ancestor)).
So I'm kind of stucked in here... I need to modify the query to NOT ask for a starting object and recursively traverse the entire tree.
I don't know if that's possible with the (Id, NextId) implementation... may be I need to store it like a graph using some kind of Incidence matrix, adjacency matrix or whatever (see http://willets.org/sqlgraphs.html).
Any help? What do you think guys?
Thank you very much for your time =)
Cheers!
Sources:
Source 1
Source 2
Source 3
Well, I finally came up with the following solution.
It's the way I found to support multi-root trees and also cycling digraphs.
create table #ObjectRelations
(
Id varchar(20),
NextId varchar(20)
)
/* Cycle */
/*
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('C', 'A')
*/
/* Multi root */
insert into #ObjectRelations values ('G', 'B')
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('B', 'X')
insert into #ObjectRelations values ('C', 'E')
insert into #ObjectRelations values ('C', 'D')
insert into #ObjectRelations values ('E', 'F')
insert into #ObjectRelations values ('D', 'F')
declare #startIds table
(
Id varchar(20) primary key
)
;WITH
Ids (Id) AS
(
SELECT Id
FROM #ObjectRelations
),
NextIds (Id) AS
(
SELECT NextId
FROM #ObjectRelations
)
INSERT INTO #startIds
/* This select will not return anything since there are not objects without predecessor, because it's a cyclic of course */
SELECT DISTINCT
Ids.Id
FROM
Ids
LEFT JOIN
NextIds on Ids.Id = NextIds.Id
WHERE
NextIds.Id IS NULL
UNION
/* So let's just pick anyone. (the way I will be getting the starting object for a cyclic doesn't matter for the regarding problem)*/
SELECT TOP 1 Id FROM Ids
;WITH Objects (Id, NextId, [Level], Way) AS
( -- This is the 'Anchor' or starting point of the recursive query
SELECT rel.Id,
rel.NextId,
1,
CAST(rel.Id as VARCHAR(MAX))
FROM #ObjectRelations rel
WHERE rel.Id IN (SELECT Id FROM #startIds)
UNION ALL -- This is the recursive portion of the query
SELECT rel.Id,
rel.NextId,
[Level] + 1,
RecObjects.Way + ', ' + rel.Id
FROM #ObjectRelations rel
INNER JOIN Objects RecObjects -- Note the reference to CTE table name (Recursive Join)
ON rel.Id = RecObjects.NextId
WHERE RecObjects.Way NOT LIKE '%' + rel.Id + '%'
)
SELECT DISTINCT
Id,
NextId,
[Level]
FROM Objects
ORDER BY [Level]
drop table #ObjectRelations
Could be useful for somebody. It is for me =P
Thanks
If you want to use all root objects as starting objects, you should first update your data to include information about the root objects (and the leaves). You should add the following inserts:
insert into #ObjectRelations values (NULL, 'G')
insert into #ObjectRelations values (NULL, 'A')
insert into #ObjectRelations values ('X', NULL)
insert into #ObjectRelations values ('F', NULL)
Of course you could also write your anchor query in such a way that you select as root nodes the records that have an Id that does not occur as a NextId, but this is easier.
Next, modify your anchor query to look like this:
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
WHERE rel.Id IS NULL
If you run this query, you'll see that you get a lot of duplicates, a lot of arcs occur multiple times. This is because you now have two results from your anchor query and therefore the tree is traversed two times.
This can be fixed by changing your select statement to this (note the DISTINCT):
SELECT DISTINCT o.*
FROM Objects o
If you dont want to do the inserts suggested by Ronald,this would do!.
WITH CTE_MultiParent (ID, ParentID)
AS
(
SELECT ID, ParentID FROM #ObjectRelations
WHERE ID NOT IN
(
SELECT DISTINCT ParentID FROM #ObjectRelations
)
UNION ALL
SELECT ObjR.ID, ObjR.ParentID FROM #ObjectRelations ObjR INNER JOIN CTE_MultiParent
ON CTE_MultiParent.ParentID = ObjR.Id
)
SELECT DISTINCT * FROM CTE_MultiParent