Algorithm to select nodes and their parents in a tree - sql

Let's say you have a tree structure as follows:
a [Level 0]
/ | \
b c d [Level 1]
/ \ |
e f g [Level 2]
| / \
h i j [Level 3]
I have represented this in a database like so:
node parent
------------
a null
b a
c a
d a
[...]
h f
i g
I'd like to write a function that, given a level, it will return all nodes at that level and their parents.
For example:
f(0) => { a }
f(1) => { a, b, c, d }
f(2) => { a, b, c, d, e, f, g }
Any thoughts?

Here I iterate through the levels, adding each one to the table with the level it is on.
create table mytable (
node varchar(80),
parent varchar(80),
constraint PK_mytable primary key nonclustered (node)
)
-- index for speed selecting on parent
create index IDX_mytable_parent on mytable (parent, node)
insert into mytable values ('a', null)
insert into mytable values ('b', 'a')
insert into mytable values ('c', 'a')
insert into mytable values ('d', 'a')
insert into mytable values ('e', 'b')
insert into mytable values ('f', 'b')
insert into mytable values ('g', 'd')
insert into mytable values ('h', 'f')
insert into mytable values ('i', 'g')
insert into mytable values ('j', 'g')
create function fn_level (#level int) returns #nodes table (Node varchar(80), TreeLevel int)
as begin
declare #current int
set #current = 0
while #current <= #level begin
if (#current = 0)
insert #nodes (Node, TreeLevel)
select node, #current
from mytable
where parent is null
else
insert #nodes (Node, TreeLevel)
select mt.node, #current
from #nodes n
inner join mytable mt on mt.parent = n.Node
where n.TreeLevel = (#current - 1)
set #current = #current + 1
end
return
end
select * from fn_level(2)

The usual way to do this, unless your flavour of SQL has a special non-standard function for it, is to build a path table that has these columns:
parent_key
child_key
path_length
To fill this table, you use a recursive or procedural loop to find all of the parents, grand-parents, great-grand-parents, etc for each item in your list of items. The recursion or looping needs to continue until you stop finding longer paths which return new pairs.
At the end, you'll have a list of records that tell you things like (a,b,1), (a,f,2), (a,h,3) etc. Then, to get everything that is at level x and above, you do a distinct select on all of the children with a path_length <= x (unioned with the root, unless you included a record of (null, root, 0) when you started your recursion/looping.
It would be nice if SQL were better at handling directed graphs (trees) but unfortunately you have to cheat it with extra tables like this.

A solution for MySQL is less than ideal.
Assuming that the maximum depth of the tree is known:
SELECT
nvl(e.node, nvl(d.node, nvl(c.node, nvl(b.node, a.node)))) item
, nvl2(e.node, 5, nvl2(d.node, 4, nvl2(c.node, 3, nvl2(b.node, 2, 1)))) depth
FROM table t AS a
LEFT JOIN table t AS b ON (a.node = b.parent)
LEFT JOIN table t AS c ON (b.node = c.parent)
LEFT JOIN table t AS d ON (c.node = d.parent)
LEFT JOIN table t AS e ON (d.node = e.parent)
WHERE a.parent IS NULL
This will give you every node and it's depth. After that it's trivial to select every item that has depth less that X.
If the depth of the tree is not known, or is significantly large then the solution is iterative as another poster has said.

Shamelessly copying from Jason, I made a function-less solution which I tested with postgresql (which has functions - maybe it would have worked out of the box).
create table tree (
node char(1),
parent char(1)
);
insert into tree values ('a', null);
insert into tree values ('b', 'a');
insert into tree values ('c', 'a');
insert into tree values ('d', 'a');
insert into tree values ('e', 'b');
insert into tree values ('f', 'b');
insert into tree values ('g', 'd');
insert into tree values ('h', 'f');
insert into tree values ('i', 'g');
insert into tree values ('j', 'g');
ALTER TABLE tree ADD level int2;
--
-- node parent level
-- a - 1
-- b a a.(level + 1)
-- c a a.(level + 1)
-- e b b.(level + 1)
-- root is a:
UPDATE tree SET level = 0 WHERE node = 'a';
-- every level else is parent + 1:
UPDATE tree tout -- outer
SET level = (
SELECT ti.level + 1
FROM tree ti -- inner
WHERE tout.parent = ti.node
AND ti.level IS NOT NULL)
WHERE tout.level IS NULL;
The update statement is pure sql, and has to be repeated for every level, to fill the table up.
kram=# select * from tree;
node | parent | level
------+--------+-------
a | | 1
b | a | 2
c | a | 2
d | a | 2
e | b | 3
f | b | 3
g | d | 3
h | f | 4
i | g | 4
j | g | 4
(10 Zeilen)
I started with 'level=1', not '0' for a, therefore the difference.

SQL doesn't always handle these recursive problems very well.
Some DBMS platforms allow you to use Common Table Expressions which are effectively queries that call themselves, allowing you to recurse through a data structure. There's no support for this in MySQL, so I'd recommend you use multiple queries constructed and managed by a script written in another language.

I do not know much about databases, or their terminology, but would it work if you performed a joint product of a table with itself N times in order to find all elements at level N?
In other words, perform a query in which you search for all entries that have parent A. That will return to you a list of all its children. Then, repeat the query to find the children of each of these children. Repeat this procedure until you find all children at level N.
In this way you would not have to pre-compute the depth of each item.

Related

Azure Synapse recursive CTE alternative for flatten hierarchy

I am facing following challenge: I would like to flatten a parent child hierarchy in a way that I have per row the child + any parent of the upper levels
Source data
Child
Parent
A
B
B
C
D
E
X
Y
Y
Z
Whished result
Child
Any Parent
A
B
A
C
A
D
A
E
B
C
B
D
B
E
C
D
C
E
D
E
X
Y
X
Z
Y
Z
There is no information about levels and there is not a distinct "Top parent". It is a unbalanced tree and there might be multiple "Top 1 parents" with different childs.
I know this should normally be done via a recursive CTE, but since this is not supported in Azure Synapse I am searching for a smart loop statement to solve this puzzle. Performance is not the priority since it is in a DWH context with not that much data.
Really appreciate any smart hints or solutions
Tried several recursive CTE solutions for that which are unfortunately all not supported in Synapse
I'm not sure what the limitations are with Azure Synapse but try this a..
It's won't win any awards for elegance but does the job. I corrected what I assumed to be a missing record in your sample data too (C, D) based on your expected output.
DECLARE #t TABLE(Child varchar(10), Parent varchar(10))
INSERT INTO #t VALUES ('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'), ('X', 'Y'), ('Y', 'Z')
DECLARE #Items TABLE (Item varchar(10), Done INT)
DECLARE #Results TABLE(Child varchar(10), Parent varchar(10))
DECLARE #p varchar(10)
DECLARE #curItem varchar(10)
INSERT INTO #Items (item)
SELECT DISTINCT Child FROM #t
UNION
SELECT DISTINCT Parent FROM #t
SET #curItem = (SELECT TOP 1 Item FROM #Items WHERE Done IS NULL)
WHILE #curItem IS NOT NULL
BEGIN
SELECT #p = Parent FROM #t WHERE Child = #curItem
WHILE #p IS NOT NULL
BEGIN
INSERT INTO #Results
SELECT #curItem, Parent FROM #t WHERE Parent = #p
SET #p = (SELECT Parent FROM #t WHERE Child = #p)
END
UPDATE #Items SET Done = 1 WHERE Item = #curItem
SET #curItem = (SELECT TOP 1 Item FROM #Items WHERE Done IS NULL)
END
SELECT * FROM #Results
I tested this and it does work, tried to stick on sql fiddle but it would not run on there for some rerason.
Anyway,
Here's the results

SQL logic for the Vlookup function in excel/ How to do a Vlookup in SQL

I have a table with 2 columns OLD_VALUE and NEW_VALUE and 5 rows. 1st row has values (A,B). Other row values can be (B,C),(C,D),(E,D),(D,F). I want to update all the old values with the new value (how a vlookup in excel would work) The Final Result Required: The newest value in the above example would be D,F. i.e. D points to F. E and C point to D. B points to C and A points to B. D pointing to F is the last and newest and there are no more successions after D,F. So (OLD_VALUE,NEW_VALUE)->(A,F), (B,F), (C,F), (D,F), (E,F). I want 5 rows with the NEW_VALUE as 'F'. The level of successions can be ranging from 1 to x.
This is the table I have used for the script:
declare #t as table(old_value char(1), new_value char(1));
insert into #t values('A','B')
insert into #t values('B','C')
insert into #t values('C','D')
insert into #t values('E','D')
insert into #t values('D','F')
This needs to be done with a recursive CTE. First, you will need to define an anchor for the CTE. The anchor in this case should be the record with the latest value. This is how I define the anchor:
select old_value, new_value, 1 as level
from #t
where new_value NOT IN (select old_value from #t)
And here is the recursive CTE I used to locate the latest value for each row:
;with a as(
select old_value, new_value, 1 as level
from #t
where new_value NOT IN (select old_value from #t)
union all
select b.old_value, a.new_value, a.level + 1
from a INNER JOIN #t b ON a.old_value = b.new_value
)
select * from a
Results:
old_value new_value level
--------- --------- -----------
D F 1
C F 2
E F 2
B F 3
A F 4
(5 row(s) affected)
I think a recursive CTE like the following is what you're looking for (where the parent is the row whose second value does not exist as a first value elsewhere). If there's no parent(s) to anchor to, this would fail (e.g. if you had A->B, B->C, C->A, you'd get no result), but it should work for your case:
DECLARE #T TABLE (val1 CHAR(1), val2 CHAR(2));
INSERT #T VALUES ('A', 'B'), ('B', 'C'), ('C', 'D'), ('E', 'D'), ('D', 'F');
WITH CTE AS
(
SELECT val1, val2
FROM #T AS T
WHERE NOT EXISTS (SELECT 1 FROM #T WHERE val1 = T.val2)
UNION ALL
SELECT T.val1, CTE.val2
FROM #T AS T
JOIN CTE
ON CTE.val1 = T.val2
)
SELECT *
FROM CTE;

Find whether graph has a cycle

I want to find out whether it is possible to find cycles in Hierarchical or Chain data with SQL.
E.g. I have following schema:
http://sqlfiddle.com/#!3/27269
create table node (
id INTEGER
);
create table edges (
id INTEGER,
node_a INTEGER,
node_b INTEGER
);
create table graph (
id INTEGER,
edge_id INTEGER);
INSERT INTO node VALUES (1) , (2), (3), (4);
INSERT INTO edges VALUES (1, 1, 2), (2, 2, 3) , (3, 3, 4) , (4, 4, 1);
-- first graph [id = 1] with cycle (1 -> 2 -> 3 -> 4 -> 1)
INSERT INTO graph VALUES (1, 1), (1, 2), (1, 3), (1, 4);
-- second graph [id =2] without cycle (1 -> 2 -> 3)
INSERT INTO graph VALUES (2, 1), (2, 2), (2, 3);
In graph table records with same ID belong to one graph.
I need a query that will return IDs of all graphs that have at least one cycle.
So for example above query should return 1, which is the id of the first graph;
First, I assume this is a directed graph. An undirected graph has a trivial cycle if it contains a single edge.
The only tricky part to the recursive CTE is stopping when you've hit a cycle -- so you don't get infinite recursion.
Try this:
with cte as (
select e.object_a, e.object_b, iscycle = 0
from edges e
union all
select cte.object_a, e.object_b,
(case when cte.object_a = e.object_b then 1 else 0 end) as iscycle
from cte join
edges e
on cte.object_b = e.object_a
where iscycle = 0
)
select max(iscycle)
from cte;
I wrote SQL query based on #gordon-linoff answer. In some cases I had infinite loop, so I added column with node_path and then I was checking if the current connection had appeared in that column.
This is this script:
create table edges (
node_a varchar(20),
node_b varchar(20)
);
INSERT INTO edges VALUES ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'D'), ('D', 'K'), ('K', 'A')
GO
with cte as (
SELECT
e.node_a
, e.node_b
, 0 as depth
, iscycle = 0
, CAST(e.node_a +' -> '+ e.node_b AS varchar(MAX)) as nodes_path
FROM edges e
UNION ALL
SELECT
cte.node_a
, e.node_b
, depth + 1
, (case when cte.node_a = e.node_b then 1 else 0 end) as iscycle
, CAST(cte.nodes_path+' -> '+ e.node_b AS varchar(MAX)) as nodes_path
FROM cte
JOIN edges e ON cte.node_b = e.node_a AND cte.nodes_path NOT LIKE '%' + CAST(cte.node_a+' -> '+ e.node_b AS varchar(500)) + '%'
where iscycle = 0
)
SELECT * -- max(iscycle)
FROM cte
option (maxrecursion 300) --just for safety :)
I don't know if it is efficient where are millions of records, so if you can see that I could write this query more optimized, please share with your opinion.

How do you order by column within a row in sql?

Say I have a table that looks something like this
ATT A | ATT B
-------------
A | B
D | C
E | F
H | G
and instead I want a table that looks like:
ATT A | ATT B
-------------
A | B
C | D
E | F
G | H
How would I go about doing that? I'm using SQLite.
SQL Server has a case .. end expression that you can use inline in the SQL statement. I think SQLite uses the same syntax. This does what you asked for in SQL Server:
create table #temp (AttA char(1), AttB char(1))
insert into #temp valueS ('A', 'B'), ('D', 'C'), ('E', 'F'), ('H', 'G')
select * from #temp
select case when AttA < AttB then AttA else AttB end as AttA,
case when AttB > AttA then AttB else AttA end as AttB
from #temp
drop table #temp
Supposing the columns have the same type and you just have 2 columns to reorder, you can rely on the min/max functions. With sqlite, when provided with multiple parameters, these functions do not aggregate data from multiple rows.
create table mytable( ATTA char, ATTB char );
insert into mytable values ('A', 'B');
insert into mytable values ('D', 'C');
insert into mytable values ('E', 'F');
insert into mytable values ('H', 'G') ;
select min(ATTA,ATTB),max(ATTA,ATTB) from mytable order by 1,2 ;
A|B
C|D
E|F
G|H
The min/max functions reorder the values in the columns. The order by clause reorders the rows. I think it cannot be generalized to more than 2 columns, except by writing a user defined C function to be called from sqlite.

Multiple parents tree (or digraph) implementation sql server 2005

I need to implement a multi-parented tree (or digraph) onto SQL Server 2005.
I've read several articles, but most of them uses single-parented trees with a unique root like the following one.
-My PC
-Drive C
-Documents and Settings
-Program Files
-Adobe
-Microsoft
-Folder X
-Drive D
-Folder Y
-Folder Z
In this one, everything derives from a root element (My PC).
In my case, a child could have more than 1 parent, like the following:
G A
\ /
B
/ \
X C
/ \
D E
\ /
F
So I have the following code:
create table #ObjectRelations
(
Id varchar(20),
NextId varchar(20)
)
insert into #ObjectRelations values ('G', 'B')
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('B', 'X')
insert into #ObjectRelations values ('C', 'E')
insert into #ObjectRelations values ('C', 'D')
insert into #ObjectRelations values ('E', 'F')
insert into #ObjectRelations values ('D', 'F')
declare #id varchar(20)
set #id = 'A';
WITH Objects (Id, NextId) AS
( -- This is the 'Anchor' or starting point of the recursive query
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
WHERE rel.Id = #id
UNION ALL -- This is the recursive portion of the query
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
INNER JOIN Objects -- Note the reference to CTE table name (Recursive Join)
ON rel.Id = Objects.NextId
)
SELECT o.*
FROM Objects o
drop table #ObjectRelations
Which returns the following SET:
Id NextId
-------------------- --------------------
A B
B C
B X
C E
C D
D F
E F
Expected result SET:
Id NextId
-------------------- --------------------
G B
A B
B C
B X
C E
C D
D F
E F
Note that the relation G->B is missing, because it asks for an starting object (which doesn't work for me also, because I don't know the root object from the start) and using A as the start point will ignore the G->B relationship.
So, this code doesn't work in my case because it asks for a starting object, which is obvious in a SINGLE-parent tree (will always be the root object). But in multi-parent tree, you could have more than 1 "root" object (like in the example, G and A are the "root" objects, where root is an object which doesn't have a parent (ancestor)).
So I'm kind of stucked in here... I need to modify the query to NOT ask for a starting object and recursively traverse the entire tree.
I don't know if that's possible with the (Id, NextId) implementation... may be I need to store it like a graph using some kind of Incidence matrix, adjacency matrix or whatever (see http://willets.org/sqlgraphs.html).
Any help? What do you think guys?
Thank you very much for your time =)
Cheers!
Sources:
Source 1
Source 2
Source 3
Well, I finally came up with the following solution.
It's the way I found to support multi-root trees and also cycling digraphs.
create table #ObjectRelations
(
Id varchar(20),
NextId varchar(20)
)
/* Cycle */
/*
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('C', 'A')
*/
/* Multi root */
insert into #ObjectRelations values ('G', 'B')
insert into #ObjectRelations values ('A', 'B')
insert into #ObjectRelations values ('B', 'C')
insert into #ObjectRelations values ('B', 'X')
insert into #ObjectRelations values ('C', 'E')
insert into #ObjectRelations values ('C', 'D')
insert into #ObjectRelations values ('E', 'F')
insert into #ObjectRelations values ('D', 'F')
declare #startIds table
(
Id varchar(20) primary key
)
;WITH
Ids (Id) AS
(
SELECT Id
FROM #ObjectRelations
),
NextIds (Id) AS
(
SELECT NextId
FROM #ObjectRelations
)
INSERT INTO #startIds
/* This select will not return anything since there are not objects without predecessor, because it's a cyclic of course */
SELECT DISTINCT
Ids.Id
FROM
Ids
LEFT JOIN
NextIds on Ids.Id = NextIds.Id
WHERE
NextIds.Id IS NULL
UNION
/* So let's just pick anyone. (the way I will be getting the starting object for a cyclic doesn't matter for the regarding problem)*/
SELECT TOP 1 Id FROM Ids
;WITH Objects (Id, NextId, [Level], Way) AS
( -- This is the 'Anchor' or starting point of the recursive query
SELECT rel.Id,
rel.NextId,
1,
CAST(rel.Id as VARCHAR(MAX))
FROM #ObjectRelations rel
WHERE rel.Id IN (SELECT Id FROM #startIds)
UNION ALL -- This is the recursive portion of the query
SELECT rel.Id,
rel.NextId,
[Level] + 1,
RecObjects.Way + ', ' + rel.Id
FROM #ObjectRelations rel
INNER JOIN Objects RecObjects -- Note the reference to CTE table name (Recursive Join)
ON rel.Id = RecObjects.NextId
WHERE RecObjects.Way NOT LIKE '%' + rel.Id + '%'
)
SELECT DISTINCT
Id,
NextId,
[Level]
FROM Objects
ORDER BY [Level]
drop table #ObjectRelations
Could be useful for somebody. It is for me =P
Thanks
If you want to use all root objects as starting objects, you should first update your data to include information about the root objects (and the leaves). You should add the following inserts:
insert into #ObjectRelations values (NULL, 'G')
insert into #ObjectRelations values (NULL, 'A')
insert into #ObjectRelations values ('X', NULL)
insert into #ObjectRelations values ('F', NULL)
Of course you could also write your anchor query in such a way that you select as root nodes the records that have an Id that does not occur as a NextId, but this is easier.
Next, modify your anchor query to look like this:
SELECT rel.Id,
rel.NextId
FROM #ObjectRelations rel
WHERE rel.Id IS NULL
If you run this query, you'll see that you get a lot of duplicates, a lot of arcs occur multiple times. This is because you now have two results from your anchor query and therefore the tree is traversed two times.
This can be fixed by changing your select statement to this (note the DISTINCT):
SELECT DISTINCT o.*
FROM Objects o
If you dont want to do the inserts suggested by Ronald,this would do!.
WITH CTE_MultiParent (ID, ParentID)
AS
(
SELECT ID, ParentID FROM #ObjectRelations
WHERE ID NOT IN
(
SELECT DISTINCT ParentID FROM #ObjectRelations
)
UNION ALL
SELECT ObjR.ID, ObjR.ParentID FROM #ObjectRelations ObjR INNER JOIN CTE_MultiParent
ON CTE_MultiParent.ParentID = ObjR.Id
)
SELECT DISTINCT * FROM CTE_MultiParent