SQL - How to store and navigate hierarchies? - sql

What are the ways that you use to model and retrieve hierarchical info in a database?

I like the Modified Preorder Tree Traversal Algorithm. This technique makes it very easy to query the tree.
But here is a list of links about the topic which I copied from the Zend Framework (PHP) contributors webpage (posted there by Posted by Laurent Melmoux at Jun 05, 2007 15:52).
Many of the links are language agnostic:
There is 2 main representations and algorithms to represent hierarchical structures with databases :
nested set also known as modified preorder tree traversal algorithm
adjacency list model
It's well explained here:
http://www.sitepoint.com/article/hierarchical-data-database
Managing Hierarchical Data in MySQL
http://www.evolt.org/article/Four_ways_to_work_with_hierarchical_data/17/4047/index.html
Here are some more links that I've collected:
http://en.wikipedia.org/wiki/Tree_%28data_structure%29
http://en.wikipedia.org/wiki/Category:Trees_%28structure%29
adjacency list model
http://www.sqlteam.com/item.asp?ItemID=8866
nested set
http://www.sqlsummit.com/AdjacencyList.htm
http://www.edutech.ch/contribution/nstrees/index.php
http://www.phpriot.com/d/articles/php/application-design/nested-trees-1/
http://www.dbmsmag.com/9604d06.html
http://en.wikipedia.org/wiki/Tree_traversal
http://www.cosc.canterbury.ac.nz/mukundan/dsal/BTree.html (applet java montrant le fonctionnement )
Graphes
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
Classes :
Nested Sets DB Tree Adodb
http://www.phpclasses.org/browse/package/2547.html
Visitation Model ADOdb
http://www.phpclasses.org/browse/package/2919.html
PEAR::DB_NestedSet
http://pear.php.net/package/DB_NestedSet
utilisation : https://www.entwickler.com/itr/kolumnen/psecom,id,26,nodeid,207.html
PEAR::Tree
http://pear.php.net/package/Tree/download/0.3.0/
http://www.phpkitchen.com/index.php?/archives/337-PEARTree-Tutorial.html
nstrees
http://www.edutech.ch/contribution/nstrees/index.php

The definitive pieces on this subject have been written by Joe Celko, and he has worked a number of them into a book called Joe Celko's Trees and Hierarchies in SQL for Smarties.
He favours a technique called directed graphs. An introduction to his work on this subject can be found here

What's the best way to represent a hierachy in a SQL database? A generic, portable technique?
Let's assume the hierachy is mostly read, but isn't completely static. Let's say it's a family tree.
Here's how not to do it:
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date,
mother integer,
father integer
);
And inserting data like this:
person_id name dob mother father
1 Pops 1900/1/1 null null
2 Grandma 1903/2/4 null null
3 Dad 1925/4/2 2 1
4 Uncle Kev 1927/3/3 2 1
5 Cuz Dave 1953/7/8 null 4
6 Billy 1954/8/1 null 3
Instead, split your nodes and your relationships into two tables.
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date
);
create table ancestor (
ancestor_id integer,
descendant_id integer,
distance integer
);
Data is created like this:
person_id name dob
1 Pops 1900/1/1
2 Grandma 1903/2/4
3 Dad 1925/4/2
4 Uncle Kev 1927/3/3
5 Cuz Dave 1953/7/8
6 Billy 1954/8/1
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
1 3 1
2 3 1
1 4 1
2 4 1
1 5 2
2 5 2
4 5 1
1 6 2
2 6 2
3 6 1
you can now run arbitary queries that don't involve joining the table back on itself, which would happen if you have the heirachy relationship in the same row as the node.
Who has grandparents?
select * from person where person_id in
(select descendant_id from ancestor where distance=2);
All your descendants:
select * from person where person_id in
(select descendant_id from ancestor
where ancestor_id=1 and distance>0);
Who are uncles?
select decendant_id uncle from ancestor
where distance=1 and ancestor_id in
(select ancestor_id from ancestor
where distance=2 and not exists
(select ancestor_id from ancestor
where distance=1 and ancestor_id=uncle)
)
You avoid all the problems of joining a table to itself via subqueries, a common limitation is 16 subsuqeries.
Trouble is, maintaining the ancestor table is kind of hard - best done with a stored procedure.

I've got to disagree with Josh. What happens if you're using a huge hierarchical structure like a company organization. People can join/leave the company, change reporting lines, etc... Maintaining the "distance" would be a big problem and you would have to maintain two tables of data.
This query (SQL Server 2005 and above) would let you see the complete line of any person AND calculates their place in the hierarchy and it only requires a single table of user information. It can be modified to find any child relationship.
--Create table of dummy data
create table #person (
personID integer IDENTITY(1,1) NOT NULL,
name varchar(255) not null,
dob date,
father integer
);
INSERT INTO #person(name,dob,father)Values('Pops','1900/1/1',NULL);
INSERT INTO #person(name,dob,father)Values('Grandma','1903/2/4',null);
INSERT INTO #person(name,dob,father)Values('Dad','1925/4/2',1);
INSERT INTO #person(name,dob,father)Values('Uncle Kev','1927/3/3',1);
INSERT INTO #person(name,dob,father)Values('Cuz Dave','1953/7/8',4);
INSERT INTO #person(name,dob,father)Values('Billy','1954/8/1',3);
DECLARE #OldestPerson INT;
SET #OldestPerson = 1; -- Set this value to the ID of the oldest person in the family
WITH PersonHierarchy (personID,Name,dob,father, HierarchyLevel) AS
(
SELECT
personID
,Name
,dob
,father,
1 as HierarchyLevel
FROM #person
WHERE personID = #OldestPerson
UNION ALL
SELECT
e.personID,
e.Name,
e.dob,
e.father,
eh.HierarchyLevel + 1 AS HierarchyLevel
FROM #person e
INNER JOIN PersonHierarchy eh ON
e.father = eh.personID
)
SELECT *
FROM PersonHierarchy
ORDER BY HierarchyLevel, father;
DROP TABLE #person;

FYI: SQL Server 2008 introduces a new HierarchyID data type for this sort of situation. Gives you control over where in the "tree" your row sits, horizontally as well as vertically.

Oracle: SELECT ... START WITH ... CONNECT BY
Oracle has an extension to SELECT that allows easy tree-based retrieval. Perhaps SQL Server has some similar extension?
This query will traverse a table where the nesting relationship is stored in parent and child columns.
select * from my_table
start with parent = :TOP
connect by prior child = parent;
http://www.adp-gmbh.ch/ora/sql/connect_by.html

I prefer a mix of the techinques used by Josh and Mark Harrison:
Two tables, one with the data of the Person and other with the hierarchichal info (person_id, parent_id [, mother_id]) if the PK of this table is person_id, you have a simple tree with only one parent by node (which makes sense in this case, but not in other cases like accounting accounts)
This hiarchy table can be transversed by recursive procedures or if your DB supports it by sentences like SELECT... BY PRIOR (Oracle).
Other posibility is if you know the max deep of the hierarchy data you want to mantain is use a single table with a set of columns per level of hierarchy

We had the same issue when we implemented a tree component for [fleXive] and used the nested set tree model approach mentioned by tharkun from the MySQL docs.
In addition to speed things (dramatically) up we used a spreaded approach which simply means we used the maximum Long value for the top level right bounds which allows us to insert and move nodes without recalculating all left and right values. Values for left and right are calculated by dividing the range for a node by 3 und use the inner third as bounds for the new node.
A java code example can be seen here.

If you're using SQL Server 2005 then this link explains how to retrieve hierarchical data.
Common Table Expressions (CTEs) can be your friends once you get comfortable using them.

Related

Data structure for many to many hierarchies in SQL Server

I have the following data structure already in the system.
ItemDetails:
ID Name
--------
1 XXX
2 YYY
3 ZZZ
4 TTT
5 UUU
6 WWW
And the hierarchies are in separate table (with many to many relationships)
ItemHierarchy:
ParentCode ChildCode
--------------------
1 2
1 3
3 4
4 5
5 3
5 6
As you can see that 3 is child node for 1 and 3. I want to traverse records say for example that from the node 3.
I need to write a stored procedure and get all the ancestors of 3 and all the child nodes of 3.
Could you please let me know whether any possibilities to pull the data? If so, which data structure is OK for it.
Please note that my table is containing 1 million records and out of it 40% are having multiple hierarchies.
I did 'CTE' with level and incrementing it based upon the hierarchy but I'm getting max recursive error when we traverse from root to leaf level node. I have tried 'HierarchyID' but unable to get all the details when its having multiple parent for a node.
Update: I can set a recursion limit to max and run the query. Since it has millions of rows, I'm unable to get the output at all.
I want to create a data structure such that its capable to giving information from top to bottom or bottom to top (at any node level).
Could someone kindly please help me with that?
Using RDBMS for hierarchical data structure is not recommended, its why graph database have been created.
BTW following Closure Table pattern will help you.
The Closure Table solution is a simple and elegant way of storing hierarchies. It involves storing all paths through the tree, not just those with a direct parent-child relationship.
The key point to use the pattern is how you must fill ItemHierarchy table.
Store one row in this table for each pair of nodes in the tree that shares an ancestor/descendant relationship, even if they are separated by multiple levels in the tree. Also add a row for each node to reference itself.
Think we have a simple graph like bellow:
The doted arrows shows the rows in ItemHierarchy table:
To retrieve descendants of #3:
SELECT c.*
FROM ItemDetails AS ID
JOIN ItemHierarchy AS IH ON ID.ID = IH.ChildCode
WHERE IH.ParentCode = 3;
To retrieve ancestors of #3:
SELECT c.*
FROM ItemDetails AS ID
JOIN ItemHierarchy AS IH ON ID.ID = IH.ParentCode
WHERE IH.ChildCode = 3;
To insert a new leaf node, for instance a new child of #5, first
insert the self-referencing row. Then add a copy of the set of rows in
TreePaths that reference comment #5 as a descendant (including the row
in which #5 references itself), replacing the descendant with
the number of the new item:
INSERT INTO ItemHierarchy (parentCode, childCode)
SELECT IH.parentCode, 8
FROM ItemHierarchy AS IH
WHERE IH.childCode = 5
UNION ALL
SELECT 8, 8;
To delete a complete sub-tree, for instance #4 and its descendants, delete all rows in ItemHierarchy that reference #4 as a
descendant, as well as all rows that reference any of #4’s
descendants as descendants:
DELETE FROM ItemHierarchy
WHERE chidCode IN (SELECT childCode
FROM ItemHierarchy
WHERE parrentCode = 4);
UPDATE
Since the sample data you have shown us leads to recursive loops(not hierarchies) like:
1 -> 3 -> 4 -> 5 -> 3 -> 4 -> 5
Following Path Enumeration pattern will help you.
A UNIX path like /usr/local/lib/ is a path enumeration of the file system,
where usr is the parent of local, which in turn is the parent of lib.
You can create a Table or View from ItemHierarchy table, calling it EnumPath:
Table EnumPath(NodeCode, Path) For the sample data we will have:
To find ancestors of node #4:
select distinct E1.NodeCode from EnumPath E1
inner join EnumPath E2
On E2.path like E1.path || '%'
where E2.NodeCode = 4 and E1.NodeCode != 4;
To find descendants of node #4:
select distinct E1.NodeCode from EnumPath E1
inner join EnumPath E2
On E1.path like E2.path || '%'
where E2.NodeCode = 4 and E1.NodeCode != 4;
Sqlfiddle demo

Simple SQL statement with parent/child hierarchies

It's been a while since I've needed to write SQL statements (and I don't even know if ever had enough knowledge to make this statement).
So, here's the deal. Table has two column. One is for parent id, other is for child Id.
parent_id | child_id
4 | 2
2 | 5
This is simply for saving composite parent/child hierarchies.
4, 2 line means that structure with id 4 refers to structure id 2 as a child.
2, 5 means structure with id 2 refers to structure with id 5 as a child.
And so on.
This is what I need to do:
I need to extract ALL structures, that are not referenced by any structure as a child (root structures).
What SQL (preferrably postgres) statement will accomplish that?
Finding all structures that are not a child of another structure:
select *
from YourTable
where Parent_Id not in (Select child_id from ...)
Assuming there is no scope for a grandparent, great-grandparent relationsips, I would recommend use a Left-JOIN in this case.
Somethink on the lines of:
Select * from Table
LEFt join Table on Parent_id=child_id
WHERE child_id is null
SELECT *
FROM structures
WHERE id not in ( SELECT child_id FROM Table ) AS dummy

Building a hierarchical tree with a single SQL query

I have a SQL table with the following structure.
id - int
par - int (relational to id)
name - varchar
Column par contains references to id or NULL if no reference, this table is meant to build an hierarchical tree.
Then, given the data:
id par name
1 NULL John
2 NULL Mario
3 1 George
4 3 Alfred
5 4 Nicole
6 2 Margaret
I want to retrieve a hierarchical tree, up to the last parent, from a given single id.
Example, I want to know the tree from Nicole to the last parent. So the query result will be:
id par name
5 4 Nicole
4 3 Alfred
3 1 George
1 NULL John
I would normally do this with a SQL query repeating over and over and building the tree server side but I do not want that now.
Is there any way to achieve this with a single SQL query?
I need this for either MySQL or PgSQL.
And I want to know also, if possible, is it also widely supported? In which versions of either MySQL or PgSQL can I expect support?
It is possible with a single query in Postgres using a recursive common table expression. This is not possible in MySQL as it is one of the few database to not support recursive CTEs.
It would look something like this (not tested)
WITH RECURSIVE tree (id, par, name) AS (
SELECT id, par, name
FROM the_table
WHERE name = 'Nicole'
UNION ALL
SELECT id, par, name
FROM the_table tt
JOIN tree tr ON tr.id = tt.par
)
SELECT *
FROM tree
For Postgres, see http://www.postgresql.org/docs/8.4/static/queries-with.html
MySQL doesn't support this syntax (unless it's in a beta/development tree somewhere). Oracle has something similar using connect by prior.
This article is probably what you need to look at:
http://explainextended.com/2009/03/17/hierarchical-queries-in-mysql/
In Oracle, this is done via:
SELECT [[LEVEL,]] id, par, name FROM my_table
START WITH name = 'Nicole'
CONNECT BY [[NOCYCLE]] id = PRIOR par
[[ORDER SIBLINGS BY name ASC]]
(my [[…]] syntax denotes optional query bits.
MySQL is planning to integrate such a feature. For PostgreSQL there is another answer helping you.

database model structure

I have a column groups. Groups has different type stored in group_types (buyers, sellers, referee). Only when the group is of type buyer it has another type (more specialized) like electrical and mechanical.
I'm a bit puzzled with how I will store this in a database.
Someone can suggest me a database structure?
thanks
Store your group_types as a hieararchical table (with nested sets or parent-child model):
Parent-child:
typeid parent name
1 0 Buyers
2 0 Sellers
3 0 Referee
4 1 Electrical
5 1 Mechanic
SELECT *
FROM mytable
WHERE group IN
(
SELECT typeid
FROM group_types
START WITH
typeid = 1
CONNECT BY
parent = PRIOR typeid
)
will select all buyers in Oracle.
Nested sets:
typeid lower upper Name
1 1 2 Buyers
2 3 3 Sellers
3 4 4 Referee
4 1 1 Electrical
5 2 2 Mechanic
SELECT *
FROM group_types
JOIN mytable
ON group BETWEEN lower AND upper
WHERE typeid = 1
will select all buyers in any database.
Nested sets is implementable anywhere and more performant, if you don't need hierarchical ordering or frequent updates on group_types.
Parent-child is implementable easily in Oracle and SQL Server and with a little effort in MySQL. It allow easy structure changing and hierarchical ordering.
See this article in my blog on how to implement it in MySQL:
Hierarchical queries in MySQL
You could possibly store additional types like, buyer_mechanical or buyer_electrical.
You could try:
Group
group_id
group_name
group_parent_id
with entries (1, buyers, 0), (2, sellers, 0), (3, referee, 0), (4, electrical, 1), (5, mechanical, 1)
This has the advantage of being infinitely scalable, so each subgroup can have as many subgroups as you want.
Typically, you have extension tables. These are simply additional tables in your schema which hold additional information linked to the main table by some type of key
For example let's say your main table is:
People
PersonId int, PK
GroupTypeId int, FK to GroupTypes
Name varchar(100)
GroupTypes
GroupTypeId int, PK
GroupTypeName varchar(20)
BuyerTypes
BuyerTypeId int, PK
BuyerTypeName varchar(20)
BuyerData
PersonId int, FK
BuyerTypeId int FK
====
Additionally, the BuyerData would have a composite primary key (PK) on PersonId and BuyerTypeId
When pulling Buyer data out, you could use a query like
SELECT *
FROM People P
INNER JOIN BuyerData BD on (P.PersonId = BD.PersonId)
INNER JOIN BuyerTypes BT on (BD.BuyerTypeId = BT.BuyerTypeId)
grouptype: ID, Name ('buyers', 'sellers', 'referee')
group: GroupTypeID, ID, Name ('electrical' and 'mechanical' if grouptypeid == 'buyers')
contact: GroupTypeID (NOT NULL), GroupID (NULL), other attributes
Table Group is populated with records for GroupTypes as required.
Contact.GroupID can be NULL since a GroupType need not have any Groups.
UI has to take care of Group selection. You can have a trigger check the group/type logic.

Loop through without Cursor in SQL Server 2005

I have a table OrganisationStructure like this:
OrganisationID INT
ParentOrganisationID INT
OrganisationName VARCHAR(64)
1 | 0 | Company
2 | 1 | IT DIVISION
3 | 2 | IT SYSTEM BUSINESS UNIT
4 | 1 | MARKETING DIVISION
5 | 4 | DONATION BUSINESS UNIT
I want to have a query that if the app passing let say OrganisatinID = 1 means that it will loop (looking at parent/child) through till end of this table and grap all possible Returned OrganisatioIDs = (1, 2, 3, 4, 5).
Other if passing OrganisationID = 2 then Returned OrganisationID = (2, 3)
Other if passing OrganisationID = 3 then Returned OrganisationID = 3
Any ideas to do this without cursor?
Thanks
You can use SQL 2005 CTEs to make the SQL engine do it recursively.
An enumeration of basic approaches is at http://blogs.msdn.com/anthonybloesch/archive/2006/02/15/Hierarchies-in-SQL-Server-2005.aspx
Celko also has a trees in SQL book which covers all of this to the nth degree.
Or you can brute force it by selecting each level into a local table variable and then looping, inserting children with a select, until your ##ROWCOUNT is zero (i.e., you're not finding any more children). If you don't have a lot of data, this is easy to code, but you hinted that you're looking for performance by saying you dont want a cursor.
declare #rootID int;
select #rootID = 4;
with cte_anchor as (
SELECT OrganisationID
, ParentOrganisationID
, OrganisationName
FROM Organisation
WHERE OrganisationID = #rootID)
, cte_recursive as (
SELECT OrganisationID
, ParentOrganisationID
, OrganisationName
FROM cte_anchor
UNION ALL
SELECT o.OrganisationID
, o.ParentOrganisationID
, o.OrganisationName
FROM Organisation o JOIN cte_recursive r
ON o.ParentOrganisationID = r.OrganisationID)
SELECT * FROM cte_recursive
In SqlServer 2005 with Common Table Expressions is possible to do recursive queries. For an example see 'Recursive Common Table Expressions' in Common Table Expressions (CTE) in SQL Server 2005 from 4guysfromrolla.
How many levels deep can your parent child structure go ?
You could do a self-join on the table to line up grand-parent / parent / child entities, but that's limited by the number of levels deep your parent/child relationships can go.
I know you've stated SQL 2005 but just so you're aware this kind of tree structure mapping is exactly what the new HierarchyID (Video Here) in Sql 2008 is for.
Try this for 3 levels using plain vanilla simple brute force - you can add levels as required.
SELECT DISTINCT OrganizationID
FROM
(
SELECT
ParentOrganizationID
FROM OrganizationStructure
WHERE ParentOrganizationID = #arg
UNION ALL
SELECT
OrganizationID
FROM OrganizationStructure
WHERE ParentOrganizationID = #arg
UNION ALL
SELECT os2.OrganizationID
FROM OrganizationStructure os
JOIN OrganizationStructure os2 ON os.OrganizationID = is2.ParentOrganizationID
WHERE os.ParentOrganizationID = #arg
) data
I believe the question is answered well enough, however if you're interested in alternative methods of structuring your data for better effect, google for 'evolt ways to work with hierarchical data'
I'm not allowed to post links yet :)