Data structure for many to many hierarchies in SQL Server - sql

I have the following data structure already in the system.
ItemDetails:
ID Name
--------
1 XXX
2 YYY
3 ZZZ
4 TTT
5 UUU
6 WWW
And the hierarchies are in separate table (with many to many relationships)
ItemHierarchy:
ParentCode ChildCode
--------------------
1 2
1 3
3 4
4 5
5 3
5 6
As you can see that 3 is child node for 1 and 3. I want to traverse records say for example that from the node 3.
I need to write a stored procedure and get all the ancestors of 3 and all the child nodes of 3.
Could you please let me know whether any possibilities to pull the data? If so, which data structure is OK for it.
Please note that my table is containing 1 million records and out of it 40% are having multiple hierarchies.
I did 'CTE' with level and incrementing it based upon the hierarchy but I'm getting max recursive error when we traverse from root to leaf level node. I have tried 'HierarchyID' but unable to get all the details when its having multiple parent for a node.
Update: I can set a recursion limit to max and run the query. Since it has millions of rows, I'm unable to get the output at all.
I want to create a data structure such that its capable to giving information from top to bottom or bottom to top (at any node level).
Could someone kindly please help me with that?

Using RDBMS for hierarchical data structure is not recommended, its why graph database have been created.
BTW following Closure Table pattern will help you.
The Closure Table solution is a simple and elegant way of storing hierarchies. It involves storing all paths through the tree, not just those with a direct parent-child relationship.
The key point to use the pattern is how you must fill ItemHierarchy table.
Store one row in this table for each pair of nodes in the tree that shares an ancestor/descendant relationship, even if they are separated by multiple levels in the tree. Also add a row for each node to reference itself.
Think we have a simple graph like bellow:
The doted arrows shows the rows in ItemHierarchy table:
To retrieve descendants of #3:
SELECT c.*
FROM ItemDetails AS ID
JOIN ItemHierarchy AS IH ON ID.ID = IH.ChildCode
WHERE IH.ParentCode = 3;
To retrieve ancestors of #3:
SELECT c.*
FROM ItemDetails AS ID
JOIN ItemHierarchy AS IH ON ID.ID = IH.ParentCode
WHERE IH.ChildCode = 3;
To insert a new leaf node, for instance a new child of #5, first
insert the self-referencing row. Then add a copy of the set of rows in
TreePaths that reference comment #5 as a descendant (including the row
in which #5 references itself), replacing the descendant with
the number of the new item:
INSERT INTO ItemHierarchy (parentCode, childCode)
SELECT IH.parentCode, 8
FROM ItemHierarchy AS IH
WHERE IH.childCode = 5
UNION ALL
SELECT 8, 8;
To delete a complete sub-tree, for instance #4 and its descendants, delete all rows in ItemHierarchy that reference #4 as a
descendant, as well as all rows that reference any of #4’s
descendants as descendants:
DELETE FROM ItemHierarchy
WHERE chidCode IN (SELECT childCode
FROM ItemHierarchy
WHERE parrentCode = 4);
UPDATE
Since the sample data you have shown us leads to recursive loops(not hierarchies) like:
1 -> 3 -> 4 -> 5 -> 3 -> 4 -> 5
Following Path Enumeration pattern will help you.
A UNIX path like /usr/local/lib/ is a path enumeration of the file system,
where usr is the parent of local, which in turn is the parent of lib.
You can create a Table or View from ItemHierarchy table, calling it EnumPath:
Table EnumPath(NodeCode, Path) For the sample data we will have:
To find ancestors of node #4:
select distinct E1.NodeCode from EnumPath E1
inner join EnumPath E2
On E2.path like E1.path || '%'
where E2.NodeCode = 4 and E1.NodeCode != 4;
To find descendants of node #4:
select distinct E1.NodeCode from EnumPath E1
inner join EnumPath E2
On E1.path like E2.path || '%'
where E2.NodeCode = 4 and E1.NodeCode != 4;
Sqlfiddle demo

Related

WITH RECURSIVE SELECT via secondary table

I'm having a bit of a hard time trying to piece this together. I'm not adept with databases or complex queries.
The Database
I'm using the latest MariaDB release.
I have a database table configuration like so, representing a hierarchical data structure:
|----------------------|
| fieldsets |
|----+-----------------|
| id | parent_field_id |
|----+-----------------|
| 1 | NULL |
| 2 | 1 |
|----------------------|
|-------------------------|
| fields |
|----+--------------------|
| id | parent_fieldset_id |
|----+--------------------|
| 1 | 1 |
| 2 | 1 |
|-------------------------|
The Problem
I'm trying to piece together a recursive query. I need to select every fieldset in a given hierarchy. For example, in the above, stripped-down example, I want to select fieldset of id = 1, and every descendant fieldset.
The IDs of the next rung down in any given level in the hierarchy are obtained only via columns of a secondary table.
The table fieldsets contains no column by which I can directly get all child fieldsets. I need to get all fields that are a child of a given fieldset, and then get any fieldsets that are a child of that field.
A Better Illustration of the Problem
This query does not work because of the reported error: "Restrictions imposed on recursive definitions are violated for table all_fieldsets"
However, it really illustrates what I need to do in order to get all descendant fieldsets in the hierarchy (remember, a fieldset does not contain the column for its parent fieldset, since a fieldset cannot have a fieldset as a direct parent. Instead, a fieldset has a parent_field_id which points to a row in the fields table, and that row in the fields table correspondingly has a column named parent_fieldset_id which points to a row back in the fieldsets table, which is considered the parent fieldset to a fieldset, just an indirect parent.
WITH RECURSIVE all_fieldsets AS (
SELECT fieldsets.* FROM fieldsets WHERE id = 125
UNION ALL
SELECT fieldsets.* FROM fieldsets
WHERE fieldsets.parent_field_id IN (
SELECT id FROM fields f
INNER JOIN all_fieldsets afs
WHERE f.parent_fieldset_id = afs.id
)
)
SELECT * FROM all_fieldsets
My Attempt
The query I have thus far (which does not work):
WITH RECURSIVE all_fieldsets AS (
SELECT fieldsets.* FROM fieldsets WHERE id = 125
UNION
SELECT fieldsets.* FROM fieldsets WHERE fieldsets.id IN (SELECT fs.id FROM fieldsets fs LEFT JOIN fields f ON f.id = fs.parent_field_id WHERE f.parent_fieldset_id = fieldsets.id)
)
SELECT * FROM all_fieldsets
My Research
I'm also having a hard time finding an example which fits my use-case. There's so many results for hierarchical structures that involve one table having only relations to itself, not via a secondary table, as in my case. It's difficult when you do not know the correct terms for certain concepts, and any layman explanation seems to yield too many tangential search results.
My Plea
I would be enormously grateful to all who can point out where I'm going wrong, and perhaps suggest the outline of a query that will work.
The main problem I see with your current code is that the recursive portion of the CTE (the query which appears after the union) is not selecting from the recursive CTE, when it should be. Consider this updated version:
WITH RECURSIVE all_fieldsets AS (
SELECT * FROM fieldsets WHERE id = 125
UNION ALL
SELECT f1.*
FROM fieldsets f1
INNER JOIN all_fieldsets f2
ON f1.parent_field_id = f2.id
)
SELECT *
FROM all_fieldsets;
Note that the join in the recursive portion of the CTE relates a given descendant record in fieldsets to its parent in the CTE.
I got home from work, and I just could not set this down!
But, out of that came a solution.
I highly recommend reading this answer about recursive queries to get a better idea of how they work, and what the syntax means. Quite brilliantly explained: How to select using WITH RECURSIVE clause
The Solution
WITH RECURSIVE all_fieldsets AS (
SELECT * FROM fieldsets fs
WHERE id = 59
UNION ALL
SELECT fs.* FROM fieldsets fs
INNER JOIN all_fieldsets afs
INNER JOIN fields f
ON f.parent_fieldset_id = afs.id
AND fs.parent_field_id = f.id
)
SELECT * FROM all_fieldsets
I had to use joins to get the information from the fields table, in order to get the next level in the hierarchy, and then do this recursively until there is an empty result in the recursive query.

Simple SQL statement with parent/child hierarchies

It's been a while since I've needed to write SQL statements (and I don't even know if ever had enough knowledge to make this statement).
So, here's the deal. Table has two column. One is for parent id, other is for child Id.
parent_id | child_id
4 | 2
2 | 5
This is simply for saving composite parent/child hierarchies.
4, 2 line means that structure with id 4 refers to structure id 2 as a child.
2, 5 means structure with id 2 refers to structure with id 5 as a child.
And so on.
This is what I need to do:
I need to extract ALL structures, that are not referenced by any structure as a child (root structures).
What SQL (preferrably postgres) statement will accomplish that?
Finding all structures that are not a child of another structure:
select *
from YourTable
where Parent_Id not in (Select child_id from ...)
Assuming there is no scope for a grandparent, great-grandparent relationsips, I would recommend use a Left-JOIN in this case.
Somethink on the lines of:
Select * from Table
LEFt join Table on Parent_id=child_id
WHERE child_id is null
SELECT *
FROM structures
WHERE id not in ( SELECT child_id FROM Table ) AS dummy

Selecting only rows containing at least the given elements in SQL

Using PostgreSQL, and given the following sample table, how do I select all parents that have at least a child 10 and a child 20?
parent | child
--------+-------
1 | 10
1 | 20
1 | 30
2 | 10
2 | 20
3 | 10
In other words, this is the expected result:
parent
--------
1
2
In general, how do I select all parents that have at least all of the given children x1, x2, ..., xn? What is the most efficient way to do this?
Thanks!
SELECT parent FROM table WHERE child IN(10,20)
GROUP BY parent
HAVING COUNT(DISTINCT child)>=2
Fiddle
It's not completely clear what your asking. However, I shall give it a crack.
If you're going to manually define the children you can do a simple select statement:
SELECT DISTINCT parent
FROM table1
WHERE child IN ('10', '20')
This would select all Parents that have 10 or 20 as there child. To add more, just add the number to the IN() part.
If however you want to do this for a large number of children or perhaps an unknown number of children then you can create a temp table to store the children search values and join it to your main table. Something like:
CREATE TABLE #SearchChildren
(
Child int
)
Then input your search values into #SearchChildren. Need to know more about what your doing to do this bit.
SELECT DISTINCT a.parent
FROM table1 as a
JOIN #SearchChildren as s
ON a.child = s.Child
Without knowing more about what your trying to do it's difficult to give a full answer but hopefully this helps.

How to search on levelOrder values un SQL?

I have a table in SQL Server that contains the following columns :
Id Name ParentId LevelOrder
8 vehicle 0 0/8/
9 car 8 0/8/9/
10 bike 8 0/8/10/
11 House 0 0/11/
...
This creates a tree.
Say that I have the LevelOrder 0/8/, this should return only the car and bike rows, but how do I handle this in SQL Server?
I have tried :
Select * FROM MyTable WHERE LevelOrder >= '0/8/'
but that does not work.
The underscore character will guarantee at least one character comes after '0/8/', so you don't get a match on the "vehicle" row.
SELECT *
FROM MyTable
WHERE LevelOrder LIKE '0/8/_%'
This code allows you to select values that start with 0/8/
Select * FROM MyTable WHERE LevelOrder like '0/8/%'
Okay -
While #Joe's answer is the simplest and easiest to implement (and possibly better performing than what I'm about to propose...), there are some issues with update anomalies.
Specifically:
You already have a parentId column. You need to synchronize both this and the levelOrder column, or risk inconsistent data. (I believe this also violates 1NF, although my understanding of the exact definition is a little sketchy...)
levelOrder contains the entire heirarchy. If any one parent is moved, all children rows must have levelOrder modified to reflect this (potentially very messy).
In light of this, here's what I recommend:
Drop the levelOrder column, as its existence will (generally) cause problems.
Use a recursive CTE and the parentId column to build the heirarchy dynamically. Either leave the column where it is, or move it to a dedicated relationship table. Moving one parent then requires only one cell to be updated, and cannot result in any (data, not semantic) anomalies. The CTE should look similar to this form (will need to be adjusted for purpose):
WITH heir_parent (parentId, id) as (SELECT parentId, id
FROM table
WHERE id =
UNION ALL
SELECT b.parentId, b.id
FROM heir_parent as a
JOIN table as b
ON b.parentId = a.id)
At the moment, the CTE returns a list of all children of the given id, with their id and their immediate parent. It can be adjusted to return a number of other things as well - although I recommend that the CTE be used only to generate the relationship, and join externally to get the remaining data.

SQL - How to store and navigate hierarchies?

What are the ways that you use to model and retrieve hierarchical info in a database?
I like the Modified Preorder Tree Traversal Algorithm. This technique makes it very easy to query the tree.
But here is a list of links about the topic which I copied from the Zend Framework (PHP) contributors webpage (posted there by Posted by Laurent Melmoux at Jun 05, 2007 15:52).
Many of the links are language agnostic:
There is 2 main representations and algorithms to represent hierarchical structures with databases :
nested set also known as modified preorder tree traversal algorithm
adjacency list model
It's well explained here:
http://www.sitepoint.com/article/hierarchical-data-database
Managing Hierarchical Data in MySQL
http://www.evolt.org/article/Four_ways_to_work_with_hierarchical_data/17/4047/index.html
Here are some more links that I've collected:
http://en.wikipedia.org/wiki/Tree_%28data_structure%29
http://en.wikipedia.org/wiki/Category:Trees_%28structure%29
adjacency list model
http://www.sqlteam.com/item.asp?ItemID=8866
nested set
http://www.sqlsummit.com/AdjacencyList.htm
http://www.edutech.ch/contribution/nstrees/index.php
http://www.phpriot.com/d/articles/php/application-design/nested-trees-1/
http://www.dbmsmag.com/9604d06.html
http://en.wikipedia.org/wiki/Tree_traversal
http://www.cosc.canterbury.ac.nz/mukundan/dsal/BTree.html (applet java montrant le fonctionnement )
Graphes
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
Classes :
Nested Sets DB Tree Adodb
http://www.phpclasses.org/browse/package/2547.html
Visitation Model ADOdb
http://www.phpclasses.org/browse/package/2919.html
PEAR::DB_NestedSet
http://pear.php.net/package/DB_NestedSet
utilisation : https://www.entwickler.com/itr/kolumnen/psecom,id,26,nodeid,207.html
PEAR::Tree
http://pear.php.net/package/Tree/download/0.3.0/
http://www.phpkitchen.com/index.php?/archives/337-PEARTree-Tutorial.html
nstrees
http://www.edutech.ch/contribution/nstrees/index.php
The definitive pieces on this subject have been written by Joe Celko, and he has worked a number of them into a book called Joe Celko's Trees and Hierarchies in SQL for Smarties.
He favours a technique called directed graphs. An introduction to his work on this subject can be found here
What's the best way to represent a hierachy in a SQL database? A generic, portable technique?
Let's assume the hierachy is mostly read, but isn't completely static. Let's say it's a family tree.
Here's how not to do it:
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date,
mother integer,
father integer
);
And inserting data like this:
person_id name dob mother father
1 Pops 1900/1/1 null null
2 Grandma 1903/2/4 null null
3 Dad 1925/4/2 2 1
4 Uncle Kev 1927/3/3 2 1
5 Cuz Dave 1953/7/8 null 4
6 Billy 1954/8/1 null 3
Instead, split your nodes and your relationships into two tables.
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date
);
create table ancestor (
ancestor_id integer,
descendant_id integer,
distance integer
);
Data is created like this:
person_id name dob
1 Pops 1900/1/1
2 Grandma 1903/2/4
3 Dad 1925/4/2
4 Uncle Kev 1927/3/3
5 Cuz Dave 1953/7/8
6 Billy 1954/8/1
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
1 3 1
2 3 1
1 4 1
2 4 1
1 5 2
2 5 2
4 5 1
1 6 2
2 6 2
3 6 1
you can now run arbitary queries that don't involve joining the table back on itself, which would happen if you have the heirachy relationship in the same row as the node.
Who has grandparents?
select * from person where person_id in
(select descendant_id from ancestor where distance=2);
All your descendants:
select * from person where person_id in
(select descendant_id from ancestor
where ancestor_id=1 and distance>0);
Who are uncles?
select decendant_id uncle from ancestor
where distance=1 and ancestor_id in
(select ancestor_id from ancestor
where distance=2 and not exists
(select ancestor_id from ancestor
where distance=1 and ancestor_id=uncle)
)
You avoid all the problems of joining a table to itself via subqueries, a common limitation is 16 subsuqeries.
Trouble is, maintaining the ancestor table is kind of hard - best done with a stored procedure.
I've got to disagree with Josh. What happens if you're using a huge hierarchical structure like a company organization. People can join/leave the company, change reporting lines, etc... Maintaining the "distance" would be a big problem and you would have to maintain two tables of data.
This query (SQL Server 2005 and above) would let you see the complete line of any person AND calculates their place in the hierarchy and it only requires a single table of user information. It can be modified to find any child relationship.
--Create table of dummy data
create table #person (
personID integer IDENTITY(1,1) NOT NULL,
name varchar(255) not null,
dob date,
father integer
);
INSERT INTO #person(name,dob,father)Values('Pops','1900/1/1',NULL);
INSERT INTO #person(name,dob,father)Values('Grandma','1903/2/4',null);
INSERT INTO #person(name,dob,father)Values('Dad','1925/4/2',1);
INSERT INTO #person(name,dob,father)Values('Uncle Kev','1927/3/3',1);
INSERT INTO #person(name,dob,father)Values('Cuz Dave','1953/7/8',4);
INSERT INTO #person(name,dob,father)Values('Billy','1954/8/1',3);
DECLARE #OldestPerson INT;
SET #OldestPerson = 1; -- Set this value to the ID of the oldest person in the family
WITH PersonHierarchy (personID,Name,dob,father, HierarchyLevel) AS
(
SELECT
personID
,Name
,dob
,father,
1 as HierarchyLevel
FROM #person
WHERE personID = #OldestPerson
UNION ALL
SELECT
e.personID,
e.Name,
e.dob,
e.father,
eh.HierarchyLevel + 1 AS HierarchyLevel
FROM #person e
INNER JOIN PersonHierarchy eh ON
e.father = eh.personID
)
SELECT *
FROM PersonHierarchy
ORDER BY HierarchyLevel, father;
DROP TABLE #person;
FYI: SQL Server 2008 introduces a new HierarchyID data type for this sort of situation. Gives you control over where in the "tree" your row sits, horizontally as well as vertically.
Oracle: SELECT ... START WITH ... CONNECT BY
Oracle has an extension to SELECT that allows easy tree-based retrieval. Perhaps SQL Server has some similar extension?
This query will traverse a table where the nesting relationship is stored in parent and child columns.
select * from my_table
start with parent = :TOP
connect by prior child = parent;
http://www.adp-gmbh.ch/ora/sql/connect_by.html
I prefer a mix of the techinques used by Josh and Mark Harrison:
Two tables, one with the data of the Person and other with the hierarchichal info (person_id, parent_id [, mother_id]) if the PK of this table is person_id, you have a simple tree with only one parent by node (which makes sense in this case, but not in other cases like accounting accounts)
This hiarchy table can be transversed by recursive procedures or if your DB supports it by sentences like SELECT... BY PRIOR (Oracle).
Other posibility is if you know the max deep of the hierarchy data you want to mantain is use a single table with a set of columns per level of hierarchy
We had the same issue when we implemented a tree component for [fleXive] and used the nested set tree model approach mentioned by tharkun from the MySQL docs.
In addition to speed things (dramatically) up we used a spreaded approach which simply means we used the maximum Long value for the top level right bounds which allows us to insert and move nodes without recalculating all left and right values. Values for left and right are calculated by dividing the range for a node by 3 und use the inner third as bounds for the new node.
A java code example can be seen here.
If you're using SQL Server 2005 then this link explains how to retrieve hierarchical data.
Common Table Expressions (CTEs) can be your friends once you get comfortable using them.