How to represent a data tree in SQL? - sql

I'm writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data.
I'm using a UI library to present the tree in a windows form where I can bind the tree to the TreeView.
I will need to save this tree and nodes in the DB.
What will be the best way to save the tree and to get the following features:
Intuitive implementation.
Easy binding. Will be easy to move from the tree to the DB structure and back (if any)
I had 2 ideas. The first is to serialize the data into a one liner in a table.
The second is to save it in tables, but then, when moving to data entities, I will lose the row states on the table for changed nodes.
Any ideas?

I've bookmarked this SlideShare deck about SQL Antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed
The recommendation from there is to use a Closure Table (it's explained in the slides).
Here is the summary (slide 77):
Design           | Query Child | Query Subtree | Modify Tree | Ref. Integrity
Adjacency List   | Easy        | Hard          | Easy        | Yes
Path Enumeration | Easy        | Easy          | Hard        | No
Nested Sets      | Hard        | Easy          | Hard        | No
Closure Table    | Easy        | Easy          | Easy        | Yes

The easiest implementation is adjacency list structure:
id parent_id data
However, some databases, particularly MySQL, have some issues in handling this model, because it requires an ability to run recursive queries which MySQL lacks.
Another model is nested sets:
id lft rgt data
where lft and rgt are arbitrary values that define the hierarchy (each child's lft and rgt must fall within its parent's lft and rgt).
This does not require recursive queries, but it is slower and harder to maintain.
However, in MySQL this can be improved using SPATIAL abilities.
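To make the lft/rgt containment rule concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name tree, the sample rows and the helper subtree are assumptions made up for illustration; only the id/lft/rgt/data layout comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER, data TEXT)"
)
# Hypothetical tree: root(1..8) -> a(2..5) -> a1(3..4), and root -> b(6..7)
conn.executemany(
    "INSERT INTO tree (id, lft, rgt, data) VALUES (?, ?, ?, ?)",
    [(1, 1, 8, "root"), (2, 2, 5, "a"), (3, 3, 4, "a1"), (4, 6, 7, "b")],
)


def subtree(conn, node_id):
    """Return the data of every strict descendant of node_id, ordered by lft."""
    rows = conn.execute(
        """
        SELECT child.data
        FROM tree AS parent
        JOIN tree AS child
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.id = ?
        ORDER BY child.lft
        """,
        (node_id,),
    ).fetchall()
    return [r[0] for r in rows]


root_descendants = subtree(conn, 1)
a_descendants = subtree(conn, 2)
```

The whole subtree comes back from one non-recursive range join, which is the attraction of the model; the cost is that inserting a node means renumbering lft and rgt values elsewhere in the table.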
See these articles in my blog:
Adjacency list vs. nested sets: PostgreSQL
Adjacency list vs. nested sets: SQL Server
Adjacency list vs. nested sets: Oracle
Adjacency list vs. nested sets: MySQL
for more detailed explanations.

I'm surprised that nobody mentioned the materialized path solution, which is probably the fastest way of working with trees in standard SQL.
In this approach, every node in the tree has a column path, where the full path from the root to the node is stored. This involves very simple and fast queries.
Have a look at the example table node:
+---------+-------+
| node_id | path |
+---------+-------+
| 0 | |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1.4 |
| 5 | 2.5 |
| 6 | 2.6 |
| 7 | 2.6.7 |
| 8 | 2.6.8 |
| 9 | 2.6.9 |
+---------+-------+
In order to get all descendants of node x (its children, their children, and so on), you can write the following query:
SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')
Keep in mind that the path column should be indexed so that the LIKE clause performs well.
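As a rough sketch of how this looks in practice, the snippet below loads the example node table into an in-memory SQLite database via Python's sqlite3 module. SQLite uses || instead of CONCAT, and the helper names descendants and children, as well as the index name, are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY, path TEXT)")
conn.execute("CREATE INDEX node_path_idx ON node (path)")
conn.executemany(
    "INSERT INTO node (node_id, path) VALUES (?, ?)",
    [(0, ""), (1, "1"), (2, "2"), (3, "3"), (4, "1.4"), (5, "2.5"),
     (6, "2.6"), (7, "2.6.7"), (8, "2.6.8"), (9, "2.6.9")],
)


def descendants(conn, node_id):
    """All nodes whose path starts with '<path of node_id>.'"""
    rows = conn.execute(
        """
        SELECT node_id FROM node
        WHERE path LIKE (SELECT path FROM node WHERE node_id = :id) || '.%'
        ORDER BY node_id
        """,
        {"id": node_id},
    ).fetchall()
    return [r[0] for r in rows]


def children(conn, node_id):
    """Direct children only: descendants with no further '.' after the prefix."""
    rows = conn.execute(
        """
        SELECT node_id FROM node
        WHERE path LIKE (SELECT path FROM node WHERE node_id = :id) || '.%'
          AND path NOT LIKE (SELECT path FROM node WHERE node_id = :id) || '.%.%'
        ORDER BY node_id
        """,
        {"id": node_id},
    ).fetchall()
    return [r[0] for r in rows]


all_under_2 = descendants(conn, 2)
direct_under_2 = children(conn, 2)
```

With the sample data, node 2 has five descendants but only two direct children, which shows why the plain LIKE query returns more than the immediate level.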

If you are using PostgreSQL you can use ltree, a package in the contrib extension (comes by default) which implements the tree data structure.
From the docs:
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);
You can do queries like:
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)

It depends on how you will be querying and updating the data. If you store all the data in one row, it's basically a single unit that you can't query into or partially update without rewriting all the data.
If you want to store each element as a row, you should first read Managing Hierarchical Data in MySQL (MySQL specific, but the advice holds for many other databases too).
If you're only ever accessing an entire tree, the adjacency list model makes it difficult to retrieve all nodes under the root without using a recursive query. If you add an extra column that links back to the head then you can do SELECT * WHERE head_id = @id and get the whole tree in one non-recursive query, but it denormalizes the database.
Some databases have custom extensions that make storing and retrieving hierarchical data easier, for example Oracle has CONNECT BY.

As this is the top answer when searching for "sql trees" on Google, I will try to update this from the perspective of today (December 2018).
Most answers imply that using an adjacency list is both simple and slow and therefore recommend other methods.
Since version 8 (published April 2018) MySQL supports recursive common table expressions (CTE). MySQL is a bit late to the show but this opens up a new option.
There is a tutorial here that explains the use of recursive queries to manage an adjacency list.
As the recursion now runs completely within the database engine, it is much faster than in the past (when it had to run in the script engine).
The blog here gives some measurements (which are both biased and for postgres instead of MySQL) but nevertheless it shows that adjacency lists do not have to be slow.
So my conclusion today is:
The simple adjacency list may be fast enough if the database engine supports recursion.
Do a benchmark with your own data and your own engine.
Do not trust outdated recommendations to point out the "best" method.
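For illustration, here is a minimal sketch of a recursive CTE walking an adjacency list. It runs against SQLite through Python's sqlite3 module rather than MySQL, and the table name node, the sample rows and the helper subtree are assumptions; the WITH RECURSIVE query itself should look much the same on any engine that supports recursive CTEs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES node(id),"
    " data TEXT)"
)
conn.executemany(
    "INSERT INTO node (id, parent_id, data) VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "a"), (3, 1, "b"), (4, 2, "a1"), (5, 4, "a1x")],
)


def subtree(conn, root_id):
    """Return (id, data, depth) for root_id and everything below it."""
    return conn.execute(
        """
        WITH RECURSIVE sub(id, data, depth) AS (
            SELECT id, data, 0 FROM node WHERE id = ?   -- starting node
            UNION ALL
            SELECT n.id, n.data, sub.depth + 1          -- one level further down
            FROM node AS n
            JOIN sub ON n.parent_id = sub.id
        )
        SELECT id, data, depth FROM sub ORDER BY depth, id
        """,
        (root_id,),
    ).fetchall()


below_a = subtree(conn, 2)
```

Timing a query like this on your own data is the benchmark suggested above.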

PGSQL Tree relations
Hello, I just got a handle on this for a project I'm working on and figured I'd share my write-up.
Hope this helps. Let's get started with some prereqs.
This is essentially the closure table solution mentioned above, using recursive calls. Thanks for those slides, they are very useful; I wish I had seen them before this write-up :)
pre-requisites
Recursive Functions
These are functions that call themselves, e.g.
function factorial(n) {
  if (n === 0) return 1; // base case
  return n * factorial(n - 1); // recursive call
}
This is pretty cool. Luckily pgsql has recursive functions too, but it can be a bit much. I prefer functional stuff.
cte with pgsql
WITH RECURSIVE t(n) AS (
VALUES (1) -- non-recursive term
UNION ALL
SELECT n+1 FROM t WHERE n < 100 -- recursive term
-- continues until the recursive term returns no rows
)
SELECT sum(n) FROM t;
The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:
Recursive Query Evaluation
1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
2. So long as the working table is not empty, repeat these steps:
a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
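The evaluation steps above can be sketched in plain Python. This is only an emulation of the described procedure, not how any database implements it; the function name evaluate_recursive and its parameters are made up for illustration.

```python
def evaluate_recursive(non_recursive_rows, recursive_step, union_all=True):
    """Emulate the evaluation of a recursive WITH query.

    non_recursive_rows: rows produced by the non-recursive term.
    recursive_step: function mapping the working table to new rows.
    union_all: True for UNION ALL, False for UNION (duplicates discarded).
    """
    result = []
    seen = set()

    def keep(rows):
        # For UNION, drop duplicates and rows that repeat an earlier result row.
        if union_all:
            return list(rows)
        fresh = []
        for row in rows:
            if row not in seen:
                seen.add(row)
                fresh.append(row)
        return fresh

    # Evaluate the non-recursive term and seed the working table.
    working = keep(non_recursive_rows)
    result.extend(working)
    # Repeat while the working table is not empty.
    while working:
        intermediate = keep(recursive_step(working))
        result.extend(intermediate)
        working = intermediate  # the intermediate table replaces the working table
    return result


# The query from the text: VALUES (1) UNION ALL SELECT n+1 FROM t WHERE n < 100
rows = evaluate_recursive([1], lambda t: [n + 1 for n in t if n < 100])
total = sum(rows)
```

The loop ends when the recursive step yields nothing new, which for the example happens once n reaches 100.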
To do something like factorial in SQL you need something more like this (from an SO post; this one is T-SQL):
ALTER FUNCTION dbo.fnGetFactorial (@num int)
RETURNS INT
AS
BEGIN
DECLARE @n int
IF @num <= 1 SET @n = 1
ELSE SET @n = @num * dbo.fnGetFactorial(@num - 1)
RETURN @n
END
GO
Tree data structures (more of a forest :)
wikipedia
The important thing to note is that a tree is a special case of a graph. This can be enforced simply by
requiring that each node has at most one parent.
Representing the Tree in PGSQL
I think it will be easiest to work it out a little more theoretically before we move on to the SQL.
The simple way of representing a graph relation without data duplication is to separate the nodes(id, data) from the edges.
We can then restrict the edges(parent_id, child_id) table to enforce our constraint, by mandating that the pair (parent_id, child_id)
as well as child_id on its own be unique.
create table nodes (
id uuid default uuid_generate_v4() not null unique,
name varchar(255) not null,
json json default '{}'::json not null,
remarks varchar(255)
);
create table edges (
id uuid default uuid_generate_v4() not null,
parent_id uuid not null,
child_id uuid not null,
meta json default '{}'::json,
constraint group_group_id_key
primary key (id),
constraint group_group_unique_combo
unique (parent_id, child_id),
constraint group_group_unique_child
unique (child_id),
foreign key (parent_id) references nodes
on update cascade on delete cascade,
foreign key (child_id) references nodes
on update cascade on delete cascade
);
Note that theoretically this can all be done with only one table, by simply putting the parent_id in the nodes table
and then
CREATE VIEW v_edges as (SELECT id as child_id, parent_id FROM nodes)
but for the purpose of flexibility, and so that we can incorporate other graph structures into this
framework, I will use the common many-to-many relationship structure. This will ideally allow this research to be
expanded into other graph algorithms.
Let's start out with a sample data structure
-- illustrative only: ids are abbreviated to names and my_data stands in for the data columns
INSERT INTO nodes (id, my_data) VALUES ('alpha', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('bravo', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('charly', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('berry', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('zeta', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('yank', 'my big data');
INSERT INTO edges (parent_id, child_id) VALUES ('alpha', 'bravo');
INSERT INTO edges (parent_id, child_id) VALUES ('alpha', 'berry');
INSERT INTO edges (parent_id, child_id) VALUES ('bravo', 'charly');
INSERT INTO edges (parent_id, child_id) VALUES ('yank', 'zeta');
-- rank0 Alpha Yank
-- rank1 Bravo Berry Zeta
-- rank2 Charly
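Note the interesting properties of a tree: (number of edges e) = (number of nodes n) - 1 within each tree, because
every node except the root has exactly one parent. (The sample above is a forest of two trees, so it has 6 nodes and 4 edges.)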
We can then simplify the equations
let n = node
let p = parent
let c = child
let ns = nodes = groups
let es = edges = group_group // because this is a relationship of a group entity to another group entity
So now what sort of questions will we ask.
"Given an arbitrary set of groups 's' what is the coverage of the graph assuming nodes inherit their children?"
This is a tricky question, it requires us to traverse the graph and find all children of each node in s
This continues off of this stack overflow post
-- some DBMS (e.g. Postgres) require the word "recursive"
-- some others (Oracle, SQL-Server) require omitting the "recursive"
-- and some (e.g. SQLite) don't bother, i.e. they accept both
-- drop view v_group_descendant;
create view v_group_descendant as
with recursive descendants -- name for accumulating table
(parent_id, descendant_id, lvl) -- output columns
as
( select parent_id, child_id, 1
from group_group -- starting point, we start with each base group
union all
select d.parent_id, s.child_id, d.lvl + 1
from descendants d -- get the n-1 th level of descendants/ children
join group_group s -- and join it to find the nth level
on d.descendant_id = s.parent_id -- the trick is that the output of this query becomes the input
-- the recursion stops once this step returns no new rows
)
select * from descendants;
comment on view v_group_descendant is 'This aggregates the children of each group RECURSIVELY WOO ALL THE WAY DOWN THE TREE :)';
After we have this view we can join it with our nodes/groups to get our data back. I will not provide these samples for every single step; for the most part we will just work with ids.
select d.*, g1.group_name as parent, g2.group_name as decendent --then we join it with groups to add names
from v_group_descendant d, groups g1, groups g2
WHERE g1.id = d.parent_id and g2.id = d.descendant_id
order by parent_id, lvl, descendant_id;
sample output
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
+------------------------------------+------------------------------------+---+----------+---------+
Note that this is just the minimal node-descendant relationship and has actually lost every node that has no parent, such as alpha and yank, because they never appear as a descendant.
To resolve this, we add back all nodes that don't appear in the descendants list:
create view v_group_descendant_all as (
select * from v_group_descendant gd
UNION ALL
select null::uuid as parent_id,id as descendant_id, 0 as lvl from groups g
where not exists (select * from v_group_descendant gd where gd.descendant_id = g.id )
);
comment on view v_group_descendant_all is 'complete list of descendants including rank 0 root nodes descendant - parent relationship is duplicated for all levels / ranks';
preview
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
|null |c1529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |alpha |
|null |42529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |yank |
+------------------------------------+------------------------------------+---+----------+---------+
Let's say, for example, that we are getting our set s of groups based on a users(id, data) table with a user_group(user_id, group_id) relation.
We can then join this to another table, removing duplicates, because our set s of user_group relations may cause
duplicates if a user is, say, assigned to both alpha and charly.
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | charly |
| kier | yank |
| kier | bravo |
+------+--------+
--drop view v_user_group_recursive;
CREATE VIEW v_user_group_recursive AS (
SELECT DISTINCT dd.descendant_id AS group_id, ug.user_id
FROM v_group_descendant_all dd , user_group ug
WHERE (ug.group_id = dd.descendant_id
OR ug.group_id = dd.parent_id) -- should gic
);
SELECT * FROM v_user_group_recursive;
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | bravo |
| jane | berry |
| jane | charly |
-- | jane | charly | Removed by DISTINCT
| kier | yank |
| kier | zeta |
| kier | bravo |
| kier | charly |
+------+--------+
If we want, we can now group by user and join to the group data, doing something like the following:
CREATE VIEW v_user_groups_recursive AS (
SELECT user_id, json_agg(json_build_object('id', id,'parent_id',parent_id, 'group_name', group_name, 'org_id', org_id, 'json', json, 'remarks', remarks)) as groups
FROM v_user_group_recursive ug, v_groups_parent g
WHERE ug.group_id = g.id GROUP BY user_id
);
comment on view v_user_groups_recursive is 'This aggregates the groups for each user recursively';
+------+-------------------------------+
| user | groups |
+------+-------------------------------+
| jane | [alpha, bravo, berry, charly] |
| kier | [yank, zeta, bravo, charly] |
+------+-------------------------------+
This is awesome, we have answered the question. We can now simply ask which groups this user inherits:
SELECT * from v_user_groups_recursive where user_id = 'kier';
Displaying our hard work in the front end
Further, we could use something like jstree.com to display
our structure
async function getProjectTree(user_id) {
let res = await table.query(format('SELECT * from v_user_groups_recursive ug WHERE ug.user_id = %L', user_id));
if (res.success) {
let rows = res.data[0].groups.map(r => {
return {
id: r.id, // required
parent: r.parent_id==null?'#':r.parent_id,// required
text: r.group_name,// node text
icon: 'P', // string for custom
state: {
opened: true, // is the node open
disabled: false, // is the node disabled
selected: false, // is the node selected
},
li_attr: {}, // attributes for the generated LI node
a_attr: {} // attributes for the generated A node
}
})
return {success: true, data: rows, msg: 'Got all projects'}
} else return res;
}
<div id="v_project_tree" class="row col-10 mx-auto" style="height: 25vh"></div>
<script>
function buildTree() {
bs.sendJson('get', "/api/projects/getProjectTree").then(res => {
bs.resNotify(res);
if (!res.success) {
//:(
console.error(':(');
return
}
console.log(res.data);
$('#v_project_tree').jstree({
'core': {
'data': res.data
}
});
})
}
window.addEventListener('load', buildTree);
</script>

I think the best way is indeed to give each node an id and a parent_id, where the parent_id is the id of the parent node. This has a couple of benefits:
When you want to update a node, you only have to rewrite the data of that node.
When you want to query only a certain node, you can get exactly the information you want, and so have less overhead on the database connection.
A lot of programming languages have functionality to transform MySQL data into XML or JSON, which will make it easier to open up your application using an API.

Something like table "nodes" where each node row contains parent id (in addition to the ordinary node data). For root, the parent is NULL.
Of course, this makes finding children a bit more time consuming, but this way the actual database will be quite simple.

Related

Alternative of HIERARCHY_ANCESTORS function in SAP HANA

I wrote a piece of code in HANA and used the HIERARCHY_ANCESTORS function. However, it started giving me all sorts of OOM (Out of Memory) issues. I raised an OSS & provided all the details just to realize that it was an unidentified issue of the standard SAP function which will now be rectified in the next release.
Now I do not want to dwell on the HIERARCHY_ANCESTORS function, as I already lost a month communicating through the OSS.
What I need is an alternative, which does not include any loops. Basically, I need the ancestors of all the leaf nodes (HIERARCHY_TREE_SIZE = 1) identified from the HIERARCHY function, without using loops. There can be around 35k leaf nodes.
The data size is over 80k records, and I have tried looping over the same earlier, and it severely degrades the performance, timing out after a certain point. My need is to wrap it up in less than 30s, like the HIERARCHY_ANCESTORS function would.
I can perhaps create a recursive function to fetch all the ancestors of 1 leaf ID. But how would I use it inside a SQL query, so that the same function can then fetch the ancestors of all the requisite IDs?
Any help is appreciated from HANA POV.
Thank you.
Consider the following hierarchy
The leaves and their ancestors are:
| NODE_ID | ANCESTOR |
| ------- | -------- |
| A010 | A |
| A10 | A |
| B | B |
| C0 | C |
To get there, you can use another hierarchy function to produce the same behavior.
create COLUMN TABLE TSTHIERARCHY(
parent_id nvarchar(32),
node_id nvarchar(32),
val integer
);
INSERT INTO TSTHIERARCHY VALUES (NULL,'A',1);
INSERT INTO TSTHIERARCHY VALUES ('A','A0',2);
INSERT INTO TSTHIERARCHY VALUES ('A','A1',2);
INSERT INTO TSTHIERARCHY VALUES ('A0','A01',3);
INSERT INTO TSTHIERARCHY VALUES ('A01','A010',4);
INSERT INTO TSTHIERARCHY VALUES ('A1','A10',3);
INSERT INTO TSTHIERARCHY VALUES (NULL,'B',1);
INSERT INTO TSTHIERARCHY VALUES (NULL,'C',1);
INSERT INTO TSTHIERARCHY VALUES ('C','C0',2);
WITH t1 AS (SELECT * FROM HIERARCHY( SOURCE TSTHIERARCHY ))
SELECT leaves.NODE_ID, ancestors.NODE_ID AS ANCESTOR
FROM (
select NODE_ID, HIERARCHY_ROOT_RANK
FROM t1 WHERE HIERARCHY_RANK NOT IN (SELECT DISTINCT HIERARCHY_PARENT_RANK FROM t1)
) leaves
INNER join
( SELECT NODE_ID, HIERARCHY_RANK
FROM t1 WHERE HIERARCHY_RANK = HIERARCHY_ROOT_RANK
) ancestors
on (leaves.HIERARCHY_ROOT_RANK=ancestors.HIERARCHY_RANK)
Does that work ?
A simpler query exists to display the ancestor for all hierarchy members, not just leaf nodes:
SELECT NODE_ID,HIERARCHY_RANK,
FIRST_VALUE(NODE_ID) OVER (PARTITION BY HIERARCHY_ROOT_RANK ORDER BY hierarchy_level) AS ancestors
FROM HIERARCHY( SOURCE TSTHIERARCHY )

WITH RECURSIVE SELECT via secondary table

I'm having a bit of a hard time trying to piece this together. I'm not adept with databases or complex queries.
The Database
I'm using the latest MariaDB release.
I have a database table configuration like so, representing a hierarchical data structure:
|----------------------|
| fieldsets |
|----+-----------------|
| id | parent_field_id |
|----+-----------------|
| 1 | NULL |
| 2 | 1 |
|----------------------|
|-------------------------|
| fields |
|----+--------------------|
| id | parent_fieldset_id |
|----+--------------------|
| 1 | 1 |
| 2 | 1 |
|-------------------------|
The Problem
I'm trying to piece together a recursive query. I need to select every fieldset in a given hierarchy. For example, in the above, stripped-down example, I want to select fieldset of id = 1, and every descendant fieldset.
The IDs of the next rung down in any given level in the hierarchy are obtained only via columns of a secondary table.
The table fieldsets contains no column by which I can directly get all child fieldsets. I need to get all fields that are a child of a given fieldset, and then get any fieldsets that are a child of that field.
A Better Illustration of the Problem
This query does not work because of the reported error: "Restrictions imposed on recursive definitions are violated for table all_fieldsets"
However, it really illustrates what I need to do in order to get all descendant fieldsets in the hierarchy. Remember, a fieldset does not contain the column for its parent fieldset, since a fieldset cannot have a fieldset as a direct parent. Instead, a fieldset has a parent_field_id which points to a row in the fields table, and that row in the fields table correspondingly has a column named parent_fieldset_id which points to a row back in the fieldsets table, which is considered the parent fieldset to a fieldset, just an indirect parent.
WITH RECURSIVE all_fieldsets AS (
SELECT fieldsets.* FROM fieldsets WHERE id = 125
UNION ALL
SELECT fieldsets.* FROM fieldsets
WHERE fieldsets.parent_field_id IN (
SELECT id FROM fields f
INNER JOIN all_fieldsets afs
WHERE f.parent_fieldset_id = afs.id
)
)
SELECT * FROM all_fieldsets
My Attempt
The query I have thus far (which does not work):
WITH RECURSIVE all_fieldsets AS (
SELECT fieldsets.* FROM fieldsets WHERE id = 125
UNION
SELECT fieldsets.* FROM fieldsets WHERE fieldsets.id IN (SELECT fs.id FROM fieldsets fs LEFT JOIN fields f ON f.id = fs.parent_field_id WHERE f.parent_fieldset_id = fieldsets.id)
)
SELECT * FROM all_fieldsets
My Research
I'm also having a hard time finding an example which fits my use-case. There's so many results for hierarchical structures that involve one table having only relations to itself, not via a secondary table, as in my case. It's difficult when you do not know the correct terms for certain concepts, and any layman explanation seems to yield too many tangential search results.
My Plea
I would be enormously grateful to all who can point out where I'm going wrong, and perhaps suggest the outline of a query that will work.
The main problem I see with your current code is that the recursive portion of the CTE (the query which appears after the union) is not selecting from the recursive CTE, when it should be. Consider this updated version:
WITH RECURSIVE all_fieldsets AS (
SELECT * FROM fieldsets WHERE id = 125
UNION ALL
SELECT f1.*
FROM fieldsets f1
INNER JOIN all_fieldsets f2
ON f1.parent_field_id = f2.id
)
SELECT *
FROM all_fieldsets;
Note that the join in the recursive portion of the CTE relates a given descendant record in fieldsets to its parent in the CTE.
I got home from work, and I just could not set this down!
But, out of that came a solution.
I highly recommend reading this answer about recursive queries to get a better idea of how they work, and what the syntax means. Quite brilliantly explained: How to select using WITH RECURSIVE clause
The Solution
WITH RECURSIVE all_fieldsets AS (
SELECT * FROM fieldsets fs
WHERE id = 59
UNION ALL
SELECT fs.* FROM fieldsets fs
INNER JOIN all_fieldsets afs
INNER JOIN fields f
ON f.parent_fieldset_id = afs.id
AND fs.parent_field_id = f.id
)
SELECT * FROM all_fieldsets
I had to use joins to get the information from the fields table, in order to get the next level in the hierarchy, and then do this recursively until there is an empty result in the recursive query.
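Here is a small sketch of the same idea that can be run locally. It uses SQLite through Python's sqlite3 module instead of MariaDB, writes the two joins with explicit ON clauses, and the sample ids and the helper descendant_fieldsets are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fieldsets (id INTEGER PRIMARY KEY, parent_field_id INTEGER);
    CREATE TABLE fields (id INTEGER PRIMARY KEY, parent_fieldset_id INTEGER);
    -- fieldset 1 owns fields 10 and 11; fieldset 2 hangs off field 10;
    -- fieldset 3 hangs off field 20 (owned by fieldset 2); fieldset 4 is unrelated.
    INSERT INTO fieldsets VALUES (1, NULL), (2, 10), (3, 20), (4, NULL);
    INSERT INTO fields VALUES (10, 1), (11, 1), (20, 2), (30, 4);
    """
)


def descendant_fieldsets(conn, root_id):
    """Ids of root_id and every fieldset reachable through the fields table."""
    rows = conn.execute(
        """
        WITH RECURSIVE all_fieldsets(id, parent_field_id) AS (
            SELECT id, parent_field_id FROM fieldsets WHERE id = ?
            UNION ALL
            SELECT fs.id, fs.parent_field_id
            FROM all_fieldsets AS afs
            JOIN fields AS f ON f.parent_fieldset_id = afs.id  -- fields of the current level
            JOIN fieldsets AS fs ON fs.parent_field_id = f.id  -- fieldsets under those fields
        )
        SELECT id FROM all_fieldsets ORDER BY id
        """,
        (root_id,),
    ).fetchall()
    return [r[0] for r in rows]


under_1 = descendant_fieldsets(conn, 1)
```

Each recursive step hops through fields to reach the next level of fieldsets, so the CTE itself is referenced only once in the recursive part.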

Is chaining rows in the same table a bad pattern?

I want to create a tree structure of categories and need to find a proper way to store it into the database. Think of the following animal tree, which pretty accurately describes how it should look like:
My question now is whether chaining those entries within the same table is a good idea or not. SQLite doesn't allow me to add a FOREIGN KEY constraint to a value in the same table, so I have to make sure manually that I don't create inconsistencies. This is what I currently plan to have:
id | parent | name
---+--------+--------
1 | null | Animal
2 | 1 | Reptile
3 | 2 | Lizard
4 | 1 | Mammal
5 | 4 | Equine
6 | 4 | Bovine
parent refers to an id in the same table, going up all the way until null is found, which is the root. Is this a bad pattern? And if so, what are common alternatives to put a tree structure into a relational database?
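For what it's worth, SQLite does accept a foreign key that references its own table, as long as enforcement is switched on with PRAGMA foreign_keys = ON (it is off by default). A minimal sketch using Python's sqlite3 module follows; the table name category and the helper try_insert are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute(
    """
    CREATE TABLE category (
        id INTEGER PRIMARY KEY,
        parent INTEGER REFERENCES category(id),  -- self-referencing foreign key
        name TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO category VALUES (1, NULL, 'Animal')")
conn.execute("INSERT INTO category VALUES (2, 1, 'Reptile')")


def try_insert(conn, row):
    """Insert a row and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO category VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert(conn, (3, 2, "Lizard"))        # parent 2 exists
rejected = try_insert(conn, (4, 99, "Ghost"))  # parent 99 does not exist
```

With the pragma enabled, the row pointing at a missing parent is refused, so the consistency check does not have to be done by hand.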
If your version of SQLite supports recursive CTE, then this is one option:
WITH RECURSIVE cte (n) AS (
SELECT id FROM yourTable WHERE parent IS NULL
UNION ALL
SELECT t1.id
FROM yourTable t1
INNER JOIN cte t2
ON t1.parent = t2.n AND t1.name NOT LIKE '%Lizard%'
)
SELECT *
FROM yourTable
WHERE id IN cte;
This is untested, but the check on t1.name in the recursive portion of the above CTE should (hopefully) stop the recursion from descending into any record whose name matches the LIKE expression. In the case of searching for Lizard, the recursion stops one level above Lizard, so Lizard and anything below it are left out while the rest of the tree is returned.
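The query can be tried against the animal table from the question with Python's sqlite3 module. This is just a quick check under the assumption that the table is called yourTable as in the answer; the final filter is written as IN (SELECT n FROM cte) to keep it portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable (id, parent, name) VALUES (?, ?, ?)",
    [(1, None, "Animal"), (2, 1, "Reptile"), (3, 2, "Lizard"),
     (4, 1, "Mammal"), (5, 4, "Equine"), (6, 4, "Bovine")],
)

rows = conn.execute(
    """
    WITH RECURSIVE cte (n) AS (
        SELECT id FROM yourTable WHERE parent IS NULL
        UNION ALL
        SELECT t1.id
        FROM yourTable t1
        INNER JOIN cte t2
            ON t1.parent = t2.n AND t1.name NOT LIKE '%Lizard%'
    )
    SELECT name FROM yourTable
    WHERE id IN (SELECT n FROM cte)
    ORDER BY id
    """
).fetchall()
names = [r[0] for r in rows]
```

On this data the walk starts at Animal and visits every branch except the one that matches the LIKE pattern.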

simple common table expression to flatten a tree

Following is the table format that I have.
Table name :: USERS
userid reporttouserid
------ ------------
101 NULL
102 101
103 102
Now I need a query to list all the child user ids under 101, that is, both 102 and 103 (103 is indirectly under 101 because its parent 102 is under 101).
I have seen common table expressions in PostgreSQL but have not been able to figure out how to go about it.
The PostgreSQL documentation covers this topic. See the examples given for recursive CTEs on that page.
Recursive CTEs can be a bit hard to grasp, but are very powerful once you use them. Read the documentation and experiment a little; you'll get it.
(Please always mention your PostgreSQL version and show desired output in table-like form in your questions).
Given demo data:
create table users (
userid integer primary key,
reporttouserid integer references users(userid)
);
insert into users(userid, reporttouserid) values (101,null), (102,101), (103,102);
(please provide this in questions where possible, it's a pain to have to create it)
you can recursively walk the graph with something like:
WITH RECURSIVE flatusers(userid, reporttouserid, baseuserid) AS (
SELECT userid, reporttouserid, userid AS baseuserid
FROM users WHERE reporttouserid IS NULL
UNION ALL
SELECT u.userid, u.reporttouserid, f.baseuserid
FROM flatusers f
INNER JOIN users u ON f.userid = u.reporttouserid
)
SELECT * FROM flatusers;
producing output like:
userid | reporttouserid | baseuserid
--------+----------------+------------
101 | | 101
102 | 101 | 101
103 | 102 | 101
(3 rows)
I'm sure you can figure out where to go from there. Make sure you understand that recursive CTE before using it.
Be aware that PostgreSQL (9.4 or older at least) cannot (unfortunately) push quals down into CTE terms, even for non-recursive CTEs. If you add WHERE baseuserid = 101 to your query, the query will still generate the entire flattened table, then throw most of it away. If you want to do this recursive operation for just one baseuserid, you must add the appropriate WHERE clause term after WHERE reporttouserid IS NULL in the static union part of the recursive CTE term.
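As a sketch of that last point, the filter on the base user can be placed in the non-recursive term so that only one hierarchy is expanded. The example below runs on SQLite via Python's sqlite3 module rather than PostgreSQL; the extra users 201 and 202 and the helper reports_under are assumptions added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " userid INTEGER PRIMARY KEY,"
    " reporttouserid INTEGER REFERENCES users(userid))"
)
conn.executemany(
    "INSERT INTO users (userid, reporttouserid) VALUES (?, ?)",
    [(101, None), (102, 101), (103, 102), (201, None), (202, 201)],
)


def reports_under(conn, base_userid):
    """User ids that report, directly or indirectly, to base_userid."""
    rows = conn.execute(
        """
        WITH RECURSIVE flatusers(userid, reporttouserid, baseuserid) AS (
            SELECT userid, reporttouserid, userid
            FROM users
            WHERE reporttouserid IS NULL
              AND userid = ?              -- restrict the starting point here
            UNION ALL
            SELECT u.userid, u.reporttouserid, f.baseuserid
            FROM flatusers f
            INNER JOIN users u ON f.userid = u.reporttouserid
        )
        SELECT userid FROM flatusers
        WHERE userid <> baseuserid
        ORDER BY userid
        """,
        (base_userid,),
    ).fetchall()
    return [r[0] for r in rows]


under_101 = reports_under(conn, 101)
```

Because the starting row is filtered before the recursion begins, the second hierarchy (201, 202) is never visited when asking about 101.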

SQL - How to store and navigate hierarchies?

What are the ways that you use to model and retrieve hierarchical info in a database?
I like the Modified Preorder Tree Traversal Algorithm. This technique makes it very easy to query the tree.
But here is a list of links about the topic which I copied from the Zend Framework (PHP) contributors webpage (posted there by Laurent Melmoux at Jun 05, 2007 15:52).
Many of the links are language agnostic:
There are 2 main representations and algorithms to represent hierarchical structures with databases:
nested set also known as modified preorder tree traversal algorithm
adjacency list model
It's well explained here:
http://www.sitepoint.com/article/hierarchical-data-database
Managing Hierarchical Data in MySQL
http://www.evolt.org/article/Four_ways_to_work_with_hierarchical_data/17/4047/index.html
Here are some more links that I've collected:
http://en.wikipedia.org/wiki/Tree_%28data_structure%29
http://en.wikipedia.org/wiki/Category:Trees_%28structure%29
adjacency list model
http://www.sqlteam.com/item.asp?ItemID=8866
nested set
http://www.sqlsummit.com/AdjacencyList.htm
http://www.edutech.ch/contribution/nstrees/index.php
http://www.phpriot.com/d/articles/php/application-design/nested-trees-1/
http://www.dbmsmag.com/9604d06.html
http://en.wikipedia.org/wiki/Tree_traversal
http://www.cosc.canterbury.ac.nz/mukundan/dsal/BTree.html (Java applet showing how it works)
Graphs
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
Classes :
Nested Sets DB Tree Adodb
http://www.phpclasses.org/browse/package/2547.html
Visitation Model ADOdb
http://www.phpclasses.org/browse/package/2919.html
PEAR::DB_NestedSet
http://pear.php.net/package/DB_NestedSet
usage: https://www.entwickler.com/itr/kolumnen/psecom,id,26,nodeid,207.html
PEAR::Tree
http://pear.php.net/package/Tree/download/0.3.0/
http://www.phpkitchen.com/index.php?/archives/337-PEARTree-Tutorial.html
nstrees
http://www.edutech.ch/contribution/nstrees/index.php
The definitive pieces on this subject have been written by Joe Celko, and he has worked a number of them into a book called Joe Celko's Trees and Hierarchies in SQL for Smarties.
He favours a technique called directed graphs. An introduction to his work on this subject can be found here
What's the best way to represent a hierarchy in a SQL database? A generic, portable technique?
Let's assume the hierarchy is mostly read, but isn't completely static. Let's say it's a family tree.
Here's how not to do it:
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date,
mother integer,
father integer
);
And inserting data like this:
person_id name dob mother father
1 Pops 1900/1/1 null null
2 Grandma 1903/2/4 null null
3 Dad 1925/4/2 2 1
4 Uncle Kev 1927/3/3 2 1
5 Cuz Dave 1953/7/8 null 4
6 Billy 1954/8/1 null 3
Instead, split your nodes and your relationships into two tables.
create table person (
person_id integer autoincrement primary key,
name varchar(255) not null,
dob date
);
create table ancestor (
ancestor_id integer,
descendant_id integer,
distance integer
);
Data is created like this:
person_id name dob
1 Pops 1900/1/1
2 Grandma 1903/2/4
3 Dad 1925/4/2
4 Uncle Kev 1927/3/3
5 Cuz Dave 1953/7/8
6 Billy 1954/8/1
ancestor_id descendant_id distance
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
1 3 1
2 3 1
1 4 1
2 4 1
1 5 2
2 5 2
4 5 1
1 6 2
2 6 2
3 6 1
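The ancestor rows above follow mechanically from the direct parent links. As a rough sketch (the `parents` mapping and the `closure_rows` helper are names made up for this example), they can be derived by walking upwards from each person one generation at a time:

```python
# Direct parent links from the sample data: person_id -> list of parent ids.
parents = {1: [], 2: [], 3: [2, 1], 4: [2, 1], 5: [4], 6: [3]}

def closure_rows(parents):
    """Return the set of (ancestor_id, descendant_id, distance) rows."""
    rows = set()
    for person in parents:
        rows.add((person, person, 0))      # everyone is their own ancestor at distance 0
        frontier, distance = list(parents[person]), 1
        seen = set()
        while frontier:
            next_frontier = []
            for ancestor in frontier:
                if ancestor in seen:
                    continue               # keep only the shortest distance
                seen.add(ancestor)
                rows.add((ancestor, person, distance))
                next_frontier.extend(parents[ancestor])
            frontier, distance = next_frontier, distance + 1
    return rows

rows = closure_rows(parents)
```

For this data the result is exactly the 16 rows listed above. Rebuilding the whole table like this is only reasonable for small trees.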
You can now run arbitrary queries that don't involve joining the table back on itself, which would happen if you had the hierarchy relationship in the same row as the node.
Who has grandparents?
select * from person where person_id in
(select descendant_id from ancestor where distance=2);
All your descendants:
select * from person where person_id in
(select descendant_id from ancestor
where ancestor_id=1 and distance>0);
Who are uncles?
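-- An uncle (or aunt; the schema stores no gender) is a child of
-- someone's grandparent who is not that person's parent.
select distinct u.descendant_id as uncle
from ancestor u
where u.distance=1 and exists
(select 1 from ancestor g
where g.ancestor_id=u.ancestor_id and g.distance=2
and not exists
(select 1 from ancestor p
where p.distance=1 and p.ancestor_id=u.descendant_id
and p.descendant_id=g.descendant_id)
);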
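The first two queries above can be tried against the sample data with SQLite from Python. This is only a sketch: the `names` helper is invented for the example, and `integer primary key` is the SQLite spelling of an auto-assigned key, which differs from the generic DDL shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table person (
  person_id integer primary key,   -- SQLite assigns ids automatically
  name varchar(255) not null,
  dob date
);
create table ancestor (
  ancestor_id integer,
  descendant_id integer,
  distance integer
);
""")
conn.executemany("insert into person values (?, ?, ?)", [
    (1, "Pops", "1900/1/1"), (2, "Grandma", "1903/2/4"), (3, "Dad", "1925/4/2"),
    (4, "Uncle Kev", "1927/3/3"), (5, "Cuz Dave", "1953/7/8"), (6, "Billy", "1954/8/1"),
])
conn.executemany("insert into ancestor values (?, ?, ?)", [
    (1, 1, 0), (2, 2, 0), (3, 3, 0), (4, 4, 0), (5, 5, 0), (6, 6, 0),
    (1, 3, 1), (2, 3, 1), (1, 4, 1), (2, 4, 1),
    (1, 5, 2), (2, 5, 2), (4, 5, 1),
    (1, 6, 2), (2, 6, 2), (3, 6, 1),
])

def names(sql, params=()):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]

# Who has grandparents?
has_grandparents = names("""
    select name from person where person_id in
      (select descendant_id from ancestor where distance = 2)
    order by person_id""")

# All descendants of person 1 (Pops).
descendants_of_pops = names("""
    select name from person where person_id in
      (select descendant_id from ancestor
       where ancestor_id = ? and distance > 0)
    order by person_id""", (1,))
```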
You avoid all the problems of joining a table to itself via subqueries; a common limitation is 16 subqueries.
Trouble is, maintaining the ancestor table is kind of hard - best done with a stored procedure.
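As a rough idea of what such a maintenance routine has to do, here is a sketch in Python with SQLite standing in for a stored procedure. The `add_person` function and the trimmed-down schema are assumptions for this example. It only covers adding a new leaf; moving or deleting a subtree needs more work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table person (person_id integer primary key, name text not null);
create table ancestor (ancestor_id integer, descendant_id integer, distance integer);
""")

def add_person(conn, name, parent_ids=()):
    """Insert a person together with all of their ancestor rows."""
    with conn:  # one transaction: commit on success, roll back on error
        new_id = conn.execute(
            "insert into person (name) values (?)", (name,)).lastrowid
        # Every node is its own ancestor at distance 0.
        conn.execute("insert into ancestor values (?, ?, 0)", (new_id, new_id))
        if parent_ids:
            marks = ",".join("?" * len(parent_ids))
            # Copy every ancestor row of the parents (including their
            # distance-0 rows), one step further away. min() keeps the
            # shortest distance if both parents share an ancestor.
            conn.execute(
                f"""insert into ancestor (ancestor_id, descendant_id, distance)
                    select ancestor_id, ?, min(distance) + 1
                    from ancestor
                    where descendant_id in ({marks})
                    group by ancestor_id""",
                (new_id, *parent_ids),
            )
    return new_id

pops = add_person(conn, "Pops")
grandma = add_person(conn, "Grandma")
dad = add_person(conn, "Dad", (grandma, pops))
billy = add_person(conn, "Billy", (dad,))
```

Keeping the person insert and the ancestor inserts in one transaction is the main point: the two tables never disagree, even if the insert fails halfway.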
I've got to disagree with Josh. What happens if you're using a huge hierarchical structure like a company organization? People can join or leave the company, change reporting lines, and so on. Maintaining the "distance" would be a big problem, and you would have to maintain two tables of data.
This query (SQL Server 2005 and above) lets you see the complete line of any person and calculates their place in the hierarchy, and it requires only a single table of user information. It can be modified to find any child relationship.
--Create table of dummy data
create table #person (
personID integer IDENTITY(1,1) NOT NULL,
name varchar(255) not null,
dob date,
father integer
);
INSERT INTO #person(name,dob,father)Values('Pops','1900/1/1',NULL);
INSERT INTO #person(name,dob,father)Values('Grandma','1903/2/4',null);
INSERT INTO #person(name,dob,father)Values('Dad','1925/4/2',1);
INSERT INTO #person(name,dob,father)Values('Uncle Kev','1927/3/3',1);
INSERT INTO #person(name,dob,father)Values('Cuz Dave','1953/7/8',4);
INSERT INTO #person(name,dob,father)Values('Billy','1954/8/1',3);
DECLARE @OldestPerson INT;
SET @OldestPerson = 1; -- Set this value to the ID of the oldest person in the family
WITH PersonHierarchy (personID,Name,dob,father, HierarchyLevel) AS
(
SELECT
personID
,Name
,dob
,father,
1 as HierarchyLevel
FROM #person
WHERE personID = @OldestPerson
UNION ALL
SELECT
e.personID,
e.Name,
e.dob,
e.father,
eh.HierarchyLevel + 1 AS HierarchyLevel
FROM #person e
INNER JOIN PersonHierarchy eh ON
e.father = eh.personID
)
SELECT *
FROM PersonHierarchy
ORDER BY HierarchyLevel, father;
DROP TABLE #person;
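The same recursive CTE idea is not limited to SQL Server. As a sketch, here it is against SQLite from Python, trimmed to the columns the recursion needs. SQLite (like PostgreSQL) spells it `WITH RECURSIVE`. The `hierarchy_from` helper is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table person (personID integer primary key, name text not null, father integer)")
conn.executemany("insert into person values (?, ?, ?)", [
    (1, "Pops", None), (2, "Grandma", None), (3, "Dad", 1),
    (4, "Uncle Kev", 1), (5, "Cuz Dave", 4), (6, "Billy", 3),
])

def hierarchy_from(conn, oldest_person):
    """Return (name, level) for a person and everyone below them."""
    return conn.execute("""
        with recursive PersonHierarchy (personID, name, father, HierarchyLevel) as (
            -- anchor row: the starting person sits at level 1
            select personID, name, father, 1
            from person
            where personID = ?
            union all
            -- recursive step: children of anyone already in the result
            select e.personID, e.name, e.father, eh.HierarchyLevel + 1
            from person e
            join PersonHierarchy eh on e.father = eh.personID
        )
        select name, HierarchyLevel
        from PersonHierarchy
        order by HierarchyLevel, father, personID""", (oldest_person,)).fetchall()

line_of_pops = hierarchy_from(conn, 1)
```

Grandma does not show up in the result, because the table only records the father link.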
FYI: SQL Server 2008 introduces a new HierarchyID data type for this sort of situation. Gives you control over where in the "tree" your row sits, horizontally as well as vertically.
Oracle: SELECT ... START WITH ... CONNECT BY
Oracle has an extension to SELECT that allows easy tree-based retrieval. Perhaps SQL Server has some similar extension?
This query will traverse a table where the nesting relationship is stored in parent and child columns.
select * from my_table
start with parent = :TOP
connect by prior child = parent;
http://www.adp-gmbh.ch/ora/sql/connect_by.html
I prefer a mix of the techniques used by Josh and Mark Harrison:
Two tables, one with the data of the person and the other with the hierarchical info (person_id, parent_id [, mother_id]). If the PK of this table is person_id, you have a simple tree with only one parent per node (which makes sense in this case, but not in other cases such as accounting accounts).
This hierarchy table can be traversed by recursive procedures or, if your DB supports it, by statements like SELECT ... CONNECT BY PRIOR (Oracle).
Another possibility, if you know the maximum depth of the hierarchy data you want to maintain, is to use a single table with a set of columns per level of the hierarchy.
We had the same issue when we implemented a tree component for [fleXive] and used the nested set tree model approach mentioned by tharkun from the MySQL docs.
In addition, to speed things up (dramatically) we used a spread approach: we use the maximum Long value as the top-level right bound, which allows us to insert and move nodes without recalculating all left and right values. Values for left and right are calculated by dividing the range for a node by 3 and using the inner third as the bounds for the new node.
A Java code example can be seen here.
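For readers without access to that code, here is a small Python sketch of the inner-third idea. It is not the [fleXive] implementation. The `Node` class, the `add_child` function and the rule for picking the free range (the space after the last existing child) are assumptions made for the example.

```python
MAX_LONG = 2**63 - 1  # Java's Long.MAX_VALUE, used as the root's right bound

class Node:
    def __init__(self, name, lft, rgt):
        self.name, self.lft, self.rgt = name, lft, rgt
        self.children = []

def add_child(parent, name):
    """Place a new last child in the inner third of the parent's free range."""
    # Assumed free range: after the last existing child, otherwise the
    # whole interval of the parent.
    lo = parent.children[-1].rgt if parent.children else parent.lft
    hi = parent.rgt
    third = (hi - lo) // 3
    if third < 1:
        raise ValueError("no room left; the bounds would need to be respread")
    # The outer thirds stay empty, so later inserts on either side need
    # no renumbering of existing nodes.
    child = Node(name, lo + third, lo + 2 * third)
    parent.children.append(child)
    return child

root = Node("root", 0, MAX_LONG)
a = add_child(root, "a")
b = add_child(root, "b")
a1 = add_child(a, "a1")
```

No existing node is touched by an insert, which is where the speed-up comes from. The cost is that each level shrinks the available range to a third, so very deep or very wide trees eventually need their bounds recalculated.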
If you're using SQL Server 2005 then this link explains how to retrieve hierarchical data.
Common Table Expressions (CTEs) can be your friends once you get comfortable using them.