simple common table expression to flatten a tree - sql

Following is the table format that I have.
Table name :: USERS
userid reporttouserid
------ ------------
101 NULL
102 101
103 102
Now I need a query that lists all the child user ids under 101, that is, both 102 and 103 (103 is indirectly under 101 because its parent, 102, is under 101).
I have seen common table expressions in PostgreSQL, but I can't figure out how to go about it.

The PostgreSQL documentation covers this topic. See the examples given for recursive CTEs on that page.
Recursive CTEs can be a bit hard to grasp, but are very powerful once you use them. Read the documentation and experiment a little; you'll get it.
(Please always mention your PostgreSQL version and show desired output in table-like form in your questions).
Given demo data:
create table users (
userid integer primary key,
reporttouserid integer references users(userid)
);
insert into users(userid, reporttouserid) values (101,null), (102,101), (103,102);
(please provide this in questions where possible, it's a pain to have to create it)
you can recursively walk the graph with something like:
WITH RECURSIVE flatusers(userid, reporttouserid, baseuserid) AS (
SELECT userid, reporttouserid, userid AS baseuserid
FROM users WHERE reporttouserid IS NULL
UNION ALL
SELECT u.userid, u.reporttouserid, f.baseuserid
FROM flatusers f
INNER JOIN users u ON f.userid = u.reporttouserid
)
SELECT * FROM flatusers;
producing output like:
userid | reporttouserid | baseuserid
--------+----------------+------------
101 | | 101
102 | 101 | 101
103 | 102 | 101
(3 rows)
I'm sure you can figure out where to go from there. Make sure you understand that recursive CTE before using it.
Be aware that PostgreSQL (9.4 or older at least) cannot (unfortunately) push quals down into CTE terms, even for non-recursive CTEs. If you add WHERE baseuserid = 101 to the outer query, it will still generate the entire flattened table and then throw most of it away. If you want to do this recursive operation for just one baseuserid, you must add the appropriate condition (for example AND userid = 101) next to WHERE reporttouserid IS NULL in the non-recursive (first) term of the recursive CTE.
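To make the last point concrete, here is a minimal sketch that filters inside the non-recursive term, so only one manager's subtree is ever generated. It runs the SQL through Python's sqlite3 module instead of PostgreSQL purely to stay self-contained; the helper name subordinates and the extra rows 201/202 are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    userid INTEGER PRIMARY KEY,
    reporttouserid INTEGER REFERENCES users(userid)
);
-- 101..103 come from the question; 201/202 are an extra, unrelated tree
INSERT INTO users (userid, reporttouserid)
VALUES (101, NULL), (102, 101), (103, 102), (201, NULL), (202, 201);
""")

def subordinates(conn, base_userid):
    """Return every userid that reports, directly or indirectly, to base_userid."""
    sql = """
    WITH RECURSIVE flatusers(userid) AS (
        -- non-recursive term: filter here so only one subtree is generated
        SELECT userid FROM users WHERE reporttouserid = ?
        UNION ALL
        -- recursive term: add the reports of the rows found so far
        SELECT u.userid
        FROM flatusers f
        JOIN users u ON u.reporttouserid = f.userid
    )
    SELECT userid FROM flatusers ORDER BY userid
    """
    return [row[0] for row in conn.execute(sql, (base_userid,))]

print(subordinates(conn, 101))  # [102, 103]
```

Seeding with reporttouserid = ? returns only the subordinates, which is what the question asked for; seed with userid = ? instead if the base user should be included.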

Related

Transform Row Values to Column Names

I have a table of customer contacts and their role. Simplified example below.
customer | role | userid
----------------------------
1 | Support | 123
1 | Support | 456
1 | Procurement | 567
...
desired output
customer | Support1 | Support2 | Support3 | Support4 | Procurement1 | Procurement2
-----------------------------------------------------------------------------------
1 | 123 | 456 | null | null | 567 | null
2 | 123 | 456 | 12333 | 45776 | 888 | 56723
So the number of required columns is created dynamically, based on how many users are in that role. It's a small number of roles. I can also assume a maximum of 5 users in the same role, which means that in the worst case I need to generate 5 columns for each role. The userids don't need to be in any particular order.
My current approach gets 1 userid per role/customer. Then a second query pulls another id that wasn't part of the first result set, and so on. But that way I have to statically create 5 queries. It works, but I was wondering whether there is a more efficient way that dynamically creates the needed columns.
Example of pulling one user per role:
SELECT customer,role,
(SELECT top 1 userid
FROM temp as tmp1
where tmp1.customer=tmp2.customer and tmp1.role=tmp2.role
) as userid
FROM temp as tmp2
group by customer,role
order by customer,role
SQL create with dummy data
create table temp
(
customer int,
role nvarchar(20),
userid int
)
insert into temp values (1,'Support',123)
insert into temp values (1,'Support',456)
insert into temp values (1,'Procurement',567)
insert into temp values (2,'Support',123)
insert into temp values (2,'Support',456)
insert into temp values (2,'Procurement',888)
insert into temp values (2,'Support',12333)
insert into temp values (2,'Support',45776)
insert into temp values (2,'Procurement',56723)
You may need to adapt your approach slightly if you want to avoid getting into the realm of programming user defined table functions (which is what you would need in order to generate columns dynamically). You don't mention which SQL database variant you are using (SQL Server, PostgreSQL, ?). I'm going to make the assumption that it supports some form of string aggregation feature (they pretty much all do), but the syntax for doing this will vary, so you will probably have to adjust the code to your circumstances. You mention that the number of roles is small (5-ish?). The proposed solution is to generate a comma-separated list of user ids, one for each role, using common table expressions (CTEs) and the LISTAGG (variously named STRING_AGG, GROUP_CONCAT, etc. in other databases) function.
WITH tsupport
AS (SELECT customer,
Listagg(userid, ',') AS "Support"
FROM temp
WHERE ROLE = 'Support'
GROUP BY customer),
tprocurement
AS (SELECT customer,
Listagg(userid, ',') AS "Procurement"
FROM temp
WHERE ROLE = 'Procurement'
GROUP BY customer)
--> tnextrole...
--> AS (SELECT ... for additional roles
--> Listagg...
SELECT a.customer,
"Support",
"Procurement"
--> "Next Role" etc.
FROM tsupport a
JOIN tprocurement b
ON a.customer = b.customer
--> JOIN tNextRole ...
A fiddle with the result for your dummy data is here.
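The same per-role lists can also be built in a single pass with conditional aggregation instead of one CTE per role. The sketch below is only an illustration under assumptions: it uses SQLite's group_concat through Python's sqlite3 module, and the table is renamed to contacts; on other databases the aggregate has a different name, as noted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (customer INT, role TEXT, userid INT);
INSERT INTO contacts VALUES
  (1,'Support',123), (1,'Support',456), (1,'Procurement',567),
  (2,'Support',123), (2,'Support',456), (2,'Procurement',888),
  (2,'Support',12333), (2,'Support',45776), (2,'Procurement',56723);
""")

SQL = """
SELECT customer,
       -- the CASE yields NULL for other roles, and group_concat skips NULLs
       group_concat(CASE WHEN role = 'Support' THEN userid END) AS support,
       group_concat(CASE WHEN role = 'Procurement' THEN userid END) AS procurement
FROM contacts
GROUP BY customer
ORDER BY customer
"""

def parse_ids(csv):
    # group_concat does not promise an order, so sort after splitting
    return sorted(int(x) for x in csv.split(",")) if csv else []

def roles_by_customer(conn):
    return {
        customer: {"Support": parse_ids(support), "Procurement": parse_ids(procurement)}
        for customer, support, procurement in conn.execute(SQL)
    }

print(roles_by_customer(conn))
```

Adding a role means adding one more group_concat line rather than another CTE and join, and a customer that lacks one of the roles still gets a row.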

Is chaining rows in the same table a bad pattern?

I want to create a tree structure of categories and need to find a proper way to store it into the database. Think of the following animal tree, which pretty accurately describes how it should look like:
My question now is whether chaining those entries within the same table is a good idea or not. SQLite doesn't allow me to add a FOREIGN KEY constraint to a value in the same table, so I have to make sure manually that I don't create inconsistencies. This is what I currently plan to have:
id | parent | name
---+--------+--------
1 | null | Animal
2 | 1 | Reptile
3 | 2 | Lizard
4 | 1 | Mammal
5 | 4 | Equine
6 | 4 | Bovine
parent references to an id in the same table, going up all the way until null is found, which is the root. Is this a bad pattern? And if so, what are common alternatives to put a tree structure into a relational database?
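On the foreign key concern: SQLite does accept a REFERENCES clause that points back at the same table, but enforcement is off by default and has to be switched on per connection with PRAGMA foreign_keys = ON. A minimal sketch through Python's sqlite3 module, with the table name category and the helper insert_is_rejected as assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
conn.execute("""
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    parent INTEGER REFERENCES category(id),  -- points back into this table
    name TEXT NOT NULL
)
""")
conn.executemany(
    "INSERT INTO category (id, parent, name) VALUES (?, ?, ?)",
    [(1, None, "Animal"), (2, 1, "Reptile"), (3, 2, "Lizard"),
     (4, 1, "Mammal"), (5, 4, "Equine"), (6, 4, "Bovine")],
)

def insert_is_rejected(conn, row):
    """Try to insert (id, parent, name); report whether a constraint blocked it."""
    try:
        conn.execute("INSERT INTO category (id, parent, name) VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        return True
    return False

print(insert_is_rejected(conn, (7, 99, "Orphan")))  # True: parent 99 does not exist
```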
If your version of SQLite supports recursive CTE, then this is one option:
WITH RECURSIVE cte (n) AS (
SELECT id FROM yourTable WHERE parent IS NULL
UNION ALL
SELECT t1.id
FROM yourTable t1
INNER JOIN cte t2
ON t1.parent = t2.n AND t1.name NOT LIKE '%Lizard%'
)
SELECT *
FROM yourTable
WHERE id IN cte;
This is untested, but the check on t1.name in the recursive portion of the above CTE should (hopefully) stop the recursion along a branch as soon as we reach a record which matches the name in the LIKE expression. In the case of searching for Lizard, the recursion should stop one level above Lizard, so Lizard and anything below it are left out, while the records above it in the hierarchy are returned.
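For what it is worth, here is the query above run against the question's six sample rows through Python's sqlite3 module. The only change is that the final filter is written as IN (SELECT n FROM cte); treat it as a quick check rather than a definitive test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourTable (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT);
INSERT INTO yourTable VALUES
  (1, NULL, 'Animal'), (2, 1, 'Reptile'), (3, 2, 'Lizard'),
  (4, 1, 'Mammal'), (5, 4, 'Equine'), (6, 4, 'Bovine');
""")

rows = conn.execute("""
WITH RECURSIVE cte (n) AS (
    SELECT id FROM yourTable WHERE parent IS NULL
    UNION ALL
    SELECT t1.id
    FROM yourTable t1
    INNER JOIN cte t2
        ON t1.parent = t2.n AND t1.name NOT LIKE '%Lizard%'
)
SELECT id, name FROM yourTable WHERE id IN (SELECT n FROM cte) ORDER BY id
""").fetchall()

print(rows)
```

On this data the walk starts at the root and descends every branch, but never steps into Lizard, so the result is every row except Lizard, sibling branches such as Mammal included.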

SQL Server: what's the best way to query a table whose name is stored in a column?

I'd like to put together a query but avoid using a cursor in order to do so. We have PDF files stored in multiple tables. One year for each table. So we have table names such as:
"Files_2012", "Files_2013", "Files_2014", etc.
We then have a master table (called Files) that contains which table the file is stored in.
Here's the layout:
=======================================
FILES
=======================================
FileId | RecordId | FileTableName
---------------------------------------
104 | 7108162 | Files_2013
105 | 7108162 | Files_2014
106 | 7108162 | Files_2013
The yearly tables would then look like this:
=======================================
FILES_2013
=======================================
FileId | FileData (varbinary)
---------------------------------------
104 | 0x255044462D312E340A25E2E3CFD30D...
106 | 0x897444462D312E340A25E2E3CFD30D...
=======================================
FILES_2014
=======================================
FileId | FileData (varbinary)
---------------------------------------
105 | 0x556044462D312E340A25E2E3CFD30D...
My query needs to return records based on the RecordId. So, in this example, all 3 of the Files.RecordId values are the same. I would need to return the FileData column for all 3 records, like this:
=======================================
My returned records
=======================================
FileId | FileData (varbinary)
---------------------------------------
104 | 0x255044462D312E340A25E2E3CFD30D...
105 | 0x556044462D312E340A25E2E3CFD30D...
106 | 0x897444462D312E340A25E2E3CFD30D...
How can I do this? If it helps, here's my query so far, although I may be way off. I'm storing the FileTableName values into a temporary table & was hoping to work with them that way, but I'm stuck after this:
DECLARE @recordId INT
CREATE TABLE #tmpFiles (FileId int, FileTableName varchar(100), FileData varbinary(max))
SET @recordId = 7108162
INSERT INTO #tmpFiles (FileId, FileTableName)
SELECT FileId, FileTableName
FROM dbo.Files
WHERE RecordId = @recordId
UPDATE t
SET t.FileData = f.FileData
FROM #tmpFiles t
INNER JOIN Files_2013 f ON t.FileId = f.FileId
Thanks for any help you can provide.
Assuming that you know table names in advance you can use UNION ALL:
DECLARE @recordId INT = 7108162;
WITH cte(FileId, FileData, Year) AS
(
SELECT FileId , FileData, 2012 AS [Year]
FROM FILES_2012
UNION ALL
SELECT FileId , FileData, 2013
FROM FILES_2013
UNION ALL
SELECT FileId , FileData, 2014
FROM FILES_2014
)
SELECT c.*
FROM FILES f
JOIN cte c
ON f.FileId = c.FileId
WHERE f.RecordId = @recordId;
If you don't know the table names (which I doubt, because they follow a common naming pattern), you need to use dynamic SQL, but consider other options first.
Read The Curse and Blessings of Dynamic SQL by Erland Sommarskog:
SELECT * FROM sales + @yymm
This is a variation of the previous case, where there is a suite of
tables that actually do describe the same entity. All tables have the
same columns, and the name includes some partitioning component,
typically year and sometimes also month. New tables are created as a
new year/month begins.
In this case, writing one stored procedure per table is not really
feasible. Not the least, because the user may want to specify a date
range for a search, so even with one procedure per table you would
still need a dynamic dispatcher.
Now, let's make this very clear: this is a flawed table design. You
should not have one sales table per month, you should have one single
sales table, and the month that appear in the table name, should be
the first column of the primary key in the united sales table. But you
may be stuck with a legacy application where you cannot easily change
the table design. And, admittedly, there are situations where
partitioning makes sense. The table may be huge (say over 10 GB in
size), or you want to be able to age out old data quickly. But in such
case you should do partitioning properly.
In the following, I will look at three approaches to deal with
partitioning without using dynamic SQL.
Possible solutions:
Partitioned Tables
Views and Partitioned Views
Compatibility Views
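As a rough illustration of the view option, the sketch below hides the yearly tables behind one UNION ALL view so the query itself needs no dynamic SQL. It uses SQLite through Python's sqlite3 module rather than SQL Server, and the names all_files, file_year and files_for_record, as well as the extra row 107, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (fileid INTEGER PRIMARY KEY, recordid INTEGER, filetablename TEXT);
CREATE TABLE files_2013 (fileid INTEGER PRIMARY KEY, filedata BLOB);
CREATE TABLE files_2014 (fileid INTEGER PRIMARY KEY, filedata BLOB);
INSERT INTO files VALUES
  (104, 7108162, 'Files_2013'), (105, 7108162, 'Files_2014'),
  (106, 7108162, 'Files_2013'), (107, 42, 'Files_2014');
INSERT INTO files_2013 VALUES (104, X'25504446'), (106, X'89744446');
INSERT INTO files_2014 VALUES (105, X'55604446'), (107, X'00');
-- one view over all yearly tables; a new year means one more UNION ALL branch
CREATE VIEW all_files AS
    SELECT fileid, filedata, 2013 AS file_year FROM files_2013
    UNION ALL
    SELECT fileid, filedata, 2014 FROM files_2014;
""")

def files_for_record(conn, record_id):
    """Return (fileid, file_year, filedata) for every file of one record."""
    sql = """
    SELECT a.fileid, a.file_year, a.filedata
    FROM files f
    JOIN all_files a ON a.fileid = f.fileid
    WHERE f.recordid = ?
    ORDER BY a.fileid
    """
    return conn.execute(sql, (record_id,)).fetchall()

for fileid, year, data in files_for_record(conn, 7108162):
    print(fileid, year, data.hex())
```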

How to add aggregate value to SELECT?

I'm selecting data from multiple tables and I also need to get maximum "timestamp" on those tables. I will need that to create custom cache control.
tbl_name tbl_surname
id | name id | surname
--------- ------------
0 | John 0 | Doe
1 | Jane 1 | Tully
... ...
I have following query:
SELECT name, surname FROM tbl_name, tbl_surname WHERE tbl_name.id = tbl_surname.id
and I need to add following info to result set:
SELECT MAX(ora_rowscn) FROM (SELECT ora_rowscn FROM tbl_name
UNION ALL
SELECT ora_rowscn FROM tbl_surname);
I was trying to use UNION but I get error - mixing group and not single group data - or something like that, I know why I cannot use the union.
I don't want to split this into 2 calls, because I need the timestamp of the current snapshot I took from DB for my cache management. And between select and the call for MAX the DB could change.
Here is result I want:
John | Doe | 123456
Jane | Tully | 123456
where 123456 is approximate time of last change (insert, update, delete) of tables tbl_name and tbl_surname.
I have read only access to DB, so I cannot create triggers, stored procedures, extra tables etc...
Thanks for any suggestions.
EDIT: The value *ora_rowscn* is assigned per block of rows. So in one table this value can differ per row. I need the maximal value from both (all) tables involved in query.
Try:
SELECT name,
surname,
max(greatest(tbl_name.ora_rowscn, tbl_surname.ora_rowscn)) over () as max_rowscn
FROM tbl_name, tbl_surname
WHERE tbl_name.id = tbl_surname.id
There's no need to aggregate here - just include both ora_rowscn values in your query and take the max:
SELECT
n.name,
n.ora_rowscn as n_ora_rowscn,
s.surname,
s.ora_rowscn as s_ora_rowscn,
greatest(n.ora_rowscn, s.ora_rowscn) as last_ora_rowscn
FROM tbl_name n
join tbl_surname s on n.id = s.id
BTW, I've replaced your old-style joins with ANSI-style ones, which are more readable, IMHO.
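The two answers return different things: greatest() alone gives a value per row, while wrapping it in max(...) over () repeats one overall maximum on every row. A small sketch of the difference, with loud assumptions: it runs on SQLite 3.25 or newer through Python's sqlite3 module, an ordinary column scn stands in for Oracle's ora_rowscn pseudocolumn, and SQLite's two-argument max() plays the role of greatest().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- scn is a made-up stand-in for ora_rowscn, which only exists in Oracle
CREATE TABLE tbl_name (id INTEGER PRIMARY KEY, name TEXT, scn INTEGER);
CREATE TABLE tbl_surname (id INTEGER PRIMARY KEY, surname TEXT, scn INTEGER);
INSERT INTO tbl_name VALUES (0, 'John', 100), (1, 'Jane', 130);
INSERT INTO tbl_surname VALUES (0, 'Doe', 120), (1, 'Tully', 110);
""")

rows = conn.execute("""
SELECT n.name,
       s.surname,
       max(n.scn, s.scn) AS row_scn,               -- per-row value (greatest)
       max(max(n.scn, s.scn)) OVER () AS max_scn   -- one value for the whole result
FROM tbl_name n
JOIN tbl_surname s ON n.id = s.id
ORDER BY n.id
""").fetchall()

for row in rows:
    print(row)
```

Here row_scn differs between John (120) and Jane (130), while max_scn is 130 on both rows, which is the shape the question asked for.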

How to represent a data tree in SQL?

I'm writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data.
I'm using a UI library to present the tree in a windows form where I can bind the tree to the TreeView.
I will need to save this tree and nodes in the DB.
What will be the best way to save the tree and to get the following features:
Intuitive implementation.
Easy binding. Will be easy to move from the tree to the DB structure and back (if any)
I had 2 ideas. The first is to serialize the data into a one liner in a table.
The second is to save in tables, but then, when moving to data entities, I will lose the row states on the table for changed nodes.
Any ideas?
I've bookmarked this SlideShare deck about SQL antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed
The recommendation from there is to use a Closure Table (it's explained in the slides).
Here is the summary (slide 77):
| Query Child | Query Subtree | Modify Tree | Ref. Integrity
Adjacency List | Easy | Hard | Easy | Yes
Path Enumeration | Easy | Easy | Hard | No
Nested Sets | Hard | Easy | Hard | No
Closure Table | Easy | Easy | Easy | Yes
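For readers who do not want to open the slides, this is roughly what a closure table looks like: one row per ancestor/descendant pair, including each node paired with itself. It is a minimal sketch, not the slides' exact schema; the names node, tree_path, add_node and subtree and the animal sample data are assumptions, and SQLite is driven through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tree_path (
    ancestor   INTEGER NOT NULL REFERENCES node(id),
    descendant INTEGER NOT NULL REFERENCES node(id),
    depth      INTEGER NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);
""")

def add_node(conn, node_id, name, parent_id=None):
    conn.execute("INSERT INTO node (id, name) VALUES (?, ?)", (node_id, name))
    # every node is its own ancestor at depth 0
    conn.execute("INSERT INTO tree_path VALUES (?, ?, 0)", (node_id, node_id))
    if parent_id is not None:
        # copy every path that ends at the parent, extended by one step
        conn.execute(
            """INSERT INTO tree_path (ancestor, descendant, depth)
               SELECT ancestor, ?, depth + 1 FROM tree_path WHERE descendant = ?""",
            (node_id, parent_id),
        )

def subtree(conn, root_id):
    """Names of all descendants of root_id, nearest levels first."""
    sql = """
    SELECT n.name
    FROM tree_path p
    JOIN node n ON n.id = p.descendant
    WHERE p.ancestor = ? AND p.depth > 0
    ORDER BY p.depth, n.id
    """
    return [row[0] for row in conn.execute(sql, (root_id,))]

for args in [(1, "Animal"), (2, "Reptile", 1), (3, "Lizard", 2), (4, "Mammal", 1)]:
    add_node(conn, *args)

print(subtree(conn, 1))  # ['Reptile', 'Mammal', 'Lizard']
```

Both the subtree query and the insert are plain, non-recursive statements, which is why the table above rates them as easy; the price is the extra rows in tree_path.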
The easiest implementation is adjacency list structure:
id parent_id data
However, some databases, particularly MySQL, have some issues in handling this model, because it requires the ability to run recursive queries, which MySQL lacked before version 8.0.
Another model is nested sets:
id lft rgt data
where lft and rgt are arbitrary values that define the hierarchy (any child's lft and rgt must lie within its parent's lft and rgt).
This does not require recursive queries, but it is slower and harder to maintain.
However, in MySQL this can be improved using SPATIAL abilities.
See these articles in my blog:
Adjacency list vs. nested sets: PostgreSQL
Adjacency list vs. nested sets: SQL Server
Adjacency list vs. nested sets: Oracle
Adjacency list vs. nested sets: MySQL
for more detailed explanations.
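A small nested-sets sketch may help to picture the lft/rgt containment rule. The animal sample data, its numbering, and the helper names descendants and ancestors are assumptions for illustration; the SQL runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
-- lft/rgt come from a depth-first walk: a child's interval lies inside its parent's
INSERT INTO category VALUES
  (1, 'Animal', 1, 12), (2, 'Reptile', 2, 5), (3, 'Lizard', 3, 4),
  (4, 'Mammal', 6, 11), (5, 'Equine', 7, 8), (6, 'Bovine', 9, 10);
""")

def descendants(conn, name):
    """Everything whose interval lies strictly inside the named node's interval."""
    sql = """
    SELECT c.name
    FROM category c
    JOIN category p ON c.lft > p.lft AND c.rgt < p.rgt
    WHERE p.name = ?
    ORDER BY c.lft
    """
    return [row[0] for row in conn.execute(sql, (name,))]

def ancestors(conn, name):
    """Everything whose interval strictly contains the named node's interval."""
    sql = """
    SELECT p.name
    FROM category c
    JOIN category p ON c.lft > p.lft AND c.rgt < p.rgt
    WHERE c.name = ?
    ORDER BY p.lft
    """
    return [row[0] for row in conn.execute(sql, (name,))]

print(descendants(conn, "Mammal"))  # ['Equine', 'Bovine']
print(ancestors(conn, "Lizard"))    # ['Animal', 'Reptile']
```

Reads need no recursion, but inserting a node means renumbering lft and rgt for everything to its right, which is the maintenance cost mentioned above.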
I'm surprised that nobody mentioned the materialized path solution, which is probably the fastest way of working with trees in standard SQL.
In this approach, every node in the tree has a column path, where the full path from the root to the node is stored. This involves very simple and fast queries.
Have a look at the example table node:
+---------+-------+
| node_id | path |
+---------+-------+
| 0 | |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1.4 |
| 5 | 2.5 |
| 6 | 2.6 |
| 7 | 2.6.7 |
| 8 | 2.6.8 |
| 9 | 2.6.9 |
+---------+-------+
In order to get the descendants of node x (its children, their children, and so on), you can write the following query:
SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')
Keep in mind that the path column should be indexed so that the LIKE clause performs well.
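Here is the example table loaded into SQLite through Python's sqlite3 module, as a sketch only. SQLite spells concatenation as || instead of CONCAT, and the helper names descendants and children are assumptions; the second helper shows one way to narrow the match to direct children.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (node_id INTEGER PRIMARY KEY, path TEXT NOT NULL);
CREATE INDEX node_path_idx ON node(path);
INSERT INTO node VALUES
  (0, ''), (1, '1'), (2, '2'), (3, '3'), (4, '1.4'),
  (5, '2.5'), (6, '2.6'), (7, '2.6.7'), (8, '2.6.8'), (9, '2.6.9');
""")

def descendants(conn, node_id):
    """All nodes whose path starts with '<path of node_id>.'"""
    sql = """
    SELECT node_id FROM node
    WHERE path LIKE (SELECT path FROM node WHERE node_id = ?) || '.%'
    ORDER BY node_id
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]

def children(conn, node_id):
    """Direct children only: one more label after the prefix, not two."""
    sql = """
    SELECT n.node_id
    FROM node n
    JOIN (SELECT path AS p FROM node WHERE node_id = ?) base
    WHERE n.path LIKE base.p || '.%'
      AND n.path NOT LIKE base.p || '.%.%'
    ORDER BY n.node_id
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]

print(descendants(conn, 2))  # [5, 6, 7, 8, 9]
print(children(conn, 2))     # [5, 6]
```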
If you are using PostgreSQL you can use ltree, an extension from the contrib modules that ship with PostgreSQL (enable it with CREATE EXTENSION ltree), which implements a data type for tree-like label paths.
From the docs:
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);
You can do queries like:
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
It depends on how you will be querying and updating the data. If you store all the data in one row, it's basically a single unit that you can't query into or partially update without rewriting all the data.
If you want to store each element as a row, you should first read Managing Hierarchical Data in MySQL (MySQL specific, but the advice holds for many other databases too).
If you're only ever accessing an entire tree, the adjacency list model makes it difficult to retrieve all nodes under the root without using a recursive query. If you add an extra column that links back to the head, then you can do SELECT * WHERE head_id = @id and get the whole tree in one non-recursive query, but it denormalizes the database.
Some databases have custom extensions that make storing and retrieving hierarchical data easier; for example, Oracle has CONNECT BY.
As this is the top answer when asking "sql trees" in a Google search, I will try to update this from the perspective of today (December 2018).
Most answers imply that using an adjacency list is both simple and slow, and therefore recommend other methods.
Since version 8 (published April 2018) MySQL supports recursive common table expressions (CTEs). MySQL is a bit late to the show, but this opens up a new option.
There is a tutorial here that explains the use of recursive queries to manage an adjacency list.
As the recursion now runs completely within the database engine, it is much faster than in the past (when it had to run in the script engine).
The blog here gives some measurements (which are both biased and for Postgres instead of MySQL), but it nevertheless shows that adjacency lists do not have to be slow.
So my conclusion today is:
The simple adjacency list may be fast enough if the database engine supports recursion.
Do a benchmark with your own data and your own engine.
Do not trust outdated recommendations to point out the "best" method.
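A benchmark of that kind does not need much code. The harness below is only a starting point under assumptions: it uses SQLite through Python's sqlite3 module instead of MySQL, and the synthetic tree shape, the size N and the names node and count_descendants are made up. Swap in your own engine and data before drawing conclusions.

```python
import sqlite3
import time

N = 10_000  # assumed size; use your own data for a real benchmark

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER)")
# synthetic tree: node i hangs under node i // 2, node 1 is the root
conn.executemany(
    "INSERT INTO node (id, parent_id) VALUES (?, ?)",
    ((i, i // 2 if i > 1 else None) for i in range(1, N + 1)),
)
conn.execute("CREATE INDEX node_parent_idx ON node(parent_id)")

def count_descendants(conn, root_id):
    """Walk the adjacency list below root_id with a recursive CTE."""
    sql = """
    WITH RECURSIVE sub(id) AS (
        SELECT id FROM node WHERE parent_id = ?
        UNION ALL
        SELECT n.id FROM node n JOIN sub ON n.parent_id = sub.id
    )
    SELECT count(*) FROM sub
    """
    return conn.execute(sql, (root_id,)).fetchone()[0]

start = time.perf_counter()
total = count_descendants(conn, 1)
elapsed = time.perf_counter() - start
print(f"{total} descendants in {elapsed * 1000:.1f} ms")
```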
PGSQL Tree relations
Hello, I just got a handle on this for a project I'm working on and figured I'd share my write-up.
Hope this helps. Let's get started with some prereqs.
This is essentially the closure table solution mentioned above, using recursive calls. Thanks for those slides; they are very useful and I wish I had seen them before this write-up :)
pre-requisites
Recursive Functions
these are functions that call themselves, i.e.
function factorial(n) {
if (n === 0) return 1; // base case
return n * factorial(n - 1); // recursive call
}
This is pretty cool. Luckily pgsql has recursive functions too, but they can be a bit much. I prefer functional stuff.
cte with pgsql
WITH RECURSIVE t(n) AS (
VALUES (1) -- non-recursive term
UNION ALL
SELECT n+1 FROM t WHERE n < 100 -- recursive term
-- continues until the recursive term adds no new rows
)
SELECT sum(n) FROM t;
The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:
Recursive Query Evaluation
1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
2. So long as the working table is not empty, repeat these steps:
a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
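The evaluation steps above can be mimicked in a few lines of plain Python, which may make the working-table idea easier to follow. This is a toy model of the description, not how PostgreSQL is implemented; the function name evaluate_recursive and its parameters are assumptions.

```python
def evaluate_recursive(non_recursive_rows, recursive_term, union_all=True):
    """Toy model of WITH RECURSIVE evaluation using a working table."""
    result = []
    seen = set()

    def admit(rows):
        # for UNION, drop duplicates and rows that repeat an earlier result row
        kept = []
        for row in rows:
            if not union_all:
                if row in seen:
                    continue
                seen.add(row)
            kept.append(row)
        return kept

    # step 1: evaluate the non-recursive term and seed the working table
    working = admit(non_recursive_rows)
    result.extend(working)

    # step 2: repeat while the working table is not empty
    while working:
        # 2a: the recursive term only sees the current working table
        intermediate = admit(recursive_term(working))
        result.extend(intermediate)
        # 2b: the intermediate table becomes the new working table
        working = intermediate
    return result

# WITH RECURSIVE t(n) AS (VALUES (1) UNION ALL SELECT n+1 FROM t WHERE n < 100)
rows = evaluate_recursive([(1,)], lambda w: [(n + 1,) for (n,) in w if n < 100])
total = sum(n for (n,) in rows)
print(total)  # 5050

# with UNION, a cycle a -> b -> a still terminates because repeats are discarded
edges = {"a": ["b"], "b": ["a"]}
cycle = evaluate_recursive(
    [("a",)],
    lambda w: [(nxt,) for (cur,) in w for nxt in edges[cur]],
    union_all=False,
)
print(cycle)  # [('a',), ('b',)]
```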
To do something like factorial in SQL you need something more like this, taken from an SO post (note that this one is T-SQL, not pgsql):
ALTER FUNCTION dbo.fnGetFactorial (@num int)
RETURNS INT
AS
BEGIN
DECLARE @n int
IF @num <= 1 SET @n = 1
ELSE SET @n = @num * dbo.fnGetFactorial(@num - 1)
RETURN @n
END
GO
END
GO
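The same factorial can also be written as a recursive CTE, without defining a function. A minimal sketch, run on SQLite through Python's sqlite3 module; the wrapper name factorial and the CTE column names are assumptions, and it is only meant for small n because the product overflows 64-bit integers soon after 20!.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def factorial(conn, n):
    """Compute n! with a recursive CTE that carries a running product."""
    sql = """
    WITH RECURSIVE f(i, fact) AS (
        SELECT 0, 1                       -- base case: 0! = 1
        UNION ALL
        SELECT i + 1, fact * (i + 1)      -- recursive step: (i+1)! = i! * (i+1)
        FROM f
        WHERE i < ?
    )
    SELECT fact FROM f WHERE i = ?
    """
    return conn.execute(sql, (n, n)).fetchone()[0]

print(factorial(conn, 5))  # 120
```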
Tree data structures (more of a forest :)
wikipedia
The important thing to note is that a tree is a special kind of graph. This can be enforced simply by
requiring that each node has only one parent.
Representing the Tree in PGSQL
I think it will be easiest to work it out a little more theoretically before we move on to the SQL.
The simple way to represent a graph relation without data duplication is to separate the nodes(id, data) from the edges.
We can then restrict the edges(parent_id, child_id) table to enforce our constraint by mandating that the pair (parent_id, child_id),
as well as child_id on its own, be unique.
create table nodes (
id uuid default uuid_generate_v4() not null unique ,
name varchar(255) not null,
json json default '{}'::json not null,
remarks varchar(255)
);
create table edges (
id uuid default uuid_generate_v4() not null,
parent_id uuid not null,
child_id uuid not null,
meta json default '{}'::json,
constraint group_group_id_key
primary key (id),
constraint group_group_unique_combo
unique (parent_id, child_id),
constraint group_group_unique_child
unique (child_id),
foreign key (parent_id) references nodes
on update cascade on delete cascade,
foreign key (child_id) references nodes
on update cascade on delete cascade
);
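To see what the unique (child_id) constraint buys, here is a cut-down version of the two tables in SQLite, driven from Python's sqlite3 module. Readable text ids replace the uuids, and the helper try_add_edge is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY);
CREATE TABLE edges (
    parent_id TEXT NOT NULL REFERENCES nodes(id),
    child_id  TEXT NOT NULL REFERENCES nodes(id),
    UNIQUE (parent_id, child_id),
    UNIQUE (child_id)              -- a node may appear as a child only once
);
INSERT INTO nodes VALUES ('alpha'), ('bravo'), ('yank');
INSERT INTO edges VALUES ('alpha', 'bravo');
""")

def try_add_edge(conn, parent_id, child_id):
    """Insert an edge; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO edges (parent_id, child_id) VALUES (?, ?)",
                     (parent_id, child_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_add_edge(conn, "yank", "bravo"))  # False: bravo already has a parent
print(try_add_edge(conn, "bravo", "yank"))  # True: yank had no parent yet
```

The constraint guarantees at most one parent per node, but on its own it does not rule out cycles: after the calls above, an edge from yank to alpha would still be accepted.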
Note that theoretically this can all be done with only one table by simply putting the parent_id in the nodes table
and then
CREATE VIEW v_edges as (SELECT id as child_id, parent_id FROM nodes)
but for the purpose of flexibility, and so that we can incorporate other graph structures into this
framework, I will use the common many-to-many relationship structure. This will ideally allow this research to be
expanded into other graph algorithms.
Let's start out with a sample data structure
-- readable names stand in for the uuid ids, and my_data for the data columns
INSERT INTO nodes (id, my_data) VALUES ('alpha', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('bravo', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('charly', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('berry', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('zeta', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('yank', 'my big data');
INSERT INTO edges (parent_id, child_id) VALUES ('alpha', 'bravo');
INSERT INTO edges (parent_id, child_id) VALUES ('alpha', 'berry');
INSERT INTO edges (parent_id, child_id) VALUES ('bravo', 'charly');
INSERT INTO edges (parent_id, child_id) VALUES ('yank', 'zeta');
-- rank0 Alpha Yank
-- rank1 Bravo Berry Zeta
-- rank2 Charly
Note the interesting property of a tree: (number of edges e) = (number of nodes n) - 1, because
each child has exactly one parent. (The sample above is a forest of two trees, so it has 6 - 2 = 4 edges.)
We can then simplify the equations
let n = node
let p = parent
let c = child
let ns = nodes = groups
let es = edges = group_group // because this is a relationship of a group entity to another group entity
So now, what sort of questions will we ask?
"Given an arbitrary set of groups 's', what is the coverage of the graph, assuming nodes inherit their children?"
This is a tricky question; it requires us to traverse the graph and find all children of each node in s.
This continues off of this Stack Overflow post:
-- some DBMS (e.g. Postgres) require the word "recursive"
-- some others (Oracle, SQL-Server) require omitting the "recursive"
-- and some (e.g. SQLite) don't bother, i.e. they accept both
-- drop view v_group_descendant;
create view v_group_descendant as
with recursive descendants -- name for accumulating table
(parent_id, descendant_id, lvl) -- output columns
as
( select parent_id, child_id, 1
from group_group -- starting point, we start with each base group
union all
select d.parent_id, s.child_id, d.lvl + 1
from descendants d -- get the n-1 th level of descendants/ children
join group_group s -- and join it to find the nth level
on d.descendant_id = s.parent_id -- the trick is that the output of this query becomes the input
-- the recursion stops when an iteration produces no new rows
)
select * from descendants;
comment on view v_group_descendant is 'This aggregates the children of each group RECURSIVELY WOO ALL THE WAY DOWN THE TREE :)';
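To check what the view produces, the sketch below recreates it in SQLite through Python's sqlite3 module, using the readable sample names as ids instead of uuids (an assumption made to keep the output legible).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_group (parent_id TEXT NOT NULL, child_id TEXT NOT NULL UNIQUE);
INSERT INTO group_group VALUES
  ('alpha', 'bravo'), ('alpha', 'berry'), ('bravo', 'charly'), ('yank', 'zeta');

CREATE VIEW v_group_descendant AS
WITH RECURSIVE descendants(parent_id, descendant_id, lvl) AS (
    SELECT parent_id, child_id, 1 FROM group_group       -- direct children
    UNION ALL
    SELECT d.parent_id, s.child_id, d.lvl + 1            -- one level further down
    FROM descendants d
    JOIN group_group s ON d.descendant_id = s.parent_id
)
SELECT * FROM descendants;
""")

rows = conn.execute("""
SELECT parent_id, descendant_id, lvl
FROM v_group_descendant
ORDER BY parent_id, lvl, descendant_id
""").fetchall()

for row in rows:
    print(row)
```

With four edges the view yields five rows: the four direct parent/child pairs at lvl 1 plus (alpha, charly) at lvl 2.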
After we have this view we can join it with our nodes/groups to get our data back. I will not provide these samples for every single step; for the most part we will just work with ids.
select d.*, g1.group_name as parent, g2.group_name as decendent --then we join it with groups to add names
from v_group_descendant d, groups g1, groups g2
WHERE g1.id = d.parent_id and g2.id = d.descendant_id
order by parent_id, lvl, descendant_id;
sample output
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
+------------------------------------+------------------------------------+---+----------+---------+
Note that this is just the minimal parent-descendant relationship, and it has actually lost all nodes that have no parent, such as alpha: they never appear as a descendant.
In order to resolve this we need to add back all nodes which don't appear in the descendants list.
create view v_group_descendant_all as (
select * from v_group_descendant gd
UNION ALL
select null::uuid as parent_id,id as descendant_id, 0 as lvl from groups g
where not exists (select * from v_group_descendant gd where gd.descendant_id = g.id )
);
comment on view v_group_descendant_all is 'complete list of descendants including rank 0 root nodes descendant - parent relationship is duplicated for all levels / ranks';
preview
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
|null |c1529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |alpha |
|null |42529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |yank |
+------------------------------------+------------------------------------+---+----------+---------+
Let's say, for example, we are getting our set s of groups based on a users(id, data) table with a user_group(user_id, group_id) relation.
We can then join this to the descendants view, removing duplicates, because our set s of user_group relations may cause
duplicates if a user is, say, assigned to both alpha and charly.
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | charly |
| kier | yank |
| kier | bravo |
+------+--------+
--drop view v_user_group_recursive;
CREATE VIEW v_user_group_recursive AS (
SELECT DISTINCT dd.descendant_id AS group_id, ug.user_id
FROM v_group_descendant_all dd , user_group ug
WHERE (ug.group_id = dd.descendant_id
OR ug.group_id = dd.parent_id) -- also pick up every descendant of an assigned group
);
SELECT * FROM v_user_group_recursive;
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | bravo |
| jane | berry |
| jane | charly |
-- | jane | charly | Removed by DISTINCT
| kier | yank |
| kier | zeta |
| kier | bravo |
| kier | charly |
+------+--------+
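The whole chain of views can be replayed in SQLite to confirm the table above. This is a sketch under assumptions: Python's sqlite3 module drives it, readable names replace the uuids, the groups table is reduced to an id column, and null::uuid becomes a plain NULL because SQLite has no uuid type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "groups" (id TEXT PRIMARY KEY);
CREATE TABLE group_group (parent_id TEXT NOT NULL, child_id TEXT NOT NULL UNIQUE);
CREATE TABLE user_group (user_id TEXT NOT NULL, group_id TEXT NOT NULL);
INSERT INTO "groups" VALUES ('alpha'), ('bravo'), ('charly'), ('berry'), ('zeta'), ('yank');
INSERT INTO group_group VALUES
  ('alpha', 'bravo'), ('alpha', 'berry'), ('bravo', 'charly'), ('yank', 'zeta');
INSERT INTO user_group VALUES
  ('jane', 'alpha'), ('jane', 'charly'), ('kier', 'yank'), ('kier', 'bravo');

CREATE VIEW v_group_descendant AS
WITH RECURSIVE descendants(parent_id, descendant_id, lvl) AS (
    SELECT parent_id, child_id, 1 FROM group_group
    UNION ALL
    SELECT d.parent_id, s.child_id, d.lvl + 1
    FROM descendants d JOIN group_group s ON d.descendant_id = s.parent_id
)
SELECT * FROM descendants;

-- add back the groups that never appear as a descendant (the roots)
CREATE VIEW v_group_descendant_all AS
SELECT parent_id, descendant_id, lvl FROM v_group_descendant
UNION ALL
SELECT NULL, g.id, 0 FROM "groups" g
WHERE NOT EXISTS (SELECT 1 FROM v_group_descendant gd WHERE gd.descendant_id = g.id);

-- a user gets each assigned group plus everything below it
CREATE VIEW v_user_group_recursive AS
SELECT DISTINCT dd.descendant_id AS group_id, ug.user_id
FROM v_group_descendant_all dd
JOIN user_group ug
  ON ug.group_id = dd.descendant_id OR ug.group_id = dd.parent_id;
""")

inherited = {}
for user_id, group_id in conn.execute(
    "SELECT user_id, group_id FROM v_user_group_recursive ORDER BY user_id, group_id"
):
    inherited.setdefault(user_id, []).append(group_id)

print(inherited)
```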
If we want, we can now group by user and join in the group details; we can do something like the following:
CREATE VIEW v_user_groups_recursive AS (
SELECT user_id, json_agg(json_build_object('id', id,'parent_id',parent_id, 'group_name', group_name, 'org_id', org_id, 'json', json, 'remarks', remarks)) as groups
FROM v_user_group_recursive ug, v_groups_parent g
WHERE ug.group_id = g.id GROUP BY user_id
);
comment on view v_user_groups_recursive is 'This aggregates the groups for each user recursively ';
+------+-------------------------------+
| user | groups |
+------+-------------------------------+
| jane | [alpha, bravo, berry, charly] |
| kier | [yank, zeta, bravo, charly] |
+------+-------------------------------+
This is awesome: we have answered the question. We can now simply ask which groups this user inherits:
SELECT * from v_user_groups_recursive where user_id = 'kier';
Displaying our hard work in the front end
And further, we could use something like jstree.com to display
our structure
async function getProjectTree(user_id) {
let res = await table.query(format('SELECT * from v_user_groups_recursive ug WHERE ug.user_id = %L', user_id));
if (res.success) {
let rows = res.data[0].groups.map(r => {
return {
id: r.id, // required
parent: r.parent_id==null?'#':r.parent_id,// required
text: r.group_name,// node text
icon: 'P', // string for custom
state: {
opened: true, // is the node open
disabled: false, // is the node disabled
selected: false, // is the node selected
},
li_attr: {}, // attributes for the generated LI node
a_attr: {} // attributes for the generated A node
}
})
return {success: true, data: rows, msg: 'Got all projects'}
} else return res;
}
<div id="v_project_tree" class="row col-10 mx-auto" style="height: 25vh"></div>
<script>
function buildTree() {
bs.sendJson('get', "/api/projects/getProjectTree").then(res => {
bs.resNotify(res);
if (!res.success) {
//:(
console.error(':(');
return
}
console.log(res.data);
$('#v_project_tree').jstree({
'core': {
'data': res.data
}
});
})
}
window.addEventListener('load', buildTree);
</script>
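The row-to-node mapping in getProjectTree is independent of the server language. As a sketch, the same shaping step in Python looks like this; the function name to_jstree_nodes and the sample rows are assumptions, and the keys follow jsTree's flat JSON format, where "#" as the parent marks a root node.

```python
import json

def to_jstree_nodes(groups):
    """Map rows with id / parent_id / group_name keys to jsTree node dicts."""
    return [
        {
            "id": g["id"],
            # jsTree expects '#' as the parent of a root node
            "parent": "#" if g["parent_id"] is None else g["parent_id"],
            "text": g["group_name"],
            "state": {"opened": True, "disabled": False, "selected": False},
        }
        for g in groups
    ]

# hypothetical rows, shaped like the entries of the groups JSON array
sample_groups = [
    {"id": "yank", "parent_id": None, "group_name": "yank"},
    {"id": "zeta", "parent_id": "yank", "group_name": "zeta"},
]

nodes = to_jstree_nodes(sample_groups)
print(json.dumps(nodes, indent=2))
```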
The best way, I think, is indeed to give each node an id and a parent_id, where the parent_id is the id of the parent node. This has a couple of benefits:
When you want to update a node, you only have to rewrite the data of that node.
When you want to query only a certain node, you can get exactly the information you want, and so have less overhead on the database connection.
A lot of programming languages have functionality to transform MySQL data into XML or JSON, which will make it easier to open up your application using an API.
Something like table "nodes" where each node row contains parent id (in addition to the ordinary node data). For root, the parent is NULL.
Of course, this makes finding children a bit more time consuming, but this way the actual database will be quite simple.