Alternative to the HIERARCHY_ANCESTORS function in SAP HANA - sql

I wrote a piece of code in HANA that uses the HIERARCHY_ANCESTORS function. However, it started giving me all sorts of OOM (Out of Memory) errors. I raised an OSS message and provided all the details, only to learn that it was a previously unidentified bug in the standard SAP function, which will be fixed in the next release.
I do not want to dwell on the HIERARCHY_ANCESTORS function, as I already lost a month communicating through the OSS.
What I need is an alternative that does not involve any loops. Basically, I need the ancestors of all the leaf nodes (HIERARCHY_TREE_SIZE = 1) identified by the HIERARCHY function. There can be around 35k leaf nodes.
The data size is over 80k records. I tried looping over it earlier, and that severely degrades performance, timing out after a certain point. I need it to finish in less than 30s, as the HIERARCHY_ANCESTORS function would.
I can perhaps create a recursive function to fetch all the ancestors of 1 leaf ID. But how would I use it inside a SQL query, so that the same function can then fetch the ancestors of all the requisite IDs?
Any help is appreciated from HANA POV.
Thank you.

Consider the following hierarchy
The leaves and their ancestors are:
| NODE_ID | ANCESTOR |
| ------- | -------- |
| A010 | A |
| A10 | A |
| B | B |
| C0 | C |
To get there, you can use another hierarchy function to produce the same behavior.
create COLUMN TABLE TSTHIERARCHY(
parent_id nvarchar(32),
node_id nvarchar(32),
val integer
);
INSERT INTO TSTHIERARCHY VALUES (NULL,'A',1);
INSERT INTO TSTHIERARCHY VALUES ('A','A0',2);
INSERT INTO TSTHIERARCHY VALUES ('A','A1',2);
INSERT INTO TSTHIERARCHY VALUES ('A0','A01',3);
INSERT INTO TSTHIERARCHY VALUES ('A01','A010',4);
INSERT INTO TSTHIERARCHY VALUES ('A1','A10',3);
INSERT INTO TSTHIERARCHY VALUES (NULL,'B',1);
INSERT INTO TSTHIERARCHY VALUES (NULL,'C',1);
INSERT INTO TSTHIERARCHY VALUES ('C','C0',2);
WITH t1 AS (SELECT * FROM HIERARCHY( SOURCE TSTHIERARCHY ))
SELECT leaves.NODE_ID, ancestors.NODE_ID AS ANCESTOR
FROM (
select NODE_ID, HIERARCHY_ROOT_RANK
FROM t1 WHERE HIERARCHY_RANK NOT IN (SELECT DISTINCT HIERARCHY_PARENT_RANK FROM t1)
) leaves
INNER join
( SELECT NODE_ID, HIERARCHY_RANK
FROM t1 WHERE HIERARCHY_RANK = HIERARCHY_ROOT_RANK
) ancestors
on (leaves.HIERARCHY_ROOT_RANK=ancestors.HIERARCHY_RANK)
Does that work ?
A simpler query exists to display the ancestor for all hierarchy members, not just leaf nodes:
SELECT NODE_ID,HIERARCHY_RANK,
FIRST_VALUE(NODE_ID) OVER (PARTITION BY HIERARCHY_ROOT_RANK ORDER BY hierarchy_level) AS ancestors
FROM HIERARCHY( SOURCE TSTHIERARCHY )
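If every ancestor of each leaf is needed rather than only the root, one set-based option is a recursive query that walks the parent links for all leaves at once. The sketch below is only an illustration under stated assumptions: it runs the TSTHIERARCHY sample data through Python's sqlite3 module and SQLite's WITH RECURSIVE, and the function name leaf_ancestors is made up for this example. Whether the same statement is accepted by a given HANA release has not been verified here.

```python
import sqlite3

# Sample rows from the TSTHIERARCHY table above: (parent_id, node_id, val).
ROWS = [
    (None, "A", 1), ("A", "A0", 2), ("A", "A1", 2), ("A0", "A01", 3),
    ("A01", "A010", 4), ("A1", "A10", 3), (None, "B", 1), (None, "C", 1),
    ("C", "C0", 2),
]

def leaf_ancestors(rows):
    """Return (leaf, ancestor) pairs for every leaf, without a procedural loop."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tsthierarchy (parent_id TEXT, node_id TEXT, val INTEGER)"
    )
    conn.executemany("INSERT INTO tsthierarchy VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        WITH RECURSIVE up(leaf_id, ancestor_id) AS (
            -- start: each leaf (a node that is nobody's parent) and its parent
            SELECT t.node_id, t.parent_id
            FROM tsthierarchy AS t
            WHERE t.parent_id IS NOT NULL
              AND NOT EXISTS (SELECT 1 FROM tsthierarchy AS c
                              WHERE c.parent_id = t.node_id)
            UNION ALL
            -- step: move one level up from the ancestor found so far
            SELECT up.leaf_id, t.parent_id
            FROM up
            JOIN tsthierarchy AS t ON t.node_id = up.ancestor_id
            WHERE t.parent_id IS NOT NULL
        )
        SELECT leaf_id, ancestor_id FROM up ORDER BY leaf_id, ancestor_id
    """).fetchall()
    conn.close()
    return result

pairs = leaf_ancestors(ROWS)
```

Note that a root that is also a leaf (B in the sample) has no ancestors and therefore produces no rows in this sketch.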

Related

How to find and group together records with leading zeros and their counter part (duplicate) record that has had the leading zeros removed in a DB

I have a DB that has correctly formatted records with leading zeros. However, the same records, except with the leading zeros stripped (thanks excel) got added to the DB also. In essence creating duplicate records that are following the wrong numbering convention. So, the DB has records with correct ID's such as...
01234,
01122,
01323,
but also with incorrect ID numbers like
1234,
1122,
1323,
I am trying to do a query that will return a result set grouping these duplicate records in the DB like this:
01234,
1234,
01122,
1122,
01323,
1323,
Any thoughts are much appreciated.
I used CROSS APPLY to re-query the table.
The query in the CROSS APPLY looks for "matches" where the values would be equal if they were both CAST as integers, but are not equal as string values. Then, to clean up the result set a little, I limited it to rows where the matched value doesn't start with a zero character. Otherwise, the query was pulling the matches in both directions, which didn't seem useful.
Data set up:
DECLARE @t TABLE
(
idCol VARCHAR(10) NOT NULL
);
INSERT @t
(
idCol
)
VALUES
('01234')
,('01122')
,('01321')
,('1234')
,('1122')
,('1321')
,('00012');
The query:
SELECT
t.idCol
,c.idCol
FROM
@t AS t
CROSS APPLY
(
SELECT
idCol
FROM
@t
WHERE
idCol = CAST(t.idCol AS INT)
AND idCol <> t.idCol
AND LEFT(idCol, 1) <> '0'
) AS c;
Results:
+-------+-------+
| idCol | idCol |
+-------+-------+
| 01234 | 1234 |
| 01122 | 1122 |
| 01321 | 1321 |
+-------+-------+
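The same matching idea can be tried outside SQL Server with a plain self-join. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of T-SQL; the function name find_stripped_duplicates and the aliases padded and stripped are illustrative, not part of the original answer.

```python
import sqlite3

def find_stripped_duplicates(ids):
    """Pair each zero-padded id with its copy that lost the leading zeros."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (idCol TEXT NOT NULL)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in ids])
    rows = conn.execute("""
        SELECT padded.idCol, stripped.idCol
        FROM t AS padded
        JOIN t AS stripped
          -- equal as integers, different as strings
          ON CAST(padded.idCol AS INTEGER) = CAST(stripped.idCol AS INTEGER)
         AND padded.idCol <> stripped.idCol
        -- keep one direction only: the match must not start with a zero
        WHERE stripped.idCol NOT LIKE '0%'
        ORDER BY padded.idCol
    """).fetchall()
    conn.close()
    return rows

matches = find_stripped_duplicates(
    ["01234", "01122", "01321", "1234", "1122", "1321", "00012"]
)
```

The value 00012 has no stripped counterpart in the sample data, so it does not appear in the result.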

SQL Multiple Row Insert w/ multiple selects from different tables

I am trying to do a multiple insert based on values that I am pulling from a another table. Basically I need to give all existing users access to a service that previously had access to a different one. Table1 will take the data and run a job to do this.
INSERT INTO Table1 (id, serv_id, clnt_alias_id, serv_cat_rqst_stat)
SELECT
(SELECT Max(id) + 1
FROM Table1 ),
'33', --The new service id
clnt_alias_id,
'PI' --The code to let the job know to grant access
FROM TABLE2
WHERE serv_id = '11' --The old service id
I am getting a Primary key constraint error on id.
Please help.
Thanks,
Colin
This query is impossible. The max(id) sub-select will evaluate only ONCE and return the same value for all rows in the parent query:
MariaDB [test]> create table foo (x int);
MariaDB [test]> insert into foo values (1), (2), (3);
MariaDB [test]> select *, (select max(x)+1 from foo) from foo;
+------+----------------------------+
| x | (select max(x)+1 from foo) |
+------+----------------------------+
| 1 | 4 |
| 2 | 4 |
| 3 | 4 |
+------+----------------------------+
3 rows in set (0.04 sec)
You will have to run your query multiple times, once for each record you're trying to copy. That way the max(id) will get the ID from the previous query.
Is there a requirement that Table1.id be incremental ints? If not, just add the clnt_alias_id to Max(id). This is a nasty workaround though, and you should really try to get that column's type changed to auto_increment, like Marc B suggested.
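If the id column cannot be changed to an auto-generated one, another option is to number the copied rows inside a single statement, so that each row gets a distinct offset from the current maximum. This is a rough sketch under assumptions: it uses SQLite (3.25 or later, for window functions) via Python's sqlite3, the table layouts and sample rows are invented for the demo, and it is not safe against concurrent writers without extra locking.

```python
import sqlite3

def copy_service_access(conn, old_serv_id, new_serv_id):
    """Insert one Table1 row per matching Table2 row, each with a unique id."""
    conn.execute("""
        INSERT INTO table1 (id, serv_id, clnt_alias_id, serv_cat_rqst_stat)
        SELECT (SELECT COALESCE(MAX(id), 0) FROM table1)
                   + ROW_NUMBER() OVER (ORDER BY clnt_alias_id),
               ?, clnt_alias_id, 'PI'
        FROM table2
        WHERE serv_id = ?
    """, (new_serv_id, old_serv_id))
    conn.commit()

def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical layouts; the question does not show the real definitions.
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, serv_id TEXT, "
                 "clnt_alias_id TEXT, serv_cat_rqst_stat TEXT)")
    conn.execute("CREATE TABLE table2 (serv_id TEXT, clnt_alias_id TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)",
                     [(1, "11", "a1", "OK"), (2, "11", "a2", "OK"),
                      (3, "22", "a3", "OK")])
    conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                     [("11", "a1"), ("11", "a2"), ("22", "a3")])
    copy_service_access(conn, "11", "33")
    rows = conn.execute(
        "SELECT * FROM table1 WHERE serv_id = '33' ORDER BY id").fetchall()
    conn.close()
    return rows

new_rows = demo()
```

The MAX(id) sub-select still evaluates once, as described above, but the row number added to it differs per row, so the primary key no longer collides.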

Placing different rows in succession

I started working with Access around a month ago and I'm making a tool for preventive medicine so they can use a digital version of their current paper form.
While the program is nearly finished, the doctor who requested it now wants to export to Excel (the easy part) all the data for a patient, his treatment, and all the medicines used during that treatment in a single line (the problem).
I've been beating my head over that for two days, trying and researching on Google, but all I could find was how to put the values from a column into a single cell, and that's not how it has to be displayed.
So far, my best attempt (which is far from a good one) has been something like that:
CREATE TABLE Patient
(`SIP` int, `name` varchar(10));
INSERT INTO Patient
(`SIP`, `name`)
VALUES
(70,'John');
-- A patient can have multiple treatments
CREATE TABLE Treatment
(`id` int, `SIPFK` int);
INSERT INTO Treatment
(`id`,`SIPFK`)
VALUES
(1,70);
-- A treatment can have multiple medicines used while it's open
CREATE TABLE Medicine
(`Id` int, `Name` varchar(8), `TreatFK` int);
INSERT INTO Medicine
(`Id`, `Name`, `TreatFK`)
VALUES
(7, 'Apples', 1),
(7, 'Tomatoes', 1),
(7, 'Potatoes', 1),
(8, 'Banana', 2),
(8, 'Peach', 2);
-- The query
select c.id, c.Name, p.id as id2, p.Name as name2, r.id as id3, r.Name as name3
from Medicine as c, Medicine as p, Medicine as r
where c.id = 7 and p.id=7 and r.id=7;
The output I was trying to get was:
7 | Apples | 7 | Tomatoes | 7 | Potatoes
The table medicines will have more columns than that and i need to show every row related to a treatment in a single row along with the treatment.
But the values keep repeating themselves on different rows and the output on the subsequent columns besides the first ones is not as expected. Also GROUP BY won't solve the problem and DISTINCT doesn't work.
The output of the query is as follows: sqlfiddle.com
If any one could give me a hint, I would be grateful.
EDIT: Since access is a derp and won't let me use any good SQL fix nor will recognize DISTINCT to make the data from the queries not repeat themselves, I will try and search for a way to organize the rows directly in the exported excel.
Thank you all for your help, I'll save it cause I'm sure it'll save me hours of hands in the head.
This is a bit problematic, because MS Access does not support recursive CTEs and I don't see a way of doing that without ranking.
Hence, I have tried to reproduce the results by using a subquery which ranks the medicines
and stores these in a temporary table.
create table newtable
select c.id
, c.Name
,(SELECT COUNT(T1.Name) FROM Medicine AS T1 WHERE t1.id=c.id and T1.Name >= c.Name) AS Rank
from Medicine as c;
Afterwards, it is easy because my query is mostly based on Ranks and IDs.
select distinct id
,(select Name from newtable t2 where t1.id=t2.id and rank=1) as firstMed
,(select Name from newtable t2 where t1.id=t2.id and rank=2) as secMed
,(select Name from newtable t2 where t1.id=t2.id and rank=3) as ThirdMed
from newtable t1;
In my opinion, the self-join concept and the notion of recursive CTEs are the most important points for this particular example, and it would be good practice to do some research on them.
for reference: http://sqlfiddle.com/#!2/f80a9/2
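The rank-then-pivot idea can also be written as one query without the intermediate table, using conditional aggregation. The sketch below is an assumption-based illustration in SQLite via Python's sqlite3, not Access SQL (Access would need IIf instead of CASE, among other changes); it ranks names alphabetically within each id, and the function name medicines_per_row is made up.

```python
import sqlite3

def medicines_per_row(medicines):
    """Return one row per medicine id with up to three names side by side."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE medicine (id INTEGER, name TEXT, treat_fk INTEGER)")
    conn.executemany("INSERT INTO medicine VALUES (?, ?, ?)", medicines)
    rows = conn.execute("""
        SELECT id,
               MAX(CASE WHEN rnk = 1 THEN name END) AS first_med,
               MAX(CASE WHEN rnk = 2 THEN name END) AS second_med,
               MAX(CASE WHEN rnk = 3 THEN name END) AS third_med
        FROM (
            -- rank = number of names in the same group that sort at or before
            SELECT m.id, m.name,
                   (SELECT COUNT(*) FROM medicine AS x
                    WHERE x.id = m.id AND x.name <= m.name) AS rnk
            FROM medicine AS m
        )
        GROUP BY id
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

pivoted = medicines_per_row([
    (7, "Apples", 1), (7, "Tomatoes", 1), (7, "Potatoes", 1),
    (8, "Banana", 2), (8, "Peach", 2),
])
```

A fixed number of output columns is a limitation of this approach: a group with more than three medicines would need more CASE columns.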

Is it possible to use a PG sequence on a per record label?

Does PostgreSQL 9.2+ provide any functionality to make it possible to generate a sequence that is namespaced to a particular value? For example:
.. | user_id | seq_id | body | ...
----------------------------------
- | 4 | 1 | "abc...."
- | 4 | 2 | "def...."
- | 5 | 1 | "ghi...."
- | 5 | 2 | "xyz...."
- | 5 | 3 | "123...."
This would be useful to generate custom urls for the user:
domain.me/username_4/posts/1
domain.me/username_4/posts/2
domain.me/username_5/posts/1
domain.me/username_5/posts/2
domain.me/username_5/posts/3
I did not find anything in the PG docs (regarding sequence and sequence functions) to do this. Are sub-queries in the INSERT statement or with custom PG functions the only other options?
You can use a subquery in the INSERT statement like @Clodoaldo demonstrates. However, this defeats the nature of a sequence as being safe to use in concurrent transactions; it will result in race conditions and eventually duplicate key violations.
You should rather rethink your approach. Just one plain sequence for your table and combine it with user_id to get the sort order you want.
You can always generate the custom URLs with the desired numbers using row_number() with a simple query like:
SELECT format('domain.me/username_%s/posts/%s'
, user_id
, row_number() OVER (PARTITION BY user_id ORDER BY seq_id)
)
FROM tbl;
Maybe this answer is a little off-piste, but I would consider partitioning the data and giving each user their own partitioned table for posts.
There's a bit of overhead to the setup as you will need triggers for managing the DDL statements for the partitions, but would effectively result in each user having their own table of posts, along with their own sequence with the benefit of being able to treat all posts as one big table also.
General gist of the concept...
psql# CREATE TABLE posts (user_id integer, seq_id integer);
CREATE TABLE
psql# CREATE TABLE posts_001 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# CREATE TABLE posts_002 (seq_id serial) INHERITS (posts);
CREATE TABLE
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_001 VALUES (1);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# INSERT INTO posts_002 VALUES (2);
INSERT 0 1
psql# select * from posts;
user_id | seq_id
---------+--------
1 | 1
1 | 2
2 | 1
2 | 2
(4 rows)
I left out some rather important CHECK constraints in the above setup, make sure you read the docs for how these kinds of setups are used
insert into t (user_id, seq_id) values
(4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4))
Check for a duplicate primary key error in the front end and retry if needed.
Update
Although @Erwin's advice is sensible, that is, a single sequence with the ordering in the select query, it can be expensive.
If you don't use a sequence there is no defeat of the nature of the sequence. Also it will not result in a duplicate key violation. To demonstrate it I created a table and made a python script to insert into it. I launched 3 parallel instances of the script inserting as fast as possible. And it just works.
The table must have a primary key on those columns:
create table t (
user_id int,
seq_id int,
primary key (user_id, seq_id)
);
The python script:
#!/usr/bin/env python3
import psycopg2
import psycopg2.extensions

query = """
begin;
insert into t (user_id, seq_id) values
(4, (select coalesce(max(seq_id), 0) + 1 from t where user_id = 4));
commit;
"""
conn = psycopg2.connect('dbname=cpn user=cpn')
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)
cursor = conn.cursor()
for i in range(0, 1000):
    while True:
        try:
            cursor.execute(query)
            break
        except psycopg2.IntegrityError as e:
            # Another writer took this seq_id first: roll back and retry.
            print(e.pgerror)
            cursor.execute("rollback;")
cursor.close()
conn.close()
After the parallel run:
select count(*), max(seq_id) from t;
count | max
-------+------
3000 | 3000
Just as expected. I developed at least two applications using that logic and one of them is more than 13 years old and never failed. I concede that if you are Facebook or some other giant then you could have a problem.
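The insert-and-retry pattern can be wrapped in a small helper. The following is a minimal single-process sketch, assuming SQLite through Python's sqlite3 rather than PostgreSQL; the posts table, the insert_post function and the max_retries parameter are hypothetical names chosen for the illustration.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # The composite primary key is what turns a race into a detectable error.
    conn.execute("""
        CREATE TABLE posts (
            user_id INTEGER NOT NULL,
            seq_id  INTEGER NOT NULL,
            body    TEXT,
            PRIMARY KEY (user_id, seq_id)
        )
    """)
    return conn

def insert_post(conn, user_id, body, max_retries=5):
    """Insert a post with the next per-user seq_id; retry on a key clash."""
    for _ in range(max_retries):
        try:
            with conn:  # commits on success, rolls back on an exception
                conn.execute(
                    "INSERT INTO posts (user_id, seq_id, body) "
                    "SELECT ?, COALESCE(MAX(seq_id), 0) + 1, ? "
                    "FROM posts WHERE user_id = ?",
                    (user_id, body, user_id),
                )
                (seq_id,) = conn.execute(
                    "SELECT MAX(seq_id) FROM posts WHERE user_id = ?",
                    (user_id,),
                ).fetchone()
            return seq_id
        except sqlite3.IntegrityError:
            continue  # someone else took that seq_id; try again
    raise RuntimeError("could not allocate a seq_id")

conn = make_db()
allocated = [
    insert_post(conn, 4, "abc"),
    insert_post(conn, 4, "def"),
    insert_post(conn, 5, "ghi"),
]
```

With a single connection the retry branch is never taken; it only matters when several writers compete for the same user_id.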
Yes:
CREATE TABLE your_table
(
column type DEFAULT nextval('sequence_name'),
...
);
More details here:
http://www.postgresql.org/docs/9.2/static/ddl-default.html

How to represent a data tree in SQL?

I'm writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data.
I'm using a UI library to present the tree in a windows form where I can bind the tree to the TreeView.
I will need to save this tree and nodes in the DB.
What will be the best way to save the tree and to get the following features:
Intuitive implementation.
Easy binding. Will be easy to move from the tree to the DB structure and back (if any)
I had 2 ideas. The first is to serialize the data into a one liner in a table.
The second is to save in tables but then, when moving to data entities I will loose the row states on the table on changed nodes.
Any ideas?
I've bookmarked this SlideShare deck about SQL Antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed
The recommendation from there is to use a Closure Table (it's explained in the slides).
Here is the summary (slide 77):
| Design | Query Child | Query Subtree | Modify Tree | Ref. Integrity |
| ---------------- | ----------- | ------------- | ----------- | -------------- |
| Adjacency List | Easy | Hard | Easy | Yes |
| Path Enumeration | Easy | Easy | Hard | No |
| Nested Sets | Hard | Easy | Hard | No |
| Closure Table | Easy | Easy | Easy | Yes |
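For readers who have not seen a closure table, here is a minimal sketch of the idea, written from the general description of the pattern rather than taken from the slides. It assumes SQLite via Python's sqlite3; the table name closure, the depth column and the helper names are illustrative choices. Every node stores one row per ancestor (including itself), so subtree and ancestor lookups become plain, non-recursive selects.

```python
import sqlite3

def make_tree():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE closure (
            ancestor   TEXT NOT NULL,
            descendant TEXT NOT NULL,
            depth      INTEGER NOT NULL,
            PRIMARY KEY (ancestor, descendant)
        )
    """)
    return conn

def add_node(conn, node_id, parent_id=None):
    # Each node is its own ancestor at depth 0.
    conn.execute("INSERT INTO closure VALUES (?, ?, 0)", (node_id, node_id))
    if parent_id is not None:
        # Copy every ancestor of the parent, one level deeper.
        conn.execute(
            "INSERT INTO closure (ancestor, descendant, depth) "
            "SELECT ancestor, ?, depth + 1 FROM closure WHERE descendant = ?",
            (node_id, parent_id),
        )

def subtree(conn, node_id):
    rows = conn.execute(
        "SELECT descendant FROM closure WHERE ancestor = ? "
        "ORDER BY depth, descendant", (node_id,))
    return [r[0] for r in rows]

def ancestors(conn, node_id):
    rows = conn.execute(
        "SELECT ancestor FROM closure WHERE descendant = ? AND depth > 0 "
        "ORDER BY depth", (node_id,))
    return [r[0] for r in rows]

tree = make_tree()
add_node(tree, "A")
add_node(tree, "B", "A")
add_node(tree, "C", "B")
add_node(tree, "D", "A")
```

The price of the easy queries is storage: a node at depth d contributes d + 1 rows.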
The easiest implementation is adjacency list structure:
id parent_id data
However, some databases, particularly MySQL, have some issues in handling this model, because it requires an ability to run recursive queries which MySQL lacks.
Another model is nested sets:
id lft rgt data
where lft and rgt are arbitrary values that define the hierarchy (any child's lft, rgt should be within any parent's lft, rgt)
This does not require recursive queries, but it is slower and harder to maintain.
However, in MySQL this can be improved using SPATIAL abilities.
See these articles in my blog:
Adjacency list vs. nested sets: PostgreSQL
Adjacency list vs. nested sets: SQL Server
Adjacency list vs. nested sets: Oracle
Adjacency list vs. nested sets: MySQL
for more detailed explanations.
I'm surprised that nobody mentioned the materialized path solution, which is probably the fastest way of working with trees in standard SQL.
In this approach, every node in the tree has a column path, where the full path from the root to the node is stored. This involves very simple and fast queries.
Have a look at the example table node:
+---------+-------+
| node_id | path |
+---------+-------+
| 0 | |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1.4 |
| 5 | 2.5 |
| 6 | 2.6 |
| 7 | 2.6.7 |
| 8 | 2.6.8 |
| 9 | 2.6.9 |
+---------+-------+
In order to get the children of node x, you can write the following query:
SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')
Keep in mind that the path column should be indexed so that the LIKE clause performs well.
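To see the LIKE-based lookup in action, the example table can be loaded into an in-memory database. This is a small sketch assuming SQLite via Python's sqlite3, where string concatenation is written with || instead of CONCAT; the function name descendants and the index name are made up for the example.

```python
import sqlite3

# The example node table from above: (node_id, path).
NODES = [
    (0, ""), (1, "1"), (2, "2"), (3, "3"), (4, "1.4"),
    (5, "2.5"), (6, "2.6"), (7, "2.6.7"), (8, "2.6.8"), (9, "2.6.9"),
]

def descendants(node_id):
    """Return ids of all nodes whose path starts with '<path of node_id>.'"""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (node_id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
    conn.execute("CREATE INDEX node_path_idx ON node (path)")
    conn.executemany("INSERT INTO node VALUES (?, ?)", NODES)
    rows = conn.execute("""
        SELECT node_id
        FROM node
        WHERE path LIKE (SELECT path FROM node WHERE node_id = ?) || '.%'
        ORDER BY node_id
    """, (node_id,)).fetchall()
    conn.close()
    return [r[0] for r in rows]
```

Because the pattern matches any longer path with that prefix, the query returns the whole subtree below the node, not just its direct children. The trailing dot in the pattern keeps a path such as 2 from matching an unrelated path that merely starts with the same digit.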
If you are using PostgreSQL you can use ltree, a package in the contrib extension (comes by default) which implements the tree data structure.
From the docs:
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);
You can do queries like:
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
It depends on how you will be querying and updating the data. If you store all the data in one row, it's basically a single unit that you can't query into or partially update without rewriting all the data.
If you want to store each element as a row, you should first read Managing Hierarchical Data in MySQL (MySQL specific, but the advice holds for many other databases too).
If you're only ever accessing an entire tree, the adjacency list model makes it difficult to retrieve all nodes under the root without using a recursive query. If you add an extra column that links back to the head then you can do SELECT * WHERE head_id = @id and get the whole tree in one non-recursive query, but it denormalizes the database.
Some databases have custom extensions that make storing and retrieving hierarchical data easier, for example Oracle has CONNECT BY.
As this is the top answer when asking "sql trees" in a google search, I will try to update this from the perspective of today (december 2018).
Most answers imply that using an adjacency list is both simple and slow and therefore recommend other methods.
Since version 8 (published april 2018) MySQL supports recursive common table expressions (CTE). MySQL is a bit late to the show but this opens up a new option.
There is a tutorial here that explains the use of recursive queries to manage an adjacency list.
As the recursion now runs completely within the database engine, it is much faster than in the past (when it had to run in the script engine).
The blog here gives some measurements (which are both biased and for postgres instead of MySQL) but nevertheless it shows that adjacency lists do not have to be slow.
So my conclusion today is:
The simple adjacency list may be fast enough if the database engine supports recursion.
Do a benchmark with your own data and your own engine.
Do not trust outdated recommendations to point out the "best" method.
PGSQL Tree relations
Hello, I just got a handle on this for a project I'm working on and figured I'd share my write-up
Hope this helps. Let's get started with some prereqs
This is essentially the closure table solution mentioned above, using recursive calls. Thanks for those slides; they are very useful, and I wish I had seen them before this write-up :)
pre-requisites
Recursive Functions
These are functions that call themselves, e.g.:
function factorial(n) {
  if (n === 0) return 1; // base case
  return n * factorial(n - 1); // recursive call
}
This is pretty cool. Luckily pgsql has recursive functions too, but they can be a bit much; I prefer functional stuff.
cte with pgsql
WITH RECURSIVE t(n) AS (
VALUES (1) -- non-recursive term
UNION ALL
SELECT n+1 FROM t WHERE n < 100 -- recursive term
--continues until union adds nothing
)
SELECT sum(n) FROM t;
The general form of a recursive WITH query is always a non-recursive term, then UNION (or UNION ALL), then a recursive term, where only the recursive term can contain a reference to the query's own output. Such a query is executed as follows:
Recursive Query Evaluation
Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.
So long as the working table is not empty, repeat these steps:
a. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.
b. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.
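The evaluation steps above can be mimicked in a few lines of ordinary code, which may make the working-table idea easier to follow. This is only a simulation sketch in Python, not how PostgreSQL is implemented; the function name evaluate_recursive and its parameters are invented for the illustration, and rows are assumed to be hashable values.

```python
def evaluate_recursive(non_recursive_rows, recursive_term, union_all=True):
    """Simulate WITH RECURSIVE: iterate until the working table is empty."""
    seen = set()

    def keep(rows):
        # UNION ALL keeps everything; UNION drops rows already produced.
        if union_all:
            return list(rows)
        fresh = []
        for row in rows:
            if row not in seen:
                seen.add(row)
                fresh.append(row)
        return fresh

    # Step 1: evaluate the non-recursive term and seed the working table.
    working = keep(non_recursive_rows)
    result = list(working)
    # Step 2: apply the recursive term to the working table only.
    while working:
        intermediate = keep(recursive_term(working))
        result.extend(intermediate)
        working = intermediate  # the intermediate table becomes the new input
    return result

# Equivalent of: VALUES (1) UNION ALL SELECT n+1 FROM t WHERE n < 100
rows = evaluate_recursive([1], lambda working: [n + 1 for n in working if n < 100])
total = sum(rows)

# With UNION, a cyclic graph still terminates because repeats are discarded.
cycle = {1: [2], 2: [1]}
reachable = evaluate_recursive(
    [1], lambda working: [m for n in working for m in cycle[n]], union_all=False
)
```

The key point the simulation shows is that each pass sees only the rows added by the previous pass, so the loop ends as soon as a pass adds nothing.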
To do something like factorial in SQL, you need something more like this (from an SO post):
ALTER FUNCTION dbo.fnGetFactorial (@num int)
RETURNS INT
AS
BEGIN
DECLARE @n int
IF @num <= 1 SET @n = 1
ELSE SET @n = @num * dbo.fnGetFactorial(@num - 1)
RETURN @n
END
GO
Tree data structures (more of a forest :)
wikipedia
The important thing to note is that a tree is a subset of a graph. This can be enforced simply by
the rule that each node has only one parent.
Representing the Tree in PGSQL
I think it will be easiest to work it out a little more theoretically before we move on to the sql
The simple way of represent a graph relation without data duplication is by separating the nodes(id, data) from the edges.
We can then restrict the edges(parent_id, child_id) table to enforce our constraint, by mandating that the pair (parent_id, child_id)
as well as child_id on its own be unique.
create table nodes (
id uuid default uuid_generate_v4() not null unique ,
name varchar(255) not null,
json json default '{}'::json not null,
remarks varchar(255)
);
create table edges (
id uuid default uuid_generate_v4() not null,
parent_id uuid not null,
child_id uuid not null,
meta json default '{}'::json,
constraint group_group_id_key
primary key (id),
constraint group_group_unique_combo
unique (parent_id, child_id),
constraint group_group_unique_child
unique (child_id),
foreign key (parent_id) references nodes
on update cascade on delete cascade,
foreign key (child_id) references nodes
on update cascade on delete cascade
);
Note that theoretically this can all be done with only one table by simply putting the parent_id in the nodes table
and then
CREATE VIEW v_edges as (SELECT id as child_id, parent_id FROM nodes)
but for the purpose of flexibility, and so that we can incorporate other graph structures into this
framework I will use the common many-to-many relationship structure. This will ideally allow this research to be
expanded into other graph algorithms.
Let's start out with a sample data structure
INSERT INTO nodes (id, my_data) VALUES ('alpha', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('bravo', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('charly', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('berry', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('zeta', 'my big data');
INSERT INTO nodes (id, my_data) VALUES ('yank', 'my big data');
INSERT INTO edges (parent_id, child_id) VALUES ('alpha', 'bravo');
INSERT INTO edges (parent_id, child_id) VALUES ('alpha', 'berry');
INSERT INTO edges (parent_id, child_id) VALUES ('bravo', 'charly');
INSERT INTO edges (parent_id, child_id) VALUES ('yank', 'zeta');
-- rank0 Alpha Yank
-- rank1 Bravo Berry Zeta
-- rank2 Charly
Note the interesting properties of a tree (number of edges e) =( number of nodes n)-1
each child has exactly one parent.
We can then simplify the equations
let n = node
let p = parent
let c = child
let ns = nodes = groups
let es = edges = group_group // because this is a relationship of a group entity to another group entity
So now what sort of questions will we ask.
"Given an arbitrary set of groups 's' what is the coverage of the graph assuming nodes inherit their children?"
This is a tricky question, it requires us to traverse the graph and find all children of each node in s
This continues off of this stack overflow post
-- some DBMS (e.g. Postgres) require the word "recursive"
-- some others (Oracle, SQL-Server) require omitting the "recursive"
-- and some (e.g. SQLite) don't bother, i.e. they accept both
-- drop view v_group_descendant;
create view v_group_descendant as
with recursive descendants -- name for accumulating table
(parent_id, descendant_id, lvl) -- output columns
as
( select parent_id, child_id, 1
from group_group -- starting point, we start with each base group
union all
select d.parent_id, s.child_id, d.lvl + 1
from descendants d -- get the n-1 th level of descendants/ children
join group_group s -- and join it to find the nth level
on d.descendant_id = s.parent_id -- the trick is that the output of this query becomes the input
-- the recursion stops once this step returns no new rows
)
select * from descendants;
comment on view v_group_descendant is 'This aggregates the children of each group RECURSIVELY WOO ALL THE WAY DOWN THE TREE :)';
After we have this view, we can join it with our nodes/groups to get our data back. I will not provide these samples for every single step; for the most part we will just work with ids.
select d.*, g1.group_name as parent, g2.group_name as decendent --then we join it with groups to add names
from v_group_descendant d, groups g1, groups g2
WHERE g1.id = d.parent_id and g2.id = d.descendant_id
order by parent_id, lvl, descendant_id;
sample output
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
+------------------------------------+------------------------------------+---+----------+---------+
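To experiment with the descendant query without a PostgreSQL instance, the same recursive CTE can be run against the four sample edges. The sketch below assumes SQLite via Python's sqlite3 and uses the readable names (alpha, bravo, ...) as ids instead of UUIDs; the function name all_descendants is made up for the example.

```python
import sqlite3

# The sample edges from above: (parent_id, child_id).
EDGES = [
    ("alpha", "bravo"), ("alpha", "berry"),
    ("bravo", "charly"), ("yank", "zeta"),
]

def all_descendants(edges):
    """Return (parent_id, descendant_id, lvl) for every ancestor/descendant pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (parent_id TEXT, child_id TEXT UNIQUE)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    rows = conn.execute("""
        WITH RECURSIVE descendants(parent_id, descendant_id, lvl) AS (
            SELECT parent_id, child_id, 1 FROM edges     -- direct children
            UNION ALL
            SELECT d.parent_id, e.child_id, d.lvl + 1    -- one level further down
            FROM descendants AS d
            JOIN edges AS e ON d.descendant_id = e.parent_id
        )
        SELECT parent_id, descendant_id, lvl
        FROM descendants
        ORDER BY parent_id, lvl, descendant_id
    """).fetchall()
    conn.close()
    return rows

closure_rows = all_descendants(EDGES)
```

The UNIQUE constraint on child_id mirrors the one-parent rule from the edges table definition, which is also what guarantees the recursion cannot loop.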
Note that this is just the minimal node-descendant relationship, and it has actually lost every node that has no parent (the roots, such as alpha), because those never appear as a descendant_id.
In order to resolve this we need to add all nodes back which don't appear in the descendants list
create view v_group_descendant_all as (
select * from v_group_descendant gd
UNION ALL
select null::uuid as parent_id,id as descendant_id, 0 as lvl from groups g
where not exists (select * from v_group_descendant gd where gd.descendant_id = g.id )
);
comment on view v_group_descendant_all is 'complete list of descendants including rank 0 root nodes descendant - parent relationship is duplicated for all levels / ranks';
preview
+------------------------------------+------------------------------------+---+----------+---------+
|parent_id |descendant_id |lvl|parent |decendent|
+------------------------------------+------------------------------------+---+----------+---------+
|3ef7050f-2f90-444a-a20d-c5cbac91c978|6c758087-a158-43ff-92d6-9f922699f319|1 |bravo |charly |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|3ef7050f-2f90-444a-a20d-c5cbac91c978|1 |alpha |bravo |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|7135b0c6-d59c-4c27-9617-ddcf3bc79419|1 |alpha |berry |
|c1529e8a-75b0-4242-a51a-ac60a0e48868|6c758087-a158-43ff-92d6-9f922699f319|2 |alpha |charly |
|42529e8a-75b0-4242-a51a-ac60a0e48868|44758087-a158-43ff-92d6-9f922699f319|1 |yank |zeta |
|null |c1529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |alpha |
|null |42529e8a-75b0-4242-a51a-ac60a0e48868|0 |null |yank |
+------------------------------------+------------------------------------+---+----------+---------+
Let's say, for example, we are getting our set s of groups based on a users(id, data) table with a user_group(user_id, group_id) relation.
We can then join this to another table, removing duplicates, because our set s of user_group relations may cause
duplicates if a user is, say, assigned to both alpha and charly.
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | charly |
| kier | yank |
| kier | bravo |
+------+--------+
--drop view v_user_group_recursive;
CREATE VIEW v_user_group_recursive AS (
SELECT DISTINCT dd.descendant_id AS group_id, ug.user_id
FROM v_group_descendant_all dd , user_group ug
WHERE (ug.group_id = dd.descendant_id
OR ug.group_id = dd.parent_id) -- should gic
);
SELECT * FROM v_user_group_recursive;
+------+--------+
| user | group |
+------+--------+
| jane | alpha |
| jane | bravo |
| jane | berry |
| jane | charly |
-- | jane | charly | Removed by DISTINCT
| kier | yank |
| kier | zeta |
| kier | bravo |
| kier | charly |
+------+--------+
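The inherited memberships in the table above can be reproduced with a compact variant of the idea. Note that this sketch swaps in a different technique from the views used in this write-up: it recurses directly from the user_group rows and relies on UNION to drop duplicates. It assumes SQLite via Python's sqlite3, readable names instead of UUIDs, and a made-up function name inherited_groups.

```python
import sqlite3

EDGES = [
    ("alpha", "bravo"), ("alpha", "berry"),
    ("bravo", "charly"), ("yank", "zeta"),
]
USER_GROUPS = [
    ("jane", "alpha"), ("jane", "charly"),
    ("kier", "yank"), ("kier", "bravo"),
]

def inherited_groups(edges, user_groups):
    """Return every (user, group) pair, including groups inherited from parents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (parent_id TEXT, child_id TEXT)")
    conn.execute("CREATE TABLE user_group (user_id TEXT, group_id TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    conn.executemany("INSERT INTO user_group VALUES (?, ?)", user_groups)
    rows = conn.execute("""
        WITH RECURSIVE reach(user_id, group_id) AS (
            SELECT user_id, group_id FROM user_group   -- direct assignments
            UNION                                      -- UNION removes repeats
            SELECT r.user_id, e.child_id               -- children of reached groups
            FROM reach AS r
            JOIN edges AS e ON e.parent_id = r.group_id
        )
        SELECT user_id, group_id FROM reach ORDER BY user_id, group_id
    """).fetchall()
    conn.close()
    return rows

memberships = inherited_groups(EDGES, USER_GROUPS)
```

Jane reaches charly twice (directly and through alpha and bravo), and the UNION collapses that into a single row, matching the row marked as removed by DISTINCT in the table.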
If we want, we can now group and join, doing something like the following:
CREATE VIEW v_user_groups_recursive AS (
SELECT user_id, json_agg(json_build_object('id', id,'parent_id',parent_id, 'group_name', group_name, 'org_id', org_id, 'json', json, 'remarks', remarks)) as groups
FROM v_user_group_recursive ug, v_groups_parent g
WHERE ug.group_id = g.id GROUP BY user_id
);
comment on view v_user_group_recursive is 'This aggregates the groups for each user recursively ';
+------+-------------------------------+
| user | groups |
+------+-------------------------------+
| jane | [alpha, bravo, berry, charly] |
| kier | [yank, zeta, bravo, charly] |
+------+-------------------------------+
This is awesome: we have answered the question. We can now simply ask which groups this user inherits:
SELECT * from v_user_groups_recursive where user_id = 'kier';
Displaying our hard work in the front end
And further, we could use something like jstree.com to display
our structure
async function getProjectTree(user_id) {
let res = await table.query(format('SELECT * from v_user_groups_recursive ug WHERE ug.user_id = %L', user_id));
if (res.success) {
let rows = res.data[0].groups.map(r => {
return {
id: r.id, // required
parent: r.parent_id==null?'#':r.parent_id,// required
text: r.group_name,// node text
icon: 'P', // string for custom
state: {
opened: true, // is the node open
disabled: false, // is the node disabled
selected: false, // is the node selected
},
li_attr: {}, // attributes for the generated LI node
a_attr: {} // attributes for the generated A node
}
})
return {success: true, data: rows, msg: 'Got all projects'}
} else return res;
}
<div id="v_project_tree" class="row col-10 mx-auto" style="height: 25vh"></div>
<script>
function buildTree() {
bs.sendJson('get', "/api/projects/getProjectTree").then(res => {
bs.resNotify(res);
if (!res.success) {
//:(
console.error(':(');
return
}
console.log(res.data);
$('#v_project_tree').jstree({
'core': {
'data': res.data
}
});
})
}
window.addEventListener('load', buildTree);
</script>
The best way, I think, is to give each node an id and a parent_id, where the parent_id is the id of the parent node. This has a couple of benefits:
When you want to update a node, you only have to rewrite the data of that node.
When you want to query only a certain node, you can get exactly the information you want, thus having less overhead on the database connection
A lot of programming languages have functionality to transform mysql data into XML or json, which will make it easier to open up your application using an api.
Something like table "nodes" where each node row contains parent id (in addition to the ordinary node data). For root, the parent is NULL.
Of course, this makes finding children a bit more time consuming, but this way the actual database will be quite simple.