Finding Least Common Ancestor from a Transitive Closure Table - sql

I have a table representing the transitive closure of an organizational hierarchy (i.e., it's a tree with a single root):
create table ancestry (
ancestor integer,
descendant integer,
distance integer
);
I have another table that contains the organizations that each user is allowed to access:
create table accessible (
user integer,
organization integer
);
The system shows the user a roll-up of expenditures associated with each organization the user can access. I could always start by showing the user a view of the company (i.e., the root), with a list of immediate child organizations and how much the user's organizations contribute to the total. In most cases, there would be a single child, and the user would have to drill down several levels before seeing multiple children. I would prefer to start the presentation with the first organization that shows multiple children (i.e., the LCA).
For a given user, I can find the set of paths to the root easily enough, but I am having trouble finding the least common ancestor. I am using PostgreSQL 9.1 but would prefer a solution that is database agnostic. In the worst case, I can pull the paths to the root back into the application's code and calculate the LCA there.

I took a fresh look at this and developed the following solution. I used a common-table-expression to make it easier to understand how it operates but it could easily be written using a sub-query.
with
hit (id, count) as (
select
ancestry.ancestor
,count(ancestry.descendant)
from
accessible
inner join ancestry
on accessible.organization = ancestry.descendant
where
accessible.user = #user_id
group by
ancestry.ancestor
)
select
ancestry.descendant as lca
from
hit
inner join ancestry
on ancestry.descendant = hit.id
and ancestry.ancestor = #company_id
order by
hit.count desc
,ancestry.distance desc
limit 1
;
The hit CTE counts, for each organization in the hierarchy, the number of paths from a child to the root that traverse the organization. The LCA is then the organization with the most traversals. In the event of a tie, the organization farthest from the root (i.e., max(distance)) is the actual LCA. This is best illustrated with an example.
    A
    |
    B
   / \
  C   D
Assume we wish to find the LCA of nodes C and D in the tree above. The hit CTE produces the following counts:
Node Count
A 2
B 2
C 1
D 1
The main query adds the distance:
Node Count Distance
A 2 0
B 2 1
C 1 2
D 1 2
The main query then orders the results by descending count and descending distance:
Node Count Distance
B 2 1
A 2 0
C 1 2
D 1 2
The LCA is the first item in the list.
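The worked example above can be reproduced end to end. The following is a minimal sketch, assuming SQLite (via Python's sqlite3) in place of PostgreSQL, numeric ids A=1, B=2, C=3, D=4, hypothetical user ids 7 and 8, and a closure table that includes a distance-0 row for every node (the Distance column in the example suggests it does). The CTE column is renamed from count to cnt here only to avoid clashing with the function name.

```python
import sqlite3

# Same query shape as above, with ? placeholders for the user and company ids.
LCA_SQL = """
with hit (id, cnt) as (
    select ancestry.ancestor, count(ancestry.descendant)
    from accessible
    inner join ancestry on accessible.organization = ancestry.descendant
    where accessible.user = ?
    group by ancestry.ancestor
)
select ancestry.descendant as lca
from hit
inner join ancestry
    on ancestry.descendant = hit.id
   and ancestry.ancestor = ?
order by hit.cnt desc, ancestry.distance desc
limit 1
"""

def build_example():
    """Create the A-B-(C,D) tree from the example; A=1, B=2, C=3, D=4 (assumed ids)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table ancestry (ancestor integer, descendant integer, distance integer)")
    conn.execute("create table accessible (user integer, organization integer)")
    conn.executemany("insert into ancestry values (?, ?, ?)", [
        (1, 1, 0), (2, 2, 0), (3, 3, 0), (4, 4, 0),  # self rows
        (1, 2, 1), (1, 3, 2), (1, 4, 2),             # A's descendants
        (2, 3, 1), (2, 4, 1),                        # B's descendants
    ])
    # Hypothetical users: 7 may access C and D, 8 may access only C.
    conn.executemany("insert into accessible values (?, ?)", [(7, 3), (7, 4), (8, 3)])
    return conn

def find_lca(conn, user_id, company_id):
    """Return the id of the LCA of the user's organizations, or None."""
    row = conn.execute(LCA_SQL, (user_id, company_id)).fetchone()
    return row[0] if row else None
```

With this data, find_lca(build_example(), 7, 1) picks B, and a user with a single accessible organization gets that organization back as its own LCA.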

Just a hunch, and not database agnostic (SQL Server), but adaptable:
SELECT TOP 1
a1.ancestor
FROM ancestry a1
INNER JOIN
ancestry a2 ON a1.ancestor = a2.ancestor
WHERE a1.descendant = #Dec1
AND
a2.descendant = #Dec2
ORDER BY a1.distance ASC -- nearest common ancestor first
If you want to put some data in SQLFiddle, I can have a play with it.
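For the two-node case, the self-join idea can be tried without SQLFiddle. This is a rough sketch, assuming SQLite through Python's sqlite3 (so LIMIT instead of TOP), the ancestry table from the question with distance-0 self rows, and the ids A=1, B=2, C=3, D=4. Since distance is measured from the ancestor down to the descendant, the nearest common ancestor has the smallest distance, so the sketch sorts ascending.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table ancestry (ancestor integer, descendant integer, distance integer)")
conn.executemany("insert into ancestry values (?, ?, ?)", [
    (1, 1, 0), (2, 2, 0), (3, 3, 0), (4, 4, 0),  # self rows (assumed present)
    (1, 2, 1), (1, 3, 2), (1, 4, 2),             # A -> B, C, D
    (2, 3, 1), (2, 4, 1),                        # B -> C, D
])

def pair_lca(conn, node1, node2):
    """Return the closest ancestor shared by node1 and node2, or None."""
    row = conn.execute(
        """
        select a1.ancestor
        from ancestry a1
        inner join ancestry a2 on a1.ancestor = a2.ancestor
        where a1.descendant = ? and a2.descendant = ?
        order by a1.distance asc
        limit 1
        """,
        (node1, node2),
    ).fetchone()
    return row[0] if row else None
```

This handles exactly two nodes; the counting query in the other answer generalizes to any number of accessible organizations.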

Related

SQL Query - Count number of descendants and summarize with condition

Consider a table with columns:
Manager  Manager ID  Headcount  Superior  Superior ID
A        1           3          C         123
B        2           4          D         345
The table consists of a hierarchy within a company. Each manager has a superior.
The task is to calculate the headcount on each level of the hierarchy with the following condition:
assign the headcount from the lower branch ONLY if the person has at least 3 people directly reporting to him/her and the person on the lower branch does not meet this criterion.
This effectively assigns all the direct reports to the first person in the hierarchy who meets the criterion.
Visual aid to make things clearer:
I am blocked. I can create a hierarchy with a summary that tells me all the lower levels within each person's hierarchy and their respective headcounts. However, I fail to see how I can move further from there with SQL.
Code to create hierarchy:
cteHeadcount AS
(
SELECT
m.manager_id as step_id,
m.manager_id as id,
m.superior_id,
m.headcount,
m.eligible,
1 AS step
FROM LineMngrs as m
UNION ALL
SELECT
c.step_id,
m.manager_id as id,
c.superior_id,
m.headcount,
m.eligible,
c.step + 1 as step
FROM LineMngrs as m
INNER JOIN cteHeadcount as c ON c.id = m.superior_id
)
Thanks for all the help and suggestions!

Get list of dependent objects via SQL query or function

I have two tables. One is for Task and second is dependency table for the tasks.
I want a query to give me all the tasks (recursively) based on a particular id.
The first one is for tasks:
ID TASK
1 Abc
2 Def
3 Ghi
4 Jkl
5 Mno
6 Pqr
The second one is for getting dependent tasks
ID DEPENDENT_ON
2 1
3 1
4 2
4 6
5 2
6 5
Is it possible to write a SQL query to get a list of all the tasks (recursively) that depend on a particular task?
Example.
I want to check all tasks dependent on ID=1.
Expected output (which is 2 and 3):
2.Def
3.Ghi
Furthermore, the query should also return the tasks that depend on these two tasks, and so on.
Final output should be:
2.Def -- level one
3.Ghi -- level one
4.Jkl -- Dependent on task 2
5.Mno -- Dependent on task 2
6.Pqr -- Dependent on task 5
Formatting is not important. Just output is required
I need to join two tables and then do a recursive search.
You must OUTER JOIN the second table (which you didn't name, so I have called it TASK_TREE) through DEPENDENT_ON to the parent ID. Outer join because task 1 is the top of the tree and depends on no task. Then use Oracle's hierarchical query syntax to walk the tree:
select t.id, t.task, tt.dependent_on, level
from tasks t
left outer join task_tree tt on tt.id = t.id
connect by prior t.id = tt.dependent_on
start with t.id = 1
/
I have included the level so you can see how the tree unfurls. The Oracle SQL documentation covers hierarchical queries in depth. If you don't want to use Oracle's proprietary hierarchical syntax, Oracle has supported the recursive WITH clause since 11gR2.
Incidentally, your posted data contains an error. Task 4 depends on both 2 and 6. In a hierarchy, each child node must depend on a single parent node; otherwise you'll get all sorts of weird results.
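The recursive WITH alternative can be sketched outside Oracle as well. The following is an illustration only, assuming SQLite through Python's sqlite3, the table names tasks and task_tree used in the answer, and the data exactly as posted in the question. Because task 4 has two parents it is reached twice, so the sketch keeps the shallowest level per task; it also assumes the dependency data contains no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tasks (id integer primary key, task text);
create table task_tree (id integer, dependent_on integer);
insert into tasks values (1,'Abc'),(2,'Def'),(3,'Ghi'),(4,'Jkl'),(5,'Mno'),(6,'Pqr');
insert into task_tree values (2,1),(3,1),(4,2),(4,6),(5,2),(6,5);
""")

def dependents(conn, root_id):
    """Return (id, task, first_level) for every task that depends on root_id,
    directly or indirectly. Assumes the dependency graph has no cycles."""
    return conn.execute(
        """
        with recursive dep (id, lvl) as (
            select id, 1 from task_tree where dependent_on = ?
            union
            select tt.id, dep.lvl + 1
            from task_tree tt
            inner join dep on tt.dependent_on = dep.id
        )
        select t.id, t.task, min(dep.lvl) as first_level
        from dep
        inner join tasks t on t.id = dep.id
        group by t.id, t.task
        order by first_level, t.id
        """,
        (root_id,),
    ).fetchall()
```

Calling dependents(conn, 1) yields tasks 2 and 3 at level 1, tasks 4 and 5 at level 2, and task 6 at level 3.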

How to implement high performance tree view in SQL Server 2005

What is the best way to build the table that will represent the tree?
I want to implement select, insert, update and delete operations that will work well with big data.
The select, for example, will have to support "Expand ALL": getting all the children (and their children) for a given node.
Use CTEs.
Given the tree-like table structure:
id parent name
1 0 Electronics
2 1 TV
3 1 Hi-Fi
4 2 LCD
5 2 Plasma
6 3 Amplifiers
7 3 Speakers
, this query will return id, parent and depth level, ordered as a tree:
WITH v (id, parent, level) AS
(
SELECT id, parent, 1
FROM table
WHERE parent = 0
UNION ALL
SELECT t.id, t.parent, v.level + 1
FROM v
JOIN table t
ON t.parent = v.id
)
SELECT *
FROM v
id parent name
1 0 Electronics
2 1 TV
4 2 LCD
5 2 Plasma
3 1 Hi-Fi
6 3 Amplifiers
7 3 Speakers
Replace parent = 0 with parent = #parent to get only a branch of the tree.
Provided there's an index on table (parent), this query will work efficiently on a very large table, since it will recursively use an index lookup to find all children of each parent.
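To experiment with this query shape, here is a small sketch that assumes SQLite through Python's sqlite3 rather than SQL Server 2005, and a hypothetical table name tree (since table itself is a reserved word). A recursive CTE by itself does not guarantee tree order, so the sketch also builds a zero-padded path column, which is an addition of this example, and sorts by it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table tree (id integer primary key, parent integer, name text);
create index ix_tree_parent on tree (parent);
insert into tree values
    (1,0,'Electronics'),(2,1,'TV'),(3,1,'Hi-Fi'),(4,2,'LCD'),
    (5,2,'Plasma'),(6,3,'Amplifiers'),(7,3,'Speakers');
""")

def branch(conn, parent):
    """Return (id, parent, name, level) for all nodes below `parent`, in tree order."""
    return conn.execute(
        """
        with recursive v (id, parent, name, level, path) as (
            select id, parent, name, 1, printf('%04d', id)
            from tree
            where parent = ?
            union all
            select t.id, t.parent, t.name, v.level + 1,
                   v.path || '/' || printf('%04d', t.id)
            from v
            join tree t on t.parent = v.id
        )
        select id, parent, name, level from v order by path
        """,
        (parent,),
    ).fetchall()
```

branch(conn, 0) walks the whole tree in the order shown above (1, 2, 4, 5, 3, 6, 7), while branch(conn, 2) returns only LCD and Plasma. The fixed-width padding limits ids to four digits in this sketch.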
To update a certain branch, issue:
WITH v (id, parent, level) AS
(
SELECT id, parent, 1
FROM table
WHERE parent = #parent
UNION ALL
SELECT t.id, t.parent, v.level + 1
FROM v
JOIN table t
ON t.parent = v.id
)
UPDATE table t
SET column = newvalue
WHERE t.id IN
(
SELECT id
FROM v
)
where #parent is the root of the branch.
You have to ask yourself these questions first:
1) What is the ratio of modifications to reads? (Is the tree mostly static or constantly changing?)
2) How deep and how large do you expect the tree to grow?
Nested sets are great for mostly static trees where you need operations on whole branches. They handle deep trees without problems.
Materialized path works well for dynamic (changing) trees with constrained/predictable depth.
Recursive CTEs are ideal for very small trees, but the branch operations ("get all children in this branch...") get very costly on deep or large trees.
Check out Joe Celko's book on trees and hierarchies for multiple ways to tackle the hierarchy problem. The model that you choose will depend on how you weight lookups vs. updates vs. complexity. You can make the lookups pretty fast (especially for getting all children in a node) using the adjacency list model, but updates to the tree are slower.
If you have many updates and selects, the best option seems to be the Path Enumeration Model, which is briefly described here:
http://www.sqlteam.com/article/more-trees-hierarchies-in-sql
I'm surprised no one has mentioned going with a Closure Table. Very efficient for reads and pretty simple to write.
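As a rough illustration of the closure table idea, the sketch below assumes SQLite through Python's sqlite3 and a hypothetical table closure (ancestor, descendant, distance) that stores one row per ancestor/descendant pair plus a distance-0 self row. The helper names are made up for this example. Inserting a leaf copies the parent's ancestor rows, and reads become single non-recursive selects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table closure (ancestor integer, descendant integer, distance integer)")

def add_node(conn, node_id, parent_id=None):
    """Insert a leaf: a self row, plus one row per ancestor of the parent."""
    conn.execute("insert into closure values (?, ?, 0)", (node_id, node_id))
    if parent_id is not None:
        conn.execute(
            """
            insert into closure (ancestor, descendant, distance)
            select ancestor, ?, distance + 1
            from closure
            where descendant = ?
            """,
            (node_id, parent_id),
        )

def descendants(conn, node_id):
    """All nodes below node_id ("Expand ALL"), without recursion."""
    rows = conn.execute(
        "select descendant from closure where ancestor = ? and distance > 0 "
        "order by descendant",
        (node_id,),
    )
    return [r[0] for r in rows]

def ancestors(conn, node_id):
    """Path from the root down to the parent of node_id."""
    rows = conn.execute(
        "select ancestor from closure where descendant = ? and distance > 0 "
        "order by distance desc",
        (node_id,),
    )
    return [r[0] for r in rows]

# The Electronics tree from the earlier answer: (id, parent).
for node, parent in [(1, None), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 3)]:
    add_node(conn, node, parent)
```

The trade-off is storage: the table holds one row per ancestor/descendant pair, and moving a subtree requires rewriting the rows that cross the moved edge.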

Finding breadcrumbs for nested sets

I'm using nested sets (aka modified preorder tree traversal) to store a list of groups, and I'm trying to find a quick way to generate breadcrumbs (as a string, not a table) for ALL of the groups at once. My data is also stored using the adjacency list model (there are triggers to keep the two in sync).
So for example:
ID Name ParentId Left Right
0 Node A 0 1 12
1 Node B 0 2 5
2 Node C 1 3 4
3 Node D 0 6 11
4 Node E 3 7 8
5 Node F 3 9 10
Which represents the tree:
Node A
    Node B
        Node C
    Node D
        Node E
        Node F
I would like to be able to have a user-defined function that returns a table:
ID Breadcrumb
0 Node A
1 Node A > Node B
2 Node A > Node B > Node C
3 Node A > Node D
4 Node A > Node D > Node E
5 Node A > Node D > Node F
To make this slightly more complicated (though it's sort of out of the scope of the question), I also have user restrictions that need to be respected. So for example, if I only have access to id=3, when I run the query I should get:
ID Breadcrumb
3 Node D
4 Node D > Node E
5 Node D > Node F
I do have a user-defined function that takes a userid as a parameter, and returns a table with the ids of all groups that are valid, so as long as somewhere in the query
WHERE group.id IN (SELECT id FROM dbo.getUserGroups(#userid))
it will work.
I have an existing scalar function that can do this, but it just does not work on any reasonable number of groups (takes >10 seconds on 2000 groups). It takes a groupid and userid as a parameter, and returns a nvarchar. It finds the given groups parents (1 query to grab the left/right values, another to find the parents), restricts the list to the groups the user has access to (using the same WHERE clause as above, so yet another query), and then uses a cursor to go through each group and append it to a string, before finally returning that value.
I need a method to do this that will run quickly (eg. <= 1s), on the fly.
This is on SQL Server 2005.
Here's the SQL that worked for me to get the "breadcrumb" path from any point in the tree. Hope it helps.
SELECT ancestor.id, ancestor.title, ancestor.alias
FROM `categories` child, `categories` ancestor
WHERE child.lft >= ancestor.lft AND child.lft <= ancestor.rgt
AND child.id = MY_CURRENT_ID
ORDER BY ancestor.lft
Kath
Ok. This is for MySQL, not SQL Server 2005. It uses a GROUP_CONCAT with a subquery.
This should return the full breadcrumb as single column.
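SELECT
-- ORDER BY must go inside GROUP_CONCAT to control the order of the names;
-- Left and Right are reserved words in MySQL, so they are quoted
(SELECT GROUP_CONCAT(parent.name ORDER BY parent.`Left` SEPARATOR ' > ')
FROM category parent
WHERE node.`Left` >= parent.`Left`
AND node.`Right` <= parent.`Right`
) as breadcrumb
FROM category node
ORDER BY node.`Left`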
If you can, use a path (I think I've also heard it referred to as a lineage) field like:
ID Name ParentId Left Right Path
0 Node A 0 1 12 0,
1 Node B 0 2 5 0,1,
2 Node C 1 3 4 0,1,2,
3 Node D 0 6 11 0,3,
4 Node E 3 7 8 0,3,4,
5 Node F 3 9 10 0,3,5,
To get just node D and onward (pseudocode):
path = SELECT Path FROM Nodes WHERE ID = 3
SELECT * FROM Nodes WHERE Path LIKE path + '%'
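The pseudocode above can be made concrete roughly as follows. This is a sketch that assumes SQLite through Python's sqlite3 and a trimmed-down nodes table with only id, name and path; it also assumes Node F's path is 0,3,5, (ending in its own id, as the other rows do). The trailing comma in each path matters: it stops the prefix 0,3, from matching a hypothetical 0,31,.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table nodes (id integer primary key, name text, path text);
insert into nodes values
    (0,'Node A','0,'),(1,'Node B','0,1,'),(2,'Node C','0,1,2,'),
    (3,'Node D','0,3,'),(4,'Node E','0,3,4,'),(5,'Node F','0,3,5,');
""")

def subtree(conn, node_id):
    """Return (id, name) for node_id and everything below it, using a path prefix match."""
    row = conn.execute("select path from nodes where id = ?", (node_id,)).fetchone()
    if row is None:
        return []
    return conn.execute(
        "select id, name from nodes where path like ? || '%' order by path",
        (row[0],),
    ).fetchall()
```

subtree(conn, 3) returns Node D, Node E and Node F. A prefix LIKE of this form can use an index on path in most databases, subject to collation settings.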
I modified Kathy's statement to get breadcrumbs for every element:
SELECT
GROUP_CONCAT(
ancestor.name
ORDER BY ancestor.lft ASC
SEPARATOR ' > '
),
child.*
FROM `categories` child
JOIN `categories` ancestor
ON child.lft >= ancestor.lft
AND child.lft <= ancestor.rgt
GROUP BY child.lft
ORDER BY child.lft
Feel free to add a WHERE condition e.g.
WHERE ancestor.lft BETWEEN 6 AND 11
What I ended up doing is making a large join that simply ties this table to itself, over and over for every level.
First I populate a table #topLevelGroups with just the 1st level groups (if you only have one root you can skip this step), and then #userGroups with the groups that user can see.
SELECT groupid,
(level1
+ CASE WHEN level2 IS NOT NULL THEN ' > ' + level2 ELSE '' END
+ CASE WHEN level3 IS NOT NULL THEN ' > ' + level3 ELSE '' END
)as [breadcrumb]
FROM (
SELECT g3.*
,g1.name as level1
,g2.name as level2
,g3.name as level3
FROM #topLevelGroups g1
INNER JOIN #userGroups g2 ON g2.parentid = g1.groupid and g2.groupid <> g1.groupid
INNER JOIN #userGroups g3 ON g3.parentid = g2.groupid
UNION
SELECT g2.*
,g1.name as level1
,g2.name as level2
,NULL as level3
FROM #topLevelGroups g1
INNER JOIN #userGroups g2 ON g2.parentid = g1.groupid and g2.groupid <> g1.groupid
UNION
SELECT g1.*
,g1.name as level1
,NULL as level2
,NULL as level3
FROM #topLevelGroups g1
) a
ORDER BY [breadcrumb]
This is a pretty big hack, and it is obviously limited to a fixed number of levels (for my app, there is a reasonable limit I can pick). The problem is that the more levels are supported, the more the number of joins grows (exponentially), and thus the slower the query gets.
Doing it in code is most certainly easier, but for me that is simply not always an option - there are times when I need this available directly from a SQL query.
I'm accepting this as the answer, since it's what I ended up doing and it may work for other people -- however, if someone can come up with a more efficient method I'll change it to them.
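For the cases where doing it in code is an option, all breadcrumbs can be built in one pass over the rows sorted by Left, keeping a stack of the currently open ancestors. This is only a sketch: the function name and tuple layout are made up for illustration, it assumes Node F's interval is 9 to 10, and the optional allowed set stands in for the ids returned by dbo.getUserGroups.

```python
def breadcrumbs(rows, allowed=None, sep=" > "):
    """Build breadcrumbs for nested-set rows given as (id, name, left, right).

    `allowed` is an optional set of ids the user may see; hidden ancestors
    are left out of the breadcrumb and hidden nodes get no entry at all.
    """
    result = {}
    stack = []  # (right, name, visible) for each ancestor still "open"
    for node_id, name, left, right in sorted(rows, key=lambda r: r[2]):
        # Close every ancestor whose interval ended before this node starts.
        while stack and stack[-1][0] < left:
            stack.pop()
        visible = allowed is None or node_id in allowed
        stack.append((right, name, visible))
        if visible:
            result[node_id] = sep.join(n for _, n, v in stack if v)
    return result

# The sample tree from the question (Node F's interval assumed to be 9..10).
GROUPS = [
    (0, "Node A", 1, 12),
    (1, "Node B", 2, 5),
    (2, "Node C", 3, 4),
    (3, "Node D", 6, 11),
    (4, "Node E", 7, 8),
    (5, "Node F", 9, 10),
]
```

breadcrumbs(GROUPS) reproduces the six breadcrumbs from the question, and breadcrumbs(GROUPS, allowed={3, 4, 5}) gives the restricted version starting at Node D. The pass is linear in the number of groups after the sort.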
No SQL Server-specific code, but are you simply looking for:
SELECT * FROM table WHERE left < (currentid.left) AND right > (currentid.right)

SQL Recursive Tables

I have the following tables, the groups table which contains hierarchically ordered groups and group_member which stores which groups a user belongs to.
groups
---------
id
parent_id
name
group_member
---------
id
group_id
user_id
ID PARENT_ID NAME
---------------------------
1 NULL Cerebra
2 1 CATS
3 2 CATS 2.0
4 1 Cerepedia
5 4 Cerepedia 2.0
6 1 CMS
ID GROUP_ID USER_ID
---------------------------
1 1 3
2 1 4
3 1 5
4 2 7
5 2 6
6 4 6
7 5 12
8 4 9
9 1 10
I want to retrieve the visible groups for a given user. That is to say, the groups a user belongs to and the children of these groups. For example, with the above data:
USER VISIBLE_GROUPS
9 4, 5
3 1,2,4,5,6
12 5
I am getting these values using recursion and several database queries. But I would like to know if it is possible to do this with a single SQL query to improve my app performance. I am using MySQL.
Two things come to mind:
1 - You can repeatedly outer-join the table to itself to recursively walk up your tree, as in:
SELECT *
FROM
MY_GROUPS MG1
,MY_GROUPS MG2
,MY_GROUPS MG3
,MY_GROUPS MG4
,MY_GROUPS MG5
,MY_GROUP_MEMBERS MGM
WHERE MG1.PARENT_ID = MG2.UNIQID (+)
AND MG1.UNIQID = MGM.GROUP_ID (+)
AND MG2.PARENT_ID = MG3.UNIQID (+)
AND MG3.PARENT_ID = MG4.UNIQID (+)
AND MG4.PARENT_ID = MG5.UNIQID (+)
AND MGM.USER_ID = 9
That's gonna give you results like this:
UNIQID PARENT_ID NAME UNIQID_1 PARENT_ID_1 NAME_1 UNIQID_2 PARENT_ID_2 NAME_2 UNIQID_3 PARENT_ID_3 NAME_3 UNIQID_4 PARENT_ID_4 NAME_4 UNIQID_5 GROUP_ID USER_ID
4 2 Cerepedia 2 1 CATS 1 null Cerebra null null null null null null 8 4 9
The limit here is that you must add a new join for each "level" you want to walk up the tree. If your tree has fewer than, say, 20 levels, then you could probably get away with it by creating a view that shows 20 levels for every user.
2 - The only other approach that I know of is to create a recursive database function, and call that from code. You'll still have some lookup overhead that way (i.e., your # of queries will still be equal to the # of levels you are walking on the tree), but overall it should be faster since it's all taking place within the database.
I'm not sure about MySql, but in Oracle, such a function would be similar to this one (you'll have to change the table and field names; I'm just copying something I did in the past):
CREATE OR REPLACE FUNCTION GoUpLevel(WO_ID INTEGER, UPLEVEL INTEGER) RETURN INTEGER
IS
BEGIN
DECLARE
iResult INTEGER;
iParent INTEGER;
BEGIN
IF UPLEVEL <= 0 THEN
iResult := WO_ID;
ELSE
SELECT PARENT_ID
INTO iParent
FROM WOTREE
WHERE ID = WO_ID;
iResult := GoUpLevel(iParent,UPLEVEL-1); --recursive
END IF;
RETURN iResult;
EXCEPTION WHEN NO_DATA_FOUND THEN
RETURN NULL;
END;
END GoUpLevel;
/
Joe Celko's books "SQL for Smarties" and "Trees and Hierarchies in SQL for Smarties" describe methods that avoid recursion entirely, by using nested sets. That complicates the updating, but makes other queries (that would normally need recursion) comparatively straightforward. There are some examples in this article written by Joe back in 1996.
I don't think that this can be accomplished without using recursion. You can accomplish it with a single stored procedure in MySQL, but recursion is not allowed in stored procedures by default. This article has information about how to enable recursion. I'm not certain how much impact this would have on performance versus the multiple-query approach. MySQL may do some optimization of stored procedures, but otherwise I would expect the performance to be similar.
Didn't know if you had a Users table, so I get the list via the User_ID's stored in the Group_Member table...
SELECT GroupUsers.User_ID,
(
SELECT
STUFF((SELECT ',' +
Cast(Group_ID As Varchar(10))
FROM Group_Member Member (nolock)
WHERE Member.User_ID=GroupUsers.User_ID
FOR XML PATH('')),1,1,'')
) As Groups
FROM (SELECT User_ID FROM Group_Member GROUP BY User_ID) GroupUsers
That returns:
User_ID Groups
3 1
4 1
5 1
6 2,4
7 2
9 4
10 1
12 5
Which seems right according to the data in your table. But doesn't match up with your expected value list (e.g. User 9 is only in one group in your table data but you show it in the results as belonging to two)
EDIT: Dang. Just noticed that you're using MySQL. My solution was for SQL Server. Sorry.
-- Kevin Fairchild
A similar question has already been raised.
Here is my answer (slightly edited):
I am not sure I understand your question correctly, but this could work: My take on trees in SQL.
The linked post describes a method of storing a tree in a database (PostgreSQL in that case), but the method is clear enough that it can easily be adapted to any database.
With this method you can easily update all the nodes that depend on a modified node K with about N simple SELECT queries, where N is the distance of K from the root node.
Good Luck!
I don't remember which SO question I found the link under, but this article on sitepoint.com (second page) shows another way of storing hierarchical trees in a table that makes it easy to find all child nodes, or the path to the top, things like that. Good explanation with example code.
PS. Newish to StackOverflow, is the above ok as an answer, or should it really have been a comment on the question since it's just a pointer to a different solution (not exactly answering the question itself)?
There's no way to do this in the SQL standard, but you can usually find vendor-specific extensions, e.g., CONNECT BY in Oracle.
UPDATE: As the comments point out, this was added in SQL 99.
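Where the database does support the SQL:1999 recursive WITH clause, the visible groups can be fetched in one query. The sketch below is an illustration under assumptions: it runs on SQLite through Python's sqlite3 (not the asker's MySQL), uses the sample data from the question, and quotes the groups table name because it is a keyword in some SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table "groups" (id integer primary key, parent_id integer, name text);
create table group_member (id integer primary key, group_id integer, user_id integer);
insert into "groups" values
    (1,NULL,'Cerebra'),(2,1,'CATS'),(3,2,'CATS 2.0'),
    (4,1,'Cerepedia'),(5,4,'Cerepedia 2.0'),(6,1,'CMS');
insert into group_member values
    (1,1,3),(2,1,4),(3,1,5),(4,2,7),(5,2,6),(6,4,6),(7,5,12),(8,4,9),(9,1,10);
""")

def visible_groups(conn, user_id):
    """Groups the user belongs to, plus all descendants of those groups."""
    rows = conn.execute(
        """
        with recursive visible (id) as (
            select group_id from group_member where user_id = ?
            union
            select g.id
            from "groups" g
            inner join visible v on g.parent_id = v.id
        )
        select id from visible order by id
        """,
        (user_id,),
    )
    return [r[0] for r in rows]
```

For user 9 this returns 4 and 5, and for user 12 it returns 5, matching the question. For user 3 it returns 1 through 6, which includes group 3 (CATS 2.0, a child of CATS) even though the expected list in the question omits it.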