I am attempting to use a CTE to find the first common ancestor in a hierarchy for all values in a source table. I am able to return the entire hierarchy, but cannot seem to find a way to get the first common ancestor.
The ultimate goal is to update the source table to use only common ancestors.
Source table:
Type_Id Id Parent_Id Group_Id
A 3 2 1
A 4 2 1
A 5 4 1
After the CTE the table I get is:
Type_Id Id Child_Id Parent_Id Group_Id Level
A 3 3 2 1 1
A 4 4 2 1 1
A 5 5 4 1 1
A 5 4 2 1 2
What I'm looking to do is update the id of the source so that for each type and group combination, the new id is to be the first common parent. In this case, it would be set all ids to 2 as that's the first common entry.
You can try to use cte recursive self-join by your logic.
;WITH CTE as (
SELECT Type_Id,Id,id Child_Id,Parent_Id,Group_Id,1 Level
FROM T
UNION ALL
SELECT c.Type_Id,c.Id,t.id Child_Id,t.Parent_Id,t.Group_Id,c.Level+1
FROM CTE c
INNER JOIN T t
ON c.Parent_Id = t.ID AND c.Type_Id = t.Type_Id
)
SELECT *
FROM CTE
sqlfiddle
By building a list of all ancestor relationships (including self) and identifying the depth of each node from the root, you can then query all ancestors for the N selected nodes of interest. Any ancestors occurring N times are candidate common ancestors. The one with the deepest depth is the answer.
The following is what I came up with using simplified Node data containing only Id and ParentId. (The initial CTE was adapted from D-Shih's post.)
;WITH Ancestry as (
-- Recursive list of ancestors (including self) for each node
SELECT ChildId = N.Id, AncestorLevel = 0, AncestorId = N.Id, AncestorParentId = N.ParentId
FROM #Node N
UNION ALL
SELECT C.ChildId, AncestorLevel = C.AncestorLevel + 1, AncestorId = P.Id, AncestorParentId = P.ParentId
FROM Ancestry C
INNER JOIN #Node P ON P.Id = C.AncestorParentId
),
AncestryDepth AS (
-- Calculate depth from root (There may be a quicker way)
SELECT *, Depth = COUNT(*) OVER(PARTITION BY ChildId) - AncestorLevel
FROM Ancestry
),
Answer AS (
-- Answer is deepest node that is an ancestor of all selected nodes
SELECT TOP 1 Answer = A.AncestorId
FROM #Selected S
JOIN AncestryDepth A ON A.ChildId = S.Id
GROUP BY A.AncestorId, A.Depth
HAVING COUNT(*) = (SELECT COUNT(*) FROM #Selected) -- Ancestor covers all
ORDER BY A.Depth DESC -- Deepest
)
SELECT *
FROM Answer
There may be room for improvement, possibly a more direct calculation of depth, but this appears to give the correct result.
See this db<>fiddle for a working demo with data and several test scenarios.
Starting from the generic node (or leaf) how to get the root node?
SELECT
ID,MSG_ID,PARENT_ID
FROM
TABLE_X
CONNECT BY PRIOR PARENT_ID = ID;
ID MSG_ID PARENT_ID
4 3 NULL
5 93bea0f71b07-4037-9009-f148fa39bb62 4
4 3 NULL
6 6f5f5d4ab1ec-4f00-8448-7a6dfa6461b2 4
4 3 NULL
7 3 NULL
8 7e0fae569637-4d29-9075-c273eb39ae8e 7
7 3 NULL
9 8a3e7485b3e8-45b1-a31d-c52fd32111c0 7
7 3 NULL
10 fcc622d5af92-4e61-8d7c-add3da359a8b 7
7 3 NULL
How to get the root msg_id?
You were on the right path, but you are missing two essential ingredients.
First, to indicate the starting point, you need to use the start with clause. Obviously, it will be something like start with id = <input value>. You didn't tell us how you will provide the input value. The most common approach is to use bind variables (best for many reasons); I named the bind variable input_id in the query below.
Second, you only want the "last" row (since you are navigating the tree in the opposite direction: towards the root, not from the root; so the root is now the "leaf" as you navigate this way). For that you can use the connect_by_isleaf pseudocolumn in the where clause.
So, the query should look like this: (note that I am only selecting the root message id, as that is all you requested; if you need more columns, include them in select)
select msg_id
from table_x
where connect_by_isleaf = 1 -- keep just the root row (leaf in this traversal)
start with id = :input_id -- to give the starting node
connect by prior parent_id = id
;
You can use a recursive query:
with cte (id, msg_id, parent_id) as (
select id, msg_id, parent_id, from mytable where id = ?
union all
select t.id, t.msg_id, t.parent_id
from mytable t
inner join cte c on c.parent_id = t.id
)
select * from cte where parent_id is null
I would like to get help for a hierarchical query (Oracle 11gR2). I have a hard time with those kind of queries...
In fact, it's a 2 in 1 question (2 different approches needed).
I’m looking for a way to get the distance from all individials records to the root (not the opposite). My data are in a tree like structure:
CREATE TABLE MY_TREE
(ID_NAME VARCHAR2(1) PRIMARY KEY,
PARENT_ID VARCHAR2(1),
PARENT_DISTANCE NUMBER(2)
);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('A',NULL,NULL);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('B','A',1);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('C','B',3);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('D','B',5);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('E','C',7);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('F','D',11);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('G','D',13);
Hierarchically, my data look like this (but I have multiple independents roots and many more levels):
In the first approch, I'm looking for a query that will give me this result:
LEVEL ROOT NODE ID_NAME ROOT_DISTANCE
----- ---- ---- ------- -------------
1 A null A null
2 A null B 1
3 A B C 4
4 A B E 11
3 A B D 6
4 A D F 17
4 A D G 19
In this result,
the "NODE" column mean the ID_NAME of the closest split element
the "ROOT_DISTANCE" column mean the distance from an element to the root (ex: the ROOT_DISTANCE for ID_NAME=G is the distance from G to A: G(13)+D(5)+B(1)=19)
In this approch, I will always specify a maximum of 2 roots.
The second approch must be a PL/SQL script that will do the same calculation (ROOT_DISTANCE), but in a iterative way, and that will write the result in a new table. I want to run this script one time, so all roots (~1000) will be processed.
Here's the way I see the script:
For all roots, we need to find associated leafs and then calculate the distance from the leaf to the root (for all element between the leaf and the root) and put this into a table.
This script is needed for "performance perspectives", so if an element have been already calculated (ex: a split node that was calculated by another leaf), we need to stop the calculation and pass to the next leaf because we already know the result from there to the root. For example, if the system calculates E-C-B-A, and then F-D-B-A, the B-A section should not be calcultated again because it was done in the first pass.
You can awnser one or both of those questions, but i will need the awnser to those two questions.
Thank you!
Try this one:
WITH brumba(le_vel,root,node,id_name,root_distance) AS (
SELECT 1 as le_vel, id_name as root, null as node, id_name, to_number(null) as root_distance
FROM MY_TREE WHERE parent_id IS NULL
UNION ALL
SELECT b.le_vel + 1, b.root,
CASE WHEN 1 < (
SELECT count(*) FROM MY_TREE t1 WHERE t1.parent_id = t.parent_id
)
THEN t.parent_id ELSE b.node
END,
t.id_name, coalesce(b.root_distance,0)+t.parent_distance
FROM MY_TREE t
JOIN brumba b ON b.id_name = t.parent_id
)
SELECT * FROM brumba
Demo: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=d5c231055e989c3cbcd763f4b3d3033f
There is no need for "the second approch" using PL/SQL - the above SQL will calculate results for all root nodes (which have null in parent_id column) at once.
Just add a prefix either INSERT INTO tablename(col1,col2, ... colN) ... or CREATE TABLE name AS ... to the above query.
The above demo contains an example for the latter option CREATE TABLE xxx AS query
Here's one option which shows how to get the first part of your question:
SQL> with temp as
2 (select level lvl,
3 ltrim(sys_connect_by_path(id_name, ','), ',') path
4 from my_tree
5 start with parent_id is null
6 connect by prior id_name = parent_id
7 ),
8 inter as
9 (select t.lvl,
10 t.path,
11 regexp_substr(t.path, '[^,]+', 1, column_value) col
12 from temp t,
13 table(cast(multiset(select level from dual
14 connect by level <= regexp_count(path, ',') + 1
15 ) as sys.odcinumberlist ))
16 )
17 select i.lvl,
18 i.path,
19 sum(m.parent_distance) dis
20 from inter i join my_tree m on m.id_name = i.col
21 group by i.lvl, i.path
22 order by i.path;
LVL PATH DIS
---- ---------- ----------
1 A
2 A,B 1
3 A,B,C 4
4 A,B,C,E 11
3 A,B,D 6
4 A,B,D,F 17
4 A,B,D,G 19
7 rows selected.
SQL>
Here is how you can solve this with a hierarchical (connect by) query.
In most hierarchical problems, hierarchical queries will be faster than recursive queries (recursive with clause). However, your question is not purely hierarchical - you need to compute the distance to root, and unlike recursive with, you can't get that in a single pass with hierarchical queries. So it will be interesting to hear - from you! - whether there are any significant performance differences between the approaches. For what it's worth, on the very small data sample you provided, the Optimizer estimated a cost of 5 using connect by, vs. 48 for the recursive solution; whether that means anything in real life you will find out and hopefully you will let us know, too.
In the hierarchical query, I flag out the parents that have two or more children (I use an analytic function for this, to avoid a join). Then I build the hierarchy, and in the last step I aggregate to get the required bits. As in the recursive solution, this should give you everything you need - for all roots and all nodes - in a single SQL query; there is no need for PL/SQL code.
with
branching (id_name, parent_id, parent_distance, b) as (
select id_name, parent_id, parent_distance,
case when count(*) over (partition by parent_id) > 1 then 'y' end
from my_tree
)
, hierarchy (lvl, leaf, id_name, parent_id, parent_distance, b) as (
select level, connect_by_root id_name, id_name, parent_id, parent_distance, b
from branching
connect by id_name = prior parent_id
)
select max(lvl) as lvl,
min(id_name) keep (dense_rank last order by lvl) as root,
leaf as id_name,
min(decode(b, 'y', parent_id))
keep (dense_rank first order by decode(b, 'y', lvl)) as node,
sum(parent_distance) as root_distance
from hierarchy
group by leaf;
LVL ROOT ID_NAME NODE ROOT_DISTANCE
--- ------- ------- ------- -------------
1 A A
2 A B 1
3 A C B 4
3 A D B 6
4 A E B 11
4 A F D 17
4 A G D 19
There are similar question asking how to find the top level parent of a child (this, this and this). I have a similar question but I want to find all childern of a top level parent. This is similar question but uses wordpress predefined functions.
sample table:
id parent
1 0
2 0
3 1
4 2
5 3
6 3
7 4
I want to select ID with most top parent equals 1. The output should be 3 and all children of 3 I mean (5,6) and even more deep level children if available.
I know I can select them using two times of inner join but the hirearchy may be more complex with more levels.
A simple "Recursive CTE" will do what you want:
with n as (
select id from my_table where id = 1 -- starting row(s)
union all
select t.id
from n
join my_table t on t.parent_id = n.id
)
select id from n;
This CTE will go down all levels ad infinitum. Well... by default SQL Server limits it to 128 levels (that you can increase to 65k).
Since you aren't climbing the entire ladder...
select *
from YourTable
where parent = (select top 1 parent from YourTable group by parent order by count(parent) desc)
If you were wanting to return the parent of 3, since 3 was listed most often, then you'd use a recursive CTE.
I have table and I want get 3 lvl tree.
Example
Node
Id ParentId Name
1 -1 Test 1
2 -1 Test 2
3 1 Test 1.1
4 1 Test 1.2
5 3 Test 1.1.1
6 3 Test 1.1.2
7 5 Test 1.1.1.1
If I filtered ParentId = -1 I want get rows ParentId = -1 and children's +2
lvl.
If I filtered Id = 2 I want get row Id = 2 and children's +2
lvl.
UPDATE
I use MS SQL Server 2008, Entity Framework 6.1.3.
I understand, I can use 3 selects. But I looking effective method
You can use recursive SQL to do this in SQL server.
WITH recCTE (childID, parentID, Name, Depth) Assuming
(
Select
yourTable.id as childid,
yourTable.parentID,
CAST('Test ' + yourTable.id as varchar(20)) as Name
0 as Depth
FROM
yourTable
WHERE
parentID = -1
UNION ALL
Select
yourTable.id as childID,
yourTable.ParentID as ParentID,
recCTE.path + '.' + yourTable.id AS Name
recCTE.depth + 1 as Depth
FROM
recCTE
INNER JOIN yourTable on
recCTE.childID = yourTable.parentID
Where
recCTE.Depth + 1 <= 2
)
SELECT * FROM recCTE;
The bit inside the CTE on top of the UNION is your seed query for the recursive sql. It's the place where your recursive lookup will start. You wanted to start at parentID = -1, so it's here in the WHERE statement.
The bit inside the CTE below the UNION is the recursive term. This joins the recursive CTE back to itself and brings in more data from your table. Joining the id from your table to the childID from the recursive resultset.
The recursive term is where we test to see how deep we've gotten. If the Depth from the CTE + 1 is less than or equal to 2 then we stop adding children to the loop up, ending the loop for that particular leg of the hierarchy.
The last little bit below the CTE is just the part that runs the CTE so you get results back.
These recursive queries are confusing as hell at first, but spend some time with them and you'll find a lot of use for them. You'll also find that they aren't too difficult to write once you have all the parts sussed out.