Oracle hierarchical sum (distance from leaf to root) - sql

I would like to get help for a hierarchical query (Oracle 11gR2). I have a hard time with those kind of queries...
In fact, it's a 2 in 1 question (2 different approches needed).
I’m looking for a way to get the distance from all individials records to the root (not the opposite). My data are in a tree like structure:
CREATE TABLE MY_TREE
(ID_NAME VARCHAR2(1) PRIMARY KEY,
PARENT_ID VARCHAR2(1),
PARENT_DISTANCE NUMBER(2)
);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('A',NULL,NULL);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('B','A',1);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('C','B',3);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('D','B',5);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('E','C',7);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('F','D',11);
INSERT INTO MY_TREE (ID_NAME,PARENT_ID,PARENT_DISTANCE) VALUES('G','D',13);
Hierarchically, my data look like this (but I have multiple independents roots and many more levels):
In the first approch, I'm looking for a query that will give me this result:
LEVEL ROOT NODE ID_NAME ROOT_DISTANCE
----- ---- ---- ------- -------------
1 A null A null
2 A null B 1
3 A B C 4
4 A B E 11
3 A B D 6
4 A D F 17
4 A D G 19
In this result,
the "NODE" column mean the ID_NAME of the closest split element
the "ROOT_DISTANCE" column mean the distance from an element to the root (ex: the ROOT_DISTANCE for ID_NAME=G is the distance from G to A: G(13)+D(5)+B(1)=19)
In this approch, I will always specify a maximum of 2 roots.
The second approch must be a PL/SQL script that will do the same calculation (ROOT_DISTANCE), but in a iterative way, and that will write the result in a new table. I want to run this script one time, so all roots (~1000) will be processed.
Here's the way I see the script:
For all roots, we need to find associated leafs and then calculate the distance from the leaf to the root (for all element between the leaf and the root) and put this into a table.
This script is needed for "performance perspectives", so if an element have been already calculated (ex: a split node that was calculated by another leaf), we need to stop the calculation and pass to the next leaf because we already know the result from there to the root. For example, if the system calculates E-C-B-A, and then F-D-B-A, the B-A section should not be calcultated again because it was done in the first pass.
You can awnser one or both of those questions, but i will need the awnser to those two questions.
Thank you!

Try this one:
WITH brumba(le_vel,root,node,id_name,root_distance) AS (
SELECT 1 as le_vel, id_name as root, null as node, id_name, to_number(null) as root_distance
FROM MY_TREE WHERE parent_id IS NULL
UNION ALL
SELECT b.le_vel + 1, b.root,
CASE WHEN 1 < (
SELECT count(*) FROM MY_TREE t1 WHERE t1.parent_id = t.parent_id
)
THEN t.parent_id ELSE b.node
END,
t.id_name, coalesce(b.root_distance,0)+t.parent_distance
FROM MY_TREE t
JOIN brumba b ON b.id_name = t.parent_id
)
SELECT * FROM brumba
Demo: https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=d5c231055e989c3cbcd763f4b3d3033f
There is no need for "the second approch" using PL/SQL - the above SQL will calculate results for all root nodes (which have null in parent_id column) at once.
Just add a prefix either INSERT INTO tablename(col1,col2, ... colN) ... or CREATE TABLE name AS ... to the above query.
The above demo contains an example for the latter option CREATE TABLE xxx AS query

Here's one option which shows how to get the first part of your question:
SQL> with temp as
2 (select level lvl,
3 ltrim(sys_connect_by_path(id_name, ','), ',') path
4 from my_tree
5 start with parent_id is null
6 connect by prior id_name = parent_id
7 ),
8 inter as
9 (select t.lvl,
10 t.path,
11 regexp_substr(t.path, '[^,]+', 1, column_value) col
12 from temp t,
13 table(cast(multiset(select level from dual
14 connect by level <= regexp_count(path, ',') + 1
15 ) as sys.odcinumberlist ))
16 )
17 select i.lvl,
18 i.path,
19 sum(m.parent_distance) dis
20 from inter i join my_tree m on m.id_name = i.col
21 group by i.lvl, i.path
22 order by i.path;
LVL PATH DIS
---- ---------- ----------
1 A
2 A,B 1
3 A,B,C 4
4 A,B,C,E 11
3 A,B,D 6
4 A,B,D,F 17
4 A,B,D,G 19
7 rows selected.
SQL>

Here is how you can solve this with a hierarchical (connect by) query.
In most hierarchical problems, hierarchical queries will be faster than recursive queries (recursive with clause). However, your question is not purely hierarchical - you need to compute the distance to root, and unlike recursive with, you can't get that in a single pass with hierarchical queries. So it will be interesting to hear - from you! - whether there are any significant performance differences between the approaches. For what it's worth, on the very small data sample you provided, the Optimizer estimated a cost of 5 using connect by, vs. 48 for the recursive solution; whether that means anything in real life you will find out and hopefully you will let us know, too.
In the hierarchical query, I flag out the parents that have two or more children (I use an analytic function for this, to avoid a join). Then I build the hierarchy, and in the last step I aggregate to get the required bits. As in the recursive solution, this should give you everything you need - for all roots and all nodes - in a single SQL query; there is no need for PL/SQL code.
with
branching (id_name, parent_id, parent_distance, b) as (
select id_name, parent_id, parent_distance,
case when count(*) over (partition by parent_id) > 1 then 'y' end
from my_tree
)
, hierarchy (lvl, leaf, id_name, parent_id, parent_distance, b) as (
select level, connect_by_root id_name, id_name, parent_id, parent_distance, b
from branching
connect by id_name = prior parent_id
)
select max(lvl) as lvl,
min(id_name) keep (dense_rank last order by lvl) as root,
leaf as id_name,
min(decode(b, 'y', parent_id))
keep (dense_rank first order by decode(b, 'y', lvl)) as node,
sum(parent_distance) as root_distance
from hierarchy
group by leaf;
LVL ROOT ID_NAME NODE ROOT_DISTANCE
--- ------- ------- ------- -------------
1 A A
2 A B 1
3 A C B 4
3 A D B 6
4 A E B 11
4 A F D 17
4 A G D 19

Related

How to get the root record in Oracle database

Starting from the generic node (or leaf) how to get the root node?
SELECT
ID,MSG_ID,PARENT_ID
FROM
TABLE_X
CONNECT BY PRIOR PARENT_ID = ID;
ID MSG_ID PARENT_ID
4 3 NULL
5 93bea0f71b07-4037-9009-f148fa39bb62 4
4 3 NULL
6 6f5f5d4ab1ec-4f00-8448-7a6dfa6461b2 4
4 3 NULL
7 3 NULL
8 7e0fae569637-4d29-9075-c273eb39ae8e 7
7 3 NULL
9 8a3e7485b3e8-45b1-a31d-c52fd32111c0 7
7 3 NULL
10 fcc622d5af92-4e61-8d7c-add3da359a8b 7
7 3 NULL
How to get the root msg_id?
You were on the right path, but you are missing two essential ingredients.
First, to indicate the starting point, you need to use the start with clause. Obviously, it will be something like start with id = <input value>. You didn't tell us how you will provide the input value. The most common approach is to use bind variables (best for many reasons); I named the bind variable input_id in the query below.
Second, you only want the "last" row (since you are navigating the tree in the opposite direction: towards the root, not from the root; so the root is now the "leaf" as you navigate this way). For that you can use the connect_by_isleaf pseudocolumn in the where clause.
So, the query should look like this: (note that I am only selecting the root message id, as that is all you requested; if you need more columns, include them in select)
select msg_id
from table_x
where connect_by_isleaf = 1 -- keep just the root row (leaf in this traversal)
start with id = :input_id -- to give the starting node
connect by prior parent_id = id
;
You can use a recursive query:
with cte (id, msg_id, parent_id) as (
select id, msg_id, parent_id, from mytable where id = ?
union all
select t.id, t.msg_id, t.parent_id
from mytable t
inner join cte c on c.parent_id = t.id
)
select * from cte where parent_id is null

How to workaround parent-child script without using classic way, or without using where clause

I am looking for workaround that works like parent-child, but without using recursive searching. I am not able to use temporary tables.
THIS SCRIPT WORKS but slowly, always run for 600 sec.:
SELECT CONNECT_BY_ROOT party_id as ANCESTOR,
party_id, role_id, subject_id
FROM onecrm.CRM_PARTY
WHERE LEVEL>1
and party_id = 'text'
CONNECT BY PRIOR Party_id=parent_id;
This works well, but it contain 3 steps. I need to use only one step because of aggregate tasks.
select internal_id, party_id, parent_id, subject_id, channel_type_id
from onecrm.O_ORDER oo
join onecrm.CRM_PARTY cp on oo.party_ref_no = cp.party_ref_no
where internal_id = 'O7VYECF';
Result:
INTERNAL_ID, PARTY_ID, PARENT_ID, SUBJECT_ID, CHANNEL_TYPE_ID
O7VYECF 110179237 110179236 null CRM
select internal_id, cp.party_id, parent_id
from onecrm.O_ORDER oo
right join onecrm.CRM_PARTY cp on oo.party_ref_no = cp.party_ref_no
where cp.party_id = '110179236';
Result:
INTERNAL_ID, PARTY_ID, PARENT_ID
OAMUAY7 110179236 null
select internal_id, cp.party_id, parent_id, cp.subject_id,
channel_type_id, full_name, phone_no_1, phone_no_2, email, segment
from onecrm.O_ORDER oo
right join onecrm.CRM_PARTY cp on oo.party_ref_no = cp.party_ref_no
left join onecrm.CRM_SUBJECT cs on cs.SUBJECT_ID = cp.SUBJECT_ID
left join onecrm.crm_contact_ref ccr on ccr.conre_ref_no = cs.subj_ref_no
left join onecrm.CRM_CONTACT_EXT cce on cce.contact_id = ccr.contact_id
where cp.party_id = '110179236';
Expected result:
INTERNAL_ID, PARTY_ID, PARENT_ID, SUBJECT_ID, CHANNEL_TYPE_ID, FULL_NAME, PHONE_NO_1, PHONE_NO_2, EMAIL, SEGMENT
OAMUAY7 110179236 null 102219217 TGB great_company s.r.o.
xxx xxx TNC RNC
Expected result is write only internal_id and get parent_id INFO
The original connect by query has no start with clause. This means it's calculating the tree for every single row in the table!
It's then applying the where clause to the tree generated.
For example the following builds a tree start for the rows C1 = 1, C1 = 2, & C1 = 3:
create table t as
select level c1, level - 1 c2
from dual
connect by level <= 3;
select t.*,
connect_by_root c1 rt
from t
connect by prior c1 = c2;
C1 C2 RT
1 0 1
2 1 1
3 2 1
2 1 2
3 2 2
3 2 3
As you load more data into the table, this will very quickly slow your query to a crawl.
Even if your where clause means you only get a few rows back, you're very likely to be processing a huge data set.
To avoid this, you almost certainly want a start with clause. This defines which row is the root of the tree:
select t.*,
connect_by_root c1 rt
from t
start with c1 = 1
connect by prior c1 = c2;
C1 C2 RT
1 0 1
2 1 1
3 2 1

Hierarchical queries in Oracle

I have a table with a structure like (id, parent_id) in Oracle11g.
id parent_id
---------------
1 (null)
2 (null)
3 1
4 3
5 3
6 2
7 2
I'd like to query it to get all the lines that are hierarchically linked to each of these id, so the results should be :
root_id id parent_id
------------------------
1 3 1
1 4 3
1 5 3
2 6 2
2 7 2
3 4 3
3 5 3
I've been struggling with the connect by and start with for quite some time now, and all i can get is a fraction of the results i want with queries like :
select connect_by_root(id) root_id, id, parent_id from my-table
start with id=1
connect by prior id = parent_id
I'd like to not use any for loop to get my complete results.
Any Idea ?
Best regards,
Jérôme Lefrère
PS : edited after first answer, noticing me i had forgotten some of the results i want...
The query you posted is missing the from clause and left an underscore out of connect_by_root, but I'll assume those aren't actually the source of your problem.
The following query gives you the result you're looking for:
select * from (
select connect_by_root(id) root_id, id, parent_id
from test1
start with parent_id is null
connect by prior id = parent_id)
where root_id <> id
The central problem is that you were specifying a specific value to start from, rather that specifying a way to identify the root rows. Changing id = 1 to parent_id is null allows the entire contents of the table to be returned.
I also added the outer query to filter the root rows out of the result set, which wasn't mentioned in your question, but was shown in your desired result.
SQL Fiddle Example
Comment Response:
In the version provided, you do get descendants of id = 3, but not in such a way that 3 is the root. This is because we're starting at the absolute root. Resolving this is easy, just omit the start with clause:
SELECT *
FROM
(SELECT connect_by_root(id) root_id,
id,
parent_id
FROM test1
CONNECT BY
PRIOR id = parent_id)
WHERE root_id <> id
SQL Fiddle Example
try this:
select connect_by_root(id) root_id, id, parent_id
from your_table
start with parent_id is null
connect by prior id = parent_id
It will give you the exact result you want:
select connect_by_root(id) as root, id, parent_id
from test1
connect by prior id=parent_id
start with parent_id is not null;

Trees: Calculate the cost of 2 paths and determine which is more expensive

This is my first question on this forum so I will try keep it clear.
I have 1 table entity with the following data:
ATTR1 ATTR2 ATTR3 ATTR4
A Level 1 null 35
B Level 2 A 34
C Level 2 A 33
D Level 3 B 32
E Level 3 B 31
F Level 3 C 30
G Level 3 C 29
H Level 4 D 28
I Level 4 D 27
J Level 4 E 26
K Level 4 E 25
L Level 4 F 24
M Level 4 F 23
N Level 4 G 22
O Level 4 G 21
P Level 5 H 20
Q Level 5 H 19
R Level 5 H 18
S Level 5 O 17
Where ATTR1 is the name of the node. It is also the primary key.
Where ATTR2 is the level of the node.
Where ATTR3 is the name of the node's parent node. A is the root and it has no parent nodes, therefore NULL.
Where ATTR4 is the cost of the node.
Now the question:
Given any part X and a leaf node Y (i.e. Y is a descendent of X), what is the most expensive path from either root to X or direct descendent of X to Y ?
In other words, let us say the X node is D and the Y node is P. The path from node to root would be D-B-A whereas the path from leaf to node would be P-H-D.
How is one to calculate the total cost of each path AND be able to say which is more expensive?
My approach was to do 2 recursive queries, 1 query for each path to find the SUM of each. The problem was that I was forced to create 2 tables and try to put all their data in 1. I feel I have hit a dead end and it is starting to look kinda long and not feasible.
Any help is appreciated, preferably in PostgreSQL syntax.
Having create the table like this:
create table entity (attr1 text not null primary key,
attr2 text not null,
attr3 text,
attr4 int not null);
... and populated it with the data shown above, are you looking for something like this?:
with recursive cst as (
with req as (
select 'A'::text as top, 'D'::text as bottom
union all
select 'D'::text, 'P'::text
)
select
top,
bottom,
top as last,
top as path,
attr4 as cost
from req
join entity on (top = attr1)
union
select
top,
bottom,
attr1,
path || '-' || attr1,
cost + attr4
from cst
join entity on (attr3 = last)
), res as (
select * from cst where bottom = last
)
select path from res
where cost = (select max(cost) from res);
Granted, the req CTE as a way to specify the request is a bit of hack, but I'm sure you can pretty up that part to be as you want it. Also, this always shows the path from the "upper" to "lower" rather than "outside" to "inside", but I'm not sure whether that was important to you. Anyway, this should be close enough to munge into what you want, I think.
First of all, save the level of your tree as integer not as (redundant and inappropriate) text.
The table would look like this:
CREATE TABLE entity (
name text NOT NULL PRIMARY KEY
,level int NOT NULL
,parent text
,cost int NOT NULL);
Query:
WITH RECURSIVE val(root, leaf) AS (
VALUES -- provide values here
('A'::text, 'D'::text)
,('D', 'P')
), x AS (
SELECT v.root AS name
,v.root AS path
,r.cost AS total
,1 AS path_len
,l.level - r.level AS len -- as break condition
FROM val v
JOIN entity r ON r.name = root
JOIN entity l ON l.name = leaf
UNION ALL
SELECT e.name -- AS parent
,x.path || '-' || e.name -- AS path
,x.total + e.cost -- AS total
,x.path_len + 1 -- AS path_len
,x.len -- AS len
FROM x
JOIN entity e ON e.parent = x.name
WHERE x.path_len <= x.len
)
SELECT x.path, x.total
FROM x
JOIN val v ON x.name = v.leaf AND x.path_len > 1
ORDER BY x.total DESC
LIMIT 1;
Result:
path | total
------+-------
A-B-D | 101
Demo at sqlfiddle.
Major points
VALUES is faster / simpler / more intuitive for providing values.
Use UNION ALL instead of UNION, else the recursive union has to check for (non-existing in this case) duplicates every iteration.
Don't include the columns root and leaf in the recursive CTE, they are dead weight.
There is no need for a nested WITH clause. You can have plain CTEs in a WITH RECURSIVE clause.
Most important for performance: In your model you know the lentgh of the path beforehand. Use it as break condition and don't calculate all paths to the bitter end - which can be very expensive with big trees.
The final SELECT can also be largely simplified, no need for an aggregate function.
Join to your values and pick the right paths. This way you can easily display any or all columns in the result if you want.

How to calculate the sum of values in a tree using SQL

I need to sum points on each level earned by a tree of users. Level 1 is the sum of users' points of the users 1 level below the user. Level 2 is the Level 1 points of the users 2 levels below the user, etc...
The calculation happens once a month on a non production server, no worries about performance.
What would the SQL look like to do it?
If you're confused, don't worry, I am as well!
User table:
ID ParentID Points
1 0 230
2 1 150
3 0 80
4 1 110
5 4 54
6 4 342
Tree:
0
|---\
1 3
| \
2 4---
\ \
5 6
Output should be:
ID Points Level1 Level2
1 230 150+110 150+110+54+342
2 150
3 80
4 110 54+342
5 54
6 342
SQL Server Syntax and functions preferably...
If you were using Oracle DBMS that would be pretty straightforward since Oracle supports tree queries with the CONNECT BY/STARTS WITH syntax. For SQL Server I think you might find Common Table Expressions useful
Trees don't work well with SQL. If you have very (very very) few write accesses, you could change the tree implementation to use nested sets, that would make this query incredibly easy.
Example (if I'm not mistaken):
SELECT SUM(points)
FROM users
where left > x and right < y
However, any changes on the tree require touching a massive amount of rows. It's probably better to just do the recursion in you client.
I would say: create a stored procedure, probably has the best performance.
Or if you have a maximum number of levels, you could create subqueries, but they will have a very poort performance.
(Or you could get MS SQL Server 2008 and get the new hierarchy functions... ;) )
SQL in general, like others said, does not handle well such relations. Typically, a surrogate 'relations' table is needed (id, parent_id, unique key on (id, parent_id)), where:
every time you add a record in 'table', you:
INSERT INTO relations (id, parent_id) VALUES ([current_id], [current_id]);
INSERT INTO relations (id, parent_id) VALUES ([current_id], [current_parent_id]);
INSERT INTO relations (id, parent_id)
SELECT [current_id], parent_id
FROM relations
WHERE id = [current_parent_id];
have logic to avoid cycles
make sure that updates, deletions on 'relations' are handled with stored procedures
Given that table, you want:
SELECT rel.parent_id, SUM(tbl.points)
FROM table tbl INNER JOIN relations rel ON tbl.id=rel.id
WHERE rel.parent_id <> 0
GROUP BY rel.parent_id;
Ok, this gives you the results you are looking for, but there are no guarantees that I didn't miss something. Consider it a starting point. I used SQL 2005 to do this, SQL 2000 does not support CTE's
WITH Parent (id, GrandParentId, parentId, Points, Level1Points, Level2Points)
AS
(
-- Find root
SELECT id,
0 AS GrandParentId,
ParentId,
Points,
0 AS Level1Points,
0 AS Level2Points
FROM tblPoints ptr
WHERE ptr.ParentId = 0
UNION ALL (
-- Level2 Points
SELECT pa.GrandParentId AS Id,
NULL AS GrandParentId,
NULL AS ParentId,
0 AS Points,
0 AS Level1Points,
pa.Points AS Level2Points
FROM tblPoints pt
JOIN Parent pa ON pa.GrandParentId = pt.Id
UNION ALL
-- Level1 Points
SELECT pt.ParentId AS Id,
NULL AS GrandParentId,
NULL AS ParentId,
0 AS Points,
pt.Points AS Level1Points,
0 AS Level2Points
FROM tblPoints pt
JOIN Parent pa ON pa.Id = pt.ParentId AND pa.ParentId IS NOT NULL
UNION ALL
-- Points
SELECT pt.id,
pa.ParentId AS GrandParentId,
pt.ParentId,
pt.Points,
0 AS Level1Points,
0 AS Level2Points
FROM tblPoints pt
JOIN Parent pa ON pa.Id = pt.ParentId AND pa.ParentId IS NOT NULL )
)
SELECT id,
SUM(Points) AS Points,
SUM(Level1Points) AS Level1Points,
CASE WHEN SUM(Level2Points) > 0 THEN SUM(Level1Points) + SUM(Level2Points) ELSE 0 END AS Level2Points
FROM Parent
GROUP BY id
ORDER by id
If you are working with trees stored in a relational database, I'd suggest looking at "nested set" or "modified preorder tree traversal". The SQL will be as simple as that:
SELECT id,
SUM(value) AS value
FROM table
WHERE left>left\_value\_of\_your\_node
AND right<$right\_value\_of\_your\_node;
... and do this for every node you are interested in.
Maybe this will help you:
http://www.dbazine.com/oracle/or-articles/tropashko4 or use google.
You have a couple of options:
Use a cursor and a recursive user-defined function call (it's quite slow)
Create a cache table, update it on INSERT using a trigger (it's the fastest solution but could be problematic if you have lots of updates to the main table)
Do a client-side recursive calculation (preferable if you don't have too many records)
You can write a simple recursive function to do the job. My MSSQL is a little bit rusty, but it would look like this:
CREATE FUNCTION CALC
(
#node integer,
)
returns
(
#total integer
)
as
begin
select #total = (select node_value from yourtable where node_id = #node);
declare #children table (value integer);
insert into #children
select calc(node_id) from yourtable where parent_id = #node;
#current = #current + select sum(value) from #children;
return
end
The following table:
Id ParentId
1 NULL
11 1
12 1
110 11
111 11
112 11
120 12
121 12
122 12
123 12
124 12
And the following Amount table:
Id Val
110 500
111 50
112 5
120 3000
121 30000
122 300000
Only the leaves (last level) Id's have a value defined.
The SQL query to get the data looks like:
;WITH Data (Id, Val) AS
(
select t.Id, SUM(v.val) as Val from dbo.TestTable t
join dbo.Amount v on t.Id = v.Id
group by t.Id
)
select cd.Id, ISNULL(SUM(cd.Val), 0) as Amount FROM
(
-- level 3
select t.Id, d.val from TestTable t
left join Data d on d.id = t.Id
UNION
-- level 2
select t.parentId as Id, sum(y.Val) from TestTable t
left join Data y on y.id = t.Id
where t.parentId is not null
group by t.parentId
UNION
-- level 1
select t.parentId as Id, sum(y.Val) from TestTable t
join TestTable c on c.parentId = t.Id
left join Data y on y.id = c.Id
where t.parentId is not null
group by t.parentId
) AS cd
group by id
this results in the output:
Id Amount
1 333555
11 555
12 333000
110 500
111 50
112 5
120 3000
121 30000
122 300000
123 0
124 0
I hope this helps.