Removing duplicate subtrees from CONNECT-BY query in oracle

Removing duplicate subtrees from CONNECT-BY query in oracle - sql

I have a heirarchical table in the format
CREATE TABLE tree_hierarchy (
id NUMBER (20)
,parent_id NUMBER (20)
);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (2, 1);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (4, 2);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (9, 4);
When I run the Query:-
SELECT id,parent_id,
CONNECT_BY_ISLEAF leaf,
LEVEL,
SYS_CONNECT_BY_PATH(id, '/') Path,
SYS_CONNECT_BY_PATH(parent_id, '/') Parent_Path
FROM tree_hierarchy
WHERE CONNECT_BY_ISLEAF<>0
CONNECT BY PRIOR id = PARENT_id
ORDER SIBLINGS BY ID;
Result I am Getting is like this:-
"ID" "PARENT_ID" "LEAF" "LEVEL" "PATH" "PARENT_PATH"
9 4 1 3 "/2/4/9" "/1/2/4"
9 4 1 2 "/4/9" "/2/4"
9 4 1 1 "/9" "/4"
But I need an Oracle Sql Query That gets me only this
"ID" "PARENT_ID" "LEAF" "LEVEL" "PATH" "PARENT_PATH"
9 4 1 3 "/2/4/9" "/1/2/4"
This is a simpler example I have more that 1000 records in such fashion.When I run the above query,It is generating many duplicates.Can any one give me a generic query that will give complete path from leaf to root with out duplicates.Thanks for the help in advance

The root node in finite hierarchy must be always known.
According to the definition: http://en.wikipedia.org/wiki/Tree_structure
the root node is a node that has no parents.
To check if a given node is a root node, take "parent_id" and check in the table if exists a record with this id.
The query might look like this:
SELECT id,parent_id,
CONNECT_BY_ISLEAF leaf,
LEVEL,
SYS_CONNECT_BY_PATH(id, '/') Path,
SYS_CONNECT_BY_PATH(parent_id, '/') Parent_Path
FROM tree_hierarchy th
WHERE CONNECT_BY_ISLEAF<>0
CONNECT BY PRIOR id = PARENT_id
START WITH not exists (
select 1 from tree_hierarchy th1
where th1.id = th.parent_id
)
ORDER SIBLINGS BY ID;

You should point the id patently to build the path for. Now your query is building the path for all leaves which satisfy your condition. You need to use "start with" Let's try it like this:
SELECT id,parent_id,
CONNECT_BY_ISLEAF leaf,
LEVEL,
SYS_CONNECT_BY_PATH(id, '/') Path,
SYS_CONNECT_BY_PATH(parent_id, '/') Parent_Path
FROM tree_hierarchy
WHERE CONNECT_BY_ISLEAF<>0
CONNECT BY PRIOR id = PARENT_id
START WITH id = 2
ORDER SIBLINGS BY ID;

Related

Hierarchical SQL Queries: Best SQL query to obtain the whole branch of a tree from a [nodeid, parentid] pairs table given the end node id

Is there any way to send a recursive query in SQL?
Given the end node id, I need all the rows up to the root node (which has parentid = NULL) ordered by level. E.g. if I have something like:
nodeid | parentid
a | NULL
b | a
c | b
after querying for end_node_id = c, I'd get something like:
nodeid | parentid | depth
a | NULL | 0
b | a | 1
c | b | 2
(Instead of the depth I can also work with the distance to the given end node)
The only (and obvious) way I could come up with is doing a single query per row until I reach the parent node.
Is there a more efficient way of doing it?

If you are using mssql 2005+ you can do this:
Test data:
DECLARE #tbl TABLE(nodeId VARCHAR(10),parentid VARCHAR(10))
INSERT INTO #tbl
VALUES ('a',null),('b','a'),('c','b')
Query
;WITH CTE
AS
(
SELECT
tbl.nodeId,
tbl.parentid,
0 AS Depth
FROM
#tbl as tbl
WHERE
tbl.parentid IS NULL
UNION ALL
SELECT
tbl.nodeId,
tbl.parentid,
CTE.Depth+1 AS Depth
FROM
#tbl AS tbl
JOIN CTE
ON tbl.parentid=CTE.nodeId
)
SELECT
*
FROM
CTE

Ended up with the following solutions (where level is the distance to the end node)
Oracle, using hierarchical queries (thanks to the info provided by #Mureinik):
SELECT IDCATEGORY, IDPARENTCATEGORY, LEVEL
FROM TNODES
START WITH IDCATEGORY=122
CONNECT BY IDCATEGORY = PRIOR IDPARENTCATEGORY;
Example using a view so it boils down to a single standard SQL query (requires >= 10g):
CREATE OR REPLACE VIEW VNODES AS
SELECT CONNECT_BY_ROOT IDCATEGORY "IDBRANCH", IDCATEGORY, IDPARENTCATEGORY, LEVEL AS LVL
FROM TNODES
CONNECT BY IDCATEGORY = PRIOR IDPARENTCATEGORY;
SELECT * FROM VNODES WHERE IDBRANCH = 122 ORDER BY LVL ASC;
http://sqlfiddle.com/#!4/18ba80/3
Postgres >= 8.4, using a WITH RECURSIVE Common Table Expression query:
WITH RECURSIVE BRANCH(IDPARENTCATEGORY, IDCATEGORY, LEVEL) AS (
SELECT IDPARENTCATEGORY, IDCATEGORY, 1 AS LEVEL FROM TNODES WHERE IDCATEGORY = 122
UNION ALL
SELECT p.IDPARENTCATEGORY, p.IDCATEGORY, LEVEL+1
FROM BRANCH pr, TNODES p
WHERE p.IDCATEGORY = pr.IDPARENTCATEGORY
)
SELECT IDCATEGORY,IDPARENTCATEGORY, LEVEL
FROM BRANCH
ORDER BY LEVEL ASC
Example using a view so it boils down to a single standard SQL query:
CREATE OR REPLACE VIEW VNODES AS
WITH RECURSIVE BRANCH(IDBRANCH,IDPARENTCATEGORY,IDCATEGORY,LVL) AS (
SELECT IDCATEGORY AS IDBRANCH, IDPARENTCATEGORY, IDCATEGORY, 1 AS LVL FROM TNODES
UNION ALL
SELECT pr.IDBRANCH, p.IDPARENTCATEGORY, p.IDCATEGORY, LVL+1
FROM BRANCH pr, TNODES p
WHERE p.IDCATEGORY = pr.IDPARENTCATEGORY
)
SELECT IDBRANCH, IDCATEGORY, IDPARENTCATEGORY, LVL
FROM BRANCH;
SELECT * FROM VNODES WHERE IDBRANCH = 122 ORDER BY LVL ASC;
http://sqlfiddle.com/#!11/42870/2

For Oracle, as requested in the comments, you can use the connect by operator to produce the hierarchy, and the level pseudocolumn to get the depth:
SELECT nodeid, parentid, LEVEL
FROM t
START WITH parentid IS NULL
CONNECT BY parentid = PRIOR nodeid;

Oracle CONNECT BY recursive child to parent query, include ultimate parent that self references

In the following example
id parent_id
A A
B A
C B
select id, parent_id
from table
start with id = 'A'
connect by nocycle parent_id = prior id
I get
A A
B A
C B
In my database I have millions of rows in the table and deep and wide hierarchies and I'm not interested in all children. I can derive the children I'm interested in. So I want to turn the query on its head and supply START WITH with the children ids. I then want to output the parent recursively until I get to the top. In my case the top is where the id and parent_id are equal. This is what I'm trying, but I can't get it to show the top level parent.
select id, parent_id
from table
START WITH id = 'C'
CONNECT BY nocycle id = PRIOR parent_id
This gives me
C B
B A
It's not outputting the A A. Is it possible to do this? What I'm hoping to do is not show the parent_id as a separate column in the output, but just show the name relating to the id. The hierarchy is then implied by the order.

I got that result by using WITH clause.
WITH REC_TABLE ( ID, PARENT_ID)
AS
(
--Start WITH
SELECT ID, PARENT_ID
FROM table
WHERE ID='C'
UNION ALL
--Recursive Block
SELECT T.ID, T.PARENT_ID
FROM table T
JOIN REC_TABLE R
ON R.PARENT_ID=T.ID
AND R.PARENT_ID!=R.ID --NoCycle rule
)
SELECT *
FROM REC_TABLE;
And it seems to work that way too.
select id, parent_id
from T
START WITH id = 'C'
CONNECT BY id = PRIOR parent_id and parent_id!= prior id;
-- ^^^^^^^^^^^^^^^^^^^^
-- break cycles
Hope it helps.

Flatten the tree path in SQL server Hierarchy ID

I am using SQL Hierarchy data type to model a taxonomy structure in my application.
The taxonomy can have the same name in different levels
During the setup this data needs to be uploaded via an excel sheet.
Before inserting any node I would like to check if the node at a particular path already exists so that I don't duplicate the entries.
What is the easiest way to check if the node # particular absolute path already exists or not?
for e.g Before inserting say "Retail" under "Bank 2" I should be able to check "/Bank 2/Retail" is not existing
Is there any way to provide a flattened representation of the entire tree structure so that I can check for the absolute path and then proceed?

Yes, you can do it using a recursive CTE.
In each iteration of the query you can append a new level of the hierarchy name.
There are lots of examples of this technique on the internet.
For example, with this sample data:
CREATE TABLE Test
(id INT,
parent_id INT null,
NAME VARCHAR(50)
)
INSERT INTO Test VALUES(1, NULL, 'L1')
INSERT INTO Test VALUES(2, 1, 'L1-A')
INSERT INTO Test VALUES(3, 2, 'L1-A-1')
INSERT INTO Test VALUES(4, 2, 'L1-A-2')
INSERT INTO Test VALUES(5, 1, 'L1-B')
INSERT INTO Test VALUES(6, 5, 'L1-B-1')
INSERT INTO Test VALUES(7, 5, 'L1-B-2')
you can write a recursive CTE like this:
WITH H AS
(
-- Anchor: the first level of the hierarchy
SELECT id, parent_id, name, CAST(name AS NVARCHAR(300)) AS path
FROM Test
WHERE parent_id IS NULL
UNION ALL
-- Recursive: join the original table to the anchor, and combine data from both
SELECT T.id, T.parent_id, T.name, CAST(H.path + '\' + T.name AS NVARCHAR(300))
FROM Test T INNER JOIN H ON T.parent_id = H.id
)
-- You can query H as if it was a normal table or View
SELECT * FROM H
WHERE PATH = 'L1\L1-A' -- for example to see if this exists
The result of the query (without the where filter) looks like this:
1 NULL L1 L1
2 1 L1-A L1\L1-A
5 1 L1-B L1\L1-B
6 5 L1-B-1 L1\L1-B\L1-B-1
7 5 L1-B-2 L1\L1-B\L1-B-2
3 2 L1-A-1 L1\L1-A\L1-A-1
4 2 L1-A-2 L1\L1-A\L1-A-2

Recursive query used for transitive closure

I've created a simple example to illustrate transitive closure using recursive queries in PostgreSQL.
However, something is off with my recursive query. I'm not familiar with the syntax yet so this request may be entirely noobish of me, and for that I apologize in advance. If you run the query, you will see that node 1 repeats itself in the path results. Can someone please help me figure out how to tweak the SQL?
/* 1
/ \
2 3
/ \ /
4 5 6
/
7
/ \
8 9
*/
create table account(
acct_id INT,
parent_id INT REFERENCES account(acct_id),
acct_name VARCHAR(100),
PRIMARY KEY(acct_id)
);
insert into account (acct_id, parent_id, acct_name) values (1,1,'account 1');
insert into account (acct_id, parent_id, acct_name) values (2,1,'account 2');
insert into account (acct_id, parent_id, acct_name) values (3,1,'account 3');
insert into account (acct_id, parent_id, acct_name) values (4,2,'account 4');
insert into account (acct_id, parent_id, acct_name) values (5,2,'account 5');
insert into account (acct_id, parent_id, acct_name) values (6,3,'account 6');
insert into account (acct_id, parent_id, acct_name) values (7,4,'account 7');
insert into account (acct_id, parent_id, acct_name) values (8,7,'account 8');
insert into account (acct_id, parent_id, acct_name) values (9,7,'account 9');
WITH RECURSIVE search_graph(acct_id, parent_id, depth, path, cycle) AS (
SELECT g.acct_id, g.parent_id, 1,
ARRAY[g.acct_id],
false
FROM account g
UNION ALL
SELECT g.acct_id, g.parent_id, sg.depth + 1,
path || g.acct_id,
g.acct_id = ANY(path)
FROM account g, search_graph sg
WHERE g.acct_id = sg.parent_id AND NOT cycle
)
SELECT path[1] as Child,parent_id as Parent,path || parent_id as path FROM search_graph
ORDER BY path[1],depth;

You can simplify in several places (assuming acct_id and parent_id are NOT NULL):
WITH RECURSIVE search_graph AS (
SELECT parent_id, ARRAY[acct_id] AS path
FROM account
UNION ALL
SELECT g.parent_id, sg.path || g.acct_id
FROM search_graph sg
JOIN account g ON g.acct_id = sg.parent_id
WHERE g.acct_id <> ALL(sg.path)
)
SELECT path[1] AS child
, path[array_upper(path,1)] AS parent
, path
FROM search_graph
ORDER BY path;
The columns acct_id, depth, cycle are just noise in your query.
The WHERE condition has to exit the recursion one step earlier, before the duplicate entry from the top node is in the result. That was an "off-by-one" in your original.
The rest is formatting.
If you know the only possible circle in your graph is a self-reference, we can have that cheaper:
WITH RECURSIVE search_graph AS (
SELECT parent_id, ARRAY[acct_id] AS path, acct_id <> parent_id AS keep_going
FROM account
UNION ALL
SELECT g.parent_id, sg.path || g.acct_id, g.acct_id <> g.parent_id
FROM search_graph sg
JOIN account g ON g.acct_id = sg.parent_id
WHERE sg.keep_going
)
SELECT path[1] AS child
, path[array_upper(path,1)] AS parent
, path
FROM search_graph
ORDER BY path;
SQL Fiddle.
Note there would be problems (at least up to pg v9.4) for data types with a modifier (like varchar(5)) because array concatenation loses the modifier but the rCTE insists on types matching exactly:
Surprising results for data types with type modifier

You have account 1 set as its own parent. If you set that account's parent to null you can avoid having that account as both the start and end node (the way your logic is setup you'll include a cycle but then won't add on to that cycle, which seems reasonable). It also looks a little nicer to change your final "path" column to something like case when parent_id is not null then path || parent_id else path end to avoid having the null at the end.

sql nested query - group results by parent

I have the need to return tree like results from a single table. At the moment im using connect by and start with to ensure that all the correct results are returned.
select id,parent_id
from myTable
connect by prior id = parent_id start with name = 'manager'
group by id, parent_id
order by parent_id asc
However i want the results to return in tree structure. So each time a parent row is found its children rows will be displayed directly underneath it. Then move onto next parent and do the same
Expected results
- Parent A
- child a
- child b
- Parent B
- child c
- child d
Actual results
- Parent A
- Parent B
- child a
- child b
- child c
- child d
Is it possible to do this in oracle? My table uses a parent_id field to identify when a row has a parent. Every row also has a sort order, which is the order it should be sorted under its parent and a unique Id.
I'm using an Oracle DB

What you want is to use ORDER SIBLINGS BY. The query you have is ordering by the parent_id column which is overriding any hierarchical ordering.
The query below should do what you need it to do:
with my_hierarchy_data as (
select 1 as id, null as parent_id, 'Manager' as name from dual union all
select 2 as id, 1 as parent_id, 'parent 1' as name from dual union all
select 3 as id, 1 as parent_id, 'parent 2' as name from dual union all
select 4 as id, 2 as parent_id, 'child 1' as name from dual union all
select 5 as id, 2 as parent_id, 'child 2' as name from dual union all
select 6 as id, 3 as parent_id, 'child 3' as name from dual union all
select 7 as id, 3 as parent_id, 'child 4' as name from dual
)
select id, parent_id, lpad('- ', level * 2, ' ') || name as name
from my_hierarchy_data
connect by prior id = parent_id
start with name= 'Manager'
order siblings by parent_id asc

There is a special value level that can be used in Oracle hierarchical queries. It returns 1 for rows at the top level of the hierarchy, 2 for their children, and so on. You can use this to create indentation like this:
select lpad('- ',level*2,' ') || name
from myTable
connect by prior id = parent_id start with name = 'manager'
group by id, parent_id

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Removing duplicate subtrees from CONNECT-BY query in oracle - sql

Related

Hierarchical SQL Queries: Best SQL query to obtain the whole branch of a tree from a [nodeid, parentid] pairs table given the end node id

Oracle CONNECT BY recursive child to parent query, include ultimate parent that self references

Flatten the tree path in SQL server Hierarchy ID

Recursive query used for transitive closure

sql nested query - group results by parent

Categories

Resources