How to prune dead branches in a hierarchical data structure - batch-processing

We have a table with an int NodeID column as the primary key, an int ParentID column that references NodeID (a self join), and a varchar(25) Name column.
We need a query that will delete all dead branches for a given set of leaf nodes (or bottom-level nodes). For example, given the tree below, if NodeID 4 is passed in, Node 4 should be deleted, and since Node 2 then no longer has children, it should be deleted too. Node 1 still has a child (Node 6), so the process would stop there.
                [0] ROOT
               /        \
        [1] Node 1      [3] Node 3
        /        \             \
[2] Node 2    [6] Node 6    [5] Node 5
     /
[4] Node 4
I have a solution using a cursor, but I avoid cursors whenever possible and would much prefer a batch means of accomplishing this. Is there a way to do this with a CTE, or by any other means?
The CTE below returns all the parents. I've made a couple of attempts to change the last select into a delete that only removes nodes with a single child or no children. Nothing I've tried along those lines executes.
Any approach will have to have some means of walking up the hierarchy and deleting nodes without children.
Any other ideas or approaches are welcome.
WITH nAncestry (ParentID, NodeID, Name, AncestryID)
AS
(
    SELECT n.ParentID, n.NodeID, n.Name, 0 AS AncestryID
    FROM Node AS n
    WHERE n.NodeID = @nodeID
    UNION ALL
    SELECT n.ParentID, n.NodeID, n.Name, a.AncestryID + 1
    FROM Node AS n
    INNER JOIN nAncestry AS a
        ON a.ParentID = n.NodeID
)
SELECT ParentID, NodeID, Name
FROM nAncestry
WHERE NodeID <> 0
ORDER BY AncestryID DESC
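One way to make the "walk up the hierarchy and delete nodes without children" requirement concrete is a level-by-level loop: each pass deletes, with a single set-based statement, the candidate nodes that have no children, and the parents of the deleted nodes become the next candidates. The sketch below is not from the thread. It uses Python's sqlite3 module rather than SQL Server, and the function name prune_dead_branches and its root_id parameter are assumptions made for illustration.

```python
import sqlite3

def prune_dead_branches(conn, leaf_ids, root_id=0):
    """Delete the given leaves, then any ancestors left without children."""
    candidates = set(leaf_ids)
    deleted = []
    while candidates:
        marks = ",".join("?" * len(candidates))
        # Candidates that currently have no children (never the root).
        rows = conn.execute(
            f"SELECT n.NodeID, n.ParentID FROM Node AS n "
            f"WHERE n.NodeID IN ({marks}) AND n.NodeID <> ? "
            f"AND NOT EXISTS (SELECT 1 FROM Node AS c WHERE c.ParentID = n.NodeID)",
            (*sorted(candidates), root_id),
        ).fetchall()
        if not rows:
            break
        ids = [node_id for node_id, _ in rows]
        conn.execute(
            f"DELETE FROM Node WHERE NodeID IN ({','.join('?' * len(ids))})", ids
        )
        deleted.extend(ids)
        # The parents of what was just deleted are the next candidates.
        candidates = {parent for _, parent in rows if parent is not None}
    conn.commit()
    return deleted

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Node (NodeID INTEGER PRIMARY KEY, ParentID INTEGER, Name VARCHAR(25))"
)
conn.executemany("INSERT INTO Node VALUES (?, ?, ?)", [
    (0, None, "ROOT"), (1, 0, "Node 1"), (3, 0, "Node 3"),
    (2, 1, "Node 2"), (6, 1, "Node 6"), (5, 3, "Node 5"), (4, 2, "Node 4"),
])

deleted = prune_dead_branches(conn, [4])
remaining = [r[0] for r in conn.execute("SELECT NodeID FROM Node ORDER BY NodeID")]
print(deleted)    # [4, 2]
print(remaining)  # [0, 1, 3, 5, 6]
```

The loop runs once per level of the tree rather than once per row, which is usually the property people want when they avoid cursors. In T-SQL the same shape could be a WHILE loop around a DELETE, but that translation is untested here.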

Related

Find latest object in hierarchy where the latest isn't marked as deleted

I have two tables (of interest). The first table is a simple hierarchy. Each row has an ID, and a parent ID. Obviously, the parent ID can be NULL when we reach the top of that particular hierarchy. (We store multiple trees here, so there can be multiple NULL parents, but that's probably not important here.)
The second table has objects that have a non-unique identifier, for example a name, a timestamp, and a reference to the first table to indicate where on the hierarchy it sits.
Let's say the first table has a hierarchy of /A/B/C, and the second has a bunch of objects named "Foo". If I'm trying to get the latest Foo in /A/B, then I don't want to get anything from C. This seems straightforward enough. However, if the latest "Foo" in /A/B is marked in the database with a field saying it is deleted, e.g., status = 'deleted', I want to instead get the latest "Foo" in /A even if there are other "Foo" objects with earlier timestamps in /A/B.
Is this possible to do in a CTE? Or do I have to resort to a stored procedure to get this type of logic? I'm already using some stored procedures just for refactoring purposes, so that's not a barrier, but if I can do this in a simpler manner that I'm missing, that may be better (including for performance).
Since that's probably a bit vague, I put this on SQLFiddle. If I add in the override on line 24 of the schema, I should get that as the output. However, if I also add the deleted object in 26, I need to get back to the "update in /A" as the output.
I would extend your code with one intermediate step:
with recursive _rpath as (
select
0 as level,
id, parentid, name
from path
where id = 5 -- this would be filled in later
union all
select
child.level + 1 as level,
parent.id, parent.parentid, parent.name
from _rpath child
join path parent on child.parentid = parent.id
) , c AS (
select
rp, d, d.status
, ROW_NUMBER() OVER(PARTITION BY d.pathid ORDER BY d.creation DESC) AS rn
from data d
join _rpath rp
on rp.id = d.pathid
), datapaths AS (
SELECT *
FROM c
WHERE rn =1
AND status != 'deleted'
)
select dp.rp, dp.d
from datapaths dp
left join datapaths dpNext
on (dpNext.rp).level < (dp.rp).level or
((dpNext.rp).level = (dp.rp).level
and (dpNext.d).creation > (dp.d).creation)
where (dpNext.d).id is null;
DBFiddle Demo
How it works:
-- calculate node number for each pathid sort by creation descending
-- newest one gets always 1
ROW_NUMBER() OVER(PARTITION BY d.pathid ORDER BY d.creation DESC)
-- get only first for each pathid but omit if it is 'deleted'
WHERE rn =1
AND status != 'deleted'
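The selection rule the query implements (take the newest object at each level of the ancestry, skip a level whose newest object is deleted, and prefer the closest level) can be sketched outside the database. This is a minimal Python illustration under assumed sample data; the dictionaries, labels, and the function name latest_visible are hypothetical and not taken from the fiddle.

```python
from datetime import datetime

# Hypothetical hierarchy: path id -> parent path id (None at the top).
# /A = 1, /A/B = 2, /A/B/C = 3
path_parent = {1: None, 2: 1, 3: 2}

# Hypothetical objects: (pathid, creation, status, label)
data = [
    (1, datetime(2020, 1, 1), "active", "original in /A"),
    (1, datetime(2020, 1, 3), "active", "update in /A"),
    (2, datetime(2020, 1, 2), "active", "override in /A/B"),
    (2, datetime(2020, 1, 4), "deleted", "deleted in /A/B"),
    (3, datetime(2020, 1, 5), "active", "in /A/B/C"),
]

def latest_visible(start_path):
    """Walk from start_path towards the root; return the first level's
    newest object that is not marked deleted, or None if there is none."""
    path = start_path
    while path is not None:
        here = [row for row in data if row[0] == path]
        if here:
            newest = max(here, key=lambda row: row[1])  # rn = 1 for this pathid
            if newest[2] != "deleted":
                return newest
        path = path_parent[path]
    return None

result = latest_visible(2)
print(result[3])  # update in /A
```

Because the newest object in /A/B is deleted, the older "override in /A/B" is ignored and the search moves up to /A, which matches the behaviour asked for in the question.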

Data structure for many to many hierarchies in SQL Server

I have the following data structure already in the system.
ItemDetails:
ID Name
--------
1 XXX
2 YYY
3 ZZZ
4 TTT
5 UUU
6 WWW
And the hierarchies are in separate table (with many to many relationships)
ItemHierarchy:
ParentCode ChildCode
--------------------
1 2
1 3
3 4
4 5
5 3
5 6
As you can see, 3 is a child node of both 1 and 5. I want to traverse the records starting from, for example, node 3.
I need to write a stored procedure that gets all the ancestors of 3 and all the child nodes of 3.
Could you please let me know whether it is possible to pull this data? If so, which data structure is suitable for it?
Please note that my table contains 1 million records, and 40% of them belong to multiple hierarchies.
I wrote a CTE with a level that is incremented along the hierarchy, but I get the max recursion error when traversing from the root to the leaf-level nodes. I have tried HierarchyID, but I am unable to get all the details when a node has multiple parents.
Update: I can set the recursion limit to max and run the query. Since the table has millions of rows, I'm unable to get any output at all.
I want to create a data structure that is capable of giving information from top to bottom or bottom to top (at any node level).
Could someone kindly please help me with that?
Using an RDBMS for hierarchical data structures is not recommended; that is why graph databases were created.
That said, the Closure Table pattern will help you.
The Closure Table solution is a simple and elegant way of storing hierarchies. It involves storing all paths through the tree, not just those with a direct parent-child relationship.
The key to using this pattern is how you populate the ItemHierarchy table.
Store one row in this table for each pair of nodes in the tree that shares an ancestor/descendant relationship, even if they are separated by multiple levels in the tree. Also add a row for each node to reference itself.
Suppose we have a simple graph like the one below:
The dotted arrows show the rows in the ItemHierarchy table:
To retrieve descendants of #3:
SELECT ID.*
FROM ItemDetails AS ID
JOIN ItemHierarchy AS IH ON ID.ID = IH.ChildCode
WHERE IH.ParentCode = 3;
To retrieve ancestors of #3:
SELECT ID.*
FROM ItemDetails AS ID
JOIN ItemHierarchy AS IH ON ID.ID = IH.ParentCode
WHERE IH.ChildCode = 3;
To insert a new leaf node, for instance a new child of #5, first insert the self-referencing row. Then add a copy of the set of rows in ItemHierarchy that reference #5 as a descendant (including the row in which #5 references itself), replacing the descendant with the number of the new item (8 in this example):
INSERT INTO ItemHierarchy (parentCode, childCode)
SELECT IH.parentCode, 8
FROM ItemHierarchy AS IH
WHERE IH.childCode = 5
UNION ALL
SELECT 8, 8;
To delete a complete sub-tree, for instance #4 and its descendants, delete all rows in ItemHierarchy that reference #4 as a descendant, as well as all rows that reference any of #4’s descendants as descendants:
DELETE FROM ItemHierarchy
WHERE childCode IN (SELECT childCode
                    FROM ItemHierarchy
                    WHERE parentCode = 4);
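To see how the closure rows fit together, here is a small sketch that builds them from plain parent/child edges and then runs the descendant, ancestor, and insert-leaf statements described above. It is an illustration only: it uses Python's sqlite3 module instead of SQL Server, the edge list is an assumed acyclic subset of the question's data (the 5 -> 3 edge is left out), and the helper names closure_rows, descendants, and ancestors are hypothetical.

```python
import sqlite3

# Assumed acyclic (parent, child) edges; each node has one parent here.
edges = [(1, 2), (1, 3), (3, 4), (4, 5)]
parent_of = {child: parent for parent, child in edges}
nodes = sorted({n for edge in edges for n in edge})

def closure_rows(nodes, parent_of):
    """One (ancestor, descendant) row per pair, plus a self row per node."""
    rows = []
    for node in nodes:
        ancestor = node
        while ancestor is not None:
            rows.append((ancestor, node))
            ancestor = parent_of.get(ancestor)
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemHierarchy (ParentCode INTEGER, ChildCode INTEGER)")
conn.executemany("INSERT INTO ItemHierarchy VALUES (?, ?)",
                 closure_rows(nodes, parent_of))

def descendants(code):
    # The self row is filtered out here; the queries in the text keep it.
    sql = ("SELECT ChildCode FROM ItemHierarchy "
           "WHERE ParentCode = ? AND ChildCode <> ? ORDER BY ChildCode")
    return [r[0] for r in conn.execute(sql, (code, code))]

def ancestors(code):
    sql = ("SELECT ParentCode FROM ItemHierarchy "
           "WHERE ChildCode = ? AND ParentCode <> ? ORDER BY ParentCode")
    return [r[0] for r in conn.execute(sql, (code, code))]

print(descendants(3))  # [4, 5]
print(ancestors(3))    # [1]

# Insert new leaf 8 under 5: copy 5's ancestor rows, then add the self row.
conn.execute(
    "INSERT INTO ItemHierarchy (ParentCode, ChildCode) "
    "SELECT ParentCode, 8 FROM ItemHierarchy WHERE ChildCode = 5 "
    "UNION ALL SELECT 8, 8"
)
print(descendants(3))  # [4, 5, 8]
print(ancestors(8))    # [1, 3, 4, 5]
```

Reads become single-join lookups with no recursion, at the cost of storing one row per ancestor/descendant pair.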
UPDATE
Since the sample data you have shown us leads to recursive loops (not hierarchies) like:
1 -> 3 -> 4 -> 5 -> 3 -> 4 -> 5
the Path Enumeration pattern will help you.
A UNIX path like /usr/local/lib/ is a path enumeration of the file system,
where usr is the parent of local, which in turn is the parent of lib.
You can create a table or view from the ItemHierarchy table, calling it EnumPath:
Table EnumPath(NodeCode, Path)
For the sample data we will have:
To find ancestors of node #4:
select distinct E1.NodeCode from EnumPath E1
inner join EnumPath E2
On E2.path like E1.path || '%'
where E2.NodeCode = 4 and E1.NodeCode != 4;
To find descendants of node #4:
select distinct E1.NodeCode from EnumPath E1
inner join EnumPath E2
On E1.path like E2.path || '%'
where E2.NodeCode = 4 and E1.NodeCode != 4;
Sqlfiddle demo
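The prefix test behind both queries can be tried without a database. The sketch below mirrors the two LIKE joins with Python's str.startswith. The EnumPath rows are assumed for illustration (the original table contents are not shown above), with every id followed by '/' so that '1/' cannot accidentally match a path starting with '10/'.

```python
# Assumed EnumPath rows: (NodeCode, Path). A node reachable along several
# routes would simply appear in more than one row.
enum_path = [
    (1, "1/"),
    (2, "1/2/"),
    (3, "1/3/"),
    (4, "1/3/4/"),
    (5, "1/3/4/5/"),
]

def ancestors(node):
    """E2.path LIKE E1.path || '%' with E2 fixed to the given node."""
    return sorted({
        e1 for e1, p1 in enum_path
        for e2, p2 in enum_path
        if e2 == node and e1 != node and p2.startswith(p1)
    })

def descendants(node):
    """E1.path LIKE E2.path || '%' with E2 fixed to the given node."""
    return sorted({
        e1 for e1, p1 in enum_path
        for e2, p2 in enum_path
        if e2 == node and e1 != node and p1.startswith(p2)
    })

print(ancestors(4))    # [1, 3]
print(descendants(4))  # [5]
```

The set comprehension plays the role of DISTINCT in the SQL versions.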

Find all nodes in an adjacency list model with oracle connect by

Given the following model:
create table child_parent (
child number(3),
parent number(3)
);
Given the following data:
insert into child_parent values(2,1);
insert into child_parent values(3,1);
insert into child_parent values(4,2);
insert into child_parent values(5,2);
insert into child_parent values(6,3);
results in the following tree:
1
/ \
2 3
/ \ \
4 5 6
Now I can find the parents of 5 like this:
SELECT parent FROM child_parent START WITH child = 5
CONNECT BY NOCYCLE PRIOR parent = child;
But how can I get all the nodes (1,2,3,4,5,6) starting from 5?
Finally I came up with a solution similar to this:
SELECT child FROM child_parent START WITH parent =
(
SELECT DISTINCT parent FROM
(
SELECT parent
FROM child_parent
WHERE CONNECT_BY_ISLEAF = 1
START WITH child = 5
CONNECT BY PRIOR parent = child
UNION
SELECT parent
FROM child_parent
WHERE parent = 5
)
)
CONNECT BY NOCYCLE PRIOR child = parent
UNION
SELECT DISTINCT parent FROM
(
SELECT parent
FROM child_parent
WHERE CONNECT_BY_ISLEAF = 1
START WITH child = 5
CONNECT BY PRIOR parent = child
UNION
SELECT parent
FROM child_parent
WHERE parent = 5
);
It works with all nodes for the provided example.
But if one of the leaves has a second parent and the starting point is above this node or in a different branch, it doesn't work.
But for me it is good enough.
A solution to get all nodes in graph could be:
implement the opposite of the query above (from top to bottom) and then execute them (bottom to top, top to bottom) vice versa until you find no more new nodes.
This would need PL/SQL and I also don't know about the performance.
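The "alternate bottom-to-top and top-to-bottom until no new nodes appear" idea can be sketched in a few lines. This is a plain Python illustration over the child_parent rows from the question, not PL/SQL, and the function name connected_nodes is an assumption.

```python
# (child, parent) rows from the question.
child_parent = [(2, 1), (3, 1), (4, 2), (5, 2), (6, 3)]

def connected_nodes(start):
    """Expand upwards and downwards until the set of nodes stops growing."""
    found = {start}
    while True:
        up = {parent for child, parent in child_parent if child in found}
        down = {child for child, parent in child_parent if parent in found}
        grown = found | up | down
        if grown == found:
            return sorted(found)
        found = grown

print(connected_nodes(5))  # [1, 2, 3, 4, 5, 6]
```

Each pass scans every row, so the cost is roughly the number of rows times the number of passes. That is fine for small graphs and says nothing about how a PL/SQL version would perform.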
Oracle's CONNECT BY syntax is intended for traversing hierarchical data: it is uni-directional, so it is not suitable for representing a graph, which requires bi-directionality. There's no way to go 2 -> 1 -> 3 in one query, which is what you need to do to get all the nodes starting from 5.
A long time ago I answered a question on flattening nodes in a hierarchy (AKA transitive closure), i.e. if 1->2->3 is true, 1->3 is true as well. It links to a paper which demonstrates a PL/SQL solution to generate all the edges and store them in a table. A similar solution could be used in this case. But obviously it is only practical if the nodes in the graph don't change very often, so perhaps it will only be of limited use. Anyway, find out more.
Starting from 5 you cannot traverse the whole tree, because it is a leaf element; even if it can be achieved, it is going to be very tricky.
Try below query, it will traverse whole tree but you will have to start from root instead of leaf:
select * from (
SELECT child FROM child_parent START WITH parent = 1
CONNECT BY NOCYCLE PRIOR child = parent
union
select 1 from dual)
order by child;
You can replace "1" with any other root element; all elements below that root will be printed, including the root itself.

Using connect_by to get the depth of a node in a tree in Oracle DB table

I have an Oracle table which looks like this:
Schema:
(number) node_id
(number) parent_id
(number) parent_seq
Each entry in the table represents a single parent/child relationship. A parent can be the parent for multiple children and a child can have multiple parents (but we can assume that no cycles exist because that is validated before committing). If a child has multiple parents then it will have more than one row corresponding to its node_id in the table, and the parent_seq will increment for each parent.
Now, I don't need to rebuild the entire tree, I just need to know the DEPTH of each node. DEPTH follows the common definition (see "What is the difference between tree depth and height?"):
The depth of a node is the number of edges from the node to the tree's
root node.
Is there a way to do this gracefully in Oracle using the CONNECT_BY syntax?
select
node_id,
min(level) as best_level,
min(sys_connect_by_path(node_id, '/'))
keep (dense_rank first order by level)
as best_path
from t
start with parent_id is null
connect by prior node_id = parent_id
group by node_id
order by 1
fiddle
I believe I found the answer, which was in the documentation. The keyword "LEVEL" is a column which shows the node level when doing a connect_by statement. So you simply need the maximum level for a given node:
select node_id, max(LEVEL) from node_parent_link
CONNECT BY PRIOR node_id = parent_id
group by node_id
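The two answers aggregate differently (min(level) versus max(LEVEL)), and with multiple parents those give different numbers for the same node. The sketch below shows the difference on a tiny assumed DAG. It is plain Python rather than Oracle SQL, the rows and function names are hypothetical, and it counts edges, so the root gets 0 where Oracle's LEVEL pseudocolumn gives 1.

```python
from functools import lru_cache

# Assumed rows (node_id, parent_id); node 1 is the root, node 4 has two parents.
links = [(1, None), (2, 1), (3, 2), (4, 1), (4, 3)]

parents = {}
for node_id, parent_id in links:
    parents.setdefault(node_id, [])
    if parent_id is not None:
        parents[node_id].append(parent_id)

@lru_cache(maxsize=None)
def min_depth(node_id):
    """Edges on the shortest path to a root (compare min(level) - 1)."""
    ps = parents[node_id]
    return 0 if not ps else 1 + min(min_depth(p) for p in ps)

@lru_cache(maxsize=None)
def max_depth(node_id):
    """Edges on the longest path to a root (compare max(level) - 1)."""
    ps = parents[node_id]
    return 0 if not ps else 1 + max(max_depth(p) for p in ps)

depths = {n: (min_depth(n), max_depth(n)) for n in sorted(parents)}
print(depths)  # {1: (0, 0), 2: (1, 1), 3: (2, 2), 4: (1, 3)}
```

Node 4 is one edge from the root through its direct link and three edges through 2 and 3, so which aggregate is "the" depth depends on the definition you need. The recursion relies on the no-cycles guarantee stated in the question.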

Complex TSQL order by clause

Is it possible to craft an ORDER BY clause to ensure the following criteria for two fields (both of type INT), called child and parent respectively for this example.
1. parent references child, but can be null.
2. A parent can have multiple children; a child only one parent.
3. A child cannot be a parent of itself.
4. There must exist at least one child without a parent.
5. Each value of child must appear before it appears in parent in the ordered result set.
I'm having difficulty with point 5.
Sample unordered data:
child parent
------------
1 NULL
3 5
4 2
2 5
5 NULL
Obviously neither ORDER BY child, parent nor ORDER BY parent, child works. In fact, the more I think about it, the less sure I am that it can be done at all. Given the restrictions, obvious cases such as:
child parent
------------
1 2
2 1
aren't allowed because they violate rules 3 and 4 (and obviously 5).
So, is what I am trying to achieve possible, and if so how? Platform is SQL Server 2005.
Update: Desired sort order for the sample data:
child parent
------------
1 NULL
5 NULL
2 5
3 5
4 2
For each row that has a non-null value in the parent column, that value has already appeared in the child column.
You could use a recursive CTE to find the "depth" of each node, and order by that:
create table node (id int, parent int)
insert into node values (1, null)
insert into node values (2, 5)
insert into node values (3, 5)
insert into node values (4, 2)
insert into node values (5, null)
insert into node values (6, 4);
with node_cte (id, depth) as (
select id, 0 from node where parent is null
union all
select node.id, node_cte.depth + 1
from node join node_cte on node.parent = node_cte.id
)
select node.*
from node
join node_cte on node_cte.id=node.id
order by node_cte.depth asc
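The depth-ordering idea can be checked against the question's sample rows and requirement 5. This is a minimal Python sketch of the same logic rather than T-SQL. The depth helper is hypothetical, and the secondary sort on child is an added assumption that makes the output deterministic and happens to reproduce the desired order.

```python
# (child, parent) rows from the question's unordered sample.
rows = [(1, None), (3, 5), (4, 2), (2, 5), (5, None)]
parent_of = dict(rows)

def depth(child):
    """Number of parent links between this row and a parentless row."""
    d = 0
    while parent_of[child] is not None:
        child = parent_of[child]
        d += 1
    return d

ordered = sorted(rows, key=lambda row: (depth(row[0]), row[0]))
print(ordered)  # [(1, None), (5, None), (2, 5), (3, 5), (4, 2)]

# Requirement 5: every parent value has already appeared as a child.
seen = set()
for child, parent in ordered:
    assert parent is None or parent in seen
    seen.add(child)
```

A parent always has a smaller depth than its children, so sorting by depth is enough to satisfy requirement 5 regardless of how ties are broken.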
You won't be able to do it with an 'ORDER BY' clause because requirement 5 specifies that the order is sensitive to the data hierarchy. In SQL Server 2005 hierarchy data is usually dealt with using recursive CTEs; maybe someone here will provide appropriate code for this case.