Deletion is the most complex operation on a binary search tree, since it has to consider several cases:
The deleted node is a leaf node
The deleted node has only one child
The deleted node has both a left and a right child
The first two cases are easy. For the third one, the solution given in the many books and documents I have read is: find the minimum value in the right subtree, replace the deleted node's value with it, and then delete that minimum node from the right subtree.
I can fully understand this solution.
In fact, the node with the minimum value in the right subtree is generally called the successor of the node. So the above solution replaces the deleted node's value with its successor's value and then deletes the successor node from the right subtree.
On the other hand, the predecessor of a node is the node with the maximum value in its left subtree.
I think replacing the deleted node with its predecessor should also work.
For instance, take the example used in the book "Data Structures and Algorithm Analysis in C".
Suppose we want to delete node "2". We replace it with "3", which is the successor of "2".
I think replacing "2" with "1", which is the predecessor of "2", would also work. Right? But the books don't mention it at all.
So is there any convention here? And if a single deletion has two results that are both correct, how do we stay consistent?
Edit:
An update based on what I have since learned about this issue. In fact, the book "Data Structures and Algorithm Analysis in C" does discuss it. In summary, it goes as follows:
First, both methods (based on the successor or on the predecessor) work.
If we perform O(n^2) insert/delete pairs on the tree and every deletion is based on the successor, the tree becomes unbalanced, because the algorithm makes the left subtrees deeper than the right ones. The idea can be illustrated with the following two images:
The book then introduces the concept of balanced search trees, such as the AVL tree.
In theory, your argument seems correct to me: one can take either the predecessor or the successor.
In practice, I would think the best decision is to keep the tree balanced, switching between the two options depending on which one gives the lower depth.
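The two variants can be sketched in Python as follows. This is a minimal sketch on a plain, unbalanced BST. The names Node, insert, delete, inorder, build and the use_successor flag are assumptions made for illustration and do not come from the book.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain (unbalanced) BST insert; returns the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def delete(root, key, use_successor=True):
    """Delete key; a two-child node takes its successor's or predecessor's key."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key, use_successor)
    elif key > root.key:
        root.right = delete(root.right, key, use_successor)
    elif root.left is None:        # leaf, or right child only
        return root.right
    elif root.right is None:       # left child only
        return root.left
    elif use_successor:
        succ = root.right
        while succ.left:           # minimum of the right subtree
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key, use_successor)
    else:
        pred = root.left
        while pred.right:          # maximum of the left subtree
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key, use_successor)
    return root


def inorder(root):
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root
```

Both calls, delete(root, 2, use_successor=True) and delete(root, 2, use_successor=False), leave a valid search tree with the same in-order sequence; only the shape differs.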
Related
So I got this task to suggest an AVL tree capable of performing the following two operations in O(1):
Successor takes 𝑂(1) - return the successor.
Predecessor takes 𝑂(1) - return the predecessor.
I am kind of stuck after trying to think of a solution for this.
Any ideas?
I think this is less about modifying the tree or implementing complicated predecessor/successor-finding logic that physically traverses the tree in constant time, and more about simply adding succ and pred links to every node in the tree, using them in your predecessor(n) and successor(n) functions, and maintaining these links across insert and delete operations.
The one thing to be careful about with this approach is that maintaining the succ/pred links must not cost more than the operations that trigger the link updates. For example, because inserting a new node into a balanced tree is an O(lg n) operation, updating the successor/predecessor links after an insert must not exceed O(lg n) (otherwise the O(1) lookups would effectively be paid for by slower inserts).
The steps you have to take on each insert and delete operation to maintain pred and succ links are actually pretty simple, and thankfully the complexities introduced to the AVL tree by the re-balance operation don't really affect them because while re-balance changes the structure of the tree, it doesn't affect the tree's ordering and therefore doesn't affect predecessor/successor relations.
Finding the successor and predecessor of a given node in a balanced BST (which an AVL tree is guaranteed to be) is O(lg n), and the logic does not differ from what you would do to find them in a regular BST. Therefore, after inserting a new node n into your AVL tree, simply find n's successor and predecessor nodes and update the appropriate links:
insert(n) {
    ...
    # normal AVL insert and rebalance logic
    ...
    predecessor, successor = findPredAndSucc(n)
    if (predecessor) predecessor.succ = n
    if (successor) successor.pred = n
    n.pred = predecessor
    n.succ = successor
}
When deleting n, you'll also have to find n's predecessor and successor and update links as in:
delete(n) {
    # save these values first so they aren't lost when n is deleted
    predecessor, successor = findPredAndSucc(n)
    ...
    # normal AVL delete and rebalance logic
    ...
    if (predecessor) predecessor.succ = successor
    if (successor) successor.pred = predecessor
}
I'm slightly conflating the node n and the value of n (if inserting/deleting by value you would have to traverse the tree to find n first), but you get the idea.
Following this approach, after every insert and delete each node has direct links to its predecessor and successor. Because an AVL tree is self-balancing, finding the nodes whose links need updating adds no extra complexity to insert and delete: locating the predecessor and the successor takes at most two O(h) = O(lg n) traversals, the same complexity as insert and delete themselves. Our successor(n) and predecessor(n) functions are therefore O(1) overall, as long as they just follow the succ and pred links we set.
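As a rough Python sketch of the insert side of this idea: the class below is a plain, unbalanced BST (the AVL rotations are left out on purpose, since rebalancing does not change in-order neighbours), and it finds the new node's neighbours during the descent instead of calling a separate findPredAndSucc. The names LinkedBST, Node, successor and predecessor are assumptions for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.pred = self.succ = None   # in-order neighbours


class LinkedBST:
    """BST that maintains pred/succ links on insert (no rebalancing shown)."""

    def __init__(self):
        self.root = None

    def insert(self, key):
        node = Node(key)
        if self.root is None:
            self.root = node
            return node
        cur, pred, succ = self.root, None, None
        while True:
            if key < cur.key:
                succ = cur             # last node where we turned left
                if cur.left is None:
                    cur.left = node
                    break
                cur = cur.left
            elif key > cur.key:
                pred = cur             # last node where we turned right
                if cur.right is None:
                    cur.right = node
                    break
                cur = cur.right
            else:
                return cur             # duplicate key: keep the existing node
        # Splice the new node between its in-order neighbours.
        node.pred, node.succ = pred, succ
        if pred:
            pred.succ = node
        if succ:
            succ.pred = node
        return node


def successor(node):
    return node.succ                   # O(1): just follow the link


def predecessor(node):
    return node.pred                   # O(1): just follow the link
```

For a freshly inserted leaf, the successor is the last ancestor at which the search turned left and the predecessor is the last ancestor at which it turned right, so the links come for free during the O(h) descent.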
I want to be able to find a specific node by its ID for performance reasons (IDs are more efficient than indexes).
In order to execute the following example:
MATCH (s)
WHERE ID(s) = 65110
RETURN s
I will need the ID of the node (65110 in this case).
But how do I get it? Since the ID is auto-generated, it's impossible to find the ID without querying the graph, which kind of defeats the purpose since I will already have the node.
Am I missing something?
TL;DR: use an indexed property for lookups unless you absolutely need to optimise and can measure the difference.
Typically you use an index lookup as an entry point to the graph, that is, to obtain the node that provides the start of an edge traversal. While the pointer-like nature of Neo4j node IDs means they are theoretically faster, index lookups are also very efficient so you should not discount them on performance grounds unless you are sure it will make a measurable difference.
You should also consider that Neo4j node IDs are not stable. If you delete a node it is possible for the same ID to be re-used in future. For this reason they should really be considered an internal implementation detail and not one that should be relied on as part of your application's external interface.
That said, I have an application that stores Neo4j IDs in a Solr index for looking up nodes in bulk, but this index is considered volatile and the nodes also contain an indexed, application-generated UUID property (with a unique constraint) that serves as their main "primary key".
Further reading and discussion: https://github.com/neo4j/neo4j/issues/258
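A small Python sketch of the application-generated key approach, without any driver calls: it only builds the Cypher text and the parameter map that would be handed to whatever client you use. The label Item, the property names uuid and name, and both helper functions are hypothetical, and the $param placeholder syntax is assumed to be supported by your Neo4j version.

```python
import uuid


def create_node_query(name):
    """Return (query, params) creating a node with an application-generated key."""
    params = {"uuid": str(uuid.uuid4()), "name": name}
    query = "CREATE (s:Item {uuid: $uuid, name: $name}) RETURN s"
    return query, params


def lookup_query(node_uuid):
    """Return (query, params) finding a node by its indexed uuid property."""
    query = "MATCH (s:Item {uuid: $uuid}) RETURN s"
    return query, {"uuid": node_uuid}


# The application keeps the uuid it generated, so no graph query is needed
# to learn the key before the next lookup.
_, created = create_node_query("example")
query, params = lookup_query(created["uuid"])
```

Because the key is generated by the application, it stays stable even if the internal node ID is later reused.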
How can I keep the depth property of a binary search tree's node updated after something is deleted?
I'm thinking that for the case where I delete a node with one child, then I can set the depth of every node under the parent of the node deleted to (original depth - 1).
However, I can not think of a good way to keep depth updated when I am deleting a node that had two children.
For the case of deleting a node with two children, my delete method either moves the left-most node in the right subtree, or the right-most node in the left subtree, up to the node that I am deleting, depending on which path is shorter.
I am not looking for code, just a general game plan or pseudo code
I think the problem seemed more complicated to me than it really was. After drawing a few trees, and applying the delete function on a node with two children (on paper), I noticed that only one node really changes in depth -- the node that replaces the deleted node.
I set the depth of node N, which replaced node R, to R's depth.
The data structure that represents depth aggregation is a histogram, i.e. a dictionary mapping from depth to count. A deletion of a leaf is a single update to the histogram, while a deletion of a non-leaf is an exercise left to the reader.
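A minimal Python sketch of such a histogram, covering only the leaf case described above. The class name DepthHistogram and its methods are assumptions for illustration.

```python
from collections import Counter


class DepthHistogram:
    """Maps depth -> number of nodes currently at that depth."""

    def __init__(self):
        self.counts = Counter()

    def add(self, depth):
        self.counts[depth] += 1

    def remove(self, depth):
        # Deleting a leaf is a single update: one node fewer at its depth.
        self.counts[depth] -= 1
        if self.counts[depth] == 0:
            del self.counts[depth]

    def max_depth(self):
        # Returns -1 for an empty tree.
        return max(self.counts) if self.counts else -1
```

Deleting the only node at the deepest level removes that bucket, so max_depth() drops without walking the tree.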
I have a hierarchical structure.
Please find the structure below.
1. Parent 1
1.1 Child 1
1.2 Child 2
1.3 Child 3
1.3.1 Child 4
**1.3.2 Parent 2**
Looking at the tree above, a child can itself have a sub-child that acts as a parent.
So how can I achieve this? Keep in mind that I want the whole tree without a for-each loop.
Thanks in advance.
Generally, two approaches may fit your needs.
Version #1: The most obvious (but slow) attempt is to simply create a table holding each node and a reference (foreign key) to its parent. A parent of NULL indicates a/the root node.
The disadvantage of this attempt is that you need either a loop (which you want to avoid) or an RDBMS that can define and execute recursive queries (usually with a CTE).
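Version #1 with a recursive CTE might look like the following Python sketch, which uses the standard sqlite3 module only so that it runs as-is. The table name node, its columns, and the subtree helper are assumptions for illustration; the sample rows mirror the tree in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES node(id),"   # NULL marks the root
    " name TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, None, "Parent 1"),
        (2, 1, "Child 1"),
        (3, 1, "Child 2"),
        (4, 1, "Child 3"),
        (5, 4, "Child 4"),
        (6, 4, "Parent 2"),
    ],
)

SUBTREE_SQL = """
WITH RECURSIVE subtree(id, name, depth) AS (
    SELECT id, name, 0 FROM node WHERE id = ?
    UNION ALL
    SELECT n.id, n.name, s.depth + 1
    FROM node AS n JOIN subtree AS s ON n.parent_id = s.id
)
SELECT id, name, depth FROM subtree ORDER BY id
"""


def subtree(root_id):
    """Return (id, name, depth) for root_id and all of its descendants."""
    return conn.execute(SUBTREE_SQL, (root_id,)).fetchall()
```

The recursion happens inside the database, so the application code contains no loop over the levels of the tree.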
Version #2: The second attempt would be the choice in the real world. Whereas the first solution is able to store unlimited depth, such scenarios usually don't occur in hierarchical trees.
Again you create a table with one row per node, but instead of having a reference to the parent, you store the absolute path to that node within the tree in e.g. a VarChar column, just like the absolute path of a file in a filesystem. Here, the 'directory name' corresponds to e.g. the ID of the node.
Version #1 has the advantage of being very compact, but it takes quite an effort to prune the tree or retrieve a list of all nodes with their absolute path (RDBMS are not very good at recursive structures). On the other hand, a lot of UI components expect exactly this structure to display the tree on screen. Questions like 'Which nodes are indirect children of node X' are both slow and quite difficult to answer.
Version #2 has the advantage of making it very easy to implement tree manipulation (deletion, pruning, moving nodes and subtrees). Also, the list you require is a simple SELECT. The question 'show all direct or indirect children of node X' is answered with a simple SELECT as well.
The caveat is the increased size due to redundant saving of paths and the limited depth of the possible tree to save.
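A Python sketch of Version #2, again using sqlite3 purely so the example is runnable. The path format ('/1/4/5/' with a trailing slash), the table layout and the helper names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    " id INTEGER PRIMARY KEY,"
    " path TEXT NOT NULL UNIQUE,"   # absolute path built from node ids
    " name TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [
        (1, "/1/", "Parent 1"),
        (2, "/1/2/", "Child 1"),
        (3, "/1/3/", "Child 2"),
        (4, "/1/4/", "Child 3"),
        (5, "/1/4/5/", "Child 4"),
        (6, "/1/4/6/", "Parent 2"),
    ],
)


def whole_tree():
    """One plain SELECT returns every node together with its absolute path."""
    return conn.execute("SELECT path, name FROM node ORDER BY path").fetchall()


def descendants(path):
    """All direct and indirect children of the node stored at `path`."""
    # The trailing slash keeps '/1/4/' from also matching '/1/40/'.
    return conn.execute(
        "SELECT id, name FROM node WHERE path LIKE ? AND path <> ? ORDER BY path",
        (path + "%", path),
    ).fetchall()
```

Note that ORDER BY path sorts as text, so ids with different digit counts would need zero-padding to come out in numeric order.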
I've been thinking about the modified preorder tree traversal algorithm for storing trees within a flat table (such as SQL).
One property I dislike about the standard approach is that to insert a node you have to touch (on average) N/2 of the nodes (everything with left or right higher than the insert point).
The implementations I've seen rely on sequentially numbered values. This leaves no room for updates.
This seems bad for concurrency and scaling. Imagine you have a tree, rooted at the world, containing user groups for every account in a large system. It is extremely large, to the point that you must store subsets of the tree on different servers. Touching half of all the nodes to add a node to the bottom of the tree is bad.
Here is the idea I was considering. Basically leave room for inserts by partitioning the keyspace and dividing at each level.
Here's an example with Nmax = 64 (this would normally be the MAX_INT of your DB)
                     0:64
             ________|________
            /                 \
         1:31                32:63
        /    \              /     \
     2:14   15:30       33:47    48:62
Here, a node is added to the left half of the tree.
                     0:64
             ________|________
            /                 \
         1:31                32:63
       /   |   \            /     \
   2:11  11:20  21:30   33:47    48:62
The algorithm must be extended so that the insert and removal process recursively renumbers the left/right indexes for the subtree. Since querying for the immediate children of a node is complicated, I think it makes sense to also store the parent id in the table. The algorithm can then select the subtree (using left > p.left && right < p.right), then use node.id and node.parent to work through the list, subdividing the indexes.
This is more complex than just incrementing all the indexes to make room for the insert (or decrementing for removal), but it has the potential to affect far fewer nodes (only descendants of the parent of the inserted/removed node).
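The subdividing step could be sketched in Python roughly as below. This is one possible even split, not a reproduction of the exact numbers in the diagrams above; the function name subdivide and its error handling are assumptions for illustration.

```python
def subdivide(left, right, k):
    """Split the keys strictly between left and right into k child intervals.

    Returns k disjoint (child_left, child_right) pairs, each satisfying
    left < child_left < child_right < right. The keys inside each child
    interval stay free for that child's own descendants.
    """
    span = right - left - 1              # keys strictly inside the parent
    if k < 1 or span < 2 * k:
        raise ValueError("not enough room; the parent must be renumbered")
    width = span // k                    # at least 2, so left < right per child
    children = []
    for i in range(k):
        child_left = left + 1 + i * width
        child_right = child_left + width - 1
        children.append((child_left, child_right))
    return children
```

Adding a child to a parent then means calling subdivide again with k + 1 and renumbering only that parent's subtree, rather than shifting every index to the right of the insert point.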
My question(s) are basically:
Has this idea been formalized or implemented?
Is this the same as nested intervals?
I have heard of people doing this before, for the same reasons, yes.
Note that you do lose a couple of small advantages of the algorithm by doing this:
Normally, you can tell the number of descendants of a node by ((right - left - 1) div 2). This can occasionally be useful, e.g. if you're displaying a count in a treeview which should include the number of children to be found further down in the tree.
Flowing from the above, it's easy to select out all leaf nodes -- WHERE (right = left + 1).
These are fairly minor advantages and may not be useful to you anyway, though for some usage patterns they're obviously handy.
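Both properties of the standard, gap-free numbering can be checked with a short Python sketch. The helper names number_tree, descendant_count and leaves are assumptions for illustration, and the count here excludes the node itself.

```python
def number_tree(tree, root):
    """Assign sequential nested-set (left, right) values by depth-first traversal.

    `tree` maps each node to the list of its children.
    """
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        bounds[node] = (left, counter)
        counter += 1

    visit(root)
    return bounds


def descendant_count(bounds, node):
    """Number of descendants, not counting the node itself."""
    left, right = bounds[node]
    return (right - left - 1) // 2


def leaves(bounds):
    """Leaf nodes are exactly those with right == left + 1."""
    return sorted(n for n, (left, right) in bounds.items() if right == left + 1)
```

Once gaps are left in the numbering, neither shortcut holds any more, which is the trade-off described above.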
That said, it does sound like materialized paths may be more useful to you, as suggested above.
I think you're better off looking at a different way of storing trees. If your tree is broad but not terribly deep (which seems likely for the case you suggested), you can store the complete list of ancestors up to the root against each node. That way, modifying a node doesn't require touching any nodes other than the node being modified.
You can split your table into two: the first is (node ID, node value), the second (node ID, child ID), which stores all the edges of the tree. Insertion and deletion then become O(tree depth) (you have to navigate to the element and fix what is below it).
The solution you propose looks like a B-tree. If you can estimate the total number of nodes in your tree, then you can choose the depth of the tree beforehand.