Keeping binary search tree node depth property updated - binary-search-tree

How can I keep the depth property of a binary search tree's node updated after something is deleted?
I'm thinking that for the case where I delete a node with one child, then I can set the depth of every node under the parent of the node deleted to (original depth - 1).
However, I cannot think of a good way to keep depth updated when I am deleting a node that has two children.
For the case of deleting a node with two children, my delete method moves either the left-most node in the right subtree or the right-most node in the left subtree up to the position of the node that I am deleting, depending on which path is shorter.
I am not looking for code, just a general game plan or pseudocode.

I think the problem seemed more complicated to me than it really was. After drawing a few trees and applying the delete function to a node with two children (on paper), I noticed that the main change in depth is to the node that replaces the deleted node.
I set the depth of node N, which replaced node R, to R's depth. If N had a child of its own before the move, that child's subtree takes N's old position and every node in it moves up one level, exactly as in the one-child case.
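A minimal Python sketch of this game plan, assuming each node stores its own depth. The names Node, insert, shift_up, delete and depths are hypothetical and not from the original post. Instead of physically moving N, the sketch copies N's key into R's node, so the replacement automatically carries R's depth; any subtree that slides into a removed node's place is shifted up one level. It always uses the successor, which is a simplification of the "shorter path" rule described in the question.

```python
class Node:
    def __init__(self, key, depth=0):
        self.key = key
        self.depth = depth
        self.left = None
        self.right = None


def insert(root, key, depth=0):
    """Insert key and record the depth at which the new node lands."""
    if root is None:
        return Node(key, depth)
    if key < root.key:
        root.left = insert(root.left, key, depth + 1)
    else:
        root.right = insert(root.right, key, depth + 1)
    return root


def shift_up(node):
    """Decrease the stored depth of every node in this subtree by one."""
    if node is not None:
        node.depth -= 1
        shift_up(node.left)
        shift_up(node.right)


def delete(root, key):
    """Delete key and return the new subtree root, keeping depths consistent."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is None or root.right is None:
        # Zero or one child: the child's whole subtree moves up one level.
        child = root.left or root.right
        shift_up(child)
        return child
    else:
        # Two children: copy the successor's key into this node, which keeps
        # this node's depth, then remove the successor from the right subtree.
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root


def depths(node, out=None):
    """Collect {key: stored depth} for inspection."""
    out = {} if out is None else out
    if node is not None:
        out[node.key] = node.depth
        depths(node.left, out)
        depths(node.right, out)
    return out
```

For example, after inserting 50, 30, 70, 60, 80, 65 and deleting 50, the key 60 sits at depth 0 and its former child 65 has moved from depth 3 to depth 2.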

The data structure that represents depth aggregation is a histogram, i.e. a dictionary mapping from depth to count. A deletion of a leaf is a single update to the histogram, while a deletion of a non-leaf is an exercise left to the reader.
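As a rough illustration of the histogram idea, here is a small Python sketch. The nested-tuple tree representation (key, left, right) and the function names are assumptions made for the example. Only the leaf case is shown; the non-leaf case is left open, as in the answer above.

```python
from collections import Counter


def depth_histogram(tree, depth=0, hist=None):
    """Build {depth: node count} for a tree given as (key, left, right) tuples."""
    if hist is None:
        hist = Counter()
    if tree is not None:
        _, left, right = tree
        hist[depth] += 1
        depth_histogram(left, depth + 1, hist)
        depth_histogram(right, depth + 1, hist)
    return hist


def remove_leaf(hist, depth):
    """Deleting a leaf at the given depth is a single decrement."""
    hist[depth] -= 1
    if hist[depth] == 0:
        del hist[depth]  # drop empty buckets so the maximum depth stays accurate


tree = (5, (3, (1, None, None), None), (8, None, None))
hist = depth_histogram(tree)   # one node at depth 0, two at depth 1, one at depth 2
remove_leaf(hist, 2)           # delete the leaf with key 1
```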

Related

Deletion operation in Binary Search Tree: successor or predecessor

Deletion is the most complex operation on a binary search tree, since it has to handle several cases:
The deleted node is a leaf node
The deleted node has only one child
The deleted node has both a left and a right child
The first two cases are easy. For the third one, the solution given in the many books and documents I have read is: find the node with the minimum value in the right subtree, replace the deleted node with it, and then delete it from the right subtree.
I can fully understand this solution.
In fact, the node with the minimum value in the right subtree is generally called the successor of the node. So the solution above replaces the deleted node's value with its successor's value and then deletes the successor node from the right subtree.
On the other hand, the predecessor of a node is the node with the maximum value in its left subtree.
I think replacing the deleted node with its predecessor should also work.
Take, for instance, the example used in the book "Data Structures and Algorithm Analysis in C".
If we want to delete node "2", we replace it with "3", which is the successor of "2".
I think replacing "2" with "1", which is the predecessor of "2", can also work. Right? But the books do not mention it at all.
So is there any convention here? And if, after one deletion, there are two results that are both correct, how do we stay consistent?
Edit:
An update based on what I have since learned about this issue. In fact, the book "Data Structures and Algorithm Analysis in C" discusses the issue. In summary, it goes as follows:
First, both methods (based on the successor or on the predecessor) should work.
If we perform O(n^2) insert/delete pairs on the tree and every deletion is based on the successor, the tree becomes unbalanced, because the algorithm makes the left subtrees deeper than the right ones. The idea can be illustrated with the following two images:
It then introduces the concept of balanced search trees, such as the AVL tree.
In theory, your argument seems correct to me: one can take either the predecessor or the successor.
In practice, I would think that the best decision is to keep the tree balanced, switching between the two options depending on which gives the lower depth.
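One way to sketch that switching rule in Python, assuming distinct keys: take the replacement from whichever subtree is taller. The names Node, insert, height, delete and inorder are made up for the example, and recomputing the height on every deletion costs time linear in the subtree size, so a real implementation would more likely cache heights or use a self-balancing tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def height(node):
    """Number of nodes on the longest downward path; 0 for an empty tree."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def delete(root, key):
    """Delete key, taking the replacement from the taller subtree."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is None:
        return root.right
    elif root.right is None:
        return root.left
    elif height(root.left) > height(root.right):
        # Left side is taller: use the predecessor (maximum of the left subtree).
        pred = root.left
        while pred.right is not None:
            pred = pred.right
        root.key = pred.key
        root.left = delete(root.left, pred.key)
    else:
        # Otherwise use the successor (minimum of the right subtree).
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)
```

Either choice leaves the in-order sequence sorted, which is why both results count as correct; the rule only affects the shape of the tree.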

Hierarchical Database Structure SQL Server

I have a hierarchical structure.
Please find the structure below.
1. Parent 1
1.1 Child 1
1.2 Child 2
1.3 Child 3
1.3.1 Child 4
**1.3.2 Parent 2**
Now look at the tree above: here a child can also have a sub-child that is itself a PARENT.
So how can I achieve this? Keep in mind that I want the whole tree without a for-each loop.
Thanks in advance.
Generally, two approaches may fit your needs.
Version #1: The most obvious (but slow) approach is to simply create a table holding each node and a reference (foreign key) to its parent. A parent of NULL indicates a root node.
The disadvantage of this approach is that you need either a loop (which you want to avoid) or an RDBMS that can define and execute recursive queries (usually with a CTE).
Version #2: The second approach would be the choice in the real world. Whereas the first solution can store trees of unlimited depth, such scenarios usually don't occur in hierarchical trees.
Again you create a table with one row per node, but instead of a reference to the parent you store the absolute path to that node within the tree, e.g. in a VarChar column, just like the absolute path of a file in a filesystem. Here, the 'directory name' corresponds to e.g. the ID of the node.
Version #1 has the advantage of being very compact, but it takes quite an effort to prune the tree or retrieve a list of all nodes with their absolute paths (RDBMSs are not very good at recursive structures). On the other hand, a lot of UI components expect exactly this structure to display the tree on screen. Questions like 'Which nodes are indirect children of node X?' are both slow and quite difficult to answer.
Version #2 has the advantage of making tree manipulation (deletion, pruning, moving nodes and subtrees) very easy to implement. Also, the list you require is a simple SELECT. The question 'Show all direct or indirect children of node X' is answered with a simple SELECT as well.
The caveat is the increased size due to the redundant storage of paths, and the limited depth of the tree that can be stored.
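To make the two versions concrete, here is a small sketch using Python's built-in sqlite3 module, so it can be run without a server. The table name node, its columns and the '/1/4/' path format are assumptions chosen for the example. The SQL is SQLite dialect; on SQL Server the recursive CTE is written without the RECURSIVE keyword and strings are concatenated with + rather than ||.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES node(id),  -- Version #1: NULL marks a root
    path      TEXT NOT NULL                 -- Version #2: absolute path of IDs
);
INSERT INTO node VALUES
    (1, 'Parent 1', NULL, '/1/'),
    (2, 'Child 1',  1,    '/1/2/'),
    (3, 'Child 2',  1,    '/1/3/'),
    (4, 'Child 3',  1,    '/1/4/'),
    (5, 'Child 4',  4,    '/1/4/5/'),
    (6, 'Parent 2', 4,    '/1/4/6/');
""")


def descendants_recursive(node_id):
    """Version #1: walk parent_id references with a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM node WHERE parent_id = ?
            UNION ALL
            SELECT n.id FROM node n JOIN sub s ON n.parent_id = s.id
        )
        SELECT id FROM sub ORDER BY id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]


def descendants_by_path(node_id):
    """Version #2: a plain SELECT with a prefix match on the stored path."""
    rows = conn.execute("""
        SELECT c.id
        FROM node c JOIN node p ON p.id = ?
        WHERE c.path LIKE p.path || '%' AND c.id <> p.id
        ORDER BY c.id
    """, (node_id,)).fetchall()
    return [r[0] for r in rows]
```

Both functions return the same IDs for the sample data; the difference is that the second one needs no recursion, at the cost of storing the path redundantly in every row.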

How can I prove that a complete binary tree has \lceil n/2 \rceil leaves?

Given a complete binary tree with n nodes, I'm trying to prove that it has exactly \lceil n/2 \rceil leaves.
I think I can do this by induction.
For h(t)=0, the tree is empty. So there are no leaves and the claim holds for an empty tree.
For h(t)=1, the tree has 1 node, which is also a leaf, so the claim holds.
Here I'm stuck: I don't know what to choose as the induction hypothesis or how to do the induction step.
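If the root node is not a leaf, it has two subtrees, which you handle recursively. By the induction hypothesis each subtree has one more leaf than non-leaf nodes; the root adds one non-leaf, so the whole tree again has one more leaf than non-leaf nodes. To put it another way, leaf nodes make up half of the number of nodes, rounded up. (This covers the case where every non-leaf has two children. A complete tree with an even number of nodes has exactly one node with a single child; there the leaves and non-leaves are equal in number, which is still \lceil n/2 \rceil.)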
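A different way to see the same count, offered as a sanity check rather than as the induction asked for: number the nodes 1..n in level order, as in a binary heap. Node i then has children 2i and 2i+1, so it is a leaf exactly when 2i > n. That leaves \lfloor n/2 \rfloor non-leaves and n - \lfloor n/2 \rfloor = \lceil n/2 \rceil leaves. The function name count_leaves below is made up for the example.

```python
def count_leaves(n):
    """Count leaves of the complete binary tree whose nodes are numbered 1..n
    in level order: node i is a leaf when its left child 2*i does not exist."""
    return sum(1 for i in range(1, n + 1) if 2 * i > n)


# Compare against ceil(n / 2), written with integer arithmetic.
for n in range(0, 50):
    assert count_leaves(n) == (n + 1) // 2
```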

Changing a parent in a nested-set with SQL

I have a database structure that contains a parent/child hierarchy and am using a nested-set structure to represent it.
Each record has a parentkey, an lvalue and an rvalue.
Inserting new children is easy. We can adjust all subsequent lvalues and rvalues easily.
But how do I re-adjust those values when I'm modifying the parent of a given node?
I.e., I'm changing the parent a node belongs to.
Currently, I'm just recomputing the whole tree using a breadth-first traversal starting at the root nodes.
Doing this in SQL is time-consuming (about 5 min to process 50k records).
Is there any easier technique for updating those lvalues/rvalues?
I'm using SQL Server, if that makes any difference.

Improving scalability of the modified preorder tree traversal algorithm

I've been thinking about the modified preorder tree traversal algorithm for storing trees within a flat table (such as SQL).
One property I dislike about the standard approach is that to insert a node you have to touch (on average) N/2 of the nodes (everything with left or right higher than the insert point).
The implementations I've seen rely on sequentially numbered values. This leaves no room for updates.
This seems bad for concurrency and scaling. Imagine you have a tree rooted at the world containing user groups for every account in a large system; it's extremely large, to the point that you must store subsets of the tree on different servers. Touching half of all the nodes to add a node to the bottom of the tree is bad.
Here is the idea I was considering. Basically leave room for inserts by partitioning the keyspace and dividing at each level.
Here's an example with Nmax = 64 (this would normally be the MAX_INT of your DB)
                  0:64
          ________|________
         /                 \
       1:31               32:63
      /    \             /     \
   2:14   15:30       33:47   48:62
Here, a node is added to the left half of the tree.
                  0:64
          ________|________
         /                 \
       1:31               32:63
     /  |   \            /     \
 2:10 11:20 21:30     33:47   48:62
The algorithm must be extended so that the insert and removal processes recursively renumber the left/right indexes for the subtree. Since querying for immediate children of a node is complicated, I think it makes sense to also store the parent id in the table. The algorithm can then select the subtree (using left > p.left && right < p.right), then use node.id and node.parent to work through the list, subdividing the indexes.
This is more complex than just incrementing all the indexes to make room for the insert (or decrementing for removal), but it has the potential to affect far fewer nodes (only descendants of the parent of the inserted/removed node).
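The subdividing step could look roughly like the following Python sketch. The function name subdivide and the even-split policy (with any remainder given to the right-most children) are assumptions; the numbers in the diagrams above need not follow exactly this policy. Renumbering a subtree would apply the function recursively to each child's new range.

```python
def subdivide(left, right, k):
    """Split the keys strictly between left and right into k contiguous,
    non-overlapping (left, right) ranges, one per child."""
    first, last = left + 1, right - 1
    total = last - first + 1
    if k <= 0 or total < 2 * k:
        # Each child needs at least two distinct values (left < right).
        raise ValueError("not enough room; a wider renumbering is needed")
    base, extra = divmod(total, k)
    ranges, start = [], first
    for i in range(k):
        # The last `extra` children each get one additional key.
        size = base + (1 if i >= k - extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(subdivide(0, 64, 2))   # [(1, 31), (32, 63)]
print(subdivide(1, 31, 3))   # [(2, 10), (11, 20), (21, 30)]
```

When a range runs out of room the function raises instead of guessing, which is the point where the renumbering would have to move one level up the tree.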
My question(s) are basically:
Has this idea been formalized or implemented?
Is this the same as nested intervals?
I have heard of people doing this before, for the same reasons, yes.
Note that you do lose a couple of small advantages of the algorithm by doing this:
Normally, you can tell the number of descendants of a node by ((right - left - 1) div 2). This can occasionally be useful, e.g. if you're displaying a count in a treeview which should include the number of children to be found further down in the tree.
Following from the above, it's easy to select all leaf nodes: WHERE (right = left + 1).
These are fairly minor advantages and may not be useful to you anyway, though for some usage patterns they're obviously handy.
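For reference, a short Python sketch of the two properties under the standard sequential numbering. The dict-of-children input and the function names are assumptions made for the example. The descendant count is taken here as (right - left - 1) // 2, which excludes the node itself.

```python
def number_tree(children, root):
    """Assign sequential (left, right) values by a preorder walk.
    `children` maps a node to the list of its child nodes."""
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        left = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        bounds[node] = (left, counter)
        counter += 1

    visit(root)
    return bounds


def descendant_count(bounds, node):
    """Every descendant uses two numbers strictly between left and right."""
    left, right = bounds[node]
    return (right - left - 1) // 2


def leaves(bounds):
    """A leaf has nothing between its left and right values."""
    return sorted(n for n, (l, r) in bounds.items() if r == l + 1)


bounds = number_tree({"A": ["B", "C"], "B": ["D", "E"]}, "A")
# bounds["A"] == (1, 10), bounds["B"] == (2, 7), bounds["C"] == (8, 9)
```

With gaps left between the numbers, neither shortcut holds any more, which is the trade-off described above.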
That said, it does sound like materialized paths may be more useful to you, as suggested above.
I think you're better off looking at a different way of storing trees. If your tree is broad but not terribly deep (which seems likely for the case you suggested), you can store the complete list of ancestors up to the root against each node. That way, modifying a node doesn't require touching any nodes other than the node being modified.
You can split your table into two: the first is (node ID, node value), the second (node ID, child ID), which stores all the edges of the tree. Insertion and deletion then become O(tree depth) (you have to navigate to the element and fix what is below it).
The solution you propose looks like a B-tree. If you can estimate the total number of nodes in your tree, then you can choose the depth of the tree beforehand.