convert non balanced binary search tree to red black tree - binary-search-tree

Is it possible to convert an unbalanced BST (with n nodes and height h) to an RBT in O(n) time complexity and O(h) space complexity?

If you know the number of nodes beforehand, this is doable: the number of nodes tells you the height of the target RB tree (regardless of the original tree's height).
You can then simply 'peel' nodes off the original tree one by one, starting from the minimum, and place them in the correct slot of the new tree. The easiest approach colors every row black except for a possibly empty bottom row of red nodes. (That is, a tree with 7 nodes will be all black, but a tree with 6 nodes will have its first 2 rows black and 3 red nodes in the bottom row.)
This takes O(n) time - you visit each node of the original tree once - and O(h) space for the bookkeeping that tracks where you are in the process.
Note that this only works if you know the number of nodes in the original tree, since it depends on knowing which nodes will land in the bottom row of the produced tree.
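The peeling idea above can be sketched as follows. This is a hedged illustration, not a full red-black implementation: the `Node` class and names are assumptions, the in-order traversal uses an explicit stack (O(h) space), and the rebuild colors the complete levels black and the partial bottom row red.

```python
# Hedged sketch of the "peeling" conversion; Node and the colour scheme
# are illustrative assumptions, not a complete RB-tree implementation.

class Node:
    def __init__(self, key, color="black"):
        self.key, self.color = key, color
        self.left = self.right = None

def inorder_keys(root):
    """Yield keys in sorted order with an explicit stack: O(h) space."""
    stack, node = [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.key
        node = node.right

def to_red_black(root, n):
    """Rebuild as an RB tree: complete levels black, bottom row red."""
    if n == 0:
        return None
    k = (n + 1).bit_length() - 1   # number of complete levels: floor(log2(n+1))
    it = inorder_keys(root)

    def build(count, depth):
        if count == 0:
            return None
        half = count // 2
        left = build(half, depth + 1)                  # consume left half first
        node = Node(next(it), "red" if depth >= k else "black")
        node.left = left
        node.right = build(count - 1 - half, depth + 1)
        return node

    return build(n, 0)
```

With n = 6 this produces exactly the shape described above: two black rows and three red leaves in the bottom row.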

Related

Time complexity of a "modified" heap

Suppose I define a heap* as a heap that need not be left-aligned at the last level, nor full. Given a max heap* (with the analogous property of a max heap, that is, each parent's value is greater than its children's values), Extract_Max() works as follows: pop the root, compare its two children, and move the larger child's value into the root. The larger child's node is now vacant, so I compare the children of that vacant node and move the larger value up, vacating the larger child's node in turn. I keep progressing this way until the vacancy reaches the last layer, at which point the algorithm terminates. If there are $n$ elements in the max heap*, what is the worst-case time complexity of this Extract_Max() algorithm?
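The procedure described above can be sketched like this. `HNode` is an assumed pointer-based node (a heap* has no array shape to exploit), and the sketch only illustrates the sift-down of the vacancy, not a full heap API:

```python
# Illustrative sketch of the Extract_Max described in the question;
# HNode is an assumed pointer-based node, since a heap* has no shape
# constraint that would allow an array layout.

class HNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def extract_max(root):
    """Pop the root and sift the vacancy down along larger children."""
    if root is None:
        return None, None

    def sift(node):
        if node.left is None and node.right is None:
            return None                       # vacancy reached a leaf: drop it
        if node.right is None or (node.left and node.left.val >= node.right.val):
            node.val = node.left.val          # larger child fills the vacancy
            node.left = sift(node.left)
        else:
            node.val = node.right.val
            node.right = sift(node.right)
        return node

    return root.val, sift(root)
```

The sift follows one root-to-leaf path, so the cost is proportional to the tree's height - which, without the shape constraint of a normal heap, is not bounded by log n.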

What is the maximum possible height when a binary search tree has n nodes?

Is there a mathematical formula for the maximum possible height of a tree with exactly n nodes?
It can be anything. If you are not implementing a self-balancing binary search tree (like an AVL tree or Red-Black tree), then the height of the tree depends on the order of the inputs. In the worst case, the height can equal the number of nodes (if each value is greater than the previous one, or each value is less than the previous one). If you need more info, please consider describing the specific use case for which this question was asked.
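A quick demo of that worst case, counting height in nodes as the answer does (the `BSTNode`/`insert` names are illustrative):

```python
# Demo: inserting keys in sorted order degenerates a plain (unbalanced)
# BST into a right-leaning chain whose height equals the node count.

class BSTNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(node, key):
    if node is None:
        return BSTNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    return node

def height(node):
    """Height counted in nodes (empty tree has height 0)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

root = None
for k in range(1, 8):          # sorted input: 1, 2, ..., 7
    root = insert(root, k)
# height == number of nodes: every node hangs off the right
```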

How is AVL tree insertion O(log n) when you need to recalculate balance factors up the tree after every insertion?

I'm implementing an AVL tree, and I'm trying to wrap my head around the time complexity of the adding process. It's my understanding that in order to achieve O(log n) you need to keep either balance or height state in tree nodes so that you don't have to recalculate them every time you need them (which may require a lot of additional tree traversal).
To solve this, I have a protocol that recursively "walks back up" a trail of parent pointers to the root, balancing if needed and setting heights along the way. This way, the addition algorithm kind of has a "capture" and "bubble" phase down and then back up the tree - like DOM events.
My question is: is this still technically O(log n) time? Technically, you only deal with divisions of half at every level in the tree, but you also need to travel down and then back up every time. What is the exact time complexity of this operation?
Assume the height of the tree is H and the structure stays balanced throughout all operations.
Then, as you mentioned, inserting a node will take O(H).
However, every time a node is added to the AVL tree, you need to update the height of the parents all the way up to the root node.
Since the tree is balanced, the height update only walks the path from the newly inserted node (at the tail) back up to the root - effectively a linked-list-like structure.
Updating the heights is therefore equivalent to traversing a linked list of length H.
So the height update takes another O(H), and the total time is 2 * O(H), which is still O(log N) once we drop the constant factor.
Hope this makes sense to you.
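The "down then back up" pattern can be sketched as a recursive insert that refreshes cached heights as the recursion unwinds. This is a hedged sketch: rotations are deliberately elided to keep the O(H) bookkeeping visible (a real AVL insert would rebalance where the comment indicates), and the `steps` list is only there to make the two phases observable.

```python
# Sketch: recursive insert with height bookkeeping on the unwind.
# Rotations are omitted on purpose; this only illustrates the cost.

class AVLNode:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None
        self.height = 1

def h(node):
    return node.height if node else 0

def insert(node, key, steps=None):
    if node is None:
        return AVLNode(key)
    if steps is not None:
        steps.append("down")               # capture phase: one step per level
    if key < node.key:
        node.left = insert(node.left, key, steps)
    else:
        node.right = insert(node.right, key, steps)
    node.height = 1 + max(h(node.left), h(node.right))
    if steps is not None:
        steps.append("up")                 # bubble phase: one update per level
    # the balance factor h(node.left) - h(node.right) would be checked
    # here, with a rotation if it falls outside [-1, 1]
    return node
```

Each insert records one "down" and one "up" per level touched, making the 2 * O(H) total concrete.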
"Technically, you only deal with divisions of half at every level in the tree, but you also need to travel down and then back up every time. What is the exact time complexity of this operation?"
You've stated that you have to travel down and up every time.
So, we can say that your function is upper bounded by a runtime of 2 * logn.
It's clear that this is O(logn).
More specifically, we could choose the constant c = 3 and the starting value n0 = 1, such that
2 * logn <= 3 * logn for all values of n >= 1.
This reduces to 2 <= 3, which is of course true.
The idea behind big-O is to understand the basic shape of the function that upper-bounds your function's runtime as the input size moves towards infinity - thus, we can drop the constant factor of 2.
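The witness above (c = 3, n0 = 1) can be checked numerically; this is just the inequality from the answer, evaluated over a range of n:

```python
import math

# Checking the answer's witness: with c = 3 and n0 = 1,
# 2 * log(n) <= 3 * log(n) holds for every n >= 1.
assert all(2 * math.log(n) <= 3 * math.log(n) for n in range(1, 10_000))
```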

Is a given key a member of a binary tree - probabilistic answer

The Problem:
Given a BST with N nodes, with a domain of cardinality D (domain being the possible values for the node keys).
Given a key that is in the domain but may or may not be a member of the BST.
At the start, our confidence that the key is in the tree should be N/D, but as we go deeper into the tree both D and N are split approximately in half. That would suggest that our confidence that the key is a member of the tree should remain constant until we hit the bottom or find the key. However, I'm not sure that reasoning is complete, since it seems more like we are choosing N nodes from D.
I was thinking something along the lines of this, but the reasoning here still doesn't seem complete. Can somebody point me in the right direction?
A priori, the probability that your key is in the tree is N/D.
Without loss of generality, let's assume the node values range over [1..D].
When you walk down the tree, either:
The current node matches your key, hence P = 1
The current node has value C which is larger than your key, you go left, but you don't know how many items are in the left sub-tree. Now you can make one of these assumptions:
The tree is balanced. The range in the subtree is [1..C-1], and there are (N-1)/2 nodes in the subtree. Hence, P = ((N-1)/2)/(C-1)
The tree is not balanced. The range in the subtree is [1..C-1], and the maximum likelihood estimation for the number of nodes in the subtree is N * (C-1)/D. Hence, P = (N*(C-1)/D)/(C-1) = N/D. (no change)
If you know more about how the tree was constructed - you can make a better MLE for the number of nodes in the subtree.
The current node has value C which is smaller than your key, you go right, but you don't know how many items are in the right sub-tree.
...
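The maximum-likelihood branch of the argument reduces to a line of arithmetic, checked below. The values of N, D, and C are illustrative, not from the question:

```python
# Numeric check of the MLE case above: going left at a node with value C
# shrinks the key range to C-1 and the estimated node count to N*(C-1)/D,
# so the confidence estimate stays at N/D. All values are illustrative.

def confidence(nodes_estimate, key_range):
    """P(key in subtree) = estimated nodes / size of the key range."""
    return nodes_estimate / key_range

N, D = 100, 1_000            # N nodes drawn from a domain of size D
p0 = confidence(N, D)        # a priori: N/D

C = 600                      # current node's value; we go left
p_left = confidence(N * (C - 1) / D, C - 1)

# the (C - 1) factors cancel: the estimate is unchanged
assert abs(p_left - p0) < 1e-12
```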

How do you derive the time complexity of alpha-beta pruning?

I understand the basics of minimax and alpha-beta pruning. In all the literature, the best-case time complexity is given as O(b^(d/2)), where b = branching factor and d = depth of the tree, and the best case occurs when all the preferred nodes are expanded first.
In my example of the "best case", I have a binary tree of 4 levels, so out of the 16 terminal nodes, I need to expand at most 7 nodes. How does this relate to O(b^(d/2))?
I don't understand how they come to O(b^(d/2)).
O(b^(d/2)) corresponds to the best-case time complexity of alpha-beta pruning. Explanation:
With an (average or constant) branching factor of b, and a search
depth of d plies, the maximum number of leaf node positions evaluated
(when the move ordering is pessimal) is O(b*b*...*b) = O(b^d) – the
same as a simple minimax search. If the move ordering for the search
is optimal (meaning the best moves are always searched first), the
number of leaf node positions evaluated is about O(b*1*b*1*...*b) for
odd depth and O(b*1*b*1*...*1) for even depth, or O(b^(d/2)). In the
latter case, where the ply of a search is even, the effective
branching factor is reduced to its square root, or, equivalently, the
search can go twice as deep with the same amount of computation.
The explanation of b*1*b*1*... is that all the first player's moves
must be studied to find the best one, but for each, only the best
second player's move is needed to refute all but the first (and best)
first player move – alpha–beta ensures no other second player moves
need be considered.
Put simply, you "skip" every other level.
Big-O describes the limiting behavior of a function as the argument tends towards a particular value or infinity, so in your case comparing O(b^(d/2)) precisely against small values of b and d doesn't really make sense.
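That said, the exact count in the question does have a closed form: with perfect move ordering, the number of leaves alpha-beta evaluates is b^ceil(d/2) + b^floor(d/2) - 1 (the Knuth-Moore bound). A quick check against the hand count from the question:

```python
import math

def best_case_leaves(b, d):
    """Leaves evaluated by alpha-beta with optimal move ordering
    (Knuth-Moore bound): b^ceil(d/2) + b^floor(d/2) - 1."""
    return b ** math.ceil(d / 2) + b ** math.floor(d / 2) - 1

# The question's binary tree of depth 4: 2^2 + 2^2 - 1 = 7 leaves,
# matching the 7 nodes counted by hand; asymptotically this is O(b^(d/2)).
```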