Inorder successor in a reversed binary search tree

I have a slight confusion about the inorder successor/predecessor when the BST is flipped. By flipped/reversed I mean that all elements in the right subtree are smaller and all elements in the left subtree are greater; normally the right subtree holds the greater values. If the tree is reversed like this, does the definition of inorder successor/predecessor still remain the same?
For a normal tree, the inorder successor would be the leftmost node of the right subtree, wouldn't it?
For flipped BST like the example below:
        8
      /   \
    15     4
   /  \   / \
  20  10  6  2
Is the inorder successor of 8 equal to 10? Or is it 6 if we follow the "usual" definition of inorder successor?
Thanks!

If you do an in-order traversal of a reversed BST you will get the numbers sorted in descending order. So in that case the order for your values will be: 20, 15, 10, 8, 6, 4, 2. So the successor of 8 will be 6.
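Here is a minimal runnable sketch (my own code, not from the question) that builds the tree above and confirms this; the Node layout and helper names are just illustrative:

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left     # in the reversed BST: keys greater than key
            self.right = right   # in the reversed BST: keys smaller than key

    def inorder(node, out):
        # Standard left-root-right traversal; on a reversed BST this
        # yields the keys in descending order.
        if node is None:
            return
        inorder(node.left, out)
        out.append(node.key)
        inorder(node.right, out)

    # The tree from the question.
    root = Node(8,
                Node(15, Node(20), Node(10)),
                Node(4,  Node(6),  Node(2)))

    order = []
    inorder(root, order)
    print(order)                   # [20, 15, 10, 8, 6, 4, 2]
    succ = {k: order[i + 1] for i, k in enumerate(order[:-1])}
    print(succ[8])                 # 6

Note that 6 is also still the leftmost node of 8's right subtree, so the structural rule from the question keeps working; only the numeric meaning of "successor" flips to "next smaller value".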

Related

Is this an optimal binary search tree?

In this posted question, Fig. 15.9 (b) is considered the optimal tree with an expected search cost of 2.75, but by swapping the k_3 subtree with leaf d_0 we can get an expected search cost of 2.65. Is there something incorrect with my reasoning?
As you see in the book, K = {k1, k2, ..., kn} is a set of n distinct keys in sorted order (so that k1 < k2 < k3 < k4 < k5).
Because both Fig. 15.9 (a) and Fig. 15.9 (b) are BSTs, they have the same inorder traversal: k1 => k2 => k3 => k4 => k5.
By swapping the k_3 subtree with leaf d_0, the inorder traversal changes: k3 => k1 => k2 => k4 => k5. The result is therefore no longer a valid BST, so it cannot be a candidate for the optimal one.
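The property this argument relies on can be checked mechanically; here is a minimal sketch (my own, not tied to the exact CLRS figure): a binary tree over distinct keys is a BST exactly when its inorder traversal comes out sorted, so any swap that changes the inorder sequence destroys the BST property.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def inorder(node, out):
        # left -> root -> right
        if node is not None:
            inorder(node.left, out)
            out.append(node.key)
            inorder(node.right, out)

    def is_bst(root):
        # A binary tree with distinct keys is a BST iff its inorder
        # traversal is strictly increasing.
        keys = []
        inorder(root, keys)
        return all(a < b for a, b in zip(keys, keys[1:]))

    # A valid BST over k1 < k2 < k3 (encoded as 1, 2, 3) ...
    print(is_bst(Node(2, Node(1), Node(3))))   # True
    # ... and the same keys after an order-breaking swap.
    print(is_bst(Node(2, Node(3), Node(1))))   # False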

If get inorder successor in BST takes O(h), why does iterative inorder traversal take O(n) when calling an O(h) function n times?

In a BST, it takes O(h) time complexity to get the inorder successor of a given node, so given getNext(), which gets the inorder successor of the current node, you would need n calls to getNext() to traverse the tree, giving O(nh) time complexity.
However, iterative inorder traversal of BSTs is given in books as taking O(n) time. I'm confused about why there's a difference.
(Nodes have parent pointers).
Getting the inorder successor is O(h), but not Θ(h). Half of the nodes have a successor just one link away. Let's think about the number of pointer dereferences it takes to traverse the subtree rooted at a given node via repeated successor calls:
the number of pointer dereferences to traverse the left subtree, plus
the number to ascend back from the left subtree's rightmost node = the height of the left subtree (at most), plus
one for the node itself, plus
the number to descend to the right subtree's leftmost node = the height of the right subtree (at most), plus
the number for the right subtree.
So an upper bound is:
f(0) = 1
f(h) = f(h − 1) + (h − 1) + h + (h − 1) + f(h − 1)
     = 2f(h − 1) + 3h − 2
which solves to
f(h) = 5·2^h − 3h − 4
and with h being ⌈log₂ n⌉, f is O(n).
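For intuition, here is a rough sketch (my own code, using the parent pointers mentioned in the question) that sweeps a whole tree by repeated successor calls and counts pointer dereferences; the total stays proportional to n, even though a single call can cost O(h), because every edge is walked at most twice.

    import random

    class Node:
        def __init__(self, key, parent=None):
            self.key = key
            self.parent = parent
            self.left = None
            self.right = None

    steps = 0   # total pointer dereferences over the whole traversal

    def insert(root, key):
        # Plain BST insertion that maintains parent pointers.
        if root is None:
            return Node(key)
        cur = root
        while True:
            if key < cur.key:
                if cur.left is None:
                    cur.left = Node(key, cur)
                    return root
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = Node(key, cur)
                    return root
                cur = cur.right

    def leftmost(node):
        global steps
        while node.left is not None:
            node = node.left
            steps += 1
        return node

    def successor(node):
        # Worst case O(h) for one call, but O(1) amortized over a full sweep.
        global steps
        if node.right is not None:
            steps += 1
            return leftmost(node.right)
        while node.parent is not None and node is node.parent.right:
            node = node.parent
            steps += 1
        return node.parent

    root = None
    for k in random.sample(range(100000), 1000):
        root = insert(root, k)

    node, visited = leftmost(root), 0
    while node is not None:
        visited += 1
        node = successor(node)
    print(visited, steps)   # 1000 nodes visited; steps grows like n, not n*log(n)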

Which is a better time complexity?

I have to pick out which operations have a better worst-case time complexity on an AVL tree than on a BST. I have established that the time complexity for each operation is the same depending on the tree...
The worst case time complexity for an AVL tree is...
Insert - O(log(n))
Remove - O(log(n))
Search - O(log(n))
The worst case time complexity for a BST is....
Insert - O(height)
Remove - O(height)
Search - O(height)
So is O(log(n)) a better time complexity than O(height)?
The worst case time complexity on a BST for insert, remove, and search is O(n), where n is the number of nodes in the BST. You can trigger this worst case by inserting nodes into a BST in an order such that the BST is essentially a linked list (e.g. first insert 1, then insert 2, then insert 3, and so on... you will end up with a BST that looks like 1 -> 2 -> 3...).
O(log(n)) is a better time complexity than O(n).
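To make that concrete, here is a small sketch (my own, with hypothetical helper names) comparing the height produced by sorted insertions into a plain BST with the height a balanced tree of the same size would have:

    import math

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        # Naive, unbalanced BST insertion.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def height(node):
        if node is None:
            return 0
        return 1 + max(height(node.left), height(node.right))

    n = 500
    root = None
    for k in range(1, n + 1):    # sorted insertions -> one long right spine
        root = insert(root, k)

    print(height(root))                  # 500: height equals n, so operations degrade to O(n)
    print(math.ceil(math.log2(n + 1)))   # 9: roughly what a balanced tree (e.g. an AVL tree) achieves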
O(log(n)) is the best-case scenario for O(height). The height of your binary tree can be anything between roughly log(n) and n, where n denotes the number of nodes.
For example, if you have a BST where each node has only a right child, it is the same as a linked list, so all three operations have O(n) worst-case complexity.
On the other hand, an AVL tree is a self-balancing binary search tree, meaning the heights of the two subtrees of any node differ by at most a constant (1 for AVL trees). That means you approximately halve the remaining nodes at every step, giving O(log(n)) complexity, which in that case is also your O(height) complexity.
An AVL tree is basically a height-balanced BST.
If you consider trees of the same height, a full AVL tree holds more nodes than a plain BST, so log n (AVL tree) > log n (BST),
-> where n is the number of nodes,
whereas when you consider O(height), it'll be the same in both the AVL tree and the BST:
3
 \
  5
(BST) height = 2, n = 2

  3
 / \
2   5
(AVL) height = 2, n = 3

Segment tree - query complexity

I am having problems understanding segment tree complexity. It is clear that if you have an update function which has to change only one node, its complexity is log(n).
But I have no idea why the complexity of query(a,b), where (a,b) is the interval that needs to be checked, is log(n).
Can anyone provide me with an intuitive / formal proof to understand this?
There are four cases when querying the interval (x,y):
FIND(R, x, y)   // R is the node
    % Case 1
    if R.first = x and R.last = y
        return {R}
    % Case 2
    if y <= R.middle
        return FIND(R.leftChild, x, y)
    % Case 3
    if x >= R.middle + 1
        return FIND(R.rightChild, x, y)
    % Case 4
    P = FIND(R.leftChild, x, R.middle)
    Q = FIND(R.rightChild, R.middle + 1, y)
    return P union Q
Intuitively, each of the first three cases reduces the remaining tree height by 1. Since the tree has height log n, if only the first three cases occur, the running time is O(log n).
For the last case, FIND() divides the problem into two subproblems. However, we assert that this can happen at most once. After we call FIND(R.leftChild, x, R.middle), we are querying R.leftChild for the interval [x, R.middle]. R.middle is the same as R.leftChild.last. If x > R.leftChild.middle, then it is Case 3; if x <= R.leftChild.middle, then we will call
FIND(R.leftChild.leftChild, x, R.leftChild.middle);
FIND(R.leftChild.rightChild, R.leftChild.middle + 1, R.leftChild.last);
However, the second FIND() hits Case 1 and simply returns R.leftChild.rightChild's precomputed sum, so it takes constant time, and the problem is not split into two genuinely recursive subproblems (strictly speaking, the problem is separated, though one subproblem takes O(1) time to solve).
Since the same analysis holds for the rightChild of R, we conclude that after Case 4 happens for the first time, the running time T(h) (where h is the remaining height of the tree) is
T(h) <= T(h-1) + c (c is a constant)
T(1) = c
which yields:
T(h) <= c * h = O(h) = O(log n) (since h is the height of the tree)
Hence we end the proof.
This is my first time contributing, so if there are any problems, please kindly point them out and I will edit my answer.
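For reference, here is a compact runnable sketch (my own Python, mirroring the four cases of FIND above) of a sum segment tree; each query visits only O(log n) nodes:

    class SegTree:
        def __init__(self, values):
            self.n = len(values)
            self.sum = [0] * (4 * self.n)
            self._build(1, 0, self.n - 1, values)

        def _build(self, node, first, last, values):
            if first == last:
                self.sum[node] = values[first]
                return
            mid = (first + last) // 2
            self._build(2 * node, first, mid, values)
            self._build(2 * node + 1, mid + 1, last, values)
            self.sum[node] = self.sum[2 * node] + self.sum[2 * node + 1]

        def query(self, x, y, node=1, first=None, last=None):
            # Sum of values[x..y]; the branches correspond to Cases 1-4 of FIND.
            if first is None:
                first, last = 0, self.n - 1
            if first == x and last == y:             # Case 1: exact cover
                return self.sum[node]
            mid = (first + last) // 2
            if y <= mid:                             # Case 2: left child only
                return self.query(x, y, 2 * node, first, mid)
            if x >= mid + 1:                         # Case 3: right child only
                return self.query(x, y, 2 * node + 1, mid + 1, last)
            # Case 4: the query straddles the middle and is split once.
            return (self.query(x, mid, 2 * node, first, mid) +
                    self.query(mid + 1, y, 2 * node + 1, mid + 1, last))

    t = SegTree([5, 2, 7, 1, 3, 9, 4, 6])
    print(t.query(2, 6))   # 7 + 1 + 3 + 9 + 4 = 24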
A range query using a segment tree basically involves recursing from the root node. You can think of the entire recursion process as a traversal on the segment tree: any time a recursion is needed on a child node, you are visiting that child node in your traversal. So analyzing the complexity of a range query is equivalent to finding the upper bound for the total number of nodes that are visited.
It turns out that at any level, at most 4 nodes can be visited. Since the segment tree has a height of log(n), and at most 4 nodes can be visited at each level, the upper bound is actually 4*log(n). The time complexity is therefore O(log(n)).
Now we can prove this with induction. The base case is at the first level where the root node lies. Since the root node has at most two child nodes, we can only visit at most those two child nodes, which is at most 4 nodes.
Now suppose it is true that at an arbitrary level (say level i) we visit at most 4 nodes. We want to show that we will visit at most 4 nodes at the next level (level i+1) as well. If we had visited only 1 or 2 nodes at level i, it's trivial to show that at level i+1 we will visit at most 4 nodes because each node can have at most 2 child nodes.
So let's focus on the assumption that 3 or 4 nodes were visited at level i, and try to show that at level i+1 we can also have at most 4 visited nodes. Now since the range query is asking for a contiguous range, we know that the 3 or 4 nodes visited at level i can be categorized into 3 partitions of nodes: a leftmost single node whose segment range is only partially covered by the query range, a rightmost single node whose segment range is only partially covered by the query range, and 1 or 2 middle nodes whose segment range is fully covered by the query range. Since the middle nodes have their segment range(s) fully covered by the query range, there would be no recursion at the next level; we just use their precomputed sums. We are left with possible recursions on the leftmost node and the rightmost node at the next level, which is obviously at most 4.
This completes the proof by induction. We have proven that at any level at most 4 nodes are visited. The time complexity for a range query is therefore O(log(n)).
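If it helps, here is a tiny instrumented sketch (my own, with made-up names) that replays the recursion for one query over 1024 leaves and tallies how many nodes get visited on each level; no level ever exceeds 4:

    from collections import Counter

    def visited_per_level(n, x, y):
        # Count the nodes a query [x, y] visits on each level of a segment
        # tree over indices 0..n-1 (only the recursion pattern matters here).
        counts = Counter()

        def walk(first, last, qx, qy, level):
            counts[level] += 1
            if first == qx and last == qy:       # exact cover: stop here
                return
            mid = (first + last) // 2
            if qy <= mid:
                walk(first, mid, qx, qy, level + 1)
            elif qx >= mid + 1:
                walk(mid + 1, last, qx, qy, level + 1)
            else:                                 # split across the middle
                walk(first, mid, qx, mid, level + 1)
                walk(mid + 1, last, mid + 1, qy, level + 1)

        walk(0, n - 1, x, y, 0)
        return counts

    print(visited_per_level(1024, 100, 987))   # at most 4 nodes on any level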
An interval of length n can be represented by k canonical nodes, where k = O(log(n)).
We can prove it based on how the binary number system works.

BFS (Breadth First Search) Time complexity at every step

BFS(G, s)
 1  for each vertex u ∈ G.V − {s}
 2      u.color = WHITE
 3      u.d = ∞
 4      u.π = NIL
 5  s.color = GRAY
 6  s.d = 0
 7  s.π = NIL
 8  Q = Ø
 9  ENQUEUE(Q, s)
10  while Q ≠ Ø
11      u = DEQUEUE(Q)
12      for each v ∈ G.Adj[u]
13          if v.color == WHITE
14              v.color = GRAY
15              v.d = u.d + 1
16              v.π = u
17              ENQUEUE(Q, v)
18      u.color = BLACK
The above Breadth First Search code is represented using adjacency lists.
Notations -
G : Graph
s : source vertex
u.color : stores the color of each vertex u ∈ V
u.π : stores predecessor of u
u.d : stores the distance from the source s to vertex u computed by the algorithm
Understanding of the code (help me if I'm wrong) -
1. As far as I could understand, the ENQUEUE(Q, s) and DEQUEUE(Q) operations take O(1) time.
2. Since the enqueue operation occurs exactly once per vertex, it takes O(V) time in total.
3. Since the sum of the lengths of all adjacency lists is |E|, the total time spent scanning adjacency lists is O(E).
4. Why is the running time of BFS O(V+E)?
Please do not refer me to some website, I've gone through many articles to understand but I'm finding it difficult to understand.
Can anyone please reply to this code by writing the time complexity of each of the 18 lines?
Lines 1-4: O(V) in total
Lines 5-9: O(1) or O(constant)
Line 11: O(V) for all executions of line 11 within the loop (each vertex can only be dequeued once)
Lines 12-13: O(E) in total, as you will check every possible edge once; O(2E) if the graph is undirected, since each edge appears in two adjacency lists.
Lines 14-17: O(V) in total as out of the E edges you check, only V vertices will be white.
Line 18: O(V) in total
Summing the complexities gives you
O(4V + E + 1), which simplifies to O(V+E)
New:
It is not O(VE) because at each iteration of the loop starting at line 10, lines 12-13 only loop through the edges the current node is linked to, not all the edges in the entire graph. Thus, looking at it from the point of view of the edges, each edge is looped over at most twice in an undirected graph, once by each node it connects.
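To tie the pieces together, here is a small Python version (my own translation of the pseudocode, using a plain dict of adjacency lists) with the cost of each part noted; the counts add up to O(V + E):

    from collections import deque

    def bfs(graph, s):
        # graph maps each vertex to the list of its neighbours (adjacency lists).
        color = {u: "WHITE" for u in graph}            # lines 1-4: O(V)
        d = {u: float("inf") for u in graph}
        pi = {u: None for u in graph}
        color[s], d[s] = "GRAY", 0                     # lines 5-7: O(1)
        q = deque([s])                                 # lines 8-9: O(1)
        while q:                                       # line 10
            u = q.popleft()                            # line 11: each vertex dequeued once -> O(V)
            for v in graph[u]:                         # lines 12-13: all adjacency lists together -> O(E)
                if color[v] == "WHITE":
                    color[v] = "GRAY"                  # lines 14-17: at most once per vertex -> O(V)
                    d[v] = d[u] + 1
                    pi[v] = u
                    q.append(v)
            color[u] = "BLACK"                         # line 18: O(V) in total
        return d, pi

    g = {"s": ["a", "b"], "a": ["s", "b"], "b": ["s", "a", "c"], "c": ["b"]}
    dist, pred = bfs(g, "s")
    print(dist)   # {'s': 0, 'a': 1, 'b': 1, 'c': 2}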