The definition of black-height in a Red-Black tree

I am reading CLRS page 309, and I am confused by the definition of black-height of a red-black tree.
The definition in this book is
the number of black nodes on any simple path from, but not including,
a node x down to a leaf
I also referred to Wikipedia, where black-height is defined as
the uniform number of black nodes in all paths from root to the leaves
and to geeksforgeeks.org, which says
Black height is the number of black nodes on a path from a node to a leaf.
Leaf nodes are also counted as black nodes.
But none of them gives examples, especially for edge cases.
Can you explain it with the following cases?
What is the black-height of a RBT:
nil
B-(nil nil)
B-R-B-(nil nil)
 \   \
  \   B-(nil nil)
   B-(nil nil)
By the CLRS definition, I calculate the black-height of each root to be:
0
1
2
(That is my understanding; am I right?)
Also, the proof of Lemma 13.1 in CLRS uses the height of a tree. What is the height of each of the above three trees?
0
1
2
(That is my understanding; am I right?)
And what is the black-height of each tree as a whole?
1
2
3
(I am not sure; I suspect Wikipedia's definition is different from the CLRS definition.)
Exercise 13.1-1 also uses this definition.
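To make the CLRS definition concrete, here is a minimal sketch (my own, not from CLRS) that counts the black nodes on a path from x, excluding x itself, down to a leaf, with nil leaves counting as black:

class Node:
    def __init__(self, color, left=None, right=None):
        self.color = color            # 'B' or 'R'; nil children are None
        self.left = left
        self.right = right

def black_height(x):
    # bh(nil) = 0: there is nothing below a nil leaf to count.
    if x is None:
        return 0
    # In a valid red-black tree every path gives the same count,
    # so following one child is enough; nil children count as black.
    child_is_black = x.left is None or x.left.color == 'B'
    return black_height(x.left) + (1 if child_is_black else 0)

# The three trees from the question:
print(black_height(None))        # 0
print(black_height(Node('B')))   # 1
print(black_height(Node('B', Node('R', Node('B'), Node('B')), Node('B'))))  # 2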

Related

Time Complexity Analysis of Recursive Tree Traversal Algorithm

I am supposed to design a recursive algorithm that traverses a tree and sets the cardinality property for every node. The cardinality property is the number of nodes that are in the subtree where the currently traversed node is the root.
Here's my algorithm in pseudo/Python code:
def SetCardinality(node):
    if node is not None:
        node.card = 1 + SetCardinality(node.left_child) + SetCardinality(node.right_child)
        return node.card
    else:
        return 0
I'm having a hard time coming up with the recurrence relation that describes this function. I figured out that the worst-case input would be a degenerate tree of height n. I saw on the internet that a recurrence relation for such a tree in this algorithm might be
T(n) = T(n-1) + n, but I don't know how the n in the relation corresponds to the algorithm.
You have to ask yourself: How many nodes does the algorithm visit? You will notice that if you run your algorithm on the root node, it will visit each node exactly once, which is expected as it is essentially a depth-first search.
Therefore, if the rest of your algorithm is constant-time operations, we have a time complexity of O(n) for the total number of nodes n.
Now, if you want to express it in terms of the height of the tree, you need to know more about the given tree. If it's a complete binary tree, then the height is O(log n), and the time complexity expressed in terms of height is O(2^h). But expressing it in terms of the total number of nodes is simpler. Notice also that the shape of the tree does not really matter for your time complexity, as you will be visiting each node exactly once regardless.
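To see the O(n) claim concretely, here is a small sketch (node fields assumed as in the question) that instruments the traversal with a visit counter; each node is visited exactly once:

class Node:
    def __init__(self, left=None, right=None):
        self.left_child = left
        self.right_child = right
        self.card = 0

visits = 0

def SetCardinality(node):
    global visits
    if node is not None:
        visits += 1
        node.card = 1 + SetCardinality(node.left_child) + SetCardinality(node.right_child)
        return node.card
    return 0

# A 5-node tree: visits == 5 == root.card, regardless of the tree's shape.
root = Node(Node(Node(), Node()), Node())
SetCardinality(root)
print(visits, root.card)  # 5 5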

Breadth first or Depth first

There is a theory that says six degrees of separation is the highest
degree for people to be connected through a chain of acquaintances.
(You know the baker: degree of separation 1; the baker knows someone
you don't know: degree of separation 2.)
We have a list of people P, a list A of corresponding acquaintances
among these people, and a person x.
We are trying to implement an algorithm to check whether person x respects
the six degrees of separation. It returns true if the distance from x
to every other person in P is at most six, false otherwise.
We are trying to accomplish O(|P| + |A|) in the worst case.
To implement this algorithm, I thought about using an adjacency list rather than an adjacency matrix to represent the graph G with vertices P and edges A, because an adjacency matrix would take O(n^2) to traverse.
Now I thought about using either BFS or DFS, but I can't seem to find a reason as to why the other is more optimal for this case.
I want to use BFS or DFS to store the distances from x in an array d, and then loop over the array d to look if any Degree is larger than 6.
DFS and BFS have the same time complexity, but depth-first is better (faster?) in most cases at finding the first degree larger than 6, whereas breadth-first is better at excluding all degrees > 6 simultaneously.
After DFS or BFS I would then loop over the array containing the distances from person x, and return true if there was no entry >6 and false when one is found.
With BFS, the largest degrees of separation would always be at the end of the array, which would maybe lead to a higher time complexity?
With DFS, the degrees of separation would be randomly scattered in the array, but the chance of finding a degree of separation higher than 6 early in the search is higher.
I don't know if it makes any difference to the Time Complexity if using DFS or BFS here.
Time complexity of BFS and DFS is exactly the same. Both methods visit all connected vertices of the graph, so in both cases you have O(V + E), where V is the number of vertices and E is the number of edges.
That being said, sometimes one algorithm can be preferred over the other precisely because the order of vertex visitation is different. For instance, if you were to evaluate a mathematical expression, DFS would be much more convenient.
In your case, BFS could be used to optimize graph traversal, because you can simply cut-off BFS at the required degree of separation level. All the people who have the required (or bigger) degree of separation would be left unmarked as visited.
The same trick would be much more convoluted to implement with DFS, because as you've astutely noticed, DFS first gets "to the bottom" of the graph, and then it goes back recursively (or via stack) up level by level.
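For illustration, a minimal sketch of the cut-off idea, assuming P is a list of hashable people and A is a list of acquaintance pairs (these representations are my assumption, not given in the question):

from collections import deque

def within_six_degrees(P, A, x):
    adj = {p: [] for p in P}          # adjacency list: O(|P| + |A|) to build
    for u, v in A:
        adj[u].append(v)
        adj[v].append(u)

    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if dist[u] == 6:              # cut off: nothing beyond degree 6 matters
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    # True iff every person was reached within six degrees.
    return all(p in dist for p in P)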
I believe that you can use Dijkstra's algorithm.
It is a BFS-like approach that updates a stored path whenever a path with a smaller value is found. Think of every edge as having a cost of 1, and suppose a person N has two friends, A and B.
Those friends have a common friend C. If your algorithm first reaches C through friend A at a cost of 4 and marks it visited, it can never check the route through friend B, which might give a distance of 3. Dijkstra's algorithm does this checking for you.
Dijkstra solves this in O((|V| + |E|) log |V|).
See more at https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
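For reference, a minimal Dijkstra sketch with unit edge costs, assuming the same adjacency-list representation as in the sketch above (note that with all costs equal to 1 it computes the same distances as BFS):

import heapq

def dijkstra(adj, x):
    dist = {x: 0}
    heap = [(0, x)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                  # stale heap entry, already improved
        for v in adj[u]:
            nd = d + 1                # every acquaintance edge costs 1
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist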

Why B Tree complexity is O(log n), it is not a binary tree

According to Wiki and GFG the search/insert/delete time complexity of a B Tree is O(log n). A B Tree can have > 2 children, i.e. it is not a binary tree. So I don't understand why it is log n -- shouldn't it be faster than log n? For example search should be worst case O(h) where h is the height of the tree.
A B-Tree is a generalization of a binary tree in which each node can have more than 2 children, but that number is not fixed. If, for example, the number of children of each node were fixed at x, then the complexity would be O(log_x n). However, when the minimum number of children is 2 (as in a binary tree), the maximum height of the tree will be log_2 n, and as mentioned in the previous answer, Big-O considers the worst-case scenario, which is the tree with the largest height (log base 2). Therefore, the complexity of a B-Tree is O(log n).
Yes, it is not a binary tree. But if we perform a binary search algorithm inside each node (over the keys inside a node), the time complexity can still be considered O(log n).
Let's consider:
degree of the B-tree (maximum number of children per node): m
total number of keys in the tree: n
In that case:
height of the tree is O(log_m n) ---------- (1)
since the number of children can vary per node, a logarithmic search over the node order is needed within each node: O(lg m) ---------- (2)
So the total complexity for search in a B-tree is
O(log_m n) * O(lg m) ---------- (3)
According to the change-of-base rule for logarithms,
log_b a = log_c a / log_c b
Applying this to (3):
O(log_m n) * O(lg m) = O(lg n / lg m) * O(lg m)
= O(lg n)
So the B-tree time complexity for the search operation is O(lg n).
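A minimal sketch of that two-level search (the node layout is my assumption: each node stores a sorted list keys and a list children with len(children) == len(keys) + 1):

from bisect import bisect_left

class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys within the node
        self.children = children or []    # empty list for leaves

def btree_search(node, key):
    i = bisect_left(node.keys, key)       # the O(lg m) search inside a node
    if i < len(node.keys) and node.keys[i] == key:
        return node
    if not node.children:                 # reached a leaf: key is absent
        return None
    return btree_search(node.children[i], key)  # one of O(log_m n) levels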
Big-O is a measure of worst case complexity; since B-tree nodes are not required to have more than 2 children, the worst case will be that no node has more than 2 children.

Rotation in a Red Black tree

I am trying to figure out the rotation in a Red-Black tree while its rebalancing is done. I understand why rotation occurs, but I don't get how it is done. Also, which intermediate rotations (LL, RR, LR and RL) are performed to reach the result? I would also appreciate it if someone could give me a rule of thumb for when to apply each of these rotations. Here is the rotation:
Rr(2) is the case when the black-node deficiency is in the right child of "py", i.e.
"y", and the grandchildren of "v" are 2 red nodes, i.e. "b" and "x"
You can try to understand rotations in Red Black Trees in a better way by breaking down the operation into different rotations. There are only 3 basic operations for a Left Leaning Red Black BST. The operations are performed in order as listed in this slide.
Moreover, the Red-Black tree that you have shown is not correct, as it violates a condition for a Red-Black tree: every path from the root to a leaf must have an equal number of black edges. But in your final tree, the path from x to c has 2 black edges, while the path from x to a has 1 black edge. I recommend you read more about self-balancing BSTs and Red-Black BSTs from here.
PS. I do not own the slide. It has been taken from Robert Sedgewick's course on Algorithms from coursera.
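To make the basic operations concrete, here is a sketch of the standard left and right rotations used in left-leaning red-black BSTs (following Sedgewick's formulation; the Node layout is assumed):

RED, BLACK = True, False

class Node:
    def __init__(self, key, color=RED, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right

def rotate_left(h):
    # Turn a red right link of h into a red left link of the new subtree root x.
    x = h.right
    h.right = x.left
    x.left = h
    x.color = h.color
    h.color = RED
    return x

def rotate_right(h):
    # Mirror image: turn a red left link into a red right link.
    x = h.left
    h.left = x.right
    x.right = h
    x.color = h.color
    h.color = RED
    return x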

Search optimization problem

Suppose you have a list of 2D points with an orientation assigned to them. Let the set S be defined as:
S={ (x,y,a) | (x,y) is a 2D point, a is an orientation (an angle) }.
Given an element s of S, we will indicate with s_p the point part and with s_a the angle part. I would like to know if there exists an efficient data structure that, given a query point q, is able to return all the elements s in S such that
(dist(q_p, s_p) < threshold_1) AND (angle_diff(q_a, s_a) < threshold_2) (1)
where dist(p1, p2), with p1, p2 2D points, is the Euclidean distance, and angle_diff(a1, a2), with a1, a2 angles, is the difference between the angles (taken to be the smallest one). The data structure should be efficient w.r.t. insertion/deletion of elements and the search as defined above. The number of vectors can grow up to 10,000 and more, but take this with a grain of salt.
Now suppose we change the above requirement: instead of using condition (1), given a distance function d, we want all elements of S such that d(q, s) < threshold. If I remember correctly, this last setup is called range search. I don't know if the first case can be transformed into the second.
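For reference, condition (1) itself is easy to state in code; a linear-scan sketch of the predicate (in practice the scan would be replaced by the indexed search discussed in the answers below):

import math

def angle_diff(a1, a2):
    # Smallest difference between two angles, in [0, pi].
    d = abs(a1 - a2) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def query(S, q, threshold_1, threshold_2):
    qx, qy, qa = q
    return [(x, y, a) for (x, y, a) in S
            if math.hypot(x - qx, y - qy) < threshold_1
            and angle_diff(qa, a) < threshold_2]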
For the distance search I believe the accepted best method is a Binary Space Partitioning (BSP) tree. This can be stored as a series of bits. Every two bits (for a 2D tree) or three bits (for a 3D tree) subdivide the space one more level, increasing resolution.
Using a BSP, locating a set of objects to compare distances with is pretty easy. Just find the smallest set of squares or cubes which contain the edges of your distance box.
For the angle, I don't know of anything. I suppose that you could store each object in a second list or tree sorted by its angle. Then you would find every object at the proper distance using the BSP, and every object at the proper angle using the angle tree, then take the set intersection.
You have effectively described a "three-dimensional cylindrical space", i.e. a space that is locally three-dimensional but where one dimension is topologically cyclic. In other words, it is locally flat and may be modeled as the boundary of a four-dimensional object C4 in (x, y, z, w) defined by
z^2 + w^2 = 1
where
a = arctan(w/z)
With this model, the space defined by your constraints is a 2-dimensional cylinder wrapped "lengthwise" around a cross section wedge, where the wedge wraps around the 4-d cylindrical space with an angle of 2 * threshold_2. This can be modeled using a "modified k-d tree" approach (modified 3-d tree), where the data structure is not a tree but actually a graph (it has cycles). You can still partition this space into cells with hyperplane separation, but traveling along the curve defined by (z, w) in the positive direction may encounter a point encountered in the negative direction. The tree should be modified to actually lead to these nodes from both directions, so that the edges are bidirectional (in the z-w curve direction - the others are obviously still unidirectional).
These cycles do not change the effectiveness of the data structure in locating nearby points or allowing your constraint search. In fact, for the most part, those algorithms are only slightly modified (the simplest approach being to hold a visited node data structure to prevent cycles in the search - you test the next neighbors about to be searched).
This will work especially well for your criteria, since the region you define is effectively bounded by these axis-defined hyperplane-bounded cells of a k-d tree, and so the search termination will leave a region on average populated around pi / 4 percent of the area.
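A simple way to instantiate the z^2 + w^2 = 1 model in code (my own sketch, not the full modified k-d tree described above) is to embed each angle on the unit circle, after which angular closeness becomes ordinary Euclidean closeness and a standard k-d tree over (x, y, z, w) applies:

import math

def embed(x, y, a):
    # Map (x, y, a) to (x, y, z, w) with z = cos(a), w = sin(a),
    # so that a = arctan(w / z) as in the model above.
    return (x, y, math.cos(a), math.sin(a))

# The chord length between two embedded angles relates to angle_diff by
# chord = 2 * sin(angle_diff / 2), so an angle threshold maps to a
# Euclidean threshold on the (z, w) components.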