What will be the successor of a right child node x when no ancestor A of X exists such that "X falls in the left subtree of A"? - binary-search-tree

I am a little confused about finding the successor of one particular node: a node x that is a right child (x has no children of its own) and whose parent is a left child of the root node. In such a case, what will be the successor of node x?
What will be the successor of the left subtree's node 76?
I have read many other tutorials as well, with no success. Most tutorials talk about cases such as node 19, whose successor is 23, and node 23, whose successor is 50. Or that the successor of 76 is 72/50.
For the right child we have this condition:
X is a right child of its parent P.
Then the first ancestor of X, let's call it A, such that X falls in the left subtree of A, is the successor of X.
Thanks.

The inorder successor of a node is the next node in the inorder traversal of the binary tree.
The inorder successor is NULL for the last node in the inorder traversal.
So the successor of 76 is no node at all, or NULL if you like.
You can also look at it as the node with the smallest key greater than the key of the input node. There is no node with a key greater than 76, so again, by this definition, the successor of 76 is NULL.
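If it helps to see those two definitions in code, here is a minimal sketch of the successor walk, assuming nodes with left, right and parent fields (my naming, not from the question):

def successor(node):
    # Case 1: the node has a right subtree; the successor is that
    # subtree's leftmost node.
    if node.right is not None:
        node = node.right
        while node.left is not None:
            node = node.left
        return node
    # Case 2: climb until we come up out of a left subtree; that
    # ancestor A (the first one with the node in its left subtree)
    # is the successor.
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return node.parent  # None (NULL) when the node held the maximum key

For 76 the climb runs off the top of the tree, so the function returns None, matching both definitions above.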

Related

Is it possible for a red-black tree to have a black node whose parent is also black, but the parent has no sibling?

I'm reading through the red-black tree chapter of the book "Introduction to Algorithms" by Cormen. In the chapter on deletion, it basically says that if both a node and the node's parent are black, then the sibling of the node's parent has to exist. The text is on page 327, first paragraph.
I don't get the logic behind it. How do we deduce something like that? Is that even true?
Though I did try to implement my own version of an RB tree, and I couldn't produce a tree with a subtree like
       /
      R
     /
    B
   / \
  B   ...
 /
...
or
       /
      B
     /
    B
   / \
  B   ...
 /
...
Basically, both the node and its parent are black, but the parent doesn't have a sibling. I can't produce such a tree so far.
The text is correct. The only time you can have a black node with no sibling is when the node in question is the tree root. The easy way to see this is to draw out the B-tree equivalent, moving red nodes up so they become part of the same block as their parent black node. When you do this, it becomes easy to see that a non-root black node with no sibling would produce an unbalanced tree.
In fact, the two options for a non-root, non-leaf black node are that either its sibling is black, or its sibling is red and has two black children (even if those two children are leaf nodes, simply represented by a null value in the implementation).
SoronelHaetir is right, but there are three things to say about the fixups after deletes:
When a "sibling" is talked about, it is normally the sibling of the current node, not the parent's sibling. So if the current node is the left child of its parent, the "sibling" is the right child of that parent.
You ask, "How do we deduce something like that? Is that even true?" Yes, it is. One of the rules is that the total black-node count down every path is the same.
So in a correct RB tree, for a given node, if the left child is black, then the right child is also black (or it is red and has two black children). In a correct RB tree, you cannot have any node with an unbalanced count of black nodes down its paths; the count has to be the same on every path.
The other situation where you have an unbalanced black count is when a black leaf node has been deleted and you are in the process of restoring the black count. Until it is restored (which can always be done), the tree is not a valid RB tree.
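To make the equal-black-count rule concrete, here is a small sketch that checks it, assuming nodes with color ("R" or "B"), left and right fields (my naming, not from the book):

def black_height(node):
    # Returns the black height of the subtree rooted at `node`, or -1
    # if two paths disagree on the black count (not a valid RB tree).
    if node is None:
        return 1  # null leaves count as black
    left = black_height(node.left)
    right = black_height(node.right)
    if left == -1 or right == -1 or left != right:
        return -1
    return left + (1 if node.color == "B" else 0)

On shapes like the two drawn above, the missing sibling makes the two sides of the top node disagree, so the check returns -1; that is the mechanical version of the "unbalanced tree" argument.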

Is there such a thing as the height of a node in a BST?

I'm getting confused the more I search. Does the term "height" only apply to the root node of a BST? Or does each node have its own corresponding height? Similarly, does each node of a BST have a corresponding level?
The height of a BST, or of any binary tree, is the total number of edges from the root node to the most distant leaf node. You are probably getting confused by the recursion, where a node at another level is passed as the root. Every node is the parent of its subtree, so it is the root of that subtree; in that sense each node has its own height (and its own level, counted from the overall root). That's how the recursion works.
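In code, the height of a node is just the height of the subtree it roots; a minimal sketch, assuming nodes with left and right fields:

def height(node):
    # Height in edges; an empty subtree is -1 so a single node has height 0.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

The "height of the BST" is then height(root), and a node's level (depth) is the same kind of count taken from the root down to the node.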

Binary Search Tree Minimum Value

I am new to the binary search tree data structure. One thing I don't understand is why the leftmost node is the smallest.
     10
    /  \
   5    12
  / \   / \
 1   6 0   14
In the above instance, 0 is the smallest value, not 1.
Let me know where I got mixed up.
Thank you!
That tree is not a binary search tree.
You create a binary search tree by adding elements one at a time, and you can do it with an array.
At first there is no element, so the first value becomes the root. Then keep adding values as nodes: comparing against the node at index n, if the new value is bigger it goes to index 2n + 1 (the right child), and if it is smaller it goes to index 2n (the left child). That way, all values to the left of any node are smaller than it, and all values to the right are bigger, and this holds at every position, including those of 10 and 6; the node holding 6 could not hold 11, for example (and in your tree the rule does not actually hold). That's all!
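As a rough sketch of that array scheme in Python (1-indexed, with names of my choosing):

def insert(tree, value):
    # Root at index 1; from the node at index n, a smaller value goes to
    # index 2*n (left child) and a bigger value to 2*n + 1 (right child).
    n = 1
    while n < len(tree) and tree[n] is not None:
        n = 2 * n if value < tree[n] else 2 * n + 1
    if n >= len(tree):  # grow the list on demand
        tree.extend([None] * (n - len(tree) + 1))
    tree[n] = value

tree = [None]  # index 0 is unused
for v in (10, 5, 12, 1, 6, 14):
    insert(tree, v)
# inserting 0 now walks 10 -> 5 -> 1 and ends up on the far left,
# never to the right of 10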
For a tree to be considered a binary search tree, it must satisfy the following property:
... the key in each node must be greater than all keys stored in the left sub-tree, and smaller than all keys in right sub-tree
Source: https://en.wikipedia.org/wiki/Binary_search_tree
The tree you posted is not a binary search tree because the root node (10) is not smaller than all keys in its right sub-tree (node 0 violates this).
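That property is easy to check mechanically. A sketch, assuming nodes with key, left and right fields:

def is_bst(node, lo=float("-inf"), hi=float("inf")):
    # Every key must stay inside the (lo, hi) range inherited from
    # all of its ancestors, not just its direct parent.
    if node is None:
        return True
    if not (lo < node.key < hi):
        return False
    return is_bst(node.left, lo, node.key) and is_bst(node.right, node.key, hi)

For the posted tree, the call on node 0 inherits lo = 10 from the root, so the check fails exactly where the property is violated.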
I'm not really sure of your question, but binary search works by comparing the search value to the value of a node, starting with the root node (value 10 here). If the search value is less, it looks next at the left child of the root (value 5); otherwise it looks at the right child (12).
It doesn't matter so much where in the tree a value is, as long as the less/greater rule is followed.
In fact, you want to have trees set up like this (except for the misplaced 0 node), because the more balanced a tree is (number of nodes on the left vs. number of nodes on the right), the faster your search will be!
A tree balancing algorithm might, for example, look for the median value in a list of values and make that the value of the root node.
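The descent described above looks like this in code (a sketch, assuming the same key/left/right node fields as before):

def search(node, value):
    # Walk down, going left for smaller values and right for bigger ones.
    while node is not None and node.key != value:
        node = node.left if value < node.key else node.right
    return node  # None when the value is not in the tree

In a balanced tree each step discards about half of the remaining nodes, which is where the speed comes from.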

Gain maximization on trees

Consider a tree in which each node is associated with a system state and contains a sequence of actions that are performed on the system.
The root is an empty node associated with the original state of the system. The state associated with a node n is obtained by applying the sequence of actions contained in n to the original system state.
The sequence of actions of a node n is obtained by appending a new action to the parent's sequence of actions.
Moving from one node to another (i.e., adding a new action to the sequence of actions) produces a gain, which is attached to the edge connecting the two nodes.
Some "math":
each system state S is associated with a value U(S)
the gain achieved by a node n associated with the state S cannot be greater than U(S) or smaller than 0
if n and m are nodes in the tree and n is the parent of m, then U(n) - U(m) = g(n,m), i.e., the gain on the edge between n and m is the reduction of U from n to m
See the figure for an example.
My objective is to find the path in the tree that guarantees the highest gain (where the gain of a path is computed by summing all the gains of the edges on the path):
Path* = arg max_{path} ( sum of g(n,m) over adjacent nodes n,m in the path )
Notice that the tree is NOT known at the beginning, so the best option would be a solution that finds the optimum without visiting the entire tree, discarding the paths that certainly do not lead to the optimal solution.
NOTE: I obtained an answer here and here for a similar problem in offline mode, i.e., when the graph is known. However, in this context the tree is not known, and thus algorithms such as Bellman-Ford would perform no better than a brute-force approach (as suggested). Instead, I would like to build something that resembles backtracking without building the entire tree to find the best solution (branch and bound?).
EDIT: U(S) becomes smaller and smaller as depth increases.
As you have noticed, branch and bound can be used to solve your problem. Just expand the nodes that seem the most promising until you find complete solutions, while keeping track of the best known solution. If during the process a node's bound U(n) + gain(n) is lower than the gain of the best known solution, just skip that node. When you have no more nodes, you are done.
Here is an algorithm:

pending_nodes ← (root)
best_solution ← nothing
while pending_nodes is not empty
    drop the node n from pending_nodes having the highest U(n) + gain(n)
    if n is a leaf
        if best_solution = nothing
            best_solution ← n
        else if gain(best_solution) < gain(n)
            best_solution ← n
        end if
    else
        if best_solution ≠ nothing
            if U(n) + gain(n) < gain(best_solution)
                stop. best_solution is the best
            end if
        end if
        append the children of n to pending_nodes
    end if
end while
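For reference, here is a runnable Python version of that pseudocode, as a sketch. It assumes an interface of my choosing (not from the question): U(n) returns the state value at node n, gain(n) the gain accumulated from the root to n, and children(n) yields n's children, generated on demand so the tree is never built in full.

import heapq
import itertools

def branch_and_bound(root, U, gain, children):
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    # max-heap on the upper bound gain(n) + U(n), via negated keys
    heap = [(-(gain(root) + U(root)), next(tie), root)]
    best = None
    while heap:
        neg_bound, _, n = heapq.heappop(heap)
        kids = list(children(n))
        if not kids:  # leaf: a complete path from the root
            if best is None or gain(best) < gain(n):
                best = n
        else:
            # the popped node has the highest bound of all pending nodes,
            # so if even it cannot beat the best solution, nothing can
            if best is not None and -neg_bound < gain(best):
                break
            for c in kids:
                heapq.heappush(heap, (-(gain(c) + U(c)), next(tie), c))
    return best

The early stop is valid because the heap is ordered by the bound and, since U only shrinks with depth while edge gains equal the drop in U, no descendant can ever exceed its ancestor's bound.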

Finding cycles in a graph (not necessarily Hamiltonian or visiting all the nodes)

I have a graph like the one in Figure 1 (the first image) and want to connect the red nodes to form a cycle, but the cycle does not have to be Hamiltonian, as in Figure 2 and Figure 3 (the last two images). The problem has a much bigger search space than the TSP, since we can visit a node twice. As with the TSP, it is impossible to evaluate all the combinations in a large graph, so I should try a heuristic; the problem is that, unlike the TSP, the length of the cycles or tours is not fixed here. Visiting all the blue nodes is not mandatory, so tours have variable length and include only some of the blue nodes. How can I generate a possible "valid" combination every time for evaluation? I mean, a cycle can be {A, e, B, l, k, j, D, j, k, C, g, f, e} or {A, e, B, l, k, j, D, j, i, h, g, C, g, f, e}, but not {A, e, B, l, k, C, g, f, e} or {A, B, k, C, i, D}.
Update:
The final goal is to evaluate which cycle is optimal/near-optimal considering both length and risk (see below). So I am not only going to minimize the length, but the risk as well. The risk of a cycle cannot be evaluated until its full node sequence is known; I hope this clarifies why I cannot evaluate a new cycle in the middle of generating it.
We can:
generate and evaluate possible cycles one by one;
or generate all possible cycles and then evaluate them.
Definition of the risk:
Assume the cycle is a ring which connects a primary node (one of the red nodes) to all the other red nodes. If any part (edge) of the ring fails, no red node should be disconnected from the primary node (this is what we want). However, there are some edges we have to traverse twice (since no Hamiltonian cycle connects all the red nodes), and if one of those edges fails, some red nodes may be totally disconnected. So the risk of a cycle is the sum of the lengths of the risky edges (those appearing twice in our ring/tour), each multiplied by the number of red nodes we lose if that risky edge is cut.
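To make that definition concrete, here is a sketch of the risk computation; the names are mine: cycle is the node sequence with cycle[0] == cycle[-1], red is the set of red nodes, and length maps each undirected edge (as a frozenset) to its length. Connectivity is checked over the ring's own edges, which is how I read the definition.

from collections import Counter, deque

def cycle_risk(cycle, red, primary, length):
    edges = Counter(frozenset(e) for e in zip(cycle, cycle[1:]))
    risk = 0.0
    for edge, uses in edges.items():
        if uses < 2:
            continue  # the ring survives the loss of a single-use edge
        # adjacency over the ring's own edges, with the risky edge removed
        adj = {}
        for e in edges:
            if e != edge:
                u, v = tuple(e)
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        # BFS from the primary node over what is left of the ring
        seen, queue = {primary}, deque([primary])
        while queue:
            for y in adj.get(queue.popleft(), ()):
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        risk += length[edge] * sum(1 for r in red if r not in seen)
    return risk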
A real example of a 3D graph I am working on, with 5 red nodes and 95 blue nodes, is below:
And here is a link to an Excel sheet containing the adjacency matrix of the above graph (the first five nodes are red and the rest are blue).
Upon a bit more reflection, I decided it's probably better to just rewrite my solution, as the fact that you can use red nodes twice makes my original idea of mapping out the paths between red nodes inefficient. However, it isn't completely wasted, as the blue nodes between red nodes are still important.
You can actually solve this using a modified version of BFS, as more or less a backtracking algorithm. For each unique branch the following information is stored; most of it simply allows faster rejection at the cost of more space, and only the first two items are actually required:
The full current path. (initially a list with just the starting red node)
The remaining red nodes. (initially all red nodes)
The last red node. (initially the start red node)
The set of blue nodes since last red node. (initially empty)
The set of nodes with a count of 1. (initially empty)
The set of nodes with a count of 2. (initially empty)
The algorithm starts with a single node, then expands adjacent nodes using BFS or DFS; this repeats until the result is a valid tour or the node to be expanded is rejected. The basic pseudo-ish code (tracking the current path and the remaining red nodes) looks like the listing below, where rn is the set of red nodes, t is the list of valid tours, p/p2 is a path of nodes, r/r2 is a set of red nodes, v is the node being expanded, and a is a candidate node to expand to.
function PATHS2HOME(G, rn)
    create a queue Q
    create a list t
    p ← empty list
    v ← rn.pop()
    r ← rn
    add v to p
    Q.enqueue((p, r))
    while Q is not empty
        p, r ← Q.dequeue()
        if r is empty and the first and last elements of p are the same
            add p to t
        else
            v ← last element of p
            for all vertices a in G.adjacentVertices(v) do
                if canExpand(p, a)
                    p2 ← copy(p)
                    r2 ← copy(r)
                    add a to the end of p2
                    if isRedNode(a) and a in r2
                        remove a from r2
                    Q.enqueue((p2, r2))
    return t
The following conditions prevent expansion to a node (this may not be a complete list); a sketch implementing them follows the list.
Red nodes:
If it is in the set of nodes that have a count of 2, because the red node would then be used more than twice.
If it is equal to the last red node. This prevents "odd" tours when a red node is adjacent to three blue nodes: say the red node A is adjacent to blue nodes b, c and d; you could otherwise end up with a tour containing a section like b-A-c-A-d.
Blue nodes:
If it is in the set of nodes that have a count of 2, because the blue node would then be used more than twice.
If it is in the set of blue nodes seen since the last red node, because that would create a cycle of blue nodes between red nodes.
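As a sketch, those rules could be implemented like this, with red_nodes passed in explicitly and the bookkeeping sets recomputed from the path on each call (a real implementation would carry them along per branch, as described above):

from collections import Counter

def can_expand(path, a, red_nodes):
    if Counter(path)[a] >= 2:  # any node already used twice is rejected
        return False
    # position of the last red node on the path (paths start at a red node)
    last_red = max(i for i, v in enumerate(path) if v in red_nodes)
    if a in red_nodes:
        return a != path[last_red]  # forbids b-A-c-A-d style returns
    return a not in path[last_red + 1:]  # no blue-only cycle between reds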
Possible optimizations:
You could map out the paths between red nodes and use them to build something like a suffix tree showing which red nodes can be reached given the path followed so far. The benefit is that you avoid expanding a node if that expansion only leads to red nodes that have already been visited twice. Thus this check is only useful once at least one red node has been visited twice.
Use a parallel version of the algorithm. A single thread could be accessing the queue, and there is no interaction between elements in the queue, though I suspect there are better ways. It may be possible to cut the runtime down to seconds instead of hundreds of seconds, although that depends on the level of parallelization and its efficiency. You could also apply this to the previous algorithm; actually, the reasons for which I switched to this algorithm are pretty much negated by this.
You could use a stack instead of a queue. The main benefit is that, with the resulting depth-first approach, the number of pending branches should remain fairly small.