Remove a node in binary search tree - binary-search-tree

I am reading a book about removing a node from a binary search tree right now and the procedure described in the book seems unnecessarily complicated to me.
My question is specifically about removing a node that has both a left and a right subtree. In my opinion, the node to remove should be replaced by the rightmost node in its left subtree, or by its left child if its left subtree has only one node.
In case No.1, if we remove 40, it will be replaced by 30; in case No.2, if we remove 40, it will be replaced by 35.
But the book says the replacement should be taken from the right subtree of the node to remove, which could involve some complex manipulations.
Am I missing something here? Please point it out.

What you have pointed out is correct: the deleted node should be replaced by either its in-order successor, which is the leftmost node in the right subtree, or its in-order predecessor, which is the rightmost node in the left subtree. Either choice preserves the key ordering, so the tree can still be traversed correctly. Most binary search tree data structures allow the deletion to be performed either way, but in some special cases you might want to implement deletion such that the tree remains balanced.
More details and sample code are available on Wikipedia.
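As a rough illustration of the two options, here is a minimal Python sketch of deleting a node with two children using either replacement. The `Node` class, the `insert` and `delete` helpers, the `use_successor` flag, and the example tree are all made up for this sketch; they are not taken from the book or from Wikipedia.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain (unbalanced) BST insertion; returns the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def delete(root, key, use_successor=True):
    """Delete key from the subtree; returns the new subtree root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key, use_successor)
    elif key > root.key:
        root.right = delete(root.right, key, use_successor)
    else:
        # Zero or one child: splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        if use_successor:
            # In-order successor: leftmost node of the right subtree.
            repl = root.right
            while repl.left is not None:
                repl = repl.left
            root.key = repl.key
            root.right = delete(root.right, repl.key, use_successor)
        else:
            # In-order predecessor: rightmost node of the left subtree.
            repl = root.left
            while repl.right is not None:
                repl = repl.right
            root.key = repl.key
            root.left = delete(root.left, repl.key, use_successor)
    return root


def inorder(root):
    """Return the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


# Hypothetical tree: 40 at the root, 20 and 60 below it, then 10, 30, 50, 70.
keys = [40, 20, 60, 10, 30, 50, 70]
root_succ = delete(build(keys), 40, use_successor=True)
root_pred = delete(build(keys), 40, use_successor=False)
```

With this particular tree, the successor variant puts 50 at the root and the predecessor variant puts 30 there; in both cases the in-order traversal stays sorted.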

In case no.1, if you remove node 40, it will be replaced by 50.
In case no.2, if you remove node 40, it will be replaced by 50.
So basically, when we delete any node that has 2 children, the removal should work as follows.
We go to the right child of the node, and then to the extreme left of that child.
The figures below show some examples of how to delete a node from a binary search tree. They are also taken from a book, but it is clearly explained.

Cypher BFS with multiple Relations in Path

I'd like to model autonomous systems and their relationships in a graph database (memgraph-db).
There are two different kinds of relationships that can exist between nodes:
undirected peer2peer relationships (edges without arrows in image)
directed provider2customer relationships (arrows pointing to provider in image)
The following image shows valid paths that I want to find with some query
They can be described as
(s)-[:provider*0..n]->()-[:peer*0..n]-()<-[:provider*0..n]-(d)
or in other words
0-n c2p edges followed by 0-n p2p edges followed by 0-n p2c edges
I can fix the first and last node and would like to find a (shortest/cheapest) path. As I understand I can do BFS if there is ONE relation on the path.
Is there a way to query for paths of such form in Cypher?
As an alternative I could do individual queries where I specify the length of each of the segments and then do a query for every length of path until a path is found.
i.e.
MATCH (s)<-[]->(d) // All one hop paths
MATCH (s)-[:provider]->()-[:peer]-(d)
MATCH (s)-[:provider]->()<-[:provider]-(d)
...
Since it's viable to have 7 different path sections, I don't see how 3 BFS patterns (... BFS*0..n) would yield a valid solution. It's impossible to have an empty path because the pattern contains some nodes between them (I have to double-check that).
Writing individual patterns is not great.
Some options are:
MATCH path=(s)-[:BFS*0..n]-(d) WHERE {{filter_expression}} -> The expression has to be quite complex in order to yield valid paths.
MATCH path=(s)-[:BFS*0..n]-(d) CALL module.filter_procedure(path) -> The module.filter_procedure(path) could be implemented in Python or C/C++. Please take a look here. I would recommend starting with Python since it's much easier, and Python should be fine for the PoC. I would also recommend starting with this option because I'm pretty confident the solution will work, and it's modular: the filter_procedure could be extended easily, while the query will stay the same.
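To give an idea of what the filtering logic inside such a procedure might look like, here is a minimal plain-Python sketch. It deliberately does not use the Memgraph query module API: the path is assumed to have already been converted into a list of step labels, and the labels "c2p", "p2p" and "p2c" as well as the function name are made up for this example.

```python
# Position of each step kind in the required order:
# customer-to-provider steps, then peer steps, then provider-to-customer steps.
PHASE = {"c2p": 0, "p2p": 1, "p2c": 2}


def is_valid_path(steps):
    """Return True if steps match c2p* p2p* p2c* (each segment may be empty)."""
    current = 0
    for step in steps:
        phase = PHASE[step]
        if phase < current:
            # The path went back to an earlier segment, e.g. p2c followed by c2p.
            return False
        current = phase
    return True
```

For example, `is_valid_path(["c2p", "c2p", "p2p", "p2c"])` is accepted, while `is_valid_path(["p2c", "c2p"])` is rejected. A real procedure would first have to derive these labels from the relationship type and direction of every edge in the path.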
Could you please provide a sample dataset in a format of a Cypher query (a couple of nodes and edges / a small graph)? I'm glad to come up with a solution.

An alternative method to create an AVL tree from a sorted array in O(n) time

I need some help in this data structure homework problem. I was requested to write an algorithm that creates an AVL tree from a sorted array in O(n) time.
I read this solution method: Creating a Binary Search Tree from a sorted array
They do it recursively for the two halves of the sorted array and it works.
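For reference, a minimal sketch of that recursive approach could look like the following. The `Node` class and the helper names are made up for this example, and the sketch builds a plain height-balanced BST without storing AVL balance factors or heights in the nodes.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def build_balanced(values, lo=0, hi=None):
    """Build a balanced BST from the sorted slice values[lo:hi] in O(n) total."""
    if hi is None:
        hi = len(values)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(values[mid])  # the middle element becomes the subtree root
    node.left = build_balanced(values, lo, mid)
    node.right = build_balanced(values, mid + 1, hi)
    return node


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def is_height_balanced(node):
    """Check the AVL condition: subtree heights differ by at most 1 everywhere."""
    if node is None:
        return True
    return (abs(height(node.left) - height(node.right)) <= 1
            and is_height_balanced(node.left)
            and is_height_balanced(node.right))


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

Each element is visited exactly once and no slices are copied, which is where the O(n) bound comes from.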
I found a different solution and I want to check if it's valid.
My solution is to store another property of the root called "root.minimum" that will contain a pointer to the minimum.
Then, for the k'th element, we'll add it recursively to the AVL tree of the previous k-1 elements. We know that the k'th element is smaller than the minimum, so we'll add it to the left of root.minimum to create the new tree.
Now the tree is no longer balanced, but all we need to do to fix it is just one right rotation of the previous minimum.
This way the insertion takes O(1) for every node, and in total O(n).
Is this method valid to solve the problem?
Edit: I meant that I'm starting from the largest element and then adding the rest in order. So each element I'm adding is smaller than all the elements already in the tree, and I add it to the left of root.minimum. Then all I have to do to balance the tree is a right rotation, which is O(1). Is this a correct solution?
If you pick a random element as the root in the first place (which is probably not the best idea, since we know the root should be the middle element), you put root itself in the root.minimum. Then for each new element, if it is smaller than root.minimum, you do as you said and make the tree balanced in O(1) time. But what if it is larger? In that case we need to compare it with the root.minimum of the right child, and if it is also larger, with the root.minimum of the right child of the right child and so on. This might take O(k) in the worst case, which will result in O(n^2) in the end. Also, this way, you are not using the sorted property of the array.

Is a given key a member of a binary tree - probabilistic answer

The Problem:
Given a BST with N nodes, with a domain of cardinality D (domain being the possible values for the node keys).
Given a key that is in the domain but may or may not be a member of the BST.
At the start, our confidence that the node is in the tree should be 1/D, but as we go deeper into the tree both D and N are split approximately in half. That would suggest that our confidence that our key is a member of the tree should remain constant until we hit the bottom or discover the key. However, I'm not sure if that reasoning is complete, since it seems more like we are choosing N nodes from D.
I was thinking something along the lines of this, but the reasoning here still doesn't seem complete. Can somebody point me in the right direction?
A priori, the probability that your key is in the tree is N/D.
Without loss of generality, let's assume that the node's value range is [1..D].
When you walk down the tree, either:
The current node matches your key, hence P = 1
The current node has value C which is larger than your key, you go left, but you don't know how many items are in the left sub-tree. Now you can make one of these assumptions:
The tree is balanced. The range in the subtree is [1..C-1], and there are (N-1)/2 nodes in the subtree. Hence, P = ((N-1)/2)/(C-1)
The tree is not balanced. The range in the subtree is [1..C-1], and the maximum likelihood estimation for the number of nodes in the subtree is N * (C-1)/D. Hence, P = (N*(C-1)/D)/(C-1) = N/D. (no change)
If you know more about how the tree was constructed - you can make a better MLE for the number of nodes in the subtree.
The current node has value C which is smaller than your key, you go right, but you don't know how many items are in the right sub-tree.
...
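The two estimates for the "go left" case can be written out as a small sketch. The function names are made up, and the balanced case assumes that the left subtree of a balanced tree with N nodes holds about (N-1)/2 of them; C must be greater than 1 so that the left range [1..C-1] is not empty.

```python
from fractions import Fraction


def p_left_balanced(n, c):
    """Balanced assumption: about (n - 1) / 2 nodes spread over c - 1 values."""
    return Fraction(n - 1, 2) / (c - 1)


def p_left_proportional(n, d, c):
    """No balance assumption: estimate the node count as n * (c - 1) / d."""
    estimated_nodes = Fraction(n * (c - 1), d)
    return estimated_nodes / (c - 1)  # simplifies to n / d
```

Under the proportional estimate the result is always N/D, whatever C is, which matches the "no change" remark above.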

Recursive Hierarchy Ranking

I have no idea if I wrote that correctly. I want to start learning higher end data mining techniques and I'm currently using SQL server and Access 2016.
I have a system that tracks ID cards. Each ID is tagged to one particular level of a security hierarchy, which has many branches.
For example
Root
-Maintenance
- Management
- Supervisory
- Manager
- Executive
- Vendors
- Secure
- Per Diem
- Inside Trades
There are many other departments like Maintenance, some simple, some with much more convoluted, hierarchies.
Each ID card is tagged to a level, so in the Maintenance example a card might be tagged - Per Diem:Vendors:Maintenance:Root. Others may be tagged just to Vendors, some to general Maintenance itself (no one has Root, thank god).
So let's say I have 20 ID cards selected. These are available personnel I can task to a job, but since they have different areas of security, I want to find the commonalities they can all work on together, as a 20-person group or whatever other groupings I can make.
So the intended output would be
CommonMatch = - Per Diem
CardID = 1
CardID = 3
CommonMatch = Vendors
CardID = 1
CardID = 3
CardID = 20
So in the example above, while I could have 2 people working on -Per Diem work, because that is their lowest common security similarity, there is also card holder #20 who has rights to the predecessor group (Vendors), that 1 and 3 share, so I could have three of them work at that level.
I'm not looking for anyone to do the work for me (although examples are always welcome), more to point me in the right direction on what I should be studying, what the thing I'm trying to do is called, etc. I know CTEs are a way to go, but that seems like only a tool in a much bigger process that needs to be done.
Thank you all in advance
Well, it is not so much a graph-theory or data-mining problem but rather a data-structure problem and one that has almost solved itself.
The objective is to be able to partition the set of card IDs into disjoint subsets given a security clearance level.
So, the main idea here would be to lay out the hierarchy tree and then assign each card ID to the path implied by its security clearance level. For this purpose, each node of the hierarchy tree now becomes a container of card IDs (i.e. each node of the hierarchy tree holds a) its own name (as unique identification), b) pointers to other nodes and c) a list of card IDs assigned to its "name").
Then, retrieving the set of cards with clearance UP TO a specific security level is simply a case of traversing the tree from that specific level downwards to the tree's leaves, collecting the card IDs from the node containers as they are encountered.
Suppose that we have access tree:
A
+-B
+-C
+-D
  +-E
And card ID assignments:
B:[1,2,3]
C:[4,8]
E:[10,12]
At the moment, B,C,E only make sense as tags, there is no structural information associated with them. We therefore need to first "build" the tree. The following example uses Networkx but the same thing can be achieved with a multitude of ways:
import networkx
G = networkx.DiGraph() #Establish a directed graph
G.add_edge("A","B")
G.add_edge("A","C")
G.add_edge("A","D")
G.add_edge("D","E")
Now, assign the card IDs to the node containers. In Networkx, every node carries a dictionary of attributes, so I am going to store a very simple list under a "cards" key (older Networkx versions exposed this dictionary as G.node; current versions use G.nodes).
G.nodes["B"]["cards"] = [1, 2, 3]
G.nodes["C"]["cards"] = [4, 8]
G.nodes["E"]["cards"] = [10, 12]
So, now, to get everybody working under "A" (the root of the tree), you can traverse the tree from that level downwards either via Depth First Search (DFS) or Breadth First Search (BFS) and collect the card IDs from the containers. I am going to use DFS here, purely because Networkx has a function that directly returns the visited nodes in visiting order.
# dfs_preorder_nodes returns a generator, which is an efficient way of iterating very large collections in Python, but I am casting it to a "list" here so that we get the actual list of nodes back.
vis_nodes = list(networkx.dfs_preorder_nodes(G, "A"))  # Start from node "A" and DFS downwards
cardIDs = []
# I could do the following with a one-line reduce but it might be clearer this way
for aNodeID in vis_nodes:
    # Nodes without a "cards" attribute (here "A" and "D") contribute nothing
    cardIDs.extend(G.nodes[aNodeID].get("cards", []))
At the end of the above iteration, cardIDs will contain all card IDs from branch "A" downwards in one convenient list.
Of course, this example is ultra simple, but since we are talking about trees, the tree can be as large as you like and you are still traversing it in the same way requiring only a single point of entry (the top level branch).
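The same traversal can be repeated for every level to get the kind of per-level grouping asked about in the question. Below is a minimal sketch that uses plain dictionaries instead of Networkx so that it runs without any dependency; the names `children`, `cards` and `cards_under` are made up, and the data is the small example tree used above (A with children B, C and D, and E under D).

```python
def cards_under(children, cards, root):
    """Collect the card IDs tagged to root or to any node below it (iterative DFS)."""
    collected = []
    stack = [root]
    while stack:
        node = stack.pop()
        collected.extend(cards.get(node, []))
        stack.extend(children.get(node, []))
    return sorted(collected)


# Same example data as above: tree edges and card assignments.
children = {"A": ["B", "C", "D"], "D": ["E"]}
cards = {"B": [1, 2, 3], "C": [4, 8], "E": [10, 12]}

# One group per hierarchy level: every card that can work at that level.
groups = {node: cards_under(children, cards, node) for node in ["A", "B", "C", "D", "E"]}
```

Here `groups["D"]` only contains the cards tagged to E, while `groups["A"]` contains every card, which mirrors the "CommonMatch" listing from the question.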
Finally, just as a note, the fact that you are using Access as your backend is not necessarily an impediment but relational databases do not handle graph type data with great ease. You might get away easily for something like a simple tree (like what you have here for example), but the hassle of supporting this probably justifies undertaking this process outside of the database (e.g, use the database just for retrieving the data and carry out the graph type data processing in a different environment. Doing a DFS on SQL is the sort of hassle I am referring to above.)
Hope this helps.

Amortized runtime for insertion in scapegoat tree

I am working on the following problem, from a problem set for a course I am self studying.
I have solved the first part. I'm stuck on the second. These are my thoughts so far. I think that the proper way to rebuild the subtree rooted at v would be to traverse it once to copy the values into an array in sorted order, and then traverse the array once to build a balanced binary tree. Thus, this would be linear in v.size. However, I don't see how the potential and the constant can turn this into an O(1), let alone how such a constant could depend upon alpha. I thought the rebuild operation was independent of alpha, and that alpha simply affects how often you have to rebuild. So would the alpha come out of the potential function? And then the c just serves to cancel the alpha? If so, could I have some guidance as to how to rewrite the potential function?
You don't need to rewrite the potential function. The way that c and alpha interact is in the part of (2) that refers to "a subtree that is not alpha-balanced". That should help you derive a lower bound on the potential of that subtree. Part (1) helps you derive an upper bound on the potential of that subtree after the rebuilding. The resulting difference in potential should help you pay for the rebuilding.
In particular, the lower bound will be something like f(c,alpha) * m for some function f. This problem wants you to find an expression for c in terms of alpha so that f(c,alpha) >= 1.