Binary Search tree time complexity

I am currently working with a binary search tree, and I wonder what its time complexity is. More specifically, what is the worst-case time complexity of the operations height, leaves, and toString for a binary search tree, and why?

All three operations have O(n) worst-case time complexity.
For height: all nodes will be visited when the tree is degenerate, i.e., when every node except one has exactly one child.
For leaves: each node has to be visited in order to check whether it is a leaf.
For toString: every node's value appears in the output, so obviously all nodes need to be visited.
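As an illustrative sketch (the Node class and function names here are assumptions, not from the original post), all three operations recurse over every node:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    # Both subtrees of every node are examined, so all n nodes are visited: O(n).
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def leaves(node):
    # Every node must be visited to check whether it is a leaf: O(n).
    if node is None:
        return []
    if node.left is None and node.right is None:
        return [node.value]
    return leaves(node.left) + leaves(node.right)

def to_string(node):
    # Every value appears in the output (in-order here), so all n nodes are visited.
    if node is None:
        return ""
    return f"{to_string(node.left)} {node.value} {to_string(node.right)}".strip()
```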

Related

Why is the time complexity of binary search logN but the time complexity of a BST is N?

In Algorithms, 4th edition by Robert Sedgewick, the worst-case time complexity table for different algorithms is given as:

| implementation | search | insert | delete |
| --- | --- | --- | --- |
| sequential search (unordered list) | N | N | N |
| binary search (ordered array) | lg N | N | N |
| BST | N | N | N |

Based on this table, the searching time complexity of a BST is N, and that of binary search in and of itself is lg N.
What is the difference between the two? I have seen explanations of each separately, and they made sense; however, I can't seem to understand why the searching time complexity of a BST isn't log N, since we search by continually splitting the tree in half and ignoring the other part.
From binary-search-trees-bst-explained-with-examples:
...on average, each comparison allows the operations to skip about half of the tree, so that each lookup, insertion or deletion takes time proportional to the logarithm of the number of items stored in the tree, O(log n). However, sometimes the worst case can happen, when the tree isn't balanced, and the time complexity is O(n) for all three of these functions.
So, you kind of expect log(N) but it's not absolutely guaranteed.
the searching time complexity of a BST is N, and of binary search in and of itself is logN. What is the difference between the two?
The difference is that a binary search on a sorted array always starts at the middle element (i.e. the median when n is odd). This cannot be guaranteed in a BST. The root might be the middle element, but it doesn't have to be.
For instance, this is a valid BST:
```
        10
       /
      8
     /
    5
   /
  2
 /
1
```
...but it is not a balanced one, so finding the value 1 starting from the root of that tree will visit all of its nodes. If, however, the same values were given as a sorted list (1, 2, 5, 8, 10), a binary search would start at 5 and never visit 8 or 10.
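A small sketch contrasting the two (reusing the hypothetical Node class from the first sketch above): binary search always probes the middle of the remaining index range, while searching the degenerate BST follows the single chain of children:

```python
def binary_search(sorted_list, target):
    # Always probes the middle of the remaining range: O(log n) steps.
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def bst_search(node, target):
    # Follows one child pointer per step; in the degenerate tree above
    # that path contains every node, so the worst case is O(n).
    while node is not None:
        if target == node.value:
            return node
        node = node.left if target < node.value else node.right
    return None

# Searching for 1: binary_search([1, 2, 5, 8, 10], 1) probes 5, then 1,
# while bst_search on the degenerate tree visits 10, 8, 5, 2, 1.
```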
Adding self-balancing trees to the table
We can extend the given table with self-balancing search trees, like AVL, and then we get this:
| implementation | search | insert | delete |
| --- | --- | --- | --- |
| sequential search (unordered list) | N | N | N |
| binary search (ordered array) | lg N | N | N |
| BST | N | N | N |
| AVL | lg N | lg N | lg N |

An alternative method to create an AVL tree from a sorted array in O(n) time

I need some help with this data structures homework problem. I was asked to write an algorithm that creates an AVL tree from a sorted array in O(n) time.
I read this solution method: Creating a Binary Search Tree from a sorted array.
They do it recursively for the two halves of the sorted array, and it works.
I found a different solution and I want to check if it's valid.
My solution is to store an additional property of the root, called root.minimum, that contains a pointer to the minimum node.
Then, for the k-th element, we add it recursively to the AVL tree built from the previous k-1 elements. We know that the k-th element is smaller than the current minimum, so we add it to the left of root.minimum to create the new tree.
Now the tree is no longer balanced, but all we need to do to fix it is a single right rotation of the previous minimum.
This way each insertion takes O(1), and in total O(n).
Is this method valid to solve the problem?
Edit: I meant that I'm starting from the largest element and then adding the rest in descending order, so each element I add is smaller than everything already in the tree, and I add it to the left of root.minimum. Then all I have to do to rebalance the tree is a right rotation, which is O(1). Is this a correct solution?
If you pick a random element as the root in the first place (which is probably not the best idea, since we know the root should be the middle element), you store the root itself in root.minimum. Then, for each new element that is smaller than root.minimum, you do as you said and rebalance the tree in O(1) time. But what if it is larger? In that case you need to compare it with the root.minimum of the right child, and if it is also larger, with the root.minimum of the right child of the right child, and so on. This might take O(k) in the worst case, which results in O(n^2) overall. Also, this way you are not using the sorted property of the array.
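For reference, a minimal sketch of the recursive-halves approach linked above (reusing the hypothetical Node class from the first sketch; recursing on indices avoids the copying cost of array slicing, keeping the build O(n)):

```python
def build_balanced(values, lo=0, hi=None):
    # Pick the middle element as the root, then build each half the same
    # way; every element is processed exactly once, so the build is O(n),
    # and sibling subtree sizes differ by at most 1, so the result is
    # height-balanced (a valid AVL shape).
    if hi is None:
        hi = len(values) - 1
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    root = Node(values[mid])
    root.left = build_balanced(values, lo, mid - 1)
    root.right = build_balanced(values, mid + 1, hi)
    return root
```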

How is AVL tree insertion O(log n) when you need to recalculate balance factors up the tree after every insertion?

I'm implementing an AVL tree, and I'm trying to wrap my head around the time complexity of the adding process. It's my understanding that in order to achieve O(log n) you need to keep either balance or height state in tree nodes so that you don't have to recalculate them every time you need them (which may require a lot of additional tree traversal).
To solve this, I have a procedure that recursively "walks back up" the trail of parent pointers to the root, rebalancing if needed and setting heights along the way. This way, the addition algorithm has a kind of "capture" and "bubble" phase, down and then back up the tree, like DOM events.
My question is: is this still technically O(log n) time? Technically, you only deal with divisions of half at every level in the tree, but you also need to travel down and then back up every time. What is the exact time complexity of this operation?
Assume the height of the tree is H and the structure stays balanced throughout the operation.
Then, as you mentioned, inserting a node takes O(H).
However, every time a node is added to the AVL tree, you need to update the heights of its ancestors all the way up to the root node.
Since the tree is balanced, this update only traverses the linked-list-like path of ancestors ending at the newly inserted node.
Updating the heights is therefore equivalent to traversing a linked list of length H.
So updating the heights takes another O(H), and the total time is 2 * O(H), which is still O(log N) once we drop the constant factor.
Hope this makes sense to you.
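Here is a minimal sketch of that walk-back-up phase (assuming each node carries hypothetical parent and height fields, which are not in the original post); the loop runs once per ancestor, i.e., O(H) = O(log N) steps in a balanced tree:

```python
def update_heights_upward(node):
    # Walk parent pointers from the newly inserted node to the root,
    # recomputing each ancestor's cached height from its two children.
    # In a balanced tree this path has O(log N) nodes, so the walk-back-up
    # phase costs another O(log N) on top of the O(log N) descent.
    while node is not None:
        left_h = node.left.height if node.left else -1
        right_h = node.right.height if node.right else -1
        node.height = 1 + max(left_h, right_h)
        # A full AVL insert would also check the balance factor here
        # (left_h - right_h) and rotate when it leaves the range [-1, 1].
        node = node.parent
```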
"Technically, you only deal with divisions of half at every level in the tree, but you also need to travel down and then back up every time. What is the exact time complexity of this operation?"
You've stated that you have to travel down and up every time.
So, we can say that your function's runtime is upper-bounded by 2 * log n.
It's clear that this is O(log n).
More specifically, we could pick the constant c = 3 and the starting value n0 = 1, such that
2 * log n <= 3 * log n for all values of n >= 1.
This reduces to 2 <= 3, which is of course true.
The idea behind big-O is to capture the basic shape of the function that upper-bounds your function's runtime as the input size grows towards infinity; thus, we can drop the constant factor of 2.

Is binary search for an ordered list O(logN) in Elixir?

For an ordered list, binary search has O(log N) time complexity. However, in Elixir a list is a linked list, so in order to get the middle element you have to iterate N/2 times, which makes the overall search O(N log N).
So my question is:
Is the above time complexity correct?
If it is, binary search wouldn't make sense in Elixir, right? You have to iterate through the list to get what you want, so the best you can do is O(N).
Yes, there is little reason to binary search a linked list, for the reason you stated. You need a random-access data structure (usually an array) for binary search to be useful.
An interesting corner case arises when comparing elements is very costly, for example because they are just handles to remotely stored items. In that case binary search through a linked list might still outperform linear search: while it requires more operations overall (O(N * log(N))), it requires only O(log(N)) comparisons, whereas linear search requires O(N) comparisons.
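A rough sketch of that trade-off (the Cell class and helper names are illustrative, and it is written in Python rather than Elixir for brevity): binary search over the linked list performs O(N log N) pointer hops in total but only O(log N) calls to the costly comparison function:

```python
class Cell:
    def __init__(self, value, rest=None):
        self.value = value
        self.rest = rest

def nth(cell, n):
    # O(n) pointer hops to reach the n-th element of the linked list.
    for _ in range(n):
        cell = cell.rest
    return cell.value

def binary_search_list(head, length, target, compare):
    # O(N log N) traversal overall, but only O(log N) comparisons.
    lo, hi = 0, length - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        c = compare(nth(head, mid), target)
        if c == 0:
            return mid
        elif c < 0:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# Count how many times the (expensive) comparison actually runs.
calls = 0
def compare(a, b):
    global calls
    calls += 1
    return (a > b) - (a < b)

head = None
for v in reversed(range(1000)):   # build the list 0..999
    head = Cell(v, head)
print(binary_search_list(head, 1000, 617, compare), calls)  # 617, 10 comparisons
```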

Equality of two algorithms

Consider a tree of depth B (i.e., all paths have length B) whose nodes represent system states and whose edges represent actions.
Each action a in ActionSet has a gain and moves the system from one state to another.
Performing the sequence of actions A-B-C or C-B-A (or any other permutation of these actions) yields the same gain. Moreover:
- the more actions performed before a, the lower the increase in total gain when a is performed
- the gain achieved by each path cannot be greater than a quantity H, i.e., some paths may achieve a gain lower than H, but whenever performing an action makes the total gain equal to H, all further actions performed from that point on gain 0
- what is gained by the sequence of actions b, h, j, ..., a is g(a) (0 <= g(a) <= H)
- once an action has been performed on a path from the root to a leaf, it cannot be performed a second time on the same path
Application of Algorithm 1. I apply the following algorithm (A*-like):
- Start from the root.
- Expand the first level of the tree, which will contain all the actions in ActionSet. Each expanded action a has gain f(a) = g(a) + h(a), where g(a) is defined as stated before and h(a) is an estimate of what will be earned by performing the other B-1 actions.
- Select the action a* that maximizes f(a).
- Expand the children of a*.
- Iterate the selection and expansion steps until an entire path of B actions from the root to a leaf guaranteeing the highest f(n) has been visited. Note that the next best node can also be selected from nodes that were abandoned at previous levels; e.g., if after expanding a* the node maximizing f(a) is a child of the root, it is selected as the new best node.
Application of Algorithm 2. Now, suppose I have a greedy algorithm that looks only at the g(n) component of the knowledge-plus-heuristic function f(n), i.e., an algorithm that chooses actions according to the gain that has already been earned:
- at the first step, I choose the action a maximizing the gain g(a)
- at the second step, I choose the action b maximizing the gain g(b)
Claim. Experimental runs showed me that the two algorithms produce the same result, though possibly in a different order (e.g., the first one suggests the sequence A-B-C and the second one suggests B-C-A).
However, I didn't succeed in understanding why.
My question is: is there a formal way of proving that the two algorithms return the same result, albeit reordered in some cases?
Thank you.
A* search will return the optimal path. From what I understand of the problem, your greedy search is simply performing Bayes calculations and will continue to do so until it finds an optimal set of nodes to take. Since the order of the nodes does not matter, the two should return the same set of nodes, albeit in different orders.
I think this is correct, assuming you have the same set of actions available from every node.
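One way to see the claim experimentally is a toy model in which the total gain of a path depends only on the set of actions taken, capped at H. This model is an assumption consistent with the permutation-invariance described above, not something given in the original problem; under it, an exhaustive (A*-like optimal) search and the greedy strategy pick the same set:

```python
from itertools import permutations

# Hypothetical model: the total gain of a path depends only on the SET of
# actions taken, capped at H, so every permutation of the same set earns
# the same total gain.
H = 10
g = {"A": 4, "B": 3, "C": 5, "D": 1}   # per-action gains (made-up values)
B = 3                                   # path depth

def total_gain(path):
    return min(H, sum(g[a] for a in path))

# Exhaustive search over all length-B paths stands in for the A*-like
# optimum; the greedy strategy simply picks the highest g first.
best = max(permutations(g, B), key=total_gain)
greedy = sorted(g, key=g.get, reverse=True)[:B]

print(total_gain(best), total_gain(greedy))   # equal totals: 10 10
print(sorted(best), sorted(greedy))           # same set, order aside
```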