Which is a better time complexity? - time-complexity

I have to pick out which operations have a better worst case time complexity on an AVL tree than a BST. I have established that the time complexity for each operation is the same depending on the tree...
The worst case time complexity for an AVL tree is...
Insert - O(log(n))
Remove - O(log(n))
Search - O(log(n))
The worst case time complexity for a BST is....
Insert - O(height)
Remove - O(height)
Search - O(height)
So is O(log(n)) a better time complexity than O(height)?

The worst case time complexity on a BST for insert, remove, and search is O(n), where n is the number of nodes in the BST. You can trigger this worst case by inserting nodes into a BST in an order such that the BST is essentially a linked list (e.g. first insert 1, then insert 2, then insert 3, and so on... you will end up with a BST that looks like 1 -> 2 -> 3...).
O(log(n)) is a better time complexity than O(n).

O(log(n)) is best case scenario for O(height). Height of your binary tree can be any integer between log(n) and n where n denotes the number of nodes.
For example if you have a BST where each node has only right child, it is the same as if it were a linked list, thus having O(n) worst case complexity for all three operations.
On the other hand AVL is self-balancing binary search tree, meaning every two sub-trees from any node have the same depth (height) +- constant. That means you are approximately halving the values at every step, thus getting O(log(n)) complexity, which is also your O(height) complexity.

An AVL tree is basically a height balanced BST.
If you consider a full AVL tree, log n (AVL tree) > log n (BST).
-> where n is the number of nodes.
whereas when you consider O (height), it'll be the same in both the AVL and BST.
3
\
5
(BST)
height = 2 , n = 2
3
/ \
2 5
(AVL)
height = 2, n = 3

Related

Confused about time complexity of recursive function

For the following algorithm (this algorithm doesn't really do anything useful besides being an exercise in analyzing time complexity):
const dib = (n) => {
if (n <= 1) return;
dib(n-1);
dib(n-1);
I'm watching a video where they say the time complexity is O(2^n). If I count the nodes I can see they're right (the tree has around 32 nodes) however in my head I thought it would be O(n*2^n) since n is the height of the tree and each level has 2^n nodes. Can anyone point out the flaw in my thinking?
Each tree has 2^i nodes, not 2^n.
So each level has 2^(i-1) nodes: 1 + 2 + 4 + 8 ... 2^n.
The deepest level is the decider in the complexity.
The total number of nodes beneath any level > 1 is 1 + 2*f(i-1) .
This is 2^n - 1.
Derek's answer is great, it gives intuition behind the estimation, if you want formal proof, you can use the Master theorem for Decreasing functions.
The master theorem is a formula for solving recurrences of the form
T(n) = aT(n - b) + f(n), where a ≥ 1 and b > 0 and f(n) is
asymptotically positive. (Asymptotically
positive means that the function is positive for all sufficiently large n.
Recurrent formula for above algorithm is T(n) = 2*T(n-1) + O(1). Do you see why? You can see solution for various cases (a=1, a>1, a<1) here http://cs.uok.edu.in/Files/79755f07-9550-4aeb-bd6f-5d802d56b46d/Custom/Ten%20Master%20Method.pdf
For our case a>1, so T(n) = O(a^(n/b) * f(n)) or O (a^(n/b) * n^k ) and gives O(2^n)

If get inorder successor in BST takes O(h), why does iterative inorder traversal take O(n) when calling an O(h) function n times?

In a BST, it takes O(h) time complexity to get the inorder successor of a given node, so given getNext(), which gets the inorder successor of the current node, you would need n calls to getNext() to traverse the tree, giving O(nh) time complexity.
However, iterative inorder traversal of BST's are given in books to take O(n) time. I'm confused why there's a difference.
(Nodes have parent pointers).
Getting the inorder successor is O(h), but not Ө(h). Half of the nodes will have a successor just one link away. Let’s think about the number of pointer dereferences it takes to traverse a given node:
the number of pointer dereferences to traverse the left subtree, plus
the number to ascend back from the left subtree’s rightmost node = the height of the left subtree (at most), plus
one for the node itself, plus
the number to descend to the right subtree’s rightmost node = the height of the right subtree (at most), plus
the number for the right subtree.
So an upper bound is:
f(0) = 1
f(h) = f(h − 1) + (h − 1) + h + (h − 1) + f(h − 1)
    
 = 2f(h − 1) + 3h − 2
f(h) = 5·2h − 3h − 4
and h being ⌈log₂ n⌉, f is O(n).

Why Are Time Complexities Like O(N + N) Equal To O(N)? [duplicate]

This question already has answers here:
Why is the constant always dropped from big O analysis?
(7 answers)
Closed 2 years ago.
I commonly use a site called LeetCode for practice on problems. On a lot of answers in the discuss section of a problem, I noticed that run times like O(N + N) or O(2N) gets changed to O(N). For example:
int[] nums = {1, 2, 3, 4, 5};
for(int i = 0; i < nums.length; i ++) {
System.out.println(nums[i]);
}
for(int i = 0; i < nums.length; i ++) {
System.out.println(nums[i]);
}
This becomes O(N), even though it iterates through nums twice. Why is it not O(2N) or O(N + N)?
In time complexity, constant coefficients do not play a role. This is because the actual time it takes an algorithm to run depends also on the physical constraints of the machine. This means that if you run your code on a machine which is twice as fast as another, all other conditions being equal, it would run in about half the time with the same input.
But that’s not the same thing when you compare two algorithms with different time complexities. For example, when you compare the running time of an algorithm of O( N ^ 2 ) to an algorithm of O(N), the running time of O( N ^ 2 ) grows so fast with the growth of input size that the O(N) one cannot catch up with it, no matter how big you choose its constant coefficient.
Let’s say your constant coefficient is 1000, instead of just 2, then for input sizes of ( N > 1000 ) the running time of O( N ^ 2 ) algorithm becomes a proportion of ( N * N ) while N would be growing proportional to the input size, while the running time of the O(N) algorithm only remains proportional to ( 1000 * N ).
Time complexity for O(n+n) reduces to O(2n). Now 2 is a constant. So the time complexity will essentially depend on n.
Hence the time complexity of O(2n) equates to O(n).
Also if there is something like this O(2n + 3) it will still be O(n) as essentially the time will depend on the size of n.
Now suppose there is a code which is O(n^2 + n), it will be O(n^2) as when the value of n increases the effect of n will become less significant compared to effect of n^2.

time complexity for loop justification

Hi could anyone explain why the first one is True and second one is False?
First loop , number of times the loop gets executed is k times,
Where for a given n, i takes values 1,2,4,......less than n.
2 ^ k <= n
Or, k <= log(n).
Which implies , k the number of times the first loop gets executed is log(n), that is time complexity here is O(log(n)).
Second loop does not get executed based on p as p is not used in the decision statement of for loop. p does take different values inside the loop, but doesn't influence the decision statement, number of times the p*p gets executed, its time complexity is O(n).
O(logn):
for(i=0;i<n;i=i*c){// Any O(1) expression}
Here, time complexity is O(logn) when the index i is multiplied/divided by a constant value.
In the second case,
for(p=2,i=1,i<n;i++){ p=p*p }
The incremental increase is constant i.e i=i+1, the loop will run n times irrespective of the value of p. Hence the loop alone has a complexity of O(n). Considering naive multiplication p = p*p is an O(n) expression where n is the size of p. Hence the complexity should be O(n^2)
Let me summarize with an example, suppose the value of n is 8 then the possible values of i are 1,2,4,8 as soon as 8 comes look will break. You can see loop run for 3 times i.e. log(n) times as the value of i keeps on increasing by 2X. Hence, True.
For the second part, its is a normal loop which runs for all values of i from 1 to n. And the value of p is increasing be the factor p^2n. So it should be O(p^2n). Thats why it is wrong.
In order to understand why some algorithm is O(log n) it is enough to check what happens when n = 2^k (i.e., we can restrict ourselves to the case where log n happens to be an integer k).
If we inject this into the expression
for(i=1; i<2^k; i=i*2) s+=i;
we see that i will adopt the values 2, 4, 8, 16,..., i.e., 2^1, 2^2, 2^3, 2^4,... until reaching the last one 2^k. In other words, the body of the loop will be evaluated k times. Therefore, if we assume that the body is O(1), we see that the complexity is k*O(1) = O(k) = O(log n).

Segment tree - query complexity

I am having problems with understanding segment tree complexity. It is clear that if you have update function which has to change only one node, its complexity will be log(n).
But I have no idea why complexity of query(a,b), where (a,b) is interval that needs to be checked, is log(n).
Can anyone provide me with intuitive / formal proof to understand this?
There are four cases when query the interval (x,y)
FIND(R,x,y) //R is the node
% Case 1
if R.first = x and R.last = y
return {R}
% Case 2
if y <= R.middle
return FIND(R.leftChild, x, y)
% Case 3
if x >= R.middle + 1
return FIND(R.rightChild, x, y)
% Case 4
P = FIND(R.leftChild, x, R.middle)
Q = FIND(R.rightChild, R.middle + 1, y)
return P union Q.
Intuitively, first three cases reduce the level of tree height by 1, since the tree has height log n, if only first three cases happen, the running time is O(log n).
For the last case, FIND() divide the problem into two subproblems. However, we assert that this can only happen at most once. After we called FIND(R.leftChild, x, R.middle), we are querying R.leftChild for the interval [x, R.middle]. R.middle is the same as R.leftChild.last. If x > R.leftChild.middle, then it is Case 1; if x <= R.leftChild, then we will call
FIND ( R.leftChild.leftChild, x, R.leftChild.middle );
FIND ( R.leftChild.rightChild, R.leftChild.middle + 1, , R.leftChild.last );
However, the second FIND() returns R.leftChild.rightChild.sum and therefore takes constant time, and the problem will not be separate into two subproblems (strictly speaking, the problem is separated, though one subproblem takes O(1) time to solve).
Since the same analysis holds on the rightChild of R, we conclude that after case4 happens the first time, the running time T(h) (h is the remaining level of the tree) would be
T(h) <= T(h-1) + c (c is a constant)
T(1) = c
which yields:
T(h) <= c * h = O(h) = O(log n) (since h is the height of the tree)
Hence we end the proof.
This is my first time to contribute, hence if there are any problems, please kindly point them out and I would edit my answer.
A range query using a segment tree basically involves recursing from the root node. You can think of the entire recursion process as a traversal on the segment tree: any time a recursion is needed on a child node, you are visiting that child node in your traversal. So analyzing the complexity of a range query is equivalent to finding the upper bound for the total number of nodes that are visited.
It turns out that at any arbitrary level, there are at most 4 nodes that can be visited. Since the segment tree has a height of log(n) and that at any level there are at most 4 nodes that can be visited, the upper bound is actually 4*log(n). The time complexity is therefore O(log(n)).
Now we can prove this with induction. The base case is at the first level where the root node lies. Since the root node has at most two child nodes, we can only visit at most those two child nodes, which is at most 4 nodes.
Now suppose it is true that at an arbitrary level (say level i) we visit at most 4 nodes. We want to show that we will visit at most 4 nodes at the next level (level i+1) as well. If we had visited only 1 or 2 nodes at level i, it's trivial to show that at level i+1 we will visit at most 4 nodes because each node can have at most 2 child nodes.
So let's focus on the assumption that 3 or 4 nodes were visited at level i, and try to show that at level i+1 we can also have at most 4 visited nodes. Now since the range query is asking for a contiguous range, we know that the 3 or 4 nodes visited at level i can be categorized into 3 partitions of nodes: a leftmost single node whose segment range is only partially covered by the query range, a rightmost single node whose segment range is only partially covered by the query range, and 1 or 2 middle nodes whose segment range is fully covered by the query range. Since the middle nodes have their segment range(s) fully covered by the query range, there would be no recursion at the next level; we just use their precomputed sums. We are left with possible recursions on the leftmost node and the rightmost node at the next level, which is obviously at most 4.
This completes the proof by induction. We have proven that at any level at most 4 nodes are visited. The time complexity for a range query is therefore O(log(n)).
An interval of length n can be represented by k nodes where k <= log(n)
We can prove it based on how the binary system works.