Why is B-Tree complexity O(log n) when it is not a binary tree? - time-complexity

According to Wiki and GFG the search/insert/delete time complexity of a B Tree is O(log n). A B Tree can have > 2 children, i.e. it is not a binary tree. So I don't understand why it is log n -- shouldn't it be faster than log n? For example search should be worst case O(h) where h is the height of the tree.

A B-Tree is a generalization of a binary tree in which each node can have more than 2 children, but that number is not fixed. If, for example, the number of children of every node were fixed at x, then the complexity would be O(log_x n). However, when the minimum number of children is 2 (as in a binary tree), the maximum height of the tree is log_2 n, and, as mentioned in the previous answer, Big-O considers the worst-case scenario, which is the tree with the largest height (log base 2). Therefore, the complexity of a B-Tree is O(log_2 n) = O(log n).

Yes, it is not a binary tree. But if we perform a binary search inside each node (over the keys stored in that node), the time complexity can still be considered O(log n).
Let's consider:
degree of the B-tree (maximum number of children per node) ≤ m
total number of nodes in the tree: n
In that case,
the height of the tree is O(log_m n) ----------(1)
since the number of children can vary per node, we have to do a logarithmic search inside each node, of order O(lg m) ----------(2)
so the total complexity for search in a B-tree is
O(log_m n) · O(lg m) ----------(3)
Using the change-of-base rule for logarithms,
log_b a = log_c a / log_c b
and applying it to (3):
O(log_m n) · O(lg m) = O(lg n / lg m) · O(lg m)
= O(lg n)
So the B-tree time complexity for the search operation is O(lg n).
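To make the per-node binary search concrete, here is a minimal Python sketch (the node layout, with a sorted keys list and a parallel children list, is an assumption for illustration, not any particular library's B-tree API):

    import bisect

    class BTreeNode:
        def __init__(self, keys, children=None):
            self.keys = keys              # sorted list of keys stored in this node
            self.children = children      # list of len(keys) + 1 children, or None for a leaf

    def btree_search(node, key):
        # Binary search among the (at most m - 1) keys of this node: O(lg m).
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node                   # key found in this node
        if node.children is None:
            return None                   # reached a leaf without finding the key
        # Descend into the appropriate child; there are O(log_m n) levels.
        return btree_search(node.children[i], key)

Each level costs O(lg m) and there are O(log_m n) levels, which multiplies out to exactly the O(lg n) bound derived above.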

Big-O is a measure of worst case complexity; since B-tree nodes are not required to have more than 2 children, the worst case will be that no node has more than 2 children.

Related

Time Complexity Analysis of Recursive Tree Traversal Algorithm

I am supposed to design a recursive algorithm that traverses a tree and sets the cardinality property for every node. The cardinality property is the number of nodes that are in the subtree where the currently traversed node is the root.
Here's my algorithm in pseudo/ python code:
    def SetCardinality(node):
        if node is not None:
            node.card = 1 + SetCardinality(node.left_child) + SetCardinality(node.right_child)
            return node.card
        else:
            return 0
I'm having a hard time coming up with the recurrence relation that describes this function. I figured out that the worst-case input would be a tree of height = n. I saw on the internet that a recurrence relation for such a tree in this algorithm might be:
T(n) = T(n-1) + n but I don't know how the n in the relation corresponds to the algorithm.
You have to ask yourself: How many nodes does the algorithm visit? You will notice that if you run your algorithm on the root node, it will visit each node exactly once, which is expected as it is essentially a depth-first search.
Therefore, if the rest of your algorithm is constant-time operations, we have a time complexity of O(n) for the total number of nodes n.
Now, if you want to express it in terms of the height of the tree, you need to know more about the given tree. If it's a complete binary tree then the height is O(log n) and therefore the time complexity would be O(2^h). But expressing it in terms of total nodes is simpler. Notice also that the shape of the tree does not really matter for your time complexity, as you will be visiting each node exactly once regardless.
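As a quick way to see the O(n) behavior, here is a usage sketch that reuses the corrected SetCardinality from the question together with a minimal, hypothetical Node class (not part of the original question):

    class Node:
        def __init__(self, left_child=None, right_child=None):
            self.left_child = left_child
            self.right_child = right_child
            self.card = 0

    root = Node(Node(Node()), Node())   # a small tree with 4 nodes
    print(SetCardinality(root))         # 4: the root's subtree contains all 4 nodes
    print(root.left_child.card)         # 2: the left subtree contains 2 nodes

Every node is visited exactly once, plus one call per empty child slot, so the total work is O(n) regardless of the tree's shape.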

What sorts of algorithmic patterns are polylogarithmic? [duplicate]

This earlier question addresses some of the factors that might cause an algorithm to have O(log n) complexity.
What would cause an algorithm to have time complexity O(log log n)?
O(log log n) terms can show up in a variety of different places, but there are typically two main routes that will arrive at this runtime.
Shrinking by a Square Root
As mentioned in the answer to the linked question, a common way for an algorithm to have time complexity O(log n) is for that algorithm to work by repeatedly cutting the size of the input down by some constant factor on each iteration. If this is the case, the algorithm must terminate after O(log n) iterations, because after doing O(log n) divisions by a constant, the algorithm must shrink the problem size down to 0 or 1. This is why, for example, binary search has complexity O(log n).
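For reference, here is the standard iterative binary search, a minimal sketch of that "halve the input on every iteration" pattern:

    def binary_search(sorted_list, target):
        lo, hi = 0, len(sorted_list) - 1
        while lo <= hi:
            mid = (lo + hi) // 2          # each iteration halves the remaining range
            if sorted_list[mid] == target:
                return mid
            elif sorted_list[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1                         # target not present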
Interestingly, there is a similar way of shrinking down the size of a problem that yields runtimes of the form O(log log n). Instead of dividing the input in half at each layer, what happens if we take the square root of the size at each layer?
For example, let's take the number 65,536. How many times do we have to divide this by 2 until we get down to 1? If we do this, we get
65,536 / 2 = 32,768
32,768 / 2 = 16,384
16,384 / 2 = 8,192
8,192 / 2 = 4,096
4,096 / 2 = 2,048
2,048 / 2 = 1,024
1,024 / 2 = 512
512 / 2 = 256
256 / 2 = 128
128 / 2 = 64
64 / 2 = 32
32 / 2 = 16
16 / 2 = 8
8 / 2 = 4
4 / 2 = 2
2 / 2 = 1
This process takes 16 steps, and it's also the case that 65,536 = 2^16.
But, if we take the square root at each level, we get
√65,536 = 256
√256 = 16
√16 = 4
√4 = 2
Notice that it only takes four steps to get all the way down to 2. Why is this?
First, an intuitive explanation. How many digits are there in the numbers n and √n? There are approximately log n digits in the number n, and approximately log(√n) = log(n^(1/2)) = (1/2) log n digits in √n. This means that each time you take a square root, you roughly halve the number of digits in the number. Because you can only halve a quantity k O(log k) times before it drops down to a constant (say, 2), you can only take square roots O(log log n) times before you've reduced the number down to some constant (say, 2).
Now, let's do some math to make this rigorous. Let's rewrite the above sequence in terms of powers of two:
√65,536 = √(2^16) = (2^16)^(1/2) = 2^8 = 256
√256 = √(2^8) = (2^8)^(1/2) = 2^4 = 16
√16 = √(2^4) = (2^4)^(1/2) = 2^2 = 4
√4 = √(2^2) = (2^2)^(1/2) = 2^1 = 2
Notice that we followed the sequence 2^16 → 2^8 → 2^4 → 2^2 → 2^1. On each iteration, we cut the exponent of the power of two in half. That's interesting, because this connects back to what we already know - you can only divide a number k in half O(log k) times before it drops down to 1.
So take any number n and write it as n = 2^k. Each time you take the square root of n, you halve the exponent in this equation. Therefore, only O(log k) square roots can be applied before k drops to 1 or lower (in which case n drops to 2 or lower). Since n = 2^k, we have k = log_2 n, and therefore the number of square roots taken is O(log k) = O(log log n). Therefore, if there is an algorithm that works by repeatedly reducing the problem to a subproblem whose size is the square root of the original problem size, that algorithm will terminate after O(log log n) steps.
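A tiny script makes the contrast concrete (the helper names are just for illustration):

    import math

    def halving_steps(n):
        steps = 0
        while n > 1:
            n //= 2                 # shrink by a constant factor -> O(log n) steps
            steps += 1
        return steps

    def sqrt_steps(n):
        steps = 0
        while n > 2:
            n = math.isqrt(n)       # shrink to the square root -> O(log log n) steps
            steps += 1
        return steps

    print(halving_steps(65536))     # 16
    print(sqrt_steps(65536))        # 4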
One real-world example of this is the van Emde Boas tree (vEB-tree) data structure. A vEB-tree is a specialized data structure for storing integers in the range 0 ... N - 1. It works as follows: the root node of the tree has √N pointers in it, splitting the range 0 ... N - 1 into √N buckets each holding a range of roughly √N integers. These buckets are then each internally subdivided into √(√ N) buckets, each of which holds roughly √(√ N) elements. To traverse the tree, you start at the root, determine which bucket you belong to, then recursively continue in the appropriate subtree. Due to the way the vEB-tree is structured, you can determine in O(1) time which subtree to descend into, and so after O(log log N) steps you will reach the bottom of the tree. Accordingly, lookups in a vEB-tree take time only O(log log N).
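Here is a heavily simplified sketch of just that recursive bucket structure (membership only, with no summary structures or successor support, so it is not a real vEB-tree; all names are invented for illustration). The point is only that each level shrinks the universe from U to roughly √U, so the recursion depth is O(log log U):

    class VEBSketch:
        def __init__(self, universe_bits):
            # universe_bits = number of bits in a key; universe size U = 2 ** universe_bits
            self.bits = universe_bits
            if universe_bits <= 1:
                self.present = [False, False]    # base case: universe of size at most 2
            else:
                self.low_bits = universe_bits // 2   # sub-universe of roughly sqrt(U) values
                self.clusters = {}                   # high half of the key -> sub-structure

        def insert(self, x):
            if self.bits <= 1:
                self.present[x] = True
                return
            hi, lo = x >> self.low_bits, x & ((1 << self.low_bits) - 1)
            if hi not in self.clusters:
                self.clusters[hi] = VEBSketch(self.low_bits)
            self.clusters[hi].insert(lo)

        def contains(self, x):
            if self.bits <= 1:
                return self.present[x]
            hi, lo = x >> self.low_bits, x & ((1 << self.low_bits) - 1)
            return hi in self.clusters and self.clusters[hi].contains(lo)

    s = VEBSketch(16)                  # keys in the range 0 .. 2**16 - 1
    s.insert(12345)
    print(s.contains(12345), s.contains(54321))   # True False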
Another example is the Hopcroft-Fortune closest pair of points algorithm. This algorithm attempts to find the two closest points in a collection of 2D points. It works by creating a grid of buckets and distributing the points into those buckets. If at any point in the algorithm a bucket is found that has more than √N points in it, the algorithm recursively processes that bucket. The maximum depth of the recursion is therefore O(log log n), and using an analysis of the recursion tree it can be shown that each layer in the tree does O(n) work. Therefore, the total runtime of the algorithm is O(n log log n).
O(log n) Algorithms on Small Inputs
There are some other algorithms that achieve O(log log n) runtimes by using algorithms like binary search on objects of size O(log n). For example, the x-fast trie data structure performs a binary search over the layers of a tree of height O(log U), so the runtime of some of its operations is O(log log U). The related y-fast trie gets some of its O(log log U) runtimes by maintaining balanced BSTs of O(log U) nodes each, allowing searches in those trees to run in time O(log log U). The tango tree and related multisplay tree data structures end up with an O(log log n) term in their analyses because they maintain trees that contain O(log n) items each.
Other Examples
Other algorithms achieve runtime O(log log n) in other ways. Interpolation search has expected runtime O(log log n) to find a number in a sorted array, but the analysis is fairly involved. Ultimately, the analysis works by showing that the number of iterations is equal to the number k such that n^(2^-k) ≤ 2, for which k = log log n is the correct solution. Some algorithms, like the Cheriton-Tarjan MST algorithm, arrive at a runtime involving O(log log n) by solving a complex constrained optimization problem.
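For completeness, here is a standard interpolation search sketch (the expected O(log log n) bound holds for uniformly distributed keys; the worst case degrades to O(n)):

    def interpolation_search(arr, target):
        lo, hi = 0, len(arr) - 1
        while lo <= hi and arr[lo] <= target <= arr[hi]:
            if arr[hi] == arr[lo]:                    # all remaining keys are equal
                return lo if arr[lo] == target else -1
            # Probe where the target "should" sit if the keys were spread uniformly.
            pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
            if arr[pos] == target:
                return pos
            elif arr[pos] < target:
                lo = pos + 1
            else:
                hi = pos - 1
        return -1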
One way a factor of O(log log n) shows up in a time complexity is through repeated shrinking, as explained in the other answer. But there is another way this factor appears: when we trade time off against space, approximation quality, hardness, and so on, and the algorithm has some tunable iteration or partition size.
For example, SSSP (single-source shortest path) has an O(n) algorithm on planar graphs, but before that complicated algorithm there was a much easier (though still rather hard) algorithm with running time O(n log log n). The idea is as follows (this is only a very rough description, and you may prefer to skip this part and read the rest of the answer):
Divide the graph into parts of size O(log n / log log n), subject to some restrictions.
Treat each part as a node of a new graph G', and compute SSSP for G' in time O(|G'| * log |G'|); because |G'| = O(n * log log n / log n), this is where the (log log n) factor appears.
Compute SSSP inside each part: there are O(|G'|) parts, and a part of size O(log n / log log n) can be handled in time O((log n / log log n) * log(log n / log log n)), so all parts together take O(n log log n).
Update the weights; this part can be done in O(n).
For more details, these lecture notes are good.
But my point is that here we chose the parts to have size O(log n / log log n). If we chose a different division, say O(log n / (log log n)^2), the algorithm might run faster and give a different bound. I mean that in many cases (approximation algorithms, randomized algorithms, or algorithms like the SSSP one above), when we iterate over something (subproblems, candidate solutions, ...), we choose the number of iterations according to the trade-off we are making (time, space, complexity of the algorithm, constant factors, ...). So we may see terms even more complicated than log log n in real, working algorithms.

Breadth first or Depth first

There is a theory that says six degrees of separation is the highest degree
needed for people to be connected through a chain of acquaintances.
(You know the baker - degree of separation 1; the baker knows someone
you don't know - degree of separation 2.)
We have a list of People P, list A of corresponding acquaintances
among these people, and a person x
We are trying to implement an algorithm to check if person x respects
the six degrees of separations. It returns true if the distance from x
to all other people in P is at most six, false otherwise.
We are trying to accomplish O(|P| + |A|) in the worst case.
To implement this algorithm, I thought about using an adjacency list rather than an adjacency matrix to represent the graph G with vertices P and edges A, because an adjacency matrix would take O(n^2) to traverse.
Now I thought about using either BFS or DFS, but I can't seem to find a reason why one would be better than the other for this case.
I want to use BFS or DFS to store the distances from x in an array d, and then loop over the array d to look if any Degree is larger than 6.
DFS and BFS have the same time complexity, but depth-first is better (faster?) in most cases at finding the first degree larger than 6, whereas breadth-first is better at excluding all degrees > 6 simultaneously.
After DFS or BFS I would then loop over the array containing the distances from person x, and return true if there was no entry >6 and false when one is found.
With BFS, the largest degrees of separation would always be at the end of the array, which would maybe lead to a higher time complexity?
With DFS, the degrees of separation would be scattered throughout the array, but the chance of hitting a degree of separation higher than 6 early in the search is higher.
I don't know if it makes any difference to the Time Complexity if using DFS or BFS here.
Time complexity of BFS and DFS is exactly the same. Both methods visit all connected vertices of the graph, so in both cases you have O(V + E), where V is the number of vertices and E is the number of edges.
That being said, sometimes one algorithm can be preferred over the other precisely because the order of vertex visitation is different. For instance, if you were to evaluate a mathematical expression, DFS would be much more convenient.
In your case, BFS could be used to optimize the graph traversal, because you can simply cut off the BFS at the required degree-of-separation level. All the people who have the required (or a bigger) degree of separation would be left unmarked as visited.
The same trick would be much more convoluted to implement with DFS, because as you've astutely noticed, DFS first gets "to the bottom" of the graph, and then it goes back recursively (or via stack) up level by level.
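A minimal sketch of that BFS-with-cutoff approach (using a plain dict-of-lists adjacency list; the function and variable names are just illustrative):

    from collections import deque

    def within_six_degrees(adjacency, x):
        # adjacency: dict mapping each person to a list of acquaintances
        dist = {x: 0}
        queue = deque([x])
        while queue:
            person = queue.popleft()
            if dist[person] == 6:
                continue                  # cut-off: never expand beyond degree 6
            for friend in adjacency.get(person, []):
                if friend not in dist:
                    dist[friend] = dist[person] + 1
                    queue.append(friend)
        # True only if every person was reached within six steps.
        return all(p in dist for p in adjacency)

Each person and each acquaintance pair is processed at most once, so this stays within the required O(|P| + |A|).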
I believe that you can use Dijkstra's algorithm.
It is a BFS-like approach that updates a path whenever a shorter one is found. Think of every edge as having a cost of 1, and suppose a person N has two friends A and B who share a common friend C.
If your algorithm first reaches C through A at a distance of 4 and marks C as visited, it can no longer discover that the path through B reaches C at a distance of 3. Dijkstra's algorithm takes care of this check for you.
Dijkstra's algorithm solves this in O(|V| + |E| log |V|).
See more at https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
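For reference, a minimal Dijkstra sketch with a binary heap (edge weights are all 1 here to match the degrees-of-separation setting; the names are illustrative only):

    import heapq

    def dijkstra(adjacency, source):
        # adjacency: dict mapping each vertex to a list of (neighbor, weight) pairs
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                  # stale heap entry, skip it
            for v, w in adjacency.get(u, []):
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w       # found a shorter path to v
                    heapq.heappush(heap, (dist[v], v))
        return dist

    graph = {"x": [("a", 1), ("b", 1)], "a": [("c", 1)], "b": [], "c": []}
    print(dijkstra(graph, "x"))           # {'x': 0, 'a': 1, 'b': 1, 'c': 2}

Note that with unit edge weights, as here, plain BFS already produces the same distances; Dijkstra only becomes necessary when edges can have different costs.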

What is the worst case complexity of the given program?

The program takes as input a balanced binary search tree with n leaf nodes and computes the value of a function g(x) for each node x. If the cost of computing g(x) is min{number of leaf nodes in the left subtree of x, number of leaf nodes in the right subtree of x}, then what is the worst-case time complexity of the program?
My solution :
For each internal node the maximum cost would be n/2. For every leaf node the cost is zero.
And the number of internal nodes in a balanced binary tree is: leaf nodes - 1
So the total cost will be:
n/2 * (n-1)
O(n^2)
Am I right?
I have a different solution. In the worst case the tree would be a complete tree, which you got right, and so the cost at the root is n/2 (each of its subtrees contains n/2 of the n leaves).
But that is only the cost of the root node.
For the nodes at level 1: the total cost would be 2 * n/4 = n/2
For the nodes at level 2: the total cost would be 4 * n/8 = n/2
and so on down to the last level, which is level log(n), where the cost is again n/2
The total cost in the worst case is therefore (n/2) * log(n) = O(n log n). In a complete tree the last level may not be completely filled, but in asymptotic analysis we ignore such intricate details.
You can calculate g(x) for each node using a divide-and-conquer technique: solve the left and right subtrees recursively, then combine the results to obtain your solution.
The recurrence relation for the time complexity is
T(n) = 2*T(n/2) + O(1)
which gives time complexity O(n)
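As a quick sanity check of the O(n log n) answer above, here is a short sketch that computes each subtree's leaf count bottom-up and sums the per-node cost min(leaves on the left, leaves on the right) over a complete tree (the Node class and helper names are made up for illustration):

    class Node:
        def __init__(self, left=None, right=None):
            self.left, self.right = left, right

    def build_complete_tree(leaves):
        # Builds a complete binary tree with `leaves` leaf nodes (a power of two).
        if leaves == 1:
            return Node()
        return Node(build_complete_tree(leaves // 2), build_complete_tree(leaves // 2))

    def total_cost(node):
        # Returns (leaf count of this subtree, total g-cost inside this subtree).
        if node.left is None and node.right is None:
            return 1, 0                   # a leaf has no children, so its cost is 0
        left_leaves, left_cost = total_cost(node.left)
        right_leaves, right_cost = total_cost(node.right)
        return (left_leaves + right_leaves,
                left_cost + right_cost + min(left_leaves, right_leaves))

    for n in (8, 64, 1024):
        _, cost = total_cost(build_complete_tree(n))
        print(n, cost, (n // 2) * (n.bit_length() - 1))   # cost equals (n/2) * log2(n)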

Using red-black tree for non-linear optimization

Suppose we have a finite data set {(x_i, y_i)}.
I am looking for an efficient data structure for the data set such that, given a and b, it is possible to efficiently find x, y with x > a, y > b and x*y minimal.
Can it be done using a red black tree ?
Can we do it in complexity O(log n)?
Well, without a preprocessing step, of course you can't do it in O(log n) with any data structure, because that's not enough time to look at all the data. So I'll assume you mean "after preprocessing".
Red-black trees are intended for single-dimension sorting, and so are unlikely to be of much help here. I would expect a kD-tree to work well here, using a nearest-neighbor-esque query: You perform a depth-first traversal of the tree, skipping branches whose bounding rectangle either violates the x and y conditions or cannot contain a lower product than the lowest admissible product already found. To speed things up further, each node in the kD-tree could additionally hold the lowest product among any of its descendants; intuitively I would expect that to have a practical benefit, but not to improve the worst-case time complexity.
Incidentally, red-black trees aren't generally useful for preprocessed data. They're designed for efficient dynamic updates, which of course you won't be doing on a per-query basis. They offer the same asymptotic depth guarantees as other sorted trees, but with a higher constant factor.
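A rough sketch of the kD-tree pruning search described above, assuming all coordinates are positive so that max(xmin, a) * max(ymin, b) is a valid lower bound on x*y inside a bounding box (the tree is built naively, and every name here is invented for illustration):

    class KDNode:
        def __init__(self, point, axis, left, right, bbox):
            self.point = point            # (x, y)
            self.axis = axis              # 0 splits on x, 1 splits on y
            self.left, self.right = left, right
            self.bbox = bbox              # (xmin, xmax, ymin, ymax) of this subtree

    def build(points, depth=0):
        if not points:
            return None
        axis = depth % 2
        points = sorted(points, key=lambda p: p[axis])
        mid = len(points) // 2
        xs, ys = [p[0] for p in points], [p[1] for p in points]
        return KDNode(points[mid], axis,
                      build(points[:mid], depth + 1),
                      build(points[mid + 1:], depth + 1),
                      (min(xs), max(xs), min(ys), max(ys)))

    def query(node, a, b, best=None):
        # Returns the point with x > a, y > b and minimal x*y, or None if none exists.
        if node is None:
            return best
        xmin, xmax, ymin, ymax = node.bbox
        if xmax <= a or ymax <= b:
            return best                   # no admissible point anywhere in this subtree
        if best is not None and max(xmin, a) * max(ymin, b) >= best[0] * best[1]:
            return best                   # this subtree cannot beat the best product so far
        x, y = node.point
        if x > a and y > b and (best is None or x * y < best[0] * best[1]):
            best = node.point
        best = query(node.left, a, b, best)
        best = query(node.right, a, b, best)
        return best

    pts = [(3, 7), (5, 2), (6, 6), (9, 1), (4, 5)]
    tree = build(pts)
    print(query(tree, 3, 2))              # (4, 5): product 20 beats (6, 6) -> 36

Storing the minimum product of each subtree in its node, as suggested above, would simply add one more pruning test before recursing.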