Median of Medians using blocks of 3 - why is it not linear? - time-complexity

I understand why, in the worst case, the median-of-medians algorithm with blocks of size three gives the following recurrence relation, where T is the running time of the algorithm:
T(n) = T(2n / 3) + T(n / 3) + O(n)
The Wikipedia article for the median-of-medians algorithm says that with blocks of size three the runtime is not O(n) because it still needs to check all n elements. I don't quite understand this explanation, and in my homework it says I need to show it by induction.
How would I show that median-of-medians takes time Ω(n log n) in this case?

Since this is a homework problem I'm going to let you figure out a rigorous proof of this result on your own, but it might be helpful to think about this one by looking at the shape of the recursion tree, which will be something like this:
                 n                      Total work: n
        2n/3           n/3              Total work: n
   4n/9     2n/9   2n/9    n/9          Total work: n
Essentially, each node's children collectively will do the exact same amount of work as the node itself, so if you sum up the work done across the layers, you should see roughly linear work done per level. It won't be exactly linear work per level because eventually the smaller call starts to bottom out, but for the top layers you'll see this pattern hold.
You can formalize this by induction by guessing that the runtime is something of the form cn log n, possibly with some lower-order terms added in, but (IMHO) it's more important and instructive to see where the runtime comes from than it is to be able to prove it inductively.
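To see the per-level pattern numerically, here is a minimal Python sketch that evaluates the recurrence directly. The base case T(n) = n for n < 3, the floor rounding of the subproblem sizes, the non-recursive cost of exactly n, and the function names are all my assumptions for illustration, not part of the question.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def recurrence_cost(n: int) -> int:
    """Evaluate T(n) = T(2n/3) + T(n/3) + n with integer (floor) division."""
    if n < 3:  # assumed base case: tiny inputs cost n
        return n
    return recurrence_cost(2 * n // 3) + recurrence_cost(n // 3) + n


def ratio_to_n_log_n(n: int) -> float:
    """T(n) divided by n * log2(n); stays bounded if T(n) = Theta(n log n)."""
    return recurrence_cost(n) / (n * math.log2(n))


# Columns: n, T(n) / n, T(n) / (n log2 n)
for size in (10**3, 10**4, 10**5, 10**6):
    print(size, recurrence_cost(size) / size, ratio_to_n_log_n(size))
```

If the runtime were linear, the second column would level off. Instead it should keep growing as n grows, while the third column stays roughly constant, which is what a cn log n guess predicts.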

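The Master theorem does not apply directly here, because it needs a recurrence of the form T(n) = a*T(n/b) + f(n), and the two recursive calls in this recurrence are on subproblems of different sizes (2n/3 and n/3). What you can observe is that the fractions add up to 1: 2/3 + 1/3 = 1, so the subproblems on each level of the recursion together still contain about n elements. That is the balanced situation that Case 2 of the Master theorem describes. Its generalization to unequal subproblem sizes (the Akra-Bazzi method) asks for the exponent p with (2/3)^p + (1/3)^p = 1, which gives p = 1. With f(n) = Theta(n) = Theta(n^p), the result is T(n) = Theta(n^p * log(n)) = Theta(n*log(n)).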
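For the inductive step asked for in the homework, the guess T(n) >= c n log n can be checked by plugging it into the recurrence. A sketch of that substitution, assuming base-2 logarithms and a non-recursive cost of exactly n, and ignoring floors and base cases:

```latex
\begin{align*}
T(n) &= T\!\left(\tfrac{2n}{3}\right) + T\!\left(\tfrac{n}{3}\right) + n \\
     &\ge c\,\tfrac{2n}{3}\log_2\tfrac{2n}{3} + c\,\tfrac{n}{3}\log_2\tfrac{n}{3} + n \\
     &= c\,n\log_2 n - c\,n\left(\log_2 3 - \tfrac{2}{3}\right) + n \\
     &\ge c\,n\log_2 n \quad\text{whenever } 0 < c \le \frac{1}{\log_2 3 - 2/3}.
\end{align*}
```

The third line uses (2/3) log_2(2/3) + (1/3) log_2(1/3) = 2/3 - log_2 3.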

Related

When analyzing the worst case time complexity of search algorithms using Big O notation, why is the variable representing the input nonexistent?

Thanks for your willingness to help.
Straight to the point, I'm confused with the use of Big O notation when analyzing the worst case time complexity of search algorithms.
For example, the worst case time complexity of Alpha-Beta Pruning is O(b^d) where ^ means ~ to the power of ~, b representing the average branching factor and d representing the depth of the search tree.
I do get that the worst case time complexity would be less or equal to a positive constant multiplied by b^d, but why is the use of big O notation permitted here? Where did the variable n, the input size, go? I do know that the input of same size might cause significant difference in time complexity of an algorithm.
All of the research I've done only explains "the use of big O notation in the analysis of worst case time complexity" in terms of the growth function, a function that has variable y as time complexity and variable x as input size. There are also formal definitions of big O notation, which make me even more confused with the question above.
Any attempts to answer my question would be greatly appreciated.
The input size you refer to as n is in this case d. If n is the number of entries in your tree, d can be calculated as log_2(n), assuming your tree is a balanced binary tree.
Big O notation implies that you are discussing what the runtime would be for a very large n. In the case you noted, O(b^d), the n is the variable that changes with input size. In this case, d would be your n. As you've found, some notations make use of many variables.
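To make the b^d count concrete, here is a small sketch that counts the leaf positions a full search has to evaluate. It assumes a uniform tree in which every node has exactly b children and nothing is pruned, which is the situation the worst-case bound for alpha-beta describes; the function name is hypothetical.

```python
def count_leaves(branching_factor: int, depth: int) -> int:
    """Count the leaves visited by an unpruned search of a uniform tree."""
    if depth == 0:
        return 1  # a leaf position: one static evaluation
    # Every child is searched to the remaining depth, since nothing is pruned.
    return sum(count_leaves(branching_factor, depth - 1)
               for _ in range(branching_factor))


print(count_leaves(3, 4))
```

The count depends only on b and d, which is why the bound is written in terms of those two parameters rather than a single n.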
n is just a general term for the number of elements, but runtime could depend on many factors: the depth of a tree, or a different list entirely. For example, to traverse lists like this:
for n in firstList:
    for k in secondList:
        do stuff
the cost would be O(n*k).
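A runnable version of that loop, as a sketch: it counts how often the inner body runs, with the two list lengths playing the roles of n and k. The function name is made up for this example.

```python
def count_inner_steps(first_list, second_list) -> int:
    """Count how many times the body of the nested loop executes."""
    steps = 0
    for _ in first_list:
        for _ in second_list:
            steps += 1  # stands in for "do stuff"
    return steps


print(count_inner_steps(range(7), range(5)))
```

Doubling either list doubles the count, and doubling both quadruples it, so neither length alone describes the cost.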

Determine the running time of an algorithm with two parameters

I have implemented an algorithm that uses two other algorithms for calculating the shortest path in a graph: Dijkstra and Bellman-Ford. Based on the time complexity of these algorithms, I can calculate the running time of my implementation, which is easy given the code.
Now, I want to experimentally verify my calculation. Specifically, I want to plot the running time as a function of the size of the input (I am following the method described here). The problem is that I have two parameters - number of edges and number of vertices.
I have tried to fix one parameter and change the other, but this approach results in two plots - one for varying number of edges and the other for varying number of vertices.
This leads me to my question - how can I determine the order of growth based on two plots? In general, how can one experimentally determine the running time complexity of an algorithm that has more than one parameter?
It's very difficult in general.
The usual way you would experimentally gauge the running time in the single variable case is, insert a counter that increments when your data structure does a fundamental (putatively O(1)) operation, then take data for many different input sizes, and plot it on a log-log plot. That is, log T vs. log N. If the running time is of the form n^k you should see a straight line of slope k, or something approaching this. If the running time is like T(n) = n^{k log n} or something, then you should see a parabola. And if T is exponential in n you should still see exponential growth.
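A minimal sketch of that single-variable procedure in Python. The toy algorithm (a triangular double loop), the chosen input sizes and the function names are assumptions for illustration; the slope is fitted with ordinary least squares on the log-log data.

```python
import math


def count_pair_operations(n: int) -> int:
    """Toy algorithm: visit every pair (i, j) with j <= i and count the visits."""
    operations = 0
    for i in range(n):
        for _ in range(i + 1):
            operations += 1  # the "fundamental" O(1) operation
    return operations


def log_log_slope(sizes, counts) -> float:
    """Least-squares slope of log(count) plotted against log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


sizes = [100, 200, 400, 800]
counts = [count_pair_operations(n) for n in sizes]
print(log_log_slope(sizes, counts))  # close to 2 for this quadratic toy
```

The fitted slope comes out slightly below 2 because of the lower-order n/2 term, which illustrates why only the leading term can be recovered this way.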
You can only hope to get information about the highest order term when you do this -- the low order terms get filtered out, in the sense of having less and less impact as n gets larger.
In the two variable case, you could try to do a similar approach -- essentially, take 3 dimensional data, do a log-log-log plot, and try to fit a plane to that.
However this will only really work if there's really only one leading term that dominates in most regimes.
Suppose my actual function is T(n, m) = n^4 + n^3 * m^3 + m^4.
When m = O(1), then T(n, m) = O(n^4).
When n = O(1), then T(n, m) = O(m^4).
When n = m, then T(n, m) = O(n^6).
In each of these regimes, "slices" along the plane of possible n,m values, a different one of the terms is the dominant term.
So there's no way to determine the function just from taking some points with fixed m, and some points with fixed n. If you did that, you wouldn't get the right answer for n = m -- you wouldn't be able to discover "middle" leading terms like that.
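The three regimes can be checked numerically. This sketch measures the log-log slope of the example function along three paths through the (n, m) plane; the sample points 10^3 and 10^4 and the helper names are arbitrary choices of mine.

```python
import math


def running_time(n: float, m: float) -> float:
    """The example function T(n, m) = n^4 + n^3 * m^3 + m^4."""
    return n**4 + n**3 * m**3 + m**4


def slope_along(path, start: float = 1e3, end: float = 1e4) -> float:
    """Log-log slope of T along a path t -> (n, m), between two sample points."""
    t_start = running_time(*path(start))
    t_end = running_time(*path(end))
    return (math.log(t_end) - math.log(t_start)) / (math.log(end) - math.log(start))


fixed_m = slope_along(lambda t: (t, 1.0))  # vary n, keep m = 1
fixed_n = slope_along(lambda t: (1.0, t))  # vary m, keep n = 1
diagonal = slope_along(lambda t: (t, t))   # n = m
print(fixed_m, fixed_n, diagonal)
```

The two fixed-parameter slices both report a slope near 4, and nothing in them hints at the slope near 6 that appears on the diagonal.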
I would recommend that the best way to predict asymptotic growth when you have lots of variables / complicated data structures, is with a pencil and piece of paper, and do traditional algorithmic analysis. Or possibly, a hybrid approach. Try to break the question of efficiency into different parts -- if you can split the question up into a sum or product of a few different functions, maybe some of them you can determine in the abstract, and some you can estimate experimentally.
Luckily, two input parameters are still easy to visualize in a 3D scatter plot (the third dimension is the measured running time), and you can check whether it looks like a plane (in log-log-log scale) or whether it is curved. Naturally, random variations in the measurements play a role here as well.
In Matlab I typically calculate a least-squares solution to two-variable function like this (just concatenates different powers and combinations of x and y horizontally, .* is an element-wise product):
x = log(parameter_x);
y = log(parameter_y);
% Find a least-squares fit
p = [x.^2, x.*y, y.^2, x, y, ones(length(x),1)] \ log(time)
Then this can be used to estimate running times for larger problem instances, ideally those would be confirmed experimentally to know that the fitted model works.
This approach works also for higher dimensions but gets tedious to generate, maybe there is a more general way to achieve that and this is just a work-around for my lack of knowledge.
I was going to write my own explanation but it wouldn't be any better than this.

Master theorem with logn

Here's a problem.
I am really confused about the c being equal to 0.5 part. Actually overall I am confused how the logn can become n^(0.5). Couldn't I just let c be equal to 100 which would mean 100 < d which results in a different case being used? What am I missing here?
You of course could set c = 100 so that n^c is a (very, veeery) rough asymptotical upper bound to log(n), but this would give you a horrendous and absolutely useless estimate on your runtime T(n).
What it tells you is that every polynomial function n^c grows faster than the logarithm, no matter how small c is, as long as it remains positive. You could take c=0.0000000000001: it would seem to grow ridiculously slowly in the beginning, but at some point it would become larger than log(n) and diverge to infinity much faster than log(n) does. Therefore, in order to get rid of the n^2 log(n) term and be able to apply the polynomial-only version of the Master theorem, you upper bound the logarithmic term by something that grows slowly enough (but still faster than log(n)). In this example, n^c with c=0.5 is sufficient, but you could also take c=10^{-10000} "just to make sure".
Then you apply the Master theorem, and get a reasonable (and sharp) asymptotic upper bound for your T(n).
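A quick numerical illustration of "n^c eventually beats log(n)", as a sketch: the exponent c = 0.1, the sample values of n and the function name are my own choices, and the natural logarithm is used.

```python
import math


def log_over_power(n: float, c: float) -> float:
    """Return log(n) / n**c, which tends to 0 as n grows for any c > 0."""
    return math.log(n) / n**c


for n in (1e10, 1e50, 1e100, 1e300):
    print(n, log_over_power(n, 0.1))
```

At n = 10^10 the logarithm is still the larger of the two, so the ratio is above 1; for the larger sample values n^0.1 has overtaken it and the ratio shrinks toward 0.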

Asymptotic Running Time

for i = 1 ... n do
    j=1
    while j*j<=i do j=j+1
I need to find the asymptotic running time in theta(?) notation.
I found that
3(1) + 5(2) + 7(3) + 9(4).....+.......
and I tried to find the answer using summation by parts,
but I couldn't. Can anyone explain or give me some clue?
The overall complexity of the code snippet can be rewritten as:
for i = 1 to n
    do for j = 1 to floor(sqrt(i))
Hence, we get the overall complexity as sigma of sqrt(i) when i varies from 1 to n.
Unfortunately, there is no simple closed formula for a sum of square roots, so we have to approximate the sum by an integral.
Integrating sqrt(i) from 1 to n gives (2/3)(n sqrt(n) - 1), which is n sqrt(n) once constant factors and lower-order terms are ignored.
Hence, the overall time complexity of the loop is n sqrt(n).
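The estimate can be compared with an actual count. This sketch runs the loop from the question, counts executions of j = j + 1, and compares the total with (2/3) n sqrt(n); the function names are mine.

```python
import math


def count_inner_iterations(n: int) -> int:
    """Run the loop from the question and count executions of j = j + 1."""
    iterations = 0
    for i in range(1, n + 1):
        j = 1
        while j * j <= i:
            j += 1
            iterations += 1
    return iterations


def integral_estimate(n: int) -> float:
    """Leading term of the integral of sqrt(i) from 1 to n."""
    return (2 / 3) * n * math.sqrt(n)


n = 10_000
print(count_inner_iterations(n), integral_estimate(n))
```

For each i the inner loop runs floor(sqrt(i)) times, which reproduces the 3(1) + 5(2) + 7(3) + ... pattern from the question, and the total stays within a constant factor of n sqrt(n).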
Using Sigma notation, you may proceed methodically:
To obtain theta, you should find out the formula of the summation of floored square root of i (which is not obvious).
To be safe, I chose Big Oh.

does every algorithm have Big Omega?

does every algorithm have Big Omega?
Is it possible for algorithms to have both Big O and Big Omega (but not equal to each other- not Big Theta) ?
For instance, Quicksort's Big O is O(n log n). But does it have a Big Omega? If it does, how do I calculate it?
First, it is of paramount importance that one not confuse the bound with the case. A bound - like Big-Oh, Big-Omega, Big-Theta, etc. - says something about a rate of growth. A case says something about the kinds of input you're currently considering being processed by your algorithm.
Let's consider a very simple example to illustrate the distinction above. Consider the canonical "linear search" algorithm:
LinearSearch(list[1...n], target)
    for i := 1 to n do
        if list[i] = target then return i
    return -1
There are three broad kinds of cases one might consider: best, worst, and average cases for inputs of size n. In the best case, what you're looking for is the first element in the list (really, within any fixed number of the start of the list). In such cases, it will take no more than some constant amount of time to find the element and return from the function. Therefore, the Big-Oh and Big-Omega happen to be the same for the best case: O(1) and Omega(1). When both O and Omega apply, we also say Theta, so this is Theta(1) as well.
In the worst case, the element is not in the list, and the algorithm must go through all n entries. Since f(n) = n happens to be a function that is bounded from above and from below by the same class of functions (linear ones), this is Theta(n).
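The best and worst cases can be observed directly with an instrumented version of the search. This is a sketch, not the pseudocode itself: it uses 0-based indexes, and the comparison counter is an addition of mine.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 when target is absent."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons


data = [5, 3, 8, 1, 9]
print(linear_search(data, 5))  # best case: target is the first element
print(linear_search(data, 7))  # worst case: target is not in the list
```

The best case costs one comparison regardless of the list length, while the worst case costs one comparison per element.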
Average case analysis is usually a bit trickier. We need to define a probability space for viable inputs of length n. One might say that all valid inputs (where integers can be represented using 32 bits in unsigned mode, for instance) are equally probable. From that, one could work out the average performance of the algorithm as follows:
1. Find the probability that target is not represented in the list. Multiply by n.
2. Given that target is in the list at least once, find the probability that it appears at position k for each 1 <= k <= n. Multiply each P(k) by k.
3. Add up all of the above to get a function in terms of n.
Notice that in step 1 above, if the probability is non-zero, we will definitely get at least a linear function (exercise: we can never get more than a linear function). However, if the probability in step 1 is indeed zero, then the assignment of probabilities in step 2 makes all the difference in determining the complexity: you can have best-case behavior for some assignments, worst-case for others, and possibly end up with behavior that isn't the same as best (constant) or worst (linear).
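As an illustration of the three steps, here is a sketch for one particular probability assignment. The assumption that a present target is equally likely to first appear at each of the n positions is mine, as are the function and parameter names; other assignments give other averages.

```python
from fractions import Fraction


def expected_comparisons(n: int, p_absent: Fraction) -> Fraction:
    """Average comparisons of linear search under the assumed distribution."""
    # Step 1: the target is absent with probability p_absent, costing n comparisons.
    absent_part = p_absent * n
    # Step 2: otherwise the first match is at position k, each k equally likely (assumed).
    p_each_position = (1 - p_absent) * Fraction(1, n)
    present_part = sum(p_each_position * k for k in range(1, n + 1))
    # Step 3: add up the contributions.
    return absent_part + present_part


print(expected_comparisons(10, Fraction(1, 2)))
```

Under this assignment the average is linear in n for every value of p_absent, since even a target that is always present costs (n + 1) / 2 comparisons on average.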
Sometimes, we might speak loosely of a "general" or "universal" case, which considers all kinds of input (not just the best or the worst), but that doesn't give any particular weighting to inputs and doesn't take averages. In other words, you consider the performance of the algorithm in terms of an upper-bound on the worst-case, and a lower-bound on the best-case. This seems to be what you're doing.
Phew. Now, back to your question.
Are there functions which have different O and Omega bounds? Definitely. Consider the following function:
f(n) = 1 if n is odd, n if n is even.
The best case is "n is odd", in which case f is Theta(1); the worst case is "n is even", in which case f is Theta(n); and if we assume for the average case that we're talking about 32-bit unsigned integers, then f is Theta(n) in the average case, as well. However, if we talk about the "universal" case, then f is O(n) and Omega(1), and not Theta of anything. An algorithm whose runtime behaves according to f might be the following:
Strange(list[1...n], target)
    if n is odd then return target
    else return LinearSearch(list, target)
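A step-counting sketch of Strange shows the alternating behaviour. It counts one step for the immediate return and one step per element examined by the linear search; that way of counting, and the function name, are assumptions for this example.

```python
def strange_steps(items, target) -> int:
    """Count the steps Strange performs on the given list."""
    if len(items) % 2 == 1:
        return 1  # odd n: return immediately
    steps = 0
    for value in items:  # even n: fall back to a linear search
        steps += 1
        if value == target:
            break
    return steps


for n in (7, 8, 9, 10):
    print(n, strange_steps(list(range(n)), -1))  # -1 is never in the list
```

With an absent target the count jumps between 1 and n as n alternates between odd and even, so over all n it is O(n) and Omega(1) but has no single Theta bound.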
Now, a more interesting question might be whether there are algorithms for which some case (besides the "universal" case) cannot be assigned some valid Theta bound. This is interesting, but not overly so. The reason is that you, during your analysis, are allowed to choose the cases that constitute best- and worst-case behavior. If your first choice for the case turns out not to have a Theta bound, you can simply exclude the inputs that are "abnormal" for your purposes. The case and the bound aren't completely independent, in that sense: you can often choose a case such that it has "good" bounds.
But can you always do it?
I don't know, but that's an interesting question.
Does every algorithm have a Big Omega?
Yes. Big Omega is a lower bound. Any algorithm can be said to take at least constant time, so any algorithm is Ω(1).
Does every algorithm have a Big O?
No. Big O is an upper bound. Algorithms that don't (reliably) terminate don't have a Big O.
An algorithm has an upper bound if we can say that, in the absolute worst case, the algorithm will not take longer than this. I'm pretty sure O(∞) is not valid notation.
When will the Big O and Big Omega of an algorithm be equal?
There is actually a special notation for when they can be equal: Big Theta (Θ).
They will be equal if the algorithm scales perfectly with the size of the input (meaning there aren't input sizes where the algorithm is suddenly a lot more efficient).
This is assuming we take Big O to be the smallest possible upper bound and Big Omega to be the largest possible lower bound. This is not actually required from the definition, but they're commonly informally treated as such. If you drop this assumption, you can find a Big O and Big Omega that aren't equal for any algorithm.
Brute force prime number checking (where we just loop through all smaller numbers and try to divide them into the target number) is perhaps a good example of when the smallest upper bound and largest lower bound are not equal.
Assume you have some number n. Let's also for the time being ignore the fact that bigger numbers take longer to divide (a similar argument holds when we take this into account, although the actual complexities would be different). And I'm also calculating the complexity based on the number itself instead of the size of the number (which can be the number of bits, and could change the analysis here quite a bit).
If n is divisible by 2 (or some other small prime), we can very quickly check whether it's prime with 1 division (or a constant number of divisions). So the largest lower bound would be Ω(1).
Now if n is prime, we'll need to try to divide n by each of the numbers up to sqrt(n) (I'll leave the reason we don't need to go higher than this as an exercise). This would take O(sqrt(n)), which would also then be our smallest upper bound.
So the algorithm would be Ω(1) and O(sqrt(n)).
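The two extremes show up clearly in an instrumented trial division. This is a sketch under the same simplification as above (each division counts as one step); it expects n >= 2, and the function name is mine.

```python
def trial_division(n: int):
    """Return (is_prime, divisions) for n >= 2 using brute-force trial division."""
    divisions = 0
    divisor = 2
    while divisor * divisor <= n:  # no need to test divisors above sqrt(n)
        divisions += 1
        if n % divisor == 0:
            return False, divisions
        divisor += 1
    return True, divisions


print(trial_division(54))  # even number: settled by the first division
print(trial_division(53))  # prime: every divisor up to sqrt(53) is tried
```

The number of divisions depends on the value of n itself, not on a choice of input among several of the same size.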
Exact complexity also may be hard to calculate for some particularly complex algorithms. In such cases it may be much easier and acceptable to simply calculate some reasonably close lower and upper bounds and leave it at that. I don't however have an example on hand for this.
How does this relate to best case and worst case?
Do not confuse upper and lower bounds for best and worst case. This is a common mistake, and a bit confusing, but they're not the same. This is a whole other topic, but as a brief explanation:
The best and worst (and average) cases can be calculated for every single input size. The upper and lower bounds can then be used for each of those 3 cases (separately). You can think of each of those cases as a line on a graph with input size on the x-axis and time on the y-axis and then, for each of those lines, the upper and lower bounds are lines which need to be strictly above or below that line as the input size tends to infinity (this isn't 100% accurate, but it's a good basic idea).
Quick-sort has a worst-case of Θ(n^2) (when we pick the worst possible pivot at every step) and a best-case of Θ(n log n) (when we pick good pivots). Note the use of Big Theta, meaning each of those are both lower and upper bounds.
Let's compare quick-sort with the above prime checking algorithm:
Say you have a given number n, and n is 53. Since it's prime, it will (always) take around sqrt(53) steps to determine whether it's prime. So the best and worst cases are all the same.
Say you want to sort some array of size n, and n is 53. Now those 53 elements can be arranged such that quick-sort ends up picking really bad pivots and runs in around 53^2 steps (the worst case) or really good pivots and runs in around 53 log 53 steps (the best case). So the best and worst cases are different.
Now take n as 54 for each of the above:
For prime checking, it will only take around 1 step to determine that 54 is not prime. The best and worst cases are the same again, but they're different from what they were for 53.
For quick-sort, you'll again have a worst case of around 54^2 steps and a best case of around 54 log 54 steps.
So for quick-sort the worst case always takes around n^2 steps and the best case always takes around n log n steps. So the lower and upper (or "tight") bound of the worst case is Θ(n^2) and the tight bound of the best case is Θ(n log n).
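The gap between quick-sort's cases for a fixed n can be made visible by counting comparisons. This is a simplified, list-based sketch rather than an in-place quick-sort, and both pivot rules are assumptions of mine: always taking the first element (bad on sorted input) versus taking the true median (an idealized "good pivot", found here by sorting, which a real implementation would not do).

```python
def quicksort_comparisons(items, choose_pivot) -> int:
    """Count element-to-pivot comparisons made by a simple quick-sort."""
    if len(items) <= 1:
        return 0
    pivot = choose_pivot(items)
    rest = list(items)
    rest.remove(pivot)  # compare every other element against the pivot
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return (len(rest)
            + quicksort_comparisons(smaller, choose_pivot)
            + quicksort_comparisons(larger, choose_pivot))


def first_element(xs):
    return xs[0]


def true_median(xs):
    return sorted(xs)[len(xs) // 2]


already_sorted = list(range(53))
worst = quicksort_comparisons(already_sorted, first_element)
best = quicksort_comparisons(already_sorted, true_median)
print(worst, best)
```

With the first-element rule on sorted input, every partition is maximally lopsided and the count is 53 * 52 / 2; with median pivots the count is far smaller, even though n is 53 in both runs.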
For our prime checking, sometimes the worst case takes around sqrt(n) steps and sometimes it takes around 1 step. So the lower bound for the worst case would be Ω(1) and the upper bound would be O(sqrt(n)). It would be the same for the best case.
Note that above I simply said "the algorithm would be Ω(1) and O(sqrt(n))". This is slightly ambiguous, as it's not clear whether the algorithm always takes the same amount of time for some input size, or the statement is referring to one of the best, average or worst case.
How do I calculate this?
It's hard to give general advice for this since proofs of bounds are greatly dependent on the algorithm. You'd need to analyse the algorithm similar to what I did above to figure out the worst and best cases.
Big O and Big Omega can be calculated for every algorithm, as you can see in Big-oh vs big-theta.