How to prove the time complexity of quicksort is O(nlogn) - time-complexity

I don't understand the proof given in my textbook that the time complexity of Quicksort is O(n log n). Can anyone explain how to prove it?

Typical arguments that Quicksort's average case running time is O(n log n) involve arguing that in the average case each partition operation divides the input into two equally sized partitions. The partition operations take O(n) time. Thus each "level" of the recursive Quicksort implementation has O(n) time complexity (across all the partitions) and the number of levels is however many times you can iteratively divide n by 2, which is O(log n).
You can make the above argument rigorous in various ways, depending on how much rigor you want and on the background and mathematical maturity of your audience. A typical way to formalize it is to represent the number of comparisons required by the average case of a Quicksort call as a recurrence relation like
T(n) = O(n) + 2 * T(n/2)
which can be proved to be O(n log n) via the Master Theorem or other means.
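To see where the terms of the recurrence come from, here is a minimal quicksort sketch in Python (not from any particular textbook; the structure and names are my own). The list comprehensions are the O(n) partition work per call, and the two recursive calls are the 2 * T(n/2) term:

```python
import random

def quicksort(xs):
    # Base case: lists of length 0 or 1 are already sorted.
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)              # random pivot -> balanced splits on average
    left = [x for x in xs if x < pivot]    # O(n) partition work per call...
    mid = [x for x in xs if x == pivot]
    right = [x for x in xs if x > pivot]
    return quicksort(left) + mid + quicksort(right)   # ...the 2 * T(n/2) term

print(quicksort([5, 3, 8, 1, 9, 2, 7]))   # [1, 2, 3, 5, 7, 8, 9]
```

With a random pivot each split is balanced in expectation, so the recursion depth is O(log n) on average, giving the O(n log n) average case.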

Related

Effective time complexity for n bit addition and multiplication

I have done a course on Computer Architecture and it was mentioned that on the most efficient processors with n bit architecture word size the addition/subtraction of two words has a time complexity of O(log n) while multiplication/division has a time complexity of O(n).
If you do not consider any particular architecture word size, the best time complexity of addition/subtraction is O(n) (https://www.academia.edu/42811225/Fast_Arithmetic_Speeding_up_Multiplication_Division_and_Addition_of_n_Bit_Numbers), and multiplication/division seems to be O(n log n log log n) (Schönhage–Strassen, https://en.m.wikipedia.org/wiki/Multiplication_algorithm).
Is this correct?
O(log n) is the latency of addition if you can use n-bit wide parallel hardware with stuff like carry-select or carry-lookahead.
O(n) is the total amount of work that needs doing, and thus the time complexity with a fixed-width ALU for arbitrary bigint problems as n tends towards infinity.
For a multiply, there are n partial products in an n-bit multiply, so adding them all (with a Dadda tree, for example) takes on the order of O(log n) gate delays of latency. Integer addition is associative, so you can do it in parallel: e.g. (a+b) + (c+d) is 3 additions with a latency of only 2, and it gets better from there.
Dadda trees can avoid some of the carry-propagation latency, so I guess they avoid the extra factor of log n you'd get if you just used normal addition of each partial product separately.
See Differences between Wallace Tree and Dadda Multipliers for more about practical considerations for huge Dadda trees.
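To make the O(n)-work point concrete, here is a schoolbook bigint addition sketch in Python (the little-endian digit-list representation and base 10 are assumptions of this sketch, chosen for readability). The single serial carry chain is exactly the O(n) total work; wide parallel adder hardware attacks the latency of that chain, not the amount of work:

```python
def add_digits(a, b, base=10):
    # One pass over max(len(a), len(b)) digits: O(n) total work.
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s % base)
        carry = s // base                  # the serial carry chain
    if carry:
        out.append(carry)
    return out

# 99 + 1 = 100, as little-endian digit lists:
print(add_digits([9, 9], [1]))             # [0, 0, 1]
```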

How to calculate time complexity O(n) of the algorithm?

What I have done:
I measured the time spent processing 100, 1000, 10000, 100000, 1000000 items.
Measurements here: https://github.com/DimaBond174/cache_single_thread.
Then I assumed that O(n) increases in proportion to n, and calculated the remaining algorithms with respect to O(n).
Having time measurements for processing 100, 1000, 10000, 100000, 1000000 items how can we now attribute the algorithm to O(1), O(log n), O(n), O(n log n), or O(n^2) ?
Let's define N as the size of one of the possible inputs. An algorithm can have different Big-O values depending on which input you're referring to, but generally there's only one big input that you care about. Without the algorithm in question, you can only guess. However, there are some guidelines that will help you determine which it is.
General Rule:
O(1) - the speed of the program barely changes regardless of the size of the data. To get this, a program must not have loops operating on the data in question at all.
O(log N) - the program slows down only slightly when N increases dramatically, following a logarithmic curve. To get this, loops must only go through a fraction of the data (binary search, for example).
O(N) - the program's speed is directly proportional to the size of the data input. If you perform an operation on each unit of the data, you get this. You must not have any kind of nested loops (that act on the data).
O(N log N) - the program's speed is significantly reduced by larger input. This occurs when you have an O(log N) operation NESTED in a loop that would otherwise be O(N). For example, a loop that does a binary search for each unit of data.
O(N^2) - the program will slow down to a crawl with larger input and eventually stall with large enough data. This happens when you have NESTED loops. Same as above, but this time the nested loop is O(N) instead of O(log N).
So, try to think of each looping operation as O(N) or O(log N). Then, whenever loops are nested, multiply them together. If the loops are NOT nested, they are not multiplied like this: two loops separate from each other would simply be O(2N), which is just O(N), and not O(N^2).
Also remember that you may have loops under the hood, so you should think about them too. For example, if you call something like Arrays.sort(X) in Java, that is an O(N log N) operation, so if you have it inside a loop for some reason, your program is going to be a lot slower than you think.
Hope that answers your question.
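One way to automate that reasoning from your measurements is to check which candidate growth curve the timings track most closely. A rough sketch (this curve-matching heuristic is my assumption, not a rigorous fit; real timings are noisy, so treat the result as a guide only):

```python
import math

def guess_complexity(sizes, times):
    # For each model f, time / f(n) should be roughly constant if the model
    # fits; pick the model whose ratios have the smallest max/min spread.
    models = {
        "O(1)":       lambda n: 1.0,
        "O(log n)":   lambda n: math.log2(n),
        "O(n)":       lambda n: float(n),
        "O(n log n)": lambda n: n * math.log2(n),
        "O(n^2)":     lambda n: float(n) * n,
    }
    best, best_spread = None, float("inf")
    for name, f in models.items():
        ratios = [t / f(n) for n, t in zip(sizes, times)]
        spread = max(ratios) / min(ratios)   # 1.0 would be a perfect match
        if spread < best_spread:
            best, best_spread = name, spread
    return best

sizes = [100, 1000, 10000, 100000]
fake_quadratic = [n * n * 1e-9 for n in sizes]   # synthetic O(n^2) timings
print(guess_complexity(sizes, fake_quadratic))   # O(n^2)
```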

Quicksort omega-notation

The best case for Quicksort is n log n, but everyone uses Big-O notation to describe the best case as O(n log n). From my understanding of the notations, Quicksort is Big-Omega(n log n) and O(n^2). Is this correct, or am I misunderstanding Big-Omega notation?
Big-O and Big-Omega are ways of describing functions, not algorithms. Saying Quicksort is O(n^2) is ambiguous because you're not saying what property of the algorithm you're describing.
Algorithms can have best-case time complexities, and worst-case time complexities. These are the time complexities of the algorithm if the best or worst performing inputs are used for each input size.
This is different from Big-O, and Big-Omega which describe upper and lower bounds of a function.
The time-complexities are given as a function of the input size, which can have their own upper and lower bounds.
For example, if you knew the best-case wasn't any worse than nlogn, then you could say the best-case time complexity is O(nlogn). If you knew it was exactly nlogn then it would be more precise to say Theta(nlogn).
You have the details slightly off. Big-O is conventionally quoted for the worst case. QuickSort has an average case of O(n log n) and a worst case of O(n^2). Going by that convention, the Big-O of QuickSort would be n^2. But the chances of randomly getting that performance are practically zero (unless you run a naive QuickSort on an already sorted list), so QuickSort is usually quoted as O(n log n) even though that is not its worst-case bound.
Big-Omega, on the other hand, means "takes at least this long", so it is a lower bound. That means QuickSort is Big-Omega(n log n), and n^2 has nothing to do with QuickSort's Big-Omega.
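For reference, the standard formal definitions behind the notations (textbook definitions, not specific to Quicksort):

```latex
f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 : f(n) \le c \cdot g(n) \text{ for all } n \ge n_0
f(n) \in \Omega(g(n)) \iff \exists\, c > 0,\ n_0 : f(n) \ge c \cdot g(n) \text{ for all } n \ge n_0
f(n) \in \Theta(g(n)) \iff f(n) \in O(g(n)) \text{ and } f(n) \in \Omega(g(n))
```

Note that these are statements about functions, so they can be applied to the worst-case, average-case, or best-case running time separately.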

mergesort complexity O(nlogn) + O(n)?

When analyzing the time complexity of the merge sort, I know that since there are O(log(n)) levels and each level takes a O(n) operation, the entire time complexity should be O(nlog(n)).
However, doesn't the dividing take O(n) total? Each divide of a set of elements takes O(1), but you divide a total of O(n) times, so doesn't the dividing part of the merge sort take O(n)? For example, if you have 8 elements, you have to divide 7 times, and if you have 16 elements, you have to divide 15 times.
So, shouldn't the entire merge sort time complexity technically be O(nlog(n))+O(n)? I know that O(nlog(n) + n) is the same thing as O(nlog(n)) but no one seems to mention this in the explanation of the merge sort time complexity.
O(n log n + n) is the same thing as O(n log n). n log n grows faster than n, so the n term is extraneous.
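You can check the divide count directly. Here is a small top-down merge sort sketch in Python (the stats dict is my instrumentation, not part of the algorithm) that counts divide steps and confirms that splitting n elements down to singletons takes exactly n - 1 divides, which is the O(n) term that O(n log n) absorbs:

```python
def merge_sort(xs, stats=None):
    if stats is None:
        stats = {"divides": 0}
    if len(xs) <= 1:
        return xs, stats
    stats["divides"] += 1                 # one O(1) divide step
    mid = len(xs) // 2
    left, _ = merge_sort(xs[:mid], stats)
    right, _ = merge_sort(xs[mid:], stats)
    merged, i, j = [], 0, 0               # O(n) merge work across each level
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, stats

result, stats = merge_sort([8, 3, 5, 1, 7, 2, 6, 4])
print(stats["divides"])                    # 7 divides for 8 elements
```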

What is the worst case time complexity of median of medians quicksort?

What is the worst case time complexity of median of medians quicksort (where the pivot is determined by the median of medians, which takes O(n) time to find)?
According to Wiki,
The approximate median-selection algorithm can also be used as a pivot strategy in quicksort, yielding an optimal algorithm, with worst-case complexity O(n log n).
This is because the median of medians algorithm prevents the bad partitioning that would occur in naive quicksort on an already sorted array.
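The pivot-selection step is short enough to sketch. A minimal Python version of approximate median-of-medians selection (groups of five, following the standard textbook description; the driver code at the bottom is my own illustration), showing that even on an already sorted input the chosen pivot lands near the middle:

```python
def median_of_medians(xs):
    # Base case: small lists are sorted directly.
    if len(xs) <= 5:
        return sorted(xs)[len(xs) // 2]
    # Split into groups of 5 and take each group's median...
    groups = [xs[i:i + 5] for i in range(0, len(xs), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # ...then recursively take the median of those medians.
    return median_of_medians(medians)

xs = list(range(100, 0, -1))               # already sorted (descending) input
pivot = median_of_medians(xs)
smaller = sum(x < pivot for x in xs)
print(30 <= smaller <= 70)                 # True: the pivot is central
```

The guarantee is that at least roughly 3n/10 elements fall on each side of the pivot, which is what rules out the unbalanced partitions behind naive quicksort's O(n^2) behavior on sorted input.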