mergesort complexity O(nlogn) + O(n)? - time-complexity

When analyzing the time complexity of merge sort, I know that since there are O(log(n)) levels and each level takes O(n) work, the entire time complexity should be O(nlog(n)).
However, doesn't dividing take O(n) total? Each division of the set of elements takes O(1), but you divide a total of O(n) times, so doesn't the dividing part of merge sort take O(n)? For example, if you have 8 elements, you have to divide 7 times, and if you have 16 elements, you have to divide 15 times.
So, shouldn't the entire merge sort time complexity technically be O(nlog(n))+O(n)? I know that O(nlog(n) + n) is the same thing as O(nlog(n)), but no one seems to mention this in explanations of the merge sort time complexity.

O(n log n + n) is the same thing as O(n log n). n log n grows faster than n, so the n term is extraneous.
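To make the two terms concrete, here is a minimal sketch that tallies both kinds of work. The names merge_sort and count_work are made up for this illustration, and each split is counted as one O(1) step, following the model in the question (Python's list slicing itself copies elements, which the sketch ignores).

```python
def merge_sort(items, counts):
    """Return a sorted copy of items, tallying splits and merge steps in counts."""
    if len(items) <= 1:
        return list(items)
    counts["splits"] += 1  # one division per internal node of the recursion tree
    mid = len(items) // 2
    left = merge_sort(items[:mid], counts)
    right = merge_sort(items[mid:], counts)

    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        counts["merge_steps"] += 1  # one element moved into the output
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # Whatever remains in either half is moved over as well.
    counts["merge_steps"] += (len(left) - i) + (len(right) - j)
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def count_work(n):
    """Sort n reversed integers and return the tallies (hypothetical helper)."""
    counts = {"splits": 0, "merge_steps": 0}
    merge_sort(list(range(n, 0, -1)), counts)
    return counts


for n in (8, 16, 1024):
    print(n, count_work(n))
```

For n = 8 this counts 7 splits and 24 merge steps, and for n = 16 it counts 15 splits and 64 merge steps, so the splitting is the n - 1 term and the merging is the n log n term that dominates it.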

Related

Effective time complexity for n bit addition and multiplication

I have done a course on Computer Architecture and it was mentioned that on the most efficient processors with n bit architecture word size the addition/subtraction of two words has a time complexity of O(log n) while multiplication/division has a time complexity of O(n).
If you do not consider any particular architecture word size, the best time complexity of addition/subtraction is O(n) (https://www.academia.edu/42811225/Fast_Arithmetic_Speeding_up_Multiplication_Division_and_Addition_of_n_Bit_Numbers) and multiplication/division seems to be O(n log n log log n) (Schönhage-Strassen, https://en.m.wikipedia.org/wiki/Multiplication_algorithm).
Is this correct?
O(log n) is the latency of addition if you can use n-bit wide parallel hardware with stuff like carry-select or carry-lookahead.
O(n) is the total amount of work that needs doing, and thus the time complexity with a fixed-width ALU for arbitrary bigint problems as n tends towards infinity.
For a multiply, there are n partial products in an n-bit multiply, so adding them all (with a Dadda tree for example) takes on the order of O(log n) gate delays of latency. Integer addition is associative, so you can do that in parallel, e.g. (a+b) + (c+d) is 3 additions with a latency of 2, and it gets better from there.
Dadda trees can avoid some of the carry-propagation latency, so I guess that avoids the extra factor of log n you'd get if you just used normal addition of each partial product separately.
See Differences between Wallace Tree and Dadda Multipliers for more about practical considerations for huge Dadda trees.
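As a rough software model of the (a+b) + (c+d) idea, the sketch below sums a list pairwise and reports both the total number of additions (the work) and the number of levels (the latency if every addition in a level runs in parallel). The function name tree_sum is made up for this illustration, and it only models associativity; it says nothing about the carry-save tricks inside a real Dadda tree.

```python
def tree_sum(values):
    """Sum a non-empty sequence pairwise; return (total, additions, levels)."""
    level = list(values)
    additions = 0
    levels = 0
    while len(level) > 1:
        next_level = []
        for k in range(0, len(level) - 1, 2):
            next_level.append(level[k] + level[k + 1])
            additions += 1
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd element passes through unchanged
        level = next_level
        levels += 1
    return level[0], additions, levels


print(tree_sum([1, 2, 3, 4]))      # 3 additions in 2 levels
print(tree_sum(range(1, 1025)))    # 1023 additions in 10 levels
```

The work stays linear in the number of operands while the depth grows only logarithmically, which is the distinction between O(n) total work and O(log n) latency made above.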

How to prove the time complexity of quicksort is O(nlogn)

I don't understand the proof given in my textbook that the time complexity of Quicksort is O(n log n). Can anyone explain how to prove it?
Typical arguments that Quicksort's average case running time is O(n log n) involve arguing that in the average case each partition operation divides the input into two equally sized partitions. The partition operations take O(n) time. Thus each "level" of the recursive Quicksort implementation has O(n) time complexity (across all the partitions) and the number of levels is however many times you can iteratively divide n by 2, which is O(log n).
You can make the above argument rigorous in various ways, depending on how rigorous you want it and on the background and mathematical maturity of your audience. A typical way to formalize the above is to represent the number of comparisons required by the average case of a Quicksort call as a recurrence relation like
T(n) = O(n) + 2 * T(n/2)
which can be proved to be O(n log n) via the Master Theorem or other means.
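As a quick sanity check rather than a proof, the recurrence can be evaluated directly. The sketch below assumes the O(n) term is exactly n, that T(1) = 0, and that n is a power of two; the function name comparisons is made up for this illustration.

```python
import math


def comparisons(n):
    """Evaluate T(n) = n + 2 * T(n / 2) with T(1) = 0, for n a power of two."""
    if n <= 1:
        return 0
    return n + 2 * comparisons(n // 2)


for k in (3, 10, 20):
    n = 2 ** k
    # Under these assumptions the recurrence solves to exactly n * log2(n).
    print(n, comparisons(n), n * int(math.log2(n)))
```

Each level of the recursion contributes n, and there are log2(n) levels, which is the informal "levels" argument written as arithmetic.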

How to calculate time complexity O(n) of the algorithm?

What I have done:
I measured the time spent processing 100, 1000, 10000, 100000, 1000000 items.
Measurements here: https://github.com/DimaBond174/cache_single_thread
Then I assumed that O(n) increases in proportion to n, and calculated the remaining algorithms with respect to O(n).
Having time measurements for processing 100, 1000, 10000, 100000, 1000000 items how can we now attribute the algorithm to O(1), O(log n), O(n), O(n log n), or O(n^2) ?
Let's define N as the size of one of the possible data inputs. An algorithm can have different Big O values depending on which input you're referring to, but generally there's only one big input that you care about. Without the algorithm in question, you can only guess. However, there are some guidelines that will help you determine which it is.
General Rule:
O(1) - the speed of the program barely changes regardless of size of data. To get this, a program must not have loops operating on the data in question at all.
O(log N) - the program slows down slightly when N increases dramatically, in a logarithmic curve. To get this, loops must only go through a fraction of the data. (for example, binary search).
O(N) - the program's speed is directly proportional to the size of the data input. If you perform an operation on each unit of the data, you get this. You must not have any kind of nested loops (that act on the data).
O(N log N) - the program's speed is significantly reduced by larger input. This occurs when you have an O(log N) operation NESTED in a loop that would otherwise be O(N). For example, a loop that does a binary search for each unit of data.
O(N^2) - The program will slow down to a crawl with larger input and eventually stall with large enough data. This happens when you have NESTED loops. Same as above, but this time the nested loop is O(N) instead of O(log N)
So, try to think of a looping operation as O(N) or O(log N). Then, whenever you have nesting, multiply them together. If the loops are NOT nested, they are not multiplied like this. So two loops separate from each other would simply be O(2N), which is just O(N), and not O(N^2).
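The multiplication rule can be checked by counting operations instead of timing them. This is a minimal sketch; all function names are made up for this illustration.

```python
def sequential_loops(n):
    """Two separate loops over the data: 2n operations, i.e. O(N)."""
    ops = 0
    for _ in range(n):
        ops += 1
    for _ in range(n):
        ops += 1
    return ops


def nested_loops(n):
    """A loop nested in a loop: n * n operations, i.e. O(N^2)."""
    ops = 0
    for _ in range(n):
        for _ in range(n):
            ops += 1
    return ops


def binary_search_steps(sorted_items, target):
    """Return how many halving steps a binary search for target takes."""
    lo, hi, steps = 0, len(sorted_items), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


def search_each(n):
    """A binary search for every element: roughly n * log2(n) steps."""
    data = list(range(n))
    return sum(binary_search_steps(data, x) for x in data)


for n in (100, 1000):
    print(n, sequential_loops(n), search_each(n), nested_loops(n))
```

Going from 100 to 1000 items multiplies the first count by 10, the last by 100, and the middle one by somewhat more than 10.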
Also remember that you may have loops under the hood, so you should think about them too. For example, if you did something like Arrays.sort(X) in Java, that would be a O(N logN) operation. So if you have that inside a loop for some reason, your program is going to be a lot slower than you think.
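If all you have are timings at sizes like the ones in the question, one rough empirical check is the slope of log(time) against log(size): about 0 suggests O(1), about 1 suggests O(N), about 2 suggests O(N^2). The sketch below uses synthetic timings generated from formulas, not real measurements, and the name loglog_slope is made up for this illustration.

```python
import math


def loglog_slope(sizes, times):
    """Least-squares slope of log(time) versus log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


sizes = [100, 1000, 10000, 100000, 1000000]

# Synthetic "timings" (assumed shapes, arbitrary constants), for illustration only.
synthetic = {
    "O(1)": [5.0 for n in sizes],
    "O(log N)": [3.0 * math.log(n) for n in sizes],
    "O(N)": [2.0 * n for n in sizes],
    "O(N log N)": [2.0 * n * math.log(n) for n in sizes],
    "O(N^2)": [0.5 * n * n for n in sizes],
}

for name, times in synthetic.items():
    print(name, round(loglog_slope(sizes, times), 2))
```

The logarithmic factors only nudge the slope slightly above 0 or 1, so measurements alone can struggle to separate O(1) from O(log N) or O(N) from O(N log N), especially with noisy timings. Reading the code, as described above, is the more reliable route.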
Hope that answers your question.

Asymptotic growth (Big o notation)

What I am trying to do is to sort the following functions:
n, n^3, nlogn, n/logn, n/log^2n, sqrt(n), sqrt(n^3)
in increasing order of asymptotic growth.
What I did is,
n/logn, n/log^2n, sqrt(n), n, sqrt(n^3), nlogn, n^3.
1) Is my answer correct?
2) I know about the time complexity of basic functions such as n, nlogn, n^2, but I am really confused by functions like n/logn and sqrt(n^3).
How should I figure out which one is faster or slower? Is there any way to do this with mathematical calculations?
3) Are big O time complexity and asymptotic growth different things?
I would really appreciate it if anyone could clear up my confusion... Thanks!
An important result we need here is:
log n grows more slowly than n^a for any strictly positive number a > 0.
For a proof of the above, see here.
If we re-write sqrt(n^3) as n^1.5, we can see that n log n grows more slowly (divide both by n and use the result above).
Similarly, n / log n grows more quickly than any n^b where b < 1; again this is directly from the result above. Note that it is however slower than n by a factor of log n; same for n / log^2 n.
Combining the above, we find the increasing order to be:
sqrt(n)
n / log^2 n
n / log n
n
n log n
sqrt(n^3)
n^3
So I'm afraid to say you got only a few of the orderings right.
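The ordering can be spot-checked numerically by evaluating every function at one large n and sorting by value. This is only a sanity check under the assumption that log means the natural logarithm (the base does not affect the ordering), and the names FUNCTIONS and order_at are made up for this illustration.

```python
import math

FUNCTIONS = {
    "sqrt(n)": lambda n: math.sqrt(n),
    "n / log^2 n": lambda n: n / math.log(n) ** 2,
    "n / log n": lambda n: n / math.log(n),
    "n": lambda n: n,
    "n log n": lambda n: n * math.log(n),
    "sqrt(n^3)": lambda n: n ** 1.5,
    "n^3": lambda n: n ** 3,
}


def order_at(n):
    """Return the function names sorted by their value at n."""
    return sorted(FUNCTIONS, key=lambda name: FUNCTIONS[name](n))


print(order_at(10.0 ** 12))  # matches the asymptotic order listed above
print(order_at(100.0))       # at small n, n / log^2 n is still below sqrt(n)
```

The second call shows why a single small sample is not enough: the asymptotic order only has to hold from some point onward, which is what the limit argument captures.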
EDIT: to answer your other questions:
If you take the limit of f(n) / g(n) as n -> infinity, then it can be said that f(n) is asymptotically greater than g(n) if this limit is infinite, and lesser if the limit is zero. This comes directly from the definition of big-O.
big-O is a method of classifying asymptotic growth, typically as the parameter approaches infinity.

time complexity of enumerating all the subsets

for (i=0;i<n;i++)
{
enumerate all subsets of size i = 2^n
each subset of size i takes o(nlogn) to search a solution
from all these solution I want to search the minimum subset of size S.
}
I want to know the complexity of this algorithm. Is it 2^n O(nlogn*n)=o(2^n n²)?
If I understand you right:
You iterate all subsets of a sorted set of n numbers.
For each subset you test in O(n log n) whether it is a solution (however you do this).
After you have all these solutions, you are looking for the one with exactly S elements and the smallest sum.
The way you write it, the complexity would be O(2^n * n log n) * O(log (2^n)) = O(2^n * n^2 log n). O(log (2^n)) = O(n) is for searching for the minimum solution, and you do this every round of the for loop, with worst case i=n/2 and every subset being a solution.
Now I'm not sure if you are mixing up O() and o().
2^n O(nlogn*n)=o(2^n n²) is only right if you mean 2^n O(nlog(n*n)).
f=O(g) means f grows no faster than g (up to a constant factor).
f=o(g) means f grows strictly slower than g.
So 2^n O(nlogn*n) = O(2^n n logn^2) = O(2^n n * 2 logn) = O(2^n n logn) < O(2^n n^2)
Notice: O(g) = o(h) is never good notation. You will (most likely every time) find a function f with f=o(h) but f != O(g), if g=o(h).
Improvements:
If I understand your algorithm right, you can speed it up a little. You know the size of the subset you are looking for, so only look at the subsets that have size S. The worst case is S=n/2, where C(n,n/2) is about 2^n / sqrt(pi*n/2); that saves a factor of roughly sqrt(n), but the number of subsets is still exponential.
You can also just save a solution and check if the next solution is smaller. This way you get the smallest solution without searching for it again. So the complexity would be O(2^n * n log n).
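Both improvements can be sketched together: enumerate only the subsets of size S and keep a running best instead of searching the solutions afterwards. The question does not say what makes a subset a solution, so is_solution below is a hypothetical stand-in, and the name smallest_solution is made up for this illustration.

```python
from itertools import combinations
from math import comb


def smallest_solution(numbers, size, is_solution):
    """Return (best, checked): the smallest-sum solution of the given size
    and how many subsets were examined."""
    best = None
    checked = 0
    for subset in combinations(numbers, size):
        checked += 1
        # Keep the running minimum so no second search pass is needed.
        if is_solution(subset) and (best is None or sum(subset) < sum(best)):
            best = subset
    return best, checked


# Hypothetical predicate, purely for illustration: "the sum is even".
best, checked = smallest_solution([5, 1, 4, 2, 3], 2, lambda s: sum(s) % 2 == 0)
print(best, checked)          # (1, 3) after examining C(5, 2) = 10 subsets

# The number of size-n/2 subsets compared with all 2^n subsets.
print(comb(20, 10), 2 ** 20)  # 184756 versus 1048576
```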