How does recursion reduce the time complexity in merge sort?

As I understand it, time complexity describes how the number of operations grows as the input size increases.
In merge sort, there are 2 phases:
1. Divide the input array into smaller arrays.
2. Sort and merge those arrays.
According to a video lecture, the time complexity of dividing the array in merge sort is O(log n).
But here the lecturer is not counting the number of operations to derive the time complexity. He is counting the number of decompositions, i.e., the number of times the recursive function is called.
(He used recursion to divide the array.)
Counting purely in terms of pseudocode, the recursion takes more than n operations in this case. In contrast, the following code always takes n operations:
function divide(arr) {
  // Replace each element with a one-element array: n runs of size 1, one step per element.
  for (let i = 0; i < arr.length; i++) {
    arr[i] = [arr[i]];
  }
}
So how can the complexity of the recursive code be lower than that of the loop?

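Recursion doesn't reduce time complexity. You've already shown a diagram for top-down merge sort. The O(log n) in the lecture is the depth of the recursion (the number of levels of splitting), not the number of operations: top-down merge sort on n elements makes 2n - 1 calls in total, each doing O(1) splitting work when the halves are tracked by indices, so the splitting alone is O(n), which is then dominated by the O(n log n) merging. For the original bottom-up merge sort, the code treats an array of n elements as n runs of size 1, so dividing the array takes O(1) time.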
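A minimal sketch in JavaScript (the language of the question's `divide` function) that models only the splitting phase of top-down merge sort, with halves tracked as index ranges. The function name `splitStats` is hypothetical. It counts the total number of calls and the maximum recursion depth, which makes the difference between "levels" and "operations" visible.

```javascript
// Models only the "divide" phase of top-down merge sort.
// Halves are tracked as index ranges [lo, hi), so each call does O(1) work.
function splitStats(lo, hi, depth = 0, stats = { calls: 0, maxDepth: 0 }) {
  stats.calls += 1;                                  // count this call
  stats.maxDepth = Math.max(stats.maxDepth, depth);  // deepest level reached so far
  if (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    splitStats(lo, mid, depth + 1, stats);           // left half
    splitStats(mid, hi, depth + 1, stats);           // right half
  }
  return stats;
}

for (const n of [8, 1000]) {
  const { calls, maxDepth } = splitStats(0, n);
  console.log(`n=${n}: calls=${calls}, depth=${maxDepth}`);
}
// n=8: calls=15, depth=3
// n=1000: calls=1999, depth=10
```

The depth grows like log2(n), but the number of calls is 2n - 1, i.e. linear in n.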
Most libraries use some variation of a hybrid of insertion sort and bottom-up merge sort. Top-down merge sort is mostly used for academic purposes.
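A minimal sketch of plain bottom-up merge sort for an array of numbers. It is not the hybrid that libraries use (no insertion sort for small runs), and `mergeSortBottomUp` is a hypothetical name. It starts directly from n runs of width 1, so there is no separate divide phase. Each of the roughly log2(n) passes merges adjacent runs in O(n) time.

```javascript
// Bottom-up merge sort (a plain sketch, not a library's hybrid sort).
// Start with n runs of width 1, then merge adjacent runs of width w into runs of width 2w.
function mergeSortBottomUp(input) {
  const n = input.length;
  let src = input.slice();        // work on a copy; the caller's array is untouched
  let dst = new Array(n);
  for (let width = 1; width < n; width *= 2) {      // about log2(n) passes
    for (let lo = 0; lo < n; lo += 2 * width) {     // each pass touches all n elements once
      const mid = Math.min(lo + width, n);
      const hi = Math.min(lo + 2 * width, n);
      let i = lo, j = mid, k = lo;
      // Merge src[lo, mid) and src[mid, hi) into dst[lo, hi); "<=" keeps the sort stable.
      while (i < mid && j < hi) dst[k++] = src[i] <= src[j] ? src[i++] : src[j++];
      while (i < mid) dst[k++] = src[i++];
      while (j < hi) dst[k++] = src[j++];
    }
    [src, dst] = [dst, src];      // the merged pass becomes the input of the next pass
  }
  return src;
}

console.log(mergeSortBottomUp([5, 2, 9, 1, 5, 6])); // [ 1, 2, 5, 5, 6, 9 ]
```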

Related

Time Complexity of Algorithms With Addition

I'm taking a course about big-O notation on Coursera. I watched a video about the big-O running time of a Fibonacci algorithm (the non-recursive method), which goes like this:
Operation                        Runtime
create an array F[0..n]          O(n)
F[0] <-- 0                       O(1)
F[1] <-- 1                       O(1)
for i from 2 to n:               loop O(n) times
    F[i] <-- F[i-1] + F[i-2]     O(n)   => I don't understand this line; isn't it O(1)?
return F[n]                      O(1)
Total: O(n) + O(1) + O(1) + O(n)*O(n) + O(1) = O(n^2)
I understand every part except F[i] <-- F[i-1] + F[i-2], which is listed as O(n). I don't understand this line: isn't it O(1), since it's just a simple addition? Is it the same as F[i] <-- 1+1?
The explanation they give me is:"But the addition is a bit worse. And normally additions are constant time. But these are large numbers. Remember, the nth Fibonacci number has about n over 5 digits to it, they're very big, and they often won't fit in the machine word."
"Now if you think about what happens if you add two very big numbers together, how long does that take? Well, you sort of add the tens digit and you carry, and you add the hundreds digit and you carry, and add the thousands digit, you carry and so on and so forth. And you sort of have to do work for each digits place.
And so the amount of work that you do should be proportional to the number of digits. And in this case, the number of digits is proportional to n, so this should take O(n) time to run that line of code".
I'm still a bit confused. Does this mean that large numbers affect time complexity too? For example, is a = n+1 O(1), while a = n^50+n^50 isn't O(1) anymore?
Video link for anyone who needs more information (4:56 to 6:26)
Big-O is just a notation for keeping track of orders of magnitude. But when we apply it to algorithms, we have to ask: orders of magnitude of WHAT? In this case, it is time spent.
CPUs are set up to execute basic arithmetic on basic arithmetic types in constant time. For most purposes, we can assume we are dealing with those basic types.
However, if n is a very large positive integer, we can't assume that. A very large integer needs O(log(n)) bits to represent. Whether we store it as bits, bytes, or machine words, we need an array of O(log(n)) elements to hold it. (We would need fewer bytes than bits, but that is just a constant factor.) And when we do a calculation, we have to think about what we will actually do with that array.
Now suppose that we're trying to calculate n+m. We need to produce a result of size O(log(n+m)), which takes at least that much time just to write out. Luckily, the grade-school method of long addition, where you add digit by digit and keep track of the carry, can be adapted for big-integer libraries and runs in O(log(n+m)) time.
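A sketch of the grade-school method on non-negative decimal strings. It uses base 10 only for readability, since big-integer libraries work on much larger limbs, typically machine words. The name `addDecimalStrings` is hypothetical. The loop runs once per digit of the longer operand plus at most one extra step for a final carry, which is where the O(log(n+m)) bound comes from.

```javascript
// Grade-school addition on non-negative decimal strings: one step per digit, plus a carry.
function addDecimalStrings(a, b) {
  let i = a.length - 1;
  let j = b.length - 1;
  let carry = 0;
  const digits = [];
  while (i >= 0 || j >= 0 || carry > 0) {
    const da = i >= 0 ? a.charCodeAt(i--) - 48 : 0;  // 48 is the char code of "0"
    const db = j >= 0 ? b.charCodeAt(j--) - 48 : 0;
    const sum = da + db + carry;
    digits.push(sum % 10);                            // digit for this place
    carry = sum >= 10 ? 1 : 0;                        // carry into the next place
  }
  return digits.reverse().join("");
}

console.log(addDecimalStrings("999", "1")); // 1000
```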
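So when you're looking at addition, what matters is the log of the size of the answer. Since log(50^n) = n * log(50), operations on 50^n take time at least on the order of n. (Getting 50^n might take longer...) It also means that calculating n+1 takes O(log(n)) time. For the n^50 in the question, log(n^50) = 50 * log(n), so n^50 + n^50 also takes only O(log(n)) time: not O(1), but far less than O(n).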
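Now in the case of the Fibonacci sequence, F(n) is roughly φ^n / sqrt(5), where φ = (1 + sqrt(5))/2, so log(F(n)) = O(n). Each addition in the loop therefore costs O(n), and the roughly n additions together cost O(n^2), which is the lecture's total.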
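A sketch of the lecture's array-based algorithm using JavaScript `BigInt`, so the numbers never overflow a machine word. It assumes the engine's `BigInt` addition takes time linear in the operands' length, which is how such additions are normally implemented but is not guaranteed by the language specification. The function name `fib` is illustrative. The printed digit counts grow by about 0.209 * n (log10 φ ≈ 0.209), which matches the lecture's "about n over 5 digits".

```javascript
// The lecture's array-based algorithm, with BigInt so F(n) never overflows a machine word.
function fib(n) {
  const F = new Array(n + 1);          // create an array F[0..n]
  F[0] = 0n;
  if (n > 0) F[1] = 1n;
  for (let i = 2; i <= n; i++) {
    F[i] = F[i - 1] + F[i - 2];        // big-integer addition: cost grows with the digit count
  }
  return F[n];
}

for (const n of [100, 1000, 10000]) {
  console.log(`F(${n}) has ${fib(n).toString().length} digits`);
}
// F(100) has 21 digits
// F(1000) has 209 digits
// F(10000) has 2090 digits
```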

Computational complexity depending on two variables

I have an algorithm that is mainly composed of k-NN, followed by a computation involving finding permutations, followed by some for loops. Line by line, my computational complexity is:
O(n) - for k-NN
O(2^k) - for a part that computes singlets, pairs, triplets, etc.
O(k!) - for a part that deals with combinatorics.
O(k*k!) - for the final part.
Here k is a parameter chosen by the user; in general it is fairly small (10-100). n is the number of examples in my dataset, and it can get very large.
What is the overall complexity of my algorithm? Is it simply O(n) ?
As k <= 100, f(k) = O(1) for every function f.
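In your case, there is a function f such that the overall time is O(n + f(k)), namely f(k) = k*k! (the 2^k and k! terms are dominated by it), so it is O(n). Be aware that the constant hidden this way can be enormous: for k = 100, k*k! is roughly 10^160, so for k near the top of that range the k-dependent part, not n, dominates the running time in practice.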
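The same bound as a worked equation. It assumes the four parts run one after the other, so their costs add, and uses 2^k <= 2*k! for k >= 1:

```latex
\begin{aligned}
T(n,k) &= O(n) + O(2^k) + O(k!) + O(k \cdot k!) = O(n + k \cdot k!) \\
k \le 100 &\implies k \cdot k! \le 100 \cdot 100! = O(1) \\
&\implies T(n,k) = O(n)
\end{aligned}
```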

Computing GCD on sorted array

Is it possible to optimize any algorithm for computing the GCD of the numbers in an array if the array is sorted?
Thanks!
So, let's see. The general method of finding the GCD of an array of numbers is:
result = a[0]
for i = 1 to length(a)-1
    result = gcd(result, a[i])
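A runnable JavaScript version of that loop. The pseudocode leaves `gcd` unspecified, so using Euclid's algorithm here is an assumption. `gcdOfArray` is a hypothetical name, and ordinary Numbers are only exact for safe integers. The early exit when the result reaches 1 is an optional addition that is not in the pseudocode.

```javascript
// Euclid's algorithm on ordinary Numbers (exact only for safe integers, |x| <= 2^53 - 1).
function gcd(a, b) {
  while (b !== 0) {
    [a, b] = [b, a % b];   // gcd(a, b) = gcd(b, a mod b)
  }
  return Math.abs(a);      // JS % keeps the dividend's sign, so normalize at the end
}

// The loop from the pseudocode: fold gcd across the array.
function gcdOfArray(a) {
  let result = a[0];
  for (let i = 1; i < a.length; i++) {
    result = gcd(result, a[i]);
    if (result === 1) break;   // optional early exit (not in the original): it can't get smaller
  }
  return result;
}

console.log(gcdOfArray([84, 36, 120, 60])); // 12
```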
So what's the complexity of the gcd algorithm? Well, that's a rather involved question. See, for example, "Time complexity of Euclid's Algorithm".
If we pretend, as posited in the accepted answer, that the GCD algorithm is constant time (i.e. O(1)), then the complexity of the loop above is O(n). That's a reasonable assumption for numbers that fit into computer registers. And if that's the case then spending O(n log n) time to sort the array would almost certainly be a loser.
But in reality the GCD calculation is not constant time: the number of steps Euclid's algorithm needs is linear in the number of digits of the smaller of the two numbers. If your input data consists of lots of large numbers, it's possible that sorting the array first will give you an advantage. The reasoning is that gcd(a, b) is by definition no larger than min(a, b). So by taking the GCD of the two smallest numbers first, you limit the number of digits you have to deal with. Whether that limiting will overcome the cost of sorting the array is unclear.
If the numbers are larger than will fit into a computer register (hundreds of digits), then the GCD calculation is more expensive. But then again, so is sorting.
So the answer to your question is that sorting will almost certainly increase the speed of calculating the GCD of an array of numbers, but whether the performance improvement will offset the cost of sorting is unclear.
I think the only way you'll know for sure is to test it with representative data.

Why time and space complexity of counting sort is O(n + k) and not O(max(n, k))?

Here, 'n' and 'k' are the size of the input array and the maximum element of the array respectively.
As I understand it, there is one pass over the array of size 'n' to count the frequency of each element, and a separate pass over the array of size 'k', where each iteration i of that pass involves count[i] inner iterations ('count' being the array of size 'k').
The same goes for the space complexity.
I am looking for a good explanation that covers every part of the concept; as you can guess, I am horribly confused.
Please note that O(n+k) = O(max(n, k)) because
max(n,k) <= n+k <= 2*max(n,k)
and big-O notation ignores the constant factor 2.
Thanks to everyone who has responded. I think I've got it now.
Assumptions:
The input array A[] has size N.
The maximum element in A[] is K.
The array count[], used for counting the frequency of each element, has size K.
The auxiliary array sorted[], used for storing the sorted elements, has size N.
I looked at it this way: there is one pass over A[] to find the maximum element and one more pass to record the frequency of each element.
This takes O(N).
Next, there is one pass over count[], and for each index i an inner loop runs count[i] times to insert the elements into sorted[] in sorted order.
The entries of count[] sum to exactly N, so the inner loops run N times in total while the outer loop visits all K entries. So the total time for these operations is O(N + K).
Therefore, the worst-case time complexity is O(N + K). Correct me if I'm wrong somewhere.
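A minimal JavaScript sketch of the steps described above. It assumes the input holds non-negative integers, since the question does not say how negative values are handled, and `countingSort` is an illustrative name. Because values run from 0 to K, count[] needs K + 1 slots in this sketch.

```javascript
// Counting sort for non-negative integers (assumption), following the steps above.
function countingSort(A) {
  const N = A.length;
  if (N === 0) return [];
  let K = 0;
  for (const x of A) if (x > K) K = x;     // one pass over A[]: find the maximum, O(N)
  const count = new Array(K + 1).fill(0);  // values 0..K need K + 1 slots: O(K) space
  for (const x of A) count[x]++;           // second pass over A[]: frequencies, O(N)
  const sorted = [];                       // output: O(N) space
  for (let v = 0; v <= K; v++) {           // outer loop: K + 1 iterations
    for (let c = 0; c < count[v]; c++) {   // inner loops: N iterations in total
      sorted.push(v);
    }
  }
  return sorted;                           // total: O(N + K) time and space
}

console.log(countingSort([4, 1, 3, 4, 0, 2])); // [ 0, 1, 2, 3, 4, 4 ]
```

This version writes the values back directly, so it only sorts bare integers. The stable variant that sorts records by key adds a prefix-sum pass over count[], which is where a second pass over the array of size k comes from.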
Actually, there are two passes over the array of size k.
Here k is the size of that array, while the k in the O notation actually represents the maximum element.
If we write O(max(n,k)), it hides the details of the algorithm, whose cost depends heavily on the maximum element.

Do numbers in an array contain the sides of a valid triangle?

Check whether an array of n integers contains 3 numbers that can form a triangle (i.e., the sum of any two of the three numbers is greater than the third).
Apparently, this can be done in O(n) time.
(The obvious O(n log n) solution is to sort the array, so please don't.)
It's difficult to imagine N numbers (where N is moderately large) such that there is no triangle triplet. But we'll try:
Consider an increasing sequence in which each value is at the limit N[i] = N[i-1] + N[i-2]. This is nothing other than the Fibonacci sequence. Approximately, it can be seen as a geometric progression whose factor is the golden ratio (GRf ~= 1.618).
It can be seen that if N_largest < N_smallest * (GRf**(N-1)), then there is certainly a triangle triplet. This criterion is quite fuzzy, because of floating point versus integer arithmetic and because GRf is a limit, not an actual geometric factor. Anyway, carefully implemented, it gives an O(n) test that can confirm that a triplet definitely exists. If it doesn't, then we have to perform some other tests (still thinking).
EDIT: A direct consequence of the Fibonacci idea is that for integer input (as specified in the question), a solution is guaranteed to exist for any possible input whenever the array is larger than log_GRf(MAX_INT), which is 47 for 32 bits or 93 for 64 bits. Actually, we can use the largest value in the input array to make the bound tighter.
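A sketch that checks the 47 and 93 figures exactly with integer Fibonacci numbers instead of log_GRf. It assumes positive inputs, since a zero or negative value can't be a side length, and the signed limits 2^31 - 1 and 2^63 - 1. `guaranteedSize` is a hypothetical name.

```javascript
// The longest array of positive integers <= maxValue with no triangle triplet is
// 1, 1, 2, 3, 5, 8, ... (each value exactly the sum of the previous two), so any
// array with one more element must contain a triangle. BigInt keeps the arithmetic exact.
function guaranteedSize(maxValue) {
  let a = 1n, b = 1n;   // F(1), F(2)
  let terms = 1;        // F(1) = 1 <= maxValue (assumes maxValue >= 1)
  while (b <= maxValue) {
    terms++;            // F(terms) fits under maxValue
    [a, b] = [b, a + b];
  }
  return terms + 1;     // first array size that guarantees a triangle
}

console.log(guaranteedSize(2n ** 31n - 1n)); // 47  (signed 32-bit)
console.log(guaranteedSize(2n ** 63n - 1n)); // 93  (signed 64-bit)
```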
This gives us the following algorithm:
Step 1) Find MAX_VAL in the input data: O(n)
Step 2) Compute the minimum array size that guarantees the existence of a solution:
N_LIMIT = log_base_GRf(MAX_VAL): O(1)
Step 3.1) If N > N_LIMIT: return true: O(1)
Step 3.2) Else sort and use the direct method: O(n*log(n))
Because for large values of N (the only case where the complexity matters) it is O(n), or even O(1) when N > log_base_GRf(MAX_INT), we can say the algorithm is O(n).
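A minimal sketch of the steps above for positive safe integers. It is one possible careful implementation, not the author's: it replaces the floating-point log_base_GRf(MAX_VAL) with an exact integer Fibonacci count and uses both the minimum and maximum value to tighten the limit, which sidesteps the fuzziness mentioned earlier. `hasTriangle` is a hypothetical name.

```javascript
// Sketch of the answer's algorithm for positive safe integers (assumption).
// Step 2 uses an exact integer Fibonacci count instead of a floating-point log_GRf.
function hasTriangle(arr) {
  const n = arr.length;
  if (n < 3) return false;
  // Step 1: find the extremes, O(n).
  let min = Infinity, max = -Infinity;
  for (const x of arr) {
    if (x < min) min = x;
    if (x > max) max = x;
  }
  // Step 2: a triangle-free array, once sorted, has a[i] >= a[i-1] + a[i-2], hence
  // a[i] >= min * F(i) with F(1) = F(2) = 1. Find the largest L with min * F(L) <= max.
  let limit = 2, fPrev = 1, fCur = 1;          // F(1), F(2)
  while (min * (fPrev + fCur) <= max) {
    [fPrev, fCur] = [fCur, fPrev + fCur];
    limit++;
  }
  // Step 3.1: a triangle-free array has at most `limit` elements.
  if (n > limit) return true;
  // Step 3.2: here n <= limit = O(log(max / min)), so sorting is cheap. In a sorted
  // array, if any triangle exists then some three consecutive values form one.
  const s = [...arr].sort((a, b) => a - b);
  for (let i = 2; i < n; i++) {
    if (s[i - 2] + s[i - 1] > s[i]) return true;
  }
  return false;
}

console.log(hasTriangle([1, 2, 3, 5, 8, 13])); // false
console.log(hasTriangle([7, 10, 5]));          // true
```

The sort in Step 3.2 only runs when n is at most the Fibonacci limit, which is logarithmic in max/min, so for large n the work is the O(n) scan in Step 1.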