How to analyse complexity of the given optimized bubble sort? - time-complexity

This is pseudocode for an optimized bubble sort algorithm. I have tried to analyze its time complexity, but I am not sure what the cost of line 4 (if A[i-1] > A[i]) is. Is the answer (n-1) + (n-2) + ... + 1? Also, what would be the cost of lines 5 to 8?
1.  for j = A.length to 2
2.      swapped = false
3.      for i = 2 to j
4.          if A[i-1] > A[i]
5.              temp = A[i]
6.              A[i] = A[i-1]
7.              A[i-1] = temp
8.              swapped = true
9.          if(!swapped)
10.             break

The cost of lines 5 to 8 for a single iteration is O(1).
The cost of the loop at lines 3-8 is O(j-1).
The cost of the whole sort in the worst case is O((n-1) + (n-2) + ... + 2) = O(n^2) (but of course in the best case, when the array is already sorted, the cost will be only O(n-1)).
By the way, your implementation of optimized bubble sort contains an error: the if at line 9 should be inside the outer loop, but outside the inner.
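For reference, here is a minimal runnable sketch of the corrected algorithm (written in Python with 0-based indexing, rather than the 1-based pseudocode above), with the early-exit check placed inside the outer loop but outside the inner one:
def optimized_bubble_sort(a):
    # One pass per value of j; the flag records whether this pass swapped anything.
    for j in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(1, j + 1):
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]   # swap adjacent elements
                swapped = True
        if not swapped:        # no swaps in this pass: array is already sorted
            break
    return a

print(optimized_bubble_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]
On an already sorted input the inner loop makes one full pass, finds no swaps, and the break fires, which is the O(n-1) best case mentioned above.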

Related

What is the time complexity (Big-O) of this while loop (Pseudocode)?

This is written in pseudocode.
We have an array A of length n (n >= 2).
int i = 1;
while (i < n) {
    if (A[i] == 0) {
        terminates the while-loop;
    }
    doubles i
}
I am new to this whole subject and to coding, so I am having a hard time grasping it and need an "explain like I'm 5".
I know the code doesn't make a lot of sense, but it is just an exercise; I have to determine the best case and the worst case.
So in the best case, Big O would be O(1), if the value at A[1] is 0.
For the worst-case scenario I thought the time complexity of this loop would be O(log(n)) as i doubles.
Is that correct?
Thanks in advance!
For Big O notation you take the worst-case scenario. In the case where A[i] never evaluates to zero, your loop behaves like this:
int i = 1;
while(i < n) {
i *= 2;
}
i is doubled on each iteration, i.e. exponential growth.
Given an example of n = 16, the values of i would be:
1
2
4
8
i wouldn't get to 16, so that's 4 iterations, and 2^4 = 16.
To work out the power, you take the log to base 2 of n, i.e. log2(16) = 4.
So the worst case would be log(n)
So the complexity would be stated as O(log(n))
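As a quick sanity check, here is a small Python sketch (not from the original answer) that counts the iterations of the worst-case loop and compares the count with ceil(log2(n)):
import math

def doubling_iterations(n):
    # Counts iterations of: i = 1; while i < n: i *= 2
    i, count = 1, 0
    while i < n:
        i *= 2
        count += 1
    return count

for n in [2, 16, 17, 1000, 10**6]:
    print(n, doubling_iterations(n), math.ceil(math.log2(n)))
For n = 16 it prints 4 iterations, matching the 2^4 = 16 walkthrough above.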

How can I improve InsertionSort by the following argument? The correct answer is b. Can someone CLEARLY explain every answer?

A person claims that they can improve InsertionSort by the following argument. In the innermost loop of InsertionSort, instead of looping over all entries in the already sorted array in order to insert the j’th observed element, simply perform BinarySearch in order to sandwich the j’th element in its correct position in the list A[1, ... , j−1]. This person claims that their resulting insertion sort is asymptotically as good as mergesort in the worst case scenario. True or False and why? Circle the one correct answer from the below:
a. True: In this version, the while loop will iterate log(n), but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards across the median elements and so this shift will still require log(n) in the worst case scenario. Adding up, Insertion Sort will significantly improve in this case to continue to require n log(n) in the worst case scenario like mergesort.
b. False: In this version, the while loop will iterate log(n), but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards and so this shift will still require n in the worst case scenario. Adding up, Insertion Sort will continue to require n² in the worst case scenario which is orders of magnitude worse than mergesort.
c. False: In this version, the while loop will iterate n, but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards and so this shift will still require log(n) in the worst case scenario. Adding up, Insertion Sort will continue to require n log(n) in the worst case scenario which is orders of magnitude worse than mergesort.
d. True: In this version, the while loop will iterate log(n), but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards and so this shift will still require n in the worst case scenario. Adding up, Insertion Sort will continue to require n log(n) in the worst case scenario which is orders of magnitude worse than mergesort.
b is correct, with some assumptions about compiler optimizations.
Consider a reverse sorted array,
8 7 6 5 4 3 2 1
and that insertion sort is half done so it is
5 6 7 8 4 3 2 1
The next step:
normal insertion sort sequence assuming most recent value read kept in register:
t = a[4] = 4 1 read
compare t and a[3] 1 read
a[4] = a[3] = 8 1 write
compare t and a[2] 1 read
a[3] = a[2] = 7 1 write
compare t and a[1] 1 read
a[2] = a[1] = 6 1 write
compare t and a[0] 1 read
a[1] = a[0] = 5 1 write
a[0] = t = 4 1 write
---------------
5 read 5 write
binary search
t = a[4] 1 read
compare t and a[1] 1 read
compare t and a[0] 1 read
a[4] = a[3] 1 read 1 write
a[3] = a[2] 1 read 1 write
a[2] = a[1] 1 read 1 write
a[1] = a[0] 1 read 1 write
a[0] = t 1 write
----------------
7 read 5 write
If the compiler re-read the data with normal insertion sort, it would be
9 read 5 write
In which case the binary search would save some time.
The expected answer to this question is b), but the explanation is not precise enough:
locating the position where to insert the j-th element indeed requires log(j) comparisons instead of j comparisons for regular Insertion Sort.
inserting the elements requires j element moves in the worst case for both implementations (reverse sorted array).
Summing these over the whole array produces:
n log(n) comparisons for this modified Insertion Sort idea in all cases, vs. n² comparisons in the worst case (already sorted array) for the classic implementation.
n² element moves in the worst case in both implementations (reverse sorted array).
Note that in the classic implementation the sum of the number of comparisons and element moves is constant.
Merge Sort on the other hand uses approximately n log(n) comparisons and n log(n) element moves in all cases.
Therefore the claim that the resulting insertion sort is asymptotically as good as mergesort in the worst case scenario is False, precisely because the modified Insertion Sort method still performs n² element moves in the worst case, which is asymptotically much worse than n log(n) moves.
Note however that, depending on the relative cost of comparisons and element moves, this modified Insertion Sort approach may perform much better than the classic implementation. For example, when sorting an array of string pointers containing URLs to the same site, comparing strings that share a long initial substring is much more expensive than moving a single pointer.
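To make the trade-off concrete, here is an illustrative Python sketch (not part of either answer) of insertion sort with a binary search for the insertion point: the search needs only O(log j) comparisons per element, but shifting the larger elements to the right still costs up to j moves, which is why the worst case stays quadratic.
from bisect import bisect_right

def binary_insertion_sort(a):
    # Binary search finds the insertion point in O(log j) comparisons,
    # but the shift below still moves up to j elements.
    for j in range(1, len(a)):
        key = a[j]
        pos = bisect_right(a, key, 0, j)   # search the sorted prefix a[0:j]
        a[pos + 1:j + 1] = a[pos:j]        # shift elements one slot to the right
        a[pos] = key
    return a

print(binary_insertion_sort([8, 7, 6, 5, 4, 3, 2, 1]))   # [1, 2, 3, 4, 5, 6, 7, 8]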

Understanding time complexity: iterative algorithm

I'm new to working out time complexities and I can't seem to understand the logic behind getting this result at the end:
100 (n(n+1) / 2)
For this function:
function a(int n) {
    int i, j, k;
    for (i = 1; i <= n; i++) {
        for (j = 1; j <= i; j++) {
            for (k = 1; k <= 100; k++) {
                print("hello");
            }
        }
    }
}
Here's how I understand its algorithm:
i = 1, 2, 3, 4...n
j = 1, 2, 3, 4...(dependent to i, which can be 'n')
k = 1(100), 2(100), 3(100), 4(100)...n(100)
= 100 [1, 2, 3, 4.....]
If I use the algorithm above to simulate the end equation, I get this result:
End equation:
100 (n(n+1) / 2)
Simulation:
i = 1, 2, 3, 4... n
j = 1, 2, 3, 4... n
k = 100, 300, 600, 1000...
I usually study these on YouTube and get the idea of Big O, Omega and Theta, but when it comes to this one, I can't figure out how they end up with an equation like the one given above. Please help, and if you have some best practices, please share.
EDIT:
As for my own guess at the answer, I think it should be this one:
100 ((n+n)/2) or 100 (2n / 2)
Source:
https://www.youtube.com/watch?v=FEnwM-iDb2g
At around: 15:21
I think you've got i and j correct, except that it's not clear why you say k = 100, 200, 300... In every loop, k runs from 1 to 100.
So let's think through the inner loop first:
k from 1 to 100:
// Do something
The inner loop is O(100) = O(1) because its runtime does not depend on n. Now we analyze the outer loops:
i from 1 to n:
j from 1 to i:
// Do inner stuff
Now let's count how many times "Do inner stuff" executes:
i = 1 1 time
i = 2 2 times
i = 3 3 times
... ...
i = n n times
This is our classic triangular sum 1 + 2 + 3 + ... n = n(n+1) / 2. Therefore, the time complexity of the outer two loops is O(n(n+1)/2) which reduces to O(n^2).
The time complexity of the entire thing is O(1 * n^2) = O(n^2) because nesting loops multiplies the complexities (assuming the runtime of the inner loop is independent of the variables in the outer loops). Note here that if we had not reduced at various phases, we would be left with O(100(n)(n+1)/2), which is equivalent to O(n^2) because of the properties of big-O notation.
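As a quick check (a small Python sketch, not part of the original answer), counting how many times the innermost statement runs reproduces the 100 * n(n+1)/2 formula exactly:
def count_prints(n):
    # Counts executions of the innermost statement of the triple loop.
    count = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            for k in range(1, 101):
                count += 1
    return count

for n in [1, 5, 10, 50]:
    print(n, count_prints(n), 100 * n * (n + 1) // 2)   # the two counts match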
SOME TIPS:
You asked for some best practices. Here are some "rules" that I made use of in analyzing the example you posted.
In time complexity analysis, we can ignore multiplication by a constant. This is why the inner loop is still O(1) even though it executes 100 times. Understanding this is the basis of time complexity. We are analyzing runtime on a large scale, not counting the number of clock cycles.
With nested loops where the runtime is independent of each other, just multiply the complexity. Nesting the O(1) loop inside the outer O(N^2) loops resulted in O(N^2) code.
Some more reduction rules: http://courses.washington.edu/css162/rnash/quarters/current/labs/bigOLab/lab9.htm
If you can break code up into smaller pieces (in the same way we analyzed the k loop separately from the outer loops) then you can take advantage of the nesting rule to find the combined complexity.
Note on Omega/Theta:
Theta is the "exact bound" for time complexity whereas Big-O and Omega are upper and lower bounds respectively. Because there is no random data (like there is in a sorting algorithm), we can get an exact bound on the time complexity and the upper bound is equal to the lower bound. Therefore, it does not make any difference if we use O, Omega or Theta in this case.

Time complexity of for loops, I cannot really understand a thing

These are the for loops whose time complexity I have to find, but I don't clearly understand how to calculate it.
for (int i = n; i > 1; i /= 3) {
    for (int j = 0; j < n; j += 2) {
        ... ...
    }
    for (int k = 2; k < n; k = k * k) {
        ...
    }
}
For the first line, (int i = n; i > 1; i /= 3), it keeps dividing i by 3, and once i is no longer greater than 1 the loop stops there, right?
But what is the time complexity of that? I think it is n, but I am not really sure.
The reason I think it is n is: if I assume that n is 30, then i will go 30, 10, 3, 1 and then the loop stops. It runs n times, doesn't it?
And for the last for loop, I think its time complexity is also n, because what it does is:
k starts at 2 and keeps multiplying itself by itself until k is greater than n.
So if n is 20, k will go 2, 4, 16 and then stop. It runs n times too.
I don't think I really understand this kind of question, because the time complexity can be log(n) or n^2 or something else, but all I see is n.
I don't really know when a log or a square comes into it, or anything else.
Every for loop runs n times, I think. How can a log or a square be involved?
Can anyone help me understand this? Please.
Since all three loops are independent of each other, we can analyse them separately and multiply the results at the end.
1. i loop
A classic logarithmic loop. There are countless examples on SO, this being a similar one. Using the result given on that page and replacing the division constant:
The exact number of times that this loop will execute is ceil(log3(n)).
2. j loop
As you correctly figured, this runs O(n / 2) times;
The exact number is floor(n / 2).
3. k loop
Another classic known result - the log-log loop. The code happens to be an exact replica of the one in this SO post;
The exact number is ceil(log2(log2(n)))
Combining the above steps, the total time complexity is given by
O(log3(n) * (n/2 + log2(log2(n)))) = O(n log n).
Note that the j-loop overshadows the k-loop.
Numerical tests for confirmation
JavaScript code:
T = function(n) {
var m = 0;
for (var i = n; i > 1; i /= 3) {
for (var j = 0; j < n; j += 2)
m++;
for (var k = 2; k < n; k = k * k)
m++;
}
return m;
}
M = function(n) {
    return Math.ceil(Math.log(n) / Math.log(3)) *
           (Math.floor(n / 2) + Math.ceil(Math.log2(Math.log2(n))));
}
M(n) is what the math predicts that T(n) will exactly be (the number of inner loop executions):
n T(n) M(n)
-----------------------
100000 550055 550055
105000 577555 577555
110000 605055 605055
115000 632555 632555
120000 660055 660055
125000 687555 687555
130000 715055 715055
135000 742555 742555
140000 770055 770055
145000 797555 797555
150000 825055 825055
M(n) matches T(n) perfectly, as expected. Plotting T(n) against n log(n) (the predicted time complexity), I'd say the result is a convincing straight line.
tl;dr: I describe a couple of examples first; the analysis of the OP's stated problem is at the bottom of this post.
In short, the big O notation tells you something about how a program is going to perform if you scale the input.
Imagine a program (P0) that counts to 100. No matter how often you run the program, it's going to count to 100 as fast each time (give or take). Obviously right?
Now imagine a program (P1) that counts to a number that is variable, i.e. it takes a number as an input and counts up to it. We call this variable n. Now each time P1 runs, its performance depends on the size of n. If we make n 100, P1 will run very quickly. If we make n equal to a googolplex, it's going to take a little longer.
Basically, the performance of P1 is dependent on how big n is, and this is what we mean when we say that P1 has time-complexity O(n).
Now imagine a program (P2) where we count to the square of n (n * n), rather than to n itself. Clearly the performance of P2 is going to be worse than P1, because the number to which they count differs immensely (especially for larger n's (= scaling)). You'll know by intuition that P2's time-complexity is equal to O(n^2) if P1's complexity is equal to O(n).
Now consider a program (P3) that looks like this:
var length = input.Length;
for (var i = 0; i < length; i++) {
    for (var j = 0; j < length; j++) {
        Console.WriteLine($"Product is {input[i] * input[j]}");
    }
}
There's no n to be found here, but as you might realise, this program still depends on an input, called input here. Simply because the program depends on some kind of input, we treat that input as our n when we talk about time-complexity. If a program takes multiple inputs, we simply give them different names, so that a time-complexity could be expressed as, say, O(n * n2 + m * n3), where this hypothetical program takes 4 inputs named n, n2, m and n3.
For P3, we can discover its time-complexity by first identifying the different inputs, and then analyzing how its performance depends on them.
P3 uses 3 variables, called length, i and j. The first line of code does a simple assignment, whose performance does not depend on any input, so the time-complexity of that line is O(1), meaning constant time.
The second line of code is a for loop, implying we're going to do something that may depend on the length of something. Indeed, this first for loop (and everything in it) will be executed length times. If we increase the size of length, this line of code does linearly more work, so its time complexity is O(length) (called linear time).
The next line of code takes O(length) time again, following the same logic as before; however, since we execute it every time the for loop around it executes, the complexities are multiplied, which results in O(length) * O(length) = O(length^2).
The insides of the second for loop do not depend on the size of the input (even though the input is needed), because indexing into the input (for arrays!) does not become slower when we increase the size of the input. This means the insides take constant time = O(1). Since this runs inside the other for loops, we again have to multiply to obtain the total time complexity of the nested lines of code: outside for-loops * current block of code = O(length^2) * O(1) = O(length^2).
The total time-complexity of the program is just the sum of everything we've calculated: O(1) + O(length^2) = O(length^2) = O(n^2). The first line of code was O(1) and the for loops were analyzed to be O(length^2). You will notice 2 things:
1. We rename length to n: we express time-complexity in terms of generic parameters, not the names that happen to live inside the program.
2. We drop O(1) from the equation: we're only interested in the biggest (= fastest growing) terms. Since O(n^2) is way 'bigger' than O(1), the time-complexity is defined by it (this only works like that for terms, i.e. parts split by +, not for factors, i.e. parts split by *).
OP's problem
Now we can consider your program (P4), which is a little trickier because the variables within it are defined a little less cleanly than the ones in my examples.
for (int i = n; i > 1; i /= 3) {
    for (int j = 0; j < n; j += 2) {
        ... ...
    }
    for (int k = 2; k < n; k = k * k) {
        ...
    }
}
If we analyze we can say this:
The first line of code (the outer for) is executed O(log3(n)) times, where log3 is the logarithm to base 3. Since i is divided by 3 on every iteration, log3(n) is the number of times the loop runs before i becomes smaller than or equal to 1.
The second for loop is linear in time: j runs O(n / 2) times because it is increased by 2 rather than by 1, which would be 'normal'. Since O(n / 2) = O(n), this for loop is executed O(log3(n)) * O(n) = O(n log n) times in total (outer for times the nested for).
The third for is also nested in the first for, but since it is not nested in the second for, we're not going to multiply it by the second one (obviously, because it is only executed once per iteration of the first for). Here, k is bounded by n, but since it is increased by a factor of itself each time (we square it), its growth is not linear. Squaring k on every iteration means it reaches n in about log2(log2(n)) steps. Deducing this is easy if you understand how logarithms work; if you don't, you need to understand that first. In any case, since this for loop runs O(log2(log2(n))) times per iteration of the outer loop, its total contribution is O(log3(n)) * O(log2(log2(n))) = O(log(n) * log(log(n))).
The total time-complexity of the program is now calculated as the sum of the different sub-complexities: O(n log n) + O(log(n) * log(log(n))).
As we saw before, we only care about the fastest growing term when we talk about big O notation, so the time-complexity of your program is equal to O(n log n).

Runtime complexity of the function

I have to find the time complexity of the following program:
void function(int n)
{
    for (int i = 0; i < n; i++)              // O(n) times
        for (int j = i; j < i * i; j++)      // O(n^2) times
            if (j % i == 0)
            {                                // O(n) times
                for (int k = 0; k < j; k++)  // O(n^2) times
                    printf("8");
            }
}
I analysed this function as follows:
i : O(n) : 1 2 3 4 5
j : : 1 2..3 3..8 4..15 5..24 (values taken by j)
O(n^2): 1 2 6 12 20 (Number of times executed)
j%i==0 : 1 2 3,6 4,8,12 5,10,15,20 (Values for which the condition is true)
O(n) : 1 1 2 3 4
k : 1 2 3,6 4,8,12 5,10,15,20 (Number of times printf is executed)
Total : 1 2 9 24 50 (Total)
However, I am unable to draw any conclusions, since I don't see any correlation between i (which is essentially O(n)) and the total for k (last line). In fact, I am not sure whether we should measure the time complexity by the number of times printf is executed, since that would neglect the O(n^2) executions of the j loop. The answer given was O(n^5), which I presume is wrong, but then what is correct? To be more specific about my confusion, I cannot figure out how the if (j%i==0) condition affects the overall runtime complexity of the function.
The answer is definitely not O(n^5). It can be seen very easily. Suppose your second inner loop always runs n^2 times and your innermost loop always runs n times, even then total time complexity would be O(n^4).
Now let us see what is actual time complexity.
1. The outermost loop always runs O(n) times.
2. Now let us see how many times the second inner loop runs for a single iteration of the outer loop:
The loop will run
0 times for i = 0
0 times for i = 1
2 times for i = 2
....
and, in general, i*i - i times for a given i (j runs from i to i*i - 1).
i*i - i is O(i^2).
3. Coming to the innermost loop: it runs only when j is divisible by i, and j varies from i to i*i - 1.
This means j goes through i*1, i*2, i*3, ..., up to the last multiple of i below i*i, which is clearly O(i) values of j. Each time the innermost loop is entered it runs j = O(i^2) times (k goes from 0 to j - 1), so for a single iteration of the outer loop the two inner loops together perform O(i^2) + O(i) * O(i^2) = O(i^3) iterations.
Summing up O(i^3) for i = 0 to n-1 will definitely give a term that is O(n^4).
Therefore, the correct time complexity is O(n^4).
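As a quick empirical check (an illustrative Python sketch, not part of the original answer), counting how many times printf would execute shows the total growing in proportion to n^4: the ratio count(n) / n^4 approaches a constant (roughly 1/8) as n grows.
def count_prints(n):
    # Counts how many times printf("8") would execute in the original function.
    count = 0
    for i in range(n):
        for j in range(i, i * i):   # empty for i = 0 and i = 1, so j % i is safe
            if j % i == 0:
                count += j          # the k loop runs exactly j times
    return count

for n in [10, 20, 40, 80]:
    c = count_prints(n)
    print(n, c, round(c / n ** 4, 4))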