How can I improve InsertionSort by the following argument? The correct answer is b. Can someone clearly explain every answer?

A person claims that they can improve InsertionSort by the following argument. In the innermost loop of InsertionSort, instead of looping over all entries in the already sorted array in order to insert the j’th observed element, simply perform BinarySearch in order to sandwich the j’th element in its correct position in the list A[1, ... , j−1]. This person claims that their resulting insertion sort is asymptotically as good as mergesort in the worst case scenario. True or False and why? Circle the one correct answer from the below:
a. True: In this version, the while loop will iterate log(n), but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards across the median elements and so this shift will still require log(n) in the worst case scenario. Adding up, Insertion Sort will significantly improve in this case to continue to require n log(n) in the worst case scenario like mergesort.
b. False: In this version, the while loop will iterate log(n), but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards and so this shift will still require n in the worst case scenario. Adding up, Insertion Sort will continue to require n² in the worst case scenario which is orders of magnitude worse than mergesort.
c. False: In this version, the while loop will iterate n, but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards and so this shift will still require log(n) in the worst case scenario. Adding up, Insertion Sort will continue to require n log(n) in the worst case scenario which is orders of magnitude worse than mergesort.
d. True: In this version, the while loop will iterate log(n), but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards and so this shift will still require n in the worst case scenario. Adding up, Insertion Sort will continue to require n log(n) in the worst case scenario which is orders of magnitude worse than mergesort.

b is correct, with some assumptions about compiler optimizations.
Consider a reverse sorted array,
8 7 6 5 4 3 2 1
and that insertion sort is half done so it is
5 6 7 8 4 3 2 1
The next step:
Normal insertion sort sequence, assuming the most recently read value is kept in a register:
t = a[4] = 4 1 read
compare t and a[3] 1 read
a[4] = a[3] = 8 1 write
compare t and a[2] 1 read
a[3] = a[2] = 7 1 write
compare t and a[1] 1 read
a[2] = a[1] = 6 1 write
compare t and a[0] 1 read
a[1] = a[0] = 5 1 write
a[0] = t = 4 1 write
---------------
5 read 5 write
Binary search version:
t = a[4] 1 read
compare t and a[1] 1 read
compare t and a[0] 1 read
a[4] = a[3] 1 read 1 write
a[3] = a[2] 1 read 1 write
a[2] = a[1] 1 read 1 write
a[1] = a[0] 1 read 1 write
a[0] = t 1 write
----------------
7 read 5 write
If the compiler re-read the data instead of keeping it in a register, normal insertion sort would cost
9 read 5 write
in which case the binary search version would save some time.

The expected answer to this question is b), but the explanation is not precise enough:
locating the position at which to insert the j-th element indeed requires log(j) comparisons, instead of up to j comparisons for regular Insertion Sort.
inserting the element requires up to j element moves in the worst case for both implementations (reverse sorted array).
Summing these over the whole array produces:
n log(n) comparisons for this modified Insertion Sort idea in all cases, vs. n² comparisons in the worst case (reverse sorted array) for the classic implementation.
n² element moves in the worst case in both implementations (reverse sorted array).
note that in the classic implementation comparisons and element moves go hand in hand: each shift is preceded by a comparison, so the two counts differ by at most n.
Merge Sort on the other hand uses approximately n log(n) comparisons and n log(n) element moves in all cases.
Therefore the claim that the resulting insertion sort is asymptotically as good as mergesort in the worst case scenario is False, precisely because the modified Insertion Sort method still performs n² element moves in the worst case, which is asymptotically much worse than n log(n) moves.
Note however that, depending on the relative cost of comparisons and element moves, this modified Insertion Sort can perform much better than the classic implementation in practice. For example, when sorting an array of string pointers containing URLs to the same site, comparing strings that share a long initial substring is far more expensive than moving a single pointer.
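For concreteness, here is a minimal sketch of this binary-search variant of Insertion Sort in Java (the method name and the use of System.arraycopy are my own choices, not taken from the question). It shows where the O(log j) comparisons and the O(j) element moves each come from:

// Insertion sort where the insertion position is found by binary search.
// Comparisons per element: O(log j). Element moves per element: still O(j) in the worst case.
static void binaryInsertionSort(int[] a) {
    for (int j = 1; j < a.length; j++) {
        int key = a[j];
        // Binary search in a[0..j-1] for the first index whose value is greater than key.
        int lo = 0, hi = j;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1;
            else hi = mid;
        }
        // Shift a[lo..j-1] one slot to the right to make room; this is the O(j) part.
        System.arraycopy(a, lo, a, lo + 1, j - lo);
        a[lo] = key;
    }
}

On a reverse-sorted input the shift at step j still moves j elements, so the total number of moves is 1 + 2 + ... + (n-1) ≈ n²/2, which is exactly why answer b is correct.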

Related

Ranking Big O Functions By Complexity

I am trying to rank these functions: 2^n, n^100, (n + 1)^2, n·lg(n), 100n, n!, lg(n), and n^99 + n^98, so that each function is big-O of the next function, but I do not know a method of determining if one function is the big-O of another. I'd really appreciate it if someone could explain how I would go about doing this.
Assuming you have some programming background, say you have the code below:
void SomeMethod(int x)
{
    for (int i = 0; i < x; i++)
    {
        // Do some work
    }
}
Notice that the loop runs for x iterations. Generalizing, we say that you will get the solution after N iterations (where N is the value of x, e.g. the number of items in the array/input).
This type of implementation/algorithm is therefore said to have a time complexity of order N, written as O(n).
Similarly, a nested for (2 loops) is O(n squared) => O(n^2).
If you make binary decisions, reducing the possibilities by half and keeping only one half at each step, then the complexity is O(log n).
Found this link to be interesting.
For: Himanshu
While the link explains very well how the log-base-2 N complexity comes into the picture, let me put the same in my own words.
Suppose you have a pre-sorted list like:
1,2,3,4,5,6,7,8,9,10
Now, you are asked to find whether 10 exists in the list. The first solution that comes to mind is to loop through the list and find it, which means O(n). Can it be made better?
Approach 1:
Since we know the list is already sorted in ascending order:
Break the list at the center (say at 5).
Compare the value at the center (5) with the search value (10).
If center value == search value => item found
If center value < search value => repeat the above steps for the right half of the list
If center value > search value => repeat the above steps for the left half of the list
For this simple example we will find 10 after 3 or 4 breaks (at 5, then 8, then 9), depending on how you implement it.
That means for N = 10 items, the search took 3 (or 4) lookups. Putting some mathematics on this:
2^3 + 2 = 10; for simplicity's sake let's say
2^3 = 10 (nearly equal; this is just to keep the base-2 logarithm simple)
This can be re-written as:
log-base-2 of 10 = 3 (again, nearly)
We know 10 was the number of items and 3 was the number of breaks/lookups we had to do to find the item. It becomes:
log N = K
That is the complexity of the algorithm above: O(log N).
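A minimal iterative sketch of that search in Java (the method and variable names are illustrative):

// Returns the index of target in the sorted array a, or -1 if it is absent.
// Each iteration halves the search range, so at most about log-base-2 of N iterations run.
static int binarySearch(int[] a, int target) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi) {
        int mid = (lo + hi) >>> 1;               // break the list at the center
        if (a[mid] == target) return mid;        // center value == search value => item found
        else if (a[mid] < target) lo = mid + 1;  // search the right half
        else hi = mid - 1;                       // search the left half
    }
    return -1;
}

For the list above, binarySearch(new int[]{1,2,3,4,5,6,7,8,9,10}, 10) probes the values 5, 8, 9 and then 10, i.e. 4 lookups for N = 10, consistent with the 3-or-4-breaks estimate.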
Generally, when loops are nested we multiply the bounds: O(outer loop max value * inner loop max value), and so on. E.g. for (i to n) { for (j to k) { } }: for i = 1 the inner loop runs j = 1 to k, i.e. k times; for i = 2 it again runs j = 1 to k; so O(max(i) * max(j)) implies O(n*k), as the sketch below illustrates. Further, if you want to rank orders of growth, recall how the basic operations compare, e.g. addition grows more slowly than multiplication and the logarithm shrinks its argument, so O(log n) < O(n) < O(n + n) (addition) < O(n * n) (multiplication), and so on. In this way you can handle the other functions as well.
A better approach is to first write down a generalised expression for the time complexity. For example, n! = n*(n-1)*(n-2)*...*(n-(n-1)), and O(n^k) is the generalised form of a polynomial worst case; written this way you can compare easily, e.g. for k = 2, O(n^k) = O(n*n).
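As a concrete sketch of the multiplication rule for nested loops mentioned above (the bounds n and k are arbitrary illustrative parameters):

// The innermost body runs exactly n * k times, so the nested loop is O(n * k);
// when k == n this is the familiar O(n^2).
static long nestedLoopCount(int n, int k) {
    long count = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < k; j++) {
            count++;    // O(1) work in the innermost body
        }
    }
    return count;       // equals (long) n * k
}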

time complexity for loop justification

Hi, could anyone explain why the first one is True and the second one is False?
First loop: say the number of times the loop gets executed is k,
where, for a given n, i takes the values 1, 2, 4, ..., staying less than n. Then
2^k <= n
or, k <= log(n).
This implies that k, the number of times the first loop gets executed, is at most log(n); that is, the time complexity here is O(log(n)).
The second loop does not depend on p, as p is not used in the decision statement of the for loop. p does take different values inside the loop, but it doesn't influence the decision statement; p*p simply gets executed n times, so the time complexity is O(n).
O(log n):
for (i = 1; i < n; i = i * c) { /* any O(1) expression */ }
Here, the time complexity is O(log n) because the index i is multiplied/divided by a constant value (c > 1).
In the second case,
for (p = 2, i = 1; i < n; i++) { p = p * p; }
The increment is constant, i.e. i = i + 1, so the loop runs n times irrespective of the value of p. Hence the loop alone has a complexity of O(n). (If you also account for naive big-number multiplication, p = p*p is not a constant-time operation once p gets large, so the total cost would be higher than O(n); counting iterations alone, though, the loop is O(n).)
Let me summarize with an example. Suppose the value of n is 8; then the possible values of i are 1, 2, 4, 8, and as soon as i reaches 8 the loop breaks. You can see the loop body runs 3 times, i.e. log(n) times, as the value of i keeps increasing by a factor of 2. Hence, True.
For the second part, it is a normal loop which runs for all values of i from 1 to n. The value of p grows by repeated squaring, but that does not affect the number of iterations. That is why the second statement is wrong.
In order to understand why some algorithm is O(log n) it is enough to check what happens when n = 2^k (i.e., we can restrict ourselves to the case where log n happens to be an integer k).
If we inject this into the expression
for(i=1; i<2^k; i=i*2) s+=i;
we see that i will take the values 1, 2, 4, 8, ..., i.e., 2^0, 2^1, 2^2, 2^3, ..., up to 2^(k-1), and the loop stops once i reaches 2^k. In other words, the body of the loop will be evaluated k times. Therefore, if we assume that the body is O(1), we see that the complexity is k*O(1) = O(k) = O(log n).
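If you prefer to see it empirically, here is a small Java sketch (names are mine) that simply counts the iterations of such a doubling loop:

// Counts how many times the body of "for (i = 1; i < n; i *= 2)" executes.
// For n = 2^k the result is exactly k, i.e. the count grows like log-base-2 of n.
static int doublingIterations(int n) {
    int count = 0;
    for (long i = 1; i < n; i *= 2) {
        count++;    // body runs for i = 1, 2, 4, 8, ...
    }
    return count;
}

doublingIterations(8) returns 3 and doublingIterations(1 << 20) returns 20.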

Complexity of removing duplicates by shifting elements

The following code is trying to remove duplicates from a sorted array by overwriting repeated elements. Although it has nested for loops, its complexity is definitely less than O(n^2). What will be the complexity of the following code?
int n = arr.length;
for (int i = 0; i < n; i++) {
    if (arr[i] == arr[i+1]) {
        for (int j = i+1; j < n-1; j++) {
            arr[j] = arr[j+1];
        }
        n--; i--;
    }
}
Let's start from the first position. It is for sure not a duplicate, so you don't do any moving. Then we move to the second position. It may be a duplicate of the first position's value, or it may not be. If it is, then you have to move it to the end, so for the first two positions you have two scenarios: one is just moving through the array (it's not a duplicate), the second is moving the element to the end of the array, which requires n-2 operations of arr[j] = arr[j+1].
Now, in case we moved the third value to the second position, we are still on it (the i-- part). It may be a duplicate of the first value, or it may not be. In the first case you have to do n-3 operations of arr[j] = arr[j+1] (since you already did n--, you move it from position 2 to position n-1). Now, if the second value was not a duplicate of the first value (so you did not do i-- and n--), but the third value is a duplicate of one of the first two values, you still have to do n-3 operations of arr[j] = arr[j+1] (from position 3 to position n). So the number of operations stays the same in each case.
So, we have a pattern here: n-2 + n-3 + n-4 + ... + 3 + 2 + 1 operations of moving things around. This sum is:
n-2 + n-3 + n-4 + ... + 3 + 2 + 1 = n(n+1)/2 - n - (n-1)
Based on this, since the first part (moving through the array) is O(n) and the shifting adds up to a quadratic sum, the complexity of this algorithm is O(n^2). The worst-case scenario is having all the same values in the array.
Now, there is a better way. Place all values in a Set. This requires moving through the array (O(n)), and checking whether something is already in the Set is O(1). Then take all the results from the Set and place them back in an array, which is O(n). So the complexity of such an algorithm is O(n).
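A sketch of that Set-based approach in Java (using a LinkedHashSet so the already-sorted order of the input is preserved; method and variable names are illustrative):

import java.util.LinkedHashSet;
import java.util.Set;

// Removes duplicates from a sorted array in O(n): collect the values into a Set
// (add/lookup are O(1) on average), then copy the survivors into a new array.
static int[] removeDuplicates(int[] arr) {
    Set<Integer> seen = new LinkedHashSet<>();
    for (int value : arr) {
        seen.add(value);        // duplicates are simply ignored by the Set
    }
    int[] result = new int[seen.size()];
    int k = 0;
    for (int value : seen) {
        result[k++] = value;
    }
    return result;
}

Since the input is already sorted, the same O(n) bound can also be reached in place with two indices and no extra data structure; the Set version is just the most direct translation of the idea above.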

How to analyse complexity of the given optimized bubble sort?

This is pseudo code for an optimized bubble sort algorithm. I have tried to analyze its time complexity, but I am not sure what the cost of line 4 (if A[i-1] > A[i]) is. Is the answer (n-1)+(n-2)+...+1? Also, what would be the cost of lines 5 to 8?
1.for j = A.length to 2
2. swapped = false
3. for i = 2 to j
4. if A[i-1] > A[i]
5. temp = A[i]
6. A[i-1] = A[i]
7. A[i-1] = temp
8. swapped = true
9. if(!swapped)
10. break
The cost of lines 5 to 8 for a single iteration is O(1).
The cost of loop at lines 3-8 is O(j-1).
The cost of the whole sort in the worst case is O((n-1) + (n-2) + ... + 2) = O(n^2) (but of course in the best case, when the array is already sorted, the cost will be only O(n-1)).
By the way, your implementation of optimized bubble sort contains an error: the if at line 9 should be inside the outer loop, but outside the inner.
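For reference, here is one way the corrected sort could look in Java; besides moving the early-exit check inside the outer loop (but outside the inner one), note that lines 6-7 of the pseudo code appear to assign to A[i-1] twice instead of swapping, which this sketch also fixes (names are illustrative):

// Optimized bubble sort: the early-exit check sits inside the outer loop,
// right after the inner loop has completed a full pass.
static void bubbleSort(int[] a) {
    for (int j = a.length - 1; j >= 1; j--) {    // "for j = A.length to 2" (0-based here)
        boolean swapped = false;
        for (int i = 1; i <= j; i++) {           // "for i = 2 to j" (0-based here)
            if (a[i - 1] > a[i]) {
                int temp = a[i];                 // swap a[i-1] and a[i]
                a[i] = a[i - 1];
                a[i - 1] = temp;
                swapped = true;
            }
        }
        if (!swapped) {
            break;    // no swaps in a full pass: the array is sorted (best case O(n))
        }
    }
}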

Fast way of multiplying two 1-D arrays

I have the following data:
A = [a0 a1 a2 a3 a4 a5 .... a24]
B = [b0 b1 b2 b3 b4 b5 .... b24]
which I then want to multiply as follows:
C = A * B' = [a0b0 a1b1 a2b2 ... a24b24]
This clearly involves 25 multiplies.
However, in my scenario, only 5 new values are shifted into A per "loop iteration" (and 5 old values are shifted out of A). Is there any fast way to exploit the fact that data is shifting through A rather than being completely new? Ideally I want to minimize the number of multiplication operations (at a cost of perhaps more additions/subtractions/accumulations). I initially thought a systolic array might help, but it doesn't (I think!?)
Update 1: Note B is fixed for long periods, but can be reprogrammed.
Update 2: the shifting of A is as follows: a[24] <= a[19], a[23] <= a[18], ..., a[1] <= new01, a[0] <= new00, and so on and so forth each clock cycle.
Many thanks!
Is there any fast way to exploit the fact that data is shifting through A rather than being completely new?
Even though all you're doing is the shifting and adding new elements to A, the products in C will, in general, all be different since one of the operands will generally change after each iteration. If you have additional information about the way the elements of A or B are structured, you could potentially use that structure to reduce the number of multiplications. Barring any such structural considerations, you will have to compute all 25 products each loop.
Ideally I want to minimize the number of multiplication operations (at a cost of perhaps more additions/subtractions/accumulations).
In theory, you can reduce the number of multiplications to 0 by shifting and adding the array elements to simulate multiplication. In practice, this will be slower than a hardware multiplication so you're better off just using any available hardware-based multiplication unless there's some additional, relevant constraint you haven't mentioned.
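To make the baseline explicit, here is a Java sketch of one iteration as described in the updates (array length 25, 5 new samples per clock; all names and types are illustrative assumptions). After the shift, every slot of A holds a value that was previously paired with a different element of B (or is brand new), which is why all 25 products are recomputed:

// One iteration: shift A along by 5, insert the 5 newest samples at the front,
// then form the element-wise products C[i] = A[i] * B[i] (25 multiplies).
static void step(double[] a, double[] b, double[] c, double[] newSamples) {
    // Shift: a[24] <= a[19], a[23] <= a[18], ..., a[5] <= a[0]
    System.arraycopy(a, 0, a, 5, a.length - 5);
    // Insert the 5 new values into a[0..4]
    System.arraycopy(newSamples, 0, a, 0, 5);
    // B is fixed, but every A[i] has moved, so all 25 products are recomputed.
    for (int i = 0; i < a.length; i++) {
        c[i] = a[i] * b[i];
    }
}

If B happened to have structure, e.g. b[i] == b[i-5] for all i >= 5, then the product previously stored in c[i-5] could simply be reused as c[i]; that is the kind of structural consideration mentioned above.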
On the very first 5 data sets you could save up to 50 multiplications, but after that it's a flat road of multiplications, since for every set after the first 5 you need to multiply with a new set of data.
I'll assume all the arrays are initialized to zero.
I don't think those 50 saved multiplications are of much use considering the total amount of multiplication overall.
But still, I will give you a hint on how to save those 50; maybe you can find an extension to it.
1st data set arrives: multiply the first data set in a with the corresponding values in b, save all in a, and copy only a[0] to a[4] to c. 25 multiplications here.
2nd data set arrives: multiply only a[0] to a[4] (holding the new data) with b[0] to b[4] respectively, save in a[0] to a[4], copy a[0..9] to c. 5 multiplications here.
3rd data set arrives: multiply a[0] to a[9] with b[0] to b[9] this time and copy the corresponding a[0..14] to c. 10 multiplications here.
4th data set: multiply a[0] to a[14] with the corresponding b, copy the corresponding a[0..19] to c. 15 multiplications here.
5th data set: multiply a[0] to a[19] with the corresponding b, copy the corresponding a[0..24] to c. 20 multiplications here.
Total saved multiplications: 50.
6th data set onwards: the usual multiplications, 25 each, because from now on every slot of array a holds newly shifted data, so the multiplications are unavoidable.
Could you add another array D to flag the changed/unchanged values in A? Each time, you check this array to decide whether to do new multiplications or not.