A simple question but I am confused.
Which one is bigger,
O(n^2 log(n)) or O(n^3)?
Thanks in advance.
Adding onto the comments, you can always plot both functions on a graph, or plug very large numbers into a calculator and compare the values. O(n) > O(log n), and since each expression is O(n^2) times another factor, n in one case and log n in the other, multiplying shows that O(n^3) is the larger of the two.
Useful tool: http://www.wolframalpha.com/widgets/view.jsp?id=57ad04c0f04cc92e742205985c18023e
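For instance, a quick Python check (my own throwaway snippet, not part of the linked tool) shows the ratio n^3 / (n^2 log n) = n / log n blowing up as n grows:

import math

for n in (10, 10**3, 10**6):
    a = n**2 * math.log(n)   # n^2 * log n (natural log; the base doesn't matter for big-O)
    b = n**3                 # n^3
    print(n, round(b / a))   # the ratio n / log(n) keeps growing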
Where T is the running time of the algorithm, I understand why, in the worst case, using the median-of-medians algorithm with blocks of size three gives the recurrence relation
T(n) = T(2n / 3) + T(n / 3) + O(n)
The Wikipedia article for the median-of-medians algorithm says that with blocks of size three the runtime is not O(n) because it still needs to check all n elements. I don't quite understand this explanation, and in my homework it says I need to show it by induction.
How would I show that median-of-medians takes time Ω(n log n) in this case?
Since this is a homework problem I'm going to let you figure out a rigorous proof of this result on your own, but it might be helpful to think about this one by looking at the shape of the recursion tree, which will be something like this:
                     n                          Total work: n
              2n/3        n/3                   Total work: n
          4n/9   2n/9   2n/9   n/9              Total work: n
Essentially, each node's children collectively will do the exact same amount of work as the node itself, so if you sum up the work done across the layers, you should see roughly linear work done per level. It won't be exactly linear work per level because eventually the smaller call starts to bottom out, but for the top layers you'll see this pattern hold.
You can formalize this by induction by guessing that the runtime is something of the form cn log n, possibly with some lower-order terms added in, but (IMHO) it's more important and instructive to see where the runtime comes from than it is to be able to prove it inductively.
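If you want to sanity-check the guess numerically before doing the induction, here's a small Python sketch (the cost model T(n) = T(2n/3) + T(n/3) + n is idealized, with integer floors):

from functools import lru_cache
import math

@lru_cache(maxsize=None)
def T(n):
    # Idealized recurrence: T(n) = T(2n/3) + T(n/3) + n
    if n <= 1:
        return 1
    return T(2 * n // 3) + T(n // 3) + n

for n in (10**3, 10**4, 10**5, 10**6):
    print(n, T(n) / (n * math.log(n)))  # the ratio flattens out, consistent with Theta(n log n)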
Since the fractions 2/3 and 1/3 in T(2n/3) and T(n/3) sum to 1, the recurrence is in the balanced regime where every level of the recursion tree does Theta(n) work. The two-term Master theorem doesn't apply directly to a recurrence with unequal subproblem sizes, but its generalization, the Akra-Bazzi method, does: the critical exponent p solves (2/3)^p + (1/3)^p = 1, which gives p = 1, and since f(n) = Theta(n) = Theta(n^p), the solution is T(n) = Theta(n^p * log(n)) = Theta(n*log(n)).
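Spelled out via the Akra-Bazzi method (a sketch of the standard calculation):

$$\left(\tfrac{2}{3}\right)^p + \left(\tfrac{1}{3}\right)^p = 1 \;\Rightarrow\; p = 1,$$
$$T(n) = \Theta\!\left( n^{p} \left( 1 + \int_{1}^{n} \frac{u}{u^{p+1}} \, du \right) \right) = \Theta\big( n (1 + \ln n) \big) = \Theta(n \log n).$$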
As we know, Dynamic Programming (DP) can usually solve problems in time O(n^2) or O(n^3) where a naive approach would take exponential time. But are there any harder problems that need O(n^4) time to solve using DP?
If you have a matrix of integers and can only move down or right, and you need to find the maximum sum obtainable on a path from the top-left corner to the bottom-right corner, this can be solved in O(n^2) with the recurrence d[i][j] = a[i][j] + max(d[i-1][j], d[i][j-1]), as sketched below. If it were a 3-dimensional array, the complexity would be O(n^3); if 4-dimensional, then O(n^4), and so on.
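A minimal Python sketch of the 2-D case (names like grid are my own; assumes a nonempty rectangular matrix):

def max_path_sum(grid):
    # d[i][j] = best down/right path sum from (0, 0) to (i, j)
    n, m = len(grid), len(grid[0])
    d = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            best = 0
            if i > 0 or j > 0:  # every cell except the start needs a predecessor
                up   = d[i-1][j] if i > 0 else float('-inf')
                left = d[i][j-1] if j > 0 else float('-inf')
                best = max(up, left)
            d[i][j] = grid[i][j] + best
    return d[n-1][m-1]

print(max_path_sum([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]]))  # prints 29, via the path 1-4-7-8-9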
I've searched but I can't seem to find the complexity of the flood fill algorithm (four-way version). What exactly is its complexity in big-O notation?
The time complexity would be O(4*mn) = O(mn), because each cell of the matrix is processed at most 4 times. For example, a particular cell can be reached from its top, bottom, left, or right neighbour.
The complexity of the flood fill algorithm is proportional to the number of pixels in the filled area. So, if you have e.g. a square, and M is the number of pixels in the square and N is the length of the side of a square, then M = N^2 and the complexity is O(M) = O(N^2).
By the way, this question has already been answered in a comment on Stack Overflow. See How can I improve the performance of my flood-fill routine?
In the worst case, all cells of the matrix will be covered.
In terms of time complexity, this algorithm is equal to the recursive one: O(N×M), where N and M are the dimensions of the input matrix. The key idea is that in both algorithms each node is processed at most once.
Please refer to the link below for a better understanding and more cases:
https://www.hackerearth.com/practice/algorithms/graphs/flood-fill-algorithm/tutorial/
The time complexity of the flood fill algorithm should be O(m×n), where m is the number of rows and n is the number of columns in the given matrix.
Note that every element of the matrix is processed at most three times.
So it boils down to O(3×mn), which is eventually the same as O(mn).
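To make the "each cell is processed a constant number of times" argument concrete, here is a minimal iterative four-way flood fill in Python (an illustrative sketch; the names grid, sr, sc, and new_color are mine):

def flood_fill(grid, sr, sc, new_color):
    # Four-way flood fill with an explicit stack; every cell is recolored
    # (and therefore pushed) at most once, so total work is O(m*n).
    m, n = len(grid), len(grid[0])
    old_color = grid[sr][sc]
    if old_color == new_color:
        return grid
    grid[sr][sc] = new_color
    stack = [(sr, sc)]
    while stack:
        r, c = stack.pop()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
            nr, nc = r + dr, c + dc
            if 0 <= nr < m and 0 <= nc < n and grid[nr][nc] == old_color:
                grid[nr][nc] = new_color  # mark before pushing to avoid duplicate pushes
                stack.append((nr, nc))
    return grid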
Isn't O(n) an improvement over O(1 + n)?
This is my conception of the difference:
O(n):
for i=0 to n do ; print i ;
O(1 + n):
a = 1;
for i=0 to n do ; print i+a ;
... which would just reduce to O(n) right?
If the target time complexity is O(1 + n), but I have a solution in O(n),
does this mean I'm doing something wrong?
Thanks.
O(1+n) and O(n) are mathematically identical, as you can straightforwardly prove from the formal definition or using the standard rule that O( a(n) + b(n) ) is equal to the bigger of O(a(n)) and O(b(n)).
In practice, of course, if you do n+1 things it'll (usually, dependent on compiler optimizations/etc) take longer than if you only do n things. But big-O notation is the wrong tool to talk about those differences, because it explicitly throws away differences like that.
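For completeness, the squeeze from the formal definition is one line (with c and n_0 the usual witness constants):

$$ n \le 1 + n \le 2n \quad \text{for all } n \ge 1, $$

so f(n) <= c(1+n) implies f(n) <= 2cn, and conversely f(n) <= cn <= c(1+n); the two classes coincide.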
It's not an improvement, because big-O doesn't describe the exact running time of your algorithm but rather its growth rate. Big-O therefore describes a class of functions, not a single function. O(n^2) doesn't mean that your algorithm will run in 4 operations on an input of size 2; it means that if you were to plot the running time of your application as a function of n, it would be asymptotically upper-bounded by c*n^2 starting at some n0. This is nice because we know how much slower our algorithm will be for each input size, but we don't really know exactly how fast it will be. Why use the c? Because, as I said, we don't care about exact numbers but about the shape of the function: when we multiply by a constant factor, the shape stays the same.
Isn't O(n) an improvement over O(1 + n)?
No, it is not. Asymptotically these two are identical. In fact, O(n) is identical to O(n+k) where k is any constant value.
I am computing a similarity matrix based on Euclidean distance in MATLAB. My code is as follows:
for i=1:N % [M,N] = size(x); I am computing the similarity matrix for the columns of x
    for j=1:N
        D(i,j) = sqrt(sum((x(:,i)-x(:,j)).^2)); % D is the similarity matrix
    end
end
Can anyone help with optimizing this, i.e. reducing the for loops, as my matrix x is of dimension 256x30000?
Thanks a lot!
--Aditya
The function to do this in MATLAB is called pdist. Unfortunately it is painfully slow and doesn't take MATLAB's vectorization abilities into account.
The following is code I wrote for a project. Let me know what kind of speed up you get.
Qx = repmat(dot(x,x,2), 1, size(x,1)); % squared norm of every row (data point), tiled to N-by-N
D = sqrt(Qx + Qx' - 2*x*x'); % uses ||a-b||^2 = ||a||^2 + ||b||^2 - 2*a.b
Note, though, that this will only work if your data points are in the rows and your dimensions in the columns. So, for example, let's say I have 256 data points and 100000 dimensions; then on my Mac, using x=rand(256,100000), the above code produces a 256x256 matrix in about half a second.
There's probably a better way to do it, but the first thing I noticed was that you could cut the runtime in half by exploiting the symmetry D(i,j)==D(j,i).
You can also use the function norm(x(:,i)-x(:,j),2)
I think this is what you're looking for.
D = zeros(N);
jIndx = repmat(1:N, N, 1); % column index of every (i,j) pair
iIndx = jIndx';            % row index of every (i,j) pair
D(:) = sqrt(sum((x(iIndx(:),:)-x(jIndx(:),:)).^2,2)); % all pairwise distances in one vectorized step
Here, I have assumed that the data matrix, x, is initialized as an NxM array, where M is the number of dimensions of the system and N is the number of points. So if your ordering is different, you'll have to make changes accordingly.
To start with, you are computing twice as much as you need to here, because D will be symmetric. You don't need to calculate the (i,j) entry and the (j,i) entry separately. Change your inner loop to for j=1:i, and add D(j,i)=D(i,j); to the body of that loop.
After that, there's really not much redundancy left in what that code does, so your only room left for improvement is to parallelize it: if you have the Parallel Computing Toolbox, convert your outer loop to a parfor and before you run it, say matlabpool(n), where n is the number of threads to use.