Before anyone asks, yes, this was a previous test question I got wrong, and I knew I got it wrong because I honestly just don't understand growth functions and Big O. I've read the technical definition, so I know what they are, but not how to calculate them. My textbook gives examples based on real-life situations, but I still find it hard to interpret code. If someone can tell me their thought process on how they determine these, that would seriously help (e.g. "this section of code tells me to multiply n by x", etc.).
public static int sort(int lowI, int highI, int[] nums) {
    int i = lowI;
    int j = highI;
    int pivot = nums[lowI + (highI - lowI) / 2]; // middle element as pivot
    int counter = 0;
    while (i <= j) {
        while (nums[i] < pivot) { // scan right for an element >= pivot
            i++;
            counter++;
        }
        while (nums[j] > pivot) { // scan left for an element <= pivot
            j--;
            counter++;
        }
        counter++; // count this pass of the outer loop as well
        if (i <= j) {
            NumSwap(i, j, nums); // swaps nums[i] and nums[j] via a temp variable
            i++;
            j--;
        }
    }
    if (lowI < j) {
        return counter + sort(lowI, j, nums);
    }
    if (i < highI) {
        return counter + sort(i, highI, nums);
    }
    return counter;
}
It might help for you to read some explanations of Big-O. I think of Big-O as describing how the number of "basic operations" grows as the "input size" increases. For sorting algorithms, "basic operations" usually means comparisons (or counter increments, in your case), and the "input size" is the size of the list to sort.
When I analyze for runtime, I'll start by mentally dividing the code into sections. I ignore one-off lines (like int i = lowI;) because they're only run once, and Big-O doesn't care about constants (though, note in your case that int i = lowI; runs once with each recursion, so it's not only run once overall).
For example, I'd mentally divide your code into three overall parts to analyze: there's the main while loop while (i <= j), the two while loops inside of it, and the two recursive calls at the end. How many iterations will those loops run for, depending on the values of i and j? How many times will the function recurse, depending on the size of the list?
If I'm having trouble thinking about all these different parts at once, I'll isolate them. For example, how long will one of the inner while loops run for, depending on the values of i and j? Then, how long does the outer while loop run for?
Once I've thought about the runtime of each part, I'll bring them back together. At this stage, it's important to think about the relationships between the different parts. "Nested" relationships (i.e. the nested block loops a bunch of times each time the outer thing loops once) usually mean that the run times are multiplied. For example, since the inner while loops are nested within the outer while loop, the number of iterations per call looks like (inner + other inner) * outer. The recursion then multiplies on top of that, so the total run time looks something like ((inner + other inner) * outer) * recursions.
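To make the "nested means multiply" idea concrete, here's a minimal standalone sketch (my own illustration, not code from your question) that counts basic operations directly and compares the count against the outer-times-inner prediction:

#include <iostream>

// Count the "basic operations" performed by a doubly nested loop.
long long countOps(int n) {
    long long ops = 0;
    for (int outer = 0; outer < n; ++outer) {       // outer loop: n iterations
        for (int inner = 0; inner < n; ++inner) {   // inner loop: n iterations per outer pass
            ++ops;                                  // one basic operation
        }
    }
    return ops;
}

int main() {
    for (int n : {10, 100, 1000}) {
        // The "multiply nested parts" heuristic predicts n * n operations.
        std::cout << "n=" << n << "  counted=" << countOps(n)
                  << "  predicted=" << 1LL * n * n << "\n";
    }
}

The same isolate-then-combine process applies to your code, except that there the inner loops' work is bounded by how far i and j can move, and the recursion multiplies on top.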
I was reading a book about competitive programming and encountered a problem where we have to count all possible paths in an n*n matrix.
The conditions are:
1. All cells must be visited exactly once (no cell may be left unvisited or visited more than once)
2. The path should start at (1,1) and end at (n,n)
3. Possible moves are right, left, up, down from the current cell
4. You cannot go out of the grid
This is my code for the problem:
typedef long long ll;

ll path_count(ll n, vector<vector<bool>>& done, ll r, ll c) {
    ll count = 0;
    done[r][c] = true; // mark the current cell as visited
    if (r == (n - 1) && c == (n - 1)) {
        // reached the target corner: accept only if every cell was visited
        for (ll i = 0; i < n; i++) {
            for (ll j = 0; j < n; j++) {
                if (!done[i][j]) {
                    done[r][c] = false;
                    return 0;
                }
            }
        }
        count++;
    } else {
        // try all four moves that stay in the grid and avoid visited cells
        if ((r + 1) < n  && !done[r + 1][c]) count += path_count(n, done, r + 1, c);
        if ((r - 1) >= 0 && !done[r - 1][c]) count += path_count(n, done, r - 1, c);
        if ((c + 1) < n  && !done[r][c + 1]) count += path_count(n, done, r, c + 1);
        if ((c - 1) >= 0 && !done[r][c - 1]) count += path_count(n, done, r, c - 1);
    }
    done[r][c] = false; // unmark on the way back up (backtracking)
    return count;
}
If we define a recurrence relation here, it might be: T(n) = 4T(n-1) + n^2.
Is this recurrence relation true? I don't think so, because solving it (e.g. via the master theorem) would give a result of O(4^n * n^2), and I don't think the algorithm can be of that order.
The reason I say this is that when I run it on a 7*7 matrix it takes around 110.09 seconds, and for n=7 an O(4^n * n^2) algorithm should not take that much time.
If we calculate it for n=7, the approximate instruction count would be 4^7 * 7^2 = 802816 ≈ 10^6. That many instructions should not take that long, so I conclude that my recurrence relation is false.
The code outputs 111712 for n=7, which matches the book's output, so the code is correct.
So what is the correct time complexity?
No, the complexity is not O(4^n * n^2).
Consider the 4^n in your notation. This would mean going to a depth of at most n (7 in your case), with 4 choices at each level. But that is not what happens: at the 8th level you still have multiple choices of where to go next. In fact, you keep branching until you complete the path, which has depth n^2 (one level per cell).
So a non-tight bound is O(4^(n^2) * n^2). This bound is far from tight, however, as it assumes you have 4 valid choices at each recursive call, which is not the case.
I am not sure how much tighter it can be made, but a first improvement drops it to O(3^(n^2) * n^2), since you can never move back to the cell you just came from. Even this bound is still far from optimal.
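One way to see how loose these bounds are is to instrument the function and count the recursive calls for small n. The calls counter below is my addition; the rest mirrors the question's code:

#include <iostream>
#include <vector>
using namespace std;
typedef long long ll;

ll calls = 0; // my addition: total number of recursive calls made

ll path_count(ll n, vector<vector<bool>>& done, ll r, ll c) {
    ++calls;
    ll count = 0;
    done[r][c] = true;
    if (r == (n - 1) && c == (n - 1)) {
        for (ll i = 0; i < n; i++)
            for (ll j = 0; j < n; j++)
                if (!done[i][j]) { done[r][c] = false; return 0; }
        count++;
    } else {
        if ((r + 1) < n  && !done[r + 1][c]) count += path_count(n, done, r + 1, c);
        if ((r - 1) >= 0 && !done[r - 1][c]) count += path_count(n, done, r - 1, c);
        if ((c + 1) < n  && !done[r][c + 1]) count += path_count(n, done, r, c + 1);
        if ((c - 1) >= 0 && !done[r][c - 1]) count += path_count(n, done, r, c - 1);
    }
    done[r][c] = false;
    return count;
}

int main() {
    for (ll n = 2; n <= 5; ++n) {
        calls = 0;
        vector<vector<bool>> done(n, vector<bool>(n, false));
        cout << "n=" << n << "  paths=" << path_count(n, done, 0, 0)
             << "  calls=" << calls << "\n";
    }
}

Comparing the printed call counts against 3^(n^2) and 4^(n^2) for n = 2..5 shows directly how far those upper bounds sit above the actual growth.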
I was recently asked an interview question about testing the validity of a Sudoku board. A basic answer involves for loops. Essentially:
for(int x = 0; x != 9; ++x)
    for(int y = 0; y != 9; ++y)
        // ...
Use these nested for loops to check the rows. Do it again to check the columns. Do one more pass for the sub-squares, but that one is funkier because we're dividing the Sudoku board into sub-boards, so we end up with more than two nested loops, maybe three or four.
I was later asked the complexity of this code. Frankly, as far as I'm concerned, all the cells of the board are visited exactly three times, so O(3n). To me, the fact that we have nested loops doesn't mean this code is automatically O(n^2), or even O(n^highest-nesting-level-of-loops). But I suspect that's the answer the interviewer expected...
Posed another way, what is the complexity of these two pieces of code:
for(int i = 0; i != n; ++i)
    // ...
and:
for(int i = 0; i != sqrt(n); ++i)
    for(int j = 0; j != sqrt(n); ++j)
        // ...
Your general intuition is correct. Let's clarify a bit about Big-O notation:
Big-O gives you an upper bound for the worst-case (time) complexity for your algorithm, in relation to n - the size of your input. In essence, it is a measurement of how the amount of work changes in relation to the size of the input.
When you say something like
all the cells of the board are visited exactly three times so O(3n).
you are implying that n (the size of your input) is the number of cells in the board, and therefore visiting all cells three times would indeed be an O(3n) (which is O(n)) operation. If this is the case, you would be correct.
However usually when referring to Sudoku problems (or problems involving a grid in general), n is taken to be the number of cells in each row/column (an n x n board). In this case, the runtime complexity would be O(3n²) (which is indeed equal to O(n²)).
In the future, it is perfectly valid to ask your interviewer what n is.
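For instance (this checker is my own sketch, not code from the question), a validity check can visit each of the 81 cells exactly once by tracking seen digits in bitmasks, which makes the "work is proportional to the number of cells" point explicit:

#include <cstdint>

// Sketch of a one-pass 9x9 Sudoku validity check (0 = empty, 1..9 = digit).
// Each cell is inspected exactly once, so the work is linear in the number
// of cells: O(n) if n is the cell count, O(n^2) if n is the side length.
bool isValid(const int board[9][9]) {
    uint16_t rows[9] = {0}, cols[9] = {0}, boxes[9] = {0};
    for (int r = 0; r < 9; ++r) {
        for (int c = 0; c < 9; ++c) {
            int d = board[r][c];
            if (d == 0) continue;            // empty cell, nothing to check
            uint16_t bit = 1u << d;
            int b = (r / 3) * 3 + (c / 3);   // index of the 3x3 sub-square
            if ((rows[r] | cols[c] | boxes[b]) & bit)
                return false;                // digit repeated in row, column, or box
            rows[r] |= bit;
            cols[c] |= bit;
            boxes[b] |= bit;
        }
    }
    return true;
}

Note that the nesting depth here is two, yet the complexity is determined by how many cell visits occur, not by how deeply the loops are nested.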
As for the question in the title (Is a nested for loop automatically O(n^2)?) the short answer is no.
Consider this example:
for(int i = 0 ; i < n ; i++) {
    for(int j = 1 ; j < n ; j *= 2) { // j starts at 1 and doubles each iteration
        ... // some constant time operation
    }
}
The outer loop makes n iterations while the inner loop makes log2(n) iterations, so the time complexity is O(n log n).
In your examples, the first one has a single for loop making n iterations, giving a complexity of O(n) (the operation is performed on the order of n times).
In the second one you have two nested for loops, each making sqrt(n) iterations, for a total of sqrt(n) * sqrt(n) = n iterations, so the runtime complexity is O(n) as well. The second function isn't automatically O(n^2) simply because it contains a nested loop. The number of operations performed is still of the same order (n), so these two examples have the same complexity, since we assume n means the same thing in both.
This is the most crucial point to drive home. To compare the performance of two algorithms, you must use the same definition of input size for both. In your Sudoku problem you could have defined n in a few different ways, and your choice directly affects the complexity you calculate, even though the amount of work stays the same.
*NOTE - this is unrelated to your question, but in the future avoid using != in loop conditions. In your second example, if sqrt(n) is not a whole number, i can step past it and the loop could run forever, depending on the language and how sqrt is defined. It is therefore safer to use < instead.
It depends on how you define the so-called N.
If the size of the board is N-by-N, then yes, the complexity is O(N^2).
But if you say the total number of cells is N (i.e., the board is sqrt(N)-by-sqrt(N)), then the complexity is O(N), or O(3N) if you mind the constant.
Could you explain how to find the time complexity of the following code? Any help appreciated.
int boo(int n) {
    if (n > 0)
    {
        return 1 + boo(n / 2) + boo(n / 2);
    }
    else
    {
        return 0;
    }
}
Sometimes it is good to write the call tree down. At the top you have one call, boo(n), which does constant work and makes two calls on n/2; each of those n/2 calls in turn makes two calls on n/4, and so on:

boo(n)
├── boo(n/2)
│   ├── boo(n/4)
│   └── boo(n/4)
└── boo(n/2)
    ├── boo(n/4)
    └── boo(n/4)

Each level doubles the number of calls, and there are about log2(n) levels before the argument reaches 0. So while the number of calls per level grows exponentially, the number of levels is only logarithmic; the two effects combine into 2^(log2 n) = n calls on the last level, and you get O(N).
PS: It is enough to count the calls on the last level; the whole tree always has only about twice that many nodes (minus one), and in complexity theory a constant factor of two is negligible (you don't care about constants).
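If you want to convince yourself empirically, here is a small sketch (the calls counter is my addition) that counts every invocation; for n a power of two the total comes out to exactly 4n - 1, i.e. linear in n:

#include <iostream>

long long calls = 0; // my addition: counts every call to boo

int boo(int n) {
    ++calls;
    if (n > 0)
        return 1 + boo(n / 2) + boo(n / 2);
    return 0;
}

int main() {
    for (int n : {8, 64, 512, 4096}) {
        calls = 0;
        boo(n);
        std::cout << "n=" << n << "  calls=" << calls
                  << "  (4n-1 = " << 4LL * n - 1 << ")\n";
    }
}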
This kernel does the right thing and gives me the correct result. My problem is more about the correctness of the while loop when I try to improve performance: I tried several configurations of blocks and threads, but if I change them, the while loop no longer gives me the correct result.
What I observe when changing the kernel configuration is that firstArray and secondArray are not filled completely (some cells are left at 0). Both arrays should be filled with the curValue computed in the if block.
Any advice is welcome :)
Thanks in advance.
#define N 65536

__global__ void whileLoop(int* firstArray_device, int* secondArray_device)
{
    int curValue = 0;
    int curIndex = 1;
    int i = threadIdx.x + 2;
    while (i < N) {
        if (i % curIndex == 0) {
            curValue = curValue + curIndex;
            curIndex *= 2;
        }
        firstArray_device[i] = curValue;
        secondArray_device[i] = curValue;
        i += blockDim.x * gridDim.x;
    }
}
int main() {
    firstArray_host[0] = 0;
    firstArray_host[1] = 1;
    secondArray_host[0] = 0;
    secondArray_host[1] = 1;

    // memory allocation + copy to GPU

    // definition of number of blocks and threads
    dim3 dimBlock(1, 1);
    dim3 dimGrid(1, 1);

    whileLoop<<<dimGrid, dimBlock>>>(firstArray_device, secondArray_device);

    // copy back to CPU + free memory
}
You have a data dependency issue here which prevents any meaningful optimization. The variables curValue and curIndex are changed within the while loop and fed forward into the next iteration. As soon as you try to parallelize the loop, you will find yourself in a situation where these variables have different states in different threads, and the result changes.
I do not really know what you are trying to achieve, but try to make each iteration of the while loop independent of the values from previous iterations, to get rid of the dependencies. Try to separate the data into threads and data chunks in such a way that the indices and values are calculated from the environment state alone: threadIdx, blockDim, gridDim, and so on.
Also try to avoid conditional loops. It is better to use for loops with a fixed number of iterations; they are also easier to optimize. A common pattern here is the grid-stride loop, sketched below.
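For illustration, a grid-stride for loop (a standard CUDA pattern; the kernel below is my own sketch, not your algorithm) has a fixed iteration structure and carries no state between iterations:

// Sketch of a grid-stride loop: every array element is computed
// independently, so any block/thread configuration gives the same result.
__global__ void gridStride(int* out, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        out[i] = 2 * i; // value depends only on i, not on earlier iterations
    }
}

The contrast with your kernel is that curValue and curIndex carry state from one iteration to the next, so the result changes as soon as the iteration order changes.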
A few things:

1. You left out the code you used to declare your global arrays on the device. It would be helpful to have this info.
2. Your algorithm is not thread-safe when multiple blocks are used. In other words, if you run multiple blocks, not only would they be doing redundant work (thus giving you no gains), but they would also likely at some point try to write to the same global memory locations, creating errors.
3. Your code is thus correct when only one block is used, but this makes it rather pointless: you're running a serial, or lightly-threaded, operation on a parallel device. You cannot run on all your available resources (multiple blocks on multiple SMs) without memory conflicts (see below).

Currently there are two main issues with this code from a parallel standpoint:

1. int i = (threadIdx.x) + 2; yields a starting index of 2 for a single thread, 2 and 3 for two threads in a single block, and so on. I doubt this is what you want, as the first two positions (0 and 1) never get addressed. (Remember, arrays start at index 0 in C.) Further, if you include multiple blocks (say 2 blocks, each with one thread) then you would have duplicate indices (e.g. for 2 blocks x 1 thread, block 1's thread and block 2's thread both start at i = 2), and using those indices to write to global memory would create conflicts and errors. Doing something like int i = threadIdx.x + blockDim.x * blockIdx.x; would be the typical way to calculate your indices so as to avoid this issue.
2. Your final expression i += blockDim.x * gridDim.x; is okay, because it adds a number equal to the total number of threads to i and thus does not create additional clashing or overlap.

Why use the GPU to shuffle memory and do a trivial computation? You may not see much speedup versus a fast CPU once you factor in the time to move your arrays onto and off of the device.

Work on problems 1 and 2 if you wish, but beyond that, consider your overall goal and what kind of algorithm you are trying to optimize, and come up with a more parallel-friendly solution, or consider whether GPU computing really makes sense for your problem.
To parallelize this algorithm, you need to come up with a formula that can directly calculate the value for a given index in the array. So, pick a random index within the range of the array, then consider what factors go into determining the value at that location. After finding a formula, test it by comparing its output for random indices with the values calculated by your serial algorithm. When that is correct, create a kernel that starts out by selecting a unique index based on its thread and block indices, then calculates the value for that index and stores it at the corresponding position in the array.
A trivial example:
Serial:
__global__ void serial(int* array)
{
    int j(0);
    for (int i(0); i < 1024; ++i) {
        array[i] = j; // each value depends on the previous iteration's j
        j += 5;
    }
}

int main() {
    dim3 dimBlock(1);
    dim3 dimGrid(1);
    serial<<<dimGrid, dimBlock>>>(array);
}
Parallel:
__global__ void parallel(int* array)
{
    int i(threadIdx.x + blockDim.x * blockIdx.x); // one unique index per thread
    int j(i * 5);                                 // value computed directly from the index
    array[i] = j;
}

int main() {
    dim3 dimBlock(256);
    dim3 dimGrid(1024 / 256);
    parallel<<<dimGrid, dimBlock>>>(array);
}
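Applying that recipe to your kernel: if I've traced the serial logic correctly (an assumption worth double-checking against your actual output), the value written at index i is 2^floor(log2(i)) - 1, since curIndex doubles each time i reaches the next power of two. A sketch of a closed-form kernel under that assumption, reusing your array names and N:

// Assumes the serial loop writes 2^floor(log2(i)) - 1 at each index i >= 2;
// verify this against the original kernel's output before relying on it.
__global__ void closedForm(int* firstArray_device, int* secondArray_device)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i >= 2 && i < N) {
        int k = 31 - __clz(i);      // floor(log2(i)) via count-leading-zeros
        int value = (1 << k) - 1;   // 2^k - 1
        firstArray_device[i] = value;
        secondArray_device[i] = value;
    }
}

Launched with enough threads to cover N (e.g. 256 threads per block and (N + 255) / 256 blocks), every index gets exactly one writer, so the result no longer depends on the launch configuration.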
I'm pretty new to programming, and I was just wondering: in the following case, what would be an appropriate name for the second integer I use in this piece of code?
for (int i = 0; i < 10; i++)
{
    for (int x = 0; x < 10; x++)
    {
        // stuff
    }
}
I usually just name it x but I have a feeling that this could get confusing quickly. Is there a standard name for this kind of thing?
Depending upon what you're iterating over, a name might be easy or obvious by context:
for (struct mail *mail = inbox->start; mail; mail = mail->next) {
    for (struct attachment *att = mail->attachments; att; att = att->next) {
        /* work on all attachments of all mails */
    }
}
For the cases where i makes the most sense for an outer loop variable, convention uses j, k, l, and so on.
But when you start nesting, look harder for meaningful names. You'll thank yourself in six months.
You could opt to reduce the nesting by making a method call. Inside of this method, you would be using a local variable also named i.
for (int i = 0; i < 10; i++)
{
methodCall(array[i], array);
}
I have assumed you need to pass the element at position i in the outer loop as well as the array to be iterated over in the inner loop - this is an assumption as you may actually require different arguments.
As always, you should measure the performance of this; there shouldn't be a massive overhead in making a method call within a loop, but it depends on the language.
Personally I feel that you should give variables meaningful names; here i and x mean nothing and will not help you understand your code in three months' time, at which point it will appear to you as code written by a dyslexic monkey.
Name variables so that other people can understand what your code is trying to accomplish. You will save yourself time in the long run.
Since you said you are beginning, I'd say it's beneficial to experiment with multiple styles.
For the purposes of your example, my suggestion is simply replace x with j.
There's tons of real code that uses the convention of i, j, and k for single-letter nested loop variables.
There's also tons that uses longer, more meaningful names.
But there's much less that looks like your example.
So you can consider it a step forward, because your code will look more like real-world code.