I am going through the CTCI book and can't understand one of their examples. They start with:
int sum(int n) {
    if (n <= 0) {
        return 0;
    }
    return n + sum(n - 1);
}
and explain that it's O(n) time and O(n) space because each of the calls is added to the call stack and takes up actual memory.
The next example is:
int f(int n) {
    if (n <= 0) {
        return 1;
    }
    return f(n - 1) + f(n - 1);
}
and states that the time complexity is O(2^n) and the space complexity is O(n). Although I understand why the time is O(2^n), I am not sure why the space is O(n). Their explanation is that "only O(n) nodes exist at any given time". Why don't we count the space taken by each call on the call stack, as in the first example?
P.S. After reading similar questions, should I assume that a stack frame's space is reclaimed once we start moving back (or up) the recursion?
Unlike time complexity, which is simply the total time needed to run a program, space complexity describes the space required to execute the program. So it doesn't really matter that there are 2^n nodes in the execution tree of the program. The call stack automatically unwinds and releases the additional memory used. What matters is the maximal depth of the call tree, which is O(n) for this program. It should be noted, though, that recursion is a special case that naturally releases any used memory as the stack unwinds. If memory is allocated explicitly during runtime, it should be released explicitly as well.
Regarding the first example, the call tree is simply a list of depth n, resulting in similar complexity of O(n).
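To see the difference concretely, here is a small Java sketch that instruments the book's f with three counters. The names depth, maxDepth and calls are hypothetical additions for illustration, not part of the book's code: the sketch counts every call and tracks how many frames are on the stack at the same time.

```java
public class RecursionSpace {
    // Hypothetical counters, added only for illustration.
    static int depth = 0;     // frames currently on the call stack
    static int maxDepth = 0;  // deepest the stack ever got
    static long calls = 0;    // total number of calls made

    // The book's f(n), instrumented.
    static int f(int n) {
        calls++;
        depth++;
        maxDepth = Math.max(maxDepth, depth);
        int result;
        if (n <= 0) {
            result = 1;
        } else {
            // The second call only starts after the first one has returned,
            // so their frames are never on the stack at the same time.
            result = f(n - 1) + f(n - 1);
        }
        depth--;  // this frame is popped when we return
        return result;
    }

    static void reset() {
        depth = 0;
        maxDepth = 0;
        calls = 0;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 16; n += 5) {  // n = 1, 6, 11, 16
            reset();
            f(n);
            System.out.println("n=" + n + " calls=" + calls + " maxDepth=" + maxDepth);
        }
    }
}
```

Under these assumptions, calls comes out as 2^(n+1) - 1 while maxDepth is only n + 1 (the chain f(n), f(n-1), ..., f(0)), which is the O(2^n) time versus O(n) space split described above.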
Related
On pg. 44 of Cracking the Coding Interview, there is the following algo:
int f(int n) {
    if (n <= 1) {
        return 1;
    }
    return f(n - 1) + f(n - 1);
}
The book says that this has a time complexity of O(2^n) and a space complexity of O(n). I get the time complexity part, since there are O(2^n) nodes created. I don't understand why the space complexity is not also that. The book says it's because only O(n) nodes exist at any given time.
How can that be? Wouldn't the call-stack have all 2^n calls when we are at the bottom level of f(1)? What am I missing?
Please let me know if I can provide more detail.
Thanks,
No. The second call to f(n-1) doesn't take place until after the first one returns, so they do not occupy stack space at the same time. When the first one returns, its stack space is freed and may be reused for the second call.
The same applies at every level of the recursion. The memory used is proportional to the maximum depth of the call tree, not the overall number of nodes.
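The same argument can be written as a pair of recurrences. This is a sketch assuming each call does a constant amount of work c besides its recursive calls, and using the first question's base case n <= 0. Time adds up the cost of both calls; space only needs the larger of the two, because they are never on the stack at the same time:

```latex
\begin{aligned}
T(n) &= 2\,T(n-1) + c, \quad T(0) = c
  \;\Rightarrow\; T(n) = c\,(2^{n+1} - 1) = O(2^n) \\
S(n) &= \max\bigl(S(n-1),\, S(n-1)\bigr) + c = S(n-1) + c, \quad S(0) = c
  \;\Rightarrow\; S(n) = c\,(n + 1) = O(n)
\end{aligned}
```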
Why is it that when I see example code for binary search there is never an if statement to check if the start of the array or end is the target?
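import java.util.Arrays;

public class App {
    // Recursive binary search over arr[left..right] (inclusive bounds).
    // Returns the index of target, or -1 if it is not present.
    public static int binary_search(int[] arr, int left, int right, int target) {
        if (left > right) {
            return -1; // empty range: target not found
        }
        int mid = left + (right - left) / 2; // same as (left + right) / 2, but cannot overflow
        if (target == arr[mid]) {
            return mid;
        }
        if (target < arr[mid]) {
            return binary_search(arr, left, mid - 1, target); // search the left half
        }
        return binary_search(arr, mid + 1, right, target); // search the right half
    }

    public static void main(String[] args) {
        int[] arr = { 3, 2, 4, -1, 0, 1, 10, 20, 9, 7 };
        Arrays.sort(arr); // binary search requires sorted input
        for (int i = 0; i < arr.length; i++) {
            System.out.println("Index: " + i + " value: " + arr[i]);
        }
        // Search the whole array: left is the first index (0), right is the last index
        System.out.println(binary_search(arr, 0, arr.length - 1, -1));
    }
}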
In this example, if the target were -1 or 20, the search would still have to recurse. The code has an if statement checking whether the target is at mid, so why not add two more statements checking whether it is at left or right?
EDIT:
As pointed out in the comments, I may have misinterpreted the initial question. The answer below assumes that OP meant having the start/end checks as part of each step of the recursion, as opposed to checking once before the recursion even starts.
Since I don't know for sure which interpretation was intended, I'm leaving this post here for now.
Original post:
You seem to be under the impression that "they added an extra check for mid, so surely they should also add an extra check for start and end".
The check "Is mid the target?" is in fact not a mere optimization they added. Recursively checking "mid" is the whole point of a binary search.
When you have a sorted array of elements, a binary search works like this:
Compare the middle element to the target
If the middle element is smaller, throw away the first half
If the middle element is larger, throw away the second half
Otherwise, we found it!
Repeat until we either find the target or there are no more elements.
The act of checking the middle is fundamental to determining which half of the array to continue searching through.
Now, let's say we also add a check for start and end. What does this gain us? Well, if at any point the target happens to be at the very start or end of a segment, we skip a few steps and end slightly sooner. Is this a likely event?
For small toy examples with a few elements, yeah, maybe.
For a massive real-world dataset with billions of entries? Hm, let's think about it. For the sake of simplicity, we assume that we know the target is in the array.
We start with the whole array. Is the first element the target? The odds of that are one in a billion. Pretty unlikely. Is the last element the target? The odds of that are also one in a billion. Pretty unlikely too. You've wasted two extra comparisons to speed up an extremely unlikely case.
We limit ourselves to, say, the first half. We do the same thing again. Is the first element the target? Probably not since the odds are one in half a billion.
...and so on.
The bigger the dataset, the more useless the start/end "optimization" becomes. In fact, in terms of (maximally optimized) comparisons, each step of the algorithm has three comparisons instead of the usual one. VERY roughly estimated, that suggests that the algorithm on average becomes three times slower.
Even for smaller datasets, it is of dubious use since it basically becomes a quasi-linear search instead of a binary search. Yes, the odds are higher, but on average, we can expect a larger number of comparisons before we reach our target.
The whole point of a binary search is to reach the target with as few wasted comparisons as possible. Adding more unlikely-to-succeed comparisons is typically not the way to improve that.
Edit:
The implementation as posted by OP may also confuse the issue slightly: it makes two comparisons between target and arr[mid]. A more efficient implementation would instead make a single three-way comparison (i.e. determine ">", "=" or "<" in a single step instead of two separate ones). This is, for instance, how Java's compareTo or C++'s <=> normally works.
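A minimal sketch of that idea in Java, assuming a sorted int array: each step compares the target against arr[mid] exactly once via Integer.compare and then branches on the sign of the result. The class and method names are illustrative only.

```java
public class ThreeWaySearch {
    // Iterative binary search that makes one three-way comparison per step.
    // Returns the index of target in the sorted array arr, or -1 if absent.
    static int binarySearch(int[] arr, int target) {
        int left = 0;
        int right = arr.length - 1;
        while (left <= right) {
            int mid = left + (right - left) / 2;          // avoids overflow of left + right
            int cmp = Integer.compare(target, arr[mid]);  // negative, zero or positive
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                right = mid - 1;  // target can only be in the left half
            } else {
                left = mid + 1;   // target can only be in the right half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] arr = { -1, 0, 1, 2, 3, 4, 7, 9, 10, 20 };
        System.out.println(binarySearch(arr, -1));  // 0
        System.out.println(binarySearch(arr, 20));  // 9
        System.out.println(binarySearch(arr, 5));   // -1
    }
}
```

For plain ints the saving is small; the single comparison matters more when comparing elements is expensive, for example strings compared with compareTo.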
BambooleanLogic's answer is correct and comprehensive. I was curious about how much slower this 'optimization' made binary search, so I wrote a short script to test the change in how many comparisons are performed on average:
Given an array of integers 0, ... , N
do a binary search for every integer in the array,
and count the total number of array accesses made.
To be fair to the optimization, I made it so that after checking arr[left] against target, we increase left by 1, and similarly for right, so that every comparison is as useful as possible. You can try this yourself at Try it online
Results:
Binary search on size 10: Standard 29 Optimized 43 Ratio 1.4828
Binary search on size 100: Standard 580 Optimized 1180 Ratio 2.0345
Binary search on size 1000: Standard 8987 Optimized 21247 Ratio 2.3642
Binary search on size 10000: Standard 123631 Optimized 311205 Ratio 2.5172
Binary search on size 100000: Standard 1568946 Optimized 4108630 Ratio 2.6187
Binary search on size 1000000: Standard 18951445 Optimized 51068017 Ratio 2.6947
Binary search on size 10000000: Standard 223222809 Optimized 610154319 Ratio 2.7334
So the total number of comparisons does seem to tend toward three times the standard number, implying the optimization becomes increasingly unhelpful for larger arrays. I'd be curious whether the limiting ratio is exactly 3.
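A comparable experiment can be sketched in Java. This is not the answer's original script: the counting rule (one count per array element read) and the exact shape of the end-check variant are assumptions, so apart from the standard count for size 10 the numbers need not match the results above.

```java
public class EndCheckExperiment {
    static long reads = 0;  // number of array elements read

    // Reads arr[i] and counts the access.
    static int get(int[] arr, int i) {
        reads++;
        return arr[i];
    }

    // Standard iterative binary search: one array read per step.
    static int standard(int[] arr, int target) {
        int left = 0, right = arr.length - 1;
        while (left <= right) {
            int mid = left + (right - left) / 2;
            int v = get(arr, mid);
            if (target == v) return mid;
            if (target < v) right = mid - 1;
            else left = mid + 1;
        }
        return -1;
    }

    // Variant that also checks both ends of the current range on every step,
    // then shrinks the range by one on each side so no read is wasted twice.
    static int withEndChecks(int[] arr, int target) {
        int left = 0, right = arr.length - 1;
        while (left <= right) {
            if (target == get(arr, left)) return left;
            left++;
            if (left > right) break;
            if (target == get(arr, right)) return right;
            right--;
            if (left > right) break;
            int mid = left + (right - left) / 2;
            int v = get(arr, mid);
            if (target == v) return mid;
            if (target < v) right = mid - 1;
            else left = mid + 1;
        }
        return -1;
    }

    // Searches for every value 0..n-1 in the array {0, ..., n-1}; returns total reads.
    static long totalReads(int n, boolean endChecks) {
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) arr[i] = i;
        reads = 0;
        for (int t = 0; t < n; t++) {
            int idx = endChecks ? withEndChecks(arr, t) : standard(arr, t);
            if (idx != t) throw new IllegalStateException("search failed for " + t);
        }
        return reads;
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 1_000_000; n *= 10) {
            long std = totalReads(n, false);
            long opt = totalReads(n, true);
            System.out.printf("size %d: standard %d, with end checks %d, ratio %.4f%n",
                    n, std, opt, (double) opt / std);
        }
    }
}
```

With this counting rule the standard search on size 10 needs 29 reads, matching the first row of the results.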
Adding extra checks for start and end alongside the mid value doesn't gain much.
In any algorithm design, the main concern is its complexity, whether time complexity or space complexity. Most of the time, time complexity is considered the more important aspect.
It is more worthwhile to learn how binary search adapts to different use cases, such as:
If the array does not contain any repeated elements
If the array has repeated elements, in which case you may want to:
a) return the leftmost index/value
b) return the rightmost index/value
and many more.
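The leftmost and rightmost variants mentioned above can be sketched as follows, assuming a sorted int array that may contain duplicates. Both search a half-open range [left, right) and keep narrowing instead of stopping at the first match; the class and method names are illustrative.

```java
public class BoundarySearch {
    // Leftmost index i with arr[i] == target, or -1 if target is absent.
    static int leftmost(int[] arr, int target) {
        int left = 0, right = arr.length;  // half-open range [left, right)
        while (left < right) {
            int mid = left + (right - left) / 2;
            if (arr[mid] < target) {
                left = mid + 1;  // everything up to mid is too small
            } else {
                right = mid;     // mid could be the first match, so keep it
            }
        }
        // left is now the first index with arr[left] >= target
        return (left < arr.length && arr[left] == target) ? left : -1;
    }

    // Rightmost index i with arr[i] == target, or -1 if target is absent.
    static int rightmost(int[] arr, int target) {
        int left = 0, right = arr.length;
        while (left < right) {
            int mid = left + (right - left) / 2;
            if (arr[mid] <= target) {
                left = mid + 1;  // the last match is after mid, or is mid itself
            } else {
                right = mid;
            }
        }
        // left is now the first index with arr[left] > target
        return (left > 0 && arr[left - 1] == target) ? left - 1 : -1;
    }

    public static void main(String[] args) {
        int[] arr = { 1, 2, 2, 2, 3, 5 };
        System.out.println(leftmost(arr, 2));   // 1
        System.out.println(rightmost(arr, 2));  // 3
        System.out.println(leftmost(arr, 4));   // -1
    }
}
```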
Code:
int main()
{
    for (long long i = 0; i < 10000000; i++)
    {
    }
    return 0;
}
I asked this because I wanted to know whether an empty loop adds to the running time of a program. Say we do have a function within the loop, but it does not run on every iteration due to some condition:
Code:
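void func();                                    // some O(1) function, defined elsewhere
bool some_condition(long long i, long long j);  // hypothetical placeholder for "some condition"

int main()
{
    for (long long i = 0; i < 10000; i++)
    {
        for (long long j = 1; j < 10000; j++)  // inner counter named j so it does not shadow i
        {
            if (some_condition(i, j))
            {
                func(); // runs only about one-hundredth of the time due to the condition; func() is O(1)
            }
        }
    }
    return 0;
}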
Will the time complexity be O(N*N)?
Time-complexity is only meaningful in the context of variable-sized data-set; it describes how quickly the program's total execution time will increase as the size of the data-set increases. For example, if you have N items to process, and your algorithm needs to read each of those items a fixed number of times, then your algorithm is considered to be O(N).
In your first case, if we assume you have a "data set" whose current size is 10000000, then your single for-loop would be O(N) -- but note that since your for-loop doesn't have any observable effects, an optimizing compiler would probably just omit the loop entirely, reducing it to effectively O(1).
In your second (nested-loop) example (assuming the variable-set-size is 10000), the algorithm is O(N^2), because the number of steps the program has to run increases with the square of the set-size. That is true regardless of how often the internal if test evaluates to true, because the program will have to do some steps (such as evaluating the if condition) N*N times no matter how often (or rarely) the if-test evaluates to true. (Again, the exception would be if the compiler could somehow prove that the if statement never evaluates to true, or that the func() function had no observable side-effects, in which case it could legally omit the whole thing and just return 0 immediately.)
Your first code has a worst-case complexity of O(n), because it iterates n times. Regardless of whether it does nothing or a million things in each iteration, it is always of O(n) complexity. The optimizer is not guaranteed to remove the empty loop, so you cannot rely on it being skipped.
Similarly, your second program has a complexity of O(n^2) because it iterates n^2 times. The if condition inside may or may not be satisfied in some cases, and func() is not executed in the cases where it is not satisfied, but the program still visits n^2 cases, which is enough to establish an O(n^2) complexity.
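A quick way to check this is to count the work directly. The Java sketch below replaces "some condition" with a hypothetical stand-in, (i + j) % 100 == 0, which holds about one time in a hundred, and counts how often the condition is evaluated versus how often the body (standing in for func()) runs.

```java
public class NestedLoopCount {
    // Returns {condition evaluations, body executions} for an n-by-n nested loop.
    // The condition (i + j) % 100 == 0 is a hypothetical stand-in that is true
    // for roughly 1% of the (i, j) pairs.
    static long[] count(int n) {
        long conditionChecks = 0;
        long bodyRuns = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                conditionChecks++;
                if ((i + j) % 100 == 0) {
                    bodyRuns++;  // stands in for the O(1) func()
                }
            }
        }
        return new long[] { conditionChecks, bodyRuns };
    }

    public static void main(String[] args) {
        for (int n = 1000; n <= 4000; n *= 2) {
            long[] c = count(n);
            System.out.println("n=" + n + " checks=" + c[0] + " bodyRuns=" + c[1]);
        }
    }
}
```

The condition is evaluated n^2 times whatever its outcome; the body runs about n^2 / 100 times, and a constant factor like 1/100 does not change the O(n^2) bound.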
I was recently asked an interview question about testing the validity of a Sudoku board. A basic answer involves for loops. Essentially:
for(int x = 0; x != 9; ++x)
    for(int y = 0; y != 9; ++y)
        // ...
Do these nested for loops to check the rows. Do it again to check the columns. Do one more for the sub-squares, but that one is funkier because we're dividing the Sudoku board into sub-boards, so we end up with more than two nested loops, maybe three or four.
I was later asked the complexity of this code. Frankly, as far as I'm concerned, all the cells of the board are visited exactly three times so O(3n). To me, the fact that we have nested loops doesn't mean this code is automatically O(n^2) or even O(n^highest-nesting-level-of-loops). But I have suspicion that that's the answer the interviewer expected...
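The "every cell is visited three times" claim can be checked with a small Java sketch of such a validator. It assumes a 9x9 int board where 0 marks an empty cell, and adds a hypothetical visits counter for illustration. The box pass maps a single index k to a cell inside the box, so it needs only two nested loops.

```java
public class SudokuCheck {
    static int visits = 0;  // counts cell reads, for illustration only

    // Returns true if no digit 1-9 repeats in any row, column or 3x3 box.
    // Assumes every cell holds 0 (empty) or a digit 1-9.
    static boolean isValid(int[][] board) {
        visits = 0;
        for (int r = 0; r < 9; r++) {            // rows
            boolean[] seen = new boolean[10];
            for (int c = 0; c < 9; c++) {
                if (!mark(seen, board[r][c])) return false;
            }
        }
        for (int c = 0; c < 9; c++) {            // columns
            boolean[] seen = new boolean[10];
            for (int r = 0; r < 9; r++) {
                if (!mark(seen, board[r][c])) return false;
            }
        }
        for (int box = 0; box < 9; box++) {      // 3x3 boxes
            boolean[] seen = new boolean[10];
            for (int k = 0; k < 9; k++) {
                int r = (box / 3) * 3 + k / 3;
                int c = (box % 3) * 3 + k % 3;
                if (!mark(seen, board[r][c])) return false;
            }
        }
        return true;
    }

    // Records digit d as seen; returns false if it was already seen.
    private static boolean mark(boolean[] seen, int d) {
        visits++;
        if (d == 0) return true;  // empty cell
        if (seen[d]) return false;
        seen[d] = true;
        return true;
    }

    public static void main(String[] args) {
        // A filled, valid board built from a simple shifting pattern.
        int[][] board = new int[9][9];
        for (int r = 0; r < 9; r++)
            for (int c = 0; c < 9; c++)
                board[r][c] = (r * 3 + r / 3 + c) % 9 + 1;
        System.out.println(isValid(board) + " after " + visits + " cell visits");
    }
}
```

For a valid board this reads 3 * 81 = 243 cells: linear in the number of cells, or quadratic in the side length, depending on which quantity you call n.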
Posed another way, what is the complexity of these two pieces of code:
for(int i = 0; i != n; ++i)
    // ...
and:
for(int i = 0; i != sqrt(n); ++i)
    for(int j = 0; j != sqrt(n); ++j)
        // ...
Your general intuition is correct. Let's clarify a bit about Big-O notation:
Big-O gives you an upper bound on how the (time) complexity of your algorithm grows in relation to n, the size of your input; it is most commonly quoted for the worst case. In essence, it is a measurement of how the amount of work changes in relation to the size of the input.
When you say something like
all the cells of the board are visited exactly three times so O(3n).
you are implying that n (the size of your input) is the number of cells in the board, and therefore visiting all cells three times would indeed be an O(3n) (which is O(n)) operation. If this is the case, you would be correct.
However, when referring to Sudoku problems (or problems involving a grid in general), n is usually taken to be the number of cells in each row/column (an n x n board). In this case, the runtime complexity would be O(3n²) (which is indeed equal to O(n²)).
In the future, it is perfectly valid to ask your interviewer what n is.
As for the question in the title (Is a nested for loop automatically O(n^2)?) the short answer is no.
Consider this example:
for (int i = 0; i < n; i++) {
    for (int j = 1; j < n; j *= 2) {  // j doubles each time: 1, 2, 4, ...
        ... // some constant time operation
    }
}
The outer loop makes n iterations while the inner loop makes about log2(n) iterations, therefore the time complexity will be O(n log n).
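The iteration count is easy to confirm empirically. This Java sketch assumes the inner loop is meant to be `for (int j = 1; j < n; j *= 2)`, i.e. j starts at 1 and doubles each time, and simply counts how often the innermost body runs.

```java
public class LogInnerLoop {
    // Counts how many times the inner body runs for a given n.
    // Intended for moderate n; for n > 2^30, j *= 2 would overflow.
    static long count(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 1; j < n; j *= 2) {  // j = 1, 2, 4, ...: about log2(n) values
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (int n = 1 << 10; n <= 1 << 16; n <<= 2) {
            int log2 = 31 - Integer.numberOfLeadingZeros(n);  // exact for powers of two
            System.out.println("n=" + n + " steps=" + count(n) + " n*log2(n)=" + (long) n * log2);
        }
    }
}
```

For powers of two the two columns agree exactly, since the inner loop then runs log2(n) times per outer iteration.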
In your examples, in the first one you have a single for-loop making n iterations, therefore a complexity of (at least) O(n) (the operation is performed an order of n times).
In the second one you have two nested for loops, each making sqrt(n) iterations, therefore a total runtime complexity of (at least) O(n) as well. The second function isn't automatically O(n^2) simply because it contains a nested loop. The amount of operations being made is still of the same order (n), therefore these two examples have the same complexity, since we assume n is the same for both examples.
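Counting iterations makes the point concrete. In this Java sketch the square root is computed once as an int bound, and the loops use < instead of !=; both are choices made for the sketch, and the method names are illustrative.

```java
public class SameWorkDifferentShape {
    // Single loop: n iterations.
    static long single(int n) {
        long steps = 0;
        for (int i = 0; i < n; i++) {
            steps++;
        }
        return steps;
    }

    // Two nested loops of sqrt(n) iterations each: sqrt(n) * sqrt(n) = n
    // iterations when n is a perfect square (slightly fewer otherwise).
    static long nested(int n) {
        int root = (int) Math.sqrt(n);  // computed once, truncated to an int
        long steps = 0;
        for (int i = 0; i < root; i++) {
            for (int j = 0; j < root; j++) {
                steps++;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("single: " + single(n) + ", nested: " + nested(n));
    }
}
```

Both report 10000 steps for n = 10000: the nesting changes the shape of the loop, not the amount of work.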
This is the most crucial point to drive home. To compare the performance of two algorithms, you must use the same input to make the comparison. In your Sudoku problem you could have defined n in a few different ways, and the way you did would directly affect the complexity calculation of the problem, even if the amount of work is all the same.
*NOTE: this is unrelated to your question, but in the future avoid using != in loop conditions. In your second example, if sqrt(n) is not a whole number, the loop could run forever, depending on the language and how it is defined. It is therefore recommended to use < instead.
It depends on how you define the so-called N.
If the size of the board is N-by-N, then yes, the complexity is O(N^2).
But if you say the total number of cells is N (i.e., the board is sqrt(N)-by-sqrt(N)), then the complexity is O(N), or O(3N) if you mind the constant.
I have a question about calculating the expected running time of a given function. I understand just fine how to analyze code fragments with loops in them (for / while / if, etc.), but functions without them seem a bit odd to me. For example, let's say that we have the following code fragment:
public void Add(T item)
{
    var newArr = new T[this.arr.Length + 1];
    Array.Copy(this.arr, newArr, this.arr.Length);
    newArr[newArr.Length - 1] = item;
    this.arr = newArr;
}
If my logic works correctly, the function Add has a complexity of O(1), because in the best/worst/average case it will just read every line of code once, right?
You always have to consider the time complexity of the function calls, too. I don't know how Array.Copy is implemented, but I'm going to guess it's O(N), making the whole Add function O(N) as well. Your intuition is right, though - the rest of it is in fact O(1).
If an operation consists of multiple sub-operations, e.g. O(n) + O(log(n)), the costliest step determines the cost of the whole operation; by default, big O refers to the worst case. Here, because you copy the array, it is an O(n) operation.
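A Java analogue of the question's C# Add makes the hidden loop visible. This sketch uses Arrays.copyOf in place of Array.Copy and adds a hypothetical elementsCopied counter: each add copies every existing element, so n appends cost 0 + 1 + ... + (n - 1) = n(n - 1) / 2 copies in total.

```java
import java.util.Arrays;

public class GrowByOne {
    private int[] arr = new int[0];
    long elementsCopied = 0;  // instrumentation, for illustration only

    // Mirrors the question's Add: allocate length + 1, copy, then append.
    void add(int item) {
        int[] newArr = Arrays.copyOf(arr, arr.length + 1);  // copies arr.length elements: O(n)
        elementsCopied += arr.length;
        newArr[newArr.length - 1] = item;
        arr = newArr;
    }

    int size() {
        return arr.length;
    }

    public static void main(String[] args) {
        GrowByOne list = new GrowByOne();
        int n = 1000;
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        System.out.println(list.elementsCopied + " copies, expected " + (long) n * (n - 1) / 2);
    }
}
```

This is why growable arrays such as Java's ArrayList grow their capacity geometrically instead of by one element, which makes appends amortized O(1).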
Complexity is calculated following these 2 rules:
- Calling a method (complexity + 1)
- Encountering one of the following keywords: if, while, repeat, for, &&, ||, catch, case, etc. (complexity + 1)
In your case, given that you are copying an array and not a single value, the algorithm performs N copy operations, giving you an O(N) operation.