Calculating the expected running time of function - time-complexity

I have a question about calculating the expected running time of a given function. I understand how to analyze code fragments that contain loops or branches (for / while / if, etc.), but functions without them seem a bit odd to me. For example, let's say that we have the following code fragment:
public void Add(T item)
{
    var newArr = new T[this.arr.Length + 1];
    Array.Copy(this.arr, newArr, this.arr.Length);
    newArr[newArr.Length - 1] = item;
    this.arr = newArr;
}
If my logic is correct, the function Add has a complexity of O(1), because in the best/worst/average case it just executes every line of code once, right?

You always have to consider the time complexity of the function calls, too. I don't know how Array.Copy is implemented, but I'm going to guess it's O(N), making the whole Add function O(N) as well. Your intuition is right, though - the rest of it is in fact O(1).
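To make the hidden cost visible, here is a minimal Python sketch of the same append-by-copy pattern. The class name `CopyOnAddList` and its `copies` counter are hypothetical, introduced only for illustration, and the explicit loop stands in for `Array.Copy` on the assumption that it copies element by element.

```python
class CopyOnAddList:
    """Hypothetical list that reallocates and copies on every add."""

    def __init__(self):
        self.arr = []
        self.copies = 0  # total element copies performed so far

    def add(self, item):
        new_arr = [None] * (len(self.arr) + 1)
        # Stand-in for Array.Copy: one copy per existing element, so O(n) work.
        for i, value in enumerate(self.arr):
            new_arr[i] = value
            self.copies += 1
        new_arr[-1] = item  # O(1)
        self.arr = new_arr  # O(1)


items = CopyOnAddList()
for x in range(5):
    items.add(x)
print(items.arr, items.copies)
```

Each call copies as many elements as the list already holds, so a single add is O(n), and n consecutive adds cost 0 + 1 + ... + (n-1) copies in total.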

If you have multiple sub-operations, for example O(n) + O(log(n)), the costliest step determines the cost of the whole operation. By default, big O is usually quoted for the worst case. Here, since you copy the array, it is an O(n) operation.

A simple complexity score can be calculated following these 2 rules:
-Calling a method (complexity + 1)
-Encountering one of the following keywords: if, while, repeat, for, &&, ||, catch, case, etc. (complexity + 1)
Such a score counts calls and branch points in the source, not running time. In your case, given that you are copying an array and not a single value, the algorithm will complete N copy operations, giving you an O(N) operation.

Related

calculate time complexity for this solution

I have the following code. I think the solution is O(n^2) because it has a nested loop, according to the attached image. Can anyone confirm?
function sortSmallestToLargest(data):
    sorted_data={}
    while data is not empty:
        smallest_data=data[0]
        foreach i in data:
            if (i < smallest_data):
                smallest_data= i
        sorted_data.add(smallest_data)
        data.remove(smallest_data)
    return sorted_data
Ok, now I see what you are doing! Yes, you are right, it's O(n²), because the inner loop always goes through every remaining element of data. The data shrinks by one element per outer iteration, so the inner loop body runs n + (n-1) + ... + 1 = n(n+1)/2 times in total. Since big O ignores constant factors and lower-order terms, that is O(n²).
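As a rough check of that count, the pseudocode from the question can be sketched in Python with a comparison counter. The function name and the counter are illustrative additions, and a list is assumed for `sorted_data` so that insertion order is kept.

```python
def sort_smallest_to_largest(data):
    """Selection-style sort from the question; returns (sorted list, comparisons)."""
    data = list(data)  # work on a copy so the caller's list is untouched
    sorted_data = []
    comparisons = 0
    while data:
        smallest = data[0]
        for item in data:  # inner loop scans everything still unsorted
            comparisons += 1
            if item < smallest:
                smallest = item
        sorted_data.append(smallest)
        data.remove(smallest)  # itself a linear scan, not counted here
    return sorted_data, comparisons


result, comparisons = sort_smallest_to_largest([3, 2, 4, -1, 0, 1, 10, 20, 9, 7])
print(result, comparisons)
```

For 10 elements the inner loop body runs 10 + 9 + ... + 1 = 55 times, which is n(n+1)/2 and therefore grows quadratically.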

Binary search start or end is target

Why is it that when I see example code for binary search there is never an if statement to check if the start of the array or end is the target?
import java.util.Arrays;

public class App {
    public static int binary_search(int[] arr, int left, int right, int target) {
        if (left > right) {
            return -1;
        }
        int mid = (left + right) / 2;
        if (target == arr[mid]) {
            return mid;
        }
        if (target < arr[mid]) {
            return binary_search(arr, left, mid - 1, target);
        }
        return binary_search(arr, mid + 1, right, target);
    }

    public static void main(String[] args) {
        int[] arr = { 3, 2, 4, -1, 0, 1, 10, 20, 9, 7 };
        Arrays.sort(arr);
        for (int i = 0; i < arr.length; i++) {
            System.out.println("Index: " + i + " value: " + arr[i]);
        }
        // left must be the first index (0), not the first value arr[0]
        System.out.println(binary_search(arr, 0, arr.length - 1, -1));
    }
}
In this example, if the target were -1 or 20, the search would still enter the recursion. But the code has an if statement to check whether the target is at mid, so why not add two more statements checking whether it is at left or right?
EDIT:
As pointed out in the comments, I may have misinterpreted the initial question. The answer below assumes that OP meant having the start/end checks as part of each step of the recursion, as opposed to checking once before the recursion even starts.
Since I don't know for sure which interpretation was intended, I'm leaving this post here for now.
Original post:
You seem to be under the impression that "they added an extra check for mid, so surely they should also add an extra check for start and end".
The check "Is mid the target?" is in fact not a mere optimization they added. Recursively checking "mid" is the whole point of a binary search.
When you have a sorted array of elements, a binary search works like this:
1. Compare the middle element to the target
2. If the middle element is smaller, throw away the first half
3. If the middle element is larger, throw away the second half
4. Otherwise, we found it!
5. Repeat until we either find the target or there are no more elements.
The act of checking the middle is fundamental to determining which half of the array to continue searching through.
Now, let's say we also add a check for start and end. What does this gain us? Well, if at any point the target happens to be at the very start or end of a segment, we skip a few steps and end slightly sooner. Is this a likely event?
For small toy examples with a few elements, yeah, maybe.
For a massive real-world dataset with billions of entries? Hm, let's think about it. For the sake of simplicity, we assume that we know the target is in the array.
We start with the whole array. Is the first element the target? The odds of that are one in a billion. Pretty unlikely. Is the last element the target? The odds of that are also one in a billion. Pretty unlikely too. You've wasted two extra comparisons to speed up an extremely unlikely case.
We limit ourselves to, say, the first half. We do the same thing again. Is the first element the target? Probably not since the odds are one in half a billion.
...and so on.
The bigger the dataset, the more useless the start/end "optimization" becomes. In fact, in terms of (maximally optimized) comparisons, each step of the algorithm has three comparisons instead of the usual one. VERY roughly estimated, that suggests that the algorithm on average becomes three times slower.
Even for smaller datasets, it is of dubious use since it basically becomes a quasi-linear search instead of a binary search. Yes, the odds of an early hit are higher, but on average, we can expect a larger number of comparisons before we reach our target.
The whole point of a binary search is to reach the target with as few wasted comparisons as possible. Adding more unlikely-to-succeed comparisons is typically not the way to improve that.
Edit:
The implementation as posted by OP may also confuse the issue slightly. The implementation chooses to make two comparisons between target and mid. A more optimal implementation would instead make a single three-way comparison (i.e. determine ">", "=" or "<" as a single step instead of two separate ones). This is, for instance, how Java's compareTo or C++'s <=> normally works.
BambooleanLogic's answer is correct and comprehensive. I was curious about how much slower this 'optimization' made binary search, so I wrote a short script to test the change in how many comparisons are performed on average:
Given an array of integers 0, ... , N
do a binary search for every integer in the array,
and count the total number of array accesses made.
To be fair to the optimization, I made it so that after checking arr[left] against target, we increase left by 1, and similarly for right, so that every comparison is as useful as possible. You can try this yourself at Try it online
Results:
Binary search on size 10: Standard 29 Optimized 43 Ratio 1.4828
Binary search on size 100: Standard 580 Optimized 1180 Ratio 2.0345
Binary search on size 1000: Standard 8987 Optimized 21247 Ratio 2.3642
Binary search on size 10000: Standard 123631 Optimized 311205 Ratio 2.5172
Binary search on size 100000: Standard 1568946 Optimized 4108630 Ratio 2.6187
Binary search on size 1000000: Standard 18951445 Optimized 51068017 Ratio 2.6947
Binary search on size 10000000: Standard 223222809 Optimized 610154319 Ratio 2.7334
So the total number of comparisons does seem to tend towards triple the standard number, implying that the optimization becomes increasingly unhelpful for larger arrays. I'd be curious whether the limiting ratio is exactly 3.
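The original script is not shown here, so the following is only a minimal Python reconstruction of the experiment as described: search for every integer in a sorted array 0, ..., n-1 and count array accesses. The function names are hypothetical, and the exact order of the endpoint checks is an assumption. For size 10 it gives 29 and 43, the same as the first row of the results; whether the larger rows match depends on details of the original that are not given.

```python
def standard_accesses(arr, target):
    """Count array accesses of a plain binary search (one read per step)."""
    lo, hi, accesses = 0, len(arr) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        accesses += 1
        value = arr[mid]
        if value == target:
            return accesses
        if value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return accesses


def endpoint_accesses(arr, target):
    """Same search, but each step checks arr[lo] and arr[hi] before arr[mid]."""
    lo, hi, accesses = 0, len(arr) - 1, 0
    while lo <= hi:
        accesses += 1
        if arr[lo] == target:
            return accesses
        lo += 1  # the left end is ruled out, so shrink past it
        if lo > hi:
            break
        accesses += 1
        if arr[hi] == target:
            return accesses
        hi -= 1  # likewise for the right end
        if lo > hi:
            break
        mid = (lo + hi) // 2
        accesses += 1
        value = arr[mid]
        if value == target:
            return accesses
        if value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return accesses


def totals(n):
    """Total accesses over all targets for both variants on the array 0..n-1."""
    arr = list(range(n))
    standard = sum(standard_accesses(arr, t) for t in arr)
    endpoint = sum(endpoint_accesses(arr, t) for t in arr)
    return standard, endpoint


for n in (10, 100, 1000):
    standard, endpoint = totals(n)
    print(n, standard, endpoint, round(endpoint / standard, 4))
```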
Adding extra checks for start and end alongside the mid check does not gain much.
In any algorithm design, the main concern is its complexity, whether time complexity or space complexity. Most of the time, time complexity is treated as the more important aspect.
It is also worth learning how binary search behaves in different use cases, such as:
If the array does not contain any repeated elements
If the array has repeated elements, in which case either:
a) return the leftmost index/value
b) return the rightmost index/value
and many more.

What is the time complexity of below function?

I was reading a book about competitive programming and came across a problem where we have to count all possible paths in an n*n matrix.
The conditions are:
1. All cells must be visited exactly once (no cell may be left unvisited or visited more than once)
2. The path should start from (1,1) and end at (n,n)
3. Possible moves are right, left, up, down from the current cell
4. You cannot go out of the grid
This is my code for the problem:
#include <vector>
using namespace std;

typedef long long ll;

// Counts paths from (r,c) to (n-1,n-1) that visit every cell exactly once.
ll path_count(ll n,vector<vector<bool>>& done,ll r,ll c){
    ll count=0;
    done[r][c] = true;
    if(r==(n-1) && c==(n-1)){
        // Reached the target: the path only counts if no cell is unvisited.
        for(ll i=0;i<n;i++){
            for(ll j=0;j<n;j++) if(!done[i][j]) {
                done[r][c]=false;
                return 0;
            }
        }
        count++;
    }
    else {
        if((r+1)<n && !done[r+1][c]) count+=path_count(n,done,r+1,c);
        if((r-1)>=0 && !done[r-1][c]) count+=path_count(n,done,r-1,c);
        if((c+1)<n && !done[r][c+1]) count+=path_count(n,done,r,c+1);
        if((c-1)>=0 && !done[r][c-1]) count+=path_count(n,done,r,c-1);
    }
    done[r][c] = false;  // backtrack
    return count;
}
Here, if we define a recurrence relation, it could look like: T(n) = 4T(n-1) + n^2
Is this recurrence relation correct? I don't think so, because if we use the master theorem it gives O(4^n * n^2), and I don't think the running time can be of this order.
The reason I say this is that when I run it for a 7*7 matrix it takes around 110.09 seconds, and I don't think O(4^n * n^2) should take that much time for n=7.
If we calculate it for n=7, the approximate instruction count is 4^7 * 7^2 = 802816 ~ 10^6. For that many instructions it should not take that much time. So I conclude that my recurrence relation is wrong.
This code outputs 111712 for n=7, which is the same as the book's output. So the code is right.
So what is the correct time complexity?
No, the complexity is not O(4^n * n^2).
Consider the 4^n in your notation. It means going to a depth of at most n (7 in your case) and having 4 choices at each level. But this is not the case: at the 8th level you still have multiple choices of where to go next. In fact, you keep branching until you complete a path, which has depth n^2.
So, a non-tight bound gives us O(4^(n^2) * n^2). This bound, however, is far from tight, as it assumes you have 4 valid choices in each of your recursive calls. This is not the case.
I am not sure how much tighter it can be, but a first attempt will drop it to O(3^(n^2) * n^2), since you cannot go from the node you came from. This bound is still far from optimal.
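To get a feel for how quickly the search tree grows, here is a minimal Python sketch of the same backtracking that also counts recursive calls. It is an illustration rather than a port: the names are hypothetical, and it tracks the number of visited cells in a counter instead of scanning the whole grid at the end cell, which is an assumption made to keep the sketch short.

```python
def count_paths(n):
    """Return (paths, calls) for corner-to-corner paths that visit every cell once."""
    done = [[False] * n for _ in range(n)]
    calls = 0

    def visit(r, c, visited):
        nonlocal calls
        calls += 1
        if r == n - 1 and c == n - 1:
            # Reached the end cell: valid only if every cell has been visited.
            return 1 if visited == n * n else 0
        done[r][c] = True
        total = 0
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and not done[nr][nc]:
                total += visit(nr, nc, visited + 1)
        done[r][c] = False  # backtrack
        return total

    return visit(0, 0, 1), calls


for n in (1, 2, 3, 4):
    paths, calls = count_paths(n)
    print(n, paths, calls)
```

The call count depends on the grid area n^2 rather than on n alone, which is why a bound with exponent n badly underestimates the work. Pure Python will be far slower than the C++ version for n=7, so the loop stops at small sizes.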

The time complexity of the map function

I heard that the computational cost of the map function is O(1).
But I can't understand the reason.
If I understand your question correctly, O(1) is the complexity of accessing one item. Array.map() in JS iterates through all items, passes the current value to the function, takes the return value of the function, and inserts it into the new array.
Therefore, map loops through every object in the array, giving it a complexity of O(n).
For example:
[1, 2, 3].map(function (item) { return item + 1; });
Said function takes one item at a time, so the array is accessed n times (here n = 3).
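The same point can be checked by counting how often the callback runs. This is a small sketch using Python's built-in `map` rather than JavaScript's `Array.map()`, and the counter `calls` is an illustrative addition. Python's `map` is lazy, so `list()` is used to force the whole iteration.

```python
calls = 0


def add_one(item):
    """Callback that also counts how often it is invoked."""
    global calls
    calls += 1
    return item + 1


result = list(map(add_one, [1, 2, 3]))
print(result, calls)
```

The callback runs once per element, so the total work grows linearly with the length of the input.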
EDIT: Looks like I misunderstood your question, my bad.
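A parallel map implementation that shares the input iterable across the CPU cores would have an average running time of Θ(n/num_cpu_cores) for an iterable of size n, which is still linear in n for a fixed number of cores. Note that the built-in Array.map() in JavaScript runs sequentially.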

Does initialising an auxiliary array to 0 count as n time complexity already?

I'm very new to big O complexity and I was wondering: if an algorithm is given an array and initialises an auxiliary array with the same number of indexes, does that already count as O(n) time, or do you just assume it is O(1), or nothing at all?
TL;DR: Ignore it
Long answer: This will depend on the rest of your algorithm as well as what you want to achieve. Typically you will do something useful with the array afterwards which does have at least the same time complexity as filling the array, so that array-filling does not contribute to the time complexity. Furthermore filling an array with 0 feels like something you do to initialize the array, so your "real" algorithm can work properly. But nevertheless there are some cases you could consider.
Please note that I use pseudocode in the following examples, I hope it's clear what the algorithm should do. Also note that all the examples don't do anything useful with the array. It's just to show my point.
Let's say you have the following code:
A = Array[n]
for(i=0, i<n, i++)
    A[i] = 0
print "Hello World"
Then obviously the runtime of your algorithm is highly dependent on the value of n and thus should be counted as linear complexity, O(n).
On the other hand, if you have a much more complicated function, say this one:
A = Array[n]
for(i=0, i<n, i++)
    A[i] = 0
for(i=0, i<n, i++)
    for(j=n-1, j>=0, j--)
        print "Hello World"
Then even if you take the complexity of filling the array into account, you will end with complexity of O(n^2+2n) which is equal to the class O(n^2), so it does not matter in this case.
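A minimal Python sketch of this second example, with counters in place of the actual printing, shows how small the fill loop's share becomes. The function name and the counters are illustrative assumptions, not part of the original pseudocode.

```python
def count_steps(n):
    """Return (fill_steps, nested_steps) for the fill loop and the nested loop."""
    a = [None] * n
    fill_steps = 0
    for i in range(n):
        a[i] = 0
        fill_steps += 1
    nested_steps = 0
    for i in range(n):
        for j in range(n - 1, -1, -1):
            nested_steps += 1  # stands in for print "Hello World"
    return fill_steps, nested_steps


for n in (10, 100, 1000):
    fill, nested = count_steps(n)
    print(n, fill, nested, fill / (fill + nested))
```

The last column is the fraction of all steps spent filling the array. It shrinks as n grows, which is why the linear term can be dropped from the big O class.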
The most interesting case is surely when you have different options to use as basic operation. Say we have the following code (someFunction being an arbitrary function):
A = Array[n*n]
for(i=0, i<n*n, i++)
    A[i] = 0
for(i=0, i*i<n, i++)
    someFunction(i)
Now it depends on what you choose as the basic operation. Which one you choose is highly dependent on what you want to achieve. Let's say someFunction is a very cheap function (regarding time complexity) and accessing the array A is more expensive. Then you would probably go with O(n^2), since the array is accessed n^2 times. If, on the other hand, someFunction is expensive compared to filling the array, you would probably choose it as the basic operation and go with O(sqrt(n)).
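The two candidate basic operations can be counted separately, as in this small Python sketch. The function name is hypothetical and a counter increment stands in for the call to someFunction.

```python
def count_operations(n):
    """Return (array_writes, function_calls) for the third example."""
    a = [None] * (n * n)
    array_writes = 0
    for i in range(n * n):
        a[i] = 0
        array_writes += 1
    function_calls = 0
    i = 0
    while i * i < n:
        function_calls += 1  # stands in for someFunction(i)
        i += 1
    return array_writes, function_calls


for n in (16, 100, 400):
    print(n, count_operations(n))
```

For n = 100 this counts 10000 array writes but only 10 calls, so which count describes the cost depends entirely on which operation is chosen as the unit.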
Please be aware that one could also come to the conclusion that, since the first part (array-filling) is executed more often than the other part (someFunction), it does not matter which of the operations takes longer to finish, because at some point the array-filling will need more time. Thus you could argue that the complexity has to be quadratic, O(n^2). This may be right from a theoretical view. But in real life you usually have one operation you want to count and don't care about the other operations.
Actually you could consider ignoring the array filling as well as taking it into account in all the examples I provided above, depending whether print or accessing the array is more expensive. But I hope in the first two examples it is obvious which one will add more runtime and thus should be considered as the basic operation.