Is this an example of logarithmic time complexity? - time-complexity

It is commonly said that an algorithm with a logarithmic time complexity O(log n) is one where doubling the inputs does not necessarily double the amount of work that is required. And often times, search algorithms are given as an example of algorithms with logarithmic complexity.
With this in mind, let’s say I have an function that takes an array of strings as the first argument, as well as an individual string as the second argument, and returns the index of the string within the array:
function getArrayItemIndex(array, str) {
let i = 0
for(let item of array) {
if(item === str) {
return i
}
i++
}
}
And lets say that this function is called as follows:
getArrayItemIndex(['John', 'Jack', 'James', 'Jason'], 'Jack')
In this instance, the function will not end up stepping through the entire array before it returns the index of 1. And similarly, if we were to double the items in the array so that it ends up being called as follows:
getArrayItemIndex(
[
'John',
'Jack',
'James',
'Jason',
'Jerome',
'Jameson',
'Jamar',
'Jabar'
],
'John'
)
...then doubling the items in the array would not have necessarily caused the running time of the function to double, seeing that it would have broken out of the loop and returned after the very first iteration. Because of this, is it then accurate to say that the getArrayItemIndex function has a logarithmic time complexity?

Not quite. What you have here is Linear Search. Its worst-ccase performance is Theta(n) since it has to check all the elements if the search target isn't in the list. What you have discovered is that its best-case performance is Theta(1) since the algorithm only has to run a few checks if you get lucky.
Binary search on pre-sorted arrays is an example of an O(log n) worst-case algorithm (the best case is still O(1)). It works like this:
Check the middle element. If it matches, return. Otherwise, if the element is too big, perform binary search on the first half of the array. If it's too big, perform binary search on the second half. Continue until you find the target or you run out of new elements to check.
In Binary Search, we never look at all the elements. That is the difference.

Related

Binary search start or end is target

Why is it that when I see example code for binary search there is never an if statement to check if the start of the array or end is the target?
import java.util.Arrays;
public class App {
public static int binary_search(int[] arr, int left, int right, int target) {
if (left > right) {
return -1;
}
int mid = (left + right) / 2;
if (target == arr[mid]) {
return mid;
}
if (target < arr[mid]) {
return binary_search(arr, left, mid - 1, target);
}
return binary_search(arr, mid + 1, right, target);
}
public static void main(String[] args) {
int[] arr = { 3, 2, 4, -1, 0, 1, 10, 20, 9, 7 };
Arrays.sort(arr);
for (int i = 0; i < arr.length; i++) {
System.out.println("Index: " + i + " value: " + arr[i]);
}
System.out.println(binary_search(arr, arr[0], arr.length - 1, -1));
}
}
in this example if the target was -1 or 20 the search would enter recursion. But it added an if statement to check if target is mid, so why not add two more statements also checking if its left or right?
EDIT:
As pointed out in the comments, I may have misinterpreted the initial question. The answer below assumes that OP meant having the start/end checks as part of each step of the recursion, as opposed to checking once before the recursion even starts.
Since I don't know for sure which interpretation was intended, I'm leaving this post here for now.
Original post:
You seem to be under the impression that "they added an extra check for mid, so surely they should also add an extra check for start and end".
The check "Is mid the target?" is in fact not a mere optimization they added. Recursively checking "mid" is the whole point of a binary search.
When you have a sorted array of elements, a binary search works like this:
Compare the middle element to the target
If the middle element is smaller, throw away the first half
If the middle element is larger, throw away the second half
Otherwise, we found it!
Repeat until we either find the target or there are no more elements.
The act of checking the middle is fundamental to determining which half of the array to continue searching through.
Now, let's say we also add a check for start and end. What does this gain us? Well, if at any point the target happens to be at the very start or end of a segment, we skip a few steps and end slightly sooner. Is this a likely event?
For small toy examples with a few elements, yeah, maybe.
For a massive real-world dataset with billions of entries? Hm, let's think about it. For the sake of simplicity, we assume that we know the target is in the array.
We start with the whole array. Is the first element the target? The odds of that is one in a billion. Pretty unlikely. Is the last element the target? The odds of that is also one in a billion. Pretty unlikely too. You've wasted two extra comparisons to speed up an extremely unlikely case.
We limit ourselves to, say, the first half. We do the same thing again. Is the first element the target? Probably not since the odds are one in half a billion.
...and so on.
The bigger the dataset, the more useless the start/end "optimization" becomes. In fact, in terms of (maximally optimized) comparisons, each step of the algorithm has three comparisons instead of the usual one. VERY roughly estimated, that suggests that the algorithm on average becomes three times slower.
Even for smaller datasets, it is of dubious use since it basically becomes a quasi-linear search instead of a binary search. Yes, the odds are higher, but on average, we can expect a larger amount of comparisons before we reach our target.
The whole point of a binary search is to reach the target with as few wasted comparisons as possible. Adding more unlikely-to-succeed comparisons is typically not the way to improve that.
Edit:
The implementation as posted by OP may also confuse the issue slightly. The implementation chooses to make two comparisons between target and mid. A more optimal implementation would instead make a single three-way comparison (i.e. determine ">", "=" or "<" as a single step instead of two separate ones). This is, for instance, how Java's compareTo or C++'s <=> normally works.
BambooleanLogic's answer is correct and comprehensive. I was curious about how much slower this 'optimization' made binary search, so I wrote a short script to test the change in how many comparisons are performed on average:
Given an array of integers 0, ... , N
do a binary search for every integer in the array,
and count the total number of array accesses made.
To be fair to the optimization, I made it so that after checking arr[left] against target, we increase left by 1, and similarly for right, so that every comparison is as useful as possible. You can try this yourself at Try it online
Results:
Binary search on size 10: Standard 29 Optimized 43 Ratio 1.4828
Binary search on size 100: Standard 580 Optimized 1180 Ratio 2.0345
Binary search on size 1000: Standard 8987 Optimized 21247 Ratio 2.3642
Binary search on size 10000: Standard 123631 Optimized 311205 Ratio 2.5172
Binary search on size 100000: Standard 1568946 Optimized 4108630 Ratio 2.6187
Binary search on size 1000000: Standard 18951445 Optimized 51068017 Ratio 2.6947
Binary search on size 10000000: Standard 223222809 Optimized 610154319 Ratio 2.7334
so the total comparisons does seem to tend to triple the standard number, implying the optimization becomes increasingly unhelpful for larger arrays. I'd be curious whether the limiting ratio is exactly 3.
To add some extra check for start and end along with the mid value is not impressive.
In any algorithm design the main concerned is moving around it's complexity either it is time complexity or space complexity. Most of the time the time complexity is taken as more important aspect.
To learn more about Binary Search Algorithm in different use case like -
If Array is not containing any repeated
If Array has repeated element in this case -
a) return leftmost index/value
b) return rightmost index/value
and many more point

Determine if List of Strings contains substring of all Strings in other list

I have this situation:
val a = listOf("wwfooww", "qqbarooo", "ttbazi")
val b = listOf("foo", "bar")
I want to determine if all items of b are contained in substrings of a, so the desired function should return true in the situation above. The best I can come up with is this:
return a.any { it.contains("foo") } && a.any { it.contains("bar") }
But it iterates over a twice. a.containsAll(b) doesn't work either because it compares on string equality and not substrings.
Not sure if there is any way of doing that without iterating over a the same amount as b.size. Because if you only want 1 iteration of a, you will have to check all the elements on b and now you are iterating over b a.size times plus, in this scenario, you also need to keep track of which item in b already had a match, and not check them again, which might be worse than just iterating over a, since you can only do that by either removing them from the list (or a copy, which you use instead of b), or by using another list to keep track of the matches, then compare that to the original b.
So I think that you are on the right track with your code there, but there are some issues. For example you don't have any reference to b, just hardcoded strings, and doing it like that for all elements in b will result in quite a big function if you have more than 2, or better yet, if you don't already know the values.
This code will do the same thing as the one you put above, but it will actually use elements from b, and not hardcoded strings that match b. (it will iterate over b b.size times, and partially over a b.size times)
return b.all { bItem ->
a.any { it.contains(bItem) }
}
Alex's answer is by far the simplest approach, and is almost certainly the best one in most circumstances.
However, it has complexity A*B (where A and B are the sizes of the two lists) — which means that it doesn't scale: if both lists get big, it'll get very slow.
So for completeness, here's a way that's more involved, and slower for the small cases, but has complexity proportional to A+B and so can cope efficiently with much larger lists.
The idea is to preprocess the a list, to generate a set of all the possible substrings, and then scan through the b list just checking for inclusion in that set.  (The preprocessing step takes time proportional* to A.  Converting the substrings into a set means that it can check whether a string is present in constant time, using its hash code; so the rest then takes time proportional to B.)
I think this is clearest using a helper function:
/**
* Generates a list of all possible substrings, including
* the string itself (but excluding the empty string).
*/
fun String.substrings()
= indices.flatMap { start ->
((start + 1)..length).map { end ->
substring(start, end)
}
}
For example, "1234".substrings() gives [1, 12, 123, 1234, 2, 23, 234, 3, 34, 4].
Then we can generate the set of all substrings of items from a, and check that every item of b is in it:
return a.flatMap{ it.substrings() }
.toSet()
.containsAll(b)
(* Actually, the complexity is also affected by the lengths of the strings in the a list.  Alex's version is directly proportional to the average length, while the preprocessing part of the algorithm above is proportional to its square (as indicated by the map nested in the flatMap).  That's not good, of course; but in practice while the lists are likely to get longer, the strings within them probably won't, so that's unlikely to be significant.  Worth knowing about, though.
And there are probably other, still more complex algorithms, that scale even better…)

How to effectively get the N lowest values from the collection (Top N) in Kotlin?

How to effectively get the N lowest values from the collection (Top N) in Kotlin?
Is there any other way besides collectionOrSequence.sortedby{it.value}.take(n)?
Assume I have a collection with +100500 elements and I need to found 10 lowest. I'm afraid that the sortedby will create new temporary collection which later will take only 10 items.
You could keep a list of the n smallest elements and just update it on demand, e.g.
fun <T : Comparable<T>> top(n: Int, collection: Iterable<T>): List<T> {
return collection.fold(ArrayList<T>()) { topList, candidate ->
if (topList.size < n || candidate < topList.last()) {
// ideally insert at the right place
topList.add(candidate)
topList.sort()
// trim to size
if (topList.size > n)
topList.removeAt(n)
}
topList
}
}
That way you only compare the current element of your list once to the largest element of the top n elements which would usually be faster than sorting the entire list https://pl.kotl.in/SyQPtDTcQ
If you're running on the JVM, you could use Guava's Comparators.least(int, Comparator), which uses a more efficient algorithm than any of these suggestions, taking O(n + k log k) time and O(k) memory to find the lowest k elements in a collection of size n, as opposed to zapl's algorithm (O(nk log k)) or Lior's (O(nk)).
You have more to worry about.
collectionOrSequence.sortedby{it.value} runs java.util.Arrays.sort, that will run timSort (or mergeSort if requested).
timSort is great, but usually ends by n*log(n) operations, which is much more than the O(n) of copying the array.
Each of the O(n*log.n) operations will run a function (the lambda you provided, {it.value}) --> an additional meaningful overhead.
Lastly, java.util.Arrays.sort will convert the collection to Array and back to a List - 2 additional conversions (which you wanted to avoid, but this is secondary)
The efficient way to do it is probably:
map the values for comparison into a list: O(n) conversions (once per element) rather than O(n*log.n) or more.
Iterate over the list (or Array) created to collect the N smallest elements in one pass
Keep a list of N smallest elements found so far and their index on the original list. If it is small (e.g. 10 items) - mutableList is a good fit.
Keep a variable holding the max value for the small element list.
When iterating over the original collection, compare the current element on the original list against the max value of the small values list. If smaller than it - replace it in the "small list" and find the updated max value in it.
Use the indexes from the "small list" to extract the 10 smallest elements of the original list.
That would allow you to go from O(n*log.n) to O(n).
Of course, if time is critical - it is always best to benchmark the specific case.
If you managed, on the first step, to extract primitives for the basis of comparison (e.g. int or long) - that would be even more efficient.
I suggest implementing your own sort method based on a typical quickSort algorithm(in descending order, and take the first N elements), if the collection has 1k+ values spread randomly.

The time complexity of the map function

I heard the calculation amount of map function is O(1).
But I can't understand the reason.
If I understand your question correctly, O(1) is the complexity of accesing one item. Array.map() in JS passes the function the current value and iterates through all of them, and takes the return value of the function and inserts it into the new array.
Therefore, the function loops through every object in the array, having a complexity of O(n).
For example:
[1, 2, 3].map(function (item) { return item + 1; });
Said function takes one item at a time, accessing the array n times (3).
EDIT: Looks like I misunderstood your question, my bad.
The inbuilt map method shares the input iterable across the CPU cores. For an iterable of size n, the average running time would be Θ(n/num_cpu_cores)

Calculating the expected running time of function

I have a question about calculating the expected running time of a given function. I understand just fine, how to calculate code fragments with cycles in them (for / while / if , etc.) but functions without them seems a bit odd to me. For example, lets say that we have the following code fragment:
public void Add(T item)
{
var newArr = new T[this.arr.Length + 1];
Array.Copy(this.arr, newArr, this.arr.Length);
newArr[newArr.Length - 1] = item;
this.arr = newArr;
}
If my logic works correctly, the function Add has a complexity of O(1), because in the best/worst/average case it will just read every line of code once, right?
You always have to consider the time complexity of the function calls, too. I don't know how Array.Copy is implemented, but I'm going to guess it's O(N), making the whole Add function O(N) as well. Your intuition is right, though - the rest of it is in fact O(1).
If you have multiple sub-operations with O(n) + O(log(n)) etc and the costliest step is the cost of the whole operation - by default big O refers to the worst case. Here as you copy the array, it is an O(n) operation
Complexity is calculated following this 2 rules :
-Calling a method (complexity+ 1)
-Encountering the following keywords : if, while, repeat, for, &&, ||, catch, case, etc … (complexity+ 1)
In your case , given you are trying to copy an array and not a single value , the algorithm will complete N copy operations giving you an O(N) operation.