How to calculate the worst case time for binary search for a key that appears twice in the sorted array?

What would be the worst-case time complexity for finding a key that appears twice in a sorted array using binary search? I know that the worst-case time complexity of binary search on a sorted array is O(log n). So, if the key appears more than once, the time complexity should be less than O(log n). However, I am not sure how to calculate this.

In the worst case, binary search needs to perform ⌊log_2(n)⌋ + 1 iterations to find the element or to conclude that it is not in the array.
With a duplicate, you might need just one step fewer.
For instance, suppose the duplicate elements are at the first and second indices of the array (the same applies if they are at the last and second-to-last indices).
In that case you would need ⌊log_2(n)⌋ comparisons, so the worst-case time complexity is still O(log n).
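A minimal Python sketch, assuming a standard three-way binary search with mid = (lo + hi) // 2 (the helper names below are hypothetical), that counts loop iterations so you can compare a key stored once with a key stored twice:

```python
def binary_search_steps(arr, key):
    """Standard binary search on an ascending list.

    Returns (index, iterations): the index of *some* occurrence of key, or -1
    if it is absent, and how many times the loop body ran.
    """
    lo, hi = 0, len(arr) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if arr[mid] == key:
            return mid, iterations
        if arr[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, iterations


def worst_case_iterations(arr):
    """Largest iteration count over all keys that are present in arr."""
    return max(binary_search_steps(arr, key)[1] for key in set(arr))


distinct = list(range(15))    # n = 15, all keys distinct
dup = [0] + list(range(14))   # n = 15, key 0 stored at indices 0 and 1
print(worst_case_iterations(distinct))  # 4 = floor(log2(15)) + 1
print(binary_search_steps(dup, 0))      # (1, 3): found at index 1 in 3 steps
print(worst_case_iterations(dup))       # 4: other keys still need 4 steps
```

In this example the duplicated key is found one step sooner, but the worst case over all keys is unchanged, which matches the O(log n) conclusion.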

Related

Worst case time complexity of edit distance?

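I am trying to calculate the worst-case time complexity of finding the edit distance from T test words to D dictionary words, where all words have length MAX_LEN.
With a naive recursive solution, the worst-case time complexity is exponential in MAX_LEN for each pair of words. The recursion branches three ways whenever the last characters differ, so in the worst case it makes at least 3^MAX_LEN calls (the often-quoted O(3^MAX_LEN)), and since each call shortens at least one of the strings, the recursion depth is at most 2·MAX_LEN, which bounds the work by O(3^(2·MAX_LEN)). The worst case happens when none of the characters of the two strings match. Over all word pairs, multiply this by T·D.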
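A small Python sketch of the naive recursion (the function names are hypothetical) that also counts calls, so the exponential growth for strings with no matching characters is visible. For comparison it includes the standard bottom-up dynamic-programming table, a different technique not mentioned above, which needs only O(MAX_LEN^2) time per pair:

```python
def naive_edit_distance(a, b):
    """Naive recursive Levenshtein distance.

    Returns (distance, number_of_recursive_calls).
    """
    calls = 0

    def rec(i, j):
        # Edit distance between the prefixes a[:i] and b[:j].
        nonlocal calls
        calls += 1
        if i == 0:
            return j  # insert the remaining j characters
        if j == 0:
            return i  # delete the remaining i characters
        if a[i - 1] == b[j - 1]:
            return rec(i - 1, j - 1)  # last characters match: no cost
        # Mismatch: branch three ways (delete, insert, substitute).
        return 1 + min(rec(i - 1, j), rec(i, j - 1), rec(i - 1, j - 1))

    return rec(len(a), len(b)), calls


def dp_edit_distance(a, b):
    """Bottom-up Levenshtein distance in O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1]
            else:
                cur[j] = 1 + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


# Call counts for k-character strings with no matching characters.
for k in range(1, 7):
    dist, calls = naive_edit_distance("a" * k, "b" * k)
    print(k, dist, calls, dp_edit_distance("a" * k, "b" * k))
```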

Time Complexity of 1-pass lookup given input size N**2

Given a list of lists, e.g.
[[1,2,3],[4,5,6],[7,8,9]]:
What is the time complexity of using nested for loops to check that each number from 1 to 9 is used once and only once? Furthermore, what would the time complexity be if the input were a single flat list instead, e.g. [1,2,3,4,5,6,7,8,9]?
What really matters is the size of the input, not its format. Whether you have one list of 9 elements or 9 lists of 1 element each, you still have 9 elements to check in the worst case.
The answer to the question, as stated, would be O(1), because you have a constant size input.
If what you mean is "Given N elements, what is the time complexity of checking whether every number between 1 and N is present?", then it takes linear time, i.e., O(N).
For example, one option is to use a hash table (e.g., a Python set): for each element, check whether it is already in the set and, if not, add it. Note that with this specific option you get an algorithm that runs in expected (but not guaranteed, due to potential collisions) linear time.
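A minimal Python sketch of the set-based check, assuming integer inputs (the function name uses_each_once is hypothetical). itertools.chain.from_iterable flattens the list of lists, so the same single pass handles both input formats:

```python
from itertools import chain


def uses_each_once(values, n):
    """Return True if values contains every integer 1..n exactly once.

    Expected O(len(values)) time: each set lookup/insert is O(1) on average.
    """
    seen = set()
    for v in values:
        if v < 1 or v > n or v in seen:
            return False  # out of range, or a repeat
        seen.add(v)
    return len(seen) == n  # no value from 1..n is missing


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(uses_each_once(chain.from_iterable(grid), 9))    # True
print(uses_each_once([1, 2, 3, 4, 5, 6, 7, 8, 9], 9))  # True
print(uses_each_once([1, 2, 2], 3))                    # False
```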

What would be the binary search complexity to find second largest number in array

Can someone explain how to calculate the binary search complexity of finding the second-largest number in an array?
Binary search is done on a sorted array.
If you already have a sorted array, why do you need to do anything at all?
The second-to-last number in the array (sorted in ascending order) would be the second-largest number (O(1)).
If the array contains duplicates, for example
{0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,... }
the time complexity would be O(log n), where n is the number of elements in the array.
The largest number is the one at the last index (call it x). You can use binary search to find the bounds within which all elements are equal to x; the element immediately before these bounds is the second-largest number in the array.
If you are using C++, std::lower_bound gives you the first position holding x.
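A minimal Python sketch of that boundary search, assuming an ascending list of comparable values (second_largest is a hypothetical name). The standard-library bisect.bisect_left finds the first occurrence of the maximum in O(log n), and the element just before it is the second-largest distinct value:

```python
from bisect import bisect_left


def second_largest(sorted_arr):
    """Second-largest distinct value of an ascending list, or None."""
    if not sorted_arr:
        return None
    largest = sorted_arr[-1]
    first = bisect_left(sorted_arr, largest)  # first occurrence of the max
    if first == 0:
        return None  # every element equals the maximum
    return sorted_arr[first - 1]


print(second_largest([0, 1, 1, 1, 1, 1, 1]))  # 0
print(second_largest([1, 3, 3, 7, 7, 7]))     # 3
```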
Binary search for an element with a given property is always logarithmic, provided that the property is monotone over the array (false for a prefix, true for the rest) and that you can determine in constant time whether it holds.
If the array can’t contain duplicates, you don’t need a binary search and the complexity is constant.
One way to do it in Python is to convert the list (which allows duplicates) to a set (which does not) and then take the second-largest remaining value. Note, however, that building the set takes O(n) time, not O(1), and because a Python set is unordered and cannot be indexed, you cannot simply fetch the item at index [-2]; you would need something like sorted(set(lst))[-2] for a list lst, which is O(n log n).

Is binary search for an ordered list O(logN) in Elixir?

For an ordered list, the binary search time complexity is O(log N). However, in Elixir lists are linked lists, so to get the middle element of a list you have to iterate N/2 times, which makes the overall search O(N log N).
So my question is:
Is the above time complexity correct?
If it's correct, binary search wouldn't make sense in Elixir, right? You have to iterate through the list to get what you want, so the best is O(N).
Yes, there is little reason to binary search over a linked list because of the reason you stated. You need a random access data structure (usually an array) for binary search to be useful.
An interesting corner case might arise where comparing the elements is very costly, for example because they are just handles to remotely stored items. In that case binary search through a linked list might still outperform linear search: while it requires at least as many pointer traversals (O(N log N) if every step walks from the head, or O(N) if it walks forward from the current lower bound), it requires fewer comparisons (O(log N)), whereas linear search requires O(N) comparisons.
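A rough Python simulation (not Elixir; the Node class and helper names are hypothetical) of binary search over a singly linked list. It walks forward from the current lower bound instead of restarting at the head, and counts comparisons and pointer hops separately, which illustrates the trade-off above: comparisons stay O(log N) while hops grow linearly with N.

```python
class Node:
    """Minimal singly linked list node."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def from_list(values):
    """Build a linked list from a Python list; returns the head node."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def linked_binary_search(head, length, key):
    """Binary search on a sorted linked list.

    Returns (found, comparisons, hops): one three-way comparison per
    iteration, and hops counts node.next traversals.
    """
    lo, hi = 0, length - 1
    lo_node = head  # node at index lo
    comparisons = hops = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        node = lo_node
        for _ in range(mid - lo):  # walk forward from lo, not from the head
            node = node.next
            hops += 1
        comparisons += 1
        if node.value == key:
            return True, comparisons, hops
        if node.value < key:
            lo, lo_node = mid + 1, node.next  # continue right of mid
            hops += 1
        else:
            hi = mid - 1  # lo_node is still the start of the left half
    return False, comparisons, hops


values = list(range(0, 2000, 2))  # 1000 sorted even numbers
head = from_list(values)
print(linked_binary_search(head, len(values), 1998))
print(linked_binary_search(head, len(values), 3))
```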

Time Complexity confusion

I've always been a bit confused about this, possibly due to my lack of understanding of compilers. But let's use Python as an example. If we had some large list of numbers called numlist and wanted to get rid of any duplicates, we could call set on the list, e.g. set(numlist). In return we would get a set of our numbers. To the best of my knowledge, this operation is done in O(n) time. But if I were to write my own algorithm for this operation, the absolute best I could ever hope for is O(n^2).
What I don't get is what allows a built-in operation like set() to be so much faster than an algorithm written outside the language. The checking still needs to be done, doesn't it?
You can do this in Θ(n) average time using a hash table. Lookup and insertion in a hash table are Θ(1) on average. Thus, you just run through the n items and, for each one, check whether it is already in the hash table and insert it if it is not.
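A minimal Python sketch of that approach (dedupe_hash is a hypothetical name), using a set as the hash table and keeping the first occurrence of each item in its original order:

```python
def dedupe_hash(items):
    """Remove duplicates in Θ(n) average time, keeping first occurrences in order."""
    seen = set()  # hash table of items already emitted
    result = []
    for item in items:
        if item not in seen:  # Θ(1) average lookup
            seen.add(item)    # Θ(1) average insertion
            result.append(item)
    return result


print(dedupe_hash([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

For hashable items, list(dict.fromkeys(items)) gives the same result in one line, since dicts preserve insertion order from Python 3.7 on.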
The asymptotic complexity of an algorithm does not change depending on whether it is implemented by the language implementers or by a user of the language. As long as both are implemented in a Turing-complete language with a random-access memory model, they have the same capabilities, and algorithms implemented in each will have the same asymptotic complexity. If an algorithm is theoretically O(f(n)), it does not matter whether it is implemented in assembly language, C#, or Python; it will still be O(f(n)).
For a list of integers you can do this in O(n + r) in any language, where r = max - min + 1 is the range of the values (so O(n) whenever r is O(n)), basically as:
# Example input (assumed: a non-empty list of integers).
oldList = [4, 2, 7, 2, 4]
# Get min and max values: O(n).
minVal = oldList[0]
maxVal = oldList[0]
for value in oldList[1:]:
    if value < minVal:
        minVal = value
    if value > maxVal:
        maxVal = value
# Initialise boolean list, one slot per value in [minVal, maxVal]: O(r).
isInList = [False] * (maxVal - minVal + 1)
# Change booleans for values in old list: O(n).
for value in oldList:
    isInList[value - minVal] = True
# Create new list from booleans: O(r).
newList = []
for i in range(minVal, maxVal + 1):
    if isInList[i - minVal]:
        newList.append(i)
I'm assuming here that append is an (amortised) O(1) operation, which it should be unless the implementer was brain-dead. So with a fixed number of steps, each O(n) provided the value range max - min is O(n), you still have an O(n) operation.
Whether the steps are explicitly done in your code or whether they're done under the covers of a language is irrelevant. Otherwise you could claim that the C qsort was one operation and you now have the holy grail of an O(1) sort routine :-)
As many people have discovered, you can often trade off space complexity for time complexity. For example, the above only works because we're allowed to introduce the isInList and newList variables. If this were not allowed, the next best solution may be sorting the list (probably no better than O(n log n)) followed by an O(n) pass to remove the duplicates.
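A minimal Python sketch of the sort-then-compact alternative (dedupe_sorted_in_place is a hypothetical name). It reuses the input list instead of building isInList and newList, though Python's list.sort may allocate temporary memory internally, so this shows the idea rather than a strictly constant-space version:

```python
def dedupe_sorted_in_place(items):
    """Sort items, then compact the distinct values to the front of the list."""
    items.sort()                    # O(n log n) comparison sort
    write = 0                       # next slot for a distinct value
    for read in range(len(items)):  # single O(n) pass
        if write == 0 or items[read] != items[write - 1]:
            items[write] = items[read]
            write += 1
    del items[write:]               # drop the leftover tail
    return items


print(dedupe_sorted_in_place([3, 1, 3, 2, 1]))  # [1, 2, 3]
```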
As an extreme example, you can use the same extra-space method to sort an arbitrary number of 32-bit integers (say, with each value occurring at most 255 times, so a one-byte count suffices) in O(n) time, provided you can allocate about four billion bytes for storing the counts.
Simply initialise all the counts to zero and run through each position in your list, incrementing the count based on the number at that position. That's O(n).
Then start at the beginning of the list and run through the count array, placing that many of the corresponding value in the list. Scanning the count array is O(1), with the 1 being about four billion of course, but still constant time :-), and writing the values back adds another O(n).
That's also O(1) space complexity but a very big "1". Typically trade-offs aren't quite that severe.
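The counting idea from the last few paragraphs, scaled down to a small value range so it can actually run (counting_sort and its lo/hi parameters are hypothetical; the full 32-bit version would use a four-billion-entry array of byte-sized counters). Time is O(n + (hi - lo)) and the extra space is O(hi - lo):

```python
def counting_sort(values, lo, hi):
    """Sort integers in [lo, hi] in place by counting occurrences."""
    counts = [0] * (hi - lo + 1)  # one counter per possible value
    for v in values:              # O(n): tally each value
        counts[v - lo] += 1
    pos = 0
    for offset, c in enumerate(counts):  # fixed-size scan, independent of n
        for _ in range(c):               # O(n) writes in total
            values[pos] = lo + offset
            pos += 1
    return values


print(counting_sort([3, 1, 2, 3, 0], 0, 3))  # [0, 1, 2, 3, 3]
```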
The complexity bound of an algorithm is completely unrelated to whether it is implemented 'internally' or 'externally'.
Taking a list and turning it into a set through set() is O(n) on average.
This is because set is implemented as a hash set. That means checking whether something is in the set, or adding something to it, takes O(1) time on average. Thus, to make a set from an iterable (a list, for example), you just start with an empty set and add the elements of the iterable one by one. Since there are n elements and each insertion takes O(1) on average, the total time of converting an iterable to a set is O(n).
To understand how the hash implementation works, see the Wikipedia article on hash tables.
Offhand I can't think of how to do this in O(n), but here is the cool thing:
The difference between n^2 and n is so massive that the difference between you implementing it and Python implementing it is tiny compared to the choice of algorithm. For large enough n, an O(n^2) algorithm is always worse than an O(n) one, even if the O(n^2) one is in C and the O(n) one is in Python. You should never assume that kind of difference comes from the fact that you're not writing in a low-level language.
That said, if you want to implement your own, you can sort and then remove duplicates: the sort is O(n log n) and the duplicate removal is O(n).
There are two issues here.
Time complexity (usually expressed in big O notation) is a formal measure of how an algorithm's running time grows with the input size. It's more about how well an algorithm scales than about its absolute speed.
The actual speed (say, in milliseconds) of an algorithm is the time complexity multiplied by a constant (in an ideal world).
Two people could implement the same duplicate-removal algorithm with O(n log n) complexity, but if one writes it in Python and the other in optimised C, the C program will typically be faster.