When is sequential search better than binary search? - binary-search

I know that:
A linear search looks down a list, one item at a time, without jumping. In complexity terms this is an O(n) search - the time taken to search the list grows at the same rate as the list does.
A binary search starts at the middle of a sorted list and checks whether that element is greater than or less than the value you're looking for, which determines whether the value lies in the first or second half of the list. Then you jump to the middle of that sublist, compare again, and so on.
Is there a case where sequential/linear search becomes more efficient than binary search?

Yes, e.g. when the item you are looking for happens to be one of the first to be looked at in a sequential search.
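To make that concrete, here is a minimal Python sketch (function names and data are illustrative, not from the question) showing that when the target sits at the front of the list, a linear scan finds it on the very first probe while binary search still pays its logarithmic number of probes:

def linear_search(items, target):
    """Scan left to right; return (index, number_of_probes)."""
    for i, value in enumerate(items):
        if value == target:
            return i, i + 1
    return -1, len(items)

def binary_search(items, target):
    """Classic binary search on a sorted list; return (index, number_of_probes)."""
    lo, hi, probes = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            return mid, probes
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

data = list(range(1_000))
print(linear_search(data, 0))   # (0, 1) -- found on the very first probe
print(binary_search(data, 0))   # (0, 9) -- still pays roughly log2(1000) probes

Very small lists often show the same effect for any target, since binary search's per-step overhead dominates there, but that is a constant-factor argument rather than a complexity one.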

Related

How to calculate the worst case time for binary search for a key that appears twice in the sorted array?

What would be the worst-case time complexity for finding a key that appears twice in a sorted array using binary search? I know that the worst-case time complexity for binary search on a sorted array is O(log n). So, in the case that the key appears more than once, the time complexity should be less than O(log n). However, I am not sure how to calculate this.
In the worst case, binary search needs ⌊log_2(n)⌋ + 1 iterations to find the element or to conclude that the element is not in the array.
By having a duplicate you might need at most one step less.
For instance, suppose the duplicated key occupies the first and second indices of the array (the same argument works if it occupies the last two indices).
In that case you would need ⌊log_2(n)⌋ comparisons, which is still O(log n) as a worst-case time complexity.
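As a hedged sketch (the helper below is illustrative, not from the question), you can confirm the ⌊log_2(n)⌋ + 1 bound empirically by counting iterations over every possible key:

import math

def binary_search_iterations(arr, target):
    """Count the loop iterations a standard binary search performs."""
    lo, hi, iterations = 0, len(arr) - 1, 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return iterations
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return iterations

n = 1000
arr = list(range(n))
worst = max(binary_search_iterations(arr, key) for key in arr)
print(worst, math.floor(math.log2(n)) + 1)        # 10 10

# With a duplicated key the search may find that key a step earlier, but the
# worst case over all keys is still floor(log2(n)) + 1, i.e. O(log n).
arr_dup = sorted(arr + [arr[0]])                  # the key 0 now appears twice
worst_dup = max(binary_search_iterations(arr_dup, key) for key in set(arr_dup))
print(worst_dup)                                  # 10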

What would be the binary search complexity to find the second largest number in an array

Can someone explain how to calculate the binary search complexity of finding the second largest number in an array?
Binary search is done on a sorted array.
If you already have a sorted array, why do you need to do anything at all?
The second-to-last number in the array (sorted in ascending order) would be the second largest number (O(1)).
If the array contains duplicates:
For example,
{0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,... }
The time complexity would be O(log n) where n is the number of elements in the array.
The largest number is the one at the last index (call it x); now you can use binary search to find the bounds of the run of elements equal to x. The element immediately before that run is the second largest number in the array.
If you are using C++, you can use std::lower_bound (or std::equal_range) to get that bound.
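A hedged Python analogue of the same approach, with bisect playing the role of the C++ bound function (the function name is made up):

import bisect

def second_largest(sorted_arr):
    """Second largest value of an ascending sorted array that may contain
    duplicates; returns None if all elements are equal."""
    largest = sorted_arr[-1]
    # Binary search (O(log n)) for the first index holding the largest value.
    first_of_run = bisect.bisect_left(sorted_arr, largest)
    if first_of_run == 0:          # every element equals the largest
        return None
    return sorted_arr[first_of_run - 1]

print(second_largest([0, 1, 1, 1, 1, 1]))   # 0
print(second_largest([1, 2, 3, 4, 5]))      # 4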
Binary search for the boundary of any monotone property (one that splits the sorted array into a region where it fails and a region where it holds) is always logarithmic, provided that you can determine in constant time whether the property holds.
If the array can’t contain duplicates, you don’t need a binary search and the complexity is constant.
One way to do it in Python is to convert the list (which allows duplicates) to a set (which removes duplicates) in O(n) time, sort the distinct values back into a list, and then fetch the item at index [-2] in O(1) time. Note that a Python set is unordered, so the re-sort is needed even though the original list (being binary searched) is already sorted in ascending order.
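A short illustrative sketch of that idea (the helper name is made up):

def second_largest_via_set(sorted_list):
    """Deduplicate (O(n)), re-sort the distinct values, take the second-to-last."""
    distinct = sorted(set(sorted_list))      # a set is unordered, hence the re-sort
    return distinct[-2] if len(distinct) >= 2 else None

print(second_largest_via_set([0, 1, 1, 1, 1, 1]))   # 0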

Is binary search for an ordered list O(logN) in Elixir?

For an ordered list, the binary search time complexity is O(log N). However, in Elixir a list is a linked list, so in order to get the middle element you have to iterate N/2 times, which makes the overall search O(N log N).
So my question is:
Is the above time complexity correct?
If it's correct, binary search wouldn't make sense in Elixir, right? You have to iterate the list to get what you want, so the best you can do is O(N).
Yes, there is little reason to binary search over a linked list because of the reason you stated. You need a random access data structure (usually an array) for binary search to be useful.
An interesting corner case might arise where comparing the elements is very costly, for example because they are just handles to remotely stored items. In that case binary search through a linked list might still outperform linear search: although it requires more traversal operations (O(N log N)), it needs only O(log N) comparisons, while linear search requires O(N) comparisons.
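To make that trade-off concrete, here is a rough, illustrative Python sketch (the linked-list class and the counters are assumptions, not from the question) that counts expensive comparisons separately from cheap link traversals:

class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def build_list(values):
    """Build a singly linked list from a sorted Python list."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def node_at(head, index, counters):
    """Walk `index` links from the head, counting each (cheap) hop."""
    node = head
    for _ in range(index):
        counters["hops"] += 1
        node = node.next
    return node

def binary_search_list(head, length, target, counters):
    lo, hi = 0, length - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        value = node_at(head, mid, counters).value   # O(mid) hops, restarting from the head
        counters["comparisons"] += 1                 # one (expensive) comparison per step
        if value == target:
            return mid
        elif value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def linear_search_list(head, target, counters):
    index, node = 0, head
    while node is not None:
        counters["comparisons"] += 1
        if node.value == target:
            return index
        counters["hops"] += 1
        node, index = node.next, index + 1
    return -1

values = list(range(1000))
head = build_list(values)

c_bin = {"hops": 0, "comparisons": 0}
binary_search_list(head, len(values), 999, c_bin)
c_lin = {"hops": 0, "comparisons": 0}
linear_search_list(head, 999, c_lin)

print(c_bin)   # thousands of hops, but only about log2(1000) ~ 10 comparisons
print(c_lin)   # about 1000 comparisons (and 999 hops)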

Elasticsearch - higher scoring if higher frequency of term

I have 2 documents, and am searching for the keyword "Twitter". Suppose both documents are blog posts with a "tags" field.
Document A has ONLY 1 term in the "tags" field, and it's "Twitter".
Document B has 100 terms in the "tags" field, but 3 of them are "Twitter".
Elasticsearch gives the higher score to Document A even though Document B has a higher frequency of the term; Document B's score is "diluted" because its tags field contains more terms. How do I give Document B a higher score, since it has a higher frequency of the search term?
I know Elasticsearch/Lucene performs some normalization based on the number of terms in the document. How can I disable this normalization, so that Document B gets the higher score in the example above?
As the other answer says, it would be interesting to see whether you get the same result on a single shard. I think you would, and that depends on the norms for the tags field, which are taken into account when computing the score with the tf/idf similarity (the default).
In fact, Lucene does take into account the term frequency, in other words the number of times the term appears within the field (1 or 3 in your case), and the inverse document frequency, in other words how frequent the term is across the index, in order to compare it with the other terms in the query (in your case it doesn't make any difference, since you are searching for a single term).
But there is another factor, called norms, that rewards shorter fields and also takes into account any index-time boosting, which can be per field (in the mapping) or even per document. You can verify that norms are the reason for your result by enabling the explain option in your search request and looking at the explain output.
I guess the fact that the first document contains only that tag makes it more important than the other one, which contains that tag multiple times but a lot of other tags as well. If you don't like this behaviour you can just disable norms in your mapping for the tags field. Norms are enabled by default when the field is "index":"analyzed" (the default). You can either switch to "index":"not_analyzed" if you don't want your tags field to be analyzed (this usually makes sense, but it depends on your data and domain), or add the "omit_norms": true option in the mapping for your tags field.
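A hedged sketch of that mapping change, using the older "string"/"omit_norms" syntax referred to above (recent Elasticsearch versions spell it "type": "text", "norms": false); the index name, type name and endpoint below are assumptions:

import requests

# The index name ("blog"), type name ("blogpost") and endpoint are assumptions.
mapping = {
    "mappings": {
        "blogpost": {
            "properties": {
                "tags": {
                    "type": "string",        # old-style analyzed string field
                    "omit_norms": True       # stop rewarding shorter tag fields
                }
            }
        }
    }
}

resp = requests.put("http://localhost:9200/blog", json=mapping)
print(resp.json())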
Are the documents found on different shards? From the Elasticsearch documentation:
"When a query is executed on a specific shard, it does not take into account term frequencies and other search engine information from the other shards. If we want to support accurate ranking, we would need to first execute the query against all shards and gather the relevant term frequencies, and then, based on it, execute the query."
The solution is to specify the search type. Use dfs_query_and_fetch search type to execute an initial scatter phase which goes and computes the distributed term frequencies for more accurate scoring.
You can read more here.
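A hedged sketch of passing that search type as a request parameter (the index name, field and endpoint are assumptions, and the set of valid search_type values depends on your Elasticsearch version):

import requests

query = {"query": {"match": {"tags": "Twitter"}}}
resp = requests.get(
    "http://localhost:9200/blog/_search",
    params={"search_type": "dfs_query_and_fetch"},   # as suggested above
    json=query,
)
print(resp.json())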

Permutations, Even's algorithm and reverse elimination

I coded a function that implements Even's algorithm to find all permutations of an increasing sorted vector. But I don't need the "reverse" routes, i.e. the routes that are the same when you read them starting from the end. So far, I "rewind" and compare all my permutations to eliminate the "reverse" routes, but this elimination takes half of my running time, so is there a way to adapt the algorithm to generate only half of the permutations, with no reverse ones?
OK, I've found the solution. Indeed, if you start, as I did, from a sorted list of consecutive numbers, then once the originally first number has become the last and the originally last number has become the first, you start generating 'reverse' permutations, i.e. you obtain lists you already produced earlier, just read backwards.
So the condition "if the originally first element is now last AND the originally last element is now first, break" is efficient and saves time.
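For comparison, here is a minimal brute-force Python sketch (not Even's algorithm itself) of the same idea: keep exactly one route out of each forward/backward pair by comparing a permutation with its reverse:

from itertools import permutations

def half_permutations(items):
    """Yield one permutation out of each (p, reversed p) pair.

    A brute-force illustration of the idea above, not Even's
    (Steinhaus-Johnson-Trotter) algorithm: a route and its reverse describe
    the same path, so keep only the lexicographically smaller of the two.
    """
    for p in permutations(items):
        if p <= p[::-1]:          # keep the "forward" representative only
            yield p

print(list(half_permutations([1, 2, 3])))
# [(1, 2, 3), (1, 3, 2), (2, 1, 3)] -- 3 routes instead of 6 permutations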