Can we use a linear search instead of a binary search to find insertion position, without incurring any significant runtime penalty? - binary-search

While solving the problem on Leetcode https://leetcode.com/problems/find-median-from-data-stream
I came across an insertion sort approach. However, the pop quiz suggests using a linear search instead of a binary search. I wonder why that is, and what the trade-offs are.

My guess is that if you are searching for the number i, instead of searching from the beginning you can start searching from index i. But this only works well if the numbers do not contain a lot of duplicates.

Inserting into a sorted array costs O(n) either way: a binary search finds the insertion position in O(log n), but shifting the elements to make room for the new value still takes O(n). Since the overall cost is O(n) with either search, just do the simpler linear search.
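For concreteness, here is a minimal Python sketch of the sorted-list approach, assuming the usual addNum/findMedian interface from that LeetCode problem. It is illustrative only; the point is that bisect.insort locates the slot in O(log n) but the underlying list.insert still shifts elements, so each insertion is O(n) no matter which search you use.

    import bisect

    class MedianFinder:
        def __init__(self):
            self.data = []  # kept sorted at all times

        def addNum(self, num):
            # Binary search finds the slot in O(log n), but list.insert
            # still shifts up to n elements, so the whole call is O(n).
            bisect.insort(self.data, num)

        def findMedian(self):
            n = len(self.data)
            mid = n // 2
            if n % 2:
                return float(self.data[mid])
            return (self.data[mid - 1] + self.data[mid]) / 2.0

    mf = MedianFinder()
    for x in [5, 2, 8, 1]:
        mf.addNum(x)
    print(mf.findMedian())  # 3.5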

Related

Iterative deepening in minimax - sorting all legal moves, or just finding the PV-move then using MVV-LVA?

After reading the chessprogramming wiki and other sources, I've been confused about what the exact purpose of iterative deepening is. My original understanding was the following:
It consists of a minimax search performed at depth=1, depth=2, etc. until reaching the desired depth. After the minimax search at each depth, sort the root-node moves according to the results from that search, to get optimal move ordering for the next search at depth+1, so in the next deeper search the PV-move is searched first, then the next best move, then the next best move after that, and so on.
Is this correct? Doubts emerged when I read about MVV-LVA ordering, specifically about ordering captures, and additionally, using hash tables and such. For example, this page recommends a move ordering of:
1. PV-move of the principal variation from the previous iteration of an iterative deepening framework for the leftmost path, often implicitly done by 2.
2. Hash move from hash tables
3. Winning captures/promotions
4. Equal captures/promotions
5. Killer moves (non capture), often with mate killers first
6. Non-captures sorted by history heuristic and that like
7. Losing captures
If so, then what's the point of sorting the minimax results from each depth, if only the PV-move is needed? On the other hand, if the whole point of ID is the PV-move, won't it be a waste to search at every single depth up to the desired depth just to calculate the PV-move of each one?
What is the concrete purpose of ID, and how much computation does it save?
Correct me if I am wrong, but I think you are mixing 2 different concepts here.
Iterative deepening is mainly used to set a maximum search time for each move. The AI goes deeper and deeper, and when the allotted time is up it returns the move from the latest depth it finished searching. Since each increase in depth leads to exponentially longer search times, searching every depth from e.g. 1 to 12 takes almost the same time as searching only at depth 12.
Sorting the moves is done to maximize the effect of alpha-beta pruning. For optimal alpha-beta pruning you want to look at the best move first, which is of course impossible to know beforehand, but the ordering you listed above is a good guess. Just make sure that the sorting doesn't slow down your recursive function, otherwise the overhead cancels out the gains from alpha-beta pruning.
Hope this helps and that I understood your question correctly.
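To make both points concrete, here is a minimal, illustrative Python sketch of a time-limited iterative-deepening driver over a plain negamax with alpha-beta. The callbacks moves_fn, apply_fn and eval_fn are hypothetical placeholders for the engine's own move generator, move application and evaluation (eval_fn is assumed to score a position from the side to move's point of view); real engines add transposition tables, killer moves, etc. on top of this.

    import math
    import time

    def negamax(state, depth, alpha, beta, moves_fn, apply_fn, eval_fn):
        # Plain negamax with alpha-beta pruning; no quiescence, no hashing.
        moves = moves_fn(state)
        if depth == 0 or not moves:
            return eval_fn(state)
        best = -math.inf
        for move in moves:
            score = -negamax(apply_fn(state, move), depth - 1,
                             -beta, -alpha, moves_fn, apply_fn, eval_fn)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:   # cut-off: the rest cannot improve the result
                break
        return best

    def iterative_deepening(state, max_depth, seconds,
                            moves_fn, apply_fn, eval_fn):
        # Search depth 1, 2, ... until max_depth or until time runs out,
        # reordering the root moves by the previous iteration's scores so
        # the next, deeper search examines the likely-best (PV) move first.
        deadline = time.monotonic() + seconds
        ordered = list(moves_fn(state))
        if not ordered:
            return None                     # no legal moves at the root
        best_move = ordered[0]
        for depth in range(1, max_depth + 1):
            scored = []
            for move in ordered:
                if time.monotonic() > deadline:
                    return best_move        # deepest fully searched answer
                score = -negamax(apply_fn(state, move), depth - 1,
                                 -math.inf, math.inf,
                                 moves_fn, apply_fn, eval_fn)
                scored.append((score, move))
            scored.sort(key=lambda sm: sm[0], reverse=True)
            ordered = [m for _, m in scored]
            best_move = ordered[0]
        return best_move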

Best performance approach to find all combinations of numbers from a given set (>80 elements) to reach a given final sum

Before I am directed to go and keep searching instead of asking this general question, please understand my question in detail.
We have an algorithm that does this in PL/SQL; however, it does not perform well when the given set of numbers has a large number of elements. For example, it works well when the set has around 22 elements, but after that the performance dies.
We are working with Oracle Database 12c. This combination search is part of one of our applications; the numbers are pulled from Oracle tables into associative arrays for finding combinations. Example: final sum required = 30,
set of elements to choose from: {1,2,4,6,7,2,8,10,5}, and so forth.
My question, in gist:
Is PL/SQL realistically suited to writing such an algorithm? Should we be looking at another programming language, technology, server capacity, or tool to handle larger sets of more than 80 elements?
Oracle is not a good fit here, because relational databases are not designed for this kind of combinatorial search. In fact, I think this is the subset-sum problem, which is NP-complete, so there are no truly efficient solutions.
The approach in a database is to generate all possible combinations up to a certain size and then filter down to the ones that match your sum. This is an exponential algorithm. There may be heuristic algorithms that come close, but it is an inherently hard problem.
Unless you can find some special condition to shrink the problem you will never solve it. Don't worry about the language implementation until you know this problem is even theoretically possible.
As others have mentioned, this problem grows exponentially. Solving it for 22 elements is not even close to solving it for 80.
A dynamic programming algorithm may be able to quickly determine whether there is at least one solution to a subset sum problem. But finding all solutions requires testing 2^80 sets.
2^80 = 1,208,925,819,614,629,174,706,176. That's 1.2e24.
That's a big number. Let's make a wildly optimistic assumption that a processor can test one billion sets a second. Buy a million of them and you can find your answer in about 38 years. Maybe a quantum computer can solve it more quickly some day.
It might help to explain exactly what you're trying to do. Unless there is some special condition, some way to eliminate most of the processing and avoid a brute-force solution, I don't see any hope for solving this problem. Perhaps this is a question for the Theoretical Computer Science site.
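To illustrate the dynamic-programming remark above, here is a small Python sketch of a subset-sum feasibility check. It only answers whether at least one qualifying subset exists, in roughly O(n * target) work for non-negative integers; enumerating every combination, which is what the question asks for, is still exponential.

    def subset_sum_exists(numbers, target):
        # 'reachable' holds every sum buildable from a subset of the
        # numbers seen so far; it never stores more than target + 1 sums.
        reachable = {0}
        for x in numbers:
            reachable |= {s + x for s in reachable if s + x <= target}
            if target in reachable:
                return True
        return target in reachable

    # The example data from the question: target 30 from {1,2,4,6,7,2,8,10,5}
    print(subset_sum_exists([1, 2, 4, 6, 7, 2, 8, 10, 5], 30))  # True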

Linear & Binary search

If a computer is super fast and has unlimited memory, which search operation is best to use, or can we use whichever we like? (between linear and binary search)
A linear search scans one item at a time, without jumping over any items. Its time complexity is O(n).
A binary search cuts the search range in half at each step by comparing the target against the middle of a sorted list. Its time complexity is O(log n).
Note: a binary search only works on a sorted list; on unsorted data you would have to sort first, or fall back to a linear search.
So for searching sorted data, binary search is better no matter how much computing power or space you have.
It depends. In general, if the data you are searching is already sorted, use binary search; otherwise use linear search.
Here is a great article for Linear vs Binary Search
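For reference, here is a short Python sketch of both searches; the example list is made up. The binary search relies on the list being sorted, which is the whole trade-off discussed above.

    from bisect import bisect_left

    def linear_search(items, target):
        # O(n): works on any list, sorted or not.
        for i, value in enumerate(items):
            if value == target:
                return i
        return -1

    def binary_search(sorted_items, target):
        # O(log n): correct only if sorted_items is already sorted.
        i = bisect_left(sorted_items, target)
        if i < len(sorted_items) and sorted_items[i] == target:
            return i
        return -1

    data = [3, 8, 15, 23, 42, 57]
    print(linear_search(data, 23))  # 3
    print(binary_search(data, 23))  # 3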

How to extract semantic relatedness from a text corpus

The goal is to assess semantic relatedness between terms in a large text corpus, e.g. 'police' and 'crime' should have a stronger semantic relatedness than 'police' and 'mountain' as they tend to co-occur in the same context.
The simplest approach I've read about consists of extracting TF-IDF information from the corpus.
A lot of people use Latent Semantic Analysis to find semantic correlations.
I've come across the Lucene search engine: http://lucene.apache.org/
Do you think it is suitable for extracting TF-IDF?
What would you recommend to do what I'm trying to do, both in terms of technique and software tools (with a preference for Java)?
Thanks in advance!
Mulone
Yes, Lucene gets TF-IDF data. The Carrot^2 algorithm is an example of a semantic extraction program built on Lucene. I mention it since, as a first step, they create a correlation matrix. Of course, you probably can build this matrix yourself easily.
If you deal with a ton of data, you may want to use Mahout for the harder linear algebra parts.
It is very easy if you have a Lucene index. For example, to get a correlation score you can use the simple formula count(term1 AND term2) / (count(term1) * count(term2)), where count is the number of hits from your search results. You can also easily calculate other semantic metrics such as chi^2 or information gain; all you need is to express the formula in terms of hit counts from queries.
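Here is a toy Python sketch of that count-based score, using an in-memory list of tokenized documents in place of real Lucene hit counts (the corpus below is made up purely for illustration):

    def doc_count(docs, *terms):
        # Number of documents containing every given term -- the same number
        # a boolean AND query's hit count would give you from an index.
        return sum(all(t in doc for t in terms) for doc in docs)

    def relatedness(docs, t1, t2):
        # count(t1 AND t2) / (count(t1) * count(t2)) from the answer above.
        c1, c2 = doc_count(docs, t1), doc_count(docs, t2)
        both = doc_count(docs, t1, t2)
        return both / (c1 * c2) if c1 and c2 else 0.0

    corpus = [
        {"police", "crime", "arrest"},
        {"police", "crime", "city"},
        {"police", "mountain", "rescue"},
        {"mountain", "hiking"},
    ]
    print(relatedness(corpus, "police", "crime"))     # 2 / (3 * 2) = 0.33...
    print(relatedness(corpus, "police", "mountain"))  # 1 / (3 * 2) = 0.16...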

Optimization algorithm question

This may be a simple question for those know-how guys. But I cannot figure it out by myself.
Suppose there are a large number of objects that I need to select some from. Each object has two known variables: cost and benefit. I have a budget, say $1000. How could I find out which objects I should buy to maximize the total benefit within the given budget? I want a numeric optimization solution. Thanks!
Your problem is called the "knapsack problem". You can read more on the wikipedia page. Translating the nomenclature from your original question into that of the wikipedia article, your problem's "cost" is the knapsack problem's "weight". Your problem's "benefit" is the knapsack problem's "value".
Finding an exact solution is an NP-complete problem, so be prepared for slow results if you have a lot of objects to choose from!
You might also look into Linear Programming. From MathWorld:
Simplistically, linear programming is the optimization of an outcome based on some set of constraints using a linear mathematical model.
Yes, as stated before, this is the knapsack problem, and I would solve it with dynamic programming.
The key to this problem is storing data so that you do not need to recompute things more than once (if enough memory is available). There are two general ways to go about dynamic programming: top-down and bottom-up. This one is a bottom-up problem.
(In general) find the base-case values: what is the optimal object to select for a very small budget? Then build on this. If we allow ourselves to spend a little more money, what is the best combination of objects for that small increment? The possibilities are keeping what you previously had, taking one new object and replacing an old one, adding another small object that still keeps you under budget, and so on.
Like I said, the main idea is not to recompute values. If you follow this pattern up to the full budget, you will find that the best way to spend X dollars combines the answers you already stored for smaller cases.
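A minimal bottom-up sketch of that idea in Python, i.e. the classic 0/1 knapsack dynamic program. The cost and benefit lists are made-up example data; the code assumes integer costs and returns only the best total benefit (recovering which objects were chosen takes a little extra bookkeeping).

    def knapsack(costs, benefits, budget):
        # best[b] = best total benefit achievable with a budget of b.
        best = [0] * (budget + 1)
        for cost, benefit in zip(costs, benefits):
            # Walk budgets downwards so each object is used at most once.
            for b in range(budget, cost - 1, -1):
                best[b] = max(best[b], best[b - cost] + benefit)
        return best[budget]

    # Hypothetical objects: (cost in dollars, benefit)
    costs    = [300, 450, 200, 600, 150]
    benefits = [ 40,  70,  25,  90,  20]
    print(knapsack(costs, benefits, 1000))  # 135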