Linear & Binary Search

If a computer is super fast and has unlimited memory, which search operation is best to use, or can we use whichever we like? (between linear and binary search)

A linear search scans one item at a time, without jumping to any item. Time complexity is O(n).
Whereas a binary search cuts the search space in half each time it compares against the middle of a sorted list. Time complexity is O(log n).
Note: a binary search only works on sorted data; on an unsorted list you have to fall back to a linear scan (or sort it first), so it offers no advantage there.
So, on sorted data, binary search is better no matter how much computing power or space you have: extra hardware speeds up both searches by the same constant factor, while the O(log n) vs. O(n) gap keeps growing with n.
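For concreteness, here is a minimal Java sketch of the two searches over an int array; the binarySearch method only returns correct results when the array is sorted.

    import java.util.Arrays;

    public class SearchDemo {
        // Linear search, O(n): check every element until a match is found.
        static int linearSearch(int[] a, int target) {
            for (int i = 0; i < a.length; i++) {
                if (a[i] == target) return i;
            }
            return -1; // not found
        }

        // Binary search, O(log n): halve the range each step; valid only on a sorted array.
        static int binarySearch(int[] a, int target) {
            int lo = 0, hi = a.length - 1;
            while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;
                if (a[mid] == target) return mid;
                if (a[mid] < target) lo = mid + 1;
                else hi = mid - 1;
            }
            return -1; // not found
        }

        public static void main(String[] args) {
            int[] sorted = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};
            System.out.println(linearSearch(sorted, 23));        // 5
            System.out.println(binarySearch(sorted, 23));        // 5
            System.out.println(Arrays.binarySearch(sorted, 23)); // 5 (library equivalent)
        }
    }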

It depends. In general, if the data you are searching is already sorted, use binary search; otherwise use linear search.

Here is a great article on linear vs. binary search.

Related

Can we use a linear search instead of a binary search to find insertion position, without incurring any significant runtime penalty?

While solving the problem on Leetcode https://leetcode.com/problems/find-median-from-data-stream
I came across an insertion sort approach. However, the pop quiz mentions using linear search instead of binary search. I wonder why that is, and what the trade-offs are.
My guess is if you are searching for number i, instead of searching from the beginning, you can start searching from the index i. But this only works well if the numbers do not contain a lot of duplicates.
Insertion costs O(n) whether you locate the position with linear search or with binary search: the binary search finds the spot in O(log n), but inserting into the sorted array still shifts O(n) elements, so each insertion is O(n) either way. Just do the simpler linear search.
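To make that concrete, here is a small Java sketch of the insertion step (the ArrayList container is just illustrative): whether the position is found with Collections.binarySearch in O(log n) or with a linear scan in O(n), the add(index, value) call still shifts up to n elements, so the overall cost per insertion is O(n) in both cases.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class SortedInsert {
        // Find the insertion point with binary search, then insert.
        // The search is O(log n), but add(index, ...) shifts elements: O(n) overall.
        static void insertBinary(List<Integer> sorted, int value) {
            int pos = Collections.binarySearch(sorted, value);
            if (pos < 0) pos = -(pos + 1); // "not found" result encodes the insertion point
            sorted.add(pos, value);
        }

        // Find the insertion point with a linear scan, then insert: also O(n) overall.
        static void insertLinear(List<Integer> sorted, int value) {
            int pos = 0;
            while (pos < sorted.size() && sorted.get(pos) < value) pos++;
            sorted.add(pos, value);
        }

        public static void main(String[] args) {
            List<Integer> a = new ArrayList<>(List.of(1, 3, 7, 9));
            insertBinary(a, 5);
            insertLinear(a, 8);
            System.out.println(a); // [1, 3, 5, 7, 8, 9]
            // The median is then a.get(a.size() / 2) for odd sizes,
            // or the average of the two middle elements for even sizes.
        }
    }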

Iterative deepening in minimax - sorting all legal moves, or just finding the PV-move then using MVV-LVA?

After reading the chessprogramming wiki and other sources, I've been confused about the exact purpose of iterative deepening. My original understanding was the following:
It consists of minimax searches performed at depth=1, depth=2, etc. until reaching the desired depth. After the search at each depth, sort the root-node moves according to the results from that search, to get optimal move ordering in the next search at depth+1; so in the next, deeper search the PV-move is searched first, then the next best move, then the next best move after that, and so on.
Is this correct? Doubts emerged when I read about MVV-LVA ordering, specifically about ordering captures, and additionally, using hash tables and such. For example, this page recommends a move ordering of:
1. PV-move of the principal variation from the previous iteration of an iterative deepening framework for the leftmost path, often implicitly done by 2.
2. Hash move from hash tables
3. Winning captures/promotions
4. Equal captures/promotions
5. Killer moves (non capture), often with mate killers first
6. Non-captures sorted by history heuristic and that like
7. Losing captures
If so, then what's the point of sorting the root moves after each depth's search, if only the PV-move is needed? On the other hand, if the whole point of ID is the PV-move, isn't it a waste to search every single depth up to the desired depth just to calculate the PV-move of each depth?
What is the concrete purpose of ID, and how much computation does it save?
Correct me if I am wrong, but I think you are mixing 2 different concepts here.
Iterative deepening is mainly used to set a maximum search time for each move. The AI goes deeper and deeper, and when the allotted time is up it returns the move from the latest depth it finished searching. Since each increase in depth leads to exponentially longer search times, searching every depth from e.g. 1 to 12 takes almost the same time as searching at depth 12 alone.
Sorting the moves is done to maximize the effect of alpha-beta pruning. For optimal alpha-beta pruning you want to look at the best move first, which is of course impossible to know beforehand, but the ordering you listed above is a good guess. Just make sure that the sorting itself doesn't slow down your recursive function so much that it cancels out the gains from alpha-beta.
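To tie the two ideas together, below is a rough Java sketch of a time-budgeted iterative-deepening loop that re-sorts the root moves by the scores of the previous iteration; Board, Move and alphaBeta() are hypothetical placeholders, not code from any particular engine.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class IterativeDeepening {
        interface Move {}
        interface Board { List<Move> legalMoves(); Board play(Move m); }

        static final int INF = 1_000_000;

        // Placeholder for an ordinary fixed-depth alpha-beta (negamax) search.
        static int alphaBeta(Board b, int depth, int alpha, int beta) { return 0; }

        static Move pickMove(Board board, long timeBudgetMillis) {
            long deadline = System.currentTimeMillis() + timeBudgetMillis;
            List<Move> rootMoves = new ArrayList<>(board.legalMoves());
            Map<Move, Integer> lastScores = new HashMap<>();
            Move best = rootMoves.get(0);

            for (int depth = 1; ; depth++) {
                // Order root moves by the previous iteration's scores, so the
                // deeper search tries the PV-move (and other good moves) first.
                rootMoves.sort(Comparator.comparingInt(
                        (Move m) -> lastScores.getOrDefault(m, 0)).reversed());

                Move bestThisDepth = null;
                int bestScore = -INF;
                for (Move m : rootMoves) {
                    if (System.currentTimeMillis() >= deadline) {
                        return best; // out of time: keep the move from the last finished depth
                    }
                    int score = -alphaBeta(board.play(m), depth - 1, -INF, INF);
                    lastScores.put(m, score);
                    if (score > bestScore) { bestScore = score; bestThisDepth = m; }
                }
                best = bestThisDepth; // this depth finished within the budget
            }
        }
    }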
Hope this helps and that I understood your question correctly.

Binary Search Tree Density?

I am working on a homework assignment that deals with binary search trees and I came across a question that I do not quite understand. The question asks how density affects the time it takes to search a binary tree. I understand binary search trees and big-O notation, but we have never dealt with density before.
Density of a binary search tree can be defined as the number of nodes accumulated up to a given level. A perfect binary tree has the highest density. So the question is basically asking how the number of nodes at each level affects the search time in the tree: the denser the tree, the more nodes fit into fewer levels, so its height, and with it the worst-case search time, is smaller (about log2 n for a perfect tree versus n for a completely sparse, list-like tree). Let me know if that's not clear.
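As a quick illustration (under the usual assumption that search cost equals the number of nodes visited on the way down), the same seven keys can produce either a sparse, list-like tree or the densest possible, perfect tree depending on insertion order, and the comparison counts differ accordingly:

    public class BstDensity {
        static class Node {
            int key; Node left, right;
            Node(int key) { this.key = key; }
        }

        // Standard unbalanced BST insertion.
        static Node insert(Node root, int key) {
            if (root == null) return new Node(key);
            if (key < root.key) root.left = insert(root.left, key);
            else root.right = insert(root.right, key);
            return root;
        }

        // Comparisons needed to find a key == depth of that key in the tree.
        static int comparisonsToFind(Node root, int key) {
            int comparisons = 0;
            for (Node cur = root; cur != null; cur = key < cur.key ? cur.left : cur.right) {
                comparisons++;
                if (key == cur.key) break;
            }
            return comparisons;
        }

        public static void main(String[] args) {
            // Sparse (degenerate) tree: keys inserted in sorted order -> height n.
            Node sparse = null;
            for (int k : new int[]{1, 2, 3, 4, 5, 6, 7}) sparse = insert(sparse, k);

            // Dense (perfect) tree: same keys inserted middle-first -> height ~log2 n.
            Node dense = null;
            for (int k : new int[]{4, 2, 6, 1, 3, 5, 7}) dense = insert(dense, k);

            System.out.println(comparisonsToFind(sparse, 7)); // 7 comparisons (O(n))
            System.out.println(comparisonsToFind(dense, 7));  // 3 comparisons (O(log n))
        }
    }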

Where can I find several significant sorting algorithm test cases?

I want to develop a very efficient sorting algorithm based on some ideas that I have. The problem is that I want to test my algorithm's efficiency against the most highly regarded sorting algorithms that already exist.
Ideally I would like to find:
a large set of sorting test cases that are SIGNIFICANT for measuring my algorithm's efficiency
a large set of already existing and strongly-optimized sorting algorithms (with their code - no matter the language)
even better, software that provides an adequate testing environment for sorting-algorithm developers
Here's a post that I found earlier which contains 2 tables with comparisons between timsort, quicksort, dual-pivot quicksort and java 6 sort: http://blog.quibb.org/2009/10/sorting-algorithm-shootout/
I can see in those tables that those TXT files (starting from 1245.repeat.1000.txt on to sequential.10000000.txt) contain the test cases for those algorithms, but I can't find the original TXT files anywhere!
Can anyone point me to any link with many sorting test-cases AND/OR many HIGHLY EFFICIENT sorting algorithms? (it's the test cases I am interested in the most, sorting algorithms are all over the internet)
Thank you very much in advance!
A few things:
Quicksort goes nuts on forward- and reverse-sorted lists (naive pivot choices degrade to O(n^2)), so your tests will need those list types in addition to random ones.
Testing on random data is fine, but if you want to compare the performance of different algorithms you cannot generate new random data every time, or your results won't be reliable. I think you should come up with a deterministic, pseudo-random generator that writes data in an order based only on the number of entries (and a fixed seed). That way the data generated for lists of size n, 10n and 100n will be comparable (see the sketch after this list).
Testing of sorting is not primarily about speed (until an algorithm has been finalized) but about the ratio of comparisons to entries. If one sort requires 15 comparisons per entry on a list and another needs 12 on the same list, the second is more efficient even if it currently executes in twice the time. For the more trivial sorting approaches the number of exchanges will also come into play.
For testing, use a vector of integers in RAM. If the algorithm works well, the vector of integers can later be translated into a vector of indices into a buffer containing the data to be compared; such an algorithm would sort the vector of indices based on the data they point to.
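Here is a rough Java sketch of the kind of reproducible generators and comparison counting described above; the particular list types, seed and sizes are only examples.

    import java.util.Random;

    public class SortTestData {
        // Deterministic generators: the same n and seed always yield the same input,
        // so every algorithm (and every run) sees identical data.
        static int[] randomList(int n, long seed) {
            Random rng = new Random(seed); // fixed seed -> reproducible
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = rng.nextInt(n);
            return a;
        }

        static int[] sortedList(int n) { // already sorted (ascending)
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = i;
            return a;
        }

        static int[] reverseSortedList(int n) { // hits naive quicksort's worst case
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = n - i;
            return a;
        }

        static int[] fewUniqueList(int n, int distinct, long seed) { // many duplicates
            Random rng = new Random(seed);
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = rng.nextInt(distinct);
            return a;
        }

        // Count comparisons instead of (or in addition to) wall-clock time by
        // routing every comparison in the sort under test through this method.
        static long comparisons = 0;

        static int compare(int x, int y) {
            comparisons++;
            return Integer.compare(x, y);
        }

        public static void main(String[] args) {
            int[] data = randomList(10_000, 42L);
            comparisons = 0;
            // ... run the sort under test here, using compare() for every comparison ...
            System.out.println("comparisons per entry: " + (double) comparisons / data.length);
        }
    }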

How to extract semantic relatedness from a text corpus

The goal is to assess semantic relatedness between terms in a large text corpus, e.g. 'police' and 'crime' should have a stronger semantic relatedness than 'police' and 'mountain' as they tend to co-occur in the same context.
The simplest approach I've read about consists of extracting TF-IDF information from the corpus.
A lot of people use Latent Semantic Analysis to find semantic correlations.
I've come across the Lucene search engine: http://lucene.apache.org/
Do you think it is suitable for extracting TF-IDF?
What would you recommend to do what I'm trying to do, both in terms of technique and software tools (with a preference for Java)?
Thanks in advance!
Mulone
Yes, Lucene gets TF-IDF data. The Carrot^2 algorithm is an example of a semantic extraction program built on Lucene. I mention it since, as a first step, they create a correlation matrix. Of course, you probably can build this matrix yourself easily.
If you deal with a ton of data, you may want to use Mahout for the harder linear algebra parts.
It is very easy if you have a Lucene index. For example, to get a correlation score you can use the simple formula count(term1 AND term2) / (count(term1) * count(term2)), where count is the number of hits from your search results. Moreover, you can easily calculate other semantic metrics such as chi-squared or information gain. All you need is to take the formula and express it in terms of hit counts from Lucene queries.
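As a rough sketch of that idea (assuming a reasonably recent Lucene where IndexSearcher.count(Query) is available; the index path and the field name "body" are placeholders), the hit counts can be obtained with plain term and boolean queries:

    import java.nio.file.Paths;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.FSDirectory;

    public class CooccurrenceScore {
        public static void main(String[] args) throws Exception {
            try (DirectoryReader reader = DirectoryReader.open(
                    FSDirectory.open(Paths.get("/path/to/index")))) { // placeholder path
                IndexSearcher searcher = new IndexSearcher(reader);
                System.out.println(score(searcher, "police", "crime"));
                System.out.println(score(searcher, "police", "mountain"));
            }
        }

        // count(t1 AND t2) / (count(t1) * count(t2)), where count() is the document hit count.
        static double score(IndexSearcher searcher, String t1, String t2) throws Exception {
            Query q1 = new TermQuery(new Term("body", t1)); // "body" = assumed field name
            Query q2 = new TermQuery(new Term("body", t2));
            Query both = new BooleanQuery.Builder()
                    .add(q1, BooleanClause.Occur.MUST)
                    .add(q2, BooleanClause.Occur.MUST)
                    .build();

            int c1 = searcher.count(q1);
            int c2 = searcher.count(q2);
            int c12 = searcher.count(both);
            if (c1 == 0 || c2 == 0) return 0.0;
            return (double) c12 / ((double) c1 * c2);
        }
    }

Note the raw ratio is not normalised; for a more principled score you would plug the same counts into chi-squared or information gain, as mentioned above.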