Conditions for binary search to succeed in finding nearest neighbor on array of 3D points? - binary-search

I have an ordered array of 3D points. The points represent a path in 3D space.
Given an arbitrary point I want to find the nearest point on the path.
If the path were relatively straight this would be a trivial application of binary search, but since the path can have arbitrary curvature (even looping back on itself), binary search may fail to find the nearest point.
My question is as follows:
What is the least strict constraint on the path under which binary search is guaranteed to find the nearest point? Is it monotonicity in each dimension? Is it related to the path's curvature? Etc.

It depends a little on whether your path is given or whether you are free to use any path you like.
Let's assume your path is given.
To answer your question: a simple binary search cannot be guaranteed to find the closest point. Imagine your path is a circle that is cut open at one place. The first and last points of your curve (the circle) will always be very close, but no binary search can account for that. As @Yann Vernier suggested, you can use spatial searches for this; look up "nearest neighbor query". These can usually be answered with spatial indexes such as the k-d tree, quadtree, or R-tree. You can find Java implementations here.
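For illustration, a minimal sketch of such a nearest-neighbor query using SciPy's cKDTree; the SciPy dependency and the (N, 3) array layout are assumptions here, not part of the original suggestion:

```python
# Minimal sketch: nearest-neighbor query over the path points with a k-d tree.
# The SciPy dependency and the (N, 3) array layout are illustrative assumptions.
import numpy as np
from scipy.spatial import cKDTree

path = np.random.rand(1000, 3)    # stand-in for the ordered 3D path points
tree = cKDTree(path)              # build the spatial index once

query = np.array([0.3, 0.7, 0.1])
dist, idx = tree.query(query)     # exact nearest path point to the query
print("closest path point:", path[idx], "at distance", dist)
```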
In case the path is not predefined, you can order the points along a Z-curve (Morton order) or a Hilbert curve (the curve being your path). This gives you a linear ordering that can be searched with a binary search. This does not always give the closest point, but it is very fast, space efficient, and will often give you the closest point. A Hilbert curve is more likely than a Z-curve to give you the closest point, but it is harder to calculate.
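A rough sketch of the Z-curve idea: quantize each coordinate, interleave the bits into a Morton code, sort the points by code, and binary-search the codes. The 10-bit quantization and the small candidate window around the insertion point are illustrative choices, and the result is only approximately nearest:

```python
# Rough sketch: sort points by Morton (Z-order) code, then probe with a
# binary search over the codes. The answer is approximate, not exact.
from bisect import bisect_left

def part1by2(n):
    # Insert two zero bits after each of the 10 low bits of n.
    n &= 0x000003ff
    n = (n ^ (n << 16)) & 0xff0000ff
    n = (n ^ (n << 8)) & 0x0300f00f
    n = (n ^ (n << 4)) & 0x030c30c3
    n = (n ^ (n << 2)) & 0x09249249
    return n

def morton3(ix, iy, iz):
    # Interleave the bits of three 10-bit integer coordinates.
    return part1by2(ix) | (part1by2(iy) << 1) | (part1by2(iz) << 2)

def quantize(p, lo, hi, bits=10):
    # Map each coordinate from [lo, hi] to an integer in [0, 2**bits - 1].
    return [int((c - a) / (b - a) * ((1 << bits) - 1)) for c, a, b in zip(p, lo, hi)]

def build_index(points, lo, hi):
    # Pair each point with its Morton code and sort by the code only.
    coded = sorted(((morton3(*quantize(p, lo, hi)), p) for p in points),
                   key=lambda cp: cp[0])
    return [c for c, _ in coded], [p for _, p in coded]

def approx_nearest(query, codes, sorted_points, lo, hi, window=2):
    # Binary-search the query's code, then compare a few nearby candidates.
    i = bisect_left(codes, morton3(*quantize(query, lo, hi)))
    candidates = sorted_points[max(0, i - window):i + window + 1]
    return min(candidates,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(query, p)))
```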

Related

Verifying nearest neighbors algorithm is working correctly

I am implementing a complex algorithm for determining the n-nearest neighbors in a high-dimensional embedding space from a paper I found online. After I finished the implementation, I wanted to check the results to make sure the code did indeed return the desired n-nearest neighbors. To do so, I check whether the results are equal to those of a brute-force search that finds the n-nearest neighbors across every element in the embedding space.
The issue arises when there are multiple elements with the same distance from the query input.
For example, if I am checking for the 3-nearest neighbors, and I have four points, one of which is closest and the other 3 all equidistant from the search key, one element will necessarily be left out. I'd like to test to ensure that the two implementations are roughly the same, and I am not interested in the exact details of which elements are left out. As a result, I can't just do an element-wise equality check across the complex algorithm and the brute-force solution.
For business reasons, it is actually helpful if the element left out is random, because I want the end user to see a variety of results, as long as all results are equally relevant. I specifically do not want a stable-ordering on the results.
Is there an off-the-shelf solution for this problem? I am implementing this code in Python, but the solution can be language agnostic.
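One way to express the "roughly the same" check described above, sketched here under assumed names and data layout, is to compare the sorted neighbor distances rather than the neighbor identities, so ties broken differently (or randomly) still pass:

```python
# Sketch: the two k-NN result sets are treated as equivalent when their
# sorted distances to the query match within a tolerance. Names and the
# point representation (tuples of coordinates) are illustrative assumptions.
import math

def sorted_distances(query, neighbors):
    return sorted(math.dist(query, p) for p in neighbors)

def equivalent_results(query, fast_knn, brute_knn, tol=1e-9):
    d_fast = sorted_distances(query, fast_knn)
    d_brute = sorted_distances(query, brute_knn)
    return len(d_fast) == len(d_brute) and all(
        abs(a - b) <= tol for a, b in zip(d_fast, d_brute))
```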

Finding the nearest location to a test point

I have about 2000+ sets of geographical coordinates (lat, long). Given one coordinate, I want to find the closest one from that set. My approach was to measure the distance to every point, but hundreds of requests per second can be a little rough on the server doing all that math.
What is the most efficient solution for this?
The problem you’re describing here is called a nearest neighbor search and there are lots of good data structures that support fast nearest neighbor lookups. The k-d tree is a particularly simple and fast choice and there are many good libraries out there that you can use. You can also look into alternatives like vantage-point trees or quadtrees if you’d like.
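For example, a minimal sketch with SciPy's cKDTree, assuming the 2000+ (lat, long) pairs are known up front; converting to 3D unit-sphere coordinates is my own assumption here, so that Euclidean distance in the tree ranks candidates the way great-circle distance would:

```python
# Minimal sketch: build the k-d tree once, then answer each request with one query.
import numpy as np
from scipy.spatial import cKDTree

def to_xyz(lat_deg, lon_deg):
    # Convert (lat, lon) in degrees to points on the unit sphere.
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

coords = np.array([[40.7128, -74.0060],
                   [34.0522, -118.2437],
                   [51.5074, -0.1278]])               # stand-in for the 2000+ stored points
tree = cKDTree(to_xyz(coords[:, 0], coords[:, 1]))    # build once, reuse per request

def nearest(lat, lon):
    _, idx = tree.query(to_xyz(lat, lon))             # one lookup per incoming request
    return coords[idx[0]]

print(nearest(48.8566, 2.3522))                       # -> closest stored (lat, lon)
```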
Hope this helps!

Parameter Estimation to Minimize Runtime

Suppose I have an algorithm whose runtime depends on two parameters. I want to find the set of parameter values that minimizes the runtime. The two parameters are continuous double values in the range 0 to infinity.
Therefore, for two parameters a, b: I want to find the values of a and b that minimize the runtime. I think this is a pretty standard problem, but I could not find good literature on it. I found some material such as MLE, least squares, etc., but those are about fitting distributions.
First use your brain to understand, in a qualitative way, the possible functional relationship between those parameters and the running time. This means forming a first idea of the number and positions of possible minima, the smoothness of the function, its asymptotic behavior, and any other clue you can find.
Then decide on a reasonable range of values over which it makes sense to sample the function. If those ranges are very wide, it is preferable to sample using a geometric progression rather than an arithmetic one (say, powers of 2).
Then measure, observe the function values with a graphical viewer, and confirm your intuitions. It is likely that this will be enough to spot the rough location of the absolute minimum. Finding a more accurate position may be pointless if it only buys you the last few percent of improvement. It is also very likely that the location of the optimum will depend on the particular dataset, making an accurate location even less useful.
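As a concrete illustration of the sampling step, here is a minimal sketch that times the algorithm over a geometric grid of (a, b) values; run_algorithm is a stand-in name for whatever is actually being tuned:

```python
# Minimal sketch: time the algorithm on a geometric grid of (a, b) pairs
# and keep the fastest pair. run_algorithm(a, b) is a hypothetical callable.
import time

def measure(run_algorithm, a, b, repeats=3):
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_algorithm(a, b)
        best = min(best, time.perf_counter() - start)   # best of a few runs
    return best

def coarse_search(run_algorithm, exponents=range(-4, 9)):
    grid = [2.0 ** e for e in exponents]                # geometric progression
    timings = {(a, b): measure(run_algorithm, a, b) for a in grid for b in grid}
    best_pair = min(timings, key=timings.get)
    return best_pair, timings                           # keep timings for plotting
```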

What are some practical applications of taking the vertical sum of a binary tree

I came across this interview question and was wondering why you would want to take the vertical sum of a binary tree. Is this algorithm useful?
http://www.careercup.com/question?id=2375661
For a balanced tree the vertical sum could give you a rough insight into the range of the data. Binary trees, although easier to code, can take on more pathological shapes depending on the order in which the data is inserted. The vertical sum would be a good indicator of that pathology.
Look at the code at vertical sum in binary tree. This algorithm is written assuming a max width for the tree. Using this algorithm you will be able to get a feel for different types of unbalanced trees.
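For reference, here is a minimal sketch of the vertical-sum idea, using a dictionary keyed by horizontal distance instead of the preallocated fixed-width array that the linked code assumes; the Node class and names are illustrative:

```python
# Sketch: sum node values by "column", where the root sits at horizontal
# distance 0, left children at hd - 1, and right children at hd + 1.
from collections import defaultdict

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def vertical_sums(root):
    sums = defaultdict(int)
    def walk(node, hd):              # pre-order walk carrying the horizontal distance
        if node is None:
            return
        sums[hd] += node.value       # nodes in the same vertical column share an hd
        walk(node.left, hd - 1)
        walk(node.right, hd + 1)
    walk(root, 0)
    return [sums[hd] for hd in sorted(sums)]   # column sums, left to right

# Example: the tree 1 with children 2 and 3 gives columns [2, 1, 3].
print(vertical_sums(Node(1, Node(2), Node(3))))
```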
An interesting variation of this program would be to use permutations of a fixed data set to build binary trees and look at the various vertical sums. The leading and trailing zeroes give you a feel for how the tree is balanced, and the vertical sums can give you insight into how data arrival order affects the height of the tree (and the average access time for the data in the tree). An internet search will turn up an implementation of this algorithm using dynamic data structures. With these I think you would want to document which sum includes the root node.
Your question "Is this algorithm useful?" really raises the question of how useful a plain binary tree is compared to a balanced tree. The vertical sums of a tree document whether the implementation is closer to O(N) or O(log N). Here is an article on balanced binary trees. Put a balanced tree implementation in your personal toolkit, and try to remember whether you would use a pre-order, in-order, or post-order traversal of the tree to calculate your vertical sum. You'll get an A+ for this question.

Binary Search Tree Density?

I am working on a homework assignment that deals with binary search trees and I came across a question that I do not quite understand. The question asks how density affects the time it takes to search a binary tree. I understand binary search trees and big-O notation, but we have never dealt with density before.
The density of a binary search tree can be defined as the cumulative number of nodes up to a given level. A perfect binary tree would have the highest density. So the question is basically asking how the number of nodes at each level affects the search time in the tree. Let me know if that's not clear.
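To make that concrete, here is a small illustrative sketch (my own example, not part of the assignment) comparing the average search depth of a dense, balanced tree against a sparse, vine-like one built from the same keys:

```python
# Sketch: sorted insertion yields a sparse, vine-like tree (one node per level),
# while middle-first insertion yields a dense, balanced tree. Compare the
# average number of comparisons a search would need in each.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def depth_stats(root, depth=1):
    # Return (sum of node depths, node count) for the subtree.
    if root is None:
        return 0, 0
    ls, lc = depth_stats(root.left, depth + 1)
    rs, rc = depth_stats(root.right, depth + 1)
    return depth + ls + rs, 1 + lc + rc

def balanced_order(keys):
    # Middle element first, recursively: produces a perfectly balanced tree.
    if not keys:
        return []
    mid = len(keys) // 2
    return [keys[mid]] + balanced_order(keys[:mid]) + balanced_order(keys[mid + 1:])

keys = list(range(1, 16))
sparse = dense = None
for k in keys:                       # sorted insertion -> low density
    sparse = insert(sparse, k)
for k in balanced_order(keys):       # balanced insertion -> high density
    dense = insert(dense, k)

for name, tree in (("sparse", sparse), ("dense", dense)):
    total, count = depth_stats(tree)
    print(name, "average comparisons per search:", total / count)
```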