I'm working with billions of rows of data, and each row has an associated start latitude/longitude and end latitude/longitude. I need to calculate the distance between each start and end point, but it is taking an extremely long time.
I really need to make what I'm doing more efficient.
Currently I use a function (below) to calculate the hypotenuse between points. Is there some way to make this more efficient?
I should say that I have already tried casting the lat/longs as spatial geography values and using SQL Server's built-in STDistance() method (without a spatial index), but this was even slower.
Any help would be much appreciated. I'm hoping there is some way to speed up the function, even if it degrades accuracy a little (nearest 100m is probably ok).
Thanks in advance!
-- Body of the scalar function; its parameters are @lat_start, @long_start, @lat_end, @long_end (FLOAT, degrees)
DECLARE @l_distance_m FLOAT
, @l_long_start FLOAT
, @l_long_end FLOAT
, @l_lat_start FLOAT
, @l_lat_end FLOAT
, @l_x_diff FLOAT
, @l_y_diff FLOAT
SET @l_lat_start = @lat_start
SET @l_long_start = @long_start
SET @l_lat_end = @lat_end
SET @l_long_end = @long_end
-- NOTE 2 x PI() x (radius of earth in km) / 360 = 111 (km per degree)
SET @l_y_diff = 111 * (@l_lat_end - @l_lat_start)
SET @l_x_diff = 111 * (@l_long_end - @l_long_start) * COS(RADIANS((@l_lat_end + @l_lat_start) / 2))
SET @l_distance_m = 1000 * SQRT(@l_x_diff * @l_x_diff + @l_y_diff * @l_y_diff)
RETURN @l_distance_m
I haven't done any SQL programming since around 1994, but I'd make the following observations:
The formula you're using works as long as the distances between your coordinates don't get too big. It will have big errors for the distance between, e.g., New York and Singapore, but for the distance between New York and Boston it should be accurate to within 100 m.
I don't think there's any approximation formula that would be faster, but I can see some minor implementation improvements that might speed it up:
(1) Why assign @l_lat_start from @lat_start? You can use @lat_start directly (and the same for @long_start, @lat_end and @long_end).
(2) Instead of having 111 in the formulas for @l_y_diff and @l_x_diff, you could drop it there, saving multiplications, and use 111000 instead of 1000 in the formula for @l_distance_m.
(3) Using COS(RADIANS(@lat_end)) or COS(RADIANS(@lat_start)) instead of the mean latitude won't degrade the accuracy as long as the points aren't too far apart; if the points are all within the same city, you could just precompute the cosine for any point in that city.
Apart from that, I think you'd need to look at other ideas, such as creating a table with the results and updating it whenever points are added to or deleted from the source table.
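To show suggestion (2) in isolation, here is a minimal Java sketch (not T-SQL; the class and method names are hypothetical) of the question's equirectangular formula next to a variant that works in raw degree differences and applies a single 111,000 m-per-degree factor at the end. Both compute the same quantity; the second simply does fewer multiplications per row.

```java
/**
 * Equirectangular ("flat-earth") distance from the question, plus the
 * constant-folded variant from suggestion (2). Names are illustrative only.
 */
final class EquirectangularDistance {
    // 2 * PI * 6371 km / 360 is about 111.19 km; 111,000 m matches the question's constant.
    static final double METRES_PER_DEGREE = 111_000.0;

    /** Original form: scale each difference by 111 km, then convert km to m. */
    static double original(double latStart, double longStart, double latEnd, double longEnd) {
        double yDiff = 111 * (latEnd - latStart);
        double xDiff = 111 * (longEnd - longStart) * Math.cos(Math.toRadians((latEnd + latStart) / 2));
        return 1000 * Math.sqrt(xDiff * xDiff + yDiff * yDiff);
    }

    /** Folded form: work in degrees, apply one scale factor at the end. */
    static double optimized(double latStart, double longStart, double latEnd, double longEnd) {
        double yDiff = latEnd - latStart;
        double xDiff = (longEnd - longStart) * Math.cos(Math.toRadians((latEnd + latStart) / 2));
        return METRES_PER_DEGREE * Math.sqrt(xDiff * xDiff + yDiff * yDiff);
    }

    public static void main(String[] args) {
        // Roughly New York to Boston, the range the answer says is fine for this formula.
        double a = original(40.7128, -74.0060, 42.3601, -71.0589);
        double b = optimized(40.7128, -74.0060, 42.3601, -71.0589);
        System.out.printf("original: %.1f m, optimized: %.1f m%n", a, b);
    }
}
```

Suggestion (3) would go one step further and replace the per-row Math.cos call with a precomputed constant when all points are close together; whether any of these savings are measurable inside SQL Server would need testing.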
I keep stumbling into game/simulation solutions for finding distance while time is running, and it's not what I'm looking for.
I'm looking for an O(1) formula to calculate the (0 or 1 or 2) clock time(s) in which two circles are exactly r1+r2 distance from each other. Negative time is possible. It's possible two circles don't collide, and they may not have an intersection (as in 2 cars "clipping" each other while driving too close to the middle of the road in opposite directions), which is messing up all my mx+b solutions.
Technically, a single point collision should be possible.
I'm about 100 lines of code deep, and I feel sure there must be a better way, and I'm not even sure whether my test cases are correct or not. My initial setup was:
dist( x1+dx1*t, y1+dy1*t, x2+dx2*t, y2+dy2*t ) == r1+r2
By assuming the distance at any time t could be calculated with Pythagoras, I would like to know the two points in time in which the distance from the centers is precisely the sum of the radii. I solved for a, b, and c and applied the quadratic formula, and I believe that if I'm assuming they were phantom objects, this would give me the first moment of collision and the final moment of collision, and I could assume at every moment between, they are overlapping.
I'm working under the precondition that it's impossible for 2 objects to be overlapping at t0, which means infinite collision of "stuck inside each other" is not possible. I'm also filtering out and using special handling for when the slope is 0 or infinite, which is working.
I tried calculating, at the moment object 1 is at the intersection point of the two paths, its distance from object 2 (and likewise when object 2 is at the intersection point), but this did not work, as it's possible to have a collision when they are not at their intersection.
I'm having problems when the slopes are equal but the velocities have different magnitudes.
Is there a simple physics/math formula for this already?
Programming language doesn't matter; pseudocode would be great, or any math formula that doesn't use complex symbols (I'm not a math/physics person), but not a higher-level library call (I assume Python probably has a collide(p1, p2) method already).
There is a simple(-ish) solution. You already mentioned using the quadratic formula which is a good start.
First, define the problem so that the quadratic formula can be used: in this case, the distance between the two centers over time.
Let's define our time as t
Because we are using two dimensions we can call our dimensions x & y
First let's define the two center points at t = 0 of our circles as a & b
Let's also define our velocity at t = 0 of a & b as u & v respectively.
Finally, let's define the constant accelerations of a & b as o & p respectively.
The equation for a position along any one dimension (which we'll call i) with respect to time t is: i(t) = 1/2 * acc * t^2 + vel * t + i0, where acc is the constant acceleration, vel the initial velocity and i0 the initial position along dimension i. For circle a along x this reads a.x(t) = 1/2 * o.x * t^2 + u.x * t + a.x.
We know the distance between two 2D points at any time t is the square root of ((a.x(t) - b.x(t))^2 + (a.y(t) - b.y(t))^2)
Using the formula for position along a dimension, we can express everything in the distance equation in terms of just t and the constants defined earlier. For shorthand we will call this function d(t).
Finally, using that equation, the values of t where d(t) = a.radius + b.radius are where a collision starts or ends.
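To put this in terms of the quadratic formula, square both sides (d(t) contains a square root) and move the radius term to the left: d(t)^2 - (a.radius + b.radius)^2 = 0.
With zero acceleration (o = p = 0) this is a quadratic in t: expand and simplify it so everything is in terms of t and the given constants, then solve for both roots (the + and - branches) with the quadratic formula. With non-zero acceleration the positions are themselves quadratic in t, so the squared distance is a degree-4 polynomial and the quadratic formula no longer applies directly.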
This also handles the no-collision case: if the two objects never come within a.radius + b.radius of each other, the discriminant is negative and there is no real solution.
You should be able to translate the rest into code fairly easily. I'm running out of time at the moment and will write out a simple solution when I can.
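For the constant-velocity case the question describes, the steps above reduce to a quadratic in the relative position and relative velocity. A minimal Java sketch under that assumption (no acceleration; the class and method names are hypothetical) that returns the 0, 1 or 2 contact times the question asks for:

```java
import java.util.Arrays;

/**
 * Times at which two circles with constant velocities are exactly r1 + r2 apart.
 * Squaring d(t) = r1 + r2 gives a t^2 + b t + c = 0 in relative coordinates.
 */
final class CircleContactTimes {
    /** Returns 0, 1 or 2 times in ascending order; times may be negative. */
    static double[] contactTimes(double x1, double y1, double dx1, double dy1, double r1,
                                 double x2, double y2, double dx2, double dy2, double r2) {
        double px = x1 - x2, py = y1 - y2;     // relative position at t = 0
        double vx = dx1 - dx2, vy = dy1 - dy2; // relative velocity
        double rs = r1 + r2;
        double a = vx * vx + vy * vy;
        double b = 2 * (px * vx + py * vy);
        double c = px * px + py * py - rs * rs;
        if (a == 0) {
            // Same velocity (parallel, same speed): the distance never changes.
            return new double[0];
        }
        double disc = b * b - 4 * a * c;
        if (disc < 0) return new double[0];                  // never within r1 + r2
        if (disc == 0) return new double[] { -b / (2 * a) }; // tangent: single-point touch
        double s = Math.sqrt(disc);
        double t1 = (-b - s) / (2 * a); // first contact (overlap starts)
        double t2 = (-b + s) / (2 * a); // last contact (overlap ends)
        return new double[] { t1, t2 };
    }

    public static void main(String[] args) {
        // Head-on: radius 1 each, 10 apart, closing at 2 units per time step.
        System.out.println(Arrays.toString(contactTimes(0, 0, 1, 0, 1, 10, 0, -1, 0, 1)));
    }
}
```

The exact `disc == 0` test only fires for exactly representable inputs; in a real simulation a small tolerance (as in the C# answer's glancing-collision check) is probably more practical. Equal slopes with different speeds need no special case here, because only the relative velocity enters the formula.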
Following up on @TinFoilPancakes's answer, and heavily using WolframAlpha to simplify the formulae, I've come up with the following pseudocode, well, C# code actually, which I've commented somewhat:
The Ball class has the following properties:
public double X;
public double Y;
public double Xvel;
public double Yvel;
public double Radius;
The algorithm:
public double TimeToCollision(Ball other)
{
double distance = (Radius + other.Radius) * (Radius + other.Radius);
double a = (Xvel - other.Xvel) * (Xvel - other.Xvel) + (Yvel - other.Yvel) * (Yvel - other.Yvel);
double b = 2 * ((X - other.X) * (Xvel - other.Xvel) + (Y - other.Y) * (Yvel - other.Yvel));
double c = (X - other.X) * (X - other.X) + (Y - other.Y) * (Y - other.Y) - distance;
double d = b * b - 4 * a * c;
// Ignore glancing collisions that may not cause a response due to limited precision and lead to an infinite loop
if (b > -1e-6 || d <= 0)
return double.NaN;
double e = Math.Sqrt(d);
double t1 = (-b - e) / (2 * a); // Collision time, +ve or -ve
double t2 = (-b + e) / (2 * a); // Exit time, +ve or -ve
// b < 0 => Getting closer
// If we are overlapping and moving closer, collide now
// (b <= -1e-6 is already guaranteed by the early return above)
if (t1 < 0 && t2 > 0)
return 0;
return t1;
}
The method returns the time at which the Balls collide, which can be positive, negative or NaN; NaN means they won't collide (or didn't).
Further points to note: we can bail out early when the discriminant is not positive, which will be the case most of the time, and so avoid the Sqrt. Also, since I'm using this in a continuous collision detection system, I'm ignoring (glancing) collisions that would have little or no impact, since the response to such a collision may not change the velocities, leading to the same situation being checked infinitely and freezing the simulation.
The 'b' variable can be used for this check since, luckily, it's similar to the dot product. If b > -1e-6, i.e. they're not moving closer fast enough, we return NaN, i.e. they don't collide. You can tweak this value to avoid freezes: a smaller value will allow closer glancing collisions but increases the chance of a freeze when they happen, e.g. when a bunch of circles are packed tightly together. Likewise, to avoid Balls moving through each other, we signal an immediate collision if they're already overlapping and moving closer.
I am running the following SQL query in my Java Spring server. This query works perfectly for almost all coordinates except for one specific pair c = <23.065079, 72.511478> (= to_lat, to_lon):
SELECT *
FROM karpool.ride
WHERE Acos(Sin(Radians(23.065079)) * Sin(Radians(to_lat)) +
Cos(Radians(23.065079)) * Cos(Radians(to_lat)) *
Cos(Radians(to_lon) - Radians(72.511478))) * 6371 <= 10;
My database has many locations within 10 km distance to c. With the above query, I get all those locations' distances, except for the one which exactly matches with c. The distance returned should be 0 in that case, but the query fails.
Is this an SQL issue or is there something wrong with the formula?
This is most probably due to floating point accuracy problems.
First of all, the used formula is the Great circle distance formula:
Let φ1, λ1 and φ2, λ2 be the geographical latitude and longitude of two points 1 and 2, and Δφ, Δλ their absolute differences; then Δσ, the central angle between them, is given by the spherical law of cosines:
Δσ = arccos ( sin φ1 ∙ sin φ2 + cos φ1 ∙ cos φ2 ∙ cos (Δλ) ).
The distance d, i.e. the arc length, for a sphere of radius r and Δσ given in radians is:
d = r Δσ.
Now if the two points are the same, then Δλ = 0, and thus cos(Δλ) = cos(0) = 1, and the first formula reduces to:
Δσ = arccos (sin φ ∙ sin φ + cos φ ∙ cos φ).
The argument to arccos has become the Pythagorean trigonometric identity, and thus equals 1.
So the above reduces to:
Δσ = arccos (1).
The problem
The domain of the arccosine is: −1 ≤ x ≤ 1, so with the value 1 we are at the boundary of the domain.
As the value of 1 was the result of several floating point operations (sines, cosines, multiplications), it could occur that the value is not exactly 1, but something like 1.0000000000004. That poses a problem, for that value is out of range for calculating the arccosine. Database engines respond differently to this situation:
SQL Server will raise an exception:
An invalid floating point operation occurred.
MySQL will just evaluate the expression as NULL (its ACOS returns NULL for arguments outside −1..1), so the row is silently filtered out.
The solution
Somehow the argument passed to the arccosine should be made to stay in the range −1 ≤ x ≤ 1. One way of doing this is to round the argument to a number of decimals that is large enough to keep some precision, but small enough to round away any excess outside this range caused by floating point operations.
Most database engines have a round function to which a second argument can be provided to specify the number of digits to keep, and so the SQL would look like this (keeping 6 decimals):
SELECT *
FROM karpool.ride
WHERE Acos(Round(
Sin(Radians(23.065079)) * Sin(Radians(to_lat)) +
Cos(Radians(23.065079)) * Cos(Radians(to_lat)) *
Cos(Radians(to_lon) - Radians(72.511478)),
6
)) * 6371 <= 10;
Alternatively, you could use the functions greatest and least, which some database engines provide, to turn any excess value to 1 (or -1):
SELECT *
FROM karpool.ride
WHERE Acos(Greatest(Least(
Sin(Radians(23.065079)) * Sin(Radians(to_lat)) +
Cos(Radians(23.065079)) * Cos(Radians(to_lat)) *
Cos(Radians(to_lon) - Radians(72.511478)),
1), -1)
) * 6371 <= 10;
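The same failure and fix can be reproduced outside the database. A minimal Java sketch (class and method names hypothetical) of the law-of-cosines distance with the argument clamped as in the Greatest/Least query; Java's Math.acos returns NaN for an argument outside [−1, 1], much like MySQL returns NULL. The haversine variant is a different, swapped-in formula, not part of the answer; it is shown only because it stays well-conditioned when the two points coincide.

```java
/**
 * Great-circle distance in km on a sphere of radius 6371 km (as in the query).
 */
final class GreatCircle {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Spherical law of cosines with the acos argument clamped to [-1, 1]. */
    static double lawOfCosinesKm(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1), p2 = Math.toRadians(lat2);
        double dl = Math.toRadians(lon2) - Math.toRadians(lon1);
        double cosSigma = Math.sin(p1) * Math.sin(p2) + Math.cos(p1) * Math.cos(p2) * Math.cos(dl);
        // Rounding can push cosSigma slightly above 1 for identical points; clamp it back.
        cosSigma = Math.max(-1.0, Math.min(1.0, cosSigma));
        return Math.acos(cosSigma) * EARTH_RADIUS_KM;
    }

    /** Haversine form (alternative formula, not from the answer). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dp = Math.toRadians(lat2 - lat1), dl = Math.toRadians(lon2 - lon1);
        double h = Math.sin(dp / 2) * Math.sin(dp / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dl / 2) * Math.sin(dl / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1.0, h)));
    }

    public static void main(String[] args) {
        double lat = 23.065079, lon = 72.511478; // the problematic point from the question
        System.out.println(lawOfCosinesKm(lat, lon, lat, lon)); // tiny but not NaN
        System.out.println(haversineKm(lat, lon, lat, lon));    // exactly 0.0
    }
}
```

Note that even with the clamp, the law of cosines may report a few centimetres rather than exactly zero for identical points, because acos is very sensitive near 1; that is harmless for a "within 10 km" filter.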
Note that SQL Server (before SQL Server 2022) does not provide GREATEST/LEAST functions; a separate question on working around this has several answers.
I am new to Spatialite. I have following query:
select A.*
from linka as A, pointa as B
where Contains(Buffer(B.Geometry, 100), A.Geometry)
I actually want to create a 100-meter buffer and find out which links are contained in it.
I found that the '100' I inserted is actually in degrees, and the query returns the links within that range.
I could put a degree value in my query, but the conversion from degrees to meters/kilometers is not the same all around the world.
I went through many sites and learned that 1 degree ≈ 110 km,
but GIS experts and some reference sites told me that it varies, especially towards the poles.
For instance, at Alta, Norway, one degree in the planar approximation corresponds to 34 km in the x direction but 111 km in the y direction. With geographic coordinates, the buffer looks similar to this:
http://extremelysatisfactorytotalitarianism.com/blog/wp-content/uploads/2010/08/tissot_indicatrix_equirectangular_proj.png
I built software that converts geographic data to geometric (X, Y coordinate) data and performs a transformation that SpatiaLite can understand.
I have also tried reading about SRIDs, but I can't work out how to use them in my query.
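To make the degree/metre mismatch concrete, here is a rough Java sketch on a spherical Earth (the mean radius of 6,371,008.8 m and the class name are assumptions; real WGS84 values differ slightly because the Earth is an ellipsoid). A degree of latitude is roughly constant, while a degree of longitude shrinks with cos(latitude), so near 70° N it is only about a third of the north-south value.

```java
/**
 * Approximate metres per degree on a sphere; illustrative only.
 */
final class DegreeScale {
    static final double MEAN_EARTH_RADIUS_M = 6_371_008.8; // assumed spherical radius
    static final double METRES_PER_DEGREE_LAT = 2 * Math.PI * MEAN_EARTH_RADIUS_M / 360; // about 111 km

    /** East-west metres per degree of longitude at the given latitude. */
    static double metresPerDegreeLon(double latDeg) {
        return METRES_PER_DEGREE_LAT * Math.cos(Math.toRadians(latDeg));
    }

    public static void main(String[] args) {
        for (double lat : new double[] {0, 45, 70}) {
            System.out.printf("lat %.0f: 1 deg lon = %.0f m, 1 deg lat = %.0f m%n",
                    lat, metresPerDegreeLon(lat), METRES_PER_DEGREE_LAT);
        }
    }
}
```

This is why a buffer of a fixed number of degrees is stretched east-west away from the equator, and why the answers below move to a metric projection or a geography type instead.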
Temporarily transform your geometry to a metric projection (e.g. UTM).
Assuming your current projection is WGS84, try the following expression:
transform(buffer(transform(B.geometry, #projection), #dist), 4326)
- #projection: your new projection, e.g. 32631 for WGS 84 / UTM zone 31N (choose the projection that fits your zone)
- #dist: distance in meters
(4326 is the SRID for WGS84.)
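Choosing "the projection that fits your zone" can be automated for the standard grid. A minimal Java sketch (hypothetical helper, not a SpatiaLite function) that maps a WGS84 longitude/latitude to the EPSG code of its WGS 84 / UTM zone, i.e. 326xx for the northern hemisphere and 327xx for the southern; it uses plain 6-degree zones and ignores the special zone boundaries around Norway and Svalbard.

```java
/**
 * EPSG code of the WGS 84 / UTM zone containing a point (standard 6-degree zones only).
 */
final class UtmSrid {
    static int utmEpsg(double lonDeg, double latDeg) {
        int zone = (int) Math.floor((lonDeg + 180.0) / 6.0) + 1; // zones 1..60
        if (zone > 60) zone = 60;                                // lon == 180 would give 61
        return (latDeg >= 0 ? 32600 : 32700) + zone;             // 326xx north, 327xx south
    }

    public static void main(String[] args) {
        System.out.println(utmEpsg(2.35, 48.85)); // Paris lies in zone 31N
    }
}
```

The returned value would be what goes in place of #projection.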
If you are using SQL Server 2008 or later, you should be able to use spatial types.
Let's assume linka contains a geography column named geo, and that it contains points.
Don't forget to create a spatial index!
Try this:
DECLARE @buffer geography = geography::Point(1.234, 5.678, 4326); -- Point(latitude, longitude, SRID)
DECLARE @distance float = 100.0; -- metres: STDistance on SRID 4326 returns metres
SELECT * FROM linka
WHERE linka.geo.STDistance(@buffer) < @distance;
I have two inputs, x and y, each in the range 0-180. I need to add them together and stay within the range 0-180. I'm having some trouble because 90 is the midpoint, and I can't seem to keep my data in that range. I'm doing this in VB.NET, but I mainly need help with the logic.
result = (x + y) / 2
Perhaps? At least that will stay in the 0-180 range. Are there any other constraints you're not telling us about, since right now this seems pretty obvious.
If you want to map the two values to the limited range in a linear fashion, just add them together and divide by two:
out = (in1 + in2) / 2
If you just want to limit the top end, add them together, then use the minimum of that and 180:
out = min (180, in1 + in2)
Are you wanting to find the average of the two or add them? If you're adding them, and you're dealing with angles which wrap around (which is what it sounds like) then, why not just add them and then modulo? Like this:
(in1 + in2) mod 180
Hopefully you're familiar with the modulo operator (Mod in VB.NET).
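The three options from these answers (average, cap, wrap) side by side, as a small Java sketch with hypothetical names; in VB.NET the same expressions use Math.Min and Mod. Inputs are assumed to be in 0..180.

```java
/**
 * Three ways to combine two values in 0..180 so the result stays in range.
 */
final class CombineAngles {
    /** Linear mapping: the average of the two inputs. */
    static double average(double a, double b) { return (a + b) / 2; }

    /** Cap the sum at 180. */
    static double clamp(double a, double b) { return Math.min(180, a + b); }

    /** Wrap around like an angle; the result is in [0, 180), so 180 becomes 0. */
    static double wrap(double a, double b) { return (a + b) % 180; }

    public static void main(String[] args) {
        System.out.println(average(90, 180) + " " + clamp(100, 100) + " " + wrap(100, 100));
    }
}
```

Which one is right depends on what the values mean: wrapping only makes sense for quantities that really are periodic, such as angles.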
I am processing a series of points which all have the same Y value, but different X values. I go through the points by incrementing X by one. For example, I might have Y = 50 and X is the integers from -30 to 30. Part of my algorithm involves finding the distance to the origin from each point and then doing further processing.
After profiling, I've found that the sqrt call in the distance calculation is taking a significant amount of my time. Is there an iterative way to calculate the distance?
In other words:
I want to efficiently calculate r[n] = sqrt(x[n]*x[n] + y*y). I can save information from the previous iteration. Each iteration changes by incrementing x, so x[n] = x[n-1] + 1. I cannot use sqrt or trig functions because they are too slow, except at the beginning of each scanline.
I can use approximations as long as they are good enough (less than 0.1% error) and the errors introduced are smooth (I can't bin to a pre-calculated table of approximations).
Additional information:
x and y are always integers between -150 and 150
I'm going to try a couple ideas out tomorrow and mark the best answer based on which is fastest.
Results
I did some timings
Distance formula: 16 ms / iteration
Pete's interpolating solution: 8 ms / iteration
wrang-wrang pre-calculation solution: 8 ms / iteration
I was hoping the test would decide between the two, because I like both answers. I'm going to go with Pete's because it uses less memory.
Just to get a feel for it, for your range y = 50, x = 0 gives r = 50 and y = 50, x = +/- 30 gives r ~= 58.3. You want an approximation good to +/- 0.1%, or +/- 0.05 absolute. That's much lower accuracy than most library sqrts provide.
Two approximate approaches: calculate r by interpolating from the previous value, or use a few terms of a suitable series.
Interpolating from previous r
$r = (x^2 + y^2)^{1/2}$
$dr/dx = \tfrac{1}{2} \cdot 2x \cdot (x^2 + y^2)^{-1/2} = x/r$
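public class InterpolateRadius {
public static void main(String[] args) {
double r = 50; // exact at x = 0, where r = y
for (int x = 0; x <= 30; ++x) {
double r_true = Math.sqrt(50 * 50 + x * x);
System.out.printf("x: %d r_true: %f r_approx: %f error: %f%%\n", x, r_true, r, 100 * Math.abs(r_true - r) / r);
// Step to x + 1 using dr/dx = x/r evaluated at the midpoint x + 0.5
r = r + (x + 0.5) / r;
}
}
}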
Gives:
x: 0 r_true: 50.000000 r_approx: 50.000000 error: 0.000000%
x: 1 r_true: 50.009999 r_approx: 50.010000 error: 0.000002%
....
x: 29 r_true: 57.801384 r_approx: 57.825065 error: 0.040953%
x: 30 r_true: 58.309519 r_approx: 58.335225 error: 0.044065%
which seems to meet the requirement of 0.1% error, so I didn't bother coding the next one, as it would require quite a bit more calculation steps.
Truncated Series
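The Taylor series for sqrt(1 + x) for x near zero is
$\sqrt{1 + x} = 1 + \tfrac{1}{2}x - \tfrac{1}{8}x^2 + \tfrac{1}{16}x^3 - \tfrac{5}{128}x^4 + \dots = \sum_{n \ge 0} \binom{1/2}{n} x^n$
Using $r = y\sqrt{1 + (x/y)^2}$, the series variable is $(x/y)^2 \le (30/50)^2 = 0.36$, so you're looking for the first term $\binom{1/2}{n}\,0.36^n$ with magnitude less than 0.001. For n = 3 it is about 0.0625 × 0.0467 ≈ 0.0029, and for n = 4 about 0.0391 × 0.0168 ≈ 0.00066, so taking terms up to $x^4$ (in the series variable) should be OK.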
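The truncated series can be evaluated without any sqrt. A minimal Java sketch (hypothetical names) assuming |x| ≤ 0.6|y|, as in the y = 50, |x| ≤ 30 example; for (x/y)^2 near 1 the series converges far too slowly to meet 0.1%. Because the terms after the first alternate in sign and shrink, the error is at most the first omitted term, about 7/256 × 0.36^5 ≈ 0.017% here.

```java
/**
 * r = |y| * (1 + u/2 - u^2/8 + u^3/16 - 5u^4/128), with u = (x/y)^2.
 * Assumes u <= 0.36 (|x| <= 0.6 |y|).
 */
final class SeriesDistance {
    static double approxR(int x, int y) {
        double y2 = (double) y * y;  // constant along a scanline; could be hoisted
        double u = (double) x * x / y2;
        // Horner form of 1 + u/2 - u^2/8 + u^3/16 - 5u^4/128
        double s = 1 + u * (0.5 + u * (-0.125 + u * (0.0625 + u * (-5.0 / 128))));
        return Math.abs(y) * s;
    }

    public static void main(String[] args) {
        for (int x = 0; x <= 30; x += 10) {
            System.out.printf("x: %d approx: %f true: %f%n", x, approxR(x, 50), Math.sqrt(x * x + 2500.0));
        }
    }
}
```

As written this still has a division per point; precomputing 1/y² once per scanline turns it into multiplications only.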
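Y=10000
Y2=Y*Y
for x=0..Y do            // only indices 0..Y are ever looked up below
  D[x]=sqrt(Y2+x*x)
norm(x,y)=               // assumes x, y >= 0 (use |x|, |y| otherwise)
  if (y==0) x
  else if (x>y) norm(y,x)
  else {
    s=Y/y                // scales (x,y) onto the line y=Y
    D[round(x*s)]/s
  }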
If your coordinates are smooth, then the idea can be extended with linear interpolation. For more precision, increase Y.
The idea is that s*(x,y) is on the line y=Y, which you've precomputed distances for. Get the distance, then divide it by s.
I assume you really do need the distance and not its square.
You may also be able to find a general sqrt implementation that sacrifices some accuracy for speed, but I have a hard time imagining that beating what the FPU can do.
By linear interpolation, I mean changing D[round(x*s)] to the following (with x standing for x*s):
f=floor(x)
a=x-f
D[f]*(1-a)+D[f+1]*a
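Putting the table and the interpolation together, here is a Java sketch of the scaled-lookup idea (class and field names are hypothetical; the table is built up to Y + 1 so that D[f + 1] always exists). Since the curvature of sqrt(Y² + i²) is at most 1/Y, linear interpolation between integer entries is very accurate for Y around 1000.

```java
/**
 * Distances precomputed on the line y = Y; any (x, y) is scaled onto that line,
 * looked up with linear interpolation, then scaled back.
 */
final class ScaledNormTable {
    private final int scale;  // the answer's Y
    private final double[] d; // d[i] = sqrt(Y*Y + i*i), i = 0..Y+1

    ScaledNormTable(int scale) {
        this.scale = scale;
        d = new double[scale + 2]; // one spare entry so d[f + 1] exists when f == Y
        double y2 = (double) scale * scale;
        for (int i = 0; i < d.length; i++) d[i] = Math.sqrt(y2 + (double) i * i);
    }

    double norm(double x, double y) {
        double lo = Math.abs(x), hi = Math.abs(y);
        if (lo > hi) { double t = lo; lo = hi; hi = t; } // ensure lo <= hi
        if (hi == 0) return 0;                            // the origin
        double s = scale / hi;     // s * (lo, hi) lies on the line y = Y
        double xi = lo * s;        // 0 <= xi <= Y (up to rounding)
        int f = (int) Math.floor(xi);
        double a = xi - f;         // interpolation weight
        return (d[f] * (1 - a) + d[f + 1] * a) / s;
    }

    public static void main(String[] args) {
        ScaledNormTable t = new ScaledNormTable(1000);
        System.out.println(t.norm(30, 50) + " vs " + Math.hypot(30, 50));
    }
}
```

Each lookup still costs a division, so whether this beats a hardware sqrt is something to measure rather than assume.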
This doesn't really answer your question, but may help...
The first questions I would ask would be:
"do I need the sqrt at all?".
"If not, how can I reduce the number of sqrts?"
then yours: "Can I replace the remaining sqrts with a clever calculation?"
So I'd start with:
Do you need the exact radius, or would radius-squared be acceptable? There are fast approximations to sqrt, but probably not accurate enough for your spec.
Can you process the image using mirrored quadrants or eighths? By processing all pixels at the same radius value in a batch, you can reduce the number of calculations by 8x.
Can you precalculate the radius values? You only need a table that is a quarter (or possibly an eighth) of the size of the image you are processing, and the table would only need to be precalculated once and then re-used for many runs of the algorithm.
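The precalculation and symmetry ideas combine into a table covering one eighth of the plane. A minimal Java sketch, assuming the question's integer range of −150..150 (class and constant names hypothetical): only entries with 0 ≤ b ≤ a are stored, and any (x, y) is mapped there via absolute values and a swap.

```java
/**
 * Radius lookup for integer coordinates in [-MAX, MAX], storing one octant only.
 */
final class RadiusTable {
    static final int MAX = 150; // the question's stated coordinate range
    private final double[][] table = new double[MAX + 1][];

    RadiusTable() {
        for (int a = 0; a <= MAX; a++) {
            table[a] = new double[a + 1]; // triangular row: b = 0..a
            for (int b = 0; b <= a; b++) table[a][b] = Math.sqrt(a * a + b * b);
        }
    }

    /** Mirror into the octant 0 <= b <= a, then look up. */
    double radius(int x, int y) {
        int ax = Math.abs(x), ay = Math.abs(y);
        return ax >= ay ? table[ax][ay] : table[ay][ax];
    }

    public static void main(String[] args) {
        System.out.println(new RadiusTable().radius(-30, 50));
    }
}
```

The table holds about 11,000 doubles and is built once, so it can be reused across many runs of the algorithm as the answer suggests.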
So clever maths may not be the fastest solution.
Well, there's always optimizing your sqrt; the fastest one I've seen is the old fast inverse square root from Quake III (usually credited to Carmack):
http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
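For reference, a Java port of that trick (a sketch; names are hypothetical). Float.floatToRawIntBits and Float.intBitsToFloat stand in for the pointer cast in the C original. With a single Newton step the relative error can reach about 0.18%, which misses the question's 0.1% budget, so this version does two steps. On current hardware Math.sqrt is often compiled to a single instruction, so this may well not be faster.

```java
/**
 * Fast inverse square root (magic constant 0x5f3759df) with two Newton steps.
 * Assumes v > 0.
 */
final class FastSqrt {
    static float invSqrt(float v) {
        float half = 0.5f * v;
        int i = Float.floatToRawIntBits(v);  // reinterpret the float's bits as an int
        i = 0x5f3759df - (i >> 1);           // magic-constant initial guess
        float g = Float.intBitsToFloat(i);
        g = g * (1.5f - half * g * g);       // Newton step 1 (up to about 0.18% error)
        g = g * (1.5f - half * g * g);       // Newton step 2 (well under 0.01% error)
        return g;
    }

    /** sqrt(v) = v * (1 / sqrt(v)). */
    static float fastSqrt(float v) { return v * invSqrt(v); }

    public static void main(String[] args) {
        System.out.println(fastSqrt(50 * 50 + 30 * 30) + " vs " + Math.sqrt(50 * 50 + 30 * 30));
    }
}
```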
That said, since sqrt is non-linear, you're not going to be able to do simple linear interpolation along your line to get your result. The best idea is to use a table lookup since that will give you blazing fast access to the data. And, since you appear to be iterating by whole integers, a table lookup should be exceedingly accurate.
Well, you can mirror around x=0 to start with (you need only compute n>=0, and then duplicate those results to the corresponding n<0). After that, I'd take a look at using the derivative of sqrt(a^2+b^2) (or the corresponding sin) to take advantage of the constant dx.
If that's not accurate enough, may I point out that this is a pretty good job for SIMD, which will provide you with a reciprocal square root op on both SSE and VMX (and shader model 2).
This is sort of related to a HAKMEM item:
ITEM 149 (Minsky): CIRCLE ALGORITHM
Here is an elegant way to draw almost circles on a point-plotting display:
NEW X = OLD X - epsilon * OLD Y
NEW Y = OLD Y + epsilon * NEW(!) X
This makes a very round ellipse centered at the origin with its size determined by the initial point. epsilon determines the angular velocity of the circulating point, and slightly affects the eccentricity. If epsilon is a power of 2, then we don't even need multiplication, let alone square roots, sines, and cosines! The "circle" will be perfectly stable because the points soon become periodic.
The circle algorithm was invented by mistake when I tried to save one register in a display hack! Ben Gurley had an amazing display hack using only about six or seven instructions, and it was a great wonder. But it was basically line-oriented. It occurred to me that it would be exciting to have curves, and I was trying to get a curve display hack with minimal instructions.
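A Java sketch of the HAKMEM update (names are hypothetical). The key detail is that NEW Y uses the already-updated X. In exact arithmetic this map conserves x² − ε·x·y + y², which is why the points stay on a fixed ellipse instead of spiralling; the integer variant with ε = 2^−shift uses only shifts and adds, as the item describes.

```java
/**
 * HAKMEM item 149 (Minsky's circle algorithm).
 */
final class MinskyCircle {
    /** Floating-point version; returns the visited points, starting point first. */
    static double[][] trace(double x, double y, double eps, int steps) {
        double[][] pts = new double[steps + 1][];
        pts[0] = new double[] { x, y };
        for (int i = 1; i <= steps; i++) {
            x = x - eps * y; // NEW X = OLD X - epsilon * OLD Y
            y = y + eps * x; // NEW Y = OLD Y + epsilon * NEW(!) X
            pts[i] = new double[] { x, y };
        }
        return pts;
    }

    /** One integer step with epsilon = 2^-shift: shifts and adds only. */
    static int[] stepInt(int x, int y, int shift) {
        x = x - (y >> shift);
        y = y + (x >> shift);
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        double[][] p = trace(100, 0, 1.0 / 16, 25);
        System.out.println(p[25][0] + ", " + p[25][1]); // roughly a quarter turn
    }
}
```

Connecting this to the question is loose: it walks points around a circle without square roots, rather than computing the radius of points along a line.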