SQL Distance Query without Trigonometry - sql

I have an SQLite database, which does not support trig functions. I would like to sort a set of lat,lng pairs in my table by distance as compared to a second lat,lng pair. I'm familiar with the standard haversine distance formula for sorting lat,lng pairs by distance.
In this case I don't particularly care about precision: my points are separated by large distances, so I don't mind rounding off the distances by treating curves as straight lines.
My question: is there a generally accepted formula for this kind of query? Remember, no trig functions!

If your points are within reasonable distance of each other (i.e. not across half the world, and not across the date line), you can make a correction for the difference between latitude and longitude (as a longitude degree is shorter, except at the Equator), and then just calculate the distance as if the earth was flat.
As you just want to sort the values, you don't even have to use the square root, you can just add the squares of the differences.
Example, where @lat and @lng are your current position, and 2 is the difference correction:
select *
from Points
order by (lat - @lat) * (lat - @lat) + ((lng - @lng) * 2) * ((lng - @lng) * 2)
You can calculate the difference correction for a specific latitude as 1 / cos(lat).
Cees Timmerman came up with this formula which also works across the date line:
pow(lat-lat2, 2) + pow(2 * min(abs(lon-lon2), 360 - abs(lon-lon2)), 2)
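The wrap-around term in that formula can be checked in isolation. Here is a minimal Python sketch of the same `min(abs(d), 360 - abs(d))` idea; the function name `wrapped_lon_diff` is made up for illustration and is not part of the original answer:

```python
def wrapped_lon_diff(lon1, lon2):
    """Smallest absolute difference between two longitudes, in degrees."""
    d = abs(lon1 - lon2)
    # Going the other way around the globe may be shorter.
    return min(d, 360 - d)


# Two points on either side of the date line are close together.
print(wrapped_lon_diff(179.0, -179.0))
# Ordinary case: no wrap-around needed.
print(wrapped_lon_diff(10.0, 25.0))
```

Without the `min(...)`, the first pair would be treated as 358 degrees apart instead of 2.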

If you want proper spatial data in your model then use SpatiaLite, a spatially-enabled version of SQLite:
http://www.gaia-gis.it/spatialite/
It is to SQLite what PostGIS is to PostgreSQL. All your SQLite functionality will work perfectly and unchanged, and you'll get spatial functions too.

You could always truncate the Taylor series expansion of sine and use the fact that sin^2(x)+cos^2(x)=1 to get the approximation of cosine. The only tricky part would be using Taylor's theorem to estimate the number of terms that you'd need for a given amount of precision.
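As a rough illustration of the truncation idea, here is a Python sketch that approximates cosine with arithmetic only. It is a variation on the suggestion above: it truncates the cosine series directly instead of going through sine and the identity, and the choice of four terms is an assumption made for this example. For |x| up to π/2 the first omitted term, x^8/8!, is below 0.001, which bounds the error of this alternating series.

```python
import math


def cos_taylor(x):
    """Approximate cos(x), x in radians, using terms up to x^6."""
    x2 = x * x
    # 1 - x^2/2! + x^4/4! - x^6/6!, written in nested (Horner) form
    return 1 - x2 / 2 * (1 - x2 / 12 * (1 - x2 / 30))


# Compare against the library cosine over the range of valid latitudes.
worst = max(
    abs(cos_taylor(math.radians(deg)) - math.cos(math.radians(deg)))
    for deg in range(-90, 91)
)
print(f"largest error for latitudes in [-90, 90]: {worst:.6f}")
```

Because the expression uses only multiplication, division and subtraction, the same polynomial could be written inline in an SQL query.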

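A degree of longitude is shorter than a degree of latitude, so the longitude difference should be scaled down rather than up. Changing "*" to "/" works for me:
select *
from Points
order by (lat - @lat) * (lat - @lat) + ((lng - @lng) / 2) * ((lng - @lng) / 2)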
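Since SQLite has no `cos()`, one option is to compute the correction factor in the host language and pass it in as a query parameter. The sketch below assumes a `Points` table with `id`, `lat` and `lng` columns and uses Python's `sqlite3` module; the sample rows are invented, and the longitude difference is scaled by cos(latitude), which is the same as dividing by the 1 / cos(lat) correction.

```python
import math
import sqlite3


def nearest_points(conn, lat0, lng0, limit=10):
    """Return rows ordered by approximate flat-earth distance from (lat0, lng0)."""
    # cos() is evaluated in Python, so SQLite never needs a trig function.
    k = math.cos(math.radians(lat0))
    sql = """
        SELECT id, lat, lng
        FROM Points
        ORDER BY (lat - :lat) * (lat - :lat)
               + ((lng - :lng) * :k) * ((lng - :lng) * :k)
        LIMIT :limit
    """
    params = {"lat": lat0, "lng": lng0, "k": k, "limit": limit}
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Points (id INTEGER PRIMARY KEY, lat REAL, lng REAL)")
conn.executemany(
    "INSERT INTO Points VALUES (?, ?, ?)",
    [
        (1, 60.0, 10.0),  # the query position itself
        (2, 60.0, 11.5),  # 1.5 degrees east
        (3, 61.0, 10.0),  # 1 degree north
    ],
)
rows = nearest_points(conn, 60.0, 10.0)
print([row[0] for row in rows])
```

At 60 degrees latitude a degree of longitude is about half as long as a degree of latitude, so the point 1.5 degrees east sorts ahead of the point 1 degree north. With no correction the order of those two would be reversed.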

Related

How to get longitude and latitude for a place within certain distance from a known longitude and latitude

I want to find a place (longitude and latitude) with distance less than 10 km from a known longitude and latitude using BigQuery SQL. Is there any possible query for this?
I read your request as saying that, given a geospatial point, you wish to query for anything within a 10 km radius of that point. Here are two ways to solve this:
Using ST_BUFFER
You could use the ST_BUFFER function, which takes the radius to use around a point and approximates the circle with a segmented polygon, using 8 segments per quarter circle by default.
SELECT *
FROM `table`
WHERE ST_CONTAINS(
  ST_BUFFER(
    ST_GEOGPOINT(longitude, latitude),
    10 * 1000), -- Radius argument is expressed in meters
  YourGeoPointColumn)
Using ST_BUFFERWITHTOLERANCE
Alternatively, you might use ST_BUFFERWITHTOLERANCE, which takes a tolerance instead of a number of circle segments.
SELECT *
FROM `table`
WHERE ST_CONTAINS(
  ST_BUFFERWITHTOLERANCE(
    ST_GEOGPOINT(longitude, latitude),
    10 * 1000, -- Radius argument is expressed in meters
    1), -- Tolerance in meters: how far the polygon may deviate from the ideal circle
  YourGeoPointColumn)
The ST_Distance function should work here, like this:
with data as (
select 1 id, st_geogpoint(-122, 47) as geo
union all
select 2 id, st_geogpoint(-121, 47) as geo
)
select * from data
where st_distance(geo, st_geogpoint(-122.1, 47)) < 10000
id geo
------------------
1 POINT(-122 47)
Another way to write the distance condition is
ST_DWithin(geo, st_geogpoint(-122.1, 47), 10000)
If something does not work, please provide sample data and what data you expect in the results but is missing.

Get Distance in Meters instead of degrees in Spatialite

I have the following query:
select distance(GeomFromText('POINT(8 49)',4326),GeomFromText('LINESTRING(8.329969 49.919323,8.330181 49.919468)',4326))
This gives me 0.97 degrees, but I need the distance in meters and do not know which SRID to transform to.
Can somebody give me an example of how to get the result in meters in Spatialite?
The positions are all in Europe.
Just multiply the value in degrees by 111195 - this value is (Earth mean radius)*PI/180 - that is 'mean length of one great circle degree in meters on Earth's surface'.
The result obtained using this method is within 1% of the geodesic distance for the WGS84 ellipsoid.
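The constant can be reproduced with a few lines of arithmetic. This Python sketch assumes a mean Earth radius of 6371008.8 m (a commonly quoted value, not one given in the answer); the helper name is made up for illustration.

```python
import math

EARTH_MEAN_RADIUS_M = 6371008.8  # assumed mean Earth radius, in meters

# Length of one degree of a great-circle arc: R * pi / 180
METERS_PER_DEGREE = EARTH_MEAN_RADIUS_M * math.pi / 180


def arc_degrees_to_meters(degrees):
    """Convert a great-circle arc measured in degrees to meters."""
    return degrees * METERS_PER_DEGREE


print(round(METERS_PER_DEGREE))
print(round(arc_degrees_to_meters(0.5)))
```

Note that this converts an arc along a great circle. It is not a correction for a planar distance computed on raw longitude/latitude values.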
EDIT
OK, my answer above still stands for the question: "how to convert arcs in degrees into lengths in meters", however, it's not the question you asked (should have asked).
I haven't used Spatialite professionally, so I assumed that your sample query indeed returns the 'length in degrees'. That's not true.
Unfortunately, it appears that Spatialite fails to calculate the distance in the 'geographic sense'. Although your geometries are defined with SRID 4326, it treats them as if they were on a plane.
Here's a simple proof:
select Distance(GeomFromText('POINT(0 0)',4326),GeomFromText('POINT(3 4)',4326));
returns 5.0.
It's a shame ...
Let's have a look at your original query:
select Distance(
GeomFromText('POINT(8 49)',4326),
GeomFromText('LINESTRING(8.329969 49.919323,8.330181 49.919468)',4326)
)
An equivalent query in MS SQL Server:
SELECT (geography::STGeomFromText('POINT(8 49)', 4326)).STDistance(geography::STGeomFromText('LINESTRING(8.329969 49.919323,8.330181 49.919468)', 4326));
gets you the correct result immediately: 105006.59673084648, in meters, and without any extra brouhaha.
So what are your options with Spatialite?
Indeed, as you said in comments, one option is to project your geometries, and calculate on those. Using SRID 3035 for Europe makes sense, too (if your locations are mostly in Germany, I'd consider SRID 25832).
select Distance(
Transform(GeomFromText('POINT(8 49)',4326),25832),
Transform(GeomFromText('LINESTRING(8.329969 49.919323,8.330181 49.919468)',4326),25832)
)
returns 104969.401605453.
As to your other sample (in comments):
select distance(
Transform(GeomFromText('POINT(8.328957 49.920900)',4326),3035),
Transform(GeomFromText('POINT(8.339665 49.918000)',4326),3035)
)
There's a simpler way to do it (if you have two POINTs, not a POINT and a LINESTRING): create a LINESTRING with your POINTs and use GeodesicLength function, like this:
select GeodesicLength(GeomFromText('LINESTRING(8.328957 49.920900, 8.339665 49.918000)',4326))
It returns 833.910006698673, as expected.
In SpatiaLite's functions reference guide, you can see there are two versions of the Distance() function. One takes only two arguments and returns the distance in CRS units; the other takes three arguments and returns the distance in meters.
To get the distance in meters, simply pass a third argument to Distance:
sqlite> select Distance(MakePoint(0, 0), MakePoint(3, 4));
5.0
sqlite> select Distance(MakePoint(0, 0), MakePoint(3, 4), 1);
554058.923752633

Distance between two coordinates, how can I simplify this and/or use a different technique?

I need to write a query which allows me to find all locations within a range (Miles) from a provided location.
The table is like this:
id | name | lat | lng
So I have been doing research and found this MySQL presentation.
I have tested it on a table with around 100 rows and will have plenty more! - Must be scalable.
I tried something more simple like this first:
-- just some test data; in production this would come from user input
set @orig_lat=55.857807; set @orig_lng=-4.242511; set @dist=10;
SELECT *, 3956 * 2 * ASIN(
SQRT( POWER(SIN((orig.lat - abs(dest.lat)) * pi()/180 / 2), 2)
+ COS(orig.lat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
* POWER(SIN((orig.lng - dest.lng) * pi()/180 / 2), 2) ))
AS distance
FROM locations dest, locations orig
WHERE orig.id = '1'
HAVING distance < 1
ORDER BY distance;
This returned rows in around 50ms which is pretty good!
However this would slow down dramatically as the rows increase.
EXPLAIN shows it's only using the PRIMARY key which is obvious.
Then after reading the article linked above. I tried something like this:
-- Defining variables; once this is made into a stored procedure,
-- the values will be supplied by a SELECT query.
set @mylon = -4.242511;
set @mylat = 55.857807;
set @dist = 0.5;
-- calculate lon and lat for the rectangle:
set @lon1 = @mylon-@dist/abs(cos(radians(@mylat))*69);
set @lon2 = @mylon+@dist/abs(cos(radians(@mylat))*69);
set @lat1 = @mylat-(@dist/69);
set @lat2 = @mylat+(@dist/69);
-- run the query:
SELECT *, 3956 * 2 * ASIN(
  SQRT( POWER(SIN((@mylat - abs(dest.lat)) * pi()/180 / 2), 2)
  + COS(@mylat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
  * POWER(SIN((@mylon - dest.lng) * pi()/180 / 2), 2) ))
  AS distance
FROM locations dest
WHERE dest.lng BETWEEN @lon1 AND @lon2
AND dest.lat BETWEEN @lat1 AND @lat2
HAVING distance < @dist
ORDER BY distance;
This query takes around 240ms, which is not too bad but is slower than the last one. I can imagine that with a much higher number of rows it would work out faster. However, an EXPLAIN shows the possible keys as lat,lng or PRIMARY, and the key used as PRIMARY.
How can I do this better???
I know I could store the lat lng as a POINT(); but I also haven't found too much documentation on this which shows if it's faster or accurate?
Any other ideas would be happily accepted!
Thanks very much!
-Stefan
UPDATE:
As Jonathan Leffler pointed out I had made a few mistakes which I hadn't noticed:
I had only put abs() on one of the lat values. I was also using an id search in the WHERE clause of the second query, where there was no need. The first query was purely experimental; the second one is more likely to hit production.
After these changes, EXPLAIN shows that the query now uses the key on the lng column, and the average response time is around 180ms, which is an improvement.
Any other ideas would be happily accepted!
If you want speed (and simplicity) you'll want some decent geospatial support from your database. This introduces geospatial datatypes, geospatial indexes and (a lot of) functions for processing / building / analyzing geospatial data.
MySQL implements a part of the OpenGIS specifications, although the last time I checked it was very rough around the edges and premature (not useful for any real work).
PostGIS on PostgreSQL would make this trivially easy and readable:
(this finds all points from tableb which are closer than 1000 meters to point a in tablea with id 123)
select
myvalue
from
tablea, tableb
where
st_dwithin(tablea.the_geom, tableb.the_geom, 1000)
and
tablea.id = 123
The first query ignores the parameters you set - using 1 instead of @dist for the distance, and using the table alias orig instead of the parameters @orig_lat and @orig_lng.
You then have the query doing a Cartesian product between the table and itself, which is seldom a good idea if you can avoid it. You get away with it because of the filter condition orig.id = 1, which means that there's only one row from orig joined with each of the rows in dest (including the point with dest.id = 1; you should probably have a condition AND orig.id != dest.id). You also have a HAVING clause but no GROUP BY clause, which is indicative of problems. The HAVING clause is not relating any aggregates, but a HAVING clause is (primarily) for comparing aggregate values.
Unless my memory is failing me, COS(ABS(x)) === COS(x), so you might be able to simplify things by dropping the ABS(). Failing that, it is not clear why one latitude needs the ABS and the other does not - symmetry is crucial in matters of spherical trigonometry.
You have a dose of the magic numbers - the value 69 is presumably the number of miles in a degree (of longitude, at the equator), and 3956 is the radius of the earth in miles.
I'm suspicious of the box calculated if the given position is close to a pole. In the extreme case, you might need to allow any longitude at all.
The condition dest.id = 1 in the second query is odd; I believe it should be omitted, but its presence should speed things up, because only one row matches that condition. So the extra time taken is puzzling. But using the primary key index is appropriate as written.
You should move the condition in the HAVING clause into the WHERE clause.
But I'm not sure this is really helping...
The NGS Online Inverse Geodesic Calculator is the traditional reference means to calculate the distance between any two locations on the earth ellipsoid:
http://www.ngs.noaa.gov/cgi-bin/Inv_Fwd/inverse2.prl
But the above calculator is still problematic. Especially between two near-antipodal locations, the computed distance can show an error of some tens of kilometres. The origin of the numeric trouble was identified a long time ago by Thaddeus Vincenty (page 92):
http://www.ngs.noaa.gov/PUBS_LIB/inverse.pdf
In any case, it is preferable to use the reliable and very accurate online calculator by Charles Karney:
http://geographiclib.sourceforge.net/cgi-bin/Geod
Some thoughts on improving performance. It wouldn't simplify things from a maintainability standpoint (makes things more complex), but it could help with scalability.
Since you know the radius, you can add conditions for the bounding box, which may allow the db to optimize the query to eliminate some rows without having to do the trig calcs.
You could pre-calculate some of the trig values of the lat/lon of stored locations and store them in the table. This would shift some of the performance cost when inserting the record, but if queries outnumber inserts, this would be good. See this answer for an idea of this approach:
Query to get records based on Radius in SQLite?
You could look at something like geohashing.
When used in a database, the structure of geohashed data has two advantages. ... Second, this index structure can be used for a quick-and-dirty proximity search - the closest points are often among the closest geohashes.
You could search SO for some ideas on how to implement:
https://stackoverflow.com/search?q=geohash
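To make the geohash idea concrete, here is a minimal Python sketch of the standard encoding: longitude and latitude bits are interleaved by repeatedly halving the coordinate ranges, and every five bits become one base-32 character. The function name and the default precision are choices made for this example; a production system would more likely use an existing library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lng, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits = bits * 2 + 1
                lng_lo = mid
            else:
                bits = bits * 2
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = bits * 2 + 1
                lat_lo = mid
            else:
                bits = bits * 2
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 3))
```

If the hash is stored in an indexed text column, a prefix match such as `LIKE 'u4p%'` gives the quick-and-dirty candidate set, which can then be filtered with an exact distance calculation. Points that are close together but fall on opposite sides of a cell boundary will not share a prefix, so neighbouring cells usually need to be checked too.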
If you're only interested in rather small distances, you can approximate the geographical grid by a rectangular grid.
SELECT *, SQRT(POWER(RADIANS(@mylat - dest.lat), 2) +
  POWER(RADIANS(@mylon - dest.lng)*COS(RADIANS(@mylat)), 2)
  )*@radiusOfEarth AS approximateDistance
…
You could make this even more efficient by storing radians instead of (or in addition to) degrees in your database. If your queries may cross the 180° meridian, some extra care would be necessary there, but many applications don't have to deal with those locations. You could also try to change POWER(x, 2) to x*x, which might get computed faster.
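To get a feel for how good the rectangular-grid approximation is at short range, the following Python sketch compares it with the haversine formula. The origin is the test position from the question; the second point is a made-up nearby location, and 3956 miles is the Earth radius used in the question's query.

```python
import math

EARTH_RADIUS_MILES = 3956  # same value as in the question's query


def approx_distance(lat1, lng1, lat2, lng2, radius=EARTH_RADIUS_MILES):
    """Flat-grid approximation: scale the longitude difference by cos(lat1)."""
    dlat = math.radians(lat1 - lat2)
    dlng = math.radians(lng1 - lng2) * math.cos(math.radians(lat1))
    return math.sqrt(dlat * dlat + dlng * dlng) * radius


def haversine_distance(lat1, lng1, lat2, lng2, radius=EARTH_RADIUS_MILES):
    """Great-circle distance, used here only as a reference value."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


origin = (55.857807, -4.242511)  # test position from the question
nearby = (55.95, -4.10)          # hypothetical second point a few miles away
approx = approx_distance(*origin, *nearby)
exact = haversine_distance(*origin, *nearby)
print(f"approx={approx:.3f} miles, haversine={exact:.3f} miles")
```

For points this close together the two values agree to well under one percent; the gap grows with distance and towards the poles.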

Optimizing Sqlite query for INDEX

I have a table of 320000 rows which contains lat/lon coordinate points. When a user selects a location my program gets the coordinates from the selected location and executes a query which brings all the points from the table that are near. This is done by calculating the distance between the selected point and each coordinate point from my table row. This is the query I use:
select street from locations
where ( ( (lat - (-34.594804)) *(lat - (-34.594804)) ) + ((lon - (-58.377676 ))*(lon - (-58.377676 ))) <= ((0.00124)*(0.00124)))
group by street;
As you can see the WHERE clause is a simple Pythagoras formula to calculate the distance between two points.
Now my problem is that I cannot get an INDEX to be used. I've tried with
CREATE INDEX indx ON locations(lat,lon)
and also with
CREATE INDEX indx ON locations(street,lat,lon)
with no luck. I've noticed that when there is a math operation on lat or lon, the index is not used. Is there any way I can rewrite this query so that it uses an INDEX and runs faster?
Thanks in advance!
The problem is that the SQL engine needs to evaluate the expression for every record to do the comparison (WHERE ... <= ...) and filter the points, so the indexes don't speed up the query.
One approach to solving the problem is to compute a minimum and maximum latitude and longitude to restrict the number of records.
Here is a good link to follow: Finding Points Within a Distance of a Latitude/Longitude
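A minimal sketch of that min/max approach, using Python's `sqlite3` module: the plain `BETWEEN` comparisons on `lat` and `lon` can use an index, and the original circle test is then applied only to the rows that survive the box filter. The table layout follows the question; the sample rows and street names are invented for illustration.

```python
import sqlite3


def streets_near(conn, lat0, lon0, r):
    """Return streets having a point within r degrees of (lat0, lon0)."""
    sql = """
        SELECT DISTINCT street
        FROM locations
        WHERE lat BETWEEN :lat_min AND :lat_max   -- bare columns: indexable
          AND lon BETWEEN :lon_min AND :lon_max
          AND (lat - :lat0) * (lat - :lat0)
            + (lon - :lon0) * (lon - :lon0) <= :r * :r   -- exact circle test
        ORDER BY street
    """
    params = {
        "lat0": lat0, "lon0": lon0, "r": r,
        "lat_min": lat0 - r, "lat_max": lat0 + r,
        "lon_min": lon0 - r, "lon_max": lon0 + r,
    }
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locations (id INTEGER PRIMARY KEY, street TEXT, lat REAL, lon REAL)"
)
conn.execute("CREATE INDEX indx ON locations(lat, lon)")
conn.executemany(
    "INSERT INTO locations (street, lat, lon) VALUES (?, ?, ?)",
    [
        ("A St", -34.5948, -58.3777),  # very close to the query point
        ("B St", -34.5957, -58.3786),  # inside the box, outside the circle
        ("C St", -34.6000, -58.3800),  # outside the box entirely
    ],
)
result = streets_near(conn, -34.594804, -58.377676, 0.00124)
print(result)
```

Running `EXPLAIN QUERY PLAN` on the statement is a quick way to confirm whether SQLite actually picks the index for a given table and data set.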
Did you try adjusting the page size? A table like this might gain from having a different (i.e. the largest?) available page size.
PRAGMA page_size = 32768;
Or any power of 2 between 512 and 32768. If you change the page_size, don't forget to vacuum the database (assuming you are using SQLite 3.5.8 or later; otherwise you can't change it and will need to start a fresh database).
Also, running the operation on floats might not be as fast as running it on integers (big maybe), so that you might gain speed if you record all your coordinates times 1 000 000.
Finally, Euclidean distance will not yield very accurate proximity results. The further you get from the equator, the more the circle around your point will flatten to resemble an ellipse. There are fast approximations which are not as calculation-intensive as a Great Circle Distance calculation (avoid at all cost!).
You should search in a square instead of a circle. Then you will be able to optimize.
Surely you have a primary key in locations? Probably called id?
Why not just select the id along with the street?
select id, street from locations
where ( ( (lat - (-34.594804)) *(lat - (-34.594804)) ) + ((lon - (-58.377676 ))*(lon - (-58.377676 ))) <= ((0.00124)*(0.00124)))
group by street;

SQL query to calculate coordinate proximity

I'm using this formula to calculate the distance between entries in my (My)SQL database which have latitude and longitude fields in decimal format:
6371 * ACOS(SIN(RADIANS( %lat1% )) * SIN(RADIANS( %lat2% )) +
COS(RADIANS( %lat1% )) * COS(RADIANS( %lat2% )) * COS(RADIANS( %lon2% ) -
RADIANS( %lon1% )))
Substituting %lat1% and %lat2% appropriately it can be used in the WHERE clause to find entries within a certain radius of another entry, using it in the ORDER BY clause together with LIMIT will find the nearest x entries etc.
I'm writing this mostly as a note for myself, but improvements are always welcome. :)
Note: As mentioned by Valerion below, this calculates in kilometers. Substitute 6371 by an appropriate alternative number to use meters, miles etc.
For databases (such as SQLite) that don't support trigonometric functions you can use the Pythagorean theorem.
This is a faster method, even if your database does support trigonometric functions, with the following caveats:
you need to store coords in x,y grid instead of (or as well as) lat,lng;
the calculation assumes 'flat earth', but this is fine for relatively local searches.
Here's an example from a Rails project I'm working on (the important bit is the SQL in the middle):
class User < ActiveRecord::Base
  ...
  # has integer x & y coordinates
  ...
  # Returns array of {:user => <User>, :distance => <distance>}, sorted by distance (in metres).
  # Distance is rounded to nearest integer.
  # point is a Geo::LatLng.
  # radius is in metres.
  # limit specifies the maximum number of records to return (default 100).
  def self.find_within_radius(point, radius, limit = 100)
    sql = <<-SQL
      select id, lat, lng, (#{point.x} - x) * (#{point.x} - x) + (#{point.y} - y) * (#{point.y} - y) d
      from users where #{(radius ** 2)} >= d
      order by d limit #{limit}
    SQL
    users = User.find_by_sql(sql)
    users.each {|user| user.d = Math.sqrt(user.d.to_f).round}
    return users
  end
end
Am I right in thinking this is the Haversine formula?
I use the exact same method on a vehicle-tracking application and have done for years. It works perfectly well. A quick check of some old code shows that I multiply the result by 6378137 which if memory serves converts to meters, but I haven't touched it for a very long time.
I believe SQL Server 2008 has a new spatial datatype that I imagine allows these kinds of comparisons without knowing this formula, and also allows spatial indexes which might be interesting, but I've not looked into it.
I have been using this, though I forget where I got it.
SELECT n, SQRT(POW((69.1 * (n.field_geofield_lat - :lat)) , 2 ) + POW((53 * (n.field_geofield_lon - :lon)), 2)) AS distance FROM field_revision_field_geofield n ORDER BY distance ASC
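The two constants in that query look like miles per degree: 69.1 for latitude, and 53 for longitude at a latitude of roughly 40 degrees, since 69.1 × cos(40°) is about 53. That reading is an inference, not something the answer states. A small Python sketch for deriving the longitude factor at other latitudes (the function name is made up for illustration):

```python
import math

MILES_PER_DEGREE_LAT = 69.1  # constant taken from the query above


def miles_per_degree_lon(lat_deg):
    """Approximate length of one degree of longitude at the given latitude."""
    return MILES_PER_DEGREE_LAT * math.cos(math.radians(lat_deg))


for lat in (0, 40, 55.857807):
    print(lat, round(miles_per_degree_lon(lat), 1))
```

If your data sits at a very different latitude, replacing 53 with the value for that latitude should keep the east-west component of the distance from being over- or under-weighted.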