Should I create an index on the columns if their values are used in functions ? (SQLite) - optimization

I am working with a huge database and trying to optimize it.
I was wondering whether it makes any difference to index columns whose values are used as criteria in a query, but only through a function.
For example I have this GPS coordinate table :
-Node (#id,lat,lng)
and this request :
SELECT * FROM Node WHERE distance( lat, lng, $lat, $lng ) < $threshold
Would creating an index on lat and lng speed this up? (I'm working with SQLite.)
Thanks
Edit: I have the same question for the case where I write the calculation inline, like:
SELECT * FROM Node WHERE (lat-$lat)*(lat-$lat) + (lng-$lng)*(lng-$lng) < $threshold

For queries, you would absolutely see a performance benefit.
But with a very large database you will also take a performance hit on insertions, because every index has to be updated.

The database will need to calculate the distance for each node in your example and will not benefit from an index. If you however index the lng and lat columns and use these to first eliminate all nodes that either have abs(lat - $lat) > $threshold or abs(lng - $lng) > $threshold you could see increased performance since the database can use the created index to eliminate a number of records before calculating the distance for the remaining records.
The query would look something like this:
SELECT * FROM Node
WHERE lat >= $lat - $threshold
AND lat <= $lat + $threshold
AND lng >= $lng - $threshold
AND lng <= $lng + $threshold
AND distance( lat, lng, $lat, $lng ) < $threshold;
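To make the bounding-box idea concrete, here is a minimal sketch using Python's sqlite3 module. It assumes a planar distance() (SQLite has no built-in one, so it is registered as an application-defined function), and the index name idx_node_lat_lng and the grid test data are made up for illustration. The helper query_plan lets you check on your own build whether the index is picked up.

```python
import math
import sqlite3

def build_demo_db():
    """Create an in-memory Node table with a composite index on (lat, lng)."""
    conn = sqlite3.connect(":memory:")
    # Assumption: distance() is a plain planar distance in coordinate units.
    conn.create_function(
        "distance", 4,
        lambda lat1, lng1, lat2, lng2: math.hypot(lat1 - lat2, lng1 - lng2),
    )
    conn.execute("CREATE TABLE Node (id INTEGER PRIMARY KEY, lat REAL, lng REAL)")
    conn.execute("CREATE INDEX idx_node_lat_lng ON Node (lat, lng)")
    # Hypothetical data: a 21 x 21 grid from (0, 0) to (10, 10) in steps of 0.5.
    rows = [(i * 0.5, j * 0.5) for i in range(21) for j in range(21)]
    conn.executemany("INSERT INTO Node (lat, lng) VALUES (?, ?)", rows)
    return conn

# Function only: every row must be evaluated.
FULL_SCAN = """
SELECT id FROM Node
WHERE distance(lat, lng, :lat, :lng) < :t
ORDER BY id
"""

# Range conditions first, so the index can discard rows before distance() runs.
PREFILTERED = """
SELECT id FROM Node
WHERE lat BETWEEN :lat - :t AND :lat + :t
  AND lng BETWEEN :lng - :t AND :lng + :t
  AND distance(lat, lng, :lat, :lng) < :t
ORDER BY id
"""

def nearby_ids(conn, sql, lat, lng, threshold):
    """Run one of the queries above and return the matching ids."""
    params = {"lat": lat, "lng": lng, "t": threshold}
    return [row[0] for row in conn.execute(sql, params)]

def query_plan(conn, sql, lat, lng, threshold):
    """Return the detail column of EXPLAIN QUERY PLAN for inspection."""
    params = {"lat": lat, "lng": lng, "t": threshold}
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
```

Both queries return the same rows; only the second gives the planner a range condition it can match against the index.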

Related

Data retrieving by Latitude longitude matching from both tables in mysql

I have two tables, A and B.
Table A has the columns hotelcode_id, latitude, longitude.
Table B has the columns latitude, longitude.
I need to retrieve the hotelcode_id for rows where both the latitude and the longitude match between the two tables.
I wrote the following query, but its performance is poor:
SELECT a.hotelcode_id, a.latitude,b.latitude,b.longitude,b.longitude
FROM A
JOIN B
ON a.latitude like concat ('%', b.latitude, '%') AND a.longitude like concat ('%', b.longitude, '%')
I also wrote another query, but it does not return accurate data.
The query above runs for a very long time and I still cannot retrieve the data.
NOTE:
Table A has 150k records.
Table B has 250k records.
The latitude and longitude columns are DECIMAL(10,6) in both tables.
I have already done the following, but query performance is still a problem:
created indexes and checked them with EXPLAIN
hash-partitioned the tables
I think the leading wildcard prevents the index from being used.
LIKE queries in general also perform very poorly in MySQL.
Is there another solution that avoids the wildcard and LIKE problems in the SELECT query?
If you are sure that the numeric values of the LAT/LON pairs are equal across the two tables, the simple approach would be
SELECT a.hotelcode_id, a.latitude, a.longitude, b.latitude, b.longitude
FROM A JOIN B
ON a.latitude = b.latitude
AND a.longitude = b.longitude
If there is some inaccuracy in the data, you may want to define the maximum deviation (here 0.001 degrees, i.e. 3.6 arc seconds) which you would regard as the "same place", e.g.
SELECT a.hotelcode_id, a.latitude, a.longitude, b.latitude, b.longitude
FROM A JOIN B
ON ABS(a.latitude-b.latitude) < 0.001
AND ABS(a.longitude-b.longitude) < 0.001
Mind that in the second case the actual distance (in km) covered by 0.001 degrees of longitude is not the same at every latitude: the higher the latitude, the shorter the distance.
Also review the sizing of the LON and LAT columns. Recall that (usually)
-180 <= LON <= 180
-90 <= LAT <= 90
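One caveat with the ABS() form is that wrapping the columns in a function tends to hide them from an index, for the same reason LIKE '%...%' does. A rough sketch of an alternative is to write the tolerance as a range on B's columns, which at least gives the optimizer a chance to use an index on them. The example below uses SQLite through Python purely as a stand-in for MySQL; the index name idx_b_lat_lng and the sample rows are assumptions for illustration.

```python
import sqlite3

def build_tables():
    """Create small stand-in versions of tables A and B."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE A (hotelcode_id INTEGER, latitude REAL, longitude REAL);
        CREATE TABLE B (latitude REAL, longitude REAL);
        CREATE INDEX idx_b_lat_lng ON B (latitude, longitude);
    """)
    conn.executemany("INSERT INTO A VALUES (?, ?, ?)", [
        (1, 48.858400, 2.294500),
        (2, 51.500700, -0.124600),
        (3, 40.689200, -74.044500),
    ])
    conn.executemany("INSERT INTO B VALUES (?, ?)", [
        (48.858600, 2.294300),    # about 0.0002 degrees from hotel 1
        (51.510700, -0.124600),   # 0.01 degrees from hotel 2: outside tolerance
        (40.689200, -74.044500),  # identical to hotel 3
    ])
    return conn

# Tolerance expressed through ABS(): simple, but not index-friendly.
ABS_JOIN = """
SELECT A.hotelcode_id FROM A JOIN B
  ON ABS(A.latitude - B.latitude) < :tol
 AND ABS(A.longitude - B.longitude) < :tol
ORDER BY A.hotelcode_id
"""

# Same tolerance expressed as ranges on B's bare columns.
RANGE_JOIN = """
SELECT A.hotelcode_id FROM A JOIN B
  ON B.latitude  BETWEEN A.latitude  - :tol AND A.latitude  + :tol
 AND B.longitude BETWEEN A.longitude - :tol AND A.longitude + :tol
ORDER BY A.hotelcode_id
"""

def matches(conn, sql, tol=0.001):
    """Return the hotel ids that have a partner row in B within tol degrees."""
    return [row[0] for row in conn.execute(sql, {"tol": tol})]
```

Note that BETWEEN is inclusive while ABS(...) < tol is strict, so the two forms can differ for rows sitting exactly on the boundary.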

Distance between two coordinates, how can I simplify this and/or use a different technique?

I need to write a query which allows me to find all locations within a range (Miles) from a provided location.
The table is like this:
id | name | lat | lng
So I have been doing research and found this MySQL presentation.
I have tested it on a table with around 100 rows, and there will be plenty more, so it must be scalable.
I tried something more simple like this first:
-- just some test data; this would come from user input
set @orig_lat=55.857807; set @orig_lng=-4.242511; set @dist=10;
SELECT *, 3956 * 2 * ASIN(
SQRT( POWER(SIN((orig.lat - abs(dest.lat)) * pi()/180 / 2), 2)
+ COS(orig.lat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
* POWER(SIN((orig.lng - dest.lng) * pi()/180 / 2), 2) ))
AS distance
FROM locations dest, locations orig
WHERE orig.id = '1'
HAVING distance < 1
ORDER BY distance;
This returned rows in around 50ms which is pretty good!
However this would slow down dramatically as the rows increase.
EXPLAIN shows it's only using the PRIMARY key which is obvious.
Then after reading the article linked above. I tried something like this:
-- Defining variables. When this is made into a stored procedure,
-- the values will come from a SELECT query.
set @mylon = -4.242511;
set @mylat = 55.857807;
set @dist = 0.5;
-- calculate lon and lat for the rectangle:
set @lon1 = @mylon-@dist/abs(cos(radians(@mylat))*69);
set @lon2 = @mylon+@dist/abs(cos(radians(@mylat))*69);
set @lat1 = @mylat-(@dist/69);
set @lat2 = @mylat+(@dist/69);
-- run the query:
SELECT *, 3956 * 2 * ASIN(
SQRT( POWER(SIN((@mylat - abs(dest.lat)) * pi()/180 / 2) ,2)
+ COS(@mylat * pi()/180 ) * COS(abs(dest.lat) * pi()/180)
* POWER(SIN((@mylon - dest.lng) * pi()/180 / 2), 2) ))
AS distance
FROM locations dest
WHERE dest.lng BETWEEN @lon1 AND @lon2
AND dest.lat BETWEEN @lat1 AND @lat2
HAVING distance < @dist
ORDER BY distance;
This query takes around 240ms, which is not too bad but is slower than the last. I imagine that with a much higher number of rows it would work out faster. However, an EXPLAIN shows the possible keys as lat, lng or PRIMARY, and the key used as PRIMARY.
How can I do this better?
I know I could store the lat/lng as a POINT(), but I haven't found much documentation showing whether that is faster or more accurate.
Any other ideas would be happily accepted!
Thanks very much!
-Stefan
UPDATE:
As Jonathan Leffler pointed out I had made a few mistakes which I hadn't noticed:
I had only put abs() on one of the lat values. I was also using an id search in the WHERE clause of the second query when there was no need. The first query was purely experimental; the second one is more likely to hit production.
After these changes EXPLAIN shows the key is now using lng column and average time to respond around 180ms now which is an improvement.
Any other ideas would be happily accepted!
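For checking the SQL against something easier to step through, here is a rough Python sketch of the same two-stage idea: a cheap rectangle test first, then the haversine formula only for the survivors. It reuses the constants 3956 and 69 from the queries above; the function names and the tuple layout of points are assumptions made for this sketch.

```python
import math

EARTH_RADIUS_MILES = 3956.0  # radius used in the queries above
MILES_PER_DEGREE = 69.0      # approximate length of one degree of latitude

def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles; no abs() on either latitude."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))

def bounding_box(lat, lng, dist):
    """Rectangle (lat1, lat2, lng1, lng2) around the point, as in the SQL."""
    dlat = dist / MILES_PER_DEGREE
    dlng = dist / abs(math.cos(math.radians(lat)) * MILES_PER_DEGREE)
    return lat - dlat, lat + dlat, lng - dlng, lng + dlng

def within(origin, points, dist):
    """Return (name, distance) pairs closer than dist, nearest first.

    points is an iterable of (name, lat, lng) tuples.
    """
    lat0, lng0 = origin
    lat1, lat2, lng1, lng2 = bounding_box(lat0, lng0, dist)
    hits = []
    for name, lat, lng in points:
        # Cheap rectangle test; in SQL this is the part an index can serve.
        if not (lat1 <= lat <= lat2 and lng1 <= lng <= lng2):
            continue
        d = haversine_miles(lat0, lng0, lat, lng)
        if d < dist:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])
```

The rectangle is only an approximation of the circle's extent, so it is best suited to small radii away from the poles and the 180° meridian.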
If you want speed (and simplicity) you'll want some decent geospatial support from your database. This introduces geospatial datatypes, geospatial indexes and (a lot of) functions for processing / building / analyzing geospatial data.
MySQL implements part of the OpenGIS specifications, although the last time I checked it was very rough around the edges and immature (not useful for any real work).
PostGIS on PostgreSQL would make this trivially easy and readable:
(this finds all points from tableb which are closer than 1000 meters to point a in tablea with id 123)
select
myvalue
from
tablea, tableb
where
st_dwithin(tablea.the_geom, tableb.the_geom, 1000)
and
tablea.id = 123
The first query ignores the parameters you set - using 1 instead of @dist for the distance, and using the table alias orig instead of the parameters @orig_lat and @orig_lng.
You then have the query doing a Cartesian product between the table and itself, which is seldom a good idea if you can avoid it. You get away with it because of the filter condition orig.id = 1, which means that there's only one row from orig joined with each of the rows in dest (including the point with dest.id = 1; you should probably have a condition AND orig.id != dest.id). You also have a HAVING clause but no GROUP BY clause, which is indicative of problems. The HAVING clause is not relating any aggregates, but a HAVING clause is (primarily) for comparing aggregate values.
Unless my memory is failing me, COS(ABS(x)) === COS(x), so you might be able to simplify things by dropping the ABS(). Failing that, it is not clear why one latitude needs the ABS and the other does not - symmetry is crucial in matters of spherical trigonometry.
You have a dose of the magic numbers - the value 69 is presumably number of miles in a degree (of longitude, at the equator), and 3956 is the radius of the earth.
I'm suspicious of the box calculated if the given position is close to a pole. In the extreme case, you might need to allow any longitude at all.
The condition dest.id = 1 in the second query is odd; I believe it should be omitted, but its presence should speed things up, because only one row matches that condition. So the extra time taken is puzzling. But using the primary key index is appropriate as written.
You should move the condition in the HAVING clause into the WHERE clause.
But I'm not sure this is really helping...
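As a small aside on the magic numbers and the pole concern, the sketch below shows where 69 plausibly comes from (the circumference for a 3956-mile radius divided by 360) and one way to widen the longitude window when the search circle reaches a pole. The function names are made up for illustration, and the pole handling is just one simple convention, not the only option.

```python
import math

def miles_per_degree_lat(radius_miles=3956.0):
    """Length of one degree of latitude on a sphere of the given radius."""
    return 2 * math.pi * radius_miles / 360.0

def miles_per_degree_lng(lat_deg, radius_miles=3956.0):
    """Length of one degree of longitude, which shrinks with cos(latitude)."""
    return miles_per_degree_lat(radius_miles) * math.cos(math.radians(lat_deg))

def lng_half_width(lat_deg, dist_miles, radius_miles=3956.0):
    """Half-width in degrees of the longitude window for a bounding box.

    Returns 180.0 (meaning: allow any longitude) when the circle touches
    or crosses a pole.
    """
    dlat = dist_miles / miles_per_degree_lat(radius_miles)
    if abs(lat_deg) + dlat >= 90.0:
        return 180.0
    return min(180.0, dist_miles / miles_per_degree_lng(lat_deg, radius_miles))
```

With the default radius, miles_per_degree_lat() comes out at roughly 69.05, which matches the 69 used in the question.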
The NGS Online Inverse Geodesic Calculator is the traditional reference means to calculate the distance between any two locations on the earth ellipsoid:
http://www.ngs.noaa.gov/cgi-bin/Inv_Fwd/inverse2.prl
But the above calculator is still problematic. Especially between two near-antipodal locations, the computed distance can show an error of some tens of kilometres. The origin of the numeric trouble was identified a long time ago by Thaddeus Vincenty (page 92):
http://www.ngs.noaa.gov/PUBS_LIB/inverse.pdf
In any case, it is preferable to use the reliable and very accurate online calculator by Charles Karney:
http://geographiclib.sourceforge.net/cgi-bin/Geod
Some thoughts on improving performance. It wouldn't simplify things from a maintainability standpoint (makes things more complex), but it could help with scalability.
Since you know the radius, you can add conditions for the bounding box, which may allow the db to optimize the query to eliminate some rows without having to do the trig calcs.
You could pre-calculate some of the trig values of the lat/lon of stored locations and store them in the table. This would shift some of the performance cost when inserting the record, but if queries outnumber inserts, this would be good. See this answer for an idea of this approach:
Query to get records based on Radius in SQLite?
You could look at something like geohashing.
When used in a database, the structure of geohashed data has two advantages. ... Second, this index structure can be used for a quick-and-dirty proximity search - the closest points are often among the closest geohashes.
You could search SO for some ideas on how to implement:
https://stackoverflow.com/search?q=geohash
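To illustrate the pre-calculation idea, here is a minimal sketch that stores the sine and cosine of each latitude and longitude at insert time. With those columns the spherical law of cosines reduces to multiplications and additions, so the query needs no trig functions at all (handy on SQLite builds without them). The column names, the helper names and the exact layout are assumptions for this sketch and are not taken from the linked answer.

```python
import math
import sqlite3

EARTH_RADIUS_MILES = 3956.0

def trig_columns(lat, lng):
    """Return (sin_lat, cos_lat, sin_lng, cos_lng) for a point in degrees."""
    phi, lmb = math.radians(lat), math.radians(lng)
    return math.sin(phi), math.cos(phi), math.sin(lmb), math.cos(lmb)

def build_db(points):
    """points is an iterable of (name, lat, lng); trig values are stored too."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE locations (
        id INTEGER PRIMARY KEY, name TEXT, lat REAL, lng REAL,
        sin_lat REAL, cos_lat REAL, sin_lng REAL, cos_lng REAL)""")
    for name, lat, lng in points:
        conn.execute(
            "INSERT INTO locations"
            " (name, lat, lng, sin_lat, cos_lat, sin_lng, cos_lng)"
            " VALUES (?, ?, ?, ?, ?, ?, ?)",
            (name, lat, lng, *trig_columns(lat, lng)),
        )
    return conn

def names_within(conn, lat, lng, dist_miles):
    """Names of locations within dist_miles, using only stored trig values.

    cos(angle) = sin(a)sin(b) + cos(a)cos(b)cos(dlng), and a smaller
    distance means a larger cosine, so the test is cos(angle) >= cos(d / R).
    """
    s_lat, c_lat, s_lng, c_lng = trig_columns(lat, lng)
    params = {
        "s_lat": s_lat, "c_lat": c_lat, "s_lng": s_lng, "c_lng": c_lng,
        "min_cos": math.cos(dist_miles / EARTH_RADIUS_MILES),
    }
    sql = """
        SELECT name FROM locations
        WHERE sin_lat * :s_lat
            + cos_lat * :c_lat * (cos_lng * :c_lng + sin_lng * :s_lng)
            >= :min_cos
        ORDER BY name
    """
    return [row[0] for row in conn.execute(sql, params)]
```

Comparing cosines loses precision for very small radii (a few metres), so this suits searches on the scale of miles rather than exact near-zero distances.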
If you're only interested in rather small distances, you can approximate the geographical grid by a rectangular grid.
SELECT *, SQRT(POWER(RADIANS(@mylat - dest.lat), 2) +
POWER(RADIANS(@mylon - dest.lng)*COS(RADIANS(@mylat)), 2)
)*@radiusOfEarth AS approximateDistance
…
You could make this even more efficient by storing radians instead of (or in addition to) degrees in your database. If your queries may cross the 180° meridian, some extra care would be necessary there, but many applications don't have to deal with those locations. You could also try to change POWER(x, 2) to x*x, which might get computed faster.
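To get a feel for how good the rectangular-grid approximation is, here is a small Python sketch that computes it next to a haversine reference. The 3956-mile radius is borrowed from the question, and the wrap of the longitude difference into [-180, 180) is one possible way to handle the 180° meridian; both are assumptions of this sketch.

```python
import math

EARTH_RADIUS_MILES = 3956.0

def approximate_distance(lat0, lng0, lat, lng, radius=EARTH_RADIUS_MILES):
    """Flat-grid distance: scale the longitude difference by cos(lat0)."""
    # Wrap the longitude difference so that 179.9 and -179.9 are 0.2 apart.
    dlng = (lng0 - lng + 180.0) % 360.0 - 180.0
    dy = math.radians(lat0 - lat)
    dx = math.radians(dlng) * math.cos(math.radians(lat0))
    return math.hypot(dx, dy) * radius

def haversine_distance(lat0, lng0, lat, lng, radius=EARTH_RADIUS_MILES):
    """Great-circle reference value for comparison."""
    phi0, phi = math.radians(lat0), math.radians(lat)
    a = (math.sin((phi - phi0) / 2) ** 2
         + math.cos(phi0) * math.cos(phi)
         * math.sin(math.radians(lng - lng0) / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))

def relative_error(lat0, lng0, lat, lng):
    """Relative difference between the approximation and the reference."""
    exact = haversine_distance(lat0, lng0, lat, lng)
    return abs(approximate_distance(lat0, lng0, lat, lng) - exact) / exact
```

For points a few tens of miles apart at mid latitudes the relative error stays well under one percent; it grows with distance and towards the poles.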

SQL - Lat/Lng Distance Query - Returns nothing if distance = 0

If my input $latitude and $longitude values happen to match the stored values in the DB exactly, nothing is returned (e.g. when searching for yourself). I assume this is because the distance will be 0.
I'm using this query:
SELECT *,
(((acos(sin((".$latitude."*pi()/180)) * sin((`lat`*pi()/180))
+cos((".$latitude."*pi()/180)) * cos((`lat`*pi()/180))
* cos(((".$longitude."- `lng`)*pi()/180))))*180/pi())*60*1.1515)
AS distance
FROM ...
LEFT JOIN ...
ON ...
WHERE `cat_id` = '$cat_id'
HAVING distance <= $radius
ORDER BY distance ASC
One solution I found was to make the input less accurate, by reducing the decimal of the lat lng values - but that's not really a solution.
How can I alter the query so that the row is still returned if the distance is 0?
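The HAVING clause is what drops the row. You can make the condition part of the WHERE clause instead. Note that MySQL does not allow a SELECT alias such as distance in WHERE, so the full distance expression has to be repeated there (abbreviated here as <distance expression>):
...
WHERE `cat_id` = '$cat_id' AND <distance expression> <= $radius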
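Another likely cause worth ruling out: when the two points are identical, floating-point rounding can push the argument of acos a hair above 1. MySQL's ACOS() returns NULL for arguments outside the range -1 to 1, and a NULL distance never satisfies distance <= $radius. The Python sketch below reproduces the formula from the question and clamps the argument before calling acos; in SQL the equivalent would be wrapping the argument in LEAST(1, ...). The function name is made up for this illustration.

```python
import math

def acos_distance_miles(lat1, lng1, lat2, lng2):
    """Same formula as the query: degrees of arc * 60 nm * 1.1515 miles/nm."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lng1 - lng2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlmb))
    # Clamp to [-1, 1]: rounding can yield e.g. 1.0000000000000002, for
    # which math.acos raises ValueError (and MySQL's ACOS returns NULL).
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle)) * 60 * 1.1515
```

With the clamp in place, a point compared with itself gives a distance of (essentially) zero instead of an error or NULL.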

Does this mySQL "spatial" query work in SQL Server 2008 as well?

Before I embark on a pretty decent overhaul of my web app to use a spatial query, I'd like to know if this MySQL query works in SQL Server 2008:
SELECT id, ( 3959 * acos( cos( radians(37) ) * cos( radians( lat ) ) *
cos( radians( lng ) - radians(-122) ) + sin( radians(37) ) *
sin( radians( lat ) ) ) ) AS distance
FROM markers HAVING distance < 25
ORDER BY distance LIMIT 0 , 20;
Or is there a better way to do this in SQL Server 2008?
My database currently stores that lat/long of businesses near military bases in Japan. However, I'm querying the table to find businesses that contain the specified bases' id.
Biz table
----------------------
PK BizId bigint (auto increment)
Name
Address
Lat
Long
**FK BaseId int (from the MilBase table)**
A spatial query, based on having a center lat/long and given radius (in km) would be a better fit for the app and would open up some new possibilities.
Any help is greatly appreciated!
It looks like you're selecting the distance between two points. In SQL Server 2008, you can use the STDistance method of the geography data type. For SRID 4326, STDistance returns metres, so the 25-mile radius is converted below (1 mile = 1609.344 m). This will look something like this:
SELECT TOP 20
geography::STGeomFromText('POINT(-122.0 37.0)', 4326).STDistance(p) AS distance
FROM markers
WHERE geography::STGeomFromText('POINT(-122.0 37.0)', 4326).STDistance(p) < 25 * 1609.344
ORDER BY geography::STGeomFromText('POINT(-122.0 37.0)', 4326).STDistance(p);
Where p would be a field of type geography instead of two separate decimal fields. You will probably also want to create a spatial index on your p field for better performance.
To use the geography data type, simply specify your field as geography in your CREATE TABLE:
CREATE TABLE markers (
id int IDENTITY (1,1),
p geography,
title varchar(100)
);
Inserting values into your markers table will now look like this (id is omitted because the IDENTITY column generates it):
INSERT INTO markers (p, title)
VALUES (
geography::STGeomFromText('POINT(-122.0 37.0)', 4326),
'My Marker'
);
Where -122.0 is the longitude, and 37.0 is the latitude.
Creating a spatial index would look something like this:
CREATE SPATIAL INDEX ix_sp_markers
ON markers(p)
USING GEOGRAPHY_GRID
WITH ( GRIDS = (HIGH, HIGH, HIGH, HIGH),
CELLS_PER_OBJECT = 2,
PAD_INDEX = ON);
If you are only interested in retrieving points within 25 miles, there is no need to use spherical or great-circle math in the distance calculation. The standard Cartesian distance formula is more than sufficient:
Where Square(Delta-X) + Square(Delta-Y) < 625
Here 625 is 25 squared. All you need to do is convert the difference in latitudes and the difference in longitudes to distances in whatever unit you are using (statute miles, nautical miles, whatever).
If you are using nautical miles, each degree of latitude is 60 nm,
and each degree of longitude is 60 * cos(Latitude) nm.
If both points are within 25 miles of one another, you don't even need to worry about how this factor differs from one point to the other.

SQL query to calculate coordinate proximity

I'm using this formula to calculate the distance between entries in my (My)SQL database which have latitude and longitude fields in decimal format:
6371 * ACOS(SIN(RADIANS( %lat1% )) * SIN(RADIANS( %lat2% )) +
COS(RADIANS( %lat1% )) * COS(RADIANS( %lat2% )) * COS(RADIANS( %lon2% ) -
RADIANS( %lon1% )))
Substituting %lat1% and %lat2% appropriately it can be used in the WHERE clause to find entries within a certain radius of another entry, using it in the ORDER BY clause together with LIMIT will find the nearest x entries etc.
I'm writing this mostly as a note for myself, but improvements are always welcome. :)
Note: As mentioned by Valerion below, this calculates in kilometers. Substitute 6371 by an appropriate alternative number to use meters, miles etc.
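As a hedged illustration of the "nearest x entries" use, the sketch below implements the same formula in Python, makes the radius a per-unit lookup, and registers the function with SQLite so it can be used in ORDER BY together with LIMIT. The table name places, the function name great_circle_km and the sample rows are assumptions; 3959 is the mile radius that appears in another query on this page.

```python
import math
import sqlite3

# Sphere radius per unit; swap the value to change the unit of the result.
EARTH_RADIUS = {"km": 6371.0, "mi": 3959.0, "m": 6371000.0}

def great_circle(lat1, lng1, lat2, lng2, unit="km"):
    """Spherical law of cosines, as in the formula above."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2)
                 * math.cos(math.radians(lng2) - math.radians(lng1)))
    # Guard against rounding pushing the value just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS[unit] * math.acos(cos_angle)

def build_db():
    """In-memory table with a few hypothetical places."""
    conn = sqlite3.connect(":memory:")
    # SQLite calls this with four arguments, so unit keeps its default "km".
    conn.create_function("great_circle_km", 4, great_circle)
    conn.execute("CREATE TABLE places (name TEXT, lat REAL, lng REAL)")
    conn.executemany("INSERT INTO places VALUES (?, ?, ?)", [
        ("glasgow", 55.857807, -4.242511),
        ("edinburgh", 55.953300, -3.188300),
        ("london", 51.507400, -0.127800),
    ])
    return conn

def nearest(conn, lat, lng, limit):
    """Names of the `limit` places closest to (lat, lng), nearest first."""
    sql = """
        SELECT name FROM places
        ORDER BY great_circle_km(lat, lng, ?, ?)
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (lat, lng, limit))]
```

Sorting by a computed distance still evaluates the function for every row, so on large tables it is worth combining this with a bounding-box condition in WHERE.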
For databases (such as SQLite) that don't support trigonometric functions you can use the Pythagorean theorem.
This is a faster method, even if your database does support trigonometric functions, with the following caveats:
you need to store coords in x,y grid instead of (or as well as) lat,lng;
the calculation assumes 'flat earth', but this is fine for relatively local searches.
Here's an example from a Rails project I'm working on (the important bit is the SQL in the middle):
class User < ActiveRecord::Base
  ...
  # has integer x & y coordinates
  ...
  # Returns array of {:user => <User>, :distance => <distance>}, sorted by distance (in metres).
  # Distance is rounded to nearest integer.
  # point is a Geo::LatLng.
  # radius is in metres.
  # limit specifies the maximum number of records to return (default 100).
  def self.find_within_radius(point, radius, limit = 100)
    sql = <<-SQL
      select id, lat, lng, (#{point.x} - x) * (#{point.x} - x) + (#{point.y} - y) * (#{point.y} - y) d
      from users where #{(radius ** 2)} >= d
      order by d limit #{limit}
    SQL
    users = User.find_by_sql(sql)
    users.each {|user| user.d = Math.sqrt(user.d.to_f).round}
    return users
  end
end
Am I right in thinking this is the Haversine formula?
I have used the exact same method in a vehicle-tracking application for years. It works perfectly well. A quick check of some old code shows that I multiply the result by 6378137, which if memory serves converts to meters, but I haven't touched it for a very long time.
I believe SQL Server 2008 has a new spatial datatype that I imagine allows these kinds of comparisons without knowing this formula, and also allows spatial indexes, which might be interesting, but I've not looked into it.
I have been using this, forget where I got it though.
SELECT n, SQRT(POW((69.1 * (n.field_geofield_lat - :lat)) , 2 ) + POW((53 * (n.field_geofield_lon - :lon)), 2)) AS distance FROM field_revision_field_geofield n ORDER BY distance ASC