Data retrieving by Latitude longitude matching from both tables in mysql - sql

I have two tables are A & B.
A table having columns are hotelcode_id, latitude,longitude
B table having columns are latitude, longitude
Requirement is, I need retrieving hotelcode_id according to match latitude from both tables and longitude from both tables
I have designed the following query, but still in query performance
SELECT a.hotelcode_id, a.latitude,b.latitude,b.longitude,b.longitude
FROM A
JOIN B
ON a.latitude like concat ('%', b.latitude, '%') AND a.longitude like concat ('%', b.longitude, '%')
Also I'm designed the following another query but I can't able to accuret data's.
This query running too much time but still now I can't able to retrieve the data's.
NOTE:
A table has 150k records
B table has 250k records
: I have set DECIMAL(10,6) for latitude and longitude columns in both tables.
I have done the following job but still in problems in query performance,
done index properly using EXPLAIN statements
done hash partition for this tables
I think wild card characters not allowed the index reference.
Also LIKE SELECT query performance very poor in MySQL.
Any other solution is there instead wild cards issues & LIKE issues in SELECT query?

If you are sure that the numeric values of LAT/LON pairs are equal across the two table, the simple approach would be
SELECT a.hotelcode_id, a.latitude,b.latitude,b.longitude,b.longitude
FROM A JOIN B
WHERE a.latitude = b.latitude
AND a.longitude = b.longitude
If there is some inaccuracy in the data, you may want to define the maximum deviation (here 3.6 angle seconds) which you would regard as "same place", e.g.
SELECT a.hotelcode_id, a.latitude,b.latitude,b.longitude,b.longitude
FROM A JOIN B
WHERE ABS(a.latitude-b.latitude) < 0.001
AND ABS(a.longitude-b.longitude) < 0.001
Mind that in the second case the actual distance (in km) between two points are not the same at any given LAT ... higher LAT --> less distance
And review the sizing of LON and LAT columns ... you know that (usually ...)
-180 <= LON <= 180
-90 <= LAT <= 90

Related

Fastest way to filter latitude and longitude from a sql table?

I have a huge table data where sample data is like below. I want to filter few latitude and longitude records from the huge table and I am using In clause to filter list of lat,lon values but when I try to run the query it takes more a min to execute what is the better query to execute it faster? the list of lat,lon is around 120-150
id longitude latitude
--------------------------
190 -0.410123 51.88409
191 -0.413256 51.84567
query:-
SELECT DISTINCT id, longitude, latitude
FROM geo_table
WHERE ROUND(longitude::numeric, 3) IN (-0.418, -0.417, -0.417, -0.416 and so on )
AND ROUND(latitude::numeric, 3) IN (51.884, 51.884, 51.883, 51.883 and so on);
If at least one of the ranges of values in X or Y is tight you can try prefiltering rows. For example, if X (longitude) values are all close together you could try:
SELECT distinct id,longitude,latitude
from (
select *
FROM geo_table
where longitude between -0.418 and -0.416 -- prefilter with index scan
and latitude between 51.883 and 51.884 -- prefilter with index filter
) x
-- now the re-check logic for exact filtering
where ROUND(longitude::numeric,3) in (-0.418, -0.417, -0.417, -0.416, ...)
and ROUND(latitude::numeric,3) in (51.884, 51.884, 51.883, 51.883, ...)
You would need an index with the form:
create index ix1 on geo_table (longitude, latitude);
First, the way you are looking for a list of latitudes and a list of longitudes is likely wrong if you are looking for points locations:
point: lat;long
----------------
Point A: 1;10
Point B: 2;10
Point C: 1;20
Point D: 2;20
--> if you search for latitude in (1;2) and longitude in (10;20), the query will return the 4 points, while if you search for (latitude,longitude) in ((1;10),(2;20)), the query will return only points A and D.
Then, since you are looking for rounded values, you must index the rounded values:
create index latlong_rdn on geo_table (round(longitude,3),round(latitude,3));
and the query should use the exact same expression:
select *
from geo_table
where (round(longitude,3),round(latitude,3)) in
(
(-0.413,51.846),
(-0.410,51.890)
);
But here again rounding is not necessarily the best approach when dealing with locations. You may want to have a look at the PostGIS extension, to save the points as geography, add a spatial index, and to search for points within a distance (st_dwithin()) of the input locations.

How do I calculate the distance of each row in an SQL table with all the rows of another table?

I have three SQL tables:
A list of 100k weather stations with a latitude and longitude coordinate
A list of 15 cities of interest with a latitude and longitude coordinate
A list of weather data for each weather station
My interest right now is only with the first two tables. How do I filter the list of weather stations to those within e.g. 100km of each city of interest?
I have a Microsoft SQL Server and I'd prefer to do it within SQL if possible.
Basically, if you try to do this yourself, you end up with a Cartesian product:
select c.*, ws.*
from cities c cross apply
(select ws.*
from (select ws.*,
<complicated expression to calculate distance> as distance
from weather_station ws
) ws
where distance < 100
) ws;
In order to get the list of weather stations, all cities and weather stations have to be compared. The distance calculation is often rather expensive, so you can cut down on this by "prefiltering". For instance, in most inhabited places, 100 km is within 1 degree latitude and 2 degrees longitude:
select c.*, ws.*
from cities c cross apply
(select ws.*
from (select ws.*,
<complicated expression to calculate distance> as distance
from weather_station ws
where ws.latitutde between c.latitude - 1 and c.latitude + 1 and
ws.longitude between c.longitude - 2 and c.longitude + 2
) ws
where distance < 100
) ws;
Although that helps, this is still essentially a filtered Cartesian product.
So, what should you really do? If you care about coordinates as spatial data, you should look into SQL Server's spatial extensions (the documentation is here, particularly the geography type because that is most relevant to your needs).
As Gordon mentioned, you can define spatial geography datatype for your needs. You can follow below steps to achieve the goal.
Store the latitude, longitude data in the Point
Now, use the STDistance to calculate the distance between two points
You can leverage common scenario of finding nearest neighbor

How to find distance between two points using latitude and longitude

I have a ROUTES table which has columns SOURCE_AIRPORT and DESTINATION_AIRPORT and describes a particular route that an airplane would take to get from one to the other.
I have an AIRPORTS table which has columns LATITUDE and LONGITUDE which describes an airports geographic position.
I can join the two tables using columns which they both share called SOURCE_AIRPORT_ID and DESTINATION_AIRPORT_ID in the routes table, and called IATA in the airports table (a 3 letter code to represent an airport such as LHR for London Heathrow).
My question is, how can I write an SQL query using all of this information to find, for example, the longest route out of a particular airport such as LHR?
I believe I have to join the two tables, and for every row in the routes table where the source airport is LHR, look at the destination airport's latitude and longitude, calculate how far away that is from LHR, save that as a field called "distance", and then order the data by the highest distance first. But in terms of SQL syntax i'm at a loss.
This would have probably been a better question for the Mathematics Stack Exchange, but I’ll provide some insight here. If you are relatively farmiliar with trigonometry, I’m sure you could understand the implementation given this resource: https://en.m.wikipedia.org/wiki/Haversine_formula. You are looking to compute the distance between two point on the surface of a sphere in terms of their distance across its surface (not a straight line, you can’t travel through the Earth).
The page displays this formula:
https://wikimedia.org/api/rest_v1/media/math/render/svg/a65dbbde43ff45bacd2505fcf32b44fc7dcd8cc0
Where
• φ1, φ2 are the latitude of point 1 and latitude of point 2 (in radians),
• λ1, λ2 are the longitude of point 1 and longitude of point 2 (in radians).

If you data is in degrees, you can simply convert to radians by multiplying by pi/180
There is a formula called great circle distance to calculate distance between two points. You probably can load is as a library for your operating system. Forget the haversine, our planet is not a perfect sphere.
If you use this value often, save it in your routes table.
I think you're about 90% there in terms of the solution method. I'll add in some additional detail regarding a potential SQL query to get your answer. So there's 2 steps you need to do to calculate the distances - step 1 is to create a table containing joining the ROUTES table to the AIRPORTS table to get the latitude/longitude for both the SOURCE_AIRPORT and DESTINATION_AIRPORT on the route. This might look something like this:
SELECT t1.*, CONVERT(FLOAT, t2.LATITUDE) AS SOURCE_LAT, CONVERT(FLOAT, t2.LONGITUDE) AS SOURCE_LONG, CONVERT(FLOAT, t3.LATITUDE) AS DEST_LAT, CONVERT(FLOAT, t3.LONGITUDE) AS DEST_LONG, 0.00 AS DISTANCE_CALC
INTO ROUTE_CALCULATIONS
FROM ROUTES t1 LEFT OUTER JOIN AIRPORTS t2 ON t1.SOURCE_AIRPORT_ID = t2.IATA
LEFT OUTER JOIN AIRPORTS t3 ON t1.DESTINATION_AIRPORT_ID = t3.IATA;
The resulting output should create a new table titled ROUTE_CALCULATIONS made up of all the ROUTES columns, the longitude/latitude for both the SOURCE and DESTINATION airports, and a placeholder DISTANCE_CALC column with a value of 0.
Step 2 is calculating the distance. This should be a relatively straightforward calculation and update.
UPDATE ROUTE_CALCULATIONS
SET DISTANCE_CALC = 2 * 3961 * asin(sqrt((sin(radians((DEST_LAT- SOURCE_LAT) / 2))) ^ 2 + cos(radians(SOURCE_LAT)) * cos(radians(DEST_LAT)) * (sin(radians((DEST_LONG- SOURCE_LONG) / 2))) ^ 2))
And that should give the calculated distance in the DISTANCE_CALC table for all routes seen in the data. From there you should be able to do whatever distance-related route analysis you want.

SQL Network Length Calculation Lon/Lat

I currently have an Azure postgresql database containing openstreetmap data and I was wondering if there's a SQL query that can get the total distance of a way by using the lat/longs of the nodes the way uses.
I would like the SQL query to return way_id and distance.
My current approach is using C# to download all the ways and all the nodes into dictionaries (with their id's being the key). I then loop through all the ways, grouping all the nodes that belong to that way and then use their lat/longs (value divided by 10000000) to calculate the distance. This part works as excepted but rather it be done on the server.
The SQL I have attempted is below but I'm stuck on calculating the total distance per way based on the lat/longs.
Update: Postgis extension is installed.
SELECT current_ways.id as wId, node_id, (CAST(latitude as float)) / 10000000 as lat, (CAST(longitude as float)) / 10000000 as lon FROM public.current_ways
JOIN current_way_nodes as cwn ON current_ways.id = cwn.way_id
JOIN current_nodes as cn ON cwn.node_id = cn.id
*output*
wId node_id latitude longitude
2 1312575 51.4761127 -3.1888786
2 1312574 51.4759647 -3.1874216
2 1312573 51.4759207 -3.1870016
2 1213756 51.4758761 -3.1865223
3 ....
*desired_output*
way_id length
2 x.xxx
3 ...
**Tables**
current_nodes
id
latitude
longitude
current_ways
id
current_way_nodes
way_id
node_id
sequence_id
It would be much simpler should you also had the geometry in your table, i.e. the actual point instead of just the coordinates, or, even better, the actual lines.
That being said, here is a query to get what you are looking for:
SELECT w.way_id,
ST_Length( -- compute the length
ST_MAKELINE( --of a new line
ST_SetSRID( --made of an aggregation of NEW points
ST_MAKEPOINT((CAST(longitude as float)) / 10000000,(CAST(latitude as float)) / 10000000), --created using the long/lat from your text fields
4326) -- specify the projection
ORDER BY w.sequence_id -- order the points using the given sequence
)::geography --cast to geography so the output length will be in meters and not in degrees
) as length_m
FROM current_way_nodes w
JOIN current_nodes n ON w.node_id = n.node_id
GROUP BY w.way_id;

geolocating self join too slow

I am trying to get the count of all records within 50 miles of each record in a huge table (1m + records), using self join as shown below:
proc sql;
create table lab as
select distinct a.id, sum(case when b.value="New York" then 1 else 0 end)
from latlon a, latlon b
where a.id <> b.id
and geodist(a.lat,a.lon,b.lat,b.lon,"M") <= 50
and a.state = b.state;
This ran for 6 hours and was still running when i last checked.
Is there a way to do this more efficiently?
UPDATE: My intention is to get the number of new yorkers in a 50 mile radius from every record identified in table latlon which has name, location and latitude/longitude where lat/lon could be anywhere in the world but location will be a person's hometown. I have to do this for close to a dozen towns. Looks like this is the best it could get. I may have to write a C code for this one i guess.
The geodist() function you're using has no chance of exploiting any index. So, you have an algorithn that's O(n**2) at best. That's gonna be slow.
You can take advantage of a simple fact of spherical geometry, though, to get access to an indexable query. A degree of latitude (north - south) is equivalent to sixty nautical miles, 69 statute miles, or 111.111 km. The British definition of nautical mile was originally equal to a minute. The original Napoleonic meter was defined as one part in ten thousand of the distance from the equator to the pole, also defined as 90 degrees.
(These defintions depend on the assumption that the earth is spherical. It isn't, quite. If you're a civil engineer these definitions break down. If you use them to design a parking lot, it will have some nasty puddles in it when it rains, and will encrooach on the neighbors' property.)
So, what you want is to use a bounding range. Assuming your latitude values a.lat and b.lat are in degrees, two of them are certainly more than fifty statute miles apart unless
a.lat BETWEEN b.lat - 50.0/69.0 AND b.lat + 50.0/69.0
Let's refactor your query. (I don't understand the case stuff about New York so I'm ignoring it. You can add it back.) This will give the IDs of all pairs of places lying within 50 miles of each other. (I'm using the 21st century JOIN syntax here).
select distinct a.id, b.id
from latlon a
JOIN latlon b ON a.id<>b.id
AND a.lat BETWEEN b.lat - 50.0/69.0 AND b.lat + 50.0/69.0
AND a.state = b.state
AND geodist(a.lat,a.lon,b.lat,b.lon,"M") <= 50
Try creating an index on the table on the lat column. That should help performance a LOT.
Then try creating a compound index on (state, lat, id, lon, value). Try those columns in the compound index in different orders, if you don't get satisfactory performance acceleration. It's called a covering index, because the some of its columns (the first two in this case) are used for quick lookups and the rest are used to provide values that would otherwise have to be fetched from the main table.
Your question is phrased ambiguously - I'm interpreting it as "give me all (A, B) city pairs within 50 miles of each other." The NYC special case seems to be for a one-off test - the problem is not to (trivially, in O(n) time) find all cities within 50 miles of NYC.
Rather than computing Great Circle distances, find Manhattan distances instead, using simple addition, and simple bounding boxes. Given (A, B) city tuples with Manhattan distance less than 50 miles, it is straightforward to prune out the few (on diagonals) that have Great Circle (or Euclidean) distance less than 50 miles.
You didn't show us EXPLAIN output describing the backend optimizer's plan.
You didn't tell us about indexes on the latlon table.
I'm not familiar with the SAS RDBMS. Oracle, MySQL, and others have geospatial extensions to support multi-dimensional indexing. Essentially, they merge high-order coordinate bits, down to low-order coordinate bits, to construct a quadtree index. The technique could prove beneficial to your query.
Your DISTINCT keyword will make a big difference for the query plan. Often it will force a tablescan and a filesort. Consider deleting it.
The equijoin on state seems wrong, but maybe you don't care about the tri-state metropolitan area and similar densely populated regions near state borders.
You definitely want the WHERE clause to prune out b rows that are more than 50 miles from the current a row:
too far north, OR
too far south, OR
too far west, OR
too far east
Each of those conditionals boils down to a simple range query that the RDBMS backend can evaluate and optimize against an index. Unfortunately, if it chooses the latitude index, any longitude index that's on disk will be ignored, and vice versa. Which motivates using your vendor's geospatial support.