How to limit a MySQL Distance Query - sql

I'm trying to perform a distance calculation to return a listing of places within a certain distance. This is based on using a zip code database and determining the distance from the origin to each location. What I want to do is limit the results to be within a certain distance from the origin, but I'm having trouble with my MySQL query. Here's the basic query:
SELECT *,
ROUND(DEGREES(ACOS(SIN(RADIANS(42.320271)) * SIN(RADIANS(zip_latitude)) + COS(RADIANS(42.320271)) * COS(RADIANS(zip_latitude)) * COS(RADIANS(-88.462832 - zip_longitude))))) * 69.09 AS distance
FROM locations
LEFT JOIN zip_codes USING (zip_code)
ORDER BY distance ASC
This works great and gives me all the info for each location including the distance from the origin zip code...exactly what I want. However, I want to limit the results to fall within a certain distance (i.e., WHERE distance<=50).
My question and problem is I can't figure out where to include (WHERE distance<=50) into the query above to make it all work. Everything I've tried gives me an error message. Any help would be great.

You have two options:
Restate the logic in the WHERE clause so you can filter by it:
SELECT *,
ROUND(DEGREES(ACOS(SIN(RADIANS(42.320271)) * SIN(RADIANS(zip_latitude)) + COS(RADIANS(42.320271)) * COS(RADIANS(zip_latitude)) * COS(RADIANS(-88.462832 - zip_longitude))))) * 69.09 AS distance
FROM locations
LEFT JOIN zip_codes USING (zip_code)
WHERE (ROUND(DEGREES(ACOS(SIN(RADIANS(42.320271)) * SIN(RADIANS(zip_latitude)) + COS(RADIANS(42.320271)) * COS(RADIANS(zip_latitude)) * COS(RADIANS(-88.462832 - zip_longitude))))) * 69.09) <= 50
ORDER BY distance
This is the better choice because it requires only one pass over the data. The drawback is that the expression has to be duplicated: WHERE is evaluated before the select list, so it cannot see the distance alias. MySQL does allow a column alias in GROUP BY and HAVING, if you were filtering there instead.
Use a subquery:
SELECT x.*
FROM (SELECT *,
ROUND(DEGREES(ACOS(SIN(RADIANS(42.320271)) * SIN(RADIANS(zip_latitude)) + COS(RADIANS(42.320271)) * COS(RADIANS(zip_latitude)) * COS(RADIANS(-88.462832 - zip_longitude))))) * 69.09 AS distance
FROM locations
LEFT JOIN zip_codes USING (zip_code)) x
WHERE x.distance <= 50
ORDER BY x.distance
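To make the "compute first, then filter" shape of the subquery version concrete, here is a minimal Python sketch of the same logic. The sample rows and the helper names (distance_miles, within) are made up for illustration; only the origin coordinates and the 69.09 miles-per-degree factor come from the question.

```python
import math

ORIGIN = (42.320271, -88.462832)  # origin coordinates from the question's query


def distance_miles(lat1, lon1, lat2, lon2, miles_per_degree=69.09):
    """Spherical law of cosines, mirroring the SQL expression (without ROUND)."""
    cos_angle = (
        math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
        + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon1 - lon2))
    )
    # Clamp: floating-point error can push the value just outside acos's domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle)) * miles_per_degree


# Hypothetical rows standing in for "locations LEFT JOIN zip_codes".
rows = [
    {"name": "near", "zip_latitude": 42.40, "zip_longitude": -88.50},
    {"name": "far", "zip_latitude": 41.00, "zip_longitude": -87.00},
    {"name": "origin", "zip_latitude": 42.320271, "zip_longitude": -88.462832},
]


def within(rows, origin, max_miles):
    # Inner query: compute the distance once per row.
    with_distance = [
        dict(r, distance=distance_miles(origin[0], origin[1],
                                        r["zip_latitude"], r["zip_longitude"]))
        for r in rows
    ]
    # Outer query: filter and sort on the already computed value.
    nearby = [r for r in with_distance if r["distance"] <= max_miles]
    return sorted(nearby, key=lambda r: r["distance"])


result = within(rows, ORIGIN, 50)
print([r["name"] for r in result])
```

The filter can only run after the distance exists, which is why the SQL needs either the repeated expression or the derived table.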

A simple solution is to wrap your query in another query and put the condition and order there. This should work:
SELECT * FROM (
SELECT *, ROUND(DEGREES(ACOS(SIN(RADIANS(42.320271)) * SIN(RADIANS(zip_latitude)) + COS(RADIANS(42.320271)) * COS(RADIANS(zip_latitude)) * COS(RADIANS(-88.462832 - zip_longitude))))) * 69.09 AS distance FROM locations LEFT JOIN zip_codes USING (zip_code)
) AS t WHERE distance <= 50 ORDER BY distance ASC
The middle line is your query without the ORDER BY. MySQL requires every derived table to have an alias, hence the AS t.

The simplest solution is to use HAVING instead of WHERE, because MySQL lets HAVING reference a column alias. I also moved the closing parenthesis of your ROUND call to after the 69.09, so the calculation is carried out at full precision and only the final result is rounded (here to two decimal places):
SELECT *,
ROUND(DEGREES(ACOS(SIN(RADIANS(42.320271)) * SIN(RADIANS(zip_latitude))
+ COS(RADIANS(42.320271)) * COS(RADIANS(zip_latitude))
* COS(RADIANS(-88.462832 - zip_longitude)))) * 69.09,2) AS distance
FROM locations
LEFT JOIN zip_codes USING (zip_code)
HAVING distance <= 50
ORDER BY distance ASC
So your query was basically fine: you just need to add the HAVING clause, and you may want to fix the rounding to get better results. In my project I used the following, very similar formula, which drops the outer DEGREES call and multiplies the angle in radians by the Earth's radius in miles:
ROUND(ACOS(SIN(RADIANS('42.320271')) * SIN(RADIANS(zip_latitude))
+ COS(RADIANS('42.320271')) * COS(RADIANS(zip_latitude))
* COS(RADIANS(zip_longitude - '-88.462832'))) * 3963.190592,3) AS distance
and by changing the multiplier at the end, you have kilometers instead of miles:
ROUND(ACOS(SIN(RADIANS('42.320271')) * SIN(RADIANS(zip_latitude))
+ COS(RADIANS('42.320271')) * COS(RADIANS(zip_latitude))
* COS(RADIANS(zip_longitude - '-88.462832'))) * 6378.137,3) AS distance
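To see why the position of the ROUND parenthesis matters, here is a small Python sketch comparing the three forms for one pair of points. The second point is made up, and Python's round() does not break ties the same way MySQL's ROUND does, so treat this as an illustration of the effect and not a reproduction of MySQL's output.

```python
import math


def central_angle_radians(lat1, lon1, lat2, lon2):
    """Angle between two points via the spherical law of cosines."""
    cos_angle = (
        math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
        + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon1 - lon2))
    )
    return math.acos(max(-1.0, min(1.0, cos_angle)))


# Origin from the question; the second point is a hypothetical nearby location.
angle = central_angle_radians(42.320271, -88.462832, 42.40, -88.50)

# Question's form, ROUND(DEGREES(...)) * 69.09: the angle is rounded to whole
# degrees first, so every result is a multiple of 69.09 miles.
rounded_first = round(math.degrees(angle)) * 69.09

# Answer's form, ROUND(DEGREES(...) * 69.09, 2): only the final value is rounded.
rounded_last = round(math.degrees(angle) * 69.09, 2)

# Radius form, ROUND(... * 3963.190592, 3): angle in radians times radius in miles.
radius_form = round(angle * 3963.190592, 3)

print(rounded_first, rounded_last, radius_form)
```

For a location a few miles away, the first form collapses to 0 while the other two agree to within a small fraction of a mile. The small remaining difference comes from the two constants implying slightly different Earth radii.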

Related

Filter duplicated value in SQL

I'm trying to build a query that provides me a list of five jobs for a weekly promotion. The query works fine and gives the right result. There is only one factor that needs a filter.
We want to promote different jobs from different companies. The ORDER BY makes it possible to select the jobs with the highest need for applicants. It could be that one company has the five most urgent jobs, in which case the query selects five jobs from that one company. I want to add a filter so the query selects a maximum of two or three jobs per company, but I couldn't find out how.
I've tried different variations of DISTINCT, without results. I think the underlying problem has something to do with the grouping on job.id (just a thought), but I can't find a solution.
SELECT
job.id,
company_name,
city,
job.title,
hourly_rate_amount,
created_at,
count(work_intent.id),
number_of_contractors,
(count(work_intent.id)/number_of_contractors) AS applicants,
(3959 * acos(cos(radians(52.370216)) * cos( radians(address.latitude))
* cos(radians(longitude) - radians(4.895168)) + sin(radians(52.370216)) * sin(radians(latitude)))) AS distance
FROM job
INNER JOIN client on job.client_id = client.id
INNER JOIN address on job.address_id = address.id
LEFT JOIN work_intent on job.id = work_intent.job_id
INNER JOIN job_title on job.job_title_id = job_title.id
WHERE job_title.id = ANY
(SELECT job_title.id FROM job_title WHERE job.job_title_id = '28'
or job.job_title_id = '30'
or job.job_title_id = '31'
or job.job_title_id = '32'
)
AND job.status = 'open'
AND convert(job.starts_at, date) = '2019-09-19'
AND hourly_rate_amount > 1500
GROUP BY job.id
HAVING distance < 20
ORDER BY applicants, distance
LIMIT 5
I expect the output would be:
job.id - company_name - applicants
14842 - company_1 - 0
46983 - company_6 - 0
45110 - company_5 - 0
95625 - company_1 - 1
12055 - company_3 - 2
One quite simple solution, which can be applied without substantially modifying the logic of the query, is to wrap the query and use ROW_NUMBER() to rank the records. You can then filter on the row number to limit the number of records per company. If your database is MySQL, note that window functions require version 8.0 or later.
Consider:
SELECT *
FROM (
SELECT
x.*,
row_number() over(partition by company_name order by applicants, distance) rn
FROM (
-- your query, without ORDER BY and LIMIT
) x
) y
WHERE rn <= 3
ORDER BY applicants, distance
LIMIT 5
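If it helps to see what the row-number filter does, here is the same idea as a short Python sketch: rank each job within its company in the desired order, drop anything beyond the per-company cap, then take the first five. The job rows and the function name top_jobs are invented for illustration.

```python
from collections import defaultdict


def top_jobs(jobs, per_company=3, limit=5):
    # Same ordering as the query: fewest applicants first, then nearest.
    ordered = sorted(jobs, key=lambda j: (j["applicants"], j["distance"]))
    rank_in_company = defaultdict(int)
    picked = []
    for job in ordered:
        # Equivalent of ROW_NUMBER() OVER (PARTITION BY company_name ORDER BY ...).
        rank_in_company[job["company_name"]] += 1
        if rank_in_company[job["company_name"]] <= per_company:  # WHERE rn <= N
            picked.append(job)
        if len(picked) == limit:  # LIMIT
            break
    return picked


# Hypothetical data: company_1 alone could fill the whole top five.
jobs = [
    {"id": 1, "company_name": "company_1", "applicants": 0, "distance": 1.0},
    {"id": 2, "company_name": "company_1", "applicants": 0, "distance": 2.0},
    {"id": 3, "company_name": "company_1", "applicants": 0, "distance": 3.0},
    {"id": 4, "company_name": "company_1", "applicants": 0, "distance": 4.0},
    {"id": 5, "company_name": "company_5", "applicants": 0, "distance": 5.0},
    {"id": 6, "company_name": "company_6", "applicants": 1, "distance": 1.0},
    {"id": 7, "company_name": "company_3", "applicants": 2, "distance": 1.0},
]

print([j["id"] for j in top_jobs(jobs, per_company=2)])
```

With a cap of two, the third and fourth company_1 jobs are skipped and the remaining slots go to other companies.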

Applying the TOP keyword to an INTERSECTion in Transact-SQL

I'm working on a project that requires many INTERSECTions and uses a pretty large database, so I'd love to be able to apply TOP to my queries to make things not-so-slow.
Problem is, I know you can do something like (pseudocode-y but I hope it's understandable):
(SELECT TOP 50 * FROM A) INTERSECT (SELECT TOP 50 * FROM B); GO
BUT
can you do something along these lines in some way?
SELECT TOP 50 (SELECT * FROM A INTERSECT SELECT * FROM B); GO
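Yes. Wrap the INTERSECT in a derived table and apply TOP to that. Keep in mind that without an ORDER BY, which 50 rows you get is not guaranteed:
SELECT TOP 50 * FROM (SELECT * FROM A INTERSECT SELECT * FROM B) x;
GO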
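Note that the two forms in the question are not interchangeable. A rough Python sketch with sets shows the difference; the lists are stand-ins for tables A and B, and sorting plays the role of an ORDER BY so that "top n" is well defined.

```python
a = [1, 2, 3, 4, 5, 6]
b = [4, 5, 6, 7, 8, 9]


def top_of_intersection(a, b, n):
    # SELECT TOP n * FROM (A INTERSECT B): intersect everything, then cut.
    return sorted(set(a) & set(b))[:n]


def intersection_of_tops(a, b, n):
    # (SELECT TOP n FROM A) INTERSECT (SELECT TOP n FROM B): cut first, then intersect.
    return sorted(set(sorted(a)[:n]) & set(sorted(b)[:n]))


print(top_of_intersection(a, b, 2))   # rows common to both inputs, first two
print(intersection_of_tops(a, b, 2))  # can be empty even though the inputs overlap
```

Taking TOP before the INTERSECT is cheaper but can miss matches entirely, so it only helps if that result is what you actually want.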

Oracle 'Invalid Identifier' in sub-query

I'm having an issue with converting a view from PostgreSQL to Oracle when a sub-query is referencing a column in the outer query.
This issue seems to have been discussed here several times but I have been unable to get any of the fixes to work with my specific query.
The query's purpose is to get a mobile device's last recorded position and the distance in km from its closest checkpoint/geo-boundary. It references three separate tables: devices, device_locations and checkpoints.
SELECT
d.id,
dl.latitude AS last_latitude,
dl.longitude AS last_longitude,
(SELECT * /* Get closest 'checkpoint' to the last device position by calculating the Great-circle distance */
FROM (
SELECT
6371 * acos(cos(dl.latitude / (180/acos(-1))) * cos(checkpoints.latitude / (180/acos(-1))) * cos((checkpoints.longitude / (180/acos(-1))) - (dl.longitude / (180/acos(-1)))) + sin(dl.latitude / (180/acos(-1))) * sin(checkpoints.latitude / (180/acos(-1)))) AS distance
FROM checkpoints
ORDER BY distance)
WHERE ROWNUM = 1) AS distance_to_checkpoint
FROM devices d
LEFT JOIN ( /* Get the last position of the device */
SELECT l.id,
l.time,
l.latitude,
l.longitude,
l.accuracy
FROM device_locations l
WHERE l.ROWID IN (SELECT MAX(ROWID) FROM device_locations GROUP BY id)
ORDER BY l.id, l.time DESC) dl
ON dl.id = d.id;
I've been stuck on this for a while and hoping someone can put me on the right path, thanks.
This is a follow-up to my other answer. In order to get the checkpoints record with the minimum distance, you'd join with the table and use window functions again to pick the best record. E.g.:
select
device_id,
last_latitude,
last_longitude,
checkpoint_latitude,
checkpoint_longitude,
distance
from
(
select
device_id,
last_latitude,
last_longitude,
checkpoint_latitude,
checkpoint_longitude,
distance,
min(distance) over (partition by device_id) as min_distance
from
(
select
d.id as device_id,
dl.latitude as last_latitude,
dl.longitude as last_longitude,
cp.latitude as checkpoint_latitude,
cp.longitude as checkpoint_longitude,
6371 *
acos(cos(dl.latitude / (180/acos(-1))) *
cos(cp.latitude / (180/acos(-1))) *
cos((cp.longitude / (180/acos(-1))) - (dl.longitude / (180/acos(-1))))
+
sin(dl.latitude / (180/acos(-1))) *
sin(cp.latitude / (180/acos(-1)))
) as distance
from devices d
left join
(
select
id as device_id, latitude, longitude, time,
max(time) over (partition by id) as max_time
from device_locations
) dl on dl.device_id = d.id and dl.time = dl.max_time
cross join checkpoints cp
)
)
where (distance = min_distance) or (distance is null and min_distance is null);
Such queries are easier to write with CROSS APPLY and OUTER APPLY, available as of Oracle 12c.
I see two issues:
Extra comma after your final select column: AS distance_to_checkpoint,
Outer select columns reference an inner table device_locations l, instead of the derived table dl - example: l.latitude should be dl.latitude
First of all: The query doesn't get the last device positions. It gets the records with the highest ROWID per ID which may happen to be the latest entry, but is not at all guaranteed to be.
Then you most probably have an issue with scope. Unfortunately, in Oracle (at least in versions before 12c) an outer query's names are only visible one subquery level deep, which is an annoying limitation. dl.latitude etc. are probably not valid in your subquery, because it is actually a subquery within a subquery. Anyway, what you are trying to get is the minimum distance, which you can easily get with MIN.
An ORDER BY in a subquery is superfluous in standard SQL. Oracle makes an exception for their ROWNUM technique, but I wouldn't make use of this. (And as mentioned, it's even clumsy for getting a minimum value.) The ORDER BY in the outer join is superfluous anyway.
This is how I would approach the problem:
select
d.id as device_id,
dl.latitude as last_latitude,
dl.longitude as last_longitude,
(
select min(6371 *
acos(cos(dl.latitude / (180/acos(-1))) *
cos(cp.latitude / (180/acos(-1))) *
cos((cp.longitude / (180/acos(-1))) - (dl.longitude / (180/acos(-1))))
+
sin(dl.latitude / (180/acos(-1))) *
sin(cp.latitude / (180/acos(-1)))
)
)
from checkpoints cp
) as distance
from devices d
left join
(
select
id as device_id, latitude, longitude, time,
max(time) over (partition by id) as max_time
from device_locations
) dl on dl.device_id = d.id and dl.time = dl.max_time;
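As a cross-check of what this query is meant to return, here is a hedged Python sketch of the same two steps: pick each device's latest location by time, then take the minimum distance over all checkpoints. The sample data and function names are invented; the 6371 km radius is the one used in the question.

```python
import math


def distance_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance via the spherical law of cosines."""
    cos_angle = (
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon2) - math.radians(lon1))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    return radius_km * math.acos(max(-1.0, min(1.0, cos_angle)))


def nearest_checkpoint_distance(device_ids, device_locations, checkpoints):
    # Latest location per device: the row with MAX(time), not MAX(ROWID).
    latest = {}
    for loc in device_locations:
        current = latest.get(loc["id"])
        if current is None or loc["time"] > current["time"]:
            latest[loc["id"]] = loc

    result = {}
    for device_id in device_ids:
        loc = latest.get(device_id)
        if loc is None or not checkpoints:
            # LEFT JOIN behaviour: the device is kept, the distance is NULL.
            result[device_id] = None
        else:
            result[device_id] = min(
                distance_km(loc["latitude"], loc["longitude"],
                            cp["latitude"], cp["longitude"])
                for cp in checkpoints
            )
    return result


# Hypothetical sample data.
device_ids = [1, 2]
device_locations = [
    {"id": 1, "time": 1, "latitude": 0.0, "longitude": 0.0},
    {"id": 1, "time": 2, "latitude": 10.0, "longitude": 10.0},
]
checkpoints = [
    {"latitude": 10.0, "longitude": 10.0},
    {"latitude": 0.0, "longitude": 0.5},
]

distances = nearest_checkpoint_distance(device_ids, device_locations, checkpoints)
print(distances)
```

Device 1 is measured from its later position, and device 2, which has no recorded location, still appears with no distance.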

How to search in a radius using Postgres extension?

Finding distances on the surface of the earth means using great-circle distances, which can be worked out with the haversine formula or, as in the query below, with the spherical law of cosines.
The problem is this: Given a table of locations with latitudes and longitudes, which of those locations are nearest to a given location?
I have the following query:
SELECT z.id,
z.latitude, z.longitude,
p.radius,
p.distance_unit
* DEGREES(ACOS(COS(RADIANS(p.latpoint))
* COS(RADIANS(z.latitude))
* COS(RADIANS(p.longpoint - z.longitude))
+ SIN(RADIANS(p.latpoint))
* SIN(RADIANS(z.latitude)))) AS distance
FROM doorbots as z
JOIN ( /* these are the query parameters */
SELECT 34.0480698 AS latpoint, -118.3589196 AS longpoint,
2 AS radius, 111.045 AS distance_unit
) AS p ON 1=1
WHERE z.latitude between ... and
z.longitude between ...
How can I use the earthdistance extension to replace the complicated formula in this query?
Is the following an equivalent change?
SELECT z.id,
z.latitude, z.longitude,
p.radius,
round(earth_distance(ll_to_earth(p.latpoint, p.longpoint), ll_to_earth(z.latitude, z.longitude))::NUMERIC,0) AS distance
FROM doorbots as z
JOIN ( /* these are the query parameters */
SELECT 34.0480698 AS latpoint, -118.3589196 AS longpoint,
2 AS radius, 111.045 AS distance_unit
) AS p ON 1=1
WHERE z.latitude between ... and
z.longitude between ...
You can get the most out of earthdistance with the following queries:
Locations close enough (i.e. within 1000000.0 meters -- 621.371192 miles) to (34.0480698, -118.3589196):
select *
from doorbots z
where earth_distance(ll_to_earth(z.latitude, z.longitude), ll_to_earth(34.0480698, -118.3589196)) < 1000000.0; -- in meters
select *
from doorbots z
where point(z.longitude, z.latitude) <@> point(-118.3589196, 34.0480698) < 621.371192; -- in miles
Top 5 locations closest to (34.0480698, -118.3589196):
select *
from doorbots z
order by earth_distance(ll_to_earth(z.latitude, z.longitude), ll_to_earth(34.0480698, -118.3589196))
limit 5;
select *
from doorbots z
order by point(z.longitude, z.latitude) <@> point(-118.3589196, 34.0480698)
limit 5;
To use indexes, apply the following one to your table:
create index idx_doorbots_latlong
on doorbots using gist (earth_box(ll_to_earth(latitude, longitude), 0));
Use index for: locations close enough (i.e. within 1000000.0 meters -- 621.371192 miles) to (34.0480698, -118.3589196):
with p as (
select 34.0480698 as latitude,
-118.3589196 as longitude,
1000000.0 as max_distance_in_meters
)
select z.*
from p, doorbots z
where earth_box(ll_to_earth(z.latitude, z.longitude), 0) <@ earth_box(ll_to_earth(p.latitude, p.longitude), p.max_distance_in_meters)
and earth_distance(ll_to_earth(z.latitude, z.longitude), ll_to_earth(p.latitude, p.longitude)) < p.max_distance_in_meters;
Use index for: top 5 locations closest to (34.0480698, -118.3589196):
select z.*
from doorbots z
order by earth_box(ll_to_earth(z.latitude, z.longitude), 0) <-> earth_box(ll_to_earth(34.0480698, -118.3589196), 0)
limit 5;
http://rextester.com/WQAY4056
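The indexed radius query above works in two stages: a cheap box test that an index can answer, followed by the exact distance check. Here is a rough Python analogue of that pattern. It uses a latitude/longitude box, not the 3D cube that earth_box builds, ignores wrap-around at the 180° meridian, and assumes a radius of 6378168 m, which is to my knowledge what the extension's earth() returns. The sample points and function names are made up.

```python
import math

EARTH_RADIUS_M = 6378168.0  # assumed radius, see the note above


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters via the spherical law of cosines."""
    cos_angle = (
        math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
        + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon1 - lon2))
    )
    return EARTH_RADIUS_M * math.acos(max(-1.0, min(1.0, cos_angle)))


def within_radius(points, lat, lon, radius_m):
    angular = radius_m / EARTH_RADIUS_M  # search radius as an angle in radians
    dlat = math.degrees(angular)
    ratio = math.sin(angular) / math.cos(math.radians(lat))
    # If the circle reaches a pole or spans a hemisphere, accept every longitude.
    if angular < math.pi / 2 and ratio < 1:
        dlon = math.degrees(math.asin(ratio))
    else:
        dlon = 180.0

    hits = []
    for p in points:
        # Stage 1: coarse box test (the part an index can serve).
        if abs(p["latitude"] - lat) > dlat or abs(p["longitude"] - lon) > dlon:
            continue
        # Stage 2: exact distance check on the surviving candidates.
        d = distance_m(lat, lon, p["latitude"], p["longitude"])
        if d < radius_m:
            hits.append((d, p["id"]))
    return [point_id for _, point_id in sorted(hits)]


# Hypothetical rows standing in for the doorbots table.
points = [
    {"id": 1, "latitude": 34.05, "longitude": -118.36},
    {"id": 2, "latitude": 36.17, "longitude": -115.14},
    {"id": 3, "latitude": 40.71, "longitude": -74.00},
]

print(within_radius(points, 34.0480698, -118.3589196, 1000000.0))
```

The box only narrows the candidates. The second check is still needed because the corners of the box lie farther away than the radius.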

t-SQL: Use a variable to filter result set

Pardon my naivety.
I have a query that allows you to find the distance between two points on a sphere, in this case, the distance between zip codes.
SELECT TOP 5 zip, city, state, latitude, longitude,
69.0 * DEGREES(ACOS(COS(RADIANS(latpoint))
* COS(RADIANS(latitude))
* COS(RADIANS(longpoint) - RADIANS(longitude))
+ SIN(RADIANS(latpoint))
* SIN(RADIANS(latitude)))) AS distance_in_miles
FROM us_loc_data
JOIN (
SELECT 39.317974 AS latpoint, -94.57545 AS longpoint
) AS p ON 1=1
ORDER BY distance_in_miles
As you can see from the join, the result set is filtered by specifying a pair of coordinates as the "starting" point, and then returns a list of the top 5 nearest locations. (Example below)
Ultimately, I would like to filter the results by specifying a single starting zip code instead of a pair of coordinates. How can I implement a variable to do so? What is best practice?
Not tested, but you can simply alter your sub-query to pull Lat/Lng by zip code.
Declare @Zip varchar(10) = '02806'
SELECT TOP 5 zip, city, state, latitude, longitude,
69.0 * DEGREES(ACOS(COS(RADIANS(latpoint))
* COS(RADIANS(latitude))
* COS(RADIANS(longpoint) - RADIANS(longitude))
+ SIN(RADIANS(latpoint))
* SIN(RADIANS(latitude)))) AS distance_in_miles
FROM us_loc_data
JOIN (
SELECT latitude AS latpoint
, longitude AS longpoint
From us_loc_data
Where Zip = @Zip
) AS p ON 1=1
ORDER BY distance_in_miles
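If the query is issued from application code, the usual practice is to pass the zip code as a bound parameter and not splice it into the SQL string. Below is a small sketch of that using Python's sqlite3 module. It is SQLite, not T-SQL, so LIMIT replaces TOP and the distance is a registered Python function; the table contents and the names miles_between and nearest are made up for illustration.

```python
import math
import sqlite3


def miles_between(lat1, lon1, lat2, lon2):
    """Same formula as the query, with 69.0 miles per degree."""
    cos_angle = (
        math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
        * math.cos(math.radians(lon1) - math.radians(lon2))
        + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
    )
    return 69.0 * math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


conn = sqlite3.connect(":memory:")
conn.create_function("miles_between", 4, miles_between)
# zip is stored as text so that leading zeros (e.g. '02806') survive.
conn.execute(
    "CREATE TABLE us_loc_data (zip TEXT PRIMARY KEY, city TEXT, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO us_loc_data VALUES (?, ?, ?, ?)",
    [  # hypothetical rows
        ("00001", "A", 39.0, -94.0),
        ("00002", "B", 39.1, -94.0),
        ("00003", "C", 41.0, -94.0),
    ],
)


def nearest(conn, zip_code, n=5):
    sql = """
        SELECT d.zip, d.city,
               miles_between(p.latitude, p.longitude, d.latitude, d.longitude)
                   AS distance_in_miles
        FROM us_loc_data AS d
        JOIN (SELECT latitude, longitude FROM us_loc_data WHERE zip = ?) AS p ON 1 = 1
        ORDER BY distance_in_miles
        LIMIT ?
    """
    # The ? placeholders play the role of the @Zip variable.
    return conn.execute(sql, (zip_code, n)).fetchall()


print(nearest(conn, "00001", 2))
```

An unknown zip code makes the inner SELECT return no rows, so the join, and therefore the whole query, returns nothing.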