Getting all Buildings in range of 5 miles from specified coordinates - sql

I have database table Building with these columns: name, lat, lng
How can I get all Buildings in range of 5 miles from specified coordinates, for example these:
-84.38653999999998
33.72024
My attempt, which does not work:
SELECT ST_CONTAINS(
SELECT ST_BUFFER(ST_Point(-84.38653999999998,33.72024), 5),
SELECT ST_POINT(lat,lng) FROM "my_db"."Building" LIMIT 50
);
https://docs.aws.amazon.com/athena/latest/ug/geospatial-functions-list.html

Why are you storing x,y in separate columns? I strongly suggest storing them as geometry or geography to avoid unnecessary casting overhead at query time.
That being said, you can compute and check distances in miles using ST_DWithin or ST_Distance:
(Test data)
CREATE TABLE building (name text, long numeric, lat numeric);
INSERT INTO building VALUES ('Kirk Michael',-4.5896,54.2835);
INSERT INTO building VALUES ('Baldrine',-4.4077,54.2011);
INSERT INTO building VALUES ('Isle of Man Airport',-4.6283,54.0804);
ST_DWithin
ST_DWithin returns true if the given geometries are within the specified distance of one another. The following query searches for geometries that lie within a 5-mile radius of POINT(-4.6314 54.0887):
SELECT name,long,lat,
ST_Distance('POINT(-4.6314 54.0887)'::geography,
ST_MakePoint(long,lat)) * 0.000621371 AS distance
FROM building
WHERE
ST_DWithin('POINT(-4.6314 54.0887)'::geography,
ST_MakePoint(long,lat),8046.72); -- 8046.72 metres = 5 miles;
name | long | lat | distance
---------------------+---------+---------+-------------------
Isle of Man Airport | -4.6283 | 54.0804 | 0.587728347062174
(1 row)
ST_Distance
The function ST_Distance (with geography type parameters) will return the distance in meters. Using this function all you have to do is to convert meters to miles in the end.
Attention: Distances in queries using ST_Distance are computed in real time and therefore do not use the spatial index. So, it is not recommended to use this function in the WHERE clause! Use it rather in the SELECT clause. Nevertheless the example below shows how it could be done:
SELECT name,long,lat,
ST_Distance('POINT(-4.6314 54.0887)'::geography,
ST_MakePoint(long,lat)) * 0.000621371 AS distance
FROM building
WHERE
ST_Distance('POINT(-4.6314 54.0887)'::geography,
ST_MakePoint(long,lat)) * 0.000621371 <= 5;
name | long | lat | distance
---------------------+---------+---------+-------------------
Isle of Man Airport | -4.6283 | 54.0804 | 0.587728347062174
(1 row)
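Both queries above depend on converting between metres and miles (8046.72 and 0.000621371). A minimal sketch of that arithmetic, written in Python only for illustration; the function names are hypothetical and not part of PostGIS:

```python
# Unit conversion behind the constants used in the queries above.
# METERS_PER_MILE is the international mile, exact by definition.
METERS_PER_MILE = 1609.344


def miles_to_meters(miles: float) -> float:
    """Convert a distance in miles to metres (hypothetical helper)."""
    return miles * METERS_PER_MILE


def meters_to_miles(meters: float) -> float:
    """Convert a distance in metres to miles (hypothetical helper)."""
    return meters / METERS_PER_MILE


# Radius to pass to ST_DWithin for a 5-mile search on geography values.
radius_m = miles_to_meters(5)

# Factor to multiply an ST_Distance result (metres) by to get miles.
miles_per_meter = 1 / METERS_PER_MILE
```

Here `radius_m` corresponds to the 8046.72 passed to ST_DWithin, and `miles_per_meter` to the 0.000621371 multiplier.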
Mind the parameter order of ST_MakePoint: it is longitude, latitude, not the other way around.
Demo: db<>fiddle
Amazon Athena equivalent (distance in degrees):
SELECT *, ST_DISTANCE(ST_GEOMETRY_FROM_TEXT('POINT(-84.386330 33.753746)'),
ST_POINT(long,lat)) AS distance
FROM building
WHERE
ST_Distance(ST_GEOMETRY_FROM_TEXT('POINT(-84.386330 33.753746)'),
ST_POINT(long,lat)) <= 5;

First things first: if possible, use PostGIS rather than Amazon Athena. Judging by the documentation, Athena looks like a stripped-down version of a spatial tool.
First - Install postgis.
CREATE EXTENSION postgis SCHEMA public;
Now create a geometry column (if you want to use a metric SRID such as 3857) or a geography column (if you want to use a degree-based SRID such as 4326) for your data.
alter table building add column geog geography;
Then transform your point data (long, lat) to geometry/geography. Note that ST_MakePoint takes longitude first:
update building
set geog=(ST_SetSRID(ST_MakePoint(long,lat),4326)::geography);
Next create spatial index on it
create index on building using gist(geog);
Now you are ready for action
select *,
st_distance(geog, ST_makePoint(-84.38653999999998,33.72024))/1609.34 dist_miles
from building
where st_dwithin(geog, ST_makePoint(-84.38653999999998,33.72024),5*1609.34);
A few words of explanation:
The index is useful if you have many records in your table.
ST_DWithin uses the index whereas ST_Distance does not, so ST_DWithin will make your query much faster on big data sets.

For AWS Athena, try using this to calculate the approximate distance in degrees:
decimal_degree_distance = 5000.0 * 360.0 / (2.0 * pi() * cos( radians(latitud) ) * 6400000.0)
where 5000.0 is the distance in meters.
It works well for places near the equator.
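The formula above can be checked outside Athena. The following is a rough sketch in Python, not Athena SQL; the function name and sample latitudes are assumptions for illustration, and the 6,400,000 m radius is the rounded value taken from the formula itself. The cosine term scales degrees of longitude, so the result grows as the latitude moves away from the equator.

```python
import math

# Rounded Earth radius in metres, as used in the formula above.
EARTH_RADIUS_M = 6_400_000.0


def meters_to_degrees(distance_m: float, latitude_deg: float) -> float:
    """Approximate a distance in metres as decimal degrees at a given latitude.

    Hypothetical helper mirroring the formula above: the circumference of
    the circle of latitude is 2 * pi * cos(lat) * R, and 360 degrees span it.
    """
    circumference_m = 2.0 * math.pi * math.cos(math.radians(latitude_deg)) * EARTH_RADIUS_M
    return distance_m * 360.0 / circumference_m


# 5000 m expressed in degrees at the equator and at the question's latitude.
deg_equator = meters_to_degrees(5000.0, 0.0)
deg_question = meters_to_degrees(5000.0, 33.72024)
```

The resulting value could then be used as the threshold in a degree-based ST_DISTANCE comparison, with the caveat that it is only an approximation.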

Related

SQL Spatial Query - Determine Which Polygon A Lat/Long Point Falls Into

Via ArcMap, I imported a Feature Class into my SQL2019 Server. There were no issues, and the polygons display properly in the 'spatial results' tab when I check. Inside that feature class, there are three distinct shapes (let's call the field tblGeo.AREA).
I have another table with LAT/LNG coordinate points (tblPoint.LAT, tblPoint.LNG).
Using the two tables (tblGeo and tblPoint), how can I determine which AREA field the coordinate falls into (if any)?
tblGeo:
| Field Name | Field Type  | Sample                   |
|------------|-------------|--------------------------|
| GID        | INT         | 1,2,3...                 |
| SHAPE      | GEOMETRY    | 0x2569... or 0x110F...   |
| GEOAREA    | VARCHAR(50) | Washington, New York,... |
tblPoint:
| Field Name | Field Type   | Sample      |
|------------|--------------|-------------|
| PID        | INT          | 1,2,3...    |
| LOCATION   | VARCHAR(100) | White House |
| LAT        | DECIMAL(9,6) | 38.897957   |
| LNG        | DECIMAL(9,6) | -77.036560  |
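Desired Output
| PID | Location              | Lat       | Lng        | GeoArea    |
|-----|-----------------------|-----------|------------|------------|
| 1   | White House           | 38.897957 | -77.036560 | Washington |
| 2   | Empire State Building | 40.748817 | -73.985428 | New York   |
| ... | ...                   | ...       | ...        | ...        |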
Sample input and output data would be nice.
You'll need to convert LAT and LNG to a geometry point.
Assuming LAT and LNG are DECIMAL(9, 6)...
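select p.PID
, p.LOCATION
, p.LAT
, p.LNG
, g.GEOAREA
from tblGeo g
right outer join tblPoint p on g.SHAPE.STContains(geometry::Point(p.LNG, p.LAT, 0)) = 1 -- geometry::Point takes (x, y, SRID), so longitude comes first; the SRID (0 here) must match the SRID of SHAPE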
I could check my work if you provided sample data.

Filter list of points using list of Polygons

Given a list of points and a list of polygons, how do you return the subset of the original points that lie in any of the polygons on the list?
I've removed other columns in the sample tables to simplify things
Points Table:
| Longitude| Latitude |
|----------|-----------|
| 7.07491 | 51.28725 |
| 3.674765 | 51.40205 |
| 6.049105 | 51.86624 |
LocationPolygons Table:
| LineString |
|----------------------|
| CURVEPOLYGON (COMPOUNDCURVE (CIRCULARSTRING (-122.20 47.45, -122.81 47.0, -122.942505 46.687131 ... |
| MULTIPOLYGON (((-110.3086 24.2154, -110.30842 24.2185966, -110.3127...
If I had row from the LocationPolygons table I could do something like
DECLARE @homeLocation geography;
SET @homeLocation = (select top 1 GEOGRAPHY::STGeomFromText(LineString, 4326)
FROM LocationPolygon where LocationPolygonId = '123abc')
select Id, Longitude, Latitude, @homeLocation.STContains(geography::Point(Latitude, Longitude, 4326))
as IsInLocation from Points where PointId in (1, 2, 3)
which would return what I want in a format like the below. However this is only true for just one location on the list
| Id | Longitude| Latitude | IsInLocation |
|----|----------|-----------|--------------|
| 1 | 7.07491 | 51.28725 | 0 |
| 2 | 3.674765 | 51.40205 | 1 |
| 3 | 6.049105 | 51.86624 | 0 |
How do I handle the scenario with multiple rows of the LocationPolygon table?
I'd like to know
if any of the points are in any of the locationPolygons?
what specific location polygon they are in? or if they are in more than one polygon.
Question 2 is more of an extra. Can someone help?
Update #1
In response to @Ben-Thul's answer.
Unfortunately I don't have access/permission to make changes to the original tables. I can request access, but I'm not certain it will be given, so I may not be able to add the columns or create the index. However, I can create temp tables in a stored proc, so I might be able to test your solution that way.
I stumbled on an answer like the below, but slightly worried about performance implications of using a cross join.
WITH cte AS (
select *, (GEOGRAPHY::STGeomFromText(LineString, 4326)).STContains(geography::Point(Latitude, Longitude, 4326)) as IsInALocation from
(
select Longitude, Latitude from Points nolock
) a cross join (
select LineString FROM LocationPolygons nolock
) b
)
select * from cte where IsInALocation = 1
Obviously, it's better to look at a query plan, but is the solution I stumbled upon essentially the same as yours? Are there any potential issues that I missed? Apologies for this, but my SQL isn't very good.
Question 1 shouldn't be too bad. First, some set up:
alter table dbo.Points add Point as (GEOGRAPHY::Point(Latitude, Longitude, 4326));
create spatial index IX_Point on dbo.Points (Point) with (online = on);
alter table dbo.LocationPolygon add Polygon as (GEOGRAPHY::STGeomFromText(LineString, 4326));
create spatial index IX_Polygon on dbo.LocationPolygon (Polygon) with (online = on);
This will create a computed column on each of your tables that is of type geography that has a spatial index on it.
From there, you should be able to do something like this:
select pt.ID,
pt.Longitude,
pt.Latitude,
coalesce(pg.IsInLocation, 0) as IsInLocation
from Points as pt
outer apply (
select top(1) 1 as IsInLocation
from dbo.LocationPolygon as pg
where pg.Polygon.STContains(pt.Point) = 1
) as pg;
Here, you're selecting every row from the Points table and using outer apply to see if any polygons contain that point. If one does (it doesn't matter which one), that query will return a 1 in the result set and bubble that back up to the driving select.
To extend this to Question 2, you can remove the top() from the outer apply and have it return either the IDs from the Polygon table or whatever you want. Note though that it'll return one row per polygon that contains the point, potentially changing the cardinality of your result set!
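The two questions (is the point in any polygon, and in which ones) can be illustrated outside SQL Server. Below is a minimal sketch in Python, assuming simple planar rings without holes and a standard ray-casting test; it does not reproduce the geography semantics of STContains, and all names and sample shapes are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), i.e. (longitude, latitude)


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting test against a simple planar ring (no holes)."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containing_polygons(pt: Point, polygons: Sequence[Sequence[Point]]) -> List[int]:
    """Return the indexes of every polygon that contains pt (Question 2)."""
    return [i for i, ring in enumerate(polygons) if point_in_polygon(pt, ring)]


# Two overlapping squares and three sample points (made-up data).
polygons = [
    [(0, 0), (4, 0), (4, 4), (0, 4)],
    [(3, 3), (6, 3), (6, 6), (3, 6)],
]
points = [(1, 1), (3.5, 3.5), (10, 10)]

# Question 2: which polygons contain each point.
matches: Dict[Point, List[int]] = {pt: containing_polygons(pt, polygons) for pt in points}
# Question 1: is each point in any polygon at all.
is_in_location = {pt: bool(ids) for pt, ids in matches.items()}
```

A point that sits in both squares yields two matches, which mirrors the cardinality change noted above when top() is removed.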

query to get the results of each row in table1 with a subquery of N maximum records found to meet a condition in table2

I am trying without success to calculate building heights in my city using the LIDAR satellite dataset.
System specs
CPU: Core i7 6700k 4200MHz, 4 cores, 8 threads
RAM: 32GB DDR4 3200mhz
SSD: 1TB Samsung 970 EVO
OS: Ubuntu 18.04
Postgres setup
I am using the latest version of Postgres v12.1 database with PostGIS with the following tweaks recommended in different sources:
shared_buffers = 256MB
maintenance_work_mem = 4GB
max_parallel_maintenance_workers = 7
max_parallel_workers = 7
max_wal_size = 60GB
min_wal_size = 90MB
random_page_cost = 1.0
Database setup
In the lidar table I have more than 3000 million rows, and in the buildings table more than 150000 rows.
In the lidar table the GiST index was created: CREATE INDEX lidar_idx ON lidar USING GIST (geom);
building table: | gid | geom |
lidar table: | z | geom |
Height calculation
Currently in order to calculate the height of a building, it is necessary to check if each one of the 3000 million points (rows) is inside the area of each building and calculate the average of all the points found inside a building area.
The queries I have tried are taking forever (probably more than 5 days or even more) and I would like to simplify the query so that I can get the height of the building with a lot less points, without having to compare with all the insane 3000 million records each time for each building.
For example:
For building with id1, I would like to get only the first 100 records found which are inside the building geometry area ( ST_Within(l.geom, e.geom) ), and once those 100 records are found, pass to the next building.
For building with id2, I would like the same, get only the first 100 records found which are inside the building area.
And so on..
My main query is
SELECT e.gid, AVG(l.z) AS height
FROM lidar l,
buildings e
WHERE ST_Within(l.geom, e.geom)
GROUP BY e.gid;
I have tried with another query, but I can not get it to work.
SELECT e.gid, AVG(l.z), COUNT(1) FILTER (WHERE ST_Within(l.geom, e.geom)) AS gidc
FROM lidar l, buildings e
WHERE gidc < 100
GROUP BY e.gid
I don't think you really want to do this at all. You should first try to make the correct query faster rather than compromising correctness by working with an arbitrary (but not random) subset of the data.
But if you do want it, then you can use a lateral join.
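SELECT e.gid, t.height FROM
buildings e CROSS JOIN LATERAL
(SELECT AVG(s.z) AS height FROM
-- the LIMIT must sit inside the aggregate's input; applied after AVG it would have no effect
(SELECT l.z FROM lidar l WHERE ST_Within(l.geom, e.geom) LIMIT 100) s
) t;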
"it is necessary to check if each one of the 3000 million points (rows) is inside the area of each building and calculate the average of all the points found inside a building area."
This is exactly what a geometry index is for. You don't need to look at every point to get just the ones inside a building area. If you don't have the right index, such as one on lidar using gist (geom), then the lateral join query will also be awful.

Incorrect results returned by postgres

I ran the following commands in PostgreSQL 9.6:
./bin/createdb testSpatial
./bin/psql -d testSpatial -c "CREATE EXTENSION postgis;"
create table test(name character varying(250), lat_long character varying(90250), the_geom geometry);
\copy test(name,lat_long) FROM 'test.csv' DELIMITERS E'\t' CSV HEADER;
CREATE INDEX spatial_gist_index ON test USING gist (the_geom );
UPDATE test SET the_geom = ST_GeomFromText(lat_long,4326);
On running: select * from test; I get the following output:
name | lat_long | the_geom
------+----------+----------
A | POLYGON((-0.061225 -128.427791,-0.059107 -128.428264,-0.056311 -128.428911,-0.054208 -128.426510,-0.055431 -128.426324,-0.057363 -128.426124,-0.059315 -128.425843,-0.061225 -128.427791)) | 0103000020E61000000100000008000000D42B6519E258AFBFBE50C076B00D60C07DE9EDCF4543AEBFBC41B456B40D60C08063CF9ECBD4ACBFA1BC8FA3B90D60C07BF65CA626C1ABBF58AD4CF8A50D60C0BF805EB87361ACBFFFAF3A72A40D60C0B83A00E2AE5EADBF4D81CCCEA20D60C01F115322895EAEBF60C77F81A00D60C0D42B6519E258AFBFBE50C076B00D60C0
B | POINT(1.978165 -128.639779) | 0101000020E61000002D78D15790A6FF3F5D35CF11791460C0
(2 rows)
After this I ran a query to find all names that are within 5 meters of each other, using the following command:
testSpatial=# select s1.name, s2.name from test s1, test s2 where ST_DWithin(s1.the_geom, s2.the_geom, 5);
name | name
------+------
A | A
A | B
B | A
B | B
(4 rows)
To my surprise, I am getting incorrect output, as "A" and "B" are 227.301 km away from each other (calculated using the haversine distance here: http://andrew.hedges.name/experiments/haversine/). Can someone please help me understand where I am going wrong?
You have defined your geometry as follows
the_geom geometry
ie, it's not geography. But the ST_DWithin docs say
For Geometries: The distance is specified in units defined by the
spatial reference system of the geometries. For this function to make
sense, the source geometries must both be of the same coordinate
projection, having the same SRID.
For geography units are in meters and measurement is defaulted to
use_spheroid=true, for faster check, use_spheroid=false to measure
along sphere.
So you are actually searching for places that are within 5 degrees of each other. A degree is roughly equal to 111km so you are looking for places that are about 550 km from each other rather than 5 meters.
Additionally, it doesn't make much sense to store strings like POINT(1.978165 -128.639779) in your table. It's completely redundant: that information can be generated quite easily from the geometry column.
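The degree-versus-metre mismatch described above can be made concrete with a quick calculation. This is a minimal sketch in Python using the haversine formula on a sphere; the 6371 km mean radius and the function name are assumptions for illustration, and it is not what PostGIS computes internally.

```python
import math

# Mean Earth radius in kilometres (an assumed spherical approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Length of one degree of longitude along the equator.
one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)

# What a threshold of 5 means when the units are degrees rather than metres.
five_degrees_km = 5 * one_degree_km
```

With geometry in SRID 4326, a threshold of 5 therefore covers hundreds of kilometres at the equator, which is why every pair matched.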

Efficient sorted bounding box query

How would I create indexes in PostgreSQL 8.3 that would make a sorted bounding box query efficient? The table I'm querying has quite a few rows.
That is, I want to create indexes that make the following query as efficient as possible:
SELECT * FROM features
WHERE lat BETWEEN ? AND ?
AND lng BETWEEN ? AND ?
ORDER BY score DESC
The features table looks like this:
Column | Type |
------------+------------------------+
id | integer |
name | character varying(255) |
type | character varying(255) |
lat | double precision |
lng | double precision |
score | double precision |
html | text |
To create a GiST index on a point attribute so that we can efficiently use box operators on the result of the conversion function:
CREATE INDEX pointloc
ON points USING gist (box(location,location));
SELECT * FROM points
WHERE box(location,location) && '(0,0),(1,1)'::box;
http://www.postgresql.org/docs/9.0/static/sql-createindex.html
This is the example in 9.0 docs. It should work for 8.3 though as these are features that have been around for ages.
You could try using a GiST index to implement an R-Tree. This type of index is poorly documented, so you might have to trawl through example code in the source distribution.
(Note: My prior advice to use R-Tree indexes appears to be out of date; they are deprecated.)
Sounds like you'd want to take a look at PostGIS, a PostgreSQL module for spatial data types and queries. It supports quick lookups using GiST indexes. Unfortunately I can't guide you further as I haven't used PostGIS myself.