find number of polygons a line intersects - sql

I am working with SQL Server and spatial types, and have found them very interesting. I have reached a situation that I am not sure how to work out. Let me outline it:
I have 2 points (coordinates), e.g. 3,9 and 50,20 - joining them gives a straight line.
I have multiple polygons (50+).
I want to calculate how many polygons the above line passes through - that is, when I join the 2 coordinates, how many polygons does the line intersect? I want to work this out with a SQL query.
Please let me know if not clear - it is difficult to explain!

Based on your coordinates, I'm assuming geometry (as opposed to geography), but the approach should hold regardless. If you have a table dbo.Shapes that has a Shape column of type geometry and each row contains one polygon, this should do:
declare @line geometry = geometry::STLineFromText('LINESTRING(3 9, 50 20)', 0);
select count(*)
from dbo.Shapes as s
where s.shape.STIntersects(@line) = 1;
If you want to know which polygons intersect, just change the count(*) to something more appropriate.
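For example, a minimal sketch, assuming each row also has an identifier column such as ShapeId (adjust to your actual column name):
declare @line geometry = geometry::STLineFromText('LINESTRING(3 9, 50 20)', 0);
select s.ShapeId          -- assumed identifier column, not part of the original schema
from dbo.Shapes as s
where s.shape.STIntersects(@line) = 1;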


Find records which intersect a circle with a 2 mile radius around a given latitude and longitude using PostGIS?

I have a Postgres table with some data created by using a shapefile. I need to find all records which intersect within 2 miles radius from a given latitude and longitude. I used some queries including the following one.
SELECT * FROM us_census_zcta WHERE ST_INTERSECTS(geom,
CIRCLE(POINT(40.730610, -73.935242), 2));
But none of them worked. What I am doing wrong here? How can I get the results I need?
The SRID is 4269 (NAD 83).
EDIT: Mike Organek pointed out that I had switched the lat-long in the above query. I then tried a few things, and the following query gave me 1 record.
SELECT * FROM us_census_zcta WHERE ST_INTERSECTS(geom::geometry,
ST_SETSRID(ST_POINT(-73.935242, 40.730610), 4269)::geometry);
But how can I use a circle and find records which intersect within a 2-mile radius of that given lat-long?
What you're looking for is ST_DWithin, which checks whether two geometries are within a given distance of each other. In order to use it with miles, you should cast the geometry column to geography, as distances are then computed in metres:
For geography: units are in meters and distance measurement defaults to use_spheroid=true. For faster evaluation use use_spheroid=false to measure on the sphere.
SELECT * FROM us_census_zcta
WHERE
ST_DWithin(
geom::geography,
ST_SetSRID(ST_MakePoint(-73.935242, 40.730610), 4269)::geography,
3218.688); -- ~2 miles
Keep in mind that this cast might affect query performance if the indexes aren't set properly.
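If the filter needs index support, one option (a sketch, assuming the table above) is an expression-based GiST index on the geography cast:
CREATE INDEX us_census_zcta_geom_geog_idx
    ON us_census_zcta
    USING GIST ((geom::geography));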
See also: Getting all Buildings in range of 5 miles from specified coordinates

How to divide world into cells (grid)

How to divide the world into cells of almost equal size, such that each lat,lon can be mapped to a different cell?
I am pretty sure I've seen a library do this, labelling cells as S1, S2, etc.
Say we have 62.356279,-99.422395, how do we map it to a 2km*2km cell named "FR,23"?
Thank you!
PostGIS 3.1+
PostGIS 3.1 introduces very easy to use grid generators, namely ST_SquareGrid and ST_HexagonGrid. An easy way to use these functions with data from a table is to execute them in a LATERAL join, e.g. to create cells 0.1° in size.
Sample Data
Consider a polygon of the Isle of Man, stored in a table isle_of_man with a geom column.
Creating a square grid with a cell size of 0.1° over a given geometry:
SELECT grid.* FROM isle_of_man imn,
LATERAL ST_SquareGrid(0.1,imn.geom) grid;
If you only want the cells that intersect the geometry, just add ST_Intersects to the WHERE clause:
SELECT grid.* FROM isle_of_man imn,
LATERAL ST_SquareGrid(0.1,imn.geom) grid
WHERE ST_Intersects(imn.geom,grid.geom);
The same principle applies to ST_HexagonGrid:
SELECT grid.* FROM isle_of_man imn,
LATERAL ST_HexagonGrid(0.1,imn.geom) grid;
SELECT grid.* FROM isle_of_man imn,
LATERAL ST_HexagonGrid(0.1,imn.geom) grid
WHERE ST_Intersects(imn.geom,grid.geom);
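Since the original question asks for named cells: ST_SquareGrid also returns the cell's column/row indexes i and j, so a label in the spirit of "FR,23" can be derived from them. A rough sketch (the label format is purely illustrative):
SELECT format('S%s,%s', grid.i, grid.j) AS cell_id, grid.geom
FROM isle_of_man imn,
LATERAL ST_SquareGrid(0.1, imn.geom) grid
WHERE ST_Intersects(imn.geom, grid.geom);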
Older PostGIS versions
Inspired by this post I started writing a function to do just that - it still needs some tweaking, but it'll certainly give you a direction to look at. The following function creates a grid with cells of a given size covering the area of a given geometry:
CREATE OR REPLACE FUNCTION public.generate_grid(_size numeric, _geom geometry)
RETURNS TABLE(gid bigint, cell geometry) LANGUAGE 'plpgsql'
AS $BODY$
DECLARE
  _bbox box2d := ST_Extent(_geom);
  _ncol int := ceil(abs(ST_Xmax(_bbox)-ST_Xmin(_bbox))/_size);
  _nrow int := ceil(abs(ST_Ymax(_bbox)-ST_Ymin(_bbox))/_size);
  _srid int DEFAULT 4326;
BEGIN
  IF ST_SRID(_geom) <> 0 THEN
    IF EXISTS (SELECT 1 FROM spatial_ref_sys crs
               WHERE crs.srid = ST_SRID(_geom) AND NOT crs.proj4text LIKE '+proj=longlat%') THEN
      RAISE EXCEPTION 'Only lon/lat spatial reference systems are supported in this function.';
    ELSE
      _srid := ST_SRID($2);
    END IF;
  END IF;

  RETURN QUERY
  SELECT ROW_NUMBER() OVER (), geom FROM (
    SELECT
      ST_SetSRID(
        (ST_PixelAsPolygons(
          ST_AddBand(
            ST_MakeEmptyRaster(_ncol, _nrow, ST_XMin(_bbox), ST_YMax(_bbox), _size),
            '1BB'::text, 1, 0),
          1, false)).geom, _srid)) j(geom);
END;
$BODY$;
Note: This function relies on the extension PostGIS Raster.
SELECT cell FROM isle_of_man,
LATERAL generate_grid(0.1,geom);
... if you're only interested in cells that overlap your polygon, add an ST_Intersects filter to the query:
SELECT cell FROM isle_of_man,
LATERAL generate_grid(0.1,geom)
WHERE ST_Intersects(geom,cell);
Other alternatives
Mike's fishnet function does basically the same, but you'd need to manually provide the number of rows and columns, and the coordinate pair of the lower left corner:
SELECT ST_SetSRID(cells.geom,4326)
FROM ST_CreateFishnet(4, 6, 0.1, 0.1,-4.8411, 54.0364) AS cells;
You could use this makegrid_2d function to create a grid over an area using a polygon, e.g. a grid with cells of 5000 meters in size:
CREATE TABLE grid_isle_of_man AS
SELECT 'S'||ROW_NUMBER() OVER () AS grid_id, (g).geom
FROM (
SELECT ST_Dump(makegrid_2d(geom,5000))
FROM isle_of_man) j(g)
JOIN isle_of_man ON ST_Intersects((g).geom,geom);
The same logic applies for this hexagrid function. It creates a hexagon grid with fixed sized cells over a given BBOX. You can either manually provide the BBOX (function's second parameter) or extract it from a given polygon. For instance, to create a hexagrid that matches the polygon's extent and store it in a new table with the label you want - with cells of 0.1° in size:
CREATE TABLE hexgrid_isle_of_man AS
WITH j (hex_rec) AS (
SELECT generate_hexagons(0.1,ST_Extent(geom))
FROM isle_of_man
)
SELECT 'S'||ROW_NUMBER() OVER () AS grid_id,(hex_rec).hexagon FROM j
JOIN isle_of_man t ON ST_Intersects(t.geom,(hex_rec).hexagon);
Further reading:
Import World Shapefile
Waiting for PostGIS 3.1: Grid Generators, by Paul Ramsey
Jim's answer is excellent. There are use cases, though, where you don't need the actual geometries - where, as you've mentioned, all you need is for coordinates in the same cell to map to the same code. So instead of a costly point-in-polygon operation that takes O(n) for n polygons (without an index, that is ;)), you call a function that simply evaluates a formula transforming a coordinate into a code in O(1). Very handy for aggregating data spatially, fast.
Personally, I love Uber's H3 library for that sort of thing, but I'm sure S2 does something similar and does it well. There is a well-maintained PostgreSQL binding for H3 and a simple aggregation example would look something like this:
SELECT h3_geo_to_h3(geom_4326, 9) AS h3res09, SUM(pop_19) AS pop_19
FROM uk_postcode_population
GROUP BY 1;
Read: Sum up the postcode-level population of the UK for each resolution 9 hexagon.
You can still create the actual hexagon geometries when you need them (whole hex grids even). But in my experience, once you commit to the grid approach, you will only need polygons for visualisation in the very end.
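For instance, a sketch assuming the h3-pg binding exposes a boundary helper such as h3_to_geo_boundary_geometry (the exact name depends on the h3-pg version; newer releases rename it h3_cell_to_boundary_geometry):
SELECT agg.h3res09,
       agg.pop_19,
       h3_to_geo_boundary_geometry(agg.h3res09) AS geom  -- function name is version-dependent
FROM (
    SELECT h3_geo_to_h3(geom_4326, 9) AS h3res09, SUM(pop_19) AS pop_19
    FROM uk_postcode_population
    GROUP BY 1
) agg;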
I should note that you can't divide the world yourself using this library - Uber has already divided it for you. So if 2km squares are a hard requirement, this is not for you.
Installing the H3 extension is not as straightforward as CREATE EXTENSION postgis, but you don't need to be a command line warrior either. You will, at the very least, have to install PGXN, most likely PostgreSQL's extension build tooling, and the extension itself.
Someone asked for a point-in-polygon example in the comments. This is not exactly related to the question, but highlights how one would use H3.
Polygon layer prep:
CREATE TABLE poly_h3 AS (
SELECT id, h3_polyfill(geom_4326, 13) AS h3res13
FROM poly
);
CREATE INDEX ON poly_h3 (h3res13);
Point layer prep:
ALTER TABLE points ADD COLUMN h3res13 h3index;
UPDATE points
SET h3res13 = h3_geo_to_h3(geom_p_4326, 13);
CREATE INDEX ON points (h3res13);
Count points per polygon:
SELECT poly.id, COUNT(*) AS n
FROM poly_h3 AS poly
INNER JOIN points AS x
ON poly.h3res13 = x.h3res13
GROUP BY poly.id;

Closest position between randomly moving objects

I have a large database table that contains grid references (X and Y) associated with various objects (each with a unique object identifier) as they move with time. The objects move at approximately constant speed but in random directions.
The table looks something like this….
CREATE TABLE positions (
objectId INTEGER,
x_coord INTEGER,
y_coord INTEGER,
posTime TIMESTAMP);
I want to find which two objects got closest to each other and at what time.
Finding the distance between two fixes is relatively easy – simple Pythagoras for the differences between the X and Y values should do the trick.
The first problem seems to be one of volume. The grid itself is large, 100,000 possible X co-ordinates and a similar number of Y co-ordinates. For any given time period the table might contain 10,000 grid reference positions for 1000 different objects – 10 million rows in total.
That’s not in itself a large number, but I can’t think of a way of avoiding doing a ‘product query’ to compare every fix to every other fix. Doing this with 10 million rows will produce 100 million million results.
The next issue is that I’m not just interested in the closest two fixes to each other, I’m interested in the closest two fixes from different objects.
Another issue is that I need to match time as well as position – I’m not just interested in two objects that have visited the same grid square, they need to have done so at the same time.
The other point (which may not be relevant) is that the items are unlikely to ever occupy exactly the same location at the same time.
I've got as far as a simple product query with a few sample rows, but I'm not sure of my next steps. I'm beginning to think this isn't something I can pull off with a single SQL query (please prove me wrong), and I'm likely to have to extract the data and subject it to some procedural programming.
Any suggestions?
I'm not sure which SE forum this is best suited for - database SQL? Programming? Maths?
UPDATE - Another issue to add to the complexity: the timestamping for each object and position is irregular; one item might have a position recorded at 14:10:00 and another at 14:10:01. If these two positions are right next to each other and one second apart, then they may actually represent the closest approach even though the times don't match!
In order to reduce the number of tested combinations you should segregate them by postime using subqueries. It's also recommended that you create an index on postime to increase performance.
create index ix1_time on positions (postime);
Since you didn't mention any specific database, I assumed PostgreSQL, since it's easy (for me) to use. The solution should look like this:
with t as (
  select distinct postime as pt from positions
)
select *
from t,
lateral (
  select
    a.objectid as aid, b.objectid as bid,
    a.x_coord + a.y_coord + b.x_coord + b.y_coord as dist -- fix here!
  from positions a
  join positions b on b.postime = a.postime
  where a.postime = t.pt
    and a.objectid <> b.objectid
  order by dist asc
  limit 1
) y;
This SQL compares positions against each other only when they share the same postime value; it never tests combinations across different postime values, which is what keeps the total manageable.
Please note: I used a.x_coord + a.y_coord + b.x_coord + b.y_coord as a placeholder distance formula. I leave the correct one for you to implement here.
With roughly 1,000 objects per timestamp that is about one million pairings per distinct postime; summed over all the distinct postime values in the 10-million-row table it still amounts to billions of comparisons, but the query returns only the closest pair for each postime, i.e. one row per distinct postime value.
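For reference, a minimal sketch of the Euclidean distance expression that could replace the placeholder (the square root can even be dropped when only ranking by distance):
select sqrt( power(a.x_coord - b.x_coord, 2)
           + power(a.y_coord - b.y_coord, 2) ) as dist
from positions a
join positions b on b.postime = a.postime
               and b.objectid <> a.objectid
limit 1;  -- illustrative only; plug the expression into the query above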

SQL Cross Apply Performance Issues

My database has a directory of about 2,000 locations scattered throughout the United States with zipcode information (which I have tied to lon/lat coordinates).
I also have a table function which takes two parameters (ZipCode & Miles) to return a list of neighboring zip codes (excluding the same zip code searched)
For each location I am trying to get the neighboring location ids. So if location #4 has three nearby locations, the output should look like:
4 5
4 24
4 137
That is, locations 5, 24, and 137 are within X miles of location 4.
I originally tried to use a cross apply with my function as follows:
SELECT A.SL_STORENUM,A.Sl_Zip,Q.SL_STORENUM FROM tbl_store_locations AS A
CROSS APPLY (SELECT SL_StoreNum FROM tbl_store_locations WHERE SL_Zip in (select zipnum from udf_GetLongLatDist(A.Sl_Zip,7))) AS Q
WHERE A.SL_StoreNum='04'
However, that ran for over 20 minutes with no results, so I canceled it. I did try hardcoding the zipcode, and it immediately returned a list:
SELECT A.SL_STORENUM,A.Sl_Zip,Q.SL_STORENUM FROM tbl_store_locations AS A
CROSS APPLY (SELECT SL_StoreNum FROM tbl_store_locations WHERE SL_Zip in (select zipnum from udf_GetLongLatDist('12345',7))) AS Q
WHERE A.SL_StoreNum='04'
What is the most efficient way of accomplishing this listing of nearby locations? Keep in mind that while I used "04" as an example here, I want to run the analysis for all 2,000 locations.
The "udf_GetLongLatDist" is a function which uses some math to calculate distance between two geographic coordinates and returns a list of zipcodes with a distance of > 0. Nothing fancy within it.
When you use the function you probably have to calculate every single possible distance for each row; that is why it takes so long. Since the actual physical locations don't generally move, what we always did was precalculate the distance from each zipcode to every other zipcode (and update it only once a month or so, when we added new possible zipcodes). Once the distances are precalculated, all you have to do is run a query like:
select zip2 from zipprecalc where zip1 = '12345' and distance <=10
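A sketch of the pre-calculation step, assuming a source table zips(zip, lat, lon) with one row per zipcode (table and column names are illustrative), with distances in miles via the spherical law of cosines:
SELECT z1.zip AS zip1,
       z2.zip AS zip2,
       3959 * ACOS(
           COS(RADIANS(z1.lat)) * COS(RADIANS(z2.lat))
         * COS(RADIANS(z2.lon) - RADIANS(z1.lon))
         + SIN(RADIANS(z1.lat)) * SIN(RADIANS(z2.lat))
       ) AS distance
INTO zipprecalc
FROM zips AS z1
JOIN zips AS z2
  ON z2.zip <> z1.zip;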
We have something similar and optimized it by only calculating the distance for other zipcodes whose latitude is within a bounded range. So if you want other zips within @miles, you use a clause like:
where latitude >= @targetLat - (@miles/69.2) and latitude <= @targetLat + (@miles/69.2)
Then you are only calculating the great circle distance of a much smaller subset of other zip code rows. We found this fast enough in our use to not require precalculating.
The same thing can't be done for longitude because of the variation between equator and pole of what distance a degree of longitude represents.
Other answers here involve re-working the algorithm. I personally advise the pre-calculated map of all zipcodes against each other. It should be possible to embed such optimisations in your existing udf, to minimise code-changes.
A refactoring of the query, however, could be as follows...
SELECT
A.SL_STORENUM, A.Sl_Zip, C.SL_STORENUM
FROM
tbl_store_locations AS A
CROSS APPLY
dbo.udf_GetLongLatDist(A.Sl_Zip,7) AS B
INNER JOIN
tbl_store_locations AS C
ON C.SL_Zip = B.zipnum
WHERE
A.SL_StoreNum='04'
Also, the performance of the CROSS APPLY will benefit greatly if you can ensure that the udf is INLINE rather than MULTI-STATEMENT. This allows the udf to be expanded inline (macro like) for a much cleaner execution plan.
Doing so would also allow you to return additional fields from the udf. The optimiser can then include or exclude those fields from the plan depending on whether you actually use them. Such an example would be to include the SL_StoreNum if it's easily accessible from the query in the udf, and so remove the need for the last join...
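To illustrate the inline form, a hypothetical sketch that also embeds the pre-calculated lookup table suggested earlier (the function and table names are assumptions, not your actual schema):
CREATE FUNCTION dbo.udf_GetLongLatDist_Inline (@zip VARCHAR(10), @miles FLOAT)
RETURNS TABLE
AS
RETURN
(
    -- a single SELECT, so the optimiser can expand the function inline
    SELECT p.zip2 AS zipnum
    FROM zipprecalc AS p
    WHERE p.zip1 = @zip
      AND p.distance <= @miles
);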

Optimizing Sqlite query for INDEX

I have a table of 320,000 rows which contains lat/lon coordinate points. When a user selects a location, my program gets the coordinates of the selected location and executes a query which returns all the points from the table that are nearby. This is done by calculating the distance between the selected point and each coordinate point in the table. This is the query I use:
select street from locations
where ( ( (lat - (-34.594804)) *(lat - (-34.594804)) ) + ((lon - (-58.377676 ))*(lon - (-58.377676 ))) <= ((0.00124)*(0.00124)))
group by street;
As you can see the WHERE clause is a simple Pythagoras formula to calculate the distance between two points.
Now my problem is that I can not get an INDEX to be usable. I've tried with
CREATE INDEX indx ON location(lat,lon)
also with
CREATE INDEX indx ON location(street,lat,lon)
with no luck. I've noticed that when there is a math operation on lat or lon, the index is not being used. Is there any way I can optimize this query to use an INDEX so as to gain speed?
Thanks in advance!
The problem is that the SQL engine needs to evaluate all the records to do the comparison (WHERE ... <= ...) and filter the points, so the indexes don't speed up the query.
One approach to solving the problem is to compute a minimum and maximum latitude and longitude to restrict the number of records.
Here is a good link to follow: Finding Points Within a Distance of a Latitude/Longitude
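A sketch of that idea applied to the original query: the BETWEEN conditions are sargable, so the (lat, lon) index can narrow the candidate rows before the exact distance test runs (0.00124 is reused here as the search radius in degrees):
SELECT street FROM locations
WHERE lat BETWEEN -34.594804 - 0.00124 AND -34.594804 + 0.00124
  AND lon BETWEEN -58.377676 - 0.00124 AND -58.377676 + 0.00124
  AND ( (lat - (-34.594804)) * (lat - (-34.594804))
      + (lon - (-58.377676)) * (lon - (-58.377676)) ) <= 0.00124 * 0.00124
GROUP BY street;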
Did you try adjusting the page size? A table like this might gain from having a different (i.e. the largest?) available page size.
PRAGMA page_size = 32768;
Or any power of 2 between 512 and 32768. If you change the page_size, don't forget to vacuum the database (assuming you are using SQLite 3.5.8. Otherwise, you can't change it and will need to start a fresh new database).
Also, running the operation on floats might not be as fast as running it on integers (a big maybe), so you might gain speed if you store all your coordinates multiplied by 1,000,000.
Finally, Euclidean distance will not yield very accurate proximity results. The further you get from the equator, the more the circle around your point will flatten to resemble an ellipse. There are fast approximations which are not as calculation-intensive as a great circle distance calculation (avoid that at all costs!).
You should search in a square instead of a circle. Then you will be able to optimize.
Surely you have a primary key in locations? Probably called id?
Why not just select the id along with the street?
select id, street from locations
where ( ( (lat - (-34.594804)) *(lat - (-34.594804)) ) + ((lon - (-58.377676 ))*(lon - (-58.377676 ))) <= ((0.00124)*(0.00124)))
group by street;