Determine "reach" by geographic distribution - sql

I have a large collection of checkins for products manufactured at a distinct geographic location. I'd like to create a summary metric used to rank these products by how far, globally, they have traveled from their point of origin. For example, a product produced in Maine that is found in California, Florida, and Dublin, Ireland should rank higher than a product made in California that hasn't been seen outside of California.
What kind of algorithms should I be looking at? How would you approach this?

MS SQL Server (which I've just spotted may not be relevant to you) includes spatial data types that allow you to calculate (among other things) the distance between two points defined by their latitude and longitude. So this code:-
DECLARE @p1 geography = geography::Point(@lat1, @long1, 4326);
SELECT @distance = @p1.STDistance(geography::Point(@lat2, @long2, 4326));
would load @distance with the distance in metres between the two points. I lifted the code from a scalar-valued inline function that I wrote, but it could equally target table columns directly. The 4326 magic number is the Spatial Reference System Identifier (SRID) for WGS 84, which gives answers in metres. This calculation doesn't take altitude into account (other functions/SRIDs are available), and it is only an approximation of the true distance over the globe, but it's probably accurate enough for most purposes.
Unfortunately, if you are restricted to PostgreSQL, this answer is of no direct use (though it may point you in a direction for further investigation).
A reference for SQL Server can be found here: http://technet.microsoft.com/en-us/library/bb933790.aspx
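Outside the database, the same idea can be sketched in a few lines of Python. This is only an illustration of one possible "reach" metric, not something from the answer above: the `reach` function, the choice of (max, mean) distance as the score, and the spherical Earth radius are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (assumption: sphere, not the WGS 84 ellipsoid)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def reach(origin, checkins):
    """Hypothetical score: (max, mean) distance in metres of the checkins from the origin."""
    distances = [haversine_m(origin[0], origin[1], lat, lon) for lat, lon in checkins]
    if not distances:
        return (0.0, 0.0)
    return (max(distances), sum(distances) / len(distances))


# Approximate coordinates, for illustration only.
maine = reach((43.66, -70.26), [(34.05, -118.24), (25.76, -80.19), (53.35, -6.26)])
california = reach((37.77, -122.42), [(34.05, -118.24), (32.72, -117.16)])
print(maine[0] > california[0])
```

Sorting products by either element of the tuple gives a ranking; which aggregate (maximum, mean, or something like a percentile) best captures "reach" depends on how much weight a single far-flung checkin should carry.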

Related

Checking if a Coordinate is Within a Range - BigQuery GIS

I'm looking at the freely available Solar potential dataset on Google BigQuery that may be found here: https://bigquery.cloud.google.com/table/bigquery-public-data:sunroof_solar.solar_potential_by_censustract?pli=1&tab=schema
Each record on the table has the following border definitions:
lat_max - maximum latitude for that region
lat_min - minimum latitude for that region
lng_max - maximum longitude for that region
lng_min - minimum longitude for that region
Now I have a coordinate (lat/lng pair) and I would like to query to see whether or not that coordinate is within the above range. How do I do that with BQ Standard SQL?
I've seen the Geo Functions here: https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions
But I'm still not sure how to write this query.
Thanks!
Assuming the points are just latitude and longitude as numbers, why can't you just do a standard numerical comparison?
Note: The first link doesn't work without a google account, so I can't see the data.
But if you want to go spatial, you're going to need to take the border coordinates that you have and turn them into a polygon using one of ST_MAKEPOLYGON, ST_GEOGFROMGEOJSON, or ST_GEOGFROMTEXT. Then create a point from the coordinates you wish to test using ST_GEOGPOINT.
Now that you have two geographies, you can compare them using ST_INTERSECTS or ST_DISJOINT, depending on what outcome you want.
If you want to get fancy and see how far away from the border you are (which I guess means more efficient?), you can use ST_DISTANCE.
Agree with Jonathan: just checking whether each of the lat/lon values is within the bounds is the simplest way to achieve it (unless there are any issues around the antimeridian, but most likely you can just ignore them).
If you do want to use Geography objects for that, you can construct Geography objects for these rectangles, using
ST_MakePolygon(ST_MakeLine(
[ST_GeogPoint(lng_min, lat_min), ST_GeogPoint(lng_max, lat_min),
ST_GeogPoint(lng_max, lat_max), ST_GeogPoint(lng_min, lat_max),
ST_GeogPoint(lng_min, lat_min)]))
And then check whether the point is within a particular rectangle using
ST_Intersects(ST_GeogPoint(lon, lat), <polygon-above>)
But it will likely be slower and would not provide any benefit for this particular case.
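The plain numerical comparison that both answers recommend amounts to four inequalities. A minimal Python sketch of that logic follows; the function name `in_bounds` is hypothetical, and treating `lng_min > lng_max` as a box that crosses the antimeridian is an assumption about how such rows would be encoded, not something the dataset documents.

```python
def in_bounds(lat, lng, lat_min, lat_max, lng_min, lng_max):
    """Return True when (lat, lng) lies inside the box, edges included."""
    if not (lat_min <= lat <= lat_max):
        return False
    if lng_min <= lng_max:
        # Ordinary box that does not cross the antimeridian.
        return lng_min <= lng <= lng_max
    # Assumption: lng_min > lng_max means the box wraps across +/-180 degrees.
    return lng >= lng_min or lng <= lng_max


print(in_bounds(37.5, -122.3, 37.0, 38.0, -123.0, -122.0))
```

In SQL the same test is a WHERE clause with BETWEEN on each of the two coordinates.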

What is SRID 0 for geometry columns?

So I added geometry columns to a spatial table and using some of the msdn references I ended up specifying the SRID as 0 like so:
update dbo.[geopoint] set GeomPoint = geometry::Point([Longitude], [Latitude], 0)
However, I believe this was a mistake, but before having to update the column: is 0 actually the default, equivalent to 4326? The query works as long as I specify the SRID as 0 in the query, but I'm getting weird results in comparison to the geography field I have... SRID 0 does not exist in sys.spatial_reference_systems and I haven't been able to dig up any information on it. Any help would be appreciated.
A SRID of 0 doesn't technically exist; it just means no SRID, i.e. the default if you forget to set it. So, technically, you can still perform distance, intersection and all other queries, so long as both sets of geometries have a SRID of 0. If you have one field of geometries with a SRID of 0 and another set with a SRID that actually exists, you will most likely get very strange results. I remember scratching my head once when not getting any results from a spatial query in exactly this situation, and SQL Server did not complain, just 0 results (for what it is worth, PostGIS will actually fail, with a warning about non-matching SRIDs).
In my opinion, you should always explicitly set the SRID of your geometries (or geographies, which will almost always be 4326), as not only does it prevent strange query results, but it means you can convert from one coordinate system to another. Being able to convert on the fly from lat/lon (4326) to Spherical Mercator (3857), as used in Google Maps/Bing, which is in meters, or to some local coordinate system, such as 27700, British National Grid, also in meters, can be very useful. SQL Server does not, to my knowledge, support conversion from one SRID to another, but as spatial types are essentially CLR types, there are .NET libraries available should you ever need to do so; see Transform/ Project a geometry from one SRID to another for an example.
If you do decide to change your geometries, you can do something like:
UPDATE your_table SET newGeom = geometry::STGeomFromWKB(oldGeom.STAsBinary(), SRID);
which populates a new column, or, to do it in place:
UPDATE your_table SET geom.STSrid = 4326;
where 4326 is just an example SRID.
There is a good reference for SRIDs at http://spatialreference.org/, though this is essentially the same information as you find in sys.spatial_reference_systems.
SRIDs are a way to take into account that the distances you're measuring aren't on a flat, infinite plane but rather on an oblate spheroid. They make sense for the geography data type, but not for geometry. So, if you're doing geographic calculations (as your statement "in comparison to the geography field I have" suggests), create geography points instead of geometry points. In order to do calculations on any geospatial data (like "find the distance from this point to this other point"), the SRID of all the objects involved needs to be the same.
TL;DR: Is the point on the Cartesian plane? Use geometry. Is the point on the globe? Use geography.
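The "weird results" when comparing a geometry column with a geography column can be illustrated with a small Python sketch. It is not SQL Server's actual implementation: `planar_distance` stands in for what a Cartesian calculation on raw longitude/latitude values would yield, and `great_circle_m` is a spherical approximation with an assumed mean Earth radius.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: spherical Earth with the mean radius


def planar_distance(x1, y1, x2, y2):
    """Cartesian distance; with lon/lat inputs the result is in 'degrees', not metres."""
    return math.hypot(x2 - x1, y2 - y1)


def great_circle_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


# One degree of longitude at the equator and at 60 degrees north.
flat_equator = planar_distance(0.0, 0.0, 1.0, 0.0)
flat_north = planar_distance(0.0, 60.0, 1.0, 60.0)
globe_equator = great_circle_m(0.0, 0.0, 0.0, 1.0)
globe_north = great_circle_m(60.0, 0.0, 60.0, 1.0)
print(flat_equator, flat_north, globe_equator, globe_north)
```

On the plane both pairs are the same distance apart, while on the globe the pair at 60 degrees north is only about half as far apart as the pair at the equator, which is why the two types disagree for the same coordinates.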

spatial data distance search - optimization options

Our business user loves for our searches to be done by distance; the problem is that we have over 1 million records with a lat/long location. We are using SQL 2008, but whenever we order or restrict our searches by distance, the queries take way too long (30 seconds plus). This is unacceptable; there has got to be a better way to do this. We have done everything we can with SQL 2008 and want to upgrade to 2012 if we can at some point.
I ask though, if there is another technology or optimization that we could apply. Could we switch to a different DB for faster performance, a different search algorithm to apply, estimation algorithm, tree, grids, pre-computation, etc?
A solution that might be useful here would be to break your search into two parts:
1) Run a query where you find all records that are within a certain value + or - of the current lat/lng of your location, the where clause might look like:
where (@latitude > (lat - .001) and @latitude < (lat + .001)) and (@longitude > (lng - .001) and @longitude < (lng + .001))
Using this approach, and especially with an index on both the latitude and longitude columns, you can very quickly define a working set of locations within a specified distance.
2) With the rough results from step 1, use the great circle/haversine method to determine what the actual distance between the source location and each point is.
Where this approach falls over is if there is never any limit to the radius that you are searching, but it works great if you are for instance looking to find all locations within a specific distance of a given point.
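The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `within_radius` is hypothetical, the Earth is treated as a sphere, the box ignores wrap-around at the antimeridian, and in a real system step 1 would be the indexed WHERE clause rather than a list comprehension.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: spherical Earth with the mean radius


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two lat/lng points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def within_radius(center, points, radius_m):
    """Return (distance_m, point) pairs within radius_m of center, nearest first."""
    lat0, lng0 = center
    r = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(r)
    cos_lat = math.cos(math.radians(lat0))
    ratio = math.sin(r) / cos_lat if cos_lat > 0 else 1.0
    # Near a pole the box spans every longitude.
    dlng = 180.0 if ratio >= 1.0 else math.degrees(math.asin(ratio))

    # Step 1: cheap bounding-box filter (no antimeridian handling in this sketch).
    rough = [(lat, lng) for lat, lng in points
             if abs(lat - lat0) <= dlat and abs(lng - lng0) <= dlng]

    # Step 2: haversine distance on the survivors only.
    exact = [(haversine_m(lat0, lng0, lat, lng), (lat, lng)) for lat, lng in rough]
    return sorted((d, p) for d, p in exact if d <= radius_m)


# Approximate coordinates, for illustration only.
london = (51.5074, -0.1278)
places = [(48.8566, 2.3522), (51.5155, -0.0922)]
print(within_radius(london, places, 5_000))
```

Deriving the box size from the search radius, instead of a fixed .001 degrees, keeps the rough filter from discarding points that the exact check would have accepted.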

Finding posts where position distance is less or equal to dynamic value

I need help with an architectural problem that I'm working on. The user enters a position and a radius (i.e. a distance). The software searches a giant database table (a couple of 100k posts) for posts where the distance between the user's location and the post is less than the entered distance.
It's kind of hard for me to explain, but imagine a table with two posts, point A and point C, where point U is the user location. The user has entered a position and a radius, and the position and radius for A and C are predefined (stored in a database).
In this case I would only be interested in point A, because the two areas intersect. How should I do this efficiently in a database with a couple of hundred thousand posts? In the database I shall store longitude, latitude and radius.
Depends on which database server you're using, but look into the GIS capabilities that might be included. For example, MS SQL Server 2008 has a built-in geometry type, and PostgreSQL has PostGIS. Oracle has something like this too. Anyhow, these native GIS formats come with spatial querying functions that do the sort of thing you're talking about: searching for matches within given distances, etc. It is pretty simple to accomplish once you switch to the proper datatype.
edit
Since you're using SQL 2008, and your data is lat/long, I suggest the "geography" rather than the "geometry" datatype. Take a look here: http://msdn.microsoft.com/en-us/library/cc280766.aspx
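Whatever datatype is used, the test described in the question boils down to this: two circles intersect when the distance between their centres is no greater than the sum of their radii. A minimal Python sketch of that check follows; the function names, the tuple layout of a post, and the spherical distance formula are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: spherical Earth with the mean radius


def distance_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def intersecting_posts(user_lat, user_lon, user_radius_m, posts):
    """posts is a list of (name, lat, lon, radius_m); return names whose area meets the user's."""
    return [name for name, lat, lon, radius_m in posts
            if distance_m(user_lat, user_lon, lat, lon) <= user_radius_m + radius_m]


# Hypothetical data: A is about 1.1 km from the user, C about 7.8 km.
posts = [("A", 59.34, 18.07, 500.0), ("C", 59.40, 18.07, 500.0)]
print(intersecting_posts(59.33, 18.07, 1_000.0, posts))
```

With a few hundred thousand rows, a check like this would normally run only on candidates that have already passed an indexed filter, rather than on the whole table.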

SQL Server units question

This may be a really dumb question, but...
What units does Geography.STLength return? The official MSDN page doesn't say anything about the units returned, and this blog entry here says STLength() returns a float indicating the length of the instance in units. Yes, that's right, it says it returns it in units.
Can anyone shed some light on what units STLength returns? Feet? Meters? Inches? Help!
The units are entirely dependent on the Spatial Reference ID (SRID) of the geography/geometry data being used. By convention, you would generally use an SRID of "0" for geometry types if all the data is in the same unit system.
However, usually the geography type uses an SRID of 4326, which is the reference ID of the latitude/longitude ellipsoidal earth coordinate system known as WGS 84. When you specify point coordinates in this system, it is in degrees of angle of latitude and longitude, rather than some distance from an origin. Length and area calculations on points in this reference system will return completely different results from geometric calculations on the exact same point positions (for a great example see Differences between Geography and Geometry here, and as for why this happens, see here).
So if your data columns were created with an SRID of "0", then the system is defined to be unitless and you would need some metadata about the data model to figure out the units. If they were defined with a real SRID, then you can use this query:
SELECT spatial_reference_id
, well_known_text
, unit_of_measure
, unit_conversion_factor
FROM sys.spatial_reference_systems
to check what units the SRID represents. Most are in metres, but a few are in feet.