Query geospatial data with BigQuery - google-bigquery

Hi, I would like to obtain a list of neighbouring public locations (restaurants, hotels, cinemas, etc.) based on GPS coordinates. Is this possible with BigQuery?

If you have lat-lon or GPS coordinates as columns, you could definitely grab rectangular regions from BigQuery using WHERE comparisons on the coordinates and then aggregate on the selected rows.
The scalar operations available in BigQuery are pretty powerful too -- you can add a variety of arithmetic functions to your query and still get excellent performance.
The linked page lists example queries:
Return a collection of points within a rectangular bounding box centered around San Francisco (37.46, -122.50).
Return a collection of up to 100 points within an approximated circle determined using the Spherical Law of Cosines, centered around Denver, Colorado (39.73, -104.98).
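To make the two example queries a bit more concrete, here is a minimal Python sketch of the same two filters. It is not the SQL from the linked page; the function names, the 6371 km mean Earth radius, and the box half-sizes are my own assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, an approximation


def in_bounding_box(lat, lon, center_lat, center_lon, half_height_deg, half_width_deg):
    """Return True if (lat, lon) lies inside a rectangle centered on the given point."""
    return (abs(lat - center_lat) <= half_height_deg
            and abs(lon - center_lon) <= half_width_deg)


def law_of_cosines_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km using the Spherical Law of Cosines."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Clamp to [-1, 1] so floating-point drift cannot break acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


def within_radius(points, center_lat, center_lon, radius_km, limit=100):
    """Return up to `limit` (lat, lon) points inside the approximated circle."""
    hits = [p for p in points
            if law_of_cosines_km(p[0], p[1], center_lat, center_lon) <= radius_km]
    return hits[:limit]
```

In BigQuery itself the same logic would go into WHERE comparisons and arithmetic functions on the coordinate columns, as described above.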

GCP announced new geospatial data types and functions with BigQuery GIS. The new functions and data types follow the SQL/MM Spatial standard and will be familiar to PostGIS users and anyone already doing geospatial analysis in SQL.
A new lightweight visualization tool, BigQuery Geo Viz, was also announced; it is designed for BigQuery users who want to plot and style their geospatial query results on a map.
The implementation is currently in alpha; you can request access.
More details can be found on the GCP Blog.

Related

Calculate driving distance between origin and destination using longitude and latitude natively in BigQuery?

I would like to calculate driving distance between two points writing SQL in Google BigQuery. I understand there is a method to calculate linear distance or "bird" miles using the following function: ROUND(ST_DISTANCE(ST_GEOGPOINT(C.LONGITUDE, C.LATITUDE), ST_GEOGPOINT(B.LNG_NBR, B.LAT_NBR))/1609.34,2) AS LINEAR_DIST_MILES
However, I am interested in driving distance instead of a linear distance. Is there a way to do this natively in Google BigQuery without needing to hit a Google Map API? I've also explored some solutions in R but that requires a Google Maps API key.
You would need two parts:
1- a good roads dataset
2- a routing algorithm
BigQuery public datasets include OpenStreetMap, which is a reasonable dataset of roads (and other types of information) in most areas. There is also the TIGER dataset (bigquery-public-data.geo_us_roads), which is US-specific.
Carto provides a set of UDFs that can be used for routing. They've published an article on how to connect things together:
https://carto.com/blog/how-to-do-route-optimization-at-scale-with-carto-bigquery/
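To give a feel for what the "routing algorithm" part means, here is a minimal Python sketch of Dijkstra's shortest-path algorithm on a toy road graph. This is not Carto's UDF or anything BigQuery-specific; the graph, node names, and segment lengths are made up for illustration, and a real setup would build the graph from road segments in one of the datasets above.

```python
import heapq

# Hypothetical toy road network: node -> list of (neighbor, segment length in km).
TOY_ROADS = {
    "A": [("B", 2.0), ("C", 5.0)],
    "B": [("C", 1.0), ("D", 4.0)],
    "C": [("D", 1.0)],
    "D": [],
}


def shortest_path(graph, source, target):
    """Dijkstra: return (total_length, path), or (inf, []) if target is unreachable."""
    best = {source: 0.0}   # shortest known distance to each node
    previous = {}          # back-pointers for rebuilding the path
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, length in graph.get(node, []):
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path
```

Driving distance from "A" to "D" would then be `shortest_path(TOY_ROADS, "A", "D")`. One-way streets fit naturally because the edges are directed.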

Filtering out GPS coordinates that are within same radius

I have a list with nearly 100,000 GPS coordinates in lat/long format in a CSV file. A lot of these are only a few inches away from each other, so I would like to merge them somehow, or filter those out that are too close together within a certain radius.
Do you guys know of a script or a service that can do this automatically?
There is a reference to a nice paper that explains how to find nearest lat/lng points inside a specified bounding box in another thread, which you can find here: latitude/longitude find nearest latitude/longitude - complex sql or complex calculation
Here is the direct link to the paper: Geo Distance Search with MySQL
I think you can adapt the idea from the paper to your domain in order to set up a filter procedure.
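If you would rather script it yourself, here is a minimal Python sketch of one possible filter procedure: keep a point only if no already-kept point lies within the radius, using a coarse grid so that each point is compared against nearby cells only. This is my own illustration, not the method from the paper. It assumes a spherical Earth with a 6371 km radius and data that stays away from the poles and the antimeridian.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def thin_points(points, radius_m):
    """Greedily keep (lat, lon) points that are at least radius_m from every kept point."""
    if not points:
        return []
    # Grid cells are at least radius_m tall and wide, so any point closer than
    # radius_m must sit in the same cell or one of its 8 neighbors.
    cell_lat = math.degrees(radius_m / EARTH_RADIUS_M)
    max_abs_lat = max(abs(lat) for lat, _ in points)
    cell_lon = cell_lat / max(math.cos(math.radians(max_abs_lat)), 1e-6)
    grid = {}
    kept = []
    for lat, lon in points:
        row, col = math.floor(lat / cell_lat), math.floor(lon / cell_lon)
        too_close = any(
            haversine_m(lat, lon, klat, klon) < radius_m
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            for klat, klon in grid.get((row + dr, col + dc), [])
        )
        if not too_close:
            kept.append((lat, lon))
            grid.setdefault((row, col), []).append((lat, lon))
    return kept
```

The result depends on input order (the first point of each cluster wins). If you want merged positions instead, you could average the dropped points into the kept one.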

Using SQL spatial data queries to calculate distance from two different projections

So, I've got some data that has longitude and latitude. I don't know what projection those are from. I've got some latitude and longitude I'll be fetching from Google maps API, which uses a projection with SRID of 3857.
If I just assume the data is from the same projection, and it turns out they're not, how far off could my distances be?
For instance, if they're from a 3-d projection (say 4326), but I just put them into a Geometry column with SRID 3857, and we're in the Northern Hemisphere, (Great Lakes area, but also other parts of the US), is there a way I can figure out how far off that would be?
EPSG:3857 uses meters as units, while EPSG:4326 uses degrees. If you try to plot them on the same map without reprojecting one or the other, they will be very far off (many orders of magnitude) from each other.
You said you'll be fetching lat-lng from the Google Maps API using EPSG:3857 as a projection, but latitude and longitude coordinates are not projected by definition, although they may use a different datum. I can't find official Google documentation, but the consensus seems to be that the Google Maps API uses WGS84, the same as EPSG:4326, so lat-lngs you pull from the Google Maps API will probably fit exactly on top of others from EPSG:4326.
See http://spatialreference.org/ref/epsg/4326/ and http://spatialreference.org/ref/sr-org/7483/ and https://gis.stackexchange.com/questions/34276/whats-the-difference-between-epsg4326-and-epsg900913
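To see the difference in magnitude, here is a small Python sketch of the standard spherical Web Mercator formulas that relate EPSG:4326 degrees to EPSG:3857 meters. The function names are my own; in a spatial database you would normally let the database reproject for you (for example with a transform function) rather than hand-roll this.

```python
import math

WGS84_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 degrees to EPSG:3857 meters (spherical Web Mercator)."""
    x = WGS84_RADIUS_M * math.radians(lon_deg)
    y = WGS84_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Convert EPSG:3857 meters back to EPSG:4326 degrees."""
    lon = math.degrees(x / WGS84_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / WGS84_RADIUS_M)) - math.pi / 2)
    return lon, lat
```

A point near Chicago at roughly (-87.63, 41.88) in degrees comes out with x and y in the millions of meters, which is why mixing the two without reprojecting puts points so far apart.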

Determine if GPS location is within city limits?

I want to be able to determine if a GPS location is in an inhabited or uninhabited zone.
I have tried several reverse geocoding APIs like Nominatim, but failed to get good results. They always return the nearest possible address, even when I select a location in the middle of a forest.
Is there any way to determine this with reasonable accuracy? Are there any databases or web services for this?
If you have to calculate that yourself, then the interesting things start:
The information on whether or not a region is inhabited is stored in digital maps in the "Land_Use" layer. There are values for Forest, Water, Industry, Cemetery, etc.
You would have to import these Land_Use polygons into a spatial DB (e.g. PostgreSQL with PostGIS).
Such a spatial DB provides fast geo indexes for searching only the relevant polygons.
Some countries may also fit in main memory, but then you need some kind of geospatial index, like a quadtree or k-d tree, to store the polygons.
Once you have imported the polygons, it is a simple "point in polygon" query, or "polygons within radius r". The type of the polygon denotes the land use.
OpenStreetMap provides these polygons for free.
Otherwise you have to buy them from TomTom or probably Navteq (Nokia Maps), but this only makes sense for major companies.
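For illustration, the "point in polygon" step can be sketched in plain Python with the ray casting algorithm. This is only a sketch: it treats coordinates as planar (fine for small polygons, an assumption on my part), ignores holes, and the polygon data and land-use labels in any call are hypothetical. A spatial DB would do this for you with an index.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting test; polygon is a list of (lon, lat) vertices, first vertex not repeated."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the east flips the state
    return inside


def land_use_at(lon, lat, land_use_polygons):
    """Return the land-use label of the first polygon containing the point, else None."""
    for land_use, polygon in land_use_polygons:
        if point_in_polygon(lon, lat, polygon):
            return land_use
    return None
```

Here `land_use_polygons` is a list of `(label, polygon)` pairs, e.g. as exported from OpenStreetMap land-use data.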
Since you're using Nominatim, you're getting the coordinates of the nearest address back in the reply.
Since the distance between two coordinates can be calculated, you can just use that to calculate the distance to the closest address found, and from that figure out if you're close to populated areas or not.
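A rough Python sketch of that idea: compute the distance between the queried point and the address Nominatim returned, and compare it against a cutoff. The equirectangular approximation used here is adequate for short distances. The 500 m threshold and the function names are purely my assumptions; you would need to tune the cutoff for your data.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of distance in meters; fine for a few km."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(dx, dy)


def looks_inhabited(query, nearest_address, threshold_m=500.0):
    """Guess 'inhabited' if the nearest address is within threshold_m of the query point.

    Both arguments are (lat, lon) tuples; threshold_m is a hypothetical cutoff.
    """
    return approx_distance_m(*query, *nearest_address) <= threshold_m
```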

Location data storage (points, grouping by distance etc) - Best Practices and Recommended Solutions

I have come across a problem that I've never solved before, but I find it frequently implemented in various apps, so I would like to ask if there is a common way to solve it. I have a set of analytics data, each item representing some logging action (e.g. info, warn). Each of these items has a location and a type (i.e. action). There can be millions of these items per area (depending on the area size or map zoom).
I am looking for the best way to store this set of data in my database. I am very comfortable with SQL Server but don't mind which DB I have to use as long as it can handle the scalability requirements. If Amazon WS offers such a product, or there is some other cloud solution, then even better, because that's how we are planning to host this app. Google Maps will be used to visualize the data.
Some requirements:
Be able to plot all data for a given map rectangle (a common Google Maps interface with markers representing the logging actions)
Be able to zoom in/zoom out and get relevant data for the new map rectangle
Be able to "group" markers in one bigger marker if data are very close. For instance, if point A is 1 km away from point B and I am seeing a map of 10 km radius then I should see two independent points, A and B. But if I zoom out to 500 km radius then point A and B are too close to each other so I would like to group them in one marker. Hopefully that's possible.
If SQL Server is not a good solution, then a free, very cheap, or cloud-based storage solution should be recommended (no, I can't afford Oracle).
All the queries above should be able to come back within milliseconds or somehow to be cached. Queries will be of the kind: Get me all analytics data for the given map window with zoom of the given rectangle latitude/longitude.
Thanks,
Yannis
If I understood correctly, there are 2 sides to your question:
1- A database system that supports storage and lookup of spatial data. Many of the free/open source RDBMSs have spatial extensions, MySQL and Postgres (PostGIS) in particular. Spatial data are stored like any other data, with the addition of a spatial geometry attribute that describes the shape of your data instance (point, rectangle, polygon, ellipse, ...). You can query spatial data entities with spatial filters. And of course, spatial queries support joins, unions, and almost all kinds of SQL constructs.
2- A client/server API that supports rendering of spatial data (with the usual functions such as zoom in, zoom out, pan, etc.), caching, and drill-down. As far as I know, there isn't one API that supports all these features together out of the box, but there are some interesting APIs that you might want to investigate.
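As for the "group markers when zoomed out" requirement from the question, one common approach is grid-based clustering: snap points to a grid whose cell size depends on the zoom level and show one marker per cell. Here is a minimal Python sketch. The cell sizes, the function name, and the output shape are my own assumptions, and I'm not claiming any particular mapping API does it this way.

```python
import math
from collections import defaultdict


def cluster_markers(points, cell_size_deg):
    """Group (lat, lon) points into square grid cells and return one marker per cell.

    cell_size_deg should grow as the map zooms out; picking it per zoom level
    is left to the caller.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        cells[key].append((lat, lon))
    markers = []
    for members in cells.values():
        count = len(members)
        markers.append({
            "lat": sum(p[0] for p in members) / count,  # centroid of the group
            "lon": sum(p[1] for p in members) / count,
            "count": count,
        })
    return markers
```

Since the cell key is just two integers, the same grouping can be precomputed per zoom level and cached, which helps with the millisecond response requirement.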
Hope this helps.