OK so I don't have an issue here but I'm just wondering if there's a more standardized way to handle what I'm doing.
Essentially I have a DB table full of locations, including longitude and latitude; there could potentially be thousands of locations. I also have some functionality to search by your postcode, and you can then see, from the stored locations, the closest x of them to you.
I've read about using the Google Maps API to do this, but I don't really want to pull back thousands of locations and send that many requests to the Google Maps API.
So here's what I'm doing. I have a stored procedure to which I pass the user's long and lat. I then use these to form a column called Distance, by which I order the data. The Distance column is worked out using the logic below:
SQRT(SQUARE(CAST(USERSLAT AS decimal(9,6)) - Latitude) + SQUARE(CAST(USERSLONG AS decimal(9,6)) - Longitude)) AS Distance
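For context, the full procedure is roughly shaped like this (simplified, with made-up procedure, table and column names):
CREATE PROCEDURE GetClosestLocations
    @UsersLat  decimal(9,6),
    @UsersLong decimal(9,6),
    @Count     int
AS
BEGIN
    SELECT TOP (@Count)
        l.Id,
        l.Latitude,
        l.Longitude,
        SQRT(SQUARE(@UsersLat - l.Latitude) + SQUARE(@UsersLong - l.Longitude)) AS Distance
    FROM Locations l
    ORDER BY Distance;
END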
Essentially what this is doing is the classic c^2 = a^2 + b^2 to find the straight-line distance between two coordinates, and using these results I can theoretically see the closest locations to the user. Once I have this data I can use the Google Maps API to find the exact distances. Is this an OK way to do things? I have this nagging feeling in the back of my head that I'm missing something.
Related
I am trying to use Tableau to create a web dashboard that interacts with a Postgres database with a fair number of rows.
The key here is that the relevant data falls within latitude/longitude boundaries, so I'm using Tableau parameters in a custom SQL statement to get what I need, like so:
SELECT id, lat, lng... FROM my_table
WHERE lat >= <Parameters.MIN_LAT> AND lat <= <Parameters.MAX_LAT>
AND lng >= <Parameters.MIN_LNG> AND lng <= <Parameters.MAX_LNG>
LIMIT 10000
I'm setting these parameters using the Tableau JavaScript API, based on the boundaries of a Google Maps widget. When the map is moved, I refresh the parameters and the data needs to update as well. This refresh is not done constantly, but frequently enough that long wait times are not acceptable.
Because the lat/lng boundaries are dynamic and the full unfiltered table is very big (~1 GB), I presumed it was impractical to create a data extract. Am I wrong?
Furthermore, when I change some of the in-Tableau filters I'm applying, there is a very long wait, as if it is re-executing the query every time, even when the MIN_LAT, MAX_LAT, ... parameters are unchanged.
What's the best way of resolving this? I'm new to Tableau so sorry if I'm missing something super obvious!
Thanks.
The best way of resolving this is to make a query that returns less information (1 GB is too much; an extract can help group data to present dimensions very fast, but that's it, and if there is nothing to group it will be very large), which lets you drill down to present more information in subsequent steps or dashboard levels.
I am thinking of a field in the database that indicates the zoom level at which the information should be presented.
If you are navigating on Google Maps, first you see the countries, then the capital cities, then the cities, then the small towns, then the local stores.
The key is the zoom level you are on at each point.
You may want to look at Tableau's documentation on drill-downs.
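As a rough sketch of the idea (my_table, lat, lng and the half-degree cell size are just assumptions), a coarse zoom level could be served by a pre-aggregated query that groups points into grid cells, switching to the row-level query only once the boundaries are small:
SELECT FLOOR(lat / 0.5) * 0.5 AS lat_cell,
       FLOOR(lng / 0.5) * 0.5 AS lng_cell,
       COUNT(*) AS point_count
FROM my_table
WHERE lat >= <Parameters.MIN_LAT> AND lat <= <Parameters.MAX_LAT>
AND lng >= <Parameters.MIN_LNG> AND lng <= <Parameters.MAX_LNG>
GROUP BY 1, 2
Each cell then becomes a single mark on the dashboard, so the result stays small no matter how large the underlying table is.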
I have an application with an SQLite database that contains 7000+ records with city names, longitudes and latitudes; these "cities" are also connected to the relevant city fields in the database.
What my app does is query the current location with Core Location, fetch the lon and lat values, and then find the closest location in the database.
The result doesn't have to be super accurate (I just want to match cities), so I want to use the hypotenuse formula to find the closest point:
closest city in DB: min( ((x1 - x2)^2 + (y1 - y2)^2)^(1/2) )
x1, y1: lon and lat for user
x2, y2: lon and lat for points in database.
If I were using an MS SQL or SQLite database, I could easily create a query, but when it comes to Core Data, I'm out of ideas.
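For reference, if I were talking to the SQLite store directly, the query would look roughly like this (table and column names made up; ordering by the squared distance is enough, since taking the square root doesn't change the ordering):
SELECT id, name, lat, lon,
       (lat - :userLat) * (lat - :userLat) + (lon - :userLon) * (lon - :userLon) AS dist_sq
FROM cities
ORDER BY dist_sq
LIMIT 1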
I don't want to fetch all the data (and fill up memory) and then apply this formula to every record, so is there a way to create a query and get the result straight from the DB?
Am I overthinking this problem, and missing a simple solution?
If I'm understanding your problem correctly, you want to find the closest n cities to your current location.
I had something similar and here's how I approached it.
In essence, you probably need to take each city's lat/lon and hash it into some index. We use a Mercator Projection to convert the lat/lon to x/y, then hash that value in a manner similar to how Google/Bing/Apple Maps hash their map tiles. Fortunately, MapKit has a built-in Mercator Projection function.
In pseudocode:
for each city's lat/lon {
    CLLocationCoordinate2D coordinate = CLLocationCoordinate2DMake(lat, lon);
    MKMapPoint point = MKMapPointForCoordinate(coordinate);
    // 256 represents the size of a map tile at zoomLevel 20. You can use whatever
    // zoomLevel you want here, but we need something to quickly look up close-by cities.
    // This is the formula you can use to determine how granular your index is:
    // (256 * pow(2, (20 - zoomLevel)))
    NSInteger x = point.x / 256.0;
    NSInteger y = point.y / 256.0;
    // save x & y in a CityHashIndex table
}
Now, you get the current location's lat/lon, hash that into the index as above, and simply write a query against this CityHashIndex table.
So say that, for simplicity's sake, your current location is indexed at 1000, 1000. To find close-by cities, you might search for cities with indexes in the range of 900-1100, 900-1100.
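In plain SQL terms the lookup is just a range query on the two hash columns (a sketch; the CityHashIndex column names here are made up), and with Core Data the same two range conditions go into the fetch request's NSPredicate:
SELECT city_id, x, y
FROM CityHashIndex
WHERE x BETWEEN 900 AND 1100
AND y BETWEEN 900 AND 1100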
From there, you're only pulling in a much smaller set of cities, and the memory requirements to process your hypotenuse formula aren't so bad.
I can elaborate more if you're interested.
This is directly related to a commonly asked question about Core Data.
Searching for surrounding suburbs based on latitude & longitude using Objective C
Calculate a bounding box around the point you need (min lat/long, max lat/long), then use an NSPredicate against those values to find everything within the box. From there you can do a distance calculation on the results that come back and sort them.
I would suggest setting this up so that it can search at multiple distances; then you can see if a city is within 10 miles, 100 miles, etc., slowly increasing the bounding box until you get one or more results back.
I would use NSPredicate to define my search criteria; it will act as a filter. I'm not sure how optimized this is or whether it will pull in all your records, but I'm assuming Core Data has some kind of indexing mechanism that will optimize the search.
You can take a look at this document:
https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/CoreData/Articles/cdFetching.html
Check the section named
Retrieving Specific Objects
Our business user loves for our searches to be done by distance; the problem is we have over 1 million records with a lat/long location. We are using SQL 2008, but when we order or restrict our searches by distance, the queries take way too long (30 seconds plus). This is unacceptable; there has got to be a better way to do this. We have done everything we can with SQL 2008 and want to upgrade to 2012 at some point if we can.
I ask, though, whether there is another technology or optimization that we could apply. Could we switch to a different DB for faster performance, or apply a different search algorithm, an estimation algorithm, trees, grids, pre-computation, etc.?
A solution that might be useful here would be to break your search into two parts:
1) Run a query that finds all records within a certain value plus or minus the current lat/lng of your location; the WHERE clause might look like:
where (@latitude > (lat - .001) and @latitude < (lat + .001)) and (@longitude > (lng - .001) and @longitude < (lng + .001))
Using this approach, and especially with an index on both the latitude and longitude columns, you can very quickly define a working set of locations within a specified distance.
2) With the rough results from step 1, use the great-circle/haversine method to determine the actual distance between the source location and each point (a sketch combining both steps follows below).
Where this approach falls over is if there is no limit to the radius you are searching, but it works great if, for instance, you are looking to find all locations within a specific distance of a given point.
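Putting the two steps together, a T-SQL version might look something like this (the table and column names, the 0.1-degree window and the example coordinates are placeholders; 3959 is the earth's radius in miles):
DECLARE @latitude  decimal(9,6) = 40.712800;   -- source location (example values)
DECLARE @longitude decimal(9,6) = -74.006000;

SELECT id, lat, lng,
       3959 * 2 * ASIN(SQRT(
           POWER(SIN(RADIANS(lat - @latitude) / 2), 2) +
           COS(RADIANS(@latitude)) * COS(RADIANS(lat)) *
           POWER(SIN(RADIANS(lng - @longitude) / 2), 2)
       )) AS distance_miles
FROM locations
WHERE lat BETWEEN @latitude - 0.1 AND @latitude + 0.1    -- step 1: rough bounding box, index-friendly
  AND lng BETWEEN @longitude - 0.1 AND @longitude + 0.1
ORDER BY distance_miles;                                 -- step 2: haversine on the survivors
With an index on (lat, lng), the BETWEEN filters cut the candidate set down before any of the trigonometry runs.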
I need help with an architectural problem I'm working on. The user enters a position and a radius (i.e. a distance). The software searches a giant database table (a couple of hundred thousand posts) for posts whose distance from the user's location is less than the entered distance.
It's kind of hard for me to explain, but imagine a table with two posts, point A and point C, where point U is the user's location. The user has entered a position and a radius, and the position and radius for A and C are predefined (stored in the database).
In this case I would only be interested in point A, because the two areas intersect with each other. How should I turn this into a database query that works efficiently with a couple of hundred thousand posts? In the database I will store longitude, latitude and radius.
It depends on which database server you're using, but look into the GIS capabilities that might be included. For example, MS SQL Server 2008 has a built-in geometry type, and PostgreSQL has PostGIS. Oracle has something like this too. Anyhow, these native GIS formats come with spatial querying functions that do the sort of thing you're talking about: searching for matches within given distances, etc. It is pretty simple to accomplish once you switch to the proper datatype.
edit
Since you're using SQL 2008, and your data is lat/long, I suggest the "geography" rather than the "geometry" datatype. Take a look here: http://msdn.microsoft.com/en-us/library/cc280766.aspx
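As a sketch (assuming a Posts table with Latitude, Longitude and a Radius column stored in metres, and the user's values passed in as @userLat, @userLng and @userRadius), the intersection test becomes a distance comparison on geography points:
DECLARE @user geography = geography::Point(@userLat, @userLng, 4326);

SELECT p.Id, p.Latitude, p.Longitude
FROM Posts p
WHERE @user.STDistance(geography::Point(p.Latitude, p.Longitude, 4326))
      <= @userRadius + p.Radius;   -- two circles overlap when the distance between
                                   -- their centres is at most the sum of the radii
Storing each post's point in a persisted geography column with a spatial index on it would let the server prune candidates instead of computing the distance for every row.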
Does anyone know of a database structure, such as this one http://www.maxmind.com/app/geolitecity, that is optimized for super-fast retrieval of long and lat based on either ZIP or (City, State, Country) parameters?
MaxMind's database does not support any retrieval other than by IP, at least not to my knowledge. So if you know how to do it, preferably in Java, I'm all ears.
This should not be an SQL-type database, a CSV file or a Google API solution. Those are just too slow, especially if you want to offer search results sorted by distance.
Paid solutions are also an option; the data structure doesn't have to be free.
I don't believe there is such a thing as a "fast" way to do this. I've built a geocoding API for Canadian postal codes, and the way we search is to have two indexes of postal codes: one sorted by latitude and one sorted by longitude. You can do some spherical geometry and develop a bounding "box" that fits everything in a given radius, but you still have to go back and do a point-to-point distance measurement, using Vincenty or haversine or your algorithm of choice, for the distance between your origin and each postal code you find.
With a world-wide database, your math gets complicated by the fact that you can cross meridians and the equator.
You'll want some kind of encoding scheme that lets you work in radians, since that is what most distance calculation heuristics require.
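For reference, the haversine distance between two points (lat1, lon1) and (lat2, lon2), with everything already in radians, is:
d = 2 * R * asin( sqrt( sin^2((lat2 - lat1)/2) + cos(lat1) * cos(lat2) * sin^2((lon2 - lon1)/2) ) )
where R is the earth's radius. Every trig call in there expects radians, which is why converting (or storing) the coordinates in radians up front saves a step on each comparison.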
This can be done very quickly with any database engine that supports two-dimensional indexes (and MySQL supports unlimited dimensions, as far as I know). It's simple: you use a 2-d index to limit your result set to a reasonable size extremely quickly, then you examine that result set with a high-precision calculation algorithm if you need to. Not hard, except that you may need to OR two lists together if they cross the 180/-180 longitude line.
Making a 2-d index is simple: index (latitude, longitude). That index only works on latitude or on latitude/longitude pairs; it won't work on longitude alone. If you want an additional index for longitude, add index (longitude). I select out a rough-estimate square and round the corners if I care about them.
If you have a zip or city to start with, zip codes are just a 1-d index; no problem making that fast, just use index (zip). And if your hard drive is too slow, get a solid-state drive to eliminate the seek times, or use a lot of RAM and cache the whole table. This is not a hard problem either way you want to go.
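Concretely, the indexes and the rough-estimate square might look like this in MySQL (table and column names, and the 0.5-degree window, are made up):
CREATE INDEX idx_lat_lng ON cities (latitude, longitude);
CREATE INDEX idx_lng ON cities (longitude);
CREATE INDEX idx_zip ON cities (zip);

SELECT id, name, latitude, longitude
FROM cities
WHERE latitude  BETWEEN @lat - 0.5 AND @lat + 0.5
  AND longitude BETWEEN @lng - 0.5 AND @lng + 0.5;
The survivors of that rough square can then be run through a precise distance calculation if the corners matter.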
If that's not fast enough for you, using someone's service won't help, because you have network overhead; you would have to hold your data directly in RAM/SSD and build your own 2-d/1-d indexing system if you need it (not hard). That route could probably beat SQL by a factor of 10 or so, because the SQL engine has a lot of overhead. I suppose someone might offer a service that runs on your own machine, but realistically that wouldn't beat SQL by very far, because you still have to jump through a bunch of hoops to make the request to their service. SQL and 2-d indexes with a solid-state drive will be very fast; you shouldn't need to process the data yourself unless you are the post office, sorting 10,000 pieces of mail per second with one machine serving the data. Then you'd have to write your own data management routines.