Google Reverse Geocoding - how to choose between a street_address or route? - reverse-geocoding

I have been using Google's reverse geocoding APIs in a vehicle tracking application to convert lat/lon information into an "address" for at least 5 years. Recently, this conversion has started yielding some surprising results.
For example, the lat/lon pair, 36.7653111,-121.74852, when plugged into Google Maps, yields "CA-156, Castroville, CA 95012" as the address. This is the desirable answer.
The tracking application yields "11298 Haight St, Castroville, Monterey County, CA, 95012, US". The problem is that the JSON result contains two "street_address" types and one "route" type. The dumb algorithm of choosing the first street_address or route occurring in the result no longer works. The question now is how to decide which of the possibilities is a better match to the given lat/lon. The lat/lon is clearly on route CA-156. Haight St. does not cross CA-156 at all.
What is special about this case is that the vehicle is not travelling on either of the streets in the two "street_address" types but is on the street in the route. In this case, the route should have been given priority over the two street_address types.
I have now examined the results of hundreds of reverse geocodings. There does not appear to be any simple algorithmic way of choosing the best result. For example, reverse geocoding 37.31674,-122.0472125 returns only two results:
Type: premise
Address: Child Development Center, Cupertino, Santa Clara County, CA 95014, US
location_type: ROOFTOP
37.316425,-122.0460558 Distance: 354.7286202778164 Feet
Type: route
Address: CA-85, Cupertino, Santa Clara County, CA 95014, US
location_type: GEOMETRIC_CENTER
37.3145586,-122.0461306 Distance: 855.4738140974437 Feet
The vehicle is travelling on CA-85. Choosing the first result (premise) or the result with the least distance does not yield the best result.
The fundamental problem here is that, for "route" types, the distance to the GEOMETRIC_CENTER does not tell you whether you are "on the route" (0 distance) or, if you are "off the route", how far off.
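The distances listed for the two results above are consistent with a great-circle (haversine) calculation. The question does not say how they were computed, so the following is only a sketch under that assumption; the function name, the Earth radius constant and the variable names are illustrative choices, not part of any Google API.

```python
import math

EARTH_RADIUS_M = 6371000.0   # assumed mean Earth radius in meters
FEET_PER_METER = 1 / 0.3048


def haversine_feet(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a)) * FEET_PER_METER


# Coordinates taken from the Cupertino example in the question.
vehicle = (37.31674, -122.0472125)
premise = (37.316425, -122.0460558)
route_center = (37.3145586, -122.0461306)

print(haversine_feet(*vehicle, *premise))
print(haversine_feet(*vehicle, *route_center))
```

Both printed values should come out close to the figures quoted in the question. The second one measures the distance to the route's GEOMETRIC_CENTER only, which is exactly why it says nothing about how far the vehicle is from the road itself.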
I have filed a case with Google. If I get a useful response, I will post it here.

If you are reverse geocoding lat/lon information coming from in-vehicle devices, here are two approaches that significantly improve the results. The discussion assumes you have limited the results to the types "premise", "street_address" or "route". If you are interested in other types, you may have to experiment a bit.
First, if the in-vehicle device returns the speed along with lat/lon, then choose the "route" result, if one is present, when the speed is above a certain threshold. Otherwise, choose the "street_address" or "premise" with the least distance to the lat/lon. You may have to experiment a bit with the speed threshold to find a reasonable value. For me, 25 MPH seemed to do a decent job.
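A minimal sketch of the speed-based rule, assuming each result has already been reduced to a small dict with hypothetical keys "type", "address" and "distance_ft" (these are not field names from the Google response). The fallback to a route when no "street_address" or "premise" exists is also an assumption; the text above does not cover that case.

```python
SPEED_THRESHOLD_MPH = 25  # the value that worked for the author; tune as needed


def pick_by_speed(candidates, speed_mph, threshold=SPEED_THRESHOLD_MPH):
    """Prefer a route when moving fast, else the nearest address or premise."""
    routes = [c for c in candidates if c["type"] == "route"]
    places = [c for c in candidates
              if c["type"] in ("street_address", "premise")]
    if speed_mph > threshold and routes:
        return routes[0]
    if places:
        return min(places, key=lambda c: c["distance_ft"])
    # Assumption: with nothing else available, fall back to a route if any.
    return routes[0] if routes else None


# The Cupertino example from the question, reduced to the assumed shape.
candidates = [
    {"type": "premise", "address": "Child Development Center, Cupertino",
     "distance_ft": 354.7},
    {"type": "route", "address": "CA-85, Cupertino", "distance_ft": 855.5},
]
print(pick_by_speed(candidates, speed_mph=60)["address"])
print(pick_by_speed(candidates, speed_mph=5)["address"])
```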
Second, if you don't have speed or another indication that the vehicle is stopped or moving, then try the following "hack".
Scan the results up to the first occurrence of a "route" and determine amongst "premise" or "street_address" types the one with the least distance to the lat/lon. Remember the "route", if one is found.
Then
1. If no "route" result exists, return the "premise" or "street_address" with the least distance to the lat/lon.
2. Else
a. If the "route" has a "route" name that matches the regex "[A-Z]+-[0-9]+", return the route as the best result.
b. Else if a least distance "premise" or "street_address" exists, return that as the best result.
c. Otherwise, return the "route" as a best result.
This is far from perfect, but seems to work well enough for the US which is all I care about right now. As route names differ significantly from country to country some enhancement will likely be necessary.
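The steps of the "hack" can be sketched as follows. The candidate dicts and their keys ("type", "address", "distance_ft", "route_name") are hypothetical simplifications of the geocoder results, and requiring the whole route name to match the regex (fullmatch) is an assumption about what "matches" means here.

```python
import re

# Pattern from the answer: names such as "CA-156" or "CA-85".
HIGHWAY_NAME = re.compile(r"[A-Z]+-[0-9]+")


def pick_without_speed(candidates):
    """Apply the scan-then-decide rule to results in the order returned."""
    nearest = None
    route = None
    for c in candidates:
        if c["type"] == "route":
            route = c          # remember the first route and stop scanning
            break
        if c["type"] in ("premise", "street_address"):
            if nearest is None or c["distance_ft"] < nearest["distance_ft"]:
                nearest = c
    if route is None:
        return nearest                       # step 1
    if HIGHWAY_NAME.fullmatch(route["route_name"]):
        return route                         # step 2a
    if nearest is not None:
        return nearest                       # step 2b
    return route                             # step 2c


candidates = [
    {"type": "premise", "address": "Child Development Center, Cupertino",
     "distance_ft": 354.7},
    {"type": "route", "address": "CA-85, Cupertino", "distance_ft": 855.5,
     "route_name": "CA-85"},
]
print(pick_without_speed(candidates)["address"])
```

For the Cupertino example this selects the CA-85 route even though the premise is closer, which is the behaviour the answer is after.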

Related

SUMO - simulating traffic scenario

How can I simulate continuous traffic flow from historical data which consists of:
1. Vehicle ID;
2. Speed;
3. Coordinates
without knowing the routes of each vehicle ID.
This is a commonly asked question but probably hasn't been answered here before. Unfortunately the answer largely depends on the quality of your input data, mainly on the frequency / distance of your location updates (it would also be helpful if there is a time stamp for each datum) and on how precisely the locations fit your street network. In the best case there is a location update on each edge of the route in the street network and you can simply read off the route by mapping the location to the street. This mapping can be done using the python sumolib coming with sumo:
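import sumolib

net = sumolib.net.readNet("myNet.net.xml")
route = []
radius = 1  # search radius around each point, in network units
for x, y in coordinates:  # coordinates: (x, y) pairs in network coordinates
    neighbors = net.getNeighboringEdges(x, y, radius)
    if not neighbors:
        continue  # no edge within the radius; skip this point
    # each entry is an (edge, distance) pair; pick the closest edge
    minEdge, minDist = min(neighbors, key=lambda pair: pair[1])
    if len(route) == 0 or route[-1] != minEdge.getID():
        route.append(minEdge.getID())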
See also http://sumo.dlr.de/wiki/Tools/Sumolib#locate_nearby_edges_based_on_the_geo-coordinate for additional geo conversion.
This will fail when there is an edge in the route which did not get hit by a data point or if you have a mismatch (for instance matching an edge which goes in the "wrong" direction). In the former case you can easily repair the route using sumo's duarouter.
> duarouter -n myNet.net.xml -r myRoutesWithGaps.rou.xml -o myRepairedRoutes.rou.xml --repair
The latter case is considerably harder both to detect and to repair because it largely depends on your definition of a wrong edge. There are almost clear cases like hitting suddenly the opposite direction (which still can happen in real traffic) and a lot of small detours which are hard to decide and deserve a separate answer.
Since you are asking for continuous input you may also be interested in doing this live with TraCI and in this FAQ on constant input flow.

Understanding Google Code Jam 2013 - X Marks the Spot

I was trying to solve Google Code Jam problems and there is one of them that I don't understand. Here is the question (World Finals 2013 - problem C): https://code.google.com/codejam/contest/2437491/dashboard#s=p2&a=2
And here follows the problem analysis: https://code.google.com/codejam/contest/2437491/dashboard#s=a&a=2
I don't understand why we can use binary search. In order to use binary search the elements have to be sorted. In other words: for a given element e, we can't have any element less than e at its right side. But that is not the case in this problem. Let me give you an example:
Suppose we do what the analysis tells us to do: we start with a left bound angle of 90° and a right bound angle of 0°. Our first search will be at angle of 45°. Suppose we find that, for this angle, X < N. In this case, the analysis tells us to make our left bound 45°. At this point, we can have discarded a viable solution (at, let's say, 75°) and at the same time there can be no more solutions between 0° and 45°, leading us to say that there's no solution (wrongly).
I don't think Google's solution is wrong =P. But I can't figure out why we can use a binary search in this case. Anyone knows?
I don't understand why we can use binary search. In order to use
binary search the elements have to be sorted. In other words: for a
given element e, we can't have any element less than e at its right
side. But that is not the case in this problem.
A binary search works in this case because:
the values vary by at most 1
we only need to find one solution, not all of them
the first and last value straddle the desired value (X .. N .. 2N-X)
I don't quite follow your counter-example, but here's an example of a binary search on a sequence with the above constraints. Looking for 3:
1 2 1 1 2 3 2 3 4 5 4 4 3 3 4 5 4 4
[                                 ]
[               ]
        [       ]
            [   ]
              *
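A small sketch of this kind of search on the sequence above, assuming integer values, neighbours that differ by at most 1, and endpoints with seq[0] <= target <= seq[-1]. The function name is illustrative; this is not the Code Jam reference solution, only a demonstration that the search works on unsorted data under these constraints.

```python
def find_value(seq, target):
    """Return an index i with seq[i] == target.

    Invariant: seq[lo] <= target <= seq[hi]. Because neighbours differ by
    at most 1, the target must occur somewhere between lo and hi.
    """
    lo, hi = 0, len(seq) - 1
    if not (seq[lo] <= target <= seq[hi]):
        raise ValueError("endpoints must straddle the target")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if seq[mid] == target:
            return mid
        if seq[mid] < target:
            lo = mid   # target still lies between seq[mid] and seq[hi]
        else:
            hi = mid   # target still lies between seq[lo] and seq[mid]
    if seq[lo] == target:
        return lo
    if seq[hi] == target:
        return hi
    raise ValueError("adjacent values differ by more than 1")


seq = [1, 2, 1, 1, 2, 3, 2, 3, 4, 5, 4, 4, 3, 3, 4, 5, 4, 4]
i = find_value(seq, 3)
print(i, seq[i])
```

The search returns one position holding the target and ignores the others, which is all the problem requires.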
I have read the problem and in the meantime thought about the solution. When I read the solution, I saw that they had mostly done the same as I would have; however, I did not think of some minor optimizations they were using, as I was still digesting the task.
Solution:
Step1: They choose a median so that each of the lines splits the set in half. Therefore two provinces will have x mines each, while the other two provinces will have 2 * N - x mines each, because the two lines each split the set in half and
2 * x + 2 * (2 * N - x) = 2 * x + 4 * N - 2 * x = 4 * N.
If x = N, then we were lucky and accidentally found a solution.
Step2: They are taking advantage of the "fact" that no three mines are collinear. I believe they are wrong: the task did not tell us this is the case, and they relied on this "fact" because they assumed that the task is solvable, whereas the task clearly asks us to report when it is impossible with the current input. I believe this part is smelly. The task is not necessarily solvable, and there might be a solution even when three mines are collinear.
Thus, somewhere in between X had to be exactly equal to N!
Not true either, as they have stated in the task that
You should output IMPOSSIBLE instead if there is no good placement of
borders.
Step 3: They are still using the "fact" described as un-true in the previous step.
So let us close the book and think ourselves. Their solution is not bad, but they assume something which is not necessarily true. I believe them that all their inputs contained mines corresponding to their assumption, but this is not necessarily the case, as the task did not clearly state this and I can easily create a solvable input having three collinear mines.
Their idea for median choice is correct, so we must follow this procedure; the problem gets more complicated if we do not do this step. Now, we could search for a solution by modifying the angle until we find a solution or reach the border of the period (this was my idea initially). However, we know which provinces have too many mines and which provinces do not have enough mines. Also, we know that the period is pi/2 or, in other terms, 90 degrees, because if we move alpha by pi/2 in either the positive (counter-clockwise) or negative (clockwise) direction, then we have the same problem, but each child gets a different province. That is irrelevant from our point of view; they will still be rivals, I guess, but this does not concern us.
Now, we try and see what happens if we rotate the lines by pi/4. We will see that some mines might have changed borders. We have either not reached a solution yet, or have gone too far and poor provinces became rich and rich provinces became poor. In either case we know in which half the solution should be, so we rotate back/forward by pi/8. Then, with the same logic, by pi/16, until we have found a solution or there is no solution.
Back to the question: we cannot arrive at the situation you describe, because if there were a valid solution at 75 degrees, then we would see that rotating by only 45 degrees was not enough, since the number of mines which have changed borders lets us determine the right angle interval. Remember that we have two rich provinces and two poor provinces. Each rich province has two poor bordering provinces and vice-versa. So, the poor provinces should gain mines and the rich provinces should lose mines. If, when rotating by 45 degrees, we see that the poor provinces did not get enough mines, then we will choose to rotate more until we see they have gained enough mines. If they have gained too many mines, then we change direction.

Measure gps distance

Here's my problem: a smartphone will send to my server some gps coordinates (latitude,longitude,altitude) and I'll have to compare these to an address stored in db in order to see how much distance there is between smartphone and address.
I'll need to obtain this address coordinates as well in order to do the actual comparison.
Is there a good and easy to use gps library for java? Any suggestions?
In your answers please note that I need a way to get coordinates from an address too!! So, given an address "second street 2, New York, zip code 01245", I need to find latitude, longitude, altitude, etc.
Android's Location class has a static method distanceBetween(startLatitude, startLongitude, endLatitude, endLongitude, results). You can look at the source code and use it in your program.
You could take a look at
A distance calculator using GeoCodes
Distance between 2 geocodes

CLLocation geocodeAddressString result accuracy

I'm currently working on an app where the user inputs an address, which is then converted into coords. A database of locations is then queried and locations within, say, 5 km of the search location are returned.
The problem I'm having is the accuracy returned by the geocodeAddressString function. When searching: Auckland, New Zealand, I'm getting back -36.90000, 174.70000, which is about 10 km off the correct result. It's a few suburbs over.
Is there any way to improve on this? The Google Maps result is -36.848479, 174.763373, which you can see is much sharper and what I'm after.
Thanks!

How to get reliable U.S. state responses by reverse geocoding?

Google sometimes returns the incorrect U.S. state when reverse geocoding a lat/long. Presumably this is because Google is trying to return the nearest street address, which in some cases is not in the same state as the lat/long you are trying to reverse geocode.
Though it may not be a common scenario in practice, it's pretty easy to reproduce by playing around with a map: http://gmaps-samples.googlecode.com/svn/trunk/geocoder/reverse.html
For my application, I am less concerned about getting the nearest address and more concerned about always getting the correct U.S. state for a lat/long. Is there a way to achieve this with Google's API?
Thank you
Iterate over all results and pick the one with "administrative_area_level_1" in results[i].types
This is better than taking the "equivalent" address component from the first result, i.e. finding "administrative_area_level_1" in results[0].address_components[j].types
When reverse geocoding snaps your latlng to the nearest address which happens to be in a different state (or country), the state/country address component of the first result will be that of where that address is, but the subsequent result will be the state/country where the input latlng is.
Example: 42.834185,-0.302811 is in Spain, but snaps to an address in France.
https://google-developers.appspot.com/maps/documentation/utils/geocoder/#q%3D42.834185%252C-0.302811
results[0].address_components[3].types = ["administrative_area_level_1", "political"]
results[0].address_components[3].short_name = "FR"
results[6].types = ["administrative_area_level_1", "political"]
results[6].short_name = "ES"
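The iteration described in this answer can be sketched as follows. It assumes the decoded JSON response is a dict whose "results" list holds entries with "types" and "address_components" (each component carrying "short_name" and "types"), as in the Geocoding API response format. The function name and the sample data are made up for illustration.

```python
def find_state(results):
    """Return the short name of the result typed administrative_area_level_1.

    Looks at each result's own "types" rather than at the address
    components of the first result, which may belong to a snapped address
    in a neighbouring state.
    """
    wanted = "administrative_area_level_1"
    for result in results:
        if wanted in result.get("types", []):
            for component in result.get("address_components", []):
                if wanted in component.get("types", []):
                    return component.get("short_name")
    return None


# Hypothetical response: the first result snapped to an address in Nevada,
# while the state-level result says the point itself is in California.
response = {
    "results": [
        {"types": ["street_address"],
         "address_components": [
             {"short_name": "NV",
              "types": ["administrative_area_level_1", "political"]}]},
        {"types": ["administrative_area_level_1", "political"],
         "address_components": [
             {"short_name": "CA",
              "types": ["administrative_area_level_1", "political"]},
             {"short_name": "US", "types": ["country", "political"]}]},
    ]
}
print(find_state(response["results"]))
```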