Obtain coordinates of a point on a road - API

I have a range defined by an intersection and a number of feet away from the intersection. (e.g. 100 ft north of Washington St. & 5th St. to 300 ft south of Washington St. & 6th St.)
I am looking to geocode this into a lat/long pair. However, I cannot see any way to get Google Maps API or Virtual Earth, etc. to do this. They will happily geocode the intersection, but not the distance away. I can't just add 100 ft because the road doesn't necessarily go exactly straight or exactly in a cardinal direction.
I investigated getting the polyline that describes the road, but am not having much luck with obtaining that either from Google/VEarth. I looked at TIGER/LINE from the US Census but their data is very inaccurate.
Can anyone make a suggestion for how to geocode this? This is for a public map, so any of the free APIs from Google, Microsoft, etc. should be fine.
Ultimately, by the way, I'm looking for an actual street address rather than coordinates. I want to know that the range in the example above would be, for instance, 508 to 563 Washington St.

Food for thought - I don't know what your application is but remember the accuracy of a typical GPS device is 10 meters (approx 33 feet). Is it possible you are trying to make this more precise than necessary?
Possible solutions
Can you just add the 100 feet and project that point perpendicular to the road? Close enough?
Grab the intersection. Traverse the first segment. If its length is < 100 feet, grab the second segment, and so on until you are 100 feet away from the intersection as measured along the road. You will need to add the appropriate checks / do some math to determine where along a segment the actual 100' mark falls (see the sketch after this list).
I used the second method over the summer and will see if I can find some sample code. No promises though.
#3 - Grab the intersection. Get the bearing of the first segment and use coordinate geometry to calculate the point 100 feet along the street at a given angle. This assumes the street does not have any deflections in it, otherwise use #2.
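This is not the sample code mentioned above - just a minimal Python sketch of the segment-walking idea (#2), assuming you already have the road centreline as an ordered list of (lat, lon) vertices; the haversine helper and the linear interpolation inside the final segment are my own simplifications, which are fine at these short distances.

import math

def haversine_ft(p1, p2):
    # Great-circle distance between two (lat, lon) points, in feet.
    R_FT = 20902231  # mean Earth radius in feet
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R_FT * math.asin(math.sqrt(a))

def point_along(polyline, distance_ft):
    # Walk the polyline segment by segment until distance_ft has been covered,
    # then interpolate linearly within the final segment.
    walked = 0.0
    for a, b in zip(polyline, polyline[1:]):
        seg = haversine_ft(a, b)
        if walked + seg >= distance_ft:
            t = (distance_ft - walked) / seg
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        walked += seg
    return polyline[-1]  # asked for more than the road's length

# Made-up vertices for Washington St., starting at the 5th St. intersection:
road = [(41.8781, -87.6298), (41.8790, -87.6297), (41.8803, -87.6295)]
print(point_along(road, 100))  # roughly 100 ft along the road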

I ended up taking a different approach -- rather than geocoding to house numbers, I use percentage up the street. I can therefore get away with calculating the length of the street and then determining how far up the street the geocoded coordinates are.

Related

NEO-6M GPS inaccurate by 30 miles

I have a Ublox NEO-6M and NEO-N8M, both ordered from China (possibly foreshadowing?).
When I wired the N8M up, it gave me a location in the middle of a lake 30 miles away. For context, I am in South Africa in the southern hemisphere.
I drove around and logged two more locations, assuming the error was a fixed offset that could be calibrated out: take a reading at a known location, subtract the real coordinates, and apply that offset to future readings.
It didn't work: the error was smaller, but still far too large, so that clearly wasn't the solution.
I then used the 6M. To my absolute surprise, it gave me the exact same location as the first module, exactly in the middle of that same lake. So at this point I'm starting to wonder: could this be related to the hemisphere, or to the modules being counterfeit?
The N8M used $GNGLL and the 6M used $GPGLL so I doubt it's a discrepancy with the GPS and GLONASS systems.
Any help would be appreciated, all I want is a really accurate fix on my location.
I stumbled the same way (a fix ~30 miles off) until I re-read the NMEA protocol and realised it was my own fault. So maybe this helps you as well:
The format in the sentence for lat is ddmm.mmmm and for lon it is dddmm.mmmm so you have to do some calculations on your own:
Separate the (d)egrees and the (m)inutes. Divide minutes by 60.0 and add them to the degrees. Voila - you have your correct lat and lon.
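A minimal Python sketch of that conversion (the sentence fields and hemisphere letters here are made-up example values, roughly in South Africa, not your actual readings):

def nmea_to_decimal(value, hemisphere):
    # NMEA latitude is ddmm.mmmm and longitude is dddmm.mmmm
    degrees = int(float(value) // 100)        # everything left of the minutes
    minutes = float(value) - degrees * 100
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ('S', 'W') else decimal

print(nmea_to_decimal('3358.5476', 'S'))   # -33.9758 (not -33.5855!)
print(nmea_to_decimal('01822.2402', 'E'))  #  18.3707

Reading 3358.5476 directly as 33.585476 degrees would put you about 0.39 degrees (roughly 27 miles) away from the true position, which is the size of error described above.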

How to get Lucene scoring to account for words not specified in search terms?

There is probably a name for what I'm asking and it has something to do with Bayesian statistics.
I have a database of street addresses and I'm using Lucene to match user-entered addresses (if you need an analogy, pretend I work for Google Maps).
Given that both "West North Avenue" and "West North Shore Avenue" are valid street names, how can I get Lucene to score "2000 West North Avenue" higher than "1000 West North Shore Avenue" when searching for "1000^0.001 West North Avenue"?
The 1000^0.001 means the number should only be used to break a tie; otherwise, matching the street name is more important than matching the right number on the wrong street.
Unfortunately in this example, the 1000^0.001 causes the wrong match (North Shore) to get ahead of the correct one.
What scoring algorithm would enable Lucene to adjust the score downwards for failure to specify an indexed term in the search, with rare terms weighing more than common terms?
I would solve this by carefully tokenizing street names. For instance, you could do this:
Extract the number and the street name into two different fields, street_nb and street_nm, and index them separately.
Now use two clauses in your query: make the clause targeting street_nm a MUST and the one targeting street_nb a SHOULD. That way the street name alone is enough to match, and if the number matches too, even better.
You can do other things besides this, like using a phrase query to force an exact match on the street name, etc. Play around with the variants until you get good results.
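A small sketch of that two-clause query in Lucene's classic query-parser syntax, using the street_nb / street_nm fields proposed above (the string is assembled in Python purely for illustration - use whatever query API you already have):

def build_query(number, street_name):
    # '+' marks the street-name clause as MUST; the bare, low-boosted number
    # clause is a SHOULD that only breaks ties between matching streets.
    return '+street_nm:"{}" street_nb:{}^0.001'.format(street_name, number)

print(build_query(1000, "West North Avenue"))
# +street_nm:"West North Avenue" street_nb:1000^0.001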

How to group nearby latitude and longitude locations stored in SQL

I'm trying to analyse data from cycle accidents in the UK to find statistical black spots. Here is an example of the data from another website: http://www.cycleinjury.co.uk/map
I am currently using SQLite to store ~100k lat/lon locations. I want to group nearby locations together. This task is called cluster analysis.
I would like to simplify the dataset by ignoring isolated incidents and instead only showing the origin of clusters where more than one accident has taken place in a small area.
There are 3 problems I need to overcome.
Performance - How do I ensure that finding nearby points is quick? Should I use SQLite's R-Tree implementation, for example?
Chains - How do I avoid picking up chains of nearby points?
Density - How do I take cyclist population density into account? There is a far greater density of cyclists in London than in, say, Bristol, so there appear to be more black spots in London.
I would like to avoid 'chain' scenarios like this:
Instead I would like to find clusters:
London screenshot (I hand drew some clusters)...
Bristol screenshot - Much lower density - the same program ran over this area might not find any blackspots if relative density was not taken into account.
Any pointers would be great!
Well, your problem description reads exactly like the DBSCAN clustering algorithm (Wikipedia). It avoids chain effects in the sense that it requires at least minPts objects in a neighbourhood before a point can extend a cluster.
As for the differences in density across regions, that is what OPTICS (Wikipedia) is supposed to solve. You may need to use a different way of extracting clusters though.
Well, ok, maybe not 100% - you may want single hotspots, not areas that are "density connected". Thinking of an OPTICS plot, I figure you are only interested in small but deep valleys, not in large valleys. You could probably use the OPTICS plot and scan for local minima of "at least 10 accidents".
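The analysis below used ELKI; purely as an illustrative sketch of the same idea, here is DBSCAN in Python with scikit-learn, which can also use haversine (geodetic) distance. The eps and min_samples values are assumptions for the example, not recommendations:

import numpy as np
from sklearn.cluster import DBSCAN

# coords: N x 2 array of (lat, lon) in degrees, e.g. read out of the SQLite table
coords = np.array([[51.5074, -0.1278], [51.5075, -0.1279],
                   [51.5076, -0.1277], [53.4808, -2.2426]])

EARTH_RADIUS_M = 6371000
eps_m = 50      # accidents within ~50 m count as neighbours
min_pts = 3     # at least 3 accidents before we call it a black spot

db = DBSCAN(eps=eps_m / EARTH_RADIUS_M,   # eps must be in radians for haversine
            min_samples=min_pts,
            metric='haversine',
            algorithm='ball_tree').fit(np.radians(coords))

print(db.labels_)   # -1 = isolated incident (noise), 0..k = cluster ids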
Update: Thanks for the pointer to the data set. It's really interesting. I did not filter it down to cyclists, but right now I'm using all 1.2 million records with coordinates. I've fed them into ELKI for analysis, because it's really fast and it can actually use geodetic distance (i.e. on latitude and longitude) instead of Euclidean distance, to avoid bias. I've enabled the R*-tree index with STR bulk loading, because that is supposed to help get the runtime down a lot. I'm running OPTICS with Xi=0.1, epsilon=1 (km) and minPts=100 (looking for large clusters only). Runtime was around 11 minutes, not too bad. The OPTICS plot would of course be 1.2 million pixels wide, so it's not really good for full visualization anymore. Given the huge threshold, it identified 18 clusters with 100-200 instances each. I'll try to visualize these clusters next. But definitely try a lower minPts for your experiments.
So here are the major clusters found:
51.690713 -0.045545 a crossing on A10 north of London just past M25
51.477804 -0.404462 "Waggoners Roundabout"
51.690713 -0.045545 "Halton Cross Roundabout" or the crossing south of it
51.436707 -0.499702 Fork of A30 and A308 Staines By-Pass
53.556186 -2.489059 M61 exit to A58, North-West of Manchester
55.170139 -1.532917 A189, North Seaton Roundabout
55.067229 -1.577334 A189 and A19, just south of this, a four lane roundabout.
51.570594 -0.096159 Manor House, Piccadilly line
53.477601 -1.152863 M18 and A1(M)
53.091369 -0.789684 A1, A17 and A46, a complex construct with roundabouts on both sides of A1.
52.949281 -0.97896 A52 and A46
50.659544 -1.15251 Isle of Wight, Sandown.
...
Note, these are just random points taken from the clusters. It may be sensible to compute e.g. cluster center and radius instead, but I didn't do that. I just wanted to get a glimpse of that data set, and it looks interesting.
Here are some screenshots, with minPts=50, epsilon=0.1, xi=0.02:
Notice that with OPTICS, clusters can be hierarchical. Here is a detail:
First, your example is quite misleading. You have two different sets of data, and you don't control the data. If the data comes in a chain, you will get a chain out.
This problem is not exactly suitable for a database. You'll have to write code or find a package that implements this algorithm on your platform.
There are many different clustering algorithms. One, k-means, is an iterative algorithm where you look for a fixed number of clusters. k-means requires a few complete scans of the data, and voila, you have your clusters. Indexes are not particularly helpful.
Another, which is usually appropriate on slightly smaller data sets, is hierarchical clustering -- you put the two closest things together, and then build the clusters. An index might be helpful here.
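A rough sketch of both approaches in Python with scikit-learn (my choice of tooling, not the answer's); it assumes the points have already been projected to metres so Euclidean distances are meaningful:

import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

points = np.random.rand(1000, 2) * 10000   # stand-in for projected x/y in metres

# k-means: you pick the number of clusters up front
km_labels = KMeans(n_clusters=20, n_init=10).fit_predict(points)

# hierarchical (agglomerative): repeatedly merge the closest clusters, then
# cut the tree where clusters would be more than 200 m apart
hc_labels = AgglomerativeClustering(n_clusters=None,
                                    distance_threshold=200).fit_predict(points)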
I recommend though that you peruse a site such as kdnuggets in order to see what software -- free and otherwise -- is available.

Is there a formula to change a latitude and longitude into a single number?

Can you tell me if there is a formula to change a latitude and longitude into a single number?
I plan to use this for a database table in software that provides routing for deliveries. The table row would have that number as well as the postal address. The database table would be sorted in ascending numeric order so the software can figure out which address the truck would need to go to first, second etc.
Please can you respond showing VB or VB.Net syntax so I can understand how it works?
For example I would use the following numbers for the latitude and longitude:
Lat = 40.71412890
Long = -73.96140740
Additional Information:
I'm developing an Android app using Basic4Android. Basic4Android uses a VB or VB.Net syntax with SQLite as the database.
Part of this app will have route planning. I want to use this number as the first column in an SQLite table and the other columns will be for the address. If I do a query within the app that sorts the rows in ascending numeric order, I will be able to figure out which postal addresses are closest to each other, so it will take less time for me to go from house to house.
For example, if the numbers were:
194580, 199300, 178221
I can go to postal address 178221 then to 194580 and finally to 199300 and I won't need to take the long way around town to do my deliveries after they were sorted.
As an alternative, I would be happy if there was an easy way to call a web service that returns maybe a json response that has the single number if I send a postal address to the web site. Basic4Android does have http services that can send requests to a web site.
A latitude and a longitude can each be represented as a 4-byte integer such that the coordinates have an accuracy of about 3 cm, which is sufficient for most applications.
Steps to create one 8-byte value of type long from latitude and longitude:
1) Convert lat and lon to ints: int iLat = (int) (lat * 1E7);
2) Use an 8-byte long value to store both 4-byte ints:
set the upper 4 bytes to the latitude and the lower 4 bytes to the longitude.
Now you have an 8-byte long representing a point on the world to about 3 cm accuracy (see the sketch below).
There are other, better solutions, such as ones that maintain similar numbers for nearby locations, but these are more complex.
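A minimal Python sketch of that packing (the field widths and the 1E7 scale factor follow the steps above; the two's-complement handling is mine):

def pack_latlon(lat, lon):
    # Scaled latitude in the upper 32 bits, scaled longitude in the lower 32.
    ilat = int(round(lat * 1e7)) & 0xFFFFFFFF
    ilon = int(round(lon * 1e7)) & 0xFFFFFFFF
    return (ilat << 32) | ilon

def unpack_latlon(packed):
    def signed(v):   # undo the 32-bit two's complement
        return v - 0x100000000 if v >= 0x80000000 else v
    return signed(packed >> 32) / 1e7, signed(packed & 0xFFFFFFFF) / 1e7

key = pack_latlon(40.71412890, -73.96140740)
print(key, unpack_latlon(key))   # round-trips to (40.7141289, -73.9614074)

Note that sorting by such a key does not put nearby addresses next to each other, which is the point made in the answers below.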
You can add them up, but it makes little sense.
For instance, a total of "10" made of 8 lat and 2 long would be indistinguishable from a "10" made of 3 lat and 7 long.
You can concatenate them, maybe with a dash.
But why do either? They are both really bad choices. A delivery system would want real x-y coordinates, and if planning a route it would want them separate in order to calculate things like Euclidean distances.
Is this a homework question? I doubt a delivery service is designing their service structure on SO. At least I hope not.
Based on AlexWien's answer, this is a solution in JavaScript:
pairCoordinates = function(lat, lng) {
    // Note: JavaScript bitwise operators work on 32-bit integers, so the shift
    // and mask keep only the low 16 bits of each 1e7-scaled coordinate; treat
    // this as a lossy key rather than an exact encoding.
    return lat * 1e7 << 16 & 0xffff0000 | lng * 1e7 & 0x0000ffff;
}
How about this:
(lat+90)*180+lng
From Tom Clarkson's comment in Geospatial Indexing with Redis & Sinatra for a Facebook App
If you want to treat location as "one thing", the best way to handle this is to create a data structure that contains both values. A Class for OO languages, or a struct otherwise. Combining them into a single scalar value has little value, even for display.
Location is a really rich problem space, and there are dozens of ways to represent it. Lat/Lon is the tip of the iceberg.
As always, the right answer depends on what you're using it for, which you haven't mentioned.
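A tiny Python sketch of that "keep them together in one structure" advice (the class name is arbitrary):

from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    lat: float
    lon: float

stop = Location(40.71412890, -73.96140740)
print(stop.lat, stop.lon)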
I have created a method of putting the latitude and longitude into one base-36 number which for now I'm calling a geohexa.
The method works by dividing the world into a 36 x 36 grid. The first character is a longitude and the second character is a latitude. The latitude and longitude those two characters represent is the midpoint of that 'rectangle'. You just keep adding characters, alternating between longitude and latitude. Eventually the geohexa, when converted back to a lat and lon, will be close enough to your original lat and lon.
Nine characters will typically get you within 5 meters of a randomly generated lat and lon.
The geohexa for London Bridge is hszaounu and for Tower Bridge is hszaqu88.
It is possible to sort the geohexa, and locations that are near each other will tend to be next to each other in a sorted list to some extent. However it by no means solves the travelling salesman problem!
The project, including a full explanation, implementations in Python, Java and JavaScript can be found here: https://github.com/Qarj/geohexa
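For illustration only, here is a minimal Python sketch of the scheme as described above (it is not the reference implementation from the linked repo, so its output is not guaranteed to match it character for character):

BASE36 = '0123456789abcdefghijklmnopqrstuvwxyz'

def geohexa_encode(lat, lon, length=9):
    # Characters alternate longitude, latitude; each one narrows its range to 1/36th.
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    out = []
    for i in range(length):
        if i % 2 == 0:                      # even positions refine longitude
            step = (lon_hi - lon_lo) / 36
            idx = min(int((lon - lon_lo) / step), 35)
            lon_lo += idx * step
            lon_hi = lon_lo + step
        else:                               # odd positions refine latitude
            step = (lat_hi - lat_lo) / 36
            idx = min(int((lat - lat_lo) / step), 35)
            lat_lo += idx * step
            lat_hi = lat_lo + step
        out.append(BASE36[idx])
    return ''.join(out)

print(geohexa_encode(51.5079, -0.0877))  # a point near London Bridge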
You can use the Hilbert space-filling curve to convert latitude,longitude into a single number: e.g., https://geocode.xyz/40.71413,-73.96141?geoit=xml returns 2222211311031 and https://geocode.xyz/40.71413,-73.96151?geoit=xml returns 2222211311026
The source code is here: https://github.com/eruci/geocode
In a nutshell:
Let X,Y be latitude,longitude
Truncate both to the 5th decimal place and convert to integers by multiplying by 100000
Let XY = X+Y and YX = X-Y
Convert XY,YX to binary, and merge them into XYX by alternating the bits
Convert XYX to decimal
Add an extra number (1,2,3,4) to indicate when one or both XY,YX are negative numbers.
Now you have a single number that can be converted back to latitude,longitude and which preserves all their positional properties.
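A hedged Python sketch of those steps; the bit-interleaving is literal, but the trailing sign digit is my own reading of the last step, so don't expect the output to reproduce the geocode.xyz numbers exactly:

def interleave_bits(a, b):
    # Merge two non-negative ints by alternating their bits
    # (a's bits at even positions, b's at odd positions).
    out = 0
    for i in range(max(a.bit_length(), b.bit_length())):
        out |= ((a >> i) & 1) << (2 * i)
        out |= ((b >> i) & 1) << (2 * i + 1)
    return out

def single_number(lat, lon):
    x = int(round(lat * 100000))   # 5-decimal integers (rounded here to dodge
    y = int(round(lon * 100000))   # float truncation artifacts)
    xy, yx = x + y, x - y
    flag = 1 + (1 if xy < 0 else 0) + (2 if yx < 0 else 0)   # 1..4 sign indicator
    return int(str(interleave_bits(abs(xy), abs(yx))) + str(flag))

print(single_number(40.71413, -73.96141))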
I found I can get good results by adding together the latitude and longitude of a particular address (excluding the house number), sorting the rows in the database table by that sum, and then doing a second sort on the house number in ascending order.
I used this url to get the numbers I needed to add together:
http://where.yahooapis.com/geocode?q=stedman+st,+lowell,+ma

Calculating person's time zone (GMT offset) based on phone number?

I've gotten a request to show a person's local time based on their phone number. I know our local GMT offset, so I can handle USA phones via a database table we have that links US zip_code to GMT offset (e.g. -5). But I've no clue how to convert non-US phone numbers or country names (these people obviously don't have a zip code).
If you care, my employer (a college) wants to solicit our alumni for donations and do it during reasonable hours.
Sorry to all that I didn't clearly state that I was considering HOME phone numbers. So roaming isn't an issue. I'm looking for some reference table or Oracle application I can source this info from.
Florida has two time zones, but many countries have only one. You need this table: http://en.wikipedia.org/wiki/List_of_country_calling_codes . Parse the country code out of the phone number by looking for the 1 and the area code for NANPA countries (those countries using the same 1+area-code allocation as the USA), or 7 for Russia or Kazakhstan. If that doesn't match, check whether the number starts with one of the 2-digit calling prefixes, and then the 3-digit ones.
Remember that the first few digits of the number may be the international dialing prefix, and are not properly part of the telephone number.
For countries that span more than one time zone, see if you can get allocation information from their national telecom regulator. For the USA and other NANPA countries, check out http://www.nanpa.com/ .
Of course your results will be far from perfect, but hopefully you will wake fewer customers from their night's sleep.
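A toy Python sketch of the lookup; the prefix table is a tiny, hand-picked slice with representative UTC offsets (no DST, multi-zone countries marked None), so treat every entry as illustrative:

PREFIX_UTC_OFFSET = {
    '1': None,     # NANPA: needs the area code to resolve a zone
    '7': None,     # Russia / Kazakhstan: spans many zones
    '44': 0,       # United Kingdom
    '47': 1,       # Norway
    '27': 2,       # South Africa
    '91': 5.5,     # India
    '972': 2,      # Israel
}

def utc_offset_for(number):
    # Expect +<country code><subscriber number>; the longest known prefix wins.
    digits = number.lstrip('+')
    for length in (3, 2, 1):
        if digits[:length] in PREFIX_UTC_OFFSET:
            return PREFIX_UTC_OFFSET[digits[:length]]
    return None

print(utc_offset_for('+4722334455'))   # 1 (Norway)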
Local time is one thing but, if you have worldwide customers, there are also local habits to take into account.
People typically go to bed earlier in Norway than in Spain (and they are in the same time zone).
You might be able to get the phone company to feed you location data (this info should exist for land lines and must exist for cells) but expect to pay.
Some nations are easy, since they are in a single time zone. Look at Europe: you can cover millions of people just by using the international dialling code, +47 for Norway etc.
Phone-number allocations are usually done by a national telecom authority, so you could probably get the information for free.
As you already know, this only gives the default time zone, since the person might be anywhere on the planet at the time. Also, number allocation might not distinguish between time zones at all, so the approach is imperfect but potentially useful for providing default settings.
Look in the phone book. Ours has quite a few pages mapping area codes onto countries/provinces/states. Then you have to map geographical locations onto time zones, but that is pretty straightforward.
Impossible. If I drive about 400 miles east (west coast of the US) then I'll break your algorithm by having a XXX number in a YYY timezone.
Now if this is a cell phone app, it does seem possible with something called NITZ.
I think Danie, Bortzmeyer, and others are overthinking the problem. The goal isn't to maximize the calling window; it's to find an acceptable time.
Let's take the US and consider only the 4 major time zones. Say we define acceptable as 10AM - 7PM. I doubt even the Norwegian bachelor farmers go to bed before 7PM.
So if you know that the phone is in the US, don't make any call before 1PM Eastern. That way, whether they are in NYC or LA, it's after 10AM local time. And make no calls after 7PM Eastern. Who cares if it's mainland Florida or its hour-behind panhandle? Dallas and El Paso are in the same state but different time zones too. For the US, just filter out AK and HI. The only seriously difficult country is Russia -- lots of time zones.
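A little Python arithmetic to make that window explicit; the 10AM-7PM acceptability window and the reading that the caller's clock is Eastern Time (UTC-5) come from the paragraph above, everything else is generic:

def calling_window(acceptable_start, acceptable_end, min_offset, max_offset, caller_offset):
    # Returns the window, in the caller's local hours, during which everyone in a
    # country spanning [min_offset, max_offset] is inside the acceptable local window.
    earliest = acceptable_start + (caller_offset - min_offset)  # wait for the westernmost zone
    latest = acceptable_end + (caller_offset - max_offset)      # stop for the easternmost zone
    return earliest, latest

# Continental US (UTC-8 .. UTC-5) seen from an Eastern Time caller:
print(calling_window(10, 19, -8, -5, -5))   # (13, 19) -> call between 1PM and 7PM Eastern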