I have a database of user-submitted latitude/longitude points and am trying to group 'close' points together. 'Close' is relative, but for now it seems to be about 500 feet.
At first it seemed I could just group rows that have the same latitude/longitude for the first 3 decimal places (roughly a 300x300 foot box, understanding that the box size changes as you move away from the equator).
However, that method seems quite lacking. 'Closeness' can't be significantly different from the distance each decimal place represents, and it doesn't take into account that two locations may have different digits in the 3rd (or any) decimal place but still be within the distance that place represents (e.g. 33.1239 and 33.1240).
I've also mulled over the situation where point A and point C are both 'close' to point B (but not to each other) - should they be grouped together? If so, what happens when point D is 'close' to point C (and no other points) - should it be grouped as well? Certainly I have to determine the desired behavior, but how would either be implemented?
Can anyone point me in the right direction as to how this can be done and what different methods/approaches can be used?
I feel a bit like I'm missing something obvious.
Currently the data is in a MySQL database, used by a PHP application; however, I'm open to other storage methods if they're a key part in accomplishing this.
There are a number of ways of determining the distance between two points, but for plotting points on a 2-D graph you probably want the Euclidean distance. If (x1, y1) represents your first point and (x2, y2) represents your second, the distance is
d = sqrt( (x2-x1)^2 + (y2-y1)^2 )
Regarding grouping, you may want to use some sort of 2-D mean to determine how "close" things are to each other. For example, if you have three points, (x1, y1), (x2, y2), (x3, y3), you can find the center of these three points by simple averaging:
x(mean) = (x1+x2+x3)/3
y(mean) = (y1+y2+y3)/3
You can then see how close each is to the center to determine whether it should be part of the "cluster".
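As a minimal Python sketch of this idea (the tolerance value is an arbitrary placeholder you would tune):

from math import dist  # Python 3.8+

def centroid(points):
    # Mean of a list of (x, y) points.
    n = len(points)
    return (sum(x for x, _ in points) / n,
            sum(y for _, y in points) / n)

points = [(33.1239, -111.4560), (33.1240, -111.4570), (33.1238, -111.4550)]
c = centroid(points)

# Keep only points within some tolerance of the cluster's center.
TOLERANCE = 0.002  # in degrees here; tune to your definition of 'close'
cluster = [p for p in points if dist(p, c) <= TOLERANCE]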
There are a number of ways one can define clusters, all of which use some variant of a clustering algorithm. I'm in a rush now and don't have time to summarize, but read up on clustering algorithms, and hopefully other people will be able to provide more detail. Good luck!
Use something similar to the method you outlined in your question to get an approximate set of results, then whittle that approximate set down by doing proper calculations. If you pick your grid size (i.e. how much you round off your co-ordinates) correctly, you can at least hope to reduce the amount of work to be done to an acceptable level, although you have to manage what that grid size is.
For example, the earthdistance extension to PostgreSQL works by converting lat/long pairs to (x,y,z) cartesian co-ordinates, modelling the Earth as a uniform sphere. PostgreSQL has a sophisticated indexing system that allows these co-ordinates, or boxes around them, to be indexed into R-trees, but you can whack something together that is still useful without that.
If you take your (x,y,z) triple and round it off (i.e. multiply by some factor and truncate to integer), you then have three integers that you can concatenate to produce a "box name", which identifies the box in your "grid" that the point falls in.
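A rough Python sketch of that box-name construction, assuming a spherical Earth (the grid size and key format here are just illustrative):

from math import radians, sin, cos, floor

EARTH_RADIUS_KM = 6371.0

def to_cartesian(lat_deg, lon_deg):
    # Model the Earth as a uniform sphere, as earthdistance does.
    lat, lon = radians(lat_deg), radians(lon_deg)
    return (EARTH_RADIUS_KM * cos(lat) * cos(lon),
            EARTH_RADIUS_KM * cos(lat) * sin(lon),
            EARTH_RADIUS_KM * sin(lat))

def box_name(lat_deg, lon_deg, grid_km=5.0):
    # Snap each co-ordinate to a grid cell and join the cells into a key.
    x, y, z = to_cartesian(lat_deg, lon_deg)
    return ":".join(str(floor(c / grid_km)) for c in (x, y, z))

# Store box_name(lat, lon) in an indexed column; cheap equality lookups on it
# then do the coarse filtering before any exact distance calculation.
print(box_name(33.1239, -111.4560))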
If you want to search for all points within X km of some target point, you generate all the "box names" around that point (easy, once you've converted the target point to an (x,y,z) triple as well) and eliminate all the boxes that don't intersect the Earth's surface (trickier, but applying x^2+y^2+z^2=R^2 at each corner will tell you). You end up with a list of boxes the matching points can be in, so just search for all points matching one of those boxes; this will also return some extra points. As a final stage you need to calculate the actual distance to your target point and eliminate those (again, this can be sped up by working in Cartesian co-ordinates and converting your target great-circle distance radius to a secant distance).
The fiddling around comes down to making sure you don't have to search too many boxes, but at the same time don't bring in too many extra points. I've found it useful to index each point on several different grids (e.g. resolutions of 1km, 5km, 25km, 125km, etc). Ideally you want to search just one box, but remember that the search expands to at least 27 boxes as soon as your target radius exceeds your grid size.
I've used this technique to construct a spatial index using Lucene rather than doing the calculations in a SQL database. It does work, although there is some fiddling to set it up, and the indices take a while to generate and are quite big. Using an R-tree to hold all the co-ordinates is a much nicer approach, but would take more custom coding; this technique basically just requires a fast hash-table lookup (so it would probably work well with all the NoSQL databases that are all the rage these days, and should be usable in a SQL database too).
Maybe overkill, but it seems to me a clustering problem: the distance measure will determine how the similarity of two elements is calculated. If you need a less naive solution, try Data Mining: Practical Machine Learning Tools and Techniques, and use Weka or Orange.
If I were tackling it, I'd start with a grid. Put each point into a square on the grid and look for squares that are densely populated. If the adjacent squares aren't populated, then you have a decent group.
If you have adjacent densely populated squares, you can always drop a circle at the center of each and optimize for circle area vs. (number of points in the circle * some tunable weight). Not perfect, but easy. Better groupings are much more complicated optimization problems.
Facing a similar issue, I just floored the longitude and latitude until I got the required 'closeness' in meters. In my case, flooring to 4 digits grouped locations that are approximately 13 meters apart.
If the longitude or latitude is negative, replace floor with ceil.
First FLOOR (or CEIL) to the required precision and then GROUP on the rounded longitude and latitude.
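A small Python sketch of that flooring-and-grouping step (the snap helper is my own naming):

from collections import defaultdict
from math import ceil, floor

def snap(value, digits=4):
    # Floor positive values and ceil negative ones to `digits` decimals,
    # i.e. truncate toward zero, as described above.
    scale = 10 ** digits
    snapped = floor(value * scale) if value >= 0 else ceil(value * scale)
    return snapped / scale

points = [(48.71953, -73.72882), (48.71954, -73.72883), (48.73000, -73.73000)]
groups = defaultdict(list)
for lat, lon in points:
    groups[(snap(lat), snap(lon))].append((lat, lon))
# The first two points share the key (48.7195, -73.7288); the third does not.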
The code to measure the distance between two geo locations was borrowed from Getting distance between two points based on latitude/longitude:
from math import sin, cos, sqrt, atan2, radians

R = 6373.0  # approximate Earth radius in km

lat1 = radians(48.71953)
lon1 = radians(-73.72882)
lat2 = radians(48.719)
lon2 = radians(-73.728)

dlon = lon2 - lon1
dlat = lat2 - lat1

# Haversine formula
a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))

distance = (R * c) * 1000  # convert km to meters
print("Distance in meters:", round(distance))
Distance in meters: 84
As expected, for the same angular difference, the distance is larger further south (toward the equator) and smaller further north.
For the same coordinates on the equator, the distance is 109 meters (change the latitudes to 0.71953 and 0.719).
I varied the number of digits in the following trials, always keeping a difference of one in the last digit of both the longitude and the latitude, and measured the resulting distances:
lat1 = radians(48.71953)
lon1 = radians(-73.72882)
lat2 = radians(48.71954)
lon2 = radians(-73.72883)
Distance in meters: 1
lat1 = radians(48.7195)
lon1 = radians(-73.7288)
lat2 = radians(48.7196)
lon2 = radians(-73.7289)
Distance in meters: 13
lat1 = radians(48.719)
lon1 = radians(-73.728)
lat2 = radians(48.720)
lon2 = radians(-73.729)
Distance in meters: 133
lat1 = radians(48.71)
lon1 = radians(-73.72)
lat2 = radians(48.72)
lon2 = radians(-73.73)
Distance in meters: 1333
Summary: flooring/ceiling the longitude and latitude to 4 digits will let you group locations that are approximately 13 meters apart.
This number changes with latitude, per the equation above: it is larger near the equator and smaller toward the poles.
If you are working with latitude and longitude, there are several factors to consider in real data: obstructions, such as rivers and lakes, and facilities, such as bridges and tunnels. You cannot group points naively; a simple algorithm such as k-means will not group them well. I think you should go for a spatial clustering method, such as the partitioning method CLARANS.
I'm trying to decide whether it makes sense, in CPU processing time, to use the more complex haversine formula instead of the faster Pythagorean formula. While there seems to be a pretty unanimous answer along the lines of "you can use the Pythagorean formula for acceptable results on small distances, but haversine is better", I cannot find even a vague definition of what "small distances" means.
This page, linked in the top answer to the very popular question Calculate distance between two latitude-longitude points?, claims:
If performance is an issue and accuracy less important, for small distances Pythagoras’ theorem can be used on an equirectangular projection:*
Accuracy is somewhat complex: along meridians there are no errors, otherwise they depend on distance, bearing, and latitude, but are small enough for many purposes*
The asterisk even says "Anyone care to quantify them?".
But this answer claims that the error is about 0.1% at 1000 km (though it doesn't cite any reference, just personal observations), and that for 4 km (even assuming the percentage doesn't shrink with the much smaller distance) it would mean under 4 m of error, which for public-access GPS is around the best open-space accuracy.
Now, I don't know what the average Joe thinks of when they say "small distances", but for me 4 km is definitely not a small distance (I'm thinking more of tens of meters), so I would be grateful if someone could link or calculate a table of errors, just like the one in this answer to Measuring accuracy of latitude and longitude?. I assume the errors would be higher near the poles, so maybe choose three representative latitudes (5°, 45° and 85°?) and calculate the error with respect to the decimal degree place.
Of course, I would also be happy with an answer that gives an exact meaning to "small distances".
Yes ... from 10 meters up to 1 km you're going to be very accurate using the plain old Pythagorean theorem. It's really ridiculous nobody talks about this, especially considering how much computational power you save.
Proof:
Take the top of the earth, since it will be the worst case: the top 90 miles, a circle with the longitude lines intersecting in its middle.
Note that as you zoom in to an area as small as 1 km, just 50 miles from the pole, what originally looked like a trapezoid with curved top and bottom borders looks like a nearly perfect rectangle. In other words, we can assume rectilinearity at 1 km, and especially at a mere 10 m.
Now, it's true of course that longitude degrees are much shorter near the poles than at the equator. For example, any slack-jawed yokel can see that the rectangles made by the latitude and longitude lines grow taller, the aspect ratio increasing, as you get closer to the poles. In fact, the longitude distance is simply what it would be at the equator multiplied by the cosine of the latitude anywhere along the path. I.e., where "L" (longitude distance) and "l" (latitude distance) span the same number of degrees:
LATcm = latitude at *any* point along the path (because the path is tiny compared to the earth)
L = l * cos(LATcm)
Thus, for 1 km or less (even near the poles), we can calculate the distance very accurately using the Pythagorean theorem, like so:
Where: latitude1, longitude1 = coordinates of the start point (in degrees)
and: latitude2, longitude2 = coordinates of the end point
distance = sqrt((latitude2-latitude1)^2 + ((longitude2-longitude1)*cos(latitude1))^2) * 111139
where 111,139 is the approximate number of meters in one degree at the equator,
because we have to convert the result from degrees to meters.
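A quick Python version of this formula (the function name is mine; 111,139 m per degree is the figure quoted above):

from math import radians, sqrt, cos

METERS_PER_DEGREE = 111139  # approximate meters in one degree at the equator

def fast_distance(lat1, lon1, lat2, lon2):
    # Equirectangular (Pythagorean) approximation; inputs in degrees.
    dlat = lat2 - lat1
    dlon = (lon2 - lon1) * cos(radians(lat1))
    return sqrt(dlat**2 + dlon**2) * METERS_PER_DEGREE

print(round(fast_distance(48.7195, -73.7288, 48.7196, -73.7289)))  # ~13 m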
A neat thing about this is that GPS systems usually take measurements about 10 m or less apart, which means you can stay very accurate over very large distances by summing up the results of this equation, as accurate as the haversine formula. The super-tiny errors don't magnify as you sum up the total, because they are a percentage that remains the same as the segments are added up.
The reality, however, is that the haversine formula (which is very accurate) isn't difficult, but relatively speaking it will consume at least 3 times more processor time, and up to 31x more according to this post: https://blog.mapbox.com/fast-geodesic-approximations-with-cheap-ruler-106f229ad016.
This formula came in useful for me when I was using a system (Google Sheets) that couldn't give me the significant digits necessary for the haversine formula.
I have GPS coordinates provided as degrees latitude/longitude and would like to offset them by a distance and an angle. E.g.: what are the new coordinates if I offset 45.12345, 7.34567 by 22 km along bearing 104 degrees? Thanks
For most applications, one of these two formulas is sufficient:
"Lat/lon given radial and distance"
The second one is slower, but causes fewer problems in special situations (see the documentation on that page).
Read the introduction on that page, and make sure that lat/lon are converted to radians before, and back to degrees after, computing the result.
Make sure that your system uses atan2(y, x) (which is usually the case) and not atan2(x, y), which is the case in Excel.
The link in the previous answer no longer works; here it is via the Wayback Machine:
https://web.archive.org/web/20161209044600/http://williams.best.vwh.net/avform.htm
The formula is:
A point {lat,lon} is a distance d out on the tc radial from point 1 if:
lat=asin(sin(lat1)*cos(d)+cos(lat1)*sin(d)*cos(tc))
IF (cos(lat)=0)
lon=lon1 // endpoint a pole
ELSE
lon=mod(lon1-asin(sin(tc)*sin(d)/cos(lat))+pi,2*pi)-pi
ENDIF
This algorithm is limited to distances such that dlon < pi/2, i.e. those that extend around less than one quarter of the circumference of the earth in longitude. A completely general, but more complicated, algorithm is necessary if greater distances are allowed:
lat =asin(sin(lat1)*cos(d)+cos(lat1)*sin(d)*cos(tc))
dlon=atan2(sin(tc)*sin(d)*cos(lat1),cos(d)-sin(lat1)*sin(lat))
lon=mod( lon1-dlon +pi,2*pi )-pi
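For illustration, here is the general version in Python, assuming the usual east-positive longitude convention (the formulary treats west longitude as positive, hence the flipped sign on dlon):

from math import asin, atan2, sin, cos, radians, degrees, pi

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; used to get the angular distance

def destination(lat_deg, lon_deg, bearing_deg, distance_km):
    lat1, lon1 = radians(lat_deg), radians(lon_deg)
    tc = radians(bearing_deg)          # true course
    d = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    lat = asin(sin(lat1) * cos(d) + cos(lat1) * sin(d) * cos(tc))
    dlon = atan2(sin(tc) * sin(d) * cos(lat1),
                 cos(d) - sin(lat1) * sin(lat))
    # East-positive longitude: add dlon (the formulary subtracts it).
    lon = (lon1 + dlon + pi) % (2 * pi) - pi
    return degrees(lat), degrees(lon)

print(destination(45.12345, 7.34567, 104, 22))  # the example from the question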
I have a set of GPS coordinates and I want to find the speed required for a UAV to travel between them, by calculating the distance in x, y, z and then dividing by the time to travel (m/s).
I know the great-circle distance, but I assume this will be inaccurate since the points are all relatively close together (within 10 m)?
Is there an accurate way to do this?
For small distances you can use the haversine formula without a relevant loss of accuracy compared to, for example, Vincenty's formula. Plus, it's designed to be accurate for very small distances. You can read up on this here if you are interested.
You can do this by converting lat/long/alt into XYZ format for both points. Then figure out the rotation angles to move one of those points (usually the oldest) so that it would be at lat=0, long=0, alt=0, and rotate the second position report (the newest point) by the same angles. If you do it all correctly, X will equal the east offset, Y the north offset, and Z the up offset. You can use the Pythagorean theorem with the X and Y (east and north) offsets to determine the horizontal distance traveled. Normally, you just ignore the altitude differences and work with horizontal data only.
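A sketch of that in Python, using the WGS84 constants; the east/north/up (ENU) rotation below is the standard way to do the rotation this answer describes (helper names are mine):

from math import radians, sin, cos, sqrt, hypot

A = 6378137.0          # WGS84 semi-major axis, meters
E2 = 6.69437999014e-3  # WGS84 first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    # Convert lat/lon/alt to Earth-centered XYZ.
    lat, lon = radians(lat_deg), radians(lon_deg)
    n = A / sqrt(1 - E2 * sin(lat) ** 2)  # prime vertical radius of curvature
    return ((n + alt_m) * cos(lat) * cos(lon),
            (n + alt_m) * cos(lat) * sin(lon),
            (n * (1 - E2) + alt_m) * sin(lat))

def enu_offsets(p1, p2):
    # East/north/up offsets of p2 relative to p1; points are (lat, lon, alt).
    x1, y1, z1 = geodetic_to_ecef(*p1)
    x2, y2, z2 = geodetic_to_ecef(*p2)
    dx, dy, dz = x2 - x1, y2 - y1, z2 - z1
    lat, lon = radians(p1[0]), radians(p1[1])
    east = -sin(lon) * dx + cos(lon) * dy
    north = -sin(lat) * cos(lon) * dx - sin(lat) * sin(lon) * dy + cos(lat) * dz
    up = cos(lat) * cos(lon) * dx + cos(lat) * sin(lon) * dy + sin(lat) * dz
    return east, north, up

e, n, u = enu_offsets((48.71953, -73.72882, 100.0), (48.71960, -73.72870, 100.0))
dt = 1.0                  # seconds between the two position reports
speed = hypot(e, n) / dt  # horizontal speed in m/s (altitude ignored, as above)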
All of this assumes you are using accurate formulas to convert lat/lon/alt into XYZ. It also assumes you have enough precision in the lat/lon/alt values to be accurate. Approximations are not good if you want good results. Normally, you need about 6 decimal digits of precision in lat/lon values to compute positions down to the meter level of accuracy.
Keep in mind that this method doesn't work very well unless you have moved a fair distance (more than about 10 or 20 meters; more is better). There is enough noise in GPS position reports that you are going to get jumpy velocity values, which you will need to filter further to get good accuracy. The math approach isn't the problem here; it's the inherent noise in the GPS position reports. When you have good reports, you will get good velocity.
A GPS receiver doesn't normally use this approach to determine velocity. It looks at the way Doppler values change for each satellite and factors in the current position to compute velocity. This works reasonably well when the vehicle is moving, and it is a much faster way to detect changes in velocity (for instance, to release a position clamp). The normal user doesn't have access to the internal Doppler values, and the math gets very complicated, so it's not something you can do yourself.
I'm trying to figure out the most efficient/fastest way to add a large number of convex quads (four given x,y points) into an array/list and then to check whether a point is within, or on the border of, any of those quads.
I originally tried using ray casting but thought that it was a little overkill since I know that all my polygons will be quads and that they are also all convex.
Currently, I am splitting each quad into two triangles that share an edge and then checking whether the point is on or in each of those two triangles using their areas.
For example, with triangle ABC and test point P:
if (areaPAB + areaPAC + areaPBC == areaABC) { return true; }
This seems like it may run a little slowly, since I need to calculate the areas of 4 different triangles to run the check, and if the first triangle of the quad returns false, I have to compute 4 more areas. (I include a small epsilon in the check to compensate for floating-point error.)
I'm hoping that there is an even faster way that might involve a single check of a point against a quad rather than splitting it into two triangles.
I've attempted to reduce the number of checks by putting the polygons into an array[,]. When adding a polygon, it checks the minimum and maximum x and y values and then, using those, places the same poly into the proper array positions. When checking a point against the available polygons, it retrieves the proper list from the array of lists.
I've been searching through similar questions and I think what I'm using now may be the fastest way to figure out if a point is in a triangle, but I'm hoping that there's a better method to test against a quad that is always convex. Every polygon test I've looked up seems to be testing against a polygon that has many sides or is an irregular shape.
Thanks for taking the time to read my long-winded question to what's probably a simple problem.
I believe the fastest methods are:
1: Find the mutual orientation of all vector pairs (DirectedEdge, CheckedPoint) through cross-product signs. If all four signs are the same, then the point is inside.
Addition: for every edge
EV[i] = V[i+1] - V[i], where V[] holds the vertices in order
PV[i] = P - V[i]
Cross[i] = CrossProduct(EV[i], PV[i]) = EV[i].X * PV[i].Y - EV[i].Y * PV[i].X
Cross[i] is positive if point P lies in the left half-plane relative to the i-th edge (V[i] to V[i+1]), and negative otherwise. If all the Cross[] values are positive, then point P is inside the quad and the vertices are in counter-clockwise order; if all the Cross[] values are negative, then point P is inside and the vertices are in clockwise order. If the values have different signs, the point is outside the quad.
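A small Python sketch of method 1 (names are mine; points exactly on an edge are counted as inside):

def point_in_convex_quad(p, quad):
    # quad: four (x, y) corners in order, either winding direction.
    pos = neg = False
    for i in range(4):
        vx, vy = quad[i]
        wx, wy = quad[(i + 1) % 4]
        # Cross[i] = CrossProduct(EV[i], PV[i]), as defined above.
        cross = (wx - vx) * (p[1] - vy) - (wy - vy) * (p[0] - vx)
        pos = pos or cross > 0
        neg = neg or cross < 0
    return not (pos and neg)  # mixed signs mean the point is outside

print(point_in_convex_quad((1, 1), [(0, 0), (4, 0), (4, 3), (0, 3)]))  # True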
If the quad set is the same for many point queries, then, as dmuir suggests, you can precalculate the uniform line equation for every edge. The uniform line equation is a * x + b * y + c = 0, where (a, b) is the normal vector to the edge. This equation has an important property: the sign of the expression
(a * P.x + b * P.y + c) determines the half-plane where point P lies (just as with the cross products).
2: Split the quad into 2 triangles and use the vector method for each: express the CheckedPoint vector in terms of basis vectors.
P = a*V1+b*V2
The point is inside when a, b >= 0 and a + b <= 1.
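A sketch of method 2 for one triangle, taking one vertex as the origin and solving for a and b with Cramer's rule (split the quad into two triangles and call this twice):

def point_in_triangle(p, v0, v1, v2):
    # Solve P - V0 = a*(V1 - V0) + b*(V2 - V0), then test a, b and a + b.
    e1 = (v1[0] - v0[0], v1[1] - v0[1])
    e2 = (v2[0] - v0[0], v2[1] - v0[1])
    d = (p[0] - v0[0], p[1] - v0[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]  # zero only for a degenerate triangle
    a = (d[0] * e2[1] - d[1] * e2[0]) / det
    b = (e1[0] * d[1] - e1[1] * d[0]) / det
    return a >= 0 and b >= 0 and a + b <= 1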
Both methods require about 10-15 additions, 6-10 multiplications and 2-7 comparisons (not counting floating-point error compensation).
If you could afford to store, with each quad, the equation of each of its edges then you could save a little time over MBo's answer.
For example, if you have an inward-pointing normal vector N for each edge of the quad, and a constant d (which is N.p for one of the vertices p on the edge), then a point x is in the quad if and only if N.x >= d for each edge. So that's 2 multiplications, one addition and one comparison per edge, and you'll need to perform up to 4 tests per point. This technique works for any convex polygon.
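A Python sketch of that precomputation, assuming the vertices are given in counter-clockwise order:

def precompute_edges(quad):
    # One-time cost per quad: store the inward normal N and d = N.p per edge.
    edges = []
    for i in range(4):
        (x1, y1), (x2, y2) = quad[i], quad[(i + 1) % 4]
        n = (y1 - y2, x2 - x1)  # inward-pointing normal for CCW winding
        edges.append((n, n[0] * x1 + n[1] * y1))
    return edges

def inside(p, edges):
    # The point is in the quad iff N.x >= d holds for every edge.
    return all(nx * p[0] + ny * p[1] >= d for (nx, ny), d in edges)

edges = precompute_edges([(0, 0), (4, 0), (4, 3), (0, 3)])
print(inside((1, 1), edges))  # True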
I'm trying to figure out how to calculate min/max lat/long bounds for a given range around a GPS coordinate.
For example: GPS coordinate 37.42935699924869, -122.16962099075317, range 0.2 miles.
I'm looking at the point + range + bearing functions on http://www.movable-type.co.uk/scripts/latlong.html but I'm not sure that's exactly what I want.
That gives 4 unique lat/long pairs, whereas I want/need a max/min lat and a max/min long.
Calculate the distance between the (constant) central point and the point you want to test (this page should give you the distance in meters).
If (distance < 0.2) then ...
Well, given a point and a distance, you will get a circle.
You're looking for two points, which will essentially describe a square (two opposite corners). The two points you're looking for won't even be on the circle. I'm not exactly sure why you want this, but I don't think there is an answer to your question.
Perhaps you could tell us what you're trying to accomplish.
EDIT: Added image to illustrate. The orange line is the distance from the centre (e.g. 0.2 miles)
(Image: http://img155.imageshack.us/img155/1315/diagramp.png)
After your clarification, here is a less elegant answer that might give you what you want. You want the inverse of a really complicated function; I'm afraid my math skills aren't up to the task, but it should be doable.
The less elegant route is to find it by trial and error: keep the longitude the same and vary the latitude. Using the right algorithm, you should be able to find a value that is very close to the distance you want. This gives you a point on the circle (one of the four that are also on the square).
Then keep the latitude the same and vary the longitude. This gives you a second point on the square (in the middle of one of its sides); from there you can find the 4 corners of the square.
This will be slow; depending on how often you have to do it, that might or might not matter.
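For what it's worth, a Python sketch of that trial-and-error search, using bisection (constants and helper names are mine):

from math import radians, sin, cos, sqrt, atan2

R_MILES = 3959.0  # approximate Earth radius in miles

def haversine_miles(lat1, lon1, lat2, lon2):
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2)**2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2)**2
    return R_MILES * 2 * atan2(sqrt(a), sqrt(1 - a))

def find_offset(lat, lon, target_miles, vary_lat):
    # Bisect an offset along one axis until the distance matches the target.
    lo, hi = 0.0, 1.0  # degrees; one degree is far more than 0.2 miles
    for _ in range(50):
        mid = (lo + hi) / 2
        d = (haversine_miles(lat, lon, lat + mid, lon) if vary_lat
             else haversine_miles(lat, lon, lat, lon + mid))
        lo, hi = (mid, hi) if d < target_miles else (lo, mid)
    return (lo + hi) / 2

lat, lon, r = 37.42935699924869, -122.16962099075317, 0.2
dlat, dlon = find_offset(lat, lon, r, True), find_offset(lat, lon, r, False)
print(lat - dlat, lat + dlat)  # min/max latitude
print(lon - dlon, lon + dlon)  # min/max longitude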