Fast check if polygon contains point between dataframes - pandas

I have two dataframes. One contains a column of Polygons, taken from an image of polygon shapes. Each polygon has a set of coordinates. This dataframe also has a "segment-id" column. I have another dataframe, containing a column of Points, also with coordinates. These Points represent pixels from the same image of Polygon shapes, and therefore have the same coordinate system. I want to give every Point the "segment-id" of the Polygon which contains it. Every Polygon contains at least one Point.
Currently, I achieve this by using a nested for loop:
for i, row in enumerate(point_df.itertuples(), 0):
point = pixel_df.at[i, 'geometry']
for j in range(len(polygon_df)):
polygon = polygon_df.iat[j, 0]
if polygon.contains(point):
pixel_df.at[i, 'segment_id'] = polygon_df.at[j, 'segment_id']
else:
pass
This is extremely slow. For 100 Points, it takes around 10 seconds. I need a faster way of doing this. I have tried using apply but it is still super slow.
Hope someone can help me out, thanks very much.

For fast "is point inside polygon":
Preparation: In the code that obtains the data describing the polygons; using all the vertices, find the minimum and maximum y-coord, and minimum and maximum x-coord; and store that with the polygon's data.
1) Using the point's coords and the polygon's "minimum and maximum x and y" (pre-determined during preparation); do a "bounding box" test. This is just a fast way to find out if the point is definitely not inside the polygon (so you can skip the more expensive steps most of the time).
2) Set a "yes/no" flag to "no"
3) For each edge in the polygon; determine if a horizontal line passing through the point would intersect with the edge, and if it does determine the x-coord of the intersection. If the x-coord of the intersection is less than the point's x-coord, toggle (with NOT) the "yes/no" flag. Ignore "horizontal line passes through a vertex" during this step.
4) For each vertex, compare its y-coord with the point's y-coord. If they're the same you need to look at both edges coming from that vertex to determine if the edge's vertices are in the same y direction. If the edge's vertices are in the same y direction (if the edges form a 'V' shape or upside-down 'V' shape) ignore the vertex. Otherwise (if the edges form a '<' or '>' shape), if the vertex's x-coord is less than the point's x-coord, toggle the "yes/no" flag.
After all this is done; that "yes/no" flag will tell you if the point was in the polygon.

Related

Line Profile Diagonal

When you make a line profile of all x-values or all y-values the extraction from each pixel is clear. But when you take a line profile along a diagonal, how does DM choose which pixels to use in the one dimensional readout?
Not really a scripting question, but I'm rather certain that it uses bi-linear interpolation between the grid-points along the drawn line. (And if perpendicular integration is enabled, it does so in an integral.) It's the same interpolation you would get for a "rotate" image.
In fact, you can think of it as a rotate-image (bi-linearly interpolated) with a 'cut-out' afterwards, potentially summed/projected onto the new X-axis.
Here is an example
Assume we have a 5 x 4 image, which gives the grid as shown below.
I'm drawing top-left corners to indicate the coordinates system pixel convention used in DigitalMicrgraph, where
(x/y)=(0/0) is the top-left corner of the image
Now extract a LineProfile from (1/1) to (4/3). I have highlighted the pixels for those coordinates.
Note, that a Line drawn from the corners seems to be shifted by half-a-pixel from what feels 'natural', but that is the consequence of the top-left-corner convention. I think, this is why a LineProfile-Marker is shown shifted compared to f.e. LineAnnotations.
In general, this top-left corner convention makes schematics with 'pixels' seem counter-intuitive. It is easier to think of the image simply as grid with values in points at the given coordinates than as square pixels.
Now the maths.
The exact profile has a length of:
As we can only have profiles with integer channels, we actually extract a LineProfile of length = 4, i.e we round up.
The angle of the profile is given by the arc-tangent of dX and dY.
So to extract the profile, we 'rotate' the grid by that angle - done by bilinear interpolation - and then extract the profile as grid of size 4 x 1:
This means the 'values' in the profile are from the four points:
Which are each bi-linearly interpolated values from four closest points of the original image:
In case the LineProfile is averaged over a certain width W, you do the same thing but:
extract a 2D grid of size L x W centered symmetrically over the line.i.e. the grid is shifted by (W-1)/2 perpendicular to the profile direction.
sum the values along W

How to draw an outline of a group of multiple rectangles?

I need to draw an enclosing polygon of a group of rectangles that are placed next to each other.
Let's think of text fields that share at least one edge (or part of it) with at least one of the other rectangles.
I can get the rectangles points coordinates, and so I basically have any data I need about them.
Can you think of a simple algorithm / procedure to draw a polygon (connected straight paths) around these objects.
Here's a demonstration of different potential cases (A, B, C, etc...). In example A I also drew a blue polygon which is the path that I need to draw, outlining the group of rectangles.
I've read here about convex hull and stuff like that but really, this looks like a far simpler problem.
One (beginning of) solution I thought of was that the points I actually need to draw through are only ones that are not shared by any pair of rectangles, meaning points that are vertices of more than one rectangle are redundant. What I couldn't find out was the order by which I need to draw lines from one to the next.
I currently work on objective c, but any other language or algo would be appreciated, including pseudo.
Thanks!
IMHO it should be like this. Make a list of edged and see if some are overlaying: This should be simple if the rectangles are aligned with the x,y axis. You just find the edges that have the vertexes on the same x or y and the other coordinates need to be in between. After this the remaining edges should form the outline.
Another method to find common edges is to break all rectangles along each x and y axis where you have vertices. This should look as if you are growing all lines to infinity. After this all common edges will have common vertices and can be eliminated.
You have two rows, and three different y-values. Let's say y0 is the top of the thing, y2 is the bottom end, and y1 marks the middle between both rows.
Each row has a maximum and a minimum x-value, let's say the top-row goes from x0_min to x0_max, and the bottom row from x2_min to x2_max. Given those values you just draw around the thing:
(x0_min,y0)->
(x0_max,y0)->
(x0_max,y1)->
(x2_max,y1)->
(x2_max,y2)->
(x2_min,y2)->
(x2_min,y1)->
(x0_min,y1)->
(x0_min,y0)

Calculating total coverage area of a union of polygons

I have a number of 2D (possibly intersecting) polygons which I rendered using OpenGL ES on the screen. All the polygons are completely contained within the screen. What is the most timely way to find the percentage area of the union of these polygons to the total screen area? Timeliness is required as I have a requirement for the coverage area to be immediately updated whenever a polygon is shifted.
Currently, I am representing each polygon as a 2D array of booleans. Using a point-in-polygon function (from a geometry package), I sample each point (x,y) on the screen to check if it belongs to the polygon, and set polygon[x][y] = true if so, false otherwise.
After doing that to all the polygons in the screen, I loop through all the screen pixels again, and check through each polygon array, counting that pixel as "covered" if any polygon has its polygon[x][y] value set to true.
This works, but the performance is not ideal as the number of polygons increases. Are there any better ways to do this, using open-source libraries if possible? I thought of:
(1) Unioning the polygons to get one or more non-overlapping polygons. Then compute the area of each polygon using the standard area-of-polygon formula. Then sum them up. Not sure how to get this to work?
(2) Using OpenGL somehow. Imagine that I am rendering all these polygons with a single color. Is it possible to count the number of pixels on the screen buffer with that certain color? This would really sound like a nice solution.
Any efficient means for doing this?
If you know background color and all polygons have other colors, you can read all pixels from framebuffer glReadPixels() and simply count all pixels that have color different than background.
If first condition is not met you may consider creating custom framebuffer and render all polygons with the same color (For example (0.0, 0.0, 0.0) for backgruond and (1.0, 0.0, 0.0) for polygons). Next, read resulting framebuffer and calculate mean of red color across the whole screen.
If you want to get non-overlapping polygons, you can run a line intersection algorithm. A simple variant is the Bentley–Ottmann algorithm, but even faster algorithms of O(n log n + k) (with n vertices and k crossings) are possible.
Given a line intersection, you can unify two polygons by constructing a vertex connecting both polygons on the intersection point. Then you follow the vertices of one of the polygons inside of the other polygon (you can determine the direction you have to go in using your point-in-polygon function), and remove all vertices and edges until you reach the outside of the polygon. There you repair the polygon by creating a new vertex on the second intersection of the two polygons.
Unless I'm mistaken, this can run in O(n log n + k * p) time where p is the maximum overlap of the polygons.
After unification of the polygons you can use an ordinary area function to calculate the exact area of the polygons.
I think that attempt to calculate area of polygons with number of pixels is too complicated and sometimes inaccurate. You can see something similar in stackoverflow answer about calculation the area covered by a polygon and if you construct regular polygons see area of a regular polygon ,

Most efficient way to check if a point is in or on a convex quad polygon

I'm trying to figure out the most efficient/fast way to add a large number of convex quads (four given x,y points) into an array/list and then to check against those quads if a point is within or on the border of those quads.
I originally tried using ray casting but thought that it was a little overkill since I know that all my polygons will be quads and that they are also all convex.
currently, I am splitting each quad into two triangles that share an edge and then checking if the point is on or in each of those two triangles using their areas.
for example
Triangle ABC and test point P.
if (areaPAB + areaPAC + areaPBC == areaABC) { return true; }
This seems like it may run a little slow since I need to calculate the area of 4 different triangles to run the check and if the first triangle of the quad returns false, I have to get 4 more areas. (I include a bit of an epsilon in the check to make up for floating point errors)
I'm hoping that there is an even faster way that might involve a single check of a point against a quad rather than splitting it into two triangles.
I've attempted to reduce the number of checks by putting the polygon's into an array[,]. When adding a polygon, it checks the minimum and maximum x and y values and then using those, places the same poly into the proper array positions. When checking a point against the available polygons, it retrieves the proper list from the array of lists.
I've been searching through similar questions and I think what I'm using now may be the fastest way to figure out if a point is in a triangle, but I'm hoping that there's a better method to test against a quad that is always convex. Every polygon test I've looked up seems to be testing against a polygon that has many sides or is an irregular shape.
Thanks for taking the time to read my long winded question to what's prolly a simple problem.
I believe that fastest methods are:
1: Find mutual orientation of all vector pairs (DirectedEdge-CheckedPoint) through cross product signs. If all four signs are the same, then point is inside
Addition: for every edge
EV[i] = V[i+1] - V[i], where V[] - vertices in order
PV[i] = P - V[i]
Cross[i] = CrossProduct(EV[i], PV[i]) = EV[i].X * PV[i].Y - EV[i].Y * PV[i].X
Cross[i] value is positive, if point P lies in left semi-plane relatively to i-th edge (V[i] - V[i+1]), and negative otherwise. If all the Cross[] values are positive, then point p is inside the quad, vertices are in counter-clockwise order. f all the Cross[] values are negative, then point p is inside the quad, vertices are in clockwise order. If values have different signs, then point is outside the quad.
If quad set is the same for many point queries, then dmuir suggests to precalculate uniform line equation for every edge. Uniform line equation is a * x + b * y + c = 0. (a, b) is normal vector to edge. This equation has important property: sign of expression
(a * P.x + b * Y + c) determines semi-plane, where point P lies (as for crossproducts)
2: Split quad to 2 triangles and use vector method for each: express CheckedPoint vector in terms of basis vectors.
P = a*V1+b*V2
point is inside when a,b>=0 and their sum <=1
Both methods require about 10-15 additions, 6-10 multiplications and 2-7 comparisons (I don't consider floating point error compensation)
If you could afford to store, with each quad, the equation of each of its edges then you could save a little time over MBo's answer.
For example if you have an inward pointing normal vector N for each edge of the quad, and a constant d (which is N.p for one of the vertcies p on the edge) then a point x is in the quad if and only if N.x >= d for each edge. So thats 2 multiplications, one addition and one comparison per edge, and you'll need to perform up to 4 tests per point.This technique works for any convex polygon.

Search optimization problem

Suppose you have a list of 2D points with an orientation assigned to them. Let the set S be defined as:
S={ (x,y,a) | (x,y) is a 2D point, a is an orientation (an angle) }.
Given an element s of S, we will indicate with s_p the point part and with s_a the angle part. I would like to know if there exist an efficient data structure such that, given a query point q, is able to return all the elements s in S such that
(dist(q_p, s_p) < threshold_1) AND (angle_diff(q_a, s_a) < threshold_2) (1)
where dist(p1,p2), with p1,p2 2D points, is the euclidean distance, and angle_diff(a1,a2), with a1,a2 angles, is the difference between angles (taken to be the smallest one). The data structure should be efficient w.r.t. insertion/deletion of elements and the search as defined above. The number of vectors can grow up to 10.000 and more, but take this with a grain of salt.
Now suppose to change the above requirement: instead of using the condition (1), let's request all the elements of S such that, given a distance function d, we want all elements of S such that d(q,s) < threshold. If i remember well, this last setup is called range-search. I don't know if the first case can be transformed in the second.
For the distance search I believe the accepted best method is a Binary Space Partition tree. This can be stored as a series of bits. Each two bits (for a 2D tree) or three bits (for a 3D tree) subdivides the space one more level, increasing resolution.
Using a BSP, locating a set of objects to compare distances with is pretty easy. Just find the smallest set of squares or cubes which contain the edges of your distance box.
For the angle, I don't know of anything. I suppose that you could store each object in a second list or tree sorted by its angle. Then you would find every object at the proper distance using the BSP, every object at the proper angles using the angle tree, then do a set intersection.
You have effectively described a "three dimensional cyclindrical space", ie. a space that is locally three dimensional but where one dimension is topologically cyclic. In other words, it is locally flat and may be modeled as the boundary of a four-dimensional object C4 in (x, y, z, w) defined by
z^2 + w^2 = 1
where
a = arctan(w/z)
With this model, the space defined by your constraints is a 2-dimensional cylinder wrapped "lengthwise" around a cross section wedge, where the wedge wraps around the 4-d cylindrical space with an angle of 2 * threshold_2. This can be modeled using a "modified k-d tree" approach (modified 3-d tree), where the data structure is not a tree but actually a graph (it has cycles). You can still partition this space into cells with hyperplane separation, but traveling along the curve defined by (z, w) in the positive direction may encounter a point encountered in the negative direction. The tree should be modified to actually lead to these nodes from both directions, so that the edges are bidirectional (in the z-w curve direction - the others are obviously still unidirectional).
These cycles do not change the effectiveness of the data structure in locating nearby points or allowing your constraint search. In fact, for the most part, those algorithms are only slightly modified (the simplest approach being to hold a visited node data structure to prevent cycles in the search - you test the next neighbors about to be searched).
This will work especially well for your criteria, since the region you define is effectively bounded by these axis-defined hyperplane-bounded cells of a k-d tree, and so the search termination will leave a region on average populated around pi / 4 percent of the area.