pdfbox 2.0.2 > How to combine the TextPosition coordinates and Graphics GeneralPath coordinates into the same quadrant - pdf

As a newbie of pdfbox user, I plan to extract data in a table, but tables with special formats, say with merged column headers should be processed with the help of table's borderlines. Therefore, the coordinates of the text and at least the table's horizontal borderlines should be extracted.
In order to extract the text from the table, I used PDFTextStripper to get the list of TextPosition objects; in order to extract the horizontal lines from the same page, I used PDFGraphicsStreamEngine to extract the list of stroked GeneralPath objects, and inside the stroked GeneralPath object, there is the corresponding Rectangle2D object representing the line (height = 0). But it seems that the aforementioned coordinates of TextPosition objects and the coordinates of GeneralPath objects are not in the same quadrant but with different Y-axis ray starting from the same origin.
According to my investigation, the origin of the TextPosition object is the top left corner, whereas the origin of the Rectangle2D is the bottom left corner, and the direction of each of the Y-axis differs from each other.
First, I would like to confirm that my investigation is right. If so I would like to get some hint about how to make the coordinates of Rectangle2D and TextPosition into the same quadrant.
Thanks in advance

Related

Line Profile Diagonal

When you make a line profile of all x-values or all y-values the extraction from each pixel is clear. But when you take a line profile along a diagonal, how does DM choose which pixels to use in the one dimensional readout?
Not really a scripting question, but I'm rather certain that it uses bi-linear interpolation between the grid-points along the drawn line. (And if perpendicular integration is enabled, it does so in an integral.) It's the same interpolation you would get for a "rotate" image.
In fact, you can think of it as a rotate-image (bi-linearly interpolated) with a 'cut-out' afterwards, potentially summed/projected onto the new X-axis.
Here is an example
Assume we have a 5 x 4 image, which gives the grid as shown below.
I'm drawing top-left corners to indicate the coordinates system pixel convention used in DigitalMicrgraph, where
(x/y)=(0/0) is the top-left corner of the image
Now extract a LineProfile from (1/1) to (4/3). I have highlighted the pixels for those coordinates.
Note, that a Line drawn from the corners seems to be shifted by half-a-pixel from what feels 'natural', but that is the consequence of the top-left-corner convention. I think, this is why a LineProfile-Marker is shown shifted compared to f.e. LineAnnotations.
In general, this top-left corner convention makes schematics with 'pixels' seem counter-intuitive. It is easier to think of the image simply as grid with values in points at the given coordinates than as square pixels.
Now the maths.
The exact profile has a length of:
As we can only have profiles with integer channels, we actually extract a LineProfile of length = 4, i.e we round up.
The angle of the profile is given by the arc-tangent of dX and dY.
So to extract the profile, we 'rotate' the grid by that angle - done by bilinear interpolation - and then extract the profile as grid of size 4 x 1:
This means the 'values' in the profile are from the four points:
Which are each bi-linearly interpolated values from four closest points of the original image:
In case the LineProfile is averaged over a certain width W, you do the same thing but:
extract a 2D grid of size L x W centered symmetrically over the line.i.e. the grid is shifted by (W-1)/2 perpendicular to the profile direction.
sum the values along W

How can I find the amount of pixels in part of an image?

I have an image and I want to see how many pixels are in different parts of the image. Is there a software I can use to do this?
In Gimp, the "Histogram" dialog applies to the selection, so the pixel count displayed is the pixels in the selection (weighted by their selection level):
In the image below the selection covers the black circle which has a 100px radius. The Pixels value is close to 100²*Pi (314000 instead of 314159).
The Count is the number of pixels between the two values indicated by the handles at the bottom of the histogram.
Of course the selection can have any shape and be obtained with various tools.
I assume PS has something equivalent.

Fast check if polygon contains point between dataframes

I have two dataframes. One contains a column of Polygons, taken from an image of polygon shapes. Each polygon has a set of coordinates. This dataframe also has a "segment-id" column. I have another dataframe, containing a column of Points, also with coordinates. These Points represent pixels from the same image of Polygon shapes, and therefore have the same coordinate system. I want to give every Point the "segment-id" of the Polygon which contains it. Every Polygon contains at least one Point.
Currently, I achieve this by using a nested for loop:
for i, row in enumerate(point_df.itertuples(), 0):
point = pixel_df.at[i, 'geometry']
for j in range(len(polygon_df)):
polygon = polygon_df.iat[j, 0]
if polygon.contains(point):
pixel_df.at[i, 'segment_id'] = polygon_df.at[j, 'segment_id']
else:
pass
This is extremely slow. For 100 Points, it takes around 10 seconds. I need a faster way of doing this. I have tried using apply but it is still super slow.
Hope someone can help me out, thanks very much.
For fast "is point inside polygon":
Preparation: In the code that obtains the data describing the polygons; using all the vertices, find the minimum and maximum y-coord, and minimum and maximum x-coord; and store that with the polygon's data.
1) Using the point's coords and the polygon's "minimum and maximum x and y" (pre-determined during preparation); do a "bounding box" test. This is just a fast way to find out if the point is definitely not inside the polygon (so you can skip the more expensive steps most of the time).
2) Set a "yes/no" flag to "no"
3) For each edge in the polygon; determine if a horizontal line passing through the point would intersect with the edge, and if it does determine the x-coord of the intersection. If the x-coord of the intersection is less than the point's x-coord, toggle (with NOT) the "yes/no" flag. Ignore "horizontal line passes through a vertex" during this step.
4) For each vertex, compare its y-coord with the point's y-coord. If they're the same you need to look at both edges coming from that vertex to determine if the edge's vertices are in the same y direction. If the edge's vertices are in the same y direction (if the edges form a 'V' shape or upside-down 'V' shape) ignore the vertex. Otherwise (if the edges form a '<' or '>' shape), if the vertex's x-coord is less than the point's x-coord, toggle the "yes/no" flag.
After all this is done; that "yes/no" flag will tell you if the point was in the polygon.

TCPDF - Cropping polygons

I'm using TCPDF::Polygon() to render coastline (land) coordinates from a text file on top of a blue TCPDF::Rect(). The text file contains coastlines for the entire world, however by specifying a center latitude and longitude in the map projection, together with some multiplication to get a 'zooming' effect, I manage to display the desired area within the A4 page.
Problem:
As you can see by the image the coastlines are drawn all the way to the edge of the document (and beyond). Although most of the coastline coordinates from the text file are 'outside' the document's visible area they are still taking up some hundred kilobytes in the output file.
Is there a nice way to 'crop' the coastline-polygon, so that the coastlines fit nicely inside the blue area and the excess vertecies are completely excluded from the document (not taking up file space)?
Solution:
The 'cropping' I was looking for is done using clipping, as suggested by #Rad Lexus:
// Start clipping
$pdf->StartTransform();
// Draw clipping rectangle
$pdf->Rect($DOC_MARG, $DOC_MARG, $MAP_W, $MAP_H, 'CNZ');
// -- Draw all polygons here (land areas) --
// Stop clipping
$pdf->StopTransform();
Source: https://stackoverflow.com/a/9400490/2667737
To save space in the output file I check every pixel in each polygon (land area) and render only the polygons that has one or more pixels within the bounds of the page - also suggested by #Rad. In the example view in my first post, the size was halved using this method.
Thanks for the help!

Calculating collision for a moving circle, without overlapping the boundaries

Let's say I have circle bouncing around inside a rectangular area. At some point this circle will collide with one of the surfaces of the rectangle and reflect back. The usual way I'd do this would be to let the circle overlap that boundary and then reflect the velocity vector. The fact that the circle actually overlaps the boundary isn't usually a problem, nor really noticeable at low velocity. At high velocity it becomes quite clear that the circle is doing something it shouldn't.
What I'd like to do is to programmatically take reflection into account and place the circle at it's proper position before displaying it on the screen. This means that I have to calculate the point where it hits the boundary between it's current position and it's future position -- rather than calculating it's new position and then checking if it has hit the boundary.
This is a little bit more complicated than the usual circle/rectangle collision problem. I have a vague idea of how I should do it -- basically create a bounding rectangle between the current position and the new position, which brings up a slew of problems of it's own (Since the rectangle is rotated according to the direction of the circle's velocity). However, I'm thinking that this is a common problem, and that a common solution already exists.
Is there a common solution to this kind of problem? Perhaps some basic theories which I should look into?
Since you just have a circle and a rectangle, it's actually pretty simple. A circle of radius r bouncing around inside a rectangle of dimensions w, h can be treated the same as a point p at the circle's center, inside a rectangle (w-r), (h-r).
Now position update becomes simple. Given your point at position x, y and a per-frame velocity of dx, dy, the updated position is x+dx, y+dy - except when you cross a boundary. If, say, you end up with x+dx > W (letting W = w-r), then you do the following:
crossover = (x+dx) - W // this is how far "past" the edge your ball went
x = W - crossover // so you bring it back the same amount on the correct side
dx = -dx // and flip the velocity to the opposite direction
And similarly for y. You'll have to set up a similar (reflected) check for the opposite boundaries in each dimension.
At each step, you can calculate the projected/expected position of the circle for the next frame.
If this lies outside the rectangle, then you can then use the distance from the old circle position to the rectangle's edge and the amount "past" the rectangle's edge that the next position lies at (the interpenetration) to linearly interpolate and determine the precise time when the circle "hits" the rectangle edge.
For example, if the circle is 10 pixels away from the rectangle's edge, then is predicted to move to 5 pixels beyond it, you know that for 2/3rds of the timestep (10/15ths) it moves on its orginal path, then is reflected and continues on its new path for the remaining 1/3rd of the timestep (5/15ths). By calculating these two parts of the motion and "adding" the translations together, you can find the correct new position.
(Of course, it gets more complicated if you hit near a corner, as there may be several collisions during the timestep, off different edges. And if you have more than one circle moving, things get a lot more complex. But that's where you can start for the case you've asked about)
Reflection across a rectangular boundary is incredibly simple. Just take the amount that the object passed the boundary and subtract it from the boundary position. If the position without reflecting would be (-0.8,-0.2) for example and the upper left corner is at (0,0), the reflected position would be (0.8,0.2).