How can I get text's height from pdf's transformation matrix? - pdf

I am making a pdf parser and I have a problem when I am trying to read the transformation matrix (Tm) of a text.
For example, when I have a horizontal text, the transformation matrix looks like this:
"71.9871 0 0 73.5 178.668 522.2227 Tm"
which means that the text's height is the d parameter (73.5), the ratio of each character is a/d (71.9871/73.5) and it has to be translated to the point (178.668 522.2227).
If I rotate this text, then the transformation matrix looks like this:
"63.1614 -34.5367 35.2625 64.4888 181.8616 575.8494 Tm"
How can I get the height of the text, which is 73.5?
If I export the same file as an svg file I get this matrix:
"0.8593 0.4699 -0.4798 0.8774 181.8616 266.0405"
and the height of the text is given as 73.5. (I have noticed that if I divide the d parameter of my rotated text by the text's height (73.5) I get the d parameter of the svg matrix (0.8774), but again, how can I know the text's height?)
Thank you.

As already mentioned in a comment, you actually have a multitude of matrices and scalars to deal with, at least the current transformation matrix, the text matrix, the font size, the horizontal scaling, and the page user unit setting. Of course, though, you can combine all these into one matrix.
Thus, let's assume the matrix you have is this combined one.
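As a rough sketch of how such a combined matrix could be built: ISO 32000-1 describes a text rendering matrix made of a small matrix holding font size, horizontal scaling and rise, multiplied by the text matrix and then by the current transformation matrix. The function names below are invented for illustration, and the page user unit is deliberately left out.

```python
def mat_mul(m, n):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def to_3x3(six):
    """Expand a PDF matrix [a, b, c, d, e, f] into its 3x3 form."""
    a, b, c, d, e, f = six
    return [[a, b, 0.0], [c, d, 0.0], [e, f, 1.0]]


def text_rendering_matrix(font_size, h_scale, rise, tm, ctm):
    """Combine text state, text matrix (tm) and CTM into one matrix.

    h_scale is the horizontal scaling as a fraction (Tz value / 100).
    PDF uses row vectors, so the matrices are applied left to right.
    """
    text_state = [[font_size * h_scale, 0.0, 0.0],
                  [0.0, font_size, 0.0],
                  [0.0, rise, 1.0]]
    return mat_mul(mat_mul(text_state, to_3x3(tm)), to_3x3(ctm))


# Font size 1, no extra scaling, identity CTM: the result is just Tm.
combined = text_rendering_matrix(
    1, 1.0, 0, [71.9871, 0, 0, 73.5, 178.668, 522.2227], [1, 0, 0, 1, 0, 0])
print(combined)
```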
To determine the factors by which the font is stretched from its size 1 default state, you could simply apply that matrix to a horizontal and a vertical line segment of length 1, e.g. [0, 0, 1] to [1, 0, 1] and [0, 0, 1] to [0, 1, 1], and then calculate the lengths of the resulting line segments.
PS Doing some minor linear algebra, you will see that for a matrix
a b 0
c d 0
e f 1
this amounts to a horizontal font extent of sqrt(a² + b²) and a vertical font extent (the height) of sqrt(c² + d²).
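A minimal sketch of that calculation, applied to the rotated matrix from the question. The function name `font_extents` is made up here, and the input is assumed to already be the combined matrix.

```python
import math


def font_extents(a, b, c, d):
    """Return (horizontal extent, vertical extent) for a matrix a b c d e f.

    The translation part (e, f) does not influence the extents.
    """
    horizontal = math.hypot(a, b)  # sqrt(a^2 + b^2)
    vertical = math.hypot(c, d)    # sqrt(c^2 + d^2)
    return horizontal, vertical


# Rotated text from the question: 63.1614 -34.5367 35.2625 64.4888 ... Tm
width, height = font_extents(63.1614, -34.5367, 35.2625, 64.4888)
print(round(width, 2), round(height, 2))  # roughly 71.99 and 73.5
```

For the unrotated matrix (71.9871 0 0 73.5) the same function simply returns a and d.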

Related

Convert raw YOLO output to bounding boxes

I'm trying to convert the raw output of my tiny yoloV3-model to bounding box coordinates. My input is a 416x416-image and the raw output has shape [2535, 6], corresponding to [center_x, center_y, width, height, obj score, class prob.] for each box. I want to convert the first four elements of this array into actual pixel coordinates, but I'm not sure how to interpret the values. As an example, most arrays look something like this:
[-4.3124937e+03 -4.5493687e+03 4.7279790e+03 5.0129067e+03 9.9985057e-01 9.9992472e-01]
Why are the values for center_x and center_y negative? And why is width and height so large?

How to get a uniform line width in PDF regardless of the device space aspect ratio?

The width of a line in PDF is defined in terms of distances in the user space. In my use case, the aspect ratio of the device space (e.g. 4:3) is different from the aspect ratio of the user space (e.g. 1:1), which causes the line widths in the device space to be different in vertical and horizontal directions.
For example, in this picture the horizontal and vertical lines should be of the same width, but they're not:
I would like to perform scaling that only results in line width uniformity and does not affect anything else.
I asked a similar question regarding PostScript here: How to ensure line widths are the same vertically and horizontally in PostScript?. A solution based in part on the answer to this question works for PostScript, but does not work in PDF after what seems to be an almost one-to-one translation.
I tried changing the stroke command S to q 1 0 0 1.5 0 0 cm S Q h, where q saves the graphics state, 1 0 0 1.5 0 0 cm scales the current transformation matrix, Q restores the graphics state, and h closes the current subpath. However, in addition to correctly scaling the line widths, this also scales the y-coordinates of the line endpoints by 1.5.
This is what I need to get:
But with q 1 0 0 1.5 0 0 cm S Q h, I get this instead:
How to make the line width uniform in the device space in PDF without affecting anything else?

Calculating the exact positions of (Td, TD, Tm, cm, T*) content stream in pdf?

Getting or calculating the exact positions of (Td, TD, Tm, cm, T*) content stream in pdf?
As a human I am able to work out the positions of the operators in a PDF content stream (whether a value replaces the last Td, is added to the last Td, or is multiplied by the font size) by comparing where the glyphs are located in the PDF with the position values in the content stream. But I am unable to calculate the exact glyph positions programmatically. Please see the screenshot.
In the image above, the box on the left shows the glyphs as rendered in the PDF UI and the box on the right contains the related content stream. In the content stream I highlighted two Td positions.
In first circle
3.321 -6.475999832 Td
The Td values should be added to the last Td position; call it (x1, y1).
Current_x_pos = x1+3.321
Current_y_pos = y1-6.475999832
then we can get the exact position of glyph "t".
In the second highlighted circle the new Td values (231.544 366.377990 Td) completely replace the position, like
Current_x_pos = 231.544
Current_y_pos = 366.377990
In addition, sometimes the preceding operator is Tm; in that case the formula might look like this:
Current_x_pos = x1+(tdx1*font_size)
Current_y_pos = y1+(tdy1*font_size)
When do we need to multiply as above, and when do we just add? How can I know this programmatically, in order to parse the exact positions? (A new screenshot was added for the multiplication case.)
Any help ?
Thanks.
When do we need to multiply as above, and when do we just add? How can I know this programmatically, in order to parse the exact positions?
It's quite simple: for a Td operation you always multiply. The specification ISO 32000-1 (similarly ISO 32000-2) defines tx ty Td as the assignment Tm = Tlm = T × Tlm, where T is the translation matrix
1 0 0
0 1 0
tx ty 1
For a freshly initialized (i.e. identity) text line matrix Tlm this matrix multiplication looks like replacing its bottom row with tx ty 1.
For a text line matrix Tlm with only changes in the bottom row against an identity this matrix multiplication looks like an addition to the bottom row, e.g. x y 1 becomes x+tx y+ty 1.
For a text line matrix Tlm like in your second example
a 0 0
0 a 0
x y 1
this matrix multiplication looks like a multiplication with a followed by an addition to the bottom row, i.e. x y 1 becomes x+a·tx y+a·ty 1. If the font size parameter of the preceding Tf operation was 1, then a would effectively be the resultant font size giving rise to your assumption the font size is part of the formula.
In general, for an arbitrary, non-degenerate text line matrix Tlm
a b 0
c d 0
x y 1
this matrix multiplication looks even more complex, x y 1 becomes x+a·tx+c·ty y+b·tx+d·ty 1.
Thus, concerning your question
How can I know this programmatically, in order to parse the exact positions?
your program should simply always use matrix multiplication and ignore what it looks like on the level of the separate coordinates.
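A minimal sketch of that advice, assuming 3x3 matrices stored as nested lists. The names `mat_mul` and `apply_td` are invented for illustration and do not come from any PDF library.

```python
def mat_mul(m, n):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def apply_td(tlm, tx, ty):
    """Return the text line matrix after 'tx ty Td' (also the new text matrix)."""
    translation = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [tx, ty, 1.0]]
    return mat_mul(translation, tlm)


# Right after BT the text line matrix is the identity, so Td looks like a replacement.
print(apply_td(IDENTITY, 231.544, 366.37799)[2])

# With a scaled text line matrix, the same operation looks like "multiply, then add".
scaled = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [100.0, 200.0, 1.0]]
print(apply_td(scaled, 3.321, -6.476)[2])
```

Both calls go through the same code path; only the prior matrix differs.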
What makes the second circled instruction look like a mere replacement, is that the prior text line matrix is the identity matrix. This is not due to the restore-state operation as assumed by François, though, but more simply to the start of text object operation BT.
As the text matrix and the text line matrix are reset at the start of a text object and the graphics state cannot be saved or restored in a text object, the save and restore graphics state operations are not to blame in this case.
(Screen shots are from the ISO 32000-1 copy shared by Adobe.)
When you say:
In second highlighted circle the new Td positions (231.544 366.377990 Td) are completely replaced
Actually, the positions Current_x_pos and Current_y_pos are not replaced. This Td command does exactly what it always does:
Current_x_pos = x1 + 231.544
Current_y_pos = y1 + 366.377990
It is the Q three lines above that restores the previous graphics state, right after the current graphics state was saved with q.

imshow non uniform matrix bin size

I am trying to create an image with imshow, but the bins in my matrix are not equal.
For example the following matrix
C = [[1,2,2],[2,3,2],[3,2,3]]
is for X = [1,4,8] and for Y = [2,4,9]
I know I can just do xticks and yticks, but I want the axes to be to scale. This means that the squares which make up the imshow image will need to be of different sizes.
Is it possible?
This seems like a job for pcolormesh.
From When to use imshow over pcolormesh:
Fundamentally, imshow assumes that all data elements in your array are
to be rendered at the same size, whereas pcolormesh/pcolor associates
elements of the data array with rectangular elements whose size may
vary over the rectangular grid.
pcolormesh plots a matrix as cells and takes the x and y coordinates of the cell edges as arguments, which allows you to draw each cell in a different size.
I assume the X and Y of your example data are meant to be the sizes of the cells, so I converted them into coordinates with:
import numpy as np

xSize=[1,4,8]
ySize=[2,4,9]
x=np.append(0,np.cumsum(xSize)) # gives [ 0 1 5 13]
y=np.append(0,np.cumsum(ySize)) # gives [ 0 2 6 15]
Then, if you want behavior similar to imshow, you need to invert the y axis.
import matplotlib.pyplot as plt
c=np.array([[1,2,2],[2,3,2],[3,2,3]])
plt.pcolormesh(x,-y,c)
plt.show()
Which gives us:

How does PDF line width interact with the CTM in both horizontal and vertical dimensions?

I'm trying to figure out exactly how line width affects a stroked line in PDF, given the current transformation matrix (CTM). Two questions...
First: how do I convert the line width to device space using the CTM? Page 208 in the PDF 1.7 Reference, which describes how to convert points using the CTM, assumes the input data is an (x, y) point. Line width is just a single value, so how do I convert it? Do I create a "dummy" point from it like (lineWidth, lineWidth)?
Second: once I make that calculation, I'll get another (x, y) point. If the CTM has different scaling factors for horizontal vs. vertical, that gives me two different line widths. How are these line widths actually applied? Does the first one (x) get applied only when drawing horizontal lines?
A concrete example for the second question: if I draw/stroke a horizontal line from (0, 0) to (4, 4) with line width (2, 1), what are the coordinates of the bounding box of the resulting rectangle (i.e., the rectangle that contains the line width)?
This is from Page 215 in the Reference, but it doesn't actually explain how the thickness of stroked lines will vary:
The effect produced in device space depends on the current transformation matrix
(CTM) in effect at the time the path is stroked. If the CTM specifies scaling by
different factors in the horizontal and vertical dimensions, the thickness of
stroked lines in device space will vary according to their orientation.
how do I convert the line width to device space using the CTM?
The line width essentially is the line size perpendicular to its direction. Thus, to calculate the width after transformation using the CTM, you choose a planar vector perpendicular to the original line whose length is the line width from the current graphics state, apply the CTM (without translation, i.e. setting e and f to 0) to that vector (embedded in the three dimensional space by setting the third coordinate to 1) and calculate the length of the resulting 2D vector (projecting on the first two coordinates).
E.g. you have a line from (0,0) to (1,4) in current user space coordinates with a width of 1. You have to find a vector perpendicular to it, e.g. (-4,1) by rotating 90° counter clockwise, and scale it to a length of 1, i.e. ( -4/sqrt(17), 1/sqrt(17) ) in that case.
If the CTM is the one from @Tikitu's answer
CTM has a horizontal scaling factor of 2 and a vertical scaling factor of 1
it would be
2 0 0
0 1 0
0 0 1
This matrix would make the line from the example above go from (0,0) to (2,4) and the "width vector" ( -4/sqrt(17), 1/sqrt(17) ) would be transformed to ( -8/sqrt(17), 1/sqrt(17) ) (the CTM already has no translation part) with a length of sqrt(65/17) which is about 1.955. I.e. the width of the resulting line (its size perpendicular to its direction) is nearly 2.
If the original line had instead been (0,0) to (4,1) with width 1, a width vector choice would have been ( -1/sqrt(17), 4/sqrt(17) ). In that case the transformed line would go from (0,0) to (8,1) and the width vector would be transformed to ( -2/sqrt(17), 4/sqrt(17) ) with a length of sqrt(20/17) which is about 1.085. I.e. the width of the resulting line (perpendicular to its direction) is slightly more than 1.
You seem to be interested in the "corners" of the line. For this you have to take start and end of the transformed line and add or subtract half the transformed width vector. In the samples above:
(original line from (0,0) to (1,4)): ( -4/sqrt(17), 1/(2*sqrt(17)) ), ( 4/sqrt(17), -1/(2*sqrt(17)) ), ( 2-4/sqrt(17), 4+1/(2*sqrt(17)) ), ( 2+4/sqrt(17), 4-1/(2*sqrt(17)) );
(original line from (0,0) to (4,1)): ( -1/sqrt(17), 2/sqrt(17) ), ( 1/sqrt(17), -2/sqrt(17) ), ( 8-1/sqrt(17), 1+2/sqrt(17) ), ( 8+1/sqrt(17), 1-2/sqrt(17) ).
Don't forget, though, that PDF lines often are not cut off at the end but instead have some cap. And furthermore remember the special meaning of line width 0.
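The width-vector calculation described in this answer can be sketched as follows. This is only an illustration of the steps as written above; the function name is made up, the CTM is passed as the usual six numbers, and caps, joins and the zero-width case are ignored.

```python
import math


def transformed_line_width(start, end, width, ctm):
    """Length of the user-space width vector after applying the CTM.

    ctm is (a, b, c, d, e, f); e and f are ignored because a vector
    (as opposed to a point) is not affected by translation.
    """
    a, b, c, d, _e, _f = ctm
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    # Rotate the direction 90 degrees counter clockwise and scale it to the line width.
    wx, wy = -dy / length * width, dx / length * width
    # Row vector times matrix, translation left out.
    tx, ty = wx * a + wy * c, wx * b + wy * d
    return math.hypot(tx, ty)


ctm = (2, 0, 0, 1, 0, 0)  # horizontal scaling 2, vertical scaling 1
print(transformed_line_width((0, 0), (1, 4), 1, ctm))  # about 1.955
print(transformed_line_width((0, 0), (4, 1), 1, ctm))  # about 1.085
```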
I don't know anything about PDF internals, but I can make a guess at what that passage might mean, based on knowing a bit about using matrices to represent linear transformations.
If you imagine your stroked line as a rectangle (long and thin, but with a definite width) and apply the CTM to the four corner points, you'll see how the orientation of the line changes its width when the CTM has different horizontal and vertical scaling factors.
If your CTM has a horizontal scaling factor of 2 and a vertical scaling factor of 1, think about lines at various angles:
a horizontal line (a short-but-wide rectangle) gets its length doubled, and its "height" (the width of the line) stays the same;
a vertical line (a tall-and-thin rectangle) gets its width doubled (i.e., the line gets twice as thick), and its length stays the same;
lines at various angles get thicker by different degrees, depending on the angle, because they get stretched horizontally but not vertically, e.g.
the thickness of a line at 45 degrees is measured diagonally (45 degrees the other way), so it gets somewhat thicker (some horizontal stretching), but not twice as thick (the vertical component of the diagonal didn't get bigger). (You can figure out the thickness with two applications of the Pythagorean theorem; it's about 1.58 times greater, or sqrt(5)/sqrt(2).)
If this story is correct, you can't convert line width using the CTM: it is simply different case-by-case, depending on the orientation of the line. What you can convert is the width of a particular line, with a particular orientation, via the trick of thinking of the line as a solid area and running its corners individually through the CTM. (This also means that "the same" line, with the same thickness, will look different as you vary its orientation, if your CTM has different horizontal and vertical scaling factors.)
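The trick of running the corners through the CTM could look like the sketch below. It treats the stroked line as a plain rectangle with butt ends; `stroke_corners` is a hypothetical helper name, not something defined by PDF.

```python
import math


def stroke_corners(start, end, width, ctm):
    """Transform the four corners of a stroked line (as a rectangle) by the CTM."""
    a, b, c, d, e, f = ctm
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    # Half the width, perpendicular to the line direction.
    hx, hy = -dy / length * width / 2, dx / length * width / 2
    user_corners = [
        (start[0] + hx, start[1] + hy),
        (start[0] - hx, start[1] - hy),
        (end[0] + hx, end[1] + hy),
        (end[0] - hx, end[1] - hy),
    ]
    # Points are row vectors [x y 1] multiplied by the CTM.
    return [(x * a + y * c + e, x * b + y * d + f) for x, y in user_corners]


# Horizontal scaling 2, vertical scaling 1, line from (0,0) to (4,1), width 1.
for corner in stroke_corners((0, 0), (4, 1), 1, (2, 0, 0, 1, 0, 0)):
    print(corner)
```

The distance between the first two corners is the thickness of that particular line after transformation, which is why it changes with the line's orientation.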