PDF transformation matrix has a scaling of 50 units


I'm trying to highlight some text with a glyph width of 1000 (which corresponds to 1 unit of text space) and a font size of 1; the transformation matrix is [50 0 0 50 0 0]. By my calculation, the result should be text that is far too big. But this is not the case: the text that is displayed is not big at all; it's a normal size.
Any PDF reader I open the file with has no problems highlighting the word, which means that I'm missing something somewhere.
Currently I'm checking for the default font and the font array in the fonts dictionary, the font size, and the transformation matrix. Is there any other way to scale text in a PDF besides the ones I just mentioned?

This answer combines the comments to the original question:
Currently I'm checking for the default font and the font array in the fonts dictionary, the font size, and the transformation matrix. Is there any other way to scale text in a PDF besides the ones I just mentioned?
A few possibilities come to mind immediately:
A new transformation matrix (argument to cm) does not replace the old one; instead, the old one is multiplied by it (from the left).
With q ... Q you have to account for the current transformation matrix being reset to its saved value.
(The current transformation matrix, line widths, colors, overprint settings, and much, much more are part of the graphics state. To get an impression, have a look at the entries in tables 57 and 58 of the PDF specification ISO 32000-1. At least all the properties described there are part of the graphics state and, therefore, saved during q and restored during Q.)
Furthermore there is the text matrix to consider.
Finally the UserUnit entry of the page might change the rules.
So there's more to look at than the text positioning operators.
For a good overview, have a look at section 9.4.4 Text Space Details of the PDF specification, especially Note 2 therein. (Thanks to @plinth.)
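The way cm, q and Q interact can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the operator list below is hypothetical (it is not taken from the asker's file), and the names multiply, track_ctm and ops are made up for this example. It shows how a later cm can cancel an earlier scaling of 50, and how Q restores the saved matrix.

```python
IDENTITY = (1, 0, 0, 1, 0, 0)


def multiply(m, n):
    """Return m x n for PDF matrices [a b c d e f] (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def track_ctm(operations):
    """Replay (operator, operands) pairs and return the CTM after each one."""
    ctm = IDENTITY
    stack = []
    history = []
    for operator, operands in operations:
        if operator == "q":
            stack.append(ctm)              # save the graphics state
        elif operator == "Q":
            ctm = stack.pop()              # restore the saved graphics state
        elif operator == "cm":
            ctm = multiply(operands, ctm)  # new matrix is multiplied from the left
        history.append(ctm)
    return history


# Hypothetical content stream: scale by 50, then scale by 0.02 and translate.
ops = [
    ("q", ()),
    ("cm", (50, 0, 0, 50, 0, 0)),
    ("cm", (0.02, 0, 0, 0.02, 10, 10)),
    ("Q", ()),
]

for (operator, _), ctm in zip(ops, track_ctm(ops)):
    print(operator, ctm)
```

In this made-up stream the second cm brings the effective scale back to about 1, so text drawn after it would not appear 50 times larger even though a scaling of 50 is present earlier in the stream.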

Related

How to model fixed width columns for 2d bin/strip packing problem?

I want to create a mathematical model for a 2D bin packing optimization problem. I am not quite sure whether it is a bin packing problem or whether it should be called strip packing; either way, let me introduce the problem.
1- There are several groups of boxes to be placed on strips (see item 3).
2- Each group contains a number of boxes which have the same width and the same height. For example,
group A
100 boxes with width = 80cm and height = 120cm
group B
250 boxes with width = 150cm and height = 200cm
etc.
3- There is an unlimited number of equal-sized strips with a fixed width and height, for example
an infinite number of strips with Width = 800cm and Height = 1400cm
4- The main goal is to pack these boxes into the minimum number of strips. However, there are some restrictions.
5- If we think of a strip as a 2D plane of rows and columns, all boxes in a column must have the same fixed width. For example, if (column 0, row 0) holds a box with w=100, h=80, then (column 0, row 1) must also hold a box with w=100, h=80. Boxes of different sizes may not share a column. This rule does not apply to rows: each row can contain boxes of different sizes.
6- It is not important to fill the whole strip. We want to fill strips with minimum space between boxes. The highest column defines a stop line across the other columns, and we calculate the loss value (the ratio of empty space to the whole strip area).
I tried to implement this optimization problem with the GLPK linear programming tool. I used a mathematical model from the paper (C. Blum, V. Schmid, Solving the 2D bin packing problem by means of a hybrid evolutionary algorithm).
This model works great in GLPK. However, it is designed for packing boxes at arbitrary x,y coordinates. As described in item 5, we want them arranged in fixed-width columns.
Can you please help me modify the mathematical model so that it implements item 5?
Thank you all,

A PDF with different outputs in different PDF viewers (with shades)

Consider the following PostScript file
[1 0 0.5 0.866 150 550] concat
<<
/ShadingType 2
/Coords [ 0 0 100 100]
/BBox [ 0 0 100 100]
/ColorSpace [ /DeviceRGB ]
/Function
<<
/FunctionType 0
/Domain [0 1]
/Range [0 1 0 1 0 1]
/BitsPerSample 8
/Size [2]
/DataSource <FFA0A0FFE0E0>
>>
/Extend [false false]
>>
shfill
Suppose we convert that file to PDF with Ghostscript (ps2pdf) or Adobe Distiller.
The resulting PDF does not render the same way in different PDF viewers:
In Adobe Reader or Firefox (which uses PDF.js), we have a parallelogram (not a rectangle).
In SumatraPDF (which uses MuPDF) and Chrome (which uses PDFium), we have a rectangle.
Who is right?

In my opinion Adobe Acrobat is right but the specification could be read differently, too.
Your PDF contains the following content stream:
/GS1 gs
q
1 0 .5 .866 150 550 cm
/Sh1 sh
Q
I.e. first the current transformation matrix is changed, it is sheared and squished a bit, and then the shading Sh1 is painted. That shading in turn is defined as
<</BBox[0 0 100 100]/ColorSpace/DeviceRGB/Coords[0 0 100 100]/Function 15 0 R/ShadingType 2>>
I.e. with a 100×100 square bounding box (interpreted as a temporary additional clipping path) and an axial shading along its (0, 0) to (100, 100) diagonal, matching your PostScript definition.
The shading operator sh is specified as
Operands
Operator
Description
name
sh
(PDF 1.3) Paint the shape and colour shading described by a shading dictionary, subject to the current clipping path. The current colour in the graphics state is neither used nor altered. The effect is different from that of painting a path using a shading pattern as the current colour. name is the name of a shading dictionary resource in the Shading subdictionary of the current resource dictionary (see 7.8.3, "Resource dictionaries"). All coordinates in the shading dictionary are interpreted relative to the current user space. (By contrast, when a shading dictionary is used in a Type 2 pattern, the coordinates are expressed in pattern space.) All colours are interpreted in the colour space identified by the shading dictionary’s ColorSpace entry (see "Table 77 — Entries common to all shading dictionaries"). The Background entry, if present, is ignored.This operator should be applied only to bounded or geometrically defined shadings. If applied to an unbounded shading, it paints the shading’s gradient fill across the entire clipping region, which may be time-consuming.
(ISO 32000-2:2017, Table 76 — Shading operator)
In particular: All coordinates in the shading dictionary are interpreted relative to the current user space.
Thus, the square bounding box / temporary clip path is squished and sheared by the current transformation matrix into a non-rectangular parallelogram, as can be seen in Adobe Acrobat.
I mentioned above that the specification can be read differently, too: if one considers the BBox entry as the coordinates of two points, the lower left corner and the upper right corner of the box, and applies the transformation before making the result a box, one gets a squished, elongated rectangle, as can be seen in Chrome.
But the BBox here is specified as an array of four numbers giving the left, bottom, right, and top coordinates, respectively, of the shading’s bounding box (ibidem, Table 77 — Entries common to all shading dictionaries) and not as the coordinates of two endpoints of a diagonal. Thus, I'd favor the first interpretation, which is also the one implemented by Adobe.
I don't have a copy of ISO 32000-2:2020 yet, so maybe this has been clarified one way or the other.
The situation would be different if the shading had been used in a pattern serving as the current color during a fill instruction. In that case the specification says:
A pattern’s appearance is described with respect to its own internal coordinate system. Every pattern has a pattern matrix, a transformation matrix that maps the pattern’s internal coordinate system to the default coordinate system of the pattern’s parent content stream (the content stream in which the pattern is defined as a resource). The concatenation of the pattern matrix with that of the parent content stream establishes the pattern coordinate space, within which all graphics objects in the pattern shall be interpreted.
(ISO 32000-2:2017, Section 8.7.2 — General properties of patterns)
In this case the square bounding box with the diagonal axial shading would not have been subject to the current transformation matrix.
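The two readings of BBox discussed in this answer can be compared numerically. The following Python sketch uses the matrix and BBox from the question; the function names are made up for illustration, and the code only computes corner coordinates, it does not reproduce any viewer's actual implementation.

```python
CTM = (1, 0, 0.5, 0.866, 150, 550)  # from "1 0 .5 .866 150 550 cm"
BBOX = (0, 0, 100, 100)             # left, bottom, right, top


def apply(matrix, point):
    """Transform a point with a PDF matrix [a b c d e f]."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def bbox_as_four_edges(matrix, bbox):
    """Reading 1: build the box first, then transform all four corners."""
    left, bottom, right, top = bbox
    corners = [(left, bottom), (right, bottom), (right, top), (left, top)]
    return [apply(matrix, corner) for corner in corners]


def bbox_as_two_points(matrix, bbox):
    """Reading 2: transform two diagonal points, then build a box from them."""
    left, bottom, right, top = bbox
    x0, y0 = apply(matrix, (left, bottom))
    x1, y1 = apply(matrix, (right, top))
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def rounded(points):
    return [(round(x, 1), round(y, 1)) for x, y in points]


print("four edges:", rounded(bbox_as_four_edges(CTM, BBOX)))
print("two points:", rounded(bbox_as_two_points(CTM, BBOX)))
```

Under the first reading the top edge is shifted 50 units to the right relative to the bottom edge, which gives the parallelogram. Under the second reading the left and right edges stay vertical, which gives the wider rectangle.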

PDF glyph widths for composite (Type0) fonts

In what unit/space are CIDfont widths defined?
I am trying to get device space coordinates for glyphs in a document (or, equivalently in my case, in default user space), but I'm having trouble with glyph displacement for composite fonts.
The ISO spec (8.7.1 on CIDFonts) says that DW is defined in user units.
This seems like a weird choice, given that other font types (except Type3) have widths defined in glyph space, but it would make sense then that the widths in W are also defined in user units.
This doesn't appear to be the case though. When calculating glyph displacements (as described in 9.4.4 Text Space Details), multiplying the widths with the inverse text matrix, to convert them to text space, does not appear to give me the right results. The document I'm working on uses default user space (does not define a CTM or set the UserUnit), so by my understanding, that should have worked.
Where am I wrong?

The ISO spec (8.7.1 on CIDFonts) says that DW is defined in user units.
I assume you found that mention of "user units" in ISO 32000-1, section 9.7.4.1 (subsection "General" of section "CIDFonts"):
DW | integer | (Optional) The default width for glyphs in the CIDFont (see 9.7.4.3, "Glyph Metrics in CIDFonts"). Default value: 1000 (defined in user units).
(ISO 32000-1, Table 117 – Entries in a CIDFont dictionary)
Indeed, the "(defined in user units)" here is quite misleading, so it has been removed in ISO 32000-2 where the corresponding entry only says
DW | number | (Optional) The default width for glyphs in the CIDFont (see 9.7.4.3, "Glyph metrics in CIDFonts"). Default value: 1000.
(ISO 32000-2, Table 115 — Entries in a CIDFont dictionary)
It also doesn't make any sense to assume font displacement numbers to be given in user space units as the displacement must respect current states like the text matrix, the horizontal scaling, and the font size, and therefore, cannot be a fixed dimension in user space.
Instead, we're actually in just the same situation with CID fonts as with other fonts: the displacements are given in glyph space and are transformed to text space as defined in section 8.3.2.4 ("Other Coordinate Spaces") of both ISO 32000-1 and ISO 32000-2:
Character glyphs in a font shall be defined in glyph space (see 9.2.4, "Glyph Positioning and Metrics"). The transformation from glyph space to text space shall be defined by the font matrix. For most types of fonts, this matrix shall be predefined to map 1000 units of glyph space to 1 unit of text space; for Type 3 fonts, the font matrix shall be given explicitly in the font dictionary (see 9.6.5, "Type 3 Fonts").
Thus, the default value 1000 of the default CIDFont glyph width DW allows for a square 1×1 text space area, and a squarish area indeed is what many CJK glyphs can properly be drawn in, making this default value sensible.
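As a rough sketch of what this means in practice, the horizontal displacement from section 9.4.4 can be computed as follows. The function names and the dictionary of widths are assumptions made for this example (a real W array would have to be parsed into such a mapping first), and the division by 1000 assumes the predefined font matrix of non-Type 3 fonts.

```python
def glyph_width(cid, widths, default_width=1000):
    """Width in glyph space: the W entry for this CID if present, else DW."""
    return widths.get(cid, default_width)


def horizontal_displacement(width, font_size, char_spacing=0.0,
                            word_spacing=0.0, horizontal_scaling=1.0,
                            tj_adjustment=0.0):
    """tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, in text space units.

    width is in glyph space; dividing by 1000 maps it to text space.
    word_spacing should be 0 unless the code is the single-byte code 32.
    horizontal_scaling is Th as a factor (1.0 means 100%).
    """
    w0 = width / 1000.0
    return ((w0 - tj_adjustment / 1000.0) * font_size
            + char_spacing + word_spacing) * horizontal_scaling


# Hypothetical font: CID 1 has an explicit width of 500, everything else uses DW.
widths = {1: 500}

print(horizontal_displacement(glyph_width(1, widths), font_size=12))
print(horizontal_displacement(glyph_width(7, widths), font_size=12))
```

The result is a displacement in text space; it still has to go through the text matrix and the CTM to end up in user or device space, so there is no need to apply an inverse text matrix to the widths.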

Finding the length of a line through every pixel

I have a raster image with multiple polyline feature classes over it. The lines are not overlapping but they are in multiple different orientations. For every pixel in the raster, I want to calculate the length of the line through that pixel so that the result would be a raster with cells assigned a float value of zero to 2^0.5 times the cell size. What's the best way to do this? I'm using ArcPro with an advanced license.

You can have a look at the answers to a similar question here (using R --- but you have a license for that too)
https://gis.stackexchange.com/questions/119993/convert-line-shapefile-to-raster-value-total-length-of-lines-within-cell/120175
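If you prefer to stay in Python, the underlying computation can be sketched without any GIS library. The following is a minimal brute-force illustration, not an ArcGIS Pro workflow: it assumes the polylines have already been split into straight segments given as coordinate pairs, that row 0 is the bottom row of the grid, and the function names are made up for this example. Each segment is clipped against each cell with the Liang-Barsky algorithm and the clipped lengths are summed.

```python
import math


def clipped_length(p0, p1, xmin, ymin, xmax, ymax):
    """Length of the part of segment p0-p1 inside an axis-aligned box."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0  # parameter range of the segment still inside the box
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return 0.0  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:
            t0 = max(t0, t)  # entering the box
        else:
            t1 = min(t1, t)  # leaving the box
        if t0 > t1:
            return 0.0
    return (t1 - t0) * math.hypot(dx, dy)


def length_raster(segments, n_rows, n_cols, cell_size, origin=(0.0, 0.0)):
    """Sum the clipped segment lengths per cell; row 0 is the bottom row."""
    ox, oy = origin
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for p0, p1 in segments:
        for row in range(n_rows):
            for col in range(n_cols):
                xmin = ox + col * cell_size
                ymin = oy + row * cell_size
                grid[row][col] += clipped_length(
                    p0, p1, xmin, ymin, xmin + cell_size, ymin + cell_size)
    return grid


grid = length_raster([((0.0, 0.0), (2.0, 2.0))], n_rows=2, n_cols=2, cell_size=1.0)
for row in grid:
    print([round(value, 3) for value in row])
```

This tests every segment against every cell, so for a large raster you would want to restrict each segment to the cells covered by its bounding box.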

Questions about Glyph sizes

I read in the PDF Specification that a glyph's actual size depends on the Tm, CTM, and other text state operators. Can anyone explain why?
Say that I have the values of Tm, CTM, and other text state operators (if they are applicable), how will I use them to get the glyph's size and position?

The CTM and TRM together define a box in which the glyph is displayed. But you would also need to consider things like the bounding box of the glyph and the FontBox.
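To make this more concrete, here is a small Python sketch of the text rendering matrix from section 9.4.4 of the PDF specification, Trm = [Tfs×Th 0 0 Tfs 0 Trise] × Tm × CTM, and of how it maps a glyph box. The function names are made up for this example, the glyph height of 1000 glyph-space units is an assumption (a real implementation would take it from the font), and the division by 1000 assumes the predefined font matrix of non-Type 3 fonts.

```python
IDENTITY = (1, 0, 0, 1, 0, 0)


def multiply(m, n):
    """Return m x n for PDF matrices [a b c d e f] (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def apply(matrix, point):
    """Transform a point with a PDF matrix [a b c d e f]."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def text_rendering_matrix(font_size, tm, ctm, horizontal_scaling=1.0, rise=0.0):
    """Trm = [Tfs*Th 0 0 Tfs 0 Trise] x Tm x CTM."""
    params = (font_size * horizontal_scaling, 0, 0, font_size, 0, rise)
    return multiply(multiply(params, tm), ctm)


def glyph_box(width, height, trm):
    """Corners of a glyph-space box (at the glyph origin) after applying Trm."""
    w, h = width / 1000.0, height / 1000.0  # glyph space to text space
    return [apply(trm, p) for p in [(0, 0), (w, 0), (w, h), (0, h)]]


# Font size 1, identity text matrix, CTM scaling by 50.
trm = text_rendering_matrix(1, IDENTITY, (50, 0, 0, 50, 0, 0))
print(glyph_box(1000, 1000, trm))
```

With these values a glyph of width 1000 ends up 50 units wide, which is why the size of a glyph cannot be read from the font size alone: Tm and the CTM both contribute to the final scale and position.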
