Clipping path seems to be outside of text - pdf

Recently I wanted to construct a PDF document with text clipping: with 4 Tr I tried to define the text as the clipping area. But when I tried to fill the lower part of the text with red, the result was reversed.
Does anyone know why?
Thanks for any answer!
stream
BT
4 8 Td
0.8 0.2 0.7 rg % Write in purple.
4 Tr % Fill & Use text as clipping area.
/TR 32 Tf
(Hallo Welt) Tj
1 0 0 rg % Fill in red.
0 0 200 20 re F % <- Mistake?
ET
What I wanted to have:
What I got:

Have a look at the specification ISO 32000-1:
The behaviour of the clipping modes requires further explanation. Glyph outlines shall begin accumulating if a BT operator is executed while the text rendering mode is set to a clipping mode or if it is set to a clipping mode within a text object. Glyphs shall accumulate until the text object is ended by an ET operator; the text rendering mode shall not be changed back to a nonclipping mode before that point.
(section 9.3.6, Text Rendering Mode)
In your sample you don't wait until the ET for the clipping path to take effect. So when you paint the red rectangle, your text clipping path is not yet in effect.
Furthermore, your operation sequence is actually invalid! Neither path construction nor path painting operators (i.e. neither your 0 0 200 20 re nor your F) are allowed inside a text object; cf. Figure 9 – Graphics Objects in the specification.
Thus, strictly speaking, a PDF viewer would be justified in refusing to draw your content stream at all.
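A reordered stream along the lines of the explanation above might be assembled as in the following minimal sketch. The helper name clipped_fill_stream is hypothetical, the font resource name /TR and the coordinates are taken from the question, and the text is assumed to contain no parentheses or backslashes. The rectangle is painted only after ET, once the text clipping path is in effect, and q/Q confine the clip to this piece of content.

```python
def clipped_fill_stream(text="Hallo Welt", font="/TR", size=32):
    """Return content stream operators that fill the text, then paint a
    red rectangle that is clipped to the glyph outlines.

    Assumption: `text` contains no parentheses or backslashes.
    """
    ops = [
        "q",                    # save graphics state so the clip can be undone
        "BT",
        "4 8 Td",
        "0.8 0.2 0.7 rg",       # purple fill for the text itself
        "4 Tr",                 # fill text and add it to the clipping path
        f"{font} {size} Tf",
        f"({text}) Tj",
        "ET",                   # the accumulated clipping path takes effect here
        "1 0 0 rg",             # red fill
        "0 0 200 20 re",        # path construction, now outside the text object
        "f",                    # fill; only the part inside the glyphs is visible
        "Q",                    # restore graphics state, dropping the text clip
    ]
    return "\n".join(ops) + "\n"


print(clipped_fill_stream())
```

If the purple fill is not wanted, text rendering mode 7 (add to the clipping path only) could be used instead of 4.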

Related

How to get a uniform line width in PDF regardless of the device space aspect ratio?

The width of a line in PDF is defined in terms of distances in the user space. In my use case, the aspect ratio of the device space (e.g. 4:3) is different from the aspect ratio of the user space (e.g. 1:1), which causes the line widths in the device space to be different in vertical and horizontal directions.
For example, in this picture the horizontal and vertical lines should be of the same width, but they're not:
I would like to perform scaling that only results in line width uniformity and does not affect anything else.
I asked a similar question regarding PostScript here: How to ensure line widths are the same vertically and horizontally in PostScript?. A solution based in part on the answer to this question works for PostScript, but does not work in PDF after what seems to be an almost one-to-one translation.
I tried changing the stroke command S to q 1 0 0 1.5 0 0 cm S Q h, where q saves the graphics state, 1 0 0 1.5 0 0 cm scales the current transformation matrix, Q restores the graphics state, and h closes the current subpath. However, in addition to correctly scaling the line widths, this also scales the y-coordinates of the line endpoints by 1.5.
This is what I need to get:
But with q 1 0 0 1.5 0 0 cm S Q h, I get this instead:
How to make the line width uniform in the device space in PDF without affecting anything else?

How do Text Objects in PDF work?

I have a PDF document from which I would like to remove watermarks as automatically as possible, to get better results from pdftotext.
After uncompressing it with pdftk I see the watermark almost in plain text:
BT
1 0 0 1 277.40012 755.2005 Tm
0.501961 0.501961 0.501961 rg /R1 gs /R2 8 Tf
[()]TJ
0 0 Td
[(Abc)30(defghi K)30(lm)-40(no)]TJ
-5.423981 -9.600038 Td
[()]TJ
0 0 Td
[(Apr 01, 2017 12:34)]TJ
ET
The watermark is
Abcdefghi Klmno
Apr 01, 2017 12:34
After skimming through Document management — Portable document format (especially page 248f), I found the following:
BT: Begin Text
Tm: Text matrix - what is that?
x y Td: Move to the start of the next line with an offset of (x, y)
TJ: Text showing
Tf: Text state
ET: End Text
What I don't understand is all the numbers and why
[(Abc)30(defghi K)30(lm)-40(no)]TJ
Does it increase the space between Abc and defghi K and decrease the space between lm and no (seems so, looking at Figure 46 on page 259)? By what unit?
What does Tf do?
Could somebody please explain that?
What I don't understand is all the numbers and why
[(Abc)30(defghi K)30(lm)-40(no)]TJ
Does it increase the space between Abc and defghi K and decrease the space between lm and no (seems so, looking at Figure 46 on page 259)?
Nearly: it is the other way round. A positive value decreases the space and a negative value increases it; cf. Table 109 – Text-showing operators in the PDF specification:
array TJ:
Show one or more text strings, allowing individual glyph positioning. Each element of array shall be either a string or a number. If the element is a string, this operator shall show the string. If it is a number, the operator shall adjust the text position by that amount; that is, it shall translate the text matrix, Tm. The number shall be expressed in thousandths of a unit of text space (see 9.4.4, "Text Space Details"). This amount shall be subtracted from the current horizontal or vertical coordinate, depending on the writing mode. In the default coordinate system, a positive adjustment has the effect of moving the next glyph painted either to the left or down by the given amount. Figure 46 shows an example of the effect of passing offsets to TJ.
The figure is misleading; apparently some typesetting program scrambled the effect the author wanted to show. The actual source of the figure looks like this:
BT
/T1_2 1 Tf
0 Tc 8.7503 0 0 8.7503 118.989 450.2115 Tm
[([ \()11(A)53(W)57(A)79(Y again\) ] )41(T)43(J)]TJ
40.0016 0 0 40.0015 296.9949 440.2111 Tm
[(A)53(W)57(A)79(Y again)]TJ
8.7503 0 0 8.7503 118.989 403.2097 Tm
[([ \()11(A)9(\) 120 \()-50(W)-55(\) 120 \()11(A)9(\) 95 \()-41(Y again\) ] )41(T)43(J)]TJ
40.0016 0 0 40.0015 296.9949 392.2093 Tm
(AWAY again)Tj
ET
By what unit?
Thousandths of a unit of text space; cf. the quote above.
Text space is the coordinate system in which text is shown. It shall be defined by the text matrix, Tm, and the text state parameters Tfs, Th, and Trise, which together shall determine the transformation from text space to user space.
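A thousandth of a text space unit often coincides with a single unit in glyph space, because for most font types 1000 glyph space units make up one text space unit. The adjustment is also scaled by the font size, so at font size 8 a value of 30 shifts the next glyph by 0.24 units.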
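As a rough illustration of the unit, the Tj term of the displacement formula in section 9.4.4, tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, can be isolated as below. The function name tj_shift is hypothetical, and character and word spacing are left out because they do not depend on the TJ number.

```python
def tj_shift(adjustment, font_size, horizontal_scaling=1.0):
    """Horizontal shift caused by one number in a TJ array.

    adjustment is in thousandths of a text space unit; the result is
    negative (to the left) for positive adjustments in horizontal writing.
    horizontal_scaling is Th as a factor (1.0 means 100%).
    """
    return -adjustment / 1000 * font_size * horizontal_scaling


# The numbers from the watermark, shown at font size 8 (/R2 8 Tf).
for adjustment in (30, -40):
    print(adjustment, tj_shift(adjustment, 8))
```

With the identity scaling of the watermark's text matrix, these shifts are in user space units: 30 pulls the next glyph 0.24 units to the left, and -40 pushes it 0.32 units to the right.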
What does Tf do?
According to Table 105 – Text state operators in the PDF specification:
font size Tf:
Set the text font, Tf, to font and the text font size, Tfs, to size. font shall be the name of a font resource in the Font subdictionary of the current resource dictionary; size shall be a number representing a scale factor. There is no initial value for either font or size; they shall be specified explicitly by using Tf before any text is shown.
The only thing I don't understand now is the line
0.501961 0.501961 0.501961 rg /R1 gs /R2 8 Tf
Can you explain that, too?
The instruction
0.501961 0.501961 0.501961 rg
sets the fill color to a medium gray in an RGB color space.
Then
/R1 gs
sets additional graphics state parameters from the ExtGState resource named R1; probably here some transparency effect is defined.
Finally
/R2 8 Tf
sets the font to one defined by the Font resource named R2 and the font size to 8.
Partial answer
Tf
font size Tf
sets the font and the size (see page 244)
gs
dictName gs sets the graphic state:
(PDF 1.2) Set the specified parameters in the graphics state.
dictName shall be the name of a graphics state parameter
dictionary in the ExtGState subdictionary of the current resource
dictionary (see the next sub-clause).
I am not too sure what /R1 means; presumably it is the name of an entry in that ExtGState subdictionary.
rg
1.0 1.0 0.0 rg % Set nonstroking colour to yellow
Hence 0.501961 0.501961 0.501961 rg sets the color to some gray value.
Text matrix
The Text matrix is an affine transformation matrix as explained in this answer.
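Hence
1 0 0 1 0 0 Tm
is the identity matrix and leaves the text position at the origin.
The matrix 1 0 0 1 277.40012 755.2005 Tm places the text origin 277.40012 units to the right of and 755.2005 units above the user space origin. In the default coordinate system the y-axis points up, and Tm replaces the current text matrix rather than adding to it.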
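To see what the six numbers do, a PDF matrix a b c d e f maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The sketch below applies the watermark's text matrix to the text origin; apply_matrix is a hypothetical helper name, and the current transformation matrix is assumed to be the identity.

```python
def apply_matrix(m, x, y):
    """Apply a PDF matrix given as (a, b, c, d, e, f) to the point (x, y)."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


tm = (1, 0, 0, 1, 277.40012, 755.2005)   # operands of the Tm in the watermark
print(apply_matrix(tm, 0, 0))            # where the first glyph's origin lands
print(apply_matrix((1, 0, 0, 1, 0, 0), 3, 4))  # identity leaves points alone
```

Only the last two numbers are non-trivial here, so this Tm is a pure translation with no scaling or rotation.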

PDF Low-Level: Drawing a line in the content object?

I have searched extensively online and I have the PDF specification in which I have looked, yet I still can't figure out how to draw a simple black line on a PDF page from the content object's instructions (stream).
Let's say I just want to draw a 1-pixel thickness (assuming 72 dpi) black line at x 400, y 100-300.
This should in theory be a very simple operation, but the PDF spec goes on and on about all kinds of fancy things and appears to forget to explain how I would go about performing this simple operation.
Please can someone point me in the right direction?
In the PDF specification, have a look at chapter 8 (Graphics) and in there section 8.5, Path Construction and Painting.
To draw a simple straight path, you need a "move to" operation followed by a "line to" operation:
400 100 m
400 300 l
You can then stroke the path using the S operator so your code becomes
400 100 m
400 300 l
S
By default the color is black, so you already have a black line :-) But if you want to make sure, you have to set some parameters in the graphics state first:
0 G
1 w
400 100 m
400 300 l
S
The first line sets the stroking color space to DeviceGray and the gray level to 0 (black). The second line sets the line width of your stroked line to 1 user unit (what this comes out as depends on your current transformation matrix).
You can apply a neat trick if you really want 1 pixel (please don't for production files though!) and that is to set the width to zero:
0 w
This gives you "the thinnest line that can be rendered at device resolution: 1 device pixel wide".
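How wide a given line width comes out on a device can be estimated with a small calculation. This is only a sketch under stated assumptions: the default user space unit of 1/72 inch, a transformation matrix that scales uniformly by ctm_scale, and a hypothetical helper name.

```python
def stroke_width_in_pixels(line_width, dpi, ctm_scale=1.0):
    """Approximate rendered width of a stroke in device pixels.

    Assumes the default user unit (1/72 inch) and a CTM that scales
    both axes uniformly by ctm_scale.
    """
    return line_width * ctm_scale * dpi / 72.0


print(stroke_width_in_pixels(1, 72))    # 1 w on a 72 dpi device
print(stroke_width_in_pixels(1, 300))   # the same 1 w on a 300 dpi device
```

So 1 w is one pixel only at 72 dpi with an unscaled CTM, which is why 0 w is the only width that is one device pixel at every resolution.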

Is it possible to draw strokes for a path after restoring the graphics state in PDF?

I'm drawing lines in PDF that I want to scale in a ratio other than 1:1.
The problem is that I get strokes that look like they have been drawn with a calligraphic pen.
Is it possible somehow in PDF to resize the path, restore the graphics state, and then draw the stroke of the previous path?
This is how I get calligraphic line strokes in PDF:
5 w % width of stroke
q % saves the current graphics state
1 0 0 0.2 0 0 cm % transformation matrix: scale height to 20%
0 10 m % start of line
10 10 l % line to
20 100 l
30 100 l
40 10 l
S % draws stroke
Q % restores graphics state
In HTML5 canvas it's possible to draw the stroke after restoring the graphics state, so that the path is drawn with a line of uniform width.
http://www.html5canvastutorials.com/advanced/html5-canvas-ovals/
In PDF putting S after Q doesn't work.
Is there some way to get the same result in PDF where only the line path gets scaled, not the stroke itself?
Have a look at Figure 9 - Graphics Objects - on page 113 of the PDF specification ISO 32000-1:2008. It illustrates that as soon as you have started constructing a path, the only allowed operators are those for path construction, path clipping, and path painting. Q being a special graphics state operator is only allowed after a path painting operator, e.g. your S.
This also is stated in the example right below the graphic:
The path construction operators m and re signal the beginning
of a path object. Inside the path object, additional path construction
operators are permitted, as are the clipping path operators W and
W*, but not general graphics state operators such as w or J.
A path-painting operator, such as S or f, ends the path object
and returns to the page description level.
Thus in response to "Is there some way to get the same result in PDF where only the line path gets scaled, not the stroke itself?": No, you have to explicitly select a smaller stroke width to compensate for the different scale introduced by the transformation matrix.
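One workaround, which is not part of the answer above, is to leave the transformation matrix alone and apply the scale to the coordinates in the program that writes the stream; the stroke width is then never distorted. A minimal sketch, with hypothetical helper names and the points from the question:

```python
def num(value):
    """Format a number for a content stream without exponent notation."""
    return f"{value:.4f}".rstrip("0").rstrip(".")


def scaled_polyline(points, sx, sy, width):
    """Emit a stroked polyline whose points are pre-scaled by (sx, sy).

    No cm operator is used, so the pen stays circular and `width`
    is the stroke width in every direction.
    """
    ops = [f"{num(width)} w"]
    for i, (x, y) in enumerate(points):
        op = "m" if i == 0 else "l"      # first point starts the path
        ops.append(f"{num(x * sx)} {num(y * sy)} {op}")
    ops.append("S")
    return "\n".join(ops)


points = [(0, 10), (10, 10), (20, 100), (30, 100), (40, 10)]
print(scaled_polyline(points, 1, 0.2, 5))
```

This trades the convenience of cm for uniform strokes; the path geometry ends up the same as with 1 0 0 0.2 0 0 cm.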

PDF Low-Level: Adding text as an invisible layer with each letter in specific position

I'm writing a PDF file directly from code, it's all working nicely, but I don't know how to add text into the content object of a page with each letter at a specific position.
I have the coordinates of each letter, something like this:
x0 y0 x1 y1
a = 345,200,350,210
n = 352,201,360,209
d = 365,200,371,212
I want to be able to put this onto the PDF page as an invisible layer so it can be searched or selected, but with each letter in the exact correct coordinates.
Alternatively I could do it with only the coordinates for each word, if this is better.
What is the format for writing this into the content object?
Thank you very much for your help!
There are many ways of doing this. You'll need to use a text object:
BT
%..you need to set a font...
/f1 10 Tf
%..you need to set the text matrix to include Tx and Ty (if not already done)..
1 0 0 1 345 200 Tm
(a) Tj % or (and) Tj to display the word in one go (position of chars depends on font selected)
1 0 0 1 352 201 Tm
(n) Tj
% etc.
ET
You also mentioned that you wanted the text to be invisible. If you are in complete control of the page content, you can set the text stroke and fill colour to be the same as the background colour (which will probably be white):
1 1 1 RG
1 1 1 rg
Otherwise you can paint over the text; it will still be selectable.
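Another option, not mentioned in the answer, is text rendering mode 3 (3 Tr), which neither fills nor strokes the glyphs, so the text is invisible whatever the background colour. The sketch below generates such a text object from per-letter coordinates. The function names are hypothetical, /f1 is the font resource name used in the answer, only the lower-left corner of each letter box is used, and the font is assumed to use a simple single-byte encoding.

```python
def escape_pdf_string(s):
    """Escape backslashes and parentheses for a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text(glyphs, font="/f1", size=10):
    """Build a text object that places each character at its own position.

    glyphs is a list of (char, x, y) tuples; 3 Tr makes the text invisible
    but still searchable and selectable.
    """
    ops = ["BT", f"{font} {size} Tf", "3 Tr"]
    for char, x, y in glyphs:
        ops.append(f"1 0 0 1 {x} {y} Tm")          # absolute position per glyph
        ops.append(f"({escape_pdf_string(char)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


letters = [("a", 345, 200), ("n", 352, 201), ("d", 365, 200)]
print(invisible_text(letters))
```

The rendering mode is part of the graphics state and stays set after ET, so in a real page it is safer to wrap the text object in q and Q or to reset it with 0 Tr.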