How to Calculate Leading in a PDF Document

How do I calculate leading in a PDF document?
For example:
48 0 0 48 72 677.28 Tm
(Hello World) Tj
0 -1.1075 TD
This renders the text Hello World at 48pt/57.6pt (120% line height) in Times-Roman.
According to the PDF Reference manual, "the leading parameter is measured in unscaled text space units. It specifies the vertical distance between the baselines of adjacent lines of text... The number is expressed in thousandths of a unit of text space."
Can someone please explain how 1.1075 and 57.6 are related?

Your PDF commands are incorrect. I suppose you mean:
48 0 0 48 72 677.28 Tm
0 -1.1075 TD
(Hello World) Tj
This code sets up the text coordinate system (the Tm command):
Scale by 48 on x and by 48 on y
Start position (72, 677.28)
TD then moves to the start of the next line, offset by -1.1075 text space units on the y axis, and as a side effect sets the leading to 1.1075. In this example one text space unit is one PDF unit multiplied by 48, as set by the Tm command.
Your PDF code can be simplified. This renders the same:
48 0 0 48 72 624.12 Tm
(Hello World) Tj
Explanation: 677.28 - (1.1075*48) = 624.12
You should always remember that PDF is a language. To calculate the real coordinates you have to parse all the preceding commands.
There may be something like this before your commands:
10 0 0 10 0 0 cm
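As a rough check of the arithmetic, here is a minimal Python sketch of how Tm and TD combine. It relies on the PDF specification's definition of tx ty TD as equivalent to -ty TL tx ty Td. The function names are my own, and any cm matrix in effect is ignored, so treat it as an illustration rather than a full interpreter.

```python
def apply_td(tm, tx, ty):
    """Return the text line matrix after `tx ty Td` (or TD).

    tm is (a, b, c, d, e, f) as passed to Tm. The offset is given in
    text space, so it is transformed by the current matrix.
    """
    a, b, c, d, e, f = tm
    return (a, b, c, d, tx * a + ty * c + e, tx * b + ty * d + f)


def leading_in_user_space(tm, ty):
    """TD sets the leading to -ty text units; scale it by the y scale of Tm."""
    return -ty * tm[3]


tm = (48, 0, 0, 48, 72, 677.28)
moved = apply_td(tm, 0, -1.1075)
print(round(moved[5], 2))                            # 624.12, baseline y after TD
print(round(leading_in_user_space(tm, -1.1075), 2))  # 53.16, leading in points
print(round(1.2 * 48, 2))                            # 57.6, 120% of a 48 pt font
```

On these numbers 1.1075 text units correspond to 53.16 pt, so the 57.6 pt (120%) figure quoted in the question does not seem to follow from the TD operand alone.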

The leading is usually set in the PDF by the command TL, just like this:
12 TL
(El ingenioso hidalgo don Quijote de la Mancha)'
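That 12 sets a leading of 12 points, which stays in effect until another TL is found. The ' operator then moves to the next line using this leading before showing the text.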
I hope it helps you. I think this is the easiest way to do it :)

Related

How to find and provide advance and displacement values of a glyph into pdf content stream

I have to write multilingual text to a PDF using C++. I have the Unicode values as well as the glyph IDs, with their advances and displacements, for the input string.
But I need to know how to position a dependent glyph relative to its independent base glyph.
Given advance and displacement values from FreeType / HarfBuzz, how should I put these values into the PDF content stream along with the glyph IDs?
I have tried the output values of FreeType and HarfBuzz. The individual glyphs print properly, but the positioning of dependent glyphs relative to their base glyph is still wrong, even when I use the advance and displacement values from their output.
I just need the logic for using these output values in the content stream to produce a properly readable word/letter.
Example:
Text = Tamil letter + Hindi letter.
I need to print this output: (image: proper output)
But currently I am only able to print this: (image: improper output)
Tamil combined letter:
வ = U+0BB5 TAMIL LETTER VA = base glyph
ா = U+0BBE TAMIL VOWEL SIGN AA = dependent glyph
HarfBuzz run:
hb-shape.exe -O json -u u+0bb5,u+0bbe --no-glyph-names "C:\\Windows\\Fonts\\Nirmala.ttf"
gid output:
[{"g":2953,"cl":0,"dx":0,"dy":0,"ax":2111,"ay":0},{"g":2959,"cl":0,"dx":0,"dy":0,"ax":1453,"ay":0}]
Hindi combined letter:
म = U+092E DEVANAGARI LETTER MA = base glyph
ि = U+093F DEVANAGARI VOWEL SIGN I = dependent glyph
HarfBuzz run:
hb-shape.exe -O json -u u+092e,u+093f --no-glyph-names "C:\\Windows\\Fonts\\Nirmala.ttf"
gid output:
[{"g":302,"cl":0,"dx":0,"dy":0,"ax":532,"ay":0},{"g":273,"cl":0,"dx":0,"dy":0,"ax":1379,"ay":0}]
I substituted these output values into the formula from the PDF documentation (image: PDF doc formula),
assuming unity for all variables except width and advance,
and obtaining the width value from FreeType.
Glyph Advance values for four glyphs in order:
tx = 1769
tx = 1132
tx = 1586
tx = 1448
I provide these values in the content stream in this order:
<glyph id 1> tx 1 <glyph id 2> tx 2 <glyph id 3> tx 3 <glyph id 4> tx 4
Content stream:
/OC /oc2 BDC q BT /FXF1 1 Tf 70.866142 0.000000 0.000000 70.866142 28.346457 141.732285 Tm[<0B89>-1769<0B8F>-1132<0111>-1586<012E>-1448]TJ ET Q EMC
The PDF documentation says that a positive adjustment value moves the text towards the left.
Is it the other way around?
Or should the difference between the advances be used?
Additional PDF objects:
Font descriptor object, base font object, font object.
I have tried using only the advance values, and also only the computed values.
The only problem is the horizontal and vertical spacing within combined glyphs, which also affects the spacing between subsequent glyphs.
None of these renders the glyphs legibly, at least not in a generalised, programmatic manner.
From reading @mkl's answers in various places on Stack Overflow, I suspect that an individual transformation matrix or Td is needed for each glyph. But is it really that complex?
I would have thought this could be rendered easily.
If an individual transformation matrix or Td is needed, how do I compute the values to supply for them?
Any help and guidance is welcome and much appreciated.
Thank you.
It helps to work out the PDF as plain text, which you can save from Notepad and open directly.
Here I am altering a batch.cmd (work in progress :-) to test that my compiler handles the changes as text, but you can use raw PDF in an editor too. Beware that after cut and paste a value or two may need changing. I also do not yet know how to easily reference non-Latin fonts (the next hurdle after images, which are almost done), so I used the "Symbol" font to illustrate these positioning modifications.
Note that for specific queries @mkl is the expert; I simply program by examples that function, not by the book.
%PDF-1.0
%µ¶µ¶
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj
2 0 obj<</Type/Pages/Count 1/Kids[3 0 R]>>endobj
3 0 obj<</Type/Page/Parent 2 0 R/MediaBox [0 0 594 792]/Resources<</Font<< /F1 4 0 R /F2 5 0 R>>>>/Contents 6 0 R>>endobj
4 0 obj<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>endobj
5 0 obj<</Type/Font/Subtype/Type1/BaseFont/Symbol>>endobj
%Comment the following /Length 0999 is a dummy value it should be altered to equal decimal stream length, but most readers will ignore or work around invalid
6 0 obj<</Length 1326>>
stream
q
BT /F1 20 Tf 072 740 Td (20 units (default units usually = pts) high Headline) Tj ET
BT /F1 16 Tf 036 700 Td (All text is "Body" text. (no heads or tails)) Tj ET
BT /F1 10 Tf 004 780 Td (Text can be any order see "Body" text above. (Printed by Filename="C:\Users\K\Downloads\Programming\CMDaPDF\MAKE2PDF.cmd") spot the escape errors) Tj ET
BT /F1 12 Tf 036 675 Td (Here # 12 units high you must include just enough text for parts of a line. PDF has no page feeds no wrapping,) Tj 0 -20 Td (nor \\new line feed, no ¶aragraphs) Tj 86 -15 Td (nor carriage \r\\return. \n\r ) Tj 100 5 Td ( It is not \007\010\011\012\\tabular, each page is one row of multiple pages,) Tj 50 -15 Td (each page is one text column wide .[ ×] no yes check) Tj 0 -10 Td (each row is one text column wide .[x] no is yes) Tj 0 -10 Td (each row is one text column wide . · bullet point OK) Tj ET
BT +0.50 Tc -1.4 Tw 999 TL /F1 1 Tf 15 001 10. 30 200.000 440.000 Tm [(Jane A)600(usten)] TJ ET
BT +0.50 Tc 0.00 Tw 000 TL /F2 1 Tf 15 000 000 15 200.000 430.000 Tm [(Ja)-1000(ne Austen)] TJ ET
BT -1.20 Tc 0.00 Tw 999 TL /F2 1 Tf 15 000 000 15 200.000 420.000 Tm [(J)-1200(a)800(ne Austen)] TJ ET
BT +0.00 Tc 0.00 Tw 000 TL /F2 1 Tf 15 000 000 15 200.000 410.000 Tm [(Jane A)100(us)-500(ten)] TJ ET
Q
endstream
xref
0 7
0000000000 65535 f
0000000019 00000 n
0000000065 00000 n
0000000117 00000 n
0000000242 00000 n
0000000306 00000 n
0000000527 00000 n
trailer<</Size 7/Root 1 0 R>>
startxref
1903
%%EOF
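For the HarfBuzz part of the question, here is a minimal Python sketch of one way to turn shaper output into a TJ array. It rests on two points from the PDF specification: each glyph shown already advances by the width the PDF font declares for it, and a number in a TJ array is subtracted from the current position in thousandths of a text space unit. The numbers therefore carry only the difference between the declared width and the advance you want, not the advance itself. The function names, the pdf_widths table and the upem value are assumptions for illustration, and vertical offsets (dy) are not handled; those would need Ts or a separate Td.

```python
def fmt(number):
    """Format a number compactly for a content stream."""
    return f"{number:.3f}".rstrip("0").rstrip(".")


def build_tj(glyphs, pdf_widths, upem):
    """Build a TJ operation from HarfBuzz-style glyph records.

    glyphs:     dicts with "g" (glyph id), "ax" (advance) and optional "dx"
                (offset), all in font units.
    pdf_widths: glyph id -> width declared in the PDF font, in 1/1000 units
                (hypothetical table, it must match your font dictionary).
    upem:       units per em of the font (an assumption, read it from the font).
    """
    scale = 1000.0 / upem
    parts = []
    for glyph in glyphs:
        dx = glyph.get("dx", 0) * scale
        ax = glyph["ax"] * scale
        width = pdf_widths[glyph["g"]]
        if abs(dx) >= 0.0005:
            parts.append(fmt(-dx))          # negative number shifts right
        parts.append(f"<{glyph['g']:04X}>")
        # After showing, the pen sits at origin + dx + width, but the
        # shaper wants origin + ax. Positive numbers move the pen left.
        adjust = dx + width - ax
        if abs(adjust) >= 0.0005:
            parts.append(fmt(adjust))
    return "[" + " ".join(parts) + "]TJ"


demo = build_tj(
    [{"g": 1, "ax": 1024, "dx": 0}, {"g": 2, "ax": 2048, "dx": -512}],
    {1: 1000, 2: 1000},
    upem=2048,
)
print(demo)  # [<0001> 500 250 <0002> -250]TJ
```

If the widths in the font dictionary already equal the shaper's advances, the array collapses to plain glyph codes, which may be a simpler route than writing the full advance after every glyph.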

PDF sourcecode of text with tab/tabstop or fixed width trick

I have this s variable string:
ID#9NAME#9VALUE
What should this string look like in PDF?
(ID) Tj (NAME) Tj (VALUE) Tj
I have to convert the s variable to PDF string.
How can I change the #9 character to a working tabstop character?
I could replace each #9 character with seven #20 characters, but that does not work for me, because I and W have different widths.
Is there a trick?
Like horizontal spacing in percent?
(ID) Tj
some code that advances 100 units horizontally
(NAME) Tj
some code that advances 100 units horizontally
(VALUE) Tj
Your #9 seems to be character 9, i.e. the horizontal tab (HT).
That would be 09 in hex (base 16), \011 in octal (base 8), or \t in a literal string.
If it is defined like that in a base font, then you should be able to insert it:
(ID\t\tNAME\t\tVALUE) Tj
or
(ID\011\011NAME\011\011VALUE) Tj
However, as pointed out by @mkl, tabs were traditionally mechanical printer carriage stops that could be set at 4 or 8 characters from the left of the line, or anywhere else the printer operator chose to place indents or columns. In a word processor they are therefore highly variable in number and position, but in a PDF they are usually ignored.
In PDF it is more conventional to place each block of characters at a new x,y position, where y is constant for all text blocks on the same line.
So for a tab-stop approach with tabs one inch apart (based on the default 1 unit = 1/72"), try this:
stream
q
BT
/F1 12 Tf
1 0 0 1 144 720 Tm
(ID) Tj
ET
BT
/F1 12 Tf
1 0 0 1 216 720 Tm
(NAME) Tj
ET
BT
/F1 12 Tf
1 0 0 1 288 720 Tm
(VALUE) Tj
ET
Q
endstream
Remember that in PDF all whitespace is equal, but some is more equal than others.
So here, a search for "id name value" treats the non-existent tabs as a single white space.
Finally, to answer your query: you can set a fixed distance from the start of one text to the start of the next, just like tab stops, using Td. Note that I had intentionally mixed TD and Td to show that it does not matter in such cases :-) (TD additionally sets the leading to -ty, which is 0 here). However, the human-readable convention is to use caps for the object (noun) and lowercase for the "action" (verb), so Td is better for debugging by eye.
This can be written as suggested by @mkl (with me adding a start point):
50 800 Td (ID) Tj 100 0 Td (NAME) Tj 100 0 Td (VALUE) Tj
In comments you asked about adding lines. The simplest way, for line-by-line loop programming, is to use something like this (in this case skipping 780). Contrary to my comment above, BT and ET are usually both caps.
BT 50 800 Td (ID) Tj 100 0 Td (NAME) Tj 100 0 Td (VALUE) Tj ET
BT 50 760 Td (A1) Tj 100 0 Td (Example) Tj 100 0 Td (2000) Tj ET
BT 50 740 Td (B2) Tj 100 0 Td (Another) Tj 100 0 Td (1000) Tj ET
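The line-by-line loop mentioned above could look roughly like this in Python. It is only a sketch: the function names and default values are my own, the rows are spaced evenly (unlike the example, which skips 780), and, as in the fragments above, no font is selected, so a Tf is still needed inside each BT before any text is shown.

```python
def pdf_string(text):
    """Wrap text as a PDF literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def table_rows(rows, x=50, y=800, col_step=100, row_step=20):
    """Return one BT ... ET line per row, with cells placed at fixed column steps."""
    lines = []
    for index, row in enumerate(rows):
        ops = [f"BT {x} {y - index * row_step} Td"]
        for col, cell in enumerate(row):
            if col:
                # Td is relative to the start of the previous cell.
                ops.append(f"{col_step} 0 Td")
            ops.append(f"{pdf_string(cell)} Tj")
        ops.append("ET")
        lines.append(" ".join(ops))
    return lines


s = "ID\tNAME\tVALUE"
for line in table_rows([s.split("\t"), ["A1", "Example", "2000"]]):
    print(line)
```

Splitting on the tab character and emitting one Td per cell keeps the columns aligned regardless of how wide I or W are in the chosen font.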

Rasterizing Paths data from Photoshop file

I was able to read the path data from a Photoshop file (see the Photoshop File Format documentation). The curves are Bezier curves. I want to convert this data into pixel format. How do I do this?
I read the documentation on Adobe's website thoroughly. I separated the data into 26-byte records.
Let's say one of the records is as follows:
0 0 | 0 12 0 0 0 1 0 0 | 0 0 0 0 0 0 0 0 | 0 0 0 0 0 0 0 0
The first two bytes of each record are a selector that indicates what kind of path record it is. 0 0 indicates a closed subpath length record.
The next 8 bytes give the control point for the Bezier segment preceding the knot. This can in turn be split into two components, X and Y.
The first 4 bytes are the vertical component:
0 12 0 0
I converted 12 0 0 to binary and added the bytes together:
00001100 + 00000000 + 00000000 = 00001100
I then converted the result back to decimal, which gave me the Y coordinate.
The leading 0 indicates that the position is in the positive range (signed magnitude form).
The next 8 bytes give the anchor point of the knot, and the last 8 bytes the control point for the Bezier segment leaving the knot. Their X and Y components can be found in a similar manner.
I exported this data to an SVG file and then ran a rasterizer to convert the point data to pixel data.
If someone comes across this post, I hope this helps. :)
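For anyone repeating this, here is a hedged Python sketch of decoding one 26-byte knot record. It follows my reading of Adobe's file-format documentation, which differs from the byte-summing above: each component is a signed 32-bit big-endian fixed-point number with 8 bits before and 24 bits after the binary point, vertical component first, expressed as a fraction of the image height or width. Under the same reading, selectors 1, 2, 4 and 5 mark knot records, while a selector 0 or 3 record carries the number of knots in the subpath and not a point. The function names are my own, so please verify against your own file before relying on it.

```python
import struct

KNOT_SELECTORS = {1, 2, 4, 5}  # knot records, per my reading of the documentation


def fixed_8_24(value):
    """Convert a signed 8.24 fixed-point integer to a float."""
    return value / float(1 << 24)


def decode_knot(record, width, height):
    """Return ((x, y) preceding control, anchor, leaving control) in pixels.

    Returns None for records that are not knot records.
    """
    if len(record) != 26:
        raise ValueError("path records are 26 bytes long")
    selector, *components = struct.unpack(">H6i", record)
    if selector not in KNOT_SELECTORS:
        return None
    points = []
    for i in range(0, 6, 2):
        y = fixed_8_24(components[i]) * height      # vertical component first
        x = fixed_8_24(components[i + 1]) * width
        points.append((x, y))
    return tuple(points)


# Hypothetical record: every point at 25% of the height and 50% of the width.
record = struct.pack(">H6i", 1, 1 << 22, 1 << 23, 1 << 22, 1 << 23, 1 << 22, 1 << 23)
print(decode_knot(record, width=200, height=100))
```

The three points of consecutive knots can then be written out as cubic Bezier segments (for example in an SVG path) and passed to a rasterizer, as described above.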

Text positioning in PDF file

I'm having real difficulty computing the (tx, ty) position of text objects from a parsed PDF stream.
I have the following stream code:
BT
0.75 0.68 0.67 0.902 k
/GS0 gs
/TT0 1 Tf
-0.018 Tc 7.56 0 0 7.56 77.1871 528.3107 Tm
(Text line 1)Tj
-0.019 Tc 0 -1.917 TD
(Text line 2)Tj
-0.017 Tc 0 -2.917 TD
(HEADER)Tj
ET
q
43.167 489.881 7.56 7.56 re
W n
BT
/TT0 1 Tf
0 Tc 7.56 0 0 7.56 43.1671 491.7707 Tm
(INDEX)Tj
ET
When I open this PDF in a PDF reader, the HEADER and INDEX objects appear exactly next to each other, as if they were on the same line.
However, when I calculate HEADER's ty value from the previous Tm (528.3107), I get 491.7657, which is 0.0050 lower than INDEX's ty (491.7707). In other parts of the file, the more lines a paragraph has, the greater this difference becomes.
Basically, what I do is multiply Tm's scaling factor (7.56) by TD's ty deltas. Obviously I am doing it wrong, but there is little documentation on the net for dummies like myself...
So my question is: how do other PDF readers interpret HEADER's and INDEX's ty values as equal, so that they print them at the same ty?
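The calculation described in the question can be sketched in Python as follows. It only tracks the text line matrix through Tm and TD and ignores any cm matrix in effect; the function name is my own.

```python
def text_line_origins(tm, moves):
    """Return the line origin after Tm and after each (tx, ty) TD/Td move.

    tm is (a, b, c, d, e, f); each move is given in text space and is
    therefore transformed by the current matrix.
    """
    a, b, c, d, e, f = tm
    origins = [(e, f)]
    for tx, ty in moves:
        e, f = tx * a + ty * c + e, tx * b + ty * d + f
        origins.append((e, f))
    return origins


tm = (7.56, 0, 0, 7.56, 77.1871, 528.3107)
origins = text_line_origins(tm, [(0, -1.917), (0, -2.917)])
header_y = origins[-1][1]
print(round(header_y, 4))             # 491.7657
print(round(491.7707 - header_y, 4))  # 0.005
```

This reproduces the 491.7657 from the question, so the arithmetic itself looks right. A difference of 0.005 units is about 0.00007 inch at the default 1/72 inch per unit, far below one pixel at usual screen or print resolutions, which may be why viewers appear to draw both strings on the same line.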

How to set fill alpha in PDF

This is a red box:
162 86 m 162 286 l 362 286 l 362 86 l h
1 0 0 rg f
How can I add partial transparency to it?
I've read the transparency section of the PDF spec, but I can only seem to find models and formulas, not how to actually add alpha to a fill.
As the OP indicated, there is a whole section in the PDF specification on the topic of Transparency. This is due to a multitude of ways to apply transparency. The most appropriate way for the OP's context is explained in the following section:
11.6.4.4 Constant Shape and Opacity
The current alpha constant parameter in the graphics state (see “Graphics State”) shall be two scalar values—one for strokes and one for all other painting operations—to be used for the constant shape (f_k) or constant opacity (q_k) component in the colour compositing formulas.
NOTE 1 This parameter is analogous to the current colour used when painting elementary objects.
The nonstroking alpha constant shall also be applied when painting a transparency group’s results onto its backdrop.
The stroking and nonstroking alpha constants shall be set, respectively, by the CA and ca entries in a graphics state parameter dictionary (see “Graphics State Parameter Dictionaries”). As described previously for the soft mask, the alpha source flag in the graphics state shall determine whether the alpha constants are interpreted as shape values (true) or opacity values (false).
Thus, you first have to define an appropriate graphics state parameter dictionary in the page resources, e.g.:
/Resources<</ExtGState<<
/GS1 <</ca 0.5>>
>>>>
Now you can use these named graphics state parameters in your content stream:
/GS1 gs
1 0 0 rg
162 86 m
162 286 l
362 286 l
362 86 l
h
f
If drawn upon a green lattice, the result looks like this:
By the way, there was an error in the OP's original content stream fragment
162 86 m 162 286 l 362 286 l 362 86 l h
1 0 0 rg f
The color setting operation here is between the path definition (162 ... l h) and the path filling operation (f). This is invalid, compare Figure 9 – Graphics Objects in the specification, after path construction (and an optional clipping path operator) the path painting operation must follow immediately. (Numerous PDF viewers do accept the invalid operation order but it's invalid nonetheless).
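If the content stream is generated by a program, the operator order described above can be kept by construction. Here is a minimal Python sketch; the function names are hypothetical, and the output mirrors the /GS1 fragment shown earlier, with the graphics state and colour set before path construction starts.

```python
def ext_gstate_resources(gs_name, fill_alpha):
    """Return a /Resources entry defining a nonstroking alpha (ca)."""
    return f"/Resources<</ExtGState<</{gs_name} <</ca {fill_alpha}>>>>>>"


def alpha_fill(gs_name, rgb, points):
    """Return content stream operators that fill a closed polygon.

    The gs and rg operators come first, so nothing sits between the path
    construction operators and the painting operator f.
    """
    r, g, b = rgb
    lines = [f"/{gs_name} gs", f"{r} {g} {b} rg"]
    (x0, y0), *rest = points
    lines.append(f"{x0} {y0} m")
    lines.extend(f"{x} {y} l" for x, y in rest)
    lines += ["h", "f"]
    return "\n".join(lines)


print(ext_gstate_resources("GS1", 0.5))
print(alpha_fill("GS1", (1, 0, 0), [(162, 86), (162, 286), (362, 286), (362, 86)]))
```

For a stroke, the same approach would use the CA entry in the dictionary and the RG and S operators.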
The alpha value for the upcoming operations need not be constant. Instead it can e.g. be governed by a mask with, say, a radial shading.
Indeed, if you define the graphics state parameters like this:
/Resources<</ExtGState<<
/GS1 << /SMask<</Type/Mask/S/Luminosity/G 1 0 R >> >>
>> >>
and the object 1 0 is this XObject:
1 0 obj
<<
/Group<</CS/DeviceGray/S/Transparency>>
/Type/XObject
/Resources<</Shading<<
/Sh1<<
/Coords[262 186 10 262 186 190]
/ColorSpace/DeviceRGB
/ShadingType 3
/Extend[true true]
/Function <</Domain[0 1]/FunctionType 2/N 1/C1[0 0 0]/C0[1 1 1]>>
>>
>>>>
/Subtype/Form
/BBox[0 0 500 400]
/Matrix [1 0 0 1 0 0]
/Length 10
/FormType 1
>>stream
/Sh1 sh
endstream
endobj
you get for the above content stream fragment drawn upon a green lattice: