How to set fill alpha in PDF

This is a red box:
162 86 m 162 286 l 362 286 l 362 86 l h
1 0 0 rg f
How can I add partial transparency to it?
I've read the transparency section of the PDF spec, but I can only seem to find models and formulas, not how to actually add alpha to a fill.

As the OP indicated, the PDF specification has a whole section on transparency, because there are many ways to apply it. The most appropriate one for the OP's context is described in the following section:
11.6.4.4 Constant Shape and Opacity
The current alpha constant parameter in the graphics state (see “Graphics State”) shall be two scalar values—one for strokes and one for all other painting operations—to be used for the constant shape (f_k) or
constant opacity (q_k) component in the colour compositing formulas.
NOTE 1 This parameter is analogous to the current colour used when painting elementary objects.
The nonstroking alpha constant shall also be applied when painting a transparency group’s results onto its backdrop.
The stroking and nonstroking alpha constants shall be set, respectively, by the CA and ca entries in a graphics state parameter dictionary (see “Graphics State Parameter Dictionaries”). As described previously for the soft mask, the alpha source flag in the graphics state shall determine whether the alpha constants are interpreted as shape values (true) or opacity values (false).
Thus, you first have to define an appropriate graphics state parameter dictionary in the page resources, e.g.:
/Resources<</ExtGState<<
/GS1 <</ca 0.5>>
>>>>
Now you can use these named graphics state parameters in your content stream:
/GS1 gs
1 0 0 rg
162 86 m
162 286 l
362 286 l
362 86 l
h
f
If drawn upon a green lattice, the result looks like this:
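As a rough numerical check of what to expect where the box covers the lattice, the constant alpha can be applied by hand. This is a minimal sketch, assuming the Normal blend mode, a fully opaque backdrop, and a lattice drawn in pure green (0 1 0); the function name `composite` is made up for illustration.

```python
def composite(source, backdrop, alpha):
    """Blend a colour over an opaque backdrop with a constant alpha.

    Assumes the Normal blend mode and a fully opaque backdrop, in which
    case each component is alpha * source + (1 - alpha) * backdrop.
    """
    return tuple(alpha * s + (1.0 - alpha) * b for s, b in zip(source, backdrop))


red = (1.0, 0.0, 0.0)
green = (0.0, 1.0, 0.0)   # assumed lattice colour
white = (1.0, 1.0, 1.0)   # assumed page background

over_lattice = composite(red, green, 0.5)
over_paper = composite(red, white, 0.5)
print(over_lattice)  # (0.5, 0.5, 0.0)
print(over_paper)    # (1.0, 0.5, 0.5)
```

So with /ca 0.5 the lattice lines should show through the box as a dull yellow, and the box itself should look pink on white paper.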
By the way, there was an error in the OP's original content stream fragment:
162 86 m 162 286 l 362 286 l 362 86 l h
1 0 0 rg f
Here the color setting operation sits between the path definition (162 ... l h) and the path painting operation (f). This is invalid: according to Figure 9 – Graphics Objects in the specification, the path painting operator must immediately follow path construction (and an optional clipping path operator). Numerous PDF viewers accept this operator order, but it is invalid nonetheless.
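To try the /GS1 approach end to end, a complete one-page file can be generated with a few lines of script. The following is only a sketch: the function name `build_pdf`, the 500 x 400 MediaBox, and the object numbering are assumptions made for illustration, and the content stream sets the colour before constructing the path.

```python
def build_pdf(alpha=0.5):
    """Return the bytes of a one-page PDF with a half-transparent red box."""
    # Colour and graphics state are set before path construction starts.
    content = (
        b"/GS1 gs\n"
        b"1 0 0 rg\n"
        b"162 86 m 162 286 l 362 286 l 362 86 l h\n"
        b"f"
    )
    objects = [
        b"<</Type/Catalog/Pages 2 0 R>>",
        b"<</Type/Pages/Count 1/Kids[3 0 R]>>",
        b"<</Type/Page/Parent 2 0 R/MediaBox[0 0 500 400]"
        b"/Resources<</ExtGState<</GS1 <</ca " + str(alpha).encode() + b">>>>>>"
        b"/Contents 4 0 R>>",
        b"<</Length " + str(len(content)).encode() + b">>\nstream\n"
        + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))          # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"       # every xref entry is 20 bytes long
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<</Size %d/Root 1 0 R>>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_position)
    return bytes(out)


pdf_bytes = build_pdf()
# To inspect the result in a viewer: open("alpha.pdf", "wb").write(pdf_bytes)
```

Computing the offsets while writing avoids the hand-maintained xref numbers that make manually edited PDFs fragile.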
The alpha value for the upcoming operations need not be constant. Instead it can e.g. be governed by a mask with, say, a radial shading.
Indeed, if you define the graphics state parameters like this:
/Resources<</ExtGState<<
/GS1 << /SMask<</Type/Mask/S/Luminosity/G 1 0 R >> >>
>> >>
and the object 1 0 is this XObject:
1 0 obj
<<
/Group<</CS/DeviceGray/S/Transparency>>
/Type/XObject
/Resources<</Shading<<
/Sh1<<
/Coords[262 186 10 262 186 190]
/ColorSpace/DeviceRGB
/ShadingType 3
/Extend[true true]
/Function <</Domain[0 1]/FunctionType 2/N 1/C1[0 0 0]/C0[1 1 1]>>
>>
>>>>
/Subtype/Form
/BBox[0 0 500 400]
/Matrix [1 0 0 1 0 0]
/Length 10
/FormType 1
>>stream
/Sh1 sh
endstream
endobj
you get for the above content stream fragment drawn upon a green lattice:


How to find and provide advance and displacement values of a glyph into pdf content stream

I have to write multilingual text to a PDF using C++. I have Unicode values as well as glyph IDs with their advances and displacements for the input string.
But I need to know how to position a dependent glyph relative to its independent base glyph.
Suppose I have advance and displacement values from FreeType / HarfBuzz: how should I enter these values into the PDF content stream along with the glyph IDs?
I have tried the output values of FreeType and HarfBuzz. The individual glyphs print properly, but the positioning of dependent glyphs relative to their base glyph is still wrong, even when I use the advance and displacement values from their output.
I just need the logic for using these output values in the content stream to produce a properly readable word or letter.
Example:
Text = tamil letter + hindi letter.
I need to print this output: [image: proper output]
But currently I am only able to print this: [image: improper output]
Tamil combined letter:
வ = U+0BB5 TAMIL LETTER VA = base glyph
ா = U+0BBE TAMIL VOWEL SIGN AA = dependent glyph
HarfBuzz run:
hb-shape.exe -O json -u u+0bb5,u+0bbe --no-glyph-names "C:\\Windows\\Fonts\\Nirmala.ttf"
gid output:
[{"g":2953,"cl":0,"dx":0,"dy":0,"ax":2111,"ay":0},{"g":2959,"cl":0,"dx":0,"dy":0,"ax":1453,"ay":0}]
Hindi combined letter:
म = U+092E DEVANAGARI LETTER MA = base glyph
ि = U+093F DEVANAGARI VOWEL SIGN I = dependent glyph
HarfBuzz run:
hb-shape.exe -O json -u u+092e,u+093f --no-glyph-names "C:\\Windows\\Fonts\\Nirmala.ttf"
gid output:
[{"g":302,"cl":0,"dx":0,"dy":0,"ax":532,"ay":0},{"g":273,"cl":0,"dx":0,"dy":0,"ax":1379,"ay":0}]
I plugged these output values into the formula from the PDF documentation [image: PDF doc formula],
assuming unity for all variables except width and advance,
and obtaining the width values using FreeType.
Glyph Advance values for four glyphs in order:
tx = 1769
tx = 1132
tx = 1586
tx = 1448
If I provide these values in the content stream in the order as
<glyph id 1> tx 1 <glyph id 2> tx 2 <glyph id 3> tx 3 <glyph id 4> tx 4
Content stream:
/OC /oc2 BDC q BT /FXF1 1 Tf 70.866142 0.000000 0.000000 70.866142 28.346457 141.732285 Tm[<0B89>-1769<0B8F>-1132<0111>-1586<012E>-1448]TJ ET Q EMC
The PDF documentation says that a positive adjustment value moves the text to the left.
Is it the other way round?
Or do I have to supply the difference between the advances?
Additional PDF objects:
Font descriptor object,Base font object,Font object.
I have tried using only the advance values and also only the computed values.
The only problem is the horizontal and vertical spacing within combined glyphs, which also affects the spacing between subsequent glyphs.
None of these approaches renders the glyphs legibly, at least not in a generalised programmatic manner.
From @mkl's answers in various places on Stack Overflow, I suspect that each glyph needs its own transformation matrix or Td. But is it really that complex?
I would have thought this could be rendered easily.
If an individual transformation matrix or Td is needed, how do I compute the values to supply?
Any help & guidance is welcome and much appreciated.
Thank you.
It helps to write the PDF out as plain text, which you can edit and save in Notepad.
Here I am altering a batch.cmd (work in progress :-) to test that my compiler handles the changes as text, but you can edit raw PDF in an editor too. Beware that after cut and paste a value or two may need changing. I also do not yet know how to reference non-Latin fonts easily (the next hurdle after images, which are almost done), so I used the Symbol font to illustrate the positioning modifications.
Note that for specific queries @mkl is the expert; I simply program by example, going by what works rather than by the book.
%PDF-1.0
%µ¶µ¶
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj
2 0 obj<</Type/Pages/Count 1/Kids[3 0 R]>>endobj
3 0 obj<</Type/Page/Parent 2 0 R/MediaBox [0 0 594 792]/Resources<</Font<< /F1 4 0 R /F2 5 0 R>>>>/Contents 6 0 R>>endobj
4 0 obj<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>endobj
5 0 obj<</Type/Font/Subtype/Type1/BaseFont/Symbol>>endobj
%Comment the following /Length 0999 is a dummy value it should be altered to equal decimal stream length, but most readers will ignore or work around invalid
6 0 obj<</Length 1326>>
stream
q
BT /F1 20 Tf 072 740 Td (20 units (default units usually = pts) high Headline) Tj ET
BT /F1 16 Tf 036 700 Td (All text is "Body" text. (no heads or tails)) Tj ET
BT /F1 10 Tf 004 780 Td (Text can be any order see "Body" text above. (Printed by Filename="C:\Users\K\Downloads\Programming\CMDaPDF\MAKE2PDF.cmd") spot the escape errors) Tj ET
BT /F1 12 Tf 036 675 Td (Here # 12 units high you must include just enough text for parts of a line. PDF has no page feeds no wrapping,) Tj 0 -20 Td (nor \\new line feed, no ¶aragraphs) Tj 86 -15 Td (nor carriage \r\\return. \n\r ) Tj 100 5 Td ( It is not \007\010\011\012\\tabular, each page is one row of multiple pages,) Tj 50 -15 Td (each page is one text column wide .[ ×] no yes check) Tj 0 -10 Td (each row is one text column wide .[x] no is yes) Tj 0 -10 Td (each row is one text column wide . · bullet point OK) Tj ET
BT +0.50 Tc -1.4 Tw 999 TL /F1 1 Tf 15 001 10. 30 200.000 440.000 Tm [(Jane A)600(usten)] TJ ET
BT +0.50 Tc 0.00 Tw 000 TL /F2 1 Tf 15 000 000 15 200.000 430.000 Tm [(Ja)-1000(ne Austen)] TJ ET
BT -1.20 Tc 0.00 Tw 999 TL /F2 1 Tf 15 000 000 15 200.000 420.000 Tm [(J)-1200(a)800(ne Austen)] TJ ET
BT +0.00 Tc 0.00 Tw 000 TL /F2 1 Tf 15 000 000 15 200.000 410.000 Tm [(Jane A)100(us)-500(ten)] TJ ET
Q
endstream
xref
0 7
0000000000 65535 f
0000000019 00000 n
0000000065 00000 n
0000000117 00000 n
0000000242 00000 n
0000000306 00000 n
0000000527 00000 n
trailer<</Size 7/Root 1 0 R>>
startxref
1903
%%EOF
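The numbers inside a TJ array are subtracted from the current horizontal position, in thousandths of a text space unit, so a positive number moves the following glyph to the left. Under that rule, shaper output could be turned into a TJ operand roughly as sketched below. This is not a definitive implementation: it assumes a horizontal font whose glyph IDs are used directly as 2-byte codes, that `pdf_widths` holds the widths the viewer will use for each glyph (the /W or /DW values of the font, in 1/1000 units), and that `units_per_em` is read from the font file. The names `tj_operand` and `fmt` are made up, and vertical offsets (`dy`) are ignored here; they would need a text rise (Ts) or a separate Td.

```python
def fmt(value):
    """Format a number without trailing zeros."""
    return ("%.3f" % value).rstrip("0").rstrip(".")


def tj_operand(glyphs, pdf_widths, units_per_em):
    """Build a TJ operation from HarfBuzz-style glyph records.

    glyphs:       list of dicts with keys "g", "dx", "ax" (font units)
    pdf_widths:   {glyph id: width in 1/1000 text space units} (assumed to
                  match what the PDF font dictionary declares)
    units_per_em: design units per em of the font
    """
    scale = 1000.0 / units_per_em
    parts = []
    for glyph in glyphs:
        gid = glyph["g"]
        dx = glyph["dx"] * scale
        advance = glyph["ax"] * scale
        if abs(dx) > 0.0005:
            # Negative number: shift the glyph to the right by dx.
            parts.append(fmt(-dx))
        parts.append("<%04X>" % gid)
        # The viewer has now moved by dx + declared width; pull the pen
        # back (or push it on) so the net movement equals the advance.
        correction = dx + pdf_widths[gid] - advance
        if abs(correction) > 0.0005:
            parts.append(fmt(correction))
    return "[" + " ".join(parts) + "] TJ"


# Hypothetical example: one glyph, declared width 600, shaped advance 500.
example = tj_operand([{"g": 1, "dx": 0, "dy": 0, "ax": 500, "ay": 0}],
                     {1: 600}, 1000)
print(example)  # [<0001> 100] TJ
```

If the widths declared in the PDF font already equal the shaped advances and all offsets are zero, no numbers are emitted at all, which suggests checking the /W array first when the spacing looks wrong.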

PDF How to get Font object with id not in cross reference table

Like in this discussion,
Tj command with angle brackets
I'm faced with a Tj operator whose operand is enclosed in angle brackets:
<00030037005200570044004F000300550048004600520051005100580056>Tj
The parent page gives the list of font object IDs like this:
Font /C2_0 39 0 R/T1_0 41 0 R/T1_1 43 0 R/T1_2 44 0 R
and in the content stream where the angle-bracket string appears, a Tf operator selects the font C2_0.
So from the font list, I know that the C2_0 font object is 39.
OK, but now, what is the fastest way to access this object 39, which is embedded in an object stream with ID 16? In this object 16, there is the list of embedded objects:
32 0 33 106 34 131 35 141 36 193 37 436 38 16720 39 16728 ....
So my question is: how do I get the value 16 when I only know that font object 39 is not in the cross-reference table? Do I have to parse all object streams and read their object lists to find the one containing object 39?
Thanks for your attention.
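One possible approach, sketched here under assumptions: a file that stores objects in object streams also carries a cross-reference stream (an object with /Type /XRef, which in hybrid files is pointed to by /XRefStm in the trailer). Its type 2 entries name the containing object stream and the index inside it, so scanning every object stream should not be necessary. The sketch below assumes the cross-reference stream has already been decompressed and that its /W and /Index arrays have been read; all function names are hypothetical.

```python
def parse_xref_stream(data, w, index):
    """Decode cross-reference stream entries.

    data:  decoded (decompressed) stream bytes
    w:     the /W array, e.g. [1, 2, 1]
    index: the /Index array, pairs of (first object number, count)
    Returns {object number: (type, field 2, field 3)}.
    """
    entries = {}
    position = 0
    for first, count in zip(index[0::2], index[1::2]):
        for objnum in range(first, first + count):
            fields = []
            for i, width in enumerate(w):
                if width == 0:
                    # Absent field: the type defaults to 1; other fields
                    # are taken as 0 in this sketch.
                    fields.append(1 if i == 0 else 0)
                else:
                    chunk = data[position:position + width]
                    fields.append(int.from_bytes(chunk, "big"))
                    position += width
            entries[objnum] = tuple(fields)
    return entries


def find_container(entries, objnum):
    """Return (object stream number, index in it) for a compressed object."""
    kind, field2, field3 = entries[objnum]
    return (field2, field3) if kind == 2 else None


def offset_in_object_stream(header, first, objnum):
    """Locate an object using the 'number offset' pairs at the stream start.

    header: the pair list as text; first: the /First value of the stream.
    """
    numbers = [int(token) for token in header.split()]
    for number, relative in zip(numbers[0::2], numbers[1::2]):
        if number == objnum:
            return first + relative
    raise KeyError(objnum)


# Synthetic data for illustration: objects 38 and 39 live in stream 16.
demo = parse_xref_stream(bytes([2, 0, 16, 6, 2, 0, 16, 7]), [1, 2, 1], [38, 2])
print(find_container(demo, 39))  # (16, 7)
```

With the pair list quoted above and a hypothetical /First of 100, `offset_in_object_stream("32 0 33 106 34 131 35 141 36 193 37 436 38 16720 39 16728", 100, 39)` yields 16828 as the byte position of object 39 in the decoded stream.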

Graphing code size

I was curious whether there exists a ready-made script that would provide a starting point for an ultimate code size tracker tool. To start with, I'd like to be able to graph size with various optimisation options for a number of cross-compiler targets, and I'm quite tempted to put this on a revision timeline later as well.
So taken the output from size command:
   text    data     bss     dec     hex filename
   1634       0     128    1762     6e2 csv_data.o (ex libs/libxyz.a)
     28       0       0      28      1c csv_data_layer.o (ex libs/libxyz.a)
   1063       0       0    1063     427 http_parser.o (ex libs/libxyz.a)
   1312       0    1024    2336     920 http_queries.o (ex libs/libxyz.a)
      8      36       0      44      2c transport.o (ex libs/libxyz.a)
   1748       0    3688    5436    153c transport_layer.o (ex libs/libxyz.a)
      8       0       0       8       8 misc_allocator.o (ex libs/libxyz.a)
    847     108       1     956     3bc misc_err.o (ex libs/libxyz.a)
      0       4       0       4       4 misc_globals.o (ex libs/libxyz.a)
    273       0       0     273     111 misc_helpers.o (ex libs/libxyz.a)
     71       0       4      75      4b misc_printf.o (ex libs/libxyz.a)
   1044       0      44    1088     440 misc_time.o (ex libs/libxyz.a)
   3724       0       0    3724     e8c xyz.o (ex libs/libxyz.a)
    627       0       0     627     273 dummy.o (ex libs/libxyz.a)
      8      16       0      24      18 dummy_layer.o (ex libs/libxyz.a)
  12395     164    4889   17448    4428 (TOTALS)
Since most of the values differ when the library is compiled with various optimisation flags (i.e. -Os, -O0, -O1, -O2) and a variety of cross-compilers (e.g. AVR, MSP430, ARMv6, i386), I'd like to make a combined graph or set of graphs using gnuplot, d3.js, matplotlib or any other package. Has anyone seen a ready-made script that would help with part of this (e.g. at least convert the above tabular format to CSV, JSON or XML), or some study paper that presents a decent visualisation example? I have to admit, it's rather hard to find this using a web search engine.
Here is a possible visualization of the data as bar chart using gnuplot. This is of course not the ultimate visualization, but should be a good starting point.
set style data histogram
set style histogram rowstacked
set style fill solid 1.0 border lc rgb "white"
set xtics rotate 90
set key outside reverse Left
set bmargin 8
plot 'file.dat' using (!(stringcolumn(6) eq "(TOTALS)") ? column(1) : 1/0):xtic(6) title columnheader(1), \
for [i=2:5] '' using (!(stringcolumn(6) eq "(TOTALS)") ? column(i) : 1/0) title columnheader(i)
With the settings set terminal pngcairo size 1000,800, this gives
You must also decide which columns you want to use, because plotting every column for every file for every compiler will be quite messy. Maybe you want to plot only the total size (the factors 1.2 and 0.7 merely fake data for the other targets):
set style data histogram
set style histogram clustered
set style fill solid 1.0 noborder
set xtics rotate 90
set key outside reverse Left
set bmargin 8
plot 'file.dat' using (!(stringcolumn(6) eq "(TOTALS)") ? $4 : 1/0):xtic(6) title 'i386', \
'' using (!(stringcolumn(6) eq "(TOTALS)") ? $4*1.2 : 1/0) title 'ARMv6',\
'' using (!(stringcolumn(6) eq "(TOTALS)") ? $4*0.7 : 1/0) title 'AVR'
Which gives you:
Note that the lengthy using statements serve only to skip the last line with the TOTALS. Alternatively you could remove this last line with head, either when generating the data files or on the fly like this:
plot '< head -n -1 file.dat' using 4:xtic(6) title 'i386', \
'' using ($4*1.2) title 'ARMv6',\
'' using ($4*0.7) title 'AVR'
Of course, for your real data you would have something like
plot '< head -n -1 file-i386.dat' using 4:xtic(6) title 'i386', \
'< head -n -1 file-armv6.dat' using 4 title 'ARMv6',\
'< head -n -1 file-avr.dat' using 4 title 'AVR'
I hope this gives you an idea of different visualization possibilities. Which one is appropriate is for you to decide.
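For the conversion part of the question, a small script can turn the output of size into CSV rows, tagged with the target they were built for. This is only a sketch under the assumption that the input has the six-column layout shown in the question; the function names and the "target" column are made up for illustration.

```python
import csv
import io


def size_to_rows(text):
    """Parse the tabular output of `size` into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[0].isdigit():
            continue                      # header or malformed line
        if fields[5].strip() == "(TOTALS)":
            continue                      # skip the summary line
        rows.append({
            "file": fields[5].split(" (ex ")[0],   # drop the archive note
            "text": int(fields[0]),
            "data": int(fields[1]),
            "bss": int(fields[2]),
            "dec": int(fields[3]),
        })
    return rows


def rows_to_csv(rows, target):
    """Serialise parsed rows as CSV, adding a column naming the target."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["target", "file", "text", "data", "bss", "dec"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({"target": target, **row})
    return buffer.getvalue()


sample = (
    "text data bss dec hex filename\n"
    "1634 0 128 1762 6e2 csv_data.o (ex libs/libxyz.a)\n"
    "12395 164 4889 17448 4428 (TOTALS)\n"
)
print(rows_to_csv(size_to_rows(sample), "i386"))
```

Concatenating the CSV output for each compiler and optimisation flag gives one long table that gnuplot, matplotlib or d3.js can group by target.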

How to add text object to existing pdf

I have a source pdf which I am modifying by adding text objects. I am using "Incremental Updates" which is mentioned in the PDF specification. But while adding text objects using this method I am making some mistakes due to which the pdf doesn't render properly in Adobe Reader 11. When the pdf is opened and I double-click on it, the added text objects get deleted. I figured out that this is due to text annotation.
Now I want to know how a new text object can be added using an incremental update. How do the Contents and RC entries of a free text annotation have to be maintained?
Also is it possible to disable or delete the annotation so that my problem can be avoided easily? Because I want a simple pdf, I don't want annotation options.
The source pdf I am using is here.
The modified pdf after adding text object is here.
I am not sure that source pdf is itself proper according to pdf specification.
First off, let me show you how easy things are if you can use a decent PDF library. I use iTextSharp as an example, but the same can also be done with others like PDFBox or PDFNet (already mentioned by @Ika in his answer):
PdfReader reader = new PdfReader(sourcePdf);
using (PdfStamper stamper = new PdfStamper(reader, targetPdfStream)) {
Font FONT = new Font(Font.FontFamily.HELVETICA, 12, Font.BOLD, new GrayColor(0.75f));
PdfContentByte canvas = stamper.GetOverContent(1);
ColumnText.ShowTextAligned(
canvas,
Element.ALIGN_LEFT,
new Phrase("Hello people!", FONT),
36, 540, 0
);
}
(Derived from the Webified iTextSharp Example StampText.cs explained in chapter 6 of iText in Action — 2nd Edition.)
(Which PDF library you choose, depends on your general requirements and available license models.)
If, in spite of the ease of use of such PDF libraries, you insist on doing it manually, here are some remarks:
First you have to find the Page dictionary of the page you want to add content to. Depending on the type of PDF this already might require decompression of object streams etc. but in your sample modified1.pdf that is not necessary:
7 0 obj
<</Rotate 90
/Type /Page
/TrimBox [ 9.54 6.12 585.68 835.88 ]
/Resources 8 0 R
/CropBox [ 0 0 595.22 842 ]
/ArtBox [ 9.54 18.36 585.68 842 ]
/Contents [ 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R ]
/Parent 6 0 R
/MediaBox [ 0 0 595.22 842 ]
/Annots 17 0 R
/BleedBox [ 9.54 6.12 585.68 835.88 ]
>>
endobj
You see the array of references to content streams. This is where you have to add new page content to. You can manipulate an existing stream or create a new stream and add it to that array.
(Most PDFs have their content stream compressed. For the general case, therefore, you'd have to decompress a stream before you can work on it. Thus, in my eyes, the easier way would be to start a new stream.)
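For streams that declare /Filter /FlateDecode without predictor parameters, decompression is a single zlib call. The round trip below is a minimal sketch with made-up function names; streams using other filters or /DecodeParms need more work.

```python
import zlib


def decode_flate(raw_stream):
    """Decompress the bytes between `stream` and `endstream`."""
    return zlib.decompress(raw_stream)


def encode_flate(content):
    """Compress content for a stream with /Filter /FlateDecode."""
    return zlib.compress(content)


original = b"BT /F1 12 Tf 36 540 Td (Hello people!) Tj ET"
packed = encode_flate(original)
restored = decode_flate(packed)
print(restored == original)  # True
```

When writing a compressed stream back, /Length has to be the length of the compressed bytes, not of the original content.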
You chose to manipulate the last referenced stream 16 0 which in your PDF is uncompressed:
16 0 obj
<</Length 37 0 R>>
stream
S 1 0 0 1 13.183 0 cm 0 0 m
[...]
0 10 -10 -0 506.238 342.629 Tm
.13333 .11765 .12157 scn
-.0002 Tc
.0006 Tw
(the Bank and branch on which cheque is drawn\).)Tj
/F1 2 Tf
-15.1279 10.9462 Td
(abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!##$%^&*aaaaaaaaaaaaa)Tj
/F2 1 Tf
015.1279 01.9462 Td
(ANAabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789)Tj
ET
endstream
endobj
Your additions, I gather, are the two 3-liners at the bottom which first select a font, then position the insertion point and finally print a selection of letters.
Now you say you added the text abc...z and ABC...Z just for testing, but letters such as b, j, k, q and v do not appear in the PDF. The problem becomes even more visible for your second addition of letters; here only the capital 'A' and 'N' are displayed.
This is because the fonts in question are only partially embedded. Fonts are embedded into PDFs so that viewers on systems without the font can still display the document, but here only the subset of characters the document requires is embedded.
Let's look for the font F2 for which only 'N' and 'A' appear:
According to the page object, the page resources can be found in object 8 0:
8 0 obj
<</Font <</F1 45 0 R /TT2 46 0 R /F2 47 0 R>>
/ExtGState <</GS2 48 0 R>>
/ProcSet [ /PDF /Text ]
/ColorSpace <</Cs6 49 0 R>>
>>
endobj
So F2 is defined in 47 0:
47 0 obj
<</Subtype /Type1
/Type /Font
/Widths [ 722 250 250 250 250 250 250 250 250 250 250 250 250 722 ]
/Encoding 52 0 R
/FirstChar 65
/FontDescriptor 53 0 R
/ToUnicode 54 0 R
/BaseFont /ILBPOB+TimesNewRomanPSMT-Bold
/LastChar 78
>>
endobj
In the referenced ToUnicode map 54 0 you see
54 0 obj
<</Length 55 0 R>>stream
/CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
/Registry (AAAAAA+F2+0) /Ordering (T1UV) /Supplement 0 >> def
/CMapName /AAAAAA+F2+0 def
/CMapType 2 def
1 begincodespacerange <41> <4e> endcodespacerange
2 beginbfchar
<41> <0041>
<4e> <004E>
endbfchar
endcmap CMapName currentdict /CMap defineresource pop end end
endstream
endobj
In this mapping you see that only character codes 0x41 'A' and 0x4e 'N' are mapped.
In your document the font is used only to print "NA" in the amount table cells and for nothing else. Thus, only those two letters 'N' and 'A' are embedded, which results in your addition with that font only outputting these letters.
Thus, to successfully add text to the page you either have to check the font resources associated with the page for the glyphs they provide (and restrict your additions to those glyphs) or you have to add your own font resource.
As the presence of characters in the encoding often is not as easy to see as it is here (ToUnicode is optional), I would propose that you add your own font resources. The PDF specification ISO 32000-1 explains how to do that.
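Where a ToUnicode map is present, the check described above can be scripted. The sketch below only reads bfchar sections (bfrange sections are not handled), and a mapped code is a hint rather than proof that the glyph is embedded; the function names are hypothetical.

```python
import re


def bfchar_codes(cmap_text):
    """Return {character code: Unicode string} from the bfchar sections."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section)
        for code, target in pairs:
            # Targets are UTF-16BE code units written as hex digits.
            mapping[int(code, 16)] = bytes.fromhex(target).decode("utf-16-be")
    return mapping


def unsupported(text, mapping):
    """List the characters of text that the map does not cover."""
    available = set(mapping.values())
    return sorted({ch for ch in text if ch not in available})


cmap = """
1 begincodespacerange <41> <4e> endcodespacerange
2 beginbfchar
<41> <0041>
<4e> <004E>
endbfchar
"""
codes = bfchar_codes(cmap)
print(codes)                       # {65: 'A', 78: 'N'}
print(unsupported("ANAabc", codes))  # ['a', 'b', 'c']
```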
Furthermore you state the x and y axis position for the text is not properly displaying in pdf. While you don't say what exactly you mean, you should be aware that in the content stream you can apply affine transformations to the coordinate system of the page, i.e. stretch, skew, rotate, and move the axis.
If you want to use the original coordinate system and not depend on the coordinates to be proper at your additions, you should add an initial content stream to the page containing a q operator (to save the current graphics state on the graphics state stack) and start your additions in a new final content stream with a Q operator (to restore the graphics state by removing the most recently saved state from the stack and making it the current state).
EDIT As a sample I applied the Java equivalent of the C# code at the top to your modified1.pdf with append mode activated. The following objects were changed or added as a result:
The page object 7 0 has been updated:
7 0 obj
<</CropBox[0 0 595.22 842]
/Parent 6 0 R
/Contents[69 0 R 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R 70 0 R]
/Type/Page
/Resources<<
/ExtGState<</GS2 48 0 R>>
/ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
/ColorSpace<</Cs6 49 0 R>>
/Font<</F1 45 0 R/F2 47 0 R/TT2 46 0 R/Xi0 68 0 R>>
>>
/MediaBox[0 0 595.22 842]
/TrimBox[9.54 6.12 585.68 835.88]
/BleedBox[9.54 6.12 585.68 835.88]
/Annots 17 0 R
/ArtBox[9.54 18.36 585.68 842]
/Rotate 90
>>
endobj
If you compare with your former version, you see that
two new content streams have been added, 69 0 at the start and 70 0 at the end;
the resources are not an indirect object anymore but instead are directly included here;
the resources contain a new Font resource Xi0 at 68 0.
Now let's look at the added objects.
This is the font resource for Helvetica-Bold named Xi0 at 68 0:
68 0 obj
<</BaseFont/Helvetica-Bold
/Type/Font
/Encoding/WinAnsiEncoding
/Subtype/Type1
>>
endobj
Non-embedded, standard 14 font resources are not complicated at all...
Now there are the additional content streams. iText does compress them, but I'll show them in an uncompressed state here:
69 0 obj
<</Length 1>>stream
q
endstream
endobj
70 0 obj
<</Length 106>>stream
Q
q
0 1 -1 0 595.22 0 cm
q
BT
1 0 0 1 36 540 Tm
/Xi0 12 Tf
0.75 g
(Hello people!)Tj
0 g
ET
Q
Q
endstream
endobj
So the new content stream at the start stores the current graphic state, and the new one at the end retrieves that stored state, changes the coordinate system, positions for text insertion, selects font, font size, and the fill colour, and finally prints a string.
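To see where the text origin ends up on the unrotated page, the cm matrix can be applied to the Tm position by hand. A small sketch, with a made-up helper name, using the PDF convention x' = a·x + c·y + e and y' = b·x + d·y + f:

```python
def apply_matrix(matrix, point):
    """Transform a point by a PDF matrix given as (a, b, c, d, e, f)."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


rotation = (0, 1, -1, 0, 595.22, 0)    # operands of the cm operator above
text_origin = (36, 540)                # translation part of the Tm operator

x, y = apply_matrix(rotation, text_origin)
print(round(x, 2), round(y, 2))  # 55.22 36
```

The matrix compensates for the /Rotate 90 entry of the page, so that the position 36, 540 refers to the page as the viewer displays it.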

How to Calculate Leading in PDF Document

How do I calculate leading in a PDF document?
For example:
48 0 0 48 72 677.28 Tm
(Hello World) Tj
0 -1.1075 TD
This renders the text Hello World at 48pt/57.6pt (120% line height) in Times-Roman.
According to the PDF Reference manual, "the leading parameter is measured in unscaled text space units. It specifies the vertical distance between the baselines of adjacent lines of text... The number is expressed in thousandths of a unit of text space."
Can someone please explain how 1.1075 and 57.6 are related?
Your PDF commands are out of order. I suppose you mean:
48 0 0 48 72 677.28 Tm
0 -1.1075 TD
(Hello World) Tj
This code sets the text coordinate system with the Tm operator:
a scale of 48 in x and 48 in y
a start position of (72, 677.28)
The TD operator then moves to the start of the next line, offset by -1.1075 text space units in y, and as a side effect sets the leading to 1.1075 text space units. A text space unit in this example is a user space unit multiplied by 48, as set by the Tm operator.
Your PDF code can be simplified. The following is equivalent:
48 0 0 48 72 624.12 Tm
(Hello World) Tj
Explanation: 677.28 - (1.1075*48) = 677.28 - 53.16 = 624.12
You should always remember that PDF is a language: to calculate the real coordinates, you have to parse all previous commands.
There may be something like this before your commands:
10 0 0 10 0 0 cm
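The arithmetic can be checked with a few lines. This sketch assumes a text matrix without rotation or skew and that TD performs a single move while setting the leading to -ty; the function name and the `ctm_scale` parameter (standing in for a preceding uniform cm scale such as the one above) are made up for illustration.

```python
def next_line_baseline(tm, ty, ctm_scale=1.0):
    """Return (leading, baseline y) in user space after `0 ty TD`.

    tm:        text matrix operands (a, b, c, d, e, f) of the Tm operator
    ty:        vertical operand of TD, in unscaled text space units
    ctm_scale: uniform scale of a preceding cm operator, 1.0 if none
    """
    a, b, c, d, e, f = tm
    leading = -ty * d * ctm_scale          # text space units scaled by Tm
    baseline = (f + ty * d) * ctm_scale
    return leading, baseline


leading, baseline = next_line_baseline((48, 0, 0, 48, 72, 677.28), -1.1075)
print(round(leading, 2), round(baseline, 2))  # 53.16 624.12
```

So with these operands the distance between the baselines is 53.16 units, that is 1.1075 times the font scale of 48; a distance of 57.6 would correspond to a TD operand of -1.2.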
The leading is usually set in the PDF by the command TL, just like this:
12 TL
(El ingenioso hidalgo don Quijote de la Mancha)'
That 12 sets a leading of 12 unscaled text space units (12 points if text space is not scaled), which stays in effect until another TL (or a TD) changes it.
I hope it helps you. I think this is the easiest way to do it :)