Continuing from this question, the PDF is now constructed as follows:
8 0 obj
<</F 132/Type/Annot/Subtype/Widget/Rect[2 198 100 190]/FT/Sig/DR<<>>/T(Signature1)/V 6 0 R/P 3 0 R/AP<</N 7 0 R>>>>
endobj
6 0 obj
<</Contents <...>/Type/Sig/SubFilter/ETSI.CAdES.detached/M(D:20230128131946+00'00')/ByteRange [0 830 60832 1714]/Filter/Adobe.PPKLite>>
endobj
9 0 obj
<</BaseFont/Helvetica/Type/Font/Subtype/Type1/Encoding/WinAnsiEncoding/Name/Helv>>
endobj
10 0 obj
<</BaseFont/ZapfDingbats/Type/Font/Subtype/Type1/Name/ZaDb>>
endobj
12 0 obj<</Font 13 0 R>>
endobj
13 0 obj<</FAdESFont1 14 0 R>>
endobj
14 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj
15 0 obj
<</Length 90>>stream
BT
2 194 TD
/FAdESFont1 5 Tf
(m#turboirc.com MICHAIL CHOURDAKIS 1/28/2023 15:19:46) Tj
ET
endstream
endobj
7 0 obj
<</Type/XObject/Resources<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]>>/Subtype/Form/BBox[2 198 100 190]/Length 90/FormType 1/Filter/FlateDecode>>stream
BT
2 194 TD
/FAdESFont1 5 Tf
(m#turboirc.com MICHAIL CHOURDAKIS 1/28/2023 15:19:46) Tj
ET
endstream
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 4 0 R>>>>/Contents 5 0 R/Annots[8 0 R]>>
endobj
2 0 obj
<</Type/Pages/MediaBox[0 0 200 200]/Count 1/Kids[3 0 R]>>
endobj
1 0 obj
<</AcroForm<</Fields[8 0 R]/DR<</Font<</Helv 9 0 R/ZaDb 10 0 R>>>>/DA(/Helv 0 Tf 0 g )/SigFlags 3>>/Type/Catalog/Pages 2 0 R>>
endobj
11 0 obj
<</Producer(AdES Tools https://www.turboirc.com)/ModDate(D:20230128131946+00'00')>>
endobj
xref
0 4
0000000000 65535 f
0000061862 00000 n
0000061787 00000 n
0000061681 00000 n
6 10
0000000810 00000 n
0000061409 00000 n
0000000679 00000 n
0000060958 00000 n
0000061056 00000 n
0000062004 00000 n
0000061133 00000 n
0000061165 00000 n
0000061203 00000 n
0000061271 00000 n
trailer
<</Root 1 0 R/Prev 492/Info 11 0 R/Size 20/ID[<6BD3BF95416A5C19FFBC464EC610875C><54ACC00AA74869363131BCC04E65417F>]>>
startxref
62104
%%EOF
The idea is:
Create the annotation object (ID 8), which refers to the signature via /V (6) and to something to show via /N (7).
The appearance object (7) is a stream containing the text?
7 0 obj <</Type/XObject/Resources<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]>>/Subtype/Form/BBox[2 198 100 190]/Length 90/FormType 1/Filter/FlateDecode>>stream
BT
2 194 TD
/FAdESFont1 5 Tf
(m#turboirc.com MICHAIL CHOURDAKIS 1/28/2023 15:19:46) Tj
ET
endstream
endobj
This time Adobe accepts the signature and shows a "box" that I can click to show signature information, but the text (mail, name, date) is not displayed.
What am I missing?
In the previous version I was changing the content of the original root, but I learned from this question that this is an incorrect way of adding a visible signature and will not work for re-signing.
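Your appearance stream in object 7 has some errors, in particular:
Its resources dictionary does not contain a /Font entry, so the font name /FAdESFont1 used in the stream cannot be resolved; how should the text in it be rendered?
It claims to be Flate-encoded (/Filter/FlateDecode), but the stream data obviously is plain, uncompressed text.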
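A minimal sketch of how such an appearance stream could be serialized so that both points are addressed: the resources name the font, and /Filter and /Length describe the bytes actually written. It uses Python's zlib; the helper name build_form_xobject is hypothetical, and reusing font object 14 0 R from the listing above is an assumption, not the asker's actual tool.

```python
import zlib

def build_form_xobject(obj_num: int, content: bytes, font_name: str, font_ref: str) -> bytes:
    """Serialize a form XObject whose /Filter and /Length match the bytes written."""
    compressed = zlib.compress(content)  # the data really is Flate-encoded now
    dictionary = (
        f"<</Type/XObject/Subtype/Form/FormType 1/BBox[2 198 100 190]"
        f"/Resources<</Font<</{font_name} {font_ref}>>>>"  # font lookup for the Tf operator
        f"/Filter/FlateDecode/Length {len(compressed)}>>"
    ).encode("ascii")
    return (b"%d 0 obj\n" % obj_num) + dictionary + b"stream\n" + compressed + b"\nendstream\nendobj\n"

# Content copied from the question's object 7.
content = (
    b"BT\n2 194 TD\n/FAdESFont1 5 Tf\n"
    b"(m#turboirc.com MICHAIL CHOURDAKIS 1/28/2023 15:19:46) Tj\nET"
)
appearance = build_form_xobject(7, content, "FAdESFont1", "14 0 R")
```

Alternatively, dropping the /Filter entry and writing the plain text with its true byte length would also make the dictionary consistent with the data.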
Where do I find information about how a pdf is made up?
For example: a PDF I created named Dokname, containing the string TEST, looks like this when opened in a text editor:
(I replaced the parts the text-editor couldn't decode with [...])
%PDF-1.4
%Óëéá
1 0 obj
<</Title (Dokname)
/Producer (Skia/PDF m102 Google Docs Renderer)>>
endobj
3 0 obj
<</ca 1
/BM /Normal>>
endobj
5 0 obj
<</Filter /FlateDecode
/Length 160>> stream
[...]
endstream
endobj
2 0 obj
<</Type /Page
/Resources <</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
/ExtGState <</G3 3 0 R>>
/Font <</F4 4 0 R>>>>
/MediaBox [0 0 596 842]
/Contents 5 0 R
/StructParents 0
/Parent 6 0 R>>
endobj
6 0 obj
<</Type /Pages
/Count 1
/Kids [2 0 R]>>
endobj
7 0 obj
<</Type /Catalog
/Pages 6 0 R>>
endobj
8 0 obj
<</Length1 14972
/Filter /FlateDecode
/Length 7164>> stream
[...]
endstream
endobj
9 0 obj
<</Type /FontDescriptor
/FontName /AAAAAA+ArialMT
/Flags 4
/Ascent 905.27344
/Descent -211.91406
/StemV 45.898438
/CapHeight 715.82031
/ItalicAngle 0
/FontBBox [-664.55078 -324.70703 2000 1005.85938]
/FontFile2 8 0 R>>
endobj
10 0 obj
<</Type /Font
/FontDescriptor 9 0 R
/BaseFont /AAAAAA+ArialMT
/Subtype /CIDFontType2
/CIDToGIDMap /Identity
/CIDSystemInfo <</Registry (Adobe)
/Ordering (Identity)
/Supplement 0>>
/W [0 [750] 40 54 666.99219 55 [610.83984]]
/DW 0>>
endobj
11 0 obj
<</Filter /FlateDecode
/Length 243>> stream
[...]
endstream
endobj
4 0 obj
<</Type /Font
/Subtype /Type0
/BaseFont /AAAAAA+ArialMT
/Encoding /Identity-H
/DescendantFonts [10 0 R]
/ToUnicode 11 0 R>>
endobj
xref
0 12
0000000000 65535 f
0000000015 00000 n
0000000365 00000 n
0000000098 00000 n
0000008721 00000 n
0000000135 00000 n
0000000573 00000 n
0000000628 00000 n
0000000675 00000 n
0000007925 00000 n
0000008159 00000 n
0000008407 00000 n
trailer
<</Size 12
/Root 7 0 R
/Info 1 0 R>>
startxref
8860
%%EOF
What do these obj-elements represent? Where is my TEST? Why did it get scrambled?
What I am searching for can probably all be found in Adobe's documentation, but that has hundreds of pages, which is very overwhelming. I get that this is a very complex topic and I am not trying to understand it completely. I am just looking for an introduction or an overview. Unfortunately I didn't find anything like that on YouTube or elsewhere.
This is too complex for comments, and yes, you will only find snippets here and there, including this one and bits in my and others' answers.
For a quick overview of the code sample you provided:
A PDF is a collection of objects that need not be placed in sequential order. So you start at the end, before the last %%EOF (potentially one of many!), with startxref 8860, where 8860 is the decimal byte offset of the cross-reference (xref) table, i.e. the file's index.
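As a rough illustration of that first step, the sketch below finds the last startxref and reads a classic xref table into a dictionary. The function name read_xref and the tiny sample file are made up for the example; it assumes a plain-text xref table and does not handle cross-reference streams or /Prev chains.

```python
def read_xref(data: bytes) -> dict[int, int]:
    """Return {object number: byte offset} from the last classic xref table."""
    # The number after the last 'startxref' is the offset of the xref table.
    pos = data.rindex(b"startxref")
    xref_offset = int(data[pos + len(b"startxref"):].split()[0])
    lines = data[xref_offset:].splitlines()
    assert lines[0].strip() == b"xref", "not a classic xref table"
    offsets: dict[int, int] = {}
    i = 1
    while i < len(lines) and not lines[i].startswith(b"trailer"):
        first, count = map(int, lines[i].split())      # subsection header, e.g. '0 12'
        for n in range(count):
            offset, generation, kind = lines[i + 1 + n].split()
            if kind == b"n":                           # 'n' = in use, 'f' = free
                offsets[first + n] = int(offset)
        i += 1 + count
    return offsets

# Build a tiny two-object file so that the offsets are known to be right.
body = b"%PDF-1.4\n"
off1 = len(body)
body += b"1 0 obj\n<</Type/Catalog/Pages 2 0 R>>\nendobj\n"
off2 = len(body)
body += b"2 0 obj\n<</Type/Pages/Kids[]/Count 0>>\nendobj\n"
xref_at = len(body)
body += b"xref\n0 3\n0000000000 65535 f \n%010d 00000 n \n%010d 00000 n \n" % (off1, off2)
body += b"trailer\n<</Size 3/Root 1 0 R>>\nstartxref\n%d\n%%%%EOF\n" % xref_at

table = read_xref(body)
```

With the table in hand, body[table[1]:] starts at "1 0 obj", which is how a reader jumps straight to any object.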
There are many abbreviations (too many to list), and as in a stack language most things may appear (literally) backwards. The xref table points to each object's position in the file.
The prime target in this case is 7 0 obj <</Type /Catalog /Pages 6 0 R>> endobj (the trailer's /Root 7 0 R points to it), since the catalog tells us where the page tree will be found: object 6 has /Type /Pages /Count 1 /Kids [2 0 R], so it is one page, further defined in 2 0 obj.
We now see the page's resources (font(s), plus a /ProcSet that mentions images) and /MediaBox [0 0 596 842], which is roughly (a tad wider than) a standard A4 page, since 595/72 inch is closer to 210 mm.
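The page-size arithmetic can be checked in a few lines. PDF user space units default to 1/72 inch; the helper name points_to_mm is just for this example.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4

def points_to_mm(points: float) -> float:
    """Convert default PDF user-space units (1/72 inch) to millimetres."""
    return points / POINTS_PER_INCH * MM_PER_INCH

width_mm = points_to_mm(596)    # MediaBox width from the sample file
height_mm = points_to_mm(842)   # MediaBox height from the sample file
a4_width_mm = points_to_mm(595) # the width usually used for A4
```

Here width_mm comes out slightly above 210 mm while a4_width_mm is slightly below it, which is the "tad wider" mentioned above.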
There is too much to describe about that one item alone, so skipping to "Where is your text?": we see /Contents 5 0 R, so that compressed stream of data, which you need to decode, most likely holds your text. Its /Length 160 is the size of the binary Flate-encoded stream, which contains placement operators and not just your raw plain text.
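A minimal sketch of decoding such streams with Python's zlib. The function name inflate_streams and the sample object are made up; finding the stream with a regular expression instead of honouring /Length is a simplifying assumption. For the file in the question the decoded operators would probably still show glyph codes rather than the literal word TEST, because the font uses /Identity-H.

```python
import re
import zlib

def inflate_streams(data: bytes) -> list[bytes]:
    """Return the decoded bytes of every stream found in the file."""
    decoded = []
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", data, re.DOTALL):
        chunk = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes before 'endstream'.
            decoded.append(zlib.decompressobj().decompress(chunk))
        except zlib.error:
            decoded.append(chunk)  # not Flate data (or another filter): keep raw
    return decoded

# Hypothetical content stream, compressed the way a producer would do it.
raw = b"BT /F4 12 Tf 72 720 Td (TEST) Tj ET"
packed = zlib.compress(raw)
sample = (b"5 0 obj\n<</Filter /FlateDecode\n/Length %d>> stream\n" % len(packed)
          + packed + b"\nendstream\nendobj\n")

streams = inflate_streams(sample)
```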
The quantity of data for subsetting the font seems odd and excessive for just 4 letters (if it were the similar Helvetica, it would not need embedding, nor breaking up the font as CID ArialMT), and without the full file it is hard to say why the /Image* entries are there, but it is the Google Docs Renderer!
My suspicion is we may see characteristics of OCR in that stream.
The document contains only text, no images. The relevant portions of the PDF are as follows:
trailer
<</Root 1 0 R>>
1 0 obj
<</Type/Catalog/Pages 3 0 R>>
endobj
3 0 obj
<</Type/Pages/Kids[4 0 R]/Count 1/Rotate 0/ITXT(5.0.6)>>
endobj
4 0 obj
<</Type/Page
/MediaBox[0 0 612 1008]
/Rotate 0
/Parent 3 0 R
/Resources<<
/ProcSet[/PDF/Text]
/ExtGState 12 0 R
/Font 13 0 R>>
/Contents 5 0 R
/Annots[24 0 R]>>
endobj
12 0 obj
<</R7 7 0 R>>
endobj
7 0 obj
<</Type/ExtGState /OPM 1>>
endobj
13 0 obj
<</R8 8 0 R
/R10 10 0 R>>
endobj
8 0 obj
<</BaseFont /LRSXWR+TimesNewRoman
/FontDescriptor 9 0 R
/Type/Font
/FirstChar 1
/LastChar 41
/Widths[
333 722 250 611 722 611 722 667 722 722 667 556 556 389
722 667 722 722 500 333 444 389 500 278 278 500 333 500
444 500 278 250 889 250 500 500 444 500 278 778 500]
/Encoding 16 0 R
/Subtype/TrueType>>
endobj
16 0 obj
<</Type/Encoding
/BaseEncoding/WinAnsiEncoding
/Differences[
1/I/N/space/T/H/E/G/C/O/U/R/F/P/J/A/B
/D/Y/asterisk/r/e/s/n/t/colon/o/f/h/a/p/l/period
/M/comma/d/v/c/two/i/m/u]
>>
endobj
The above information is provided for reference. The content object, which I want to decode, is:
5 0 obj
<</Length 5950>>
stream
q 0.12 0 0 0.12 0 0 cm
/R7 gs
0 0 0 RG
0 0 0 rg
q
8.33333 0 0 8.33333 0 0 cm BT
/R8 14.0388 Tf
0.997231 0 0 1 90.1533 922.927 Tm
[
(SOH)-0.762768(STX)10.3078(ETX)10.019(EOT)10.888
(ENQ)-6.34593(ACK)10.888(ETX)-7.12126(ENQ)2.22552
(SOH)7.32006(BEL)-6.34489(ENQ)10.797(ETX)-7.1223
(BS)7.04592( )-6.34489(\n)10.797(VT)49.899
(EOT)28.0288(ETX)-7.12126( )2.22552(FF)-0.944827
(ETX)10.0196(\r)-0.945874(\n)-5.8573(STX)10.3083
(SQ)-13.6649(SI)10.798(DLE)-10.097(ETX)52.8727
(SI)11.2835(STX)-6.83247(DC1)2.22657(ETX)10.0175
(ENQ)-6.34489(SI)10.798(VT)49.8969(DC2)105.076
(SI)11.2856(STX)-6.83457(SI)53.6511(ETX)61.442
(SI)105.076(EOT)28.0288(ETX)-7.12335(BS)-1.52554
(ENQ)2.22657(SI)11.2835(STX)-6.83247(DC1)10.798
(SOH)-9.82286(BEL)2.22657(SI)
]TJ
412.949 0 Td
[(VT)-1.52763(ENQ)722.166]TJ
.......
.......
Decoding a PDF stream into text is not very simple, because you don't have anything like text there.
You have a series of glyphs with very variable meaning. In your case, you use the font /R8 (8 0 obj, referenced from the font dictionary 13 0 obj), which consists of 41 characters of /LRSXWR+TimesNewRoman with changes defined in 16 0 obj, which explains the meanings of the glyphs. You must have some translation table from glyph names to characters, e.g. from "space" to " " (I'm quite surprised that there is a glyph for space in your case). This may not be so simple in other cases. I have often seen an embedded font with glyphs sorted by usage, where there was nothing but visual evidence of what each glyph may represent.
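To make the translation table concrete, here is a small sketch that turns the /Differences array from 16 0 obj into a code-to-character map and applies it to the first thirteen codes of the posted stream (SOH = 1, STX = 2, ETX = 3, and so on). The glyph-name table is a hand-written simplification covering only the names in this file; a general tool would rely on the Adobe Glyph List instead.

```python
# /Differences from 16 0 obj: a number sets the running code, names follow it.
DIFFERENCES = [1, "I", "N", "space", "T", "H", "E", "G", "C", "O", "U", "R", "F",
               "P", "J", "A", "B", "D", "Y", "asterisk", "r", "e", "s", "n", "t",
               "colon", "o", "f", "h", "a", "p", "l", "period", "M", "comma", "d",
               "v", "c", "two", "i", "m", "u"]

# Simplification: single-letter glyph names stand for themselves.
GLYPH_TO_CHAR = {"space": " ", "asterisk": "*", "colon": ":",
                 "period": ".", "comma": ",", "two": "2"}

def build_code_map(differences: list) -> dict[int, str]:
    code_map: dict[int, str] = {}
    code = 0
    for item in differences:
        if isinstance(item, int):
            code = item                      # a number resets the running code
        else:
            code_map[code] = GLYPH_TO_CHAR.get(item, item)
            code += 1
    return code_map

def decode(codes: bytes, code_map: dict[int, str]) -> str:
    return "".join(code_map.get(c, "?") for c in codes)

code_map = build_code_map(DIFFERENCES)
# SOH STX ETX EOT ENQ ACK ETX ENQ SOH BEL ENQ ETX BS from the first TJ array
text = decode(bytes([1, 2, 3, 4, 5, 6, 3, 5, 1, 7, 5, 3, 8]), code_map)
```

The result is "IN THE HIGH C", so the encoding in this particular file is recoverable; the kerning numbers between the strings in the TJ array are ignored here.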
Are you sure you want to read the text from pdf files?
I am trying to create a table in a PDF by writing the PDF code by hand. I have successfully created a table and it works fine on Linux (Ubuntu), but when I try to open it on Windows I get an error message that "the file has been damaged". Here is my edited code:
%PDF-1.5
%âãÏÓ
1 0 obj
<<
/PageLayout /OneColumn
/MarkInfo
<<
/Marked true
>>
/Outlines 2 0 R
/Lang <feff0045004e002d00550053>
/Pages 3 0 R
/StructTreeRoot 4 0 R
/Type /Catalog
>>
endobj
2 0 obj
<<
/First 5 0 R
/Type /Outlines
/Count 1
/Last 5 0 R
>>
endobj
3 0 obj
<<
/Kids [6 0 R]
/Type /Pages
/Count 1
>>
endobj
4 0 obj
<<
/ParentTree 7 0 R
/RoleMap 8 0 R
/ParentTreeNextKey 1
/K 9 0 R
/Type /StructTreeRoot
>>
endobj
5 0 obj
<<
/Title (Example table)
/Parent 2 0 R
/A 10 0 R
>>
endobj
6 0 obj
<<
/CropBox [0.0 0.0 612.0 792.0]
/Rotate 0
/StructParents 0
/Parent 3 0 R
/Resources
<<
/ColorSpace
<<
/CS1 11 0 R
/CS0 12 0 R
>>
/Font
<<
/TT2 13 0 R
/TT1 14 0 R
/TT0 15 0 R
>>
>>
/MediaBox [0.0 0.0 612.0 792.0]
/Type /Page
/Contents [16 0 R 17 0 R]
>>
endobj
9 0 obj
<<
/P 4 0 R
/K [18 0 R 19 0 R 20 0 R 21 0 R]
/S /Sect
>>
endobj
7 0 obj
<<
/Nums [0 22 0 R]
>>
endobj
8 0 obj
<<
/Subscript /Span
/Diagram /Figure
/Strikeout /Span
/Outline /Span
/DropCap /Figure
/InlineShape /Figure
/Footnote /Note
/Annotation /Span
/Underline /Span
/Superscript /Span
/Chart /Figure
/Endnote /Note
/TextBox /Art
>>
endobj
10 0 obj
<<
/D [6 0 R /XYZ 72 720 0.0]
/S /GoTo
>>
endobj
16 0 obj
<<
/Length 1991
>>
stream
BT
/H1 <</MCID 0 >>BDC
/CS0 cs 0.212 0.373 0.569 scn
/TT0 1 Tf
0.002 Tw 14.04 0 0 14.04 72 682.8 Tm
[(E)-3(x)4(a)-3(m)1(p)-1(le)10( t)6(a)-3(b)1(le)]TJ
0 Tw 6.496 0 Td
( )Tj
EMC
/P <</MCID 1 >>BDC
/CS1 cs 0 scn
/TT1 1 Tf
0.001 Tc -0.001 Tw 15.96 0 0 15.96 72 664.44 Tm
[(T)-1(hi)-3(s)1( )1(i)-3(s)1( )1(a)-1(n e)3(x)-2(a)-1(m)3(pl)-3(e)3( )-7(o)2(f)-2( )1(a)-1( da)-1(t)-2(a)-1( t)-2(a)-1(bl)-3(e)3(.)]TJ
0 Tc 0 Tw 13.789 0 Td
( )Tj
EMC
ET
/TH <</MCID 3 >>BDC
/CS0 cs 0.553 0.702 0.886 scn
84.84 632.64 76.68 14.88 re
f*
84.84 591.36 5.16 41.28 re
f*
156.36 591.36 5.16 41.28 re
f*
84.84 576.48 76.68 14.88 re
f*
EMC
/P <</MCID 4 >>BDC
90 618 66.36 14.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
0.004 Tc -0.004 Tw 12 0 0 12 90 621.24 Tm
[(D)4(is)3(a)8(b)1(il)10(it)1(y)8( )]TJ
ET
/CS0 cs 0.553 0.702 0.886 scn
90 591.36 66.36 26.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
-0.004 Tc 0.004 Tw 12 0 0 12 90 606.6 Tm
[(C)-5(at)-7(e)-1(go)-6(r)-9(y)]TJ
0 Tc 0 Tw ( )Tj
ET
EMC
/TH <</MCID 7 >>BDC
/CS0 cs 0.553 0.702 0.886 scn
162 625.32 71.76 22.2 re
f*
162 598.68 5.16 26.64 re
f*
228.6 598.68 5.16 26.64 re
f*
162 576.48 71.76 22.2 re
f*
EMC
/P <</MCID 8 >>BDC
167.16 598.68 61.44 26.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
0.003 Tc -0.003 Tw 12 0 0 12 167.16 613.92 Tm
[(P)5(a)7(r)-2(ti)-1(c)1(i)9(pa)7(nts)]TJ
0 Tc 0 Tw 4.95 0 Td
( )Tj
ET
EMC
/TH <</MCID 11 >>BDC
/CS0 cs 0.553 0.702 0.886 scn
234.24 632.64 71.52 14.88 re
f*
234.24 591.36 5.16 41.28 re
f*
300.6 591.36 5.16 41.28 re
f*
234.24 576.48 71.52 14.88 re
f*
EMC
/P <</MCID 12 >>BDC
239.4 618 61.2 14.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
0.004 Tc -0.004 Tw 12 0 0 12 239.4 621.24 Tm
[(B)5(a)8(llo)2(t)1(s)13( )]TJ
ET
/CS0 cs 0.553 0.702 0.886 scn
239.4 591.36 61.2 26.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
-0.003 Tc 0.003 Tw 12 0 0 12 239.4 606.6 Tm
[(C)-4(o)-5(mp)-6(l)-7(et)-6(ed)]TJ
0 Tc 0 Tw 4.55 0 Td
( )Tj
ET
EMC
endstream
endobj
17 0 obj
<<
/Length 707
>>
stream
/P <</MCID 42 >>BDC
q
84.84 550.56 76.68 25.44 re
W n
BT
/TT1 1 Tf
-0.001 Tc 0.001 Tw 11.04 0 0 11.04 90 565.56 Tm
[(Blin)2(d)]TJ
ET
Q
q
84.84 550.56 76.68 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 112.56 565.56 Tm
( )Tj
ET
EMC
/P <</MCID 46 >>BDC
Q
q
162 550.56 71.76 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 195.12 565.56 Tm
(5)Tj
ET
Q
q
162 550.56 71.76 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 200.64 565.56 Tm
( )Tj
ET
EMC
/P <</MCID 50 >>BDC
Q
q
234.24 550.56 71.519 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 267.24 565.56 Tm
(1)Tj
ET
Q
q
234.24 550.56 71.519 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 272.76 565.56 Tm
( )Tj
ET
EMC
endstream
endobj
12 0 obj /DeviceRGB
endobj
11 0 obj /DeviceRGB
endobj
15 0 obj
<<
/BaseFont /Times-Roman
/Subtype /Type1
/Type /Font
/Encoding /WinAnsiEncoding
>>
endobj
14 0 obj
<<
/BaseFont /Helvetica
/Subtype /Type1
/Type /Font
/Encoding /WinAnsiEncoding
>>
endobj
13 0 obj
<<
/BaseFont /Courier
/Subtype /Type1
/Type /Font
/Encoding /WinAnsiEncoding
>>
endobj
18 0 obj
<<
/Pg 6 0 R
/P 9 0 R
/K 0
/S /H1
>>
endobj
19 0 obj
<<
/Pg 6 0 R
/P 9 0 R
/K 1
/S /P
>>
endobj
20 0 obj
<<
/P 9 0 R
/A 23 0 R
/K [24 0 R 25 0 R]
/S /Table
>>
endobj
21 0 obj
<<
/Pg 6 0 R
/P 9 0 R
/K 144
/S /P
>>
endobj
22 0 obj [18 0 R 19 0 R null 26 0 R 27 0 R null null 28 0 R 29 0 R null null 30 0 R 31 0 R null null null null null null null null null 24 0 R null null null null null null null null null null null null null null null null null null null 32 0 R null null null 33 0 R null null null 34 0 R null null null null null null null null null null null null null 25 0 R null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null 35 0 R null null null 36 0 R null null null 37 0 R null null null null null null null null null null null null null 38 0 R null null null null null null null null null null null null null null null null null null null null null null null null null 21 0 R]
endobj
23 0 obj
<<
/O /Layout
/Placement /Block
/BBox [84.11 446.51 545.89 648.25]
>>
endobj
24 0 obj
<<
/Pg 6 0 R
/P 20 0 R
/K [26 0 R 28 0 R 30 0 R]
/S /TR
>>
endobj
25 0 obj
<<
/Pg 6 0 R
/P 20 0 R
/K [39 0 R 40 0 R 41 0 R]
/S /TR
>>
endobj
38 0 obj
<<
/Pg 6 0 R
/P 20 0 R
/K [42 0 R 43 0 R 44 0 R]
/S /TR
>>
endobj
26 0 obj
<<
/Pg 6 0 R
/P 24 0 R
/K [27 0 R]
/S /TH
>>
endobj
27 0 obj
<<
/Pg 6 0 R
/P 26 0 R
/K 4
/S /P
>>
endobj
28 0 obj
<<
/Pg 6 0 R
/P 24 0 R
/K [29 0 R]
/S /TH
>>
endobj
29 0 obj
<<
/Pg 6 0 R
/P 28 0 R
/K 8
/S /P
>>
endobj
30 0 obj
<<
/Pg 6 0 R
/P 24 0 R
/K [11 31 0 R]
/S /TH
>>
endobj
31 0 obj
<<
/Pg 6 0 R
/P 30 0 R
/K 12
/S /P
>>
endobj
32 0 obj
<<
/Pg 6 0 R
/P 39 0 R
/K 42
/S /P
>>
endobj
33 0 obj
<<
/Pg 6 0 R
/P 40 0 R
/K 46
/S /P
>>
endobj
34 0 obj
<<
/Pg 6 0 R
/P 41 0 R
/K 50
/S /P
>>
endobj
35 0 obj
<<
/Pg 6 0 R
/P 42 0 R
/K 96
/S /P
>>
endobj
36 0 obj
<<
/Pg 6 0 R
/P 43 0 R
/K 100
/S /P
>>
endobj
37 0 obj
<<
/Pg 6 0 R
/P 44 0 R
/K 104
/S /P
>>
endobj
39 0 obj
<<
/P 25 0 R
/K 32 0 R
/S /TD
>>
endobj
40 0 obj
<<
/P 25 0 R
/K 33 0 R
/S /TD
>>
endobj
41 0 obj
<<
/P 25 0 R
/K 34 0 R
/S /TD
>>
endobj
42 0 obj
<<
/P 38 0 R
/K 35 0 R
/S /TD
>>
endobj
43 0 obj
<<
/P 38 0 R
/K 36 0 R
/S /TD
>>
endobj
44 0 obj
<<
/P 38 0 R
/K 37 0 R
/S /TD
>>
endobj xref
0 45
0000000000 65535 f
0000000015 00000 n
0000000190 00000 n
0000000263 00000 n
0000000322 00000 n
0000000430 00000 n
0000000500 00000 n
0000000849 00000 n
0000000889 00000 n
0000000775 00000 n
0000001130 00000 n
0000004027 00000 n
0000003999 00000 n
0000004203 00000 n
0000004130 00000 n
0000004055 00000 n
0000001190 00000 n
0000003237 00000 n
0000004274 00000 n
0000004329 00000 n
0000004383 00000 n
0000004460 00000 n
0000004516 00000 n
0000005296 00000 n
0000005384 00000 n
0000005461 00000 n
0000005615 00000 n
0000005678 00000 n
0000005733 00000 n
0000005796 00000 n
0000005851 00000 n
0000005917 00000 n
0000005973 00000 n
0000006029 00000 n
0000006085 00000 n
0000006141 00000 n
0000006197 00000 n
0000006254 00000 n
0000005538 00000 n
0000006311 00000 n
0000006362 00000 n
0000006413 00000 n
0000006464 00000 n
0000006515 00000 n
0000006566 00000 n
trailer
<<
/Root 1 0 R
/Size 45
>>
startxref
6616
%%EOF
Note: "opening in windows" is a non-statement. You cannot "open" a PDF in Windows, you need certain software to do so. Presumably, you tried using Acrobat Reader or something alike (the error message you quote is from Acrobat Reader).
It works in Mac OS X Preview, but then again that doesn't really tell us very much. Preview is written by Apple, and it's not a really conforming PDF reader (much to the dismay of anyone using, for example, transparency or color spaces). You did not provide an image of what your document is supposed to look like; is it anything like this?
But it does not open in Acrobat X.(a) Inspecting the PDF offsets -- the most likely place for an error -- I found the xref offsets are wrong from 11 0 obj onwards. This leads to a wrong offset of +89 bytes for all next objects, up to and including the ending startxref 6616, which IMO should be 6527.
I manually fixed the 34 wrong offsets by comparing the position of every X 0 obj with a hex editor, and saved with cr line endings. I got an error from my own inspecting tool:
The keyword stream that follows the stream dictionary should be followed by either a
carriage return and a line feed or by just a line feed, and not by a carriage return alone.
(PDF Reference 1.7, §3.2.7)
so I resaved with lf line endings. No errors, it shows correctly in Preview but still not in Acrobat X.
I noticed the /Length keys for objects 16 and 17, the Page Contents objects, were off as well. After correcting them to 1887 and 648, respectively, it still displays in Preview but still not in Acrobat X.
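A /Length check of this kind could look roughly as follows. This is only a sketch: the pattern assumes /Length is a direct number and the last key of the stream dictionary, as in objects 16 and 17 here, and the sample streams are made up.

```python
import re

# 'stream' + EOL, then the data, then the EOL that precedes 'endstream'.
STREAM = re.compile(rb"/Length\s+(\d+)\s*>>\s*stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

def length_mismatches(data: bytes) -> list[tuple[int, int]]:
    """Return (declared, actual) for every stream whose /Length is wrong."""
    result = []
    for match in STREAM.finditer(data):
        declared, actual = int(match.group(1)), len(match.group(2))
        if declared != actual:
            result.append((declared, actual))
    return result

sample = (
    b"16 0 obj\n<<\n/Length 1991\n>>\nstream\nBT\nET\nendstream\nendobj\n"
    b"17 0 obj\n<<\n/Length 5\n>>\nstream\nBT\nET\nendstream\nendobj\n"
)
mismatches = length_mismatches(sample)
```

The end-of-line marker before endstream is not counted, which matches how the specification defines /Length.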
The problem appears to lie in these contents. Requesting an Inventory shows the error message: "An error occurred while parsing a contents stream. Unable to analyze the PDF file.", and browsing the internal PDF structure I get to see a first handful of text formatting commands from 16 0 obj, but they stop at the 15th command:
/CS1 cs 0 scn
and the next command, /TT1 1 Tf, never gets seen.
Ooo-kay. Checking the parameters for scn, I see their number depends on the color space set using cs; and there is your problem.
Both 11 0 obj and 12 0 obj set color spaces, and they both set it to /DeviceRGB. So the number of parameters for /CS1 (defined in 11 0 obj) is wrong -- you only supply one. It's safe to assume you meant this one to be /DeviceGray, and lo and behold, after that final change I got to see this in Acrobat X:
and a proper Inventory and fully browsable PDF structure.
There were lots of minor problems with this file, but the PDF format in itself is quite resilient. The bad offsets, and possibly the lengths, may have been silently corrected (the PDF specification allows that) but the bad parameters for the color space were killing it.
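The color space mismatch can be caught mechanically. The sketch below counts the numeric operands before each scn and compares them with the component count of the color space selected by the preceding cs. It is deliberately simplified: only the three device color spaces are known, tokens are split on whitespace, and pattern color spaces are ignored; the function name is hypothetical.

```python
COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}

def is_number(token: str) -> bool:
    try:
        float(token)
        return True
    except ValueError:
        return False

def check_scn_operands(content: str, color_spaces: dict[str, str]) -> list[tuple[str, int, int]]:
    """Return (resource name, expected count, actual count) for each bad scn."""
    problems = []
    current = None
    operands: list[str] = []
    for token in content.split():
        if token.startswith("/") or is_number(token):
            operands.append(token)           # operand: name or number
            continue
        if token == "cs" and operands:       # operator: select a color space
            current = operands[-1][1:]
        elif token == "scn" and current in color_spaces:
            expected = COMPONENTS[color_spaces[current]]
            if len(operands) != expected:
                problems.append((current, expected, len(operands)))
        operands = []                        # any operator consumes its operands
    return problems

content = "/CS0 cs 0.212 0.373 0.569 scn\n/CS1 cs 0 scn"
as_posted = check_scn_operands(content, {"CS0": "DeviceRGB", "CS1": "DeviceRGB"})
as_fixed = check_scn_operands(content, {"CS0": "DeviceRGB", "CS1": "DeviceGray"})
```

With the resources as posted, /CS1 is reported as expecting 3 operands but receiving 1; with /CS1 changed to /DeviceGray nothing is reported.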
(a) Clarification after re-reading: it does open in Acrobat but silently shows a blank page only; no error message of any kind.
Addition
This made me think: was the /DeviceRGB the only cause of it failing in Acrobat X? No: after reloading the original PDF and changing just that one line, Acrobat says the file is damaged beyond repair. So all that extra checking I did wasn't for nothing, fortunately.
I want to write multi-line text. I've tried this:
6 0 obj
<</Length 59>>
stream
BT /F1 24 Tf 100 520 Td (This is test\n This is test)Tj ET
endstream
endobj
But I am not getting a new line. Is there a simple way to achieve that, or must I create another stream with the position of the next line?
This is the full code:
%PDF-1.5
1 0 obj <</Type /Catalog /Pages 2 0 R>>
endobj
2 0 obj <</Type /Pages /Kids [3 0 R] /Count 1>>
endobj
3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 700] /Contents 6 0 R>>
endobj
4 0 obj<</Font <</F1 5 0 R>>>>
endobj
5 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj
6 0 obj
<</Length 75>>
stream
BT
/F1 24 Tf
100 520 Td
(This is test) Tj
T*
(This is test) Tj
ET
endstream
endobj
xref
0 7
0000000000 65535 f
0000000009 00000 n
0000000059 00000 n
0000000116 00000 n
0000000219 00000 n
0000000259 00000 n
0000000328 00000 n
trailer <</Size 7/Root 1 0 R>>
startxref
454
%%EOF
You may want to do something like this:
BT
/F1 24 Tf
30 TL
100 520 Td
(This is test) Tj
T*
(This is test) Tj
ET
or the shorter form:
BT
/F1 24 Tf
30 TL
100 520 Td
(This is test) Tj
(This is test) '
ET
You might want to read up on section 9.4.3 Text-Showing Operators in the PDF specification ISO 32000-1.
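If the content stream is generated by a program, the same operators can be emitted from a list of lines. The following is a minimal sketch; the helper name text_block and its default values are taken from the example above for illustration, and it assumes ASCII-only text.

```python
def text_block(lines: list[str], font: str = "F1", size: int = 24,
               leading: int = 30, x: int = 100, y: int = 520) -> bytes:
    """Build a text object that shows each entry of `lines` on its own line."""
    def escape(text: str) -> str:
        # Backslash and parentheses must be escaped inside a PDF literal string.
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    ops = ["BT", f"/{font} {size} Tf", f"{leading} TL", f"{x} {y} Td"]
    for index, line in enumerate(lines):
        if index:
            ops.append("T*")                 # move down by the leading set with TL
        ops.append(f"({escape(line)}) Tj")
    ops.append("ET")
    return "\n".join(ops).encode("ascii")

stream = text_block(["This is test", "This is test"])
length = len(stream)                         # the value to write as /Length
```

Computing /Length from the generated bytes avoids the kind of mismatch that hand-edited streams tend to have.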
P.S.: Added text leading TL operators.
I took the minimal PDF example from the PDF specification, copied it into Notepad, and renamed the file to have the extension .pdf.
I can open it with other PDF viewers (PDF-XChange, SumatraPDF, MuPDF). But when I open it with Adobe Reader, it says the file is broken.
I am not sure if other viewers treat this "broken" file as blank file or not.
The file is supposed to display one blank page, since it is a minimal example.
In fact, I modified the minimal example, because when I copy it from the PDF specification into Notepad and open the .txt file in a hex editor, I see that each new line in the .txt file takes up 2 characters. For example,
1 0 obj
<< /Type /Catalog
gives me (in Hex Editor)
1 0 obj << /Type /Catalog
which is (in hex values)
31 20 30 20 6F 62 6A 0D 0A 3C 3C 20 2F 54 79 70
65 20 2F 43 61 74 61 6C 6F 67
The 2 characters between j and < are 0D 0A.
Hence I removed the line breaks in Notepad and modified the values in the xref part.
Below is the full code.
Do you know what's wrong with this example? Why does Adobe Reader say it is broken? Is this because I gave the wrong values in xref?
%PDF-1.4 1 0 obj << /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >> endobj 2 0 obj << /Type Outlines /Count 0 >> endobj 3 0 obj << /Type /Pages /Kids [4 0 R] /Count 1 >> endobj 4 0 obj << /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5 0 R /Resources << /ProcSet 6 0 R >> >> endobj 5 0 obj << /Length 35 >> stream … Page-marking operators … endstream endobj 6 0 obj [/PDF] endobj xref 0 7 0000000000 65535 f 0000000009 00000 n 0000000074 00000 n 0000000119 00000 n 0000000176 00000 n 0000000295 00000 n 0000000373 00000 n trailer << /Size 7 /Root 1 0 R >> startxref 395 %%EOF
First: when you 'copied' the example from the PDF specification, very likely a few things happened which made your copy not work as expected:
...you didn't 'copy' by re-typing the example in a text editor, but
...you used copy'n'paste, using a PDF as the source file.
Depending on your text editor, that method probably changed the newline convention from [cr]+[lf] to [cr] or vice versa. This in turn means that the byte offset numbers in the object 'table of contents' (the 'xref' table) are no longer valid.
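A tiny demonstration of why a changed newline convention invalidates the xref table: every converted line break before an object moves that object by one byte. The sample bytes are made up for the example.

```python
# Same file content, once with LF and once with CR+LF line endings.
original = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n2 0 obj\n<< >>\nendobj\n"
converted = original.replace(b"\n", b"\r\n")

offset_lf = original.index(b"2 0 obj")      # offset an xref entry would record
offset_crlf = converted.index(b"2 0 obj")   # where the object really is afterwards
shift = offset_crlf - offset_lf             # one extra byte per preceding line break
```

Here object 2 moves from offset 45 to 49, because four line breaks precede it.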
Another problem with the PDF source code you posted is that it now doesn't contain any line breaks at all. Some viewers may still be able to parse the thing silently, but not all are. And it certainly is against the spec, because chapter 7.5.2 clearly spells out that
"The first line of a PDF file shall be a header consisting of the 5 characters %PDF- followed by a version number of the form 1.N, where N is a digit between 0 and 7."
Your header violates that rule, because the rest of the file follows %PDF-1.4 on the same line.
Also, the 'stream' in 5 0 obj isn't valid PDF code; it is just placeholder text (… Page-marking operators …). Some viewers may choke when they come across such 'garbage'.
Lastly, your startxref value wasn't correct.
So here is a file that works. I repaired it in a text editor, and I put your original code as a comment after the %%EOF for comparison and reference:
%PDF-1.4
1 0 obj
<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>
endobj
2 0 obj
<< /Type Outlines /Count 0 >>
endobj
3 0 obj
<< /Type /Pages /Kids [4 0 R] /Count 1 >>
endobj
4 0 obj
<< /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5 0 R /Resources << /ProcSet 6 0 R >> >>
endobj
5 0 obj
<< /Length 35 >>
stream
… Page-marking operators …
endstream
endobj
6 0 obj
[/PDF]
endobj
xref
0 7
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000119 00000 n
0000000176 00000 n
0000000295 00000 n
0000000376 00000 n
trailer
<< /Size 7 /Root 1 0 R >>
startxref
394
%%EOF
%% %PDF-1.4 1 0 obj << /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >> endobj 2 0 obj << /Type Outlines /Count 0 >> endobj 3 0 obj << /Type /Pages /Kids [4 0 R] /Count 1 >> endobj 4 0 obj << /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5 0 R /Resources << /ProcSet 6 0 R >> >> endobj 5 0 obj << /Length 35 >> stream … Page-marking operators … endstream endobj 6 0 obj [/PDF] endobj xref 0 7 0000000000 65535 f 0000000009 00000 n 0000000074 00000 n 0000000119 00000 n 0000000176 00000 n 0000000295 00000 n 0000000373 00000 n trailer << /Size 7 /Root 1 0 R >> startxref 395