PDF: How to change/decode stream into text?

The document contains only text, no images. The relevant portions of the PDF are as follows:
trailer
<</Root 1 0 R>>
1 0 obj
<</Type/Catalog/Pages 3 0 R>>
endobj
3 0 obj
<</Type/Pages/Kids[4 0 R]/Count 1/Rotate 0/ITXT(5.0.6)>>
endobj
4 0 obj
<</Type/Page
/MediaBox[0 0 612 1008]
/Rotate 0
/Parent 3 0 R
/Resources<<
/ProcSet[/PDF/Text]
/ExtGState 12 0 R
/Font 13 0 R>>
/Contents 5 0 R
/Annots[24 0 R]>>
endobj
12 0 obj
<</R7 7 0 R>>
endobj
7 0 obj
<</Type/ExtGState /OPM 1>>
endobj
13 0 obj
<</R8 8 0 R
/R10 10 0 R>>
endobj
8 0 obj
<</BaseFont /LRSXWR+TimesNewRoman
/FontDescriptor 9 0 R
/Type/Font
/FirstChar 1
/LastChar 41
/Widths[
333 722 250 611 722 611 722 667 722 722 667 556 556 389
722 667 722 722 500 333 444 389 500 278 278 500 333 500
444 500 278 250 889 250 500 500 444 500 278 778 500]
/Encoding 16 0 R
/Subtype/TrueType>>
endobj
16 0 obj
<</Type/Encoding
/BaseEncoding/WinAnsiEncoding
/Differences[
1/I/N/space/T/H/E/G/C/O/U/R/F/P/J/A/B
/D/Y/asterisk/r/e/s/n/t/colon/o/f/h/a/p/l/period
/M/comma/d/v/c/two/i/m/u]
>>
endobj
The above information is provided for context. The content object which I want to decode is:
5 0 obj
<</Length 5950>>
stream
q 0.12 0 0 0.12 0 0 cm
/R7 gs
0 0 0 RG
0 0 0 rg
q
8.33333 0 0 8.33333 0 0 cm BT
/R8 14.0388 Tf
0.997231 0 0 1 90.1533 922.927 Tm
[
(SOH)-0.762768(STX)10.3078(ETX)10.019(EOT)10.888
(ENQ)-6.34593(ACK)10.888(ETX)-7.12126(ENQ)2.22552
(SOH)7.32006(BEL)-6.34489(ENQ)10.797(ETX)-7.1223
(BS)7.04592( )-6.34489(\n)10.797(VT)49.899
(EOT)28.0288(ETX)-7.12126( )2.22552(FF)-0.944827
(ETX)10.0196(\r)-0.945874(\n)-5.8573(STX)10.3083
(SQ)-13.6649(SI)10.798(DLE)-10.097(ETX)52.8727
(SI)11.2835(STX)-6.83247(DC1)2.22657(ETX)10.0175
(ENQ)-6.34489(SI)10.798(VT)49.8969(DC2)105.076
(SI)11.2856(STX)-6.83457(SI)53.6511(ETX)61.442
(SI)105.076(EOT)28.0288(ETX)-7.12335(BS)-1.52554
(ENQ)2.22657(SI)11.2835(STX)-6.83247(DC1)10.798
(SOH)-9.82286(BEL)2.22657(SI)
]TJ
412.949 0 Td
[(VT)-1.52763(ENQ)722.166]TJ
.......
.......

Decoding a PDF content stream into text is not simple, because the stream does not contain text as such.
It contains a series of glyph codes whose meaning varies from font to font. In your case the stream selects font /R8 (object 8 0, reached through the font resource dictionary 13 0), a 41-character subset of /LRSXWR+TimesNewRoman. Its encoding, object 16 0, lists in /Differences which glyph name each code stands for. You then need a translation table from glyph names to characters, for example from "space" to " " (I'm quite surprised that there is a glyph for space in your case). It is not always this simple: I have often seen embedded fonts whose glyphs were sorted by usage, with nothing but visual evidence of what each glyph represents.
Are you sure you want to read the text from pdf files?
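For this particular font, the translation table can be built directly from the /Differences array in object 16. The following is a minimal Python sketch, not a general text extractor. It assumes a simple one-byte font; `GLYPH_TO_CHAR` is a hand-written subset covering only the glyph names used here (a real tool would use the full Adobe Glyph List); and the sample bytes assume that the string operands shown above as SOH, STX, ETX, ... are the raw codes 1, 2, 3, ..., with the blank-looking operand being code 9 (tab).

```python
# Code -> glyph name assignments copied from the /Differences array of object 16.
DIFFERENCES = [
    1, "I", "N", "space", "T", "H", "E", "G", "C", "O", "U", "R", "F", "P", "J", "A", "B",
    "D", "Y", "asterisk", "r", "e", "s", "n", "t", "colon", "o", "f", "h", "a", "p", "l", "period",
    "M", "comma", "d", "v", "c", "two", "i", "m", "u",
]

# Assumption: tiny hand-made glyph-name table; single-letter names map to themselves.
GLYPH_TO_CHAR = {"space": " ", "asterisk": "*", "colon": ":", "period": ".", "comma": ",", "two": "2"}


def build_code_map(differences):
    """Walk the array: an integer sets the current code, each name takes the next code."""
    code_map = {}
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            code_map[code] = GLYPH_TO_CHAR.get(item, item)
            code += 1
    return code_map


def decode(raw, code_map):
    """Translate the bytes of a string operand; unknown codes become '?'."""
    return "".join(code_map.get(byte, "?") for byte in raw)


CODE_MAP = build_code_map(DIFFERENCES)

# Assumed raw codes of the first operands of the TJ array (SOH=1, STX=2, ETX=3, ...).
first_words = bytes([1, 2, 3, 4, 5, 6, 3, 5, 1, 7, 5, 3, 8, 9, 10, 11, 4])
print(decode(first_words, CODE_MAP))  # prints: IN THE HIGH COURT
```

The numbers between the strings in the TJ array are only kerning adjustments and can be ignored for plain extraction, although large negative values often stand in for word gaps.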

Related

PDF sign and resign is not recognized when the signatures are visible

The idea is to be able to sign a PDF file multiple times with my own PDF parser.
Reference files: here.
When the signature isn't going to be visible, everything works. I sign 1.pdf once (2.pdf) and then twice (3.pdf), and Adobe Acrobat recognizes the signature.
The problem arises when the signature should be visible. The first signing works correctly (2.pdf). However, the second (3.pdf) fails: Acrobat says the first signature is invalidated and the second is not recognized.
As far as I can tell, the only difference between visible and invisible signing is the added text object. Why does Adobe invalidate the first signature, and why isn't the second recognized?
28 0 obj
<</BaseFont/Helvetica/Type/Font/Subtype/Type1/Encoding/WinAnsiEncoding/Name/Helv>>
endobj
29 0 obj
<</BaseFont/ZapfDingbats/Type/Font/Subtype/Type1/Name/ZaDb>>
endobj
31 0 obj<</Font 32 0 R>>
endobj
32 0 obj<</FAdESFont2 33 0 R>>
endobj
33 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj
34 0 obj
<</Length 90>>stream
BT
6 760 TD
/FAdESFont2 6 Tf
(m#turboirc.com MICHAIL CHOURDAKIS 1/23/2023 17:24:10) Tj
ET
endstream
endobj
26 0 obj
<</Type/XObject/Resources<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]>>/Subtype/Form/BBox[0 0 0 0]/Matrix [1 0 0 1 0 0]/Length 8/FormType 1/Filter/FlateDecode>>stream
xœ
endstream
endobj
3 0 obj
<</Contents[34 0 R 24 0 R 12 0 R]/CropBox[0.0 0.0 612.0 792.0]/MediaBox[0.0 0.0 612.0 792.0]/Parent 2 0 R/Resources 13 0 R/Rotate 0/Type/Page/Annots[17 0 R 27 0 R]>>
endobj
2 0 obj
<</Count 1/Kids[3 0 R]/Type/Pages>>
endobj
1 0 obj
<</AcroForm<</Fields[17 0 R 27 0 R]/DR<</Font<</Helv 28 0 R/ZaDb 29 0 R>>>>/DA(/Helv 0 Tf 0 g )/SigFlags 3>>/AcroForm<</Fields[17 0 R]/DR<</Font<</Helv 18 0 R/ZaDb 19 0 R>>>>/DA(/Helv 0 Tf 0 g )/SigFlags 3>>/Pages 2 0 R/Type/Catalog>>
endobj
14 0 obj
<</Producer(AdES Tools https://www.turboirc.com)/ModDate(D:20230123152410+00'00')>>
endobj
xref
Why adobe invalidates the first signature and why the second isn't recognized?
Because you add the visualizations of the signatures in an inappropriate way.
You add visualizations of the signatures by adding to the static page content (the page content streams). This is the wrong approach if you want to be able to add signatures to already signed PDFs, because manipulation of the static page content after signing is a forbidden change, see this answer.
The appropriate way to add visualizations of PDF signatures is by adding an appearance stream to the respective signature field widget.
For details you may want to study the PDF specification ISO 32000.

Visible signature in PDF file (2)

Continuing from this question, the PDF is now constructed as such:
8 0 obj
<</F 132/Type/Annot/Subtype/Widget/Rect[2 198 100 190]/FT/Sig/DR<<>>/T(Signature1)/V 6 0 R/P 3 0 R/AP<</N 7 0 R>>>>
endobj
6 0 obj
<</Contents <...>/Type/Sig/SubFilter/ETSI.CAdES.detached/M(D:20230128131946+00'00')/ByteRange [0 830 60832 1714]/Filter/Adobe.PPKLite>>
endobj
9 0 obj
<</BaseFont/Helvetica/Type/Font/Subtype/Type1/Encoding/WinAnsiEncoding/Name/Helv>>
endobj
10 0 obj
<</BaseFont/ZapfDingbats/Type/Font/Subtype/Type1/Name/ZaDb>>
endobj
12 0 obj<</Font 13 0 R>>
endobj
13 0 obj<</FAdESFont1 14 0 R>>
endobj
14 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj
15 0 obj
<</Length 90>>stream
BT
2 194 TD
/FAdESFont1 5 Tf
(m#turboirc.com MICHAIL CHOURDAKIS 1/28/2023 15:19:46) Tj
ET
endstream
endobj
7 0 obj
<</Type/XObject/Resources<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]>>/Subtype/Form/BBox[2 198 100 190]/Length 90/FormType 1/Filter/FlateDecode>>stream
BT
2 194 TD
/FAdESFont1 5 Tf
(m#turboirc.com MICHAIL CHOURDAKIS 1/28/2023 15:19:46) Tj
ET
endstream
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 4 0 R>>>>/Contents 5 0 R/Annots[8 0 R]>>
endobj
2 0 obj
<</Type/Pages/MediaBox[0 0 200 200]/Count 1/Kids[3 0 R]>>
endobj
1 0 obj
<</AcroForm<</Fields[8 0 R]/DR<</Font<</Helv 9 0 R/ZaDb 10 0 R>>>>/DA(/Helv 0 Tf 0 g )/SigFlags 3>>/Type/Catalog/Pages 2 0 R>>
endobj
11 0 obj
<</Producer(AdES Tools https://www.turboirc.com)/ModDate(D:20230128131946+00'00')>>
endobj
xref
0 4
0000000000 65535 f
0000061862 00000 n
0000061787 00000 n
0000061681 00000 n
6 10
0000000810 00000 n
0000061409 00000 n
0000000679 00000 n
0000060958 00000 n
0000061056 00000 n
0000062004 00000 n
0000061133 00000 n
0000061165 00000 n
0000061203 00000 n
0000061271 00000 n
trailer
<</Root 1 0 R/Prev 492/Info 11 0 R/Size 20/ID[<6BD3BF95416A5C19FFBC464EC610875C><54ACC00AA74869363131BCC04E65417F>]>>
startxref
62104
%%EOF
The idea is:
Create the annotation object (ID 8), which refers to the signature through /V (object 6) and to something to show through /AP /N (object 7).
The annotation object is a stream containing the text?
7 0 obj <</Type/XObject/Resources<</ProcSet [/PDF /Text /ImageB /ImageC /ImageI]>>/Subtype/Form/BBox[2 198 100 190]/Length 90/FormType 1/Filter/FlateDecode>>stream
BT
2 194 TD
/FAdESFont1 5 Tf
(m#turboirc.com MICHAIL CHOURDAKIS 1/28/2023 15:19:46) Tj
ET
endstream
endobj
This time Adobe accepts the signature and shows a "box" which I can click to see the signature information, but the text (mail, name, date) is not displayed.
What am I missing?
In the previous mode I was changing the content of the original root, but I learned from this question that this is an incorrect way of adding a visible signature and will not work for re-signing.
Your appearance stream in object 7 has some errors, in particular:
Its resources dictionary does not contain a fonts section; so how should the text in it be rendered?
It claims to be flate-encoded but obviously is not.
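Both points can be taken care of when the appearance stream is generated. Here is a minimal Python sketch under stated assumptions: `build_appearance_xobject` is a hypothetical helper (not part of any library), and the font resource name and object number (FAdESFont1, 14 0 R) and the BBox are simply taken from the question. It really deflates the content, writes the matching /Length, and puts a /Font entry into the form's own /Resources.

```python
import zlib


def build_appearance_xobject(content, font_name, font_ref, bbox):
    """Return the dictionary plus stream of a form XObject (hypothetical helper).

    content   -- uncompressed content stream bytes
    font_name -- resource name used by the Tf operator, without the slash
    font_ref  -- indirect reference to the font object, e.g. "14 0 R"
    bbox      -- the four BBox numbers as a string
    """
    compressed = zlib.compress(content)  # the data now really is Flate-encoded
    header = (
        f"<</Type/XObject/Subtype/Form/FormType 1/BBox[{bbox}]"
        f"/Resources<</Font<</{font_name} {font_ref}>>>>"
        f"/Filter/FlateDecode/Length {len(compressed)}>>"
    ).encode("ascii")
    return header + b"stream\n" + compressed + b"\nendstream"


# Placeholder text; the values below mirror the question's object 7.
content_example = b"BT\n2 194 TD\n/FAdESFont1 5 Tf\n(Signed) Tj\nET"
xobject = build_appearance_xobject(content_example, "FAdESFont1", "14 0 R", "2 198 100 190")
```

Alternatively, the /Filter entry can simply be dropped and the stream written uncompressed, in which case /Length is the length of the plain content.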

(Manually Created) PDF is working fine on Ubuntu but its not working in windows?

I am trying to create a table in a PDF by writing the PDF code by hand. I have successfully created the table and it works fine on Linux (Ubuntu), but when I try to open it on Windows it shows an error message that "the file has been damaged". Here is my edited code:
%PDF-1.5
%âãÏÓ
1 0 obj
<<
/PageLayout /OneColumn
/MarkInfo
<<
/Marked true
>>
/Outlines 2 0 R
/Lang <feff0045004e002d00550053>
/Pages 3 0 R
/StructTreeRoot 4 0 R
/Type /Catalog
>>
endobj
2 0 obj
<<
/First 5 0 R
/Type /Outlines
/Count 1
/Last 5 0 R
>>
endobj
3 0 obj
<<
/Kids [6 0 R]
/Type /Pages
/Count 1
>>
endobj
4 0 obj
<<
/ParentTree 7 0 R
/RoleMap 8 0 R
/ParentTreeNextKey 1
/K 9 0 R
/Type /StructTreeRoot
>>
endobj
5 0 obj
<<
/Title (Example table)
/Parent 2 0 R
/A 10 0 R
>>
endobj
6 0 obj
<<
/CropBox [0.0 0.0 612.0 792.0]
/Rotate 0
/StructParents 0
/Parent 3 0 R
/Resources
<<
/ColorSpace
<<
/CS1 11 0 R
/CS0 12 0 R
>>
/Font
<<
/TT2 13 0 R
/TT1 14 0 R
/TT0 15 0 R
>>
>>
/MediaBox [0.0 0.0 612.0 792.0]
/Type /Page
/Contents [16 0 R 17 0 R]
>>
endobj
9 0 obj
<<
/P 4 0 R
/K [18 0 R 19 0 R 20 0 R 21 0 R]
/S /Sect
>>
endobj
7 0 obj
<<
/Nums [0 22 0 R]
>>
endobj
8 0 obj
<<
/Subscript /Span
/Diagram /Figure
/Strikeout /Span
/Outline /Span
/DropCap /Figure
/InlineShape /Figure
/Footnote /Note
/Annotation /Span
/Underline /Span
/Superscript /Span
/Chart /Figure
/Endnote /Note
/TextBox /Art
>>
endobj
10 0 obj
<<
/D [6 0 R /XYZ 72 720 0.0]
/S /GoTo
>>
endobj
16 0 obj
<<
/Length 1991
>>
stream
BT
/H1 <</MCID 0 >>BDC
/CS0 cs 0.212 0.373 0.569 scn
/TT0 1 Tf
0.002 Tw 14.04 0 0 14.04 72 682.8 Tm
[(E)-3(x)4(a)-3(m)1(p)-1(le)10( t)6(a)-3(b)1(le)]TJ
0 Tw 6.496 0 Td
( )Tj
EMC
/P <</MCID 1 >>BDC
/CS1 cs 0 scn
/TT1 1 Tf
0.001 Tc -0.001 Tw 15.96 0 0 15.96 72 664.44 Tm
[(T)-1(hi)-3(s)1( )1(i)-3(s)1( )1(a)-1(n e)3(x)-2(a)-1(m)3(pl)-3(e)3( )-7(o)2(f)-2( )1(a)-1( da)-1(t)-2(a)-1( t)-2(a)-1(bl)-3(e)3(.)]TJ
0 Tc 0 Tw 13.789 0 Td
( )Tj
EMC
ET
/TH <</MCID 3 >>BDC
/CS0 cs 0.553 0.702 0.886 scn
84.84 632.64 76.68 14.88 re
f*
84.84 591.36 5.16 41.28 re
f*
156.36 591.36 5.16 41.28 re
f*
84.84 576.48 76.68 14.88 re
f*
EMC
/P <</MCID 4 >>BDC
90 618 66.36 14.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
0.004 Tc -0.004 Tw 12 0 0 12 90 621.24 Tm
[(D)4(is)3(a)8(b)1(il)10(it)1(y)8( )]TJ
ET
/CS0 cs 0.553 0.702 0.886 scn
90 591.36 66.36 26.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
-0.004 Tc 0.004 Tw 12 0 0 12 90 606.6 Tm
[(C)-5(at)-7(e)-1(go)-6(r)-9(y)]TJ
0 Tc 0 Tw ( )Tj
ET
EMC
/TH <</MCID 7 >>BDC
/CS0 cs 0.553 0.702 0.886 scn
162 625.32 71.76 22.2 re
f*
162 598.68 5.16 26.64 re
f*
228.6 598.68 5.16 26.64 re
f*
162 576.48 71.76 22.2 re
f*
EMC
/P <</MCID 8 >>BDC
167.16 598.68 61.44 26.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
0.003 Tc -0.003 Tw 12 0 0 12 167.16 613.92 Tm
[(P)5(a)7(r)-2(ti)-1(c)1(i)9(pa)7(nts)]TJ
0 Tc 0 Tw 4.95 0 Td
( )Tj
ET
EMC
/TH <</MCID 11 >>BDC
/CS0 cs 0.553 0.702 0.886 scn
234.24 632.64 71.52 14.88 re
f*
234.24 591.36 5.16 41.28 re
f*
300.6 591.36 5.16 41.28 re
f*
234.24 576.48 71.52 14.88 re
f*
EMC
/P <</MCID 12 >>BDC
239.4 618 61.2 14.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
0.004 Tc -0.004 Tw 12 0 0 12 239.4 621.24 Tm
[(B)5(a)8(llo)2(t)1(s)13( )]TJ
ET
/CS0 cs 0.553 0.702 0.886 scn
239.4 591.36 61.2 26.64 re
f*
BT
/CS1 cs 0 scn
/TT2 1 Tf
-0.003 Tc 0.003 Tw 12 0 0 12 239.4 606.6 Tm
[(C)-4(o)-5(mp)-6(l)-7(et)-6(ed)]TJ
0 Tc 0 Tw 4.55 0 Td
( )Tj
ET
EMC
endstream
endobj
17 0 obj
<<
/Length 707
>>
stream
/P <</MCID 42 >>BDC
q
84.84 550.56 76.68 25.44 re
W n
BT
/TT1 1 Tf
-0.001 Tc 0.001 Tw 11.04 0 0 11.04 90 565.56 Tm
[(Blin)2(d)]TJ
ET
Q
q
84.84 550.56 76.68 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 112.56 565.56 Tm
( )Tj
ET
EMC
/P <</MCID 46 >>BDC
Q
q
162 550.56 71.76 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 195.12 565.56 Tm
(5)Tj
ET
Q
q
162 550.56 71.76 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 200.64 565.56 Tm
( )Tj
ET
EMC
/P <</MCID 50 >>BDC
Q
q
234.24 550.56 71.519 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 267.24 565.56 Tm
(1)Tj
ET
Q
q
234.24 550.56 71.519 25.44 re
W n
BT
/TT1 1 Tf
11.04 0 0 11.04 272.76 565.56 Tm
( )Tj
ET
EMC
endstream
endobj
12 0 obj /DeviceRGB
endobj
11 0 obj /DeviceRGB
endobj
15 0 obj
<<
/BaseFont /Times-Roman
/Subtype /Type1
/Type /Font
/Encoding /WinAnsiEncoding
>>
endobj
14 0 obj
<<
/BaseFont /Helvetica
/Subtype /Type1
/Type /Font
/Encoding /WinAnsiEncoding
>>
endobj
13 0 obj
<<
/BaseFont /Courier
/Subtype /Type1
/Type /Font
/Encoding /WinAnsiEncoding
>>
endobj
18 0 obj
<<
/Pg 6 0 R
/P 9 0 R
/K 0
/S /H1
>>
endobj
19 0 obj
<<
/Pg 6 0 R
/P 9 0 R
/K 1
/S /P
>>
endobj
20 0 obj
<<
/P 9 0 R
/A 23 0 R
/K [24 0 R 25 0 R]
/S /Table
>>
endobj
21 0 obj
<<
/Pg 6 0 R
/P 9 0 R
/K 144
/S /P
>>
endobj
22 0 obj [18 0 R 19 0 R null 26 0 R 27 0 R null null 28 0 R 29 0 R null null 30 0 R 31 0 R null null null null null null null null null 24 0 R null null null null null null null null null null null null null null null null null null null 32 0 R null null null 33 0 R null null null 34 0 R null null null null null null null null null null null null null 25 0 R null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null null 35 0 R null null null 36 0 R null null null 37 0 R null null null null null null null null null null null null null 38 0 R null null null null null null null null null null null null null null null null null null null null null null null null null 21 0 R]
endobj
23 0 obj
<<
/O /Layout
/Placement /Block
/BBox [84.11 446.51 545.89 648.25]
>>
endobj
24 0 obj
<<
/Pg 6 0 R
/P 20 0 R
/K [26 0 R 28 0 R 30 0 R]
/S /TR
>>
endobj
25 0 obj
<<
/Pg 6 0 R
/P 20 0 R
/K [39 0 R 40 0 R 41 0 R]
/S /TR
>>
endobj
38 0 obj
<<
/Pg 6 0 R
/P 20 0 R
/K [42 0 R 43 0 R 44 0 R]
/S /TR
>>
endobj
26 0 obj
<<
/Pg 6 0 R
/P 24 0 R
/K [27 0 R]
/S /TH
>>
endobj
27 0 obj
<<
/Pg 6 0 R
/P 26 0 R
/K 4
/S /P
>>
endobj
28 0 obj
<<
/Pg 6 0 R
/P 24 0 R
/K [29 0 R]
/S /TH
>>
endobj
29 0 obj
<<
/Pg 6 0 R
/P 28 0 R
/K 8
/S /P
>>
endobj
30 0 obj
<<
/Pg 6 0 R
/P 24 0 R
/K [11 31 0 R]
/S /TH
>>
endobj
31 0 obj
<<
/Pg 6 0 R
/P 30 0 R
/K 12
/S /P
>>
endobj
32 0 obj
<<
/Pg 6 0 R
/P 39 0 R
/K 42
/S /P
>>
endobj
33 0 obj
<<
/Pg 6 0 R
/P 40 0 R
/K 46
/S /P
>>
endobj
34 0 obj
<<
/Pg 6 0 R
/P 41 0 R
/K 50
/S /P
>>
endobj
35 0 obj
<<
/Pg 6 0 R
/P 42 0 R
/K 96
/S /P
>>
endobj
36 0 obj
<<
/Pg 6 0 R
/P 43 0 R
/K 100
/S /P
>>
endobj
37 0 obj
<<
/Pg 6 0 R
/P 44 0 R
/K 104
/S /P
>>
endobj
39 0 obj
<<
/P 25 0 R
/K 32 0 R
/S /TD
>>
endobj
40 0 obj
<<
/P 25 0 R
/K 33 0 R
/S /TD
>>
endobj
41 0 obj
<<
/P 25 0 R
/K 34 0 R
/S /TD
>>
endobj
42 0 obj
<<
/P 38 0 R
/K 35 0 R
/S /TD
>>
endobj
43 0 obj
<<
/P 38 0 R
/K 36 0 R
/S /TD
>>
endobj
44 0 obj
<<
/P 38 0 R
/K 37 0 R
/S /TD
>>
endobj xref
0 45
0000000000 65535 f
0000000015 00000 n
0000000190 00000 n
0000000263 00000 n
0000000322 00000 n
0000000430 00000 n
0000000500 00000 n
0000000849 00000 n
0000000889 00000 n
0000000775 00000 n
0000001130 00000 n
0000004027 00000 n
0000003999 00000 n
0000004203 00000 n
0000004130 00000 n
0000004055 00000 n
0000001190 00000 n
0000003237 00000 n
0000004274 00000 n
0000004329 00000 n
0000004383 00000 n
0000004460 00000 n
0000004516 00000 n
0000005296 00000 n
0000005384 00000 n
0000005461 00000 n
0000005615 00000 n
0000005678 00000 n
0000005733 00000 n
0000005796 00000 n
0000005851 00000 n
0000005917 00000 n
0000005973 00000 n
0000006029 00000 n
0000006085 00000 n
0000006141 00000 n
0000006197 00000 n
0000006254 00000 n
0000005538 00000 n
0000006311 00000 n
0000006362 00000 n
0000006413 00000 n
0000006464 00000 n
0000006515 00000 n
0000006566 00000 n
trailer
<<
/Root 1 0 R
/Size 45
>>
startxref
6616
%%EOF
Note: "opening in windows" is a non-statement. You cannot "open" a PDF in Windows, you need certain software to do so. Presumably, you tried using Acrobat Reader or something alike (the error message you quote is from Acrobat Reader).
It works in Mac OS X Preview, but then again that doesn't really tell us very much. Preview is written by Apple, and it's not a really conforming PDF reader (much to the dismay of anyone using, for example, transparency or color spaces). You did not provide an image of what your document is supposed to look like; is it anything like this?
But it does not open in Acrobat X (a). Inspecting the PDF offsets -- the most likely place for an error -- I found the xref offsets are wrong from 11 0 obj onwards. Every following offset is 89 bytes too high, up to and including the ending startxref 6616, which IMO should be 6527.
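The comparison of xref entries against real object positions can be automated instead of done by eye. A rough Python sketch, assuming that every object header ("N G obj") starts at the beginning of a line ended by a line feed, and that the declared offsets have already been read from the xref table into a dictionary (`check_xref` and its argument names are made up for this example):

```python
import re

# An object header such as "11 0 obj" at the start of a line.
OBJ_HEADER = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)


def find_object_offsets(pdf):
    """Map object number -> byte offset at which its header really starts."""
    return {int(m.group(1)): m.start() for m in OBJ_HEADER.finditer(pdf)}


def check_xref(pdf, declared):
    """Return {object number: (declared offset, actual offset)} for every mismatch."""
    actual = find_object_offsets(pdf)
    return {
        num: (offset, actual.get(num))
        for num, offset in declared.items()
        if actual.get(num) != offset
    }


sample = b"%PDF-1.5\n1 0 obj\n<<>>\nendobj\n2 0 obj\n<<>>\nendobj\n"
print(check_xref(sample, {1: 9, 2: 118}))  # prints: {2: (118, 29)}
```

The offsets are byte positions, so the file has to be read in binary mode; reading it as text and re-saving with different line endings is exactly what shifts them.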
I manually fixed the 34 wrong offsets by comparing the position of every X 0 obj with a hex editor, and saved with cr line endings. I got an error from my own inspecting tool:
The keyword stream that follows the stream dictionary should be followed by either a
carriage return and a line feed or by just a line feed, and not by a carriage return alone.
(PDF Reference 1.7, §3.2.7)
so I resaved with lf line endings. No errors, it shows correctly in Preview but still not in Acrobat X.
I noticed the /Length keys for objects 16 and 17, the Page Contents objects, were off as well. After correcting them to 1887 and 648, respectively, it still displays in Preview but still not in Acrobat X.
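The /Length values can be checked the same way. This is only a sketch: it assumes uncompressed streams, a direct /Length that is the last key of the stream dictionary (as in objects 16 and 17 here), and that the end-of-line before endstream is not counted as stream data.

```python
import re

# Declared /Length, then the stream data between "stream<EOL>" and "<EOL>endstream".
STREAM = re.compile(
    rb"/Length (\d+)\s*>>\s*stream\r?\n(.*?)\r?\nendstream",
    re.DOTALL,
)


def stream_lengths(pdf):
    """Return a list of (declared length, measured length) pairs, in file order."""
    return [(int(m.group(1)), len(m.group(2))) for m in STREAM.finditer(pdf)]


sample = b"1 0 obj\n<<\n/Length 99\n>>\nstream\nBT ET\nendstream\nendobj\n"
print(stream_lengths(sample))  # prints: [(99, 5)]
```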
The problem appears to lie in these contents. Requesting for an Inventory shows the error message: "An error occurred while parsing a contents stream. Unable to analyze the PDF file.", and browsing the internal PDF structure I get to see a first handful of text formatting commands from 16 0 obj but they stop at the 15th command:
/CS1 cs 0 scn
and the next command, /TT1 1 Tf, never gets seen.
Ooo-kay. Checking the parameters for scn, I see their number depends on the color space set using cs; and there is your problem.
Both 11 0 obj and 12 0 obj set color spaces, and they both set it to /DeviceRGB. So the number of parameters for /CS1 (defined in 11 0 obj) is wrong -- you only supply one. It's safe to assume you meant this one to be /DeviceGray, and lo and behold, after that final change I got to see this in Acrobat X:
and a proper Inventory and fully browsable PDF structure.
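The rule behind this is that scn takes as many operands as the current color space has components. For the device color spaces this is easy to check; a tiny Python sketch (the function name is invented for the example, and only the three device spaces are covered):

```python
# Number of color components per device color space.
COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}


def scn_operand_count_ok(color_space, operands):
    """True if the number of operands given to sc/scn matches the color space."""
    return len(operands) == COMPONENTS[color_space]


# "/CS1 cs 0 scn" with CS1 defined as /DeviceRGB: one operand where three are needed.
print(scn_operand_count_ok("DeviceRGB", [0]))   # prints: False
# The same line once CS1 is /DeviceGray.
print(scn_operand_count_ok("DeviceGray", [0]))  # prints: True
```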
There were lots of minor problems with this file, but the PDF format in itself is quite resilient. The bad offsets, and possibly the lengths, may have been silently corrected (the PDF specification allows that) but the bad parameters for the color space were killing it.
(a) Clarification after re-reading: it does open in Acrobat but silently shows a blank page only; no error message of any kind.
Addition
This made me think: was the /DeviceRGB the only cause of it failing in Acrobat X? No: after reloading the original PDF and changing just that one line, Acrobat says the file is damaged beyond repair. So all that extra checking I did wasn't for nothing, fortunately.

What is the smallest possible valid PDF?

Out of simple curiosity, having seen the smallest GIF, what is the smallest possible valid PDF file?
This is an interesting problem. Taking it by the book, you can start off with this:
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
xref
0 4
0000000000 65535 f
0000000010 00000 n
0000000053 00000 n
0000000102 00000 n
trailer<</Size 4/Root 1 0 R>>
startxref
149
%EOF
which is 291 bytes of PDF joy. Acrobat opens it, but it complains somewhat. There is one page in it and it is 3/72" square, the minimum allowed by the spec.
However, Acrobat X doesn't even bother with the cross reference table anymore, so we can take that out:
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
trailer<</Size 4/Root 1 0 R>>
Acrobat complains, but opens it. Now we're at 178 bytes.
Turns out that you don't need that /Size in the trailer. Now we're at 172:
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
trailer<</Root 1 0 R>>
Turns out you don't need all those pesky /Type elements in your dictionaries:
%PDF-1.0
1 0 obj<</Pages 2 0 R>>endobj 2 0 obj<</Kids[3 0 R]/Count 1>>endobj 3 0 obj<</MediaBox[0 0 3 3]>>endobj
trailer<</Root 1 0 R>>
Now we're at 138 bytes.
It also turns out that when the spec says "shall be an indirect reference" and /Count is required, and the header "must" be %PDF-1.0, they're making loose suggestions. This is the smallest I could make it and have it openable in Acrobat X:
%PDF-1.
trailer<</Root<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>>>
70 bytes.
Now, my editor uses Windows newline discipline, but Acrobat accepts Windows, Mac, or Unix conventions, so by using a hex editor, I replaced the \r\n with \r and removed the last newline altogether, which leaves me with 67 bytes
25 50 44 46 2D 31 2E 0D 74 72 61 69 6C 65 72 3C
3C 2F 52 6F 6F 74 3C 3C 2F 50 61 67 65 73 3C 3C
2F 4B 69 64 73 5B 3C 3C 2F 4D 65 64 69 61 42 6F
78 5B 30 20 30 20 33 20 33 5D 3E 3E 5D 3E 3E 3E
3E 3E 3E
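If you don't want to use a hex editor, the same bytes can be produced from a script. A small Python sketch that rebuilds the file from the hex dump above and cross-checks it against the literal text with a single carriage return (the variable names are arbitrary, and nothing is written to disk here):

```python
# The hex dump from above, one row per line.
HEX_DUMP = """
25 50 44 46 2D 31 2E 0D 74 72 61 69 6C 65 72 3C
3C 2F 52 6F 6F 74 3C 3C 2F 50 61 67 65 73 3C 3C
2F 4B 69 64 73 5B 3C 3C 2F 4D 65 64 69 61 42 6F
78 5B 30 20 30 20 33 20 33 5D 3E 3E 5D 3E 3E 3E
3E 3E 3E
"""

# bytes.fromhex ignores the whitespace between the hex pairs.
tiny_pdf = bytes.fromhex(HEX_DUMP)

# The same file as text: header, one CR, then the trailer with everything inlined.
as_text = b"%PDF-1.\rtrailer<</Root<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>>>"

print(len(tiny_pdf), tiny_pdf == as_text)  # prints: 67 True
```

To get an actual file, write `tiny_pdf` in binary mode so that no newline translation takes place.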
I tried taking off the last end dictionary (>>), but Acrobat wouldn't have that. The PDF reading built-in to Google Chrome (FoxIt) won't open it.
As a PostScript (HA! See what I did there?), if you consent to Acrobat "repairing" the file, it bumps up to 3550 bytes, most of it optional metadata, but it leaves behind a number of clear spec violations.
I could not get the hello world example to open.
For a small-ish file with text content :
%PDF-1.2
9 0 obj
<<
>>
stream
BT/ 9 Tf(Test)' ET
endstream
endobj
4 0 obj
<<
/Type /Page
/Parent 5 0 R
/Contents 9 0 R
>>
endobj
5 0 obj
<<
/Kids [4 0 R ]
/Count 1
/Type /Pages
/MediaBox [ 0 0 99 9 ]
>>
endobj
3 0 obj
<<
/Pages 5 0 R
/Type /Catalog
>>
endobj
trailer
<<
/Root 3 0 R
>>
%%EOF
Based on all the answers here, here's the smallest PDF with text:
SMALL_PDF = (
    b"%PDF-1.2 \n"
    b"9 0 obj\n<<\n>>\nstream\nBT/ 32 Tf( YOUR TEXT HERE )' ET\nendstream\nendobj\n"
    b"4 0 obj\n<<\n/Type /Page\n/Parent 5 0 R\n/Contents 9 0 R\n>>\nendobj\n"
    b"5 0 obj\n<<\n/Kids [4 0 R ]\n/Count 1\n/Type /Pages\n/MediaBox [ 0 0 250 50 ]\n>>\nendobj\n"
    b"3 0 obj\n<<\n/Pages 5 0 R\n/Type /Catalog\n>>\nendobj\n"
    b"trailer\n<<\n/Root 3 0 R\n>>\n"
    b"%%EOF"
)
As base64. Copy this and test in Chrome:
data:application/pdf;base64,JVBERi0xLjIgCjkgMCBvYmoKPDwKPj4Kc3RyZWFtCkJULyAzMiBUZiggIFlPVVIgVEVYVCBIRVJFICAgKScgRVQKZW5kc3RyZWFtCmVuZG9iago0IDAgb2JqCjw8Ci9UeXBlIC9QYWdlCi9QYXJlbnQgNSAwIFIKL0NvbnRlbnRzIDkgMCBSCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9LaWRzIFs0IDAgUiBdCi9Db3VudCAxCi9UeXBlIC9QYWdlcwovTWVkaWFCb3ggWyAwIDAgMjUwIDUwIF0KPj4KZW5kb2JqCjMgMCBvYmoKPDwKL1BhZ2VzIDUgMCBSCi9UeXBlIC9DYXRhbG9nCj4+CmVuZG9iagp0cmFpbGVyCjw8Ci9Sb290IDMgMCBSCj4+CiUlRU9G
To make the page bigger, adjust the MediaBox dimensions :)
/MediaBox [ 0 0 250 50 ]
I thought I'd make a smallest pdf that displays "Hello World". The text is in the lower left corner. Sorry about the 9-point font, any larger would cost an extra byte :)
172 bytes for Adobe Reader X (if saved with linefeed-only newlines and no trailing newline or null-byte):
%PDF-1.
1 0 obj<</Kids[<</Parent 1 0 R/Resources<<>>/Contents 2 0 R>>]>>endobj 2 0 obj<<>>stream
BT/ 9 Tf(Hello World)' ET
endstream
endobj trailer<</Root<</Pages 1 0 R>>>>
120 bytes for Chrome's builtin PDF viewer:
%PDF 1 0 obj<</Pages<</Kids[<</Contents<<>>stream
BT 9 Tf(Hello World)' ET endstream>>]>>>>endobj trailer<</Root 1 0 R>>
To easily see this in Chrome, paste this URI in the address bar (SO won't let me link to it, and it won't work at all in other browsers):
data:application/pdf,%25PDF%201%200%20obj%3C%3C%2FPages%3C%3C%2FKids%5B%3C%3C%2FContents%3C%3C%3E%3Estream%0ABT%209%20Tf(Hello%20World)'%20ET%20endstream%3E%3E%5D%3E%3E%3E%3Eendobj%20trailer%3C%3C%2FRoot%201%200%20R%3E%3E
I was going to give an example of what I thought was the minimal valid "universal" PDF, until I remembered that the whole ethos of using a PDF is to ensure it renders exactly the same on all devices and in all PDF readers. On cross-checking my "perfectly small well-formed PDF" I spotted a problem. TL;DR: it is fixed in my personal minimal text template (at the end).
So the ground rule was "smallest possible valid PDF", but I consider that this shortcoming should count as making the PDF invalid, since it does not adhere to the concept of "fit for purpose". A minimal PDF must therefore contain at least one means of fixing a working font.
To explain my proposed solution and why it's less than perfect, here it is in rough form (because of cut and paste).
%PDF-1.0
%µ¶
1 0 obj
<</Type/Catalog/Pages 2 0 R>>
endobj
2 0 obj
<</Kids[3 0 R]/Count 1/Type/Pages/MediaBox[0 0 595 792]>>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Contents 4 0 R/Resources<<>>>>
endobj
4 0 obj
<</Length 58>>
stream
q
BT
/ 96 Tf
1 0 0 1 36 684 Tm
(Hello World!) Tj
ET
Q
endstream
endobj
xref
0 5
0000000000 65536 f
0000000016 00000 n
0000000062 00000 n
0000000136 00000 n
0000000209 00000 n
trailer
<</Size 5/Root 1 0 R>>
startxref
316
%%EOF
Whilst not defined by the rules of the question I have included some past experience of user problems.
The first difference you might note is the media box in object 2: a hybrid MediaBox[0 0 595 792], which takes the A4 width (the narrower of the two) and the US Letter height (the shorter of the two). Otherwise the "universal page" would in most countries force a second sheet at 100% scale printing, because the page definition would be either too wide or too high for the locale defaults.
The current problem is evident in object 3: no fonts have been set in the resources. In aiming for a minimal PDF, I contend that one without a font defined is invalid.
Thus none of the answers so far, including my own, appears to produce a PDF that will "WORK" as a "VALID" means of producing the same printout regardless of platform or viewer.
Turning to libraries, I found a 3 MB zip with an exceptionally versatile Windows .exe (a single file that can do most PDF functions such as split, merge, import, stamp, export attachments, etc.), which can take "Hello World!" on the command line and produce a good working file. This is the page centre, zoomed in.
It uses a stream for the text and its positioning and has other conforming data such as the producer, so I offer it as a potentially good minimal file to pare down. Note that, as presented here, this file will appear blank because the binary stream was corrupted when pasted as text.
%PDF-1.7
%µ¶
1 0 obj
<</Pages 2 0 R/Type/Catalog>>
endobj
2 0 obj
<</Count 1/Kids[5 0 R]/MediaBox[0 0 595 792]/Type/Pages>>
endobj
3 0 obj
<</BaseFont/Helvetica/Encoding/WinAnsiEncoding/Subtype/Type1/Type/Font>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 101>>
stream
xœ*Tp
QÐw3P04Ò30PISp
Q01
à˜kdf¢ga¬`bhâ%ç‚ô(„”#©Aîè"EéÚlA
HW‘‚†GjNN¾Bx~QNŠ¢¦BHÈÞ## ÿÿFå
endstream
endobj
5 0 obj
<</Contents 4 0 R/CropBox[0 0 595 792]/MediaBox[0 0 595 792]/Parent 2 0 R/Resources<</Font<</F0 3 0 R>>>>/Type/Page>>
endobj
6 0 obj
<</CreationDate(D:20220600600709+01'00')/ModDate(D:20220600600709+01'00')/Producer(me 2)>>
endobj
xref
0 7
0000000000 65536 f
0000000016 00000 n
0000000062 00000 n
0000000136 00000 n
0000000225 00000 n
0000000395 00000 n
0000000529 00000 n
trailer
<</Size 7/Info 6 0 R/Root 1 0 R/ID[<A2A0CE5CCD9D0DABD5845AD574BF0A5C><09BF9D281BE12CB5B5933BB2B62B0D4D>]>>
startxref
636
%%EOF
P.S. I deliberately added a non-valid item, so this is intentionally not the minimum working answer; see if you can work out what's clearly wrong :-)
My personal offering
I am often asked how to write plain-text templated PDFs, so I need the font to be static (Helvetica or Courier should do) and a structure that is easy to modify from the Windows CMD line. This suits my purpose: it is now 698 bytes as shown, with two placeholders to show multi-line text. If needed, you can find and replace Helvetica with Courier (note the intentional 2 spaces after Courier to keep the byte count).
%PDF-1.1
%âã
1 0 obj
<</Type/Catalog/Pages<</Type/Pages/Count 1/Kids[2 0 R]>>>>
endobj
2 0 obj
<</Type/Page/Parent 1 0 R/MediaBox[0 0 594 792]/Resources<</Font<</F1 3 0 R>>/ProcSet[/PDF/Text]>>/Contents 4 0 R>>
endobj
3 0 obj
<</Type/Font/Subtype/Type1/Name/F1/BaseFont/Helvetica>>
endobj
4 0 obj
<</Length 5 0 R>>
stream
BT
/F1 36 Tf
1 0 0 1 255 752 Tm
48 TL
( Hello)'
(World!)'
ET
endstream
endobj
5 0 obj
78
endobj
xref
0 6
0000000000 65536 f
0000000017 00000 n
0000000094 00000 n
0000000228 00000 n
0000000302 00000 n
0000000425 00000 n
trailer
<</Size 6/Info <</CreationDate(D:2023)/Producer(cmd2pdf)/Title(mini.pdf)>>/Root 1 0 R>>
startxref
446
%%EOF
To see how this approach works on the Windows command line, RIGHT CLICK and download as text https://github.com/GitHubRulesOK/MyNotes/raw/master/MAKE-PDF.cmd (now 200 lines long!). NOTE: browser security may ask you to trust a .cmd download, so use a .txt extension; you will still need to change its properties to UNBLOCK it once you are happy it should do no harm to run it!
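For comparison, the same templating idea can be sketched in a few lines of Python. This is not the CMD script linked above, just an illustration of the bookkeeping it has to do: `build_pdf` is a made-up helper that numbers the object bodies it is given from 1 upwards, records each byte offset while writing, and then emits the xref table and startxref from those recorded values. Unlike the template above, it uses a separate Pages object and a direct /Length.

```python
def build_pdf(objects):
    """Assemble a PDF from object bodies; objects[0] becomes object 1 (the Catalog)."""
    out = bytearray(b"%PDF-1.1\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<</Size %d/Root 1 0 R>>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF" % xref_position
    return bytes(out)


# Two text lines using the ' operator, as in the template above.
content = b"BT\n/F1 36 Tf\n1 0 0 1 255 752 Tm\n48 TL\n( Hello)'\n(World!)'\nET"

pdf = build_pdf([
    b"<</Type/Catalog/Pages 2 0 R>>",
    b"<</Type/Pages/Count 1/Kids[3 0 R]>>",
    b"<</Type/Page/Parent 2 0 R/MediaBox[0 0 594 792]"
    b"/Resources<</Font<</F1 4 0 R>>>>/Contents 5 0 R>>",
    b"<</Type/Font/Subtype/Type1/BaseFont/Helvetica>>",
    b"<</Length %d>>\nstream\n" % len(content) + content + b"\nendstream",
])
```

Because the offsets and the stream length are computed rather than typed in, changing the text or swapping Helvetica for Courier needs no padding spaces to keep the byte count stable.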
@mkl are you up for producing your best shot?
According to this Ange Albertini lecture, the smallest possible valid PDF is 36 bytes:
%PDF-(NULL)trailer<</Root<</Pages<<>>>>>>
Where (NULL) is the unprintable ASCII 0 character.
However, as Ange notes, while this PDF is technically valid, most PDF reader apps will regard it as invalid based on the size alone, thus failing to open it.
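Since the NUL byte cannot be typed in most editors, a script is the easiest way to produce this file. A short Python sketch, assuming the structure shown above is complete (the variable name is arbitrary; write the bytes in binary mode if you want a file to test with):

```python
# "%PDF-" followed by one NUL byte, then the trailer with an empty page tree.
SMALLEST_PDF = b"%PDF-\x00trailer<</Root<</Pages<<>>>>>>"

print(len(SMALLEST_PDF))  # prints: 36
```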
I needed a PDF which is usable by a PDF converter. All the above constructs worked with Adobe Reader and Chrome, but not with the PDF converter, which required DIN A4.
I found this site and this PDF worked fine with the PDF converter I'm using: https://help.callassoftware.com/m/73261/l/798383-how-to-create-a-simple-pdf-file
Working for a PDF-related company, I know that the following content works pretty well. This is a valid empty page (note that the MediaBox of 612 x 792 points is US Letter; A4 would be 595 x 842):
%PDF-1.4
%âãÏÓ
5 0 obj
<<
/Length 1
>>
stream
endstream
endobj
4 0 obj
<<
/Type /Page
/MediaBox [0 0 612 792]
/Resources <<
>>
/Contents 5 0 R
/Parent 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/Kids [4 0 R]
/Count 1
>>
endobj
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
3 0 obj
<<
/Creator (PDF Creator http://www.pdf-tools.com)
/CreationDate (D:20150701112447+02'00')
/ModDate (D:20220607183602+02'00')
/Producer (3-Heights\222 PDF Optimization Shell 6.0.0.0 \(http://www.pdf-tools.com\))
>>
endobj
xref
0 6
0000000000 65535 f
0000000226 00000 n
0000000169 00000 n
0000000275 00000 n
0000000065 00000 n
0000000015 00000 n
trailer
<<
/Size 6
/Root 1 0 R
/Info 3 0 R
/ID [<1C3500CA9F7232B97E0EF3F789E8B7F2> <254C8D153F655D49945EAD68D801E011>]
>>
startxref
505
%%EOF
Now using Javascript, you can embed this into your js bundle. First encode in base64 the content above, then use the encoded string and create a Blob file with it by writing:
const str = 'JVBERi0xLjQKJcOiw6PDj8OTCjUgMCBvYmoKPDwKL0xlbmd0aCAxCj4+CnN0cmVhbQogCmVuZHN0cmVhbQplbmRvYmoKNCAwIG9iago8PAovVHlwZSAvUGFnZQovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Cj4+Ci9Db250ZW50cyA1IDAgUgovUGFyZW50IDIgMCBSCj4+CmVuZG9iagoyIDAgb2JqCjw8Ci9UeXBlIC9QYWdlcwovS2lkcyBbNCAwIFJdCi9Db3VudCAxCj4+CmVuZG9iagoxIDAgb2JqCjw8Ci9UeXBlIC9DYXRhbG9nCi9QYWdlcyAyIDAgUgo+PgplbmRvYmoKMyAwIG9iago8PAovQ3JlYXRvciAoUERGIENyZWF0b3IgaHR0cDovL3d3dy5wZGYtdG9vbHMuY29tKQovQ3JlYXRpb25EYXRlIChEOjIwMTUwNzAxMTEyNDQ3KzAyJzAwJykKL01vZERhdGUgKEQ6MjAyMjA2MDcxODM2MDIrMDInMDAnKQovUHJvZHVjZXIgKDMtSGVpZ2h0c1wyMjIgUERGIE9wdGltaXphdGlvbiBTaGVsbCA2LjAuMC4wIFwoaHR0cDovL3d3dy5wZGYtdG9vbHMuY29tXCkpCj4+CmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYKMDAwMDAwMDIyNiAwMDAwMCBuCjAwMDAwMDAxNjkgMDAwMDAgbgowMDAwMDAwMjc1IDAwMDAwIG4KMDAwMDAwMDA2NSAwMDAwMCBuCjAwMDAwMDAwMTUgMDAwMDAgbgp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKL0luZm8gMyAwIFIKL0lEIFs8MUMzNTAwQ0E5RjcyMzJCOTdFMEVGM0Y3ODlFOEI3RjI+IDwyNTRDOEQxNTNGNjU1RDQ5OTQ1RUFENjhEODAxRTAxMT5dCj4+CnN0YXJ0eHJlZgo1MDUKJSVFT0Y=';
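// atob() returns a binary string; convert it to bytes so the Blob does not re-encode it as UTF-8.
const bytes = Uint8Array.from(atob(str), (c) => c.charCodeAt(0));
const blob = new Blob([bytes], { type: 'application/pdf' });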
In Java, use this:
private static String samplepdf = "255044462D312E0D747261696C65723C3C2F526F6F743C3C2F50616765733C3C2F4B6964735B3C3C2F4D65646961426F785B302030203320335D3E3E5D3E3E3E3E3E3E";
and then
byte[] bytes = hexStringToByteArray(samplepdf);
...
public byte[] hexStringToByteArray(String s) {
    int len = s.length();
    byte[] data = new byte[len / 2];
    // Combine each pair of hex digits into one byte.
    for (int i = 0; i < len; i += 2) {
        data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                + Character.digit(s.charAt(i + 1), 16));
    }
    return data;
}

Writing multiline text in pdf page

I want to write a multiline text, I've tried this:
6 0 obj
<</Length 59>>
stream
BT /F1 24 Tf 100 520 Td (This is test\n This is test)Tj ET
endstream
endobj
But I am not getting a new line. Is there a simple way to achieve that, or must I create another stream with the position of the next line?
This is the full code:
%PDF-1.5
1 0 obj <</Type /Catalog /Pages 2 0 R>>
endobj
2 0 obj <</Type /Pages /Kids [3 0 R] /Count 1>>
endobj
3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 700] /Contents 6 0 R>>
endobj
4 0 obj<</Font <</F1 5 0 R>>>>
endobj
5 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj
6 0 obj
<</Length 75>>
stream
BT
/F1 24 Tf
100 520 Td
(This is test) Tj
T*
(This is test) Tj
ET
endstream
endobj
xref
0 7
0000000000 65535 f
0000000009 00000 n
0000000059 00000 n
0000000116 00000 n
0000000219 00000 n
0000000259 00000 n
0000000328 00000 n
trailer <</Size 7/Root 1 0 R>>
startxref
454
%%EOF
You may want to do something like this:
BT
/F1 24 Tf
30 TL
100 520 Td
(This is test) Tj
T*
(This is test) Tj
ET
or the shorter form:
BT
/F1 24 Tf
30 TL
100 520 Td
(This is test) Tj
(This is test) '
ET
You might want to read up on section 9.4.3 Text-Showing Operators in the PDF specification ISO 32000-1.
P.S.: Added text leading TL operators.
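If the lines come from a program, the content stream can be generated rather than written by hand. A minimal Python sketch of the first form above (TL once, then T* between the Tj calls); the function names and default values are only illustrative, and the escaping covers just backslash and parentheses in literal strings:

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a (...) literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def multiline_text_stream(lines, font="/F1", size=24, leading=30, x=100, y=520):
    """Build a content stream that shows each line on its own row."""
    ops = ["BT", f"{font} {size} Tf", f"{leading} TL", f"{x} {y} Td"]
    for index, line in enumerate(lines):
        if index:
            ops.append("T*")  # move down by the leading set with TL
        ops.append(f"({escape_pdf_string(line)}) Tj")
    ops.append("ET")
    return "\n".join(ops).encode("latin-1")


stream = multiline_text_stream(["This is test", "This is test"])
print(len(stream))  # use this value for /Length
```

Computing /Length from the generated bytes avoids having to recount it by hand whenever the text changes.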