PDF: setting the font color of the text

I am trying to add some text to a PDF file manually. I was able to add new text with a specific font, but I am not able to set the font color. How can I do it manually?
(I just want to change these manually, as I already have the code that writes these bytes to make the PDF file.)
Also, how can I use the graphics state specified in the PDF standard to manipulate the text so that changes to one feature do not affect others, such as the color? How exactly can I use the graphics state?

The PDF color operators are listed in Table 74 of the PDF specification ISO 32000-1:2008.
In your case, the content stream you added is
104 0 obj
<</Length 105 0 R>>stream
/Helv 8 Tf
BT
1 0 0 1 15.67 150 Tm
(l)Tj
ET
/Helv 8 Tf
BT
1 0 0 1 17.88 190 Tm
(abcdefghijklmnopqr)Tj
ET
endstream
endobj
If, e.g., you want the text to be filled with red in an RGB color space, you add a 1 0 0 rg:
104 0 obj
<</Length 105 0 R>>stream
BT
1 0 0 1 15.67 150 Tm
/Helv 8 Tf
1 0 0 rg
[...]
EDIT
If you are worried that this change may affect later text, remember to use the graphics state stack operators q and Q (cf. section 8.4.2 of the PDF specification), e.g.:
q
0 1 -1 0 595.22 0 cm
q
BT
1 0 0 1 36 540 Tm
/Xi0 12 Tf
0.75 g
(Hello people!)Tj
0 g
ET
Q
Q
(Copied from How to add text object to existing pdf)
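Here is a minimal Python sketch of the same idea. It assumes you generate the content-stream bytes yourself. The helper names pdf_string and colored_text are hypothetical, and /Helv must be present in the page's /Font resources, as it is in the question's file.

The sketch sets the fill color with rg (DeviceRGB; g and k are the DeviceGray and DeviceCMYK counterparts listed in Table 74). It brackets everything in q/Q, so the color does not leak into later content. If you append such a fragment to an existing stream, remember to update its /Length (object 105 0 R in the question's file).

```python
def pdf_string(text: str) -> bytes:
    """Encode text as a PDF literal string, escaping backslash and parentheses.

    Assumes a single-byte font encoding, so the text is encoded as Latin-1.
    """
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("latin-1") + b")"


def colored_text(text, x, y, font=b"/Helv", size=8, rgb=(1, 0, 0)) -> bytes:
    """Content-stream fragment drawing text at (x, y) with a DeviceRGB fill color.

    q ... Q saves and restores the graphics state, so the fill color set
    by rg is back to its previous value after this fragment.
    """
    r, g, b = rgb
    ops = [
        b"q",                              # push graphics state
        b"%g %g %g rg" % (r, g, b),        # nonstroking (fill) color, DeviceRGB
        b"BT",
        font + b" %g Tf" % size,           # font resource name and size
        b"1 0 0 1 %g %g Tm" % (x, y),      # text matrix: plain translation
        pdf_string(text) + b" Tj",
        b"ET",
        b"Q",                              # pop graphics state
    ]
    return b"\n".join(ops) + b"\n"


print(colored_text("abcdefghijklmnopqr", 17.88, 190).decode("latin-1"))
```

The text is drawn red, and anything you write after the final Q uses the color that was in effect before the q.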

Related

Why is this vertical text positioning working?

The PDF content below renders with the correct vertical positions, but how?
1 0 0 -1 0 792 cm
q
.75 0 0 .75 72 192.75 cm
BT
/F4 14.666667 Tf
1 0 0 -1 0 .80265617 Tm
0 -13.2773438 Td <0030> Tj
12.2087708 0 Td <0024> Tj
8.6870575 0 Td <003C> Tj
9.7756042 0 Td <0032> Tj
11.4001007 0 Td <0035> Tj
ET
Q
q
.75 0 0 .75 72 222.75 cm
BT
/F4 14.666667 Tf
1 0 0 -1 4.0719757 .80265617 Tm
0 -13.2773438 Td <002C> Tj
4.0719757 0 Td <0003> Tj
4.0719757 0 Td <0057> Tj
4.0719757 0 Td <004B> Tj
8.1511078 0 Td <004C> Tj
3.2561493 0 Td <0051> Tj
8.1511078 0 Td <004E> Tj
ET
Q
Renders correctly:
MAJOR
I think
However, I can't understand how the y positions are calculated (x is fine). The text rendering matrix (TRM) is given by the text matrix (TM) multiplied by the current transformation matrix (CTM), PDF 1.7 Reference section 9.4.4. The CTM is the identity matrix multiplied by each "cm" operation.
So for the first snippet,
CTM = [1 0 0 -1 0 792] x [0.75 0 0 0.75 72 192.75] = [0.75 0 0 -0.75 72 786.75]
TRM is TM x CTM:
TRM = [1 0 0 -1 0 0.8026] x [0.75 0 0 -0.75 72 786.75] = [0.75 0 0 0.75 72 786.1]
So, ignoring small details, the text will be rendered around y = 786 (actually 776 I reckon)
For the second snippet,
CTM = [1 0 0 -1 0 792] x [0.75 0 0 0.75 72 222.75] = [0.75 0 0 -0.75 72 816.75]
TRM = [1 0 0 -1 4.072 0.802] x [0.75 0 0 -0.75 72 816.75] = [0.75 0 0 0.75 75.05 816.1]
Again, ignoring small details, the text will be rendered around y = 816 (actually 806 I reckon)
But the y origin is the bottom of the page, and 816 is greater than 786. So how come the second snippet of text renders correctly below the first? I'm clearly missing something in the calculations, but I can't see what. Any ideas?
The error in your calculations is that you apply the cm matrix by multiplication from the right side. You instead have to apply it from the left side.
I.e. for the first snippet you have
CTM = [0.75 0 0 0.75 72 192.75] × [1 0 0 -1 0 792] = [0.75 0 0 -0.75 72 599.25]
and for the second snippet
CTM = [0.75 0 0 0.75 72 222.75] × [1 0 0 -1 0 792] = [0.75 0 0 -0.75 72 569.25]
With these current transformation matrices the rendered result is to be expected.
If you wonder how you should have known that you need to multiply from the left side...
This result is true in general for PDF: when a sequence of transformations is carried out, the matrix representing the combined transformation (M′) is calculated by premultiplying the matrix representing the additional transformation (MT) with the one representing all previously existing transformations (M):
M′ = MT × M
(ISO 32000-2 section 8.3.4 "Transformation matrices")
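To see the difference numerically, here is a small Python sketch. The helper names concat and apply_cm are made up for illustration. It multiplies matrices written as [a b c d e f] in the specification's row-vector convention, and it applies each cm by premultiplying it, as in M′ = MT × M:

```python
def concat(m1, m2):
    """Return m1 x m2 for PDF matrices written as [a b c d e f].

    Each list stands for the 3x3 matrix [[a, b, 0], [c, d, 0], [e, f, 1]],
    which transforms row vectors [x y 1] as in the PDF specification.
    """
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def apply_cm(ctm, cm):
    """A cm operator premultiplies the current CTM: CTM' = cm x CTM."""
    return concat(cm, ctm)


identity = [1, 0, 0, 1, 0, 0]
page_flip = apply_cm(identity, [1, 0, 0, -1, 0, 792])        # 1 0 0 -1 0 792 cm

first = apply_cm(page_flip, [0.75, 0, 0, 0.75, 72, 192.75])   # inside the first q ... Q
second = apply_cm(page_flip, [0.75, 0, 0, 0.75, 72, 222.75])  # inside the second q ... Q
print(first[5], second[5])        # 599.25 569.25

# Multiplying in the other order reproduces the question's result.
wrong = concat(page_flip, [0.75, 0, 0, 0.75, 72, 192.75])
print(wrong[5])                   # 786.75
```

Because the second block starts from the restored state after Q, both blocks build on page_flip. The smaller y of the second block (569.25 versus 599.25) is why its text appears lower on the page.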
Without going deep into matrices (not my forte, and there was a slight error in my initial maths, now corrected), you are working downwards from the top left, based on the inverted starting point set by 1 0 0 -1 0 792 cm (top-left corner).
The origin of the first snippet, just above MAJOR, is at 72 192.75 cm.
Without other transformations the text would be upside down, with the M facing towards the bottom. The second 1 0 0 -1 (in the Tm) mirrors it back upright, and the 0.8 moves it slightly towards the bottom, so the baseline is at about 193.5 from the top left. Then 0 -13.2773438 Td is added, so the baseline ends up at about 205 from the top left.
Likewise, the origin for the second line is at 72 222.75 cm, measured down from the same datum.
In both cases the mirrored baseline is moved even lower by 0 -13.2773438 Td, so both lines sit lower than their cm origins, partly due to the matrix inversions.
So here the second baseline ends up about 234 down from the top left. By similar maths it is roughly 222.75 + 0.802 + 13.277 down, although the 0.75 scale also has an effect.
Generally it's best to use a viewer that shows alterations in real time. This is not the most rigorous way, just an example: by playing with rounded values I can see the effects.

Detect PDF form field radio button (radiobutton) shape / style

I need to programmatically categorize which shape a pdf form field radiobutton has. Therefore I created a test pdf using *crobat. I added a radiobutton group where each widget is using a different style.
One way could be to check the CA key of the appearance characteristics dictionary (MK) which is mapped to the ZapfDingbats font:
/MK<</BC[0.0]>> //CIRCLE (normally l)
/MK<</BC[0.0]/CA(4)>> //CHECK
/MK<</BC[0.0]/CA(8)>> //CROSS
/MK<</BC[0.0]/CA(u)>> //DIAMOND
/MK<</BC[0.0]/CA(n)>> //SQUARE
/MK<</BC[0.0]/CA(H)>> //STAR
However in the example PDF for the circle the CA key does not exist (it should have been /CA(l)). To implicitly assume a round shape does not seem correct.
Another idea would be to look at the appearance dictionary. For the example given in the pdf spec it seems possible:
stream
q
0 0 1 rg
BT
/ZaDb 12 Tf
0 0 Td
(l) Tj
ET
Q
endstream
However, the normal appearance generated by *crobat looks like this:
stream
q
1 0 0 1 9 9 cm
8.5 0 m
8.5 4.6946 4.6946 8.5 0 8.5 c
-4.6946 8.5 -8.5 4.6946 -8.5 0 c
-8.5 -4.6946 -4.6946 -8.5 0 -8.5 c
4.6946 -8.5 8.5 -4.6946 8.5 0 c
s
Q
0.501953 G
q
0.7071 0.7071 -0.7071 0.7071 9 9 cm
7.5 0 m
7.5 4.1423 4.1423 7.5 0 7.5 c
-4.1423 7.5 -7.5 4.1423 -7.5 0 c
S
Q
0.75293 G
q
0.7071 0.7071 -0.7071 0.7071 9 9 cm
-7.5 0 m
-7.5 -4.1423 -4.1423 -7.5 0 -7.5 c
4.1423 -7.5 7.5 -4.1423 7.5 0 c
S
Q
q
1 0 0 1 9 9 cm
3.5 0 m
3.5 1.9331 1.9331 3.5 0 3.5 c
-1.9331 3.5 -3.5 1.9331 -3.5 0 c
-3.5 -1.9331 -1.9331 -3.5 0 -3.5 c
1.9331 -3.5 3.5 -1.9331 3.5 0 c
f
Q
endstream
My question: Is there a way to detect that a widget annotation has a round shape / circular style? I know that any arbitrary shape can be defined as an appearance however for the use case at hand the differentiation of those 6 styles is more than enough.
If the answer somehow depends on the pdf lib (due to certain functionality): currently openPDF is used and other libs like pdfbox or iText are in use, too.
First you can check if the /CA entry is 'l'. If /CA does not exist you can check if the appearance stream contains 'c', 'v', or 'y' operators (curve operators). If they are present you can assume a circular style.
This is an empirical approach, but it might work for your situation.
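Here is a Python sketch of that heuristic, independent of OpenPDF, PDFBox or iText. It assumes you have already read two things with whichever library you use:

- the /MK /CA string, or None when the key is absent;
- the decoded bytes of the widget's on-state normal appearance stream.

guess_radio_style and its tokenizer are hypothetical helpers, and the tokenizer is deliberately crude:

```python
import re

# /CA characters (ZapfDingbats) for the six standard styles, as listed in the question.
CA_TO_STYLE = {"l": "circle", "4": "check", "8": "cross",
               "u": "diamond", "n": "square", "H": "star"}

# Bezier curve operators in PDF content streams.
CURVE_OPERATORS = {b"c", b"v", b"y"}


def _operator_tokens(stream: bytes) -> list:
    """Very rough tokenizer: blanks out (...) string literals, then splits on whitespace.

    Nested or escaped parentheses, hex strings and inline images are not handled.
    """
    without_strings = re.sub(rb"\([^)]*\)", b" ", stream)
    return without_strings.split()


def guess_radio_style(ca, normal_appearance: bytes) -> str:
    """Empirical guess of a radio button style; not a library API.

    ca: value of /MK /CA as a str, or None if the key is missing.
    normal_appearance: decoded content of the on-state /AP /N stream.
    """
    if ca is not None:
        return CA_TO_STYLE.get(ca, "unknown")
    if any(tok in CURVE_OPERATORS for tok in _operator_tokens(normal_appearance)):
        return "circle"   # curves present: assume the round style
    return "unknown"


print(guess_radio_style(None, b"q 1 0 0 1 9 9 cm 3.5 0 m 3.5 1.9331 1.9331 3.5 0 3.5 c f Q"))
```

The appearance-stream check is only reached when /CA is missing, which matches the order suggested in the answer.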

PDF m l operators

I am using a PDF parser to extract lines from a PDF document. It fails on a specific PDF generated from a Word .doc file. The smallest PDF it fails for contains a table with 1 row and 1 cell, but the stream shows a table with 1 row and 2 cells. I have these questions:
Why does the stream show 2 cells instead of just 1?
What are those re operators for, as there are no rectangles?
Who generates these instructions, is it MS Word? Or the PDF Printing application (Cute PDF Writer)?
Here is the relevant stream:
stream
q 0.12 0 0 0.12 0 0 cm
/R7 gs
q
647 5996 m
700 5996 l
700 5885 l
647 5885 l
h
W n
0 0 0 rg
q
8.33333 0 0 8.33333 0 0 cm BT
/R8 11.04 Tf
0.998087 0 0 1 77.64 709.2 Tm
()Tj
ET
Q
Q
q
700 5996 m
746 5996 l
746 5885 l
700 5885 l
h
W n
0 0 0 rg
q
8.33333 0 0 8.33333 0 0 cm BT
/R8 11.04 Tf
0.998087 0 0 1 84 709.2 Tm
()Tj
ET
Q
Q
0 0 0 rg
600 5996 4 4 re
f
600 5996 4 4 re
f
604 5996 3892 4 re
f
4496 5996 4 4 re
f
4496 5996 4 4 re
f
600 5884 4 112 re
f
600 5880 4 4 re
f
600 5880 4 4 re
f
604 5880 3892 4 re
f
4496 5884 4 112 re
f
4496 5880 4 4 re
f
4496 5880 4 4 re
f
q
8.33333 0 0 8.33333 0 0 cm BT
/R8 11.04 Tf
0.998087 0 0 1 72 695.28 Tm
()Tj
ET
Q
Q
endstream
And here is the image drawn using the m and l instructions above:
Why does the stream show 2 cells instead of just 1?
The stream does not show any cells at all. Only tagged PDFs may have a certain awareness of tables and table cells but your PDF does not look tagged.
What you (considering your question title) appear to mean are the sequences
647 5996 m
700 5996 l
700 5885 l
647 5885 l
h
W n
and
700 5996 m
746 5996 l
746 5885 l
700 5885 l
h
W n
But all they do is intersect the current clip path with a rectangle. Thus, subsequent drawing operations are restricted to the respective rectangle. Such restrictions can be found in PDFs in many situations; table cells are only one of them, and such clip path changes are not even necessary for table cells...
Furthermore, considering the preceding transformation matrix change
0.12 0 0 0.12 0 0 cm
the rectangles above are fairly small, each probably large enough for a single character.
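To put a number on "fairly small", here is a Python sketch that maps the first clip rectangle through that cm into default user space. The helpers transform and rect_bbox are illustrative names. The result is about 6.4 × 13.3 pt.

Incidentally, the text drawn inside it (0.998087 0 0 1 77.64 709.2 Tm under 8.33333 0 0 8.33333 0 0 cm) lands at x ≈ 77.64 as well. That is the left edge of this rectangle, because 0.12 × 8.33333 is almost exactly 1.

```python
def transform(m, x, y):
    """Map (x, y) through a PDF matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    return a * x + c * y + e, b * x + d * y + f


def rect_bbox(m, x, y, w, h):
    """Bounding box (llx, lly, urx, ury) of the rectangle x y w h after transformation by m."""
    corners = [transform(m, px, py) for px in (x, x + w) for py in (y, y + h)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return min(xs), min(ys), max(xs), max(ys)


cm = [0.12, 0, 0, 0.12, 0, 0]                 # q 0.12 0 0 0.12 0 0 cm
# First clip path: 647 5996 m 700 5996 l 700 5885 l 647 5885 l h W n
first_clip = rect_bbox(cm, 647, 5885, 700 - 647, 5996 - 5885)
print([round(v, 2) for v in first_clip])      # [77.64, 706.2, 84.0, 719.52]
```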
What are those re operators for, as there are no rectangles?
Well, they are rectangles.
Very small in height and/or width, but rectangles nonetheless.
And they are filled rectangles, cf. the f operator.
To make a long story short, the "lines" around the area we perceive as a table cell, are actually filled rectangles:
604 5996 3892 4 re
600 5884 4 112 re
604 5880 3892 4 re
4496 5884 4 112 re
Furthermore the corners of the cell are drawn as tiny squares (and each corner twice):
600 5996 4 4 re
600 5996 4 4 re
4496 5996 4 4 re
4496 5996 4 4 re
600 5880 4 4 re
600 5880 4 4 re
4496 5880 4 4 re
4496 5880 4 4 re
Thus, these re instructions give you the border edges and corners of what we perceive as table cell.
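If the goal is to extract table lines, one option is to treat thin filled rectangles like these as line segments. Here is a rough Python sketch under these assumptions:

- It uses the uniform 0.12 scale from the surrounding cm.
- A side shorter than a made-up threshold of 1 pt counts as "thin".
- classify_rectangles is a hypothetical helper.
- The regex only recognizes plain "x y w h re" sequences.

```python
import re

# Four numbers followed by the re operator.
RE_PATTERN = re.compile(
    r"(-?\d*\.?\d+)\s+(-?\d*\.?\d+)\s+(-?\d*\.?\d+)\s+(-?\d*\.?\d+)\s+re\b"
)


def classify_rectangles(content: str, scale: float = 0.12, thin: float = 1.0):
    """Classify re rectangles as horizontal/vertical rules, corner dots or boxes.

    scale: uniform factor of the enclosing cm (0.12 0 0 0.12 0 0 assumed).
    thin: threshold in points below which a side counts as a line's thickness.
    Returns (kind, x, y, w, h) tuples in points, rounded to 2 decimals.
    """
    result = []
    for m in RE_PATTERN.finditer(content):
        x, y, w, h = (float(v) * scale for v in m.groups())
        if w < thin and h < thin:
            kind = "corner"
        elif h < thin:
            kind = "horizontal"
        elif w < thin:
            kind = "vertical"
        else:
            kind = "box"
        result.append((kind, round(x, 2), round(y, 2), round(w, 2), round(h, 2)))
    return result


sample = "0 0 0 rg\n600 5996 4 4 re\nf\n604 5996 3892 4 re\nf\n600 5884 4 112 re\nf\n"
for rect in classify_rectangles(sample):
    print(rect)
```

With this scale the top border comes out as a horizontal rule about 467 pt long and 0.48 pt thick.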
Who generates these instructions, is it MS Word? Or the PDF Printing application (Cute PDF Writer)?
The concrete instructions you see are PDF instructions. Thus, your printing application creates them.
Of course, though, your printing application creates them because that is how it interprets the MS Word output...
Cute PDF Writer apparently (from a quick glance on their web page) uses the Windows printing system. In general, in cases like this, you print from MS Word, and MS Word will try to use Windows methods to draw the lines and other items, which the printer driver (Cute PDF Writer in this case) will then translate to PDF commands. An intermediate stage with first rendering to PostScript and then translating to PDF is also possible.
So, that would mean that MS Word is responsible for the fact that two cells are drawn.
I only see one rectangle in the image of the PDF that you posted, so I'm not sure what is happening here. Also, I can't explain the other re commands. The rectangles in the second image look like they might be a frame around a two-on-one printed page, but the coordinates look strange, so it could also be something else.

My program reads a PDF and tries to find the coordinates of each glyph in user space

The content stream goes like this:
q
0.1199951 0 0 0.1199951 0 0 cm
1 g
824 4101 267 389 re
f
Q
q
0.1199951 0 0 0.1199951 0 0 cm
1 g
824 4853 267 25 re
f
Q
q
0.1199951 0 0 0.1199951 0 0 cm
1 g
824 5241 267 25 re
f
Q
q
0.1199951 0 0 0.1199951 0 0 cm
1 g
1090 578 3081 1988 re
f
Q
q
0.1199951 0 0 0.1199951 0 0 cm
603 586 m
603 1800 l
649 1800 l
649 586 l
h
W n
8.3336724 0 0 8.3336724 0 0 cm
BT
/T1_0 5.04 Tf
0 1.0002 -1 0 76.8 70.32 Tm
(J)Tj
What should the coordinates of J be?
My CropBox is 0 0 612 792, and the Rotate value is 90.
So, according to my calculation:
Th=1 default,
Tfs=5.04, from {/T1_0 5.04 Tf}
Trise=0 default,
textstatematrix
5.04 1 0
0 5.04 0
0 0 1
Tm
0 1.0002 0
-1 0 0
76.8 70.32 1
TRM = textstatematrix X Tm
-1 5.041 0
-5.040 0 0
76.800 70.320 1
So
[x,y,1] = [76.8, 70.32, 1] X TRM = [-354.413 457.469 1]
So the x coordinate in user space comes out as a negative number. Can you please explain what mistake I am making?
The matrix Trm calculated by the OP as
-1 5.041 0
-5.040 0 0
76.800 70.320 1
is the text rendering matrix described as follows in the PDF specification:
Conceptually, the entire transformation from text space to device space may be represented by a text rendering matrix, Trm:
(section 9.4.2, ISO 32000-1:2008)
The OP's mistake is not in calculating this matrix but in using it: This matrix contains the entire transformation from text space to device space,
Tj and other text-showing operators shall position the origin of the first glyph to be painted at the origin of text space.
(section 9.2.4 ISO 32000-1:2008)
and
The glyph origin is the point (0, 0) in the glyph coordinate system
(ibidem)
To determine, therefore, where the OP's
(J)Tj
puts the origin of the glyph J, one has to apply that matrix to the origin (0, 0), not to (76.8, 70.32) as the OP did.
Thus,
[x,y,1] = [0, 0, 1] X Trm = [76.8, 70.32, 1]
i.e. the coordinates of J are (76.8, 70.32) in device space. As the OP assumed the initial transformation matrix to have been the identity matrix, this device space essentially is the default user space.
Unfortunately, the OP did not say in which coordinate system he wants the coordinates. Thus, these are probably not the coordinates he was looking for.
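The same computation can be written as a short Python sketch; concat and transform are illustrative helpers. It uses the text state matrix as the specification defines it, [Tfs×Th 0 0 Tfs 0 Trise].

The CTM is taken from the question's two cm operators, whose scale factors multiply to almost exactly 1 (0.1199951 × 8.3336724 ≈ 0.99999985). The page's /Rotate 90 is not included, since the viewer applies it for display and it does not change user-space coordinates.

```python
def concat(m1, m2):
    """m1 x m2 for PDF matrices [a b c d e f] (row-vector convention of the spec)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2]


def transform(m, x, y):
    """Apply [a b c d e f] to the point (x, y)."""
    a, b, c, d, e, f = m
    return a * x + c * y + e, b * x + d * y + f


# CTM from the two cm operators in the question (the later cm premultiplies).
ctm = concat([8.3336724, 0, 0, 8.3336724, 0, 0], [0.1199951, 0, 0, 0.1199951, 0, 0])

Tfs, Th, Trise = 5.04, 1.0, 0.0                # /T1_0 5.04 Tf, default Tz and Ts
text_state = [Tfs * Th, 0, 0, Tfs, 0, Trise]
Tm = [0, 1.0002, -1, 0, 76.8, 70.32]           # 0 1.0002 -1 0 76.8 70.32 Tm

Trm = concat(concat(text_state, Tm), ctm)
x, y = transform(Trm, 0, 0)                    # glyph origin is (0, 0) in text space
print(round(x, 3), round(y, 3))                # 76.8 70.32
```

Mapping (1, 0) instead of (0, 0) gives the same x and a larger y. In other words, the baseline runs up the page, which is consistent with the rotated text.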

What is the smallest possible valid PDF?

Out of simple curiosity, having seen the smallest GIF, what is the smallest possible valid PDF file?
This is an interesting problem. Taking it by the book, you can start off with this:
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
xref
0 4
0000000000 65535 f
0000000010 00000 n
0000000053 00000 n
0000000102 00000 n
trailer<</Size 4/Root 1 0 R>>
startxref
149
%EOF
which is 291 bytes of PDF joy. Acrobat opens it, but it complains somewhat. There is one page in it and it is 3/72" square, the minimum allowed by the spec.
However, Acrobat X doesn't even bother with the cross reference table anymore, so we can take that out:
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
trailer<</Size 4/Root 1 0 R>>
Acrobat complains, but opens it. Now we're at 178 bytes.
Turns out that you don't need that /Size in the trailer. Now we're at 172:
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
trailer<</Root 1 0 R>>
Turns out you don't need all those pesky /Type elements in your dictionaries:
%PDF-1.0
1 0 obj<</Pages 2 0 R>>endobj 2 0 obj<</Kids[3 0 R]/Count 1>>endobj 3 0 obj<</MediaBox[0 0 3 3]>>endobj
trailer<</Root 1 0 R>>
Now we're at 138 bytes.
It also turns out that when the spec says "shall be an indirect reference" and /Count is required, and the header "must" be %PDF-1.0, they're making loose suggestions. This is the smallest I could make it and have it openable in Acrobat X:
%PDF-1.
trailer<</Root<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>>>
70 bytes.
Now, my editor uses Windows newline discipline, but Acrobat accepts Windows, Mac, or Unix conventions, so by using a hex editor, I replaced the \r\n with \r and removed the last newline altogether, which leaves me with 67 bytes
25 50 44 46 2D 31 2E 0D 74 72 61 69 6C 65 72 3C
3C 2F 52 6F 6F 74 3C 3C 2F 50 61 67 65 73 3C 3C
2F 4B 69 64 73 5B 3C 3C 2F 4D 65 64 69 61 42 6F
78 5B 30 20 30 20 33 20 33 5D 3E 3E 5D 3E 3E 3E
3E 3E 3E
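To reproduce the file exactly, newline discipline included, it is easiest to write the bytes from a program rather than an editor. Here is a Python sketch; save_tiny_pdf is a hypothetical helper name:

```python
# Hex dump of the 67-byte file, copied from above.
HEX_DUMP = """
25 50 44 46 2D 31 2E 0D 74 72 61 69 6C 65 72 3C
3C 2F 52 6F 6F 74 3C 3C 2F 50 61 67 65 73 3C 3C
2F 4B 69 64 73 5B 3C 3C 2F 4D 65 64 69 61 42 6F
78 5B 30 20 30 20 33 20 33 5D 3E 3E 5D 3E 3E 3E
3E 3E 3E
"""

# bytes.fromhex skips all ASCII whitespace (Python 3.7+), including the newlines.
TINY_PDF = bytes.fromhex(HEX_DUMP)

# The same bytes as a literal: a lone \r after the header, no trailing newline.
assert TINY_PDF == b"%PDF-1.\rtrailer<</Root<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>>>"


def save_tiny_pdf(path: str) -> int:
    """Write the file in binary mode (no newline translation); return its size."""
    with open(path, "wb") as f:
        return f.write(TINY_PDF)


print(len(TINY_PDF))   # 67
```

Calling save_tiny_pdf("tiny.pdf") produces the file to test in Acrobat.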
I tried taking off the last end-of-dictionary marker (>>), but Acrobat wouldn't have that. The PDF reader built into Google Chrome (Foxit) won't open it.
As a PostScript (HA! See what I did there?), if you consent to Acrobat "repairing" the file, it bumps up to 3550 bytes, most of it optional metadata, but it leaves behind a number of clear spec violations.
I could not get the hello world example to open.
For a small-ish file with text content:
%PDF-1.2
9 0 obj
<<
>>
stream
BT/ 9 Tf(Test)' ET
endstream
endobj
4 0 obj
<<
/Type /Page
/Parent 5 0 R
/Contents 9 0 R
>>
endobj
5 0 obj
<<
/Kids [4 0 R ]
/Count 1
/Type /Pages
/MediaBox [ 0 0 99 9 ]
>>
endobj
3 0 obj
<<
/Pages 5 0 R
/Type /Catalog
>>
endobj
trailer
<<
/Root 3 0 R
>>
%%EOF
Based on all the answers here, here's the smallest PDF with text:
SMALL_PDF = (
b"%PDF-1.2 \n"
b"9 0 obj\n<<\n>>\nstream\nBT/ 32 Tf( YOUR TEXT HERE )' ET\nendstream\nendobj\n"
b"4 0 obj\n<<\n/Type /Page\n/Parent 5 0 R\n/Contents 9 0 R\n>>\nendobj\n"
b"5 0 obj\n<<\n/Kids [4 0 R ]\n/Count 1\n/Type /Pages\n/MediaBox [ 0 0 250 50 ]\n>>\nendobj\n"
b"3 0 obj\n<<\n/Pages 5 0 R\n/Type /Catalog\n>>\nendobj\n"
b"trailer\n<<\n/Root 3 0 R\n>>\n"
b"%%EOF"
)
As base64. Copy this and test in Chrome:
data:application/pdf;base64,JVBERi0xLjIgCjkgMCBvYmoKPDwKPj4Kc3RyZWFtCkJULyAzMiBUZiggIFlPVVIgVEVYVCBIRVJFICAgKScgRVQKZW5kc3RyZWFtCmVuZG9iago0IDAgb2JqCjw8Ci9UeXBlIC9QYWdlCi9QYXJlbnQgNSAwIFIKL0NvbnRlbnRzIDkgMCBSCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9LaWRzIFs0IDAgUiBdCi9Db3VudCAxCi9UeXBlIC9QYWdlcwovTWVkaWFCb3ggWyAwIDAgMjUwIDUwIF0KPj4KZW5kb2JqCjMgMCBvYmoKPDwKL1BhZ2VzIDUgMCBSCi9UeXBlIC9DYXRhbG9nCj4+CmVuZG9iagp0cmFpbGVyCjw8Ci9Sb290IDMgMCBSCj4+CiUlRU9G
To make the page bigger, adjust the MediaBox dimensions :)
/MediaBox [ 0 0 250 50 ]
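Such a data URI can also be generated rather than hand-encoded. Here is a small Python sketch using only the standard library's base64 module; to_data_uri is a hypothetical helper. Pass it the SMALL_PDF bytes from above, or any other PDF:

```python
import base64


def to_data_uri(pdf_bytes: bytes) -> str:
    """Return a base64 data: URI for the given PDF bytes, e.g. for Chrome's address bar."""
    return "data:application/pdf;base64," + base64.b64encode(pdf_bytes).decode("ascii")


print(to_data_uri(b"%PDF-1.2 \n9 0 obj\n"))
# data:application/pdf;base64,JVBERi0xLjIgCjkgMCBvYmoK
```

Regenerating the URI after every edit (MediaBox, font size, text) avoids mismatches between the bytes and the base64 string.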
I thought I'd make the smallest PDF that displays "Hello World". The text is in the lower left corner. Sorry about the 9-point font; any larger would cost an extra byte :)
172 bytes for Adobe Reader X (if saved with linefeed-only newlines and no trailing newline or null-byte):
%PDF-1.
1 0 obj<</Kids[<</Parent 1 0 R/Resources<<>>/Contents 2 0 R>>]>>endobj 2 0 obj<<>>stream
BT/ 9 Tf(Hello World)' ET
endstream
endobj trailer<</Root<</Pages 1 0 R>>>>
120 bytes for Chrome's builtin PDF viewer:
%PDF 1 0 obj<</Pages<</Kids[<</Contents<<>>stream
BT 9 Tf(Hello World)' ET endstream>>]>>>>endobj trailer<</Root 1 0 R>>
To easily see this in Chrome, paste this URI in the address bar (SO won't let me link to it, and it won't work at all in other browsers):
data:application/pdf,%25PDF%201%200%20obj%3C%3C%2FPages%3C%3C%2FKids%5B%3C%3C%2FContents%3C%3C%3E%3Estream%0ABT%209%20Tf(Hello%20World)'%20ET%20endstream%3E%3E%5D%3E%3E%3E%3Eendobj%20trailer%3C%3C%2FRoot%201%200%20R%3E%3E
I was going to give an example of what I thought was the minimal valid "universal" PDF, until I noticed that the whole ethos of using a PDF is to ensure it renders exactly the same on all devices and their PDF readers. However, on cross-checking my "perfectly small, well-formed PDF", I spotted a problem. TL;DR: this is fixed in my personal minimal text template (at the end).
So the ground rule was "smallest possible valid PDF", but I consider that this shortcoming should make the PDF count as invalid, since it is not "fit for purpose". Thus a minimal PDF must itself contain at least one means of fixing a working font.
To explain my proposed solution and why it's less than perfect, here it is in rough form (because of cut and paste).
%PDF-1.0
%µ¶
1 0 obj
<</Type/Catalog/Pages 2 0 R>>
endobj
2 0 obj
<</Kids[3 0 R]/Count 1/Type/Pages/MediaBox[0 0 595 792]>>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Contents 4 0 R/Resources<<>>>>
endobj
4 0 obj
<</Length 58>>
stream
q
BT
/ 96 Tf
1 0 0 1 36 684 Tm
(Hello World!) Tj
ET
Q
endstream
endobj
xref
0 5
0000000000 65535 f
0000000016 00000 n
0000000062 00000 n
0000000136 00000 n
0000000209 00000 n
trailer
<</Size 5/Root 1 0 R>>
startxref
316
%%EOF
Whilst not required by the rules of the question, I have included some lessons from past user problems.
The first difference you might notice is that the MediaBox in the 2nd object is a hybrid, MediaBox[0 0 595 792]: the smaller A4 width combined with the smaller US Letter height. Otherwise, in most countries, the "universal page" would force a second sheet when printing at 100% scale, because the page definition is either too wide or too high for the locale's default paper.
The current problem shows in the 3rd object: no fonts have been set in the resources. So, in aiming for a minimal PDF, I contend that without a defined font the PDF is invalid.
Thus none of the answers so far, including my own, appears to produce a PDF that will "WORK" as a "VALID" means of producing the same printout, regardless of platform or viewer.
Turning to libraries, I found a 3 MB zip with an exceptionally versatile Windows .exe (a single file that can do most PDF functions, like split, merge, import, stamp, export attachments, etc.) which can take "Hello World!" on the command line and produce a good working file.
It uses a stream for the text and its positioning, and has other conforming data such as a Producer entry, so I offer it as a potentially good minimal file to pare down. Note that, as presented here, this file will appear blank, because the binary stream was corrupted in the conversion to text.
%PDF-1.7
%µ¶
1 0 obj
<</Pages 2 0 R/Type/Catalog>>
endobj
2 0 obj
<</Count 1/Kids[5 0 R]/MediaBox[0 0 595 792]/Type/Pages>>
endobj
3 0 obj
<</BaseFont/Helvetica/Encoding/WinAnsiEncoding/Subtype/Type1/Type/Font>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 101>>
stream
xœ*Tp
QÐw3P04Ò30PISp
Q01
à˜kdf¢ga¬`bhâ%ç‚ô(„”#©Aîè"EéÚlA
HW‘‚†GjNN¾Bx~QNŠ¢¦BHÈÞ## ÿÿFå
endstream
endobj
5 0 obj
<</Contents 4 0 R/CropBox[0 0 595 792]/MediaBox[0 0 595 792]/Parent 2 0 R/Resources<</Font<</F0 3 0 R>>>>/Type/Page>>
endobj
6 0 obj
<</CreationDate(D:20220600600709+01'00')/ModDate(D:20220600600709+01'00')/Producer(me 2)>>
endobj
xref
0 7
0000000000 65536 f
0000000016 00000 n
0000000062 00000 n
0000000136 00000 n
0000000225 00000 n
0000000395 00000 n
0000000529 00000 n
trailer
<</Size 7/Info 6 0 R/Root 1 0 R/ID[<A2A0CE5CCD9D0DABD5845AD574BF0A5C><09BF9D281BE12CB5B5933BB2B62B0D4D>]>>
startxref
636
%%EOF
P.S. I deliberately added an invalid item, so this is intentionally not the minimal working answer; see if you can work out what's clearly wrong :-)
My personal offering
I am often asked how to write plain-text templated PDFs, so I need the font to be static (Helvetica or Courier should do) and a structure that is easy to modify from the Windows command line. This suits my purpose. It is now 698 bytes as shown, with two placeholders to show multi-line text. If needed, you can find and replace Helvetica with Courier (note the intentional 2 spaces after it, to keep the byte count).
%PDF-1.1
%âã
1 0 obj
<</Type/Catalog/Pages<</Type/Pages/Count 1/Kids[2 0 R]>>>>
endobj
2 0 obj
<</Type/Page/Parent 1 0 R/MediaBox[0 0 594 792]/Resources<</Font<</F1 3 0 R>>/ProcSet[/PDF/Text]>>/Contents 4 0 R>>
endobj
3 0 obj
<</Type/Font/Subtype/Type1/Name/F1/BaseFont/Helvetica>>
endobj
4 0 obj
<</Length 5 0 R>>
stream
BT
/F1 36 Tf
1 0 0 1 255 752 Tm
48 TL
( Hello)'
(World!)'
ET
endstream
endobj
5 0 obj
78
endobj
xref
0 6
0000000000 65535 f
0000000017 00000 n
0000000094 00000 n
0000000228 00000 n
0000000302 00000 n
0000000425 00000 n
trailer
<</Size 6/Info <</CreationDate(D:2023)/Producer(cmd2pdf)/Title(mini.pdf)>>/Root 1 0 R>>
startxref
446
%%EOF
To see how this approach works on the Windows command line, RIGHT CLICK and download as text: https://github.com/GitHubRulesOK/MyNotes/raw/master/MAKE-PDF.cmd (now 200 lines long!). NOTE: browser security may ask you to trust a .cmd download, so use a .txt extension. You will still need to change its properties to UNBLOCK it once you are happy it should do no harm to run it!
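Hand-computing the xref offsets and the startxref value is error-prone. Here is a Python sketch that lays out the objects and computes them. build_pdf is a hypothetical helper, not what MAKE-PDF.cmd does.

It follows the cross-reference rules of the specification:

- every entry is exactly 20 bytes, including the two-byte end of line;
- the free-list head entry has generation 65535.

Object 1 is assumed to be the catalog.

```python
def build_pdf(objects, header=b"%PDF-1.4\n"):
    """Serialize object bodies (object i+1 = objects[i]) with a correct xref table.

    Each body is the text between "N 0 obj" and "endobj"; object 1 must be the catalog.
    """
    out = bytearray(header)
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))                   # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                # 20-byte entry, head of free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off           # 20-byte entry per object
    out += b"trailer\n<</Size %d/Root 1 0 R>>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # %%%% renders as %%
    return bytes(out)


pdf = build_pdf([
    b"<</Type/Catalog/Pages 2 0 R>>",
    b"<</Type/Pages/Kids[3 0 R]/Count 1>>",
    b"<</Type/Page/Parent 2 0 R/MediaBox[0 0 595 792]>>",
])
print(pdf.decode("latin-1"))
```

When you edit a template by hand, regenerating it this way keeps the offsets consistent.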
#mkl, are you up for producing your best shot?
According to this Ange Albertini lecture, the smallest possible valid PDF is 36 bytes:
%PDF-(NULL)trailer<</Root<</Pages<<>>>>>>
Where (NULL) is the unprintable ASCII 0 character.
However, as Ange notes, while this PDF is technically valid, most PDF reader apps will regard it as invalid based on the size alone, thus failing to open it.
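For experimenting, those 36 bytes can be written as a Python bytes literal, with \x00 standing for the (NULL) byte. TINIEST_PDF is just an illustrative name. As noted above, most readers are expected to refuse to open the file.

```python
# The 36-byte PDF from Ange Albertini's lecture; b"\x00" is the (NULL) byte.
TINIEST_PDF = b"%PDF-\x00trailer<</Root<</Pages<<>>>>>>"


def save(path: str) -> int:
    """Write the bytes unchanged (binary mode); return the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(TINIEST_PDF)


print(len(TINIEST_PDF))   # 36
```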
I needed a PDF that is usable by a PDF converter (an A4 format issue: all the above constructs worked with Adobe Reader and Chrome, but not with the PDF converter, which required DIN A4).
I found this site and this PDF worked fine with the PDF converter I'm using: https://help.callassoftware.com/m/73261/l/798383-how-to-create-a-simple-pdf-file
Working for a PDF-related company, I know that the following content works pretty well. This is a valid empty page (its MediaBox [0 0 612 792] is US Letter size; an A4 page would use [0 0 595 842]):
%PDF-1.4
%âãÏÓ
5 0 obj
<<
/Length 1
>>
stream
endstream
endobj
4 0 obj
<<
/Type /Page
/MediaBox [0 0 612 792]
/Resources <<
>>
/Contents 5 0 R
/Parent 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/Kids [4 0 R]
/Count 1
>>
endobj
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
3 0 obj
<<
/Creator (PDF Creator http://www.pdf-tools.com)
/CreationDate (D:20150701112447+02'00')
/ModDate (D:20220607183602+02'00')
/Producer (3-Heights\222 PDF Optimization Shell 6.0.0.0 \(http://www.pdf-tools.com\))
>>
endobj
xref
0 6
0000000000 65535 f
0000000226 00000 n
0000000169 00000 n
0000000275 00000 n
0000000065 00000 n
0000000015 00000 n
trailer
<<
/Size 6
/Root 1 0 R
/Info 3 0 R
/ID [<1C3500CA9F7232B97E0EF3F789E8B7F2> <254C8D153F655D49945EAD68D801E011>]
>>
startxref
505
%%EOF
Now, using JavaScript, you can embed this into your JS bundle. First base64-encode the content above, then decode the string and create a Blob from it:
const str = 'JVBERi0xLjQKJcOiw6PDj8OTCjUgMCBvYmoKPDwKL0xlbmd0aCAxCj4+CnN0cmVhbQogCmVuZHN0cmVhbQplbmRvYmoKNCAwIG9iago8PAovVHlwZSAvUGFnZQovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Cj4+Ci9Db250ZW50cyA1IDAgUgovUGFyZW50IDIgMCBSCj4+CmVuZG9iagoyIDAgb2JqCjw8Ci9UeXBlIC9QYWdlcwovS2lkcyBbNCAwIFJdCi9Db3VudCAxCj4+CmVuZG9iagoxIDAgb2JqCjw8Ci9UeXBlIC9DYXRhbG9nCi9QYWdlcyAyIDAgUgo+PgplbmRvYmoKMyAwIG9iago8PAovQ3JlYXRvciAoUERGIENyZWF0b3IgaHR0cDovL3d3dy5wZGYtdG9vbHMuY29tKQovQ3JlYXRpb25EYXRlIChEOjIwMTUwNzAxMTEyNDQ3KzAyJzAwJykKL01vZERhdGUgKEQ6MjAyMjA2MDcxODM2MDIrMDInMDAnKQovUHJvZHVjZXIgKDMtSGVpZ2h0c1wyMjIgUERGIE9wdGltaXphdGlvbiBTaGVsbCA2LjAuMC4wIFwoaHR0cDovL3d3dy5wZGYtdG9vbHMuY29tXCkpCj4+CmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYKMDAwMDAwMDIyNiAwMDAwMCBuCjAwMDAwMDAxNjkgMDAwMDAgbgowMDAwMDAwMjc1IDAwMDAwIG4KMDAwMDAwMDA2NSAwMDAwMCBuCjAwMDAwMDAwMTUgMDAwMDAgbgp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKL0luZm8gMyAwIFIKL0lEIFs8MUMzNTAwQ0E5RjcyMzJCOTdFMEVGM0Y3ODlFOEI3RjI+IDwyNTRDOEQxNTNGNjU1RDQ5OTQ1RUFENjhEODAxRTAxMT5dCj4+CnN0YXJ0eHJlZgo1MDUKJSVFT0Y=';
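// atob() yields a binary string; Blob would UTF-8-encode chars >= 0x80 (e.g. in %âãÏÓ), so convert to bytes first.
const bytes = Uint8Array.from(atob(str), (c) => c.charCodeAt(0));
const blob = new Blob([bytes], { type: 'application/pdf' });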
In Java, you can embed the 67-byte minimal PDF from above as a hex string:
private static String samplepdf = "255044462D312E0D747261696C65723C3C2F526F6F743C3C2F50616765733C3C2F4B6964735B3C3C2F4D65646961426F785B302030203320335D3E3E5D3E3E3E3E3E3E";
and then
byte[] bytes = hexStringToByteArray(samplepdf);
...
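// Converts a hex string such as "2550" into {0x25, 0x50}; assumes an even number of hex digits.
public static byte[] hexStringToByteArray(String s) {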
int len = s.length();
byte[] data = new byte[len / 2];
for (int i = 0; i < len; i += 2) {
data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
+ Character.digit(s.charAt(i + 1), 16));
}
return data;
}