PDF Cross-Reference Stream - pdf

I know that 0002 is object's number in pdf file. Last column's(from 01 to 07) name was "index" (In pdf reference document ) But what does 01,02,03,04,05,06 and 07 mean? Where they are pointing in object 2? I didn't get it.
Here is the Cross Reference Stream :
stream 01 0E8A 0 % Entry for object 2 (0x0E8A = 3722)
SyntaxCHAPTER 3 114
02 0002 00 % Entry for object 3 (in object stream 2, index 0)
02 0002 01 % Entry for object 4 (in object stream 2, index 1)
02 0002 02 % …
02 0002 03
02 0002 04
02 0002 05
02 0002 06
02 0002 07 % Entry for object 10 (in object stream 2, index 7)
01 1323 0 % Entry for object 11 (0x1323 = 4899)
endstream
and here is the second object in the example pdf document:
2 0 obj % The object stream, at offset 3722
<</Length ...
/N 8 % This stream contains 8 objects.
/First 47 % The stream-offset of the first object
>>
stream
3 0 4 50 5 72 … % The numbers and stream-offsets of the 8 objects
<</Type /StructTreeRoot % This is object 3.
/K 4 0 R
/RoleMap 5 0 R
/ClassMap 6 0 R
/ParentTree 7 0 R
/ParentTreeNextKey 8
>>
<< /S /Workbook % This is object 4 (K value from StructTreeRoot).
/P 8 0 R
/K 9 0 R
>>
<</Workbook /Div % This is object 5 (RoleMap).
/Worksheet /Sect
/TextBox /Figure
/Shape /Figure
>>
… % Objects 6 through 10 are defined here.
endstream
endobj

Look inside the PDF 32000 standard
According to the cross-reference streams' chapter, there're three types of entries that presents in xref streams.
These entries (with first byte = 02) tell us that they're located in the compressed stream object #2. Third field makes sense about number of object inside this compressed stream

Related

Creating DXF Spline programmatically

I need to create spline programmatically. I've made something like:
0
SECTION
2
HEADER
9
$ACADVER
1
AC1006
0
ENDSEC
0
SECTION
2
TABLES
0
TABLE
2
LAYER
0
LAYER
2
shape
70
64
62
250
6
CONTINUOUS
0
LAYER
2
holes
70
64
62
250
6
CONTINUOUS
0
ENDTAB
0
ENDSEC
0
SECTION
2
ENTITIES
0
SPLINE
8
shape
100
AcDbSpline
210
0
220
0
230
1
70
4
71
3
72
11
73
4
74
4
42
0.0000001
43
0.0000001
44
0.0000000001
40
0
40
0
40
0
40
0
40
1
40
1
40
1
40
2
40
2
40
2
40
2
10
0
20
0
30
0
10
100
20
50
30
0
10
40
20
40
30
0
10
15
20
23
30
0
11
0
21
0
31
0
11
200
21
200
31
0
11
80
21
80
31
0
11
432
21
234
31
0
0
ENDSEC
0
EOF
When I'm trying to open it in Autodesk TrueView, I'm getting an error:
Undefined group code 210 for object on line 54.
Invalid or incomplete DXF input -- drawing discarded.
Where is the error? When I'm copying just SPLINE section to the DXF generated by AI everything works fine. So I think I need to add something in the header section or something.
This file is DXF version AC1006 which is older than DXF R12. The SPLINE entity
requires at least DXF version AC1012 DXF R13/R14. But with DXF version AC1012
the tag structure of DXF files is changed (OBJECTS and CLASSES sections, SubClassMarkers ...), so just editing the DXF version
does not work.
See also: http://ezdxf.readthedocs.io/en/latest/dxfinternals/filestructure.html#minimal-dxf-content
Also the SPLINE entity seems to be invalid, it has no handle (5) and no owner
tag (330), and the whole AcDbEntity subclass is missing.
Your spline is of degree 3 with 11 knots (0, 0,0,0,1,1,1,2,2,2,2) and 4 control points ( (0,0), (100,50),(40,40),(15,23) ). This might be the problem culprit. You should either have 4 control points and 8 knots or 7 control points and 11 knots.
You may need to assign a handle to the SPLINE, since you're specifying $ACADVER = AC1018 = AutoCAD 2004 where item handles are required.
Try adding a 5-code pair right before the layer designation, like so, where AAAA is a unique hex-encoded handle:
  0
SPLINE
5 <-- add these two lines
AAAA <--
8
shape
100
AcDbSpline

Can I fetch the result of a SELECT query iteratively using LIMIT and OFFSET?

I am implementing a FUSE file system that uses a sqlite3 database for its backend. I do not plan to ever change the database backend as my file system uses sqlite3 as a file format. One of the functions a file system must implement is readdir function. This function allows the process to iteratively read the contents of a directory by repeatedly calling it and getting the next few directory entries (as many as the buffer can hold). The directory entries are returned may be returned in any order. I want to implement this operation with the following query:
SELECT fileno, name FROM dirents WHERE dirno = ? LIMIT -1 OFFSET ?;
where dirno is the directory I'm reading from and OFFSET ? is the number of entries I've already returned. I want to read as many rows as I can fit into the buffer (I cannot predict the count as these are variable-length records depending on the length of the file name) and then reset the query.
Due to the stateless nature of FUSE, keeping open a query and returning the next few rows until the directory has ended is not an option as I cannot reliably detect if the process closes the directory prematurely.
The dirents table has the following schema:
CREATE TABLE dirents (
dirno INTEGER NOT NULL REFERENCES inodes(inum),
fileno INTEGER NOT NULL REFERENCES inodes(inum),
name TEXT NOT NULL,
PRIMARY KEY (dirno, name)
) WITHOUT ROWID;
Question
In theory, a SELECT statement yields rows in no defined order. In practice, can I assume that when I execute the same prepared SELECT statement multiple times with successively larger OFFSET values, I get the same results as if I read all the data in a single query, i.e. the row order is the same unspecified order each time? An assumption that currently holds is that the database is not modified between queries.
Can I assume that the row order stays reasonably similar when a different query modifies the dirents table inbetween? Some glitches (e.g. directory entries appearing twice) will of course be observable by the program, but for usability (the main user of readdir is the ls command) it is highly useful if the directory listing is usually mostly correct.
If I cannot make these assumptions, what is a better way to reach the desired result?
I know that I could throw in an ORDER BY clause to make the row order well-defined, but I fear that this might have a strong impact on performance, especially when reading small chunks of a large directory—the directory has to be sorted every time a chunk is read.
The right solution to this problem is to use order by. If you are concerned about the performance of the order by, then use an index on the column used for the order by.
The simplest approach, in my opinion, would be to remove the without rowid option on the table creation. Then you can just access the table as:
SELECT fileno, name
FROM dirents
WHERE dirno = ?
ORDER BY rowid
LIMIT -1 OFFSET ?;
I realize this adds additional bytes to each row, but it is for a good purpose -- making sure that your queries are correct.
Actually, the best index for this table would be on dirno, rowid, fileno, name. Given the where clause, you are doing a full table scan anyway, unless you have an index.
If I add an ORDER BY name clause to the SELECT query, sqlite3 generates almost identical (except for a stray Noop) bytecode for the query but a row order is guaranteed:
sqlite> EXPLAIN SELECT fileno, name FROM dirents WHERE dirno = ? LIMIT -1 OFFSET ?;
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 21 0 00
1 Integer -1 1 0 00
2 Variable 2 2 0 00
3 MustBeInt 2 0 0 00
4 SetIfNotPos 2 2 0 00
5 Add 1 2 3 00
6 SetIfNotPos 1 3 -1 00
7 OpenRead 1 3 0 k(2,nil,nil) 02
8 Variable 1 4 0 00
9 IsNull 4 19 0 00
10 Affinity 4 1 0 D 00
11 SeekGE 1 19 4 1 00
12 IdxGT 1 19 4 1 00
13 IfPos 2 18 1 00
14 Column 1 2 5 00
15 Column 1 1 6 00
16 ResultRow 5 2 0 00
17 DecrJumpZero 1 19 0 00
18 Next 1 12 0 00
19 Close 1 0 0 00
20 Halt 0 0 0 00
21 Transaction 0 0 3 0 01
22 TableLock 0 3 0 dirents 00
23 Goto 0 1 0 00
sqlite> EXPLAIN SELECT fileno, name FROM dirents WHERE dirno = ? ORDER BY name LIMIT -1 OFFSET ?;
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 22 0 00
1 Noop 0 0 0 00
2 Integer -1 1 0 00
3 Variable 2 2 0 00
4 MustBeInt 2 0 0 00
5 SetIfNotPos 2 2 0 00
6 Add 1 2 3 00
7 SetIfNotPos 1 3 -1 00
8 OpenRead 2 3 0 k(2,nil,nil) 02
9 Variable 1 4 0 00
10 IsNull 4 20 0 00
11 Affinity 4 1 0 D 00
12 SeekGE 2 20 4 1 00
13 IdxGT 2 20 4 1 00
14 IfPos 2 19 1 00
15 Column 2 2 5 00
16 Column 2 1 6 00
17 ResultRow 5 2 0 00
18 DecrJumpZero 1 20 0 00
19 Next 2 13 0 00
20 Close 2 0 0 00
21 Halt 0 0 0 00
22 Transaction 0 0 3 0 01
23 TableLock 0 3 0 dirents 00
24 Goto 0 1 0 00
So I guess I'm going for that ORDER BY name clause.

understanding bit shifting in registers

I have to update the 32 bit register using these data
which includes bit shifting, I am confused about two things:
Which is the LSB and which is the MSB,
what is this operator |
Given the expression:
3 << 0 |
7 << 3 |
1 << 6 |
0 << 7 |
1 << 7 |
0 << 8 |
0 << 10 |
0 << 11 |
0 << 12 |
0 << 13 |
0 << 14
and the remaining 15 bits are 0.
How is the data shifted, assuming initial bits in register to be 0's?
011 111 1 0 1 0 0 0 0 0 0 X.......X
or
x .....X 0 0 0 0 0 0 1 0 1 111 011
The LSB (least-significant bit) is the bit whose value represents 1 (2^0) and the MSB is the bit whose value represents 2^(n-1), where n is the number of bits in the register. In general, when written out in binary, the MSB is the left-most bit and the LSB is the right-most bit. More often than not, the LSB is shown as bit 0 in hardware documentation, though I know one company that reverses the bit numbering so that the MSB is numbered 0.
<< is the C shift-left operator, shifting bit away from the LSB and toward the MSB. Therefore 7<<3 represents 111000 in binary.
| is the C bit-wise OR operator. This is used to combine values where resulting bits are one if either of the corresponding input bits is one.
Looking at your original value 3 << 0 | 7 << 3 | 1 << 6 | 0 << 7 | 1 << 7 | 0 << 8 | 0 << 10 | 0 << 11| 0 << 12 | 0 << 13 | 0 << 14
0000 0000 0000 0011 from 3<<0
0000 0000 0011 1000 from 7<<3
0000 0000 0100 0000 from 1<<6
0000 0000 0000 0000 from 0<<7
etc.
This type of construct is typically used to describe a value going into a register noting the individual fields of the register.

merge similar files in awk

I have the following files
File A
Kmax Event File - Text Format
1 6 1000
1 4143 9256 13645 16426 20490
49 4144 8820 14751 16529 20505
45 4146 8308 12303 16912 22715
75 4139 9049 14408 16447 20480
23 4137 8449 13223 16511 20498
22 4142 8795 14955 16615 20493
File B
Kmax Event File - Text Format
1 6 1000
42 4143 9203 13401 16475 20480
44 4140 8251 12302 16932 21872
849 6283 8455 12301 16415 20673
18 4148 8238 12757 16597 20484
19 4144 8268 12306 17110 21868
50 4134 8331 12663 16501 20606
988 5682 8296 12306 16577 20592
61 4147 8330 12307 16945 22497
0 4138 8333 12310 16871 22749
File C, File D, ... and all those files have exact the same format. In addition the file name of each file is the following : run, run%1, run%2, run%3, run%4 etc. The file number could reach even up to 30, run%30 that is.
What I'd like to do is to merge the files in the following way
Kmax Event File - Text Format
1 6 1000
1 4143 9256 13645 16426 20490
49 4144 8820 14751 16529 20505
45 4146 8308 12303 16912 22715
75 4139 9049 14408 16447 20480
23 4137 8449 13223 16511 20498
22 4142 8795 14955 16615 2049
42 4143 9203 13401 16475 20480
44 4140 8251 12302 16932 21872
849 6283 8455 12301 16415 20673
18 4148 8238 12757 16597 20484
19 4144 8268 12306 17110 21868
50 4134 8331 12663 16501 20606
988 5682 8296 12306 16577 20592
61 4147 8330 12307 16945 22497
0 4138 8333 12310 16871 22749
I believe I can do it using
awk '{i=$1;sub(i,x);A[i]=A[i]$0} FILENAME==ARGV[ARGC-1]{print i A[i]}'
but in this way the two first lines of the second file will be present. In addition I don't know if the above line will work. The problem is that I will need to merge many files at the same time. Any idea to merge those almost identical files?
Using grouping braces in the shell
{ cat run; sed '1,2d' run%*; } > c
You can use cat and tail:
cat A > C && tail -n +3 B >> C
This will merge file A and B in a new file named C.
Using awk:
awk 'FNR==NR{print; next} FNR>2' A B > C
If you have more than one file to merge into one, you can list them next to A B in the awk version, e.g A B D. C in awk version is the output file containing merged data.
In cat and tail version you can repeat tail part of the code for other files, e.g
cat A > C && tail -n +3 B >> C && tail -n +3 D >> C
or create some kind of loop to iterate over files.
Print all lines from the first file (NR==FNR) and only line 3 and on from the rest of the files (FNR>2):
awk 'NR==FNR||FNR>2' run*

What is the smallest possible valid PDF?

Out of simple curiosity, having seen the smallest GIF, what is the smallest possible valid PDF file?
This is an interesting problem. Taking it by the book, you can start off with this:
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
xref
0 4
0000000000 65535 f
0000000010 00000 n
0000000053 00000 n
0000000102 00000 n
trailer<</Size 4/Root 1 0 R>>
startxref
149
%EOF
which is 291 bytes of PDF joy. Acrobat opens it, but it complains somewhat. There is one page in it and it is 3/72" square, the minimum allowed by the spec.
However, Acrobat X doesn't even bother with the cross reference table anymore, so we can take that out:
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
trailer<</Size 4/Root 1 0 R>>
Acrobat complains, but opens it. Now we're at 178 bytes.
Turns out that you don't need that /Size in the trailer. Now we're at 172:
%PDF-1.0
1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj 3 0 obj<</Type/Page/MediaBox[0 0 3 3]>>endobj
trailer<</Root 1 0 R>>
Turns out you don't need all those pesky /Type elements in your dictionaries:
%PDF-1.0
1 0 obj<</Pages 2 0 R>>endobj 2 0 obj<</Kids[3 0 R]/Count 1>>endobj 3 0 obj<</MediaBox[0 0 3 3]>>endobj
trailer<</Root 1 0 R>>
Now we're at 138 bytes.
It also turns out that when the spec says "shall be an indirect reference" and /Count is required, and the header "must" be %PDF-1.0, they're making loose suggestions. This is the smallest I could make it and have it openable in Acrobat X:
%PDF-1.
trailer<</Root<</Pages<</Kids[<</MediaBox[0 0 3 3]>>]>>>>>>
70 bytes.
Now, my editor uses Windows newline discipline, but Acrobat accepts Windows, Mac, or Unix conventions, so by using a hex editor, I replaced the \r\n with \r and removed the last newline altogether, which leaves me with 67 bytes
25 50 44 46 2D 31 2E 0D 74 72 61 69 6C 65 72 3C
3C 2F 52 6F 6F 74 3C 3C 2F 50 61 67 65 73 3C 3C
2F 4B 69 64 73 5B 3C 3C 2F 4D 65 64 69 61 42 6F
78 5B 30 20 30 20 33 20 33 5D 3E 3E 5D 3E 3E 3E
3E 3E 3E
I tried taking off the last end dictionary (>>), but Acrobat wouldn't have that. The PDF reading built-in to Google Chrome (FoxIt) won't open it.
As a PostScript (HA! See what I did there?), if you consent to Acrobat "repairing" the file, it bumps up to 3550 bytes, most of it optional metadata, but it leaves behind a number of clear spec violations.
I could not get the hello world example to open.
For a small-ish file with text content :
%PDF-1.2
9 0 obj
<<
>>
stream
BT/ 9 Tf(Test)' ET
endstream
endobj
4 0 obj
<<
/Type /Page
/Parent 5 0 R
/Contents 9 0 R
>>
endobj
5 0 obj
<<
/Kids [4 0 R ]
/Count 1
/Type /Pages
/MediaBox [ 0 0 99 9 ]
>>
endobj
3 0 obj
<<
/Pages 5 0 R
/Type /Catalog
>>
endobj
trailer
<<
/Root 3 0 R
>>
%%EOF
Based on all the answers here, here's the smallest PDF with text:
SMALL_PDF = (
b"%PDF-1.2 \n"
b"9 0 obj\n<<\n>>\nstream\nBT/ 32 Tf( YOUR TEXT HERE )' ET\nendstream\nendobj\n"
b"4 0 obj\n<<\n/Type /Page\n/Parent 5 0 R\n/Contents 9 0 R\n>>\nendobj\n"
b"5 0 obj\n<<\n/Kids [4 0 R ]\n/Count 1\n/Type /Pages\n/MediaBox [ 0 0 250 50 ]\n>>\nendobj\n"
b"3 0 obj\n<<\n/Pages 5 0 R\n/Type /Catalog\n>>\nendobj\n"
b"trailer\n<<\n/Root 3 0 R\n>>\n"
b"%%EOF"
)
As base64. Copy this and test in Chrome:
data:application/pdf;base64,JVBERi0xLjIgCjkgMCBvYmoKPDwKPj4Kc3RyZWFtCkJULyAzMiBUZiggIFlPVVIgVEVYVCBIRVJFICAgKScgRVQKZW5kc3RyZWFtCmVuZG9iago0IDAgb2JqCjw8Ci9UeXBlIC9QYWdlCi9QYXJlbnQgNSAwIFIKL0NvbnRlbnRzIDkgMCBSCj4+CmVuZG9iago1IDAgb2JqCjw8Ci9LaWRzIFs0IDAgUiBdCi9Db3VudCAxCi9UeXBlIC9QYWdlcwovTWVkaWFCb3ggWyAwIDAgMjUwIDUwIF0KPj4KZW5kb2JqCjMgMCBvYmoKPDwKL1BhZ2VzIDUgMCBSCi9UeXBlIC9DYXRhbG9nCj4+CmVuZG9iagp0cmFpbGVyCjw8Ci9Sb290IDMgMCBSCj4+CiUlRU9G
To make the page bigger, adjust the MediaBox dimensions :)
/MediaBox [ 0 0 250 50 ]
I thought I'd make a smallest pdf that displays "Hello World". The text is in the lower left corner. Sorry about the 9-point font, any larger would cost an extra byte :)
172 bytes for Adobe Reader X (if saved with linefeed-only newlines and no trailing newline or null-byte):
%PDF-1.
1 0 obj<</Kids[<</Parent 1 0 R/Resources<<>>/Contents 2 0 R>>]>>endobj 2 0 obj<<>>stream
BT/ 9 Tf(Hello World)' ET
endstream
endobj trailer<</Root<</Pages 1 0 R>>>>
120 bytes for Chrome's builtin PDF viewer:
%PDF 1 0 obj<</Pages<</Kids[<</Contents<<>>stream
BT 9 Tf(Hello World)' ET endstream>>]>>>>endobj trailer<</Root 1 0 R>>
To easily see this in Chrome, paste this URI in the address bar (SO won't let me link to it, and it won't work at all in other browsers):
data:application/pdf,%25PDF%201%200%20obj%3C%3C%2FPages%3C%3C%2FKids%5B%3C%3C%2FContents%3C%3C%3E%3Estream%0ABT%209%20Tf(Hello%20World)'%20ET%20endstream%3E%3E%5D%3E%3E%3E%3Eendobj%20trailer%3C%3C%2FRoot%201%200%20R%3E%3E
I was going to give an example of what I thought was the minimal valid "universal" PDF. until I noticed that the whole ethos of using a PDF is to ensure it will render exactly the same on all devices and their PDF readers. However on cross checking my "perfectly small well formed PDF" I spotted this. TL;DR this is fixed in my personal minimal text template (at the end)
So the ground rule was "smallest possible valid PDF" but I consider this shortage should count as an invalid PDF since it does not adhere to the concept of "Fit for Purpose" thus the minimum PDF must itself as a minimum contain a minimum of one means of fixing a working font.
To explain my proposed solution and why its less than perfect here it is in a rough form because of cut and paste.
%PDF-1.0
%µ¶
1 0 obj
<</Type/Catalog/Pages 2 0 R>>
endobj
2 0 obj
<</Kids[3 0 R]/Count 1/Type/Pages/MediaBox[0 0 595 792]>>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Contents 4 0 R/Resources<<>>>>
endobj
4 0 obj
<</Length 58>>
stream
q
BT
/ 96 Tf
1 0 0 1 36 684 Tm
(Hello World!) Tj
ET
Q
endstream
endobj
xref
0 5
0000000000 65536 f
0000000016 00000 n
0000000062 00000 n
0000000136 00000 n
0000000209 00000 n
trailer
<</Size 5/Root 1 0 R>>
startxref
316
%%EOF
Whilst not defined by the rules of the question I have included some past experience of user problems.
The first difference you might note is media box in 2nd obj is a hybrid MediaBox[0 0 595 792] which is a minimax A4 width and minimax US Letter high, since otherwise the "universal page" in most countries would force a second sheet # 100% scale printing either for too wide or too high a page definition for the locale defaults.
And the current problem is evidenced in 3rd obj as no fonts have been set for resources, thus in aiming for minimal the PDF, I contest without a font defined, will be Invalid.
Thus none of the answers so far including my own, appear to produce a PDF that will "WORK" as a "VALID" means to produce the same printout, regardless of platform or viewer.
Turning to libraries I found a 3MB zip with an exceptionally versatile windows.exe (a single file that can do most pdf functions like split merge import stamp export attachments etc.) which can take "Hello World! in a command line and produce a good working file, this is page centre zoomed in
it uses a stream for the text and its positioning, and has other conforming data like producer so I offer this as a potentially good minimal to pare down, note as presented this file will appear blank due to stream corruption from binary to text.
%PDF-1.7
%µ¶
1 0 obj
<</Pages 2 0 R/Type/Catalog>>
endobj
2 0 obj
<</Count 1/Kids[5 0 R]/MediaBox[0 0 595 792]/Type/Pages>>
endobj
3 0 obj
<</BaseFont/Helvetica/Encoding/WinAnsiEncoding/Subtype/Type1/Type/Font>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 101>>
stream
xœ*Tp
QÐw3P04Ò30PISp
Q01
à˜kdf¢ga¬`bhâ%ç‚ô(„”#©Aîè"EéÚlA
HW‘‚†GjNN¾Bx~QNŠ¢¦BHÈÞ## ÿÿFå
endstream
endobj
5 0 obj
<</Contents 4 0 R/CropBox[0 0 595 792]/MediaBox[0 0 595 792]/Parent 2 0 R/Resources<</Font<</F0 3 0 R>>>>/Type/Page>>
endobj
6 0 obj
<</CreationDate(D:20220600600709+01'00')/ModDate(D:20220600600709+01'00')/Producer(me 2)>>
endobj
xref
0 7
0000000000 65536 f
0000000016 00000 n
0000000062 00000 n
0000000136 00000 n
0000000225 00000 n
0000000395 00000 n
0000000529 00000 n
trailer
<</Size 7/Info 6 0 R/Root 1 0 R/ID[<A2A0CE5CCD9D0DABD5845AD574BF0A5C><09BF9D281BE12CB5B5933BB2B62B0D4D>]>>
startxref
636
%%EOF
P.S I deliberately added a non valid item so is intentionally not the minimum working answer, see if you can work out what's clearly wrong:-)
My personal offering
So I am often asked how to write plain text templated PDFs thus need the font to be static (Helvetica or Courier should do) and a structure that is easy to modify using windows CMD line, so this suits my purpose its now 698 bytes as shown with two place holders to show multi-line so if needed can find and replace Helvetica with Courier (note intentional 2 spaces after to keep byte count)
%PDF-1.1
%âã
1 0 obj
<</Type/Catalog/Pages<</Type/Pages/Count 1/Kids[2 0 R]>>>>
endobj
2 0 obj
<</Type/Page/Parent 1 0 R/MediaBox[0 0 594 792]/Resources<</Font<</F1 3 0 R>>/ProcSet[/PDF/Text]>>/Contents 4 0 R>>
endobj
3 0 obj
<</Type/Font/Subtype/Type1/Name/F1/BaseFont/Helvetica>>
endobj
4 0 obj
<</Length 5 0 R>>
stream
BT
/F1 36 Tf
1 0 0 1 255 752 Tm
48 TL
( Hello)'
(World!)'
ET
endstream
endobj
5 0 obj
78
endobj
xref
0 6
0000000000 65536 f
0000000017 00000 n
0000000094 00000 n
0000000228 00000 n
0000000302 00000 n
0000000425 00000 n
trailer
<</Size 6/Info <</CreationDate(D:2023)/Producer(cmd2pdf)/Title(mini.pdf)>>/Root 1 0 R>>
startxref
446
%%EOF
To see how this approach works in windows command line RIGHT CLICK and download as text https://github.com/GitHubRulesOK/MyNotes/raw/master/MAKE-PDF.cmd (now 200 lines long!) NOTE browser security may ask you to trust a cmd as download thus use .txt extension and you will still need to change properties to UNBLOCK once you are happy it should do no harm to run it!
#mkl are you up for producing your best shot ?
According to this Ange Albertini lecture, the smallest possible valid PDF is 36 bytes:
%PDF-(NULL)trailer<</Root<</Pages<<>>>>>>
Where (NULL) is the unprintable ASCII 0 character.
However, as Ange notes, while this PDF is technically valid, most PDF reader apps will regard it as invalid based on the size alone, thus failing to open it.
I needed a PDF version which is usable by a PDF converter (A4 format issue.. all the above constructs worked with Adobe Reader and Chrome, but not with the PDF converter which required DIN A4).
I found this site and this PDF worked fine with the PDF converter I'm using: https://help.callassoftware.com/m/73261/l/798383-how-to-create-a-simple-pdf-file
Working for a PDF related company, I know that the following content will be working pretty well. This is a valid empty A4 page:
%PDF-1.4
%âãÏÓ
5 0 obj
<<
/Length 1
>>
stream
endstream
endobj
4 0 obj
<<
/Type /Page
/MediaBox [0 0 612 792]
/Resources <<
>>
/Contents 5 0 R
/Parent 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/Kids [4 0 R]
/Count 1
>>
endobj
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
3 0 obj
<<
/Creator (PDF Creator http://www.pdf-tools.com)
/CreationDate (D:20150701112447+02'00')
/ModDate (D:20220607183602+02'00')
/Producer (3-Heights\222 PDF Optimization Shell 6.0.0.0 \(http://www.pdf-tools.com\))
>>
endobj
xref
0 6
0000000000 65535 f
0000000226 00000 n
0000000169 00000 n
0000000275 00000 n
0000000065 00000 n
0000000015 00000 n
trailer
<<
/Size 6
/Root 1 0 R
/Info 3 0 R
/ID [<1C3500CA9F7232B97E0EF3F789E8B7F2> <254C8D153F655D49945EAD68D801E011>]
>>
startxref
505
%%EOF
Now using Javascript, you can embed this into your js bundle. First encode in base64 the content above, then use the encoded string and create a Blob file with it by writing:
const str = 'JVBERi0xLjQKJcOiw6PDj8OTCjUgMCBvYmoKPDwKL0xlbmd0aCAxCj4+CnN0cmVhbQogCmVuZHN0cmVhbQplbmRvYmoKNCAwIG9iago8PAovVHlwZSAvUGFnZQovTWVkaWFCb3ggWzAgMCA2MTIgNzkyXQovUmVzb3VyY2VzIDw8Cj4+Ci9Db250ZW50cyA1IDAgUgovUGFyZW50IDIgMCBSCj4+CmVuZG9iagoyIDAgb2JqCjw8Ci9UeXBlIC9QYWdlcwovS2lkcyBbNCAwIFJdCi9Db3VudCAxCj4+CmVuZG9iagoxIDAgb2JqCjw8Ci9UeXBlIC9DYXRhbG9nCi9QYWdlcyAyIDAgUgo+PgplbmRvYmoKMyAwIG9iago8PAovQ3JlYXRvciAoUERGIENyZWF0b3IgaHR0cDovL3d3dy5wZGYtdG9vbHMuY29tKQovQ3JlYXRpb25EYXRlIChEOjIwMTUwNzAxMTEyNDQ3KzAyJzAwJykKL01vZERhdGUgKEQ6MjAyMjA2MDcxODM2MDIrMDInMDAnKQovUHJvZHVjZXIgKDMtSGVpZ2h0c1wyMjIgUERGIE9wdGltaXphdGlvbiBTaGVsbCA2LjAuMC4wIFwoaHR0cDovL3d3dy5wZGYtdG9vbHMuY29tXCkpCj4+CmVuZG9iagp4cmVmCjAgNgowMDAwMDAwMDAwIDY1NTM1IGYKMDAwMDAwMDIyNiAwMDAwMCBuCjAwMDAwMDAxNjkgMDAwMDAgbgowMDAwMDAwMjc1IDAwMDAwIG4KMDAwMDAwMDA2NSAwMDAwMCBuCjAwMDAwMDAwMTUgMDAwMDAgbgp0cmFpbGVyCjw8Ci9TaXplIDYKL1Jvb3QgMSAwIFIKL0luZm8gMyAwIFIKL0lEIFs8MUMzNTAwQ0E5RjcyMzJCOTdFMEVGM0Y3ODlFOEI3RjI+IDwyNTRDOEQxNTNGNjU1RDQ5OTQ1RUFENjhEODAxRTAxMT5dCj4+CnN0YXJ0eHJlZgo1MDUKJSVFT0Y=';
const blob = new Blob([atob(str)], { type: 'application/pdf' });
In Java, use this:
private static String samplepdf = "255044462D312E0D747261696C65723C3C2F526F6F743C3C2F50616765733C3C2F4B6964735B3C3C2F4D65646961426F785B302030203320335D3E3E5D3E3E3E3E3E3E";
and then
byte[] bytes = hexStringToByteArray(samplepdf);
...
public byte[] hexStringToByteArray(String s) {
int len = s.length();
byte[] data = new byte[len / 2];
for (int i = 0; i < len; i += 2) {
data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
+ Character.digit(s.charAt(i + 1), 16));
}
return data;
}