Is it possible to define clickable links in a plainTeX document when compiled with pdftex?
As far as I can see there is no support in plainTeX for this feature.
For creating clickable links in PDF documents generated from a plain TeX document, you can use this code:
\newif\ifpdfmode
\ifx\pdfoutput\undefined
\else
\ifnum\pdfoutput>0 \pdfmodetrue\fi
\fi
\def\url{%
% turn off the special meaning of ~ and _ before the argument is read;
% catcode changes made after #1 has been absorbed would have no effect.
\begingroup\catcode`\~=12 \catcode`\_=12 \relax
\urlx}
\def\urlx#1{%
\ifpdfmode
\pdfstartlink user{
/Subtype /Link
% w/o this you get an ugly box around the URL.
/Border [ 0 0 0 ] % horizontal corner radius, vertical corner radius, border width
/A <<
/Type /Action
/S /URI
/URI (https://#1)
>>
}%
{\tt#1}%
\pdfendlink{}%
\else
%{\tt https\negthinspace:\negthinspace/\negthinspace/#1}%
{\tt#1}%
\fi
\endgroup}
that you can save in a file named lib/url.sty.
Note that it injects raw PDF code through pdfTeX's \pdfstartlink primitive, because plain TeX has no macro for links (pdfTeX only provides such low-level primitives).
Once that's done, it's just a question of using the \url macro in your TeX code:
\input lib/url.sty
My preferred site is \url{stackoverflow.com}!
\bye
The \url macro also works when the document is not compiled with pdfTeX. In this case \ifpdfmode stays false (the test at the top of the file only sets it to true when \pdfoutput is defined and positive), and the URL is simply typeset in the \tt font.
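One caveat: the macro copies #1 verbatim into a PDF literal string, so a URL containing a backslash or an unbalanced parenthesis would produce a malformed /URI string. The Python sketch below only illustrates the PDF side (it is not part of the TeX solution, and the function names are hypothetical): it shows the escaping a PDF literal string needs and the link dictionary the macro hands to \pdfstartlink for a given argument.

```python
def pdf_literal_string(text: str) -> str:
    """Escape text for use inside a PDF literal string ( ... ).

    Backslash and both parentheses are escaped (ISO 32000-1, 7.3.4.2);
    escaping them is always safe, even when parentheses are balanced.
    """
    escaped = (
        text.replace("\\", "\\\\")  # escape backslashes first
        .replace("(", "\\(")
        .replace(")", "\\)")
    )
    return "(" + escaped + ")"


def uri_link_dict(host_and_path: str) -> str:
    """Build the same /Link dictionary body the TeX url macro emits,
    prefixing https:// exactly like the macro does."""
    uri = pdf_literal_string("https://" + host_and_path)
    return (
        "/Subtype /Link "
        "/Border [ 0 0 0 ] "
        f"/A << /Type /Action /S /URI /URI {uri} >>"
    )


print(uri_link_dict("stackoverflow.com"))
print(pdf_literal_string("en.wikipedia.org/wiki/TeX_(disambiguation)"))
```

In TeX itself the same effect would require escaping these characters before they reach \pdfstartlink; the sketch just makes explicit which characters matter.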
You can find a "live" example here: https://github.com/madrisan/cv
I have converted a PDF file to PDF/A-2 with Ghostscript using
gs -dPDFA=2 -dPrinted=false -dBATCH -dNOPAUSE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=file_PDFA-2.pdf file.pdf
I have used ghostscript version 9.26 on Ubuntu 18.04.5 LTS.
I use Acrobat Reader to display the PDF document in double-page view. Before using Ghostscript, the first page is correctly shown on the right (see figure). The Ghostscript command apparently modifies this, which makes the document harder to read.
Is there a way in ghostscript to modify this setting? Unfortunately, I couldn't find an option. Using other freely available tools to modify the settings after using ghostscript would also work.
Edit: link to the pdf-files before and after the conversion:
link to the pdf-files
Thank you.
So looking at the before PDF file I see, in the document Catalog (after decompressing the file):
28 0 obj
<<
/Type /Catalog
/Pages 12 0 R
/Names 27 0 R
/PageMode /UseOutlines
/PageLayout /TwoPageRight
/OpenAction 4 0 R
>>
endobj
And in the after file I see the Catalog only contains:
1 0 obj
<<
/Type /Catalog
/Pages 3 0 R
/Metadata 24 0 R
>>
endobj
So it seems likely that the /PageLayout key having the value /TwoPageRight is what causes Acrobat to display the way it does. Note the /PageMode /UseOutlines, which appears to be pointless since the PDF file doesn't actually seem to contain any Outlines.
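To compare catalogs like the two above without eyeballing them, a small Python helper can pull a name-valued entry out of the decompressed catalog text. This is a sketch, not a PDF parser: it assumes the catalog is available as uncompressed text (e.g. after qpdf --qdf), and catalog_entry is a hypothetical name. The set lists the /PageLayout values defined in ISO 32000-1 (TwoPageLeft and TwoPageRight were added in PDF 1.5).

```python
import re

# Values defined for the catalog's /PageLayout key in ISO 32000-1.
PAGE_LAYOUTS = {
    "SinglePage", "OneColumn",
    "TwoColumnLeft", "TwoColumnRight",
    "TwoPageLeft", "TwoPageRight",
}


def catalog_entry(catalog_text: str, key: str):
    """Return the name value of /key in an uncompressed catalog, or None."""
    pattern = r"/" + re.escape(key) + r"\s*/([^\s/<>\[\]()]+)"
    match = re.search(pattern, catalog_text)
    return match.group(1) if match else None


# The two catalogs quoted above, flattened.
before = """<< /Type /Catalog /Pages 12 0 R /Names 27 0 R
/PageMode /UseOutlines /PageLayout /TwoPageRight /OpenAction 4 0 R >>"""
after = "<< /Type /Catalog /Pages 3 0 R /Metadata 24 0 R >>"

for label, text in (("before", before), ("after", after)):
    layout = catalog_entry(text, "PageLayout")
    print(label, layout, layout in PAGE_LAYOUTS)
```

Running it on the two catalogs makes the lost /PageLayout entry obvious.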
We can use a /PUT pdfmark to add entries to dictionaries, and that is supported by both the pdfwrite device and the interpreter.
[ {Catalog} << /PageLayout /TwoPageRight >> /PUT pdfmark
Should work, but unfortunately does not for me.
Disclaimer: I support SumatraPDF and am taking the question at face value, "How to modify the display settings of a pdf file in double-page view?", plus the stated desire, "Using other freely available tools to modify the settings after using ghostscript would also work."
So, without rotation, the pages can be displayed in 4 different layouts in a viewer. Here are the main 3:
[Figure: Facing, Manga (RTL), and Cover (Book) layouts]
I can manage to create a PDF/A-3 using Ghostscript's PDFA_def.ps file from a normal PDF, but similar to this answer for per-page embedded files, any non-PDF embedded attachments are stripped.
I found a way to embed files in this SO post, but it generates a PDF that fails PDF/A-3 validation. veraPDF reports the following 4 errors:
The additional information provided for associated files as well as the usage requirements for associated files indicate the relationship between the embedded file and the PDF document or the part of the PDF document with which it is associated.
CosFileSpecification
isAssociatedFile == true
In order to enable identification of the relationship between the file specification dictionary and the content that is referring to it, a new (required) key has been defined and its presence (in the dictionary) is required.
CosFileSpecification
AFRelationship != null
The MIME type of an embedded file, or a subset of a file, shall be specified using the Subtype key of the file specification dictionary. If the MIME type is not known, the "application/octet-stream" shall be used.
EmbeddedFile
Subtype != null && /^[-\w+.]+/[-\w+.]+$/.test(Subtype)
The file specification dictionary for an embedded file does not contain either F or EF key.
CosFileSpecification
EF_size == 0 || (F != null && UF != null)
How can I either avoid stripping or just re-attach embedded whole-document files to make a valid PDF/A-3B file via Ghostscript?
This requires modifications to the example PDFA_def.ps script. My answer is based on Ghostscript 9.27, the version packaged with Debian 10, found at /usr/share/ghostscript/9.27/lib/PDFA_def.ps or in the Ghostscript repository. You can use an updated version; it should function similarly. I will assume you have successfully edited the file to point to the correct path for the color profile (I will use ArgyllCMS's sRGB profile).
We can fairly easily get any file embedded by following existing pdfmark embedding tutorials, such as this email or this related question (as you found). The namespace push/pop isn't important, so I copied this PostScript code to the end of my PDFA_def.ps file:
[/_objdef {fstream} /type /stream /OBJ pdfmark % create an object as a stream
[{fstream} << /Type /EmbeddedFile >> /PUT pdfmark % sets the stream's type
% use one of these two options to define the contents of the stream
%[{fstream} (Alternatively, inline text) /PUT pdfmark
[{fstream} MyFileToEmbed (r) file /PUT pdfmark
% define the embedded file via the /EMBED pdfmark option
[/Name (my.txt)
/FS <<
/Type /Filespec
/F (my.txt)
/EF <<
/F {fstream}
>>
>> /EMBED pdfmark
[{fstream} /CLOSE pdfmark % close streams when done
Note that I used a file read from MyFileToEmbed, which must be defined on the gs command line as -sMyFileToEmbed=/path/to/my/file.txt. If you just want plain text, uncomment the first option (and remove the second line that references MyFileToEmbed). Spacing isn't generally important either.
This is presumably where you are: this embeds the file, but the result isn't a valid PDF/A-3, as you say. Let's look at each of the errors in turn:
The file specification dictionary for an embedded file shall contain the F and UF keys
This is the easiest to take care of, simply add a /UF key to the /FS dictionary.
First, however, a brief primer on the PostScript syntax (at least as used in this answer), as it's rather unusual, and even StackOverflow lacks syntax highlighting for it (I used the LaTeX highlighter in this post).
PostScript (PS) is a stack-based language with postfix notation: operands are pushed onto the stack first, and the operator that consumes them comes after them. What C-style languages would express as file(MyFileToEmbed, "r") is MyFileToEmbed (r) file. Comments start with %, and strings are not quoted but enclosed in parentheses. /foo is a name, roughly equivalent to :foo in Ruby, or 'foo in a Lisp. << ... >> is a dict, like { ... } in modern scripting languages. Here, {foo} is instead a pdfmark named object (see footnote 1 below), and the pdfmark lines we will use start with a mark: [ (see the PostScript or pdfmark spec for more details).
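To make the operand order concrete, here is a toy Python model of postfix evaluation. It only knows add, sub, mul and exch, run_postfix is a hypothetical name, and it is not how Ghostscript is implemented. Just as MyFileToEmbed (r) file pushes the file name, then the access string, and only then calls file, the input 10 3 sub pushes 10, then 3, and sub computes 10 - 3.

```python
def run_postfix(source: str) -> list:
    """Evaluate a tiny subset of PostScript-style postfix code.

    Numbers are pushed onto the operand stack; each operator pops its
    operands and pushes its result. Only add, sub, mul and exch are known.
    """
    stack = []
    for token in source.split():
        if token == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "sub":
            b, a = stack.pop(), stack.pop()  # top of stack is the 2nd operand
            stack.append(a - b)
        elif token == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "exch":
            stack[-1], stack[-2] = stack[-2], stack[-1]  # swap the top two
        else:
            stack.append(int(token))
    return stack


print(run_postfix("10 3 sub"))       # like 10 - 3 in C
print(run_postfix("2 3 add 4 mul"))  # like (2 + 3) * 4 in C
```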
With that knowledge in mind, we can update the code for /UF (spec page 102):
/F (my.txt)
/UF (my.txt) % Add this. Unicode file name, defined in the spec at Table 44, section 7.11.3
/EF <<
Running veraPDF shows we are now down to 3 errors, though we are duplicating (my.txt). Let's make a variable for it, either by adding a gs argument -sMyFileName=my.txt, or, anywhere before its first use:
/MyFileName (my.txt) def
% snip...
/F MyFileName % update to the variable
/UF MyFileName
% snip...
Now let's tackle the next error:
In order to enable identification of the relationship between the file specification dictionary and the content that is referring to it, a new (required) key has been defined and its presence (in the dictionary) is required.
This one is also straightforward: the /FS dictionary needs an /AFRelationship key (I can't find the spec, but here are the notes, page 6). That value can be:
Source, Data, Alternative, Supplement, EncryptedPayload, FormData, Schema or Unspecified. Custom values may be used where none of these entries is appropriate.
I will use Supplement here, but pick whichever is most appropriate for whatever you are embedding:
/F MyFileName
/UF MyFileName
/AFRelationship /Supplement % These lines can be in any order, but the key must be before the value
/EF <<
Checking with veraPDF, we are down to 2 errors. Nice! Next error to tackle:
The MIME type of an embedded file, or a subset of a file, shall be specified using the Subtype key of the file specification dictionary. If the MIME type is not known, the "application/octet-stream" shall be used.
This is slightly trickier. MIME types have a / in them, and since spaces don't really matter in PS, /Type/Filespec is just as valid as /Type /Filespec. Thus, as the MIME type must be a name, not a string, we can't simply write /text/plain (that would be read as two names, /text and /plain). Instead we'll need to use the cvn operator (spec, page 402), which converts strings to names (like intern in Lisps, or to_sym in Ruby). Note that this entry goes in the stream object's own dictionary (spec, page 104, table 45):
[{fstream} << /Type /EmbeddedFile
/Subtype (text/plain) cvn % Our new addition (can be on the same line as above)
>> /PUT pdfmark
% equivalent to the Ruby syntax: { :Type => :EmbeddedFile, :Subtype => "text/plain".to_sym }
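Because a name object cannot contain a literal /, a PDF writer has to serialize the slash as the escape #2F (ISO 32000-1, section 7.3.5), so in the output file the entry should appear as /Subtype /text#2Fplain. The Python sketch below (hypothetical helper names) shows that encoding rule, plus a re-implementation of the veraPDF Subtype check quoted in the question, as I read it (type/subtype made of ASCII word characters, -, + and .).

```python
import re

# PDF delimiter characters (ISO 32000-1, 7.2.2); they cannot appear
# literally inside a name object.
DELIMITERS = set("()<>[]{}/%")

# veraPDF's check /^[-\w+.]+/[-\w+.]+$/, with \w restricted to ASCII.
MIME_RE = re.compile(r"[-\w+.]+/[-\w+.]+", re.ASCII)


def pdf_name(value: str) -> str:
    """Serialize a string as a PDF name object, escaping bytes as #xx."""
    out = []
    for byte in value.encode("utf-8"):
        char = chr(byte)
        if 0x21 <= byte <= 0x7E and char not in DELIMITERS and char != "#":
            out.append(char)
        else:
            out.append(f"#{byte:02X}")  # delimiters, '#', spaces, non-ASCII
    return "/" + "".join(out)


def valid_mime_subtype(mime: str) -> bool:
    """True if mime passes the veraPDF-style type/subtype pattern."""
    return MIME_RE.fullmatch(mime) is not None


print(pdf_name("text/plain"))  # the slash becomes #2F
print(valid_mime_subtype("image/png"), valid_mime_subtype("png"))
```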
Down to our final A-3 validation error, and this one is a bit trickier:
The additional information provided for associated files as well as the usage requirements for associated files indicate the relationship between the embedded file and the PDF document or the part of the PDF document with which it is associated.
We need to split the EMBED from the definition, and update the document /Catalog dictionary to also point to the definition with the /AF key (notes, page 6). Create a new pdfmark dict object, and refactor the code:
[/_objdef {myfileinfo} /type /dict /OBJ pdfmark % create new object
% assign the new pdfmark object
[{myfileinfo} <<
/Type /Filespec
/F MyFileName
/UF MyFileName
/AFRelationship /Supplement
/EF <<
/F {fstream}
>>
>> /PUT pdfmark % refactored out of the following line
[/Name MyFileName /FS {myfileinfo} /EMBED pdfmark % updated embed line
% This line was moved from the end of the original PDFA_def.ps to after our attachment code
[{Catalog} <</OutputIntents [ {OutputIntent_PDFA} ] /AF [{myfileinfo}] >> /PUT pdfmark
Now running this through Ghostscript we get a veraPDF-accepted valid PDF/A-3B with an arbitrary document attachment!
For completeness, here is the whole modified PDFA_def.ps file, and the command line I used to run it. Note I've replaced most constants with variables. For multiple attachments, add more copies of the code we added, with more {fstream} and {myfileinfo} objects (with different names, obviously), and list every {myfileinfo} object in the Catalog's /AF array.
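Instead of copying the block by hand, the pdfmark code for several attachments can be generated. The Python sketch below is hypothetical (attachment_pdfmarks and ps_string are names made up here): it emits one {fstreamN}/{myfileinfoN} pair per file, mirroring the code in this answer, and lists every file specification in the Catalog's /AF array. Unlike the listing, it hard-codes the paths as PostScript strings instead of reading -s variables, so newer Ghostscript versions still need --permit-file-read for each file; treat the output as a starting point and validate the result with veraPDF.

```python
def ps_string(text: str) -> str:
    """Quote text as a PostScript string literal, escaping backslashes
    and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def attachment_pdfmarks(attachments, relationship="Supplement"):
    """Emit pdfmark code for several attachments.

    attachments is a list of (path, name_in_pdf, mime_type) tuples. Each
    one gets its own {fstreamN} and {myfileinfoN} objects, and all file
    specifications are listed in the Catalog's /AF array.
    """
    lines, infos = [], []
    for i, (path, name, mime) in enumerate(attachments, start=1):
        stream, info = f"{{fstream{i}}}", f"{{myfileinfo{i}}}"
        infos.append(info)
        lines += [
            f"[/_objdef {info} /type /dict /OBJ pdfmark",
            f"[/_objdef {stream} /type /stream /OBJ pdfmark",
            f"[{stream} {ps_string(path)} (r) file /PUT pdfmark",
            f"[{stream} << /Type /EmbeddedFile /Subtype {ps_string(mime)} cvn >> /PUT pdfmark",
            f"[{info} << /Type /Filespec /F {ps_string(name)} /UF {ps_string(name)}",
            f"  /AFRelationship /{relationship} /EF << /F {stream} >> >> /PUT pdfmark",
            f"[/Name {ps_string(name)} /FS {info} /EMBED pdfmark",
            f"[{stream} /CLOSE pdfmark",
        ]
    # Replaces the last line of PDFA_def.ps: one /AF entry per attachment.
    lines.append(
        "[{Catalog} <</OutputIntents [ {OutputIntent_PDFA} ] "
        f"/AF [{' '.join(infos)}] >> /PUT pdfmark"
    )
    return "\n".join(lines)


print(attachment_pdfmarks([
    ("/tmp/test.png", "test.png", "image/png"),
    ("/tmp/notes.txt", "notes.txt", "text/plain"),
]))
```

The printed code would go in place of the attachment section and the final Catalog line of the listing below.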
The final full listing of our modified PDFA_def_attach.ps:
%!
% This is a modified version of the Ghostscript 9.27 PDFA_def.ps file
% that creates a PDF/A-3 file with an embedded attachment
/ICCProfile (/usr/share/color/argyll/ref/sRGB.icm) % Customize
def
% Define entries in the document Info dictionary :
[ /Title (My PDF/A-3 with an embedded attachment) /DOCINFO pdfmark % Customize
% Define an ICC profile :
[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA}
<<
/N currentpagedevice /ProcessColorModel known {
currentpagedevice /ProcessColorModel get dup /DeviceGray eq
{pop 1} {
/DeviceRGB eq
{3}{4} ifelse
} ifelse
} {
(ERROR, unable to determine ProcessColorModel) == flush
} ifelse
>> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark
% Define the output intent dictionary :
[/_objdef {OutputIntent_PDFA} /type /dict /OBJ pdfmark
[{OutputIntent_PDFA} <<
/Type /OutputIntent % Must be so (the standard requires).
/S /GTS_PDFA1 % Must be so (the standard requires).
/DestOutputProfile {icc_PDFA} % Must be so (see above).
/OutputConditionIdentifier (sRGB) % Customize
>> /PUT pdfmark
% New code starts here.
% If you want to not use Ghostscript command line arguments,
% then uncomment these variable definitions
%/MyFileName (my.txt) def % alternative to -sMyFileName=my.txt
%/MyFileToEmbed (/path/to/my/file.txt) def % alternative to -sMyFileToEmbed=/path/to/my/file.txt
%/MyMimeType (text/plain) def % alternative to -sMyMimeType=text/plain
% Define the embedded file objects
[/_objdef {myfileinfo} /type /dict /OBJ pdfmark
[/_objdef {fstream} /type /stream /OBJ pdfmark
% Load the file to embed
[{fstream} MyFileToEmbed (r) file /PUT pdfmark
% assign the stream information
[{fstream} <<
/Type /EmbeddedFile
/Subtype MyMimeType cvn
% /Params << % Optional, see Table 46, page 104 for options
% /Size 1234 % or use a -dMyVarName flag. -d defines numbers, -s Strings
% /ModDate (D:20211216) % see section 7.9.4, page 87 for full date format
% % etc...
% >>
>> /PUT pdfmark
% assign the file information
[{myfileinfo} <<
/Type /Filespec
% /Desc (My Optional Description) % optional, see page 103, table 44
/F MyFileName
/UF MyFileName
/AFRelationship /Supplement
/EF <<
/F {fstream}
>>
>> /PUT pdfmark
% Embed the stream
[/Name MyFileName /FS {myfileinfo} /EMBED pdfmark
[{fstream} /CLOSE pdfmark
% Updated last line from the original PDFA_def.ps
[{Catalog} <</OutputIntents [ {OutputIntent_PDFA} ] /AF [{myfileinfo}] >> /PUT pdfmark
The command line (using GS 9.27):
gs -dPDFA=3 -sColorConversionStrategy=RGB -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -dPDFSETTINGS=/default -dAutoRotatePages=/All -sMyFileToEmbed=/tmp/test.png -sMyMimeType=image/png -sMyFileName=the_name_you_see_in_reader.png -dNOPAUSE -dBATCH -dQUIET -o output-a3.pdf PDFA_def_attach.ps input.pdf
(Note in more recent Ghostscript versions you'll need to add --permit-file-read=/tmp/test.png)
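If you run this from a script, building the argument list programmatically avoids shell-quoting problems with file names. A minimal Python sketch, assuming the PDFA_def_attach.ps file above; pdfa3_attach_command is a hypothetical helper, the flags are copied from the command line above, and the MIME type is guessed from the file extension with the standard mimetypes module, falling back to application/octet-stream as PDF/A-3 requires for unknown types. Depending on your Ghostscript version, other files the definition file reads (such as the ICC profile) may also need to be permitted.

```python
import mimetypes
import os


def pdfa3_attach_command(input_pdf, output_pdf, attachment,
                         name_in_pdf=None, def_file="PDFA_def_attach.ps",
                         permit_read=True):
    """Build the Ghostscript argument list used above for one attachment."""
    mime, _ = mimetypes.guess_type(attachment)
    mime = mime or "application/octet-stream"  # PDF/A-3 fallback MIME type
    name_in_pdf = name_in_pdf or os.path.basename(attachment)
    args = [
        "gs", "-dPDFA=3", "-sColorConversionStrategy=RGB",
        "-sDEVICE=pdfwrite", "-dPDFACompatibilityPolicy=1",
        "-dPDFSETTINGS=/default", "-dAutoRotatePages=/All",
        f"-sMyFileToEmbed={attachment}",
        f"-sMyMimeType={mime}",
        f"-sMyFileName={name_in_pdf}",
    ]
    if permit_read:
        # Needed by more recent Ghostscript versions, per the note above.
        args.append(f"--permit-file-read={attachment}")
    args += ["-dNOPAUSE", "-dBATCH", "-dQUIET", "-o", output_pdf,
             def_file, input_pdf]
    return args


print(" ".join(pdfa3_attach_command("input.pdf", "output-a3.pdf", "/tmp/test.png")))
```

To actually run it, pass the list to subprocess.run(args, check=True), which raises if Ghostscript exits with an error.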
1 Technically, as KenS helpfully pointed out in the comments, it's actually a PostScript procedure used in a non-standard way, but as this answer won't be using procedures outside of pdfmark named objects (excluding the already-written parts of PDFA_def.ps), I've glossed over that implementation detail here in favor of the name used in the pdfmark reference manual.
I'm trying to build a PDF file with a link to an external file.
I'm using the spec https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
On page 348 there is an example of an image with an alternate image loaded remotely. When I create a document with the example from the doc, the reader (Acrobat Reader XI) doesn't fetch the image. There is no error message, but no request is being made (checked using Wireshark).
Can I have only a remote image (i.e. no "normal" image plus an alternate image)?
Is there an example somewhere of a full document using that /FS /URL syntax (i.e. not just the objects)? I couldn't find any that actually makes the request.
Thanks
Edit:
I used LibreOffice to create the base document with a single 1x1 pixel image.
http://pastebin.com/5GqCYgMp
I initially created my test document with Acrobat but the output was really messy.
Then I replaced the stream with the example from the PDF spec and tried to fix the startxref offset, but I'm not sure it's correct.
http://pastebin.com/BT42g02P
This document is currently not opening correctly, but I tried to get a minimum test case. My previous attempts were displayed with no errors only by luck (but the remote image didn't work anyway).
Is there any tool that actually allows the creation of XObject with /URL? I don't know the file format enough to create them reliably by hand.
First of all,
I'm using the spec https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
I would recommend not using a PDF reference but instead the ISO standard. The Adobe PDF references are not normative in nature, while the ISO standard is. (The actual content differences are minute, but if there is a normative spec, one should use it.) Adobe also publishes a copy of the ISO standard with merely the header exchanged.
Then, please don't treat PDFs as text documents. E.g. by sharing them on pastebin, you make them subject to treatment as text which essentially destroys the content.
That all being said, let's look at your actual issue:
In your sample PDF you have:
4 0 obj
<</Type/XObject/Subtype/Image/Width 1/Height 1/BitsPerComponent 8/Length 0/F << /FS /URL
/F ( https://upload.wikimedia.org/wikipedia/commons/c/ca/1x1.png )
>>/Filter/FlateDecode/ColorSpace/DeviceRGB
>>
stream
endstream
endobj
This indicates that the PDF viewer shall find at the URL https://upload.wikimedia.org/wikipedia/commons/c/ca/1x1.png a file containing an array of 1 (/Width 1/Height 1) RGB (/ColorSpace/DeviceRGB) sample with 1 byte per color (/BitsPerComponent 8), cf. section 8.9.5 Image Dictionaries of ISO 32000-1.
I doubt your file fulfills that; I assume it actually is a PNG file, with a PNG structure, not the structure explained above.
PDF does not support the PNG format as is; you have to transform the data. It does, though, support the JPEG format via /FFilter /DCTDecode, which is why the sample from the specification
16 0 obj
<< /Type /XObject
/Subtype /Image
/Width 1000
/Height 2000
/ColorSpace /DeviceRGB
/BitsPerComponent 8
/Length 0 % This is an external stream
/F << /FS /URL
/F (http://www.myserver.mycorp.com/images/exttest.jpg)
>>
/FFilter /DCTDecode
>>
stream
endstream
endobj
makes it look so easy.
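To make the size requirement concrete: unfiltered image data is stored row after row, each row starting on a byte boundary, so the external file for the 1x1 DeviceRGB image above must hold exactly 3 bytes of samples. A small Python sketch (hypothetical helper names) computes that size and sniffs the magic bytes of a downloaded file to tell PNG from JPEG.

```python
def expected_sample_bytes(width, height, components, bits_per_component):
    """Size of unfiltered image sample data; each row is padded to a byte."""
    row_bytes = (width * components * bits_per_component + 7) // 8
    return row_bytes * height


def sniff_format(data: bytes) -> str:
    """Rough check of what an external image file actually contains."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG (needs conversion; not usable as-is)"
    if data.startswith(b"\xff\xd8"):
        return "JPEG (usable with /FFilter /DCTDecode)"
    return "unknown (raw samples?)"


print(expected_sample_bytes(1, 1, 3, 8))        # the 1x1 DeviceRGB image
print(expected_sample_bytes(1000, 2000, 3, 8))  # the spec's example, if unfiltered
```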
I am running into an issue with PNG to PDF conversion.
Actually, my PNG files are big, not in file size but in content.
The conversion creates big PDF files. I don't have any issue with their quality, but whenever I open such a PDF in a PDF viewer, it opens in "Fit to Page" mode.
So I can't see the created PDF properly in the initial view; I need to zoom it up to 100%.
My question is: can I create a PDF which will always open at zoom 100% ?
You can possibly achieve what you want with the help of Ghostscript.
Ghostscript lets you insert PostScript snippets into its command line via -c "...[PostScript code here]...".
PostScript has a special operator called pdfmark. This operator is not understood by most PostScript interpreters, but is understood by Acrobat Distiller and (for most of its parameters) also by Ghostscript when generating PDFs.
So you could try to insert
-c "[ /PageMode /UseNone /Page 1 /View [/XYZ null null 1] \
/PageLayout /SinglePage /DOCVIEW pdfmark"
into a PDF->PDF conversion Ghostscript command line.
Please take note about various basic things concerning this snippet:
The contents of the command line snippet appear to be 'unbalanced' regarding the [ and ] operators/keywords. But they are not! The initial [ is balanced by the final pdfmark keyword. (Don't ask -- I did not define this syntax...)
The 'inner' [ ... ] brackets delimit an array representing the page /View settings you desire.
Not all PDF viewers do respect the view settings embedded in the PDF file (Acrobat software does!).
Most PDF viewers allow users to override the view settings embedded in PDF files (Acrobat software also does this). That is, you can tell your viewer to never respect any settings from the PDF files it opens, but e.g. to always open them with "fit to width".
Some specific things about this snippet:
The page mode /UseNone means: the document displays without bookmarks or thumbnails. It could be replaced by
/UseOutlines (to display bookmarks also, not just the pages)
/UseThumbs (to display thumbnail images of the pages, not just the pages)
/FullScreen (to open document in full screen mode)
The array for the view mode constructed as [/XYZ <left> <top> <zoom>] means: The zoom factor is 1 (=100%), the left distance from the page origin is the special 'null' value, which means to keep the previously user-set value; the top distance from the page origin is also 'null'. This array could be replaced by
/Fit (to adapt the page to the current window size)
/FitB (to adapt the visible page content to the current window size)
/FitH <top> (to adapt the page width to the current window width); <top> indicates the required distance from the page origin to the upper edge of the window.
...plus several others I cannot remember right now.
So to change the settings of an existing PDF file, you could do the following:
gs \
-o out.pdf \
-sDEVICE=pdfwrite \
-c "[ /PageMode /UseNone /Page 1 /View [ /XYZ null null 1 ] " \
-c " /PageLayout /SinglePage /DOCVIEW pdfmark" \
-f in.pdf
To check if the Ghostscript command worked, open the PDF in a text editor which is capable of handling binary files. Search for the /View or the /PageMode keywords and check if they are there, inserted as values into the PDF root object.
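Instead of a text editor, a short Python sketch can do that search. It is crude (find_keys is a hypothetical helper doing a byte-level regex search) and only sees uncompressed objects, so if the catalog ends up in a compressed object stream, decompress the file first, e.g. with qpdf --qdf.

```python
import re


def find_keys(pdf_bytes: bytes,
              keys=("PageMode", "PageLayout", "View", "OpenAction")):
    """Report which view-related keys occur literally in the file."""
    found = {}
    for key in keys:
        # The name must end at whitespace or a delimiter, so that /View
        # does not also match /ViewerPreferences.
        pattern = rb"/" + re.escape(key.encode()) + rb"(?=[\s/<>\[\]()%{}]|$)"
        found[key] = re.search(pattern, pdf_bytes) is not None
    return found


sample = b"<< /Type /Catalog /PageMode /UseNone /View [/XYZ null null 1] >>"
print(find_keys(sample))
```

For a real file, call it as find_keys(open("out.pdf", "rb").read()).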
If it worked, check if your PDF viewer honors the settings. If it doesn't honor them, see if there is an overriding setting within the viewer's preferences.
I did a quick test run on a sample PDF of mine. Here is how the PDF root object's dictionary looks now, checked with the help of pdf-parser.py:
pdf-parser-beta.py -s Catalog a.pdf
obj 1 0
Type: /Catalog
Referencing: 3 0 R, 9 0 R
<<
/Type /Catalog
/Pages 3 0 R
/PageMode /UseNone
/Page 1
/View [/XYZ null null 1]
/PageLayout /SinglePage
/Metadata 9 0 R
>>
To learn more about the pdfmark operator, google for 'pdfmark reference filetype:pdf'. You should be able to find it on the Adobe website and elsewhere:
https://www.google.de/search?q=pdfmark%20reference%20filetype%3Apdf&oq=pdfmark%20reference%20filetype%3Apdf
In order to let ImageMagick create a PDF as you want it, you may be able to hack the file defining your delegate settings. For more help about this topic see for example here:
http://www.imagemagick.org/Usage/files/#delegates
The PDF specification supports this functionality as follows: create a GoTo action that goes to the first page and sets the zoom level to 100%, then set that action as the document's open action (the /OpenAction entry of the catalog).
How exactly you implement it in real life depends very much on the tool you use to create the PDF file. I do not know if ImageMagick can create such actions.
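As an illustration of that mechanism at the PDF level (not of what ImageMagick produces), here is a minimal Python sketch that writes a one-page PDF by hand whose /OpenAction is a GoTo action to page 1 with an /XYZ destination at zoom 1 (100%). build_pdf is a hypothetical name and the page is blank; the sketch also shows how the xref byte offsets and the startxref value are computed.

```python
def build_pdf(open_zoom=1) -> bytes:
    """Write a one-page PDF whose OpenAction shows page 1 at open_zoom."""
    objects = [
        # 1: catalog; the GoTo action targets page object 3 with /XYZ,
        # where null left/top keeps the current position.
        "<< /Type /Catalog /Pages 2 0 R "
        f"/OpenAction << /S /GoTo /D [3 0 R /XYZ null null {open_zoom}] >> >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("ascii")
    xref_offset = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_offset}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)
```

For example, open("zoom100.pdf", "wb").write(build_pdf()) writes the file; viewers that honor open actions should show it at 100%.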
I've been wanting to see the insides of a PDF for a while, like, the raw source code of it so I can look at it. Any way of doing that?
Looking at the raw code of PDFs will not serve you much unless you also have an idea about its internal structure. You should get yourself a copy of the official PDF reference (download PDF), and you should have read an introductory article such as this [gone] or this to begin with.
Even after such preparation, you won't discover much of use by staring at the raw code, because PDFs usually contain parts which are "filtered" (that means: compressed).
How to look at the real PDF source behind the 'raw' binary parts
Jay Berkenbilt's qpdf is a very useful command-line tool (available for Linux, Mac OS X, Windows, and as source code; originally released under the open source Artistic License, now under the Apache License 2.0), which can unpack most filtered content and reorganize the internal structure in a way that gives you much more insight into it (all objects are numerically ordered, etc.). The command line to achieve this is:
qpdf --qdf original.pdf unpacked.pdf
Another useful and free tool (GPL licensed, but Linux-only AFAIK) to look into PDFs is of course PDFEdit. This one even comes with a GUI (if you prefer that), while still allowing you access to the internal structure and "raw" PDF code.
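To show what "unpacking" a filtered stream means at its simplest, here is a crude Python sketch using only the standard zlib module (inflate_streams is a hypothetical name). qpdf remains the right tool: this ignores /Length and /DecodeParms predictors, and simply skips any stream that is not zlib data, such as DCTDecode JPEGs.

```python
import re
import zlib

# Everything between the "stream" keyword line and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes: bytes):
    """Yield the decompressed contents of every stream zlib can inflate."""
    for match in STREAM_RE.finditer(pdf_bytes):
        inflater = zlib.decompressobj()
        try:
            # Trailing EOL bytes before endstream end up in unused_data.
            data = inflater.decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data (e.g. a JPEG, or another filter)
        if inflater.eof:
            yield data
```

For example, for data in inflate_streams(open("original.pdf", "rb").read()): print(data[:200]) shows the beginning of each content stream, usually page drawing operators.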
If the purpose is just to look into the file, then any simple text editor will do, e.g. Notepad. PDF is largely a text-based format, with embedded content byte streams. Raw PDF looks like this:
>>
/Border [0 0 0]
/Rect [121.02 332.48 363.24 343.64]
/StructParent 1321
/Subtype /Link
/Type /Annot
>>
endobj
64579 0 obj
<<
/Filter /FlateDecode
/Length 5771
>>
stream
[... 5771 bytes of Flate-compressed binary data, which look like garbage in a text editor ...]
endstream
endobj
64580 0 obj
<<
/Border [0 0 0]
/Dest <E4AE7DD2769553EF1668>
/Rect [219 648.5 256.8 659.66]
/StructParent 1323
/Subtype /Link
/Type /Annot
>>
What you see are basic COS objects like names, dictionaries, streams and so on. All objects are described in the ISO 32000 standard, see section 7.3 Objects.
Use a Hex editor. Of course, unless you know the PDF specification (PDF, 8.6 MB), you won't recognize much.
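In addition to the qpdf tool, conversion into PostScript might be helpful.
PDF's imaging model is derived from PostScript (although PDF is not strictly a subset of PS), and the PostScript output is usually quite easy to figure out, e.g. where the labels of a graph are. You can either use pdf2ps or invoke Ghostscript directly (the old pswrite device has been removed from current Ghostscript versions and ps2write replaces it; -sOutputFile must come before the input file):
gs -sDEVICE=ps2write -dNOPAUSE -dBATCH -sOutputFile=some.ps some.pdf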
When you generate your PDFs using pdflatex, you can disable compression (e.g. with \pdfcompresslevel=0 and \pdfobjcompresslevel=0 at the start of the document). This makes the PDF more readable.
Some more recent observations on the other answers.
Adobe keeps moving their open-sourced copy of the 2008 standard around, so currently it is here https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf
The Internet Archive currently has a copy here https://ia601003.us.archive.org/5/items/pdf320002008/PDF32000_2008.pdf
They should be identical (22,491,828 bytes), so beware: neither includes any errata.
A PDF CAN be plain text ("text/pdf", so to speak), perfectly annotated, and generated from a console keyboard or command line (too slow) or from a batch file. I won't bore you with the whole file, but it starts like this:
REM Start with File "Magic" Signatures for a PDF
echo %%PDF-1.0>!Fname!
echo %%âãÏÓ>>!Fname!
echo %%01) Prepare file references>>!Fname!
for %%Z in (!Fname!) do set "FZ1=%%~zZ"
echo 1 0 obj>>!Fname!
echo ^<^</Names^<^</Dests 2 0 R^>^>/Outlines 3 0 R/PageLayout/OneColumn/PageMode/UseOutlines>>!Fname!
REM ToDo add files
REM /Lang (ga-IE)/MarkInfo^<^</Marked true^>^>/Names ^<^<^/EmbeddedFiles [(file.ext) 3 0 R]^>^>>>!Fname!
echo /Pages 4 0 R/Type/Catalog/ViewerPreferences^<^</DisplayDocTitle true^>^>^>^>>>!Fname!
echo endobj>>!Fname!
echo %%02) Prepare Named Destinations>>!Fname!
Thus the annotated RAW PDF (note I had edited the order in the cmd file in preparation for an XMP data section, so it is not identical) could look like this:
%PDF-1.3
%âãÏÓ
%01) Prepare file references
1 0 obj
<</Lang(ga-IE)/Names<</Dests 3 0 R>>/Outlines 4 0 R/PageLayout/OneColumn/PageMode/UseOutlines
/PageLabels<</Nums[0<</S/A>>]>>/Pages 5 0 R/Type/Catalog/ViewerPreferences<</DisplayDocTitle true>>>>
endobj
%02) Reserved for big meta data
2 0 obj
<< >>
endobj
%03) Prepare Named Destinations
3 0 obj
<</Names [(Page1) [6 0 R /XYZ 0 792 null] (QRCode) [6 0 R /XYZ 25.0 317.0 1]]>>
endobj
%04) Prepare Outline / Bookmarks
...
...
Others have made many suggestions for decompressing binary application/pdf into "text/pdf", and some tools produce a hybrid that still contains binary data.
The 3 most common tools designed for the task are qpdf (already mentioned, but it produces a hybrid QDF), PDFtk (uncompress) and Mutool (different CLI options). Mutool is the one I play with most, as it's easy to change the output settings in its GL GUI. The output can then be modified in MS Notepad whilst previewing the result.
So any text-editing script can write or edit a PDF, even with graphics, and several applications can convert RAW "binary" PDF into RAW "textual" PDF. However, never attempt to edit a PDF whilst it is temporarily in a textual base64 representation (possible, but totally impractical).