What Encoding is Used in this PDF Metadata?

I'm looking at the binary of Adobe's PDF Reference document, and I'm wondering what encoding is being used in the values of the metadata here:
<<
/Producer <30B9883671A1867F59929DEDF9AF32BC0029CF5414D3744A3273BCA8E7319382EA151980>
/Subject <30BE953B76E0A2306F8F8FFBFCA67E9D1D6A8F17418D200C1B6EEE88E726DAC4CE3E2CC1>
/Creator <37A89B34768D93347889CEAFBEF3>
/Title <219EBC7941A5943A6F9E80FAF5EF7E8D1A60881E04A630452968F38B>
/Author <30BE953B76E0A1266E8F8BF4E3E317B71166880A4B9135583865>
/ModDate <35E0C86923F1C36E2FC2DEA0A1F56BEF5F39C25D14D373>
/CreationDate <35E0C86923F1C36E2CCCDFAEA1F36EE128>
>>
So far, I can't find anything in the documentation or the ISO standard about this, and this is the only PDF I've seen so far with encoded metadata values.
Any ideas?

The encoding is standard, but the text strings have been encrypted. See section 3.5, Encryption, in that same reference guide.
When inspecting a PDF, you should always start by reading the trailer dictionary (see 3.4.4 File Trailer). In your document it contains an /Encrypt key:
<<
/Size 31667
/ID [<19574527ECBF00E3EC0373879833EEF6> <24EE9EDB7DE40DB862FDB4C5D3493585>]
/Info 7 0 R
/Root 1 0 R
/Encrypt 31666 0 R
>>
which is "required if document is encrypted".
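If you want to do that check from code rather than by eye, here is a minimal Python sketch. It assumes the /Encrypt entry is an indirect reference as in the trailer above; the function name `find_encrypt_reference` is made up for illustration, and a raw byte search like this can be fooled by stream data, so treat it as a quick heuristic rather than a parser.

```python
import re

def find_encrypt_reference(pdf_bytes: bytes):
    """Return (object_number, generation) of the /Encrypt entry, or None.

    Simplification: only the pattern "/Encrypt N G R" is searched for in the
    raw bytes. An inline encryption dictionary is not detected, and a false
    match inside stream data is possible.
    """
    matches = re.findall(rb"/Encrypt\s+(\d+)\s+(\d+)\s+R", pdf_bytes)
    if not matches:
        return None
    # With incremental updates the last trailer in the file is the current one.
    obj_num, gen = matches[-1]
    return int(obj_num), int(gen)

# Trailer taken from the answer above (the /ID entry is left out for brevity).
sample_trailer = b"""trailer
<<
/Size 31667
/Info 7 0 R
/Root 1 0 R
/Encrypt 31666 0 R
>>
"""

print(find_encrypt_reference(sample_trailer))
```

A result other than None tells you that strings such as those in the Info dictionary have to be decrypted before they can be read.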

PDF Signature: "Expected a dict object"

I'm creating a library for digitally signing a PDF document. During my quest I stumbled upon another problem.
In Acrobat I'm getting the error:
Error during signature verification.
Adobe Acrobat error.
Expected a dict object.
I know it expects a dictionary object somewhere. But I have no idea where.
This problem shows up when I add the image to the AP of the signature.
For this I'm basing my implementation on the spec and on "Insert multiple digital approval signatures without invalidating the previous one".
Most of this seems to work correctly, but when the image is present it results in the error. The image is correctly visible.
Current working:
(This is a very short overview of the part where the error is, it might be slightly different, but hope this helps)
I update the signature annotation. Add link to object that contains normal appearance.
16 0 obj
<<
/Type/Annot
/Subtype/Widget
...snip...
/AP<<
/N 21 0 R
>>
>>
Add image as XObject
20 0 obj
<<
/Type/XObject
/Subtype/Image
...snip...
/Length 29569
>>
stream
...snip...
endstream
endobj
Add XObject (Normal appearance)
21 0 obj
<<
/Type/XObject
/Subtype/Form
/Resources<<
/XObject<<
/UserSignature272 20 0 R
>>
>>
/BBox[0 0 135 37.5]
/Length 44
>>stream
q
135 0 0 37.5 0 0 cm
/UserSignature272 Do
Q
endstream
endobj
I think the problem happens somewhere in obj (21 0), but I'm not sure.
Here is a minimal file that can be used for testing.
https://drive.google.com/file/d/17sdz2xJy3VhN6i9YiuPrJ6x2s5kU2sra/view?usp=sharing
Any help, or hints would be welcome.
(This post is a continuation of PDF Digital Signature has "Bad parameter" in Acrobat, but is about a different problem, same subject area.)
You're running into a bug in Adobe Acrobat here: if you display an XObject from inside your signature appearance stream, Acrobat expects that XObject to have a Resources entry. This may make sense for form XObjects, but it doesn't for image XObjects like yours.
A workaround is to add an empty Resources dictionary to your image XObject.
I checked this by replacing the /BBox[1 0 0 1 0 0] in your image XObject (which is not needed there anyway) with /Resources<< >>.
When Adobe Acrobat creates its own signature appearances, it creates a hierarchy of form XObjects with Resources dictionaries throughout, including those for the "layers". I assume Adobe Reader, on seeing the Do operator, attempts to collect information on such "layers" and does not expect to be confronted immediately with an image XObject.

How to correctly declare FontSet in PSD EngineData?

I'm trying to generate a PSD file in my application, with some text layers (TySh). The EngineData format is pretty simple, but unfortunately it has no documentation, and I got stuck on the FontSet field:
/FontSet [
<<
/Name (ADomIno)
/Script 8
/FontType 1
/Synthetic 3
>>
<<
/Name (ADomIno)
/Script 8
/FontType 1
/Synthetic 0
>>
<<
/Name (AdobeInvisFont)
/Script 0
/FontType 0
/Synthetic 0
>>
<<
/Name (MyriadPro-Regular)
/Script 0
/FontType 0
/Synthetic 0
>>
]
This is Photoshop-generated data; I only omitted the UTF-16 encoding for easier reading.
So I can't understand why the font a_DomIno is written as ADomIno, why some fonts have a "-Regular" suffix and others don't, what the "Script", "FontType" and "Synthetic" fields mean, or why some fonts have two records with different field values while others have only one.
There's no information in the Adobe PSD format documentation, the Photoshop Scripting Reference, or the Photoshop SDK. Projects like psd.rb or psd.js aim to parse files and have no useful information either.
Maybe someone knows?

iText PDF fails with message "Dictionary key endstream is not a name"

The issue is the same as reported here.
I have taken this image and converted it to this PDF using GraphicsMagick v1.3.26 (built on 2017-07-04):
gm convert itext_banner_InvalidPdfException.jpg itext_banner_InvalidPdfException.pdf
When I try to read it with iText v5.5.12 I get the following exception:
java -cp itextpdf-5.5.12.jar com.itextpdf.text.pdf.parser.PdfContentReaderTool itext_banner_InvalidPdfException.pdf
com.itextpdf.text.exceptions.InvalidPdfException: Rebuild failed: Dictionary key endstream is not a name. at file pointer 1197; Original message: Dictionary key endstream is not a name. at file pointer 1197
at com.itextpdf.text.pdf.PdfReader.readPdf(PdfReader.java:764)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:197)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:235)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:223)
at com.itextpdf.text.pdf.PdfReader.<init>(PdfReader.java:213)
at com.itextpdf.text.pdf.parser.PdfContentReaderTool.listContentStream(PdfContentReaderTool.java:200)
at com.itextpdf.text.pdf.parser.PdfContentReaderTool.main(PdfContentReaderTool.java:249)
Questions:
What exactly is wrong with the given PDF? It seems like there is an issue in Ghostscript, which is used indirectly by GraphicsMagick.
When I open it with iText RUPS v5.8.8, it does not print any warnings to the Console tab. Does that mean it is valid from iText RUPS's point of view?
Your PDF contains this broken object:
11 0 obj
<<
endstream
endobj
The opening << is closed by an endstream, which does not match.
If that object was meant to be a mere dictionary, it should have looked like this:
11 0 obj
<<
[a reasonable number of dictionary entries]
>>
endobj
If that object was meant to be a stream, it should have looked like this:
11 0 obj
<<
[a reasonable number of dictionary entries]
>>
stream
[stream data]
endstream
endobj
BTW, the object in question is not referenced from any other object in the PDF. If you open the PDF in a PdfReader in partial mode, therefore, the issue will be ignored.
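If you want to hunt for such objects yourself, a rough Python sketch follows. It is a heuristic, not a PDF parser: it assumes objects are not inside object streams, it can be thrown off by binary stream data that happens to contain endobj, and it does not look inside string literals. The name `find_unbalanced_objects` is made up for this example.

```python
import re

# "N G obj ... endobj", non-greedy so each match covers a single object.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj\b", re.DOTALL)
# The "stream" keyword on its own; \b keeps it from matching inside "endstream".
STREAM_RE = re.compile(rb"\bstream\b")

def find_unbalanced_objects(pdf_bytes: bytes):
    """Return (number, generation) for objects whose << and >> counts differ."""
    broken = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(3)
        stream_start = STREAM_RE.search(body)
        # Only the part before any stream data is dictionary syntax.
        head = body[:stream_start.start()] if stream_start else body
        if head.count(b"<<") != head.count(b">>"):
            broken.append((int(match.group(1)), int(match.group(2))))
    return broken

# One well-formed object plus the broken object quoted above.
sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"11 0 obj\n<<\nendstream\nendobj\n"
)
print(find_unbalanced_objects(sample))
```

For a real file you would pass in the bytes read with `open(path, "rb").read()`; anything it reports still needs to be checked by hand.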

How is password removed from a pdf file programmatically?

One password-protected PDF I encountered has the following trailer and encryption dictionary:
Trailer Dictionary:
trailer
<<
/Encrypt 64 0 R
/Info 65 0 R
/Root 63 0 R
/Size 66
/ID [xxxxxxxx]>>
Encryption Dictionary:
64 0 obj
<<
/R 3
/P -3904
/O (xxxxxxxxxxxxx)
/Filter /Standard
/Length 128
/V 2
/U (/xxxxxxxxxxxxx) >>
endobj
In comments the OP clarified that by not using any software he meant
Any software is also a code by which we remove password. I want internal working of that code i.e how that software is removing password, what it is actually doing internally.
Thus, this question is not about manually removing PDF password protection but about understanding how PDF password protection is removed programmatically.
PDF passwords are applied by encrypting nearly all strings and streams in the PDF and adding the information the OP already identified. Consequently, PDF passwords are removed by decrypting the formerly encrypted strings and streams in the PDF and removing the added information.
The details are explained in section 7.6, Encryption, of the PDF specification ISO 32000-1 and are too extensive for an answer on Stack Overflow. Fortunately, Adobe has provided a free copy of that specification, missing only the ISO logo and copyright notices, here, in which one can study the section in question and more.
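To give a flavour of what such code does internally for a dictionary like the one above (/V 2, /R 3, so RC4-based): as far as I read section 7.6, each string or stream is RC4-encrypted with a key derived from the file encryption key plus the object and generation number of the containing indirect object. Below is a minimal Python sketch under those assumptions. `file_key` is a hypothetical input here, because deriving it from the password and the /O, /P and /ID entries is a separate algorithm in the spec that is not shown, and AES-based handlers (/V 4 and later) work differently.

```python
import hashlib

def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:  # keystream generation, XORed onto the data
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

def object_key(file_key: bytes, obj_num: int, gen: int) -> bytes:
    """Per-object key: MD5 over the file key plus the low-order 3 bytes of
    the object number and 2 bytes of the generation, both little-endian,
    truncated to min(len(file_key) + 5, 16) bytes."""
    material = file_key + obj_num.to_bytes(3, "little") + gen.to_bytes(2, "little")
    return hashlib.md5(material).digest()[: min(len(file_key) + 5, 16)]

def decrypt_string(file_key: bytes, obj_num: int, gen: int, ciphertext: bytes) -> bytes:
    """Decrypt one string or stream belonging to indirect object obj_num gen."""
    return rc4(object_key(file_key, obj_num, gen), ciphertext)
```

A tool that "removes the password" would run something like `decrypt_string` over every string and stream, then write the file back without the /Encrypt entry and the encryption dictionary.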

PostScript PDF (1.7), manually writing code

I'm trying to manually write a simple PDF file that contains a title, some text, and an image. I found one example of a manually written "Hello world" and managed to change some things, but I can't get it working for another text object. I have looked for help on the internet but with no luck; I guess not many people write their own PDF files.
This is what I have so far:
%PDF-1.7
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
>>
/Contents 4 0 R
>>
endobj
4 0 obj % page content
<<
/Length 20
>>
stream
BT
80 180 TD
/F1 14 Tf
(PDF) Tj
ET
endstream
endobj
5 0 obj % page content
<<
/Length 20
>>
stream
BT
50 70 TD
/F1 14 Tf
(this is a pdf) Tj
ET
endstream
endobj
trailer
<<
/Size 6
/Root 1 0 R
>>
startxref
492
%%EOF
I have tried adding another text object with the text "this is a pdf", but it won't show up. I don't know what could be wrong; I tried changing a few things but with no luck. I don't have the image part either, so some help with that would be nice.
This is a wiki about the "hello world" pdf I found:
http://www.gnupdf.org/Introduction_to_PDF
Adobe offers some explanation of how PDF works, but I can't find anything that would fix my problem:
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
This is not a valid PDF. If Acrobat opens it at all it's because it's given up on the xref table and done a full scan of the file, but your PDF is invalid. 4 0 obj is not a font, as you specified, and 5 0 obj is not accessed from anywhere.
The PDF specification requires an xref table which points to the exact byte position in the file of each object. You can't realistically write this by hand unless you intend to manually update the entire xref table every time you add or remove even 1 byte from the file.
You can write a PDF from scratch like this from code easily enough but it will not work to just open a PDF in notepad and start changing things because the index (xref) immediately becomes corrupt.
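As a rough illustration of doing it from code, here is a minimal Python sketch that records each object's byte offset while writing and then emits a matching xref table. The helper names (`build_pdf`, `stream`) and the object layout are my own choices for the example, not anything from the question; the font is a separate object 4 (standard Helvetica), and both pieces of text go into a single content stream, object 5.

```python
def stream(data: bytes) -> bytes:
    """Wrap data as a stream object body with a correct /Length."""
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"

def build_pdf(objects) -> bytes:
    """objects[i] is the body of object i+1; object 1 must be the catalog."""
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where "num 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

content = (
    b"BT /F1 14 Tf 80 180 Td (PDF) Tj ET\n"
    b"BT /F1 14 Tf 50 70 Td (this is a pdf) Tj ET"
)
demo_objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /MediaBox [0 0 200 200] /Count 1 /Kids [3 0 R] >>",
    b"<< /Type /Page /Parent 2 0 R"
    b" /Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    stream(content),
]
pdf_bytes = build_pdf(demo_objects)

if __name__ == "__main__":
    # hello.pdf is an arbitrary output name chosen for this example.
    with open("hello.pdf", "wb") as handle:
        handle.write(pdf_bytes)
```

Because the offsets are computed while writing, changing the text or adding objects never leaves the xref table stale, which is exactly the part that is painful by hand.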
I'd also advise against putting comments throughout the file unless the comments start on new lines. Otherwise some PDF parsers will get confused as this is generally not expected. Usually PDF files do not contain comments (with the exception of the second line, which is recommended by Adobe to be a comment of some non-ASCII characters so FTP recognizes the file as binary) seeing as they are virtually impossible to write manually anyway.
http://www.adobe.com/devnet/pdf/pdf_reference.html
A few years ago, I wrote a book which covers exactly this sort of thing:
http://www.amazon.com/PDF-Explained-John-Whitington/dp/1449310028/
No free online version, I'm afraid. You can get all the same information from Adobe's own documentation, which is free, but it's a rather long document!