I am trying to understand the PDF format, and so far I have seen that images can appear in JPG format (but apparently without the "JFIF" bytes).
Or, is it something different but very similar to JPEG?
I am seeing that images appear reversed vertically.
What is the purpose of this reversal?
Can images appear normally in a PDF stream?
Here is an example from a PDF. As far as I know, this uploaded file is identical to the one contained in the PDF file, packed like this (using NASM assembly data definitions to keep it simple):
db "19 0 obj",0x0D
db "<< /Type /XObject /Subtype /Image /Width 367 /Height 475 /BitsPerComponent 8 ",0x0D
db "/ColorSpace 12 0 R /Length 37575 /Filter /DCTDecode >> ",0x0D
db "stream",0x0D,0x0A
incbin "19_0_obj.bin.jpg"
db "endstream",0x0D
db "endobj",0x0D
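To inspect such an object outside a viewer, here is a minimal Python sketch that cuts the stream payload out of one isolated object and checks for the JPEG start-of-image marker (FF D8). It assumes that /Length is a direct integer, as in the example above, and that the bytes of a single object are already at hand; the function names are made up for illustration.

```python
import re


def extract_stream(obj: bytes) -> bytes:
    """Return the raw stream payload of one PDF object.

    Assumption: /Length is a direct integer (not an indirect reference).
    """
    length = int(re.search(rb"/Length\s+(\d+)", obj).group(1))
    # The payload starts right after the "stream" keyword and its end-of-line.
    start = re.search(rb"stream(\r\n|\n)", obj).end()
    return obj[start:start + length]


def looks_like_jpeg(data: bytes) -> bool:
    """Check only for the JPEG start-of-image marker FF D8."""
    return data[:2] == b"\xff\xd8"


sample = (b"19 0 obj\r<< /Length 4 /Filter /DCTDecode >>\rstream\r\n"
          b"\xff\xd8\xff\xe0endstream\rendobj\r")
payload = extract_stream(sample)
```

The four sample bytes only stand in for real image data; with a real object, `payload` could be written to a file and opened in an image viewer.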
I am using the qpdf command to view the raw code (source code) of PDF files. Specifically I am using the command:
qpdf --qdf original.pdf unpacked.pdf
However, a lot of PDF metadata is encrypted in this unpacked file, which contains many unprintable characters. I am interested in some data in the PDF file that is actually encrypted. Assuming that I have the password for the PDF file (say pwd="passwd"), how can I get output similar to that of the qpdf command, but with the data decrypted?
Edit:
An example file is attached in the link. Please check lines 1841 - 3258. Specifically, in the whole file I am not able to find the TransformParams dictionary, although I have added permissions. I believe it may be inside this encrypted text.
Link:
https://www.mediafire.com/file/b7rf383zxdevgmx/unpacked.txt/file
As already assumed in a comment to the question, the PDF file is not encrypted at all.
Please check lines 1841 - 3258
Lines 1841 - 3258 are part of a stream that runs from line 1739 (OTTO...) to line 3258 and contains an embedded OpenType font; compare the preceding stream dictionary:
57 0 obj
<<
/Subtype /OpenType
/Length 58 0 R
>>
and the font descriptor referring to it:
<<
/Ascent 952
/CapHeight 674
/CharSet (/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/.notdef/space/exclam/quotedbl/numbersign/dollar/percent/ampersand/quotesingle/parenleft/parenright/asterisk/plus/comma/hyphen/period/slash/zero/one/two/three/four/five/six/seven/eight/nine/colon/semicolon/less/equal/greater/question/at/A/B/C/D/E/F/G/H/I/J/K/L/M/N/O/P/Q/R/S/T/U/V/W/X/Y/Z/bracketleft/backslash/bracketright/asciicircum/underscore/grave/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z/braceleft/bar/braceright/asciitilde/bullet/Euro/bullet/quotesinglbase/florin/quotedblbase/ellipsis/dagger/daggerdbl/circumflex/perthousand/Scaron/guilsinglleft/OE/bullet/Zcaron/bullet/bullet/quoteleft/quoteright/quotedblleft/quotedblright/bullet/endash/emdash/tilde/trademark/scaron/guilsinglright/oe/bullet/zcaron/Ydieresis/space/exclamdown/cent/sterling/currency/yen/brokenbar/section/dieresis/copyright/ordfeminine/guillemotleft/logicalnot/hyphen/registered/macron/degree/plusminus/twosuperior/threesuperior/acute/mu/paragraph/periodcentered/cedilla/onesuperior/ordmasculine/guillemotright/onequarter/onehalf/threequarters/questiondown/Agrave/Aacute/Acircumflex/Atilde/Adieresis/Aring/AE/Ccedilla/Egrave/Eacute/Ecircumflex/Edieresis/Igrave/Iacute/Icircumflex/Idieresis/Eth/Ntilde/Ograve/Oacute/Ocircumflex/Otilde/Odieresis/multiply/Oslash/Ugrave/Uacute/Ucircumflex/Udieresis/Yacute/Thorn/germandbls/agrave/aacute/acircumflex/atilde/adieresis/aring/ae/ccedilla/egrave/eacute/ecircumflex/edieresis/igrave/iacute/icircumflex/idieresis/eth/ntilde/ograve/oacute/ocircumflex/otilde/odieresis/divide/oslash/ugrave/uacute/ucircumflex/udieresis/yacute/thorn/ydieresis)
/Descent -250
/Flags 32
/FontBBox [
-157
-250
1126
952
]
/FontFamily (Myriad Pro)
/FontFile3 57 0 R
/FontName /MyriadPro-Regular
/FontStretch /Normal
/FontWeight 400
/ItalicAngle 0
/StemV 88
/Type /FontDescriptor
/XHeight 484
>>
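The "OTTO" at the start of the stream is the signature of an OpenType font with CFF outlines, which is why the data looks like binary noise rather than encrypted text. A small Python sketch of such a signature check follows; the function name is hypothetical and the list of signatures is deliberately short.

```python
def sniff_font_program(data: bytes) -> str:
    """Guess the kind of font program from its first four bytes."""
    head = data[:4]
    if head == b"OTTO":
        return "OpenType (CFF outlines)"
    if head in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType"
    if head == b"ttcf":
        return "font collection"
    return "unknown"


kind = sniff_font_program(b"OTTO\x00\x0b\x00\x80")
```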
Specifically, in the whole file I am not able to find the TransformParams dictionary, although I have added permissions.
Well, the shared version of the file neither is encrypted (so no permissions have to be applied) nor is it digitally signed (so in particular there are no signature transform methods applied, so no TransformParams are there).
Maybe the information you are looking for was removed when qpdf uncompressed the PDF; maybe it was never there to start with. Thus, you should probably analyze the original file instead. Or you may want to explain your expectations more thoroughly; maybe there is an error in them.
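A quick way to double-check both observations on the original file is to search its raw bytes for the relevant name tokens. The following Python sketch is only a heuristic, not a parser: it assumes the names are not hidden inside compressed object streams and that a match is not just stray bytes inside some other stream. The function name is made up.

```python
import re


def find_name_tokens(data: bytes, names):
    """Report for each name whether it occurs as a PDF name token (/Name)."""
    found = {}
    for name in names:
        # The lookahead keeps /Encrypt from matching /EncryptMetadata.
        pattern = re.escape(b"/" + name.encode("ascii")) + rb"(?![A-Za-z0-9])"
        found[name] = re.search(pattern, data) is not None
    return found


sample = b"trailer\n<< /Size 6 /Root 1 0 R >>\nstartxref\n492\n%%EOF\n"
result = find_name_tokens(sample, ["Encrypt", "TransformParams"])
```

For a real file, `data` would be the result of `open("original.pdf", "rb").read()`.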
I'm trying to generate a PSD file in my application, with some text layers (TySh). The EngineData format is pretty simple, but unfortunately it has no documentation, and I got stuck on the FontSet field:
/FontSet [
<<
/Name (ADomIno)
/Script 8
/FontType 1
/Synthetic 3
>>
<<
/Name (ADomIno)
/Script 8
/FontType 1
/Synthetic 0
>>
<<
/Name (AdobeInvisFont)
/Script 0
/FontType 0
/Synthetic 0
>>
<<
/Name (MyriadPro-Regular)
/Script 0
/FontType 0
/Synthetic 0
>>
]
This is Photoshop-generated data; I only omitted the UTF-16 encoding for easier reading.
So... I can't understand why the font a_DomIno is written as ADomIno, why some fonts have a "-Regular" suffix while others do not, what the "Script", "FontType" and "Synthetic" fields mean, and why some fonts have two records with different fields while others have only one.
There's no information in the Adobe PSD format documentation, the Photoshop Scripting Reference or the Photoshop SDK. Projects like psd.rb or psd.js aim at parsing files and have no useful information either.
Maybe someone knows?
I'm trying to manually write a simple PDF file that contains a title, some text, and an image. I found one example of a manually written "Hello world" and managed to change some things, but I can't get it working for another text object. I have looked for help on the internet but with no luck; I guess not many people write their own PDF files.
This is what I have so far:
%PDF-1.7
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
>>
/Contents 4 0 R
>>
endobj
4 0 obj % page content
<<
/Length 20
>>
stream
BT
80 180 TD
/F1 14 Tf
(PDF) Tj
ET
endstream
endobj
5 0 obj % page content
<<
/Length 20
>>
stream
BT
50 70 TD
/F1 14 Tf
(this is a pdf) Tj
ET
endstream
endobj
trailer
<<
/Size 6
/Root 1 0 R
>>
startxref
492
%%EOF
I have tried adding another text object with the text "this is a pdf", but it won't show up. I don't know what could be wrong; I tried changing a few things but with no luck. I don't have the image part either, so some help with that would be nice.
This is a wiki about the "hello world" pdf I found:
http://www.gnupdf.org/Introduction_to_PDF
Adobe offers some explanation of how PDF works, but I can't find anything that would fix my problem:
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
This is not a valid PDF. If Acrobat opens it at all it's because it's given up on the xref table and done a full scan of the file, but your PDF is invalid. 4 0 obj is not a font, as you specified, and 5 0 obj is not accessed from anywhere.
The PDF specification requires an xref table which points to the exact byte position in the file of each object. You can't realistically write this by hand unless you intend to manually update the entire xref table every time you add or remove even 1 byte from the file.
You can write a PDF from scratch like this from code easily enough, but it will not work to just open a PDF in Notepad and start changing things, because the index (xref) immediately becomes corrupt.
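As an illustration of writing such a file from code, here is a minimal Python sketch that records the byte offset of every object while serializing and then emits a matching xref table and startxref value. It is a sketch under assumptions, not a general PDF writer: the helper names are made up, object 4 is turned into a standard Helvetica font dictionary, and both text streams are referenced from /Contents so that neither is orphaned.

```python
def stream(content: bytes) -> bytes:
    """Wrap content in a stream object body with a matching /Length."""
    return b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream"


def build_pdf(bodies: list[bytes]) -> bytes:
    """Serialize object bodies (numbered from 1), then xref table and trailer."""
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"  # entry for the free object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /MediaBox [0 0 200 200] /Count 1 /Kids [3 0 R] >>",
    b"<< /Type /Page /Parent 2 0 R /Resources << /Font << /F1 4 0 R >> >>"
    b" /Contents [5 0 R 6 0 R] >>",
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    stream(b"BT /F1 14 Tf 80 180 Td (PDF) Tj ET"),
    stream(b"BT /F1 14 Tf 50 70 Td (this is a pdf) Tj ET"),
])
```

Writing `pdf` to a file in binary mode gives a document whose offsets stay correct no matter how the object bodies are edited, because they are recomputed on every run.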
I'd also advise against putting comments throughout the file unless the comments start on new lines. Otherwise some PDF parsers will get confused as this is generally not expected. Usually PDF files do not contain comments (with the exception of the second line, which is recommended by Adobe to be a comment of some non-ASCII characters so FTP recognizes the file as binary) seeing as they are virtually impossible to write manually anyway.
http://www.adobe.com/devnet/pdf/pdf_reference.html
A few years ago, I wrote a book which covers exactly this sort of thing:
http://www.amazon.com/PDF-Explained-John-Whitington/dp/1449310028/
No free online version, I'm afraid. You can get all the same information from Adobe's own documentation, which is free, but it's a rather long document!
I've built a PDF with PDFBox and by hand. It also has a visible signature. Everything works, but no image and no text are shown in the PDF (there is a visible rectangle, but without image and text). What do you think is happening?
Can you look at the sample?
That's the sample.
Thank you.
I've built a PDF with PDFBox and by hand. [...] no image and no text are shown in the PDF (there is a visible rectangle, but without image and text).
That is exactly what you built your document and especially the signature related data to do:
3 0 obj
<<
/FT /Sig
/F 132
/T (Signature1)
/Type /Annot
/Subtype /Widget
/V 5 0 R
/P 4 0 R
/Rect [100 574 310 625]
/AP << /N 6 0 R >>
/DR << /XObject << /FRM0 7 0 R >> >>
>>
endobj
6 0 obj
<<
/Type /XObject
/Subtype /Form
/Resources << /XObject << /FRM0 7 0 R >> >>
/BBox [0 0 100 100]
/FormType 1
/Length 8 0 R
>>
stream
endstream
endobj
There is a visible rectangle (actually after selecting the signature in question) because /Rect [100 574 310 625] in your signature field dictionary indicates the rectangular area where you have your signature.
There is no image and text shown in PDF because the normal appearance stream (which according to /AP << /N 6 0 R >> in your signature field dictionary is defined in object 6) is defined as an empty stream (there is nothing but white space between stream and endstream).
Most likely you wanted to place the xobject /FRM0 defined in the resources of the appearance stream. In that case you have the same problem in that xobject:
7 0 obj
<<
/Type /XObject
/Subtype /Form
/Resources << /XObject << /n0 9 0 R /n1 10 0 R >> >>
/BBox [0 0 100 100]
/FormType 1
/Length 11 0 R
>>
stream
endstream
endobj
This stream also is empty, you forgot to place the xobjects /n0 and /n1.
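To illustrate what a non-empty stream could look like, the following Python sketch builds a form xobject whose content simply paints each named xobject from its resources with the Do operator. The helper name is hypothetical, /Length is written as a direct integer instead of the indirect reference used in the file, and real appearances may additionally need a transformation matrix before each Do.

```python
def form_xobject(resources: bytes, xobject_names, bbox=(0, 0, 100, 100)) -> bytes:
    """Build a form xobject body that paints the given xobjects in order."""
    # One "q /Name Do Q" line per xobject: save state, paint, restore state.
    content = "\n".join(f"q /{name} Do Q" for name in xobject_names).encode("ascii")
    bbox_text = b"[%d %d %d %d]" % tuple(bbox)
    dictionary = (b"<< /Type /XObject /Subtype /Form /FormType 1 /BBox " + bbox_text
                  + b" /Resources " + resources
                  + b" /Length %d >>" % len(content))
    return dictionary + b"\nstream\n" + content + b"\nendstream"


obj6 = form_xobject(b"<< /XObject << /FRM0 7 0 R >> >>", ["FRM0"])
obj7 = form_xobject(b"<< /XObject << /n0 9 0 R /n1 10 0 R >> >>", ["n0", "n1"])
```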
Those xobjects look correctly defined but seem to be copied from samples from the early age of integrated PDF signatures.
Concerning the Adobe Acrobat error message observed by #stanlyF:
Error during signature verification.
Signature contains incorrect, unrecognized, corrupted or suspicious data.
Support Information: SigDict /SubFilter value
The signature value dictionary also is incomplete:
5 0 obj
<<
/Type /Sig
/Name (sig1)
/ByteRange [0 0 0 0]
/Contents <0000...0000>
>>
endobj
The dictionary has neither a /Filter nor a /SubFilter entry. While nominally the filter is required and the subfilter is optional, interoperable signing mostly depends on the subfilter, and the filter is ignored. Thus the Support Information.
The /Name entry is weird because it is specified to contain the name of the person or authority signing the document (if present).
The signed byte range is empty: it consists of two segments, both of them starting at offset 0 and being 0 bytes long.
The contained signature container itself consists only of 0x00 bytes.
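For comparison, in a completed signature the byte range usually covers the whole file except the /Contents value itself. The Python sketch below derives such a range from the position of the placeholder hex string; the function name is made up, and it assumes the placeholder (including its angle brackets) occurs exactly once in the file.

```python
def byte_range_for(pdf: bytes, placeholder: bytes) -> list[int]:
    """Return [offset1, length1, offset2, length2] around the placeholder."""
    start = pdf.index(placeholder)   # first byte of "<"
    end = start + len(placeholder)   # first byte after ">"
    return [0, start, end, len(pdf) - end]


sample = b"AAAA<0000>BBB"
byte_range = byte_range_for(sample, b"<0000>")
```

The two segments described by the result are the bytes that get hashed; the signature container is then written into the gap between them.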
Acrobat said:
"Error during signature verification.
Signature contains incorrect, unrecognized, corrupted or suspicious data.
Support Information: SigDict /SubFilter value"
The signature has an incorrect or incomplete content-closing marker.
Also, the /n0 and /n1 XObjects in the resources contain no PDF instructions.
I'm working on a program generating interactive forms into PDF files.
The generated file is here (source is readable). The checkbox is at the bottom of the page. After it gets focus it is rendered correctly (white square with red/blue border); after it loses focus the square disappears and the default appearance is shown (which is incorrect for me).
This happens in Acrobat 9, X and XI.
In the built-in Chrome PDF viewer it works fine.
Adobe XI Pro - preflight - shows warning "Form field has multiple appearances"
I can not find the mistake.
Thanks for your help.
the same (similar) problem described there:
http://forums.adobe.com/message/5144579#5144579
---- here is the part of the PDF file where I expect the mistake to be
2 0 obj
<<
/Type /Catalog
/Pages 1 0 R
/OutputIntents [7 0 R]
/Metadata 8 0 R
/PageLabels 10 0 R
/AcroForm 14 0 R
>>
endobj
14 0 obj
<<
/Fields [13 0 R]
>>
endobj
13 0 obj
<<
/Type /Annot
/Subtype /Widget
/Rect [20.0 20.0 120.0 120.0]
/FT /Btn
/F 4
/T (name)
/AS /Yes
/V /Yes
/AP <<
/N <<
/Yes 11 0 R
/Off 12 0 R >>
>>
>>
endobj
11 0 obj
<<
/Type /XObject
/SubType /Form
/BBox [20.0 20.0 120.0 120.0]
/Length 19 0 R
>>
stream
....
endstream
endobj
12 0 obj
<<
/Type /XObject
/SubType /Form
/BBox [20.0 20.0 120.0 120.0]
/Length 20 0 R
>>
stream
....
endstream
endobj
My observations with your PDF are somewhat different but interesting nonetheless:
Adobe Acrobat 9 Pro v9.5.4 (with PDF/A r/o view disabled) here does exactly what you originally seem to have expected: It only uses the red or blue framed box. If one toggles the check box, though, even if toggling it back on again, it wants to save a new revision with some changes to your field.
Adobe Reader XI v11.0.2 starts in PDF/A read-only mode and displays the red frame. After leaving that r/o mode, though, it shows the default cross appearance. When the field gets the focus, it again uses the red and blue frames. When it loses focus, it goes back to the default appearances.
The behavior I observed in Adobe Reader XI seems to be what you observed in more cases.
Thus in essence the issue is that under certain circumstances (for me: not in PDF/A r/o mode, focus not on form field) some PDF viewers (for me: Adobe Reader XI) don't use your custom check box appearances but some standard ones, and you think that this is incorrect.
Unfortunately there is a hint in the PDF specification ISO 32000-1:2008 according to which viewers may (perhaps even shall) act just so. Table 189 in section 12.5.6.19 Widget Annotations explains the entries in an appearance characteristics dictionary (value of /MK in the widget dictionary; you do not provide one, thus defaults apply), among them /CA:
text string (Optional; button fields only) The widget annotation’s normal caption,
which shall be displayed when it is not interacting with the user.
Unlike the remaining entries listed in this Table, which apply only to
widget annotations associated with pushbutton fields (see Pushbuttons in
12.7.4.2, “Button Fields”), the CA entry may be used with any type of
button field, including check boxes (see Check Boxes in 12.7.4.2, “Button
Fields”) and radio buttons (Radio Buttons in 12.7.4.2, “Button Fields”).
In particular check boxes, therefore, whenever not interacting with the user, shall be displayed using their normal captions, not their appearances.
When there is no focus on a form field, Adobe Reader seems to think that the form is not interacting with the user, and therefore switches to display of caption instead of appearance.
Unfortunately the normal caption you can define for a button is but a text string which by default seems to be interpreted in the context of the Zapf Dingbats font (try /MK<</CA(1)>> for example). This is, though, where you should continue looking, maybe you can make it use some Type 3 font of your design containing a blue and a red square frame.
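For experimenting with that entry from a generator, here is a small Python sketch that serializes a check box widget dictionary like the one in the question with an added /MK dictionary. The function name and parameters are made up for illustration, and whether a given viewer honors the caption remains something to test.

```python
def checkbox_widget(rect, caption: str, on_ref: str, off_ref: str) -> str:
    """Serialize a check box widget dictionary carrying an /MK /CA caption."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = caption.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    rect_text = " ".join(f"{value:g}" for value in rect)
    return (
        "<< /Type /Annot /Subtype /Widget /FT /Btn /F 4 "
        f"/Rect [{rect_text}] /T (name) /AS /Yes /V /Yes "
        f"/MK << /CA ({escaped}) >> "
        f"/AP << /N << /Yes {on_ref} /Off {off_ref} >> >> >>"
    )


widget = checkbox_widget((20, 20, 120, 120), "1", "11 0 R", "12 0 R")
```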