Verifying digital signatures in PDF documents

I'm trying to verify PDF's digital signatures.
I know that when a PDF is signed, a byte range is defined, the certificates get embedded, and from what I've read, the signed message digest and the timestamp are also stored in the PDF.
I already can extract the certificates and validate them.
Now I'm trying to validate the PDF's integrity, and my problem is that I don't know where the signed message digest is located.
In this sample signed PDF from Adobe (http://blogs.adobe.com/security/SampleSignedPDFDocument.pdf), I can clearly identify the digest, since it is right below the embedded certificates: /DigestMethod/MD5/DigestValue/ (line 1520).
But that PDF sample seems to be from 2009, and I suspect the message digest is stored in a different way now, because I signed a PDF with Adobe Reader and also with iText, and I can't find any message digest field like the previous one.
Can someone tell me whether the digests are now stored in a different way? Where are they located?
Anyway, for now I'm using that sample document from Adobe, and trying to verify its integrity.
I'm getting the document's bytes to be signed according to the specified byte range and digesting them with the MD5 algorithm, but the digest value I get doesn't match the one in the message digest field...
Am I doing something wrong? Is the digest also signed with the signer's private key?
I appreciate any help.

There are numerous details to get right when calculating the hash for integrated PDF signatures, among them:
Extract the correct bytes from the PDF to hash. The ByteRange tells you exactly which byte ranges are signed. To be accepted in modern signing contexts, the ranges must cover the whole PDF file revision with the exception of the value of Contents.
Beware, the value of Contents includes the leading '<' and the trailing '>' brackets.
Don't use a regular text editor or text processing instructions (like readln or writeln) to process PDFs. PDFs are binary in nature, even if they look textual to the naked eye. Copying PDF parts using such text-related operations most likely changes them in some details, and any such change breaks the signature hash value.
When in doubt, don't guess but read the specification. A copy of ISO 32000-1 has been made available by Adobe here, and much of what you need to know about the PDF format to start processing PDFs can be found there and in other public standards referenced in it. A very short introduction to integrated PDF signatures can be found in this answer and the documents referenced from there.
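A minimal Python sketch of the first two points, assuming the signature dictionary's /ByteRange is stored uncompressed (not inside an object stream) and that the file is read in binary mode, e.g. with open(path, "rb"). The helper names are hypothetical. It concatenates the signed ranges, checks that the gap is exactly the <...> Contents value, and hashes the result. It only reproduces the document hash and does not parse the signature container; the hash algorithm must be whichever one the signature actually uses (MD5 for the 2009 sample from the question).

```python
import hashlib
import re

# Matches an uncompressed array such as "/ByteRange [0 840 960 240]".
# Assumption: the signature dictionary is not inside a compressed object stream.
BYTE_RANGE_RE = re.compile(rb"/ByteRange\s*\[\s*([\d\s]+?)\s*\]")


def find_byte_ranges(pdf: bytes) -> list[list[int]]:
    """Return every ByteRange array found in the raw PDF bytes."""
    return [[int(n) for n in m.group(1).split()] for m in BYTE_RANGE_RE.finditer(pdf)]


def signed_bytes(pdf: bytes, byte_range: list[int]) -> bytes:
    """Concatenate the (offset, length) pairs listed in a ByteRange."""
    if len(byte_range) % 2:
        raise ValueError("ByteRange must contain offset/length pairs")
    pairs = zip(byte_range[0::2], byte_range[1::2])
    return b"".join(pdf[start:start + length] for start, length in pairs)


def check_contents_gap(pdf: bytes, byte_range: list[int]) -> None:
    """For the usual two-range case, the gap must be exactly the <...> hex string."""
    gap = pdf[byte_range[0] + byte_range[1]:byte_range[2]]
    if not (gap.startswith(b"<") and gap.endswith(b">")):
        raise ValueError("gap does not span the whole Contents value including < and >")


def document_digest(pdf: bytes, byte_range: list[int], algorithm: str = "sha256") -> bytes:
    """Hash the signed bytes with the given hashlib algorithm name, e.g. "md5"."""
    return hashlib.new(algorithm, signed_bytes(pdf, byte_range)).digest()
```

For a real file, read it with open("signed.pdf", "rb") (the file name is only an example), take a range from find_byte_ranges, call check_contents_gap, and then document_digest.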

Related

Can you add a timestamped no-tamper-proof to a PDF without "signing" it?

When signing a PDF using a digital signature, one can use a trusted timestamping service to add a time-stamp token that is signed by the timestamping authority. When viewing the signature of the PDF then, it will say that it contains a signed timestamp and that it has not been tampered with since that time (if it hasn't).
Technically, what happens is that the hash of the PDF content gets sent to the TSA (RFC 3161), that hash is put into a structure together with the current timestamp (as determined by the timestamping authority) plus some metadata, and that is then signed and sent back. This then provides proof that a PDF has not been changed since this point in time.
Technically it should therefore be possible to create such a timestamp proof without signing the document itself with an additional signature. Is that supported by the PDF standard, though (and also in terms of Acrobat Reader then being able to show this timestamp somehow)?
Of course I could just do it manually: take the SHA-256 hash of the file's binary representation, send it to the TSA service and store the received token in an external file. But preferably I'd like to embed the no-tamper proof into the PDF in such a way that Acrobat Reader can display it.
Is this possible? If so, how?
You can embed pure RFC 3161 time stamps in a PDF. This construct is called a document timestamp.
This structure was originally specified in ETSI TS 102 778-4 (Annex A.2) in 2009 as a means to purely timestamp a previously signed PDF with some validation-related information added in revisions after the signed one. As PAdES developed, this specification finally found its way into ETSI EN 319 142-1 (section 5.4.3).
While ETSI could only specify the structure as extension to ISO 32000-1 (PDF 1.7), the responsible ISO committee added it to the core ISO 32000-2 (PDF 2) in 2017.
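To illustrate the RFC 3161 exchange described in the question, here is a minimal sketch that DER-encodes a TimeStampReq for a SHA-256 message imprint using only the Python standard library. For a document timestamp, the hashed data would be the bytes covered by the time stamp's own ByteRange. The TSA URL, the HTTP transport (RFC 3161 specifies posting the request with the media type application/timestamp-query) and parsing of the TimeStampResp are deliberately left out; the function names are hypothetical.

```python
import hashlib
import secrets
from typing import Optional

# DER encoding of the OID 2.16.840.1.101.3.4.2.1 (id-sha256)
SHA256_OID = bytes.fromhex("0609608648016503040201")


def der_length(n: int) -> bytes:
    """Encode a DER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der(tag: int, content: bytes) -> bytes:
    return bytes([tag]) + der_length(len(content)) + content


def der_int(value: int) -> bytes:
    """Minimal INTEGER encoding of a non-negative value (leading 0x00 if the high bit is set)."""
    return der(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def timestamp_request(data: bytes, nonce: Optional[int] = None) -> bytes:
    """Build an RFC 3161 TimeStampReq for SHA-256(data)."""
    if nonce is None:
        nonce = secrets.randbits(64)
    algorithm = der(0x30, SHA256_OID + b"\x05\x00")  # AlgorithmIdentifier with NULL params
    imprint = der(0x30, algorithm + der(0x04, hashlib.sha256(data).digest()))  # MessageImprint
    body = (
        der_int(1)             # version v1
        + imprint              # messageImprint
        + der_int(nonce)       # nonce (reqPolicy omitted)
        + der(0x01, b"\xff")   # certReq TRUE: ask the TSA to include its certificate
    )
    return der(0x30, body)
```

The nonce lets the client check that the response belongs to this request; certReq asks the TSA to include its certificate, which a validator later needs.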
Concerning your questions in comments:
Is this compatible with PDF/A?
I think they are not compatible with PDF/A-1, PDF/A-2, and PDF/A-3. As PDF/A-4 is based on ISO 32000-2, though, I assume it will be compatible. (I have not yet had a look at ISO 19005-4...)
Is there a way to create those with Acrobat Reader?
It should be possible with some Adobe Acrobat version. It is (currently) not possible with the base Adobe Acrobat Reader version. Probably, though, Adobe Acrobat Reader with some of its fee-based, built-in tools can create them.
optimally I'd like to have a cli tool or be able to do it through some library
Any general PDF signing library that is not outdated should support the creation of document time stamps.
but first I want to test how they are displayed later in Acrobat Reader
They are displayed as separate entries: the first entry is a signature with an embedded signature timestamp, the second entry is a document time stamp.

Can the signature of a signed PDF document be reapplied?

I'm thinking about the case where the PDF is dynamically generated from a template and some data.
As long as I keep the template and the data the same, may I reapply the same signature to the document, or will it be invalidated?
You can also read this as: is the public signature related only to the content, or does it depend on the creation time of the document, etc.?
As long as I keep the template and the data the same, may I reapply the same signature to the document, or will it be invalidated?
If you re-generate the PDF byte-wise identically and in particular also identically prepare it for signing, the signed bytes are identical, so the identical signature can be used.
You can also read this as: is the public signature related only to the content, or does it depend on the creation time of the document, etc.?
You said you want to keep the data the same. If the claimed creation time of the document is stored in the document (e.g. in the metadata), then that claimed creation time obviously must be part of the data you keep and re-use.
You can, though, get a different best signing time by using a digital time stamp as the only source of the signing time, because such a time stamp is applied as an unsigned attribute.
Unfortunately, you don't mention your PDF generation tool chain, so we cannot check whether your tools allow such a faithful regeneration of PDFs.
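To make the byte-identity condition concrete, here is a sketch that hashes the ByteRange-covered bytes of an original and a regenerated file and compares them. fake_pdf is a hypothetical stand-in for a generator whose only variable input is a CreationDate entry; real PDFs additionally need identical object numbering, file IDs and an identically placed signature placeholder.

```python
import hashlib


def placeholder_byte_range(pdf: bytes) -> list[int]:
    """ByteRange excluding the single <...> placeholder (assumes exactly one '<' and '>')."""
    gap_start = pdf.index(b"<")
    gap_end = pdf.index(b">", gap_start) + 1
    return [0, gap_start, gap_end, len(pdf) - gap_end]


def signed_digest(pdf: bytes, byte_range: list[int]) -> bytes:
    """SHA-256 over the bytes selected by the (offset, length) pairs."""
    h = hashlib.sha256()
    for start, length in zip(byte_range[0::2], byte_range[1::2]):
        h.update(pdf[start:start + length])
    return h.digest()


def can_reuse_signature(original: bytes, regenerated: bytes) -> bool:
    """The old signature fits only if the regenerated file signs the very same bytes."""
    old_range = placeholder_byte_range(original)
    new_range = placeholder_byte_range(regenerated)
    return (old_range == new_range
            and signed_digest(original, old_range) == signed_digest(regenerated, new_range))


def fake_pdf(creation_date: str) -> bytes:
    # Hypothetical generator output: a metadata entry plus an empty signature placeholder.
    return f"/CreationDate (D:{creation_date}) /Contents <{'0' * 16}> %%EOF".encode()
```

Regenerating with the same creation date yields identical signed bytes, while a one-day difference in the claimed creation date already changes the hash, so the old signature would no longer fit.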

"detached" digital signatures in PDF

I want to implement a "parallel" signing process for PDFs, so that users can digitally sign a document not "one by one", but simultaneously. To implement this, I decided to create separate copies of the initial document for all users and get their signatures on them. Eventually, all signatures should be combined into a single PDF.
Let's assume that the PDF does not change during the signing process, except for signature field creation (all AcroForms, signature containers, visual signatures, etc. are created beforehand and are the same for all).
During further investigation, I read this article and understood that each previous digital signature (even a detached one) is included in the SignedContent of the next signature. So there is no way to add a digital signature that is completely separate from the contents. This leads to the problem that the next signature can't be calculated before the previous one is finished.
Please tell me whether there is any option to get around this. Or is putting signatures "one by one" the only solution?
P.S. I'm using Apache PDFBox to work with PDF.
Please tell me whether there is any option to get around this.
If you want your signatures to be interoperable, there is no way around that.
I read this article and understood that each previous digital signature (even a detached one) is included in the SignedContent of the next signature
That answer still represents the current situation. If anything, it has been confirmed by newer specifications: the PAdES specifications referenced in that answer merely were 'technical specifications' (ETSI TS 102 778), whereas there now are actual norms (ETSI EN 319 142) which also require a PDF signature to sign everything in its revision except its own signature container. Also, ISO 32000-2 has been published, still having that requirement for its interoperable signatures and additionally including a shortened copy of the PAdES specification.
You stress "even detached" here. The "detached" in the context at hand only refers to the structure of the CMS container which is embedded in the PDF; it in particular does not refer to the signature being more separated from the PDF or anything like that.
If you don't need to be interoperable, though, there are some options; here are two of them which are still quite close to interoperable signatures:
You can ignore the requirement that a pdf signature must sign everything in its revision except its own signature container.
For example you can prepare multiple signature fields and dictionaries in a single new revision of the document and set each signature's signed byte range to exclude the placeholders of all these signatures.
You can ignore the requirement that there is only a single SignerInfo in the CMS signature container and put SignerInfos from different signing parties into a single signature container in a single signature field.
Common PDF signature validators will,
in case of signatures created as described in the former option, not positively validate, at least most of them,
either because their code is programmed for only two ranges of signed bytes (i.e. a single gap) and so only uses the first two ranges resulting in a wrong document hash;
or because they explicitly require that a signature covers its whole revision minus the single placeholder for the signature container of the signature field being validated; the number of validators of this kind surely has risen since the publication of the "Security of PDF Signatures" master thesis by Karsten Meyer zu Selhausen at the Ruhr-Universität Bochum, see this question.
in case of signatures created as described in the latter option, appear to positively validate, at least many of them, until you look at the validation result in detail and realize that they have validated only one of the SignerInfos and ignored the others.
For example in case of two SignerInfos Adobe Reader validates the second one (I assume it always validates the last one) and eSig DSS validates the first one, and neither one of them currently indicates in the validation result that there may be another SignerInfo present.
A large Swedish security company, for example, implements the second option in its software; in its home brew format PDF/CAdES-A it inserts CAdES-A containers as CMS container in PDFs and allows multiple SignerInfos therein. Obviously, therefore, its own software will recognize and validate all SignerInfos. Nonetheless, this is a home brew solution and not interoperable.
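The former option and the failure mode of two-range validators can be illustrated with a sketch (offsets and content are made up, and the helper names are hypothetical): it builds a ByteRange with one gap per signature placeholder and hashes it once fully and once the way a validator that only reads the first two ranges would. This is the non-interoperable variant discussed above, not a recommendation.

```python
import hashlib


def byte_range_excluding(file_length: int, gaps: list[tuple[int, int]]) -> list[int]:
    """Flat ByteRange covering everything except the given (start, end) placeholder gaps."""
    result, position = [], 0
    for start, end in sorted(gaps):
        result += [position, start - position]
        position = end
    result += [position, file_length - position]
    return result


def digest(pdf: bytes, byte_range: list[int], max_ranges: int = 0) -> bytes:
    """Hash the ranges; max_ranges > 0 mimics a validator that reads only that many ranges."""
    pairs = list(zip(byte_range[0::2], byte_range[1::2]))
    if max_ranges:
        pairs = pairs[:max_ranges]
    return hashlib.sha256(b"".join(pdf[s:s + n] for s, n in pairs)).digest()
```

With two placeholders the ByteRange has six entries; a validator that stops after the first two ranges hashes a truncated byte sequence and therefore reports a hash mismatch even though nothing was tampered with.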
You could use existing software that supports signature workflows, such as the one we offer at https://www.esignanywhere.net. This software allows you to define signature workflows (via API or web user interface). Input can be a PDF document with signature fields as AcroForm fields, or text placeholders within the PDF. The metadata defined in eSignAnyWhere allows assigning each of them to a specific signer. The workflow capabilities allow defining sequential, parallel, or mixed sequential and parallel signing workflows.

We receive signed PDF documents with subsequent modifications

Maybe this one would fit better on Information Security Stack Exchange? I'm not sure...
These are the facts:
We have a web app where users download a PDF document with a form, fill in this form, sign it with their electronic certificate and upload it back to our environment.
We have come across cases where the uploaded document is signed but shows some fields that were altered after signing. If we check the integrity of the PDF signatures, the check shows that data was altered after signing, but also that the signature is fine and valid.
If we right-click on the signature and select "See signed version" we see the real data loaded on the moment of the signature.
Now, this goes against my general perception of electronic signature functionality. If any change is made to the document (or the data loaded into it) after I make a signature, this signature should become invalid, as the document has been altered.
The behaviour of the PDF seems to be different: not only is the signature still valid, but the "default version" you see when you open the document is also the latest one, not the signed one.
Now I'm wondering:
Is this some kind of bug, or is it expected behaviour?
Is there any place where information on the matter can be found? (Google keeps redirecting me again and again to "how to sign a PDF" articles.)
If this is a defined behaviour, how do you deal with it?
Now, this goes against my general perception of electronic signature functionality. If any change is made to the document (or the data loaded into it) after I make a signature, this signature should become invalid, as the document has been altered.
The behaviour of the PDF seems to be different: not only is the signature still valid, but the "default version" you see when you open the document is also the latest one, not the signed one.
Is this some kind of bug, or is it expected behaviour?
It is expected behavior.
You have to be aware of two special factors here:
A PDF signature field contains the information on the byte ranges signed. Obviously, the whole file cannot be signed, as the signature itself is embedded in it and cannot be part of the signed bytes. Thus, the signed byte ranges need to be recorded somewhere. Cf. this answer on Information Security Stack Exchange.
Additions to a PDF can be made by appending to the existing document, a process called an incremental update. These updates can in turn be signed, etc.; cf. also the answer referenced above.
Thus, if changes are made to a PDF by means of an incremental update, the existing integrated signatures in the document still correctly sign their respective signed byte ranges. They are still mathematically valid in spite of the added changes.
Furthermore the current contents of a PDF are defined in particular by the newest incremental update, so when you open the document it shows the content including the last changes, not the signed one.
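One simple check follows from this. The sketch below assumes you already have the signature's ByteRange as a list of integers (the function name is hypothetical): if the last signed range ends before the end of the file, incremental updates were appended after signing, and truncating the file at that point gives the signed revision, i.e. what "See signed version" shows, provided the signature covers its whole revision. Whether the appended changes are permitted is a separate and much harder check.

```python
def revision_info(pdf: bytes, byte_range: list[int]) -> dict:
    """Summarize how a signature's ByteRange relates to the whole file."""
    signed_end = byte_range[-2] + byte_range[-1]  # end offset of the last signed range
    return {
        "covers_whole_file": signed_end == len(pdf),
        "appended_bytes": len(pdf) - signed_end,
        # The file up to the end of the signed revision; later updates are cut off.
        "signed_revision": pdf[:signed_end],
    }
```

A validator that only checks the mathematical validity of the signature will accept both files in this situation; only the covers_whole_file flag reveals that the displayed content may differ from the signed one.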
Now, while this sounds like PDF signatures have no meaning, this is not the case. The specification ISO 32000-1 clearly defines which changes are allowed to be made in an incremental update to a certified (= signed with some special flags) base version of a document, and Adobe in their Acrobat and Reader software have extrapolated restrictions from this for signed but not certified documents, cf. this answer on Stack Overflow.
In particular at most the following changes are allowed:
Adding signature fields
Adding or editing annotations
Supplying form field values
Digitally signing
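For certified documents, the permission level is the P entry of the DocMDP transform parameters in ISO 32000-1 (1: no changes; 2: filling in forms, instantiating page templates and signing; 3: additionally creating, deleting and modifying annotations; 2 is the default). A minimal sketch encoding those levels follows. The Change categories are a hypothetical simplification, Adobe's extrapolated rules for signed but not certified documents are not modelled, and determining which changes an incremental update actually made is not shown.

```python
from enum import Enum


class Change(Enum):
    FILL_FORM = "supplying form field values"
    INSTANTIATE_TEMPLATE = "instantiating page templates"
    SIGN = "digitally signing"
    ANNOTATE = "creating, editing or deleting annotations"
    OTHER = "any other change"


# Permitted changes per DocMDP transform parameter P (ISO 32000-1).
DOCMDP_ALLOWED = {
    1: set(),
    2: {Change.FILL_FORM, Change.INSTANTIATE_TEMPLATE, Change.SIGN},
    3: {Change.FILL_FORM, Change.INSTANTIATE_TEMPLATE, Change.SIGN, Change.ANNOTATE},
}


def disallowed_changes(changes: set[Change], p: int = 2) -> set[Change]:
    """Return the changes that a certification with permission level p would reject."""
    return changes - DOCMDP_ALLOWED[p]
```

An empty result means the incremental update stays within what the certification allows; anything else should make the receiving side reject the document.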
If this is a defined behaviour, how do you deal with it?
As the documents originate from you, you can start by applying a certification signature to the document which allows as few changes as possible in your use case.
Then you can define signature lock information for the signature fields your users are to sign. In this lock information you can, for example, prescribe that after the given signature field is signed, a number of form fields shall be read-only.
Finally you only accept back PDFs which still contain your certification signature and to which no disallowed changes were added.
There actually are numerous PDFs which are certified and contain a number of fields for additional approval signatures, and each of the approval signature fields is coupled with some form fields which will not be editable anymore after signing. After all the signature fields are signed, all fields are read-only.
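The signature lock information mentioned above is a /Lock dictionary on the signature field whose /Action is All, Include or Exclude, with /Fields naming the affected form fields for the latter two. The following sketch only models which fields end up read-only in such a workflow; the field and signature names are hypothetical and nothing is written to a PDF.

```python
def locked_fields(action: str, fields: list[str], all_fields: set[str]) -> set[str]:
    """Fields made read-only by one lock dictionary (/Action and /Fields entries)."""
    if action == "All":
        return set(all_fields)
    if action == "Include":
        return set(fields) & all_fields
    if action == "Exclude":
        return all_fields - set(fields)
    raise ValueError(f"unknown lock action: {action}")


def read_only_after(signed: list[str], locks: dict[str, tuple[str, list[str]]],
                    all_fields: set[str]) -> set[str]:
    """Union of the fields locked by every signature field signed so far."""
    result: set[str] = set()
    for signature_field in signed:
        action, fields = locks[signature_field]
        result |= locked_fields(action, fields, all_fields)
    return result
```

For example, with a hypothetical first approval field locking the fields "name" and "amount" (Include) and a final field locking All, the form becomes fully read-only once both are signed.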
Is there any place where information on the matter can be found? (Google keeps redirecting me again and again to "how to sign a PDF" articles.)
You should in particular look at the PDF specification ISO 32000-1 and some Adobe documents on the behavior of their software. You'll find links at the bottom of the Stack Overflow documentation page that the above-mentioned links point to.

Where to place the digital signature of the documents in my system?

I am developing an archiving system that stores documents in a database and provides various functionalities to the user. I have added a part to sign and verify any document in the database. However, I am stuck with the logic and wondering where I should place the signing function.
Hints about my aims:
No document should be uploaded to the database without a signature.
If a document is not changed, it should retain its signature.
If the document does not have a signature, it should be signed with the uploader's signature.
The signature will not encrypt the file, so it will still be readable after the signing process is applied.
The initial solution I used was to place the signing procedure in the form that is called by the Upload button and to store the signature of the file in a separate column in the Documents table in the database. However, that solution turned out to be invalid for my scenario: if an employee downloaded a file and then uploaded it again, it would be signed by him, and thus the original signature would be lost. Also, the signature has no significance outside the system.
My main question:
Is there a way to store the signature inside the documents?
Hint: My system will deal only with PDF, JPEG, Tiff, MS Office and TXT Documents.
Subsidiary Request: It would be awesome if there's a way to store the signature in any type of file!
Is there a way to store the signature inside the documents?
A digital signature must be built using a hash of the document that is being signed. Since adding a signature to a document modifies the document (which invalidates the hash), there is no general solution to storing a digital signature inside a document.
Some document formats allow for digital signing and define what portion is to be excluded from the hash, but those formats that were listed—as far as I know—are not among them. (Though PGP could be used on TXT documents.)
Since signatures sign the hash of a document, you could simply create a table mapping hashes to signatures. Thus, downloading and re-uploading a document will not remove existing signatures, since the hash will remain the same. The usefulness of this approach depends, of course, on the semantic meaning of a signature in your system.
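A minimal sketch of that hash-to-signature table, using SQLite from the Python standard library. The sign callback is a hypothetical placeholder for whatever signing function the system uses (it receives the raw SHA-256 digest); table and column names are made up.

```python
import hashlib
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signatures ("
        " doc_hash TEXT NOT NULL,"
        " signer TEXT NOT NULL,"
        " signature BLOB NOT NULL,"
        " PRIMARY KEY (doc_hash, signer))"
    )
    return conn


def on_upload(conn: sqlite3.Connection, content: bytes, uploader: str,
              sign: Callable[[bytes], bytes]) -> list[str]:
    """Return the signers of this exact content, signing it first if it is new."""
    digest = hashlib.sha256(content).digest()
    key = digest.hex()
    rows = conn.execute(
        "SELECT signer FROM signatures WHERE doc_hash = ? ORDER BY signer", (key,)
    ).fetchall()
    if rows:
        # Unchanged content: keep the existing signature(s), do not re-sign.
        return [signer for (signer,) in rows]
    conn.execute(
        "INSERT INTO signatures (doc_hash, signer, signature) VALUES (?, ?, ?)",
        (key, uploader, sign(digest)),
    )
    conn.commit()
    return [uploader]
```

Keying on the content hash is what makes the download and re-upload case harmless; any change to the bytes, even a re-save by an editor, produces a new hash and therefore a new signature.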