Creating PAdES signature - pdf

I am trying to create a PAdES signature using the following workflow:
PDF is prepared for signing and hash is calculated in the browser
hash is sent to the backend
detached CAdES signature is formed on the backend
detached CAdES is sent back to the browser where PAdES signature is assembled
We have a working example of PDF signature that works like this:
PDF is prepared and hash is calculated in the browser
hash is sent to the backend
detached PKCS7 signature is made on the backend (by using BouncyCastle lib)
detached PKCS7 is sent back to the browser where PDF signature is assembled
This is working fine.
However, now instead of BouncyCastle we are using DSS library on the backend because we are trying to create a PAdES signature. So, DSS lib is creating detached CAdES (which should be the same as detached PAdES) instead of PKCS7. However, when the signature is assembled in the browser the signature is invalid (even the certificate info isn't visible).
From my understanding CAdES is an extension to PKCS7 so this approach should work.
I'm first trying to understand if something's wrong with our approach and if not, I'll try to share the code we're using to make a detached CAdES signature to see if something's wrong there.

I figured it out. The detached CAdES signature is more than twice the size of the detached PKCS7 signature, so we weren't reserving enough space for it, and the signature was basically overwriting the PDF content. After I increased the space reserved for the signature, everything works as it should.
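The failure described here can be caught with a size check before the signature is written into the reserved placeholder. The following is a minimal sketch, not the code from the question: the class and method names are hypothetical, and it assumes the placeholder is a hex string, so reserving N bytes takes 2N hex characters.

```java
class ContentsPlaceholder {
    /**
     * Hex-encodes a signature container and pads it with '0' characters up to
     * the reserved size. Throws instead of silently overflowing the placeholder.
     */
    static String toPaddedHex(byte[] signature, int reservedBytes) {
        if (signature.length > reservedBytes) {
            throw new IllegalArgumentException("Signature needs " + signature.length
                    + " bytes but only " + reservedBytes + " were reserved");
        }
        StringBuilder hex = new StringBuilder(reservedBytes * 2);
        for (byte b : signature) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        // Trailing zero padding keeps the placeholder length, and thus the byte ranges, unchanged.
        while (hex.length() < reservedBytes * 2) {
            hex.append('0');
        }
        return hex.toString();
    }
}
```

With a check like this, a CAdES container that outgrows the space sized for the old PKCS7 signature fails loudly instead of corrupting the file.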

Related

How to compute message-digest in a SignerInfo?

I'm trying to generate a PDF with LTV enabled. I generate a PKCS7 object with all the necessary attributes, including the SignerInfo object. The signature generated is valid but not LTV enabled. According to the PDF reference manual I need to include validation info (CRLs or OCSP...), and based on rfc3852 this content goes in the signedAttributes object, which must contain a content-type attribute and a message-digest attribute. My question is how to compute the message-digest value, and is it necessary to sign it alongside the PDF content?
Note: the adbe-revocationInfoArchival object containing the CRLs seems to be correct, since Acrobat reads the revocation info directly from the file. The only issue I seem to have is that the message-digest included in the signedAttrs object and/or the signature value is not computed correctly. The RFC is not very clear on what that message-digest should be or whether it should be included in the digest that will be signed with the signer's private key.
After some back and forth in the comments to the question I'm still not sure what information you need, so here a few thoughts in general on the structure and contents of non-trivial CMS signature containers for PDF signatures.
The Specifications
First off, though, some words on the specifications to use. You mention the PDF reference manual and rfc3852. Both actually are not state-of-the-art anymore, but interestingly one less than the other.
Originally the Adobe PDF References for PDF up to 1.7 were the documentation to look at. Unfortunately Adobe saw these references as not normative in nature, i.e. if the current Reference and the current Acrobat version disagreed on something, the program was correct, not the Reference!
The latest Adobe PDF Reference (for PDF 1.7) referred to RFC 2315 for the structure of the signature container.
Then Adobe transferred the authority over the format PDF to the International Organization for Standardization (ISO) who in 2008 published the first normative PDF specification, ISO 32000-1, which was very similar to the last PDF Reference in content but adopted the RFC'ish language.
ISO 32000-1 refers both to RFC 3852 and RFC 2315 for the structure of the signature container. Which is weird, but most likely the remaining RFC 2315 reference was an oversight.
In 2017 the ISO published a PDF specification for PDF 2.0, ISO 32000-2, with a number of relevant changes, also in the context of signing.
ISO 32000-2 refers to RFC 5652 for the structure of the signature container for adbe.pkcs7.detached signatures and to ETSI EN 319 122 for the structure of the signature container for ETSI.CAdES.detached signatures.
In 2020 the ISO updated ISO 32000-2 with a number of clarifications; the references for the signature container specification remained the same.
Thus, currently you should look at ISO 32000-2:2020 and RFC 5652.
CMS Signature Containers
In a late comment you say
I want to know how do i add these attributes to the final digest to sign. I'm using SHA to digest the pdf, then sign it with the rsa private key and build the pkcs7 structure including the certificate chain, the signed message and a timestamp as an unsigned attribute.
This procedure can only create simple signature containers without signed attributes as only in these simple containers the document hash is signed directly. But the adbe-revocationInfoArchival attribute you want to add must be a signed attribute, and as soon as signed attributes are involved, the document hash value is not signed directly anymore.
The CMS signature container contains a SignedData object with exactly one SignerInfo object. That SignerInfo object is defined as
SignerInfo ::= SEQUENCE {
    version CMSVersion,
    sid SignerIdentifier,
    digestAlgorithm DigestAlgorithmIdentifier,
    signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
    signatureAlgorithm SignatureAlgorithmIdentifier,
    signature SignatureValue,
    unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }
(RFC 5652 section 5.3. "SignerInfo Type")
In a signature container created by your working code the OPTIONAL signedAttrs are absent and the signature value is calculated immediately for the document hash.
As soon as there are signed attributes, though, the OPTIONAL signedAttrs is not absent anymore, instead it is a SET of Attribute instances including at least
a content-type attribute with id-data as value,
a message-digest attribute with the digest value of the to-be-signed PDF byte ranges as value,
and in your case an adbe-revocationInfoArchival attribute with the revocation information as value.
In this case the signature value is not calculated immediately for the document hash anymore but instead for the hash value of the signedAttrs!
To be more exact, it is calculated for the hash value of the complete DER encoding thereof, and not with the IMPLICIT [0] tag but with an EXPLICIT SET OF tag.
Thus, after using SHA to digest the PDF, instead of signing that digest with the RSA private key and building the PKCS7 structure, you proceed by
building the set of signed attributes with at least the attribute entries enumerated above, DER encoding that set and hashing it,
signing that hash value of the signed attributes with your private key, and
building the CMS signature container structure with these signed attributes and this signature value, and also with the certificate chain.
Additionally you may add a signature time stamp.
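The first two steps above can be sketched with the JDK alone. This is an illustration under stated assumptions, not a complete signer: the class and method names are hypothetical, the DER encoding is hand-rolled only to keep the example self-contained (a real implementation would normally use a CMS library), SHA-256 with RSA is assumed, and the adbe-revocationInfoArchival attribute and the surrounding SignedData structure are left out.

```java
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Arrays;

class SignedAttrsSketch {
    // DER-encoded OBJECT IDENTIFIERs (tag and length included).
    static final byte[] OID_CONTENT_TYPE = hex("06092A864886F70D010903");   // 1.2.840.113549.1.9.3
    static final byte[] OID_MESSAGE_DIGEST = hex("06092A864886F70D010904"); // 1.2.840.113549.1.9.4
    static final byte[] OID_ID_DATA = hex("06092A864886F70D010701");        // 1.2.840.113549.1.7.1

    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Encodes one DER tag-length-value; handles content lengths below 65536 only.
    static byte[] tlv(int tag, byte[]... parts) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            body.write(p, 0, p.length);
        }
        int len = body.size();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(tag);
        if (len < 0x80) {
            out.write(len);
        } else if (len < 0x100) {
            out.write(0x81);
            out.write(len);
        } else {
            out.write(0x82);
            out.write(len >> 8);
            out.write(len & 0xFF);
        }
        out.write(body.toByteArray(), 0, len);
        return out.toByteArray();
    }

    // DER encoding of the signed attributes with the universal SET OF tag (0x31).
    // These are the bytes that get hashed and signed.
    static byte[] signedAttrsDer(byte[] documentDigest) {
        byte[] contentType = tlv(0x30, OID_CONTENT_TYPE, tlv(0x31, OID_ID_DATA));
        byte[] messageDigest = tlv(0x30, OID_MESSAGE_DIGEST, tlv(0x31, tlv(0x04, documentDigest)));
        byte[][] attrs = {contentType, messageDigest};
        // DER requires the elements of a SET OF to be sorted by their encodings.
        Arrays.sort(attrs, (a, b) -> Arrays.compareUnsigned(a, b));
        return tlv(0x31, attrs);
    }

    // SHA256withRSA hashes the encoded attributes and signs that hash in one call.
    static byte[] sign(PrivateKey key, byte[] signedAttrsDer) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(signedAttrsDer);
        return signer.sign();
    }

    // The same bytes as stored inside SignerInfo, where the IMPLICIT [0] tag (0xA0) replaces 0x31.
    static byte[] asImplicitTag0(byte[] signedAttrsDer) {
        byte[] copy = signedAttrsDer.clone();
        copy[0] = (byte) 0xA0;
        return copy;
    }
}
```

Note that only the first byte differs between the signed form and the stored form, which is why a verifier has to swap the tag back before checking the signature.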

Question about Apache PDFBox and PDF Certification

We are doing external remote signing using Apache PDFBox; the source code is mostly based on the official Apache PDFBox samples. We noticed some "issues" when we try to sign a document with multiple signatures. They are visible signatures. The input is a document with some signature holders. The flow is:
Unsigned doc -> sign(graphic_signature1, cert1, unsigned_doc) -> signed_doc_1 -> sign(graphic_signature2, cert2, signed_doc_1) -> signed_doc_2, ....
The result:
signed_doc_1: Adobe Acrobat says: Signature is valid, no modification
signed_doc_2 and subsequent ones: Adobe Acrobat says: The changes that have been made to this document since it was certified are permitted by Certifying party and do not invalidate the signature.
I also read this article:
https://help.adobe.com/en_US/livecycle/11.0/Services/WS92d06802c76abadb-3598a7d812dbeb3dcf3-7ff0.2.html
what I would like to ask:
Is it actually an issue? (sorry, I am just a developer, I do not know much about the policy for PDF certification)
If it is an issue, how can it be fixed?
When signing, the following saveIncrementalForExternalSigning has been called:
signatureOptions = new SignatureOptions();
signatureOptions.setVisualSignature(createVisualSignatureTemplate(doc,
signingRequest.getSignatureInfo().getPosition().getPageNumber(), rect, signature));
signatureOptions.setPage(signingRequest.getSignatureInfo().getPosition().getPageNumber());
doc.addSignature(signature, null, signatureOptions);
ExternalSigningSupport externalSigning = doc.saveIncrementalForExternalSigning(fos);
// invoke external signature service
byte[] cmsSignature = sign(externalSigning.getContent());
// set signature bytes received from the service and save the file
externalSigning.setSignature(cmsSignature);
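What the sign(...) helper above does with the content stream is not shown in the question. One common arrangement, assumed here, is to hash the stream locally and send only the digest to the remote service. A minimal sketch with a hypothetical class name, using SHA-256 as an assumed algorithm:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ContentDigest {
    // Reads the to-be-signed content in chunks and returns its SHA-256 digest.
    static byte[] sha256(InputStream content) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = content.read(buffer)) != -1) {
            md.update(buffer, 0, read);
        }
        return md.digest();
    }
}
```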
Edited: I was able to "fix" the issue by commenting out the call to setMDPPermission(doc, signature, 2) (in the Apache PDFBox signature sample). Thanks!

Digital Signature With TSA Timestamping adding certificates in chain in TSA response PDF Box giving error “not enough space to write signature”

I have created a digital signature and timestamped it via a TSA. I added certificates to the TSA response so that the chain can be built; this works fine and the signature is created. But when embedding this signature in a PDF using the PDFBox API for Java, it fails with the error "not enough space to write signature". Is there any configuration available in PDFBox to handle the signature size?
Any help would be appreciated.
I assume you're using an embedded timestamp as in the CreateEmbeddedTimeStamp.java example, so you're using the space of the existing signature. That one is fixed, so you need to make it large enough:
signatureOptions.setPreferredSignatureSize(...);
with a number higher than the default (0x2500). The SignatureOptions object can be passed in the document.addSignature() call.

Extract and recomprise PDF file using Origami

This is regarding Origami, the Ruby tool for exploring PDF files at http://esec-lab.sogeti.com/pages/Origami
By way of example I am trying to open a PDF file, extract it and then rewrite the original PDF. This is the complete code I am trying to use to accomplish this:
hg clone https://code.google.com/p/origami-pdf/
cd origami-pdf/
rake
cd ..
curl 'http://www.ada.gov/hospcombrprt.pdf' -o hospcombrprt.pdf
origami-pdf/bin/pdf2ruby -x hospcombrprt.pdf
mv hospcombrprt.pdf hospcombrprtORIG.pdf
cd hospcombrprt
ruby hospcombrprt.rb # THIS STEP PRODUCES ERRORS
bc hospcombrprt.pdf ../hospcombrprtORIG.pdf || echo FAILED
However this produces the following error:
/Users/williamentriken/Developer/origami-pdf/lib/origami/page.rb:75:in `pages': Invalid page tree (Origami::InvalidPDFError)
from /Users/williamentriken/Developer/origami-pdf/lib/origami/pdf.rb:689:in `compile'
from /Users/williamentriken/Developer/origami-pdf/lib/origami/pdf.rb:233:in `save'
from hospcombrprt.rb:189:in `<main>'
Has anyone else had success in performing this operation using this library and could you please share?
Original Post:
I played around with the library for a while, but I kept getting errors and minor bugs, such as replicated pages and missing pages...
...you should read the author's comment about the limits of using the Origami library.
I recommend the combine_pdf gem, it's great for simple pdf manipulations, such as merging, stamping and the like.
update:
I looked at the specific PDF file and it might be an issue related to an unsupported PDF version.
The http://www.ada.gov/hospcombrprt.pdf file is encrypted with a type 4 encryption, which according to the PDF standard, starting with PDF 1.5, is:
"(PDF 1.5) The security handler defines the use of encryption and decryption in the document, using the rules specified by the CF, StmF, and StrF entries."
The encryption uses AES v.2, which is limited to PDF 1.6 and above:
"AESV2 (PDF 1.6) The application shall ask the security handler for the encryption key and shall implicitly decrypt data with "Algorithm 1: Encryption of data using the RC4 or AES algorithms", using the AES algorithm in Cipher Block Chaining (CBC) mode with a 16-byte block size and an initialization vector that shall be randomly generated and placed as the first 16 bytes in the stream or string."
So, even if the decryption code is written in, the way to apply that code might not be known due to the way the PDF file is structured...
...It might be better to start with simple PDF files and then patch anything that isn't supported just yet.
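The AESV2 data layout quoted above (a random 16-byte initialization vector followed by the CBC ciphertext) can be sketched as follows. This is not Origami code: the class name is hypothetical, the key is assumed to be already derived by the security handler, and PKCS#5 padding is an assumption of this sketch.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class AesV2Sketch {
    // Decrypts data whose first 16 bytes are the IV and whose remainder is AES-CBC ciphertext.
    static byte[] decrypt(byte[] key, byte[] ivAndCiphertext) throws GeneralSecurityException {
        byte[] iv = Arrays.copyOfRange(ivAndCiphertext, 0, 16);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(ivAndCiphertext, 16, ivAndCiphertext.length - 16);
    }

    // Produces the same layout: a fresh random IV followed by the ciphertext.
    static byte[] encrypt(byte[] key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = new byte[16 + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, 16);
        System.arraycopy(ciphertext, 0, out, 16, ciphertext.length);
        return out;
    }
}
```

The cipher itself is the easy part; deriving the right key for each object and finding which strings and streams to decrypt is where the file structure matters.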

PKCS#7 SignedData and multiple digest algorithms

I'm investigating upgrading an application from SHA1 as the default PKCS#7 SignedData digest algorithm to stronger digests such as SHA256, in ways that preserve backwards compatibility for signature verifiers which do not support digest algorithms other than SHA1. I want to check my understanding of the PKCS#7 format and available options.
What I think I want to do is digest message content with both SHA1 and SHA256 (or more generally, a set of digest algorithms) such that older applications can continue to verify via the SHA1, and upgraded applications can begin verifying via the SHA256 (more generally, the strongest digest provided), ignoring the weaker algorithm(s). [If there is a better approach, please let me know.]
It appears that within the PKCS#7 standard, the only way to provide multiple digests is to provide multiple SignerInfos, one for each digest algorithm. Unfortunately, this would seem to lead to a net decrease in security, as an attacker is able to strip all but the SignerInfo with the weakest digest algorithm, which alone will still form a valid signature. Is this understanding correct?
If so, my idea was to use custom attributes within the authenticatedAttributes field of SignerInfo to provide additional message-digests for the additional digest algorithms (leaving SHA1 as the "default" algorithm for backwards compatibility). Since this field is authenticated as a single block, this would prevent the above attack. Does this seem like a viable approach? Is there a way to accomplish this or something similar without going outside of the PKCS standard?
Yes, you are right, in the current CMS RFC it says about the message digest attribute that
The SignedAttributes in a signerInfo
MUST include only one instance of the message-digest attribute.
Similarly, the AuthAttributes in an AuthenticatedData MUST include
only one instance of the message-digest attribute.
So it is true that the only way to provide multiple message digest values using the standard signed attributes is to provide several SignerInfos.
And yes, any security system is as strong as its weakest link, so theoretically you will not gain anything by adding a SignerInfo with SHA-256 if you also still accept SHA-1 - as you said, the stronger signatures can always be stripped.
Your scheme with custom attributes is a bit harder to break - but there is still a SHA-1 hash floating around that can be attacked. It's no longer as easy as just stripping the attribute - as it's covered by the signature. But:
There is also the digest algorithm that is used to digest the signed attributes which serves as the basis of the final signature value. What do you intend to use there? SHA-256 or SHA-1? If it's SHA-1, then you will be in the same situation as before:
If I can produce collisions for SHA-1, then I would strip off your custom SHA-256 attribute and forge the SHA-1 attribute in such a way that the final SHA-1 digest for the signature adds up again. This shows that there will only be a gain in security if the signature digest algorithm would be SHA-256, too, but I'm guessing this is no option since you want to stay backwards-compatible.
What I would suggest in your situation is to keep using SHA-1 throughout but apply an RFC 3161-compliant timestamp to your signature as an unsigned attribute. Those timestamps are in fact signatures of their own. The good thing is you can use SHA-256 for the message imprint there and often the timestamp server applies its signature using the same digest algorithm you provided. Then reject any signature that either does not contain such a timestamp or contains only timestamps with message imprint/signature digest algorithms weaker than SHA-256.
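The acceptance rule for upgraded verifiers can be sketched as a small policy check. The names below are hypothetical, the algorithm names are plain strings rather than parsed OIDs, and requiring both the message imprint and the TSA signature digest to be strong is one reading of the rule.

```java
import java.util.List;
import java.util.Set;

class TimestampPolicy {
    // Digest algorithms this sketch treats as at least as strong as SHA-256.
    static final Set<String> STRONG = Set.of("SHA-256", "SHA-384", "SHA-512");

    // Digest algorithms found in one timestamp token (hypothetical holder type).
    static final class TimestampInfo {
        final String imprintDigest;
        final String signatureDigest;

        TimestampInfo(String imprintDigest, String signatureDigest) {
            this.imprintDigest = imprintDigest;
            this.signatureDigest = signatureDigest;
        }
    }

    // Accepts only if at least one timestamp uses strong digests throughout.
    static boolean accept(List<TimestampInfo> timestamps) {
        for (TimestampInfo t : timestamps) {
            if (STRONG.contains(t.imprintDigest) && STRONG.contains(t.signatureDigest)) {
                return true;
            }
        }
        return false; // no timestamp at all, or only weak ones
    }
}
```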
What's the benefit of this solution? Your legacy applications should check for the presence of an unsigned timestamp attribute and if a strong digest was used for it, but otherwise ignore them and keep on verifying the signatures the same way they did before. New applications on the other hand will verify the signature but additionally verify the timestamp, too. As the timestamp signature "covers" the signature value, there's no longer a way for an attacker to forge the signature. Although the signature uses SHA-1 for the digest values, an attacker would have to be able to break the stronger digest of the timestamp, too.
An additional benefit of a timestamp is that you can associate a date of production with your signature - you can safely claim that the signature has been produced before the time of the timestamp. So even if a signature certificate were to be revoked, with the help of the timestamp you could still precisely decide whether to reject or accept a signature based on the time that the certificate was revoked. If the certificate was revoked after the timestamp, then you can accept the signature (add a safety margin (aka "grace period") - it takes some time until the information gets published), if it was revoked prior to the time of the timestamp then you want to reject the signature.
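The revocation decision described above can be written down as a comparison of three values. This sketch uses hypothetical names and applies the grace period conservatively, by requiring the revocation to lie after the timestamp plus the margin; other ways of applying the margin are possible.

```java
import java.time.Duration;
import java.time.Instant;

class RevocationDecision {
    /**
     * Returns true if the signature can be accepted.
     * revokedAt is null when the certificate has not been revoked.
     */
    static boolean accept(Instant timestampTime, Instant revokedAt, Duration gracePeriod) {
        if (revokedAt == null) {
            return true;
        }
        // Accept only if the revocation happened after the timestamp plus the safety margin.
        return revokedAt.isAfter(timestampTime.plus(gracePeriod));
    }
}
```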
A last benefit of timestamps is that you can renew them over time if certain algorithms get weak. You could for example apply a new timestamp every 5-10 years using up-to-date algorithms and have the new timestamps cover all of the older signatures (including older timestamps). This way weak algorithms are then covered by the newer, stronger timestamp signature. Have a look at CAdES (there exists also an RFC, but it's outdated by now), which is based on CMS and makes an attempt at applying these strategies to provide for long-term archiving of CMS signatures.