Question about Apache PDFBox and PDF Certification

We are doing external remote signing with Apache PDFBox; the source code is mostly based on the official Apache PDFBox samples. We noticed some "issues" when we sign a document with multiple visible signatures. The input is a document with some signature holders. The flow is:
Unsigned doc -> sign(graphic_signature1, cert1, unsigned_doc) -> signed_doc_1 -> sign(graphic_signature2, cert2, signed_doc_1) -> signed_doc_2, ....
The result:
signed_doc_1: Adobe Acrobat says: Signature is valid, no modification
signed_doc_2 and subsequent ones: Adobe Acrobat says: The changes that have been made to this document since it was certified are permitted by Certifying party and do not invalidate the signature.
I also read this article:
https://help.adobe.com/en_US/livecycle/11.0/Services/WS92d06802c76abadb-3598a7d812dbeb3dcf3-7ff0.2.html
What I would like to ask:
Is it actually an issue? (sorry, I am just a developer, I do not know much about the policy for PDF certification)
If it is an issue, how can it be fixed?
When signing, saveIncrementalForExternalSigning is called as follows:
signatureOptions = new SignatureOptions();
signatureOptions.setVisualSignature(createVisualSignatureTemplate(doc,
signingRequest.getSignatureInfo().getPosition().getPageNumber(), rect, signature));
signatureOptions.setPage(signingRequest.getSignatureInfo().getPosition().getPageNumber());
doc.addSignature(signature, null, signatureOptions);
ExternalSigningSupport externalSigning = doc.saveIncrementalForExternalSigning(fos);
// invoke external signature service
byte[] cmsSignature = sign(externalSigning.getContent());
// set signature bytes received from the service and save the file
externalSigning.setSignature(cmsSignature);
Edited: I was able to "fix" the issue by commenting out the line of code that calls setMDPPermission(doc, signature, 2) (in the Apache PDFBox signature sample). Thanks!

Related

How to compute message-digest in a SignerInfo?

I'm trying to generate a PDF with LTV enabled. I generate a PKCS#7 object with all the necessary attributes, including the SignerInfo object. The generated signature is valid but not LTV enabled. According to the PDF reference manual I need to include validation info (CRLs or OCSP...), and based on rfc3852 this content goes in the signedAttributes object, which must contain a content-type attribute and a message-digest attribute. My question is how to compute the message-digest value, and is it necessary to sign it alongside the PDF content?
Note: the adbe-revocationInfoArchival object containing the CRLs seems to be correct, since Acrobat reads the revocation info directly from the file. The only issue I seem to have is that the message-digest included in the signedAttrs object and/or the signature value is not computed correctly. The RFC is not very clear on what that message-digest should be or whether it should be included in the digest that will be signed with the signer's private key.

After some back and forth in the comments to the question I'm still not sure what information you need, so here are a few thoughts in general on the structure and contents of non-trivial CMS signature containers for PDF signatures.
The Specifications
First off, though, some words on the specifications to use. You mention the PDF reference manual and rfc3852. Both actually are not state-of-the-art anymore, but interestingly one less than the other.
Originally the Adobe PDF References for PDF up to 1.7 were the documentation to look at. Unfortunately Adobe saw these references as not normative in nature, i.e. if the current Reference and the current Acrobat version disagreed on something, the program was correct, not the Reference!
The latest Adobe PDF Reference (for PDF 1.7) referred to RFC 2315 for the structure of the signature container.
Then Adobe transferred the authority over the PDF format to the International Organization for Standardization (ISO), which in 2008 published the first normative PDF specification, ISO 32000-1. It was very similar to the last PDF Reference in content but adopted the RFC'ish language.
ISO 32000-1 refers both to RFC 3852 and RFC 2315 for the structure of the signature container. Which is weird, but most likely the remaining RFC 2315 reference was an oversight.
In 2017 the ISO published a PDF specification for PDF 2.0, ISO 32000-2, with a number of relevant changes, also in the context of signing.
ISO 32000-2 refers to RFC 5652 for the structure of the signature container for adbe.pkcs7.detached signatures and to ETSI EN 319 122 for the structure of the signature container for ETSI.CAdES.detached signatures.
In 2020 the ISO updated ISO 32000-2 with a number of clarifications; the references for the signature container specification remained the same.
Thus, currently you should look at ISO 32000-2:2020 and RFC 5652.
CMS Signature Containers
In a late comment you say
I want to know how do i add these attributes to the final digest to sign. I'm using SHA to digest the pdf, then sign it with the rsa private key and build the pkcs7 structure including the certificate chain, the signed message and a timestamp as an unsigned attribute.
This procedure can only create simple signature containers without signed attributes as only in these simple containers the document hash is signed directly. But the adbe-revocationInfoArchival attribute you want to add must be a signed attribute, and as soon as signed attributes are involved, the document hash value is not signed directly anymore.
The CMS signature container contains a SignedData object with exactly one SignerInfo object. That SignerInfo object is defined as
SignerInfo ::= SEQUENCE {
  version CMSVersion,
  sid SignerIdentifier,
  digestAlgorithm DigestAlgorithmIdentifier,
  signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
  signatureAlgorithm SignatureAlgorithmIdentifier,
  signature SignatureValue,
  unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }
(RFC 5652 section 5.3. "SignerInfo Type")
In a signature container created by your working code the OPTIONAL signedAttrs are absent and the signature value is calculated immediately for the document hash.
As soon as there are signed attributes, though, the OPTIONAL signedAttrs is not absent anymore, instead it is a SET of Attribute instances including at least
a content-type attribute with id-data as value,
a message-digest attribute with the digest value of the to-be-signed PDF byte ranges as value,
and in your case an adbe-revocationInfoArchival attribute with the revocation information as value.
In this case the signature value is not calculated immediately for the document hash anymore but instead for the hash value of the signedAttrs!
To be more exact, it is calculated for the hash value of the complete DER encoding thereof, and not with the IMPLICIT [0] tag but with an EXPLICIT SET OF tag.
Thus, after using SHA to digest the PDF, instead of signing that digest with the RSA private key and building the PKCS#7 structure, you proceed by
building the set of signed attributes with at least the attribute entries enumerated above, DER encoding that set and hashing it,
signing that hash value of the signed attributes with your private key, and
building the CMS signature container structure with these signed attributes and this signature value, and also with the certificate chain.
Additionally you may add a signature time stamp.
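The tag switch described above (the IMPLICIT [0] tag as stored versus the SET OF tag used for hashing) can be sketched in plain Java. This is only a minimal illustration under assumptions: the class and method names are hypothetical, the input is assumed to be the already DER-encoded signedAttrs field exactly as it sits inside the SignerInfo (first byte 0xA0), and the actual private-key operation is left out.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any library mentioned in this thread.
final class SignedAttrsDigest {

    // Returns a copy of the signedAttrs encoding in which the IMPLICIT [0] tag (0xA0)
    // is replaced by the universal SET OF tag (0x31). Length and content bytes are unchanged.
    static byte[] toBeSignedEncoding(byte[] signedAttrsAsStored) {
        if (signedAttrsAsStored.length == 0 || (signedAttrsAsStored[0] & 0xFF) != 0xA0) {
            throw new IllegalArgumentException("expected a signedAttrs field tagged [0] (0xA0)");
        }
        byte[] copy = signedAttrsAsStored.clone();
        copy[0] = 0x31;
        return copy;
    }

    // Hashes the SET OF encoding; this hash, not the document hash, is what gets signed.
    static byte[] digestForSigning(byte[] signedAttrsAsStored, String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).digest(toBeSignedEncoding(signedAttrsAsStored));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("digest algorithm not available: " + algorithm, e);
        }
    }
}
```

For example, SignedAttrsDigest.digestForSigning(storedAttrs, "SHA-256") would yield the value to sign with the private key, while the document hash itself only appears inside the message-digest attribute.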

Digital Signature With TSA Timestamping adding certificates in chain in TSA response PDF Box giving error “not enough space to write signature”

I have created a digital signature and timestamped it via a TSA. I added certificates to the TSA response to build the chain; this works fine and the signature is created, but when embedding this signature in the PDF using the PDFBox API for Java I get the error "not enough space to write signature". Is there any configuration available in PDFBox to handle the signature size?
Any help would be appreciated.

I assume you're using an embedded timestamp as in the CreateEmbeddedTimeStamp.java example, so you're using the space of the existing signature. That one is fixed, so you need to make it large enough:
signatureOptions.setPreferredSignatureSize(...);
with a number higher than the default (0x2500). The SignatureOptions object can be passed in the document.addSignature() call.

Creating PAdES signature

I am trying to create a PAdES signature using the following workflow:
PDF is prepared for signing and hash is calculated in the browser
hash is sent to the backend
detached CAdES signature is formed on the backend
detached CAdES is sent back to the browser where PAdES signature is assembled
We have a working example of PDF signature that works like this:
PDF is prepared and hash is calculated in the browser
hash is sent to the backend
detached PKCS7 signature is made on the backend (by using BouncyCastle lib)
detached PKCS7 is sent back to the browser where PDF signature is assembled
This is working fine.
However, now instead of BouncyCastle we are using DSS library on the backend because we are trying to create a PAdES signature. So, DSS lib is creating detached CAdES (which should be the same as detached PAdES) instead of PKCS7. However, when the signature is assembled in the browser the signature is invalid (even the certificate info isn't visible).
From my understanding CAdES is an extension to PKCS7 so this approach should work.
I'm first trying to understand if something's wrong with our approach and if not, I'll try to share the code we're using to make a detached CAdES signature to see if something's wrong there.

I figured it out. The detached CAdES signature is more than twice as big as the detached PKCS7 signature, so we weren't leaving enough space for the signature to fit in, and the signature was basically overwriting the PDF content. When I increased the space reserved for the signature, everything worked as it should.
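A guard of the following kind would have surfaced the problem instead of silently overwriting PDF content. It is a minimal sketch with hypothetical names, assuming the placeholder is a hex string of twice the reserved byte count that is padded with zeros, as is common for the signature /Contents entry; it is not the code used in this thread.

```java
// Hypothetical helper for filling a fixed-size signature placeholder.
final class SignaturePlaceholder {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hex-encodes the signature container and pads it with '0' up to the reserved size.
    // Refuses to proceed when the container is larger than the space reserved for it.
    static String fillContents(byte[] cmsContainer, int reservedBytes) {
        if (cmsContainer.length > reservedBytes) {
            throw new IllegalArgumentException("signature needs " + cmsContainer.length
                    + " bytes but only " + reservedBytes + " were reserved");
        }
        StringBuilder hex = new StringBuilder(reservedBytes * 2);
        for (byte b : cmsContainer) {
            hex.append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
        }
        while (hex.length() < reservedBytes * 2) {
            hex.append('0');
        }
        return hex.toString();
    }
}
```

Failing loudly here makes it obvious that the reserved size has to grow when switching to a larger container format.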

ITextSharp Document Signing showing invalid

I've been working on signing PDF documents lately, and today I came across a new and wonderful problem. When I sign the document (it is actually signed on a server) and open it on my machine, the signature shows as valid and LTV enabled, so it works pretty much as expected. But when I open the same document on my boss's computer it shows that the identity of the signing could not be verified even after the certificate was trusted, but if I open the certificate properties it says that the certificate is valid and revocation was performed successfully. What could be the cause of this?
Figure 1: The certificate itself is trusted.
Figure 2: Intermediary certificate is trusted.
Figure 3: Root certificate trusted
Signed Document: https://drive.google.com/file/d/0B9RyqgJoa6W8WlBLemVETXJRU0U/view?usp=sharing
Another weird thing, for the timestamping signature, when I add the Root certificate as a trusted root certificate to adobe, it says that LTV is not enabled, but if I add the GlobalTrustFinder certificate itself as a trusted certificate it says that LTV is enabled. Any reason that it would do that?
Any help would really be appreciated
Code for adding LTV to the existing signature blocks as well as add the timestamping signature:
private void SignDocumentSigningBlockAddLTVVerification(PdfStamper stamper, Certificate certificate)
{
    LtvVerification ltvVerification = stamper.LtvVerification;
    List<string> signatureFieldNames = stamper.AcroFields.GetSignatureNames();
    ITSAClient tsaClient = new TSAClientBouncyCastle(_settingManager["DocumentSigningTimestampingServiceAddress"], String.Empty, String.Empty, Int32.Parse(_settingManager["DocumentSigningEstimatedTimestampSize"]), _settingManager["DocumentSigningEncryptionHashAlgorithm"]);
    IOcspClient ocspClient = new OcspClientBouncyCastle();
    ICrlClient crlClient = new CrlClientOnline(SignDocumentSigningBlockBuildChain(new X509Certificate2(certificate.Bytes, certificate.Password, X509KeyStorageFlags.Exportable)).ToList());
    PdfPKCS7 pkcs7 = stamper.AcroFields.VerifySignature(signatureFieldNames.Last());
    if (pkcs7.IsTsp)
    {
        ltvVerification.AddVerification(signatureFieldNames.Last(), ocspClient, crlClient, LtvVerification.CertificateOption.SIGNING_CERTIFICATE, LtvVerification.Level.OCSP_CRL, LtvVerification.CertificateInclusion.NO);
    }
    else
    {
        foreach (string name in stamper.AcroFields.GetSignatureNames())
        {
            ltvVerification.AddVerification(name, ocspClient, crlClient, LtvVerification.CertificateOption.WHOLE_CHAIN, LtvVerification.Level.OCSP_CRL, LtvVerification.CertificateInclusion.NO);
        }
    }
    ltvVerification.Merge();
    PdfSignatureAppearance appearance = stamper.SignatureAppearance;
    LtvTimestamp.Timestamp(appearance, tsaClient, null);
}
Kind regards

Revocation could not be checked
But when I open the same document on my boss's computer it shows that the identity of the signing could not be verified even after the certificate was trusted, but if I open the certificate properties it says that the certificate is valid and revocation was performed successfully. What could be the cause of this?
As I cannot check the computer of your boss, this is bound to be guesswork. So...
Giving it a shot
You say the certificate was trusted but there are many certificates involved. Thus, which one have you trusted? According to your screenshot
you have not trusted the user certificate (if you had, that certificate would not have been checked against a CRL but instead it would have been trusted without further tests).
Probably you have trusted the Root CA certificate on the computer of your boss. If you have, not only the end user certificate has to be checked for revocation but also the intermediary CA certificate (SAPO Class 4 CA)! Probably that is where your problem arises.
Thus, when you are on the revocation tab, please select the intermediary certificate and see whether revocation checks for it could be performed. If not, that's the problem you ran into.
PS: It's not so
The OP tested the idea mentioned above but
So I went back and checked all the certificates, and it seems that the CRL check is performed as expected.
Thus, it is not as easy as I thought above.
the difference in setup was that i trusted the certificate that I signed with as the root certificate hence was showing up as successfully signed on my machine, I've now set up my security settings so that I have the same results as on my boss's computer.
Ok, so the difference between the results on those two computers is explained.
To determine whether the problem is in determining the revocation status of the CA certificate or your user certificate, could you please re-configure your settings yet again and trust the CA certificate?
If after this change the problem remains, the issue most likely is linked to your user certificate and checking its revocation status. If it vanishes, the issue most likely is linked with checking the revocation status of the CA certificate.
I trusted the CA now, it still has the same problem, but if I trust the user cert marking the "Use this certificate as a trusted root" and "Certified Documents" checkboxes, the signature is shown as valid as well as LTV enabled, but only if I mark it as the root. In all other cases the signature has the warning stating that the revocation checks were not performed.
If you immediately trust your user certificate, the signature is verified positively without further ado. After all, you do trust that certificate...
Putting that aside, there already is a problem when checking the user certificate. I'll try and investigate some more.
PPS: So that is why...
If all else fails, start inspecting the signature, or so I thought after I could not find anything really bad in the certificate. I opened the signature in an ASN.1 dump utility and...
...
<30 42>
2174 66: SEQUENCE {
<06 09>
2176 9: OBJECT IDENTIFIER '1 2 840 113583 1 1 8'
<31 35>
2187 53: SET {
<30 33>
2189 51: SEQUENCE {
<A0 31>
2191 49: [0] {
<30 2F>
2193 47: SEQUENCE {
<2D 2D>
2195 45: Unknown (Reserved) {
<2D 2D>
2197 45: Unknown (Reserved) {
<2D 42>
2199 66: Unknown (Reserved) {
<45 47>
2201 71: [APPLICATION 5]
: 'IN X509 CRL-----.MIIfZzCCHU8CAQEwDQYJKo0...*.H..'
: '............0...Ix<..N+'
: Error: IA5String contains illegal character(s).
Error: Inconsistent object length, 7 bytes difference.
: }
Error: Inconsistent object length, 30 bytes difference.
: }
Error: Inconsistent object length, 32 bytes difference.
: }
Error: Inconsistent object length, 32 bytes difference.
: }
Error: Inconsistent object length, 32 bytes difference.
: }
Error: Inconsistent object length, 32 bytes difference.
: }
Error: Inconsistent object length, 32 bytes difference.
: }
Error: Inconsistent object length, 32 bytes difference.
: }
Error: Inconsistent object length, 32 bytes difference.
: }
...
So there are syntactical errors in the signature itself, more exactly inside the PDF signature certificate revocation information attribute (OID 1 2 840 113583 1 1 8).
(Looking at that region with a different viewer, one sees that there is a text(!) "-----BEGIN X509 CRL-----" - There seems to be a textual CRL representation which is invalid here.)
Adobe Reader, therefore, when trying to check for certificate revocation during signature verification, inspects the signature attribute which can be used to add revocation information already during signing, finds a broken attribute value and seems to stop the verification check with a failure...
The code used for filling the certificate details window seems to be implemented separately.
Why that?
One might wonder how a textual CRL representation happens to be included in the PDF signature certificate revocation information attribute of your signature.
I looked into that embedded "CRL". It is in PEM format and cut off after 47 bytes. So I accessed the CRL at the URL given in the certificate (https://pki.trustcentre.co.za/crl/sapo_c4ca.crl) and I indeed retrieved a (complete) CRL in PEM format, which is textual. The signature creation code seems to have tried to include it as is, but it got cut off.
Is the PKI wrong in supplying that CRL in PEM format instead of in DER format? Or is the signature creation code wrong in assuming to find a DER encoded CRL at the position pointed to?
The current specification for the Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile (RFC 5280) specifies:
When the HTTP or FTP URI scheme is used, the URI MUST point to a single DER encoded CRL as specified in [RFC2585].
(Section 4.2.1.13. CRL Distribution Points)
But it does not say anything equivalent about the HTTPS URI scheme!
Because the PKI employs the HTTPS scheme, therefore, the PKI might be considered to conform to the letter of the RFC.
Even though the HTTPS scheme essentially is merely the secured variant of the HTTP scheme, and so the PKI definitely and unnecessarily violates the spirit of the RFC, it seemingly does not violate the letter of it.
Thus, CRL retrieving classes (like iText's CrlClientOnline) had better check the format of the retrieved CRL and transform it if necessary.
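Such a format check could look roughly like the following Java sketch. It is not iText code: the class and method names are hypothetical, and it assumes a PEM CRL uses the "-----BEGIN X509 CRL-----" armor seen in this signature, while anything starting with the DER SEQUENCE tag 0x30 is taken to be DER already.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that normalizes a downloaded CRL to DER before embedding it.
final class CrlNormalizer {

    private static final String BEGIN = "-----BEGIN X509 CRL-----";
    private static final String END = "-----END X509 CRL-----";

    static byte[] toDer(byte[] downloaded) {
        // A DER encoded CRL is a SEQUENCE, so its first byte is 0x30.
        if (downloaded.length > 0 && downloaded[0] == 0x30) {
            return downloaded;
        }
        String text = new String(downloaded, StandardCharsets.ISO_8859_1);
        int begin = text.indexOf(BEGIN);
        int end = text.indexOf(END);
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("neither a DER nor a PEM encoded CRL");
        }
        // The MIME decoder skips line breaks inside the Base64 body.
        String base64 = text.substring(begin + BEGIN.length(), end);
        return Base64.getMimeDecoder().decode(base64);
    }
}
```

Calling toDer on the bytes fetched from the CRL distribution point before adding them to the revocation information attribute would avoid embedding the textual form.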
LTV-enabled or not LTV-enabled
An explanation
Another weird thing, for the timestamping signature, when I add the Root certificate as a trusted root certificate to adobe, it says that LTV is not enabled, but if I add the GlobalTrustFinder certificate itself as a trusted certificate it says that LTV is enabled. Any reason that it would do that?
This is pretty obvious:
If you trust the root certificate, revocation has to be checked for the intermediary CA and the end entity certificates. As there are no CRLs or OCSP responses for them embedded in your document, your time stamp is not LTV-enabled.
If on the other hand you explicitly trust the end entity certificate, the very certificate that created the timestamp, the Adobe validator immediately trusts the time stamp. No need to check for revocation of a certificate explicitly trusted... Thus, in this case the time stamp is LTV enabled.
(The term "LTV enabled" is an Adobe term with a very variable value. For background on this term and other LTV-related terminology, cf. the section Background in this answer and the other answers referenced from there.)
Follow-up questions
For the timestamping I'm a bit confused, I followed the code examples in the post you recommended, and the code I've used was a variation of that but in principle the same, why would the CRL and the OCSP responses not be embedded?
The big difference is that the code in my referenced answer does not apply a document time stamp, it merely adds validation related information (certificates, CRLs, OCSP responses).
Adobe's proprietary term "LTV-enabled" refers to a document signature or document time stamp for which "all information required for verification of all relevant signatures (including those signing the CRLs, OCSP responses and time stamps used in the verification process) in the context of the security settings of the validating Adobe Reader are contained in the PDF". Your code, after adding the validation related information, adds a document time stamp, so the information required for validating that time stamp is in general not yet available in the PDF.
ETSI's standardized term "PDF document with LTV" refers to a signed PDF with validation related information (e.g. an "LTV-enabled" PDF) plus a final document time stamp.
The iText sample code you used attempts to create an ETSI "PDF document with LTV" and, therefore, has to be tweaked to instead attempt to create a PDF where all document signatures and time stamps are Adobe "LTV-enabled".
As for adding the crls and ocsps, as far as I understood that for adding LTV, you have to add the validation to all signatures in the document and the final document time stamp, as I did in the code I provided, is this not the case?
There are different use cases in the context of LTV, and depending on one's very use case, different information may have to be added.

Extract and recomprise PDF file using Origami

This is regarding Origami, the Ruby tool for exploring PDF files at http://esec-lab.sogeti.com/pages/Origami
By way of example I am trying to open a PDF file, extract it and then rewrite the original PDF. This is the complete code I am trying to use to accomplish this:
hg clone https://code.google.com/p/origami-pdf/
cd origami-pdf/
rake
cd ..
curl 'http://www.ada.gov/hospcombrprt.pdf' -o hospcombrprt.pdf
origami-pdf/bin/pdf2ruby -x hospcombrprt.pdf
mv hospcombrprt.pdf hospcombrprtORIG.pdf
cd hospcombrprt
ruby hospcombrprt.rb # THIS STEP PRODUCES ERRORS
bc hospcombrprt.pdf ../hospcombrprtORIG.pdf || echo FAILED
However this produces the following error:
/Users/williamentriken/Developer/origami-pdf/lib/origami/page.rb:75:in `pages': Invalid page tree (Origami::InvalidPDFError)
from /Users/williamentriken/Developer/origami-pdf/lib/origami/pdf.rb:689:in `compile'
from /Users/williamentriken/Developer/origami-pdf/lib/origami/pdf.rb:233:in `save'
from hospcombrprt.rb:189:in `<main>'
Has anyone else had success in performing this operation using this library and could you please share?

Original Post:
I played around with the library for a while, but I kept getting errors and minor bugs, such as replicated pages and missing pages...
...you should read the author's comment about the limits of using the Origami library.
I recommend the combine_pdf gem, it's great for simple pdf manipulations, such as merging, stamping and the like.
update:
I looked at the specific PDF file and it might be an issue related to an unsupported PDF version.
The http://www.ada.gov/hospcombrprt.pdf file is encrypted with a type 4 encryption, which according to the PDF standard, starting with PDF 1.5, is:
"(PDF 1.5) The security handler defines the use of encryption and decryption in the document, using the rules specified by the CF, StmF, and StrF entries."
The encryption uses AES v.2, which is limited to PDF 1.6 and above:
"AESV2 (PDF 1.6) The application shall ask the security handler for the encryption key and shall implicitly decrypt data with "Algorithm 1: Encryption of data using the RC4 or AES algorithms", using the AES algorithm in Cipher Block Chaining (CBC) mode with a 16-byte block size and an initialization vector that shall be randomly generated and placed as the first 16 bytes in the stream or string."
So, even if the decryption code is written in, the way to apply that code might not be known due to the way the PDF file is structured...
...It might be better to start with simple PDF files and then patch anything that isn't supported just yet.