In Itext 7, how to sign a pdf with 2 steps? - pdf

Following the answers given in this previous question : In Itext 7, how to get the range stream to sign a pdf?, i've tried to reimplement the two steps signing method working in Itext 5 but i encounter an issue when trying to reopen the document result of the first step (with the PdfReader or a pdf reader).(invalid document)
Here is the presigning part for a document already containing an empty signature field named certification ... why is the result of this step invalid ?
PdfReader reader = new PdfReader(fis);
Path signfile = Files.createTempFile("sign", ".pdf");
FileOutputStream os = new FileOutputStream(signfile.toFile());
PdfSigner signer = new PdfSigner(reader, os, false);
signer.setFieldName("certification"); // this field already exists
signer.setCertificationLevel(PdfSigner.CERTIFIED_FORM_FILLING);
PdfSignatureAppearance sap = signer.getSignatureAppearance();
sap.setReason("Certification of the document");
sap.setLocation("On server");
sap.setCertificate(maincertificate);
BouncyCastleDigest digest = new BouncyCastleDigest();
PdfPKCS7 sgn = new PdfPKCS7(null, chain, hashAlgorithm, null, digest,false);
//IExternalSignatureContainer like BlankContainer
PreSignatureContainer external = new PreSignatureContainer(PdfName.Adobe_PPKLite,PdfName.Adbe_pkcs7_detached);
signer.signExternalContainer(external, 8192);
byte[] hash=external.getHash();
byte[] sh = sgn.getAuthenticatedAttributeBytes(hash, null, null,PdfSigner.CryptoStandard.CMS);// sh will be sent for signature
And here is the PreSignatureContainer class :
public class PreSignatureContainer implements IExternalSignatureContainer {
private PdfDictionary sigDic;
private byte hash[];
public PreSignatureContainer(PdfName filter, PdfName subFilter) {
sigDic = new PdfDictionary();
sigDic.put(PdfName.Filter, filter);
sigDic.put(PdfName.SubFilter, subFilter);
}
#Override
public byte[] sign(InputStream data) throws GeneralSecurityException {
String hashAlgorithm = "SHA256";
BouncyCastleDigest digest = new BouncyCastleDigest();
try {
this.hash= DigestAlgorithms.digest(data, digest.getMessageDigest(hashAlgorithm));
} catch (IOException e) {
throw new GeneralSecurityException("PreSignatureContainer signing exception",e);
}
return new byte[0];
}
#Override
public void modifySigningDictionary(PdfDictionary signDic) {
signDic.putAll(sigDic);
}
public byte[] getHash() {
return hash;
}
public void setHash(byte hash[]) {
this.hash = hash;
}
}

why is the result of this step invalid
Because you essentially discovered a bug... ;)
Your sample input file has one feature which triggers the bug: It is compressed using object streams.
When iText manipulates such a file, it also tries to put as many objects as possible into object streams. Unfortunately it also does so with the signature dictionary. This is unfortunate because after writing the whole file it tries to enter some information (which are not available before) into this dictionary which damages the compressed object stream.
What you can do...
You can either
wait for iText development to fix this issue - I assume this won't take too long but probably you don't have the time to wait; or
convert the file to sign into a form which does not use object streams - this can be done using iText itself but probably you cannot accept the file growth this means, or probably the files already are signed which forbids any such transformation; or
patch iText 7 to force the signature dictionary not to be added to an object stream - it is a trivial patch but you probably don't want to used patched libraries.
The patch mentioned above indeed is trivial, the method PdfSigner.preClose(Map<PdfName, Integer>) contains this code:
if (certificationLevel > 0) {
// add DocMDP entry to root
PdfDictionary docmdp = new PdfDictionary();
docmdp.put(PdfName.DocMDP, cryptoDictionary.getPdfObject());
document.getCatalog().put(PdfName.Perms, docmdp); // TODO: setModified?
}
document.close();
The cryptoDictionary.getPdfObject()) is the signature dictionary I mentioned above. During document.close() it is added to an object stream unless it has been written to the output before. Thus, you simply have to add a call to flush that object right before that close call and by parameter make clear that it shall not be added to an object stream:
cryptoDictionary.getPdfObject().flush(false);
With that patch in place, the PDFs your code returns are not damaged as above anymore.
As an aside, iText 5 does contain a similar line in the corresponding PdfSignatureAppearance.preClose(HashMap<PdfName, Integer>) right above the if block corresponding to the if block above. It seems to have been lost during refactoring to iText 7.

Related

Pdf signature invalidates existing signature in Acrobat Reader

I'm using iText 7.1.15 and SignDeferred to apply signatures to pdf documents.
SignDeferred is required since the signature is created PKCS11 hardware token (usb key).
When i sign a "regular" pdf, e.g. created via word, i can apply multiple signatures and all signatures are shown as valid in the adobe acrobat reader.
If the pdf was created by combining multiple pdf documents with adobe DC, the first signature is valid but becomes invalid as soon as the seconds signature is applied.
Document in Adobe reader after the first signature is applied:
Document in Adobe reader after the second signature is applied:
The signatures of the same document are shown as valid in foxit reader.
I've found a similar issue on stackoverflow (multiple signatures invalidate first signature in iTextSharp pdf signing), but it was using iText 5 and i'm not sure it is the same problem.
Question: What can i do in order to keep both signatures valid in the Acrobat Reader?
Unsigned Pdf document on which the first signature becomes invalid:
https://github.com/suntsu42/iTextDemoInvalidSecondSignature/blob/master/test.pdf
Twice signed document which is invalid:
https://github.com/suntsu42/iTextDemoInvalidSecondSignature/blob/master/InvalidDocumentSignedTwice.pdf
Code used for signing
//Step #1 >> prepare pdf for signing (Allocate space for the signature and calculate hash)
using (MemoryStream input = new MemoryStream(pdfToSign))
{
using (var reader = new PdfReader(input))
{
StampingProperties sp = new StampingProperties();
sp.UseAppendMode();
using (MemoryStream baos = new MemoryStream())
{
var signer = new PdfSigner(reader, baos, sp);
//Has to be NOT_CERTIFIED since otherwiese a pdf cannot be signed multiple times
signer.SetCertificationLevel(PdfSigner.NOT_CERTIFIED);
if (visualRepresentation != null)
{
try
{
PdfSignatureAppearance appearance = signer.GetSignatureAppearance();
base.SetPdfSignatureAppearance(appearance, visualRepresentation);
}
catch (Exception ex)
{
throw new Exception("Unable to set provided signature image", ex);
}
}
//Make the SignatureAttributeName unique
SignatureAttributeName = $"SignatureAttributeName_{DateTime.Now:yyyyMMddTHHmmss}";
signer.SetFieldName(SignatureAttributeName);
DigestCalcBlankSigner external = new DigestCalcBlankSigner(PdfName.Adobe_PPKLite, PdfName.Adbe_pkcs7_detached);
signer.SignExternalContainer(external, EstimateSize);
hash = external.GetDocBytesHash();
tmpPdf = baos.ToArray();
}
}
//Step #2 >> Create the signature based on the document hash
// This is the part which accesses the HSM via PCKS11
byte[] signature = null;
if (LocalSigningCertificate == null)
{
signature = CreatePKCS7SignatureViaPKCS11(hash, pin);
}
else
{
signature = CreatePKCS7SignatureViaX509Certificate(hash);
}
//Step #3 >> Apply the signature to the document
ReadySignatureSigner extSigContainer = new ReadySignatureSigner(signature);
using (MemoryStream preparedPdfStream = new MemoryStream(tmpPdf))
{
using (var pdfReader = new PdfReader(preparedPdfStream))
{
using (PdfDocument docToSign = new PdfDocument(pdfReader))
{
using (MemoryStream outStream = new MemoryStream())
{
PdfSigner.SignDeferred(docToSign, SignatureAttributeName, outStream, extSigContainer);
return outStream.ToArray();
}
}
}
}
}
Sample project
I've created a working sample project which uses a local certificate for signing. I also did update iText to version 7.2 but with the same result.
It also contains the document which cannot be signed twice (test.pdf)
https://github.com/suntsu42/iTextDemoInvalidSecondSignature/tree/master
Edit
I've applied the solution provided by MKL to the sample project on github.
As a second note, It is also possible to use PdfSigner but in this case, the bookmarks of the original document must be removed.
As already mentioned in a comment, the example document "InvalidDocumentSignedTwice.pdf" has the signature not applied in an incremental update, so here it is obvious that former signatures will break. But this is not the issue of the OP's example project. Thus, the issue is processed with an eye on the actual outputs of the example project.
Analyzing the Issue
When validating signed PDFs Adobe Acrobat executes two types of checks:
It checks the signature itself and whether the revision of the PDF it covers is untouched.
(If there are additions to the PDF after the revision covered by the signature:) It checks whether changes applied in incremental updates only consist of allowed changes.
The former check is pretty stable and standard but the second one is very whimsical and prone to incorrect negative validation results. Like in your case...
In case of your example document one can simply determine that the first check must positively validate the first signature: The file with only one (valid!) signature constitutes a byte-wise starting piece of the file with two signatures, so nothing can have been broken here.
Thus, the second type of check, the fickle type, must go wrong in the case at hand.
To find out what change one has to analyze the changes done during signing. A helpful fact is that doing the same using iText 5 does not produce the issue; thus, the change that triggered the check must be in what iText 7 does differently than iText 5 here. And the main difference in this context is that iText 7 has a more thorough tagging support than iText 5 and, therefore, also adds a reference to the new signature field to the document structure tree.
This by itself does not yet trigger the whimsical check, though, it only does so here because one outline element refers to the parent structure tree element of the change as its structure element (SE). Apparently Adobe Acrobat considers the change in the associated structure element as a change of the outline link and, therefore, as a (disallowed) change of the behavior of the document revision signed by the first signature.
So is this an iText error (adding entries to the structure tree) or an Adobe Acrobat error (complaining about the additions)? Well, in a tagged PDF (and your PDF has the corresponding Marked entry set to true) the content including annotations and form fields is expected to be tagged. Thus, addition of structure tree entries for the newly added signature field and its appearance not only should be allowed but actually recommended or even required! So this appears to be an error of Adobe Acrobat.
A Work-Around
Knowing that this appears to be an Adobe Acrobat bug is all well and good, but at the end of the day one might need a way now to sign such documents multiple times without current Adobe Acrobat calling that invalid.
It is possible to make iText believe there is no structure tree and no need to update a structure tree. This can be done by making the initialization of the document tag structure fail. For this we override the PdfDocument method TryInitTagStructure. As the iText PdfSigner creates its document object internally, we do this in an override of the PdfSigner method InitDocument.
I.e. instead of PdfSigner we use the class MySigner defined like this:
public class MySigner : PdfSigner
{
public MySigner(PdfReader reader, Stream outputStream, StampingProperties properties) : base(reader, outputStream, properties)
{
}
override protected PdfDocument InitDocument(PdfReader reader, PdfWriter writer, StampingProperties properties)
{
return new MyDocument(reader, writer, properties);
}
}
public class MyDocument : PdfDocument
{
public MyDocument(PdfReader reader, PdfWriter writer, StampingProperties properties) : base(reader, writer, properties)
{
}
override protected void TryInitTagStructure(PdfDictionary str)
{
structTreeRoot = null;
structParentIndex = -1;
}
}
Using MySigner for signing documents iText won't add tagging anymore and so won't make Adobe Acrobat complain about new entries in the structure tree.
The Same Work-Around in Java
As I feel more comfortable working in Java, I analyzed this and tested the work-around in Java.
Here this can be put into a more closed form (maybe it also can for C#, I don't know), instead of initializing the signer like this
PdfSigner pdfSigner = new PdfSigner(pdfReader, os, new StampingProperties().useAppendMode());
we do it like this:
PdfSigner pdfSigner = new PdfSigner(pdfReader, os, new StampingProperties().useAppendMode()) {
#Override
protected PdfDocument initDocument(PdfReader reader, PdfWriter writer, StampingProperties properties) {
return new PdfDocument(reader, writer, properties) {
#Override
protected void tryInitTagStructure(PdfDictionary str) {
structTreeRoot = null;
structParentIndex = -1;
}
};
}
};
(MultipleSignaturesAndTagging test testSignTestManuelTwiceNoTag)
TL;DR
iText 7 adds a structure element for the new signature field to the document structure tree during signing. If the parent node of this new node is referenced as associated structure element of an outline element, though, Adobe Acrobat incorrectly considers this a disallowed change. As a work-around one can tweak iText signing to not add structure elements.
I've got a similar problem with signatures after last update of Adobe Reader. I wrote a post on their community, but they still didn't answer me.
Take a look:
https://community.adobe.com/t5/acrobat-reader-discussions/invalid-signatures-after-adobe-reader-update-2022-001-20085/td-p/12892048
I am using a iText v.5.5.5 to generate pdf. I sign and certify pdf document in single method. Moreover, foxit reader shows that signatures are valid. I believe that this is an Adobe bug and it will be fixed soon :) Key are an log.

itextsharp signing pdf with signed hash

I'm trying to sign a pdf through a signing service. This service requires to send a hex encoded SHA256 digest and in return I receive a hex encoded signatureValue. Besides that I also receive a signing certificate, intermediate certificate, OCSP response, and TimeStampToken. However, I already get stuck trying to sign the pdf with the signatureValue.
I have read Bruno's white paper, browsed the internet excessively, and tried many different ways, but the signature keeps coming up as invalid.
My latest attempt:
First, prepare pdf
PdfReader reader = new PdfReader(src);
FileStream os = new FileStream(dest, FileMode.Create);
PdfStamper stamper = PdfStamper.CreateSignature(reader, os, '\0');
PdfSignatureAppearance appearance = stamper.SignatureAppearance;
appearance.Certificate = signingCertificate;
IExternalSignatureContainer external = new ExternalBlankSignatureContainer(PdfName.ADOBE_PPKLITE, PdfName.ADBE_PKCS7_DETACHED);
MakeSignature.SignExternalContainer(appearance, external, 8192);
string hashAlgorithm = "SHA-256";
PdfPKCS7 sgn = new PdfPKCS7(null, chain, hashAlgorithm, false);
PdfSignatureAppearance appearance2 = stamper.SignatureAppearance;
Stream stream = appearance2.GetRangeStream();
byte[] hash = DigestAlgorithms.Digest(stream, hashAlgorithm);
byte[] sh = sgn.getAuthenticatedAttributeBytes(hash, null, null, CryptoStandard.CMS);
Hash byte[] sh and convert to string as follows
private static String sha256_hash(Byte[] value)
{
using (SHA256 hash = SHA256.Create())
{
return String.Concat(hash.ComputeHash(value).Select(item => item.ToString("x2"))).ToUpper();
}
}
and send to signing service. The received hex encoded signatureValue I then convert to bytes
private static byte[] StringToByteArray(string hex)
{
return Enumerable.Range(0, hex.Length).Where(x => x % 2 == 0).Select(x => Convert.ToByte(hex.Substring(x, 2), 16)).ToArray();
}
Finally, create signature
private void CreateSignature(string src, string dest, byte[] sig)
{
PdfReader reader = new PdfReader(src); // src is now prepared pdf
FileStream os = new FileStream(dest, FileMode.Create);
IExternalSignatureContainer external = new MyExternalSignatureContainer(sig);
MakeSignature.SignDeferred(reader, "Signature1", os, external);
reader.Close();
os.Close();
}
private class MyExternalSignatureContainer : IExternalSignatureContainer
{
protected byte[] sig;
public MyExternalSignatureContainer(byte[] sig)
{
this.sig = sig;
}
public byte[] Sign(Stream s)
{
return sig;
}
public void ModifySigningDictionary(PdfDictionary signDic) { }
}
What am I doing wrong? Help is very much appreciated. Thanks!
Edit: Current state
Thanks to help from mkl and following Bruno's deferred signing example I've gotten past the invalid signature message. Apparently I don't receive a full chain from the signing service, but just an intermediate certificate, which caused the invalid message. Unfortunately, the signature still has flaws.
I build the chain like this:
List<X509Certificate> certificateChain = new List<X509Certificate>
{
signingCertificate,
intermediateCertificate
};
In the sign method of MyExternalSignatureContainer I now construct and return the signature container:
public byte[] Sign(Stream s)
{
string hashAlgorithm = "SHA-256";
PdfPKCS7 sgn = new PdfPKCS7(null, chain, hashAlgorithm, false);
byte[] ocspResponse = Convert.FromBase64String("Base64 encoded DER representation of the OCSP response received from signing service");
byte[] hash = DigestAlgorithms.Digest(s, hashAlgorithm);
byte[] sh = sgn.getAuthenticatedAttributeBytes(hash, ocspResponse, null, CryptoStandard.CMS);
string messageDigest = Sha256_hash(sh);
// messageDigest sent to signing service
byte[] signatureAsByte = StringToByteArray("Hex encoded SignatureValue received from signing service");
sgn.SetExternalDigest(signatureAsByte, null, "RSA");
ITSAClient tsaClient = new MyITSAClient();
return sgn.GetEncodedPKCS7(hash, tsaClient, ocspResponse, null, CryptoStandard.CMS);
}
public class MyITSAClient : ITSAClient
{
public int GetTokenSizeEstimate()
{
return 0;
}
public IDigest GetMessageDigest()
{
return new Sha256Digest();
}
public byte[] GetTimeStampToken(byte[] imprint)
{
string hashedImprint = HexEncode(imprint);
// Hex encoded Imprint sent to signing service
return Convert.FromBase64String("Base64 encoded DER representation of TimeStampToken received from signing service");
}
}
Still get these messages:
"The signer's identity is unknown because it has not been included in the list of trusted identities and none or its parent
certificates are trusted identities"
"The signature is timestamped, but the timestamp could not be verified"
Further help is very much appreciated again!
"What am I doing wrong?"
The problem is that on one hand you start constructing a CMS signature container using a PdfPKCS7 instance
PdfPKCS7 sgn = new PdfPKCS7(null, chain, hashAlgorithm, false);
and for the calculated document digest hash retrieve the signed attributes to be
byte[] sh = sgn.getAuthenticatedAttributeBytes(hash, null, null, CryptoStandard.CMS);
to send them for signing.
So far so good.
But then you ignore the CMS container you started constructing but instead inject the naked signature bytes you got from your service into the PDF.
This cannot work as your signature bytes don't sign the document directly but instead they sign these signed attributes (and, therefore, indirectly the document as the document hash is one of the signed attributes). Thus, by ignoring the CMS container under construction you dropped the actually signed data...
Furthermore, the subfilter ADBE_PKCS7_DETACHED you use promises that the embedded signature is a full CMS signature container, not a few naked signature bytes, so the format also is wrong.
How to do it instead?
Instead of injecting the naked signature bytes you got from your service into the PDF as is, you have to set them as external digest in the PdfPKCS7 instance in which you originally started constructing the signature container:
sgn.SetExternalDigest(sig, null, ENCRYPTION_ALGO);
(ENCRYPTION_ALGO must be the encryption part of the signature algorithm, I assume in your case "RSA".)
and then you can retrieve the generated CMS signature container:
byte[] encodedSig = sgn.GetEncodedPKCS7(hash, null, null, null, CryptoStandard.CMS);
Now this is the signature container to inject into the document using MyExternalSignatureContainer:
IExternalSignatureContainer external = new MyExternalSignatureContainer(encodedSig);
MakeSignature.SignDeferred(reader, "Signature1", os, external);
Remaining issues
Having corrected your code Adobe Reader still warns about your signatures:
"The signer's identity is unknown because it has not been included in the list of trusted identities and none or its parent certificates are trusted identities"
This warning is to be expected and correct!
The signer's identity is unknown because your signature service uses merely a demo certificate, not a certificate for production use:
As you see the certificate is issued by "GlobalSign Non-Public HVCA Demo", and non-public demo issuers for obvious reasons must not be trusted (unless you manually add them to your trust store for testing purposes).
"The signature is timestamped, but the timestamp could not be verified"
There are two reasons why Adobe does not approve of your timestamp:
On one hand, just like above, the timestamp certificate is a non-public, demo certificate ("DSS Non-Public Demo TSA Responder"). Thus, there is no reason for the verifier to trust your timestamp.
On the other hand, though, there is an actual error in your timestamp'ing code, you apply the hashing algorithm twice! In your MyITSAClient class you have
public byte[] GetTimeStampToken(byte[] imprint)
{
string hashedImprint = Sha256_hash(imprint);
// hashedImprint sent to signing service
return Convert.FromBase64String("Base64 encoded DER representation of TimeStampToken received from signing service");
}
The imprint parameter of your GetTimeStampToken implementation is already hashed, so you have to hex encode these bytes and send them for timestamp'ing. But you apply your method Sha256_hash which first hashes and then hex encodes this new hash.
Thus, instead of applying Sha256_hash merely hex encode the imprint!

iTextSharp Read Text From Single Layer of PDF

Currently I am using a custom LocationTextExtractionStrategy to extract text from a PDF that returns a TextRenderInfo[]. I would like to be able to determine if a TextRenderInfo object (or PDFString, child of TextRenderInfo) appears in a specific layer. I am not sure if this is possible. To get the layers in a PDF, I am using:
Dictionary<string,PdfLayer> layers;
using (var pdfReader = new PdfReader(src))
{
var newSrc = Path.Combine(["new file location"]);
using (var stream = new FileStream(newSrc, FileMode.Create))
{
PdfStamper stamper = new PdfStamper(pdfReader, stream);
layers = stamper.GetPdfLayers();
stamper.Close();
}
pdfReader.Close();
src = newSrc;
}
To extract the text, I am using:
var textExtractor = new TextExtractionStrategy();
PdfTextExtractor.GetTextFromPage(pdfReader, pdfPageNum,textExtractor);
List<TextRenderInfo> results = textExtractor.Results;
Is there any way that I can check if the individual TextRenderInfo results exist within the layers obtained in the first code snippet. Any help would be much appreciated.
It is possible to get the contents from a single layer, but you'll have to jump through a few hoops to work it out. Specifically, you will have to recreate some of the logic that is provided by the PdfTextExtractor and PdfReaderContentParser.
public static String GetText(PdfReader reader, int pageNumber, int streamNumber) {
var strategy = new LocationTextExtractionStrategy();
var processor = new PdfContentStreamProcessor(strategy);
var resourcesDic = pageDic.GetAsDict(PdfName.RESOURCES);
// assuming you still need to extract the page bytes
byte[] contents = GetContentBytesForPageStream(reader, pageNumber, streamNumber);
processor.ProcessContent(contents, resourcesDic);
return strategy.GetResultantText();
}
public static byte[] GetContentBytesForPageStream(PdfReader reader, int pageNumber, int streamNumber) {
PdfDictionary pageDictionary = reader.GetPageN(pageNum);
PdfObject contentObject = pageDictionary.Get(PdfName.CONTENTS);
if (contentObject == null)
return new byte[0];
byte[] contentBytes = GetContentBytesFromContentObject(contentObject, streamNumber);
return contentBytes;
}
public static byte[] GetContentBytesFromContentObject(PdfObject contentObject, int streamNumber) {
// copy-paste logic from
// ContentByteUtils.GetContentBytesFromContentObject(contentObject);
// but in case PdfObject.ARRAY: only select the streamNumber you require
}
If you're specifically looking to just use PdfTextExtractor or PdfReaderContentParser, and ask the returned TextRenderInfo for the layer it's on, then I'm not sure it will be easily possible. There are a number of problems with that:
TextRenderInfo doesn't store that information, so you'd have to subclass it (which is possible)
you'd have to rewrite the logic that creates the TextRenderInfo objects. This is possible by registering custom IContentOperator objects for all text operators (Tj, TJ, ' and ") with the PdfTextExtractor or PdfReaderContentParser
the hardest part is that you have already lost layer information in ContentByteUtils.GetContentBytesFromContentObject - so you'd need to retain that somehow, which creates its own set of problems.

Text Extraction, Not Image Extraction

Please help me understand if my solution is correct.
I'm trying to extract text from a PDF file with a LocationTextExtractionStrategy parser. I'm getting exceptions because the ParseContentMethod tries to parse inline images? The code is simple and looks similar to this:
RenderFilter[] filter = { new RegionTextRenderFilter(cropBox) };
ITextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, strategy);
I realize the images are in the content stream but I have a PDF file failing to extract text because of inline images. It returns an UnsupportedPdfException of "The filter /DCTDECODE is not supported" and then it finally fails with and InlineImageParseException of "Could not find image data or EI", when all I really care about is the text. The BI/EI exists in my file so I assume this failure is because of the /DCTDECODE exception. But again, I don't care about images, I'm looking for text.
My current solution for this is to add a filterHandler in the InlineImageUtils class that assigns the Filter_DoNothing() filter to the DCTDECODE filterHandler dictionary. This way I don't get exceptions when I have InlineImages with DCTDECODE. Like this:
private static bool InlineImageStreamBytesAreComplete(byte[] samples, PdfDictionary imageDictionary) {
try {
IDictionary<PdfName, FilterHandlers.IFilterHandler> handlers = new Dictionary<PdfName, FilterHandlers.IFilterHandler>(FilterHandlers.GetDefaultFilterHandlers());
handlers[PdfName.DCTDECODE] = new Filter_DoNothing();
PdfReader.DecodeBytes(samples, imageDictionary, handlers);
return true;
} catch (IOException e) {
return false;
}
}
public class Filter_DoNothing : FilterHandlers.IFilterHandler
{
public byte[] Decode(byte[] b, PdfName filterName, PdfObject decodeParams, PdfDictionary streamDictionary)
{
return b;
}
}
My problem with this "fix" is that I had to change the iTextSharp library. I'd rather not do that so I can try to stay compatible with future versions.
Here's the PDF in question:
https://app.box.com/s/7eaewzu4mnby9ogpl2frzjswgqxn9rz5

Issues with iTextsharp and pdf manipulation

I am getting a pdf-document (no password) which is generated from a third party software with javascript and a few editable fields in it. If I load this pdf-document with the pdfReader class the NumberOfPagesProperty is always 1 although the pdf-document has 17 pages. Oddly enough the document has 17 pages if I save the stream afterwards. When I now try to open the document the Acrobat Reader shows an extended feature warning and the fields are not fillable anymore (I haven't flattened the document). Do anyone know about such a problem?
Background Info:
My job is to remove the javascript code, fill out some fields and save the document afterwards.
I am using the iTextsharp version 5.5.3.0.
Unfortunately I can't upload a sample file because there are some confidental data in it.
private byte[] GetDocumentData(string documentName)
{
var document = String.Format("{0}{1}\\{2}.pdf", _component.OutputDirectory, _component.OutputFileName.Replace(".xml", ".pdf"), documentName);
if (File.Exists(document))
{
PdfReader.unethicalreading = true;
using (var originalData = new MemoryStream(File.ReadAllBytes(document)))
{
using (var updatedData = new MemoryStream())
{
var pdfTool = new PdfInserter(originalData, updatedData) {FormFlattening = false};
pdfTool.RemoveJavascript();
pdfTool.Save();
return updatedData.ToArray();
}
}
}
return null;
}
//Old version that wasn't working
public PdfInserter(Stream pdfInputStream, Stream pdfOutputStream)
{
_pdfInputStream = pdfInputStream;
_pdfOutputStream = pdfOutputStream;
_pdfReader = new PdfReader(_pdfInputStream);
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream);
}
//Solution
public PdfInserter(Stream pdfInputStream, Stream pdfOutputStream, char pdfVersion = '\0', bool append = true)
{
_pdfInputStream = pdfInputStream;
_pdfOutputStream = pdfOutputStream;
_pdfReader = new PdfReader(_pdfInputStream);
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream, pdfVersion, append);
}
public void RemoveJavascript()
{
for (int i = 0; i <= _pdfReader.XrefSize; i++)
{
PdfDictionary dictionary = _pdfReader.GetPdfObject(i) as PdfDictionary;
if (dictionary != null)
{
dictionary.Remove(PdfName.AA);
dictionary.Remove(PdfName.JS);
dictionary.Remove(PdfName.JAVASCRIPT);
}
}
}
The extended feature warning is a hint that the original PDF had been signed using a usage rights signature to "Reader-enable" it, i.e. to tell the Adobe Reader to activate some additional features when opening it, and the OP's operation on it has invalidated the signature.
Indeed, he operated using
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream);
which creates a PdfStamper which completely re-generates the document. To not invalidate the signature, though, one has to use append mode as in the OP's fixed code (for char pdfVersion = '\0', bool append = true):
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream, pdfVersion, append);
If I load this pdf-document with the pdfReader class the NumberOfPagesProperty is always 1 although the pdf-document has 17 pages. Oddly enough the document has 17 pages
Quite likely it is a PDF with a XFA form, i.e. the PDF is only a carrier of some XFA data from which Adobe Reader builds those 17 pages. The actual PDF in that case usually only contains one page saying something like "if you see this, your viewer does not support XFA."
For a final verdict, though, one has to inspect the PDF.