Returning PDFDocument object from PDFStamper itextsharp - pdf

I want to return the Document object from below code.
At present I get a document has no pages exception.
private static Document GeneratePdfAcroFields(PdfReader reader, Document docReturn)
{
if (File.Exists(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"]))
File.Delete(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"]);
PdfStamper stamper = new PdfStamper(reader, new FileStream(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"],FileMode.Create));
AcroFields form = stamper.AcroFields;
///INSERTING TEXT DYNAMICALLY JUST FOR EXAMPLE.
form.SetField("topmostSubform[0].Page16[0].topmostSubform_0_\\.Page78_0_\\.TextField3_9_[0]", "This value was dynamically added.");
stamper.FormFlattening = false;
stamper.Close();
FileStream fsRead = new FileStream(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"], FileMode.Open);
Document docret = new Document(reader.GetPageSizeWithRotation(1));
return docret;
}

Thanks Chris.
Just to reiterate what #BrunoLowagie is saying, passing Document objects around almost never makes >sense. Despite what the name might sound like, a Document doesn't represent a PDF in any way. Calling >ToString() or GetBytes() (if that method actually existed) wouldn't get you a PDF. A Document is just a >one-way funnel for passing human-friendly commands over to an engine that actually writes raw PDF >tokens. The engine, however, is also not even a PDF. The only thing that truly is a PDF is the raw >bytes of the stream that is being written to. – Chris Haas

Related

Pdf signature invalidates existing signature in Acrobat Reader

I'm using iText 7.1.15 and SignDeferred to apply signatures to pdf documents.
SignDeferred is required since the signature is created PKCS11 hardware token (usb key).
When i sign a "regular" pdf, e.g. created via word, i can apply multiple signatures and all signatures are shown as valid in the adobe acrobat reader.
If the pdf was created by combining multiple pdf documents with adobe DC, the first signature is valid but becomes invalid as soon as the seconds signature is applied.
Document in Adobe reader after the first signature is applied:
Document in Adobe reader after the second signature is applied:
The signatures of the same document are shown as valid in foxit reader.
I've found a similar issue on stackoverflow (multiple signatures invalidate first signature in iTextSharp pdf signing), but it was using iText 5 and i'm not sure it is the same problem.
Question: What can i do in order to keep both signatures valid in the Acrobat Reader?
Unsigned Pdf document on which the first signature becomes invalid:
https://github.com/suntsu42/iTextDemoInvalidSecondSignature/blob/master/test.pdf
Twice signed document which is invalid:
https://github.com/suntsu42/iTextDemoInvalidSecondSignature/blob/master/InvalidDocumentSignedTwice.pdf
Code used for signing
//Step #1 >> prepare pdf for signing (Allocate space for the signature and calculate hash)
using (MemoryStream input = new MemoryStream(pdfToSign))
{
using (var reader = new PdfReader(input))
{
StampingProperties sp = new StampingProperties();
sp.UseAppendMode();
using (MemoryStream baos = new MemoryStream())
{
var signer = new PdfSigner(reader, baos, sp);
//Has to be NOT_CERTIFIED since otherwiese a pdf cannot be signed multiple times
signer.SetCertificationLevel(PdfSigner.NOT_CERTIFIED);
if (visualRepresentation != null)
{
try
{
PdfSignatureAppearance appearance = signer.GetSignatureAppearance();
base.SetPdfSignatureAppearance(appearance, visualRepresentation);
}
catch (Exception ex)
{
throw new Exception("Unable to set provided signature image", ex);
}
}
//Make the SignatureAttributeName unique
SignatureAttributeName = $"SignatureAttributeName_{DateTime.Now:yyyyMMddTHHmmss}";
signer.SetFieldName(SignatureAttributeName);
DigestCalcBlankSigner external = new DigestCalcBlankSigner(PdfName.Adobe_PPKLite, PdfName.Adbe_pkcs7_detached);
signer.SignExternalContainer(external, EstimateSize);
hash = external.GetDocBytesHash();
tmpPdf = baos.ToArray();
}
}
//Step #2 >> Create the signature based on the document hash
// This is the part which accesses the HSM via PCKS11
byte[] signature = null;
if (LocalSigningCertificate == null)
{
signature = CreatePKCS7SignatureViaPKCS11(hash, pin);
}
else
{
signature = CreatePKCS7SignatureViaX509Certificate(hash);
}
//Step #3 >> Apply the signature to the document
ReadySignatureSigner extSigContainer = new ReadySignatureSigner(signature);
using (MemoryStream preparedPdfStream = new MemoryStream(tmpPdf))
{
using (var pdfReader = new PdfReader(preparedPdfStream))
{
using (PdfDocument docToSign = new PdfDocument(pdfReader))
{
using (MemoryStream outStream = new MemoryStream())
{
PdfSigner.SignDeferred(docToSign, SignatureAttributeName, outStream, extSigContainer);
return outStream.ToArray();
}
}
}
}
}
Sample project
I've created a working sample project which uses a local certificate for signing. I also did update iText to version 7.2 but with the same result.
It also contains the document which cannot be signed twice (test.pdf)
https://github.com/suntsu42/iTextDemoInvalidSecondSignature/tree/master
Edit
I've applied the solution provided by MKL to the sample project on github.
As a second note, It is also possible to use PdfSigner but in this case, the bookmarks of the original document must be removed.
As already mentioned in a comment, the example document "InvalidDocumentSignedTwice.pdf" has the signature not applied in an incremental update, so here it is obvious that former signatures will break. But this is not the issue of the OP's example project. Thus, the issue is processed with an eye on the actual outputs of the example project.
Analyzing the Issue
When validating signed PDFs Adobe Acrobat executes two types of checks:
It checks the signature itself and whether the revision of the PDF it covers is untouched.
(If there are additions to the PDF after the revision covered by the signature:) It checks whether changes applied in incremental updates only consist of allowed changes.
The former check is pretty stable and standard but the second one is very whimsical and prone to incorrect negative validation results. Like in your case...
In case of your example document one can simply determine that the first check must positively validate the first signature: The file with only one (valid!) signature constitutes a byte-wise starting piece of the file with two signatures, so nothing can have been broken here.
Thus, the second type of check, the fickle type, must go wrong in the case at hand.
To find out what change one has to analyze the changes done during signing. A helpful fact is that doing the same using iText 5 does not produce the issue; thus, the change that triggered the check must be in what iText 7 does differently than iText 5 here. And the main difference in this context is that iText 7 has a more thorough tagging support than iText 5 and, therefore, also adds a reference to the new signature field to the document structure tree.
This by itself does not yet trigger the whimsical check, though, it only does so here because one outline element refers to the parent structure tree element of the change as its structure element (SE). Apparently Adobe Acrobat considers the change in the associated structure element as a change of the outline link and, therefore, as a (disallowed) change of the behavior of the document revision signed by the first signature.
So is this an iText error (adding entries to the structure tree) or an Adobe Acrobat error (complaining about the additions)? Well, in a tagged PDF (and your PDF has the corresponding Marked entry set to true) the content including annotations and form fields is expected to be tagged. Thus, addition of structure tree entries for the newly added signature field and its appearance not only should be allowed but actually recommended or even required! So this appears to be an error of Adobe Acrobat.
A Work-Around
Knowing that this appears to be an Adobe Acrobat bug is all well and good, but at the end of the day one might need a way now to sign such documents multiple times without current Adobe Acrobat calling that invalid.
It is possible to make iText believe there is no structure tree and no need to update a structure tree. This can be done by making the initialization of the document tag structure fail. For this we override the PdfDocument method TryInitTagStructure. As the iText PdfSigner creates its document object internally, we do this in an override of the PdfSigner method InitDocument.
I.e. instead of PdfSigner we use the class MySigner defined like this:
public class MySigner : PdfSigner
{
public MySigner(PdfReader reader, Stream outputStream, StampingProperties properties) : base(reader, outputStream, properties)
{
}
override protected PdfDocument InitDocument(PdfReader reader, PdfWriter writer, StampingProperties properties)
{
return new MyDocument(reader, writer, properties);
}
}
public class MyDocument : PdfDocument
{
public MyDocument(PdfReader reader, PdfWriter writer, StampingProperties properties) : base(reader, writer, properties)
{
}
override protected void TryInitTagStructure(PdfDictionary str)
{
structTreeRoot = null;
structParentIndex = -1;
}
}
Using MySigner for signing documents iText won't add tagging anymore and so won't make Adobe Acrobat complain about new entries in the structure tree.
The Same Work-Around in Java
As I feel more comfortable working in Java, I analyzed this and tested the work-around in Java.
Here this can be put into a more closed form (maybe it also can for C#, I don't know), instead of initializing the signer like this
PdfSigner pdfSigner = new PdfSigner(pdfReader, os, new StampingProperties().useAppendMode());
we do it like this:
PdfSigner pdfSigner = new PdfSigner(pdfReader, os, new StampingProperties().useAppendMode()) {
#Override
protected PdfDocument initDocument(PdfReader reader, PdfWriter writer, StampingProperties properties) {
return new PdfDocument(reader, writer, properties) {
#Override
protected void tryInitTagStructure(PdfDictionary str) {
structTreeRoot = null;
structParentIndex = -1;
}
};
}
};
(MultipleSignaturesAndTagging test testSignTestManuelTwiceNoTag)
TL;DR
iText 7 adds a structure element for the new signature field to the document structure tree during signing. If the parent node of this new node is referenced as associated structure element of an outline element, though, Adobe Acrobat incorrectly considers this a disallowed change. As a work-around one can tweak iText signing to not add structure elements.
I've got a similar problem with signatures after last update of Adobe Reader. I wrote a post on their community, but they still didn't answer me.
Take a look:
https://community.adobe.com/t5/acrobat-reader-discussions/invalid-signatures-after-adobe-reader-update-2022-001-20085/td-p/12892048
I am using a iText v.5.5.5 to generate pdf. I sign and certify pdf document in single method. Moreover, foxit reader shows that signatures are valid. I believe that this is an Adobe bug and it will be fixed soon :) Key are an log.

Add attachment using iTextSharp.text.pdf.PdfStamper

I'm having a little bit trouble changing from PdfStamper.AddFileAttachment that receives four arguments to PdfStamper.AddFileAttachment which recieves PdfFileSpecification object as an argument. The thing is i want to add files to my pdf document as an embedded files,
Can some one tell me if i'm doing this the right way?!
I've replaced the : iText_Stamper.AddFileAttachment(desc, b, s, s);
with:
PdfFileSpecification pfs = PdfFileSpecification.FileEmbedded(iText_Stamper.Writer,
f.sDataFileName, s, b);
pfs.AddDescription(desc, true);
iText_Stamper.AddFileAttachment(desc, pfs);
PdfTargetDictionary target = new PdfTargetDictionary(true);
target.EmbeddedFileName = s;
PdfDestination dest = new PdfDestination(PdfDestination.FIT);
dest.AddFirst(new PdfString(desc));
iTextSharp.text.pdf.PdfAction action = iTextSharp.text.pdf.PdfAction.GotoEmbedded(null, target,
dest, true);
Chunk chunk = new Chunk(desc);
chunk.SetAction(action);
iText_Stamper.Writer.Add(chunk);
Is this sufficient? am i doing it right?
I'll be glad for some help.
The main issue in your code is that you assume that the PdfWriter descendant iText_Stamper.Writer can be used like a Document to which you can add text chunks using the Add method, and expect iTextSharp to layout such material automatically.
The class hierarchy unfortunately suggests this as both PdfWriter and Document implement the interface IElementListener which provides a method bool Add(IElement element).
Nonetheless, this assumption is wrong, the class hierarchies overlap for internal code reuse reasons, not to suggest similar usages; the Add implementation of the PdfWriter descendant iText_Stamper.Writer merely returns false and does not even attempt to add the given element to the document.
In particular in case of pages the stamper retrieved from the underlying PdfReader (and didn't add to them) this does make sense: If there is content somehow scattered across a page, where should the stamper add the new content? Should it consider the existing content background material and start at the top? Or should it somehow find a big unused page area and paint the elements there? (In case of newly added pages, though, the stamper indeed could have been programmed to serve like a regular PdfWriter and have allowed linkage with a Document to automatically layout content...)
Thus, to allow for new content to be automatically layout'ed, you have to tell iTextSharp where to put the content. You can do this by means of a ColumnText instance like this:
using (PdfReader reader = new PdfReader(source))
using (PdfStamper stamper = new PdfStamper(reader, new FileStream(result, FileMode.Create, FileAccess.Write)))
{
PdfFileSpecification pfs = PdfFileSpecification.FileEmbedded(stamper.Writer, pdfPath, pdfName, pdfBytes);
pfs.AddDescription(pdfDesc, true);
stamper.AddFileAttachment(pdfDesc, pfs);
PdfContentByte cb = stamper.GetOverContent(1);
ColumnText ct = new ColumnText(cb);
ct.SetSimpleColumn(30, 30, reader.GetPageSize(1).GetRight(30), 60);
PdfTargetDictionary target = new PdfTargetDictionary(true);
target.EmbeddedFileName = pdfDesc;
PdfDestination dest = new PdfDestination(PdfDestination.FIT);
dest.AddFirst(new PdfNumber(1));
PdfAction action = PdfAction.GotoEmbedded(null, target, dest, true);
Chunk chunk = new Chunk(pdfDesc);
chunk.SetAction(action);
ct.AddElement(chunk);
ct.Go();
}
(I used somewhat more descriptive names than your f.sDataFileName, s, b)
In the course of layout'ing your chunks, ColumnText also establishes the desired goto-emebedded links.
By the way, from your writing embedded files and using a PdfAction.GotoEmbedded I assumed you attach another PDF. If that assumption happens to be wrong, you might want to use PdfAnnotation.CreateFileAttachment instead

Reading PDF Bookmarks in VB.NET using iTextSharp

I am making a tool that scans PDF files and searches for text in PDF bookmarks and body text. I am using Visual Studio 2008 with VB.NET with iTextSharp.
How do I load bookmarks' list from an existing PDF file?
It depends on what you understand when you say "bookmarks".
You want the outlines (the entries that are visible in the bookmarks panel):
The CreateOnlineTree examples shows you how to use the SimpleBookmark class to create an XML file containing the complete outline tree (in PDF jargon, bookmarks are called outlines).
Java:
PdfReader reader = new PdfReader(src);
List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
SimpleBookmark.exportToXML(list,
new FileOutputStream(dest), "ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(pdfIn);
var list = SimpleBookmark.GetBookmark(reader);
using (MemoryStream ms = new MemoryStream()) {
SimpleBookmark.ExportToXML(list, ms, "ISO8859-1", true);
ms.Position = 0;
using (StreamReader sr = new StreamReader(ms)) {
return sr.ReadToEnd();
}
}
The list object can also be used to examine the different bookmark elements one by one programmatically (this is all explained in the official documentation).
You want the named destinations (specific places in the document you can link to by name):
Now suppose that you meant to say named destinations, then you need the SimpleNamedDestination class as shown in the LinkActions example:
Java:
PdfReader reader = new PdfReader(src);
HashMap<String,String> map = SimpleNamedDestination.getNamedDestination(reader, false);
SimpleNamedDestination.exportToXML(map, new FileOutputStream(dest),
"ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(src);
Dictionary<string,string> map = SimpleNamedDestination
.GetNamedDestination(reader, false);
using (MemoryStream ms = new MemoryStream()) {
SimpleNamedDestination.ExportToXML(map, ms, "ISO8859-1", true);
ms.Position = 0;
using (StreamReader sr = new StreamReader(ms)) {
return sr.ReadToEnd();
}
}
The map object can also be used to examine the different named destinations one by one programmatically. Note the Boolean parameter that is used when retrieving the named destinations. Named destinations can be stored using a PDF name object as name, or using a PDF string object. The Boolean parameter indicates whether you want the former (true = stored as PDF name objects) or the latter (false = stored as PDF string objects) type of named destinations.
Named destinations are predefined targets in a PDF file that can be found through their name. Although the official name is named destinations, some people refer to them as bookmarks too (but when we say bookmarks in the context of PDF, we usually want to refer to outlines).
If someone is still searching the vb.net solution, trying to simplify, I have a large amount of pdf created with reportbuilder and with documentmap I automatically add a bookmarks "Title". So with iTextSharp I read the pdf and extract just the first bookmark value:
Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
Dim list As Object
list = SimpleBookmark.GetBookmark(oReader)
Dim string_book As String
string_book = list(0).item("Title")
It is a little help very simple for someone searching a start point to understand how it works.

iTextSharp filling forms and creating multiple pages

I have following written codes
Dim template As String = Server.MapPath("files/") & "2_paged_form.pdf"
Dim newFile As String = Server.MapPath("exports/") & "newFile.pdf"
Dim reader = New PdfReader(template)
Dim output = New FileStream(newFile, FileMode.Create, FileAccess.Write)
Dim stamp = New PdfStamper(reader, output)
stamp.AcroFields.SetField("client", "hello")
stamp.AcroFields.SetField("name", "test test")
stamp.AcroFields.SetField("address", "Hellocourt")
stamp.AcroFields.SetField("postcode", "xx 3xx")
stamp.AcroFields.SetField("dob", "11/02/1987")
stamp.FormFlattening = True
stamp.Close()
output.Close()
reader.Close()
I have managed to created a newfile.pdf with only onetime entry from 2_paged_form.pdf.
However I have multiple information to loop through so that newfile.pdf has multiple entries. for example newfile.pdf should have 10 pages with 5 different entries.
Could anyone help?
This is documented on the official iText site and in the book.
If you prefer watching a video, you can watch this tutorial. You can try the examples here. You need the entry "Fill, Flatten and Merge: how to do it correctly." The code for these examples can be found here: FillFlattenMerge2. Note that there's also a FillFlattenMerge1 example that demonstrates how NOT to do it. Please don't use that example ;-)
If you prefer reading a book, please download Chapter 6 of "iText in Action - Second Edition". You already know how to fill out one form (as described on page 185), you now want to merge different results. Again, there's an example on how not to do it (on page 190) and on how you should do it (on page 190-191).
I have never written a line of vb.net, but please look at this Java code as if it were pseudo code:
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
document.open();
ByteArrayOutputStream baos;
PdfReader reader;
PdfStamper stamper;
AcroFields fields;
while (data.hasMoreElements()) {
// create a PDF in memory
baos = new ByteArrayOutputStream();
reader = new PdfReader(SRC);
stamper = new PdfStamper(reader, baos);
fields = stamper.getAcroFields();
MyData myData = data.nextElement();
fields.setField("name", myData.getName());
fields.setField("address", myData.getAddress());
...
stamper.setFormFlattening(true);
stamper.close();
reader.close();
// add the PDF to PdfCopy
reader = new PdfReader(baos.toByteArray());
copy.addDocument(reader);
reader.close();
}
document.close();
As you can see, you need to create to fill the form resulting in a PDF that is kept in memory. Then you need to read this PDF from memory and add it to a PdfSmartCopy instance using the addDocument() method.
P.S. 1: What is wrong with the bad example? It results in bloated PDFs because the static content of the form is added redundantly as many times as you copy the form. PdfSmartCopy checks for redundant information and will add the static content only once.
P.S. 2: Why is there a bad way of doing it? The bad way of doing it, is actually a good way if the documents you are merging are all very different. In this case, the bad way is much faster and less memory-extensive and therefore actually the good way. It's only bad when you're merging documents that are very similar to each other, such as the same form filled out with different data sets.

Existing PDF to PDF/A "conversion"

I am trying to make an existing pdf into pdf/a-1b. I understand that itext cannot convert a pdf to pdf/a in the sense making it pdf/a compliant. But it definitely can flag the document as pdf/a. However, I looked at various examples and I cannot seem to figure out how to do it. The major problem is that
writer.PDFXConformance = PdfWriter.PDFA1B;
does not work anymore. First PDFA1B is not recognized, second, pdfwriter seems to have been rewritten and there is not much information about that.
It seems the only (in itext java version) way is:
PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(filename), PdfAConformanceLevel.PDF_A_1B);
But that requires a document type, ie. it can be used when creating a pdf from scratch.
Can someone give an example of pdf to pdf/a conversion with the current version of itextsharp?
Thank you.
I can't imagine a valid reason for doing this but apparently you have one.
The conformance settings in iText are intended to be used with a PdfWriter and that object is (generally) only intended to be used with new documents. Since iText was never intended to convert documents to conformance that's just the way it was built.
To do what you want to do you could either just open the original document and update the appropriate tags in the document's dictionary or you could create a new document with the appropriate entries set and then import your old document. The below code shows the latter route, it first creates a regular non-conforming PDF and then creates a second document that says it is conforming even though it may or may not. See the code comments for more details. This targets iTextSharp 5.4.2.0.
//Folder that we're working from
var workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//Create a regular non-conformant PDF, nothing special below
var RegularPdf = Path.Combine(workingFolder, "File1.pdf");
using (var fs = new FileStream(RegularPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
doc.Add(new Paragraph("Hello world!"));
doc.Close();
}
}
}
//Create our conformant document from the above file
var ConformantPdf = Path.Combine(workingFolder, "File2.pdf");
using (var fs = new FileStream(ConformantPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
//Use PdfSmartCopy to get every page
using (var copy = new PdfSmartCopy(doc, fs)) {
//Set our conformance levels
copy.SetPdfVersion(PdfWriter.PDF_VERSION_1_3);
copy.PDFXConformance = PdfWriter.PDFX1A2001;
//Open our new document for writing
doc.Open();
//Bring in every page from the old PDF
using (var r = new PdfReader(RegularPdf)) {
for (var i = 1; i <= r.NumberOfPages; i++) {
copy.AddPage(copy.GetImportedPage(r, i));
}
}
//Close up
doc.Close();
}
}
}
Just to be 100% clear, this WILL NOT MAKE A CONFORMANT PDF, just a document that says it conforms.