Add attachment using iTextSharp.text.pdf.PdfStamper - pdf

I'm having a little trouble changing from the PdfStamper.AddFileAttachment overload that receives four arguments to the overload that receives a PdfFileSpecification object as an argument. I want to add files to my PDF document as embedded files.
Can someone tell me if I'm doing this the right way?
I've replaced iText_Stamper.AddFileAttachment(desc, b, s, s);
with:
PdfFileSpecification pfs = PdfFileSpecification.FileEmbedded(iText_Stamper.Writer,
f.sDataFileName, s, b);
pfs.AddDescription(desc, true);
iText_Stamper.AddFileAttachment(desc, pfs);
PdfTargetDictionary target = new PdfTargetDictionary(true);
target.EmbeddedFileName = s;
PdfDestination dest = new PdfDestination(PdfDestination.FIT);
dest.AddFirst(new PdfString(desc));
iTextSharp.text.pdf.PdfAction action = iTextSharp.text.pdf.PdfAction.GotoEmbedded(null, target,
dest, true);
Chunk chunk = new Chunk(desc);
chunk.SetAction(action);
iText_Stamper.Writer.Add(chunk);
Is this sufficient? Am I doing it right?
I'd be glad for some help.

The main issue in your code is that you assume that the PdfWriter descendant iText_Stamper.Writer can be used like a Document to which you can add text chunks using the Add method, and expect iTextSharp to layout such material automatically.
The class hierarchy unfortunately suggests this as both PdfWriter and Document implement the interface IElementListener which provides a method bool Add(IElement element).
Nonetheless, this assumption is wrong. The class hierarchies overlap for internal code reuse reasons, not to suggest similar usages; the Add implementation of the PdfWriter descendant iText_Stamper.Writer merely returns false and does not even attempt to add the given element to the document.
For pages the stamper retrieved from the underlying PdfReader (rather than added itself), this does make sense: if there is content scattered across a page, where should the stamper add the new content? Should it treat the existing content as background material and start at the top? Or should it somehow find a big unused page area and paint the elements there? (In the case of newly added pages, though, the stamper could indeed have been programmed to serve like a regular PdfWriter and allow linkage with a Document to lay out content automatically...)
Thus, to have new content laid out automatically, you have to tell iTextSharp where to put the content. You can do this by means of a ColumnText instance like this:
using (PdfReader reader = new PdfReader(source))
using (PdfStamper stamper = new PdfStamper(reader, new FileStream(result, FileMode.Create, FileAccess.Write)))
{
    PdfFileSpecification pfs = PdfFileSpecification.FileEmbedded(stamper.Writer, pdfPath, pdfName, pdfBytes);
    pfs.AddDescription(pdfDesc, true);
    stamper.AddFileAttachment(pdfDesc, pfs);

    PdfContentByte cb = stamper.GetOverContent(1);
    ColumnText ct = new ColumnText(cb);
    ct.SetSimpleColumn(30, 30, reader.GetPageSize(1).GetRight(30), 60);

    PdfTargetDictionary target = new PdfTargetDictionary(true);
    target.EmbeddedFileName = pdfDesc;
    PdfDestination dest = new PdfDestination(PdfDestination.FIT);
    dest.AddFirst(new PdfNumber(1));
    PdfAction action = PdfAction.GotoEmbedded(null, target, dest, true);

    Chunk chunk = new Chunk(pdfDesc);
    chunk.SetAction(action);
    ct.AddElement(chunk);
    ct.Go();
}
(I used somewhat more descriptive names than your f.sDataFileName, s, b)
While laying out your chunks, ColumnText also establishes the desired goto-embedded links.
By the way, from your writing about embedded files and using PdfAction.GotoEmbedded, I assumed you attach another PDF. If that assumption happens to be wrong, you might want to use PdfAnnotation.CreateFileAttachment instead.

Related

iTextSharp v5 - How do you concatenate PDFs in memory?

I am having trouble merging PDFs in-memory. I have 2 memory streams, a master and component stream, the idea is that as each component PDF is built up, the component PDF's bytes are added to the master stream. At the very end of all the components, we have a byte array that's a PDF.
I have the code below, but nothing is copying into my masterStream. I think the issue is with CopyPagesTo, but I'm not familiar enough and the documentation/examples are hard to find.
byte[] updated;
using (MemoryStream masterMemoryStream = new MemoryStream())
{
masterStream.WriteTo(masterMemoryStream);
// Read from master stream (ie. all existing components)
masterMemoryStream.Position = 0;
using (iText.Kernel.Pdf.PdfWriter masterPdfWriter = new iText.Kernel.Pdf.PdfWriter(masterMemoryStream))
using (iText.Kernel.Pdf.PdfDocument masterPdfDocument = new iText.Kernel.Pdf.PdfDocument(masterPdfWriter))
{
using (MemoryStream componentMemoryStream = new MemoryStream())
{
componentStream.WriteTo(componentMemoryStream);
// Read from new component
componentMemoryStream.Position = 0;
using (iText.Kernel.Pdf.PdfReader componentPdfReader = new iText.Kernel.Pdf.PdfReader(componentMemoryStream))
using (iText.Kernel.Pdf.PdfDocument componentPdfDocument = new iText.Kernel.Pdf.PdfDocument(componentPdfReader))
{
// Copy pages from component into master
componentPdfDocument.CopyPagesTo(1, componentPdfDocument.GetNumberOfPages(), masterPdfDocument);
}
}
}
updated = masterMemoryStream.GetBuffer();
}
// Write updates to master stream?
masterStream.SetLength(0);
using (MemoryStream temp = new MemoryStream(updated))
temp.WriteTo(masterStream);
Answer
This is mkl's answer with some of my corrections:
using (MemoryStream temporaryStream = new MemoryStream())
{
masterStream.Position = 0;
componentStream.Position = 0;
using (PdfDocument combinedDocument = new PdfDocument(new PdfReader(masterStream), new PdfWriter(temporaryStream)))
using (PdfDocument componentDocument = new PdfDocument(new PdfReader(componentStream)))
{
componentDocument.CopyPagesTo(1, componentDocument.GetNumberOfPages(), combinedDocument);
}
byte[] temporaryBytes = temporaryStream.ToArray();
masterStream.Position = 0;
masterStream.SetLength(temporaryBytes.Length);
masterStream.Capacity = temporaryBytes.Length;
masterStream.Write(temporaryBytes, 0, temporaryBytes.Length);
}
There are a number of issues in your code. I'll first give you a working version and then go into the issues in your code.
A working version (with an important limitation)
You can combine two PDFs given in MemoryStream instances masterStream and componentStream and get the result in the same MemoryStream instance masterStream as follows:
using (MemoryStream temporaryStream = new MemoryStream())
{
    masterStream.Position = 0;
    componentStream.Position = 0;
    using (PdfDocument combinedDocument = new PdfDocument(new PdfReader(masterStream), new PdfWriter(temporaryStream)))
    using (PdfDocument componentDocument = new PdfDocument(new PdfReader(componentStream)))
    {
        componentDocument.CopyPagesTo(1, componentDocument.GetNumberOfPages(), combinedDocument);
    }
    byte[] temporaryBytes = temporaryStream.ToArray();
    masterStream.Position = 0;
    masterStream.Capacity = temporaryBytes.Length;
    masterStream.Write(temporaryBytes, 0, temporaryBytes.Length);
    masterStream.Position = 0;
}
The limitation is that you have to have instantiated the masterStream with an expandable capacity; the MemoryStream class has a number of constructors, only some of which create such an expandable instance, while the others create non-resizable instances. For details, see the documentation of the MemoryStream constructors.
Issues in your concept and code
Concatenating PDF files does not result in a valid merged PDF
You describe your concept like this
the idea is that as each component PDF is built up, the component PDF's bytes are added to the master stream
This does not work, though: the PDF format does not allow merging PDFs by simply concatenating them. First, each (active) object in a PDF has an identifier number that must be unique within the PDF; concatenating would result in a file with non-unique object identifiers. Second, PDFs contain cross-reference structures that map each object identifier to its offset from the start of the file; concatenating would make all these offsets wrong for the appended PDFs. Finally, a PDF has to have a single root object from which the other objects are referenced directly or indirectly; concatenating would result in multiple root objects.
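To make the first of these points concrete, here is a minimal Java sketch. It does not use iText and does not parse real PDFs; it merely scans two heavily simplified, hypothetical file bodies for object headers of the form "N G obj" and reports which object numbers both of them declare. The class and method names are made up for this illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConcatCollisionDemo {
    // Matches indirect object headers such as "1 0 obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj");

    // Collects the object numbers declared in a PDF-like text fragment.
    static Set<Integer> objectNumbers(String pdfText) {
        Set<Integer> numbers = new LinkedHashSet<>();
        Matcher m = OBJ_HEADER.matcher(pdfText);
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group(1)));
        }
        return numbers;
    }

    // Returns the object numbers that both fragments declare.
    static Set<Integer> collisions(String first, String second) {
        Set<Integer> shared = objectNumbers(first);
        shared.retainAll(objectNumbers(second));
        return shared;
    }

    public static void main(String[] args) {
        // Hypothetical, simplified bodies (assumption: not valid PDFs).
        String master = "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
                + "2 0 obj\n<< /Type /Pages >>\nendobj\n";
        String component = "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
                + "2 0 obj\n<< /Type /Pages >>\nendobj\n";
        System.out.println("Colliding object numbers: " + collisions(master, component));
    }
}
```

Since PDF producers typically start numbering at 1, practically any two PDFs collide this way, which is why a library has to renumber objects while copying pages instead of appending bytes.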
Writing and immediately overwriting
In your code you have
masterStream.WriteTo(masterMemoryStream);
// Read from master stream (ie. all existing components)
masterMemoryStream.Position = 0;
using (iText.Kernel.Pdf.PdfWriter masterPdfWriter = new iText.Kernel.Pdf.PdfWriter(masterMemoryStream))
Here you write the contents of masterStream to masterMemoryStream, then set the masterMemoryStream position to the start and instantiate a PdfWriter which starts writing there. I.e. your original copy of the masterStream contents gets overwritten, which is surely not what you wanted.
Using MemoryStream.GetBuffer
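By design, MemoryStream.GetBuffer returns not only the data written into the MemoryStream but the whole underlying buffer; i.e. there may be a lot of trash bytes after the actual PDF in what you retrieve here (MemoryStream.ToArray, as used in the working version, returns only the bytes actually written):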
updated = masterMemoryStream.GetBuffer();
This may cause PDF processors trying to process your result PDFs to be unable to open the file: PDFs have a pointer to the last cross references at their end, so if you have trash bytes following the actual end of your PDF, PDF processors may not find that pointer.
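The effect of such trailing bytes can be sketched in a few lines of Java. This is not how any particular PDF processor is implemented; it assumes, for illustration only, a reader that looks for the startxref keyword in the last 1024 bytes of the file. The sample bytes are a hypothetical stand-in, not a valid PDF, and the zero padding mimics an oversized internal buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TrailingBytesDemo {
    // Looks for "startxref" in the last `window` bytes of the data.
    // The window size is an assumption made for this sketch.
    static boolean hasStartXrefNearEnd(byte[] data, int window) {
        int from = Math.max(0, data.length - window);
        String tail = new String(data, from, data.length - from, StandardCharsets.ISO_8859_1);
        return tail.contains("startxref");
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the end of a PDF file.
        byte[] pdf = "%PDF-1.4\n...objects...\nstartxref\n1234\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        // Same bytes followed by 4096 zero bytes, like an unused buffer tail.
        byte[] padded = Arrays.copyOf(pdf, pdf.length + 4096);

        System.out.println("exact bytes:  " + hasStartXrefNearEnd(pdf, 1024));
        System.out.println("padded bytes: " + hasStartXrefNearEnd(padded, 1024));
    }
}
```

With the exact bytes the keyword is found; with the padded array the inspected tail consists only of zero bytes, so the pointer to the cross references is missed.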
PS
As worked out in the comments, the code above works fine in case of constantly growing stream lengths (which usually will happen in the use case at hand) but in general one needs to restrict the stream size before writing the new content, e.g. like this:
...
masterStream.Position = 0;
masterStream.SetLength(temporaryBytes.Length); // <<<<
masterStream.Capacity = temporaryBytes.Length;
...

Reading PDF Bookmarks in VB.NET using iTextSharp

I am making a tool that scans PDF files and searches for text in PDF bookmarks and body text. I am using Visual Studio 2008 with VB.NET with iTextSharp.
How do I load bookmarks' list from an existing PDF file?
It depends on what you understand when you say "bookmarks".
You want the outlines (the entries that are visible in the bookmarks panel):
The CreateOutlineTree example shows you how to use the SimpleBookmark class to create an XML file containing the complete outline tree (in PDF jargon, bookmarks are called outlines).
Java:
PdfReader reader = new PdfReader(src);
List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
SimpleBookmark.exportToXML(list,
new FileOutputStream(dest), "ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(pdfIn);
var list = SimpleBookmark.GetBookmark(reader);
using (MemoryStream ms = new MemoryStream()) {
SimpleBookmark.ExportToXML(list, ms, "ISO8859-1", true);
ms.Position = 0;
using (StreamReader sr = new StreamReader(ms)) {
return sr.ReadToEnd();
}
}
The list object can also be used to examine the different bookmark elements one by one programmatically (this is all explained in the official documentation).
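As a rough sketch of such a programmatic walk, the following Java code recursively collects the titles from a list of maps shaped like the one SimpleBookmark.getBookmark returns. To stay runnable without iText, it uses hand-built sample data; the assumption is that each entry stores its text under the "Title" key and any child entries under a "Kids" key holding a list of the same shape. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class OutlineWalker {
    // Recursively collects titles, indenting children by two spaces per level.
    @SuppressWarnings("unchecked")
    static void collectTitles(List<HashMap<String, Object>> entries, int depth, List<String> out) {
        if (entries == null) {
            return; // entry without children
        }
        for (HashMap<String, Object> entry : entries) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            line.append(entry.get("Title"));
            out.add(line.toString());
            // Assumption: "Kids" holds nested entries of the same shape.
            collectTitles((List<HashMap<String, Object>>) entry.get("Kids"), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the result of SimpleBookmark.getBookmark(reader).
        HashMap<String, Object> child = new HashMap<>();
        child.put("Title", "Section 1.1");
        List<HashMap<String, Object>> kids = new ArrayList<>();
        kids.add(child);

        HashMap<String, Object> parent = new HashMap<>();
        parent.put("Title", "Chapter 1");
        parent.put("Kids", kids);
        List<HashMap<String, Object>> outline = new ArrayList<>();
        outline.add(parent);

        List<String> titles = new ArrayList<>();
        collectTitles(outline, 0, titles);
        titles.forEach(System.out::println);
    }
}
```

For a text search tool, the collected titles could then be matched against the search term instead of being printed.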
You want the named destinations (specific places in the document you can link to by name):
Now suppose that you meant to say named destinations, then you need the SimpleNamedDestination class as shown in the LinkActions example:
Java:
PdfReader reader = new PdfReader(src);
HashMap<String,String> map = SimpleNamedDestination.getNamedDestination(reader, false);
SimpleNamedDestination.exportToXML(map, new FileOutputStream(dest),
"ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(src);
Dictionary<string,string> map = SimpleNamedDestination
.GetNamedDestination(reader, false);
using (MemoryStream ms = new MemoryStream()) {
SimpleNamedDestination.ExportToXML(map, ms, "ISO8859-1", true);
ms.Position = 0;
using (StreamReader sr = new StreamReader(ms)) {
return sr.ReadToEnd();
}
}
The map object can also be used to examine the different named destinations one by one programmatically. Note the Boolean parameter that is used when retrieving the named destinations. Named destinations can be stored using a PDF name object as name, or using a PDF string object. The Boolean parameter indicates whether you want the former (true = stored as PDF name objects) or the latter (false = stored as PDF string objects) type of named destinations.
Named destinations are predefined targets in a PDF file that can be found through their name. Although the official name is named destinations, some people refer to them as bookmarks too (but when we say bookmarks in the context of PDF, we usually want to refer to outlines).
If someone is still looking for a VB.NET solution, here is a simplified one. I have a large number of PDFs created with Report Builder, and with the document map I automatically add a bookmark "Title". With iTextSharp I then read the PDF and extract just the first bookmark value:
Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
Dim list As Object
list = SimpleBookmark.GetBookmark(oReader)
Dim string_book As String
string_book = list(0).item("Title")
It is a small, very simple aid for someone looking for a starting point to understand how it works.

Returning PDFDocument object from PDFStamper itextsharp

I want to return the Document object from below code.
At present I get a document has no pages exception.
private static Document GeneratePdfAcroFields(PdfReader reader, Document docReturn)
{
if (File.Exists(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"]))
File.Delete(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"]);
PdfStamper stamper = new PdfStamper(reader, new FileStream(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"],FileMode.Create));
AcroFields form = stamper.AcroFields;
///INSERTING TEXT DYNAMICALLY JUST FOR EXAMPLE.
form.SetField("topmostSubform[0].Page16[0].topmostSubform_0_\\.Page78_0_\\.TextField3_9_[0]", "This value was dynamically added.");
stamper.FormFlattening = false;
stamper.Close();
FileStream fsRead = new FileStream(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"], FileMode.Open);
Document docret = new Document(reader.GetPageSizeWithRotation(1));
return docret;
}
Thanks Chris.
Just to reiterate what @BrunoLowagie is saying, passing Document objects around almost never makes sense. Despite what the name might sound like, a Document doesn't represent a PDF in any way. Calling ToString() or GetBytes() (if that method actually existed) wouldn't get you a PDF. A Document is just a one-way funnel for passing human-friendly commands over to an engine that actually writes raw PDF tokens. The engine, however, is also not even a PDF. The only thing that truly is a PDF is the raw bytes of the stream that is being written to. – Chris Haas

iTextSharp filling forms and creating multiple pages

I have written the following code:
Dim template As String = Server.MapPath("files/") & "2_paged_form.pdf"
Dim newFile As String = Server.MapPath("exports/") & "newFile.pdf"
Dim reader = New PdfReader(template)
Dim output = New FileStream(newFile, FileMode.Create, FileAccess.Write)
Dim stamp = New PdfStamper(reader, output)
stamp.AcroFields.SetField("client", "hello")
stamp.AcroFields.SetField("name", "test test")
stamp.AcroFields.SetField("address", "Hellocourt")
stamp.AcroFields.SetField("postcode", "xx 3xx")
stamp.AcroFields.SetField("dob", "11/02/1987")
stamp.FormFlattening = True
stamp.Close()
output.Close()
reader.Close()
I have managed to create newFile.pdf with only a single entry from 2_paged_form.pdf.
However, I have multiple sets of information to loop through so that newFile.pdf has multiple entries. For example, newFile.pdf should have 10 pages with 5 different entries.
Could anyone help?
This is documented on the official iText site and in the book.
If you prefer watching a video, you can watch this tutorial. You can try the examples here. You need the entry "Fill, Flatten and Merge: how to do it correctly." The code for these examples can be found here: FillFlattenMerge2. Note that there's also a FillFlattenMerge1 example that demonstrates how NOT to do it. Please don't use that example ;-)
If you prefer reading a book, please download Chapter 6 of "iText in Action - Second Edition". You already know how to fill out one form (as described on page 185), you now want to merge different results. Again, there's an example on how not to do it (on page 190) and on how you should do it (on page 190-191).
I have never written a line of vb.net, but please look at this Java code as if it were pseudo code:
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
document.open();
ByteArrayOutputStream baos;
PdfReader reader;
PdfStamper stamper;
AcroFields fields;
while (data.hasMoreElements()) {
// create a PDF in memory
baos = new ByteArrayOutputStream();
reader = new PdfReader(SRC);
stamper = new PdfStamper(reader, baos);
fields = stamper.getAcroFields();
MyData myData = data.nextElement();
fields.setField("name", myData.getName());
fields.setField("address", myData.getAddress());
...
stamper.setFormFlattening(true);
stamper.close();
reader.close();
// add the PDF to PdfCopy
reader = new PdfReader(baos.toByteArray());
copy.addDocument(reader);
reader.close();
}
document.close();
As you can see, you first need to fill out the form, resulting in a PDF that is kept in memory. Then you need to read this PDF from memory and add it to a PdfSmartCopy instance using the addDocument() method.
P.S. 1: What is wrong with the bad example? It results in bloated PDFs because the static content of the form is added redundantly as many times as you copy the form. PdfSmartCopy checks for redundant information and will add the static content only once.
P.S. 2: Why is there a bad way of doing it? The bad way of doing it is actually a good way if the documents you are merging are all very different. In this case, the bad way is much faster and less memory-intensive and therefore actually the good way. It's only bad when you're merging documents that are very similar to each other, such as the same form filled out with different data sets.

Existing PDF to PDF/A "conversion"

I am trying to make an existing PDF into PDF/A-1b. I understand that iText cannot convert a PDF to PDF/A in the sense of making it PDF/A compliant. But it definitely can flag the document as PDF/A. However, I looked at various examples and I cannot seem to figure out how to do it. The major problem is that
writer.PDFXConformance = PdfWriter.PDFA1B;
does not work anymore. First PDFA1B is not recognized, second, pdfwriter seems to have been rewritten and there is not much information about that.
It seems the only (in itext java version) way is:
PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(filename), PdfAConformanceLevel.PDF_A_1B);
But that requires a document type, ie. it can be used when creating a pdf from scratch.
Can someone give an example of pdf to pdf/a conversion with the current version of itextsharp?
Thank you.
I can't imagine a valid reason for doing this but apparently you have one.
The conformance settings in iText are intended to be used with a PdfWriter and that object is (generally) only intended to be used with new documents. Since iText was never intended to convert documents to conformance that's just the way it was built.
To do what you want to do you could either just open the original document and update the appropriate tags in the document's dictionary or you could create a new document with the appropriate entries set and then import your old document. The below code shows the latter route, it first creates a regular non-conforming PDF and then creates a second document that says it is conforming even though it may or may not. See the code comments for more details. This targets iTextSharp 5.4.2.0.
//Folder that we're working from
var workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//Create a regular non-conformant PDF, nothing special below
var RegularPdf = Path.Combine(workingFolder, "File1.pdf");
using (var fs = new FileStream(RegularPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
doc.Add(new Paragraph("Hello world!"));
doc.Close();
}
}
}
//Create our conformant document from the above file
var ConformantPdf = Path.Combine(workingFolder, "File2.pdf");
using (var fs = new FileStream(ConformantPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
//Use PdfSmartCopy to get every page
using (var copy = new PdfSmartCopy(doc, fs)) {
//Set our conformance levels
copy.SetPdfVersion(PdfWriter.PDF_VERSION_1_3);
copy.PDFXConformance = PdfWriter.PDFX1A2001;
//Open our new document for writing
doc.Open();
//Bring in every page from the old PDF
using (var r = new PdfReader(RegularPdf)) {
for (var i = 1; i <= r.NumberOfPages; i++) {
copy.AddPage(copy.GetImportedPage(r, i));
}
}
//Close up
doc.Close();
}
}
}
Just to be 100% clear, this WILL NOT MAKE A CONFORMANT PDF, just a document that says it conforms.