iTextSharp filling forms and creating multiple pages - vb.net

I have written the following code:
Dim template As String = Server.MapPath("files/") & "2_paged_form.pdf"
Dim newFile As String = Server.MapPath("exports/") & "newFile.pdf"
Dim reader = New PdfReader(template)
Dim output = New FileStream(newFile, FileMode.Create, FileAccess.Write)
Dim stamp = New PdfStamper(reader, output)
stamp.AcroFields.SetField("client", "hello")
stamp.AcroFields.SetField("name", "test test")
stamp.AcroFields.SetField("address", "Hellocourt")
stamp.AcroFields.SetField("postcode", "xx 3xx")
stamp.AcroFields.SetField("dob", "11/02/1987")
stamp.FormFlattening = True
stamp.Close()
output.Close()
reader.Close()
I have managed to create newFile.pdf with a single entry from 2_paged_form.pdf.
However, I have multiple records to loop through, so newFile.pdf should contain multiple entries. For example, newFile.pdf should have 10 pages with 5 different entries.
Could anyone help?

This is documented on the official iText site and in the book.
If you prefer watching a video, you can watch this tutorial. You can try the examples here. You need the entry "Fill, Flatten and Merge: how to do it correctly." The code for these examples can be found here: FillFlattenMerge2. Note that there's also a FillFlattenMerge1 example that demonstrates how NOT to do it. Please don't use that example ;-)
If you prefer reading a book, please download Chapter 6 of "iText in Action - Second Edition". You already know how to fill out one form (as described on page 185), you now want to merge different results. Again, there's an example on how not to do it (on page 190) and on how you should do it (on page 190-191).
I have never written a line of vb.net, but please look at this Java code as if it were pseudo code:
// the Document that PdfSmartCopy writes to
Document document = new Document();
PdfCopy copy = new PdfSmartCopy(document, new FileOutputStream(dest));
document.open();
ByteArrayOutputStream baos;
PdfReader reader;
PdfStamper stamper;
AcroFields fields;
while (data.hasMoreElements()) {
    // create a PDF in memory
    baos = new ByteArrayOutputStream();
    reader = new PdfReader(SRC);
    stamper = new PdfStamper(reader, baos);
    fields = stamper.getAcroFields();
    MyData myData = data.nextElement();
    fields.setField("name", myData.getName());
    fields.setField("address", myData.getAddress());
    // ...
    stamper.setFormFlattening(true);
    stamper.close();
    reader.close();
    // add the PDF to PdfCopy
    reader = new PdfReader(baos.toByteArray());
    copy.addDocument(reader);
    reader.close();
}
document.close();
As you can see, you first fill out the form, which results in a PDF that is kept in memory. Then you read this PDF from memory and add it to a PdfSmartCopy instance using the addDocument() method.
P.S. 1: What is wrong with the bad example? It results in bloated PDFs because the static content of the form is added redundantly as many times as you copy the form. PdfSmartCopy checks for redundant information and will add the static content only once.
P.S. 2: Why is there a bad way of doing it? The bad way is actually a good way if the documents you are merging are all very different. In that case, the bad way is much faster and less memory-intensive, and therefore actually the good way. It's only bad when you're merging documents that are very similar to each other, such as the same form filled out with different data sets.
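The idea behind P.S. 1 (store identical content once and reuse it) can be sketched without iText at all. The following is only an illustration of that general technique, written in plain Java; the class DedupStore and its methods are hypothetical names made up for this sketch, and it is not how PdfSmartCopy is actually implemented.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a store that keeps identical content only once.
public class DedupStore {
    private final Map<String, Integer> indexByHash = new HashMap<>();
    private final List<byte[]> stored = new ArrayList<>();

    // Returns the index of the stored object; content that was added
    // before is not stored again, its existing index is returned instead.
    public int add(byte[] content) {
        String key = hash(content);
        Integer existing = indexByHash.get(key);
        if (existing != null) {
            return existing;
        }
        stored.add(content.clone());
        int index = stored.size() - 1;
        indexByHash.put(key, index);
        return index;
    }

    public int storedCount() {
        return stored.size();
    }

    private static String hash(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(md.digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DedupStore store = new DedupStore();
        byte[] background = "static form artwork".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < 5; i++) {
            store.add(background);                                      // identical on every copy
            store.add(("entry " + i).getBytes(StandardCharsets.UTF_8)); // differs per copy
        }
        // One shared background plus five distinct entries.
        System.out.println(store.storedCount()); // prints 6
    }
}
```

Without the lookup, the same loop would store ten objects instead of six. The gap widens with every additional copy of the form, which is the bloat P.S. 1 describes.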

Related

Add attachment using iTextSharp.text.pdf.PdfStamper

I'm having a bit of trouble changing from the PdfStamper.AddFileAttachment overload that receives four arguments to the PdfStamper.AddFileAttachment overload that receives a PdfFileSpecification object as an argument. I want to add files to my PDF document as embedded files.
Can someone tell me if I'm doing this the right way?
I've replaced iText_Stamper.AddFileAttachment(desc, b, s, s);
with:
PdfFileSpecification pfs = PdfFileSpecification.FileEmbedded(iText_Stamper.Writer,
f.sDataFileName, s, b);
pfs.AddDescription(desc, true);
iText_Stamper.AddFileAttachment(desc, pfs);
PdfTargetDictionary target = new PdfTargetDictionary(true);
target.EmbeddedFileName = s;
PdfDestination dest = new PdfDestination(PdfDestination.FIT);
dest.AddFirst(new PdfString(desc));
iTextSharp.text.pdf.PdfAction action = iTextSharp.text.pdf.PdfAction.GotoEmbedded(null, target,
dest, true);
Chunk chunk = new Chunk(desc);
chunk.SetAction(action);
iText_Stamper.Writer.Add(chunk);
Is this sufficient? Am I doing it right?
I'd be glad for some help.
The main issue in your code is that you assume that the PdfWriter descendant iText_Stamper.Writer can be used like a Document, to which you can add text chunks using the Add method, and that iTextSharp will lay out such material automatically.
The class hierarchy unfortunately suggests this, as both PdfWriter and Document implement the interface IElementListener, which provides a method bool Add(IElement element).
Nonetheless, this assumption is wrong. The class hierarchies overlap for internal code reuse reasons, not to suggest similar usage; the Add implementation of the PdfWriter descendant iText_Stamper.Writer merely returns false and does not even attempt to add the given element to the document.
For pages the stamper retrieved from the underlying PdfReader (as opposed to pages it added itself), this does make sense: if there is content somehow scattered across a page, where should the stamper add the new content? Should it consider the existing content background material and start at the top? Or should it somehow find a big unused page area and paint the elements there? (In the case of newly added pages, though, the stamper could indeed have been programmed to serve like a regular PdfWriter and to allow linkage with a Document to lay out content automatically...)
Thus, for new content to be laid out automatically, you have to tell iTextSharp where to put it. You can do this by means of a ColumnText instance like this:
using (PdfReader reader = new PdfReader(source))
using (PdfStamper stamper = new PdfStamper(reader, new FileStream(result, FileMode.Create, FileAccess.Write)))
{
    PdfFileSpecification pfs = PdfFileSpecification.FileEmbedded(stamper.Writer, pdfPath, pdfName, pdfBytes);
    pfs.AddDescription(pdfDesc, true);
    stamper.AddFileAttachment(pdfDesc, pfs);
    PdfContentByte cb = stamper.GetOverContent(1);
    ColumnText ct = new ColumnText(cb);
    ct.SetSimpleColumn(30, 30, reader.GetPageSize(1).GetRight(30), 60);
    PdfTargetDictionary target = new PdfTargetDictionary(true);
    target.EmbeddedFileName = pdfDesc;
    PdfDestination dest = new PdfDestination(PdfDestination.FIT);
    dest.AddFirst(new PdfNumber(1));
    PdfAction action = PdfAction.GotoEmbedded(null, target, dest, true);
    Chunk chunk = new Chunk(pdfDesc);
    chunk.SetAction(action);
    ct.AddElement(chunk);
    ct.Go();
}
(I used somewhat more descriptive names than your f.sDataFileName, s, b.)
In the course of laying out your chunks, ColumnText also establishes the desired goto-embedded links.
By the way, from your writing embedded files and using PdfAction.GotoEmbedded, I assumed you are attaching another PDF. If that assumption happens to be wrong, you might want to use PdfAnnotation.CreateFileAttachment instead.

Reading PDF Bookmarks in VB.NET using iTextSharp

I am making a tool that scans PDF files and searches for text in PDF bookmarks and body text. I am using Visual Studio 2008 with VB.NET with iTextSharp.
How do I load bookmarks' list from an existing PDF file?
It depends on what you mean by "bookmarks".
You want the outlines (the entries that are visible in the bookmarks panel):
The CreateOutlineTree example shows you how to use the SimpleBookmark class to create an XML file containing the complete outline tree (in PDF jargon, bookmarks are called outlines).
Java:
PdfReader reader = new PdfReader(src);
List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
SimpleBookmark.exportToXML(list,
new FileOutputStream(dest), "ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(pdfIn);
var list = SimpleBookmark.GetBookmark(reader);
using (MemoryStream ms = new MemoryStream()) {
SimpleBookmark.ExportToXML(list, ms, "ISO8859-1", true);
ms.Position = 0;
using (StreamReader sr = new StreamReader(ms)) {
return sr.ReadToEnd();
}
}
The list object can also be used to examine the different bookmark elements one by one programmatically (this is all explained in the official documentation).
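As a rough sketch of examining the entries one by one, the list can be walked recursively to search the bookmark titles for a piece of text. The example below is plain Java and does not call iText: the outline is built by hand as a stand-in for the result of SimpleBookmark.getBookmark(reader), on the assumption that each entry stores its text under the key "Title" and any nested entries as a list under the key "Kids". BookmarkSearch and findTitles are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;

public class BookmarkSearch {

    // Recursively collects every title that contains the search text
    // (case-insensitive), descending into nested "Kids" lists.
    @SuppressWarnings("unchecked")
    public static List<String> findTitles(List<HashMap<String, Object>> bookmarks, String text) {
        List<String> hits = new ArrayList<>();
        if (bookmarks == null) {
            return hits; // defensive: treat a missing outline tree as empty
        }
        String needle = text.toLowerCase(Locale.ROOT);
        for (HashMap<String, Object> entry : bookmarks) {
            Object title = entry.get("Title");
            if (title != null && title.toString().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(title.toString());
            }
            Object kids = entry.get("Kids");
            if (kids instanceof List) {
                hits.addAll(findTitles((List<HashMap<String, Object>>) kids, text));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the list returned by SimpleBookmark.getBookmark(reader).
        HashMap<String, Object> child = new HashMap<>();
        child.put("Title", "1.1 Installing iText");
        List<HashMap<String, Object>> kids = new ArrayList<>();
        kids.add(child);

        HashMap<String, Object> parent = new HashMap<>();
        parent.put("Title", "Chapter 1");
        parent.put("Kids", kids);

        List<HashMap<String, Object>> outline = new ArrayList<>();
        outline.add(parent);

        System.out.println(findTitles(outline, "itext")); // prints [1.1 Installing iText]
    }
}
```

In a real tool the hand-built outline would be replaced by the list obtained from the reader; the traversal itself stays the same.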
You want the named destinations (specific places in the document you can link to by name):
Now suppose that you meant to say named destinations, then you need the SimpleNamedDestination class as shown in the LinkActions example:
Java:
PdfReader reader = new PdfReader(src);
HashMap<String,String> map = SimpleNamedDestination.getNamedDestination(reader, false);
SimpleNamedDestination.exportToXML(map, new FileOutputStream(dest),
"ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(src);
Dictionary<string,string> map = SimpleNamedDestination
.GetNamedDestination(reader, false);
using (MemoryStream ms = new MemoryStream()) {
SimpleNamedDestination.ExportToXML(map, ms, "ISO8859-1", true);
ms.Position = 0;
using (StreamReader sr = new StreamReader(ms)) {
return sr.ReadToEnd();
}
}
The map object can also be used to examine the different named destinations one by one programmatically. Note the Boolean parameter that is used when retrieving the named destinations. Named destinations can be stored using a PDF name object as name, or using a PDF string object. The Boolean parameter indicates whether you want the former (true = stored as PDF name objects) or the latter (false = stored as PDF string objects) type of named destinations.
Named destinations are predefined targets in a PDF file that can be found through their name. Although the official name is named destinations, some people refer to them as bookmarks too (but when we say bookmarks in the context of PDF, we usually want to refer to outlines).
If someone is still looking for a VB.NET solution, here is a simplified one. I have a large number of PDFs created with Report Builder, and its document map automatically adds a "Title" bookmark to each of them. With iTextSharp I read the PDF and extract just the first bookmark's value:
Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
Dim list As Object
list = SimpleBookmark.GetBookmark(oReader)
Dim string_book As String
string_book = list(0).item("Title")
It is only a small, very simple example, but it may serve as a starting point for anyone trying to understand how this works.

Returning PDFDocument object from PDFStamper itextsharp

I want to return the Document object from below code.
At present I get a document has no pages exception.
private static Document GeneratePdfAcroFields(PdfReader reader, Document docReturn)
{
if (File.Exists(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"]))
File.Delete(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"]);
PdfStamper stamper = new PdfStamper(reader, new FileStream(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"],FileMode.Create));
AcroFields form = stamper.AcroFields;
///INSERTING TEXT DYNAMICALLY JUST FOR EXAMPLE.
form.SetField("topmostSubform[0].Page16[0].topmostSubform_0_\\.Page78_0_\\.TextField3_9_[0]", "This value was dynamically added.");
stamper.FormFlattening = false;
stamper.Close();
FileStream fsRead = new FileStream(System.Configuration.ConfigurationSettings.AppSettings["TEMP_PDF"], FileMode.Open);
Document docret = new Document(reader.GetPageSizeWithRotation(1));
return docret;
}
Thanks Chris.
Just to reiterate what @BrunoLowagie is saying, passing Document objects around almost never makes sense. Despite what the name might sound like, a Document doesn't represent a PDF in any way. Calling ToString() or GetBytes() (if that method actually existed) wouldn't get you a PDF. A Document is just a one-way funnel for passing human-friendly commands over to an engine that actually writes raw PDF tokens. The engine, however, is also not even a PDF. The only thing that truly is a PDF is the raw bytes of the stream that is being written to. – Chris Haas

Converting Asp.net page to pdf with Itextsharp (XMLWorker) returning damaged/blank pdf

I'm not sure if I skipped a step in my code. I am using iTextSharp version 5.5.1 and XML Worker version 5.5.1. The call to doc.Close throws the exception "the document has no pages", but when I watched sw.ToString it did contain the HTML content.
private void ExporttoPDF()
{
HttpContext.Current.Response.Clear();
HttpContext.Current.Response.AddHeader("Content-Disposition", "attachment;filename=RequestSummaryReport.pdf");
HttpContext.Current.Response.Charset = "";
HttpContext.Current.Response.ContentType = "application/pdf";
StringWriter sw = new StringWriter();
HtmlTextWriter htw = new HtmlTextWriter(sw);
var doc = new Document(PageSize.A3, 45, 5, 5, 5);
PdfWriter writer = PdfWriter.GetInstance(doc, Response.OutputStream);
doc.Open();
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.SetTagFactory(Tags.GetHtmlTagProcessorFactory());
ICSSResolver cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(false);
IPipeline pipeline = new CssResolverPipeline(cssResolver, new HtmlPipeline(htmlContext, new PdfWriterPipeline(doc, writer)));
XMLWorker worker = new XMLWorker(pipeline, true);
XMLParser xmlParse = new XMLParser(true, worker);
pnlReport.RenderControl(htw);
StringReader sr = new StringReader(sw.ToString());
xmlParse.Parse(sr);
xmlParse.Flush();
doc.Close();
Response.Write(doc);
}
I just spent almost two hours with the same symptoms, and finally figured out the cause of the problem. I'm guessing you may have figured it out already (since the question was asked 5 months ago), but I thought I'd post the answer in case others run into the same problem.
When you create your PdfWriter, you pass in the stream (in your case Response.OutputStream) that is to be the destination for the generated PDF content. As the PdfWriter writes content to the stream, the stream position is incremented accordingly. When the writer finishes, the stream position is at the end of the content, and this makes sense because any further calls to write should continue where the PdfWriter left off.
The problem is that when the MVC pipeline takes the Response.OutputStream (after your method returns) and attempts to read it to send its contents to the client (or in general, whenever the PDF destination stream is consumed), the stream position is at the end of the content. To the consumer it therefore appears that the stream is empty, hence the empty PDF output.
To solve this, simply reset the position of the stream immediately after you are finished writing to it, and before anything tries to read from it. In your case insert:
Response.OutputStream.Position = 0;
right after the line with xmlParse.Flush(), since that is the last line that will write to the stream.
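The position behaviour described above can be reproduced in isolation. The sketch below uses plain Java and a RandomAccessFile on a temporary file as a stand-in for a seekable .NET stream; it assumes a stream that supports seeking, and whether a given destination stream (such as Response.OutputStream) lets you set its position is not something this sketch establishes. StreamPositionDemo and readBeforeAndAfterRewind are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamPositionDemo {

    // Writes the data, then tries to read it twice: once with the position
    // left at the end, once after rewinding. Returns {before, after}, the
    // byte counts reported by the two reads (-1 means end of stream).
    public static int[] readBeforeAndAfterRewind(byte[] data) {
        File tmp = null;
        try {
            tmp = File.createTempFile("position-demo", ".bin");
            try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
                file.write(data);                // position is now at the end
                byte[] buffer = new byte[data.length];
                int before = file.read(buffer);  // nothing left to read from here
                file.seek(0);                    // rewind to the start
                int after = file.read(buffer);   // now the content is visible
                return new int[] { before, after };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                tmp.delete();
            }
        }
    }

    public static void main(String[] args) {
        int[] result = readBeforeAndAfterRewind("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("before rewind: " + result[0]);
        System.out.println("after rewind:  " + result[1]);
    }
}
```

The first read reports end of stream even though data was written, which matches the "empty" output described in the answer; the second read, after the rewind, returns the content.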

Existing PDF to PDF/A "conversion"

I am trying to make an existing pdf into pdf/a-1b. I understand that itext cannot convert a pdf to pdf/a in the sense making it pdf/a compliant. But it definitely can flag the document as pdf/a. However, I looked at various examples and I cannot seem to figure out how to do it. The major problem is that
writer.PDFXConformance = PdfWriter.PDFA1B;
does not work anymore. First, PDFA1B is not recognized; second, PdfWriter seems to have been rewritten, and there is not much information about that.
It seems the only (in itext java version) way is:
PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(filename), PdfAConformanceLevel.PDF_A_1B);
But that requires a Document object, i.e., it can only be used when creating a PDF from scratch.
Can someone give an example of pdf to pdf/a conversion with the current version of itextsharp?
Thank you.
I can't imagine a valid reason for doing this but apparently you have one.
The conformance settings in iText are intended to be used with a PdfWriter and that object is (generally) only intended to be used with new documents. Since iText was never intended to convert documents to conformance that's just the way it was built.
To do what you want, you could either open the original document and update the appropriate tags in the document's dictionary, or create a new document with the appropriate entries set and then import your old document. The code below shows the latter route: it first creates a regular non-conforming PDF and then creates a second document that says it is conforming, whether or not it actually is. See the code comments for more details. This targets iTextSharp 5.4.2.0.
//Folder that we're working from
var workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);

//Create a regular non-conformant PDF, nothing special below
var RegularPdf = Path.Combine(workingFolder, "File1.pdf");
using (var fs = new FileStream(RegularPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
    using (var doc = new Document()) {
        using (var writer = PdfWriter.GetInstance(doc, fs)) {
            doc.Open();
            doc.Add(new Paragraph("Hello world!"));
            doc.Close();
        }
    }
}

//Create our conformant document from the above file
var ConformantPdf = Path.Combine(workingFolder, "File2.pdf");
using (var fs = new FileStream(ConformantPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
    using (var doc = new Document()) {
        //Use PdfSmartCopy to get every page
        using (var copy = new PdfSmartCopy(doc, fs)) {
            //Set our conformance levels
            copy.SetPdfVersion(PdfWriter.PDF_VERSION_1_3);
            copy.PDFXConformance = PdfWriter.PDFX1A2001;
            //Open our new document for writing
            doc.Open();
            //Bring in every page from the old PDF
            using (var r = new PdfReader(RegularPdf)) {
                for (var i = 1; i <= r.NumberOfPages; i++) {
                    copy.AddPage(copy.GetImportedPage(r, i));
                }
            }
            //Close up
            doc.Close();
        }
    }
}
Just to be 100% clear, this WILL NOT MAKE A CONFORMANT PDF, just a document that says it conforms.