Huge data export to PDF with iTextSharp - VB.NET

I am trying to export a large amount of data to PDF. I am not exporting from a GridView that is displayed on the page; instead I create a dummy GridView in code and bind the data to it, without showing the grid on the page. I tried the code below:
Private Sub ExportGridToPDF()
    Using myMemoryStream As New MemoryStream()
        Dim myDocument As New iTextSharp.text.Document(iTextSharp.text.PageSize.A1, 10.0F, 10.0F, 10.0F, 0.0F)
        ' Dim myDocument As New iTextSharp.text.Document()
        Dim myPDFWriter As PdfWriter = PdfWriter.GetInstance(myDocument, myMemoryStream)
        myDocument.Open()
        ' Add to content to your PDF here...
        Dim sw As New StringWriter()
        Dim hw As New HtmlTextWriter(sw)
        GridView1.AllowPaging = False
        GridView1.DataBind()
        GridView1.RenderControl(hw)
        ' We're done adding stuff to our PDF.
        myDocument.Add(hw)
        myDocument.Close()
        Dim content As Byte() = myMemoryStream.ToArray()
        ' Write out PDF from memory stream.
        Using fs As FileStream = File.Create("eport_PDF.pdf")
            fs.Write(content, 0, CInt(content.Length))
        End Using
    End Using
End Sub
When I run this, it shows an error:
System.InvalidCastException: Unable to cast object of type 'System.Web.UI.HtmlTextWriter' to type 'iTextSharp.text.IElement'.
on this line:
myDocument.Add(hw)
I use a memory stream because of the huge amount of data. When I ran the code without a memory stream, I got an OutOfMemoryException, so I switched to a memory stream, and now I get this different error.

The Add() method in the Document object only accepts parameters that implement the IElement interface. You are passing an HtmlTextWriter object. That object is totally unrelated to iText. It is truly amazing that you would think this could work.
In this question, as in previous questions you posted (some of which are deleted), you refer to HTML. You were using HTMLWorker in Add image using itextsharp and the deleted question Out Of Memory Exception error itext sharp.
If you want to convert HTML to PDF, you should upgrade to iText 7 and use the pdfHTML add-on. Take a look at the tutorial to see how HTML to PDF conversion is done: https://developers.itextpdf.com/content/itext-7-converting-html-pdf-pdfhtml
In a comment to this answer however, you write: I'm not exporting data in HTML to PDF. OK, if that's true, then why do you refer to HTML in your code? That's very confusing.
Furthermore, you write I create dummy grid-view in code and bind data in it. Unfortunately, you don't give us any information about the format of that dummy grid-view. I assume, it's something you "invented" yourself, but if that's the case, how do you suppose that iText can magically understand the dummy grid-view you invented?
I started this answer by saying that the Add() method only accepts objects that implement the IElement interface. Since you are talking about a grid, it's probably interesting to use an iText table element. In iText 5, there's an object named PdfPTable; in iText 7, that object is simply named Table.
Many people with large data sets create such a table object first, then add it to a Document. That's not always wise, because objects keep building up in memory, eventually resulting in an OutOfMemoryException. For large data sets, you should mark the table as a large element, and add the table gradually.
In iText 5, the code would look like this:
Document document = new Document();
FileStream stream = new FileStream(fileName, FileMode.Create);
var pdfWriter = PdfWriter.GetInstance(document, stream);
document.Open();
PdfPTable table = new PdfPTable(4);
table.Complete = false;
for (int i = 0; i < 1000000; i++) {
    PdfPCell cell = new PdfPCell(new Phrase(i.ToString()));
    table.AddCell(cell);
    if (i > 0 && i % 1000 == 0) {
        document.Add(table);
    }
}
table.Complete = true;
document.Add(table);
document.Close();
We're adding 1000000 cells to a table with 4 columns, but we add the table every 1000 cells (so every 250 rows). This means that the content is flushed from memory on a regular basis, thus avoiding an OutOfMemoryException.
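The batching idea itself does not depend on iText. Here is a minimal, library-free sketch in Java of the same pattern: buffer a fixed number of items, hand them to a sink, and clear the buffer so memory use stays bounded. The class name BatchFlushSketch, the method writeInBatches, and the Consumer used as a sink are hypothetical names made up for this illustration; in real code the sink would be the call that adds or flushes the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchFlushSketch {

    /**
     * Produces `total` cell values and hands them to `sink` in batches of at
     * most `batchSize`, clearing the buffer after every hand-over.
     * Returns the number of batches that were handed over.
     * (Hypothetical helper; stands in for "add/flush the table".)
     */
    public static int writeInBatches(int total, int batchSize, Consumer<List<String>> sink) {
        List<String> buffer = new ArrayList<>(batchSize);
        int flushes = 0;
        for (int i = 0; i < total; i++) {
            buffer.add(Integer.toString(i));
            if (buffer.size() == batchSize) {
                sink.accept(new ArrayList<>(buffer)); // hand over a copy of the batch
                buffer.clear();                       // release the buffered cells
                flushes++;
            }
        }
        if (!buffer.isEmpty()) {                      // final, possibly partial batch
            sink.accept(new ArrayList<>(buffer));
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        int[] written = {0};
        int flushes = writeInBatches(1_000_000, 1000, batch -> written[0] += batch.size());
        // prints "1000 flushes, 1000000 cells"
        System.out.println(flushes + " flushes, " + written[0] + " cells");
    }
}
```

At no point does the buffer hold more than one batch, which is the property that keeps the large table from exhausting memory.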
Since you seem to be new at iText, do yourself a favor, and upgrade to using iText 7. iText 5 is in maintenance mode, which means that no new functionality will be added to that version. For instance: if at some point someone asks you to produce PDF 2.0 files (the PDF 2.0 spec was released a couple of months ago), you will have to throw all your iText 5 code away, and start anew, because only iText 7 will support PDF 2.0.
The large table functionality in iText 7 is discussed at the end of chapter 5 of the tutorial:
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
Table table = new Table(new[] {1f, 1f, 1f}, true);
table.AddHeaderCell("Table header 1");
table.AddHeaderCell("Table header 2");
table.AddHeaderCell("Table header 3");
table.AddFooterCell("Table footer 1");
table.AddFooterCell("Table footer 2");
table.AddFooterCell("Table footer 3");
document.Add(table);
for (int i = 0; i < 1000; i++)
{
    table.AddCell($"Row {i + 1}; column 1");
    table.AddCell($"Row {i + 1}; column 2");
    table.AddCell($"Row {i + 1}; column 3");
    if (i % 50 == 0)
    {
        table.Flush();
    }
}
table.Complete();
document.Close();
As you can see, the iText 7 code is much more intuitive. We create a table with 3 columns, and the second parameter (true) indicates that we will add a very large table. We add a header, we add a footer, and we add the table to the document. Then we add 1000 rows, but we Flush() the table every 50 rows. Flushing frees memory, which avoids running out of memory. Once we're done, we Complete() the table.
All of this is documented on the official web site! There is no need for you to invent your own grid view. As you have found out, inventing your own grid view cannot possibly work.
Also important: you say iTextSharp, I say iText. We both mean the same thing: the PDF library produced by iText Group that can be used to create PDF documents from C# code. Only you are using the old name, whereas we try to avoid that name based on the advice of a trademark lawyer who told us that there's a company named Sharp that doesn't appreciate other companies using the word Sharp in the context of brands that aren't related to their company. So please stop saying that you're using iTextSharp; you're using iText!

Related

Adding an imported PDF to a table cell in iTextSharp

I am creating a new PDF that will contain a compilation of other documents.
These other documents can be Word/Excel documents, images, or PDFs.
I am hoping to add all of this content to cells in a table, which is added to the document. This gives me the benefits of automatically adding pages, positioning elements in a cell rather than on a page, and an easier time keeping content in the same order as I supply it (such as img, doc, pdf, img, pdf etc.).
Adding images to the table is simple enough.
I am converting the Word/Excel docs to PDF image streams. I'm also reading in the existing PDFs as a stream.
Adding these to a new PDF is simple enough, by way of adding a template to the PdfContentByte.
What I am trying to do, though, is add these PDFs to cells in a table, which are then added to the doc.
Is this possible?
Please download chapter 6 of my book. It contains two variations on what you are trying to do:
ImportingPages1, with as result time_table_imported1.pdf
ImportingPages2, with as result time_table_imported2.pdf
This is a code snippet:
// step 1
Document document = new Document();
// step 2
PdfWriter writer
    = PdfWriter.getInstance(document, new FileOutputStream(RESULT));
// step 3
document.open();
// step 4
PdfReader reader = new PdfReader(MovieTemplates.RESULT);
int n = reader.getNumberOfPages();
PdfImportedPage page;
PdfPTable table = new PdfPTable(2);
for (int i = 1; i <= n; i++) {
    page = writer.getImportedPage(reader, i);
    table.getDefaultCell().setRotation(-page.getRotation());
    table.addCell(Image.getInstance(page));
}
document.add(table);
// step 5
document.close();
reader.close();
The pages are imported as PdfImportedPage objects, and then wrapped inside an Image so that we can add them to a PdfPTable.

In iTextSharp, how to include an existing PDF while creating a new document

I've found many solutions here and in the 'iText in Action' book for merging PDFs using the PdfCopy and PdfSmartCopy classes, but in the only similar question I've seen, the asker worked it out himself and didn't post the answer. The post Add an existing PDF from file to an unwritten document using iTextSharp asks the same question, but there the PDF goes at the end, so the answers suggest closing the existing document and then using PdfCopy; here I'd like to insert it anywhere. So here goes.
I'm creating an iTextSharp document with text and images using the normal Section, Phrase, Document and PdfWriter classes. This is code written over many years and it works fine. Now we need to insert an existing PDF while creating this document, either as a new Section or, if that isn't possible, as a Chapter. I have the PDF as a Byte array, so getting a PdfReader is no problem. However, I cannot work out how to read that PDF and insert it into the existing document at the point I'm at. I can get access to the PdfWriter if need be, but for the rest of the document all access is via Sections. This is as far as I've got, and I can add the PdfWriter as another parameter if necessary.
I've made some progress since the original post and have amended the code accordingly.
// Instance method: it relies on this.document and this.writer.
internal void InsertPDF( Section section, Byte[] pdf )
{
    this.document.NewPage();
    PdfReader pdfreader = new PdfReader( pdf );
    Int32 pages = pdfreader.NumberOfPages;
    for ( Int32 pagenum = 1; pagenum <= pages; pagenum++ )
    {
        PdfImportedPage page = this.writer.GetImportedPage( pdfreader, pagenum );
        PdfContentByte pcb = this.writer.DirectContentUnder;
        pcb.AddTemplate( page, 0, 0 );
        this.document.NewPage();
    }
}
It is close to doing what I want, but as I obviously don't understand the full workings of iText wonder if this is the correct way or there is a better way to do it.
If there is any other information I can provide, let me know.
Any pointers would be appreciated.
Just adding a little more meat to the answer. The solution ended up being found by researching what methods worked with a PdfTemplate which is what a PdfImportedPage is derived from. I've added a little more to show how it interacts with the rest of the document being built up. I hope this helps someone else.
internal static void InsertPDF( PdfWriter writer, Document document, Section section, Byte[] pdf )
{
    Paragraph para = new Paragraph();
    // Add note to show blank page is intentional
    para.Add( new Phrase( "PDF follows on the next page.", <your font> ) );
    section.Add( para );
    // Need to update the document so we render this page.
    document.Add( section );
    PdfReader reader = new PdfReader( pdf );
    PdfContentByte pcb = writer.DirectContentUnder;
    Int32 pages = reader.NumberOfPages;
    for ( Int32 pagenum = 1; pagenum <= pages; pagenum++ )
    {
        document.NewPage();
        PdfImportedPage page = writer.GetImportedPage( reader, pagenum );
        // Render their page in our document.
        pcb.AddTemplate( page, 0, 0 );
    }
}
To insert the existing PDF into a new page, I changed the order of the NewPage() call:
PdfImportedPage page2 = writer.GetImportedPage(pdf, 1);
cb.AddTemplate(page2, 0, 0);
document.NewPage();

iTextSharp: What is Lost When Copying PDF Content From Another PDF?

I am currently evaluating iTextSharp for potential use in a project. The code that I have written to achieve my goal uses PdfCopy.GetImportedPage to copy all of the pages from an existing PDF. What I want to know is: what will be lost from a PDF and/or page when duplicating PDF content like this? For example, one thing that I already noticed is that I need to manually add any bookmarks and named destinations to my new PDF.
Here's some rough sample code:
using (PdfReader reader = new PdfReader(inputFilename))
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (Document document = new Document())
        {
            using (PdfCopy copy = new PdfCopy(document, ms))
            {
                document.Open();
                int n;
                n = reader.NumberOfPages;
                for (int page = 0; page < n; )
                {
                    copy.AddPage(copy.GetImportedPage(reader, ++page));
                }
                // add content and make further modifications here
            }
        }
        // write the content to disk
    }
}
Basically anything that's document-level instead of page-level will get lost and both Bookmarks and Destinations are document-level. Pull up the PDF spec and look at section 3.6.1 for other entries in the document catalog including Threads, Open and Additional Actions and Meta Data.
You might already have seen these but here are some samples (in Java) of how to merge Named Destinations and how to merge Bookmarks.

Need alternative to local or remote goto/destinations with merged documents

BACKGROUND
I have a java program that analyzes data and creates a pdf report using itext 5.
I recently had to add a summary of major problems at the start of the document so a user would not have to read over a hundred pages to find problems. Problems are only discovered when serially looking through the data.
I solved the problem by creating 3 pdf documents and then merging them, a start/title pdf, the summary of problems pdf, and the body or analysis pdf. (Basically splitting the original document at the point I wanted to insert the summary)
I use PdfReader and PdfCopy to combine the documents. I am able to keep the chapter bookmarks OK.
THE PROBLEM
As I encounter a significant problem I add it to the 'summary' document. I want to add a link in the summary to point to the problem in the body.
I tried to use Chunk.setLocalDestination and setLocalGoto but realized why that did not work, so I tried using setLocalDestination and setRemoteGoto (with and without 'file://'), but that did not work either. (Also, I used the final pdf document name in the RemoteGoto, not the temporary pdf document name.)
I do not want to use bookmarks because that seems wrong and would not look right.
I am hoping someone could suggest an alternate method or make a suggestion.
To recap, in my current code I create a Chunk with setLocalDestination, and that chunk goes into the 'body' document. At the same time I create a setRemoteGoto link, which is put into the summary document. I was hoping the link would work once they were combined, but when the link is clicked, you go to the first page of the combined document.
Thanks.....
PS I have both iText in action books
CLARIFICATION 3/5/2014
What I was calling 'bookmarks' are really Chapter class entities that are inserted into sections of the 3 documents as they are being created.
After saving the 3 documents, PdfReader is used to open each and PdfCopy is used to put them into a new, final document.
I get the data from the Chapters, which creates the 'bookmarks' on the left side of the Pdf reader used by the user, e.g. Acrobat Reader.
int thisPdfPages = reader.getNumberOfPages();
reader.consolidateNamedDestinations();
java.util.List<HashMap<String, Object>> bookmarks = SimpleBookmark.getBookmark(reader);
if (bookmarks != null) {
    if (pageOffset != 0) {
        if (debug3) auditLogger.log("Shifting pages by " + pageOffset );
        SimpleBookmark.shiftPageNumbers(bookmarks, pageOffset, null);
    }
    masterBookmarks.addAll(bookmarks);
}
for (int i = 0; i < thisPdfPages;) {
    page = copy.getImportedPage(reader, ++i);
    stamp = copy.createPageStamp(page);
    // add page numbers
    ColumnText.showTextAligned(stamp.getUnderContent(), Element.ALIGN_CENTER, new Phrase(String.format("page %d of %d", start + i, totalPages)), 297.5f, 28, 0);
    stamp.alterContents();
    copy.addPage(page);
}
PRAcroForm form = reader.getAcroForm();
if (form != null) {
    copy.copyAcroForm(reader);
}
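For readers following the page-shifting step, here is a small, library-free Java sketch of the arithmetic behind a page offset when several documents are concatenated. The class name PageOffsetSketch, both method names, and the page counts (1, 3 and 120) are made up for illustration and are not taken from the report code.

```java
import java.util.Arrays;

public class PageOffsetSketch {

    /**
     * offsets[k] is the number of pages that precede document k in the
     * merged file, i.e. the amount by which its page numbers must shift.
     */
    public static int[] pageOffsets(int[] pageCounts) {
        int[] offsets = new int[pageCounts.length];
        int running = 0;
        for (int k = 0; k < pageCounts.length; k++) {
            offsets[k] = running;
            running += pageCounts[k];
        }
        return offsets;
    }

    /** Maps a 1-based page of document docIndex to its 1-based page in the merged file. */
    public static int mergedPage(int[] offsets, int docIndex, int localPage) {
        return offsets[docIndex] + localPage;
    }

    public static void main(String[] args) {
        // Hypothetical page counts: title, summary, body.
        int[] offsets = pageOffsets(new int[] {1, 3, 120});
        System.out.println(Arrays.toString(offsets));   // prints "[0, 1, 4]"
        System.out.println(mergedPage(offsets, 2, 10)); // prints "14"
    }
}
```

Anything that refers to a page by number, such as a bookmark, has to be shifted by the offset of the document it came from.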
When analyzing the data I have 2 documents open, a base document which contains all the details and a summary document which contains notable events over some thresholds.
//NOTE section is part of the 'body' document
//NOTE summaryPhrase is a part of the 'summary' document
String linkName = "summaryPf_" + networkid ;
//create Link target
section.add(new Chunk("CHANGE TO EMPTY STRING WHEN WORKING").setLocalDestination( linkName ));
//create Link
Chunk linkChunk = new Chunk( "[Link] " );
Font linkFont = new Font( regularFont );
linkFont.setColor(BaseColor.BLUE);
linkFont.setStyle( Font.UNDERLINE );
linkChunk.setFont( linkFont );
boolean useLocal = true;
// both local and remote goto's fail
if (useLocal) {
    linkChunk.setLocalGoto( linkName);
} else {
    // all permutations of setting filename fail,
    // but it does bring up a permissions dialog when the link is clicked.
    //String remotePdfName = "file://./" + pdfReportName ;
    //String remotePdfName = "file://" + pdfReportName ;
    //String remotePdfName = "file:" + pdfReportName ;
    String remotePdfName = pdfReportName ;
    linkChunk.setRemoteGoto( remotePdfName, linkName);
}
// add link to summary document
summaryPhrase.add( linkChunk );
summaryPhrase.add( String.format("There were %d devices with ping failures", summaryCount));
summaryPhrase.add( Chunk.NEWLINE );
}
If I use setLocalGoto, clicking the link in the final document takes you to the first page.
If I use setRemoteGoto, a dialog asks for permission to go to a document, but the document fails to open. I tried several permutations of the filename.

Some pdf file watermark does not show using iText

Our company uses iText to stamp watermark text (not an image) on some PDF forms. I noticed that about 95% of the forms show the watermark correctly and about 5% do not. To test, I copied two original PDF files, one that was marked correctly and one that was not, and ran them through a small program, with the same result: one got marked, the other did not. I then tried the latest version of the iText jar file (version 5.0.6), with the same outcome. I checked the PDF file properties, security settings, etc., but nothing gives any hint. The result file does change size and is marked "changed by iText version...." after the program has run.
Here is the sample watermark code (using iText jar version 2.1.7). Note that the topText, mainText and bottomText parameters are passed in to show 3 lines of watermark text in the PDF.
Any help appreciated !!
public class WatermarkGenerator {
    private static int TEXT_TILT_ANGLE = 25;
    private static Color MEDIUM_GRAY = new Color(160, 160, 160);
    private static int SUPPORT_FONT_SIZE = 42;
    private static int PRIMARY_FONT_SIZE = 54;

    public static void addWaterMark(InputStream pdfInputStream,
            OutputStream outputStream, String topText,
            String mainText, String bottomText) throws Exception {
        PdfReader reader = new PdfReader(pdfInputStream);
        int numPages = reader.getNumberOfPages();
        // Create a stamper that will copy the document to the output
        // stream.
        PdfStamper stamp = new PdfStamper(reader, outputStream);
        int page=1;
        BaseFont baseFont =
            BaseFont.createFont(BaseFont.HELVETICA_BOLDOBLIQUE,
                BaseFont.WINANSI, BaseFont.EMBEDDED);
        float width;
        float height;
        while (page <= numPages) {
            PdfContentByte cb = stamp.getOverContent(page);
            height = reader.getPageSizeWithRotation(page).getHeight() / 2;
            width = reader.getPageSizeWithRotation(page).getWidth() / 2;
            cb = stamp.getUnderContent(page);
            cb.saveState();
            cb.setColorFill(MEDIUM_GRAY);
            // Top Text
            cb.beginText();
            cb.setFontAndSize(baseFont, SUPPORT_FONT_SIZE);
            cb.showTextAligned(Element.ALIGN_CENTER, topText, width,
                height+PRIMARY_FONT_SIZE+16, TEXT_TILT_ANGLE);
            cb.endText();
            // Primary Text
            cb.beginText();
            cb.setFontAndSize(baseFont, PRIMARY_FONT_SIZE);
            cb.showTextAligned(Element.ALIGN_CENTER, mainText, width,
                height, TEXT_TILT_ANGLE);
            cb.endText();
            // Bottom Text
            cb.beginText();
            cb.setFontAndSize(baseFont, SUPPORT_FONT_SIZE);
            cb.showTextAligned(Element.ALIGN_CENTER, bottomText, width,
                height-PRIMARY_FONT_SIZE-6, TEXT_TILT_ANGLE);
            cb.endText();
            cb.restoreState();
            page++;
        }
        stamp.close();
    }
}
We solved the problem by changing the save file option in Adobe LiveCycle: File->Save->properties->Save as, then look at Save as type. The default is Acrobat 7.0.5 Dynamic PDF Form File; we changed it to 7.0.5 Static PDF Form File (actually any static one will work). Files saved as static do not have the disappearing watermark problem. Thanks, Mark, for pointing in the right direction.
You're using the underContent rather than the overContent. Don't do that. It leaves you at the mercy of big, white-filled rectangles that some folks insist on drawing first thing. It's a holdover from less-than-good PostScript interpreters and hasn't been necessary for Many Years.
Okay, having viewed your PDF, I can see the problem is that this is an XFA-based form (from LiveCycle Designer). Acrobat can (and often does) rebuild the entire file based on the XFA (a type of xml) it contains. That's how your changes are lost. When Acrobat rebuilds the PDF from the XFA, all the existing PDF information is pitched, including your watermark.
The only way to get this to work would be to define the watermark as part of the XFA file contained in the PDF.
Detecting these forms isn't all that hard:
PdfReader reader = new PdfReader(...);
AcroFields acFields = reader.getAcroFields();
XfaForm xfaForm = acFields.getXfaForm();
if (xfaForm != null && xfaForm.isXfaPresent()) {
    // Ohs nose.
    throw new ItsATrapException("We can't repel XML of that magnitude!");
}
Modifying them on the other hand could be Quite Challenging, but here's the specs.
Once you've figured out what needs to be changed, it's a simple matter of XML manipulation... but that "figure it out" part could be interesting.
Good hunting.