Insert PDF in PDF (NOT merging files)

I'd like to insert a scaled PDF page into a page of another PDF. I'd like to use iTextSharp for this.
I have a vector drawing which can be exported as a single page PDF file. I would like to add this file to a page of another PDF document, just like I would add an image to a PDF document.
Is this possible?
The purpose of this is to retain the ability to zoom in without losing quality.
It is very hard to reproduce the vector drawing using PDF vectors because it is an extremely complex drawing.
Exporting the vector drawing as a high resolution image is not an option since I have to use a lot of them in a single PDF document. The final PDF would be very large and writing it would be too slow.

This is relatively easy to do although there's a couple of ways to go about it. If you're creating a new document that has the other documents inside of it and nothing else then the easiest thing to use is probably the PdfWriter.GetImportedPage(PdfReader, Int). This will give you a PdfImportedPage (which inherits from PdfTemplate). Once you have that you can add it to your new document by using PdfWriter.DirectContent.AddTemplate(PdfImportedPage, Matrix).
There's a couple of overloads to AddTemplate() but the easiest one (at least for me) is the one that takes a System.Drawing.Drawing2D.Matrix. If you use this you can easily scale and translate (change x,y) without having to think in "matrix" terms.
Below is sample code that shows this off. It targets iTextSharp 5.4.0 although it should work pretty much the same with 4.1.6 if you remove the using statements. It first creates a sample PDF with 12 pages with random background colors. Then it creates a second document and adds each page from the first PDF scaled by 50% so that 4 old pages fit onto 1 new page. See the code comments for further details. This code assumes that all pages are the same size, you might need to perform further calculations if your situation differs.
//Test files that we'll be creating
var file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File1.pdf");
var file2 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File2.pdf");
//For test purposes we'll fill the pages with a random background color
var R = new Random();
//Standard PDF creation, nothing special here
using (var fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
//Create 12 pages with text on each one
for (int i = 1; i <= 12; i++) {
doc.NewPage();
//For test purposes fill the page with a random background color
var cb = writer.DirectContentUnder;
cb.SaveState();
cb.SetColorFill(new BaseColor(R.Next(0, 256), R.Next(0, 256), R.Next(0, 256)));
cb.Rectangle(0, 0, doc.PageSize.Width, doc.PageSize.Height);
cb.Fill();
cb.RestoreState();
//Add some text to the page
doc.Add(new Paragraph("This is page " + i.ToString()));
}
doc.Close();
}
}
}
//Create our combined file
using (var fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
//Bind a reader to the file that we created above
using (var reader = new PdfReader(file1)) {
doc.Open();
//Get the number of pages in the original file
int pageCount = reader.NumberOfPages;
//Loop through each page
for (int i = 0; i < pageCount; i++) {
//We're putting four original pages on one new page so add a new page every four pages
if (i % 4 == 0) {
doc.NewPage();
}
//Get a page from the reader (remember that PdfReader pages are one-based)
var imp = writer.GetImportedPage(reader, (i + 1));
//A transform matrix is an easier way of dealing with changing dimensions and coordinates of a rectangle
var tm = new System.Drawing.Drawing2D.Matrix();
//Scale the image by half
tm.Scale(0.5f, 0.5f);
//PDF coordinates put 0,0 in the bottom left corner.
if (i % 4 == 0) {
tm.Translate(0, doc.PageSize.Height); //The first item on the page needs to be moved up "one square"
} else if (i % 4 == 1) {
tm.Translate(doc.PageSize.Width, doc.PageSize.Height); //The second needs to be moved up and over
} else if (i % 4 == 2) {
//Nothing needs to be done for the third
} else if (i % 4 == 3) {
tm.Translate(doc.PageSize.Width, 0); //The fourth needs to be moved over
}
//Add our imported page using the matrix that we set above
writer.DirectContent.AddTemplate(imp,tm);
}
doc.Close();
}
}
}
}
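
To see where each quadrant ends up without running iTextSharp, here is a minimal Java sketch of the same matrix arithmetic. It uses java.awt.geom.AffineTransform, whose scale/translate calls compose in the same order as the default (prepend) order of System.Drawing.Drawing2D.Matrix. The class FourUpLayout and its methods are made up for illustration, and the 595 x 842 sheet assumes the default A4 page size.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical helper (not part of iText): mirrors the Scale/Translate calls in the C# code above.
final class FourUpLayout {

    /** Builds the transform for the page at zero-based index i on a sheet of the given size. */
    static AffineTransform transformFor(int i, double pageWidth, double pageHeight) {
        AffineTransform tm = new AffineTransform();
        tm.scale(0.5, 0.5); // shrink to half size
        switch (i % 4) {
            case 0:  tm.translate(0, pageHeight);         break; // top left
            case 1:  tm.translate(pageWidth, pageHeight); break; // top right
            case 2:                                       break; // bottom left, no move
            default: tm.translate(pageWidth, 0);          break; // bottom right
        }
        return tm;
    }

    /** Where the imported page's lower-left corner (0,0) lands on the sheet. */
    static Point2D originFor(int i, double pageWidth, double pageHeight) {
        return transformFor(i, pageWidth, pageHeight).transform(new Point2D.Double(0, 0), null);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            Point2D p = originFor(i, 595, 842);
            System.out.println("slot " + i + " -> " + p.getX() + ", " + p.getY());
        }
    }
}
```

Because the translation is applied before the scale, a shift of one full page height ends up as half a page height on the sheet, which is why the code above can translate by doc.PageSize.Height rather than half of it.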

In addition: while I was trying to add a rotated PDF to a rotated PDF, I ran into some rotation problems. It's kind of confusing, but you should check the "PdfImportedPage.Rotation" of the page that is going to be added to the PDF.
PdfImportedPage page;//page = writer.GetImportedPage(PdfReader reader, int pageNum);
PdfContentByte pcb;//pcb = PdfWriter.DirectContentUnder;
//create matrix to use for rotating imported page
Matrix matrix = new Matrix(a, b, c, d, e, f);
matrix.Rotate(-(page.Rotation));
if (page.Rotation != 0)
pcb.AddTemplate(page, matrix, true);
else
pcb.AddTemplate(page, a, b, c, d, e, f, true);
The code looks silly, but I want to draw your attention to "matrix.Rotate(negative rotation of imported page)".
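
For rotations that are multiples of 90 degrees, the six coefficients can also be written down directly. The following is only a sketch in plain Java, under the assumption that the goal is to show the imported page the way a viewer would display it with its /Rotate value (clockwise). RotationMatrix is a hypothetical helper, not an iText class, and width/height mean the unrotated page dimensions. I have not checked that the sign conventions match the Matrix.Rotate call above.

```java
// Hypothetical helper (not an iText API): coefficients a, b, c, d, e, f for
// x' = a*x + c*y + e and y' = b*x + d*y + f.
final class RotationMatrix {

    /** rotation is the page's /Rotate value; width and height are the unrotated page size. */
    static double[] coefficientsFor(int rotation, double width, double height) {
        switch (((rotation % 360) + 360) % 360) {
            case 90:  return new double[] {0, -1, 1, 0, 0, width};
            case 180: return new double[] {-1, 0, 0, -1, width, height};
            case 270: return new double[] {0, 1, -1, 0, height, 0};
            default:  return new double[] {1, 0, 0, 1, 0, 0}; // no rotation
        }
    }

    /** Applies the six coefficients to a point, returning {x', y'}. */
    static double[] apply(double[] m, double x, double y) {
        return new double[] {m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]};
    }

    public static void main(String[] args) {
        double[] m = coefficientsFor(90, 612, 792);
        double[] corner = apply(m, 612, 792);
        System.out.println(corner[0] + ", " + corner[1]);
    }
}
```

The translation part (e, f) is what keeps the rotated content inside the positive quadrant; without it the page rotates around 0,0 and lands off the visible area.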

Related

Remove object in PDF with iTextSharp and save

This is a case of OCR gone wrong. I need to remove the hidden text from a PDF and I'm having a hard time figuring out how to do it.
The hidden text resides in an area always named /QuickPDFsomething, which is under an /XObject dictionary that resides in the page's /Resources dictionary.
I have tried these two things and neither has worked so I'm clearly doing something wrong.
Option 1 - Kill obj - The PDF won't open in Acrobat and states, 'An error exists on this page. Acrobat may not display the page correctly' but it looks ok. Pitstop pukes with 'Critical parser failure: XObject resource missing'.
PdfReader.KillIndirect(obj);
oPdfFile.GetPdfReader().RemoveUnusedObjects();
var stamper = new PdfStamper(oPdfFile.GetPdfReader(), new FileStream(@"C:\temp.pdf", FileMode.Create));
stamper.Close();
Option 2 - CleanupProcessor - Throws an exception about 'A Graphics object cannot be created from an image that has an indexed pixel format'.
var stamper = new PdfStamper(oPdfFile.GetPdfReader(), new FileStream(@"C:\temp.pdf", FileMode.Create));
var cleanupLocations = new List<PdfCleanUpLocation>();
var pageRect = oPdfFile.GetPdfReader().GetCropBox(1);
cleanupLocations.Add(new PdfCleanUpLocation(1, pageRect));
PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanupLocations, stamper);
cleaner.CleanUp();
stamper.Close();
I'd like to remove the /QuickPDF object (41 0 R, in this image) as well as remove it from the content stream that calls it with /QuickPDF Do.
Unfortunately I cannot provide the PDF.
Any tips on how to do this?
I hate to answer my own question but I wanted to share the solution I found in case others need it.
After playing around with this for a couple of days I figured out that Option 1 above would indeed remove the object, and that the exception I was getting from PitStop was because the content stream still had a reference to the /QuickPDF XObject.
So I tried following @mkl's solution here, Removing Watermark from PDF iTextSharp, but it kept putting unwanted data in the content stream that rotated my PDF.
So then I found @Chris's solution here, Removing Watermark from a PDF using iTextSharp, and it seems to work, although I'm not sure how stable this solution will be.
This is my solution for removing /QuickPDF from the content stream:
int numPages = oPdfFile.GetPdfReader().NumberOfPages;
int pgNumber = 1;
PdfDictionary page = oPdfFile.GetPdfReader().GetPageN(pgNumber);
PdfArray contentarray = page.GetAsArray(PdfName.CONTENTS);
PRStream stream;
string content;
if (contentarray != null)
{
//Loop through content
for (int j = 0; j < contentarray.Size; j++)
{
stream = (PRStream)contentarray.GetAsStream(j);
content = Encoding.ASCII.GetString(PdfReader.GetStreamBytes(stream));
string[] tokens = content.Split('\n');
for (int i = 0; i< tokens.Length; i++)
{
if (tokens[i].Contains("/QuickPDF"))
{
tokens[i] = string.Empty;
}
}
string outstr = string.Join("\n", tokens);
byte[] outbytes = Encoding.ASCII.GetBytes(outstr);
stream.SetData(outbytes);
}
}
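
One thing to watch with the line-based approach above: it blanks the whole line, so any other operators sharing that line go with it. Below is a rough Java sketch of a narrower variant that removes only the "/Name Do" operator itself. XObjectCallStripper is a made-up name, and the sketch assumes the content stream has already been decoded to text. It is still naive, since it does not parse strings or inline images that might happen to contain the same characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not an iText API): strips "/<prefix>... Do" invocations from decoded content.
final class XObjectCallStripper {

    /** Removes every "/namePrefix<rest of name> Do" operator, leaving everything else untouched. */
    static String stripDoCalls(String content, String namePrefix) {
        // A PDF name ends at whitespace or a delimiter; "Do" must follow as its own token.
        Pattern call = Pattern.compile(
                "/" + Pattern.quote(namePrefix) + "[^\\s/\\[\\]<>()]*\\s+Do\\b");
        return call.matcher(content).replaceAll("");
    }

    public static void main(String[] args) {
        String content = "q\n1 0 0 1 0 0 cm\n/QuickPDFIm1 Do\nQ\n/Im0 Do\n";
        System.out.println(stripDoCalls(content, "QuickPDF"));
    }
}
```

Other XObjects such as /Im0 in the example are left alone because the prefix has to match right after the slash.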

Adding an imported PDF to a table cell in iTextSharp

I am creating a new PDF that will contain a compilation of other documents.
These other documents can be Word/Excel documents, images or PDFs.
I am hoping to add all of this content to cells in a table, which is added to the document. This gives me the benefit of pages being added automatically and of positioning elements in a cell rather than on a page, and it makes it easier to keep content in the same order as I supply it (such as img, doc, pdf, img, pdf etc.).
Adding images to the table is simple enough.
I am converting the Word/Excel docs to PDF image streams. I'm also reading in the existing PDFs as a stream.
Adding these to a new PDF is simple enough, by way of adding a template to the PdfContentByte.
What I am trying to do though is add these PDFs to cells in a table, which are then added to the doc.
Is this possible?
Please download chapter 6 of my book. It contains two variations on what you are trying to do:
ImportingPages1, with as result time_table_imported1.pdf
ImportingPages2, with as result time_table_imported2.pdf
This is a code snippet:
// step 1
Document document = new Document();
// step 2
PdfWriter writer
= PdfWriter.getInstance(document, new FileOutputStream(RESULT));
// step 3
document.open();
// step 4
PdfReader reader = new PdfReader(MovieTemplates.RESULT);
int n = reader.getNumberOfPages();
PdfImportedPage page;
PdfPTable table = new PdfPTable(2);
for (int i = 1; i <= n; i++) {
page = writer.getImportedPage(reader, i);
table.getDefaultCell().setRotation(-page.getRotation());
table.addCell(Image.getInstance(page));
}
document.add(table);
// step 5
document.close();
reader.close();
The pages are imported as PdfImportedPage objects, and then wrapped inside an Image so that we can add them to a PdfPTable.

iTextSharp rotated PDF page reverts orientation when file is rasterized at print house

Using iTextSharp I am creating a PDF composed of a collection of existing PDFs, some of the included PDFs are landscape orientation and need to be rotated. So, I do the following:
private static void AdjustRotationIfNeeded(PdfImportedPage pdfImportedPage, PdfReader reader, int documentPage)
{
float width = pdfImportedPage.Width;
float height = pdfImportedPage.Height;
if (pdfImportedPage.Rotation != 0)
{
PdfDictionary pageDict = reader.GetPageN(documentPage);
pageDict.Put(PdfName.ROTATE, new PdfNumber(0));
}
if (width > height)
{
PdfDictionary pageDict = reader.GetPageN(documentPage);
pageDict.Put(PdfName.ROTATE, new PdfNumber(270));
}
}
This works great. The included PDFs rotated to portrait orientation if needed. The PDF prints correctly on my local printer.
This file is sent to a fulfillment house, and unfortunately, the landscape included files do not print properly when going through their printer and rasterization process. They use Kodak (Creo) NexRip 11.01 or Kodak (Creo) Prinergy 6.1. machines. The fulfillment house's suggestion is to: "generate a new PDF file after we rotate pages or make any changes to a PDF. It is as easy as exporting out to a PostScript and distilling back to a PDF."
I know iTextSharp doesn't support PostScript. Is there another way iTextSharp can rotate included PDFs to hold the orientation when rasterized?
First let me assure you that changing the rotation in the page dictionary is the correct procedure to achieve what you want. As far as I can see your code, there's nothing wrong with it. You are doing the right thing.
Unfortunately, you are faced with a third party product over which you have no control that is not doing the right thing. How to solve this?
I have written an example called IncorrectExample. I have named it that way because I don't want it to be used in a context that is different from yours. You can safely ignore all the warnings I added: they are not meant for you. This example is very specific to your problem.
Please try the following code:
public void manipulatePdf(String src, String dest)
throws IOException, DocumentException {
// Creating a reader
PdfReader reader = new PdfReader(src);
// step 1
Rectangle pagesize = getPageSize(reader, 1);
Document document = new Document(pagesize);
// step 2
PdfWriter writer
= PdfWriter.getInstance(document, new FileOutputStream(dest));
// step 3
document.open();
// step 4
PdfContentByte cb = writer.getDirectContent();
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
pagesize = getPageSize(reader, i);
document.setPageSize(pagesize);
document.newPage();
PdfImportedPage page = writer.getImportedPage(reader, i);
if (isPortrait(reader, i)) {
cb.addTemplate(page, 0, 0);
}
else {
cb.addTemplate(page, 0, 1, -1, 0, pagesize.getWidth(), 0);
}
}
// step 5
document.close();
reader.close();
}
public Rectangle getPageSize(PdfReader reader, int pagenumber) {
Rectangle pagesize = reader.getPageSizeWithRotation(pagenumber);
return new Rectangle(
Math.min(pagesize.getWidth(), pagesize.getHeight()),
Math.max(pagesize.getWidth(), pagesize.getHeight()));
}
public boolean isPortrait(PdfReader reader, int pagenumber) {
Rectangle pagesize = reader.getPageSize(pagenumber);
return pagesize.getHeight() > pagesize.getWidth();
}
I have taken the pages.pdf file as an example. This file is special in the sense that it has two pages in landscape that are created in a different way:
one page is a page of which the width is smaller than the height (sounds like it's a page in portrait), but as there's a /Rotate value of 90 added to the page dictionary, it is shown in landscape.
the other page isn't rotated, but it has a height that is smaller than the width.
In my example, I am using the classes Document and PdfWriter to create a copy of the original document. This is wrong in general because it throws away all interaction. I should use PdfStamper or PdfCopy instead, but it is right in your specific case because you don't need the interactivity: the final purpose of the PDF is to be printed.
With Document, I create new pages using a new Rectangle that uses the lowest value of the dimensions of the existing page as the width and the highest value as the height. This way, the page will always be in portrait. Note that I use the method getPageSizeWithRotation() to make sure I get the correct width and height, taking into account any possible rotation.
I then add a PdfImportedPage to the direct content of the writer. I use the isPortrait() method to find out if I need to rotate the page or not. Observe that the isPortrait() method looks at the page size without taking into account the rotation. If we did take into account the rotation, we'd rotate pages that don't need rotating.
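
To double-check the numbers passed to addTemplate(), here is a small stand-alone Java sketch (no iText involved) that applies the same six coefficients to the corners of a landscape page and shows that they land inside the portrait page. PortraitPlacement is a hypothetical class written for this illustration, and the 792 x 612 page is just an example size.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Hypothetical helper (not part of iText): mirrors getPageSize()/isPortrait() and the matrix above.
final class PortraitPlacement {

    /** Smallest dimension becomes the width, largest the height. */
    static double[] portraitSize(double width, double height) {
        return new double[] {Math.min(width, height), Math.max(width, height)};
    }

    static boolean isPortrait(double width, double height) {
        return height > width;
    }

    /** Identity for portrait pages, otherwise the matrix 0, 1, -1, 0, portraitWidth, 0. */
    static AffineTransform placement(double width, double height) {
        if (isPortrait(width, height)) {
            return new AffineTransform();
        }
        double portraitWidth = portraitSize(width, height)[0];
        // AffineTransform takes its arguments in the same a, b, c, d, e, f order.
        return new AffineTransform(0, 1, -1, 0, portraitWidth, 0);
    }

    public static void main(String[] args) {
        AffineTransform tm = placement(792, 612);
        double[][] corners = {{0, 0}, {792, 0}, {0, 612}, {792, 612}};
        for (double[] c : corners) {
            Point2D p = tm.transform(new Point2D.Double(c[0], c[1]), null);
            System.out.println(c[0] + "," + c[1] + " -> " + p.getX() + "," + p.getY());
        }
    }
}
```

All four corners end up within 0..612 horizontally and 0..792 vertically, so the rotated content exactly fills the new portrait page.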
The resulting PDF can be found here: pages_changed.pdf
As you can see, some information got lost: there was an annotation on the final page: it's gone. There were specific viewer preferences defined for the original document: they're gone. But that shouldn't matter in your specific case, because all that matters for you is that the pages are printed correctly.

Cannot stamp certain PDFs

PDF stamping works for nearly every document I have tried. However, a client scanned some pages and his computer generated a PDF document that is resistant to stamping. The embedded image files are in JBIG2 format, but I am not sure if that is important. I have debugged the PDF with Apache's pdfbox, and I can see the text is embedded. It just doesn't show up.
Here is the PDF that won't stamp: http://demo.clearvillageinc.com/plans.pdf
And my code:
static void Main(string[] args) {
string stamp = "<div style=\"color:#F00;\">Reviewed for Code Compliance</div>";
string fileName = @"C:\temp\source.pdf";
string outputFileName = @"C:\temp\source-output.pdf";
// Open a destination stream.
using (var destStream = new System.IO.MemoryStream()) {
using (var sourceReader = new PdfReader(fileName)) {
// Convert the HTML into a stamp.
using (var stampData = FromHtml(stamp)) {
using (var stampReader = new PdfReader(stampData)) {
using (var stamper = new PdfStamper(sourceReader, destStream)) {
stamper.Writer.CloseStream = false;
// Add the stamp stream to the source document.
var stampPage = stamper.GetImportedPage(stampReader, 1);
// Process all of the pages in the source document.
for (int i = 1; i <= sourceReader.NumberOfPages; i++) {
var canvas = stamper.GetOverContent(i);
canvas.AddTemplate(stampPage, 0, -50);
}
}
}
}
}
// Finished. Save the file.
using (var fs = new System.IO.FileStream(outputFileName, FileMode.Create)) {
destStream.Position = 0;
destStream.CopyTo(fs);
}
}
}
public static System.IO.Stream FromHtml(string html) {
var ms = new System.IO.MemoryStream();
// Convert html to pdf.
using (var document = new iTextSharp.text.Document()) {
var writer = iTextSharp.text.pdf.PdfWriter.GetInstance(document, ms);
writer.CloseStream = false;
document.Open();
using (var sr = new System.IO.StringReader(html)) {
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, sr);
}
}
ms.Position = 0; // Reset for reading.
return ms;
}
One part of a page definition is the "MediaBox" which controls the page's size. This property takes two locations that specify the coordinates of two opposite corners of a rectangle. Although not required, most PDFs specify the lower left corner first followed by the upper right corner. Also, most PDFs use 0,0 for the lower left and then whatever the page's width and height are for the top corner. So an 8.5x11 inch PDF would be 0,0 and 612,792 (8.5 * 72 = 612 and 11 * 72 = 792) and this would be written as 0,0,612,792.
Your scanned PDF, however, has for whatever reason decided to treat 0,7072 as the lower left corner and 614,7864 as the top right corner. That still gives us (almost) an 8.5x11 page size but if you try to draw something at 0,0 it will be 7,072 units below the actual page. You can see this in Acrobat Pro by zooming out very far (1% for me), picking Tools, Edit Object and then doing a Select All. You should see something way far down selected, too.
To get around this, you need to respect the page's boundaries.
for (int i = 1; i <= sourceReader.NumberOfPages; i++) {
//Get the page to be stamped
var pageToBeStamped = sourceReader.GetPageSize(i);
var canvas = stamper.GetOverContent(i);
//Offset the stamp by 50 units below the destination page's bottom edge
canvas.AddTemplate(stampPage, pageToBeStamped.Left, pageToBeStamped.Bottom - 50);
}
The code above gets the rectangle of the page being stamped and uses its bottom edge offset by 50 units (the offset from your original code). Also, although not a problem in your case, we use that page's actual left edge instead of just zero.
This code can still break, however. The math in the first paragraph uses 72 which is the default for PDFs but this can be changed. Most people don't change it but most people also don't change 0,0. Currently your -50 assumes the 72 which gives the visual perception of moving the stamp about seven-tenths of an inch from the top edge. If you run into this scenario you'll want to look into retrieving the user unit.
Also, as I said in the first paragraph, most applications use lower left upper right but this isn't a hard rule. Someone could specify upper right and bottom left or even top left and bottom right. This is a hard one to take into account but it is something that you should at least be aware of.
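
As a rough illustration of both caveats, here is a plain Java sketch that normalizes a four-number box regardless of which corners were given, and converts inches to page units using the user unit. StampOrigin and its method names are hypothetical and nothing here calls iText. The conversion assumes the user unit is expressed in multiples of 1/72 inch, with 1 as the default.

```java
// Hypothetical helper (not an iText API): works on the raw four numbers of a page box.
final class StampOrigin {

    /** Returns {left, bottom, right, top} whatever corner order the box was written in. */
    static double[] normalize(double[] box) {
        return new double[] {
            Math.min(box[0], box[2]), Math.min(box[1], box[3]),
            Math.max(box[0], box[2]), Math.max(box[1], box[3])
        };
    }

    /** Position for the stamp: the page's real lower-left corner plus an offset in page units. */
    static double[] originFor(double[] box, double dx, double dy) {
        double[] n = normalize(box);
        return new double[] {n[0] + dx, n[1] + dy};
    }

    /** One page unit is userUnit/72 inch, so an inch is 72/userUnit page units. */
    static double inchesToUnits(double inches, double userUnit) {
        return inches * 72.0 / userUnit;
    }

    public static void main(String[] args) {
        double[] scanned = {0, 7072, 614, 7864}; // the box from the scanned PDF above
        double[] origin = originFor(scanned, 0, -50);
        System.out.println(origin[0] + ", " + origin[1]);
    }
}
```

With the scanned file's box this places the stamp at 0, 7022 instead of 0, -50, and the same result comes out if the corners are listed in the opposite order.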

ITextSharp adding text. Some text not showing up

I am adding text to an already created pdf document using this method.
ITextSharp insert text to an existing pdf
Basically it uses the PdfContentByte and then adds the content template to the page.
I am finding that in some areas of the file, the text doesn't show up.
It seems that the text I am adding is showing up behind the content that is already on the page? I flattened the pdf document down to it just being images but I am still having the same issue happen with the flattened file.
Has anyone had any issues adding text being hidden using Itextsharp?
I also tried using DirectContentUnder as was suggested in this link, to no avail.
iTextSharp hides text when write
Here is the code I am using...With this I am trying to basically overlay graph paper on top of the PDF. In this example, there is a box in the upper left corner of every page that doesn't get populated. There is an image in the original pdf in this spot. And on the 4th and 5th pages, there are boxes that don't get populated, but they don't seem to be images.
PdfReader reader = new PdfReader(oldFile);
iTextSharp.text.Rectangle size = reader.GetPageSizeWithRotation(1);
Document document = new Document(size);
// open the writer
FileStream fs = new FileStream(newFile, FileMode.Create, FileAccess.Write);
PdfWriter writer = PdfWriter.GetInstance(document, fs);
document.Open();
// the pdf content
PdfContentByte cb = writer.DirectContent;
for (int i = 0; i < reader.NumberOfPages; i++)
{
document.NewPage();
// select the font properties
BaseFont bf = BaseFont.CreateFont(BaseFont.HELVETICA_BOLD, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
cb.SetFontAndSize(bf, 4);
cb.SetColorStroke(BaseColor.GREEN);
cb.SetLineWidth(1f);
for (int j = 10; j < 600; j += 10)
{
WriteToDoc(ref cb, j.ToString(), j, 10);//Write the line number
WriteToDoc(ref cb, j.ToString(), j, 780);//Write the line number
if (j % 20 == 0)
{
cb.MoveTo(j, 20);
cb.LineTo(j, 760);
cb.Stroke();
}
}
for (int j = 10; j < 800; j += 10)
{
WriteToDoc(ref cb, j.ToString(), 5, j);//Write the line number
WriteToDoc(ref cb, j.ToString(), 590, j);//Write the line number
if (j % 20 == 0)
{
cb.MoveTo(15, j);
cb.LineTo(575, j);
cb.Stroke();
}
}
// create the new page and add it to the pdf
PdfImportedPage page = writer.GetImportedPage(reader, i + 1);
cb.AddTemplate(page, 0, 0);
}
// close the streams and voilà the file should be changed :)
document.Close();
fs.Close();
writer.Close();
reader.Close();
Thanks for any of the help you can provide...I really appreciate it!
-Greg
First of all: If you are trying to basically overlay graph paper on top of the PDF, why do you first draw the graph paper and stamp the original page onto it? You essentially are underlaying graph paper, not overlaying it.
Depending on the content of the page, graph paper drawn this way may easily get covered. E.g. if there is a filled rectangle in the page content, the result is a box in the upper left corner of every page that doesn't get populated.
Thus, simply first add the old page content, then add overlay changes.
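
The effect of drawing order can be shown with a toy model. This is only a sketch in plain Java with a 4x4 character grid standing in for the page. PaintOrder is a made-up class, 'G' marks a grid line and 'B' an opaque box from the original page content. Nothing here is iText code.

```java
// Toy painter's model: whatever is drawn later overwrites what was drawn earlier.
final class PaintOrder {

    /** Paints both layers in the requested order and returns what is visible at (x, y). */
    static char visibleAt(boolean gridFirst, int x, int y) {
        char[][] canvas = new char[4][4];
        if (gridFirst) {
            drawGridLine(canvas);  // underlay: the grid goes down first
            drawOpaqueBox(canvas); // the page content then covers part of it
        } else {
            drawOpaqueBox(canvas); // page content first
            drawGridLine(canvas);  // overlay: the grid stays on top
        }
        return canvas[y][x];
    }

    /** A horizontal grid line across row 1. */
    static void drawGridLine(char[][] canvas) {
        for (int x = 0; x < 4; x++) {
            canvas[1][x] = 'G';
        }
    }

    /** A filled 2x2 box in one corner, like the image in the original page. */
    static void drawOpaqueBox(char[][] canvas) {
        for (int y = 0; y < 2; y++) {
            for (int x = 0; x < 2; x++) {
                canvas[y][x] = 'B';
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("grid first: " + visibleAt(true, 0, 1));
        System.out.println("grid last:  " + visibleAt(false, 0, 1));
    }
}
```

With the grid drawn first, the cell under the box shows the box; with the grid drawn last, the same cell shows the grid line.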
This being said, for the task of applying changes to an existing PDF, using PdfWriter and GetImportedPage is less than optimal. This actually is a task for the PdfStamper class, which is made for stamping additional content on existing PDFs.
E.g. have a look at the sample StampText, the pivotal code being:
PdfReader reader = new PdfReader(resource);
using (var ms = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(reader, ms))
{
PdfContentByte canvas = stamper.GetOverContent(1);
ColumnText.ShowTextAligned( canvas, Element.ALIGN_LEFT, new Phrase("Hello people!"), 36, 540, 0 );
}
return ms.ToArray();
}