Use PDFBox to Merge Pages?

I know I can use PDFBox to merge multiple PDFs into one PDF. But is there a way to merge pages? For example, I have a header in a PDF and want it inserted at the top of the first page of the combined PDF, pushing everything else down. Is there a way to do this with the PDFBox API?

Here is some code that copies two files into a merged one, with multiple copies of each; it copies page by page. I based it on the answer to this question: Can duplicating a pdf with PDFBox be small like with iText?
So all you have to do is make only one copy of the first page of doc1 and one copy of each page of doc2. There's a comment marking where to change the code to leave out some pages.
// PDFBox 1.8.x API (getAllPages() was removed in PDFBox 2.0)
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

final int COPIES = 1; // number of copies of each page
// Mostly the same code as the linked answer
PDDocument samplePdf = new PDDocument();
InputStream in1 = this.getClass().getResourceAsStream(DOC1_NAME);
PDDocument doc1 = PDDocument.load(in1);
List<PDPage> pages = (List<PDPage>) doc1.getDocumentCatalog().getAllPages();
// *** Change this loop to copy only the pages you want from DOC1
for (PDPage page : pages) {
    for (int i = 0; i < COPIES; i++) { // one import per copy
        samplePdf.importPage(page);
    }
}
// Same code again, for DOC2
InputStream in2 = this.getClass().getResourceAsStream(DOC2_NAME);
PDDocument doc2 = PDDocument.load(in2);
pages = (List<PDPage>) doc2.getDocumentCatalog().getAllPages();
for (PDPage page : pages) {
    for (int i = 0; i < COPIES; i++) { // one import per copy
        samplePdf.importPage(page);
    }
}
// Then write the result out
FileOutputStream out = new FileOutputStream(new File(OUT_NAME));
samplePdf.save(out);
out.close();
// Close the source documents only after saving: the imported pages may still refer to them
samplePdf.close();
in1.close();
doc1.close();
in2.close();
doc2.close();
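The code above appends whole pages; it does not put the header above the first page's content. One way to get the "push everything down" effect is to draw the header page and the original page onto a new page at computed offsets (in PDFBox, `LayerUtility.importPageAsForm` can turn a page into a form XObject that can then be drawn at an offset; that drawing step is not shown and varies between PDFBox versions). Below is only the placement arithmetic, as a minimal stdlib-only sketch; `HeaderPlacement` and its two options are hypothetical. PDF user space has its origin at the bottom-left corner, so "above" means a larger y.

```java
/**
 * Hypothetical helper: where to draw a header page and an existing page so the
 * header ends up above the page content. All values are in PDF points, with the
 * origin at the bottom-left corner (y grows upwards).
 */
final class HeaderPlacement {
    final float pageWidth;    // width of the resulting page
    final float pageHeight;   // height of the resulting page
    final float headerY;      // y of the header's lower-left corner
    final float contentY;     // y of the original content's lower-left corner
    final float contentScale; // uniform scale applied to the original content

    private HeaderPlacement(float w, float h, float headerY, float contentY, float scale) {
        this.pageWidth = w;
        this.pageHeight = h;
        this.headerY = headerY;
        this.contentY = contentY;
        this.contentScale = scale;
    }

    /** Option 1: make the page taller by the header height; content keeps its size. */
    static HeaderPlacement growPage(float pageW, float pageH, float headerW, float headerH) {
        // Content stays at the bottom; the header starts where the old page ended.
        return new HeaderPlacement(Math.max(pageW, headerW), pageH + headerH, pageH, 0f, 1f);
    }

    /** Option 2: keep the page size and shrink the content so the header fits on top. */
    static HeaderPlacement shrinkContent(float pageW, float pageH, float headerH) {
        if (headerH < 0 || headerH >= pageH) {
            throw new IllegalArgumentException("header must be shorter than the page");
        }
        // The scaled content (anchored bottom-left) fills the height left below the header.
        float scale = (pageH - headerH) / pageH;
        return new HeaderPlacement(pageW, pageH, pageH - headerH, 0f, scale);
    }
}
```

Option 1 keeps the content untouched but produces a non-standard page size; option 2 keeps the paper size at the cost of slightly smaller content.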

Related

How to add watermark on existing pdf file

I am trying to add a watermark to a PDF file using PdfSharp. I tried the sample at this link:
http://www.pdfsharp.net/wiki/Watermark-sample.ashx
but I can't work out how to get the page object of an existing PDF file and how to draw the watermark on that page.
Help?
Basically, the samples on the wiki are only snippets. If you download the source, you get a set of complete samples, including this watermark example.
The following comes from PDFSharp-MigraDocFoundation-1_32/PDFsharp/samples/Samples C#/Based on GDI+/Watermark/Program.cs
It's quite simple, really. I am only showing the code up to the for loop that goes over each page; you should have a look at the full file.
[...]
const string watermark = "PDFsharp";
const int emSize = 150;
// Get a fresh copy of the sample PDF file
const string filename = "Portable Document Format.pdf";
File.Copy(Path.Combine("../../../../../PDFs/", filename),
    Path.Combine(Directory.GetCurrentDirectory(), filename), true);
// Create the font for drawing the watermark
XFont font = new XFont("Times New Roman", emSize, XFontStyle.BoldItalic);
// Open an existing document for editing and loop through its pages
PdfDocument document = PdfReader.Open(filename);
// Set version to PDF 1.4 (Acrobat 5) because we use transparency.
if (document.Version < 14)
    document.Version = 14;
for (int idx = 0; idx < document.Pages.Count; idx++)
{
    //if (idx == 1) break;
    PdfPage page = document.Pages[idx];
    [...]
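Whatever drawing API fills in the per-page part, a diagonal watermark needs the angle and length of the page diagonal. The following is a stdlib-only Java sketch of that geometry, not the elided part of the PDFsharp sample; the class and method names are hypothetical. Whether the angle must be passed as positive or negative depends on whether the drawing API's y axis points up (PDF user space) or down (GDI+-style coordinates).

```java
/** Hypothetical helper: geometry for a watermark drawn along the page diagonal. */
final class DiagonalWatermark {
    private DiagonalWatermark() {}

    /** Angle in degrees between the bottom edge and the bottom-left to top-right diagonal. */
    static double diagonalAngleDegrees(double pageWidth, double pageHeight) {
        return Math.toDegrees(Math.atan2(pageHeight, pageWidth));
    }

    /** Length of that diagonal: an upper bound on how wide the watermark text can be. */
    static double diagonalLength(double pageWidth, double pageHeight) {
        return Math.hypot(pageWidth, pageHeight);
    }

    /**
     * Rescales a font size so that text measured as textWidth at fontSize spans the
     * given fraction of the diagonal (text width grows linearly with font size).
     */
    static double fitFontSize(double fontSize, double textWidth, double diagonal, double fraction) {
        return fontSize * (diagonal * fraction) / textWidth;
    }
}
```

For a square page the angle is 45 degrees; portrait pages give a steeper angle.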

Adding an imported PDF to a table cell in iTextSharp

I am creating a new PDF that will contain a compilation of other documents.
These other documents can be Word/Excel files, images or PDFs.
I am hoping to add all of this content to cells in a table, which is added to the document. This gives me some conveniences: pages are added automatically, elements are positioned within a cell rather than on a page, and it's easier to keep the content in the order I supply it (such as img, doc, pdf, img, pdf, etc.).
Adding images to the table is simple enough.
I am converting the Word/Excel docs to PDF image streams, and I'm also reading in the existing PDFs as streams.
Adding these to a new PDF is simple enough, by adding a template to the PdfContentByte.
What I am trying to do, though, is add these PDFs to cells in a table, which is then added to the document.
Is this possible?
Please download chapter 6 of my book. It contains two variations on what you are trying to do:
ImportingPages1, which produces time_table_imported1.pdf
ImportingPages2, which produces time_table_imported2.pdf
This is a code snippet:
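import java.io.FileOutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;

// step 1
Document document = new Document();
// step 2
PdfWriter writer
    = PdfWriter.getInstance(document, new FileOutputStream(RESULT));
// step 3
document.open();
// step 4
PdfReader reader = new PdfReader(MovieTemplates.RESULT);
int n = reader.getNumberOfPages();
PdfImportedPage page;
PdfPTable table = new PdfPTable(2);
for (int i = 1; i <= n; i++) {
    page = writer.getImportedPage(reader, i);
    // compensate for the page's /Rotate value inside the cell
    table.getDefaultCell().setRotation(-page.getRotation());
    table.addCell(Image.getInstance(page));
}
document.add(table);
// step 5
document.close();
reader.close();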
The pages are imported as PdfImportedPage objects, and then wrapped inside an Image so that we can add them to a PdfPTable.
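The `setRotation(-page.getRotation())` line deals with pages that carry a /Rotate entry. In PDF, /Rotate must be a multiple of 90, and for 90 or 270 the displayed width and height are swapped. A stdlib-only sketch of that bookkeeping follows; `PageRotation` is a hypothetical helper, not part of iText.

```java
/** Hypothetical helper for PDF page rotation values (/Rotate). */
final class PageRotation {
    private PageRotation() {}

    /** Normalizes a /Rotate value to 0, 90, 180 or 270; PDF requires a multiple of 90. */
    static int normalize(int rotate) {
        if (rotate % 90 != 0) {
            throw new IllegalArgumentException("/Rotate must be a multiple of 90: " + rotate);
        }
        int r = rotate % 360;
        return r < 0 ? r + 360 : r; // Java's % keeps the sign of the dividend
    }

    /** The rotation that undoes a page's rotation, expressed in the 0..270 range. */
    static int compensating(int rotate) {
        return normalize(-rotate);
    }

    /** Displayed {width, height} of a page whose unrotated size is w x h. */
    static float[] displayedSize(float w, float h, int rotate) {
        int r = normalize(rotate);
        return (r == 90 || r == 270) ? new float[] {h, w} : new float[] {w, h};
    }
}
```

For example, a Letter page (612 x 792) with /Rotate 90 is displayed as 792 x 612.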

Hyperlink Detection from PDF

I have some PDFs containing hyperlinks, both URLs and mailto links. Is there any way or tool (maybe third-party) to extract the hyperlink metadata from the PDF, such as coordinates, link type and destination address? Any help is highly appreciated.
I have already tried iText and PDFBox without much success, and some third-party software doesn't give me the output I need either.
I have tried the following Java code using iText:
import java.util.ArrayList;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;

PdfReader myReader = new PdfReader("pdf File Path");
PdfDictionary pageDict = myReader.getPageN(1); // only the first page is inspected
PdfArray annots = pageDict.getAsArray(PdfName.ANNOTS);
System.out.println(annots);
ArrayList<String> dests = new ArrayList<String>();
if (annots != null)
{
    for (int i = 0; i < annots.size(); ++i)
    {
        PdfDictionary annotDict = annots.getAsDict(i);
        PdfName subType = annotDict.getAsName(PdfName.SUBTYPE);
        if (subType != null && PdfName.LINK.equals(subType))
        {
            PdfDictionary action = annotDict.getAsDict(PdfName.A);
            if (action != null && PdfName.URI.equals(action.getAsName(PdfName.S)))
            {
                dests.add(action.getAsString(PdfName.URI).toString());
            } // else: an internal link (GoTo or /Dest) or another action type
        }
    }
}
System.out.println(dests);
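The question also asks for the link type and coordinates. Once URI strings have been collected (as the `dests` list does above), telling web links from mailto links is a matter of the URI scheme; and a link annotation's /Rect holds two opposite corners that are not guaranteed to be lower-left and upper-right, so they are worth normalizing. A stdlib-only sketch using `java.net.URI`; the `LinkInfo` class and its categories are made up for illustration. With iText 5, the four /Rect numbers could presumably be read via `annotDict.getAsArray(PdfName.RECT)`, but that wiring is not shown.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper: classify link targets and normalize annotation rectangles. */
final class LinkInfo {
    enum Kind { WEB, MAIL, OTHER, INVALID }

    private LinkInfo() {}

    /** Classifies a URI string by its scheme (http/https, mailto, anything else). */
    static Kind classify(String uri) {
        try {
            String scheme = new URI(uri.trim()).getScheme();
            if (scheme == null) {
                return Kind.OTHER; // relative reference, no scheme
            }
            switch (scheme.toLowerCase(Locale.ROOT)) {
                case "http":
                case "https":
                    return Kind.WEB;
                case "mailto":
                    return Kind.MAIL;
                default:
                    return Kind.OTHER;
            }
        } catch (URISyntaxException e) {
            return Kind.INVALID; // e.g. unescaped spaces
        }
    }

    /** Normalizes a /Rect [x1 y1 x2 y2] to {llx, lly, urx, ury}. */
    static float[] normalizeRect(float x1, float y1, float x2, float y2) {
        return new float[] {Math.min(x1, x2), Math.min(y1, y2), Math.max(x1, x2), Math.max(y1, y2)};
    }
}
```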
You can use the Docotic.Pdf library to extract links (disclaimer: I work for the company that makes it).
Below is code that opens the specified file, finds all hyperlinks, collects the position of each link and draws a rectangle around each one.
The code then saves a new PDF (with the links in rectangles) and a text file with the collected information. Finally, both files are opened in the default viewers.
using System;
using System.Text;
using BitMiracle.Docotic.Pdf;

public static void ListAndHighlightLinks(string inputFile, string outputFile, string outputTxt)
{
    using (PdfDocument doc = new PdfDocument(inputFile))
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < doc.Pages.Count; i++)
        {
            PdfPage page = doc.Pages[i];
            foreach (PdfWidget widget in page.Widgets)
            {
                // keep only link areas whose action opens a URI
                PdfActionArea actionArea = widget as PdfActionArea;
                if (actionArea == null)
                    continue;
                PdfUriAction linkAction = actionArea.Action as PdfUriAction;
                if (linkAction == null)
                    continue;
                Uri url = linkAction.Uri;
                PdfRectangle rect = actionArea.BoundingBox;
                // add information about the found link to the string buffer (i is a zero-based page index)
                sb.Append("Page ");
                sb.Append(i.ToString());
                sb.Append(" : ");
                sb.Append(rect.ToString());
                sb.Append(" ");
                sb.AppendLine(url.ToString());
                // draw a rectangle around the found link
                page.Canvas.DrawRectangle(rect);
            }
        }
        // save the document with highlighted links, and the text information about the links
        doc.Save(outputFile);
        System.IO.File.WriteAllText(outputTxt, sb.ToString());
        // open the created PDF and text file in the default viewers
        System.Diagnostics.Process.Start(outputTxt);
        System.Diagnostics.Process.Start(outputFile);
    }
}
You can use the sample code with a call like this:
ListAndHighlightLinks("input.pdf", "output.pdf", "links.txt");
If your PDFs are copy-protected, you need to start with step 1; if they're free to copy, you can start with step 2.
Step 1: convert your PDFs into Word .doc files, using Adobe Acrobat Pro or an online PDF-to-Word converter:
http://www.pdfonline.com/pdf2word/index.asp
Step 2: copy and paste the whole document into the input window here (you can also download the lightweight HTML tool):
http://www.surf7.net/services/value-added-services/free-web-tools/email-extractor-lite/
Select 'url' as 'Type of address to extract', select your separator, hit Extract and that's it.
Hope it works, cheers.
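If the text is already available as a string, the "extract URLs" part of step 2 can also be done locally with regular expressions. The sketch below is stdlib-only Java; `TextLinkExtractor` is hypothetical and the patterns are deliberately simple assumptions that will miss or over-match some addresses. Unlike reading the /Link annotations, this only finds links that appear as visible text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pull http(s) URLs and e-mail addresses out of plain text. */
final class TextLinkExtractor {
    // Simplified patterns: a URL runs until whitespace, '<', '>' or '"'.
    private static final Pattern URL = Pattern.compile("https?://[^\\s<>\"]+");
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    private TextLinkExtractor() {}

    static List<String> urls(String text) {
        return findAll(URL, text);
    }

    static List<String> emails(String text) {
        return findAll(EMAIL, text);
    }

    private static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            // Drop trailing sentence punctuation that is usually not part of the address.
            out.add(m.group().replaceAll("[.,;:)]+$", ""));
        }
        return out;
    }
}
```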
One possibility would be a custom JavaScript in Acrobat that enumerates the "words" on the page and reads out their quads. From those you get the coordinates to create a link (or to compare with the links on the page), as well as the actual text (that's the "word(s)").
If you "only" need to set the border of the existing links, you could also write another Acrobat JavaScript that enumerates the links of the document and sets their border color property (you may need to set the width as well).
(If you prefer "buy" over "make", feel free to contact me privately; such things are part of my standard "repertoire".)

iTextSharp: What is Lost When Copying PDF Content From Another PDF?

I am currently evaluating iTextSharp for potential use in a project. The code I have written to achieve my goal uses PdfCopy.GetImportedPage to copy all of the pages from an existing PDF. What I want to know is: what do I need to be aware of that will be lost from a PDF and/or page when duplicating PDF content like this? For example, one thing I have already noticed is that I need to manually add any bookmarks and named destinations to my new PDF.
Here's some rough sample code:
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

using (PdfReader reader = new PdfReader(inputFilename))
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (Document document = new Document())
        {
            using (PdfCopy copy = new PdfCopy(document, ms))
            {
                document.Open();
                int n = reader.NumberOfPages;
                // PdfReader page numbers are 1-based, hence the pre-increment
                for (int page = 0; page < n; )
                {
                    copy.AddPage(copy.GetImportedPage(reader, ++page));
                }
                // add content and make further modifications here
            }
        }
        // write the content to disk
    }
}
Basically, anything that's document-level instead of page-level will get lost, and both bookmarks and named destinations are document-level. Pull up the PDF spec and look at section 3.6.1, the document catalog (7.7.2 in ISO 32000-1), for other entries, including threads, the open action, additional actions and metadata.
You might already have seen these, but here are some samples (in Java) of how to merge named destinations and how to merge bookmarks.
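Because named destinations live in the document catalog, merging several files means building a new set of names yourself, and two inputs may well use the same name. A stdlib-only sketch of one possible approach (an assumption, not iText's API): prefix each name with its source-document index and shift its page number by the pages of all preceding documents. `DestinationMerger` and the `docN_` naming scheme are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: merge per-document named destinations for a concatenated file. */
final class DestinationMerger {
    private DestinationMerger() {}

    /**
     * @param destsPerDoc for each input document, name -> 1-based page within that document
     * @param pageCounts  number of pages of each input document, in merge order
     * @return name -> 1-based page in the merged document, names prefixed with "docN_"
     */
    static Map<String, Integer> merge(List<Map<String, Integer>> destsPerDoc, List<Integer> pageCounts) {
        if (destsPerDoc.size() != pageCounts.size()) {
            throw new IllegalArgumentException("need one page count per document");
        }
        Map<String, Integer> merged = new LinkedHashMap<>();
        int offset = 0; // pages contributed by the documents merged so far
        for (int d = 0; d < destsPerDoc.size(); d++) {
            int pages = pageCounts.get(d);
            for (Map.Entry<String, Integer> e : destsPerDoc.get(d).entrySet()) {
                int page = e.getValue();
                if (page < 1 || page > pages) {
                    throw new IllegalArgumentException("page out of range: " + e);
                }
                merged.put("doc" + d + "_" + e.getKey(), page + offset);
            }
            offset += pages;
        }
        return merged;
    }
}
```

Any link that refers to a destination by name would have to be rewritten with the same prefix, otherwise it points at a name that no longer exists.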

Need alternative to local or remote goto/destinations with merged documents

BACKGROUND
I have a Java program that analyzes data and creates a PDF report using iText 5.
I recently had to add a summary of major problems at the start of the document so that users would not have to read over a hundred pages to find them. Problems are only discovered while looking through the data sequentially.
I solved this by creating 3 PDF documents and then merging them: a start/title PDF, the summary-of-problems PDF, and the body (analysis) PDF. (Basically, I split the original document at the point where I wanted to insert the summary.)
I use PdfReader and PdfCopy to combine the documents. I am able to keep the chapter bookmarks OK.
THE PROBLEM
As I encounter a significant problem, I add it to the 'summary' document. I want to add a link in the summary that points to the problem in the body.
I tried to use Chunk.setLocalDestination and setLocalGoto, but realized why that did not work, so I tried setLocalDestination with setRemoteGoto (with and without 'file://'), but that did not work either. (Also, I used the final PDF document name in the RemoteGoto, not the temporary PDF document name.)
I do not want to use bookmarks because that seems wrong and would not look right.
I am hoping someone can suggest an alternative method.
To recap: in my current code I create a Chunk with setLocalDestination, and that chunk goes into the 'body' document. At the same time I create a chunk with setRemoteGoto, which is put into the summary document. I was hoping the link would work once the documents were combined, but when the link is clicked, you go to the first page of the combined document.
Thanks.
PS: I have both iText in Action books.
CLARIFICATION 3/5/2014
What I was calling 'bookmarks' are really Chapter objects that are inserted into sections of the 3 documents as they are created.
After the 3 documents are saved, PdfReader is used to open each one and PdfCopy is used to put them into a new, final document.
I get the data from the Chapters, which create the 'bookmarks' shown on the left side of the user's PDF reader, e.g. Acrobat Reader.
int thisPdfPages = reader.getNumberOfPages();
reader.consolidateNamedDestinations();
java.util.List<HashMap<String, Object>> bookmarks = SimpleBookmark.getBookmark(reader);
if (bookmarks != null) {
    if (pageOffset != 0) {
        if (debug3) auditLogger.log("Shifting pages by " + pageOffset);
        SimpleBookmark.shiftPageNumbers(bookmarks, pageOffset, null);
    }
    masterBookmarks.addAll(bookmarks);
}
for (int i = 0; i < thisPdfPages;) {
    page = copy.getImportedPage(reader, ++i);
    stamp = copy.createPageStamp(page);
    // add page numbers
    ColumnText.showTextAligned(stamp.getUnderContent(), Element.ALIGN_CENTER,
        new Phrase(String.format("page %d of %d", start + i, totalPages)), 297.5f, 28, 0);
    stamp.alterContents();
    copy.addPage(page);
}
PRAcroForm form = reader.getAcroForm();
if (form != null) {
    copy.copyAcroForm(reader);
}
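The page-number footer is centred at x = 297.5, which is half of 595 pt, the width of an A4 page; on Letter-size input (612 pt wide) the text would sit slightly off-centre. A stdlib-only sketch of deriving the centre from the page box instead; `FooterPosition` is a hypothetical helper. With iText 5 the box could come from `reader.getPageSize(i)` (or `getPageSizeWithRotation(i)` for rotated pages) and its `getLeft()`/`getRight()`, but that wiring is not shown.

```java
import java.util.Locale;

/** Hypothetical helper for placing "page X of Y" footers on merged documents. */
final class FooterPosition {
    private FooterPosition() {}

    /** Horizontal centre of a page box given its left and right edges (in points). */
    static float centerX(float left, float right) {
        return (left + right) / 2f;
    }

    /** Footer text for page i (1-based) of a part that follows `start` earlier pages. */
    static String footer(int start, int i, int totalPages) {
        // Locale.ROOT keeps ASCII digits regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "page %d of %d", start + i, totalPages);
    }
}
```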
While analyzing the data I have 2 documents open: a base document containing all the details, and a summary document containing notable events over certain thresholds.
//NOTE section is part of the 'body' document
//NOTE summaryPhrase is a part of the 'summary' document
String linkName = "summaryPf_" + networkid;
//create Link target
section.add(new Chunk("CHANGE TO EMPTY STRING WHEN WORKING").setLocalDestination(linkName));
//create Link
Chunk linkChunk = new Chunk("[Link] ");
Font linkFont = new Font(regularFont);
linkFont.setColor(BaseColor.BLUE);
linkFont.setStyle(Font.UNDERLINE);
linkChunk.setFont(linkFont);
boolean useLocal = true;
// both local and remote goto's fail
if (useLocal) {
    linkChunk.setLocalGoto(linkName);
} else {
    // all permutations of setting filename fail,
    // but it does bring up a permissions dialog when the link is clicked.
    //String remotePdfName = "file://./" + pdfReportName;
    //String remotePdfName = "file://" + pdfReportName;
    //String remotePdfName = "file:" + pdfReportName;
    String remotePdfName = pdfReportName;
    linkChunk.setRemoteGoto(remotePdfName, linkName);
}
// add link to summary document
summaryPhrase.add(linkChunk);
summaryPhrase.add(String.format("There were %d devices with ping failures", summaryCount));
summaryPhrase.add(Chunk.NEWLINE);
}
If I use setLocalGoto, clicking the link in the final document takes you to the first page.
If I use setRemoteGoto, a dialog asks for permission to go to a document, but the document fails to open; I tried several permutations of the filename.