BACKGROUND
I have a java program that analyzes data and creates a pdf report using itext 5.
I recently had to add a summary of major problems at the start of the document so a user would not have to read over a hundred pages to find problems. Problems are only discovered when serially looking through the data.
I solved the problem by creating 3 pdf documents and then merging them, a start/title pdf, the summary of problems pdf, and the body or analysis pdf. (Basically splitting the original document at the point I wanted to insert the summary)
I use PdfReader and PdfCopy to combine the documents. I am able to keep the chapter bookmarks OK.
THE PROBLEM
As I encounter a significant problem I add it to the 'summary' document. I want to add a link in the summary to point to the problem in the body.
I tried to use Chunk.setLocalDestination and setLocalGoto but realized why that did not work, so I tried using setLocalDestination and setRemoteGoto (with and without 'file://'), but that did not work either. (Also, I used the final pdf document name in the RemoteGoto, not the temporary pdf document name.)
I do not want to use bookmarks because that seems wrong and would not look right.
I am hoping someone could suggest an alternate method or make a suggestion.
To recap, in my current code a create a Chunk with setLocalDestination and that chunk goes into the 'body' document. At the same time I create a setRemoteGoto which is put into the summary document. I was hoping when they were combined the link would work, but when the link is clicked, you go to the first page of the combined document.
Thanks.....
PS I have both iText in action books
CLARIFICATION 3/5/2014
What I was calling 'bookmarks' are really Chapter class entities that are inserted into sections of the 3 documents as they are being created.
After saving the 3 documents, PdfReader is used to open each and PdfCopy is used to put them into a new, final document.
I get the data from the Chapters, which creates the 'bookmarks' on the left side of the Pdf reader used by the user, e.g. Acrobat Reader.
int thisPdfPages = reader.getNumberOfPages();
reader.consolidateNamedDestinations();
java.util.List<HashMap<String, Object>> bookmarks = SimpleBookmark.getBookmark(reader);
if (bookmarks != null) {
if (pageOffset != 0) {
if (debug3) auditLogger.log("Shifting pages by " + pageOffset );
SimpleBookmark.shiftPageNumbers(bookmarks, pageOffset, null);
}
masterBookmarks.addAll(bookmarks);
}
for (int i = 0; i < thisPdfPages;) {
page = copy.getImportedPage(reader, ++i);
stamp = copy.createPageStamp(page);
// add page numbers
ColumnText.showTextAligned(stamp.getUnderContent(), Element.ALIGN_CENTER, new Phrase(String.format("page %d of %d", start + i, totalPages)), 297.5f, 28, 0);
stamp.alterContents();
copy.addPage(page);
}
PRAcroForm form = reader.getAcroForm();
if (form != null) {
copy.copyAcroForm(reader);
}
When analyzing the data I have 2 documents open, a base document which contains all the details and a summary document which contains notable events over some thresholds.
//NOTE section is part of the 'body' document
//NOTE summaryPhrase is a part of the 'summary' document
String linkName = "summaryPf_" + networkid ;
//create Link target
section.add(new Chunk("CHANGE TO EMPTY STRING WHEN WORKING").setLocalDestination( linkName ));
//create Link
Chunk linkChunk = new Chunk( "[Link] " );
Font linkFont = new Font( regularFont );
linkFont.setColor(BaseColor.BLUE);
linkFont.setStyle( Font.UNDERLINE );
linkChunk.setFont( linkFont );
boolean useLocal = true;
// both local and remote goto's fail
if (useLocal) {
linkChunk.setLocalGoto( linkName);
} else {
// all permutations of setting filename fail,
// but it does bring up a permissions dialog when the link is clicked.
//String remotePdfName = "file://./" + pdfReportName ;
//String remotePdfName = "file://" + pdfReportName ;
//String remotePdfName = "file:" + pdfReportName ;
String remotePdfName = pdfReportName ;
linkChunk.setRemoteGoto( remotePdfName, linkName);
}
// add link to summary document
summaryPhrase.add( linkChunk );
summaryPhrase.add( String.format("There were %d devices with ping failures", summaryCount));
summaryPhrase.add( Chunk.NEWLINE );
}
If I use setLocalGoto, when you click the link in the final document you goto the first page.
If I use setRemoteGoto, a dialog ask permission to go to a document, but the document fails to open, tried several permutations on filename.
Related
Problem: I routinely receive PDF reports and annotate (highlight etc.) some of them. I had the bad habit of saving the annotated PDFs together with the non-annotated PDFs. I now have hundreds of PDF files in the same folder, some annotated and some not. Is there a way to check every PDF file for annotations and copy only the annotated ones to a new folder?
Thanks a lot!
I'm on Win 7 64bit, I have Adobe Acrobat XI installed and I'm able to do some beginner coding in Python and Javascript
Please ignore the following suggestion, since the answers already solved the problem.
EDIT: Following Mr. Wyss' suggestion, I created the following code for Acrobat's Javascript console to be run only once at the beginning:
counter = 1;
// Open a new report
var rep = new Report();
rep.size = 1.2;
rep.color = color.blue;
rep.writeText("Files WITH Annotations");
Then this code should be applied to all PDFs:
this.syncAnnotScan();
annots = this.getAnnots();
path = this.path;
if (annots) {
rep.color = color.black;
rep.writeText(" ");
rep.writeText(counter.toString()+"- "+path);
rep.writeText(" ");
if (counter% 20 == 0) {
rep.breakPage();
}
counter++;
}
And, at last, one code to be run only once at the end:
//Now open the report
var docRep = rep.open("files_with_annots.pdf");
There are two problems with this solution:
1. The "Action Wizard" seems to always apply the same code afresh to each PDF (that means that the "counter" variable, for instance, is meaningless; it will always be = 1. But more importantly, var "rep" will be unassigned when the middle code is run on different PDFs).
2. How can I make the codes that should be run only once run only at the beginning or at the end, instead of running everytime for every single PDF (like it does by default)?
Thank you very much again for your help!
This would be possible using the Action Wizard to put together an action.
The function to determine whether there are annotations in the document would be done in Acrobat JavaScript. Roughly, the core function would look like this:
this.syncAnnotScan() ; // updates all annots
var myAnnots = this.getAnnots() ;
if (myAnnots != null) {
// do something if there are annots
} else {
// do something if there are no annots
}
And that should get you there.
I am not completely positive, but I think there is also a Preflight check which tells you whether there are annotations in the document. If so, you would create a Preflight droplet, which would sort out the annotated and not annotated documents.
Mr. Wyss is right, here's a step-by-step guide:
In Acrobat XI Pro, go to the 'Tools' panel on the right side
Click on the 'Action Wizard' tab (you must first make it visible, though)
Click on 'Create New Action...', choose 'More tools' > 'Execute Javascript' and add it to right-hand pane > click on 'Execute Javascript' > 'Specify Settings' (uncheck 'prompt user' if you want) > paste this code:
.
this.syncAnnotScan();
var annots = this.getAnnots();
var fname = this.documentFileName;
fname = fname.replace(",", ";");
var errormsg = "";
if (annots) {
try {
this.saveAs({
cPath: "/c/folder/"+fname,
bPromptToOverwrite: false //make this 'true' if you want to be prompted on overwrites
});
} catch(e) {
for (var i in e)
{errormsg+= (i + ": " + e[i]+ " / ");}
app.alert({
cMsg: "Error! Unable to save the file under this name ('"+fname+"'- possibly an unicode string?) See this: "+errormsg,
cTitle: "Damn you Acrobat"
});
}
;}
annots = 0;
Save and run it! All your annotated PDFs will be saved to 'c:\folder' (but only if this folder already exists!)
Be sure to enable first Javascript in 'Edit' > 'Preferences...' > 'Javascript' > 'Enable Acrobat Javascript'.
VERY IMPORTANT: Acrobat's JS has a bug that doesn't allow Docs to be saved with commas (",") in their names (e.g., "Meeting with suppliers, May 11th.pdf" - this will get an error). Therefore, I substitute in the code above all "," for ";".
I have some PDFs containing Hyperlinks both in form of URL and mailto. Now Is there any way or tool(may be 3rd party) to extract the Hyperlink meta information form the PDF like coordinates, link type and destination address. Any help is highly appreciated.
I have already tried with iText and PDFBox but with no major success, even some third party software are not providing me the desired output.
I have tried the following code in Java using iText
PdfReader myReader = new PdfReader("pdf File Path");
PdfDictionary pageDict = myReader.getPageN(1);
PdfArray annots = pageDict.getAsArray(PdfName.ANNOTS);
System.out.println(annots);
ArrayList<String> dests = new ArrayList<String>();
if(annots != null)
{
for(int i=0; i<annots.size(); ++i)
{
PdfDictionary annotDict = annots.getAsDict(i);
PdfName subType = annotDict.getAsName(PdfName.SUBTYPE);
if (subType != null && PdfName.LINK.equals(subType))
{
PdfDictionary action = annotDict.getAsDict(PdfName.A);
if(action != null && PdfName.URI.equals(action.getAsName(PdfName.S)))
{
dests.add(action.getAsString(PdfName.URI).toString());
} // else { its an internal link }
}
}
}
System.out.println(dests);
You can use Docotic.Pdf library for links extraction (disclaimer: I work for the company).
Below is the code that opens specified file, finds all hyperlinks, collects information about position of each link and draws rectangle around each links.
After that the code creates new PDF (with links in rectangles) and a text file with collected information. In the end, both created files are opened in default viewers.
public static void ListAndHighlightLinks(string inputFile, string outputFile, string outputTxt)
{
using (PdfDocument doc = new PdfDocument(inputFile))
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i < doc.Pages.Count; i++)
{
PdfPage page = doc.Pages[i];
foreach (PdfWidget widget in page.Widgets)
{
PdfActionArea actionArea = widget as PdfActionArea;
if (actionArea == null)
continue;
PdfUriAction linkAction = actionArea.Action as PdfUriAction;
if (linkAction == null)
continue;
Uri url = linkAction.Uri;
PdfRectangle rect = actionArea.BoundingBox;
// add information about found link into string buffer
sb.Append("Page ");
sb.Append(i.ToString());
sb.Append(" : ");
sb.Append(rect.ToString());
sb.Append(" ");
sb.AppendLine(url.ToString());
// draw rectangle around found link
page.Canvas.DrawRectangle(rect);
}
}
// save document with highlighted links and text information about links to files
doc.Save(outputFile);
System.IO.File.WriteAllText(outputTxt, sb.ToString());
// open created PDF and text file in default viewers
System.Diagnostics.Process.Start(outputTxt);
System.Diagnostics.Process.Start(outputFile);
}
}
You can use the sample code with a call like this:
ListAndHighlightLinks("input.pdf", "output.pdf", "links.txt");
if your pdfs are copy protected, you need to start with step 1, if they're free to copy, you can start with step 2
step 1: convert your pdfs into word .doc: use Adobe Acrobat Pro or an online pdf to word converter:
http://www.pdfonline.com/pdf2word/index.asp
step 2: copy-paste the whole document into the input window here, you can also download the lightweight html tool:
http://www.surf7.net/services/value-added-services/free-web-tools/email-extractor-lite/
select 'url' as 'Type of address to extract', select your separator, hit extract and that's it.
Hope it works cheers.
One possibility would be using a custom JavaScript in Acrobat, which would enumerate the "words" on the page and then read out their Quads. From that you get the coordinates to create a link (or to compare with the links on the page), as well as the actual text (that's the "word(s)".
If it is "only" to set the border of the existing links, you also do another Acrobat JavaScript which enumerates the links of the document, and set their border color property (and you may need to set the width as well).
(if you prefer "buy" over "make" feel free to contact me in private; such things are part of my standard "repertoire").
I am currently evaluating iTextSharp for potential use in a project. The code that I have written to achieve my goal is making use of PDFCopy.GetImportedPage to copy all of the pages from an existing PDF. What I want to know is what all do I need to be aware of that will be lost from a PDF and/or page when duplicating PDF content like this? For example, one thing that I already noticed is that I need to manually add in any bookmarks and named destinations into my new PDF.
Here's some rough sample code:
using (PdfReader reader = new PdfReader(inputFilename))
{
using (MemoryStream ms = new MemoryStream())
{
using (Document document = new Document())
{
using (PdfCopy copy = new PdfCopy(document, ms))
{
document.Open();
int n;
n = reader.NumberOfPages;
for (int page = 0; page < n; )
{
copy.AddPage(copy.GetImportedPage(reader, ++page));
}
// add content and make further modifications here
}
}
// write the content to disk
}
}
Basically anything that's document-level instead of page-level will get lost and both Bookmarks and Destinations are document-level. Pull up the PDF spec and look at section 3.6.1 for other entries in the document catalog including Threads, Open and Additional Actions and Meta Data.
You might already have seen these but here are some samples (in Java) of how to merge Named Destinations and how to merge Bookmarks.
I know I can use PDFBox to merge multiple PDF's into one PDF. But is there a way to merge pages? For example, I have a header in PDF and want it to be inserted to the top of the first page of the combined PDF and push everything down. Is there a way to do it using PDFBox API?
Here is some code that works to copy two files into a merged one with multiple copies of each one. It copies by pages. It's something I got using the information in the answer to this question: Can duplicating a pdf with PDFBox be small like with iText?
So all you have to do is to make one copy only of the first page of doc1 and one copy only of all pages of doc2. There's a comment where you'll have to make a change to leave off some pages.
final int COPIES = 1; // total copies
// Same code as linked answer mostly
PDDocument samplePdf = new PDDocument();
InputStream in1 = this.getClass().getResourceAsStream(DOC1_NAME);
PDDocument doc1 = PDDocument.load(in1);
List<PDPage> pages = (List<PDPage>) doc1.getDocumentCatalog().getAllPages();
// *** Change this loop to only copy the pages you want from DOC1
for (PDPage page : pages) {
for (int i = 0; i < COPIES; i++) { // loop for each additional copy
samplePdf.importPage(page);
}
}
// Same code again mostly
InputStream in2 = this.getClass().getResourceAsStream(DOC2_NAME);
PDDocument doc2 = PDDocument.load(in2);
pages = (List<PDPage>) doc2.getDocumentCatalog().getAllPages();
for (PDPage page : pages) {
for (int i = 0; i < COPIES; i++) { // loop for each additional copy
samplePdf.importPage(page);
}
}
// Then write the results out
File output = new File(OUT_NAME);
FileOutputStream out = new FileOutputStream(output);
samplePdf.save(out);
samplePDF.close();
in1.close();
doc1.close();
in2.close();
doc2.close();
I have been searching for days for a solution to this problem.
Description : I have a website which loads a PDF dynamically via an iFrame. The PDF is saved on the server and the user of the website can view the pdf on the website.
Problem : Introduce a Print button on website which prints the PDF which was created dynamically and saved on the server.
Is this even possible ? I am looking at a cross-browser implementation as well to make things worse. I have tried n number of JS options from the web but none of them seem to work. I can not seem to get the PDF printed in the same way as it looks. To put it short, I am trying to emulate the print button which appears on the PDF when it is loaded. Is there an option to pass the pdf document from the server to the print dialog box ?
Description : I have a website which loads a PDF dynamically via an iFrame. The PDF is saved on the server and the user of the website can view the pdf on the website.
Problem : Introduce a Print button on website which prints the PDF which was created dynamically and saved on the server.
Solution : I could not find an exact solution to this problem, but here is how I solved the problem -
Create the 'Print' as per req and redirect that to another page which has only the PDF.
Copy the previous PDF & Create new PDF with JS - this.print() such that when it opens up, the print dialog pops up directly to the user.
In the new page -
if ("Location of PDF " != null)
{
sPdf = "Location of PDF ";
PdfReader pReader = new PdfReader(sPdf);
Document document = new Document
(pReader.GetPageSizeWithRotation(ApplicationConstants.INDEX_ONE));
int n = pReader.NumberOfPages;
FileStream fs = new FileStream
("New PDF location",
FileMode.Create, FileAccess.Write);
PdfCopy copy = new PdfCopy(document, fs);
// Write to pdf
document.Open();
for (int i = ApplicationConstants.INDEX_ONE; i <= n; i++)
{
PdfImportedPage page = copy.GetImportedPage(pReader, i);
copy.AddPage(page);
}
copy.AddJavaScript("this.print(true);", true);
document.Close();
pReader.Close();
inStr = File.OpenRead("New PDF location");
while ((bytecnt = inStr.Read
(buffer, ApplicationConstants.INDEX_ZERO, buffer.Length))
> ApplicationConstants.INDEX_ZERO)
{
if (Context.Response.IsClientConnected)
{
Context.Response.ContentType = "application/PDF";
Context.Response.OutputStream.Write(buffer,
ApplicationConstants.INDEX_ZERO, buffer.Length);
Context.Response.Flush();
}
}
}
Please note that I am using itextsharp to inject the JS script into the new PDF. Hope this helps someone else. I am trying to find another solution without the usage of itextsharp or any other dll but this will have to do for now.
I am not sure if this will work, but you could try launching a popup window with a special version of your PDF file that opens the print dialog when opened. Then close the popup afterwards. This last part might be tricky since I think there is no clean way to know if the print dialog has been closed.