SOLVED: Looking for a way to automate generation of internal PDF hyperlinks

I have a 300+ page PDF document that needs internal links added so that it can reference other pages in the document. The document is created in Visio, which does not reliably generate hyperlinks when exporting to PDF, so the links have to be added to the PDF itself rather than further up the chain. This is an annual task, and it regularly takes over a week because of the amount of manual labor, time, and checking involved.
The text to be hyperlinked has the same format in every case (e.g., "See Section 8.18 - How to Hyperlink"), and I'm certain this can be automated: there are commercial plugins that do it, but they cost hundreds of dollars and cannot be used here due to restrictions imposed by my employer. Example: https://www.evermap.com/ABAddingHyperlinks.asp
I've been looking through the Acrobat Plugin SDK and it seems doable, but I know Acrobat also offers a higher-level scripting language. Does anyone have experience working with PDFs or with the Acrobat scripting / SDK tools? Are there open-source methods for doing this? I've looked everywhere, and I'm willing to learn. I've looked at Ghostscript (Adding internal hyperlink to a pdf), but what I need is much more than a table of contents: links can appear in many places on the page and can span line breaks, so consistency is a challenge.
EDIT: I found a solution! Bluebeam Software's Revu Extreme works pretty darn well and can be used as a 30-day free trial with all features. The only limitation is that links which extend across a line break (multiple lines of text) do not work properly in Edge's or Chrome's PDF viewer, as those viewers don't properly support hyperlinks with multiple click regions. I've submitted a ticket requesting a feature in Revu that fixes this, but for now those links need to be fixed manually after running the batch link. The process is described here: https://support.bluebeam.com/online-help/revu2018/Content/RevuHelp/Menus/Batch/Link/Batch-Link--T.htm
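One possible workaround for viewers that honour only a single click rectangle per link is to create one ordinary link per line of the reference instead of one link with several click regions. A minimal sketch, assuming you already have the bounding box of each word in a reference (for example from a text-extraction tool); `rects_per_line` and the 2-point tolerance are illustrative choices, not part of Revu.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def rects_per_line(word_boxes: List[Box], tolerance: float = 2.0) -> List[Box]:
    """Merge the word boxes of one reference into one rectangle per text line.

    Boxes whose vertical centres lie within `tolerance` points of each other
    are treated as being on the same line. Lines come out top-first when y
    grows upwards (PDF user space); the grouping itself works either way.
    """
    lines: List[List[Box]] = []
    # Sort by vertical centre (highest first), then left to right.
    for box in sorted(word_boxes, key=lambda b: (-(b[1] + b[3]) / 2, b[0])):
        centre = (box[1] + box[3]) / 2
        if lines:
            first = lines[-1][0]
            if abs((first[1] + first[3]) / 2 - centre) <= tolerance:
                lines[-1].append(box)
                continue
        lines.append([box])
    # One enclosing rectangle per line; each becomes its own link annotation.
    return [
        (
            min(b[0] for b in line),
            min(b[1] for b in line),
            max(b[2] for b in line),
            max(b[3] for b in line),
        )
        for line in lines
    ]
```

Each returned rectangle would then get its own link pointing at the same destination page, so every viewer sees plain single-rectangle links.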

You can add hyperlinks to a document with Ghostscript, but you would need to know the location of the text to hyperlink and the destination in advance. You cannot automate the task with Ghostscript, or in fact write any reasonably simple code to do so; you'd need to modify chunks of the PDF interpreter, which is written in PostScript, and that is not a task for anyone who isn't a PostScript expert.
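To illustrate what "knowing the location and destination in advance" means, here is a minimal Python sketch that writes pdfmark operators for links whose rectangles (in PDF points) and target pages are already known. `link_pdfmarks` is a hypothetical helper; the keys follow Adobe's pdfmark Reference, but check the Ghostscript pdfwrite documentation for whether your version honours `/SrcPg` and where the generated file must go on the command line relative to the input PDF.

```python
from typing import Iterable, Tuple

# (source page, (llx, lly, urx, ury) in PDF points, destination page); pages are 1-based.
LinkSpec = Tuple[int, Tuple[float, float, float, float], int]


def link_pdfmarks(links: Iterable[LinkSpec]) -> str:
    """Return one /ANN pdfmark per link, each jumping to a page in the same file."""
    lines = []
    for src_page, (llx, lly, urx, ury), dest_page in links:
        lines.append(
            f"[ /SrcPg {src_page} /Rect [{llx:g} {lly:g} {urx:g} {ury:g}] "
            f"/Border [0 0 0] /Page {dest_page} /View [/XYZ null null null] "
            "/Subtype /Link /ANN pdfmark"
        )
    return "\n".join(lines) + "\n"
```

The hard part the answer describes, finding those rectangles automatically, is exactly what this sketch takes as input.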
You could probably do it with MuPDF, possibly using MuJS to script it, but I don't know enough to be certain. It would still require some coding effort, but at least working in JavaScript would probably be easier.
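Whichever toolkit does the PDF work, most of the automation is pattern matching on extracted page text. A minimal Python sketch, assuming you can already get each page's text (for example with PyMuPDF, the Python binding for MuPDF, via `page.get_text()`) and a mapping from section numbers to the pages where those sections start, built from the document's own headings. `REF_PATTERN` and `build_link_plan` are hypothetical names, and the pattern assumes every reference looks like the example in the question.

```python
import re
from typing import Dict, List, Tuple

# Matches references such as "See Section 8.18 - How to Hyperlink".
# \s+ lets the phrase span line breaks; [-\u2013] accepts a hyphen or an en dash.
REF_PATTERN = re.compile(r"See\s+Section\s+(\d+(?:\.\d+)*)\s+[-\u2013]\s+")


def build_link_plan(
    page_texts: List[str], section_pages: Dict[str, int]
) -> Tuple[List[Tuple[int, str, int]], List[Tuple[int, str]]]:
    """Return (links, unresolved); all page numbers are 1-based.

    links: (source page, section number, target page) for each resolved reference.
    unresolved: (source page, section number) for references with no known
    target, i.e. the short list a human still has to check by hand.
    """
    links: List[Tuple[int, str, int]] = []
    unresolved: List[Tuple[int, str]] = []
    for src_page, text in enumerate(page_texts, start=1):
        for match in REF_PATTERN.finditer(text):
            section = match.group(1)
            target = section_pages.get(section)
            if target is None:
                unresolved.append((src_page, section))
            else:
                links.append((src_page, section, target))
    return links, unresolved
```

For example, `build_link_plan(["See Section 8.18 - How to\nHyperlink"], {"8.18": 120})` yields one link from page 1 to page 120 even though the phrase breaks across lines. Turning each planned link into an annotation still requires locating the phrase on the page (PyMuPDF has `page.search_for()`) and inserting a GoTo link (`page.insert_link()`); check the PyMuPDF documentation for the link dictionary it expects.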

Related

Annotate the pdf file on the location clicked by user

I am having trouble finding a solution to the problem described below.
I need to annotate a PDF file when the user clicks on a specific location in it, and then save the PDF so that it later opens at the annotated location.
How should I approach this?
What I have tried:
I have looked for libraries regardless of programming language (since the language is not a constraint) and found a few, such as minipdf in Python and PDFBox in Java, to mention the relevant ones. I finally selected PDFBox since it seemed mature enough to get close to a solution.
The main hurdle now is how to get the location the user clicked. Once I have that location, I can perform actions such as annotating at that point and then saving the PDF so it opens at the same location.
It seems I would have to write the whole thing in PDF JavaScript, but again, how would I do that?
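Once the viewer reports where the click happened in pixels, the remaining step is converting that position into PDF user space, whose origin is the page's lower-left corner and whose unit is the point (1/72 inch). A minimal sketch, assuming the page is rendered as an image at a known zoom factor and is not rotated; `click_to_pdf_point` is a hypothetical helper, not part of PDFBox. The resulting point is what you would use to build an annotation rectangle in a library such as PDFBox.

```python
from typing import Tuple


def click_to_pdf_point(
    px: float,
    py: float,
    zoom: float,
    page_height: float,
    origin: Tuple[float, float] = (0.0, 0.0),
) -> Tuple[float, float]:
    """Convert a click on a rendered page image to PDF user-space coordinates.

    px, py: click position in pixels, measured from the image's top-left corner.
    zoom: rendered pixels per PDF point (1.0 = 72 dpi, 2.0 = 144 dpi, ...).
    page_height: height of the page's MediaBox in points.
    origin: lower-left corner (x0, y0) of the MediaBox, usually (0, 0).
    Assumes the page is not rotated.
    """
    x0, y0 = origin
    x = x0 + px / zoom
    # Image y grows downwards while PDF y grows upwards, so flip the axis.
    y = y0 + page_height - py / zoom
    return x, y
```

For a US Letter page (792 pt tall) rendered at 2x, a click at pixel (144, 200) maps to the PDF point (72, 692).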
I had a similar problem and solved it another way. In my case I am not opening the PDF in Adobe Reader but in a browser, so I converted the PDF to HTML using Python libraries (let me know if you are interested and I will share the library names with their pros and cons).
That HTML can then be edited easily: we can add hyperlinks, highlights and everything else, since the source is under our control.
This workaround may be applicable to you if your front end is web-based.
PS: I wanted to post this workaround as a comment, but couldn't because my reputation is still too low. Hope it won't be downvoted :)

Extract Font from PDF using GdPicture

According to their website (http://www.gdpicture.com/products/managed-pdf/) you can extract fonts from a PDF file. However, I can't find the functionality to do this. I have found several methods to add fonts, but none to extract them (and they don't show up as embedded files). Has anyone tried to do this, or does anyone have experience with GdPicture?
Version: 14 (Current)
Disclosure: I am part of the ORPALIS technical staff that develops the GdPicture.NET SDK, which is why I know there is already ongoing communication about this.
It is my understanding that you have a support case open for a merging issue related to fonts, and, as you know, our development team is currently working on a fix for it, so I strongly recommend that you wait for them to finish.
At the moment there is no way to extract embedded fonts as you might expect, but the development team is also working on this; we will let you know as soon as it is available (it should be very soon).
You can get information about (already) embedded fonts using the GetFontCount, IsFontEmbedded, GetFontName and GetFontType methods.
You can also add new embedded fonts (of different types) using the AddFontFromFileU, AddStandardFont, AddTrueTypeFont, AddTrueTypeFontFromFile, AddTrueTypeFontFromFileU and AddTrueTypeFontU methods.

Selecting text and image from pdf through any programming language

I'm trying to develop a tool/web application that imports a PDF file and lets me select text and images in the PDF with a mouse click, then mark them as title, content or image using one of three buttons. The marked content and images will be copied to the clipboard or pasted into a Word document, which is going to be another part of the project. Which programming language would make this possible?
I'd probably start by researching a purely browser-side solution using pdf.js and the Clipboard API.
Otherwise, you'd still need the Clipboard API in the browser, and the server side could be powered by any programming language that can be hooked into a web server and has a library to parse PDFs.
You said nothing about your prospective server platform, but to name a few: .NET has PdfSharp, which can read PDFs, and Python has a host of tools available. There is also a bunch of command-line utilities that extract data from PDFs, which can be called from any programming language able to run external processes.
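As an example of calling such a command-line utility from a server-side language, here is a minimal Python sketch that shells out to `pdftotext` from Poppler (an assumed choice of tool; any similar utility works the same way). It assumes `pdftotext` is installed and on PATH; `-layout`, `-f`, `-l` and the `-` output name (write to stdout) are standard `pdftotext` options, while the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def pdftotext_command(
    pdf_path: str, first_page: Optional[int] = None, last_page: Optional[int] = None
) -> List[str]:
    """Build the argument list for pdftotext, keeping the physical layout."""
    cmd = ["pdftotext", "-layout"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]  # last page to convert
    cmd += [pdf_path, "-"]  # "-" sends the text to stdout instead of a file
    return cmd


def extract_text(
    pdf_path: str, first_page: Optional[int] = None, last_page: Optional[int] = None
) -> str:
    """Run pdftotext and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) is not installed or not on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path, first_page, last_page),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Plain text like this loses most positional information, which is part of why the answer below leans towards rendering in the browser instead.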
Note that this only appears to be simpler than using pdf.js. Unless your PDFs are really uniform (say, invoices created by one particular piece of software), so that your parser knows which bits of data it has to extract and return, the parser will need to return all the data it extracted to the client, and you'll have to render it all there somehow. Maybe that's exactly what you need, but maybe not.
Since PDFs are tailored for typesetting rather than for presenting information in a structured manner, I'd try to piggyback on an established, hard-core PDF rendering solution that runs in the browser, such as pdf.js as suggested above.

I need some insight on PDF Bookmarks

I haven't done any in-depth programming with PDFs, only PDF creation with PHP.
I've been brought into a project that requires generating PDF bookmarks with titles taken from selected text.
The scenario goes like this:
The user highlights some text in a given PDF file.
The user is prompted to enter the starting page number for the chapter (bookmark)
A bookmark is created with a title which points to the given page number.
Multi-level bookmarks to handle sub-chapters (like child nodes) should be supported.
Due to some constraints, the client would prefer this to be a web app if possible.
What platform/language/technology/library would you recommend?
Is it doable in a browser? Should this be a desktop app instead?
I am fluent in PHP/JavaScript and capable in Python, with only a tiny bit of experience handling PDF files (nothing beyond generating formatted PDFs), and I'm willing to learn anything new.
I've got some time to dig around and conceptualise it, so I'm very open to suggestions.
Any insight would be appreciated.
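Whatever front end you pick (for example pdf.js in the browser to capture the highlighted text), the bookmark data itself is just an ordered list of (level, title, page) entries. A minimal Python sketch, with hypothetical helper names, that validates entries as the user adds them and builds the nested tree for display. To the best of my knowledge, PyMuPDF's `Document.set_toc()` accepts a list of `[level, title, page]` entries and expects the same rule enforced here (start at level 1, go at most one level deeper at a time), so it could write the outline into the PDF; verify this against its documentation.

```python
from typing import Dict, List, Tuple

Bookmark = Tuple[int, str, int]  # (level, title, 1-based page)


def add_bookmark(toc: List[Bookmark], level: int, title: str, page: int) -> None:
    """Append a bookmark, enforcing a well-formed outline hierarchy."""
    title = " ".join(title.split())  # collapse line breaks from the highlighted text
    if not title:
        raise ValueError("bookmark title is empty")
    if page < 1:
        raise ValueError("page numbers are 1-based")
    previous_level = toc[-1][0] if toc else 0
    if level < 1 or level > previous_level + 1:
        raise ValueError(f"level {level} cannot follow level {previous_level}")
    toc.append((level, title, page))


def as_tree(toc: List[Bookmark]) -> List[Dict]:
    """Turn the flat (level, title, page) list into nested chapter/sub-chapter nodes."""
    root: Dict = {"title": None, "page": None, "children": []}
    stack = [root]  # stack[d] is the most recent node at depth d
    for level, title, page in toc:
        node = {"title": title, "page": page, "children": []}
        del stack[level:]  # return to this entry's parent depth
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

Because `add_bookmark` rejects jumps of more than one level, `as_tree` can always find a parent for each entry.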

Investigating how to add watermarks to various documents

I've been asked to investigate the feasibility of adding watermarks to documents when they are printed through our application. The documents will be Word, PDF and CAD files.
The application's interface is VB6 with a plethora of VC6 DLLs.
I can see a couple of possible solutions:
Convert all documents to PDF, add a watermark and then print.
Find a print driver that adds a watermark to all documents before printing, install it, and re-enable it at runtime if it gets disabled for any reason.
3rd-party suites are a possibility (we use Volo View Express for viewing CAD files), but since this application is nearing end-of-life, we wouldn't want to spend too much on it.
Has anyone had experience with any of the above? Any gotchas that will bog me down?
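The first option boils down to stamping the same watermark page over every page of the converted PDF. A minimal, dependency-free Python sketch of generating that stamp as a one-page PDF; `watermark_pdf` is a hypothetical helper, and the font, grey level and placement are arbitrary choices. Overlaying the stamp onto each page of an existing document still needs a PDF library (for example pypdf's `PageObject.merge_page()`) or a commercial toolkit.

```python
def watermark_pdf(text: str, width: float = 612, height: float = 792) -> bytes:
    """Build a one-page PDF containing `text` in light grey, rotated 45 degrees.

    Uses the standard Helvetica-Bold font (not embedded), so `text` must be
    Latin-1 encodable. Default size is US Letter (612 x 792 points).
    """
    # Escape the characters that are special inside a PDF literal string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # "0.75 g" sets a light grey fill; the Tm matrix rotates the text by 45 degrees.
    content = (
        "q 0.75 g BT /F1 60 Tf "
        f"0.7071 0.7071 -0.7071 0.7071 {width * 0.2:g} {height * 0.25:g} Tm "
        f"({safe}) Tj ET Q"
    ).encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (
            f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width:g} {height:g}] "
            "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>"
        ).encode(),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica-Bold >>",
        b"<< /Length " + str(len(content)).encode() + b" >>\nstream\n"
        + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += f"{number} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode()
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode()
    return bytes(out)
```

Usage: `open("stamp.pdf", "wb").write(watermark_pdf("CONFIDENTIAL"))` writes the stamp page to disk, ready to be merged onto each page before printing.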
Tracker Software has a good set of PDF APIs that will allow you to implement the solution you already have in mind. I've used their image and PDF libraries quite a bit, with a lot of success, in both VB6 and .NET. Single-user licenses are not expensive (depending on how you look at it, I guess), and I've found their support to be excellent as well.