I haven't done any programming to handle PDFs in depth, only PDF creation with PHP.
I've been brought into a project where the requirement is to generate PDF bookmarks with titles taken from selected text.
The scenario goes like this:
The user highlights some text in a given PDF file.
The user is prompted to enter the starting page number for the chapter (bookmark).
A bookmark is created that uses the selected text as its title and points to the given page number.
Multi-level bookmarks (sub-chapters as child nodes) should be supported.
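If the titles and page numbers end up on a server, one possible way to apply them is Ghostscript's pdfmark operator. The sketch below is only an illustration under assumptions: `Bookmark`, `ps_string` and `outline_pdfmarks` are hypothetical names, titles are assumed to be plain ASCII, and page numbers are 1-based. Each `/OUT` pdfmark carries a `/Count` equal to its number of direct children, and those children follow their parent immediately, which is how pdfmark expresses nested outlines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bookmark:
    """One outline entry: the selected text as title, plus the page the user entered."""
    title: str
    page: int  # 1-based page number
    children: List["Bookmark"] = field(default_factory=list)


def ps_string(text: str) -> str:
    """Quote text as a PostScript literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def outline_pdfmarks(bookmarks: List[Bookmark]) -> List[str]:
    """Emit /OUT pdfmarks depth-first; each parent is followed by its children."""
    marks = []
    for bm in bookmarks:
        # /Count is the number of immediate children; it is omitted for leaf entries.
        count = f"/Count {len(bm.children)} " if bm.children else ""
        marks.append(f"[{count}/Page {bm.page} /Title {ps_string(bm.title)} /OUT pdfmark")
        marks.extend(outline_pdfmarks(bm.children))
    return marks


if __name__ == "__main__":
    toc = [
        Bookmark("Chapter 1", 1, [Bookmark("1.1 Setup", 3), Bookmark("1.2 Usage", 7)]),
        Bookmark("Chapter 2", 12),
    ]
    print("\n".join(outline_pdfmarks(toc)))
```

Saved to a file (say `bookmarks.pdfmark`), the output could then be merged with something along the lines of `gs -o out.pdf -sDEVICE=pdfwrite in.pdf bookmarks.pdfmark`. Note that Ghostscript rewrites the whole PDF rather than editing it in place, so it is worth checking the result against the original.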
Due to some constraints, the client would prefer this to be a web app if possible.
What platform/language/technology/library would you recommend?
Is it doable in a browser? Should this be a desktop app instead?
I am fluent in PHP/JavaScript and capable in Python, with only a tiny bit of experience handling PDF files (nothing beyond generating formatted PDFs), and I'm willing to learn anything new.
I've got some time to dig around and conceptualise it, so I'm very open to suggestions.
Any insight would be appreciated.
Related
I have a 300+ page PDF document which needs to have internal page links added to it to reference other pages in the document. The document is created in Visio, which does not support consistent hyperlink generation in PDF export, so the link generation needs to be done on the PDF itself, not up the chain. This is an annual need, and regularly takes over a week due to the amount of manual labor, time, and checking needed.
The text to be hyperlinked has the same format in every case (e.g., "See Section 8.18 - How to Hyperlink"), and I'm certain this can be automated, since there are commercial plugins that do it; however, they cost hundreds of dollars and cannot be used here due to restrictions imposed by my employer. Example: https://www.evermap.com/ABAddingHyperlinks.asp
I've been looking through the Acrobat Plugin SDK and it seems doable, but I know there is also a higher level scripting language available for Acrobat. Does anyone have experience working with PDFs or with the Acrobat scripting / SDK tools? Are there open source methods for doing this? I've looked everywhere! Willing to learn. I've looked at Ghostscript (Adding internal hyperlink to a pdf) but what I need is way more than just a Table of Contents, and links can appear in many places on the page with line breaks, so consistency is a challenge.
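Because every reference follows the same pattern, the "find" half of the job is a good fit for a regular expression over extracted page text. This is a minimal sketch under assumptions: the page text comes from any extraction tool, the section-to-page mapping is assumed to come from the document's table of contents, and `XREF` and `find_cross_references` are hypothetical names. It only identifies which references exist and where they point; placing a clickable link still needs the text's coordinates on the page, which the extraction tool would have to supply.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches the fixed cross-reference format, e.g. "See Section 8.18 - How to Hyperlink".
# \s+ (rather than a literal space) lets a reference that wraps onto the next line still match.
XREF = re.compile(r"See\s+Section\s+(\d+(?:\.\d+)*)\s+-\s+")


def find_cross_references(
    pages: List[str], section_to_page: Dict[str, int]
) -> List[Tuple[int, str, Optional[int]]]:
    """Return (source page, section number, target page or None) for every reference.

    `pages` holds the extracted text of each page, numbered from 1 by position;
    `section_to_page` maps section numbers such as "8.18" to 1-based page numbers.
    """
    found = []
    for page_no, text in enumerate(pages, start=1):
        for match in XREF.finditer(text):
            section = match.group(1)
            # None flags a reference whose target is missing from the mapping.
            found.append((page_no, section, section_to_page.get(section)))
    return found
```

Returning `None` for unknown sections turns the yearly manual checking into a short review list of references that could not be resolved.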
EDIT: I found a solution! Bluebeam software's Revu Extreme works pretty darn well, and can be used as a 30 day free trial of all features. Only limitation is that links which extend across a line break (multiple lines of text) do not properly work in Edge or Chrome's PDF viewer, as they don't properly support hyperlinks with multiple click regions. I've submitted a ticket requesting a feature be added to Revu that fixes this, but for now those links need to be manually fixed following the batch link. The process is described here: https://support.bluebeam.com/online-help/revu2018/Content/RevuHelp/Menus/Batch/Link/Batch-Link--T.htm
You can add hyperlinks to a document with Ghostscript, but you would need to know the location of the text to hyperlink and the destination in advance. You cannot automate this with Ghostscript, or indeed write any reasonably simple code to do so; you would need to modify chunks of the PDF interpreter, which is written in PostScript, and that is not a task for anyone who isn't a PostScript expert.
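To illustrate what "knowing the location in advance" means in practice, here is a minimal sketch that turns already-known rectangles into Ghostscript link pdfmarks. It does not find the text for you. The keys (`/SrcPg` for the page carrying the link, `/Rect`, `/Page`, `/View`) follow Adobe's pdfmark Reference as I understand it, so verify them against the Ghostscript version you use; `link_pdfmarks` is a hypothetical name. Emitting one annotation per line fragment, instead of one annotation with several click regions, may also sidestep the multi-line link problem mentioned in the edit above.

```python
from typing import Iterable, List, Tuple

Rect = Tuple[float, float, float, float]  # x1, y1, x2, y2 in PDF points, origin bottom-left


def link_pdfmarks(src_page: int, rects: Iterable[Rect], target_page: int) -> List[str]:
    """Emit one /Link annotation pdfmark per rectangle.

    A reference that wraps across a line break gets one rectangle per line
    fragment, so each fragment becomes its own simple clickable link.
    """
    marks = []
    for x1, y1, x2, y2 in rects:
        marks.append(
            f"[/SrcPg {src_page} /Rect [{x1} {y1} {x2} {y2}] /Border [0 0 0] "
            f"/Page {target_page} /View [/XYZ null null null] "
            "/Subtype /Link /ANN pdfmark"
        )
    return marks
```

`/Border [0 0 0]` hides the link rectangle, and `/XYZ null null null` keeps the reader's current position and zoom when jumping to the target page.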
You could probably do it with MuPDF, and probably using MuJS to script it, but I don't know enough to be certain. It would still require some coding effort, but it would probably be easier to use JavaScript at least.
I am having trouble finding a solution to the problem described below.
Annotate a PDF file when the user clicks a specific location in it, then save the PDF so that in future it opens at the annotated location.
How to approach this?
What I have tried.
I have looked for libraries regardless of programming language (since the language is not a constraint) and found a few, such as minipdf in Python and PDFBox in Java, to mention the relevant ones. I finally selected PDFBox, since it seemed mature enough to get close to a solution.
The main hurdle now is how to get the location the user clicked. Once I have that location, I can perform various actions, such as annotating at the clicked location and saving the PDF so that it opens at that same location.
It seems I would have to write the whole thing in PDF JavaScript, but again, how would I do that?
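Whatever renders the page, the click has to be converted from screen pixels to PDF user space before an annotation can be placed there. The sketch below is library-agnostic Python (the question chose PDFBox, but the arithmetic is the same in Java) and assumes an unrotated page rendered at a known `scale` in pixels per PDF point; `viewer_to_pdf` and `open_at_pdfmark` are hypothetical names. The second function shows one way to express "open at this spot" as a `/DOCVIEW` pdfmark with an `/XYZ` view; in PDFBox the equivalent would be a document open action with an XYZ destination. If the page is rendered with pdf.js instead, its viewport's `convertToPdfPoint` method performs this mapping, including rotation.

```python
from typing import Tuple


def viewer_to_pdf(px: float, py: float, scale: float,
                  crop_box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Map a click on a rendered page to PDF user-space points.

    (px, py) are pixels measured from the top-left corner of the rendered page,
    `scale` is pixels per PDF point, and crop_box is (x0, y0, x1, y1) in points.
    Assumes the page is not rotated.
    """
    x0, y0, x1, y1 = crop_box
    x = x0 + px / scale
    # PDF's y axis points up from the bottom edge; screen y points down from the top.
    y = y1 - py / scale
    return x, y


def open_at_pdfmark(page: int, x: float, y: float) -> str:
    """/DOCVIEW pdfmark asking the viewer to open `page` scrolled to (x, y)."""
    return f"[/Page {page} /View [/XYZ {x} {y} null] /DOCVIEW pdfmark"
```

For example, a click at pixel (300, 400) on a US Letter page (612 x 792 points) rendered at 2x maps to (150.0, 592.0) in PDF coordinates.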
I had a similar problem and solved it a different way. In my case I am not opening the PDF in Adobe Reader but in a browser, so I converted the PDF to HTML using Python libraries (let me know if you are interested and I will share the library names with their pros and cons).
That HTML can then be edited easily: we can add hyperlinks, highlights and everything else, since we have the source.
This workaround may be applicable to you if your front end is web based.
PS: I wanted to post this workaround as a comment, but couldn't because my reputation is still too low. Hope it won't be downvoted :)
I'm trying to develop a tool/web application that imports a PDF file and lets the user select text and images in it with the mouse, then mark them as title, content or image using one of three buttons. The marked content and images will be copied to the clipboard or pasted into a Word document, which will be a separate part of the project. In which programming language is this possible to build?
I'd probably try researching a purely browser-side solution using pdf.js and the Clipboard API.
Otherwise, you'd still need the Clipboard API in the browser, and the server side could be powered by any programming language that can be hooked into a web server and has a library to parse PDFs.
You said nothing at all about your prospective server platform, but to name a few options: .NET has PdfSharp, which is able to read PDFs, and Python has a host of tools available. There is also a bunch of command-line utilities for extracting data from PDFs, which can be called from any programming language able to spawn external processes.
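As an illustration of the command-line route, here is a minimal Python sketch that shells out to Poppler's `pdftotext`. It assumes `pdftotext` is installed and on the PATH; `-f` and `-l` select the first and last page, `-layout` tries to preserve the physical layout, and `-` as the output file sends the text to stdout. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def pdftotext_cmd(pdf_path: str, first_page: int, last_page: int) -> List[str]:
    """Build the pdftotext argument list; "-" as output means write to stdout."""
    return ["pdftotext", "-f", str(first_page), "-l", str(last_page),
            "-layout", pdf_path, "-"]


def extract_text(pdf_path: str, first_page: int = 1, last_page: int = 1) -> str:
    """Run pdftotext and return the extracted text of the given page range."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) is not installed or not on PATH")
    result = subprocess.run(pdftotext_cmd(pdf_path, first_page, last_page),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Plain text like this loses positions and images, which is one reason the browser-side pdf.js route can be a better fit for click-to-select.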
Note that the server-side route only appears simpler than using pdf.js. Unless your PDFs are really uniform (say, invoices created by some piece of software), so that your PDF parser can know which bits of data to extract and return, the parser will need to return all the data it extracted to the client, and you'll need to render it all there somehow. Maybe that's exactly what you need, but maybe not.
Since PDFs are tailored for typesetting rather than for presenting information in a structured manner, I'd try to piggyback on an existing, hard-core PDF rendering solution that runs in the browser; see above.
I have searched using many different terms and phrases, and waded through many pages of results, but I have (remarkably) not seen anyone else address, or even ask about, this issue. So here goes...
Ultimate Goal: Allow a user viewing a content-based page (may contain both text and images) within a Windows Store app to share that content with someone else.
Description
I am working on taking a fair amount of content and making it available for browsing/navigating as a Windows 8/WinRT/Windows Store (we need a consistent name here) application. One of the desired features is to take advantage of the Share Charm, such that someone viewing a page could share that page with someone else.
The ideal behavior is for the application to implement the Share Source contract which would share an email message that contained some explanatory text, a link to get the app from the Windows Store, and a "deep link" into the shared page in the application.
Solutions Considered
We had originally looked at just generating a PDF representation of the page, but there are very few external libraries that would work under WinRT, and having to include externally licensed code would be problematic as well. Writing our own PDF generation code would be out of scope.
We have also considered generating a Word document or PowerPoint slide using OpenXML, but again we run up against the limitations of WinRT; in this case, it is highly unlikely that the OpenXML SDK is usable in a WinRT application.
Another thought was to pre-generate all of the pages as .pdf files, store them as resources, and, when the Share Charm is invoked, share the .pdf file associated with the current page. The problem here is that the application will have at least 150 content pages and, depending on how we break the content down, possibly over 600. This would likely cause serious bloat.
Where We Are At
Thus we have come to sharing URIs. From what I can tell, though, the "deep linking" feature is only intended for use on Secondary Tiles tied to your application. Another avenue I considered was registering a protocol like, "my-special-app:" with the OS and having it fire up the application but that would require HKCR registry access, which is outside the WinRT sandbox.
If it matters, we are leaning towards an HTML/JS application, rather than XAML/C#, because the converted content will all be in HTML and the WebView control in WinRT is fairly limited. This decision is not yet final, though.
Conclusion
So, is this possible, and if so, how would it be done or where can I find documentation on it?
Thanks,
Dave Parker
I've been asked to investigate the feasibility of adding watermarks to documents when they are printed through our application. The documents will be Word, PDF and CAD files.
The application's interface is written in VB6, backed by a plethora of VC6 DLLs.
I can see a couple of possible solutions:
Convert all documents to PDF, add a watermark and then print.
Find a print driver that adds a watermark to all documents before printing, install it, and re-enable it at runtime if it gets disabled for any reason.
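For the first option, the "add a watermark" step usually comes down to drawing some text into each page's content. The sketch below only builds the PDF content-stream operators for a diagonal, light grey watermark; how you attach it to a page (for example as an extra content stream) depends on the PDF library you pick. It assumes the page's resources already map the font name `/F1` to a font and that the text is plain ASCII; `watermark_stream` is a hypothetical name.

```python
import math


def watermark_stream(text: str, page_w: float, page_h: float,
                     font_size: float = 60, angle_deg: float = 45,
                     gray: float = 0.75) -> str:
    """Build PDF content-stream operators that draw `text` diagonally on a page.

    Assumes /F1 exists in the page's /Resources and `text` is plain ASCII.
    The text baseline starts at the page centre; truly centring the text
    would need the font's glyph widths.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "\n".join([
        "q",                      # save graphics state so the page content is unaffected
        f"{gray} g",              # light grey fill colour for the text
        "BT",
        f"/F1 {font_size} Tf",
        # Text matrix: rotate by angle_deg and place the origin at the page centre.
        f"{c:.4f} {s:.4f} {-s:.4f} {c:.4f} {page_w / 2:.2f} {page_h / 2:.2f} Tm",
        f"({escaped}) Tj",
        "ET",
        "Q",                      # restore the saved graphics state
    ])
```

Appended after the page's own content, the watermark is drawn on top; placed before it, the watermark sits underneath but may be hidden by opaque images, which matters for scanned pages and CAD drawings.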
Third-party suites are a possibility (we use Volo View Express for viewing CAD files), but since this application is nearing end-of-life we wouldn't want to spend too much on it.
Has anyone had any experience with the above? Any gotchas that will bog me down?
Tracker Software has a good set of PDF APIs that will allow you to implement the solution you already have in mind. I've used their Image and PDF libraries quite a bit, with a lot of success, in both VB6 and .NET. Single-user licenses are not expensive (depending on how you look at it, I guess), and I've found their support to be excellent as well.