How do I make an offline front end for over 50 PDF documents?

I have over 50 training documents (PDFs) at work. I would like to create a 'front end' that a user can 'run', which would provide a convenient access portal to all the PDFs available.
This needs to be something I can drop onto my work colleagues' laptops (they don't have Office on there but do have Acrobat). It also needs to be easy to edit and add to as more PDF training materials are created.
I know that I could create a Word document that contained links to the PDFs, then convert that to a PDF itself. Or I could create an offline web page that linked to them, but I wondered if there was a better solution?
Like a way to compile an executable that would bring up a front-end and contain all the PDF files? I've seen similar things for car-repair manuals years ago, where you insert a CD, run an executable and get a nice front-end that essentially just allows you to browse PDF manuals.
Anyone know if this is possible and, if so, how to go about it?
Or does anyone know another viable solution to this?
Thanks

There are indeed various possibilities, depending on what the users have (Acrobat or Reader), and how you can control the distribution.
a) You create a front end PDF document which has links or buttons to open the subsequent documents residing in a subfolder or on the same level as the front end document.
b) You create a front end PDF document into which you embed the subsequent documents as Data Objects. You have buttons which export/open the embedded documents in a different window.
c) You create a front end PDF document into which you embed the subsequent documents as File Attachments (part of the Comments tools). You have buttons which open the embedded documents.
d) You would create a PDF Portfolio in Acrobat, containing the subsequent documents, and maybe provide an overview page from which you can open the documents.
Of these four approaches, a) would work in the largest number of PDF viewers, including those on mobile devices. The downside is that the subsequent documents sit loose in a folder, and your users may move, rename or delete them.
The most elegant (and app-like) approach would be b). However, it requires a capable PDF viewer, and you would have to make sure that the users' viewer supports it.
Approach c) would be a compromise between integrity and portability. Approach d) would be quite nice for distribution, but it requires a PDF viewer by Adobe and most likely will not work in any mobile viewer.
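For the offline web page option raised in the question, the index can be regenerated by a small script whenever documents are added. The following is a minimal Python sketch, not a definitive tool. It assumes Python is available on the machine that builds the index (the colleagues' laptops only need a browser), and the function name build_index and the default title are made up for illustration.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(folder, title="Training documents"):
    """Write index.html in `folder`, linking to every PDF found beneath it."""
    root = Path(folder)
    # Collect PDFs recursively; compare the suffix case-insensitively.
    pdfs = sorted(
        (p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf"),
        key=lambda p: p.relative_to(root).as_posix().lower(),
    )
    items = []
    for pdf in pdfs:
        rel = pdf.relative_to(root).as_posix()
        # Relative, percent-encoded links keep working when the folder is copied.
        items.append(
            '<li><a href="{}">{}</a></li>'.format(quote(rel), html.escape(rel))
        )
    page = (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        "<title>{0}</title></head>\n<body><h1>{0}</h1>\n<ul>\n{1}\n</ul>\n"
        "</body></html>\n"
    ).format(html.escape(title), "\n".join(items))
    out = root / "index.html"
    out.write_text(page, encoding="utf-8")
    return out
```

Running build_index on the folder that holds the PDFs and then copying the whole folder to a laptop keeps the links intact, because they are relative to index.html.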

Related

I'd like to recognize the text of all pdfs on my computer and save them without moving them from their locations. Is it possible?

I've tried using Adobe Acrobat X Pro to "recognize text in multiple files."
When I started this process and it asked for the directory, I chose C:, my main hard drive.
It took hours to load, and when it did, the list of files it generated included Word documents as well. Adobe said I couldn't proceed until I removed the problem files.
Once I had removed all the PDFs Adobe flagged as having errors (like password protection) and the prompt remained, I assumed it meant the Word documents in the list.
So I manually removed those too. But Adobe still said that I couldn't proceed until problem files were removed, even though there weren't any remaining files in the list that Adobe had flagged as having issues.
My firm is trying to make sure all the PDFs we have are searchable. Currently, some are and some aren't. Our goal is to make them all searchable without removing them from their varied locations.
I think you can do this using a combination of
regular Java : to list all files in a directory that match a given criterion (e.g. their name ends with '.pdf')
iText : to iterate over the PDF document and extract all images
Tess4J : a port of Tesseract (Google's OCR engine) for Java, to turn the extracted images back into text
Unless I am much mistaken, Tesseract even offers a crude version of this workflow for you, but only for one PDF at a time. So you'd still need some Windows/Linux scripting to pipe in all files of a given directory.
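The first step, walking the drive and keeping only the PDFs, can be sketched as follows. This is an illustration in Python rather than Java, under stated assumptions: the names find_pdfs and ocr_all_in_place are made up, and ocr_one is a hypothetical callback standing in for whatever OCR step is used (for example the iText plus Tess4J combination above). Failures are collected rather than stopping the whole run, which is the behaviour the Acrobat batch tool lacked.

```python
import os


def find_pdfs(root):
    """Yield the path of every file under `root` whose name ends in .pdf."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Filter by extension so Word documents and other files are skipped.
            if name.lower().endswith(".pdf"):
                yield os.path.join(dirpath, name)


def ocr_all_in_place(root, ocr_one):
    """Call `ocr_one(path)` for each PDF and collect failures instead of stopping.

    `ocr_one` is a hypothetical callback: it is expected to make the file
    searchable and save it back to the same path, so nothing is moved.
    """
    failures = []
    for path in find_pdfs(root):
        try:
            ocr_one(path)
        except Exception as exc:  # e.g. password-protected or damaged files
            failures.append((path, str(exc)))
    return failures
```

The returned list of (path, error message) pairs can be reviewed afterwards to deal with the problem files by hand.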

Get selected "PostScript" from PDF

I wasn't able to find anything on the internet and I get the feeling that what I want is not such a trivial thing. To make a long story short: I'd like to get my hands on the underlying code that describes the PDF document of a selected area from a .pdf file. I've been looking for libraries or open source readers but couldn't find anything useful yet.
Does there exist something that might be able to accomplish my needs here or anything that might be reused (like an open source reader) to get there a little faster and not having to write everything from scratch?
You can convert a whole PDF document to PostScript using pdftops, one of the utilities from the poppler PDF rendering library.
This utility enables you to convert individual pages, which is at least a start.
If you just want to extract bitmapped images, try pdfimages from the same package. This extraction can also be restricted to individual pages.
The poppler library was originally written for UNIX-like systems, but there are a couple of Windows builds available.
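As a rough sketch of restricting both utilities to a page range, the commands can be assembled like this. It assumes pdftops and pdfimages are installed and on the PATH, and uses their -f (first page) and -l (last page) options; the Python helper names are made up for illustration.

```python
import subprocess


def pdftops_command(pdf_path, ps_path, first_page, last_page):
    """Build a pdftops command that converts only the given page range."""
    return ["pdftops", "-f", str(first_page), "-l", str(last_page), pdf_path, ps_path]


def pdfimages_command(pdf_path, image_root, first_page, last_page):
    """Build a pdfimages command that extracts images from the given page range."""
    return ["pdfimages", "-f", str(first_page), "-l", str(last_page), pdf_path, image_root]


def run(command):
    """Run a command built above; raises CalledProcessError on failure."""
    subprocess.run(command, check=True)
```

For example, run(pdftops_command("manual.pdf", "page3.ps", 3, 3)) would convert only page 3 of a hypothetical manual.pdf. Note that this still yields the PostScript for a whole page, not for a selected area within it.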
The open source tool from iText called iText RUPS does what you want: it shows you all the PDF commands for a particular PDF and allows you to visualize the structure and relationships.
http://sourceforge.net/projects/itextrups/

Save data as editable Pdf

We have software that creates user reports and saves them as PDF documents. We're using Ghostscript for this.
I'm aware that PDF is "normally" an export format which is not editable, but one of our customers needs to be able to edit these files (for legal reasons).
I thought it might be possible to put the text in fillable form fields (like Adobe Acrobat offers) and save it that way. Is it possible to create text within a fillable form in a PDF and save it (with free tools like Ghostscript), so that the user can edit it later?
I read the Ghostscript documentation, but I didn't find anything.
Ghostscript isn't really a terrific tool for this. You'd be better off with a PDF generation library which can add the appropriate annotations to the page, if you're wedded to using annotations.
If the "content" must be edited by end users, using widget annotations is not a horribly bad way of doing things, except that every end user needs to have a copy of Acrobat. If only some people are allowed to edit, you will likely have to play with owner password protection and permissions in order to prevent everyone else from changing field contents.
As for free tools, depending on the usage you could use iText or iTextSharp.
If you are required to take the content of the document and make changes to it on the fly, that's a trickier beast. If you can afford it (and it's certainly not free), my company, Atalasoft, publishes a product that I wrote that lets you build PDF documents from scratch or from templates and embed the .NET objects that create the content into the PDF itself. That means you can read those objects back out and change the content with a site-specific application, for example.

Parse InDesign (.indd) files for search index

Could any of you help me with the following:
I have a large number of InDesign documents, and I need to be able to search through their text. I don't have the resources to open each of these files, make a PDF, and then do the search. In short, I want to be able to either extract the textual content and index that, or index the file itself directly.
In the end, I would present the content or the index to a SOLR engine for further processing. This all should take place in a php/apache/mysql environment.
Your insights are highly appreciated.
In order to search the textual contents of an InDesign file, you will have to open the file in InDesign or InDesign Server. There is no legal way around this.
However, there is no need to do a time-consuming PDF export. You can use the InDesign scripting API to search through the text content of the file and create an index either inside the document or in an external location.
I think you might be looking for an application that can read InDesign files and let you edit their text without actually having InDesign?
If so, I may be wrong, but there is a product on the market called PageZephyr, from Markzware.
You should look into it; I believe there's a 30-day free demo as well. I used it a while ago and it worked great and saved me tons of time. I don't have many InDesign files nowadays though.
Google them.

Dynamic PDF features

I've been asked to write a program which generates reports in the form of PDF files. There are two main dynamic features which have been asked for, which I'm not sure are even possible:
1) The report contains a table with several columns. Users should be able to click on the column header to sort the table rows by the values in that column.
I've never seen a PDF file that users can click on to re-sort table results, but I'm told that this is possible.
2) The report should have a dropdown box which users can select to toggle which rows of the table are displayed or hidden.
I'm fairly sure that this isn't possible to do in a PDF file, though I've been told otherwise.
So my question is, which of these things are even possible, and what library should I use for generating PDF files? (The library can be in any programming language.)
Don't use PDF as a substitute for html/CSS/JavaScript/etc. PDF is best when it's used as an immutable document format, not as a poor man's web page. Sure, you can put your foot in a box and call it a shoe, but it's really just a box.
Have a look at
Sorting tables in dynamic PDF on the Adobe Developer Connection website.
You can also download a ready-to-study sample PDF with that feature built in.
I would look at Acrobat. There is a JavaScript implementation for it.
http://www.adobe.com/devnet/acrobat/javascript.html
For Java there are the following tools / libraries that are very good and stable:
JasperReports - you design your report in a graphical designer and then populate it with data programmatically.
The other is iText. It works at a lower level (actually JasperReports is built on top of it for the PDF part), so it might support the requested sorting options.
Yes, all of those dynamic features are possible with an XFA PDF form (created in LiveCycle Designer) and scripting (JavaScript). We have examples of sorting rows in tables and hiding and showing sub-forms at http://www.pdfscripting.com , but you must be a member to access them (not free). You may be able to find free sample files by doing an internet search for XFA PDFs or LiveCycle Designer PDFs. I'm not sure about that, but the features themselves are possible at any rate.
Dimitri
WindJack Solutions
http://www.windjack.com