Save data as editable Pdf - pdf

We have a software, which creates user reports and saves them into pdf documents. We're using Ghostscript for this.
I'm aware that PDF is "normally" an export format which is not editable, but one of our customer needs the possibility (for legal reasons) to edit these files.
I thought it can be possible to save the text in fillable forms (like adobe acrobat offers) and save it that way. Is it possible to create Text within a fillable form in a PDF and save it (with free tools like Ghostscript), so that the user can edit it later?
I read the Ghostscript documentation, but I didn't find anything.

GhostScript isn't really a terrific tool for this. You'd be better off with a PDF generation library which can add the appropriate annotations to the page - if you're wedded to using annotations.
If the "content" must be edited by end users, using widget annotations is not a horribly bad way of doing things, except that every end user needs to have a copy of Acrobat and if only some people are allowed to edit, you will likely have to play with owner password protection and permissions in order prevent anyone from changing field contents.
As for free tools, depending on the usage you could use iText or iTextSharp.
If you are required to be able to take the content of the document and be able to make changes to it on the fly, that's a trickier beast. If you can afford it (and it's certainly not free), my company Atalasoft, publishes a product that I wrote that lets you build PDF documents from scratch or from templates and embed the .NET objects that create the content into the PDF itself, which means that you can read those objects back out and change the content with a site-specific application, for example.

Related

How do I make an offline front end for over 50 pdf documents?

I have over 50 training documents (PDFs) at work. I would like to create a 'front end' that a user can 'run', which would provide a convenient access portal to all the PDFs available.
This needs to be able to be dropped on to my work colleagues laptops (they don't have Office on there but do have Acrobat). And it also needs to be able to be edited/added to as more PDF training materials are created.
I know that I could create a Word document that contained links to the PDFs, then convert that to a PDF itself. Or I could create an offline web page that linked to them, but I wondered if there was a better solution?
Like a way to compile an executable that would bring up a front-end and contain all the PDF files? I've seen similar things for car-repair manuals years ago, where you insert a CD, run an executable and get a nice front-end that essentially just allows you to browse PDF manuals.
Anyone know if this is possible and, if so, how to go about it?
Or does anyone know another viable solution to this?
Thanks
There are indeed various possibilities, depending on what the users have (Acrobat or Reader), and how you can control the distribution.
a) You create a front end PDF document which has links or buttons to open the subsequent documents residing in a subfolder or on the same level as the front end document.
b) You create a front end PDF document into which you embed the subsequent documents as Data Objects. You have buttons which export/open the embedded documents in a different window.
c) You create a front end PDF document into which you embed the subsequent documents as File Attachments (part of the Comments tools). You have buttons which open the embedded documents.
d) You would create a PDF Portfolio in Acrobat, containing the subsequent documents, and maybe provide an overview page from which you can open the documents.
Of these three approaches, a) would run in the biggest number of supporting PDF viewers, in particular also mobile devices. The downside is that you have the subsequent documents around loosely, and your users may mess up with them.
The most elegant (and app-like) approach would be b). However, it requires smart PDF viewers, and you would have to make sure that the user's viewer is not too dumb.
Approach c) would be a compromise between integrity and portability, and approach d) would be quite nice for distributing, but does require a PDF viewer by Adobe, and may most likely not work in any mobile viewer.

Get selected "PostScript" from PDF

I wasn't able to find anything on the internet and I get the feeling that what I want is not such a trivial thing. To make a long story short: I'd like to get my hands on the underlying code that describes the PDF document of a selected area from a .pdf file. I've been looking for libraries or open source readers but couldn't find anything useful yet.
Does there exist something that might be able to accomplish my needs here or anything that might be reused (like an open source reader) to get there a little faster and not having to write everything from scratch?
You can convert a whole PDF document to PostScript using pdftops, one of the utilities from the poppler PDF rendering library.
This utility enables you to convert individual pages, which is at least a start.
If you just want to extract bitmapped images, try pdfimages from the same package. This extraction can also be restricted to individual pages.
The poppler library was originally written for UNIX-like systems, but there are a couple of windows builds available.
The open source tool from iText called iText RUPS does what you want, showing you all the PDF commands for a particular PDF and allow you to visualize the structure and relationships.
http://sourceforge.net/projects/itextrups/

progressive pdf download

I want to download a pdf file progressively in an iPad application. I m not sure how to do that and google wasn't very helpful. can anyone help me understand the concepts here please. I am planning to render in core graphics.
Thanks.
Do you mean you want to render pdf pages before download is completed? If yes:
First of all, PDF format initially was not designed for that.
Let me explain. PDF file consists of a number of objects and xref. xref is a table containing location (in bytes from the beginning) of every object withing the file, so objects may be located at random locations withing the file. Even worse, xref itself is located at the end of file, so you can't locate any object in the file until you download it.
So, PDF is designed for random access. Actually, HTTP protocol allows it, so if you really need it, you can try to implement it :)
Good news for you: starting from PDF-1.2 there is a special feature called "Linearized PDF". It is designed exactly for your task, so you can render the first page before the next one if downloaded. You can google around or check out pdf reference for more details. The most important thing: you have to linearize pdf file using special tools, so not every pdf file can be rendered progressively.
Bad news for you: looks like core graphics doesn't support. I didn't tried it actually, but I found nothing re linearized pdf in core graphics documentation. (Please let me know if you will find anything.) So you may need to render PDF manually.
Not entirely sure about for iPad, but doing a Save as... in Acrobat by default it will be optimized as Fast Web View, which allows downloads a page at a time instead of the whole document in one go.
http://www.adobe.com/designcenter-archive/acrobat/articles/acr6optimize/acr6optimize.pdf
Linearzied PDF will meet your needs. You need a capable reader such as the one from Adobe to utilize this feature.

Update a PDF to include an encrypted, hidden, unique identifier?

Background
The idea is this:
Person provides contact information for online book purchase
Book, as a PDF, is marked with a unique hash
Person downloads book
PDF passwords are easy to circumvent, or share
The ideal process would be something like:
Generate hash based on contact information
Store contact information and hash in database
Acquire book lock
Update an "include" file with hash text
Generate book as PDF (using pdflatex)
Apply hash to book
Release book lock
Send email with book download link
Technologies
The following technologies can be used (other programming languages are possible, but libraries will likely be limited to those supplied by the host):
C, Java, PHP
LaTeX files
PDF files
Linux
Question
What programming techniques (or open source software) should I investigate to:
Embed a unique hash (or other mark) to a PDF
Create a collusion-attack resistant mark
Develop a non-fragile (e.g., PDF -> EPS -> PDF still contains the mark) solution
Research
I have looked at the following possibilities:
Steganography
Natural Language Processing (NLP)
Convert blank pages in PDF to images; mark those images; reassemble PDF
LaTeX watermark package
ImageMagick
Issues
The possible solutions I have researched have the following issues:
Steganography. (a) Requires a master copy of the images, which are converted to EPS, which is CPU-intensive and time-consuming; (b) would the watermark survive PDF -> EPS -> PDF, or other types of conversion; (c) most images are drawings or screen captures, not photographs in PNG format.
LaTeX. Creates an image cache; any steganographic solution would have to intercept that process somehow.
NLP. Introduces grammatical errors; could change meaning of technical words.
Blank Pages. Immediately suspect; it is easy to replace suspicious blank pages.
Watermark Package. Draws visible marks.
ImageMagick. Draws visible marks.
What other solutions are possible?
Related Links
http://www.tcpdf.org/
invisible watermarks in images
Thank you!
I've done this for another project with PDFlib. We needed traceability for the generated PDFs in case the file was leaked. Basically:
Created a source template PDF with the content in place, set the document master password with the required options (no edit, no print, no screen-reader, etc...) set
At runtime, we applied a few watermarks (imposed page footer saying "This document checked out to user #12345", set a few of the metadata fields with user ID, download IP, download date/time, added a "this document copyright by..." cover page, etc...)
Optionally attach a user password to force a PW prompt when document is opened.
Since the latest PDF versions use AES-128 for their encryption, we just set a suitable randomly generated 128char high-entropy password - no one would ever be typing it in by hand so hard-to-typedness was irrelevant to us and actually preferable. The master password prevented end-users from making any changes to the document. The various noprint/no screen read options are actually enforced by the PDF reader and therefore bypassable, but can't hurt to set them anyways.
The downside to this is that PDFlib's licensing is fairly steep. I don't know if any of the free php PDF libraries support the latest PDF encryption schemes, especially the master password stuff, but if you budget can support it, PDFlib's the way to go for secure document production.

Create destinations for all bookmarks in a PDF file with iText API

I'd like to write some (java) code that takes a PDF document, and creates named destinations from all of the bookmarks. I think the iText API is the easiest way of doing this, but I have never used the API before.
How would you go about writing this sort of code with the iText API? Can iText do the parsing needed to manipulate existing PDFs by itself? The kind of manipulations I am thinking of are:
Open,
Find bookmarks,
Create destinations,
Save,
Close.
Or is there a different API that would be better?
Followup: I submitted a patch to iText a few months ago (it has now been accepted and is part of HEAD) that adds text parsing capabilities to iText. PdfBox (mentioned below) has (had?) problems with reading newer PDFs that use xref streams instead of the older xref table format.
Another library that is very good at parsing existing PDF files is PdfBox It can also be used for modifying an existing PDF. FYI - this is the text parser that Lucene uses.
I will also mention that iText does have the ability to parse a PDF file, it's just not great at parsing the text content on each page. If you are looking at accessing the PDF higher level constructs (Dictionaries, etc...) that are used for storing bookmarks, etc... and you don't mind getting your hands a little dirty with reading the PDF spec, you can absolutely do what you are asking about (we do it quite a bit ourselves).
The PDF Spec is big, but readable for the most part, and you don't have to worry about the bulk of it (which is geared towards actual page content and rendering) if all you are trying to do is extract bookmarks.
I'll just warn you up front that you may be disappointed with this. iText isn't really intended to be used as a parser. It's really more for creating entirely new PDF documents, but you can take a whack at it.
To start, using iText, you won't be able to modify the existing PDF document. What you can do, though, is to make a copy with the additional features that you want. (If somebody else knows better, please let me know, this drives me crazy.)
What you will want to do is create a PdfReader object from an input stream on your source file. Then create a PdfCopy object (which is just an extended PdfWriter that makes getting data from an existing source more convenient) for your destination.
As far as I can tell, the bookmarks cannot be obtained from iText at all. Another library may be needed. I think jpedal may have the ability to extract them (it can get them as an XML document, which you may then have to parse to get what you want.) However you get them, you can then add them to a java.util.List, and set that list as outline on the PDFCopy. The bookmarks themselves are just HashMaps with a particular set of keys. I'm not sure what all of the values are, but they include "Title", "Action" (which seems to be where you'd specify that this is a named destination, though I don't know what that value would be), and "URI" (which is used if this is an external link -- I suspect that this would specify the name of the named destination that you're linking to). Again, the specifics are hard to find.
Then iterate over the pages of the reader, importing each page to the PdfCopy. this page may help you.
Sorry I'm not more helpful to you. Good luck.
P.S. If anybody else knows of a better tool that's either (L)GPL or BSD licensed, I'd love to hear about it.