Link to pages of PDFs created with GhostScript - pdf

We have created PDFs from converting individual PostScript pages into a single PDF (and embedding appropriate fonts) using GhostScript.
We've found that an individual page of the PDF cannot be linked to; for example, through the usage of
http://xxxx/yyy.pdf#page=24
There must be something within the PDF that makes this not possible. Are there any specific GhostScript options that should be passed when creating the PDF that would allow this type of page-destination link to work?

There are no specific pdfwrite (the Ghostscript device which actually produces PDF) options to do this. Without knowing why the (presumably) web browser or plugin won't open the file at the specified page its a little difficult to offer any more guidance.
What are you using to view the PDF files ?
Can you make a very simple file that fails ? Can you make that file public ?
If I can reproduce the problem, and the file is sufficiently simple, it may be possible to determine the problem. By the way, which version of Ghostscript are you using ?

Related

Is it possible to obfuscate PDF file binary data?

Is it possible to obfuscate the bytes that are visible when a PDF file is opened with a hex editor? Also, I wonder if there is any problem in viewing the contents of the PDF file even if it is obfuscated.
You will always be able to see whatever bytes are within a file using a hex editor.
There might be ways to generate your pdf pages using methods that don't involve directly writing the text into the pdf (for example using javascript that's obfuscated).
Like answered above, the bytes of the file are always visible when being viewed with a hex-editor. However there are some options to hide/protect data in the file:
You could encrypt either the whole pdf or partial datasets. Note that an encryption/decryption always requires a secret. When the file is fully encrypted you can't read it without the key.
You can add additional similiar dataframes but set them invisible in the pdf. Note that this technique blows up the size of the file.
You can use scripting languages which dynamicly build up your pdf. Be aware that this could look suspicious to users or any anti-virus software.
You can use tools steganography to hide your data. For example a tool you could use is steghide
You can simply compress datastreams in the pdf, e.g. using gzip or similiar compression tools. That way you can't read it directly. However that is easy to recognize and to uncompress for anyone.

GhostScript creating extra page when font errors occur

I have a process that needs to write multiple postscript and pdf files to a single postscript file generated by, and that will continue to be modified by, word interop VB code. Each call to ghostscript results in an extra blank page. I am using GhostScript 9.27.
Since there are several technologies and factors here, I've narrowed it down: the problem can be demonstrated by converting a postscript file to postscript and then to pdf via command line. The problem does not occur going directly from postscript to pdf. Here's an example and an example of the errors.
C:\>"C:\Program Files (x86)\gs\gs9.27\bin\gswin32c.exe" -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=C:\testfont.ps C:\smallexample.ps
C:\>"C:\Program Files (x86)\gs\gs9.27\bin\gswin32c.exe" -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=C:\testfont.pdf C:\testfont.ps
Can't find (or can't open) font file %rom%Resource/Font/TimesNewRomanPSMT.
Can't find (or can't open) font file TimesNewRomanPSMT.
Can't find (or can't open) font file %rom%Resource/Font/TimesNewRomanPSMT.
Can't find (or can't open) font file TimesNewRomanPSMT.
Querying operating system for font files...
Didn't find this font on the system!
Substituting font Times-Roman for TimesNewRomanPSMT.
I'm starting with the assumption that the font errors are the cause of the extra page (if only to rule that out, I know it is not certain). Since my ps->pdf test does not exhibit this problem and my ps->ps->pdf does, I'm thinking ghostscript is not writing font data that was in the original postscript file to the one it is creating. I'm looking for a way to preserve/recreate that in the resulting postscript file. Or if that is not possible, I'll need a way to tell ghostscript how to use those fonts. I did not have success attempting to include them as described in the GS documentation here: https://www.ghostscript.com/doc/current/Use.htm#CIDFontSubstitution.
Any help is appreciated.
I've made this an answer, even though I'm aware it doesn't answer the question, becasue it won'f fit as a comment.
I think your assumption that the missing fonts are causing your problem is flawed. Many PDF files do not embed all the fonts they require, I've seen many such examples and they do not emit extra pages.
You haven't been entirely clear in your description of what you are doing. You describe two processes, one going from PostScript to PDF and one going from PostScript on to PostScript (WHY ?) and then to PDF.
You haven't described why you are processing PostScript into a PostScript file.
In particular you haven't supplied an example file to look at. Without that there's no way to tell whether your experience is in fact correct.
For example; its entirely possible that you have set /Duplex true and have an odd number of pages in your file. This will cause an extra blank page (quite properly) to be emitted, because duplexing requires an even number of pages.
The documentation you linked to is for CIDFont substitution, it has nothing to do with Font substitution, CIDFonts and Fonts are different things in PDF and (more particularly) PostScript. But I honestly doubt that is your problem.
I'd suggest that you put (at the least) 'smallexample.ps' somewhere public and post the URL here, that way we can at least follow the same steps you are doing. That way we can probably tell you what's going on. An explanation of why you're doing this would be useful too, I would normally strongly suggest that you don't do extra steps like this; each step carries the risk of degarding the output in some way.
Thank you for the response. I am posting as an answer as well due to the comment length restrictions:
I think you are correct that my assumption about fonts is wrong. I have found the extra page in the second ps file and do not encounter the font errors until the second conversion.
I have a process that uses VB MSWord Interop libraries to print multiple documents to a single ps file using a virtual printer set up with ghostscript and redmon. I am adding functionality to mix in pdf files too. It works, but results in an extra page. To narrow down where the problem actually was, I tried much simpler test cases via command line to identify the problem. I only get the extra page when ghostscript is converting ps to ps (whether or not there is a pdf as well). Converting ps to pdf I do not get the extra page. Interestingly, I can work around the problem by converting the ps to pdf and then both pdfs back to ps. That is a slower and should not be necessary however, so I would like to identify and resolve the extra page issue. I cannot share that particular file. I'll see if I can create an example I can share that also exhibits the problem. In the meantime, I can confirm that the source ps file is six pages and the duplexing settings are as follows. There is duplex definition in the resulting ps file with the extra page. Might there be some other common culprits I could check for in the source ps? Thank you.
featurebegin{
%%BeginFeature: *DuplexUnit NotInstalled
%%EndFeature
}featurecleanup
featurebegin{
%%BeginFeature: *Duplex None
<</Duplex false /Tumble false>> setpagedevice
%%EndFeature
}featurecleanup

Get selected "PostScript" from PDF

I wasn't able to find anything on the internet and I get the feeling that what I want is not such a trivial thing. To make a long story short: I'd like to get my hands on the underlying code that describes the PDF document of a selected area from a .pdf file. I've been looking for libraries or open source readers but couldn't find anything useful yet.
Does there exist something that might be able to accomplish my needs here or anything that might be reused (like an open source reader) to get there a little faster and not having to write everything from scratch?
You can convert a whole PDF document to PostScript using pdftops, one of the utilities from the poppler PDF rendering library.
This utility enables you to convert individual pages, which is at least a start.
If you just want to extract bitmapped images, try pdfimages from the same package. This extraction can also be restricted to individual pages.
The poppler library was originally written for UNIX-like systems, but there are a couple of windows builds available.
The open source tool from iText called iText RUPS does what you want, showing you all the PDF commands for a particular PDF and allow you to visualize the structure and relationships.
http://sourceforge.net/projects/itextrups/

progressive pdf download

I want to download a pdf file progressively in an iPad application. I m not sure how to do that and google wasn't very helpful. can anyone help me understand the concepts here please. I am planning to render in core graphics.
Thanks.
Do you mean you want to render pdf pages before download is completed? If yes:
First of all, PDF format initially was not designed for that.
Let me explain. PDF file consists of a number of objects and xref. xref is a table containing location (in bytes from the beginning) of every object withing the file, so objects may be located at random locations withing the file. Even worse, xref itself is located at the end of file, so you can't locate any object in the file until you download it.
So, PDF is designed for random access. Actually, HTTP protocol allows it, so if you really need it, you can try to implement it :)
Good news for you: starting from PDF-1.2 there is a special feature called "Linearized PDF". It is designed exactly for your task, so you can render the first page before the next one if downloaded. You can google around or check out pdf reference for more details. The most important thing: you have to linearize pdf file using special tools, so not every pdf file can be rendered progressively.
Bad news for you: looks like core graphics doesn't support. I didn't tried it actually, but I found nothing re linearized pdf in core graphics documentation. (Please let me know if you will find anything.) So you may need to render PDF manually.
Not entirely sure about for iPad, but doing a Save as... in Acrobat by default it will be optimized as Fast Web View, which allows downloads a page at a time instead of the whole document in one go.
http://www.adobe.com/designcenter-archive/acrobat/articles/acr6optimize/acr6optimize.pdf
Linearzied PDF will meet your needs. You need a capable reader such as the one from Adobe to utilize this feature.

Web served image in PDF?

Does PDF and/or Adobe Reader support including an image by URL so that you can insert a dynamic images from a web server into a document?
The answer to your question is both yes and no. If you look in the PDF spec (I'm going by version 1.7) in section 7.11.5, you'll see that a stream within a PDF document can be represented by an URL. So yes, you can go ahead and specify that a PDF has, say, its image content in the specified URL.
The problem will be that when you specify an image within PDF, you are specifying a PARTICULAR image that must have a particular data length and encoding. Simply specifying dimensions, dct compression (aka jpg), and URL is not enough. Images are contained in streams of a particular length. If the stream is too long or too short, it is considered an error.
So you can have images dynamically served up, provided that they are always exactly the same byte length. I think. And I say this because the specification is somewhat ambiguous as to what happens when you set the length to 0 in the stream dictionary.
Now, is doing this practical? Maybe - you'll need a fairly strong PDF toolkit in order to be able to author these documents. And if you have that, I think you'd be better off authoring the entire PDF document that your clients want on the fly rather than trying to substitute an image at read time.
I don't believe you can place a dynamic image in a PDF document in this manner. It's possible to dynamically create an entire PDF document using web-hosted content (using PHP, Coldfusion, etc.) but changing that content later on the web server will not dynamically update previously generated PDF documents, which is what it sounds like you want to do.
As PDFs are meant to be portable by nature (PORTABLE Document Format) and thus, not always viewed online, this goes against the very principle of the document format, and is not supported as far as I know.
You could include a reference to an image at the time of generation of the PDF, but said image will embedded into the PDF, not linked.
You could use pdf.js and modify the rendering methods slightly so that you inject your image. You can find pdf.js here: https://github.com/mozilla/pdf.js
You can also use FlexPaper which has an API that allows you to overlay your document with images
http://flexpaper.devaldi.com/