Ghostscript: preserve annotations (keep print/non-print distinction)?

Ghostscript: preserve annotations (keep print/non-print distinction)? - pdf

I'm currently using GS to convert PDFs into PDFs (mostly to convert T1 fonts to CFF).
-dPDFSETTINGS=/prepress -dAutoRotatePages=/Nones -dEmbedAllFonts=true -dSubsetFonts=true -dPrinted=false
These are my flags that I use right now. Because I set dPrinted=false, it preserves the links in my files, even when they are set not to print.
But if I have links and annotations which normally would only show in one medium (on screen), dPrinted seems to force me to choose one way or the other for my converted file. Is there any way to input a PDF which has annotations on screen but not on paper, and output a PDF which has the same distinction?
Thanks.

Currently, no, there is no way to keep both printing and non-printing annotations, you can have one or the other.
By altering Printed you can keep either printing annotations (Printed=true) or non-printed (Printed=false) annotations.
The distinction isn't down to the creation of the output PDF file, but the PDF interpreter, which can only behave, at the moment, as either a screen device or a printer. So it only processes one set of annotations.

Related

Change a PDF's 0.00pt Lines to a Larger Size

In this PDF, the drawings on the second-to-last page apparently use a 0.00pt line width. This makes them almost unreadable on-screen, and completely invisible when printed.
Is there a relatively painless way to change these "no width" lines to have some width? There are lots of small details, so converting to image will not retain enough detail unless an outlandish resolution is used... then the "no width" issue re-emerges.
I've installed GhostScript, ran pdf2ps in.pdf med.ps then ps2pdf med.ps out.pdf and the line weights are exactly the same. Next, I opened med.ps in a text editor, hoping I could make a python script "find and replace" these zero line widths, but I'm seeing nothing like "0 w" in the file. Perhaps it is defined in a macro somewhere, but I'm not seeing it.
This idea came from Change the width of all lines in a PDF programmatically and Thicken line weights when printing PDF.

Best bet is to use a tool to decompress the PDF file (eg, using MuPDF; mutool -d <in.pdf> <out.pdf> or with Ghostscript gs -sDEVICE=pdfwrite -o out.pdf -dCompressPages=false in.pdf) then use a text editor or some kind of scripting tool such as sed to look for "0 w" and replace wiith 'something else'.
PDF isn't a programming language, unlike PostScript, so you can reliably search for operator usage like this in a PDF file, trying to do the same in a PostScript file is, as beginner6789 says above, extremely hard.
If you want to then have the finak file compressed you could run the edited file through Ghostscript's pdfwrite device using something like gs -sDEVICE=pdfwrite -o final.pdf in.pdf.
You absolutely should not use Ghostscript's ps2write device to producce PostScript; the PostScript imaging model is not entirely compatible with PDF, and any PDF constructs which cannot be represented in PostScript (such as any kind of transparency) will be rendered to an image. Really, don't do this.

This could be a problem if there are a lot of different weights used and you just want to change the 0.0 width lines. If they were all 0.0 then placing this early in the page could work unless the postscript looks in the system dictionaries for the command:
/setlinewidth {pop} def
The default linewidth for my ghostscript is 1.0 so that should be used automatically instead of the 0.0 linewidth.
The pdf2ps usually has a lot of pdf style dictionaries so finding the code used for setlinewidth can be confusing. The setlinewidth must be there someplace. Some people like to read postscript.
Pdf files aren't really meant to be edited so I use these options to make reading the final pdf easier: -dCompressPages=false -dCompressStreams=false just in case there is some useful information to look at in the pdf.
EDIT: depending on the code used to create the original postscript there might be labels like this:
dup/LW//knownget exec{
setlinewidth
}if
/w/setlinewidth load def
So there could be LW or w used for setlinewidth like this simple example. Most are not this simple.
EDIT2: There is some good info here:
How to change the width of lines in a PDF/PostScript file

How do you create a PDF for Kindle Direct Publishing with wkhtmltopdf

Kindle Direct Publishing AKA Create Space, wants a PDF/X-1a in 6x9 format with 0.25" outside margins and 0.375" inside/gutter margins, which I need help with, since Qt PDF generator does not do inside and out, so I have to set them both for the largest, and I need to know if my css effects this, if so how, but setting --margin-left .375in --margin-right .375in gives me this error: Currently all margin units must be the same, not sure what that means, why have a left and right if they must both be the same, and does this really have to apply to top and bottom, what is the thinking, so I added it to top and bottom just to make the file, but it is not what I wanted for margins, I wonder if gs can fix this?
If so how.
I know that wkhtmltopdf currently only creates PDF version 1.4, and Kindle does not seem to mind that much on upload, I do not have a published upload yet, so I hope someone has and knows this from experience, because I do not know if they will accept that yet, so I also use Ghost Script to convert that to PDF version 1.7, this is what I have currently:
PDF_Combine is a bash array of files:
PDF_Combine=("file1.html" "file2.html");
Update: Now KDP wants .875in margins, on both sides my content is real small, how dose CSS effect Margins in a PDF, can I set the Margins to 0 in wkhtmltopdf and adjust them in my CSS, if so how, in the body?
wkhtmltopdf --margin-left .375in --margin-right .375in --margin-bottom .375in --margin-top .375in --page-width 6in --page-height 9in --load-error-handling ignore --javascript-delay 3333 --enable-forms --footer-center "[page]/[topage]" "${PDF_Combine[#]}" "/MyPath/MyFileName.pdf"
The Ghost Script:
gs -dPDFA -dBATCH -dNOPAUSE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile="/MyPath/MyOutPutFileName.pdf" "/MyPath/MyInPutFileName.pdf"

PDF/X-1a is a highly specific restricted set of PDF features; for instance you cannot use the RGB colour model when producing a PDF/X-1a
You also need specific keys to be present in the Info dictionary of the PDF file.
Ghostscript does not create PDF/X-1a files. It can create PDF/X-3 files, but that's no good to you.
You can process a PDF file using Ghostscript to add white space around the page, what you need to do is specify a larger media size, and tell Ghostscript its a fixed size, so the PDF file can't change it. Then you need to offset the content up and right on the new media (because otherwise the content will be rendered in the bottom left corner). Note; its not clear to me if you want to increase the media size, or reduce the content so that it fits into the required media, but has margins.
See the answer and comments on the question here.

Ghostscript loses emdash characters and replaces with hyphens

When I run a PDF which was originally created with LibreOffice on Linux, through ghostscript 9.19 on OSX, to produce another (flattened) PDF, the output is perfect except for one problem. All emdashes in the entire document have been replaced with a standard hyphen (awkwardly followed by half of a space.) Oddly enough, if I highlight the resulting "hyphen+space", my context menu shows that I've selected an emdash, so the underlying text is still an emdash, it is just rendering the wrong glyph.
I can reproduce this on multiple documents from the same source, and I'm assuming there's a setting or switch somewhere that can help resolve this.
I don't know whether the font used makes a difference, but for the sake of reference, the body text of my document is set in Arno Pro. When I use a modern version of LibreOffice on OS X to make a sample document also containing an emdash in Arno Pro, the same problem is not exhibited, so it seems to be specific to the software which originally made these PDF files.
These PDFs are of legacy projects that I am not set-up to re-produce at this time, so I need to prepare them for reprinting using the existing files.
How do I retain emdash glyphs when running a command such as the following?
gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite \
-sColorConversionStrategy=/LeaveColorUnchanged \
-dAutoFilterColorImages=true -dAutoFilterGrayImages=true \
-sOutputFile=output.pdf input.pdf
I can add an example of the input PDF to this question if needed.

Without seeing the PDF file it isn't possible to give you an answer. Most likely the font isn't embedded, or if it is embedded doesn't have an emdash glyph.
Copy and paste uses the ToUnicode CMap, so it isn't dependent on the font. Its simply a list of character codes and the Unicode code point associated with each, when using a given font.
Note that this doesn't mean 'the underlying text is still an emdash'. The ToUnicode information is utterly separate from the font end of things, it is effectively metadata and bears no real relation to the font or rendering.
Put the file on DropBox and post the URL and someone can look into it. I'll be on vacation for the next few days though, but maybe someone else will look.
Note that in PDF you don't necessarily specify characters and positions as a list of consecutive characters; you can specify the position of each individually, or you can specify widths which override the width in the font, etc. So there almost certainly is only one glyph, the 'white space' you refer to is probably just that, white space, its not another glyph.
I should also point out (I do this a lot) that Ghostscript never 'flattens', concatenates, merges, or anything similar operation on PDF files. WHen using Ghostscript and the pdfwrite device the original input (in whatever format) is fully interpreted into graphics marking operations, and sent tot eh device. The device executes the marking operations; in the case of a rendering device, it scan-converts and writes to a bitmap. In the case of pdfwrite, it creates PDF operators.
The result of this is that the output PDF file bears no relation to the input PDF, other than its visual appearance.
You also don't say which version of Ghostscript you are using....

Prevent Ghostscript from rasterizing text?

I'm trying to convert PDFs to PCL (using ghostscript, but I'd love to hear alternative suggestions), and every driver (ghostscript device), including all of the built-ins and gutenprint generate PCL files many times larger than the input PDF. (This is the problem - I need my PCL to be about as small as the input).
Given that the text doesn't show up in the PCL file, I guess that Ghostscript is rasterizing the text. Is there a way to prevent GS generally, or just gutenprint, from doing that? I'd rather either have it embed the fonts, or not even embed the fonts (leave it to the printer to render the fonts)?
Unfortunately, there doesn't seem to be any documentation on this point.

There are 3 (I think) types of font in PCL. There are rendered bitmaps, TrueType fonts (in later versions) and the HPGL stick font.
PDF and PostScript Have type 1, 2 (CFF), 3 and 42 (TrueType, but not the same as PCL) and CIDFonts based on any of the preceding types.
The only font type the two have in common is TrueType, so in order to retain text, any font which was not TrueType would have top be converted into TrueType. This is not a simple task. So Ghostscript simply renders the text, which is guaranteed to work.
PDF is, in general, a much richer format than PCL< there are many PDF constructs (fonts, shading, stroke/fill in a single operation, transparency) which cannot be represented in PCL. So its entirely possible that the increase in size is nothing to do with text and fonts.
In fact, I believe that the PXL drivers in Ghostscript simply render the entire page to a bitmap at the required resolution, and then wrap that up with enough PCL to be successfully sent to a printer. (I could be mistaken on this point though)
Basically, you are not going to get PCL of a similar size to your PDF out of Ghostscript.

Here is a way to 'prevent Ghostscript from rasterizing text'. But its output will be PostScript. You may however succeed convert this PostScript to a PCL5e in an additional step.
The method will convert all glyphs into outline shapes for its PostScript output, and it does not work for its PDF or PCL output. The key here is the -dNOCACHE parameter:
gs -o somepdf.ps -dNOCACHE -sDEVICE=pswrite somepdf.pdf
Of course, converting font glyphs to outlines will take more space than keeping the original fonts embedded, because "fonts" are a space-optimized concept to store, retrieve and render glyph shapes.
Once you have this PostScript, you may be able to convert it to PCL5e with the help of either of the methods you tried before for PDF input (including {Apache?} FOP).
However, I have no idea if the output will be much smaller than versions with rasterized fonts (or even wholesome rasterized pages). But it may be worth a test.
Now vote down this answer too...
Update
Apparently, from version 9.15 (to be released during September/October 2014), Ghostscript will support a new command line parameter:
-dNoOutputFonts
which will cause the output devices pdfwrite, ps2write and eps2write to "to 'flatten' glyphs into 'basic' marking operations (rather than writing fonts to the output)".
That means that the above command should be replaced by this:
gs -o somepdf.ps -dNoOutputFonts -sDEVICE=ps2write somepdf.pdf
Caveats: I've tested this with a few input files using a self-compiled Ghostscript based on current Git sources. It worked flawlessly in each case.

Obey the MediaBox/CropBox in PDF when using Ghostscript to render a PDF to a PNG

I've been using Ghostscript to convert my single figure plots rendered in PDF to PNG:
gswin32c -sDEVICE=png16m -r300x300 -sOutputFile=junk.png ^
-dBATCH -dNOPAUSE Figure_001-a.pdf
This works in the sense I get a PNG out and it contains the plot.
But it contains a huge amount of white space as well (an example source image: http://cdsweb.cern.ch/record/1258681/files/Figure_001-a.pdf).
If you view it in Acrobat you'll note there is no white space around the plot. If you use the above command line you'll find the plot is only about 1/3 of the space.
When doing the same thing with an EPS file I run into the same problem. However, there is the command-line parameter -dEPSCrop that one can pass to get the PS rendering engine to pay attention to the BoundingBox.
I need the similar argument for rendering PDFs. I was not able to find it in docs (nor even the -dEPSCrop, actually).

I had exactly the same issue. I fixed it by adding -dUseArtBox switch.
Example:
/usr/bin/gs -dUseArtBox -dNOPAUSE -sDEVICE=pngalpha -sOutputFile=output.png input.pdf
Note: -dUseArtBox switch is supported since ghostscript version 9.07
-dUseArtBox
Sets the page size to the ArtBox rather than the MediaBox. The art box defines the extent of the page's meaningful content (including potential white space) as intended by the page's creator. The art box is likely to be the smallest box. It can be useful when one wants to crop the page as much as possible without losing the content.

There are various options to control which "media size" Ghostscript renders a given input:
-dPDFFitPage
-dUseTrimBox
-dUseCropBox
With PDFFitPage Ghostscript will render to the current page device size (usually the default page size).
With UseTrimBox it will use the TrimBox (and it will at the same time set the PageSize to that value).
With UseCropBox it will use the CropBox (and it will at the same time set the PageSize to that value).
By default (give no parameter), Ghostscript will render using the MediaBox.
For your example, it looks like adding "-dUseCropBox" will do the job you're expecting.
Note, you can additionally control the overall size of your output by using "-sPAPERSIZE" (select amongst all pre-defined values Ghostscript knows) or (for more flexibility) use "-dDEVICEWIDTHPOINTS=NNN -dDEVICEHEIGHTPOINTS=NNN".

Have you tried using pdfcrop using pdftex (comes with texlive for example) or (not tried yet) the python script pdfcrop?
I have a similar workflow using the first tool mentioned.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas