Converting commented PDF with Ghostscript but without the comments - pdf

Gentlepeople,
I'm using the command line version of Ghostscript for Windows to convert PDF to PNG images. However, I noticed that the annotations (such as comments, shapes, attached files: anything the user can put on top of the original PDF) are also converted and appear in the image output. Is there any way to make Ghostscript ignore comments in a PDF?
Your help is appreciated :-)

I had the same question. I found a setting in Ghostscript that turns off rendering of comments (called annotations in their documentation): http://www.ghostscript.com/doc/current/Use.htm
The switch is -dShowAnnots=false, which is case sensitive. For example, to convert a file to PNG (which was also what I wanted to do), you would use something like:
gswin64c -sDEVICE=png16m -sOutputFile="OutFile.png" -r300 -dShowAnnots=false "InputFile.pdf"
Using this command line format gave me exactly what I wanted: The first page of the source PDF converted to true-color PNG format without transparency, at 300 DPI, without any of the comments from the PDF.
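If you are scripting the conversion, here is a minimal Python sketch that builds the same command as an argument list. The function name build_png_command and its parameters are my own invention, not part of Ghostscript. I have added -dNOPAUSE/-dBATCH so the run does not stop at a prompt, and -dFirstPage/-dLastPage on the assumption that you want exactly one page per output file, since a single output name without %d cannot hold more than one PNG page.

```python
def build_png_command(input_pdf, output_png, dpi=300, page=1, gs="gswin64c"):
    """Return a Ghostscript argument list that renders one page without annotations.

    The name and parameters of this helper are hypothetical; only the
    switches inside the list are Ghostscript's own.
    """
    return [
        gs,
        "-dNOPAUSE", "-dBATCH",      # run unattended
        "-sDEVICE=png16m",           # 24-bit colour PNG, no transparency
        f"-r{dpi}",                  # resolution in DPI
        f"-dFirstPage={page}",       # restrict to a single page (assumption)
        f"-dLastPage={page}",
        "-dShowAnnots=false",        # do not render annotations/comments
        f"-sOutputFile={output_png}",
        input_pdf,
    ]


if __name__ == "__main__":
    cmd = build_png_command("InputFile.pdf", "OutFile.png")
    print(" ".join(cmd))
    # To actually run it (assumes gswin64c is on PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the list straight to subprocess.run avoids shell quoting problems with file names that contain spaces.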

Had this error:
BBox has zero width or height, which is not allowed.
Found this hint, but without solution: https://bugs.ghostscript.com/show_bug.cgi?id=696889
I was already using
-dPreserveAnnots=false
but the error still occurred.
-dShowAnnots=false fixed it for me.

Related

Change a PDF's 0.00pt Lines to a Larger Size

In this PDF, the drawings on the second-to-last page apparently use a 0.00pt line width. This makes them almost unreadable on-screen, and completely invisible when printed.
Is there a relatively painless way to change these "no width" lines to have some width? There are lots of small details, so converting to image will not retain enough detail unless an outlandish resolution is used... then the "no width" issue re-emerges.
I installed Ghostscript and ran pdf2ps in.pdf med.ps, then ps2pdf med.ps out.pdf, but the line weights are exactly the same. Next, I opened med.ps in a text editor, hoping I could write a Python script to "find and replace" these zero line widths, but I'm seeing nothing like "0 w" in the file. Perhaps it is defined in a macro somewhere, but I'm not seeing it.
This idea came from Change the width of all lines in a PDF programmatically and Thicken line weights when printing PDF.
Your best bet is to use a tool to decompress the PDF file (e.g. using MuPDF: mutool clean -d <in.pdf> <out.pdf>, or with Ghostscript: gs -sDEVICE=pdfwrite -o out.pdf -dCompressPages=false in.pdf), then use a text editor or some kind of scripting tool such as sed to look for "0 w" and replace it with 'something else'.
PDF isn't a programming language, unlike PostScript, so you can reliably search for operator usage like this in a PDF file. Trying to do the same in a PostScript file is, as beginner6789 says, extremely hard.
If you then want to have the final file compressed, you could run the edited file through Ghostscript's pdfwrite device using something like gs -sDEVICE=pdfwrite -o final.pdf in.pdf.
You absolutely should not use Ghostscript's ps2write device to produce PostScript; the PostScript imaging model is not entirely compatible with PDF, and any PDF constructs which cannot be represented in PostScript (such as any kind of transparency) will be rendered to an image. Really, don't do this.
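Since the question mentions Python, here is a rough sketch of the find-and-replace step, to be run on the decompressed PDF. The names ZERO_WIDTH, widen_zero_lines and rewrite_file are made up for illustration, and the 0.5 default is an arbitrary choice. Two caveats, both assumptions on my part about your file: the pattern matches raw bytes anywhere, so it could also hit text strings or uncompressed binary data, and a replacement of a different length invalidates the stream /Length and xref offsets. A same-length replacement (e.g. "0 w" to "1 w") sidesteps the second issue.

```python
import re

# A zero-valued number token (0, 0.0, 0.00, .0) followed by the "w" operator.
# Both tokens must be delimited by whitespace, so "10 w" and "0.25 w" are left alone.
ZERO_WIDTH = re.compile(rb"(?<!\S)(?:0+(?:\.0*)?|\.0+)(\s+)w(?!\S)")


def widen_zero_lines(data: bytes, new_width: bytes = b"0.5") -> bytes:
    """Replace every zero line width in uncompressed PDF content with new_width."""
    # Keep the original whitespace between the number and the operator.
    return ZERO_WIDTH.sub(lambda m: new_width + m.group(1) + b"w", data)


def rewrite_file(src: str, dst: str, new_width: bytes = b"0.5") -> None:
    """Read src as bytes, widen zero-width lines, and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(widen_zero_lines(data, new_width))


if __name__ == "__main__":
    sample = b"q 0 w 10 10 m 20 20 l S 0.00 w 10 w 0.25 w Q"
    print(widen_zero_lines(sample))
```

Working on bytes rather than text matters here, because a PDF is a binary format and decoding it as a string can corrupt it.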
The following could be a problem if a lot of different weights are used and you just want to change the 0.0 width lines. If they were all 0.0, then placing this early in the page could work, unless the PostScript looks in the system dictionaries for the command:
/setlinewidth {pop} def
The default linewidth for my Ghostscript is 1.0, so that should be used automatically instead of the 0.0 linewidth.
The output of pdf2ps usually has a lot of PDF-style dictionaries, so finding the code used for setlinewidth can be confusing. The setlinewidth must be there someplace. Some people like to read PostScript.
PDF files aren't really meant to be edited, so I use these options to make reading the final PDF easier: -dCompressPages=false -dCompressStreams=false, just in case there is some useful information to look at in the PDF.
EDIT: depending on the code used to create the original postscript there might be labels like this:
dup/LW//knownget exec{
setlinewidth
}if
/w/setlinewidth load def
So there could be LW or w used for setlinewidth like this simple example. Most are not this simple.
EDIT2: There is some good info here:
How to change the width of lines in a PDF/PostScript file

GhostScript PS to PDF conversion - No Color

Referring to this post, GhostScript Conversion Font Issues, is it safe to assume that GhostScript's PS-to-PDF conversions still do not guarantee cut-&-paste text from the converted document? Because I too am getting garbled copy-&-paste results with formatted documents, although it works with plain text files.
sample Word document .DOC
printed to PostScript by MS PS Driver
converted to PDF by GhostScript
On the color issue, I am using the Microsoft PS Class Driver to print documents to PostScript format files, and then convert them to PDF format with the GhostScript v9.20 DLL (sample source and outputs attached above). The options used are as follows:
-dNOPAUSE
-dBATCH
-dSAFER
-sDEVICE=pdfwrite
-sColorConversionStrategy=/RGB
-dProcessColorModel=/DeviceRGB
However, it is converted without color. Have I missed some option?
You can never guarantee getting a PDF file with text you can cut and paste from a PostScript program. There is no guarantee that there is any ToUnicode information in the PostScript program, and without that, if the font is subset as here, then there is no way to know what the Unicode code point for a given glyph is.
Regarding colour, the PostScript file you have supplied contains no colour, so it's not Ghostscript; the problem is in the way you have produced the PostScript. At a guess, you have used a PostScript Printer Description (PPD file) which is for a monochrome printer.
You might be able to improve the text by playing with the options for downloading fonts. The basic problem is that your PostScript program doesn't contain the information we need to be able to construct a ToUnicode CMap. Without that we are forced to assume that the character codes are ASCII, and in your case, because the fonts are subset, they are not ASCII.
For some reason the content of your PostScript appears to be downloading the font as bitmaps. This is ugly, doesn't scale well, and might be the source of your inability to get ToUnicode data inserted. It may also be caused by the fonts you are using; you might try some standard system fonts (if you aren't already) like TimesNewRoman.
While it's great that you supplied an example to look at, I'd suggest that in future you make the example smaller, much smaller. There's really no need for 13 pages of multiply repeated content in this case. More content means it takes more time to decipher, so try to keep example files to the minimum required to demonstrate the problem.
In short, it looks like both your problems are due to the way you are (or the application) generating the PostScript.

Use Ghostscript / PostScript to convert all text colours to black within a PDF

I want to convert the white text in this PDF into black text and generate a new PDF with the changed text.
I have found this
http://www.artifex.com/files/Ghostscript_Color_Architecture.pdf
which mentions settings like -sTextICCProfile but using black_output.icc from
http://www(dot)ghostscript.com/doc/toolbin/color/icc_creator/effects/
like so:
gs -o test.pdf -sTextICCProfile=black_output.icc out.pdf
does not change the text colour to black.
Is the usage of the .icc profile incorrect? Is it even the right approach?
Is there a way to achieve this with postscript?
Example PDF
The usage of the ICCProfile is correct...
However, that usage is for rendering, it has no effect on the pdfwrite device at all (because it doesn't render the input, it turns it into a PDF file). So no, this is not the correct approach.
There is no real means to do what you want with Ghostscript. Technically it's probably possible, but it wouldn't be easy. You also apparently haven't posted an example of the PDF file. It's entirely possible that the 'text' is not actually text. It may be an image, or vectors, which look like text.
There may also be transparency involved, which would complicate the matter still further.

Pdf conversion with ghostscript undesired border cut off

Hi,
I'm using Ghostscript to convert PDFs of various formats to PNG images. My PDFs are in landscape or normal format.
I'm passing to gs this command (from c#):
string CmdArguments = string.Format("-o {0}%04d.png -sDEVICE=pngalpha -r600 -g2000x2000 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -c<</Orientation 3>> setpagedevice {1}", outputfilename, inputfilename);
But on every page the right border is always cut off.
How can I fix this issue?
Many thanks :)
If you are expecting the page to be scaled to fit the specified fixed page size, then you need to tell Ghostscript to do so, which you haven't done.
By the way, <> setpagedevice isn't valid. It would also be a lot easier to understand if you quoted an actual complete command line rather than the parameters to a C# method; those of us who don't grok C# might be able to understand it better. You've put a '-c' in there to treat what follows as PostScript, but there's no -f to terminate PostScript processing before you reach the input filename. Frankly, I'm surprised this does anything at all.
Try adding -dPDFFitPage.
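To make the shape of the corrected command line concrete, here is a small Python sketch that assembles it as an argument list (Python rather than C#, purely for illustration). The helper name build_fit_page_command and its parameters are hypothetical. The switches are the ones from the question plus -dPDFFitPage, with the PostScript fragment placed between -c and -f. Whether the Orientation request has any effect on this device is something I have not checked, so it is optional here.

```python
def build_fit_page_command(input_pdf, output_pattern, width_px=2000, height_px=2000,
                           dpi=600, orientation=None, gs="gs"):
    """Return a Ghostscript argument list that scales each PDF page to a fixed raster size.

    Hypothetical helper; only the switches in the list come from Ghostscript.
    """
    cmd = [
        gs,
        "-o", output_pattern,            # -o also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pngalpha",
        f"-r{dpi}",
        f"-g{width_px}x{height_px}",     # fixed output size in pixels
        "-dPDFFitPage",                  # scale the PDF page to fit that size
        "-dTextAlphaBits=4",
        "-dGraphicsAlphaBits=4",
    ]
    if orientation is not None:
        # PostScript goes after -c and must be terminated by -f
        # before the input file name.
        cmd += ["-c", f"<</Orientation {orientation}>> setpagedevice", "-f"]
    cmd.append(input_pdf)
    return cmd


if __name__ == "__main__":
    print(build_fit_page_command("in.pdf", "out%04d.png", orientation=3))
```

Building a list instead of one formatted string means the PostScript fragment, which contains spaces, is passed to Ghostscript as a single argument.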

Imagemagick/GhostScript conversion to jpeg/png ignores the pdf background

What I am doing is making thumbnails for pdf files (only the first page). I use imagemagick like this (simplified without the resize. It has the same problem):
convert mreji.pdf[0] test.jpg
The problem is that it just ignores my pdf's background and turns it black. It's not transparent either (if I use png instead of jpg), it's just black. I want to keep the original background color.
Here is the test pdf: http://slides.bg/website/Uploads/Temp/mreji.pdf
And the imagemagick output here: http://slides.bg/website/Uploads/Temp/mreji.jpg
Notice that the background color is replaced with black. I want to keep the original one.
I tried using GhostScript directly
gs -sDEVICE=jpeg -sOutputFile=cover.jpg -r72 mreji.pdf
Again, the same output. Maybe there is an argument to prevent that from happening?
The problem may be with the "smooth shading" objects in that PDF.
There are a lot (29) of Type 2 (axial) smooth shading objects in the PDF, used for the backgrounds. IIRC Ghostscript has had problems with these, with a number of bug fixes over the years. What version of gs are you running?
The easiest solution is to rasterize the background in whatever application created the PDF for this purpose.
Try adding the flatten parameter:
convert mreji.pdf[0] -flatten test.jpg