Ghostpcl PCL to PDF conversion shading resolution - pdf

i am trying to us Ghostpcl to convert pcl files to pdf on linux. In the main, this is working well and the majority of documents are converting well. However, some documents have boxes and shading and these are not rendering well at all. The resolution is very poor and as a result any text on top of the shading is almost unreadable. Additionally some alignment is slightly out down the right hand margin.
i have also used visual software pcl2pdf which does a good job on the shading but unfortunately does not substitute all of the fonts correctly.
the pcl file can be found here
https://dl.dropboxusercontent.com/u/86110783/20170215102450_65702421.pcl
the ghostpcl converted pdf
https://dl.dropboxusercontent.com/u/86110783/ghostpcl20170215102450_65702421.pdf
the pcl2pdf pdf
The command i am using for converting the pcl to pdf is
/opt/ghostpcl/gs -sDEVICE=pdfwrite -sFONTPATH=/opt/fonts -dBATCH -dNOPAG
EPROMPT -dNOPAUSE -dQUIET -sOutputFile=$1.pdf $1.pcl
i have tried various different switches to no avail.
Any ideas would be greatly appreciated

If the pdfwrite device can't handle a graphic primitive 'as is' it will render it to an image. The default resolution is 720 dpi which is ordinarily good for most purposes, but you can alter it with the -r switch.
Note that for PCL its probably important to set the resolution to 300 or 600 dpi, as that is the only resolution PCL is defined for. The 'shading' you are talking about is, I think, a pattern, and that will only repeat properly at the precise resolution (or integer multiples thereof) for which its intended.
Even if you run at 600 dpi its probably going to look odd as you zoom in and out of the PDF file.
I'm not sure what exactly you are complaining about regarding alignment, do you mean the fact that the text doesn't fit in the box ? That will be because its using a substitute for the missing font, and the substitute has different metrics to the original.
I don't see a link to the pcl2pdf file.

Related

Change a PDF's 0.00pt Lines to a Larger Size

In this PDF, the drawings on the second-to-last page apparently use a 0.00pt line width. This makes them almost unreadable on-screen, and completely invisible when printed.
Is there a relatively painless way to change these "no width" lines to have some width? There are lots of small details, so converting to image will not retain enough detail unless an outlandish resolution is used... then the "no width" issue re-emerges.
I've installed GhostScript, ran pdf2ps in.pdf med.ps then ps2pdf med.ps out.pdf and the line weights are exactly the same. Next, I opened med.ps in a text editor, hoping I could make a python script "find and replace" these zero line widths, but I'm seeing nothing like "0 w" in the file. Perhaps it is defined in a macro somewhere, but I'm not seeing it.
This idea came from Change the width of all lines in a PDF programmatically and Thicken line weights when printing PDF.
Best bet is to use a tool to decompress the PDF file (eg, using MuPDF; mutool -d <in.pdf> <out.pdf> or with Ghostscript gs -sDEVICE=pdfwrite -o out.pdf -dCompressPages=false in.pdf) then use a text editor or some kind of scripting tool such as sed to look for "0 w" and replace wiith 'something else'.
PDF isn't a programming language, unlike PostScript, so you can reliably search for operator usage like this in a PDF file, trying to do the same in a PostScript file is, as beginner6789 says above, extremely hard.
If you want to then have the finak file compressed you could run the edited file through Ghostscript's pdfwrite device using something like gs -sDEVICE=pdfwrite -o final.pdf in.pdf.
You absolutely should not use Ghostscript's ps2write device to producce PostScript; the PostScript imaging model is not entirely compatible with PDF, and any PDF constructs which cannot be represented in PostScript (such as any kind of transparency) will be rendered to an image. Really, don't do this.
This could be a problem if there are a lot of different weights used and you just want to change the 0.0 width lines. If they were all 0.0 then placing this early in the page could work unless the postscript looks in the system dictionaries for the command:
/setlinewidth {pop} def
The default linewidth for my ghostscript is 1.0 so that should be used automatically instead of the 0.0 linewidth.
The pdf2ps usually has a lot of pdf style dictionaries so finding the code used for setlinewidth can be confusing. The setlinewidth must be there someplace. Some people like to read postscript.
Pdf files aren't really meant to be edited so I use these options to make reading the final pdf easier: -dCompressPages=false -dCompressStreams=false just in case there is some useful information to look at in the pdf.
EDIT: depending on the code used to create the original postscript there might be labels like this:
dup/LW//knownget exec{
setlinewidth
}if
/w/setlinewidth load def
So there could be LW or w used for setlinewidth like this simple example. Most are not this simple.
EDIT2: There is some good info here:
How to change the width of lines in a PDF/PostScript file

GhostScript PS to PDF conversion - No Color

Referring to this post, GhostScript Conversion Font Issues, is it safe to assume that GhostScript's PS-to-PDF conversions still do not guarantee cut-&-paste text from the converted document? Because I too am getting garbled copy-&-paste results with formatted documents, although it works with plain text files.
sample Word document .DOC
printed to PostScript by MS PS Driver
converted to PDF by GhostScript
On the color issue, I am using the Microsoft PS Class Driver to print documents to PostScript format files, and then convert them to PDF format with the GhostScript v9.20 DLL (sample source and outputs attached above). The options used are as follows:
-dNOPAUSE
-dBATCH
-dSAFER
-sDEVICE=pdfwrite
-sColorConversionStrategy=/RGB
-dProcessColorModel=/DeviceRGB
However, it is converted without color. Have I missed some option?
You can never guarantee getting a PDF file with text you can cut and paste from a PostScript program. There is no guarantee that there is any ToUnicode information in the PostScript program, and without that, if the font is subset as here, then there is no way to know what the Unicode code point for a given glyph is.
Regarding colour, the PostScript file you have supplied contains no colour, so its not Ghostscript, the problem is in the way you have produced the PostScript. At a guess you have used a Printer Definition (PPD file) which is for a monochrome printer.
You might be able to improve the text by playing with the options for downloading fonts, the basic problem is that your PostScript program doesn't contain the information we need to be able to construct a ToUnicode CMap. Without that we are forced to assume that the character codes are ASCII, and in your case, because the fonts are subset, they are not ASCII.
For some reason the content of your PostScript appears to be downloading the font as bitmaps. This is ugly, doesn't scale well, and might be the source of your inability to get ToUnicode data inserted. It may also be caused by the fonts you are using, you might try some standard system fonts (if you aren't already) like TimesNewRoman.
While its great that you supplied an example to look at, I'd suggest that in future you make the example smaller, much smaller.... There's really no need for 13 pages of multiply repeated content in this case. More content means it takes more time to decipher, try and keep example files to the minimum required to demonstrate the problem.
In short, it looks like both your problems are due to the way you are (or the application) generating the PostScript.

Ghostscript loses emdash characters and replaces with hyphens

When I run a PDF which was originally created with LibreOffice on Linux, through ghostscript 9.19 on OSX, to produce another (flattened) PDF, the output is perfect except for one problem. All emdashes in the entire document have been replaced with a standard hyphen (awkwardly followed by half of a space.) Oddly enough, if I highlight the resulting "hyphen+space", my context menu shows that I've selected an emdash, so the underlying text is still an emdash, it is just rendering the wrong glyph.
I can reproduce this on multiple documents from the same source, and I'm assuming there's a setting or switch somewhere that can help resolve this.
I don't know whether the font used makes a difference, but for the sake of reference, the body text of my document is set in Arno Pro. When I use a modern version of LibreOffice on OS X to make a sample document also containing an emdash in Arno Pro, the same problem is not exhibited, so it seems to be specific to the software which originally made these PDF files.
These PDFs are of legacy projects that I am not set-up to re-produce at this time, so I need to prepare them for reprinting using the existing files.
How do I retain emdash glyphs when running a command such as the following?
gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite \
-sColorConversionStrategy=/LeaveColorUnchanged \
-dAutoFilterColorImages=true -dAutoFilterGrayImages=true \
-sOutputFile=output.pdf input.pdf
I can add an example of the input PDF to this question if needed.
Without seeing the PDF file it isn't possible to give you an answer. Most likely the font isn't embedded, or if it is embedded doesn't have an emdash glyph.
Copy and paste uses the ToUnicode CMap, so it isn't dependent on the font. Its simply a list of character codes and the Unicode code point associated with each, when using a given font.
Note that this doesn't mean 'the underlying text is still an emdash'. The ToUnicode information is utterly separate from the font end of things, it is effectively metadata and bears no real relation to the font or rendering.
Put the file on DropBox and post the URL and someone can look into it. I'll be on vacation for the next few days though, but maybe someone else will look.
Note that in PDF you don't necessarily specify characters and positions as a list of consecutive characters; you can specify the position of each individually, or you can specify widths which override the width in the font, etc. So there almost certainly is only one glyph, the 'white space' you refer to is probably just that, white space, its not another glyph.
I should also point out (I do this a lot) that Ghostscript never 'flattens', concatenates, merges, or anything similar operation on PDF files. WHen using Ghostscript and the pdfwrite device the original input (in whatever format) is fully interpreted into graphics marking operations, and sent tot eh device. The device executes the marking operations; in the case of a rendering device, it scan-converts and writes to a bitmap. In the case of pdfwrite, it creates PDF operators.
The result of this is that the output PDF file bears no relation to the input PDF, other than its visual appearance.
You also don't say which version of Ghostscript you are using....

CMYK Overprinting and Knockout in Ghostscript

I'm trying to get a grasp on the capabilities of the current version of Ghostscript (see also this question that I asked a few days ago). So, I downloaded a "test form" for the PDF/X-4 standard from www.pdfx-ready.ch, a standards organization in Switzerland, and tried to render it... (In case anyone wants to try this, here's the direct download link: http://www.pdfx-ready.ch/files/PDFX-ready-OutputTest_PDFX4-CMYK_V301d.zip. You can find more info on this page (in German): http://www.pdfx-ready.ch/index.php?show=496)
Anyway: I was pleasantly surprised to see that most of the test fields were rendered correctly on screen. Most of the other PDF viewers that I had tried had failed miserably. Then I noticed that there were a few test cases that produced errors:
CMYK Overprint Mode (on page 1) is not respected for fonts and
vectors (it works fine for images, masks and shadings).
Rendering of Knockout Transparency Groups (on page 2) is not performed correctly.
Rendering of a few more fields (on page 4) that had to do with overprinting (Spot to CMYK, CMYK over Spot, Image Overprint etc.) failed.
So, I started experimenting... First I noticed that I still had an old version of Ghostscript installed. So, I compiled the new version 9.16 and tried again. This time, the Knockout Transparency Groups (see above) were rendered correctly. Great!
Then I read here that "the handling of overprinting and spot colors depends upon the process color model of the output device". So, instead of -sDEVICE=x11 I now tried -sDEVICE=x11cmyk. And to my surprise, the errors regarding the CMYK Overprint Mode went away. Unfortunately, the errors on page 4 remained.
What's more, I now have two new problems: First of all, the pages are now rendered in wrong colors. In fact, the white background of the test pages now appears in cyan! Also, it seems, Ghostscript is now trying to simulate some kind of ugly halftoning on screen. I read here again that "The differences in appearance of files with overprinting and spot colors caused by the differences in the color model of the output device [...] are not due to a limitation in the implementation of Ghostscript or its output devices." So, I'm assuming that I'm missing something. But what is it?
Summarizing:
Is there a way (maybe another device, a command line parameter or something) to tell Ghostscript to handle overprinting correctly? Or hasn't this been implemented yet?
What causes the cyan tinting of the white background?
Is there a way to print this correctly to an inkjet, the way it appears on screen? (lpr doesn't seem to work well.)
Thanks in advance.
UPDATE
So, I experimented a lot and read a few discussions. Also, the documentation here, which I found pretty interesting, as it says:
"Ghostscript currently provides overprint simulation for spot
colorants when rendering to the separation devices psdcmyk and
tiffsep. These devices maintain all the spot color planes and merge
these together to provide a simulated preview of what would be
printed."
Alright, this is what #KenS (see below) mentioned in a comment. But then
"It is possible to get a simulated preview of overprinting with other
CMYK devices by specifying -dSimulateOverprint = true/false In this
case, simulated overprinting is achieved through a blending of the
CMYK colorants." [p.9]
Now, I read that as saying that I can use a CMYK device (like tiff32nc) to get a simulated preview of overprinting with spot colors. Am I correct? So, after some more reading here (just in case this has anything to do with CMYK, which I doubt), I finally tried the following:
gs -dBATCH -dNOPAUSE -dSAFER
-dSimulateOverprint=true
-sDefaultCMYKProfile=ISOcoated_v2_300_eci.icc
-sOutputICCProfile=ISOcoated_v2_300_eci.icc
-sDEVICE=tiff32nc
-sOutputFile=out.tif
in.pdf
I even experimented with the options -dOverrideICC, -dRenderIntent and -sProofProfile. Nothing seems to work. What am I misunderstanding here? Is there really no way to render a non-seperated full-color preview of correctly overprinted spot colors?
UPDATE 2
So, I finally tried the tiffsep device (not really, what I would like to achieve, but interesting as a test case) and checked the five files that are produced. And there are still errors! If you would like to check, run the command
gs -dBATCH -dNOPAUSE -dSAFER
-sDEVICE=tiffsep
-dFirstPage=4
-dLastPage=4
-sOutputFile=page4.tif
PDFX-ready_Output-Test_301d_X4.pdf
over the aforementioned PDF/X-4 document. Then examine, e.g., the third test field in the first row in the left column (page 4).
So, I really don't know what to make of this. Does that mean that Ghostscript can't handle overprinting with spot colors at all - contrary to what the documentation says? Is that a bug? Or do I have the command wrong? Am I missing anything?
First answer is stop trying to use the X11 device, its an RGB device and not hugely well supported. In order to do X11CMYK the input must be rendered to CMYK then post filtered to RGB. Its not a good solution.
Overprinting is only defined for CMYK process colours (and spots), any other colour model will not perform overprinting. So I would suggest you render to TIFF or JPEG devices using their CMYK variants.
Spot colours are even more complex, if the device does not support the requested spot colour then it uses the tint transform to convert into the defined alternate colour space. If tint transformation takes place the spot is not overprinted.
Since the display devices cannot support spot colours, you can't preview spot overprinting using a display device. If you want to do this you should use the tiffsep device.
If you believe you have found a bug in Ghostscript, then please report it as such, but you will have to report it against a CMYK device, and I'll say now that we won't be very active with bugs in the X11 CMYK device, its practically unused.
Printing to an inkjet device depends on the printing workflow, and I have no idea what you are using for that. If its CUPS (and I'm guessing solely based on the fact that you are using an X11 device) then this 'should' just work. But it depends on the complete end-to-end print process, and I have no idea what it is you are doing.
Again note that spot colours will not be available on a CMYK printer, so overprinting spots is probably not going to work the way you expect.
I may be very late to the party but this works for me:
gs -dBATCH -dNOPAUSE -dSimulateOverprint=true \
-sDEVICE=jpegcmyk -sOUTPUTFILE=overprint.jpg overprint.pdf

Prevent Ghostscript from rasterizing text?

I'm trying to convert PDFs to PCL (using ghostscript, but I'd love to hear alternative suggestions), and every driver (ghostscript device), including all of the built-ins and gutenprint generate PCL files many times larger than the input PDF. (This is the problem - I need my PCL to be about as small as the input).
Given that the text doesn't show up in the PCL file, I guess that Ghostscript is rasterizing the text. Is there a way to prevent GS generally, or just gutenprint, from doing that? I'd rather either have it embed the fonts, or not even embed the fonts (leave it to the printer to render the fonts)?
Unfortunately, there doesn't seem to be any documentation on this point.
There are 3 (I think) types of font in PCL. There are rendered bitmaps, TrueType fonts (in later versions) and the HPGL stick font.
PDF and PostScript Have type 1, 2 (CFF), 3 and 42 (TrueType, but not the same as PCL) and CIDFonts based on any of the preceding types.
The only font type the two have in common is TrueType, so in order to retain text, any font which was not TrueType would have top be converted into TrueType. This is not a simple task. So Ghostscript simply renders the text, which is guaranteed to work.
PDF is, in general, a much richer format than PCL< there are many PDF constructs (fonts, shading, stroke/fill in a single operation, transparency) which cannot be represented in PCL. So its entirely possible that the increase in size is nothing to do with text and fonts.
In fact, I believe that the PXL drivers in Ghostscript simply render the entire page to a bitmap at the required resolution, and then wrap that up with enough PCL to be successfully sent to a printer. (I could be mistaken on this point though)
Basically, you are not going to get PCL of a similar size to your PDF out of Ghostscript.
Here is a way to 'prevent Ghostscript from rasterizing text'. But its output will be PostScript. You may however succeed convert this PostScript to a PCL5e in an additional step.
The method will convert all glyphs into outline shapes for its PostScript output, and it does not work for its PDF or PCL output. The key here is the -dNOCACHE parameter:
gs -o somepdf.ps -dNOCACHE -sDEVICE=pswrite somepdf.pdf
Of course, converting font glyphs to outlines will take more space than keeping the original fonts embedded, because "fonts" are a space-optimized concept to store, retrieve and render glyph shapes.
Once you have this PostScript, you may be able to convert it to PCL5e with the help of either of the methods you tried before for PDF input (including {Apache?} FOP).
However, I have no idea if the output will be much smaller than versions with rasterized fonts (or even wholesome rasterized pages). But it may be worth a test.
Now vote down this answer too...
Update
Apparently, from version 9.15 (to be released during September/October 2014), Ghostscript will support a new command line parameter:
-dNoOutputFonts
which will cause the output devices pdfwrite, ps2write and eps2write to "to 'flatten' glyphs into 'basic' marking operations (rather than writing fonts to the output)".
That means that the above command should be replaced by this:
gs -o somepdf.ps -dNoOutputFonts -sDEVICE=ps2write somepdf.pdf
Caveats: I've tested this with a few input files using a self-compiled Ghostscript based on current Git sources. It worked flawlessly in each case.