Ghostscript: convert a CMYK PDF to an RGB PNG preserving correct colors

I want to convert a CMYK PDF into an RGB PNG using Ghostscript.
This is what I'm using so far:
gs -sDEVICE=pngalpha -dBATCH -dNOPAUSE -dCompatibilityLevel=1.4 -dColorConversionStrategy=/sRGB -dProcessColorModel=/DeviceRGB -dUseCIEColor=true -sOutputFile=out.png -r300 pdf/input.pdf
The issue, however, is that the color is not converted accurately. I'm on a MacBook running macOS Catalina, and when I look at the original PDF in the built-in Preview tool the color is not as saturated as in the converted PNG.
So my question is: what am I doing wrong here? I'm not an expert in color management.
Thanks!

Well, clearly the colours cannot be the 'same' because one is expressed in a subtractive 4 colour model, and the other in an additive 3 colour model.
Firstly, you are using three command-line switches which are not going to have any effect at all:
-dCompatibilityLevel is only used by the pdfwrite device, to set the maximum version of PDF to support; it has no effect on any other device and certainly won't affect the PNG output devices.
-sColorConversionStrategy only works with the pdfwrite device, which is capable of handling nearly the full range of colour models. Most devices, such as the PNG devices, only support a single colour model, so you don't need to specify a colour conversion; everything has to be converted to that colour space anyway.
You should never set -dProcessColorModel. It has to remain correct for the device; in the case of high-level devices it is either ignored or set appropriately to match ColorConversionStrategy.
Finally, and most importantly, you have set -dUseCIEColor. This is an ancient PostScript colour-management control; it causes all colours to be converted into CIE spaces (CIEBasedA, CIEBasedABC, CIEBasedDEF or CIEBasedDEFG), and from there into the destination space. Using it will pretty much break the ICC profile colour management in Ghostscript.
So drop all four of those and try again. Note that when you compare the PNG result with the PDF you are comparing the colour conversion performed by Ghostscript with the colour conversion (CMYK->RGB) performed by the PDF consumer, presumably the built-in Mac Quartz code. I'd have to express some reservations about the quality of that conversion.
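As a minimal sketch, the simplified command (dropping those switches but otherwise keeping your settings) would look like this:
gs -sDEVICE=pngalpha -dBATCH -dNOPAUSE -r300 -sOutputFile=out.png pdf/input.pdf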
It is entirely possible for you to control the colour management performed by Ghostscript. There are default CMYK and RGB profiles which are used to convert the CMYK components into the calibrated CIE XYZ space, and from there into RGB. Each of these steps uses an ICC profile. If you don't like the default ones you can substitute another of your choosing for either or both. The ICC profiles can be found in the ghostpdl/iccprofiles directory; you can use -sDefaultCMYKProfile=<...path...> to specify a different profile to use to convert CMYK into CIE XYZ, and -sOutputICCProfile=<...path...> to specify a different ICC profile for the final space (RGB in your case).
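For example (a sketch only; the profile file names here are placeholders for whichever ICC profiles you actually want to use, e.g. ones shipped in ghostpdl/iccprofiles or your own):
gs -sDEVICE=pngalpha -dBATCH -dNOPAUSE -r300 -sDefaultCMYKProfile=my_cmyk_input.icc -sOutputICCProfile=my_rgb_output.icc -sOutputFile=out.png pdf/input.pdf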
For a properly colour-managed workflow you should know the characteristics of the intended input colour model (e.g. SWOP CMYK) and use the correct ICC profile to map from CMYK to CIE XYZ, and you should know the characteristics of the output colour model (e.g. a Sony Trinitron, to use an old monitor example) to create the closest matching output colour.
You may have an ICC profile for your monitor, but I doubt you know what the CMYK values in the original PDF file represent. To match whatever PDF consumer you are using, you would need to know which CMYK and RGB profiles it is using and use the same ones in the Ghostscript rendering process.
NB all the above assumes that the original PDF, which is not supplied, actually specifies the colours in CMYK, and not in an ICCBased colour space, or other similar device-independent colour space.
Edit
From the comment:
Following your argumentation it should be identical
No. In the case of the PDF consumer, it is doing a CMYK->RGB conversion in order to display the content. When rendering to PNG, Ghostscript is also doing a CMYK->RGB conversion. Actually, because you are using -dUseCIEColor it's doing a CMYK->CIEBasedDEFG->RGB conversion, but let's assume you dropped that, so it's just doing CMYK->RGB.
Now, if the two conversions are fully colour managed (let's assume ICC profiles as the colour-management technique for now), and the two conversions are using the same ICC profiles, that is, assuming the same characterised spaces, then yes, the result will be identical.
Without seeing your PDF file I can't tell how the colours are actually specified within it. You say they are 'CMYK' but CMYK is not a characterised space. There are many different CMYK inks and they are printed to many different kinds of paper with different reflectivity and absorbency. So SWOP and Euroscale are both CMYK printing processes, but their characteristics are different.
So if we treat your CMYK values as SWOP and convert them to an RGB space, we should expect different RGB values than if we treat those same CMYK values as Euroscale. That's because the same CMYK quad printed on a SWOP process would look different to the same quad printed on a Euroscale process.
Similarly when it comes to creating the RGB values. RGB is also not a characterised space, there are many different RGB output devices and they differ in how they display a given RGB triplet.
Now I don't know how your PDF consumer does the CMYK->RGB conversion. I'd like to think it uses ICC profiles to characterise the spaces but it might not. There's a long standing (quick and dirty) conversion method from the PostScript Language Reference which it might use instead.
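For reference, that quick-and-dirty conversion from the PostScript Language Reference is simply:
R = 1 - min(1, C + K)
G = 1 - min(1, M + K)
B = 1 - min(1, Y + K)
which takes no account of the characteristics of either the inks or the display.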
However, a modern colour managed workflow would use ICC profiles.
When dealing with an uncharacterised space such as 'CMYK' or 'RGB' the only thing Ghostscript can do is use a generic profile. It uses a general purpose CMYK profile to convert the incoming CMYK into the CIE XYZ space (which is device-independent) and then a generic RGB profile to convert the CIE XYZ components into RGB. You can change the assumptions about the input and output colour spaces.
In effect you can say 'I happen to know that the CMYK values were intended for an HP Indigo, so use the HP Indigo ICC profile' and that will then map the CMYK into XYZ as the original creator intended. Similarly you can say 'I'm using a Sony wide gamut RGB monitor, so use that ICC profile' and that will give the best possible representation of the XYZ colours on that device.
But if tomorrow you viewed it on a low end Iiyama monitor you could tell it to use a different appropriate profile, and you would see (as far as possible) the same colours as on the expensive device.
So to try and summarise; the problem is that you are using uncharacterised spaces. The two consumers are not set up to use the same default colour management paths, so you can expect to see differences. To avoid this you need to use the same profiles on both PDF consumers (preview and Ghostscript).
PNG does allow an ICC profile to be embedded in the file (the iCCP chunk), though I don't believe Ghostscript's PNG devices write one. If the output does carry a profile, you can take the PNG to another computer with a different monitor and it will still look the same; if it doesn't, viewing the RGB output on different monitors will look different.

Related

How can I convert a PDF to CMYK with different kinds of black?

I have a PDF that I created in Inkscape. I would like to convert it to CMYK for a printing company, but I have one issue - I want to convert some blacks to "rich black" (20/20/20/100) and others to "flat black" (0/0/0/100). I have 60 cards and would like to automate this rather than having to do it manually for each file. How can I do this?
Specifically, the card has text and design elements on it that shouldn't be rich black because they're too small and detailed, but the printing company suggests that since the border is solid black that part should be rich black. I've managed to convert my PDF to CMYK by following the instructions here but all the blacks seem to be rich.
Well firstly you haven't said what criteris you want to use to differentiate between your types of black, how do you propose to instruct the application which objects drawn in 'black' should be converted to pure K, and which ones should be a mixture of CMYK ?
You don't say what the input colour space is either.
Ghostscript's pdfwrite device does not distinguish between object types. You can (with effort) use a customised ICC profile to produce a pure-K output for a given input space and colour value, but there is no way to say 'use this profile for this bit of the file, and this other profile for that other bit of the file'.
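As a sketch only, assuming you had built or obtained such a customised CMYK output profile (hypothetically named pure_k.icc here), a whole-file conversion might look like:
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sColorConversionStrategy=CMYK -sOutputICCProfile=pure_k.icc -sOutputFile=out.pdf input.pdf
But again, this applies one profile to the entire file; it cannot treat the border differently from the text.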
Also, this isn't (as written) a programming question.

Convert PDF to PS in Ghostscript and preserve CMYK split

I have an RGB PDF which I've preflighted in Adobe Acrobat pro to a PDF x1a compliant PDF in US Web Coated SWOP v2.
The PDF now has 4 plates (C/M/Y/K)
C plate is empty
M plate has 100% of a red image
Y plate has 100% of the same red image
K plate has 100% of black text on page (text is not on any other plate)
I'm now trying to convert that PDF into a PS using ghostscript
I've tried:
gs -dNOPAUSE -dBATCH -sDEVICE=ps2write -sProcessColorModel=DeviceCMYK -sOutputFile=output.ps input.pdf
But then when I distill this PS back to a PDF the text is on all the plates and not just the K plate.
I've used this online tool:
http://pdf.my-addr.com/free-online-pdf-to-ps-convert.php
To also do the conversion and the distilled version of the PS generated by that preserves the plate breakdown. They are also using Ghostscript to create the PS.
So I'm assuming there is some setting I am missing.
Does anyone know?
Update 1
I tried pdftops too, and again it is taking my K plate and spreading it across all the CMYK plates.
What secret magic are they doing on that web site to preserve plates?!
Update 2
The only main difference I can see is that I'm using
%%Creator: GPL Ghostscript 905 (pswrite)
and that website is using
%%Creator: GPL Ghostscript 871 (pswrite)
Could it be a version thing, or are they doing something I'm not?
Ghostscript 9 and above use much better colour management than previous versions, but you have to get the ICC profiles correct. I'd guess you are using the default profiles, and the first thing I'd suggest is that you use the current version of Ghostscript, which is 9.07; I think there were a few changes made to the default profiles.
It's also possible that the PDF file now has an input RGB profile associated with it, which Ghostscript is now using whereas previously it didn't. I'd need to see the file to tell exactly what is going on, but I have a sneaking suspicion that your 'preflight' conversion is causing the problem. What happens if you use the original PDF file?
I very much doubt if the PDF file actually contains CMYK colour components, I would imagine all that has happened is that different profiles have been inserted into the file that control the conversion from RGB to CIE and from CIE to CMYK.
In passing, don't use pswrite. It's a terrible low-level output that converts much of the content into images; it produces large PostScript which processes very slowly and doesn't scale well (i.e. if the printer is a different resolution). Use the ps2write device instead.
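As a sketch only (the switch names are real, but whether this preserves your plate split depends on how the colours are actually stored in the file), with a reasonably recent Ghostscript you could also ask ps2write to keep everything in CMYK:
gs -dBATCH -dNOPAUSE -sDEVICE=ps2write -sColorConversionStrategy=CMYK -sOutputFile=output.ps input.pdf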
By the way, since you've already used Acrobat, why don't you just use 'Save As' PostScript from there ?

Quality degradation of a text pdf after pdf>png>pdf

I have a very specific requirement where I must automatically stamp every page of a PDF file (for a faxing application), so here's the process I've put together:
step 1: Convert PDF to PNG, one png file per page
cmd1: gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -r400 -sOutputFile=image_raw.png input.pdf
cmd2: mogrify -resize 31.245% image_raw.png
input.pdf (input): https://www.dropbox.com/s/p2ajqxe99nc0h8m/input.pdf
image_raw.png (output): https://www.dropbox.com/s/4cni4w7mqnmr0t7/image_raw.png
step 2: Stamp every PNG file (using a third party tool ..)
image_stamped.png (output): https://www.dropbox.com/s/3ryiu1m9ndmqik6/image_stamped.png
step 3: Reconvert PNG files into one PDF file
cmd: convert -resize 1240x1753 -units PixelsPerInch -density 150x150 image_stamped.png output.pdf
output.pdf (output): https://www.dropbox.com/s/o9y0jp9b4pm08ci/output.pdf
The output file of the third step should 'theoretically' be the same as the input file of step 1 (plus the stamp on it), but it's not: the file is somehow blurry, and it turns out to be unreadable for humans after faxing, since the blurred pixels don't survive fax transmission. Even if you can see no difference between input.pdf and output.pdf, try zooming in and you'll find that the text characters are blurred at their edges.
What are the best parameters to play with at input (step 1) or output (step 3)?
Thanks !
You are using anti-aliasing (TextAlphaBits=4). This 'smooths' the edges of text by introducing grey pixels between the black pixels of the text edges. At low resolutions (such as displays) this prevents the 'jaggies' in text and gives a more readable result. At higher resolutions its value is highly debatable.
Fax is a 1-bit monochrome medium, so the grayscale values have to be recreated by dithering. As you have discovered, this is not a good idea in a limited resolution device as it leads to a loss of sharpness.
I believe that if you remove the -dTextAlphaBits=4 you will see an immediate improvement. I would also suggest that you remove the GraphicsAlphaBits as well, since this will have the same effect on linework.
If you believe that you still want anti-aliasing, you could try reducing its aggressiveness; you currently have it set to 4, so try reducing it to 2.
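For instance, a minimal sketch of your step 1 without any anti-aliasing (everything else kept from your original command) would be:
gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -r400 -sOutputFile=image_raw.png input.pdf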
Regarding the other comments;
Kurt is quite correct, as is fourat, and I'm afraid MarcB is mistaken: the -r400 sets the resolution for rendering, in dots per inch. If only one number is given it is used for both the x and y resolution. It is possible to produce a fixed-size raster using Ghostscript, but you use the -dFIXEDMEDIA and -sPAPERSIZE switches, or the -g switch, which also sets FIXEDMEDIA automatically.
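For example (a sketch; the pixel dimensions are just the ones from your later convert command), this produces a fixed 1240x1753 pixel raster and scales the page to fit it:
gs -dBATCH -dNOPAUSE -sDEVICE=png16m -g1240x1753 -dPDFFitPage -sOutputFile=out.png input.pdf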
While I do agree with yms and Kurt that converting the PDF to a bitmap format (PNG) and then back to PDF will result in a loss of quality, if the final PDF is only used for transmission via fax it doesn't matter. The PDF must be rendered to a fax-resolution bitmap at some point in the process; it's not a big problem if that's done before the stamp is applied.
I don't agree with BitBank here: converting a vector representation to a bitmap means rasterising it at a particular resolution. Once this is done, the resulting image cannot be rescaled without loss of quality, whereas the original vector representation can be, as it is simply rendered again at a different resolution. 'Image' in PDF refers to a bitmap; you can't have a vector bitmap. The image posted by yms clearly shows the effect of rendering a vector representation into an image.
One last caveat. I'm not familiar with the other tools being used here, but at least two of the command lines imply a 'resize'. If you resize a bitmap then the chances are that the tool will introduce the same kinds of artefacts (anti-aliasing) that you are having a problem with. Once you have created the bitmap you should not alter it at all. It's important that you create the PNG at the correct size in the first place.
And finally.....
I just checked your original PDF file and I see that the content of the page is already an image. Not only that, it's a DCT (JPEG) image. JPEG is a really poor choice of format for a monochrome image: it's a lossy compression format and always introduces artefacts into the image. If you open your original PDF file in Acrobat (or a similar viewer) and zoom in, you can see that there are faint 'halos' around the text; you will also see that the text is already blurry.
You then render the image, quite probably at a different resolution to the original image resolution, and at the same time introduce more blurring by setting -dGraphicsAlphaBits. You then make further changes to the image data which I can't comment on. In the end you render the image again, to a monochrome bitmap. The dithering required to represent the grey pixels leads to your text being unreadable.
Here are some ways to improve this:
1) Don't convert text into images like this, it instantly leads to a quality loss.
2) Don't compress monochrome images using JPEG
3) If you are going to work with images, don't keep converting them back and forth, work with the original until you are done, then make a PDF file from that, if you really must.
4) If you really insist on doing all this, don't compound the problem by using more anti-aliasing. Remove the -dGraphicsAlphaBits from the command line. You might as well remove -dTextAlphaBits as well since your files contain no text. Please read the documentation before using switches and understand what it is you are doing.
You should really think about your workflow here. Obviously we don't know what you are doing or why, so there may well be good reasons why some things are not possible, but you should try and avoid manipulating images like this. Because these are not vector, every time you make a change to the image data you are potentially losing information which cannot be recovered at a later stage. By making many such transformations (and your workflow as depicted seems to perform as many as 5 transformations from the 'original' image data) you will unavoidably lose quality.
If possible retain everything as vector data. When it is unavoidable to move to image data, create the image data as you need it to be finally used, do not transform it further.
I've had a closer look at the files you provided.
Already the first image (image_raw), the result of the mogrify resize command, is fairly blurry at 1062x1375. While the blurriness does not get worse in the second image (image_stamped), which is the result of the third-party tool, the third image (extracted from your output.pdf), i.e. the result of that convert command, is even more blurred, which is due to the graphic being resized (something you explicitly tell it to do).
I don't know at which resolution your fax program works, but there is more quality loss still, at least due to 24 bit colors to black-and-white transformation.
If you insist on this workflow (i.e. pdf -> png -> stamped png -> pdf -> fax) you should (see the sketch after this list):
in the initial rasterization already use the per-inch resolution your rastered image will have in all following steps (including fax transmission),
refrain from anti-aliasing and use of alpha bits (cf. KenS' answer), and
restrict the rasterized image to the colorspace available to the fax transmission, i.e. most likely black-and-white.
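A minimal sketch of such an initial rasterization, assuming the standard 'fine' fax resolution of 204x196 dpi and 1-bit monochrome output (adjust the resolution and page handling to whatever your fax software actually expects):
gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pngmono -r204x196 -sOutputFile=page_%03d.png input.pdf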
PS: As KenS pointed out, the original PDF is already merely a container for an image (with some blur to start with). Therefore, an alternative way to improve your workflow is to extract that image instead of rendering the page, stamp that original image, and only resize it (again without anti-aliasing) when faxing.

Prevent Ghostscript from rasterizing text?

I'm trying to convert PDFs to PCL (using ghostscript, but I'd love to hear alternative suggestions), and every driver (ghostscript device), including all of the built-ins and gutenprint generate PCL files many times larger than the input PDF. (This is the problem - I need my PCL to be about as small as the input).
Given that the text doesn't show up in the PCL file, I guess that Ghostscript is rasterizing the text. Is there a way to prevent GS generally, or just gutenprint, from doing that? I'd rather either have it embed the fonts, or not even embed the fonts (leave it to the printer to render the fonts)?
Unfortunately, there doesn't seem to be any documentation on this point.
There are 3 (I think) types of font in PCL. There are rendered bitmaps, TrueType fonts (in later versions) and the HPGL stick font.
PDF and PostScript have Type 1, Type 2 (CFF), Type 3 and Type 42 (TrueType, though not the same as PCL's) fonts, plus CIDFonts based on any of the preceding types.
The only font type the two have in common is TrueType, so in order to retain text, any font which was not TrueType would have to be converted into TrueType. This is not a simple task, so Ghostscript simply renders the text, which is guaranteed to work.
PDF is, in general, a much richer format than PCL; there are many PDF constructs (fonts, shading, stroke/fill in a single operation, transparency) which cannot be represented in PCL. So it's entirely possible that the increase in size is nothing to do with text and fonts.
In fact, I believe that the PXL drivers in Ghostscript simply render the entire page to a bitmap at the required resolution, and then wrap that up with enough PCL to be successfully sent to a printer. (I could be mistaken on this point though)
Basically, you are not going to get PCL of a similar size to your PDF out of Ghostscript.
Here is a way to 'prevent Ghostscript from rasterizing text'. However, its output will be PostScript; you may then succeed in converting this PostScript to PCL5e in an additional step.
The method will convert all glyphs into outline shapes for its PostScript output, and it does not work for its PDF or PCL output. The key here is the -dNOCACHE parameter:
gs -o somepdf.ps -dNOCACHE -sDEVICE=pswrite somepdf.pdf
Of course, converting font glyphs to outlines will take more space than keeping the original fonts embedded, because "fonts" are a space-optimized concept to store, retrieve and render glyph shapes.
Once you have this PostScript, you may be able to convert it to PCL5e with the help of either of the methods you tried before for PDF input (including {Apache?} FOP).
However, I have no idea whether the output will be much smaller than versions with rasterized fonts (or even wholly rasterized pages). But it may be worth a test.
Update
Apparently, from version 9.15 (to be released during September/October 2014), Ghostscript will support a new command line parameter:
-dNoOutputFonts
which will cause the output devices pdfwrite, ps2write and eps2write to 'flatten' glyphs into 'basic' marking operations (rather than writing fonts to the output).
That means that the above command should be replaced by this:
gs -o somepdf.ps -dNoOutputFonts -sDEVICE=ps2write somepdf.pdf
Caveats: I've tested this with a few input files using a self-compiled Ghostscript based on current Git sources. It worked flawlessly in each case.

PDF "Canonicalization"

I am writing a library to generate PDF reports using prawn reports.
One of the features I wish to add to my gem is the ability to provide a means of testing the generation of reports.
The problem is that two visually equal PDFs can have different files.
Is there a way to make sure that two visually equal PDFs have the same bits in the file? Something like XML canonicalization.
'Visual equality' (or 'visual similarity', where only a small percentage of pixels differs on each page) of two different PDFs can occur even if the internal structure of the PDF objects is very different. (Think of a page of 'text', which may use real fonts or which may use 'outline' vector graphics for each glyph's shape...)
That means this equality can only be determined by rendering the two files at the same resolution to page images and then comparing both image sets pixel by pixel. The result of the comparison could be another pixel image that shows all differing pixels as red, or, at your preference, just the number of pixels which do not agree.
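A minimal sketch of that approach, using Ghostscript and ImageMagick (file names and resolution are placeholders):
gs -o a_%03d.png -sDEVICE=png16m -r150 a.pdf
gs -o b_%03d.png -sDEVICE=png16m -r150 b.pdf
compare -metric AE a_001.png b_001.png diff_001.png
The AE metric reports the number of differing pixels, and diff_001.png marks where they are.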
I've described a more complete, scriptable way to do this, with the help of Ghostscript, pdftk and ImageMagick, in this answer:
How to unit test a Python function that draws PDF graphics?
Alternatively, you may have a look at
diffpdf
(which is available for Linux, Unix, Mac OS X and Windows): it also can compare two PDF files visually.
[ Your literal question was: "Is there a way to make sure that two visually equal PDFs have the same bits in the file?" -- However, I'm not sure you really meant it that way, hence my answer above. Otherwise I'd have to say: if two PDF files are visually equal, just compare their respective MD5 sums to determine whether they have the same bits in each file... ]