Ghostscript's pdfwrite to grayscale results in wrong gray levels

I am trying to convert a PDF file (test.pdf, linked below) using Ghostscript (9.20 on Windows) so that it only uses the grayscale colorspace (not RGB or CMY):
gswin64c.exe -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dOverrideICC -dUseCIEColor -o gray.pdf -f test.pdf
The result indeed only uses gray colors:
>gswin64c.exe -o - -sDEVICE=inkcov gray.pdf
GPL Ghostscript 9.20 (2016-09-26)
Copyright (C) 2016 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
0.00000 0.00000 0.00000 0.92673 CMYK OK
(I need to use -dUseCIEColor, otherwise the CMY values are >0; this is a separate problem which I haven't solved yet...)
My problem: the resulting gray.pdf uses significantly different gray levels than the original test.pdf (open both in your PDF viewer and compare for yourself).
Does anyone see my mistake, or what I should do differently to get the same PDF in grayscale rather than in the RGB colorspace?
Thank you very much!
test.pdf: https://drive.google.com/open?id=0BzjatAIrG6P3S2F5Vng4cUhUS0U
gray.pdf: https://drive.google.com/open?id=0BzjatAIrG6P3cEtTY3JaaTJCS2c

You are doing multiple conversions, and not managing the colour space conversions at all.
Firstly, you convert the original colour into a CIEBased colour space (the space varies depending on the number of components in the original space). Since you don't specify Colour Rendering Dictionaries, this is an uncontrolled conversion; you are using the defaults.
You then embark on another conversion from CIEBased (which cannot, in general, be represented in PDF anyway, so it would always result in an additional conversion) into DeviceGray. Again, you haven't supplied any ICC profiles for this conversion, so you are using the default ones.
If you insist on using -dUseCIEColor (which I would very strongly advise against; controlling this is hard) then you need to supply Colour Rendering Dictionaries to control the conversion from device space into CIE space, and also ICC profiles to control the subsequent conversion from CIE space into DeviceGray.
But I strongly suspect that you will get better results by not using -dUseCIEColor, just like Ghostscript tells you.
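For example, a minimal form of your command line with the CIE-related switches simply dropped (a sketch based on your original command, not a tested recipe) would be:
gswin64c.exe -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -o gray.pdf -f test.pdf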

I can only guess at what you need, based on the source file. It contains a rectangle filled with DeviceRGB 0.5/0.5/0.5, and I suspect you want it to become DeviceGray 0.5.
The solutions and speculations below apply to that and similar cases only. (For example, I have no idea what the "CMY values" you mention are, i.e. whether there is DeviceCMYK or ICC-based color or anything else in your files.) There are simple formulas for converting between device color spaces (see the PDF Reference), and one of them does map equal DeviceRGB values to the same DeviceGray value. To make it work, use Ghostscript 9.10:
"C:\Program Files\gs\gs9.10\bin\gswin32c.exe" -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dUseFastColor -o test_1.pdf -f test.pdf
Note the -dUseFastColor switch. You'll get the "correct" 0.5 grayscale filled rectangle.
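For reference, the device-space conversion formula in question, from the PDF Reference, is the NTSC luminance weighting:
Gray = 0.30 × R + 0.59 × G + 0.11 × B
so equal components such as 0.5/0.5/0.5 map to (0.30 + 0.59 + 0.11) × 0.5 = 0.5.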
To make it work in versions between 9.10 and 9.20 (excluding both), I had to add another switch: -dPDFUseOldCMS. Again, the result is a 0.5 grayscale filled rectangle.
As the name of that last switch indicates, the simple approach was probably considered deprecated, and it looks like it was scrapped in 9.20.
Instead, a new, wonderful CMS engine was introduced (as of 9.10). Except that it doesn't work for high-level devices (pdfwrite included); it has been either switched off or broken for many releases.
I was unable to make it actually use the -sOutputICCProfile option for any combination of device- or ICC-based colors in the source and any command-line options, for either DeviceCMYK or DeviceGray output (or ICC-based output, for that matter): the produced files always had the same color values.
I'd appreciate it if someone showed me I'm wrong with a counterexample.
It did work, actually (partly -- for device source colors only), in 9.10:
"C:\Program Files\gs\gs9.10\bin\gswin32c.exe" -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -sOutputICCProfile=sgray.icc -o test_2.pdf -f test.pdf
Using different ICC profiles results in different (and, it looks, correct) output. To convert equal RGB values to the same Gray value, you would need a grayscale profile with the same gamma as the (default) sRGB profile. You can use the free ICC Profile Inspector to extract the curve from sRGB and import it into e.g. sgray.icc (distributed with Ghostscript).
The advantage of using a profile to convert RGB to Gray while preserving gamma, as opposed to the "simple formula" described above, may or may not be worth the effort. Check against your own files and purposes.

Related

Ghostscript won't generate PDF/A with UTF16BE text string detected in DOCINFO - in spite of PDFACompatibilityPolicy saying otherwise

I am trying to convert normal PDF files to PDF/A with this command line:
gs -dPDFA -dBATCH -dNOPAUSE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=output.pdf input.pdf
However, I get the message
GPL Ghostscript 9.26: UTF16BE text string detected in DOCINFO cannot be represented in XMP for PDF/A1, reverting to normal PDF output
and gs reverts to normal PDF.
Apparently, the message stems from this code fragment of gs, but there we read that the message can occur only when pdev->PDFACompatibilityPolicy == 0. My understanding was that the parameter -sPDFACompatibilityPolicy=1 in the command line has the purpose of preventing this.
Q: Why does gs behave as if the desired policy were 0 instead of 1? Is there another way to set the policy to 1?
Also, just as it makes me curious:
Q: Is there a way to see what kind of strange DOCINFO is causing the original problem, or to prevent it in the first place? Using Acrobat Reader, I cannot see anything "suspicious" in the file. If it helps: the input.pdf is generated on Windows from Word (and I tried even with the UseISO19005-1 setting, which should produce PDF/A to begin with, but the problem occurs anyway).
You have put -sPDFACompatibilityPolicy=1. That, I'm afraid, is incorrect. Ghostscript has two kinds of switches: -s, which deals with string values, and -d, which deals with numeric and name values (names in PostScript begin with '/').
You've assigned a string value of '1' to the parameter PDFACompatibilityPolicy, which (internally) expects a numeric value. For reasons to do with the fact that these values are required to be accessible from the PostScript environment, we can't flag the type confusion as an error. Instead we leave the actual control at its default value of 0.
If you instead set -dPDFACompatibilityPolicy=1 I expect you will see the behaviour you expect.
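That is, the command line from the question becomes:
gs -dPDFA -dBATCH -dNOPAUSE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -sOutputFile=output.pdf input.pdf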
As for seeing the data, without looking at the PDF file I cannot tell. However, if you stop in the debugger at that point and look at p->data you will be able to see what the data is. If you look at pairs + i instead of pairs + i + 1 you will be able to see the key which is associated with the value from the DOCINFO pdfmark.
You won't be able to see anything 'suspicious' by looking at the file in Acrobat, because Acrobat will translate the UTF16BE into whatever your system requires in order to display the text correctly. It may even be that this is ASCII, you can still represent that as UTF16.
If you open the file in a text editor you may be able to see the relevant string (note that the BOM in Ghostscript's source is given in octal, so that's 0xFE 0xFF in hexadecimal), provided it's not in a compressed object stream.
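If it is in a compressed object stream, one way to get at it (assuming you have qpdf available; it is not part of Ghostscript) is to rewrite the file in uncompressed QDF form first, then search the expanded file for the 0xFE 0xFF marker:
qpdf --qdf --object-streams=disable input.pdf expanded.pdf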
Examining the source of the latest Ghostscript (9.50), it seems that the PDFACompatibilityPolicy values in this case (see devices/vector/gdevpdfm.c around line 1951) set the error-handling behavior as follows:
0 will revert to normal PDF output (not really what I wanted)
1 will discard PDFINFO (even worse)
2 will throw an error (even even worse)
any other value is ignored in the switch and works as a pass-through!
So, in my case, the whole thing was solved simply by setting
-dPDFACompatibilityPolicy=3
Ghostscript does not complain, does not abort PDF/A output, does not discard the PDFINFO, and, most importantly, veraPDF checker still verifies the PDF as perfectly okay.
I'm not commenting on how ugly this solution is, but it works just great. Since all other switch statements just assume compatibility policy 0 if anything above 2 gets passed in, this "shortcut" seems to be an unintended, but very useful bug.
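For completeness, applying this workaround to the command line from the question gives:
gs -dPDFA -dBATCH -dNOPAUSE -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=3 -sOutputFile=output.pdf input.pdf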
exa's answer is not quite correct: Ghostscript will continue its output, but the resulting PDF will not pass the veraPDF validator.
At the moment I'm busy trying to make Ghostscript produce a valid ZUGFeRD invoice PDF; for that, the PDF needs to be a valid PDF/A-3 (a, b or u) file.
Problem with the Answer
If you just use -dPDFACompatibilityPolicy=3, veraPDF won't validate the PDF.
Instead, you should fix the file with the right encoding.
How to resolve it:
Create a new file (example "pdfmarks") with this content:
[ /Title (Foo Title)
/Author (Foo Bar)
/Subject (Foo Bar Subject)
/Keywords ()
/ModDate (D:20061204092842)
/CreationDate (D:20061204092842)
/Creator (Foo Bar)
/Producer (Foo Bar)
/DOCINFO pdfmark
(There is no closing square bracket ']'.)
Run gs like this:
Windows:
"C:\Program Files\gs\gs9.53.3\bin\gswin64c.exe" -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=/path/to/output.pdf /path/to/input.pdf /path/to/pdfmarks
Linux:
gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=/path/to/output.pdf /path/to/input.pdf /path/to/pdfmarks
You can either include your other switches in the same invocation or call gs a second time.
I hope I could save you some time with this.

ghostscript: convert PDF into CMYK preserving pure Black for text

I need to convert RGB PDF into CMYK PDF.
I need to have pure black color for texts.
It seems (thanks to the comments below) that the term "black point compensation" is wrong. I took it from Adobe Acrobat, where it works exactly as I need; I thought gs had the same feature.
I am using Ghostscript 9.16.
If I got it right, there is a -dBlackPtComp option, but it does not work for me.
The Ghostscript command I have tried is:
"c:/Program Files/gs/gs9.16/bin/GSWIN64C.EXE" -o testing_black_cmyk.pdf -sColorConversionStrategy=CMYK -sDEVICE=pdfwrite -dOverrideICC=true -sOutputICCProfile=c:/Windows/System32/spool/drivers/color/JapanColor2002Newspaper.icc -dTextBlackPt=1 -dBlackPtComp=1 test2.pdf
Try this:
collink -v -G AppleRGB.icc JapanColor2002Newspaper.icc apple_to_jNP_photo.icc
collink -v -f AppleRGB.icc JapanColor2002Newspaper.icc apple_to_jNP_neutrals.icc
control.txt:
Image_RGB apple_to_jNP_photo.icc 0 1 0
Graphic_RGB apple_to_jNP_neutrals.icc 0 1 0
Text_RGB apple_to_jNP_neutrals.icc 0 1 0
and
gswin32c -q -sDEVICE=pdfwrite -o out.pdf -sColorConversionStrategy=CMYK -sSourceObjectICC=control.txt in.pdf
Then the DeviceRGB in the source PDF is converted to DeviceCMYK, and RGB 0/0/0 becomes (as I'm checking right now) DeviceGray 0, which should be OK (and all other neutral RGB shades are mapped to true grayscale, too).
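To spot-check this, the inkcov device shown at the top of this page works here as well; for a page whose content should be entirely neutral, you would expect the C, M and Y coverage to come out as 0 (a quick sanity check, not a full proof):
gswin32c -q -o - -sDEVICE=inkcov out.pdf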
The reason we use different DeviceLink profiles for different object types is that, although saturated colors (far from neutral) will be converted to the same CMYK through both profiles, you probably don't want a color to suddenly switch to 0/0/0/n in continuous-tone photographs if it happens to be near neutral -- it'll look terrible on the press.
If your "images" are e.g. rasterized graphics (diagrams, etc.) with 0/0/0 RGB, then you can consider using apple_to_jNP_neutrals.icc for those images too.
If your page has a mix of both real images and rasterized graphics (text) -- bad luck, you'll have to compromise.
The reason we use -G instead of the fast and simple Simple Mode is that -f (for the second profile) implies "Gamut Mapping Mode using inverse outprofile A2B", and we want the two profiles to produce results (for saturated colors) as close to each other as possible.
From the description on black point compensation on the Little CMS page:
"Black point compensation (BPC) is a technique used to address color
conversion problems caused by differences between the darkest levels
of black achievable on different media/devices."
In other words, BPC has nothing to do with your problem, and if you want proper answers, you should remove it from this question.
If you want black to be preserved (or pure/secondary colors in general), you basically have two options you can look at:
1) Create a proper DeviceLink profile to do your conversion. This DeviceLink profile should map from your input ICC profile to the destination you want to convert to, and should contain proper exception rules to keep black / gray / secondary / tertiary colors as required.
2) Use a color conversion engine that supports exceptions while doing a regular ICC profile conversion. Little CMS, for example, has an intent flag ("INTENT_PRESERVE_K_ONLY_RELATIVE_COLORIMETRIC") that can be set to instruct the engine to preserve black during conversion.

GhostScript dPDFSETTINGS shortcut

I'm very new to the Ghostscript world and I was wondering which settings are applied when we write, for example, -dPDFSETTINGS=/ebook?
The problem I face is that /ebook is too low quality and /printer is too heavy. I'm looking for something in the middle :)
Based on the corresponding documentation, you probably want to set ColorImageResolution, for instance like this, if you want to tune image resolution:
gs -sDEVICE=pdfwrite -q -dPDFSETTINGS=/printer -dColorImageResolution=150 -o out.pdf in.pdf
You need to set GrayImageResolution or MonoImageResolution for grayscale and monochrome images.
Be aware that setting the resolution does not work if PDFSETTINGS is unset (or set to /default). I don't know why.
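So a combined invocation tuning all three image classes might look like this (the 150/300 dpi values are only illustrative, chosen to land between /ebook and /printer):
gs -sDEVICE=pdfwrite -q -dPDFSETTINGS=/printer -dColorImageResolution=150 -dGrayImageResolution=150 -dMonoImageResolution=300 -o out.pdf in.pdf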
RTFM.
/ghostpdl/gs/doc/Ps2pdf.htm#Options
Then look for the big table with all the options listed in the rows and the PDFSETTINGS values listed in the columns. More usefully, read up on all the Distiller parameters and select the ones you want to use.

Ghostscript: convert PDF to EPS with embeded font rather than outlined curve

I use the following command to convert a PDF to EPS:
gswin32 -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=epswrite -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
I then use the following command to convert the EPS to another PDF (test2.pdf) to view the EPS figure.
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
I found that the text in the generated test2.pdf has been converted to outline curves. There are no fonts embedded anymore either.
Is it possible to convert PDF to EPS without converting text to outlines? I mean, to EPS with embedded fonts and text.
Also, after the conversion (test.pdf -> test.eps -> test2.pdf), the height and width of the PDF figure (test2.pdf) are a little bit smaller than those of the original PDF (test.pdf).
Is it possible to keep the width and height of the figure after conversion?
Here is the test.pdf: https://dl.dropboxusercontent.com/u/45318932/test.pdf
I tried KenS's suggestion:
gswin32 -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
I can see that the converted test2.pdf has a very weird font, different from the original font in test.pdf. When I copy the text from test2.pdf, I only get a couple of symbols like:
✕ ✖ ✗✘✙ ✚✛
Here is the test2.pdf: https://dl.dropboxusercontent.com/u/45318932/test2.pdf
I was using the latest Ghostscript 9.15. So what is the problem?
I just noticed you are using epswrite; you don't want to do that. That device is terrible and has been deprecated (and by now removed). Use the eps2write device instead (you will need a relatively recent version of Ghostscript).
There's nothing you can do with epswrite except throw it away; it makes terrible EPS files. It also can't make level 2 files, no matter what you set -dLanguageLevel to.
Oh, and don't use -dNOCACHE; that prevents fonts from being processed and decomposes everything into outlines or bitmaps.
UPDATE
You set subset fonts to true. By doing so, the character codes which are used are more or less random: the first glyph in the document (say, the 'H' in 'Hello World') gets code 1, the second one (e.g. 'e') gets code 2, and so on.
If you have a ToUnicode CMap, then Acrobat and other readers can convert these character codes to Unicode code points; without one, readers have to fall back on heuristics, the final one being 'treat it as ASCII'. Because the encoding arrangement isn't ASCII, you get gibberish. MS Windows' PostScript output can contain additional ToUnicode information, but that's not something we try to mimic in ps2write. After all, presumably you had a PDF file already....
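As an aside, if you have Poppler's pdffonts tool (not part of Ghostscript), its 'uni' column shows whether each font carries a ToUnicode CMap, which is a quick way to predict whether copy/paste will work:
pdffonts test2.pdf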
Every time you do a conversion you run the risk of this kind of degradation, you should really try and minimise this in your workflow.
The problem is even worse in this case: the input PDF file has a TrueType CID font. Basic language level 2 PostScript can't handle CIDFonts (IIRC these were introduced in PostScript version 2015). Since eps2write only emits basic level 2, it cannot write the font as a CIDFont, so instead it captures the glyph outlines and stores them in a type 3 font.
However, our EPS/PS output doesn't attempt to embed ToUnicode information in the PostScript (it's non-standard, very few applications can make use of it, and it therefore makes the files larger for little benefit). In addition, CIDFonts use multiple (2 or more) bytes for the character code, so there's no way to encode the type 3 fonts as ASCII.
Fundamentally you cannot use Ghostscript to go PDF->PS->PDF and still be able to copy/paste/search text, if the input contains CIDFonts.
By the way, there's no point in setting -dLanguageLevel at all. eps2write only creates level 2 output.
I used Inkscape to convert a .pdf to .eps: just open the .pdf file in Inkscape, choose a high mesh in the import options, and save as an EPS file.

Cropped PCL after gswin PDF to PCL conversion

I have a PDF which I want to convert to PCL.
I convert the PDF to PCL using the following command (gs 8.70):
gswin32c.exe -q -dNOPAUSE -dBATCH \
-sDEVICE=ljetplus -dDuplex=false -dTumble=false \
-sPAPERSIZE=a4 -sOutputFile="d:\doc1.pcl" \
-f"d:\doc1.pdf" -c -quit
When I view or print the output PCL, it is cropped. I would expect the output to start right at the edge of the paper (at least in the viewer).
Is there any way to get the whole output without moving the contents of the page away from the paper edge?
I tried the -dPDFFitPage option, which works but results in scaled output.
You are using -sPAPERSIZE=a4. This causes the PCL to be rendered for A4-sized media.
Very likely, your input PDF is made for a non-A4 size. That leaves you with 3 options:
...you either use that exact page size for the PCL too (which your printer possibly cannot handle; see the sketch after this list),
...or you have to add -dPDFFitPage (as you tried, but didn't like),
...or you skip the -sPAPERSIZE=... parameter altogether (which will most likely automatically use the same size as the PDF, and which your printer possibly cannot handle...)
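As an illustration of the first option, you can force the rendering size in PostScript points via -dDEVICEWIDTHPOINTS/-dDEVICEHEIGHTPOINTS (595x842 here is just the A4 size as an example; substitute your PDF's real dimensions, and -dFIXEDMEDIA keeps the PDF from overriding them):
gswin32c.exe -q -dNOPAUSE -dBATCH -sDEVICE=ljetplus -dDEVICEWIDTHPOINTS=595 -dDEVICEHEIGHTPOINTS=842 -dFIXEDMEDIA -sOutputFile="d:\doc1.pcl" -f "d:\doc1.pdf"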
Update 1:
In case ljetplus is not a hard requirement for your requested PCL format variant, you could try this:
gs -sDEVICE=pxlmono -o pxlmono.pcl a4-fo.pdf
gs -sDEVICE=pxlcolor -o pxlcolor.pcl a4-fo.pdf
Update 2:
I can confirm now that even the most recent version of Ghostscript (v9.06) cannot handle non-Letter page sizes for ljetplus output.
I'd regard this as a bug... but it could well be that it won't be fixed, even if reported at the GS bug tracker. However, the least that can be expected is that it will get documented as a known limitation for ljetplus output...