ghostscript: convert PDF into CMYK preserving pure Black for text - pdf

I need to convert RGB PDF into CMYK PDF.
I need to have pure black color for texts.
It seems (thanks to comments below) term "black point compensation" is wrong. I took it from Adobe Acrobat where it works exactly how i need. I thought gs has same feature.
I use ghostscript 9.16
If i got it right there is -dBlackPtComp option, but it does not work for me.
Ghostscript command I have tried is:
"c:/Program Files/gs/gs9.16/bin/GSWIN64C.EXE" -o testing_black_cmyk.pdf -sColorConversionStrategy=CMYK -sDEVICE=pdfwrite -dOverrideICC=true -sOutputICCProfile=c:/Windows/System32/spool/drivers/color/JapanColor2002Newspaper.icc -dTextBlackPt=1 -dBlackPtComp=1 test2.pdf

Try this:
collink -v -G AppleRGB.icc JapanColor2002Newspaper.icc apple_to_jNP_photo.icc
collink -v -f AppleRGB.icc JapanColor2002Newspaper.icc apple_to_jNP_neutrals.icc
control.txt:
Image_RGB apple_to_jNP_photo.icc 0 1 0
Graphic_RGB apple_to_jNP_neutrals.icc 0 1 0
Text_RGB apple_to_jNP_neutrals.icc 0 1 0
and
gswin32c -q -sDEVICE=pdfwrite -o out.pdf -sColorConversionStrategy=CMYK -sSourceObjectICC=control.txt in.pdf
Then the DeviceRGB in source PDF is converted to DeviceCMYK, and RGB 0/0/0 becomes (as I'm checking now) the DeviceGray 0, which should be OK (and all other neutral RGB shades are mapped to true grayscale, too).
The reason we are using different DL-profiles for different objects, is because, though saturated colors (far from neutrals) will be converted to the same CMYK through both profiles, nevertheless you probably don't want color suddenly switch to 0/0/0/n in continuous tone photographs, if color happens to be near neutral -- it'll look terrible on the press.
If your "images" are e.g. rasterized graphics (diagrams, etc.) with 0/0/0 RGB, then you can consider using apple_to_jNP_neutrals.icc for these images too.
If your page has a mix of both real images and rasterized graphics (text) - bad luck, you'll have to compromise.
The reason we use -G instead of fast and simple Simple Mode, is because -f (for second profile) implies the "Gamut Mapping Mode using inverse outprofile A2B", and we want 2 profiles to produce the result (for saturated colors) as close to each other as possible.

From the description on black point compensation on the Little CMS page:
"Black point compensation (BPC) is a technique used to address color
conversion problems caused by differences between the darkest levels
of black achievable on different media/devices."
In other words, BPC has nothing to do with your problem and if you want proper answers, you should remove it from this question.
If you want black to be preserved (or pure / secondary colors in general), you basically have two options you can look at:
1) Create a proper DeviceLink profile to do your conversion. This devicelink profile should take your input ICC Profile and the destination you want to convert to and should contain proper exception rules to keep black / gray / secondary / tertiary colors as required.
2) Use a color conversion engine that supports exceptions while doing regular ICC Profile conversion. Little CMS for example has an intents flag ("INTENT_PRESERVE_K_ONLY_RELATIVE_COLORIMETRIC") that can be set to instruct the engine to preserve black during conversion.

Related

Using Ghostscript to look only at specific area when using inkcov

I use Ghostscript to check if part of a pdf page is empty of content or not by cropping the area I'm interested in (using iTextSharp in vb.net) and then running:
gswin64c.exe -o - -sDEVICE=inkcov c:\cropped.pdf
This works well enough, but it would be much better if I could get Ghostscript to look at a specific rectangle area on the page (instead of the whole page), to remove the need for cropping first.
Is this possible?
You can use -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS along with -dFIXEDMEDIA to specify a fixed media size. You want to set that to the 'window' size you want to use.
You then need to shift the PDF content so that the portion of the page you are interested in lies under the window. To do that you need to write some PostScript code.
You need to define a BeginPage procedure, that will be executed at the start of every page, in there you need to translate the page origin.
So lets take a specific example:
Say that you have an A4 page (612x792 points) and you want to limit the rendered content to a 1x2 inch rectangle at the top right. We start by defining the media size:
-dDEVICEWIDTHPOINTS=72 -dDEVICEHEIGHTPOINTS=144 -dFIXEDMEDIA
We then need to shift the origin (0,0) which is the bottom left corner in PostScript and PDF, so that only the top right section of the content lies on the page. That means we need to shift the content 540 points to the left, and 648 points down. So our BeginPage procedure is:
/BeginPage {-540 -648 translate}
and we install that in the current page device thus:
<<
/BeginPage {-540 -648 translate}
>> setpagedevice
In order to introduce arbitrary PostScript we use the -c command (and we need to tell GS when we're done, we usually use the -f switch for that but any switch would do)
So your original command line would then be:
gswin64c -dDEVICEWIDTHPOINTS=72 -dDEVICEHEIGHTPOINTS=144 -dFIXEDMEDIA -sDEVICE=inkcov -o - -c "<</BeginPage {-540 -648 translate}>> setpagedevice" -f original.pdf
Obviously the exact numbers here are going to be up to you to determine.

Ghostscript's pdfwrite to grayscale results in wrong graylevel

I try to convert a PDF file (test.pdf, attached below) using Ghostscript (9.20 on Windows) to only use the Graylevel colorspace (not RGB or CMY):
gswin64c.exe -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dOverrideICC -dUseCIEColor -o gray.pdf -f test.pdf
The result indeed only uses gray colors:
>gswin64c.exe -o - -sDEVICE=inkcov gray.pdf
GPL Ghostscript 9.20 (2016-09-26)
Copyright (C) 2016 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
0.00000 0.00000 0.00000 0.92673 CMYK OK
(I need to use -dUseCIEColor, otherwise CMY values are >0, this is a separate problem which I havent yet solved...)
My problem: The resulting gray.pdf uses significantly different graylevels than the original test.pdf (open in your PDF viewer and compare for yourself).
Does anyone see my mistake or what I should do differently to get the same PDF but in grayscale rather than RGB colorspace?
Thank you very much!
test.pdf: https://drive.google.com/open?id=0BzjatAIrG6P3S2F5Vng4cUhUS0U
gray.pdf: https://drive.google.com/open?id=0BzjatAIrG6P3cEtTY3JaaTJCS2c
You are doing a multiple conversion, and not managing the colour space conversions at all.
Firstly you convert the original colour into a CIEBased colour space (and the space varies depending on the number of components in the original space). Since you don't specify Colour Rendering Dictionaries, this is an uncontrolled conversion, you are using the defaults.
You then embark on another conversion from CIEBased (which cannot, in general, be represented in PDF anyway, so would always result in an additional conversion) into DeviceGray. Again you haven't supplied any ICC profiles for this conversion, so you are using the default ones.
If you insist on using -dUseCIEColor (which I would very strongly advise against, controlling this is hard) then you need to supply ColorRendering Dictionaries to control the conversion from device space into CIE space, and also ICC profiles to control the subsequent conversion from CIE space into DeviceGray.
But I strongly suspect that you will get better results by not using -dUseCIEColor, just like Ghostscript tells you.
I can only guess about what you need based on source file. There's DeviceRGB 0.5/0.5/0.5 filled rectangle, and I suspect you want it to become 0.5 DeviceGray.
The solutions and speculations below will work for that and similar cases only. (E.g., I have no idea what are "CMY values" you write about, i.e. if there are DeviceCMYK or ICC-based or anything else in your files). There're simple formulas to convert between device color spaces (see PDF Reference), one of them indeed maps from equal values in DeviceRGB to same value in DeviceGray. To make it work, use GhostScript 9.10:
"C:\Program Files\gs\gs9.10\bin\gswin32c.exe" -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dUseFastColor -o test_1.pdf -f test.pdf
Note the switch -dUseFastColor. You'll get "correct" 0.5 grayscale filled rectangle.
To make it work in versions 9.10 .. 9.20 (excluding both), I had to add another switch: -dPDFUseOldCMS. Again, 0.5 grayscale filled rectangle in result.
As last switch name indicates, simple things were probably considered deprecated, and looks like were scrapped in 9.20.
Instead, new wonderful CMS engine was introduced (since 9.10). Except, it doesn't work for high-level devices (pdfwrite included). Either switched off or broken, for many releases.
I was unable to make it work for any combination of device- or ICC-based colors in source and command line options, to make it actually use the -sOutputICCProfile option, for either DeviceCMYK or DeviceGray output (or ICC-based output, whatever). Same color values in produced files.
I'd appreciate if someone indicates I'm wrong and shows an opposite example.
It worked, actually (partly -- for device source colors only), in 9.10:
"C:\Program Files\gs\gs9.10\bin\gswin32c.exe" -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -sOutputICCProfile=sgray.icc -o test_2.pdf -f test.pdf
Using different icc profiles results in different (and correct, it looks) output. To convert from equal RGB values to same Gray values one would need grayscale profile with same gamma as (default) sRGB. Just, use free ICC Profile Inspector to extract a curve from sRGB and import it into e.g. sgray.icc (distributed with Ghostscript).
The advantage of using a profile to convert RGB to Gray, preserving gamma, opposed to "simple formula" described above, may or may not be worth the effort. Check for your files and purposes.

Reverse white and black colors in a PDF

Given a black and white PDF, how do I reverse the colors such that background is black and everything else is white?
Adobe Reader does it (Preferences -> Accessibility) for viewing purposes only in the program. But does not change the document inherently such that the colors are reversed also in other PDF readers.
How to reverse colors permanently?
You can run the following Ghostscript command:
gs -o inverted.pdf \
-sDEVICE=pdfwrite \
-c "{1 exch sub}{1 exch sub}{1 exch sub}{1 exch sub} setcolortransfer" \
-f input.pdf
Acrobat will show the colors inverted.
The four identical parts {1 exch sub} are meant for CMYK color spaces and are applied to C(yan), M(agenta), Y(ellow) and (blac)K color channels in the order of appearance.
You may use only three of them -- then it is meant for RGB color spaces and is applied to R(ed), G(reen) and B(lue).
Of course you can "invent" you own transfer functions too, instead of the simple 1 exch sub one: for example {0.5 mul} will just use 50% of the original color values for each color channel.
Note: Above command will show ALL colors inverted, not just black+white!
Caveats:
Some PDF viewers won't display the inverted colors, notably Preview.app on Mac OS X, Evince, MuPDF and PDF.js (Firefox PDF Viewer) won't. But Chrome's native PDF viewer PDFium will do it, as well as Ghostscript and Adobe Reader.
It will not work with all PDFs (or for all pages of the PDF), because it is also dependent on how exactly the document's colors are defined.
Update
Command above updated with added -f parameter (required) before the input.pdf. Sorry for not noticing this flaw in my command line before. I got aware of it again only because some good soul gave it its first upvote today...
Additional update: The most recent versions of Ghostscript do not require the added -f parameter any more. Verified with v9.26 (may also be true even with v9.25 or earlier versions).
Best method would be to use "pdf2ps - Ghostscript PDF to PostScript translator", which convert the PDF to PS file.
Once PS file is created, open it with any text editor & add {1 exch sub} settransfer before first line.
Now "re-convert" the PS file back to PDF with same software used above.
If you have the Adobe PDF printer installed, you go to Print -> Adobe PDF -> Advanced... -> Output area and select the "Invert" checkbox. Your printed PDF file will then be inverted permanently.
None of the previously posted solutions worked for me so I wrote this simple bash script. It depends on pdftk and awk. Just copy the code into a file and make it executable. Then run it like:
$ /path/to/this_script.sh /path/to/mypdf.pdf
The script:
#!/bin/bash
pdftk "$1" output - uncompress | \
awk '
/^1 1 1 / {
sub(/1 1 1 /,"0 0 0 ",$0);
print;
next;
}
/^0 0 0 / {
sub(/0 0 0 /,"1 1 1 ",$0);
print;
next;
}
{ print }' | \
pdftk - output "${1/%.pdf/_inverted.pdf}" compress
This script works for me but your mileage may vary. In particular sometimes the colors are listed in the form 1.000 1.000 1.000 instead of 1 1 1. The script can easily be modified as needed. If desired, additional color conversions could be added as well.
For me, the pdf2ps -> edit -> ps2pdf solution did not work. The intermediate .ps file is inverted correctly, but the final .pdf is the same as the original. The final .pdf in the suggested gs solution was also the same as the original.
Cross Platform try MuPDF
Mutool draw -I -o out.pdf in.pdf [range of pages]
It should permanently change colours in many viewers
Later Edit
A sample file that did not reverse was one with linework only (no image) and the method needed was to save the graphics as inverted image then reuse that to build a replacement PDF, however beware converting the whole pages to image will make any searchable text just simply unsearchable pixels thus would need to be run with the OCR active on rebuild.
The two commands needed will be something like (%4d means numbers for images start output0001)
mutool draw -o output%4d.png -I input.pdf
For Linux users the folowing second pass should work easily:-
mutool convert -O compress -o output.pdf output*.png
For windows users you will for now (v1.19) need to combine by scripting or use groups
mutool convert -O compress -o output.pdf output0001.png output0002.png output0003.png
next version may include an #filelist option see https://bugs.ghostscript.com/show_bug.cgi?id=703163
This is probably just a frontend for the ghostscript command Kurt Pfeifle posted, but you could also use imagemagick with something like:
convert -density 300 -colorspace RGB -channel RGB -negate input.pdf output.pdf

Ghostscript: convert PDF to EPS with embeded font rather than outlined curve

I use the following command to convert a PDF to EPS:
gswin32 -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=epswrite -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
I then use the following command to convert the EPS to another PDF (test2.pdf) to view the EPS figure.
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
I found the text in the generated test2.pdf have been converted to outline curves. There is no font embedded anymore either.
Is it possible to convert PDF to EPS without convert text to outlines? I mean, to EPS with embedded font and text.
Also after the conversion (test.pdf -> test.eps -> test2.pdf), the height and width of the PDF figure (test2.pdf) is a little bit smaller than the original PDF (test.pdf):
test.pdf:
test2.pdf:
Is it possible to keep the width and height of the figure after conversion?
Here is the test.pdf: https://dl.dropboxusercontent.com/u/45318932/test.pdf
I tried KenS's suggestion:
gswin32 -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
I can see the converted test2.pdf have very weird font:
that is different from the original font in test.pdf:
When I copy the text from test2.pdf, I only get a couple of symbols like:
✕ ✖ ✗✘✙ ✚✛
Here is the test2.pdf: https://dl.dropboxusercontent.com/u/45318932/test2.pdf
I was using the latest Ghostscript 9.15. So what is the problem?
I just noticed you are using epswrite, you don't want to do that. That device is terrible and has been deprecated (and removed now). Use the eps2write device instead (you will need a relatively recent version of Ghostscript).
There's nothing you can do with epswrite except throw it away, it makes terrible EPS files. It also can't make level 2 files, no matter what you set -dLanguageLevel to
oh, and don't use -dNOCACHE, that prevents fonts being processed and decomposes everything to outlines or bitmaps.
UPDATE
You set subset fonts to true. By doing so the character codes which are used are more or less random. The first glyph in the document (say for example the 'H' in 'Hello World') gets the code 1, the second one (eg 'e') gets the code 2 and so on.
If you have a ToUnicode CMap, then Acrobat and other readers can convert these character codes to Unicode code points, without that the readers have to fall back on heuristics, the final one being 'treat it as ASCII'. Because the encoding arrangement isn't ASCII, then you get gibberish. MS Windows' PostScript output can contain additional ToUnicode information, but that's not something we try to mimic in ps2write. After all, presumably you had a PDF file already....
Every time you do a conversion you run the risk of this kind of degradation, you should really try and minimise this in your workflow.
The problem is even worse in this case, the input PDF file has a TrueType CID Font. Basic language level 2 PostScript can't handle CIDFonts (IIRC this was introduced in version 2015). Since eps2write only emits basic level 2 it cannot write the font as a CIDFont. So instead it captures the glyph outlines and stores them in a type 3 font.
However, our EPS/PS output doesn't attempt to embed ToUnicode information in the PostScript (its non-standard, very few applications can make use of it and it therefore makes the files larger for little benefit). In addition CIDFonts use multiple (2 or more) bytes for the character code, so there's no way to encode the type 3 fonts as ASCII.
Fundamentally you cannot use Ghostscript to go PDF->PS->PDF and still be able to copy/paste/search text, if the input contains CIDFonts.
By the way, there's no point in setting -dLanguageLevel at all. eps2write only creates level 2 output.
I used Inkscape To convert a .pdf to .EPS. Just upload the .pdf file to Inkscape, in the options to open chose high mesh, and save as . an EPS file.

Cropped PCL after gswin PDF to PCL conversion

I have a PDF, which I want to convert to PCL
I convert PDF to PCL using the following command:
(gs 8.70)
gswin32c.exe -q -dNOPAUSE -dBATCH \
-sDEVICE=ljetplus -dDuplex=false -dTumble=false \
-sPAPERSIZE=a4 -sOutputFile="d:\doc1.pcl" \
-f"d:\doc1.pdf" -c -quit
When I view or print the output PCL, it is cropped. I would expect the output to start right at the edge of the paper (at least in the viewer).
Is there any to way get the whole output without moving the contents of the page away from the paper edge?
I tried the -dPDFFitpage option which works, but results in a scaled output.
You are using -sPAPERSIZE=a4. This causes the PCL to render for A4 sized media.
Very likely, your input PDF is made for a non-A4 size. That leaves you with 3 options:
...you either use that exact page size for the PCL too (which your printer possibly cannot handle),
...or you have to add -dPDFFitPage (as you tried, but didn't like),
...or you skip the -sPAPERSIZE=... parameter altogether (which most likely will automatically use the same as size as the PDF, and which your printer possibly cannot handle...)
Update 1:
In case ljetplus is not a hard requirement for your requested PCL format variant, you could try this:
gs -sDEVICE=pxlmono -o pxlmono.pcl a4-fo.pdf
gs -sDEVICE=pxlcolor -o pxlcolor.pcl a4-fo.pdf
Update 2:
I can confirm now that even the most recent version of Ghostscript (v9.06) cannot handle non-Letter page sizes for ljetplus output.
I'd regard this as a bug... but it could well be that it won't be fixed, even if reported at the GS bug tracker. However, the least that can be expected is that it will get documented as a known limitation for ljetplus output...