GhostScript dPDFSETTINGS shortcut - pdf

I'm very new to the Ghostscript world and I was wondering which settings are applied when we write, for example, -dPDFSETTINGS=/ebook.
The problem I face is that /ebook is too low quality and /printer is too heavy. I'm looking for something in the middle :)

Based on the corresponding documentation, if you want to tune image resolution you probably want to set ColorImageResolution, for instance like this:
gs -sDEVICE=pdfwrite -q -dPDFSETTINGS=/printer -dColorImageResolution=150 -o out.pdf in.pdf
You need to set GrayImageResolution or MonoImageResolution for non-coloured images.
Be aware that setting the resolution does not work if PDFSETTINGS is unset (or set to /default). I don't know why.
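As a rough sketch of the approach above, the following Python helper assembles a Ghostscript command line that starts from a preset and overrides only the image resolutions you pass in. The helper name build_gs_command and its arguments are made up for illustration; the switches are the Distiller parameters named in the Ghostscript documentation (MonoImageResolution is the one for monochrome images). It only builds the argument list and does not run Ghostscript.

```python
def build_gs_command(src, dst, preset="/printer",
                     color_dpi=None, gray_dpi=None, mono_dpi=None):
    """Return a Ghostscript argv list (hypothetical helper, nothing is executed)."""
    cmd = ["gs", "-sDEVICE=pdfwrite", "-q", f"-dPDFSETTINGS={preset}"]
    # Only emit an override for the resolutions the caller actually set,
    # so everything else keeps the value chosen by the preset.
    overrides = {
        "ColorImageResolution": color_dpi,
        "GrayImageResolution": gray_dpi,
        "MonoImageResolution": mono_dpi,
    }
    for name, dpi in overrides.items():
        if dpi is not None:
            cmd.append(f"-d{name}={dpi}")
    cmd += ["-o", dst, src]
    return cmd


print(" ".join(build_gs_command("in.pdf", "out.pdf", color_dpi=150, gray_dpi=150)))
```

The resulting list could be handed to subprocess.run if Ghostscript is installed; whether 150 dpi is the right middle ground is something to check against your own files.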

RTFM.
/ghostpdl/gs/doc/Ps2pdf.htm#Options
Then look for the big table with all the options listed in the rows and the PDFSETTINGS values listed in the columns. More usefully, read up on all the Distiller parameters and select the ones you want to use.

How to identify the pdf object in raw pdf file?

I want to remove certain objects programmatically.
Using cpdf I can get the objects; if I can somehow identify the objects that I want to delete, I should be able to modify PDF files programmatically.
$ cpdf in.pdf -output-json -output-json-parse-content-streams -o out.json
$ cpdf -j out.json -o out.pdf
However, I cannot find the object corresponding to my target text. For example, text search does not work on a raw PDF file. What is the best way to identify the object that holds a given piece of text?
EDIT: Here is a test PDF. Please remove XYZ from the top of each page. Note that the test is a significant simplification of the real PDF file, so the solution should not be so simple that it cannot be applied to real, complicated PDF files.
curl -s https://i.stack.imgur.com/whsnm.gif | tail -c +43 > test.pdf
The output of cpdf -output-json -output-json-parse-content-streams may or may not contain text which is recognisable to you. This depends on the font encodings in use, and the way in which text is laid out. In your file, for example, the painting of the string "XYZ" is represented as
[ "\u0000;\u0000<\u0000=", "Tj" ]
This is a string representing three codepoints indexing into the font. Cpdf presently has no way to show you what actual text this corresponds to; a future version will.
So I don't think your task can be done via cpdf -output-json in the general case, or indeed in this specific case.
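To make the "three codepoints" remark concrete, here is a small Python sketch that splits the string from the JSON output into character codes. It assumes the font uses 2-byte, big-endian codes, which is consistent with the three codes mentioned above but is not something the JSON itself states; the function name is made up for illustration.

```python
def split_two_byte_codes(s):
    """Split a content-stream string into 2-byte character codes (big-endian).

    Assumption: every character code in this font is exactly two bytes.
    """
    raw = s.encode("latin-1")  # each character of the JSON string stands for one byte
    if len(raw) % 2:
        raise ValueError("odd number of bytes; not a 2-byte encoding")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


codes = split_two_byte_codes("\u0000;\u0000<\u0000=")
print([hex(c) for c in codes])  # ['0x3b', '0x3c', '0x3d']
```

These numbers are indices into the font, not Unicode text. Turning them back into "XYZ" requires the font's own mapping (for example a ToUnicode CMap, if the file has one), which is exactly the information the JSON output does not show.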

Ghostscript's pdfwrite to grayscale results in wrong graylevel

I am trying to convert a PDF file (test.pdf, attached below) using Ghostscript (9.20 on Windows) so that it only uses the grayscale colorspace (not RGB or CMY):
gswin64c.exe -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dOverrideICC -dUseCIEColor -o gray.pdf -f test.pdf
The result indeed only uses gray colors:
>gswin64c.exe -o - -sDEVICE=inkcov gray.pdf
GPL Ghostscript 9.20 (2016-09-26)
Copyright (C) 2016 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
0.00000 0.00000 0.00000 0.92673 CMYK OK
(I need to use -dUseCIEColor, otherwise the CMY values are >0; this is a separate problem which I haven't solved yet...)
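For checking many files, the inkcov check above can be automated. The following Python sketch parses one per-page line of the inkcov output shown above and reports whether C, M and Y are all zero. The function names are made up for illustration, and the parsing assumes the line layout seen in the output above (four numbers followed by "CMYK OK").

```python
def parse_inkcov_line(line):
    """Return (c, m, y, k) coverage fractions from one inkcov page line."""
    parts = line.split()
    c, m, y, k = (float(p) for p in parts[:4])
    return c, m, y, k


def is_gray_only(line):
    """True if the page puts ink on the K channel only."""
    c, m, y, _k = parse_inkcov_line(line)
    return c == 0.0 and m == 0.0 and y == 0.0


print(is_gray_only("0.00000 0.00000 0.00000 0.92673 CMYK OK"))  # True
```

Note that this only says which inks are used, not whether the gray levels match the original, which is the actual problem described here.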
My problem: The resulting gray.pdf uses significantly different graylevels than the original test.pdf (open in your PDF viewer and compare for yourself).
Does anyone see my mistake or what I should do differently to get the same PDF but in grayscale rather than RGB colorspace?
Thank you very much!
test.pdf: https://drive.google.com/open?id=0BzjatAIrG6P3S2F5Vng4cUhUS0U
gray.pdf: https://drive.google.com/open?id=0BzjatAIrG6P3cEtTY3JaaTJCS2c
You are doing a multiple conversion, and not managing the colour space conversions at all.
Firstly you convert the original colour into a CIEBased colour space (and the space varies depending on the number of components in the original space). Since you don't specify Colour Rendering Dictionaries, this is an uncontrolled conversion, you are using the defaults.
You then embark on another conversion from CIEBased (which cannot, in general, be represented in PDF anyway, so would always result in an additional conversion) into DeviceGray. Again you haven't supplied any ICC profiles for this conversion, so you are using the default ones.
If you insist on using -dUseCIEColor (which I would very strongly advise against, controlling this is hard) then you need to supply ColorRendering Dictionaries to control the conversion from device space into CIE space, and also ICC profiles to control the subsequent conversion from CIE space into DeviceGray.
But I strongly suspect that you will get better results by not using -dUseCIEColor, just like Ghostscript tells you.
I can only guess what you need based on the source file. It contains a DeviceRGB 0.5/0.5/0.5 filled rectangle, and I suspect you want it to become 0.5 DeviceGray.
The solutions and speculations below will work for that and similar cases only. (E.g., I have no idea what the "CMY values" you write about are, i.e. whether there are DeviceCMYK or ICC-based colors or anything else in your files.) There are simple formulas to convert between device color spaces (see the PDF Reference); one of them indeed maps equal values in DeviceRGB to the same value in DeviceGray. To make it work, use Ghostscript 9.10:
"C:\Program Files\gs\gs9.10\bin\gswin32c.exe" -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dUseFastColor -o test_1.pdf -f test.pdf
Note the switch -dUseFastColor. You'll get "correct" 0.5 grayscale filled rectangle.
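For reference, the simple DeviceRGB to DeviceGray formula given in the PDF Reference is gray = 0.3 × red + 0.59 × green + 0.11 × blue. A minimal Python sketch of it follows (the function name is mine); because the three weights sum to 1, equal RGB components map to that same gray value, which is why the 0.5/0.5/0.5 rectangle should come out as 0.5 gray. Whether -dUseFastColor uses exactly this formula internally is an assumption on my part.

```python
def rgb_to_gray(r, g, b):
    """Simple DeviceRGB -> DeviceGray conversion from the PDF Reference.

    Components are in the range 0.0 .. 1.0. No ICC profile or gamma is involved.
    """
    return 0.3 * r + 0.59 * g + 0.11 * b


print(round(rgb_to_gray(0.5, 0.5, 0.5), 6))  # 0.5
```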
To make it work in versions between 9.10 and 9.20 (excluding both), I had to add another switch: -dPDFUseOldCMS. Again, the result is a 0.5 grayscale filled rectangle.
As the name of the last switch indicates, the simple conversions were probably considered deprecated, and it looks like they were scrapped in 9.20.
Instead, a wonderful new CMS engine was introduced (since 9.10). Except that it doesn't work for high-level devices (pdfwrite included): it has been either switched off or broken for many releases.
I was unable to make it actually use the -sOutputICCProfile option for either DeviceCMYK or DeviceGray output (or ICC-based output, whatever), with any combination of device- or ICC-based colors in the source and of command line options. The color values in the produced files were always the same.
I'd appreciate if someone indicates I'm wrong and shows an opposite example.
It worked, actually (partly -- for device source colors only), in 9.10:
"C:\Program Files\gs\gs9.10\bin\gswin32c.exe" -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -sOutputICCProfile=sgray.icc -o test_2.pdf -f test.pdf
Using different ICC profiles results in different (and, it looks, correct) output. To convert equal RGB values to the same Gray values, one would need a grayscale profile with the same gamma as (default) sRGB. Just use the free ICC Profile Inspector to extract a curve from sRGB and import it into e.g. sgray.icc (distributed with Ghostscript).
The advantage of using a profile to convert RGB to Gray, preserving gamma, as opposed to the "simple formula" described above, may or may not be worth the effort. Check for your files and purposes.

ghostscript: convert PDF into CMYK preserving pure Black for text

I need to convert an RGB PDF into a CMYK PDF.
I need to have pure black for text.
It seems (thanks to the comments below) that the term "black point compensation" is wrong. I took it from Adobe Acrobat, where it works exactly how I need. I thought gs had the same feature.
I use Ghostscript 9.16.
If I got it right there is a -dBlackPtComp option, but it does not work for me.
Ghostscript command I have tried is:
"c:/Program Files/gs/gs9.16/bin/GSWIN64C.EXE" -o testing_black_cmyk.pdf -sColorConversionStrategy=CMYK -sDEVICE=pdfwrite -dOverrideICC=true -sOutputICCProfile=c:/Windows/System32/spool/drivers/color/JapanColor2002Newspaper.icc -dTextBlackPt=1 -dBlackPtComp=1 test2.pdf
Try this:
collink -v -G AppleRGB.icc JapanColor2002Newspaper.icc apple_to_jNP_photo.icc
collink -v -f AppleRGB.icc JapanColor2002Newspaper.icc apple_to_jNP_neutrals.icc
control.txt:
Image_RGB apple_to_jNP_photo.icc 0 1 0
Graphic_RGB apple_to_jNP_neutrals.icc 0 1 0
Text_RGB apple_to_jNP_neutrals.icc 0 1 0
and
gswin32c -q -sDEVICE=pdfwrite -o out.pdf -sColorConversionStrategy=CMYK -sSourceObjectICC=control.txt in.pdf
Then the DeviceRGB in the source PDF is converted to DeviceCMYK, and RGB 0/0/0 becomes (as I'm checking now) DeviceGray 0, which should be OK (and all other neutral RGB shades are mapped to true grayscale, too).
The reason we use different DeviceLink profiles for different objects is that, although saturated colors (far from neutrals) will be converted to the same CMYK through both profiles, you probably don't want color to suddenly switch to 0/0/0/n in continuous-tone photographs when it happens to be near neutral -- it'll look terrible on the press.
If your "images" are e.g. rasterized graphics (diagrams, etc.) with 0/0/0 RGB, then you can consider using apple_to_jNP_neutrals.icc for these images too.
If your page has a mix of both real images and rasterized graphics (text), bad luck: you'll have to compromise.
The reason we use -G instead of the fast and simple Simple Mode is that -f (for the second profile) implies the "Gamut Mapping Mode using inverse outprofile A2B", and we want the 2 profiles to produce results (for saturated colors) as close to each other as possible.
From the description on black point compensation on the Little CMS page:
"Black point compensation (BPC) is a technique used to address color
conversion problems caused by differences between the darkest levels
of black achievable on different media/devices."
In other words, BPC has nothing to do with your problem and if you want proper answers, you should remove it from this question.
If you want black to be preserved (or pure / secondary colors in general), you basically have two options you can look at:
1) Create a proper DeviceLink profile to do your conversion. This devicelink profile should take your input ICC Profile and the destination you want to convert to and should contain proper exception rules to keep black / gray / secondary / tertiary colors as required.
2) Use a color conversion engine that supports exceptions while doing regular ICC Profile conversion. Little CMS for example has an intents flag ("INTENT_PRESERVE_K_ONLY_RELATIVE_COLORIMETRIC") that can be set to instruct the engine to preserve black during conversion.
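To illustrate what an "exception rule" for neutrals means, here is a deliberately naive Python toy. It is not what a DeviceLink profile, Little CMS or Ghostscript actually computes: the base conversion is the crude complement formula (c = 1 - r, and so on, with no black generation), and the function name and tolerance are my own assumptions. The only point is the shape of the rule: neutral input goes to K only, everything else goes through the regular conversion.

```python
def rgb_to_cmyk_preserving_neutrals(r, g, b, tol=1e-6):
    """Toy RGB -> CMYK conversion with a 'keep neutrals on K' exception.

    Components are in 0.0 .. 1.0. Illustration only, not colour managed.
    """
    if abs(r - g) <= tol and abs(g - b) <= tol:
        # Exception rule: neutral shades (r == g == b) use black ink only.
        return (0.0, 0.0, 0.0, 1.0 - r)
    # Regular path (crude stand-in for a real ICC conversion).
    return (1.0 - r, 1.0 - g, 1.0 - b, 0.0)


print(rgb_to_cmyk_preserving_neutrals(0.0, 0.0, 0.0))  # (0.0, 0.0, 0.0, 1.0)
print(rgb_to_cmyk_preserving_neutrals(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0, 0.0)
```

A hard switch like this is also why such a rule is fine for text and line art but can produce visible jumps in photographs, where colours drift in and out of neutral.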

Ghostscript: convert PDF to EPS with embedded font rather than outlined curves

I use the following command to convert a PDF to EPS:
gswin32 -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=epswrite -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
I then use the following command to convert the EPS to another PDF (test2.pdf) to view the EPS figure.
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
I found that the text in the generated test2.pdf has been converted to outline curves. There is no font embedded anymore either.
Is it possible to convert PDF to EPS without converting text to outlines? I mean, to an EPS with embedded font and text.
Also, after the conversion (test.pdf -> test.eps -> test2.pdf), the height and width of the PDF figure (test2.pdf) are a little bit smaller than in the original PDF (test.pdf):
test.pdf:
test2.pdf:
Is it possible to keep the width and height of the figure after conversion?
Here is the test.pdf: https://dl.dropboxusercontent.com/u/45318932/test.pdf
I tried KenS's suggestion:
gswin32 -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -dLanguageLevel=2 -sOutputFile=test.eps -f test.pdf
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=test2.pdf -f test.eps
I can see that the converted test2.pdf has a very weird font:
which is different from the original font in test.pdf:
When I copy the text from test2.pdf, I only get a couple of symbols like:
✕ ✖ ✗✘✙ ✚✛
Here is the test2.pdf: https://dl.dropboxusercontent.com/u/45318932/test2.pdf
I was using the latest Ghostscript 9.15. So what is the problem?
I just noticed you are using epswrite; you don't want to do that. That device is terrible and has been deprecated (and removed now). Use the eps2write device instead (you will need a relatively recent version of Ghostscript).
There's nothing you can do with epswrite except throw it away; it makes terrible EPS files. It also can't make level 2 files, no matter what you set -dLanguageLevel to.
Oh, and don't use -dNOCACHE: that prevents fonts being processed and decomposes everything to outlines or bitmaps.
UPDATE
You set subset fonts to true. By doing so the character codes which are used are more or less random. The first glyph in the document (say for example the 'H' in 'Hello World') gets the code 1, the second one (eg 'e') gets the code 2 and so on.
If you have a ToUnicode CMap, then Acrobat and other readers can convert these character codes to Unicode code points. Without it, the readers have to fall back on heuristics, the final one being 'treat it as ASCII'. Because the encoding arrangement isn't ASCII, you get gibberish. MS Windows' PostScript output can contain additional ToUnicode information, but that's not something we try to mimic in ps2write. After all, presumably you had a PDF file already....
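The effect can be mimicked with a few lines of Python. This is only a toy model of the renumbering described above (codes handed out in order of first use, starting at 1); real subsetting in Ghostscript involves more than this, and the function names are made up.

```python
def assign_subset_codes(text):
    """Give each distinct character a code in order of first appearance (1, 2, ...)."""
    codes = {}
    for ch in text:
        if ch not in codes:
            codes[ch] = len(codes) + 1
    return codes


def encode(text, codes):
    """Encode text as the bytes a subset font with these codes would see."""
    return bytes(codes[ch] for ch in text)


codes = assign_subset_codes("Hello World")
encoded = encode("Hello World", codes)
print(list(encoded))            # [1, 2, 3, 3, 4, 5, 6, 4, 7, 3, 8]
print(repr(encoded.decode("ascii")))  # control characters, not 'Hello World'
```

The glyphs still draw correctly because the font maps code 1 to the 'H' outline, but a reader that falls back to 'treat it as ASCII' has only the codes to go on, so copy and paste yields junk.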
Every time you do a conversion you run the risk of this kind of degradation, you should really try and minimise this in your workflow.
The problem is even worse in this case: the input PDF file has a TrueType CID Font. Basic language level 2 PostScript can't handle CIDFonts (IIRC this was introduced in version 2015). Since eps2write only emits basic level 2, it cannot write the font as a CIDFont. So instead it captures the glyph outlines and stores them in a type 3 font.
However, our EPS/PS output doesn't attempt to embed ToUnicode information in the PostScript (it's non-standard, very few applications can make use of it, and it therefore makes the files larger for little benefit). In addition, CIDFonts use multiple (2 or more) bytes for the character code, so there's no way to encode the type 3 fonts as ASCII.
Fundamentally you cannot use Ghostscript to go PDF->PS->PDF and still be able to copy/paste/search text, if the input contains CIDFonts.
By the way, there's no point in setting -dLanguageLevel at all. eps2write only creates level 2 output.
I used Inkscape to convert a .pdf to .eps. Just open the .pdf file in Inkscape, choose high mesh in the import options, and save it as an EPS file.

Cropped PCL after gswin PDF to PCL conversion

I have a PDF, which I want to convert to PCL
I convert PDF to PCL using the following command:
(gs 8.70)
gswin32c.exe -q -dNOPAUSE -dBATCH \
-sDEVICE=ljetplus -dDuplex=false -dTumble=false \
-sPAPERSIZE=a4 -sOutputFile="d:\doc1.pcl" \
-f"d:\doc1.pdf" -c -quit
When I view or print the output PCL, it is cropped. I would expect the output to start right at the edge of the paper (at least in the viewer).
Is there any way to get the whole output without moving the contents of the page away from the paper edge?
I tried the -dPDFFitPage option, which works but results in scaled output.
You are using -sPAPERSIZE=a4. This causes the PCL to render for A4 sized media.
Very likely, your input PDF is made for a non-A4 size. That leaves you with 3 options:
...you either use that exact page size for the PCL too (which your printer possibly cannot handle),
...or you have to add -dPDFFitPage (as you tried, but didn't like),
...or you skip the -sPAPERSIZE=... parameter altogether (which most likely will automatically use the same size as the PDF, and which your printer possibly cannot handle...)
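To find out which of these cases applies, compare the page size of the PDF with A4. A4 is 210 mm x 297 mm, which is roughly 595 x 842 points at 72 points per inch. The Python sketch below does that comparison; the function names and the 1 point tolerance are my own choices, and you would have to read the actual width and height from the PDF's MediaBox yourself. It checks portrait orientation only.

```python
def mm_to_points(mm):
    """Convert millimetres to PostScript points (72 per inch, 25.4 mm per inch)."""
    return mm / 25.4 * 72


A4_POINTS = (mm_to_points(210), mm_to_points(297))  # about (595.28, 841.89)


def matches_a4(width_pt, height_pt, tol=1.0):
    """True if a portrait page of the given size is A4 within tol points."""
    return (abs(width_pt - A4_POINTS[0]) <= tol
            and abs(height_pt - A4_POINTS[1]) <= tol)


print(matches_a4(595, 842))  # True  (A4)
print(matches_a4(612, 792))  # False (US Letter)
```

If the check fails, the PDF was not made for A4, and rendering it onto A4 media without scaling will crop it or leave a margin.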
Update 1:
In case ljetplus is not a hard requirement for your requested PCL format variant, you could try this:
gs -sDEVICE=pxlmono -o pxlmono.pcl a4-fo.pdf
gs -sDEVICE=pxlcolor -o pxlcolor.pcl a4-fo.pdf
Update 2:
I can confirm now that even the most recent version of Ghostscript (v9.06) cannot handle non-Letter page sizes for ljetplus output.
I'd regard this as a bug... but it could well be that it won't be fixed, even if reported at the GS bug tracker. However, the least that can be expected is that it will get documented as a known limitation for ljetplus output...