ghostscript preserve PDF input's font


My ghostscript command is this:
gs \
-dBATCH -dNOPAUSE \
-sDEVICE=pdfwrite \
-dColorConversionStrategy=/LeaveColorUnchanged \
-dDownsampleMonoImages=false \
-dDownsampleGrayImages=false \
-dDownsampleColorImages=false \
-dAutoFilterColorImages=false \
-dAutoFilterGrayImages=false \
-dColorImageFilter=/FlateEncode \
-dGrayImageFilter=/FlateEncode \
-sOutputFile=./merge.pdf \
-f ./page_*.pdf
Most of the options relate to images.
After running it, I find that the fonts look less sharp than in the input file.
The difference between the fonts is as follows:
Fonts of (one of) the input file:
$ pdffonts page_3.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
NachlieliCLM-Bold Type 1 Builtin yes no yes 62 0
NachlieliCLM-Bold Type 1 Custom yes no yes 65 0
Courier10PitchBT-Bold Type 1 Builtin yes no yes 70 0
EAAAAA+LiberationSerif TrueType WinAnsi yes yes yes 27 0
NachlieliCLM-Light Type 1 Builtin yes no yes 75 0
NachlieliCLM-Light Type 1 Custom yes no yes 78 0
HAAAAA+LiberationSans-Bold TrueType WinAnsi yes yes yes 42 0
IAAAAA+DejaVuSans TrueType WinAnsi yes yes yes 47 0
JAAAAA+LMMono9-Regular Type 1 Builtin yes yes yes 52 0
KAAAAA+LMMonoProp10-Regular Type 1 Builtin yes yes yes 37 0
Courier10PitchBT-Roman Type 1 Builtin yes no yes 83 0
MAAAAA+LiberationSerif-Bold TrueType WinAnsi yes yes yes 57 0
NAAAAA+LiberationSerif-Italic TrueType WinAnsi yes yes yes 32 0
Fonts of the output file:
$ pdffonts merge.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
LQRJGW+LiberationSerif-Bold TrueType WinAnsi yes yes yes 54 0
ZAZOKA+NachlieliCLM-Light Type 1C Custom yes yes yes 42 0
XFXEQZ+LiberationSerif-Italic TrueType WinAnsi yes yes yes 56 0
KBCNYY+LiberationSans-Bold TrueType WinAnsi yes yes yes 44 0
PPEMTT+DejaVuSans TrueType WinAnsi yes yes yes 46 0
FUVLBK+NachlieliCLM-Bold Type 1C Custom yes yes yes 36 0
OQFKGW+LMMono9-Regular Type 1C Custom yes yes no 48 0
ZFATCB+LMMonoProp10-Regular Type 1C Custom yes yes no 50 0
WIGEDL+Courier10PitchBT-Bold Type 1C WinAnsi yes yes no 38 0
AFLCKO+Courier10PitchBT-Roman Type 1C WinAnsi yes yes no 52 0
QNUNTR+LiberationSerif TrueType WinAnsi yes yes yes 40 0
BLSWAW+DejaVuSansMono TrueType WinAnsi yes yes yes 97 0
HQDKJN+LiberationSerif TrueType WinAnsi yes yes yes 99 0
SCAKLE+LiberationSerif-Italic TrueType WinAnsi yes yes yes 101 0
AGALJA+NachlieliCLM-Bold Type 1C Custom yes yes yes 91 0
PPEMTT+DejaVuSans TrueType WinAnsi yes yes yes 103 0
TLVEAY+LiberationSans-Bold TrueType WinAnsi yes yes yes 93 0
GLOKSW+NachlieliCLM-Light Type 1C Custom yes yes yes 95 0
The only method which PARTIALLY works is executing the following:
gs \
-dBATCH -dNOPAUSE \
-sDEVICE=pdfwrite \
-dCompressFonts=true \
-dSubsetFonts=true \
-dEmbedAllFonts=false \
-sOutputFile=./merge.pdf \
-f ./page_*.pdf
$ pdffonts merge.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
LQRJGW+LiberationSerif-Bold TrueType WinAnsi yes yes yes 54 0
NachlieliCLM-Light Type 1 Custom no no yes 42 0
XFXEQZ+LiberationSerif-Italic TrueType WinAnsi yes yes yes 56 0
KBCNYY+LiberationSans-Bold TrueType WinAnsi yes yes yes 44 0
PPEMTT+DejaVuSans TrueType WinAnsi yes yes yes 46 0
NachlieliCLM-Bold Type 1 Custom no no yes 36 0
EVJWAP+LMMono9-Regular Type 1C Custom yes yes no 48 0
LAKFSN+LMMonoProp10-Regular Type 1C Custom yes yes no 50 0
Courier10PitchBT-Bold Type 1 WinAnsi no no no 38 0
Courier10PitchBT-Roman Type 1 WinAnsi no no no 52 0
QNUNTR+LiberationSerif TrueType WinAnsi yes yes yes 40 0
BLSWAW+DejaVuSansMono TrueType WinAnsi yes yes yes 97 0
HQDKJN+LiberationSerif TrueType WinAnsi yes yes yes 99 0
SCAKLE+LiberationSerif-Italic TrueType WinAnsi yes yes yes 101 0
NachlieliCLM-Bold Type 1 Custom no no yes 91 0
PPEMTT+DejaVuSans TrueType WinAnsi yes yes yes 103 0
TLVEAY+LiberationSans-Bold TrueType WinAnsi yes yes yes 93 0
NachlieliCLM-Light Type 1 Custom no no yes 95 0
In the last case, the font LMMono9 doesn't change, but the font NachlieliCLM is sharper (probably because it is not embedded).
As you can see, some of the fonts aren't embedded, which is bad.
The output PDF is intended for printing and sharing, so the fonts need to be embedded and of high quality.
BTW, I know that the flags need to be as such:
-dCompressFonts=true \
-dSubsetFonts=false \
-dEmbedAllFonts=true \
But the fonts are still not sharp with these flags.
I've read many SO threads and the documentation but failed to find a solution.
I suspect that I need the output fonts to be Type 1 instead of Type 1C, but I'm not sure.
Example:
There are attached two files: input.pdf and output.pdf.
There is a difference between them ONLY in the following text:
"Title is here (heading 2)" - Font is Courier 10 Pitch
Bullets "First" up to and including "Sixth" - Font is LM Mono 9
(Notice that the rest of the sentence in each bullet is fine and doesn't change. Only the ordinal words, like "First", change.)
The last/bottom (Hebrew) sentence - Font is Nachlieli CLM
The difference is that in the output file the text is coarser (the input is sharper). To see the difference, place the two PDF documents side by side, each at a zoom of 100%. The difference is noticeable in either Ubuntu's default Document Viewer or Okular.
The rest of the text is the same.
Also, the input PDF file was indeed created using LibreOffice 4.2.
The output file was created using the following command:
gs \
-dBATCH -dNOPAUSE \
-sDEVICE=pdfwrite \
-dColorConversionStrategy=/LeaveColorUnchanged \
-dDownsampleMonoImages=false \
-dDownsampleGrayImages=false \
-dDownsampleColorImages=false \
-dAutoFilterColorImages=false \
-dAutoFilterGrayImages=false \
-dColorImageFilter=/FlateEncode \
-dGrayImageFilter=/FlateEncode \
-sOutputFile=./output.pdf \
-f ./input.pdf
I'm using GPL Ghostscript 9.10 and Ubuntu 14.04.
Screenshot to show the differences between input (pre-GS) and output (post GS). Used the software Document Viewer 3.10 (or Evince).

No, the font format is not the issue. Type 1C is simply the CFF format, which is (more or less) a more compact representation of a Type 1 font.
You don't say which operating system you are using, nor which version of Ghostscript.
It's not possible to tell you anything about what is happening without an example. Please post an input file somewhere where we can look at it and then we might be able to help you.
Converting to Type 1C is a simple matter and does not affect the font 'sharpness', since both are vector forms. You will not get a Type 1 font out of pdfwrite; all Type 1 fonts are converted to CFF (Type 1C).
The reason NachlieliCLM is not embedded is that you have set EmbedAllFonts=false, and it's not subset and uses the default Encoding. A subset input font needs to be embedded, because it almost invariably uses a custom Encoding, so you can't simply use the original font in its place.
I suspect, but can't prove without seeing the input, that your problem is due to the poor naming convention used by certain PDF-producing applications. The subset prefix is not unique, causing name collisions. LibreOffice is a known culprit here.
That problem was believed resolved recently (we now use the PDF object number as well as the font name), but since I don't know what version of Ghostscript you are using I can't say whether that's likely to be the problem. However, that normally causes incorrect characters, not a loss of 'sharpness', which is more likely to be caused by rendering to an image.
As I said, post a (small) example input file and the result you get from Ghostscript and pdfwrite, and it might be possible to say more.

I don't know how current your question is, but today I faced a similar issue. Using pdffonts, I found that my output after merging did not contain the same fonts as my source, which made the characters look blurry and ugly. I solved the problem by putting all fonts used by the input in a specific folder and then calling Ghostscript with the following arguments:
gs -sFONTPATH=<my_folder> \
-dEmbedAllFonts=true \
-sOutputFile=<my_output>.pdf \
<my_input>.pdf
Input and output PDFs matched after that! BTW: I am currently using gs 9.25.0

Related

Rename PDF embedded font

I am using Ghostscript to merge PDF files. Occasionally, embedded font names collide among different files. Ghostscript then picks one subset, and some characters from other subsets of the same name cannot be rendered after merging.
To solve the problem, I'd like to add a preprocess phase that renames embedded fonts for each file, and the new names are generated by my logic.
Solutions under Linux are preferred.
P.S. I have evaluated other tools for merging PDFs (pdfbox, pdfjam, pdftk, pdfunite, qpdf), but it looks like none of them detects identical images, so the merged PDF is large. Ghostscript keeps only one object for identical images across multiple input files, which fits my scenario.
Update after reading the reply from @KenS
GhostScript version: 9.18
PDF creator:
xelatex: XeTeX 3.14159265-2.6-0.99998 (TeX Live 2017)
xdvipdfmx: Version 20170318 by the DVIPDFMx project team, modified for TeX Live.
The output of 2 PDF with collision font names:
$ gs -q -dSAFER -dBATCH -dNOPAUSE -dPDFSETTINGS=/prepress -sDEVICE=pdfwrite -sOutputFile=merged.pdf 1.pdf 2.pdf
GPL Ghostscript 9.18: Missing glyph CID=120, glyph=0078 in the font BLTQUA+LMRoman9-Regular . The output PDF may fail with some viewers.
GPL Ghostscript 9.18: Missing glyph CID=117, glyph=0075 in the font BLTQUA+LMRoman9-Regular . The output PDF may fail with some viewers.
GPL Ghostscript 9.18: Missing glyph CID=118, glyph=0076 in the font BLTQUA+LMRoman9-Regular . The output PDF may fail with some viewers.
GPL Ghostscript 9.18: Missing glyph CID=116, glyph=0074 in the font BLTQUA+LMRoman9-Regular . The output PDF may fail with some viewers.
Embedded fonts:
$ pdffonts 1.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ITLHBL+LMRoman10-Regular-Identity-H CID Type 0C Identity-H yes yes yes 7 0
BLTQUA+LMRoman9-Regular-Identity-H CID Type 0C Identity-H yes yes yes 9 0
MHRCBY+LMRoman8-Regular-Identity-H CID Type 0C Identity-H yes yes yes 12 0
$ pdffonts 2.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ITLHBL+LMRoman10-Regular-Identity-H CID Type 0C Identity-H yes yes yes 7 0
BLTQUA+LMRoman9-Regular-Identity-H CID Type 0C Identity-H yes yes yes 9 0
MHRCBY+LMRoman8-Regular-Identity-H CID Type 0C Identity-H yes yes yes 12 0
The font names are exactly the same. Because I use xelatex to programmatically generate PDFs from a pattern, the object IDs of the fonts are exactly the same too. Ghostscript therefore treats the BLTQUA+LMRoman9-Regular fonts from the 2 files as the same subset, and complains at processing time.
As @KenS suggested, I let Ghostscript generate a new file for each PDF.
Ghostscript will calculate a prefix using the MD5 sum of the font contents.
Then check fonts:
$ pdffonts preproc_1.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
JUVZAM+LMRoman8-Regular CID Type 0C Identity-H yes yes yes 22 0
DCQLFZ+LMRoman9-Regular CID Type 0C Identity-H yes yes yes 17 0
YAKIEH+LMRoman10-Regular CID Type 0C Identity-H yes yes yes 13 0
$ pdffonts preproc_2.pdf
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
JUVZAM+LMRoman8-Regular CID Type 0C Identity-H yes yes yes 22 0
EQFACS+LMRoman9-Regular CID Type 0C Identity-H yes yes yes 17 0
YAKIEH+LMRoman10-Regular CID Type 0C Identity-H yes yes yes 13 0
Now it is obvious that the two LMRoman9-Regular fonts are different subsets (though still with the same object ID), and this will no longer confuse Ghostscript.
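The check done by eye above (same font name and same object ID in both files) can also be scripted. This is only a sketch: the function name colliding_subset_fonts is made up for illustration, and it assumes the pdffonts output of each PDF has been saved to a text file first (for example with pdffonts 1.pdf > 1.txt). It also assumes font names contain no spaces.

```shell
# Print "name object-number" for every subset font that appears, with the
# same name and the same object number, in more than one saved pdffonts
# listing. The "type" column may contain spaces ("CID Type 0C"), so the
# columns are counted from the right: ... emb sub uni object generation.
colliding_subset_fonts() {
    awk '
        # Data rows have at least 6 fields and end in two integers.
        NF >= 6 && $NF ~ /^[0-9]+$/ && $(NF-1) ~ /^[0-9]+$/ && $(NF-3) == "yes" {
            key = $1 " " $(NF-1)
            if (!((key, FILENAME) in seen)) {
                seen[key, FILENAME] = 1
                count[key]++
            }
        }
        END { for (k in count) if (count[k] > 1) print k }
    ' "$@" | LC_ALL=C sort
}
```

A match is only a hint: two files can legitimately contain the same subset under the same name, so the listing says where to look, not that something is broken.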

[insert usual disclaimer about the fact that Ghostscript does not merge PDF files]
Note that this is really only a problem when the creating application does a poor job of selecting the prefix for the embedded font name. Realistically the fault lies with the PDF creator.
You haven't stated which version of Ghostscript you are using. Recent versions of Ghostscript use both the font name and the PDF object number to try and give a greater degree of uniqueness. So the fonts will only collide if the name and object number in the two PDF files are the same, which is less likely.
If that's still a problem, a practical solution is to pass each of the original PDF files through Ghostscript and the pdfwrite device, to produce a number of new PDF files. When creating the fonts in the new PDF files, Ghostscript will calculate a prefix using the MD5 sum of the font contents. While not absolutely unbreakable, the chances of two different subsets having contents which produce the same MD5 hash is very low.
You can then safely process the newly created PDF files with no real risk that different fonts will have the same name and object number.
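As a rough sketch of that two-pass approach, the functions below first rewrite every input through pdfwrite and then hand the rewritten files to a second pdfwrite run. The names GS_BIN, preprocess_pdf and combine_pdfs, as well as the preproc_ file prefix, are assumptions made for this example; only the Ghostscript switches themselves come from the commands already shown in this thread.

```shell
# GS_BIN is an assumption of this sketch: it defaults to "gs" and can be
# set to another binary (or to "echo" to see the commands without running them).
GS_BIN=${GS_BIN:-gs}

# Pass one PDF through pdfwrite so that its subset fonts are written out
# again with newly generated prefixes.
# usage: preprocess_pdf input.pdf output.pdf
preprocess_pdf() {
    "$GS_BIN" -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
        -sOutputFile="$2" "$1"
}

# Rewrite every input as preproc_<name> next to the original, then run
# pdfwrite once more over the rewritten files.
# usage: combine_pdfs merged.pdf 1.pdf 2.pdf ...
combine_pdfs() (
    out=$1
    shift
    n=$#
    for f in "$@"; do
        pre="$(dirname "$f")/preproc_$(basename "$f")"
        preprocess_pdf "$f" "$pre" || exit 1
        set -- "$@" "$pre"      # append the rewritten file to the argument list
    done
    shift "$n"                  # drop the original inputs, keep the rewritten ones
    "$GS_BIN" -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
        -sOutputFile="$out" "$@"
)
```

Calling combine_pdfs merged.pdf 1.pdf 2.pdf would leave preproc_1.pdf and preproc_2.pdf on disk, which makes it easy to run pdffonts on them before trusting the final file.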
If you insist on doing the renaming yourself, you might be able to get away with just looking through the PDF file for names of the form XXXXXX+FontName. You could modify the 6-letter prefix and rewrite the file.
I can't recall offhand whether font objects can be stored in compressed object streams. If they can, that would significantly increase the problem, because you would have to decompress the stream, modify the data, recompress it and, most likely, modify the xref table, because it's unlikely the recompressed stream would be the same length as the original.

Limiting problematic font mapping in Ghostscript

I am using Ghostscript to process some PDFs to reduce their size. Sometimes the fonts embedded during processing are inferior to the local fonts used when viewing the original.
A few questions:
I imagine that fonts which are already embedded in an input PDF are reused in the output PDF rather than sourced from the local machine. Is this correct? Is this true even when subsetting is enabled?
Is it possible (and reasonable) to have Ghostscript only embed a missing font when it has a strict match?
Is it possible for Ghostscript to retain fonts already embedded in the input PDF but not bother embedding fonts that are missing in the source?
Background
Currently I am using the following command with Ghostscript 9.23:
gs -sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 \
-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageResolution=72 \
-dGrayImageResolution=72 \
-dMonoImageResolution=72 \
-dColorImageDownsampleThreshold=1.0 \
-dGrayImageDownsampleThreshold=1.0 \
-dMonoImageDownsampleThreshold=1.0 \
-dNOPAUSE -dQUIET -dPARANOIDSAFER -dBATCH \
-dDetectDuplicateImages=true \
-sOutputFile=output.pdf input.pdf
However, in some cases the font remapping appears to be hurting the rendered result. Here is a case where a source PDF without any embedded fonts suffered some rendering degradation for the typical viewer after font substitution and embedding:
Before:
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
FrizQuadrata-Bold Type 1 MacRoman no no no 7 0
Helvetica-Black Type 1 MacRoman no no no 9 0
Helvetica-Light Type 1 MacRoman no no no 59 0
Helvetica-Bold TrueType MacRoman no no no 65 0
Helvetica-Bold Type 1 MacRoman no no no 68 0
ZapfDingbats Type 1 Custom no no no 70 0
Helvetica-Black Type 1 MacRoman no no no 108 0
Helvetica-BlackOblique Type 1 MacRoman no no no 136 0
ZapfDingbats Type 1 Custom no no no 137 0
Helvetica-Bold Type 1 MacRoman no no no 780 0
Helvetica-LightOblique Type 1 MacRoman no no no 926 0
After:
name type encoding emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
VRGBBC+Times-Bold Type 1C MacRoman yes yes no 8 0
JTHLZY+Helvetica-Bold Type 1C MacRoman yes yes no 10 0
ETQHWQ+Helvetica Type 1C MacRoman yes yes no 20 0
ZapfDingbats Type 1 ZapfDingbats no no yes 29 0
LSUJJC+Helvetica-BoldOblique Type 1C MacRoman yes yes no 46 0
RBDUAX+Helvetica-Oblique Type 1C MacRoman yes yes no 202 0

There's a lot to answer here......
I imagine that fonts which are already embedded in an input PDF are reused in the output PDF rather than sourced from the local machine. Is this correct? Is this true even when subsetting is enabled?
When you create a new PDF file, Ghostscript's pdfwrite device will, by default, preserve everything it possibly can from the input. This is actually vitally important for fonts.
Is it possible (and reasonable) to have Ghostscript only embed a missing font when it has a strict match?
And how do you define a 'strict match'?
Is it possible for Ghostscript to retain fonts already embedded in the input PDF but not bother embedding fonts that are missing in the source?
You can certainly adjust the way Ghostscript's pdfwrite device embeds fonts; there are several controls, all documented. The AlwaysEmbed and NeverEmbed arrays let you control whether certain named fonts are embedded, and there are also EmbedAllFonts and SubsetFonts.
Try setting EmbedAllFonts to false (important: also set SubsetFonts to false) with your test document. I'd also suggest that you share a PDF file which exhibits the problem, because it's hard to see why there would be one. 'Some degradation' doesn't really tell me very much. The fonts being embedded (with the exceptions of FrizQuadrata, Helvetica-Black, Helvetica-BlackOblique and Helvetica-LightOblique) are part of the base 13 and so should be fine.
It's a bad idea to not embed fonts (apart from the base 13) in a PDF file. It's supposed to be portable, and if you rely on font substitution you always run the risk of the rendered result being incorrect.
Presumably you actually have these fonts locally, so why not just make them available to Ghostscript so that the pdfwrite device can embed subsets of the fonts in the new PDF file? Then it will be portable.

Why does Ghostscript replace font names with "CairoFont"?

I use ghostscript to optimize pdf files (mostly with respect to size), for which it does a great job. The command that I use is:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress \
-dCompatibilityLevel=1.4 -sOutputFile=out.pdf in.pdf
However, it seems that this replaces fonts (or subsets them) and does not preserve their names. It replaces them with CairoFont. How can I get Ghostscript to preserve the font names?
Example:
A simple pdf file (created with Inkscape), with a single text element in it (Nimbus Roman) as an input (in.pdf):
for which pdffonts reports:
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
PMLNBT+NimbusRomanNo9L Type 1 yes yes yes 5 0
However, after running ghostscript over the file pdffonts reports:
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
OEPSCM+CairoFont-0-0 Type 1C yes yes no 8 0
So, is there a way to have ghostscript (or libcairo?) preserve the name of the font?
The input file is uploaded here.

Ghostscript doesn't change the font name, but there are, in fact, several different font 'names' in a PDF file.
In the case of your file the PDF FontDescriptor object has a name
<<
/Type /FontDescriptor
/FontName /PMLNBT+NimbusRomanNo9L
/Flags 4
/FontBBox [ -168 -281 1031 924 ]
/ItalicAngle 0
/Ascent 924
/Descent -281
/CapHeight 924
/StemV 80
/StemH 80
/FontFile 7 0 R
>>
which refers to a FontFile stream
/FontFile 7 0 R
That stream contains the following:
%!PS-AdobeFont-1.0: NimbusRomNo9L-Regu 1.06
%%Title: NimbusRomNo9L-Regu
%Version: 1.06
%%CreationDate: Thu Aug 2 13:14:49 2007
%%Creator: frob
%Copyright: Copyright (URW)++,Copyright 1999 by (URW)++ Design &
%Copyright: Development; Cyrillic glyphs added by Valek Filippov (C)
%Copyright: 2001-2005
% Generated by FontForge 20070723 (http://fontforge.sf.net/)
%%EndComments
FontDirectory/NimbusRomNo9L-Regu known{/NimbusRomNo9L-Regu findfont dup/UniqueID known pop false {dup
/UniqueID get 5020931 eq exch/FontType get 1 eq and}{pop false}ifelse
{save true}{false}ifelse}{false}ifelse
11 dict begin
/FontType 1 def
/FontMatrix [0.001 0 0 0.001 0 0 ]readonly def
/FontName /CairoFont-0-0 def
Do you see the FontName in the actual font? It's called CairoFont-0-0
This brings me back to a point which I reiterate frequently here and elsewhere; when you process a PDF file with Ghostscript and emit a new PDF file using the pdfwrite device you are not 'optimising', 'converting', 'subsetting' or in a general sense manipulating the content of the original PDF file.
What Ghostscript does is interpret the PDF file. This produces a set of marking operations (such as 'stroke', 'fill', 'image' etc.) which it sends to the selected Ghostscript device. Most Ghostscript devices will then use the graphics library to render the operations to a bitmap and, when the page is complete, will write the bitmap to a file. The 'high level' or 'vector' devices instead repackage the operations into another Page Description Language. In the case of pdfwrite, that's a PDF file.
What this means in practice is that the emitted PDF file has nothing (apart from appearance) in common with the original PDF file. In particular the description of the objects may be different.
So in your case, the pdfwrite device doesn't know what the font was called in the original PDF object. It does know that the font that was defined was called CairoFont-0-0, so that's what it calls the font when it emits it.
Frankly this is another piss-poor example from Cairo, to go along with defining each page as containing transparency whether it does or not, the FontName in the Font object is supposed to be the same as the name in the Font stream.
It's pretty clear that the FontName has been altered, given the rest of the boilerplate there.

Ghostscript renders embedded fonts in pdf poorly (all jaggy)

Ghostscript doesn't render embedded fonts in PDFs properly.
e.g. http://vegro.nl/cmsfiles/ConsumentenAssortiment/Brochure/10.axd
The characters of the logo on the right top ('Thermrad') are all jagged.
If I open the file in Adobe Reader, no problem at all!
Do you have this problem too? Is there any solution?
I've been searching for days now, but I cannot find anything.
I tried Ghostscript 8.64 and 8.71 both on Windows Vista and CentOS.

My advice is to use Ghostscript 8.71 with this command line:
gswin32c.exe ^
-sDEVICE=pdfwrite ^
-o thermrad-out.pdf ^
-dPDFSETTINGS=/printer ^
10.axd
That should convert the PDF to one that no longer has the problem, because the original .axd file does have a problem with an embedded font. (I'm using pdffonts.exe from the XPDF suite to check.) The problem occurs on page 3 of your 10.axd:
for /l %i in (1,1,16) do (
echo. ............ Page %i ............................... ^
& pdffonts.exe -f %i -l %i 10.axd ^
& echo.)
outputs this:
[....]
............ Page 3 ...............................
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
YCRHYF+HelveticaNeue-LightExt Type 1C yes yes yes 249 0
XCZBKH+HelveticaNeue-Light Type 1C yes yes yes 250 0
Error: Illegal entry in bfchar block in ToUnicode CMap
YCRHYF+HelveticaNeue-LightExt Type 1C yes yes yes 15 0
Error: Illegal entry in bfchar block in ToUnicode CMap
YCRHYF+HelveticaNeue-LightExt Type 1C yes yes yes 19 0
Error: Illegal entry in bfchar block in ToUnicode CMap
YCRHYF+HelveticaNeue-LightExt Type 1C yes yes yes 41 0
Error: Illegal entry in bfchar block in ToUnicode CMap
YCRHYF+HelveticaNeue-LightExt Type 1C yes yes yes 45 0
Error: Illegal entry in bfchar block in ToUnicode CMap
YCRHYF+HelveticaNeue-LightExt Type 1C yes yes yes 49 0
Error: Illegal entry in bfchar block in ToUnicode CMap
YCRHYF+HelveticaNeue-LightExt Type 1C yes yes yes 53 0
Error: Illegal entry in bfchar block in ToUnicode CMap
YCRHYF+HelveticaNeue-LightExt Type 1C yes yes yes 57 0
Error: Illegal entry in bfchar block in ToUnicode CMap
YCRHYF+HelveticaNeue-LightExt Type 1C yes yes yes 61 0
[....]
After I let Ghostscript repair it, the problem is gone for page 3 in the repaired PDF:
c:\> pdffonts.exe -f 3 -l 3 thermrad.pdf
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
CZBBTM+HelveticaNeue-LightExt Type 1C yes yes no 13 0
MXETZY+HelveticaNeue-Light Type 1C yes yes no 40 0

The cure for smooth font rendering for us when converting PDF to JPG was to turn on text anti-aliasing with -dGraphicsAlphaBits=4 -dTextAlphaBits=4.
Here's a windows batch file I use to convert to a page size passed on the command line. Sample invocation: pdf2jpg infile.pdf 11x17
gswin64c.exe ^
-dNOPAUSE -P- -dSAFER -dBATCH ^
-dGraphicsAlphaBits=4 ^
-dTextAlphaBits=4 ^
-sDEVICE=jpeg ^
-dJPEGQ=85 ^
-r300x300 ^
-sPAPERSIZE=%2 ^
-sOutputFile=%~n1.jpg ^
%1
Also, there is at least one known issue where font anti-aliasing is turned off automatically in some gs versions if transparent images are present. The question "Convert a PDF to a Transparent PNG with GhostScript" has a solution.

How to find out which fonts are referenced and which are embedded in a PDF document

We have a little problem with fonts in PDF documents. To pin down the problem, I'd like to inspect which fonts are actually embedded in the PDF document and which are only referenced. Is there an easy (and cheap, as in free) way to do that?

Use the pdffonts command-line tool, originally from Xpdf and now part of Poppler.
This tool is available in most Linux distributions as part of the poppler-utils package.
Example usage and output:
$ pdffonts some.pdf
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
BAAAAA+Arial-Black TrueType yes yes yes 53 0
CAAAAA+Tahoma TrueType yes yes yes 28 0
DAAAAA+Wingdings-Regular TrueType yes yes yes 43 0
EAAAAA+Webdings TrueType yes yes yes 38 0
FAAAAA+Arial-BoldMT TrueType yes yes yes 33 0
GAAAAA+Tahoma-Bold TrueType yes yes yes 23 0
HAAAAA+OpenSymbol TrueType yes yes yes 48 0
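If you only care about the fonts that are referenced but not embedded, the pdffonts listing can be filtered. The following is a small sketch rather than a polished tool: the function name non_embedded_fonts is made up here, and it assumes font names contain no spaces.

```shell
# Read pdffonts output on stdin and print the names of fonts whose "emb"
# column is "no". The "type" column can contain spaces ("Type 1",
# "CID Type 0C") and the "encoding" column is missing in older pdffonts
# versions, so columns are counted from the right:
#   ... emb sub uni object-number generation
non_embedded_fonts() {
    awk 'NF >= 6 && $NF ~ /^[0-9]+$/ && $(NF-1) ~ /^[0-9]+$/ && $(NF-4) == "no" { print $1 }'
}
```

Typical use would be pdffonts some.pdf | non_embedded_fonts; an empty result means every listed font is embedded.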

Much simpler if you just want to find out the font names: run this from a terminal
strings yourPDFfilepath.pdf | grep FontName
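A slightly tidier variant of the same idea, as a sketch: it prints each name once, without the /FontName key. The function name pdf_font_names is invented for this example, and the -a and -o options are GNU/BSD grep extensions rather than POSIX. Like the strings approach, it only sees names stored in uncompressed parts of the file, so fonts described inside compressed streams will be missed.

```shell
# Print the distinct /FontName values found as plain text in a PDF file.
# usage: pdf_font_names file.pdf
pdf_font_names() {
    # -a: treat the binary file as text; -o: print only the matching part.
    # A name ends at whitespace or at a PDF delimiter character.
    LC_ALL=C grep -a -o '/FontName */[^][:space:]/<>(){}%[]*' "$1" |
        sed 's|^/FontName */||' |
        LC_ALL=C sort -u
}
```

On a file like the Cairo example discussed earlier, where the font program is stored uncompressed, this would list both the name in the FontDescriptor and the name inside the font program, which makes that kind of mismatch easy to spot.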

I finally got an example file that actually seems to have fonts embedded.
Use the normal Adobe Reader (or Foxit if you prefer). Select File->Properties and, in the resulting dialog, choose the Font tab. You will see a list of fonts. The ones that are embedded state this fact in parentheses after the font name.

CAM::PDF has a font reporter, available as a command-line utility or via a library call. If you run "listfont.pl file.pdf" you get output like this:
Page 1:
Name: F1.0
Type: TrueType
BaseFont: NZUXSR+Impact
Encoding: MacRomanEncoding
Widths: yes
Characters: 0-255
Embedded: yes
Name: F2.0
Type: TrueType
BaseFont: XSFKRA+ArialMT
Encoding: MacRomanEncoding
Widths: yes
Characters: 0-255
Embedded: yes
