What fill rule must be applied for Adobe Type 1 fonts and Type1 CFF fonts as used in PDF?

The PDF Standard (PDF 32000-1:2008) specifies two rules for filling closed paths, the non-zero winding number rule ("NZW") and the even-odd rule ("EO"); see pp. 136-137.
Among the fonts that are allowed in PDF, my understanding is that TrueType fonts and OpenType CFF2 fonts always use NZW...
But what about Adobe Type1 fonts and Type1 CFF (compact font format) fonts (as described in section 9.6.2.1 on p. 254)? Do these also use NZW?
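To make the difference between the two rules concrete, here is a small sketch using Java's java.awt.geom.Path2D, which supports both winding rules (Path2D.WIND_NON_ZERO and Path2D.WIND_EVEN_ODD). The square "O"-shaped glyph and the class name are made up for illustration. It shows why the question matters: if a counter is drawn in the same direction as the outer contour, NZW fills it while EO leaves it empty; if it is drawn in the opposite direction, both rules agree.

```java
import java.awt.geom.Path2D;

public class FillRuleDemo {
    // Builds a hypothetical square "O": an outer contour and an inner counter.
    // If innerReversed is true, the inner contour runs opposite to the outer one.
    static Path2D.Double glyphO(int windingRule, boolean innerReversed) {
        Path2D.Double p = new Path2D.Double(windingRule);
        // Outer contour
        p.moveTo(0, 0); p.lineTo(10, 0); p.lineTo(10, 10); p.lineTo(0, 10); p.closePath();
        if (innerReversed) {
            // Inner contour, opposite direction to the outer contour
            p.moveTo(3, 3); p.lineTo(3, 7); p.lineTo(7, 7); p.lineTo(7, 3); p.closePath();
        } else {
            // Inner contour, same direction as the outer contour
            p.moveTo(3, 3); p.lineTo(7, 3); p.lineTo(7, 7); p.lineTo(3, 7); p.closePath();
        }
        return p;
    }

    public static void main(String[] args) {
        int[] rules = { Path2D.WIND_NON_ZERO, Path2D.WIND_EVEN_ODD };
        String[] names = { "NZW", "EO" };
        for (boolean reversed : new boolean[] { false, true }) {
            for (int i = 0; i < rules.length; i++) {
                // (5, 5) lies inside the counter of the "O"
                boolean filled = glyphO(rules[i], reversed).contains(5, 5);
                System.out.printf("%s, inner contour %s: counter %s%n",
                        names[i], reversed ? "reversed" : "same direction",
                        filled ? "filled" : "empty");
            }
        }
    }
}
```

Running it reports the counter as filled only for NZW with a same-direction inner contour; in the other three cases it stays empty.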

Related

How to convert the font objects in a PDF to CFF

PDFs that use CFF fonts instead of Type 1 fonts are much smaller. Running a PDF through a Ghostscript conversion will indeed produce a PDF with Type1C (CFF) fonts. But this also "re-renders" the PDF, getting rid of certain features such as the print/screen distinction (if used). Is there any way to convert a PDF with Type 1 fonts to a PDF that is otherwise identical but uses Type1C fonts instead, i.e., to convert just the font objects?

Does TrueType require the 'cmap' table; must it cover all contained glyphs?

The purpose of TrueType's cmap table is clear: it defines one (or even multiple) ways to map input "character codes" to the glyphs contained in the file.
However, I wonder whether the TrueType reference requires its presence. And furthermore, even if cmap were mandatory, must it provide a mapping that covers all glyphs?
Background
Let me provide the motivation for this question, for those who wonder:
Why would somebody even think it sensible not to provide a mapping?
Would this not make the omitted glyphs (those without a mapping) inaccessible? What would be the point?
PDF defines an encoding, /Identity-H, which for TrueType-based CIDFonts maps 16-bit codes in the text directly to glyph indices (referred to as GIDs), given the default /CIDToGIDMap /Identity. So, to the best of my understanding, when I embed a TrueType font program in a PDF file and use the /Identity-H encoding for it, the cmap table becomes obsolete; hence my wish not to have to include it in the font subsets I embed.
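A minimal sketch of the mapping described above: with /Identity-H, the bytes of a shown string are read as 2-byte big-endian codes, each code is used as the CID of the same value, and with /CIDToGIDMap /Identity each CID is used directly as the glyph index. No cmap lookup happens anywhere. The class name is hypothetical, and odd-length strings are simply rejected here.

```java
public class IdentityH {
    // Reads a string shown with /Identity-H as 2-byte big-endian codes.
    // Identity-H maps each code to the CID with the same value; with
    // /CIDToGIDMap /Identity (the default for CIDFontType2), CID == GID.
    static int[] codesToGids(byte[] shown) {
        if (shown.length % 2 != 0) {
            throw new IllegalArgumentException("Identity-H strings consist of 2-byte codes");
        }
        int[] gids = new int[shown.length / 2];
        for (int i = 0; i < gids.length; i++) {
            int hi = shown[2 * i] & 0xFF;      // mask to avoid sign extension
            int lo = shown[2 * i + 1] & 0xFF;
            gids[i] = (hi << 8) | lo;
        }
        return gids;
    }

    public static void main(String[] args) {
        // Corresponds to "<002A0103> Tj" in a content stream
        byte[] shown = { 0x00, 0x2A, 0x01, 0x03 };
        System.out.println(java.util.Arrays.toString(codesToGids(shown))); // [42, 259]
    }
}
```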
From PDF 32000-1:2008 - 9.9 Embedded Font Programs:
[...]These TrueType tables shall always be present if present in the original TrueType font program: “head”, “hhea”, “loca”, “maxp”, “cvt”, “prep”, “glyf”, “hmtx”, and “fpgm”. If used with a simple font dictionary, the font program shall additionally contain a cmap table defining one or more encodings, as discussed in 9.6.6.4, "Encodings for TrueType Fonts". If used with a CIDFont dictionary, the cmap table is not needed and shall not be present, since the mapping from character codes to glyph descriptions is provided separately.[...]
So the cmap table shall not be present if the font is used with a CIDFont.
There are two specs being discussed here: the TrueType one, cited in JosephA's answer, and the PDF one, cited in Jan Slabon's answer.
The TrueType spec does mark the cmap table as required, so a font file without cmap can be considered invalid.
The PDF spec points out that when the font is embedded for use with a CIDFont, the cmap can be omitted from the font subset being embedded.
For those not acquainted with PDF font embedding: a font can be embedded entirely or, more often, as a subset (only the glyphs used in the file are included).
So, while the cmap table is required for TrueType fonts, it can be excluded from the subset embedded when the PDF is created.
It is also worth pointing out that even when cmap is embedded, only the relevant cmap entries (for the used glyphs) are added to the PDF file, minimizing the size impact.
The TrueType reference lists 'cmap' as a required table in Table 2. I think you're going to get into trouble by deliberately creating a TrueType font that lacks this table in the hope that you can rely on the PDF font dictionary's encoding to override it. It's possible that PDF TrueType parsing code will expect the cmap table to be present and raise an error when it's not found. In any case, I don't believe the cmap table takes up much space in the font program in the first place.
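To check whether a particular font file (for example, a subset extracted from a PDF) actually contains a cmap table, you can read the sfnt table directory at the start of the file. This is a minimal sketch that assumes a single TrueType/OpenType font, not a .ttc collection (collections start with a different header); the class name SfntTables is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SfntTables {
    // Reads the table tags from an sfnt (TrueType/OpenType) table directory.
    // Layout: uint32 sfntVersion, uint16 numTables, 3 x uint16 search fields,
    // then numTables records of {tag, checkSum, offset, length}, 16 bytes each.
    static List<String> tableTags(byte[] font) {
        ByteBuffer buf = ByteBuffer.wrap(font);   // big-endian by default
        buf.getInt();                             // sfntVersion (ignored here)
        int numTables = buf.getShort() & 0xFFFF;
        buf.position(12);                         // skip searchRange, entrySelector, rangeShift
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < numTables; i++) {
            byte[] tag = new byte[4];
            buf.get(tag);
            tags.add(new String(tag, StandardCharsets.US_ASCII));
            buf.position(buf.position() + 12);    // skip checkSum, offset, length
        }
        return tags;
    }

    static boolean hasCmap(byte[] font) {
        return tableTags(font).contains("cmap");
    }
}
```

Usage would be along the lines of SfntTables.hasCmap(Files.readAllBytes(Paths.get("subset.ttf"))), where the file name is only an example.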

The 14 standard PDF fonts and character encoding

I'm having difficulty producing PDFs that make use of the 14 standard PDF fonts. Let's use Times-Roman as an example.
I create a Font dictionary with Subtype Type1 and BaseFont set to Times-Roman. If I omit the Encoding entry from the Font dictionary, or add an Encoding dictionary without a BaseEncoding entry, the PDF viewer application should use the font's built-in encoding. For Times-Roman, this is AdobeStandardEncoding.
This works fine for ASCII characters. However, something more exotic like the 'fi' ligature (AdobeStandardEncoding code 174) is not displayed correctly by some PDF viewers:
Adobe Reader shows ® (Unicode U+00AE, decimal 174) for Times-Roman and Ă for Times-Italic
SumatraPDF (wine) shows ® for both fonts
Mozilla's PDF.js shows the 'AE' ligature for both fonts
All other PDF viewers I've tried display the 'fi' ligature properly. They also display the € symbol correctly, which I additionally map using the Differences array in the Encoding dictionary (because it is not included in AdobeStandardEncoding):
Apple Preview/Skim
GhostScript
PDF-XChange Viewer (wine)
Foxit Reader (wine)
Chromium's internal PDF viewer
Evince (homebrew)
Opening Adobe Reader's Document Properties window shows:
Times-Roman
Type: Type1
Encoding: Custom
Actual Font: Times-Roman
Actual Font Type: TrueType
I suspect the fact that a TrueType font is being used instead of a Type1 font might be related to the problem. The PDF specification says:
StandardEncoding: Adobe standard Latin-text encoding. This is the built-in encoding defined in Type 1 Latin-text font programs (but generally not in TrueType font programs).
It also says WinAnsiEncoding and MacRomanEncoding can be used with TrueType fonts. So should we avoid using the built-in encoding or StandardEncoding for the standard 14 fonts? Their effects seem to be undefined. It seems Adobe Reader doesn't bother to perform a proper mapping from glyph names to glyphs in the TrueType font being used.
Will providing a Differences array when using the Win or Mac encodings produce proper results? Since these map code points to Type 1/PostScript glyph names, there is no direct link to TrueType glyphs.
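One observation consistent with the results listed above: code 174 is /fi in StandardEncoding (octal 256), but ® in WinAnsiEncoding and Æ in MacRomanEncoding. The sketch below decodes that byte with Java's windows-1252 and x-MacRoman charsets; x-MacRoman is optional in some JDK builds, so it is checked first, and windows-1252 is close to, but not identical with, PDF's WinAnsiEncoding. The StandardEncoding names are hard-coded for two codes only, and the class name is hypothetical.

```java
import java.nio.charset.Charset;

public class Code174 {
    // A tiny slice of StandardEncoding, taken from the PDF spec's encoding table.
    static String standardEncodingName(int code) {
        switch (code) {
            case 174: return "fi";   // octal 256
            case 175: return "fl";   // octal 257
            default:  return null;   // not covered by this sketch
        }
    }

    // Decodes a single byte with a Java charset, or returns null if the JDK lacks it.
    static String decode(int code, String charsetName) {
        if (!Charset.isSupported(charsetName)) return null;
        return new String(new byte[] { (byte) code }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        int code = 174;
        System.out.println("StandardEncoding: /" + standardEncodingName(code));
        System.out.println("windows-1252:     " + decode(code, "windows-1252"));
        System.out.println("x-MacRoman:       " + decode(code, "x-MacRoman"));
    }
}
```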
EDIT: I have a feeling the FontDescriptor Flags might be important for these standard fonts. Up to now I have set the flags to 4 (the Symbolic flag) for all fonts, which seemed to work fine for TrueType/OpenType fonts.
It turns out the Flags entry in the FontDescriptor dictionary is important. For Times, the Nonsymbolic flag (bit 6) needs to be set. The fact that Times is actually being typeset using a TrueType font has nothing to do with it.
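The Flags value is a bit field: bit n (counting from 1) contributes 2^(n-1), so the value 4 used before is bit 3, Symbolic, while Nonsymbolic (bit 6) is 32. The sketch below uses the bit positions from the PDF FontDescriptor flags table; the FontFlags helper is hypothetical, and adding Serif for Times is my own choice rather than something this answer requires. The spec says Symbolic and Nonsymbolic must not both be set or both be clear, which the helper checks.

```java
public class FontFlags {
    // 1-based bit positions from the PDF FontDescriptor flags table
    static final int FIXED_PITCH = 1, SERIF = 2, SYMBOLIC = 3, SCRIPT = 4,
                     NONSYMBOLIC = 6, ITALIC = 7;

    static int flags(int... bits) {
        int value = 0;
        for (int bit : bits) value |= 1 << (bit - 1);
        boolean sym = (value & (1 << (SYMBOLIC - 1))) != 0;
        boolean nonsym = (value & (1 << (NONSYMBOLIC - 1))) != 0;
        // Exactly one of Symbolic / Nonsymbolic must be set
        if (sym == nonsym) {
            throw new IllegalArgumentException("set exactly one of Symbolic and Nonsymbolic");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println("/Flags " + flags(SERIF, NONSYMBOLIC));          // Times-Roman: /Flags 34
        System.out.println("/Flags " + flags(SERIF, NONSYMBOLIC, ITALIC));  // Times-Italic: /Flags 98
    }
}
```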
To use the built-in encoding of the font, the Encoding entry of the Type1 Font dictionary should not be set. You may only add an Encoding dictionary (with BaseEncoding omitted) if it contains a non-empty Differences array; otherwise Adobe Reader reports an error.
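A sketch of that rule, building the font dictionary as text: the Encoding entry is written only when there is something to put in Differences, so an empty mapping falls back to the font's built-in encoding. The helper name and the use of code 128 for /Euro are assumptions for illustration (codes 128-160 are unused in StandardEncoding); a real writer would also need the surrounding object structure and, as described above, a FontDescriptor with suitable Flags.

```java
import java.util.Map;
import java.util.TreeMap;

public class Type1FontDict {
    // Builds a simple-font dictionary for a standard-14 font (hypothetical helper).
    // differences maps character code -> glyph name; empty means "built-in encoding".
    static String fontDict(String baseFont, Map<Integer, String> differences) {
        StringBuilder sb = new StringBuilder("<< /Type /Font /Subtype /Type1 /BaseFont /")
                .append(baseFont);
        if (!differences.isEmpty()) {
            // No BaseEncoding: codes not listed keep the font's built-in encoding
            sb.append(" /Encoding << /Type /Encoding /Differences [");
            int next = -2;
            for (Map.Entry<Integer, String> e : new TreeMap<>(differences).entrySet()) {
                // Start a new run with an explicit code unless this code follows the previous one
                if (e.getKey() != next) sb.append(' ').append(e.getKey());
                sb.append(" /").append(e.getValue());
                next = e.getKey() + 1;
            }
            sb.append(" ] >>");
        }
        return sb.append(" >>").toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> diffs = new TreeMap<>();
        diffs.put(128, "Euro");
        System.out.println(fontDict("Times-Roman", new TreeMap<>()));
        System.out.println(fontDict("Times-Roman", diffs));
    }
}
```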
With these precautions, the generated PDF displays correctly on all 9 viewer applications listed above.

Pdf partial font embedding with iText

I have been asked to embed a partial font (a font subset) in a PDF.
I think I will use iText. I found out how to embed a font, but I found no clue about partial embedding.
Does anybody know if partial embedding is automatic? Or maybe iText does not have this feature?
Thank you.
When does iText embed the full font, a subset, or no font?
In this answer, it is assumed that you use the BaseFont class and the Font class like this:
BaseFont bf = BaseFont.createFont(pathToFont, encoding, embedded);
Font font = new Font(bf, 12);
In this snippet:
pathToFont is the path to a font file (.ttf, .ttc, .otf, .afm),
encoding is an encoding such as "winansi", BaseFont.IDENTITY_H,...
embedded is a boolean: true or false.
Will iText embed the font or not?
That's determined by the embedded parameter:
If it is false, the font isn't embedded.
If it is true, the font is embedded, except for the Standard Type 1 fonts, Type 1 fonts whose .pfb file is missing, and CJK fonts.
Regarding the exceptions:
The Standard Type 1 fonts are 4 flavors of Helvetica (regular, bold, italic, bold-italic), 4 flavors of Times Roman (...), 4 flavors of Courier (...), Symbol and ZapfDingbats. iText ships with 14 Adobe Font Metrics (AFM) files. These files contain the metrics that are needed to calculate widths of glyphs and words. iText doesn't have the necessary Printer Font Binary (PFB) files that are required to embed the font.
Type 1 fonts are stored in two files: an AFM file and a PFB file. If you provide an AFM file, iText will look for the PFB file in the same directory. If iText doesn't find any PFB file, the font can't be embedded.
CJK stands for Chinese, Japanese and Korean; here it refers to a series of fonts that are available in downloadable font packs. These are a special kind of Asian font; Asian fonts in .ttf, .otf or .ttc files can be embedded.
Will iText subset the font or not?
iText will always try to embed a subset of the font rather than the whole font, except when you provide a Type 1 font (AFM and PFB file). If a Type 1 font is provided, the full font is embedded.
Can iText embed the full font?
Yes, you can force iText to embed the full font by adding this line:
bf.setSubset(false);
However, this setting is ignored if you use the Identity-H encoding, because that's how it's described in ISO-32000-1. iText will only embed full fonts that are stored inside the PDF as a simple font (up to 256 characters); iText will never embed the full font when it is stored as a composite font (up to 65,535 characters).
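The rules in this answer can be summarized as a small decision function. This is a hypothetical model written for illustration only: the FontKind categories and the decide method are not part of iText's API, and they encode nothing beyond what the answer states.

```java
public class ITextEmbeddingModel {
    // Hypothetical categories mirroring the cases in the answer (not an iText type)
    enum FontKind { STANDARD_TYPE1, TYPE1_WITH_PFB, TYPE1_WITHOUT_PFB, CJK_PACK, TRUETYPE_OR_OPENTYPE }
    enum Result { NOT_EMBEDDED, SUBSET, FULL }

    // embedded: the createFont flag; identityH: encoding is Identity-H;
    // subset: the value passed to setSubset (true unless setSubset(false) was called)
    static Result decide(FontKind kind, boolean embedded, boolean identityH, boolean subset) {
        if (!embedded) return Result.NOT_EMBEDDED;
        switch (kind) {
            case STANDARD_TYPE1:     // only AFM metrics ship with iText, no PFB
            case TYPE1_WITHOUT_PFB:  // no font program available to embed
            case CJK_PACK:           // font-pack CJK fonts are not embedded
                return Result.NOT_EMBEDDED;
            case TYPE1_WITH_PFB:     // Type 1 fonts are embedded in full
                return Result.FULL;
            default:
                // setSubset(false) only takes effect for simple fonts, not Identity-H
                return (!subset && !identityH) ? Result.FULL : Result.SUBSET;
        }
    }
}
```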

Use custom TTF font in a PostScript file

I'm trying to write my own PostScript file by hand and want to use a custom TTF font downloaded from the web, but the interpreter doesn't use it: it either uses some other font or doesn't display the text at all. I don't have problems with the fonts installed on the system.
The commands I used were different variations of:
/FontName /TheFontName def
/TheFontName 20 selectfont
(XXXXXXXXXXX) show
You can't use a TrueType font directly in PostScript; unlike PDF, PostScript doesn't support TrueType fonts.
In order to use a TrueType font, you must first convert it into a Type 42 font, which PostScript does support.
Adobe Technical Note #5012 documents the Type 42 format.
You must convert TTF fonts to PFB and PFM format to use them in PostScript. There are online tools available to convert TTF fonts to PFB and PFM format.