PDF partial font embedding with iText

I have been asked to embed a partial font (a font subset) in a PDF.
I plan to use iText. I found out how to embed a font, but I found nothing about partial embedding.
Does anybody know whether partial embedding is automatic? Or does iText lack this feature?
Thank you.

When does iText embed the full font, a subset, or no font?
In this answer, it is assumed that you use the BaseFont class and the Font class like this:
BaseFont bf = BaseFont.createFont(pathToFont, encoding, embedded);
Font font = new Font(bf, 12);
In this snippet:
pathToFont is the path to a font file (.ttf, .ttc, .otf, .afm),
encoding is an encoding such as "winansi", BaseFont.IDENTITY_H,...
embedded is a boolean: true or false.
Will iText embed the font or not?
That's determined by the embedded parameter:
If it is false, the font isn't embedded.
If it is true, the font is embedded, with three exceptions: the Standard Type 1 fonts, Type 1 fonts for which the .pfb file is missing, and CJK fonts.
Regarding the exceptions:
The Standard Type 1 fonts are 4 flavors of Helvetica (regular, bold, italic, bold-italic), 4 flavors of Times Roman (...), 4 flavors of Courier (...), Symbol and ZapfDingbats. iText ships with 14 Adobe Font Metrics (AFM) files. These files contain the metrics that are needed to calculate the widths of glyphs and words. iText doesn't have the Printer Font Binary (PFB) files that are required to embed these fonts.
Type 1 fonts are stored in two files: an AFM file and a PFB file. If you provide an AFM file, iText will look for the PFB file in the same directory. If iText doesn't find any PFB file, the font can't be embedded.
CJK stands for a series of Chinese, Japanese and Korean fonts that are available in downloadable font packs. These are a special kind of Asian font and are not embedded; Asian fonts supplied as .ttf, .otf or .ttc files can be embedded.
Will iText subset the font or not?
iText will always try to embed a subset of the font, not the whole font, unless you provide a Type 1 font (AFM and PFB file). When a Type 1 font is provided, the full font is embedded.
Can iText embed the full font?
Yes, you can force iText to embed the full font by adding this line:
bf.setSubset(false);
However, this value is ignored when you use the encoding Identity-H, because that's how it's described in ISO-32000-1. iText will only embed the full font when it is stored inside the PDF as a simple font (256 characters); iText will never embed the full font when it is stored as a composite font (up to 65,535 characters).
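The rules in this answer can be condensed into one decision function. The sketch below is plain Java with no iText dependency. The names EmbeddingRules, FontKind, Outcome and decide are hypothetical and were invented for this illustration; the logic only restates the description above and is not iText's actual implementation.

```java
/** Restates the embedding rules described in the answer above; this is not iText code. */
final class EmbeddingRules {

    /** Hypothetical classification of the font passed to BaseFont.createFont. */
    enum FontKind {
        STANDARD_TYPE1,       // one of the 14 standard fonts (AFM only, no PFB shipped)
        TYPE1_WITH_PFB,       // AFM file with a PFB file in the same directory
        TYPE1_WITHOUT_PFB,    // AFM file whose PFB file is missing
        CJK_FONT_PACK,        // CJK font from a downloadable font pack
        TRUETYPE_OR_OPENTYPE  // .ttf, .otf or .ttc file
    }

    enum Outcome { NOT_EMBEDDED, SUBSET_EMBEDDED, FULLY_EMBEDDED }

    /**
     * @param embedded the boolean passed to BaseFont.createFont
     * @param subset   the value passed to setSubset (true is the default behaviour described above)
     */
    static Outcome decide(FontKind kind, String encoding, boolean embedded, boolean subset) {
        if (!embedded) {
            return Outcome.NOT_EMBEDDED;
        }
        switch (kind) {
            case STANDARD_TYPE1:
            case TYPE1_WITHOUT_PFB:
            case CJK_FONT_PACK:
                // The three exceptions: nothing to embed even when embedded is true.
                return Outcome.NOT_EMBEDDED;
            case TYPE1_WITH_PFB:
                // Type 1 fonts are embedded in full, never as a subset.
                return Outcome.FULLY_EMBEDDED;
            default:
                // Identity-H means a composite font, for which setSubset(false) is ignored.
                boolean composite = "Identity-H".equals(encoding);
                if (composite || subset) {
                    return Outcome.SUBSET_EMBEDDED;
                }
                return Outcome.FULLY_EMBEDDED;
        }
    }

    public static void main(String[] args) {
        // A TrueType font with Identity-H stays a subset even after setSubset(false).
        System.out.println(decide(FontKind.TRUETYPE_OR_OPENTYPE, "Identity-H", true, false));
        // The same font as a simple font with setSubset(false) is embedded in full.
        System.out.println(decide(FontKind.TRUETYPE_OR_OPENTYPE, "winansi", true, false));
    }
}
```

For the original question, the relevant case is a .ttf or .otf file with embedded set to true and setSubset left untouched: under these rules the result is a subset, so partial embedding needs no extra code.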

Related

Problem showing a font with license restriction to pdf

I'm writing a program on a Mac that converts a file to PDF. The file contains Chinese text set in STFangsong, a font that has a license restriction and is not embeddable. I tried many CMaps to encode it, but the root cause seems to be that the PDF viewers (both Preview on the Mac and Acrobat Reader) do not recognize the font: the PDF file properties show "Actual Font: Unknown", and a pop-up message says the font cannot be found or created.
Section 9.6.6.4 of PDF 32000-1:2008 gives a guideline that, when a TrueType font is encoded, the font program should be embedded, though it gives no specific explanation. As I understand it, embedding guarantees that the PDF is readable everywhere, but I do not need that since the font is licensed; I just want the text to display on my own computer.
So my question is: do these PDF viewers have a limitation on CJK characters when embedding is forbidden?
By the way, I used Microsoft Word to write some text in this font and saved the document as a PDF, and the properties show the font as an embedded subset. Does that mean Microsoft has bought the license?

The 14 standard PDF fonts and character encoding

I'm having difficulty producing PDFs that make use of the 14 standard PDF fonts. Let's use Times-Roman as an example.
I create a Font dictionary of type Type1, with BaseFont set to Times-Roman. If I omit the Encoding entry to the Font dictionary, or add an Encoding dictionary without a BaseEncoding set, the PDF viewer application should use the font's built-in encoding. For Times-Roman, this is AdobeStandardEncoding.
This works fine for ASCII characters. However, something more exotic like the 'fi' ligature (AdobeStandardEncoding code 174) is not displayed correctly by all PDF viewers:
Adobe Reader shows ® (Unicode code point 174) for Times-Roman and Ă for Times-Italic
SumatraPDF (wine) shows ® for both fonts
Mozilla's PDF.js shows the 'AE' ligature for both fonts
All other PDF viewers I've tried display the 'fi' ligature properly. They also display the € symbol correctly, which is additionally mapped using the Differences array in the Encoding dictionary (because it is not included in AdobeStandardEncoding):
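One way to read these symptoms: character code 174 names a different glyph in each of the predefined PDF encodings. The sketch below hard-codes the three glyph names for that single code (class and method names are hypothetical, chosen for illustration). It does not claim to reproduce what any viewer does internally; it only shows that ® and Æ are what code 174 yields if WinAnsiEncoding or MacRomanEncoding is applied instead of StandardEncoding.

```java
/** Glyph name assigned to character code 174 (octal 256) by three predefined PDF encodings. */
final class Code174 {

    /** Returns the glyph name for code 174, or ".notdef" for an encoding not listed here. */
    static String glyphName(String encoding) {
        switch (encoding) {
            case "StandardEncoding":
                return "fi";          // the ligature the question expects
            case "WinAnsiEncoding":
                return "registered";  // the ® sign
            case "MacRomanEncoding":
                return "AE";          // the Æ letter
            default:
                return ".notdef";
        }
    }

    public static void main(String[] args) {
        for (String enc : new String[] {"StandardEncoding", "WinAnsiEncoding", "MacRomanEncoding"}) {
            System.out.println(enc + " -> " + glyphName(enc));
        }
    }
}
```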
Apple Preview/Skim
Ghostscript
PDF-XChange Viewer (wine)
Foxit Reader (wine)
Chromium's internal PDF viewer
Evince (homebrew)
Opening Adobe Reader's Document Properties window shows:
Times-Roman
Type: Type1
Encoding: Custom
Actual Font: Times-Roman
Actual Font Type: TrueType
I suspect that the use of a TrueType font instead of a Type1 font might be related to the problem. The PDF specification says:
StandardEncoding Adobe standard Latin-text encoding. This is the
built-in encoding defined in Type 1 Latin-text font programs (but
generally not in TrueType font programs).
It also says WinAnsiEncoding and MacRomanEncoding can be used with TrueType fonts. So should we avoid using the built-in or StandardEncoding for the standard 14 fonts? Its effects seem to be undefined. It seems Adobe Reader doesn't bother performing a proper mapping from glyph names to glyphs in the TrueType font being used.
Will providing a Differences array when using the Win or Mac encodings produce proper results? Since these map code points to Type 1/PostScript glyph names, there is no direct link to TrueType glyphs.
EDIT: I have a feeling the Font Descriptor Flags might be important for these standard fonts. Until now I have set the flags to 4 for all fonts, which seemed to work fine for TrueType/OpenType fonts.

It turns out that the Flags entry in the FontDescriptor dictionary is important. For Times, the Nonsymbolic flag (bit 6) needs to be set. The fact that Times is actually being typeset using a TrueType font has nothing to do with it.
To use the built-in encoding of the font, the Encoding entry of the Type1 Font dictionary should not be set. You may only add an Encoding dictionary (with BaseEncoding omitted) if it contains a non-empty Differences array, or Adobe Reader will error.
With these precautions, the generated PDF displays correctly on all 9 viewer applications listed above.
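To make the flag values concrete: the PDF specification numbers the bits of the Flags entry from 1, so bit n has the value 1 << (n - 1). The value 4 used in the question is therefore bit 3 (Symbolic), while Nonsymbolic (bit 6) is 32. The sketch below computes these values in plain Java; the class name and the choice to combine Serif with Nonsymbolic for Times are illustrative assumptions, not something stated in the answer.

```java
/** Bit values for the Flags entry of a PDF FontDescriptor (bit positions are 1-based). */
final class FontDescriptorFlags {

    static final int FIXED_PITCH = bit(1);
    static final int SERIF       = bit(2);
    static final int SYMBOLIC    = bit(3);
    static final int SCRIPT      = bit(4);
    static final int NONSYMBOLIC = bit(6);
    static final int ITALIC      = bit(7);

    /** Converts a 1-based bit position from the specification into an integer mask. */
    static int bit(int position) {
        return 1 << (position - 1);
    }

    static boolean isSet(int flags, int flag) {
        return (flags & flag) != 0;
    }

    public static void main(String[] args) {
        int questionFlags = 4;                    // the value used in the question
        int timesFlags = SERIF | NONSYMBOLIC;     // assumed combination for a Times face

        System.out.println("4 is Symbolic:    " + isSet(questionFlags, SYMBOLIC));
        System.out.println("4 is Nonsymbolic: " + isSet(questionFlags, NONSYMBOLIC));
        System.out.println("Serif|Nonsymbolic = " + timesFlags);
    }
}
```

Seen this way, a Flags value of 4 declares the font Symbolic and leaves Nonsymbolic unset, which is the situation the answer says must be avoided for Times.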

Which Chinese font is commonly supported by PDF readers of Chinese people?

I am generating PDF files which contain English and Chinese characters (using the Ruby Prawn library). I don't want to embed a Chinese font file in the generated PDF files, because these files need to stay small. So I'm wondering whether I could just reference a Chinese font by name in my PDF files and have the PDF readers render the Chinese characters correctly, because the readers would already have the Chinese font file.
Is that something sensible? If so is there any commonly used Chinese font that one can expect to be installed in most of the PDF readers used by Chinese people?

The best way to ensure that a PDF file can be displayed on any reader is to use partially embedded fonts (also known as font subsets). In PDF, you don't need to include the whole font with your document; a subset containing just the glyphs that are used in the file is enough for the file to be portable.

PdfBox - change font of text in PDF

Is it possible to change the fonts of text in an existing PDF with PdfBox? If so, how? I have problems with some special fonts in a PDF and I want to change them to a font that is widely supported.
Thanks

How to find the used characters in a subsetted font?

I have PDF files which are dynamically generated, with text, vectors, and subsetted fonts. I can see which fonts are used in various viewers - is there a way of displaying the actual subsetted characters of those fonts?
For example, I see the document contains the subsetted fonts "AAAAAC+FreeMono" and "AAAAAD+DejaVuSans". How do I find how many characters were subsetted from these fonts, and what characters they were?
(I tried loading the fonts in FontForge, but it just crashes while opening the file)

The solution is to save the font data to a file and load it into a font editor. A subset font file is still a valid font file, but FontForge may expect some data in the font that is not there. I have also seen many fonts that are not properly subsetted, which could also cause loading problems in a font editor.
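As a small aid for names like the ones quoted in the question: a subsetted font in a PDF carries a tag of six uppercase letters followed by a plus sign in front of its PostScript name. The sketch below (hypothetical class and method names, plain Java) only recognizes that naming convention and splits the tag from the base name. It does not extract the font program or list the glyphs in the subset; for that the font data still has to be saved and inspected as described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Splits a PDF font name such as "AAAAAC+FreeMono" into subset tag and base name. */
final class SubsetFontName {

    // Six uppercase letters, a plus sign, then the original PostScript name.
    private static final Pattern SUBSET = Pattern.compile("([A-Z]{6})\\+(.+)");

    static boolean isSubset(String fontName) {
        return SUBSET.matcher(fontName).matches();
    }

    /** Returns the six-letter tag, or null if the name carries no subset tag. */
    static String tag(String fontName) {
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(1) : null;
    }

    /** Returns the name without the subset tag; names without a tag are returned unchanged. */
    static String baseName(String fontName) {
        Matcher m = SUBSET.matcher(fontName);
        return m.matches() ? m.group(2) : fontName;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"AAAAAC+FreeMono", "AAAAAD+DejaVuSans", "Times-Roman"}) {
            System.out.println(name + " subset=" + isSubset(name) + " base=" + baseName(name));
        }
    }
}
```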
