Apache PDFBox - Adding multiple fonts - pdfbox

Is there any way we can add multiple fonts to Apache PDFBox? In our app, showing text in the browser works because the browser uses multiple fonts to render a page. We are trying to mimic this when generating a PDF from the data displayed in the browser, but we run into many glyph errors like the one below.
No glyph for U+0633 in font XXXXX
I see that iText offers a utility where we can add multiple fonts:
fontSelector.addFont(new Font(Fonts.FONT_NOTO_SANS, size, style, color));
fontSelector.addFont(new Font(Fonts.FONT_NOTO_SANS_AR, size, style, color));
fontSelector.addFont(new Font(Fonts.FONT_NOTO_SANS_TH, size, style, color));
fontSelector.addFont(new Font(Fonts.FONT_NOTO_SANS_CJK, size, style, color));
But with PDFBox, I do not see a way to add a base font plus backup fonts for cases where not all characters are available from a single font file, especially when we have a mix of English, foreign-language characters, numbers, spaces, and so on. All I see is:
PDType0Font.load(document, new File(loc));
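PDFBox indeed has no built-in fallback chain, so the usual workaround is to do the selection yourself: for each codepoint, pick the first loaded font that can encode it. The sketch below mocks font coverage as `IntPredicate`s because a compilable, library-free example can't carry real `PDType0Font` objects; in real code each predicate would wrap a call like `font.encode(String.valueOf(Character.toChars(cp)))` in a try/catch for the `IllegalArgumentException` ("No glyph for U+...") that PDFBox throws.

```java
import java.util.List;
import java.util.function.IntPredicate;

public class FontFallback {
    /**
     * Returns the index of the first font whose coverage predicate
     * accepts the codepoint, or -1 if no font in the stack covers it.
     * Each predicate stands in for a real glyph check against a
     * loaded PDFont (e.g. try encode(), catch IllegalArgumentException).
     */
    static int pickFont(int codePoint, List<IntPredicate> canEncode) {
        for (int i = 0; i < canEncode.size(); i++) {
            if (canEncode.get(i).test(codePoint)) {
                return i;
            }
        }
        return -1; // no font covers this character
    }

    public static void main(String[] args) {
        // Hypothetical coverage: font 0 = basic Latin-ish, font 1 = Arabic block.
        List<IntPredicate> stack = List.of(
            cp -> cp < 0x0250,
            cp -> cp >= 0x0600 && cp <= 0x06FF
        );
        System.out.println(pickFont('A', stack));    // 0
        System.out.println(pickFont(0x0633, stack)); // 1 (the U+0633 from the error above)
    }
}
```

Once you know which font handles which character, you draw each contiguous run with its own font, switching fonts between `showText` calls.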

Related

How to handle PDF font fallback perfectly when generating it?

I'm currently working on a project that converts an HTML canvas to PDF. The user can select a font, draw text on the canvas, and export it as a (vector) PDF. The problem is that the user can enter text in other languages that the font doesn't really support. It shows up fine on the canvas because the browser applies a font-fallback mechanism, presumably grabbing a system font as a fallback, but in the exported PDF it's all corrupted. I've embedded the font in the PDF, but the font doesn't have the corresponding glyphs, and PDF readers like Adobe's have no font-fallback mechanism, so it all becomes .notdef.
I have two ideas, but neither is really satisfying.
1. Collect all glyphs from each sentence and create a new font
Walk through each character and check whether the current font has a corresponding glyph. If so, add it to the new glyph list; if not, use an alternative font from the font stack (#1) as the fallback to get the glyph and add that instead. Finally, convert the list into a new font and embed it in the new PDF.
It seems good, but in reality the performance of generating a new font is terrible.
(I was using Opentype.js to load and write a new font; exporting the font with the toArrayBuffer method took 10 minutes for 6,000 words.)
#1: The font stack is a list like ['Crimson Text', 'PT Sans', 'Noto Sans']; if the first font can't supply a glyph we go to the next, and if we reach the end we give up.
2. If any missing character is encountered, change the font-family of that sentence to Arial Unicode MS or Noto
It's pretty simple, but it converts every word in the sentence to Arial Unicode MS or Noto. Besides, it's hard to find one font that contains glyphs for most languages, and we can't use the font-stack mechanism because we can only use one font per sentence.
My goal is to make the exported PDF match the canvas the user drew on. I'm hoping someone can give me some direction 😥. Many thanks.
The usual solution would be to embed all four fonts in your stack plus Noto (all suitably subsetted, preferably), and switch between them mid-word as required.
Building a new frankenfont from the fonts as you suggest is not required, though I admire the ambition!

Prevent anti-aliasing (or sub-pixel rendering) of a TrueType font

This is how the .ttf font is rendered:
I have created this vector-only TrueType font using FontForge.
I want to use this font on applications which require vector-based glyphs, and do not support loading .ttf embedded bitmaps (which do not seem to have this problem).
On certain color-schemes this sub-pixel rendering that Windows does makes the font completely unreadable. This effect is present in most ttf fonts, but is much stronger on fonts with pixel-perfect edges like mine.
Does anybody know any programmable hinting tricks or font-settings that will allow the font to render pixel-perfectly instead of with this red/blue halo? I would like the font to work properly without OS modifications to disable ClearType or similar.
To clarify, this is a question about leveraging the TrueType Instruction Set, or changing a TrueType font setting (not a system/application setting) that I may have neglected to set properly, to make the font render legibly (if possible).
Working Solution
Credit goes to Brian Nixon for posting the solution URL, and to Erik Olofsson for researching and posting the solution on his blog.
Erik Olofsson provides a solution that forces the Windows font API to prioritize a .ttf's embedded bitmaps over outline glyphs when rendering.
The solution can be found in detail at http://www.electronicdissonance.com/2010/01/raster-fonts-in-visual-studio-2010.html
Solution Summary
Add the 'Traditional Chinese' code page to the OS/2 table.
Use the 'ISO 10646-1' (Unicode, UCS-2) encoding.
Include glyphs for the following seemingly random Hiragana characters:
い - U+3044
う - U+3046
か - U+304B
し - U+3057
の - U+306E
ん - U+3093
This list is not a joke
On certain color-schemes this sub-pixel rendering that Windows does makes the font completely unreadable.
It sounds as if ClearType is not correctly calibrated.
"Pixel-perfect" display is possible only when the text color matches a color plane of the display. For black or grayscale text, that means a grayscale display (high-resolution and expensive digital monochrome displays are popular in the medical imaging field, for example).
Otherwise, you run into the fundamental fact that color components are physically separated on the display. The concept of ClearType is to adjust the image to compensate for the actual physical offset between color planes.
Printed media with high-precision registration is the closest you get to multiple color planes without any offset.
Now, it does still make sense to disable ClearType in some cases -- when the image is intended to be saved in a file rather than presented on the local display, disabling ClearType can produce results that are legible across a wider range of displays and also compress better. (But for best results, send vectors and let the end-user display compensate for its particular sub-pixel structure)
In GDI, ClearType is controlled via the LOGFONT structure that tells text-drawing functions which font family, size, and attributes to use. In GDI+, use SetTextRenderingHint on a Graphics instance.
Because the use of ClearType is set by the application at the same time as the size, weight, and other attributes, your font is subject to being requested both with and without it. However, ClearType is not compatible with all fonts; by forcing an incompatibility you can prevent ClearType for your font alone.
The LOGFONT documentation has the following remarks about ClearType:
The following situations do not support ClearType antialiasing:
Text is rendered on a printer.
Display set for 256 colors or less.
Text is rendered to a terminal server client.
The font is not a TrueType font or an OpenType font with TrueType outlines. For example, the following do not support ClearType antialiasing: Type 1 fonts, Postscript OpenType fonts without TrueType outlines, bitmap fonts, vector fonts, and device fonts.
The font has tuned embedded bitmaps, for any font sizes that contain the embedded bitmaps. For example, this occurs commonly in East Asian fonts.
In addition, the gasp table within the TTF format has several fields specified to influence ClearType usage.
Documentation at https://www.microsoft.com/typography/otspec/gasp.htm
and https://fontforge.github.io/fontinfo.html#gasp
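For illustration, here is roughly what such a gasp table looks like in fontTools' TTX (XML) representation: a single range covering all sizes whose behavior is GASP_GRIDFIT only (bit 0x0001), i.e. hinting on, smoothing off. Treat this as a sketch to verify against the gasp spec rather than a guaranteed ClearType kill-switch:

```xml
<gasp>
  <!-- rangeGaspBehavior 1 = GASP_GRIDFIT only: grid-fit, no grayscale
       smoothing; ClearType-related bits 0x0004/0x0008 left unset -->
  <gaspRange rangeMaxPPEM="65535" rangeGaspBehavior="1"/>
</gasp>
```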
And of course, make sure that the "optimized for ClearType" bit in the head table is not set.

How to find the used characters in a subsetted font?

I have PDF files which are dynamically generated, with text, vectors, and subsetted fonts. I can see which fonts are used in various viewers - is there a way of displaying the actual subsetted characters of those fonts?
For example, I see the document contains the subsetted fonts "AAAAAC+FreeMono" and "AAAAAD+DejaVuSans". How do I find out how many characters were subsetted from these fonts, and which characters they were?
(I tried loading the fonts in FontForge, but it just crashes while opening the file)
The solution is to save the font data to a file and load it into a font editor. A subset font file is still a valid font file, but it is possible that FontForge expects some data that is not there. I have also seen many fonts that are not properly subset, which could likewise cause loading problems in a font editor.
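As a side note, the "AAAAAC+" part is a subset tag: the PDF specification reserves a prefix of six uppercase letters followed by '+' in front of the real PostScript name of a subset font. A small plain-Java helper (no PDF library involved) to recognize the tag and recover the base name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubsetName {
    // Six uppercase letters, a '+', then the actual PostScript font name.
    private static final Pattern SUBSET = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    /** Returns the base font name with any subset tag removed. */
    static String baseName(String pdfFontName) {
        Matcher m = SUBSET.matcher(pdfFontName);
        return m.matches() ? m.group(2) : pdfFontName;
    }

    /** True if the name carries a subset tag such as "AAAAAC+". */
    static boolean isSubset(String pdfFontName) {
        return SUBSET.matcher(pdfFontName).matches();
    }
}
```

So "AAAAAC+FreeMono" is a subset of FreeMono, and a name without the tag suggests a fully embedded (or merely referenced) font.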

Rules for Font Substitutions to / from PDF when using SSRS / ReportViewer to create PDFs?

In an effort to reduce the size of the PDF files exported from SSRS and the ReportViewer control, we identified full and subset font embedding as one of the main contributors to the size of a PDF.
Ultimately, we decided to standardise our reports using only variations (size, bold, italic etc) of the Arial and Times New Roman fonts - fortunately most of our reports are corporate death-by-spreadsheet fare and not requiring aesthetic appeal.
During PDF creation, these fonts seem to be substituted by ReportViewer/SSRS with one of the 14 standard, permissible PDF fonts (since the standard fonts aren't usually installed on most Windows machines anyway).
So my question is, what exactly are the rules that the PDF renderer on ReportViewer uses during font substitution, rather than embedding a font in the PDF?
Based on this site, and with a bit of trial and error, the following substitutions do seem to be made (from RDL to PDF, list is incomplete):
Times New Roman => Times
Courier New => Courier
Arial => Helvetica
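Assuming the observed mapping above holds, it can be mirrored as a simple lookup. (The target names here use the canonical base-14 PostScript names, e.g. Times-Roman for the Times family; the mapping itself is the incomplete, trial-and-error list from the question, not documented ReportViewer behavior.)

```java
import java.util.Map;

public class Base14Substitution {
    /** Observed RDL-to-PDF base-font substitutions (incomplete list). */
    static final Map<String, String> SUBSTITUTIONS = Map.of(
        "Times New Roman", "Times-Roman",
        "Courier New", "Courier",
        "Arial", "Helvetica"
    );

    /** Returns the base-14 replacement, or the original family if none is known. */
    static String substitute(String family) {
        return SUBSTITUTIONS.getOrDefault(family, family);
    }
}
```

Any family outside the map (e.g. a custom corporate font) would fall through unchanged, which is presumably where embedding kicks in instead.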
I had thought that the substitution was forced by font-embedding legalities, but Times New Roman and Arial are both marked "Editable", which I understand to be the least restrictive embedding level.
There are a few conditions to be met when the PDF is rendered. I suspect one of the conditions set out here is not met, and the renderer is converting your fonts to the closest one that fits:
Ensure the font is installed correctly and that embedding is permitted for the font; and
the font must be a TrueType font.

How to replace or modify the font or glyphs embedded in a PDF file?

I want to replace the font embedded in an existing PDF file programmatically (with iText).
iText itself does not seem to provide any data model for glyphs and fonts, but I believe it can let me retrieve and update the binary stream that contains the font.
It's OK even if I don't know which glyph is associated with which font - what I want to do is just replace them. To be precise, I want to embolden all glyphs in a PDF document.
Replacing fonts in rendering time is not an option because the output must be PDF with all information preserved as is.
Is there anyone who has done this before with iText or any other PDF libraries?
PDF files define a set of font resources (e.g. F0, F1, F2) and then define each of those separately, so you could theoretically rewrite the entry for F0. You would have to ensure the two fonts have the same spacing (or you would have to rewrite the PDF content as well), and you would probably have to hack the PDF manually.
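To make that concrete, here is roughly what those entries look like in raw PDF syntax (object numbers and names invented for illustration). "Rewriting the entry for F0" means replacing the font object it points to, including its descriptor and embedded font stream:

```
<< /Type /Page
   /Resources << /Font << /F0 8 0 R /F1 9 0 R >> >>
   /Contents 4 0 R >>
...
8 0 obj
<< /Type /Font
   /Subtype /TrueType
   /BaseFont /AAAAAA+SomeFont
   /FontDescriptor 10 0 R >>
endobj
```

The content streams keep referring to /F0, so as long as the replacement font's widths match, the page layout is preserved.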