My use case is as follows: my web application is only available in English and uses the font Nunito. Some people write in the app with characters from other alphabets, and it works, I guess because the browser automatically switches fonts. For example, if I write this, it works: γεια σας สวัสดี 你好 Helló ሰላም صباح الخير
But you can make a PDF export from my app, and I had to declare the font for the export, so of course now the characters not included in Nunito are replaced with blank rectangles.
I want the PDF export to display all the characters. I know the Noto font family covers many languages and alphabets, but as I said, our app is only available in English, so I cannot pick a font based on the selected language. Even if that were possible, it wouldn't fix the problem when different alphabets are mixed.
So my question is: is there a single font that includes "a lot of" alphabets; or is there a way to load different fonts (such as all the Noto variants) and add a mechanism that chooses the best one for every character (something like a fallback from font to font, like I guess the browsers do)?
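The per-character fallback described in the question can be sketched as follows. This is only an illustration of the idea, not what any particular PDF library does: the coverage ranges and the `Fallback*` font names are made up, and a real exporter would read each font's actual character map instead of a hard-coded table.

```python
from itertools import groupby

# Illustrative coverage table: font name -> inclusive code point ranges.
# The ranges and the "Fallback*" names are assumptions for this sketch;
# a real exporter would read coverage from each font file's character map.
FONT_COVERAGE = {
    "Nunito": [(0x0020, 0x024F)],
    "FallbackGreek": [(0x0370, 0x03FF)],
    "FallbackThai": [(0x0E00, 0x0E7F)],
    "FallbackCJK": [(0x4E00, 0x9FFF)],
}
MISSING = "(no font)"


def pick_font(ch):
    """Return the first font, in declaration order, that covers ch."""
    cp = ord(ch)
    for name, ranges in FONT_COVERAGE.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return MISSING


def split_into_runs(text):
    """Group consecutive characters that resolve to the same font."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=pick_font)]


if __name__ == "__main__":
    for font, run in split_into_runs("Hello \u03b3\u03b5\u03b9\u03b1 \u4f60\u597d"):
        print(f"{font}: {run!r}")
```

Each run can then be drawn with its own font. The sketch decides per code point, so a combining mark could end up in a different font than its base character; a fuller version would keep marks with the preceding base.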
There’s a Qt5 application that I use to render text on screen and to PDF.
I’ve been having trouble with newer fonts automatically creating ligatures from e.g. ff (which is plain wrong, ① because U+FB00 ﬀ exists for this purpose, and ② because this will also wrongly convert e.g. compound-word boundaries). I get this is the new fancy thing to do, but I would like to disable it, either by setting an environment variable, a fontconfig fonts.conf(5) setting or similar, or by patching the application. (Modifying the font is not an option for most due to licencing issues.)
I cannot find any documentation for this, though. Other people are writing text editors and have similar problems due to the rise of “coding ligatures” in fonts, but so far nobody has provided a workable solution to disable them.
My app should be able to output a PDF file containing the user guide in several supported languages. (I'm using pdfkit)
I had some trouble finding a suitable font for Thai: some fonts that supposedly support Thai (including Noto Thai from Google) would output squares, question marks or, even worse, unreadable stuff.
After a bit of research, I found one that seemed to work reasonably well, until our Thai guy noted that the characters
ต่ำ
were rendered like in the picture below, basically with the two elements above the first character collapsed with one covering the other
I'm using the Nimbus Sans Thai family downloaded from myfonts.com which, by the way, seems able to render those characters correctly, as you can see by pasting ต่ำ into the preview input
Any hints?
Your font is incomplete in a certain way. It lacks some glyphs that usually reside in the Private Use Area (PUA) of Unicode.
Some applications (I'm aware of Microsoft Word) can manually overcome this problem, but your rendering app (and Adobe Acrobat Viewer) does not.
You should either find a font that includes these glyphs or find an application that repositions the existing glyphs manually.
Many fonts, although they claim to support Thai (and do indeed contain the "regular" Thai glyphs), can be incomplete.
Besides canonical glyphs, a well-formed font should contain a "Private Use Area" (PUA) subrange that contains glyphs in non-canonical forms. Those glyphs include:
Tone marks shifted to the upper position for use in combination with upper vowels (SARA_I, SARA_UE, etc.) and shifted to a lower position in the case of Consonant + Tone Mark with no upper vowel;
Tone marks and upper vowels slightly shifted to the left for use in combination with PO_PLA, FO_FAN, etc. (otherwise they would overlap with the consonant's upper tail);
Both effects combined, e.g. the tone mark shifted down and left at the same time;
Special glyphs for YO_YING and THO_THAN (with no tail) for use in combination with under-vowels;
Several more.
Normally, when a rendering app finds the above-mentioned symbol combinations, it looks for substitute glyphs in the PUA. If none is found, it simply falls back to the default glyph, which is what happens in your case.
Here are two screenshots of the PUA of Arial Unicode and FreeSerif, which are self-explanatory: the PUA of FreeSerif is empty. I think the same problem occurs with your Nimbus font.
And a final observation. Incorrect fonts can be incorrect in different ways. Above I have described the more canonical case, where the standard positions of tone marks are the upper positions, while the non-standard positions are shifted down (or are absent, which constitutes an incomplete font).
There are, however, fonts that behave the opposite way; they (only) contain tone marks in lower positions. This is what you seem to observe.
The problem is that PDFKit does not perform complex script rendering.
Several scripts, such as Arabic and Thai, require glyph substitution and repositioning depending on context (position in the string, neighboring characters), and PDFKit does not seem to do this.
PDF viewer applications display exactly what is defined in the PDF file. The Nimbus Sans Thai font probably includes all the required glyphs but what bytebuster explains in his answer needs to be performed by PDFKit and not by the viewer application.
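To see why this particular word needs contextual positioning, it helps to look at its code points. The sketch below uses only Python's standard `unicodedata` module and assumes the word in the question is spelled U+0E15 U+0E48 U+0E33. It shows that the tone mark MAI EK is a non-spacing mark, and that SARA AM has a compatibility decomposition into NIKHAHIT (the small ring drawn above the consonant) plus SARA AA, so two separate marks compete for the space above the consonant.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


# Assumed spelling of the word from the question: TO TAO, MAI EK, SARA AM.
WORD = "\u0e15\u0e48\u0e33"

if __name__ == "__main__":
    for row in describe(WORD):
        print(*row)
    # SARA AM decomposes (compatibility mapping) into NIKHAHIT + SARA AA.
    for row in describe(unicodedata.normalize("NFKD", "\u0e33")):
        print("  SARA AM part:", *row)
```

A shaping engine is expected to stack the two marks; a library that places each glyph at its default position would presumably draw them on top of each other, which matches the symptom described.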
I regularly create documents that need Unicode characters above U+FFFF. Unfortunately, OpenOffice and LibreOffice are both unable to correctly export these characters when creating a PDF. The actual data gets mangled by a completely asinine algorithm, while the display just consists of various overlapping question mark boxes.
This is not a font issue. I embed all used fonts in the PDF and all characters below U+FFFF work perfectly fine.
Until now I have been working around this issue by mapping the glyphs I need to a custom PUA font. This solves the display problems, but obviously makes the actual content of the text unsearchable and quite fragile. I haven’t been able to find any settings that might affect the handling of Unicode characters in PDF.
Therefore I have three questions:
Is there a way to make OpenOffice/LibreOffice handle astral characters correctly on PDF export?
If not, is there an external tool that can convert .odt files to PDF while preserving astral characters?
If not, is there another good rich-text editor using a different file format that can deal with astral characters in PDFs?
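For readers unfamiliar with the term, "astral" characters are the ones that do not fit in a single 16-bit unit. The sketch below is not a statement about where the export goes wrong; it only illustrates, using standard Python, how any UTF-16-based pipeline has to split such a code point into a surrogate pair. U+1D11E is picked as an arbitrary example.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def utf16_units(text):
    """Return the UTF-16 code units of text as integers (big-endian)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    high, low = surrogate_pair(0x1D11E)
    print(f"U+1D11E -> {high:04X} {low:04X}")
    print([f"{unit:04X}" for unit in utf16_units("A\U0001D11E")])
```

Software that treats each 16-bit unit as a character of its own sees two unpaired surrogates instead of one code point, which is one plausible way for such text to end up as multiple placeholder boxes.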
My client wants us to build a custom document viewer for their app. (It really, truly needs to be custom, because there are a ton of application-specific features they need.)
We built one for them last year that took PDFs, generated page images, and backed the images using a hidden layer of text that could be selected and copied. We did it in Flex. It was a nightmare. PDF is horrid.
This year, we need to build one in HTML 5 with similar requirements, except that most of the documents now are in Word or HTML, that is, they have reflowable text, instead of the fixed layout and glyphs of PDF. But they still want to do PDF in the same viewer.
I'm thinking that we need to convert all documents to some common file format that can handle both reflowable text and also the fixed-position glyphs of PDF. (Each document would probably support one or the other, but not both). It would be nice if it were an XML-like markup language that would say:
<text>here's some text</text>
-- or --
<glyph letter="a" name="my_a_glyph" position="10,10"/>
<image src="my_image" position="20,20"/>
or something like that.
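To make the idea concrete, here is a minimal sketch of how a viewer could load such markup. The element and attribute names are the ones from the example above; the `<page>` wrapper element and the dictionary layout are assumptions added for illustration, not part of any existing format.

```python
import xml.etree.ElementTree as ET

# The <page> root is a hypothetical wrapper; the children follow the
# example markup from the question.
SAMPLE = """<page>
  <text>here's some text</text>
  <glyph letter="a" name="my_a_glyph" position="10,10"/>
  <image src="my_image" position="20,20"/>
</page>"""


def parse_position(value):
    """Turn an "x,y" attribute into a pair of floats."""
    x, y = value.split(",")
    return float(x), float(y)


def load_page(xml_source):
    """Return one dict per child element: reflowable text or a positioned item."""
    items = []
    for node in ET.fromstring(xml_source):
        if node.tag == "text":
            items.append({"kind": "text", "content": node.text or ""})
        else:
            item = {"kind": node.tag, **node.attrib}
            item["position"] = parse_position(node.attrib["position"])
            items.append(item)
    return items


if __name__ == "__main__":
    for item in load_page(SAMPLE):
        print(item)
```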
Is there any existing file format out there that can handle it? EPUB won't do the fixed-position text, and PDF sucks in too many ways to describe.
I think you can look at the FB2 (FictionBook 2) format. It is an XML-based format designed for publishing books. It includes images, though I am not sure whether they can be positioned absolutely.
Also, you can simply go with HTML and do HTML-to-PDF rendering when needed (there exist various components and libraries for this). I don't see (or you have not listed) any reason why this approach wouldn't work.
GROFF? Maybe build a macro library to customize it, as needed.
Groff/troff/nroff, the "run off" programs of Unix, can output to PostScript or HTML. The jump from PostScript to PDF is built into some PDF viewers; there are also several existing programs for it, pstopdf, for example.
GROFF has some fixed layout options and some flow-like options. With GROFF, it's almost easier to base most of the printout on flowing text, within prescribed bounds.
Well basically I'm finishing school in mid December so I'm just brushing up my resume and I'm wondering if there's a way to use custom fonts (in this case Calibri and Cambria) in a PDF file and make them render correctly on all computers.
Thanks in advance!
EDIT: I'm using MS Word 2007, but am open to suggestions
PDFs don't necessarily store text and fonts like other documents: some PDF producers convert the glyphs to vector outlines, so that no matter what font you use, the document displays exactly as expected. This is why searching for text inside such a PDF is a problem for third-party PDF readers and why even Adobe themselves used to distribute two versions of Acrobat (one with text search, one without).
Another thing to keep in mind is that PDF isn't pixel exact, it's ratio exact. PDF readers generally do not use a 100% zoom level; instead most people read them at "fit to screen" or "fit to page". I point this out because I'm guessing the reason you are trying to use those new Vista/Office 2007 fonts is their LCD subpixel support (which improves readability on LCD screens). This feature will not translate into the PDF: once the letter becomes a vector, the subpixel information is lost, and even if it weren't, it would be useless because the vector will be sized to something other than what you intended at view time.
The PDF format is capable of embedding fonts, if the font has been marked embeddable by its creator. You'll have to check the software that's creating your PDF to see if it has the capability and how to enable it.
Technically speaking, whether a font may be embedded is determined by a special flag in the font file (TTF, OpenType or Type 1).
You can view this embedding flag with any font editor program. I recommend FontCreator (by High-logic), http://www.high-logic.com/font-editor/fontcreator.html, which has a fully functional free trial without limitations.
You can also change the embedding flag, but legally speaking, for 99% of commercially distributed fonts, this breaks the font's license.
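For TrueType and OpenType fonts, the flag in question is the `fsType` field of the font's OS/2 table. The sketch below only decodes a numeric `fsType` value according to the bit meanings in the OpenType specification; it does not read a font file. The function name and the returned dictionary are made up for illustration, and the example values are not taken from Calibri or Cambria.

```python
# Bit masks of the OS/2 fsType field (OpenType specification).
RESTRICTED = 0x0002         # restricted license embedding
PREVIEW_AND_PRINT = 0x0004
EDITABLE = 0x0008
NO_SUBSETTING = 0x0100
BITMAP_ONLY = 0x0200


def describe_fs_type(fs_type):
    """Decode an fsType value into a small, human-readable dictionary.

    If several usage bits are set (possible in old fonts), the least
    restrictive one is reported.
    """
    usage = fs_type & 0x000F
    if usage == 0:
        embedding = "installable"
    elif usage & EDITABLE:
        embedding = "editable"
    elif usage & PREVIEW_AND_PRINT:
        embedding = "preview-and-print"
    else:
        embedding = "restricted"
    return {
        "embedding": embedding,
        "may_embed": embedding != "restricted",
        "no_subsetting": bool(fs_type & NO_SUBSETTING),
        "bitmap_only": bool(fs_type & BITMAP_ONLY),
    }


if __name__ == "__main__":
    # With the third-party fontTools package, the value could be read as
    # TTFont(path)["OS/2"].fsType; here example values are hard-coded.
    for value in (0x0000, 0x0002, 0x0008, 0x0104):
        print(f"{value:#06x}", describe_fs_type(value))
```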