PDFKit broken when using Noto Sans font - node-pdfkit

I'm trying to generate a PDF with Node.js using PDFKit. The PDF contains text in multiple languages, including Arabic and Russian, so I'm trying to use Noto Sans from Google. But as soon as I use that font, the layout is completely broken and not even Latin characters are shown:
var doc = new PDFDocument();
doc.registerFont('NotoSans', 'fonts/NotoSans-Regular.ttf');
doc.font('NotoSans');
doc.pipe(res);
doc.fontSize(15);
doc.text('UTF-8 Test');
doc.text('صباح الخیر');
doc.text('japanese');
doc.text('武大郎');
doc.text('RUSSIAN');
doc.text('Привет / здравствуйте');
doc.end();
I expected the Japanese text not to work, but nothing gets displayed correctly, not even the Latin text. This is the output I get:
Using a language-specific font for Japanese or Arabic renders that language correctly, but I need a font that supports multiple languages, because I'm printing a dynamic table and don't know in advance which languages will be needed.

Your problem is that NotoSans-Regular.ttf doesn't contain all the characters you need. It only covers Latin, Greek, and Cyrillic characters, i.e. languages such as:
Afrikaans, Aghem, Akan, Albanian, Asu, Azerbaijani, Bafia, Bambara,
Basaa, Basque, Belarusian, Bemba, Bena, Bosnian, Breton, Bulgarian,
Catalan, Central Atlas Tamazight, Chiga, Colognian, Cornish, Croatian,
Czech, Danish, Duala, Dutch, Embu, English, Esperanto, Estonian, Ewe,
Ewondo, Faroese, Filipino, Finnish, French, Friulian, Fulah, Galician,
Ganda, German, Greek, Gusii, Hausa, Hawaiian, Hungarian, Icelandic,
Igbo, Indonesian, Irish, Italian, Jola-Fonyi, Kabuverdianu, Kabyle,
Kalaallisut, Kalenjin, Kamba, Kazakh, Kikuyu, Kinyarwanda, Koyra
Chiini, Koyraboro Senni, Kwasio, Kyrgyz, Langi, Latvian, Lingala,
Lithuanian, Luba-Katanga, Luo, Luyia, Macedonian, Machame,
Makhuwa-Meetto, Makonde, Malagasy, Malay, Maltese, Manx, Masai, Meru,
Mongolian, Morisyen, Mundang, Nama, North Ndebele, Northern Sami,
Norwegian Bokmål, Norwegian Nynorsk, Nuer, Nyankole, Oromo, Polish,
Portuguese, Romanian, Rombo, Rundi, Russian, Rwa, Sakha, Samburu,
Sango, Sangu, Sena, Serbian, Shambala, Shona, Slovak, Slovenian, Soga,
Somali, Spanish, Swahili, Swedish, Swiss German, Taita, Tajik,
Tasawaq, Teso, Tongan, Turkish, Ukrainian, Uzbek, Vietnamese, Vunjo,
Walser, Welsh, Yangben, Yoruba, Zarma, Zulu
You need a Noto font for each script you're using.
Clone the Noto fonts repository into your project directory:
git clone git@github.com:googlei18n/noto-fonts.git
If you need Korean, Japanese, or Chinese, clone the CJK fonts as well:
git clone git@github.com:googlei18n/noto-cjk.git
Now that you have the fonts, register the ones you want to use:
// Default (Latin, Greek, Cyrillic)
doc.registerFont('NotoSans', 'noto-fonts/hinted/NotoSans-Regular.ttf');
// Arabic
doc.registerFont('NotoSansAR', 'noto-fonts/hinted/NotoKufiArabic-Regular.ttf');
// Chinese, Japanese, Korean
doc.registerFont('NotoSansCJK', 'noto-cjk/NotoSansCJK-Regular.ttc');
Once you've registered the appropriate fonts, you can use them like so:
doc.font('NotoSans');
doc.text('English');
doc.font('NotoSansAR');
doc.text('العربية'); // Arabic
doc.font('NotoSansCJK');
doc.text('日本語'); // Japanese
doc.text('武大郎'); // Chinese
doc.text('한국어'); // Korean
doc.font('NotoSans');
doc.text('русский'); // Russian
doc.text('ελληνικά'); // Greek
Be careful not to load too many fonts into your PDF, since each one adds to the overall size of your document. You may consider using a library like Franc to detect which languages appear in your data before deciding which fonts to load.
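As a lighter-weight alternative to full language detection, you can pick a font per string by Unicode script range. This is a minimal sketch; the font names ('NotoSans', 'NotoSansAR', 'NotoSansCJK') are assumptions that should match whatever names you passed to registerFont:

```javascript
// Minimal script detection: choose a registered font name per Unicode range.
// The font names below are assumed to match your doc.registerFont() calls;
// adjust them as needed.
function pickFont(text) {
  // Arabic (including Arabic Supplement)
  if (/[\u0600-\u06FF\u0750-\u077F]/.test(text)) return 'NotoSansAR';
  // Japanese kana, CJK ideographs, Korean Hangul
  if (/[\u3040-\u30FF\u3400-\u4DBF\u4E00-\u9FFF\uAC00-\uD7AF]/.test(text)) {
    return 'NotoSansCJK';
  }
  // Latin, Greek and Cyrillic are all covered by NotoSans-Regular
  return 'NotoSans';
}

// Usage with PDFKit (doc is a PDFDocument with the fonts registered):
//   doc.font(pickFont(cellText)).text(cellText);
```

This only switches per cell, not per character, but for a table where each cell is in one language it keeps the logic simple and avoids loading a detection library.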

Related

Problem with line breaks in PDF document generated by BIRT

I have some cell texts in a BIRT report which do not flow as nicely as I hoped.
For example,
The text is "Long value resultwithaverylongname whichcannotbreak" and I had hoped that it would be displayed like this:
Long value
resultwithaverylongname
whichcannotbreak
The render options are as follows:
renderOptions.setOutputFormat(IPDFRenderOption.OUTPUT_FORMAT_PDF);
renderOptions.setOption(IPDFRenderOption.PAGE_OVERFLOW, IPDFRenderOption.OUTPUT_TO_MULTIPLE_PAGES);
renderOptions.setOption(IPDFRenderOption.PDF_TEXT_WRAPPING, true);
renderOptions.setOption(IPDFRenderOption.PDF_WORDBREAK, true);
It seems to me that my desired output is physically possible, but I don't know why BIRT breaks in the middle of a word instead of at a whitespace.
I am using BIRT 4.16 (from Sourceforge). The texts contain normal whitespace (no non-breakable spaces) and are displayed via a data object.
Edit (3 Sep 2021):
I now have an example project which I am trying to commit to GitHub. In the meantime, here is a screenshot showing breaks that look good and others that do not...
The git repo is here: https://github.com/pramsden/test.wordbreak
If the text "resultwithaverylongname" physically fits, then you are right:
BIRT should not break it in the middle of the word.
Your renderOptions seem right (depending on which BIRT version you are using).
At first glance this looks like a bug.
But: German often has quite long words, and I've created a lot of (complex) PDF reports with BIRT, yet I never saw this issue.
So I guess some tiny, silly detail is causing this.
Just to double-check:
Are the spaces between "Long", "value", "result..." normal spaces (0x20)? or non-breaking spaces?
Which BIRT release are you using?
Are you using a data item or a dynamic text item and if so, is it HTML or plain text?
Can you create a reproducible simple test case and post the rptdesign file somewhere?
I don't use BIRT, but try using a newline (\n).
In my case I use the PDFFlow library to generate PDF documents, and to make a line break I just use \n.
Here is a simple example that creates a PDF file and uses a line break:
DocumentBuilder.New()
    .AddSection()
    .AddParagraphToSection("Hello world! \n go to the next line")
    .ToDocument()
    .Build("Result.PDF");
try it and tell me if it works

uwp localization with 1 resource file for all variants of a language

I know how localization in UWP works, and I currently have en-US and es-ES resource files.
What I want to ask is: can I just use one resource file, i.e. en, and will that cover all variants of English, like en-US, en-AS and the others? And then one file for Spanish, i.e. es, and so on?
The language variants have only a few differences between them, which are almost negligible in my app. I noticed my app has most of its users in Spain and the US; I assume that is because I only support these two languages right now. So I want to support all variants of Spanish and all variants of English, but with only two files. Would that work, or do I need to provide one file per variant?
Try checking first whether the current app language is any English variant (en-US, en-AS, ...) and, if it is, override it with the one you want:
if (CurrentLanguage == "en-US" || CurrentLanguage == "en-AS" /* ... */)
{
    Windows.Globalization.ApplicationLanguages.PrimaryLanguageOverride = "en-US";
}
Yes, you can do it with a single file by using just the short language name. In your case, it should be Resources.en.resx for English and Resources.es.resx for Spanish. You can refer here for language tags.

PDFs: Extracting text associated with font (linux)

The general problem that I'm trying to solve is to determine how much text in a large set of PDFs is associated with different fonts. I know I can extract text from a PDF using pdftotext and font information with pdffonts, but I can't figure out how to link the two together. I have 100,000+ PDFs to process, so I will need something I can program against (and I don't mind a commercial solution).
The PDFTron PDFNet SDK can extract all the graphic operations, including text objects, each with a link to the font being used.
Starting with the ElementReader sample, you can get the Font for every text element.
https://www.pdftron.com/documentation/samples?platforms=windows#elementreader
https://www.pdftron.com/api/PDFNet/?topic=html/T_pdftron_PDF_Font.htm
The Adobe PDF Library - a product my company sells - can do that.
This is part of the sample code:
// This callback function is called for each PDWord object.
ACCB1 ASBool ACCB2 WordEnumProc(PDWordFinder wfObj, PDWord pdWord, ASInt32 pgNum, void* clientData)
{
char str[128];
char fontname[100];
// get word text
PDWordGetString(pdWord, str, sizeof(str));
// get the font name
PDStyle style = PDWordGetNthCharStyle(wfObj, pdWord, 0);
PDFont wordFont = PDStyleGetFont(style);
PDFontGetName(wordFont, fontname, sizeof(fontname));
printf("%s [%s]\n", str, fontname);
return true;
}
This is the output example:
...
Chapter [Arial,Bold]
2: [Arial,Bold]
Overview [Arial,Bold]
27 [Arial]
...
This [TimesNewRoman]
book [TimesNewRoman]
describes [TimesNewRoman]
the [TimesNewRoman]
Portable [TimesNewRoman]
Document [TimesNewRoman]
Format [TimesNewRoman]
...

how can i export DataGridView with ARABIC data from Visual Basic to PDF by using iTextSharp [duplicate]

I have a problem with inserting Unicode characters into a PDF file in Eclipse.
There is a solution for this, but it is not very efficient for me.
The solution is like this:
document.add(new Paragraph("Unicode: \u0418", new Font(bfComic, 12)));
I want to retrieve data from a database and show it to the user; my characters are in Arabic script and sometimes in Farsi.
What solution do you suggest?
Thanks
You are experiencing different problems:
Encoding of the data:
Please download chapter 2 of my book and go to section 2.2.2 entitled "The Phrase object: a List of Chunks with leading". In this section, look for the title "Database encoding versus the default CharSet used by the JVM".
You will see that database values are retrieved like this:
String name1 = new String(rs.getBytes("given_name"), "UTF-8");
That's because the database contains names with special characters. You risk these special characters being displayed as gibberish if you retrieve the field like this:
String name2 = rs.getString("given_name");
Encoding of the font:
You create your font like this:
Font font = new Font(bfComic, 12);
You don't show how you create bfComic, but I assume that this object is a BaseFont object using IDENTITY_H as encoding.
Writing from right to left / making ligatures
Although your code will work to show a single character, it won't work to show a sentence correctly.
Suppose that name1 is the Arabic version of the name "Lawrence of Arabia" and that we want to write this name to a PDF. This is done three times in the following screen shot:
The first line is wrong, because the characters are in the wrong order. They are written from left to right whereas they should be written from right to left. This is what will happen when you do:
document.add(name1);
Even if the encoding is correct, you're rendering the text incorrectly.
The second line is also wrong. The characters are now in the correct order, but no ligatures are made: ل followed by و should be combined into a single glyph: لو
You can only achieve this by adding the content to a ColumnText or PdfPCell object, and by setting the run direction to PdfWriter.RUN_DIRECTION_RTL. For instance:
pdfCell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
Now the text will be rendered correctly.
This is explained in chapter 11 of my book. You can find a full example here: Ligatures2

Need help with fonts in latex: output too dim

I have a problem where all fonts come out too dim. Is there anything I can do to get a different look and feel for the PDFs?
My tex file looks like
\documentclass[a4paper,twoside]{article}
\usepackage{graphics}
\usepackage{color}
\usepackage{hyperref}
\usepackage{multirow}
\usepackage{longtable}
\usepackage{fullpage}
\usepackage[pdftex]{graphicx}
\usepackage{fancyhdr}
\oddsidemargin 0cm
\evensidemargin 0cm
\pagestyle{fancy}
\renewcommand{\headrulewidth}{0.0pt}
\rfoot{Raval, Ketan R -13223}
\textwidth 15.5cm
\topmargin -1cm
\parindent 0cm
\textheight 26.5cm
\parskip 1mm
\begin{document}
\fontencoding{\encodingdefault}
\renewcommand{\familydefault}{\sfdefault}
\fontshape{\shapedefault}
\selectfont
So how can I improve my overall look and feel of the pdf?
Thanks
Maybe you don't like Computer Modern Roman? Try a different typeface, like Palatino. You can also increase the point size to give the type a heavier feel, say by using 12pt fonts.
Otherwise, I agree with noviceoof: everything in your .tex file looks perfectly standard. It's just about possible that LaTeX's font path is finding fonts you don't like, but I would try different fonts before testing this.
You could try different document classes (book, report, letter). What do you mean by "coming out too dim"?
Have a look at
http://www.tug.org/pracjourn/2006-1/schmidt/schmidt.pdf
If you just want to change your font, check out http://www.tug.dk/FontCatalogue/
LaTeX actually has quite a few nice fonts :P
Verify that you have installed the cm-super package. Without it, the Computer Modern font looks dim in the output PDF.
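If you'd rather switch fonts in the preamble than install cm-super, a minimal sketch (assuming the standard lmodern, fontenc, and mathpazo packages are available in your TeX distribution) looks like this:

```latex
% Option 1: Latin Modern -- Computer Modern shapes with scalable outlines
\usepackage{lmodern}
\usepackage[T1]{fontenc}

% Option 2: Palatino for text and math, a visually heavier face
% \usepackage{mathpazo}
```

Either option replaces the bitmap Computer Modern fonts that often render as faint or fuzzy in PDF viewers.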