Rendering issue for combined fonts (Japanese & English) in PDF using cfdocument

I am having a lot of trouble with "combined fonts" (Japanese & English).
I have to create a PDF document from HTML content that is shown on my website. For that I used <cfdocument> to generate the PDF from the HTML content. But the content includes both Japanese and English, and it appears in a different font in the generated PDF than on my website. The issue occurs only in sections that combine Japanese and English.
The requirement is:
For English content, the font should be Verdana.
For Japanese content, the font should be Simson.
I have implemented the same thing with Korean, Chinese, and French, and it works.
To output the special characters, I added <cfprocessingdirective pageEncoding="utf-8"> above the code, but I still get the wrong fonts for both the English and Japanese content.
The code I have tried is given below:
<cfcontent type="application/pdf">
<cfheader name="Content-Disposition" value="attachment;filename=test.pdf">
<cfprocessingdirective pageencoding="utf-8">
<cfdocument format="PDF" localurl="yes" marginTop=".25" marginLeft=".25" marginRight=".25" marginBottom=".25" pageType="custom" pageWidth="8.5" pageHeight="10.2">
<cfoutput>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>PDF Export Example</title>
<style>
body { font-family: Verdana; }
h1 { font-size: 14px; }
p { font-size: 12px; line-height: 1.25em; margin-left:20px;}
</style>
</head>
<body>
<h1>PDF Export Example Combined Japanese & English</h1>
<p>This is a Japanese with English example
日本人は単純な音素配列論で膠着、モーラ·タイミングの言語、純粋な母音システム、
音素の母音と子音の長さ、および語彙的に重要なピッチアクセント。語順は通常、粒子が言葉の文法的機能をマ
ーキング対象オブジェクトと動詞であり、文の構造は、トピック·コメントです。文末粒子は、感情的または強調の影響を追加したり、
質問を作るために使用されます。名詞は文法的に番号や性別を持たず、何の記事はありません。動詞は主に緊張し、音声ではなく、
人のために、コンジュゲートされる。形容詞の日本の同等物は、また、結合している。日本人は動詞の形や語彙、話者の相対的な地位、
リスナーおよび掲げる者を示すと敬語の複雑なシステムを持っています。This is an example.
</p>
<h1>PDF Export English Example</h1>
<p>This is an example.
</p>
</body>
</html>
</cfoutput>
</cfdocument>
What else should I do to fix this problem?
Thank you.

Based on the results I got and further research, it seems there is an issue with PDF style rendering when English text is mixed with Japanese content.
So finally I found a solution:
1) Add a space between the Japanese and English words.
2) Iterate over the whole string (treating it as a space-delimited list).
3) Differentiate the English words from the Japanese words with a regular expression.
4) Wrap the English words in a span (or any other) tag and apply a separate style to them.
I don't know whether this is the proper solution for this issue, but it is what I did to solve it; a rough sketch is shown below.
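A minimal ColdFusion sketch of that workaround follows. The function name, the regular expression, and the "en" class are purely illustrative assumptions, not the original code; the idea is just to split on spaces and wrap Latin-only words so a separate font-family can be applied to them.
<!--- Splits the text on spaces and wraps Latin-only words in a span
      so that a separate English font-family can be applied to them. --->
<cffunction name="wrapEnglishWords" returntype="string" output="false">
    <cfargument name="text" type="string" required="true">
    <cfset var result = "">
    <cfset var word = "">
    <!--- Treat the text as a space-delimited list and inspect each word --->
    <cfloop list="#arguments.text#" index="word" delimiters=" ">
        <cfif reFind("^[[:alnum:][:punct:]]+$", word)>
            <!--- Latin-only word: wrap it so the English style can target it --->
            <cfset result = listAppend(result, '<span class="en">#word#</span>', " ")>
        <cfelse>
            <!--- Japanese (or other non-Latin) word: leave it unchanged --->
            <cfset result = listAppend(result, word, " ")>
        </cfif>
    </cfloop>
    <cfreturn result>
</cffunction>
Inside the <cfdocument> body the wrapped output can then be styled with something like p { font-family: Simson; } for the Japanese text and .en { font-family: Verdana; } for the wrapped English words.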

Related

How to control page breaks with react-native-html-to-pdf?

I am generating a pdf document using react-native-html-to-pdf.
When the document contains a long list of elements it is possible for some elements to span two pages within the same document.
For example this simple html:
<html>
<head></head>
<body>
<section style="border:solid 1px black;"><p>item</p></section>
<!-- sections repeat 32 times omitted for brevity -->
</body>
</html>
At the page break I get a document where one of the sections is split across the two pages (screenshot omitted).
How can I control this? Does it depend on the html elements in the document?
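A common way to control this with HTML-to-PDF converters is the CSS page-break-inside property. Whether it is honored depends on the WebView/print engine that react-native-html-to-pdf uses on your platform, so treat the following as a sketch rather than a guaranteed fix:
<html>
<head>
<style>
  /* Ask the renderer to keep each section on a single page,
     if the underlying print engine supports this property. */
  section {
    page-break-inside: avoid;
    border: solid 1px black;
  }
</style>
</head>
<body>
  <section><p>item</p></section>
  <!-- sections repeat as before -->
</body>
</html>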

Why won't both of my HTML style lines work together

I am trying to decorate the background of the website I am building. For some reason I can make either line work by itself, but when I add both lines, only the top one works. How can I make both lines work together?
<body style=background-color:powderblue>
<body style=border-style:solid;border-color:red>
You can only have one <body> tag.
<body style="background-color:powderblue;border-style:solid;border-color:red">
Combine the styles into one, or move them to a CSS file or a <style> block in your <head>:
body {
background-color:powderblue;
border-style:solid;
border-color:red
}
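For completeness, a minimal page putting those pieces together might look like the following (illustrative only; the content is a placeholder):
<html>
<head>
<style>
  /* One body rule carrying both decorations */
  body {
    background-color: powderblue;
    border-style: solid;
    border-color: red;
  }
</style>
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>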

C# Selenium Webdriver (Firefox) iFrame does not allow text to be entered via sendKeys

I'm using the latest Selenium with Firefox (2.53.0).
Previously the code was working when performing the following:
1) Finding the iframe by XPath using the iframe class
IWebElement detailFrame = Driver_Lib.Instance.FindElement(By.XPath("//iframe[@class='cke_wysiwyg_frame cke_reset']"));
2) Switching to that frame by
Driver_Lib.Instance.SwitchTo().Frame(detailFrame);
3) finding the p tag within the iFrame by
IWebElement freeText = Driver_Lib.Instance.FindElement(By.TagName("p"));
4) Inserting a simple string to the iframe text box
freeText.SendKeys("this is some text");
5) Switching from the iframe back to the main content window by
Driver_Lib.Instance.SwitchTo().DefaultContent();
Here is the relevant part of the markup from the application:
<iframe class="cke_wysiwyg_frame cke_reset" frameborder="0" src="" style="width: 100%; height: 100%;" title="Rich Text Editor, ctl00_ctl00_MainContentPlaceHolder_PageContent_mlcEditor_CKEditor" aria-describedby="cke_61" tabindex="0" allowtransparency="true">
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
<title data-cke-title="Rich Text Editor, ctl00_ctl00_MainContentPlaceHolder_PageContent_mlcEditor_CKEditor">Rich Text Editor, ctl00_ctl00_MainContentPlaceHolder_PageContent_mlcEditor_CKEditor</title>
<style data-cke-temp="1">
<link href="https://myUrl/contents.css" rel="stylesheet" type="text/css">
<style data-cke-temp="1">
</head>
<body class="cke_editable cke_editable_themed cke_contents_ltr cke_show_borders" contenteditable="true" spellcheck="false">
<p>
<br _moz_editor_bogus_node="TRUE">
</p>
</body>
</html>
</iframe>
The test I am running is a simple one: open that page, insert some text, save.
It is not inserting the text into the iframe, and I am totally puzzled as to why.
Has anyone else found this issue at all?
Many thanks
I have removed the exception; it was a red herring.
The iframe cannot have text entered into it.
Hi all, I've found the solution. Here is a summary of what was happening:
1) The iframe was being located by XPath.
2) The SwitchTo() method placed focus in the detailFrame instance of IWebElement.
3) The p tag could not be located, because it sits inside the editable body element (the one carrying the cke_editable CSS classes).
The solution was staring me in the face the whole time! So simple!
I did this:
// Locate the CKEditor iframe and switch the driver's focus into it
IWebElement detailFrame = Driver_Lib.Instance.FindElement(By.XPath("//iframe[@class='cke_wysiwyg_frame cke_reset']"));
Driver_Lib.Instance.SwitchTo().Frame(detailFrame);
// Target the editable body element instead of the p tag
IWebElement freeText = Driver_Lib.Instance.FindElement(By.TagName("body"));
freeText.SendKeys("This is a free text question created by Automation Smoke Test");
// Return focus to the main document
Driver_Lib.Instance.SwitchTo().DefaultContent();
So as you see, it was simply a matter of locating the first instance of the body tag!

Is it possible to add a background to the entire height in pdf when it's converted with wkhtmltopdf?

Is it possible to add a background color to the entire height in pdf when it's converted with wkhtmltopdf?
This css rule:
html,body {
height: 100%;
background-color: #ff0000;
}
doesn't work :)
The converted HTML is automatically generated. It has from 1 to 3 pages.
This works perfectly for me:
#test.html
<html>
<body style="background-color:#E6E6FA">
<h1>Hello world!</h1>
</body>
</html>
And run the command:
wkhtmltopdf --margin-bottom 0 --margin-top 0 test.html test1.pdf
For more wkhtmltopdf options, check out this manual.

trouble using xhtml2pdf with unicode

I've been trying to convert Hebrew html files without success; the Hebrew characters show up in the output PDF as black rectangles regardless of any encoding I tried.
I tried some unicode test files included in the pisa distribution: pisa-3.0.33\test\test-unicode-all.html and \test-bidirectional-text.html. I ran xhtml2pdf from the command line both with and without --encoding utf-8. Same result: none of the non-Latin characters made it through.
Is this a fonts problem*? If the unicode test file works for you, was there anything you did to set it up?
*FWIW, at least some of these languages, including Hebrew, should work with Arial.
EDIT: Alternatively, if someone has pisa set up and could try converting the unicode test file above, I would be very grateful.
Inserting the following code into the HTML helped me:
<style>
@page {
size: a4;
margin: 0.5cm;
}
@font-face {
font-family: "Verdana";
src: url("verdana.ttf");
}
html {
font-family: Verdana;
font-size: 11pt;
}
</style>
In the url, instead of "verdana.ttf" you should put the absolute path to the font on your OS.
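For example, the rule might end up looking like this (the path below is just an assumption; use wherever the font actually lives on your machine):
@font-face {
    font-family: "Verdana";
    /* assumed location; on Windows this could be C:/Windows/Fonts/verdana.ttf */
    src: url("/usr/share/fonts/truetype/msttcorefonts/verdana.ttf");
}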
If anyone in the future tries, like me, to figure out how to PROPERLY create a PDF file that contains Hebrew using xhtml2pdf, here's what worked for me:
First thing: include the font settings described here by @eviltrue in my HTML. This can be any font, as long as it supports Hebrew characters; otherwise any Hebrew characters in the input HTML will simply appear as black rectangles in the PDF.
At the time of writing this answer, while it is possible to output Hebrew characters to PDF with xhtml2pdf, they come out in reverse order, i.e. שלום כיתה א would be rendered as א התיכ םולש.
At this point I was stuck, but then I stumbled upon this SO answer:
https://stackoverflow.com/a/15449145/1918837
After installing the python-bidi package, here is an example of a complete solution (used in a python app):
from bidi import algorithm as bidialg
from xhtml2pdf import pisa
HTMLINPUT = """
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<style>
@page {
size: a4;
margin: 1cm;
}
@font-face {
font-family: DejaVu;
src: url(my_fonts_dir/DejaVuSans.ttf);
}
html {
font-family: DejaVu;
font-size: 11pt;
}
</style>
</head>
<body>
<div>Something in English - משהו בעברית</div>
</body>
</html>
"""
# any writable binary file works as the destination for the PDF
with open("output.pdf", "wb") as outputfile:
    # base_dir="L" is used so that the "< >" signs in the HTML tags
    # don't get flipped by the bidi algorithm
    pdf = pisa.CreatePDF(bidialg.get_display(HTMLINPUT, base_dir="L"), outputfile)
The nice thing about the bidi algorithm is that you can have mixed RTL and LTR languages in the same line (like in the HTML example above) and still have a correctly formatted result.
EDIT:
The best way to go now is definitely to use wkhtmltopdf.