Adding Arial Unicode MS to CKEditor - pdf

My web application allows users to write rich text inside CKEditor, then export the result as PDF with the Flying Saucer library.
As they need to write Greek characters, I chose to add Arial Unicode MS to the available fonts by doing the following:
config.font_names = "*several fonts...*; Arial Unicode MS/Arial Unicode MS, serif";
This font is now displayed correctly in the CKEditor menu, but when I apply it to any element, I get the following result:
<span style="font-family:arial unicode ms,serif;"> some text </span>
As you can see, the uppercase letters are lost. This has a bad effect on PDF export: Flying Saucer doesn't recognise the font and falls back to Helvetica, which does not support these Unicode characters, so the Greek characters are not displayed in the PDF.
If I manually change the source code from
<span style="font-family:arial unicode ms,serif;"> some text </span>
to
<span style="font-family:Arial Unicode MS,serif;"> some text </span>
then it works as expected and the Greek characters are displayed.
Has anyone run into this problem before? Is there a way to prevent uppercase characters from being changed to lowercase?
I really want to avoid doing any kind of post-processing like:
htmlString = htmlString.replace("arial unicode ms", "Arial Unicode MS");
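The Helvetica fallback described in the question can be diagnosed before export. Assuming the substituted Helvetica is a standard font using the WinAnsi (windows-1252) encoding, which is an assumption about the renderer rather than something stated here, any character outside that code page, such as Greek letters, cannot be drawn with it. A minimal standard-library Java sketch (class and method names are hypothetical) that reports the first such character:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCoverage {

    // Returns the index of the first character that cannot be encoded in
    // windows-1252 (WinAnsi), or -1 if every character can be.
    // Assumption: the fallback font only covers this code page.
    public static int firstUnencodable(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUnencodable("some text"));                    // -1
        System.out.println(firstUnencodable("some text \u03b1\u03b2\u03b3")); // 10 (Greek alpha)
    }
}
```

If this returns anything other than -1 for your document text, the PDF will only be complete if a Unicode-capable font such as Arial Unicode MS is actually resolved by the renderer.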

I agree that this issue should be resolved independently of Flying Saucer R8.
Depending on your workflow, though, would it be more efficient to let CKEditor preprocess and validate a complete HTML file (i.e. render the entire document to HTML first)?
None of the CKEditor support tickets identifies the true source of the issue, so I recommend confirming for yourself whether it is (A) a styling issue, (B) a CSS processing issue, or (C) a peculiar CKEditor parsing issue.
A possible workaround:
Make a copy of the desired Unicode font and import it into Type 3.2 (works on both Mac and Windows):
http://www.cr8software.net/type.html
Rename the duplicate font to something all lowercase.
Limit your font selection:
config.font_names = "customfontnamehere";
Apply the style separately (the example below uses the Unicode typeface GreatVibes) and see whether that gives you the desired result:
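// cardElement, path and style come from the surrounding code (not shown)
var css =
    '@font-face {' +
        'font-family: \'GreatVibes\';' +
        'src: url(\'' + path + 'fonts/GreatVibes-Regular.eot\');' +
    '}' +
    style;
var s = CKEDITOR.document.$.createElement( 'style' );
s.type = 'text/css';
cardElement.$.appendChild( s );
if ( s.styleSheet ) {
    // Old Internet Explorer only exposes the CSS text through styleSheet.cssText
    s.styleSheet.cssText = css;
} else {
    s.appendChild( CKEDITOR.document.$.createTextNode( css ) );
}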
If the above does not work, you can try modifying the xmas plugin.js, which also uses the Unicode typeface GreatVibes and does all sorts of interesting manipulations before output; it might be worth adapting it rather than starting from scratch:
// Excerpt: path and style are defined elsewhere in plugin.js
'<style type="text/css">' +
    '@font-face {' +
        'font-family: "GreatVibes";' +
        'src: url("' + path + 'fonts/GreatVibes-Regular.ttf");' +
    '}' +
    style +
'</style>'
Whichever approach you try, the goal is to test various styles and see whether the PDF output still falls back to Helvetica.
Lastly, the CKEditor SDK has excellent support, so if you have the time and energy, you could write a plugin. Sounds daunting, I know, but notice how the plugin.js within the /plugins/font directory has priority for size attributes.
If you are not interested in producing your own plugin, I recommend contacting the prolific CKEditor plugin writer doksoft (listed both on the CKEditor website and on his own website) and asking for a demo of his commercial plugin "CKEditor Special Symbols", which has broad Unicode support.
Hope that helps,
ClaireW

I didn't find any way to do it with Flying Saucer R8, but you can make it work using Flying Saucer R9.
The method ITextFontResolver.addFont(String path, String fontFamilyNameOverride, String encoding, boolean embedded, String pathToPFB) allows you to register the font under a specific family name, so the same file can be registered under both spellings.
Code sample:
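// BaseFont is from iText 2.x, which the flying-saucer-pdf artifact builds on
import com.lowagie.text.pdf.BaseFont;
import org.xhtmlrenderer.pdf.ITextRenderer;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

ITextRenderer renderer = new ITextRenderer();
// Register the same TTF under both spellings, so that CKEditor's lowercased
// "arial unicode ms" and the original "Arial Unicode MS" both resolve to it
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", "arial unicode ms", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, null);
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", "Arial Unicode MS", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, null);
String inputFile = "test.html";
renderer.setDocument(new File(inputFile));
renderer.layout();
String outputFile = "test.pdf";
// try-with-resources closes the stream even if createPDF throws
try (OutputStream os = new FileOutputStream(outputFile)) {
    renderer.createPDF(os);
}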
You can find Flying Saucer R9 on Maven.

The simplest solution (until CKEditor fixes that bug) is to do the post-processing.
You can do it on the server (really simple, you already have the code) or with a small CKEditor plugin. Either way it gives you the result you want, and unless you need to add more fonts it will keep working without any further changes.
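A server-side version of that post-processing can be a little more robust than the plain String.replace from the question: it only touches font-family declarations and matches font names case-insensitively. This is a Java 9+ sketch assuming the HTML keeps font-family inside style attributes as in the question's example; the class name and the list of canonical names are hypothetical, and the regex is a heuristic rather than a CSS parser.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FontFamilyCaseFixer {

    // Hypothetical list: the fonts offered via config.font_names, spelled exactly
    // as the PDF renderer expects them.
    private static final List<String> CANONICAL_NAMES = List.of("Arial Unicode MS");

    // Captures "font-family:" (group 1) and its value up to the next ';' or '"' (group 2).
    private static final Pattern FONT_FAMILY =
            Pattern.compile("(font-family\\s*:\\s*)([^;\"]*)", Pattern.CASE_INSENSITIVE);

    public static String fix(String html) {
        Matcher m = FONT_FAMILY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = m.group(2);
            for (String name : CANONICAL_NAMES) {
                // Any casing of the name ("arial unicode ms", "ARIAL UNICODE MS", ...)
                // is rewritten to the canonical spelling.
                value = Pattern.compile(Pattern.quote(name), Pattern.CASE_INSENSITIVE)
                        .matcher(value)
                        .replaceAll(Matcher.quoteReplacement(name));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "<span style=\"font-family:arial unicode ms,serif;\"> some text </span>";
        // Prints <span style="font-family:Arial Unicode MS,serif;"> some text </span>
        System.out.println(fix(in));
    }
}
```

Text outside font-family declarations is left alone, so a sentence that merely mentions the font name keeps its original casing.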

Related

acroform field.setRichTextValue is not working

I have an AcroForm field and I see both field.setValue() and field.setRichTextValue(...). The first one sets the correct value, but the second one does not seem to work: the rich text value is not displayed.
Here is the code I'm using:
PDDocument pdfDocument = PDDocument.load(new File(SRC));
pdfDocument.getDocument().setIsXRefStream(true);
PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
acroForm.setNeedAppearances(false);
acroForm.getField("tenantDataValue").setValue("Deuxième texte");
acroForm.getField("tradingAddressValue").setValue("Text replacé");
acroForm.getField("buildingDataValue").setValue("Deuxième texte");
acroForm.getField("oldRentValue").setValue("750");
acroForm.getField("oldChargesValue").setValue("655");
acroForm.getField("newRentValue").setValue("415");
acroForm.getField("newChargesValue").setValue("358");
acroForm.getField("increaseEffectiveDateValue").setValue("Texte 3eme contenu");
// THIS RICH TEXT DOES NOT SHOW ANYTHING
PDTextField field = (PDTextField) acroForm.getField("tableData");
field.setRichText(true);
String val = "\\rtpara[size=12]{para1}{This is 12pt font, while \\span{size=8}{this is 8pt font.} OK?}";
field.setRichTextValue(val);
I expect the field named "tableData" to be set to the rich text value.
You can download the PDF form I am using with this code: download pdf form
and the output after running this code and flattening the form data: download output here
To sum up what has been said in the comments to the question, plus some study of the working version:
Wrong rich text format
In his original code the OP used this as rich text:
String val = "\\rtpara[size=12]{para1}{This is 12pt font, while \\span{size=8}{this is 8pt font.} OK?}";
which he took from this document. But that document is the manual for the LaTeX richtext package, which provides the commands and documentation needed to "easily" produce such rich strings. In other words, the \rtpara... above is not PDF rich text but a LaTeX command that produces PDF rich text (when executed in a LaTeX context).
The document actually even demonstrates this using the example
\rtpara[indent=first]{para1}{Now is the time for
\span{style={bold,italic,strikeit},color=ff0000}{J\374rgen}
and all good men to come to the aid of \it{their}
\bf{country}. Now is the time for \span{style=italic}
{all good} women to do the same.}
for which the instruction generates two values, a rich text value and a plain text value:
\useRV{para1}: <p dir="ltr" style="text-indent:12pt;
margin-top:0pt;margin-bottom:0pt;">Now is the time
for <span style="text-decoration:line-through;
font-weight:bold;font-style:italic;color:#ff0000;
">J\374rgen</span> and all good men to come to the
aid of <i>their</i> <b>country</b>. Now is the
time for <span style="font-style:italic;">all
good</span> women to do the same.</p>
\useV{para1}: Now is the time for J\374rgen and all
good men to come to the aid of their country. Now
is the time for all good women to do the same.
As one can see in the \useRV{para1} result, PDF rich text uses (cut down) HTML markup for rich text.
For more details please look up the PDF specification, e.g. section 12.7.3.4 "Rich Text Strings" in the copy of ISO 32000-1 published by Adobe here.
PDFBox does not create rich text appearances
In his original code the OP uses
acroForm.setNeedAppearances(false);
This sets a flag that claims that all form fields have appearance streams (in which the visual appearance of the respective form field plus its content are elaborated) and that these streams represent the current value of the field, so it effectively tells the next processor of the PDF that it can use these appearance streams as-is and does not need to generate them itself.
As @Tilman quoted from the JavaDocs, though,
/**
* Set the fields rich text value.
*
* <p>
* Setting the rich text value will not generate the appearance
* for the field.
* <br>
* You can set {@link PDAcroForm#setNeedAppearances(Boolean)} to
* signal a conforming reader to generate the appearance stream.
* </p>
*
* Providing null as the value will remove the default style string.
*
* @param richTextValue a rich text string
*/
public void setRichTextValue(String richTextValue)
So setRichTextValue does not create an appropriate appearance stream for the field. To signal the next processor of the PDF (in particular a viewer or form flattener) that it has to generate appearances, therefore, one needs to use
acroForm.setNeedAppearances(true);
Making Adobe Acrobat (Reader) generate the appearance from rich text
When asked to generate field appearances for a rich text field, Adobe Acrobat has the choice to do so either based on the rich text value RV or the flat text value V. I did some quick checks and Adobe Acrobat appears to use these strategies:
If RV is set and the value of V equals the value of RV without the rich text markup, Adobe Acrobat assumes the value of RV to be up-to-date and generates an appearance from this rich text string according to the PDF specification. Else the value of RV (if present at all) is assumed to be outdated and ignored!
Otherwise, if the V value contains rich text markup, Adobe Acrobat assumes this value to be rich text and creates the appearance according to this styling.
This is not according to the PDF specification.
Probably some software products used to falsely put the rich text into the V value, and Adobe Acrobat started to support this misuse for better compatibility.
Otherwise the V value is used as a plain string and an appearance is generated accordingly.
This explains why the OP's original approach using only
field.setRichTextValue(val);
showed no change: the rich text value was ignored by Adobe Acrobat.
And it also explains his observation
then instead of setRichTextValue simply using field.setValue("<body xmlns=\"http://www.w3.org/1999/xhtml\"><p style=\"color:#FF0000;\">Red</p><p style=\"color:#1E487C;\">Blue</p></body>") works! In Acrobat Reader (without flattening) the field is correctly formatted
Be aware, though, that this is beyond the PDF specification. If you want to generate valid PDF, you have to set both RV and V and have the latter contain the plain version of the rich text of the former.
For example use
String val = "<?xml version=\"1.0\"?>"
+ "<body xfa:APIVersion=\"Acroform:2.7.0.0\" xfa:spec=\"2.1\" xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
+ "<p dir=\"ltr\" style=\"margin-top:0pt;margin-bottom:0pt;font-family:Helvetica;font-size:12pt\">"
+ "This is 12pt font, while "
+ "<span style=\"font-size:8pt\">this is 8pt font.</span>"
+ " OK?"
+ "</p>"
+ "</body>";
String valClean = "This is 12pt font, while this is 8pt font. OK?";
field.setValue(valClean);
field.setRichTextValue(val);
or
String val = "<body xmlns=\"http://www.w3.org/1999/xhtml\"><p style=\"color:#FF0000;\">Red</p><p style=\"color:#1E487C;\">Blue</p></body>";
String valClean = "Red\rBlue\r";
field.setValue(valClean);
field.setRichTextValue(val);
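To keep V consistent with RV without maintaining two strings by hand, the plain value can be derived from the rich text. The following Java sketch uses only the JDK's DOM parser; the class name is hypothetical, and separating paragraphs with a trailing carriage return follows the Red/Blue example above. The first example above omits the trailing CR, so how strictly Acrobat compares the two values is an assumption here, not something established.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RichTextPlain {

    private static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Derives a plain-text V value from an XHTML rich text (RV) string: the text
    // content of each <p>, each followed by a carriage return.
    public static String plainText(String richText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Rich text strings need no DTD; refusing DOCTYPEs blocks entity tricks
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(richText)));
            NodeList paragraphs = doc.getElementsByTagNameNS(XHTML_NS, "p");
            StringBuilder plain = new StringBuilder();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                // getTextContent() concatenates the text of nested <span>, <b>, ...
                plain.append(paragraphs.item(i).getTextContent()).append('\r');
            }
            return plain.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed rich text: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String rv = "<body xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<p style=\"color:#FF0000;\">Red</p>"
                + "<p style=\"color:#1E487C;\">Blue</p></body>";
        // Prints Red\rBlue\r (carriage returns shown escaped)
        System.out.println(plainText(rv).replace("\r", "\\r"));
    }
}
```

With such a helper the two PDFBox calls become field.setValue(RichTextPlain.plainText(val)) followed by field.setRichTextValue(val), together with acroForm.setNeedAppearances(true).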

How to embed font into pdf/a using iText7

I'm trying to see how to embed fonts into my pdf/a.
I found a lot of answer but using iTextSharp.
In my case I use iText7, and everything I tried gave me the error:
"All the fonts must be embedded..."
I have a TTF file for my font, but I haven't found a way to embed it into my PDF and use it...
Could someone help me?
Thanks in advance
kor6k
As documented in the tutorial and as indicated by the error you mention ("All the fonts must be embedded"), you need to embed the fonts.
You are probably not defining a font, in which case the standard Type 1 font Helvetica will be used. These standard Type 1 fonts are never embedded, hence you need to pick another font.
The example from the tutorial uses the free font FreeSans:
public const String FONT = "resources/font/FreeSans.ttf";
The font object is defined like this:
PdfFont font = PdfFontFactory.CreateFont(FONT, PdfEncodings.WINANSI, true);
This font is used in a Paragraph like this:
Paragraph p = new Paragraph();
p.SetFont(font);
p.Add(new Text("Font is embedded"));
document.Add(p);
This is the C# version. If you need the Java version, take a look at the Java version of the tutorial:
public static final String FONT = "src/main/resources/font/FreeSans.ttf";
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("Font is embedded"));
document.add(p);
If you already use this approach, and you still get the error, you probably have some content somewhere for which you didn't define a font that is embedded.

How does one enter a 'checkbox' character on a pdf generated by report4pdf?

I am generating PDFs using the Report4PDF package (by Bob Nemec) in Cincom VisualWorks 8.1. I am doing everything in Smalltalk.
However, the issue I am facing right now is that I can't get a checkbox character to show up in the PDF.
My code goes along these lines:
pdfDocument := Report4PDF.R4PReport new.
exporter := SAGETEAPDFDataExporter onDocument: pdfDocument.
exporter currentText text string:' Available'.
"Followed by relevant code to save PDF"
But what shows up in my PDF is basically ' Available': a space appears instead of the checkbox symbol. I even tried using dingbat codes (e.g. #9744). It works with the copyright, alpha and gamma symbols, but not with the checkbox symbol.
I tried updating my VisualWorks image from the public repository using the report4pdf, pdf development and fonts development packages. I ran into some issues which I won't mention since they would derail us from the topic.
Thanks in Advance!
Okay... So I ended up finding a solution to this question. I will post the answer here in case anyone else ends up in a similar situation.
pdfDocument := Report4PDF.R4PReport new.
exporter := SAGETEAPDFDataExporter onDocument: pdfDocument.
exporter currentText text:[:text|
text string zapfDingbats ;string:'q'.
text string helvetica; string:'Available' ].
So you can use the ZapfDingbats font to get a character similar to a checkbox, mixing fonts to get something like ' (Checkbox) Available'.
In effect the string is 'q Available', but the 'q' is set in ZapfDingbats while the 'Available' substring is set in Helvetica.
Hope that helped. And thank you again @Leandro for trying to help me :)
Cheers!

How can I get special characters using elm-html module?

Disclaimer: I'm brand new to Elm
I'm fiddling around with the online Elm editor and I've run into an issue. I can't find a way to get certain special characters (copyright, trademark, etc.) to show up. I tried:
import Html exposing (text)
main =
    text "&copy;"
All that showed up was the actual text &copy;. I also tried to use the unicode character for it \u00A9 but that ended up giving me a syntax error:
(line 1, column 16): unexpected "u" expecting space, "&" or escape code
The only way I found was to actually go to someone's website and copy/paste their copyright symbol into my app:
import Html exposing (text)
main =
    text "©"
This works, but I would much rather be able to type these characters out quickly instead of having to hunt down the actual symbols on other websites. Is there a preferred/recommended method of getting non-escaped text when returning HTML in Elm?
Edit:
Specifically for Mac:
option+g gives you ©
option+2 gives you ™
option+r gives you ®
All tested in the online editor and they worked. This still doesn't attack the core issue, but it's just something nice to note for these specific special characters.
Why this is (intentionally) not so easy
The "makers" of Elm are understandably reluctant to give us a way to insert "special" characters into HTML text. Two main reasons:
This would open a "text injection" hole where a malicious user could insert any HTML tags, even JavaScript code, into a Web page. Imagine if you could do that in a forum site like Stack Overflow: you could trick anyone reading your contribution into executing code of your choosing in their browser.
Elm works hard to produce optimal DOM updates. This only works with the content of tags that Elm is aware of, not with text that happens to contain tags. When people insert text containing HTML tags in an Elm program, there end up being parts of the DOM that can't be optimized.
How it's possible anyway
That said, the Elm user community has found a loophole that affords a workaround. For the reasons above, it's not recommended, especially not if your text is non-constant, i.e. comes from a source outside your program. Still, people will be wanting to do this anyway so I'm going to document it to save others the trouble I had digging everything up and getting it working:
If you don't already have it,
import Json.Encode exposing (string)
This is in package elm-lang/core so it should already be in your dependencies.
Similarly,
import Html.Attributes exposing (property)
Finally, create a tag having a property "innerHTML" and a JSON-Value representation of your text, e.g.:
span [ property "innerHTML" (string " ") ] []
I found that there is a better solution:
you can convert a Unicode code point to a Char and then create a String from that Char:
resString = String.fromChar (Char.fromCode 187)
You can use Unicode escape codes directly in Elm strings and chars.
We have a utility module containing all our special characters, like:
module Utils.SpecialChars exposing (noBreakSpace)
noBreakSpace : Char
noBreakSpace = '\x00A0'
Which can be used as:
let
    emptyLabel = String.fromChar noBreakSpace
in
label []
    [ span [ ] [ text emptyLabel ]
    ]
This will render a <span>&nbsp;</span>
I recently created an Elm package that solves this. If you use text' "&copy;" it'll render the copyright symbol © instead of the escape code. It also works with numeric character references such as "&#169;". Hope this helps!
You don't need to hunt for the symbols; you can get them from a list like this one.
If it's too bothersome to copy & paste, you can also create a helper function that you can use with your escaped characters like this:
import Html exposing (..)
import String
htmlDecode str =
    let
        replace (s1, s2) src = String.join s2 <| String.split s1 src
        chrmap =
            [ ("&reg;", "®")
            , ("&copy;", "©")
            ]
    in
    List.foldl replace str chrmap

main = text <| htmlDecode "hello &copy;"

Docx4j v3 Docx to HTML with Images

I'm working to convert a docx to html using Docx4j version 3.
The document contains white space consisting of tabs, spaces and newlines. The resulting HTML either has unrecognized characters or does not preserve whitespace at all.
The java code I'm using is:
WordprocessingMLPackage wordMLPackage = Docx4J.load(is);
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
htmlSettings.setImageDirPath( System.getProperty("user.dir") + uploadedImagesDirectory );
htmlSettings.setWmlPackage(wordMLPackage);
Docx4J.toHTML(htmlSettings, out, Docx4J.FLAG_EXPORT_PREFER_XSL);
String result = ((ByteArrayOutputStream)out).toString();
How can I preserve the whitespace in the document? Also, is there a way to apply CSS to a particular node? Specifically, I have 3 images which should be evenly spaced horizontally on the page.
I've looked over the documentation and searched online with no success.
Thank you.
I resolved the issue and it was not related to Docx4j.
Docx4j parsed the document perfectly! The problem was related to sending the output in an email.
I set the Spring helper javamail mime encoding to resolve this issue:
MimeMessageHelper message = new MimeMessageHelper(mimeMessage, true, "utf-8");
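The "unrecognized characters" symptom is what a charset mismatch typically looks like, and the same risk exists in the question's own code: ByteArrayOutputStream.toString() without an argument decodes with the platform default charset. A standard-library Java sketch, assuming the HTML was written as UTF-8 (the class and method names are hypothetical and do not reflect docx4j's internals), showing why decoding with an explicit, matching charset matters:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Stand-in for an exporter that writes UTF-8 HTML into a ByteArrayOutputStream
    // (an assumption for illustration).
    public static ByteArrayOutputStream writeUtf8(String html) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out;
    }

    // Decodes the buffered bytes with an explicit charset instead of relying on
    // the no-argument toString(), which uses the platform default charset.
    public static String decode(ByteArrayOutputStream out, Charset charset) {
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Contains a tab and U+00A0 (no-break space), a non-ASCII whitespace character
        String html = "<p>Tab\tand\u00A0no-break space</p>";
        ByteArrayOutputStream out = writeUtf8(html);

        // Wrong charset: the two UTF-8 bytes of U+00A0 become two characters (U+00C2, U+00A0)
        System.out.println(decode(out, StandardCharsets.ISO_8859_1).equals(html)); // false
        // Matching charset: the text round-trips exactly
        System.out.println(decode(out, StandardCharsets.UTF_8).equals(html));      // true
    }
}
```

The MimeMessageHelper fix above applies the same principle on the sending side: the mail must declare the charset the HTML is actually encoded in.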