How does one enter a 'checkbox' character on a pdf generated by report4pdf? - pdf

So I am working on generating PDFs using the report4PDF package(bob nemec) from the VisualWorks 8.1 software from Cincom. I am doing everything in 'smalltalk'.
However right now, the issue I am facing is that I can't get a checkbox
character to show up on the PDF.
So my code would go along like this:
pdfDocument := Report4PDF.R4PReport new.
exporter := SAGETEAPDFDataExporter onDocument: pdfDocument.
exporter currentText text string:' Available'.
"Followed by relevant code to save PDF"
But what shows up on my PDF is basically ' Available'. A space appears instead of the checkbox symbol. I even tried using dingbats codes(e.g: #9744 ). Works with the copyright, alpha, gamma symbols. Not with the checkbox symbol.
I tried updating my VisualWorks image from the public repository using the report4pdf, pdf development and fonts development packages. Ran into some
issues which I wont mention since it will derail us from the topic.
Thanks in Advance!

Okay... So I ended up finding a solution to this question. I will just
post the answer here just in case anyone else gets in a similar situation.
pdfDocument := Report4PDF.R4PReport new.
exporter := SAGETEAPDFDataExporter onDocument: pdfDocument.
exporter currentText text:[:text|
text string zapfDingbats ;string:'q'.
text string helvetica; string:'Available' ].
So you can use dingbats font to get a similar character for checkbox. You use
mixed fonts to get something like this:' (Checkbox) Available'.
So that's like the string is: 'q Available'. But 'q' is of the dingbats font while the 'Available' substring is of Helvetica.
Hope that helped. And thank you again #Leandro for trying to help me :)
Cheers!

Related

How to embed font into pdf/a using iText7

I'm trying to see how to embed fonts into my pdf/a.
I found a lot of answer but using iTextSharp.
In my cas I use iText7 and all I tried gave me the error:
"All the fonts must be embedded..."
I have a ttf file for my font but I didn't find a way to embed it into my pdf to use it...
Could someone help me?
Thanks in advance
kor6k
As documented in the tutorial and as indicated by the error you mention ("All the fonts must be embedded"), you need to embed the fonts.
You are probably not defining a font, in which case the standard Type 1 font Helvetica will be used. These standard Type 1 fonts are never embedded, hence you need to pick another font.
The example from the tutorial uses the free font FreeSans:
public const String FONT = "resources/font/FreeSans.ttf";
The font object is defined like this:
PdfFont font = PdfFontFactory.CreateFont(FONT, PdfEncodings.WINANSI, true);
This font is used in a Paragraph like this:
Paragraph p = new Paragraph();
p.SetFont(font);
p.Add(new Text("Font is embedded"));
document.Add(p);
This is the C# version. If you need the Java version, take a look at the Java version of the tutorial:
public static final String FONT = "src/main/resources/font/FreeSans.ttf";
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("Font is embedded"));
document.add(p);
If you already use this approach, and you still get the error, you probably have some content somewhere for which you didn't define a font that is embedded.

Adding Arial Unicode MS to CKEditor

My web application allows user to write rich text inside CKEditor, then export the result as PDF with the Flying Saucer library.
As they need to write Greek characters, I chose to add Arial Unicode MS to the available fonts, by doing the following :
config.font_names = "*several fonts...*; Arial Unicode MS/Arial Unicode MS, serif";
This font is now displayed correctly in the CKEditor menu, but when I apply this font to any element, I get the following result :
<span style="font-family:arial unicode ms,serif;"> some text </span>
As you can notice, I lost the UpperCase characters. This has pretty bad effect during PDF export, as then Flying Saucer doesn't recognise the font and so uses Helvetica which does not support Unicode characters, so the greek characters are not displayed in the PDF.
If I change manually from code source
<span style="font-family:arial unicode ms,serif;"> some text </span>
to
<span style="font-family:Arial Unicode MS,serif;"> some text </span>
then it is working as expected, greek characters are displayed.
Has anyone met this problem before? Is there a way to avoid UpperCase characters to be changed to LowerCase?
I really want to avoid doing any kind of post-processing like :
htmlString = htmlString.replace("arial unicode ms", "Arial Unicode MS");
I agree with you regarding resolving this issue aside from Flying Saucer R8.
Although depending upon your workflow, would it be more efficient to allow CKEditor to preprocess and validate a completed HTML encoded file (render the entire document to HTML first)?
None of the CKEditor support tickets specify the true source of the issue, so I recommend confirming for yourself whether it is (A) a styling issue, or (B) a CSS processing issue, or (C) a peculiar CKEditor parsing issue.
A possible workaround:
Make a copy of the desired unicode font and import it into Type 3.2 (works on both Mac and Windows).
http://www.cr8software.net/type.html
rename the duplicate font set into something all lowercase.
Limit your font selection
config.font_names = "customfontnamehere";
Apply the style separately (unicode typeface greatvibes below) and see if that gives you the desired result:
var s = CKEDITOR.document.$.createElement( 'style' );
s.type = 'text/css';
cardElement.$.appendChild( s );
s.styleSheet.cssText =
'#font-face {' +
'font-family: \'GreatVibes\';' +
'src: url(\'' + path +'fonts/GreatVibes-Regular.eot\');' +
'}' +
style;
If the above does not work, you can try to modify the xmas plugin.js (also uses the unicode typeface greatvibes and does all sorts of cool manipulations before output), so it might be worth trying to modify it rather than start from scratch:
'<style type="text/css">' +
'#font-face {' +
'font-family: "GreatVibes";' +
'src: url("' + path +'fonts/GreatVibes-Regular.ttf");' +
'}' +
style +
'</style>' )
Whichever approach you try, the goal is to test various styling and see if CKEditor defaults back to Helvetica again.
Lastly, the CKEditor SDK has excellent support, so if you have the time and energy, you could write a plugin. Sounds daunting, I know, but notice how the plugin.js within the /plugins/font directory has priority for size attributes.
If you are not interested in producing your own plugin, I recommend contacting the prolific ckeditor plugin writer
doksoft
(listed both on their website and on his own website) and ask for a demo of his commercial plugin "CKEditor Special Symbols" which has broad unicode capability.
Hope that helps,
ClaireW
I didn't find any way to do it with Flying Saucer R8, but you can make it work using Flying Saucer R9.
The method
ITextResolver.addFont(String path, String fontFamilyNameOverride, String encoding, boolean embedded, String pathToPFB) allow you to add the fond with a specific name.
Code sample:
ITextRenderer renderer = new ITextRenderer();
// Adding fonts
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", "arial unicode ms", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, null);
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", "Arial Unicode MS", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, null);
String inputFile = "test.html";
renderer.setDocument(new File(inputFile));
renderer.layout();
String outputFile = "test.pdf";
OutputStream os = new FileOutputStream(outputFile);
renderer.createPDF(os);
os.close();
You can find Flying Saucer R9 on Maven.
The simplest solution (until CKEditor fixes that bug) is to do that post-processing.
You can do it on the server (really simple, you already have the code) or with a little CKEditor plugin, but that will give you the solution that you want and unless you need to add more fonts it will work without any further changes.

Can I define a text property as rich text?

VS 2013, VB, EF6
I am creating an object that will keep user input in one of its properties. I would like that user input to be stored as rich text. What's involved to make that stored text be rich text format? So,
Public Property Text as <what?>
I thought I would post what was my answer for others who might ask the question the same way I did. I begin by stating that my question was poorly formed because I didn't understand I'm not really storing RTF, I'm storing WYSIWYG text with html tags. But I think the question as phrased is useful because that's how many people think until they are taught by others.
Ultimately this process opens a serious XSS vector, but first we have to at least collect the WYSIWYG text.
First step: using a script-based editor capture the text with html tags. I used CKEditor which is easy to download on NuGet. It comes in 3 flavors: basic, standard and full. Another popular one seems to be TinyMCE also available through NuGet.
CKEditor must be 'wired in' to replace the existing input element. I replaced #html.editorfor with a < textarea > directly as follows. Model.UserPost.Body is the property into which I want to place the WYSIWYG text. The Raw helper is required so the output is NOT encoded allowing us to see our WYSIWYG text.
<textarea name="model.UserPost.Body" id="model_UserPost_Body" class="form-control text-box multi-line">
#Html.Raw(Model.UserPost.Body)
</textarea>
CKEditor is 'wired in' using a script element to replace the < textarea > element.
#Section Scripts
<script src="~/scripts/ckeditor/ckeditor.js"></script>
<script>
CKEDITOR.replace('model.UserPost.Body');
</script>
End Section
The script above can be added to all pages via _layout.vbhtml, or just the target page via a #Section Scripts section as shown above, which is often recommended and what I did, but that may also require adding to the standard _Layout the following in the < head > section such as follows.
#RenderSection("Styles", False)
In the controller POST method for the view the following code is needed to capture the WYSIWYG text otherwise the default filter will raise an exception when it detects anything that looks like an html tag.
Dim rawBody = Request.Unvalidated.Form("model.UserPost.Body")
userPost.Body = rawBody
There are some possible gotcha's; The 'body' property has to be removed from the Include:= list of the < Bind > element in the method paramter list if < Bind > is being used. Also, although not directly related to this solution, you can't have a Data Annotation like < Required() > on this property in the model because background checking won't be able to confirm that condition so the ModelState.IsValid flag won't ever go true.
Second step: before saving the input it MUST be checked for XSS. Microsoft has a nice video explaining basic XSS that I recommend viewing; it's only 11 minutes.
Mikesdotnetting has a nice explaination for dealing with XSS and shows a whitelisting algorithm toward the bottom of this page. The following code is based on his work.
To create a white listing approach, the HTML Agility Pack is useful to catalogue the HTML nodes for review. This is easily loaded from Nu Get as well. This is the code I used in the POST method to invoke the white list methods (Yes, it could be more compact, but this is easier to read for us novices):
Dim tempDoc = New HtmlDocument()
tempDoc.LoadHtml(rawBody)
RemoveNodes(tempDoc.DocumentNode, allowedTags)
userPost.Body = tempDoc.DocumentNode.OuterHtml
The allowed tags are what you will allow, which means everything else is rejected, hence whitelisting. This is just a sample list:
Dim allowedTags As New List(Of String)() From {"p", "em", "s", "ol", "ul", "li", "h1", "h2", "h3", "h4", "h5", "h6", "strong"}
These are the methods based on Mikesdotnetting page:
Private Sub RemoveNodes(ByVal node As HtmlNode, allowedTags As List(Of String))
If (node.NodeType = HtmlNodeType.Element) Then
If Not allowedTags.Contains(node.Name) Then
node.ParentNode.RemoveChild(node)
Exit Sub
End If
End If
If (node.HasChildNodes) Then
RemoveChildren(node, allowedTags)
End If
End Sub
Private Sub RemoveChildren(ByVal parent As HtmlNode, allowedTags As List(Of String))
For i = parent.ChildNodes.Count() - 1 To 0 Step -1
RemoveNodes(parent.ChildNodes(i), allowedTags)
Next
End Sub
So basically, (1) CKEditor captures user input with html tags that looks nice, (2) the raw input is specially requested in the Controller POST method and then (3) cleaned using a white list. After that it can be output directly to the page using #Html.Raw() because it can be trusted.
That's it. I've not really posted solutions like this before, so if I've missed something let me know and I'll correct or add it.
Rich Text is stored in the Rich Text Format.
The Rich Text Format specifications can be found here:
http://www.microsoft.com/en-us/download/details.aspx?id=10725
It is just an ordinary string. You can extract the string from a RichTextBox using the SaveFile function:
Private Function GetRTF(ByRef Box As RichTextBox) As String
Using ms As New IO.MemoryStream
Box.SaveFile(ms, RichTextBoxStreamType.RichText)
Return System.Text.Encoding.ASCII.GetString(ms.ToArray)
End Using
End Function
You can load text in the Rich Text Format into a RichTextBox using the LoadFile method of the RichTextBox. The text needs to be in the correct format:
Dim rtf As String = "{\rtf1 {\colortbl;\red0\green0\blue255;\red255\green0\blue0;}Guten Tag!\line{\i Dies} ist ein\line formatierter {\b Text}.\line Das {\cf1 Ende}.}"
Using ms As New IO.MemoryStream(System.Text.Encoding.ASCII.GetBytes(rtf))
RichTextBox1.LoadFile(ms, RichTextBoxStreamType.RichText)
End Using
Ordinary controls usually will not interpret this format in their text property.

Docx4J: Vertical text frame not exported to PDF

I'm using Docx4J to make an invoice model.
In the left-side of the page, it's usual to show a legal sentence as: Registered company in ... Book ... Page ...
I have inserted this in my template with a Word text frame.
Well, my issue is: when exporting to .docx, this legal text is shown perfect, but when exporting to .pdf, it's shown as an horizontal table under the other data.
The code to export to PDF is:
FOSettings foSettings = Docx4J.createFOSettings();
foSettings.setFoDumpFile(foDumpFile);
foSettings.setWmlPackage(template);
fos = new FileOutputStream(new File("/C:/mypath/prueba_OUT.pdf"));
Docx4J.toFO(foSettings, fos, Docx4J.FLAG_EXPORT_PREFER_XSL);
Any help would be very appreciated.
Thanks.
You'd need to extend the PDF via FO code; see further How to correctly position a header image with docx4j?
Float left may or may not be easy; similarly the rotated text.
In general, the way to work on this is to take the FO generated by docx4j, then hand edit it to something which FOP can convert to a PDF you are happy with. If you can do that, then its a matter of modifying docx4j to generate that FO.

Tika - how to extract text from PDF text: underlined, highlighted, crossed out

I'm using Tika* to parse a PDF file.
There are no problems to retrieve the document's text, but I don't figure out how to extract text:
underlined
highlighted
crossed out
Adobe Writer gives you different text edit options, but I'm not able to see where they are "hidden".
Is there a solution to extract these metadata information? (underline, highligh ...)
Do you know if Tika is able to extract this data?
*http://tika.apache.org/
Wow. 4 years is a long time to wait for an answer, and I figure you have found a solution by now. Anyways, for the sake of those who would visit this link, the answer is Yes. Apache Tika can extract not just text in a document, but also the formatting as well (e.g. bold, italicized). This was my Scenario:
//inputStream is the document you wish to parse from.
AutoDetectParser parser = new AutoDetectParser();
ContentHandler handler = new BodyContentHandler(new ToXMLContentHandler());
Metadata metadata = new Metadata();
parser.parse(inputStream,handler,metadata);
System.out.println(handler.toString());
The print statement prints an XML of your document. With a little work of cleaning up the XML (really HTML tags), you would be left with tags like < b >text< /b> for bold text and < i >text < / i > for italicized text. Then you could find a way to render it. Good luck.