How to format images with markdown, Sphinx latexpdf - pdf

I have a markdown document that I wish to output in pdf format. I cannot find a way to format the images correctly.
If I use
<center><img src="_images/parameter_form.png" alt="Select Parameters" width="400"></center>
I works perfectly with make html, but with make latexpdf the image does not appear at all.
If I use (based on this stack overflow answer)
![Select Parameters](_images/parameter_form.png) { width=400 }
The images is not sized and the format string { width=400 } appears in the text
What am I doing wrcng?

The answer is to use MyST. This Sphinx parser allows a richer syntax in markdown and specifically the image tag
```{image} _images/parameter_form.png
:alt: Select Parameters
:width: 400px
:align: center
```
Perfect!

Related

pdf.js get info about embedded fonts

I am using pdf.js. Fetching the Text I get blocks with font info
Object {
str: "blabla",
dir: "ltr",
width: 191.433141,
height: 12.546,
transform: Array[6],
fontName: "g_d0_f2"
}
Is it possible to get somehow more information about g_d0_f2.
Notice the PDF.js getTextContent will not and not suppose to match glyphs in PDFs. The PDF32000 specification has two different algorithms for text display and extraction. Even if you can lookup font data in the page.commonObjs, it might not be really helpful for extracted text content display due to glyphs encoding mismatch.
The page's getTextContent is doing text extraction and getOperatorList gets (glyph) display operators. See how src/display/svg.js renderer displays glyphs.

Adding Arial Unicode MS to CKEditor

My web application allows user to write rich text inside CKEditor, then export the result as PDF with the Flying Saucer library.
As they need to write Greek characters, I chose to add Arial Unicode MS to the available fonts, by doing the following :
config.font_names = "*several fonts...*; Arial Unicode MS/Arial Unicode MS, serif";
This font is now displayed correctly in the CKEditor menu, but when I apply this font to any element, I get the following result :
<span style="font-family:arial unicode ms,serif;"> some text </span>
As you can notice, I lost the UpperCase characters. This has pretty bad effect during PDF export, as then Flying Saucer doesn't recognise the font and so uses Helvetica which does not support Unicode characters, so the greek characters are not displayed in the PDF.
If I change manually from code source
<span style="font-family:arial unicode ms,serif;"> some text </span>
to
<span style="font-family:Arial Unicode MS,serif;"> some text </span>
then it is working as expected, greek characters are displayed.
Has anyone met this problem before? Is there a way to avoid UpperCase characters to be changed to LowerCase?
I really want to avoid doing any kind of post-processing like :
htmlString = htmlString.replace("arial unicode ms", "Arial Unicode MS");
I agree with you regarding resolving this issue aside from Flying Saucer R8.
Although depending upon your workflow, would it be more efficient to allow CKEditor to preprocess and validate a completed HTML encoded file (render the entire document to HTML first)?
None of the CKEditor support tickets specify the true source of the issue, so I recommend confirming for yourself whether it is (A) a styling issue, or (B) a CSS processing issue, or (C) a peculiar CKEditor parsing issue.
A possible workaround:
Make a copy of the desired unicode font and import it into Type 3.2 (works on both Mac and Windows).
http://www.cr8software.net/type.html
rename the duplicate font set into something all lowercase.
Limit your font selection
config.font_names = "customfontnamehere";
Apply the style separately (unicode typeface greatvibes below) and see if that gives you the desired result:
var s = CKEDITOR.document.$.createElement( 'style' );
s.type = 'text/css';
cardElement.$.appendChild( s );
s.styleSheet.cssText =
'#font-face {' +
'font-family: \'GreatVibes\';' +
'src: url(\'' + path +'fonts/GreatVibes-Regular.eot\');' +
'}' +
style;
If the above does not work, you can try to modify the xmas plugin.js (also uses the unicode typeface greatvibes and does all sorts of cool manipulations before output), so it might be worth trying to modify it rather than start from scratch:
'<style type="text/css">' +
'#font-face {' +
'font-family: "GreatVibes";' +
'src: url("' + path +'fonts/GreatVibes-Regular.ttf");' +
'}' +
style +
'</style>' )
Whichever approach you try, the goal is to test various styling and see if CKEditor defaults back to Helvetica again.
Lastly, the CKEditor SDK has excellent support, so if you have the time and energy, you could write a plugin. Sounds daunting, I know, but notice how the plugin.js within the /plugins/font directory has priority for size attributes.
If you are not interested in producing your own plugin, I recommend contacting the prolific ckeditor plugin writer
doksoft
(listed both on their website and on his own website) and ask for a demo of his commercial plugin "CKEditor Special Symbols" which has broad unicode capability.
Hope that helps,
ClaireW
I didn't find any way to do it with Flying Saucer R8, but you can make it work using Flying Saucer R9.
The method
ITextResolver.addFont(String path, String fontFamilyNameOverride, String encoding, boolean embedded, String pathToPFB) allow you to add the fond with a specific name.
Code sample:
ITextRenderer renderer = new ITextRenderer();
// Adding fonts
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", "arial unicode ms", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, null);
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", "Arial Unicode MS", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, null);
String inputFile = "test.html";
renderer.setDocument(new File(inputFile));
renderer.layout();
String outputFile = "test.pdf";
OutputStream os = new FileOutputStream(outputFile);
renderer.createPDF(os);
os.close();
You can find Flying Saucer R9 on Maven.
The simplest solution (until CKEditor fixes that bug) is to do that post-processing.
You can do it on the server (really simple, you already have the code) or with a little CKEditor plugin, but that will give you the solution that you want and unless you need to add more fonts it will work without any further changes.

Docx4J: Vertical text frame not exported to PDF

I'm using Docx4J to make an invoice model.
In the left-side of the page, it's usual to show a legal sentence as: Registered company in ... Book ... Page ...
I have inserted this in my template with a Word text frame.
Well, my issue is: when exporting to .docx, this legal text is shown perfect, but when exporting to .pdf, it's shown as an horizontal table under the other data.
The code to export to PDF is:
FOSettings foSettings = Docx4J.createFOSettings();
foSettings.setFoDumpFile(foDumpFile);
foSettings.setWmlPackage(template);
fos = new FileOutputStream(new File("/C:/mypath/prueba_OUT.pdf"));
Docx4J.toFO(foSettings, fos, Docx4J.FLAG_EXPORT_PREFER_XSL);
Any help would be very appreciated.
Thanks.
You'd need to extend the PDF via FO code; see further How to correctly position a header image with docx4j?
Float left may or may not be easy; similarly the rotated text.
In general, the way to work on this is to take the FO generated by docx4j, then hand edit it to something which FOP can convert to a PDF you are happy with. If you can do that, then its a matter of modifying docx4j to generate that FO.

How to correctly position a header image with docx4j?

I am trying to convert this Word document with a header showing an image on the right
http://www.filesnack.com/files/cduiejc7
to PDF using this sample code:
https://github.com/plutext/docx4j/blob/master/src/samples/docx4j/org/docx4j/samples/ConvertOutPDF.java
Here's the result:
http://www.filesnack.com/files/ctjs659h
While the Word document has the header image on the right, the converted PDF shows it on the left.
How can I make docx4j to reproduce the original document as PDF?
Your image is positioned relative to a paragraph:
<w:drawing>
<wp:anchor distT="0" distB="0" distL="114300" distR="114300" simplePos="0" relativeHeight="251658240" behindDoc="0" locked="0" layoutInCell="1" allowOverlap="1" wp14:anchorId="791936E3" wp14:editId="575B92C8">
<wp:simplePos x="0" y="0"/>
<wp:positionH relativeFrom="column">
<wp:posOffset>5317388</wp:posOffset>
</wp:positionH>
<wp:positionV relativeFrom="paragraph">
<wp:posOffset>-325755</wp:posOffset>
</wp:positionV>
docx4j potential to support stuff like that in PDF output is limited by what XSL FO supports. See docx4j's TextBoxTest class for what we can do with text boxes.
Currently, although we can position some textBoxes; we don't do the same for floating images: https://github.com/plutext/docx4j/issues/127
In the meantime, a possible workaround for some cases (eg float right) is to use a table.
Or possibly, you could try putting the image inside a text box!

How to find x,y location of a text in pdf

Is there any tool to find the X-Y location on a text content in a pdf file ?
Docotic.Pdf Library can do it. See C# sample below:
using (PdfDocument doc = new PdfDocument("your_pdf.pdf"))
{
foreach (PdfTextData textData in doc.Pages[0].Canvas.GetTextData())
Console.WriteLine(textData.Position + " " + textData.Text);
}
Try running "Preflight..." in Acrobat and choosing PDF Analysis -> List page objects, grouped by type of object.
If you locate the text objects within the results list, you will notice there is a position value (in points) within the Text Properties -> * Font section.
TET, the Text Extraction Toolkit from the pdflib family of products can do that. TET has a commandline interface, and it's the most powerful of all text extraction tools I'm aware of. (It can even handle ligatures...)
Geometry
TET provides precise metrics for the text, such as the position on the page, glyph widths, and text direction. Specific areas on the page can be excluded or included in the text extraction, e.g. to ignore headers and footers or margins.