Cannot display Unicode characters (like λ) in PDF output of Jupyter

I'm using Julia in a jupyter notebook.
I want to generate a PDF of the results of my work. However, when generating the PDF, the λ in the mathematical expression λ=3 is lost, so the output in the PDF is =3.
Here is the jupyter notebook code
In[1]: λ=3
Out[1]: 3
Here is the pdf generated with the jupyter notebook
In[1]: =3
Out[1]: 3
This is not the case with the PDF generated with nteract, where the expression λ=3 is fully printed. However, the overall appearance of the PDF generated with nteract is not as nice as the one generated with jupyter notebook.
Here is the printed PDF generated with nteract (it looks exactly the same as the code itself):
In[1]: λ=3
Out[1]: 3
Does somebody know how to print such characters with jupyter notebook?
Many thanks in advance.

The issue is related to how Jupyter itself generates and compiles the LaTeX file. Jupyter, by default, compiles the file with xelatex to support Unicode. My guess, however, is that xelatex requires some configuration in the file, and Jupyter does not generate a file that works out of the box with a plain xelatex command.
You can change the configuration of Jupyter to compile the generated LaTeX file with the pdflatex or latex command instead.
Solution:
Find your Jupyter configuration directory (i.e. the output of jupyter --config-dir; on Linux usually ~/.jupyter). To find out which jupyter IJulia uses, run using IJulia; IJulia.jupyter, then find the config directory of that jupyter.
Create the file jupyter_notebook_config.py in this directory if it does not exist already.
Put the following line of code at the end of this file and save it:
c.PDFExporter.latex_command = ['pdflatex', '{filename}']
Then you can export the PDF file from the notebook as usual, and the characters should appear just right. This should work provided that the pdflatex command is found in your shell.
If you do not have pdflatex but have latex, you can use the following line instead of the code above:
c.PDFExporter.latex_command = ['latex', '--output-format=pdf', '{filename}']
If you are not able to change the configuration of Jupyter, download the LaTeX file and compile it with the command latex --output-format=pdf filename.tex.
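If you prefer to script the configuration step, here is a minimal Python sketch. The helper name is my own invention, not part of Jupyter; it just appends the setting to the config file described above, creating the file if needed:

```python
from pathlib import Path

def set_latex_command(config_dir, command):
    """Append a PDFExporter.latex_command setting to
    jupyter_notebook_config.py in config_dir (created if missing).
    Idempotent: the line is only appended once."""
    cfg = Path(config_dir) / "jupyter_notebook_config.py"
    line = "c.PDFExporter.latex_command = %r\n" % (command,)
    existing = cfg.read_text() if cfg.exists() else ""
    if line not in existing:
        with cfg.open("a") as f:
            f.write(line)
    return cfg
```

For example, set_latex_command(Path.home() / ".jupyter", ["pdflatex", "{filename}"]) would apply the setting from the solution above.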
Hope it works!

My solution to this is ugly, and I will get to it below, but first it is important to understand why this is happening.
Why is this happening
The intermediate .tex file that is generated indirectly calls for the Latin Modern fonts. Latin Modern is a fine choice for math fonts, but it is sucky for monospaced. The Latin Modern mono font does not include Greek.
Latin Modern is set by the unicode-math LaTeX package, which is loaded in the generated LaTeX around line 43.
\ifPDFTeX
\usepackage[T1]{fontenc}
\IfFileExists{alphabeta.sty}{
\usepackage{alphabeta}
}{
\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}
}
\else
\usepackage{fontspec}
\usepackage{unicode-math}
\fi
So the unicode-math package will be loaded if you are using XeLaTeX (which is a good default) or LuaTeX or any other LaTeX engine for which fontspec is available.
The unicode-math package very reasonably uses Latin Modern for math, but if nothing is set otherwise, it will also use Latin Modern for monospaced fonts. From the documentation
Once the package is loaded, traditional TFM-based maths fonts are no longer supported; you can only switch to a different OpenType maths font using the \setmathfont command. If you do not load an OpenType maths font before \begin{document}, Latin Modern Math will be loaded automatically.
The creators of unicode-math assume that you will set your non-math fonts up after you have loaded unicode-math, but that isn't done with the .tex generated by jupyter nbconvert. (I don't know if this is a jupyter thing or a Pandoc thing, but either way we end up with a document that uses Latin Modern for more than just math.)
So one solution is to set some other mono font after unicode-math is loaded and before \begin{document}.
My solution
My solution is tuned for what I already had set up. It may not be the right approach for you, and it certainly will need some adjusting.
My Makefile used to have a simple jupyter nbconvert --to=pdf in it. But now I need to edit the intermediate .tex file. So I have this for a notebook named computation-examples. You will need to use your own file name or do some Make rule magic.
# sed on macOS is just weird. Resorting to perl
computation-examples.tex: computation-examples.ipynb
	jupyter nbconvert --to=latex $<
	perl -pi -e '/^([ \t]*)\\usepackage{unicode-math}/ and $$_.="$$1\\usepackage[default]{fontsetup}\n"' $@
The perl adds the line \usepackage[default]{fontsetup} immediately after the line with \usepackage{unicode-math}. There are probably nicer ways to do that. I started with sed, but gave up. So the .tex file that is then processed to PDF by XeLaTeX has this:
\else
\usepackage{fontspec}
\usepackage{unicode-math}
\usepackage[default]{fontsetup}
\fi
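If perl isn't your thing, the same in-place edit can be sketched in Python (the function name is mine; it reproduces what the perl one-liner does, preserving the matched line's indentation):

```python
import re
from pathlib import Path

FONTSETUP = r"\usepackage[default]{fontsetup}"

def patch_tex(tex_path):
    """Insert \\usepackage[default]{fontsetup} directly after the
    \\usepackage{unicode-math} line, keeping its indentation."""
    path = Path(tex_path)
    text = path.read_text()
    pattern = re.compile(r"^([ \t]*)\\usepackage\{unicode-math\}$", re.M)
    text = pattern.sub(lambda m: m.group(0) + "\n" + m.group(1) + FONTSETUP,
                       text)
    path.write_text(text)
```

Run it on the .tex file that jupyter nbconvert --to=latex produced, then compile with xelatex as usual.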
The fontsetup package preserves all of the goodness of unicode-math while setting up the non-math fonts. The default is to use the OpenType (.otf) Computer Modern Unicode fonts, which will be part of any TeX distribution that has xelatex on it.
Another approach
Probably a cleaner approach, but one I haven't experimented with, would be to create a fontspec.cfg file which lies about (overrides) what font files to load for what we are calling Latin Modern Mono. I would need to reread the fontspec documentation for the hundredth time to do that.
Make magic
Since writing the above, I have set up a more general Makefile rule,
%.tex: %.ipynb
	jupyter nbconvert --to=latex $<
	perl -pi -e '/^([ \t]*)\\usepackage{unicode-math}/ and $$_.="$$1\\usepackage[default]{fontsetup}\n"' $@
which sits alongside rules to make a PDF from a .tex file.
But if you aren't using make and Makefiles, then you can wrap up that perl monstrosity into a script of your choosing.

Related

inkscape: multiple page pdf to multiple png

When I convert a PDF to images on the Linux command line, it seems Inkscape gets the best result (better quality than gs at the same dpi). Unfortunately, it only converts the first page to PNG. How can I convert every PDF page to a different PNG file? Do I have to extract one PDF page, store it in a new PDF file, run the Inkscape conversion, and so on?
This isn't solely using Inkscape, but you could use e.g. pdftk to split the PDF file into separate pages and convert each page into a PNG with Inkscape. For example, like this:
pdftk file.pdf burst
for i in pg_*.pdf; do inkscape "$i" -z --export-dpi=300 --export-area-page --export-png="$i.png"; done
Note that pdftk burst creates pdf-files called pg_0001.pdf, etc., so if you have any files named like that, they'll be overwritten. You can remove them afterwards easily using
rm pg_*.pdf
Lu Kas' answer threw warnings for me without doing the conversion, probably because I'm running Inkscape 1.1.
However, I got it running by replacing some deprecated options:
inkscape pdfFile.pdf --export-dpi=300 --export-area-page --export-filename=imageFile.png;
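The loop can also be wrapped in Python rather than shell, using the Inkscape 1.x flags from the command above. This sketch only builds the command lists (function name mine); running them with subprocess.run is left to you:

```python
import glob
import os

def export_commands(directory, dpi=300):
    """Build one Inkscape PNG-export command per pg_*.pdf page file
    produced by pdftk burst (Inkscape >= 1.0 flag syntax)."""
    commands = []
    for page in sorted(glob.glob(os.path.join(directory, "pg_*.pdf"))):
        commands.append([
            "inkscape", page,
            f"--export-dpi={dpi}",
            "--export-area-page",
            f"--export-filename={page}.png",
        ])
    return commands
```

Sorting the glob keeps the pages in the order pdftk burst numbered them.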
For batch processing, rather than slowly looping through file by file, Inkscape has a shell mode for command-file scripting. See https://wiki.inkscape.org/wiki/index.php/Using_the_Command_Line#Shell_mode
However, like other command-file scripts, you need to write a custom text file. Also, Windows users should run against inkscape.com (which takes precedence) rather than inkscape.exe.
Since version 1.0 (currently 1.2), a multi-page PDF can be addressed for multiple outputs. For some other examples see https://inkscape.org/doc/inkscape-man.html#EXAMPLES
Commands get replaced over time, so currently use --export-type="xxx" to batch-export a list of input files to type xxx; in this case, --export-type="png".
Also, for PDF-related inputs and support, see https://wiki.inkscape.org/wiki/index.php/Using_the_Command_Line#New_options
For Windows users there is a handy batch-file converter here: https://gist.github.com/JohannesDeml/779b29128cdd7f216ab5000466404f11

How to convert unusual unicode characters (UTF-8) to PDF?

I would like to convert a text file containing Unicode characters in UTF-8 to a PDF file. When I cat the file or look at it with vim, everything is great, but when I open the file with LibreOffice, the formatting is off. I have tried various fonts, none of which have worked. Is there a font file somewhere on my Ubuntu 16.04 system which is used for display in a terminal window? It seems that would be the font to tell LibreOffice to use.
I am not attached to LibreOffice. Any app that will convert the text file into a PDF file is fine. I have tried txt2pdf and pandoc without success.
This is what the file looks like
To be more specific about the problem, below is an example of what the above lines look like in LibreOffice using Liberation Mono font (no mono font does better):
I answered you by mail, but here is the answer. You are using some very specific characters, the most difficult to find being in the Miscellaneous Symbols Unicode block. For instance, the SESQUIQUADRATE, which should appear on your second line as ⚼.
A quick search led me to the two following candidates (for monospace fonts):
Everson Mono
GNU Unifont
As you can see, the block is also partially covered by PragmataPro, which is a very good font. I tried with an old version and found all of your characters, but an issue occurred: the Sun character (rendered as ☉) seems to be printed twice as wide as the other characters. My version of this font is rather old, though, and perhaps buggy.
Once you have chosen a font suiting your needs, you should be able to render your documents as PDF with various tools. I did all my experiments with txt2pdf, which I use daily for many documents.
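Before shopping for a font, it helps to list exactly which code points your file needs the font to cover. A small sketch using only the standard library (the cutoff at Latin-1 is my own heuristic, not from the answer above):

```python
import unicodedata

def unusual_chars(text):
    """Map each character beyond Latin-1 to its code point and official
    Unicode name -- the set a candidate font must cover."""
    found = {}
    for ch in text:
        if ord(ch) > 0xFF and ch not in found:
            found[ch] = (f"U+{ord(ch):04X}",
                         unicodedata.name(ch, "UNKNOWN"))
    return found
```

For example, unusual_chars("orbit ☉ aspect ⚼") reports U+2609 SUN and U+26BC SESQUIQUADRATE, the two troublesome characters discussed above.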

IPython/ Jupyter download as PDF styling

Imagine editing a typical IPython (4.x) notebook, notebook.ipynb, in the Jupyter editor. The code, graphs, and markdown get rendered exactly how you like them when previewed in the browser.
But then you "Download as PDF via LaTeX" and get something slightly different:
A centered title/ date header has been added.
The font is now serif instead of sans serif.
Section headers are numbered.
I'd like to change the default output to be a little more "what you see is what you get". In particular: I don't want a title header; I don't want numbering on my section headers; and I want a sans serif font (code blocks look better with sans, IMHO). How can I do this using custom LaTeX template.tplx files and/or the jupyter_nbconvert_config.py configuration?
I don't mind having to use the jupyter nbconvert command, but my first choice would be a one-click solution from the browser.
Thanks!
You can run the following on your notebook file from the command line (in the same directory):
ipython nbconvert --to latex notebook.ipynb
This will generate a tex file, which you can then open with a latex editor such as Texmaker. There you can edit the latex code to conform to any style you want (i.e. changing font, changing margins, changing numbering, etc.). Finally, convert the tex to pdf (most latex editors have tools for this).
Of course, this isn't an automated solution, but it allows for detailed changes and customization, so your final pdf comes out exactly as you want.
What you are looking for is to use a different latex template.
See this post for more details.
Changing style of PDF-Latex output through IPython Notebook conversion
Basically, you will need to edit your tplx files in your /nbconvert/templates/latex directory.
I'm still learning LaTeX, but I did manage to change the default font of my documents to sans serif by adding \renewcommand{\familydefault}{\sfdefault} to my article.tplx file.
Like so:
((* block docclass *))
\renewcommand{\familydefault}{\sfdefault}
\documentclass{article}
((* endblock docclass *))

Pandoc disable figure stretching from Markdown to PDF conversion

I create PDF documents from Markdown documents using the simplest pandoc command:
pandoc my.md -o my.pdf
The figures inside the PDF are all stretched, i.e. 100% width.
Which configuration should I give to pandoc to leave the figures as they are, without changing the figure size?
Currently you cannot control that feature directly from Markdown.
In recent months there have been some discussions going on in the Pandoc developer + user community about how to best implement it and create an easy-to-use syntax, for example
![Image Caption](./path/to/image.jpg "Image Comment"){width="60%", height="150px"}
(Warning: Example only, made up on the fly and drawn out of thin air by myself -- can't remember the latest state of the discussion...) This is designed to then transfer to all the supported output formats which can contain images, not just PDF.
So this is planned to be a major new feature for the next major release of Pandoc.
As you may or may not know, Pandoc doesn't create the PDFs itself. It produces LaTeX and employs LaTeX technology (by default its pdflatex command) to convert the LaTeX to PDF (then deleting the intermediate LaTeX files).
To exercise some (limited) control over how the LaTeX/PDF pages (or other outputs) look, Pandoc uses template files. You can look at the exact template definitions your own Pandoc version uses for LaTeX/PDF output by running
pandoc -D latex
So if you are a LaTeX hacker (or know one), you are able to modify that or create your own template from scratch.
In the current release of Pandoc (v1.13.2.1), there is this code snippet in the LaTeX template:
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
This should keep the original image sizes if they fit into the page width, and scale them down to the page width if they don't.
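The effect of that \setkeys line can be stated as a tiny function: never scale up, scale down only when a natural dimension exceeds its bound, and preserve the aspect ratio. This is a sketch of the LaTeX behavior, not Pandoc code:

```python
def fit_image(nat_w, nat_h, max_w, max_h):
    """Mimic width=\\maxwidth,height=\\maxheight,keepaspectratio:
    images that already fit are left alone; oversized ones are scaled
    down uniformly until both dimensions fit the page."""
    scale = min(1.0, max_w / nat_w, max_h / nat_h)
    return nat_w * scale, nat_h * scale
```

So a 100x50 image on a 400-point-wide text block is untouched, while an 800x400 image comes out at 400x200.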
If this is not the behavior you experience with your PDF output, I suspect you are on a rather old version of Pandoc.
For using your own template instead of the builtin internal one, you can add
--template=/path/to/myown-template.latex
to the Pandoc command line.
@KurtPfeifle Thanks for your help. I updated the LaTeX to set a static width and height for the images using the tip.
In my latex template I have:
\setkeys{Gin}{width=128pt,height=192pt,keepaspectratio}
This works great for the mobile images. But I also have a cover page, where the cover figure is now small sized.
I tried creating 2 different latex files and combining them but the figure sizes are back to being stretched:
pandoc _cover_page.md -o _cover_page.tex
pandoc ... --template=mobile_images.latex -o remaining.tex
pandoc _cover_page.tex remaining.tex -o out.pdf
Is there an easy way to combine LaTeX files which obey the templates in Pandoc?
I can also create two PDF files, cover.pdf and remaining.pdf, and combine them. Is there an easy tool that you know of?

docsplit conversion to PDF mangles non-ASCII characters in docx on Linux

My documentation management app involves converting a .docx file containing non-ASCII Unicode characters (Japanese) to PDF with docsplit (via the Ruby gem, if it matters). It works fine on my Mac. On my Ubuntu machine, the resulting PDF has square boxes where the characters should be, whether invoked through Ruby or directly on the command line. The odd thing is, when I open up the .docx file directly in LibreOffice and do a PDF export, it works fine. So it would seem there is some aspect to how docsplit invokes LO that causes the Unicode characters to be handled improperly. I have scoured various parts of the documentation and code for options that I might need to specify, with no luck. Any ideas of why this could be happening?
FWIW, docsplit invokes LO with the following options line in pdf_extractor.rb:
options = "--headless --invisible --norestore --nolockcheck --convert-to pdf --outdir #{escaped_out} #{escaped_doc}"
I notice that the output format can optionally be followed by an output filter, as in pdf:output_filter_name. Is this something I need to think about using?
I have tracked this down to the --headless option which docsplit passes to LibreOffice. That invokes a non-X version of LO, which apparently does not have the necessary Japanese fonts. Unfortunately, there appears to be no way to pass options to docsplit to tell it to omit the --headless option to LO, so I will end up patching or forking the code somehow.
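The actual patch belongs in docsplit's Ruby source (pdf_extractor.rb), but the transformation itself is trivial. A Python sketch of what the edited options line amounts to, just to illustrate:

```python
def drop_headless(options: str) -> str:
    """Strip the --headless flag from a LibreOffice invocation string,
    so the X-enabled build (with its full font access) is used instead.
    Illustrative only; the real fix is a patch to docsplit's Ruby code."""
    return " ".join(flag for flag in options.split() if flag != "--headless")
```

Applied to the options line quoted above, this leaves every other flag intact and removes only --headless.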