Table of contents sidebar in Sphinx LaTeX PDF

I am generating a LaTeX document from Sphinx, and converting it to PDF using pdflatex (from MikTeX). The document is missing a table of contents in the sidebar of the PDF viewer.
If I manually add \usepackage{hyperref} to the .tex file, it works. But how can I tell Sphinx to do it from the conf.py project file? There is no obviously related option among the LaTeX output options.
Thanks!

Section 2.5.3 Customizing the rendering of the Sphinx document mentions:
LaTeX preamble
Additional commands may be added as preamble in the generated LaTeX file. This is easily done by editing file conf.py:
# Read the extra preamble from a file next to conf.py
with open('latex-styling.tex', 'r') as f:
    PREAMBLE = f.read()

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #'papersize': 'a4paper',
    # The font size ('10pt', '11pt' or '12pt').
    #'pointsize': '10pt',
    # Additional stuff for the LaTeX preamble.
    'preamble': PREAMBLE
}
This will copy the contents of file latex-styling.tex (in same directory as conf.py) to the generated LaTeX document. For instance, if latex-styling.tex reads:
% My personal "bold" command
\newcommand{\mycommand}[1]{\textbf{#1}}
the generated LaTeX document becomes:
% Generated by Sphinx.
\def\sphinxdocclass{report}
\documentclass[a4paper,10pt,english]{sphinxmanual}
% snip (packages)
% My personal "bold" command
\newcommand{\mycommand}[1]{\textbf{#1}}
\title{My Extension Documentation}
\date{2013-06-30 22:25}
\release{1.0.0}
\author{Xavier Perseguers}
Other options
The configuration file conf.py lets you further tune the rendering with LaTeX. Please consult http://www.sphinx-doc.org/en/stable/config.html#options-for-latex-output for further instructions.
A more direct way of adding content, rather than inserting it in a separate file (say, latex-styling.tex), is to specify it verbatim. The next subsection in the documentation mentions this for a specific package, typo3:
TYPO3 template
We want to stick as much as possible to default rendering, to avoid having to change the LaTeX code generation from Sphinx. As such, we choose to include a custom package typo3 (file typo3.sty) that will override some settings of package sphinx. To include it automatically, we simply use the preamble option of conf.py:
latex_elements = {
    # Additional stuff for the LaTeX preamble.
    'preamble': '\\usepackage{typo3}'
}
It's usually better to keep your styling options in a separate latex-styling.tex file and load it into the preamble key with f.read(). That way you don't have to touch conf.py whenever the styling changes. Compartmentalization is usually better.
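For the hyperref case in the question, a minimal conf.py sketch might look like the following. The file name latex-styling.tex comes from the quoted documentation; the helper load_preamble and the fallback constant DEFAULT_PREAMBLE are hypothetical names introduced here for illustration, and whether hyperref has to be loaded explicitly may depend on your Sphinx version.

```python
from pathlib import Path

# File name taken from the quoted documentation; adjust to your project.
PREAMBLE_FILE = Path("latex-styling.tex")

# Assumption: fall back to loading hyperref when the styling file is absent.
DEFAULT_PREAMBLE = r"\usepackage{hyperref}"


def load_preamble(path=PREAMBLE_FILE, default=DEFAULT_PREAMBLE):
    """Return the preamble text stored in `path`, or `default` if it is missing."""
    if path.is_file():
        return path.read_text(encoding="utf-8")
    return default


latex_elements = {
    # Additional stuff for the LaTeX preamble.
    "preamble": load_preamble(),
}
```

Keeping the fallback in conf.py means the build still gets a preamble when the separate file has not been created yet.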

Related

Pandoc: generate compilable .tex from markdown

I have started using Markdown to write my Latex PDFs, and so far I am impressed by the amount of boilerplate it takes away.
However, I find Markdown not as expressive as Tex, and therefore in some situations would like to write the document in Markdown, convert to tex, then add some Latex-only stuff and only then convert to PDF.
However, converting .md to .tex with Pandoc does not yield a compilable file: it only contains the body of the file, not the "document setup".
Example, the following .md file:
```haskell
data Expr = I Int
```
Converts to:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{data} \DataTypeTok{Expr} \FunctionTok{=} \DataTypeTok{I} \DataTypeTok{Int}
\end{Highlighting}
\end{Shaded}
Obviously this is missing some stuff like the document class, start of document and the imported packages. Is there any way to generate this complete file instead of just the body? Or if not, can anyone at least tell me what package the Shaded, Highlighting, KeywordTok, DataTypeTok and FunctionTok commands are pulled from? Then I can add these imports myself.
Pandoc creates small snippets by default. Invoke it with the --standalone (or -s) command line flag to get a full document.
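If the conversion is driven from a script, the same flag can be passed through subprocess. This is only a sketch: the function names are hypothetical, and the only pandoc options used are --standalone and -o.

```python
import shutil
import subprocess


def pandoc_standalone_command(source, target):
    """Build the pandoc call that writes a complete document to `target`."""
    return ["pandoc", "--standalone", source, "-o", target]


def convert(source, target):
    """Run pandoc if it is installed; return True when it was actually run."""
    if shutil.which("pandoc") is None:
        return False  # pandoc is not on PATH, nothing was converted
    subprocess.run(pandoc_standalone_command(source, target), check=True)
    return True
```

Calling convert("input.md", "output.tex") should then produce a .tex file that includes the document class and the packages the highlighting macros rely on.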

Change font size of ATX-header in markdown

I am writing a book with bookdown. Unfortunately, I have no clue how to format (e.g. set the font size of) ATX headers (#, ##, ### etc.). So far, it does not work via pandoc or preamble.tex.
I have tried the following, based on this.
Unfortunately, it produces an error message:
\usepackage{titlesec} \titleformat{\chapter}[display] {\normalfont\sffamily\huge\bfseries\color{blue}} {\chaptertitlename\ \thechapter}{20pt}{\Huge}
Thanks in advance!
Your best bet here is to add a LaTeX preamble to the document. In here, you can define the required LaTeX packages. Two changes are made to the base template:
We need to add subparagraph: true to make titlesec work with R Markdown, as explained here
# refers to a level one header in pandoc, and therefore you need to make the style changes for section not chapter https://www.sharelatex.com/learn/Sections_and_chapters
Here is a minimal example
---
output:
  pdf_document:
    includes:
      in_header: header.tex
subparagraph: true
---
# Section
## Subsection
The header.tex file referenced by in_header is saved in the same directory:
\usepackage{titlesec}
\usepackage{color}
\titleformat*{\section}{\LARGE}
\titleformat{\subsection}[display]
{\normalfont\sffamily\huge\bfseries\color{blue}} {\chaptertitlename\ \thechapter}{20pt}{\Huge}

Pandoc set jobname for LaTeX PDF export

Is there a way to tell Pandoc to set \jobname to a specific value while converting and compiling a single markdown file to PDF (via LaTeX)? Preferably the name of the source *.md file.
background:
I have my own LaTeX document class defined which uses \jobname.
It prints it in the document footer, so that it's easy for me to find source file/repo having a printed PDF.
I set jobname in my compile scripts as pdfLaTeX argument.
I am currently trying to use my document class as a LaTeX template for documents processed by Pandoc from Markdown source. It seems Pandoc always sets \jobname to 'input'.
I can set any variable in the Markdown YAML header, which may then be printed into the PDF, but being able to set it based on the true md file name would be much less error prone.
I solved my problem by redefining my LaTeX template to use the sourcefile pandoc variable instead of \jobname when the document is built with pandoc.
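A related route, different from the built-in variable used above, is to pass the file stem explicitly as a custom template variable from a build script. The sketch below assumes a hypothetical template file mytemplate.tex that prints a hypothetical variable named sourcename (written $sourcename$ in the template); only the pandoc options --template, -V and -o are used.

```python
from pathlib import Path


def pandoc_pdf_command(markdown_path, template="mytemplate.tex"):
    """Build a pandoc call that hands the source file stem to the template."""
    source = Path(markdown_path)
    return [
        "pandoc", str(source),
        "--template", template,
        # Hypothetical variable name; the template must reference $sourcename$.
        "-V", f"sourcename={source.stem}",
        "-o", str(source.with_suffix(".pdf")),
    ]
```

Because the value is derived from the path, it cannot drift out of sync with the file name the way a hand-written YAML field can.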

Convert PDF text into outlines?

Does anybody know a way to vectorize the text in a PDF document? That is, I want each letter to be a shape/outline, without any textual content. I'm using a Linux system, and open source or a non-Windows solution would be preferred.
The context: I'm trying to edit some old PDFs, for which I no longer have the fonts. I'd like to do it in Inkscape, but that will replace all the fonts with generic ones, and that's barely readable. I've also been converting back and forth using pdf2ps and ps2pdf, but the font info stays there. So when I load it into Inkscape, it still looks awful.
Any ideas? Thanks.
To achieve this, you will have to:
1. Split your PDF into individual pages;
2. Convert your PDF pages into SVG;
3. Edit the pages you want;
4. Reassemble the pages.
This answer will omit step 3, since that's not programmable.
Splitting the PDF
If you don't want a programmatic way to split documents, the modern way would be to use stapler. In your favorite shell:
stapler burst file.pdf
This would generate {file_1.pdf,...,file_N.pdf}, where 1...N are the PDF pages. Stapler itself uses PyPDF2, and the code for splitting a PDF file is not that complex. The following function splits a file and saves the individual pages in the current directory (shamelessly copied from stapler's commands.py file):
import os
# Legacy PyPDF2 (< 3.0) class and method names
from PyPDF2 import PdfFileWriter, PdfFileReader

def split(filename):
    # PDFs are binary files, so open in 'rb' mode
    with open(filename, 'rb') as inputfp:
        inputpdf = PdfFileReader(inputfp)
        numpages = inputpdf.getNumPages()
        base, ext = os.path.splitext(os.path.basename(filename))
        # Prefix the output template with zeros so that ordering is preserved
        # (page 10 after page 09). The width is the number of digits in the
        # page count.
        output_template = ''.join([
            base,
            '_',
            '%0',
            str(len(str(numpages))),
            'd',
            ext
        ])
        for page in range(numpages):
            # Write each page into its own single-page PDF
            outputpdf = PdfFileWriter()
            outputpdf.addPage(inputpdf.getPage(page))
            outputname = output_template % (page + 1)
            with open(outputname, 'wb') as fp:
                outputpdf.write(fp)
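The zero-padded naming scheme used by the function above can be tried in isolation. This sketch uses a hypothetical helper, page_name_template, and takes the padding width from the number of digits in the page count; that choice is an assumption about the intended behaviour, not code taken from stapler.

```python
def page_name_template(base, ext, num_pages):
    """Return a %-style template such as 'file_%02d.pdf' for `num_pages` pages."""
    width = len(str(num_pages))  # digits needed for the largest page number
    return f"{base}_%0{width}d{ext}"


# Names for pages 1, 9 and 10 of a ten-page document.
names = [page_name_template("file", ".pdf", 10) % n for n in (1, 9, 10)]
```

With this padding, a plain alphabetical sort of the file names matches the page order, which matters when the pages are reassembled later.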
Converting the individual pages to SVG
Now to convert the PDFs to editable files, I'd probably use pdf2svg.
pdf2svg input.pdf output.svg
If we take a look at the pdf2svg.c file, we can see that the code in principle is not that complex (assuming the input filename is in the filename variable and the output file name is in the outputname variable). A minimal working example in python follows. It requires the pycairo and pypoppler libraries:
import os
import cairo
import poppler

def convert(inputname, outputname):
    # Convert the input file name to an URI to please poppler
    uri = 'file://' + os.path.abspath(inputname)
    pdffile = poppler.document_new_from_file(uri, None)
    # We only have one page, since we split prior to converting. Get the page
    page = pdffile.get_page(0)
    # Get the page dimensions
    width, height = page.get_size()
    # Open the SVG file to write on
    surface = cairo.SVGSurface(outputname, width, height)
    context = cairo.Context(surface)
    # Now we finally can render the PDF to SVG
    page.render_for_printing(context)
    context.show_page()
    # Flush and close the SVG file
    surface.finish()
At this point you should have an SVG in which all text has been converted to paths, and will be able to edit with Inkscape without rendering issues.
Combining steps 1 and 2
You can call pdf2svg in a for loop to do that. But you would need to know the number of pages beforehand. The code below figures the number of pages and does the conversion in a single step. It requires only pycairo and pypoppler:
import os
import cairo
import poppler

def convert(inputname, base=None):
    '''Converts a multi-page PDF to multiple SVG files.
    :param inputname: Name of the PDF to be converted
    :param base: Base name for the SVG files (optional)
    '''
    if base is None:
        base, ext = os.path.splitext(os.path.basename(inputname))
    # Convert the input file name to an URI to please poppler
    uri = 'file://' + os.path.abspath(inputname)
    pdffile = poppler.document_new_from_file(uri, None)
    pages = pdffile.get_n_pages()
    # Prefix the output template with zeros so that ordering is preserved
    # (page 10 after page 09). The width is the number of digits in the
    # page count.
    output_template = ''.join([
        base,
        '_',
        '%0',
        str(len(str(pages))),
        'd',
        '.svg'
    ])
    # Iterate over all pages
    for nthpage in range(pages):
        page = pdffile.get_page(nthpage)
        # Output file name based on template
        outputname = output_template % (nthpage + 1)
        # Get the page dimensions
        width, height = page.get_size()
        # Open the SVG file to write on
        surface = cairo.SVGSurface(outputname, width, height)
        context = cairo.Context(surface)
        # Now we finally can render the PDF to SVG
        page.render_for_printing(context)
        context.show_page()
        # Free some memory
        surface.finish()
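The for-loop alternative mentioned at the start of this section can be sketched as well, for the case where the page count is already known. The helper name is hypothetical, and the sketch assumes pdf2svg accepts the page number as an optional third argument; it only builds the command lines, so they can be inspected before being run with subprocess.

```python
import os


def pdf2svg_page_commands(inputname, num_pages):
    """Build one pdf2svg call per page, writing zero-padded SVG names."""
    base = os.path.splitext(os.path.basename(inputname))[0]
    width = len(str(num_pages))  # digits needed for the largest page number
    return [
        # Assumed usage: pdf2svg <in.pdf> <out.svg> [<page no>]
        ["pdf2svg", inputname, f"{base}_{page:0{width}d}.svg", str(page)]
        for page in range(1, num_pages + 1)
    ]
```

Each entry can be passed to subprocess.run(cmd, check=True) once pdf2svg is installed.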
Assembling the SVGs into a single PDF
To reassemble, you can use the pair inkscape / stapler to convert the files manually, but it is not hard to write code that does this. The code below uses rsvg and cairo to convert from SVG and merge everything into a single PDF:
import rsvg
import cairo

def convert_merge(inputfiles, outputname):
    # We have to create a PDF surface and inform a size. The size is
    # irrelevant, though, as we will define the sizes of each page
    # individually.
    outputsurface = cairo.PDFSurface(outputname, 1, 1)
    outputcontext = cairo.Context(outputsurface)
    for inputfile in inputfiles:
        # Open the SVG
        svg = rsvg.Handle(file=inputfile)
        # Set the size of the page itself
        outputsurface.set_size(svg.props.width, svg.props.height)
        # Draw on the PDF
        svg.render_cairo(outputcontext)
        # Finish the page and start a new one
        outputcontext.show_page()
    # Free some memory
    outputsurface.finish()
PS: It should be possible to use the command pdftocairo, but it doesn't seem to call render_for_printing(), which makes the output SVG maintain the font information.
I'm afraid that to vectorize the PDFs you would still need the original fonts (or a lot of work).
Some possibilities that come to mind:
dump the uncompressed PDF with pdftk and discover what the font names are, then look for them on FontMonster or other font service.
use some online font recognition service to get a close match with your font, in order to preserve kerning (I guess kerning and alignment are what's making your text unreadable)
try replacing the fonts manually (again, use pdftk to convert the PDF into one that is editable with sed. This editing will break the PDF, but pdftk will then be able to recompress the damaged PDF into a usable one).
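The first suggestion, finding out what the font names are, can be sketched in a few lines. This assumes the PDF was first uncompressed (for example with pdftk in.pdf output out.pdf uncompress) so that font dictionaries are readable as plain bytes. The function names are hypothetical, and the pattern is a rough heuristic that ignores escaped characters in PDF names.

```python
import re

# Matches "/BaseFont /SomeName" entries in an uncompressed PDF.
BASEFONT_RE = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")


def font_names(pdf_bytes):
    """Return the sorted, de-duplicated /BaseFont names found in `pdf_bytes`."""
    names = {match.decode("latin-1") for match in BASEFONT_RE.findall(pdf_bytes)}
    return sorted(names)


def strip_subset_prefix(name):
    """Drop the six-letter subset tag (e.g. 'ABCDEF+') from a font name."""
    return re.sub(r"^[A-Z]{6}\+", "", name)
```

In practice you would read the uncompressed file in binary mode and pass its contents to font_names; the stripped names are what you would then search for.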
Here's what you really want - font substitution. You want some code/app to be able to go through the file and make appropriate changes to the embedded fonts.
This task is doable and is anywhere from easy to non-trivial. It's easy when you have a font that matches the metrics of the font in the file and the encoding used for the font is sane. You could probably do this with iText or DotPdf (the latter is not free beyond the evaluation, and is my company's product). If you modified pdf2ps, you could probably manage changing the fonts on the way through too.
If the fonts used in the file are font subsets that have creative reencoding, then you are in hell and will likely have all manner of pain doing the change. Here's why:
PostScript was designed at a point when there was no Unicode. Adobe used a single byte for characters and whenever you rendered any string, the glyph to draw was taken from a 256 entry table called the encoding vector. If a standard encoding didn't have what you wanted, you were encouraged to make fonts on the fly based on the standard font that differed only in encoding.
When Adobe created Acrobat, they wanted to make the transition from PostScript as easy as possible, so that font mechanism was modeled. When the ability to embed fonts into PDFs was added, it was clear that this would bloat the files, so PDF also included the ability to have font subsets. Font subsets are made by taking an existing font, removing all the glyphs that won't be used, and re-encoding it into the PDF. There may be no standard relationship between the encoding vector and the code points in the file - all those may be changed. Instead, there may be an embedded PostScript function /ToUnicode which will translate encoded characters to a Unicode representation.
So yeah, non-trivial.
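A toy illustration of why re-encoded subsets hurt: the two tables below are made-up encoding vectors, not data from any real font, and decode is a hypothetical helper. The same bytes from a page only make sense with the subset's own table.

```python
# Made-up encoding vectors: byte code -> glyph name.
standard_encoding = {0x48: "H", 0x65: "e", 0x6C: "l", 0x6F: "o"}
subset_encoding = {0x01: "H", 0x02: "e", 0x03: "l", 0x04: "o"}


def decode(codes, encoding):
    """Look up each byte code in an encoding vector; unknown codes give '.notdef'."""
    return [encoding.get(code, ".notdef") for code in codes]


# Bytes as they might appear in a page that uses the re-encoded subset.
page_codes = bytes([0x01, 0x02, 0x03, 0x03, 0x04])

with_subset = decode(page_codes, subset_encoding)
with_standard = decode(page_codes, standard_encoding)
```

Swapping in a replacement font without also remapping every code is the situation with_standard models: every lookup misses.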
For the folks who come after me:
The best solutions I found were to use Evince to print as SVG, or to use the pdf2svg program that's accessible via Synaptic on Mint. However, Inkscape wasn't able to cope with the resulting SVGs--it entered an infinite loop with the error message:
File display/nr-arena-item.cpp line 323 (?): Assertion item->state & NR_ARENA_ITEM_STATE_BBOX failed
I'm giving up this quest for now, but maybe I'll try again in a year or two. In the meantime, maybe one of these solutions will work for you.

How can I change the margins on a PDF document created by Doxygen?

I am using doxygen to generate a PDF of my code documentation. The PDF has very big margins when using PAPER_TYPE = letter. It looks OK when using a4wide but I would like to have more control over it. I want to use a package called geometry but can't figure out where to add code like this:
\usepackage[top=2.9cm,left=2in,bottom=1in,right=1in]{geometry}
I would like to not have to change the doxygen-generated tex files if possible.
In your Doxyfile, add or edit the EXTRA_PACKAGES line:
EXTRA_PACKAGES = mydoxy
Then create a new file called mydoxy.sty:
\NeedsTeXFormat{LaTeX2e}[1994/06/01]
\ProvidesPackage{mydoxy}[2009/12/29 v1.0.0 csmithmaui's Doxygen style]
\RequirePackage[top=2.9cm,left=2in,bottom=1in,right=1in]{geometry}
% any other custom stuff can go here
\endinput
Drop that mydoxy.sty where LaTeX can find it.
The EXTRA_PACKAGES line will tell Doxygen to add \usepackage{mydoxy} to the preamble of the .tex files it generates. This will cause LaTeX to look for a file named mydoxy.sty. In the mydoxy.sty file that we've created, we can add whatever LaTeX code we like (before the \endinput line). Feel free to drop any other customizations you like in this style file.
Note that I haven't tested this, and I'm making a number of assumptions that may be false. But it should at least get you started.
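If the margins change often, the style file can be generated rather than edited by hand. This is a sketch only: render_sty is a hypothetical helper, the package name and margins are the ones used above, and like the answer itself it has not been tried against a real Doxygen run.

```python
def render_sty(package, geometry_options):
    """Return the text of a small .sty file that loads geometry with the given margins."""
    options = ",".join(f"{key}={value}" for key, value in geometry_options.items())
    lines = [
        r"\NeedsTeXFormat{LaTeX2e}",
        r"\ProvidesPackage{%s}" % package,
        r"\RequirePackage[%s]{geometry}" % options,
        "% any other custom stuff can go here",
        r"\endinput",
    ]
    return "\n".join(lines) + "\n"


sty_text = render_sty(
    "mydoxy",
    {"top": "2.9cm", "left": "2in", "bottom": "1in", "right": "1in"},
)
```

Writing sty_text to mydoxy.sty in a directory LaTeX searches gives the same result as creating the file manually.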