pandoc for Jupyter-to-LaTex | How to render tables? - html-table

My attempts to convert from the beautiful Jupyter notebook format (with output generated by R code) to a publication-ready LaTeX format continue. Tables are not being rendered as desired.
Here's where I'm at …
The R data.frame output using Jupyter's default display DOES convert correctly.
The R data.frame output using knitr's kable function and IRdisplay's display_html function renders correctly in Jupyter, but DOES NOT render at all in LaTeX/PDF: the generated HTML is simply ignored in the conversion process.
The R data.frame output using knitr's kable function and IRdisplay's display_latex function does render correctly in LaTeX/PDF, but DOES NOT render in Jupyter: only the LaTeX code is displayed.
The markdown table from a markdown cell DOES convert correctly.
Here’s what I did …
little_test.ipynb (a Jupyter notebook) is converted to little_test.tex (a LaTeX document) using pandoc like this:
pandoc "little_test.ipynb" --output="little_test.tex" --to=latex --lua-filter=rah.lua --standalone --extract-media=images --number-sections --shift-heading-level-by=1 --dpi=125 -V documentclass=book -V block-headings -V papersize=letter -V fontsize=10pt -V margin-left=1in -V margin-right=1in -V margin-top=1in -V margin-bottom=1in
little_test.tex (a LaTeX document) is edited slightly like this:
\setcounter{chapter}{8}
\setcounter{section}{5} % section number start minus one
…
\begin{center}
\includegraphics[width=4in]{images/decision-model_pets_provider.jpg}
\end{center}
little_test.tex (a LaTeX document) is converted to little_test.pdf (a PDF document) using MiKTeX's TeXworks.
Also, notice little_test (from Chrome print).pdf, which is generated by printing little_test.ipynb directly from the Chrome browser. With this alternative approach, all the tables (and all HTML images) are rendered correctly, so there must be some way to do it. But of course, I'd still need the LaTeX so that I can adjust the styling.
How to get HTML-format tables from Jupyter to LaTeX? -or-
How to display (kable-generated) LaTeX-format tables in Jupyter?

The Lua filter below parses and, where possible, converts all raw HTML blocks in your input. That makes it possible to render tables as HTML in Jupyter and still have them show up in the final output. To use it, save the code to a file named parse-html.lua and pass it to pandoc via --lua-filter=parse-html.lua.
-- Replace every raw HTML block with the blocks pandoc parses from it
function RawBlock (raw)
  if raw.format:match 'html' then
    return pandoc.read(raw.text, 'html').blocks
  end
end
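A minimal sketch of how the filter might be wired into the conversion from the question, written as a small shell function. The function name `ipynb_to_tex` is hypothetical, and the reduced option set and the notebook-to-`.tex` naming are assumptions; the question's other options (including its own `rah.lua` filter) can be passed as extra arguments.

```shell
# ipynb_to_tex NOTEBOOK [EXTRA_PANDOC_OPTIONS...]
# Hypothetical helper: converts NOTEBOOK.ipynb to NOTEBOOK.tex and runs
# parse-html.lua, so raw HTML blocks are parsed instead of being dropped.
# Assumes parse-html.lua sits in the current directory.
ipynb_to_tex() {
    notebook=$1
    shift
    pandoc "$notebook" \
        --lua-filter=parse-html.lua \
        --standalone \
        --to=latex \
        --output="${notebook%.ipynb}.tex" \
        "$@"
}
```

For example, `ipynb_to_tex little_test.ipynb --number-sections -V documentclass=book` would write little_test.tex next to the notebook.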

Also, here's a better overall solution. Rather than have pandoc convert HTML tables to LaTeX tables, I've coded the tables in both HTML and LaTeX format within the Jupyter notebook. The HTML representation is rendered when running Jupyter; the LaTeX representation is copied as-is by pandoc when converting from Jupyter format to LaTeX format.
library(knitr)       # kable
library(kableExtra)  # kable_styling
library(IRdisplay)   # publish_mimebundle
x = data.frame(a=c(1,2,3), b=c(10,20,30), c=c(100,200,300))
# HTML representation, shown by Jupyter
x.html = kable(x, format="html", escape=FALSE, align=rep("r", ncol(x)), caption="This is from HTML", row.names=FALSE, table.attr='style="white-space: nowrap;"')
# LaTeX representation, picked up by pandoc
x.latex = kable_styling(latex_options=c("hold_position"),
                        kable(x, format="latex", escape=FALSE, align=rep("r", ncol(x)), caption="This is from LaTeX", row.names=FALSE))
# Publish both representations as one output
mbx = list(data=list("text/html"=as.character(x.html), "text/latex"=as.character(x.latex)), metadata=NULL)
publish_mimebundle(mbx$data, mbx$metadata)

Related

GROFF PDFPIC converted w ImageMagick to .ms document causes "troff: sample.ms:18: division by zero" and leads images to show very right of the pdf doc

I converted my original image to pdf with ImageMagick. If viewed independently, the pdf image looks perfectly normal.
sample.ms :
.PDFPIC Figure_1.pdf
Once I try to compile my .ms document with the following command:
groff -ms sample.ms -U -T pdf > sample.pdf
I get the following error from groff:
troff: sample.ms:1: division by zero
The document does compile, but the image is shifted far to the right of the page, to the point that it is sometimes almost completely off the page.
I was having the same problem and it seems like the PDFs convert generates are corrupt in some way.
I ended up using convert img.png img.tiff and then tiff2pdf img.tiff > img.pdf. Including img.pdf then worked just fine.
I used tiff2pdf just because that's what I had installed, but any other program should work too if it generates valid PDF.
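The workaround above can be sketched as a shell function, assuming ImageMagick's `convert` and libtiff's `tiff2pdf` are both on the PATH. The function name `img_to_groff_pdf` and the decision to delete the intermediate TIFF are illustrative assumptions, not part of the original answer.

```shell
# img_to_groff_pdf IMAGE
# Hypothetical helper: writes a PDF next to IMAGE (same name, .pdf extension)
# by going through TIFF instead of letting convert write the PDF directly.
img_to_groff_pdf() {
    base=${1%.*}                                        # strip the extension
    convert "$1" "$base.tiff" || return 1               # image -> TIFF
    tiff2pdf -o "$base.pdf" "$base.tiff" || return 1    # TIFF -> PDF
    rm -f "$base.tiff"                                  # drop the intermediate file
}
```

After `img_to_groff_pdf Figure_1.png`, the resulting Figure_1.pdf is the file to pass to `.PDFPIC`.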

How to add footer to pdf with pdfjam or pdftk?

I am using a shell script to modify many pdfs and would like to create a script that adds the page number (1 of X format) to the bottom of PDFs in a directory along with the text of the filename.
I tried using pdfjam with this format:
pdfjam --pagenumbering true
but it fails, saying that pagenumbering is undefined.
Any other recommendations how to do this? I am OK installing other tools but would like this to all be within a shell script.
Thank you
tl;dr: pdfjam --pagecommand '' input.pdf
By default, pdfjam adds the following LaTeX command to every page: \thispagestyle{empty}. By changing the command to an empty command, the default plain page style is used, which consists of a page number at the bottom. Of course you may want to play with other styles or layout options to position the page number differently.
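Since the question asks for a shell script that processes a whole directory, here is a minimal sketch built on the command above. The function name `number_pdfs` and the separate output directory are assumptions. It only adds the plain page number; the "1 of X" format and the filename text would need a custom page style and are not covered.

```shell
# number_pdfs SRC_DIR OUT_DIR
# Hypothetical helper: writes a page-numbered copy of every PDF in SRC_DIR
# to OUT_DIR, leaving the originals untouched.
number_pdfs() {
    src=$1
    out=$2
    mkdir -p "$out" || return 1
    for f in "$src"/*.pdf; do
        [ -e "$f" ] || continue    # the glob matched nothing
        # An empty --pagecommand replaces the default \thispagestyle{empty},
        # so the plain page style (page number at the bottom) is used.
        pdfjam --pagecommand '' --outfile "$out/$(basename "$f")" "$f" || return 1
    done
}
```

A call such as `number_pdfs ./reports ./reports-numbered` would process every PDF in ./reports.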

Pandoc: generate compilable .tex from markdown

I have started using Markdown to write my Latex PDFs, and so far I am impressed by the amount of boilerplate it takes away.
However, I find Markdown not as expressive as Tex, and therefore in some situations would like to write the document in Markdown, convert to tex, then add some Latex-only stuff and only then convert to PDF.
However, converting .md to .tex with Pandoc does not yield a compilable file: it only contains the body of the file, not the "document setup".
Example, the following .md file:
```haskell
data Expr = I Int
```
Converts to:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{data} \DataTypeTok{Expr} \FunctionTok{=} \DataTypeTok{I} \DataTypeTok{Int}
\end{Highlighting}
\end{Shaded}
Obviously this is missing some stuff like the document class, start of document and the imported packages. Is there any way to generate this complete file instead of just the body? Or if not, can anyone at least tell me what package the Shaded, Highlighting, KeywordTok, DataTypeTok and FunctionTok commands are pulled from? Then I can add these imports myself.
Pandoc creates small snippets by default. Invoke it with the --standalone (or -s) command line flag to get a full document.
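As a sketch of the round trip described in the question, the two shell functions below convert a Markdown file with `--standalone` and then check that the result is a full document rather than a body fragment. Both function names are hypothetical, and the check is a rough heuristic that only looks for `\documentclass` and `\begin{document}`.

```shell
# md_to_full_tex INPUT.md OUTPUT.tex
# Hypothetical helper: --standalone wraps the converted body in pandoc's
# default LaTeX template (document class, packages, highlighting macros).
md_to_full_tex() {
    pandoc --standalone --from=markdown --to=latex --output="$2" "$1"
}

# is_full_tex FILE.tex
# Succeeds only if FILE contains both \documentclass and \begin{document}.
is_full_tex() {
    grep -qF '\documentclass' "$1" && grep -qF '\begin{document}' "$1"
}
```

The generated file can then be edited by hand and compiled with a LaTeX engine of your choice.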

How to compile all .md files in a directory into a single .pdf with pandoc, while preserving YAML header data?

I have a directory of .md documents that each contain a YAML header specifying document title, author, date, categories, tags, etc. The directory contains journal entries and the filenames are simply the date of the entry.
I have no trouble using pandoc to generate a PDF for each .md file; however, I'm looking for a way to generate a single PDF in book or memoir format with each .md document's title field as a chapter in the table of contents, arranged by the date value. Ideally, the date would also appear in the table of contents, but that's not critical if the individual chapters will also display that information.
I haven't been able to find a way to do this as pandoc seems to ignore all but the first YAML header when concatenating multiple documents. One possible solution I can think of is to convert all relevant YAML header info to markdown headings and then demote existing headings in each .md document. But I'm not sure how to do this or if this is even the best approach. I was also looking at the R bookdown package, but it also uses markdown headers for chapters and not sure if it can be adapted to use YAML header info.
What is the easiest way to accomplish what I need? Thanks.
Your idea as outlined in your question is a good way to go:
The demoting of the title to a header can be done via a filter, e.g. a Lua filter if you are using pandoc >2.0. The following assumes that you are using the current version 2.0.6:
demote.lua:
-- List is available since pandoc 2.0.4
local List = require 'pandoc.List'

-- Demote every existing header by one level
function Header (h)
  h.level = h.level + 1
  return h
end

-- Prepend the document title as a level-1 header
function Pandoc (doc)
  local title = doc.meta.title
  local header = pandoc.Header(1, title)
  doc.blocks = {header} .. doc.blocks
  return doc
end
Now run the following commands to create your PDF:
for f in /path/to/docs/*.md; do
  pandoc --lua-filter=demote.lua -t markdown "$f"
  printf "\n" # insert empty line between articles
done | pandoc -o combined.pdf
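Where Lua filters are not an option, the idea from the question (turn each YAML title into a heading and demote the existing headings) might be approximated with plain awk. This is a swapped-in technique, not the filter above, and only a rough sketch: the name `demote_md` is hypothetical, the YAML header must start on the first line, `title:` and `date:` must be single-line values, and heading-like lines inside code blocks are demoted too.

```shell
# demote_md FILE
# Hypothetical helper: prints FILE with its YAML title as a level-1 heading,
# its date (if any) in italics below it, and all ATX headings demoted by one.
demote_md() {
    awk '
        NR == 1 && $0 == "---" { in_yaml = 1; next }   # opening YAML fence
        in_yaml && ($0 == "---" || $0 == "...") {      # closing YAML fence
            in_yaml = 0
            print "# " title
            if (date != "") { print ""; print "*" date "*" }
            print ""
            next
        }
        in_yaml && /^title:/ { sub(/^title:[ \t]*/, ""); gsub(/^"|"$/, ""); title = $0; next }
        in_yaml && /^date:/  { sub(/^date:[ \t]*/, "");  gsub(/^"|"$/, ""); date = $0;  next }
        in_yaml { next }                               # ignore other YAML fields
        /^#+ / { print "#" $0; next }                  # demote existing headings
        { print }
    ' "$1"
}

# Usage sketch: the filenames are dates, so the glob already sorts by date.
# for f in /path/to/docs/*.md; do demote_md "$f"; printf '\n'; done |
#     pandoc --toc -V documentclass=book -o combined.pdf
```

Putting the date into the chapter body rather than the heading keeps the table of contents entry short; moving it into the `print "# " title` line would show it in the table of contents instead.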

Inkscape "PDF + Latex" export

I'm using inkscape to produce vector figures, save them in SVG format to export them later as "PDF + Latex" much in the vein of TUG inkscape+pdflatex guide.
Trying to produce a simple figure, however, turns out to be extremely frustrating.
The first figure
is an example of the figure I would like to export in the form of "PDF + Latex" (shown here in PNG format).
If I export this to a PDF figure without latex macros the PDF produced looks exactly the same, except for some minor differences with the fonts used to render the text.
When I try to export this using the "PDF + Latex" option, the PDF file produced is a two-page document (again shown as .png here):
This, of course, does not look good when compiling my latex document. So far the guide at TUG has been very helpful, but I still can't produce a working "PDF + Latex" export from inkscape.
What am I doing wrong?
I worked around this by putting all the text in my drawing at the top:
select the text and then choose Object -> Raise to Top.
Inkscape only generates the separate pages if the text is below another object.
I asked this question on the Inkscape online discussion page and got some very helpful guidance from one of the users there.
This is a known bug https://bugs.launchpad.net/ubuntu/+bug/1417470 which was inadvertently introduced in Inkscape 0.91 in an attempt to fix a previous bug https://bugs.launchpad.net/inkscape/+bug/771957.
It seems this bug does two things:
The *.pdf_tex file will have an extra \includegraphics statement which needs to be deleted manually as described in the link to the bug above.
The *.pdf file may be split into multiple pages, regardless of the size of the image. In my case the line objects were split off onto their own page. I worked around this by turning off the text objects (opacity to zero) and then doing a standard PDF export.
If you can run Linux commands, this works:
# Generate the .pdf and .pdf_tex files
inkscape -z -D --file="$SVGFILE" --export-pdf="$PDFFILE" --export-latex
# Fix the number of pages
sed -i 's/\\\\/\n/g' "${PDFFILE}_tex"
MAXPAGE=$(pdfinfo "$PDFFILE" | grep -oP "(?<=Pages:)\s*[0-9]+" | tr -d " ")
sed -i "/page=$(($MAXPAGE+1))/,\${/page=/d}" "${PDFFILE}_tex"
with:
$SVGFILE: path of the svg
$PDFFILE: path of the pdf
It is possible to include these commands in a script and execute it automatically when compiling your tex file (so that you don't have to manually export from inkscape each time you modify your svg).
Try it with an illustration that is less wide.
Alternatively, use a wider paperwidth setting.