How to compile all .md files in a directory into a single .pdf with pandoc, while preserving YAML header data? - pdf

I have a directory of .md documents that each contain a YAML header specifying document title, author, date, categories, tags, etc. The directory contains journal entries and the filenames are simply the date of the entry.
I have no trouble using pandoc to generate a PDF for each .md file, but I'm looking for a way to generate a single PDF in book or memoir format with each .md document's title field as a chapter in the table of contents, arranged by the date value. Ideally, the date would also appear in the table of contents, but that's not critical if the individual chapters will also display that information.
I haven't been able to find a way to do this, as pandoc seems to ignore all but the first YAML header when concatenating multiple documents. One possible solution I can think of is to convert all relevant YAML header info to markdown headings and then demote the existing headings in each .md document. But I'm not sure how to do this or whether it is even the best approach. I was also looking at the R bookdown package, but it also uses markdown headers for chapters and I'm not sure whether it can be adapted to use YAML header info.
What is the easiest way to accomplish what I need? Thanks.

Your idea as outlined in your question is a good way to go:
The demoting of the title to a header can be done via a filter, e.g. a Lua filter if you are using pandoc >2.0. The following assumes that you are using the current version 2.0.6:
demote.lua:
-- List is available since pandoc 2.0.4
local List = require 'pandoc.List'

-- Demote every existing heading by one level.
function Header (h)
  h.level = h.level + 1
  return h
end

-- Prepend the YAML title as a new level-1 heading.
function Pandoc (doc)
  local title = doc.meta.title
  local header = pandoc.Header(1, title)
  doc.blocks = List:new{header} .. doc.blocks
  return doc
end
Now run the following command to create your pdf:
for f in /path/to/docs/*.md; do
  pandoc --lua-filter=demote.lua -t markdown "$f"
  printf "\n" # insert empty line between articles
done | pandoc -o combined.pdf
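If you would rather not depend on a Lua filter, the approach from the question (turn each YAML title into a heading and demote the existing headings) could also be sketched in plain Python. This is only a minimal sketch under several assumptions: the headers contain flat `key: value` pairs, headings use the `#` (ATX) style, and the files sort correctly by name because the filenames are dates. The function names `split_front_matter`, `demote_headings`, `to_chapter` and `combine` are made up for this illustration.

```python
import re
from pathlib import Path


def split_front_matter(text):
    """Split a document into (metadata dict, body).

    Only flat `key: value` pairs are read; this is a simplification,
    not a full YAML parser.
    """
    lines = text.splitlines()
    meta = {}
    if not lines or lines[0].strip() != "---":
        return meta, text
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() in ("---", "..."):
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        # Skip nested or list entries such as "  - tag".
        if sep and not key.startswith((" ", "\t", "-")):
            meta[key.strip()] = value.strip().strip("\"'")
    return {}, text  # no closing delimiter: treat everything as body


def demote_headings(body):
    """Add one '#' to every ATX heading outside fenced code blocks."""
    out, in_fence = [], False
    for line in body.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
        elif not in_fence and re.match(r"#{1,5} ", line):
            line = "#" + line
        out.append(line)
    return "\n".join(out)


def to_chapter(text):
    """Turn one journal entry into a chapter headed by its title and date."""
    meta, body = split_front_matter(text)
    title = meta.get("title", "Untitled")
    date = meta.get("date")
    heading = f"# {title} ({date})" if date else f"# {title}"
    return f"{heading}\n\n{demote_headings(body).strip()}\n"


def combine(directory):
    """Concatenate all .md files in `directory`, sorted by file name."""
    paths = sorted(Path(directory).glob("*.md"))
    return "\n".join(to_chapter(p.read_text(encoding="utf-8")) for p in paths)
```

Writing the result of `combine("/path/to/docs")` to a file and passing that file to `pandoc --toc -o combined.pdf` should then give one chapter per entry, with the date shown in the chapter title and therefore in the table of contents.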

Related

Pandoc: generate compilable .tex from markdown

I have started using Markdown to write my Latex PDFs, and so far I am impressed by the amount of boilerplate it takes away.
However, I find Markdown not as expressive as Tex, and therefore in some situations would like to write the document in Markdown, convert to tex, then add some Latex-only stuff and only then convert to PDF.
However, converting .md to .tex with Pandoc does not yield a compilable file: it only contains the body of the file, not the "document setup".
Example, the following .md file:
```haskell
data Expr = I Int
```
Converts to:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{data} \DataTypeTok{Expr} \FunctionTok{=} \DataTypeTok{I} \DataTypeTok{Int}
\end{Highlighting}
\end{Shaded}
Obviously this is missing some stuff like the document class, start of document and the imported packages. Is there any way to generate this complete file instead of just the body? Or if not, can anyone at least tell me what package the Shaded, Highlighting, KeywordTok, DataTypeTok and FunctionTok commands are pulled from? Then I can add these imports myself.
Pandoc creates small snippets by default. Invoke it with the --standalone (or -s) command line flag to get a full document.

Pandoc set jobname for LaTeX PDF export

Is there a way to tell Pandoc to set \jobname to a specific value while converting and compiling a single markdown file to PDF (via LaTeX)? Preferably the name of the source *.md file.
background:
I have my own LaTeX document class defined which uses \jobname.
It prints it in the document footer, so that it's easy for me to find source file/repo having a printed PDF.
I set jobname in my compile scripts as pdfLaTeX argument.
I am currently trying to use my document class as the LaTeX template for documents that Pandoc processes from Markdown source. It seems that Pandoc always sets \jobname to 'input'.
I can set any variable in Markdown's yaml header which may be then printed into PDF, but being able to set it based on true md file name will be much less error prone.
I solved my problem by changing my LaTeX template to use pandoc's sourcefile variable instead of \jobname when the document is built with pandoc.

Rename ttf/woff/woff2 file to PostScript Font Name with Script

I am a typographer working with many fonts that have incorrect or incomplete filenames. I am on a Mac and have been using Hazel, AppleScript, and Automator workflows, attempting to automate renaming these files*. I require a script to replace the existing filename of ttf, woff, or woff2 files in Finder with the font's postscriptName. I know of tools (fc-scan/fontconfig, TTX, etc) which can retrieve the PostScript name-values I require, but lack the programming knowhow to code a script for my purposes. I've only managed to setup a watched directory that can run a script when any files matching certain parameters are added.
*To clarify, I am talking about changing the filename only, not the actual names stored within the font. Also I am open to a script of any compatible language or workflow of scripts if possible, e.g. this post references embedding AppleScript within Shell scripts via osascript.
StackExchange Posts I've Consulted:
How to get Fontname from OTF or TTF File?
How to get PostScript name of TTF font in OS X?
How to Change Name of Font?
Automate Renaming Files in macOS
Others:
https://github.com/dtinth/JXA-Cookbook/wiki/Using-JavaScript-for-Automation
https://github.com/fonttools/fonttools
https://github.com/devongovett/fontkit
https://www.npmjs.com/package/rename-js
https://opentype.js.org/font-inspector.html
http://www.fontgeek.net/blog/?p=343
https://www.lantean.co/osx-renaming-fonts-for-free
Edit: Added the following by request.
1) Screenshot of a somewhat typical webfont, illustrating how the form fields for font family and style names are often incomplete, blank, or contain illegal characters.
2) The woff file depicted (also, as base64).
Thank you all in advance!
Since you mentioned Automator in your question, I thought I'd try and solve this while using that to rename the file, along with standard Mac bash to get the font name. Hopefully, it beats learning a whole programming language.
I don't know what your workflow is so I'll leave any deviations to you but here is a method to select a font file and from Services, rename the file to the font's postscript name… based on Apple's metadata, specifically "com_apple_ats_name_postscript". This is one of the pieces of data retrieved using 'mdls' from the Terminal on the font file. To focus on the postscript name, grep the output for name_postscript. For simplicity here, I'll exclude the path to the selected file.
Font Name Acquisition
So… running this command…
mdls GenBkBasBI.ttf | grep -A1 name_postscript
… generates this output, which contains FontBook's PostScript name. The '-A1' option makes grep return the matching line plus the first line after it, which is the one containing the actual font name.
com_apple_ats_name_postscript = (
"GentiumBookBasic-BoldItalic"
Clean this up with some more bash (tr, tail)…
tr -d \  | tail -n 1 | tr -d \"
In order, these strip spaces, drop every line except the last, and strip quotation marks. Note that the first 'tr' takes an escaped space as its argument, so the backslash is followed by two spaces: the escaped one and the usual separator before the pipe.
In a single line, it looks like this…
mdls GenBkBasBI.ttf | grep -A1 name_postscript | tr -d \  | tail -n 1 | tr -d \"
…and produces this…
GentiumBookBasic-BoldItalic
Now, here is the workflow that includes the above bash command. I got the idea for variable usage from the answer to this question…
Apple Automator “New PDF from Images” maintaining same filename
Automator Workflow
Automator Workflow screenshot
At the top; Service receives selected 'files or folders' in 'Finder'.
1. Get Selected Finder Items
This (or Get Specified…) is there to allow testing. It is obviated by using this as a Service.
2. Set Value of Variable (File)
This is to remember which file you want to rename.
3. Run Shell Script
This is where we use the bash stuff. The $f is the selected/specified file. I'm running 'zsh' for whatever reason. You can set it to whatever you're running, presumably 'bash'.
4. Set Value of Variable (Text)
Assign the bash output to a variable. This will be used by the last action for the new filename.
5. Get Value of Variable (File)
Recall the specified/selected file to rename.
6. Rename Finder Items: Name Single Item
I have it set to 'Basename only' so it will leave the extension alone. Enter the 'Text' variable from action 4 in here.
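For use outside Automator, the same lookup and rename could be sketched as a small Python script, since the question was open to any language. This is only a sketch under assumptions: it runs on macOS, Spotlight has indexed the font so that `mdls` reports `com_apple_ats_name_postscript`, and the helper names `parse_postscript_name` and `rename_to_postscript_name` are invented for this example.

```python
import re
import subprocess
from pathlib import Path

# Matches the first quoted value of the attribute in mdls output.
_PS_NAME = re.compile(r'com_apple_ats_name_postscript\s*=\s*\(\s*"([^"]+)"')


def parse_postscript_name(mdls_output):
    """Return the PostScript name found in mdls output, or None."""
    match = _PS_NAME.search(mdls_output)
    return match.group(1) if match else None


def rename_to_postscript_name(font_path):
    """Rename a font file to its PostScript name, keeping the extension.

    macOS only: relies on the `mdls` command. Returns the resulting path,
    which is the original path if nothing was renamed.
    """
    path = Path(font_path)
    result = subprocess.run(
        ["mdls", "-name", "com_apple_ats_name_postscript", str(path)],
        capture_output=True, text=True, check=True,
    )
    name = parse_postscript_name(result.stdout)
    if name is None:
        return path  # no metadata available, leave the file alone
    target = path.with_name(name + path.suffix)
    if target.exists():
        return path  # never overwrite an existing file
    path.rename(target)
    return target
```

Only the file name changes; the names stored inside the font are untouched, which matches the requirement in the question.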

Table of contents sidebar in Sphinx LaTeX PDF

I am generating a LaTeX document from Sphinx, and converting it to PDF using pdflatex (from MikTeX). The document is missing a table of contents in the sidebar of the PDF viewer.
If I add manually \usepackage{hyperref} to the tex file, it works. But how can I tell Sphinx to do it in the conf.py project file? There is no (evident) related option in the latex output options.
Thanks!
Section 2.5.3 Customizing the rendering of the Sphinx document mentions:
LaTeX preamble
Additional commands may be added as preamble in the generated LaTeX file. This is easily done by editing file conf.py:
# Read the extra preamble from a file that sits next to conf.py.
with open('latex-styling.tex') as f:
    PREAMBLE = f.read()

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #'papersize': 'a4paper',
    # The font size ('10pt', '11pt' or '12pt').
    #'pointsize': '10pt',
    # Additional stuff for the LaTeX preamble.
    'preamble': PREAMBLE
}
This will copy the contents of file latex-styling.tex (in same directory as conf.py) to the generated LaTeX document. For instance, if latex-styling.tex reads:
% My personal "bold" command
\newcommand{\mycommand}[1]{\textbf{#1}}
the generated LaTeX document becomes:
% Generated by Sphinx.
\def\sphinxdocclass{report}
\documentclass[a4paper,10pt,english]{sphinxmanual}
% snip (packages)
% My personal "bold" command
\newcommand{\mycommand}[1]{\textbf{#1}}
\title{My Extension Documentation}
\date{2013-06-30 22:25}
\release{1.0.0}
\author{Xavier Perseguers}
Other options
The configuration file conf.py lets you further tune the rendering with LaTeX. Please consult http://www.sphinx-doc.org/en/stable/config.html#options-for-latex-output for further instructions.
A more direct way of adding content, rather than putting it in a separate file (say, latex-styling.tex), is to specify it verbatim. The next subsection in the documentation shows this for a specific package, typo3:
TYPO3 template
We want to stick as much as possible to default rendering, to avoid having to change the LaTeX code generation from Sphinx. As such, we choose to include a custom package typo3 (file typo3.sty) that will override some settings of package sphinx. To include it automatically, we simply use the preamble option of conf.py:
latex_elements = {
    # Additional stuff for the LaTeX preamble.
    'preamble': '\\usepackage{typo3}'
}
It's better to contain your styling options in a separate latex-styling.tex file that you can include using the preamble key via an f.read(). That way you don't have to update conf.py. Compartmentalization is usually better.
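Applied to the original question, the verbatim variant might look like the fragment below. It is a sketch rather than something taken from the quoted documentation, and I have not checked it against a particular Sphinx version; it simply puts the line that worked when added by hand into the preamble key.

```python
# conf.py (fragment): a sketch, not copied from the Sphinx documentation.
latex_elements = {
    # Raw string so the backslash reaches the generated .tex file unchanged.
    'preamble': r'\usepackage{hyperref}',
}
```

If you keep your styling in latex-styling.tex as recommended above, the same `\usepackage{hyperref}` line can go into that file instead.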

Exporting all yaml bibliographic in a pdf file using pandoc

I'm using Leo, yaml and pandoc to create a pdf. For that, my workflow is something like this:
I collected all relevant items as a zotero collection
I exported all of them as CSL JSON and converted it to yaml using biblio2yaml
I created a Leo outline with markdown nodes and a yaml node containing all the info for I want to write and all the collected bibliography items and made a small script to traverse the outline and export the things as I want.
Finally over the output file I run:
pandoc --filter pandoc-citeproc output.markdown -o output.pdf
and it is working pretty well. The thing is that I would like to tell pandoc to include all the bibliography items, no matter whether they are referenced with [@reference] inside the markdown text or are just collected in the embedded yaml block for the bibliography. Is this possible? If not, is there some way to script pandoc to do something like that?
PS: I tried the [-@reference] trick inside pandoc's markdown to put references that are not explicitly cited into the exported bibliography, but then I get a year in parentheses in the exported pdf, as one would expect, so that's not the way to go.
Eventually I'd like to add a syntax to pandoc for marking citations for inclusion in the bibliography without putting them in the text.
But for now, your best bet would be to put references for all of them in the text, and modify your CSL file so that no actual citation is printed (just the bibliography). I can't give guidance on how to do that, but I have heard of others doing it, so I know it's possible.
The pandoc README gives the solution. You need to define a dummy nocite metadata field and put the citations there:
# References
The bibliography will be inserted after this header. Note that
the `unnumbered` class will be added to this header, so that the
section will not be numbered.
If you want to include items in the bibliography without actually
citing them in the body text, you can define a dummy `nocite` metadata
field and put the citations there:
---
nocite: |
  @item1, @item2
...

@item3
In this example, the document will contain a citation for `item3`
only, but the bibliography will contain entries for `item1`, `item2`, and
`item3`.
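Since the export is already scripted, the nocite block could be generated from the collected citation keys. The helper below is a hypothetical sketch (the name `nocite_block` is not part of pandoc or any library); it only builds the metadata text shown in the README excerpt.

```python
def nocite_block(keys):
    """Return a YAML metadata block that lists every key as an @-citation."""
    # Accept keys with or without a leading "@".
    cites = ", ".join("@" + key.lstrip("@") for key in keys)
    return "---\nnocite: |\n  " + cites + "\n...\n"
```

Prepending the returned text to output.markdown before running pandoc should make every listed item appear in the bibliography without adding anything to the body text.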