Pandoc Markdown to PDF

I'm looking to convert my Markdown book to a PDF.
I've done a lot of research, and Pandoc seems to be the best choice for this. As far as I can tell, Pandoc converts the Markdown to LaTeX and then to PDF.
The problem I am running into is including external images. Ideally I would have the process grab the remote images off the net and put them into the pdf.
I'm hitting this error:
pandoc: Error producing PDF from TeX source.
! Package pdftex.def Error: File `http://wes.io/QYGG/content.png' not found.
See the pdftex.def package documentation for explanation.
Type H <return> for immediate help.
...
l.84 ...degraphics{http://wes.io/QYGG/content.png}
I have MacTeX installed, and the command I'm running is pandoc test.md -o test.pdf
I've never used LaTeX before, so I'm a bit at a loss as to how to fix this.

According to this LaTeX question, you can't directly reference URL images from within LaTeX, though a potential LaTeX workaround is described at the link.
Local image files are probably your best bet.
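If you go the local-files route, the download-and-rewrite step can be scripted. The following is only a rough sketch under some assumptions: the function names and the images/ directory are made up for illustration, only the inline form ![alt](url) without a title is handled, and two remote images with the same file name would overwrite each other.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; none of these names come from pandoc itself.

# Print every remote image URL found in the Markdown on stdin, one per line.
list_image_urls() {
  grep -oE '!\[[^]]*\]\(https?://[^)]+\)' | sed -E 's#^!\[[^]]*\]\((.*)\)$#\1#'
}

# Rewrite remote image links on stdin so they point at images/<file name>.
localize_image_links() {
  sed -E 's#(!\[[^]]*\]\()https?://[^)]*/([^/)]+)\)#\1images/\2)#g'
}

# Download every remote image referenced in the Markdown file $1 into images/.
fetch_images() {
  mkdir -p images
  list_image_urls < "$1" | while read -r url; do
    # ${url##*/} keeps only the part after the last slash (the file name).
    curl -fsSL -o "images/${url##*/}" "$url"
  done
}
```

With these defined, something like fetch_images test.md && localize_image_links < test.md > test-local.md && pandoc test-local.md -o test.pdf should give LaTeX only local paths to work with.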

Newer versions of pandoc fetch external images before passing them on to LaTeX.
Alternatively, you could use ConTeXt instead of LaTeX; ConTeXt natively supports fetching images from URLs:
$ pandoc -t context test.md -o test.pdf

Related

Remove flickering during pdf refresh when compiling latex with pandoc

I am creating latex documents on Linux (Debian 9) writing them in markdown and converting them with pandoc.
I use the following pandoc terminal command:
watch -n 5 pandoc -N --toc --toc-depth=3 -f markdown checklist.md -t latex -o checklist.pdf
This compiles the Markdown into a PDF (via LaTeX) every 5 seconds, which gives me a live preview.
The issue is that the PDF in the PDF reader (I have tried several, e.g. evince, atril, okular, mupdf) flashes every time the PDF is recompiled (in this case every 5 seconds), which is somewhat annoying.
I had a similar setup for LaTeX live preview in vim, and it did not flash like this.
(One possible difference between the vim setup and my current workflow is that in vim the LaTeX code would auto-save after any change, which I am not doing now and would rather not do.)
PS: I have found this script, which seems interesting, but I am not sure how to get it to work:
https://gist.github.com/mmcclimon/7311538

How to avoid automatic appended file extensions to directory links when converting to pdf using pandoc?

I'm writing company internal documentation in R markdown and compiling using knitr in Rstudio. I'm trying to add a link pointing to a directory as follows:
[testdir](file:////c:/test/)
(this follows the convention described here)
When I compile it to html, I get the following link.
testdir
and it works as expected in Internet explorer. However, when I try to convert to pdf straight from RStudio, an unwanted pdf extension is appended to the link. I tried dissecting the problem and it seems this change is happening within pandoc. Here are the details.
When I convert it to latex using pandoc,
>pandoc -f markdown -t latex testing.md -o test.tex
the link in the latex output file looks as follows:
\href{file:///c:/test/}{testdir}
Everything good so far. However, when I convert the latex output to pdf with pandoc,
>pandoc -f latex -t latex -o test.pdf test.tex
a .pdf extension is appended to the link. Here is a copy/paste of the pdf link output:
/c:/test/.pdf
Is there a way to avoid this unwanted appended extension?
Perhaps I'm asking too much of pandoc, but I thought it might be worth asking since RStudio is becoming such a useful IDE to write my dynamic documents.
As you said, the .tex file pandoc generates is fine. So the problem is actually with LaTeX, specifically with the hyperref package which is used in pandoc's LaTeX template.
The problem, along with two possible solutions, was described here. To prevent hyperref from being smart and adding a file extension, try:
[testdir](file:///c:/test/.)
Or use ConTeXt instead of LaTeX:
$ pandoc -t context -s testing.md -o test.tex && context test.tex

Is it possible to uncompress PDF by using Adobe Acrobat or Acrobat Distiller?

Most PDF files found on the Web have compressed and unreadable data streams. Is it possible to uncompress the internal content of a PDF file using Acrobat or Acrobat Distiller, allowing us to read the source code by a text editor?
P.S. This question is inspired by this answer, which explains how it can be done with Ghostscript.
qpdf and pdftk have already been mentioned. To show the commands:
$ qpdf --qdf --object-streams=disable orig.pdf uncompressed-orig.pdf
$ pdftk orig.pdf output uncompressed-orig.pdf uncompress
mutool however hasn't been mentioned yet:
$ mutool clean -d -a orig.pdf uncompressed-orig.pdf
mutool is a command line tool which ships alongside the lightweight MuPDF PDF + document viewer.
I do not think you can achieve the uncompressing of PDF objects' streams with Acrobat or Distiller (unless you have additional payware plugins available).
Use cpdf:
cpdf -decompress in.pdf -o out.pdf
and then the graphic operators for each page can be read in a text editor. You'll need a copy of the standard as a reference, though.
Disclosure: I am the author of cpdf.
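To check whether one of the commands above actually removed the stream compression, you can count how often the /FlateDecode filter name still appears in the file. This is only a rough heuristic with a made-up function name: filter names hidden inside compressed object streams are not seen by grep, and image streams using other filters (such as /DCTDecode) are not counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count literal occurrences of /FlateDecode in the PDF $1.
# -a makes grep treat the binary file as text; -o prints each match on its own
# line, so the second grep counts matches rather than matching lines.
count_flate_streams() {
  grep -a -o '/FlateDecode' "$1" | grep -c '/FlateDecode' || true
}
```

Running count_flate_streams orig.pdf and then count_flate_streams uncompressed-orig.pdf should show the number dropping, ideally to 0, after decompression.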
This is easy with qpdf and pdftk.
With Adobe Acrobat you can get at the internal structure after profiling a PDF (preflight with some profile (e.g. detect PDF syntax errors), then Options->Internal PDF structure) - but there's no way to get something editable with a text editor.

How to convert a Markdown file to PDF

I have a Markdown file that I wish to convert to PDF so that I can upload it on Speakerdeck. I am using Pandoc to convert from markdown to PDF.
My problem is I can't specify what content should go on what page of the PDF, because Markdown doesn't provide any feature like that.
E.g., Markdown:
###Hello
* abc
* def
###Bye
* ghi
* jkl
Now I want Hello to be one slide and Bye to be on another slide on Speakerdeck. So, I will need them to be on different pages in the PDF that I generate using Pandoc.
But both Hello and Bye end up on the same page in the PDF.
How can I accomplish this?
Via the terminal (tested in 2020)
Download dependencies
sudo apt-get install pandoc texlive-latex-base texlive-fonts-recommended texlive-extra-utils texlive-latex-extra
Try to use
pandoc MANUAL.txt -o example13.pdf
pandoc MANUAL.md -o example13.pdf
Via a Visual Studio Code extension (tested in 2020)
Download the Yzane Markdown PDF extension
Right click inside a Markdown file (md)
A context menu will appear
Select the Markdown PDF: Export (pdf) option
Note: Emojis are better in Windows than Linux (I don't know why)
2016 update:
NPM module: https://github.com/alanshaw/markdown-pdf
Has a command line interface: https://github.com/alanshaw/markdown-pdf#usage
npm install -g markdown-pdf
markdown-pdf <markdown-file-path>
Or, an online service: http://markdown2pdf.com
As Speakerdeck only accepts PDF files, the easiest option is to use the LaTeX Beamer backend for pandoc:
pandoc -t beamer -o output.pdf yourInput.mkd
Note that you should have LaTeX Beamer installed for that.
In Ubuntu, you can do sudo apt-get install texlive-latex-recommended to install it. If you use Windows, you may try this answer.
You may also want to try the HTML/CSS output from Slidy:
pandoc --self-contained -t slidy -o output-slidy.html yourInput.mkd
It prints decently, as you can check by trying to print the original.
Read more about slideshows with pandoc here.
Easy online solution: dillinger.io.
Just paste your Markdown content into the editor on the left and see the (HTML) preview on the right. Then click Export as at the top and choose PDF.
It's based on the open source dillinger editor.
Adding to elias' answer, if you want to separate text in slides, just put *** between the text you want to separate. For your example to be in several pages, write it like this:
### Hello
- abc
- def
***
### Bye
- ghi
- jkl
And then use elias' answer, pandoc -t beamer -o output.pdf yourInput.md.
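If the file has many headings, the separators can be added automatically instead of by hand. Here is a small sketch, assuming every slide starts with a level-3 heading written as "### " (with the space) and that no code block in the file contains such a line; add_slide_breaks is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Markdown on stdin and put a *** separator,
# surrounded by blank lines, before every "### " heading except the first.
add_slide_breaks() {
  awk '
    /^### / {
      if (seen) { print ""; print "***"; print "" }
      seen = 1
    }
    { print }
  '
}
```

It can be piped straight into pandoc, for example add_slide_breaks < yourInput.md | pandoc -f markdown -t beamer -o output.pdf.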
I have Ubuntu 18.10 (Cosmic Cuttlefish) and installed the full package from texlive. It works for me.
Previously I had used the npm markdown-pdf answer. However, on a fresh install of Ubuntu 19.04 (Disco Dingo) I had issues getting it to install correctly.
Instead I started using the Visual Studio Code package: "Markdown PDF"
Details:
Name: Markdown PDF
Id: yzane.markdown-pdf
Description: Convert Markdown to PDF
Version: 1.2.0
Publisher: yzane
Visual Studio Marketplace link: https://marketplace.visualstudio.com/items?itemName=yzane.markdown-pdf
It has worked consistently well. If you've had issues getting other answers to work, I would recommend trying this.
I've managed to get a stable Markdown -> HTML -> PDF pipeline working with the MarkReport project.
It does a bit more than Pandoc, though, since it is based on WeasyPrint and is therefore aimed at clean report publishing, with cover, headers, sections, ...
It also enriches the HTML with syntax highlighting and LaTeX equations.
Simple way with iOS:
Use Shortcuts app (by Apple)
Make Rich Text From Markdown: Clipboard
^
Make PDF from Rich Text From Markdown
^
Show [PDF] in Quick Look
Just copy text and run the shortcut. Press share in Quick Look (bottom left) to store or send it. I use this to quickly convert Joplin notes to pdf.
I found that many markdown-to-pdf converters produce files that I don't find exactly neat-looking. However there is a solution to this.
If you're using IntelliJ, you can use a plugin called "Markdown". The export function uses pandoc as its engine, so you will probably need to install that along with pdflatex. https://pandoc.org/installing.html
In IntelliJ, under Tools > Markdown Converter > Export Markdown File To...
And there you go, a clean looking document. Additional styling can be added via a .css stylesheet.

Is it possible to programmatically "chain" several PDF files, preferably from command line?

Is there a way, in Linux, Windows, or preferably Mac OS X to take a bunch of PDF files and "chain them" into one "booklet" without owning Acrobat and preferably without doing this manually?
I have TeXShop, MiKTeX and the like installed, if any of their utilities help.
Ghostscript method:
gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf in1.pdf in2.pdf in3.pdf ...
from: How to concatenate PDFs without pain
ImageMagick method:
convert file1.pdf file2.pdf file3.pdf out.pdf
pdftk method:
pdftk file1.pdf file2.pdf file3.pdf cat output out.pdf
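For a whole directory of files, the pdftk call can be wrapped in a small function so the file names don't have to be typed out. This is just a sketch: merge_pdfs and the DRY_RUN variable are invented for this example, the files are merged in the shell's glob (alphabetical) order, and the output file should live outside the input directory so it is not picked up on a second run.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Usage: merge_pdfs OUT.pdf DIR
# Merges every *.pdf in DIR into OUT.pdf using pdftk.
# With DRY_RUN set to a non-empty value, the command is printed instead of run.
merge_pdfs() {
  merge_out=$1
  merge_dir=$2
  set -- "$merge_dir"/*.pdf          # the glob expands in sorted order
  [ -e "$1" ] || { echo "no PDF files in $merge_dir" >&2; return 1; }
  if [ -n "${DRY_RUN:-}" ]; then
    echo pdftk "$@" cat output "$merge_out"
  else
    pdftk "$@" cat output "$merge_out"
  fi
}
```

For example, DRY_RUN=1 merge_pdfs booklet.pdf ./chapters shows the command that would be run, and merge_pdfs booklet.pdf ./chapters runs it.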
I have tried several different tools and have gotten the most reliable results with the PDF toolkit, pdftk. It seems to work more consistently than trying to use gs or messing around with conversion to PostScript and back. And it avoids dealing with one image per page, which is a nuisance.
pdftk is included in Debian-based Linux distributions and perhaps others as well.
I had to recently research this and came up with the following. In the end I went with ImageMagick.
Merging is hard! http://ansuz.sooke.bc.ca/software/pdf-append.php
pdfjoin from the pdfjam package seems to be the standard on unix-like systems but not available on Windows
Coherentpdf is multi-platform. However licences cost up to €700
pdftk is multi-platform and open source. However it does appear to be 3 years old.
Imagemagick will merge pdfs and also generate pdfs from jpgs. I know it works on Linux and Windows.
PDFsam works using iText and Java
You can chain the "Get selected Items", "Combine PDF Pages", "Rename PDF Document" and "Move Finder Items" actions in Automator to create the desired workflow.
Have a look at Multivalent Document Tools
Failing that you can search out other tools via Freshmeat.net
I've mentioned it in the other topics and I'll mention it again: you can use the Ghostscript utilities pdf2ps and ps2pdf to do it like so:
pdf2ps file1.pdf file1.ps # Convert file1 to PostScript
pdf2ps file2.pdf file2.ps # Convert file2 to PostScript
cat file2.ps >> file1.ps # Concatenate files
ps2pdf file1.ps output.pdf # Convert back to PDF
I have also used Multivalent Java based tools. It is a simple invocation of Java MultiValent main program passing in each pdf file you want to append as arguments.