How to avoid automatically appended file extensions on directory links when converting to PDF with pandoc?

I'm writing company internal documentation in R markdown and compiling using knitr in Rstudio. I'm trying to add a link pointing to a directory as follows:
[testdir](file:////c:/test/)
(this follows the convention described here)
When I compile it to html, I get the following link.
testdir
and it works as expected in Internet Explorer. However, when I convert to PDF directly from RStudio, an unwanted .pdf extension is appended to the link. I tried dissecting the problem, and the change seems to happen within pandoc. Here are the details.
When I convert it to latex using pandoc,
>pandoc -f markdown -t latex testing.md -o test.tex
the link in the latex output file looks as follows:
\href{file:///c:/test/}{testdir}
Everything good so far. However, when I convert the latex output to pdf with pandoc,
>pandoc -f latex -t latex -o test.pdf test.tex
a .pdf extension is appended to the link. Here is a copy/paste of the pdf link output:
/c:/test/.pdf
Is there a way to avoid this unwanted appended extension?
Perhaps I'm asking too much of pandoc, but I thought it might be worth asking since RStudio is becoming such a useful IDE to write my dynamic documents.

As you said, the .tex file pandoc generates is fine. So the problem is actually with LaTeX, specifically with the hyperref package which is used in pandoc's LaTeX template.
This problem, with two possible solutions, is described here. To prevent hyperref from being smart and adding a file extension, try:
[testdir](file:///c:/test/.)
Or use ConTeXt instead of LaTeX:
$ pandoc -t context -s testing.md -o test.tex && context test.tex
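If a document has many directory links, adding the trailing dot by hand gets tedious. Below is a minimal Python sketch (my own helper, not part of pandoc or RStudio) that applies the trailing-dot workaround to every inline file: link ending in "/". It assumes links are written as plain [text](target) without a title.

```python
import re

# An inline Markdown link whose target is a file: URL ending in "/".
# Assumption: links are written inline as [text](target) with no title part.
_DIR_LINK = re.compile(r"\]\((file:[^)\s]*/)\)")


def protect_dir_links(markdown: str) -> str:
    """Append "." to file: links that end in "/" (the trailing-dot workaround),
    so hyperref does not add ".pdf" to them."""
    return _DIR_LINK.sub(lambda m: "](" + m.group(1) + ".)", markdown)
```

Links to regular files, web links, and links that already end in "/." are left untouched, so running it twice is harmless.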

Related

How can I convert specific web page to markdown or asciidoc with pandoc?

I want to convert the Java specification documentation to easily editable formats (Markdown or AsciiDoc), upload it to GitHub Gist, and customize it (adding my own code experiences and notes).
I want to convert to something like this
I use a tool called pandoc that allows us to convert from HTML to markdown.
I tried the following:
Technique 1
I tried to convert the whole table of contents of the Java specification from index.html:
pandoc -f html -t markdown -o test2.md https://docs.oracle.com/javase/specs/jls/se10/html/index.html
I got this: test2.md
(I did not upload here because the file of contents is too long)
Problem 1:
This Markdown file does not contain the contents of the Java specification. I expected to get a Markdown TOC (table of contents) plus the Java specification contents in the Markdown file, like this:
Problem 2:
When I click the links in this Markdown file, I get a 404 error page.
Technique 2 (better than technique 1)
I downloaded all HTML files of the TOC with HTTrack and tried to convert each file separately.
pandoc -f html-native_divs-native_spans -i jls-1.html -t markdown -o test2.md
Problem 1:
I got the following Markdown file, whose table-of-contents links do not point to other sections of the same document. When I click these links, they go to an external GitHub page like this: https://gist.github.com/lostdinar2/jls-1.html#jls-1.1
which is not available.
test3.md
A demonstration of problem 1:
1) I want to convert this HTML internal id link (#) to a Markdown internal link that points to another section of the same document:
<dt><span class="section">2.2. The Lexical Grammar</span></dt>
[link text](#abcd)
2) But pandoc does not convert these links to Markdown internal links. It creates an external link like this: https://gist.github.com/lostdinar2/jls-1.html#jls-1.1
Is there a pandoc parameter to fix this? I searched the pandoc documentation but could not find such a feature.
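Whether or not pandoc has an option for this, one workaround is to convert all chapter files into a single Markdown document (pandoc concatenates multiple input files given on one command line) and then rewrite the cross-file links into same-document anchors. The Python sketch below assumes the links look like jls-1.html#jls-1.1 and that the matching anchors survive in the converted Markdown; both are assumptions worth checking against your actual output.

```python
import re

# Hypothetical pattern: chapter links such as (jls-1.html#jls-1.1), optionally
# prefixed by a directory or URL, as produced when converting the HTTrack copy.
_CHAPTER_LINK = re.compile(r"\]\((?:[^)\s]*/)?jls-\d+\.html(#[^)\s]+)\)")


def make_links_internal(markdown: str) -> str:
    """Turn cross-file chapter links into same-document anchors (#jls-1.1)."""
    return _CHAPTER_LINK.sub(r"](\1)", markdown)
```

Links without a fragment and links to other sites are left as they are.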

Remove flickering during pdf refresh when compiling latex with pandoc

I am creating latex documents on Linux (Debian 9) writing them in markdown and converting them with pandoc.
I use the following pandoc terminal command:
watch -n 5 pandoc -N --toc --toc-depth=3 -f markdown checklist.md -t latex -o checklist.pdf
which compiles the markdown code into a pdf (latex) every 5 seconds, which enables live preview.
The issue is that the PDF in the viewer (I have tried several, e.g. evince, atril, okular, mupdf) flashes every time the PDF is recompiled (in this case every 5 seconds), which is somewhat annoying.
I had a similar setup for latex live preview in vim, and it was not flashing like this.
(One possible difference between the vim setup and my current workflow is that in vim, the LaTeX code would auto-save after any change, which I am not doing now and would rather not do.)
PS: I have found this script, which seems interesting, but I am not sure how to get it to work:
https://gist.github.com/mmcclimon/7311538
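One likely cause of the flashing is that watch re-runs pandoc every 5 seconds whether or not checklist.md changed, so the viewer reloads a freshly rewritten PDF each time. Below is a minimal Python sketch of an alternative polling loop (my own, not tested against each of those viewers): rebuild only when the Markdown is newer than the PDF, write into a temporary file, and move it into place with os.replace so the viewer never picks up a half-written file. It will not stop a viewer from redrawing when the file really does change.

```python
import os
import subprocess
import tempfile
import time


def needs_rebuild(src: str, dst: str) -> bool:
    """True when dst is missing or older than src."""
    return not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst)


def pandoc_to_pdf(src: str, out: str) -> None:
    """Same options as the watch command in the question (needs pandoc + LaTeX)."""
    subprocess.run(
        ["pandoc", "-N", "--toc", "--toc-depth=3", "-f", "markdown", src,
         "-t", "latex", "-o", out],
        check=True,
    )


def rebuild(src: str, dst: str, compile_fn=pandoc_to_pdf) -> bool:
    """Recompile only if src changed; publish the new PDF with one atomic rename."""
    if not needs_rebuild(src, dst):
        return False
    out_dir = os.path.dirname(os.path.abspath(dst))
    # Temporary file in the same directory, so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(suffix=".pdf", dir=out_dir)
    os.close(fd)
    try:
        compile_fn(src, tmp)
        os.replace(tmp, dst)  # the viewer never sees a half-written PDF
    finally:
        if os.path.exists(tmp):  # only left over if compiling or renaming failed
            os.remove(tmp)
    return True


def watch(src: str, dst: str, interval: float = 5.0) -> None:
    """Replacement for watch -n 5: poll, but rebuild only on change."""
    while True:
        rebuild(src, dst)
        time.sleep(interval)
```

Usage would be watch("checklist.md", "checklist.pdf"); since you save the Markdown yourself, the PDF only reloads after each save.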

Pandoc markdown to PDF

I'm looking to convert my Markdown book to a PDF.
I've done a lot of research and it seems that Pandoc is the best choice for this. It seems pandoc converts the markdown to latex and then to a PDF.
The problem I am running into is including external images. Ideally I would have the process grab the remote images off the net and put them into the pdf.
I'm hitting this error:
pandoc: Error producing PDF from TeX source.
! Package pdftex.def Error: File `http://wes.io/QYGG/content.png' not found.
See the pdftex.def package documentation for explanation.
Type H <return> for immediate help.
...
l.84 ...degraphics{http://wes.io/QYGG/content.png}
I have MacTeX installed and the command I'm running is pandoc test.md -o test.pdf
I've never used latex before, so I'm a bit at a loss of how to fix this.
According to this LaTeX question, you can't directly reference URL images from within LaTeX, though they have a potential LaTeX-hacking option available at the link.
Local image files are probably your best bet.
Newer versions of pandoc fetch external images before passing them on to LaTeX.
Alternatively, you could use ConTeXt instead of LaTeX which natively supports fetching images from URLs:
$ pandoc -t context test.md -o test.pdf
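With an older pandoc, the "local image files" route can be automated. The Python sketch below (hypothetical helper names) downloads every ![alt](http...) image once into an existing directory and rewrites the Markdown to point at the local copies, which you then pass to pandoc as usual. It assumes images without titles and guesses .png when the URL has no extension.

```python
import hashlib
import os
import re
import urllib.request

# ![alt](http...) with no title; other image syntaxes are out of scope here.
_REMOTE_IMAGE = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def download(url: str, path: str) -> None:
    urllib.request.urlretrieve(url, path)


def localize_images(markdown: str, image_dir: str, fetch=download) -> str:
    """Download each remote image once and point the Markdown at the local copy.
    image_dir must already exist."""
    def replace(match):
        alt, url = match.group(1), match.group(2)
        # Keep the URL's extension so LaTeX recognizes the format; ".png" is a guess.
        ext = os.path.splitext(url.split("?", 1)[0])[1] or ".png"
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()[:12] + ext
        path = os.path.join(image_dir, name)
        if not os.path.exists(path):
            fetch(url, path)
        return f"![{alt}]({path})"

    return _REMOTE_IMAGE.sub(replace, markdown)
```

For example, write localize_images(text, "img") to a new file and run pandoc on that file instead of test.md.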

How to convert a Markdown file to PDF

I have a Markdown file that I wish to convert to PDF so that I can upload it on Speakerdeck. I am using Pandoc to convert from markdown to PDF.
My problem is I can't specify what content should go on what page of the PDF, because Markdown doesn't provide any feature like that.
E.g., Markdown:
###Hello
* abc
* def
###Bye
* ghi
* jkl
Now I want Hello to be one slide and Bye to be on another slide on Speakerdeck. So, I will need them to be on different pages in the PDF that I generate using Pandoc.
But both Hello and Bye end up on the same page in the PDF.
How can I accomplish this?
Via the terminal (tested in 2020)
Install the dependencies:
sudo apt-get install pandoc texlive-latex-base texlive-fonts-recommended texlive-extra-utils texlive-latex-extra
Then try:
pandoc MANUAL.txt -o example13.pdf
pandoc MANUAL.md -o example13.pdf
Via a Visual Studio Code extension (tested in 2020)
Download the Yzane Markdown PDF extension
Right click inside a Markdown file (md)
A context menu will appear
Select the Markdown PDF: Export (pdf) option
Note: Emojis render better on Windows than on Linux (I don't know why)
2016 update:
NPM module: https://github.com/alanshaw/markdown-pdf
Has a command line interface: https://github.com/alanshaw/markdown-pdf#usage
npm install -g markdown-pdf
markdown-pdf <markdown-file-path>
Or, an online service: http://markdown2pdf.com
As SpeakerDeck only accepts PDF files, the easiest option is to use the LaTeX Beamer backend for pandoc:
pandoc -t beamer -o output.pdf yourInput.mkd
Note that you should have LaTeX Beamer installed for that.
In Ubuntu, you can do sudo apt-get install texlive-latex-recommended to install it. If you use Windows, you may try this answer.
You may also want to try the HTML/CSS output from Slidy:
pandoc --self-contained -t slidy -o output-slidy.html yourInput.mkd
Its print output is decent, as you can check by printing the original.
Read more about slideshows with pandoc here.
Easy online solution: dillinger.io.
Just paste your Markdown content into the editor on the left and see the (HTML) preview on the right. Then click "Export as" at the top and choose PDF.
It's based on the open source dillinger editor.
Adding to elias' answer, if you want to separate text in slides, just put *** between the text you want to separate. For your example to be in several pages, write it like this:
### Hello
- abc
- def
***
### Bye
- ghi
- jkl
And then use elias' answer, pandoc -t beamer -o output.pdf yourInput.md.
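If you want ordinary pages rather than Beamer slides, another option is a raw \newpage before each heading: pandoc's Markdown passes raw LaTeX such as \newpage through to LaTeX/PDF output. The Python sketch below (a hypothetical helper) does that for level-3 headings and also adds the space pandoc expects after the hashes, since by default ###Hello is not read as a heading. It does not skip headings inside code blocks.

```python
import re

# "###Hello" -> "### Hello": pandoc's Markdown wants a space after the hashes.
_ATX_NO_SPACE = re.compile(r"^(#{1,6})(?=[^#\s])", re.MULTILINE)


def one_heading_per_page(markdown: str, level: int = 3) -> str:
    """Insert a raw \\newpage (with blank lines around it) before every heading
    of the given level except the first one."""
    text = _ATX_NO_SPACE.sub(r"\1 ", markdown)
    marker = "#" * level + " "
    out = []
    seen_first = False
    for line in text.splitlines():
        if line.startswith(marker):
            if seen_first:
                out.extend(["", r"\newpage", ""])
            seen_first = True
        out.append(line)
    return "\n".join(out) + "\n"
```

Then run pandoc on the result as usual, e.g. pandoc paged.md -o output.pdf.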
I have Ubuntu 18.10 (Cosmic Cuttlefish) and installed the full package from texlive. It works for me.
Previously I had used the npm markdown-pdf answer. However, on a fresh install of Ubuntu 19.04 (Disco Dingo) I had issues getting it to install correctly.
Instead I started using the Visual Studio Code package: "Markdown PDF"
Details:
Name: Markdown PDF
Id: yzane.markdown-pdf
Description: Convert Markdown to PDF
Version: 1.2.0
Publisher: yzane
Visual Studio Marketplace link: https://marketplace.visualstudio.com/items?itemName=yzane.markdown-pdf
It has worked consistently well. If you've had issues getting other answers to work, I would recommend trying this.
I've managed to get a stable Markdown -> HTML -> PDF pipeline working with the MarkReport project.
It does a bit more than Pandoc, though: since it is based on WeasyPrint, it is aimed at clean report publishing, with cover, headers, sections, etc.
It also enriches the HTML with syntax highlighting and LaTeX equations.
Simple way with iOS:
Use Shortcuts app (by Apple)
Make Rich Text From Markdown: Clipboard
Make PDF from Rich Text From Markdown
Show [PDF] in Quick Look
Just copy text and run the shortcut. Press share in Quick Look (bottom left) to store or send it. I use this to quickly convert Joplin notes to pdf.
I found that many markdown-to-pdf converters produce files that I don't find exactly neat-looking. However there is a solution to this.
If you're using IntelliJ, you can use a plugin called "Markdown". The export function uses pandoc as an engine, so you will probably need to install it along with pdflatex. https://pandoc.org/installing.html
In IntelliJ, under Tools > Markdown Converter > Export Markdown File To...
And there you go, a clean looking document. Additional styling can be added via a .css stylesheet.

How to use ghostscript to convert PDF to PDF/A or PDF/X?

Is there a way to use Ghostscript to convert PDF to PDF/A or PDF/X? I know it can be used to convert PDF to images, but I don't know if it can be used to convert to PDF/A. What parameters should I use?
This is to convert a pdf document (not pdf/a) into pdf/a:
gs -dPDFA -dBATCH -dNOPAUSE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sPDFACompatibilityPolicy=1 -sOutputFile=output_filename.pdf input_filename.pdf
Hope this helps someone!
Hope this answer helps others coming from Google with the same problem:
To convert from PDF to PDF/A-1b or PDF/A-2b, you can use Ghostscript. I suggest you use the latest version (9.19 today).
Install it
In macOS, you may prefer to use Homebrew:
brew install ghostscript
(UPDATE 2023-01-23: this no longer works on macOS with Homebrew, as versions newer than 9.19 adamantly refuse to do the conversion, no matter what I've tried.)
In Linux, some distros ship a much older version (RHEL 7 ships 9.07). To get a fully independent, modern, single-file Ghostscript, download it directly from the site:
wget https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs919/ghostscript-9.19-linux-x86_64.tgz
(UPDATE 2023-01-23: stick to that version; newer versions won't work with the method presented below.)
If the link above is broken when you try it 20 years from now, please go to ghostscript.com and look for the download section. Download the binary version, not the source, unless you know what you are doing.
In Windows, I cannot help you, but if you manage to install it, the following commands will also work, if you substitute the location of files and gs executable.
Command line
(note to future editors, please don't remove formatting, as this is more readable, yet working command line)
gs-919-linux_x86_64 \
-dPDFA=1 \
-dNOOUTERSAVE \
-sProcessColorModel=DeviceRGB \
-sDEVICE=pdfwrite \
-dPDFACompatibilityPolicy=1 \
-o output_file.pdf \
/path/to/PDFA_def.ps \
input_file.pdf
On a Mac, gs-919-linux_x86_64 will simply be gs.
Please note that output_file.pdf and input_file.pdf must be changed to the names of the output file (the converted file) and the input file (the file to be converted). /path/to/PDFA_def.ps is your copy of the file PDFA_def.ps.
-dPDFA=1 is for PDFA-1b.
-dPDFA=2 if you want PDFA-2b.
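If you call Ghostscript from a script, building the arguments as a list avoids the line-continuation slips that are easy to make in the shell version. This Python sketch mirrors the command above; the function names are my own, and gs is assumed to be on PATH (otherwise pass the full binary name, e.g. gs-919-linux_x86_64). It deliberately covers only the two levels described above.

```python
import subprocess


def gs_pdfa_args(input_pdf, output_pdf, pdfa_def, level=1, gs="gs"):
    """Argument list mirroring the command above: level 1 -> PDF/A-1b, 2 -> PDF/A-2b."""
    if level not in (1, 2):
        raise ValueError("this sketch only covers PDF/A-1b and PDF/A-2b")
    return [
        gs,
        f"-dPDFA={level}",
        "-dNOOUTERSAVE",
        "-sProcessColorModel=DeviceRGB",
        "-sDEVICE=pdfwrite",
        "-dPDFACompatibilityPolicy=1",
        "-o", output_pdf,
        pdfa_def,  # your edited copy of PDFA_def.ps
        input_pdf,
    ]


def convert(input_pdf, output_pdf, pdfa_def, level=1, gs="gs"):
    # A list (not a shell string) means spaces in file names need no quoting.
    subprocess.run(gs_pdfa_args(input_pdf, output_pdf, pdfa_def, level, gs), check=True)
```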
What is PDFA_def.ps?
PDFA_def.ps is a sort of template Ghostscript uses to create a PDF/A file. The tricky part is that, for some reason, Ghostscript ships with a non-working version of it.
You'll need to edit PDFA_def.ps and include the path to a valid ICC (color profile) file. Download a good color profile from Adobe:
wget https://tutankhamon.acc.umu.se/mirror/archive/ftp.sunet.se/pub/vendor/adobe/adobe/iccprofiles/win/AdobeICCProfilesWin_end-user.zip
Inside that zip, find a file called AdobeRGB1998.icc, put it somewhere, and put the path to that file INSIDE your PDFA_def.ps file. Note that the path should be absolute, with no quotes, like:
/ICCProfile (/full/path/to/file/AdobeRGB1998.icc) % Customize.
Here is a version of PDFA_def.ps; change PATH_TO_YOUR_ICC_FILE to the path of your AdobeRGB1998.icc.
https://gist.githubusercontent.com/weltonrodrigo/19df77833f023fbe1572168982e4b515/raw/ea86e87379d14120d7ff26f6f235ac7eeb5f5dd5/PDFA_def.ps
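The PDFA_def.ps edit can also be scripted. The Python sketch below (hypothetical helper names) replaces the string after /ICCProfile with an absolute path, escaping backslashes and parentheses so the result is a valid PostScript string. It assumes the file has a single /ICCProfile (...) entry like the line shown above.

```python
import os
import re

# Matches the "/ICCProfile (...)" entry, keeping the text before the string.
_ICC_ENTRY = re.compile(r"^(\s*/ICCProfile\s*)\([^)]*\)", re.MULTILINE)


def ps_string(text):
    """Quote text as a PostScript string literal (escape \\, ( and ))."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def set_icc_profile(ps_source, icc_path):
    """Return PDFA_def.ps text with /ICCProfile pointing at icc_path (made absolute)."""
    new_source, count = _ICC_ENTRY.subn(
        lambda m: m.group(1) + ps_string(os.path.abspath(icc_path)), ps_source)
    if count == 0:
        raise ValueError("no /ICCProfile entry found")
    return new_source
```

Read PDFA_def.ps, pass its text and the path to AdobeRGB1998.icc, and write the result back before running gs.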
@danio, @imgen: Even recently released documentation pages on PDF/X (standardized prepress requirements) and PDF/A (standardized archiving requirements) generation were quite misleading. (Your link pointed to a v8.63 release.) In the end, they suggested that running the example command lines with the sample PDF*_def.ps files would already generate valid PDF/A and PDF/X files.
But, they do not!
Here is one of the sample commands, which by itself is correct:
gs \
-dPDFA \
-dBATCH \
-dNOPAUSE \
-dNOOUTERSAVE \
-dUseCIEColor \
-sDEVICE=pdfwrite \
-sOutputFile=out-a.pdf \
PDFA_def.ps \
input.ps
The output file will declare itself to be PDF/A (and most PDF viewers would happily go along with this), but the output file fails all real compliance tests.
The fix is easy: you need to edit your sample PDFA_def.ps (for PDF/X: your PDFX_def.ps) files to match your environments. These required edits were not clearly spelled out in older documentation versions, and the provided command suggested it would work out of the box.
Especially in the case of PDF/X, you MUST specify a valid ICC profile to use.
See also the updated documentation (current SVN trunk version) about this:
http://svn.ghostscript.com/ghostscript/trunk/gs/doc/Ps2pdf.htm#PDFA
Please note that current answers are not completely correct. You can define which level of PDF/A you want, resulting in different behaviors of the program. This one is correct:
gs -dPDFA -dBATCH -dNOPAUSE -sColorConversionStrategy=UseDeviceIndependentColor -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=2 -sOutputFile=output_filename.pdf input_filename.pdf
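Please note my change from -sPDFACompatibilityPolicy to -dPDFACompatibilityPolicy.
The PDF/A level itself is selected with -dPDFA=N (for example -dPDFA=2). PDFACompatibilityPolicy controls what happens when the input contains something PDF/A does not allow: 1 drops the offending feature and keeps the file PDF/A (fine if you don't need DOCINFO), while 2 aborts with an error.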
Furthermore we use the option UseDeviceIndependentColor to avoid validating issues.
If you change options here, you will most likely get a non compliant PDF/A (even if it stated differently).
You can check your pdf/a here:
https://www.pdf-online.com/osa/validate.aspx
If you're using Windows and want to create PDF/A-1b documents explicitly (PDFCreator has an output option for PDF/A-2b but not for PDF/A-1b), you can just enter the parameters Artur described above into the UI settings of PDFCreator, minus the ones for the document names. Start PDFCreator, choose the Printer menu, then go to Settings. Now choose 'Ghostscript' from the settings list on the left side. Under 'additional ghostscript settings', enter the following:
-dPDFA|-dBATCH|-dNOPAUSE|-dUseCIEColor|-sProcessColorModel=DeviceCMYK|-sDEVICE=pdfwrite|-sPDFACompatibilityPolicy=1
Click on 'Save', then print something from MS Word or any other application you want using the PDFCreator - it will be created in PDF/A-1b.