Add pdf file in Rmarkdown file - pdf

Is it possible to display a pdf file in an Rmarkdown file? Say, for example, a previously saved image myimage.pdf
What I am aiming is a form of
\includegraphics[width=0.5]{myimage.pdf}

An update from the very end of 2017
It is possible to import a PDF image using knitr::include_graphics(), e.g.
```{r label, out.width = "85%", fig.cap = "caption"}
include_graphics("path-to-your-image.pdf")
```

You can insert this directly into your R Markdown. The alternate name will only be displayed if the image does not load.
![Alternate Name](file.pdf){width=500px}

For the record and as suggested in the comments, this works perfectly fine as long as you're knitting to pdf (this is a LaTex command that won't work if you try to knit to HTML or docx though):
\includegraphics[width = \textwidth]{myimage.pdf}
You can change the width with something like
\includegraphics[width = 0.5\textwidth]{myimage.pdf}

Related

How can I insert and fit a PDF file using Latex commands

\begin{figure}[ht]
\begin{center}
\includegraphics[height = 3.5\textheight, width = 3.0\textwidth]{fig/InterfaceEMF2ASMFOR}
\end{center}
\caption{list of global parameters customer, FRD versions} \label{fig: limit}
\end{figure}
is the code I have used in my Latex file.
NOTE: {fig/InterfaceEMF2ASMFOR} is a pdf file
After the pdf report got generated the included {fig/InterfaceEMF2ASMFOR} does not look good and not fit to the page in PDF report.
The inserted pdf always loose some portion of it and always inserted from the middle of the page but not from the edges of the page.
Please help me how to fit the fig to the PDF I am creating.
You can include the whole pdf page with the pdfpages package:
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
test
\includepdf{example-image-a4}
text
\end{document}

How to add footer to pdf with pdfjam or pdftk?

I am using a shell script to modify many pdfs and would like to create a script that adds the page number (1 of X format) to the bottom of PDFs in a directory along with the text of the filename.
I tried using pdfjam with this format:
pdfjam --pagenumbering true
but it fails saying undefine pagenumbering
Any other recommendations how to do this? I am OK installing other tools but would like this to all be within a shell script.
Thank you
tl;dr: pdfjam --pagecommand '' input.pdf
By default, pdfjam adds the following LaTeX command to every page: \thispagestyle{empty}. By changing the command to an empty command, the default plain page style is used, which consists of a page number at the bottom. Of course you may want to play with other styles or layout options to position the page number differently.

Convert PDF text into outlines?

Does anybody know a way to vectorize the text in a PDF document? That is, I want each letter to be a shape/outline, without any textual content. I'm using a Linux system, and open source or a non-Windows solution would be preferred.
The context: I'm trying to edit some old PDFs, for which I no longer have the fonts. I'd like to do it in Inkscape, but that will replace all the fonts with generic ones, and that's barely readable. I've also been converting back and forth using pdf2ps and ps2pdf, but the font info stays there. So when I load it into Inkscape, it still looks awful.
Any ideas? Thanks.
To achieve this, you will have to:
Split your PDF into individual pages;
Convert your PDF pages into SVG;
Edit the pages you want
Reassemble the pages
This answer will omit step 3, since that's not programmable.
Splitting the PDF
If you don't want a programmatic way to split documents, the modern way would be with using stapler. In your favorite shell:
stapler burst file.pdf
Would generate {file_1.pdf,...,file_N.pdf}, where 1...N are the PDF pages. Stapler itself uses PyPDF2 and the code for splitting a PDF file is not that complex. The following function splits a file and saves the individual pages in the current directory. (shamelessly copying from the commands.py file)
import math
import os
from PyPDF2 import PdfFileWriter, PdfFileReader
def split(filename):
with open(filename) as inputfp:
inputpdf = PdfFileReader(inputfp)
base, ext = os.path.splitext(os.path.basename(filename))
# Prefix the output template with zeros so that ordering is preserved
# (page 10 after page 09)
output_template = ''.join([
base,
'_',
'%0',
str(math.ceil(math.log10(inputpdf.getNumPages()))),
'd',
ext
])
for page in range(inputpdf.getNumPages()):
outputpdf = PdfFileWriter()
outputpdf.addPage(inputpdf.getPage(page))
outputname = output_template % (page + 1)
with open(outputname, 'wb') as fp:
outputpdf.write(fp)
Converting the individual pages to SVG
Now to convert the PDFs to editable files, I'd probably use pdf2svg.
pdf2svg input.pdf output.svg
If we take a look at the pdf2svg.c file, we can see that the code in principle is not that complex (assuming the input filename is in the filename variable and the output file name is in the outputname variable). A minimal working example in python follows. It requires the pycairo and pypoppler libraries:
import os
import cairo
import poppler
def convert(inputname, outputname):
# Convert the input file name to an URI to please poppler
uri = 'file://' + os.path.abspath(inputname)
pdffile = poppler.document_new_from_file(uri, None)
# We only have one page, since we split prior to converting. Get the page
page = pdffile.get_page(0)
# Get the page dimensions
width, height = page.get_size()
# Open the SVG file to write on
surface = cairo.SVGSurface(outputname, width, height)
context = cairo.Context(surface)
# Now we finally can render the PDF to SVG
page.render_for_printing(context)
context.show_page()
At this point you should have an SVG in which all text has been converted to paths, and will be able to edit with Inkscape without rendering issues.
Combining steps 1 and 2
You can call pdf2svg in a for loop to do that. But you would need to know the number of pages beforehand. The code below figures the number of pages and does the conversion in a single step. It requires only pycairo and pypoppler:
import os, math
import cairo
import poppler
def convert(inputname, base=None):
'''Converts a multi-page PDF to multiple SVG files.
:param inputname: Name of the PDF to be converted
:param base: Base name for the SVG files (optional)
'''
if base is None:
base, ext = os.path.splitext(os.path.basename(inputname))
# Convert the input file name to an URI to please poppler
uri = 'file://' + os.path.abspath(inputname)
pdffile = poppler.document_new_from_file(uri, None)
pages = pdffile.get_n_pages()
# Prefix the output template with zeros so that ordering is preserved
# (page 10 after page 09)
output_template = ''.join([
base,
'_',
'%0',
str(math.ceil(math.log10(pages))),
'd',
'.svg'
])
# Iterate over all pages
for nthpage in range(pages):
page = pdffile.get_page(nthpage)
# Output file name based on template
outputname = output_template % (nthpage + 1)
# Get the page dimensions
width, height = page.get_size()
# Open the SVG file to write on
surface = cairo.SVGSurface(outputname, width, height)
context = cairo.Context(surface)
# Now we finally can render the PDF to SVG
page.render_for_printing(context)
context.show_page()
# Free some memory
surface.finish()
Assembling the SVGs into a single PDF
To reassemble you can use the pair inkscape / stapler to convert the files manually. But it is not hard to write code that does this. The code below uses rsvg and cairo. To convert from SVG and merge everything into a single PDF:
import rsvg
import cairo
def convert_merge(inputfiles, outputname):
# We have to create a PDF surface and inform a size. The size is
# irrelevant, though, as we will define the sizes of each page
# individually.
outputsurface = cairo.PDFSurface(outputname, 1, 1)
outputcontext = cairo.Context(outputsurface)
for inputfile in inputfiles:
# Open the SVG
svg = rsvg.Handle(file=inputfile)
# Set the size of the page itself
outputsurface.set_size(svg.props.width, svg.props.height)
# Draw on the PDF
svg.render_cairo(outputcontext)
# Finish the page and start a new one
outputcontext.show_page()
# Free some memory
outputsurface.finish()
PS: It should be possible to use the command pdftocairo, but it doesn't seem to call render_for_printing(), which makes the output SVG maintain the font information.
I'm afraid to vectorize the PDFs you would still need the original fonts (or a lot of work).
Some possibilities that come to mind:
dump the uncompressed PDF with pdftk and discover what the font names are, then look for them on FontMonster or other font service.
use some online font recognition service to get a close match with your font, in order to preserve kerning (I guess kerning and alignment are what's making your text unreadable)
try replacing the fonts manually (again pdftk to convert the PDF to a PDF which is editable with sed. This editing will break the PDF, but pdftk will then be able to recompress the damaged PDF to a useable one).
Here's what you really want - font substitution. You want some code/app to be able to go through the file and make appropriate changes to the embedded fonts.
This task is doable and is anywhere from easy to non-trivial. It's easy when you have a font that matches the metrics of the font in the file and the encoding used for the font is sane. You could probably do this with iText or DotPdf (the latter is not free beyond the evaluation, and is my company's product). If you modified pdf2ps, you could probably manage changing the fonts on the way through too.
If the fonts used in the file are font subsets that have creative reencoding, then you are in hell and will likely have all manner of pain doing the change. Here's why:
PostScript was designed at a point when there was no Unicode. Adobe used a single byte for characters and whenever you rendered any string, the glyph to draw was taken from a 256 entry table called the encoding vector. If a standard encoding didn't have what you wanted, you were encouraged to make fonts on the fly based on the standard font that differed only in encoding.
When Adobe created Acrobat, they wanted to make transition from PostScript as easy as possible so that font mechanism was modeled. When the ability to embed fonts into PDFs was added, it was clear that this would bloat the files, so PDF also included the ability to have font subsets. Font subsets are made by taking an existing font and removing all the glyphs that won't be used and re-encoding it into the PDF. The may be no standard relationship between the encoding vector and the code points in the file - all those may be changed. Instead, there may be an embedded PostScript function /ToUnicode which will translate encoded characters to a Unicode representation.
So yeah, non-trivial.
For the folks who come after me:
The best solutions I found were to use Evince to print as SVG, or to use the pdf2svg program that's accessible via Synaptic on Mint. However, Inkscape wasn't able to cope with the resulting SVGs--it entered an infinite loop with the error message:
File display/nr-arena-item.cpp line 323 (?): Assertion item->state & NR_ARENA_ITEM_STATE_BBOX failed
I'm giving up this quest for now, but maybe I'll try again in a year or two. In the meantime, maybe one of these solutions will work for you.

Is there an "Export to Pdf" plugin for Tiddlywiki?

Has anyone put together a plugin or tool for exporting a Tiddlywiki to pdf?
No, there isn't.
As a workaround, I write or find a decent printable stylesheet, then print to PDF.
Why not select the target tiddler to "Open in new window", and print it to PDF with any installed PDF printer?
To accomplish this I used a tool to convert HTML to PDF. These steps are a bit long but well worth it. Once you've got it working it is easily repeated.
In each tiddler that I want in my PDF, I mark with a specific tag; I used TableOfContents.
In each tiddler that is marked with this tag, I added an order field--to be used to define the order of tiddlers to appear in the PDF.
Ensure your HTML headers are properly defined for the document. I think tiddler titles use <h2>, so properly defining subheadings using <h3><h4> etc will ensure, if you want, a nice auto-generated Table of Contents in your PDF.
If you want each tiddler to start on a new page (in the PDF), we need to add this HTML to the end of each tiddler:
<div style = "display:block; clear:both; page-break-after:always;"></div>
With a completed TiddlyWiki document export the tiddlers to a single HTML file--this will be used to generate a PDF document. To export, go to the AdvancedSearch, select the Filter tab. In the search textbox enter your filter criteria--for me that was:
[tag[TableOfContents]sort[order]]
You'll see, immediately, on-screen a list of the tiddlers the system found based on that criteria. Then click on the Export icon and select Static HTML.
Optionally, but I think it's a great idea, manually create a cover page (in your favorite editor)--this will be a single HTML file to act as the cover page in the PDF document; call it cover.html. More on this later.
Download and install wkhtmltopdf (command-line tool to generate PDF from an HTML file).
https://wkhtmltopdf.org/downloads.html
Learn and get familiar with the wkhtmltopdf command line syntax. There are numerous features here so the command you end up with maybe lengthy. Use wkhtmltopdf /? to view general help, then wkhtmltopdf --extended-help to view details (well worth the read).
Generate a PDF document. At the command prompt navigate to the folder where your TiddlyWiki document is located. Here is a list of my favorite command-line switches. My app is installed in C:\Program Files..., so my command line starts with that...
"c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
Add this switch for a header on the left:
--header-left "My document title"
For a header on the right:
--header-right "v1.0.0.1"
Font size of header:
--header-font-size 8
Display a line below the header:
--header-line
Spacing between header and content in mm (default 0):
--header-spacing 5
A left-footer ([section] is replaced with the name of the current section:
--footer-left "[section]"
A centered footer:
--footer-center "Page [page] of [topage]"
Footer font size:
--footer-font-size 8
Footer spacing:
--footer-spacing 5
If you want titles to hyperlink (in the PDF) to go back to the TOC:
--enable-toc-back-links
Make sure no background images get printed:
--no-background
I added special styles in the TiddlyWiki document for print media--to hide tags and clean up the spacing. Then I used this switch to ensure print media is used:
--print-media-type
Being in North America I want letter-size pages; I think the default is A4:
-s Letter
IMPORTANT--give the tool access to local files, otherwise your images will be missing in the PDF:
--enable-local-file-access
Use this if you want to have a cover page (see step 6 above):
cover "cover.htm"
And use this if you want a TOC automatically generated. Without a cover page, the TOC will be your first page, so create a cover page:
toc
After the toc identify your exported tiddler HTML file as input to the tool:
tiddlers.html
And, the final argument on the command line is the output PDF file name:
MyDocument.pdf
Export the tid to html.
Then in the terminal, issue:
html2pdf $myTid.html $myTid.pdf
$myTid is only a var and can be any name
:)

How to do mail merge on top of a PDF?

I often get a PDF from our designer (built in Adobe InDesign) which is supposed to be sent out to thousands of people.
I've got the list with all the people, and it's easy doing a mail merge in OpenOffice.org. However, OpenOffice.org doesn't support the advanced PDF. I just want to output some text onto each page and print it out.
Here's how I do it now: print out 6.000 copies of the PDF, then put all of them into the printer again and just print out name, address and other information on top of it. But that's expensive.
Sadly, I can't make the PDF to an image and use that in OpenOffice.org because it grinds the computer to a halt. It also takes extremely long time to send this job to the printer.
So, is there an easy way to do this mail merge (preferably in Python) without paying for third party closed solutions?
Now I've made an account. I fixed it by using the ingenious pdftk.
In my quest I totally overlook the feature "background" and "overlay". My solution was this:
pdftk names.pdf background boat_background.pdf output out.pdf
Creating the names.pdf you can easily do with Python reportlab or similar PDF-creation scripts. It's best using code to do that, creating 6k pages took several hours in LibreOffice/OpenOffice, while it took just a few seconds using Python.
You could probably look at a PDF library like iText. If you have some programming knowledge and a bit of time you could write some code that adds the contact information to the PDFs
There are two much simpler and cheaper solutions.
First, you can do your mail merge directly in InDesign using DataMerge. This is a utility added to InDesign way back in CS. You export or save your names in CSV format. Import the data into an InDesign template and then drop in your name, address and such fields in the layout. Press Go. It will create a new document with all the finished letters or you can go right to the printer.
OR, you can export your data to an XML file and create a dynamic layout using XML placeholders in InDesign.
The book A Designer's Guide to Adobe InDesign and XML will teach you how to do this, or you can check out the Lynda.com videos for Dynamic workflows with InDesign and XML.
Very easy to do.
If you want to create separate PDFs files for the mail merge, you can run out one long PDF with all the names in one file then do an Extract to Separate PDF files in Acrobat Pro itself.
If you cannot get the template in another format than PDF a simple ad-hoc solution would be to
convert the PDF into an image
put the image in the backgroud of your (OpenOffice.org) document
position mail merge fields on top of the image
do the mail merge and print
Probably the best way would be to generate another PDF with the missing text, and overlay one PDF over the other. A quick Google found this link showing how to do it in Acrobat, and I'm sure there are other methods as well.
http://forums.macrumors.com/showthread.php?t=508226
For a no-mess, no-fuss solution, use iText to simply add the text to the pdf. For example, you can do the following to add text to a pdf document once loaded:
PdfContentByte cb= ...;
cb.BeginText();
cb.SetFontAndSize(font, fontSize);
float x = ...;
float y = ...;
cb.SetTextMatrix(x, y);
cb.ShowText(fieldValue);
cb.EndText();
From there on, save it as a different file, and print it.
However, I've found that form fields are the way to go with pdf document generation from templates.
If you have a template with form fields (added with Adobe Acrobat), you have one of two choices :
Create a FDF file, which is essentially a list of values for the fields on the form. A FDF is a simple text document which references the original document so that when you open up the PDF, the document loads with the field values supplied by the FDF.
Alternatively, load the template with with a library like iText / iTextSharp, fill the form fields manually, and save it as a seperate pdf.
A sample FDF file looks like this (stolen from Planet PDF) :
%FDF-1.2
%âãÏÓ
1 0 obj
<<<
/F(Example PDF Form.pdf)
/Fields[
<<
/T(myTextField)
/V(myTextField default value)
>>
]
>>
>> endobj trailer
<>
%%EOF
Because of the simple format and the small size of the FDF, this is the preferred approach, and the approach should work well in any language.
As for filling the fields programmatically, you can use iText in the following way :
PdfAcroForm acroForm = writer.AcroForm;
acroForm.Put(new PdfName(fieldInfo.Name), new PdfString(fieldInfo.Value));
What about using a variable data program such as - XMPie for Adobe Indesign. It's a plug-in that should reference to your list of people (think it might have to be a list in Excel though).
One easy way would be to create a fillable pdf form from the original document in Acrobat and do a mail merge with the form and a csv.
PDF mail merges are relatively easy to do in python and pdftk. Fdfgen (pip install fdfgen) is a python library that will create an fdf from a python array, so you can save the excel grid to a csv, make sure that the csv headers match the name of the pdf form field you want to fill with that column, and do something like
import csv
import subprocess
from fdfgen import forge_fdf
PDF_FORM = 'path/to/form.pdf'
CSV_DATA = 'path/to/data.csv'
infile = open(CSV_DATA, 'rb')
reader = csv.DictReader(infile)
rows = [row for row in reader]
infile.close()
for row in rows:
# Create fdf
filename = row['filename'] # Construct filename
fdf_data = [(k,v) for k, v in row.items()]
fdf = forge_fdf(fdf_data_strings=fdf_data)
fdf_file = open(filename+'.fdf', 'wb')
fdf_file.write(fdf)
fdf_file.close()
# Use PDFTK to create filled, flattened, pdf file
cmds = ['pdftk', PDF_FORM, 'fill_form', filename+'.fdf',
'output', filename+'.pdf', 'flatten', 'dont_ask']
process = subprocess.Popen(cmds, stdout=subprocess.PIPE)
stdout, stderr = process.communicate()
returncode = process.poll()
os.remove(filename+'.fdf')
I've encountered this problem enough to write my own free solution, PdfZero. PdfZero has a mail merge feature to merge spreadsheets with PDF forms. You will still need to create a PDF form, but you can upload the form and csv to pdfzero, select which form fields you want filled with which columns, create a naming convention for each filled pdf using the csv data if needed, and batch generate the filled PDfs.
DISCLAIMER: I wrote PdfZero
Someone asked for specifics. I didn't want to sully my top answer with it, because you can do it how you like (and just knowing pdftk is up to it should give people the idea).
But here's some scripts I used ages ago:
csv_to_pdf.py
#!/usr/bin/python
# This makes one PDF page per name in the CSV file
# csv_to_pdf.py <CSV_FILE>
import csv
import sys
from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.units import cm, mm
in_db = csv.reader(open(sys.argv[1], "rb"));
outname = sys.argv[1].replace("csv", "pdf")
pdf = Canvas(outname)
in_db.next()
i = 0
for rad in in_db:
pdf.setFontSize(11)
adr = rad[1]
tekst = pdf.beginText(2*cm, 26*cm)
for a in adr.split('\n'):
if not a.strip():
continue
if a[-1] == ',':
a = a[:-1]
tekst.textLine(a)
pdf.drawText(tekst)
pdf.showPage()
i += 1
if i % 1000 == 0:
print i
pdf.save()
When you've ran this, you have a file with thousands of pages, only with a name on it. This is when you can background the fancy PDF under all of them:
pdftk <YOUR_NEW_PDF_FILE.pdf> background <DESIGNED_FILE.pdf> <MERGED.pdf>
You can use InDesign's data merge function, or you can do what you've been doing with printing a portion of the job, and then printing the mail merge atop that with Word or Open Office.
But also look into finding a company that can do variable data offset printing or dynamic publishing. Might be a little more expensive up front but can save a bundle when it comes to time, testing, even packaging and mailing.
Disclaimer: I'm the author of this tool.
I ran into this issue enough times that I built a free online tool for it: https://pdfbatchfill.com/
It assumes a PDF form as a template and uses that along with CSV form data to generate a single PDF or individual PDFs in a zip file.