Convert a PDF page to SVG and retain hyperlinks - pdf

I am trying to convert each page of a PDF to an SVG image and retain the hyperlinks.
However, with pdftocairo (and ImageMagick) the hyperlinks are lost.
I am on Ubuntu and have tried the following (to create one page):
#pdftocairo -svg -f 1 -l 1 originalFile.pdf Output.svg
This creates a perfect-looking SVG image, but none of the hyperlinks work in the SVG.
Any suggestions on how to convert a page of a PDF to SVG and retain the hyperlinks from the original document?
I am looking for a solution that will work from the CLI or from PHP 8.0.
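One possible direction, offered only as a rough sketch: keep the pdftocairo SVG as it is and add the links back afterwards as transparent clickable rectangles. The Python below assumes the link rectangles and URLs have already been extracted from the PDF by some other tool (that step is not shown), that the page is unrotated with its origin at the bottom-left corner, and that the SVG uses the same point units as the PDF page. The function names `pdf_rect_to_svg` and `add_link_overlays` are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Keep the serialized output free of ns0: prefixes.
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)


def pdf_rect_to_svg(rect, page_height):
    """Convert a PDF rectangle (origin bottom-left) to SVG x, y, width, height
    (origin top-left). Assumes an unrotated page whose origin is at (0, 0)."""
    x0, y0, x1, y1 = rect
    left, right = min(x0, x1), max(x0, x1)
    bottom, top = min(y0, y1), max(y0, y1)
    return (left, page_height - top, right - left, top - bottom)


def add_link_overlays(svg_text, links, page_height):
    """Append one invisible, clickable <a><rect/></a> per link.

    links is a list of (pdf_rect, url) pairs; extracting them from the PDF
    is assumed to happen elsewhere.
    """
    root = ET.fromstring(svg_text)
    for rect, url in links:
        x, y, w, h = pdf_rect_to_svg(rect, page_height)
        anchor = ET.SubElement(root, f"{{{SVG_NS}}}a", {f"{{{XLINK_NS}}}href": url})
        ET.SubElement(anchor, f"{{{SVG_NS}}}rect", {
            "x": str(x), "y": str(y), "width": str(w), "height": str(h),
            "fill": "none",
            "pointer-events": "all",  # clickable even though nothing is painted
        })
    return ET.tostring(root, encoding="unicode")
```

Because the overlays are appended last, they sit on top of the drawn page content, so clicks reach them first. A rotated page or a non-zero page origin would need an extra transform that this sketch does not handle.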

Related

How to merge pages of a PDF document into a single page from command line?

I have an HTML document to be printed in an 80mm paper roll. So, I convert the HTML document to PDF and use the lp command to print it. Problem is: the generated PDF file breaks my document into pages. What I need is to merge PDF pages into a single page which has the height of the HTML document I am converting from

How do I convert RMarkdown ioslides presentations to 2-up PDFs programmatically?

I use rmarkdown to generate ioslides HTML presentations, using custom css. This bit is great and I love it. My question is about generating 'notes' versions of presentations.
The only way I've seen to get 2-up A4 PDF notes from these slides is to print from Safari: click Print..., then landscape, then layout 2 pages, then border = hairline, then Save As, then find the right folder, etc. However, it gets the formatting and fonts right, and WebKit renders things that Chrome or other solutions will not.
This is fine for one copy. But I am now regularly updating between 9 and 30 separate presentations at a time and all the clicking sends me nuts, especially when I need to update just a small issue, and I want to check all files have been regenerated as PDF.
Is there any solution to rapidly and programmatically generate a 2-up PDF version of a set of RMarkdown ioslides presentation slides? Alternatively good workarounds would be appreciated.
You can use the webshot package to capture the rendered HTML output and save it to a file (PNG, JPEG, or PDF).
Assuming you have a file called testPres.Rmd stored in the same working directory as the following script, it will convert the presentation to a PDF:
# Setup
install.packages("webshot")
webshot::install_phantomjs()
library(webshot)
library(rmarkdown)
rmdshot("testPres.Rmd", "document.pdf")
Having created a PDF of the slides, we now need to convert it into a 2-up PDF (two slides per page). There is probably a more elegant way of doing this, but you could use a very basic R Markdown document. The following document places all the slides in a two-per-page layout:
---
output: pdf_document
header-includes:
- \usepackage{pdfpages}
papersize: a4paper
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
\includepdf[pages={1-},scale=0.75, nup=1x2]{document.pdf}
I am not sure this meets your exact requirements, but hopefully it is enough to set you in the right direction.
You can check the documentation of the pdfpages LaTeX package to customise how the slides appear in the document (add margins, borders, etc.).

Make wkhtmltopdf to render text instead of curves

When converting HTML into a PDF with wkhtmltopdf, it seems that with the default options the text gets rendered to curves instead of producing a text-based PDF.
As a consequence, it's not possible to select the text in the PDF (it is just a bunch of curves resembling text), and there are rendering problems as well (because font rendering is not delegated to the PDF viewer).
Additional info
There's much more context in here:
https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2999
Questions
Q1) How can I tell wkhtmltopdf to render the document by placing text instead of converting text to curves?
Q2) How can I ensure that wkhtmltopdf embeds the needed fonts inside the document just in case the destination machine does not have it?

How can I easily crop a PDF page?

How can I easily crop a PDF page in a given PDF file? I prefer using as little coding as possible, and guess border geometries as little as possible...
There are several options:
Crop by point-and-click using a GUI front-end:
pdf-quench
krop
briss
PDF scissors
Crop by using the command line:
pdfcrop command (provided by texlive-extra-utils), using the following arguments: pdfcrop --margins '-30 -30 -250 -150' --clip input.pdf output.pdf (-left -top -right -bottom format).
PDFCrop
convert -crop command (provided by imagemagick)
Ghostscript
Crop by writing your own script:
Python
LaTeX
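As a small illustration of the scripting route, here is a minimal Python sketch that wraps the pdfcrop command line listed above. It uses only the `--margins` and `--clip` arguments quoted in that list and assumes pdfcrop is installed and on the PATH; the helper names `build_pdfcrop_command` and `crop_pdf` are hypothetical.

```python
import subprocess


def build_pdfcrop_command(input_pdf, output_pdf, left, top, right, bottom):
    """Build the pdfcrop argument list.

    The four margins are passed in the order left, top, right, bottom;
    negative values cut into the page, as in the example above.
    """
    margins = f"{left} {top} {right} {bottom}"
    return ["pdfcrop", "--margins", margins, "--clip", input_pdf, output_pdf]


def crop_pdf(input_pdf, output_pdf, left, top, right, bottom):
    """Run pdfcrop; raises CalledProcessError if the tool reports failure."""
    cmd = build_pdfcrop_command(input_pdf, output_pdf, left, top, right, bottom)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the space-separated margins string. For example, `crop_pdf("input.pdf", "output.pdf", -30, -30, -250, -150)` runs the same command as the one-liner shown in the list.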
For quick, GUI-aided PDF cropping tasks, try pdfarranger (available in Debian repos, formerly known as PDF-Shuffler).
For precise point-and-click cropping, one option is to use LibreOffice Draw.
The instructions below assume you want to crop part of a single-page PDF:
Start with a blank document
Select the Insert > Image... menu
Navigate to the PDF you wish to crop
The contents of the PDF will show up as an image
Right-click on the PDF content in your document and select the "Crop" menu item.
Use the handles to resize the viewable area of the PDF to the section you want to remain after cropping
Click outside of the PDF to disable the crop handles
Click again on the PDF content to position it however you want by:
Dragging it around the page
Using the arrow keys to move it
Use the Draw positioning tools to align or center the PDF content.
When you're happy with the result, save, export it to PDF, or print it.
For multi-page PDFs, you'll have to work page by page: first split the PDF into single pages using some other tool like PDF Arranger (or simply "Print to PDF" each page you want to crop from your PDF viewer), crop them one by one with Draw, then recombine them into a single PDF (using PDF Arranger again).
You could try using the pdfCropMargins Python program (https://pypi.org/project/pdfCropMargins/) with the -pg option to select the particular page. The command-line program offers many options, and also has an optional GUI.
You can use Inkscape to losslessly crop PDFs. This uses Inkscape's built-in SVG-PDF conversion.
Open your file in Inkscape: File -> Open -> select your file -> Open
Resize PDF:
Using user-input values: File -> Document properties -> Page -> Custom size
Using auto resize to content: File -> Document properties -> Page -> Custom size -> Resize page to content... -> set desired margin -> Resize page to drawing or selection
Inkscape is a particularly good option as often PDF crop utilities (such as krop, mentioned in other answers) do not change the actual size of the object, instead adjusting how much of the object (e.g. an A4 page) is displayed.
E.g. from krop homepage:
Unfortunately, there is no simple way to eliminate
unnecessary/invisible parts of a PDF file. krop only adjusts which
parts of a PDF are displayed; the original content is still there in
the file and will, for instance, show up when editing the file in
inkscape
Editing directly in Inkscape does exactly what this says is impossible.
The list of tools provided by @sparkler was interesting, but did not help me very much.
Some of the tools did crop my pages, but they usually involved a conversion to an image, which made the PDF files blurry and hard to read.
In the end I used podofocrop of PoDoFo tools which was able to retain all the graphics at full resolution and the text as real text.
It will crop all pages to the minimal size (i.e. without a border).
The command is: podofocrop input.pdf output.pdf
To install on MacOS use brew install podofo

Undo Pdfnup Operation

I have a Pdf file which contains several slides per page, including text (not only images).
This pdf was probably created using pdfnup.
Can I revert the pdfnup operation so that each slide is shown on one page?
As far as I know, there is no simple, ready-to-use 'undo' operation.
However, the following answers show the general approach for achieving the equivalent of an undo with Ghostscript:
Convert PDF 2 sides per page to 1 side per page (Superuser)
How can I split a PDF's pages down the middle? (Superuser)
Cropping a PDF using Ghostscript 9.01 (Stackoverflow)
PDF - Remove White Margins (Stackoverflow)
(Should these not help you to find the final solution, ask again. But then to come up with a fully working commandline, I'd need the complete output of the following command first: pdfinfo -f 1 -l 100 -box your.pdf.)
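The splitting idea behind those answers can be sketched as a small calculation: work out one crop rectangle per slide, then crop the n-up page once per rectangle. The Python below is only an illustration under strong assumptions: the slides sit in a regular grid that fills the whole page with no outer margins or gaps, and they are ordered left to right, top to bottom. The function name `nup_cell_boxes` is hypothetical, and real pages will usually need the actual box values reported by pdfinfo.

```python
def nup_cell_boxes(page_width, page_height, cols, rows):
    """Return one (left, bottom, right, top) box per grid cell, in PDF
    coordinates (origin at the bottom-left corner of the page).

    Cells are listed in reading order: left to right, top row first.
    """
    cell_w = page_width / cols
    cell_h = page_height / rows
    boxes = []
    for row in range(rows):  # row 0 is the top row
        top = page_height - row * cell_h
        for col in range(cols):
            left = col * cell_w
            boxes.append((left, top - cell_h, left + cell_w, top))
    return boxes
```

For a landscape A4 page (842 x 595 points) holding two slides side by side, `nup_cell_boxes(842, 595, 2, 1)` gives the left half and the right half of the page; each box can then be used as the crop region for one output page.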