How can I easily crop a PDF page?

How can I easily crop a PDF page in a given PDF file? I prefer using as little coding as possible, and guess border geometries as little as possible...

There are several options:
Crop by point-and-click using a GUI front-end:
pdf-quench
krop
briss
PDF scissors
Crop by using the command line:
pdfcrop command (provided by texlive-extra-utils), for example: pdfcrop --margins '-30 -30 -250 -150' --clip input.pdf output.pdf. The four --margins values are given in left, top, right, bottom order, and negative values cut into the detected content area instead of padding it.
PDFCrop
convert -crop command (provided by imagemagick)
Ghostscript
Crop by writing your own script:
Python
LaTeX
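
For the "write your own script" route, here is a minimal Python sketch. It assumes the third-party pypdf package is installed (the list above does not name a library), and the helper names shrink_box and crop_page are made up for illustration. Margins are given in left, top, right, bottom order and are measured from the page edge, so positive values trim the page. Only the page's crop box is changed, so the hidden content stays in the file.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (llx, lly, urx, ury) in PDF points


def shrink_box(box: Sequence[float], left: float, top: float,
               right: float, bottom: float) -> Box:
    """Return `box` with the given amounts trimmed from each edge.

    PDF coordinates start at the lower-left corner, so `bottom` raises the
    lower edge and `top` lowers the upper edge.
    """
    llx, lly, urx, ury = (float(v) for v in box)
    new = (llx + left, lly + bottom, urx - right, ury - top)
    if new[0] >= new[2] or new[1] >= new[3]:
        raise ValueError("margins leave no visible area")
    return new


def crop_page(src: str, dst: str, page_index: int, left: float, top: float,
              right: float, bottom: float) -> None:
    """Write a copy of `src` to `dst` with one page's crop box shrunk."""
    # Assumption: pypdf is installed. Imported lazily so that shrink_box
    # stays usable without it.
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import RectangleObject

    reader = PdfReader(src)
    writer = PdfWriter()
    for index, page in enumerate(reader.pages):
        if index == page_index:
            page.cropbox = RectangleObject(
                shrink_box(page.mediabox, left, top, right, bottom))
        writer.add_page(page)
    with open(dst, "wb") as handle:
        writer.write(handle)
```

A call such as crop_page("input.pdf", "output.pdf", 0, 30, 30, 250, 150) would trim the first page only and copy the other pages unchanged.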

For quick, GUI-aided PDF cropping tasks, try pdfarranger (available in Debian repos, formerly known as PDF-Shuffler).

For precise point-and-click cropping, one option is to use LibreOffice Draw.
The instructions below assume you want to crop part of a single-page PDF:
Start with a blank document
Select the Insert > Image... menu
Navigate to the PDF you wish to crop
The contents of the PDF will show up as an image
Right-click on the PDF content in your document and select the "Crop" menu item.
Use the handles to resize the viewable area of the PDF to the section you want to remain after cropping
Click outside of the PDF to disable the crop handles
Click again on the PDF content to position it however you want by:
Dragging it around the page
Using the arrow keys to move it
Use the Draw positioning tools to align or center the PDF content.
When you're happy with the result, save, export it to PDF, or print it.
For multi-page PDFs, you'll have to work page by page: first split the PDF into single pages using some other tool like PDF Arranger (or simply "Print to PDF" each page you want to crop from your PDF viewer), crop them one by one with Draw, then recombine them into a single PDF (using PDF Arranger again).

You could try using the pdfCropMargins Python program (https://pypi.org/project/pdfCropMargins/) with the -pg option to select the particular page. The command-line program offers many options, and also has an optional GUI.

You can use Inkscape to losslessly crop PDFs. This uses Inkscape's built-in SVG-PDF conversion.
Open your file in Inkscape: File -> Open -> select your file -> Open
Resize PDF:
Using user-input values: File -> Document properties -> Page -> Custom size
Using auto resize to content: File -> Document properties -> Page -> Custom size -> Resize page to content... -> set desired margin -> Resize page to drawing or selection
Inkscape is a particularly good option as often PDF crop utilities (such as krop, mentioned in other answers) do not change the actual size of the object, instead adjusting how much of the object (e.g. an A4 page) is displayed.
E.g. from krop homepage:
Unfortunately, there is no simple way to eliminate
unnecessary/invisible parts of a PDF file. krop only adjusts which
parts of a PDF are displayed; the original content is still there in
the file and will, for instance, show up when editing the file in
inkscape
Editing directly in Inkscape does exactly what this says is impossible.

The list of tools provided by @sparkler was interesting, but did not help me very much.
Some of the tools did crop my pages, but they usually involved a conversion to an image, which made the PDF files blurry and hard to read.
In the end I used podofocrop of PoDoFo tools which was able to retain all the graphics at full resolution and the text as real text.
It will crop all pages to the minimal size (i.e. without a border).
The command is: podofocrop input.pdf output.pdf
To install on macOS, use: brew install podofo

Related

Embedding PDF graphics in PDF output file programmatically

I am looking for a rough overview of how one would go about embedding graphics (coming from a PDF file) into another PDF file when writing a C++ document processor.
Background: I work on the LilyPond music typesetter, and recently added Cairo output to the system. Now I would like to support adding externally provided graphics to the PDF files that we generate (eg. adding a logo onto page laid out). This is trivial with EPS for PS output.
I can see how you could hook up Poppler to read the PDF, and render the PDF contents onto a Cairo surface, but I wonder if there is a simpler shortcut (eg. embed the PDF file as a binary stream, and then point directly to that stream).

If you need to go via an external route, like reading the PDF and writing it into an existing PDF using Cairo, that would be simpler. To do it manually:
A PDF page consists of a stream of operators for drawing it, and a dictionary of external resources (fonts, images etc.). To stamp one PDF page onto another, you would need to:
a) Find all objects for external resources in the stamp which are needed, and add them to the destination PDF.
b) Convert the page to a "Form XObject", which is a sort of reusable piece of content. Add this to the /XObject entry in the resources of the destination page, making sure to pick a fresh name.
c) Add some operators to the page content in the destination page to invoke the new XObject.
To see how this might work, you could play with -stamp-as-xobject and -postpend-content "/XObjName Do" from section 8.4 of the cpdf manual.
Making this work for arbitrary PDFs is really not for the faint of heart, I'm afraid.
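
Steps (b) and (c) can be sketched in a few lines of Python. This is only an illustration of the operators involved, not a full stamping tool: the helper names and the /Stamp prefix are hypothetical, and copying the resources in step (a) is not covered.

```python
from typing import Iterable


def fresh_xobject_name(existing: Iterable[str], prefix: str = "/Stamp") -> str:
    """Return a name such as /Stamp0 that is not already in `existing`."""
    taken = set(existing)
    counter = 0
    while f"{prefix}{counter}" in taken:
        counter += 1
    return f"{prefix}{counter}"


def _num(value: float) -> str:
    """Format a number in fixed notation (PDF has no exponent syntax)."""
    return f"{value:.4f}".rstrip("0").rstrip(".")


def invoke_xobject(name: str, x: float, y: float, scale: float = 1.0) -> bytes:
    """Build content-stream operators that paint the form XObject `name`.

    q and Q save and restore the graphics state, cm applies a
    scale-and-translate matrix, and Do paints the XObject.
    """
    matrix = " ".join(_num(v) for v in (scale, 0, 0, scale, x, y))
    return f"q {matrix} cm {name} Do Q\n".encode("ascii")
```

The returned bytes would be appended to the destination page's content stream, after the form has been registered under the same name in that page's resources.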

How to convert a "pdf" to "odg" file with OpenOffice cmd

I can easily convert a pdf to an odt file using:
soffice --infilter="writer_pdf_import" --convert-to odt a.pdf
But when I try to do:
soffice --infilter="writer_pdf_import" --convert-to odg a.pdf
I get an error:
no export filter

TL;DR: the answer is at the bottom, but do read the following to see why there can be issues.
ODG is a multi-part graphics file, usually built from a blank template and often similar to an ORA. There are many ways such files can be structured and converted to a set of PDF page printouts, since they contain thumbnails plus one or more high-resolution images or scalable vector graphics. Common variants can be used with Inkscape, Krita, possibly Scribus or OpenOffice Draw, and other graphics-oriented apps.
PDF is a page-oriented document output format, and thus not a suitable candidate for converting to professional images with scalable graphics (but see the last comment).
Converting ODG or ORA to an image may work well, but the reverse is not usually true.
An OpenOffice graphic is like a DocX: a zip wrapper around a core object. Here it is a JPEG, but it could be PNG, SVG, etc.
However, the contents of the zip are not simple, potentially running to thousands of lines of code. Thus you need a more appropriate method to hand-build an ODG, not a simple command-line conversion from the cruder PDF.
The real strength of exporting from Draw as PDF is the hybrid option of embedding the ODG content, so that when you open such a PDF you can edit it in Draw.
It will look just as good in any PDF viewer. However, it is too specialised to be translated simply without the app settings. In reality, such a PDF is a chimera/polyglot ODG.
But if you wish to try with simple files, the command line for converting a.pdf to a.odg is:
soffice --infilter="draw_pdf_import" --convert-to odg a.pdf
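
To run the same conversion over several files, a small Python wrapper could look like the sketch below. The function names are hypothetical, and the --headless and --outdir options are additions that do not appear in the command above; it also assumes soffice is on the PATH.

```python
import shutil
import subprocess
from typing import Iterable, List


def pdf_to_odg_command(pdf_path: str, out_dir: str = ".") -> List[str]:
    """Build the soffice command line that converts one PDF to ODG."""
    return [
        "soffice",
        "--headless",                      # addition: run without the GUI
        "--infilter=draw_pdf_import",      # import filter from the answer
        "--convert-to", "odg",
        "--outdir", out_dir,               # addition: where to write output
        pdf_path,
    ]


def convert_all(pdf_paths: Iterable[str], out_dir: str = ".") -> None:
    """Convert each PDF in turn, stopping at the first failure."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice was not found on the PATH")
    for pdf_path in pdf_paths:
        subprocess.run(pdf_to_odg_command(pdf_path, out_dir), check=True)
```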

How do I convert RMarkdown ioslides presentations to 2-up PDFs programmatically?

I use rmarkdown to generate ioslides HTML presentations, using custom css. This bit is great and I love it. My question is about generating 'notes' versions of presentations.
The only way I've seen to get 2-up A4 PDF notes from these slides is to print from Safari: click Print..., then landscape, then layout 2 pages, then border = hairline, then Save As, then find the right folder, etc. However, it gets the formatting and fonts right, and WebKit renders things that Chrome or other solutions will not.
This is fine for one copy. But I am now regularly updating between 9 and 30 separate presentations at a time and all the clicking sends me nuts, especially when I need to update just a small issue, and I want to check all files have been regenerated as PDF.
Is there any solution to rapidly and programmatically generate a 2-up PDF version of a set of RMarkdown ioslides presentation slides? Alternatively good workarounds would be appreciated.

You can use the webshot package to capture the rendered HTML output and save it to a file (PNG, JPEG, or PDF). The package documentation has more details.
Assuming you have a file called testPres.Rmd stored in the same working directory as the following script, it will convert the presentation to a PDF:
# Setup
install.packages("webshot")
webshot::install_phantomjs()
library(webshot)
library(rmarkdown)
rmdshot("testPres.Rmd", "document.pdf")
Having created a PDF of the slides, we now need to lay them out two to a page. There is probably a more elegant way of doing this, but you could use a very basic R Markdown document. The following document loads all the slides into a two-per-page layout:
---
output: pdf_document
header-includes:
- \usepackage{pdfpages}
papersize: a4paper
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
\includepdf[pages={1-},scale=0.75, nup=1x2]{document.pdf}
I am not sure this meets your exact requirements, but hopefully it is enough to set you in the right direction.
You can check the documentation of the pdfpages LaTeX package to customise how the slides appear in the document (add margins, borders, etc.).
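
Since the question involves many presentations, the pdfpages step could also be generated per file. The Python sketch below writes a standalone LaTeX wrapper for each slide PDF, reusing the \includepdf options shown above. The function names and the "-2up.tex" suffix are hypothetical, and compiling the wrappers (for example with pdflatex) is left as a separate step.

```python
from pathlib import Path
from typing import Iterable, List


def two_up_wrapper(slides_pdf: str) -> str:
    """Return a standalone LaTeX document that lays `slides_pdf` out 2-up."""
    return "\n".join([
        r"\documentclass[a4paper]{article}",
        r"\usepackage{pdfpages}",
        r"\begin{document}",
        rf"\includepdf[pages={{1-}},scale=0.75,nup=1x2]{{{slides_pdf}}}",
        r"\end{document}",
        "",
    ])


def write_wrappers(slide_pdfs: Iterable[str], out_dir: str = ".") -> List[Path]:
    """Write one <name>-2up.tex wrapper per slide PDF and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in slide_pdfs:
        tex_path = target / (Path(pdf).stem + "-2up.tex")
        tex_path.write_text(two_up_wrapper(pdf), encoding="utf-8")
        written.append(tex_path)
    return written
```

Each generated file refers to its slide PDF by the path passed in, so the wrappers need to be compiled from a directory where that path resolves.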

Automatically remove all PDF content outside a crop area

For a deck of lecture slides, I have extracted several vector illustrations from a PDF-file. I did this by highlighting the relevant area in Preview.app, copying, and opening a new file from the clipboard.
The figures look just fine, even though I noticed that the files are a little large. When I open them in Illustrator, I can see what's described in the screenshot – that all of the page content is still there, it's just hidden because it lies outside the crop area.
Now I could simply remove everything except the relevant figures in Illustrator, but I would much rather automate the process, since I have a large number of figures.
How can I automate this process such that everything outside the crop area is discarded and everything inside it is preserved as a vector image?

You can use a redaction utility to remove the content.
Go to https://doxiview.cib.de/showcase/index.html?locale=default
Choose the redact tool
Upload your PDF
On the right, choose Select Area and set the redact fill color to white
Mark all the content you want to remove
Click Apply
Download the PDF
Afterwards you can crop the PDF, and the removed content will no longer be in the file.

There's no need to rasterize. Just crop the pages then use Acrobat DC to "Sanitize" the document. That will completely remove any non-visible parts of the file.

In Acrobat Pro, go to Preflight and select the setting below.
Then click edit to the right
You should be able to create Adobe droplets with this preflight setting for automation

How to clip and concatenate a page region in multiple pdf files with one page each?

I have a lot of pdf files each one with an image inside. I want to clip a rectangular region in each of these files and concatenate them into a single pdf file. Is it possible with ghostscript or similar?

I'll have a go at this. Try Briss if you want to crop rectangular regions in pdf files. It's free and cross-platform GUI.
If you have multiple pdf files you can concatenate/merge them first online using http://www.pdfmerge.com/ Then use Briss to crop the images out into a new pdf file. Or vice-versa depending on the location of your images inside the pdf files.
After you fire up Briss, load the merged pdf file containing the images. When you're asked if you want to exclude anything, just click "cancel" if you want to include all pages.
If your file has many pages, similar pages may be overlapping each other so you can draw a rectangle over the region you want to crop. Click Action -> Preview for previewing the output. Click Action -> Crop PDF to finalize your output pdf file. Cheers.
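
For a scripted alternative to the GUI workflow, here is a minimal Python sketch. It assumes the third-party pypdf package is installed (the answer above does not use it), and the function names are hypothetical. It sets the page boxes of the first page of each file to the requested rectangle and appends the pages to one output file. The content outside the rectangle is hidden, not deleted.

```python
from typing import Iterable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (llx, lly, urx, ury) in PDF points


def clamp_rect(rect: Sequence[float], page_box: Sequence[float]) -> Box:
    """Intersect the requested rectangle with the page box."""
    rx0, ry0, rx1, ry1 = (float(v) for v in rect)
    px0, py0, px1, py1 = (float(v) for v in page_box)
    clipped = (max(rx0, px0), max(ry0, py0), min(rx1, px1), min(ry1, py1))
    if clipped[0] >= clipped[2] or clipped[1] >= clipped[3]:
        raise ValueError("rectangle does not overlap the page")
    return clipped


def clip_and_concatenate(paths: Iterable[str], rect: Sequence[float],
                         dst: str) -> None:
    """Clip page 1 of every input file to `rect` and merge into `dst`."""
    # Assumption: pypdf is installed. Imported lazily so that clamp_rect
    # stays usable without it.
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import RectangleObject

    writer = PdfWriter()
    for path in paths:
        page = PdfReader(path).pages[0]
        box = RectangleObject(clamp_rect(rect, page.mediabox))
        page.mediabox = box  # page size becomes the clipped region
        page.cropbox = box   # visible area matches the page size
        writer.add_page(page)
    with open(dst, "wb") as handle:
        writer.write(handle)
```

The rectangle is given in PDF points with the origin at the lower-left corner of the page, for example clip_and_concatenate(["a.pdf", "b.pdf"], (50, 100, 400, 400), "merged.pdf").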
