How to clip and concatenate a page region in multiple pdf files with one page each? - pdf

I have a lot of pdf files each one with an image inside. I want to clip a rectangular region in each of these files and concatenate them into a single pdf file. Is it possible with ghostscript or similar?

I'll have a go at this. Try Briss if you want to crop rectangular regions in PDF files. It's a free, cross-platform GUI tool.
If you have multiple PDF files, you can first merge them online at http://www.pdfmerge.com/ and then use Briss to crop the images out into a new PDF file. Or do it the other way round, depending on where the images sit inside the PDF files.
After you fire up Briss, load the merged PDF file containing the images. When you're asked whether you want to exclude anything, just click "Cancel" to include all pages.
If your file has many pages, similar pages are shown overlaid on each other, so you can draw one rectangle over the region you want to crop. Click Action -> Preview to preview the output, then click Action -> Crop PDF to write the final PDF file. Cheers.
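If you would rather script it than click through a GUI, here is a minimal sketch in Python. It assumes the third-party pypdf package is installed, that every input has a single page, and that the same rectangle (given in PDF points, origin at the bottom-left corner) applies to every file. The function names and the example coordinates are made up for illustration. Note that setting a crop box only changes which part of the page is displayed; the content outside the rectangle stays in the file.

```python
def clamp_box(box, page_box):
    """Intersect the requested (left, bottom, right, top) box with the page box."""
    left = max(box[0], page_box[0])
    bottom = max(box[1], page_box[1])
    right = min(box[2], page_box[2])
    top = min(box[3], page_box[3])
    if left >= right or bottom >= top:
        raise ValueError("crop box lies outside the page")
    return (left, bottom, right, top)


def crop_and_merge(input_paths, box, output_path):
    """Set the same crop box on page 1 of each input and write one merged PDF."""
    # pypdf is imported here so that clamp_box stays usable without it.
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import RectangleObject

    writer = PdfWriter()
    for path in input_paths:
        page = PdfReader(str(path)).pages[0]
        media = page.mediabox
        page_box = (float(media.left), float(media.bottom),
                    float(media.right), float(media.top))
        # Only the visible area changes; hidden content is not removed.
        page.cropbox = RectangleObject(clamp_box(box, page_box))
        writer.add_page(page)
    with open(output_path, "wb") as handle:
        writer.write(handle)


# Example call (hypothetical file names and coordinates):
# crop_and_merge(["a.pdf", "b.pdf"], (50, 200, 400, 600), "clipped.pdf")
```

The pages end up in the output in the order of input_paths, so sort the list first if the order matters.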

Related

How to find whether the text overlaps with the overlay in a PDF

I am using PDFBox version 2.0.25 to manipulate PDF files. I add overlays to the PDF files. These overlays consist of various texts, images and shapes, and I want to check whether they overlap with the existing text and images of the PDF.
I have found a way to extract the images and text locations of the PDF. But I am unable to find a way to extract overlay information.
Can you please help?

How to replace a specific image within a pdf?

I have a PDF with 3 images.
I want to find each image and replace it with another image.
I saw the original paths in the PDF under xmpMM:Ingredients:
I tried to change them with Notepad++, but it looks like the images are already embedded and changing the path does nothing.
How can I find each image and replace it with another image?
The xmp stuff is information only. The actual images are embedded streams in the pdf file. Finding the correct streams to replace and replacing them isn't a simple problem, and can't be done with notepad. You'll need a library / toolkit that can modify PDFs, like https://pdf-lib.js.org/ or similar.
The PDF file looks like an Illustrator file, which adds another layer of weirdness - Illustrator can write PDFs that have both PDF and Illustrator versions of the content, and you see one in Acrobat and the other in Illustrator.
It's probably easier to recreate the PDF from whatever source produced it.

Create thumbnails using MigraDoc or PDFsharp

We have a need to take a single PDF file, break it into separate page thumbnails, and based on user input, put together selected pages into a new PDF document.
Can someone show a quick example of how to take a single PDF document and generate a thumbnail preview of each page using either MigraDoc or PDFsharp?
Those who read FAQ lists will know that neither PDFsharp nor MigraDoc can render PDF files.
To create thumbnails from PDF pages you have to render them.
You'll need a different library to create thumbnails.
http://pdfsharp.net/wiki/PDFsharpFAQ.ashx

Automatically remove all PDF content outside a crop area

For a deck of lecture slides, I have extracted several vector illustrations from a PDF-file. I did this by highlighting the relevant area in Preview.app, copying, and opening a new file from the clipboard.
The figures look just fine, even though I noticed that the files are a little large. When I open them in Illustrator, I can see what's described in the screenshot – that all of the page content is still there, it's just hidden because it lies outside the crop area.
Now I could simply remove everything except the relevant figures in Illustrator, but I would much rather automate the process, since I have a large number of figures.
How can I automate this process such that everything outside the crop area is discarded and everything inside it is preserved as a vector image?
You can use a redaction utility to remove the content:
Go to https://doxiview.cib.de/showcase/index.html?locale=default
Choose the redact tool.
Upload your PDF.
On the right, choose Select Area and set the redact fill color to white.
Mark all the content you want to remove.
Click Apply.
Download the PDF.
Afterwards you can crop the PDF, and the removed content will no longer be in the file.
There's no need to rasterize. Just crop the pages then use Acrobat DC to "Sanitize" the document. That will completely remove any non-visible parts of the file.
In Acrobat Pro, go to Preflight and select the setting below.
Then click edit to the right
You should be able to create Adobe droplets with this preflight setting for automation

How can I easily crop a PDF page?

How can I easily crop a PDF page in a given PDF file? I would prefer to write as little code as possible and to guess border geometries as little as possible.
There are several options:
Crop by point-and-click using a GUI front-end:
pdf-quench
krop
briss
PDF scissors
Crop by using the command line:
pdfcrop command (provided by texlive-extra-utils), for example: pdfcrop --margins '-30 -30 -250 -150' --clip input.pdf output.pdf (the margins are given in left, top, right, bottom order; negative values cut into the page).
PDFCrop
convert -crop command (provided by imagemagick)
Ghostscript
Crop by writing your own script:
Python
LaTeX
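For many files, the pdfcrop invocation listed above can be driven from a short script. The following is only a sketch: it assumes pdfcrop is on the PATH, it reuses exactly the flags shown above (--margins and --clip), and the function names and the "-cropped" suffix are invented for illustration.

```python
import subprocess
from pathlib import Path


def pdfcrop_command(src, dst, margins):
    """Build the pdfcrop argument list; margins is (left, top, right, bottom)."""
    margin_arg = " ".join(str(m) for m in margins)
    return ["pdfcrop", "--margins", margin_arg, "--clip", str(src), str(dst)]


def crop_all(paths, margins, suffix="-cropped"):
    """Run pdfcrop once per file, writing <name>-cropped.pdf next to each input."""
    outputs = []
    for src in map(Path, paths):
        dst = src.with_name(src.stem + suffix + src.suffix)
        # check=True raises CalledProcessError if pdfcrop exits with an error.
        subprocess.run(pdfcrop_command(src, dst, margins), check=True)
        outputs.append(dst)
    return outputs


# Example call (hypothetical folder):
# crop_all(sorted(Path("slides").glob("*.pdf")), (-30, -30, -250, -150))
```

Passing the arguments as a list avoids shell quoting problems with the space-separated margin string.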
For quick, GUI-aided PDF cropping tasks, try pdfarranger (available in Debian repos, formerly known as PDF-Shuffler).
For precise point-and-click cropping, one option is to use LibreOffice Draw.
The instructions below assume you want to crop part of a single-page PDF:
Start with a blank document
Select the Insert > Image... menu
Navigate to the PDF you wish to crop
The contents of the PDF will show up as an image
Right-click on the PDF content in your document and select the "Crop" menu item.
Use the handles to resize the viewable area of the PDF to the section you want to remain after cropping
Click outside of the PDF to disable the crop handles
Click again on the PDF content to position it however you want by:
Dragging it around the page
Using the arrow keys to move it
Use the Draw positioning tools to align or center the PDF content.
When you're happy with the result, save, export it to PDF, or print it.
For multi-page PDFs, you'll have to work page by page: first split the PDF into single pages using another tool such as PDF Arranger (or simply "Print to PDF" each page you want to crop from your PDF viewer), crop the pages one by one with Draw, then recombine them into a single PDF (using PDF Arranger again).
You could try using the pdfCropMargins Python program (https://pypi.org/project/pdfCropMargins/) with the -pg option to select the particular page. The command-line program offers many options, and also has an optional GUI.
You can use Inkscape to losslessly crop PDFs. This uses Inkscape's built-in SVG-PDF conversion.
Open your file in Inkscape: File -> Open -> select your file -> Open
Resize PDF:
Using user-input values: File -> Document properties -> Page -> Custom size
Using auto resize to content: File -> Document properties -> Page -> Custom size -> Resize page to content... -> set desired margin -> Resize page to drawing or selection
Inkscape is a particularly good option because PDF crop utilities (such as krop, mentioned in other answers) often do not change the actual size of the object; they only adjust how much of the object (e.g. an A4 page) is displayed.
E.g. from krop homepage:
Unfortunately, there is no simple way to eliminate
unnecessary/invisible parts of a PDF file. krop only adjusts which
parts of a PDF are displayed; the original content is still there in
the file and will, for instance, show up when editing the file in
inkscape
Editing directly in Inkscape does exactly what this says is impossible.
The list of tools provided by @sparkler was interesting, but did not help me very much.
Some of the tools did crop my pages, but they usually involved a conversion to an image, which made the PDF files blurry and hard to read.
In the end I used podofocrop from the PoDoFo tools, which retained all the graphics at full resolution and kept the text as real text.
It will crop all pages to the minimal size (i.e. without a border).
The command is: podofocrop input.pdf output.pdf
To install on macOS, use brew install podofo