How to extract images from PDF using Ghostscript or ImageMagick?

I need to render or fetch all the images from a specific PDF file. How can I achieve this using Ghostscript or ImageMagick?

You cannot do it with Ghostscript, but you can do it with pdfimages, a command-line tool that ships with both Poppler and XPDF:
pdfimages -j some.pdf subdir/image-prefix
All the images will now be located in subdir/ named image-prefix-0001.jpg, image-prefix-0002.jpg ...
The -j parameter makes the command extract JPEGs directly where it can. Images that cannot be written as JPEGs are saved as PBM (monochrome) or PPM files, which you can always convert using ImageMagick:
convert subdir/image-prefix-0033.ppm subdir/image-prefix-0033.jpeg
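The single conversion above can be repeated for every fallback file in one go. This is only a minimal sketch: it assumes ImageMagick's convert is on the PATH and that PNG is an acceptable target format. The helper name convert_fallbacks and the CONVERT override variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert every PPM/PBM file in a directory to PNG.
# CONVERT is an illustrative override for the ImageMagick command name.
convert_fallbacks() {
    dir=$1
    for src in "$dir"/*.ppm "$dir"/*.pbm; do
        [ -e "$src" ] || continue      # skip a glob pattern that matched nothing
        dst=${src%.*}.png              # image-prefix-0033.ppm -> image-prefix-0033.png
        "${CONVERT:-convert}" "$src" "$dst"
    done
}
```

After running pdfimages as shown above, calling "convert_fallbacks subdir" would leave a PNG next to each PPM or PBM file; the originals are not deleted.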

You certainly can't do it in Ghostscript without writing your own Ghostscript device.
I doubt you can do it with ImageMagick either.
Have you looked at PDFtk?
If you are on Windows then a quick Google turns up:
http://www.somepdf.com/some-pdf-image-extract.html
And on Linux:
https://askubuntu.com/questions/150100/extracting-images-from-a-pdf

Example rendering a single page (page 1) as a grayscale PNG at 300 dpi:
gs -q -dBATCH -dNOPAUSE -sDEVICE=pnggray -r300 -dFirstPage=1 -dLastPage=1 -sOutputFile=1.png in.pdf
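To render every page rather than a single one, Ghostscript can number the output files itself when the output name contains a %d-style placeholder. Note that this rasterizes whole pages; it does not pull out the embedded images. A rough sketch, where the helper name render_pages, the GS override variable and the defaults (png16m, 300 dpi, prefix "page") are my own choices:

```shell
#!/bin/sh
# Hypothetical helper: render each page of a PDF to its own PNG file.
# GS is an illustrative override for the Ghostscript executable name.
render_pages() {
    pdf=$1
    prefix=${2:-page}
    dpi=${3:-300}
    # Ghostscript replaces %03d in the output name with the page number.
    "${GS:-gs}" -q -dBATCH -dNOPAUSE -sDEVICE=png16m "-r$dpi" \
        "-sOutputFile=$prefix-%03d.png" "$pdf"
}
```

For example, "render_pages in.pdf scan 150" would write scan-001.png, scan-002.png and so on at 150 dpi.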

Related

GHOSTSCRIPT - PS to PDF conversion paper size

I am trying to run regular conversions of PS to PDF but having some issues with Ghostscript.
Running under normal conditions, the output crops the top of the page, as you would expect, since the PS is laid out for A4 but doesn't define a page size.
However, when I use -sPAPERSIZE or change the default in gs_init as described here, it prints a blank page.
I tried on a colleague's PC, which is running Adobe Distiller, and the conversion worked perfectly. I also tried using PDF24 rather than GS directly - it cropped the same way, but I couldn't find an init file to change.
Unfortunately the PS files are auto-generated, so changing them isn't an option.
Windows 10 10.0.17763 x64
GS 9.53.3
PDF24 9.2.2
Adobe Distiller: Version unknown (probably older)
Solved my issue:
C:\Program Files (x86)\gs\gs9.53.3\bin>gswin32c.exe -sOutputFile="output.pdf" -dNOPAUSE -dBATCH -sPAPERSIZE=a4 -sDEVICE=pdfwrite -dSAFER "input.PS"
I solved it by combining the answers from here, which gave me the gs options:
https://stackoverflow.com/questions/30128250/ps2pdf-preserve-page-size#:~:text=An%20A4%20page%20has%20a,it%20comes%20to%20PDF%20output
and from here, which showed how to run gs from the command line (I could not get it to work any other way):
Keep getting error messages in ghostscript when using the documented ghostscript syntax
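Since the conversions are meant to run regularly, the working command can be wrapped in a small function. This is only a sketch for a POSIX shell (Linux, or something like Git Bash on Windows). The name ps_to_a4_pdf and the GS override variable are hypothetical; the switches are the ones from the command that worked.

```shell
#!/bin/sh
# Hypothetical wrapper around the command above; forces A4 via -sPAPERSIZE.
# GS is an illustrative override, e.g. GS=gswin32c on Windows.
ps_to_a4_pdf() {
    ps=$1
    pdf=${2:-${ps%.*}.pdf}     # default output: same name with a .pdf extension
    "${GS:-gs}" "-sOutputFile=$pdf" -dNOPAUSE -dBATCH -sPAPERSIZE=a4 \
        -sDEVICE=pdfwrite -dSAFER "$ps"
}
```

A loop such as "for f in *.PS; do ps_to_a4_pdf "$f"; done" would then convert a whole directory.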

PDF - Programmatically remove hyperlinks using Ghostscript

I have a PDF document which has hyperlinks at the bottom of each page. Last week I successfully removed them using the trial version of Adobe Acrobat X Pro on Windows, however since then I've mislaid the new document and I've installed Ubuntu 14.04. Is there a way I can programmatically do a (Tools > Edit Tool > Delete) action as I did on Windows using something like Ghostscript? I don't want to reinstall Windows, but I will if there's no alternative.
According to this answer on TeX Stack Exchange, discarding hyperlinks in PDFs is actually the default in Ghostscript, so simply running it like this will do the trick:
gs -sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/prepress \
-dNOPAUSE -dQUIET -dBATCH \
-sOutputFile=output.pdf \
input.pdf
You would need to set the option "-dPrinted=false" in order for ghostscript to leave hyperlinks in the output PDF.
With gs 9.27 the default behaviour has apparently changed; I needed
-dPreserveAnnots=false
to remove hyperlinks.
Source: https://www.ghostscript.com/doc/current/VectorDevices.htm
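Putting the two answers together, the full command for a newer Ghostscript might look like the sketch below. It simply adds -dPreserveAnnots=false to the switches from the first answer. The function name strip_links and the GS override variable are made up, and I have only the report above (gs 9.27) as evidence of which versions need the extra switch.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a PDF with pdfwrite and drop link annotations.
# GS is an illustrative override for the Ghostscript executable name.
strip_links() {
    src=$1
    dst=$2
    "${GS:-gs}" -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
        -dPDFSETTINGS=/prepress -dNOPAUSE -dQUIET -dBATCH \
        -dPreserveAnnots=false "-sOutputFile=$dst" "$src"
}
```

Usage would be "strip_links input.pdf output.pdf". Keep in mind that the switch name refers to annotations in general, so other annotations may be dropped along with the links.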

Converting PDF to Color PCL 5 with Ghostscript

I'm using Ghostscript 9.09 and trying to create color PCL 5 output, but I only get a monochrome file. What are the correct arguments for this?
Passing the following arguments:
-q
-dQUIET
-dBATCH
-dNOPAUSE
-sDEVICE=ljet4
-sOutputFile=d:\\output.pcl
c:\\input.pdf
The proper device is cljet5c. It is a .dev device, so you need to add cljet5c.dev to the device list in psi\msvc.mak and rebuild Ghostscript.

Ghostscript SVG output device

I'm led to believe that it's possible to output from Ghostscript to SVG, as described on this blog post:
gs -dBATCH -dSAFER -dNOPAUSE -sDEVICE=svg -sOutputFile=Logo.svg Logo.pdf
However, I just get "Unknown device: svg"
I am using Ghostscript 9.06
My question is: where do I get the svg device, and how do I install it? (Red Hat x64)
So far I have tried googling (many dead ends but no real mention of this output device) and looking on the Ghostscript website.
If you do gs -? the usage will give you a list of available devices. Presumably your build doesn't include the svg device, in which case you will need to rebuild your executable with that support included.
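That check can be scripted instead of reading the usage text by eye. A rough sketch, assuming the usage output lists device names separated by whitespace (as it does on the builds I have seen); the function name has_device and the GS override variable are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named device appears in the usage text.
# GS is an illustrative override for the Ghostscript executable name.
has_device() {
    # Split the usage text into one word per line, then look for an exact match.
    "${GS:-gs}" -h 2>/dev/null | tr -s ' \t' '\n\n' | grep -qx -- "$1"
}
```

For example, "has_device svg && echo 'svg is built in'". It matches any whole word in the usage text, so treat a positive result as a strong hint rather than proof; the same check works for any other device name.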
I do have svg as one of my available devices. It handles its output a little differently than you might expect: it sends the file to standard error instead of to the output file you specified. For this special case, set up your command line like this:
gs -dBATCH -dSAFER -dNOPAUSE -sDEVICE=svg Logo.pdf 2>Logo.svg
You may or may not want to look into the -q flag, which will suppress the usual standard output.

Linux command-line utility to remove colors in a PDF file?

I'm searching for a Linux command-line utility/script capable of removing colors in a PDF. The output of the utility should be the same PDF, but in grayscale.
Does anyone know how to do this?
Thanks
You can use Ghostscript:
gswin32c ^
-o grayscale.pdf ^
-sDEVICE=pdfwrite ^
-sColorConversionStrategy=Gray ^
-sProcessColorModel=DeviceGray ^
-dCompatibilityLevel=1.4 ^
c:/path/to/input.pdf
(example is for Windows; on Linux use gs instead of gswin32c and \ as a line continuation mark instead of ^).
Update
If color conversion does not work as desired and you see a message like "Unable to convert color space to Gray, reverting strategy to LeaveColorUnchanged", then your Ghostscript is probably a newer release from the 9.x version series and your source PDF likely uses an embedded ICC color profile.
In this case add -dOverrideICC to the command line and see if it changes the result as desired.
Also, the original answer contained a typo:
it used -sProcessColorModel=/DeviceGray (additional forward slash character)
instead of -sProcessColorModel=DeviceGray (no forward slash).
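Since the question asked about Linux, here is the same command as a small shell function, with the -dOverrideICC fallback from the update as an opt-in. It is a sketch only: the name to_gray, the --override-icc argument and the GS override variable are made up for this example, while the Ghostscript switches are the ones given above.

```shell
#!/bin/sh
# Hypothetical helper: convert a PDF to grayscale with pdfwrite.
# Pass --override-icc as a third argument to add -dOverrideICC.
# GS is an illustrative override for the Ghostscript executable name.
to_gray() {
    src=$1
    dst=$2
    extra=
    if [ "${3:-}" = "--override-icc" ]; then
        extra=-dOverrideICC
    fi
    # $extra is left unquoted on purpose so that it vanishes when empty.
    "${GS:-gs}" -o "$dst" -sDEVICE=pdfwrite \
        -sColorConversionStrategy=Gray -sProcessColorModel=DeviceGray \
        -dCompatibilityLevel=1.4 $extra "$src"
}
```

Try "to_gray input.pdf grayscale.pdf" first, and rerun with "--override-icc" only if the "Unable to convert color space to Gray" message shows up.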