Imagemagick/GhostScript conversion to jpeg/png ignores the pdf background - pdf

I am making thumbnails for PDF files (first page only). I use ImageMagick like this (simplified, without the resize; the problem is the same):
convert mreji.pdf[0] test.jpg
The problem is that it just ignores my pdf's background and turns it black. It's not transparent either (if I use png instead of jpg), it's just black. I want to keep the original background color.
Here is the test pdf: http://slides.bg/website/Uploads/Temp/mreji.pdf
And the imagemagick output here: http://slides.bg/website/Uploads/Temp/mreji.jpg
Notice that the background color is replaced with black. I want to keep the original one.
I tried using GhostScript directly
gs -sDEVICE=jpeg -sOutputFile=cover.jpg -r72 mreji.pdf
Again, the same output. Maybe there is an argument to prevent that from happening?

The problem may be with the "smooth shading" objects in that PDF.
There are a lot (29) of Type 2 (axial shading) smooth-shading objects in the PDF, used for the backgrounds. IIRC Ghostscript has had problems with these, and there have been a number of bug fixes over the years. What version of gs are you running?
The easiest solution is to rasterize the background in whatever application created the PDF.

Try adding the -flatten option. ImageMagick expects operators between the input and the output file name:
convert mreji.pdf[0] -flatten test.jpg
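A minimal sketch of the flatten suggestion, written as a Python helper so the argument order is explicit. The function name `build_thumbnail_cmd` and the white background are illustrative assumptions, not part of the original posts. Flattening only fills transparent areas with a solid colour; if Ghostscript itself fails to render the shaded background, this will not bring the original gradient back.

```python
import shlex


def build_thumbnail_cmd(pdf_path, out_path, background="white"):
    """Return the argv list for rendering page 1 of a PDF with ImageMagick.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "convert",
        f"{pdf_path}[0]",            # [0] selects the first page only
        "-background", background,   # colour used wherever the page is transparent
        "-flatten",                  # composite the page onto that background
        out_path,                    # the output file name comes last
    ]


# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(build_thumbnail_cmd("mreji.pdf", "test.jpg")))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`, assuming `convert` is on the PATH.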

Related

ImageMagick convert (PDF to JPG) returns white image with black stripes at the top

I've encountered some issues converting multipage PDF files to JPG with ImageMagick.
The process returns a white page with black stripes at the top. It seems like the text has been 'compressed' and written at the very top of the page. I see this behavior only with one particular PDF file (the others work fine). I am running ImageMagick-6.9.8-Q16 on Windows. I also tried ImageMagick-7.0.5 and got the same result. I also tried using Ghostscript (9.21) directly, and the output is the same. At this point I think there must be something wrong with the PDF.
Here's the pdf I am trying to convert to image and here is the result I obtain for the first page of the PDF. Subsequent ones are also like this.
Any clues on what's going on? Any suggestion on how to make it work?

Use Ghostscript / PostScript to convert all text colours to black within a PDF

I want to convert the white text in this PDF into black text and generate a new PDF with the changed text.
I have found this
http://www.artifex.com/files/Ghostscript_Color_Architecture.pdf
which mentions settings like -sTextICCProfile but using black_output.icc from
http://www(dot)ghostscript.com/doc/toolbin/color/icc_creator/effects/
like so:
gs -o test.pdf -sTextICCProfile=black_output.icc out.pdf
does not change the text colour to black.
Is the usage of the .icc profile incorrect? Is it even the right approach?
Is there a way to achieve this with postscript?
Example PDF
The usage of the ICCProfile is correct...
However, that usage is for rendering, it has no effect on the pdfwrite device at all (because it doesn't render the input, it turns it into a PDF file). So no, this is not the correct approach.
There is no real means to do what you want with Ghostscript. Technically it's probably possible, but it wouldn't be easy. You also apparently haven't posted an example of the PDF file. It's entirely possible that the 'text' is not actually text. It may be an image, or vectors that look like text.
There may also be transparency involved, which would complicate the matter still further.

Import vector graphics from PDF to GIMP

I need to extract vector graphics from a PDF image and import them into GIMP, either as paths or as high-resolution raster images. Specifically, I need to get contour lines from USGS topographical maps and overlay them on satellite images. Any suggestions?
So far I have tried:
--Using GIMP's native PDF importing function to import them as raster images. Problem: To do so at high resolution crashes my computer. Possible solution would be to import only a selected area of a PDF, but as far as I can tell this is not possible.
--Using ImageMagick to convert the PDF to a raster image. Problem: Used with the "-scale" parameter, "convert" appears to rasterize the PDF and then upscale it, leading to a choppy image.
--Using InkScape to extract the necessary vector elements from the PDF. Problem: InkScape freezes when I try to open a moderately large (25 Mb) PDF.
Any other ideas?
Many thanks,
treacl
The option you didn't mention above is to try to use the ghostscript program directly to render your output - ghostscript is used internally by GIMP to import PDF files, so you likely have it installed already.
There are dozens of command-line switches you can pass to ghostscript to render a file into another format. The ones you need determine the output size, the resolution, and which page to print. I didn't find any switch to select a portion of the page to be rendered, so if your document is a single page, the generated file may still be too big for GIMP, but you will likely be able to crop it with ImageMagick, at least.
I guess the relevant command line for you would be something along these lines:
gs -dNOPAUSE -dBATCH -sDEVICE=png16m -sOutputFile=page.png -dFirstPage=<pagenumber> -dLastPage=<pagenumber> -r<dpiresolution> -f<filename.pdf>
If the resulting image is still too large to be generated or operated upon, you can try changing the output format to one with a smaller color depth (png16m, used here, is 3 bytes per pixel). It should be possible to pass PostScript commands to transform the device, so that the area of interest is scaled up to your page size (and the remaining parts are cropped out of the rendering). That would be the definitive fix for you, but off the top of my head, I don't know how to do that with ghostscript.
Alternatively, you can try passing ImageMagick the -density parameter as suggested in the comments.
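The two suggestions in this answer can be sketched as small Python helpers that build the command lines. The function names, file names, and the 600 DPI value are assumptions for illustration. `pnggray` is one of Ghostscript's lower-depth PNG devices (8-bit grayscale), which may be enough for contour lines. With ImageMagick, `-density` has to come before the input file so that it is applied while the PDF is rasterized rather than afterwards.

```python
import shlex


def build_gs_page_cmd(pdf_path, out_path, page, dpi, device="png16m"):
    """Argv for rendering one PDF page with Ghostscript at a given resolution."""
    return [
        "gs", "-dNOPAUSE", "-dBATCH",
        f"-sDEVICE={device}",          # png16m = 24-bit colour, pnggray = 8-bit gray
        f"-sOutputFile={out_path}",
        f"-dFirstPage={page}",         # Ghostscript page numbers start at 1
        f"-dLastPage={page}",
        f"-r{dpi}",                    # render resolution in dots per inch
        pdf_path,
    ]


def build_convert_density_cmd(pdf_path, out_path, dpi, page=0):
    """Argv for the ImageMagick alternative; page indices start at 0 here."""
    return ["convert", "-density", str(dpi), f"{pdf_path}[{page}]", out_path]


# Print both commands for inspection (hypothetical file names).
print(shlex.join(build_gs_page_cmd("map.pdf", "page.png", page=1, dpi=600, device="pnggray")))
print(shlex.join(build_convert_density_cmd("map.pdf", "page.png", dpi=600)))
```

Rendering at the target resolution in one step avoids the rasterize-then-upscale behaviour described for `-scale` in the question.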

Converting commented PDF with Ghostscript but without the comments

Gentlepeople,
I'm using the command line version of GhostScript for Windows to convert PDF to PNG images. However I noticed that also the annotations (such as comments, shapes, attached files - anything the user can put on top of the original PDF) were converted and appear in the image output. Is there any way to let Ghostscript ignore comments in PDF?
Your help is appreciated :-)
I had the same question. I found a setting in Ghostscript that turns off rendering of comments (called annotations in its documentation): http://www.ghostscript.com/doc/current/Use.htm
The switch is -dShowAnnots=false, which is case-sensitive. For example, to convert a file to PNG (which was also what I wanted to do), you would use something like:
gswin64c -sDEVICE=png16m -sOutputFile="OutFile.png" -r300 -dShowAnnots=false "InputFile.pdf"
Using this command line format gave me exactly what I wanted: The first page of the source PDF converted to true-color PNG format without transparency, at 300 DPI, without any of the comments from the PDF.
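For scripted use, the same command can be built with a small Python helper. This is a sketch under assumptions: the helper name is made up, and `-dNOPAUSE -dBATCH` are added (as in other commands in this thread) so Ghostscript exits without prompting. For multi-page input, an output name containing `%d` (for example `page-%03d.png`) makes Ghostscript write one file per page instead of reusing a single file name.

```python
import shlex


def build_no_annots_cmd(pdf_path, out_path, dpi=300, gs_exe="gswin64c"):
    """Argv for rendering a PDF to PNG with annotations (comments) suppressed."""
    return [
        gs_exe,                       # "gswin64c" on 64-bit Windows, "gs" elsewhere
        "-dNOPAUSE", "-dBATCH",       # run non-interactively and quit when done
        "-sDEVICE=png16m",            # 24-bit colour PNG, no transparency
        f"-sOutputFile={out_path}",
        f"-r{dpi}",
        "-dShowAnnots=false",         # case-sensitive switch from the answer above
        pdf_path,
    ]


# One output file per page: %03d expands to 001, 002, ...
print(shlex.join(build_no_annots_cmd("InputFile.pdf", "page-%03d.png")))
```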
Had this error:
BBox has zero width or height, which is not allowed.
Found this hint, but without solution: https://bugs.ghostscript.com/show_bug.cgi?id=696889
I already used
-dPreserveAnnots=false
but the error came nonetheless.
-dShowAnnots=false fixes it for me.

Obey the MediaBox/CropBox in PDF when using Ghostscript to render a PDF to a PNG

I've been using Ghostscript to convert my single figure plots rendered in PDF to PNG:
gswin32c -sDEVICE=png16m -r300x300 -sOutputFile=junk.png ^
-dBATCH -dNOPAUSE Figure_001-a.pdf
This works in the sense I get a PNG out and it contains the plot.
But it contains a huge amount of white space as well (an example source image: http://cdsweb.cern.ch/record/1258681/files/Figure_001-a.pdf).
If you view it in Acrobat you'll note there is no white space around the plot. If you use the above command line you'll find the plot is only about 1/3 of the space.
When doing the same thing with an EPS file I run into the same problem. However, there is the command-line parameter -dEPSCrop that one can pass to get the PS rendering engine to pay attention to the BoundingBox.
I need a similar argument for rendering PDFs. I was not able to find it in the docs (nor even -dEPSCrop, actually).
I had exactly the same issue. I fixed it by adding the -dUseArtBox switch.
Example:
/usr/bin/gs -dUseArtBox -dNOPAUSE -sDEVICE=pngalpha -sOutputFile=output.png input.pdf
Note: the -dUseArtBox switch has been supported since Ghostscript version 9.07.
-dUseArtBox
Sets the page size to the ArtBox rather than the MediaBox. The art box defines the extent of the page's meaningful content (including potential white space) as intended by the page's creator. The art box is likely to be the smallest box. It can be useful when one wants to crop the page as much as possible without losing the content.
There are various options to control which "media size" Ghostscript uses when rendering a given input:
-dPDFFitPage
-dUseTrimBox
-dUseCropBox
With PDFFitPage Ghostscript will render to the current page device size (usually the default page size).
With UseTrimBox it will use the TrimBox (and it will at the same time set the PageSize to that value).
With UseCropBox it will use the CropBox (and it will at the same time set the PageSize to that value).
By default (given no parameter), Ghostscript will render using the MediaBox.
For your example, it looks like adding "-dUseCropBox" will do the job you're expecting.
Note, you can additionally control the overall size of your output by using "-sPAPERSIZE" (select amongst all pre-defined values Ghostscript knows) or (for more flexibility) use "-dDEVICEWIDTHPOINTS=NNN -dDEVICEHEIGHTPOINTS=NNN".
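The box switches mentioned in these answers can be collected in one helper. This is a sketch only: the function name, the short box keys, and the file names are assumptions, and it covers just the switches named above (-dUseCropBox, -dUseTrimBox, -dUseArtBox), with the MediaBox as the default when no switch is given.

```python
import shlex

# Short key -> Ghostscript switch; None means "pass nothing, use the MediaBox".
BOX_SWITCHES = {
    "media": None,
    "crop": "-dUseCropBox",
    "trim": "-dUseTrimBox",
    "art": "-dUseArtBox",
}


def build_box_cmd(pdf_path, out_path, box="crop", dpi=300, gs_exe="gs"):
    """Argv for rendering a PDF to PNG using the chosen page box."""
    if box not in BOX_SWITCHES:
        raise ValueError(f"unknown box {box!r}; expected one of {sorted(BOX_SWITCHES)}")
    cmd = [gs_exe, "-dNOPAUSE", "-dBATCH", "-sDEVICE=png16m", f"-r{dpi}"]
    switch = BOX_SWITCHES[box]
    if switch is not None:
        cmd.append(switch)            # sets the page size to that box
    cmd += [f"-sOutputFile={out_path}", pdf_path]
    return cmd


# Compare the default MediaBox rendering with the CropBox one.
print(shlex.join(build_box_cmd("Figure_001-a.pdf", "media.png", box="media")))
print(shlex.join(build_box_cmd("Figure_001-a.pdf", "crop.png", box="crop")))
```

Rendering the same file once per box and comparing the image sizes is a quick way to find out which box the PDF actually defines tightly.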
Have you tried using pdfcrop using pdftex (comes with texlive for example) or (not tried yet) the python script pdfcrop?
I have a similar workflow using the first tool mentioned.