How to convert PDF to low-resolution (but good quality) JPEG?

When I use the following ghostscript command to generate jpg thumbnails from PDFs, the image quality is often very poor:
gs -q -dNOPAUSE -dBATCH -sDEVICE=jpeggray -g465x600 -dUseCropBox -dPDFFitPage -sOutputFile=pdf_to_lowres.jpg test.pdf
By contrast, if I use ghostscript to generate a high-resolution png, and then use mogrify to convert the high-res png to a low-res jpg, I get pretty good results.
gs -q -dNOPAUSE -dBATCH -sDEVICE=pnggray -g2550x3300 -dUseCropBox -dPDFFitPage -sOutputFile=pdf_to_highres.png test.pdf
mogrify -thumbnail 465x600 -format jpg -write pdf_to_highres_to_lowres.jpg pdf_to_highres.png
Is there any way to achieve good results while bypassing the intermediate pdf -> high-res png step? I need to do this for a large number of pdfs, so I'm trying to minimize the compute time.
Here are links to the images referenced above:
test.pdf
pdf_to_lowres.jpg
pdf_to_highres.png
pdf_to_highres_to_lowres.jpg

One option that seems to improve the output a lot: -dDOINTERPOLATE. Running the same command as yours with -dDOINTERPOLATE added gave a much better result.
I'm not sure which interpolation method this uses, but it seems pretty good, especially in comparison to the results without it.
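For reference, that is the question's command with the flag added:
gs -q -dNOPAUSE -dBATCH -sDEVICE=jpeggray -dDOINTERPOLATE -g465x600 -dUseCropBox -dPDFFitPage -sOutputFile=pdf_to_lowres.jpg test.pdf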
P.S. Consider outputting PNG images (-sDEVICE=pnggray) instead of JPEG. For most PDF documents (which tend to have just a few solid colors) it's a more appropriate choice.
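Combining both suggestions would look something like this (a sketch reusing the question's filenames):
gs -q -dNOPAUSE -dBATCH -sDEVICE=pnggray -dDOINTERPOLATE -g465x600 -dUseCropBox -dPDFFitPage -sOutputFile=pdf_to_lowres.png test.pdf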

Your PDF looks like it is just a wrapper around a JPEG already.
Try using the pdfimages program from xpdf to extract the actual image rather than rendering to a file.
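A minimal sketch of that approach (pdfimages also ships with poppler; -j writes a JPEG wherever the PDF stores a DCT-encoded image, and the output root img is arbitrary). The extracted image can then be downscaled with the question's own mogrify call:
pdfimages -j test.pdf img
mogrify -thumbnail 465x600 img-000.jpg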

Related

Converting pdf to eps without rasterizing or changing fonts

I have been trying to convert a pdf vector graphic to eps. I tried two commands from the following answer: https://stackoverflow.com/a/44737018/5661667
The inkscape command inkscape input.pdf --export-eps=output.eps or rather, since --export-eps is deprecated now,
inkscape input.pdf --export-filename=output.eps
nicely converts to a vectorized eps. However, it strangely converts my Times New Roman fonts (the graphic was originally created using matplotlib) to some sans serif font (looks like Arial or something).
The ghostscript version of the conversion from the linked answer
gs -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -sOutputFile=output.eps input.pdf
keeps my fonts nicely. However, the eps seems to be rasterized despite the -dNOCACHE option.
Is there any way to get one of these to just convert my pdf to eps without modifying it?
Further info: I am using Mac OS. For the first part, my suspicion is that I only have an Arial Unicode.ttf installed in /Library/Fonts/. I tried installing some other fonts, but had no success with my conversion.
I had the same problem when trying to convert a PowerPoint-generated PDF to EPS format using Inkscape.
After trying with gs and disabling transparency, I noticed some areas turned black after the EPS conversion.
gs -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -dNOTRANSPARENCY -sDEVICE=eps2write -sOutputFile=output.eps input.pdf
Coming back to Inkscape, I noticed that PowerPoint had added some transparent objects in the areas that turned black. I manually removed them in Inkscape, and when converting to EPS again the result was perfect!
In short: if there are transparent elements in your PDF, the fonts will probably be rasterized during EPS conversion, so you need to remove these elements.
Maybe there is an easier way to identify them in Inkscape.
In my case I was able to use Find/Replace (Ctrl+F) to search for objects containing the string "clipPath" with 'Search option = Properties', then open the Objects tab (menu Object -> Objects...) and use it to delete each transparent object generated by PowerPoint.

How can I disable Ghostscript rasterization of images and paths?

I need to convert a PDF to a different ICC color profile. Through different searches and tests, I found out a way to do that:
First I convert my PDF to a PS file with:
.\gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile="test.ps" "test.pdf"
Then I convert the PS back to a PDF with the following (this is to generate a valid PDF/X-3 file):
.\gswin64c.exe -dPDFX -dNOPAUSE -dBATCH -sDEVICE=pdfwrite
-sColorConversionStrategy=/UseDeviceIndependentColor -sProcessColorModel=DeviceCMYK
-dColorAccuracy=2 -dRenderIntent=0 -sOutputICCProfile="WebCoatedSWOP2006Grade5.icc"
-dDeviceGrayToK=true -sOutputFile="final.pdf" test_PDFX_def.ps test.ps
The ICC profile is embedded and everything works perfectly. The only problem is that the whole final PDF is rasterized, so I lose the quality of all the paths and other vector elements in the starting file. I need to keep them as vectors because this PDF will have a specific application.
First step: don't convert to PostScript!
Any transparent marking operations will have to be rendered if you do that, because PostScript doesn't support transparency. Other features will be lost as well, so really, don't do that. The input and output ends of Ghostscript are more or less independent; the pdfwrite device doesn't know whether the input was PDF or PostScript, and doesn't care. So you don't need to convert a PDF file into PostScript before sending it as input.
You can feed the original PDF file into the second command line in place of the PostScript file.
As long as you are producing PDF/X-3 or later then the transparency will be preserved. Make sure you are using an up to date version of Ghostscript.
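That is, run the second command directly on the original PDF (the asker's own options, with test.pdf substituted for test.ps):
.\gswin64c.exe -dPDFX -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sColorConversionStrategy=/UseDeviceIndependentColor -sProcessColorModel=DeviceCMYK -dColorAccuracy=2 -dRenderIntent=0 -sOutputICCProfile="WebCoatedSWOP2006Grade5.icc" -dDeviceGrayToK=true -sOutputFile="final.pdf" test_PDFX_def.ps "test.pdf"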

Linux PDF to TIFF Quality Issue

I am trying to use a Linux application to convert .pdf files to .tiff for faxing; however, our clients have not been happy with the quality of Ghostscript's tiffg4 device.
In the image below, the left side shows a conversion using Ghostscript tiffg4 and the right is from an online conversion service. We have not been able to determine which application that service uses to achieve that quality.
Note: The output TIFF must be black & white
Ghostscript Code:
gs -sDEVICE=tiffg4 -dNOPAUSE -dBATCH -dPDFFitPage -sPAPERSIZE=letter -g1728x2156 -sOutputFile=testg4.tiff test.pdf
We have tried these GhostScript devices:
tiffcrle
tiffg3
tiffg32d
tiffg4
tifflzw
tiffpack
My question: does anyone know which application and/or settings are used to achieve the quality on the right?
Extending on BitBank's comment: you could write an RGB TIFF and then use ImageMagick to convert it to Group 4. ImageMagick allows you to control the dithering algorithm:
gs -sDEVICE=tiff24nc -dNOPAUSE -dBATCH -dPDFFitPage -sPAPERSIZE=letter -g1728x2156 -sOutputFile=intermediate.tiff your.pdf
convert intermediate.tiff -dither FloydSteinberg -compress group4 out.tiff
ImageMagick's manual has some background on the algorithm(s) and available options.

Ghostscript: Quality and Size issue

I have a ghostscript command that converts a pdf into several PNG images (one for every page). The command arguments are as follows:
-dNOPAUSE -q -r300 -sPAPERSIZE=a4 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dUseTrimBox -sDEVICE=png16m -dBATCH -sOutputFile="C:\outputfile%d.png" -c \"30000000 setvmthreshold\" -f "C:\inputfile.pdf"
The pdf displays as regular A4 pages in Adobe Reader, but in the PNG images it becomes huge (2480 by 3507 pixels for instance).
If I change the resolution in the Ghostscript command to -r110, the page size is correct but the image quality is very poor (heavily pixelated).
Is there another way to improve the quality of the image without affecting the image size?
Thanks
Got it! Added the following parameter to my GS command:
-dDownScaleFactor=3
From the GS documentation:
This causes the internal rendering to be scaled down by the given (small integer) factor before being output. For example, the following will produce a 200 dpi output PNG from a 600 dpi internal rendering:
gs -sDEVICE=png16m -r600 -dDownScaleFactor=3 -o tiger.png examples/tiger.eps
I had a similar problem, where PDF conversion to PNG using Ghostscript resulted in an image with much greater dimensions (including extra white space). I solved the issue by using -dUseCropBox, which sets the page size to the CropBox rather than the MediaBox.
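Applied to the asker's command, that would look something like this (a sketch: -dUseTrimBox swapped for -dUseCropBox, generic filenames):
gs -dNOPAUSE -dBATCH -q -r300 -sPAPERSIZE=a4 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dUseCropBox -sDEVICE=png16m -sOutputFile=outputfile%d.png inputfile.pdf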
The quality-size tradeoff is inevitable. You may choose a different compression to keep the size down while maintaining reasonable quality, e.g. DCT (JPEG) or JPEG 2000 if your content mainly consists of photographic images, or CCITT or JBIG2 if it is mainly black and white.
Find the width and the height in points (from the %%BoundingBox) and use them:
gs -sDEVICE=png16m -dDEVICEWIDTHPOINTS=$w -dDEVICEHEIGHTPOINTS=$h -r600 -dDownScaleFactor=3 -o tiger.png examples/tiger.eps
where $w is the width and $h the height
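One way to obtain those values, assuming poppler's pdfinfo is available (it prints a line such as "Page size: 612 x 792 pts"):
pdfinfo input.pdf | grep "Page size"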

Forcing Ghostscript to use antialiasing when converting a PDF to PNG?

I'm using GPL Ghostscript 9.07 (2013-02-14) on OS X (10.8.4) to convert many PDFs to PNGs.
It works fine except for one of the PDFs which turns into a PNG with jagged edges. In other words, Ghostscript turns off antialiasing for that particular PDF for some reason.
The PDF in question produces the jagged output; in other cases the same command works fine (sample: pdf -> png).
I use this command:
gs -dNOPAUSE -dBATCH -dPDFFitPage -sDEVICE=pngalpha -g200x150 -sOutputFile=01.png 01.pdf
Is it possible to force Ghostscript to use antialiasing for that PDF?
Any tips are appreciated.
This worked for me:
gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=jpeg -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r150 -sOutputFile=foo-%d.jpg foo.pdf
Source: ImageMagick convert pdf to jpeg has poor text quality after upgrading ImageMagick version to 6.7.8
The above works for JPEG; for PNG, replace the -sDEVICE option with the device of your choice, for example -sDEVICE=png16m.
Source: http://ghostscript.com/doc/current/Devices.htm
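With the PNG device substituted, that becomes something like:
gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=png16m -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r150 -sOutputFile=foo-%d.png foo.pdf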
You can try -dGraphicsAlphaBits= with values of 1, 2 or 4, which may or may not make a difference. It made some improvement for me, but it's a small graphic at low resolution with an awkward curve, so not as much as might be expected.
Or you can use one of the anti-aliasing devices (e.g. tiffscaled), which are more flexible. There is no anti-aliased device for PNG output, but it would be trivial to convert TIFF to PNG.
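A sketch of that route, assuming the tiffscaled family (tiffscaled24 is the colour variant; the resolution and scale factor here are illustrative):
gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffscaled24 -r600 -dDownScaleFactor=4 -sOutputFile=01.tif 01.pdf
convert 01.tif 01.png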
By the way, your PDF file specifically turns off anti-aliasing on the components:
8 0 obj
<</AntiAlias false/ColorSpace/DeviceCMYK/Coords[0.0 0.0 1.0 0.0]/Domain[0.0 1.0]/Extend[true true]/Function 10 0 R/ShadingType 2>>
You might like to try and see what happens if you change AntiAlias to true, though I doubt this will have an effect, as I'm pretty sure the anti-aliasing is applied to the internal rendering of the shading, not the edges.
You can also try -dDOINTERPOLATE, which uses a Mitchell filter function to scale the contributions for each output pixel.