How to adjust BoundingBox of an EPS file? - pdf

I want to crop the main area of a PS or PDF file to create an EPS file without white space. Tools such as Ghostscript, ps2pdf, and epstool can crop the main drawing out of the document file.
The problem is that they crop the drawing in its original coordinates, but I want to create an EPS file with BoundingBox 0 0 x y, that is, cropped and moved to the bottom-left corner.
The difference matters when we want to insert the resulting EPS file inside a PS document. With BoundingBox x0 y0 x y, the PS document places the EPS file at point x0 y0 instead of at the current position.
EXAMPLE:
Consider a simple PS file as
%!
/Times-Roman findfont
11 scalefont setfont
72 700 moveto
(This is a test)show
If we convert it to EPS with a command like
ps2eps test.ps test.eps
it produces
%!PS-Adobe-2.0 EPSF-2.0
%%BoundingBox: 72 700 127 708
%%HiResBoundingBox: 72.000000 700.000000 127.000000 707.500000
%%EndComments
% EPSF created by ps2eps 1.68
%%BeginProlog
save
countdictstack
mark
newpath
/showpage {} def
/setpagedevice {pop} def
%%EndProlog
%%Page 1 1
/Times-Roman findfont
11 scalefont setfont
72 700 moveto
(This is a test)show
%%Trailer
cleartomark
countdictstack
exch sub { end } repeat
restore
%%EOF
It has been cropped in its original coordinates, and the resulting BoundingBox is 72 700 127 708. If we now insert this EPS file into a PS document, it is placed at these coordinates.
It would be useful to create an EPS file with BoundingBox: 0 0 55 8 instead. Of course, all drawing coordinates (here the moveto) must be adjusted to this new reference point.
NOTE: As stated, my purpose in fixing the BoundingBox reference point is to make the file importable into a PS document. Thus, an alternative answer to this question is how to insert an EPS file into a PS document regardless of its BoundingBox.
For example, how can this EPS file be inserted at location 200 200 255 208 of a PS document? I tried to insert the EPS with the following code, but it does not work unless the BoundingBox starts at 0 0:
200 200 translate
save
/showpage {} bind def
(test.eps)run
restore

What about simply un-translating?
-72 -700 translate
Either in the eps itself, or in the prep section before the inclusion?
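The un-translate can also be folded into the inclusion code itself. Here is a minimal sketch in Python, assuming the EPS carries a well-formed %%BoundingBox comment with integer values; the function name placement_snippet and its arguments are made up for illustration:

```python
import re

def placement_snippet(eps_text, eps_name, x, y):
    """Return PostScript that puts the EPS lower-left corner at (x, y)."""
    m = re.search(r"^%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)",
                  eps_text, re.MULTILINE)
    if m is None:
        raise ValueError("no %%BoundingBox comment found")
    llx, lly, urx, ury = map(int, m.groups())
    # Moving to (x, y) and undoing the EPS's own offset (llx, lly)
    # collapses into a single translate.
    return "\n".join([
        "save",
        f"{x - llx} {y - lly} translate",
        "/showpage {} def",
        f"({eps_name}) run",
        "restore",
    ])

sample = "%!PS-Adobe-2.0 EPSF-2.0\n%%BoundingBox: 72 700 127 708\n%%EndComments\n"
print(placement_snippet(sample, "test.eps", 200, 200))
```

For the example file this emits a single "128 -500 translate", which is the 200 200 move combined with the -72 -700 un-translate. The translate sits inside save/restore here so that the shift does not leak into the rest of the page.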
AWKward!
The following typescript illustrates an awk script which performs the desired modifications
to the eps, guided by the DSC comments (just like Mama used to do!).
The advantage is: if you can guarantee that the input EPS conforms sufficiently to DSC to provide these markers, this approach will be orders-of-magnitude faster than passing the file through ghostscript.
Simplicity is both the advantage and the limitation of this program. It scans for DSC comments, extracts values from the BoundingBox comment, suppresses the HiResBoundingBox, and adds PostScript 'translate' and 'rectclip' commands just after the Page comment. This should produce the correct results so long as the EPS really is bona fide. But the Ghostscript approach in the other answer will produce results on input files with less reliable DSC conformance (because it's not taking shortcuts: it treats DSC as comments and completely ignores them).
Strictly speaking the 'rectclip' shouldn't be necessary, but the question asks that the output be "cropped".
592(1)11:27 AM:~ 0> cat epscrop.awk
/%%BoundingBox: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*)/{x=$2;y=$3;w=$4-x;h=$5-y;print $1,0,0,w,h}
!/%%BoundingBox:/&&!/%%HiRes/{print}
/%%Page /{print 0,0,w,h,"rectclip"; print -x,-y,"translate"}
593(1)11:27 AM:~ 0> awk -f epscrop.awk etest.eps
%!PS-Adobe-2.0 EPSF-2.0
%%BoundingBox: 0 0 55 8
%%EndComments
% EPSF created by ps2eps 1.68
%%BeginProlog
save
countdictstack
mark
newpath
/showpage {} def
/setpagedevice {pop} def
%%EndProlog
%%Page 1 1
0 0 55 8 rectclip
-72 -700 translate
/Times-Roman findfont
11 scalefont setfont
72 700 moveto
(This is a test)show
%%Trailer
cleartomark
countdictstack
exch sub { end } repeat
restore
%%EOF
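For anyone who prefers not to use awk, the same filter can be sketched in Python. This is only an illustration under the same assumption (the DSC comments are present and the BoundingBox values are integers); the function name crop_eps is invented here. The clip is emitted before the translate, because a rectclip issued after the translate would be interpreted in the already-shifted coordinates and would land outside the new BoundingBox:

```python
import re

BBOX = re.compile(r"%%BoundingBox:\s*(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)")

def crop_eps(lines):
    """Yield EPS lines with the BoundingBox moved to the origin."""
    x = y = w = h = 0
    for line in lines:
        line = line.rstrip("\n")
        m = BBOX.match(line)
        if m:
            llx, lly, urx, ury = map(int, m.groups())
            x, y, w, h = llx, lly, urx - llx, ury - lly
            yield f"%%BoundingBox: 0 0 {w} {h}"
        elif line.startswith("%%HiResBoundingBox"):
            continue  # suppressed, as in the awk version
        else:
            yield line
            if line.startswith(("%%Page ", "%%Page:")):
                # Clip to the new box first, then shift the drawing into it.
                yield f"0 0 {w} {h} rectclip"
                yield f"{-x} {-y} translate"

src = [
    "%%BoundingBox: 72 700 127 708",
    "%%HiResBoundingBox: 72.000000 700.000000 127.000000 707.500000",
    "%%Page 1 1",
    "72 700 moveto",
]
print("\n".join(crop_eps(src)))
```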

To convert it to an EPS with the BoundingBox style you want, I would use Ghostscript and let the EPS make a round trip: EPS => PDF => EPS.
The trick is to add the -dEPSCrop parameter, which makes the PDF use a media size equal to the width and height of the BoundingBox.
These two commands create your 'EPS without white space':
1st step: convert EPS to PDF:
gs \
-o so#12682621.pdf \
-sDEVICE=pdfwrite \
-dEPSCrop \
so#12682621.eps
2nd step: convert PDF back to EPS:
gs \
-o so#12682621.roundtripped.eps \
-sDEVICE=epswrite \
so#12682621.pdf
To test the fidelity of your resulting EPS, you could use ImageMagick's compare to show the differences, pixel-wise in red, as a PNG file:
compare \
-density 600 \
so#12682621.roundtripped.eps \
so#12682621.eps \
-compose src \
12682621.png
which results in:
You'll notice that there are some pixel differences. They are caused by the value of 707.500000 from the %%HiResBoundingBox, which leads to a rounding error later on (PNG can't have 'half pixels').
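The 'half pixel' can be checked with a little arithmetic. A small sketch, assuming the usual 72 points per inch and the -density 600 used in the compare command above (the helper name points_to_pixels is just for illustration):

```python
def points_to_pixels(points, dpi):
    """Convert a length in PostScript points (1/72 inch) to pixels."""
    return points * dpi / 72

hires_height = 707.5 - 700.0  # height according to %%HiResBoundingBox
int_height = 708 - 700        # height according to %%BoundingBox

print(points_to_pixels(hires_height, 600))  # 62.5, not a whole number of pixels
print(points_to_pixels(int_height, 600))
```

Neither height maps to a whole number of pixels at 600 dpi, so the two rasterizations cannot be expected to line up exactly.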

Related

PDF to EPS or PS to EPS conversion maintaining page size

I need to convert a PDF or Postscript file to EPS, I tried using Ghostscript with the following command to convert Postscript to EPS:
gswin32.exe -o output.eps -sDEVICE=eps2write -dFitPage input.ps
Or PDF to EPS:
gswin32c.exe -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -o output.eps -dFitPage input.pdf
They both complete successfully, but they do not maintain the page size. The input PDF and PS files contain the same drawing, and both have a page size of 300x300 pt. You can download these files here and here. They look like this:
After converting them to EPS, the results look like this; the first one is the result from PS to EPS and the second one is the result from PDF to EPS (they are opened with EPS Viewer, which rasterizes the image; that is the reason for the low quality):
As you can see, neither of them has the original 300x300 pt size. I've tried many Ghostscript options, but I can't manage to get an EPS with the right BoundingBox. I just need to convert a PDF or PS to EPS, whichever is easier or gives better results.
What you are asking for is, more or less, the exact opposite of what is normally required.
In general people want the EPS Bounding Box to be as tight as possible to the actual marks made by the EPS, because the normal use for an EPS file is to 'embed' it in another document. If you want extra white space you would normally add it around the EPS when you embed it.
Indeed, the EPS specification says that the BoundingBox comment should not include the white space. On page 8 of the EPSF specification:
"For an EPS file, the bounding box is the smallest rectangle that encloses all the marks painted on the single page of the EPS file"
Messing with Ghostscript switches isn't going to do anything helpful for you here; the device explicitly records the marks that are made by the input and sets the BoundingBox from those.
Perhaps if you were to explain why you want to have an EPS file with incorrect BoundingBox comments it would be possible to make some suggestions, but Ghostscript is doing exactly what it should do here.
[addendum]
(see comment below, this is in reply)
I suspect you need to change your process in some way then. One solution is to have the PDF start by filling the entire page with white. Contrary to many people's expectations, that counts as making a mark on the page, so the entire page would then be considered as the BoundingBox.
As long as you are using the Ghostscript eps2write device, you could also parse the document for %%BeginPageSetup; the eps2write device still writes the original document size out in this section, e.g.:
%!PS-Adobe-3.0 EPSF-3.0
%%Invocation: path/gswin32c -dDisplayFormat=198788 -dDisplayResolution=96 -sDEVICE=eps2write -sOutputFile=? ?
%%BoundingBox: 101 132 191 256
%%HiResBoundingBox: 101.80 132.80 190.30 255.20
%%Creator: GPL Ghostscript GIT PRERELEASE 951 (eps2write)
....
....
%%EndProlog
%%Page: 1 1
%%BeginPageSetup
4 0 obj
<</Type/Page/MediaBox [0 0 300 300]
/Parent 3 0 R
/Resources<</ProcSet[/PDF]
>>
/Contents 5 0 R
>>
endobj
%%EndPageSetup
You can see here that the original media size was 300x300, even though the BoundingBox correctly reflects the marks made on the page. Note! This is characteristic of EPS files produced by the current version of eps2write, it won't work for EPS files from other sources and may not work with eps2write in the future.
Other than that you're stuck with finding the media size from the input and passing it separately to the program doing the insertion, presumably by putting the data in some other text file to accompany the EPS. Or, of course, manually or programmatically editing the urx,ury co-ordinates of the BoundingBox.
Ghostscript isn't going to do this for you I'm afraid.
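If you do decide to edit the BoundingBox programmatically, the parsing described above could look roughly like this. It is only a sketch in Python and relies on the /MediaBox entry appearing between %%BeginPageSetup and %%EndPageSetup, which, as noted, is specific to current eps2write output; the function name bbox_from_mediabox is hypothetical:

```python
import math
import re

MEDIABOX = re.compile(
    r"/MediaBox\s*\[\s*([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s*\]")

def bbox_from_mediabox(eps_text):
    """Return eps_text with %%BoundingBox set to the recorded MediaBox."""
    setup = re.search(r"%%BeginPageSetup(.*?)%%EndPageSetup", eps_text, re.DOTALL)
    if setup is None:
        raise ValueError("no %%BeginPageSetup section found")
    box = MEDIABOX.search(setup.group(1))
    if box is None:
        raise ValueError("no /MediaBox entry found in the page setup")
    llx, lly, urx, ury = (float(v) for v in box.groups())
    # Round outwards so the integer box never cuts into the media.
    values = [math.floor(llx), math.floor(lly), math.ceil(urx), math.ceil(ury)]
    comment = "%%BoundingBox: " + " ".join(str(v) for v in values)
    # The HiRes comment would now disagree with the new box, so drop it.
    text = re.sub(r"^%%HiResBoundingBox:.*\n", "", eps_text, flags=re.MULTILINE)
    return re.sub(r"^%%BoundingBox:.*$", comment, text, count=1, flags=re.MULTILINE)

sample = (
    "%!PS-Adobe-3.0 EPSF-3.0\n"
    "%%BoundingBox: 101 132 191 256\n"
    "%%HiResBoundingBox: 101.80 132.80 190.30 255.20\n"
    "%%Page: 1 1\n"
    "%%BeginPageSetup\n"
    "4 0 obj\n"
    "<</Type/Page/MediaBox [0 0 300 300]\n"
    ">>\n"
    "endobj\n"
    "%%EndPageSetup\n"
)
print(bbox_from_mediabox(sample))
```

Bear in mind that the result deliberately violates the 'smallest rectangle' rule quoted above, so it is only appropriate when the consumer really needs the page size.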

Ghostscript renders ugly text

I'm trying to add the capability to render LaTeX equations to a project I'm working on. To do so, I use XeLaTeX to create a PDF file, which I then render to a (transparent) 96dpi-PNG using Ghostscript.
I'd like to have the rendered LaTeX blend in with the rest of the text (which is rendered using standard .NET GDI+ methods, but that's off-topic), but I can't get a reliably "good" text rendering: the output always looks somehow blurry or otherwise "bad".
Example:
From left to right, the same (small) PDF rendered at 96dpi with Ghostscript, Photoshop, and TexWorks (which I understand uses Ghostscript internally).
The command I use to run Ghostscript is the following:
"C:/Program Files (x86)/gs/gs9.09/bin/gswin32c.exe" \
-q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
-dMaxBitmap=500000000 -dAlignToPixels=1 -dGridFitTT=2 \
"-sDEVICE=pngalpha" -dTextAlphaBits=4 \
-dGraphicsAlphaBits=4 "-r96" -dFirstPage=1 -dLastPage=1 \
-sOutputFile="output.png" "input.pdf"
(which I actually pretty much copied from the command ImageMagick calls when converting a PDF file, but that's another story). I tried changing any of the relevant options (dAlignToPixels=0, dGridFitTT=0/1/2, dTextAlphaBits=2/4 [or without this parameter altogether]) and I even tried to render the PDF to 4 times the resolution and then downscale it, without any noticeable improvement.
Yet, I'm sure there must be some way of decently rendering the PDF with Ghostscript (since TexWorks does), although I'm unable to find it.
Any hint? The PDF is this one.
You could try to render your PDF at a higher resolution. 96dpi just isn't enough for text with 11 pt size.
If you use 192dpi and then scale the display of the resulting image to 50% (wherever you use the PNG), these parts should still appear at the same size as before, but with a higher resolution. What used to be a 4x7 pixel 's' should now be an 8x14 pixel 's'...
Update
OK, since my explanation seems not to have been comprehensible enough for the OP, here's the deal.
Generate a PDF file containing the word "Test", using Ghostscript. In my case, it is Ghostscript v9.10:
gs \
-o test.pdf \
-sDEVICE=pdfwrite \
-g230x100 \
-c "/Helvetica findfont \
11 scalefont \
setfont \
1 1 moveto \
(Test) show \
showpage"
From this PDF, generate 6 different images depicting the word "Test", using 6 different resolutions. The gs is still Ghostscript v9.10 (to be checked with gs -version):
for i in 1 2 3 4 5 6; do \
gs \
-o t$(( ${i} * 96 )).png \
-r$(( ${i} * 96 )) \
-sDEVICE=pngalpha \
-dAlignToPixels=1 \
-dGridFitTT=2 \
-dTextAlphaBits=4 \
-dGraphicsAlphaBits=4 \
test.pdf ; \
done
This will create the following PNGs, as confirmed by ImageMagick's identify command:
identify -format "%f : %Wx%H pixels -- %b filesize\n" t[1-9]*.png
t96.png : 31x13 pixels -- 475B filesize
t192.png : 61x27 pixels -- 774B filesize
t288.png : 92x40 pixels -- 1.1KB filesize
t384.png : 123x53 pixels -- 1.43KB filesize
t480.png : 153x67 pixels -- 1.76KB filesize
t576.png : 184x80 pixels -- 2.01KB filesize
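These pixel dimensions follow directly from the page size and the resolution. A quick sketch of the arithmetic in Python, assuming that -g230x100 at pdfwrite's default of 720 dpi yields a 23 x 10 pt page and that the sizes are rounded to the nearest pixel (the helper name png_size is made up):

```python
def png_size(page_pt, dpi):
    """Pixel size of a page given in points, rendered at the given dpi."""
    width_pt, height_pt = page_pt
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)

# -g230x100 at 720 dpi: 230 and 100 device pixels are 23 and 10 points.
page = (230 * 72 / 720, 100 * 72 / 720)

for i in range(1, 7):
    dpi = i * 96
    print(f"t{dpi}.png", png_size(page, dpi))
```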
Create a sample LaTeX document and embed the different images side by side and/or line by line. Here is my sample code:
\documentclass{article}
\usepackage{graphicx}
\begin{document}
Test
\includegraphics[height=7.5pt]{t96.png}
\includegraphics[height=7.5pt]{t96.png}
\includegraphics[height=7.5pt]{t192.png}
\includegraphics[height=7.5pt]{t288.png}
\includegraphics[height=7.5pt]{t384.png}
\includegraphics[height=7.5pt]{t480.png}
\includegraphics[height=7.5pt]{t576.png}
Test\\
{}
Test <== real text
\includegraphics[height=7.5pt]{t96.png} <-- 96 dpi figure
\includegraphics[height=7.5pt]{t192.png} <-- 192 dpi figure
\includegraphics[height=7.5pt]{t288.png} <-- 288 dpi figure
\includegraphics[height=7.5pt]{t384.png} <-- 384 dpi figure
\includegraphics[height=7.5pt]{t480.png} <-- 480 dpi figure
\includegraphics[height=7.5pt]{t576.png} <-- 576 dpi figure
Test <== real text
\end{document}
Here is a screenshot (at 400% zoom) from the PDF created via LuaLaTeX from the above LaTeX code:
The line with the 8 "Test" words has actual text only in the first and the last word. The 6 words in between are images with 96, 96, 192, 288, 384, 480 and 576 dpi.
I hope you can see now clearly how scaling up your image generation to a higher resolution will result in better quality for your final PDF if you include the higher resolution images into your LaTeX code...
You are rendering text at 11 points at 96 dpi. That works out to about 14 pixels in height, which, frankly, is not a lot (in my output the 's' is 7 pixels high by 4 wide). Looking at your output, all 3 look 'blurry', and the Photoshop output looks overly bold in the capital T.
If you don't want it blurred, then don't set TextAlphaBits, or don't set it to such a high value.
I'd also suggest using the current release (9.15).

Add white border to PDF (change paper format)

I have to change a given PDF from A4 (210mm*297mm) to 216mm*303mm.
The additional 6 mm for each dimension should be set as white border of 3mm on each side. The original content of the PDF pages should be centered on the output pages.
I tried with convert:
convert in.pdf -bordercolor "#FFFFFF" -border 9 out.pdf
This gives me exactly the needed result, but I lose a lot of the sharpness of the original images in the PDF. Everything is kind of blurry.
I also checked with
convert in.pdf out.pdf
which makes no changes at all but also degrades the images.
So I tried Ghostscript but did not get any result. The best approach I have found so far, from a German site, is:
gs -sOutputFile=out.pdf -sDEVICE=pdfwrite -g6120x8590 \
-c "<</Install{1 1 scale 8.5 8.5}>> setpagedevice" \
-dNOPAUSE -dBATCH in.pdf
but I get Error: /typecheck in --.postinstall--.
By default, ImageMagick rasterizes input PDF files at 72 dpi. This is an awfully low resolution, as you experienced firsthand. The output of ImageMagick is always a raster image, so if your input PDF contained text, it will no longer be text.
If you don't mind the output PDF getting bigger, you can simply increase the resolution at which ImageMagick rasterizes the original PDF with the -density option. Note that -border is measured in pixels, so the border has to grow with the density: 3 mm is about 9 pixels at 72 dpi but about 71 pixels at 600 dpi:
convert -density 600 in.pdf -bordercolor "#FFFFFF" -border 71 out.pdf
I used 600 because it is the sweet spot that works well for OCR. I recommend trying 300, 450, 600, 900 and 1200 and picking the best one that doesn't make the file unwieldy.
Shifting the content on the media is not especially hard, but it does mean altering the content stream of the PDF file, which most PDF manipulation packages avoid, with good reason.
The code you quote above really won't work: it leaves garbage on the operand stack (the two 8.5 values that nothing consumes), and the PLRM explicitly states that it is followed by an implicit initgraphics, which will reset all the standard parameters anyway.
You could try instead setting a /BeginPage procedure to translate the origin, which will probably work:
<</BeginPage {8.5 8.5 translate} >> setpagedevice
Note that you aren't simply manipulating the original PDF file; Ghostscript takes the original PDF file, interprets it into graphics primitives, then reassembles those primitives into a new PDF file, this has implications... For example, if an image is DCT encoded (a JPEG) in the original, it will be decompressed before being passed into the output file. You probably don't want to reapply DCT encoding as this will introduce visible artefacts.
A simpler alternative, but involving multiple processing steps and therefore more potential for problems, is to first convert the PDF to PostScript with the ps2write device, specifying your media size, and also the -dCenterPages switch, then use the pdfwrite device to turn the resulting PostScript into a new PDF file.
Instead of
-g6120x8590 \
-c "<</Install{1 1 scale 8.5 8.5}>> setpagedevice"
(which is wrong), you should use:
-g6120x8590 \
-c "<</Install{8.5 8.5 translate}>> setpagedevice"
or
-g6120x8590 \
-c "<</Install{3 25.4 div 72 mul dup translate}>> setpagedevice"
(which lets Ghostscript calculate the "3mm == 8.5pt" itself...)
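The numbers in these commands can be derived rather than guessed. A small sketch in Python, assuming the -g values are device pixels at pdfwrite's default resolution of 720 dpi (the helper names are invented for this example):

```python
def mm_to_pt(mm):
    """Millimetres to PostScript points (72 per inch, 25.4 mm per inch)."""
    return mm / 25.4 * 72

def device_pixels(mm, dpi=720):
    """Millimetres to device pixels at the given resolution."""
    return round(mm / 25.4 * dpi)

print(round(mm_to_pt(3), 2))                            # 8.5
print(f"-g{device_pixels(216)}x{device_pixels(303)}")   # -g6123x8589
```

So the 8.5 used for the translate is 3 mm in points, and -g6120x8590 is a rounded version of the 216mm*303mm target; -g6123x8589 would be slightly closer.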

Reducing PDF file size using Ghostscript on Linux didn't work

I have about 50-60 pdf files (images) that are 1.5MB large each. Now I don't want to have such large pdf files in my thesis as that would make downloading, reading and printing a pain in the rear. So I tried using ghostscript to do the following:
gs \
-dNOPAUSE -dBATCH \
-sDEVICE=pdfwrite \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS="/screen" \
-sOutputFile=output.pdf \
L_2lambda_max_1wl_E0_1_zg.pdf
However, now my 1.4MB pdf is 1.5MB large.
What did I do wrong? Is there some way I can check the resolution of the PDF file? I just need 300dpi images, so would anyone suggest using convert to change the resolution, or is there some way I could reduce the image resolution with gs? The image is very grainy when I use convert.
How I use convert:
convert \
-units PixelsPerInch \
~/Desktop/L_2lambda_max_1wl_E0_1_zg.pdf \
-density 600 \
~/Desktop/output.pdf
Example File
http://dl.dropbox.com/u/13223318/L_2lambda_max_1wl_E0_1_zg.pdf
If you run Ghostscript with -dPDFSETTINGS=/screen, this is just a sort of shortcut. In fact you implicitly get a whole bunch of settings, which you can query with the following command:
gs \
-dNODISPLAY \
-c ".distillersettings {exch ==only ( ) print ===} forall quit" \
| grep '/screen'
On my Ghostscript (v9.06prerelease) I get the following output (slightly edited to increase readability):
/screen
<< /DoThumbnails false
/MonoImageResolution 300
/ColorImageDownsampleType /Average
/PreserveEPSInfo false
/ColorConversionStrategy /sRGB
/GrayImageDownsampleType /Average
/EmbedAllFonts true
/CannotEmbedFontPolicy /Warning
/PreserveOPIComments false
/GrayImageResolution 72
/GrayACSImageDict <<
/ColorTransform 1
/QFactor 0.76
/Blend 1
/HSamples [2 1 1 2]
/VSamples [2 1 1 2]
>>
/ColorImageResolution 72
/PreserveOverprintSettings false
/CreateJobTicket false
/AutoRotatePages /PageByPage
/MonoImageDownsampleType /Average
/NeverEmbed [/Courier
/Courier-Bold
/Courier-Oblique
/Courier-BoldOblique
/Helvetica
/Helvetica-Bold
/Helvetica-Oblique
/Helvetica-BoldOblique
/Times-Roman
/Times-Bold
/Times-Italic
/Times-BoldItalic
/Symbol
/ZapfDingbats]
/ColorACSImageDict <<
/ColorTransform 1
/QFactor 0.76
/Blend 1
/HSamples [2 1 1 2]
/VSamples [2 1 1 2] >>
/CompatibilityLevel 1.3
/UCRandBGInfo /Remove
>>
I'm wondering if your PDFs are image-heavy, and if this sort of conversion does unwelcome things (e.g. re-sampling images with the 'wrong' parameters) which increase the file size...
If this is the case (image-heavy PDF), say so, and I'll update this answer with a few suggestions...
Update
I had a look at the sample file provided by DNA. Interesting...
No, it does not contain any image.
Instead, it contains one large stream (compressed using /FlateDecode) which consists of roughly 700,000+ (!!) operations, mostly single vector operations in PDF language, such as:
m (moveto),
l (lineto),
d (setdash),
w (setlinewidth),
S (stroke),
s (closepath and stroke),
W* (eoclip),
rg and RG (setrgbcolor)
and a few more.
(That PDF code is very inefficiently written AFAICS (but does its job), because it concatenates many short strokes instead of drawing 'long' ones, and nearly every stroke defines the color again (even if it is the same as before) and carries all the other overhead (start stroke, end stroke, ...).)
Ghostscript's -dPDFSETTINGS=/screen does not have any effect here (there are no images to downsample, for example). The increased file size (+48 kByte to be precise) is probably due to Ghostscript re-organizing some of the internal stroking etc. commands into a different order when it interprets the file.
So there is not much you can do about the PDF file size ...
...unless you convert each of these PDF pages into a REAL image such as PNG:
gs \
-o out72.png \
-sDEVICE=pngalpha \
L_2lambda_max_1wl_E0_1_zg.pdf
(I used the pngalpha output to get a transparent background.) The image dimensions of 'out72.png' are 259x213px, and the file size is now 70 kByte. But I'm sure you'll not like the quality :-)
The output quality is 'bad' because Ghostscript uses a default resolution of 72 dpi.
Since you said you'd like to have 300dpi, the command becomes this:
gs \
-o out300.png \
-sDEVICE=pngalpha \
-r300 \
L_2lambda_max_1wl_E0_1_zg.pdf
The filesize now is 750 kByte, the image dimensions are 1080x889 Pixels.
Update 2
Since Curiosity is en vogue these days... :-) ...I tried to bring down the file size with the help of Adobe Acrobat X Pro on Mac.
You wanna know the results?
Performing a 'Save as... (PDF with reduced filesize)', which for me in the past always yielded very good results, created a 1.8+ MByte file (+29%). I guess this definitely puts Ghostscript's performance (file size increase +3%) into a realistic perspective!
DNA decided to go for grayscale PNGs. The way he's creating them is in two steps:
Step 1: Convert a color PDF page (such as this) to a grayscale PDF page, using Ghostscript's pdfwrite device and the settings
-dColorConversionStrategy=/Gray and
-dProcessColorModel=/DeviceGray.
Step 2: Convert the grayscale PDF page to a PNG, using Ghostscript's pngalpha device at a resolution of 300 dpi (-r300 on the GS commandline).
This reduces his initial file size of 1.4 MB to 0.7 MB.
But this workflow has the following disadvantage:
It loses all color info, without saving much disk space compared to a color output written at the same resolution directly from the PDF!
There are 2 alternatives to DNA's workflow:
A one-step conversion of (color) PDF -> (color) PNG, using Ghostscript's pngalpha device with the original PDF as input (same settings of 300 dpi resolution). This would have this advantage:
It would keep the color information in the PNG output, requiring only a little more space on disk!
A one-step conversion of (color) PDF -> grayscale PNG, using Ghostscript's pnggray device with the original PDF as input (same settings of 300 dpi resolution), with this mix of advantage/disadvantage :
It would lose the color information in the PNG output.
It would lose the transparent background that was preserved in DNA's workflow.
It would save lots of disk space, because the filesize would go down to about 20% of the output from DNA's workflow.
So you can make up your mind and see the output sizes and quality side-by-side, here is a shell script to demonstrate the differences:
#!/bin/bash
#
# Copyright (c) 2012 <kurt.pfeifle#gmail.com>
# License: Creative Commons (CC BY-SA 3.0)
function echo_do() {
echo
echo "Command: ${*}"
echo "--------"
echo
"${@}"
}
[ -d out ] || mkdir out
echo
echo " We assume all PDF pages are 1-page PDFs!"
echo " (otherwise we'd have to include something like '%03d'"
echo " into the output filenames in order to get paged output)"
echo
echo '
# Convert Color PDF to Grayscale PDF.
# If PDF has transparent background (most do),
# this will remain transparent in output.
# ATTENTION: since we do not set a resolution,
# pdfwrite will use its default value of "-r720".
# (However, this setting will only affect raster objects...)
'
for i in *.pdf
do
echo_do gs \
-o "out/${i}---pdfwrite-devicegray-gs.pdf" \
-sDEVICE=pdfwrite \
-dColorConversionStrategy=/Gray \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 \
"${i}"
done
echo '
# Convert (previously generated) grayscale PDF to PNG using Alpha channel
# (Alpha channel can make backgrounds transparent)
'
for i in out/*pdfwrite-devicegray*.pdf
do
echo_do gs \
-o "out/$(basename "${i}")---pngalpha-from-pdfwrite-devicegray-gs.png" \
-sDEVICE=pngalpha \
-r300 \
"${i}"
done
echo '
# Convert (color) PDF to grayscale PNG using Alpha channel
# (Alpha channel can make backgrounds transparent)
'
for i in *.pdf
do
# Following only required for 'pdfwrite' output device, not for 'pngalpha'!
# -dProcessColorModel=/DeviceGray
echo_do gs \
-o "out/${i}---pngalphagray_gs.png" \
-sDEVICE=pngalpha \
-dColorConversionStrategy=/Gray \
-r300 \
"${i}"
done
echo '
# Convert (color) PDF to (color) PNG using Alpha channel
# (Alpha channel can make backgrounds transparent)
'
for i in *.pdf
do
echo_do gs \
-o "out/${i}---pngalphacolor_gs.png" \
-sDEVICE=pngalpha \
-r300 \
"${i}"
done
echo '
# Convert (color) PDF to grayscale PNG
# (no Alpha channel here, therefore [mostly] white backgrounds)
'
for i in *.pdf
do
echo_do gs \
-o "out/${i}---pnggray_gs.png" \
-sDEVICE=pnggray \
-r300 \
"${i}"
done
echo " All output to be found in ./out/ ..."
echo
Run this script and compare the different outputs side by side.
Yes, the 'direct-grayscale-PNG-from-color-PDF-using-pnggray-device' one may look a bit worse than the other one (and it doesn't sport the transparent background), but it is also only 20% of its file size. On the other hand, if you want to buy a bit more quality by sacrificing a bit of disk space, you could use -r400 instead of -r300...

Setting auto-height/width for converted Jpeg from PDF using GhostScript

I am using GS to do conversion from PDF to JPEG and following is the command that I use:
gs -sDEVICE=jpeg -dNOPAUSE -dBATCH -g500x300 -dPDFFitPage -sOutputFile=image.jpg image.pdf
In this command, as you can see, -g500x300 sets the converted image size (width x height).
Is there a way to set just the width, without having to input the height, so that the height is scaled from the width using the original aspect ratio? I know this can be achieved with ImageMagick convert, where you simply put 0 for the height parameter, i.e. -resize 500x0. I tried the same with Ghostscript, but I don't think that is the correct way to do it.
I decided not to use ImageMagick convert because it is very slow when converting a large multi-page PDF.
Thanks for the help!
This post explains why ghostscript is faster - https://serverfault.com/questions/167573/fast-pdf-to-jpg-conversion-on-linux-wanted, and the only workaround to fix it would involve modifying the imagemagick code.
Unfortunately, autodetermined output size is not supported by ghostscript. This is primarily because the -g option used is actually determining the device size that will hold the rendered output, and not the rendered output itself. That output size is changing because of the -dPDFFitPage switch which then tries to match the device size. And although you can define just the height of the jpeg 'device' using -dDEVICEHEIGHT=n, that will leave the device width at the unchanged default.
Although it is a somewhat tedious workaround, you can use Ghostscript or ImageMagick to get the width and height of the PDF page(s). To do this using Ghostscript, see the answer to Using GhostScript to get page size. You can then calculate the matching height for the width you want and pass both to the -g flag, so that the aspect ratio is preserved. Bonus points if you can figure out a single set of commands to do all this :)
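Once the page size is known, the calculation itself is short. A sketch in Python, assuming the page width and height in points have already been obtained by one of the methods above (the function name geometry_for_width and the A4 numbers are only an example):

```python
def geometry_for_width(page_w_pt, page_h_pt, width_px):
    """Build a -g option that keeps the page's aspect ratio for a given width."""
    height_px = round(width_px * page_h_pt / page_w_pt)
    return f"-g{width_px}x{height_px}"

# Example: an A4 page (595 x 842 pt) rendered 500 pixels wide.
print(geometry_for_width(595, 842, 500))  # -g500x708
```

The resulting option can replace -g500x300 in the original command, together with -dPDFFitPage.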
You could write a PostScript program to do this readily enough. Here is a start:
%!
% usage: gs -sFile=____.pdf scale.ps
/File where not {
(\n *** Missing source file. \(use -sFile=____.pdf\)\n) =
Usage
} {
pop
}ifelse
% Get the width and height of a PDF page
%
/GetSize {
pdfgetpage currentpagedevice
1 index get_any_box
exch pop dup 2 get exch 3 get
/PDFHeight exch def
/PDFWidth exch def
} def
%
% The main loop
% For every page in the original PDF file
%
1 1 PDFPageCount
{
/PDFPage exch def
PDFPage GetSize
% In here, knowing the desired destination size
% calculate a scale factor for the PDF to fit
% either the width or height, or whatever
% The width and height are stored in PDFWidth and PDFHeight
PDFPage pdfgetpage
pdfshowpage
} for
pdfgetpage and pdfshowpage are internal Ghostscript extensions to the PostScript language for handling PDF files.
To resize the image with Ghostscript, use -dDownScaleFactor, e.g.:
gs -dBATCH -dNOPAUSE -r300 -dDownScaleFactor=3 -sDEVICE=png16m -sOutputFile=/tmp/26a0e9f7-3f26-437d-9a97-1653074e819a_%d.png /tmp/temp.pdf
On its own, -r300 would produce a huge image; -dDownScaleFactor=3 shrinks it by a factor of 3 while maintaining the aspect ratio.
You can use this if it is not important to set an exact width, which covers most use cases.
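To get a feeling for the resulting size, the arithmetic can be sketched as follows. This is a rough estimate in Python that assumes the page is rendered at the -r resolution and then divided by the factor; the exact rounding Ghostscript applies may differ by a pixel, and the function name downscaled_size is made up:

```python
def downscaled_size(page_w_pt, page_h_pt, dpi, factor):
    """Approximate output size in pixels after -r<dpi> and -dDownScaleFactor."""
    full_w = round(page_w_pt * dpi / 72)
    full_h = round(page_h_pt * dpi / 72)
    return full_w // factor, full_h // factor

# Example: a US Letter page (612 x 792 pt) with -r300 -dDownScaleFactor=3.
print(downscaled_size(612, 792, 300, 3))  # (850, 1100)
```

In other words, -r300 with a factor of 3 ends up with roughly the pixel dimensions of a 100 dpi rendering.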