Setting auto-height/width for a JPEG converted from PDF using Ghostscript

I am using GS to convert from PDF to JPEG, and the following is the command I use:
gs -sDEVICE=jpeg -dNOPAUSE -dBATCH -g500x300 -dPDFFitPage -sOutputFile=image.jpg image.pdf
In this command, as you can see, -g500x300 sets the converted image size (width x height).
Is there a way to set just the width without having to input the height, so that the height is scaled from the width using the original aspect ratio? I know this can be achieved with ImageMagick convert, where you simply put 0 in the height parameter, i.e. -resize 500x0. I tried the same with Ghostscript but I don't think that is the correct way to do it.
I decided not to use ImageMagick convert because it is very slow when converting a large, multi-page PDF.
Thanks for the help!

This post explains why Ghostscript is faster: https://serverfault.com/questions/167573/fast-pdf-to-jpg-conversion-on-linux-wanted. The only workaround for ImageMagick's slowness would involve modifying the ImageMagick code.
Unfortunately, an auto-determined output size is not supported by Ghostscript. This is primarily because the -g option actually determines the size of the device that will hold the rendered output, not the size of the rendered output itself. The output size follows it only because the -dPDFFitPage switch scales the page to match the device size. And although you can define just the height of the jpeg 'device' using -dDEVICEHEIGHT=n, that will leave the device width at its unchanged default.
As a somewhat tedious workaround, you can use Ghostscript or ImageMagick to get the width and height of the PDF page(s). To do this using Ghostscript, see the answer to Using GhostScript to get page size. You can then calculate the -g value that preserves the aspect ratio. Bonus points if you can figure out a single set of commands to do all this :)
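For example, here is a rough bash sketch of that workaround (untested): it uses Ghostscript's bbox device to read the page dimensions, which reports the inked area rather than the declared MediaBox, so the numbers may differ slightly from the nominal page size; the 500-pixel target width is just an assumed value.
W=500
# bbox prints "%%BoundingBox: llx lly urx ury" on stderr; take width/height from page 1
read pw ph < <(gs -q -dNOPAUSE -dBATCH -sDEVICE=bbox image.pdf 2>&1 \
               | awk '/^%%BoundingBox/ {print $4-$2, $5-$3; exit}')
H=$(( W * ph / pw ))
gs -sDEVICE=jpeg -dNOPAUSE -dBATCH -g${W}x${H} -dPDFFitPage -sOutputFile=image.jpg image.pdf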

You could write a PostScript program to do this readily enough. Here is a start:
%!
% usage: gs -sFile=____.pdf scale.ps
/File where not {
  (\n *** Missing source file. \(use -sFile=____.pdf\)\n) =
  Usage
} {
  pop
} ifelse

%
% Get the width and height of a PDF page
%
/GetSize {
  pdfgetpage currentpagedevice
  1 index get_any_box
  exch pop dup 2 get exch 3 get
  /PDFHeight exch def
  /PDFWidth exch def
} def

%
% The main loop
% For every page in the original PDF file
%
1 1 PDFPageCount
{
  /PDFPage exch def
  PDFPage GetSize

  % In here, knowing the desired destination size,
  % calculate a scale factor for the PDF to fit
  % either the width or height, or whatever.
  % The width and height are stored in PDFWidth and PDFHeight.

  PDFPage pdfgetpage
  pdfshowpage
} for
pdfgetpage and pdfshowpage are internal Ghostscript extensions to the PostScript language for handling PDF files.
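The snippet above still leaves Usage, PDFPageCount and the actual scaling undefined. Purely as a sketch of how it might be completed (this assumes an older Ghostscript release that still ships the PostScript-based PDF interpreter, since these extensions were removed with the switch to the C-based interpreter; TargetWidth is an assumed name, not part of the original answer):
% A possible /Usage procedure; define it before the /File check that calls it
/Usage { (usage: gs -sFile=____.pdf scale.ps\n) print quit } def

% Open the PDF with the PostScript-based PDF interpreter and count its pages
File (r) file runpdfbegin
/PDFPageCount pdfpagecount def

% Inside the main loop, after GetSize, something along these lines
% (run with -dFIXEDMEDIA so the page's own MediaBox does not override this size):
/TargetWidth 500 def
/Scale TargetWidth PDFWidth div def
<< /PageSize [TargetWidth PDFHeight Scale mul] >> setpagedevice
Scale Scale scale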

To resize the output image with Ghostscript, use -dDownScaleFactor.
e.g.
gs -dBATCH -dNOPAUSE -r300 -dDownScaleFactor=3 -sDEVICE=png16m -sOutputFile=/tmp/26a0e9f7-3f26-437d-9a97-1653074e819a_%d.png /tmp/temp.pdf
On its own, -r300 will produce a huge image; scaling down by a factor of 3 drops the size while keeping the aspect ratio. You can use this if an exact width in pixels is not important, which covers most use cases.
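To illustrate the arithmetic (a sketch with assumed file names; the figures assume an A4 page of 595x842 points): rendering at 300 dpi and then downscaling by 3 gives roughly the same pixel dimensions as rendering directly at 100 dpi, but with smoother output, because the page is supersampled before being reduced.
# A4 at 300 dpi is about 2479x3508 pixels; divided by 3 that is about 826x1169
gs -dBATCH -dNOPAUSE -r300 -dDownScaleFactor=3 -sDEVICE=png16m \
   -sOutputFile=page_%d.png input.pdf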

Related

How to convert a rectangular region from one page of a multipage PDF to PNG? Clipping/Cropping problem

I can convert an entire page of a PDF with ghostscript to PNG, but clipping a rectangular region does not work. Here is what I currently have:
gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=png16m -r200 -dFirstPage=45 -dLastPage=45 -sOutputFile=outfile.png -q -c 0 0 640 150 rectclip -f infile.pdf
This does convert the entire page 45 to a PNG file, but it does not crop or clip it to the specified region.
Later I found out that with the -g option I can set the size of the resulting PNG file. For example, adding -g640x150 will make the output file exactly that size in pixels. It clips the lower left-hand corner of the page. And with -c "<< ... >> setpagedevice" I can move the clipped rectangle to the right by 100 pixels and up by 200 pixels.
There is one remaining problem. I don't want the clipped area to go beyond the page boundaries. How can I make sure to stay inside the page boundaries?
The clip operator works by intersecting the current path with the existing clipping path; consequently, the size of the clipped area can only ever be reduced, never expanded.
If the -g option sets a larger size than the page boundaries, then there may be undrawn portions in the final output.
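A sketch of that -g approach combined with a page offset, using the numbers from the question (untested; note that -g is in device pixels at the chosen resolution while /PageOffset is in points, i.e. 1/72 inch, and -dFIXEDMEDIA is an addition here so the PDF's own page size cannot override the -g window):
# 640x150 pt at 200 dpi is roughly 1778x417 device pixels
gs -dSAFER -dNOPAUSE -dBATCH -sDEVICE=png16m -r200 \
   -dFirstPage=45 -dLastPage=45 -dFIXEDMEDIA -g1778x417 \
   -sOutputFile=outfile.png \
   -c "<< /PageOffset [-100 -200] >> setpagedevice" \
   -f infile.pdf
Any part of the window that falls outside the page simply stays unpainted, as described above.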

PDF to EPS or PS to EPS conversion maintaining page size

I need to convert a PDF or Postscript file to EPS, I tried using Ghostscript with the following command to convert Postscript to EPS:
gswin32.exe -o output.eps -sDEVICE=eps2write -dFitPage input.ps
Or PDF to EPS:
gswin32c.exe -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -o output.eps -dFitPage input.pdf
They both complete successfully but they do not maintain the page size. The input PDF and PS files are the same drawing and both have a page size of 300x300 pt. You can download these files here and here. They look like this:
But after converting them to EPS, the results are these: PS to EPS and PDF to EPS. The first one is the result of PS to EPS and the second of PDF to EPS (they are opened using EPS Viewer, which rasterizes the image; that is the reason for the low quality):
As you can see, neither of them has the original 300x300 pt size. I've tried many Ghostscript options but I can't manage to get an EPS with the right BoundingBox. I just need to convert a PDF or PS to EPS, whichever is easier or gives better results.
What you are asking for is, more or less, the exact opposite of what is normally required.
In general people want the EPS Bounding Box to be as tight as possible to the actual marks made by the EPS, because the normal use for an EPS file is to 'embed' it in another document. If you want extra white space you would normally add it around the EPS when you embed it.
Indeed, the EPS specification says that the BoundingBox comment should not include the white space. On page 8 of the EPSF specification:
"For an EPS file, the bounding box is the smallest rectangle that encloses all the marks painted on the single page of the EPS file"
Messing with Ghostscript switches isn't going to do anything helpful for you here; the device explicitly records the marks made by the input and sets the BoundingBox from those.
Perhaps if you were to explain why you want to have an EPS file with incorrect BoundingBox comments it would be possible to make some suggestions, but Ghostscript is doing exactly what it should do here.
[addendum]
(see comment below, this is in reply)
I suspect you need to change your process in some way then. One solution is to have the PDF start by filling the entire page with white. Contrary to many people's expectations, that counts as making a mark on the page, so the entire page would then be considered to be the BoundingBox.
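One way to sketch that without editing the PDF itself (untested, and not something the answer prescribes): install a /BeginPage procedure from the command line so each page starts with a white fill of the whole page area before the PDF content is drawn.
gswin32c.exe -q -dNOPAUSE -dBATCH -sDEVICE=eps2write -o output.eps \
  -c "<< /BeginPage { pop gsave 1 setgray clippath fill grestore } >> setpagedevice" \
  -f input.pdf
The pop discards the page count that the interpreter passes to BeginPage; the white clippath fill then counts as a mark covering the full page.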
As long as you are using the Ghostscript eps2write device, you could also parse the document for %%BeginPageSetup; the eps2write device still writes the original document size out in this section, e.g.:
%!PS-Adobe-3.0 EPSF-3.0
%%Invocation: path/gswin32c -dDisplayFormat=198788 -dDisplayResolution=96 -sDEVICE=eps2write -sOutputFile=? ?
%%BoundingBox: 101 132 191 256
%%HiResBoundingBox: 101.80 132.80 190.30 255.20
%%Creator: GPL Ghostscript GIT PRERELEASE 951 (eps2write)
....
....
%%EndProlog
%%Page: 1 1
%%BeginPageSetup
4 0 obj
<</Type/Page/MediaBox [0 0 300 300]
/Parent 3 0 R
/Resources<</ProcSet[/PDF]
>>
/Contents 5 0 R
>>
endobj
%%EndPageSetup
You can see here that the original media size was 300x300, even though the BoundingBox correctly reflects the marks made on the page. Note! This is characteristic of EPS files produced by the current version of eps2write; it won't work for EPS files from other sources and may not work with eps2write in the future.
Other than that you're stuck with finding the media size from the input and passing it separately to the program doing the insertion, presumably by putting the data in some other text file to accompany the EPS. Or, of course, manually or programmatically editing the urx,ury co-ordinates of the BoundingBox.
Ghostscript isn't going to do this for you I'm afraid.
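If you do go the route of programmatically editing the comments, a rough Unix-shell sketch (untested; GNU sed is assumed for -i, and it relies on the MediaBox line that a current eps2write writes into %%BeginPageSetup, as shown above; output.eps is the file name from the question):
# pull the MediaBox that eps2write writes into %%BeginPageSetup, e.g. [0 0 300 300]
read llx lly urx ury < <(awk -F'[][]' '/MediaBox/ { print $2; exit }' output.eps)
# force the BoundingBox comments to cover the full original page
sed -i "s/^%%BoundingBox:.*/%%BoundingBox: $llx $lly $urx $ury/" output.eps
sed -i "s/^%%HiResBoundingBox:.*/%%HiResBoundingBox: $llx $lly $urx $ury/" output.eps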

Ghostscript converting PostScript to PDF seems to ignore the page size / BoundingBox

I have created a PostScript file from a TIFF image using ImageMagick.
The command-line I am using is:
convert input.tif[0] -density 600 -alpha Off -size 5809x9408 -depth 16 intermediate.ps
This takes my input TIFF image (just the main image, not the thumbnail, thanks to the [0] index) and creates a .ps file from the bitmap.
When I look at the header of my PostScript file, I can see that it has the correct page size:
%!PS-Adobe-3.0
%%Creator: (ImageMagick)
%%Title: (intermediate.ps)
%%CreationDate: (2017-05-22T08:43:44+10:00)
%%BoundingBox: -0 -0 697 1129
%%HiResBoundingBox: 0 0 697.08 1129
%%DocumentData: Clean7Bit
%%LanguageLevel: 1
%%Orientation: Portrait
%%PageOrder: Ascend
%%Pages: 1
%%EndComments
Yet, when I use GhostScript to convert this to a PDF, unless I go to a lot of trouble to specify otherwise, gs is cropping it and putting it on a US Letter sized page.
gs -dPDFA=1 -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sDefaultRGBProfile=AdobeRGB1998.icc -dOverrideICC -sOutputFile=output.pdf -r600 -P PDFA_def.ps intermediate.ps
When I open the resulting PDF, the crop box is 612 x 792 pt, which is US Letter. It should be 697 x 1129 pt, the size of the BoundingBox in the PostScript file.
I have created a custom .joboptions file using Acrobat Distiller that sets image compression and the like, and in this file if I specify the page size at the end, then the resulting PDF comes out the correct size:
<<
/HWResolution [600 600]
/PageSize [697.080 1128.960]
>> setpagedevice
Now this isn't a huge issue for a one-off conversion, but I have to convert a large number of images and I don't want to set the page size manually for every single file.
The lines you quote above are comments and, from the comments present, suggest that this is an EPS file, not a PostScript program.
The main difference is that EPS is 'encapsulated', which means it's intended to be placed verbatim inside a PostScript program. The enclosing program contains the intelligence regarding the media size, and arranges to set the context such that the EPS is scaled, rotated and translated so that it fits appropriately on the media.
In order to do this successfully, the EPS file must follow certain rules; in particular it must not set any media size itself (because that would mess with the enclosing program).
So it seems likely to me that what you have is an EPS file which does not request any media size at all. So it's hardly surprising that you have to tell Ghostscript what you want to do with it.
Now in order for the enclosing program to place the EPS it needs to know its characteristics, the size and shape of the content. That's what the comments are for. Ordinarily an EPS file is read by an application (eg MS Word, LibreOffice etc) which parses out those comments and uses the information when generating the final PostScript program. The reason an EPS uses comments to store this information is precisely so that it has no effect on the actual content of the EPS and so the entire EPS can be included without further processing by the application.
The short answer is that if you read the Ghostscript documentation here you will find descriptions of the EPSCrop and EPSFitPage command line switches which will do all the work for you.
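For example, something like this variant of the command from the question, with -dEPSCrop added so pdfwrite takes its page size from the BoundingBox (-dEPSFitPage would instead scale the content to the existing page size); treat it as a sketch rather than a tested recipe:
gs -dPDFA=1 -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dEPSCrop \
   -sDefaultRGBProfile=AdobeRGB1998.icc -dOverrideICC \
   -sOutputFile=output.pdf -r600 -P PDFA_def.ps intermediate.ps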

Ghostscript renders ugly text

I'm trying to add the capability to render LaTeX equations to a project I'm working on. To do so, I use XeLaTeX to create a PDF file, which I then render to a (transparent) 96dpi-PNG using Ghostscript.
I'd like to have the rendered LaTeX blend in with the rest of the text (which is rendered using standard .NET GDI+ methods, but that's off-topic), but I can't get a reliably "good" text rendering: the output always looks somehow blurry or otherwise "bad".
Example:
From left to right, the same (small) PDF rendered at 96dpi with Ghostscript, Photoshop, and TexWorks (which I understand uses Ghostscript internally).
The command I use to run Ghostscript is the following:
"C:/Program Files (x86)/gs/gs9.09/bin/gswin32c.exe" \
-q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
-dMaxBitmap=500000000 -dAlignToPixels=1 -dGridFitTT=2 \
"-sDEVICE=pngalpha" -dTextAlphaBits=4 \
-dGraphicsAlphaBits=4 "-r96" -dFirstPage=1 -dLastPage=1 \
-sOutputFile="output.png" "input.pdf"
(which I actually pretty much copied from the command ImageMagick calls when converting a PDF file, but that's another story). I tried changing each of the relevant options (dAlignToPixels=0, dGridFitTT=0/1/2, dTextAlphaBits=2/4 [or leaving this parameter out altogether]) and I even tried rendering the PDF at 4 times the resolution and then downscaling it, without any noticeable improvement.
Yet, I'm sure there must be some way of decently rendering the PDF with Ghostscript (since TexWorks does), although I'm unable to find it.
Any hint? The PDF is this one.
You could try rendering your PDF at a higher resolution. 96 dpi just isn't enough for text at 11 pt size.
If you use 192 dpi and then scale the display of the resulting image to 50% (wherever you use the PNG), those parts should still appear at the same size as before, but with a higher resolution. What used to be a 4x7 pixel 's' should now be an 8x14 pixel 's'...
Update
OK, since my explanation seems not to have been comprehensible enough for the OP, here's the deal.
Generate a PDF file containing the word "Test", using Ghostscript. In my case, it is Ghostscript v9.10:
gs \
-o test.pdf \
-sDEVICE=pdfwrite \
-g230x100 \
-c "/Helvetica findfont \
11 scalefont \
setfont \
1 1 moveto \
(Test) show \
showpage"
From this PDF, generate 6 different images depicting the word "Test", using 6 different resolutions. The gs is still Ghostscript v9.10 (to be checked with gs -version):
for i in 1 2 3 4 5 6; do \
gs \
-o t$(( ${i} * 96 )).png \
-r$(( ${i} * 96 )) \
-sDEVICE=pngalpha \
-dAlignToPixels=1 \
-dGridFitTT=2 \
-dTextAlphaBits=4 \
-dGraphicsAlphaBits=4 \
test.pdf ; \
done
This will create the following PNGs, as confirmed by ImageMagick's identify command:
identify -format "%f : %Wx%H pixels -- %b filesize\n" t[1-9]*.png
t96.png : 31x13 pixels -- 475B filesize
t192.png : 61x27 pixels -- 774B filesize
t288.png : 92x40 pixels -- 1.1KB filesize
t384.png : 123x53 pixels -- 1.43KB filesize
t480.png : 153x67 pixels -- 1.76KB filesize
t576.png : 184x80 pixels -- 2.01KB filesize
Create a sample LaTeX document and embed the different images side by side and/or line by line. Here is my sample code:
\documentclass{article}
\usepackage{graphicx}
\begin{document}
Test
\includegraphics[height=7.5pt]{t96.png}
\includegraphics[height=7.5pt]{t96.png}
\includegraphics[height=7.5pt]{t192.png}
\includegraphics[height=7.5pt]{t288.png}
\includegraphics[height=7.5pt]{t384.png}
\includegraphics[height=7.5pt]{t480.png}
\includegraphics[height=7.5pt]{t576.png}
Test\\
{}
Test <== real text
\includegraphics[height=7.5pt]{t96.png} <-- 96 dpi figure
\includegraphics[height=7.5pt]{t192.png} <-- 192 dpi figure
\includegraphics[height=7.5pt]{t288.png} <-- 288 dpi figure
\includegraphics[height=7.5pt]{t384.png} <-- 384 dpi figure
\includegraphics[height=7.5pt]{t480.png} <-- 480 dpi figure
\includegraphics[height=7.5pt]{t576.png} <-- 576 dpi figure
Test <== real text
\end{document}
Here is a screenshot (at 400% zoom) from the PDF created via LuaLaTeX from the above LaTeX code:
The line with the 9 "Test" words has actual text only in the first and the last word. The 7 words in between are images with 96, 96, 192, 288, 384, 480 and 576 dpi.
I hope you can now see clearly how generating your images at a higher resolution results in better quality in your final PDF when you include those higher-resolution images in your LaTeX code...
You are rendering text at 11 points at 96 dpi; that works out to about 14 pixels in height, which, frankly, is not a lot (and in my output the 's' is 7 pixels high by 4 wide). Looking at your output, all 3 look 'blurry', and the Photoshop output looks overly bold in the capital T.
If you don't want it blurred, then don't set TextAlphaBits, or don't set it to such a high value.
I'd also suggest using the current release (9.15).
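As a sketch of both suggestions applied to the command from the question (only the resolution and the text antialiasing are changed; 192 dpi and TextAlphaBits=2 are example values, not something prescribed by the answer, and the gs9.15 install path is an assumption):
"C:/Program Files (x86)/gs/gs9.15/bin/gswin32c.exe" \
  -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
  -dMaxBitmap=500000000 -dAlignToPixels=1 -dGridFitTT=2 \
  -sDEVICE=pngalpha -dTextAlphaBits=2 \
  -dGraphicsAlphaBits=4 -r192 -dFirstPage=1 -dLastPage=1 \
  -sOutputFile="output.png" "input.pdf"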

Add white border to PDF (change paper format)

I have to change a given PDF from A4 (210mm*297mm) to 216mm*303mm.
The additional 6 mm for each dimension should be set as white border of 3mm on each side. The original content of the PDF pages should be centered on the output pages.
I tried with convert:
convert in.pdf -bordercolor "#FFFFFF" -border 9 out.pdf
This gives me exactly the needed result, but I lose a lot of the sharpness of the original images in the PDF. It all looks kind of blurry.
I also checked with
convert in.pdf out.pdf
which makes no changes at all but still screws up the images.
So I tried Ghostscript but did not get any result. The best approach I found so far, from a German site, is:
gs -sOutputFile=out.pdf -sDEVICE=pdfwrite -g6120x8590 \
-c "<</Install{1 1 scale 8.5 8.5}>> setpagedevice" \
-dNOPAUSE -dBATCH in.pdf
but I get Error: /typecheck in --.postinstall--.
By default, ImageMagick converts input PDF files into images at 72 dpi. This is an awfully low resolution, as you experienced firsthand. The output of ImageMagick is always a raster image, so if your input PDF contained text, it will no longer be text.
If you don't mind the output PDFs getting bigger, you can simply increase the resolution at which ImageMagick rasterizes the original PDF using the -density option, like this:
convert -density 600 in.pdf -bordercolor "#FFFFFF" -border 9 out.pdf
I used 600 because it is the sweet spot that works well for OCR. I recommend trying 300, 450, 600, 900 and 1200 and picking the best one that doesn't get unwieldy in size.
Shifting the content on the media is not especially hard, but it does mean altering the content stream of the PDF file, which most PDF manipulation packages avoid, with good reason.
The code you quote above really won't work: it leaves garbage on the operand stack, and the PLRM explicitly states that the Install procedure is followed by an implicit initgraphics, which will reset all the standard parameters anyway.
You could try instead setting a /BeginPage procedure to translate the origin, which will probably work:
<</BeginPage {8.5 8.5 translate} >> setpagedevice
Note that you aren't simply manipulating the original PDF file; Ghostscript takes the original PDF file, interprets it into graphics primitives, then reassembles those primitives into a new PDF file. This has implications... For example, if an image is DCT encoded (a JPEG) in the original, it will be decompressed before being passed into the output file. You probably don't want to reapply DCT encoding, as this will introduce visible artefacts.
A simpler alternative, but involving multiple processing steps and therefore more potential for problems, is to first convert the PDF to PostScript with the ps2write device, specifying your media size, and also the -dCenterPages switch, then use the pdfwrite device to turn the resulting PostScript into a new PDF file.
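A sketch of that two-step route (untested; 612x859 points is roughly 216mm x 303mm, the intermediate file name is just an example, and -dFIXEDMEDIA is an extra switch added here so the input's A4 size does not override the larger media):
# step 1: PDF -> PostScript on the larger media, with the content centred
gs -o centred.ps -sDEVICE=ps2write \
   -dDEVICEWIDTHPOINTS=612 -dDEVICEHEIGHTPOINTS=859 -dFIXEDMEDIA \
   -dCenterPages in.pdf
# step 2: PostScript -> PDF again
gs -o out.pdf -sDEVICE=pdfwrite centred.ps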
Instead of
-g6120x8590 \
-c "<</Install{1 1 scale 8.5 8.5}>> setpagedevice"
(which is wrong), you should use:
-g6120x8590 \
-c "<</Install{8.5 8.5 translate}>> setpagedevice"
or
-g6120x8590 \
-c "<</Install{3 25.4 div 72 mul dup translate}>> setpagedevice"
(which lets Ghostscript calculate the "3mm == 8.5pt" itself...)
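Putting that together, a full invocation might look like this (a sketch only; -dFIXEDMEDIA is an addition not present in the answer, included so the PDF's own A4 page size cannot override the enlarged -g media, which is 216mm x 303mm at pdfwrite's default 720 dpi):
gs -o out.pdf -sDEVICE=pdfwrite -dFIXEDMEDIA -g6120x8590 \
   -c "<</Install{3 25.4 div 72 mul dup translate}>> setpagedevice" \
   -f in.pdf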