Using Ghostscript to change page size from A4 and other sizes to Letter - pdf

I recently asked this question about changing paper size, and I now have a command that scales properly most of the time:
gs -sDEVICE=pdfwrite -sOutputFile="$outFile" -dBATCH -dNOPAUSE -q -dDEVICEHEIGHTPOINTS=792 -dDEVICEWIDTHPOINTS=612 -dPDFFitPage -dFIXEDMEDIA "$inFile"
We need it to generate output pages that are US Letter. I have two problem files I can post. One needs extra whitespace; the other needs only a very small rescaling, but it produces wildly different output.
The first file is an A4 file. The command scales it to a height of 792 pts and a width of 559.667 pts. It's scaled accurately, but we need whitespace either on both sides or on the right. How can I modify my command (or run a second command) to do this?
The second file is 8.52" x 11.02". For this one I get an output file that is 549.127 x 709.8 pts, and I don't understand it at all. 0.02" is within our printing tolerances, so we could let it go, but (a) I'd rather have just one process, and (b) the issue may not be the small scaling adjustment at all, in which case it could be a problem for other files.

These, along with your previous question, are all related to the myriad different Boxes available in a PDF file, and the various ways that a PDF processor will deal with the available Boxes.
For your first file, the output of pdfwrite has a MediaBox of 612x792 but a CropBox of [26.1662903 0 585.83374 792.0], which is (of course) not Letter. It's the result of scaling A4 down to Letter and centring that scaled-down area on the Letter media.
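The CropBox values follow directly from uniform scaling plus centring. A minimal Python sketch of that arithmetic, assuming the A4 MediaBox is 595x842 points (Ghostscript's A4 size; the actual file wasn't shown) and that the fit is uniform and centred. The function name is hypothetical:

```python
# Sketch of the fit-to-page arithmetic for A4 -> US Letter.
# Assumptions: A4 MediaBox is 595 x 842 pt; the page is scaled uniformly
# (same factor in x and y) and the scaled page is centred on the media.

def fit_and_centre(src_w, src_h, dst_w, dst_h):
    """Return (scale, [llx, lly, urx, ury]) of the scaled page on the media."""
    scale = min(dst_w / src_w, dst_h / src_h)  # uniform scale, no distortion
    w, h = src_w * scale, src_h * scale
    x0 = (dst_w - w) / 2  # leftover width split evenly left and right
    y0 = (dst_h - h) / 2  # leftover height split evenly top and bottom
    return scale, [x0, y0, x0 + w, y0 + h]

scale, box = fit_and_centre(595, 842, 612, 792)
print(f"scale = {scale:.5f}")
print("box =", [round(v, 3) for v in box])
```

With those inputs the box comes out at roughly [26.166 0 585.834 792], matching the CropBox above; the leftover 52.3 points of width is what becomes the white space on either side.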
If you remove the CropBox (using a binary editor) and open the file in Acrobat you will see that the white space is evenly distributed left and right of the page.
So really it's up to your printing process. Either you need to tell that process to use the MediaBox and ignore the CropBox, or have it centre the CropBox on the media when the CropBox is not the same as the media.
Your second file has a MediaBox of 684x864, which is 9.5x12 inches. However, it also has a TrimBox of [33.8027 33.7217 647.533 827.028]; doing the arithmetic, that works out to 8.524x11.018 inches.
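The same arithmetic in Python, which also suggests where the puzzling 549.127 x 709.8 figure comes from. This is a sketch under two assumptions not stated in the answer: the MediaBox origin is 0,0, and pdfwrite scaled the whole MediaBox (not the TrimBox) uniformly onto Letter. Under those assumptions the scaled TrimBox is about 549.127 x 709.800 points, matching the size reported in the question:

```python
# Convert PDF boxes (in points, 72 pt = 1 inch) to inches, then show what
# happens to the TrimBox when the whole MediaBox is fitted to US Letter.
# Assumptions: MediaBox starts at 0,0; the fit is a uniform scale.

PT_PER_INCH = 72.0

def box_size(box):
    """Width and height of a [llx lly urx ury] box, in points."""
    llx, lly, urx, ury = box
    return urx - llx, ury - lly

media = [0, 0, 684, 864]
trim = [33.8027, 33.7217, 647.533, 827.028]

mw, mh = box_size(media)
tw, th = box_size(trim)
print(f"MediaBox: {mw / PT_PER_INCH:.3f} x {mh / PT_PER_INCH:.3f} in")
print(f"TrimBox:  {tw / PT_PER_INCH:.3f} x {th / PT_PER_INCH:.3f} in")

# Fit the MediaBox onto 612 x 792 pt Letter media; the TrimBox scales with it.
scale = min(612 / mw, 792 / mh)
print(f"scaled TrimBox: {tw * scale:.3f} x {th * scale:.3f} pt")
```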
Clearly Acrobat (or whatever you are using to get the size) is using the TrimBox, not the MediaBox. Ghostscript uses the MediaBox by default; if you want it to use a different *Box then you have to tell it to. Try adding -dUseTrimBox to your command line. See my answer to your previous question :-)
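If the command is driven from a script, building the argument list explicitly avoids shell-quoting mistakes around file names. A hypothetical Python helper (not part of Ghostscript; it only builds the arguments and does not run gs) that adds the box switch when asked; the list could then be passed to subprocess.run:

```python
import shlex

def letter_fit_args(in_file, out_file, box=None):
    """Build a gs argument list that fits every page onto US Letter.

    box: None (use the MediaBox, Ghostscript's default) or one of
    "TrimBox", "CropBox", "ArtBox" to add -dUse<box> instead.
    """
    args = [
        "gs", "-sDEVICE=pdfwrite", "-dBATCH", "-dNOPAUSE", "-q",
        # Keep the name attached to '='; a space would make the shell pass
        # it as a separate argument, which gs would treat as an input file.
        f"-sOutputFile={out_file}",
        "-dDEVICEWIDTHPOINTS=612", "-dDEVICEHEIGHTPOINTS=792",
        "-dFIXEDMEDIA", "-dPDFFitPage",
    ]
    if box is not None:
        if box not in ("TrimBox", "CropBox", "ArtBox"):
            raise ValueError(f"unsupported box: {box}")
        args.append(f"-dUse{box}")
    args.append(in_file)  # input file goes after all the switches
    return args

print(shlex.join(letter_fit_args("in.pdf", "out.pdf", box="TrimBox")))
```

Passing the list directly (rather than a single string with shell=True) means file names containing spaces need no extra quoting.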

Related

How can I render a PDF into a BMP, fitting the content to the PDF page boundaries?

I am getting a BMP from a PDF with Ghostscript, but its content is not fitted to the page boundaries. Whatever option I try, I can't get the content to fit.
I've tried generating the BMP with different Ghostscript options, but none of them fits the content exactly.
This is the last command I tried. Please don't expect it to do what I need; I just copied and pasted it from the terminal.
gs -dBATCH -dNOPAUSE -sPAPERSIZE=a4 -dFIXEDMEDIA -dPSFitPage -sDEVICE=bmpmono -sOutputFile=Betlem.bmp -g1184x968 -c "<</PageSize [900 500]>> setpagedevice 0 0 translate" -c "<</PageOffset [-23 -100]>> setpagedevice" -f Betlem.pdf
I expect the content to fit the BMP image borders exactly, down to the last pixel. I am using an OpenCV and Python function to extract the content and fit it into a new image; this is the debug output:
initial BMP image resolution = (872, 900)
BMP image resolution after fit content into new page = (541, 870)
Have a look at the following thread for the fitting function in Python:
I can't find a way to fit contour on new image zero point
You are using PSFitPage for a PDF file; you should be using PDFFitPage or just FitPage.
Note that the 'fitting' in this case is fitting the PDF media size to the existing media. If the PDF content leaves white space around the edge of the media, then the resulting scaling will include that.
In addition you are using PostScript to offset the page origin, which will introduce white space, and you are trying to change the media size, which won't work because you've set -dFIXEDMEDIA. Using these in combination with any of the FitPage switches is not likely to work well.
Randomly stabbing at controls and copying bits of code intended to solve different problems isn't likely to help you I'm afraid.
Without seeing an example file I can't, of course, tell you how to solve your problem, and I'm not really sure exactly what you are trying to achieve. A bitmap with no white space? A bitmap of a given size with no white space? Something else?
[Edit]
OK, so looking at the PDF file, the MediaBox is 11.69x8.27 inches, and there is white space at the top, bottom, left and right between the marks on the page and the edge of the media.
Running this through Ghostscript to TIFF at 72 dpi results in a file which Adobe Photoshop says is 11.694x8.264 inches and has white space at the top, bottom, left and right, just like the PDF file.
By default Ghostscript renders to the media size from the PDF, but you can change this. If you were to change the media size to (say) 5.8x4.14 inches, set -dFIXEDMEDIA, and then render the PDF file, the top and right-hand side of the PDF file would be 'off the page', so you would only get the left-hand portion rendered. Try this:
gs -dDEVICEWIDTHPOINTS=421 -dDEVICEHEIGHTPOINTS=298 -dFIXEDMEDIA "A betlem m en vull anar(1).pdf"
You will see the white space is still present at bottom and left, and the top and right have fallen off the page.
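The 421x298 point values appear to be roughly half the original media in each direction, rounded to whole points, which would explain why the division is not exactly by 2. A quick Python check using the inch figures quoted above (the halving is my reading of the numbers, not something stated explicitly):

```python
# Where 421 x 298 likely comes from: about half of the 11.69 x 8.27 inch media.
PT_PER_INCH = 72

media_in = (11.69, 8.27)                      # MediaBox size quoted above
media_pt = [d * PT_PER_INCH for d in media_in]
half_pt = [d / 2 for d in media_pt]

print("media (pt):", [round(d, 2) for d in media_pt])
print("half  (pt):", [round(d, 2) for d in half_pt])
print("rounded   :", [round(d) for d in half_pt])
```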
Now, if you add FitPage that will scale the original media down until it fits the new media size (and all the content too, of course). If you try:
gs -dDEVICEWIDTHPOINTS=421 -dDEVICEHEIGHTPOINTS=298 -dFIXEDMEDIA -dFitPage "A betlem m en vull anar(1).pdf"
You'll see that the output has the same physical dimensions as the previous command, but now the whole of the PDF content can be seen because it's been scaled down. You should also see that the distribution of white space has changed, because I didn't strictly divide by 2 in each direction. The FitPage switch scaled the content in both directions by the same amount, and distributed the extra space in the x direction evenly to each side, as new white space.
Now, I've no clue what you mean by 'simmetric'. You can undoubtedly do what you want using Ghostscript and the PostScript language, but I don't know what it is you want. Pointing me at Python code isn't going to help, I'm afraid; I don't speak Python.
I can say that Ghostscript does not add extra white space that isn't present in the original unless you mess with the rendering by adding parameters like FitPage and FIXEDMEDIA.
If you can explain what you are trying to achieve I can probably tell you what to do.

Ghostscript add white background image

I have a script which automatically adds a gutter to a PDF file. It adds the gutter on the left of odd-numbered pages and on the right of even-numbered pages. It does this by moving the existing image over.
Here is the code for that:
gs -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -o output.pdf \
-dDEVICEWIDTHPOINTS=513 \
-dDEVICEHEIGHTPOINTS=738 -dFIXEDMEDIA -c \
"<< /CurrPageNum 1 def /Install { /CurrPageNum CurrPageNum 1 add def CurrPageNum 2 mod 1 eq \
{-4.5 0 translate} {4.5 0 translate} \
ifelse } bind >> setpagedevice" -f input_file.pdf
I've found that when I send this PDF file to the printer, the additional space doesn't "count", so the page is now narrower. I think this is because transparency doesn't count in the PDF, and so when it is sent to the printer the pages are seen as narrower.
Is it possible to add a white background to the pdf so it ISN'T seen as transparent? Or is there an alternative way to fix this?
I'm afraid your assumption is flawed: your 'translate' has nothing to do with transparency. It shifts the content on the media (NB this is not, in general, an image, i.e. a bitmap; it's more complex content). All the content is shifted, whether it is transparent or not.
I'm afraid I can't follow what you mean about the printed page being 'narrower'. The media request will be for a page of 513x738 points, which is a really odd size: 7.125 by 10.25 inches. Unless that matches the page size of your printer, it's going to do 'something' with the result. Probably it will center it if the media is larger than the request, but if the media is smaller than requested then it will either scale it down or crop it. Either will produce something different from what you expect.
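To make the size mismatch concrete, a small Python sketch converting the request to inches and computing the margins that would appear if the printer simply centred the page on US Letter paper. Both the Letter paper and the centring are assumptions for illustration; as noted above, a real printer may scale or crop instead:

```python
# Points -> inches for the 513 x 738 pt media request, and the margin left
# on each side if the page were centred on 612 x 792 pt US Letter paper.
PT_PER_INCH = 72
page = (513, 738)
letter = (612, 792)

print("page (in):", [p / PT_PER_INCH for p in page])
margins = [(l - p) / 2 for l, p in zip(letter, page)]
print("margin each side (pt): x=%.1f, y=%.1f" % tuple(margins))
```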
Is there a reason you are changing the media size of the original PDF file ?
If the media request does match the printer, it's still possible that there will be cropping or scaling going on, because the printable area may not be the same as the size of the media. The paper handling of some printers means that they cannot print all the way to the edge of the media. In that case the printer may scale or crop the output again.
You can easily eliminate transparency as the culprit by starting with a test file which does not contain any transparency. If you aren't certain, then one solution would be to use a recent version of Ghostscript and the pdfimage32 device. That will create a PDF file from the original PDF, but the output file will contain only a bitmap image, with no transparency at all.
To help us consider the problem, it would be helpful to see the original PDF file, the PDF file you send to the printer, and a scan or photograph of the final printed page. It would also be useful to know the version of Ghostscript you are using, the make and model of the printer, and how you are sending the PDF file to the printer.

Scaling PDF file using ghostscript

Our system takes 8.5 x 11 PDF files (only) and does things to them. Sometimes customers hand us files to manipulate into the right shape. We're working to automate scaling non-standard sized PDF files into 8.5 x 11.
We've been able to handle most files we've tested with ghostscript, but we have this one customer submitted file that we are unable to handle. (And unfortunately we can't recreate the condition and, of course, can't post the customer's data.)
The file is PDF v1.7 and contains seven 8.5x11 pages followed by four pages that are 25.5 x 45.33 inches. I don't know how they were generated (Adobe Acrobat 10.1.2 per pdfinfo).
We have gradually added a series of parameters to our gs command until we arrived at this:
gs -sDEVICE=pdfwrite -sOutputFile=$final_file -dBATCH -dNOPAUSE -sPAPERSIZE=letter -q -r720 -g6120x7920 -dPDFFitPage -dFIXEDMEDIA $files_to_convert
This seems to work fine for our other files, but for this ONE file, the 25.5 x 45.33 pages are not scaled to letter size. Here are the measurements for pages 7 and 8 of the output file, per pdfinfo:
Page 7 size: 612 x 792 pts (letter)
Page 7 rot: 0
Page 8 size: 1836 x 3264 pts
Page 8 rot: 0
I've read that PostScript has Policies, PageSize options, but I'm not aware of such a thing with PDF. And if it exists, I don't know how to alter it using ghostscript.
How can I make sure all pages are scaled to letter?
Well, Ghostscript uses PostScript as its scripting language, so anything you can do in PostScript you can do to a PDF file.
I really wouldn't use -g with pdfwrite, because -g specifies pixels, and since pdfwrite is a vector device that doesn't really work well. Use DEVICEHEIGHTPOINTS and DEVICEWIDTHPOINTS instead.
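The -g value only describes a physical size in combination with the resolution. A short Python sketch of the conversion, using the -r720 -g6120x7920 values from the question (the function name is hypothetical):

```python
# -g gives the page size in device pixels; its physical size depends on -r.
PT_PER_INCH = 72

def g_to_points(width_px, height_px, dpi):
    """Convert a -gWxH pixel size at -r<dpi> into PostScript points."""
    return width_px / dpi * PT_PER_INCH, height_px / dpi * PT_PER_INCH

w_pt, h_pt = g_to_points(6120, 7920, 720)
print(f"-g6120x7920 at 720 dpi = {w_pt:g} x {h_pt:g} pt")
```

So that -g value is Letter only because of -r720; -dDEVICEWIDTHPOINTS=612 -dDEVICEHEIGHTPOINTS=792 states the same size directly, independent of resolution.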
Don't set -sPAPERSIZE either, you can't set the media to be letter in one place and something different (the -g switch) elsewhere.
It's not really possible to tell you exactly what's going on with your PDF file without seeing it, and you haven't really explained what's wrong. You imply that the pages are not being scaled, but you don't say what size they are being drawn at. You also don't say why you think the pages are 'legal' size when viewed in Acrobat.
If you are saying that the pages in question are 'legal' but the media is much larger, then that is entirely possible and would suggest that the pages have a CropBox. Ghostscript uses the MediaBox for page sizes, Acrobat uses a plethora of different boxes, but usually defaults to the CropBox.
If you want Ghostscript to use the CropBox, just add -dUseCropBox.
Alternatively post an example somewhere and I can look at it.

Ghostscript loses emdash characters and replaces with hyphens

When I run a PDF which was originally created with LibreOffice on Linux through Ghostscript 9.19 on OS X, to produce another (flattened) PDF, the output is perfect except for one problem. All emdashes in the entire document have been replaced with a standard hyphen (awkwardly followed by half a space). Oddly enough, if I highlight the resulting "hyphen+space", my context menu shows that I've selected an emdash, so the underlying text is still an emdash; it is just rendering the wrong glyph.
I can reproduce this on multiple documents from the same source, and I'm assuming there's a setting or switch somewhere that can help resolve this.
I don't know whether the font used makes a difference, but for the sake of reference, the body text of my document is set in Arno Pro. When I use a modern version of LibreOffice on OS X to make a sample document also containing an emdash in Arno Pro, the same problem is not exhibited, so it seems to be specific to the software which originally made these PDF files.
These PDFs are of legacy projects that I am not set-up to re-produce at this time, so I need to prepare them for reprinting using the existing files.
How do I retain emdash glyphs when running a command such as the following?
gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite \
-sColorConversionStrategy=/LeaveColorUnchanged \
-dAutoFilterColorImages=true -dAutoFilterGrayImages=true \
-sOutputFile=output.pdf input.pdf
I can add an example of the input PDF to this question if needed.
Without seeing the PDF file it isn't possible to give you an answer. Most likely the font isn't embedded, or if it is embedded doesn't have an emdash glyph.
Copy and paste uses the ToUnicode CMap, so it isn't dependent on the font. It's simply a list of character codes and the Unicode code point associated with each, when using a given font.
Note that this doesn't mean 'the underlying text is still an emdash'. The ToUnicode information is utterly separate from the font end of things, it is effectively metadata and bears no real relation to the font or rendering.
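A toy Python model of why copy and paste can report an emdash while a hyphen is drawn. The character codes and mappings are invented for illustration and are not taken from the file; the point is only that the two results come from separate lookups:

```python
# Toy model (hypothetical data): the glyph that is drawn and the text you get
# from copy and paste come from two independent lookups.
EMDASH = "\u2014"

# Font side: character code -> glyph the embedded font will draw.
# Hypothetically, this font maps code 0x97 to a hyphen glyph.
font_glyphs = {0x41: "A", 0x97: "hyphen"}

# Metadata side: the font's ToUnicode CMap, character code -> Unicode text.
to_unicode = {0x41: "A", 0x97: EMDASH}

codes = [0x41, 0x97]
rendered = [font_glyphs[c] for c in codes]          # what you see on the page
copied = "".join(to_unicode[c] for c in codes)      # what copy/paste yields

print("rendered glyphs:", rendered)
print("copied text    :", copied)
```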
Put the file on DropBox and post the URL and someone can look into it. I'll be on vacation for the next few days though, but maybe someone else will look.
Note that in PDF you don't necessarily specify characters and positions as a list of consecutive characters; you can specify the position of each individually, or you can specify widths which override the widths in the font, etc. So there is almost certainly only one glyph; the 'white space' you refer to is probably just that, white space, not another glyph.
I should also point out (I do this a lot) that Ghostscript never 'flattens', concatenates, merges, or performs any similar operation on PDF files. When using Ghostscript and the pdfwrite device, the original input (in whatever format) is fully interpreted into graphics marking operations and sent to the device. The device executes the marking operations; in the case of a rendering device, it scan-converts them and writes to a bitmap. In the case of pdfwrite, it creates PDF operators.
The result of this is that the output PDF file bears no relation to the input PDF, other than its visual appearance.
You also don't say which version of Ghostscript you are using....

Obey the MediaBox/CropBox in PDF when using Ghostscript to render a PDF to a PNG

I've been using Ghostscript to convert my single figure plots rendered in PDF to PNG:
gswin32c -sDEVICE=png16m -r300x300 -sOutputFile=junk.png ^
-dBATCH -dNOPAUSE Figure_001-a.pdf
This works in the sense I get a PNG out and it contains the plot.
But it contains a huge amount of white space as well (an example source image: http://cdsweb.cern.ch/record/1258681/files/Figure_001-a.pdf).
If you view it in Acrobat you'll note there is no white space around the plot. If you use the above command line you'll find the plot is only about 1/3 of the space.
When doing the same thing with an EPS file I run into the same problem. However, there is the command-line parameter -dEPSCrop that one can pass to get the PS rendering engine to pay attention to the BoundingBox.
I need a similar argument for rendering PDFs. I was not able to find it in the docs (nor even -dEPSCrop, actually).
I had exactly the same issue. I fixed it by adding the -dUseArtBox switch.
Example:
/usr/bin/gs -dUseArtBox -dBATCH -dNOPAUSE -sDEVICE=pngalpha -sOutputFile=output.png input.pdf
Note: the -dUseArtBox switch has been supported since Ghostscript version 9.07.
-dUseArtBox
Sets the page size to the ArtBox rather than the MediaBox. The art box defines the extent of the page's meaningful content (including potential white space) as intended by the page's creator. The art box is likely to be the smallest box. It can be useful when one wants to crop the page as much as possible without losing the content.
There are various options to control which "media size" Ghostscript renders a given input:
-dPDFFitPage
-dUseTrimBox
-dUseCropBox
With PDFFitPage Ghostscript will render to the current page device size (usually the default page size).
With UseTrimBox it will use the TrimBox (and it will at the same time set the PageSize to that value).
With UseCropBox it will use the CropBox (and it will at the same time set the PageSize to that value).
By default (given no parameter), Ghostscript will render using the MediaBox.
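Which box a switch ends up selecting also depends on which boxes the file actually contains. A minimal Python sketch of the fallback rules from the PDF specification, describing what a conforming reader should treat as the effective box; the function name and the sample page are hypothetical, and clipping of boxes to the MediaBox is left out for brevity:

```python
# Fallback rules from the PDF specification: a missing CropBox defaults to
# the MediaBox, and a missing TrimBox, ArtBox or BleedBox defaults to the
# CropBox. (Clipping of boxes to the MediaBox is ignored here.)

def effective_box(page, name):
    """Return the box to use for `name`, applying the spec's defaults.

    `page` is a dict of the boxes actually present, e.g. {"MediaBox": [...]}.
    """
    if name == "MediaBox":
        return page["MediaBox"]  # required (possibly inherited in a real file)
    if name == "CropBox":
        return page.get("CropBox", page["MediaBox"])
    if name in ("TrimBox", "ArtBox", "BleedBox"):
        return page.get(name, effective_box(page, "CropBox"))
    raise ValueError(f"unknown box: {name}")

# Hypothetical page: Letter media with a tighter ArtBox around a figure.
page = {"MediaBox": [0, 0, 612, 792], "ArtBox": [72, 300, 540, 650]}
for name in ("MediaBox", "CropBox", "TrimBox", "ArtBox"):
    print(f"{name:8s} -> {effective_box(page, name)}")
```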
For your example, it looks like adding "-dUseCropBox" will do the job you're expecting.
Note, you can additionally control the overall size of your output by using "-sPAPERSIZE" (select amongst all pre-defined values Ghostscript knows) or (for more flexibility) use "-dDEVICEWIDTHPOINTS=NNN -dDEVICEHEIGHTPOINTS=NNN".
Have you tried pdfcrop, which uses pdfTeX (it comes with TeX Live, for example), or (I haven't tried this one yet) the Python script pdfcrop?
I have a similar workflow using the first tool mentioned.