I have a script which automatically adds a gutter to a PDF file. It adds gutter to left for ODD numbered pages and gutter to the right for EVEN numbered pages. It does this by moving the existing image over.
Here is the code for that:
'gs -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -o output.pdf \
-dDEVICEWIDTHPOINTS=513 \
-dDEVICEHEIGHTPOINTS=738 -dFIXEDMEDIA -c \
"<< /CurrPageNum 1 def /Install { /CurrPageNum CurrPageNum 1 add def CurrPageNum 2 mod 1 eq \
{-4.5 0 translate} {4.5 0 translate} \
ifelse } bind >> setpagedevice" -f input_file.pdf
I've found that when I send this PDF file to the printer, the additional space is not "counting" so the file is now narrower now. I think this is because transparency doesn't count on the PDF, and so when sent to the printer the pages are seen as narrower.
Is it possible to add a white background to the pdf so it ISN'T seen as transparent? Or is there an alternative way to fix this?
I'm afraid your assumption is flawed, your 'translate' has no transparency involvement at all, its shifting the content on the media (NB this is not an image, ie a bitmap, in general. Its more complex content). All the content is shifted, no matter whether it is transparent or not.
I'm afraid I can't follow what you mean about the printed page being 'narrower'. The Media request will be for a page 513x738 points, which is a really weird size; 7.125 by 10.25 inches. Unles that matches the page size of your printer, then its going to do 'something' with the result. Probably it will center it if the media is larger than the request, but if the media is smaller than requested, then it will either scale it down or crop it. Either will result in something different to what you expect.
Is there a reason you are changing the media size of the original PDF file ?
If the media request does match the printer then its still possible that there will be cropping or scaling going on, because the printable area may not be the same as hte size of the media. The paper handling of some printers means that they cannot print all the way to the edge of the media. In that case the printer may scale or crop the output again.
You can easily elimiate transparency as being the culprit by simply starting with a test file which does not contain any transparency. If you aren't certain then one solution owuld be to use a recent version of Ghostscript and use the pdfimage32 device. That will create a PDF file from the original PDF, but the output file will only contain a bitmap image, no transparency at all.
To help us consider the problem, it would be helpful to see the original PDF file, the PDF file you send to the printer, and a scan or photograph of the final printed page. It would also be useful to know the version of Ghostscript you are using, the make and model of the printer, and how you are sending the PDF file to the printer.
Related
I am getting a BMP from a PDF with GhostScript, but its content is not fitted into page boundaries. Even I try any option, I am not able to get the content fitted.
I've tried to generate the BMP with different GhostScript options, but noone seems to fit 100% ok the content.
This is the last command I tried. Please, don't expect it to have what I need, just copied & paste from tty.
gs -dBATCH -dNOPAUSE -sPAPERSIZE=a4 -dFIXEDMEDIA -dPSFitPage -sDEVICE=bmpmono -sOutputFile=Betlem.bmp -g1184x968 -c "<</PageSize [900 500]>> setpagedevice 0 0 translate" -c "<</PageOffset [-23 -100]>> setpagedevice" -f Betlem.pdf
I am expecting to get the content fitted into the BMP image borders, without exception of a pixel. I am using an OpenCV & Python function to extract content and fit in new image and this is the debug:
initial BMP image resolution = (872, 900)
BMP image resolution after fit content into new page = (541, 870)
Have a look to the following thread for the fitting funtion in Python:
I can't find a way to fit contour on new image zero point
You are using PSFitPage for a PDF file, you should be using PDFFitPage or just FitPage.
Note that the 'fitting' in this case is fitting the PDF media size to the existing media. If the PDF content leaves white space around the edge of the media, then the resulting scaling will include that.
In addition you are using PostScript to offset the page origin, which will introduce white space, and you are trying to change the media size, which won't work because you've set -dFIXEDMEDIA. Using these in combination with any of the FitPage switches is not likely to work well.
Randomly stabbing at controls and copying bits of code intended to solve different problems isn't likely to help you I'm afraid.
Without seeing an example file I can't, of course, tell you how to solve your problem, and I'm not really sure exactly what you are trying to achieve. A bitmap with no white space ? A bitmap of a given size with no white space ? Something else ?
[Edit]
OK so looking at the PDF file, the media box is 11.69x8.27 inches, there is white space at the top, bottom, left and right between the marks on the page and the edge of the media.
Running this through Ghostscript, to TIFF at 72 dpi results in a file which Adobe Photoshop says is 11.694x8.264 inches and has white space at top bottom left and right, just like the PDF file.
By default Ghostscript uses the Media size from the PDF to render to, however you can change this. If you were to change the media size to (say) 5.8x4.14 inches, set -dFIXEDMEDIA and then rendered the PDF file what would happen is that the top and right hand side of the PDF file would be 'off the page' so you would only get the left hand portion rendered. Try this:
gs -DEVICEWIDTHPOINTS=421 -dDEVICEHEIGHTPOINTS=298 -dFIXEDMEDIA "A betlem m en vull anar(1).pdf"
You will see the white space is still present at bottom and left, and the top and right have fallen off the page.
Now, if you add FitPage that will scale the original media down until it fits the new media size (and all the content too, of course). If you try:
gs -DEVICEWIDTHPOINTS=421 -dDEVICEHEIGHTPOINTS=298 -dFIXEDMEDIA -dFitPage "A betlem m en vull anar(1).pdf"
You'll see that the output is the same physical dfimensions as the previous command, but now the whole of the PDF content can be seen because its been scaled down. You should also see that the distribution of white space has changed, because I didn't strictly divide by 2 in each direction. The FitPage switch scaled the content in both directions by the same amount, and distributed the extra space in the x direction evenly to each side, as new white space.
Now I've no clue what you mean by 'simmetric'. You can undoubtedly do what you want using Ghostscript and the PostScript language, but I don't know what it is you want. Pointing me at Python code isn't going to help I'm afraid, I don't speak Python.
I can say that Ghostscript does not add extra white space that isn't present in the original unless you mess with the rendering by addding parameters like FitPage and FIXEDMEDIA.
If you can explain what you are trying to achieve I can probably tell you what to do.
Here have been already quite a few questions and answers about cropping documents with Ghostscript.
However, the answers are not matching my exact needs and are still confusing to me.
I expected that there would be a single option e.g. "-AutoCropToBBox" or something like this.
For clarification, as a bounding box, I understand the smallest rectangular box which contains all (non-white(?)) printed objects completely.
Furthermore, I want/have to use a printer port redirection (RedMon) to generate a cropped PDF via printing to a Postscript-printer from basically any application.
So, under Win7/64bit, I set the redirected port properties:
Redirected port properties Win7/64bit
The output is redirected to
C:\Windows\system32\cmd.exe
The arguments for the program are:
/c gswin64c.exe -sDEVICE=pdfwrite -o -sOutputFile="%1".pdf -
"%1" contains the user input for filename. With this, I get a full-page PDF. Fine!
But how to add the cropping options?
Additional question:
If I have a multipage document will such an (auto-)cropping be individual for each page? Or would there be an option to keep it all the same e.g. like the first page or like the largest bounding box of all pages?
Another related issue:
the window for prompting for the filename is always popping up behind the application I am printing from. Any ideas to always bring it to the front?
Another question:
There is the Perl-script "ps2eps" and program bbox.exe (see http://ctan.org/pkg/ps2eps). It's said there that Ghostscript (or ps2epsi) is occationally(?) calculating wrong bounding boxes. Is this (still) true?
Thanks for your help.
Well your first problem is that PostScript programs are normally written to expect to be rendered to a specific media size, and are usually not tightly bounded to it. White space is important for readability.
So ordinarily the PostScript program you generate will request a specific media size, and the interpreter will do its best to match that. If it can't match it then it will use a strategy to try and get as close as possible, and scale the entire content to fit that media.
You can't expect the printer to perform any of those things if it doesn't know the required size until its finished, and you can't be certain of the bounding box until you have rendered all the marking content. It is true that some files generally EPS files have a %%BoundingBox comment but.. that's a comment, it has no effect in PostScript, its there for the benefit of applications which don't want to interpret the PostScript.
So that's why the simple switch you want isn't there, it would break the interpreter's normal functioning, for rendering.
So, the first thing you need to do is determine the bounding box of the content. You can do that, as Stefan says, by using the bbox device. And on that note, as far as I know the bbox device produces accurate output. If it does not then we would appreciate a bug report proving it so we can fix it. If people don't report bugs how are we supposed to know about them ? Its disappointing to see someone spreading FUD instead of helping out with a bug report.......
ps2epsi isn't Ghostscript, its a crappy cheap and cheerful script, I wouldn't use it. However..... If the original PostScript leaves stuff on the stack then it will end up as a corrupted (or invalid) EPS file and the original PostScript should be fixed before trying to use it as it will break any PostScript program that tries to use it (eg if you include the EPS in a docuemnt and then print it).
So if you are using Ghostscript, and you want to take a PostScript program and get an EPS out of it, use the eps2write device. It won't have a preview bu frankly who cares.
Now if I remember correctly the bbox device (and eps2write) record all marking operations, you can't simply record all the non-white marking operations; what if the white overwrites an existing mark on the page ? What if the media is not white ? Note that if you render to a PNG with Ghostscript, the untouched portion of the output is transparent, whereas white marks are not.
So the bbox is the extent of all the marking operations, regardless of the colour. The only other way to proceed would be to render the content and count the non-white pixels. But that only works at a specific resolution, change the resolution and the precise bounding box may change as well.
Once you have the Bounding Box you can tell Ghostscript to use media that size. Note that you will almost certainly also have to translate the origin, as its unlikely that the content will start tightly at the bottom left corner. You will need -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS to set the media size, and you will need to use -c and -f to send PostScript to alter the origin appropriately. In simple cases an '-x -y translate' will suffice but if the program executes initgraphics you will instead have to set a BeginPage procedure to alter the initial CTM.
If you set the media size with -dDEVICEWIDTHPOINTS etc then all pages will be the same size. If you don't want that then you need to write a BeginPage procedure to resize each page individually (you will also need to hook setpagedevice and remove the /PageSize entries from the dictionary.
I've no idea why Windows is putting the dialog box behind the active Window, it seems to have started doing that with Windows 7 (or possibly Vista). I don't see any way to alter that because I'm not sure what is generating the dialog.....
Personally I would suggest that you try the 2-step approach of running the original through Ghostscript's eps2write device and then take the EPS and create a PDF file using the pdfwrite device and the -dEPSCrop switch. Double converting is bad, but other solutions are worse. Note that EPS files cannot be multi-page, so you will have to create 'n' EPS files from an n-page PostScript program, and then supply a command line listing each EPS file as input to the pdfwrite device.
Take an example file and try this out from the command line before you try scripting it.
As I understood from #KenS explanations:
the way eps2write works, it may not or will not or actually cannot result in the minimum possible bounding box
it needs to be a 2-step process via -sDEVICE=bbox
So, I now ended up with the following process to "print" a PDF with a correct minimum possible bounding box:
Redirected Printer Port to cmd.exe
C:\Windows\system32\cmd.exe
Arguments for the program:
/c gswin64c.exe -q -o "%1".ps -sDEVICE=ps2write - && gswin64c.exe -q -dBATCH -dNOPAUSE -sDEVICE=bbox -dLastPage=1 "%1".ps 2>&1 >nul | perl.exe C:\myFiles\CropPS2PDF.pl "%1"
Unfortunately, it requires a little Perl script (let's call it: CropPS2PDF.pl):
#!usr/bin/perl -w
use strict;
my $FileName = $ARGV[0];
$/ = undef;
my $Crop = <STDIN>;
$Crop =~ /%%BoundingBox: (\d+) (\d+) (\d+) (\d+)/s; # get the bbox coordinates
my ($llx, $lly, $urx, $ury) = ($1, $2, $3, $4);
print "\n$FileName: $llx, $lly, $urx, $ury \n"; # print just to check
my $Command = qq{gswin64c.exe -q -o $FileName.pdf -sDEVICE=pdfwrite -c "[/CropBox [$llx $lly $urx $ury]" -c " /PAGE pdfmark" -f $FileName.ps};
print $Command; # print just to check
system($Command); # execute command
It seems to work... :-)
Improvements are welcome.
My questions are still:
Can this be done somehow without Perl? Just Win7, cmd.exe and Ghostscript?
Is there maybe a way without writing the PS-File to disk which I do not need? Of course, I could also delete it afterwards with the Perl-script.
I recently asked this question about changing paper size and have a command that scales properly most of time:
gs -sDEVICE=pdfwrite -sOutputFile= $outFile -dBATCH -dNOPAUSE -q -dDEVICEHEIGHTPOINTS=792 -dDEVICEWIDTHPOINTS=612 -dPDFFitPage -dFIXEDMEDIA $inFile
We need to have it generate output pages that are US Letter. I have two problem files I can post. One needs extra whitespace; the other just needs a very small rescaling but it gets a wildly different output.
The first file is an A4 file. The command scales it to a height of 792 and the width is scaled to 559.667. It's scaled accurately, but we need whitespace either on both sides or on the right. How can I modify my command (or run a second command) to do this?
The second file is 8.52" x 11.02". For this one I get an output file that is 549.127 x 709.8 pts and I don't get it at all. 0.02" is within our printing tolerances so we can let it go, but a) I'd rather have just one process b) maybe the issue isn't the small scaling adjustments and maybe it will be a problem for other files.
These, along with your previous question, are all related to the myriad different Boxes available in a PDF file, and the various ways that a PDF processor will deal with the available Boxes.
For your first file; the output of pdfwrite has a MediaBox of 612x792 but a CropBox of [26.1662903 0 585.83374 792.0] which is (of course) not Letter. Its the result of scaling A4 down to Letter and centring that scaled down area on the Letter media.
If you remove the CropBox (using a binary editor) and open the file in Acrobat you will see that the white space is evenly distributed left and right of the page.
So really its up to your printing process. Either you need to tell that process to use the MediaBox and ignore the CropBox, or have it centre the CropBox on the media when the CropBox is not the same as the media.
Your second file has a MediaBox of 684x864 which is 9.5x12 inches. However it has a TrimBox which is [33.8027 33.7217 647.533 827.028] doing the arithmetic that works out is 8.524x11.018 inches.
Clearly Acrobat (or whatever you are using to get the size) is using the TrimBox, not the MediaBox. Ghostscript uses the MediaBox by default, if you want it to use a different *Box then you haver to tell it so. Try adding -dUseTrimBox to your command line. See my answer to your previous question :-)
Our system takes 8.5 x 11 PDF files (only) and does things to them. Sometimes customers hand us files to manipulate into the right shape. We're working to automate scaling non-standard sized PDF files into 8.5 x 11.
We've been able to handle most files we've tested with ghostscript, but we have this one customer submitted file that we are unable to handle. (And unfortunately we can't recreate the condition and, of course, can't post the customer's data.)
The file is PDF v1.7 and contains seven 8.5x11 pages followed by four pages that are 25.5 x 45.33 inches. I don't know how they were generated (Adobe Acrobat 10.1.2 per pdfinfo).
We have gradually added a series of parameters to our gs command until we arrived at this:
gs -sDEVICE=pdfwrite -sOutputFile=$final_file -dBATCH -dNOPAUSE -sPAPERSIZE=letter -q -r720 -g6120x7920 -dPDFFitPage -dFIXEDMEDIA $files_to_convert
This seems to work fine for our other files, but for this ONE file, the 25.5 x 45.33 pages are not scaled to letter size. Here are the measurements for the output file's pages 7 and 8's per pdfinfo:
Page 7 size: 612 x 792 pts (letter)
Page 7 rot: 0
Page 8 size: 1836 x 3264 pts
Page 8 rot: 0
I've read that PostScript has Policies, PageSize options, but I'm not aware of such a thing with PDF. And if it exists, I don't know how to alter it using ghostscript.
How can I make sure all pages are scaled to letter?
Well, Ghostscript uses PostScript as its scripting language, so anything you can do in PostScript you can do to a PDF file.
I really wouldn't use -g with pdfwrite, because -g specifies pixels, and since pdfwrite is a vector device that doesn't really work well. Use DEVICEHEIGHTPOINTS and DEVICEWIDTHPOINTS instead.
Don't set -sPAPERSIZE either, you can't set the media to be letter in one place and something different (the -g switch) elsewhere.
Its not really possible to tell you what's going on exactly with your PDF file without seeing it, and you haven't really explained what's wrong. You imply that the pages are not being scaled, but you don't say what size they are being drawn at. You also don't say why you think the pages are 'legal' size when viewed in Acrobat.
If you are saying that the pages in question are 'legal' but the media is much larger, then that is entirely possible and would suggest that the pages have a CropBox. Ghostscript uses the MediaBox for page sizes, Acrobat uses a plethora of different boxes, but usually defaults to the CropBox.
If you want Ghostscript to use the CropBox then just tell it -dUseCropBox.
Alternatively post an example somewhere and I can look at it.
I have a library manual that the creator changed some of the LaTeX code and changed the page position and size, but didn't check it before compiling, distilling and sending it off. He is currently unavailable, so if I want to print it I have to fix it myself.
I was able to use some ghostscript commands to push the entire text down to something approaching centered on the page, the command is show below:
/usr/bin/gs -sDEVICE=pdfwrite -o /home/user/shiftdown.pdf -dPDFSETTINGS=/prepress -c "<</PageOffset [0 -35]>> setpagedevice" -f /home/user/brokendoc.pdf
The issue is that while the page is now printable without hitting hardware margins, the chapter titles are still halfway cut off at the top. If I open the PDF in Acrobat or Reader, I can select the chapter title and copy it and it pastes the full text in the program of my choosing. When I tried printing it on a Xerox MFP with a partially incompatible driver it printed the header, but it wouldn't duplex and I didn't want to print 700+ pages and then use the copy to 1 -> 2 function.
Does anyone know of a way to fix these cut off headers such that they either appear correctly in the PDF file or at least reliably print correctly? I have ghostscript easily available, TeX relatively easily available and the standard version of Acrobat X.
[update:]
After downloading the demo of Acrobat Pro XI, I was able to go to the "Print Production" tab and click on "Edit Object". When I clicked on the cut off chapter titles it showed me two bounding boxes that covered the entire page with one just a little taller than the other. When I right clicked on it I got the option to Add Clip and Delete Clip. When I click on Delete Clip it shows the entire chapter title. If I click on Add Clip it says, "One or more of the selected regions already have a clipping region. Proceed with setting the clipping regions for the selected objects? [No] [Yes]"
With that added information, I know there has to be a way to in a batch mode fix the issue, anyone know what command translates into this?
Without seeing the 'brokendoc.pdf' it's hard to know. If I see the file, I can tell you what's going on, and (probably) how to fix it or work around it.
I don't need the entire file, so just a shortened version that only has a few pages that shows the problem will suffice. You might be able to get this from the complete brokendoc.pdf using:
gs -sDEVICE=pdfwrite -o part.pdf -dLastPage=10 brokendoc.pdf
Also, you may want to try:
gs -sDEVICE=pdfwrite -o fitted.pdf -dPDFFitPage -sPAPERSIZE=letter -dFIXEDMEDIA brokendoc.pdf
The above will scale (and center) the page on to the specified page size. You can specify 'letter' or 'a4' or use -dMEDIAWIDTHPOINTS=_ -dMEDIAHEIGHTPOINTS=_ to get a specific output page size. The -dFIXEDMEDIA option causes gs to ignore the MediaBox in the file.