Some time ago I found that you can use PostScript to make changes to PDF documents with Ghostscript. The available examples make the same changes to every page:
gs \
-sDEVICE=pdfwrite \
-o /path/to/output/pdf-shifted-by-1-inch-to-left.pdf \
-dPDFSETTINGS=/prepress \
-c "<</PageOffset [-72 0]>> setpagedevice" \
-f /path/to/input/pdf-original.pdf
Source: How can I shift page images in PDF files more to the left or to the right?
See also: Cropping a PDF using Ghostscript 9.01
But how could I set different offsets for different pages, without splitting the PDF into separate files? For example, moving some pages to the right and some to the left.
I know of a way of doing this using pdftex, but I was hoping to avoid this dependency.
Well, basically this is a PostScript question. Ghostscript's PDF interpreter is (currently) written in PostScript, so you can make changes to the PostScript graphics state which will affect the PDF interpreter, and take advantage of PostScript's language features to do programmatic tasks.
To do different things on each page you need to use a BeginPage or EndPage procedure. BeginPage is called at the start of every page, before the page's content is interpreted, and EndPage is called when the page is complete (i.e. on execution of showpage).
You'll need a BeginPage procedure to modify the page setup before the page execution runs. This will be called with a count of the number of pages transmitted so far, so you can use that to make decisions about what you want to do.
NB the current PDF interpreter executes a setpagedevice on every page, because each page of a PDF can be a different size. This means some experimentation will be required to achieve your aims.
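As a rough sketch of the BeginPage idea (not a confirmed recipe), the helper below only builds the PostScript fragment that would be passed to Ghostscript with -c. The function name page_offset_ps and the alternating left/right offsets are made up for illustration. BeginPage receives the count of pages output so far (0 for the first page) and has to consume it. Because the PDF interpreter runs setpagedevice on every page, expect to experiment before this behaves as intended.

```shell
# Hypothetical helper: print a PostScript fragment that installs a BeginPage
# procedure. BeginPage is called with the number of pages output so far
# (0 for the first page), so "2 mod 0 eq" is true for pages 1, 3, 5, ...
# $1 = x offset in points for pages 1, 3, 5, ...
# $2 = x offset in points for pages 2, 4, 6, ...
page_offset_ps() {
    printf '<< /BeginPage { 2 mod 0 eq { %s 0 translate } { %s 0 translate } ifelse } >> setpagedevice\n' "$1" "$2"
}

# Example use (paths as in the question; results may need experimentation):
#   gs -sDEVICE=pdfwrite -o /path/to/output.pdf \
#      -c "$(page_offset_ps 72 -72)" \
#      -f /path/to/input/pdf-original.pdf
```

With the arguments 72 and -72 this would shift odd pages one inch to the right and even pages one inch to the left, assuming the translate survives the interpreter's own page setup.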
Related
I am using (PDF)LaTeX to make a document, and I also need to embed already existing PDF documents in it. The problem is that I have PDF documents in several different page sizes (letter, a4, etc) and I want to compile all of them into a single b5 PDF document.
If I use the pdfpages package from CTAN, all hyperlinks from the original PDFs are removed. So I tried to do it with GhostScript.
This sounds like something normal to do but I have failed to find a working solution.
I have, in the meantime, read a few questions and answers, but failed to figure out what I am doing wrong and what I am missing.
This doesn't seem to address my problem of scaling.
Neither does that.
This seems to go in the right direction but I couldn't make use of the information :-(.
To make the problem easier, let's just try to resize a single PDF so that:
its contents are scaled to fit the page
the new page has the size I want
Sounds easy, and it is easy to do, for example with pdfjam:
pdfjam --outfile b5-foo.pdf --paper b5paper foo.pdf
Now the problem with this is that pdfjam throws away hyperlinks. From its website:
A potential drawback of pdfjam and other scripts based upon it is that any hyperlinks in the source PDF are lost.
This is presumably because it uses the pdfpages package mentioned above.
Unlike pdfjam, GhostScript keeps hyperlinks. However, it either:
crops the original when I downscale; or
does not put the scaled content on a page of the size I need -- instead, I get a page that seems to be scaled down, while keeping the original aspect ratio.
This is what I have installed:
$ gs --version
9.21
(Installed on Linux)
This is how I can use GhostScript to crop the content:
gs -dBATCH -dNOPAUSE \
-sDEVICE=pdfwrite -dFIXEDMEDIA -sPAPERSIZE=isob5 \
-o b5-foo.pdf foo.pdf
... and here is how I can use -dPDFFitPage to scale the content but also keep the aspect ratio of the original page size:
gs -dBATCH -dNOPAUSE \
-sDEVICE=pdfwrite -dFIXEDMEDIA -sPAPERSIZE=isob5 -dPDFFitPage \
-o b5-foo.pdf foo.pdf
To be even clearer: I seem to get a page that is scaled so that it would fit inside the b5 I am asking for, but it is not b5: it still has the H/W ratio the original (letter) had!
I'd be happy if this can be done just using switches but if I need to use PostScript that's perfectly fine.
The solution seems to be to use -dPSFitPage instead of -dPDFFitPage. This might have something to do with the PDF files that I am trying to resize. Unfortunately, I cannot share those :-(. When I tried to reproduce this with files that I generated myself, the problem did not reproduce. I don't know why this is or how I should have known it.
To summarize, using PDF files for both input and output:
-dFitPage and -dPDFFitPage give me scaled pages with the original aspect ratio
-dPSFitPage gives me scaled content on the page size I request with -sPAPERSIZE="$PAPERSIZE"
This seems to go against what the documentation says.
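For reference, here is the command from above with -dPSFitPage swapped in, wrapped in a small function. The function name fit_to_paper and the GS variable (used to pick the Ghostscript executable) are just for illustration; the switches themselves are the ones already used in this question. Treat it as a sketch of what worked for my files with 9.21, not as documented behaviour.

```shell
# Hypothetical wrapper around the command from the question, with
# -dPSFitPage in place of -dPDFFitPage.
# $1 = paper size name (e.g. isob5), $2 = input PDF, $3 = output PDF
# GS may name another Ghostscript executable (default: gs).
fit_to_paper() {
    "${GS:-gs}" -sDEVICE=pdfwrite -dFIXEDMEDIA -sPAPERSIZE="$1" -dPSFitPage \
        -o "$3" "$2"
}

# Example use:
#   fit_to_paper isob5 foo.pdf b5-foo.pdf
```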
There have already been quite a few questions and answers about cropping documents with Ghostscript.
However, the answers do not match my exact needs and are still confusing to me.
I expected that there would be a single option e.g. "-AutoCropToBBox" or something like this.
For clarification, as a bounding box, I understand the smallest rectangular box which contains all (non-white(?)) printed objects completely.
Furthermore, I want/have to use a printer port redirection (RedMon) to generate a cropped PDF via printing to a Postscript-printer from basically any application.
So, under Win7/64bit, I set the redirected port properties:
The output is redirected to
C:\Windows\system32\cmd.exe
The arguments for the program are:
/c gswin64c.exe -sDEVICE=pdfwrite -o "%1".pdf -
"%1" contains the user input for filename. With this, I get a full-page PDF. Fine!
But how to add the cropping options?
Additional question:
If I have a multipage document will such an (auto-)cropping be individual for each page? Or would there be an option to keep it all the same e.g. like the first page or like the largest bounding box of all pages?
Another related issue:
the window for prompting for the filename is always popping up behind the application I am printing from. Any ideas to always bring it to the front?
Another question:
There is the Perl script "ps2eps" and the program bbox.exe (see http://ctan.org/pkg/ps2eps). It's said there that Ghostscript (or ps2epsi) occasionally(?) calculates wrong bounding boxes. Is this (still) true?
Thanks for your help.
Well your first problem is that PostScript programs are normally written to expect to be rendered to a specific media size, and are usually not tightly bounded to it. White space is important for readability.
So ordinarily the PostScript program you generate will request a specific media size, and the interpreter will do its best to match that. If it can't match it then it will use a strategy to try and get as close as possible, and scale the entire content to fit that media.
You can't expect the printer to perform any of those things if it doesn't know the required size until it's finished, and you can't be certain of the bounding box until you have rendered all the marking content. It is true that some files, generally EPS files, have a %%BoundingBox comment, but that's a comment: it has no effect in PostScript, and it's only there for the benefit of applications which don't want to interpret the PostScript.
So that's why the simple switch you want isn't there: it would break the interpreter's normal functioning for rendering.
So, the first thing you need to do is determine the bounding box of the content. You can do that, as Stefan says, by using the bbox device. And on that note, as far as I know the bbox device produces accurate output. If it does not, then we would appreciate a bug report proving it so we can fix it. If people don't report bugs, how are we supposed to know about them? It's disappointing to see someone spreading FUD instead of helping out with a bug report.
ps2epsi isn't Ghostscript, it's a cheap and cheerful script; I wouldn't use it. However, if the original PostScript leaves stuff on the stack then it will end up as a corrupted (or invalid) EPS file, and the original PostScript should be fixed before trying to use it, as it will break any PostScript program that tries to use it (e.g. if you include the EPS in a document and then print it).
So if you are using Ghostscript, and you want to take a PostScript program and get an EPS out of it, use the eps2write device. It won't have a preview, but frankly who cares.
Now, if I remember correctly, the bbox device (and eps2write) records all marking operations. You can't simply record only the non-white marking operations: what if the white overwrites an existing mark on the page? What if the media is not white? Note that if you render to a PNG with Ghostscript, the untouched portion of the output is transparent, whereas white marks are not.
So the bbox is the extent of all the marking operations, regardless of the colour. The only other way to proceed would be to render the content and count the non-white pixels. But that only works at a specific resolution, change the resolution and the precise bounding box may change as well.
Once you have the bounding box you can tell Ghostscript to use media of that size. Note that you will almost certainly also have to translate the origin, as it's unlikely that the content will start tightly at the bottom left corner. You will need -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS to set the media size, and you will need to use -c and -f to send PostScript to alter the origin appropriately. In simple cases an '-x -y translate' will suffice, but if the program executes initgraphics you will instead have to set a BeginPage procedure to alter the initial CTM.
If you set the media size with -dDEVICEWIDTHPOINTS etc. then all pages will be the same size. If you don't want that, then you need to write a BeginPage procedure to resize each page individually (you will also need to hook setpagedevice and remove the /PageSize entries from the dictionary).
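To make the arithmetic concrete, here is a minimal sketch covering only the simple case (one media size for all pages, plain translate). The helper names bbox_media_args and bbox_translate_ps are invented for this example; the inputs are the four integers from a %%BoundingBox line (llx lly urx ury). The -dFIXEDMEDIA in the usage comment is an assumption on my part, added so that a page size request in the file does not override the media size.

```shell
# Hypothetical helpers: turn a bounding box "llx lly urx ury" (integer points,
# as on a %%BoundingBox line) into Ghostscript arguments.

# Print the media-size switches: width = urx - llx, height = ury - lly.
bbox_media_args() {
    printf '%s\n' "-dDEVICEWIDTHPOINTS=$(( $3 - $1 )) -dDEVICEHEIGHTPOINTS=$(( $4 - $2 ))"
}

# Print the PostScript that moves the lower-left corner of the box to (0, 0).
bbox_translate_ps() {
    printf '%s\n' "$(( 0 - $1 )) $(( 0 - $2 )) translate"
}

# Example use for a box of 72 100 300 500:
#   gs -sDEVICE=pdfwrite -o out.pdf -dFIXEDMEDIA \
#      $(bbox_media_args 72 100 300 500) \
#      -c "$(bbox_translate_ps 72 100 300 500)" \
#      -f in.ps
```

If the program executes initgraphics, the translate would have to go into a BeginPage procedure instead, as described above.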
I've no idea why Windows is putting the dialog box behind the active window; it seems to have started doing that with Windows 7 (or possibly Vista). I don't see any way to alter that, because I'm not sure what is generating the dialog.
Personally I would suggest that you try the 2-step approach of running the original through Ghostscript's eps2write device and then take the EPS and create a PDF file using the pdfwrite device and the -dEPSCrop switch. Double converting is bad, but other solutions are worse. Note that EPS files cannot be multi-page, so you will have to create 'n' EPS files from an n-page PostScript program, and then supply a command line listing each EPS file as input to the pdfwrite device.
Take an example file and try this out from the command line before you try scripting it.
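A sketch of that two-step route, for trying out on the command line. The function names and the GS variable are invented for illustration, and I am assuming that the eps2write device accepts a %d page-number pattern in the -o file name in order to get one EPS per page. Check the generated files before building anything on top of this.

```shell
# Hypothetical wrappers for the two steps. GS selects the Ghostscript
# executable (default: gs; e.g. gswin64c.exe on 64-bit Windows).

# Step 1: write one EPS per page. $1 = input PostScript, $2 = output prefix.
# The %03d in the -o name is meant to be expanded by Ghostscript to the
# page number (assumption: eps2write supports this pattern).
ps_to_eps_pages() {
    "${GS:-gs}" -sDEVICE=eps2write -o "$2-%03d.eps" "$1"
}

# Step 2: combine the EPS files into one PDF, cropping each page to its
# bounding box. $1 = output PDF, remaining arguments = EPS files in order.
eps_to_cropped_pdf() {
    out=$1
    shift
    "${GS:-gs}" -sDEVICE=pdfwrite -dEPSCrop -o "$out" "$@"
}

# Example use:
#   ps_to_eps_pages input.ps page
#   eps_to_cropped_pdf cropped.pdf page-001.eps page-002.eps
```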
As I understood from @KenS's explanations:
the way eps2write works, it may not or will not or actually cannot result in the minimum possible bounding box
it needs to be a 2-step process via -sDEVICE=bbox
So, I now ended up with the following process to "print" a PDF with a correct minimum possible bounding box:
Redirected Printer Port to cmd.exe
C:\Windows\system32\cmd.exe
Arguments for the program:
/c gswin64c.exe -q -o "%1".ps -sDEVICE=ps2write - && gswin64c.exe -q -dBATCH -dNOPAUSE -sDEVICE=bbox -dLastPage=1 "%1".ps 2>&1 >nul | perl.exe C:\myFiles\CropPS2PDF.pl "%1"
Unfortunately, it requires a little Perl script (let's call it: CropPS2PDF.pl):
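#!/usr/bin/perl -w
use strict;
my $FileName = $ARGV[0];
$/ = undef; # slurp mode: read all of STDIN in one go
my $Crop = <STDIN>; # output of the bbox device, piped in from Ghostscript
# get the bbox coordinates; stop if Ghostscript did not report any
$Crop =~ /%%BoundingBox: (\d+) (\d+) (\d+) (\d+)/s
    or die "No %%BoundingBox line found in the Ghostscript output\n";
my ($llx, $lly, $urx, $ury) = ($1, $2, $3, $4);
print "\n$FileName: $llx, $lly, $urx, $ury \n"; # print just to check
# quote the file names so that paths containing spaces survive
my $Command = qq{gswin64c.exe -q -o "$FileName.pdf" -sDEVICE=pdfwrite -c "[/CropBox [$llx $lly $urx $ury]" -c " /PAGE pdfmark" -f "$FileName.ps"};
print $Command; # print just to check
system($Command); # execute command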
It seems to work... :-)
Improvements are welcome.
My questions are still:
Can this be done somehow without Perl? Just Win7, cmd.exe and Ghostscript?
Is there maybe a way without writing the PS-File to disk which I do not need? Of course, I could also delete it afterwards with the Perl-script.
I have many (around 1000) multiple-page PDFs for a program I am writing.
The problem is that many of them are inconsistent about page size, even within the same document at times. Does anyone know of a way I could programmatically go through the files and resize the pages to what I want? This can be in any language.
I can accomplish this in Adobe Acrobat Pro, but there are so many files that it would end up taking a long, long time. The only way I can get it to resize there is to add a background from a file and then choose the file I want to resize.
Generally, PDFtk is a good fit for this kind of problem. It will let you pull everything apart, and reorder/resize/modify pages on the command line.
I had a similar problem and could easily solve it with PDF split and merge, a Java based toolkit for editing PDF files.
You can resize a PDF with a command line tool, Ghostscript.
Assuming you want to resize a PDF to 306x396 points (which would give you a quarter of a Letter-sized page), do it like this:
gs \
-o 306x396-points.pdf \
-sDEVICE=pdfwrite \
-g3060x3960 \
-dPDFFitPage \
-dUseCropBox \
input.pdf
Note that the -g... dimensions are in pixels. Because the pdfwrite device uses a resolution of 720 dpi by default, and there are 72 points to the inch, these values are 10 times the sizes in points.
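If you need to do this conversion for many sizes, a small helper saves the mental arithmetic. This is only a sketch: the name points_to_g is made up, and it assumes whole-point sizes and the 720 dpi default unless you pass a different resolution.

```shell
# Hypothetical helper: convert a page size in points to a -g switch in pixels.
# $1 = width in points, $2 = height in points,
# $3 = device resolution in dpi (assumed to be 720 if omitted).
# pixels = points * dpi / 72, using integer arithmetic.
points_to_g() {
    dpi=${3:-720}
    printf '%s\n' "-g$(( $1 * dpi / 72 ))x$(( $2 * dpi / 72 ))"
}

# Example use:
#   gs -o 306x396-points.pdf -sDEVICE=pdfwrite $(points_to_g 306 396) \
#      -dPDFFitPage -dUseCropBox input.pdf
```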
For Windows, use gswin32c.exe or gswin64c.exe instead of gs.
I want to extract pages from a PDF file which has custom page numbering, e.g. there are pages with the number C1, C2, C3, and after that, 1,2,3,4 etc. starts.
When I use
$ gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
-dFirstPage=22 -dLastPage=36 \
-sOutputFile=outfile_p22-p36.pdf 100p-inputfile.pdf
FirstPage and LastPage are page indexes, counted from the first page of the file, which is not what I want.
How can I tell Ghostscript to use the "real" page numbers?
You can, given a lot of knowledge about the internals of Ghostscript's PDF interpreter, access the page numbers. It will require a lot of looking around in the Resource/Init/pdf*.ps files (mostly just pdf_main.ps) and an understanding of PostScript, but it is possible. Just not for the faint of heart.
To see an example PS program which digs around inside a PDF to glean information, have a look at toolbin/pdf_info.ps.
If someone comes up with a patch to allow FirstPage/LastPage to take names as labels, then we will consider it. Part of this patch should be a change that adds an option to pdf_info.ps to print the labels and the real page numbers.
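In the meantime, for a simple layout like the one in the question, plain arithmetic may be enough. This sketch does not read any labels from the PDF. It assumes you already know how many pages come before the page labelled "1" (three here: C1, C2, C3) and that the numbering runs without gaps after that. The name label_to_index is invented for the example.

```shell
# Hypothetical helper: map a printed page number to a page index.
# $1 = number of pages that come before printed page "1" (C1, C2, C3 -> 3)
# $2 = printed page number
label_to_index() {
    printf '%s\n' "$(( $1 + $2 ))"
}

# Example use, extracting printed pages 22 to 36:
#   gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER \
#      -dFirstPage=$(label_to_index 3 22) -dLastPage=$(label_to_index 3 36) \
#      -sOutputFile=outfile_p22-p36.pdf 100p-inputfile.pdf
```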
I have a PDF file that I would like to optimize. I am receiving the file from an outside source so I don't have the means to recreate it from the beginning.
When I open the file in Acrobat and query the resources, it says that the fonts in the file take up 90%+ of the space. If I save the file as postscript and then save the postscript file to an optimized PDF, the file is significantly smaller (upwards of 80% smaller) and the fonts are still embedded.
I am trying to recreate these results with ghostscript. I have tried various permutations of options with pswrite and pdfwrite but what happens is when I do the initial conversion from PDF to Postscript, the text gets converted to an image. When I convert back to PDF the font references are gone so I end up with a PDF file that has 'imaged' text rather than actual fonts.
The file contains 22 embedded custom Type1 fonts which I have. I have added the fonts to the ghostscript search path and proved that ghostscript can find them with:
gs \
-I/home/nauc01 \
-sFONTPATH=/home/nauc01/fonts/Type1 \
-o 3783QP.pdf \
-sDEVICE=pdfwrite \
-g5950x8420 \
-c "200 700 moveto" \
-c "/3783QP findfont 60 scalefont setfont" \
-c "(TESTING !!!!!!) show showpage"
The resulting file has the font correctly embedded.
I have also tried using ghostscript to go from PDF to PDF like this:
gs \
-sDEVICE=pdfwrite \
-dNOPAUSE \
-I/home/nauc01 \
-dBATCH \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/printer \
-dCompressFonts=true \
-dSubsetFonts=true \
-sOutputFile=output.pdf \
input.pdf
but the output is usually larger than the input and I can't view the file in anything but Ghostscript (Adobe Reader gives "Object label badly formatted").
I can't provide the original files because they contain confidential information, but I will try to answer any questions about them.
Any ideas? Thanks in advance.
Don't use pswrite. As you've discovered, this will render text. Instead, use the ps2write device, which retains fonts and text.
You don't say which version of Ghostscript you are using but I would recommend you use a recent one.
One point: Ghostscript isn't 'optimising' the PDF the way Acrobat does, it's re-creating it. The original PDF is fully interpreted to produce a sequence of operations that mark the page; pdfwrite (and ps2write) then make a new file which only has those operations inside.
If you choose to subset fonts, then only the required glyphs will be included. If the original PDF contains extraneous information (Adobe Illustrator, for example, usually embeds a complete copy of the .ai file) then this will be discarded. This may result in a smaller file, or it may not.
Note that pdfwrite does not support compressed xref and some other later features at present, so some files may well get bigger.
I would personally not go via ps2write, since this just adds another layer of processing and discarding of information. I would just use pdfwrite to create a new PDF file. If you find files for which this does not work (using current code) then you should raise a bug report at http://bugs.ghostscript.com so that someone can address the problem.
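A minimal sketch of that single-pass route, using only switches that already appear in the question. The function name recreate_pdf and the GS variable are made up for illustration, and whether the result is smaller depends entirely on the input file.

```shell
# Hypothetical wrapper: re-create a PDF with pdfwrite in a single pass.
# $1 = input PDF, $2 = output PDF.
# GS may name another Ghostscript executable (default: gs).
recreate_pdf() {
    "${GS:-gs}" -sDEVICE=pdfwrite -dSubsetFonts=true -dCompressFonts=true \
        -o "$2" "$1"
}

# Example use:
#   recreate_pdf input.pdf output.pdf
```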
You might want to try the Multivalent Compress tool. It has an (experimental) option to subset embedded fonts that might make your PDF much smaller. It also contains a lot of switches that allow for better compression, sometimes at the cost of quality (JPEG compression of bitmaps, for example).
Unfortunately, the most recent version of Multivalent no longer includes the tools. Google for Multivalent20060102.jar; that version still includes them. To run Compress:
java -classpath /path/to/Multivalent20060102.jar tool.pdf.Compress [options] <pdf file>