Interacting with Ghostscript at a per page level - pdf

I have a requirement to convert PDF files to PostScript, passing in different pre-determined setpagedevice parameters or plexing options for each page.
For example the parameters I would like to set may be:
<</MediaPosition 1>> setpagedevice
and
<</MediaPosition 2>> setpagedevice
for printer tray selection.
At the point of conversion, I know which MediaPosition I would like to set for each page in the file.
Each PDF file may have up to 5000 pages. Because of the high number of pages, I would like to interact with Ghostscript at a per-page level to set the setpagedevice parameters for each page.
At the moment I am able to apply a setpagedevice, but of course this sets the same option for every page in the generated PostScript.
gs -dBATCH -dNOPAUSE -sDEVICE=ps2write -sOutputFile=output.ps -c "<</MediaPosition 2>> setpagedevice" -f file1.pdf
Is this possible to achieve with Ghostscript?

Related

Ghostscript: convert pdf such that every page is a single image without individual objects/text

I would like to convert a multipage pdf with images and text such that every page is just a flat image without individual objects/text that can be selected (needed to not mess up a pdf-->ppt conversion...). It's not enough that the text can't be searched / copied (I've tried -dNoOutputFonts).
This command does what I want for some of the pages that contain images, but not for others with mostly text (it seems -dHaveTransparency=false is key here):
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dCompatibilityLevel=1.5 -dHaveTransparency=false -dFastWebView=true -sOutputFile=out_file.pdf in_file.pdf
What is the magic setting here to get this behavior for all pages? (Ideally while retaining a high quality output and not blowing up the file size more than necessary ;-))
I just tried an experiment with good results; it might be what you want:
gs -sDEVICE=pdfimage8 -r600 -dNOPAUSE -sOutputFile=output.pdf input.pdf
Try the pdfimage24 and pdfimage32 devices as needed. There is more info here: https://www.ghostscript.com/doc/9.53.3/Devices.htm#PDFimage

How to make per-page changes to a pdf document using Ghostscript?

Some time ago I found that you can use PostScript to make changes to PDF documents with Ghostscript. The available examples make the same changes to every page:
gs \
-sDEVICE=pdfwrite \
-o /path/to/output/pdf-shifted-by-1-inch-to-left.pdf \
-dPDFSETTINGS=/prepress \
-c "<</PageOffset [-72 0]>> setpagedevice" \
-f /path/to/input/pdf-original.pdf
Source: How can I shift page images in PDF files more to the left or to the right?
See also: Cropping a PDF using Ghostscript 9.01
But how could I set different offsets for different pages, without splitting up the PDF into separate files? For example, move some pages to the right and some to the left.
I know of a way of doing this using pdftex, but I was hoping to avoid this dependency.
Well basically this is a PostScript question, because Ghostscript's PDF interpreter is (currently) written in PostScript so you can make changes to the PostScript graphics state which will affect the PDF interpreter, and take advantage of PostScript's language features to do programmatic tasks.
To do different things on each page you need to use a BeginPage or EndPage procedure. BeginPage is called at the start of every page, before the page's content is interpreted, and EndPage is called when the page is complete (i.e. on execution of a showpage).
You'll need a BeginPage procedure to modify the page setup before the page execution runs. This will be called with a count of the number of pages transmitted so far, so you can use that to make decisions about what you want to do.
NB the current PDF interpreter executes a setpagedevice on every page, because each page of a PDF can be a different size. This means some experimentation will be required to achieve your aims.
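Since experimentation is needed anyway, a harmless first step is a BeginPage procedure that only reports the count it receives, so you can see how often it is called for your file. This is a minimal sketch, not a tested recipe: the helper name probe_begin_page and the GS environment variable are made up for illustration, and how the count behaves with PDF input depends on the Ghostscript version.

```shell
#!/bin/sh
# PostScript run before the PDF: install a BeginPage procedure that only
# reports its operand. BeginPage receives the number of pages output so far
# (0 before the first page) and must consume that operand.
begin_page_ps='<< /BeginPage { dup 2 mod 0 eq { (even count: ) } { (odd count: ) } ifelse print == flush } >> setpagedevice'

# probe_begin_page INPUT.pdf OUTPUT.pdf   (hypothetical helper name)
# GS is an assumed override for the Ghostscript binary, e.g. GS=gswin64c.
probe_begin_page() {
    "${GS:-gs}" -sDEVICE=pdfwrite -o "$2" -c "$begin_page_ps" -f "$1"
}
```

Calling probe_begin_page in.pdf out.pdf prints one line per BeginPage call. Once the counts look the way you expect, the two string branches can be replaced with whatever per-page setup you actually need.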

Ghostscript - PDF file, with multiple trays, and with a lot of problems

I don't speak good English, but I hope someone can help me with this one...
I spent several days on this but I can't figure it out on my own. Here's the deal:
I have 4000+ PDF documents, with TrimBox margins, each one with 16 pages, in color.
I needed to batch print them:
Print pages 1-10 using the paper on tray 3;
Print pages 11-15 using the paper on tray 4, two copies uncollated.
Print page 16 using the paper on tray 3.
I'm using a Kyocera 7550ci, the PPD is here.
I have installed Ghostscript 9.19, and also gsview with gsprint, on Windows 7 SP1.
When I first tried this I didn't know Ghostscript or how to use it, but after some reading I managed to "kind of" solve the problem. I duplicated the printer in the Windows control panel, set up each copy with the configuration I wanted, and ran the following GSPRINT commands:
gsprint -printer "Kyocera TASKalfa 7550ci KX" -color -dUseTrimBox -dFitPage -from 1 -to 10 s_file0001.pdf
gsprint -printer "ALT Kyocera" -color -dUseTrimBox -dFitPage -from 11 -to 15 -copies 2 s_file0001.pdf
gsprint -printer "Kyocera TASKalfa 7550ci KX" -color -dUseTrimBox -dFitPage -from 16 -to 16 s_file0001.pdf
(I set the default TASKalfa 7550ci driver to use tray 3, and ALT Kyocera to use tray 4 and uncollated copies.)
It worked, but it was painfully slow, both on the Windows side and for the printer to process. I soon realised GSPRINT is slow because it has to render the whole page to a bitmap, and started looking at whether I could use pure Ghostscript to do the work.
gswin32c -dBATCH -dNOPAUSE -q -dUseTrimBox -dFitPage -dFirstPage=1 -dLastPage=10 -sDEVICE=mswinpr2 -sOutputFile="%printer%Kyocera TASKalfa 7550ci KX" -f test.pdf
gswin32c -dBATCH -dNOPAUSE -q -dUseTrimBox -dFitPage -dFirstPage=11 -dLastPage=15 -sDEVICE=mswinpr2 -sOutputFile="%printer%ALT Kyocera" -f test.pdf
gswin32c -dBATCH -dNOPAUSE -q -dUseTrimBox -dFitPage -dFirstPage=16 -dLastPage=16 -sDEVICE=mswinpr2 -sOutputFile="%printer%Kyocera TASKalfa 7550ci KX" -f test.pdf
But I still have a lot of problems... I'm frustrated that I couldn't get it to work, even after trying really hard to read the manuals and search around.
Using mswinpr2 is still really slow, gives me wrong colors, and I can't figure out how to select the paper tray.
The included PCL drivers were fast and managed to select the correct tray using -dMediaPosition, but there are only black and white drivers...
pdfwrite doesn't correctly scale the TrimBox to fit the whole page, and I can't select the correct tray.
With ps2write I can't select the tray, and it messes up the page position.
I'm lost. Can someone give me some directions? Also, is there some way to send everything to the printer as one file?
Thank you ALL!
---EDIT---
Thank you both for the answers!
I managed to make it work:
gswin32c -dBATCH -dNOPAUSE -q -dPDFFitPage -dUseTrimBox -dFirstPage=1 -dLastPage=10 \
-dMediaPosition=7 -sDEVICE=pxlcolor \
-sOutputFile="%printer%Kyocera TASKalfa 7550ci KX" -f in.pdf
gswin32c -dBATCH -dNOPAUSE -q -dPDFFitPage -dUseTrimBox -dFirstPage=11 -dLastPage=15 \
-dMediaPosition=5 -sDEVICE=pxlcolor -dNumCopies=2 \
-sOutputFile="%printer%Kyocera TASKalfa 7550ci KX" -f in.pdf
gswin32c -dBATCH -dNOPAUSE -q -dPDFFitPage -dUseTrimBox -dFirstPage=16 -dLastPage=16 \
-dMediaPosition=7 -sDEVICE=pxlcolor \
-sOutputFile="%printer%Kyocera TASKalfa 7550ci KX" -f in.pdf
The only remaining problem is that the page doesn't scale correctly with pxlcolor (it does with ljet4, but that's black and white).
I'm almost there! Thanks ^^. If anyone knows about this problem, I would appreciate the help.
You have asked a lot of questions all at once, which is not really a good way to get helpful answers. In addition, you haven't been very clear about some of the problems.
1) If you want to use the TrimBox for the media size then you have to tell Ghostscript you want to use the TrimBox, you do that by -dUseTrimBox, no matter what device you want to use.
2) The mswinpr2 device works by creating a Windows DeviceContext for the printer, rendering the input to a (RGB) bitmap, then blitting the bitmap to the DeviceContext and telling it to print itself. This is slow because it will involve rendering a large bitmap (size dependent on printer resolution) to memory and then sending that large bitmap to the device.
Its one great advantage is that it will work no matter what printer you have.
GSPrint uses a 'similar' but somewhat different technique and is claimed to be faster.
Note that both these devices use the default settings of the printer which probably won't work for your complex needs.
Colour management is, of course, up to Windows in this case, but if your original PDF is specified in say CMYK then this will involve conversions CMYK->RGB->CMYK which is bound to cause colour differences.
3) There are colour PCL devices available in Ghostscript, eg the cdeskjet device.
4) pdfwrite will use the TrimBox if you select -dUseTrimBox. Since it creates a PDF file, it's rather hard to see how it could 'select correct tray'. If you are sending the PDF file to the printer, then you could simply have started with the original PDF file. PDF files cannot contain device-dependent criteria, such as tray selection.
5) ps2write in its current incarnation will allow you to add device-specific operations, see ghostpdl/doc/VectorDevices.htm (also available on the ghostscript.com website), section 6.5 "PostScript file output" and look for the PSDocOptions and PSPageOptions keys. You could use the PSPageOptions array to introduce individual media selection commands to each page. I have no idea what you mean by 'messes up the page position', however yet again if you do not select -dUseTrimBox then it will not be using the TrimBox.
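For a long document, the PSPageOptions array from point 5 is easier to generate than to type. The sketch below assumes a text file with one MediaPosition value per line, in page order, and writes a small PostScript prologue that sets the array via setdistillerparams. The function name, the file names and the use of /LockDistillerParams are assumptions for illustration, so check the exact key behaviour against VectorDevices.htm for your Ghostscript version. An array with thousands of entries has not been tested here and may run into interpreter limits.

```shell
#!/bin/sh
# make_page_options TRAYS_FILE   (hypothetical helper name)
# TRAYS_FILE: one MediaPosition value per line, line N applies to page N.
# Prints a PostScript fragment defining PSPageOptions, one string per page.
make_page_options() {
    printf '<< /PSPageOptions [\n'
    # "|| [ -n ... ]" keeps a last line that has no trailing newline
    while IFS= read -r tray || [ -n "$tray" ]; do
        [ -n "$tray" ] || continue          # skip blank lines
        printf '  (<</MediaPosition %s>> setpagedevice)\n' "$tray"
    done < "$1"
    # LockDistillerParams is assumed here to stop later code overriding the array
    printf '] /LockDistillerParams true >> setdistillerparams\n'
}
```

Possible usage, with made-up file names: run make_page_options trays.txt > pageopts.ps, then pass the generated file ahead of the PDF, for example gs -dBATCH -dNOPAUSE -sDEVICE=ps2write -sOutputFile=output.ps pageopts.ps in.pdf. Writing the prologue to a file rather than passing it with -c avoids command-line length limits when there are thousands of pages.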
Oh, and if you want to 'scale the TrimBox to fit the whole page' (which you only mention regarding pdfwrite) then you will have to set up a fixed media of the size you want the page scaled to (-dFIXEDMEDIA, -dDEVICEHEIGHTPOINTS= and -dDEVICEWIDTHPOINTS=), select -dUseTrimBox and -dPDFFitPage.
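Put together, the switches named above might look like the following. This is only a sketch: the helper names, the GS override variable and the choice of millimetres as input are assumptions, and the device is set to pdfwrite purely as an example.

```shell
#!/bin/sh
# mm_to_pt MILLIMETRES: convert to whole PostScript points (1 pt = 1/72 inch),
# rounding to the nearest integer.
mm_to_pt() {
    awk -v mm="$1" 'BEGIN { printf "%d\n", mm * 72 / 25.4 + 0.5 }'
}

# fit_trimbox INPUT.pdf OUTPUT.pdf WIDTH_MM HEIGHT_MM   (hypothetical helper)
# Fixes the media size, then scales each page's TrimBox to fit it.
fit_trimbox() {
    "${GS:-gs}" -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
        -dFIXEDMEDIA \
        -dDEVICEWIDTHPOINTS="$(mm_to_pt "$3")" \
        -dDEVICEHEIGHTPOINTS="$(mm_to_pt "$4")" \
        -dUseTrimBox -dPDFFitPage \
        -sOutputFile="$2" "$1"
}
```

For A4 media this would be fit_trimbox in.pdf out.pdf 210 297, which passes 595 and 842 points to Ghostscript.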
There is no easy way to do this. Since PDF itself does not provide a facility to switch paper trays, you need to convert this stream to another PDL. PostScript is a good choice.
While converting to PostScript you can inject PostScript tray switching commands like those found in PPD:
<< /ManualFeed false >> setpagedevice statusdict begin 5 setpapertray end
On Windows you have a few implementation choices:
* Alter the PPD so that it injects the PostScript code before every page. The code should maintain a page counter and execute the tray switching commands accordingly.
* Buy third-party software that provides this capability.
* Extend the printer driver with a DLL that injects the PostScript code.
The first option may not work with your printer driver. In that case you can try to inject PostScript code at the beginning of the job. The code should override showpage, extending it with the capability described in the first option.
If you had access to the internals of the controller, you could inject the same showpage-overriding code into the PostScript interpreter's startup sequence.

How to execute ImageMagick to convert only the first page of the multipage PDF to JPEG?

How do I execute ImageMagick's convert if I want a JPEG from the first page only of a multi-page PDF?
If you are using a convert command line you can execute it with these parameters:
convert source.pdf[0] output.jpeg
Note that the page count of ImageMagick is 0-based. So [0] means 'page 1'. To select, say the 4th page, you'd have to use [3].
This syntax does not only work for PDF input. It also works with other multi-page or multi-frame formats, such as multi-page TIFF or animated multi-frame GIFs and PNGs.
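Because the off-by-one is easy to get wrong, a tiny wrapper that takes a 1-based page number can help. The function names here are made up for illustration; only the file[index] syntax comes from ImageMagick itself.

```shell
#!/bin/sh
# im_page_spec FILE PAGE: turn a 1-based page number into ImageMagick's
# 0-based "file[index]" selector.
im_page_spec() {
    printf '%s[%d]\n' "$1" "$(( $2 - 1 ))"
}

# page_to_jpeg INPUT PAGE OUTPUT   (hypothetical helper name)
page_to_jpeg() {
    convert "$(im_page_spec "$1" "$2")" "$3"
}
```

With this, page_to_jpeg source.pdf 1 output.jpeg is equivalent to the convert command above, and page 4 becomes source.pdf[3].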
Don't use ImageMagick, use Ghostscript. ImageMagick calls Ghostscript to do the work anyway...
gs -sDEVICE=jpeg -sOutputFile=<output-filename> -dLastPage=1 <input filename>
You can also change the device to jpegcmyk (for CMYK output) or jpeggray for gray output, you can change the resolution using -r, use -dFirstPage and -dLastPage to extract a continuous range of pages, etc.
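As a rough illustration of combining those switches, the sketch below renders a range of pages at a chosen resolution, one JPEG per page. The function name, the GS override variable and the output name pattern are assumptions. The %03d in the output file name is Ghostscript's placeholder for a per-page counter.

```shell
#!/bin/sh
# pages_to_jpeg INPUT.pdf FIRST LAST DPI   (hypothetical helper name)
# Renders pages FIRST..LAST of INPUT.pdf to page-NNN.jpg at DPI resolution.
# GS is an assumed override for the binary name, e.g. GS=gswin64c on Windows.
pages_to_jpeg() {
    "${GS:-gs}" -dBATCH -dNOPAUSE -sDEVICE=jpeg -r"$4" \
        -dFirstPage="$2" -dLastPage="$3" \
        -sOutputFile=page-%03d.jpg "$1"
}
```

For example, pages_to_jpeg input.pdf 2 4 150 renders pages 2 to 4 at 150 dpi. Swapping -sDEVICE=jpeg for jpeggray or jpegcmyk changes the colour model as described above.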
To add to the answer by @KenS, here are a few more details, particularly for Windows users.
You can download Ghostscript for Windows here: http://www.ghostscript.com/download/gsdnld.html. The default installation path for the executable is "C:\Program Files\gs\gs910\bin\gswin64c.exe".
The command-line arguments listed above are correct in Windows too, but here are a few more that I found useful:
gswin64c.exe -dNOPAUSE -dBATCH -r96 -sDEVICE=jpeg -sOutputFile="<out-file.jpg>" -dFirstPage=1 -dLastPage=1 "<input-file.pdf>"
I also created a batch file that wraps this up nicely and posted it to my GitHub account. It makes it a lot easier to create thumbnails for multiple .pdf files too. Check it out at pdf2jpg.bat.

How to change page orientation of PDF? (Ghostscript or PostScript solution needed)

Given a PDF document, how do I change individual page orientation?
I'm using latest version of Ghostscript.
Why do you require usage of Ghostscript? Would it be acceptable to use another Free, Open Source Software tool running on the commandline, such as pdftk?
Anyway, here is how to rotate pages with Ghostscript. However, this may not work for your intentions, because you cannot force a certain orientation for an individual page only. It relies on an internal Ghostscript algorithm that tries to rotate pages automatically, depending on the flow of text inside the PDFs:
* -dAutoRotatePages=/None -- retains orientation of each page;
* -dAutoRotatePages=/All -- rotates all pages (or none) depending on a kind of "majority decision";
* -dAutoRotatePages=/PageByPage -- auto-rotates pages individually.
Add one of these to the Ghostscript commandline you're using.
If there is no text on a page (or if there is an automatic page rotation set to /None), then Ghostscript uses the setpagedevice settings. You can pass such setpagedevice parameters on the Ghostscript commandline using the -c switch like this:
* -c "<</Orientation 3>> setpagedevice" -- sets landscape orientation;
* -c "<</Orientation 0>> setpagedevice" -- sets portrait orientation;
* -c "<</Orientation 2>> setpagedevice" -- sets upside down orientation;
* -c "<</Orientation 1>> setpagedevice" -- sets seascape orientation.
Probably you need to set the orientation for each page when extracting the pages. I don't think it would work when merging them back to the unified document (I have never tested this).
In any case, I'd recommend looking at pdftk too (which is also available for Windows). It is a commandline tool that can rotate pages in PDFs, and much more. It is easier to use than Ghostscript for your stated purpose, and much faster as well. In particular, it can rotate individual pages inside a PDF document, leaving the other pages untouched. Example:
pdftk A=in.pdf \
cat A1-3 A4west A5-end \
output out.pdf
This command will output pages 1, 2 and 3 as well as pages 5, 6, ... last un-rotated, but will rotate page 4 by 90 degrees (so the page header faces to the "west"). (However, be aware that this command can lead to unexpected results, depending on the original orientation of your input pages: You should check the orientation of all pages of your input PDF by running pdfinfo -l 1000 input.pdf and then check for the value of the rot output: if you see values different from 0, like 90, 180 and 270, these pages are already pre-rotated...)
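The rotation check can be scripted so you don't have to read through a thousand lines by eye. The filter below assumes pdfinfo prints per-page lines of the form "Page    2 rot:  90"; that format is an assumption about your pdfinfo build, and the function name is made up.

```shell
#!/bin/sh
# rotated_pages: read "pdfinfo -l N file.pdf" output on stdin and print the
# page numbers whose rot value is not 0.
# Assumes lines shaped like: "Page    2 rot:  90"
rotated_pages() {
    awk '$1 == "Page" && $3 == "rot:" && $4 != 0 { print $2 }'
}
```

Usage would be pdfinfo -l 1000 input.pdf | rotated_pages; an empty result means none of the checked pages is pre-rotated.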
See here for more details: http://www.accesspdf.com/pdftk/ .
Nothing other than -dNORANGEPAGESIZE worked for me.