I have several PDF files with different sizes and different width-to-height ratios. I want to create fixed-size thumbnails from the first page of each of these files.
I do this with the following command:
gs -dNOPAUSE -sDEVICE=jpeg -dFirstPage=1 -dLastPage=1 -sOutputFile=d:\test\a.jpeg -dJPEGQ=100 -g509x750 -dUseCropBox=true -dPDFFitPage=true -q d:\test\a.pdf -c quit
Since the original files have different widths and heights but the thumbnails must all be the same size, there are white margins on the right side or at the top of the thumbnails. I want equal margins at the top and bottom (or on the right and left) of the thumbnail instead, just like the thumbnails displayed in Windows Explorer.
Is there any way to do it using GhostScript?
Yes, but not with a single switch, and not while using -dPDFFitPage.
PDFFitPage will scale the content isotropically (i.e. by the same factor in each direction), so you will have white margins either at the top or at the right of the output.
In order to centre the content, you need to duplicate the functionality of PDFFitPage, and also translate the origin in either the x or y direction, by half the 'excess' in whichever direction has space left over.
You can find the code which performs the scaling in /ghostpdl/gs/Resource/Init/pdf_main.ps, look for /pdf_PDF2PS_matrix and then:
//systemdict /PDFFitPage known {
PDFDEBUG { (Fiting PDF to imageable area of the page.) = flush } if
currentpagedevice /.HWMargins get aload pop
currentpagedevice /PageSize get aload pop
% Adjust PageSize and .HWMargins for the page portrait/landscape orientation
Note that as far as I can see, the current implementation already does centre the output:
% stack: savedCTM <pdfpagedict> [Box] scale XImageable YImageable XBox YBox
3 index 2 index 6 index mul sub 2 div 3 index 2 index 7 index mul sub 2 div
PDFDEBUG { ( Centering translate by [ ) print 1 index =print (, ) print dup =print ( ]) = flush } if
translate pop pop pop pop
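The centring arithmetic can be checked outside Ghostscript. The following Python sketch is not part of Ghostscript; fit_and_centre is a hypothetical helper that mirrors the calculation quoted above, (imageable - box * scale) / 2 on each axis. It assumes the default 72 dpi, so that the pixel size given to -g equals points.

```python
def fit_and_centre(box_w, box_h, target_w, target_h):
    """Return (scale, tx, ty) that fits a box into a target area and centres it.

    Hypothetical helper: same arithmetic as the quoted centring code,
    (imageable - box * scale) / 2 on each axis.
    """
    scale = min(target_w / box_w, target_h / box_h)  # uniform scale, no distortion
    tx = (target_w - box_w * scale) / 2  # half the leftover width
    ty = (target_h - box_h * scale) / 2  # half the leftover height
    return scale, tx, ty


if __name__ == "__main__":
    # A US Letter page (612 x 792 pt) fitted into the 509 x 750 thumbnail.
    scale, tx, ty = fit_and_centre(612, 792, 509, 750)
    print(f"scale={scale:.4f} tx={tx:.2f} ty={ty:.2f}")
```

For this example the width is the limiting dimension, so the horizontal offset is zero and the leftover height is split equally between top and bottom.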
I have been looking for a solution to this problem:
I'm using the Python ReportLab canvas to generate an overlay (watermark) document from a source document
and merge it into the source PDF document (with PyPDF2).
In summary, I have two landscape-oriented, single-page A0 PDFs.
I want to overlay (watermark) them so that the resulting single-page PDF contains both pages merged,
but with the same density.
However, the resulting (merged) document contains a watermark that is rotated by 90 degrees relative to the source document.
print(source.rect.width, source.rect.height) gives 5102.0 2384.0
but
print(source.mediabox.width, source.mediabox.height) gives 2384.0 5102.0
The problem is that the source document looks landscape-oriented on devices and printers, but
has this structure in the PDF:
Rotate 90
MediaBox [ 0 0 2384 5102 ]
The watermark page has the following structure in the PDF:
MediaBox [ 0 0 5102 2384 ]
Rotate 0
You can change the rotation of a page to 0 whilst counter-rotating its dimensions and page content to compensate - resulting in a visually-unchanged file - using cpdf's -upright operation:
cpdf -upright in.pdf -o out.pdf
Preprocessing your files in this manner should let your overlay function operate as expected.
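The mismatch between the two printed sizes follows from how /Rotate interacts with the MediaBox. The sketch below is plain Python with no PDF library; displayed_size is a hypothetical helper, and it assumes /Rotate is a multiple of 90, as the PDF format requires.

```python
def displayed_size(mediabox, rotate):
    """Return (width, height) of a page as a viewer displays it.

    mediabox is [llx, lly, urx, ury]; rotate is the /Rotate value in degrees.
    Hypothetical helper for illustration, not part of any PDF library.
    """
    llx, lly, urx, ury = mediabox
    width, height = urx - llx, ury - lly
    if rotate % 180 == 90:  # 90 or 270 degrees swaps the displayed axes
        return height, width
    return width, height


if __name__ == "__main__":
    source = displayed_size([0, 0, 2384, 5102], 90)    # source page
    watermark = displayed_size([0, 0, 5102, 2384], 0)  # watermark page
    print(source, watermark, source == watermark)
```

Both pages display at the same size, but their unrotated coordinate systems differ, which is why an overlay drawn in MediaBox coordinates ends up turned by 90 degrees.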
I need to add a white rectangle and some text to the bottom left corner of each page of the PDF document using Ghostscript. To achieve this, I have created the following Postscript script:
<<
/EndPage
{
2 eq { pop false }
{
newpath
0 0 moveto
0 20 lineto
200 20 lineto
200 0 lineto
closepath
%%gsave
1 setgray
fill
%%grestore
1 setlinewidth
0 setgray
stroke
gsave
/Times-Roman 9 selectfont
30 5 moveto
(My text) show
grestore
true
} ifelse
} bind
>> setpagedevice
This works well when combined with a Ghostscript command:
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=output.pdf my_script.ps input.pdf
However, if input.pdf is in landscape mode, then the white box and text are printed in the upper left corner and not the lower left. I can get it to work by adding:
90 rotate 0 -595 translate
but I can't determine when the pages are in landscape mode vs. portrait mode. I can get the page width and height, but even for landscape mode pages the width is smaller than the height. I tried the following but it fails:
/orient currentpagedevice /Orientation get def
I have been stuck with this for a while. Any help is greatly appreciated!
(Ghostscript version is 9.25)
[UPDATE]
To illustrate how the width is smaller than height for a page in landscape mode, here's the script.ps I am using: https://gist.github.com/irinkaa/9faadf30b3a5a381a0b621d72b712020
And here are the input.pdf and the output.pdf. As you can see, 612.0 - 792.0 is printed inside the output file, showing that width (612) < height (792).
When I re-run the same command on the output file, it prints the same width and height values, but the box is then placed properly in the lower left corner.
When I add the following to the script:
/orient currentpagedevice /Orientation get def
I get an error suggesting orientation isn't set (if I understand correctly):
Error: /undefined in --get--
Operand stack:
orient --dict:212/312(ro)(L)-- Orientation
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1999 1 3 %oparray_pop 1998 1 3 %oparray_pop 1982 1 3 %oparray_pop 1868 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
Dictionary stack:
--dict:977/1684(ro)(G)-- --dict:0/20(G)-- --dict:80/200(L)--
Current allocation mode is local
Current file position is 151
GPL Ghostscript 9.25: Unrecoverable error, exit code 1
First you should upgrade your version of Ghostscript. 9.25 is old, and has security vulnerabilities.
Secondly you need to look at both the /Orientation and /PageSize entries in the page device dictionary. Not only that but you should use the PageSize to determine the translate you are using for your 'adjustment'. Unless you are in a fixed workflow (and that seems unlikely if you are receiving mixed orientation files) then you should not assume that the media is A4.
The Ghostscript PDF interpreter looks at the MediaBox on each page of the PDF file and resets the /PageSize in the page device dictionary to match the MediaBox for the page. It will (IIRC) never set the /Orientation, if the PDF page has a /Rotate entry then that gets applied to the MediaBox and the contents of the page.
So you really just need to look at the width and height of the requested media, which is given by the /PageSize array in the page device dictionary.
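To see why the translate should come from the PageSize rather than a fixed 595, here is a small Python sketch of the arithmetic behind 90 rotate 0 -595 translate. The helper user_to_page is hypothetical and only models the two operators (translate applied to the point first, then the rotation); it is not Ghostscript code.

```python
import math


def user_to_page(point, page_width):
    """Map a user-space point through `90 rotate 0 -page_width translate`.

    Hypothetical model of the two PostScript operators: the most recent
    operator (translate) acts on the point first, then the 90 degree rotation.
    """
    x, y = point
    y -= page_width  # 0 -page_width translate
    angle = math.radians(90)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # 90 rotate: (x, y) -> (x*cos - y*sin, x*sin + y*cos)
    return (round(x * cos_a - y * sin_a, 6), round(x * sin_a + y * cos_a, 6))


if __name__ == "__main__":
    # Where the origin of the box lands on a Letter page (612 pt wide)
    # with the correct width and with the hard-coded A4 width.
    print(user_to_page((0, 0), 612))
    print(user_to_page((0, 0), 595))
```

With the hard-coded value the origin lands 17 points short of the edge on a Letter page, which is why taking the width from /PageSize at run time is safer.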
Now having said that....
You say that 'even for landscape mode pages the width is smaller than the height'. That seems unlikely to me, but in the absence of an example it's hard to tell. It also makes it hard for anyone to offer any kind of advice.
I'd suggest you upload an example somewhere, and post the URL here so we can look at the file.
Oh, and I'd really recommend that you don't send the output file to stdout. It may well be convenient for you but there are already certain features of the pdfwrite device which simply won't work if you do that (they require the output file to be seekable) and there may be more cases in future.
Edit
Your problem is execution order. The program in script.ps runs before the PDF file is interpreted, then the PDF file is interpreted.
When all your program is doing is setting an EndPage procedure in the page device dictionary, that's not a problem: alterations to the page device dictionary are conservative, so they accumulate unless specifically overwritten.
So the fact that during the course of interpreting the PDF file changes occur to the page device dictionary doesn't matter (unless that were somehow to alter the EndPage procedure).
But at the time your program runs, the page device dictionary /PageSize key has an associated value which is an array containing the default media size (because nothing has happened to change it yet). The PageSize entry won't be altered until the PDF file gets interpreted. This means that no matter what size media your PDF file used, your program will always return the default media size.
You need to know the actual PageSize at the time the EndPage procedure is executed. So you need to investigate the current PageSize as part of the EndPage procedure.
Something like:
<<
/EndPage
{
2 eq { pop false }
{
% Get the current page device dictionary and extract the PageSize
currentpagedevice /PageSize get
% Load the values from the array onto the stack
% and discard the array copy returned by the aload operator
aload pop
% If width <= height (portrait or square page)
le {
% Handle a portrait page
} {
% Handle a landscape page
} ifelse
% EndPage must leave a boolean on the stack; true transmits the page
true
} ifelse
} bind
>> setpagedevice
Note that this avoids creating a dictionary entry to hold the page width and height. There are several reasons for doing this:
Firstly the width and height can be different for every page (particularly in a PDF file).
Secondly you don't (in your program) create your own dictionary to store these key/value pairs which means that you are using whatever dictionary is active at the time. While that's sort of acceptable the way you have it currently, because userdict will be active at the start of the program, you have no way to know which dictionary is on top of the dictionary stack when EndPage is called. So it's not safe to just poke values into whatever dictionary happens to be top, you might end up overwriting keys with the same name, which would lead to unpredictable side effects. Likewise (as per Orientation below) if the current dictionary doesn't contain those keys, you would get an undefined error. So you're getting away with this through luck right now.
Thirdly it's generally considered better practice in PostScript to use the stack for temporary storage, rather than creating key/value pairs in dictionaries.
For the latter two reasons I'd very strongly suggest that instead of creating a key called stringHolder (as your program currently does) in whatever dictionary is on top of the dictionary stack at the start of the program, and assuming it will be available during the EndPage procedure, you should simply create a temporary string by using 10 string instead.
Eg:
/Times-Roman 9 selectfont
30 5 moveto
pagewidth
stringHolder cvs
show
would become:
/Times-Roman 9 selectfont
30 5 moveto
currentpagedevice /PageSize get 0 get
256 string cvs
show
10 digits is possibly a little small, 256 should be enough for anyone and the string will be garbage collected so it isn't like you are leaking memory or anything.
As regards Orientation; yes, you are correct, and as I said initially the PDF interpreter doesn't set Orientation in the page device dictionary. If you try to get a key from a dictionary which doesn't contain that key then you get an undefined error. If you are uncertain whether a key exists in a dictionary you should check it first using the known operator.
Edit 2
As noted in the comments below, it's possible to test the orientation of the CTM by using the transform operator and the unit vector. If either or both of the co-ordinates resulting from transform are negative then there is rotation involved in the CTM and by examining the sign of each co-ordinate we can determine which quadrant the rotation ends up in.
For the purposes of the /Rotate flag in PDF that's sufficient, as it can only be specified in 90 degree increments. Here is an example function which determines the rotation, and a simple piece of PostScript to exercise it:
%!PS
/R {
1 1 transform
0 ge {
0 ge {
(no rotation\n) print
} {
(90 degree ccw rotation\n) print
} ifelse
} {
0 ge {
(270 degree ccw rotation\n) print
} {
(180 degree ccw rotation\n) print
} ifelse
} ifelse
} bind def
R
gsave
90 rotate R
grestore
gsave
180 rotate R
grestore
gsave
270 rotate R
grestore
gsave
360 rotate R
grestore
It's possible to use this technique to decide if the original file has been rotated, and then choose to have the EndPage procedure behave differently.
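The sign test can be reproduced with ordinary arithmetic. The Python sketch below is an illustration only: rotation_quadrant is a hypothetical function, and it assumes a pure rotation with no translation and no device-specific flip of the y axis, which a real CTM may well contain.

```python
import math


def rotation_quadrant(angle_deg):
    """Classify a counter-clockwise rotation by the signs of transformed (1, 1).

    Assumes a pure rotation matrix [cos sin -sin cos 0 0]; a real CTM may
    also carry translation and scaling, which this sketch ignores.
    """
    a = math.radians(angle_deg)
    x = math.cos(a) - math.sin(a)  # x' = x*cos - y*sin with x = y = 1
    y = math.sin(a) + math.cos(a)  # y' = x*sin + y*cos with x = y = 1
    if y >= 0:
        return 0 if x >= 0 else 90
    return 270 if x >= 0 else 180


if __name__ == "__main__":
    for angle in (0, 90, 180, 270, 360):
        print(angle, "->", rotation_quadrant(angle))
```

The branch order matches the R procedure above: the y coordinate is on top of the stack after transform, so it is tested first.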
"but I can't determine when the pages are in landscape mode vs. portrait mode"
$ gs -sDEVICE=bbox -dNOPAUSE -dBATCH input.pdf 2>&1 | grep %B
%%BoundingBox: -1 0 842 596
%%HiResBoundingBox: -0.008930 0.018000 841.988998 595.223982
%%BoundingBox: -1 0 842 596
%%HiResBoundingBox: -0.008930 0.018000 841.988998 595.223982
Then you can have a script-portrait.ps and a script-landscape.ps as appropriate.
EDIT: I agree with KenS. The ghostscript pdfwrite output creates a different layout from the original pdf created by Acrobat Distiller 10.1.1 (Windows). I found this difference even without including the EndPage script.
I'm trying to produce new PDFs that alter the dimensions of only the first page (using CropBox). I used a modified version of How do I crop pages 3&4 in a multipage pdf using ghostscript
Here is what's strange: everything runs properly, but when I open the PDFs in typical applications (Preview, Acrobat, etc.), they either crash or I get a "Warning: Dimensions of Page May be Out of Range" error. In Acrobat, only one page will display, even though the page count is 2, 45, 60, or whatever.
Even stranger: I emailed the PDFs to someone to see if it was a machine-specific issue. In Gmail, everything looks fine in Google Apps's PDF viewer. So the process 'worked,' but it looks like there's something about the dimensions or page size that is throwing other apps off.
I've tried multiple GS options (-dPDFFitPage, -dPrinted=false, -dUseCropBox, changing the paper size to something other than legal), but nothing seems to work.
I'm attaching a version of a PDF that underwent this process and generates these errors as well. https://www.dropbox.com/s/ka13b7bvxmql4d2/imfwb.pdf?dl=0
Modified output is below. xmin, ymin, xmax, ymax, height, width are variables defined elsewhere in the bigger script of which GS is a part. Data are grabbed using pdfinfo
gs \
-o output/#{filename} \
-sDEVICE=pdfwrite \
-c \"<</EndPage {
0 eq {
pop /Page# where {
/Page# get
1 eq {
(page 1) == flush
[/CropBox [#{xmin} #{ymin} #{xmax} #{ymax}] /PAGE pdfmark
true
}
{
(not page 1) == flush
[/CropBox [0 #{height.to_f} #{width.to_f} #{height.to_f}] /PAGE pdfmark
true
} ifelse
}{
true
} ifelse
}
{
false
}
ifelse
}
>> setpagedevice\" \
-f #{filename}"
`#{cmd}`
For pages after the first you set
[/CropBox [0 #{height.to_f} #{width.to_f} #{height.to_f}] /PAGE pdfmark
I.e. a crop box with zero height!
E.g. in case of your sample document page 2 has the crop box [0 792.0 612.0 792.0].
This surely is not what you want...
If you really want to "produce new PDFs that alter dimensions only the first page (using CropBox)", why do you change the crop box of later pages at all? Simply don't do anything in that case!
Why "Dimensions of Page May be Out of Range"?
Well, ISO 32000-1 in its normative Annex C declares:
The minimum page size should be 3 by 3 units in default user space
Thus, according to that older PDF specification a page height of 0 indeed is out of range for PDF!
Meanwhile, though, ISO 32000-2 has dropped that requirement, so strictly speaking a page height of zero should be nothing to complain about...
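A quick sanity check on the box before it goes into the pdfmark would catch this kind of mistake. The following Python sketch is only an illustration; cropbox_problems is a hypothetical helper, and the limit of 3 units is the minimum quoted above from ISO 32000-1.

```python
MIN_PAGE_UNITS = 3  # minimum page size quoted from ISO 32000-1 Annex C


def cropbox_problems(box):
    """Return a list of complaints about a [llx, lly, urx, ury] box.

    Hypothetical helper: flags any dimension below the quoted minimum.
    """
    llx, lly, urx, ury = box
    width, height = abs(urx - llx), abs(ury - lly)
    problems = []
    if width < MIN_PAGE_UNITS:
        problems.append(f"width {width} is below {MIN_PAGE_UNITS}")
    if height < MIN_PAGE_UNITS:
        problems.append(f"height {height} is below {MIN_PAGE_UNITS}")
    return problems


if __name__ == "__main__":
    # The crop box that ends up on page 2 of the sample document.
    print(cropbox_problems([0, 792.0, 612.0, 792.0]))
    # A full Letter-sized box for comparison.
    print(cropbox_problems([0, 0, 612.0, 792.0]))
```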
I convert PDF -> many JPEG and many JPEG -> many PDF using ghostscript. I need to add watermark text on every converted JPEG (PDF) page. Is it possible using only Ghostscript and PostScript?
The only way I found:
gswin32c -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -sOutputFile=output.pdf watermark.ps input.pdf
But this will insert watermark.ps watermark on first separate page in output.pdf.
Can I do this on output PDF pages directly?
Can I do this on output JPEG pages directly?
<<
/BeginPage
{ gsave
/Helvetica-Bold 120 selectfont
.85 setgray 130 70 moveto 50 rotate (Sample) show
grestore
} bind
>> setpagedevice
If I use /EndPage instead of /BeginPage - it says setpagedevice is not applicable...
How to remake this script for /EndPage?
Bit too big for a comment, so I've added a new answer. The EndPage procedure (see page 441 of the PostScript Language Reference Manual) takes two additional parameters on the stack, a count of pages emitted so far, and a reason code.
You can use the count of pages to do interesting things like duplexing, or only marking even pages or whatever, but I assume in this case you don't want it, so you just 'pop' it from the stack.
The reason code tells you why the page is being emitted, again you probably don't care so you just pop the value.
Finally the EndPage must return a boolean value to the interpreter saying whether or not to transmit the page (this allows you to do other interesting things, like only printing the first 10 pages and so on).
So you need to initially remove two values, execute your code and return a boolean. Pretty trivial:
<<
/EndPage
{ pop pop %% discard the reason code and the page count first
gsave
/Helvetica-Bold 120 selectfont
.85 setgray 130 70 moveto 50 rotate (Sample) show
grestore
true %% transmit the page, set to false to not transmit the page
} bind
>> setpagedevice
The accepted answer was inserting extra pages for me. The pages were blank apart from the watermark. If you run into this, try adding the 2 eq check shown here:
<<
/EndPage
{
2 eq { pop false }
{
gsave
/Helvetica-Bold 120 selectfont
.85 setgray 130 70 moveto 50 rotate (Sample) show
grestore
true
} ifelse
} bind
>> setpagedevice
I found the following site that pointed me in the correct direction
http://habjan.blogspot.com/2013/10/how-to-programmatically-add-watermark.html
Here's the calling syntax where the above file is saved as watermark.ps and gswin32c references the ghostscript exe
gswin32c -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=watermarked.pdf watermark.ps original.pdf
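The effect of the 2 eq test can be modelled in a few lines. This Python sketch is a simplified model, not Ghostscript behaviour verbatim: the function names are hypothetical, and the reason codes (0 for showpage, 1 for copypage, 2 for device deactivation) are those given in the PostScript Language Reference Manual.

```python
SHOWPAGE, COPYPAGE, DEACTIVATION = 0, 1, 2  # EndPage reason codes


def naive_end_page(page_count, reason):
    """Always asks for the page to be transmitted, like `pop pop ... true`."""
    return True


def guarded_end_page(page_count, reason):
    """Refuses to transmit when the device is merely being deactivated."""
    return reason != DEACTIVATION


def pages_emitted(end_page, reasons):
    """Count pages transmitted for a sequence of EndPage calls."""
    emitted = 0
    for reason in reasons:
        if end_page(emitted, reason):
            emitted += 1
    return emitted


if __name__ == "__main__":
    # Two real pages followed by a device deactivation at the end of the job.
    calls = [SHOWPAGE, SHOWPAGE, DEACTIVATION]
    print(pages_emitted(naive_end_page, calls))
    print(pages_emitted(guarded_end_page, calls))
```

In this model the unguarded procedure transmits one page more than the document contains, which matches the extra watermark-only page described above.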
I don't know what you mean by 'directly'. It's possible, as you have found, to have a PostScript interpreter do many kinds of things on a per-page basis. PostScript is a programming language after all.
I would suggest that the /BeginPage and/or /EndPage procedures in the page device dictionary would be the place to start. These allow you to execute arbitrary PostScript at the start or end of every page.
If you define a /BeginPage procedure then it will be executed before any marking operations from the input program; if you define an /EndPage procedure then it will be executed after the marking operations from the input program (on a page-by-page basis).
This allows you to have your own marks lie 'under' or 'over' the marks from the program.
I use Ghostscript to convert PDF documents to PCL for printing. Recently I have the additional requirement that all pages must be rotated to Portrait before printing. I have found a way to do so using Ghostscript with following command and postscript function.
"C:\Program Files (x86)\gs\bin\gswin32c.exe" "-dNOPAUSE" "-dNOPROMPT" "-dBATCH" "-sDEVICE=pxlmono" "-Ic:\Program Files (x86)\gs\fonts\;c:\Program Files (x86)\gs\lib\;c:\Program Files (x86)\gs\lib\;" "-r300" "-sOutputFile=C:\EXPORTFILE_e542e04f-5e84-4c8e-9b41-55480cd5ec52.cache" "rotate612x792.ps" "C:\EXPORTFILE_3a5de9da-d9ca-4562-8cb6-10fb8715385a.cache"
Contents of rotate612x792.ps
%! Rotate Pages
<< /Policies << /PageSize 5 >>
/PageSize [612 792]
/InputAttributes currentpagedevice
/InputAttributes get mark exch {1 index /Priority eq not {pop << /PageSize [612 792] >>} if } forall >>
>> setpagedevice
The problem is that this function replaces all page sizes with letter size. My documents are sometimes legal or A4. I have tried to modify this function to replace landscape sizes with their portrait counterpart, but have not been able to produce functioning postscript. I need to be pointed in the right direction to produce the postscript equivalent of the following pseudo code.
for(each page)
{
if(PageSize == [792 612])
PageSize = [612 792];
}
I am aware that there are non-Ghostscript ways of rotating pages, but if I can get this to work it would fit nicely into my process and would not reduce performance.
Here is a sample of one of my pdf files:
Sample1.pdf
PostScript is a programming language, so you can do a lot with it. What you need to do here is redefine the action of requesting page sizes. The Page size and content are separate in PostScript, so you need to do 2 things:
1) Alter the media request from landscape to portrait
2) rotate the content of the page
The simplest way to do this is to redefine the 'setpagedevice' operator. Here's an example:
/oldsetpagedevice /setpagedevice load def %% copy original definition
/setpagedevice {
dup /PageSize known { %% Do we have a page size redefinition ?
dup /PageSize get %% get the array if so
aload pop %% get elements remove array copy
gt { %% is width > height ?
dup /PageSize get aload %% get size array, put content on stack
3 1 roll %% roll stack to put array at back
exch %% swap width and height
3 -1 roll %% bring array back to front of stack
astore %% put swapped elements into array
/PageSize exch %% put key on stack and swap with value
2 index %% copy the original dict
3 1 roll %% move dict to back of stack
put %% put new page size array in dict
90 rotate %% rotate content 90 degrees anti-clockwise
} if
} if
oldsetpagedevice %% call the original definition
} bind def
This checks configuration changes to see if the page size is being altered. If it is, it gets the new size and checks whether width > height (a simple definition of landscape). If that is true, it alters the request by swapping the width and height, and then rotates the page content by 90 degrees.
You can use this with Ghostscript by putting the above content in a file (eg prolog.ps) and then running that file before your own job:
gs ...... prolog.ps job.ps
I have tried this, but not with a landscape file as I didn't have one to hand. Note also that it is possible to construct a PostScript program which will defeat this.
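The decision made by the redefined setpagedevice is the asker's pseudocode generalised to any size. As a rough illustration in Python (to_portrait is a hypothetical helper, not part of the PostScript above):

```python
def to_portrait(page_size):
    """Return (new_size, rotated) for a [width, height] media request.

    Hypothetical helper: swaps the two values when width > height and
    reports that the content needs a 90 degree rotation to match.
    """
    width, height = page_size
    if width > height:
        return [height, width], True
    return [width, height], False


if __name__ == "__main__":
    # Landscape Letter, portrait Legal and landscape A4 requests.
    for size in ([792, 612], [612, 1008], [842, 595]):
        print(size, "->", to_portrait(size))
```

Unlike the fixed [612 792] request in the question, this keeps Legal and A4 pages at their own size and only swaps the landscape ones.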
I found a workable solution. It is not as versatile as I hoped, but it hits all my requirements.
The following postscript script will rotate A4, Letter and Legal documents to Portrait. To get it to do other page sizes adjust the min and max sizes.
%!PS
% Sequence to set up for a single page size, auto fit all pages.
% Autorotate A4 Letter and Legal page sizes to Portrait
<< /Policies << /PageSize 3 >>
/InputAttributes currentpagedevice /InputAttributes get %current dict
dup { pop 1 index exch undef } forall % remove all page sizes
dup 0 << /PageSize [ 595 0 612 99999 ] >> put % [ min-w min-h max-w max-h ]
>> setpagedevice
This postscript script will rotate A4, Letter and Legal documents to Landscape. The only difference is the Min/Max page size values.
%!PS
% Sequence to set up for a single page size, auto fit all pages.
% Autorotate A4 Letter and Legal page sizes to Landscape
<< /Policies << /PageSize 3 >>
/InputAttributes currentpagedevice /InputAttributes get %current dict
dup { pop 1 index exch undef } forall % remove all page sizes
dup 0 << /PageSize [ 0 595 99999 612 ] >> put % [ min-w min-h max-w max-h ]
>> setpagedevice
This solution is based off the auto-rotate.ps file I found in the source code for the hylafax project. That project appears to be licensed under BSD.
Although Zig158's answer works well, a new option has appeared since then:
-dFIXEDMEDIA
which works for any paper size, not only for A4.
See Ghostscript bug tracker for additional details.