Using only Ghostscript on a fairly locked down bare bones Linux machine, I need to combine three existing PDFs into a new PDF, put a static header on each page of the new PDF, put a static footer on each page of the new PDF, and number each of the pages (such as "Page 1257").
I have solutions to the first three problems. Ghostscript out of the box can easily combine multiple PDFs into a single new PDF. Tinkering with PostScript in the -c command line option, I can add the static header and footer to the new PDF. What I cannot yet do is get it to put page numbers on the new PDF.
This is the complex command line I now have:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=final.pdf -c "<</EndPage {2 ne {200 0 0 setrgbcolor /NimbusSans-Bold 24 selectfont 240 2 moveto (Static Footer) show 240 773 moveto (Static Header) show 0 0 0 setrgbcolor /NimbusSans-Regular 12 selectfont 2 24 moveto (Page ) show true}{pop false} ifelse} >> setpagedevice" -f title.pdf parta.pdf partb.pdf
Removing the static header and footer pieces gets to a slightly simpler command line:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=final.pdf -c "<</EndPage {2 ne {0 0 0 setrgbcolor /NimbusSans-Regular 12 selectfont 2 24 moveto (Page ) show true}{pop false} ifelse} >> setpagedevice" -f title.pdf parta.pdf partb.pdf
I have tried many things to get a page number to show up but they either crash Ghostscript or just keep showing the same page number on each page. The only other complication is the new PDF will be between 1,000 and 2,000 pages.
What I need is a good code example of how to make PostScript display an incrementing page number on each page of a PDF.
It's not terribly hard to count pages in PostScript. I expect you know most of the following already, but I'm going to take baby steps for the benefit of anyone else who comes across this later.
We start by creating a dictionary all our own where we can store stuff. We make sure this is defined in 'userdict' so we can always find it (userdict, like systemdict, is always available). The name chosen should be nicely unique to prevent any other PostScript or PDF programs overwriting it!
userdict begin %% Make userdict the current dictionary
/My_working_Dict %% Create a name for our dict (choose something unique)
10 dict %% create a 10-element dictionary
def %% def takes a key and value and puts them in the current dictionary
My_working_Dict begin %% make our working dictionary the current one
/PageCount %% Make a name for our counter
0 %% 0, this will be the initial value
def %% define PageCount as 0 in the current dictionary
end %% close the current dictionary (My_working_Dict)
end %% close the current dictionary (userdict)
There are more efficient ways to do this, but that's an easy method to describe and follow. From this point until we close the PostScript interpreter (or restore it back to an earlier state) userdict will contain a dictionary called My_working_Dict which will have a key called PageCount. The value associated with PageCount will be 0 initially, but we can change that.
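For comparison, one of those more compact forms stores the new dictionary straight into userdict with put, without making either dictionary current. This is a sketch using the same illustrative names (My_working_Dict, PageCount) as above:

```postscript
%!PS
% Compact equivalent of the begin/def/end sequence above (sketch).
% My_working_Dict and PageCount are the same illustrative names.
userdict /My_working_Dict 10 dict put            % new 10-element dictionary in userdict
userdict /My_working_Dict get /PageCount 0 put   % counter starts at 0
```

Because put names the target dictionary explicitly, this doesn't depend on which dictionary happens to be on top of the dictionary stack.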
You've defined an EndPage procedure like this:
<<
/EndPage {
2 ne {
200 0 0 setrgbcolor
/NimbusSans-Bold 24 selectfont
240 2 moveto
(Static Footer) show
240 773 moveto
(Static Header) show
0 0 0 setrgbcolor
/NimbusSans-Regular 12 selectfont
2 24 moveto
(Page ) show
true
}
{
pop
false
} ifelse
}
>> setpagedevice
When the EndPage procedure is called there are two numbers on the stack: the topmost is the 'reason code' and the one below it is the count of previous showpage executions. You might reasonably think you could use that count as your page number, but unfortunately it is reset to 0 on every setpagedevice call. The PDF interpreter calls setpagedevice on every page, because every page in a PDF file can be a different size, and setpagedevice is how the page size is changed.
When we return from the EndPage procedure we must push a boolean on the stack which is either 'true' (send the page to the output) or 'false' (throw it away and do nothing).
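If you want to see those operands for yourself, a diagnostic EndPage like the following sketch prints the count and the reason code on each call. It returns true for everything except reason code 2. If you run it ahead of a PDF file you should be able to watch the count restart, as described above. The message format is my own choice, and the procedure draws nothing on the page.

```postscript
%!PS
% Diagnostic EndPage (sketch): print the operands on every call.
<<
  /EndPage {                              % stack: count reason
    (EndPage: count=) print
    1 index 20 string cvs print           % print a copy of the count
    ( reason=) print
    dup 20 string cvs print (\n) print    % print a copy of the reason code
    2 ne                                  % emit the page unless the device is being deactivated
    exch pop                              % drop the count, leaving only the boolean
  } bind
>> setpagedevice
```

You could use it as, for example, gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=final.pdf showendpage.ps title.pdf, where showendpage.ps is a hypothetical name for the file holding the sketch.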
So, your procedure tests the reason code to see why EndPage has been called. If it's not 2 (2 means the device is being deactivated), then it's either copypage or showpage, so you draw your desired additions on the page. If it is 2, then we just pop the count of pages and return 'false' so that we don't try to emit an extra page.
If it is not 2, you set the colour with 200 0 0 setrgbcolor. Colour components are clamped to the range 0 to 1, so this is pure red (1 0 0 setrgbcolor says the same thing more clearly). You then find the font NimbusSans-Bold, scale it to 24 point and set it as the current font, move to the position x=240, y=2 (0,0 is bottom left and units are points, 1/72nd of an inch) and draw the text 'Static Footer' (NB parentheses are string delimiters in PostScript).
Then you move to the position x=240, y=773 and draw the text 'Static Header'.
You then set the colour to black with 0 0 0 setrgbcolor (0 setgray would do the same), find the font NimbusSans-Regular, this time scaling it to 12 points, and select it as the current font. Finally you move to the position x=2, y=24 and draw the text 'Page '. Note that the colour stays the same until you change it, so in my version below, where both colours are black, the second setrgbcolor is redundant.
So all you need to do is extend that EndPage procedure so that it picks up the count of pages from our dictionary, converts it to a string, and draws the resulting string.
Something like:
userdict begin %% Make userdict the current dictionary
/My_working_Dict %% Create a name for our dict (choose something unique)
10 dict %% create a 10-element dictionary
def %% def takes a key and value and puts them in the current dictionary
My_working_Dict begin %% make our working dictionary the current one
/PageCount %% Make a name for our counter
0 %% 0, this will be the initial value
def %% define PageCount as 0 in the current dictionary
end %% close the current dictionary (My_working_Dict)
end %% close the current dictionary (userdict)
<<
/EndPage {
2 ne {
0 0 0 setrgbcolor
/NimbusSans-Bold 24 selectfont
240 2 moveto
(Static Footer) show
240 773 moveto
(Static Header) show
0 0 0 setrgbcolor
/NimbusSans-Regular 12 selectfont
2 24 moveto
(Page ) show
userdict /My_working_Dict get %% get My_working_Dict from userdict (leaves My_working_Dict on the operand stack)
dup %% duplicate the dictionary reference
/PageCount get %% get PageCount from the dictionary on the stack
1 add %% add one to the count
/PageCount %% put the key on the stack
%% stack now holds << >> int /PageCount
%% where << >> is a reference to My_working_Dict, int is our new value for PageCount, and /PageCount is the key name we are using
exch %% swap the topmost two stack items
%% stack is now << >> /PageCount int
put %% puts the top two items on the stack into the dictionary which is third on the stack.
256 string %% Temporary string to hold the count
userdict /My_working_Dict get %% get My_working_Dict from userdict (leaves My_working_Dict on the operand stack)
/PageCount get %% get PageCount from the dictionary on the stack
exch %% swap the count and the string so the string is on top, as cvs expects
cvs %% Convert the count (second on the stack) into text, storing the result in the string on top of the stack
show %% draw the string on the page using the current colour and font.
pop true %% discard the showpage count and send the page to the output
}
{
pop
false
} ifelse
}
>> setpagedevice
Save the PostScript above as modifyPDF.ps and then execute Ghostscript with:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=final.pdf modifyPDF.ps title.pdf parta.pdf partb.pdf
Now I haven't actually tried this code, so bugs are entirely possible.
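A likely reason the userdict counter above shows the same number on every page is save/restore, which is what Update 2 below is written to defeat. restore undoes every change made to local VM (which includes userdict) since the matching save, while changes to global VM survive. The sketch below shows the effect in isolation. LocalCounter and GlobalCounter are illustrative names, and I'm assuming, as Update 2 implies, that the PDF interpreter wraps its pages in save and restore.

```postscript
%!PS
% restore undoes changes to local VM (userdict) but not to global VM (globaldict).
userdict /LocalCounter 0 put
currentglobal true setglobal          % remember the allocation mode, switch to global
globaldict /GlobalCounter 0 put
setglobal                             % back to the previous allocation mode

save
  userdict /LocalCounter 1 put        % local VM change: rolled back by restore
  globaldict /GlobalCounter 1 put     % global VM change: survives restore
restore

(LocalCounter = ) print userdict /LocalCounter get =     % prints 0
(GlobalCounter = ) print globaldict /GlobalCounter get = % prints 1
```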
[Update 2]
This program is basically the same, but stores the variable in a dictionary in global VM, stored in globaldict, in order to defeat save/restore.
globaldict begin %% Make globaldict the current dictionary
currentglobal true setglobal %% We need to make the VM allocation mode for the dictionary global
/My_working_Dict %% Create a name for our dict (choose something unique)
10 dict %% create a 10-element dictionary
def %% def takes a key and value and puts them in the current dictionary
setglobal %% put the VM allocation mode back
globaldict /My_working_Dict %% Get the dictionary from globaldict
get begin %% make our working dictionary the current one
/PageCount %% Make a name for our counter
0 %% 0, this will be the initial value
def %% define PageCount as 0 in the current dictionary
end %% close the current dictionary (My_working_Dict)
end %% close the current dictionary (globaldict)
<<
/EndPage {
2 ne {
0 0 0 setrgbcolor
/NimbusSans-Bold 24 selectfont
240 2 moveto
(Static Footer) show
240 773 moveto
(Static Header) show
0 0 0 setrgbcolor
/NimbusSans-Regular 12 selectfont
2 24 moveto
(Page ) show
globaldict /My_working_Dict get %% get My_working_Dict from globaldict (leaves My_working_Dict on the operand stack)
dup %% duplicate the dictionary reference
/PageCount get %% get PageCount from the dictionary on the stack
1 add %% add one to the count
/PageCount %% put the key on the stack
%% stack now holds << >> int /PageCount
%% where << >> is a reference to My_working_Dict, int is our new value for PageCount, and /PageCount is the key name we are using
exch %% swap the topmost two stack items
%% stack is now << >> /PageCount int
put %% puts the top two items on the stack into the dictionary which is third on the stack.
256 string %% Temporary string to hold the count
globaldict /My_working_Dict get %% get My_working_Dict from globaldict (leaves My_working_Dict on the operand stack)
/PageCount get %% get PageCount from the dictionary on the stack
exch %% swap the count and the string so the string is on top, as cvs expects
cvs %% Convert the count (second on the stack) into text, storing the result in the string on top of the stack
show %% draw the string on the page using the current colour and font.
pop true %% discard the showpage count and send the page to the output
}
{
pop
false
} ifelse
}
>> setpagedevice
I have tried this with the supplied example and it works for me.
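The counter update can also be written more compactly. This sketch keeps the dictionary in global VM as Update 2 does, but wraps the increment in a helper, NextPageNumber (a hypothetical name), that returns the new value. The EndPage procedure then only has to convert that value and show it. The //NextPageNumber exec form embeds the procedure itself into EndPage when the file is read, so EndPage doesn't depend on finding the name on the dictionary stack later. I haven't run this against the original files.

```postscript
%!PS
% Compact variant of the Update 2 counter (sketch).
currentglobal true setglobal                  % allocate the dictionary in global VM
globaldict /My_working_Dict 10 dict put
globaldict /My_working_Dict get /PageCount 0 put
setglobal                                     % restore the previous allocation mode

/NextPageNumber {                             % -- NextPageNumber int
  globaldict /My_working_Dict get             % dict
  dup /PageCount get 1 add                    % dict n+1
  dup 3 1 roll                                % n+1 dict n+1
  /PageCount exch put                         % n+1, counter updated
} bind def

<<
  /EndPage {                                  % count reason
    2 ne {
      0 setgray
      /NimbusSans-Regular 12 selectfont
      2 24 moveto
      (Page ) show
      //NextPageNumber exec 256 string cvs show
      pop true                                % drop the count, emit the page
    } {
      pop false                               % device deactivation: no page
    } ifelse
  } bind
>> setpagedevice
```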
Related
I need to add a white rectangle and some text to the bottom left corner of each page of the PDF document using Ghostscript. To achieve this, I have created the following Postscript script:
<<
/EndPage
{
2 eq { pop false }
{
newpath
0 0 moveto
0 20 lineto
200 20 lineto
200 0 lineto
closepath
%%gsave
1 setgray
fill
%%grestore
1 setlinewidth
0 setgray
stroke
gsave
/Times-Roman 9 selectfont
30 5 moveto
(My text) show
grestore
true
} ifelse
} bind
>> setpagedevice
This works well when combined with a Ghostscript command:
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=output.pdf my_script.ps input.pdf
However, if input.pdf is in landscape mode, then the white box and text are printed in the upper left corner and not the lower left. I can get it to work by adding:
90 rotate 0 -595 translate
but I can't determine when the pages are in landscape mode vs. portrait mode. I can get the page width and height, but even for landscape mode pages the width is smaller than the height. I tried the following but it fails:
/orient currentpagedevice /Orientation get def
I have been stuck with this for a while. Any help is greatly appreciated!
(Ghostscript version is 9.25)
[UPDATE]
To illustrate how the width is smaller than height for a page in landscape mode, here's the script.ps I am using: https://gist.github.com/irinkaa/9faadf30b3a5a381a0b621d72b712020
And here are the input.pdf and the output.pdf. As you can see, 612.0 - 792.0 is printed inside the output file, showing that width (612) < height (792).
When I re-run the same command on the output file, it prints the same width and height values, but the box is then placed properly in the lower left corner.
When I add the following to the script:
/orient currentpagedevice /Orientation get def
I get an error suggesting orientation isn't set (if I understand correctly):
Error: /undefined in --get--
Operand stack:
orient --dict:212/312(ro)(L)-- Orientation
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1999 1 3 %oparray_pop 1998 1 3 %oparray_pop 1982 1 3 %oparray_pop 1868 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
Dictionary stack:
--dict:977/1684(ro)(G)-- --dict:0/20(G)-- --dict:80/200(L)--
Current allocation mode is local
Current file position is 151
GPL Ghostscript 9.25: Unrecoverable error, exit code 1
First you should upgrade your version of Ghostscript. 9.25 is old, and has security vulnerabilities.
Secondly, you need to look at both the /Orientation and /PageSize entries in the page device dictionary. Not only that, but you should use the PageSize to determine the translate you are using for your 'adjustment'. Unless you are in a fixed workflow (and that seems unlikely if you are receiving mixed orientation files), you should not assume that the media is A4.
The Ghostscript PDF interpreter looks at the MediaBox on each page of the PDF file and resets the /PageSize in the page device dictionary to match the MediaBox for the page. It will (IIRC) never set /Orientation; if the PDF page has a /Rotate entry, then that gets applied to the MediaBox and the contents of the page.
So you really just need to look at the width and height of the requested media, which is given by the /PageSize array in the page device dictionary.
Now having said that....
You say that 'even for landscape mode pages the width is smaller than the height'. That seems unlikely to me, but in the absence of an example it's hard to tell. It also makes it hard for anyone to offer any kind of advice.
I'd suggest you upload an example somewhere, and post the URL here so we can look at the file.
Oh, and I'd really recommend that you don't send the output file to stdout. It may well be convenient for you but there are already certain features of the pdfwrite device which simply won't work if you do that (they require the output file to be seekable) and there may be more cases in future.
Edit
Your problem is execution order. The program in script.ps runs before the PDF file is interpreted, then the PDF file is interpreted.
When all your program is doing is setting an EndPage procedure in the page device dictionary that's not a problem, alterations to the page device dictionary are conservative, they accumulate unless specifically overwritten.
So the fact that during the course of interpreting the PDF file changes occur to the page device dictionary doesn't matter (unless that were somehow to alter the EndPage procedure).
But at the time your program runs, the page device dictionary /PageSize key has an associated value which is an array containing the default media size (because nothing has happened to change it yet). The PageSize entry won't be altered until the PDF file gets interpreted. This means that no matter what size media your PDF file used, your program will always return the default media size.
You need to know the actual PageSize at the time the EndPage procedure is executed. So you need to investigate the current PageSize as part of the EndPage procedure.
Something like:
<<
/EndPage
{
2 eq { pop false }
{
% Get the current page device dictionary and extract the PageSize
currentpagedevice /PageSize get
% Load the values from the array onto the stack
% and discard the array copy returned by the aload operator
aload pop
% If width < height (or equal, square page)
le {
% Handle a portrait page
} {
% Handle a landscape page
} ifelse
% Discard the showpage count and transmit the page
pop true
}ifelse
} bind
>> setpagedevice
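To check quickly which branch each page takes, you can fill in the skeleton with a print in each branch. This sketch is for diagnosis only: it writes to the terminal rather than marking the page, and the messages are my own.

```postscript
%!PS
% Report portrait/landscape per page, using PageSize as it is when EndPage runs.
<<
  /EndPage {                                      % count reason
    2 eq { pop false } {
      currentpagedevice /PageSize get aload pop   % count width height
      le { (portrait page\n) } { (landscape page\n) } ifelse print
      pop true                                    % drop the count, emit the page
    } ifelse
  } bind
>> setpagedevice
```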
Note that this avoids creating a dictionary entry to hold the page width and height. There are several reasons for doing this:
Firstly the width and height can be different for every page (particularly in a PDF file).
Secondly you don't (in your program) create your own dictionary to store these key/value pairs which means that you are using whatever dictionary is active at the time. While that's sort of acceptable the way you have it currently, because userdict will be active at the start of the program, you have no way to know which dictionary is on top of the dictionary stack when EndPage is called. So it's not safe to just poke values into whatever dictionary happens to be top, you might end up overwriting keys with the same name, which would lead to unpredictable side effects. Likewise (as per Orientation below) if the current dictionary doesn't contain those keys, you would get an undefined error. So you're getting away with this through luck right now.
Thirdly it's generally considered better practice in PostScript to use the stack for temporary storage, rather than creating key/value pairs in dictionaries.
For the latter two reasons, I'd very strongly suggest that instead of creating a key called stringHolder (as your program currently does) in whatever dictionary is on top of the dictionary stack at the start of the program, and assuming it will be available during the EndPage procedure, you simply create a temporary string with (for example) 10 string.
Eg:
/Times-Roman 9 selectfont
30 5 moveto
pagewidth
stringHolder cvs
show
would become:
/Times-Roman 9 selectfont
30 5 moveto
currentpagedevice /PageSize get 0 get
256 string cvs
show
A 10-character string is possibly a little small; 256 should be enough for anyone, and the string will be garbage collected, so it isn't as if you are leaking memory.
As regards Orientation; yes, you are correct, and as I said initially the PDF interpreter doesn't set Orientation in the page device dictionary. If you try to get a key from a dictionary which doesn't contain that key then you get an undefined error. If you are uncertain whether a key exists in a dictionary you should check it first using the known operator.
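For example, the following sketch reads /Orientation only if the page device dictionary defines it. Otherwise it falls back to 0, which is an assumed default chosen because 0 means no rotation. The name orient is taken from the question.

```postscript
%!PS
% Fetch /Orientation safely: test with known before get to avoid /undefined.
/orient
currentpagedevice dup /Orientation known {
  /Orientation get                % the device's own value
} {
  pop 0                           % key absent: assume no rotation
} ifelse
def
```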
Edit 2
As noted in the comments below, it's possible to test the orientation of the CTM by applying the dtransform operator to the vector (1,1). dtransform maps a distance through the CTM while ignoring its translation component. If either or both of the resulting co-ordinates are negative, then there is rotation involved in the CTM, and by examining the sign of each co-ordinate we can determine which quadrant the rotation ends up in. (The transform operator would include the translation, which on a rotated page can change the signs.)
For the purposes of the /Rotate flag in PDF that's sufficient, as it can only be specified in 90 degree increments. Here is an example function which determines the rotation, and a simple piece of PostScript to exercise it:
%!PS
/R {
1 1 dtransform
0 ge {
0 ge {
(no rotation\n) print
} {
(90 degree ccw rotation\n) print
} ifelse
} {
0 ge {
(270 degree ccw rotation\n) print
} {
(180 degree ccw rotation\n) print
} ifelse
} ifelse
} bind def
R
gsave
90 rotate R
grestore
gsave
180 rotate R
grestore
gsave
270 rotate R
grestore
gsave
360 rotate R
grestore
It's possible to use this technique to decide if the original file has been rotated, and then choose to have the EndPage procedure behave differently.
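One way to wire that into EndPage is to have the test return a number instead of printing. The sketch below is a variant of R under a hypothetical name, RotationOfCTM, and it also uses dtransform so that translation in the CTM does not affect the signs. It assumes the output device's default CTM is not flipped vertically. On a device whose default CTM flips y, as raster devices usually do, an unrotated page would report 270. The EndPage body only prints a message; you would replace those strings with your own drawing code.

```postscript
%!PS
% Return 0, 90, 180 or 270: the counter-clockwise rotation in the CTM,
% judged from the signs of (1,1) mapped through it without translation.
/RotationOfCTM {             % -- RotationOfCTM int
  1 1 dtransform             % dx dy
  0 ge {                     % dy >= 0
    0 ge { 0 } { 90 } ifelse
  } {                        % dy < 0
    0 ge { 270 } { 180 } ifelse
  } ifelse
} bind def

<<
  /EndPage {                 % count reason
    2 eq { pop false } {
      //RotationOfCTM exec   % procedure embedded at scan time, then run
      0 eq { (unrotated page\n) } { (rotated page\n) } ifelse print
      pop true               % drop the count, emit the page
    } ifelse
  } bind
>> setpagedevice
```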
"but I can't determine when the pages are in landscape mode vs. portrait mode"
$ gs -sDEVICE=bbox -dNOPAUSE -dBATCH input.pdf | grep %B
%%BoundingBox: -1 0 842 596
%%HiResBoundingBox: -0.008930 0.018000 841.988998 595.223982
%%BoundingBox: -1 0 842 596
%%HiResBoundingBox: -0.008930 0.018000 841.988998 595.223982
Then you can have a script-portrait.ps and a script-landscape.ps as appropriate.
EDIT: I agree with KenS. The ghostscript pdfwrite output creates a different layout from the original pdf created by Acrobat Distiller 10.1.1 (Windows). I found this difference even without including the EndPage script.
I am converting Type 3 glyph fonts from PDF to PostScript. The input file has inline images whose data streams use the FlateDecode filter, with Predictor 15 in the decode parameters.
Can anybody help with how to take the image streams from PDF to PostScript?
This is how the input stream is given in pdf
32 0 obj
<<
/Length 342
>>
stream
37 0 4 -52 33 -1 d1
0.01 0 0 0.01 0 0 concat
gsave 2900 0 0 -5100 400 -100 concat
BI
/IM true
/W 29
/H 51
/BPC 1
/D[1
0]
/F/Fl
/DP<</Predictor 15
/Columns 29>>
ID ...binary Flate-compressed data...
EI gRestore
endstream
endobj
And here is what i am trying to write in output in Postscript
/g21 {
37 0 4 -52 33 -1 setcachedevice
q
[0.01 0 0 0.01 0 0] concat
q
[2900 0 0 -5100 400 -100] concat
[ ...binary Flate-compressed data...
]
0
<<
/ImageType 1
/Width 29
/Height 51
/ImageMatrix [29 0 0 -51 0 51]
/BitsPerComponent 1
/Decode [1 0]
/DataSource { 2 copy get exch 1 add exch }
<</Predictor 15
/Columns 29
>>
/FlateDecode filter
>>
imagemask
pop pop
gRestore
gRestore
} def
PostScript has mostly the same filters as PDF. You don't need to decompress the data, just use the FlateDecode filter in PostScript and leave the compressed data untouched.
Note you'll need Language Level 3 for Predictor 15 (or any other PNG predictor) but that shouldn't be a problem, level 3 has been the standard for 18 years.
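Otherwise you'll need to implement a version of the FlateDecode filter which supports the PNG predictor. zlib can do the Flate decompression, but as far as I know it doesn't undo PNG predictors, so you would have to reverse the predictor on each row yourself after inflating.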
[EDIT]
Your 'PostScript output' is incomplete, you are using PDF operators (q and Q) which you have not provided a definition for. Apart from anything else this makes it impossible to run the code through an interpreter. Kindly supply a complete simple example file, as requested. Not pasted code, I'm not inclined to go and create a file myself, and besides, binary doesn't cut and paste at all well.
Off the top of my head from desk checking I can't immediately see a problem, but since I can't run the code, I could easily be missing something.
[EDIT 2]
And that file, unsurprisingly, works fine.
You haven't supplied the PostScript file that you are creating. It's rather hard for me to tell what's wrong with the PostScript you created by looking at the PDF file you started with.
You could, of course, use Ghostscript (and I see you've used it to create the PDF file) to create a PostScript file, and then look at what that contains. If you set -dCompressFonts=false then the output font won't even be compressed.
For example:
37 0 4 -52 33 -1 d1
0.01 0 0 0.01 0 0 cm
q 2900 0 0 -5100 400 -99.9998 cm
BI
/IM true
/W 29
/H 51
/BPC 1
/D[1
0]
/F[/A85
/CCF]
/DP[null
<</K -1
/Columns 29>>]
ID
-D=,M5m+t^0_>op8\HM"Du]KKrr2rthqG/5qU_ik]$f$TlUslD91qoN93j0%dckk:ld^*DV25!+
!WX>~>
EI Q
Of course you'll need to look at the prolog to see how all the procedures used there are defined, but you can do that yourself; you certainly don't need me to do it. Notice that the imagemask uses the CCITTFax and ASCII85 decode filters; it's trivial to add additional filters. Since the data is guaranteed to be 'monochrome' (it's a mask), the CCITT filter generally gives superior compression to Flate.
Note that if you are really using Ghostscript 9.05 then you should upgrade, that is 6 years old.
It might possibly help if you were to explain why you want to take an ugly, bitmapped, type 3 font from PDF and make an ugly, bitmapped type 3 PostScript font from it.
[EDIT 3]
Well, looking at your PostScript file, the definition of the glyphs does not match what you've put in your question. The actual content looks like this:
/g10135{
88 0 4 -70 82 8 setcachedevice
q
[
0.01 0 0 0.01 0 0 ] M
q
[7800 0 0 -7800 400 800 ]M
<<
/ImageType 1
/Width 78
/Height 78
/ImageMatrix [ 78 0 0 -78 0 78]
/BitsPerComponent 1
/Decode [1
0]
/DataSource ....binary data.....
<< /Predictor 15
/Columns 78
/BitsPerComponent 1>>
/FlateDecode filter def
>> imagemask
Q
Q
}bind def
You have not supplied either a file, procedure or string source as a value for the DataSource key in the dictionary. Essentially, the PostScript interpreter reads and tokenises the /DataSource key, and then proceeds to process the binary as PostScript. Unsurprisingly this causes an error 'syntaxerror in (binary token, type=156)' when processed with Ghostscript.
If you had got past that then you would have discovered that the filter operator takes a data source as well and you haven't supplied one for that either.
So you need to create a data source for your binary data. How you do that is up to you; currentfile is one way, or readstring, given that you know the string length.
So something like:
<<
/ImageType 1
/Width 29
/Height 51
/ImageMatrix [29 0 0 -51 0 51]
/BitsPerComponent 1
/Decode [1 0]
/DataSource
<length> string
currentfile exch readstring
.....binary data.....
pop %% discard the boolean readstring returns, leaving the string as the filter's source
<<
/Predictor 15
/Columns 29
>> /FlateDecode filter
>> imagemask
Obviously you'll have to fill in the string length yourself. The dictionary argument to FlateDecode looks to me like it shouldn't be needed.
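Filling in the outline needs the real length and data. The sketch below is a self-contained stand-in that can be run as is. It uses the same image dictionary layout, but supplies the data as a short string literal with an ASCIIHexDecode filter layered on it, in place of readstring and FlateDecode. The 8x8 checkerboard is invented test data, not the glyph from the question.

```postscript
%!PS
% Self-contained stand-in for the outline above (sketch).
gsave
100 100 translate 72 72 scale             % draw the mask one inch square
<<
  /ImageType 1
  /Width 8
  /Height 8
  /ImageMatrix [8 0 0 -8 0 8]             % first row of data is the top row
  /BitsPerComponent 1
  /Decode [1 0]                           % 1 bits paint, as with /D [1 0] in the PDF
  % data source first, then the filter name; filter returns a file object
  /DataSource (AA55AA55AA55AA55>) /ASCIIHexDecode filter
>> imagemask
grestore
showpage
```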
[Edit 4]
I notice that this appears to be intended for commercial use. Nothing wrong with that, but I'm not going to do all your homework for you; if it's your job, it's up to you to learn the language well enough to do the job.
I'm skipping lightly over the actual implementation details below in an attempt to outline where you are going wrong. In practice things are a little more complex, I haven't discussed how the procedure stored in the CharStrings dictionary is created, or the difference with early name binding (which is an important concept in PostScript).
Your existing code is:
/g10135{
88 0 4 -70 82 8 setcachedevice
q
[
0.01 0 0 0.01 0 0 ] M
q
[7800 0 0 -7800 400 800 ]M
<<
/ImageType 1
/Width 78
/Height 78
/ImageMatrix [ 78 0 0 -78 0 78]
/BitsPerComponent 1
/Decode [1
0]
/DataSource {417 string dup
currentfile exch readstring}
...binary data....
<< /Predictor 15
/Columns 78
>>/FlateDecode filter def
>> imagemask
Q
Q
}bind def
So, the PostScript interpreter reads those bytes one at a time, and converts them into tokens. This either results in an executable token, which is executed, or an operation on one of the stacks.
So /g10135 is terminated by the { character, because that's a reserved character. The / introduces a name object, so we end up with the name object g10135 which we push on to the operand stack. The { character introduces an executable array so we put a mark on the operand stack.
Next we read 88, terminated by a white space character. That's a numeric so we store that on the operand stack, likewise the other numbers. The operand stack now contains:
/g10135
mark
88
0
4
-70
82
8
We then read setcachedevice, which is terminated by a white space. That isn't a standard token so the interpreter starts looking through the dictionaries on the dictionary stack, looking for a definition. Since it is a standard operator, we find it in systemdict and execute it. That consumes 6 operands from the operand stack, it has no other effects (actually it does, but this is a bit special because we are executing inside a font, but we'll ignore that for now).
Next we encounter a q, again this is looked up in every dictionary on the dictionary stack to find a definition. This is defined in your own prolog as a gsave, so it takes no operands and returns no operands, it simply saves the graphics state, incrementing the save depth by 1.
I'm not going to go through the rest, as it would be tedious. Eventually, however, we reach your /DataSource. This is a name, so we push it on the operand stack. The next thing we encounter is a {, which starts a procedure definition, so we push a mark on the operand stack. We then encounter a 417, so we push that, then string, dup, currentfile, exch and readstring, so our stack looks like:
/DataSource
mark
417
string
dup
currentfile
exch
readstring
Then we get the character } That is the closing mark for an executable array, so we create the array and push it onto the operand stack:
/DataSource
{....}
Then we return to the procedure and continue executing it. The next thing we find is some binary data so we try to execute that as PostScript binary tokens. Because it isn't valid the interpreter throws an error.
Just creating an executable array is not sufficient to actually execute it. If you look at the outline code I posted at the end of edit 3 above you will note that I did not put the readstring and so on in an executable array, I simply allowed the interpreter to execute that code immediately.
By doing so, the readstring acts on currentfile (the actual PostScript program in this case) and reads bytes of data from the current point in that file. The current point will be immediately after the white space which terminates the readstring token, ie at the start of the actual binary data. The readstring operator reads enough bytes from the file to fill the string and leaves the string on the operand stack, together with a boolean (true if the string was filled) which has to be popped before the string can be used as a data source. The file pointer has moved on to the byte after the binary data, and the interpreter resumes token scanning at that point. So it then creates the FilterParams dictionary, puts the /FlateDecode name on the stack and then executes the filter operator, which consumes the name, the dictionary and the string operands and returns a file object. That file object then becomes the value associated with the DataSource key in the image dictionary which is passed to the imagemask operator.
While I haven't tested that code, it's basically correct. There are of course other ways to achieve the same aim.
That's basically about as far as I'm prepared to go with this, you need to go and look at what I've written and compare it with your own program.
Note that the simplest way to investigate this is to take the contents of the CharProc (excluding the setcachedevice) and just run that as a PostScript program.
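The readstring behaviour described above can be seen in isolation: the bytes following the token are read as data rather than scanned as PostScript. In the sketch below, a few text bytes stand in for the binary data, and Got is an illustrative name. Exactly one white space character after readstring is consumed as the token's terminator, so the data has to start right after it.

```postscript
%!PS
% The 5 bytes after "readstring " are data; scanning resumes after them.
5 string currentfile exch readstring hello
pop                    % discard the boolean (true: the string was filled)
/Got exch def          % Got now holds the string (hello)
(read: ) print Got =
```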
I have several pdf files with different sizes and different width to height ratios. Now I want to create fixed-size thumbnails from 1st page of these files.
I do this using this command:
gs -dNOPAUSE -sDEVICE=jpeg -dFirstPage=1 -dLastPage=1 -sOutputFile=d:\test\a.jpeg -dJPEGQ=100 -g509x750 -dUseCropBox=true -dPDFFitPage=true -q d:\test\a.pdf -c quit
Since the original files are of different widths and heights but thumbnails should be of the same size, there will be white margins in the right side or top of the thumbnails. But I want to have equal margins on top and bottom (or right and left) of the thumbnail (just like thumbnail displayed in windows explorer).
Is there any way to do it using GhostScript?
Yes, but not with a single switch, and not while using -dPDFFitPage.
PDFFitPage will scale the content isomorphically (ie the same in each direction), so you will either have white margins at the top or the right of the output.
In order to centre the content, you need to duplicate the functionality of PDFFitPage, and also translate the origin in either the x or y direction, by half the 'excess' in whichever direction has space left over.
You can find the code which performs the scaling in /ghostpdl/gs/Resource/Init/pdf_main.ps, look for /pdf_PDF2PS_matrix and then:
//systemdict /PDFFitPage known {
PDFDEBUG { (Fiting PDF to imageable area of the page.) = flush } if
currentpagedevice /.HWMargins get aload pop
currentpagedevice /PageSize get aload pop
% Adjust PageSize and .HWMargins for the page portrait/landscape orientation
Note that as far as I can see, the current implementation already does centre the output:
% stack: savedCTM <pdfpagedict> [Box] scale XImageable YImageable XBox YBox
3 index 2 index 6 index mul sub 2 div 3 index 2 index 7 index mul sub 2 div
PDFDEBUG { ( Centering translate by [ ) print 1 index =print (, ) print dup =print ( ]) = flush } if
translate pop pop pop pop
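The fit-and-centre arithmetic described above is simple enough to sketch on its own. FitAndCentre is a hypothetical helper, not part of Ghostscript. Given the width and height of the PDF box and of the output page (all in points), it scales isotropically by the smaller of the two ratios and translates by half the left-over space in each direction. It assumes the box starts at 0 0; a MediaBox or CropBox with a non-zero origin would need an extra translate by its negated lower-left corner.

```postscript
%!PS
% Fit a boxW x boxH box onto a pageW x pageH page, centred (sketch).
/FitAndCentre {                    % boxW boxH pageW pageH FitAndCentre --
  5 dict begin
    /pageH exch def  /pageW exch def
    /boxH exch def   /boxW exch def
    % isotropic scale: the smaller of the two ratios
    /s pageW boxW div pageH boxH div 2 copy gt { exch } if pop def
    pageW boxW s mul sub 2 div     % x offset: half the spare width
    pageH boxH s mul sub 2 div     % y offset: half the spare height
    translate
    s s scale
  end
} bind def
```

As a worked example, assume a US Letter page (612 x 792 points) and the 509x750 output from the question at 72 dpi, so that pixels and points coincide. Then 612 792 509 750 FitAndCentre scales by 509/612 (about 0.83) and shifts the content up by about 45.6 points, leaving equal margins above and below.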
I convert PDF -> many JPEG and many JPEG -> many PDF using ghostscript. I need to add watermark text on every converted JPEG (PDF) page. Is it possible using only Ghostscript and PostScript?
The only way I found:
gswin32c -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -sOutputFile=output.pdf watermark.ps input.pdf
But this puts the watermark from watermark.ps on a separate first page of output.pdf.
Can I do this on output PDF pages directly?
Can I do this on output JPEG pages directly?
<<
/BeginPage
{ gsave
/Helvetica-Bold 120 selectfont
.85 setgray 130 70 moveto 50 rotate (Sample) show
grestore
} bind
>> setpagedevice
If I use /EndPage instead of /BeginPage, it says setpagedevice is not applicable...
How can I rewrite this script to use /EndPage?
Bit too big for a comment, so I've added a new answer. The EndPage procedure (see page 441 of the PostScript Language Reference Manual) is called with two operands on the stack: a count of the pages emitted so far and a reason code.
You can use the count of pages to do interesting things like duplexing, or only marking even pages or whatever, but I assume in this case you don't want it, so you just 'pop' it from the stack.
The reason code tells you why the page is being emitted, again you probably don't care so you just pop the value.
Finally the EndPage must return a boolean value to the interpreter saying whether or not to transmit the page (this allows you to do other interesting things, like only printing the first 10 pages and so on).
So you need to initially remove two values, execute your code and return a boolean. Pretty trivial:
<<
/EndPage
{ pop pop %% discard the two operands: page count and reason code
gsave
/Helvetica-Bold 120 selectfont
.85 setgray 130 70 moveto 50 rotate (Sample) show
grestore
true %% transmit the page, set to false to not transmit the page
} bind
>> setpagedevice
The accepted answer was inserting extra pages for me, blank apart from the watermark. If you run into this, try adding the 2 eq test as shown here. Reason code 2 means the device is being deactivated, so no page should be emitted:
<<
/EndPage
{
2 eq { pop false } %% reason 2 = device deactivation: discard the count, emit nothing
{
pop %% discard the page count
gsave
/Helvetica-Bold 120 selectfont
.85 setgray 130 70 moveto 50 rotate (Sample) show
grestore
true
} ifelse
} bind
>> setpagedevice
I found the following site, which pointed me in the right direction:
http://habjan.blogspot.com/2013/10/how-to-programmatically-add-watermark.html
Here's the calling syntax, where the above file is saved as watermark.ps and gswin32c is the Ghostscript executable:
gswin32c -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=watermarked.pdf watermark.ps original.pdf
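The page count that the EndPage procedures above throw away can be used for the "only marking even pages" kind of trick mentioned earlier. A minimal sketch reusing the Sample watermark; it assumes, as the PostScript Language Reference Manual describes, that the count passed to EndPage excludes the page being ended, so the first page sees 0.

```postscript
%!PS
% Sketch: watermark only even-numbered pages, using EndPage's page count.
% Assumes the count excludes the page being ended, so page N sees N-1.
<<
  /EndPage {                  % operands: showpage count, reason code
    2 eq { pop false }        % reason 2 = device deactivation: emit nothing
    {
      2 mod 1 eq {            % odd count => even page number
        gsave
          /Helvetica-Bold 120 selectfont
          .85 setgray 130 70 moveto 50 rotate (Sample) show
        grestore
      } if
      true                    % transmit every page, marked or not
    } ifelse
  } bind
>> setpagedevice
```

It is run ahead of the input file exactly like watermark.ps above.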
I don't know what you mean by 'directly'. It's possible, as you have found, to have a PostScript interpreter do many kinds of things on a per-page basis. PostScript is a programming language, after all.
I would suggest that the /BeginPage and/or /EndPage procedures in the page device dictionary would be the place to start. These allow you to execute arbitrary PostScript at the start or end of every page.
If you define a /BeginPage procedure then it will be executed before any marking operations from the input program; if you define an /EndPage procedure then it will be executed after the marking operations from the input program (on a page-by-page basis).
This allows you to have your own marks lie 'under' or 'over' the marks from the program.
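To illustrate the 'under' versus 'over' distinction, here is a minimal sketch that installs both procedures at once: a grey watermark from BeginPage that lies beneath the page content, and a small label from EndPage that lies on top. The label text and its position are arbitrary example values, and Helvetica and Helvetica-Bold are assumed to resolve to Ghostscript's substitute fonts.

```postscript
%!PS
% Sketch: marks made in BeginPage lie under the page content,
% marks made in EndPage lie over it.
<<
  /BeginPage {                % operand: showpage count (unused)
    pop
    gsave
      /Helvetica-Bold 120 selectfont
      .85 setgray 130 70 moveto 50 rotate (Sample) show   % drawn first: underneath
    grestore
  } bind
  /EndPage {                  % operands: showpage count, reason code
    2 eq { pop false }        % reason 2 = device deactivation: emit nothing
    {
      pop                     % discard the showpage count
      gsave
        /Helvetica 10 selectfont
        0 setgray 36 20 moveto (Draft copy) show            % drawn last: on top
      grestore
      true                    % transmit the page
    } ifelse
  } bind
>> setpagedevice
```

Saved as, say, marks.ps (a hypothetical name), it is run ahead of the input: gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf marks.ps input.pdf. Because the marks are made by the interpreter rather than the output device, swapping in -sDEVICE=jpeg -sOutputFile=page%03d.jpg should mark the JPEG pages the same way.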
I use Ghostscript to convert PDF documents to PCL for printing. I now have the additional requirement that all pages must be rotated to portrait before printing. I have found a way to do this with Ghostscript using the following command and PostScript file.
"C:\Program Files (x86)\gs\bin\gswin32c.exe" "-dNOPAUSE" "-dNOPROMPT" "-dBATCH" "-sDEVICE=pxlmono" "-Ic:\Program Files (x86)\gs\fonts\;c:\Program Files (x86)\gs\lib\;c:\Program Files (x86)\gs\lib\;" "-r300" "-sOutputFile=C:\EXPORTFILE_e542e04f-5e84-4c8e-9b41-55480cd5ec52.cache" "rotate612x792.ps" "C:\EXPORTFILE_3a5de9da-d9ca-4562-8cb6-10fb8715385a.cache"
Contents of rotate612x792.ps
%! Rotate Pages
<< /Policies << /PageSize 5 >>
/PageSize [612 792]
/InputAttributes currentpagedevice
/InputAttributes get mark exch {1 index /Priority eq not {pop << /PageSize [612 792] >>} if } forall >>
>> setpagedevice
The problem is that this function replaces all page sizes with Letter size. My documents are sometimes Legal or A4. I have tried to modify this function to replace landscape sizes with their portrait counterparts, but have not been able to produce working PostScript. I need to be pointed in the right direction to produce the PostScript equivalent of the following pseudocode.
for(each page)
{
if(PageSize == [792 612])
PageSize = [612 792];
}
I am aware that there are non-Ghostscript ways of rotating pages, but if I can get this to work it would fit nicely into my process and would not reduce performance.
Here is a sample of one of my PDF files:
Sample1.pdf
PostScript is a programming language, so you can do a lot with it. What you need to do here is redefine the action of requesting page sizes. The page size and the page content are separate in PostScript, so you need to do 2 things:
1) Alter the media request from landscape to portrait
2) rotate the content of the page
The simplest way to do this is to redefine the 'setpagedevice' operator. Here's an example:
/oldsetpagedevice /setpagedevice load def %% copy original definition
/setpagedevice {
dup /PageSize known { %% Do we have a page size redefinition ?
dup /PageSize get %% get the array if so
aload pop %% get elements remove array copy
gt { %% is width > height ?
dup /PageSize get aload %% get size array, put content on stack
3 1 roll %% roll stack to put array at back
exch %% swap width and height
3 -1 roll %% bring array back to front of stack
astore %% put swapped elements into array
/PageSize exch %% put key on stack and swap with value
2 index %% copy the original dict
3 1 roll %% move dict to back of stack
put %% put new page size array in dict
90 rotate %% rotate content 90 degrees anti-clockwise
} if
} if
oldsetpagedevice %% call the original definition
} bind def
This checks each configuration change to see whether the page size is being altered. If it is, it gets the new size and checks whether width > height (a simple definition of landscape). If so, it alters the request by swapping the width and height, and then rotates the page content by 90 degrees.
You can use this with Ghostscript by putting the above content in a file (e.g. prolog.ps) and then running that file before your own job:
gs ...... prolog.ps job.ps
I have tried this, but not with a landscape file as I didn't have one to hand. Note also that it is possible to construct a PostScript program which will defeat this.
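Stripped of the setpagedevice plumbing, the width/height swap from the question's pseudocode is only a few operators. A minimal sketch; /portraitsize is a hypothetical helper name.

```postscript
%!PS
% Sketch: return a [w h] page size with the smaller dimension first,
% i.e. swap landscape sizes to portrait and leave portrait ones alone.
% /portraitsize is a hypothetical helper name.
/portraitsize {               % [w h]  portraitsize  [w' h']
  aload pop                   % w h
  2 copy gt { exch } if       % if w > h, swap to h w
  2 array astore              % pack into a new two-element array
} bind def

[792 612] portraitsize ==     % prints [612 792]
[612 792] portraitsize ==     % prints [612 792]
[842 595] portraitsize ==     % prints [595 842]
```

It can be checked with gs -q -dNODISPLAY -dBATCH -dNOPAUSE on the file. Returning a new array leaves the caller's original untouched, unlike the astore in the answer above, which rewrites the request dictionary's own array in place.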
I found a workable solution. It is not as versatile as I hoped, but it hits all my requirements.
The following PostScript will rotate A4, Letter and Legal documents to portrait. To handle other page sizes, adjust the min and max sizes.
%!PS
% Sequence to set up for a single page size, auto fit all pages.
% Autorotate A4 Letter and Legal page sizes to Portrait
<< /Policies << /PageSize 3 >>
/InputAttributes currentpagedevice /InputAttributes get %current dict
dup { pop 1 index exch undef } forall % remove all page sizes
dup 0 << /PageSize [ 595 0 612 99999 ] >> put % [ min-w min-h max-w max-h ]
>> setpagedevice
This PostScript will rotate A4, Letter and Legal documents to landscape. The only difference is the min/max page size values.
%!PS
% Sequence to set up for a single page size, auto fit all pages.
% Autorotate A4 Letter and Legal page sizes to Landscape
<< /Policies << /PageSize 3 >>
/InputAttributes currentpagedevice /InputAttributes get %current dict
dup { pop 1 index exch undef } forall % remove all page sizes
dup 0 << /PageSize [ 0 595 99999 612 ] >> put % [ min-w min-h max-w max-h ]
>> setpagedevice
This solution is based on the auto-rotate.ps file I found in the source code of the HylaFAX project, which appears to be licensed under BSD.
Although Zig158's answer works well, a new option has appeared since then:
-dFIXEDMEDIA
which works for any paper size, not only A4.
See Ghostscript bug tracker for additional details.