How to split single pdf page in half vertically using ghostscript

How to split single pdf page in half vertically using ghostscript - pdf

I have a pdf in landscape orientation and is a "Page Spread".
I need to split the page in half vertically in the middle.
I used this setting to cut the pdf in half: this one gets the left part of the page
"C:\Program Files (x86)\gs\gs9.10\bin\gswin32c.exe" -o output.pdf
-sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dSubsetFonts=true -dEmbedAllFonts=true -g 750x1085 -c
"<</BeginPage{0.95 0.96 scale 20 22 translate}>> setpagedevice" "<</PageOffset [0 0]>> setpagedevice" -f input.pdf
The above command works fine. My problem now is I need to set the Mediabox, Cropbox, Bleedbox, Trimbox, and artbox similar to the size of the splitted page. In the instance above; it should be 750x1085.
What should be the correct command/settings to run on GS so that the Mediabox, Cropbox, Bleedbox, Trimbox and artbox has the same sizes?
UPDATE
This is a sample of the PDF file I'm trying to cut in half:
PDF FILE TO SPLIT
I am now using /PAGE pdfmark and this is the command I'm trying to use:
"C:\Program Files (x86)\gs\gs9.10\bin\gswin32c.exe" -o output.pdf -sDEVICE=pdfwrite -dNOPAUSE -dBATCH
-dSAFER -dUseCropbox -dSubsetFonts=true -dEmbedAllFonts=true -g7500x10850
-c "[/CropBox [0 0 750 1085] /PAGES pdfmark"
"[/MediaBox [0 0 750 1085] /PAGES pdfmark"
"[/TrimBox [0 0 750 1085] /PAGES pdfmark"
"[/BleedBox [0 0 750 1085] /PAGES pdfmark"
"[/ArtBox [0 0 750 1085] /PAGES pdfmark"
"<</BeginPage{0.95 0.96 scale 20 22 translate}>> setpagedevice"
"<</PageOffset [0 0]>> setpagedevice"
-f input.pdf
I still can't achieve setting the Cropbox, MediaBox, TrimBox, BleedBox, ArtBox with the same size.
What should be the correct settings for the command?

Moving to an answer here, comments are too small.
Your basic problem is that the original PDF file already contains all of the Box parameters you are trying to modify. The Ghostscript PDF interpreter will, for certain classes of output device, preserve those boxes by writing its own version of a pdfmark to the device. Because this happens after your pdfmark, it replaces yours.
There is, unfortunately, no mechanism in place to disable this. Additionally, since you are using Windows, you have the resources built into the ROM file system, so you can't readily modify them.
So at this point there isn't really anything you can do about this. If you are comfortable downloading the source and meddling with some PostScript let me know and I think I can give you a solution (you don't need to rebuild GS, but the PostScript resources aren't available separately).
EDIT to include some suggested work-arounds
In pdf_main.ps:
% Test whether the current output device handles pdfmark.
/.writepdfmarkdict 1 dict dup /pdfmark //null put readonly def
/.writepdfmarks { % - .writepdfmarks <bool>
currentdevice //.writepdfmarkdict .getdeviceparams
mark eq { //false } { pop pop //true } ifelse
systemdict /DOPDFMARKS known or
} bind def
You could alter /.writepdmarks to be:
/.writepdfmarks { % - .writepdfmarks <bool>
false
} bind def
But note that a number of other things will stop being emitted in the output PDF file.
Instead you could look at the code in pdf_showpage_finish:
/pdfshowpage_finish { % <pagedict> pdfshowpage_finish -
save /PDFSave exch store
...
...
.writepdfmarks {
% Copy the boxes.
{ /CropBox /BleedBox /TrimBox /ArtBox } {
2 copy pget {
% .pdfshowpage_Install transforms the user space do same here with the boxes
oforce_elems
2 { Page pdf_cached_PDF2PS_matrix transform 4 2 roll } repeat
normrect_elems 4 index 5 1 roll fix_empty_rect_elems 4 array astore
mark 3 1 roll {/PAGE pdfmark} stopped {cleartomark} if
} {
pop
} ifelse
} forall
You can modify the line '{ /CropBox /BleedBox /TrimBox /ArtBox }', any Boxes you don't want to preserve you can remove from that line.
If you want to prevent any of them overriding your specifications, then remove the lines from '% Copy the boxes' up to '% Copy annotations and links'
Please note this only works on a system where the resources are not built into a ROM file system, or where the resources are at least available on disk, and the path to the Resource files is specified using the -I switch.

Related

How do I crop pages 3&4 in a multipage pdf using ghostscript

I would like to crop just some pages in a multipage pdf keeping all pages, some cropped, others not. I tried the following but it "deletes" the non cropped pages...
gswin64.exe -o cropped.pdf -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=4 -c "[/CropBox [24 72 1559 1794]" -c " /PAGES pdfmark" -f input.pdf
I've seen the posts on different cropping on odd and even pages, but I could not figure out how to apply this to a certain page in a multipage document.
gswin64.exe -o cropped.pdf -sDEVICE=pdfwrite -c "<</EndPage {0 eq {2 mod 0 eq {[/CropBox [0 0 1612 1792] /PAGE pdfmark true}{[/CropBox [500 500 612 792] /PAGE pdfmark true} ifelse}{false}ifelse}>> setpagedevice" -f input.pdf
This does crop all pages according to the settings of the second CropBox. If anybody is wondering about the large margins... I apply this do large drawings.
I have also tried to substitute some operators to only apply the crop to a certain page number: "sub 4" instead of "2 mod" was one attempt to attain the " 0 eq" condition only when the current page number reaches 4.

OK first things first, Ghostscript and the pdfwrite device do not 'modify' an input PDF file. For regular readers; standard lecture here, if you've read it before you can skip the following paragraph.
The way this works is that the input file is completely interpreted into a sequence of graphics primitives which are sent to the device. Rendering devices then call the graphics library to render the primitives to a bitmap, which is then output. High level (vector) devices, such as pdfwrite, translate the primitives into equivalent operations in some high level page description language, and emit that.
So, when you select -dFirstPage and -dLastPage, those are only pages for the input file you are choosing to process. So pdfwrite isn't 'deleting' your pages, you never sent them to the device in the first place.
Now, Ghostscript is a PostScript interpreter, and therefore its action can be affected by writing PostScript programs. In your case you probably want to actually process all the pages (so drop -dFirstPage and -dLastPage), but only write the pdfmark on selected pages.
The way to do this is via a BeginPage or EndPage procedure. If you search here or in the PostScript tag you'll find a number of examples. Fundamentally both procedures are called with a reason code and a count of pages so far.
From memory you will want to check the reason code is 2. If it is, then you want to check the count of pages, and it it matches your criteria (in the case here, count is 3 or 4), execute the /PAGE pdfmark. In any case you want to return 'true' so that the page is emitted.
[EDIT added here]
Hmm, OK I see the problem. What's happening is that the PDF interpreter is calling 'setpagedevice' to set the page size for each page, in case the page size has altered. The problem is that this resets the page count back to 0 each time.
Now, I wouldn't normally suggest the following, because it relies on some undocumented aspects of Ghostscript's PDF interpreter. However, I happen to know that the PDF interpreter tracks the page number internally using a named object called /Page#.
So, if I take the code you wrote, and modify it slightly:
<<
/EndPage {
0 eq {
pop /Page# where {
/Page# get
3 eq {
(page 3) == flush
[/CropBox [0 0 1612 1792] /PAGE pdfmark
true
}
{
(not page 3) == flush
[/CropBox [500 500 612 792] /PAGE pdfmark
true
} ifelse
}{
true
} ifelse
}
{
false
}
ifelse
}
>> setpagedevice
Couple of things to note; there's some debug in there, the lines with '== flush' print out some stuff on the back channel so you know how each page is being handled. If /Page# isn't defined, then the code simply leaves everything alone, this is just some basic safety-first stuff.
Rather than type all this on the command line (which also loses indenting and is hard to read) I stuck it in a file, called test.ps, then invoked GS as:
gswin32c -sDEVICE=pdfwrite -sOutputFile=out.pdf test.ps input.pdf
Its not the neatest solution in the world, but it works for me.

-dSubsetFonts=false option stops showing TrueType fonts /glyphshow

I have a PostScript that uses TrueType fonts. However, I want to include rarly used characters like registration marks (®) and right/left single/double quotes (’, “ etc).
So I used glyphshow and called the names of the glyphs
%!
<< /PageSize [419.528 595.276] >> setpagedevice
/DeviceRGB setcolorspace
% Page 1
%
% Set the Original to be the top left
%
0 595.276 translate
1 -1 scale
gsave
%
% Save this state before moving x y specifically for images
%
1 -1 scale
/BauerBodoniBT-Roman findfont 30 scalefont setfont % set the pt size %-3.792 - 16
1 0 0 setrgbcolor
10 -40 moveto /quoteright glyphshow
10 -80 moveto /registered glyphshow
/Museo-700 findfont 30 scalefont setfont % set the pt size %-3.792 - 16
1 0 1 setrgbcolor
10 -120 moveto /quoteright glyphshow
10 -180 moveto /registered glyphshow
showpage
When I execute this PostScript using the following command (due to my requirement for the pdf to be editable in Illustrator i.e. can be opened with all fonts intact) the PDF shows nothing but seems to contain the glyphs if you copy and paste from the pdf into a text file.
gs -o gly_subsetfalse.pdf -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dSubsetFonts=false -dPDFSETTINGS=/prepress textglyph.ps
However, this above command now causes issues with pulling it into Illustrator. The rare glyphs become unrecongisble (', Æ). Normal characters and regular glyphs seem fine i.e. /a glyphshow and just show text appear in pdf and illustrator.
So, it seems that having the SubsetFonts option as True shows rare glyphs but this stops me from pulling the PDF into Illustrator.
Attached are the TrueType Fonts for reference and two PDFs (one with subsetfont option being truw and the other not - default).
I have also tried the following command with the same ill results (no visible glyphs appearing on the PDF and Illustrator incorrectly shows the glyphs).
gs -o gly_subsetfalse_embedallfonts.pdf -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dPDFSETTINGS=/prepress -dSubsetFonts=false -dEmbedAllFonts=true textglyph.ps
But with this command I also get a PreFlight error from the PDF if that helps:
"Glyph width info in PDF does not match width info in embedded font"
Attached are all the files spoke about above - click here.
Encoding the font also does not produce good results.
I have encoded a TrueType(and a Type42) font in my PostScript and listed a few new characters to glyphshow.
Results are:
Command 1:
gs -o encode_ttf_subset_false.pdf -sDEVICE=pdfwrite -dSubsetFonts=false encode.ps
Results 1:
Open the PDF in Acrobat does NOT display any glyphshow characters.
Command 2:
gs -o encode_ttf_subset_true.pdf -sDEVICE=pdfwrite encode.ps
Results 2:
Open the PDF in Acrobat and it DOES show the glyphshow characters but not in Illustrator.
Command 3:
gs -o encode_ttf_subset_false_embedtrue.pdf -sDEVICE=pdfwrite -dSubsetFonts=false -dEmbedAllFonts=true encode.ps
Results 3:
Same as Result 1 (glyphshow characters do not appear).
Below is my new PostScript with Encoded TTF and Type42 (I've also included them in my file further below).
Is this a bug at least with Ghostscript?
/museobold findfont dup %%%%% This is the Type42 Font
length dict
copy begin
/Encoding Encoding 256 array copy def
Encoding 1 /oacute put
Encoding 2 /aacute put
Encoding 3 /eacute put
Encoding 4 /questiondown put
Encoding 5 /quotedblleft put
Encoding 6 /quoteright put
Encoding 7 /quotedblbase put
/museobold-Esp currentdict definefont pop
end
/museobold-Esp 18 selectfont
72 600 moveto
(\005D\001lnde est\002 el camino a San Jos\003? More characters \006 and \007) show
%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%
/BauerBodoniBT-Roman findfont dup
length dict
copy begin
/Encoding Encoding 256 array copy def
Encoding 1 /oacute put
Encoding 2 /aacute put
Encoding 3 /eacute put
Encoding 4 /questiondown put
Encoding 5 /quotedblleft put
Encoding 6 /quoteright put
Encoding 7 /quotedblbase put
/BauerBodoniBT-Roman-Esp currentdict definefont pop
end
/BauerBodoniBT-Roman-Esp 18 selectfont
72 630 moveto
(\005D\001lnde est\002 el camino a San Jos\003? More characters \006 and \007) show
showpage
Click here to downloading the following: BBBTRom.ttf (TrueType font); 3 pdfs (results 1, 2 and 3); museobold (TrueType font converted to Type42 using ttftotype42) and encode.ps.

This is back to your problem with using Illustrator as a general PDF application. It can't do that. Now as you note you've found ways round that in the past, this time I believe you are out of luck.
The PostScript glypshow operator doesn't have a PDF equivalent. Also, because of the way glyphshow works, we cannot simply use any existing font instance to store the glyph (because the glyph may not be, and probably isn't, present in the Encoding). As a result pdfwrite does the only thing it can. It makes a new font which consists only of the glyphs used by glyphshow from the specific original font's CharStrings.
Because we don;t have an Encoding to work from we have to use a custom (suymbolic) Encoding (because fonts in a PDF file have to have an Encoding) which from your previous experience I suspect means that Illustrator is unable to read the font we embed.
Using glyphshow with pdfwrite is something I would not encourage.
Now having said that, there should not be a problem with the PDF file when SubsetFonts is true, though I do have an open bug report which sounds similar. You haven't actually said which version of Ghostscript you are using, so I can't be sure if its the same problem. (nor do I have the same fonts etc). Note that this is not (I believe) related to your problem with Illustrator, that's caused by your use of glyphshow and some Illustrator limitation.
As a general rule I would not use -dPDFSETTINGS, certainly not while trying to debug a problem, nor would I limit the output to PDF 1.3.

Is it possible in Ghostscript to add watermark to every page in PDF

I convert PDF -> many JPEG and many JPEG -> many PDF using ghostscript. I need to add watermark text on every converted JPEG (PDF) page. Is it possible using only Ghostscript and PostScript?
The only way I found:
gswin32c -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -sOutputFile=output.pdf watermark.ps input.pdf
But this will insert watermark.ps watermark on first separate page in output.pdf.
Can I do this on output PDF pages directly?
Can I do this on output JPEG pages directly?
<<
/BeginPage
{ gsave
/Helvetica_Bold 120 selectfont
.85 setgray 130 70 moveto 50 rotate (Sample) show
grestore
} bind
>> setpagedevice
If I use /EndPage instead of /BeginPage - it says setpagedevice is not applicable...
How to remake this script for /EndPage?

Bit too big for a comment, so I've added a new answer. The EndPage procedure (see page 441 of the PostScript Language Reference Manual) takes two additional parameters on the stack, a count of pages emitted so far, and a reason code.
You can use the count of pages to do interesting things like duplexing, or only marking even pages or whatever, but I assume in this case you don't want it, so you just 'pop' it from the stack.
The reason code tells you why the page is being emitted, again you probably don't care so you just pop the value.
Finally the EndPage must return a boolean value to the interpreter saying whether or not to transmit the page (this allows you to do other interesting things, like only printing the first 10 pages and so on).
So you need to initially remove two values, execute your code and return a boolean. Pretty trivial:
<<
/EndPage
{ pop pop %% *BEFORE* gsave as that puts a gsave object on the stack
gsave
/Helvetica_Bold 120 selectfont
.85 setgray 130 70 moveto 50 rotate (Sample) show
grestore
true %% transmit the page, set to false to not transmit the page
} bind
>> setpagedevice

The accepted answer was inserting pages for me. The pages were blank aside from the watermark. If you run into this try adding the 2eq bit here
<<
/EndPage
{
2 eq { pop false }
{
gsave
/Helvetica_Bold 120 selectfont
.85 setgray 130 70 moveto 50 rotate (Sample) show
grestore
true
} ifelse
} bind
>> setpagedevice
I found the following site that pointed me in the correct direction
http://habjan.blogspot.com/2013/10/how-to-programmatically-add-watermark.html
Here's the calling syntax where the above file is saved as watermark.ps and gswin32c references the ghostscript exe
gswin32c -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=watermarked.pdf watermark.ps original.pdf

I don't know what you mean by 'directly'. Its possible, as you have found, to have a PostScript interpreter do many kinds of things on a per-page basis. PostScript is a programming language after all.
I would suggest that the /BeginPage and/or /EndPage procedures in the page device dictionary would be the place to start. These allow you to execute arbitrary PostScript at the start or end of every page.
If you define a /BeginPage procedure then it will be executed before any marking operations from the input program, if you define a /EndPage then it will be executed after the marking operations from the input program (on a page by page basis(.
This allows you to have your own marks lie 'under' or 'over' the marks from the program.

Convert subset of postscript file to pdf documents

I have a system that generates large quantities of PostScript files that each contain multiple, multi-page documents. I want to write a script that takes these large PostScript documents and outputs multiple PDF documents from each.
For example one postscript file contains 200 letters to customers, each of which is 10 pages long. This postscript file contains 2000 pages. I want to output from this 1 ps document, 200x 10 page PDFs, one for each customer.
I'm thinking GhostScript is the way to go for this level of document manipulation but I'm not sure the best way to go - Is there a function in GhostScript to take 'pages 1-10' of the input ps file? Do I have to output the entire ps file as 2000 separate ps files (1 per page) then combine them back together again?
Or is there a much simpler way of acheiving my goal with something other than GhostScript?
Many Thanks,
Ben

Technically this will be possible in the next release of Ghostscript, or using the HEAD code in the Git repository. It is now possible to switch devices when using pdfwrite which will cause the device to close and complete the current PDF file. Switching back again will start a new one.
Combine this with a BeginPage and/or EndPage procedure in the page device dictionary, and you should be able to do something like what you want.
Caveat; I haven't tried any of this, and it will take some PostScript programming to get it to work.
Because of the nature of PostScript, there is no way to extract the 'N'th page from a file, so there is no way to specify a range of pages.
As lsemi suggests you could first convert to one large PDF file and then extract the ranges you want. Ghostscript is able to use the FirstPage and LastPage switches to do this (unlike PostScript, it is possible to extract a specific page from a PDF file).

Well, you might first make the PS into a PDF object collection (or directly generate a PDF from GhostScript by printing to the PDFWriter device), and then "cut" from the big PDF using pdftk, which would be quite fast.

Create the complete PDF file first with the help of Ghostscript:
gs \
-o 2000p.pdf \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/prepress \
2000p.ps
Use pdftk to extract PDF files with 10 pages each:
for i in $(seq 0 10 199); do \
export start=$(( ${i} * 1 + 1 )); \
export end=$(( ${start} + 9 )); \
pdftk \
2000p.pdf \
cat ${start}-${end} \
output pages---${start}..${end}.pdf; \
done
You can have Ghostscript generate a 2000page sample+test PDF for you by first creating a sample PostScript file named '2000p.ps' with these contents:
%!PS
/H1 {/Helvetica findfont 48 scalefont setfont .2 .2 1 setrgbcolor} def
/pageframe {1 0 0 setrgbcolor 2 setlinewidth 10 10 575 822 rectstroke} def
/gopageno {H1 300 700 moveto } def
1 1 2000 {pageframe gopageno
4 string cvs
dup stringwidth pop
-1 mul 0 rmoveto
show
showpage} for
and then run this command:
gs -o 2000p.pdf -sDEVICE=pdfwrite -g5950x8420 2000p.ps

combining pdf files with ghostscript, how to include original file names?

I have about 250 single-page pdf files that have names like:
file_1_100.pdf,
file_1_200.pdf,
file_1_300.pdf,
file_2_100.pdf,
file_2_200.pdf,
file_2_300.pdf,
file_3_100.pdf,
file_3_200.pdf,
file_3_300.pdf
...etc
I am using the following command to combine them to a single pdf file:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file*pdf
It works perfectly, combining them in the correct order. However, when I am looking at finished.pdf, I want to have a reference that tells me the orignal filename for each page.
Does anyone have any suggestions? Can I add page names referencing the files or something?

It is fairly easy to put the file names into a list of Bookmarks which many PDF viewers can display.
This is done with PostScript using the 'pdfmark' distiller operator. For example, use the following
gs -sDEVICE=pdfwrite -o finished.pdf control.ps
where control.ps contains PS commands to print the pages and output the bookmark (/OUT) pdfmarks:
(examples/tiger.eps) run [ /Page 1 /Title (tiger.eps) /OUT pdfmark
(examples/colorcir.ps) run [ /Page 2 /Title (colorcir.ps) /OUT pdfmark
Note that you can also perform the enumeration using PS to automate the entire process:
/PN 1 def
(file*.pdf) {
/FN exch def
FN run
[ /Page PN /Title FN /OUT pdfmark % do the file and bookmark it by filename
/PN PN 1 add def % bump the page number
} 1000 string filenameforall
NB that the order of filenameforall enumeration is not specified, so you may want to sort the list
to control the order, using the Ghostscript extension .sort ( array lt .sort lt ).
Also after thinking about this, I also realized that if an imput file has more than one page, there is a better way to set the bookmark to the correct page number using the 'PageCount' device property.
[
(file*.pdf) { dup length string copy } 1000 string filenameforall
] % create array of filenames
{ lt } .sort % sort in increasing alphabetic order
/PN 1 def
{ /FN exch def
/PN currentpagedevice /PageCount get 1 add def % get current page count done (next is one greater)
FN run [ /Page PN /Title FN /OUT pdfmark % do the file and bookmark it by filename
} forall
The above creates an array of strings (copying them to unique string objects since filenameforall
just overwrites the string it is given), then sorts it, and finally processes the array of strings
using the forall operator. By using the PageCount device property to get the count of pages already produced, the page number (PN) for the bookmark will be correct. I have tested this snippet as 'control.ps'.

To stamp the filename on each page you can use a combination of ghostscript and pdftk. Taken from https://superuser.com/questions/171790/print-pdf-file-with-file-path-in-footer
gs \
-o outdir\footer.pdf \
-sDEVICE=pdfwrite \
-c "5 5 moveto /Helvetica findfont 9 scalefont setfont (foobar-filename.pdf) show"
pdftk \
foobar-filename.pdf \
stamp outdir\footer.pdf \
output outdir\merged_foobar-filename.pdf

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas