Ghostscript: PDF total pages - api

I'm using Ghostscript library API (wrapping from C#) to print PDF documents from my application.
With the '-dFirstPage' and '-dLastPage' parameters I'm able to select a range of pages to be printed, but how about the total number of a PDF's pages?
It is not very nice to allow a user to select a page interval from 2 to 10 when, let me say, the PDF document has only 4 pages.
Consider that I'm using Ghostscript library through the gsapi_init_with_args API library call.

Ghostscript can count and display the number of pages of a PDF on stdout. The commandline is
gswin32c ^
-q ^
-dNODISPLAY ^
-c "(input.pdf) (r) file runpdfbegin pdfpagecount = quit"
Here all the -c "..." stuff is a PostScript commandline snippet (using a few GS internal command extensions). And input.pdf is the PDF filename (could also be a full path like (c:/path/to/my.pdf)).
However, a better and faster tool for this kind of job would be to use pdfinfo (part of the XPDF-utilities, also available on Windows).
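As a rough illustration, the page-count command above could be wrapped like this so that a requested page interval can be checked against the real page count. This is only a sketch: it runs the console executable through Python's subprocess module rather than through gsapi_init_with_args, the function names are made up for the example, and it assumes the file name contains no parentheses.

```python
import subprocess


def pagecount_args(pdf_path, gs_exe="gswin32c"):
    """Build the argument list for the page-count command shown above."""
    # PostScript strings treat "\" as an escape character, so use "/" instead.
    ps_name = pdf_path.replace("\\", "/")
    return [
        gs_exe, "-q", "-dNODISPLAY",
        "-c", f"({ps_name}) (r) file runpdfbegin pdfpagecount = quit",
    ]


def parse_pagecount(stdout_text):
    """Return the number printed by `pdfpagecount =` (last non-empty line)."""
    lines = [ln.strip() for ln in stdout_text.splitlines() if ln.strip()]
    if not lines or not lines[-1].isdigit():
        raise ValueError(f"unexpected Ghostscript output: {stdout_text!r}")
    return int(lines[-1])


def count_pages(pdf_path, gs_exe="gswin32c"):
    """Run Ghostscript and return the number of pages in pdf_path."""
    result = subprocess.run(pagecount_args(pdf_path, gs_exe),
                            capture_output=True, text=True, check=True)
    return parse_pagecount(result.stdout)


def clamp_range(first, last, total):
    """Clamp a requested page interval to the pages that actually exist."""
    first = max(1, min(first, total))
    last = max(first, min(last, total))
    return first, last
```

For the 4-page document from the question, clamp_range(2, 10, 4) gives (2, 4), which could then be passed on as -dFirstPage and -dLastPage.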
Update:
@ebyrob wants to know if one can modify my example command line so that it also displays the PDF in a single operation. Try this:
gswin32c ^
-q ^
-c "(input.pdf) (r) file runpdfbegin pdfpagecount =" ^
-f input.pdf
Well, it's not a single operation -- it's just two different operations in a single commandline.

For people having issues in ghostscript >9.50 add --permit-file-read=input.pdf

I tried to make this script:
gswin32c ^
-q ^
-c "(input.pdf) (r) file runpdfbegin pdfpagecount =" ^
-f input.pdf
work in a C#-wrapped solution and kept getting the error "/undefinedfilename". If this happens, make sure your file path uses forward slashes "/" as the directory separator and not backslashes "\". I know Kurt Pfeifle already wrote this, but I simply overlooked it.
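A small sketch of how a wrapper might prepare the file name before placing it inside the -c snippet. The helper names are hypothetical. Doubling the backslashes is the other form that PostScript strings accept, and the escaping of parentheses is an extra precaution that is not part of the commands in this thread. The --permit-file-read argument is given the unmodified native path here, which is an assumption of the sketch.

```python
def ps_string(path, style="forward"):
    """Turn a Windows path into a PostScript string literal such as (C:/a/b.pdf)."""
    if style == "forward":
        # Replace every backslash with a forward slash.
        body = path.replace("\\", "/")
    elif style == "double":
        # Keep backslashes but double them, since "\" is the PostScript escape.
        body = path.replace("\\", "\\\\")
    else:
        raise ValueError("style must be 'forward' or 'double'")
    # Escape parentheses so they cannot end the PostScript string early.
    body = body.replace("(", "\\(").replace(")", "\\)")
    return "(" + body + ")"


def safer_pagecount_args(path, gs_exe="gswin64c"):
    """Page-count arguments including --permit-file-read for newer versions."""
    return [
        gs_exe, "-q", "-dNODISPLAY",
        f"--permit-file-read={path}",
        "-c", f"{ps_string(path)} (r) file runpdfbegin pdfpagecount = quit",
    ]
```

Passing the result as an argument list (for example to subprocess.run) avoids an extra layer of shell quoting around the -c snippet.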

In Windows systems:
"path to gs exec" -q -dNODISPLAY -dNOSAFER --permit-file-read="path to your file" -c "(""path to your file"") (r) file runpdfbegin pdfpagecount = quit"
Remarks:
Just change where is 'path to...' with your path, leave the rest as is.
On the -c path you must use doubled backslashes or Unix-style forward slashes. Ex: C:\\yourfile.pdf (good), C:/yourfile.pdf (good), C:\yourfile.pdf (bad).
Example:
path: C:\Temp\Some Folder\myFile.pdf
gs path: C:\Temp\Some Folder\gs\bin\gswin64c.exe
path -c 1: C:\\Temp\\Some Folder\\myFile.pdf
path -c 2: C:/Temp/Some Folder/myFile.pdf
Commands:
"C:\Temp\Some Folder\gs\bin\gswin64c.exe" -q -dNODISPLAY -dNOSAFER --permit-file-read="C:\Temp\Some Folder\myFile.pdf" -c "(""C:\\Temp\\Some Folder\\myFile.pdf"") (r) file runpdfbegin pdfpagecount = quit"
"C:\Temp\Some Folder\gs\bin\gswin64c.exe" -q -dNODISPLAY -dNOSAFER --permit-file-read="C:\Temp\Some Folder\myFile.pdf" -c "(""C:/Temp/Some Folder/myFile.pdf"") (r) file runpdfbegin pdfpagecount = quit"

To sum up some of the separate comments above for Windows users: to avoid having to switch between / and \\, the page-count command can be set up as a shortcut for drag and drop or "Send To" by first switching to the file's directory.
@echo off & cd /d "%~dp1" & "C:\path to gs\bin\gs.exe" -q --permit-file-read="%~nx1" -c "(%~nx1) (r) file runpdfbegin pdfpagecount = quit" & pause
where gs.exe is one of the windows c(onsole) variants gswin32c.exe or gswin64c.exe
The cd /d "%~dp1" will switch the console to the dropped file's drive and directory
The full quoted path to "GSwin..c.exe" calls it safely and remotely
-q will suppress (not show) the start message
since version 9.5+ the --permit-file-read="file name" is advised / required
-c "(%~nx1) does not need the quotes for name.xtension
if running a cmd as a shortcut, pause is required to see the result
Beware: only use this on files you trust, as you're overriding GS -dSAFER restrictions.

Related

How to merge all PDF's in a directory with ghostscript

How can I process all the files contained in a directory on d:\ with a batch script, rather than listing the files one by one like this? I have tried the following:
@echo off
"C:\Program Files\gs\gs9.25\bin\gswin32c.exe" -sDEVICE=pdfwrite -dCompatibilityLevel=1.3 -dPDFSETTINGS=/printer -dColorImageResolution=90 -dAutoRotatePages=/None -dBATCH -dNOPAUSE -sOutputFile=d:\d\koran.pdf *d:\a\01.pdf d:\a\02.pdf d:\a\03.pdf d:\a\04.pdf d:\a\05.pdf d:\a\06.pdf d:\a\07.pdf d:\a\08.pdf d:\a\09.pdf d:\a\10.pdf d:\a\11.pdf d:\a\12.pdf d:\a\13.pdf d:\a\14.pdf d:\a\15.pdf d:\a\16.pdf d:\a\17.pdf d:\a\18.pdf d:\a\19.pdf d:\a\20.pdf d:\a\21.pdf d:\a\22.pdf d:\a\23.pdf d:\a\24.pdf*
exit
Ghostscript doesn't 'merge' PDF files. It creates new PDF files by interpreting the contents of its input, which is not the same thing. You should read the documentation here
You haven't said what the problem is with the command you have tried; it's going to be hard to help you if you don't do that.
The most likely problem is that you have put * characters at the start and end of the input filenames. Ghostscript itself doesn't match wildcards; it expects you to tell it each file you want to process individually. So in order to process a directory of files, you first need to get a list of all the files, and then tell Ghostscript to use each of those files in turn.
You can use the Ghostscript @filename syntax (documented here) to tell Ghostscript to use the contents of a file as if it were the command line.
So all you need to do is come up with a shell script which will write the filenames from a folder into a file. That's not a Ghostscript question, and depends totally on the operating system you are using.
For Windows something like:
dir /B *.pdf >> files.txt
gswin32c -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -sOutputFile=\temp\out.pdf @files.txt
del files.txt
might be sufficient for your needs.
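The same argument-file idea could be sketched in Python as follows. The helper names are invented for the example, and the file is overwritten on each run instead of appended to. How whitespace inside a listed name is treated is not covered above, so the sketch refuses such names rather than guess.

```python
import os


def pdf_names(entries):
    """Keep only the PDF names from a directory listing, in sorted order."""
    return sorted(n for n in entries if n.lower().endswith(".pdf"))


def arg_file_text(names):
    """Return the argument-file contents: one file name per line."""
    for name in names:
        # Assumption of this sketch: names with whitespace are rejected.
        if any(ch.isspace() for ch in name):
            raise ValueError(f"file name contains whitespace: {name!r}")
    return "".join(name + "\n" for name in names)


def merge_args(arg_file="files.txt", output=r"\temp\out.pdf", gs_exe="gswin32c"):
    """Build the Ghostscript call that reads its inputs from the argument file."""
    return [gs_exe, "-sDEVICE=pdfwrite", "-dBATCH", "-dNOPAUSE",
            f"-sOutputFile={output}", "@" + arg_file]


def write_arg_file(folder, arg_file="files.txt"):
    """Write the list of PDFs found in folder into folder/arg_file."""
    names = pdf_names(os.listdir(folder))
    with open(os.path.join(folder, arg_file), "w", encoding="utf-8") as fh:
        fh.write(arg_file_text(names))
    return names
```

Because the listed names are relative, Ghostscript would need to be started with the folder as its working directory, for example subprocess.run(merge_args(), cwd=folder, check=True). Writing the output outside that folder keeps it from being picked up as an input on the next run.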
I could not make it work using "files.txt" but I am using this and everything works just fine.
gswin64c -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -sOutputFile="out.pdf" (Get-ChildItem -Path .\*.pdf)
You can execute like below:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf *.pdf
Reference:
https://gist.github.com/moaazsidat/b94185e9cfdba9e3cfb5bc90407e6397

Redirect ghostscript Output

I use the following command (in a Windows cmd) to decrypt pdf files stored under the directory C:\Users\David\Desktop\BS1999\ and write the output into the same folder using ghostscript:
FOR %x IN (C:\Users\David\Desktop\BS1999\*.pdf) DO gswin64c -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=%x_converted.pdf -c .setpdfwrite -f %x
So in short I have
For %x IN (*.pdf) DO my-ghostscript-function %x
What modifications do I have to make to redirect the output ("name-of-file"_converted.pdf) to another file path (for example C:\Users\David\Desktop\test)?
Thanks in advance!
Best, David
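One possible way to think about this, sketched in Python rather than cmd: take only the file name of each input, and join it onto the target folder when building -sOutputFile. The helper names are hypothetical. Unlike the command above, the sketch names the result "name_converted.pdf" instead of "name.pdf_converted.pdf" and leaves out the -c .setpdfwrite part to stay short.

```python
from pathlib import PureWindowsPath


def converted_path(source, target_dir, suffix="_converted"):
    """Return target_dir\\<name><suffix>.pdf for the given source PDF."""
    src = PureWindowsPath(source)
    # src.stem is the file name without directory and without ".pdf".
    return str(PureWindowsPath(target_dir) / f"{src.stem}{suffix}.pdf")


def convert_args(source, target_dir, gs_exe="gswin64c"):
    """Build one Ghostscript call that writes its output into target_dir."""
    return [
        gs_exe, "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
        f"-sOutputFile={converted_path(source, target_dir)}",
        source,
    ]
```

A loop over the PDFs in C:\Users\David\Desktop\BS1999 would then call convert_args(path, r"C:\Users\David\Desktop\test") for each file and run the result with subprocess.run.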

Convert PDF to PCL using Ghostscript 9.15

Requirement is to convert PDF to PCL with a macro embedded (currently testing this on Windows, however I will need to use this runtime in the application and print it from UNIX). The macro will be used later in another document to embed this cropped image and printed on one single page. I will be using PCL escape codes to call the MacroNumber and then the image will be printed. (You can consider this as a logo image.)
I am able to convert the PDF with whitespace to just the PDF without any whitespace by using CropBox.
"c:\progra~1\gs\gs9.15\bin\gswin64.exe" -o _sourcePDFcropped.pdf \
-sDEVICE=pdfwrite -c "[/CropBox [1 140 320 650] /PAGES pdfmark" \
-f _sourcePDF.pdf
However, when I convert this _sourcePDFcropped.pdf to PCL, it still adds whitespace.
"c:\progra~1\gs\gs9.15\bin\gswin64c.exe" -dBATCH -dNOPAUSE \
-sDEVICE=pxlcolor -g100x200 -sOutputFile=_sourceFedGroundCroppedTest.pcl \
-f _sourceFedGroundCropped.pdf
I tried using MKPCL and it does the job. Because it doesn't have much support, I am trying to use Ghostscript.
MKPCL.EXE -c4 -t -m 100 -p Image.jpg Image.MAC
I also tried ImageMagick which internally uses Ghostscript. So I am guessing, if I use the right switches in GS, I should be able to achieve my goal.
Input PDF File: Click Here
P.S: I have seen other PDF to PCL queries on Stackoverflow, others are more of straight forward PDF to PCL. Mine is to crop the PDF and output should be PCL.
Question continued: Link
I processed the sample input PDF with the following command line, using a self-compiled Ghostscript v9.16 (unreleased, from current GhostPDL GIT sources):
gs -o - \
-sDEVICE=pdfwrite \
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" \
-f source.pdf \
\
| gs -o tst.pcl \
-sDEVICE=pxlcolor \
-dUseCropBox \
-f -
(As you may well have noticed, I'm connecting 2 different Ghostscript commands through a pipe in order to save writing a temporary PDF file to disk.)
If you want to do the same on Windows, the command line in a cmd.exe/DOS box would be:
gswin64c.exe -o - ^
-sDEVICE=pdfwrite ^
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" ^
-f source.pdf ^
^
| gswin64c.exe -o tst.pcl ^
-sDEVICE=pxlcolor ^
-dUseCropBox ^
-f -
Then I opened it with the self-compiled PCL viewer (also from GhostPDL sources), pcl6:
pcl6 tst.pcl
This is a screenshot showing the pcl6 window:
As KenS also pointed out: it is important to use -dUseCropBox when processing the cropped PDF intermediate data!
Adding a CropBox doesn't really do much: it leaves the PDF exactly the same, but adds a CropBox entry for the page. GS will usually use the MediaBox, not the CropBox, so adding a CropBox to a PDF has no effect on its own.
You could try adding -dUseCropBox. If the white space you think is being added is in fact present in the original PDF, but masked by the CropBox, then using -dUseCropBox will have GS use the CropBox when rendering the PDF.
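The two-step pipeline (add a CropBox with pdfmark, then render with -dUseCropBox) might be driven from Python roughly as below. The function names are made up for the example, and the default box simply reuses the values from the question.

```python
import subprocess


def crop_args(source_pdf, box, gs_exe="gs"):
    """Step 1: rewrite the PDF to stdout with a CropBox added to all pages."""
    llx, lly, urx, ury = box
    mark = f"[/CropBox [{llx} {lly} {urx} {ury}] /PAGES pdfmark"
    return [gs_exe, "-o", "-", "-sDEVICE=pdfwrite", "-c", mark, "-f", source_pdf]


def pcl_args(output_pcl, gs_exe="gs"):
    """Step 2: read the PDF from stdin and render it, honouring the CropBox."""
    return [gs_exe, "-o", output_pcl, "-sDEVICE=pxlcolor", "-dUseCropBox", "-f", "-"]


def crop_to_pcl(source_pdf, output_pcl, box=(1, 140, 320, 650), gs_exe="gs"):
    """Connect both steps through a pipe, without a temporary PDF on disk."""
    first = subprocess.Popen(crop_args(source_pdf, box, gs_exe),
                             stdout=subprocess.PIPE)
    second = subprocess.Popen(pcl_args(output_pcl, gs_exe), stdin=first.stdout)
    # Close our copy of the pipe so the first process notices if the second exits.
    first.stdout.close()
    second.wait()
    first.wait()
    if first.returncode != 0 or second.returncode != 0:
        raise RuntimeError("one of the Ghostscript steps failed")
```

On Windows the gs_exe argument would be gswin64c.exe, matching the command lines above.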

Merged PDFs Blank

Trying to merge all PDFs in a directory using Ghostscript 9.06 64-bit in a .bat file.
The following makes merged.pdf, but it is 1 page and blank:
call gswin64c -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf *.pdf
If I actually specify which PDFs to merge it works fine. What gives?
call gswin64c -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf 1.pdf 2.pdf 3.pdf
You can't specify wildcards on the Ghostscript command line, simple as that.
Since GS didn't find a file called '*.pdf', it didn't execute any marking operations, and in that case you get a blank file.
Ghostscript cannot do wildcard expansions by itself.
If you call gs ... *.pdf from inside a shell which can do wildcard expansion, it will work nevertheless.
There is a difference with the site you linked to and the code you used above:
Your code is DOS batch and uses call gswin64c .... But as said, Ghostscript cannot expand wildcards itself.
The code in the linked web page is Unix shell, which does the wildcard expansion before Ghostscript gets to see its own commandline. When Ghostscript gets to see it, the wildcard expansion has happened already.
You have to find a solution for your batch file where you first store your (expanded) *.pdf file names in a variable %mypdfs% and then do call gswin64c ... %mypdfs%.
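The "expand first, then call" idea can also be sketched outside of batch. In the Python sketch below the wildcard is expanded by the script and the explicit names are handed to Ghostscript. The helper names are hypothetical, and both the natural sort order and the exclusion of an existing merged.pdf are additions made for the example.

```python
import glob
import os
import re


def natural_key(name):
    """Sort key that orders 2.pdf before 10.pdf by comparing digit runs as numbers."""
    return [int(p) if p.isdigit() else p.lower() for p in re.split(r"(\d+)", name)]


def expand_pdfs(folder, output_name="merged.pdf"):
    """Expand *.pdf in folder, skipping a leftover output file from an earlier run."""
    names = [os.path.basename(p) for p in glob.glob(os.path.join(folder, "*.pdf"))]
    return sorted((n for n in names if n != output_name), key=natural_key)


def merge_args(names, output="merged.pdf", gs_exe="gswin64c"):
    """Build the Ghostscript call with every input file listed explicitly."""
    return [gs_exe, "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            f"-sOutputFile={output}", *names]
```

Running subprocess.run(merge_args(expand_pdfs(folder)), cwd=folder, check=True) would then behave like the working command with 1.pdf 2.pdf 3.pdf spelled out.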
You can't specify the wildcard on the command line, but you can make gswin32c run a command file.
As the 'command file' just requires switches to be separated by any amount of white space (space, tab, line break), and there is no limit on the size of the file, we can make a file that does what you need:
echo -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=merged.pdf > files.gsx
dir *.pdf /b >> files.gsx
Once this file files.gsx has been created, you can make your file using
gswin32c @files.gsx
and all the files will be merged.
I did the following to solve this:
1.) dir /B *.pdf > do.bat
2.) opened do.bat with notepad to replace \r\n with spaces
3.) inserted: c:\Programs\gs\gs9.07\bin\gswin64 -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=merged.pdf at the beginning
and then executed do.bat
Voilà

Ghostscript loses font while extracting the page from PDF

I split a PDF into pages with the help of the following command line:
for G in $(seq 1 $(pdfinfo 47.pdf | sed -n 's/Pages:[^0-9]*\([0-9]*\).*/\1/p')) ; do
gs \
-dSAFER \
-sDEVICE=pdfwrite \
-dBATCH \
-dNOPAUSE \
-dFirstPage=$G \
-dLastPage=$G \
-o $G.pdf \
47.pdf ;
done
But some pages appear without text (graphics are still present).
So I tried to extract the embedded fonts from the PDF:
gs -q -dNODISPLAY extractFonts.ps -c "(47.pdf) extractFonts quit"
I installed these fonts in the system Fonts folder.
After that, I repeated the splitting and nothing changed.
I have no idea now how to make sure the pages are extracted correctly.
Ghostscript and pdfwrite are not actually intended for splitting PDF files up. There are other tools which will probably work better; why not try pdftk?
If you really want to use Ghostscript, then I would advise you to get hold of the latest bleeding-edge code from the Git repository. In that code the pdfwrite device will accept an output file name containing a '%d' and will write one file per page.
Beyond that, it seems most likely to me that you are simply experiencing a bug rather than 'losing the font': if the font were missing, the text would still be there but in a different font. Which version of GS are you using?
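For comparison, here is a sketch of both splitting approaches as argument lists: the per-page loop from the question and the single call with '%d' in the output name. The function names are invented for the example, and whether the '%d' form works depends on the Ghostscript build, as noted above.

```python
import re


def parse_pages(pdfinfo_output):
    """Extract the page count from the 'Pages:' line of pdfinfo's output."""
    match = re.search(r"^Pages:\s*(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def single_page_args(source, page, gs_exe="gs"):
    """One iteration of the loop from the question: extract exactly one page."""
    return [gs_exe, "-dSAFER", "-sDEVICE=pdfwrite", "-dBATCH", "-dNOPAUSE",
            f"-dFirstPage={page}", f"-dLastPage={page}",
            "-o", f"{page}.pdf", source]


def per_page_args(source, pattern="%d.pdf", gs_exe="gs"):
    """Single call that asks pdfwrite for one output file per page via %d."""
    return [gs_exe, "-dSAFER", "-sDEVICE=pdfwrite", "-o", pattern, source]
```

With the page count from parse_pages, the loop becomes one single_page_args call for each page from 1 to the count; per_page_args replaces the whole loop where it is supported.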