How to trim unwanted text in PDF? - pdf

I can crop PDF (for example in Acrobat).
But text outside of the crop margin will still be maintained in the PDF (even though I don't see it in the viewable area).
I want to remove anything outside the crop margin. Is there a command line tool that can do so?

Ghostscript can do that. Ghostscript is a command line tool which is available for all major operating systems.
The command which does it for Linux or Mac OS X:
gs -o cropped-and-removed.pdf \
-sDEVICE=pdfwrite \
-dUseCropBox \
in.pdf
The command for Windows:
gswin64c.exe -o cropped.pdf ^
-sDEVICE=pdfwrite ^
-dUseCropBox ^
in.pdf
Be sure to use a rather recent version of Ghostscript. Current is v9.16.

Related

Convert PDF to PCL using Ghostscript 9.15

Requirement is to convert PDF to PCL with a macro embedded (currently testing this on Windows, however I will need to use this runtime in the application and print it from UNIX). The macro will be used later in another document to embed this cropped image and printed on one single page. I will be using PCL escape codes to call the MacroNumber and then the image will be printed. (You can consider this as a logo image.)
I am able to convert the PDF with whitespace to just the PDF without any whitespace by using CropBox.
"c:\progra~1\gs\gs9.15\bin\gswin64.exe" -o _sourcePDFcropped.pdf \
-sDEVICE=pdfwrite -c "[/CropBox [1 140 320 650] /PAGES pdfmark" \
-f _sourcePDF.pdf
However, when I convert this _sourcePDFcropped.pdf to PCL, this still adding whitespace.
"c:\progra~1\gs\gs9.15\bin\gswin64c.exe" -dBATCH -dNOPAUSE \
-sDEVICE=pxlcolor -g100x200 -sOutputFile=_sourceFedGroundCroppedTest.pcl \
-f _sourceFedGroundCropped.pdf
I tried using MKPCL and it does the job. Because it doesn't have much support, I am trying to use Ghostscript.
MKPCL.EXE -c4 -t -m 100 -p Image.jpg Image.MAC
I also tried ImageMagick which internally uses Ghostscript. So I am guessing, if I use the right switches in GS, I should be able to achieve my goal.
Input PDF File: Click Here
P.S: I have seen other PDF to PCL queries on Stackoverflow, others are more of straight forward PDF to PCL. Mine is to crop the PDF and output should be PCL.
Question continued: Link
I processed the sample input PDF with the following command line, using a self-compiled Ghostscript v9.16 (unreleased, from current GhostPDL GIT sources):
gs -o - \
-sDEVICE=pdfwrite \
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" \
-f source.pdf \
\
| gs -o tst.pcl \
-sDEVICE=pxlcolor \
-dUseCropBox \
-f -
(As you may well have noticed, I'm connecting 2 different Ghostscript commands through a pipe in order to save writing a temporary PDF file to disk.)
If you want to do the same on Windows, the command line in a cmd.exe/DOS box would be:
gswin64c.exe -o - ^
-sDEVICE=pdfwrite ^
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" ^
-f source.pdf ^
^
| gswin64c.exe -o tst.pcl ^
-sDEVICE=pxlcolor ^
-dUseCropBox ^
-f -
Then I opened it with the self-compiled PCL viewer (also from GhostPDL sources), pcl6:
pcl6 tst.pcl
This is a screenshot showing the pcl6 window:
As KenS also pointed out: it is important to use -dUseCropBox when processing the cropped PDF intermediate data!
Adding a CropBox doesn't really do much, it leaves the PDF exactly the same, but adds a CropBox entry for the page. GS will usually use the MediaBox, not the CropBox, so adding a CropBox to a PDF has no effect.
You could try adding -dUseCropBox. If the white space you think is being added is in fact present in the original PDF, but masked by the CropBox, then using -dUseCropBox will have GS use the CropBox when rendering the PDF.

Ghostscript: How to convert EPS to PDF with the same page size

I want to convert a EPS figure to a PDF figure with the same width and height.
The following command:
gswin32 -dSAFER -dNOPLATFONTS -dNOPAUSE -dBATCH \
-sDEVICE=pdfwrite -sPAPERSIZE=letter -dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 \
-dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile="test.pdf" \
-f "test.eps"
only produces a PDF file with the page size of a letter.
Any help would be much appreciated.
Here is the test EPS file: https://dl.dropboxusercontent.com/u/45318932/test.eps
EPS files cannot contain a media size request. In the absence of any media size request Ghostscript uses the default.
However.....
From the documentation:
http://www.ghostscript.com/doc/9.15/Use.htm#EPS_parameters
-dEPSCrop :
Crop an EPS file to the bounding box. This is useful when converting an EPS file to a bitmap.
To make KenS' answer more explicit, using the test.eps sample file you linked to... the following command will suffice to do what you want:
gswin32 \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/printer \
-dEPSCrop \
-o test.pdf \
test.eps
The -o test.pdf is (for not too ancient versions of Ghostscript!) shorthand for -dNOPAUSE -dBATCH -sOutputFile=test.pdf.
Your test.eps uses a font named /SHZENL+Tahoma_00. Ghostscript will automatically embed this font, and it will be a subset by default (the prefix SHZENL may change in the PDF, though).
Here is a screenshot from the page the command from your question created. That page is 612 x 792 pts (letter size):
Here is the screenshot from the page the command given in my answer created. Its page size is 360 x 216 pts:

How to set the physical size of PDF pages with Ghostscript?

I want to convert a bunch of .eps images to a single PDF using Ghostscript.
But when I look at the PDF file in a PDF viewer and set view to 100% to physical size of the file is huge!
I would like to force gs to create the PDF in letter size, but everything I tried failed.
Here's the command I'm using:
gs -dBATCH -dNOPAUSE -dEPSFitPage -dEPSCrop \
-q -sDEVICE=pdfwrite -sOutputFile=out.pdf \
file1.eps file2.eps
All my attempts with -sPAPERSIZE=legal and -dDEVICEWIDTHPOINTS=w -dDEVICEHEIGHTPOINTS=h had no effect.
-dEPSFitPage and -dEPSCrop are mutually exclusive. Try getting rid of the -dEPSCrop and putting back -sPAPERSIZE=legal. If that doesn't work, it is probably because the .eps file is over-riding the media, so try adding -dFIXEDMEDIA.
BTW, this answer is cribbed from:
Fit to page size in ghostscript (with a possibly corrupt input)
The problem was -dEPSFitPage it was fitting the page size to the .eps file size... using -dPDFFitPage (and skipping the mutually exclusive -dEPSCrop) solved my problem.
gs -dBATCH -dNOPAUSE -sPAPERSIZE=letter \
-dPDFFitPage -q -sDEVICE=pdfwrite \
-sOutputFile=out.pdf \
file1.eps file2.eps

Ghostscript loses font while extracting the page from PDF

I split PDF into pages with help of usable command line:
for G in $(seq 1 $(pdfinfo 47.pdf | sed -n 's/Pages:[^0-9]*\([0-9]*\).*/\1/p')) ; do
gs \
-dSAFER \
-sDEVICE=pdfwrite \
-dBATCH \
-dNOPAUSE \
-dFirstPage=$G \
-dLastPage=$G \
-o $G.pdf \
47.pdf ;
done
But some pages appears without text (Graphics are still present)
So, I have tried to extract embedded font from PDF:
gs -q -dNODISPLAY extractFonts.ps -c "(47.pdf) extractFonts quit"
These fonts I have installed in system Fonts folder.
After that, I have repeat splitting and no changes were happened.
How-to be sure that pages will be extracting correctly, I have no idea now.
Ghostscript and pdfwrite are not actually intended for the purpose of splitting PDF files up, there are other tools which will probably work better, why not try pdftk ?
If you really want to use Ghostscript then I would advise you to get hold of the latest bleeding-edge code from the Git repository, in that code the pdfwrite device will accept an output file name containing a '%d' and will write one file per page.
Beyond that, it seems most likely to me that you are simply experiencing a bug, rather than 'losing the font', if the font was missing the text would still be ther but in a differnt font. Which version of GS are you using ?

Remove PDF margins while using PDFFitPage in GhostScript

I'm fitting a file with no margins (produced using a pdfcrop from a normal PDF file) to a given paper size using GhostScript:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dFIXEDMEDIA \
-dPDFFitPage -d -dBATCH -dQUIET -dNOPAUSE -dDEVICEWIDTHPOINTS=864 \
-dDEVICEHEIGHTPOINTS=612 -sOutputFile=$INPUT $OUTPUT
but the output has additional margins (I was cropping in order to get rid of them).
Is it possible to force GhostScript to produce output without these margins?
Without seeing your file I cannot be certain, however I suspect that all you have done is set a /CropBox in the PDF file. By default Ghostscript uses the /MediaBox which is probably unchanged.
Try setting -dUseCropBox