Ghostscript loses font while extracting a page from a PDF

I split a PDF into pages with the following command line:
for G in $(seq 1 $(pdfinfo 47.pdf | sed -n 's/Pages:[^0-9]*\([0-9]*\).*/\1/p')) ; do
gs \
-dSAFER \
-sDEVICE=pdfwrite \
-dBATCH \
-dNOPAUSE \
-dFirstPage=$G \
-dLastPage=$G \
-o $G.pdf \
47.pdf ;
done
But some pages appear without text (the graphics are still present).
So I tried to extract the embedded fonts from the PDF:
gs -q -dNODISPLAY extractFonts.ps -c "(47.pdf) extractFonts quit"
I installed these fonts in the system Fonts folder.
After that, I repeated the split, but nothing changed.
I have no idea how to make sure the pages are extracted correctly.

Ghostscript and pdfwrite are not actually intended for splitting PDF files up. There are other tools which will probably work better. Why not try pdftk?
If you really want to use Ghostscript, then I would advise you to get hold of the latest bleeding-edge code from the Git repository. In that code the pdfwrite device will accept an output file name containing a '%d' and will write one file per page.
Beyond that, it seems most likely to me that you are simply experiencing a bug rather than 'losing the font'. If the font were missing, the text would still be there, but in a different font. Which version of GS are you using?
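The '%d' output naming described in this answer could be used roughly as follows. This is only a sketch, assuming a Ghostscript build whose pdfwrite device supports '%d' in the output name. The function name split_pdf, the "page" prefix and the DRY_RUN switch are made up for illustration.

```shell
#!/bin/sh
# Sketch: write one PDF per page via pdfwrite's %d output pattern.
# split_pdf, the "page" prefix and DRY_RUN are illustrative names only.
split_pdf() {
    input=$1
    prefix=${2:-page}
    # With DRY_RUN set (non-empty), print the command instead of running it.
    # -o implies -dBATCH -dNOPAUSE; %03d is replaced by the page number.
    ${DRY_RUN:+echo} gs -dSAFER -sDEVICE=pdfwrite \
        -o "${prefix}_%03d.pdf" "$input"
}
```

Calling split_pdf 47.pdf would then be expected to produce page_001.pdf, page_002.pdf and so on, without the per-page loop from the question.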

Related

How to trim unwanted text in PDF?

I can crop PDF (for example in Acrobat).
But text outside of the crop margin will still be maintained in the PDF (even though I don't see it in the viewable area).
I want to remove anything outside the crop margin. Is there a command line tool that can do so?
Ghostscript can do that. Ghostscript is a command line tool which is available for all major operating systems.
The command which does it for Linux or Mac OS X:
gs -o cropped-and-removed.pdf \
-sDEVICE=pdfwrite \
-dUseCropBox \
in.pdf
The command for Windows:
gswin64c.exe -o cropped.pdf ^
-sDEVICE=pdfwrite ^
-dUseCropBox ^
in.pdf
Be sure to use a rather recent version of Ghostscript. Current is v9.16.

Convert PDF to PCL using Ghostscript 9.15

The requirement is to convert a PDF to PCL with a macro embedded (I am currently testing this on Windows, but I will need to use it at runtime in the application and print from UNIX). The macro will be used later in another document to embed this cropped image, printed on one single page. I will be using PCL escape codes to call the MacroNumber, and then the image will be printed. (You can think of it as a logo image.)
I am able to crop the whitespace from the PDF by setting a CropBox:
"c:\progra~1\gs\gs9.15\bin\gswin64.exe" -o _sourcePDFcropped.pdf \
-sDEVICE=pdfwrite -c "[/CropBox [1 140 320 650] /PAGES pdfmark" \
-f _sourcePDF.pdf
However, when I convert this _sourcePDFcropped.pdf to PCL, the whitespace is still added:
"c:\progra~1\gs\gs9.15\bin\gswin64c.exe" -dBATCH -dNOPAUSE \
-sDEVICE=pxlcolor -g100x200 -sOutputFile=_sourceFedGroundCroppedTest.pcl \
-f _sourceFedGroundCropped.pdf
I tried using MKPCL and it does the job. Because it doesn't have much support, I am trying to use Ghostscript.
MKPCL.EXE -c4 -t -m 100 -p Image.jpg Image.MAC
I also tried ImageMagick which internally uses Ghostscript. So I am guessing, if I use the right switches in GS, I should be able to achieve my goal.
Input PDF File: Click Here
P.S.: I have seen other PDF to PCL questions on Stack Overflow, but they are mostly straightforward PDF to PCL conversions. Mine is to crop the PDF, and the output should be PCL.
Question continued: Link
I processed the sample input PDF with the following command line, using a self-compiled Ghostscript v9.16 (unreleased, from current GhostPDL GIT sources):
gs -o - \
-sDEVICE=pdfwrite \
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" \
-f source.pdf \
\
| gs -o tst.pcl \
-sDEVICE=pxlcolor \
-dUseCropBox \
-f -
(As you may well have noticed, I'm connecting 2 different Ghostscript commands through a pipe in order to save writing a temporary PDF file to disk.)
If you want to do the same on Windows, the command line in a cmd.exe/DOS box would be:
gswin64c.exe -o - ^
-sDEVICE=pdfwrite ^
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" ^
-f source.pdf ^
^
| gswin64c.exe -o tst.pcl ^
-sDEVICE=pxlcolor ^
-dUseCropBox ^
-f -
Then I opened it with the self-compiled PCL viewer (also from GhostPDL sources), pcl6:
pcl6 tst.pcl
This is a screenshot showing the pcl6 window:
As KenS also pointed out: it is important to use -dUseCropBox when processing the cropped PDF intermediate data!
Adding a CropBox doesn't really do much. It leaves the PDF content exactly the same, but adds a CropBox entry for the page. GS will usually use the MediaBox, not the CropBox, so adding a CropBox to a PDF has no effect on the output by default.
You could try adding -dUseCropBox. If the white space you think is being added is in fact present in the original PDF, but masked by the CropBox, then using -dUseCropBox will have GS use the CropBox when rendering the PDF.
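To check whether a file really carries a CropBox that differs from its MediaBox, the boxes can be listed before converting. The sketch below assumes pdfinfo from poppler-utils is installed (it is already used in the first question on this page). The helper names box_lines and show_boxes are hypothetical.

```shell
#!/bin/sh
# Sketch: show which page boxes a PDF carries.
# box_lines and show_boxes are illustrative names only.

# Keep only the MediaBox and CropBox lines from `pdfinfo -box` output.
box_lines() {
    grep -E '(MediaBox|CropBox):'
}

# Print the MediaBox and CropBox reported for the given PDF file.
show_boxes() {
    pdfinfo -box "$1" | box_lines
}
```

If show_boxes _sourcePDFcropped.pdf reports a CropBox smaller than the MediaBox, the extra white space is most likely the MediaBox area, which GS renders unless -dUseCropBox is given.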

How to remove right and left margins from a PDF file using Ghostscript or any other command line tool?

I've been trying several GS commands to remove the margins from right and left side of a PDF file such as:
gs \
-q -dNOPAUSE -dBATCH \
-sDEVICE=pdfwrite \
-dSAFER \
-dCompatibilityLevel=1.3 \
-dPDFSETTINGS=/printer \
-dSubsetFonts=true \
-dEmbedAllFonts=true \
-sPAPERSIZE=a4 \
-sOutputFile=d:\\ghost\\gs\\bin\\shiftedgulf.pdf \
-c <</BeginPage{0.9 0.9 scale 29.75 42.1 translate}>> setpagedevice \
-f d:\\ghost\\gs\\bin\\gulf.pdf"
but it's as if nothing is happening. My question: is there an effective, direct and clear way to achieve this?
Maybe this question is a duplicate, but I tried most of the scripts and none of them gives me any result. Suggestions for any other command line tool are fine as well.
PDF files don't have 'margins'. The content is placed on the page, which may leave white space at the edges of the media, but it's not a margin as such.
I'd need to see the PDF file to have any chance of figuring out what you're trying to achieve, and why what you are doing doesn't work. Setting the PAPERSIZE to A4 seems like a bad start though. You probably want to set a specific media size and set -dFIXEDMEDIA so that the PDF interpreter doesn't override it.
You may want to study this other Stack Overflow answer to a similar question:
PDF - Remove White Margins
and you'll probably be able to achieve what you want....
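One way the fixed media size advice might be applied is sketched below: shrink the page width and shift the content left with the PageOffset page device parameter. The function name trim_sides, the DRY_RUN switch and the A4 default of 595x842 points are assumptions for illustration, and the right trim value depends on the actual file.

```shell
#!/bin/sh
# Sketch: trim equal left/right white space by shrinking the media and
# shifting the content. trim_sides and DRY_RUN are illustrative names;
# the default 595x842 pt (A4) is an assumption about the input file.
trim_sides() {
    input=$1
    output=$2
    trim=$3                      # points to remove from each side
    width=${4:-595}
    height=${5:-842}
    new_width=$((width - 2 * trim))
    # With DRY_RUN set (non-empty), print the command instead of running it.
    ${DRY_RUN:+echo} gs -o "$output" -sDEVICE=pdfwrite \
        -dDEVICEWIDTHPOINTS="$new_width" -dDEVICEHEIGHTPOINTS="$height" \
        -dFIXEDMEDIA \
        -c "<</PageOffset [-$trim 0]>> setpagedevice" \
        -f "$input"
}
```

For example, trim_sides gulf.pdf shiftedgulf.pdf 36 would request a 523 pt wide page with the content moved 36 pt to the left. Note that the PostScript passed to -c is quoted as a single argument here.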
Thank you all for the answers. I found a very easy and direct tool called briss. All you need to do is download the briss-0.0.14 JAR and run this command:
java -jar briss-0.0.14.jar -s original.pdf -d cropped.pdf -c 0.11/0.08/0.11/0.08:0.11/0.08/0.11/0.08
And that's all :)

Remove PDF margins while using PDFFitPage in GhostScript

I'm fitting a file with no margins (produced by running pdfcrop on a normal PDF file) to a given paper size using Ghostscript:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dFIXEDMEDIA \
-dPDFFitPage -d -dBATCH -dQUIET -dNOPAUSE -dDEVICEWIDTHPOINTS=864 \
-dDEVICEHEIGHTPOINTS=612 -sOutputFile=$INPUT $OUTPUT
but the output has additional margins (I cropped the file precisely to get rid of them).
Is it possible to force Ghostscript to produce output without these margins?
Without seeing your file I cannot be certain, however I suspect that all you have done is set a /CropBox in the PDF file. By default Ghostscript uses the /MediaBox which is probably unchanged.
Try setting -dUseCropBox.
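Applied to the command from the question, that suggestion might look like the sketch below. The page size values are taken from the question; the function name fit_page and the DRY_RUN switch are hypothetical, and whether the margins disappear depends on the file really having a /CropBox.

```shell
#!/bin/sh
# Sketch: fit the CropBox area (not the MediaBox) to a fixed page size.
# fit_page and DRY_RUN are illustrative names only.
fit_page() {
    input=$1
    output=$2
    # With DRY_RUN set (non-empty), print the command instead of running it.
    # -o names the output file and implies -dBATCH -dNOPAUSE.
    ${DRY_RUN:+echo} gs -o "$output" -sDEVICE=pdfwrite \
        -dUseCropBox -dPDFFitPage -dFIXEDMEDIA \
        -dDEVICEWIDTHPOINTS=864 -dDEVICEHEIGHTPOINTS=612 \
        "$input"
}
```

Usage would be fit_page cropped.pdf fitted.pdf, with the input file first and the output file second.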

GhostScript alternative

I'm currently using CentOS 5.6 (Ghostscript 8, ImageMagick 6.2.8)
and I'm trying to convert the first page of a PDF to a JPG file.
I understand that my current setup is unable to convert compressed PDF files, but is there an alternative with the same functionality that can?
The 'understanding' that Ghostscript is unable to convert 'compressed PDF' is wrong. Where did you pick it up?
PDF by default uses compression internally for most of its objects. It's rather unusual to find a PDF 'in the wild' which is completely uncompressed.
Which exact version of Ghostscript are you using? (Try gs -v).
BTW, you do not need ImageMagick to convert (multipage) PDF to a series of JPEGs. Try this command:
gs \
-o img_%03d.jpeg \
-sDEVICE=jpeg \
input.pdf
or, for a resolution of 300 dpi (instead of the default 72 dpi):
gs \
-o img_%03d.jpeg \
-sDEVICE=jpeg \
-r300 \
input.pdf
The _%03d-part of the output filename will attach a 3-digit number to the img-name that increments with each PDF page.
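The numbering follows ordinary C-style formatting, so the resulting names can be previewed with printf. The helper below, preview_names, is a hypothetical illustration and not part of Ghostscript.

```shell
#!/bin/sh
# Sketch: preview the file names that a pattern such as img_%03d.jpeg yields.
# preview_names is an illustrative helper, not part of Ghostscript.
preview_names() {
    pattern=$1
    shift
    for page in "$@"; do
        # printf applies the same %03d zero-padding to each page number.
        printf "$pattern\n" "$page"
    done
}
```

For instance, preview_names 'img_%03d.jpeg' 1 2 10 prints img_001.jpeg, img_002.jpeg and img_010.jpeg, one per line.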