I want to extract the first page of a PDF as PNG to do some image processing on it with this command:
$ gs -q -dNOPAUSE -dBATCH -dEPSCrop -sDEVICE=pngalpha -dLastPage=1 -sOutputFile='test.png' 'test2.pdf'
It works well for most PDFs but it adds a transparent margin on this one: http://ubuntuone.com/23676W4TJPyX6W2pkp5guG
GIMP renders it as expected (no margin), but ImageMagick's convert has the same issue, and so does -sDEVICE=jpeg.
Is there any way to avoid it?
Ghostscript doesn't add margins, and it certainly doesn't add transparent ones. The problem is not with Ghostscript, it's with your PDF file. Your file contains:
/MediaBox [0 0 595 842]
/CropBox [27.5 61.0 567.5 781.0]
Ghostscript uses the MediaBox; other viewers may or may not use the CropBox. If you read the Ghostscript documentation, you will find the -dUseCropBox switch, which directs Ghostscript to use the PDF file's CropBox instead of its MediaBox when setting the media size.
-dEPSCrop isn't going to do anything at all with a PDF file.
For the record, if anyone runs into the same issue, I just found the proper switch: -dUseCropBox. The final command is now:
$ gs -q -dUseCropBox -dNOPAUSE -dBATCH -sDEVICE=pngalpha -dLastPage=1 -sOutputFile='test.png' 'test2.pdf'
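To see where the "margin" comes from, it helps to convert the two boxes into pixels. A PDF box is [llx lly urx ury] in points (1/72 inch), so the rendered size is roughly (urx - llx) / 72 * dpi by (ury - lly) / 72 * dpi. The sketch below assumes Ghostscript's default of 72 dpi when no -r is given; box_pixels is a hypothetical helper, not a Ghostscript command. The difference between the two results is exactly the margin: 27.5 pt on the left and right, 61 pt at the top and bottom.

```shell
#!/usr/bin/env bash
# Sketch: predict the image size Ghostscript renders for a given PDF box.
# Box values are in points (1/72 in); pixels = points / 72 * dpi.
# box_pixels is a hypothetical helper; dpi defaults to 72 (assumed gs default).

box_pixels() {
  # $1..$4 = llx lly urx ury, $5 = resolution in dpi (as passed to -r)
  awk -v llx="$1" -v lly="$2" -v urx="$3" -v ury="$4" -v dpi="${5:-72}" 'BEGIN {
    w = (urx - llx) / 72 * dpi
    h = (ury - lly) / 72 * dpi
    printf "%dx%d\n", w + 0.5, h + 0.5   # round to the nearest whole pixel
  }'
}

box_pixels 0 0 595 842              # MediaBox (default):       595x842
box_pixels 27.5 61.0 567.5 781.0    # CropBox (-dUseCropBox):   540x720
```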
Related
The application receives PDFs whose content does not cover the entire page. The coordinates and dimensions of the content are sent along with the data. Below is a sample command that creates cropped.png from target.pdf, where the content starts at (X: 179, Y: 212) and has a size of (W: 600, H: 400).
gswin64c -q -sDEVICE=png16m -o "cropped.png" -dNumRenderingThreads=4 -dSAFER -dBATCH -dNOPAUSE -dLastPage=1 -c "<< /PageOffset [-179 -212] /Page >> setpagedevice" -r144 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -f "target.pdf"
It creates the PNG with the correct offset; however, I could not find a way to define the area of the PDF that should be captured. In other words, the output must be a PNG that contains only the content within the given box.
Ghostscript 9.22 is installed on Windows 10. I have ImageMagick at my disposal too.
Is there a way to achieve this? Or am I approaching this incorrectly? Any thoughts would be greatly appreciated!
Thank you
You've offset the origin, but you haven't specified the media size, so the media size will be whatever it was in the original PDF file.
You need to add -dDEVICEWIDTHPOINTS=600 -dDEVICEHEIGHTPOINTS=400 -dFIXEDMEDIA, assuming that the media size you've given is correct. Obviously I can't tell where the actual white space is without seeing the original!
At 144 dpi there's no point in using -dNumRenderingThreads. Realistically it's only useful at reasonably high resolutions; at this resolution you'll just slow things down.
This:
-c "<< /PageOffset [-179 -212] /Page >> setpagedevice" -r144 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -f
is incorrect, and I'm surprised it doesn't throw an error. When you specify -c, everything after it until the -f is treated as PostScript. The -r144 and the other switches are not valid PostScript, and I'd expect them to throw an error. You would be much better off moving the -c "<< /PageOffset [-179 -212] /Page >> setpagedevice" -f to immediately before the input filename.
So I'd suggest your command line should be:
gswin64c -q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -o "cropped.png" -dLastPage=1 -r144 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dDEVICEWIDTHPOINTS=600 -dDEVICEHEIGHTPOINTS=400 -dFIXEDMEDIA -c "<< /PageOffset [-179 -212] /Page >> setpagedevice" -f "target.pdf"
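When the coordinates arrive from an application, it can help to assemble this command programmatically. The sketch below assumes X, Y, W and H are PDF points measured from the bottom-left corner of the page (the question doesn't say which units the application uses); crop_png_args is a hypothetical helper, not part of Ghostscript. It puts the media size and -dFIXEDMEDIA on the command line and keeps the -c ... -f PostScript immediately before the input file, as suggested above.

```shell
#!/usr/bin/env bash
# Sketch: print the Ghostscript arguments (one per line) for rendering only a
# W x H region whose lower-left corner is at (X, Y) on page 1.
# Assumption: X, Y, W, H are PDF points (1/72 in) from the page's bottom-left corner.
# crop_png_args is a hypothetical helper, not part of Ghostscript.

crop_png_args() {
  local x=$1 y=$2 w=$3 h=$4 in=$5 out=$6 dpi=${7:-144}
  # Media size = region size; -dFIXEDMEDIA stops the PDF's own MediaBox replacing it.
  # The -c ... -f PostScript goes last, immediately before the input file, and the
  # negative PageOffset moves the region's lower-left corner onto the media origin.
  local args=(
    -q -dSAFER -sDEVICE=png16m -o "$out" -dLastPage=1
    -r"$dpi" -dTextAlphaBits=4 -dGraphicsAlphaBits=4
    -dDEVICEWIDTHPOINTS="$w" -dDEVICEHEIGHTPOINTS="$h" -dFIXEDMEDIA
    -c "<< /PageOffset [-$x -$y] >> setpagedevice" -f "$in"
  )
  printf '%s\n' "${args[@]}"
}

# Usage: read the lines back into an array so paths with spaces survive.
#   mapfile -t args < <(crop_png_args 179 212 600 400 target.pdf cropped.png)
#   gswin64c "${args[@]}"
```

At 144 dpi a 600 x 400 point region should come out as 1200 x 800 pixels (600 / 72 * 144 = 1200).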
Using gswin32c.exe -o nul -sDEVICE=bbox bbox.pdf, I now know that the BoundingBox of this PDF is:
%%BoundingBox: 6292 6865 8108 7535
%%HiResBoundingBox: 6292.907808 6865.505790 8107.091753 7534.493770
I want to get a PDF containing only the content inside the BoundingBox.
I am using the following command to crop a PDF:
gswin32c -sDEVICE=pdfwrite -dFirstPage=1 -dLastPage=1 -o croped.pdf -dDEVICEWIDTHPOINTS=1815 -dDEVICEHEIGHTPOINTS=670 -dFIXEDMEDIA -c "6292 6865 translate 6292 6865 8107 7534 rectclip" -f bbox.pdf
or
gswin32c -dQUIET -dBATCH -dNOPAUSE -dNOPROMPT -sDEVICE=pdfwrite -dFirstPage=1 -dLastPage=1 -o croped.pdf -dDEVICEWIDTHPOINTS=1815 -dDEVICEHEIGHTPOINTS=670 -dFIXEDMEDIA -c "<</PageOffset [6292 6865]>> setpagedevice" -f bbox.pdf
With both, I get a blank PDF file.
With this command:
gswin32c.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=croped.pdf -c "[/CropBox [6292.907808 6865.505790 8107.091753 7534.493770] /PAGES pdfmark" -f bbox.pdf
I get the original file, unchanged.
How can I crop this PDF correctly?
Thanks very much!
The BoundingBox looks suspicious to me.
In any event you cannot trivially do what you are trying to do with Ghostscript, because the PDF interpreter uses the information in the PDF file to set the media size.
The first two command lines 'might' work, but you've translated the CTM in the wrong direction. You've moved the origin (0,0) from the bottom left, up and right. That's moved the content of the page further off the media, which is why you get a blank page. You could try using the same values, but negated, so that the origin moves down and left. From the BoundingBox you quoted, that's the correct direction.
gswin32c -sDEVICE=pdfwrite -dFirstPage=1 -dLastPage=1 -o croped.pdf -dDEVICEWIDTHPOINTS=1816 -dDEVICEHEIGHTPOINTS=670 -dFIXEDMEDIA -c "-6292 -6865 translate" -f bbox.pdf
You don't need the rectclip, because the content is already clipped to the page.
The third command line would also work, except that you've set the CropBox before processing the PDF file, so the PDF interpreter reads the CropBox from the PDF file and overwrites the one you set. Try setting it after the input file.
gswin32c.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=croped.pdf bbox.pdf -c "[/CropBox [6292.907808 6865.505790 8107.091753 7534.493770] /PAGES pdfmark" -f
[EDIT]
OK, so the reason the first command line doesn't work is (as I suspected) that the PDF interpreter resets the graphics state before running the PDF, so it simply throws away the 'translate'.
The second command line works perfectly well for me if you negate the operands in the array for PageOffset:
gswin32c -sDEVICE=pdfwrite -sOutputFile=\temp\out.pdf -dDEVICEWIDTHPOINTS=1815 -dDEVICEHEIGHTPOINTS=670 -dFIXEDMEDIA -c "<</PageOffset [-6292 -6865]>>setpagedevice" -f D:\Users\ken\Downloads\bbox.pdf
The third command line doesn't work because it sets the CropBox for all pages (/PAGES), which is only a default: a CropBox on an individual page overrides it. Your original PDF file contains a per-page CropBox (identical to the MediaBox), which the PDF interpreter preserves, so the page's own CropBox overrides the /PAGES one.
But the command line above worked fine for me.
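The working recipe (media size = box size, PageOffset = the box's lower-left corner negated) can be derived directly from the bbox device's output. This is a sketch: bbox_crop_args is a hypothetical helper, and it uses the integer %%BoundingBox line because that line is already rounded outward (6292.9 becomes 6292, 8107.09 becomes 8108), so nothing is clipped. For this file it gives 1816 x 670 points and an offset of [-6292 -6865].

```shell
#!/usr/bin/env bash
# Sketch: turn the integer %%BoundingBox line printed by -sDEVICE=bbox into the
# pdfwrite arguments that worked above: media size = box size, and
# PageOffset = the box's lower-left corner negated.
# bbox_crop_args is a hypothetical helper, not part of Ghostscript.

bbox_crop_args() {
  local bbox_line=$1 in=$2 out=$3 tag llx lly urx ury
  # "%%BoundingBox: llx lly urx ury" -> tag plus four integers
  read -r tag llx lly urx ury <<< "$bbox_line"
  printf '%s\n' -sDEVICE=pdfwrite -o "$out" \
    -dDEVICEWIDTHPOINTS=$((urx - llx)) -dDEVICEHEIGHTPOINTS=$((ury - lly)) -dFIXEDMEDIA \
    -c "<</PageOffset [$((-llx)) $((-lly))]>> setpagedevice" -f "$in"
}

# Usage (the bbox device reports on stderr, hence 2>&1; single-page input assumed):
#   line=$(gs -o /dev/null -sDEVICE=bbox bbox.pdf 2>&1 | grep '^%%BoundingBox')
#   mapfile -t args < <(bbox_crop_args "$line" bbox.pdf croped.pdf)
#   gs "${args[@]}"
```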
I have a Ghostscript command that converts a PDF into several JPG images (one for every page). The command arguments are as follows:
-q -dUseCropBox -dFirstPage=1 -dLastPage=1 -dBATCH -dDOINTERPOLATE -dNOPAUSE -dSAFER -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dPrinted=false -r250 -sDEVICE=jpeg -dJPEGQ=100 -sOutputFile=output.jpg input.pdf -c quit
The PDF is 1.5 MB, but the JPEG images are huge (~15 MB), with dimensions of 8829 x 15551 pixels.
If I change the resolution in the Ghostscript command to -r150, the size is acceptable, but the image quality is badly pixelated.
Is there another way to decrease the size of the image without affecting its quality?
Thanks
I want to convert a bunch of .eps images to a single PDF using Ghostscript.
But when I open the PDF file in a PDF viewer and set the zoom to 100%, the physical size of the page is huge!
I would like to force gs to create the PDF in letter size, but everything I tried failed.
Here's the command I'm using:
gs -dBATCH -dNOPAUSE -dEPSFitPage -dEPSCrop \
-q -sDEVICE=pdfwrite -sOutputFile=out.pdf \
file1.eps file2.eps
All my attempts with -sPAPERSIZE=legal and -dDEVICEWIDTHPOINTS=w -dDEVICEHEIGHTPOINTS=h had no effect.
-dEPSFitPage and -dEPSCrop are mutually exclusive. Try getting rid of the -dEPSCrop and putting back -sPAPERSIZE=legal. If that doesn't work, it is probably because the .eps file is over-riding the media, so try adding -dFIXEDMEDIA.
BTW, this answer is cribbed from:
Fit to page size in ghostscript (with a possibly corrupt input)
The problem was -dEPSFitPage: it was fitting the page size to the .eps file's size. Using -dPDFFitPage (and dropping the mutually exclusive -dEPSCrop) solved my problem.
gs -dBATCH -dNOPAUSE -sPAPERSIZE=letter \
-dPDFFitPage -q -sDEVICE=pdfwrite \
-sOutputFile=out.pdf \
file1.eps file2.eps
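For comparison, the answer's suggestion (a fixed paper size that the EPS files cannot override, plus -dEPSFitPage and no -dEPSCrop) can be written out as below. This is an untested sketch of that suggestion, not the asker's working command (they reported that -dPDFFitPage worked for them); eps_letter_args is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of the answer's advice: a fixed letter-size page (-sPAPERSIZE=letter plus
# -dFIXEDMEDIA so the EPS files cannot override it) and -dEPSFitPage to scale the
# EPS content onto that page, without the conflicting -dEPSCrop.
# eps_letter_args is a hypothetical helper, not part of Ghostscript.

eps_letter_args() {
  local out=$1
  shift   # the remaining arguments are the input EPS files
  printf '%s\n' -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile="$out" \
    -sPAPERSIZE=letter -dFIXEDMEDIA -dEPSFitPage "$@"
}

# Usage:
#   mapfile -t args < <(eps_letter_args out.pdf file1.eps file2.eps)
#   gs "${args[@]}"
```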
I'm fitting a file with no margins (produced by pdfcrop from a normal PDF file) to a given paper size using Ghostscript:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dFIXEDMEDIA \
-dPDFFitPage -dBATCH -dQUIET -dNOPAUSE -dDEVICEWIDTHPOINTS=864 \
-dDEVICEHEIGHTPOINTS=612 -sOutputFile=$OUTPUT $INPUT
but the output has additional margins (I was cropping in order to get rid of them).
Is it possible to force Ghostscript to produce output without these margins?
Without seeing your file I cannot be certain, however I suspect that all you have done is set a /CropBox in the PDF file. By default Ghostscript uses the /MediaBox which is probably unchanged.
Try setting -dUseCropBox.
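Applied to the question's command, that means adding -dUseCropBox so the page is fitted from the CropBox (the cropped area) rather than the MediaBox, with the output file named before the input file. This is a sketch under the answer's assumption that pdfcrop only changed the CropBox; fit_args is a hypothetical helper, and 864 x 612 points is 12 x 8.5 inches.

```shell
#!/usr/bin/env bash
# Sketch: the question's fit-to-paper command with -dUseCropBox added, so
# Ghostscript fits the CropBox (the cropped area) rather than the MediaBox.
# fit_args is a hypothetical helper; 864 x 612 pt = 12 x 8.5 in (landscape).

fit_args() {
  local in=$1 out=$2
  printf '%s\n' -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dBATCH -dQUIET -dNOPAUSE \
    -dUseCropBox -dPDFFitPage -dFIXEDMEDIA \
    -dDEVICEWIDTHPOINTS=864 -dDEVICEHEIGHTPOINTS=612 \
    -sOutputFile="$out" "$in"   # output switch first, input file last
}

# Usage:
#   mapfile -t args < <(fit_args "$INPUT" "$OUTPUT")
#   gs "${args[@]}"
```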