I'm currently using CentOS 5.6 (Ghostscript 8, ImageMagick 6.2.8)
and I'm trying to convert the first image of the PDF to a JPG file.
I understand that my current setup is unable to convert compressed PDF files. Is there an alternative with the same functionality that can?
The 'understanding' that Ghostscript is unable to convert 'compressed PDF' is wrong. Where did you pick it up?
PDF by default uses compression internally for most of its objects. It's rather unusual to find a PDF 'in the wild' which is completely uncompressed.
Which exact version of Ghostscript are you using? (Try gs -v).
BTW, you do not need ImageMagick to convert (multipage) PDF to a series of JPEGs. Try this command:
gs \
-o img_%03d.jpeg \
-sDEVICE=jpeg \
input.pdf
or, for a resolution of 300 dpi (instead of the default 72 dpi):
gs \
-o img_%03d.jpeg \
-sDEVICE=jpeg \
-r300 \
input.pdf
The _%03d-part of the output filename will attach a 3-digit number to the img-name that increments with each PDF page.
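If only the first page is wanted, as the question suggests, the page range can be limited with Ghostscript's -dFirstPage and -dLastPage switches. The following is a minimal sketch; the function name first_page_to_jpeg and the file names are made up for illustration.

```shell
# Render only page 1 of a PDF to a single JPEG at 300 dpi.
# first_page_to_jpeg is a hypothetical helper name, not part of Ghostscript.
first_page_to_jpeg() {
    # $1 = input PDF, $2 = output JPEG
    gs -o "$2" \
       -sDEVICE=jpeg \
       -r300 \
       -dFirstPage=1 \
       -dLastPage=1 \
       "$1"
}
```

Usage would be something like: first_page_to_jpeg input.pdf first_page.jpeg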
LibreOffice Draw allows you to open a non-PDF/A file and export it as a PDF/A-1b or PDF/A-2b file.
The same is possible from the command line by calling on macOS
/Applications/LibreOffice.app/Contents/MacOS/soffice --headless \
--convert-to pdf:draw_pdf_Export \
--outdir ./pdfout \
./input-non-pdfa.pdf
or, on Linux, simply
libreoffice --headless \
--convert-to pdf:draw_pdf_Export \
--outdir ./pdfout \
./input-non-pdfa.pdf
On the command line, passing pdf:draw_pdf_Export to --convert-to tells LibreOffice to create a PDF and to use LibreOffice Draw to do so.
Is there also a way to tell LibreOffice to produce a PDF/A document in headless mode?
For PDF/A-1 (which presumably means PDF/A-1b):
soffice --headless --convert-to pdf:"writer_pdf_Export:SelectPdfVersion=1" --outdir outdir input.pdf
Change the value from 1 to 2 for PDF/A-2. The relevant LibreOffice source files are Common.xcs, pdfexport.cxx and pdffilter.cxx.
(Maybe outdated) API/Tutorials/PDF export - Apache OpenOffice Wiki
Python Guide - PDF export filter data - The Document Foundation Wiki
DPI setting for the excel->pdf conversion command - Ask LibreOffice
Change default resolution in batch PNG conversion [closed] - Ask LibreOffice
Since with LibreOffice this is possible only via the GUI, use gs for a command-line solution.
First convert the PDF to PostScript:
pdftops input.pdf input.ps
Then convert the PostScript file to PDF/A, the archival format of PDF:
gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -sOutputFile=input-A.pdf input.ps
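The two steps above can be chained in a small wrapper so that the intermediate PostScript file is cleaned up automatically. This is only a sketch: pdf_to_pdfa is a made-up name, the Ghostscript switches are the ones from the command above (with the compatibility policy passed as an integer via -d), and nothing here checks that the result actually validates as PDF/A, which depends on the Ghostscript version and the input.

```shell
# pdf_to_pdfa is a hypothetical helper name.
# Requires pdftops (Poppler or Xpdf) and gs on the PATH.
pdf_to_pdfa() {
    # $1 = input PDF, $2 = output PDF/A
    tmp_ps=$(mktemp) || return 1

    # Step 1: PDF -> PostScript
    pdftops "$1" "$tmp_ps" || { rm -f "$tmp_ps"; return 1; }

    # Step 2: PostScript -> PDF/A, same switches as the command above
    gs -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor \
       -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite \
       -dPDFACompatibilityPolicy=1 \
       -sOutputFile="$2" "$tmp_ps"
    rc=$?

    # Remove the temporary PostScript file whether or not gs succeeded
    rm -f "$tmp_ps"
    return "$rc"
}
```

Usage would be something like: pdf_to_pdfa input.pdf input-A.pdf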
I can crop PDF (for example in Acrobat).
But text outside of the crop margin will still be maintained in the PDF (even though I don't see it in the viewable area).
I want to remove anything outside the crop margin. Is there a command line tool that can do so?
Ghostscript can do that. Ghostscript is a command line tool which is available for all major operating systems.
The command which does it for Linux or Mac OS X:
gs -o cropped-and-removed.pdf \
-sDEVICE=pdfwrite \
-dUseCropBox \
in.pdf
The command for Windows:
gswin64c.exe -o cropped.pdf ^
-sDEVICE=pdfwrite ^
-dUseCropBox ^
in.pdf
Be sure to use a rather recent version of Ghostscript. Current is v9.16.
The requirement is to convert a PDF to PCL with an embedded macro (I am currently testing this on Windows, but I will need to use it at runtime in the application and print from UNIX). The macro will be used later in another document to embed this cropped image, printed on one single page. I will use PCL escape codes to call the MacroNumber, and then the image will be printed. (You can consider this a logo image.)
I am able to crop the whitespace out of the PDF by setting a CropBox:
"c:\progra~1\gs\gs9.15\bin\gswin64.exe" -o _sourcePDFcropped.pdf ^
-sDEVICE=pdfwrite -c "[/CropBox [1 140 320 650] /PAGES pdfmark" ^
-f _sourcePDF.pdf
However, when I convert this _sourcePDFcropped.pdf to PCL, whitespace is still added:
"c:\progra~1\gs\gs9.15\bin\gswin64c.exe" -dBATCH -dNOPAUSE ^
-sDEVICE=pxlcolor -g100x200 -sOutputFile=_sourceFedGroundCroppedTest.pcl ^
-f _sourceFedGroundCropped.pdf
I tried using MKPCL and it does the job. Because it doesn't have much support, I am trying to use Ghostscript.
MKPCL.EXE -c4 -t -m 100 -p Image.jpg Image.MAC
I also tried ImageMagick which internally uses Ghostscript. So I am guessing, if I use the right switches in GS, I should be able to achieve my goal.
Input PDF File: Click Here
P.S.: I have seen other PDF-to-PCL questions on Stack Overflow, but those are more straightforward PDF-to-PCL conversions. Mine is to crop the PDF, and the output should be PCL.
Question continued: Link
I processed the sample input PDF with the following command line, using a self-compiled Ghostscript v9.16 (unreleased, from the current GhostPDL Git sources):
gs -o - \
-sDEVICE=pdfwrite \
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" \
-f source.pdf \
\
| gs -o tst.pcl \
-sDEVICE=pxlcolor \
-dUseCropBox \
-f -
(As you may well have noticed, I'm connecting 2 different Ghostscript commands through a pipe in order to save writing a temporary PDF file to disk.)
If you want to do the same on Windows, the command line in a cmd.exe/DOS box would be:
gswin64c.exe -o - ^
-sDEVICE=pdfwrite ^
-c "[/CropBox [1 140 320 650] /PAGES pdfmark" ^
-f source.pdf ^
^
| gswin64c.exe -o tst.pcl ^
-sDEVICE=pxlcolor ^
-dUseCropBox ^
-f -
Then I opened it with the self-compiled PCL viewer (also from GhostPDL sources), pcl6:
pcl6 tst.pcl
This is a screenshot showing the pcl6 window:
As KenS also pointed out: it is important to use -dUseCropBox when processing the cropped PDF intermediate data!
Adding a CropBox doesn't really do much: it leaves the PDF content exactly the same and only adds a CropBox entry for the page. GS will usually use the MediaBox, not the CropBox, so by default adding a CropBox to a PDF has no effect on what GS renders.
You could try adding -dUseCropBox. If the white space you think is being added is in fact present in the original PDF, but masked by the CropBox, then using -dUseCropBox will have GS use the CropBox when rendering the PDF.
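One way to check whether a file merely carries a CropBox on top of an unchanged MediaBox is to print both boxes. The sketch below assumes pdfinfo from Poppler or Xpdf is installed; show_page_boxes is a made-up helper name.

```shell
# show_page_boxes is a hypothetical helper name.
# pdfinfo -box prints the page boxes; keep only the two that matter here.
show_page_boxes() {
    # $1 = PDF file to inspect
    pdfinfo -box "$1" | grep -E '^(MediaBox|CropBox):'
}
```

If the two boxes differ, Ghostscript will render the larger MediaBox area unless -dUseCropBox is given.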
I split a PDF into pages with the help of this command line:
for G in $(seq 1 $(pdfinfo 47.pdf | sed -n 's/Pages:[^0-9]*\([0-9]*\).*/\1/p')) ; do
gs \
-dSAFER \
-sDEVICE=pdfwrite \
-dBATCH \
-dNOPAUSE \
-dFirstPage=$G \
-dLastPage=$G \
-o $G.pdf \
47.pdf ;
done
But some pages appear without text (graphics are still present).
So I tried to extract the embedded fonts from the PDF:
gs -q -dNODISPLAY extractFonts.ps -c "(47.pdf) extractFonts quit"
I installed these fonts in the system Fonts folder.
After that, I repeated the splitting, but nothing changed.
I have no idea now how to make sure that the pages are extracted correctly.
Ghostscript and pdfwrite are not actually intended for splitting PDF files up. There are other tools which will probably work better; why not try pdftk?
If you really want to use Ghostscript, then I would advise you to get hold of the latest bleeding-edge code from the Git repository. In that code the pdfwrite device will accept an output file name containing a '%d' and will write one file per page.
Beyond that, it seems most likely to me that you are simply experiencing a bug rather than 'losing the font': if the font were missing, the text would still be there, but in a different font. Which version of GS are you using?
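Both suggestions above can be sketched as one-liners. The helper names are made up, the gs variant assumes a Ghostscript build whose pdfwrite device accepts '%d' in the output name, and the pdftk variant uses its burst operation.

```shell
# Both helper names are hypothetical.
# $1 = input PDF, $2 = prefix for the per-page output files.

split_with_gs() {
    # Ghostscript expands %03d to the page number: prefix_001.pdf, prefix_002.pdf, ...
    gs -o "${2}_%03d.pdf" -sDEVICE=pdfwrite "$1"
}

split_with_pdftk() {
    # pdftk's burst operation writes one PDF per page using the same printf-style pattern
    pdftk "$1" burst output "${2}_%03d.pdf"
}
```

Usage would be something like: split_with_pdftk 47.pdf page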
I'm fitting a file with no margins (produced by running pdfcrop on a normal PDF file) to a given paper size using Ghostscript:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dFIXEDMEDIA \
-dPDFFitPage -dBATCH -dQUIET -dNOPAUSE -dDEVICEWIDTHPOINTS=864 \
-dDEVICEHEIGHTPOINTS=612 -sOutputFile=$OUTPUT $INPUT
but the output has additional margins (I was cropping in order to get rid of them).
Is it possible to force Ghostscript to produce output without these margins?
Without seeing your file I cannot be certain. However, I suspect that all you have done is set a /CropBox in the PDF file. By default Ghostscript uses the /MediaBox, which is probably unchanged.
Try setting -dUseCropBox.
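Applied to the command from the question, that suggestion might look like the sketch below. The function name fit_cropped_to_page is made up; the only switch added to the question's own set is -dUseCropBox.

```shell
# fit_cropped_to_page is a hypothetical helper name.
# Fits the CropBox area of the input to an 864x612 pt page.
fit_cropped_to_page() {
    # $1 = cropped input PDF, $2 = output PDF
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
       -dFIXEDMEDIA -dPDFFitPage -dUseCropBox \
       -dBATCH -dQUIET -dNOPAUSE \
       -dDEVICEWIDTHPOINTS=864 -dDEVICEHEIGHTPOINTS=612 \
       -sOutputFile="$2" "$1"
}
```

Usage would be something like: fit_cropped_to_page cropped.pdf fitted.pdf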