Batch converting EPS/PDF to PostScript

I'm on Windows and am trying to batch-convert 6000 PDF files to PostScript files. The reason is that I'm trying to do PDF imposition, as asked here; I first wanted to do it in R, as asked here. I found a library, grImport, that handles vector graphics in R, but it needs .ps files.
I have already managed to batch-convert the .pdf files to .eps with Inkscape using this script. However, I need .ps for the R package. I was unable to do it using an Adobe Acrobat Pro Action (it simply doesn't work on the folder, and freezes when I try it on an individual file).
I have also tried Ghostscript, but setting -sDEVICE=pswrite throws an error saying the device is unknown. Also, I really could not get my head around GS.
How can I do this? (If you happen to know a solution to the main problem, sharing it is very appreciated.) Thanks in advance.

The pswrite device was deprecated years ago (it only produced level 1, big, ugly, unscalable output). You want to use the ps2write device, which produces level 2 PostScript.
A simple command line for Ghostscript would be:
gs -sDEVICE=ps2write -o out.ps input.pdf
There are tools for imposing PDF files; you can even (with some effort) do it with Ghostscript.
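To apply the ps2write command to a whole folder, one option is a small Python wrapper around Ghostscript. This is only a sketch under stated assumptions: the function names and folder arguments are hypothetical, and the default executable name gswin64c assumes a 64-bit Windows install of Ghostscript that is on the PATH (use "gs" on Linux or macOS).

```python
import subprocess
from pathlib import Path


def build_ps2write_command(pdf_path, out_dir, gs_exe="gswin64c"):
    """Return the Ghostscript argument list that converts one PDF to PostScript."""
    pdf_path = Path(pdf_path)
    ps_path = Path(out_dir) / (pdf_path.stem + ".ps")
    # Same options as the one-off command: the ps2write device, with -o naming
    # the output file (-o also implies -dBATCH -dNOPAUSE).
    return [gs_exe, "-sDEVICE=ps2write", "-o", str(ps_path), str(pdf_path)]


def convert_folder(in_dir, out_dir, gs_exe="gswin64c"):
    """Convert every .pdf in in_dir and return the names of files that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        result = subprocess.run(build_ps2write_command(pdf, out_dir, gs_exe))
        if result.returncode != 0:
            # Keep going so one bad file does not stop a 6000-file batch.
            failed.append(pdf.name)
    return failed
```

Calling convert_folder with the input and output folders runs one Ghostscript process per file; collecting the failures instead of stopping makes it easier to retry only the problem files afterwards.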

Related

OCRMYPDF: 'pages' parameter not working as expected even with optimization disabled

I'm using ocrmypdf and I just want the first page of the files to have their characters recognized. I'm trying to do this with
ocrmypdf -l por --force-ocr --pages 1 --optimize 0 input.pdf output.pdf
but even then it outputs
Start processing 10 pages concurrently
The files are in Portuguese, and some of them contain text in fonts that I can't read in Python because the string turns into a lot of "(cid:)" sequences; that's why I use --force-ocr.
Also I have a lot of files (the files are actually a parameter for an application I'm developing), so this is taking too much time.
My operating system is Windows, if that helps.

Does GhostScript require a GPU?

I'm converting PDFs to images using a Node.js package: https://www.npmjs.com/package/pdf2images-multiple
This works successfully in Docker on two different local machines, both of which have graphics cards. However, when I try to run it on a server in Google Cloud (which does not have a GPU), the following error occurs for particular PDF pages that contain graphs:
error: message=Failed to convert page to image, killed=false, code=1, signal=null, cmd=gm convert -density 150 -quality 100 -sharpen 0x1.0 -trim '/usr/src/app/1161115-30-1kabyqq.2bteimgqfr.pdf[7]' '/usr/src/app/pdfimages1161115-30-10uod6h.siy6pyzaor/1161115-30-1kabyqq-7.png', stdout=, stderr=gm convert: "gs" "-q" "-dBATCH" "-dMaxBitmap=50000000" "-dNOPAUSE" "-sDEVICE=pnmraw" "-dTextAlphaBits=4" "-dGraphicsAlphaBits=4" "-r150x150" "-dFirstPage=8" "-dLastPage=8" "-sOutputFile=/usr/src/app/gmxHC5iw" "--" "/usr/src/app/gm0tibSq" "-c" "quit" (child process quit due to signal 11).
gm convert: Postscript delegate failed (/usr/src/app/1161115-30-1kabyqq.2bteimgqfr.pdf).
I've created an AWS instance with a GPU and this error does not occur. I'm looking for an environment variable that would let Ghostscript skip its GPU code path, at least until Google Cloud gets GPUs, or for some alternative that I'm not seeing here.
The failing command in the error message is issued by GraphicsMagick, whose documentation says it doesn't use any GPU techniques:
http://www.graphicsmagick.org/FAQ.html#are-there-any-plans-to-use-opencl-or-cuda-to-use-a-gpu
Ghostscript does not need a GPU, and indeed is not capable of using one (beyond using X to display the bitmap). There is some SIMD code, but you can compile without it; I have no idea how the Ghostscript you are using was compiled.
On Linux, it's often impossible to move a binary from one box to another, because the ABI differs between the two systems in things like the C runtime. Also, if the executable was built against shared libraries (many distributions insist on this), differing versions of those libraries might cause problems.
My guess is that, rather than the presence or absence of a GPU, there is some significant difference between the Google Cloud Linux and the AWS Linux.
The best way to deploy Ghostscript on Linux is to build it from source on the machine you intend to use. This is especially true if you intend to put it on multiple machines with different configurations.
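To check whether the crash comes from the Ghostscript binary itself rather than from GraphicsMagick or the Node.js package, it may help to rerun the gs command from the error message on its own. The sketch below rebuilds that argument list in Python; the function names are hypothetical, the arguments are copied from the error message, and the interpretation of negative return codes applies to POSIX systems only.

```python
import signal
import subprocess


def build_gs_command(pdf_path, page, out_path, gs_exe="gs"):
    """Rebuild the Ghostscript call that GraphicsMagick issued for one page."""
    # page is 1-based here: the failing "[7]" index maps to -dFirstPage=8.
    return [
        gs_exe, "-q", "-dBATCH", "-dMaxBitmap=50000000", "-dNOPAUSE",
        "-sDEVICE=pnmraw", "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4",
        "-r150x150", f"-dFirstPage={page}", f"-dLastPage={page}",
        f"-sOutputFile={out_path}", "--", str(pdf_path), "-c", "quit",
    ]


def describe_exit(returncode):
    """Turn a subprocess return code into a readable diagnosis."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal as a negative number.
        name = signal.Signals(-returncode).name
        return f"killed by signal {-returncode} ({name})"
    return f"exited with status {returncode}"


def run_page(pdf_path, page, out_path, gs_exe="gs"):
    """Run Ghostscript for one page and return (diagnosis, stderr text)."""
    result = subprocess.run(
        build_gs_command(pdf_path, page, out_path, gs_exe),
        capture_output=True, text=True,
    )
    return describe_exit(result.returncode), result.stderr
```

If run_page reports signal 11 on the Google Cloud machine but not elsewhere for the same PDF and page, that points at the Ghostscript build on that machine, which is consistent with rebuilding it from source there.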

Template rendering engine on Raspberry Pi

I have a project in which I am using a Raspberry Pi to print tickets on a thermal printer.
It is pretty much the same principle as in this video.
Tickets are generated from templates that may include text and images. Both text and images are dynamic, for example I may want to print the current time. I receive the template as a .psd from a designer and the thermal printer takes bitmap data. The Raspberry Pi communicates to the printer with a python library. Everything must be done locally as cloud access is not guaranteed. Performance is important.
I investigated several options:
Latex + ImageMagick
Webkit + Phantom.js
Pillow (Python Imaging Library), especially the module ImageDraw
The first option is not quite satisfying because LaTeX generates a PDF file, and ImageMagick is then very slow to convert it to a .png.
The second option is appealing but, if I am not mistaken, I would need to start a server locally.
The third option would be great because it would be pure Python, but it requires building a basic typesetting system on top of PIL.
Has anyone faced a similar problem?

Converting PDF to TIFF

I am looking for a tool or library (a .NET version would be perfect) that I could use to convert some big PDF files (over 200 MB) to TIFF in the product we are developing for our client.
I need a tool I could call from the command line or a library that I could use in the .NET application.
I have tested Ghostscript, and it works perfectly, but according to its license we cannot use it.
Do you have any experience with free or commercial products we could use for it? Could you recommend something?
Thanks in advance!
As you explicitly ask for commercial software as well, callas pdfToolbox performs this task. I'm affiliated with this company / product so draw your own conclusions about quality / price. However, the software:
works perfectly on the command-line
exports to PNG, JPG or TIFF (or rasterized PDF)
exports to either grey, RGB or CMYK
supports smoothing and overprint preview (important when you're in graphic arts, likely less so if not)
is available on Mac OS X, Windows, Linux and Unix
Send me a private message if you want to know more.
Gnostice PDFOne .NET has a PDFDocument.SaveAsMultiPageTiff() method that you can use.
http://www.gnostice.com/nl_article.asp?id=215&t=Convert_A_Multi-Page_TIFF_To_PDF_Using_PDFOne_NET

Converting PDF to JPG with ImageMagick is really slow after update

I have several servers set up using CentOS 5 (64-bit) with the default yum-installed versions of Ghostscript (v8.70) and ImageMagick (v6.2.8), which work really well and are very quick at converting PDF files into JPG previews.
I removed both IM and GS on one of my servers and installed the latest versions, Ghostscript (v9.0.7) and ImageMagick (v6.8.5), from source. The conversion time has gone from around 0.5 seconds to 7.5 seconds for exactly the same original PDF.
I need the later versions of both in order to use the inkcov device, which I use to work out which pages are colour in multi-page PDFs (up to 200 pages).
I am assuming this slowdown is due to the compile options, as I can't believe that the later versions are so much slower. I have searched for ways of optimising at the compilation stage (changing to Q8 rather than Q16 quality, etc.), but nothing seems to make much difference.
thanks
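For reference, the inkcov step mentioned above could be sketched in Python roughly as follows. This is an illustration, not the poster's actual code: the function names are hypothetical, and the parser assumes that Ghostscript's inkcov device prints one line per page consisting of four coverage fractions (C, M, Y, K) followed by "CMYK OK".

```python
import subprocess


def parse_inkcov(output):
    """Return 1-based page numbers whose C, M or Y coverage is non-zero."""
    found = []
    page = 0
    for line in output.splitlines():
        fields = line.split()
        # Assumed line shape: four coverage fractions followed by "CMYK OK".
        if len(fields) >= 6 and fields[4] == "CMYK":
            page += 1
            c, m, y, _k = (float(v) for v in fields[:4])
            if c > 0 or m > 0 or y > 0:
                found.append(page)
    return found


def find_colour_pages(pdf_path, gs_exe="gs"):
    """Run the inkcov device on a PDF and list the pages that use colour."""
    result = subprocess.run(
        [gs_exe, "-q", "-o", "-", "-sDEVICE=inkcov", str(pdf_path)],
        capture_output=True, text=True, check=True,
    )
    return parse_inkcov(result.stdout)
```

Pages are counted in the order their coverage lines appear, so any other lines in the output are simply ignored. This calls Ghostscript directly rather than going through ImageMagick.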