Converting pdf to eps without rasterizing or changing fonts - pdf

I have been trying to convert a pdf vector graphic to eps. I tried two commands from the following answer: https://stackoverflow.com/a/44737018/5661667
The inkscape command inkscape input.pdf --export-eps=output.eps or rather, since --export-eps is deprecated now,
inkscape input.pdf --export-filename=output.eps
nicely converts to a vectorized eps. However, it strangely converts my Times New Roman fonts (the graphic was originally created using matplotlib) to some sans serif font (looks like Arial or something).
The ghostscript version of the conversion from the linked answer
gs -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -sOutputFile=output.eps input.pdf
keeps my fonts nicely. However, the eps seems to be rasterized despite the -dNOCACHE option.
Is there any way to get one of these to just convert my pdf to eps without modifying it?
Further info: I am using Mac OS. For the first part, my suspicion is that I only have an Arial Unicode.tff installed in /Library/Fonts/. I tried installing some other fonts, but no success for my conversion.

I had the same problem when trying to convert a powerpoint generated pdf to eps format using inkscape.
After trying with gs and disabling the transparency I noticed some areas turned black after eps conversion.
gs -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -dNOTRANSPARENCY -sDEVICE=eps2write -sOutputFile=output.eps input.pdf
Coming back to inkscape I noticed that Powerpoint added some transparent objects in these areas that turned black. So I manually removed them using inkscape and when converting to eps again the result was perfect!
In short: if there are transparent elements in your pdf, the fonts will probably be rasterized during eps conversion. So, you need to remove these elements.
Maybe there is an easier way to identify them in inkscape.
In my case I was able to use Find/Replace (Ctrl+F) to search objects with string "clipPath" and with 'Search option = Properties'. Then I open the Objects Tab (Menu Object->Objects...) and use that to delete each transparent object generated by Powerpoint.

Related

Ghostscript pdf conversion makes ligatures unable to copy & paste

I have a pdf (created with latex with \usepackage[a-2b]{pdfx}) where I am able to correctly copy & paste ligatures, i.e., "fi" gets pasted in my text editor as "fi". The pdf is quite large, so I'm trying to reduce its size with this ghostscript command:
gs -dPDFA-2 -dBATCH -dNOPAUSE -sPDFACompatibilityPolicy=1 -sDEVICE=pdfwrite
-dPDFSETTINGS=/printer -sProcessColorModel=DeviceRGB
-sColorConversionStrategy=UseDeviceIndependentColor
-dColorImageDownsampleType=/Bicubic -dAutoRotatePages=/None
-dCompatibilityLevel=1.5 -dEmbedAllFonts=true -dFastWebView=true
-sOutputFile=main_new.pdf main.pdf
While this produces a nice, small pdf, now when I copy a word with "fi", I instead (often) get "ő".
Since the correct characters are somehow encoded in the original pdf, is there some parameter I can give ghostscript so that it simply preserves this information in the converted pdf?
I'm using ghostscript 9.27 on macOS 10.14.
Without seeing your original file, so that I can see the way the text is encoded, it's not possible to be definitive. It certainly is not possible to have the pdfwrite device 'preserve the information'; for an explanation, see here.
If you original PDF file has a ToUnicode CMap then the pdfwrite device should use that to generate a new ToUnicode CMap in the output file, maintaining cut&paste/search. If it doesn't then the conversion process will destroy the encoding. You might be able to get an improvement in results by setting SubsetFonts to false, but it's just a guess without seeing an example.
My guess is that your original file doesn't have a ToUnicode CMap, which means that it's essentially only working by luck.

How can I disable ghostscipt rasterization of images and paths?

I need to convert a PDF to a different ICC color profile. Through different searches and tests, I found out a way to do that:
First I convert my PDF to a PS file with:
.\gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile="test.ps" "test.pdf"
Then I convert the PS back to a PDF with the following (this is to generate a valid PDF/X-3 file):
.\gswin64c.exe -dPDFX -dNOPAUSE -dBATCH -sDEVICE=pdfwrite
-sColorConversionStrategy=/UseDeviceIndependentColor -sProcessColorModel=DeviceCMYK
-dColorAccuracy=2 -dRenderIntent=0 -sOutputICCProfile="WebCoatedSWOP2006Grade5.icc"
-dDeviceGrayToK=true -sOutputFile="final.pdf" test_PDFX_def.ps test.ps
The ICC profile is embedded and all works perfectly. The only problem is that the whole final PDF is rasterized. Here I loose all the paths and other vectorial elements quality I have in the starting file. I need to keep them vectorial because this PDF will have a specific application.
First step don't convert to PostScript!!!
Any transparent marking operations will have to be rendered if you do that, because PostScript doesn't support transparency. Other features will be lost as well, so really, don't do that. The input and output ends of Ghostscript are more or less independent; the pdfwrite device doesn't know whether the input was PDF or PostScript, and doesn't care. So you don't need to convert a PDF file into PostScript before sending it as input.
You can feed the original PDF file into the second command line in place of the PostScript file.
As long as you are producing PDF/X-3 or later then the transparency will be preserved. Make sure you are using an up to date version of Ghostscript.

Converting from pdf to png with ghostscript, result with many white boxes

I'm converting pdf (created with adobe illustrator) into transparent png file, with following command:
gs -q -sDEVICE=pngalpha -r300 -o target.png -f source.pdf
However, there's undesired white boxes in the resulting PNG, looks like it's auto generated by ghostscript, some bounding box. (see attached image)
Tryied both gs-9.05 and gs-9.10, same bad result.
I've tried to export to PNG file from Illustrator or Inkscape manually, the result is good.
What does Inkscape do to render it correct, and
How could I eliminate those white boxes using ghostscript?
Try mudraw of latest (1.3) muPDF, as far as I checked it creates nice PNGs from PDF files with 1.4 transparency:
mudraw -o out.png -c rgba in.pdf
"rgba" being, as you understand, RGB + alpha
In the general case, you can't. PDF does support transparency, but the underlying media is always assumed to be white and opaque. So anywhere that marks are made on the medium is no longer transparent, its white.
You don't say which version of Ghostscript you are using, but if its earlier than 9.10 you could try upgrading.

Forcing Ghostscript to use antialiasing when converting a PDF to PNG?

I'm using GPL Ghostscript 9.07 (2013-02-14) on OS X (10.8.4) to convert many PDFs to PNGs.
It works fine except for one of the PDFs which turns into a PNG with jagged edges. In other words, Ghostscript turns off antialiasing for that particular PDF for some reason.
The PDF in question.
The output:
In other cases it works fine (sample: pdf -> png).
I use this command:
gs -dNOPAUSE -dBATCH -dPDFFitPage -sDEVICE=pngalpha -g200x150 -sOutputFile=01.png 01.pdf
Is it possible to force Ghostscript to use antialiasing for that PDF?
Any tips are appreciated.
This worked for me:
gs -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 -sDEVICE=jpeg -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -r150 -sOutputFile=foo-%d.jpg foo.pdf
Source: ImageMagick convert pdf to jpeg has poor text quality after upgrading ImageMagick version to 6.7.8
The above would work for a JPG; for PNG, replace the -sDEVICE option with your choice, example: -sDEVICE=png16m
Source: http://ghostscript.com/doc/current/Devices.htm
You can try -dGraphicsAlphaBits= with values 1,2 or 4 which may or may not make a difference. It made some improvement for me, but its a small graphic at low resolution with an awkward curve, so not so much as might be expected.
Or you can use one of the anti-aliasing devices (eg tiffscaled) which are more flexible. There is no anti-aliased device for PNG output but it would be trivial to convert TIFF to PNG.
By the way, your PDF file specifically turns off anti-aliasing on the components:
8 0 obj
<</AntiAlias false/ColorSpace/DeviceCMYK/Coords[0.0 0.0 1.0 0.0]/Domain[0.0 1.0]/Extend[true true]/Function 10 0 R/ShadingType 2>>
You might like to try and see what happens if you change AntiAlias to true, though I doubt this will have an effect as I'm pretty sure the aniti-aliasing is applied to the internal rendering of the shading, not the edgses.
You can try -dDOINTERPOLATE which uses a Mitchell filter function to scale the contributions for each output pixel

How to convert PDF to low-resolution (but good quality) JPEG?

When I use the following ghostscript command to generate jpg thumbnails from PDFs, the image quality is often very poor:
gs -q -dNOPAUSE -dBATCH -sDEVICE=jpeggray -g465x600 -dUseCropBox -dPDFFitPage -sOutputFile=pdf_to_lowres.jpg test.pdf
By contrast, if I use ghostscript to generate a high-resolution png, and then use mogrify to convert the high-res png to a low-res jpg, I get pretty good results.
gs -q -dNOPAUSE -dBATCH -sDEVICE=pnggray -g2550x3300 -dUseCropBox -dPDFFitPage -sOutputFile=pdf_to_highres.png test.pdf
mogrify -thumbnail 465x600 -format jpg -write pdf_to_highres_to_lowres.jpg pdf_to_highres.png
Is there any way to achieve good results while bypassing the intermediate pdf -> high-res png step? I need to do this for a large number of pdfs, so I'm trying to minimize the compute time.
Here are links to the images referenced above:
test.pdf
pdf_to_lowres.jpg
pdf_to_highres.png
pdf_to_highres_to_lowres.jpg
One option that seems to improve the output a lot: -dDOINTERPOLATE. Here's what I got by running the same command as you but with the -dDOINTERPOLATE option:
I'm not sure what interpolation method this uses but it seems pretty good, especially in comparison to the results without it.
P.S. Consider outputting PNG images (-sDEVICE=pnggray) instead of JPEG. For most PDF documents (which tend to have just a few solid colors) it's a more appropriate choice.
Your PDF looks like it is just a wrapper around a jpeg already.
Try using the pdfimages program from xpdf to extract the actual image rather than rendering
to a file.