To convert PDF/EPS files to raster images we use ImageMagick. When converting a PDF, the command can look like this:
convert -verbose -density 150 -trim -colorspace sRGB input.pdf -quality 90 -flatten -sharpen 0x1.0 output.png
However, the input PDF (which contains only a few paths) has a specified size of 300cm by 200cm, and Ghostscript doesn't like this and creates a huge PNG. The verbose output of ImageMagick shows this:
"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r150x150" "-sOutputFile=/tmp/magick-3036AW7mUOP25w7J%d" "-f/tmp/magick-3036PxgJinljqMwV" "-f/tmp/magick-30369hcErAROr7V6"
/tmp/magick-3036AW7mUOP25w7J1 PNG 17717x11811 17717x11811+0+0 8-bit sRGB 1.003MB 3.910u 0:03.929
input.pdf PNG 17717x11811 17717x11811+0+0 16-bit sRGB 1.003MB 0.000u 0:00.000
PNG 17717x11811. Huge. I only need a 256x256 image.
I have tried the geometry option (-geometry 256x256) and the density option (before the filename, -density 150) in different configurations, but it does not change the Ghostscript output, for example:
convert -verbose -density 150 -trim -geometry 265x265 -colorspace sRGB input.pdf -quality 90 output.png
"gs" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r150x150" "-sOutputFile=/tmp/magick-3233p3ofct0fiy5T%d" "-f/tmp/magick-3233Rads_vSSpKa6" "-f/tmp/magick-3233LQMrrEFgT0fi"
/tmp/magick-3233p3ofct0fiy5T1 PNG 17717x11811 17717x11811+0+0 8-bit sRGB 1.003MB 3.800u 0:03.799
input.pdf PNG 17717x11811 17717x11811+0+0 16-bit sRGB 1.003MB 0.000u 0:00.000
input.pdf=>output.png PNG 17717x11811=>265x38 321x213+28+65 16-bit sRGB 7.96KB 1.080u 0:00.559
How can you define the constraints for Ghostscript, when using ImageMagick?
I wouldn't say 'Ghostscript doesn't like this'. If the media size is huge, then of course Ghostscript creates a huge PNG; what else did you expect it to do?
300 cm is 118.11 inches, which at 150 dpi works out to a bitmap width of 17716.5 pixels, rounded up to 17717.
200 cm is 78.74 inches, which at 150 dpi works out to 11811 pixels.
If you want it at a lower resolution, then alter the setting of -r ('-density' in ImageMagick). For example, you could set '-density 10'; presumably that would produce a file of about 1181x787 pixels. To get within 256x256 you would need to set the resolution to ~2 dpi. You may, of course, find that it's rather hard to see any detail when the result is that coarse.
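The arithmetic above can be sketched as two small shell helpers. The function names are made up for illustration, and rounding to the nearest pixel is an assumption; Ghostscript's own rounding may differ by a pixel in edge cases.

```shell
# Hypothetical helper: pixels produced for a media length in cm at a given dpi
# (1 inch = 2.54 cm; rounded to the nearest pixel, which is an assumption).
px_from_cm() {
  awk -v cm="$1" -v dpi="$2" 'BEGIN { printf "%.0f\n", cm / 2.54 * dpi }'
}

# Hypothetical helper: resolution (dpi) at which a media length in cm
# spans the requested number of pixels.
dpi_for_px() {
  awk -v px="$1" -v cm="$2" 'BEGIN { printf "%.2f\n", px * 2.54 / cm }'
}

px_from_cm 300 150   # width of the 300 cm page at 150 dpi
px_from_cm 200 150   # height of the 200 cm page at 150 dpi
dpi_for_px 256 300   # dpi needed for the 300 cm side to span 256 pixels
```

The last call gives the value to pass as -density (or -r) so that the longer side comes out at 256 pixels; the shorter side then scales with it.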
Alternatively you can tell Ghostscript the size of media you want, and set 'FIXEDMEDIA' so that it doesn't change in response to requests from the PostScript program or PDF file.
-g sets the media dimensions in pixels, and -dFIXEDMEDIA tells Ghostscript that the media is fixed. You will almost certainly also want to set -dFitPage, or you will only get a tiny portion of the bottom left-hand corner. You will also need to leave -r unset.
-dFIXEDMEDIA, -g and -r are described in use.htm in the Ghostscript documentation.
Depending on the age of your Ghostscript installation, you may not be able to use -dFitPage and might have to use -dPDFFitPage instead.
Almost certainly you will want to do this from the command line using Ghostscript directly instead of ImageMagick; I imagine constructing the command line in IM would be difficult.
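A minimal sketch of such a direct Ghostscript call, built only from the switches named above plus the pngalpha device that ImageMagick's delegate already uses. The helper name and the file names are placeholders, and the helper only prints the command so it can be checked before running it.

```shell
# Hypothetical helper: print (not run) a Ghostscript command that renders a
# PDF onto fixed WIDTHxHEIGHT pixel media, scaled to fit the page.
# Usage: gs_fit_cmd INPUT OUTPUT WIDTH HEIGHT
gs_fit_cmd() {
  # Note: no -r here, as advised above; -g and -dFIXEDMEDIA pin the bitmap size.
  printf 'gs -q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pngalpha -g%sx%s -dFIXEDMEDIA -dFitPage -sOutputFile=%s %s\n' \
    "$3" "$4" "$2" "$1"
}

gs_fit_cmd input.pdf output.png 256 256
```

On an older Ghostscript, swapping -dFitPage for -dPDFFitPage in the printed command would be the thing to try.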
I use Imagemagick convert to convert pdf file to png as follows:
Magick convert -density 300 PointOnLine.pdf -quality 90 PointOnLine.png
It gives me the following warning:
convert: profile 'icc': 'RGB ': RGB color space not permitted on grayscale PNG `PointOnLine.png' # warning/png.c/MagickPNGWarningHandler/1744.
And the PNG image created is all black. However, converting to JPG works fine.
Update: After adding -define profile:skip=ICC, the image is still dark. If I convert to JPG and then to PNG, it is OK, but the background is dark. The same warning is still there. What is the problem? Thanks.
The following works for me without error in ImageMagick 7.0.7.22 Q16 Mac OSX Sierra with Ghostscript 9.21 and libpng #1.6.34_0. Your PDF has an alpha channel, so you might want to flatten it.
magick -density 300 PointOnLine.pdf -flatten -quality 90 result.png
This also works without error, but leaves the alpha channel in the png, though you won't see it here until you extract the image:
magick -density 300 PointOnLine.pdf -quality 90 result2.png
Note that in IM 7 you should just use magick and not magick convert.
Check that you are using a current version of Ghostscript and libpng, if you do not get the same results.
Your delegates.xml file for PS:alpha should show sDEVICE=pngalpha rather than pnmraw as follows.
<delegate decode="ps:alpha" stealth="True" command="&quot;gs&quot; -sstdout=%%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>
Command which worked for me was:
magick -density 300 PointOnLine.pdf -depth 8 -strip -background white -alpha off PointOnLine.tiff
It did not give any warning, and it also removed the black background.
I was able to convert it to text afterwards using tesseract:
tesseract PointOnLine.tiff PointOnLine
I understand you are using ImageMagick under Windows, even though this is not stated (and the respective versions of IM and Windows were not posted).
I am on Ubuntu 16.04 LTS, and I will provide an answer that is possibly useful. (Under Windows, prepend everything with Magick.)
For me,
convert -density 300 -quality 90 PointOnLine.pdf PointOnLine.png
works fine, with no warnings, producing a suitable output.
I tried other things which work as well, some of them may suit you.
First convert your pdf to RGB and then to png.
convert -density 300 -colorspace RGB PointOnLine.pdf PointOnLine_rgb.pdf
convert -density 300 PointOnLine_rgb.pdf PointOnLine_rgb.png
If you post your PDF, I can check it out. Otherwise, perhaps it is CMYK, which PNG does not support. So try
magick -quiet -density 300 -colorspace srgb PointOnLine.pdf -quality 90 PointOnLine.png
Note that in IM 7 you should use magick, not magick convert. Also note that -quality means something different for PNG than for JPG. See https://www.imagemagick.org/script/command-line-options.php#quality
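As I read the linked -quality documentation, for PNG the value is not a visual quality percentage: the tens digit sets the zlib compression level and the ones digit selects the PNG filter type. A small sketch of that split (the helper name is made up for illustration):

```shell
# Hypothetical helper: split a PNG -quality value into its two parts,
# zlib compression level (quality / 10) and filter type (quality % 10).
png_quality_parts() {
  printf 'zlib_level=%d filter_type=%d\n' $(( $1 / 10 )) $(( $1 % 10 ))
}

png_quality_parts 90
```

So -quality 90 on a PNG asks for maximum compression effort and has no effect on how the pixels look, unlike with JPG.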
I had the same issue and resolved it by adding -colorspace RGB before the output filename.
convert -density 300 PointOnLine.pdf -quality 90 -colorspace RGB PointOnLine.png
How can I convert an sRGB PDF to a CMYK PDF on the Linux command line while keeping file sizes small and quality high?
If I use ImageMagick convert, I get the correct colors with this command:
convert -profile /usr/share/ghostscript/9.05/iccprofiles/srgb.icc -profile /usr/share/ghostscript/9.05/iccprofiles/default_cmyk.icc sRGB.pdf default_cmyk.pdf
But this does not keep the image sharp. If I use convert's quality options to solve this, the file size blows up.
The quality of the image is much better if I use Ghostscript.
gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite -sColorConversionStrategy=CMYK -dProcessColorModel=/DeviceCMYK -sOutputICCProfile=/usr/share/ghostscript/9.05/iccprofiles/default_cmyk.icc -dOverrideICC -sOutputFile=gs_default_cmyk.pdf sRGB.pdf
But then the colors do not match.
How can I get the colors I get with convert and the line accuracy and file size I get with GhostScript?
I run the following command to split a PDF in ImageMagick:
convert file.pdf[5-10] file.png
The resulting output files are always suffixed starting with zero. That is:
file-0.png, file-1.png, file-2.png...
Any ideas what I might be doing wrong? The documentation states that the files should be suffixed starting at 5, matching the page numbers of the pages extracted.
I ended up solving this by using the -scene # command line parameter.
This causes the output to begin at the desired index. For posterity:
convert file.pdf -scene 5 file-%d.png
You see the result you describe because ImageMagick's page count for multi-page image formats is zero-based: Page 1 will have index 0, page 2 will have index 1, etc.
Also, ImageMagick cannot process PDF input files itself: it employs Ghostscript as its 'delegate' -- Ghostscript consumes the PDF first and emits a raster file for each PDF page. Only these raster files are then processed by ImageMagick.
Depending on your exact ImageMagick version and IM setup, this may result in an indirect PNG output generation, and the conversion chain may look like this:
PDF --> PPM (portable pixmap) --> PNG
     ^                         ^
     |                         |
     |                         +-- (handled by ImageMagick)
     +-- (handled by Ghostscript)
If you are unlucky, the result will be slow and the quality may not be as good as it could be.
To verify what exactly happens in a convert a.pdf a.png command, you can add the -verbose parameter. That will show you the Ghostscript command being employed by IM to process the PDF input:
convert -verbose a.pdf a.png
/var/tmp/magick-15951W3TZ3WRpwIUk1 PNG 612x792 612x792+0+0 8-bit sRGB 3.73KB 0.000u 0:00.000
a.pdf PDF 612x792 612x792+0+0 16-bit sRGB 3.73KB 0.000u 0:00.000
a.pdf=>a.png PDF 612x792 612x792+0+0 8-bit sRGB 2c 2.95KB 0.000u 0:00.000
[ghostscript library] -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT \
-dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" \
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" \
"-sOutputFile=/var/tmp/magick-15951W3TZ3WRpwIUk%d" \
"-f/var/tmp/magick-15951nJD8-fF8kA7j" \
"-f/var/tmp/magick-15951JTZDMwtEswHn"
(As you can see, my IM installation is set up to do a PDF->PNG conversion without the detour via PPM... Your mileage may vary.)
You may get better results when using Ghostscript directly, instead of running an IM convert command. (If ImageMagick works at all with PDF->PNG conversion, you have a working Ghostscript installation for sure.) So you can try this:
gs \
-o file-%03d.png \
-sDEVICE=pngalpha \
file.pdf
The -%03d file name suffix will cause Ghostscript to output file-001.png, file-002.png, file-003.png.
However, if you are unlucky and have an older version of Ghostscript installed, the numbering will start at file-000.png instead...
In any case, since your sample command seems to suggest that you want to convert only a page range (5--10) from the PDF file (not all pages), here is the command to use:
gs \
-o file-%03d.png \
-sDEVICE=pngalpha \
-dFirstPage=5 \
-dLastPage=10 \
file.pdf
But the bad news here is: Ghostscript will STILL name the output files starting at file-001.png (page 5) and ending at file-006.png (page 10).
To work around that, you'll have to generate the PNGs for the first 4 pages too, and later delete them again:
gs \
-o file-%03d.png \
-sDEVICE=pngalpha \
-dFirstPage=1 \
-dLastPage=10 \
file.pdf
rm -f file-00{1,2,3,4}.png
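An alternative to rendering and deleting the unwanted pages would be to render only the range and then shift the file numbers afterwards. A minimal sketch, assuming the file-%03d.png naming used above; the helper name is made up for illustration.

```shell
# Hypothetical helper: shift Ghostscript's 1-based output numbering so that
# PREFIX-001.png (the first rendered page) becomes PREFIX-<FIRST_PAGE>.png.
# Usage: renumber_pages PREFIX FIRST_PAGE COUNT
renumber_pages() (
  prefix=$1
  first=$2
  i=$3
  # Walk from the highest index down so that a target name never collides
  # with a source file that has not been moved yet.
  while [ "$i" -ge 1 ]; do
    src=$(printf '%s-%03d.png' "$prefix" "$i")
    dst=$(printf '%s-%03d.png' "$prefix" $(( i + first - 1 )))
    if [ -e "$src" ] && [ "$src" != "$dst" ]; then
      mv -- "$src" "$dst"
    fi
    i=$(( i - 1 ))
  done
)
```

After running the -dFirstPage=5 -dLastPage=10 command shown earlier, `renumber_pages file 5 6` should leave file-005.png through file-010.png.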
I created color separations using
gs -sDEVICE=tiffsep -dNOPAUSE -dBATCH -dSAFER -r600x600 -sOutputFile=p%08d.tif input.pdf
The outputs are all greyscale separations as documented.
Questions
1. How do I combine just the CYAN and MAGENTA separations (or any combination of colors) to make a PDF file?
2. How do I make sure the output PDF from the combo is in color and not greyscale?
Thanks.
You should be able to use Imagemagick's convert to combine CMYK separations; references:
http://www.productionmonkeys.net/guides/ghostscript/examples
CMYK colors inverted with draw or annotate - ImageMagick forum
Combine 4 grayscale images into a final CMYK image - ImageMagick
Example: first, create an RGB PDF with LaTeX (from https://softwarerecs.stackexchange.com/questions/19210/linux-gui-for-quick-browsing-of-cmyk-separations-of-multi-page-pdf); use this as test.tex:
\documentclass{standalone}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
\draw[fill=none,draw=black,line width=2pt] (0cm,0cm) rectangle (4cm,5cm);
\draw[fill=red] (1cm,1cm) circle (1cm) ;
\draw[fill=blue] (2cm,2.5cm) circle (1cm) ;
\draw[fill=green] (3cm,4cm) circle (1cm) ;
\end{tikzpicture}
\end{document}
... then build the PDF with:
pdflatex test.tex
Split the RGB PDF into CMYK separations as TIFF images using Ghostscript (see the original softwarerecs thread for images); the separation files are named after the output pattern plus the ink name, e.g. test0001(Cyan).tif:
gs -sDEVICE=tiffsep -dNOPAUSE -dBATCH -dSAFER -r150x150 -sOutputFile=test%04d.tif test.pdf
Merge/combine CMYK separations into a CMYK color tiff using Imagemagick:
convert \
test0001\(Cyan\).tif \
test0001\(Magenta\).tif \
test0001\(Yellow\).tif \
test0001\(Black\).tif \
-set colorspace CMYK -negate -combine combined.tif
... and here is what the final combined.tif looks like (I had to run convert combined.tif combined.png to upload it here, since .tif alone is not accepted):
For comparison, here is a png derived from the original PDF (convert -density 150 -flatten test.pdf test.png):
Notice how the colors are slightly different, which is expected due to the colorspace roundtrip. Also, note that for more correct colors, you'll probably have to use ICC profiles during conversion...
Finally, you should find a way to convert/import the final CMYK color TIFF into a PDF... (probably either ghostscript or imagemagick could do that, but I haven't tried it..)
For just cyan and magenta - use a white image with the same size as the channel TIF separations, to insert it in place of the missing separations:
convert -size 240x299 xc:white white.png
... and then do the merge again:
convert \
test0001\(Cyan\).tif \
test0001\(Magenta\).tif \
white.png \
white.png \
-set colorspace CMYK -negate -combine combinedCM.tif
Here is the output (after convert combinedCM.tif combinedCM.png):
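The same white-placeholder trick works for any combination of inks. A rough sketch that only prints the convert command for a chosen subset, assuming the test0001(Ink).tif naming from above and an existing white.png of matching size; the helper name is made up for illustration.

```shell
# Hypothetical helper: print (not run) a convert command that merges only the
# requested inks, substituting white.png for every separation left out.
# Usage: combine_cmd BASENAME OUTPUT INK...
combine_cmd() (
  base=$1
  out=$2
  shift 2
  wanted=" $* "
  cmd="convert"
  # The channel order must stay Cyan, Magenta, Yellow, Black for -combine.
  for ink in Cyan Magenta Yellow Black; do
    case "$wanted" in
      *" $ink "*) cmd="$cmd '$base($ink).tif'" ;;
      *)          cmd="$cmd white.png" ;;
    esac
  done
  printf '%s -set colorspace CMYK -negate -combine %s\n' "$cmd" "$out"
)

combine_cmd test0001 combinedCM.tif Cyan Magenta
```

The printed line can be pasted into the shell once it looks right; the single quotes protect the parentheses in the file names.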
Open the separations in an image editor which supports CMYK channels (eg Photoshop), combine the channels as required, save as PDF (or PostScript and use GS to convert to PDF).
GM is unable to detect background transparency in a PDF: a PNG created using "gm convert" gets a white background, while the same PDF is converted to a PNG with a transparent background by IM.
$ convert -verbose /var/tmp/abc.pdf /var/tmp/abc.png
/var/tmp/magick-16370Tq7WYv5U54Pa1 PNG 288x720 288x720+0+0 8-bit sRGB 20.7KB 0.000u 0:00.009
/var/tmp/abc.pdf PDF 288x720 288x720+0+0 16-bit sRGB 20.7KB 0.000u 0:00.000
/var/tmp/abc.pdf=>/var/tmp/abc.png PDF 288x720 288x720+0+0 8-bit sRGB 17c 16.6KB 0.010u 0:00.009
[ghostscript library] -q -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pngalpha" -dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72" "-sOutputFile=/var/tmp/magick-16370Tq7WYv5U54Pa%d" "-f/var/tmp/magick-16370CVWmPbzBmjpF" "-f/var/tmp/magick-16370khy6Y-G3TgtO"
$ gm convert -verbose /var/tmp/abc.pdf /var/tmp/abc.png
gm convert: "gs" "-q" "-dBATCH" "-dMaxBitmap=50000000" "-dNOPAUSE" "-sDEVICE=pnmraw" "-dTextAlphaBits=4" "-dGraphicsAlphaBits=4" "-r72x72" "-sOutputFile=/var/folders/6d/n_hv45rs1jv17nxwfjwj776cspn_3c/T/gmoCp6rG" "--" "/var/folders/6d/n_hv45rs1jv17nxwfjwj776cspn_3c/T/gmBEgWnK" "-c" "quit".
/var/tmp/abc.pdf PDF 288x720+0+0 DirectClass 8-bit 607.6K 0.000u 0:01
/var/tmp/abc.pdf=>/var/tmp/abc.png PNG 288x720+0+0 DirectClass 8-bit 0.000u 0:01
Upon further investigation, it seems "identify" from IM can correctly identify background in PDF but "gm identify" from GM cannot.
$ identify -verbose abc.pdf
Image: abc.pdf
Format: PDF (Portable Document Format)
Type: Bilevel
Colorspace: Gray
Depth: 16/4-bit
Channel depth:
gray: 1-bit
alpha: 4-bit
Alpha: graya(255,0) #FFFFFFFFFFFF0000
Colors: 16
Background color: graya(255,1)
Transparent color: graya(0,0)
Version: ImageMagick 6.8.9-1 Q16 x86_64 2014-07-01 http://www.imagemagick.org
$ gm identify -verbose abc.pdf
gm identify: "gs" "-q" "-dBATCH" "-dMaxBitmap=50000000" "-dNOPAUSE" "-sDEVICE=pnmraw" "-dTextAlphaBits=4" "-dGraphicsAlphaBits=4" "-r72x72" "-sOutputFile=/var/folders/6d/n_hv45rs1jv17nxwfjwj776cspn_3c/T/gmzhBEIk" "--" "/var/folders/6d/n_hv45rs1jv17nxwfjwj776cspn_3c/T/gmAPm2Po" "-c" "quit".
Image: abc.pdf
Format: PDF (Portable Document Format)
Type: grayscale
Depth: 4 bits-per-pixel component
Channel Depths:
Gray: 4 bits
Background Color: white
Comment: Image generated by GPL Ghostscript (device=pnmraw)
Signature: 215f1c08ec575526ce398d193c4df22faaea100c10255e0db747641bdaaeac49
Tainted: False
The reason why your (ImageMagick) convert and your (GraphicsMagick) gm convert commands produce different output is this:
1. Neither utility is able to process PDF input files directly; both can only handle raster image formats.
2. In order to process PDF input files, both utilities resort to a 'delegate' program: in both cases this is Ghostscript (which CAN process PDF input files).
3. However, the two utilities use different 'delegate command lines' (as can be seen directly in your quoted -verbose command line outputs):
i. convert employs pngalpha as its Ghostscript output device.
ii. gm convert employs pnmraw as its Ghostscript output device.
4. Both utilities then process the output of their delegate's command into the final (raster) format file.
The problem is: the raster format 'pnmraw' does not support transparency (an alpha channel), but 'pngalpha' does. Hence, the utility which first converts PDF input to pnmraw has lost the transparent page backgrounds and replaced them by (opaque) white backgrounds.
Unless you modify your GraphicsMagick setup to make it use pngalpha in its delegate command (the same as ImageMagick uses) your gm convert will not show transparent background.
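To check which device a given setup uses, the -sDEVICE value can be pulled out of the -verbose output or out of a delegate line. A minimal sketch; the helper name is made up for illustration, and it reports the last -sDEVICE it finds on each input line.

```shell
# Hypothetical helper: print the Ghostscript device named in each delegate
# command line read from stdin (lines without -sDEVICE print nothing).
gs_device() {
  sed -n 's/.*-sDEVICE=\([A-Za-z0-9]*\).*/\1/p'
}

# Sample taken from the gm convert -verbose output quoted above (shortened).
printf '%s\n' '"gs" "-q" "-dBATCH" "-sDEVICE=pnmraw" "-r72x72"' | gs_device
```

In practice something like `gm convert -verbose abc.pdf abc.png 2>&1 | gs_device` should show whether the delegate renders through pngalpha (alpha preserved) or through pnmraw (alpha lost).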
Just wanted to add to Kurt Pfeifle's answer, since it pointed me to this solution. The configuration that he is referring to is found in the delegates.mgk file (graphicsmagick/1.3.19_1/lib/GraphicsMagick/config).
For me the issue was this line:
<!-- Read color Postscript, EPS, and PDF -->
<delegate decode="gs-color" stealth="True" command='"gs" -q -dBATCH -dMaxBitmap=50000000 -dNOPAUSE -sDEVICE=ppmraw -dTextAlphaBits=%u -dGraphicsAlphaBits=%u -r%s %s "-sOutputFile=%s" -- "%s" -c quit' />
I changed it to:
<delegate decode="gs-color+alpha" stealth="True" command='"gs" -q -dBATCH -dMaxBitmap=50000000 -dNOPAUSE -sDEVICE=pngalpha -dTextAlphaBits=%u -dGraphicsAlphaBits=%u -r%s %s "-sOutputFile=%s" -- "%s" -c quit' />
and my pngs come out with transparent backgrounds!