Converting PDF with ImageMagick leaves extra white space around the cropped area

I am trying to convert .pdf files to .jpg using ImageMagick:
convert -limit map 300 -flatten -density 300 -quality 100 -crop '400x400+20+20' dummy.pdf[0] test.jpg
The problem is that when I convert the file, it crops the area but fills all the other area with white.
For example, if I convert a 1000x1000 PDF and crop it to 100x100, the output is still a 1000x1000 image: the 100x100 area cropped from the PDF, with the rest white space.
I cannot use -trim, since my PDF may or may not have a white border, and -trim would remove it.

Your options are not in the proper order for ImageMagick. Settings that control how the PDF is read, such as -density, go before the input file, but operators such as -flatten and -crop need to come after reading the input PDF. Using ImageMagick 6.9.10.71 Q16 on Mac OSX Sierra:
convert -limit map 300 -density 300 dummy.pdf[0] -background white -flatten -crop '400x400+20+20' -quality 100 test.jpg
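Because the crop geometry is measured in rendered pixels, it helps to know how large the page is at a given density. PDF page sizes are in points (1/72 inch), so a page W points wide renders to W * density / 72 pixels; a 612x792 pt US Letter page at -density 300 becomes 2550x3300 px. The sketch below (POSIX sh; the function names are hypothetical) computes that and wraps the reordered command. The +repage is an addition not in the answer above: it discards the virtual canvas that -crop leaves behind, which matters for formats such as PNG that store page offsets.

```shell
#!/bin/sh
# Convert a PDF length in whole points (1/72 inch) to pixels at a given
# density (dpi), rounding to the nearest pixel. Integer arguments only.
pt_to_px() {
  echo $(( ($1 * $2 + 36) / 72 ))
}

# Crop one page of a PDF (page index is zero-based).
# -density is a read setting, so it precedes the input file;
# -flatten, -crop and +repage are operators, so they follow it.
crop_pdf_page() (
  pdf=$1 page=$2 geometry=$3 out=$4 density=${5:-300}
  convert -density "$density" "${pdf}[${page}]" \
    -background white -flatten \
    -crop "$geometry" +repage \
    -quality 100 "$out"
)
```

For example, crop_pdf_page dummy.pdf 0 400x400+20+20 test.jpg runs the answer's command with +repage added after the crop.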


Converting a PDF page to an image using GraphicsMagick

How do I convert only page 2 of a PDF file to a JPG image file using the GraphicsMagick command line?
What option can I use with the gm.exe convert command?
gm.exe convert testing.pdf testing.jpg
Add the page number (starting from zero) in square brackets after the PDF filename:
gm.exe convert testing.pdf[1] testing.jpg
By the way, you can use the same indexing technique for accessing specific frames of a GIF animation, or layers of multi-layer/directory TIFFs.
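Since the index is zero-based, page 2 is [1]. A small helper can do the off-by-one conversion; this is a sketch with hypothetical function names, in POSIX sh. Quoting the resulting spec also stops shells such as zsh from treating the brackets as a glob pattern.

```shell
#!/bin/sh
# Turn a 1-based page number into the zero-based "file[index]" frame spec.
page_spec() {
  if [ "$2" -lt 1 ]; then
    echo "page numbers start at 1" >&2
    return 1
  fi
  printf '%s[%d]\n' "$1" "$(( $2 - 1 ))"
}

# Convert a single page (1-based) of a PDF to an image with GraphicsMagick.
gm_page_to_image() (
  spec=$(page_spec "$1" "$2") || exit 1
  gm convert "$spec" "$3"
)
```

For example, gm_page_to_image testing.pdf 2 testing.jpg runs gm convert testing.pdf[1] testing.jpg.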
Use the command below to get a high-quality PNG with a white background:
magick convert -density 300 -quality 100% -background white -alpha remove -alpha off ./646.04.pdf ./x.png
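To apply the same white-background conversion to every PDF in a directory, a loop like the one below may help. It is a sketch with a hypothetical function name, assuming ImageMagick 7 (magick): it keeps -density before the input, puts -background and -alpha after it, and leaves out -quality, which means something different for PNG than for JPG. For multi-page PDFs ImageMagick writes one file per page, numbered name-0.png, name-1.png, and so on.

```shell
#!/bin/sh
# Render every PDF in a directory to PNG on a white background (ImageMagick 7).
# Output goes next to each input: report.pdf -> report.png (report-N.png per page).
pdfs_to_png_white() (
  dir=$1 density=${2:-300}
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # an unmatched glob stays literal; skip it
    magick -density "$density" "$pdf" \
      -background white -alpha remove -alpha off "${pdf%.pdf}.png"
  done
)
```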

convert PDF to EPS to PNG without text

How can I convert a PDF to PNG and filter out the text?
I want to render images and vector graphics (including vector text) without plain text.
With the commands below, only the image is extracted, not the whole page of the PDF:
gs -sDEVICE=eps2write -dFILTERTEXT -dFirstPage=1 -dLastPage=1 -o out.eps 091.pdf
convert -density 300 -background white -alpha off out.eps -resize 2480x3508! OUT.png
The FILTERTEXT switch (as I'm pretty sure is documented) only works on text, not on images that contain a pattern of pixels which look like text, or on vector linework which looks like text.
Without seeing your EPS I can't tell if the text you are complaining about is text or not, but my guess would be not.
By the way, if you want a PNG then there's no reason to convert the EPS to PDF; just render the EPS directly to a PNG file.
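Following that advice, Ghostscript can render straight to PNG without an intermediate step. This is a sketch with a hypothetical function name, assuming your Ghostscript version honors -dFILTERTEXT for raster devices as it did for eps2write in the question. png16m is Ghostscript's 24-bit RGB PNG device. The same command should also accept the EPS file as input (add -dEPSCrop to crop to its bounding box).

```shell
#!/bin/sh
# Render one page (1-based) of a PDF straight to a 24-bit PNG with Ghostscript,
# dropping text objects via -dFILTERTEXT. -o implies -dBATCH -dNOPAUSE.
pdf_page_to_png_no_text() (
  pdf=$1 page=$2 out=$3 res=${4:-300}
  gs -dSAFER -sDEVICE=png16m -r"$res" \
    -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
    -dFILTERTEXT -dFirstPage="$page" -dLastPage="$page" \
    -o "$out" "$pdf"
)
```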

Error in converting images in ImageMagick

I use ImageMagick convert to convert a PDF file to PNG as follows:
Magick convert -density 300 PointOnLine.pdf -quality 90 PointOnLine.png
It gives me the following warning:
convert: profile 'icc': 'RGB ': RGB color space not permitted on grayscale PNG `PointOnLine.png' # warning/png.c/MagickPNGWarningHandler/1744.
The PNG image created is all black. However, converting to a JPG image works fine.
Update: After adding -define profile:skip=ICC, the image is still dark. If I convert to JPG and then to PNG, the image is OK, but the background is dark. The same warning is still there. What is the problem? Thanks.
The following works for me without error in ImageMagick 7.0.7.22 Q16 Mac OSX Sierra with Ghostscript 9.21 and libpng #1.6.34_0. Your PDF has an alpha channel, so you might want to flatten it.
magick -density 300 PointOnLine.pdf -flatten -quality 90 result.png
This also works without error, but leaves the alpha channel in the png, though you won't see it here until you extract the image:
magick -density 300 PointOnLine.pdf -quality 90 result2.png
Note that in IM 7 you should just use magick and not magick convert.
If you do not get the same results, check that you are using current versions of Ghostscript and libpng.
The ps:alpha entry in your delegates.xml file should use -sDEVICE=pngalpha rather than pnmraw, as follows:
<delegate decode="ps:alpha" stealth="True" command="&quot;gs&quot; -sstdout=%%stderr -dQUIET -dSAFER -dBATCH -dNOPAUSE -dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 &quot;-sDEVICE=pngalpha&quot; -dTextAlphaBits=%u -dGraphicsAlphaBits=%u &quot;-r%s&quot; %s &quot;-sOutputFile=%s&quot; &quot;-f%s&quot; &quot;-f%s&quot;"/>
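To check which device your installation actually uses, you can pull the ps:alpha entry out of delegates.xml. This is a sketch with a hypothetical function name; the file's location depends on how ImageMagick was installed, so its path is passed in as an argument.

```shell
#!/bin/sh
# Print the Ghostscript -sDEVICE used by the ps:alpha delegate in a delegates.xml file.
# The file's location varies by installation, so its path is passed as $1.
psalpha_device() {
  grep 'decode="ps:alpha"' "$1" |
    sed -n 's/.*-sDEVICE=\([A-Za-z0-9]*\).*/\1/p'
}
```

Run against your delegates.xml, it should print pngalpha; if it prints pnmraw, edit the entry to match the line above.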
(The user requested that the result images I posted be removed.)
The command that worked for me was:
magick -density 300 PointOnLine.pdf -depth 8 -strip -background white -alpha off PointOnLine.tiff
It gave no warnings and also removed the black background.
Afterwards I was able to extract the text using tesseract:
tesseract PointOnLine.tiff PointOnLine
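The two steps can be chained so that tesseract only runs if the TIFF was written. This is a sketch with a hypothetical function name, assuming ImageMagick 7 and tesseract are on the PATH; tesseract writes its plain-text result to the output base name plus .txt.

```shell
#!/bin/sh
# Render a PDF to a TIFF with the answer's options, then OCR it with tesseract.
# tesseract only runs if ImageMagick succeeded; text goes to "$base.txt".
ocr_pdf() (
  pdf=$1 base=$2 density=${3:-300}
  magick -density "$density" "$pdf" -depth 8 -strip \
    -background white -alpha off "$base.tiff" &&
    tesseract "$base.tiff" "$base"
)
```

For example, ocr_pdf PointOnLine.pdf PointOnLine reproduces the two commands above.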
I assume you are using ImageMagick under Windows, even though you did not say so (and did not post your versions of IM and Windows).
I am on Ubuntu 16.04 LTS, but this answer may still be useful. (Under Windows, prefix each command with magick.)
For me,
convert -density 300 -quality 90 PointOnLine.pdf PointOnLine.png
works fine, with no warnings, and produces suitable output.
I also tried other approaches that work; some of them may suit you.
First convert your PDF to RGB, then convert that to PNG:
convert -density 300 -colorspace RGB PointOnLine.pdf PointOnLine_rgb.pdf
convert -density 300 PointOnLine_rgb.pdf PointOnLine_rgb.png
If you post your PDF, I can check it. Otherwise, perhaps it is CMYK, which PNG does not support. So try:
magick -quiet -density 300 -colorspace srgb PointOnLine.pdf -quality 90 PointOnLine.png
Note that in IM 7 you should use magick, not magick convert. Also note that -quality means something different for PNG than for JPG. See https://www.imagemagick.org/script/command-line-options.php#quality
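Per the ImageMagick documentation linked above, for PNG the -quality value is read as two digits: the tens digit is the zlib compression level and the units digit is the PNG filter type (5 meaning adaptive filtering). So -quality 90 means maximum zlib compression with no filtering, not "90% quality", and the PNG stays lossless either way. A tiny helper (hypothetical name) makes the split explicit:

```shell
#!/bin/sh
# Decode ImageMagick's PNG -quality value (0-99) into its two parts:
# tens digit = zlib compression level, units digit = PNG filter type.
png_quality_parts() {
  echo "zlib=$(( $1 / 10 )) filter=$(( $1 % 10 ))"
}
```

png_quality_parts 90 prints zlib=9 filter=0; the default, 75, prints zlib=7 filter=5.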
I had the same issue and resolved it by adding -colorspace RGB before the output filename:
convert -density 300 PointOnLine.pdf -quality 90 -colorspace RGB PointOnLine.png

Configuration and optimization of ImageMagick and Tesseract

We are using ImageMagick and tesseract to try to read information in documents, but we have not found the right configuration and combination of the two tools to optimize the original scanned TIFF document and then apply tesseract to it to extract the information.
First we scan the document at 300 dpi; the resulting TIFF is usually about 170 KB.
Then we preprocess the image with ImageMagick before passing it to tesseract 3.0.3 to produce a PDF with a text layer.
The first command we use is this one:
convert page.tiff -respect-parentheses -compress LZW -density 300 \
  -bordercolor black -border 1 -fuzz 1% -trim +repage \
  -fill white -draw "color 0,0 floodfill" -alpha off -shave 1x1 \
  -bordercolor black -border 2 \
  -fill white -draw "color 0,0 floodfill" -alpha off -shave 0x1 \
  -fuzz 1% -deskew 40 +repage temp.tiff
And then we apply it to tesseract this way:
tesseract -l spa temp.tiff temp pdf
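To run the lighter cleanup (the second convert command below) and the searchable-PDF step over a whole batch of scans, a loop like this may help. It is a sketch with a hypothetical function name. It adds +repage after -trim, as the first command does, so the trimmed TIFF carries no leftover page offset, and it skips the NAME.clean.tiff intermediates it creates so that reruns do not OCR them twice.

```shell
#!/bin/sh
# Clean each scanned TIFF in a directory, then have tesseract write a searchable PDF.
# Intermediates are named NAME.clean.tiff; outputs are NAME.pdf.
ocr_tiff_dir() (
  dir=$1 lang=${2:-spa}
  for tif in "$dir"/*.tif "$dir"/*.tiff; do
    [ -e "$tif" ] || continue                      # unmatched glob pattern
    case $tif in *.clean.tiff) continue ;; esac    # skip our own intermediates
    base=${tif%.*}
    convert "$tif" -compress LZW -fuzz 1% -trim +repage \
      -alpha off -shave 1x1 "$base.clean.tiff" &&
      tesseract -l "$lang" "$base.clean.tiff" "$base" pdf
  done
)
```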
This produces quite a heavy PDF (https://drive.google.com/open?id=0B3CPIZ_TyzFXd2UtWldfajR4SVU), but tesseract cannot read data in table cells, or the data just below a table header when the header background is darker.
Then we tried this command with convert:
convert page.tiff -compress LZW -fuzz 1% -trim -alpha off -shave 1x1 temp.tiff
This produces a very light PDF (https://drive.google.com/open?id=0B3CPIZ_TyzFXWFEwT3JucDBTVVU), but we still have the same problems.
Could someone point us to an approach for optimizing the image so that we can extract information like that in the example, or to guidelines for preprocessing images to improve tesseract accuracy?
The documents we are trying to process vary widely, with different font types and sizes.
If on a Unix-based system, you could try my script, textcleaner, at http://www.fmwconcepts.com/imagemagick/index.php

GraphicsMagick - convert images to PDF and vice versa

I want to switch from ImageMagick to GraphicsMagick,
but I am running into some issues with the syntax.
With ImageMagick:
First I need to merge some images into a PDF:
convert -density 300 page_*.tif output.pdf
Then I need to create a thumbnail of the first page of the PDF:
convert -density 300 file.pdf[0] -background white -alpha remove -resize 140x140 -strip -quality 40 thumb.jpg[0]
This works fine, but I want to switch the first command to GraphicsMagick.
With GraphicsMagick/ImageMagick:
The GraphicsMagick syntax here works fine:
gm convert -density 300 page_*.tif output.pdf
But when I then create the thumbnail with ImageMagick from the PDF that GraphicsMagick produced, the output has the right size, but the actual page is shrunk inside the image itself.
Thumbnail with ImageMagick:
https://secure.dyndev.dk/data/voucher/30000/400/30435_eb7e5d0a9df71b2783e2fa89efd9de12fcdb9679.pdf
Thumbnail with GraphicsMagick:
https://secure.dyndev.dk/data/voucher/30000/400/30433_7710d6404534b0868ab8da41dd651e971b70e16b.pdf
I just hit the same issue and found a solution here:
https://blog.josephscott.org/2009/11/16/imagemagick-convert-pdf-to-jpg-partial-image-size-problem/
You need to change your convert command to:
convert -density 300 -define "pdf:use-cropbox=true" file.pdf[0] -background white -alpha remove -resize 140x140 -strip -quality 40 thumb.jpg[0]
You may also want to add -resize "2000x2000>" to limit the size of the resulting JPEG, especially with high density values.
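The fixed thumbnail command can be wrapped as a reusable function; this is a sketch with a hypothetical name. -define pdf:use-cropbox=true tells ImageMagick to render the PDF's CropBox instead of its MediaBox, which, per the linked post, is what avoids the page appearing shrunk inside a larger canvas. The sketch writes to a plain output filename and takes the thumbnail size as an optional third argument.

```shell
#!/bin/sh
# Make a JPEG thumbnail of the first page of a PDF, honoring the PDF CropBox.
pdf_thumb() (
  pdf=$1 out=$2 size=${3:-140x140}
  convert -density 300 -define pdf:use-cropbox=true "${pdf}[0]" \
    -background white -alpha remove -resize "$size" \
    -strip -quality 40 "$out"
)
```

For example, pdf_thumb file.pdf thumb.jpg produces the 140x140 thumbnail from the answer above.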