Identify image depth and print size via Carrierwave, mini-magick and ImageMagick - ruby-on-rails-3

The goal is to validate an image based on dynamic height and width parameters, as well as DPI.
ImageMagick has the following command Identify which has a number of options.
-density
will generate the geometry widthxheight
-verbose
will generate a helpful "Print size: " and "Resolution"... among 78 other different lines... where width and height need to be parsed out to meet minimum requirements +/- 2%
so how does one extract those into a method, without stepping on intermediate toes (mini-magick)?

As the section on meta-information indicates, MiniMagick accesses the data using ImageMagick function in a single call, for example height density:
image["%y"]
ImageMagick has 47 single-letter attribute percent escapes that allows extraction of the data, provided your call to the image contains a ".path" suffix
image = MiniMagick::Image.open(#yourClass.theColumn.path)

Related

How to get DPI of a PDF file?

Using ImageMagick or GhostScript or any PHP code how can I get the DPI value of PDF files?
Here is the link for two demo files
http://jmp.sh/O5g5wL4 -- of 72 DPI
http://jmp.sh/RxrnYrY -- of 300 DPI
I have used
$image = new Imagick();
$image->readImage('xyz.pdf');
$resolutions = $image->getImageResolution();
It gives the same result for two different PDF files having different DPI.
I have also used
pdfimages -list xyz.pdf
It gives a list of all information but how to fetch the DPI value from the list.
How to get the exact DPI value of a PDF?
As fmw42 says PDF files themselves have no resolution. However in your case both the files consist of nothing but an image. In one case the image is ~48 MB and in the other its around 200 MB.
The reason is that the images have a different effective resolution.
In PDF the image is simply a bitmap, a sequence of coloured pixels. These are then drawn onto the underlying media. At this point there is no resolution, the pixels are laid down in a specific media size. In your case 22 inches by 82 inches.
The effective resolution is given by dividing the dimension by the number of pixels in the image in that dimension.
So if I have an image which is 1000x1000 pixels, and I draw it in a 1 inch square, then the effective resolution of the image is 1000 dpi. If I change my mind and draw it in a square 4 inches by 4 inches, then the effective resolution is 250 dpi.
The image hasn't changed, just the area it covers.
Now consider I have two images drawn in 1 inch squares. the first image is 1000x1000, the second is 500x500. The effective resolution of the first image is 1000 dpi, the effective resolution of the second is 500 dpi.
So you can see that, in PDF, the effective resolution of the image is a combination of the dimensions of the image, and the dimensions of the media it covers.
That's a difficult thing to measure in a PDF file. The area covered is calculated using matrix algebra and can be a combination of several different matrices.
The actual dimensions of the image, by contrast are quite easy to determine, they are given in the image dictionary. Your images are: 1620x5868 and 3372x12225. In both cases the media is the same size; 22.5x81.5 inches.
Since the images cover the entire media, the effective resolutions are;
1620/22.5 = 72 by 5868/81.5 = 72
3372/22.5 = 149.866 by 12225/81.5 = 150
I think MuPDF will give you image dimensions and media dimensions, assuming all your PDF files are constructed like this you can then simply perform the maths, but note that this won't be so simple for ordinary PDF files where images don't cover the entire media.
Using mutool info -I -M 150-dpi.pdf gives:
Retrieving info from pages 1-1...
Mediaboxes (1):
1 (6 0 R): [ 0 0 1620 5868 ]
Images (1):
1 (6 0 R): [ DCT ] 3375x12225 8bpc DevCMYK (12 0 R)
So there's your image dimensions and your media size. All you need to do is apply the division of one by the other.
Note: In debian and related distros, mutool is contained in mupdf-tools package, not in mupdf package itself. It can by therefore installed by sudo apt install mupdf-tools.
I use pdfimages -list from the poppler library, gives you all the information about the images.

I need detect the approximate location of QR code in scanned image (PDF converted to PNG)

I have many scanned document in PDF.
I use ImageMagick with Ghostscript to convert PDF to PNG in big density. I use convert -density 288 2.pdf 2.png. After that I read the pixels with PHP and find where is QR code and decode it. Because image is very big (~ 2500px), it's need very much RAM. I want, before I read pixels with PHP, to crop the image with ImageMagick and leave only that part with the QR code.
Can I detect the approximate location of QR code with ImageMagick, crop and leave only that part ?
Sample PDF
Converted PNG
Further Update
I see your discussion with Kurt about better extraction of the image from the PDF in the first place, and his recommendation was to use pdfimages. I just wanted to add that you won't find that if you do brew search pdfimages, but you actually need to use
brew install poppler
and then you get the pdfimages executable.
Updated Answer
If you change the tile size to 100x100 on the crop command and run this for the second PDF you supplied:
convert -density 288 pdf2.pdf -crop 100x100 tile%04d.png
and then use the same entropy analysis command
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
...
...
0.84432:+600+3100:tile0750.png
0.846019:+600+2800:tile0678.png
0.980938:+700+400:tile0103.png
0.984906:+700+500:tile0127.png
0.988808:+600+400:tile0102.png
0.998365:+600+500:tile0126.png
The last 4 listed tiles are
Likewise for the other PDF file you supplied, you get
0.863498:+1900+500:tile0139.png
0.954581:+2000+500:tile0140.png
0.974077:+1900+600:tile0163.png
0.97671:+2000+600:tile0164.png
which means these tiles
I would think that should help you pretty much approximately locate the QR code.
Original Answer
This is not all that scientific, but it may help you get started. The key, I think, is the entropy of the various areas of the image. The QR code has a lot of information encoded in a small area so it should have high entropy. So, I use ImageMagick to split the image into square 400x400 tiles like this:
convert image.png -crop 400x400 tile%03d.png
which gives me 54 tiles. Then I calculate the entropy of each of the tiles and sort them by increasing entropy, also outputting their offsets from the top left of the frame, and their name, like this:
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
0.00408949:+1200+2800:tile045.png
0.00473755:+1600+2800:tile046.png
0.00944815:+800+2800:tile044.png
0.0142171:+1200+3200:tile051.png
0.0143607:+1600+3200:tile052.png
0.0341039:+400+2800:tile043.png
0.0349564:+800+3200:tile050.png
0.0359226:+800+0:tile002.png
0.0549334:+800+400:tile008.png
0.0556793:+400+3200:tile049.png
0.0589632:+400+0:tile001.png
0.0649078:+1200+0:tile003.png
0.10811:+1200+400:tile009.png
0.116287:+2000+3200:tile053.png
0.120092:+800+800:tile014.png
0.12454:+0+2800:tile042.png
0.125963:+1600+0:tile004.png
0.128795:+800+1200:tile020.png
0.133506:+0+400:tile006.png
0.139894:+1600+400:tile010.png
0.143205:+2000+2800:tile047.png
0.144552:+400+2400:tile037.png
0.153143:+0+0:tile000.png
0.154167:+400+400:tile007.png
0.173786:+0+2400:tile036.png
0.17545:+400+1600:tile025.png
0.193964:+2000+400:tile011.png
0.209993:+0+3200:tile048.png
0.211954:+1200+800:tile015.png
0.215337:+400+2000:tile031.png
0.218159:+800+1600:tile026.png
0.230095:+2000+1200:tile023.png
0.237791:+2000+0:tile005.png
0.239336:+2000+1600:tile029.png
0.24275:+800+2400:tile038.png
0.244751:+0+2000:tile030.png
0.254958:+800+2000:tile032.png
0.271722:+2000+2000:tile035.png
0.275329:+0+1600:tile024.png
0.278992:+2000+800:tile017.png
0.282241:+400+1200:tile019.png
0.285228:+1200+1200:tile021.png
0.290524:+400+800:tile013.png
0.320734:+0+800:tile012.png
0.330168:+1600+2000:tile034.png
0.360795:+1200+2000:tile033.png
0.391519:+0+1200:tile018.png
0.421396:+1200+1600:tile027.png
0.421421:+2000+2400:tile041.png
0.421696:+1600+2400:tile040.png
0.486866:+1600+1600:tile028.png
0.489479:+1600+800:tile016.png
0.611449:+1600+1200:tile022.png
0.674079:+1200+2400:tile039.png
and, hey presto, the last one listed (i.e. the one with the highest entropy) tile039.png is this one.
I have drawn a rectangle around its location using this command
convert image.png -stroke red -fill none -strokewidth 3 -draw "rectangle 1200,2400 1600,2800" a.jpg
I concede there may be luck involved, but I only have one image to test my mad theories. You may need to tile twice, the second time with an x-offset and y-offset of half a tile width, so that you don't cut the QR code and split it across 2 tiles. You may need different size tiles for different size barcodes. You may need to consider the last 3-5 tiles located for your next algorithm. But I think it could form the basis of a method.

ImageMagick losing aspect ratio

Very simply I have a script that calls imagemagick on my photos.
The original image is 320 x 444, and I want to create a few scaled down versions but keeping the same aspect ratio
I call imagemagick using
convert oldfile.png -resize 290x newfile.png
I want to scale it to my set widths but the heights scale accordingly.
I do 80x, 160x and 290x in 3 separate commands.
The smallest 2 produce images with the same original aspect ratio, the 290x does not.
The size of the image it produces is 290 x 402
I have no idea why that one fails to keep the aspect ratio but the other 2 sizes maintaine it.
Any ideas?
I think that the problem in the third command is the requested size itself:
in the first command both dimensions are divided by 4: 320/4=80 and 444/4=111
in the second command both dimensions are divided by 2: 320/2=160 and 444/2=222
444 and 320 have only two common divisors: 2 and 4. You already used these divisors in your first two commands, so any other (width, height) couple will give you a slightly different aspect ratio: it is impossible to obtain the same exact aspect ratio fixing 290.
In fact while your original image has an aspect ratio of 1.3875, with a 290x403 image you would obtain an aspect ratio of 1,389655172 and with a 290x402 image you would get a 1,386206897 ratio: fixing 290 there is no other dimension's value that can give you the desired aspect ratio.
In general however Imagemagick always tries to preserve the aspect ratio of the image, as you can read in Imagemagick documentation:
The argument to the resize operator is the area into which the image
should be fitted. This area is not the final size of the image but the
maximum sizes for the image. that is because IM tries to preserve the
aspect ratio of the image more than the final (unless a '!' flag is
given), but at least one (if not both) of the final dimensions should
match the argument given image. So let me be clear... Resize will fit
the image into the requested size. It does NOT fill, the requested box
size.
For further reference see here

Create a tiff with only images and no text from a postscript file with ghostscript

Is it possible to create a tiff file from a postscript-file (created from a pdf-document with readable text and images) into a tiff file without the text and only the images?
There is a way to create a tiff with no images, but I don't know how to use that way for my task. I need it to generate two images from a postscript-file - the first one with the images only and the second one with the text only.
Since the text is drawn over the top of the image, simple clipping won't do the job.
You can hack the text out by redefining the show operators to no-ops. Insert this after the %%Page comment line (where the page code really starts).
/show{pop}def
/ashow{3{pop}repeat}def
/widthshow{4{pop}repeat}def
/awidthshow{6{pop}repeat}def
/kshow{2{pop}repeat}def
/xshow{2{pop}repeat}def
/xyshow{2{pop}repeat}def
/yshow{2{pop}repeat}def
/glyphshow{pop}def
/cshow{2{pop}repeat}def
This will suppress all text-drawing operators. Edit: Now includes level 2 and 3 operators.
If you're trying to selectively suppress different kinds of elements, you may want to redefine only some of these operators. You can add % at the beginning of a line to comment-out a line of the code, keeping the full list intact (for future uses).
Another way to selectively suppress elements from a ps file is to use the powerful clipping mechanism.
144 288 72 72 rectclip % clip to a 1"x1" square 2" from the left, 4" from the bottom
Clip works with an arbitrary path. So you can even string together points in a connect-the-dots fashion to get the effect of a lasso-clip. Probably easiest if you print the image out and trace a grid to easily plot points for the trajectory.
100 100 moveto
200 200 lineto
300 100 lineto
500 500 lineto
200 700 lineto
closepath clip % clip to a non-rectangular convex polygon
Clipping will suppress all drawing operations that fall outside of the clippath while the clipping path is in effect.

Image Magick: Image optimization for websites

I have a camera which produces photographs of 3008x2000 pixels. I use Image Magick to scale and resize the photos to be put up on my website. The size of the images I am using on the website is 602x400. I use this command to reduce the size:
convert DSC_0124.JPG -scale 20% -size 24% img1.jpg
This produces an image which is 602x400 pixels in size. But the file size will be always above 250KB. More images on a single html page means the page will be heavier and loading time will be longer. Are there any features in image magic that will help me to keep the file size as small as possible, possibly, below 100KB. But the image size should be the same, that is, 602x400px. I have achieved similar optimisation with SEAMonster tool for MS Windows. As it doean't have a commandline alternative, it wouldn't be of much help when there are hundreds of images to be converted.
Use command as Delan proposed with additional "-strip" flag to remove EXIF data, this have reduced the size of some of my images drastically. Here is a bash script for unix platforms, but you can use the second part only for individual images.
for X in *.jpg; do convert "$X" -resize 602x400 -strip -quality 86 "$X"; done
This will convert all images in the directory.
Use -quality to set the compression level:
convert DSC_0124.JPG -scale 20% -size 24% -quality [0..100] img1.jpg
You can define the maximum size of the output image at 100KB like this:
convert DSC_0124.JPG -resize 602x400! -strip -define jpeg:extent=100KB img1.jpg
If you are running your website on PHP, you might want to consider the SLIR image resizing script, it does a great job resizing to various constraints (see below) and caches the results.
Parameters:
w Maximum width
h Maximum height
c Crop ratio
q Quality
b Background fill color
p Progressive
http://shiftingpixel.com/2008/03/03/smart-image-resizer/
http://code.google.com/p/smart-lencioni-image-resizer/