GraphicsMagick crop PDF - pdf

I've got a 8.5x11 PDF at 300dpi. It has a single UPC label in the top left corner of the PDF. Imagine that there could be 30 labels on a 1 sheet, but we just have 1 label.
I'm trying to crop the PDF to be just the size of the 1 label. So far I've got this
gm convert -density 300 single.pdf out.pdf
Which doesn't do any cropping. When I crop to say 300x100 it makes a 20MB file with 30000 pages.
I have not a clue how to use -crop to actually crop to the correct size. I need it to be 3.5inches by 1.125 inches.

Using the following input PDF (here converted to a PNG):
the following command will crop the label:
gm wiz.pdf -crop 180x50+1+1 cropped.pdf
This label is sized 180x50 pixels.
For an 8.5x11in PDF at 300 PPI you'd have a 2450x3300 pixels PDF (which I doubt you do, but that's another question) and you'd need to use -crop 1050x337+0+0 (more exactly, 1050x337.5+0+0 -- but you cannot crop half pixels!).
Note, the +0+0 part crops the top left corner. If you need offset to the right by N pixels and to the bottom by M pixels use +N+M...
Using ImageMagick instead...
You could also use ImageMagick's convert command:
convert wiz.pdf[180x50+1+1] cropped.pdf
Comment about image sizes...
One additional comment about this remark:
"I have not a clue how to use -crop to actually crop to the correct size."
There is no other real size for raster images than pixels. ABC pixels wide and XYZ pixels high...
There is no such thing as an absolute, real size for a digital image that you can measure in inches... unless you additionally can state the resolution at which a given image is rendered on a display or a print device!
An 8.50x11in sized image at 300 PPI will translate to 2550x3300 pixels.
However, if your image does not contain this amount of pixels (which is the real, absolute size of any raster image), you may still be able to render it at 300 PPI -- but its size in inches will be different from 8.5x11in!
So, whenever you want to crop, use the absolute number of pixels you want. Don't use resolution/density at all on your command line!

Related

How to get DPI of a PDF file?

Using ImageMagick or GhostScript or any PHP code how can I get the DPI value of PDF files?
Here is the link for two demo files
http://jmp.sh/O5g5wL4 -- of 72 DPI
http://jmp.sh/RxrnYrY -- of 300 DPI
I have used
$image = new Imagick();
$image->readImage('xyz.pdf');
$resolutions = $image->getImageResolution();
It gives the same result for two different PDF files having different DPI.
I have also used
pdfimages -list xyz.pdf
It gives a list of all information but how to fetch the DPI value from the list.
How to get the exact DPI value of a PDF?
As fmw42 says PDF files themselves have no resolution. However in your case both the files consist of nothing but an image. In one case the image is ~48 MB and in the other its around 200 MB.
The reason is that the images have a different effective resolution.
In PDF the image is simply a bitmap, a sequence of coloured pixels. These are then drawn onto the underlying media. At this point there is no resolution, the pixels are laid down in a specific media size. In your case 22 inches by 82 inches.
The effective resolution is given by dividing the dimension by the number of pixels in the image in that dimension.
So if I have an image which is 1000x1000 pixels, and I draw it in a 1 inch square, then the effective resolution of the image is 1000 dpi. If I change my mind and draw it in a square 4 inches by 4 inches, then the effective resolution is 250 dpi.
The image hasn't changed, just the area it covers.
Now consider I have two images drawn in 1 inch squares. the first image is 1000x1000, the second is 500x500. The effective resolution of the first image is 1000 dpi, the effective resolution of the second is 500 dpi.
So you can see that, in PDF, the effective resolution of the image is a combination of the dimensions of the image, and the dimensions of the media it covers.
That's a difficult thing to measure in a PDF file. The area covered is calculated using matrix algebra and can be a combination of several different matrices.
The actual dimensions of the image, by contrast are quite easy to determine, they are given in the image dictionary. Your images are: 1620x5868 and 3372x12225. In both cases the media is the same size; 22.5x81.5 inches.
Since the images cover the entire media, the effective resolutions are;
1620/22.5 = 72 by 5868/81.5 = 72
3372/22.5 = 149.866 by 12225/81.5 = 150
I think MuPDF will give you image dimensions and media dimensions, assuming all your PDF files are constructed like this you can then simply perform the maths, but note that this won't be so simple for ordinary PDF files where images don't cover the entire media.
Using mutool info -I -M 150-dpi.pdf gives:
Retrieving info from pages 1-1...
Mediaboxes (1):
1 (6 0 R): [ 0 0 1620 5868 ]
Images (1):
1 (6 0 R): [ DCT ] 3375x12225 8bpc DevCMYK (12 0 R)
So there's your image dimensions and your media size. All you need to do is apply the division of one by the other.
Note: In debian and related distros, mutool is contained in mupdf-tools package, not in mupdf package itself. It can by therefore installed by sudo apt install mupdf-tools.
I use pdfimages -list from the poppler library, gives you all the information about the images.

Tiling an image over a page with ImageMagick with print margins?

I am trying to get ImageMagick to do something for me and I am running into a few problems. First, I am not understanding units of measure and such passed into ImageMagick and so my script is not producing what I need. Second, the way I am doing it is extremely inefficient. Running this script takes a very long time (the one you see below is slightly trimmed down from what I am running).
So to what I am doing... I have a number of svg files with icons in them. I am looking to generate a page for each of these files. The page generated will contain the icon tiled over the entire page with a margin on the side. I am looking for 1/2 inch tiles with 1/2 margins around the page which needs to be a US Letter (8 1/2 x 11 inch).
After reading a lot of the documentation this is what I came up with.
colors=(red blue purple yellow green black)
mkdir -p generated/icons/
for color in ${colors[#]}; do
images=`printf "source/icons/${color}.svg%.0s " {1..300}`
montage $images -tile 15x20 -page Letter+1+1 -units PixelsPerInch -density 2550x3300 \
generated/icons/${color}.pdf
done
So for each of my files I run montage. I use printf to repeat the image file name 300 times. I then tile this 15x20 times. 15x20 comes from 8.5 minus 1 inch margins = 7.5*2 = 15 and likewise (11-1)*2 = 20. 300 images come from 15*20. I then say I want this on a letter page offset 1x1. (This was my attempt at a margin) I say I am speaking in pixel per inch (but none of the units seem to match up). I set the dpi to 300 by the density command where 8.5*300 = 2550 and 11*300 = 3300.
I've been toying with other settings (geometry etc.) but none of these are working. And the units don't seem to make sense either... Right now my resultant pdf is a square etc...
How do I make tiled pages as such? Also is there a way for me to do this more efficiently? What I have thus far is very slow.
EDIT:
Some more information:
i:montage --version
Version: ImageMagick 6.8.8-10 Q16 x86_64 2015-03-10 http://www.imagemagick.org
tile image:
my current output:
Notice margins not right, is square not a letter page, also tiles as skewed
Given the PNG image you provided, and I presume you want a 1 inch border of white all around inside an 8.5x11 inch printed image. Thus the tiled width would be 7.5 inches and tiled height would be 10 inches.
1 in = 300 dpi so border thickness = 300 px = 2 tiles thick
11-1 = 10 inches tall for tiled region height = 10*300 = 3000 px
8.5-1 = 7.5 inches wide for tiled region width = 7.5*300 = 2250 px
1 tile = 0.5 inches at 300 dpi = 0.5*300 = 150 px
convert lUDbK.png -resize "150x150!" -write mpr:tile +delete -size 2250x3000 tile:mpr:tile -bordercolor white -border 300 -units pixelsperinch -density 300 tiled_page.png
Time to process was 1.75 sec on my Mac Mini.
This produces an image which is rather large. You will have to extract the image to see the border, since this page background is white.
(Note that PNG only supports pixelspercentimeter, but IM converts my specification of pixelperinch accordingly. So if you look at the meta data, it will probably show you some other density in units of pixelspercentimeter. But they will correspond to the desired 300 dpi.)

I need detect the approximate location of QR code in scanned image (PDF converted to PNG)

I have many scanned document in PDF.
I use ImageMagick with Ghostscript to convert PDF to PNG in big density. I use convert -density 288 2.pdf 2.png. After that I read the pixels with PHP and find where is QR code and decode it. Because image is very big (~ 2500px), it's need very much RAM. I want, before I read pixels with PHP, to crop the image with ImageMagick and leave only that part with the QR code.
Can I detect the approximate location of QR code with ImageMagick, crop and leave only that part ?
Sample PDF
Converted PNG
Further Update
I see your discussion with Kurt about better extraction of the image from the PDF in the first place, and his recommendation was to use pdfimages. I just wanted to add that you won't find that if you do brew search pdfimages, but you actually need to use
brew install poppler
and then you get the pdfimages executable.
Updated Answer
If you change the tile size to 100x100 on the crop command and run this for the second PDF you supplied:
convert -density 288 pdf2.pdf -crop 100x100 tile%04d.png
and then use the same entropy analysis command
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
...
...
0.84432:+600+3100:tile0750.png
0.846019:+600+2800:tile0678.png
0.980938:+700+400:tile0103.png
0.984906:+700+500:tile0127.png
0.988808:+600+400:tile0102.png
0.998365:+600+500:tile0126.png
The last 4 listed tiles are
Likewise for the other PDF file you supplied, you get
0.863498:+1900+500:tile0139.png
0.954581:+2000+500:tile0140.png
0.974077:+1900+600:tile0163.png
0.97671:+2000+600:tile0164.png
which means these tiles
I would think that should help you pretty much approximately locate the QR code.
Original Answer
This is not all that scientific, but it may help you get started. The key, I think, is the entropy of the various areas of the image. The QR code has a lot of information encoded in a small area so it should have high entropy. So, I use ImageMagick to split the image into square 400x400 tiles like this:
convert image.png -crop 400x400 tile%03d.png
which gives me 54 tiles. Then I calculate the entropy of each of the tiles and sort them by increasing entropy, also outputting their offsets from the top left of the frame, and their name, like this:
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
0.00408949:+1200+2800:tile045.png
0.00473755:+1600+2800:tile046.png
0.00944815:+800+2800:tile044.png
0.0142171:+1200+3200:tile051.png
0.0143607:+1600+3200:tile052.png
0.0341039:+400+2800:tile043.png
0.0349564:+800+3200:tile050.png
0.0359226:+800+0:tile002.png
0.0549334:+800+400:tile008.png
0.0556793:+400+3200:tile049.png
0.0589632:+400+0:tile001.png
0.0649078:+1200+0:tile003.png
0.10811:+1200+400:tile009.png
0.116287:+2000+3200:tile053.png
0.120092:+800+800:tile014.png
0.12454:+0+2800:tile042.png
0.125963:+1600+0:tile004.png
0.128795:+800+1200:tile020.png
0.133506:+0+400:tile006.png
0.139894:+1600+400:tile010.png
0.143205:+2000+2800:tile047.png
0.144552:+400+2400:tile037.png
0.153143:+0+0:tile000.png
0.154167:+400+400:tile007.png
0.173786:+0+2400:tile036.png
0.17545:+400+1600:tile025.png
0.193964:+2000+400:tile011.png
0.209993:+0+3200:tile048.png
0.211954:+1200+800:tile015.png
0.215337:+400+2000:tile031.png
0.218159:+800+1600:tile026.png
0.230095:+2000+1200:tile023.png
0.237791:+2000+0:tile005.png
0.239336:+2000+1600:tile029.png
0.24275:+800+2400:tile038.png
0.244751:+0+2000:tile030.png
0.254958:+800+2000:tile032.png
0.271722:+2000+2000:tile035.png
0.275329:+0+1600:tile024.png
0.278992:+2000+800:tile017.png
0.282241:+400+1200:tile019.png
0.285228:+1200+1200:tile021.png
0.290524:+400+800:tile013.png
0.320734:+0+800:tile012.png
0.330168:+1600+2000:tile034.png
0.360795:+1200+2000:tile033.png
0.391519:+0+1200:tile018.png
0.421396:+1200+1600:tile027.png
0.421421:+2000+2400:tile041.png
0.421696:+1600+2400:tile040.png
0.486866:+1600+1600:tile028.png
0.489479:+1600+800:tile016.png
0.611449:+1600+1200:tile022.png
0.674079:+1200+2400:tile039.png
and, hey presto, the last one listed (i.e. the one with the highest entropy) tile039.png is this one.
I have drawn a rectangle around its location using this command
convert image.png -stroke red -fill none -strokewidth 3 -draw "rectangle 1200,2400 1600,2800" a.jpg
I concede there may be luck involved, but I only have one image to test my mad theories. You may need to tile twice, the second time with an x-offset and y-offset of half a tile width, so that you don't cut the QR code and split it across 2 tiles. You may need different size tiles for different size barcodes. You may need to consider the last 3-5 tiles located for your next algorithm. But I think it could form the basis of a method.

ImageMagick losing aspect ratio

Very simply I have a script that calls imagemagick on my photos.
The original image is 320 x 444, and I want to create a few scaled down versions but keeping the same aspect ratio
I call imagemagick using
convert oldfile.png -resize 290x newfile.png
I want to scale it to my set widths but the heights scale accordingly.
I do 80x, 160x and 290x in 3 separate commands.
The smallest 2 produce images with the same original aspect ratio, the 290x does not.
The size of the image it produces is 290 x 402
I have no idea why that one fails to keep the aspect ratio but the other 2 sizes maintaine it.
Any ideas?
I think that the problem in the third command is the requested size itself:
in the first command both dimensions are divided by 4: 320/4=80 and 444/4=111
in the second command both dimensions are divided by 2: 320/2=160 and 444/2=222
444 and 320 have only two common divisors: 2 and 4. You already used these divisors in your first two commands, so any other (width, height) couple will give you a slightly different aspect ratio: it is impossible to obtain the same exact aspect ratio fixing 290.
In fact while your original image has an aspect ratio of 1.3875, with a 290x403 image you would obtain an aspect ratio of 1,389655172 and with a 290x402 image you would get a 1,386206897 ratio: fixing 290 there is no other dimension's value that can give you the desired aspect ratio.
In general however Imagemagick always tries to preserve the aspect ratio of the image, as you can read in Imagemagick documentation:
The argument to the resize operator is the area into which the image
should be fitted. This area is not the final size of the image but the
maximum sizes for the image. that is because IM tries to preserve the
aspect ratio of the image more than the final (unless a '!' flag is
given), but at least one (if not both) of the final dimensions should
match the argument given image. So let me be clear... Resize will fit
the image into the requested size. It does NOT fill, the requested box
size.
For further reference see here

Image Magick: Image optimization for websites

I have a camera which produces photographs of 3008x2000 pixels. I use Image Magick to scale and resize the photos to be put up on my website. The size of the images I am using on the website is 602x400. I use this command to reduce the size:
convert DSC_0124.JPG -scale 20% -size 24% img1.jpg
This produces an image which is 602x400 pixels in size. But the file size will be always above 250KB. More images on a single html page means the page will be heavier and loading time will be longer. Are there any features in image magic that will help me to keep the file size as small as possible, possibly, below 100KB. But the image size should be the same, that is, 602x400px. I have achieved similar optimisation with SEAMonster tool for MS Windows. As it doean't have a commandline alternative, it wouldn't be of much help when there are hundreds of images to be converted.
Use command as Delan proposed with additional "-strip" flag to remove EXIF data, this have reduced the size of some of my images drastically. Here is a bash script for unix platforms, but you can use the second part only for individual images.
for X in *.jpg; do convert "$X" -resize 602x400 -strip -quality 86 "$X"; done
This will convert all images in the directory.
Use -quality to set the compression level:
convert DSC_0124.JPG -scale 20% -size 24% -quality [0..100] img1.jpg
You can define the maximum size of the output image at 100KB like this:
convert DSC_0124.JPG -resize 602x400! -strip -define jpeg:extent=100KB img1.jpg
If you are running your website on PHP, you might want to consider the SLIR image resizing script, it does a great job resizing to various constraints (see below) and caches the results.
Parameters:
w Maximum width
h Maximum height
c Crop ratio
q Quality
b Background fill color
p Progressive
http://shiftingpixel.com/2008/03/03/smart-image-resizer/
http://code.google.com/p/smart-lencioni-image-resizer/