Decrease pdf-filesize in ImageMagick - pdf

I am using GIMP to convert grayscale PNM-files (scanned documents) to PDF.
My goal is a small filesize. (Ideally: viewable on different devices without any problems and maybe suitable long-term preservation - PDF/A?)
So far, so good. Trying to reproduce that process with ImageMagick in a batch script doesn't give me that same small filesize as in GIMP.
GIMP (Ver. 2.8.14) workflow:
Open File
Change resolution (density) to 300x300 Pixel/in
Set threshold to 127 (=50%)
Export as OutGIMP.pdf
ImageMagick (Ver. 6.7.9-0 2012-09-16 Q16) workflow:
convert Scan.pnm -density 300x300 -threshold 50% -monochrome OutA.pdf
convert Scan.pnm -density 300x300 -threshold 50% -monochrome OutB.png
convert OutB.png OutC.pdf
Using an example File this results in:
OutGIMP.pdf: 141.195 Byte
OutA.pdf: 684.245 Byte
OutB.png: 137.246 Byte
OutC.pdf: 217.860 Byte
How can I get a PDF with ImageMagick that is at least as small as the GIMP-PDF?
Edit
Continuing the GIMP (Ver. 2.8.14) workflow from above with:
Scale to 100x100 Pixel/in while keeping the Imagesize
Export as OutGIMP_100ppi.pdf
strangely results in:
OutGIMP_100ppi.pdf: 179.123 Byte

Related

converting a pdf page to an image using GraphicsMagick

How do I convert only page 2 of a pdf file to a jpg image file, using GraphicsMagick command line prompt?
What option can I use in the gm.exe convert command?
gm.exe convert testing.pdf testing.jpg
Add the page number (starting from zero) in square brackets after the PDF filename:
gm.exe convert testing.pdf[1] testing.jpg
By the way, you can use the same indexing technique for accessing specific frames of a GIF animation, or layers of multi-layer/directory TIFFs.
use the blow command, will get high quality png with white background.
magick convert -density 300 -quality 100% -background white -alpha remove -alpha off ./646.04.pdf ./x.png

I need detect the approximate location of QR code in scanned image (PDF converted to PNG)

I have many scanned document in PDF.
I use ImageMagick with Ghostscript to convert PDF to PNG in big density. I use convert -density 288 2.pdf 2.png. After that I read the pixels with PHP and find where is QR code and decode it. Because image is very big (~ 2500px), it's need very much RAM. I want, before I read pixels with PHP, to crop the image with ImageMagick and leave only that part with the QR code.
Can I detect the approximate location of QR code with ImageMagick, crop and leave only that part ?
Sample PDF
Converted PNG
Further Update
I see your discussion with Kurt about better extraction of the image from the PDF in the first place, and his recommendation was to use pdfimages. I just wanted to add that you won't find that if you do brew search pdfimages, but you actually need to use
brew install poppler
and then you get the pdfimages executable.
Updated Answer
If you change the tile size to 100x100 on the crop command and run this for the second PDF you supplied:
convert -density 288 pdf2.pdf -crop 100x100 tile%04d.png
and then use the same entropy analysis command
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
...
...
0.84432:+600+3100:tile0750.png
0.846019:+600+2800:tile0678.png
0.980938:+700+400:tile0103.png
0.984906:+700+500:tile0127.png
0.988808:+600+400:tile0102.png
0.998365:+600+500:tile0126.png
The last 4 listed tiles are
Likewise for the other PDF file you supplied, you get
0.863498:+1900+500:tile0139.png
0.954581:+2000+500:tile0140.png
0.974077:+1900+600:tile0163.png
0.97671:+2000+600:tile0164.png
which means these tiles
I would think that should help you pretty much approximately locate the QR code.
Original Answer
This is not all that scientific, but it may help you get started. The key, I think, is the entropy of the various areas of the image. The QR code has a lot of information encoded in a small area so it should have high entropy. So, I use ImageMagick to split the image into square 400x400 tiles like this:
convert image.png -crop 400x400 tile%03d.png
which gives me 54 tiles. Then I calculate the entropy of each of the tiles and sort them by increasing entropy, also outputting their offsets from the top left of the frame, and their name, like this:
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
0.00408949:+1200+2800:tile045.png
0.00473755:+1600+2800:tile046.png
0.00944815:+800+2800:tile044.png
0.0142171:+1200+3200:tile051.png
0.0143607:+1600+3200:tile052.png
0.0341039:+400+2800:tile043.png
0.0349564:+800+3200:tile050.png
0.0359226:+800+0:tile002.png
0.0549334:+800+400:tile008.png
0.0556793:+400+3200:tile049.png
0.0589632:+400+0:tile001.png
0.0649078:+1200+0:tile003.png
0.10811:+1200+400:tile009.png
0.116287:+2000+3200:tile053.png
0.120092:+800+800:tile014.png
0.12454:+0+2800:tile042.png
0.125963:+1600+0:tile004.png
0.128795:+800+1200:tile020.png
0.133506:+0+400:tile006.png
0.139894:+1600+400:tile010.png
0.143205:+2000+2800:tile047.png
0.144552:+400+2400:tile037.png
0.153143:+0+0:tile000.png
0.154167:+400+400:tile007.png
0.173786:+0+2400:tile036.png
0.17545:+400+1600:tile025.png
0.193964:+2000+400:tile011.png
0.209993:+0+3200:tile048.png
0.211954:+1200+800:tile015.png
0.215337:+400+2000:tile031.png
0.218159:+800+1600:tile026.png
0.230095:+2000+1200:tile023.png
0.237791:+2000+0:tile005.png
0.239336:+2000+1600:tile029.png
0.24275:+800+2400:tile038.png
0.244751:+0+2000:tile030.png
0.254958:+800+2000:tile032.png
0.271722:+2000+2000:tile035.png
0.275329:+0+1600:tile024.png
0.278992:+2000+800:tile017.png
0.282241:+400+1200:tile019.png
0.285228:+1200+1200:tile021.png
0.290524:+400+800:tile013.png
0.320734:+0+800:tile012.png
0.330168:+1600+2000:tile034.png
0.360795:+1200+2000:tile033.png
0.391519:+0+1200:tile018.png
0.421396:+1200+1600:tile027.png
0.421421:+2000+2400:tile041.png
0.421696:+1600+2400:tile040.png
0.486866:+1600+1600:tile028.png
0.489479:+1600+800:tile016.png
0.611449:+1600+1200:tile022.png
0.674079:+1200+2400:tile039.png
and, hey presto, the last one listed (i.e. the one with the highest entropy) tile039.png is this one.
I have drawn a rectangle around its location using this command
convert image.png -stroke red -fill none -strokewidth 3 -draw "rectangle 1200,2400 1600,2800" a.jpg
I concede there may be luck involved, but I only have one image to test my mad theories. You may need to tile twice, the second time with an x-offset and y-offset of half a tile width, so that you don't cut the QR code and split it across 2 tiles. You may need different size tiles for different size barcodes. You may need to consider the last 3-5 tiles located for your next algorithm. But I think it could form the basis of a method.

How to compress images (png, jpg and so on) using objective C

i want to shrink png or jpg on OSX. i only want to shrinkg without affecting the image quality.
like tinypng.org
is there any recommended library? i just know imagemagick. is there a way to do that natively? or another library to shrink/compress images without affecting the image quality?
my aim is to shrink the file size, for example:
logo.png >> 476 k before shrink
logo.png >> 50k after shrink
Edit: to be clear, i want to compress the size of the file, not the image resolution.
TinyPNG.org works by using image quantisation - the similar colours in the image are converted into a HSV or RGB model and then merged depending on the distance.
How does it work?
...
When you upload a PNG (Portable Network Graphics) file, similar colours in your image are combined. This technique is called “quantisation”
...
src: http://tinypng.org
An answer here outlines a method of doing so: https://stackoverflow.com/a/492230/556479.
There are also some answers on this question with refer to how you can do so on Mac OS using objective-c: How do I reduce a bitmap to a known set of RGB colours
See Wikipedia for a more in depth guide: http://en.wikipedia.org/wiki/Color_quantization
Did you have a problem using ImageMagick? It has a rich set of quantize functions such as
bool MagickQuantizeImage( MagickWand mgck_wnd,
float number_colors,
int colorspace_type,
float treedepth,
bool dither,
bool measure_error )
Here is a very thorough guide to quantization using imageMagick
My suggestion is to use http://pngnq.sourceforge.net, it will give better results than ImageMagick and for the single example given in http://tinypng.org, it also produces a very similar output. It is a tiny C implementation of the method present in the paper "Kohonen Neural Networks for Optimal Colour Quantization". That alone is much better since you are no longer relying on closed unknown implementations.
Original (57 KB), tinypng.org (16 KB), pngnq (17 KB):
Using ImageMagick, the best quantization to 256 colors I can get uses the LAB colorspace and dithering by Floyd-Steinberg:
convert input.png -quantize LAB -dither FloydSteinberg -colors 256 output.png
This produces a 16 KB png, but it contains much more visual artifacts:

Image Magick: Image optimization for websites

I have a camera which produces photographs of 3008x2000 pixels. I use Image Magick to scale and resize the photos to be put up on my website. The size of the images I am using on the website is 602x400. I use this command to reduce the size:
convert DSC_0124.JPG -scale 20% -size 24% img1.jpg
This produces an image which is 602x400 pixels in size. But the file size will be always above 250KB. More images on a single html page means the page will be heavier and loading time will be longer. Are there any features in image magic that will help me to keep the file size as small as possible, possibly, below 100KB. But the image size should be the same, that is, 602x400px. I have achieved similar optimisation with SEAMonster tool for MS Windows. As it doean't have a commandline alternative, it wouldn't be of much help when there are hundreds of images to be converted.
Use command as Delan proposed with additional "-strip" flag to remove EXIF data, this have reduced the size of some of my images drastically. Here is a bash script for unix platforms, but you can use the second part only for individual images.
for X in *.jpg; do convert "$X" -resize 602x400 -strip -quality 86 "$X"; done
This will convert all images in the directory.
Use -quality to set the compression level:
convert DSC_0124.JPG -scale 20% -size 24% -quality [0..100] img1.jpg
You can define the maximum size of the output image at 100KB like this:
convert DSC_0124.JPG -resize 602x400! -strip -define jpeg:extent=100KB img1.jpg
If you are running your website on PHP, you might want to consider the SLIR image resizing script, it does a great job resizing to various constraints (see below) and caches the results.
Parameters:
w Maximum width
h Maximum height
c Crop ratio
q Quality
b Background fill color
p Progressive
http://shiftingpixel.com/2008/03/03/smart-image-resizer/
http://code.google.com/p/smart-lencioni-image-resizer/

Dimension Preserving JPEG to EPS Conversion

I am looking for the best way to convert my JPEG files into EPS. I have to convert my image files to EPS to insert into my LaTeX files. Note that I am using dvipdfm to compile my LaTeX file into PDF and I am not using pdflatex.
The problem is that the actual size of the image changes under the conversion to EPS. Therefore, I have to use the "scale" option of the "includegraphics" command in LaTeX to get the image scale to its actual size. I have tried Gimp, Jpeg2ps and ImageMagick Convert to convert my JPEG files into EPS files. However, each of these converters produces an EPS file whose actual size is different from the actual size of the original JPEG file.
I'd like to know if anyone knows of a way to convert JPEG files to EPS files which preserves the original dimensions of the image. Such a dimension-preserving converter would relieve us from scaling the image in the LaTeX file manually.
My LaTeX file (include-image.tex) is the following:
\documentclass{article}
\usepackage[dvipdfm]{graphicx}
\begin{document}
\begin{figure}
\includegraphics{image.eps}
\end{figure}
\end{document}
And, I use the following Makefile to produce the pdf:
include-image.pdf: include-image.dvi
dvipdfm include-image.dvi
include-image.dvi: include-image.tex
latex include-image.tex
JPEG is a raster format with a fixed resolution, EPS is a vector format with no resolution.
http://www.logodesignworks.com/blog/vector-graphics-and-raster-graphics-difference
A raster graphics don't have physical dimensions relative to print media, the application that renders them out uses a conversion ratio, Dots-Per-Inch (DPI), to scale the graphic. If you have a 2000x2000 pixel JPEG and you print it out at 400 DPI it will be 5x5 inches, if you print it at 800 DPI it will be 2.5x2.5 inches.
In the jpeg2ps program you mention there is a -r switch to specify the DPI of the input JPEG that will calculate the dimensions of the EPS file by dividing the pixel dimensions of the JPEG by the DPI value to get the inch dimensions of the EPS file.