How to change the DPI of differently sized images within a PDF in order to achieve a common "print size" for every page?

I am new to the world of raster images, so I will first explain which definitions I use, and I hope I am using them correctly:
- geometry (the total number of pixels of the image %w * %h)
- resolution (pixels per inch / ppi)
- size or "print size" (the display size (e.g. in inches) on screen or printer)
I have some PDF documents containing raster images of different geometry. When opened in evince, the pages therefore display (and, I guess, would also print) at different sizes. I would like to define the print size within the PDF so that evince (or any other viewer) displays every page at the same size when the document is opened.
How could this be realized? As far as I understand, the geometry and the print size of an image are linked by the resolution. Currently, one of my PDFs shows the following ImageMagick identify output:
$identify -units PixelsPerInch -format "%w x %h - %[resolution.x] x %[resolution.y] - %[fx:w/72] x %[fx:h/72] in\n" example.pdf
geometry - resolution (ppi) - print size
538 x 375 - 72 x 72 - 7.47 x 5.20 in
546 x 381 - 72 x 72 - 7.58 x 5.29 in
1210 x 1681 - 72 x 72 - 16.80 x 23.34 in
1658 x 1166 - 72 x 72 - 23.02 x 16.19 in
542 x 365 - 72 x 72 - 7.52 x 5.06 in
1673 x 1169 - 72 x 72 - 23.23 x 16.23 in
I would like to achieve a constant print size (column 3) without changing the geometry of the images, i.e. without recompressing them, so that they do not lose quality. To proceed, it seems I need to understand the following, which I cannot find any information about:
1) Which of these three values is actually stored in the PDF document, and which is calculated by identify?
2) Which software would allow me to batch process a number of PDF files to achieve my goal, and how?
3) Assuming that geometry and resolution are stored in the PDF file and print size is derived from them, would the software need to calculate a resolution value for each image so that the print size is equal across all pages?
Thank you very much!
Cheers,
Benjamin

1) I think only the first two are actually stored in the PDF, but the third value (print size) is directly related to the geometry (538x375) and the resolution (72 ppi, aka 72 dpi), so it can easily be calculated anyway.
2) It seems like you're going about this a little backward. There are plenty of applications that are perfectly suited to controlling image layout and printing. Adobe Illustrator is one of the most common and there are some free ones, too. But these are going to involve loading the images, visually arranging them on the page and adjusting the print sizes visually, rather than programmatically.
2) If you did want to do this programmatically, though, I think you're going to have a hard time finding software to solve that problem. GIMP and Photoshop both have some batching capability, and I know GIMP has a fairly robust CLI, so you might be able to use that.
3) Yes: start with the print size you want, then divide the number of pixels by the number of inches to get the ppi/dpi.
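To make step 3 concrete, here is a minimal Python sketch (illustrative only) that computes, for each geometry from the identify output in the question, one ppi value that makes the image fit inside a chosen print box without distortion. The 11 x 8.5 inch box and the function names are assumptions for the example, and the code only does the arithmetic; it does not modify any PDF.

```python
def fit_ppi(width_px, height_px, box_w_in, box_h_in):
    """Single ppi value that makes the image fit inside the box, aspect ratio kept."""
    # ppi = pixels / inches; the larger ratio is the binding constraint
    return max(width_px / box_w_in, height_px / box_h_in)


def print_size(width_px, height_px, ppi):
    """Print size in inches for a given geometry and resolution."""
    return width_px / ppi, height_px / ppi


# Geometries taken from the identify output in the question
PAGES = [(538, 375), (546, 381), (1210, 1681), (1658, 1166), (542, 365), (1673, 1169)]
BOX = (11.0, 8.5)  # hypothetical target print size in inches (landscape)

if __name__ == "__main__":
    for w, h in PAGES:
        # Rotate the box for portrait images so orientation is not penalised
        box_w, box_h = BOX if w >= h else BOX[::-1]
        ppi = fit_ppi(w, h, box_w, box_h)
        pw, ph = print_size(w, h, ppi)
        print(f"{w} x {h}: {ppi:.1f} ppi -> {pw:.2f} x {ph:.2f} in")
```

Using the larger of the two ratios, rather than one ppi per axis, keeps the aspect ratio, so some pages end up slightly narrower or shorter than the box instead of being stretched.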
NOTE: Keep in mind that dpi goes both directions. If you have a 200 x 300 image and a 400 x 400 image, and you want them both to print 10 inches square, then you're going to distort the 200 x 300 image, stretching it horizontally. The 200 x 300 image will also look poorer quality than the 400 x 400, because you have fewer pixels to work with.
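The note's numbers can be checked with a few lines of Python (an illustration, not a tool): forcing both images to print at 10 x 10 inches requires different horizontal and vertical ppi for the 200 x 300 image, which is the distortion described, and its lower ppi is why it will look worse.

```python
def axis_ppi(width_px, height_px, print_w_in, print_h_in):
    """Return (x_ppi, y_ppi) needed to print an image at a forced size."""
    return width_px / print_w_in, height_px / print_h_in


for geometry in [(200, 300), (400, 400)]:
    x_ppi, y_ppi = axis_ppi(*geometry, 10, 10)
    # Unequal ppi on the two axes means the image is stretched on one of them
    note = "distorted" if x_ppi != y_ppi else "undistorted"
    print(geometry, x_ppi, y_ppi, note)
```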
For these and other reasons, I highly recommend a visual approach, rather than a coding approach.
Good luck!

Related

ImageMagick converts PDF into tiny images despite setting density and resize options

I'm using ImageMagick to convert the following PDF to a PNG file: PDF from IMSLP (Permalink)
In a PDF viewer it looks nice (even though it needs quite a bit of zooming).
but when converting with
convert "file.pdf" "/tmp/file.png"
the resulting image has an extremely low resolution.
When adding density and resize information, I get somewhat bigger images, but still not the original resolution that is stored within the PDF (certainly not 300 DPI):
convert -density "300" -resize "3000x3000>" "file.pdf" "/tmp/file.png"
When using Poppler-Utils' pdfimages, I get the appropriate image.
My question is: Is there any way to tell ImageMagick to extract the images in the "correct" resolution (as is stored in the PDF document)? In other words, ignore the zoom that is necessary to view the PDF properly, thus extracting the correct image resolution?
I'm using ImageMagick 7.1.0.16 with Ghostscript 9.55.0 inside an Alpine Linux docker image.
That is a very unusual structure; the file has been through many changes, but we can guess that some pages may have been converted to 300 dpi or 600 dpi, since they all render at roughly the same size.
Note that a graphic's dpi is subjective: that is not the value used inside a PDF. What matters is the number of pixels per 72 default units (points), which relates to a working dpi for the graphic. The image may have been 75 dpi but stored at 300 pixels per 72 points.
A first analysis says the images are:
image-0028 = 714 x 900 dots nominally 600 dpi
image-0002 = 726 X 900 dots nominally 600 dpi
image-0005 = 674 x 900 dots nominally 600 dpi
image-0008 = 674 x 900 dots nominally 600 dpi
image-0011 = 674 x 900 dots nominally 600 dpi
image-0014 = 674 x 900 dots nominally 600 dpi
but all have been downsampled to various sizes of approx. 1.2" x 1.5", so a sensible source size to match all those reductions is possibly 9.6" x 12", with some cropping.
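Thus, to get closest to the original quality, extract the pages at 600 dpi (lossless PNG is best, so the existing lossy JPEG flaws are kept without adding new ones).
Reconverting them to 75 dpi should then give you the closest match to the poor-quality inputs.
You need to increase the density much further and put the resize after reading the input in ImageMagick: -density is a setting that controls how the PDF is rasterised when it is read, so it must come before the input file, whereas -resize should act on the already-rasterised image.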
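The size arithmetic behind this guess is simply inches = pixels / dpi. A short Python sketch using three of the pixel counts from the analysis above (the dictionary is just for illustration) reproduces the roughly 1.2" x 1.5" nominal sizes at 600 dpi and the corresponding sizes at 75 dpi, which are 600 / 75 = 8 times larger and land near the suggested 9.6" x 12".

```python
# Pixel counts from the analysis above
IMAGES = {
    "image-0028": (714, 900),
    "image-0002": (726, 900),
    "image-0005": (674, 900),
}


def inches(pixels, dpi):
    """Physical length of a run of pixels at a given dpi."""
    return pixels / dpi


for name, (w, h) in IMAGES.items():
    at600 = (inches(w, 600), inches(h, 600))
    at75 = (inches(w, 75), inches(h, 75))
    print(f"{name}: {at600[0]:.2f} x {at600[1]:.2f} in @ 600 dpi, "
          f"{at75[0]:.2f} x {at75[1]:.2f} in @ 75 dpi")
```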
This will be 5800 × 7200 pixels:
convert -density 4800 IMSLP358086-PMLP578359-Ehr_OP_20_5.pdf[1] x.png
This will be 2417 × 3000 pixels:
convert -density 4800 IMSLP358086-PMLP578359-Ehr_OP_20_5.pdf[1] -resize "3000x3000>" y.png
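How the two sizes relate can be sketched in Python. Working backwards from the 5800 × 7200 result, the page is presumably 87 × 108 pt (about 1.21" × 1.5"); that page size is inferred from the numbers above, not read from the file. The rendered size is points / 72 × density, and "3000x3000>" then only shrinks the raster if it exceeds the box, keeping the aspect ratio. The helper names are hypothetical, and ImageMagick's own rounding may differ by a pixel in other cases.

```python
def render_px(page_pt, density):
    """Pixels produced when a page dimension in points is rasterised at `density` dpi."""
    return round(page_pt * density / 72)  # 72 points per inch


def shrink_to_fit(w, h, max_w, max_h):
    """Mimic the 'WxH>' geometry: scale down only if larger, keeping aspect ratio."""
    if w <= max_w and h <= max_h:
        return w, h
    scale = min(max_w / w, max_h / h)
    return round(w * scale), round(h * scale)


PAGE_PT = (87, 108)  # inferred page size in points (assumption)
w, h = (render_px(d, 4800) for d in PAGE_PT)
print(w, h)                             # 5800 7200
print(shrink_to_fit(w, h, 3000, 3000))  # (2417, 3000)
```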

How to select a lens for reading very small font characters

I am trying to implement an OCR / OCV algorithm for inspecting printed text in black ink on a white background. The text size ranges from 3 pt to 6 pt. I first tried capturing images with a 5 MP monochrome camera using 8 mm, 12 mm and 16 mm lenses, but I could not get the characters with good clarity. I repeated the same test with a 10 MP camera, assuming that the higher pixel count would give more information, but I got the same results.
I'm not sure how I can get a clearer image, whether a 5 MP / 10 MP camera is enough, or whether there is any way to determine which lens to use in such an application.
The FOV for inspection is about 300 x 250 mm, and the working distances I considered range from approx. 400 mm to 650 mm.
Due to copyright concerns I cannot post the image of the object under inspection.
Any help or direction is greatly appreciated. Thanks.
It's simple geometry. It is:
3pt =~ 1mm.
Assuming you want to have 10 pixels to cover each character, your IFOV needs to be:
IFOV =~ (font_width / 10) / distance = 0.1 / 650 =~ 0.15 milliradians / pixel.
For the work area width you mention, the horizontal field of view is:
FOV = 2 * atan((300 / 2) / 650) =~ 453 milliradians =~ 26 deg
So the minimal (horizontal) sensor resolution you'd need is:
Width = 453 / 0.15 = 3020 pixels.
Thus a 10 MP sensor should be quite sufficient, and a 5 MP one may be adequate.
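The same calculation in Python, as a quick sketch with the answer's assumptions (1 mm characters, 10 pixels per character, 650 mm worst-case distance, 300 mm field width) written as named constants. Without the intermediate rounding of the IFOV to 0.15 mrad, the minimum width comes out at roughly 2950 pixels rather than 3020, which does not change the conclusion.

```python
import math

FONT_MM = 1.0          # 3 pt is about 1 mm
PIXELS_PER_CHAR = 10   # design assumption from the answer
DISTANCE_MM = 650.0    # worst-case working distance
FOV_WIDTH_MM = 300.0   # horizontal extent of the inspection area

# Angle subtended by one pixel (small-angle approximation)
ifov = (FONT_MM / PIXELS_PER_CHAR) / DISTANCE_MM             # radians per pixel
# Full horizontal field of view needed to cover the work area
fov = 2 * math.atan((FOV_WIDTH_MM / 2) / DISTANCE_MM)        # radians
min_width_px = fov / ifov

print(f"IFOV = {ifov * 1e3:.3f} mrad/px")
print(f"FOV  = {fov * 1e3:.0f} mrad = {math.degrees(fov):.1f} deg")
print(f"minimum sensor width = {min_width_px:.0f} px")
```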
To choose the lens, you can work out the needed focal length by the same simple trigonometry, from the FOV spec above and the format (width, height) of your chosen sensor. Finally, among all lenses matching that focal length that are available for your camera mount, you need to choose one that (a) can be focused at the distance of interest and (b) has an adequate optical transfer function (OTF), such that one line can be resolved at the above IFOV.
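As a rough sketch of that trigonometry, assuming a simple pinhole model (which ignores the small thin-lens correction at these working distances), the focal length follows from f = (w / 2) / tan(FOV / 2), where w is the sensor width. The 8.8 mm sensor width below is a hypothetical example (the nominal width of a 2/3-inch format), not a recommendation, and the function name is made up.

```python
import math


def focal_length_mm(sensor_width_mm, fov_width_mm, distance_mm):
    """Pinhole approximation: f = (w / 2) / tan(FOV / 2)."""
    half_fov = math.atan((fov_width_mm / 2) / distance_mm)
    return (sensor_width_mm / 2) / math.tan(half_fov)


SENSOR_WIDTH_MM = 8.8  # hypothetical: nominal 2/3-inch sensor width
for distance in (400.0, 650.0):
    f = focal_length_mm(SENSOR_WIDTH_MM, 300.0, distance)
    print(f"{distance:.0f} mm working distance -> f = {f:.1f} mm")
```

The shorter working distance needs a shorter focal length to keep the same 300 mm field in view, which is why the distance range matters when picking from catalogs.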
In practice, after running the math and looking at catalogs, you'll end up with several candidate lenses. My advice would then be to get samples and try them out on your own setup, and specifically with the particular lighting rig you'll be using, before making a final decision. Depending on your particular project, factors influencing the choice, in addition (obviously) to the cost of the lens + sensor combination, may be size/weight, sensitivity to environmental conditions (temperature, humidity, vibrations), availability and lead time for sourcing, etc.

True Type Font Scaling

MSDN's truetype font article (https://learn.microsoft.com/en-us/typography/opentype/otspec160/ttch01) gives the following for converting FUnits to pixels:
Values in the em square are converted to values in the pixel coordinate system by multiplying them by a scale. This scale is:
pointSize * resolution / ( 72 points per inch * units_per_em )
where pointSize is the size at which the glyph is to be displayed, and resolution is the resolution of the output device. The 72 in the denominator reflects the number of points per inch.
For example, assume that a glyph feature is 550 FUnits in length on a 72 dpi screen at 18 point. There are 2048 units per em. The following calculation reveals that the feature is 4.83 pixels long.
550 * 18 * 72 / ( 72 * 2048 ) = 4.83
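The scale formula translates directly into code. A minimal Python sketch (the function name is made up for illustration) that reproduces the spec's example and shows the same glyph feature on a hypothetical 96 dpi device:

```python
def funits_to_pixels(funits, point_size, dpi, units_per_em):
    """Scale a length in font units (FUnits) to device pixels."""
    # point_size / 72 = em size in inches; multiplied by dpi = em size in pixels
    return funits * point_size * dpi / (72 * units_per_em)


print(round(funits_to_pixels(550, 18, 72, 2048), 2))  # the spec's example: 4.83
print(round(funits_to_pixels(550, 18, 96, 2048), 2))  # same feature at 96 dpi
```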
Questions:
It says "pointSize is the size at which the glyph is to be displayed." How does one compute this, and what units is it in?
It says "resolution is the resolution of the output device". Is this in DPI? Where would I get this information?
It says "72 in the denominator reflects the number of points per inch." Is this related to DPI or no?
In the example, it says '18 point'. Is this 18 used in computing the resolution or the pointSize?
Unfortunately, Apple's documentation is more or less the same, and beyond that there are barely any resources other than reading the source code of stb_truetype.
It says "pointSize is the size at which the glyph is to be displayed." How does one compute this, and what units is it in?
You don’t compute the point size, you set it. It’s the nominal size you want the font to be displayed in (think the font menu in a text editor). The ‘point size’ is a traditional typographical measurement system, with ‘point’ being roughly 1/72 of an inch. This brings the other question:
It says "72 in the denominator reflects the number of points per inch." Is this related to DPI or no?
No. Again, these are typographical points — the same unit you set the point size with. That’s why it’s part of the denominator in the first place: the point size is expressed in a measurement system of 72 points to an inch, and that has to be somehow taken into account in the equation.
Now, typographical points are different from the output device’s dots or pixels. While in the early days of desktop publishing it was common to have a screen resolution of 72 pixels per inch that indeed corresponded to the typographical system of 72 points per inch (no coincidence in that), these days the output resolution can, of course, vary quite dramatically, so it’s important to keep the point vs pixel distinction in mind.
In the example, it says '18 point'. Is this 18 used in computing the resolution or the pointSize?
It is not used to compute either: 18 is the pointSize itself (see above). The entire example could be translated as follows. With a font based on 2048 units per em, if a particular glyph feature is 550 font units (FUnits) long and the glyph is displayed at a size of 18 points (that is, 18/72 of an inch) on a device with a screen resolution of 72 pixels per inch, the pixel size of that feature will be 4.83.
It says "resolution is the resolution of the output device". Is this in DPI? Where would I get this information?
It’s DPI/PPI, yes. You have to query some system API for that information or just hardcode the value if you’re targeting a specific device.

How to get DPI of a PDF file?

Using ImageMagick, Ghostscript or any PHP code, how can I get the DPI value of PDF files?
Here are the links to two demo files:
http://jmp.sh/O5g5wL4 -- of 72 DPI
http://jmp.sh/RxrnYrY -- of 300 DPI
I have used
$image = new Imagick();
$image->readImage('xyz.pdf');
$resolutions = $image->getImageResolution();
It gives the same result for two different PDF files having different DPI.
I have also used
pdfimages -list xyz.pdf
It gives a list of all the information, but how do I fetch the DPI value from that list?
How to get the exact DPI value of a PDF?
As fmw42 says, PDF files themselves have no resolution. However, in your case both files consist of nothing but an image. In one case the image is ~48 MB and in the other it's around 200 MB.
The reason is that the images have a different effective resolution.
In PDF the image is simply a bitmap, a sequence of coloured pixels. These are then drawn onto the underlying media. At this point there is no resolution; the pixels are laid down over a specific media size, in your case 22.5 by 81.5 inches.
The effective resolution is given by dividing the number of pixels in the image in a given dimension by the size of the area it covers in that dimension.
So if I have an image which is 1000x1000 pixels, and I draw it in a 1 inch square, then the effective resolution of the image is 1000 dpi. If I change my mind and draw it in a square 4 inches by 4 inches, then the effective resolution is 250 dpi.
The image hasn't changed, just the area it covers.
Now suppose I have two images, each drawn in a 1 inch square. The first image is 1000x1000, the second is 500x500. The effective resolution of the first image is 1000 dpi; the effective resolution of the second is 500 dpi.
So you can see that, in PDF, the effective resolution of the image is a combination of the dimensions of the image, and the dimensions of the media it covers.
That's a difficult thing to measure in a PDF file. The area covered is calculated using matrix algebra and can be a combination of several different matrices.
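A minimal Python sketch of that matrix arithmetic, assuming you have already collected the cm matrices in effect when the image is drawn (the helper names are hypothetical, and details such as /UserUnit are ignored). PDF paints an image into the unit square of image space, so after concatenating the matrices, the vectors (a, b) and (c, d) of the resulting CTM give the image's width and height in points; pixels divided by inches then gives the effective resolution. Each cm pre-multiplies the current matrix (CTM' = M × CTM).

```python
import math


def concat(m1, m2):
    """Product m1 x m2 of PDF matrices [a b c d e f] (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def effective_ppi(width_px, height_px, ctm):
    """Effective (x, y) resolution of an image drawn with the given CTM."""
    a, b, c, d, _, _ = ctm
    width_in = math.hypot(a, b) / 72   # image-space x axis, in inches
    height_in = math.hypot(c, d) / 72  # image-space y axis, in inches
    return width_px / width_in, height_px / height_in


page_scale = (2, 0, 0, 2, 0, 0)     # hypothetical outer 'cm'
image_cm = (36, 0, 0, 36, 10, 10)   # hypothetical 'cm' just before the image's Do
ctm = concat(image_cm, page_scale)  # later cm is pre-multiplied
print(effective_ppi(1000, 1000, ctm))                      # 1 inch square
print(effective_ppi(1000, 1000, (288, 0, 0, 288, 0, 0)))   # 4 inch square
```

Here the outer 2x scale and the 36-point image matrix combine into a 72-point (1 inch) square, so the 1000x1000 image comes out at 1000 dpi and the 4 inch square at 250 dpi, matching the examples above.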
The actual dimensions of the image, by contrast, are quite easy to determine: they are given in the image dictionary. Your images are 1620x5868 and 3372x12225. In both cases the media is the same size: 22.5x81.5 inches.
Since the images cover the entire media, the effective resolutions are:
1620/22.5 = 72 by 5868/81.5 = 72
3372/22.5 = 149.866 by 12225/81.5 = 150
I think MuPDF will give you the image dimensions and media dimensions. Assuming all your PDF files are constructed like this, you can then simply do the maths, but note that this won't be so simple for ordinary PDF files where images don't cover the entire media.
Using mutool info -I -M 150-dpi.pdf gives:
Retrieving info from pages 1-1...
Mediaboxes (1):
1 (6 0 R): [ 0 0 1620 5868 ]
Images (1):
1 (6 0 R): [ DCT ] 3375x12225 8bpc DevCMYK (12 0 R)
So there are your image dimensions and your media size (the media box is in points, 72 per inch). All you need to do is divide one by the other.
Note: In Debian and related distros, mutool is contained in the mupdf-tools package, not in the mupdf package itself. It can therefore be installed with sudo apt install mupdf-tools.
I use pdfimages -list from the Poppler utilities; it gives you all the information about the images, including x-ppi and y-ppi columns.
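Dividing one by the other can be sketched in Python as follows, using the numbers from the mutool output. The function name is hypothetical, and this shortcut only holds when the image covers the whole media box, as it does in these two files; media box values are in points, so dividing by 72 converts them to inches.

```python
def ppi_from_mediabox(image_px, mediabox):
    """Effective (x, y) ppi for an image that covers the whole page."""
    x0, y0, x1, y1 = mediabox  # in points, 72 per inch
    width_in, height_in = (x1 - x0) / 72, (y1 - y0) / 72
    return image_px[0] / width_in, image_px[1] / height_in


MEDIABOX = (0, 0, 1620, 5868)  # from the mutool output
print(ppi_from_mediabox((3375, 12225), MEDIABOX))  # (150.0, 150.0)
print(ppi_from_mediabox((1620, 5868), MEDIABOX))   # (72.0, 72.0)
```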

How to read no. pixels per res. unit in TIFF header

I'm trying to read a TIFF image that has been exported from a Leica (SP5) program. I can read other details (e.g. bits per sample, image size x, image size y) as per tags defined in TIFF documentation. I'm sort of crudely reading the header out as unsigned integers until I get to a certain tag number.
I know that at tag 296, my 'Resolution Unit' is cm. Tags 282 and 283 are supposed to give me the number of pixels (in x and y) per resolution unit, but I'm not sure how to read them. Can someone please help?
Well, if at 296 you discover what the unit type is (either 1 - No absolute unit, 2 - Inch, or 3 - Centimeter) and at 282 and 283 you get XResolution and YResolution respectively then you have everything you need to solve the problem.
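XResolution and YResolution are already the number of pixels per unit along x and y respectively, so for a per-unit measure you can use them directly. Only if you want the number of pixels per square unit (the area of the rectangle spanned by the two resolutions) do you multiply them together:
XResolution * YResolution = PixelsPerSquareUnit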
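A minimal Python sketch, assuming a classic (non-BigTIFF) file and only the first IFD. XResolution (282) and YResolution (283) have type RATIONAL, so the 4-byte value field of their IFD entries holds an offset to two unsigned 32-bit integers (numerator, denominator), not the value itself; reading that field as a plain unsigned integer gives the offset, which may be the source of the confusion. ResolutionUnit (296) is a SHORT stored directly in the entry. The function name is hypothetical.

```python
import struct

RES_UNITS = {1: "none", 2: "inch", 3: "centimeter"}


def read_resolution(data: bytes):
    """Return (x_res, y_res, unit) from the first IFD of a classic TIFF's bytes."""
    order = {b"II": "<", b"MM": ">"}[data[:2]]  # little- or big-endian
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    x_res = y_res = None
    unit = 2  # ResolutionUnit defaults to 2 (inch) when the tag is absent
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = data[start:start + 12]
        tag, typ, _n, value = struct.unpack(order + "HHII", entry)
        if tag in (282, 283) and typ == 5:
            # RATIONAL: value is an offset to numerator and denominator
            num, den = struct.unpack(order + "II", data[value:value + 8])
            if tag == 282:
                x_res = num / den
            else:
                y_res = num / den
        elif tag == 296 and typ == 3:
            # SHORT stored left-justified in the 4-byte value field
            (unit,) = struct.unpack(order + "H", entry[8:10])
    return x_res, y_res, RES_UNITS.get(unit, unit)
```

For a real file, pass the bytes from open(path, "rb").read(); with a ResolutionUnit of 3, the returned x and y values are pixels per centimetre.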