I am trying to implement an OCR / OCV algorithm for inspecting text printed in black ink on a white background. The text size ranges from 3 pt to 6 pt. I first captured images with a 5 MP monochrome camera using 8 mm, 12 mm and 16 mm lenses, but the characters did not come out clearly. I repeated the test with a 10 MP camera, expecting that the higher pixel count would give more information, but got the same results.
I'm not sure how to get a clearer image, whether 5 MP / 10 MP is enough, or whether there is a way to determine which lens to use in such an application.
The FOV for inspection is about 300 x 250 mm and the working distance I considered from approx. 400 mm to 650 mm.
Due to copyright concerns I cannot post the image of the object under inspection.
Any help or direction is greatly appreciated. Thanks.
It's simple geometry. Start from the character size:
3 pt =~ 1 mm (1 pt = 1/72 in =~ 0.353 mm).
Assuming you want to have 10 pixels to cover each character, your IFOV needs to be:
IFOV =~ (font_width / 10) / distance = 0.1 / 650 =~ 0.15 milliradians / pixel.
For the work area width you mention, the horizontal field of view is:
FOV = 2 * atan((300 / 2) / 650) =~ 453 milliradians =~ 26 deg
So the minimal (horizontal) sensor resolution you'd need is:
Width = 453 / 0.15 = 3020 pixels.
Thus a 10MP sensor should be quite sufficient, and 5MP one may be adequate.
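The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the same estimate; the function name and its parameters are made up for illustration, and 10 pixels per character is the assumption stated in the answer.

```python
import math

def required_sensor_width_px(char_size_mm, px_per_char, fov_width_mm, distance_mm):
    """Return (IFOV in mrad/px, FOV in mrad, required horizontal pixels)."""
    # Angle subtended by one pixel (small-angle approximation).
    ifov_rad = (char_size_mm / px_per_char) / distance_mm
    # Full horizontal field of view for the given work-area width.
    fov_rad = 2.0 * math.atan((fov_width_mm / 2.0) / distance_mm)
    return ifov_rad * 1e3, fov_rad * 1e3, fov_rad / ifov_rad

# Values from the question: 1 mm characters, 300 mm wide area, 650 mm distance.
ifov, fov, width_px = required_sensor_width_px(1.0, 10, 300.0, 650.0)
print(f"IFOV = {ifov:.3f} mrad/px, FOV = {fov:.0f} mrad, width = {width_px:.0f} px")
```

Without rounding the IFOV to 0.15 the result comes out a little under 3000 pixels, which does not change the conclusion.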
To choose the lens, from the above spec for the FOV, and the format (width, height) of your choice of sensor, you can work out by the same simple trigonometry the needed focal length. Finally, among all lenses matching that focal length that are available for your camera mount, you need to choose one that (a) can be focused at the distance of interest and (b) has an adequate Optical Transfer Function such that one line can be resolved at the above IFOV.
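As a rough illustration of that trigonometry, the sketch below uses the pinhole approximation (working distance much larger than the focal length). The 8.8 mm sensor width is an assumed example value (a typical 2/3" sensor), not something taken from the question, and the function name is hypothetical.

```python
def focal_length_mm(sensor_width_mm, fov_width_mm, distance_mm):
    """Focal length that maps fov_width_mm at distance_mm onto the sensor width."""
    # Similar triangles: sensor_width / f = fov_width / distance.
    return sensor_width_mm * distance_mm / fov_width_mm

# Assumed 8.8 mm wide sensor, 300 mm field of view, both ends of the
# working-distance range mentioned in the question.
for distance in (400.0, 650.0):
    f = focal_length_mm(8.8, 300.0, distance)
    print(f"distance {distance:.0f} mm -> focal length {f:.1f} mm")
```

The nearest catalog focal length that still covers the full field of view would then be the starting point for the lens search.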
In practice, after running the math and looking at catalogs, you'll end up with several candidate lenses. My advice would then be to get samples and try them out on your own setup, and specifically with the particular lighting rig you'll be using, before making a final decision. Depending on your particular project, factors influencing the choice, in addition (obviously) to the cost of the lens + sensor combination, may be size/weight, sensitivity to environment conditions (temperature, humidity, vibrations), availability and lead time for sourcing, etc.
I have the task of simulating a camera with a full well capacity of 10,000 photons per sensor element
in numpy. My first idea was to do it like this:
camera = np.random.normal(0.0,1/10000,np.shape(img))
Imgwithnoise= img+camera
but it hardly shows an effect.
Has someone an idea how to do it?
From what I interpret from your question, if each physical pixel of the sensor has a 10,000 photon limit, this points to the brightest a digital pixel can be on your image. Similarly, 0 incident photons make the darkest pixels of the image.
You have to create a map from the physical sensor to the digital image. For the sake of simplicity, let's say we work with a grayscale image.
Your first task is to fix the colour bit-depth of the image. That is to say, is your image an 8-bit colour image? (Which usually is the case.) If so, the brightest pixel has a brightness value = 255 (= 2^8 - 1, for 8 bits). The darkest pixel is always chosen to have a value 0.
So you'd have to map from the range 0 --> 10,000 (sensor) to 0 --> 255 (image). The most natural idea would be to do a linear map (i.e. every pixel of the image is obtained by the same multiplicative factor from every pixel of the sensor), but to correctly interpret (according to the human eye) the brightness produced by n incident photons, often different transfer functions are used.
A transfer function in a simplified version is just a mathematical function doing this map - logarithmic TFs are quite common.
Also, since it seems like you're generating noise, it is unwise and conceptually wrong to add camera itself to the image img. What you should do, is fix a noise threshold first - this can correspond to the maximum number of photons that can affect a pixel reading as the maximum noise value. Then you generate random numbers (according to some distribution, if so required) in the range 0 --> noise_threshold. Finally, you use the map created earlier to add this noise to the image array.
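A minimal sketch of that procedure, assuming an 8-bit grayscale image, a linear map between grey levels and photon counts, and uniformly distributed noise. The function name, the 500-photon threshold and the flat test image are illustrative assumptions, not values from the question.

```python
import numpy as np

FULL_WELL = 10_000  # photons per sensor element, from the question

def add_sensor_noise(img_u8, noise_threshold, rng):
    """Map an 8-bit image to photon counts, add noise, saturate, map back."""
    # Linear map 0..255 (image) -> 0..FULL_WELL (sensor).
    photons = img_u8.astype(np.float64) * (FULL_WELL / 255.0)
    # Noise in the range 0..noise_threshold photons (uniform is an assumption).
    noise = rng.uniform(0.0, noise_threshold, size=img_u8.shape)
    # A pixel cannot hold more than the full well capacity.
    photons = np.clip(photons + noise, 0.0, FULL_WELL)
    # Map back to 8-bit grey levels.
    return np.round(photons * (255.0 / FULL_WELL)).astype(np.uint8)

rng = np.random.default_rng(0)
img = np.full((4, 4), 128, dtype=np.uint8)   # flat mid-grey test image
noisy = add_sensor_noise(img, noise_threshold=500, rng=rng)
print(noisy)
```

Working in photon units makes the noise amplitude directly comparable to the full well capacity, which is why the noise in the original attempt (standard deviation 1/10000 in image units) was barely visible.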
Hope this helps and is in tune with what you wish to do. Cheers!
I would like to calculate the Horizontal and Vertical field of view from the camera intrinsic matrix for the cameras used in the KITTI dataset. The reason I need the Field of view is to convert a depth map into 3D point clouds.
Though this question has been asked quite a long time ago, I felt it needed an answer as I ran into the same issue and was unable to find any info on it.
I have however solved it using the information available in this document and some more general camera calibration documents
Firstly, we need to convert the supplied disparity into distance. This can be done by first converting the disparity map into floats using the method in the dev_kit, where they state:
disp(u,v) = ((float)I(u,v))/256.0;
This disparity can then be converted into a distance through the default stereo vision equation:
Depth = Baseline * focal length/ Disparity
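The two steps above could look roughly like this in NumPy. It is a sketch only: the function name is hypothetical, the input is assumed to be the 16-bit disparity image already loaded as an array, and treating a disparity of 0 as "no measurement" is an assumption made here to avoid dividing by zero.

```python
import numpy as np

def disparity_png_to_depth(raw_u16, baseline_m, focal_px):
    """Convert a raw 16-bit disparity image to depth in metres."""
    # disp(u,v) = ((float)I(u,v)) / 256.0
    disp = raw_u16.astype(np.float32) / 256.0
    depth = np.zeros_like(disp)
    valid = disp > 0                      # assumed: 0 marks invalid pixels
    # Depth = Baseline * focal length / Disparity
    depth[valid] = baseline_m * focal_px / disp[valid]
    return depth

# Toy input: one invalid pixel and one pixel with a disparity of 10 px.
raw = np.array([[0, 10 * 256]], dtype=np.uint16)
depth = disparity_png_to_depth(raw, baseline_m=0.5, focal_px=700.0)
print(depth)
```

The baseline and focal length used in the example call are placeholders; the real values come from the calibration files discussed below.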
Now come some tricky parts. I searched high and low for the focal length and was unable to find it in documentation.
I realised just now when writing that the baseline is documented in the aforementioned source. However, from section IV.B we can see that it can also be found indirectly in P_rect^(i).
The P_rects can be found in the calibration files and will be used for both calculating the baseline and the translation from uv in the image to xyz in the real world.
The steps are as follows:
For pixel in depthmap:
xyz_normalised = P_rect \ [u,v,1]
where u and v are the x and y coordinates of the pixel respectively
which will give you a xyz_normalised of shape [x,y,z,0] with z = 1
You can then multiply it with the depth that is given at that pixel to result in a xyz coordinate.
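A vectorised sketch of these steps in NumPy, under one simplifying assumption: only the left 3x3 block of the 3x4 P_rect is used as the intrinsic matrix, and its fourth column is ignored. The function name and the numbers in the example matrix are illustrative, not values from a KITTI calibration file.

```python
import numpy as np

def backproject(depth, P_rect):
    """Turn a depth map (H x W) into an H x W x 3 array of xyz points."""
    K = P_rect[:3, :3]                       # assumed: intrinsics only
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates [u, v, 1] as a 3 x N matrix.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    # Solve K * xyz_normalised = [u, v, 1]; the result has z == 1.
    rays = np.linalg.solve(K, pix.astype(np.float64))
    # Scale each normalised ray by the depth at that pixel.
    return (rays * depth.reshape(1, -1)).T.reshape(h, w, 3)

P = np.array([[700.0, 0.0, 320.0, 0.0],
              [0.0, 700.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
points = backproject(np.full((2, 2), 10.0), P)
print(points[0, 0])
```

Solving the linear system for all pixels at once avoids the per-pixel loop and gives the same normalised coordinates.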
For completeness, as P_rect is the depth map here, you need to use P_3 from the cam_cam calibration txt files to get the baseline (as it contains the baseline between the colour cameras) and the P_2 belongs to the left camera which is used as a reference for occ_0 files.
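One way to read the baseline out of the two projection matrices is sketched below. It assumes that element [0, 3] of each rectified projection matrix equals the focal length in pixels times that camera's horizontal offset (up to sign), so the difference between P_2 and P_3 divided by the focal length gives the distance between the two colour cameras. The function name and the matrix values are made up for illustration.

```python
import numpy as np

def stereo_baseline_m(P2, P3):
    """Baseline between the cameras described by P2 and P3, in metres."""
    fx = P2[0, 0]                            # focal length in pixels
    # Assumed convention: P[0, 3] = fx * (horizontal offset), up to sign.
    return abs(P2[0, 3] - P3[0, 3]) / fx

P2 = np.array([[700.0, 0.0, 600.0, 35.0],
               [0.0, 700.0, 180.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
P3 = np.array([[700.0, 0.0, 600.0, -343.0],
               [0.0, 700.0, 180.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
baseline = stereo_baseline_m(P2, P3)
print(f"baseline = {baseline:.2f} m, focal length = {P2[0, 0]:.0f} px")
```

The same matrix also supplies the focal length needed for the depth equation, so no separate documentation lookup is required.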
MSDN's truetype font article (https://learn.microsoft.com/en-us/typography/opentype/otspec160/ttch01) gives the following for converting FUnits to pixels:
Values in the em square are converted to values in the pixel coordinate system by multiplying them by a scale. This scale is:
pointSize * resolution / ( 72 points per inch * units_per_em )
where pointSize is the size at which the glyph is to be displayed, and resolution is the resolution of the output device. The 72 in the denominator reflects the number of points per inch.
For example, assume that a glyph feature is 550 FUnits in length on a 72 dpi screen at 18 point. There are 2048 units per em. The following calculation reveals that the feature is 4.83 pixels long.
550 * 18 * 72 / ( 72 * 2048 ) = 4.83
Questions:
It says "pointSize is the size at which the glyph is to be displayed." How does one compute this, and what units is it in?
It says "resolution is the resolution of the output device". Is this in DPI? Where would I get this information?
It says "72 in the denominator reflects the number of points per inch." Is this related to DPI or no?
In the example, it says '18 point'. Is this 18 used in computing the resolution or the pointSize?
Unfortunately, Apple's documentation is more or less the same, and other than that there are barely any resources other than just reading the source code of stb_truetype.
It says "pointSize is the size at which the glyph is to be displayed." How does one compute this, and what units is it in?
You don’t compute the point size, you set it. It’s the nominal size you want the font to be displayed in (think the font menu in a text editor). The ‘point size’ is a traditional typographical measurement system, with ‘point’ being roughly 1/72 of an inch. This brings the other question:
It says "72 in the denominator reflects the number of points per inch." Is this related to DPI or no?
No. Again, these are typographical points — the same unit you set the point size with. That’s why it’s part of the denominator in the first place: the point size is expressed in a measurement system of 72 points to an inch, and that has to be somehow taken into account in the equation.
Now, the typographical points are different from the output device’s dots or pixels. While in the early days of desktop publishing it was common to have a screen resolution of 72 pixels per inch that indeed corresponded to typographical system of 72 points per inch (no coincidence in that), these days the output resolution can, of course, vary quite dramatically, so it’s important to keep the point vs pixel distinction in mind.
In the example, it says '18 point'. Is this 18 used in computing the resolution or the pointSize?
It is the point size; see above. The entire example could be translated as follows. With a font based on 2048 units per em, if a particular glyph feature is 550 FUnits long and the glyph gets displayed at the size of 18 points (that is, 18/72 of an inch) on a device with screen resolution of 72 pixels per inch, the pixel size of that feature will be 4.83.
It says "resolution is the resolution of the output device". Is this in DPI? Where would I get this information?
It’s DPI/PPI, yes. You have to query some system API for that information or just hardcode the value if you’re targeting a specific device.
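Putting the pieces together, the formula from the specification can be written as a small function. The function and parameter names are chosen here for illustration; the 144 dpi call is an extra example, not part of the original article.

```python
def funits_to_pixels(funits, point_size, dpi, units_per_em):
    """Convert a length in font units to device pixels."""
    # scale = pointSize * resolution / (72 points per inch * units_per_em)
    scale = point_size * dpi / (72.0 * units_per_em)
    return funits * scale

# The example from the article: 550 FUnits, 18 pt, 72 dpi, 2048 units per em.
print(round(funits_to_pixels(550, 18, 72, 2048), 2))   # 4.83

# Same glyph feature on a 144 dpi display: twice as many pixels.
print(round(funits_to_pixels(550, 18, 144, 2048), 2))  # 9.67
```

The point size is an input you choose and the dpi is a property of the device; only the last argument comes from the font file.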
I am new to the world of raster images, so I will first explain which definitions I use and hope that I will use them right:
- geometry (the total number of pixels of the image %w * %h)
- resolution (pixels per inch / ppi)
- size or "print size" (the display size (e.g. in inches) on screen or printer)
I have some PDF documents containing raster images of different geometry. When opening with evince they therefore all display (and I guess potentially print) with different sizes. I would like to define the print size within the pdf so that evince (or any other viewer) would display every page with the same size when opening the document.
How could this be realized? Geometry and print size of the image are linked by the resolution as far as I understand. Currently one of my PDFs shows the following ImageMagick identify output:
$identify -units PixelsPerInch -format "%w x %h - %[resolution.x] x %[resolution.y] - %[fx:w/72] x %[fx:h/72] in\n" example.pdf
geometry - resol. - print size -
538 x 375 - 72 x 72 - 7.47 x 5.20 in
546 x 381 - 72 x 72 - 7.58 x 5.29 in
1210 x 1681 - 72 x 72 - 16.80 x 23.34 in
1658 x 1166 - 72 x 72 - 23.02 x 16.19 in
542 x 365 - 72 x 72 - 7.52 x 5.06 in
1673 x 1169 - 72 x 72 - 23.23 x 16.23 in
I would like to achieve a constant print size (column 3) without changing the geometry of the image or re-compressing it, so that it does not lose quality. In order to proceed it seems to me that I need to understand the following, which I cannot find any information about:
1) Which of these three values is actually saved in the pdf document and which one is calculated by identify?
2) Which software (and how) would allow me to batch process a number of pdf files in order to achieve my goal.
3) Guessing that geometry and resolution are values stored in the PDF file and print size is derived from them, would the software need to calculate a resolution value for each image so that the print size is equal over all pages?
Thank you very much!
Cheers,
Benjamin
1) I think only the first two are actually stored in the PDF, but the third value (print size) follows directly from the geometry (e.g. 538 x 375) and the pixel density (72 ppi, aka 72 dpi), so it can easily be calculated anyway.
2) It seems like you're going about this a little backward. There are plenty of applications that are perfectly suited to controlling image layout and printing. Adobe Illustrator is one of the most common and there are some free ones, too. But these are going to involve loading the images, visually arranging them on the page and adjusting the print sizes visually, rather than programmatically.
2) If you did want to do this programmatically, though, I think you're going to have a hard time finding software to solve that problem. GIMP and Photoshop both have some batching capability, and I know GIMP has a fairly robust CLI, so you might be able to use that.
3) Yes, you'll start with the print size you want, divide the number of pixels by the number of inches to get ppi/dpi.
NOTE: Keep in mind that dpi goes both directions. If you have a 200 x 300 image and a 400 x 400 image, and you want them both to print 10 inches square, then you're going to distort the 200 x 300 image, stretching it horizontally. The 200 x 300 image will also look poorer quality than the 400 x 400, because you have fewer pixels to work with.
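The calculation in 3) and the distortion mentioned in the note can be illustrated with a short sketch. The function name is hypothetical; it only does the division and does not touch any PDF.

```python
def ppi_for_print_size(width_px, height_px, target_w_in, target_h_in):
    """Return the (horizontal, vertical) ppi needed for a target print size."""
    return width_px / target_w_in, height_px / target_h_in

# The two images from the note, both forced to print 10 x 10 inches.
print(ppi_for_print_size(200, 300, 10, 10))   # (20.0, 30.0): unequal, so distorted
print(ppi_for_print_size(400, 400, 10, 10))   # (40.0, 40.0): equal, no distortion
```

When the horizontal and vertical values differ, the image cannot reach the target size without being stretched, so in practice one would fix either the width or the height and let the other follow from the aspect ratio.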
For these and other reasons, I highly recommend a visual approach, rather than a coding approach.
Good luck!
I would like to know whether it is possible to determine the camera calibration matrix just from the camera's specifications, without performing a camera calibration.
You can take a guess, but this will not replace a proper calibration, since every single camera is different--even if it is of the exact same type.
In your camera matrix, you have usually fx, fy, cx, cy (for square pixels). Take cx=w/2 and cy=h/2, where w and h are the width and height of your image, respectively.
For fx and fy, it is a bit trickier. Theoretically, we have fx = w*f_mm/w_mm, where f_mm is the focal length of your lens in mm and w_mm is the width of your CCD sensor in mm.
However, since lenses are round and sensors usually are not, you cannot just take the values from the specifications. There are tables that should give a good estimate for sensor width and height given the sensor size from the specifications, e.g. on Wikipedia. However, if the lens is mounted slightly differently, these values no longer hold.
With this, you will also not calibrate for distortions. It is highly recommended to do a proper calibration, e.g. with a checkerboard.
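The rough guess described above can be sketched as follows. The function name is made up, and the example numbers (640 x 480 image, 8 mm lens, 6.4 x 4.8 mm sensor) are assumed values for illustration; as stated above, the result is only a starting point and not a substitute for calibration.

```python
import numpy as np

def approx_camera_matrix(w_px, h_px, f_mm, sensor_w_mm, sensor_h_mm):
    """Estimate the 3x3 camera matrix from data-sheet values."""
    fx = w_px * f_mm / sensor_w_mm      # fx = w * f_mm / w_mm
    fy = h_px * f_mm / sensor_h_mm      # same relation for the vertical axis
    cx, cy = w_px / 2.0, h_px / 2.0     # principal point assumed at the centre
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

K = approx_camera_matrix(640, 480, 8.0, 6.4, 4.8)
print(K)
```

Such a matrix is often useful as the initial guess passed to a calibration routine, which then refines it and estimates the distortion coefficients.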