Changing numpy array using dpi value

I have a numpy array which I save to an image using savefig(). When I read it back in, the image is larger than my original array, because the dpi while saving is 100.
Is it possible to use dpi to make the image size larger and get it in a numpy array without saving and loading it again?

Sounds like you want to take an array of size (a, b) and scale it by an arbitrary factor s so that the resulting array has shape (a*s, b*s)?
There are several ways of doing this as far as I am aware, but perhaps the best resource is the cookbook page on rebinning: http://www.scipy.org/Cookbook/Rebinning
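If scipy is available, here is a minimal sketch using scipy.ndimage.zoom, one of several options; the factor s = 2 is just an illustration:
import numpy as np
from scipy import ndimage

a = np.random.rand(50, 60)       # original (a, b) array
s = 2                            # arbitrary scale factor
big = ndimage.zoom(a, zoom=s)    # interpolated array of shape (a*s, b*s)
print(big.shape)                 # (100, 120), with no file round-trip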
HTH

Related

Storing pre-processed images

I am evaluating a couple of object detection models on a data set and was planning to pre-process the data by standardizing it to zero mean and unit variance. But I don't know how to store the images once they have been pre-processed. Currently they are in JPG format, but what format can be used after I have pre-processed them? Some of the models I am evaluating are YOLOv4, YOLOv5, and SSD.
If I instead scaled the pixel values from 0-255 to 0-1, what image format could I then use?
Also, if I train the object detector on pre-processed images and then want to apply it to a video, I assume I need to somehow pre-process the video to get decent results. How would I go about doing that?
I have calculated the mean and std of my data set using the Python module cv2. I read the images using imread, which returns a numpy array. Then I subtract the mean and divide by the std. This gives me a numpy array with both negative and positive floating point values. But when I try to save this numpy array as an image using imwrite(filename, array), it doesn't work. I assume this is because the numpy array isn't allowed to contain negative values.
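For reference, a minimal sketch of the pipeline described above, with placeholder mean/std values and filenames; since PNG/JPG store unsigned integers, one option is to keep the standardized floats in NumPy's own .npy format rather than an image format:
import cv2
import numpy as np

img = cv2.imread('image.jpg').astype(np.float32)

# Standardize with dataset-wide statistics (placeholder values)
mean, std = 118.0, 57.0
standardized = (img - mean) / std    # floats, including negative values

# imwrite expects unsigned integer pixel data, so save the raw array instead
np.save('image_standardized.npy', standardized)
restored = np.load('image_standardized.npy')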

Simple Captcha Solving

I'm trying to solve some simple captchas using OpenCV and pytesseract. Some of the captcha samples are:
I tried to remove the noisy dots with some filters:
import cv2
import numpy as np
import pytesseract
# Read the captcha and binarize it
img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
# Morphological opening to knock out small specks of noise
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
# Repeated median blurs and a final Gaussian blur to smooth what is left
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))
The resulting transformed images are:
Unfortunately, pytesseract only recognizes the first captcha correctly. Is there a better transformation?
Final Update:
As #Neil suggested, I tried to remove the noise by detecting connected pixels. I found a function named connectedComponentsWithStats, which detects connected pixels and assigns each group (component) a label. By finding the connected components and removing the ones with a small number of pixels, I managed to get better overall detection accuracy with pytesseract.
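A minimal sketch of that approach, assuming dark text on a light background and a placeholder area threshold:
import cv2
import numpy as np

img = cv2.imread('captcha.png', cv2.IMREAD_GRAYSCALE)
# Invert-binarize so the glyphs (and the noise) become white components on black
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

# Label connected pixel groups; stats[i, cv2.CC_STAT_AREA] is the pixel count of component i
n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)

min_area = 20  # tune to your resolution: smaller components are treated as noise
cleaned = np.zeros_like(binary)
for i in range(1, n_labels):  # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] >= min_area:
        cleaned[labels == i] = 255

# Flip back to black text on white before handing it to pytesseract
cv2.imwrite('cleaned.png', cv2.bitwise_not(cleaned))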
And here are the new resulting images:
I've taken a much more direct approach to filtering ink splotches from PDF documents. I won't share the whole thing since it's a lot of code, but here is the general strategy I adopted (a rough sketch follows the list):
Use Python Pillow library to get an image object where you can manipulate pixels directly.
Binarize the image.
Find all connected pixels and how many pixels are in each group of connected pixels. You can do this with the minesweeper algorithm, which is easy to search for.
Set some threshold number of pixels that all legitimate letters are expected to exceed. This will depend on your image resolution.
Replace all black pixels in groups below the threshold with white pixels.
Convert back to an image.
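A rough sketch of those steps, assuming a grayscale input; 'doc.png' and the threshold values are illustrative, not from the original code:
from collections import deque
from PIL import Image

img = Image.open('doc.png').convert('L')  # grayscale image with direct pixel access
w, h = img.size
px = img.load()

# Binarize: 1 marks "ink" pixels darker than the threshold
binary = [[1 if px[x, y] < 128 else 0 for x in range(w)] for y in range(h)]

min_pixels = 15  # legitimate letters should exceed this; depends on resolution
seen = [[False] * w for _ in range(h)]
for y0 in range(h):
    for x0 in range(w):
        if binary[y0][x0] and not seen[y0][x0]:
            # Flood fill (the "minesweeper" step) to collect one connected group
            group, queue = [], deque([(x0, y0)])
            seen[y0][x0] = True
            while queue:
                x, y = queue.popleft()
                group.append((x, y))
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(group) < min_pixels:
                for x, y in group:  # splotch too small to be a letter: paint it white
                    px[x, y] = 255

img.save('cleaned.png')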
Your final output image is too blurry. To improve the performance of pytesseract, you need to sharpen it.
Sharpening is not as easy as blurring, but there are a few code snippets / tutorials (e.g. http://datahacker.rs/004-how-to-smooth-and-sharpen-an-image-in-opencv/).
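For example, a small unsharp-mask sketch in OpenCV; the weights 1.5 / -0.5 are a common starting point, not taken from the linked tutorial:
import cv2

img = cv2.imread('res.png')
blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
# Unsharp mask: add back the difference between the image and its blurred copy
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)
cv2.imwrite('sharpened.png', sharpened)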
Rather than chaining blurs, blur once, using either a Gaussian or a median blur, and experiment with the parameters to get the amount of blur you need. You might try one method after the other, but there is no reason to chain blurs of the same method.
There is an OCR example in Python that detects characters: save several images, apply the filter, and train an SVM algorithm. That may help you. I trained an algorithm with only a few images and the results were acceptable. Check this link.
Wish you luck
I know the post is a bit old, but I suggest you try this library I developed some time ago. If you have a set of labelled captchas, that service would suit you. Take a look: https://github.com/punkerpunker/captcha_solver
In the README there is a section "Train model on external data" that you might be interested in.

python numpy/scipy zoom changing center

I have a 2D numpy array, say something like:
import numpy as np
x = np.random.rand(100, 100)
Now, I want to zoom into this image (keeping the size the same, i.e. (100, 100)), and I want to change the centre of the zoom.
So, say I want to zoom keeping the point (70, 70) at the centre; normally one would do this by "translating" the image to that point and then zooming.
I wonder how I can achieve this with scipy. Is there a way to specify, say, 4 coordinates from this numpy array and basically fill the canvas with the interpolated image from that region of interest?
You could use ndimage.zoom to do the zooming part. I use ndimage a lot, and it works well and is fast. https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.zoom.html
The 4 coordinates part you mention is, I presume, two corners of the region you want to zoom into. That's easy with numpy slicing of your image (presuming your image is an np array):
your_image[r1:r2, c1:c2]
Assuming you want your output image at 100x100, your r2-r1 and c2-c1 differences will be the same, so your region is square.
ndimage.zoom takes a zoom factor (float). You would need to compute what that zoom factor is in order to take your sliced image and turn it into a 100x100 array:
ndimage.zoom(your_image[r1:r2, c1:c2], zoom=your_zoom_factor)
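Putting that together, a minimal sketch; the corner values below are placeholders chosen around the centre point (70, 70):
import numpy as np
from scipy import ndimage

x = np.random.rand(100, 100)

# 50x50 region of interest centred on (70, 70)
r1, r2 = 45, 95
c1, c2 = 45, 95

zoom_factor = 100 / (r2 - r1)   # scale the square slice back up to 100x100
zoomed = ndimage.zoom(x[r1:r2, c1:c2], zoom=zoom_factor)
print(zoomed.shape)             # (100, 100)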

How do I save a color-mapped array with the same dimensions as the original array?

I have data that I would like to save as PNGs. I need to keep the exact pixel dimensions: I don't want any inter-pixel interpolation, smoothing, or up/down sizing, etc. I do want to use a colormap, though (and maybe some other features of matplotlib's imshow). As I see it, there are a couple of ways I could do this:
1) Manually roll my own colormapping. (I'd rather not do this.)
2) Figure out how to make sure the pixel dimensions of the image in the figure produced by imshow are exactly correct, and then extract just the image portion of the figure for saving.
3) Use some other method which will directly give me a color-mapped array (i.e. my NxN grayscale array -> NxNx3 array, using one of matplotlib's colormaps), then save it using another PNG save method such as scipy.misc.imsave.
How can I do one of the above? (Or another alternate)
My problem arose when I was saving the figure directly using savefig and realized that I couldn't zoom in on details. Upscaling wouldn't solve the problem, since blurring between pixels is exactly one of the things I'm looking out for, and the pixel size has a physical meaning.
EDIT:
Example:
import numpy as np
import matplotlib.pyplot as plt
# Radially symmetric test pattern with fine rings that are easy to lose to blurring
X, Y = np.meshgrid(np.arange(-50.0, 50, .1), np.arange(-50.0, 50, .1))
Z = np.abs(np.sin(2*np.pi*(X**2+Y**2)**.5))/(1+(X/20)**2+(Y/20)**2)
plt.imshow(Z, cmap='inferno', interpolation='nearest')
plt.savefig('colormapeg.png')
plt.show()
Note that zooming in on the interactive figure gives you a very different view than trying to zoom in on the saved figure. I could up the resolution of the saved figure, but that has its own problems. I really just need the resolution fixed.
It seems you are looking for plt.imsave(), which writes the array one pixel per array element, with no axes, padding, or interpolation.
In this case,
plt.imsave("filename.png", Z, cmap='inferno')
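A quick check that the pixel dimensions survive the round trip; the filenames here are placeholders:
import numpy as np
import matplotlib.pyplot as plt

Z = np.random.rand(200, 300)
plt.imsave('out.png', Z, cmap='inferno')
print(plt.imread('out.png').shape)  # (200, 300, 4): one pixel per array element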

Reshaping numpy 3D array

I have a dataset with dimensions (32, 32, 73257), where 32x32 are the pixels of a single image.
How do I reshape it to (73257, 1024) so that every image is unrolled in a row?
So far, I did:
self.train_data = self.train_data.reshape(n_training_examples, number_of_pixels*number_of_pixels)
and it looks like I got garbage instead of normal pictures. I am assuming the reshape was performed across the wrong dimension?
As suggested in the comments, first get every image in a column, then transpose:
self.train_data = self.train_data.reshape(-1, n_training_examples).T
The memory layout of your array will not be changed by any of these operations, so two contiguous pixels of any image will lie 73257 bytes apart (assuming a uint8 image), which may not be the best option if you want to process your data one image at a time. You will need to time and validate this, but creating a copy of the array may prove advantageous performance-wise:
self.train_data = self.train_data.reshape(-1, n_training_examples).T.copy()
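A quick way to sanity-check the reshape on a toy array, using smaller shapes than the question but the same layout:
import numpy as np

# Stand-in for the (32, 32, 73257) dataset: three 2x2 images stacked on the last axis
data = np.arange(12).reshape(2, 2, 3)

rows = data.reshape(-1, 3).T                           # shape (3, 4): one image per row
print(np.array_equal(rows[0], data[:, :, 0].ravel()))  # True: row 0 is image 0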