Training a model to be biased to a given background - tensorflow

I am re-training an Inception-ResNet-v2 model to recognize sequences of 3-digit numbers (of a particular font type). The sequences are artificially generated with black lettering on a white background. I am of the opinion that subjecting the model to only a specific background will help me eliminate false detections (in this case, any other 3-digit sequence not on a white background), since the model will not predict high probabilities for sequences whose backgrounds are not white. Is that a valid assumption to make?
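For context, here is a minimal sketch of how such a training sample could be generated with PIL (the font path, image size, and text position are placeholder assumptions for illustration, not taken from my actual pipeline):

from PIL import Image, ImageDraw, ImageFont
import random

def make_sample(out_path, font_path='digits.ttf'):  # font_path is a placeholder
    text = f'{random.randint(0, 999):03d}'  # e.g. '042'
    img = Image.new('RGB', (120, 48), color='white')  # white background
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, 32)
    draw.text((10, 5), text, fill='black', font=font)  # black lettering
    img.save(out_path)
    return text  # the label for this sample

label = make_sample('sample_000.jpg')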
PS: I previously tried using Tesseract to perform text extraction from an image. I used the EAST text detector for detection, which gave me the bounding boxes for the text. I followed that with OCR using pytesseract, but it always returned an empty string. Furthermore, when the digits were rotated, the EAST text detector failed to recognize the rotated sequence. I was hence left with no option but to train a neural network model to perform text detection and extraction.
Code for pytesseract:
import cv2
import numpy as np
import pytesseract
from pytesseract import image_to_string
from PIL import Image
refPt = [(486, 302), (540, 308), (538, 328), (484, 323)]  # the bbox returned by EAST
roi_corners = np.array(refPt, dtype=np.int32).reshape((-1, 1, 2))
inp_img = cv2.imread("1.jpg")
mask = np.zeros(inp_img.shape, dtype=np.uint8)
channel_count = inp_img.shape[2]
ignore_mask_color = (255,) * channel_count  # white fill, one value per channel
mask = cv2.fillPoly(mask, [roi_corners], ignore_mask_color)  # fillPoly takes a list of polygons
masked_image = cv2.bitwise_and(inp_img, mask)  # keep only the quadrilateral ROI
print(image_to_string(Image.fromarray(masked_image), lang='eng'))
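As a possible refinement (not something I have verified): OCR often behaves better on a tight crop of the ROI than on a mostly-black masked image, and tesseract can be hinted that the crop is a single line of digits via its config string:

x, y, w, h = cv2.boundingRect(roi_corners)  # tight axis-aligned box around the quad
crop = masked_image[y:y + h, x:x + w]
config = '--psm 7 -c tessedit_char_whitelist=0123456789'  # one text line, digits only
print(image_to_string(Image.fromarray(crop), lang='eng', config=config))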

Related

Why does image_dataset_from_directory return a different array than loading images normally?

I noticed that the output from TensorFlow's image_dataset_from_directory is different from loading images directly (with PIL, Keras' load_img, etc.). I set up an experiment: I have a single RGB image with dimensions 2400x1800x3, and I compared the resulting NumPy arrays from the different methods:
import numpy as np
from PIL import Image
from tensorflow.keras.utils import image_dataset_from_directory, load_img, img_to_array

img_path = '../data/image.jpg'  # placeholder: path to the same single image

train_set = image_dataset_from_directory(
    '../data/',
    image_size=(2400, 1800),  # I'm using the original image size
    label_mode=None,
    batch_size=1
)

for batch in train_set:
    img_from_dataset = np.squeeze(batch.numpy())  # remove batch dimension

img_from_keras = img_to_array(load_img(img_path))
img_from_pil = img_to_array(Image.open(img_path))

print(np.all(img_from_dataset == img_from_keras))  # False
print(np.all(img_from_dataset == img_from_pil))    # False
print(np.all(img_from_keras == img_from_pil))      # True
So, even though all methods return the same shape numpy array, the values from image_dataset_from_directory are different. Why is this? And what can/should I do about it?
This is a particular problem during prediction time where I'm taking a single image (i.e. not using image_dataset_from_directory to load the image).
This is strange and I have not figured out exactly why, but if you print out pixel values from img_from_dataset, img_from_keras and img_from_pil, you will find that the pixel values for img_from_dataset are sometimes lower by 1, i.e. it looks like some kind of rounding is going on. All three are supposed to return float32, so I can't see why they should be different. I also tried ImageDataGenerator().flow_from_directory, and it matches the data from img_from_keras and img_from_pil. Note that image_dataset_from_directory returns a tf.data.Dataset object that yields float32 tensors of shape (batch_size, image_size[0], image_size[1], num_channels).
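For reference, the flow_from_directory check mentioned above might look like this (directory and sizes assumed to match the earlier snippet):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

gen = ImageDataGenerator().flow_from_directory(
    '../data/', target_size=(2400, 1800), class_mode=None, batch_size=1)
img_from_generator = np.squeeze(next(gen))  # remove batch dimension
print(np.all(img_from_generator == img_from_keras))  # True, per the above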
I used this code to detect the pixel value differences, using a 224 x 224 x 3 image:
match = True
for i in range(224):
    for j in range(224):
        for k in range(3):
            if img_from_dataset[i, j, k] != img_from_keras[i, j, k]:
                match = False
                print(img_from_dataset[i, j, k], img_from_keras[i, j, k], i, j, k)
                break
        if not match:
            break
    if not match:
        break
print(match)
An example output of the code is
86.0 87.0 0 0 2
False
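A more direct way to locate the mismatching positions, assuming both arrays have the same shape, is to let NumPy do the comparison:

diff_idx = np.argwhere(img_from_dataset != img_from_keras)
print(diff_idx[:5])  # first few (i, j, k) positions that differ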
If you ever figure out the reason for the difference, let me know; I expect one would have to go through the implementation in detail. I took a quick look: even though you specified the image size as being the same as the original image, image_dataset_from_directory still resizes the image using tf.image.resize with interpolation='bilinear'. Maybe load_img(img_path) and PIL's Image.open use a different interpolation, or skip resizing entirely.
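One quick way to test that hypothesis (a sketch, reusing img_from_pil and img_from_dataset from the snippets above): push the PIL-loaded array through the same bilinear resize and compare.

import tensorflow as tf

resized = tf.image.resize(img_from_pil, (2400, 1800), method='bilinear').numpy()
print(np.allclose(resized, img_from_dataset))  # True would confirm the resize is the cause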

How to save an image that has been visualized/generated by a Keras model?

I am using a Detecto model to visualize an image. Basically, I pass an image to this model and it draws a boundary line across the detected object and displays the visualized image.
from keras.preprocessing.image import load_img
from keras.preprocessing.image import save_img
from keras.preprocessing.image import img_to_array
from detecto import core, utils, visualize
image = utils.read_image('retina_model/4.jpg')
model = core.Model()
labels, boxes, scores = model.predict_top(image)
img = visualize.show_labeled_image(image, boxes)
Now, I am trying to convert this visualized image into Numpy array. I am using the below line for converting the image into numpy array :
img_array = img_to_array(img)
It is giving the error:
Unsupported Image Shape
All I want is to display the visualized image, which is the output of this model, on my website. The plan is to convert the image into a NumPy array and then save the image using the line below:
save_img('image1.jpg', img_array)
So I was planning to download this visualized image (the output of this model) so that I can display the downloaded image on my website. If there is some other way to achieve this, please let me know.
Detecto's documentation says utils.read_image() already returns a NumPy array, but you are passing the return value of visualize.show_labeled_image() to Keras' img_to_array(img). Looking at the Detecto source code, visualize.show_labeled_image() has no return statement, so it returns None by default. So I think your problem is that you are not passing a valid image to img_to_array(img), but None.
I don't think the call to img_to_array(img) is needed at all, because you already have the image as a NumPy array. Note, however, that according to Detecto's documentation, utils.read_image() is "Equivalent to using OpenCV's cv2.imread function and converting from BGR to RGB format". Make sure that's what you want.
You can visit the official GitHub repo of Detecto and look at visualize.py to find the show_labeled_image() function: it uses matplotlib to plot the image with its bounding boxes. You can adapt that code in your own file to save the plot using plt.savefig(), as in the sketch below.
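A minimal sketch of that approach, assuming image is the RGB array from utils.read_image() and boxes is the (N, 4) tensor of (xmin, ymin, xmax, ymax) coordinates returned by model.predict_top():

import matplotlib.pyplot as plt
import matplotlib.patches as patches

fig, ax = plt.subplots(1)
ax.imshow(image)  # draw the image first
for box in boxes:
    xmin, ymin, xmax, ymax = [float(v) for v in box]
    rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,
                             linewidth=1, edgecolor='r', facecolor='none')
    ax.add_patch(rect)  # overlay each bounding box
ax.axis('off')  # no axes in the saved file
plt.savefig('image1.jpg', bbox_inches='tight', pad_inches=0)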

How to crop the detected object after training using YOLO?

I am using YOLO for model training. I want to crop the detected object.
The Darknet repository I am using is: https://github.com/AlexeyAB/darknet/
For detection, and for storing the output coordinates in a text file, I am using this:
!./darknet detector test data_for_colab/obj.data data_for_colab/yolov3-tiny-obj.cfg yolov3-tiny-obj_10000.weights -dont_show -ext_output < TEST.txt > result.txt
Result.jpg
Assuming the result.txt file contains detection details like the sample image, you can use Python's re module for text pattern matching, i.e. to find your "class_name".
Parsing the .txt file
import re

path = '/content/darknet/result.txt'
myfile = open(path, 'r')
lines = myfile.readlines()
pattern = "class_name"  # the class you want to crop
for line in lines:
    if re.search(pattern, line):
        Cord_Raw = line
# keep the text between the parentheses and split it on spaces
Cord = Cord_Raw.split("(")[1].split(")")[0].split(" ")
Now we will get the coordinates in a list.
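For a hypothetical result.txt line such as
class_name: 98% (left_x: 117 top_y: 186 width: 18 height: 23)
the split above yields ['left_x:', '117', 'top_y:', '186', 'width:', '18', 'height:', '23'], so indices 1, 3, 5 and 7 hold the numeric values used below.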
Coordinate calculation
x_min = int(Cord[1])
x_max = x_min + int(Cord[5])
y_min = int(Cord[3])
y_max = y_min + int(Cord[7])
Cropping from the actual image
import cv2
img = cv2.imread("Image.jpg")
crop_img = img[y_min:y_max, x_min:x_max]
cv2.imwrite("Object.jpg", crop_img)
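One defensive tweak worth considering (my addition, not part of the original answer): clamp the box to the image bounds before slicing, since predicted boxes can spill slightly outside the frame.

h_img, w_img = img.shape[:2]
x_min, y_min = max(0, x_min), max(0, y_min)
x_max, y_max = min(w_img, x_max), min(h_img, y_max)
crop_img = img[y_min:y_max, x_min:x_max]
cv2.imwrite("Object.jpg", crop_img)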

MNIST Matplotlib: showing color

I am exploring the MNIST dataset which is a collection of gray-scale handwritten digit images. I am using Matplotlib to plot random images from the dataset:
import matplotlib.pyplot as plt
# assumes X_train is already loaded, e.g.
# (X_train, y_train), _ = tf.keras.datasets.mnist.load_data()

plt.subplot(221)
plt.imshow(X_train[1], cmap='gray')
plt.subplot(222)
plt.imshow(X_train[100])
plt.subplot(223)
plt.imshow(X_train[4559])
plt.subplot(224)
plt.imshow(X_train[50000])
plt.show()
My question is: why are the images coming up colored when I don't explicitly set cmap='gray'? Shouldn't they all appear as grayscale images by default, since that's their true nature?
This is because, by default, imshow() applies the 'viridis' colormap to single-channel (2-D) arrays.
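If you want grayscale to be the default, so that every imshow() call renders the digits in gray without passing cmap each time, one option is to set it globally:

import matplotlib.pyplot as plt

plt.rcParams['image.cmap'] = 'gray'  # default colormap for all subsequent imshow calls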

Saving an imshow-like image while preserving resolution

I have an (n, m) array that I've been visualizing with matplotlib.pyplot.imshow. I'd like to save this data in some type of raster graphics file (e.g. a png) so that:
The colors are the ones shown with imshow
Each element of the underlying array is exactly one pixel in the saved image -- meaning that if the underlying array is (n, m) elements, the image is n x m pixels. (I'm not interested in interpolation='nearest' in imshow.)
There is nothing in the saved image except for the pixels corresponding to the data in the array. (I.e. there's no white space around the edges, axes, etc.)
How can I do this?
I've seen some code that can kind of do this by using interpolation='nearest' and forcing matplotlib to (grudgingly) turn off axes, whitespace, etc. However, there must be some way to do this more directly -- maybe with PIL? After all, I have the underlying data. If I can get an RGB value for each element of the underlying array, then I can save it with PIL. Is there some way to extract the RGB data from imshow? I can write my own code to map the array values to RGB values, but I don't want to reinvent the wheel, since that functionality already exists in matplotlib.
As you already guessed there is no need to create a figure. You basically need three steps. Normalize your data, apply the colormap, save the image. matplotlib provides all the necessary functionality:
import numpy as np
import matplotlib.pyplot as plt
# some data (512x512); scipy.misc.lena() has been removed from recent SciPy
# releases, so any 2-D array will do here
data = np.random.rand(512, 512)
# a colormap and a normalization instance
cmap = plt.cm.jet
norm = plt.Normalize(vmin=data.min(), vmax=data.max())
# map the normalized data to colors
# image is now RGBA (512x512x4)
image = cmap(norm(data))
# save the image
plt.imsave('test.png', image)
While the code above explains the single steps, you can also let imsave do all three steps (similar to imshow):
plt.imsave('test.png', data, cmap=cmap)
Result (test.png):
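Since the question mentions PIL: an equivalent route, assuming image is the RGBA float array produced by cmap(norm(data)) above, is to convert it to 8-bit and save directly; the resulting PNG is exactly n x m pixels with no axes or padding:

from PIL import Image

Image.fromarray((image * 255).astype(np.uint8)).save('test_pil.png')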