How to clean images to use with an MNIST-trained model?

I am creating a machine learning model for classifying images of numbers. I have trained the model with TensorFlow and Keras using the built-in tf.keras.datasets.mnist dataset. The model works quite well on the test images from the MNIST dataset itself, but I would like to feed it images of my own. The images I am feeding this model are extracted from a Captcha, so they will follow a similar pattern. I have included some examples of the images in this public Google Drive folder. When I feed these images, I notice that the model is not very accurate, and I have some guesses as to why:
The background of the image creates too much noise in the picture.
The number is not centered.
The image is not strictly in the color format of the MNIST training set (black background, white text).
I wanted to ask how I can remove the background and center the number so that the noise in the image is reduced, allowing for better classification.
Here is the model I am using:
import tensorflow as tf
from tensorflow import keras

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

class Stopper(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, log={}):
        if log.get('acc') >= 0.99:
            self.model.stop_training = True
            print('\nReached 99% Accuracy. Stopping Training...')

model = keras.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(1024, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(
    optimizer=tf.train.AdamOptimizer(),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

x_train, x_test = x_train / 255, x_test / 255
model.fit(x_train, y_train, epochs=10, callbacks=[Stopper()])
And here is my method of importing an image into TensorFlow:
import numpy as np
from PIL import Image

img = Image.open("image_file_path").convert('L').resize((28, 28), Image.ANTIALIAS)
img = np.array(img)
model.predict(img[None, :, :])
I have also included some examples from the MNIST dataset here. I would like a script that converts my images as closely to the MNIST dataset format as possible. Also, since I would have to do this for an indefinite number of images, I would appreciate a fully automated method for this conversion. Thank you very much.

You need to train with a dataset similar to the images you're testing on. The MNIST data is hand-written digits, which is not going to resemble the computer-generated fonts of Captcha data.
What you need to do is build a catalog of Captcha data similar to what you're predicting on (preferably from the same source you will be feeding into the final model). Capturing the data is a painstaking task, and you'll probably need around 300-400 images per label before you start to get something useful.
A key note: your model will only ever be as good as the training data you supply to it. Trying to make a good model with bad training data is an exercise in pure frustration.
To address some of your thoughts:
[the model is not very accurate because] the background of the image creates too much noise in the picture.
This is true. If the image data has noise and the neural net was not trained on noisy images, then it will not recognize a strong pattern when it encounters this type of distortion. One possible way to combat this is to take clean images and programmatically add noise (similar to what you see in the real Captcha) before sending them to be trained; a sketch of this is below.
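As an illustration only, here is a minimal sketch of such noise injection using NumPy. It assumes a 2D greyscale uint8 image; the function name, salt-and-pepper probability, and line count are invented here and would need tuning against your real Captchas:

import numpy as np

def add_captcha_like_noise(img, salt_prob=0.05, line_count=2, rng=None):
    """Sketch: sprinkle salt-and-pepper pixels and draw random straight
    lines across a 2D greyscale image, roughly imitating Captcha clutter."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = img.copy()
    # Salt-and-pepper speckle: replace a random fraction of pixels.
    mask = rng.random(img.shape) < salt_prob
    noisy[mask] = rng.integers(0, 256, mask.sum())
    # A few random lines spanning the image width.
    h, w = img.shape
    for _ in range(line_count):
        x = np.arange(w)
        y0, y1 = rng.integers(0, h, 2)
        y = np.linspace(y0, y1, num=w).astype(int)
        noisy[y, x] = rng.integers(0, 256)
    return noisy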
[the model is not very accurate because] The number is not centered.
Also true, for the same reasons. If all the training data is centered, the model will be over-tuned for this property and make incorrect guesses when it isn't. Follow a similar pattern to the one above if you don't have the capacity to manually capture and catalog a good sampling of data.
[the model is not very accurate because] the image is not strictly in the color format of the MNIST training set (black background, white text).
You can get around this by applying a binary threshold to the data before processing, and normalizing the color input before training. Depending on the amount of noise in the Captcha, you may get better results by allowing the number and noise to retain some of their color information (still convert to greyscale and normalize, just don't apply the threshold). A rough sketch of such a conversion is below.
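As a rough illustration, here is one way that conversion might look with PIL and NumPy. This is a minimal sketch, assuming the digit is darker than its background; the function name, the threshold of 128, and the padding margin are all arbitrary choices to tune:

import numpy as np
from PIL import Image

def to_mnist_like(path, threshold=128):
    """Rough sketch: greyscale, binarize, invert to white-on-black,
    crop to the digit's bounding box, pad, and resize to 28x28."""
    img = np.array(Image.open(path).convert("L"))

    # Binary threshold; assume the digit is darker than the background,
    # so invert to match MNIST's white-digit-on-black convention.
    binary = np.where(img < threshold, 255, 0).astype(np.uint8)

    # Crop to the bounding box of the foreground pixels to center the digit.
    ys, xs = np.nonzero(binary)
    if len(xs) == 0:  # blank image, nothing to crop
        return np.zeros((28, 28), dtype=np.float32)
    digit = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Pad to a square with a small margin, as MNIST digits have.
    h, w = digit.shape
    side = max(h, w) + 8
    canvas = np.zeros((side, side), dtype=np.uint8)
    y0, x0 = (side - h) // 2, (side - w) // 2
    canvas[y0:y0 + h, x0:x0 + w] = digit

    # Resize to 28x28 and normalize to [0, 1] like the training data.
    out = Image.fromarray(canvas).resize((28, 28), Image.LANCZOS)
    return np.asarray(out, dtype=np.float32) / 255.0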
Additionally, I'd recommend using a convolutional net rather than the linear network, as it is better at distinguishing 2D features like edges and corners, i.e. use keras.layers.Conv2D layers before flattening with keras.layers.Flatten.
See the great example found here: Trains a simple convnet on the MNIST dataset.
# input_shape and num_classes come from the linked example;
# for MNIST they would be (28, 28, 1) and 10 respectively.
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation=tf.nn.relu,
                           input_shape=input_shape),
    tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(num_classes, activation=tf.nn.softmax),
])
I've used this setup for reading fonts in video-gameplay footage, and with a test set of 10,000 images I'm achieving 99.98% accuracy, using a random sampling of half the dataset for training and calculating accuracy over the full set.

Why doesn't my CNN validation accuracy increase?

I am attempting to create a simple CNN to be able to distinguish eye (retinal) scans of different severities. It is a multi-class classification problem, 5 classes. This by now is probably a fairly standard, textbook case for CNNs. I am using the Kaggle EyePACs dataset. The photos are very big, so I'm using a dataset that has rescaled them.
My issue is, when I'm training the model, I expect to see the usual learning curves where both training and validation curves increase together like this example from google:
However my curves look like this:
I haven't done any image pre-processing on the data; I was hoping that there would be some rudimentary learning going on, which I could then improve upon using CLAHE and the like. I've changed the classes so that instead of trying to predict the grades from 0 to 4, I've removed the middle classes so that we just have the extremes, 0 and 4 (thus it became a binary classification problem, where class 4 was relabelled 1). However, the curve didn't change much and still looks like this:
What could be the issue? I thought that as the model gets better on the training data, it must improve on the validation data. Yes, this is overfitting, but I assumed that kicks in after some positive learning, not straight away. The validation set doesn't seem to be learning at all. Also, shouldn't these models start with random parameters, so that the initial accuracy would be random? Instead it's around 0.75 from the get-go, and it just doesn't learn after that. What's going on? What should I look at changing? Is this a data problem or a hyperparameter problem? Shall I include the code here? Many thanks.
=============================Edit=============================
Here's the code I used. I know it's rudimentary; it's a mishmash of the 'image classification from scratch' Keras tutorial and some standard MNIST tutorials from around the web. Grateful for any pointers.
Creating the image-label dataset objects for train (+validation split) and test:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/Colab Notebooks/resized train 15/Binary 0-4",
    labels="inferred",
    label_mode="binary",
    validation_split=0.2,
    seed=1337,
    subset="training",
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/Colab Notebooks/resized train 15/Binary 0-4",
    labels="inferred",
    label_mode="binary",
    validation_split=0.2,
    seed=1337,
    subset="validation",
)
test_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "/content/drive/MyDrive/Colab Notebooks/resized test 15/0-4/",
    labels="inferred",
    label_mode="binary",
)
Found 26518 files belonging to 2 classes.
Using 21215 files for training.
Found 26518 files belonging to 2 classes.
Using 5303 files for validation.
Found 36759 files belonging to 2 classes.
#To make it run faster (I think?):
train_ds = train_ds.prefetch(buffer_size=32)
val_ds = val_ds.prefetch(buffer_size=32)
test_ds = test_ds.prefetch(buffer_size=32)
#The architecture:
from keras.models import Sequential
from keras.layers import Dense, Rescaling, Conv2D, MaxPool2D, Flatten
model = Sequential()
model.add(Rescaling(1.0 / 255))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3)))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Flatten())
model.add(Dense(units=2, activation='sigmoid'))
#Compile it:
from keras import optimizers
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
#And then finally train (run) it:
history = model.fit(
    x=train_ds,
    epochs=30,
    validation_data=val_ds,
)
#I think this is how I evaluate the trained model against the test data:
loss, acc = model.evaluate(test_ds)
print("Accuracy", acc)
#It prints out the following output:
1149/1149 [==============================] - 278s 238ms/step - loss: 0.1408 - accuracy: 0.9672
Accuracy 0.9671916961669922
And then of course I end it with model.save('Binary CNN 0-4').
I think I have spotted one thing I can change already: the loss function should be binary_crossentropy, with the number of units in the final dense layer adjusted to 1 (instead of 2). But surely that little change won't by itself explain why the validation set isn't learning. (A sketch of that corrected setup is below.)
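For reference, a minimal sketch of what that corrected setup might look like, assuming a Keras version where Rescaling is available in keras.layers; everything else mirrors the architecture above:

from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Rescaling, Conv2D, MaxPool2D, Flatten

model = Sequential([
    Rescaling(1.0 / 255, input_shape=(256, 256, 3)),
    Conv2D(32, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), strides=2),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), strides=2),
    Flatten(),
    # One unit + sigmoid matches label_mode="binary" (0/1 labels).
    Dense(units=1, activation='sigmoid'),
])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss='binary_crossentropy',  # instead of sparse_categorical_crossentropy
              metrics=['accuracy'])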
You've not included code, so I hope it's OK to give a couple of tentative general answers.

Q) Initial accuracy: how can it be as high as 0.75?

A) TensorFlow reports the average training accuracy over the epoch, and if there are many batches then the model is already learning during epoch 0. The first accuracy reported is therefore the average over epoch 0 and can be much better than random. If, for example, the input data is unbalanced and has 75% of the labels in one category, the model may learn very quickly that it can achieve 75% accuracy by assigning every input to that category.
Q) Can overfitting start at the beginning?

A) It can start very close to the beginning. A network may in effect just be memorising the training set.

There are standard approaches to overfitting, which include:

i) Try a simpler network. It makes sense anyway to start simple and add complexity as required.
ii) Regularization of layers, e.g. adding L2 regularizers to your layers.
iii) Adding dropout layers between hidden layers.
iv) Batch normalisation between hidden layers.
v) Image augmentation (randomly add some rotation, shift, or flipping, if appropriate).
vi) Get more training data.
vii) Use transfer learning, as another answer has suggested. This is most likely appropriate if you don't have much training data. You can then just add a layer or two to the pre-built model (probably removing its last layer or two) and train only the new layers.

Only trial and error will show what works; a sketch combining a few of these is below.
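For illustration only, here is a rough sketch of points ii)-v) applied to a small binary classifier like the one in the question, assuming TF >= 2.6 (where the Rescaling and Random* preprocessing layers live in tf.keras.layers); every width, rate, and factor is a guess to tune, not a recommendation:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(256, 256, 3)),
    layers.RandomFlip("horizontal"),   # v) augmentation (training only)
    layers.RandomRotation(0.1),
    layers.Conv2D(16, 3, activation='relu',
                  kernel_regularizer=regularizers.l2(1e-4)),  # ii) L2
    layers.BatchNormalization(),       # iv) batch normalisation
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu',
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),               # iii) dropout
    layers.Dense(1, activation='sigmoid'),  # i) smaller than the original
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])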

Is a validation curve slightly higher or lower than the training curve good in CNN models?

Can you tell me which of the two is a good validation-vs-train plot?
Both are trained with the same Keras sequential layers, but the second one is trained with a larger number of samples, i.e. an augmented dataset.
I'm a little bit confused about the zigzags in the first plot; otherwise I think it is better than the second.
In the second plot there are no zigzags, but the validation accuracy tends to be a little higher than train. Is that overfitting, or is it acceptable?
It is an image detection model where the first model's dataset size is 5170 samples and the second has 9743.
The convolutional layers defined for the model building:
tf.keras.layers.Conv2D(128,(3,3), activation = 'relu', input_shape = (150,150,3)),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(64,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Conv2D(32,(3,3), activation = 'relu'),
tf.keras.layers.MaxPool2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512,activation='relu'),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(128,activation='relu'),
tf.keras.layers.Dropout(0.25),
tf.keras.layers.Dense(1,activation='sigmoid')
Can the model be improved?
From the graphs, the second one, where you have more samples, is better. The reason is that with more samples the model is trained on a much wider probability distribution of images, so when validation is run you have a better chance of correctly classifying each image.
You have a lot of dropout in your model. This is good for preventing overfitting, but it will lower the training accuracy relative to the validation accuracy, which is likely why your validation curve sits slightly above the training curve.
Your model seems to be doing well. It might improve if you add additional convolution/max-pooling layers. The alternative, of course, is to use transfer learning; I would recommend EfficientNetB3 (a sketch is shown after the callback code below). I also recommend using an adjustable learning rate; the Keras callback ReduceLROnPlateau works well for that purpose. Documentation is here. The code below shows my recommended settings.
rlronp = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=2,
    verbose=1,
    mode='auto'
)
Then include callbacks=[rlronp] in your model.fit call.
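As a rough illustration of the transfer-learning suggestion, a minimal EfficientNetB3 sketch; the frozen base, head sizes, and dropout rate are assumptions to tune, train_data and val_data are placeholders for your datasets, and the 150x150 input matches the question's model (TF's EfficientNet applies its own input rescaling, so raw 0-255 images are fine):

import tensorflow as tf

base = tf.keras.applications.EfficientNetB3(
    include_top=False, weights='imagenet', input_shape=(150, 150, 3))
base.trainable = False  # train only the new head at first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(train_data, epochs=10, validation_data=val_data,
          callbacks=[rlronp])  # reuses the callback defined above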

What is the behavior of the data augmentation layer?

I am trying to understand the TensorFlow data augmentation tutorial.
In the following model:
model = tf.keras.Sequential([
    resize_and_rescale,
    data_augmentation,
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    # Rest of your model
])
My understanding is that no matter how many rotate/zoom/transform operations are defined in data_augmentation, this data_augmentation layer outputs just 1 image from 1 input image. Am I correct?
I saw another post, Does ImageDataGenerator add more images to my dataset?. Someone answered that ImageDataGenerator creates different images each epoch; is that the same behavior here?
Otherwise, it would just be the same transformed image being trained on epoch after epoch, which makes no sense.
Yes! The data augmentation layer just transforms images and returns the same shape as the input (batch_size, *image_dims). But, due to the randomisation in the data augmentation layer, you are likely to get a different output each time that layer is called (it is only active during training). For instance, in the linked tutorial the rotation angle and zoom factor are randomly selected (within the specified limits) each time the layer is called, so across epochs the model sees a differently transformed version of each image.
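A quick way to convince yourself is to call such a layer twice on the same input and compare the outputs. This is a minimal sketch, assuming TF >= 2.6 where the Random* preprocessing layers live in tf.keras.layers; the specific layers and factors are stand-ins for the tutorial's pipeline, not its exact code:

import tensorflow as tf

# A stand-in augmentation pipeline similar to the tutorial's.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.2),
])

image = tf.random.uniform((1, 64, 64, 3))  # one dummy image (batch of 1)

# Same input, two calls in training mode -> two different augmented outputs.
out1 = data_augmentation(image, training=True)
out2 = data_augmentation(image, training=True)
print(tf.reduce_max(tf.abs(out1 - out2)).numpy())  # almost surely > 0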

Can you use an embedding + CNN model for both text and image classification?

I have a TensorFlow CNN model with an embedding layer for text classification, as follows:
model = tf.keras.Sequential([
    Embedding(vocab_size, embedding_dim, input_length=maxlen, weights=[embedding], trainable=False),
    Conv1D(128, 5, activation='relu'),
    GlobalMaxPooling1D(),
    Dense(10, activation='relu'),
    Dense(1, activation='sigmoid')
])
My colleague is adamant that this is viable, but I found this post stating it is not feasible. I understand that a CNN as an algorithm can be used for text and image inputs, but my understanding is that you can't use the same CNN model for text input and image input: text will use Conv1D and images Conv2D.
The linked post mentions:

Process the image using a CNN model.
Process the text using another model ... By CNN I mean usually a 1D CNN that runs over the words in a sentence.
Merge the 2 latent spaces which tells information about the image and the text.
Run last few Dense layers for classification.
If I'm on the right track, how can I go about building two sub-models (one for text, another for image classification) and merging the latent spaces? Thank you!
Indeed, the solution to this is to build two different models and concatenate their outputs prior to the Dense layers.
You can start by looking at this very good article (although it covers MLP + CNN, the logic is still the same): https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/
Therefore, the solution mentioned there is indeed viable and the way to go. A minimal sketch follows.
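Here is a rough sketch of the two-branch idea with the Keras functional API. The branch widths, the 64x64 image size, and all variable names are placeholders; the text branch reuses the shapes from your snippet:

import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical sizes; adjust to your data.
vocab_size, embedding_dim, maxlen = 10000, 100, 200

# Text branch: 1D convolutions over word embeddings.
text_in = layers.Input(shape=(maxlen,), name="text")
t = layers.Embedding(vocab_size, embedding_dim)(text_in)
t = layers.Conv1D(128, 5, activation="relu")(t)
t = layers.GlobalMaxPooling1D()(t)

# Image branch: 2D convolutions over pixels.
img_in = layers.Input(shape=(64, 64, 3), name="image")
i = layers.Conv2D(32, 3, activation="relu")(img_in)
i = layers.MaxPooling2D()(i)
i = layers.Flatten()(i)

# Merge the two latent spaces, then classify.
merged = layers.concatenate([t, i])
x = layers.Dense(64, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid")(x)

model = Model(inputs=[text_in, img_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

The model is then fit with a pair of inputs, e.g. model.fit([text_data, image_data], labels, ...).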

Tensorflow Polynomial Linear Regression curve fit

I have created this linear regression model using TensorFlow (Keras). However, I am not getting good results: my model tries to fit the points around a straight line, and I believe fitting the points with a degree-n polynomial would give better results. I have googled how to change my model to polynomial regression using TensorFlow Keras, but could not find a good resource. Any recommendation on how to improve the prediction?
I have a large dataset. I shuffled it first, then split it into 80% training and 20% testing. The dataset is also normalized.
1) Building model:
def build_model():
    model = keras.Sequential()
    model.add(keras.layers.Dense(units=300, input_dim=32))
    model.add(keras.layers.Activation('sigmoid'))
    model.add(keras.layers.Dense(units=250))
    model.add(keras.layers.Activation('tanh'))
    model.add(keras.layers.Dense(units=200))
    model.add(keras.layers.Activation('tanh'))
    model.add(keras.layers.Dense(units=150))
    model.add(keras.layers.Activation('tanh'))
    model.add(keras.layers.Dense(units=100))
    model.add(keras.layers.Activation('tanh'))
    model.add(keras.layers.Dense(units=50))
    model.add(keras.layers.Activation('linear'))
    model.add(keras.layers.Dense(units=1))
    # sigmoid tanh softmax relu

    optimizer = tf.train.RMSPropOptimizer(0.001,
                                          decay=0.9,
                                          momentum=0.0,
                                          epsilon=1e-10,
                                          use_locking=False,
                                          centered=False,
                                          name='RMSProp')
    # optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

    model.compile(loss='mse',
                  optimizer=optimizer,
                  metrics=['mae'])
    return model

model = build_model()
model.summary()
2) Train the model:
class PrintDot(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs):
        if epoch % 100 == 0: print('')
        print('.', end='')

EPOCHS = 500

# Store training stats
history = model.fit(train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=1,
                    callbacks=[PrintDot()])
3) Plot the training and validation loss (plot omitted).
4) Stop when the results no longer improve (plot omitted).
5) Evaluate the result
[loss, mae] = model.evaluate(test_data, test_labels, verbose=0)
#Testing set Mean Abs Error: 1.9020842795676374
6) Predict:
test_predictions = model.predict(test_data).flatten()
(prediction plot omitted)
7) Prediction error: (plot omitted)
Polynomial regression is linear regression with some extra input features, which are polynomial functions of the original input features.
i.e.:
Let the original input features be (x1, x2, x3, ...).
Generate a set of polynomial features by adding some transformations of the original features, for example (x1^2, x2^3, x1^3*x2, ...). One may decide which functions to include depending on constraints such as intuition about their correlation to the target values, computational resources, and training time.
Append these new features to the original input feature vector. The transformed input feature vector now has size len(x1, x2, x3, ...) + len(x1^2, x2^3, x1^3*x2, ...).
This updated set of input features (x1, x2, x3, x1^2, x2^3, x1^3*x2, ...) is then fed into the normal linear regression model; the ANN's architecture may be tuned again to get the best trained model. A sketch of this expansion is below.
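For illustration, a minimal sketch of that expansion using scikit-learn's PolynomialFeatures; the degree, the tiny network, and the random data are placeholders (the question's 32 inputs are kept):

import numpy as np
import tensorflow as tf
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: 32 original input features, as in the question.
X = np.random.rand(1000, 32).astype("float32")
y = np.random.rand(1000).astype("float32")

# Append degree-2 polynomial terms (squares and pairwise products)
# to the original features before feeding the network.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)  # shape: (1000, 32 + 32 + 496) = (1000, 560)

# A deliberately small network over the expanded features.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu",
                          input_shape=(X_poly.shape[1],)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
model.fit(X_poly, y, epochs=5, validation_split=0.2, verbose=0)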
PS: I see that your network is huge while the number of inputs is only 32; this is not a common scale of architecture. Even for this particular linear model, reducing the hidden layers to one or two may help train better models (a suggestion, assuming this dataset is similar to other commonly seen regression datasets).
I've actually created polynomial layers for TensorFlow 2.0, though these may not be exactly what you are looking for. If they are, you could use those layers directly, or follow the procedure used there to create a more general layer: https://github.com/jloveric/piecewise-polynomial-layers