What is the behavior of data augmentation layer? - tensorflow

I am trying to understand the TensorFlow data augmentation tutorial.
In the following model definition:
model = tf.keras.Sequential([
    resize_and_rescale,
    data_augmentation,
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    # Rest of your model
])
My understanding is that no matter how many rotate/zoom/transform operations are defined in data_augmentation, this data_augmentation layer outputs just one image from one input image. Is that correct?
I saw another post, Does ImageDataGenerator add more images to my dataset?. One answer says that ImageDataGenerator creates different images each epoch. Is the behavior the same here?
Otherwise it would just be the same transformed image being trained on epoch after epoch, which makes no sense.

Yes! The data augmentation layer just transforms images and returns the same shape as its input (batch_size, *image_dims). But because of the randomization inside the layer, you are likely to get a different output each time the layer is called. For instance, in the linked tutorial, the rotation angle and zoom factor are randomly sampled (within the specified limits, e.g. a factor of 0.2) each time the layer is called.
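A quick way to convince yourself: call the same augmentation layer twice on the same image in training mode; the shape stays the same but the pixels almost always differ. A minimal sketch, assuming TF 2.6+ where these layers live under tf.keras.layers (the specific layers and factors are illustrative, not taken from the tutorial):

import tensorflow as tf

data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.2),
    tf.keras.layers.RandomZoom(0.2),
])

image = tf.random.uniform((1, 180, 180, 3))  # one dummy image, batch size 1

out1 = data_augmentation(image, training=True)
out2 = data_augmentation(image, training=True)

print(out1.shape)                                  # (1, 180, 180, 3) -- same shape as the input
print(tf.reduce_max(tf.abs(out1 - out2)).numpy())  # almost always > 0: different augmentation per call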

Related

Do Keras preprocessing layers apply to the validation set?

I was reading the data augmentation article on Keras, and they allow one to make the preprocessing layers part of the model:
model = tf.keras.Sequential([
    resize_and_rescale,
    data_augmentation,
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    # Rest of your model
])
I'm wondering whether one or both of the resize_and_rescale and data_augmentation layers are also applied to the validation set during training?
It depends on which type of layer you are using. For example, the Resizing and Rescaling layers are applied even in inference mode, that is, they are applied to the validation data in model.fit. For other augmentation layers, like the RandomFlip layer, the documentation states:
During inference time, the output will be identical to input.
So you have to look up this information for the type of layer you are using. The documentation is here. From what I could gather, I think only the Resizing and Rescaling layers remain active during inference mode.
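A small sketch to illustrate the difference, assuming TF 2.6+ where these layers live under tf.keras.layers: Rescaling transforms its input no matter what, while RandomFlip becomes the identity in inference mode, which is how augmentation layers behave on validation data during model.fit:

import tensorflow as tf

rescale = tf.keras.layers.Rescaling(1.0 / 255)
flip = tf.keras.layers.RandomFlip("horizontal_and_vertical")

image = tf.random.uniform((1, 64, 64, 3), maxval=255.0)

# Rescaling is always applied, training or not.
print(tf.reduce_max(rescale(image)).numpy())  # <= 1.0

# RandomFlip in inference mode passes the input through unchanged.
print(bool(tf.reduce_all(flip(image, training=False) == image)))  # True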

Extracting activations from a specific layer of neural network

I was working on an image recognition problem. After training the model, I saved the architecture as well as weights. Now I want to use the model for extracting features from other images and perform SVM on that. For this, I want to remove the last two layers of my model and get the values calculated by the CNN and fully connected layers till then. How can I do that in Keras?
# a simple model
model = keras.models.Sequential([
    keras.layers.Input((32, 32, 3)),
    keras.layers.Conv2D(16, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax')
])

# after training
feature_only_model = keras.models.Model(model.inputs, model.layers[-2].output)
feature_only_model takes a (32,32,3) image as input and outputs the feature vector from the layer just before the softmax.
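To then run an SVM on the extracted features, something like the following works (a sketch assuming scikit-learn and the feature_only_model defined above; the random images and labels are placeholders for your real data):

import numpy as np
from sklearn.svm import SVC

# Placeholder data -- replace with your real images and labels.
imgs = np.random.rand(100, 32, 32, 3).astype('float32')
labels = np.random.randint(0, 2, size=100)

features = feature_only_model.predict(imgs)  # shape: (100, feature_dim)

svm = SVC(kernel='rbf')
svm.fit(features, labels)
print(svm.score(features, labels))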
If your model is subclassed, just change its call() method.
If not:
if your model is complicated, wrap it in a subclassed model and change the forward pass in the call() method, or
if your model is simple, create a model without the last layers and load the weights into every layer separately (see the sketch below).
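A minimal sketch of the "simple model" option: rebuild the network without the last layer(s) and copy the trained weights over layer by layer (the layer shapes here are illustrative, not from the question):

from tensorflow import keras

trained = keras.models.Sequential([
    keras.layers.Input((32, 32, 3)),
    keras.layers.Conv2D(16, 3, activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),
])
# ... train the model or load its saved weights here ...

# Same architecture minus the classification head.
feature_extractor = keras.models.Sequential([
    keras.layers.Input((32, 32, 3)),
    keras.layers.Conv2D(16, 3, activation='relu'),
    keras.layers.Flatten(),
])

# Copy weights from the corresponding trained layers (zip stops at the shorter model).
for src, dst in zip(trained.layers, feature_extractor.layers):
    dst.set_weights(src.get_weights())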

efficientnet.tfkeras vs tf.keras.applications.efficientnet

I am trying to use EfficientNet to train on my custom dataset,
and I found that, with all other code/data/config the same,
efficientnet.tfkeras.EfficientNetB0 gives ~90% training/prediction accuracy while tf.keras.applications.efficientnet.EfficientNetB0 only gives ~70% accuracy.
I would guess both are the same implementation of EfficientNet, or am I missing something here?
I am using the latest efficientnet package and TensorFlow 2.3.0.
with strategy.scope():
    model = tf.keras.Sequential([
        efficientnet.tfkeras.EfficientNetB0(  # tf.keras.applications.efficientnet.EfficientNetB0
            input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
            weights='imagenet',
            include_top=False
        ),
        L.GlobalAveragePooling2D(),
        L.Dense(1, activation='sigmoid')
    ])

    model.compile(
        optimizer='adam',
        loss='binary_crossentropy',
        metrics=['binary_crossentropy']
    )

model.summary()
I ran into the same problem for EfficientNetB4 and noticed the following:
The total number of parameters is not equal. The trainable parameters are equal, but the non-trainable parameters aren't: efficientnet.tfkeras has 7 fewer non-trainable parameters than the tf.keras.applications model.
The number of layers is not equal; efficientnet.tfkeras has fewer layers than the tf.keras.applications model.
The differing layers are at the very beginning, the most noteworthy being the normalization and rescaling layers, which are present in the tf.keras.applications model but not in the efficientnet.tfkeras model. You can observe this yourself using the model.summary() method.
When applying these layers by hand, using model.layers[i](array), it turns out they rescale the image by dividing it by 255 and then normalize it according to:
(input_image - IMAGENET_MEAN) / square_root(IMAGENET_STD)
So the image normalization is built into the model. If you perform this normalization yourself on the input image, the image gets normalized twice, resulting in extremely small pixel values, and the model will therefore have a hard time learning.
TL;DR: Do not normalize the input image, since the normalization is built into the tf.keras.applications model; input images should have values in the range 0-255.
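You can verify the built-in preprocessing yourself by printing the first few layers of the tf.keras.applications model; a sketch (exact layer names and ordering can vary slightly between TF versions):

import tensorflow as tf

model = tf.keras.applications.efficientnet.EfficientNetB0(
    input_shape=(224, 224, 3), weights=None, include_top=False)

for layer in model.layers[:5]:
    print(type(layer).__name__)
# Expect to see Rescaling and Normalization near the top, which is why the
# model should be fed raw images with values in 0-255.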

How to clean images to use with a MNIST trained model?

I am creating a machine learning model for classifying images of numbers. I have trained the model using TensorFlow and Keras with the built-in tf.keras.datasets.mnist dataset. The model works quite well with the test images from the MNIST dataset itself, but I would like to feed it images of my own. The images that I am feeding this model are extracted from a Captcha, so they follow a similar pattern. I have included some examples of the images in this public Google Drive folder. When I feed these images, I notice that the model is not very accurate, and I have some guesses as to why.
The background of the image creates too much noise in the picture.
The number is not centered.
The image is not strictly in the color format of the MNIST training set (black background, white text).
I wanted to ask how I can remove the background and centre the number so that the noise in the image is reduced, allowing for better classification.
Here is the model I am using:
import tensorflow as tf
from tensorflow import keras

mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

class Stopper(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, log={}):
        if log.get('acc') >= 0.99:
            self.model.stop_training = True
            print('\nReached 99% Accuracy. Stopping Training...')

model = keras.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(1024, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(
    optimizer=tf.train.AdamOptimizer(),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'])

x_train, x_test = x_train / 255, x_test / 255
model.fit(x_train, y_train, epochs=10, callbacks=[Stopper()])
And here is my method of importing the image into tensorflow:
from PIL import Image
import numpy as np

img = Image.open("image_file_path").convert('L').resize((28, 28), Image.ANTIALIAS)
img = np.array(img)
model.predict(img[None, :, :])
I have also included some examples from the MNIST dataset here. I would like a script to convert my images as closely to the MNIST dataset format as possible. Also, since I would have to do this for an indefinite number of images, I would appreciate if you could provide a fully automated method for this conversion. Thank you very much.
You need to train with a dataset similar to the images you're testing. The MNIST data is hand-written numbers, which is not going to be similar to the computer generated fonts for Captcha data.
What you need to do is gain a catalog of Captcha data similar to what you're predicting on (preferably from the same source you will be inputting to the final model). It's a painstaking task to capture the data, and you'll probably need around 300-400 images for each label before you start to get something useful.
A key note: your model will only ever be as good as the training data you supply to it. Trying to make a good model with bad training data is an exercise in pure frustration.
To address some of your thoughts:
[the model is not very accurate because] the background of the image creates too much noise in the picture.
This is true. If the image data has noise and the neural net was not trained with any noise in the images, then it will not recognize a strong pattern when it encounters this type of distortion. One possible way to combat this is to take clean images and programmatically add noise (similar to what you see in the real Captchas) before sending them to be trained, as in the sketch below.
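A minimal sketch of what "programmatically add noise" could look like (the Gaussian plus salt-and-pepper noise model here is illustrative; match it to what your real Captchas look like):

import numpy as np

def add_noise(images, noise_std=0.2, salt_pepper_frac=0.05, seed=None):
    """Add Gaussian and salt-and-pepper noise to images scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(0.0, noise_std, images.shape)         # Gaussian noise
    mask = rng.random(images.shape) < salt_pepper_frac                # salt & pepper locations
    noisy[mask] = rng.integers(0, 2, size=int(mask.sum()))            # random black/white pixels
    return np.clip(noisy, 0.0, 1.0)

# e.g. x_train_noisy = add_noise(x_train / 255.0)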
[the model is not very accurate because] The number is not centered.
Also true for the same reasons. If all the training data is centered, the model will be overtuned for this property and make incorrect guesses. Follow a similar pattern to the one above if you don't have the capacity to manually capture and catalog a good sampling of data.
[the model is not very accurate because] The image is not striclty in the color format of MNIST training set (Black background white text).
You can get around this by applying a binary threshold to the data before processing and normalizing the color input before training. Depending on the amount of noise in the Captcha, you may get better results by letting the number and noise retain some of their color information (still convert to greyscale and normalize, just don't apply the threshold).
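As a starting point, here is a sketch of pushing one Captcha digit toward the MNIST format: greyscale, inverted so the digit is white on black, thresholded, and resized to 28x28. The threshold value is illustrative and should be tuned; centering via a bounding-box crop would be a further step, omitted here.

import numpy as np
from PIL import Image

def to_mnist_like(path, threshold=128):
    img = Image.open(path).convert('L').resize((28, 28))
    arr = np.array(img, dtype=np.float32)
    arr = 255.0 - arr                             # invert: dark digit -> white digit
    arr = np.where(arr > threshold, 255.0, 0.0)   # binary threshold
    return arr / 255.0                            # scale to [0, 1] like the training data

# x = to_mnist_like("my_captcha_digit.png")   # hypothetical file name
# model.predict(x[None, :, :])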
Additionally, I'd recommend using a convolutional net rather than the linear network, as it is better at distinguishing 2D features like edges and corners, i.e. use keras.layers.Conv2D layers before flattening with keras.layers.Flatten.
See the great example found here: Trains a simple convnet on the MNIST dataset.
# for MNIST: 28x28 greyscale images with an explicit channel axis, 10 digit classes
input_shape = (28, 28, 1)
num_classes = 10

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation=tf.nn.relu,
                           input_shape=input_shape),
    tf.keras.layers.Conv2D(64, (3, 3), activation=tf.nn.relu),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(num_classes, activation=tf.nn.softmax),
])
I've used this setup for reading fonts in video gameplay footage, and with a test set of 10,000 images I'm achieving 99.98% accuracy, using a random sampling of half the dataset in training, and calculating accuracy using the total set.

CNN Keras Object Localization - Bad predictions

I'm a beginner in machine learning and I currently am trying to predict the position of an object within an image that is part of a dataset I created.
This dataset contains about 300 images in total and contains 2 classes (Ace and Two).
I created a CNN that predicts whether it's an Ace or a two with about 88% accuracy.
Since this was working well, I decided to try to predict the position of the card (instead of the class). I read some articles, and from what I understood, all I had to do was take the same CNN I used to predict the class and change the last layer to a Dense layer with 4 nodes.
That's what I did, but apparently this isn't working.
Here is my model:
model = Sequential()
model.add(Conv2D(64,(3,3),input_shape = (150,150,1)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32,(3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(4))
model.compile(loss="mean_squared_error",optimizer='adam',metrics=[])
model.fit(X, y, batch_size=1, validation_split=0,
          epochs=30, verbose=1, callbacks=[TENSOR_BOARD])
What I feed to my model:
X: a grayscale image of 150x150 pixels; each pixel is rescaled to [0, 1].
y: smallest X coordinate, highest Y coordinate, width and height of the object (each of these values is within [0, 1]).
And here's an example of predictions it gives me:
[array([ 28.66145 , 41.278576, -9.568813, -13.520659], dtype=float32)]
but what I really wanted was:
[0.32, 0.38666666666666666, 0.4, 0.43333333333333335]
I knew something was wrong here so I decided to train and test my CNN on a single image (so it should overfit and predict the right bounding box for this single image if it worked). Even after overfitting on this single image, the predicted values were ridiculously high.
So my question is:
What am I doing wrong ?
EDIT 1
After trying @Matias's solution, which was to add a sigmoid activation function to the last layer, all of the output values are now between [0, 1].
But, even with this, the model still produces bad outputs.
For example, after training it 10 epochs on the same image, it predicted this:
[array([0.0000000e+00, 0.0000000e+00, 8.4378130e-18, 4.2288357e-07],dtype=float32)]
but what I expected was:
[0.2866666666666667, 0.31333333333333335, 0.44666666666666666, 0.5]
EDIT 2
Okay, so, after experimenting for quite a while, I've come to the conclusion that the problem was either my model (the way it is built) or the lack of training data.
But even if it were caused by a lack of training data, I should have been able to overfit on a single image and get the right predictions for that one, right?
I created another post which asks about my last question since the original one has been answered and I don't want to completely re-edit the post since it would make the first answers kind of pointless.
Since your targets (the Y values) are normalized to the [0, 1] range, the output of the model should match this range. For this you should use a sigmoid activation at the output layer, so the output is constrained to the [0, 1] range:
model.add(Dense(4, activation='sigmoid'))
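Concretely, applying that change to the model from the question looks like this (a sketch; apart from the sigmoid on the final Dense layer, the architecture and compile call are kept as in the question):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=(150, 150, 1)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=2))
model.add(Dense(64))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(4, activation='sigmoid'))  # was Dense(4) with no activation; outputs now stay in [0, 1]

model.compile(loss="mean_squared_error", optimizer='adam', metrics=[])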