Three channel vs. grayscale image input to a CNN - tensorflow

I am trying to mimic and build an image classification model described in a paper. At first I was using 100x100x3 images as input to the neural network. The model was giving very good accuracy (~95%) in the training phase, but produced very poor predictions, resulting in a confusion matrix that had the same value in all cells. But when I trained the model with single-band grayscale images as input, as done in the paper itself, the accuracy dropped to ~84% but the model made good predictions with a good confusion matrix. I am building the CNN model with Keras and using ImageDataGenerator to feed the images.
This is how I create the image augmentations (same as described in the paper):
# Assumed imports, inferred from the aliases used below
import os
import numpy as np
import PIL.Image
from tensorflow import image as tfimg                 # TensorFlow image ops
from tensorflow.keras.preprocessing import image      # image.load_img

# Flip image along the y=x line: returns the flipped image as an integer array
def flip_xy(img):
    img_flip = tfimg.rot90(np.flipud(PIL.Image.fromarray(img).resize((100, 100))))
    return np.array(img_flip)

# Rotate img three times by 90 degrees: returns 3 rotated images as integer arrays
def rotate90(img):
    img90 = tfimg.rot90(np.array(img.resize((100, 100))))
    img180 = tfimg.rot90(np.array(img90))
    img270 = tfimg.rot90(np.array(img180))
    return np.array(img90), np.array(img180), np.array(img270)

# Crop image once from each corner and once from the centre: returns PIL image objects
def getCrops(img):
    upper_left = img.crop((10, 10, 110, 110))
    upper_right = img.crop((0, 10, 100, 110))
    lower_left = img.crop((10, 0, 110, 100))
    lower_right = img.crop((0, 0, 100, 100))
    centre = img.crop((5, 5, 105, 105))
    return upper_left, upper_right, lower_left, lower_right, centre

# Create 40 copies from a single image: data augmentation
def createCopies(img_path, target_folder):
    img = image.load_img(img_path, target_size=(110, 110))
    img_file = img_path.split("\\")[-1].split(".")[0]
    k = 1
    ul, ur, ll, lr, ctr = getCrops(img)
    for image0 in [ul, ur, ll, lr, ctr]:
        i90, i180, i270 = rotate90(image0)
        for image1 in [i90, i180, i270, np.array(image0)]:
            flip = flip_xy(image1)
            PIL.Image.fromarray(image1).save(os.path.join(target_folder, f"{img_file}({k}).jpg"))
            PIL.Image.fromarray(flip).save(os.path.join(target_folder, f"{img_file}({k+1}).jpg"))
            k += 2
I then iterate over all the original images, creating augmentations.
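For reference, a minimal sketch of that loop might look like the following (the folder paths here are placeholders, not the actual ones from my project):
source_folder = "data/original"      # placeholder path
target_folder = "data/augmented"     # placeholder path
for fname in os.listdir(source_folder):
    if fname.lower().endswith((".jpg", ".png")):
        createCopies(os.path.join(source_folder, fname), target_folder)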
Here is the code for model training:
Model training. Ignore the fact that the training phase is not completed in the notebook. It is from a later run. In the original run, the training ran for all 30 epochs.
Here is the code for model evaluation:
Model evaluation. Here I have a second doubt: why do predicting each image individually and predicting on the generator object produce different results?
Can anyone explain why this happens? Shouldn't three-channel images be able to train the model better? And why is there a discrepancy between the model accuracy during training and the actual predictions?

Related

Keras compatible code for shrink and perturb deep model training

I am referring to this study https://proceedings.neurips.cc/paper/2020/file/288cd2567953f06e460a33951f55daaf-Paper.pdf "On Warm-Starting Neural Network Training". Here, the authors propose a shrink-and-perturb technique to retrain models on newly arriving data. In a warm restart, the models are initialized with the weights previously trained on the old data and are retrained on the new data. In the proposed technique, the weights and biases of the existing model are shrunk towards zero and then random noise is added. To shrink a weight, it is multiplied by a value between 0 and 1, typically about 0.5. Their official PyTorch code is available at https://github.com/JordanAsh/warm_start/blob/main/run.py. A simple explanation of this study is given at https://pureai.com/articles/2021/02/01/warm-start-ml.aspx, where the writer gives a simple PyTorch function to perform shrinking and perturbation of an existing model, as shown below:
def shrink_perturb(model, lamda=0.5, sigma=0.01):
    for (name, param) in model.named_parameters():
        if 'weight' in name:  # just weights, biases are left untouched
            nc = param.shape[0]  # size of the first dimension
            nr = param.shape[1]  # size of the second dimension
            for i in range(nr):
                for j in range(nc):
                    param.data[j][i] = \
                        (lamda * param.data[j][i]) + \
                        T.normal(0.0, sigma, size=(1, 1))
    return
With the defined function, a prediction model can be initialized with the shrink-perturb technique using code like this:
net = Net().to(device)
fn = ".\\Models\\employee_model_first_100.pth"
net.load_state_dict(T.load(fn))
shrink_perturb(net, lamda=0.5, sigma=0.01)
# now train net as usual
Is there a Keras-compatible version of this function, with which we can shrink the weights and add random Gaussian noise to an existing model like this?
model = load_model('weights/model.h5')
model.summary()
shrunk_model = shrink_perturb(model, lamda=0.5, sigma=0.01)
shrunk_model.summary()
maybe something like this:
ws = [w * 0.5 + tf.random.normal(w.shape) for w in model.get_weights()]
model.set_weights(ws)
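A fuller sketch of what I have in mind, assuming eager execution (TF 2.x) and that perturbing the biases as well is acceptable (the PyTorch version above touches only the weight matrices):
import tensorflow as tf

# Hypothetical helper, not from the paper or the article
def shrink_perturb(model, lamda=0.5, sigma=0.01):
    # Shrink every parameter towards zero and add Gaussian noise with stddev sigma
    shrunk = [lamda * w + tf.random.normal(w.shape, mean=0.0, stddev=sigma).numpy()
              for w in model.get_weights()]
    model.set_weights(shrunk)
    return model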

CNN + LSTM model for images performs poorly on validation data set

My training and loss curves look like the ones below, and yes, similar graphs have received comments like "Classic overfitting" and I get it.
My model looks like this:
input_shape_0 = keras.Input(shape=(3, 100, 100, 1), name="img3")

model = tf.keras.layers.TimeDistributed(Conv2D(8, 3, activation="relu"))(input_shape_0)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(16, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(32, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(Flatten())(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.4))(model)
model = LSTM(16, kernel_regularizer=tf.keras.regularizers.l2(0.007))(model)
# model = Dense(100, activation="relu")(model)
# model = Dense(200, activation="relu", kernel_regularizer=tf.keras.regularizers.l2(0.001))(model)
model = Dense(60, activation="relu")(model)
# model = Flatten()(model)
model = Dropout(0.15)(model)
out = Dense(30, activation='softmax')(model)
model = keras.Model(inputs=input_shape_0, outputs=out, name="mergedModel")

def get_lr_metric(optimizer):
    def lr(y_true, y_pred):
        return optimizer.lr
    return lr

opt = tf.keras.optimizers.RMSprop()
lr_metric = get_lr_metric(opt)

# merged.compile(loss='sparse_categorical_crossentropy',
#                optimizer='adam', metrics=['accuracy'])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=opt, metrics=['accuracy', lr_metric])
model.summary()
In the above model building code, please consider the commented lines as some of the approaches I have tried so far.
I have followed the suggestions given as answers and comments to this kind of question and none seems to be working for me. Maybe I am missing something really important?
Things that I have tried:
Dropouts at different places and in different amounts.
Played with including and removing dense layers and varying their number of units.
Tried different numbers of units in the LSTM layer (started from as low as 1; at 16 I get the best performance).
Came across weight regularization techniques and tried to apply them at different layers, as shown in the code above (I would like to know how to decide where to apply it, instead of the simple trial and error I did, which seems wrong).
Implemented a learning rate scheduler that reduces the learning rate as the epochs progress, after a certain number of epochs.
Tried two LSTM layers, with the first one having return_sequences=True.
After all these, I still cannot overcome the overfitting problem.
My data set is properly shuffled and divided in a train/val ratio of 80/20.
Data augmentation is one more thing I found commonly suggested which I am yet to try, but I first want to see whether I am making some mistake so far that I can correct, so that I can avoid diving into data augmentation for now. My data set has the sizes below:
Training images: 6780
Validation images: 1484
The numbers shown are samples, and each sample has 3 images. So basically, I input 3 images at once as one sample to my time-distributed CNN, which is then followed by the other layers as shown in the model description. Accordingly, my training images number 6780 * 3 and my validation images 1484 * 3. Each image is 100 * 100 and has a single channel.
I am using RMSprop as the optimizer, which performed better than Adam in my testing.
UPDATE
I tried some different architectures with regularization and dropout at different places, and I am now able to achieve a val_acc of 59%. Below is the new model.
# kernel_regularizer=tf.keras.regularizers.l2(0.004)
# kernel_constraint=max_norm(3)
model = tf.keras.layers.TimeDistributed(Conv2D(32, 3, activation="relu"))(input_shape_0)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(64, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Conv2D(128, 3, activation="relu"))(model)
model = tf.keras.layers.TimeDistributed(MaxPooling2D(2))(model)
model = tf.keras.layers.TimeDistributed(Dropout(0.3))(model)
model = tf.keras.layers.TimeDistributed(GlobalAveragePooling2D())(model)
model = LSTM(128, return_sequences=True,kernel_regularizer=tf.keras.regularizers.l2(0.040))(model)
model = Dropout(0.60)(model)
model = LSTM(128, return_sequences=False)(model)
model = Dropout(0.50)(model)
out = Dense(30, activation='softmax')(model)
Try to perform Data Augmentation as a preprocessing step. Lack of data samples can lead to such curves. You can also try using k-fold Cross Validation.
There are many ways to prevent overfitting, according to the papers below:
Dropout layers (randomly disabling neurons). https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
Input noise (e.g. random Gaussian noise on the images). https://arxiv.org/pdf/2010.07532.pdf
Random Data Augmentations (e.g. Rotating, Shifting, Scaling, etc.).
https://arxiv.org/pdf/1906.11052.pdf
Adjusting Number of Layers & Units.
https://clgiles.ist.psu.edu/papers/UMD-CS-TR-3617.what.size.neural.net.to.use.pdf
Regularization Functions (e.g. L1, L2, etc)
https://www.researchgate.net/publication/329150256_A_Comparison_of_Regularization_Techniques_in_Deep_Neural_Networks
Early Stopping: if you notice that for N successive epochs your model's training loss keeps decreasing while it performs poorly on the validation data set, it is a good sign to stop the training (a minimal callback sketch follows this list).
Shuffling the training data or K-fold cross validation are also common ways of dealing with overfitting.
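For the early stopping point, a minimal Keras sketch (the monitored metric, patience and fit arguments are only examples, not tuned values):
from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# train_data / val_data stand for whatever training and validation inputs you already use
model.fit(train_data, validation_data=val_data, epochs=30, callbacks=[early_stop])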
I found this great repository, which contains examples of how to implement data augmentations:
https://github.com/kochlisGit/random-data-augmentations
Also, this repository here seems to have examples of CNNs that implement most of the above methods:
https://github.com/kochlisGit/Tensorflow-State-of-the-Art-Neural-Networks
The goal should be to get the model to predict correctly irrespective of the order in which the 3 images in the sample are arranged.
If the order of the images of each sample is not important for the training, I think your model does the inverse: the TimeDistributed layers followed by the LSTM take the order of the three images into account. As a solution, primarily, you can add images by reordering the images of each sample (= augmented data). Secondly, try to consider the three images as one three-channel image and remove the TimeDistributed layers (I'm not sure the three channels are more efficient, but you can give it a try; see the sketch below).
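A minimal sketch of that second idea, assuming each sample's three 100x100 grayscale images are stacked into the channel dimension (the layer sizes only mirror those in the question and are not tuned):
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, GlobalAveragePooling2D, Dense

# Each sample becomes a single 100x100x3 image: one grayscale frame per channel
inp = keras.Input(shape=(100, 100, 3), name="img3_as_channels")
x = Conv2D(32, 3, activation="relu")(inp)
x = MaxPooling2D(2)(x)
x = Conv2D(64, 3, activation="relu")(x)
x = MaxPooling2D(2)(x)
x = Conv2D(128, 3, activation="relu")(x)
x = MaxPooling2D(2)(x)
x = Dropout(0.3)(x)
x = GlobalAveragePooling2D()(x)
out = Dense(30, activation="softmax")(x)
model = keras.Model(inp, out, name="channels_model")
model.compile(loss="sparse_categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])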

How to make image cube for 3d convolution with file path

Recently, I have been studying 3D convolution for video image processing with TensorFlow.
I built my model following a tutorial blog, but I want to make my own custom dataset. My input image's shape is (128,128,3) and I want to make an image cube of shape (128,128,100,3). I use tf.data.Dataset and tried to create a map function, recalling what I did for 2D convolution. I want to build the image cube from paths of shape (number of image cubes, 100) with a tf.data.Dataset map function, because I run out of memory when using NumPy.
I tried to use code like the following
def load_image(path):
    images = []
    for i, p in enumerate(path):
        image_string = tf.io.read_file(p)
        image = tf.io.decode_jpeg(image_string, channels=3)
        image = tf.reshape(image, [128, 128, 1, 3])
        image = image / 255
        images.append(image)
    image_block = tf.concat(images, axis=2)
    return image_block

train_data = tf.data.Dataset.from_tensor_slices(total_files)  # shape (1077, 100)
train_data = train_data.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
But I get an error that the tensor's shape changes. I also tried tf.Variable with .assign, but got a similar error.
How can I build the 3D convolution's input image cube from file paths? I am using TensorFlow 2.0.
So you cannot iterate over a tensor with for x in tensor. In that case you can, for example, iterate over a range and get the values by index, like:
for x in range(tf.shape(tensor)[0]):
    y = tensor[x]
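A minimal sketch of how that could look for the load_image function from the question, assuming each row of paths has a fixed length of 100 (so the loop range is a plain Python integer and works in graph mode):
import tensorflow as tf

CUBE_DEPTH = 100  # fixed number of frames per cube, as in the question

def load_image(paths):
    images = []
    for i in range(CUBE_DEPTH):                       # plain Python int, not a tensor
        image_string = tf.io.read_file(paths[i])
        img = tf.io.decode_jpeg(image_string, channels=3)
        img = tf.reshape(img, [128, 128, 1, 3])
        images.append(tf.cast(img, tf.float32) / 255.0)
    return tf.concat(images, axis=2)                  # shape (128, 128, 100, 3)

# total_files: array of shape (num_cubes, 100) holding the file paths, as in the question
train_data = tf.data.Dataset.from_tensor_slices(total_files)
train_data = train_data.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)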

Keras/Tensorflow - Generate predictions in batch for imagenet (I get only one result back)

I am generating imagenet tags for all keyframes in a video with a single call and have this code:
# all keras/tf/mobilenet imports
model_imagenet = MobileNetV2(weights='imagenet')

frames_list = []
for frame in frame_set:
    frame_img = frame.to_image()
    frame_pil = frame_img.resize((224, 224), Image.ANTIALIAS)
    ts = int(frame.pts)
    x = image.img_to_array(frame_pil)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    frames_list.append(x)
print(len(frames_list))
preds_list = model_imagenet.predict_on_batch(frames_list)
print("[*]", preds_list)
print("[*]",preds_list)
The result appears thus:
frames_list count: 125
and the predictions thus, one row of 1000 dimensions (imagenet classes), shouldn't it be 125 rows?:
[[1.15425530e-04 1.83317825e-04 4.28701424e-05 2.87547664e-05
:
7.91769926e-05 1.30803732e-04 4.81895368e-05 3.06891889e-04]]
This is generating prediction for a single row in the batch. I have tried both predict and predict_on_batch with the same result.
How can I get a bulk prediction for say 200 frames at one go with Keras/Tensorflow/Mobilenet?
ImageNet is a popular database which consists of 1000 different categories.
The dimension of 1000 is natural and to be expected, since for one image the softmax outputs a probability for each of the 1000 classes.
EDIT: For multiple image predictions, you can use predict_generator(). Note that as of TensorFlow 2.0, if you use the Keras backend, predict_generator() has been deprecated in favor of plain predict, which also accepts generators as input data.
E.g. : (from How to use predict_generator with ImageDataGenerator?) :
test_datagen = ImageDataGenerator(rescale=1./255)
# Modify the batch size here
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(200, 200),
    color_mode="rgb",
    shuffle=False,
    class_mode='categorical',
    batch_size=1)
filenames = test_generator.filenames
nb_samples = len(filenames)
predict = model.predict_generator(test_generator, steps=nb_samples)
Please bear in mind that you are unlikely to get a very large number of predictions in one go, since the batch is constrained by the memory of the video card.
Also, note the difference between predict and predict_on_batch: What is the difference between the predict and predict_on_batch methods of a Keras model?
OK, here is how I solved it, hope this helps someone else:
preds_list = model_imagenet.predict(np.vstack(frames_list),batch_size=32)
print("[*]",preds_list)
Please note the np.vstack and adjust the batch_size to whatever your computer is capable of.
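As a small follow-up sketch (not part of the fix itself), the resulting (125, 1000) prediction array can be turned into readable ImageNet tags with decode_predictions; the top=3 value is just an example:
from tensorflow.keras.applications.mobilenet_v2 import decode_predictions

# preds_list has shape (num_frames, 1000); decode the top-3 tags per frame
for frame_idx, decoded in enumerate(decode_predictions(preds_list, top=3)):
    print(frame_idx, [(name, float(prob)) for (_, name, prob) in decoded])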

Tensorflow slim how to specify batch size during training

I'm trying to use the slim interface to create and train a convolutional neural network, but I couldn't figure out how to specify the batch size for training.
During training my net crashes with "Out of Memory" on my graphics card.
So I think there should be a way to handle this condition...
Do I have to split the data and the labels into batches and then loop explicitly, or does slim.learning.train take care of it?
In the code I paste, train_data is my whole training set (a numpy array); the model definition is not included here.
I had a quick look at the sources but no luck so far...
g = tf.Graph()
with g.as_default():
    # Set up the data loading:
    images = train_data
    labels = tf.contrib.layers.one_hot_encoding(labels=train_labels, num_classes=num_classes)

    # Define the model:
    predictions = model7_2(images, num_classes, is_training=True)

    # Specify the loss function:
    slim.losses.softmax_cross_entropy(predictions, labels)
    total_loss = slim.losses.get_total_loss()
    tf.scalar_summary('losses/total loss', total_loss)

    # Specify the optimization scheme:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)
    train_tensor = slim.learning.create_train_op(total_loss, optimizer)

    slim.learning.train(train_tensor,
                        train_log_dir,
                        number_of_steps=1000,
                        save_summaries_secs=300,
                        save_interval_secs=600)
Any hints or suggestions?
Edit:
I re-read the documentation...and I found this example
image, label = MyPascalVocDataLoader(...)
images, labels = tf.train.batch([image, label], batch_size=32)
But it's not at all clear how image and label are supposed to be fed to tf.train.batch, as the MyPascalVocDataLoader function is not specified...
In my case the data set is loaded from a sqlite database, and I have the training data and labels as numpy arrays... still confused.
Of course I tried to pass my numpy arrays (converted to a constant tensor) to tf.train.batch, like this:
image = tf.constant(train_data)
label = tf.contrib.layers.one_hot_encoding(labels=train_labels, num_classes=num_classes)
images, labels = tf.train.batch([image, label], batch_size=32)
But this does not seem to be the right path to follow... it seems that tf.train.batch wants only one element from my data set (how do I pass this? It does not make sense to me to pass only train_data[0] and train_labels[0]).
Here you can create TFRecords, which is a special binary file format used by TensorFlow. As you mentioned, you have the training images and the labels, so you can easily create TFRecords for training and validation.
After creating the TFRecords, all you need to write is code that decodes the images from the encoded TFRecords and feeds them to your model input. There you can select the batch size and everything else (a minimal sketch follows below).
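Not a full solution, but a minimal sketch of that idea assuming TensorFlow 1.x (to match the slim code above); the helper names write_tfrecords and parse_example are just examples:
import numpy as np
import tensorflow as tf

# Write one Example per (image, label) pair from the numpy arrays
def write_tfrecords(filename, images, labels):
    with tf.python_io.TFRecordWriter(filename) as writer:
        for img, lbl in zip(images, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(
                    value=[img.astype(np.float32).tobytes()])),
                'label': tf.train.Feature(int64_list=tf.train.Int64List(
                    value=[int(lbl)]))}))
            writer.write(example.SerializeToString())

# Decode one Example; image_shape is e.g. [height, width, channels]
def parse_example(serialized, image_shape):
    features = tf.parse_single_example(serialized, {
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64)})
    img = tf.reshape(tf.decode_raw(features['image'], tf.float32), image_shape)
    return img, features['label']

# The batch size is chosen here, in the input pipeline
dataset = (tf.data.TFRecordDataset('train.tfrecords')
           .map(lambda s: parse_example(s, [100, 100, 3]))   # example image shape
           .shuffle(1000)
           .batch(32))
images, labels = dataset.make_one_shot_iterator().get_next()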