I have a very basic query. I have built 4 almost identical CNNs (the only difference being their input shapes) and merged them before connecting them to a feed-forward network of fully connected layers.
Code for the almost identical CNN(s):
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Flatten

model3 = Sequential()
model3.add(Convolution2D(32, (3, 3), activation='relu', padding='same',
                         input_shape=(batch_size[3], seq_len, channels)))
model3.add(MaxPooling2D(pool_size=(2, 2)))
model3.add(Dropout(0.1))
model3.add(Convolution2D(64, (3, 3), activation='relu', padding='same'))
model3.add(MaxPooling2D(pool_size=(2, 2)))
model3.add(Flatten())
But in TensorBoard I see that all the Dropout layers are interconnected, and Dropout1 is a different color than Dropout2, 3, 4, etc., which are all the same color.
I know this is an old question, but I had the same issue myself and only just realized what's going on.
Dropout is only applied while the model is training; it is deactivated when evaluating/predicting. For that purpose, Keras creates a learning_phase placeholder, which is set to 1.0 while the model is training.
This placeholder is created inside the first Dropout layer you instantiate and is shared across all of them. That's what you're seeing there!
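If you want to see this train/test switch in action, here is a minimal sketch (not from the original question) showing that Dropout only perturbs its input when the training flag is set:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy model with a single Dropout layer, just to illustrate the learning-phase switch
toy = tf.keras.Sequential([layers.Dropout(0.5, input_shape=(4,))])

x = np.ones((1, 4), dtype="float32")
print(toy(x, training=True))   # some units randomly zeroed (the rest rescaled)
print(toy(x, training=False))  # identity: dropout is inactive at inference time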
I am rather new to deep learning and have some questions about performing a multi-label image classification task with Keras convolutional neural networks. These mainly concern evaluating Keras models that perform multi-label classification. I will structure this a bit to give a better overview first.
Problem Description
The underlying dataset consists of album cover images from different genres, in my case electronic, rock, jazz, pop, and hip-hop. So there are 5 possible classes that are not mutually exclusive. The task is to predict the possible genres for a given album cover. Each album cover is 300px x 300px. The images are loaded into TensorFlow datasets and resized to 150px x 150px.
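For context, here is a minimal sketch of how such a dataset could be loaded and resized with tf.data (the file paths and the multi-hot label array are placeholders, not my actual pipeline):
import tensorflow as tf

img_height, img_width = 150, 150

# Placeholder inputs: file paths of the covers and their multi-hot genre labels
file_paths = ["covers/0001.jpg", "covers/0002.jpg"]   # hypothetical paths
labels = [[1, 0, 0, 1, 0], [0, 1, 0, 0, 0]]           # 5 non-exclusive classes

def load_and_resize(path, label):
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    image = tf.image.resize(image, (img_height, img_width))
    return image, label

training_dataset = (
    tf.data.Dataset.from_tensor_slices((file_paths, labels))
    .map(load_and_resize)
    .batch(32)
)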
Model Architecture
The architecture for the model is the following.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
data_augmentation = keras.Sequential(
    [
        layers.experimental.preprocessing.RandomFlip("horizontal",
                                                     input_shape=(img_height,
                                                                  img_width,
                                                                  3)),
        layers.experimental.preprocessing.RandomFlip("vertical"),
        layers.experimental.preprocessing.RandomRotation(0.4),
        layers.experimental.preprocessing.RandomZoom(height_factor=(0.2, 0.6), width_factor=(0.2, 0.6))
    ]
)

def create_model(num_classes=5, augmentation_layers=None):
    model = Sequential()

    # We can pass a list of layers performing data augmentation here
    if augmentation_layers:
        # The first layer of the augmentation layers must define the input shape
        model.add(augmentation_layers)
        model.add(layers.experimental.preprocessing.Rescaling(1./255))
    else:
        model.add(layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)))

    model.add(layers.Conv2D(32, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation='relu'))
    # Use sigmoid activation. Basically we train binary classifiers for each class
    # by specifying binary cross-entropy loss and sigmoid activation on the output layer.
    model.add(layers.Dense(num_classes, activation='sigmoid'))

    model.summary()
    return model
I'm not using the usual metrics here, like standard accuracy. In this paper I read that you cannot evaluate multi-label classification models with the usual methods. In chapter 7, evaluation metrics, the Hamming loss and an adjusted accuracy (a variant of exact match) are presented, which I use for this model.
The Hamming loss is already provided by tensorflow-addons (see here), and I found an implementation of the subset accuracy here (see here).
from tensorflow_addons.metrics import HammingLoss

hamming_loss = HammingLoss(mode="multilabel", threshold=0.5)

def subset_accuracy(y_true, y_pred):
    # From https://stackoverflow.com/questions/56739708/how-to-implement-exact-match-subset-accuracy-as-a-metric-for-keras
    threshold = tf.constant(.5, tf.float32)
    gtt_pred = tf.math.greater(y_pred, threshold)
    gtt_true = tf.math.greater(y_true, threshold)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(gtt_pred, gtt_true), tf.float32), axis=-1)
    return accuracy
# Create model
model = create_model(num_classes=5, augmentation_layers=data_augmentation)
# Compile model
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=[subset_accuracy, hamming_loss])
# Fit the model
history = model.fit(training_dataset, epochs=epochs, validation_data=validation_dataset, callbacks=callbacks)
Problem with this model
When training the model, both subset_accuracy and hamming_loss get stuck at some point and stop improving.
What could cause this behaviour? I am honestly a little bit lost right now. Could this be a case of the dying ReLU problem? Or am I using the metrics mentioned above incorrectly, or is their implementation perhaps wrong?
So far I have tried different optimizers and lowering the learning rate (e.g. from 0.01 to 0.001, 0.0001, etc.), but that didn't help either.
Maybe somebody has an idea that can help me.
Thanks in advance!
I think you need to tune your model's hyperparameters properly. For that, I recommend trying the Keras Tuner library.
This will take some time to run, but it will find you the right set of hyperparameters.
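A minimal sketch of what that could look like for this model (the search space and the build function are illustrative assumptions, not a prescription):
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers

def build_model(hp):
    # Hypothetical search space: tune the dense layer width and the learning rate
    model = tf.keras.Sequential([
        layers.experimental.preprocessing.Rescaling(1./255, input_shape=(150, 150, 3)),
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(hp.Int("dense_units", 128, 1024, step=128), activation='relu'),
        layers.Dense(5, activation='sigmoid'),
    ])
    model.compile(
        loss="binary_crossentropy",
        optimizer=tf.keras.optimizers.Adam(hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])),
    )
    return model

tuner = kt.RandomSearch(build_model, objective="val_loss", max_trials=10)
tuner.search(training_dataset, validation_data=validation_dataset, epochs=10)
best_model = tuner.get_best_models(num_models=1)[0]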
I'm doing a side project to learn AI with ANNs. I thought of making an unsupervised model that extracts features from each frame of a video so I can compare them later and detect image repetitions.
My idea is to use a CNN to extract the features for each frame, but I can't seem to make it work. As I am still learning, my intuition tells me there is something I am just not understanding.
How can I create an unsupervised model that extracts features from an array of images?
This is what I got:
img = load_image_func(???) # this loads a video and return a reshaped ordered list of frames
input_shape = (150, 150, 3)
# The model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', name='conv_1', input_shape=input_shape))
model.add(MaxPooling2D((2, 2), name='maxpool_1'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='conv_2'))
model.add(MaxPooling2D((2, 2), name='maxpool_2'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='conv_3'))
model.add(MaxPooling2D((2, 2), name='maxpool_3'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='conv_4'))
model.add(MaxPooling2D((2, 2), name='maxpool_4'))
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu', name='dense_1'))
model.add(Dense(128, activation='relu', name='dense_2'))
model.add(Dense(67500, activation='sigmoid', name='output'))
optimizer = keras.optimizers.Adam(learning_rate=0.001)
model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
#model.summary()
model.fit(vidcap, vidcap, batch_size=64, epochs=20)
I have the feeling I should be training the model, but as it is unsupervised I don't have training data.
Also, how many units should I put in the output layer, given that I don't know how many features will be detected?
Thanks for your time
Indeed, a CNN model will extract several features of an image (e.g. colors, shapes, edges, patterns, etc.).
However, what are you defining as image repetition? Are you looking for an algorithm that finds similar images? If that is the case, then you might want to look into Siamese networks, which do exactly that:
https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf
The main idea is that two neural networks are trained together. Then, after training is done, you use both networks to extract features from the images separately and compare the results to measure their similarity.
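A rough sketch of that idea in Keras (the layer sizes and the L1-distance similarity head are illustrative choices, not taken from the paper verbatim):
import tensorflow as tf
from tensorflow.keras import layers, Model, Input

def make_encoder(input_shape=(150, 150, 3)):
    # Shared CNN that maps a frame to a feature vector
    inp = Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation='relu')(x)
    return Model(inp, x, name="encoder")

encoder = make_encoder()

# The same encoder (shared weights) is applied to both images
img_a = Input(shape=(150, 150, 3))
img_b = Input(shape=(150, 150, 3))
feat_a = encoder(img_a)
feat_b = encoder(img_b)

# L1 distance between the two feature vectors, followed by a "same / not same" score
distance = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([feat_a, feat_b])
similarity = layers.Dense(1, activation='sigmoid')(distance)

siamese = Model([img_a, img_b], similarity)
siamese.compile(loss='binary_crossentropy', optimizer='adam')
# Training would need pairs of frames labelled "same" (1) or "different" (0).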
I'm a beginner in this topic. I have this problem: I have to classify the percentage of 2 classes in each frame of a video.
I created a small dataset with about 500 images (250 of each class), and a CNN with these layers:
import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Conv2D(256, kernel_size=(3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, activation='relu'))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(2, activation='sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=0.00001), metrics=['accuracy'])
1) Is it better for this problem to use binary_crossentropy + sigmoid or binary_crossentropy + softmax?
2) Is it better to use transfer learning/fine-tuning or to build a CNN from scratch like this?
3) I'm using ImageDataGenerator for data augmentation because of the small dataset; is that right?
4) Which values can I use for batch_size, steps_per_epoch, learning_rate, ...? I noticed that the model accuracy quickly goes to 1.0 along with val_accuracy, and the predictions don't return the correct percentage of each class, but return values like [9.999e-1 4.444e-5].
Since yours is a binary classification, go with sigmoid. Softmax is for multi-class (>2).
It is always better to use transfer learning. Go with VGG16, ResNet, Inception, or others.
Yes, in case of small datasets, data augmentation helps a lot.
You need to use one neuron in the last layer rather than 2. With one neuron, if the output value is greater than 0.5 it is considered class 1, otherwise class 0. If you want to stick with two neurons, then to get your answer you should take np.argmax of the prediction; in the example you have given, pred = [9.999e-1, 4.444e-5], the predicted class is 0, since pred[0] > pred[1].
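A small sketch of how the predictions would be read in each case (model and test_images here are placeholders for your own model and data):
import numpy as np

# With a single sigmoid output neuron, each prediction is P(class 1)
probs = model.predict(test_images)          # test_images is a hypothetical batch; shape (n, 1)
pred_classes = (probs > 0.5).astype(int)    # threshold at 0.5 for hard labels

# With the original two-neuron output, take the argmax instead
pred = np.array([[9.999e-1, 4.444e-5]])
print(np.argmax(pred, axis=1))              # -> [0], i.e. class 0, since pred[0][0] > pred[0][1]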
I have this CNN model:
model = Sequential()
model.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(n_rows,n_cols,1)))
model.add(Convolution2D(32, (3, 3), activation='relu'))
model.add(AveragePooling2D(pool_size=(1,3)))
model.add(Flatten())
model.add(Dense(1024, activation='relu')) #needed?
model.add(Dense(3)) #default linear activation
I can train it and obtain the trained weights.
Afterwards, I want to load the weights only up to the Flatten layer (the dense part is not useful for the second stage) and feed the Flatten output to an LSTM.
Of course, it is also suggested to wrap the CNN part in TimeDistributed.
How do I do all this: load the weights, take only the CNN part, wrap it in TimeDistributed, and finally add an LSTM?
Thanks!
You can use model.save_weights("filename.h5") to save the weights, and model.load_weights("filename.h5") to load them back into the model.
Source: https://keras.io/getting-started/faq/#savingloading-only-a-models-weights
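A hedged sketch of how that could fit the rest of your plan: load_weights(..., by_name=True) matches weights by layer name, so this assumes the convolutional layers in the original model were created with the same name= arguments; timesteps is an assumed sequence length.
from keras.models import Sequential
from keras.layers import Convolution2D, AveragePooling2D, Flatten, Dense, TimeDistributed, LSTM

# 1) Train the original model, then save its weights
model.save_weights("cnn_weights.h5")

# 2) Rebuild only the convolutional part, reusing the same layer names
cnn = Sequential(name="cnn_base")
cnn.add(Convolution2D(32, (3, 3), activation='relu', input_shape=(n_rows, n_cols, 1), name='conv1'))
cnn.add(Convolution2D(32, (3, 3), activation='relu', name='conv2'))
cnn.add(AveragePooling2D(pool_size=(1, 3)))
cnn.add(Flatten())

# Only layers whose names match get weights; the dropped Dense layers are skipped
cnn.load_weights("cnn_weights.h5", by_name=True)

# 3) Apply the CNN to every timestep of a sequence and feed the result to an LSTM
seq_model = Sequential()
seq_model.add(TimeDistributed(cnn, input_shape=(timesteps, n_rows, n_cols, 1)))
seq_model.add(LSTM(64))
seq_model.add(Dense(3))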
I am learning from this Keras documentation example:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',  # why filter is 32?
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3))) # why filter is not changed?
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same')) # why filter is changed to 64?
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512)) # why Dense neurons is 512? not 1024? what's the rule to set the number?
Here are my questions:
Why is the number of filters 32 in the first Conv2D, and why is it not changed in the second Conv2D of the same block?
Why is the number of filters changed to 64 in the second block? What is the rule for setting this number?
Why does the Dense layer have 512 neurons and not 1024? What is the rule for setting this number?
Why is the number of filters 32 in the first Conv2D, and why is it not changed in the second Conv2D of the same block?
The number of filters can be any arbitrary number. It's just a matter of having more kernels in that layer. Each filter performs a separate convolution over all channels of the input, so 32 filters perform 32 separate convolutions over all RGB channels of the input.
Why is the number of filters changed to 64 in the second block? What is the rule for setting this number?
As in the first answer, the number of filters in each layer can be anything. Here, for example, the second block has 64 filters, each performing a separate convolution over all 32 channels of the output of the first block.
Why does the Dense layer have 512 neurons and not 1024? What is the rule for setting this number?
Again, the Dense layer can have any number of neurons. For example, if you have a 64x64x3 RGB input, the last convolution block will produce a (batch_size, 16, 16, 64) output (assuming padding='same' everywhere and a (2, 2) pool size on the max-pooling layers).
After going through the Flatten() layer this becomes a (batch_size, 16*16*64) output. You then take this as the input to the Dense layer and produce a (batch_size, 512) output (because the Dense layer has 512 neurons). To be exact, the Dense layer performs the matrix multiplication (batch_size, 16*16*64) x (16*16*64, 512), which results in a (batch_size, 512) output.
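A quick way to check this reasoning is to build the described stack with 'same' padding throughout and print the shapes (the 64x64x3 input is just the hypothetical example from above):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Hypothetical 64x64 RGB input, padding='same' everywhere as assumed above
m = Sequential()
m.add(Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(64, 64, 3)))
m.add(Conv2D(32, (3, 3), padding='same', activation='relu'))
m.add(MaxPooling2D(pool_size=(2, 2)))   # 64x64 -> 32x32
m.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
m.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
m.add(MaxPooling2D(pool_size=(2, 2)))   # 32x32 -> 16x16, i.e. (None, 16, 16, 64)
m.add(Flatten())                        # (None, 16*16*64) = (None, 16384)
m.add(Dense(512))                       # weight matrix is (16384, 512)
m.summary()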
Note: the best way to set these parameters is to do hyperparameter optimization with respect to your dataset.
Edit: what I mean by separate convolutions
In the illustration (not reproduced here), each filter is drawn in a single color. It shows a 1D convolution (with padding='valid'), but you get the idea: the filters are separate, randomly initialized kernels, and over time each one learns a different pattern.