I have a siamese CNN that is performing very well (96% accuracy, 0.08 loss) on training data but poorly (70% accuracy, 0.1 loss) on testing data.
The architecture is below:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

input_shape = (256, 128, 1)  # assumed (height, width, channels) for the 256x128x1 images described below

input_main = Input(shape=input_shape, dtype='float32')
x = Conv2D(32, (3, 3), padding='same', activation='relu',
           kernel_regularizer=l2(0.005))(input_main)
x = Conv2D(16, (5, 5), activation='relu',
           kernel_regularizer=l2(0.005))(x)
x = MaxPooling2D(pool_size=(5, 5))(x)
x = Dropout(0.5)(x)
x = Conv2D(32, (3, 3), padding='same', activation='relu',
           kernel_regularizer=l2(0.0005))(x)
x = Conv2D(32, (7, 7), activation='relu',
           kernel_regularizer=l2(0.005))(x)
x = MaxPooling2D(pool_size=(3, 3))(x)
x = Dropout(0.5)(x)
x = Flatten()(x)
#x = Dropout(0.5)(x)
x = Dense(16, activation='relu',
          kernel_regularizer=l2(0.005))(x)
model = Model(inputs=input_main, outputs=x)
Two of these are then combined into a siamese architecture, and the difference between the vectors from the final layer determines the result. I have experimented with dropout and regularization, and neither has solved the problem (the parameters shown are the ones I was testing at the time of posting).
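For reference, here is a minimal sketch of how two copies of such a branch are commonly joined with shared weights. It assumes the "difference between the vectors" is an element-wise absolute difference fed to a single sigmoid unit; the post does not say exactly how the difference is turned into a prediction. `build_branch` is a hypothetical, shortened stand-in for the network above, and the input size used when testing it is arbitrary.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def build_branch(input_shape):
    # Hypothetical, shortened stand-in for the per-image network in the question.
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(inp)
    x = layers.MaxPooling2D((4, 4))(x)
    x = layers.Flatten()(x)
    out = layers.Dense(16, activation='relu')(x)
    return Model(inp, out, name='branch')


def build_siamese(input_shape):
    branch = build_branch(input_shape)  # a single model object, so both towers share weights
    in_a = layers.Input(shape=input_shape, name='image_a')
    in_b = layers.Input(shape=input_shape, name='image_b')
    emb_a = branch(in_a)
    emb_b = branch(in_b)
    # Assumed combination rule: element-wise |emb_a - emb_b|.
    diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([emb_a, emb_b])
    same_prob = layers.Dense(1, activation='sigmoid')(diff)  # P(pair is "same")
    model = Model([in_a, in_b], same_prob)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
```

Because both inputs pass through the same `branch` object, the towers share weights, and swapping the order of a pair gives the same prediction.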
I have also tried simplifying the architecture to fewer conv layers, and this has not solved the problem.
The data consists of 256x128x1 images, sent through the network in pairs with binary labels indicating whether they are the same or not. I also use data augmentation with some small rotations and translations.
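A minimal sketch of that kind of pair augmentation for a `tf.data` pipeline, assuming each image is a float tensor of shape (256, 128, 1) and that the two images of a pair are transformed independently. The rotation factor (0.02 of a full turn, about 7 degrees) and translation factor (5% of height/width) are illustrative guesses, not the values used in the post.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Small random rotations and translations (illustrative factors).
augment = tf.keras.Sequential([
    layers.RandomRotation(0.02),
    layers.RandomTranslation(0.05, 0.05),
])


def augment_pair(img_a, img_b, label):
    # Each image of the pair gets its own random transform;
    # the same/different label is left unchanged.
    a = augment(img_a[tf.newaxis], training=True)[0]
    b = augment(img_b[tf.newaxis], training=True)[0]
    return (a, b), label
```

With a hypothetical dataset `pairs_ds` yielding `(img_a, img_b, label)` tuples, this would be applied as `pairs_ds.map(augment_pair)`, only on the training split.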
Can anyone suggest anything else to try to solve this overfitting problem?
Related
I am trying to understand how loss functions work with the Keras functional API.
I have a sample multi-output model based on the B-CNN model.
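from tensorflow import keras
from tensorflow.keras import optimizers
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, MaxPooling2D,
                                     Flatten, Dense, Dropout)

# input_shape, num_c (coarse classes) and num_classes (fine classes) are defined elsewhere
img_input = Input(shape=input_shape, name='input')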
#--- block 1 ---
x = Conv2D(32, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)
#--- coarse 1 branch ---
c_1_bch = Flatten(name='c_flatten')(x)
c_1_bch = Dense(64, activation='relu', name='c_dense')(c_1_bch)
c_1_bch = BatchNormalization()(c_1_bch)
c_1_bch = Dropout(0.5)(c_1_bch)
c_1_pred = Dense(num_c, activation='softmax', name='pred_coarse')(c_1_bch)
#--- block 3 ---
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
x = BatchNormalization()(x)
x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)
#--- fine block ---
x = Flatten(name='flatten')(x)
x = Dense(128, activation='relu', name='fc_1')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
fine_pred = Dense(num_classes, activation='softmax', name='pred_fine')(x)
model = keras.Model(inputs=[img_input],
                    outputs=[c_1_pred, fine_pred],
                    name='B-CNN_Model')
This classification model takes one input and provides 2 predictions.
According to this post, we first need to compile it with the appropriate loss function, metrics, and optimizer, referring to each output layer by its name.
I have done this in the following way.
model.compile(optimizer=optimizers.SGD(learning_rate=0.003, momentum=0.9, nesterov=True),
              loss={'pred_coarse': 'mse',
                    'pred_fine': 'categorical_crossentropy'},
              loss_weights={'pred_coarse': beta,
                            'pred_fine': gamma},
              metrics={'pred_coarse': 'accuracy',
                       'pred_fine': 'accuracy'})
[Note: here, the output layer pred_coarse uses mean squared error and pred_fine uses categorical cross-entropy. The loss weights beta and gamma are variables whose values are updated after certain epochs by a keras.callbacks.Callback.]
Now, my question is: what happens if we compile the model without naming each output layer and provide only one loss function instead? For example, we compile the model as follows:
model.compile(optimizer=optimizers.SGD(learning_rate=0.003, momentum=0.9, nesterov=True),
              loss='categorical_crossentropy',
              loss_weights=[beta, gamma],
              metrics=['accuracy'])
Unlike the previous example, this one uses only categorical cross-entropy. The model compiles and runs without any errors. Is the model using categorical cross-entropy for both the pred_coarse and pred_fine output layers?
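One way to check this empirically is a sketch like the following. It uses a tiny, hypothetical two-output model (not the B-CNN above) and fixed illustrative values for beta and gamma, then compares the total loss reported by `evaluate()` with a manual computation that applies the same categorical cross-entropy to each output and weights the two results. If the numbers agree, the single loss string was applied to both outputs.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Tiny two-output model with hypothetical sizes.
inp = layers.Input(shape=(4,))
h = layers.Dense(8, activation='relu')(inp)
out_a = layers.Dense(3, activation='softmax', name='pred_coarse')(h)
out_b = layers.Dense(5, activation='softmax', name='pred_fine')(h)
model = Model(inp, [out_a, out_b])

beta, gamma = 0.3, 0.7  # illustrative fixed weights
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              loss_weights=[beta, gamma])

rng = np.random.default_rng(0)
x = rng.random((16, 4)).astype('float32')
y_a = np.eye(3, dtype='float32')[rng.integers(0, 3, 16)]
y_b = np.eye(5, dtype='float32')[rng.integers(0, 5, 16)]

# Total loss as reported by Keras (a single batch covering all 16 samples).
reported = model.evaluate(x, [y_a, y_b], batch_size=16, verbose=0,
                          return_dict=True)['loss']

# Manual total: the same cross-entropy on *each* output, then weighted and summed.
p_a, p_b = model.predict(x, verbose=0)
ce = tf.keras.losses.categorical_crossentropy
manual = beta * float(np.mean(ce(y_a, p_a))) + gamma * float(np.mean(ce(y_b, p_b)))
print(reported, manual)  # the two values should match closely
```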
I want to use a pre-trained model (from Keras Applications), with its weights, and append my (very simple) CNN model at the end. To this end, I am loosely following the tutorial here, under the sub-header 'Fine-tune InceptionV3 on a new set of classes'.
My original simple CNN model was this:
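from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Rescaling, Conv2D, MaxPool2D, Flatten, Dense

model = Sequential()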
model.add(Rescaling(1.0 / 255))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3)))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Flatten())
model.add(Dense(units=5, activation='softmax'))
Following the tutorial, I've converted it like so:
x = base_model.output
x = Rescaling(1.0 / 255)(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(256,256,3))(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = GlobalAveragePooling2D()(x)
predictions = Dense(units=5, activation='softmax')(x)
As you can see, the difference is that the top model is a Sequential() model while the bottom one is functional (I think?), and that the Flatten() layer has been replaced with GlobalAveragePooling2D(). I did this because I kept getting shape-related errors and it wasn't compiling. I thought I had solved it once I replaced Flatten() with GlobalAveragePooling2D(), as this part of the code finally compiled; however, now that I'm trying to train the model, it gives me the following error:
ValueError: Exception encountered when calling layer "max_pooling2d_7" (type MaxPooling2D).
Negative dimension size caused by subtracting 2 from 1 for '{{node model/max_pooling2d_7/MaxPool}} = MaxPool[T=DT_FLOAT, data_format="NHWC", explicit_paddings=[], ksize=[1, 2, 2, 1], padding="VALID", strides=[1, 2, 2, 1]](model/conv2d_10/Relu)' with input shapes: [?,1,1,64].
Call arguments received:
• inputs=tf.Tensor(shape=(None, 1, 1, 64), dtype=float32)
I don't want to remove the MaxPooling layer, as I want the layers appended to the pre-trained model to be as close as possible to the 'simple CNN' model I originally had, so that I can compare the two results. But I keep getting these shape errors, which I don't really understand, and it's coming to the end of the day.
Is there a nice quick fix that can make this VGG16 + simple CNN combination work?
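The error can be traced with plain shape arithmetic. The sketch below assumes the base model is VGG16 with `include_top=False`, which halves the spatial size five times (256 -> 8), and then follows the appended layers. With 'valid' padding the second 3x3 convolution leaves a 1x1 map, and the 2x2 pooling after it has nothing to pool, which matches "subtracting 2 from 1" with input shape [?,1,1,64] in the message above. `conv_out` and `trace` are illustrative helpers, not Keras functions.

```python
def conv_out(size, kernel, padding='valid', stride=1):
    """Output length of one spatial dimension after a Keras Conv2D/MaxPool2D."""
    if padding == 'same':
        return -(-size // stride)  # ceil(size / stride)
    if size < kernel:
        # The condition TensorFlow reports as "Negative dimension size".
        raise ValueError(f"negative dimension: subtracting {kernel} from {size}")
    return (size - kernel) // stride + 1


def trace(image_size, padding):
    """Spatial sizes after VGG16 (include_top=False) and the appended layers."""
    s = image_size // 32                    # VGG16 output, e.g. 256 -> 8
    sizes = [s]
    s = conv_out(s, 3, padding)             # Conv2D(32, (3, 3))
    sizes.append(s)
    s = conv_out(s, 2, 'valid', stride=2)   # MaxPool2D((2, 2), strides=2)
    sizes.append(s)
    s = conv_out(s, 3, padding)             # Conv2D(64, (3, 3))
    sizes.append(s)
    s = conv_out(s, 2, 'valid', stride=2)   # MaxPool2D((2, 2), strides=2)
    sizes.append(s)
    return sizes


print(trace(256, 'same'))   # [8, 8, 4, 4, 2]
print(trace(512, 'valid'))  # [16, 14, 7, 5, 2]
try:
    trace(256, 'valid')
except ValueError as err:
    print(err)              # negative dimension: subtracting 2 from 1
```

Either keeping the spatial size with padding='same' or starting from larger images keeps every size positive.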
The first and most important technical problem in your model structure is that you rescale the images after they have passed through base_model; the rescaling should be applied just before the base model.
The second is that you have defined input_shape on a convolution layer, while the data first passes through the base model. You should define an Input layer before the base model, then pass its output through base_model and the other layers.
Here I've edited your code:
inputs = Input(shape=(256, 256, 3))
x = Rescaling(1.0 / 255)(inputs)
x = base_model(x)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = Conv2D(64, kernel_size=(3, 3), activation='relu')(x)
x = MaxPool2D(pool_size=(2, 2), strides=2)(x)
x = GlobalAveragePooling2D()(x)
predictions = Dense(units=5, activation='softmax')(x)
model = keras.Model(inputs = [inputs], outputs = [predictions])
As for the error itself, you could set the padding parameter of the convolution layers to 'same', or resize the images to a larger size, to avoid the problem.
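A sketch of the padding='same' variant, assuming VGG16 as the base model. `weights=None` is used only so the sketch runs offline; for real fine-tuning you would pass `weights='imagenet'`. Freezing the base model mirrors what the InceptionV3 fine-tuning example does first. Note that the ImageNet VGG16 weights were trained with `tf.keras.applications.vgg16.preprocess_input` rather than a 1/255 rescale, so that preprocessing may be worth comparing.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# weights=None keeps this sketch offline; use weights='imagenet' for fine-tuning.
base_model = VGG16(weights=None, include_top=False, input_shape=(256, 256, 3))
base_model.trainable = False  # train only the appended layers at first

inputs = layers.Input(shape=(256, 256, 3))
x = layers.Rescaling(1.0 / 255)(inputs)
x = base_model(x)                                                      # -> (8, 8, 512)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)   # -> (8, 8, 32)
x = layers.MaxPool2D(pool_size=(2, 2), strides=2)(x)                   # -> (4, 4, 32)
x = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x)   # -> (4, 4, 64)
x = layers.MaxPool2D(pool_size=(2, 2), strides=2)(x)                   # -> (2, 2, 64)
x = layers.GlobalAveragePooling2D()(x)                                 # -> (64,)
predictions = layers.Dense(5, activation='softmax')(x)
model = Model(inputs, predictions)
```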
I am trying to get the output of the latent (hidden) layer to use it as input for something else. I trained my model to minimize the loss, so that it learns latent features that reconstruct the image as closely as possible.
My model is
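from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format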
#Encoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# Decoder
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x) # opposite of Pooling
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)  # 'same' so the decoder ends at 28x28, matching the input
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
I want the output of the encoded layer to be the output of my model. Is this possible? If yes, please tell me how.
You can simply do it this way:
autoencoder.fit(...)
# latent_model reuses the trained encoder layers, so it needs no extra training
latent_model = Model(input_img, encoded)
latent_representation = latent_model.predict(X)
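A self-contained version of that answer, assuming the 28x28 autoencoder from the question (with the decoder's third convolution using padding='same' so the output is 28x28) and random placeholder data in place of the real images. The check at the end shows that the encoder model contains the very same layer objects as the trained autoencoder.

```python
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

input_img = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)                 # (7, 7, 8)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

X = np.random.rand(32, 28, 28, 1).astype('float32')  # placeholder data
autoencoder.fit(X, X, epochs=1, batch_size=16, verbose=0)

# Built from the same tensors, so it shares the trained encoder weights.
latent_model = Model(input_img, encoded)
latent_representation = latent_model.predict(X, verbose=0)
```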
I am training the autoencoder with 2000 identical images. My expectation is that, given enough capacity, the loss will approach 0 and the accuracy will approach 1 after a certain training time. Instead, I see a quick convergence to loss = 0.07 and accuracy = 0.76. Reducing the number of convolutional layers gave some improvement. Reducing the number of kernels per layer increased the loss. There is no improvement after that. Is my expectation wrong? Or is there something wrong with my autoencoder architecture? What can be done to make an almost lossless autoencoder?
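from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model

input_img = Input(shape=(image_size_x, image_size_y, 1))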
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
Thanks!
You need to add a dense layer between your convolutional encoder and your convolutional decoder. This is the latent representation, also called the embedding layer: the layer in which the image is compressed. It holds the "compressed knowledge" that the architecture is trying to "learn".
For the implementation, based on this tutorial: https://www.tensorflow.org/tutorials/generative/cvae
I would suggest adding these lines between the encoder and the decoder part:
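x = tf.keras.layers.Flatten()(x)
# The CVAE tutorial uses latent_dim + latent_dim units because its encoder outputs
# a mean and a log-variance; a plain autoencoder would use Dense(latent_dim).
x = tf.keras.layers.Dense(latent_dim + latent_dim)(x)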
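A sketch of where such a dense bottleneck could sit in the question's autoencoder, for a plain (non-variational) autoencoder. The 64x64 image size and `latent_dim = 64` are hypothetical, since the question does not give `image_size_x`/`image_size_y`. After the embedding, a Dense + Reshape maps the vector back to a feature map so the convolutional decoder can follow.

```python
import tensorflow as tf

image_size_x, image_size_y = 64, 64  # hypothetical; not given in the question
latent_dim = 64                      # hypothetical bottleneck width

input_img = tf.keras.layers.Input(shape=(image_size_x, image_size_y, 1))
x = tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = tf.keras.layers.MaxPooling2D((2, 2), padding='same')(x)
x = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.MaxPooling2D((2, 2), padding='same')(x)        # (16, 16, 16)

# Dense bottleneck: the latent representation / embedding.
x = tf.keras.layers.Flatten()(x)
encoded = tf.keras.layers.Dense(latent_dim, name='embedding')(x)

# Map the embedding back to a feature map for the convolutional decoder.
x = tf.keras.layers.Dense(16 * 16 * 16, activation='relu')(encoded)
x = tf.keras.layers.Reshape((16, 16, 16))(x)
x = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.UpSampling2D((2, 2))(x)
x = tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.UpSampling2D((2, 2))(x)
decoded = tf.keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = tf.keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
```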
I'm training a convolutional autoencoder for IR faces; this is my first time building an autoencoder. I have about 1300 training images, and I didn't use any regularization method. Here's what I got after 800 epochs:
(Figure: top row, test images; bottom row, outputs from the autoencoder.)
And these are my training curves (top: training loss; bottom: validation loss). The validation loss is computed on the test-set images, which are held out from the training set. At the end, the training loss is about 0.006, but the validation loss is 0.009.
My model is defined below, with input images of size 110x150 and output images of size 88x120. I simply resize the source images to make the training labels. Each sample and label is normalized by dividing by 255.
As for the architecture of this network, I read a paper that used a similar layout for extracting face features from RGB images, and I halved each layer's depth (channels) for my purpose.
So my question is: is there something wrong? The training curves look quite strange to me. And how can I improve this autoencoder? More epochs? Regularization? A different activation function (I've heard leaky ReLU is better)? Any feedback and suggestions are appreciated, thanks!
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model


def create_models():
    input_img = Input(shape=(150, 110, 1))  # adapt this if using `channels_first` image data format
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(input_img)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(256, (3, 3), activation='relu')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(512, (3, 3), activation='relu')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)
    # at this point the representation is (8, 6, 512), i.e. 8 * 6 * 512 = 24576-dimensional
    x = Conv2D(512, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(128, (3, 3), activation='relu')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    decoded = Conv2D(1, (3, 3), activation='tanh', padding='same')(x)  # output: (120, 88, 1)
    autoencoder = Model(input_img, decoded)
    autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
    return autoencoder
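The spatial sizes in create_models() can be traced by hand with a short sketch (the helpers are illustrative, not Keras functions). It confirms the 120x88 output that matches the 88x120 labels, and it shows that the "bottleneck" holds 8 * 6 * 512 = 24576 values, more than the 150 * 110 = 16500 input pixels, so the encoding is not actually a compression of the input. This is only shape arithmetic; it does not by itself explain the training curves.

```python
import math


def conv3x3(shape, padding):
    h, w = shape
    # 3x3 kernel, stride 1: 'same' keeps the size, 'valid' trims one pixel per side.
    return (h, w) if padding == 'same' else (h - 2, w - 2)


def pool2x2_same(shape):
    h, w = shape
    # MaxPooling2D((2, 2), padding='same') rounds odd sizes up.
    return (math.ceil(h / 2), math.ceil(w / 2))


def upsample2x2(shape):
    h, w = shape
    return (2 * h, 2 * w)


def trace_create_models(h=150, w=110):
    s = (h, w)
    for padding in ('same', 'same', 'valid', 'valid'):   # encoder Conv2D layers
        s = pool2x2_same(conv3x3(s, padding))
    encoded = s
    for padding in ('same', 'same', 'valid', 'same'):    # decoder Conv2D layers
        s = upsample2x2(conv3x3(s, padding))
    decoded = conv3x3(s, 'same')                          # final Conv2D(1, ...)
    return encoded, decoded


encoded, decoded = trace_create_models()
print(encoded, decoded)                # (8, 6) (120, 88)
print(encoded[0] * encoded[1] * 512)   # 24576 values in the "bottleneck"
print(150 * 110)                       # 16500 input pixels
```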