Overfitting on my CNN image classification model [closed] - tensorflow

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 10 months ago.
I have tried hard to improve my validation accuracy by adding layers and dropout, but nothing changes: my training accuracy is above 95% while my validation accuracy is always stuck at 88%.
my split:
x_train, x_validate, y_train, y_validate = train_test_split(x_train, y_train, test_size=0.2, random_state=42)
shapes after splitting the data:
x_train shape: (5850,)
y_train shape: (5850,)
x_validate shape: (1463,)
y_validate shape: (1463,)
x_test shape: (2441,)
y_test shape: (2441,)
width, height, and number of channels:
width, height, channels = 64, 64, 3
shapes after converting the images to arrays:
Training set shape : (5850, 64, 64, 3)
Validation set shape : (1463, 64, 64, 3)
Test set shape : (2441, 64, 64, 3)
and I have 6 classes
augmentation:
datagen = ImageDataGenerator(
    featurewise_center=True,
    samplewise_center=True,
    featurewise_std_normalization=True,
    samplewise_std_normalization=True,
    zca_whitening=False,
    rotation_range=0.9,
    zoom_range=0.7,
    width_shift_range=0.8,
    height_shift_range=0.8,
    horizontal_flip=True,
    vertical_flip=True)
datagen.fit(x_train)
my Sequential model:
model = Sequential()
model.add(Conv2D(16, (3, 3), input_shape=(224, 224, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D())
model.add(Conv2D(32, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D())
model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D())
model.add(Conv2D(128, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D())
model.add(Conv2D(256, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(256))
model.add(Activation("relu"))
model.add(Dropout(0.3))
model.add(Dense(256))
model.add(Activation("relu"))
model.add(Dropout(0.2))
model.add(Dense(numberOfClass))  # output layer
model.add(Activation("softmax"))
model.compile(loss="binary_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
batch_size = 256
I added early stopping to my code to keep the weights with the lowest validation loss. How can I improve my validation accuracy to at least 92%?
Epoch 1/100
29/29 [==============================] - 62s 2s/step - loss: 0.4635 - accuracy: 0.3040 - val_loss: 0.4227 - val_accuracy: 0.4007
Epoch 00001: val_loss improved from inf to 0.42266, saving model to ./model_best_weights.h5
Epoch 2/100
29/29 [==============================] - 60s 2s/step - loss: 0.4230 - accuracy: 0.3260 - val_loss: 0.4046 - val_accuracy: 0.3314
Epoch 00002: val_loss improved from 0.42266 to 0.40463, saving model to ./model_best_weights.h5
Epoch 3/100
29/29 [==============================] - 60s 2s/step - loss: 0.3833 - accuracy: 0.4234 - val_loss: 0.3417 - val_accuracy: 0.5125
Epoch 00003: val_loss improved from 0.40463 to 0.34174, saving model to ./model_best_weights.h5
Epoch 4/100
29/29 [==============================] - 60s 2s/step - loss: 0.3351 - accuracy: 0.5040 - val_loss: 0.3108 - val_accuracy: 0.5432
Epoch 00004: val_loss improved from 0.34174 to 0.31083, saving model to ./model_best_weights.h5
Epoch 5/100
29/29 [==============================] - 59s 2s/step - loss: 0.3002 - accuracy: 0.5683 - val_loss: 0.2655 - val_accuracy: 0.6247
Epoch 00005: val_loss improved from 0.31083 to 0.26553, saving model to ./model_best_weights.h5
Epoch 6/100
29/29 [==============================] - 60s 2s/step - loss: 0.2794 - accuracy: 0.6025 - val_loss: 0.2677 - val_accuracy: 0.6194
Epoch 00006: val_loss did not improve from 0.26553
Epoch 7/100
29/29 [==============================] - 60s 2s/step - loss: 0.2606 - accuracy: 0.6374 - val_loss: 0.2524 - val_accuracy: 0.6477
Epoch 00007: val_loss improved from 0.26553 to 0.25239, saving model to ./model_best_weights.h5
Epoch 8/100
29/29 [==============================] - 59s 2s/step - loss: 0.2400 - accuracy: 0.6751 - val_loss: 0.2232 - val_accuracy: 0.6997
Epoch 00008: val_loss improved from 0.25239 to 0.22320, saving model to ./model_best_weights.h5
Epoch 9/100
29/29 [==============================] - 60s 2s/step - loss: 0.2307 - accuracy: 0.6875 - val_loss: 0.2092 - val_accuracy: 0.7181
Epoch 00009: val_loss improved from 0.22320 to 0.20916, saving model to ./model_best_weights.h5
Epoch 10/100
29/29 [==============================] - 59s 2s/step - loss: 0.2085 - accuracy: 0.7284 - val_loss: 0.2092 - val_accuracy: 0.7255
Epoch 00010: val_loss did not improve from 0.20916
Epoch 11/100
29/29 [==============================] - 60s 2s/step - loss: 0.1961 - accuracy: 0.7463 - val_loss: 0.1943 - val_accuracy: 0.7603
Epoch 00011: val_loss improved from 0.20916 to 0.19435, saving model to ./model_best_weights.h5
Epoch 12/100
29/29 [==============================] - 60s 2s/step - loss: 0.1894 - accuracy: 0.7621 - val_loss: 0.1829 - val_accuracy: 0.7669
Epoch 00012: val_loss improved from 0.19435 to 0.18294, saving model to ./model_best_weights.h5
Epoch 13/100
29/29 [==============================] - 60s 2s/step - loss: 0.1766 - accuracy: 0.7770 - val_loss: 0.1751 - val_accuracy: 0.7780
Epoch 00013: val_loss improved from 0.18294 to 0.17508, saving model to ./model_best_weights.h5
Epoch 14/100
29/29 [==============================] - 60s 2s/step - loss: 0.1606 - accuracy: 0.8006 - val_loss: 0.1666 - val_accuracy: 0.8005
Epoch 00014: val_loss improved from 0.17508 to 0.16662, saving model to ./model_best_weights.h5
Epoch 15/100
29/29 [==============================] - 60s 2s/step - loss: 0.1531 - accuracy: 0.8105 - val_loss: 0.1718 - val_accuracy: 0.7816
Epoch 00015: val_loss did not improve from 0.16662
Epoch 16/100
29/29 [==============================] - 61s 2s/step - loss: 0.1449 - accuracy: 0.8265 - val_loss: 0.1600 - val_accuracy: 0.8083
Epoch 00016: val_loss improved from 0.16662 to 0.16000, saving model to ./model_best_weights.h5
Epoch 17/100
29/29 [==============================] - 62s 2s/step - loss: 0.1309 - accuracy: 0.8419 - val_loss: 0.1609 - val_accuracy: 0.8202
Epoch 00017: val_loss did not improve from 0.16000
Epoch 18/100
29/29 [==============================] - 60s 2s/step - loss: 0.1165 - accuracy: 0.8607 - val_loss: 0.1572 - val_accuracy: 0.8222
Epoch 00018: val_loss improved from 0.16000 to 0.15722, saving model to ./model_best_weights.h5
Epoch 19/100
29/29 [==============================] - 60s 2s/step - loss: 0.1109 - accuracy: 0.8711 - val_loss: 0.1523 - val_accuracy: 0.8370
Epoch 00019: val_loss improved from 0.15722 to 0.15225, saving model to ./model_best_weights.h5
Epoch 20/100
29/29 [==============================] - 60s 2s/step - loss: 0.1008 - accuracy: 0.8877 - val_loss: 0.1405 - val_accuracy: 0.8484
Epoch 00020: val_loss improved from 0.15225 to 0.14046, saving model to ./model_best_weights.h5
Epoch 21/100
29/29 [==============================] - 60s 2s/step - loss: 0.1063 - accuracy: 0.8764 - val_loss: 0.1514 - val_accuracy: 0.8390
Epoch 00021: val_loss did not improve from 0.14046
Epoch 22/100
29/29 [==============================] - 61s 2s/step - loss: 0.0880 - accuracy: 0.8979 - val_loss: 0.1423 - val_accuracy: 0.8550
Epoch 00022: val_loss did not improve from 0.14046
Epoch 23/100
29/29 [==============================] - 60s 2s/step - loss: 0.0750 - accuracy: 0.9196 - val_loss: 0.1368 - val_accuracy: 0.8632
Epoch 00023: val_loss improved from 0.14046 to 0.13678, saving model to ./model_best_weights.h5
Epoch 24/100
29/29 [==============================] - 60s 2s/step - loss: 0.0712 - accuracy: 0.9218 - val_loss: 0.1520 - val_accuracy: 0.8521
Epoch 00024: val_loss did not improve from 0.13678
Epoch 25/100
29/29 [==============================] - 60s 2s/step - loss: 0.0664 - accuracy: 0.9288 - val_loss: 0.1600 - val_accuracy: 0.8451
Epoch 00025: val_loss did not improve from 0.13678
Epoch 26/100
29/29 [==============================] - 60s 2s/step - loss: 0.0605 - accuracy: 0.9360 - val_loss: 0.1528 - val_accuracy: 0.8636
Epoch 00026: val_loss did not improve from 0.13678
Epoch 00026: early stopping
Images of my graphs:
https://i.imgur.com/pNYwcE8.jpg
https://i.imgur.com/ZCSRI8e.jpg

You should experiment more, but glancing at your code, I can give you the following tips:
- According to the plot, validation accuracy is still increasing slightly at the end, so you could try increasing the EarlyStopping patience and monitoring validation accuracy instead of validation loss.
- Add batch normalization to your architecture.
- Increase the dropout rate, perhaps to a value between 0.4 and 0.7.
- Tune the learning rate, and consider a learning-rate scheduler such as ReduceLROnPlateau, which may help training continue after validation metrics stop improving.
Good luck!
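To illustrate the last tip: ReduceLROnPlateau lowers the learning rate when a monitored metric stalls. Below is a minimal plain-Python sketch of that mechanism; the function name and the loss values are made up for illustration, and the real callback lives in tf.keras.callbacks.

```python
def reduce_on_plateau(val_losses, lr, patience=2, factor=0.5, min_lr=1e-6):
    """Sketch of the ReduceLROnPlateau idea: shrink the learning rate by
    `factor` whenever val_loss has not improved for `patience` epochs."""
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best:          # improvement: remember it, reset the counter
            best = loss
            wait = 0
        else:                    # no improvement this epoch
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        schedule.append(lr)
    return schedule

# the learning rate is halved after two epochs without improvement
print(reduce_on_plateau([0.30, 0.25, 0.26, 0.27, 0.24], lr=0.001))
# [0.001, 0.001, 0.001, 0.0005, 0.0005]
```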

Related

Why Keras won't train on my entire image set

I am training a convolutional model in keras.
The size of my training data is
(60000, 28, 28, 1)
I build a simple model here:
model = Sequential()
model.add(
    Conv2D(
        filters=32,
        kernel_size=(4, 4),
        input_shape=(28, 28, 1),
        activation='relu'
    )
)
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)
When I tried to fit this model to my data in the following line
model.fit(
    x_train,
    y_train_hot_encode,
    epochs=20,
    validation_data=(x_test, y_test_hot_encode),
)
I noticed something weird in the logs
Epoch 1/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.5311 - accuracy: 0.8109 - val_loss: 0.3381 - val_accuracy: 0.8780
Epoch 2/20
1875/1875 [==============================] - 19s 10ms/step - loss: 0.2858 - accuracy: 0.8948 - val_loss: 0.2820 - val_accuracy: 0.8973
Epoch 3/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.2345 - accuracy: 0.9150 - val_loss: 0.2732 - val_accuracy: 0.9001
Epoch 4/20
1875/1875 [==============================] - 18s 9ms/step - loss: 0.2016 - accuracy: 0.9247 - val_loss: 0.2549 - val_accuracy: 0.9077
Epoch 5/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1644 - accuracy: 0.9393 - val_loss: 0.2570 - val_accuracy: 0.9077
Epoch 6/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1434 - accuracy: 0.9466 - val_loss: 0.2652 - val_accuracy: 0.9119
Epoch 7/20
1875/1875 [==============================] - 17s 9ms/step - loss: 0.1225 - accuracy: 0.9553 - val_loss: 0.2638 - val_accuracy: 0.9135
As you can see, each epoch appears to train on 1875 images rather than the entire 60K. Why is that, or am I reading the log the wrong way?
The number shown there is the number of steps, not the number of examples trained. Since you didn't supply batch_size to model.fit(), it used the default batch size of 32.
The expected number of steps per epoch is ceil(60000 / 32) = 1875, consistent with what is shown in the log.
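The arithmetic in this answer can be checked directly (the variable names below are just illustrative):

```python
import math

num_samples = 60000        # size of the training set
default_batch_size = 32    # what model.fit() uses when batch_size is omitted

# Keras's progress bar counts steps (batches), not individual samples
steps_per_epoch = math.ceil(num_samples / default_batch_size)
print(steps_per_epoch)  # 1875, matching the "1875/1875" shown in the log
```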

Keras monitor on val_recall reports no improvement although it is improving

I am monitoring the Keras metric val_recall. It has been improving, but the callback keeps the best value at 0.9958, even though better values (0.9978 and 0.9985) have been recorded. The monitor mode is set to 'auto'.
Please help me understand why Keras thinks the metric is not improving.
Epoch 1/10
6883/6883 [==============================] - 1982s 287ms/step - loss: 0.1025 - recall: 0.9738 - accuracy: 0.9631 - val_loss: 0.0537 - val_recall: 0.9978 - val_accuracy: 0.9837
Epoch 00001: val_recall improved from inf to 0.99783, saving model to /content/drive/MyDrive/home/repository/mon/kaggle/toxic_comment_classification/toxicity_classification_2021JUL10_1647/model_Ctoxic_B32_L256/model.h5
Epoch 2/10
6883/6883 [==============================] - 1970s 286ms/step - loss: 0.0348 - recall: 0.9946 - accuracy: 0.9901 - val_loss: 0.0412 - val_recall: 0.9958 - val_accuracy: 0.9888
Epoch 00002: val_recall improved from 0.99783 to 0.99583, saving model to /content/drive/MyDrive/home/repository/mon/kaggle/toxic_comment_classification/toxicity_classification_2021JUL10_1647/model_Ctoxic_B32_L256/model.h5
Epoch 3/10
6883/6883 [==============================] - 1970s 286ms/step - loss: 0.0181 - recall: 0.9968 - accuracy: 0.9952 - val_loss: 0.0446 - val_recall: 0.9984 - val_accuracy: 0.9897
Epoch 00003: val_recall did not improve from 0.99583
Epoch 4/10
6883/6883 [==============================] - 1972s 286ms/step - loss: 0.0125 - recall: 0.9976 - accuracy: 0.9967 - val_loss: 0.0429 - val_recall: 0.9985 - val_accuracy: 0.9902
Epoch 00004: val_recall did not improve from 0.99583
Epoch 5/10
6883/6883 [==============================] - 1973s 287ms/step - loss: 0.0094 - recall: 0.9979 - accuracy: 0.9974 - val_loss: 0.0663 - val_recall: 0.9991 - val_accuracy: 0.9873
Epoch 00005: ReduceLROnPlateau reducing learning rate to 5.9999998484272515e-06.
Epoch 00005: val_recall did not improve from 0.99583
Epoch 6/10
6883/6883 [==============================] - 1970s 286ms/step - loss: 0.0031 - recall: 0.9996 - accuracy: 0.9993 - val_loss: 0.0646 - val_recall: 0.9998 - val_accuracy: 0.9901
Epoch 00006: val_recall did not improve from 0.99583
Epoch 7/10
6883/6883 [==============================] - 1967s 286ms/step - loss: 0.0019 - recall: 0.9998 - accuracy: 0.9997 - val_loss: 0.0641 - val_recall: 0.9997 - val_accuracy: 0.9903
Restoring model weights from the end of the best epoch.
Epoch 00007: val_recall did not improve from 0.99583
Epoch 00007: early stopping
Solution
As per the comment by Innat, set mode=max in the callbacks.
From the comments:
Setting mode=max in the callbacks resolved the issue. With mode='auto', Keras does not recognize val_recall as a metric to maximize, so it falls back to minimizing it and treats the lowest recorded value as the best (note the "improved from inf" in epoch 1).
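A plain-Python sketch of why mode matters; this mimics the comparison the callback makes, not the actual Keras code. With 'min' (what 'auto' falls back to for a metric name it does not recognize), a rising recall never counts as an improvement:

```python
def is_improvement(current, best, mode):
    # 'max': higher is better (accuracy, recall); 'min': lower is better (loss)
    return current > best if mode == "max" else current < best

best = 0.99583  # the value the callback in the log got stuck on
# under 'min', the higher recall from epoch 1 is treated as "did not improve"
print(is_improvement(0.99783, best, "min"))  # False
# under 'max', the same value is correctly counted as an improvement
print(is_improvement(0.99783, best, "max"))  # True
```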

Facial expression val_accuracy does not improve

I am running the following code:
basemodel.fit(X_train,y_train,epochs=25,validation_split=.1,callbacks=call_back)
But I get the result Epoch 00014: val_accuracy did not improve from 0.57709. I am not sure what the issue is, because I can clearly see that my loss has decreased and my accuracy has increased.
This is the result
Epoch 1/25
909/909 [==============================] - 13s 6ms/step - loss: 1.6465 - accuracy: 0.3396 - val_loss: 1.4830 - val_accuracy: 0.4334
Epoch 00001: val_accuracy improved from -inf to 0.43344, saving model to checkpoint/best_model.h5
Epoch 2/25
909/909 [==============================] - 5s 5ms/step - loss: 1.3402 - accuracy: 0.4860 - val_loss: 1.3291 - val_accuracy: 0.4926
Epoch 00002: val_accuracy improved from 0.43344 to 0.49257, saving model to checkpoint/best_model.h5
Epoch 3/25
909/909 [==============================] - 5s 5ms/step - loss: 1.2050 - accuracy: 0.5418 - val_loss: 1.2769 - val_accuracy: 0.5025
Epoch 00003: val_accuracy improved from 0.49257 to 0.50248, saving model to checkpoint/best_model.h5
Epoch 4/25
909/909 [==============================] - 5s 5ms/step - loss: 1.1054 - accuracy: 0.5806 - val_loss: 1.1936 - val_accuracy: 0.5495
Epoch 00004: val_accuracy improved from 0.50248 to 0.54954, saving model to checkpoint/best_model.h5
Epoch 5/25
909/909 [==============================] - 5s 5ms/step - loss: 1.0190 - accuracy: 0.6159 - val_loss: 1.1535 - val_accuracy: 0.5551
Epoch 00005: val_accuracy improved from 0.54954 to 0.55511, saving model to checkpoint/best_model.h5
Epoch 6/25
909/909 [==============================] - 5s 5ms/step - loss: 0.9329 - accuracy: 0.6502 - val_loss: 1.1962 - val_accuracy: 0.5641
Epoch 00006: val_accuracy improved from 0.55511 to 0.56409, saving model to checkpoint/best_model.h5
Epoch 7/25
909/909 [==============================] - 5s 5ms/step - loss: 0.8435 - accuracy: 0.6846 - val_loss: 1.1707 - val_accuracy: 0.5771
Epoch 00007: val_accuracy improved from 0.56409 to 0.57709, saving model to checkpoint/best_model.h5
Epoch 8/25
909/909 [==============================] - 5s 5ms/step - loss: 0.7527 - accuracy: 0.7201 - val_loss: 1.3817 - val_accuracy: 0.5545
Epoch 00008: val_accuracy did not improve from 0.57709
Epoch 9/25
909/909 [==============================] - 5s 5ms/step - loss: 0.6633 - accuracy: 0.7576 - val_loss: 1.5021 - val_accuracy: 0.5207
Epoch 00009: val_accuracy did not improve from 0.57709
Epoch 10/25
909/909 [==============================] - 5s 5ms/step - loss: 0.5865 - accuracy: 0.7874 - val_loss: 1.5610 - val_accuracy: 0.5721
Epoch 00010: val_accuracy did not improve from 0.57709
Epoch 11/25
909/909 [==============================] - 5s 5ms/step - loss: 0.5154 - accuracy: 0.8097 - val_loss: 1.5723 - val_accuracy: 0.5430
Epoch 00011: val_accuracy did not improve from 0.57709
Epoch 12/25
909/909 [==============================] - 5s 6ms/step - loss: 0.4540 - accuracy: 0.8333 - val_loss: 2.1641 - val_accuracy: 0.5650
Epoch 00012: val_accuracy did not improve from 0.57709
Epoch 13/25
909/909 [==============================] - 5s 5ms/step - loss: 0.4106 - accuracy: 0.8511 - val_loss: 2.3236 - val_accuracy: 0.5322
Epoch 00013: val_accuracy did not improve from 0.57709
Epoch 14/25
909/909 [==============================] - 5s 5ms/step - loss: 0.3747 - accuracy: 0.8682 - val_loss: 1.8985 - val_accuracy: 0.5567
Epoch 00014: val_accuracy did not improve from 0.57709
Epoch 15/25
909/909 [==============================] - 5s 5ms/step - loss: 0.3480 - accuracy: 0.8768 - val_loss: 2.1689 - val_accuracy: 0.5505
Epoch 00015: val_accuracy did not improve from 0.57709
Epoch 16/25
909/909 [==============================] - 5s 5ms/step - loss: 0.3224 - accuracy: 0.8878 - val_loss: 2.0880 - val_accuracy: 0.5269
Epoch 00016: val_accuracy did not improve from 0.57709
Epoch 17/25
909/909 [==============================] - 5s 5ms/step - loss: 0.3157 - accuracy: 0.8912 - val_loss: 2.2746 - val_accuracy: 0.5328
Epoch 00017: val_accuracy did not improve from 0.57709
Epoch 18/25
909/909 [==============================] - 5s 5ms/step - loss: 0.2960 - accuracy: 0.8992 - val_loss: 2.3014 - val_accuracy: 0.5582
Epoch 00018: val_accuracy did not improve from 0.57709
Epoch 19/25
909/909 [==============================] - 5s 5ms/step - loss: 0.2961 - accuracy: 0.8998 - val_loss: 2.8190 - val_accuracy: 0.5399
Epoch 00019: val_accuracy did not improve from 0.57709
Epoch 20/25
909/909 [==============================] - 5s 5ms/step - loss: 0.2945 - accuracy: 0.9016 - val_loss: 2.5621 - val_accuracy: 0.5495
Epoch 00020: val_accuracy did not improve from 0.57709
Epoch 21/25
909/909 [==============================] - 5s 5ms/step - loss: 0.2772 - accuracy: 0.9075 - val_loss: 2.6602 - val_accuracy: 0.5402
Epoch 00021: val_accuracy did not improve from 0.57709
Epoch 22/25
909/909 [==============================] - 5s 6ms/step - loss: 0.2857 - accuracy: 0.9070 - val_loss: 2.7156 - val_accuracy: 0.5381
Epoch 00022: val_accuracy did not improve from 0.57709
Epoch 23/25
909/909 [==============================] - 5s 5ms/step - loss: 0.2767 - accuracy: 0.9098 - val_loss: 3.4705 - val_accuracy: 0.5291
Epoch 00023: val_accuracy did not improve from 0.57709
Epoch 24/25
909/909 [==============================] - 5s 6ms/step - loss: 0.2725 - accuracy: 0.9100 - val_loss: 3.5462 - val_accuracy: 0.5706
Epoch 00024: val_accuracy did not improve from 0.57709
Epoch 25/25
909/909 [==============================] - 5s 5ms/step - loss: 0.2675 - accuracy: 0.9134 - val_loss: 2.3214 - val_accuracy: 0.5254
Epoch 00025: val_accuracy did not improve from 0.57709
<tensorflow.python.keras.callbacks.History at 0x7f9d42d7afd0>
My learning rate is 0.01:
basemodel.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=.01), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
This is a case of overfitting/memorization of the training data by the model.
If you swap the validation data for the training data, you will see the validation loss go down as well.
From the discussion in the comments: you have just 1000 data points, while the model has 403,463 trainable parameters.
Your options are:
- Get more data
- Use pretrained layers (this is known as transfer learning)
- Use a regularization parameter
- Use Dropout
- Use batch normalization (won't be very effective here)
Getting more data or using pretrained layers will be highly effective in your case!
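To make the parameter-to-data imbalance concrete, here is a quick count for a hypothetical dense head (the layer sizes below are made up for illustration; the 403,463 figure came from the actual model discussed in the comments):

```python
def dense_params(n_in, n_out):
    # a Dense layer has one weight per input-output pair, plus one bias per output
    return n_in * n_out + n_out

# hypothetical head: a 5x5x64 feature map flattened into Dense(128), then Dense(7)
flattened = 5 * 5 * 64  # 1600 inputs
head = dense_params(flattened, 128) + dense_params(128, 7)
print(head)  # 205831 parameters in the head alone, against ~1000 training images
```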

Keras model gets worse when fine-tuning

I'm trying to follow the fine-tuning steps described in https://www.tensorflow.org/tutorials/images/transfer_learning#create_the_base_model_from_the_pre-trained_convnets to get a trained model for binary segmentation.
I create an encoder-decoder, with the encoder weights taken from MobileNetV2 and frozen via encoder.trainable = False. Then I define my decoder as described in the tutorial and train the network for 300 epochs with a learning rate of 0.005. I get the following loss value and Jaccard index during the last epochs:
Epoch 297/300
55/55 [==============================] - 85s 2s/step - loss: 0.2443 - jaccard_sparse3D: 0.5556 - accuracy: 0.9923 - val_loss: 0.0440 - val_jaccard_sparse3D: 0.3172 - val_accuracy: 0.9768
Epoch 298/300
55/55 [==============================] - 75s 1s/step - loss: 0.2437 - jaccard_sparse3D: 0.5190 - accuracy: 0.9932 - val_loss: 0.0422 - val_jaccard_sparse3D: 0.3281 - val_accuracy: 0.9776
Epoch 299/300
55/55 [==============================] - 78s 1s/step - loss: 0.2465 - jaccard_sparse3D: 0.4557 - accuracy: 0.9936 - val_loss: 0.0431 - val_jaccard_sparse3D: 0.3327 - val_accuracy: 0.9769
Epoch 300/300
55/55 [==============================] - 85s 2s/step - loss: 0.2467 - jaccard_sparse3D: 0.5030 - accuracy: 0.9923 - val_loss: 0.0463 - val_jaccard_sparse3D: 0.3315 - val_accuracy: 0.9740
I store all the weights of this model and then, I compute the fine-tuning with the following steps:
model.load_weights('my_pretrained_weights.h5')
model.trainable = True
model.compile(optimizer=Adam(learning_rate=0.00001, name='adam'),
              loss=SparseCategoricalCrossentropy(from_logits=True),
              metrics=[jaccard, "accuracy"])
model.fit(training_generator, validation_data=(val_x, val_y), epochs=5,
          validation_batch_size=2, callbacks=callbacks)
Suddenly the performance of my model is much worse than it was while training the decoder:
Epoch 1/5
55/55 [==============================] - 89s 2s/step - loss: 0.2417 - jaccard_sparse3D: 0.0843 - accuracy: 0.9946 - val_loss: 0.0079 - val_jaccard_sparse3D: 0.0312 - val_accuracy: 0.9992
Epoch 2/5
55/55 [==============================] - 90s 2s/step - loss: 0.1920 - jaccard_sparse3D: 0.1179 - accuracy: 0.9927 - val_loss: 0.0138 - val_jaccard_sparse3D: 7.1138e-05 - val_accuracy: 0.9998
Epoch 3/5
55/55 [==============================] - 95s 2s/step - loss: 0.2173 - jaccard_sparse3D: 0.1227 - accuracy: 0.9932 - val_loss: 0.0171 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 0.9999
Epoch 4/5
55/55 [==============================] - 94s 2s/step - loss: 0.2428 - jaccard_sparse3D: 0.1319 - accuracy: 0.9927 - val_loss: 0.0190 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 1.0000
Epoch 5/5
55/55 [==============================] - 97s 2s/step - loss: 0.1920 - jaccard_sparse3D: 0.1107 - accuracy: 0.9926 - val_loss: 0.0215 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 1.0000
Is there any known reason why this is happening? Is it normal?
Thank you in advance!
OK, I found out what I do differently that makes it NOT necessary to recompile. I do not set encoder.trainable = False. What I do in the code below is equivalent:
for layer in encoder.layers:
    layer.trainable = False
Then train your model. Afterwards you can unfreeze the encoder weights with:
for layer in encoder.layers:
    layer.trainable = True
You do not need to recompile the model. I tested this and it works as expected. You can verify by printing the model summary before and after and looking at the number of trainable parameters. As for changing the learning rate, I find it best to use the Keras callback ReduceLROnPlateau to automatically adjust the learning rate based on validation loss. I also recommend the EarlyStopping callback, which monitors validation loss and halts training if the loss fails to improve for 'patience' consecutive epochs. Setting restore_best_weights=True loads the weights from the epoch with the lowest validation loss, so you don't have to save and then reload the weights. Set epochs to a large number to ensure this callback activates. The code I use is shown below:
es = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                      verbose=1, restore_best_weights=True)
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=1, verbose=1)
callbacks = [es, rlronp]
In model.fit, set callbacks=callbacks.

Keras - training loss vs validation loss

Just for the sake of argument, I am using the same data for training and validation, like this:
model.fit_generator(
    generator=train_generator,
    epochs=EPOCHS,
    steps_per_epoch=train_generator.n // BATCH_SIZE,
    validation_data=train_generator,
    validation_steps=train_generator.n // BATCH_SIZE
)
So I would expect that the loss and the accuracy of training and validation at the end of each epoch would be pretty much the same? Still it looks like this:
Epoch 1/150
26/26 [==============================] - 55s 2s/step - loss: 1.5520 - acc: 0.3171 - val_loss: 1.6646 - val_acc: 0.2796
Epoch 2/150
26/26 [==============================] - 46s 2s/step - loss: 1.2924 - acc: 0.4996 - val_loss: 1.5895 - val_acc: 0.3508
Epoch 3/150
26/26 [==============================] - 46s 2s/step - loss: 1.1624 - acc: 0.5873 - val_loss: 1.6197 - val_acc: 0.3262
Epoch 4/150
26/26 [==============================] - 46s 2s/step - loss: 1.0601 - acc: 0.6265 - val_loss: 1.9420 - val_acc: 0.3150
Epoch 5/150
26/26 [==============================] - 46s 2s/step - loss: 0.9790 - acc: 0.6640 - val_loss: 1.9667 - val_acc: 0.2823
Epoch 6/150
26/26 [==============================] - 46s 2s/step - loss: 0.9191 - acc: 0.6951 - val_loss: 1.8594 - val_acc: 0.3342
Epoch 7/150
26/26 [==============================] - 46s 2s/step - loss: 0.8811 - acc: 0.7087 - val_loss: 2.3223 - val_acc: 0.2869
Epoch 8/150
26/26 [==============================] - 46s 2s/step - loss: 0.8148 - acc: 0.7379 - val_loss: 1.9683 - val_acc: 0.3358
Epoch 9/150
26/26 [==============================] - 46s 2s/step - loss: 0.8068 - acc: 0.7307 - val_loss: 2.1053 - val_acc: 0.3312
Why does the accuracy in particular differ so much, even though it comes from the same data source? Is there something about how it is calculated that I am missing?
The generator is created like this:
train_images = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255
)
train_generator = train_images.flow_from_directory(
    directory="data/superheros/images/train",
    target_size=(299, 299),
    batch_size=BATCH_SIZE,
    shuffle=True
)
Yes, it shuffles the images, but since it iterates over all images for validation as well, shouldn't the accuracy at least be close?
So the model looks like this:
inceptionV3 = keras.applications.inception_v3.InceptionV3(include_top=False)
features = inceptionV3.output
net = keras.layers.GlobalAveragePooling2D()(features)
predictions = keras.layers.Dense(units=2, activation="softmax")(net)
for layer in inceptionV3.layers:
    layer.trainable = False
model = keras.Model(inputs=inceptionV3.input, outputs=predictions)
optimizer = keras.optimizers.RMSprop()
model.compile(
    optimizer=optimizer,
    loss="categorical_crossentropy",
    metrics=['accuracy']
)
So no dropout or anything, just the inceptionv3 with a softmax layer on top. I would expect that the accuracy differs a bit, but not in this magnitude.
Are you sure your train_generator returns the same data when Keras retrieves the training data and the validation data? Since it's a generator, I'd expect it not to :)
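A minimal plain-Python sketch of that suspicion (a toy generator standing in for the shuffling ImageDataGenerator iterator): when training and validation pull from the same iterator, each consumer sees different batches, so the metrics are computed on different data.

```python
def batch_stream():
    # stands in for a generator that yields batches one after another
    for batch in range(6):
        yield batch

gen = batch_stream()
train_batches = [next(gen) for _ in range(3)]  # "training" consumes 0, 1, 2
val_batches = [next(gen) for _ in range(3)]    # "validation" then sees 3, 4, 5
print(train_batches, val_batches)  # [0, 1, 2] [3, 4, 5]
```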