Keras: Validation accuracy stays the exact same but validation loss decreases - tensorflow

I know that the problem can't be with the dataset because I've seen other projects use the same dataset.
Here is my data preprocessing code:
import pandas as pd
dataset = pd.read_csv('political_tweets.csv')
dataset.head()
dataset = pd.read_csv('political_tweets.csv')["tweet"].values
y_train = pd.read_csv('political_tweets.csv')["dem_or_rep"].values
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(dataset, y_train, test_size=0.1)
max_words = 10000
print(max_words)
max_len = 25
tokenizer = Tokenizer(num_words = max_words, filters='!"#$%&()*+,-./:;<=>?#[\\]^_`{|}~\t\n1234567890', lower=False,oov_token="<OOV>")
tokenizer.fit_on_texts(x_train)
x_train = tokenizer.texts_to_sequences(x_train)
x_train = pad_sequences(x_train, max_len, padding='post', truncating='post')
tokenizer.fit_on_texts(x_test)
x_test = tokenizer.texts_to_sequences(x_test)
x_test = pad_sequences(x_test, max_len, padding='post', truncating='post')
And my model:
model = Sequential([
Embedding(max_words+1,64,input_length=max_len),
Bidirectional(GRU(64, return_sequences = True), merge_mode='concat'),
GlobalMaxPooling1D(),
Dense(64,kernel_regularizer=regularizers.l2(0.02)),
Dropout(0.5),
Dense(1, activation='sigmoid'),
])
model.summary()
model.compile(loss='binary_crossentropy', optimizer=RMSprop(learning_rate=0.0001), metrics=['accuracy'])
model.fit(x_train,y_train, batch_size=128, epochs=500, verbose=1, shuffle=True, validation_data=(x_test, y_test))
Both of my losses decrease, my training accuracy increases, but the validation accuracy stays at 50% (which is awful considering I am doing a binary classification model).
Epoch 1/500
546/546 [==============================] - 35s 64ms/step - loss: 1.7385 - accuracy: 0.5102 - val_loss: 1.2458 - val_accuracy: 0.5102
Epoch 2/500
546/546 [==============================] - 34s 62ms/step - loss: 0.9746 - accuracy: 0.5137 - val_loss: 0.7886 - val_accuracy: 0.5102
Epoch 3/500
546/546 [==============================] - 34s 62ms/step - loss: 0.7235 - accuracy: 0.5135 - val_loss: 0.6943 - val_accuracy: 0.5102
Epoch 4/500
546/546 [==============================] - 34s 62ms/step - loss: 0.6929 - accuracy: 0.5135 - val_loss: 0.6930 - val_accuracy: 0.5102
Epoch 5/500
546/546 [==============================] - 34s 62ms/step - loss: 0.6928 - accuracy: 0.5135 - val_loss: 0.6931 - val_accuracy: 0.5102
Epoch 6/500
546/546 [==============================] - 34s 62ms/step - loss: 0.6927 - accuracy: 0.5135 - val_loss: 0.6931 - val_accuracy: 0.5102
Epoch 7/500
546/546 [==============================] - 37s 68ms/step - loss: 0.6925 - accuracy: 0.5136 - val_loss: 0.6932 - val_accuracy: 0.5106
Epoch 8/500
546/546 [==============================] - 34s 63ms/step - loss: 0.6892 - accuracy: 0.5403 - val_loss: 0.6958 - val_accuracy: 0.5097
Epoch 9/500
546/546 [==============================] - 35s 63ms/step - loss: 0.6815 - accuracy: 0.5633 - val_loss: 0.7013 - val_accuracy: 0.5116
Epoch 10/500
546/546 [==============================] - 34s 63ms/step - loss: 0.6747 - accuracy: 0.5799 - val_loss: 0.7096 - val_accuracy: 0.5055
I've seen other posts on this topic and they say to add dropout, crossentropy, decrease the learning rate, etc. I have done all of this and none of it works.
Any help is greatly appreciated.
Thanks in advance!

A couple of observations for your problem:
Though not particularly familiar with the dataset, I trust that it is used in many circumstances without problems. However, you could try to check for its balance. In train_test_split() there is a parameter called stratify which, if fed the y, it will ensure the same number of samples for each class are in training set and test set proportionally.
Your phenomenon with validation loss and validation accuracy is not something out of the ordinary. Imagine that in the first epochs, the neural network considers some ground truth positive examples (ys) with GT == 1 with 55% confidence. While the training advances, the neural network learns better, and now it is 90% confident for a ground truth positive example (ys) with GT == 1. Since the threshold for calculating the accuracy is 50% , in both situations you have the same accuracy. Nevertheless, the loss has changed significantly, since 90% >> 55%.
You training seems to advance(slowly but surely). Have you considered using Adam as an off-the-shelves optimizer?
If the low accuracy is still maintained over some epochs, you may very well suffer from a well known phenomenon called underfitting, in which your model is unable to capture the dependencies between your data. To mitigate/avoid underfitting altogether, you may want to use a more complex model (2 LSTMs / 2 GRUs) stacked.
At this stage, remove the Dropout() layer, since you have underfitting, not overfitting.
Decrease the batch_size. Very big batch_size can lead to local minima, rendering you network unable to properly learn/generalize.
If none of these work, try starting with a lower learning rate, say 0.00001 instead of 0.0001.
Reiterate over the dataset preprocessing steps. Ensure the sentences are converted properly.

I have had a similar issue and I think it might be because dropout is right before the output layer. Try moving it to one layer before that.

Related

Why is the val_acc of my model 4 classification always around 25%? My train_acc is increasing

I train my model in Colab. I want to use mobilenet_v2 model to classify picture to 4 categories, but the accuracy is always about 25%.
And the train_acc is increasing, but the val_acc is not.
Part of my codes as follows:
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(input_shape=(224,224,3), weights=None, classes=4)
from tensorflow.keras.optimizers import RMSprop
base_model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(lr=0.001),
metrics=['acc'])
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1/255)
train_generator = train_datagen.flow_from_directory(
'GarbageForMaixHub',
target_size=(224,224),
batch_size=16,
class_mode='categorical'
)
base_model.fit(
train_generator,
steps_per_epoch=64,
epochs=5,
verbose=1,
validation_data=train_generator, # same to train data
validation_steps=16
)
And part of the outputs as follows:
Epoch 1/5
64/64 [==============================] - 15s 226ms/step - loss: 1.1683 - acc: 0.4434 - val_loss: 1.4742 - val_acc: 0.2070
Epoch 2/5
64/64 [==============================] - 15s 228ms/step - loss: 1.2155 - acc: 0.4258 - val_loss: 1.4351 - val_acc: 0.2852
Epoch 3/5
64/64 [==============================] - 15s 228ms/step - loss: 1.1015 - acc: 0.5029 - val_loss: 1.4660 - val_acc: 0.2617
Epoch 4/5
64/64 [==============================] - 14s 223ms/step - loss: 1.0972 - acc: 0.5088 - val_loss: 1.5096 - val_acc: 0.2852
Epoch 5/5
64/64 [==============================] - 14s 222ms/step - loss: 1.0932 - acc: 0.5195 - val_loss: 1.6133 - val_acc: 0.2266
<keras.callbacks.History at 0x7ff01f7eeeb0>
The val_acc is so different from train_acc when I used my train_generator as my validation_generator.
What could I do to solve, or to avoid such problem?
I feel a little clueless. I tried some diffirent model such as tf.keras.applications.EfficientNetB0, but it's no different. I have check the picture and labels, no problems found.

Questions about the number of learning [duplicate]

So I've been following Google's official tensorflow guide and trying to build a simple neural network using Keras. But when it comes to training the model, it does not use the entire dataset (with 60000 entries) and instead uses only 1875 entries for training. Any possible fix?
import tensorflow as tf
from tensorflow import keras
import numpy as np
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0
class_names = ['T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']
model = keras.Sequential([
keras.layers.Flatten(input_shape=(28, 28)),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(10)
])
model.compile(optimizer='adam',
loss= tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)
Output:
Epoch 1/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3183 - accuracy: 0.8866
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3169 - accuracy: 0.8873
Epoch 3/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3144 - accuracy: 0.8885
Epoch 4/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3130 - accuracy: 0.8885
Epoch 5/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3110 - accuracy: 0.8883
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3090 - accuracy: 0.8888
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3073 - accuracy: 0.8895
Epoch 8/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3057 - accuracy: 0.8900
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3040 - accuracy: 0.8905
Epoch 10/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3025 - accuracy: 0.8915
<tensorflow.python.keras.callbacks.History at 0x7fbe0e5aebe0>
Here's the original google colab notebook where I've been working on this: https://colab.research.google.com/drive/1NdtzXHEpiNnelcMaJeEm6zmp34JMcN38
The number 1875 shown during fitting the model is not the training samples; it is the number of batches.
model.fit includes an optional argument batch_size, which, according to the documentation:
If unspecified, batch_size will default to 32.
So, what happens here is - you fit with the default batch size of 32 (since you have not specified anything different), so the total number of batches for your data is
60000/32 = 1875
It does not train on 1875 samples.
Epoch 1/10
1875/1875 [===
1875 here is the number of steps, not samples. In fit method, there is an argument, batch_size. The default value for it is 32. So 1875*32=60000. The implementation is correct.
If you train it with batch_size=16, you will see the number of steps will be 3750 instead of 1875, since 60000/16=3750.
Just use batch_size = 1, if you want the entire 60000 data samples to be visible.

Keras model gets worse when fine-tuning

I'm trying to follow the fine-tuning steps described in https://www.tensorflow.org/tutorials/images/transfer_learning#create_the_base_model_from_the_pre-trained_convnets to get a trained model for binary segmentation.
I create an encoder-decoder with the weights of the encoder being the ones of the MobileNetV2 and fixed as encoder.trainable = False. Then, I define my decoder as said in the tutorial and I train the network for 300 epochs using a learning rate of 0.005. I get the following loss value and Jaccard index during the lasts epochs:
Epoch 297/300
55/55 [==============================] - 85s 2s/step - loss: 0.2443 - jaccard_sparse3D: 0.5556 - accuracy: 0.9923 - val_loss: 0.0440 - val_jaccard_sparse3D: 0.3172 - val_accuracy: 0.9768
Epoch 298/300
55/55 [==============================] - 75s 1s/step - loss: 0.2437 - jaccard_sparse3D: 0.5190 - accuracy: 0.9932 - val_loss: 0.0422 - val_jaccard_sparse3D: 0.3281 - val_accuracy: 0.9776
Epoch 299/300
55/55 [==============================] - 78s 1s/step - loss: 0.2465 - jaccard_sparse3D: 0.4557 - accuracy: 0.9936 - val_loss: 0.0431 - val_jaccard_sparse3D: 0.3327 - val_accuracy: 0.9769
Epoch 300/300
55/55 [==============================] - 85s 2s/step - loss: 0.2467 - jaccard_sparse3D: 0.5030 - accuracy: 0.9923 - val_loss: 0.0463 - val_jaccard_sparse3D: 0.3315 - val_accuracy: 0.9740
I store all the weights of this model and then, I compute the fine-tuning with the following steps:
model.load_weights('my_pretrained_weights.h5')
model.trainable = True
model.compile(optimizer=Adam(learning_rate=0.00001, name='adam'),
loss=SparseCategoricalCrossentropy(from_logits=True),
metrics=[jaccard, "accuracy"])
model.fit(training_generator, validation_data=(val_x, val_y), epochs=5,
validation_batch_size=2, callbacks=callbacks)
Suddenly the performance of my model is way much worse than during the training of the decoder:
Epoch 1/5
55/55 [==============================] - 89s 2s/step - loss: 0.2417 - jaccard_sparse3D: 0.0843 - accuracy: 0.9946 - val_loss: 0.0079 - val_jaccard_sparse3D: 0.0312 - val_accuracy: 0.9992
Epoch 2/5
55/55 [==============================] - 90s 2s/step - loss: 0.1920 - jaccard_sparse3D: 0.1179 - accuracy: 0.9927 - val_loss: 0.0138 - val_jaccard_sparse3D: 7.1138e-05 - val_accuracy: 0.9998
Epoch 3/5
55/55 [==============================] - 95s 2s/step - loss: 0.2173 - jaccard_sparse3D: 0.1227 - accuracy: 0.9932 - val_loss: 0.0171 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 0.9999
Epoch 4/5
55/55 [==============================] - 94s 2s/step - loss: 0.2428 - jaccard_sparse3D: 0.1319 - accuracy: 0.9927 - val_loss: 0.0190 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 1.0000
Epoch 5/5
55/55 [==============================] - 97s 2s/step - loss: 0.1920 - jaccard_sparse3D: 0.1107 - accuracy: 0.9926 - val_loss: 0.0215 - val_jaccard_sparse3D: 0.0000e+00 - val_accuracy: 1.0000
Is there any known reason why this is happening? Is it normal?
Thank you in advance!
OK I found out what I do different that makes it NOT necessary to compile. I do not set encoder.trainable = False. What I do in the code below is equivalent
for layer in encoder.layers:
layer.trainable=False
then train your model. Then you can unfreeze the encoder weights with
for layer in encoder.layers:
layer.trainable=True
You do not need to recompile the model. I tested this and it works as expected. You can
verify by priniting model summary before and after and look at the number of trainable parameters. As for changing the learning rate I find it is best to use the the keras callback ReduceLROnPlateau to automatically adjust the learning rate based on validation loss. I also recommend using the EarlyStopping callback which monitors validation and halts training if the loss fails to reduce after 'patience' number of consecutive epochs. Setting restore_best_weights=True will load the weights for the epoch with the lowest validation loss so you don't have to save then reload the weights. Set epochs to a large number to ensure this callback activates. The code I use is shown below
es=tf.keras.callbacks.EarlyStopping( monitor="val_loss", patience=3,
verbose=1, restore_best_weights=True)
rlronp=tf.keras.callbacks.ReduceLROnPlateau( monitor="val_loss", factor=0.5, patience=1,
verbose=1)
callbacks=[es, rlronp]
In model.fit set callbacks=callbacks

Why training accuracy and validation accuracy are different for the same dataset with tensorflow2.0?

I am training with tensorflow2.0 and tensorflow_datasets. But I am not understand: why does the training accuracy and loss and valdataion accuracy and loss are different?
This is my code:
import tensorflow as tf
import tensorflow_datasets as tfds
data_name = 'uc_merced'
dataset = tfds.load(data_name)
# the train_data and the test_data are same dataset
train_data, test_data = dataset['train'], dataset['train']
def parse(img_dict):
img = tf.image.resize_with_pad(img_dict['image'], 256, 256)
#img = img / 255.
label = img_dict['label']
return img, label
train_data = train_data.map(parse)
train_data = train_data.batch(96)
test_data = test_data.map(parse)
test_data = test_data.batch(96)
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
model = tf.keras.applications.ResNet50(weights=None, classes=21,
input_shape=(256, 256, 3))
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(train_data, epochs=50, verbose=2, validation_data=test_data)
It is very simple and you can run it on your computer. you can see my train data and validation data are the same train_data, test_data = dataset['train'], dataset['train'].
But the train accuracy (loss) are not the same with validation accuracy (loss). Why is it happen? Is this the bug of tensorflow2.0?
Epoch 1/50
22/22 - 51s - loss: 3.3766 - accuracy: 0.2581 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/50
22/22 - 30s - loss: 1.8221 - accuracy: 0.4590 - val_loss: 123071.9851 - val_accuracy: 0.0476
Epoch 3/50
22/22 - 30s - loss: 1.4701 - accuracy: 0.5405 - val_loss: 12767.8928 - val_accuracy: 0.0519
Epoch 4/50
22/22 - 30s - loss: 1.2113 - accuracy: 0.6071 - val_loss: 3.9311 - val_accuracy: 0.1186
Epoch 5/50
22/22 - 31s - loss: 1.0846 - accuracy: 0.6567 - val_loss: 23.7775 - val_accuracy: 0.1386
Epoch 6/50
22/22 - 31s - loss: 0.9358 - accuracy: 0.7043 - val_loss: 15.3453 - val_accuracy: 0.1543
Epoch 7/50
22/22 - 32s - loss: 0.8566 - accuracy: 0.7243 - val_loss: 8.0415 - val_accuracy: 0.2548
In short, the culprit here is BatchNorm.
Since you have a small dataset and large batch size, you only do 22 updates per epoch. The BatchNorm layer has a default momentum of 0.99, so it takes some time to move the BatchNorm running means/variances to values more appropriate for your dataset (which, given you do not normalise the pixel values away from the [0, 255] range, is pretty far from the typical mean=0, variance=1 sort of range that neural networks are generally designed/initialised to expect).
The reason for the big discrepancy in train vs. validation loss/accuracy is because the training behaviour of batch norm versus the testing behaviour is very different, especially with so few batches. The mean of the data running through the network during training is very far from the running mean accumulated so far, which only updates slowly due to the default BatchNorm momentum/decay of 0.99.
If you reduce your batch size from 96 to, say, 4, you substantially increase the frequency of updates to the BatchNorm running means/variances. Doing this, plus uncommenting the #img = img / 255. line in your data parsing function, alleviates the train/validation discrepancy to a large extent. Doing so gives me this output for three epochs:
Epoch 1/7
525/525 - 51s - loss: 3.2650 - accuracy: 0.1633 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/7
525/525 - 38s - loss: 2.6455 - accuracy: 0.2152 - val_loss: 12.1067 - val_accuracy: 0.2114
Epoch 3/7
525/525 - 38s - loss: 2.5033 - accuracy: 0.2414 - val_loss: 16.9369 - val_accuracy: 0.2095
You can also keep your code the same, and instead modify the keras_applications implementation of Resnet50 to use BatchNormalization(..., momentum=0.9) everywhere. This gives me the following output after two epochs, which I think more or less shows that indeed this is the main cause of your issue:
Epoch 1/2
22/22 [==============================] - 33s 1s/step - loss: 3.1512 - accuracy: 0.2357 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/2
22/22 [==============================] - 16s 748ms/step - loss: 1.7975 - accuracy: 0.4505 - val_loss: 4.1324 - val_accuracy: 0.2810

Keras Batchnormalization, differing results in trainin and evaluation on training dataset

I'm am training a CNN, for the sake of debugging a my problem I am working on a small subset of the actual training data.
During training the loss and accuracy seem very reasonable and pretty good. (In the example I used the same small subset for validation, the problem shows here already)
Fit on x_train and validate on x_train, using batch_size=32
Epoch 10/10
1/10 [==>...........................] - ETA: 2s - loss: 0.5126 - acc: 0.7778
2/10 [=====>........................] - ETA: 1s - loss: 0.3873 - acc: 0.8576
3/10 [========>.....................] - ETA: 1s - loss: 0.3447 - acc: 0.8634
4/10 [===========>..................] - ETA: 1s - loss: 0.3320 - acc: 0.8741
5/10 [==============>...............] - ETA: 0s - loss: 0.3291 - acc: 0.8868
6/10 [=================>............] - ETA: 0s - loss: 0.3485 - acc: 0.8848
7/10 [====================>.........] - ETA: 0s - loss: 0.3358 - acc: 0.8879
8/10 [=======================>......] - ETA: 0s - loss: 0.3315 - acc: 0.8863
9/10 [==========================>...] - ETA: 0s - loss: 0.3215 - acc: 0.8885
10/10 [==============================] - 3s - loss: 0.3106 - acc: 0.8863 - val_loss: 1.5021 - val_acc: 0.2707
When I evaluate on the same training dataset however the accuracy is really off from what I saw during training ( I would expect it to be at least as good as during training on the same dataset).
When evaluating straight forward or using
K.set_learning_phase(0)
I get, similar to the validation (Evaluating on x_train using batch_size=32):
Evaluation Accuracy: 0.266318537392, Loss: 1.50756853772
If I set the backend to learning phase the results get pretty good again, so the per batch normalization seems to work well. I suspect that the cumulated mean and variance are not properly being used.
So after
K.set_learning_phase(1)
I get (Evaluating on x_train using batch_size=32):
Evaluation Accuracy: 0.887728457507, Loss: 0.335956037511
I added the the batchnormalization layer after the first convolutional layer like this:
model = models.Sequential()
model.add(Conv2D(80, first_conv_size, strides=2, activation="relu", input_shape=input_shape, padding=padding_name))
model.add(BatchNormalization(axis=-1))
model.add(MaxPooling2D(first_max_pool_size, strides=4, padding=padding_name))
...
Further down the line I would also have some dropout layers, which I removed to investigate the Batchnormalization behavior. My intend would be to use the model in non-training phase for normal prediction.
Shouldn't it work like that, or am I missing some additional configuration?
Thanks!
I'm using keras 2.0.8 with tensorflow 1.1.0 (anaconda)
This is really annoying. When you set the learning_phase to be True - a BatchNormalization layer is getting normalization statistics straight from data what might be a problem when you have a small batch_size. I came across similar issue some time ago - and here you have my solution:
When building a model - add an option if the model would predict in either learning or not-learning phase and in this used in learning phase use the following class instead of BatchNormalization:
class NonTrainableBatchNormalization(BatchNormalization):
"""
This class makes possible to freeze batch normalization while Keras
is in training phase.
"""
def call(self, inputs, training=None):
return super(
NonTrainableBatchNormalization, self).call(inputs, training=False)
Once you train your model - reset its weights to a NonTrainable copy:
learning_phase_model.set_weights(learned_model.get_weights())
Now you can fully enjoy using BatchNormalization in a learning_phase.