How to avoid overfitting with keras? [closed] - tensorflow

import numpy as np
from tensorflow import keras

def build_model():
    model = keras.models.Sequential()
    model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
    model.add(keras.layers.Dropout(rate=0.2))
    model.add(keras.layers.Dense(500, activation="relu"))
    model.add(keras.layers.Dropout(rate=0.2))
    model.add(keras.layers.Dense(300, activation="relu"))
    model.add(keras.layers.Dropout(rate=0.2))
    model.add(keras.layers.Dense(10, activation="softmax"))
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=keras.optimizers.SGD(),
                  metrics=['accuracy'])
    return model

keras_clf = keras.wrappers.scikit_learn.KerasClassifier(build_model)

def exponential_decay_fn(epoch):
    return 0.05 * 0.1**(epoch / 20)

lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)

history = keras_clf.fit(np.array(X_train_new), np.array(y_train_new), epochs=100,
                        validation_data=(np.array(X_validation), np.array(y_validation)),
                        callbacks=[keras.callbacks.EarlyStopping(patience=10), lr_scheduler])
I use dropout, early stopping, and a learning-rate scheduler. The results still look like overfitting, so I tried reducing the number of neurons in the hidden layers to (300, 100). That led to underfitting instead; training-set accuracy was only around 0.5.
Are there any suggestions?

When dealing with this issue, I first start out with a simple model, just a few Dense layers with not many nodes. I run the model and look at the resulting training accuracy. The first step in modelling is to get a high training accuracy; you can add more layers and/or more nodes per layer until you reach a satisfactory level.
Once that is achieved, start to evaluate the validation loss. If after a certain number of epochs the training loss continues to decrease but the validation loss starts to TREND upward, then you are in an overfitting condition. The word TREND is important here. I can't tell from your graphs whether you are really overfitting, but it looks to me like the validation loss has reached its minimum and is probably oscillating around it. That is normal and is NOT overfitting.
If you have an adjustable learning-rate callback that monitors validation loss, or alternatively a learning-rate scheduler, lowering the learning rate may get you to a lower minimum loss. At some point, though (provided you run for enough epochs), continually reducing the learning rate no longer lowers the minimum loss; the model has simply done the best it can.
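As an illustration of the "adjustable lr callback" mentioned above, here is a minimal sketch using Keras's built-in ReduceLROnPlateau, which lowers the learning rate whenever the monitored validation loss stops improving; the factor, patience, and minimum learning rate are just example values, and the classifier and data names are reused from the question:

from tensorflow import keras

# Halve the learning rate whenever val_loss has not improved for 5 epochs.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                              patience=5, min_lr=1e-6)

history = keras_clf.fit(np.array(X_train_new), np.array(y_train_new), epochs=100,
                        validation_data=(np.array(X_validation), np.array(y_validation)),
                        callbacks=[keras.callbacks.EarlyStopping(patience=10), reduce_lr])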
Now, if you are REALLY overfitting, you can take remedial action. One option is to add more dropout, at the cost of potentially reduced training accuracy. Another is to add L1 and/or L2 regularization; documentation for that is here. If your training accuracy is high but your validation accuracy is poor, it usually implies you need more training samples, because the samples you have are not fully representative of the data's probability distribution. More training data is always better.
I notice you have 10 classes. Look at the balance of your dataset: if the classes have significantly different numbers of samples, this can cause problems. There are several methods to handle that, such as over-sampling under-represented classes, under-sampling over-represented classes, or a combination of both. An easy method is to use the class_weight parameter in model.fit (see the sketch below). Also check that your validation set is not drawing too many samples from under-represented classes; it is always best to select the validation set randomly from the overall dataset.
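For reference, a minimal sketch of both remedies: adding L2 kernel regularization to a Dense layer and passing class weights to fit. The 0.01 strength and the weight values are made-up placeholders rather than tuned recommendations, and model/X_train/y_train stand for the question's model and data:

from tensorflow import keras

# L2 weight penalty on a hidden layer; 0.01 is an arbitrary example strength.
dense = keras.layers.Dense(500, activation="relu",
                           kernel_regularizer=keras.regularizers.l2(0.01))

# Up-weight under-represented classes during training; this mapping is purely illustrative.
class_weight = {i: 1.0 for i in range(10)}
class_weight[3] = 3.0   # e.g. class 3 has few samples, so weight its errors more heavily
model.fit(X_train, y_train, epochs=100, class_weight=class_weight)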

Related

Keras - Regularization & custom loss [closed]

I have built a custom Keras model which consists of various layers. Since I wanted to add L2 regularization to such layers, I've passed an instance of keras.regularizers.l2 as the argument for the kernel_regularizer parameter of those layers (as an example, see the constructor of keras.layers.Conv2D). Now, if I were to train this model using, say, Keras's implementation of the binary cross-entropy loss (keras.losses.BinaryCrossentropy), I would be sure that the L2 regularization that I've specified would be taken into consideration when computing the loss.
In my case, however, I have a custom loss function that requires several other parameters aside from y_true and y_pred, meaning that there's no way I can pass this function as the argument for the loss parameter of model.compile(...) (in fact, I don't even call model.compile(...)). As a result, I also had to write a custom training loop. In other words, instead of simply running model.fit(...), I had to:
Perform forward propagation by calling model(x)
Compute the loss
Compute the gradients of the loss with respect to the model's weights (that is, model.trainable_variables) with tf.GradientTape
Apply the gradients
Repeat
My question is: in which phase is regularization accounted for?
During forward propagation?
During the computation/application of the gradients?
Keep in mind that my custom loss function does NOT account for regularization, so if it's not accounted for in either of the two phases I've mentioned above, then I'm actually training a model with no regularization whatsoever (even though I've provided a value for the kernel_regularizer argument in each layer my network is made of). In that case, would I be forced to compute the regularization term by hand and add it to the loss?
Regularization losses are computed on the forward pass of the model, and their gradients are applied on the backward pass. I don't think your training step is applying any weight regularization, and consequently your model isn't regularized. One way to check this is to look at the weights of a trained model: L1 regularization will push some weights exactly to 0, giving sparse weights, while L2 regularization shrinks weights toward zero, so they end up small but generally not sparse.
This post outlines writing a training loop from scratch in Keras and has a section on model regularization. The author adds the loss from regularization layers in his training step with the following command:
loss += sum(model.losses)
I think this may be what you need. If you are still unsure, I would train one model with the line above in the training loop and another model without it. Inspecting the weights of the two trained models will give you some insight into whether the weight regularization is working as expected.
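Putting that together, here is a minimal sketch of a custom training step that adds the layers' regularization losses to the custom loss before taking gradients; custom_loss, extra_args, and the Adam optimizer are placeholders for whatever you actually use:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()

def train_step(model, x, y, extra_args):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)            # forward pass; regularization losses are collected here
        loss = custom_loss(y, y_pred, extra_args)   # your custom loss with its extra parameters
        loss += sum(model.losses)                   # add the kernel_regularizer penalties
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss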

What to do when accuracy increasing but loss is also increasing on validation data?

I'm currently working on a multi-class classification problem which is highly imbalanced. I want to save my model weights for the best epoch, but I'm confused about which metric I should choose.
Here's my training progress bar:
I am using the ModelCheckpoint callback in tf.keras and monitoring val_loss as the metric to save the best model weights.
As you can see in the image,
at the 8th epoch I got val_acc = 0.9845 with val_loss = 0.629, and precision and recall are also high there,
but at the 3rd epoch I got val_acc = 0.9840 with val_loss = 0.590.
I understand the difference is not huge, but in such cases, which metric should I trust on an imbalanced dataset?
The most important factors are the validation and training error.
If the validation loss (error) starts to increase, that means overfitting. You should set the number of epochs high enough, but terminate training based on the error rates to avoid overfitting: as long as the validation loss keeps dropping, training should continue, until the model starts to converge after the nth epoch. Ideally it should converge quite well to a low val_loss.
Just bear in mind that an epoch is one learning cycle in which the learner sees the whole training data set. If you have two batches, the learner needs to go through two iterations for one epoch.
This link can be helpful.
You can divide the data into 3 sets: training, validation, and evaluation. Train each network for enough epochs that the training Mean Squared Error settles into a minimum.
The training process uses the training data set and should be executed epoch by epoch; after each epoch, calculate the network's Mean Squared Error on the validation set. The network from the epoch with the minimum validation MSE is then selected for the evaluation process (a sketch of doing this with ModelCheckpoint is below).
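A minimal sketch of selecting the minimum-validation-loss epoch with Keras's ModelCheckpoint callback; the file path, data names, and the choice of val_loss as the monitored quantity are illustrative:

from tensorflow import keras

# Keep only the weights from the epoch with the lowest validation loss.
checkpoint = keras.callbacks.ModelCheckpoint('best_weights.h5',
                                             monitor='val_loss',
                                             save_best_only=True,
                                             save_weights_only=True)

model.fit(X_train, y_train, epochs=50,
          validation_data=(X_val, y_val),
          callbacks=[checkpoint])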
This can happen for several reasons. Assuming you have properly separated the train, test, and validation sets and preprocessed the data (e.g. min-max scaling, handling missing values), you can do the following.
First, run the model for several epochs and plot the validation loss.
If the loss first decreases and then starts increasing after a certain point, i.e. the graph is U-shaped, you can use early stopping.
In the other scenario, where the loss is steadily increasing, early stopping won't help. In this case, add dropout layers with a rate of 0.2-0.3 between the major layers. This introduces randomness in the layers and stops the model from memorising the training data.
Once you add dropout, your model may initially behave strangely. Tweak the activation functions and the number of nodes in the Dense layers and it will eventually come right (see the sketch below for where the dropout layers go).
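A minimal sketch of inserting dropout between the major Dense layers, combined with early stopping for the U-shaped case; the layer sizes, the 0.3 rate, and the n_features/n_classes placeholders are illustrative:

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(n_features,)),
    keras.layers.Dropout(0.3),                  # randomly drop 30% of activations during training
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(n_classes, activation='softmax')])

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=5,
                                           restore_best_weights=True)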

Text classification issue

I'm a newbie in ML, trying to classify text into two categories. My dataset is built with Tokenizer from medical texts; it's unbalanced, with 572 records for training and 471 for testing.
It's really hard for me to get a model with diverse prediction outputs; almost all predicted values are the same. I've tried using models from examples like this one and tweaking the parameters myself, but the output never makes sense.
Here are the tokenized and prepared data.
Here is the script: Gist
Sample model that I used:
from tensorflow import keras
from tensorflow.keras import layers

sequential_model = keras.Sequential([
    layers.Dense(15, activation='tanh', input_dim=vocab_size),
    layers.BatchNormalization(),
    layers.Dense(8, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(1, activation='sigmoid')
])
sequential_model.summary()
sequential_model.compile(optimizer='adam',
                         loss='binary_crossentropy',
                         metrics=['acc'])
train_history = sequential_model.fit(train_data,
                                     train_labels,
                                     epochs=15,
                                     batch_size=16,
                                     validation_data=(test_data, test_labels),
                                     class_weight={1: 1, 0: 0.2},
                                     verbose=1)
Unfortunately I can't share datasets.
I've also tried using keras.utils.to_categorical with the class labels, but it didn't help.
Your loss curves make sense: we see the network overfit to the training set, while the validation curve shows the usual bowl shape.
To make your network perform better, you can always deepen it (more layers), widen it (more units per hidden layer), and/or add more nonlinear activation functions so that your layers can map to a wider range of values.
Also, I believe the reason you originally got so many repeated values is the size of your network. Apparently each data point has roughly 20,000 features (a pretty large feature space); your network is too small, so the space of output values it can map to is correspondingly small. I did some testing with larger hidden layers (and bumped up the number of layers) and was able to see that the prediction values did vary: [0.519], [0.41], [0.37]... A sketch of such a wider network is below.
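As a rough illustration of "wider and deeper", a sketch with arbitrary layer sizes (not the exact ones tested above), reusing vocab_size from the question's code:

from tensorflow import keras
from tensorflow.keras import layers

wider_model = keras.Sequential([
    layers.Dense(256, activation='relu', input_dim=vocab_size),  # much wider first layer
    layers.BatchNormalization(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')])
wider_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])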
It is also understandable that your network's performance varies so much, because the number of features is about 50 times the size of your training set (usually you would want a much smaller ratio). Keep in mind that training for too many epochs (say, more than 10) on such a small training and test set just to see improvements in loss is not good practice: you can seriously overfit, and it is probably a sign that your network needs to be wider/deeper.
All of these factors (number of layers, hidden-unit size, and even number of epochs) can be treated as hyperparameters. In other words, hold out some percentage of your training data as a validation split, go through each category of factors one by one, and optimize for the highest validation accuracy. To be fair, your training set is not large, but I believe you should still hold out some 10-20% of it as a validation set to tune these hyperparameters, given that you have such a large number of features per data point (a sketch of such a hold-out split is below). At the end of this process you can determine your true test accuracy. This is how I would optimize this network. Hope this helps.
More about training, test, val split
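A minimal sketch of holding out 20% of the training data as a validation set, either explicitly with scikit-learn or via Keras's validation_split argument; the 0.2 fraction is just an example value and the variable names come from the question's code:

from sklearn.model_selection import train_test_split

# Option 1: explicit split (stratify keeps the class balance similar in both parts).
X_tr, X_val, y_tr, y_val = train_test_split(train_data, train_labels,
                                            test_size=0.2, stratify=train_labels)

# Option 2: let Keras hold out the last 20% of the training data automatically.
sequential_model.fit(train_data, train_labels, epochs=15, batch_size=16,
                     validation_split=0.2)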

diagnosis on training process of neural network

I am training an autoencoder DNN for a regression problem and need suggestions on how to improve the training process.
The total number of training samples is about 100,000. I use Keras to fit the model, setting validation_split = 0.1. After training, I plotted the loss curves and got the following picture. As can be seen, the validation loss is unstable and its mean is very close to the training loss.
My question is: based on this, what is the next step I should try to improve the training process?
[Edit on 1/26/2019]
The details of network architecture are as follows:
It has 1 latent layer of 50 nodes. The input and output layers each have 1000 nodes. The activation of the hidden layer is ReLU and the loss function is MSE. For the optimizer I use Adadelta with default parameter settings; I also tried setting lr=0.5 but got very similar results. The features of the data have been scaled to between -10 and 10, with a mean of 0.
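For concreteness, a minimal sketch of the architecture as described (1000-50-1000, ReLU hidden layer, MSE, Adadelta); the linear output activation and the data name X are assumptions, since they are not stated above:

from tensorflow import keras

autoencoder = keras.Sequential([
    keras.layers.Dense(50, activation='relu', input_shape=(1000,)),   # latent layer
    keras.layers.Dense(1000, activation='linear')                     # reconstruction (assumed linear)
])
autoencoder.compile(optimizer=keras.optimizers.Adadelta(), loss='mse')
history = autoencoder.fit(X, X, epochs=100, validation_split=0.1)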
From the graph provided, the network is not able to approximate the function that relates the input to the output.
If your features are too diverse in scale, i.e. one of them is large while others have very small values, you should normalize the feature vector. You can read more here.
For better training and testing results, you can follow these tips (a small sketch putting them together follows the list):
Use a small network. A network with one hidden layer is enough.
Apply activations in the input as well as the hidden layers, and use the ReLU activation function. The output layer must have a linear activation.
Prefer a small learning rate like 0.001 and use the RMSProp optimizer; it works fine on most regression problems.
If you are not using the mean squared error loss function, use it.
Try slow and steady learning rather than fast learning.
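A minimal sketch combining these tips: a single ReLU hidden layer, a linear output, RMSProp with a 0.001 learning rate, and MSE loss. The hidden width of 64 and the n_features/n_outputs placeholders are arbitrary:

from tensorflow import keras

regressor = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(n_features,)),  # one hidden layer is enough
    keras.layers.Dense(n_outputs, activation='linear')                     # linear output for regression
])
regressor.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001), loss='mse')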

loss decrease starts from second epoch [closed]

I am using Python with Keras and Tensorflow (gpu).
I am training a ConvNet for an image classification task. During training, I get the following values for the loss function on the training data:
before first epoch: 1.099
after first epoch: 1.094
after second epoch: 0.899
after third epoch: 0.713
after fourth epoch: 0.620722375
after fifth epoch: 0.532505135
Why does the loss only really start to decrease at the second epoch? Why is there barely any decrease after the first epoch?
Thanks in advance.
Keras calculates the training loss on the fly while it is training: the reported epoch loss is an average over the batches seen during that epoch. So in the first epoch the early batches perform very poorly (because the model is not trained yet), and although the model improves as the epoch progresses, the poor loss on those early batches keeps the reported epoch loss high.
On a side note, you can check the validation loss, which is calculated after the epoch finishes and is a much better indicator of the true loss (see the sketch below for comparing the two).
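To see the effect directly, you can compare the running average Keras reports during an epoch with the loss recomputed on the training data after the epoch finishes; a quick sketch, where model, x_train, y_train, x_val, and y_val stand for whatever you already have:

history = model.fit(x_train, y_train, epochs=1, validation_data=(x_val, y_val))

# Loss reported during training: an average over batches, computed with evolving weights.
print("running epoch loss:", history.history['loss'][0])

# Loss recomputed after the epoch with the final weights: usually noticeably lower.
print("post-epoch loss:", model.evaluate(x_train, y_train, verbose=0))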
The loss is decreasing, but without looking at the variables it's hard to say why it barely decreased in the first epoch and decreased more later. Probably the model took a while to find a direction that minimizes the function, and from the second epoch on the optimizer could reduce the loss more effectively.
That is one confusing bit that tends to get ignored, because it usually does not have a notable impact. A typical (TensorFlow 1.x, graph-mode) training loop may look something like this:
import tensorflow as tf

# Build graph
# ...
loss = ...
train_op = ...

with tf.Session() as sess:
    while keep_training:
        _, current_loss = sess.run([train_op, loss], feed_dict={...})
        # ...
The thing is, when you call sess.run there, the loss value you get is computed before updating the weights. loss is the value used to optimize the model, so it is computed first and then back-propagated to produce the weight updates applied by train_op; it cannot possibly use the new weights, because it is needed to compute those in the first place. You could add another loss operation to the graph that is evaluated after train_op, but that would require evaluating each batch twice, and in any case you will see the new loss value in the next iteration. Most of the time this is not important, but it can be misleading if, for example, you want to find out at exactly what point some weights became NaN.