I am using Python with Keras and TensorFlow (GPU).
I train a ConvNet for an image classification task. When I train the network, I get the following results for the loss function on the training data:
before first epoch: 1.099
after first epoch: 1.094
after second epoch: 0.899
after third epoch: 0.713
after fourth epoch: 0.620722375
after fifth epoch: 0.532505135
Why does the decrease of the loss function only start at the second epoch? Why is there almost no decrease after the first epoch?
Thanks in advance.
Keras calculates the loss on the training data while it is training. So, during the first epoch, the earliest samples perform very poorly (because the model is not trained yet); as training progresses the model actually becomes better, but the poor loss on those early samples keeps the reported average for the epoch high.
On a side note, you can check the validation loss, which is calculated after the epoch; that will be a much better indicator of the true loss.
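If you want to see this effect directly, you can re-evaluate the training set at the end of every epoch and compare that with the running loss that fit() reports. A minimal sketch, assuming a compiled model and x_train / y_train arrays already exist (the callback name is made up for illustration, and the model is assumed to be compiled without extra metrics, so evaluate() returns a single scalar):

from tensorflow import keras

class EndOfEpochLoss(keras.callbacks.Callback):
    # Compares the running-average loss reported by fit() with the loss obtained
    # by re-evaluating the whole training set after the epoch's weight updates.
    def on_epoch_end(self, epoch, logs=None):
        post_loss = self.model.evaluate(x_train, y_train, verbose=0)
        print(f"epoch {epoch}: running loss {logs['loss']:.3f}, post-epoch loss {post_loss:.3f}")

# model.fit(x_train, y_train, epochs=5, callbacks=[EndOfEpochLoss()])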
The loss is decreasing, but without looking at the variables it is hard to say why it barely decreased in the first epoch and decreased more later. The model probably took a while to find a good direction for minimizing the function, and in the second epoch the optimizer could reduce the loss more effectively.
That is one confusing bit that tends to get ignored because it usually does not have a notable impact. A typical training loop may look something like this:
import tensorflow as tf

# Build graph
# ...
loss = ...
train_op = ...

with tf.Session() as sess:
    while keep_training:
        _, current_loss = sess.run([train_op, loss], feed_dict={...})
        # ...
The thing is, when you call sess.run there, the loss value that you get is computed before updating the weights. loss is the value that is used to optimize the model, so it is computed first and then back-propagated to compute the updates that train_op applies to the weights; it cannot possibly use the new weights, since it is needed to compute those in the first place. You could add another loss operation to the graph that is evaluated after train_op, but that would require evaluating each batch twice, and in any case you will see the new loss value in the next iteration. As I said, most of the time this is not important, but if, for example, you want to find out at exactly what point some weights became NaN, it can be misleading.
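If you do need the post-update value, the option mentioned above amounts to evaluating the loss a second time after the training op has run, at the cost of a second forward pass per batch. A rough sketch, reusing the same loss / train_op graph and placeholder feeds as above:

with tf.Session() as sess:
    while keep_training:
        # this run updates the weights; the loss it returns was computed with the OLD weights
        _, loss_before = sess.run([train_op, loss], feed_dict={...})
        # a second run (without train_op) recomputes the loss with the NEW weights
        loss_after = sess.run(loss, feed_dict={...})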
import numpy as np
from tensorflow import keras

def build_model():
    model = keras.models.Sequential()
    model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
    model.add(keras.layers.Dropout(rate=0.2))
    model.add(keras.layers.Dense(500, activation="relu"))
    model.add(keras.layers.Dropout(rate=0.2))
    model.add(keras.layers.Dense(300, activation="relu"))
    model.add(keras.layers.Dropout(rate=0.2))
    model.add(keras.layers.Dense(10, activation="softmax"))
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer=keras.optimizers.SGD(),
                  metrics=['accuracy'])
    return model

keras_clf = keras.wrappers.scikit_learn.KerasClassifier(build_model)

def exponential_decay_fn(epoch):
    # learning rate decays by a factor of 10 every 20 epochs
    return 0.05 * 0.1 ** (epoch / 20)

lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)

history = keras_clf.fit(np.array(X_train_new), np.array(y_train_new), epochs=100,
                        validation_data=(np.array(X_validation), np.array(y_validation)),
                        callbacks=[keras.callbacks.EarlyStopping(patience=10), lr_scheduler])
I use dropout, early stopping, and a learning rate scheduler. The results seem to be overfitting, so I tried reducing the number of neurons in the hidden layers to (300, 100), but then the results were underfitting: the accuracy on the training set was only around 0.5.
Are there any suggestions?
When dealing with this issue, I first start out with a simple model, like just a few Dense layers without a lot of nodes. I run the model and look at the resulting training accuracy. The first step in modelling is to get a high training accuracy: you can add more layers and/or more nodes per layer until you reach a satisfactory level of accuracy. Once that is achieved, start to evaluate the validation loss. If, after a certain number of epochs, the training loss continues to decrease but the validation loss starts to TREND upward, then you are in an overfitting condition. The word TREND is important. I can't tell from your graphs whether you are really overfitting, but it looks to me like the validation loss has reached its minimum and is probably oscillating around it. This is normal and is NOT overfitting. If you have an adjustable-learning-rate callback that monitors validation loss, or alternately a learning rate scheduler, lowering the learning rate may get you to a lower minimum loss, but at some point (provided you run for enough epochs) continually reducing the learning rate won't get you any lower. The model has just done the best it can.
Now, if you are REALLY overfitting, you can take remedial actions. One is to add more dropout, at the potential cost of reduced training accuracy. Another is to add L1 and/or L2 regularization; documentation for that is here. If your training accuracy is high but your validation accuracy is poor, it usually implies you need more training samples, because the samples you have are not fully representative of the data's probability distribution; more training data is always better. I notice you have 10 classes. Look at the balance of your dataset: if the classes have significantly different numbers of samples, this can cause problems. There are a bunch of methods to handle that, like over-sampling under-represented classes, under-sampling over-represented classes, or a combination of both. An easy method is to use the class_weight parameter in model.fit. Also look at your validation set and make sure it is not using too many samples from under-represented classes; it is always best to select the validation set randomly from the overall data set.
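As a concrete illustration of those two remedies, here is a minimal sketch of adding an L2 kernel regularizer and passing class_weight to model.fit; the layer sizes, the regularization factor, and the class weights are placeholders, not recommendations:

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[32, 32, 3]),
    keras.layers.Dense(300, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(1e-4)),  # L2 penalty on the kernel weights
    keras.layers.Dropout(0.3),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])

# Illustrative class weights: give under-represented classes a larger weight in the loss.
class_weight = {i: 1.0 for i in range(10)}
class_weight[3] = 2.5   # pretend class 3 is under-represented
# model.fit(X_train, y_train, class_weight=class_weight, validation_data=(X_val, y_val), epochs=30)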
I have built a custom Keras model which consists of various layers. Since I wanted to add L2 regularization to such layers, I've passed an instance of keras.regularizers.l2 as the argument for the kernel_regularizer parameter of those layers (as an example, see the constructor of keras.layers.Conv2D). Now, if I were to train this model using, say, Keras's implementation of the binary cross-entropy loss (keras.losses.BinaryCrossentropy), I would be sure that the L2 regularization I've specified would be taken into consideration when computing the loss.
In my case, however, I have a custom loss function that requires several other parameters aside from y_true and y_pred, meaning that there's no way I can pass this function as the argument for the loss parameter of model.compile(...) (in fact, I don't even call model.compile(...)). As a result, I also had to write a custom training loop. In other words, instead of simply running model.fit(...), I had to:
Perform forward propagation by calling model(x)
Compute the loss
Compute the gradients of the loss with respect to the model's weights (that is, model.trainable_variables) with tf.GradientTape
Apply the gradients
Repeat
My question is: in which phase is regularization accounted for?
During forward propagation?
During the computation/application of the gradients?
Keep in mind that my custom loss function does NOT account for regularization, so if it's not accounted for in either of the two phases I've mentioned above, then I'm actually training a model with no regularization whatsoever (even though I've provided a value for the kernel_regularizer argument in each layer my network is made of). In that case, would I be forced to compute the regularization term by hand and add it to the loss?
Regularization losses are computed on the forward pass of the model, and their gradients are applied on the backward pass. I don't think that your training step is applying any weight regularization, and consequently your model isn't regularized. One way to check this would be to actually look at the weights of a trained model - if they're sparse, it means you've regularized the weights in some way. L1 regularization will actually push some weights to 0. L2 regularization does a similar thing, but often results in less sparse weights.
This post outlines writing a training loop from scratch in Keras and has a section on model regularization. The author adds the losses from the regularized layers in their training step with the following command:
loss += sum(model.losses)
I think this may be what you need. If you are still unsure, I would train a model with the line above in the training loop, and another model without that line. Inspecting the weights of the trained models will give you some input on whether or not the weight regularization is working as expected.
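Put together, a training step along those lines might look like the sketch below; model, custom_loss_fn, and extra_args are stand-ins for your own objects, and the sum(model.losses) line is the only part that adds the regularization:

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(x, y, extra_args):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)               # forward pass; populates model.losses
        loss = custom_loss_fn(y, y_pred, *extra_args)  # custom loss, no regularization yet
        loss += sum(model.losses)                      # add the kernel_regularizer penalties
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss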
I'm currently working on a multi-class classification problem which is highly imbalanced. I want to save my model weights for the best epoch, but I'm confused about which metric I should choose.
Here's my training progress bar:
I am using ModelCheckpoint callback in tf.keras and monitoring val_loss as a metric to save best model weights.
As you can see in the image,
At the 8th epoch I got val_acc = 0.9845 but val_loss = 0.629, and precision and recall are also high there.
But at the 3rd epoch I got val_acc = 0.9840 but val_loss = 0.590.
I understand the difference is not huge, but in such cases, which metric should I trust on an imbalanced dataset?
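For reference, a checkpoint callback of the kind described above might be configured like the sketch below; the file name is a placeholder, and monitoring something like val_auc (with an AUC metric compiled and mode="max") is one alternative worth considering on imbalanced data:

from tensorflow import keras

checkpoint = keras.callbacks.ModelCheckpoint(
    "best_weights.h5",       # placeholder path
    monitor="val_loss",      # or e.g. "val_auc" with mode="max" if an AUC metric is compiled
    mode="min",
    save_best_only=True,
    save_weights_only=True,
)
# model.fit(x_train, y_train, validation_data=(x_val, y_val), callbacks=[checkpoint])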
The most important factors are the validation and training error.
If the validation loss (error) starts to increase, that means overfitting. Set the number of epochs as high as you can, avoid overfitting, and terminate training based on the error rates: as long as the validation loss keeps dropping, training should continue, until the model starts to converge at some nth epoch. Ideally it should converge to a low val_loss.
Just bear in mind an epoch is one learning cycle where the learner can see the whole training data set. If you have two batches, the learner needs to go through two iterations for one epoch.
This link can be helpful.
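To make the epoch/iteration relationship concrete, with made-up numbers:

import math

num_samples, batch_size = 100, 50
steps_per_epoch = math.ceil(num_samples / batch_size)  # 2 iterations (batches) make up one epoch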
You can divide the data into 3 sets: training, validation, and evaluation. Train each network for enough epochs that the training Mean Squared Error settles into a minimum.
The training process uses the training data set and should be executed epoch by epoch; then calculate the network's Mean Squared Error on the validation set at each epoch. The network from the epoch with the minimum validation MSE is selected for the evaluation process.
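A rough sketch of that procedure, assuming X, y and a build_model() that compiles with an MSE loss and no extra metrics (so evaluate() returns a single scalar):

import numpy as np
from sklearn.model_selection import train_test_split

# 70% training, 15% validation, 15% evaluation
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_eval, y_val, y_eval = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = build_model()
best_mse, best_weights = np.inf, None
for epoch in range(100):
    model.fit(X_train, y_train, epochs=1, verbose=0)            # train epoch by epoch
    val_mse = model.evaluate(X_val, y_val, verbose=0)           # validation MSE for this epoch
    if val_mse < best_mse:
        best_mse, best_weights = val_mse, model.get_weights()   # remember the best epoch
model.set_weights(best_weights)                                 # network from the minimum-val-MSE epoch
eval_mse = model.evaluate(X_eval, y_eval, verbose=0)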
This can happen for several reasons. Assuming you have a proper separation of training, test and validation sets and have preprocessed the data (e.g. min-max scaling, handling missing values), you can do the following.
First, run the model for several epochs and plot the validation loss.
If the loss first decreases and then, after a certain point, starts increasing (the graph is U-shaped), you can use early stopping.
In the other scenario, when the loss is steadily increasing, early stopping won't help. In this case, add dropout layers of 0.2-0.3 between the major layers. This introduces randomness and stops the model from memorising the training data.
Once you add dropout, your model may suddenly start to behave strangely. Tweak the activation functions and the number of nodes in the Dense layers and it will eventually come right.
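A sketch of those two remedies combined (early stopping on the validation loss plus dropout between the major layers); all layer sizes and rates are illustrative:

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    keras.layers.Dropout(0.3),                  # randomly drops units, reduces memorisation
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
# model.fit(x, y, validation_split=0.2, epochs=100, callbacks=[early_stop])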
I am training a model using the author's original learning rate (I use their GitHub code too), and I get a validation loss that oscillates a lot: it decreases, then suddenly jumps to a large value and decreases again, but never really converges, as the lowest it gets is 2 (while the training loss converges to around 0.0-something, well below 1).
At each epoch I get the training accuracy and at the end, the validation accuracy. Validation accuracy is always greater than the training accuracy.
When I test on real test data I get good results, but I wonder if my model is overfitting. I expect a good model's validation loss to converge in a similar fashion to the training loss, but this doesn't happen, and the fact that the validation loss oscillates to very large values at times worries me.
By adjusting the learning rate, the scheduler, and so on, I got the validation loss and training loss trending downward with less oscillation, but this time my test accuracy remains low (as do the training and validation accuracies).
I did try a couple of optimizers (Adam, SGD, Adagrad) with a step scheduler and also PyTorch's plateau scheduler, and I played with step sizes etc., but it didn't really help, and neither did clipping gradients.
Is my model overfitting?
If so, how can I reduce the overfitting besides data augmentation?
If not (I read some people on Quora say it is nothing to worry about, though I would think it must be overfitting), how can I justify it? Even if I got similar results in a k-fold experiment, would that be good enough? I don't feel it would justify the oscillating. How should I proceed?
The training loss at each epoch is usually computed on the entire training set.
The validation loss at each epoch is usually computed on one minibatch of the validation set, so it is normal for it to be noisier.
Solution: you can report the Exponential Moving Average of the validation loss across epochs to smooth out the fluctuations.
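A small helper for that smoothing (the smoothing factor is arbitrary):

def exponential_moving_average(values, alpha=0.3):
    # Smooths a noisy per-epoch metric such as the validation loss.
    smoothed, prev = [], values[0]
    for v in values:
        prev = alpha * v + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed

# e.g. smoothed_val_loss = exponential_moving_average(val_losses)  # val_losses collected per epoch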
It is not overfitting since your validation accuracy is not less than the training accuracy. In fact, it sounds like your model is underfitting since your validation accuracy > training accuracy.
I am trying to use ConvLSTM layers in Keras 2 to train an action recognition model. The model has 3 ConvLSTM layers and 2 Fully Connected ones.
At each and every epoch, the accuracy for the first batch (usually the first few batches) is zero, and then it increases to some amount more than the previous epoch. For example, the first epoch finishes at 0.3 and the next finishes at 0.4, and so on.
My question is why does it get back to zero at each epoch?
p.s.
The ConvLSTM is stateless.
The model is compiled with SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True), for some reason it does not converge using Adam.
So, in order to understand why something like this is happening, you need to understand how Keras computes accuracy during batch computation:
Before each batch, the running count of correctly classified examples so far is stored.
After each batch, that count is updated, and the accuracy is printed after dividing by the total number of examples used in training so far.
As your accuracy is pretty low, it's highly probable that in the first few batches none of the examples will be classified properly, especially when you have a small batch size. This makes the accuracy 0 at the beginning of your training.
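A toy illustration of that running computation, with made-up per-batch counts:

# Hypothetical counts of correctly classified examples in each batch of 32.
correct_per_batch = [0, 0, 4, 12, 20]
batch_size = 32

total_correct = 0
for i, correct in enumerate(correct_per_batch, start=1):
    total_correct += correct
    running_accuracy = total_correct / (i * batch_size)
    print(f"after batch {i}: running accuracy = {running_accuracy:.3f}")
# The first batches contribute no correct predictions, so the accuracy printed at the
# start of every epoch is 0 and only climbs as training progresses.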