As per my understanding, the loss should decrease after each epoch, so why does the loss value decrease after each step within an epoch while training a model? Does backpropagation happen only once per epoch, or does it happen at each step within an epoch?
This is probably a silly question, but I couldn't find an answer anywhere. If there is already a question regarding this, please post the link.
I have a question about the code posted on keras.io/EDSR. Why is there a contradiction between val_loss and val_PSNR? In some epochs, when val_loss decreases from one epoch to the next, we would expect val_PSNR to increase, but this does not happen, and vice versa. Is this normal? What is the reason? If it is not normal, what is the solution?
I tried to implement the PSNR and the loss function in such a way that they are compatible with each other, but the problem was not solved.
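For context, PSNR is computed from the mean squared error, so it only tracks the training loss monotonically if that loss is itself the MSE. The plain-NumPy sketch below (assuming 8-bit images, so a hypothetical max_val of 255) illustrates why a loss such as mean absolute error can move in the opposite direction from PSNR between epochs.

import numpy as np

def psnr(y_true, y_pred, max_val=255.0):
    # Peak signal-to-noise ratio: a monotonic function of the MSE only.
    diff = np.asarray(y_true, dtype=np.float64) - np.asarray(y_pred, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 20.0 * np.log10(max_val) - 10.0 * np.log10(mse)

# If the training loss is, say, mean absolute error, a lower val_loss does not
# guarantee a lower MSE, so val_PSNR can still drop while val_loss improves.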
I'm currently working on a multi-class classification problem that is highly imbalanced. I want to save my model weights for the best epoch, but I'm confused about which metric I should choose.
Here's my training progress bar:
I am using the ModelCheckpoint callback in tf.keras and monitoring val_loss as the metric for saving the best model weights.
As you can see in the image:
At the 8th epoch I got val_acc = 0.9845 but val_loss = 0.629, and precision and recall are also high there.
But at the 3rd epoch I got val_acc = 0.9840 with val_loss = 0.590.
I understand the difference is not huge, but in such cases, which metric should I trust on an imbalanced dataset?
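For reference, the checkpointing part of the setup described above looks roughly like this; a minimal sketch with a hypothetical model, shapes, and file name, assuming tf.keras:

import tensorflow as tf

# Hypothetical model and shapes, for illustration only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Save only the weights of the epoch that is best under the monitored metric.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_weights.h5",
    monitor="val_loss",   # swap in whichever validation metric you decide to trust
    mode="min",           # use "max" for metrics that should increase
    save_best_only=True,
    save_weights_only=True)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=20, callbacks=[checkpoint])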
The most important factors are the validation and training error.
If the validation loss (error) starts to increase, that indicates overfitting. Set the maximum number of epochs high, but terminate training based on the error rates to avoid overfitting: as long as the validation loss keeps dropping, training should continue, until the model starts to converge at some epoch n. Ideally it should converge to a low val_loss.
Just bear in mind that an epoch is one learning cycle in which the learner sees the whole training data set. If you have two batches, the learner needs to go through two iterations for one epoch.
This link can be helpful.
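To make the epoch/iteration distinction concrete, here is a minimal TF2-style sketch with toy data: the gradients are computed and the weights are updated once per batch, which is why the reported loss can change at every step within an epoch, not just between epochs.

import tensorflow as tf

# Toy data and model, for illustration only.
x = tf.random.normal((256, 10))
y = tf.random.uniform((256, 1))
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

for epoch in range(2):
    for step, (x_batch, y_batch) in enumerate(dataset):
        with tf.GradientTape() as tape:
            loss = loss_fn(y_batch, model(x_batch, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        # Backpropagation and the weight update happen here, once per batch.
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        print(f"epoch {epoch} step {step} loss {loss.numpy():.4f}")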
You can divide the data into 3 sets: training, validation, and evaluation. Train each network for enough epochs to track when the training Mean Squared Error gets stuck in a minimum.
The training process uses the training data set and is executed epoch by epoch; at each epoch, compute the Mean Squared Error of the network on the validation set. The network from the epoch with the minimum validation MSE is then selected for the evaluation process.
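A minimal Keras sketch of that selection procedure, using toy data and placeholder shapes (saving per-epoch checkpoint files is just one way to keep the weights around):

import numpy as np
import tensorflow as tf

# Toy regression setup for illustration only.
x = np.random.rand(500, 8).astype("float32")
y = np.random.rand(500, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Keep the weights of every epoch so the best one can be restored later.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "epoch_{epoch:02d}.h5", save_weights_only=True)

history = model.fit(x, y, validation_split=0.2, epochs=20,
                    callbacks=[checkpoint], verbose=0)

best_epoch = int(np.argmin(history.history["val_loss"])) + 1
print(f"Lowest validation MSE at epoch {best_epoch}")
model.load_weights(f"epoch_{best_epoch:02d}.h5")
# Only now evaluate on the held-out evaluation (test) set.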
This can happen for several reasons. Assuming you have a proper split into training, test, and validation sets, and have preprocessed the data (e.g. min-max scaling, handling missing values), you can do the following.
First, run the model for several epochs and plot the validation loss curve.
If the loss first decreases and then, after a certain point, starts increasing, so that the curve is U-shaped, you can use early stopping.
In the other scenario, when the loss is steadily increasing, early stopping won't help. In this case, add dropout layers with a rate of 0.2-0.3 between the major layers. This introduces randomness and stops the model from memorising the training data.
Once you add dropout, the model may suddenly start behaving strangely. Tweak the activation functions and the number of units in the Dense layers, and it will eventually come right.
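A rough sketch of what that combination might look like in Keras (hypothetical architecture and shapes; the 0.2-0.3 dropout rates follow the advice above, and early stopping covers the U-shaped case):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Stop once the validation loss has not improved for a few epochs and
# roll back to the best weights (the bottom of the "U").
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])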
For research, I'm working on face anti-spoofing. I have the rose-youtu dataset, whose specification can be found here in detail. The problem is that no matter what architecture I use for the model, the validation loss will not go below ~0.5. I tried different architectures (3D CNN, 2D CNN, fine-tuning, Inception) with different combinations of regularization (Dropout, L1, L2, L1_L2), but val_loss always ends up at ~0.5, and when I evaluate on the test set I get ~0.7 loss and ~0.85 accuracy. Note that there is no overfitting at first: loss and val_loss stay close and keep converging until they reach ~0.3, after which the loss keeps decreasing while val_loss fluctuates in the range of ~0.4-0.6.
What is it that I'm not considering? Does this mean the results on this dataset cannot be improved any further?
I am training a model using the author's original learning rate (I use their GitHub code too), and I get a validation loss that oscillates a lot: it decreases, then suddenly jumps to a large value, then decreases again, but never really converges; the lowest it gets is about 2 (while the training loss converges to roughly 0.0x, well below 1).
At each epoch I get the training accuracy and, at the end, the validation accuracy. The validation accuracy is always greater than the training accuracy.
When I test on real test data, I get good results, but I wonder if my model is overfitting. I would expect a good model's validation loss to converge in a similar fashion to the training loss, but this doesn't happen, and the fact that the validation loss occasionally oscillates to very large values worries me.
By adjusting the learning rate, the scheduler, and so on, I got the validation and training losses to trend downward with less oscillation, but this time my test accuracy stays low (as do the training and validation accuracies).
I tried a couple of optimizers (Adam, SGD, Adagrad) with a step scheduler and also PyTorch's plateau scheduler, and I played with step sizes etc., but it didn't really help; neither did clipping gradients.
Is my model overfitting?
If so, how can I reduce the overfitting besides data augmentation?
If not (I read some people on Quora say it is nothing to worry about, though I would think it must be overfitting), how can I justify it? Even if I got similar results in a k-fold experiment, would that be good enough? I don't feel it would justify the oscillation. How should I proceed?
The training loss at each epoch is usually computed on the entire training set.
The validation loss at each epoch is usually computed on one minibatch of the validation set, so it is normal for it to be more noisy.
Solution: you can report the exponential moving average of the validation loss across epochs to smooth out the fluctuations.
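For example, a plain-Python sketch of that smoothing, with made-up per-epoch values and an illustrative smoothing factor:

def ema(values, alpha=0.3):
    # Exponential moving average; alpha is the smoothing factor.
    smoothed = []
    running = values[0]
    for v in values:
        running = alpha * v + (1 - alpha) * running
        smoothed.append(running)
    return smoothed

val_losses = [0.92, 0.70, 1.40, 0.65, 1.10, 0.60]  # made-up per-epoch values
print(ema(val_losses))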
It is not overfitting since your validation accuracy is not less than the training accuracy. In fact, it sounds like your model is underfitting since your validation accuracy > training accuracy.
I am using Python with Keras and Tensorflow (gpu).
I train a ConvNet for an image classification task. When I train the network, I get the following results for the loss function on the training data:
before first epoch: 1.099
after first epoch: 1.094
after second epoch: 0.899
after third epoch: 0.713
after fourth epoch: 0.620722375
after fifth epoch: 0.532505135
Why does the decrease of the loss function only start at the second epoch? Why is there almost no decrease after the first epoch?
Thanks in advance.
Keras calculates the training loss while it is training, i.e. as a running average over the batches of the epoch. So during the first epoch the earliest samples perform very poorly (because the model is barely trained yet), and although the model actually improves as the epoch progresses, the poor loss on those early batches keeps the reported epoch loss high.
On a side note, you can check the validation loss, which is calculated after the epoch, and that will be a much better indicator of the true loss.
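If you want to see this effect directly, here is a small sketch of a custom callback (hypothetical class name) that re-evaluates the training data after each epoch, so you can compare Keras's running-average figure with the loss under that epoch's final weights:

import tensorflow as tf

class TrueTrainLoss(tf.keras.callbacks.Callback):
    # Re-evaluate the training set after each epoch with the final weights.

    def __init__(self, x, y):
        super().__init__()
        self.x, self.y = x, y

    def on_epoch_end(self, epoch, logs=None):
        result = self.model.evaluate(self.x, self.y, verbose=0)
        true_loss = result[0] if isinstance(result, list) else result
        print(f"epoch {epoch}: running-average loss {logs['loss']:.4f}, "
              f"re-evaluated loss {true_loss:.4f}")

# Usage (x_train / y_train are your own arrays):
# model.fit(x_train, y_train, epochs=5,
#           callbacks=[TrueTrainLoss(x_train, y_train)])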
The loss is decreasing, but it's hard to say without looking at the variables why it barely decreased in the first epoch and decreased more later. Probably the model took a while to find a direction that minimizes the function, and in the second epoch the optimizer could reduce the loss more effectively.
That is one confusing bit that tends to get ignored because it usually does not have a notable impact. A typical training loop may look something like this:
import tensorflow as tf

# Build graph
# ...
loss = ...
train_op = ...

with tf.Session() as sess:
    while keep_training:
        _, current_loss = sess.run([train_op, loss], feed_dict={...})
        # ...
The thing is, when you call sess.run there, the loss value you get back was computed before updating the weights. loss is the value used to optimize the model, so it is computed first and then back-propagated to compute the updates that train_op applies to the weights; it cannot possibly use the new weights, because it is needed to compute those updates in the first place. You could add another loss operation to the graph that is evaluated after train_op, but that would require evaluating each batch twice, and in any case you will see the new loss value in the next iteration. As I said, most of the time this is not important, but if, for example, you want to find out at exactly what point some weights became NaN, it can be misleading.
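For illustration, here is a self-contained TF1-style toy version of that loop (a tiny linear-regression graph, assumed only for the example) with a second evaluation of loss after train_op; the extra sess.run costs one more forward pass per batch, which is exactly the trade-off described above:

import tensorflow as tf

# Toy graph: fit y = 2x with one weight.
x = tf.constant([[1.0], [2.0], [3.0]])
y = tf.constant([[2.0], [4.0], [6.0]])
w = tf.Variable([[0.0]])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(3):
        # The loss returned here was computed with the weights *before* the update.
        _, loss_before = sess.run([train_op, loss])
        # Running loss again uses the freshly updated weights,
        # at the cost of a second forward pass over the batch.
        loss_after = sess.run(loss)
        print(step, loss_before, loss_after)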