Keras validation loss - tensorflow

In training keras/tensorflow networks I observe losses on validation datasets that are much lower than on training datasets - by orders of magnitude. I am using the adam optimizer and mean_squared_error loss. The behavior is consistent across different network types and dataset sizes. I would expect the mean loss on validation to be greater than on train. Perhaps mean_squared_error is erroneously a summed (rather than mean) squared error, which could account for the lower values since the validation set is typically smaller than the training set? If not, then what explains this behavior?

This is in part due to the fact that training error for an epoch is computed as an average over batches of training data, while validation error is computed at the end of the epoch for the entire validation set. Since training makes progress within an epoch, training error over the first batches is typically higher than over the last batches. It should not make a difference of orders of magnitude though.
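To see how much of the gap this averaging effect explains, here is a minimal sketch of a custom callback that re-evaluates the full training set at the end of each epoch, so the number is directly comparable to val_loss (x_train and y_train are placeholders for your training arrays, and the model is assumed to be compiled with a single loss and no extra metrics):

```python
import tensorflow as tf

class TrainEvalCallback(tf.keras.callbacks.Callback):
    """Re-evaluate the full training set at the end of each epoch so it is
    measured the same way, and at the same point in time, as val_loss."""
    def __init__(self, x_train, y_train):
        super().__init__()
        self.x_train = x_train
        self.y_train = y_train

    def on_epoch_end(self, epoch, logs=None):
        # Assumes the model was compiled with a single loss and no extra
        # metrics, so evaluate() returns one scalar.
        end_of_epoch_loss = self.model.evaluate(self.x_train, self.y_train,
                                                verbose=0)
        print(f"epoch {epoch}: end-of-epoch train loss {end_of_epoch_loss:.4f}"
              f" vs running-average loss {logs['loss']:.4f}")
```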
It could be that your training and validation data are fundamentally different, or that you haven't preprocessed the data correctly. Make sure you standardize your data. Moreover, if you are not working with time series, randomly shuffle your dataset before you split it into training and validation sets.
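As a sketch of that preprocessing (this uses scikit-learn, which is an assumption on my part; X and y are placeholders for your feature and target arrays):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Shuffle before splitting (skip the shuffle for time series).
x_train, x_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits,
# so no statistics from the validation set leak into preprocessing.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)
```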

Related

What to do when accuracy is increasing but loss is also increasing on validation data?

I'm currently working on a multi-class classification problem which is highly imbalanced. I want to save my model weights for the best epoch, but I'm confused about which metric I should choose.
Here's my training progress bar:
I am using ModelCheckpoint callback in tf.keras and monitoring val_loss as a metric to save best model weights.
As you can see in the image,
At the 8th epoch I got val_acc = 0.9845 but val_loss = 0.629, and precision and recall are also high here.
But at the 3rd epoch I got val_acc = 0.9840 but val_loss = 0.590.
I understand the difference is not huge, but in such cases, what's the ideal metric to trust on an imbalanced dataset?
The most important factors are the validation and training error.
If the validation loss (error) starts to increase, that signals overfitting. Set the maximum number of epochs high, but terminate training based on the error rates: as long as the validation loss keeps dropping, training should continue, until the model starts to converge at some epoch. Ideally it converges to a low val_loss.
Just bear in mind an epoch is one learning cycle where the learner can see the whole training data set. If you have two batches, the learner needs to go through two iterations for one epoch.
This link can be helpful.
You can divide the data into 3 sets: training, validation, and evaluation. Train each network for enough epochs to track when the training Mean Squared Error gets stuck in a minimum.
The training process uses the training dataset and should be executed epoch by epoch; then calculate the Mean Squared Error of the network on the validation set at each epoch. The network from the epoch with the minimum validation MSE is selected for the evaluation process.
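A minimal sketch of that selection with a Keras callback, assuming a compiled model and placeholder arrays x_train/y_train, x_val/y_val and x_test/y_test (the filename best_weights.h5 is arbitrary):

```python
import tensorflow as tf

# Keep only the weights from the epoch with the lowest validation loss.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_weights.h5",
    monitor="val_loss",
    mode="min",
    save_best_only=True,
    save_weights_only=True)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=100,
          callbacks=[checkpoint])

# Restore the best epoch's weights before the final evaluation step.
model.load_weights("best_weights.h5")
model.evaluate(x_test, y_test)
```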
This can happen for several reasons. Assuming you have used a proper separation of training, test and validation sets and preprocessed the datasets (e.g. min-max scaling, handling missing values), you can do the following.
First, run the model for several epochs and plot the validation loss graph.
If the loss first decreases and then, after reaching a certain point, starts increasing (the graph is U-shaped), you can use early stopping.
In the other scenario, when the loss is steadily increasing, early stopping won't help. In this case, add dropout layers of 0.2-0.3 between the major layers. This introduces randomness into the layers and stops the model from memorising the training data.
Once you add dropout, your model may suddenly start to behave strangely. Tweak the activation functions and the number of output nodes or Dense layers, and it will eventually come right.
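For illustration, a minimal Keras sketch of dropout layers of 0.2-0.3 between the major Dense layers (the layer sizes and the categorical setup are placeholder choices, not a recommendation for your specific model):

```python
import tensorflow as tf
from tensorflow.keras import layers

n_features, n_classes = 20, 3   # placeholder sizes for illustration

model = tf.keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(n_features,)),
    layers.Dropout(0.3),   # randomly zero 30% of activations during training
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```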

Validation loss less than training loss (validation accuracy higher than training accuracy) without using dropout

I have been working on a multitask model, using VGG16 with no dropout layers. I find that the validation accuracy is higher than the training accuracy and the validation loss is lower than the training loss.
I can't seem to figure out why this is happening in the model.
Below is the training plot:
Data:
I am using (randomly shuffled images) a 70% train, 15% validation, 15% test split, and the results on the 15% test data are as follows:
Do you think these results are too good to be true?
At the beginning, yes, but at the end you can see they sort of start changing places.
At the end of the training you are getting near an overfitting point (if the val loss starts increasing or the val accuracy starts decreasing, then you've reached overfitting).
But at the beginning, what might explain that behavior is some imbalance between the training and validation data. Maybe you've got easier examples in the validation set, or a class imbalance, or more empty values, etc.

How do I know when to stop training my CNN?

I've been training my CNN and got the following as results:
I just know that the training and validation accuracy need to both be high, but are these numbers good enough? How do I know when to stop? Should I concern myself with the losses, or only accuracy? Which epoch shows the best result so far?
The loss value indicates how poorly or well a model behaves after each iteration of optimization, whereas the accuracy of a model is usually determined after the model parameters are learned and is expressed as a percentage.
Yes, training and validation accuracy should both be high. What counts as a good number always depends on the subject area you are dealing with; in the medical domain, for instance, these numbers would not be good enough.
If you have serious class imbalance, your model will maximize accuracy by simply always picking the most common class, but this would not be a useful model. In this case cross entropy or log-loss would be a better loss function to optimize.
Generally, the lower the loss the better the model, unless the model has overfitted to the training data.
The 10th epoch is the best so far, where you got the highest validation accuracy and the lowest validation loss.
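If you would rather automate "when to stop" than eyeball the curves, here is a minimal sketch using Keras EarlyStopping (the patience of 5 is an arbitrary placeholder, and model plus the data arrays are assumed to exist):

```python
import tensorflow as tf

# Stop once val_loss has not improved for 5 consecutive epochs and roll the
# model back to the weights of its best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True)

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=200,
                    callbacks=[early_stop])
```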

Difference between fit and evaluate in Keras

I have used 100,000 samples to train a general model in Keras and achieved good performance. Then, for a particular sample, I want to use the trained weights as initialization and continue optimizing them to further reduce the loss on that particular sample.
However, a problem occurred. First, I load the trained weights easily via the Keras API; then I evaluate the loss on the one particular sample, and it is close to the validation loss seen during the training of the model. I think that is normal. However, when I use the trained weights as the initial weights and further optimize them on that one sample with model.fit(), the loss is really strange: it is much higher than the evaluate() result and only gradually becomes normal after several epochs.
I find it strange that, for the same single sample and the same loaded model weights, model.fit() and model.evaluate() return different results. I used batch normalization layers in my model and wonder whether that may be the reason. The result of model.evaluate() seems normal, as it is close to what I saw on the validation set before.
So what causes the difference between fit and evaluate? How can I solve it?
I think your core issue is that you are observing two different loss values during fit and evaluate. This has been extensively discussed here, here, here and here.
The fit() function loss includes contributions from:
Regularizers: L1/L2 regularization loss will be added during training, increasing the loss value
Batch norm variations: during batch norm, the running mean and variance of the batch are collected and those statistics are used to perform normalization, irrespective of whether batch norm is set to trainable or not. See here for more discussion on that.
Multiple batches: Of course, the training loss will be averaged over multiple batches. So if you take average of first 100 batches and evaluate on the 100th batch only, the results will be different.
Whereas evaluate() just does a forward pass and returns the loss value; there is nothing random there.
The bottom line is that you should not compare training and validation loss (or fit and evaluate loss) directly; those functions do different things. Look at other metrics to check whether your model is training fine.
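To make the batch-norm/dropout point concrete, here is a minimal sketch that runs the same sample through the model in training mode and in inference mode and compares the losses (model, x_sample and y_sample are placeholders, and mean squared error is assumed only for illustration):

```python
import tensorflow as tf

loss_fn = tf.keras.losses.MeanSquaredError()

# Training-mode forward pass: batch norm uses the current batch statistics
# and dropout is active -- the regime fit() reports its loss from.
pred_train_mode = model(x_sample, training=True)
print("training-mode loss:", float(loss_fn(y_sample, pred_train_mode)))

# Inference-mode forward pass: batch norm uses its moving averages and
# dropout is disabled -- what evaluate() and predict() use.
pred_infer_mode = model(x_sample, training=False)
print("inference-mode loss:", float(loss_fn(y_sample, pred_infer_mode)))

# Note: fit() also adds any regularization penalties (model.losses) on top
# of the compiled loss; this manual computation ignores them.
```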

Tensorflow: loss decreasing, but accuracy stable

My team is training a CNN in Tensorflow for binary classification of damaged/acceptable parts. We created our code by modifying the cifar10 example code. In my prior experience with Neural Networks, I always trained until the loss was very close to 0 (well below 1). However, we are now evaluating our model with a validation set during training (on a separate GPU), and it seems like the precision stopped increasing after about 6.7k steps, while the loss is still dropping steadily after over 40k steps. Is this due to overfitting? Should we expect to see another spike in accuracy once the loss is very close to zero? The current max accuracy is not acceptable. Should we kill it and keep tuning? What do you recommend? Here is our modified code and graphs of the training process.
https://gist.github.com/justineyster/6226535a8ee3f567e759c2ff2ae3776b
Precision and Loss Images
A decrease in binary cross-entropy loss does not imply an increase in accuracy. Consider label 1, predictions 0.2, 0.4 and 0.6 at timesteps 1, 2 and 3, and a classification threshold of 0.5. Timesteps 1 and 2 will produce a decrease in loss but no increase in accuracy.
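To make the arithmetic in that example concrete, here is a small numeric check (plain NumPy, not part of the original answer):

```python
import numpy as np

def bce(y_true, p):
    # Binary cross-entropy for a single predicted probability p of class 1.
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

for p in (0.2, 0.4, 0.6):
    predicted_class = int(p >= 0.5)   # classification threshold 0.5
    print(f"p={p}: loss={bce(1, p):.3f}, predicted class={predicted_class}")

# p=0.2: loss=1.609, predicted class=0
# p=0.4: loss=0.916, predicted class=0   <- loss dropped, accuracy unchanged
# p=0.6: loss=0.511, predicted class=1   <- only now does accuracy improve
```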
Ensure that your model has enough capacity by first overfitting the training data. Once it overfits, counter the overfitting with regularization techniques such as dropout, L1 and L2 regularization, and data augmentation.
Last, confirm your validation data and training data come from the same distribution.
Here are my suggestions. One possible problem is that your network has started to memorize the data; yes, you should increase regularization.
update:
Here I want to mention one more problem that may cause this:
The balance ratio in the validation set is far from what you have in the training set. I would recommend, as a first step, trying to understand what your test data (the real-world data your model will face at inference time) looks like descriptively: what its balance ratio is, and other similar characteristics. Then try to build a training/validation set with almost the same descriptive statistics you observed for the real data.
Well, I faced a similar situation when I used a Softmax function in the last layer instead of a Sigmoid for binary classification.
My validation loss and training loss were decreasing but the accuracy of both remained constant. That taught me why sigmoid is used for binary classification.
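For reference, a minimal sketch of the sigmoid setup that answer points to (layer sizes are placeholders, not the poster's actual architecture):

```python
import tensorflow as tf
from tensorflow.keras import layers

n_features = 20   # placeholder input size

# A single sigmoid unit with binary cross-entropy is the standard setup for
# binary classification; a softmax over a single output unit always outputs
# 1.0, which is one common way a softmax last layer goes wrong here.
model = tf.keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(n_features,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```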