How to continue training after stopping it and changing some parameters? - tensorflow

I'm training my model via model.fit() in Keras. I stopped the training, either by interrupting it or because it finished, then changed the batch_size and decided to train some more. Here is what's happening:
The loss when the training was stopped/finished = 26
The loss when the training proceeded = 46
Meaning that I lost all the progress I made and it is as if I'm starting over.
It does continue from where it left off, but only if I don't change anything. If I change the batch size, it is as if the optimizer re-initializes my weights and throws out my progress. How can I get a handle on what the optimizer is doing without my consent?
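For reference, a minimal sketch of the workflow described above (array names hypothetical):

# Sketch of the workflow in question; model, x_train, y_train are placeholders.
model.fit(x_train, y_train, batch_size=32, epochs=10)    # first run: loss ends around 26
# ...interrupt or finish, change the batch size, then continue:
model.fit(x_train, y_train, batch_size=128, epochs=10)   # loss jumps to around 46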

You most likely have some examples that give you large loss values, and MSE makes this worse. When the batch size is larger, you are probably getting more of these outliers in your batch. You can look at the examples that contribute the most loss.
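A hedged sketch of how you might rank examples by their individual loss (assuming an MSE model and arrays named x_train/y_train whose shapes match the model output):

import numpy as np

# Per-example squared error, averaged over output dimensions.
preds = model.predict(x_train, batch_size=256)
per_example_loss = np.mean((preds - y_train) ** 2, axis=-1)

# Indices of the 20 largest loss contributors.
worst = np.argsort(per_example_loss)[::-1][:20]
print(worst)
print(per_example_loss[worst])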

Related

How are P, R, and F scores calculated in spaCy CLI train NER?

I am using the spaCy CLI train command for NER with train_path set to the training dataset (train-set) and dev_path set to the evaluation dataset (test-set). The printout in the console shows me NER Precision, Recall, and the F-score.
However, it is not clear to me how the scores were calculated. Are they the scores from the model predicting on the train-set (train-scores) or from the test-set (test-scores)?
I want to determine after which epoch to stop training to prevent overfitting. Currently after 60 epochs the Loss is still slightly decreasing and Precision, Recall, and F-score are still slightly increasing. It seems to me that the model might be memorizing the training data and that the P, R, and F-scores are calculated on the train-set and thus keep improving.
To my knowledge a good stopping point in training would be right before the test-scores start dropping again, even though the train-scores keep increasing. So I would like to compare them over time (epochs).
My questions are:
Are the scores displayed in the console while training train-scores or test-scores?
And how can I get access to the other one?
If it is the train-score, what is the test set (dev_path) used for?
The loss is calculated from the training examples, as a side effect of calling nlp.update() in the training loop. However, all the other performance metrics are calculated on the dev set, by calling the Scorer.
To my knowledge a good stopping point in training would be right before the test-scores start dropping again, even though the train-scores keep increasing
Yep, I agree. So looking at the spacy train results, this would be the point where the (training) loss is still decreasing while the (dev) F-score starts dropping.
Currently after 60 epochs the Loss is still slightly decreasing and Precision, Recall, and F-score are still slightly increasing.
So it looks like you can train for some epochs more :-)
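For concreteness, a rough spaCy v2-style sketch of what the CLI loop does internally; the tiny in-memory datasets are placeholders for whatever lives behind train_path and dev_path:

import spacy
from spacy.util import minibatch

train_data = [("Apple buys a startup", {"entities": [(0, 5, "ORG")]})]
dev_data = [("Google hires engineers", {"entities": [(0, 6, "ORG")]})]

nlp = spacy.load("en_core_web_sm")
other_pipes = [p for p in nlp.pipe_names if p != "ner"]
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.resume_training()
    for epoch in range(10):
        losses = {}
        for batch in minibatch(train_data, size=8):
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, losses=losses)  # loss from train data
        scorer = nlp.evaluate(dev_data)  # P/R/F computed on the dev set
        print(epoch, losses["ner"], scorer.ents_p, scorer.ents_r, scorer.ents_f)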

Validation loss less than training loss (validation accuracy higher than training accuracy) without using dropout

I have been working on a multitask model, using VGG16 with no dropout layers. I found that the validation accuracy is higher than the training accuracy and the validation loss is lower than the training loss.
I can't seem to figure out why this is happening in the model.
Below is the training plot:
Data:
I am using randomly shuffled images in a 70% train / 15% validation / 15% test split, and the results on the 15% test data are as follows:
Do you think these results are too good to be true?
At the beginning, yes, but towards the end you can see they sort of swap places.
At the end of training you are getting near an overfitting point (if the validation loss starts increasing or the validation accuracy starts decreasing, you've reached overfitting).
But at the beginning, that behavior might be explained by some imbalance between the training and validation data. Maybe you've got easier examples in the validation set, or a class imbalance, or more empty values, etc.; a quick check is sketched below.
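A hedged sketch of that check, assuming y_train and y_val are one-hot label arrays:

import numpy as np
from collections import Counter

# Class frequencies per split; the proportions should roughly match.
print(Counter(np.argmax(y_train, axis=1)))
print(Counter(np.argmax(y_val, axis=1)))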

Difference between fit and evaluate in Keras

I used 100,000 samples to train a general model in Keras and achieved good performance. Then, for one particular sample, I want to use the trained weights as the initialization and continue optimizing them to further reduce the loss on that sample.
However, a problem occurred. First, I load the trained weights easily via the Keras API; then I evaluate the loss on that one sample, and it is close to the validation loss seen during training, which seems normal. But when I use the trained weights as the initialization and further optimize them on that one sample with model.fit(), the loss is really strange: it is much higher than the evaluate() result and only gradually becomes normal after several epochs.
I find it strange that, for the same single sample and the same loaded weights, model.fit() and model.evaluate() return different results. I used batch normalization layers in my model and wonder whether that may be the reason. The result of model.evaluate() seems normal, as it is close to what I saw on the validation set before.
So what causes the difference between fit and evaluate? How can I solve it?
I think your core issue is that you are observing two different loss values during fit and evaluate. This has been extensively discussed here, here, here and here.
The fit() function loss includes contributions from:
Regularizers: L1/L2 regularization loss will be added during training, increasing the loss value
Batch norm variations: during training, batch norm normalizes with the statistics of the current batch (while collecting a running mean and variance), irrespective of whether batch norm is set to trainable or not. See here for more discussion on that.
Multiple batches: of course, the training loss is averaged over multiple batches. So if you take the average of the first 100 batches but evaluate on the 100th batch only, the results will be different.
evaluate(), by contrast, just does a forward pass and returns the loss value; nothing random there.
The bottom line is, you should not compare the train and validation loss (or the fit and evaluate loss); these functions do different things. Look at other metrics to check that your model is training fine.
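A minimal, self-contained sketch of the batch-norm effect, using random data purely for illustration:

import numpy as np
import tensorflow as tf

x = np.random.rand(256, 10).astype("float32")
y = np.random.rand(256, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

hist = model.fit(x, y, epochs=1, batch_size=32, verbose=0)
print("fit loss:     ", hist.history["loss"][0])          # training mode, averaged over batches
print("evaluate loss:", model.evaluate(x, y, verbose=0))  # inference mode, final weights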

[MXNet] Periodic loss value when training with "step" learning rate policy

When training a deep CNN, a common approach is to use SGD with momentum and a "step" learning rate policy (e.g. the learning rate set to 0.1, 0.01, 0.001, ... at different stages of training). But I encountered an unexpected phenomenon when training with this strategy under MXNet.
That is, the training loss value is periodic:
https://user-images.githubusercontent.com/26757001/31327825-356401b6-ad04-11e7-9aeb-3f690bc50df2.png
The above is the training loss at a fixed learning rate of 0.01, where the loss decreases normally.
https://user-images.githubusercontent.com/26757001/31327872-8093c3c4-ad04-11e7-8fbd-327b3916b278.png
However, at the second stage of training (with lr 0.001), the loss goes up and down periodically, and the period is exactly one epoch.
So I thought it might be a data shuffling problem, but that cannot explain why it doesn't happen in the first stage. Actually I used ImageRecordIter as the DataIter and reset it after every epoch. Is there anything I missed or set mistakenly?
train_iter = mx.io.ImageRecordIter(
    path_imgrec=recPath,
    data_shape=dataShape,
    batch_size=batchSize,
    last_batch_handle='discard',
    shuffle=True,
    rand_crop=True,
    rand_mirror=True)
The code for training and loss evaluation:
while True:
    train_iter.reset()
    for i, databatch in enumerate(train_iter):
        globalIter += 1
        mod.forward(databatch, is_train=True)
        mod.update_metric(metric, databatch.label)
        if globalIter % 100 == 0:
            # Report and reset the running metric every 100 batches.
            loss = metric.get()[1]
            metric.reset()
        mod.backward()
        mod.update()
Actually the loss can converge, but it takes too long.
I've suffered from this problem for a long time, on different networks and different datasets.
I didn't have this problem when using Caffe. Is this due to an implementation difference?
Your loss/learning curves look suspiciously smooth, and I believe you could observe the same oscillation in the loss even at a learning rate of 0.01, just at a smaller relative scale (i.e. if you 'zoomed in' on the chart you'd see the same pattern). You may have an issue with your data iterator passing the same batch, for example. And your training loop looks wrong, though this could be due to formatting (e.g. performing mod.update() only every 100 batches wouldn't be correct).
You can observe periodicity in your loss when you're traveling across a valley in the loss surface, up and down the sides rather than down the valley. Choosing a lower learning rate can help fix this, and make sure you are using momentum too.
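For example, a hedged sketch of attaching momentum and a step schedule in MXNet's Module API; batches_per_epoch is a placeholder for your epoch size in batches:

import mxnet as mx

# Multiply the learning rate by 0.1 every 30 epochs' worth of batches.
lr_sched = mx.lr_scheduler.FactorScheduler(step=30 * batches_per_epoch, factor=0.1)
mod.init_optimizer(optimizer='sgd',
                   optimizer_params={'learning_rate': 0.01,
                                     'momentum': 0.9,
                                     'lr_scheduler': lr_sched})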

How to interpret increase in both loss and accuracy

I have run deep learning models (CNNs) using TensorFlow. Many times during an epoch, I have observed that both loss and accuracy increased, or both decreased. My understanding was that the two are always inversely related. What could be a scenario where both increase or decrease simultaneously?
The loss decreases as the training process goes on, except for some fluctuation introduced by mini-batch gradient descent and/or regularization techniques like dropout (which introduce random noise).
If the loss decreases, the training process is going well.
The (validation, I suppose) accuracy, instead, is a measure of how good your model's predictions are.
If the model is learning, the accuracy increases. If the model is overfitting, the accuracy stops increasing and can even start to decrease.
If the loss decreases and the accuracy decreases, your model is overfitting.
If the loss increases and the accuracy increases too, it is because your regularization techniques are working well and you're fighting the overfitting problem. This is true only if the loss then starts to decrease while the accuracy continues to increase.
Otherwise, if the loss keeps growing, your model is diverging and you should look for the cause (usually a too-high learning rate).
I think the top-rated answer is incorrect.
I will assume you are talking about cross-entropy loss, which can be thought of as a measure of 'surprise'.
Loss and accuracy increasing/decreasing simultaneously on the training data tells you nothing about whether your model is overfitting. This can only be determined by comparing loss/accuracy on the validation vs. training data.
If loss and accuracy are both decreasing, it means your model is becoming more confident on its correct predictions, or less confident on its incorrect predictions, or both, hence decreased loss. However, it is also making more incorrect predictions overall, hence the drop in accuracy. Vice versa if both are increasing. That is all we can say.
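A small numeric sketch of that last point, using binary one-hot labels and cross-entropy (values chosen purely for illustration):

import numpy as np

def xent(p, y):
    # Mean cross-entropy for one-hot labels.
    return -np.mean(np.sum(y * np.log(p), axis=1))

y = np.array([[1, 0]] * 4)

# 3/4 correct but unconfident: loss ~0.61, accuracy 0.75.
p1 = np.array([[0.6, 0.4], [0.6, 0.4], [0.6, 0.4], [0.4, 0.6]])
# Only 2/4 correct but very confident on those: loss ~0.40, accuracy 0.50.
p2 = np.array([[0.99, 0.01], [0.99, 0.01], [0.45, 0.55], [0.45, 0.55]])

for p in (p1, p2):
    print(xent(p, y), (p.argmax(1) == y.argmax(1)).mean())  # loss and accuracy fall together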
I'd like to add a possible option here for all those who are struggling with model training right now.
If your validation data is a bit dirty, you might find that at the beginning of training the validation loss is low, as is the accuracy, and the more you train the network, the more the accuracy increases with the loss side by side. This happens because the network eventually finds the possible outliers in your dirty data and gets a very high loss there. Therefore, your accuracy grows as it gets more data right, but the loss grows with it.
This is just what I think, based on the math behind the loss and the accuracy.
Note: I assume your data is categorical (one-hot labels).
Your model's output: [0.1, 0.9, 0.9009, 0.8] (used to calculate the loss)
Maxed (argmax) output: [0, 0, 1, 0] (used to calculate the accuracy)
Expected output: [0, 1, 0, 0]
Let's clarify what the loss and the accuracy calculate:
Loss: the overall error between y and ypred
Accuracy: simply whether y and maxed(ypred) are equal
So overall our model almost nailed it, which keeps the loss low. But the maxed output has no notion of "almost": either it matches the expected output completely (counting as 1) or it doesn't (counting as 0). Here 0.9 and 0.9009 are nearly tied, the argmax picks the wrong class, and the example counts as 0, resulting in a low accuracy too.
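As a concrete, hedged illustration of the example above:

import numpy as np

y_pred = np.array([0.1, 0.9, 0.9009, 0.8])   # model output
y_true = np.array([0.0, 1.0, 0.0, 0.0])      # expected output

maxed = (y_pred == y_pred.max()).astype(float)       # -> [0, 0, 1, 0]
acc = float(np.argmax(y_pred) == np.argmax(y_true))  # 0.0: the near-tie picks the wrong class
err_true_class = abs(y_pred[1] - y_true[1])          # 0.1: the true class was almost nailed
print(maxed, acc, err_true_class)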
Try checking the MAE of the model.
Remove regularization.
Check that you are using the correct loss.
You should check your class indices (both train and validation) in the training process. They might be sorted in different ways, so the class-to-index mapping differs between the two sets. I had this problem in Colab.
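A hedged sketch of that check, assuming Keras ImageDataGenerator flows (generator names are placeholders):

# The two mappings should be identical dicts, e.g. {'cat': 0, 'dog': 1}.
print(train_generator.class_indices)
print(valid_generator.class_indices)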