Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
I am trying to classify images by whether they contain an object: yes or no.
I first run 50 epochs with the RMSprop optimizer, then continue with a second run of 50 more epochs using the SGD optimizer.
My first run ends with a loss of ~0.4 and the model is saved. The second run starts from the saved model.
The problem is that at the start of the second run, Keras shows a loss of ~0.8 at epoch 1.
Why could this happen with the same loss function?
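Your new SGD optimizer starts from scratch on that model. The state RMSprop had accumulated (its moving averages of squared gradients, which set the adaptive per-parameter learning rates) is discarded when you compile with a different optimizer.
Thus, there is indeed a high chance that training starts badly after the new compilation.
You can try restarting with a lower learning rate, and also try adding momentum to the SGD optimizer.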
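As a rough illustration of the advice above, here is a minimal sketch on a toy quadratic loss, not a Keras model. The curvatures, starting point, and learning rates are made-up values chosen only to show that a plain SGD step that is too large for the steep direction raises the loss, while a smaller learning rate with momentum lowers it.

```python
def loss(w, curv):
    """Toy quadratic loss 0.5 * sum(c_i * w_i^2); curv holds the c_i."""
    return 0.5 * sum(c * x * x for c, x in zip(curv, w))


def grad(w, curv):
    """Gradient of the toy loss: c_i * w_i for each coordinate."""
    return [c * x for c, x in zip(curv, w)]


def sgd_momentum(w, curv, lr, momentum=0.0, steps=10):
    """Run classical momentum SGD for a fixed number of steps."""
    w = list(w)
    velocity = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w, curv)
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        w = [x + v for x, v in zip(w, velocity)]
    return w


CURV = [100.0, 1.0]   # one steep and one flat direction (assumed values)
START = [0.1, 1.0]    # stands in for the weights saved after the first run

start_loss = loss(START, CURV)
# One plain SGD step with a learning rate too large for the steep direction.
loss_high_lr = loss(sgd_momentum(START, CURV, lr=0.05, steps=1), CURV)
# Ten steps with a smaller learning rate plus momentum.
loss_low_lr = loss(
    sgd_momentum(START, CURV, lr=0.001, momentum=0.9, steps=10), CURV
)

print(start_loss, loss_high_lr, loss_low_lr)
```

In Keras the corresponding knobs are the `learning_rate` and `momentum` arguments of the `SGD` optimizer; suitable values depend on the model and data.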
I am training a semantic segmentation model with 3 classes (including the background).
The background is the dominant class, and the problem is that the model predicts every pixel as background.
I am currently using a cross-entropy loss function.
What are the solutions for this situation?
This is a typical case of strong class imbalance in image segmentation; below are a few ways to tackle it.
1. Use Jaccard (IoU) loss or Dice loss. Rather than optimizing for per-pixel accuracy, you optimize for the overlap between prediction and ground truth (the intersection over union, for example), and these losses have been shown to work much better than cross-entropy on imbalanced problems.
2. Try class weights (sample weights in Keras/TF) to assign greater importance to classes 2 and 3, which are not background.
3. Focal loss has shown improvements in MLPs and other deep learning tasks where the dataset is strongly imbalanced. Focal loss can be combined with a loss from (1) and (3); it has the potential to improve your results.
You should expect to get the best performance improvement by employing (1) alone.
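To make the Dice loss suggested above concrete, here is a minimal sketch in plain Python, averaged over classes. The function name, the smoothing constant `eps`, and the ten-pixel example are assumptions for illustration; a real training loop would use the tensor operations of your framework instead of lists.

```python
def soft_dice_loss(probs, onehot, eps=1e-6):
    """Mean over classes of 1 - Dice coefficient.

    probs and onehot are lists of pixels; each pixel is a list with one
    value per class (predicted probability and one-hot label).
    """
    num_classes = len(probs[0])
    total = 0.0
    for c in range(num_classes):
        inter = sum(p[c] * t[c] for p, t in zip(probs, onehot))
        denom = sum(p[c] for p in probs) + sum(t[c] for t in onehot)
        total += 1.0 - (2.0 * inter + eps) / (denom + eps)
    return total / num_classes


# Hypothetical image: 8 background pixels, 1 pixel each of classes 1 and 2.
labels = [[1.0, 0.0, 0.0]] * 8 + [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A model that predicts background everywhere is right on 80% of pixels...
all_background = [[1.0, 0.0, 0.0]] * 10

collapsed_loss = soft_dice_loss(all_background, labels)
perfect_loss = soft_dice_loss(labels, labels)

# ...but the Dice loss stays high because the two rare classes are missed.
print(collapsed_loss, perfect_loss)
```

Because each class contributes equally to the average, the rare classes cannot be ignored the way they can under per-pixel cross-entropy.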
I am developing a model in TensorFlow and find that it performs well under my specific evaluation method. But when I port it to PyTorch, I can't achieve the same results. I have checked the model architecture, the weight initialization method, the learning-rate schedule, the weight decay, the momentum and epsilon used in the BN layers, the optimizer, and the data preprocessing. Everything is the same, yet I can't reproduce the TensorFlow results. Has anybody met the same problem?
I did a similar conversion recently.
First you need to make sure that the forward pass produces the same results: disable all randomness, initialize both models with the same values, feed a very small input, and compare the outputs. If there is a discrepancy, disable parts of the network and compare again, re-enabling layers one by one.
Once the forward pass is confirmed, check the loss, gradients, and weight updates after one forward-backward cycle.
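The layer-by-layer comparison can be sketched as below. This assumes you have already dumped the activations of each layer from both frameworks for the same fixed input and flattened them to lists; the layer names and numbers are made up for illustration.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat lists."""
    if len(a) != len(b):
        raise ValueError("activation shapes differ")
    return max(abs(x - y) for x, y in zip(a, b))


def first_mismatch(acts_ref, acts_new, tol=1e-5):
    """Return the first layer whose outputs differ by more than tol.

    Both arguments map layer names to flattened activations, listed in
    forward order. Returns None when every layer agrees within tol.
    """
    for name in acts_ref:
        if max_abs_diff(acts_ref[name], acts_new[name]) > tol:
            return name
    return None


# Hypothetical dumps from the two implementations.
tf_acts = {"conv1": [0.10, 0.20], "bn1": [0.50, -0.30], "fc": [1.20, 0.70]}
pt_acts = {"conv1": [0.10, 0.20], "bn1": [0.52, -0.30], "fc": [1.31, 0.64]}

culprit = first_mismatch(tf_acts, pt_acts)
print(culprit)
```

The first layer reported is the place to start looking, since every later layer inherits its error. The same helper can be reused on gradients after one backward pass.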
I am trying 2-class (dog/cat) classification with a CNN.
But the training graph looks strange.
Why do the accuracy values fluctuate so much? And is the training behaving correctly?
optimizer: adam
learning rate: 1e-4
network: https://gist.github.com/elect000/130acbdb0a3779910082593db4296254
optimizer: adam
learning rate: 1e-6
Likely your learning rate is too high.
When the learning rate is too high, the network takes large leaps when changing the weights, and this can cause it to overshoot the local minimum it's approaching.
Have a read of this article for a better description, and a nice diagram:
https://www.quora.com/In-an-artificial-neural-network-algorithm-what-happens-if-my-learning-rate-is-wrong-too-high-or-too-low
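The overshooting described above can be seen on a one-dimensional toy problem. This is only a sketch using plain gradient descent on f(w) = w**2 rather than Adam on a CNN, and the three learning rates are picked to show the three regimes.

```python
def descend(lr, w0=1.0, steps=5):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    path = [w0]
    for _ in range(steps):
        path.append(path[-1] - lr * 2.0 * path[-1])
    return path


smooth = descend(lr=0.1)     # each step multiplies w by 0.8
bouncing = descend(lr=0.9)   # each step multiplies w by -0.8: sign flips
diverging = descend(lr=1.1)  # each step multiplies w by -1.2: |w| grows

for name, path in [("smooth", smooth), ("bouncing", bouncing),
                   ("diverging", diverging)]:
    print(name, [round(w, 3) for w in path])
```

The middle case is the one that resembles a shaking curve: the iterate jumps across the minimum on every step instead of settling into it.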
I am using around 250K images in total with 6 different labels, and I am using VGG with its last layer changed to accommodate 6 categories. What should the learning rate and momentum be if the SGD optimizer is used?
It depends on many factors, including your training data, batch size, and network. You should try different learning rates and see how fast they converge. The Keras LearningRateScheduler callback is also helpful.
Generally, in fine-tuning, the learning rate is kept small. A common convention is to use a rate 10x smaller than the one used to train the model from scratch.
Momentum is used to dampen oscillations in the optimization procedure (which occur when the gradient is much steeper in one dimension than in another). The higher the momentum, the more strongly the optimizer moves in directions where the gradient is consistent, and the more it dampens movement in directions where the gradient keeps changing sign. The default values are a good starting point.
Commonly used values: lr = 1e-4, momentum = 0.9.
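A schedule of the kind mentioned above can be written as a small function. The sketch below halves the rate every 10 epochs; the drop factor and interval are arbitrary assumptions, and the starting value is the 1e-4 quoted above.

```python
def step_decay(epoch, lr, drop=0.5, every=10):
    """Return the learning rate for this epoch.

    Multiplies the current rate by `drop` every `every` epochs and leaves
    it unchanged otherwise. `drop` and `every` are illustrative defaults.
    """
    if epoch > 0 and epoch % every == 0:
        return lr * drop
    return lr


# Simulate 30 epochs starting from the fine-tuning rate 1e-4.
lr = 1e-4
history = []
for epoch in range(30):
    lr = step_decay(epoch, lr)
    history.append(lr)

print(history[0], history[10], history[20])
```

The Keras `LearningRateScheduler` callback accepts a function that takes the epoch index and the current learning rate, so a function shaped like `step_decay` could be passed to it.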
Hi, I'm new to deep learning and convolutional neural networks. Could someone please explain the problem in the figure below? Someone told me that the fluctuation of the validation accuracy is the problem here, but I don't quite understand the negative effect of this fluctuation. Why don't we just look at the last point of the figure?
[Figure: plot of validation accuracy; image not available]
When training a deep learning model, you have to validate it, which means showing the algorithm data it has not seen before.
Validation accuracy can therefore be lower than training accuracy, because of a scenario called over-fitting: the model becomes too closely fitted to the training data and does not generalize well to unseen data.
As for the fluctuation, it can be normal, because we train and test the algorithm in a stochastic manner.
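One way to see why a single last point can mislead when the curve is this noisy is to summarize the last few epochs instead. The accuracy values below are invented for illustration, and the window size `k` is an arbitrary choice.

```python
from statistics import mean, pstdev


def summarize_tail(val_acc, k=5):
    """Mean and standard deviation of the last k validation accuracies."""
    tail = val_acc[-k:]
    return mean(tail), pstdev(tail)


# Hypothetical per-epoch validation accuracy with visible fluctuation.
val_acc = [0.70, 0.78, 0.74, 0.83, 0.79, 0.86, 0.80, 0.87, 0.81, 0.88]

tail_mean, tail_std = summarize_tail(val_acc)
last_point = val_acc[-1]

# The last epoch happens to sit on a peak, above the recent average.
print(last_point, tail_mean, tail_std)
```

If training had stopped one epoch earlier, the last point would have been noticeably lower, so the spread of recent epochs says more than any single value.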