What should be the value for learning rate and momentum? [closed] - tensorflow

I am using around 250K images in total with 6 different labels, and I am using VGG with its last layer changed to accommodate the 6 categories. What value of learning rate and momentum should I use with the SGD optimiser?

It depends on a lot of factors, including your training data, batch size, network architecture, and so on. You should try different learning rates and see how fast they converge. The Keras LearningRateScheduler callback is also helpful.

Generally, in fine-tuning, the learning rate is kept small. The usual convention is a learning rate about 10x smaller than the one used to train the model from scratch.
Momentum is used to dampen the oscillations in the optimization procedure (which occur when the curvature in one dimension is much higher than in another). The higher the momentum, the more forcefully the optimizer is pushed in directions where the gradient is consistent in sign, and the more movement is dampened in directions where the gradient direction keeps changing. The default values are good to go.
Commonly used values are lr = 1e-4 and momentum = 0.9.
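A minimal sketch of that setup in Keras, assuming TensorFlow 2.x, a frozen VGG16 base, and a 6-class softmax head (the dataset objects, epoch counts, and decay schedule are placeholders, not a prescription):

    import tensorflow as tf

    # Pretrained VGG16 base, frozen while only the new 6-class head is fine-tuned.
    base = tf.keras.applications.VGG16(include_top=False, pooling="avg",
                                       input_shape=(224, 224, 3))
    base.trainable = False

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(6, activation="softmax"),  # last layer changed to 6 categories
    ])

    # Commonly used starting point when fine-tuning with SGD: small lr, momentum 0.9.
    sgd = tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9)
    model.compile(optimizer=sgd, loss="categorical_crossentropy", metrics=["accuracy"])

    # Optional: decay the learning rate during training with LearningRateScheduler.
    def schedule(epoch, lr):
        return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

    callbacks = [tf.keras.callbacks.LearningRateScheduler(schedule)]
    # model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=callbacks)

If convergence stalls, sweep the learning rate over a small grid (for example 1e-3, 1e-4, 1e-5) and compare how the first few epochs behave.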

Related

Semantic Segmentation with a dominant class [closed]

I am training a semantic segmentation model that consists of 3 classes (counting the background).
The background is the dominant class, and the problem is that the model predicts every pixel as background.
I am currently using the cross-entropy loss function.
What are the solutions for this situation?
This is a typical case of strong class imbalance in image segmentation; below are a couple of solutions to tackle this problem (a loss sketch follows the list).

1. Use a Jaccard (IoU) loss or a Dice loss: rather than optimizing for accuracy, you optimize directly for the overlap between prediction and ground truth, and these losses have been shown to work much better than cross-entropy on imbalanced problems.
2. Try class weights (sample weights in Keras/TF) to assign greater importance to classes 2 and 3, which are not background.
3. Focal loss has shown improvements on strongly imbalanced datasets. It can also be combined with a loss from (1) or the weighting from (2), which has the potential to improve your results further.

You should expect the biggest performance improvement from (1) alone.
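A minimal sketch of option (1), a multi-class Dice loss in TensorFlow/Keras; it assumes one-hot targets of shape [batch, H, W, 3], and the epsilon and reduction are choices rather than a fixed recipe:

    import tensorflow as tf

    def dice_loss(y_true, y_pred, eps=1e-6):
        # Sum over the spatial dimensions, keeping the class dimension separate.
        axes = (1, 2)
        intersection = tf.reduce_sum(y_true * y_pred, axis=axes)
        denom = tf.reduce_sum(y_true, axis=axes) + tf.reduce_sum(y_pred, axis=axes)
        dice = (2.0 * intersection + eps) / (denom + eps)
        # 1 - mean Dice over classes and batch, so better overlap gives a lower loss.
        return 1.0 - tf.reduce_mean(dice)

    # model.compile(optimizer="adam", loss=dice_loss)
    # For option (2), per-class weights could look like:
    # class_weights = tf.constant([0.1, 1.0, 1.0])  # down-weight the background class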

why can't I reimplement my tensorflow model with pytorch? [closed]

I am developing a model in TensorFlow and find that it performs well under my specific evaluation method. But when I port it to PyTorch, I can't achieve the same results. I have checked the model architecture, the weight initialization method, the learning-rate schedule, the weight decay, the momentum and epsilon used in the BN layers, the optimizer, and the data preprocessing. Everything is the same, but I can't get the same results as in TensorFlow. Has anybody met the same problem?
I did a similar conversion recently.
First you need to make sure that the forward path produces the same results: disable all randomness, initialize both models with the same values, feed them a very small input, and compare the outputs. If there is a discrepancy, disable parts of the network and compare again, enabling layers one by one.
Once the forward path is confirmed, check the loss, gradients, and parameter updates after one forward-backward cycle.
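A rough sketch of that forward-path comparison, assuming tf_model and torch_model are the two implementations with identical weights and classification-style outputs (the tensor layouts and the tolerance are illustrative):

    import numpy as np
    import tensorflow as tf
    import torch

    np.random.seed(0)
    x = np.random.rand(1, 3, 8, 8).astype(np.float32)  # tiny deterministic input (NCHW)

    # Keras model expects NHWC; training=False disables dropout and BN updates.
    tf_out = tf_model(np.transpose(x, (0, 2, 3, 1)), training=False).numpy()

    torch_model.eval()  # puts dropout and BatchNorm into inference mode
    with torch.no_grad():
        torch_out = torch_model(torch.from_numpy(x)).numpy()

    # With matching weights and architecture the difference should be tiny (~1e-5).
    print(np.max(np.abs(tf_out - torch_out)))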

Keras reports highly inconsistent loss between 2 optimization runs [closed]

I am trying to classify images by whether they contain an object: Yes or No.
I first run 50 epochs with the RMSprop optimizer, then continue with a second run of 50 more epochs using the SGD optimizer.
My first run ends with a loss of ~0.4 and the model is saved. The second run starts from the saved model.
The problem is that at the start of the second run, Keras shows a loss of ~0.8 at epoch 1.
Why could this happen with the same loss function?
Your new SGD optimizer starts from scratch for that model: the moment estimates and the adaptive learning rates accumulated by RMSprop are forgotten.
Thus, there is indeed a high chance that this new compilation starts badly.
You can try restarting with a lower learning rate, and also try adding momentum to the SGD, as in the sketch below.
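A hedged sketch of that suggestion in Keras (the file name, loss, and hyperparameter values are placeholders for the asker's actual settings):

    import tensorflow as tf

    # Weights from the first (RMSprop) run; its moving averages are not restored.
    model = tf.keras.models.load_model("model_after_rmsprop.h5")

    # Start the second run gently: lower learning rate plus momentum.
    sgd = tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9)
    model.compile(optimizer=sgd, loss="binary_crossentropy", metrics=["accuracy"])

    # model.fit(train_ds, validation_data=val_ds, epochs=50)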

Any way to manually make a variable more important in a machine learning model? [closed]

Sometimes you know, from experience or expert knowledge, that some variable will play a key role in the model. Is there a way to manually make that variable count more, so that training speeds up and the method can incorporate some human knowledge/wisdom/intelligence?
I still think machine learning combined with human knowledge is the strongest weapon we have now.
This might work by scaling your input data accordingly.
On the other hand, the strength of a neural network is to figure out from the data which features are in fact important and which combinations with other features matter.
You might argue that you will decrease training time. Somebody else might argue that you are biasing your training in a way that could even make it take longer.
Anyway, if you want to do this, assuming a fully connected layer, you could initialize the weights connected to the input feature you consider important with larger values (sketched below).
Another way could be to first pretrain the model with a training loss that uses your feature as an output, then keep the weights and switch to the actual loss. I have never tried this, but it could work.
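An illustrative sketch of the first two ideas (scaling an input column, or enlarging the initial weights connected to it), assuming tabular input where column 0 is the feature believed to be important; the factor 5.0 and the layer size are arbitrary:

    import numpy as np
    import tensorflow as tf

    def emphasize_feature(X, column=0, factor=5.0):
        # Option 1: scale the chosen input column so it dominates early gradients.
        X = X.copy()
        X[:, column] *= factor
        return X

    # Option 2: enlarge the initial weights leaving that input.
    n_features = 10
    dense = tf.keras.layers.Dense(16)
    dense.build((None, n_features))
    kernel, bias = dense.get_weights()
    kernel[0, :] *= 5.0          # row 0 = weights connected to input feature 0
    dense.set_weights([kernel, bias])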

validation accuracy of convolutional neural network [closed]

Hi, I'm new to deep learning and convolutional neural networks. Could someone please explain the problem in the figure below? Someone told me that the fluctuation of validation accuracy is the problem here, but I don't quite understand the negative effect of this fluctuation. Why don't we just look at the last point of the figure?
(figure: training and validation accuracy curves, with the validation accuracy fluctuating)
When training a deep learning model you have to validate it, which means showing unseen data to the algorithm.
The validation accuracy can therefore be lower than the training accuracy, because of a scenario called overfitting, where the training algorithm fits the training data too closely and does not generalize well to other, unseen data.
As for the fluctuation, it can be normal: training and evaluating the algorithm are stochastic processes, so some noise in the validation curve is expected.
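On the last point of the question: rather than judging the model by the final epoch of a noisy curve, a common approach is to keep the weights from the best validation epoch and to smooth the curve before reading a trend. A minimal Keras sketch, assuming the model was compiled with metrics=["accuracy"]:

    import numpy as np
    import tensorflow as tf

    callbacks = [
        # Save the weights from the epoch with the highest validation accuracy.
        tf.keras.callbacks.ModelCheckpoint("best.h5", monitor="val_accuracy",
                                           save_best_only=True),
        # Stop early and roll back to the best weights if no improvement for 10 epochs.
        tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=10,
                                         restore_best_weights=True),
    ]
    # history = model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)

    # A moving average makes the fluctuating validation curve easier to judge:
    # val_acc = np.array(history.history["val_accuracy"])
    # smoothed = np.convolve(val_acc, np.ones(5) / 5, mode="valid")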