I used tf.keras.callbacks.ReduceLROnPlateau to reduce the learning rate during the training and then saved the weights of my model. After I load the weights again, it seems Tensorflow remembers the last learning rate used. I tried to reset the learning rate in the optimizer's definition bytf.keras.optimizers.RMSprop(learning_rate = 0.001), however this does not work and the Tensorflow still uses the last used learning rate. Then I tried to set the minimum learning rate in tf.keras.callbacks.ReduceLROnPlateauto what I want but this does not work either.
How can I change the learning rate of a model that I saved its weights?
Related
I am working on cnn model which has 4 conv layers and 3 dense layers. dataset have around 28000 images and 7000 test images. The model has saved checkpoints and I have trained it several times and achieved 60 % accuracy so far, and while training learning rate is reduced to 2.6214403e-07 (as i used ReduceLROnPlateau factor 0.4). I have question if I increased the learning rate say 1e-4. and resumed the training how will it effect my model? Is It a good idea?
accuracy vs epoch
If your learning curve plateaus immediately and doesn't change much beyond the initial few epochs (as in your case), then your learning rate is too low. While you can resume training with higher learning rates, it would likely render any progress of the initial epochs meaningless. Since you typically only decrease the learning rate between epochs, and given the slow initial progress of your network, you should simply retrain with an increased initial learning rate until you see larger changes in the first few epochs. You can then identify the point of convergence by whenever overfitting happens (test accuracy goes down while train accuracy goes up) and stop there. If this point is still "unnecessarily late", you can additionally reduce the amount that the learning rate decays to make faster progress between epochs.
I'm trying to study the performace of YOLOv2. I have implemented the YOLOv2 in tensorflow with the backbone of mobilenet.
After setting the learning rate=0, weight decay=0, BN's is_training=False, the loss of test dataset is still changing unexpectedly.
I just want to know if there is any other possible parameter may influence the test loss.
I am setting up an object detection pipeline based on recently released tensorflow object detection API. I am using the arXiv as guidance. I am looking to understand the below for training on my own dataset.
It is not clear how they selected the learning rate schedules and how that would change based on the number of GPUs available for training. How do the training rate schedule change based on number of GPU's available for training? The paper mentions 9 GPUs are used. How should I change the training rate if I only want to use 1 GPU?
The released sample training config file for Pascal VOC using Faster R-CNN has initial learning rate = 0.0001. This is 10x lower than what was published in the original Faster-RCNN paper. Is this due to an assumption on the number of GPU's available for training or due to a different reason?
When I start training from the COCO detection checkpoint, how should the training loss decrease? Looking at tensorboard, on my dataset training loss is low - between 0.8 to 1.2 per iteration (with batch size of 1). Below image shows the various losses from tensorboard. . Is this expected behavior?
For questions 1 and 2: our implementation differs in a few small details compared to the original paper and internally we train all of our detectors with asynchronous SGD with ~10 GPUs. Our learning rates are calibrated for this setting (which you will also have if you decide to train via Cloud ML Engine as in the Pets walkthrough). If you use another setting, you will have to do a bit of hyperparameter exploration. For a single GPU, leaving the learning rate alone probably won't hurt performance, but you may be able to get faster convergence by increasing it.
For question 3: Training losses decrease erratically and you can only see the decrease if you smooth the plots quite a bit over time. Moreover, it's hard to explicitly say how well you are doing with respect to eval metrics just by looking at the training losses. I recommend looking at the mAP plots over time as well as the image visualizations to really get an idea of whether your model has "lifted off".
Hope this helps.
I am working on a deep learning (CNN + AEs) approach on facial images.
I have
an input layer of 112*112*3 of facial images
3 convolution + max pooling + ReLU
2 layers of fully connected with 512 neurons with 50% dropout to
avoid overfitting and last output layer with 10 neurons since I have
10 classes.
also used reduce mean of softmax cross entropy and also L2.
For training I divided my dataset to 3 groups of:
60% for training
20% for validation
20% for evaluation
The problem is after few epochs the validation error rate stay fixed value and never changes. I have used tensorflow to implement my project.
I hadn't such problem before with CNNs so I think it's first time. I have checked the code it's based on tensorflow documentation so I don't think if the problem is with the code. Maybe I need to change some parameters but I am not sure.
Any idea about common solutions for such problem?
Update:
I changed the optimizer from momentum to Adam whith default learning rate. For now validation error changes but it's lower than mini batch error most of the time while both have same batch sizes.
I have tested the model with and without biases with 0.1 as initial values but no good fit yet.
Update
I fixed the issue I will update with more details soon.
One common solution that I found helpful for this type of problem is using TensorBoard. You can add details visualize training performance information after each epoch for different points in the computational graph. Adding key metrics is worth it since you can see how training progresses after applying changes in the adaptive learning rate, batch size, neural network architecture, drop out / regularization, number of GPUs, etc.
Here is the link that I found helpful to add these details:
https://www.tensorflow.org/how_tos/graph_viz/#runtime_statistics
From various examples of Tensorflow (translation, ptb) it seems like that you need to explicitly change learning rate when using GradientDescentOptimizer. But is it the case while using some more 'sophisticated' techniques like Adagrad, Adadelta etc. Also when we continue training the model from a saved instance, are the past values used by these optimizers saved in the model file ?
It depends on the Optimizer you are using. Vanilla SGD needs (accepts) individual adaption of the learning rate. Some others do. Adadelta for example does not. (https://arxiv.org/abs/1212.5701)
So this depends not so much on Tensorflow but rather on the mathematical background of the optimizer you are using.
Furthermore: Yes, saving and restarting the training does not reset the learning rates, but continuous at the point saved.