TensorFlow: reducing the learning rate of a saved model

I am working on a CNN model with 4 conv layers and 3 dense layers. The dataset has around 28,000 training images and 7,000 test images. The model saves checkpoints, I have trained it several times, and I have reached 60% accuracy so far. During training the learning rate has been reduced to 2.6214403e-07 (I used ReduceLROnPlateau with factor 0.4). My question: if I increase the learning rate to, say, 1e-4 and resume training, how will it affect my model? Is it a good idea?
[Plot: accuracy vs. epoch]

If your learning curve plateaus almost immediately and barely changes beyond the first few epochs (as in your case), then your learning rate is too low. You can resume training with a higher learning rate, but that would likely throw away most of the progress from the earlier epochs. Since the learning rate is typically only decreased between epochs, and given how slow your network's initial progress is, it is simpler to retrain from scratch with a higher initial learning rate until you see larger changes in the first few epochs. You can then identify the point of convergence as the point where overfitting sets in (test accuracy goes down while training accuracy keeps going up) and stop there. If that point still arrives unnecessarily late, you can additionally reduce how much the learning rate decays, so that progress between epochs is faster.
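For concreteness, here is a minimal sketch of how such a restart could look in Keras; the checkpoint path, the dataset objects (train_ds, val_ds) and all hyperparameter values are placeholders, not values from the question:

    import tensorflow as tf

    # Load the previously saved model (the path is an assumed placeholder).
    model = tf.keras.models.load_model("checkpoints/my_cnn.h5")

    # Reset the optimizer's learning rate to a larger value before resuming.
    model.optimizer.learning_rate.assign(1e-3)

    # A gentler ReduceLROnPlateau (factor 0.5 instead of 0.4) with a floor,
    # so the learning rate cannot collapse back to ~1e-7.
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=3, min_lr=1e-5)

    model.fit(train_ds, validation_data=val_ds, epochs=30,
              callbacks=[reduce_lr])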

Related

CNN model's train_loss goes down well but val_loss changes a lot

I'm using Keras (TensorFlow) to train my own CNN model.
As shown in the chart, the train_loss goes down nicely, but the val_loss varies a lot from epoch to epoch.
What could be the reason, and what should I do to improve it?
This is typical behavior when training in deep learning. Think about it: your optimization target is the training loss, so it is directly affected by the training process and, as you said, "goes down well". The validation loss is only affected indirectly, so it is naturally more volatile in comparison.
During training, the model is trying to estimate the real distribution of the data, but all it has to rely on is the distribution of the training dataset, which is similar but not the same.
The big spike at the end of your loss curve might be the result of overfitting. If you are not already using a decaying learning rate during training, I would suggest adding one.
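For reference, one way to add a decaying learning rate in Keras is to attach a schedule to the optimizer; the constants and the model object below are assumptions for illustration, not the asker's setup:

    import tensorflow as tf

    # Exponentially decay the learning rate every 1,000 training steps.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3,  # assumed starting value
        decay_steps=1000,
        decay_rate=0.96)

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])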

Validation Loss Never Decreases

I am training a neural network for regression with TensorFlow, and getting strange behaviour on my loss curves. The task is to predict the motion of an object in an image, when an action is applied to the object. So the network takes in an image, and an action, and outputs the motion.
The image input is followed by three CNN layers, and in parallel, the action input is followed by a dense layer. These are then concatenated, and followed by two dense layers, before the output. All layers have ReLUs. Data is normalised to have zero mean and standard deviation of one.
Below is the training curve:
The strange behaviour is that, whilst the training loss decreases over time, the validation loss increases right from the start. Usually, when the training curve decreases far below the validation curve, this indicates overfitting. However, in my case, the validation curve never actually decreases at all. Usually, overfitting can be diagnosed when the validation curve drops and then rises again.
Instead, it is as if the network is overfitting right from the very first epoch. In fact, the validation curve seems to follow the opposite trajectory to the training curve. Every improvement in the training prediction results in an opposite effect on the validation prediction.
I have also tried varying the step size (I am using Adam, and in this graph, the step size is 0.0001, then reduces to 0.00001 at epoch 100). My network uses dropout on all the dense layers. I have also tried reducing the number of parameters in the network to prevent overfitting, but the same behaviour occurs. I have a batch size of 50.
What could be the diagnosis of this behaviour? Is the network overfitting, or is something else going on? If it is overfitting, why do my attempts to reduce the number of parameters and add dropout still result in the same effect? And why does the overfitting occur immediately, without the validation loss ever decreasing?
Thank you!
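For readers trying to reproduce this setup, a rough Keras functional-API sketch of the architecture described above (an image branch with three conv layers, an action branch with one dense layer, concatenation, then two dense layers with dropout) might look like the following; all input shapes and layer sizes are assumptions, not the asker's actual values:

    import tensorflow as tf
    from tensorflow.keras import layers

    image_in = tf.keras.Input(shape=(64, 64, 3), name="image")  # assumed size
    action_in = tf.keras.Input(shape=(4,), name="action")       # assumed size

    # Image branch: three conv layers with ReLUs.
    x = layers.Conv2D(32, 3, activation="relu")(image_in)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.Flatten()(x)

    # Action branch: one dense layer.
    a = layers.Dense(32, activation="relu")(action_in)

    # Concatenate, then two dense layers with dropout, then the motion output.
    h = layers.Concatenate()([x, a])
    h = layers.Dense(128, activation="relu")(h)
    h = layers.Dropout(0.5)(h)
    h = layers.Dense(64, activation="relu")(h)
    h = layers.Dropout(0.5)(h)
    motion_out = layers.Dense(2, name="motion")(h)  # linear output for regression

    model = tf.keras.Model([image_in, action_in], motion_out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")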

Indication of overfitting

I'm training an image recognition model using Inception and transfer learning, based on the TensorFlow for Poets tutorial.
I have it running for 500k steps, looking for the optimum number of steps before overtraining starts. The TensorBoard image below shows my training accuracy steadily rising, while validation accuracy plateaus around 70k steps. My understanding was that validation accuracy would start going down once it started overtraining.
What would be my optimum number of steps in the below chart? 70k steps or 260k?
It is crystal clear that you are overfitting your model. To solve the overfitting problem there are several solutions:
1) Early stopping.
2) Regularization.
3) Reducing your model's VC dimension by reducing the number of layers or the number of units per layer.
4) Augmenting your dataset.
5) Applying transfer learning.
For your case, you can try early stopping. The best number of iterations according to your graph is 60K.
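In Keras, early stopping can be wired in as a callback; a minimal sketch (the patience value and the model/dataset objects are assumptions, not part of the original retraining script) looks like this:

    import tensorflow as tf

    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=5,                 # stop after 5 epochs without improvement
        restore_best_weights=True)  # roll back to the best validation weights

    model.fit(train_ds, validation_data=val_ds, epochs=100,
              callbacks=[early_stop])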

TensorFlow DNN intermittent error

I am using a TensorFlow DNN classifier to recognize emotions in images. The training accuracy I get is around 80%. However, if I run the application again (fully train and test), I occasionally get a really low test accuracy, around 25%, without changing any code or the dataset.
I understand that the initial weights are randomized in DNN classifiers, but that alone should not produce such a large difference in test accuracy.
I am using 23 features and have tested with varying dataset sizes (50 to 1,000 images). The intermittent low test accuracy always shows up.
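One way to rule out weight initialization and data shuffling as the source of the run-to-run variance is to pin the random seeds before each run; this sketch assumes TensorFlow 2.x and NumPy, not the asker's exact classifier API:

    import random
    import numpy as np
    import tensorflow as tf

    SEED = 42  # arbitrary fixed seed
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

If the low-accuracy runs disappear with fixed seeds, the problem is sensitivity to initialization or to how the train/test split is shuffled rather than the model itself.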

Odd results for Image Recognition using AlexNet in Deep Learning

I am using a modified AlexNet (the cifar-10 model) available in the TensorFlow tutorials to do image recognition on some mechanical part images, but I am getting very weird results.
The training accuracy very quickly reaches 100%, but the testing accuracy starts as high as 45% and then drops very fast, to as low as 9%.
I am training on 20,000 images and testing on 2,500 images across 8 categories. I do training and testing in batches of size 1,024.
The accuracy and training loss are shown below, and you can see that:
The testing accuracy starts as high as 45%, which doesn't make sense.
The mechanical images are always classified as 'left bracket'
[Plots: accuracy curves and classification results]
Your testing accuracy is decreasing; I think this is happening because of overfitting. Try using a simpler model or a regularization method to tune it.
You might want to check your data or feature extraction for errors. I did protein structure prediction with 3 labels, but I was using a wrong extraction method: my validation accuracy also started at 45% and then fell quickly.
Knowing where my errors were, I started from scratch; now I do protein structure prediction with 8 labels. The accuracy from the first epoch is 60% and rises steadily to 64.9% (the current Q8 world record for CB513 is 68.9%).
So validation accuracy starting at 45% is not a problem, but falling quickly is. I'm afraid you have an error somewhere in your data or extraction rather than just overfitting.
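A quick way to act on that advice is to sanity-check the labels and class balance before tuning the model; the file path and label format below are hypothetical:

    import numpy as np

    # Hypothetical path to the training labels (integer class ids).
    labels = np.load("train_labels.npy")

    # Print how many samples each class has; a heavily skewed distribution
    # would explain everything being classified as 'left bracket'.
    classes, counts = np.unique(labels, return_counts=True)
    for c, n in zip(classes, counts):
        print(f"class {c}: {n} samples")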