I am loading the InceptionV3 Keras net with a TensorFlow backend. After loading saved weights and setting the trainable flag of all the layers to False, I try to fit the model and expect everything to be stable. But the validation loss increases (and accuracy decreases) with each epoch, while the training loss and accuracy are indeed stable, as expected.
Can someone explain this strange behavior? I suspect it is related to the batch normalization layers.
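For reference, roughly what the setup looks like (a minimal sketch; the saved-weights file, num_classes, and the training/validation arrays are placeholders):

```python
import numpy as np
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model
from keras.layers import GlobalAveragePooling2D, Dense

num_classes = 10  # placeholder

# Build InceptionV3 without the top and add a small classification head.
base = InceptionV3(weights=None, include_top=False, input_shape=(299, 299, 3))
x = GlobalAveragePooling2D()(base.output)
out = Dense(num_classes, activation='softmax')(x)
model = Model(base.input, out)

model.load_weights('my_saved_weights.h5')  # placeholder: previously trained weights

# Freeze every layer, then recompile so the change takes effect.
for layer in model.layers:
    layer.trainable = False
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# With everything frozen one would expect train and validation metrics to stay flat;
# the suspicion is that BatchNormalization layers behave differently in training
# vs. inference mode, which could explain the drift in validation metrics.
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
```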
I had the same problem and it looks like I found the solution. Check it out here
I'm struggling to understand how a Keras model works.
When we train a model, we pass metrics (like ['accuracy']) and a loss function (like cross-entropy) as arguments.
What I want to know is which of these the model actually optimizes.
After fitting, does the learned model maximize accuracy, or minimize loss?
The model optimizes the loss; metrics are only there for your information and for reporting results.
https://en.wikipedia.org/wiki/Loss_function
Note that metrics are optional, but you must provide a loss function to do training.
You can also evaluate a model on metrics that were not added during training.
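A small illustration (the model and data here are placeholders): only the loss passed to compile is minimized; the metrics are just computed and reported, and you can compile again with extra metrics to evaluate an already-trained model on them:

```python
# The loss ('categorical_crossentropy') is what gradient descent minimizes;
# 'accuracy' is only computed and reported each epoch.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_split=0.1)

# Re-compiling keeps the trained weights, so you can attach metrics that were
# never used during training and evaluate the model on them afterwards.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy', 'top_k_categorical_accuracy'])
print(model.evaluate(x_test, y_test))
```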
Keras models work by minimizing the loss, adjusting the trainable model parameters via backpropagation. Metrics such as training accuracy and validation accuracy are provided as information, but they can also be used to improve your model's performance through Keras callbacks. Documentation for that is located here. For example, the callback ReduceLROnPlateau (documentation is here) can be used to monitor a metric like validation loss and reduce the model's learning rate if the loss fails to decrease for a certain number (the patience parameter) of consecutive epochs.
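For instance, a minimal ReduceLROnPlateau setup monitoring validation loss might look like this (the factor, patience, and data are arbitrary placeholders):

```python
from keras.callbacks import ReduceLROnPlateau

# Halve the learning rate if val_loss has not improved for 3 consecutive epochs.
reduce_lr = ReduceLROnPlateau(monitor='val_loss',
                              factor=0.5,
                              patience=3,
                              min_lr=1e-6,
                              verbose=1)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=50,
          callbacks=[reduce_lr])
```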
What is the difference between rescaling and not rescaling images for predicting using a tf.keras Resnet50 pre-trained on ImageNet?
Is it necessary? How much of an impact does it have on the predictions?
It is the difference between the model working as expected and not working at all. If you do not apply the same normalization that was applied to the training set, the model usually behaves strangely, for example always producing the same output, which is undesirable.
So always use the exact same scaling and normalization that were used to train the model.
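Concretely, for a tf.keras ResNet50 pre-trained on ImageNet, the matching preprocessing ships with the model; a rough sketch (the image path is a placeholder):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = ResNet50(weights='imagenet')

# Load an image at the size the network was trained on (224x224 for ResNet50).
img = image.load_img('example.jpg', target_size=(224, 224))
x = np.expand_dims(image.img_to_array(img), axis=0)

# preprocess_input applies the same normalization used during ImageNet training
# (for ResNet50: channel reordering and mean subtraction), so predictions match
# what the pretrained weights expect.
x = preprocess_input(x)

preds = model.predict(x)
print(decode_predictions(preds, top=3))
```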
I trained the default TensorFlow-Slim models, like MobileNetV1 and InceptionV2, on ImageNet from scratch.
The loss decreases from ~7 to ~2, so training seems to be going well.
But the validation accuracy reported by eval_image_classifier.py is around 0.
The checkpoints trained from scratch are the ones used for the validation accuracy check.
When I instead use the TensorFlow-provided pretrained checkpoints of MobileNetV1 to check validation accuracy, the accuracy is about 70%, as claimed on the site.
I also trained Darknet-19 from scratch and its validation accuracy is around 60%, and I added batch normalization to VGG16 and trained it on ImageNet; its validation accuracy is also above 50%.
Could anyone tell me why the default Slim models like MobileNetV1 and InceptionV2, trained from scratch, show around 0 accuracy on validation?
I am trying to train a Faster R-CNN network with the Inception-v3 architecture (reference paper: Google's paper) as my fixed feature extractor, using Keras, on my own dataset (number of classes = 4), which is very different from ImageNet. Still, I initialized it with ImageNet weights, because this paper gives evidence that initializing with pre-trained weights is always better than random initialization.
After training for 60 epochs, my training accuracy is at 96% and my validation accuracy is at 84%. Over-fit! (severe, maybe?) But what is more worrying is that my loss did not converge at all. Upon testing, the network failed miserably; it didn't even detect anything.
Then I took a slightly different approach and did a two-step training. First I trained Inception-v3 on my dataset as a classification problem (still initialized with ImageNet weights), and it converged well. Then I used those weights to initialize the Faster R-CNN network. This worked! But I am confused why this two-stage approach works while training in one go didn't, given that I initialized both methods with the pre-trained ImageNet weights.
Is there a way to train Faster R-CNN from scratch?
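For reference, a rough outline of the two-stage initialization described above (the training data and the Faster R-CNN builder function are placeholders, not actual keras-frcnn code):

```python
from keras.applications.inception_v3 import InceptionV3
from keras.layers import GlobalAveragePooling2D, Dense
from keras.models import Model

# Stage 1: fine-tune Inception-v3 as a plain classifier on the 4-class dataset,
# starting from ImageNet weights.
base = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
x = GlobalAveragePooling2D()(base.output)
out = Dense(4, activation='softmax')(x)
classifier = Model(base.input, out)
classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
classifier.fit(x_train, y_train, epochs=20, validation_split=0.1)  # placeholder data
classifier.save_weights('inception_v3_finetuned.h5')

# Stage 2: build the Faster R-CNN model whose feature extractor uses the same
# Inception-v3 layer names, then load the fine-tuned weights by name so the
# detector starts from features already adapted to the target domain.
frcnn_model = build_faster_rcnn_with_inception_v3_backbone()  # placeholder builder
frcnn_model.load_weights('inception_v3_finetuned.h5', by_name=True)
```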
I'm using the pretrained tensorflow inception v3 model and transfer learning to do some image classification on a new image training set I have. I'm following the instructions laid out here:
https://www.tensorflow.org/versions/r0.8/how_tos/image_retraining/index.html
However, I'm getting some severe overfitting (training accuracy is in the high 90s but CV/test accuracy is in the 50s).
Besides doing some image augmentation to try to increase my training sample size, I was wondering if adding some dropout in the retrain phase might help.
I am using this file (that came with tensorflow) as the base/template for my retraining/transfer learning:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
Looking at the inception v3 model, dropout is in there. However, I don't see any dropout added in the retrain.py file.
Does it make sense that I could try to add dropout to the retraining to solve my overfitting? If so, where would I add that? If not, why?
Thanks
From Max's comment above, which was a good answer:
Max got some good improvement by adding dropout to the retrain.py source. If you want to try it, you can reference his forked script. It has some additional updates, but the main part you should look at starts on line 784.
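For illustration only (this is not Max's exact patch, just the general idea of applying dropout to the cached bottleneck features before the final retrained layer; it is written in TF1-style graph code to match retrain.py, and the sizes, names, and keep probability are placeholders):

```python
import tensorflow as tf  # TF1-style graph code, matching the era of retrain.py

BOTTLENECK_SIZE = 2048   # size of the Inception v3 bottleneck features
NUM_CLASSES = 5          # placeholder class count

# Cached bottleneck features and labels, fed the way retrain.py feeds them.
bottleneck_input = tf.placeholder(tf.float32, [None, BOTTLENECK_SIZE], name='BottleneckInput')
ground_truth = tf.placeholder(tf.int64, [None], name='GroundTruthInput')

# New: dropout on the bottleneck features, controlled by a keep_prob placeholder
# so it can be set to e.g. 0.5 while training and 1.0 during evaluation.
keep_prob = tf.placeholder(tf.float32, name='dropout_keep_prob')
dropped = tf.nn.dropout(bottleneck_input, keep_prob)

# The retrained final layer, built on the dropped-out features instead of the
# raw bottleneck tensor.
layer_weights = tf.Variable(tf.truncated_normal([BOTTLENECK_SIZE, NUM_CLASSES], stddev=0.001))
layer_biases = tf.Variable(tf.zeros([NUM_CLASSES]))
logits = tf.matmul(dropped, layer_weights) + layer_biases
final_tensor = tf.nn.softmax(logits, name='final_result')

cross_entropy = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=ground_truth, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

# During training:  feed_dict[keep_prob] = 0.5
# During eval/test: feed_dict[keep_prob] = 1.0
```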