What is the difference between rescaling and not rescaling images for predicting using a tf.keras Resnet50 pre-trained on ImageNet?
Is it necessary? How much of an impact does it have on the predictions?
It is the difference between the model working as expected and not working at all. If you do not apply the same normalization that was applied to the training set, the model usually behaves strangely, for example always producing the same output, which is undesirable.
So always use the exact same scaling and normalization used to train a model.
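For the tf.keras ResNet50 specifically, that means running the raw images through the matching preprocess_input before calling predict(). A minimal sketch (the image path is a placeholder):

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = ResNet50(weights='imagenet')

# 'elephant.jpg' is a placeholder path; ResNet50 expects 224x224 inputs.
img = image.load_img('elephant.jpg', target_size=(224, 224))
x = image.img_to_array(img)        # raw pixel values in [0, 255]
x = np.expand_dims(x, axis=0)      # add the batch dimension
x = preprocess_input(x)            # the same normalization the ImageNet weights were trained with
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])
```

For ResNet50, preprocess_input converts RGB to BGR and subtracts the ImageNet channel means rather than rescaling to [0, 1], which is exactly what the pre-trained weights expect.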
I'm training a classification model with custom layers on top of BERT. During training, the performance of this model goes down with increasing epochs (after the first epoch). I'm not sure what to fix here: is it the model or the data?
(As for the data: the labels are binary, and the number of data points per label is balanced.)
Any quick pointers on what the problem could be? Has anyone come across this before?
Edit: It turns out there was a mismatch between the transformers library and the TF version I was using. Once I fixed that, the training performance was fine!
Thanks!
Remember that fine-tuning a pre-trained model like BERT usually requires far fewer epochs than training a model from scratch. In fact, the authors of BERT recommend between 2 and 4 epochs. Further training often translates into overfitting to your data and forgetting the pre-trained weights (see catastrophic forgetting).
In my experience, this especially affects small datasets, since it is easy to overfit on them, even at the 2nd epoch. Besides, you haven't commented on your custom layers on top of BERT, but adding much complexity there might also increase overfitting; note that the common architecture for text classification only adds a linear transformation.
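For reference, a minimal fine-tuning sketch along those lines (assuming a reasonably recent Hugging Face transformers release with TensorFlow support; the texts, labels, and hyperparameters are toy placeholders, not a recommendation):

```python
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Placeholder toy data standing in for a balanced binary dataset.
texts = ["a positive example", "a negative example"]
labels = [1, 0]
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors='tf')

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),                 # small LR typical for fine-tuning
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
model.fit(dict(enc), tf.constant(labels), epochs=3, batch_size=16)          # 2-4 epochs, per the BERT authors
```

The classification head here is just the single linear layer that TFBertForSequenceClassification adds on top of BERT, matching the common text-classification setup mentioned above.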
I used an LSTM in Keras for the task of Named Entity Recognition. The model achieves around 95% accuracy. Then I created the graphs for accuracy and loss. Now I would like to know: what do these two graphs represent?
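For context, these curves are usually read straight out of the History object returned by fit(). A minimal plotting sketch (assuming a history = model.fit(..., validation_data=...) call was made beforehand; the key names depend on the Keras version):

```python
import matplotlib.pyplot as plt

# 'history' stands for the object returned by model.fit(..., validation_data=...).
acc_key = 'accuracy' if 'accuracy' in history.history else 'acc'   # older Keras versions use 'acc'

plt.figure()
plt.plot(history.history[acc_key], label='training accuracy')
plt.plot(history.history['val_' + acc_key], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()

plt.figure()
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
```

Broadly, the accuracy curve shows the fraction of correct predictions per epoch and the loss curve shows the value of the objective being minimized; a widening gap between the training and validation curves usually signals overfitting.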
I am using Google's Dopamine framework to train a specific reinforcement learning use case. I am using an autoencoder to pre-train the convolutional layers of the Deep Q Network and then transfer those pre-trained weights to the final network.
To that end, I have created a separate model (in this case an autoencoder), which I train, and I save the resulting model and weights.
The DQN model is created using Keras's model sub-classing method, and the model used to save the trained convolutional-layer weights was built using the Sequential API. My issue arises when trying to load the pre-trained weights into my final DQN model. Depending on whether I use the load_model() or load_weights() functionality from TensorFlow's API, I get two different overall behaviors of my network, and I would like to understand why. Specifically, I have the two following scenarios, sketched in code below:
1. Loading the weights with the load_weights() method into the final model. The weights are those of the encoder plus one additional layer (added just before saving the weights) to fit the architecture of the final network implemented in Dopamine, where they are loaded.
2. First loading the saved model with load_model(), and then, when defining the new model in the __init__() method, extracting the relevant layers from the loaded model and using them in the final model.
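For concreteness, a rough sketch of the two loading paths under assumed names (SmallDQN, the file paths, and the layer slice are hypothetical placeholders, not the actual Dopamine code):

```python
import tensorflow as tf

class SmallDQN(tf.keras.Model):
    """Toy stand-in for the subclassed Dopamine DQN, for illustration only."""
    def __init__(self, num_actions, conv_layers=None):
        super().__init__()
        # Scenario 2 passes in layer objects taken from a loaded model; otherwise build fresh ones.
        self.convs = conv_layers or [
            tf.keras.layers.Conv2D(32, 8, strides=4, activation='relu'),
            tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
        ]
        self.flatten = tf.keras.layers.Flatten()
        self.q_values = tf.keras.layers.Dense(num_actions)

    def call(self, x):
        for conv in self.convs:
            x = conv(x)
        return self.q_values(self.flatten(x))

# Scenario 1: build the DQN's variables, then copy the saved encoder(+extra layer) weights into them.
dqn = SmallDQN(num_actions=4)
dqn(tf.zeros((1, 84, 84, 4)))                          # create the variables before loading
# dqn.load_weights('pretrained_encoder_weights.h5')    # placeholder path; topology must match the saved file

# Scenario 2: load the whole pre-trained Sequential model and hand its layer objects to the DQN,
# so the DQN shares the loaded variables instead of copying values into its own layers.
# pretrained = tf.keras.models.load_model('pretrained_encoder_model.h5')
# dqn2 = SmallDQN(num_actions=4, conv_layers=pretrained.layers[:2])
```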
Overall, I would expect the two approaches to yield similar results with regard to the average reward achieved per episode when using the same pre-trained weights. However, the two approaches differ (1. yields a higher average reward than 2., even though they use the same pre-trained weights), and I don't understand why.
Furthermore, in order to validate this behavior, I tried loading random weights with the two aforementioned approaches to see whether the behavior changes. In both cases, depending on which of the two loading methods I use, I end up with behavior very similar to the respective case where the trained weights are loaded. It seems like the pre-trained weights in each respective case have no effect on the overall training behavior. This might be irrelevant to the issue I am trying to investigate here, though, as it might simply be that the pre-trained weights don't offer any benefit overall, which is also possible.
Any thoughts and ideas on this would be much appreciated.
This is a more general version of a question I've already asked: Significant difference between outputs of deep tensorflow keras model in Python and tensorflowjs conversion
As far as I can tell, the layers of a tfjs model when run in the browser (so far only tested in Chrome and Firefox) will have small numerical differences in the output values when compared to the same model run in Python or Node. The cumulative effect of these small differences across all the layers of the model can cause fairly significant differences in the output. See here for an example of this.
This means a model trained in Python or Node will not perform as well in terms of accuracy when run in the browser. And the deeper your model, the worse it will get.
Therefore my question is, what is the best way to train a model to use with tfjs in the browser? Is there a way to ensure the output will be identical? Or do you just have to accept that there will be small numerical differences and, if so, are there any methods that can be used to train a model to be more resilient to this?
This answer is based on my personal observations. As such, it is debatable and not backed by much evidence. Some things that I follow to get the accuracy of 16-bit models close to that of 32-bit models are:
Avoid using activations that have small upper and lower bounds, such as sigmoid or tanh, for hidden layers. These activations cause the weights of the next layer to become very sensitive to small values, and hence, small changes. I prefer using ReLU for such models. Since it is now the standard activation for hidden layers in most models, you should be using it in any case.
Avoid weight decay and L1/L2 regularization on weights while training (the kernel_regularizer parameter in Keras), since these increase the sensitivity of the weights. Use Dropout instead; I didn't observe a major drop in performance on TFLite when using it in place of numerical regularizers. A toy model along these lines is sketched below.
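As an illustration of both points (the input dimension, layer sizes, and dropout rate are arbitrary placeholders): ReLU hidden layers, Dropout, and no kernel_regularizer anywhere.

```python
import tensorflow as tf

# Toy classifier following both tips: ReLU activations, Dropout, no weight penalties.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu', input_shape=(128,)),  # ReLU rather than sigmoid/tanh
    tf.keras.layers.Dropout(0.3),                                       # Dropout instead of L1/L2 penalties
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```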
I have searched for this for a while, but it seems Keras only offers quantization after the model is trained. I wish to add TensorFlow fake quantization to my Keras Sequential model. According to TensorFlow's documentation, I need these two functions to do fake quantization: tf.contrib.quantize.create_training_graph() and tf.contrib.quantize.create_eval_graph().
My question is: has anyone managed to add these two functions to a Keras model? If yes, where should these two functions be added? For example, before model.compile, after model.fit, or somewhere else? Thanks in advance.
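For what it's worth, one commonly cited pattern, not a verified recipe, is to rewrite the Keras backend graph after building the model and before compiling/training; this sketch assumes a TF 1.x build where tf.contrib is still available, with a toy architecture and placeholder training data:

```python
import tensorflow as tf
from tensorflow import keras

# Build the Keras model first (toy architecture for illustration).
model = keras.Sequential([
    keras.layers.Conv2D(32, 3, activation='relu', input_shape=(32, 32, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),
])

# Insert fake-quantization ops into the backend graph before compile/fit.
sess = keras.backend.get_session()
tf.contrib.quantize.create_training_graph(input_graph=sess.graph, quant_delay=0)
sess.run(tf.global_variables_initializer())   # also initializes the min/max variables added above

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=5)       # train as usual (x_train/y_train are placeholders)

# For export, rebuild the model in a fresh session and call
# tf.contrib.quantize.create_eval_graph(input_graph=eval_sess.graph) before freezing the graph.
```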
I worked around it with post-training quantization. Since my final goal is to train a model for mobile devices, instead of doing fake quantization during training, I exported the Keras .h5 file and converted it directly to a TensorFlow Lite .tflite file (with the post_training_quantize flag set to true). I tested this on a simple CIFAR-10 model. The original Keras model and the quantized TFLite model have very close accuracy (the quantized one is a bit lower).
Post-training quantization: https://www.tensorflow.org/performance/post_training_quantization
Convert Keras model to tensorflow lite: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/toco/g3doc/python_api.md
I used the tf-nightly build of TensorFlow: https://pypi.org/project/tf-nightly/
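Roughly, the conversion step looked like the following sketch (assuming a TF 1.x build where TFLiteConverter.from_keras_model_file and the post_training_quantize flag exist; the file paths are placeholders, and newer releases use converter.optimizations instead):

```python
import tensorflow as tf

# Convert the saved Keras .h5 model to a weight-quantized .tflite file.
converter = tf.lite.TFLiteConverter.from_keras_model_file('cifar10_model.h5')
converter.post_training_quantize = True
tflite_model = converter.convert()

with open('cifar10_model_quantized.tflite', 'wb') as f:
    f.write(tflite_model)
```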
If you still want to do fake quantization (because, according to Google, post-training quantization may give poor accuracy for some models), note that the original webpage went down last week, but you can find it on GitHub: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize
Update: it turns out that post-training quantization does not really quantize the model. During inference, it still uses float32 kernels to do the calculations. Thus, I've switched to quantization-aware training. The accuracy is pretty good for my CIFAR-10 model.