Why does a very simple port of the official Keras MNIST example to TensorFlow 2.x result in a massive drop in accuracy?

Here is the MNIST example from the Keras documentation:
https://keras.io/examples/mnist_cnn/
I put it into Google Colab, under TensorFlow 1.x, and it performs really well:
https://colab.research.google.com/drive/15NW-lXhRUxqSCCygVxddXCo5ID7yF2iL
I made very simple changes to make it run under TF 2.x:
https://colab.research.google.com/drive/1ul-eFn1XRe9ta3cu5vHchaa4DxStRda_
It completely crushes performance! Accuracy drops like a rock!
What did I do wrong?

The difference is in the optimizers. tf.keras.optimizers.Adadelta defaults to a learning rate of 0.001, while keras.optimizers.Adadelta defaults to a learning rate of 1.0.
Check the keras.optimizers and tf.keras.optimizers.Adadelta documentation for more details. In particular, the TensorFlow page mentions that Adadelta is supposed to use a learning rate of 1.0 to match the original paper.
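If you want the TF 2.x notebook to match the original example's behavior, one option is to set the learning rate explicitly (a minimal sketch; model here stands for the CNN compiled in the example):

import tensorflow as tf

# tf.keras defaults Adadelta to learning_rate=0.001; the standalone Keras
# example effectively used 1.0, so set it explicitly.
optimizer = tf.keras.optimizers.Adadelta(learning_rate=1.0)
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=optimizer,
              metrics=['accuracy'])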

Related

Keras getting frozen when using regularizer in CNN model

I had a custom CNN implemented in Keras running with the TensorFlow backend. To improve generalizability I was working on adding regularization to the CNN model. The model works fine without any activity/kernel regularization. The moment I add an activity/kernel regularizer, the model freezes; training typically stops partway through an epoch (e.g., at batch 67/172). The issue is very repeatable and reproducible on my system, and I was able to localize it to the regularization. It was strange to see this behavior and I could not find similar issues reported by others. I am not sure whether I need to provide any additional information; if someone can tell me what is missing, I would be more than happy to provide it, and any guidance on the issue would be greatly appreciated.
The following is some information about the environment and dependencies:
Keras 2.4.3
Tensorflow 2.3.1
GPU: NVIDIA 1070 TI (8GB)
cudart64_101.dll was successfully opened
The code was written in Spyder running on Python 3.8
Input: 32 batch size, input size (32, 256,64,1)
Using model.fit function to train the model
100,277 parameters, 99,523 trainable
Actually, I think this issue was fixed after I updated the NVIDIA software to the latest version (11.1) and added the most recent directories to the path.
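For reference, this is roughly the kind of change that triggered the freeze, i.e. adding kernel/activity regularizers to a Keras layer (a hypothetical sketch, not the actual model; the L2 factors and layer sizes are illustrative):

import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu',
                  kernel_regularizer=regularizers.l2(1e-4),    # penalty on the weights
                  activity_regularizer=regularizers.l2(1e-5),  # penalty on the activations
                  input_shape=(256, 64, 1)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])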

What effects should tensorflow.compat.v1.disable_v2_behavior() have on training using the Keras API?

I have a CNN that trains, on a few hundred thousand examples, to a validation accuracy of ~95% after one epoch. It's straightforward code, using Keras to define the network with the Sequential API. Originally I prepared and used this model on TF 1.3. When I port it over to TF 2.1, replacing the keras calls with tensorflow.keras, it gets to ~60% quickly and gets stuck there (seemingly for many epochs), and the training loss always seems to converge to the same value.
If I add in tf.disable_v2_behavior() at the top of the script, it trains similarly to before.
The documentation states simply that "It switches all global behaviors that are different between TensorFlow 1.x and 2.x to behave as intended for 1.x". Hidden behind the Keras API, I haven't found a clear answer to what this really means in practice. Why should I expect a VGG-like CNN, defined using Keras and trained with model.fit(), to work well without v2 behaviour but to fail so consistently with it?
Edit: disable_eager_execution() produces the same result, with improved performance.
Please try disabling eager execution and see if that helps. Add this to the top of your script:
tf.compat.v1.disable_eager_execution()
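A minimal sketch of where the call goes; it has to run before any graphs, ops, or tf.keras models are created (the tiny model below is only a placeholder):

import tensorflow as tf

# Must be called before any graphs, ops, or tensors are created.
tf.compat.v1.disable_eager_execution()

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,)),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])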

Is there a Tensorflow or Keras equivalent to fastai's interp.plot_top_losses?

Is there a Tensorflow or Keras equivalent to fastai's interp.plot_top_losses? If not, how can I manually obtain the predictions with the greatest loss?
Thank you.
I found the answer: it is ktrain! It comes with a learning rate finder, learning rate schedules, ready-to-use pre-trained models, and many more features inspired by fastai.
https://github.com/amaiya/ktrain
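If you would rather do it manually in tf.keras, you can compute per-example losses and sort them. A sketch, assuming a classification model with integer labels and validation arrays x_val / y_val:

import numpy as np
import tensorflow as tf

probs = model.predict(x_val)                       # predicted class probabilities
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
    reduction=tf.keras.losses.Reduction.NONE)      # keep one loss value per example
per_example_loss = loss_fn(y_val, probs).numpy()
top_idx = np.argsort(per_example_loss)[::-1][:9]   # indices of the 9 worst predictions
print(top_idx, per_example_loss[top_idx])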

Why do I get different training results when using keras and tf.keras?

I was using TensorFlow 1.13 and Keras for my research projects. Recently, due to some future warnings, I installed TensorFlow 2.0 and tried to use it.
Instead of using Keras as I did before, I used tf.keras and built the same RNN model, i.e.
from keras.layers import Dense  (what I used before)
vs.
from tensorflow.keras.layers import Dense  (what I tried now)
All other code is the same. However, I get somewhat worse results with the tf.keras.layers import, and I am pretty sure it's not a coincidence: I tried cross-validation and ran the models many times.
Does anyone have any ideas about why this happens? Are there any differences between tf.keras.layers and keras.layers? If so, what should we watch out for to avoid such "worse" results?
tf.keras is TensorFlow's implementation of the Keras API. Ideally, using tf.keras should not give you worse results. However, there might be a mismatch between the versions of the two Keras implementations, which may or may not give you different results. You can check the version via tf.keras.__version__ and see whether it is the same version of Keras that you used before.
For more details refer:
https://www.tensorflow.org/guide/keras/overview
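A quick way to compare the versions in both setups (a sketch; the exact version strings depend on your installation):

import tensorflow as tf
import keras

print(tf.__version__)        # e.g. 2.0.0
print(tf.keras.__version__)  # version of the Keras API bundled with TensorFlow
print(keras.__version__)     # version of the standalone Keras package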

Optimizers in Tensorflow

From various TensorFlow examples (translation, ptb) it seems that you need to explicitly change the learning rate when using GradientDescentOptimizer. Is that also the case when using more 'sophisticated' techniques like Adagrad, Adadelta, etc.? Also, when we continue training the model from a saved instance, are the past values used by these optimizers saved in the model file?
It depends on the optimizer you are using. Vanilla SGD needs (accepts) individual adaptation of the learning rate. Some others do too; Adadelta, for example, does not (https://arxiv.org/abs/1212.5701).
So this depends not so much on Tensorflow but rather on the mathematical background of the optimizer you are using.
Furthermore: yes, saving and restarting the training does not reset the learning rates or the optimizers' accumulated values; training continues from the point at which it was saved.
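For the TF1-style optimizers used in those examples, the accumulators are ordinary variables, so a standard Saver checkpoints and restores them along with the weights. A rough sketch, assuming a loss tensor is already defined:

import tensorflow as tf

optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss)  # creates per-variable accumulator slots

# Saver() covers all global variables by default, including the optimizer's
# slot variables, so restoring a checkpoint also restores that state.
saver = tf.train.Saver()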