Adam Optimizer in CNTK BrainScript - cntk

Using Adam seems to be possible via the Python and C# APIs, per the documentation.
However, the BrainScript documentation doesn't list Adam as one of the options for gradUpdateType.
Is it possible to use the Adam optimizer in BrainScript?

Sorry, the Adam optimizer is implemented in CNTK V2 only. In BrainScript, you can use FsAdagrad, which is close to Adam.
Thanks,
Emad
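
To illustrate what the answer implies: if you need Adam itself you must move to the V2 (Python) API, while in BrainScript you would stay with gradUpdateType = "FsAdaGrad" in the SGD block. Below is a minimal sketch of Adam in the CNTK 2.x Python API; the toy model, schedule values, and data are illustrative assumptions, not a definitive recipe.

```python
# Minimal sketch: Adam via the CNTK V2 Python API (assumes CNTK 2.2+;
# the tiny model and hyperparameter values are illustrative only).
import cntk as C
import numpy as np

x = C.input_variable(2)
y = C.input_variable(1)
z = C.layers.Dense(1)(x)          # toy model
loss = C.squared_error(z, y)

lr = C.learning_parameter_schedule(0.001)                      # step size
learner = C.adam(z.parameters, lr=lr,
                 momentum=C.momentum_schedule(0.9),            # beta1
                 variance_momentum=C.momentum_schedule(0.999)) # beta2
trainer = C.Trainer(z, (loss, loss), [learner])

features = np.random.rand(32, 2).astype(np.float32)
targets = np.random.rand(32, 1).astype(np.float32)
trainer.train_minibatch({x: features, y: targets})
```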

Related

TensorFlow optimisation during model run to speed up Predict

I want to disable the computation of several filters during a Predict call with TensorFlow 2 and Keras.
Do I have to modify the source code of TensorFlow to achieve that?
Short answer: no, you don't have to modify the TensorFlow source code.
Long answer with example detailed here.
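
Since the linked long answer is not reproduced here, the following is only one plausible approach (an assumption, not necessarily the linked answer's method): multiply the conv layer's output by a non-trainable mask so that selected filters contribute nothing at predict time.

```python
# Hedged sketch (TF 2 / Keras): disable selected conv filters at predict
# time by zeroing their outputs with a non-trainable mask. This is one
# possible approach, not necessarily the one from the linked answer.
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(28, 28, 1))
conv = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)

# One entry per filter: 1.0 keeps the filter, 0.0 disables it.
filter_mask = tf.Variable(np.ones(8, dtype=np.float32), trainable=False)
masked = tf.keras.layers.Lambda(lambda t: t * filter_mask)(conv)

outputs = tf.keras.layers.GlobalAveragePooling2D()(masked)
model = tf.keras.Model(inputs, outputs)

filter_mask.assign([1, 1, 0, 0, 1, 1, 1, 1])  # disable filters 2 and 3
preds = model.predict(np.random.rand(4, 28, 28, 1).astype(np.float32))
print(preds.shape)
```

Note that masking zeroes the filter outputs but still computes them, so it does not by itself speed up Predict; actually skipping the computation would require rebuilding the layer with fewer filters, e.g. by slicing the kernel weights.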

Is there any TF implementation of the Original BERT other than Google and HuggingFace?

I'm trying to find any TensorFlow/Keras implementation of the original BERT model trained using MLM/NSP. The official Google and HuggingFace implementations are very complex and have so many added functionalities. I want to implement BERT myself just to learn how it works.
Any leads would be helpful.
As mentioned in the comment, you can try the following implementation of MLP-BERT TensorFlow. It's a simplified version and comparatively easy to follow.

Is it really necessary to tune/optimize the learning rate when using ADAM optimizer?

Is it really necessary to optimize the initial learning rate when using ADAM as the optimizer in tensorflow/keras? How can this be done (in TensorFlow 2.x)?
It is. As with any hyperparameter, an optimal learning rate should be searched for. Your model may fail to learn if the learning rate is too big or too small, even with an optimizer like ADAM, which has nice properties regarding decay etc.
An example of how a model behaves under the ADAM optimizer with respect to the learning rate can be seen in the article How to pick the best learning rate for your machine learning project.
Searching for the right hyperparameters is called hyperparameter tuning. I am not using TF 2.x in my projects, so I will give a reference to what TensorFlow itself offers: Hyperparameter Tuning with the HParams Dashboard. A minimal grid search over learning rates is sketched below.
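
The sketch below runs a small grid search over Adam learning rates and keeps the one with the best validation loss; the toy model, random data, and candidate grid are illustrative assumptions.

```python
# Hedged sketch: grid-searching Adam's learning rate in tf.keras.
# Model, data, and the candidate grid are illustrative assumptions.
import numpy as np
import tensorflow as tf

x = np.random.rand(256, 10).astype(np.float32)
y = np.random.randint(0, 2, size=(256, 1)).astype(np.float32)

def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

results = {}
for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
    model = build_model()
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy")
    hist = model.fit(x, y, epochs=5, validation_split=0.2, verbose=0)
    results[lr] = hist.history["val_loss"][-1]

print(results, "best:", min(results, key=results.get))
```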

Is there a Tensorflow or Keras equivalent to fastai's interp.plot_top_losses?

Is there a TensorFlow or Keras equivalent to fastai's interp.plot_top_losses? If not, how can I manually obtain the predictions with the greatest loss?
Thank you.
I found the answer: it is ktrain! It comes with a learning rate finder, learning rate schedules, ready-to-use pre-trained models, and many more features inspired by fastai.
https://github.com/amaiya/ktrain
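
For the second part of the question, the examples with the greatest loss can also be found manually by computing an unreduced per-example loss and sorting it. A hedged tf.keras sketch follows; the untrained toy model and random data are stand-ins for your trained model and real validation set.

```python
# Hedged sketch: rank validation examples by per-example loss in tf.keras.
# The untrained toy model and random data are placeholders; substitute
# your trained model and real validation set.
import numpy as np
import tensorflow as tf

x_val = np.random.rand(100, 10).astype(np.float32)
y_val = np.random.randint(0, 3, size=(100,))
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(3, activation="softmax", input_shape=(10,))])

probs = model.predict(x_val)
# reduction="none" yields one loss value per example instead of the mean.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(reduction="none")
per_example_loss = loss_fn(y_val, probs).numpy()

top_k = 9
worst = np.argsort(per_example_loss)[::-1][:top_k]  # highest-loss indices
for i in worst:
    print(f"idx={i} loss={per_example_loss[i]:.3f} "
          f"pred={probs[i].argmax()} true={y_val[i]}")
```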

How do I choose an optimizer for my tensorflow model?

TensorFlow seems to have a large collection of optimizers; is there any high-level guideline (or review paper) on which one is best adapted to specific classes of loss functions?
It depends on your datasets and NN models, but generally I would start with Adam. Figure 2 in the Adam paper (http://arxiv.org/abs/1412.6980) shows that Adam works well.
You can also see a very nice animation at
http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html.
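
In practice, the cheapest guideline is to benchmark a few optimizers on your own task. A minimal sketch of such a comparison follows; the toy model, data, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch: comparing a few built-in tf.keras optimizers on the same
# toy task; the model, data, and hyperparameters are illustrative only.
import numpy as np
import tensorflow as tf

x = np.random.rand(256, 10).astype(np.float32)
y = np.random.randint(0, 2, size=(256, 1)).astype(np.float32)

for opt in [tf.keras.optimizers.SGD(0.01, momentum=0.9),
            tf.keras.optimizers.RMSprop(1e-3),
            tf.keras.optimizers.Adam(1e-3)]:
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=opt, loss="binary_crossentropy")
    hist = model.fit(x, y, epochs=5, verbose=0)
    print(type(opt).__name__, hist.history["loss"][-1])
```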