The gradient in the update_step() function in Keras - tensorflow

In tf.Keras , the function update_step() is used by an optimizer to update the variables at every iteration given the variable and the gradient. I want to know if this gradient is the full gradient of all the samples in the batch or the gradient for a sample in the batch?
thank you

Related

tensorflow gradient return none for custom loss function

I implement a model with a custom training loop and custom loss function. Loss function return a float value. But while calculate gradient tf.gradient(loss,model.trainable_weights), It gives None gradient. I know the issue is way of calculate loss. I tried with custom mse loss. It works fine. I want to implement loss function like count_predition_0's+count_predition_1's/label_0's+label_1's. It's a binary classification problem. I set batch size is 100. So model return 100 batch output. I only consider few batch eg out of 100 batch i only consider or filter it out 40 based on input. Both label and prediction in same shape that is not issue here. That label and prediction pass to custom_loss function.
Notebook link

How to get gradients during fit or fit_generator in Keras

I need to monitor the gradients in real time during training when using fit or fit_generator methods. This should have been achieved by using custom callback function. However, I don't how to access the gradients correctly. The attribute model.optimizer.update returns tensors of gradients but it need to be fed with data. What I want to get is the value of gradients that have been applied in the last batch during training.
The following answer does not give the corresponding solution because it just define a function to calculate the gradients by feeding extra data.
Getting gradient of model output w.r.t weights using Keras

Custom external loss metric for Gradient Optimizer?

I have an external function which takes y and y_prediction (in matrix format), and computes a metric which depicts how good or bad the prediction actually is.
Unfortunately the metric is no simple y - ypred or confusion matrix, but still very useful and important. How can I use this number computed for the loss or as an argument for optimizer.minimize?
If i understood correctly i think there is two way to do this:
Either the loss you want to compute can be writen as tensorflow ops which gradient is defined (for exemple SVD has no gradient defined in tensorflow library saddly) then the optimisation is direct.
Or you can always write your loss function with numpy operators and use tf.py_func() https://www.tensorflow.org/api_docs/python/tf/py_func and then you have to explicit the gradient by hand as said in here : How to make a custom activation function with only Python in Tensorflow?
But you have to know an explicit formula of your gradient ...

How to cancel BP in some layers in tensorflow?

when I try to fine-tune a VGG network, I only want to update the weights after 5th convolution layers ,in caffe , we can cancel BP in configure file. What should I do in tensorflow ? thanks !
Just use tf.stop_gradient() on the input of your 5th layer. Tensorflow will not backpropagate the error below. tf.stop_gradient() is an operation that acts as the identity function in the forward direction, but stops the gradient in the backward direction.
From documentation:
tf.stop_gradient
Stops gradient computation.
When executed in a graph, this op outputs its input tensor as-is.
When building ops to compute gradients, this op prevents the
contribution of its inputs to be taken into account. Normally, the
gradient generator adds ops to a graph to compute the derivatives of a
specified 'loss' by recursively finding out inputs that contributed to
its computation. If you insert this op in the graph it inputs are
masked from the gradient generator. They are not taken into account
for computing gradients.
Otherwise you can use optimizer.minimize(loss, variables_of_fifth_layer). Here you are running backpropagation and updating only on the variables of your 5th layer.
For a fast selection of the variables of interest you could:
Define as trainable=False all the variables that you don't want to update, and use variables_of_fifth_layer=tf.trainable_variables().
Divide layers by defining specific scopes and then variables_of_fifth_layer = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,"scope/of/fifth/layer")

Implementing gradient descent in TensorFlow instead of using the one provided with it

I want to use gradient descent with momentum (keep track of previous gradients) while building a classifier in TensorFlow.
So I don't want to use tensorflow.train.GradientDescentOptimizer but I want to use tensorflow.gradients to calculate gradients and keep track of previous gradients and update the weights based on all of them.
How do I do this in TensorFlow?
TensorFlow has an implementation of gradient descent with momentum.
To answer your general question about implementing your own optimization algorithm, TensorFlow gives you the primitives to calculate the gradients, and update variables using the calculated gradients. In your model, suppose loss designates the loss function, and var_list is a python list of TensorFlow variables in your model (which you can get by calling tf.all_variables or tf.trainable_variables, then you can calculate the gradients w.r.t your variables as follows :
grads = tf.gradients(loss, var_list)
For the simple gradient descent, you would simply subtract the product of the gradient and the learning rate from the variable. The code for that would look as follows :
var_updates = []
for grad, var in zip(grads, var_list):
var_updates.append(var.assign_sub(learning_rate * grad))
train_op = tf.group(*var_updates)
You can train your model by calling sess.run(train_op). Now, you can do all sorts of things before actually updating your variables. For instance, you can keep track of the gradients in a different set of variables and use it for the momentum algorithm. Or, you can clip your gradients before updating the variables. All these are simple TensorFlow operations because the gradient tensors are no different from other tensors that you compute in TensorFlow. Please look at the implementations (Momentum, RMSProp, Adam) of some the fancier optimization algorithms to understand how you can implement your own.