Custom gradient in TensorFlow attempts to convert model to tensor

I am trying to use the output of one neural network to compute the loss value for another network. As the first network is approximating another function (L2 distance) I would like to provide the gradients myself, as if it had come from an L2 function.
An example of my loss function in simplified code is:
@tf.custom_gradient
def loss_function(model_1_output):
    def grad(dy, variables=None):
        gradients = 2 * pred
        return gradients
    pred = model_2(model_1_output)
    loss = pred ** 2
    return loss, grad
This is called in a standard tensorflow 2.0 custom training loop such as:
with tf.GradientTape() as tape:
    model_1_output = model_1(training_data)
    loss = loss_function(model_1_output)
gradients = tape.gradient(loss, model_1.trainable_variables)
optimizer.apply_gradients(zip(gradients, model_1.trainable_variables))
However, whenever I try to run this I keep getting the error:
ValueError: Attempt to convert a value (<model.model_2 object at 0x7f41982e3240>) with an unsupported type (<class 'model.model_2'>) to a Tensor.
The whole point of using the tf.custom_gradient decorator is that I don't want model_2 in the loss function to be included in backpropagation, since I supply the gradients manually.
How can I make TensorFlow completely ignore everything inside the loss function, so that I could, for example, use non-differentiable operations? I have tried using with tape.stop_recording(), but I always end up with a "no gradients found" error.
OS: Ubuntu 18.04
tensorflow: 2.0.0
python: 3.7
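For reference, a minimal sketch of how tf.custom_gradient is typically applied, assuming TensorFlow 2.x in eager mode. The name squared_with_manual_grad and the toy input are hypothetical, and tf.square merely stands in for the call to model_2. It illustrates two points: the decorated function receives tensors as arguments, and the inner grad function multiplies the hand-written derivative by the upstream gradient dy.

```python
import tensorflow as tf


@tf.custom_gradient
def squared_with_manual_grad(x):
    # Forward pass: tf.square is a stand-in for whatever computes the loss.
    y = tf.square(x)

    def grad(dy):
        # Hand-written derivative d(x^2)/dx = 2x, chained with the upstream dy.
        return dy * 2.0 * x

    return y, grad


x = tf.constant([1.0, -2.0, 3.0])
with tf.GradientTape() as tape:
    tape.watch(x)  # x is a plain tensor, so it has to be watched explicitly
    loss = tf.reduce_sum(squared_with_manual_grad(x))

grads = tape.gradient(loss, x)
print(grads.numpy())
```

If the real code passes a model object as a positional argument of the decorated function, that might explain the conversion error, because those arguments are treated as tensors. This is only a guess, since the full code is not shown.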


Calculating gradients in custom training loop, difference in performance TF vs Torch

I have attempted to translate a PyTorch implementation of a NN model, which calculates forces and energies in molecular structures, to TensorFlow. This needed a custom training loop and a custom loss function, so I implemented the two different one-step training functions below.
First using Nested Gradient Tapes.
def calc_gradients(D_train_batch, E_train_batch, F_train_batch, opt):
    #set up gradient tape scope in order to track gradients of both d(Loss)/d(Weights)
    #and d(output)/d(input)
    with tf.GradientTape() as tape1:
        with tf.GradientTape() as tape2:
            #set gradient tape to watch Tensor
            tape2.watch(D_train_batch)
            #pass D thru model to get predicted energy vals
            E_pred = model(D_train_batch, training=True)
        df_dD_train_batch = tape2.gradient(E_pred, D_train_batch)
        #matrix mult of -Grad_D(f) x Grad_r(D)
        F_pred = -tf.einsum('ijkl,il->ijk', dD_dr_train_batch, df_dD_train_batch)
        #calculate loss value
        loss = force_energy_loss(E_pred, F_pred, E_train_batch, F_train_batch)
    grads = tape1.gradient(loss, model.trainable_weights)
    opt.apply_gradients(zip(grads, model.trainable_weights))
The other attempt uses a single gradient tape with persistent=True:
def calc_gradients_persistent(D_train_batch, E_train_batch, F_train_batch, opt):
    #set up gradient tape scope in order to track gradients of both d(Loss)/d(Weights)
    #and d(output)/d(input)
    with tf.GradientTape(persistent=True) as outer:
        #set gradient tape to watch Tensor
        outer.watch(D_train_batch)
        #output values from model, set training to be true to get
        #model.trainable_weights out
        E_pred = model(D_train_batch, training=True)
        #set gradient tape to watch trainable weights
        outer.watch(model.trainable_weights)
        #get gradient of output (f/E_pred) w.r.t input (D/D_train_batch) and cast to double
        df_dD_train_batch = outer.gradient(E_pred, D_train_batch)
        #matrix mult of -Grad_D(f) x Grad_r(D)
        F_pred = -tf.einsum('ijkl,il->ijk', dD_dr_train_batch, df_dD_train_batch)
        #calculate loss value
        loss = force_energy_loss(E_pred, F_pred, E_train_batch, F_train_batch)
    #get gradient of loss w.r.t. trainable weights for back propagation
    grads = outer.gradient(loss, model.trainable_weights)
    #updates weights using the optimizer and the gradients (grads)
    opt.apply_gradients(zip(grads, model.trainable_weights))
These were attempted translations of the following PyTorch code:
# Forward pass: Predict energies from the descriptor input
E_train_pred_batch = model(D_train_batch)
# Get derivatives of model output with respect to input variables. The
# torch.autograd.grad-function can be used for this, as it returns the
# gradients of the input with respect to outputs. It is very important
# to set the create_graph=True in this case. Without it the derivatives
# of the NN parameters with respect to the loss from the force error
# will not be populated (=the force error will not affect the
# training), but the model will still run fine without errors.
df_dD_train_batch = torch.autograd.grad(
# Get derivatives of input variables (=descriptor) with respect to atom
# positions = forces
F_train_pred_batch = -torch.einsum('ijkl,il->ijk', dD_dr_train_batch, df_dD_train_batch)
# Zero gradients, perform a backward pass, and update the weights.
loss = energy_force_loss(E_train_pred_batch, E_train_batch, F_train_pred_batch, F_train_batch)
which is from the tutorial for the DScribe library.
With either version of the TF implementation there is a huge loss in prediction accuracy compared to the PyTorch version. Have I misunderstood the PyTorch code and translated it incorrectly, and if so, where is the discrepancy?
The model directly computes the energies E, and we use the gradient of E w.r.t. D to calculate the forces F. The loss function is a weighted sum of the MSEs of the forces and the energies.
These methods are in fact equivalent; my error was somewhere else, and it was causing the differing results. For anyone who is trying to implement the TensorFlow versions: the nested gradient tapes are about 2x faster, at least in this scenario. Also make sure to wrap the functions in @tf.function so that they run as graphs rather than eagerly; the speed-up is about 10x.
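For anyone who wants to try the nested-tape pattern in isolation, here is a small self-contained sketch wrapped in @tf.function. Everything in it is an assumption made for illustration: the two-layer network built from raw tf.Variable weights, the random data, the plain gradient-descent update, and the simplification that the forces are just -dE/dD (that is, dD/dr is taken to be the identity, so no einsum is needed). It is not the model from the question.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical tiny network E(D) = tanh(D @ W1) @ W2, standing in for the real model.
W1 = tf.Variable(tf.random.normal([3, 8], stddev=0.5))
W2 = tf.Variable(tf.random.normal([8, 1], stddev=0.5))
weights = [W1, W2]


def energy(D):
    return tf.matmul(tf.tanh(tf.matmul(D, W1)), W2)


@tf.function
def train_step(D, E_true, F_true):
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            inner.watch(D)  # D is a plain tensor, so watch it explicitly
            E_pred = energy(D)
        # Computed inside the outer tape so the force term stays differentiable
        # with respect to the weights.
        dE_dD = inner.gradient(E_pred, D)
        F_pred = -dE_dD
        loss = (tf.reduce_mean(tf.square(E_pred - E_true))
                + tf.reduce_mean(tf.square(F_pred - F_true)))
    grads = outer.gradient(loss, weights)
    # Plain gradient-descent update; an optimizer could be used here instead.
    for w, g in zip(weights, grads):
        w.assign_sub(0.01 * g)
    return loss, grads


D = tf.random.normal([16, 3])
E_true = tf.reduce_sum(tf.square(D), axis=1, keepdims=True)
F_true = -2.0 * D

loss, grads = train_step(D, E_true, F_true)
print(float(loss))
```

The important detail is that inner.gradient is called while the outer tape is still recording; otherwise the force error would not contribute to the weight gradients.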

How to evaluate the value of a tensor, from inside the model function of a custom tf.estimator

I am implementing an NLP model based on BERT, using tf.TPUEstimator(). I want to implement layer-wise training, where I select only one layer of the model to train in each epoch. To do this, I want to change my model_fn and get the value of current_epoch.
I know how to compute the value of current_epoch as a tensor using tf.train.get_or_create_global_step() inside the model_fn, BUT I need to evaluate this tensor in order to select which layer to train and to return the correct train_op to the tf.estimator (the train_op pertaining to the single layer chosen according to the value of current_epoch).
I am unable to evaluate this tensor (current_epoch / global_step) from inside the model_fn. I tried the following, but the training hangs at this step:
global_step = tf.train.get_or_create_global_step()
graph = tf.get_default_graph()
my_sess = tf.Session(graph=graph)
current_epoch = (global_step * full_bs) // train_size
current_epoch = my_sess.run(current_epoch)
# My program hangs at the initialising step:
Is there any way to evaluate a tensor using the tf.Estimator's default session? How do I get the default session/graph?
Most importantly, what is wrong in my code, and why does the training hang when using TPUs and TPUEstimator?
This is not a direct answer to the OP's second question; it is an answer to the title.
I managed to print a variable's value with get_variable_value, but I am not sure whether this is the optimal way.
estimator = tf.contrib.tpu.TPUEstimator(
    # ...
)
out = estimator.get_variable_value('output_bias')
I got
<class 'numpy.ndarray'>
[-0.00107745 0.00107744]

InvalidArgumentError: In[0] is not a matrix. Instead it has shape []

I'm not able to train the network using Keras; I get the following error at epoch 1, first batch:
InvalidArgumentError: In[0] is not a matrix. Instead it has shape []
[[{{node training/SGD/gradients/dense_1/MatMul_grad/MatMul}}]]
I'm trying to solve a regression problem using Keras and a custom function provided by
The network is a fairly simple VGG-like CNN.
I think the problem is the loss function. In particular, I suspect that the weight initialization is the issue (take a look at the TensorFlow example).
That's my loss function:
def custom_loss(y_true, y_pred):
    loss = SE3GeodesicLoss(np.ones((1, 6)))
    y_pred = tf.cast(y_pred, dtype=tf.float32)
    y_true = tf.cast(y_true, dtype=tf.float32)
    loss = SE3GeodesicLoss(np.ones(6))
    geodesic_loss = loss.geodesic_loss(y_pred, y_true)
    geodesic_loss = tf.cast(geodesic_loss, dtype=tf.float32)
    return geodesic_loss
What's strange is that I'm able to use this function as a metric for the training.
Further information:
What I'm trying to do is estimate the pose of an object, with images as input and the relative Euler angles and distance of the target as labels (which means 6 parameters [r_x, r_y, r_z, t_x, t_y, t_z]). I'm trying to implement this loss function in order to solve the attitude estimation problem. Other losses (such as MSE and MAE) are not effective enough for the attitude regression problem.
Do you have any suggestions?

Tensorflow: optimize over input with gradient descent

I have a TensorFlow model (a convolutional neural network) which I successfully trained using gradient descent (GD) on some input data.
Now, in a second step, I would like to provide an input image as initialization and then optimize over this input image with fixed network parameters using GD. The loss function will be a different one, but this is a detail.
So, my main question is how to tell the gradient descent algorithm to
stop optimizing the network parameters
to optimize over the input image
The first can probably be done with this:
Holding variables constant during optimizer
Do you guys have ideas about the second point?
I guess I can recode the gradient descent algorithm myself using the TF gradient function, but my gut feeling tells me that there should be an easier way, which also allows me to benefit from more complex GD variants (Adam etc.).
No need for your own SGD implementation. TensorFlow provides all the functions you need:
import tensorflow as tf
import numpy as np
# some input
data_pldhr = tf.placeholder(tf.float32)
img_op = tf.get_variable('input_image', [1, 4, 4, 1], dtype=tf.float32, trainable=True)
img_assign = img_op.assign(data_pldhr)
# your starting image
start_value = (np.ones((4, 4), dtype=np.float32) + np.eye(4))[None, :, :, None]
# override variable_getter
def nontrainable_getter(getter, *args, **kwargs):
    kwargs['trainable'] = False
    return getter(*args, **kwargs)
# all variables in this scope are not trainable
with tf.variable_scope('myscope', custom_getter=nontrainable_getter):
    x = tf.layers.dense(img_op, 10)
    y = tf.layers.dense(x, 10)
# the usual stuff
cost_op = tf.losses.mean_squared_error(x, y)
train_op = tf.train.AdamOptimizer(0.1).minimize(cost_op)
# fire up the training process
with tf.Session() as sess:
    # initialise all variables, then load the starting image into the input variable
    sess.run(tf.global_variables_initializer())
    sess.run(img_assign, {data_pldhr: start_value})
    for i in range(10):
        _, c = sess.run([train_op, cost_op])
represent an image as tf.Variable with trainable=True
initialise this variable with the starting image (initial guess)
recreate the NN graph using TF variables with trainable=False and copy the weights from the trained NN graph using tf.assign
calculate the loss function
plug the loss into any TF optimiser algorithm you want
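The steps above can be sketched in TensorFlow 2.x eager style as follows. This is only an illustration under stated assumptions: the "network" is a single frozen weight matrix W held as a plain tensor, the target and the step size are made up, and plain gradient descent via assign_sub replaces a full optimizer to keep the example self-contained.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Frozen stand-in for the trained network: a plain tensor, not a tf.Variable,
# so nothing here can be updated during the optimization.
W = tf.random.normal([16, 4])
target = tf.constant([[1.0, 0.0, 0.0, 0.0]])

# The image is the only trainable quantity, initialised with the starting guess.
image = tf.Variable(tf.ones([1, 4, 4, 1]))


def network(x):
    return tf.matmul(tf.reshape(x, [1, 16]), W)


def loss_fn():
    return tf.reduce_mean(tf.square(network(image) - target))


initial_loss = float(loss_fn())
for _ in range(100):
    with tf.GradientTape() as tape:
        loss = loss_fn()
    grad = tape.gradient(loss, image)  # gradient w.r.t. the input image only
    image.assign_sub(0.01 * grad)      # gradient-descent step on the image
final_loss = float(loss_fn())

print(initial_loss, final_loss)
```

To use Adam or another variant, the assign_sub line could be replaced by an optimizer's apply_gradients call that is given only the (grad, image) pair.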
Another alternative is to use ScipyOptimizerInterface, which lets you use SciPy's minimizers. It also supports constrained minimization.
I'm looking for a solution to the same problem, but my model is not a simple one: it is an LSTM network with cells created with MultiRNNCell, and I don't think it is possible to get the weights and clone the network. Is there any workaround so that I can compute the gradient w.r.t. the input?

How do I get the gradient of the loss at a TensorFlow variable?

The feature I'm after is to be able to tell what the gradient of a given variable is with respect to my error function given some data.
One way to do this would be to see how much the variable has changed after a call to train, but obviously that can vary massively based on the learning algorithm (for example it would be almost impossible to tell with something like RProp) and just isn't very clean.
Thanks in advance.
The tf.gradients() function allows you to compute the symbolic gradient of one tensor with respect to one or more other tensors—including variables. Consider the following simple example:
data = tf.placeholder(tf.float32)
var = tf.Variable(...) # Must be a tf.float32 or tf.float64 variable.
loss = some_function_of(var, data) # some_function_of() returns a `Tensor`.
var_grad = tf.gradients(loss, [var])[0]
You can then use this symbolic gradient to evaluate the gradient at some specific point (data):
sess = tf.Session()
var_grad_val = sess.run(var_grad, feed_dict={data: ...})
In TensorFlow 2.0 you can use GradientTape to achieve this. GradientTape records the operations executed inside its context so that gradients can be computed from them afterwards. Below is an example of how you might do that.
import tensorflow as tf
# The neural network weights go here as tf.Variable objects
x = tf.Variable(3.0)
# TensorFlow operations executed within the context of
# a GradientTape are recorded for differentiation
with tf.GradientTape() as tape:
    # Do the computation in the context of the gradient tape,
    # for example computing the loss
    y = x ** 2
# Get the gradient of the loss w.r.t. the network weights
dy_dx = tape.gradient(y, x)
print(dy_dx)  # tf.Tensor(6.0, shape=(), dtype=float32)
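The same idea extends to several variables and actual data. The sketch below uses a made-up one-sample linear model (the names w, b, x and y_true are purely illustrative) and asks the tape for the gradient of the loss with respect to each variable in one call.

```python
import tensorflow as tf

# Hypothetical linear model y = x @ w + b with hand-picked values.
w = tf.Variable([[1.0], [2.0]])
b = tf.Variable(0.5)

x = tf.constant([[3.0, 4.0]])   # one data sample
y_true = tf.constant([[10.0]])  # its label

with tf.GradientTape() as tape:
    y_pred = tf.matmul(x, w) + b                       # 3*1 + 4*2 + 0.5 = 11.5
    loss = tf.reduce_mean(tf.square(y_pred - y_true))  # (11.5 - 10)^2 = 2.25

# Passing a list of variables returns one gradient per variable, in the same order.
dw, db = tape.gradient(loss, [w, b])
print(dw.numpy())  # [[ 9.] [12.]]
print(db.numpy())  # 3.0
```

Variables are watched automatically, so no tape.watch call is needed here; it is only required for plain tensors.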