Tensorflow - No gradients provided for any variable while executing the mean_squared_error loss function - tensorflow

I am doing transfer learning with a pre-trained inception-resnet-v2 model. From one of the conv layers I extract the best activation (best quality) to calculate the predicted landmarks using opencv and numpy operations. The loss function I am applying is mean_squared_error. Unfortunately, when this function is executed I get an error message saying that no gradients are available for any of the variables. I have been struggling with this problem for two weeks and I don't know how to proceed. While debugging I could see that the problem occurs when the apply_gradients function gets executed internally. I have searched and tried some solutions from here, like these:
ValueError: No gradients provided for any variable in Tensorflow
selecting trainable variables to compute gradient "No variables to optimize"
Tensorflow: How to replace or modify gradient?
...
In addition, I have tried to write my own operation with gradient support, using this awesome tutorial: https://code-examples.net/en/q/253d718, because this solution wraps my python and opencv code in tensorflow. Unfortunately, the issue still remains. Tracing the path from the output of the network to the mean_squared_error function in TensorBoard, I can see that the path exists and is continuous.
# Extracts the best predicted images from a specific activation layer
# PYTHON function: get_best_images(...) -> uses numpy and opencv
# PYTHON function: extract_landmarks(...) -> uses numpy

# Endpoints is the conv layer that gets extracted
best_predicted = tf.py_func(get_best_images,
                            [input, end_points['Conv2d_1a_3x3']],
                            tf.uint8)  # Gets best activation
best_predicted.set_shape(input.shape)

# Gets the predicted landmarks and processes both target and predicted
# landmarks for further calculation
proc_landmarks = tf.py_func(get_landmarks,
                            [best_predicted, target_landmarks],
                            [tf.int32, tf.int32])
proc_landmarks[0].set_shape(target_landmarks.shape)  # target landmarks
proc_landmarks[1].set_shape(target_landmarks.shape)  # predicted landmarks

# --> HERE COMES THE COMPUTATION TO PROCESS THE TARGET AND PREDICTED LANDMARKS

# Flattens and reshapes the tensors to 1D (68, 1)
target_flatten = tf.reshape(target_result[0], [-1])
target_flatten = tf.reshape(target_flatten, [68, 1])
predicted_flatten = tf.reshape(predicted_result[1], [-1])
predicted_flatten = tf.reshape(predicted_flatten, [68, 1])
edit_target_landmarks = tf.cast(target_flatten, dtype=tf.float32)
edit_predicted_landmarks = tf.cast(predicted_flatten, dtype=tf.float32)

# Calculating the loss
mse_loss = tf.losses.mean_squared_error(labels=edit_target_landmarks,
                                        predictions=edit_predicted_landmarks)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001,
                                   name='ADAM_OPT').minimize(mse_loss)  # <-- here the error occurs
The error message is this one (only some of the variables are listed, for brevity):
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["<tf.Variable 'InceptionResnetV2/Conv2d_1a_3x3/weights:0' shape=(3, 3, 3, 32) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Conv2d_1a_3x3/BatchNorm/beta:0' shape=(32,) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Conv2d_2a_3x3/weights:0' shape=(3, 3, 32, 32) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Conv2d_2a_3x3/BatchNorm/beta:0' shape=(32,) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Conv2d_2b_3x3/weights:0' shape=(3, 3, 32, 64) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Conv2d_2b_3x3/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Conv2d_3b_1x1/weights:0' shape=(1, 1, 64, 80) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Conv2d_3b_1x1/BatchNorm/beta:0' shape=(80,) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Conv2d_4a_3x3/weights:0' shape=(3, 3, 80, 192) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Conv2d_4a_3x3/BatchNorm/beta:0' shape=(192,) dtype=float32_ref>", "<tf.Variable 'InceptionResnetV2/Mixed_5b/Branch_0/Conv2d_1x1/weights:0' shape=(1, 1, 192, 96) dtype=float32_ref>", ...
EDIT:
I have managed to compute the gradients for the first two variables of the train list using this guide: Override Tensorflow Backward-Propagation. I had forgotten the third parameter (referred to as the d parameter in the guide) in the forward and backward propagation functions; in my case this is the conv layer output of the net. Nevertheless, only the first two gradients get computed and all the others are missing. Do I have to compute and return a gradient in the backpropagation function for every trainable variable? If I understand correctly, in the backpropagation function we compute the derivatives with respect to the op's inputs, which in my case are two variables (target and predicted) and one for the conv layer output (i.e. return grad * op.inputs[0], grad * op.inputs[1], grad * op.inputs[2]). I thought the overall computation for all trainable variables happens after defining the custom gradient computation, when opt.compute_gradients is called with the variable list as its second parameter. Am I right or wrong?
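For reference, the general shape of the py_func-with-gradient pattern from that guide, as I understand it, is roughly the following. The names py_func_with_grad and _my_op_grad and the dummy zero gradients are placeholders for illustration, not my actual code:
import numpy as np
import tensorflow as tf

# Wraps a python function as a TF op and attaches a custom gradient to it
# (the py_func + gradient_override_map trick).
def py_func_with_grad(func, inp, Tout, name=None, grad=None):
    # Register the backward function under a unique name and map the
    # PyFunc op type onto it while the forward op is being created.
    grad_name = 'PyFuncGrad' + str(np.random.randint(0, 1 << 31))
    tf.RegisterGradient(grad_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({'PyFunc': grad_name}):
        return tf.py_func(func, inp, Tout, stateful=True, name=name)

# Dummy backward function: tf.gradients expects one gradient per op input,
# so a three-input op has to return three tensors (here just zeros of the
# matching shapes, purely to illustrate the requirement).
def _my_op_grad(op, grad):
    return (tf.zeros(tf.shape(op.inputs[0])),
            tf.zeros(tf.shape(op.inputs[1])),
            tf.zeros(tf.shape(op.inputs[2])))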
I have posted the part of the TensorBoard output for the mean_squared_error op. The image also shows an additional loss function which I had left out to simplify my problem; that loss function works well. The arrow from the mean_squared_error function to the gradient computation is missing because of the issue. I hope this gives a better overview.

Related

TensorFlow Batch Normalization Dimension

I'm trying to use batch normalization in a conv2d_transpose as follows:
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
And I am receiving the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 32 and 64
From merging shape 2 with other shapes. for 'tower0/AddN' (op: 'AddN') with input shapes: [?,32,32,64], [?,64,64,3].
I've seen that other people have had this error in Keras because of the difference in dimension ordering between TensorFlow and Theano. However, I'm using pure TensorFlow, all of my variables are in TensorFlow dimension format (batch_size, height, width, channels), and the data_format of the conv2d_transpose layer should be the default 'channels_last'. What am I missing here?
tf.layers.batch_normalization should be added as a layer, not as a regularizer. activity_regularizer is a function that takes the activity (the layer's output) and produces an extra loss term that is added to the overall loss of the whole network. For example, you might want to penalize networks that produce high activations. You can see how activity_regularizer is called on the outputs and its result added to the loss here.
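A minimal sketch of the suggested fix, assuming the inputs tensor and a boolean is_training placeholder already exist, could look like this:
import tensorflow as tf

# Batch normalization applied as its own layer on each conv output,
# rather than being passed as an activity_regularizer.
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME')
h1 = tf.layers.batch_normalization(h1, training=is_training)

h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME')
h2 = tf.layers.batch_normalization(h2, training=is_training)

# Note: with tf.layers.batch_normalization the update ops for the moving
# statistics must be run alongside the train op, e.g. via
# tf.get_collection(tf.GraphKeys.UPDATE_OPS).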

How to freeze/lock weights of one TensorFlow variable (e.g., one CNN kernel of one layer)

I have a TensorFlow CNN model that is performing well and we would like to implement this model in hardware, i.e., an FPGA. It's a relatively small network, but it would be ideal if it were smaller. With that goal, I've examined the kernels and found that there are some where the weights are quite strong and others that aren't doing much at all (the kernel values are all close to zero). This occurs specifically in layer 2, corresponding to the tf.Variable() named "W_conv2". W_conv2 has shape [3, 3, 32, 32]. I would like to freeze/lock the values of W_conv2[:, :, 29, 13] and set them to zero so that the rest of the network can be trained to compensate. Setting the values of this kernel to zero effectively removes/prunes the kernel from the hardware implementation, thus achieving the goal stated above.
I have found similar questions with suggestions that generally revolve around one of two approaches:
Suggestion #1:
tf.Variable(some_initial_value, trainable = False)
Implementing this suggestion freezes the entire variable. I want to freeze just a slice, specifically W_conv2[:, :, 29, 13].
Suggestion #2:
Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list)
Again, implementing this suggestion does not allow the use of slices. For instance, if I try the inverse of my stated goal (optimizing only a single kernel of a single variable) as follows:
Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list=W_conv2[:, :, 0, 0])
I get the following error:
NotImplementedError: ('Trying to optimize unsupported type ', <tf.Tensor 'strided_slice_2228:0' shape=(3, 3) dtype=float32>)
Slicing tf.Variables() isn't possible in the way that I've tried it here. The only thing that I've tried which comes close to doing what I want is using .assign() but this is extremely inefficient, cumbersome, and caveman-like as I've implemented it as follows (after the model is trained):
for _ in range(10000):
    # get a new batch of data
    # reset the values of W_conv2[:, :, 29, 13] = 0 each time through
    for m in range(3):
        for n in range(3):
            assign_op = W_conv2[m, n, 29, 13].assign(0)
            sess.run(assign_op)
    # re-train the rest of the network
    _, loss_val = sess.run([optimizer, loss], feed_dict={
        dict_stuff_here
    })
    print(loss_val)
The model was started in Keras then moved to TensorFlow since Keras didn't seem to have a mechanism to achieve the desired results. I'm starting to think that TensorFlow doesn't allow for pruning but find this hard to believe; it just needs the correct implementation.
A possible approach is to initialize these specific weights with zeros, and modify the minimization process such that gradients won't be applied to them. It can be done by replacing the call to minimize() with something like:
W_conv2_weights = np.ones((3, 3, 32, 32))
W_conv2_weights[:, :, 29, 13] = 0
W_conv2_weights_const = tf.constant(W_conv2_weights, dtype=tf.float32)

optimizer = tf.train.RMSPropOptimizer(0.001)

# Gradient for W_conv2 only, with the pruned kernel's entries zeroed out
W_conv2_orig_grads = tf.gradients(loss, [W_conv2])[0]
W_conv2_grads = tf.multiply(W_conv2_weights_const, W_conv2_orig_grads)
W_conv2_train_op = optimizer.apply_gradients([(W_conv2_grads, W_conv2)])

# Gradients for the rest of the variables, applied as usual
rest_grads = tf.gradients(loss, rest_of_vars)
rest_train_op = optimizer.apply_gradients(zip(rest_grads, rest_of_vars))

train_op = tf.group(rest_train_op, W_conv2_train_op)
That is:
Prepare a constant tensor for canceling the appropriate gradients.
Compute gradients only for W_conv2, then multiply element-wise with the constant W_conv2_weights_const to zero out the appropriate gradients, and only then apply them.
Compute and apply gradients "normally" to the rest of the variables.
Group the two train ops into a single training op (a brief usage sketch follows below).
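A short usage sketch of the grouped training op; the names x, y_ and training_batches are placeholders for whatever feed tensors and data source the model actually uses:
import numpy as np

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for batch_x, batch_y in training_batches:      # hypothetical data source
        sess.run(train_op, feed_dict={x: batch_x, y_: batch_y})
    # Since W_conv2[:, :, 29, 13] starts at zero and its gradients are masked,
    # it should remain zero after training.
    print(np.allclose(sess.run(W_conv2)[:, :, 29, 13], 0.0))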

Does K.function method of Keras with Tensorflow backend work with network layers?

I have recently started using Keras to build neural networks. I built a simple CNN to classify the MNIST dataset. Before training the model I used K.set_image_dim_ordering('th') in order to plot a convolutional layer's weights. Right now I am trying to visualize the convolutional layer output with the K.function method, but I keep getting an error.
Here is what I want to do for now:
input_image = X_train[2:3, :, :, :]
output_layer = model.layers[1].output
input_layer = model.layers[0].input
output_fn = K.function(input_layer, output_layer)
output_image = output_fn.predict(input_image)
print(output_image.shape)

output_image = np.rollaxis(np.rollaxis(output_image, 3, 1), 3, 1)
print(output_image.shape)

fig = plt.figure()
for i in range(32):
    ax = fig.add_subplot(4, 8, i + 1)
    im = ax.imshow(output_image[0, :, :, i], cmap="Greys")
    plt.xticks(np.array([]))
    plt.yticks(np.array([]))
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([1, 0.1, 0.05, 0.8])
fig.colorbar(im, cax=cbar_ax)
plt.tight_layout()
plt.show()
And this is what I get:
File "/home/kinshiryuu/anaconda3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1621, in function
return Function(inputs, outputs, updates=updates)
File "/home/kinshiryuu/anaconda3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1569, in __init__
raise TypeError('`inputs` to a TensorFlow backend function '
TypeError: `inputs` to a TensorFlow backend function should be a list or tuple.
You should do the following changes:
output_fn = K.function([input_layer], [output_layer])
output_image = output_fn([input_image])
K.function takes the input and output tensors as lists so that you can create a function from many inputs to many outputs. In your case it is one input to one output, but you need to pass them as lists nonetheless.
Next, K.function returns a backend function, not a model object with a predict() method. The correct way to use it is simply to call it as a function, as shown above.
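For illustration, a self-contained sketch of this pattern on a hypothetical small Keras CNN (the layer index, input shape and channels_last ordering are assumptions) could look like:
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

# Tiny model, just to demonstrate K.function usage.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
])

# Backend function from the model input to the first layer's output.
get_conv_output = K.function([model.layers[0].input],
                             [model.layers[0].output])

# Call it on a batch: a list in, a list out.
dummy_batch = np.random.rand(1, 28, 28, 1).astype('float32')
conv_output = get_conv_output([dummy_batch])[0]
print(conv_output.shape)  # e.g. (1, 26, 26, 32)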
I think you can also use K.function to get gradients.
self.action_gradients = K.gradients(Q_values, actions)
self.get_action_gradients = K.function(inputs=[*self.model.input, K.learning_phase()],
                                       outputs=action_gradients)
which basically runs the graph to obtain the Q-value and to calculate the gradient of the Q-value w.r.t. the action vector in DDPG. Source code here (lines 64 to 70): https://github.com/nyck33/autonomous_quadcopter/blob/master/criticSolution.py#L65
In light of the accepted answer and this usage here (originally from project 5, the autonomous quadcopter, in the Udacity Deep Learning nanodegree), a question remains in my mind: can K.function() be used fairly flexibly to run the graph, designating as its outputs, for example, the outputs of a particular layer, gradients, or even the weights themselves?
Lines 64 to 67 here: https://github.com/nyck33/autonomous_quadcopter/blob/master/actorSolution.py
It is being used as a custom training function for the actor network in DDPG:
# caller
self.actor_local.train_fn([states, action_gradients, 1])

# called
self.train_fn = K.function(inputs=[self.model.input, action_gradients, K.learning_phase()],
                           outputs=[], updates=updates_op)
outputs is given an empty list because we merely want to train the actor network with the action_gradients from the critic network.
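For completeness, a rough sketch of how such an updates_op is typically constructed with a Keras optimizer; the tiny actor model, sizes and learning rate below are assumptions and not the nanodegree code:
import numpy as np
from keras import backend as K
from keras.layers import Dense, Input
from keras.models import Model
from keras.optimizers import Adam

# Hypothetical tiny actor network.
state_size, action_size = 8, 2
states_in = Input(shape=(state_size,))
actions_out = Dense(action_size, activation='tanh')(states_in)
actor_model = Model(inputs=states_in, outputs=actions_out)

# Placeholder for the critic's dQ/da gradients, fed in at training time.
action_gradients = K.placeholder(shape=(None, action_size))

# DDPG-style actor objective: follow the critic's action gradients.
loss = K.mean(-action_gradients * actor_model.output)

# The optimizer provides the weight-update ops for the actor's weights.
updates_op = Adam(lr=1e-4).get_updates(params=actor_model.trainable_weights,
                                       loss=loss)

# Backend function that performs one training step per call; outputs=[]
# because we only care about the side effect of applying the updates.
train_fn = K.function(
    inputs=[actor_model.input, action_gradients, K.learning_phase()],
    outputs=[],
    updates=updates_op)

# Example call: states batch, gradients batch, learning_phase=1 (training).
train_fn([np.zeros((4, state_size)), np.zeros((4, action_size)), 1])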

Oversampling images during inference

It is a common practice in convolutional neural networks to oversample a given image during inference,
i.e. to create a batch from different transformations of the same image (most commonly different crops and mirroring), pass the entire batch through the network, and average (or apply another reduction function) over the results to get a single prediction (caffe example).
How can this approach be implemented in tensorflow?
You can take a look at the TF cnn tutorial. In particular, the function distorted_inputs does the image preprocessing step.
In short, there are a couple of TF functions in the tf.image package that help with distorting the images. You can use either those or regular numpy functions to create an extra dimension for the distortions, over which you can average the results:
Before:
input_place = tf.placeholder(tf.float32, [None, 256, 256, 3])
prediction = some_model(input_place) # size: [None]
sess.run(prediction, feed_dict={input_place: batch_of_images})
After:
input_place = tf.placeholder(tf.float32, [None, NUM_OF_DISTORTIONS, 256, 256, 3])
prediction = some_model(input_place)  # make sure it is of size [None, NUM_OF_DISTORTIONS]
new_prediction = tf.reduce_mean(prediction, axis=1)

new_batch = np.zeros((batch_size, NUM_OF_DISTORTIONS, 256, 256, 3))
for i in xrange(len(batch_of_images)):
    for f in xrange(len(distortion_functions)):
        new_batch[i, f, :, :, :] = distortion_functions[f](batch_of_images[i])

sess.run(new_prediction, feed_dict={input_place: new_batch})
Take a look at TF's image-related functions. You could apply those transformations at test time to some input image, and stack all of them together to make a batch.
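A small in-graph sketch of that idea, assuming a single 256x256x3 input image and a hypothetical some_model that accepts a batch:
import tensorflow as tf

# Stack the original image and a few tf.image transformations into one batch.
image = tf.placeholder(tf.float32, [256, 256, 3])
distorted_batch = tf.stack([
    image,
    tf.image.flip_left_right(image),
    tf.image.flip_up_down(image),
])  # shape: [3, 256, 256, 3]

# predictions = some_model(distorted_batch)        # hypothetical model
# avg_prediction = tf.reduce_mean(predictions, axis=0)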
I imagine you could also do this using OpenCV or some other image processing tool. I don't see a need to do it in the computation graph: you could create the batches beforehand and pass them in through feed_dict.
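As a rough illustration of that feed_dict-based approach (the distortions, model tensors and 256x256x3 shape are assumptions), the test-time averaging can be done entirely outside the graph:
import numpy as np

# Hypothetical distortions done in numpy: identity, horizontal flip, vertical flip.
distortion_functions = [
    lambda img: img,
    lambda img: img[:, ::-1, :],
    lambda img: img[::-1, :, :],
]

def predict_with_oversampling(sess, prediction, input_place, image):
    # Build a batch containing every distorted version of the single image...
    batch = np.stack([f(image) for f in distortion_functions], axis=0)
    # ...run the unmodified [None, 256, 256, 3] model on it...
    preds = sess.run(prediction, feed_dict={input_place: batch})
    # ...and average the per-distortion predictions into one.
    return preds.mean(axis=0)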

Tensorflow: optimize over input with gradient descent

I have a TensorFlow model (a convolutional neural network) which I successfully trained using gradient descent (GD) on some input data.
Now, in a second step, I would like to provide an input image as initialization and then optimize over this input image with fixed network parameters using GD. The loss function will be a different one, but this is a detail.
So, my main question is how to tell the gradient descent algorithm to
stop optimizing the network parameters, and
optimize over the input image instead.
The first can probably be done with this:
Holding variables constant during optimizer
Do you guys have ideas about the second point?
I guess I could recode the gradient descent algorithm myself using the TF gradient function, but my gut feeling tells me that there should be an easier way, one which also allows me to benefit from more complex GD variants (Adam etc.).
There is no need for your own SGD implementation. TensorFlow provides all the necessary functions:
import tensorflow as tf
import numpy as np

# some input
data_pldhr = tf.placeholder(tf.float32)
img_op = tf.get_variable('input_image', [1, 4, 4, 1], dtype=tf.float32, trainable=True)
img_assign = img_op.assign(data_pldhr)

# your starting image
start_value = (np.ones((4, 4), dtype=np.float32) + np.eye(4))[None, :, :, None]

# override variable_getter
def nontrainable_getter(getter, *args, **kwargs):
    kwargs['trainable'] = False
    return getter(*args, **kwargs)

# all variables in this scope are not trainable
with tf.variable_scope('myscope', custom_getter=nontrainable_getter):
    x = tf.layers.dense(img_op, 10)
    y = tf.layers.dense(x, 10)

# the usual stuff
cost_op = tf.losses.mean_squared_error(x, y)
train_op = tf.train.AdamOptimizer(0.1).minimize(cost_op)

# fire up the training process
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(img_assign, {data_pldhr: start_value})
    print(sess.run(img_op))
    for i in range(10):
        _, c = sess.run([train_op, cost_op])
        print(c)
    print(sess.run(img_op))
represent the image as a tf.Variable with trainable=True
initialise this variable with the starting image (initial guess)
recreate the NN graph using TF variables with trainable=False and copy the weights from the trained NN graph using tf.assign (see the sketch after this list)
calculate the loss function
plug the loss into any TF optimiser algorithm you want
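A minimal sketch of that weight-copying step, assuming the trained variables and their non-trainable clones live under variable scopes named 'trained' and 'myscope' respectively (both scope names are assumptions):
import tensorflow as tf

# Match each non-trainable clone with the corresponding trained variable by
# its name suffix, and build one assign op per pair.
trained_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='trained')
clone_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='myscope')

copy_ops = []
for clone in clone_vars:
    suffix = clone.op.name.split('/', 1)[1]          # drop the scope prefix
    source = [v for v in trained_vars
              if v.op.name.split('/', 1)[1] == suffix][0]
    copy_ops.append(tf.assign(clone, source))

copy_weights_op = tf.group(*copy_ops)

# Inside a session: run copy_weights_op once, before optimizing the image.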
Another alternative is to use ScipyOptimizerInterface, which allows you to use scipy's minimizers. These also support constrained minimization.
I'm looking for a solution to the same problem, but my model is not an easy one, as I have an LSTM network with cells created with MultiRNNCell; I don't think it is possible to get the weights and clone the network. Is there any workaround so that I can compute the gradient w.r.t. the input?
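If cloning the network is impractical, one possible workaround (a sketch under the assumption that the loss tensor and the input placeholder are reachable in the existing graph; the tiny dense graph below merely stands in for the LSTM model) is to ask TensorFlow directly for the gradient with respect to the input and apply the descent steps manually:
import numpy as np
import tensorflow as tf

# Stand-in graph for the trained model.
input_ph = tf.placeholder(tf.float32, shape=[1, 10])
hidden = tf.layers.dense(input_ph, 5, activation=tf.tanh)
loss = tf.reduce_mean(tf.square(hidden))

# Gradient of the loss with respect to the *input*, not the weights.
input_grad = tf.gradients(loss, input_ph)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    current_input = np.random.randn(1, 10).astype(np.float32)
    # Plain manual gradient-descent steps on the input array;
    # the network weights are never touched.
    for _ in range(100):
        g = sess.run(input_grad, feed_dict={input_ph: current_input})
        current_input -= 0.1 * g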