Tensorflow embedding_lookup gradients registering on CPU? - tensorflow

I have something equivalent to a sparse softmax:
...
with tf.device('/gpu:0'):
    indices = tf.placeholder(tf.int32, [None, dimsize])
    self._W = weight_variable([self._num_nodes, input_layer_size])
    self._b = bias_variable([self._num_nodes])
    sampled_W = tf.transpose(tf.nn.embedding_lookup(self._W, indices), [0, 2, 1])  # [batchsize, inputlayersize, dim1size]
    sampled_b = tf.nn.embedding_lookup(self._b, indices)  # [batchsize, dim1size]
...
However, when I enable placement logging, I see multiple instances of the gradients being placed on the CPU, e.g.:
I tensorflow/core/common_runtime/simple_placer.cc:819] gradients/.../embedding_lookup_1_grad/Size: /job:localhost/replica:0/task:0/cpu:0
This happens no matter which optimizer I choose. Am I missing something here?

If you use
tf.Session(config=tf.ConfigProto(allow_soft_placement=False))
you should get an error. That's because embedding_lookup isn't implemented on the GPU currently.
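If you want to keep everything else pinned to the GPU, a common workaround is to enable soft placement so that ops without a GPU kernel (such as these gradient ops) quietly fall back to CPU instead of erroring; a minimal sketch:

import tensorflow as tf

# With soft placement enabled, ops that have no GPU kernel
# fall back to CPU instead of raising a placement error.
config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
sess = tf.Session(config=config)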

Related

TensorFlow get updates computed by optimizer

In TensorFlow, the optimizer class only exposes two functions:
compute_gradients
apply_gradients
where apply_gradients returns an op that performs the update w <- w + Δw via tf.assign_add.
However I need direct access to the Δw itself. (or w' = w+Δw). I know that the optimizer adds nodes to the computational graph which compute this Δw for each variable. How can I access them? Or do I have to re-implement the optimizer myself?
The reason is that I need to compute gradients dw'/dw, as I am working on something related to gradient based hyperparameter optimization (cf. https://arxiv.org/abs/1703.01785)
The "delta" applied to each variable is not accessible through any common method or name. In fact, looking a bit into the source it seems rather difficult extract, as it varies from one optimizer to the other.
What you can do, at least, is to compute the differences between variable values and their updates. For example it could work like this:
import tensorflow as tf

with tf.Graph().as_default():
    # Set up an example model
    x = tf.placeholder(tf.float32, [None, 1])
    y = tf.placeholder(tf.float32, [None, 2])
    w = tf.Variable([[1., 2.]], tf.float32)
    pred = tf.matmul(x, w)
    loss = (tf.reduce_sum(tf.squared_difference(pred, y))
            / tf.cast(tf.shape(x)[0], tf.float32))

    # Record variable values before the training step
    # (tf.identity should work here but it does not, so we use
    # a trivial add operation to enforce the control dependency)
    w_old = w + 0

    # Train after having recorded the variable values
    with tf.control_dependencies([w_old]):
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    # Compute deltas after training
    with tf.control_dependencies([train_op]):
        w_delta = w.read_value() - w_old

    init_op = tf.global_variables_initializer()

    # Test
    with tf.Session() as sess:
        sess.run(init_op)
        print(sess.run(w))
        # [[1. 2.]]
        _, w_delta_val = sess.run(
            [train_op, w_delta],
            feed_dict={x: [[1.], [2.]], y: [[3., 4.], [5., 6.]]})
        print(w_delta_val)
        # [[0.79999995 0.5999999 ]]
        print(sess.run(w))
        # [[1.8 2.6]]
To get the updated w', you can just read w directly after you have executed optimizer.apply_gradients(): the current value of w is w'.
And if you want the delta that was applied to w, just compute w' - w.
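In session terms, that looks something like the following sketch (assuming the x, y, w, train_op, and sess from the example above):

# Read w before the step, run one step, read w again;
# the difference is the delta the optimizer applied.
w_before = sess.run(w)
sess.run(train_op, feed_dict={x: [[1.], [2.]], y: [[3., 4.], [5., 6.]]})
w_after = sess.run(w)
print(w_after - w_before)  # the applied delta, computed in NumPy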

tf.Variable can't pin to GPU?

My code:
import tensorflow as tf

def main():
    with tf.device('/gpu:0'):
        a = tf.Variable(1)
        init_a = tf.global_variables_initializer()
        with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
            sess.run(init_a)

if __name__ == '__main__':
    main()
The error:
InvalidArgumentError (see above for traceback): Cannot assign a device
for operation 'Variable': Could not satisfy explicit device
specification '/device:GPU:0' because no supported kernel for GPU
devices is available.
Does this mean tf can't pin Variable to GPU?
Here is another thread related to this topic.
int32 types are not (as of January 2018) comprehensively supported on GPUs. I believe the full error would say something like:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Assign: CPU
Identity: CPU
VariableV2: CPU
[[Node: Variable = VariableV2[container="", dtype=DT_INT32, shape=[], shared_name="", _device="/device:GPU:0"]()]]
And it's the DT_INT32 there that is causing you trouble, since you explicitly requested that the variable be placed on GPU but there is no GPU kernel for the corresponding operation and dtype.
If this was just a test program and in reality you need variables of another type, such as float32, you should be fine. For example:
import tensorflow as tf

with tf.device('/gpu:0'):
    # Providing 1. instead of 1 as the initial value will result
    # in a float32 variable. Alternatively, you could explicitly
    # provide the dtype argument to tf.Variable()
    a = tf.Variable(1.)
    init_a = tf.global_variables_initializer()
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        sess.run(init_a)
Alternatively, you could choose to explicitly place int32 variables on CPU, or just not specify any device at all and let TensorFlow's device placement select GPU where appropriate. For example:
import tensorflow as tf

v_int = tf.Variable(1, name='intvar')
v_float = tf.Variable(1., name='floatvar')
init = tf.global_variables_initializer()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(init)
This will show that 'intvar' is placed on CPU while 'floatvar' is on GPU, with log lines like:
floatvar: (VariableV2)/job:localhost/replica:0/task:0/device:GPU:0
intvar: (VariableV2)/job:localhost/replica:0/task:0/device:CPU:0
Hope that helps.
This means that Tensorflow cannot find the device you specified.
I assume you wanted to specify that your code is executed on your GPU 0.
The canonical syntax would be:
with tf.device('/device:GPU:0'):
(the short form '/gpu:0' is also accepted in most versions, but the canonical form matches what appears in the placement logs).
You can also check this answer here: How to get current available GPUs in tensorflow?
It shows how to list the GPU devices that are recognized by TF.
And this lists the syntax: https://www.tensorflow.org/tutorials/using_gpu
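For completeness, a minimal sketch of listing the devices TensorFlow can see (device_lib is a non-public module, but this works in TF 1.x):

from tensorflow.python.client import device_lib

# Prints one entry per recognized device,
# e.g. /device:CPU:0 and /device:GPU:0
for d in device_lib.list_local_devices():
    print(d.name, d.device_type)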

Can't learn parameters of tf.contrib.distributions.MultivariateNormalDiag via optimization

Working example:
import numpy as np
import tensorflow as tf

## construct data
np.random.seed(723888)
N, P = 50, 3  # number and dimensionality of observations
Xbase = np.random.multivariate_normal(mean=np.zeros((P,)), cov=np.eye(P), size=N)

## construct model
X = tf.placeholder(dtype=tf.float32, shape=(None, P), name='X')
mu = tf.Variable(np.random.normal(loc=0.0, scale=0.1, size=(P,)), dtype=tf.float32, name='mu')
xDist = tf.contrib.distributions.MultivariateNormalDiag(loc=mu, scale_diag=tf.ones(shape=(P,), dtype=tf.float32), name='xDist')
xProbs = xDist.prob(X, name='xProbs')

## prepare optimizer
eta = 1e-3  # learning rate
loss = -tf.reduce_mean(tf.log(xProbs), name='loss')
optimizer = tf.train.AdamOptimizer(learning_rate=eta).minimize(loss)

## launch session
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    sess.run(optimizer, feed_dict={X: Xbase})
I want to do optimization over the parameters of a multivariate gaussian distribution in tensorflow, as in my above example. I can successfully run commands like sess.run(loss, feed_dict={X: Xbase}), so I have implemented the distribution correctly. When I try to run the optimization op, I get an odd error message:
InvalidArgumentError: -1 is not between 0 and 3
[[Node: gradients_1/xDist_7/xProbs/Prod_grad/InvertPermutation = InvertPermutation[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients_1/xDist_7/xProbs/Prod_grad/concat)]]
Caused by op 'gradients_1/xDist_7/xProbs/Prod_grad/InvertPermutation'
I do not understand what this means.
I get the same error message if I use tf.contrib.distributions.MultivariateNormalFullCovariance instead of tf.contrib.distributions.MultivariateNormalDiag. I do not get the error if scale_diag and not loc is the variable being optimized over.
I'm still looking into why this is failing, but for a short-term fix, does making the following change work?
xLogProbs = xDist.log_prob(X, name='xLogProbs')
loss = -tf.reduce_mean(xLogProbs, name='loss')
Note: this is actually preferable to tf.log(xProbs) because it is never less numerically precise, and is sometimes substantially more precise. (This is true of all tf.Distributions.)
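Applied to the example above, only the loss construction changes; a sketch:

xLogProbs = xDist.log_prob(X, name='xLogProbs')
loss = -tf.reduce_mean(xLogProbs, name='loss')
optimizer = tf.train.AdamOptimizer(learning_rate=eta).minimize(loss)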

How to freeze/lock weights of one TensorFlow variable (e.g., one CNN kernel of one layer)

I have a TensorFlow CNN model that is performing well and we would like to implement this model in hardware; i.e., an FPGA. It's a relatively small network but it would be ideal if it were smaller. With that goal, I've examined the kernels and find that there are some where the weights are quite strong and there are others that aren't doing much at all (the kernel values are all close to zero). This occurs specifically in layer 2, corresponding to the tf.Variable() named, "W_conv2". W_conv2 has shape [3, 3, 32, 32]. I would like to freeze/lock the values of W_conv2[:, :, 29, 13] and set them to zero so that the rest of the network can be trained to compensate. Setting the values of this kernel to zero effectively removes/prunes the kernel from the hardware implementation thus achieving the goal stated above.
I have found similar questions with suggestions that generally revolve around one of two approaches:
Suggestion #1:
tf.Variable(some_initial_value, trainable = False)
Implementing this suggestion freezes the entire variable. I want to freeze just a slice, specifically W_conv2[:, :, 29, 13].
Suggestion #2:
Optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list)
Again, implementing this suggestion does not allow the use of slices. For instance, if I try the inverse of my stated goal (optimize only a single kernel of a single variable) as follows:
optimizer = tf.train.RMSPropOptimizer(0.001).minimize(loss, var_list=[W_conv2[:, :, 0, 0]])
I get the following error:
NotImplementedError: ('Trying to optimize unsupported type ', <tf.Tensor 'strided_slice_2228:0' shape=(3, 3) dtype=float32>)
Slicing tf.Variables() isn't possible in the way that I've tried it here. The only thing that I've tried which comes close to doing what I want is using .assign() but this is extremely inefficient, cumbersome, and caveman-like as I've implemented it as follows (after the model is trained):
for _ in range(10000):
    # get a new batch of data
    # reset the values of W_conv2[:, :, 29, 13] to 0 each time through
    for m in range(3):
        for n in range(3):
            assign_op = W_conv2[m, n, 29, 13].assign(0)
            sess.run(assign_op)
    # re-train the rest of the network
    _, loss_val = sess.run([optimizer, loss], feed_dict={
        dict_stuff_here
    })
    print(loss_val)
The model was started in Keras then moved to TensorFlow since Keras didn't seem to have a mechanism to achieve the desired results. I'm starting to think that TensorFlow doesn't allow for pruning but find this hard to believe; it just needs the correct implementation.
A possible approach is to initialize these specific weights with zeros, and modify the minimization process such that gradients won't be applied to them. It can be done by replacing the call to minimize() with something like:
# Mask that is 1 everywhere except the kernel we want frozen
W_conv2_weights = np.ones((3, 3, 32, 32), dtype=np.float32)
W_conv2_weights[:, :, 29, 13] = 0.
W_conv2_weights_const = tf.constant(W_conv2_weights)

optimizer = tf.train.RMSPropOptimizer(0.001)

# Gradient for W_conv2 only, masked element-wise
W_conv2_orig_grads = tf.gradients(loss, [W_conv2])[0]
W_conv2_grads = tf.multiply(W_conv2_weights_const, W_conv2_orig_grads)
W_conv2_train_op = optimizer.apply_gradients([(W_conv2_grads, W_conv2)])

# Ordinary gradients for the remaining variables
rest_grads = tf.gradients(loss, rest_of_vars)
rest_train_op = optimizer.apply_gradients(zip(rest_grads, rest_of_vars))

train_op = tf.group(rest_train_op, W_conv2_train_op)
That is:
Prepare a constant tensor for canceling the appropriate gradients.
Compute gradients only for W_conv2, multiply element-wise with the constant W_conv2_weights to zero the appropriate gradients, and only then apply them.
Compute and apply gradients "normally" to the rest of the variables.
Group the two train ops into a single training op (a usage sketch follows).
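The grouped train_op can then be run like any other training op; a sketch, where feed_dict stands in for your actual batch feed:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # feed_dict here is a hypothetical name for your batch feed
    sess.run(train_op, feed_dict=feed_dict)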

How do I get the gradient of the loss at a TensorFlow variable?

The feature I'm after is to be able to tell what the gradient of a given variable is with respect to my error function given some data.
One way to do this would be to see how much the variable has changed after a call to train, but obviously that can vary massively based on the learning algorithm (for example it would be almost impossible to tell with something like RProp) and just isn't very clean.
Thanks in advance.
The tf.gradients() function allows you to compute the symbolic gradient of one tensor with respect to one or more other tensors—including variables. Consider the following simple example:
data = tf.placeholder(tf.float32)
var = tf.Variable(...) # Must be a tf.float32 or tf.float64 variable.
loss = some_function_of(var, data) # some_function_of() returns a `Tensor`.
var_grad = tf.gradients(loss, [var])[0]
You can then use this symbolic gradient to evaluate the gradient at some specific point, i.e. for some value of data:
sess = tf.Session()
var_grad_val = sess.run(var_grad, feed_dict={data: ...})
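For concreteness, here is a minimal runnable sketch; the quadratic loss is just an illustrative stand-in for some_function_of():

import tensorflow as tf

data = tf.placeholder(tf.float32)
var = tf.Variable(2.0)
loss = tf.square(var * data - 1.0)  # illustrative loss
var_grad = tf.gradients(loss, [var])[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # d/dvar (var*data - 1)^2 = 2*(var*data - 1)*data = 2*(6-1)*3 = 30
    print(sess.run(var_grad, feed_dict={data: 3.0}))  # 30.0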
In TensorFlow 2.0 you can use GradientTape to achieve this. A GradientTape records any computation executed within its context so that gradients can be computed afterwards. Below is an example of how you might do that.
import tensorflow as tf

# Here go the neural network weights, as tf.Variable
x = tf.Variable(3.0)

# TensorFlow operations executed within the context of
# a GradientTape are recorded for differentiation
with tf.GradientTape() as tape:
    # Do the computation in the context of the gradient tape,
    # for example computing a loss
    y = x ** 2

# Getting the gradient of the loss w.r.t. the network weights
dy_dx = tape.gradient(y, x)
print(dy_dx)  # tf.Tensor(6.0, shape=(), dtype=float32)
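The same pattern also works for plain tensors (not just variables) if you ask the tape to watch them explicitly; a minimal sketch:

import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)  # constants are not tracked automatically
    y = x ** 2
print(tape.gradient(y, x))  # tf.Tensor(6.0, shape=(), dtype=float32)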