TensorFlow summary ops cannot be assigned to a GPU - tensorflow

Here is part of my code:
with tf.Graph().as_default(), tf.device('/cpu:0'):
    global_step = tf.get_variable(
        'global_step',
        [],
        initializer=tf.constant_initializer(0))
    writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    with tf.device('/gpu:0'):
        tf.summary.scalar('learning_rate', INITIAL_LEARNING_RATE)
        summary_op = tf.summary.merge_all()
When I run it, I get the following error:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'learning_rate': Could not satisfy explicit device specification '/device:GPU:0' because no
supported kernel for GPU devices is available.
[[Node: learning_rate = ScalarSummary[T=DT_FLOAT, _device="/device:GPU:0"](learning_rate/tags, learning_rate/values)]]
If I move these two ops into the tf.device("/cpu:0") scope, it works:
    tf.summary.scalar('learning_rate', INITIAL_LEARNING_RATE)
    summary_op = tf.summary.merge_all()
I googled it, and there are many suggestions about using "allow_soft_placement=True". But as I understand it, that solution just changes the device scope automatically. So my questions are:
Why can these two ops not be assigned to the GPU? Is there any documentation I can look at to figure out which ops can or cannot be assigned to a GPU?
Any suggestion is welcome.

You can't assign a summary operation to a GPU because doing so is meaningless.
In short, a GPU executes massively parallel operations. A summary is nothing but a file to which you append new lines every time you write to it. That's a sequential operation that has nothing in common with the kind of work GPUs are built for.

Your error says it all:
Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
That operation (in the tensorflow version you're using) has no GPU implementation and thus must be sent to a CPU device.
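If you would rather not restructure the device scopes by hand, the allow_soft_placement=True option mentioned in the question performs exactly that fallback for you. A minimal sketch (TF 1.x assumed; a literal 0.1 stands in for INITIAL_LEARNING_RATE):

import tensorflow as tf

with tf.Graph().as_default():
    with tf.device('/gpu:0'):
        # ScalarSummary has no GPU kernel; with soft placement it silently
        # falls back to the CPU instead of raising InvalidArgumentError.
        tf.summary.scalar('learning_rate', 0.1)
    summary_op = tf.summary.merge_all()

    config = tf.ConfigProto(allow_soft_placement=True,
                            log_device_placement=True)
    with tf.Session(config=config) as sess:
        print(sess.run(summary_op))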

Related

Running a TensorFlow model inference script on multiple GPUs

I'm trying to run model scoring (the inference graph) from the TensorFlow object detection API on multiple GPUs. I tried specifying the GPU number in main, but it runs only on a single GPU (I placed a GPU utilization snapshot here).
Using tensorflow-gpu==1.13.1; can you kindly point out what I'm missing here?
for i in range(2):
    with tf.device('/gpu:{}'.format(i)):
        tf_init()
        init = tf.global_variables_initializer()
with detection_graph.as_default():
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
        # call to run_inference_multiple_images function
The responses to this question should give you a few options for fixing this.
Usually TensorFlow will occupy all visible GPUs unless told otherwise. So, if you haven't already tried it, you could just remove the with tf.device line (assuming you only have the two GPUs) and TensorFlow should use them both.
Otherwise, I think the easiest option is setting the environment variable with os.environ["CUDA_VISIBLE_DEVICES"] = "0,1".
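For example, a minimal sketch of the environment-variable approach. Note that the variable must be set before TensorFlow initializes CUDA, so set it before the first tensorflow import:

import os

# Make only GPUs 0 and 1 visible to TensorFlow; this must happen before
# TensorFlow touches the GPU (in practice, before importing it).
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import tensorflow as tf

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    pass  # TensorFlow now sees, and by default occupies, exactly GPUs 0 and 1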

tf.Variable can't be pinned to a GPU?

My code:
import tensorflow as tf

def main():
    with tf.device('/gpu:0'):
        a = tf.Variable(1)
        init_a = tf.global_variables_initializer()
        with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
            sess.run(init_a)

if __name__ == '__main__':
    main()
The error:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Does this mean TF can't pin a Variable to a GPU?
Here is another thread related to this topic.
int32 types are not (as of January 2018) comprehensively supported on GPUs. I believe the full error would say something like:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Assign: CPU
Identity: CPU
VariableV2: CPU
[[Node: Variable = VariableV2[container="", dtype=DT_INT32, shape=[], shared_name="", _device="/device:GPU:0"]()]]
And it's the DT_INT32 there that is causing you trouble, since you explicitly requested that the variable be placed on GPU but there is no GPU kernel for the corresponding operation and dtype.
If this was just a test program and in reality you need variables of another type, such as float32, you should be fine. For example:
import tensorflow as tf

with tf.device('/gpu:0'):
    # Providing 1. instead of 1 as the initial value will result
    # in a float32 variable. Alternatively, you could explicitly
    # provide the dtype argument to tf.Variable()
    a = tf.Variable(1.)
init_a = tf.global_variables_initializer()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(init_a)
Alternatively, you could choose to explicitly place int32 variables on CPU, or just not specify any device at all and let TensorFlow's device placement select GPU where appropriate. For example:
import tensorflow as tf

v_int = tf.Variable(1, name='intvar')
v_float = tf.Variable(1., name='floatvar')
init = tf.global_variables_initializer()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(init)
This will show that 'intvar' is placed on the CPU while 'floatvar' is on the GPU, with log lines like:
floatvar: (VariableV2)/job:localhost/replica:0/task:0/device:GPU:0
intvar: (VariableV2)/job:localhost/replica:0/task:0/device:CPU:0
Hope that helps.
This means that TensorFlow could not find the device you specified.
I assume you wanted your code to be executed on GPU 0.
The correct syntax would be:
with tf.device('/device:GPU:0'):
The short form you are using is only allowed for the CPU.
You can also check this answer: How to get current available GPUs in tensorflow?
It shows how to list the GPU devices that are recognized by TF.
And this page lists the syntax: https://www.tensorflow.org/tutorials/using_gpu
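For reference, the linked answer boils down to something like the following sketch. Note that device_lib is not part of the public API, so treat this as a debugging aid rather than a stable interface:

from tensorflow.python.client import device_lib

def get_available_gpus():
    # Lists every device TensorFlow can see; GPU entries have device_type 'GPU'.
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == 'GPU']

print(get_available_gpus())  # e.g. ['/device:GPU:0']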

Tensorflow contrib.learn.Estimator multi-GPU

In order to use the contrib.learn.Estimator for multi-GPU training, I am attempting to specify GPU assignments in my model_fn.
In pseudo-code:
def model_fn(X, y):
    with tf.device('/gpu:1'):
        ...  # various tensorflow ops for the model
    return predictions, loss, train_op
Everything works fine without the tf.device('/gpu:1') call, but with it I encounter the following error:
InvalidArgumentError (see above for traceback): Cannot assign a device to
node 'save/ShardedFilename_1': Could not satisfy explicit device
specification '/device:GPU:1' because no supported kernel
for GPU devices is available.
I do not believe that I am adding the offending op to the graph myself; rather, it is injected through the Estimator's snapshot functionality.
I believe that the solution is to set allow_soft_placement=True so that ops without GPU support fall back to the CPU, but it's not obvious to me how that is exposed when dealing with contrib.learn.Estimator.
I see that the option is usually set in a ConfigProto and passed to the session, but I've been using the Estimator's functionality to manage the session for me. Should I be taking control of the session creation, or am I missing a parameter somewhere that accomplishes this?
Many thanks in advance for any advice.
This is fixed as of Estimator leaving contrib in TensorFlow 1.0.
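With the graduated tf.estimator API you can hand the ConfigProto to the Estimator through a RunConfig instead of managing the session yourself. A sketch, where model_fn is a hypothetical stand-in for the one in the question:

import tensorflow as tf

def model_fn(features, labels, mode):
    ...  # hypothetical model_fn, as in the question

# allow_soft_placement lets ops without a GPU kernel (such as the
# Saver's ShardedFilename ops) fall back to the CPU.
session_config = tf.ConfigProto(allow_soft_placement=True)
run_config = tf.estimator.RunConfig(session_config=session_config)
estimator = tf.estimator.Estimator(model_fn=model_fn, config=run_config)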

Tensorflow, restore variables in a specific device

Maybe my question is a bit naive, but I really didn't find anything about this in the TensorFlow documentation.
I have a trained TensorFlow model whose variables were placed on the GPU. Now I would like to restore this model and test it using the CPU.
If I do this via tf.train.Saver.restore, as in the example:
saver = tf.train.import_meta_graph("/tmp/graph.meta")
saver.restore(session, "/tmp/model.ckp")
I get the following exception:
InvalidArgumentError: Cannot assign a device to node 'b_fc8/b_fc8/Adam_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
How can I restore these variables on the CPU?
Thanks
Use the clear_devices flag, i.e.
saver = tf.train.import_meta_graph("/tmp/graph.meta", clear_devices=True)
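A complete sketch of the restore, using the paths from the question:

import tensorflow as tf

with tf.Session() as session:
    # clear_devices=True strips the device assignments that were baked
    # into the graph at training time, so everything can be re-placed
    # on whatever devices are available (here, the CPU).
    saver = tf.train.import_meta_graph("/tmp/graph.meta", clear_devices=True)
    saver.restore(session, "/tmp/model.ckp")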
I'm using TensorFlow 0.12, and clear_devices=True together with tf.device('/cpu:0') was not working for me (saver.restore was still trying to assign variables to /gpu:0).
I really needed to force everything onto /cpu:0, since I was loading several models which wouldn't fit in GPU memory anyway. Here are two alternatives to force everything onto /cpu:0 (combined in the sketch below):
Set os.environ['CUDA_VISIBLE_DEVICES'] = ''
Use the device_count field of ConfigProto, like tf.Session(config=tf.ConfigProto(device_count={"GPU": 0, "CPU": 1}))
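A sketch combining the restore above with those two alternatives:

import os

# Alternative 1: hide all GPUs from TensorFlow entirely. This must run
# before TensorFlow initializes CUDA, so do it before the import.
os.environ['CUDA_VISIBLE_DEVICES'] = ''

import tensorflow as tf

# Alternative 2: keep the GPUs visible to CUDA but give the session
# no GPU devices to place ops on.
config = tf.ConfigProto(device_count={"GPU": 0, "CPU": 1})

with tf.Session(config=config) as session:
    saver = tf.train.import_meta_graph("/tmp/graph.meta", clear_devices=True)
    saver.restore(session, "/tmp/model.ckp")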

Selectively registering the backward pass of a set of ops on the GPU

I have a set of ops that are faster on CPUs than on GPUs, in terms of both the forward and backward (gradient) computations. However, they're only a small fraction of the whole model, most of which is better run on the GPU. Currently, if I just use with tf.device(...) when specifying the forward model and let TF decide where to place the optimizer (e.g. the tf.train.AdamOptimizer op), it puts all the backward-pass computations on the GPU, which is suboptimal. Is there some way of specifying that an op and its gradients should be placed on a particular device?
Currently there's no good way to customize the device assignment for ops in the (automatically generated) gradient computation. However, one thing you can do is to register a "device function" using with tf.device(device_function): (the documentation for tf.Graph.device() applies here and is more comprehensive). A "device function" is a function that takes a newly constructed tf.Operation and returns a device name, and TensorFlow assigns the operation to that device. This enables you to do the following:
# These ops are almost certainly faster on GPU, but are just shown as an example.
OPS_ON_CPU = set(["AvgPool", "AvgPoolGrad"])

def _device_function(op):
    if op.type in OPS_ON_CPU:
        return "/cpu:0"
    else:
        # Other ops will be placed on GPU if available, otherwise CPU.
        return ""

with tf.device(_device_function):
    # Build model in here.
    # ...
    loss = ...
    train_op = tf.train.AdamOptimizer(0.01).minimize(loss)
...which will place all ops with type "AvgPool" or "AvgPoolGrad" on the CPU.
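To confirm the placement took effect, you could run with log_device_placement and check that the AvgPool/AvgPoolGrad lines report /cpu:0. A sketch, assuming the loss and train_op built above:

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op)  # placement log should show the AvgPool ops on /cpu:0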