tf.Variable can't pin to GPU?

My code:
import tensorflow as tf

def main():
    with tf.device('/gpu:0'):
        a = tf.Variable(1)
        init_a = tf.global_variables_initializer()
        with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
            sess.run(init_a)

if __name__ == '__main__':
    main()
The error:
InvalidArgumentError (see above for traceback): Cannot assign a device
for operation 'Variable': Could not satisfy explicit device
specification '/device:GPU:0' because no supported kernel for GPU
devices is available.
Does this mean tf can't pin Variable to GPU?
Here is another thread related to this topic.

int32 types are not (as of January 2018) comprehensively supported on GPUs. I believe the full error would say something like:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'Variable': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Colocation Debug Info:
Colocation group had the following types and devices:
Assign: CPU
Identity: CPU
VariableV2: CPU
[[Node: Variable = VariableV2[container="", dtype=DT_INT32, shape=[], shared_name="", _device="/device:GPU:0"]()]]
And it's the DT_INT32 there that is causing you trouble, since you explicitly requested that the variable be placed on GPU but there is no GPU kernel for the corresponding operation and dtype.
If this was just a test program and in reality you need variables of another type, such as float32, you should be fine. For example:
import tensorflow as tf

with tf.device('/gpu:0'):
    # Providing 1. instead of 1 as the initial value will result
    # in a float32 variable. Alternatively, you could explicitly
    # provide the dtype argument to tf.Variable()
    a = tf.Variable(1.)
    init_a = tf.global_variables_initializer()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(init_a)
Alternatively, you could choose to explicitly place int32 variables on CPU, or just not specify any device at all and let TensorFlow's device placement select GPU where appropriate. For example:
import tensorflow as tf

v_int = tf.Variable(1, name='intvar')
v_float = tf.Variable(1., name='floatvar')
init = tf.global_variables_initializer()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(init)
This will show that 'intvar' is placed on CPU while 'floatvar' is on GPU, with log lines like:
floatvar: (VariableV2)/job:localhost/replica:0/task:0/device:GPU:0
intvar: (VariableV2)/job:localhost/replica:0/task:0/device:CPU:0
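If you do want the int32 variable pinned somewhere explicitly, here is a minimal TF 1.x sketch of the CPU-placement option mentioned above:

import tensorflow as tf

with tf.device('/cpu:0'):
    # int32 Variable ops have no GPU kernel, so pin the variable to CPU.
    v_int = tf.Variable(1, name='intvar')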
Hope that helps.

This means that Tensorflow cannot find the device you specified.
I assume you wanted to specify that your code is executed on your GPU 0.
The correct syntax would be:
with tf.device('/device:GPU:0'):
The short form you are using is only allowed for the CPU.
You can also check this answer here: How to get current available GPUs in tensorflow?
It shows how to list the GPU devices that are recognized by TF.
And this lists the syntax: https://www.tensorflow.org/tutorials/using_gpu
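For reference, a minimal TF 1.x sketch that lists the devices TensorFlow recognizes, along the lines of the linked answer (it uses the internal device_lib module):

from tensorflow.python.client import device_lib

# Prints every device TensorFlow can see (CPU and GPU).
for d in device_lib.list_local_devices():
    print(d.name, d.device_type)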

Related

TF2 add report_tensor_allocations_upon_oom to RunOptions

I'm getting this message:
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
How do I do that in Tensorflow 2.3?
Over the past few days this turned out to be a surprisingly frustrating issue. There appears to be no working example of how to do this in TF2.
This is still a long way from an allocated tensor list, but a start for TF2:
Tensorflow 2.4.1 contains the tf.config.experimental.get_memory_usage method, which returns the current number of bytes used on the GPU. Comparing this value across different points in time can shed some light on which tensors take up VRAM. It seems to be pretty accurate.
BTW, the latest nightly build contains the tf.config.experimental.get_memory_info method instead; it seems they had a change of heart. This one contains the current as well as the peak memory used.
Example code on TF 2.4.1:
import tensorflow as tf
print(tf.config.experimental.get_memory_usage("GPU:0")) # 0
tensor_1_mb = tf.zeros((1, 1024, 256), dtype=tf.float32)
print(tf.config.experimental.get_memory_usage("GPU:0")) # 1050112
tensor_2_mb = tf.zeros((2, 1024, 256), dtype=tf.float32)
print(tf.config.experimental.get_memory_usage("GPU:0")) # 3147264
tensor_1_mb = None
print(tf.config.experimental.get_memory_usage("GPU:0")) # 2098688
tensor_2_mb = None
print(tf.config.experimental.get_memory_usage("GPU:0")) # 1536
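And a minimal sketch of the nightly replacement, assuming a build where get_memory_info is available (it returns a dict instead of a single number):

import tensorflow as tf

info = tf.config.experimental.get_memory_info("GPU:0")
print(info["current"])  # bytes currently allocated on the GPU
print(info["peak"])     # peak bytes allocated so far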
In order to use RunOptions you also need RunMetadata. Both of them can be found for TF2 in the tf.compat.v1 package.
The following code works for keras 2.4.3 with a tensorflow 2.4.0 backend:
# Model build()
import tensorflow as tf

run_opts = tf.compat.v1.RunOptions(report_tensor_allocations_upon_oom=True)
runmeta = tf.compat.v1.RunMetadata()
keras_model.compile(optimizer=..., loss=..., options=run_opts, run_metadata=runmeta)
# Model fit()

How to set specific gpu in bert?

ResourceExhaustedError (see above for traceback):
OOM when allocating tensor of shape [768] and type float [[node
bert/encoder/layer_0/attention/output/LayerNorm/beta/adam_m/Initializer/zeros
(defined at /home/zyl/souhu/bert/optimization.py:122) =
Const[_class=["loc:@bert/encoder/layer_0/attention/output/LayerNorm/beta/adam_m/Assign"],
dtype=DT_FLOAT, value=Tensor, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
How to set gpu 1 or another to run bert?
The easiest way to set which GPUs will be used is the CUDA_VISIBLE_DEVICES environment variable. The device will still be GPU:0 to TensorFlow, but it will be a physically different device.
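For example, a minimal sketch, assuming you want physical GPU 1 (set the variable before TensorFlow initializes CUDA):

import os

# Expose only physical GPU 1; TensorFlow will see it as GPU:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf  # import after setting the mask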
If you are using BERT within Python (which is rather a painful way), you can wrap the code that creates the BERT graph in a device block:
with tf.device('/device:GPU:1'):
    model = modeling.BertModel(...)

Can tensorflow summary ops be assigned to gpu?

Here is part of my code.
with tf.Graph().as_default(), tf.device('/cpu:0'):
    global_step = tf.get_variable(
        'global_step',
        [],
        initializer=tf.constant_initializer(0))
    writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    with tf.device('/gpu:0'):
        tf.summary.scalar('learning_rate', INITIAL_LEARNING_RATE)
        summary_op = tf.summary.merge_all()
When I run it, I get the following error:
InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'learning_rate': Could not satisfy explicit device specification '/device:GPU:0' because no
supported kernel for GPU devices is available.
[[Node: learning_rate = ScalarSummary[T=DT_FLOAT, _device="/device:GPU:0"](learning_rate/tags, learning_rate/values)]]
If I move these 2 ops into the tf.device("/cpu:0") device scope, it works:
tf.summary.scalar('learning_rate', INITIAL_LEARNING_RATE)
summary_op = tf.summary.merge_all()
I googled it; there are many suggestions about using "allow_soft_placement=True". But I think that solution basically changes the device scope automatically. So my questions are:
Why can these 2 ops not be assigned to GPU? Are there any documents I can look at to figure out which ops can or cannot be assigned to GPU?
Any suggestion is welcome.
You can't assign a summary operation to a GPU because it is meaningless.
In short, a GPU executes parallel operations. A summary is nothing but a file to which you append new lines every time you write to it. It's a sequential operation that has nothing in common with the operations GPUs are built for.
Your error says it all:
Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
That operation (in the tensorflow version you're using) has no GPU implementation and thus must be sent to a CPU device.
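If you would rather not hand-place every such op on the CPU, the allow_soft_placement option mentioned in the question lets TensorFlow fall back automatically. A minimal TF 1.x sketch:

import tensorflow as tf

# Ops without a GPU kernel (such as ScalarSummary) silently fall back
# to CPU instead of raising InvalidArgumentError.
config = tf.ConfigProto(allow_soft_placement=True,
                        log_device_placement=True)
with tf.Session(config=config) as sess:
    ...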

Tensorflow embedding_lookup gradients registering on CPU?

I have something equivalent to a sparse softmax:
...
with tf.device('/gpu:0'):
    indices = tf.placeholder(tf.int32, [None, dimsize])
    self._W = weight_variable([self._num_nodes, input_layer_size])
    self._b = bias_variable([self._num_nodes])
    sampled_W = tf.transpose(tf.nn.embedding_lookup(self._W, indices), [0, 2, 1])  # [batchsize, inputlayersize, dim1size]
    sampled_b = tf.nn.embedding_lookup(self._b, indices)  # [batchsize, dim1size]
...
However, when I enable placement logging, I see multiple instances of the gradients being placed on the CPU, e.g.:
I tensorflow/core/common_runtime/simple_placer.cc:819] gradients/.../embedding_lookup_1_grad/Size: /job:localhost/replica:0/task:0/cpu:0
This happens no matter which optimizer I choose. Am I missing something here?
If you use
tf.Session(config=tf.ConfigProto(allow_soft_placement=False))
you should get an error. That's because embedding_lookup isn't implemented on the GPU currently.

How to set specific gpu in tensorflow?

I want to specify the GPU that runs my process, and I set it as follows:
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant(3.0)
with tf.Session() as sess:
    while True:
        print sess.run(a)
However, it still allocates memory on both of my two GPUs.
| 0 7479 C python 5437MiB
| 1 7479 C python 5437MiB
There are 3 ways to achieve this:
Using the CUDA_VISIBLE_DEVICES environment variable.
Setting CUDA_VISIBLE_DEVICES="1" makes only device 1 visible, and setting CUDA_VISIBLE_DEVICES="0,1" makes devices 0 and 1 visible. You can do this in Python with the line os.environ["CUDA_VISIBLE_DEVICES"]="0,1" after importing the os package.
Using with tf.device('/gpu:2') and creating the graph. Then it will use GPU device 2 to run.
Using config = tf.ConfigProto(device_count={'GPU': 1}) and then sess = tf.Session(config=config). This caps TensorFlow at a single GPU; it does not select which one (see the sketch below).
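A minimal TF 1.x sketch of that third option, to make the caveat concrete:

import tensorflow as tf

# device_count caps how many GPUs TensorFlow may use; it does not
# choose which one. With a cap of 1, TF takes the first visible GPU.
config = tf.ConfigProto(device_count={'GPU': 1})
sess = tf.Session(config=config)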
TF would allocate all available memory on each visible GPU if not told otherwise. Here are 5 ways to stick to just one (or a few) GPUs.
Bash solution. Set CUDA_VISIBLE_DEVICES=0,1 in your terminal/console before starting python or jupyter notebook:
CUDA_VISIBLE_DEVICES=0,1 python script.py
Python solution. Run the next 2 lines of code before constructing a session:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0,1"
Automated solution. The method below will automatically detect GPU devices that are not used by other scripts and set CUDA_VISIBLE_DEVICES for you. You have to call mask_unused_gpus before constructing a session. It filters out GPUs by current memory usage. This way you can run multiple instances of your script at once without changing your code or setting console parameters.
The function:
import subprocess as sp
import os

def mask_unused_gpus(leave_unmasked=1):
    ACCEPTABLE_AVAILABLE_MEMORY = 1024
    COMMAND = "nvidia-smi --query-gpu=memory.free --format=csv"
    try:
        _output_to_list = lambda x: x.decode('ascii').split('\n')[:-1]
        memory_free_info = _output_to_list(sp.check_output(COMMAND.split()))[1:]
        memory_free_values = [int(x.split()[0]) for x in memory_free_info]
        available_gpus = [i for i, x in enumerate(memory_free_values) if x > ACCEPTABLE_AVAILABLE_MEMORY]
        if len(available_gpus) < leave_unmasked:
            raise ValueError('Found only %d usable GPUs in the system' % len(available_gpus))
        os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(map(str, available_gpus[:leave_unmasked]))
    except Exception as e:
        print('"nvidia-smi" is probably not installed. GPUs are not masked', e)

mask_unused_gpus(2)
Limitations: if you start multiple scripts at once it might cause a collision, because memory is not allocated immediately when you construct a session. In case that is a problem for you, you can use a randomized version as in the original source code: mask_busy_gpus()
Tensorflow 2.0 suggests yet another method:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
    except RuntimeError as e:
        # Visible devices must be set at program startup
        print(e)
Tensorflow/Keras also allows specifying the GPU to be used with the session config. I can recommend it only if setting the environment variable is not an option (i.e. an MPI run), because it tends to be the least reliable of all methods, especially with Keras.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = "0,1"
with tf.Session(config=config) as sess:
    ...  # or K.set_session(tf.Session(config=config))
I believe that you need to set CUDA_VISIBLE_DEVICES=1, or whichever GPU you want to use. If you make only one GPU visible, you will refer to it as /gpu:0 in tensorflow regardless of what you set the environment variable to.
More info on that environment variable: https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
You can modify the GPU options by adding this at the beginning of your Python script:
gpu_options = tf.GPUOptions(visible_device_list="0")
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
"0" is here the name of the GPU you want to use. You can have the list of the GPU available by typing the command nvidia-smi in the terminal prompt.
With Keras, these 2 functions allow the selection of CPU or GPU and, in the case of GPU, the fraction of memory that will be used.
import os
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

def set_cpu_option():
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
    os.environ["CUDA_VISIBLE_DEVICES"] = ""

def set_gpu_option(which_gpu, fraction_memory):
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = fraction_memory
    config.gpu_options.visible_device_list = which_gpu
    set_session(tf.Session(config=config))
    return

set_gpu_option("0", 0.9)
# or
set_cpu_option()
def set_specific_gpu(ID):
    gpus_all_physical_list = tf.config.list_physical_devices(device_type='GPU')
    tf.config.set_visible_devices(gpus_all_physical_list[ID], 'GPU')
refer to https://www.tensorflow.org/api_docs/python/tf/config/list_physical_devices
import tensorflow as tf

gpu_number = 2  # GPU number
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_visible_devices(gpus[gpu_number], 'GPU')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
The most elegant and clean way I have seen this work for me on my multi-GPU setup is:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"
tf_device='/gpu:0'
This assigns the task to gpu device 1.
Similarly, doing something on the lines:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="2"
tf_device='/gpu:0'
The os.environ command can be seen as a way of exposing only the GPU device on which you intend to run the code. The second command just picks the first of the available devices that you specified.
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="3"
This was the only thing that worked cleanly for me from within processes to assign a specific GPU to each process in a Pool (see the sketch below).
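A hypothetical sketch of that pattern, assuming a worker function that masks its GPU before importing TensorFlow (the 'spawn' start method keeps CUDA uninitialized until after the mask is set):

import os
import multiprocessing as mp

def worker(gpu_id):
    # Mask before TensorFlow is imported in this process.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    import tensorflow as tf
    return [d.name for d in tf.config.list_physical_devices('GPU')]

if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    with ctx.Pool(processes=2) as pool:
        print(pool.map(worker, [0, 1]))  # each worker sees one GPU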
TF 2.9 and above changed the APIs, so here is an updated version:
gpus = tf.config.list_physical_devices('GPU')
gpu_id = 0
if gpus:
    # Restrict TensorFlow to use only one GPU, selected by gpu_id
    try:
        tf.config.set_visible_devices(gpus[gpu_id], 'GPU')
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
    except RuntimeError as e:
        # Visible devices must be set before GPUs have been initialized
        print(e)