Here is the output of:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 8087623604945614369
]
Here is the output of pip list | grep tensorflow:
tensorflow-gpu (1.4.0)
tensorflow-tensorboard (0.4.0rc3)
I can confirm that I have installed CUDA 8.0 and cuDNN on my machine, and the output of nvidia-smi shows the GPU along with other details. Can someone please help me understand why the output of print(device_lib.list_local_devices()) doesn't show the GPU?
I tried this simple TensorFlow example:
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
Error:
Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]
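As a side note: until the GPU itself shows up in device_lib.list_local_devices(), any explicit /gpu:0 pin will fail this way. A minimal sketch of a workaround, assuming you want the graph to still run on whatever device is available, is to enable soft placement (written here with the tf.compat.v1 API so it also runs under TensorFlow 2; on TensorFlow 1.x you would call tf.Session / tf.ConfigProto directly):

```python
import tensorflow as tf

# tf.compat.v1 is used so this snippet also runs under TensorFlow 2.
tf1 = tf.compat.v1
tf1.disable_eager_execution()

with tf1.device('/gpu:0'):
    a = tf1.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf1.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf1.matmul(a, b)

# allow_soft_placement=True lets TensorFlow fall back to the CPU when the
# requested device is unavailable, instead of raising the placement error.
config = tf1.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf1.Session(config=config) as sess:
    result = sess.run(c)
print(result)
```

With soft placement enabled, the op is silently moved to the CPU instead of raising the error, and the placement log shows which device was actually used.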
Related
I am using the following code to see if I can stop TF/Keras from producing logs.
import tensorflow as tf
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
tf.debugging.set_log_device_placement(True)
# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)
print(c)
Here you may see that I have used os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' in my code to disable logs, but the output is still:
Num GPUs Available:  1
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
Is there any way I can stop TF/Keras from printing Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0?
Remove the following line to get rid of the op device placement messages:
tf.debugging.set_log_device_placement(True)
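One caveat worth noting: os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' only affects TensorFlow's C++ startup logs and must be set before importing tensorflow, while the Executing op ... lines come from set_log_device_placement. A minimal sketch with the placement-logging call removed:

```python
import os

# Must be set *before* importing tensorflow; it suppresses the C++ startup
# logs, but it has no effect on device-placement messages.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import tensorflow as tf

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# No tf.debugging.set_log_device_placement(True) call here, so no
# "Executing op ..." lines are printed.
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)
print(c)
```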
Now, I am trying to use tf.nn.ctc_beam_search_decoder() on the GPU.
But I have a problem: it does not use the GPU.
I was able to check that other TensorFlow ops (e.g. Reshape, SigmoidGrad, etc.) run on the GPU.
But some, including ctc_beam_search_decoder(), only run on the CPU, and ctc_beam_search_decoder() is slow.
So I have two questions.
First, does ctc_beam_search_decoder() not support the GPU in TensorFlow 2?
Second, if it is supported, could you tell me how to implement it, or which function (or method) to use?
I show a simple example below.
Program code:
import tensorflow as tf
from tensorflow.python.client import device_lib

tf.debugging.set_log_device_placement(True)
print(device_lib.list_local_devices())

inputs = tf.convert_to_tensor([
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.2, 0.0, 0.3, 0.1, 0.1],
    [0.2, 0.21, 0.3, 0.4, 0.1],
    [0.2, 0.0, 0.6, 0.1, 0.5],
    [0.2, 1.2, 0.3, 2.1, 0.1]])
inputs = tf.expand_dims(inputs, axis=1)
inputs_len = tf.convert_to_tensor([5])
decoded, _ = tf.nn.ctc_beam_search_decoder(inputs, inputs_len)
Result (std output):
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 714951449022474384
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 11733532016050292601
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 394441871956590417
physical_device_desc: "device: XLA_GPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 11150726272
locality {
bus_id: 1
links {
}
}
incarnation: 5917663253173554940
physical_device_desc: "device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7"
]
Executing op ExpandDims in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op CTCBeamSearchDecoder in device /job:localhost/replica:0/task:0/device:CPU:0
Executing op StridedSlice in device /job:localhost/replica:0/task:0/device:GPU:0
Ignore the input and output data and focus on the devices being used.
In this case, ExpandDims and StridedSlice were executed on the GPU, but CTCBeamSearchDecoder was not.
The beam search decoder is implemented in plain C++, so it runs on the CPU and not on the GPU (see the code here [1], which is basically the same as in TF1).
Beam search is an inherently sequential algorithm (it goes from one time-step to the next), so I don't think running it on the GPU would give much of a performance improvement.
The simplest way to improve runtime is to tune the beam width (the smaller, the faster; the larger, the more accurate).
[1] https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/ctc/ctc_beam_search.h#L159
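To illustrate the beam-width trade-off, here is a small sketch; the logits are random placeholder values, not the output of a real model:

```python
import tensorflow as tf

# Toy logits with shape [max_time, batch_size, num_classes]; by convention
# the last class index is the CTC blank label.
inputs = tf.random.normal([50, 1, 6])
inputs_len = tf.constant([50])

# beam_width defaults to 100; shrinking it trades accuracy for speed, which
# matters here because the decoder runs on the CPU.
decoded, log_probs = tf.nn.ctc_beam_search_decoder(
    inputs, inputs_len, beam_width=10, top_paths=1)

# decoded is a list of SparseTensors, one per requested path.
print(decoded[0].values.numpy())
```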
I have checked the website, but as always it was not clear to me. Can anyone describe all of the steps (from the very beginning) to run any TensorFlow program on GPUs?
From Tensorflow official site:
https://www.tensorflow.org/tutorials/using_gpu
# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))
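For comparison, here is a TensorFlow 2 (eager) sketch of the same pattern. Unlike the snippet above, it does not hard-code /device:GPU:2 and /device:GPU:3; device names are discovered at runtime and it falls back to the CPU when no GPU exists, so it runs on any machine:

```python
import tensorflow as tf

# Discover available GPUs at runtime instead of hard-coding device names.
gpus = tf.config.list_logical_devices('GPU')
# Use up to two GPUs if present; otherwise fall back to the CPU.
devices = [d.name for d in gpus[:2]] or ['/device:CPU:0']

c = []
for d in devices:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))

# Aggregate the per-device results on the CPU, as in the original example.
with tf.device('/device:CPU:0'):
    total = tf.add_n(c)
print(total.numpy())
```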
Link for the tf tutorial
# Creates a graph.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
In the example above, the ops are assigned to cpu:0, with log_device_placement set to True. This is the output they show for that code:
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
Now, here the constants a and b and the matmul operation c, which run inside a session, are all placed under the cpu:0 device scope; so in the device placement log, why has only MatMul been executed on gpu:0?
That seems like a bug in the documentation; the MatMul operation will be placed on the CPU in this case.
Indeed, running the code sample shows this:
import tensorflow as tf
# Creates a graph.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
# And to prove that GPUs exist and can be used
with tf.device('/gpu:0'):
    d = tf.random_normal([])
print(sess.run(d))
Will show the following placements:
MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
random_normal/RandomStandardNormal: (RandomStandardNormal): /job:localhost/replica:0/task:0/gpu:0
I think the documentation bug is that the c = tf.matmul(a, b) statement was supposed to be outside the with tf.device('/cpu:0') scope.
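Under that reading, the corrected sample would create the matmul outside the scope, for example (written with tf.compat.v1 so it also runs under TensorFlow 2; on TensorFlow 1.x, tf.Session and tf.ConfigProto work directly):

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Only the constants are pinned to the CPU...
with tf1.device('/cpu:0'):
    a = tf1.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf1.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')

# ...while the matmul is created outside the scope, so the placer is free
# to put it on the GPU when one is available.
c = tf1.matmul(a, b)

with tf1.Session(config=tf1.ConfigProto(log_device_placement=True)) as sess:
    result = sess.run(c)
print(result)
```

With a GPU present, the placement log then shows MatMul on gpu:0 while a and b stay on cpu:0, matching the output quoted in the documentation.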
How does TensorFlow start the device, and where can I find the details in the source code? I have been searching the net for a long time, but with no luck. Please help, or try to give some ideas on how to achieve this.
You can assign a task to a device like this:
# Creates a graph.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
taken from here