tensorflow-gpu installed but not used for calculation - tensorflow

I have a fresh install of Windows 10 and installed tensorflow-gpu (successfully, I think). When I run the sample code, I can see that gpu:0 is used, as follows:
>>> import tensorflow as tf
>>> # Creates a graph.
... a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> # Creates a session with log_device_placement set to True.
... sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2017-05-08 02:10:35.354149: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.354283: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.355376: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.355835: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.356245: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.356629: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.356977: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.357376: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.765058: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.607
pciBusID 0000:01:00.0
Total memory: 11.00GiB
Free memory: 9.12GiB
2017-05-08 02:10:35.765151: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0
2017-05-08 02:10:35.765851: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0: Y
2017-05-08 02:10:35.780335: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0
2017-05-08 02:10:36.157808: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0
>>> # Runs the op.
... print(sess.run(c))
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
2017-05-08 02:10:46.297244: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:841] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-05-08 02:10:46.299024: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:841] b: (Const)/job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-05-08 02:10:46.302386: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:841] a: (Const)/job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
But when I run my deep-learning code, the GPU memory is fully used while the GPU load is almost 0. The CPU load is around 20% (before installing tensorflow-gpu it was 100%), and training is only a bit faster than on the CPU.
What may be the cause? Please give me some advice, thank you so much.

The line gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0
shows that the computation is being done on the GPU.
The usage is probably low because there isn't much processing to do relative to the power of your GPU.
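To check placement without eyeballing the log, the placement lines can be counted per device. This is only a rough sketch: count_ops_per_device is a hypothetical helper, not a TensorFlow API, and it assumes the "name: (Op): /job:..." line format printed by log_device_placement in the question.

```python
import re
from collections import Counter

# Matches placement lines such as
#   "MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0"
# The format is assumed from the log in the question; other TensorFlow
# versions may print it differently.
PLACEMENT_RE = re.compile(
    r"^(?P<node>[^:\s]+): \((?P<op>\w+)\): (?P<device>/job:\S+)$"
)


def count_ops_per_device(log_text):
    """Return a Counter mapping device type ('gpu', 'cpu', ...) to op count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PLACEMENT_RE.match(line.strip())
        if not match:
            continue  # timestamped duplicates and other log lines are skipped
        # Last path component is "gpu:0" (older) or "device:GPU:0" (newer).
        parts = match.group("device").rsplit("/", 1)[-1].split(":")
        if len(parts) >= 2:
            counts[parts[-2].lower()] += 1
    return counts


sample_log = """\
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
"""
print(count_ops_per_device(sample_log))
```

Placement only tells you where the ops run, not how busy the device is, so a graph placed entirely on gpu:0 can still show a low GPU load.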

Related

Segmentation Fault when using tensorflow

My main program, which uses Keras and TensorFlow, hits a segmentation fault:
python fine-tune.py --train_dir /root/data/train_dump/ --val_dir /root/data/val_dump/ --nb_epoch 100 --batch_size 32
Using TensorFlow backend.
WARNING:root:Keras version 2.2.2 detected. Last version known to be fully compatible of Keras is 2.1.3 .
WARNING:root:TensorFlow version 1.10.0 detected. Last version known to be fully compatible is 1.5.0 .
Found 2976 images belonging to 4 classes.
Found 1100 images belonging to 4 classes.
2018-09-06 23:28:06.891710: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-06 23:28:08.030896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:08:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-09-06 23:28:08.030988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-06 23:28:08.688955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-06 23:28:08.689049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-09-06 23:28:08.689071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-09-06 23:28:08.689614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10974 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:08:00.0, compute capability: 7.0)
Epoch 1/100
2018-09-06 23:29:31.853354: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-09-06 23:29:31.853522: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 390.30.0
Segmentation fault (core dumped)
Yet TensorFlow appears to be okay via this tf.Session() test:
In [2]: import tensorflow as tf
   ...:
   ...: with tf.device('/gpu:0'):
   ...:     a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
   ...:     b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
   ...:     c = tf.matmul(a, b)
   ...: with tf.Session() as sess:
   ...:     print(sess.run(c))
   ...:
2018-09-06 23:37:31.603090: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-06 23:37:32.765804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:08:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-09-06 23:37:32.765866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-06 23:37:33.358184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-06 23:37:33.358276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-09-06 23:37:33.358295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-09-06 23:37:33.358793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10974 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:08:00.0, compute capability: 7.0)
[[22. 28.]
[49. 64.]]
Thinking the version-compatibility warnings might have something to do with it, I tried downgrading, but it still crashes:
Installing collected packages: keras
Found existing installation: Keras 2.2.2
Uninstalling Keras-2.2.2:
Successfully uninstalled Keras-2.2.2
Successfully installed keras-2.1.3
(fastai) root#607b0f29-ad6b-482c-aead-aeae0a84fe2f:~# python fine-tune.py --train_dir /root/data/train_dump/ --val_dir /root/data/val_dump/ --nb_epoch 100 --batch_size 32
Using TensorFlow backend.
Found 2976 images belonging to 4 classes.
Found 1100 images belonging to 4 classes.
2018-09-07 00:30:10.722085: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-07 00:30:11.880320: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:08:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-09-07 00:30:11.880388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN V, pci bus id: 0000:08:00.0, compute capability: 7.0)
Epoch 1/100
2018-09-07 00:30:51.858613: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-09-07 00:30:51.858965: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.30 Wed Jan 31 22:08:49 PST 2018
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
"""
2018-09-07 00:30:51.859055: E tensorflow/stream_executor/cuda/cuda_dnn.cc:393] possibly insufficient driver version: 390.30.0
2018-09-07 00:30:51.859099: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Aborted (core dumped)

Could not satisfy explicit device specification '/device:GPU:0' because no devices matching

I want to use TensorFlow 0.12 for GPU on my Ubuntu 14.04 machine.
But when assigning a device to a node I am getting the following error.
InvalidArgumentError (see above for traceback): Cannot assign a device to
node 'my_model/RNN/zeros': Could not satisfy explicit device specification
'/device:GPU:0' because no devices matching that specification are registered
in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
[[Node: my_model/RNN/zeros = Fill[T=DT_FLOAT, _device="/device:GPU:0"]
(my_model/RNN/pack, my_model/RNN/zeros/Const)]]
My tensorflow seems to be set up correctly, since this simple program works:
import tensorflow as tf
# Creates a graph.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
Which outputs:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:08:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:08:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40m, pci bus id: 0000:08:00.0
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40m, pci bus id: 0000:08:00.0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] b: (Const)/job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] a: (Const)/job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
How can I assign a device to a node correctly?
Try using sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)). This resolves the error when TensorFlow cannot place an operation on the GPU, since some operations only have a CPU implementation.
With allow_soft_placement=True, TensorFlow falls back to the CPU when no GPU implementation is available.

How can I get rid of the warnings for Tensorflow? Other solutions are not working

So for example, if I run this piece of code:
import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
Then I would get this:
2017-06-24 10:20:57.441289: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-06-24 10:20:57.442069: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-24 10:20:57.443010: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-24 10:20:57.444615: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-24 10:20:57.445662: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-24 10:20:57.446273: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-24 10:20:57.447475: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-24 10:20:57.448190: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-24 10:21:01.548276: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1060
major: 6 minor: 1 memoryClockRate (GHz) 1.6705
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 5.01GiB
2017-06-24 10:21:01.549636: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-06-24 10:21:01.550040: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0: Y
2017-06-24 10:21:01.550479: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0
2017-06-24 10:21:01.910806: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\direct_session.cc:265] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-06-24 10:21:01.918147: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
2017-06-24 10:21:01.918794: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] b: (Const)/job:localhost/replica:0/task:0/gpu:0
2017-06-24 10:21:01.919403: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] a: (Const)/job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
The warnings always come up and it's extremely annoying. I have tried other solutions, such as:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
But it still gave me the same warnings.
Two questions: how can I get rid of the warnings? And is TensorFlow using just the GPU?
These warnings are caused by the build parameters. Recompiling TensorFlow with the following command will solve the problem:
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse --copt=-msse2 --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
You are getting this info dump because you are starting your Session explicitly with the instruction to show it:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
The log_device_placement=True bit is intended specifically to show which operations are performed on which devices (for example, to demonstrate that the GPU is indeed being used).
If you don't want to see this info, simply start the session without that option:
sess = tf.Session()
Having this bit is still helpful, however:
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
TF might show you other warnings that you find annoying or unnecessary, so I'd suggest keeping this line (though in my experience it has no effect on Windows, only on Mac and Linux).
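As a rough illustration of the TF_CPP_MIN_LOG_LEVEL approach, here is a small sketch that sets the variable and reports whether TensorFlow had already been imported at that point. set_tf_min_log_level is a hypothetical helper, not part of TensorFlow. The level meanings in the table are the commonly documented ones, and the assumption that the variable needs to be set before the first import is mine.

```python
import os
import sys

# Commonly documented meanings of TF_CPP_MIN_LOG_LEVEL for TensorFlow's
# C++ logging (INFO / WARNING / ERROR messages).
LOG_LEVELS = {
    0: "show everything",
    1: "hide INFO",
    2: "hide INFO and WARNING",
    3: "hide INFO, WARNING and ERROR",
}


def set_tf_min_log_level(level):
    """Set TF_CPP_MIN_LOG_LEVEL; return True if tensorflow was not yet imported."""
    if level not in LOG_LEVELS:
        raise ValueError("level must be one of %s" % sorted(LOG_LEVELS))
    set_in_time = "tensorflow" not in sys.modules
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(level)
    return set_in_time


if set_tf_min_log_level(2):
    print("TF_CPP_MIN_LOG_LEVEL=2 set before importing tensorflow")
else:
    print("tensorflow was already imported; the setting may be ignored")

# import tensorflow as tf  # import only after the variable is set
```

Level 2 should hide the "W ... cpu_feature_guard" lines, but not the placement lines printed because of log_device_placement=True.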

Warning when I run tensorflow-gpu. Is it using the GPU?

When I run this command:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
I get this log:
2017-06-16 11:29:42.305931: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-16 11:29:42.305950: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-16 11:29:42.305963: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-16 11:29:42.305975: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-16 11:29:42.305986: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-16 11:29:42.406689: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-16 11:29:42.406961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7715
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 248.75MiB
2017-06-16 11:29:42.406991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-06-16 11:29:42.407010: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-06-16 11:29:42.407021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0
2017-06-16 11:29:42.408087: I tensorflow/core/common_runtime/direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0
Does this assure me that tensorflow code will use GPU? I had a previous version of tensorflow and the message was clear that it used GPU. Now after I upgraded it, the messages are different and confusing.
I can see that it discovered my GPU, but is it using it for sure or still using the CPU? How can I check this from the code to make sure the device used is GPU?
I'm concerned because I have:
import keras
Using TensorFlow backend
It shows that keras is using a CPU version!
Use a device scope as follows:
with tf.device('/gpu:0'):
    a = tf.constant(0)
sess = tf.Session()
sess.run(a)
If it doesn't complain that it can't assign a device to the node, you are using the GPU.
You can go one step further and see which device each node is allocated to with log_device_placement.
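Another way to check from code is to ask TensorFlow which devices it can see. The sketch below uses device_lib.list_local_devices() from tensorflow.python.client; gpu_device_names and list_local_gpus are hypothetical helper names, and the FakeDevice records are stand-ins so the filtering part runs without a GPU or a TensorFlow install.

```python
from collections import namedtuple


def gpu_device_names(devices):
    """Return the names of entries whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def list_local_gpus():
    """Ask TensorFlow which GPUs it can see (needs a TensorFlow install)."""
    # Imported lazily so gpu_device_names() works without TensorFlow.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())


# Stand-in records with the same two attributes the real entries expose.
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])
demo = [FakeDevice("/device:CPU:0", "CPU"), FakeDevice("/device:GPU:0", "GPU")]
print(gpu_device_names(demo))

# With TensorFlow installed:
# print(list_local_gpus())
```

An empty list from list_local_gpus() would suggest that the installed build cannot see a GPU at all, which is a different problem from ops being placed on the CPU.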

Tensorflow Kernel Crashes when I try to use trace_level=tf.RunOptions.FULL_TRACE

I am trying to monitor the usage of my TensorFlow models with timeline. This link explains how to use it: https://stackoverflow.com/a/37774470/6716760. The minimal example here is:
import tensorflow as tf
from tensorflow.python.client import timeline
x = tf.random_normal([1000, 1000])
y = tf.random_normal([1000, 1000])
res = tf.matmul(x, y)
# Run the graph with full trace option
with tf.Session() as sess:
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(res, options=run_options, run_metadata=run_metadata)
    # Create the Timeline object, and write it to a json
    tl = timeline.Timeline(run_metadata.step_stats)
    ctf = tl.generate_chrome_trace_format()
    with open('timeline.json', 'w') as f:
        f.write(ctf)
Unfortunately I get the following error when I try to execute the script:
An error ocurred while starting the kernel
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:0a:00.0
Total memory: 11.90GiB
Free memory: 11.61GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x28f93b0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:09:00.0
Total memory: 11.90GiB
Free memory: 11.75GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2c976b0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:06:00.0
Total memory: 11.90GiB
Free memory: 11.75GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2ba5d80
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:05:00.0
Total memory: 11.89GiB
Free memory: 11.52GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:0a:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: TITAN X (Pascal), pci bus id: 0000:09:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: TITAN X (Pascal), pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: TITAN X (Pascal), pci bus id: 0000:05:00.0)
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH: /usr/local/cuda/lib64
F tensorflow/core/platform/default/gpu/cupti_wrapper.cc:59] Check failed: ::tensorflow::Status::OK() == (::tensorflow::Env::Default()->GetSymbolFromLibrary( GetDsoHandle(), kName, &f)) (OK vs. Not found: /home/sysgen/anaconda3/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cuptiActivityRegisterCallbacks)could not find cuptiActivityRegisterCallbacksin libcupti DSO
The error is hidden in the last line at the end. But what does this mean? How can I fix it?
You have to run:
sudo apt install libcupti-dev
and add this to your .bashrc / .zshrc:
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
Hope it helps.
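To confirm that the export took effect, you can check whether any directory on LD_LIBRARY_PATH contains a libcupti file. This is a minimal sketch: find_library_on_path is a hypothetical helper, and it only inspects the environment variable. The dynamic loader also searches other locations, so an empty result here is a hint rather than proof.

```python
import os


def find_library_on_path(prefix="libcupti.so", env_var="LD_LIBRARY_PATH",
                         environ=None):
    """Return paths of files starting with `prefix` in the dirs listed in `env_var`."""
    environ = os.environ if environ is None else environ
    hits = []
    for directory in environ.get(env_var, "").split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue  # skip empty entries and directories that do not exist
        for entry in sorted(os.listdir(directory)):
            if entry.startswith(prefix):
                hits.append(os.path.join(directory, entry))
    return hits


matches = find_library_on_path()
print(matches if matches else "libcupti not found on LD_LIBRARY_PATH")
```

Run it in a new shell after editing the rc file, since the export only applies to shells started afterwards.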
This happened to me, and the reason was that the file cupti64_80.dll could not be found.
CUDA 8 installs this file in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64 folder, which is not in the path.
So copy the DLL to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin, and the lib file to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64.