My main program in Keras and Tensorflow hits a seg fault
python fine-tune.py --train_dir /root/data/train_dump/ --val_dir
/root/data/val_dump/ --nb_epoch 100 --batch_size 32
Using TensorFlow backend.
WARNING:root:Keras version 2.2.2 detected. Last version known to be fully compatible of Keras is 2.1.3 .
WARNING:root:TensorFlow version 1.10.0 detected. Last version known to be fully compatible is 1.5.0 .
Found 2976 images belonging to 4 classes.
Found 1100 images belonging to 4 classes.
2018-09-06 23:28:06.891710: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-06 23:28:08.030896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:08:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-09-06 23:28:08.030988: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-06 23:28:08.688955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-06 23:28:08.689049: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-09-06 23:28:08.689071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-09-06 23:28:08.689614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10974 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:08:00.0, compute capability: 7.0)
Epoch 1/100
2018-09-06 23:29:31.853354: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-09-06 23:29:31.853522: E tensorflow/stream_executor/cuda/cuda_dnn.cc:360] Possibly insufficient driver version: 390.30.0
Segmentation fault (core dumped)
Yet tensorflow appears to be okay via this tf.session() test:
In [2]: import tensorflow as tf
...:
...: with tf.device('/gpu:0'):
...: a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
...: b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
...: c = tf.matmul(a, b)
...: with tf.Session() as sess:
...: print(sess.run(c))
...:
2018-09-06 23:37:31.603090: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-06 23:37:32.765804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:08:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-09-06 23:37:32.765866: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-06 23:37:33.358184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-06 23:37:33.358276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-09-06 23:37:33.358295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-09-06 23:37:33.358793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10974 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:08:00.0, compute capability: 7.0)
[[22. 28.]
[49. 64.]]
Thinking maybe the warnings had something to do with it, I did give that a try.
Installing collected packages: keras
Found existing installation: Keras 2.2.2
Uninstalling Keras-2.2.2:
Successfully uninstalled Keras-2.2.2
Successfully installed keras-2.1.3
(fastai) root#607b0f29-ad6b-482c-aead-aeae0a84fe2f:~# python fine-tune.py --train_dir /root/data/train_dump/ --val_dir /root/data/val_dump/ --nb_epoch 100 --batch_size 32
Using TensorFlow backend.
Found 2976 images belonging to 4 classes.
Found 1100 images belonging to 4 classes.
2018-09-07 00:30:10.722085: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-07 00:30:11.880320: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:08:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2018-09-07 00:30:11.880388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN V, pci bus id: 0000:08:00.0, compute capability: 7.0)
Epoch 1/100
2018-09-07 00:30:51.858613: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-09-07 00:30:51.858965: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.30 Wed Jan 31 22:08:49 PST 2018
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
"""
2018-09-07 00:30:51.859055: E tensorflow/stream_executor/cuda/cuda_dnn.cc:393] possibly insufficient driver version: 390.30.0
2018-09-07 00:30:51.859099: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Aborted (core dumped)
Related
tensorflow sticks to Creating CudaSolver handles for stream without moving forward for multiple hours when training the deep neural network.
tensorflow version:1.12
cuda version: 9.0
cudnn version: 7.3
2019-06-13 09:45:17.087403: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-06-13 09:45:18.568961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:5a:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2019-06-13 09:45:18.569067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-13 09:45:19.636474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-13 09:45:19.636549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-13 09:45:19.636578: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-13 09:45:19.636958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15117 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:5a:00.0, compute capability: 6.0)
2019-06-13 09:45:21.246683: I tensorflow/core/kernels/cuda_solvers.cc:159] Creating CudaSolver handles for stream 0x56140303bcf0
It is that complex deep neural network with a large of dataset would take a long period of time to train. After running a few hours, the model starts to update.
I have installed Python 3.5, tensorflow-gpu 1.5, CUDA v.9, cuDNN v.7 and NVIDIA AVX 5200. Although, I have installed tensorflow, bit cannot be used in Python and returns the following exception:
2019-03-31 22:27:45.285194: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2019-03-31 22:27:46.132472: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1105] Found device 0 with properties:
name: NVS 5200M major: 2 minor: 1 memoryClockRate(GHz): 1.344
pciBusID: 0000:01:00.0
totalMemory: 1.00GiB freeMemory: 823.00MiB
2019-03-31 22:27:46.132585: I C:\tf_jenkins\workspace\rel- win\M\windows-gpu\PY\35 \tensorflow\core\common_runtime\gpu\gpu_device.cc:1168] Ignoring visible gpu device (device: 0, name: NVS 5200M, pci bus id: 0000:01:00.0, compute capability: 2.1) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.0.
And when checking it as:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
I get:
name: "/device:CPU:0"
device_type: "CPU"
How do I solve this problem?
I am using tensorflow-gpu. I want to use GTX1070, but tensorflow-gpu uses my CPU. I don't know what to do.
I use CUDA 9.0 and CUDNN 7.1.4. My tensorflow-gpu version is 1.9.
After running this command on the official website
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2018-07-30 10:53:43.369025: I
T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141]
Your CPU supports instructions that this TensorFlow binary was not
compiled to use: AVX2 2018-07-30 10:53:43.829922: I
T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1392]
Found device 0 with properties: name: GeForce GTX 1070 major: 6 minor:
1 memoryClockRate(GHz): 1.683 pciBusID: 0000:01:00.0 totalMemory:
8.00GiB freeMemory: 6.63GiB 2018-07-30 10:53:43.919043: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1392]
Found device 1 with properties: name: GeForce GTX 1050 major: 6 minor:
1 memoryClockRate(GHz): 1.455 pciBusID: 0000:05:00.0 totalMemory:
2.00GiB freeMemory: 1.60GiB 2018-07-30 10:53:43.926001: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1456]
Ignoring visible gpu device (device: 1, name: GeForce GTX 1050, pci
bus id: 0000:05:00.0, compute capability: 6.1) with Cuda
multiprocessor count: 5. The minimum required count is 8. You can
adjust this requirement with the env var
TF_MIN_GPU_MULTIPROCESSOR_COUNT. 2018-07-30 10:53:43.934810: I
T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1471]
Adding visible gpu devices: 0 2018-07-30 10:53:44.761551: I
T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:952]
Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-30 10:53:44.765678: I
T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:958]
0 1 2018-07-30 10:53:44.768363: I
T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971]
0: N N 2018-07-30 10:53:44.771773: I
T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:971]
1: N N 2018-07-30 10:53:44.774913: I
T:\src\github\tensorflow\tensorflow\coenter code
herere\common_runtime\gpu\gpu_device.cc:1084] Created TensorFlow
device (/job:localhost/replica:0/task:0/device:GPU:0 with 6395 MB
memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus
id: 0000enter code here:01:00.0, compute capability: 6.1)
As I can see from the log excerpt of your tensorflow engine - it uses GPU device 0
(/job:localhost/replica:0/task:0/device:GPU:0 with 6395 MB memory) -> physical
GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute
capability: 6.1)
But refuses to use your GeForce GTX 1050. This is possible due to the Environment variable TF_MIN_GPU_MULTIPROCESSOR_COUNT which is seems set to 8.
Try to set it to value of 5 as advised in your log earlier:
set TF_MIN_GPU_MULTIPROCESSOR_COUNT=5
If you want to be sure which device is used - initialize the session with
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
You can read more on Using GPUs tensorflow docs page
I have a fresh install of windows 10 and installed tensorflow-gpu (i think i should have done successfully) as i run the sample code, i see the gpu0 is used as follow:
>>> import tensorflow as tf
>>> # Creates a graph.
... a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
>>> b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
>>> c = tf.matmul(a, b)
>>> # Creates a session with log_device_placement set to True.
... sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2017-05-08 02:10:35.354149: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.354283: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.355376: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.355835: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.356245: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.356629: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.356977: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.357376: W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-08 02:10:35.765058: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:887] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.607
pciBusID 0000:01:00.0
Total memory: 11.00GiB
Free memory: 9.12GiB
2017-05-08 02:10:35.765151: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:908] DMA: 0
2017-05-08 02:10:35.765851: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:918] 0: Y
2017-05-08 02:10:35.780335: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0
2017-05-08 02:10:36.157808: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\direct_session.cc:257] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0
>>> # Runs the op.
... print(sess.run(c))
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
2017-05-08 02:10:46.297244: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:841] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-05-08 02:10:46.299024: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:841] b: (Const)/job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
2017-05-08 02:10:46.302386: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\simple_placer.cc:841] a: (Const)/job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
But when i run the code for deep-learning, the gpu-memory is all used but the gpu-loading is almost 0. the cpu-loading is around 20% (before install tensorflow-gpu, the cpu-loading is 100%), the time for learning is a bit faster than using cpu.
what may be the cause? Please give me some advance, thank you so much
The line gpu:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0
shows that the computation is being done on the gpu.
The reason the useage isn't too high is probably because there isn't too much processing to do relative to the power of your GPU.
I try to monitor the usage of my tensorflow models with timeline. This link explains how to use it: https://stackoverflow.com/a/37774470/6716760. The minimal example here is:
import tensorflow as tf
from tensorflow.python.client import timeline
x = tf.random_normal([1000, 1000])
y = tf.random_normal([1000, 1000])
res = tf.matmul(x, y)
# Run the graph with full trace option
with tf.Session() as sess:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(res, options=run_options, run_metadata=run_metadata)
# Create the Timeline object, and write it to a json
tl = timeline.Timeline(run_metadata.step_stats)
ctf = tl.generate_chrome_trace_format()
with open('timeline.json', 'w') as f:
f.write(ctf)
Unfortunately I get the following error when I try to execute the script:
An error ocurred while starting the kernel
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:0a:00.0
Total memory: 11.90GiB
Free memory: 11.61GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x28f93b0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:09:00.0
Total memory: 11.90GiB
Free memory: 11.75GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2c976b0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:06:00.0
Total memory: 11.90GiB
Free memory: 11.75GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x2ba5d80
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:05:00.0
Total memory: 11.89GiB
Free memory: 11.52GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) ‑> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:0a:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) ‑> (device: 1, name: TITAN X (Pascal), pci bus id: 0000:09:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) ‑> (device: 2, name: TITAN X (Pascal), pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) ‑> (device: 3, name: TITAN X (Pascal), pci bus id: 0000:05:00.0)
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH: /usr/local/cuda/lib64
F tensorflow/core/platform/default/gpu/cupti_wrapper.cc:59] Check failed: ::tensorflow::Status::OK() == (::tensorflow::Env::Default()‑>GetSymbolFromLibrary( GetDsoHandle(), kName, &f)) (OK vs. Not found: /home/sysgen/anaconda3/lib/python3.5/site‑packages/tensorflow/python/_pywrap_tensorflow.so: undefined symbol: cuptiActivityRegisterCallbacks)could not find cuptiActivityRegisterCallbacksin libcupti DSO
The error is hidden in the last line at the end. But what does this mean? How can I fix it?
you have to do :
sudo apt install libcupti-dev
and add this to your bashrc / zshrc :
export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
Hope it help
It happened to me and reason was the file cupti64_80.dll that could not be found.
Cuda 8 install this file in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\extras\CUPTI\libx64 folder that is not in the path.
So copy the dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin, and the lib file to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64