TensorFlow takes >1 min on first run on a video card with 5.0 compute capability

I'm running tensorflow 0.8.0 for python3 (pip installation), and the following file test.py:
import tensorflow as tf
a = tf.convert_to_tensor([1], dtype=tf.int32)
b = tf.to_float(a)
with tf.Session():
    b.eval()
... takes more than a minute to run:
$time python3 test.py
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 860M
major: 5 minor: 0 memoryClockRate (GHz) 1.0195
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.61GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 860M, pci bus id: 0000:01:00.0)
real 1m6.985s
user 1m6.700s
sys 0m1.480s
I should mention that other TensorFlow programs seem to work fine, e.g.
$time python3 -m tensorflow.models.image.mnist.convolutional
takes under 4 minutes.
Edit:
$cat /usr/local/cuda/version.txt
CUDA Version 7.5.18
$ls /usr/local/cuda/lib64/libcudnn*
/usr/local/cuda/lib64/libcudnn.so /usr/local/cuda/lib64/libcudnn.so.4.0.7
/usr/local/cuda/lib64/libcudnn.so.4 /usr/local/cuda/lib64/libcudnn_static.a

I think your GPU, the GTX 860M, is an sm_50 device. The default TensorFlow binary is built for sm_35 and sm_52 only, which means the kernels in your binary ship only as PTX for your card, and the CUDA runtime has to JIT-compile them into SASS the first time each kernel runs. That takes a minute or so. The compiled kernels should be cached for later runs, unless caching was explicitly disabled.
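The JIT cache is controlled by NVIDIA's documented CUDA_CACHE_* environment variables; a minimal sketch of tweaking them (the values here are illustrative, not required), which must run before TensorFlow initializes CUDA:

import os

# Must be set before the CUDA runtime starts, i.e. before importing TensorFlow.
os.environ.setdefault('CUDA_CACHE_DISABLE', '0')         # keep JIT caching enabled
os.environ.setdefault('CUDA_CACHE_MAXSIZE', str(2**30))  # allow up to 1 GiB of cached SASS
# os.environ['CUDA_CACHE_PATH'] = '/tmp/compute-cache'   # default is ~/.nv/ComputeCache on Linux

import tensorflow as tf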

The first call to eval() or run() is typically much slower than subsequent calls, because it needs to set up the session; after that, calls to eval()/run() are much faster.
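A quick way to see this for yourself is to time the first and second evaluation separately; a minimal sketch using the same ops as the question:

import time
import tensorflow as tf

a = tf.convert_to_tensor([1], dtype=tf.int32)
b = tf.to_float(a)

with tf.Session() as sess:
    start = time.time()
    sess.run(b)  # first run: pays the one-off setup (and possibly JIT) cost
    print('first run: %.2fs' % (time.time() - start))
    start = time.time()
    sess.run(b)  # subsequent runs are fast
    print('second run: %.2fs' % (time.time() - start))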

Related

I'm trying to use TensorFlow on my Ubuntu 16.04 machine, but even after installing tensorflow-gpu, I can't get TensorFlow to use the GPU

I'm trying to use TensorFlow with GPU support on my Ubuntu 16.04 machine. I've successfully installed the CUDA toolkit (8.0.61) and cuDNN (6.0.21). The problem is that I still can't use the GPU even after this installation process.
Importing TensorFlow prints no GPU-related lines at all, whereas on my other Ubuntu machine it prints several lines on import.
(screenshot: Nvidia driver version and usage)
There are no lines when importing TensorFlow. Only starting a session gives you some output:
>>> import tensorflow as tf
>>> tf.Session()
2017-09-17 20:06:20.174697: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2017-09-17 20:06:20.343531: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-09-17 20:06:20.344062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Found device 0 with properties:
name: GeForce GTX 960 major: 5 minor: 2 memoryClockRate(GHz): 1.329
pciBusID: 0000:01:00.0
totalMemory: 3.94GiB freeMemory: 3.61GiB
2017-09-17 20:06:20.344084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1055] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 960, pci bus id: 0000:01:00.0, compute capability: 5.2)
<tensorflow.python.client.session.Session object at 0x7f387d89f210>
After consulting my crystal ball, I suggest you check the environment variable TF_CPP_MIN_LOG_LEVEL.
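A minimal sketch of how to check it (the variable must be set before TensorFlow is imported):

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'  # 0 = all logs; 1/2/3 progressively filter INFO/WARNING/ERROR
import tensorflow as tf
tf.Session()  # the device placement lines should now appear if a GPU is found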

Tensorflow: Inconsistent GPU recognition

I ran a check to see whether or not my TensorFlow installation is using my GPU, using the example code from the TensorFlow instructions here.
When I ran the code for the first time, I got this output:
$ python gpu-test.py
out:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 3.94GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus id: 0000:00:03.0
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus id: 0000:00:03.0
MatMul: (MatMul): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/gpu:0
b: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] b: (Const)/job:localhost/replica:0/task:0/gpu:0
a: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] a: (Const)/job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
It's using the GPU, all good!
With this certainty in mind, I launch a Jupyter notebook with a large CNN and train it, and it's super slow.
Confused, I run gpu-test.py a second time. This time, even though nothing changed in the meantime, I get a different output:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: ip-172-31-19-90
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: ip-172-31-19-90
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 375.39.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 367.57.0 does not match DSO version 375.39.0 -- cannot find working devices in this configuration
Device mapping: no known devices.
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:
MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:827] a: (Const)/job:localhost/replica:0/task:0/cpu:0
[[ 22. 28.]
[ 49. 64.]]
I am completely confused right now.
The only two things that happened between the first and the second run of the GPU test were (1) I unzipped a file and (2) I ran said Jupyter notebook. Nothing was installed, updated, or in any way changed about the system from my side.
Can anybody help?
How come this is happening all of a sudden when it didn't happen 5 minutes earlier:
kernel version 367.57.0 does not match DSO version 375.39.0
And how can I update the kernel version?
I've found out what happened: an automatic driver update running in the background as an unattended upgrade tried to update the driver to version 375.39.0.
The GRID K520 GPUs on the AWS g2.2xlarge instances, however, are too old for this driver version.
The attempted automatic update leaves the system in an inconsistent state and breaks everything.
The only way out for me was to launch a new AWS instance and kill the update process right after startup to keep the system intact. Very annoying issue :/.
If anyone happens to have the same problem:
1. Launch a fresh AWS g2 instance.
2. SSH yourself in immediately.
3. Display running processes by typing top into the terminal.
4. Check if there's a busy process saying "unattended...." and, if yes, copy its PID (process ID).
5. Kill it with kill -9 PID before it can attempt to install the update (a Python sketch of steps 4-5 follows below).
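For the curious, a rough Python equivalent of steps 4-5 (assuming pgrep is available and that matching on the pattern 'unattended' is specific enough on your instance):

import os
import signal
import subprocess

try:
    pids = subprocess.check_output(['pgrep', '-f', 'unattended']).split()
except subprocess.CalledProcessError:
    pids = []  # pgrep exits non-zero when nothing matches
for pid in pids:
    os.kill(int(pid), signal.SIGKILL)  # same effect as kill -9 PID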
This means you need to update your CUDA driver to the latest version. I'm not sure where the inconsistency could be coming from.

How does one have TensorFlow not run the script unless the GPU was loaded successfully?

I have been trying to run some TensorFlow training on a machine with GPUs. However, whenever I try to do so, I get some type of error that seems to say it wasn't able to use the GPU for some reason (usually a memory issue, a CUDA issue, a cuDNN issue, etc.). However, since TensorFlow automatically falls back to running on the CPU if it can't use the GPU, it's been hard for me to tell whether it was actually able to leverage the GPU or not. Thus, I want my script to simply fail/halt unless the GPU is being used. How do I do that?
For the sake of an example, currently I have the message:
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla P100-SXM2-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.4805
pciBusID 0000:85:00.0
Total memory: 15.93GiB
Free memory: 15.63GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:85:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla P100-SXM2-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.4805
pciBusID 0000:85:00.0
Total memory: 15.93GiB
Free memory: 522.25MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-SXM2-16GB, pci bus id: 0000:85:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
where it seems to load all the CUDA libraries fine but then complains at the end. The complaining lines are:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
We could try to debug this specific bug, but for the moment the script proceeds to train, and I have no idea whether it's using the CPU or the GPU. Can we just have it not proceed with training if any weird CUDA/cuDNN/GPU bug comes up?
Use with tf.device('/gpu:0'):. This will kill your program if /gpu:0 doesn't exist.
E.g., see https://github.com/hughperkins/tensorflow-cl/blob/tensorflow-cl/tensorflow/stream_executor/cl/test/test_binary_ops.py#L52
with tf.Graph().as_default():
    with tf.Session(config=tf.ConfigProto(log_device_placement=False)) as sess:
        with tf.device('/gpu:0'):
            tf_a = tf.placeholder(tf_dtype, [None, None], 'a')
            tf_b = tf.placeholder(tf_dtype, [None, None], 'b')
            tf_c = tf.__dict__[tf_func](tf_a, tf_b, name="c")
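A variant of the same idea that makes the failure mode explicit: disable soft placement, so TensorFlow raises an error instead of silently falling back to the CPU (a sketch under those assumptions, not code from the linked test):

import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0])
    b = a * 2

config = tf.ConfigProto(allow_soft_placement=False,  # fail instead of falling back to the CPU
                        log_device_placement=True)   # print where each op actually runs
with tf.Session(config=config) as sess:
    print(sess.run(b))  # raises an error here if /gpu:0 is unavailable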
You can list all available devices in TensorFlow (see: How to get current available GPUs in tensorflow?). If no GPU is in the list, you can make the program throw an exception.
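A minimal sketch of that approach, using device_lib (an internal but commonly used TF 1.x module):

from tensorflow.python.client import device_lib

def assert_gpu_available():
    devices = device_lib.list_local_devices()
    if not any(d.device_type == 'GPU' for d in devices):
        raise RuntimeError('No GPU found; refusing to run on CPU.')

assert_gpu_available()  # call this before building/training the model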

TensorFlow - Mac GPU installation

After I run a python3 script, I get the following output and do not know where the 3 errors are coming from. I am using cuDNN v5.0, but obviously I have gone wrong somewhere along the installation pipeline. Any help would be fantastic.
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.dylib locally
number of elements at final reshape = %d. 61440
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.28GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:354] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/stream_executor/cuda/cuda_dnn.cc:361] error retrieving driver version: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got ""
E tensorflow/stream_executor/cuda/cuda_dnn.cc:321] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:457] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Abort trap: 6
You probably installed the wrong cuDNN!
I had the same problem and was able to fix mine by replacing my cuDNN with a compatible version.
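One way to check which cuDNN your process actually loads is to ask the library itself; a hedged sketch using ctypes (the library name differs per platform):

import ctypes

libcudnn = ctypes.CDLL('libcudnn.dylib')  # 'libcudnn.so' on Linux
print(libcudnn.cudnnGetVersion())  # e.g. 5005 for cuDNN 5.0.5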

TensorFlow GPU: is cudnn optional? Couldn't open CUDA library libcudnn.so

I installed the TensorFlow 0.8.0 GPU version, tensorflow-0.8.0-cp27-none-linux_x86_64.whl. It says it requires CUDA toolkit 7.5 and cuDNN v4.
# Ubuntu/Linux 64-bit, GPU enabled. Requires CUDA toolkit 7.5 and CuDNN v4. For
# other versions, see "Install from sources" below.
However, I accidentally forgot to install cuDNN v4, yet it works OK apart from the error message "Couldn't open CUDA library libcudnn.so". It still works and says "Creating TensorFlow device (/gpu:0)".
msg without cuDNN
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:99] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH: /usr/local/cuda/lib64:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:1562] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
('Extracting', 'MNIST_data/train-images-idx3-ubyte.gz')
/usr/lib/python2.7/gzip.py:268: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
chunk = self.extrabuf[offset: offset + size]
/home/ubuntu/TensorFlow-Tutorials/input_data.py:42: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
data = data.reshape(num_images, rows, cols, 1)
('Extracting', 'MNIST_data/train-labels-idx1-ubyte.gz')
('Extracting', 'MNIST_data/t10k-images-idx3-ubyte.gz')
('Extracting', 'MNIST_data/t10k-labels-idx1-ubyte.gz')
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 4.00GiB
Free memory: 3.95GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1704 get requests, put_count=1321 evicted_count=1000 eviction_rate=0.757002 and unsatisfied allocation rate=0.870305
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1704 get requests, put_count=1812 evicted_count=1000 eviction_rate=0.551876 and unsatisfied allocation rate=0.536972
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 256 to 281
Later, I installed cuDNN, but I don't see any difference.
msg with cuDNN
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
('Extracting', 'MNIST_data/train-images-idx3-ubyte.gz')
/usr/lib/python2.7/gzip.py:268: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
chunk = self.extrabuf[offset: offset + size]
/home/ubuntu/TensorFlow-Tutorials/input_data.py:42: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
data = data.reshape(num_images, rows, cols, 1)
('Extracting', 'MNIST_data/train-labels-idx1-ubyte.gz')
('Extracting', 'MNIST_data/t10k-images-idx3-ubyte.gz')
('Extracting', 'MNIST_data/t10k-labels-idx1-ubyte.gz')
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 4.00GiB
Free memory: 3.95GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1704 get requests, put_count=1321 evicted_count=1000 eviction_rate=0.757002 and unsatisfied allocation rate=0.870305
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1704 get requests, put_count=1811 evicted_count=1000 eviction_rate=0.552181 and unsatisfied allocation rate=0.537559
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 256 to 281
So what's the difference with and without cuDNN?
cuDNN is used to speed up a few TensorFlow operations, such as convolution. I noticed in your log file that you're training on the MNIST dataset. The reference MNIST model provided with TensorFlow is built around two fully connected layers and a softmax, so TensorFlow won't attempt to call cuDNN when training this model.
I'm not sure whether TensorFlow will automatically fall back to a slower convolution algorithm when cuDNN isn't available. If it doesn't, you can always disable the use of cuDNN by setting the TF_USE_CUDNN environment variable to 0 before running TensorFlow.
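For instance, a sketch (like other such variables, it has to be set before TensorFlow initializes):

import os
os.environ['TF_USE_CUDNN'] = '0'  # make TensorFlow use its own convolution kernels instead of cuDNN
import tensorflow as tf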
Solution: when you work with the MNIST dataset and get cuDNN-related errors, try this:
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of grabbing it all upfront
sess = tf.Session(config=config)
then continue with your code:
model.fit(training_images, training_labels, epochs=10, callbacks=[callbacks])
and fitting should work without any errors or exceptions.
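If you are on a newer TensorFlow (2.x), the equivalent of allow_growth is enabling memory growth on the physical GPU; a minimal sketch:

import tensorflow as tf

for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)  # allocate GPU memory on demand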