Tensorflow not using GPU (according to TensorBoard)

Edit: GTX 1070, Ubuntu 16.04, git hash:
3b75eb34ea2c4982fb80843be089f02d430faade
I am retraining the Inception model on my own data. Everything is fine until the final command:
bazel-bin/inception/flowers_train \
--config=cuda \
--train_dir="${TRAIN_DIR}" \
--data_dir="${OUTPUT_DIRECTORY}" \
--pretrained_model_checkpoint_path="${MODEL_PATH}" \
--fine_tune=True \
--initial_learning_rate=0.001 \
--input_queue_memory_factor=1
According to the logs, TensorFlow seems to be using the GPU:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7715
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 7.77GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:03:00.0)
But when I check the training in TensorBoard, the net is mainly using the CPU (blue: /device:CPU:0, green: /device:GPU:0):
[Image: TensorBoard graph]
I have tried these two TensorFlow setups:
1. Install from source with nvidia-367 drivers, CUDA 8.0, cuDNN v5, source from master (16/10/06 - r11?), compiled for GPU use:
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
2. Docker GPU image of TensorFlow on a PC with a GTX 1070 (8 GB):
nvidia-docker run -it -p 8888:8888 -p 6006:6006 gcr.io/tensorflow/tensorflow:latest-gpu /bin/bash
Any help?

According to this issue, the Inception 'tower' is where the bulk of the work is performed, so it seems mostly fine.
Except that something is still weird.
Running watch nvidia-smi gives:
Mon Oct 10 10:31:04 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 0000:03:00.0 On | N/A |
| 29% 57C P2 41W / 230W | 7806MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1082 G /usr/lib/xorg/Xorg 69MiB |
| 0 3082 C /usr/bin/python 7729MiB |
+-----------------------------------------------------------------------------+
while top gives:
PID UTIL. PR NI VIRT RES SHR S %CPU %MEM TEMPS+ COM.
3082 root 20 0 26,739g 3,469g 1,657g S 101,3 59,7 7254:50 python
Nearly all GPU memory is allocated, yet GPU-Util is 0% while one CPU core is saturated, so the GPU seems to be ignored...
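
A single nvidia-smi snapshot can be misleading, so it may help to sample the numbers repeatedly during training. Below is a minimal sketch, assuming the installed driver supports nvidia-smi's CSV query mode (`--query-gpu` with `--format=csv,noheader,nounits`). The function names `parse_gpu_stats` and `sample_gpu_stats` are made up for this illustration.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def _to_int(field):
    """Convert a CSV field to int, or None for values such as '[Not Supported]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_stats(csv_text):
    """Parse the CSV text printed by nvidia-smi for the QUERY fields above."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = line.split(",")
        stats.append({
            "util_percent": _to_int(util),
            "memory_used_mib": _to_int(used),
            "memory_total_mib": _to_int(total),
        })
    return stats


def sample_gpu_stats():
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_gpu_stats(output)
```

Calling `sample_gpu_stats()` once per second while the training job runs gives a utilization trace. Memory usage alone says little here, since TensorFlow reserves most of the GPU memory by default; the utilization column over time is the more telling signal.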

Related

Stuck with enabling GPUs for Tensorflow in WSL2 under Windows 10

I can't get Tensorflow 2 to use my GPUs under WSL2. I am aware of this question, but GPU support is now (supposedly) no longer experimental.
Windows is on the required 21H2 version, which should support the WSL2 GPU connection.
Windows 10 Pro, 21H2, build 19044.1706
The PC has two GPUs:
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-19c8549a-4b8d-5d70-456b-776ceece4b0f)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-2a946756-0472-fb90-f1a4-b40cce1bba4f)
I had installed Ubuntu under WSL2 some time ago:
PS C:\Users\jem-m> wsl --status
Default Distribution: Ubuntu-20.04
Default Version: 2
...
Kernel version: 5.10.16
In the Windows PowerShell, I can run nvidia-smi.exe, which gives me
PS C:\Users\jem-m> nvidia-smi.exe
Mon May 16 18:13:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.77 Driver Version: 512.77 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:08:00.0 On | N/A |
| 23% 31C P8 10W / 250W | 753MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... WDDM | 00000000:41:00.0 Off | N/A |
| 23% 31C P8 12W / 250W | 753MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
while the nvidia-smi in the WSL2 Ubuntu shell gives
(testenv) jem-mosig:~/ $ nvidia-smi [17:48:30]
Mon May 16 17:49:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02 Driver Version: 512.77 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:08:00.0 On | N/A |
| 23% 34C P8 10W / 250W | 784MiB / 11264MiB | 8% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:41:00.0 Off | N/A |
| 23% 34C P8 13W / 250W | 784MiB / 11264MiB | 12% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Note the same driver and CUDA version, but different NVIDIA-SMI version.
This seems to indicate that CUDA works under WSL2 as it is supposed to. But when I run
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
# 2022-05-17 12:13:05.016328: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
# []
in python inside WSL2 I get [], so no GPU is recognized by Tensorflow. This is Python 3.8.0 and Tensorflow 2.4.1 freshly installed in a new Miniconda environment inside Ubuntu WSL2. I don't know what is going wrong. Any suggestions?
Addendum
I don't get any error messages when importing Tensorflow. But some warnings are produced when working with it. E.g., when I run
import tensorflow as tf
print(tf.__version__)
model = tf.keras.Sequential([tf.keras.layers.Dense(3)])
model.compile(loss="mse")
print(model.predict([[0.]]))
I get
2.4.1
2022-05-17 10:38:28.792209: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 10:38:28.792411: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-17 10:38:28.794356: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2022-05-17 10:38:28.853557: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-05-17 10:38:28.860126: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3792975000 Hz
[[0. 0. 0.]]
These don't seem to be GPU related, though.

Dr. Snoopy got me onto the right track: despite the fact that the TF website says that "The TensorFlow pip package includes GPU support for CUDA®-enabled cards", I still needed to run conda install tensorflow-gpu, and it worked! Now
import tensorflow as tf
from tensorflow.python.client import device_lib
print("devices: ", [d.name for d in device_lib.list_local_devices()])
print("GPUs: ", tf.config.list_physical_devices('GPU'))
print("TF v.: ", tf.__version__)
gives lots of debug messages and
devices: ['/device:CPU:0', '/device:GPU:0', '/device:GPU:1']
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
TF v.: 2.4.1
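
For anyone hitting the same empty list, it can help to separate two cases: the installed build has no CUDA support at all, or it has CUDA support but cannot see a GPU. The following is a rough sketch of such a check. `tf.test.is_built_with_cuda()` and `tf.config.list_physical_devices()` are TensorFlow calls, while `explain_gpu_status` and `probe_tensorflow` are names invented for this example.

```python
def explain_gpu_status(built_with_cuda, gpu_count):
    """Turn the two facts TensorFlow reports into a human-readable diagnosis."""
    if not built_with_cuda:
        return "This TensorFlow build has no CUDA support; install a GPU-enabled build."
    if gpu_count == 0:
        return "CUDA-enabled build, but no GPU is visible; check driver and CUDA libraries."
    return "CUDA-enabled build with %d visible GPU(s)." % gpu_count


def probe_tensorflow():
    """Query the installed TensorFlow (imported lazily, so the helper above works without it)."""
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    return explain_gpu_status(tf.test.is_built_with_cuda(), len(gpus))
```

Running `print(probe_tensorflow())` inside the conda environment should report the first case for a CPU-only package and the third once a GPU-enabled build is installed.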

TensorFlow not loading cuDNN

I have finally managed to get CUDA to work on a Microsoft Azure server with a Tesla K80. Now I need to get cuDNN to work, but TensorFlow won't load it.
This is the message from TensorFlow:
>>> import tensorflow as tf
>>> tf.Session()
2017-04-27 13:05:51.476251: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:51.476306: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:51.476338: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:51.476366: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:51.476394: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-04-27 13:05:58.164781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID ad52:00:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-04-27 13:05:58.164822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-04-27 13:05:58.164835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-04-27 13:05:58.164853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: ad52:00:00.0)
<tensorflow.python.client.session.Session object at 0x7fc3c76c0050>
So I see that no cuDNN libraries are being loaded.
I have the proper files in /usr/local/cuda-8.0/include/ and /usr/local/cuda-8.0/lib64/:
$ ls /usr/local/cuda-8.0/include/ | grep "cudnn"
cudnn.h
$ ls /usr/local/cuda-8.0/lib64/ | grep "cudnn"
libcudnn.so
libcudnn.so.5
libcudnn.so.5.1.10
libcudnn_static.a
My ~/.bashrc file has the proper paths
export CUDA_HOME=/usr/local/cuda8.0
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
EDIT
Changed the .bashrc to:
export CUDA_HOME=/usr/local/cuda-8.0
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
export PATH=${CUDA_HOME}/include:${PATH}
Still no luck.
Output from nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51 Driver Version: 375.51 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | AD52:00:00.0 Off | 0 |
| N/A 71C P0 61W / 149W | 0MiB / 11439MiB | 24% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I'm using tensorflow version 1.1.0, Ubuntu 16.04 and CUDA 8.0.
EDIT
So I just tried deleting the cuDNN files and loading TensorFlow, which gave me an error, something along the lines of not being able to find libcudnn.so.5. So I think it does load it, but I was under the impression that TensorFlow would log something like "libcudnn.so loaded successfully" if it used cuDNN.
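
The same experiment can be done without deleting anything, by asking the dynamic loader whether it can open the library. This is only a sketch, assuming Linux and the library name libcudnn.so.5 from the listing above; `can_load` is a helper name made up for this example.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can resolve and open the shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be found or opened.
        return False
    return True
```

Calling `can_load("libcudnn.so.5")` from a Python started in the same shell as the TensorFlow session uses the LD_LIBRARY_PATH that was set when the interpreter started, so it also tests the .bashrc changes.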

Tensorflow seems to be using two GPUs but one GPU seems not to be doing anything

I just built a system with two GTX 680 GPUs. To test my system I'm running cifar10_multi_gpu_train.py, training CIFAR10 using Tensorflow.
Tensorflow creates two Tensorflow devices based on the GPUs (last two lines):
$ python tutorials/image/cifar10/cifar10_multi_gpu_train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 680
major: 3 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:01:00.0
Total memory: 3.94GiB
Free memory: 3.15GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x28eb270
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 680
major: 3 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.90GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 680, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 680, pci bus id: 0000:03:00.0)
However, when monitoring the GPUs during training (using watch -n 1 nvidia-smi), I noticed that the second GPU isn't getting hot at all (71 degrees for GPU0 vs 30 degrees for GPU1):
Every 1,0s: nvidia-smi Mon Apr 24 01:30:40 2017
Mon Apr 24 01:30:40 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51 Driver Version: 375.51 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 680 Off | 0000:01:00.0 N/A | N/A |
| 43% 71C P0 N/A / N/A | 3947MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 680 Off | 0000:03:00.0 N/A | N/A |
| 30% 30C P8 N/A / N/A | 3737MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
Also note that the memory of both GPUs is almost completely allocated.
Why is my second GPU not used?

OK, I should have taken more time to read the script:
tf.app.flags.DEFINE_integer('num_gpus', 1,
                            """How many GPUs to use.""")
I just set this to two, and everything works just fine:
Every 1,0s: nvidia-smi Mon Apr 24 02:44:30 2017
Mon Apr 24 02:44:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51 Driver Version: 375.51 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 680 Off | 0000:01:00.0 N/A | N/A |
| 37% 63C P0 N/A / N/A | 3807MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 680 Off | 0000:03:00.0 N/A | N/A |
| 36% 61C P0 N/A / N/A | 3807MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
I would have expected that the script would automatically use all the GPUs available.
Getting around 2450 examples/sec, 0.051 sec/batch with cifar10_multi_gpu_train.py.
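
Since num_gpus is an ordinary command-line flag, it should also be possible to pass it when launching the script instead of editing the default. Here is a small sketch that derives the value from the output of `nvidia-smi -L`, which prints one "GPU <n>: ..." line per device. `count_gpus` and `training_command` are names invented for this illustration.

```python
import re

# One line per device, e.g. "GPU 0: GeForce GTX 680 (UUID: ...)".
GPU_LINE = re.compile(r"^GPU \d+:", re.MULTILINE)


def count_gpus(listing):
    """Count the GPUs in the text printed by `nvidia-smi -L`."""
    return len(GPU_LINE.findall(listing))


def training_command(script, listing):
    """Build the command line, passing every listed GPU via the script's num_gpus flag."""
    return ["python", script, "--num_gpus=%d" % max(count_gpus(listing), 1)]
```

The listing can be obtained with `subprocess.check_output(["nvidia-smi", "-L"], universal_newlines=True)`, and the resulting list can be handed to `subprocess.call`.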

How to verify Tensorflow Serving is using GPUs on a GPU instance?

How can I verify that TensorFlow Serving uses GPUs for serving?
I configured TensorFlow to use GPUs during ./configure.
I tried monitoring nvidia-smi while running inference, but it shows no running processes.

First, of course, you need to enable CUDA when running ./configure.
Second, you should compile TF Serving using
bazel build -c opt --config=cuda tensorflow/...
and
bazel build -c opt --config=cuda --spawn_strategy=standalone //tensorflow_serving/model_servers:tensorflow_model_server
Lastly, if the model is being served with the GPU, you can see this information in the log when you start serving:
I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Found device 0 with properties: name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.721 pciBusID: 0000:01:00.0 totalMemory: 7.92GiB freeMemory: 7.76GiB
I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1055] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
and check nvidia-smi at the same time:
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1215 G /usr/lib/xorg/Xorg 59MiB |
| 0 7341 C ...ing/model_servers/tensorflow_model_server 7653MiB |
+-----------------------------------------------------------------------------+
The model server process holds most of the GPU memory.
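
If the server log is captured to a file, the check for the "Creating TensorFlow device" line can be scripted. A minimal sketch follows, based only on the two log spellings that appear on this page ("/gpu:0" in older logs, "/device:GPU:0" in newer ones). `gpu_devices_in_log` is a name made up for this example.

```python
import re

# Matches both the older "(/gpu:0)" and the newer "(/device:GPU:0)" spellings.
DEVICE_LINE = re.compile(r"Creating TensorFlow device \((/gpu:\d+|/device:GPU:\d+)\)")


def gpu_devices_in_log(log_text):
    """Return the GPU device names TensorFlow reported creating, in log order."""
    return DEVICE_LINE.findall(log_text)
```

An empty result for a log that covers model server startup would suggest that the binary did not create any GPU device.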

CUDA_ERROR_OUT_OF_MEMORY in tensorflow

When I started training a neural network, it hit CUDA_ERROR_OUT_OF_MEMORY, but training continued without error. Because I wanted TensorFlow to use only as much GPU memory as it really needs, I set gpu_options.allow_growth = True. The logs are as follows:
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device:0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Iter 20, Minibatch Loss= 40491.636719
...
After running the nvidia-smi command, I get:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27 Driver Version: 367.27
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M.
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 0000:01:00.0 Off | N/A |
| 40% 61C P2 46W / 180W | 8107MiB / 8111MiB | 96% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 0000:02:00.0 Off | N/A |
| 0% 40C P0 40W / 180W | 0MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 22932 C python 8105MiB |
+-----------------------------------------------------------------------------+
After I commented out gpu_options.allow_growth = True, I trained the net again and everything was normal; the CUDA_ERROR_OUT_OF_MEMORY problem did not occur. Finally, I ran the nvidia-smi command and got:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27 Driver Version: 367.27
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M.
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 0000:01:00.0 Off | N/A |
| 40% 61C P2 46W / 180W | 7793MiB / 8111MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 0000:02:00.0 Off | N/A |
| 0% 40C P0 40W / 180W | 0MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 22932 C python 7791MiB |
+-----------------------------------------------------------------------------+
I have two questions about this. Why did CUDA_ERROR_OUT_OF_MEMORY appear while the run continued normally? And why did the memory usage become smaller after commenting out allow_growth = True?

In case it's still relevant for someone, I encountered this issue when trying to run Keras/Tensorflow for the second time, after a first run was aborted. It seems the GPU memory is still allocated, and therefore cannot be allocated again. It was solved by manually ending all python processes that use the GPU, or alternatively, closing the existing terminal and running again in a new terminal window.

By default, TensorFlow tries to allocate a fraction per_process_gpu_memory_fraction of the GPU memory to its process to avoid costly memory management. (See the GPUOptions comments.)
This can fail and raise the CUDA_OUT_OF_MEMORY warnings.
I do not know what the fallback is in this case (either using CPU ops or allow_growth=True).
This can happen if another process is using the GPU at that moment (for instance, if you launch two processes running TensorFlow).
The default behavior takes ~95% of the memory (see this answer).
When you use allow_growth = True, the GPU memory is not preallocated and will grow as you need it. This leads to smaller memory usage (as the default option is to use the whole memory) but decreases performance if not used properly, as it requires more complex memory handling (which is not the most efficient part of CPU/GPU interactions).
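
Both options mentioned above live on the session config. Here is a minimal sketch of setting them, assuming a TensorFlow version that ships the `tf.compat.v1` module (in the 1.x releases from the time of the question the same classes were spelled `tf.ConfigProto` and `tf.Session`). `make_session_config` is a helper name invented for this example.

```python
def make_session_config(allow_growth=True, memory_fraction=None):
    """Build a session config that controls how TensorFlow reserves GPU memory."""
    import tensorflow as tf  # imported lazily; requires TensorFlow to be installed

    config = tf.compat.v1.ConfigProto()
    # Start small and grow on demand instead of reserving almost everything up front.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        # Upper bound on the share of each visible GPU's memory this process may use.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config
```

A session would then be created with `tf.compat.v1.Session(config=make_session_config(memory_fraction=0.4))`. The value 0.4 is only an illustration, not a recommendation.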

I faced this issue when trying to train models back to back. I figured that the GPU memory wasn't available because of the previous training run, so the easiest way was to manually flush the GPU memory before each new training run.
Use nvidia-smi to check the GPU memory usage:
nvidia-smi
Then reset the GPU to release the memory:
nvidia-smi --gpu-reset
The reset may not work if other processes are actively using the GPU.
Alternatively you can use the following command to list all the processes that are using GPU:
sudo fuser -v /dev/nvidia*
And the output should look like this:
USER PID ACCESS COMMAND
/dev/nvidia0: root 2216 F...m Xorg
sid 6114 F...m krunner
sid 6116 F...m plasmashell
sid 7227 F...m akonadi_archive
sid 7239 F...m akonadi_mailfil
sid 7249 F...m akonadi_sendlat
sid 18120 F...m chrome
sid 18163 F...m chrome
sid 24154 F...m code
/dev/nvidiactl: root 2216 F...m Xorg
sid 6114 F...m krunner
sid 6116 F...m plasmashell
sid 7227 F...m akonadi_archive
sid 7239 F...m akonadi_mailfil
sid 7249 F...m akonadi_sendlat
sid 18120 F...m chrome
sid 18163 F...m chrome
sid 24154 F...m code
/dev/nvidia-modeset: root 2216 F.... Xorg
sid 6114 F.... krunner
sid 6116 F.... plasmashell
sid 7227 F.... akonadi_archive
sid 7239 F.... akonadi_mailfil
sid 7249 F.... akonadi_sendlat
sid 18120 F.... chrome
sid 18163 F.... chrome
sid 24154 F.... code
From here, I got the PID for the process which was holding the GPU memory, which in my case is 24154.
Use the following command to kill the process by its PID:
sudo kill -9 MY_PID
Replace MY_PID with the relevant PID.
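
As an alternative to reading the fuser output by eye, nvidia-smi can list only the compute processes in CSV form via `--query-compute-apps=pid,process_name,used_memory --format=csv,noheader,nounits`. The sketch below parses that output, assuming the driver supports this query mode. `parse_compute_apps` and `python_pids` are names made up for this illustration.

```python
def _to_int(field):
    """Convert a CSV field to int, or None for values such as '[N/A]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_compute_apps(csv_text):
    """Parse the CSV rows into (pid, process_name, used_memory_mib) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        # The PID is the first field and the memory the last; the name sits in between.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        apps.append((int(pid), name.strip(), _to_int(used)))
    return apps


def python_pids(apps):
    """Return the PIDs of compute processes whose name contains 'python'."""
    return [pid for pid, name, _ in apps if "python" in name]
```

Unlike fuser, this query only reports compute processes, so desktop programs such as Xorg should not show up in the result. The returned PIDs are the candidates for the kill command above.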

TensorFlow 2.0 alpha
The problem is that TensorFlow is greedy in allocating all available VRAM, which causes issues for some people.
For TensorFlow 2.0 alpha / nightly use this:
import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4)
Source: https://www.tensorflow.org/alpha/guide/using_gpu
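
The call above appears to be specific to the alpha builds. For later 2.x releases, a comparable effect can be sketched with the `tf.config.experimental` API, which caps memory in MiB rather than as a fraction. This is an assumption-laden sketch, not the official replacement; `cap_gpu_memory` is a name invented here, and it has to run before TensorFlow initializes the GPUs.

```python
def cap_gpu_memory(memory_limit_mib):
    """Cap the memory TensorFlow may use on each visible GPU; returns the GPU count."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 2.x

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # One virtual device per physical GPU, limited to the requested amount.
        tf.config.experimental.set_virtual_device_configuration(
            gpu,
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=memory_limit_mib)],
        )
    return len(gpus)
```

For example, `cap_gpu_memory(3276)` corresponds to roughly 40% of an 8 GiB card. On a machine without a visible GPU the function does nothing and returns 0.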

I was experiencing a memory error on Ubuntu 18.10.
When I changed my monitor's resolution from 4K to Full HD (1920x1080), the available memory became 438 MB and neural network training started.
I was really surprised by this behavior.
By the way, I have an Nvidia 1080 with 8 GB of memory; I still don't know why only about 400 MB was available.

Environment:
1. CUDA 10.0
2. cuDNN 10.0
3. tensorflow 1.14.0
4. pip install opencv-contrib-python
5. git clone https://github.com/thtrieu/darkflow
6. Allowing GPU memory growth
Reference

fuser -k /dev/nvidia[0]
Worked for me.
Thanks to https://forums.developer.nvidia.com/t/11-gb-of-gpu-ram-used-and-no-process-listed-by-nvidia-smi/44459/16

Check the correctness of the input dataset.
If you have a null input list, this error may occur too.
I ran into this situation in Colab with tf.keras.
