CUDA_ERROR_OUT_OF_MEMORY in tensorflow - tensorflow

When I started to train some neural network, it met the CUDA_ERROR_OUT_OF_MEMORY but the training could go on without error. Because I wanted to use gpu memory as it really needs, so I set the gpu_options.allow_growth = True.The logs are as follows:
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device:0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Iter 20, Minibatch Loss= 40491.636719
...
And after using nvidia-smi command, it gets:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27 Driver Version: 367.27
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M.
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 0000:01:00.0 Off | N/A |
| 40% 61C P2 46W / 180W | 8107MiB / 8111MiB | 96% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 0000:02:00.0 Off | N/A |
| 0% 40C P0 40W / 180W | 0MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
│
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 22932 C python 8105MiB |
+-----------------------------------------------------------------------------+
After I commented the gpu_options.allow_growth = True, I trained the net again and everything was normal. There was no the problem of CUDA_ERROR_OUT_OF_MEMORY. Finally, ran the nvidia-smi command, it gets:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.27 Driver Version: 367.27
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M.
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 0000:01:00.0 Off | N/A |
| 40% 61C P2 46W / 180W | 7793MiB / 8111MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 0000:02:00.0 Off | N/A |
| 0% 40C P0 40W / 180W | 0MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
│
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 22932 C python 7791MiB |
+-----------------------------------------------------------------------------+
I have two questions about it. Why did the CUDA_OUT_OF_MEMORY come out and the procedure went on normally? why did the memory usage become smaller after commenting allow_growth = True.

In case it's still relevant for someone, I encountered this issue when trying to run Keras/Tensorflow for the second time, after a first run was aborted. It seems the GPU memory is still allocated, and therefore cannot be allocated again. It was solved by manually ending all python processes that use the GPU, or alternatively, closing the existing terminal and running again in a new terminal window.

By default, tensorflow try to allocate a fraction per_process_gpu_memory_fraction of the GPU memory to his process to avoid costly memory management. (See the GPUOptions comments).
This can fail and raise the CUDA_OUT_OF_MEMORY warnings.
I do not know what is the fallback in this case (either using CPU ops or a allow_growth=True).
This can happen if an other process uses the GPU at the moment (If you launch two process running tensorflow for instance).
The default behavior takes ~95% of the memory (see this answer).
When you use allow_growth = True, the GPU memory is not preallocated and will be able to grow as you need it. This will lead to smaller memory usage (as the default option is to use the whole memory) but decreases the perfomances if not use properly as it requires a more complex handeling of the memory (which is not the most efficient part of CPU/GPU interactions).

I faced this issue when trying to train model back to back. I figured that the GPU memory wasn't available due to previous training run. So I found the easiest way would be to manually flush the GPU memory before every next training.
Use nvidia-smi to check the GPU memory usage:
nvidia-smi
nvidia-smi --gpu-reset
The above command may not work if other processes are actively using the GPU.
Alternatively you can use the following command to list all the processes that are using GPU:
sudo fuser -v /dev/nvidia*
And the output should look like this:
USER PID ACCESS COMMAND
/dev/nvidia0: root 2216 F...m Xorg
sid 6114 F...m krunner
sid 6116 F...m plasmashell
sid 7227 F...m akonadi_archive
sid 7239 F...m akonadi_mailfil
sid 7249 F...m akonadi_sendlat
sid 18120 F...m chrome
sid 18163 F...m chrome
sid 24154 F...m code
/dev/nvidiactl: root 2216 F...m Xorg
sid 6114 F...m krunner
sid 6116 F...m plasmashell
sid 7227 F...m akonadi_archive
sid 7239 F...m akonadi_mailfil
sid 7249 F...m akonadi_sendlat
sid 18120 F...m chrome
sid 18163 F...m chrome
sid 24154 F...m code
/dev/nvidia-modeset: root 2216 F.... Xorg
sid 6114 F.... krunner
sid 6116 F.... plasmashell
sid 7227 F.... akonadi_archive
sid 7239 F.... akonadi_mailfil
sid 7249 F.... akonadi_sendlat
sid 18120 F.... chrome
sid 18163 F.... chrome
sid 24154 F.... code
From here, I got the PID for the process which was holding the GPU memory, which in my case is 24154.
Use the following command to kill the process by its PID:
sudo kill -9 MY_PID
Replace MY_PID with the relevant PID.

Tensorflow 2.0 alpha
The problem is, that Tensorflow is greedy in allocating all available VRAM. That causes issues for some people.
For Tensorflow 2.0 alpha / nightly use this:
import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4)
Source: https://www.tensorflow.org/alpha/guide/using_gpu

I was experienced memory error in Ubuntu 18.10.
When i changed resolution of my monitor from 4k to fullhd (1920-1080) memory available become 438mb and neural network training started.
I was really surprised by this behavior.
By the way, i have Nvidia 1080 with 8gb memory, still dont know why only 400mb available

Environment:
1.CUDA 10.0
2.cuNDD 10.0
3.tensorflow 1.14.0
4.pip install opencv-contrib-python
5.git clone https://github.com/thtrieu/darkflow
6.Allowing GPU memory growth
Reference

fuser -k /dev/nvidia[0]
Worked for me.
Thanks to https://forums.developer.nvidia.com/t/11-gb-of-gpu-ram-used-and-no-process-listed-by-nvidia-smi/44459/16

Check the correctness of the input dataset.
İf you have a null input list may occur this error too.
The situation that I faced in Colab with tf.keras

Related

Stuck with enabling GPUs for Tensorflow in WSL2 under Windows 10

I can't get Tensorflow 2 to use my GPUs under WSL2. I am aware of this question, but GPU support is now (supposedly) no longer experimental.
Windows is on the required 21H2 version, which should support the WSL2 GPU connection.
Windows 10 Pro, 21H2, build 19044.1706
The PC has two GPUs:
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-19c8549a-4b8d-5d70-456b-776ceece4b0f)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-2a946756-0472-fb90-f1a4-b40cce1bba4f)
I had installed Ubuntu under WSL2 some time ago:
PS C:\Users\jem-m> wsl --status
Default Distribution: Ubuntu-20.04
Default Version: 2
...
Kernel version: 5.10.16
In the Windows PowerShell, I can run nvidia-smi.exe, which gives me
PS C:\Users\jem-m> nvidia-smi.exe
Mon May 16 18:13:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.77 Driver Version: 512.77 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:08:00.0 On | N/A |
| 23% 31C P8 10W / 250W | 753MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... WDDM | 00000000:41:00.0 Off | N/A |
| 23% 31C P8 12W / 250W | 753MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
while the nvidia-smi in the WSL2 Ubuntu shell gives
(testenv) jem-mosig:~/ $ nvidia-smi [17:48:30]
Mon May 16 17:49:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02 Driver Version: 512.77 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:08:00.0 On | N/A |
| 23% 34C P8 10W / 250W | 784MiB / 11264MiB | 8% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:41:00.0 Off | N/A |
| 23% 34C P8 13W / 250W | 784MiB / 11264MiB | 12% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Note the same driver and CUDA version, but different NVIDIA-SMI version.
This seems to indicate that CUDA works under WSL2 as it is supposed to. But when I run
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
# 2022-05-17 12:13:05.016328: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
# []
in python inside WSL2 I get [], so no GPU is recognized by Tensorflow. This is Python 3.8.0 and Tensorflow 2.4.1 freshly installed in a new Miniconda environment inside Ubuntu WSL2. I don't know what is going wrong. Any suggestions?
Addendum
I don't get any error messages when importing Tensorflow. But some warnings are produced when working with it. E.g., when I run
import tensorflow as tf
print(tf.__version__)
model = tf.keras.Sequential([tf.keras.layers.Dense(3)])
model.compile(loss="mse")
print(model.predict([[0.]]))
I get
2.4.1
2022-05-17 10:38:28.792209: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 10:38:28.792411: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-17 10:38:28.794356: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2022-05-17 10:38:28.853557: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-05-17 10:38:28.860126: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3792975000 Hz
[[0. 0. 0.]]
These don't seem to be GPU related, though.
Dr. Snoopy got me onto the right track: Despite the fact that the TF website says that
The TensorFlow pip package includes GPU support for CUDA®-enabled cards
, I still needed to run conda install tensorflow-gpu and it worked! Now
import tensorflow as tf
from tensorflow.python.client import device_lib
print("devices: ", [d.name for d in device_lib.list_local_devices()])
print("GPUs: ", tf.config.list_physical_devices('GPU'))
print("TF v.: ", tf.__version__)
gives lots of debug messages and
devices: ['/device:CPU:0', '/device:GPU:0', '/device:GPU:1']
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
TF v.: 2.4.1

Nvidia GeForce 210 compute issue on Ubuntu 18.04

I am using ubuntu 18.04 (I have dual booted windows with ubuntu 18.04).
nvidia-smi
This is the output I got when I ran the above command on my ubuntu(18.04) terminal:
Fri Oct 9 09:33:56 2020
+------------------------------------------------------+
| NVIDIA-SMI 340.108 Driver Version: 340.108 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 210 Off | 0000:01:00.0 N/A | N/A |
| 35% 52C P8 N/A / N/A | 368MiB / 1023MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Before that, I followed these steps to install required driver on my system:
sudo add-apt-repository --remove ppa:graphics-drivers/ppa
sudo apt-get purge nvidia*
sudo apt autoremove
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo shutdown -r now
When I tried to run Geekbench5 compute benchmark test, the output stopped when it was running Histogram Equalization. This is the output when I ran this ./geekbench5 --compute OpenCL in the folder where I extracted geekbench5:
[1009/092949:FATAL:src/halogen/cuda/cuda_library.cpp(1481)] Failed to load
cuDevicePrimaryCtxRetain: /usr/lib/x86_64-linux-gnu/libcuda.so.1: undefined symbol: cuDevicePrimaryCtxRetain
[1009/092949:FATAL:src/halogen/cuda/cuda_library.cpp(1481)] Failed to load cuDevicePrimaryCtxRetain: /usr/lib/x86_64-linux-gnu/libcuda.so.1: undefined symbol: cuDevicePrimaryCtxRetain
Geekbench 5.2.4 Tryout : https://www.geekbench.com/
Geekbench 5 is in tryout mode.
Geekbench 5 requires an active Internet connection when in tryout mode, and
automatically uploads test results to the Geekbench Browser. Other features
are unavailable in tryout mode.
Buy a Geekbench 5 license to enable offline use and remove the limitations of
tryout mode.
If you would like to purchase Geekbench you can do so online:
https://store.primatelabs.com/v5
If you have already purchased Geekbench, enter your email address and license
key from your email receipt with the following command line:
./geekbench5 -r <email address> <license key>
Running Gathering system information
System Information
Operating System Ubuntu 18.04.5 LTS 4.15.0-118-generic x86_64
Model To be filled by O.E.M. To be filled by O.E.M.
Motherboard O.E.M Intel H81
BIOS American Megatrends Inc. 4.6.5
Processor Information
Name Intel Core i5-4460
Topology 1 Processor, 4 Cores
Identifier GenuineIntel Family 6 Model 60 Stepping 3
Base Frequency 3.20 GHz
L1 Instruction Cache 32.0 KB x 2
L1 Data Cache 32.0 KB x 2
L2 Cache 256 KB x 2
L3 Cache 6.00 MB
Memory Information
Size 7.75 GB
OpenCL Information
Platform Vendor NVIDIA Corporation
Platform Name NVIDIA CUDA
Device Vendor NVIDIA Corporation
Device Name GeForce 210
Device Driver Version 340.108
Maximum Frequency 1.23 GHz
Compute Units 2
Device Memory 1024 MB
OpenCL
Running Sobel
Running Canny
Running Stereo Matching
Running Histogram Equalization
[1009/093329:ERROR:src/interface/console/consolemain.cpp(808)] Geekbench encountered an internal error and cannot continue. Please contact support#primatelabs.com for assistance.
Internal error message: clCreateImage returned -40.
Also, when I tried running the geekbench5 compute benchmark test on windows 10(same machine, on GUI), it paused running at Histogram equalization.
I am not getting any idea why this is happening.Is anything really wrong with my GPU or driver or anything else? I tried to search online, installed the driver again,rebooted the system, but the results are same. Can someone please help?
Your driver installation is fine, but your GPU is 11 years old and does not support some of the more recent features of the OpenCL standard. The geekbench error message -40 means that the image size geekbench uses for one of its benchmarks is not supported by your GPU. This causes the benchmark to crash. Maybe an older version of geekbench still works.

Tensorflow seems to be using two GPUs but one GPU seems not be doing anything

I just build a system with two GTX 680 GPUs. To test my system I'm running cifar10_multi_gpu_train.py, training CIFAR10 using Tensorflow.
Tensorflow creates two Tensorflow devices based on the GPUs (last two lines):
$ python tutorials/image/cifar10/cifar10_multi_gpu_train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 680
major: 3 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:01:00.0
Total memory: 3.94GiB
Free memory: 3.15GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x28eb270
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 680
major: 3 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.90GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 680, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 680, pci bus id: 0000:03:00.0)
However, when monitoring the GPUs during training (using watch -n 1 nvidia-smi), I noticed that the second GPU isn't getting hot at all (71 degrees for GPU0 vs 30 degrees for GPU1):
Every 1,0s: nvidia-smi Mon Apr 24 01:30:40 2017
Mon Apr 24 01:30:40 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51 Driver Version: 375.51 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 680 Off | 0000:01:00.0 N/A | N/A |
| 43% 71C P0 N/A / N/A | 3947MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 680 Off | 0000:03:00.0 N/A | N/A |
| 30% 30C P8 N/A / N/A | 3737MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
Also note here that the memory of both GPUs are completely allocated.
Why is my second GPU not used?
Ok, I should have taken more time in reading the script:
tf.app.flags.DEFINE_integer('num_gpus', 1,
"""How many GPUs to use.""")
I just set this to two, and everything works just fine:
Every 1,0s: nvidia-smi Mon Apr 24 02:44:30 2017
Mon Apr 24 02:44:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51 Driver Version: 375.51 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 680 Off | 0000:01:00.0 N/A | N/A |
| 37% 63C P0 N/A / N/A | 3807MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 680 Off | 0000:03:00.0 N/A | N/A |
| 36% 61C P0 N/A / N/A | 3807MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
I would have expected that the script would automatically use all the GPUs available.
Getting around 2450 examples/sec, 0.051 sec/batch with cifar10_multi_gpu_train.py.

How to verify the usage of the GPU?

How can I verify that CNTK is using the GPU? I have installed the CNTK-2-0-beta7-0-Windows-64bit-GPU-1bit-SGD binaries on my machine. But, when I try to run this from Python:
from cntk.device import set_default_device, gpu
set_default_device(gpu(0))
I get:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-eca77b3090be> in <module>()
1 from cntk.device import set_default_device, gpu
----> 2 set_default_device(gpu(0))
C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\device.py in gpu(device_id)
74 :class:`~cntk.device.DeviceDescriptor`: GPU device descriptor
75 '''
---> 76 return cntk_py.DeviceDescriptor.gpu_device(device_id)
77
78 def use_default_device():
ValueError: Specified GPU device id (0) is invalid.
Adding some more information today:
This is the result from running NVidia_smi.exe
C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe
Thu Jan 12 20:38:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 369.61 Driver Version: 369.61 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GPU WDDM | 0000:01:00.0 Off | N/A |
| N/A 51C P0 2W / N/A | 864MiB / 1024MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
After restarting the kernel in a Jupyter Notebook, I get:
import cntk as C
if C.device.default().type() == 0:
print('running on CPU')
else:
print('running on GPU')
running on CPU
However today I was able to run:
from cntk.device import set_default_device, gpu
set_default_device(gpu(0))
import cntk as C
if C.device.default().type() == 0:
print('running on CPU')
else:
print('running on GPU')
running on GPU
Should the GPU be the default on a GPU machine, or do you need to explicitly set it?
This sounds like an intermittent failure. This can happen on some laptops such as the Surface Book which have two GPUs, one from NVIDIA and an integrated one, and the laptop has shutdown the NVIDIA GPU to conserve energy, e.g. when it is running on battery.
Regarding default behavior, by default CNTK will choose the best available device and if it is a GPU it will lock it so no other process can use it. If you explicitly use set_default_device(gpu(0)) then the GPU won't get locked and other processes can use it.

Tensorflow not using GPU (according to TensorBoard)

edit : GTX 1070, ubuntu 16.04, git hash :
3b75eb34ea2c4982fb80843be089f02d430faade
I am retraining inception model on my own data. Everything is fine until the final command :
bazel-bin/inception/flowers_train \
--config=cuda \
--train_dir="${TRAIN_DIR}" \
--data_dir="${OUTPUT_DIRECTORY}" \
--pretrained_model_checkpoint_path="${MODEL_PATH}" \
--fine_tune=True \
--initial_learning_rate=0.001 \
--input_queue_memory_factor=1
According to the logs, Tensorflow seems to be using the GPU :
I tensorflow/core/common_runtime/gpu/gpu_device.cc:951] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.7715
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 7.77GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:972] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:03:00.0)
But when I am checking the learning in TensorBoard, the net is using mainly the CPU (blue /device:CPU:0, green /device:GPU:0):
TensorBoard graph:
I have tried this two TensorFlow setups :
Install from the source with nvidia-367 drivers, CUDA8 8.0, cuDNN
v5, source from the master (16/10/06 - r11?). compiled for GPU
use:
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
docker GPU image of Tensorflow on a PC with a GTX
1070 8Go
nvidia-docker run -it -p 8888:8888 -p 6006:6006 gcr.io/tensorflow/tensorflow:latest-gpu /bin/bash
Any help ?
According to this issue , the inception 'tower' is where the bulk of the work is being performed. So it seems mostly fine.
Except there is still something weird.
Running watch nvidia-smi gives :
Mon Oct 10 10:31:04 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 0000:03:00.0 On | N/A |
| 29% 57C P2 41W / 230W | 7806MiB / 8113MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1082 G /usr/lib/xorg/Xorg 69MiB |
| 0 3082 C /usr/bin/python 7729MiB |
+-----------------------------------------------------------------------------+
While top gives :
PID UTIL. PR NI VIRT RES SHR S %CPU %MEM TEMPS+ COM.
3082 root 20 0 26,739g 3,469g 1,657g S 101,3 59,7 7254:50 python
GPU seems to be ignored...