Stuck with enabling GPUs for Tensorflow in WSL2 under Windows 10 - tensorflow

I can't get Tensorflow 2 to use my GPUs under WSL2. I am aware of this question, but GPU support is now (supposedly) no longer experimental.
Windows is on the required 21H2 version, which should support the WSL2 GPU connection.
Windows 10 Pro, 21H2, build 19044.1706
The PC has two GPUs:
GPU 0: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-19c8549a-4b8d-5d70-456b-776ceece4b0f)
GPU 1: NVIDIA GeForce GTX 1080 Ti (UUID: GPU-2a946756-0472-fb90-f1a4-b40cce1bba4f)
I had installed Ubuntu under WSL2 some time ago:
PS C:\Users\jem-m> wsl --status
Default Distribution: Ubuntu-20.04
Default Version: 2
...
Kernel version: 5.10.16
In the Windows PowerShell, I can run nvidia-smi.exe, which gives me
PS C:\Users\jem-m> nvidia-smi.exe
Mon May 16 18:13:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.77 Driver Version: 512.77 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... WDDM | 00000000:08:00.0 On | N/A |
| 23% 31C P8 10W / 250W | 753MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... WDDM | 00000000:41:00.0 Off | N/A |
| 23% 31C P8 12W / 250W | 753MiB / 11264MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
while the nvidia-smi in the WSL2 Ubuntu shell gives
(testenv) jem-mosig:~/ $ nvidia-smi [17:48:30]
Mon May 16 17:49:53 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02 Driver Version: 512.77 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:08:00.0 On | N/A |
| 23% 34C P8 10W / 250W | 784MiB / 11264MiB | 8% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:41:00.0 Off | N/A |
| 23% 34C P8 13W / 250W | 784MiB / 11264MiB | 12% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Note the same driver and CUDA version, but different NVIDIA-SMI version.
This seems to indicate that CUDA works under WSL2 as it is supposed to. But when I run
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
# 2022-05-17 12:13:05.016328: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
# []
in Python inside WSL2, I get [], so no GPU is recognized by TensorFlow. This is Python 3.8.0 and TensorFlow 2.4.1, freshly installed in a new Miniconda environment inside Ubuntu WSL2. I don't know what is going wrong. Any suggestions?
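For reference, here is a small diagnostic sketch (assuming TF 2.3 or newer, so that tf.sysconfig.get_build_info() is available); it shows whether the installed wheel was built with CUDA at all and which CUDA/cuDNN versions it expects:
import tensorflow as tf

print("built with CUDA:", tf.test.is_built_with_cuda())
print("build info:", dict(tf.sysconfig.get_build_info()))
print("visible GPUs:", tf.config.list_physical_devices('GPU'))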
Addendum
I don't get any error messages when importing Tensorflow. But some warnings are produced when working with it. E.g., when I run
import tensorflow as tf
print(tf.__version__)
model = tf.keras.Sequential([tf.keras.layers.Dense(3)])
model.compile(loss="mse")
print(model.predict([[0.]]))
I get
2.4.1
2022-05-17 10:38:28.792209: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 10:38:28.792411: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-17 10:38:28.794356: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2022-05-17 10:38:28.853557: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-05-17 10:38:28.860126: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3792975000 Hz
[[0. 0. 0.]]
These don't seem to be GPU related, though.

Dr. Snoopy got me onto the right track: despite the TF website saying that
"The TensorFlow pip package includes GPU support for CUDA®-enabled cards",
I still needed to run conda install tensorflow-gpu, and then it worked! Now
import tensorflow as tf
from tensorflow.python.client import device_lib
print("devices: ", [d.name for d in device_lib.list_local_devices()])
print("GPUs: ", tf.config.list_physical_devices('GPU'))
print("TF v.: ", tf.__version__)
gives lots of debug messages and
devices: ['/device:CPU:0', '/device:GPU:0', '/device:GPU:1']
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
TF v.: 2.4.1
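To double-check that ops actually execute on the GPU, here is a minimal sketch (nothing in it is specific to my setup beyond TF 2.x):
import tensorflow as tf

tf.debugging.set_log_device_placement(True)  # log the device each op is placed on
a = tf.random.normal([1000, 1000])
b = tf.random.normal([1000, 1000])
c = tf.matmul(a, b)
print(c.device)  # expected to end in GPU:0 when a GPU is used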

Related

Tensorflow is not detecting my GPUs. What shall I do (May 2021)?

TF Version : 2.4.1
CUDA Version : 11.1
tf.test.is_gpu_available() -- returns --> FALSE
tf.test.is_built_with_cuda() -- returns --> TRUE
I tried reverting TF to 2.4.0, but that didn't work.
I have also tried:
$ pip uninstall tensorflow
$ pip install tensorflow-gpu
But nothing seems to work; TF is just not detecting my GPUs.
EDIT 1:
Output of nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
Output of nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 30% 35C P8 23W / 300W | 23MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 3090 Off | 00000000:43:00.0 Off | N/A |
| 30% 40C P8 27W / 300W | 5MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 3090 Off | 00000000:81:00.0 Off | N/A |
| 64% 63C P2 179W / 300W | 24043MiB / 24268MiB | 59% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2362 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 2564 G /usr/bin/gnome-shell 12MiB |
| 1 N/A N/A 2362 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 2362 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 14304 C python3 24035MiB |
+-----------------------------------------------------------------------------+
While running tf.test.is_gpu_available(), I get the following warning:
WARNING:tensorflow:From Spell_correction.py:35: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-05-07 21:46:21.855460: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-07 21:46:21.856690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:43:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-05-07 21:46:21.856716: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-05-07 21:46:21.856735: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-05-07 21:46:21.856747: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-05-07 21:46:21.856759: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-05-07 21:46:21.856771: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-05-07 21:46:21.856829: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.1/lib64
2021-05-07 21:46:21.856846: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-05-07 21:46:21.856856: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-05-07 21:46:21.856863: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-05-07 21:46:21.942589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-07 21:46:21.942626: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-05-07 21:46:21.942633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
Another Observation:
Pytorch is detecting GPU, while TF is not.
torch.cuda.is_available() --> TRUE
tf.test.is_gpu_available() --> FALSE
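For what it's worth, here is a small sketch to compare what each framework expects (assuming TF 2.3+ so tf.sysconfig.get_build_info() exists); PyTorch wheels bundle their own CUDA runtime while TF relies on the system CUDA libraries, which often explains this split:
import torch
import tensorflow as tf

print("torch sees GPU:", torch.cuda.is_available(), "| torch CUDA:", torch.version.cuda)
info = tf.sysconfig.get_build_info()
print("TF built for CUDA", info.get("cuda_version"), "cuDNN", info.get("cudnn_version"))
print("TF sees GPUs:", tf.config.list_physical_devices("GPU"))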
If you use Ubuntu 20.04, I suggest following the steps from here. I recently had the same problem.
You have
NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 30% 35C P8 23W / 300W | 23MiB / 24268MiB | 0% Default |
| | | N/A
Try to get the latest NVIDIA driver (465) and CUDA 11.3. In my case, nvidia-smi shows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
What I did:
(1) I uninstalled NVIDIA and CUDA completely (see here, and be careful):
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get install ubuntu-desktop
sudo rm /etc/X11/xorg.conf
echo 'nouveau' | sudo tee -a /etc/modules
(2) I downloaded the NVIDIA driver .run file and simply ran sudo bash NVIDIA*.run
(3) I downloaded cuDNN and performed the following, as described here:
tar -xzvf cudnn-11.3-.*.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Also check your .bashrc file, as described here:
cd ~
gedit .bashrc or nano .bashrc
#add this in the end :
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/targets/x86_64-linux${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Then, pip install tensorflow-gpu==2.4.1
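As a rough sanity check after editing .bashrc (a sketch; the library names are the ones from the warning log in the question), you can test from Python whether each shared library TF 2.4 wants is resolvable on the loader path:
import ctypes

for lib in ["libcudart.so.11.0", "libcublas.so.11", "libcublasLt.so.11",
            "libcufft.so.10", "libcurand.so.10", "libcusolver.so.10",
            "libcusparse.so.11", "libcudnn.so.8"]:
    try:
        ctypes.CDLL(lib)
        print("OK     ", lib)
    except OSError as err:
        print("MISSING", lib, "->", err)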

What is the reason that TensorFlow does not detect GPU on Windows

I have installed CUDA 9.0 on my machine which has the NVIDIA GTX 1080 graphics cards. When I run the command nvcc --version then I get:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:32_Central_Daylight_Time_2017
Cuda compilation tools, release 9.0, V9.0.176
I have tried the steps from the TensorFlow official site to install TF with GPU support, but it is still using the CPU.
I have tried both the pip install and the Anaconda install, with the same result: neither was able to detect the GPU. I then tried many other tutorials on the web where people were able to detect the GPU, but I am not.
What could be the reason? Has anything changed in the new GPU version of TF? If yes, what is the latest documentation for installing TF with GPU support? If not, where am I going wrong?
Thanks!
Update 1: Tensorflow really wastes my time. Very annoying: at first I decided to build TF from source to use it with CUDA 10, but on both Windows 10 and Ubuntu 18.04 I was unable to build it successfully. So I gave up; then I decided to use CUDA 9.0, which is not supported on Ubuntu 18.04, so I came back to Windows, but even the prebuilt TF library is still not working. Really annoying.
I don't know why TF is still on CUDA 9.0 when CUDA 10.0 has already been officially released, and why TF still does not support Python 3.7. The same goes for MS Build Tools 2015, when 2017 already exists, and many more tools. TF relies on old versions of these tools, which creates a lot of problems for people who have to uninstall newer versions they are still using. It is very annoying...
Update2: nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 417.71 Driver Version: 417.71 CUDA Version: 9.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 WDDM | 00000000:01:00.0 On | N/A |
| 27% 35C P8 8W / 180W | 498MiB / 8192MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1264 C+G Insufficient Permissions N/A |
| 0 2148 C+G ...0108.0_x64__8wekyb3d8bbwe\HxOutlook.exe N/A |
| 0 4360 C+G ...mmersiveControlPanel\SystemSettings.exe N/A |
| 0 7332 C+G C:\Windows\explorer.exe N/A |
| 0 7384 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 8488 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 9704 C+G ...osoft.LockApp_cw5n1h2txyewy\LockApp.exe N/A |
| 0 10588 C+G ...al\Google\Chrome\Application\chrome.exe N/A |
| 0 10904 C+G ...x64__8wekyb3d8bbwe\Microsoft.Photos.exe N/A |
| 0 12608 C+G ...DIA GeForce Experience\NVIDIA Share.exe N/A |
| 0 13000 C+G ...241.0_x64__8wekyb3d8bbwe\Calculator.exe N/A |
| 0 14668 C+G ...ng4wbp0\app\DellMobileConnectClient.exe N/A |
| 0 17628 C+G ...2.0_x64__8wekyb3d8bbwe\WinStore.App.exe N/A |
| 0 18060 C+G ...oftEdge_8wekyb3d8bbwe\MicrosoftEdge.exe N/A |
+-----------------------------------------------------------------------------+
I finally figured out the problem; this may help others. It is a bug in TF 1.12: I removed it and installed TF 1.11, which is able to detect the GPU.
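A quick way to confirm the downgrade worked (a sketch for the TF 1.x line):
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)  # expect 1.11.x
print([d.name for d in device_lib.list_local_devices()])  # should include /device:GPU:0
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(tf.constant(1.0)))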
Some suggestions to TF team:
Before releasing a new version, please make sure that it works on all operating systems without bugs (a bug like the one I ran into is a really low-level bug).
Please also update your third-party dependencies to support the newest versions, e.g. CUDA 10 (which I had installed on my machine but had to step back from to 9.0 because of TF; annoying), VS 2015, Python 3.7, and so on.
Please update the documentation, for both installing and building from source: describe clearly what is needed, what to install, and how to build all of the required tools and utilities. The documentation makes building TF from source sound very easy, but in reality it was not; like others, I hit a lot of errors and was unable to build from source.
So far I have found TF the most annoying framework to build and install. TF is very sensitive, and the probability of hitting errors while building or installing is very high.
Good Luck!!

libtensorflow_framework.so: undefined symbol: cuDevicePrimaryCtxGetState

I have installed tensorflow-gpu with conda successfully. When I test it by running import tensorflow, I get the issue mentioned above. Any idea? I have checked my drivers; the NVIDIA toolkit and cuDNN are installed correctly, and PATH, LD_LIBRARY_PATH and CUDA_HOME are set accordingly.
...Fri Nov 23 12:00:18 2018
+------------------------------------------------------+
| NVIDIA-SMI 340.107 Driver Version: 340.107 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro FX 5600 Off | 0000:02:00.0 N/A | N/A |
| 61% 77C P0 N/A / N/A | 2MiB / 1535MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
You need a driver with the proper minimum CUDA support (it seems to be CUDA 7, see https://askubuntu.com/questions/988787/nvidia-cuda-theano-could-not-find-symbol-cudeviceprimaryctxgetstate) along with cuDNN 3.
Upgrade your drivers, if possible, to get this version.
Otherwise, use either tensorflow-mkl, or tensorflow-eigen for older CPU models.
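If in doubt, here is a rough way to query which CUDA driver-API version the installed driver actually exposes (a sketch; libcuda.so.1 ships with the NVIDIA driver, and the linked answer suggests the missing symbol appeared around the CUDA 7 driver API):
import ctypes

cuda = ctypes.CDLL("libcuda.so.1")   # driver API library installed by the NVIDIA driver
version = ctypes.c_int()
cuda.cuDriverGetVersion(ctypes.byref(version))
print("driver supports CUDA API:", version.value)  # e.g. 7050 means CUDA 7.5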
I had the same problem - I had already installed the required NVIDIA drivers on my Ubuntu machine, but when I tried to import Tensorflow I got the same error.
So for me the problem was that I was using the Nouveau driver instead of the NVIDIA driver. To fix the issue, go to System Settings -> Software & Updates -> Additional Drivers, select the option "Using NVIDIA binary driver ...", and click the Apply Changes button. Then just reboot and you're done.

Tensorflow seems to be using two GPUs but one GPU seems not be doing anything

I just built a system with two GTX 680 GPUs. To test my system I'm running cifar10_multi_gpu_train.py, training CIFAR10 using Tensorflow.
Tensorflow creates two Tensorflow devices based on the GPUs (last two lines):
$ python tutorials/image/cifar10/cifar10_multi_gpu_train.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 680
major: 3 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:01:00.0
Total memory: 3.94GiB
Free memory: 3.15GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x28eb270
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 680
major: 3 minor: 0 memoryClockRate (GHz) 1.15
pciBusID 0000:03:00.0
Total memory: 3.94GiB
Free memory: 3.90GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 680, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 680, pci bus id: 0000:03:00.0)
However, when monitoring the GPUs during training (using watch -n 1 nvidia-smi), I noticed that the second GPU isn't getting hot at all (71 degrees for GPU0 vs 30 degrees for GPU1):
Every 1,0s: nvidia-smi Mon Apr 24 01:30:40 2017
Mon Apr 24 01:30:40 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51 Driver Version: 375.51 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 680 Off | 0000:01:00.0 N/A | N/A |
| 43% 71C P0 N/A / N/A | 3947MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 680 Off | 0000:03:00.0 N/A | N/A |
| 30% 30C P8 N/A / N/A | 3737MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
Also note here that the memory of both GPUs is completely allocated.
Why is my second GPU not used?
OK, I should have taken more time reading the script:
tf.app.flags.DEFINE_integer('num_gpus', 1,
                            """How many GPUs to use.""")
I just set this to two, and everything works just fine:
Every 1,0s: nvidia-smi Mon Apr 24 02:44:30 2017
Mon Apr 24 02:44:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51 Driver Version: 375.51 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 680 Off | 0000:01:00.0 N/A | N/A |
| 37% 63C P0 N/A / N/A | 3807MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 680 Off | 0000:03:00.0 N/A | N/A |
| 36% 61C P0 N/A / N/A | 3807MiB / 4036MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
| 1 Not Supported |
+-----------------------------------------------------------------------------+
I would have expected the script to automatically use all available GPUs.
I'm getting around 2450 examples/sec, 0.051 sec/batch with cifar10_multi_gpu_train.py.
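A rough TF 1.x sketch (matching the era of this script) to confirm that both GPUs can execute work, independent of the num_gpus flag:
import tensorflow as tf

ops = []
for i in range(2):  # assumes two visible GPUs, as in the question
    with tf.device('/gpu:%d' % i):
        a = tf.random_normal([2000, 2000])
        ops.append(tf.reduce_sum(tf.matmul(a, a)))

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(ops))  # the placement log should list ops on both gpu:0 and gpu:1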

How to verify the usage of the GPU?

How can I verify that CNTK is using the GPU? I have installed the CNTK-2-0-beta7-0-Windows-64bit-GPU-1bit-SGD binaries on my machine. But, when I try to run this from Python:
from cntk.device import set_default_device, gpu
set_default_device(gpu(0))
I get:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-eca77b3090be> in <module>()
1 from cntk.device import set_default_device, gpu
----> 2 set_default_device(gpu(0))
C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py34\lib\site-packages\cntk\device.py in gpu(device_id)
74 :class:`~cntk.device.DeviceDescriptor`: GPU device descriptor
75 '''
---> 76 return cntk_py.DeviceDescriptor.gpu_device(device_id)
77
78 def use_default_device():
ValueError: Specified GPU device id (0) is invalid.
Adding some more information today:
This is the result of running nvidia-smi.exe:
C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe
Thu Jan 12 20:38:30 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 369.61 Driver Version: 369.61 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GPU WDDM | 0000:01:00.0 Off | N/A |
| N/A 51C P0 2W / N/A | 864MiB / 1024MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
After restarting the kernel in a Jupyter Notebook, I get:
import cntk as C
if C.device.default().type() == 0:
    print('running on CPU')
else:
    print('running on GPU')
running on CPU
However today I was able to run:
from cntk.device import set_default_device, gpu
set_default_device(gpu(0))
import cntk as C
if C.device.default().type() == 0:
    print('running on CPU')
else:
    print('running on GPU')
running on GPU
Should the GPU be the default on a GPU machine, or do you need to explicitly set it?
This sounds like an intermittent failure. This can happen on some laptops, such as the Surface Book, which have two GPUs (one from NVIDIA and an integrated one) and shut down the NVIDIA GPU to conserve energy, e.g. when running on battery.
Regarding default behavior: by default, CNTK will choose the best available device, and if it is a GPU it will lock it so that no other process can use it. If you explicitly call set_default_device(gpu(0)), the GPU won't get locked and other processes can use it.
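A minimal sketch built from the calls shown above (plus cntk.device.cpu(), which I'm assuming is available alongside gpu()): try the GPU first, fall back to the CPU if the device id is rejected as in the intermittent failure described, then report which device ended up as the default:
from cntk.device import set_default_device, gpu, cpu
import cntk as C

try:
    set_default_device(gpu(0))   # may raise ValueError if the GPU is unavailable
except ValueError:
    set_default_device(cpu())    # assumption: cpu() is importable from cntk.device

print('running on GPU' if C.device.default().type() != 0 else 'running on CPU')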