I tried to get the current power usage with the following command on Windows 10 x64:
nvidia-smi.exe --format=csv,noheader --query-gpu=power.draw
and got the following result:
[Not Supported]
I checked this on a GTX 1050 (notebook) video card.
Please also see the nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 382.05 Driver Version: 382.05 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 WDDM | 0000:01:00.0 Off | N/A |
| N/A 38C P8 N/A / N/A | 319MiB / 2048MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
I also tried to get this info via the NVML library:
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    nvmlReturn_t result;
    nvmlDevice_t device;

    result = nvmlInit();
    if (NVML_SUCCESS != result)
    {
        printf("Failed to initialize NVML: %s\n", nvmlErrorString(result));
        return 1;
    }

    result = nvmlDeviceGetHandleByIndex(0, &device);
    if (NVML_SUCCESS != result)
    {
        printf("Failed to get handle for device %i: %s\n", 0, nvmlErrorString(result));
    }

    /* nvmlDeviceGetPowerUsage reports the board power draw in milliwatts. */
    unsigned int power_usage = 0;
    result = nvmlDeviceGetPowerUsage(device, &power_usage);
    printf("%s\n", nvmlErrorString(result));

    nvmlShutdown();
    return 0;
}
The output is the same:
Not Supported
First question: is there a way to get the power usage, or another parameter, from an NVIDIA card when it is reported as not supported?
Please also see the Feature Matrix section in the old manual; it contains information about which features are supported on which NVIDIA cards. Second question: is there similar documentation for newer video cards?
I had the same problem with an NVIDIA GT 1030. It seems that some features, including the one you mention, are no longer supported by NVIDIA in newer drivers. I solved the problem by installing an older driver version. Try to find the first driver version that included support for your GPU. Check this link.
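If you want to check programmatically which queries a particular board exposes, here is a minimal sketch using the pynvml bindings (assuming the nvidia-ml-py / pynvml package is installed; this is only an illustration, not part of the answer above). NVML reports unsupported fields with the NVML_ERROR_NOT_SUPPORTED error code:

# Minimal sketch: query power draw via pynvml and detect "Not Supported".
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    power_mw = pynvml.nvmlDeviceGetPowerUsage(handle)  # reported in milliwatts
    print("Power draw: %.1f W" % (power_mw / 1000.0))
except pynvml.NVMLError as err:
    if err.value == pynvml.NVML_ERROR_NOT_SUPPORTED:
        print("power.draw is not supported on this board/driver")
    else:
        print("NVML error: %s" % err)
pynvml.nvmlShutdown()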
I am frequently rerunning the same mxnet script while I try to iron out some bugs in a new script (and I am new to mxnet). Pretty often when I try to run my script I get an error that the GPU is out of memory, and when I use nvidia-smi to check, this is what I see:
Wed Dec 5 15:41:29 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24.02 Driver Version: 396.24.02 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:65:00.0 On | N/A |
| 0% 54C P2 68W / 300W | 10891MiB / 11144MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1446 G /usr/lib/xorg/Xorg 40MiB |
| 0 1481 G /usr/bin/gnome-shell 114MiB |
| 0 10216 G ...-token=8422C9FC67F51AEC1893FEEBE9DB68C6 31MiB |
| 0 18221 G /usr/lib/xorg/Xorg 458MiB |
| 0 18347 G /usr/bin/gnome-shell 282MiB |
+-----------------------------------------------------------------------------+
So it seems like most of the memory is in use (10891/11144 MiB), BUT I don't see any process in the list taking up a large portion of the GPU, so there doesn't seem to be anything to kill. And my mxnet script has already exited, so I assume it shouldn't be that. I would understand a lag of a few seconds, or even tens of seconds, if the GPU does not know right away that the script no longer needs the memory, but many minutes have passed and I still see the same display. What gives, and is there some memory cleanup I should do? If so, how? Thank you for any tips to a newbie.
GPU memory usage is completely bound to the lifetime of the process. If you see GPU memory still in use, there must be a process that is still alive and holding on to it. If you run ps -a | grep python you should see all Python processes, and that will tell you which process is still alive.
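If ps does not show anything obvious, you can also ask NVML directly which PIDs are holding GPU memory. Here is a minimal sketch with the pynvml bindings (assuming the nvidia-ml-py / pynvml package is installed; not part of the answer above):

# List the compute processes NVML sees on each GPU, so you can find
# (and kill) whichever one is still holding the memory.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        # usedGpuMemory is in bytes and may be None on some drivers.
        mem_mib = proc.usedGpuMemory // (1024 * 1024) if proc.usedGpuMemory else 0
        print("GPU %d  pid %d  %d MiB" % (i, proc.pid, mem_mib))
pynvml.nvmlShutdown()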
This question already has answers here:
How does CUDA assign device IDs to GPUs?
(4 answers)
I saw this solution, but it doesn't quite answer my question; it's also quite old so I'm not sure how relevant it is.
I keep getting conflicting outputs for the order of the GPU units. There are two of them: a Tesla K40 and an NVS 315 (a legacy device that is never used). When I run deviceQuery, I get
Device 0: "Tesla K40m"
...
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Device 1: "NVS 315"
...
Device PCI Domain ID / Bus ID / location ID: 0 / 3 / 0
On the other hand, nvidia-smi produces a different order:
0 NVS 315
1 Tesla K40m
Which I find very confusing. The solution I found for Tensorflow (and a similar one for Pytorch) is to use
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
The PCI bus ID is 4 for the Tesla and 3 for the NVS, so this should select the NVS (bus ID 3), is that right?
In PyTorch I set
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
...
device = torch.cuda.device(0)
print(torch.cuda.get_device_name(0))
to get Tesla K40m
when I set instead
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
device = torch.cuda.device(1)
print(torch.cuda.get_device_name(0))
to get
UserWarning:
Found GPU0 NVS 315 which is of cuda capability 2.1.
PyTorch no longer supports this GPU because it is too old.
warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
NVS 315
So I'm quite confused: what's the true order of GPU devices that tf and pytorch use?
By default, CUDA orders the GPUs by computing power. GPU:0 will be the fastest GPU on your host, in your case the K40m.
If you set CUDA_DEVICE_ORDER='PCI_BUS_ID', then CUDA orders your GPUs by their position on the PCI bus, meaning that GPU:0 will be the GPU on your first PCI-E lane.
Both Tensorflow and PyTorch use the CUDA GPU order. That is consistent with what you showed:
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
...
device = torch.cuda.device(0)
print(torch.cuda.get_device_name(0))
Default order, so GPU:0 is the K40m since it is the most powerful card on your host.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
...
device = torch.cuda.device(0)
print(torch.cuda.get_device_name(0))
PCI-E lane order, so GPU:0 is the card with the lowest bus ID, in your case the NVS.
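To double-check the ordering on your own machine, a small sketch like the following simply enumerates whatever PyTorch sees (the environment variables must be set before CUDA is initialized; the values here are only for illustration):

import os
# Set these before the first CUDA call, e.g. before importing torch.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # order by bus id instead of "fastest first"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"       # expose both cards

import torch

for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))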
I am going to buy a laptop to do some TF work. Is the GPU version of TF able to take advantage of an Nvidia Quadro P1000 or P2000? Will it run faster on these two GPUs than on the mobile version of the 1050 Ti? Thanks
If I am correct, TensorFlow can run on all Nvidia devices that support CUDA.
Check this website for their compute capabilities:
https://developer.nvidia.com/cuda-gpus
There you can see the computational power of Nvidia GPU cards.
For your question about those three cards (P1000, P2000, GeForce 1050 Ti): they all have the same compute capability, 6.1, which means they won't differ much in the GPU features they support.
But from their datasheets (P2000, P1000, 1050 Ti):
------------------------------------------------------------
|        | Memory     | Memory Interface | Memory Bandwidth |
------------------------------------------------------------
| P1000  | 4 GB GDDR5 | 128-bit          | 82 GB/s          |
| P2000  | 5 GB GDDR5 | 160-bit          | 140 GB/s         |
| 1050Ti | 4 GB GDDR5 | 128-bit          | 112 GB/s         |
------------------------------------------------------------
I would say, P2000 > 1050Ti > P1000
BTW, what does that 6.1 number mean? Basically, it indicates which operations and features the card supports. You can find the details in the figure below and this link, and a similar discussion here.
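If you just want to confirm that TensorFlow actually sees the GPU on such a laptop, a quick check is the sketch below (assuming TensorFlow 2.x; on 1.x you would use device_lib.list_local_devices() instead):

import tensorflow as tf

# Lists the GPUs TensorFlow can use; an empty list means it will fall back to the CPU.
print(tf.config.list_physical_devices("GPU"))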
I am running the following code on Google Cloud ML using a BASIC GPU tier (Tesla K80):
https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10
LRN is taking the most time, and it is running on the CPU. I am wondering whether the following stats quoted in https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_train.py were obtained by running on the CPU, because I don't see that being the case.
System | Step Time (sec/batch) | Accuracy
1 Tesla K20m | 0.35-0.60 | ~86% at 60K steps (5 hours)
If I force it to run on the GPU, it throws the following error:
Cannot assign a device to node 'norm1': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available. [[Node: norm1 = LRN[T=DT_HALF, alpha=0.00011111111, beta=0.75, bias=1, depth_radius=4, _device="/device:GPU:0"]]]
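The error says there is no GPU kernel for LRN with DT_HALF, so with an explicit '/device:GPU:0' placement the graph cannot be built. One way to avoid the hard failure is to allow soft placement, so that such nodes fall back to the CPU. A minimal TF 1.x-style sketch (not taken from the linked tutorial; the op and shapes are only for illustration):

import tensorflow as tf

# allow_soft_placement lets TF place ops that have no GPU kernel
# (e.g. LRN with DT_HALF) on the CPU instead of raising an error.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    x = tf.random_normal([1, 32, 32, 16], dtype=tf.float16)
    with tf.device("/device:GPU:0"):
        y = tf.nn.local_response_normalization(x, depth_radius=4)
    print(sess.run(y).shape)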
I'm running TensorFlow on GPU id 1 using export CUDA_VISIBLE_DEVICES=1, and everything in nvidia-smi looks good: my Python process is running on GPU 1, and the memory and power consumption show that GPU 1 is in use.
But oddly, GPU 0, which is unused (based on the process list, memory, power usage, and common sense), shows 96% volatile GPU utilization.
Does anyone know why?
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20c Off | 0000:03:00.0 Off | 0 |
| 30% 41C P0 53W / 225W | 0MiB / 4742MiB | 96% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K20c Off | 0000:43:00.0 Off | 0 |
| 36% 49C P0 95W / 225W | 4516MiB / 4742MiB | 63% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 5193 C python 4514MiB |
+-----------------------------------------------------------------------------+
Run ps aux | grep 5193 to see which program is using the GPU.
Your GPUs have ECC enabled, so you will see high GPU and memory utilization readings.
During driver initialization when ECC is enabled one can see high GPU and Memory Utilization readings. This is caused by ECC Memory Scrubbing mechanism that is performed during driver initialization.
When Persistence Mode is Disabled, driver deinitializes when there are no clients running (CUDA apps or nvidia-smi or XServer) and needs to initialize again before any GPU application (like nvidia-smi) can query its state thus causing ECC Scrubbing.
As a rule of thumb, always run with Persistence Mode enabled. Just run nvidia-smi -pm 1 as root. This will speed up application launching by keeping the driver loaded at all times.
Reference: https://devtalk.nvidia.com/default/topic/539632/k20-with-high-utilization-but-no-compute-processes-/
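If you prefer to verify these settings from code rather than the command line, here is a small pynvml sketch (assuming the nvidia-ml-py / pynvml bindings are installed; it only queries the modes, since changing them requires root):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Persistence mode: 1 = enabled, 0 = disabled.
print("Persistence mode:", pynvml.nvmlDeviceGetPersistenceMode(handle))

# Current and pending ECC modes; raises NVMLError on boards without ECC.
current, pending = pynvml.nvmlDeviceGetEccMode(handle)
print("ECC current/pending:", current, pending)

pynvml.nvmlShutdown()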