I have an unknown Linux machine and need to check which GPU it uses (more specifically, whether it uses an AMD GPU). To learn more about the CPUs I used cat /proc/cpuinfo. Is there something similar for GPUs?
If clinfo is available, it will give you a list of OpenCL-capable compute devices, including GPUs. You're out of luck if the GPUs don't support OpenCL or their drivers aren't installed. There is no generic way of getting a list of all kinds of GPU devices. On some platforms you can at least get a list of discrete GPUs from lspci output, but you'll miss the integrated and non-PCI GPUs this way.
If you already have an X11 server running on that box, you can run glxinfo on it. That cannot be done headless, though.
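As a rough illustration of the lspci route, here is a minimal Python sketch that picks display devices out of `lspci -nn` output and maps the PCI vendor ID to a vendor name. The function names (`find_gpus`, `has_amd_gpu`, `run_lspci`) are made up for this example, and it assumes the usual single-line `lspci -nn` layout with the class code and `[vendor:device]` IDs in square brackets. As noted above, it will not see non-PCI GPUs.

```python
import re
import subprocess

# PCI vendor IDs (hex) of the common GPU vendors.
GPU_VENDORS = {"1002": "AMD", "10de": "NVIDIA", "8086": "Intel"}

# PCI class codes for display devices: VGA, 3D controller, other display controller.
DISPLAY_CLASSES = {"0300", "0302", "0380"}

# Assumed line layout: "<slot> <class name> [<class>]: <device name> [<vendor>:<device>] ..."
LINE_RE = re.compile(
    r"^(?P<slot>\S+)\s+.*?\[(?P<cls>[0-9a-f]{4})\]:\s+"
    r"(?P<name>.*)\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)


def find_gpus(lspci_nn_output):
    """Return one dict per display-class PCI device found in `lspci -nn` text."""
    gpus = []
    for line in lspci_nn_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("cls") in DISPLAY_CLASSES:
            vendor_id = match.group("vendor")
            gpus.append({
                "slot": match.group("slot"),
                # Fall back to the raw ID for vendors not in the table.
                "vendor": GPU_VENDORS.get(vendor_id, vendor_id),
                "name": match.group("name").strip(),
            })
    return gpus


def has_amd_gpu(lspci_nn_output):
    """True if at least one display device has AMD's vendor ID (1002)."""
    return any(gpu["vendor"] == "AMD" for gpu in find_gpus(lspci_nn_output))


def run_lspci():
    """Run `lspci -nn` and return its text output (requires pciutils)."""
    result = subprocess.run(["lspci", "-nn"], capture_output=True, text=True, check=True)
    return result.stdout
```

On a machine with pciutils installed, `has_amd_gpu(run_lspci())` would answer the original question for PCI-attached GPUs.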
According to this Vulkan tutorial, I can use vkEnumeratePhysicalDevices to get a list of available GPUs. However, I don't see my external NVIDIA GPU in there, only my Intel iGPU.
This eGPU is connected via Thunderbolt and is running CUDA code just fine. Is there anything I might have missed? Is it supposed to work out of the box?
My machine is running Arch Linux with up-to-date proprietary NVIDIA drivers.
The eGPU is an NVIDIA GTX 1050 (Lenovo Graphics Dock). Is it possible that it just does not support Vulkan somehow?
Vulkan support should work just as well with external GPUs (eGPUs). Seeing the eGPU enumerated as a Vulkan device may require the eGPU to be recognized by Xorg (or Wayland in the future).
See the recently created https://wiki.archlinux.org/title/External_GPU#Xorg for the changes that are probably required in the Xorg config.
Is there a way to run RAPIDS without a GPU? I usually develop on a small local machine without a GPU, then push my code to a powerful remote server for real use. Things like TensorFlow allow switching between the CPU and GPU depending on if they're available. Can an equivalent thing be done with RAPIDS? Even if it's slow, being able to test things on a machine without a GPU would be extremely helpful.
There isn't a way to use RAPIDS without a GPU, and part of the reason for that is we're following the APIs the community has adopted in CPU packages across Pandas, Numpy, SKLearn, NetworkX, etc. This way it should be as easy as swapping an import statement to get something working on the CPU vs the GPU.
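To make the "swap an import statement" idea concrete, here is a minimal sketch of a fallback import. The helper `import_first_available` is hypothetical, not part of RAPIDS. It assumes that a missing GPU library surfaces as an ImportError; on a real machine without a GPU, importing a RAPIDS package may fail with a different exception, so the `except` clause may need broadening.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in `module_names` that can be imported.

    Only ImportError is treated as "not available" (an assumption of this sketch).
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of these modules could be imported: {list(module_names)}")


# Intended use (not executed here, since neither package may be installed):
#   xdf = import_first_available(["cudf", "pandas"])
#   frame = xdf.DataFrame({"a": [1, 2, 3]})
```

This only works for the parts of the API that the GPU and CPU packages actually share, so code written this way should stick to that common subset.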
I've just started an instance on Google Compute Engine with 2 GPUs (Nvidia Tesla K80). Straight after startup, I can see via nvidia-smi that one of them is already fully utilized.
I've checked a list of running processes and there is nothing running at all. Does it mean that Google has rented out that same GPU to someone else?
It's all running on this machine:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial
Enabling "persistence mode" with nvidia-smi -pm 1 might solve the problem.
ECC in combination with non persistence mode can lead to 100% GPU utilization.
Alternatively you can disable ECC with nvidia-smi -e 0.
Note: I'm not sure whether the performance is actually worse. I remember being able to train an ML model despite the 100% GPU utilization, but I don't know whether it was slower.
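Before changing either setting, it can help to confirm that the busy GPU really has persistence mode off and ECC on. Below is a small Python sketch that parses the CSV output of an nvidia-smi query. The helper names and the 90% threshold are assumptions for illustration; the query fields used are `index`, `utilization.gpu`, `persistence_mode` and `ecc.mode.current`.

```python
import csv
import io
import subprocess

QUERY_FIELDS = ["index", "utilization.gpu", "persistence_mode", "ecc.mode.current"]


def nvidia_smi_command():
    """Build the nvidia-smi command line that produces the CSV parsed below."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]


def parse_gpu_status(csv_text):
    """Turn the CSV text into a list of dicts, one per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(record) != len(QUERY_FIELDS):
            continue  # skip blank or malformed lines
        index, util, persistence, ecc = (field.strip() for field in record)
        try:
            utilization = int(util)
        except ValueError:
            utilization = None  # e.g. the value is reported as not available
        rows.append({
            "index": int(index),
            "utilization": utilization,
            "persistence": persistence == "Enabled",
            "ecc": ecc == "Enabled",
        })
    return rows


def suspicious_gpus(rows, threshold=90):
    """Indices of GPUs that look busy while persistence is off and ECC is on."""
    return [
        row["index"]
        for row in rows
        if row["utilization"] is not None
        and row["utilization"] >= threshold
        and not row["persistence"]
        and row["ecc"]
    ]


def query_gpus():
    """Run nvidia-smi and return the parsed status (requires the NVIDIA driver)."""
    out = subprocess.run(nvidia_smi_command(), capture_output=True, text=True, check=True)
    return parse_gpu_status(out.stdout)
```

If `suspicious_gpus(query_gpus())` lists the GPU in question, the persistence-mode explanation above is at least plausible for that device.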
I suggest reporting this issue on the Google Issue Tracker, as it needs to be investigated. Please provide your project number and instance name there. Please follow this URL, which lets you file the issue as private in the Google Issue Tracker.
First, let me explain what I have to do.
My development environment is Tizen OS. You may be unfamiliar with it; it is an OS that uses the Linux kernel, is Red Hat based, and targets mobile, TV, etc. My target device consists of an Exynos 5422 and an ARM Mali-T628.
My main task is to implement a GPU library so that TensorFlow Lite's operations can use it.
I have built and installed TensorFlow Lite as an RPM package.
I have googled TensorFlow and GPUs many times, but only found information about CUDA, which is useless to me. I didn't see any info for my case (Tizen and a Mali GPU).
I think Linux has GPU instructions or a library, as it does for the CPU, but I can't find them.
Can you suggest search keywords or documentation?
You can go to NVIDIA's CUDA Toolkit page, where you can find the documentation and the Training buttons / options.
There's also the CUDA programming guide, which I find very useful for CUDA.
I believe that one or two of those may help you.
CUDA is for NVidia GPUs. Mali is not NVidia's, but ARM's. So you CANNOT use CUDA on your given hardware. Besides, if you want CUDA, you'd better drop Tensorflow-lite and use Tensorflow.
If you want to use CUDA, get hardware with a supported NVidia GPU (e.g., an x64 machine with an NVidia GPU). Note that you can use Tensorflow-GPU & CUDA/CUDNN in Tizen with x64+NVidia GPU. You just need to be careful about the NVidia GPU kernel driver version and the userspace driver version. Because NVidia's GPU userspace driver and CUDA/CUDNN are statically built, its Linux drivers are compatible with Tizen. (I've tested tensorflow-gpu, CUDA/CUDNN in Tizen with NVidia driver version 111... probably in winter, 2017)
If you want to use Tizen/Tensorflow-lite in the given hardware, forget CUDA.
Suppose we have an AMD GPU (for example, a Radeon HD 7970) and a minimal Linux system without X, etc.
What should be installed, and what should be launched and how, to get a proper OpenCL environment? Ideally it should be a headless environment.
Requirements for the environment:
GPU visible by OpenCL programs (clinfo for example)
It is possible to monitor temperature and set fan speed (for example using aticonfig).
P.S. Simply installing an X server and Catalyst and running X :0 won't work properly. See "X server with fglrx driver won't responce after exactly 49 accesses to X server".
UPD: When you use an AMD GPU on Linux, OpenCL applications don't see the GPU unless an X server is running.
I had a similar problem, asked a question, and succeeded in solving it myself.
For R9 290 cards and newer, I assume you have:
Built kernel 4.14 or later with amdgpu driver support. The option is in the Linux kernel config under Graphics Support.
All necessary firmware .bin blobs are included. An easy way to do this with buildroot is to edit the contents of buildroot/package/linux-firmware/* and manually add a BR2_PACKAGE_LINUX_FIRMWARE_AMDGPU option yourself, using BR2_PACKAGE_LINUX_FIRMWARE_RADEON as a template. Actually, we should post that update to their git.
When booting, you should see the appropriate dmesg messages about amdgpu initializing, for each adapter, and the screen mode should switch. If you still see large console text and no video mode switch occurred during init, then you have a problem in the kernel/firmware, and you should fix that first.
To answer the second question: controlling fan speeds and temperatures is done via the powerplay filesystem, e.g. /sys/class/drm/.., like this:
cd /sys/class/drm/card0/device/hwmon/hwmon0
echo 1 > pwm1_enable   # switch the fan to manual control
cat pwm1_max > pwm1    # run the fan at its maximum PWM value
You may dig a bit deeper and find the powertune parameters nearby, in the device folder.
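The same fan and temperature steps can be scripted. The Python sketch below mirrors the shell commands above; the function names are made up for this example. It assumes the standard hwmon file names (`pwm1_enable`, `pwm1_max`, `pwm1`, and `temp1_input` in millidegrees Celsius), and it looks the hwmon directory up instead of hard-coding hwmon0, since the number can differ between machines. Writing to these files needs root.

```python
from pathlib import Path


def find_hwmon_dirs(card="card0", sysfs_root="/sys"):
    """List the hwmon directories of a DRM card, e.g. .../card0/device/hwmon/hwmon0."""
    base = Path(sysfs_root, "class", "drm", card, "device", "hwmon")
    return sorted(base.glob("hwmon*"))


def read_int(path):
    """Read a sysfs-style file that holds a single integer."""
    return int(Path(path).read_text().strip())


def set_fan_to_max(hwmon_dir):
    """Switch the fan to manual control and set it to its maximum PWM value."""
    hwmon = Path(hwmon_dir)
    (hwmon / "pwm1_enable").write_text("1")   # 1 = manual fan control
    max_pwm = read_int(hwmon / "pwm1_max")
    (hwmon / "pwm1").write_text(str(max_pwm))
    return max_pwm


def read_temperature_celsius(hwmon_dir):
    """Return the first temperature sensor in degrees Celsius."""
    return read_int(Path(hwmon_dir) / "temp1_input") / 1000.0
```

A typical call would be `set_fan_to_max(find_hwmon_dirs()[0])`, followed by `read_temperature_celsius(...)` on the same directory to monitor the card.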
But instead of using /sys/class/drm/card0/device/pp_dpm_sclk, I highly recommend flashing those values directly into the card's BIOS, set with the required frequencies/voltages, as it is more reliable, stable and API-independent: you either init it, or not :)
P.S. Also, put away the 7970 and buy something a bit newer. I don't know whether it is still supported by the latest drivers; we don't have such an old card on hand right now. I tested the 290, 390, 480 and 580 card series (for the R9 270, the miner fails to build the CL code). For older cards it is better to use older software (<=16.40) and maybe a slightly older kernel (<=4.13).