TensorFlow isn't using Nvidia

TensorFlow fails to use my Nvidia card even though the Nvidia driver, CUDA toolkit, and cuDNN are installed and configured.
One thing I suspect is that the Nvidia card in my laptop is attached to the PCI bus as a 3D controller instead of a VGA controller:
00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 07)
Subsystem: ASUSTeK Computer Inc. Skylake Integrated Graphics
Kernel driver in use: i915_bpo
01:00.0 3D controller: NVIDIA Corporation GK208M [GeForce 920M] (rev a1)
Subsystem: ASUSTeK Computer Inc. GK208M [GeForce 920M]
Kernel modules: nvidiafb, nouveau, nvidia_304
Even the Nvidia X Server Settings tool doesn't see the GPU.
Is it true that TensorFlow can only use a graphics card that is attached as a VGA controller?
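One detail in the listing above is worth checking: as pasted, the Intel device has a "Kernel driver in use" line, while the NVIDIA device only lists candidate "Kernel modules". A minimal sketch, assuming the text comes from `lspci -k` and uses short PCI addresses like `01:00.0`, that flags devices with no bound driver (the function name is hypothetical):

```python
import re

# Matches a device header line such as "01:00.0 3D controller: ..."
PCI_HEADER = re.compile(r"^[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f] ")


def devices_without_driver(lspci_text):
    """Return header lines of PCI devices that show no 'Kernel driver in use'."""
    devices = []
    current = None
    for line in lspci_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if PCI_HEADER.match(stripped):
            # Start a new device record.
            current = {"name": stripped, "driver": None}
            devices.append(current)
        elif current is not None and stripped.startswith("Kernel driver in use:"):
            current["driver"] = stripped.split(":", 1)[1].strip()
    return [d["name"] for d in devices if d["driver"] is None]


# Sample text copied from the question.
SAMPLE = """\
00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 07)
Subsystem: ASUSTeK Computer Inc. Skylake Integrated Graphics
Kernel driver in use: i915_bpo
01:00.0 3D controller: NVIDIA Corporation GK208M [GeForce 920M] (rev a1)
Subsystem: ASUSTeK Computer Inc. GK208M [GeForce 920M]
Kernel modules: nvidiafb, nouveau, nvidia_304
"""

for name in devices_without_driver(SAMPLE):
    print("No driver bound:", name)
```

A device that shows up here has no kernel driver loaded for it, which points at a driver loading problem rather than at the "3D controller" classification itself.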

After three months, I finally figured out what the issue was and resolved it. It turned out to be an Nvidia driver issue with Secure Boot.
I feel obliged to thank jorgemf and Yao Zhang for their help at a time when I couldn't even articulate the problem well.
Meanwhile, I hope my case can help other people who have the same problem.
It all started with my attempt to install the Nvidia driver again today. The installation seemed successful, but at the end it said:
Unable to load the “nvidia-drm” kernel module.
So I thought maybe I could manually load the kernel module with
modprobe nvidia-drm
but got an error saying something like "Required key not available". I wondered what that meant, so I googled a bit. It turned out to mean that the module was not signed with a key registered with the system. So the module had been blocked by Secure Boot!
I went back to the boot settings and disabled Secure Boot. I installed the Nvidia driver again, and this time it succeeded! The GPU device now shows up in the Nvidia settings.
Next I installed CUDA and cuDNN. I found this GitHub gist super useful: https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07
As the last step, I just followed the installation instructions on the TensorFlow home page. I tested it, and it did run on the GPU!
The take-home message: if you fail to install the Nvidia driver on a Linux system, you probably need to disable Secure Boot. In my personal opinion, Windows turned this good idea into a nightmare for Linux users!
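Before reinstalling a driver, it can save time to check whether Secure Boot is on at all. A minimal sketch, assuming the `mokutil` tool is installed and that `mokutil --sb-state` reports "SecureBoot enabled" or "SecureBoot disabled"; the helper names are hypothetical and not part of the original answer:

```python
import shutil
import subprocess


def parse_sb_state(output):
    """Interpret `mokutil --sb-state` output.

    Returns True if enabled, False if disabled, None if unrecognized.
    """
    text = output.lower()
    if "secureboot enabled" in text:
        return True
    if "secureboot disabled" in text:
        return False
    return None


def secure_boot_enabled():
    """Run mokutil if it is on PATH; return None when the state is unknown."""
    if shutil.which("mokutil") is None:
        return None  # Assumption: without mokutil we cannot tell.
    result = subprocess.run(
        ["mokutil", "--sb-state"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_sb_state(result.stdout + result.stderr)


if __name__ == "__main__":
    state = secure_boot_enabled()
    if state is True:
        print("Secure Boot is enabled: unsigned kernel modules may be rejected.")
    elif state is False:
        print("Secure Boot is disabled.")
    else:
        print("Could not determine the Secure Boot state.")
```

If this reports that Secure Boot is enabled and `modprobe` fails with a key error, the two options are to disable Secure Boot, as described above, or to sign the module with an enrolled key.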

Related

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

Good afternoon. I get the message "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running." even though I am subscribed to Colab Pro. My settings are correct. Please help; I don't know where else to ask.

Blender 2.9 Could not find a matching GPU name warning on Chromebook

I'm using an Asus Chromebook with only a CPU (I think).
This is what the error says:
Warning: Could not find a matching GPU name. Things may not behave as expected.
Detected OpenGL configuration:
Vendor: Red Hat
Renderer: virgl
/run/user/1000/gvfs/ non-existent directory
found bundled python: /home/sekhong5417/blender/2.90/python
This works on my friend's Chromebook, which has a GPU.
Also, I am kind of young, so I can't replace anything or buy a new device.
There are images at the bottom
If anyone still runs into this issue, there is an incompatibility between Blender and the Intel ChromeOS GPU drivers.
See https://developer.blender.org/T77651#1172666 for more details and an updated working build of v2.93.
Hopefully, the fix gets included in the next release.
I use an Acer Chromebook Spin 13 and just ran into the same issue as you. I think the Debian environment on the Chromebook may not have a driver that matches the Intel GPU. My Chromebook uses Intel HD Graphics 620. I tried many ways to install the driver, but they all failed. Linux works more easily with Nvidia GPUs, though. So my suggestion is to try to find an Intel driver that matches your graphics card and try again.

External GPU with Vulkan

According to this Vulkan tutorial, I can use vkEnumeratePhysicalDevices to get a list of available GPUs. However, I don't see my external NVIDIA GPU in there, only my Intel iGPU.
This eGPU is connected via Thunderbolt and is running CUDA code just fine. Is there anything I might have missed? Is it supposed to work out of the box?
My machine is running Arch Linux with up-to-date proprietary NVIDIA drivers.
The eGPU is an NVIDIA GTX 1050 (Lenovo Graphics Dock). Is it possible that it just does not support Vulkan somehow?
Vulkan support should work just as well with external GPUs (eGPUs). Seeing the eGPU enumerated as a Vulkan device may require the eGPU to be recognized by Xorg (or Wayland in the future).
See recently created https://wiki.archlinux.org/title/External_GPU#Xorg for changes probably required in Xorg config.

Caffe and Tensorflow on a Dell 7559 with nvidia optimus technology

I bought a Dell 7559 laptop for deep learning. I installed Ubuntu 16.04 on it, but I am having trouble getting Caffe and TensorFlow to work. The laptop uses Nvidia Optimus technology to switch between the GPU and the CPU to save battery. I checked the BIOS to see if I could set it to use only the GPU, but there is no option for it. Using Bumblebee or nvidia-prime didn't work either. I now have Ubuntu 16 with the MATE desktop environment; it prevents the black screen but didn't help with the CUDA issue. I was able to install the drivers and CUDA, but when I build Caffe and TensorFlow they fail, saying that no GPU was detected. I also wasn't able to install OpenGL. I tried several versions of the Nvidia drivers, but it didn't help. Any help would be great. Thanks.
I think Bumblebee can enable you to run Caffe/TensorFlow in GPU mode. More generally, it also allows you to run other CUDA programs on a laptop with Optimus technology.
Once you have installed Bumblebee correctly (tutorial: Bumblebee Wiki for Ubuntu), you can invoke the Caffe binary by prepending optirun to the command. So it goes like the following:
optirun ../../caffe-master/build/tools/caffe train --solver=solver.prototxt
This works for the NVidia DIGITS server as well:
optirun ./digits-devserver
In addition, Bumblebee also works on my dual-graphics desktop PC (Intel HD 4600 + GTX 750 Ti). The display on my PC is driven by the Intel HD 4600 through the HDMI port on the motherboard. The NVidia GTX 750 Ti is only used for CUDA programs.
In fact, on my desktop PC, "nvidia-prime" (it's actually invoked through the command line program prime-select) is used to choose the GPU that drives the desktop. I have the integrated GPU connected to the display through the HDMI port and the NVidia GPU through a DisplayPort. Currently, the DisplayPort is inactive. The display signal comes from the HDMI port.
As far as I understand, PRIME does so by modifying /etc/X11/Xorg.conf to make either the Intel integrated GPU or the NVidia GPU the current display adapter available to X. I think the PRIME settings only make sense when both GPUs are connected to some display, which means there need not be an Optimus link between the two GPUs like in a laptop (or, for a laptop with a Mux such as the Dell Precision M4600, Optimus is disabled in the BIOS).
More information about the Display Mux and Optimus may be found here: Using the NVIDIA Driver with Optimus Laptops
Hope this helps!
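For scripts that should run both on Optimus laptops and on machines without Bumblebee, the optirun prefix can be added conditionally. A minimal sketch, assuming optirun is found through PATH when Bumblebee is installed; the helper name `gpu_command` and its `which` parameter are hypothetical:

```python
import shutil


def gpu_command(cmd, launcher="optirun", which=shutil.which):
    """Prefix cmd with the Bumblebee launcher when it is found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    if which(launcher):
        return [launcher, *cmd]
    return list(cmd)


# The training command from the answer above, as an argument list.
train = ["../../caffe-master/build/tools/caffe", "train", "--solver=solver.prototxt"]
print(" ".join(gpu_command(train)))
```

The resulting list can be passed to `subprocess.run` directly; on a machine without optirun the command is returned unchanged.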

GKE - GPU nvidia - cuda drivers dont work

I have set up a Kubernetes node with an Nvidia Tesla K80 and followed this tutorial to try to run a PyTorch Docker image with working Nvidia and CUDA drivers.
I have managed to install the Nvidia daemonsets, and I can now see the following pods:
nvidia-driver-installer-gmvgt
nvidia-gpu-device-plugin-lmj84
The problem is that even when using the recommended image nvidia/cuda:10.0-runtime-ubuntu18.04, I still can't find the Nvidia drivers inside my pod:
root#pod-name-5f6f776c77-87qgq:/app# ls /usr/local/
bin cuda cuda-10.0 etc games include lib man sbin share src
But the tutorial mentions:
CUDA libraries and debug utilities are made available inside the container at /usr/local/nvidia/lib64 and /usr/local/nvidia/bin, respectively.
I have also tried to test whether CUDA was working through torch.cuda.is_available(), but I get False as the return value.
Many thanks in advance for your help.
OK, so I finally made the Nvidia drivers work.
It is mandatory to set a resource limit to access the Nvidia driver, which is weird considering that my pod was already on the right node with the Nvidia drivers installed.
This made the nvidia folder accessible, but I'm still unable to make the CUDA install work with PyTorch 1.3.0. [ issue here ]
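The resource limit mentioned above could look roughly like the following. This is a minimal sketch, assuming the GPU is exposed under the `nvidia.com/gpu` resource name by the device plugin; the pod name `cuda-test` and the helper `gpu_pod_manifest` are hypothetical, and the image is the one from the question:

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a pod manifest whose container requests GPUs via a resource limit."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Without this limit the driver files are not made
                    # available inside the container, as described above.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ]
        },
    }


manifest = gpu_pod_manifest("cuda-test", "nvidia/cuda:10.0-runtime-ubuntu18.04")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON as well as YAML manifests.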