WSL2- $nvidia-smi command not running - tensorflow

I have Ubuntu 18.04 LTS installed inside WSL2, and I was able to use the GPU. I can run
nvidia-smi from the Windows terminal.
However, I get no result when I run nvidia-smi inside WSL2.

The fix is now documented in the NVIDIA docs:
cp /usr/lib/wsl/lib/nvidia-smi /usr/bin/nvidia-smi
chmod ogu+x /usr/bin/nvidia-smi
Source: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations
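Before copying the binary, it may help to check whether nvidia-smi is already reachable. The following is only a sketch: the helper name find_nvidia_smi is made up for illustration, and the WSL path is the one quoted above, which may differ on other setups.

```python
import os
import shutil

# Location quoted in the answer above; an assumption on other setups.
WSL_NVIDIA_SMI = "/usr/lib/wsl/lib/nvidia-smi"


def find_nvidia_smi(extra_candidates=(WSL_NVIDIA_SMI,)):
    """Return a path to an executable nvidia-smi, or None if none is found."""
    # First look on PATH, the same way the shell would.
    on_path = shutil.which("nvidia-smi")
    if on_path:
        return on_path
    # Fall back to known locations that may not be on PATH.
    for candidate in extra_candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If find_nvidia_smi() returns the /usr/lib/wsl/lib path, the binary exists but is not on PATH, which is the situation the cp command above works around.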

From the known limitations in NVIDIA's documentation:
NVIDIA Management Library (NVML) APIs are not supported. Consequently,
nvidia-smi may not be functional in WSL 2.
However you should be able to run https://docs.nvidia.com/cuda/wsl-user-guide/index.html#unique_1238660826
EDIT: since this answer was written, nvidia-smi has been supported as of driver 465.42.
It runs fine for me with 470.57.02.
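To compare an installed driver against the version mentioned in the EDIT, a small helper can do the comparison numerically rather than as strings. This is a rough sketch; the function names are hypothetical and the threshold is simply the 465.42 figure from above.

```python
def parse_driver_version(version):
    """Turn a driver string such as '470.57.02' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


# Threshold taken from the EDIT above.
MIN_SMI_DRIVER = parse_driver_version("465.42")


def smi_expected_to_work(version):
    """True if the driver is at least the version cited above."""
    return parse_driver_version(version) >= MIN_SMI_DRIVER
```

Comparing tuples of integers avoids the pitfall of string comparison, where "465.5" would sort after "465.42".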

When I was installing CUDA 11.7.1 in WSL2, a similar error came up:
Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system
I fixed it by updating the Windows system to version 19044 21H2.

Related

Stopping and starting a deep learning google cloud VM instance causes tensorflow to stop recognizing GPU

I am using the pre-built deep learning VM instances offered by google cloud, with an Nvidia tesla K80 GPU attached. I choose to have Tensorflow 2.5 and CUDA 11.0 automatically installed. When I start the instance, everything works great - I can run:
import tensorflow as tf
tf.config.list_physical_devices()
And my function returns the CPU, accelerated CPU, and the GPU. Similarly, if I run tf.test.is_gpu_available(), the function returns True.
However, if I log out, stop the instance, and then restart the instance, running the same exact code only sees the CPU and tf.test.is_gpu_available() results in False. I get an error that looks like the driver initialization is failing:
E tensorflow/stream_executor/cuda/cuda_driver.cc:355] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
Running nvidia-smi shows that the computer still sees the GPU, but my tensorflow can’t see it.
Does anyone know what could be causing this? I don’t want to have to reinstall everything when I’m restarting the instance.
Some people (sadly not me) are able to resolve this by setting the following at the beginning of their script/main:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
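Note that CUDA_VISIBLE_DEVICES only has an effect if it is set before TensorFlow initializes CUDA, so the safest place is before the import. A minimal sketch of that ordering, assuming TensorFlow 2.x is installed; the helper names are made up for illustration:

```python
import os


def restrict_visible_gpus(device_ids="0"):
    """Set CUDA_VISIBLE_DEVICES; call this before TensorFlow is imported."""
    os.environ["CUDA_VISIBLE_DEVICES"] = device_ids
    return os.environ["CUDA_VISIBLE_DEVICES"]


def list_gpus():
    """Import TensorFlow lazily so the variable above is already in place."""
    import tensorflow as tf  # assumes TensorFlow 2.x is installed
    return tf.config.list_physical_devices("GPU")
```

Calling restrict_visible_gpus("0") and then list_gpus() should show at most one GPU; an empty list means TensorFlow still cannot see the device.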
I had to reinstall the CUDA drivers, and from then on it worked even after restarting the instance. You can enter your system configuration on NVIDIA's website and it will give you the commands needed to install CUDA. The installer also asks whether you want to uninstall the previous CUDA version (yes!). Luckily, this is also very fast.
I fixed the same issue with the commands below, taken from https://issuetracker.google.com/issues/191612865?pli=1
gsutil cp gs://dl-platform-public-nvidia/b191551132/restart_patch.sh /tmp/restart_patch.sh
chmod +x /tmp/restart_patch.sh
sudo /tmp/restart_patch.sh
sudo service jupyter restart
Option-1:
Upgrade the Notebooks instance's environment. Refer to the link to upgrade.
Notebooks instances that can be upgraded are dual-disk, with one boot disk and one data disk. The upgrade process upgrades the boot disk to a new image while preserving your data on the data disk.
Option-2:
Connect to the notebook VM via SSH and run the commands from the link.
After the commands have run, the CUDA version will be updated to 11.3 and the NVIDIA driver version to 465.19.01.
Restart the notebook VM.
Note: the issue has been fixed in the GPU images. New notebooks are created with image version M74. The public Google issue tracker has not yet been updated with the new image version, but you can find image version M74 in the console.

Why does my computer not detect the GPU and use the CPU instead?

I have a GeForce 1080 Ti GPU, and I installed Visual Studio 2017 Enterprise, 430.64-desktop-win10-64bit-international-whql, cuda_10.0.130_411.31_win10, cudnn-9.0-windows10-x64-v7.4.2.24 and Anaconda3-5.2.0-Windows-x86_64, in that order. After that, I created a virtual environment from the Anaconda command prompt and installed TensorFlow-GPU using this command: pip install --ignore-installed --upgrade tensorFlow-gpu==1.9, but my system uses the CPU instead of the GPU. It used the GPU once at first, and then went back to the CPU while training my network. What is the problem, and how can I force my system to use the GPU? Please help me. Thank you.
According to https://www.tensorflow.org/install/source#tested_source_configurations
tensorflow_gpu-1.9.0 is only listed for CUDA 9.0, whereas you installed CUDA 10.0, so that might be the issue. I suggest trying tensorflow_gpu-1.13.1, which is listed for CUDA 10.0.
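The mismatch can be checked mechanically. The sketch below hard-codes only the two releases discussed here; the table and the function name are illustrative, and other rows would have to be copied from the tested configurations page linked above.

```python
# Only the two releases discussed above; extend from the
# "tested build configurations" page as needed.
TESTED_CUDA = {
    "1.9.0": "9",
    "1.13.1": "10.0",
}


def cuda_matches(tf_version, installed_cuda):
    """True if installed_cuda agrees with the CUDA version listed for tf_version."""
    tested = TESTED_CUDA.get(tf_version)
    if tested is None:
        raise KeyError("no entry for tensorflow_gpu-" + tf_version)
    tested_parts = tested.split(".")
    # Compare only as many components as the table specifies.
    installed_parts = installed_cuda.split(".")[:len(tested_parts)]
    return installed_parts == tested_parts
```

With the setup from the question, cuda_matches("1.9.0", "10.0.130") is False, while cuda_matches("1.13.1", "10.0.130") is True.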

How to use a remote machine's GPU in jupyter notebook

I am trying to run tensorflow on a remote machine's GPU through Jupyter notebook. However, if I print the available devices using tf, I only get CPUs. I have never used a GPU before and am relatively new at using conda / jupyter notebook remotely as well, so I am not sure how to set up using the GPU in jupyter notebook.
I am using an environment set up by someone else who already executed the same code on the same GPU, but they did it via python script, not in a jupyter notebook.
This is the only code in the other person's file that had to do with the GPU:
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
set_session(tf.Session(config=config))
I think the problem was that I had tensorflow in my environment instead of tensorflow-gpu. But now I get the message "cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version", and I don't know how to update the driver through the terminal.
How is your environment set up? Specifically, what is your remote environment, and what is your local environment? Sounds like your CUDA drivers are out of date, but it could be more than just that. If you are just getting started, I would recommend finding an environment that requires little to no configuration work on your part, so you can get started more easily/quickly.
For example, you can run GPUs on the cloud, and connect to them via a local terminal. You can also have your "local" frontend be Colab by connecting it to a local runtime. (This video explains that particular setup, but there are lots of other options.)
You may also want to try running nvidia-smi on the remote machine to see if the GPUs are visible.
Here is another solution, that describes how to set up a GPU-Jupyterlab instance with Docker.
To update your drivers via terminal, run:
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot
Are your CUDA paths set appropriately? Like this?
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
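The ${VAR:+:${VAR}} idiom adds the ":" separator only when the variable is already non-empty, so no empty entry ends up in the search path. The same logic as a rough Python sketch (function names are hypothetical; /usr/local/cuda is the path assumed in the exports above):

```python
import os


def prepend_path(new_dir, existing):
    """Mimic ${VAR:+:${VAR}}: add the separator only if existing is non-empty."""
    return new_dir + (os.pathsep + existing if existing else "")


def has_dir(path_value, wanted):
    """True if wanted is one of the entries in a PATH-style string."""
    return wanted in path_value.split(os.pathsep)
```

For example, has_dir(os.environ.get("PATH", ""), "/usr/local/cuda/bin") tells you whether the first export took effect in the current process.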

Does tensorflow support Python 3.6.4 on Windows?

I'm running a Windows computer with just a CPU (no GPU). When I run pip install tensorflow -vvv in order to see what pip is doing, it lists a lot of links, but for all of them, it says "Skipping link ... it is not compatible with this Python."
Does tensorflow support Python 3.6.4 on Windows? If so, what binary URL should I use to install it?
(I previously installed with this version due to reading this, but ran into this error without the DLL load failed message, so I'm wondering if there's a better version I should use.)
Also, I'm aware that Tensorflow says they support Python 3.x, but right now it hasn't been working for me.
You have probably installed 32-bit Python; you need the 64-bit version.
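To confirm which build you have, you can ask the interpreter for its pointer size. A minimal sketch using only the standard library; python_bits is just an illustrative name:

```python
import struct
import sys


def python_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(sys.version)
    print(python_bits(), "bit")
```

If this reports 32, pip will skip the wheels as incompatible, which matches the "Skipping link" messages in the question.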

Ubuntu 16.04 screen completely freezes only mouse moves

Ever since I upgraded my laptop, my screen frequently freezes, mostly in the Chrome or Firefox browser. I am pretty sure this is an Nvidia driver problem, but I can't seem to find the solution. I am running an Nvidia Quadro K2100M.
I am currently running Nvidia 361.42. I have tried using the open-source Xorg server without any luck.
The only solution I have found so far is forcefully turning off the computer by holding down the power button.
Things that I have tried:
I got keyboard input
I cannot switch to another terminal to restart lightdm
I ran into this problem occasionally, which was really annoying.
As many blogs explain, this may be caused by a graphics driver problem. My desktop has an NVIDIA video card; you can run lspci | grep VGA to see which video card you have. In my case, it returned:
02:00.0 VGA compatible controller: NVIDIA Corporation G98 [GeForce 8400 GS Rev. 2] (rev a1)
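If you want to pick the card out of the lspci output programmatically, something like the following would work. It is only a sketch: the regular expression assumes the "VGA compatible controller" wording shown above, and vga_devices is a made-up helper name.

```python
import re

# Matches lines such as the one shown above.
LSPCI_VGA = re.compile(
    r"^(?P<slot>\S+) VGA compatible controller: (?P<device>.+)$"
)


def vga_devices(lspci_output):
    """Return (slot, device description) for each VGA line in lspci output."""
    found = []
    for line in lspci_output.splitlines():
        match = LSPCI_VGA.match(line.strip())
        if match:
            found.append((match.group("slot"), match.group("device")))
    return found
```

The output of lspci can be captured with subprocess.run(["lspci"], capture_output=True, text=True).stdout and passed straight to vga_devices.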
I followed the instructions on jiakai zhang's blog to reinstall the proper drivers for the desktop; I hope this helps you.
The key steps in [1] are to reinstall the Ubuntu desktop and the NVIDIA driver:
$ sudo su
$ apt-get update
$ apt-get install --reinstall ubuntu-desktop
$ apt-get install unity
$ apt-get remove --purge 'nvidia*'
$ reboot
$ sudo apt-get install nvidia-current
$ sudo reboot
Updating the grub settings worked for me! Do the following:
1. Open the GRUB configuration
sudo vi /etc/default/grub
2. Change the value of GRUB_CMDLINE_LINUX_DEFAULT from "quiet splash" to
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=1"
and save the file.
3. Update & Reboot
sudo update-grub
sudo reboot
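If you would rather script step 2 than edit the file by hand, the change amounts to a one-line text substitution. A hedged sketch: add_kernel_param is a hypothetical helper that works on the file contents as a string and leaves the reading and writing of /etc/default/grub (which needs root) to you.

```python
import re


def add_kernel_param(grub_text, param="intel_idle.max_cstate=1"):
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there."""

    def patch(match):
        options = match.group(2).split()
        if param not in options:
            options.append(param)
        return match.group(1) + '"' + " ".join(options) + '"'

    # Only the GRUB_CMDLINE_LINUX_DEFAULT line is rewritten.
    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"',
        patch,
        grub_text,
        flags=re.MULTILINE,
    )
```

Running the function twice gives the same result as running it once, so it will not duplicate the parameter. You still need sudo update-grub and a reboot afterwards.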
More info:
This is a bug in the processor, known as the c-state bug. It causes total freezes when the CPU tries to enter an unsupported sleep state. It's a problem for many Bay Trail devices especially with newer (4.*) kernels. There is a simple workaround until it gets properly fixed upstream. You just need to pass a kernel boot parameter and the random freezing stops completely. The parameter may increase battery consumption slightly, but it will give you a usable system. You do this by editing the configuration file for GRUB as described above.
GRUB - boot loader package from the GNU Project, which lets the user choose which of several installed operating systems to boot, or select a specific kernel configuration available on a particular operating system's partitions;
Intel Bay Trail - new Atom processors from Intel. Atom is Intel's family of x86 and x86-64 processors that are optimized for small computing devices, such as smartphones and mobile Internet devices;
C-States - used to reduce power consumption in idle mode (i.e. when no code is executed) - (C0 to C8)
Reference: here.
I have since fixed this problem by reinstalling Ubuntu 16.04 and not switching away from the nouveau video driver. I also disabled updates, and everything has been working well for about 2 months now.
Gaming is pretty good, but I usually play Steam games, so it doesn't push any hardcore graphics.
Well, I had the same problem: My PC was freezing randomly. I tried Ubuntu 16, 17 and 18.04 and everything was the same. I tried several drivers and didn't get a solution. I tried several solutions that I found in the forums (including this) and got bad and harmful results.
My solution was: I stopped using the graphical nvidia card, removed it and now I'm using the integrated Intel HD graphics card (Intel® HD Graphics 530 card (Skylake GT2)) and all the problems were solved!
I fixed mine using a few commands from @Qoros's solution above. I just ran apt-get update, apt-get install nvidia-current, and sudo reboot. Cheers to @Qoros, by the way!
For me, none of the approaches described in the other answers worked.
I was opening multiple terminal tabs running some heavy processes, and Ubuntu used to freeze when I had 6-7 tabs open. I monitored the resources used while starting my processes in the terminal tabs. You can do this by opening the System Monitor app and going to the Resources tab.
What I noticed is that when my RAM (8 GB) and my swap space (1 GB) were completely used up, Ubuntu would freeze.
As a solution, I increased my swap space to 16 GB. After this, memory never gets used up completely and Ubuntu doesn't freeze.
https://askubuntu.com/questions/178712/how-to-increase-swap-space describes how to increase swap space.
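The same check can be done without System Monitor by reading /proc/meminfo. A rough sketch: the function names and the 100 MB threshold are arbitrary choices for illustration, and MemAvailable is assumed to be present (it is on reasonably recent kernels).

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their numeric values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_exhausted(info, threshold_kb=100 * 1024):
    """True when both available RAM and free swap fall below threshold_kb."""
    return (info.get("MemAvailable", 0) < threshold_kb
            and info.get("SwapFree", 0) < threshold_kb)
```

Usage would be along the lines of memory_exhausted(parse_meminfo(open("/proc/meminfo").read())), polled while the heavy processes are starting.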