Using VMWare Fusion to access GPUs - gpu

I am running VMware Fusion 8 Pro with Ubuntu 14.04 on a Mac Pro. The Mac Pro comes with dual AMD FirePro D500 GPUs. I installed the AMD APP SDK within Ubuntu, but it only sees the CPU as a device, not the GPUs. Can someone please guide me so that I can run OpenCL kernels on the GPU(s)?
Googling has revealed things like GPU passthrough, but there isn't enough detail on exactly how to access a GPU from within VMware Fusion.
Sincerely,
Vishal

Last time I checked it was necessary to have motherboard support to allow the virtual machines to access the GPUs.

Related

External GPU with Vulkan

According to this Vulkan tutorial, I can use vkEnumeratePhysicalDevices to get a list of available GPUs. However, I don't see my external NVIDIA GPU in there, only my Intel iGPU.
This eGPU is connected via Thunderbolt and is running CUDA code just fine. Is there anything I might have missed? Is it supposed to work out of the box?
My machine is running Arch Linux with up-to-date proprietary NVIDIA drivers.
The eGPU is an NVIDIA GTX 1050 (Lenovo Graphics Dock). Is it possible that it just does not support Vulkan somehow?
Vulkan support should work just as well with external GPUs (eGPUs). Seeing the eGPU enumerated as a Vulkan device may require the eGPU to be recognized by Xorg (or Wayland in the future).
See the recently created https://wiki.archlinux.org/title/External_GPU#Xorg for the changes probably required in the Xorg config.

100% GPU utilization on a GCE without any processes

I've just started an instance on Google Compute Engine with 2 GPUs (Nvidia Tesla K80). Straight after startup, I can see via nvidia-smi that one of them is already fully utilized.
I've checked a list of running processes and there is nothing running at all. Does it mean that Google has rented out that same GPU to someone else?
It's all running on this machine:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial
Enabling "persistence mode" with nvidia-smi -pm 1 might solve the problem.
ECC in combination with non-persistence mode can lead to 100% GPU utilization.
Alternatively, you can disable ECC with nvidia-smi -e 0.
Note: I'm not sure whether performance is actually worse. I remember being able to train an ML model despite the 100% GPU utilization, but I don't know if it was slower.
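To check whether a GPU matches the situation described above (busy, ECC on, persistence mode off), something like the following might help. This is only a sketch: the function names and the 90% threshold are my own choices, and it assumes nvidia-smi is on the PATH and supports the index, utilization.gpu, persistence_mode and ecc.mode.current query fields.

```python
import csv
import io
import subprocess

# Fields passed to `nvidia-smi --query-gpu=...`; the order matters for parsing.
QUERY_FIELDS = "index,utilization.gpu,persistence_mode,ecc.mode.current"


def parse_gpu_rows(csv_text):
    """Parse output of nvidia-smi --format=csv,noheader,nounits into dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        index, util, persistence, ecc = (f.strip() for f in fields)
        rows.append({
            "index": int(index),
            # Utilization can be reported as "[N/A]" on some setups.
            "utilization": int(util) if util.isdigit() else None,
            "persistence": persistence,
            "ecc": ecc,
        })
    return rows


def suspect_gpus(rows, busy_threshold=90):
    """Indices of GPUs that look busy while ECC is on and persistence is off."""
    return [
        r["index"] for r in rows
        if r["utilization"] is not None
        and r["utilization"] >= busy_threshold
        and r["persistence"] == "Disabled"
        and r["ecc"] == "Enabled"
    ]


def query_gpus():
    """Run nvidia-smi and return the parsed rows (hypothetical helper)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_rows(out)
```

On the instance, suspect_gpus(query_gpus()) would list the GPU indices worth trying nvidia-smi -pm 1 on. A GPU that still shows up after enabling persistence mode is probably busy for some other reason.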
I suggest reporting this on the Google Issue Tracker, since it needs investigation. Please provide your project number and instance name there. Follow this URL, which lets you file the issue as private in the Google Issue Tracker.

Server 2016 VM does not boot after hyper-v install

I am using VMware Workstation 14 with Windows Server 2016 installed on it. A few weeks ago in my server class, we had an in-class lab to set up a Nano Server. I successfully got that up and running, installed Hyper-V, and loaded the Nano Server into it. It started up and I signed in. Two days later, in class again, the VM just froze at the Windows logo with the circling white dots.
I have installed multiple VMs trying to get it up and running, always with the same result. I have virtualization enabled in the BIOS on my laptop.
I have a VM snapshot that I took just before installing Hyper-V. I installed the role, rebooted, and again it stopped at the Windows logo with the circling dots.
I don't know where to check if there is an issue, or if something is configured incorrectly. I am just looking for some help and ideas on what I should check. I also have the virtualization options within VMware enabled under the properties for the VM.
System details:
Asus GL502VMZ
Version 10.0.17134 Build 17134
Processor Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz, 2808 MHz, 4 Core(s), 8 Logical Processor(s)
Installed Physical Memory (RAM) 16.0 GB
NVIDIA GeForce GTX 1060 3GB
Have you tried uninstalling the antivirus? I was running into the same issue in my server class (using VMware to run Server 2016, in which we were nesting Hyper-V) until a classmate said he didn't have the AV (we were looking for differences to explain why his worked and mine didn't). I removed my AV (it was AVG) and now the server no longer stalls at the boot screen.
Just my 2 cents that may help someone in the future. I had the same issue on an HP Proliant ML310e that worked fine for the full install until enabling the Hyper-V role, then it would hang at the Windows icon screen with the swirling dots. I was able to get past that by disabling Intel VT-d in the bios. Not a particularly good solution, but it allowed the system to boot successfully and run normally.

Caffe and Tensorflow on a Dell 7559 with nvidia optimus technology

I bought a Dell 7559 laptop for deep learning. I got Ubuntu 16.04 installed on it, but I am having trouble getting Caffe and TensorFlow working on it. The laptop uses NVIDIA Optimus technology to switch between the GPU and the CPU to save battery. I checked the BIOS to see if I could set it to use only the GPU, but there is no option for it. Using Bumblebee or nvidia-prime didn't work either. I now have Ubuntu 16 with the MATE desktop environment, which prevents the black screen but didn't help with the CUDA issue. I was able to install the drivers and CUDA, but when I build Caffe and TensorFlow they fail, saying that no GPU was detected. I also wasn't able to install OpenGL. I tried several versions of the NVIDIA drivers, but it didn't help. Any help would be great. Thanks.
I think Bumblebee can enable you to run Caffe/TensorFlow in GPU mode. More generally, it also allows you to run other CUDA programs on a laptop with Optimus technology.
Once you have installed Bumblebee correctly (tutorial: Bumblebee Wiki for Ubuntu), you can invoke the Caffe binary by prepending optirun to the caffe command, like this:
optirun ../../caffe-master/build/tools/caffe train --solver=solver.prototxt
This works for the NVidia DIGITS server as well:
optirun ./digits-devserver
In addition, Bumblebee also works on my dual-graphics desktop PC (Intel HD 4600 + GTX 750 Ti). The display on my PC is driven by the Intel HD 4600 through the HDMI port on the motherboard. The NVidia GTX 750 Ti is only used for CUDA programs.
In fact, for my desktop PC, the "nvidia-prime" (it's actually invoked through the command line program prime-select) is used to choose the GPU that drives the desktop. I have the integrated GPU connect to the display with the HDMI port and the NVidia GPU through a DisplayPort. Currently, the DisplayPort is inactive. The display signal comes from the HDMI port.
As far as I understand, PRIME does so by modifying /etc/X11/xorg.conf to make either the Intel integrated GPU or the NVidia GPU the current display adapter available to X. I think the PRIME settings only make sense when both GPUs are connected to some display, which means there need not be an Optimus link between the two GPUs like in a laptop (or, for a laptop with a mux such as the Dell Precision M4600, Optimus is disabled in the BIOS).
More information about the Display Mux and Optimus may be found here: Using the NVIDIA Driver with Optimus Laptops
Hope this helps!

TensorFlow isn't using Nvidia

TensorFlow fails to use the NVIDIA card even though the NVIDIA driver, CUDA toolkit, and cuDNN are installed and configured.
One thing I suspect to be the reason is that the NVIDIA card on my laptop is attached to PCI as a 3D controller instead of VGA:
00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 07)
Subsystem: ASUSTeK Computer Inc. Skylake Integrated Graphics
Kernel driver in use: i915_bpo
01:00.0 3D controller: NVIDIA Corporation GK208M [GeForce 920M] (rev a1)
Subsystem: ASUSTeK Computer Inc. GK208M [GeForce 920M]
Kernel modules: nvidiafb, nouveau, nvidia_304
Even the Nvidia xserver settings don't see the GPU:
Is it true that TensorFlow can only use a graphics card that is listed as VGA?
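One thing the listing above does show: the Intel entry has a "Kernel driver in use" line, while the NVIDIA entry only lists candidate kernel modules, which suggests that no driver is actually bound to the card. A small sketch for checking this automatically is below. The function names are made up for illustration, and it assumes plain lspci -k output (without -nn, which adds class codes to the header lines).

```python
import re

# PCI classes that lspci uses for graphics devices.
GPU_CLASSES = {"VGA compatible controller", "3D controller", "Display controller"}

# First line of a device entry: "<slot> <class>: <description>".
HEADER_RE = re.compile(
    r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+([^:]+):\s+(.*)$"
)


def parse_lspci_gpus(text):
    """Collect graphics devices and their driver info from `lspci -k` text."""
    gpus = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            slot, pci_class, name = header.groups()
            current = None  # detail lines of non-GPU devices are ignored
            if pci_class in GPU_CLASSES:
                current = {"slot": slot, "class": pci_class, "name": name,
                           "driver": None, "modules": []}
                gpus.append(current)
        elif current is not None and line.startswith("Kernel driver in use:"):
            current["driver"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Kernel modules:"):
            current["modules"] = [
                m.strip() for m in line.split(":", 1)[1].split(",")
            ]
    return gpus


def unbound_gpus(gpus):
    """Graphics devices that have no kernel driver bound to them."""
    return [g for g in gpus if g["driver"] is None]
```

Feeding it the text printed by lspci -k and passing the result to unbound_gpus would return the 01:00.0 entry for the listing above. If the NVIDIA card shows up there, the problem is most likely at the driver level rather than in TensorFlow.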
After three months, I finally figured out what the issue was and resolved it. It turned out to be an NVIDIA issue with Secure Boot.
I feel obliged to thank jorgemf and Yao Zhang for their help at a time when I couldn't even articulate the problem well.
Meanwhile, I hope my case can help other people who have the same problem.
It all started with my attempt to install the NVIDIA driver again today. The installation seemed successful, but at the end it said:
Unable to load the “nvidia-drm” kernel module.
So I thought maybe I could manually load the kernel module with
modprobe nvidia-drm
but got an error saying something like "Required key not available". I wondered what that meant, so I googled a bit. It turned out that the module was not signed with a key the system trusts, so it had been blocked by Secure Boot!
I went back to the boot settings and disabled Secure Boot. I installed the NVIDIA driver again, and it was successful! Now in NVIDIA settings it looks like this:
See, the GPU device now shows up there.
Next I installed CUDA and cuDNN. I found this GitHub gist super useful: https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07
As a last step, I just followed the installation instructions on the TensorFlow home page. I tested it, and it did run on the GPU!
The take-home message is that if you fail to install the NVIDIA driver on a Linux system, you probably need to disable Secure Boot. In my personal opinion, Windows turned this good idea into a nightmare for Linux users!
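For anyone who wants to check the Secure Boot state before reinstalling the driver, here is a rough sketch. It assumes the mokutil tool is installed and that mokutil --sb-state prints a line starting with "SecureBoot enabled" or "SecureBoot disabled". The function names are mine.

```python
import subprocess


def parse_sb_state(output):
    """Return True/False for an enabled/disabled report, None if unknown."""
    for line in output.splitlines():
        text = line.strip().lower()
        if text.startswith("secureboot enabled"):
            return True
        if text.startswith("secureboot disabled"):
            return False
    return None


def secure_boot_enabled():
    """Ask mokutil for the Secure Boot state; None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["mokutil", "--sb-state"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None  # mokutil is not installed
    return parse_sb_state(result.stdout + result.stderr)
```

If secure_boot_enabled() returns True and modprobe refuses to load the NVIDIA module, Secure Boot is a likely culprit. None just means the state could not be read, for example on a machine that does not boot via UEFI.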