virt-install hangs - GPU Passthrough for Virtual Machines

virt-install hangs - GPU Passthrough for Virtual Machines - virtual-machine

I want to run VMs that uses host's GPU. For that, I followed this docs to enable modules/grub configurations. Looks like I successfully configured, I can see dmesg | grep -i vfio. But when I run virt-install, it is hanging forever, parallely I can't run even virsh list --all. Every time I have to restart my laptop, in order to run any virsh/virt-install commands again.
veeru#ghost:~$ sudo su
[sudo] password for veeru:
root#ghost:/home/veeru# virt-install \
> --name vm0 \
> --ram 12028 \
> --disk path=/home/veeru/ubuntu14-HD.img,size=30 \
> --vcpus 2 \
> --os-type linux \
> --os-variant ubuntu16.04 \
> --network bridge=bridge:br0 \
> --graphics none \
> --console pty,target_type=serial \
> --location /home/veeru/Downloads/ubuntu-16.04.5.iso --force \
> --extra-args 'console=ttyS0,115200n8 serial' \
> --host-device 01:00.0 \
> --features kvm_hidden=on \
> --machine q35
Starting install...
Retrieving file .treeinfo... | 0 B 00:00:00
Retrieving file content... | 0 B 00:00:00
Retrieving file info... | 67 B 00:00:00
Retrieving file vmlinuz... | 6.8 MB 00:00:00
Retrieving file initrd.gz... | 14 MB 00:00:00
Below is the output when I do strace of process for above command
veeru#ghost:~$ sudo strace -p 9747
strace: Process 9747 attached
restart_syscall(<... resuming interrupted poll ...>
PS: My laptop is Predator Helios 300(UEFI-Secure Boot), GPU: Nvidia GeForce GTX1050Ti, Ubuntu Mate 18.04(Installed nvidia drivers), 8GB Ram,

Ok, I see the problem, the GPU is already being used by host(my laptop) i.e it is busy. So, when I run virt-install command, it hangs forever which is no wonder.
In order to resolve the issue, switch your X11 to use CPU. I use Ubuntu Mate 18.06 which has handy tool to switch like in below screenshot
Ater that logout and login and check nvidia GPU is not being used by any process by running nvidia-smi; it should similar output like below.
veeru#ghost:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Now you should able to run virt-install like me.

Related

Unable to install nvm on WSL due to network issues

I'm trying to install nvm using curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash on WSL2, but I'm getting different errors. Initially, the curl command would return the following:
> $ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:09 --:--:-- 0curl: (6) Could not resolve host: raw.githubusercontent.com
After running netsh int ip reset in Windows, which was suggested in another question, the same command started timing instead:
> $ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:04:59 --:--:-- 0
curl: (28) Connection timed out after 300000 milliseconds
I've also tried manually saving the install.sh to my machine and running it locally (after setting its permissions with chmod +x install.sh), but that returns a similar error:
> $ ./install.sh
=> Downloading nvm from git to '/home/mparra/.nvm'
=> Cloning into '/home/mparra/.nvm'...
fatal: unable to access 'https://github.com/nvm-sh/nvm.git/': Failed to connect to github.com port 443: Connection timed out
Failed to clone nvm repo. Please report this!
I can successfully ping github.com. ping -c 100 github.com returns the following:
--- github.com ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 99156ms
rtt min/avg/max/mdev = 15.280/20.739/85.205/9.141 ms
This issue suggests that a Windows update resolved the issue, but that's not an option for me since it's a work machine and I can't update beyond build 18363.2039. I've also checked that my VPN is not enabled and I set my DNS to 8.8.8.8 and 8.8.4.4, which had no effect.

Please try the following in your WSL
sudo rm /etc/resolv.conf
sudo bash -c 'echo "nameserver 8.8.8.8" > /etc/resolv.conf'
sudo bash -c 'echo "[network]" > /etc/wsl.conf'
sudo bash -c 'echo "generateResolvConf = false" >> /etc/wsl.conf'
sudo chattr +i /etc/resolv.conf
I can install with curl.

I have a feeling you are probably correct about this being the same issue mentioned on Github that was resolved in a Windows update.
If that's truly the case, you are probably going to continue to run into issues even after getting nvm installed. For instance, nvm probably will have trouble downloading Node releases.
The easiest solution that I can propose, if it works for you, is to simply convert to WSL1 instead of WSL2. WSL1 will handle most (but not all) Node use-cases just as well as WSL2. And WSL1 handles networking very differently than WSL2. If the Windows networking stack is working fine for you, then WSL1's should as well.
As noted in that Github issue, this seemed to be a problem that occurred only in Hyper-V instances. WSL2 runs in Hyper-V, but WSL1 does not.
If you go this route, you can either:
create a copy of your existing WSL2 distribution and convert that copy to WSL1. From PowerShell:
wsl --shutdown
wsl -l -v # Confirm <distroname>
wsl --export <distroname> path\to\backup.tar
mkdir .\path\for\new\instance
wsl --import WSL1 .\path\for\new\instance path\to\backup.tar --version 1 # WSL1 can be whatever name you choose
wsl -d WSL1
Note that you'll be root, by default. To change the default user, follow this answer.
Or, just convert the WSL2 instance to WSL1:
wsl --shutdown
wsl -l -v # Confirm <distroname>
wsl --export <distroname> path\to\backup.tar # Just in case
wsl --set-version <distroname> 1
If WSL1 doesn't work for you (at least in the short term until your company pushes that update), then there may be another option similar to the one mentioned in this comment on that Github issue. Let me know if you need to go that route, and I'll see if I can simply that a bit.

virt-install blocked in domain connection

I tried to install Centos 7 guest vm using the virt-install command with the local ks file but I am stuck in the "Connected to the domain" state
Can you help me please
this is the output from terminal:
[klaude#localhost ~]$ LANG=C sudo virt-install --connect qemu:///system \
> --name centos7-ks \
> --ram 2048 \
> --disk path=/var/lib/libvirt/images/centos7-ks.qcow2,size=10 \
> --vcpus 2 \
> --os-type linux \
> --os-variant centos7.0 \
> --network bridge=virbr0 \
> --graphics none \
> --location /var/lib/libvirt/images/CentOS-7-x86_64-DVD-1804.iso \
> --initrd-inject=/home/klaude/Documents/ks1.cfg
[sudo] password for klaude:
Sorry, try again.
[sudo] password for klaude:
WARNING Did not find 'console=ttyS0' in --extra-args, which is likely required to see text install output from the guest.
Starting install...
Retrieving file .treeinfo... | 354 B 00:00:00
Retrieving file vmlinuz... | 5.9 MB 00:00:00
Retrieving file initrd.img... | 50 MB 00:00:00
Allocating 'centos7-ks.qcow2' | 10 GB 00:00:00
Connected to domain centos7-ks
Escape character is ^]

Cannot run Mesos Containers with GPU tasks

I am running Mesos on Ubuntu and am trying to execute:
mesos-execute \
--master=$(cat /etc/mesos/zk) \
--name=gpu-test \
--docker_image=nvidia/cuda \
--command="nvidia-smi" \
--framework_capabilities="GPU_RESOURCES" \
--resources="gpus:1"
and it is failing because: sh: 1: nvidia-smi: not found
even though when I run it without container support
mesos-execute \
--master=$(cat /etc/mesos/zk) \
--name=gpu-test \
--command="nvidia-smi" \
--framework_capabilities="GPU_RESOURCES" \
--resources="gpus:1"
it has access to the gpu
plus if I run it without container support but put the command as
nvidia-docker run -it nvidia/cuda nvidia-smi
it works, so it seems that the mesos containerizer doesnt have access to the GPUs. But in the /etc/mesos-slave/ directory I gave it containerizers mesos (and all the other required flags to run gpu commands). Plus non-gpu related commands are working fine.

This looks like a regression in 1.3.0. I downgraded to 1.2.1 on Ubuntu and can successfully use GPUs with Docker containers and the Mesos containerizer again.
sudo apt-get install mesos=1.2.1-2.0.1
It looks like someone filed a related bug but there's been no activity:
https://issues.apache.org/jira/browse/MESOS-7730

How to enable VNC in VM in KVM when the KVM server doesn't have a GUI?

Can someone tell me how to enable the VNC in VM which is being created using virt-install on KVM hypervisor?
My server doesn't have a GUI so I used to run the following command to spin up a VM:
virt-install \
--name centos6 \
--ram 1024 \
--disk path=/var/lib/libvirt/images/centos6.img,bus=virtio,size=30 \
--vcpus 1 \
--os-type linux \
--os-variant rhel6 \
--network bridge=br0 \
--graphics none \
--location 'http://mirror.i3d.net/pub/centos/6/os/x86_64/' \
--extra-args 'console=ttyS0,115200n8 serial'
Now I want to install GUI on the VM(centos6) and install VNC, can someone tell me how to achieve that?
Thanks.

Found answer on this link though it is implemented on ubuntu but same can be replicated for other distribution : https://www.howtoforge.com/tutorial/ubuntu-gnome-vnc-headless-server

regenerating certificates hangs on windows 7

I'm a total docker newbie and tried to get it working on my windows 7 64-bit machine.
The installation went okay, but the "Docker Quickstart Terminal" will not start up as expected. It seems to hang when trying to create the SSH key:
(default) Downloading https://github.com/boot2docker/boot2docker/releases/download/v
(default) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
(default) Creating VirtualBox VM...
(default) Creating SSH key...
Error creating machine: Error in driver during machine creation: exit status 1
Looks like something went wrong... Press any key to continue...
so I tried to regenerate the certificates in a cmd window and also this does not work:
>docker-machine regenerate-certs default
Regenerate TLS machine certs? Warning: this is irreversible. (y/n): y
Regenerating TLS certificates
Detecting the provisioner...
OS type not recognized
I've tried to deactivate my virus scanner and execute the cmd windows as admin without success.
Any ideas what to check? Are there any interesting logfiles?
here's the docker version output:
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.3
Git commit: a34a1d5
Built: Fri Nov 20 17:56:04 UTC 2015
OS/Arch: windows/amd64
An error occurred trying to connect: Get http://localhost:2375/v1.21/version: dial tcp 127.0.0.1:2375:
ConnectEx tcp: No connection could be made because the target machine actively refused it.

If you don't have hyper-v activated (that is more a Windows 10 issue), and if your BIOS VT-X/AMD-v is enabled, then something else went wrong.
If docker-machine ls still lists the default machine, delete it: docker-machine rm default.
If you had (previous to your docker-toolbox installation) a VirtualBox already installed, try and:
uninstall completely VirtualBox
in C:\Windows\system32\drivers\, find and delete these five files (there may be less left, that is ok, delete them anyway):
vboxdrv.sys,
vboxnetadp.sys,
vboxnetflt.sys,
vboxusbmon.sys,
vboxusb.sys.
in regedit, key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\, delete these 5 folders (there may be less left, that is ok, delete them anyway):
VBoxDrv,
VBoxNetAdp,
VBoxNetFlt,
VBoxUSBMon,
VBoxUSB.
Then reinstall the latest VirtualBox.
Make sure:
you have the latest docker-machine copied somewhere in your PATH (the 0.5.3 has been released 22 hours ago: releases/download/v0.5.3/docker-machine_windows-amd64.exe).
%HOME% is defined (typically to %HOMEDRIVE%%HOMEPATH%)
From there, try manually to recreate the default machine like the quick-start script did:
docker-machine create -d virtualbox --virtualbox-memory 2048 --virtualbox-disk-size 204800 default
eval $($DOCKER_MACHINE env my_new_container --shell=bash)
docker-machine ssh my_new_container

I've now tried to create a Linux VM directly in VirtualBox and start it from there: also gets some time-out. So I think it's not related to docker.
I've found a VirtualBox bug-report that says, that this can happen when you have Avira installed.
Here's a discussion about the issue on the Avira forum - unfortunatly mostly in German.
One paragraph indicates that it may help to deactivate "Advanced process protection":
Configuration -> General -> Security and disable the option "Advanced
process protection". Click "Apply" and restart the device. You should
be able to run your VM in VirtualBox after that.
In my case this does not help, so I'll need to wait for a fix or completely uninstall Avira.

(defualt) DBG | Getting to WaitForSSH function...
(defualt) DBG | Using SSH client type: external
(defualt) DBG | &{[-F /dev/null -o PasswordAuthentication=no -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=quiet -o ConnectionAttempts=3 -o ConnectTimeout=10 -o ControlMaster=no -o ControlPath=none docker#127.0.0.1 -o IdentitiesOnly=yes -i C:\Users\Ming.docker\machine\machines\defualt\id_rsa -p 58549] C:\Program Files\OpenSSH\bin\ssh.exe }
(defualt) DBG | About to run SSH command:
(defualt) DBG | exit 0
(defualt) DBG | SSH cmd err, output: exit status 255:
(defualt) DBG | Error getting ssh command 'exit 0' : Something went wrong running an SSH command!
(defualt) DBG | command : exit 0
(defualt) DBG | err : exit status 255
(defualt) DBG | output :

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas