System Virtualization : Understanding IO virtualization and role of hypervisor [closed] - system

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I would like to obtain a correct understanding of I/O virtualization. The context is pure/full virtualization and not para-virtualization.
My understanding is that a hypervisor virtualizes hardware and offers virtual resources to each sandboxed application. Each sandbox thinks its accessing the underlying hardware, but in reality it is not. Instead it is the hypervisor which does all the accesses. It is this aspect I need to understand better.
Let assume a chip has a hardware timer meant to be used by OS kernel as a tick timer. Lets assume that there are 2 virtual machines (E.g Windows and Linux) running atop the hypervisor.
None of the virtual machines have modified their source code. So they continue to spit out instructions that directly program the timer resource.
What is the role of the hypervisor really here? How are the two OSes really prevented from accessing the real stuff?

After a bit of reading, I have reached a certain level of understanding described at:
https://stackoverflow.com/a/13045437/1163200
I reproduce it wholly here:
This is an attempt to answer my own question.
System Virtualization : Understanding IO virtualization and role of hypervisor
Virtualization
Virtualization as a concept enables multiple/diverse applications to co-exist on the same underlying hardware without being aware of each other.
As an example, full blown operating systems such as Windows, Linux, Symbian etc along with their applications can coexist on the same platform. All computing resources are virtualized.
What this means is none of the aforesaid machines have access to physical resources. The only entity having access to physical resources is a program known as Virtual Machine Monitor (aka Hypervisor).
Now this is important. Please read and re-read carefully.
The hypervisor provides a virtualized environment to each of the machines above. Since these machines access NOT the physical hardware BUT virtualized hardware, they are known as Virtual Machines.
As an example, the Windows kernel may want to start a physical timer (System Resource). Assume that ther timer is memory mapped IO. The Windows kernel issues a series of Load/Store instructions on the Timer addresses. In a Non-Virtualized environment, these Load/Store would have resulted in programming of the timer hardware.
However in a virtualized environment, these Load/Store based accesses of physical resources will result in a trap/Fault. The trap is handled by the hypervisor. The Hypervisor knows that windows tried to program timer. The hypervisor maintains Timer data structures for each of the virtual machines. In this case, the hypervisor updates the timer data structure which it has created for Windows. It then programs the real timer. Any interrupt generated by the timer is handled by the hypervisor first. Data structures of virtual machines are updated and the latter's interrupt service routines are called.
To cut a long story short, Windows did everything that it would have done in a Non-Virtualized environment. In this case, its actions resulted in NOT the real system resource being updated, but virtual resources (The data structures above) getting updated.
Thus all virtual machines think they are accessing the underlying hardware; In reality unknown to them, all accesses to physical hardware is mediated through by the hypervisor.
Everything described above is full/classic virtualization. Most modern CPUs are unfit for classic virtualization. The trap/fault does not apply to all instructions. So the hypervisor is easily bypassed on modern devices.
Here is where para-virtualization comes into being. The sensitive instructions in the source code of virtual machines are replaced by a call to Hypervisor. The load/store snippet above may be replaced by a call such as
Hypervisor_Service(Timer Start, Windows, 10ms);
EMULATION
Emulation is a topic related to virtualization. Imagine a scenario where a program originally compiled for ARM is made to run on ATMEL CPU. The ATMEL CPU runs an Emulator program which interprets each ARM instruction and emulates necessary actions on ATMEL platform. Thus the Emulator provides a virtualized environment.
In this case, virtualization of system resources is NOT performed via trap and execute model.

Related

make virtual machine appear a real machine to applications

I am using VmWare Workstation 14 and when I install an operating system (any of them) some programs and apps are able to identify that I am using a virtual machine.
I have seen the vm is using virtualized devices that are really named virtual. like for example VmWare Network Card or etc. Is there any way to install fake real like hardware drivers on these virtual machines? Can this simple change make the app see this vm as a real machine?
How to make this virtual machine appear as a real machine to applications?
Is there really any way?
This was asked as a yes-or-no question so my answer is:
Yes... probably. But it's a lot of work.
There's a 2006 presentation by Tom Liston and Ed Skoudis that talks about this: https://handlers.sans.org/tliston/ThwartingVMDetection_Liston_Skoudis.pdf
It focuses on VMware, but some of it would also apply to other types of Virtual Machine Environments (VMEs).
In summary, they identify as many things as they can find that would allow VM detection, which would each have to be addressed, and they also mention some VMware-specific mitigations for them.
VME artifacts in processes, file system, and/or Windows registry. These would include the VMtools service and "over 50 different references in the file system to 'VMware' and vmx" and "over 300 references in the Registry to 'VMware'", all of which would have to be deleted or changed.
VME artifacts in memory. Specific regions of memory tend to be different in guests (VMs) than hosts, namely the Interrupt Descriptor Table (IDT), Global
Descriptor Table (GDT), and Local Descriptor Table (LDT). The method by which the VM is built may allow these to appear the same in guests as they do in hosts.
VME-specific virtual hardware. This would include the drivers you mention like VmWare Network Card. The drivers would have to be removed or replaced with drivers that do not match the names or code signatures of any virtual drivers. Probably easiest to do on an open-source system, simply by modifying the driver source code and build.
VME-specific processor instructions and capabilities. Some VMEs add non-standard machine language instructions, or modify the behaviour of existing instructions. These can be changed or removed by editing the VME source code, at the cost of convenient host-guest interaction.
VME differences in behaviour. A VM might respond differently on the network, or fail at time synchonization. This could be mitigated with additional source code changes (on both host and guest) to make the network traffic look closer to normal, and providing sufficient CPU cores to the VM would help make sure it does not run more slowly than wall clock time.
Again this is from 2006, so if anyone has a more up-to-date reference, I'd love to see their answer.

On virtual machines like JVM and CLR, is there something similar to an operating system, which provides programming APIs?

JVM and CLR are virtual machines. Similarly to bare metal computer machines, they provide virtual machine languages.
On real computer machines, we have operating systems, which provide system calls and APIs. For example the famous book Advanced Programming Unix Environment describes the APIs provided by Unix and Linux. Windows, however, provide different APIs.
On top of virtual machines like JVM and CLR, is there something which
plays the role of an operating system, and
provides programming APIs?
If there is nothing playing the role of an OS on virtual machines like JVM and CLR, what provides programming APIs (such as those in Java, C#, ...) similar to OS APIs?
Note: I am asking about VMs and on top of them, instead of what is underlying them. Do VMs not run some virtual OS on top of them? If there is no, why is there no such a need?
Thanks.
There are two kinds of virtual machines: system virtual machines and process virtual machines. System virtual machines provide a virtualization of complete instructions sets including user-mode instructions and kernel-mode instructions and therefore they can run operating systems. Process virtual machines virtualizes user-mode instructions and, usually, some system calls (such as those for managing threads, memory, and files) and therefore can only run applications or processes. That is, on top of a single process virtual machine a single app or process can run. The JVM and CLR are process virtual machines.
While in theory it is indeed possible to develop an OS to run on a process virtual machine, this is practically not useful because the performance of the programs that will run on that OS will be terrible due to the excessive layering in software.
Generally, system and process virtual machines themselves are not considered to be operating systems. However, process virtual machines do not necessarily require an OS to run on and may run on a bare-metal computer. The .NET Micro Framework is an example of such VM. Such VMs are sometimes called operating systems. Some virtual ISAs or a subset thereof have been implemented completely in hardware similar to x86 and ARM. One could develop operating systems to run on them. They are almost never used in industry because of their low performance.
An "operating system" is a a large, fuzzy ball of hairs. You got the hal, kernel, userspace... do we count some userspace libraries (typically libc) too?
With squinting you can find some comparable concepts in the JVM/JRE but generally a JVM runs on top of a bona-fide operating system and thus does not reimplement all aspects and instead simply provides platform-independent abstractions over facilities that you can find on almost all systems.
For example these days Thread usually is just java representation of native OS threads, but a JVM could choose to implement thread scheduling in userspace, and in sun's JVM did back during 1.1 times and some other JVMs still do today.
I'll answer this from the perspective of Java since I have no in-depth knowledge of .Net. I would assume, however that the CLR and JVM are similar from this point of view.
Let's start with an operating system. The purpose of this is to abstract away the hardware specific interfaces as well as providing a runtime environment for processes.
The OS uses device drivers to provide a uniform interface to similar devices (like the processor, memory, disk drives, network cards and so on). The OS uses system calls to allow user-level code to interact with these devices. If you wrote code in C you will call 'open' then 'read' and 'write' to the device before calling 'close'. The 'ioctl' (IO control) system call is also used a lot for device control. Each OS provides a standard set of these system calls (you can run Linux on an Intel or ARM processor, but you have the same set of system calls for each distro). Incidentally, this is also how Docker works by using a standard set of system calls to enable containers to be moved from one platform to another without problem.
The OS also provides the ability to run multiple processes simultaneously. With newer, multi-core machines this really can happen in parallel but the OS also uses scheduling to share a CPU between multiple processes or threads. By switching processes or threads very quickly this gives the impression that things are happening simultaneously, even on a single processor.
Now let's look at the JVM, which is a user-level process (from the OS point of view so just like any other user application). This has been designed to abstract away CPU and operating specific functionality from the Java application. The bytecodes generated by the Java compiler do not contain any system calls. If you look at the bytecode instruction set (defined in the Java Virtual Machine Specification) you will find that the instructions provide many familiar low-level features such as loading a register, bit manipulation and so on. In addition, there are many instructions that are higher level and relate more specifically to Java; things like invokestatic that invokes a static method on a class, monitorenter. monitorexit for locking, newarray and so on.
The JVM takes these bytecodes and converts them from a CPU- and OS-independent form (that of a Virtual Machine) into the instructions for the specific CPU architecture and OS that the JVM is running on. In some cases this can be a one-to-one mapping (for things like bitwise operators), but can often be much more complex and involve the use of system calls to open files, access network interfaces, etc. The JVM also uses the OS to deal with threads created by the application. In the very early days of Java operating systems like Windows 95 did not have the concept of threads within a process so the JVM had to provide it's own implementation (this was called green threads and performed pretty badly).
To summarise the JVM takes the platform neutral bytcodes of the class files it is executing and converts them to the apprporiate native CPU instructions and system calls to make the application run. The JVM does not provide any traditional OS services, it just uses them.

Difference between "process virtual machine" with "system virtual machine"

What's the difference between process virtual machine with system virtual machine?
My guess is that process VM is not providing a kind of an operating system for the whole application for that OS, rather providing an environment for some specific application.
And system VM is providing an environment for an OS to be installed just like VirtualBox.
Am I getting it correct?
Another question is the difference between the two different implementation of system VM: hosted vs. stand-alone.
I'm a beginner studying OS, so easy and understandable answer would be greatly appreciated :)
A Process virtual machine, sometimes called an application virtual machine, runs as a normal application inside a host OS and supports a single process. It is created when that process is started and destroyed when it exits. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware or operating system, and allows a program to execute in the same way on any platform.
A System virtual machine provides a complete system platform which supports the execution of a complete operating system (OS),Just like you said VirtualBox is one example.
A Host virtual machine is the server component of a virtual machine , which provides computing resources in the underlying hardware to support guest virtual machine (guest VM).
The following is from http://airccse.org/journal/jcsit/5113ijcsit11.pdf :
System Virtual Machines
A System Virtual Machine gives a complete virtual hardware platform with support for execution
of a complete operating system (OS).
The advantage of using System VM are:
Multiple Operating System environments can run in parallel on the same piece of
hardware in strong isolation from each other.
The VM can provide an instruction set architecture (ISA) that is slightly different from
that of the real machine
The main draw backs are:
Since the VM indirectly accesses the same hardware the efficiency is compromised.
Multiply VMs running in parallel on the same physical machine may result in varied
performance depending on the workload imposed on the system. Implementing proper
isolation techniques may address this drawback.

How Java Virtual Machine can work on system without virtualization support?

If hardware support is a must for virtualization, how can Java Virtual Machines run on machines without support for virtualization ? Or is JVM not a virtual machine ?
A JVM is not virtual in the same sense as a VirtualBox or VMWare virtual machine. It is a 'machine' that implements the Java bytecode, not a virtualized version of actual hardware.
The term-of-art 'virtual machine' was coined a very long time ago for the following scenario:
make up a computer, like Knuth's MIX.
write a computer program that implements the made-up computer.
run programs
When this virtual machine runs, it's a completely ordinary program, running completely in user mode. It needs no special help from the hardware or operating system to work reasonably well. This is especially true of the JVM, since the Java byte code does not deal with low-level hardware I/O or other things which are hard to simulate.
Later, historically, (to pick a particular instance), IBM invented VM/370. VM/370 uses the other sense of the term 'virtual machine'. In this later sense, the hardware and operating system cooperate to allow a single physical machine to host multiple virtual instances of (more or less) the same architecture, in which multiple copies of the whole operating system are written as if they are running on more or less bare hardware. Later, the X86 was designed with features to facilitate this.
So, yes, any virtual machine is making use of some physical hardware, unless you implement it with pieces of paper passed around a table (pace John Searle). But when the virtual machine bears no resemblance to the machine it is running on, then there's no need for special help from the operating system and hardware, and no need for anything as complex as VM/370, or VMware.
If hardware support is a must for virtualization, ...
Let me stop you right there :-)
There is a difference in concept between the JVM (software virtualization) and (for example) a VMWare VM (hardware-assisted virtualization).
The JVM (and other software-based VMMs such as the ones that allow to to emulate x86 on Solaris hardware - I think Bochs and possibly DosBox fall into this category) runs like any other application, using the operating system to gain access to the hardware, or emulating its own hardware purely in software.
VMWare, and the other VMMs optimised for speed, rely on hardware support. In other words, they run on the hardware as if they have full access to the hardware and, only when they try to do something they're not supposed to does the OS captures that attempt and fake it.
That's why VMWare runs so much faster than the software-only emulators. It's because, for the vast majority of the time, it's actually running on the real hardware.
The JVM is a virtual machine, but it doesn't require any additional support from the Operating System. Instead of virtualising instructions for a particular CPU it executes java bytecode.
The JVM is a virtual machine for running Java, in other words it emulates a machine which would be capable of running java. It is a confusing choice of names, but it comes from the general meaning of "machine" not from the more common Virtual Machine meaning.
The JVM, like a regular VM emulates the execution of instructions, but in the case of the JVM the instructions being emulated are Java Instructions, and in the case of a VM they are Hardware Instructions as would be executed by an OS running on the same hardware.
Yes the JVM does access hardware, however this is why you install a MAC or WINDOWS JVM since the instructions are translated by the JVM and acted upon depending on the installation of the JVM, for example, open file dialog on mac opens the mac dialog and windows JVM opens the windows dialog.
So its not being virtualized by the system, but the bytecode is being virtualized by the JVM you installed. It's basically like an application that reads something(bytecode) and does something(access hardware, or other stuff).
It should be noted that nothing stipulates that a JVM does not (have to) have HW virtualization access. There are notable exceptions, but to which the answered poster alluded, few CPs exist that run Java bytecode natively. Maybe someday a Java bytecode HAL or TIMI will be commonplace to put the JVM into the same class as the formalized HW virtualization?

What are the benefits of a Hypervisor VM?

I'm looking into using virtual machines to host multiple OSes and I'm looking at the free solutions which there are a lot of them. I'm confused by what a hypervisor is and why are they different or better than a "standard" virtual machine. When I mean standard I going to use the benchmark virtual machine VMWare Server 2.0.
For a dual core system with 4 GB of ram that would be capable of running a max of 3 VMs. Which is the best choice? Hypervisor or non-hypervisor and why? I've already read the Wikipedia article but the technical details are over my head. I need a basic answer of what can these different VM flavors do for me.
My main question relates to how I would do testing on multiple environments. I am concerned about the isolation of OSes so I can test applications on multiple OSes at the same time. Also which flavor gives a closer experience of how a real machine operates?
I'm considering the following:
(hypervisor)
Xen
Hyper-V
(non-hypervisor)
VirtualBox
VMWare Server 2.0
Virtual PC 2007
*The classifications of the VMs I've listed may be incorrect.
The main difference is that Hyper-V doesn't run on top of the OS but instead along with the system it runs on top of a thin layer called hypervisor. Hypervisor is a computer hardware platform virtualization software that allows multiple operating systems to run on a host computer concurrently.
Many other virtualization solution uses other techniques like emulation. For more details see Wikipedia.
Disclaimer, everything below is (broadly) my opinion.
Its helpful to consider a virtual machine monitor (a hypervisor) as a very small microkernel. It has very few jobs beyond accessing the underlying hardware, such as monitoring of event channels and granting guest domains access to specific resources .. while enforcing some kind of scheduler.
All guest machines are completely oblivious of the others, the isolation is true. Guests do not share memory with the privileged guest (or each other). So, in this instance, you could (roughly) think of each guest (even the privileged one) as a process, as far as the VMM is concerned. Typically, the first guest gets extra privileges so that it can manage the rest. This is the ideal technology to use when virtual machines are put into production and exposed to the world.
Additionally, some guests can be patched to become aware of the hypervisor, significantly increasing their performance.
On the other hand we have things like VMWare and QEMU, which rely on the host kernel to give it access to bare metal and enough memory to exist. They assume that all guests need to be presented with a complete machine, the limits put on the process presenting these (more or less) become the limits of the virtual machine. I say more or less because device mapper QoS is not commonly implemented. This is the ideal solution for trying code in some other OS, or some other architecture. A lot of people will call QEMU, Simics or even sometimes VMWare (depending on the product) a 'simulator'.
For production roll outs I use Xen, for testing something I just cross compiled I use QEMU, Simics or VirtualBox.
If you are just testing / rolling new code on various operating systems and architectures, I highly recommend #2. If your need is introspection (i.e. watching guest memory change as bad programs run in a guest) ... I'd need more explanation before answering.
Benefits of Hypervisor:
Hypervisor separates virtual machines logically, assigning each its own slice of underlying computing power, memory, and storage, thus preventing the virtual machines from interfering with each other.