What is the difference between trap and emulate and binary translation? - virtual-machine

I understand what trap and emulate is, however I'm struggling to understand what binary translation is and how it differs from trap and emulate. I'm very new to this topic and am trying to understand this introduction from a paper from 2006:
"Until recently, the x86 architecture has not permitted classical trap-and-emulate virtualization. Virtual Machine Monitors for x86, such as VMware ® Workstation and Virtual PC, have instead used binary translation of the guest kernel code. However, both Intel and AMD have now introduced architectural extensions to support classical virtualization."
I also don't understand what "classical virtualization" is in the context trap and emulate vs binary translation. Any help understanding these terms would be appreciated.

I think this link will help you. I have tried to summarized it, for more information refer the link.
Whenever the guest operating system tries to perform one of these privileged operations, the processor will "trap" the instruction and hand over control to the host operating system or hypervisor, so that it can do the required operation and then return control back to the guest. But most real-world instruction sets, including x86, were not designed with virtualization in mind. As a result, there are privileged instructions that do not have any corresponding trap facility.
Binary translation addresses this problem directly. Instead of depending on the processor itself to detect the privileged instructions it uses virtualization software that inspects the instruction stream in software and whenever the virtualization software detects a problem instruction, it rewrites it on-the-fly,typically replacing it with a kind of manual trap, that will hand over control to the hypervisor at the appropriate moment. Hope this helps you.

Related

Why does Hyper-V require hardware virtualization?

I know Hyper-V is type 1 or native hypervisor, meaning it sits on top of hardware and doesn't require an operating system, i.e. talks to the hardware through the ISA interface).
But I don't understand why does it require hardware assisted virtualization? Does it mean Hyper-V is not full native hypervisor because it requires another part (put in hardware)? Does every native hypervisor require hardware virtualization?
Because without hardware virtualization it would have to run an emulation which comes with a BRUTAL performance implication. There is no way to do proper virtualiaztion without either interpreting a significant number of machine code, or have hardware support for this. EVERY native hypervisor requires hardware virtualization - which is, btw., nothing new... it was in the firstp rocessors mit 60s, iirc (196x). Yes, this is that old. VMS - the Mainframe operating system - is acutally short for... "Virtual Machine System". The processors back then had hardware virtualization.

Could we fake processor and RAM with hypervisor?

I hear a lot about "Hypervisors are not emulators. If you need to emulate another hardware specifications than you computer have, you need to use emulator, not hypervisor".
Well, but yesterday I saw this video on youtube - click here - which shows how to install Win 95 on modern macOS with VMware Fusion.
The strange thing for me is that on 17:39 you could see that Win 95 virtual machine is "Pentium Pro with 64 MB RAM".
Hmm! So, Fusion somehow faked processor and RAM, right? But it is not emulator, right? So, does it mean that any hypervisor can fake processor and RAM?
At the time of its release, Windows 95 only had code to recognize CPUIDs up to Pentium Pro. Any processor not lower than Pentium Pro is "called" Pentium Pro.
The main difference is the Hypervisor cannot emulate CPU code. All code must run on the original processor.
The hypervisor does emulate the BIOS, which in tells the OS the hardware specs available; including RAM, Boot order and peripherals attached.
When you are talking about VMWare Fusion the way this works depends on how virtualization is achieved. According to wikipedia VMWare Fusion utilizes hardware-assisted virtualization, dynamic binary translation, and para-virtualization.
In the hardware-assited virtualization case, #Strom is correct and guest instructions can be executed directly on the host CPU. Besides #Strom answer, you can fake the CPU type by trapping and emulating the cpuid instruction.
In the para-virtualization case you replace critical instructions by calls to the hypervisor which emulates the instruction on behalf of the guest. So again you emulate the cpuid instruction to "fake" the CPU type. Keep in mind that this requires a modified, hence para-virtualized, guest operating system.
Finally, dynamic binary translation scans the guest code for critical instructions during runtime and either replaces them by traps into the hypervisor achieving some kind of "live para-virtualization" or translating blocks of guest code into equivalent blocks of host code that modifies the VM state according to the original guest code (this is e.g. how the QEMU full system emulator works). As a result, again you are able to "fake" the CPU type by emulating the cpuid instruction. Notice that guest and host can be the same architecture in this case, but there is no need for this.
Of course a combination of above techniques is also feasible.
As for virtualization of main memory, the hypervisor is in full control of the hardware so you can simply configure a VM with just 64MB of main memory. The VM is not able to "see" more than this due to the techniques shortly discussed above.
Please keep in mind that this just gives a very short overview of virtualization and I tried to keep it short and informative, so I know my explanations are partially not very accurate. If you are really interested in virtualization I recommend reading "Virtual Machines: Versatile Platforms for Systems and Processes" or the papers on the topic by Popek & Goldberg and "Xen and the Art of Virtualization"

On virtual machines like JVM and CLR, is there something similar to an operating system, which provides programming APIs?

JVM and CLR are virtual machines. Similarly to bare metal computer machines, they provide virtual machine languages.
On real computer machines, we have operating systems, which provide system calls and APIs. For example the famous book Advanced Programming Unix Environment describes the APIs provided by Unix and Linux. Windows, however, provide different APIs.
On top of virtual machines like JVM and CLR, is there something which
plays the role of an operating system, and
provides programming APIs?
If there is nothing playing the role of an OS on virtual machines like JVM and CLR, what provides programming APIs (such as those in Java, C#, ...) similar to OS APIs?
Note: I am asking about VMs and on top of them, instead of what is underlying them. Do VMs not run some virtual OS on top of them? If there is no, why is there no such a need?
Thanks.
There are two kinds of virtual machines: system virtual machines and process virtual machines. System virtual machines provide a virtualization of complete instructions sets including user-mode instructions and kernel-mode instructions and therefore they can run operating systems. Process virtual machines virtualizes user-mode instructions and, usually, some system calls (such as those for managing threads, memory, and files) and therefore can only run applications or processes. That is, on top of a single process virtual machine a single app or process can run. The JVM and CLR are process virtual machines.
While in theory it is indeed possible to develop an OS to run on a process virtual machine, this is practically not useful because the performance of the programs that will run on that OS will be terrible due to the excessive layering in software.
Generally, system and process virtual machines themselves are not considered to be operating systems. However, process virtual machines do not necessarily require an OS to run on and may run on a bare-metal computer. The .NET Micro Framework is an example of such VM. Such VMs are sometimes called operating systems. Some virtual ISAs or a subset thereof have been implemented completely in hardware similar to x86 and ARM. One could develop operating systems to run on them. They are almost never used in industry because of their low performance.
An "operating system" is a a large, fuzzy ball of hairs. You got the hal, kernel, userspace... do we count some userspace libraries (typically libc) too?
With squinting you can find some comparable concepts in the JVM/JRE but generally a JVM runs on top of a bona-fide operating system and thus does not reimplement all aspects and instead simply provides platform-independent abstractions over facilities that you can find on almost all systems.
For example these days Thread usually is just java representation of native OS threads, but a JVM could choose to implement thread scheduling in userspace, and in sun's JVM did back during 1.1 times and some other JVMs still do today.
I'll answer this from the perspective of Java since I have no in-depth knowledge of .Net. I would assume, however that the CLR and JVM are similar from this point of view.
Let's start with an operating system. The purpose of this is to abstract away the hardware specific interfaces as well as providing a runtime environment for processes.
The OS uses device drivers to provide a uniform interface to similar devices (like the processor, memory, disk drives, network cards and so on). The OS uses system calls to allow user-level code to interact with these devices. If you wrote code in C you will call 'open' then 'read' and 'write' to the device before calling 'close'. The 'ioctl' (IO control) system call is also used a lot for device control. Each OS provides a standard set of these system calls (you can run Linux on an Intel or ARM processor, but you have the same set of system calls for each distro). Incidentally, this is also how Docker works by using a standard set of system calls to enable containers to be moved from one platform to another without problem.
The OS also provides the ability to run multiple processes simultaneously. With newer, multi-core machines this really can happen in parallel but the OS also uses scheduling to share a CPU between multiple processes or threads. By switching processes or threads very quickly this gives the impression that things are happening simultaneously, even on a single processor.
Now let's look at the JVM, which is a user-level process (from the OS point of view so just like any other user application). This has been designed to abstract away CPU and operating specific functionality from the Java application. The bytecodes generated by the Java compiler do not contain any system calls. If you look at the bytecode instruction set (defined in the Java Virtual Machine Specification) you will find that the instructions provide many familiar low-level features such as loading a register, bit manipulation and so on. In addition, there are many instructions that are higher level and relate more specifically to Java; things like invokestatic that invokes a static method on a class, monitorenter. monitorexit for locking, newarray and so on.
The JVM takes these bytecodes and converts them from a CPU- and OS-independent form (that of a Virtual Machine) into the instructions for the specific CPU architecture and OS that the JVM is running on. In some cases this can be a one-to-one mapping (for things like bitwise operators), but can often be much more complex and involve the use of system calls to open files, access network interfaces, etc. The JVM also uses the OS to deal with threads created by the application. In the very early days of Java operating systems like Windows 95 did not have the concept of threads within a process so the JVM had to provide it's own implementation (this was called green threads and performed pretty badly).
To summarise the JVM takes the platform neutral bytcodes of the class files it is executing and converts them to the apprporiate native CPU instructions and system calls to make the application run. The JVM does not provide any traditional OS services, it just uses them.

software virtualization vs hardware virtualization

What actually happens when hardware virtualization is enabled ?
if not, a hypervisor uses binary translation. but, when hardware virtualization is enabled, i have read that it uses trap and emulate.
so the guest code executes directly on the host cpu, if its a privileged instruction the cpu hands over the control to the hypervisor, the hypervisor emulates that instruciton and then executes it.
so, what does the emulation means here ? is the same binary translation carried out when hardware virtualization is enable ?
Enabling HW virtualization sets the vmx flag in Intel and svm flag in AMD.
In Intel architecture, this allows the user-space calls to run as-is on the lower protection ring as they cannot potentially interfear with the host OS. On the other hand the kernel-space calls of the virualized OS are trapped and binary-translated by the hypervisor.
This is done so as to partly take away the CPU intensive translation for trivial calls. How-much of this happens depends on the virtualization type- full, partial or paravirtual.
Binary-translation is a subset of the more elaborate process of emulation. It alligns the guest code to be able to run on the host-architecture.

How Java Virtual Machine can work on system without virtualization support?

If hardware support is a must for virtualization, how can Java Virtual Machines run on machines without support for virtualization ? Or is JVM not a virtual machine ?
A JVM is not virtual in the same sense as a VirtualBox or VMWare virtual machine. It is a 'machine' that implements the Java bytecode, not a virtualized version of actual hardware.
The term-of-art 'virtual machine' was coined a very long time ago for the following scenario:
make up a computer, like Knuth's MIX.
write a computer program that implements the made-up computer.
run programs
When this virtual machine runs, it's a completely ordinary program, running completely in user mode. It needs no special help from the hardware or operating system to work reasonably well. This is especially true of the JVM, since the Java byte code does not deal with low-level hardware I/O or other things which are hard to simulate.
Later, historically, (to pick a particular instance), IBM invented VM/370. VM/370 uses the other sense of the term 'virtual machine'. In this later sense, the hardware and operating system cooperate to allow a single physical machine to host multiple virtual instances of (more or less) the same architecture, in which multiple copies of the whole operating system are written as if they are running on more or less bare hardware. Later, the X86 was designed with features to facilitate this.
So, yes, any virtual machine is making use of some physical hardware, unless you implement it with pieces of paper passed around a table (pace John Searle). But when the virtual machine bears no resemblance to the machine it is running on, then there's no need for special help from the operating system and hardware, and no need for anything as complex as VM/370, or VMware.
If hardware support is a must for virtualization, ...
Let me stop you right there :-)
There is a difference in concept between the JVM (software virtualization) and (for example) a VMWare VM (hardware-assisted virtualization).
The JVM (and other software-based VMMs such as the ones that allow to to emulate x86 on Solaris hardware - I think Bochs and possibly DosBox fall into this category) runs like any other application, using the operating system to gain access to the hardware, or emulating its own hardware purely in software.
VMWare, and the other VMMs optimised for speed, rely on hardware support. In other words, they run on the hardware as if they have full access to the hardware and, only when they try to do something they're not supposed to does the OS captures that attempt and fake it.
That's why VMWare runs so much faster than the software-only emulators. It's because, for the vast majority of the time, it's actually running on the real hardware.
The JVM is a virtual machine, but it doesn't require any additional support from the Operating System. Instead of virtualising instructions for a particular CPU it executes java bytecode.
The JVM is a virtual machine for running Java, in other words it emulates a machine which would be capable of running java. It is a confusing choice of names, but it comes from the general meaning of "machine" not from the more common Virtual Machine meaning.
The JVM, like a regular VM emulates the execution of instructions, but in the case of the JVM the instructions being emulated are Java Instructions, and in the case of a VM they are Hardware Instructions as would be executed by an OS running on the same hardware.
Yes the JVM does access hardware, however this is why you install a MAC or WINDOWS JVM since the instructions are translated by the JVM and acted upon depending on the installation of the JVM, for example, open file dialog on mac opens the mac dialog and windows JVM opens the windows dialog.
So its not being virtualized by the system, but the bytecode is being virtualized by the JVM you installed. It's basically like an application that reads something(bytecode) and does something(access hardware, or other stuff).
It should be noted that nothing stipulates that a JVM does not (have to) have HW virtualization access. There are notable exceptions, but to which the answered poster alluded, few CPs exist that run Java bytecode natively. Maybe someday a Java bytecode HAL or TIMI will be commonplace to put the JVM into the same class as the formalized HW virtualization?