Two CPUs in two modes: VMX and non-VMX

In a dual- or multi-CPU configuration, is it possible for one CPU to be operating in VMX mode (root or non-root) while another (or others) executes in legacy (non-VMX) mode?
An explanation in this regard would be appreciated.

Yes.
VMXON — This instruction takes a single 64-bit source operand that is in memory. It causes a logical processor
to enter VMX root operation and to use the memory referenced by the operand to support VMX operation.
Emphasis mine
You can find the answer to this type of question directly in the Intel Manuals, Volume 3, Chapters 23-33.
The quote states that the effect of enabling VMX is limited to the logical processor that executes the instruction.
Each CPU has its own set of cores and each core has its own set of logical processors, so each "CPU" as seen by the end user operates independently.
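To see that this really is per-logical-processor state, here is a small sketch of my own (not part of the answer above) that reads IA32_FEATURE_CONTROL (MSR 0x3A) on every logical processor through Linux's /dev/cpu/N/msr interface; a logical processor may only execute VMXON once that MSR's lock bit (bit 0) and "enable VMX outside SMX" bit (bit 2) are set for it. It assumes root and a loaded msr kernel module.

    /* Sketch: read IA32_FEATURE_CONTROL (MSR 0x3A) on each logical processor
     * via the Linux msr driver. Bit 0 is the lock bit and bit 2 enables VMXON
     * outside SMX; both must be set before that logical processor can enter
     * VMX operation. Build: gcc -O2 -o vmx_check vmx_check.c (run as root). */
    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        for (int cpu = 0; ; cpu++) {
            char path[64];
            snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
            int fd = open(path, O_RDONLY);
            if (fd < 0)
                break;                /* assume no gaps in CPU numbering */
            uint64_t val;
            if (pread(fd, &val, sizeof val, 0x3A) == (ssize_t)sizeof val)
                printf("cpu%d: IA32_FEATURE_CONTROL=%#llx lock=%d vmx_outside_smx=%d\n",
                       cpu, (unsigned long long)val,
                       (int)(val & 1), (int)((val >> 2) & 1));
            close(fd);
        }
        return 0;
    }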

Is there any connection between the segments created in memory by a microprocessor and the memory structure of a process in an operating system?

In the 8086 microprocessor, we divide the memory into segments of 64K each because of the 16-bit registers (since a 20-bit address cannot be stored in a 16-bit register). These segments are categorized as code segment, data segment, stack segment and extra segment. This structure is similar to that created by a process in an operating system. Does that mean each process takes up memory equivalent to 4 segments, i.e. 4*64K in the case of the 8086? And if this is true, then by doing some more math we can say that only 4 processes can be handled by an 8086 microprocessor at a time (i.e. one of the processes will be in the running state and the others will be in the blocked or ready state), since a maximum of 16 segments is possible (total memory size / size of each segment = 1MB/64K = 16).
I have just started studying this and saw this equivalence between processes and segments. Does any such connection between the segments of the memory and the memory structure of a process actually exist, or is it just my crazy imagination?
A little history helps. Early UNIX(tm) ran on the Digital PDP minicomputer family. The first widely circulated versions were V6 & V7, which were exclusive to the PDP-11 family. That family could support a whopping 256K of RAM, but the general-purpose register set (used for address formation) was 16 bits wide. There was a limited memory protection scheme in the processor, which permitted the kernel (supervisor) to have a separate address space from user programs (user), and instructions (addresses generated by the PC) to be separate from data (addresses generated by other means). This will probably get edited into the dust by PDP-11 fanbois.
At around this time, Intel was rolling out what was to become the 8086. Contemporary 8-bit CPUs were already straining at a 64K address space limitation and were using a concept called bank switching to increase it. In bank switching, some sub-ranges of the 64K address space could be re-pointed into a larger memory bank, so with careful programming you could address much more memory. The Hitachi 64180 was one of the CPUs that incorporated this into its silicon; most used external memory controllers.
The 8086 addressing scheme was an amalgamation of these notions. You could produce an operating system which supported dynamically relocated processes and shared text, with up to 64K instruction + 64K data. The general idea was to take the segment registers out of the programming model, so that if the OS had to relocate a process, it knew the process had no saved copy of the old segment value. The commercial OS QNX 1.x, 2.x provided this as a model; the latter used the 286 extensions to protect against programs that played with the segment registers.
For programs that didn't care about such subtleties (Lotus 123, ...), you could use the segment registers to effectively create a 2^20 address space on the 8086. It is an ugly programming model in this mode because address formation is A=Seg*16+Base, so Seg=1,Base=0 and Seg=0,Base=16 resolve to the same address.
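To make that address arithmetic concrete, here is a tiny sketch of the real-mode formula (physical = segment * 16 + offset, kept to 20 bits on the 8086) showing the aliasing mentioned above:

    /* 8086 real-mode address formation: physical = (segment << 4) + offset,
     * truncated to 20 bits because the chip has 20 address lines. Different
     * segment:offset pairs can name the same physical byte. */
    #include <stdio.h>
    #include <stdint.h>

    static uint32_t phys(uint16_t seg, uint16_t off)
    {
        return (((uint32_t)seg << 4) + off) & 0xFFFFF;
    }

    int main(void)
    {
        printf("0001:0000 -> %05X\n", (unsigned)phys(0x0001, 0x0000)); /* 00010 */
        printf("0000:0010 -> %05X\n", (unsigned)phys(0x0000, 0x0010)); /* 00010 */
        printf("FFFF:0010 -> %05X\n", (unsigned)phys(0xFFFF, 0x0010)); /* wraps to 00000 */
        return 0;
    }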
So, you aren't hallucinating, it was quite intentional, if more than a little half-arsed.

Connection between microprogramming and embedded systems

What is the connection between microprogramming and embedded systems?
Is microprogramming a machine language?
Is microprogramming the same thing as microcode?
Are embedded systems manufactured using microprogramming only?
Or is microprogramming not exclusive to embedded systems?
If possible, please give an example. Thanks!
Microprogramming / microcoding is an implementation technique for processors, and as such it dates back pretty far.
A processor implements an instruction set; programs using these instructions are generated by a compiler or assembly language programmer and stored in program files, later loaded into memory to execute the program.
A microcoded processor is like having another, different processor-within-the-processor that is used to interpret the instruction stream (sequences of machine language instructions) of the program. This processor within the processor has its own instruction set and its own program. Unlike the externally visible instruction set (which can load & run any program), the processor within the processor generally only runs one dedicated program (the instruction set interpreter), which is stored in a ROM (or re-writable flash) inside the processor.
(In some such systems, the processor within the processor has instructions that are very wide (as in horizontal microcode), and impractical (regarding code size) for general use by regular programs.)
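As a purely illustrative sketch (a made-up machine, not any real microarchitecture), here is that structure in miniature: the visible opcodes are not executed directly, each one just selects a fixed microprogram from an internal control store, and an inner loop steps through those micro-operations.

    /* Toy illustration of microcoding: the visible instruction set (LOADI, ADD,
     * HALT) is not executed directly; each opcode selects a fixed sequence of
     * micro-operations from an internal "control store" (a ROM in real hardware).
     * Entirely hypothetical machine, just to show the structure. */
    #include <stdio.h>

    enum uop { U_FETCH_IMM, U_ALU_ADD, U_WRITE_REG, U_STOP, U_END };

    /* "Control store": one microprogram per visible opcode. */
    static const enum uop ucode[][4] = {
        /* op 0: LOADI r0, imm */ { U_FETCH_IMM, U_WRITE_REG, U_END },
        /* op 1: ADD   r0, imm */ { U_FETCH_IMM, U_ALU_ADD, U_WRITE_REG, U_END },
        /* op 2: HALT           */ { U_STOP, U_END },
    };

    int main(void)
    {
        /* A tiny "machine language" program: LOADI r0,5; ADD r0,7; HALT */
        const int program[] = { 0, 5,   1, 7,   2 };
        int reg = 0, pc = 0, running = 1;

        while (running) {
            int opcode = program[pc++];
            int imm = 0, alu = 0;
            /* The "processor within the processor" runs the microprogram. */
            for (const enum uop *up = ucode[opcode]; *up != U_END; up++) {
                switch (*up) {
                case U_FETCH_IMM: imm = program[pc++];            break;
                case U_ALU_ADD:   alu = 1;                        break;
                case U_WRITE_REG: reg = alu ? reg + imm : imm;    break;
                case U_STOP:      running = 0;                    break;
                default:                                          break;
                }
            }
        }
        printf("r0 = %d\n", reg);   /* prints r0 = 12 */
        return 0;
    }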
What is the connection between microprogramming and embedded systems?
There is no particular relationship between microcoding and embedded systems; either can be found with or without the other.
Is microprogramming a machine language?
Yes, I would say it is, but it is generally not accessible to operating systems and user programs.
Microcoding was particularly popular when virtually all instructions each executed in multiple cycles. Later techniques removed the indirection of the microcoded machine in favor of direct hardwired execution, with single-cycle approaches. This publication sheds some light on some of the thinking of the day during the transition of the state of the art from microcoding to hardwiring. See also the IBM 801.
Most processors these days are not microprogrammed; however, the very advanced techniques applied by x86 processors may mimic microprogramming techniques here and there.
Embedded systems are simply processors used in devices that are not seen as "computers", for example a thermostat, a microwave, or a car (which might have numerous embedded systems). The considerations here are that these systems are dedicated: they tend to run a single program (rather than running an operating system capable of running any program the user directs), and they have low-power requirements and disconnected requirements (disconnected from a user terminal/screen/keyboard, perhaps from the network, etc.). Still, embedded systems keep getting ever more powerful.

What is the difference between the gem5 CPU models and which one is more accurate for my simulation?

When running a simulation in gem5, I can select a CPU with fs.py --cpu-type.
This option can also show a list of all available CPU types if I pass an invalid value to fs.py --cpu-type.
What is the difference between those CPU types and which one should I choose for my experiment?
Question inspired by: https://www.mail-archive.com/gem5-users@gem5.org/msg16976.html
An overview of the CPU types can be found at: https://cirosantilli.com/linux-kernel-module-cheat/#gem5-cpu-types
In summary:
simplistic CPUs (derived from BaseSimpleCPU): for example AtomicSimpleCPU (the default one). They have no CPU pipeline, and are therefore completely unrealistic. However, they also run much faster. Therefore, they are mostly useful to boot Linux fast and then checkpoint and switch to a more detailed CPU.
Within the simple CPUs we can notably distinguish:
AtomicSimpleCPU: memory requests finish immediately
TimingSimpleCPU: memory requests actually take time to go through to the memory system and return. Since there is no CPU pipeline however, the simulated CPU stalls on every memory request waiting for a response.
An alternative to those is to use KVM CPUs to speed up boot if host and guest ISA are the same, although as of 2019, KVM is less stable as it is harder to implement and debug.
in-order CPUs: derived from the generic MinorCPU by parametrization, Minor stands for In Order:
for ARM: HPI is made by ARM and models a "(2017) modern in-order Armv8-A implementation". This is your best in-order ARM bet.
out-of-order CPUs, derived from the generic DerivO3CPU by parametrization, O3 stands for Out Of Order:
for ARM: there are no models specifically published by ARM as of 2019. The only specific O3 model available is ex5_big for an A15, but you would have to verify its authors' claims about how well it models the real A15 core.
If none of those are accurate enough for your purposes, you could try to create your own in-order/out-of-order models by parametrizing MinorCPU / DerivO3CPU like HPI and ex5_big do, although this could be hard to get right, as there isn't generally enough public information on non-free CPUs to do this without experiments or reverse engineering.
The other thing you will want to think about is the memory system model. There are basically two choices: classical vs Ruby, and within Ruby, several options are available, see also: https://cirosantilli.com/linux-kernel-module-cheat/#gem5-ruby-build

How can I verify that my hardware prefetcher is disabled?

I have disabled hardware prefetching using the following guidelines:
Installed msr-tools 1.3
wrmsr -a 0x1A4 1
The prefetcher information for my system (Broadwell) is in MSR address 0x1A4, as shown in Intel documentation.
I did rdmsr -a 0x1A4 and the output showed 1.
According to the Intel docs, if the bit corresponding to a particular prefetcher is set to 1, that prefetcher is disabled.
I wanted to know if there is any other way I can verify that my hardware prefetchers have been disabled?
A disabled prefetcher will slow down some operations that benefit from prefetching. You will need to write some code (probably in assembly language) and measure its performance with the prefetcher enabled and disabled.
Some time ago I wrote a test program to measure memory read performance. It repeatedly read memory in blocks of different sizes. It showed an obvious correlation between memory block size and the capacities of the different levels of the memory cache.
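For what it's worth, here is a minimal sketch of that kind of measurement (mine, not the program described above): a sequential streaming read over a buffer much larger than the last-level cache benefits heavily from the hardware prefetchers, while a dependent random pointer chase barely does, so if the MSR write really took effect you should see the sequential time degrade much more than the chase time.

    /* Sketch: compare sequential streaming reads (prefetcher-friendly) with a
     * dependent random pointer chase (prefetcher-hostile) over a buffer much
     * larger than the last-level cache. Run it with the prefetchers enabled
     * and then disabled and compare the two timings.
     * Build e.g.: gcc -O2 -o prefetch_check prefetch_check.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <time.h>

    #define N (64 * 1024 * 1024 / sizeof(size_t))   /* 64 MiB of size_t slots */

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void)
    {
        size_t *buf = malloc(N * sizeof *buf);
        if (!buf) return 1;

        /* Sattolo's algorithm: a random permutation that is one single cycle,
         * so the pointer chase below visits every slot exactly once. */
        for (size_t i = 0; i < N; i++) buf[i] = i;
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = buf[i]; buf[i] = buf[j]; buf[j] = t;
        }

        /* Sequential streaming read. */
        double t0 = now_sec();
        volatile size_t sum = 0;
        for (size_t i = 0; i < N; i++) sum += buf[i];
        double seq = now_sec() - t0;

        /* Dependent random chase: each load's address comes from the previous. */
        t0 = now_sec();
        size_t idx = 0;
        for (size_t i = 0; i < N; i++) idx = buf[idx];
        double chase = now_sec() - t0;

        printf("sequential: %.3f s, pointer chase: %.3f s (sum=%zu idx=%zu)\n",
               seq, chase, (size_t)sum, idx);
        free(buf);
        return 0;
    }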
You can run some I/O intensive and prefetch-friendly workloads to verify.
BTW, my CPU is a Gold 6240, and setting 0x1A4 to 1 does not work on it.
Instead, I use sudo wrmsr -a 0x1A4 0xf
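That matches Intel's disclosure of MSR 0x1A4: bit 0 controls the L2 hardware prefetcher, bit 1 the L2 adjacent-cache-line prefetcher, bit 2 the DCU (L1 data) prefetcher and bit 3 the DCU IP prefetcher, with a set bit meaning "disabled", so writing 1 only turns off the first one while 0xf turns off all four (double-check the bit layout for your particular part). A small helper to decode the value printed by rdmsr -a 0x1A4:

    /* Decode the prefetcher-control MSR (0x1A4) value reported by rdmsr.
     * Per Intel's prefetcher-control disclosure, a set bit disables the
     * corresponding prefetcher. Usage: ./decode_1a4 0xf */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        unsigned long v = (argc > 1) ? strtoul(argv[1], NULL, 0) : 0;
        printf("L2 hardware prefetcher:      %s\n", (v & 0x1) ? "disabled" : "enabled");
        printf("L2 adjacent-line prefetcher: %s\n", (v & 0x2) ? "disabled" : "enabled");
        printf("DCU (L1D) prefetcher:        %s\n", (v & 0x4) ? "disabled" : "enabled");
        printf("DCU IP prefetcher:           %s\n", (v & 0x8) ? "disabled" : "enabled");
        return 0;
    }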

Virtual Processors and Logical Partitions

I basically wanted to know what exactly a virtual processor is. At IBM's site they define it as:
"A virtual processor is a representation of a physical processor core to the operating system of a logical partition that uses shared processors. "
I understand that if there are x processors, each of which can simultaneously perform two operations, then the system can perform 2x operations simultaneously. But where does the virtual processor fit into this? I also tried looking up the difference between a logical partition and other partitions, such as a primary partition, but wasn't really sure.
I'd like to draw an analogy between virtual memory and virtual processors.
Start with expectations:
A user program is written against a set of expectations about what the memory looks like (and a nice flat, large, continuous memory model is best...)
An OS is written against a set of expectations of how the hardware behaves (what CPU protection modes are available, how interrupts arrive and are blocked and handled, how to talk to IO devices, etc...)
Realize that these expectations can be met directly by the hardware, or by an abstraction layer.
Virtual memory is a set of (specialized, not found in simple chips) hardware tools and OS services that fool a user program into thinking that it has that nice, flat, large, continuous memory space, even while the OS is busily dividing the real memory into little pieces, storing some of them on disk, bringing others back, and otherwise making a real hash of it. But your code doesn't care. Everything just works.
A virtual processor system is a set of (specialized, not found in consumer CPUs) hardware tools and hypervisor services that allow your OS to believe it has direct access to one or more processors with the expected protection modes, interrupts, etc. even though the hypervisor is busily swapping whole OS contexts onto and off of one or more real processors, starting and stopping access to IO busses, and so on and so forth. But the OS doesn't care. Everything just works.
The hardware support to do this has only recently started to be available in "desktop" CPUs, but Big Iron has had it for ages. It is useful for a couple of reasons:
Protection. In a properly protected OS, it is tough for one process or user to spy on another. But since they can be resident in the same context, it may still be possible. Virtualizing the OSes separates them by another, even thinner, channel and makes it that much harder for data to leak and for malicious things to be done.
Robustness. If you can swap OS contexts in and out, you can migrate them from one machine to another, and checkpoint and restart them. This allows for computers that detect failures on their own processors and recover gracefully.
These are the things (aside from millions of LOC of heavily debugged, mission critical code) that have kept people paying for Big Iron.
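A small aside on that "desktop" hardware support: when your OS is itself a guest on such virtual processors, most hypervisors on x86 advertise themselves through CPUID, so a guest can at least detect the situation. Here is a sketch using GCC/Clang's cpuid.h; by convention, leaf 1 ECX bit 31 is the hypervisor-present bit and leaf 0x40000000 returns a vendor signature, though a hypervisor is free not to expose either.

    /* Detect whether we are running under a hypervisor (x86 only): leaf 1,
     * ECX bit 31 is the conventional "hypervisor present" bit, and leaf
     * 0x40000000 returns a 12-byte vendor signature in EBX:ECX:EDX
     * (e.g. "KVMKVMKVM", "VMwareVMware"). */
    #include <stdio.h>
    #include <string.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
            return 1;
        if (!(ecx & (1u << 31))) {
            printf("no hypervisor bit: bare metal (or a hiding hypervisor)\n");
            return 0;
        }

        char sig[13] = {0};
        __cpuid(0x40000000, eax, ebx, ecx, edx);
        memcpy(sig + 0, &ebx, 4);
        memcpy(sig + 4, &ecx, 4);
        memcpy(sig + 8, &edx, 4);
        printf("hypervisor present, signature: \"%s\"\n", sig);
        return 0;
    }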