What Model-Specific Register(s) control RAM error correction on Ivy Bridge Xeon? - msr

How can one determine whether error correction is active on an Ivy Bridge system? (Requires the combination of a Xeon 12xx-v2 CPU and ECC UDIMMs).
Ideally such a method would also run on systems without the requisite hardware (and report that ECC is disabled) as well as checking the memory controller configuration when the hardware is present. But for my purposes I just need it to work on a system that definitely does have an ECC-capable CPU and ECC RAM.
Normally I would use an existing tool such as MemTest86+ to check this, however it hasn't been updated to support Ivy Bridge yet.

On IVB processors, ECC is controlled through chipset control/status registers (CSRs), not via MSRs.
Specifically, on IVB this is in bus 1, devices 15 and 29, at offset 0x7C, bit 2.
This should be programmed by the BIOS/MRC during start-up, based on the SPD information in the DIMMs reporting that they are ECC capable (along with additional settings).

Most hardware is controlled at the driver level through memory mapped handles. How is RAM controlled? Is there a spec for it?

I'm just getting into the massive topic of learning UEFI driver development and from what I understand so far, hardware peripherals are controlled using specific addresses mapped to memory. Well, the memory is hardware too. Is it not controlled by drivers?
I assume the CPU and motherboard have built-in circuits that handle this, but my curiosity is whether drivers have any hardware-level control over this handling. I'd just prefer to know for sure, and I'm not sure which manual would explain this.
[kernel/UEFI] driver <-> memory mapped address <-> firmware [hardware:keyboard]
[kernel/UEFI] driver <-> ? <-> firmware [hardware:RAM]
guess:
spec
driver <-> CPU microcode <-> motherboard circuit <-> firmware
I just think assumptions are bad, and I can't find a citation confirming the probable answer. The answer is relevant to security and to which supply chain / standard we're trusting. Just as PCIe and NVMe are standard specs, perhaps there's a standard for RAM <-> CPU communication?
Maybe this question is a better fit for an Engineering SE site?
From a software development perspective, there isn't a driver for RAM control: it's exposed as an indistinguishable part of the hardware via the Instruction Set Architecture (ISA). Just like the CPU is hardware, but without a driver to control it. The reason for drivers is access to hardware unknown to the ISA, such as a USB device, a particular manufacturer's SSD (which may or may not be present at power-up time), graphics hardware, etc. Just like the CPU, RAM must be present at power-up; you won't get much further than an error code via lights and beeps without RAM in your system at power-up time. This isn't the case for most, if not all, other hardware; such other hardware is therefore optional and isn't part of the ISA definition, and drivers (software) are needed to access it.
RAM is managed both by hardware (the CPU; see the Intel manual, volume 3), which on modern parts provides virtualization support for a modern OS, including paging, and by software (the OS memory allocator), for purposes of virtualizing RAM for the running processes. No drivers though, just addressing via ISA instructions.
If you're looking for an answer from a hardware perspective, such as the details of the bus circuitry, which CPU pins are involved, exact protocol, etc., then this question is a better candidate for Engineering StackExchange site.

Virtualization architecture on mainframe (z/Architecture)

I have studied with interest the hardware virtualization extensions added by Intel and AMD to the x86 architecture (known as VMX and SVM, respectively). While this is still a relatively recent addition to x86 CPUs, my understanding is that the mainframe architecture has made extensive use of virtualization since the 1970s–80s, for instance in the form of the venerable z/VM operating system. Even nested virtualization has been used.
My question is: is there public documentation of the hardware facilities provided by the z/Architecture that the z/VM operating system uses to implement this virtualization? I.e., the control registers and data structures that the hardware implements to allow the hypervisor to simulate the guest state and trap the necessary instructions? Another thing I am curious about is whether the z/Architecture supports second-level address translation (which was added later to VMX and SVM).
Just to get it out of the way: System/370 and all its descendants support virtualization as-is (they satisfy the virtualization requirements). In that sense, no special hardware support has ever been needed, as opposed to the Intel architecture.
The performance improvements for VM guests on System/370, XA, ESA etc. all the way through z/Architecture have been traditionally implemented using DIAG (diagnose) instruction as well as microcode (now millicode) assist. In modern terms, it is more of paravirtualization. The facilities are documented, you can start here for instance.
Update - after reading extensive comments, a few notes and clarifications.
S/370 and its descendants never needed specialized hardware virtualization support to correctly run guest operating systems. This is not because virtualization was part of the initial design and requirements (it wasn't), but because the architecture was properly designed to support a secure multiuser environment. Popek and Goldberg's virtualization requirements are actually very weak: in essence, that only privileged instructions can affect system configuration. This requirement was met even by S/370's predecessor, System/360, well before the first virtualized systems appeared.
Performance improvements for VM guests proceeded along two lines.
First, paravirtualization approach - essentially developing well-architected API for guest-hypervisor communication. It's been used not only for performance, but for a wide variety of other services such as inter-VM communication. The API is documented in the manual referred to above.
Second, microcode extensions (VM microcode assist) that performed some performance-sensitive hypervisor logic at the microcode level, essentially hardware level. That is not paravirtualization; it is hardware virtualization support proper. But in early 370 machines this support was not architected, meaning it was model-dependent and subject to change. With 370/XA, IBM introduced a proper architectural way to support high-performance virtualization: the Start Interpretive Execution (SIE) instruction. This instruction is not documented in Principles of Operation, but rather in a separate publication, IBM System/370 XA Interpretive Execution. (This document is referenced multiple times in Principles of Operation. The link refers to the first version of the document; you can download version 2 here. I am not sure if that publication was ever updated - probably this is the latest version.) Additionally, the I/O subsystem provided VM assists too.
I failed to mention the SIE instruction and the manual that documents it in my original answer, which is a crucial part of the story. I am grateful to the author of the question and the extensive comments that prodded me to check my memory and realize that I had skipped an important bit of technical background. This presentation provides an excellent overview of z/VM core facilities, covering additional aspects including memory management, I/O, networking, etc.
The SIE instruction is how virtualization software accesses the z/Architecture Interpretive Execution Facility (IEF). The exact details of the interface have not been published since the early 1990s.
This is a hardware-based capability. IEF provides two levels of virtualization. The first level is used by firmware (via the SIE instruction) to create partitions. In each partition you can run an operating system. One of those operating systems is z/VM. It uses the SIE instruction (running within the context of the first level SIE instruction) to run virtual machines. You can run any z/Architecture operating system in a virtual machine, including z/VM itself.
The SIE instruction takes as input the description of a virtual server (partition or virtual machine). The hardware then runs the virtual server's instruction stream, stopping only when it needs help from whatever issued the SIE instruction, whether it be the partition hypervisor or the z/VM hypervisor.

Is Kaveri a HSA-compliant processor?

I have looked at lots of HSA introductions and found that an HSA-compliant GPU should be preemptible and support context switching.
But the Wikipedia article "AMD Accelerated Processing Unit" says GPU compute context switching and GPU graphics preemption will only be supported starting with the Carrizo APU (2015).
So I wonder: is Kaveri an HSA-compliant processor?
Thanks!
Kaveri is a 1st generation HSA-compliant APU.
As a 1st-generation part, it is still missing some features of the HSA specification. One of those features is mid-wave preemption, meaning the ability to preempt graphics/compute work in the middle, context-switch to a different wave (unit of work), and then resume the original wave.
Without this feature, Kaveri needs to finish the wave and only then can it move to a different wave.
Having said that, there is already an infrastructure for running HSA applications on Kaveri in Linux (Ubuntu 13/14). See https://github.com/HSAFoundation/Linux-HSA-Drivers-And-Images-AMD for kernel bits and https://github.com/HSAFoundation/Okra-Interface-to-HSA-Device for userspace bits.
This infrastructure also supports the Aparapi and Sumatra projects on Kaveri - running Java code on the GPU.
Hope this helps.

cortex a9 boot and memory

I am a newbie starting out in microcontroller programming. The chip of interest here is the Cortex-A9. From my reading, at reset or power-up there has to be code at 0x00000000. My questions, though they may sound trivial, will help me put some concepts in perspective.
Does the memory address 0x00000000 reside in ROM?
What happens right after the code is read from that address?
Should there be some sort of boot loader present, and if so, at what address should it be, and should it also reside in ROM?
Finally, at what point does the kernel kick in, and where does the kernel code reside?
ARM sells cores not chips, what resides at that address depends on the chip vendor that has bought the ARM core and put it in their chip. Implementations vary from vendor to vendor, chip to chip.
Traditionally an ARM will boot from address zero; more correctly, the reset exception vector is at address zero. Unlike other processor families, the traditional ARM model is NOT a list of addresses for exception entry points; instead the ARM EXECUTES the instruction at that address, which means you need to use either a relative branch or a load-pc instruction. The newer Cortex-M series, which is Thumb/Thumb2 only (those cores cannot execute 32-bit ARM instructions), uses the traditional (non-ARM-like) list of addresses; also, its zero address is not an exception vector but the value to load into the stack pointer, the second entry is reset, and so on. The Cortex-M exception list is different as well: that family has on the order of 128 individual interrupts, where the traditional ARM has two, fast and normal. There is a recent Cortex-M based question, or perhaps phrased as a Thumb2 question, about running Linux on a Thumb2 ARM. I think the Cortex-M implementations are all microcontroller-class chips with only on-chip memory in the tens of kilobytes; basically these don't fall into the category you are asking about. And you are asking about the Cortex-A9 anyway.
A number of cores, or maybe all of them, have a boot option where the boot address can be 0x00000000 or something like 0xFFFF0000 as an alternate address. Using that would be very confusing for ARM users, but it provides the ability, for example, to have a ROM at one address and a RAM at another, allowing you to boot on power-up from ROM and then switch the exception table into RAM for runtime operation. You probably have a chip with a core that can do this, but it is up to the chip vendor whether to use these edge-of-the-core features or to hardwire them to some setting and not provide you that flexibility.
You need to look at the datasheet/docs for the chip in question. Find out the name of the ARM core; you mentioned Cortex-A9. Ideally you want to know the rev as well (an r0p0 kind of thing), then go to ARM's website and find the TRM, the Technical Reference Manual, for that core. You will also want a copy of the ARM ARM, the ARM Architectural Reference Manual. The (traditional) ARM exception vectors are described in the ARM ARM, along with quite a ton more info. You also need the chip vendor's documentation, and look into their boot scheme. Some will point address zero at the boot PROM on power-up; then the bootloader will need to do something, flip a bit in a register, and the memory controller will switch address 0 to RAM. Some might have address 0 always configured as RAM and some other address always configured as ROM, let's say 0x80000000 for example, and the chip will copy some items from ROM to RAM for you before boot; or the chip may simply have the power-up setting for the reset vector be a branch to ROM, and then it is up to the bootloader to patch up the vector table. As many different schemes as you can think of, it is likely someone has tried them, so you have to study the chip vendor's documentation or sample code. Basically the answer to your ROM question is: it depends, and you have to check with the chip vendor.
The ARM TRM for the core should describe the strap options on the core, if any (like being able to boot from an alternate address); it is the vendor that decides how those strap options, if implemented, are wired up. The ARM ARM is not really going to get into that the way the TRM does. A vendor worth buying from, though, will have some of their own documentation and/or code that shows what their ROM-based boot strategy is.
For a system destined to be a Linux system you are going to have a bootloader: some non-Linux code (very much like the BIOS on your desktop/laptop) that brings up the system and eventually launches Linux. Linux is going to need a fair amount of memory (relative to microcontroller and other well-known ARM implementations); that RAM may end up being SRAM or DRAM, and the bootloader may have to initialize the memory interface before it can start Linux. There are popular bootloaders like RedBoot and U-Boot. Both are significant overkill, but they provide features for developers and users, like being able to re-flash Linux, etc.
ARM Linux has ATAGs (ARM TAGs). You can use both the traditional Linux command line to tell Linux boot information, like what address to find the root file system at, and ATAGs. ATAGs are structures in memory whose address is passed to the kernel in a register (r2 in the ARM Linux boot protocol, with r0 zero and r1 holding the machine type) when you branch from the bootloader to Linux. The general concept is that the chip powers up, boots from ROM or RAM, and prepares RAM so that it is ready to use; Linux might want/need to be copied from ROM to RAM, and the root file system, if separate, might want to be copied somewhere else in RAM. ATAGs are prepared to tell the kernel where to decompress Linux if need be, as well as where to find the command line and/or things like the root file system; some registers are prepared as passed parameters to Linux, and lastly the bootloader branches to the address containing the entry point of the Linux kernel.
You have to have boot code available at the address where the hardware starts executing.
This is usually accomplished by having the hardware map some sort of flash or boot ROM to the boot address and start running from there.
Note that in microcontrollers the code that starts running at boot has a pretty tough life: no hardware is initialized yet, and by no hardware I mean that even the DDR controllers that control access to RAM are not working yet... so your code needs to run without RAM.
After the initial boot code sets up enough of the hardware (e.g. initializes the RAM chips, sets up TLBs, programs MACs, etc.), the bootloader runs.
In some systems, the initial boot code is just the first part of the boot loader. In some systems, a dedicated boot code sets things up and then reads the boot loader from flash and runs it.
The job of the boot loader is to bring the image of the kernel/OS into RAM, usually from flash or the network (it can also be shared memory with another board, PCI buses and the like, although that is rarer). Once the boot loader has the image of the kernel/OS binary in RAM, it might optionally uncompress it, and then it hands control to (calls) the start address of the kernel/OS image.
Sometimes the kernel/OS image is actually a small decompressor and a blob of compressed kernel.
At any rate, the end result is that the kernel/OS is available in RAM and the boot loader, optionally through the piggyback decompressor, has passed control to it.
Then the kernel/OS starts running and the OS is up.

How is the BIOS used by a modern OS?

What's the function of the BIOS in a modern OS? Is it still used after booting? And is there some kind of BIOS API?
The BIOS is still the first thing that runs on the just-started CPU and is responsible for getting the motherboard hardware turned on, setting basic chipset modes and registers, initializing some hardware, and running the code that loads the kernel.
The BIOS is usually not used once the kernel is loaded, as it depends on a 16-bit execution environment, as opposed to the 32- or 64-bit protected-mode environment that a modern kernel operates in.
The boot loader normally does require the BIOS I/O calls to get the kernel into memory. The BIOS is being replaced even in this role by newer boot-time software such as Coreboot, to provide faster boot times. EFI will one day replace the traditional BIOS, and hopefully the boot loader too, passing control directly to the kernel after loading it from storage.
The discovered hardware configuration, memory range settings, and ACPI metadata tables are probably the only BIOS-based data used by the OS after the kernel is loaded. Any runnable ACPI code is encoded as ACPI Machine Language and is interpreted by the OS.
Any good traditional book on MS-DOS assembly programming will include information on the BIOS programming interface. Check out The Art of Assembly Language Programming.
I wrote BIOS for notebook computers for several years. The BIOS does a lot of things while the OS is running.
A major task is to inform the OS when many events happen so the OS can look smart (as if it somehow figured these things out on its own). For example, the BIOS tells the OS when: the power button is pressed, batteries are inserted or removed, AC power comes or goes, the system connects to or disconnects from a docking station, hard drives and or certain types of optical drives are inserted or removed.
Most portable computers have features that you can access/control through Fn keys and through OS-level applications provided by the manufacturers. The BIOS responds to these hotkeys and has code to interface with the OS-level apps. Features like controlling screen brightness (which certain OSes want to appear to control) or controlling bling LEDs fall into this category.
Perhaps the most important task of the BIOS is to shut down the system when the power button is held down for more than 4 seconds (to recover from OS hangs!).
The biggest benefit of the OS having control over the BIOS now is control of hardware-level variables such as fan speeds, temperature readings, etc.