Can someone please help me, or direct me on how to configure an ARM Cortex-A76 in gem5?
Would making changes to configs/common/cores/arm/ex_big.py to turn it into an A76 configuration work?
For the ISA interface purely, we have very few "enable this feature" flags in the config; for the most part it is all on an "enable everything that is implemented by default" basis.
For performance modeling, I don't believe enough public information is available for accurate modeling. Have a look at https://en.wikipedia.org/wiki/ARM_Cortex-A76 for some very high-level publicly known parameters you could try to replicate with custom configs, such as out-of-order parameters or cache sizes. Those parameters are accessible in fs.py with --cpu-type and --caches --l2cache --l1d_size=1024 --l1i_size=1024 --l2_size=1024 --l3_size=1024.
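If you want to go further than the command-line cache flags, a rough sketch of a custom core description in the style of the files under configs/common/cores/arm/ might look like the following. This is only an illustration: the class name is made up, the widths and queue sizes are guesses based on publicly reported A76 figures, and the base-class name (DerivO3CPU here) depends on your gem5 version.

    # Illustrative only: values are guesses from public Cortex-A76 reporting
    # (4-wide decode, ~128-entry ROB), not an ARM-provided model.
    from m5.objects import DerivO3CPU

    class A76_like(DerivO3CPU):
        fetchWidth = 4
        decodeWidth = 4
        renameWidth = 4
        dispatchWidth = 8
        issueWidth = 8
        commitWidth = 8
        numROBEntries = 128   # guess
        LQEntries = 68        # load-queue size: guess
        SQEntries = 72        # store-queue size: guess

You would then hook such a class up the same way the existing example core is used, and pair it with caches sized from the Wikipedia figures.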
Mailing list thread: https://www.mail-archive.com/gem5-users@gem5.org/msg16323.html
Related
I noticed that in the gem5 full system provided for ARM (fs.py), the HPI CPU's instruction cache does not use a prefetcher. The source code specifically states "# No prefetcher, this is handled by the core" (HPI.py).
I tried adding a prefetcher to the ICache to look into this issue. The config.ini output showed that the prefetcher was indeed included in the system, but examining stats.txt I saw that it did not issue any prefetch requests. I also spent some time reading through the core source code, but could not identify any component that seemed to handle instruction prefetching.
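To illustrate roughly what I mean by adding a prefetcher, here is a minimal sketch written against the generic classic-cache classes; the class name is made up, the geometry values are placeholders rather than HPI's real ones, and TaggedPrefetcher is just one example choice.

    from m5.objects import Cache, TaggedPrefetcher

    class PrefetchingICache(Cache):
        # Placeholder geometry/latency values, not HPI's actual parameters.
        size = '32kB'
        assoc = 2
        tag_latency = 1
        data_latency = 1
        response_latency = 1
        mshrs = 2
        tgts_per_mshr = 8
        # on_inst=True should, as I understand it, permit instruction-side
        # prefetching.
        prefetcher = TaggedPrefetcher(on_inst=True)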
I was wondering why this is. I can think of scenarios where instruction prefetching could benefit CPU performance, and I am not aware of a technical problem that would prevent implementing it. This is especially confusing to me since the system seems to have been built to allow instruction prefetching: the configuration scripts allow setting a prefetcher for the ICache from the command line, the ICache class is derived from Cache and so can incorporate a prefetcher, and prefetchers in gem5 have an on_inst property which supposedly controls instruction prefetching.
Can anyone explain the logic behind this, and possibly suggest ways to enable prefetching for the instruction cache?
I have studied with interest the hardware virtualization extensions added by Intel and AMD to the x86 architecture (known as VMX and SVM, respectively). While this is still a relatively recent addition to x86 CPUs, my understanding is that mainframe architectures have made extensive use of virtualization since the 1970s-80s, for instance in the form of the venerable z/VM operating system. Even nested virtualization has been used.
My question is: is there public documentation of the hardware facilities provided by the z/Architecture and used by the z/VM operating system to implement this virtualization? I.e., the control registers and data structures that the hardware implements to allow the hypervisor to simulate the guest state and trap the necessary instructions? Another thing I am curious about is whether the z/Architecture supports second-level address translation (which was added later to VMX and SVM).
Just to get it out of the way: System/370 and all its descendants support virtualization as is (they satisfy the virtualization requirements). In that sense, no special hardware support has ever been needed, as opposed to the Intel architecture.
The performance improvements for VM guests on System/370, XA, ESA, etc., all the way through z/Architecture, have traditionally been implemented using the DIAG (diagnose) instruction as well as microcode (now millicode) assists. In modern terms, it is more of a paravirtualization. The facilities are documented; you can start here, for instance.
Update - after reading extensive comments, a few notes and clarifications.
S/370 and its descendants never needed specialized hardware virtualization support to correctly run guest operating systems - not because virtualization was part of the initial design and requirements (it wasn't), but because the architecture was properly designed to support a secure multi-user environment. Popek and Goldberg's virtualization requirements are actually very weak - in essence, that only privileged instructions can affect the system configuration. This requirement was met even by S/370's predecessor, System/360, well before the first virtualized systems appeared.
Performance improvements of VM guests proceeded along two lines.
First, paravirtualization approach - essentially developing well-architected API for guest-hypervisor communication. It's been used not only for performance, but for a wide variety of other services such as inter-VM communication. The API is documented in the manual referred to above.
Second, microcode extensions (VM microcode assist) that performed some performance-sensitive hypervisor logic at the microcode level, essentially the hardware level. That is not paravirtualization; it is hardware virtualization support proper. But in early 370 machines this support was not architected, meaning it was model-dependent and subject to change. With 370/XA, IBM introduced a proper architectural way to support high-performance virtualization, the Start Interpretive Execution (SIE) instruction. This instruction is not documented in Principles of Operation, but rather in a separate publication, IBM System/370 XA Interpretive Execution. (This document is referenced multiple times in Principles of Operation. The link refers to the first version of the document; you can download version 2 here. I am not sure if that publication was ever updated - this is probably the latest version.) Additionally, the I/O subsystem provided VM assists too.
I failed to mention the SIE instruction and the manual that documents it in my original answer, which is a crucial part of the story. I am grateful to the author of the question and the extensive comments that prodded me to check my memory and realize that I had skipped an important bit of technical background. This presentation provides an excellent overview of z/VM core facilities and covers additional aspects including memory management, I/O, networking, etc.
The SIE instruction is how virtualization software accesses the z/Architecture Interpretive Execution Facility (IEF). The exact details of the interface have not been published since the early 1990s.
This is a hardware-based capability. IEF provides two levels of virtualization. The first level is used by firmware (via the SIE instruction) to create partitions. In each partition you can run an operating system. One of those operating systems is z/VM. It uses the SIE instruction (running within the context of the first level SIE instruction) to run virtual machines. You can run any z/Architecture operating system in a virtual machine, including z/VM itself.
The SIE instruction takes as input the description of a virtual server (partition or virtual machine). The hardware then runs the virtual server's instruction stream, stopping only when it needs help from whatever issued the SIE instruction, whether it be the partition hypervisor or the z/VM hypervisor.
I want to read global variables via the JTAG port, live, while a program is running on the microcontroller. Is this possible?
JTAG defines only a physical interface; it does not describe the on-chip debug capabilities of a particular processor, which may or may not support access during execution.
Moreover, whether it can be done in VB is not really the issue; the important issue is what hardware device and/or I/O port you are using for the JTAG interface, and whether a driver and API to access it via .NET are available. That said, VB.NET is not the first language I'd choose for this in any case.
A good place to start perhaps is OpenOCD, though it is not .Net specific.
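If your debug probe happens to be supported by pyOCD (a Python alternative to driving OpenOCD directly), a script along these lines can poll a variable. This is only a sketch: the address is made up, and in practice you would take it from the linker map or the ELF symbol table; whether the read is truly non-intrusive depends on the chip's debug logic.

    from pyocd.core.helpers import ConnectHelper

    # Open a session with whatever probe is attached.
    with ConnectHelper.session_with_chosen_probe() as session:
        target = session.board.target
        value = target.read32(0x20000400)  # placeholder RAM address of the global
        print(f"global = 0x{value:08x}")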
"Almost-Live" is possibly doable, depending on the JTAG implementation. Often JTAG activity which reads memory does so by stealing cycles from the micro (or sometimes even inserting instructions into the pipeline). I'm not sure there's a micro which allows completely transparent access to memory over JTAG.
"All you need to do" is understand the JTAG implementation, know where the variable is located and issue a "memory read" command by wiggling the JTAG pins in the appropriate fashion. This is not a small task, which is why professional engineers are willing to pay (sometimes large amounts of) money for tools which perform this task.
Often the free (limited) toolchains the vendors provide can perform this also.
Yes, I suppose it is possible. But you'll need to drive the JTAG port (that sounds painful!) and know exactly where the data is stored on the chip, and what the formatting is.
I wanted to know what steps one would need to take to "hack" a camera's firmware to add/change features, specifically cameras of Canon or Olympus make.
I can understand this is an involved topic, but a general outline of the steps and what issues I should keep an eye out for would be appreciated.
I presume the first step is to take the firmware, load it into a decompiler (any recommendations?) and examine the contents. I admit I've never decompiled code before, so this will be a good challenge to get me started. Any advice? Books? Tutorials? What should I expect?
Thanks stack as always!
Note: I know about Magic Lantern and CHDK; I want to get technical advice on how they were started and came to be.
http://magiclantern.wikia.com/wiki/Decompiling
http://magiclantern.wikia.com/wiki/Struct_Guessing
http://magiclantern.wikia.com/wiki/Firmware_file
http://magiclantern.wikia.com/wiki/GUI_Events/550D
http://magiclantern.wikia.com/wiki/Register_Map/Brute_Force
General steps for this hacking/reverse engineering:
Gathering information about the camera system (main CPU, image coprocessor, RAM/flash chips, ...). Challenges: camera makers tend to hide such sensitive information, and datasheets/documentation for proprietary chips are not released to the public at all.
Getting the firmware: by dumping the flash memory inside the camera, or by extracting the firmware from the update packages used for camera firmware updates. Challenges: accessing the flash readout circuitry is not a trivial job, especially since camera systems have some of the most densely populated PCBs; also, proprietary firmware is highly protected with sophisticated encryption when embedded into update packages.
Disassembly: getting slightly more readable instructions out of the raw opcodes (see the sketch after these steps). Challenges: although disassemblers are widely available, they only give you the operationally equivalent assembly for the opcodes, with no guarantee of it being human-readable or meaningful.
Customization: once you understand most of the code's functionality, you can make modifications, taking care not to harm the normal operation of the camera. Challenges: not an easy task.
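To make the disassembly step concrete, here is a minimal sketch using the Capstone library. The file name, the slice length, and the load address are placeholders you would replace with values recovered in the earlier steps (Canon's DIGIC processors are ARM-based, hence ARM mode).

    from capstone import Cs, CS_ARCH_ARM, CS_MODE_ARM

    with open("firmware.bin", "rb") as f:   # placeholder dump file
        code = f.read()

    md = Cs(CS_ARCH_ARM, CS_MODE_ARM)       # 32-bit ARM mode; try Thumb as well
    for insn in md.disasm(code[:0x100], 0xFF010000):  # guessed load address
        print(f"0x{insn.address:08x}: {insn.mnemonic} {insn.op_str}")

The output is exactly the "operational" assembly mentioned above: correct instructions, but with no symbols or structure until you do the guessing work yourself.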
Alternatively, I highly recommend looking at an already open-source camera platform (software and also hardware); you can learn a lot about camera systems from it.
Such projects include Elphel and AXIOM.
First the background, specifics of my question will follow:
At the company I work at, the platform we currently work on is the Microchip PIC32 family, using the MPLAB IDE as our development environment. Previously we've also written firmware for the Microchip dsPIC and TI MSP families for this same application.
The firmware is pretty straightforward in that the code is split into three main modules: device control, data sampling, and user communication (usually with a user PC). Device control is achieved via some combination of GPIO bus lines and at least one part needing SPI or I2C control. Data sampling is interrupt-driven, using a Timer module to maintain the sample frequency and more SPI/I2C and GPIO bus lines to control the sampling hardware (i.e., the ADC). User communication is currently implemented via USB using the Microchip App Framework.
So now the question: given what I've described above, at what point would I consider employing an RTOS for my project? Currently I'm thinking of these possible trigger points as reasons to use an RTOS:
Code complexity? The code base architecture/organization is still small enough that I can keep all the details in my head.
Multitasking/Threading? Time-slicing the module execution via interrupts suffices for now for multitasking.
Testing? Currently we don't do much formal testing or verification past the HW smoke test (something I hope to rectify in the near future).
Communication? We currently use a custom packet format and a protocol that pretty much only does START, STOP, SEND DATA commands with data being a binary blob.
Project scope? There is a possibility in the near future that we'll be getting a project to integrate our device into a larger system with the goal of taking that system to mass production. Currently all our projects have been experimental prototypes with quick turn-around of about a month, producing one or two units at a time.
What other points do you think I should consider? In your experience, what convinced (or forced) you to consider using an RTOS vs. just running your code on the base runtime? Pointers to additional resources about designing/programming for an RTOS are also much appreciated.
There are many, many reasons you might want to use an RTOS. They are varied, and the degree to which they apply to your situation is hard to say. (Note: I tend to think this way: RTOS implies hard real time, which implies a preemptive kernel...)
Rate Monotonic Analysis (RMA) - if you want to use Rate Monotonic Analysis to ensure your timing deadlines will be met, you must use a preemptive scheduler (a small worked utilization check follows this list)
Meet real-time deadlines - even without using RMA, with a priority-based pre-emptive RTOS, your scheduler can help ensure deadlines are met. Paradoxically, an RTOS will typically increase interrupt latency due to critical sections in the kernel where interrupts are usually masked
Manage complexity -- definitely, an RTOS (or most OS flavors) can help with this. By allowing the project to be decomposed into independent threads or processes, and using OS services such as message queues, mutexes, semaphores, event flags, etc. to communicate & synchronize, your project (in my experience & opinion) becomes more manageable. I tend to work on larger projects, where most people understand the concept of protecting shared resources, so a lot of the rookie mistakes don't happen. But beware, once you go to a multi-threaded approach, things can become more complex until you wrap your head around the issues.
Use of 3rd-party packages - many RTOSs offer other software components, such as protocol stacks, file systems, device drivers, GUI packages, bootloaders, and other middleware that help you build an application faster by becoming almost more of an "integrator" than a DIY shop.
Testing - yes, definitely, you can think of each thread of control as a testable component with a well-defined interface, especially if a consistent approach is used (such as always blocking in a single place on a message queue). Of course, this is not a substitute for unit, integration, system, etc. testing.
Robustness / fault tolerance - an RTOS may also provide support for the processor's MMU (in your PIC case, I don't think that applies). This allows each thread (or process) to run in its own protected space; threads/processes cannot "dip into" each other's memory and stomp on it. Even device regions (MMIO) might be off limits to some (or all) threads. Strictly speaking, you don't need an RTOS to exploit a processor's MMU (or MPU), but the two work very well hand in hand.
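To make the RMA point above concrete, the classic sufficient schedulability test is just a utilization check against the Liu & Layland bound. A small worked sketch with a made-up task set:

    # Hypothetical task set: (worst-case execution time Ci, period Ti) in ms.
    tasks = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)]

    utilization = sum(c / t for c, t in tasks)
    n = len(tasks)
    bound = n * (2 ** (1 / n) - 1)   # Liu & Layland bound for n tasks

    print(f"U = {utilization:.3f}, bound = {bound:.3f}")
    if utilization <= bound:
        print("Schedulable under rate-monotonic priorities (sufficient test).")
    else:
        print("Bound exceeded; a full response-time analysis is needed to decide.")

For this example U = 0.3 against a bound of about 0.78, so rate-monotonic priorities under a preemptive scheduler would meet the deadlines.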
Generally, when I can develop with an RTOS (or some type of preemptive multi-tasker), the result tends to be cleaner, more modular, more well-behaved and more maintainable. When I have the option, I use one.
Be aware that multi-threaded development has a bit of a learning curve. If you're new to RTOS/multithreaded development, you might be interested in some articles on Choosing an RTOS, The Perils of Preemption and An Introduction to Preemptive Multitasking.
Lastly, even though you didn't ask for recommendations... In addition to the numerous commercial RTOSes, there are free offerings (FreeRTOS being one of the most popular), and the Quantum Platform is an event-driven framework based on the concept of active objects which includes a preemptive kernel. There are plenty of choices, but I've found that having the source code (even if the RTOS isn't free) is advantageous, especially when debugging.
An RTOS, first and foremost, permits you to organize your parallel flows into a set of tasks with well-defined synchronization between them.
IMO, a non-RTOS design is suitable only for a single-flow architecture where your whole program is one big endless loop. If you need multiple flows - a number of tasks running in parallel - you're better off with an RTOS. Without one you'll be forced to implement this functionality in-house, reinventing the wheel.
Code re-use -- if you code drivers/protocol handlers using an RTOS API, they may plug into future projects more easily
Debugging -- some IDEs (such as IAR Embedded Workbench) have plugins that show nice live data about your running process such as task CPU utilization and stack utilization
Usually you want to use an RTOS if you have any real-time constraints. If you don't have real-time constraints, a regular OS might suffice. RTOSes/OSes provide run-time infrastructure like message queues and tasking. If you are just looking for code that can reduce complexity, provide low-level support and help with testing, some of the following libraries might do:
The standard C/C++ libraries
Boost libraries
Libraries available through the manufacturer of the chip that can provide hardware specific support
Commercial libraries
Open source libraries
In addition to the points mentioned before, using an RTOS may also be useful if you need support for
standard storage devices (SD, Compact Flash, disk drives ...)
standard communication hardware (Ethernet, USB, Firewire, RS232, I2C, SPI, ...)
standard communication protocols (TCP-IP, ...)
Most RTOSes provide these features or are expandable to support them.