Beagleboard bare metal programming - embedded

I just got my BeagleBoard-xM and I'm wondering if there are any detailed step-by-step tutorials on how to get a very simple piece of bare metal software running on the hardware?
The reason I ask is that I want to deeply understand how the hardware architecture works: everything from the bootloader, linkers, interrupts, exceptions, the MMU, etc. I figured the best way is to get a simple hello-world program to execute on the BeagleBoard-xM without an OS. Nothing advanced, just start up the board and get a "hello world" output on the screen. That's it!
The next step would be getting a tiny OS to run that can schedule some very simple tasks. No filesystem needed, just enough to understand the basics of an OS.

Absolutely no problem...
First off, get the serial port up and running. I have one of the older/earlier beagleboards and remember the serial port (and just about everything about the I/O) being painful; nevertheless, get a serial port on it so you can see it boot.
It boots u-boot, I think, and you can press a key (or Esc, or something like that) to interrupt the normal boot into Linux. From the u-boot prompt it is easy to load your first simple programs.
I have some beagleboard code handy at the moment but don't have my beagleboard itself handy to try it on. So go to http://sam7stuff.blogspot.com/ to get an idea of how to mix some startup assembler and C code for OS-less embedded programs (for ARM; I have a number of examples out there for other thumb/cortex-m3 platforms, but those boot a little differently).
The sam7's ports for things and its memory address space are totally different from the beagleboard/OMAP's. The above is a framework that you can change or re-invent.
You will need the OMAP35x technical reference manual from ti.com. Search for the OMAP part number, OMAP3530, on their site.
Also get the beagleboard documentation. For example, it contains this statement:
A single RS232 port is provided on the BeagleBoard and provides access to the TX and RX lines of UART3
So searching for UART3 in the TRM for the OMAP shows that it is at a base address of 0x49020000. (Often it is very difficult to figure out the entire address for something, as the manuals usually have part of the memory map here and another part there, and near the register descriptions only the lower few bits of the address are called out.)
Looking at the UART registers, THR_REG is where you write bytes to be sent out the UART; note that it is a 16-bit register.
Knowing this we can make the first program:
.globl _start
_start:
    ldr r0,=0x49020000   @ UART3 base address; THR_REG sits at offset 0
    mov r1,#0x55         @ 0x55 is ASCII 'U'
    strh r1,[r0]         @ 16-bit store into the transmit holding register
    strh r1,[r0]
    strh r1,[r0]
    strh r1,[r0]
    strh r1,[r0]
hang: b hang             @ done: loop forever
Here is a makefile for it:
ARMGNU = arm-none-linux-gnueabi
AOPS = --warn --fatal-warnings
COPS = -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding
uarttest.bin : uarttest.s
	$(ARMGNU)-as $(AOPS) uarttest.s -o uarttest.o
	$(ARMGNU)-ld -T rammap uarttest.o -o uarttest.elf
	$(ARMGNU)-objdump -D uarttest.elf > uarttest.list
	$(ARMGNU)-objcopy uarttest.elf -O srec uarttest.srec
	$(ARMGNU)-objcopy uarttest.elf -O binary uarttest.bin
And the linker script that is used:
/* rammap */
MEMORY
{
    ram : ORIGIN = 0x80300000, LENGTH = 0x10000
}
SECTIONS
{
    .text : { *(.text*) } > ram
}
Note that the Linux version from CodeSourcery is called out above; you do not need that particular GNU cross compiler. In fact, this code, being asm only, needs just an assembler and linker (the binutils stuff). An arm-none-eabi-... type cross compiler will work as well (assuming you get the Lite tools from CodeSourcery).
Once you have a .bin file, look at the help in u-boot. I don't remember the exact command, but it is probably something like loadx 0x80300000 or loady 0x80300000. Basically you want to X-, Y-, or Z-modem the .bin file over the serial port into the processor's memory space, then use go 0x80300000 (or whatever the command is) to tell u-boot to branch to your program.
You should see a handful of U characters (0x55 is 'U') come out the serial port when it runs.
Your main goal up front is to get a simple serial port routine up so you can print stuff out to debug and otherwise see what your programs are doing. Later you can get into graphics, etc., but first use the serial port.
There was some cheating going on: since u-boot came up and initialized the serial port, we didn't have to; we just shoved bytes into the THR. But quickly you will overflow the THR's storage and lose bytes, so you then need to read the TRM for the OMAP and find some sort of bit that indicates the transmitter is empty (it has transmitted everything), then create a uart_send type function that polls for transmitter-empty and then sends one byte out.
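As a rough illustration, here is what a polled uart_send could look like in C. The LSR offset and TX-empty bit below are assumptions based on the 16550-style lineage of this UART; verify both against the OMAP35x TRM before trusting them.

#define UART3_BASE 0x49020000
#define THR_REG (*(volatile unsigned int *)(UART3_BASE + 0x00))
#define LSR_REG (*(volatile unsigned int *)(UART3_BASE + 0x14)) /* assumed offset */
#define TX_EMPTY 0x20                      /* assumed transmitter-empty bit in LSR */

void uart_send(unsigned char c)
{
    while ((LSR_REG & TX_EMPTY) == 0)
        ;                                  /* poll until the transmitter has drained */
    THR_REG = c;                           /* then hand over exactly one byte */
}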
Also, forget about printf(); you need to create your own routines to print a number (octal or hex are the easiest) and perhaps print a string. I do this sort of work all day and all night, and 99% of the time all I use is a small routine that prints 32-bit hex numbers out the UART. From the numbers I can debug and see the status of the programs.
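Such a routine is only a few lines on top of the uart_send() sketch above:

void hexstring32(unsigned int d)
{
    int shift;
    for (shift = 28; shift >= 0; shift -= 4) /* most significant nibble first */
    {
        unsigned int nib = (d >> shift) & 0xF;
        uart_send(nib < 10 ? '0' + nib : 'A' + (nib - 10));
    }
    uart_send('\r');
    uart_send('\n');
}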
So take the sam7 model or something like it. Note that the compiler and linker command-line options are important, as is the order of files on the link command line: the first file has to be your entry point if you want the first instruction/word in the .bin file to be your entry point. That is usually a good idea, as down the road you will want to know how to control this for booting from a ROM.
You can probably do quite a bit without removing or replacing u-boot. If you start to look at the Linux-oriented boot commands for u-boot, you will see that it is copying what is pretty much a .bin file from flash (or somewhere) into a spot in RAM, then branching to it. Branching to Linux, especially ARM Linux, involves some ARM tables and possibly setting up some registers, which your programs won't want or need. Basically, whatever command you figure out to use after you have copied your program to RAM is what you will script in a u-boot boot script, should you choose to have the board boot and run the way it does with Linux.
That said, you can use JTAG and not rely on u-boot at all. When you go that path, though, there are likely a certain number of things you have to do on boot to get the chip up and running; in particular, configuring the UART likely involves a few clock dividers somewhere, clock enables, I/O enables, various things like that, which is why the sam7 example starts with a blink-the-LED thing instead of a UART thing. The Amontec JTAG-Tiny is a good JTAG wiggler; I have been quite pleased with them and use them all day long, every day, at work. The BeagleBoard probably uses a TI pinout and not the standard ARM pinout, so you will likely need to change the cabling. And I don't know if the OMAP gives you direct access to the ARM TAP controller or if you have to do something TI-specific. You are better off just going the u-boot route for the time being.
Once you have a framework with a small amount of asm to set up the stack and branch to your entry-point C code, you can start to turn that C code into an OS or do whatever you want. If you look at ChibiOS or Prex or others like them, you will find they have small asm boot code that gets them into their system. Likewise, there are UART debug and non-debug routines in there. Many RTOSes are going to want to use interrupts rather than polling for the THR to be empty.
If this post doesn't get you up and running with your hello world (letting you do some of the work), let me know and I will dig out my beagleboard and create a complete example. My board doesn't exactly match yours, but as far as hello world goes it should be close enough.

You could also try TI StarterWare:
http://www.ti.com/tool/starterware-sitara

Related

What is the exact role of an interpreter?

I'm having trouble understanding the exact role of an interpreter. To quote Wikipedia: "Programs in interpreted languages are not translated into machine code, although their interpreter (which may be seen as an executor or processor) typically consists of directly executable machine code (generated from assembly and/or high level language source code)."
My doubt is about this statement: "interpreter (which may be seen as an executor or processor) typically consists of directly executable machine code". What does that mean? An interpreter is supposed to be a program. How can it 'execute' code by itself? They have re-stated this fact by saying "an interpreter is different from language translators like compilers". Can anyone clarify, please? Also, what is the difference (if any) between an interpreted language and machine code?
Compiler:
Transforms your code into binary machine code which can be directly executed by the CPU. Examples: C, Fortran.
Interpreter:
A program that executes the code written by the programmer without an additional transformation step. Examples: Bash scripts, formulas in Excel.
Actually, it is not that easy any more; there are many concepts between these two poles. Java is compiled into an intermediate language that is then interpreted, and just-in-time compilers compile small parts of interpreted code to speed them up.
"How can it 'execute' code by itself?" Take the Excel example. If you type a calculation into a cell, Excel somehow executes the code, right? But Excel does not compile the code and run it; it parses it and executes it in a general way. Excel has a sum function that in the end is executed on the processor as an add machine instruction, but there is a lot for Excel to do in between.
I will briefly describe an emulator to explain the main concept mentioned in the question.
Suppose I am using MAME, a video game emulator, and select the old classic arcade game Ms. Pac-Man. Looking at the schematic, or looking directly at a PCB inside an arcade video game, it is easy to find the processor: the Zilog Z80, the only large chip with 40 pins. Now, if we get the technical data for that processor, we can find the binary encoding for each instruction it can execute. Basically, it gets 8-bit data (values ranging from 0 to 255) which tells the processor what to do. In the case of the emulator, it reads the bytes (the exact same bytes the Z80 processor inside the original Ms. Pac-Man board would read), determines what a Z80 would do, and simulates the instruction.
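A stripped-down sketch of that fetch-decode-execute loop in C is below. The two opcodes shown are real Z80 encodings; everything else about a real emulator (the full opcode table, flags, timing, I/O) is omitted.

#include <stdint.h>
#include <stdio.h>

static uint8_t  mem[65536];  /* the emulated address space */
static uint16_t pc;          /* the emulated program counter */
static uint8_t  reg_a;       /* the emulated A register */

static void run(void)
{
    for (;;) {
        uint8_t opcode = mem[pc++];  /* fetch */
        switch (opcode) {            /* decode */
        case 0x3E:                   /* LD A,n: load immediate into A */
            reg_a = mem[pc++];
            break;
        case 0x76:                   /* HALT */
            printf("halted, A=0x%02X\n", reg_a);
            return;
        default:
            printf("unimplemented opcode 0x%02X\n", opcode);
            return;
        }
    }
}

int main(void)
{
    mem[0] = 0x3E; mem[1] = 0x42; /* LD A,0x42 */
    mem[2] = 0x76;                /* HALT */
    run();
    return 0;
}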
Some classic video games may have used an x86 processor, similar to the one currently used in most PCs. Even when selecting such a game in MAME, the emulator would still read the bytes found in that game and interpret each one the way the x86 processor would. In other words, the emulator does not take advantage of the fact that the PC and the emulated game use a similar processor. It performs the same steps to emulate any game, no matter whether the PC on which MAME is running shares any similarity with the original game.
You are asking how an interpreter could execute code? The interpreter is a program (the interpreter is just software, not a physical processor). The wording is indeed confusing. For that sentence to make sense, we would need all of the following conditions:
1 - the program to interpret is already in binary, in a machine language that can be executed directly by the processor in your PC
2 - the program's location, the exact addresses used, matches a location you can reserve in your PC
3 - any libraries and any I/O occupy the exact same addresses
When all these conditions are met, the interpreter could just tell the processor in your PC to stop executing the interpreter's own code and instead "jump" into the code of the program to be interpreted. Anyone could then say: that is not an interpreter, it is just a launcher.
Maybe such an interpreter, which does not actually interpret but lets your processor do the real job, is still useful in the following way: it could let your processor perform some of the work but request the generation of an exception when the interpreted code executes certain types of instruction. For example, let the code run, but generate a "general protection fault" or "trap" or "exception" on any attempt to execute a variant of IN or OUT. The interpreter would take note of the I/O port being written, or it would choose a value to return instead of allowing a read of a real I/O port. The interpreter would then arrange for the processor to "jump" back into the interpreted program at the location just after the IN or OUT instruction.
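A sketch of that trap-and-emulate pattern is below. The structure, helper names, and single decoded opcode are all illustrative; a real implementation hangs this off a signal handler or hypervisor exit, and x86 IN/OUT has several more encodings than the one shown.

struct guest_cpu { unsigned long pc; unsigned char al; };

extern unsigned char *guest_mem;                 /* the guest's address space */
extern void emulate_port_write(int port, unsigned char value);

/* Called when the guest faulted on a privileged I/O instruction. */
void on_io_trap(struct guest_cpu *cpu)
{
    unsigned char *insn = guest_mem + cpu->pc;
    if (insn[0] == 0xE6) {                       /* x86 "OUT imm8, AL" */
        emulate_port_write(insn[1], cpu->al);    /* emulate the port write */
        cpu->pc += 2;                            /* resume just after the OUT */
    }
    /* ... other IN/OUT encodings would be decoded here ... */
}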
Normally, an interpreter reads an ASCII text file (which could be Unicode instead of ASCII), the original source code, determines line by line, word by word, what a compiler would do, then simulates the task on the fly. When the original compiler would need to read many lines to fully understand the current task, the interpreter also needs to read all those lines before being able to simulate the same task.
A big advantage of an interpreter is that it is very hard to crash the machine with one. Because every instruction is simulated, it is far less sensitive to bugs or malicious code. That was a big advantage back when computers needed to reboot after encountering any bug, at a time when a reboot could take 10 minutes or more.
Today, with fast SSDs that reboot a machine in 5 seconds and with reliable operating systems that can trap any error in one process and close that process without affecting the stability of the machine, there is less incentive to prefer a slow interpreter over a much faster JIT or a much, much faster binary executable.

How can I learn to build my own bootloader for an embedded system?

Does anyone know of a good learning resource for building your own bootloader for an embedded system? From reading various textbooks, I have a good overview of what a bootloader is supposed to do, and some textbooks include snippets of assembler to show how the bootloader should be built.
However, when I search for resources/tutorials that describe how to build a bootloader, everything I've found so far is either too advanced (assuming knowledge of certain preliminaries, and thus hard to follow) or deals with creating a bootloader for a PC or an emulator. Ideally, I'm looking for a single resource/book that covers the preliminaries and walks me through the process. I'm happy to purchase a particular chip and the relevant cables if the tutorial/textbook requires that.
The term bootloader is quite broad, so does your quest have its roots in a few dozen lines of code implementing a serial bootloader, or are you interested in a Linux-type full-blown operating system (u-boot) that has gobs of features and drivers and stacks?
If you don't already know the answer, or don't know what I am talking about, you need to figure that out; I would start small. Even if you want the huge monster operating-system solution, you should start with bare metal (which is what a bootloader is: a bare metal program). The chip comes out of reset and your code runs first: blink an LED (see the sketch below for the shape of that first program). Then control the rate of the blinking LED. Then, if you have a push button, read the push button and make it change the LED (demonstrating input and output). Then find and use a timer to blink the LED if you didn't already (use polling first, please; interrupts later). Now you can do the clock math and have an idea how the chip's clock tree works, so use that to get a UART up: TX only first, then RX/TX, echoing what you receive. THEN you are ready to talk about your first bootloader, which should involve some serial protocol (invent your own or use xmodem or something) that actually "boots" and then lets you "load" other programs after booting.
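To make that very first step concrete, here is the shape of a first blink program in C. The register addresses are placeholders, since every chip puts its GPIO block somewhere different; look yours up in the part's reference manual.

#include <stdint.h>

/* Placeholder addresses: substitute the real ones from your chip's manual. */
#define GPIO_DIR (*(volatile uint32_t *)0x40020000) /* direction register (hypothetical) */
#define GPIO_OUT (*(volatile uint32_t *)0x40020004) /* output register (hypothetical) */
#define LED_BIT  (1u << 5)

void notmain(void)
{
    volatile uint32_t i;
    GPIO_DIR |= LED_BIT;              /* make the LED pin an output */
    for (;;) {
        GPIO_OUT ^= LED_BIT;          /* toggle the LED */
        for (i = 0; i < 200000; i++)  /* crude software delay */
            ;
    }
}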
You can do all of this (well, with virtual LEDs in some other form) using simulators, and that may not be a bad idea, since the hard part of bare metal is first off controlling the assembler, compiler, and linker to make a binary that actually boots up and runs. Then there are piles of sub-$10 and sub-$20 boards that you can learn to write a bootloader for (the msp430 LaunchPads, the other LaunchPads, the stm32f0 and f4 Discovery boards, the Raspberry Pi; probably not the Beagles, avoid those for now; oh, and a myriad of AVR-based boards). Avoid x86; start with microcontrollers: ARM, AVR, msp430.

u-boot : Relocation

This one is a basic question related to u-boot.
Why does the u-boot code relocate itself ?
OK, it makes sense if u-boot is executing from NOR flash or boot-ROM space, but if it is already running from SDRAM, why does it have to relocate itself once again?
This question comes up frequently. Good answers sometimes too.
I agree it is handy to load the build into SDRAM during development. That works for me; I do it all the time. I have some special boot code in flash which does not enable the MMU/cache. For my u-boot builds I switch CONFIG_SYS_TEXT_BASE between the flash and RAM builds. I run my development builds that way routinely.
As a practical matter, handling re-initialization of the MMU/cache would be a nontrivial matter. And U-Boot benefits, IMO, from the simplicity that results from leaving out things like that.
The tech lead at Denx has expressed his opinion. IIRC his other posts are more strongly worded than that one. I get the impression that he does not like to repeat himself.
Update: why relocate? Memory access is faster from RAM than from ROM; this matters particularly if the target has no instruction cache. Executing from RAM allows flash reprogramming; it also (more minor) allows software breakpoints with "trap" instructions; and it is more like the target's normal mode of operation, so if, e.g., burst reads from RAM are iffy, the failure will be seen at early boot.
U-boot has to reserve 3 regions in memory that store: 1) u-boot itself, 2) the uImage (compressed kernel), and 3) the uncompressed kernel. These 3 regions must be carefully placed to prevent conflicts.
However, the previous-stage boot loader (BL1 or BL2) that brings u-boot into DRAM doesn't know u-boot's plan for these 3 regions. So it can only load u-boot at a lower address in DRAM and jump to it. Then, after u-boot executes some basic initialization and detects that the current PC is not at the planned location, u-boot calls a relocate function that moves u-boot to the planned location and jumps to it.
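Reduced to a sketch, the core of that relocation is a copy-and-jump. u-boot's real relocate_code also patches relocation entries and switches stacks, which this deliberately leaves out; current_base would be derived from the program counter in the early assembler.

#include <stdint.h>
#include <string.h>

void relocate_if_needed(uintptr_t current_base, uintptr_t planned_base,
                        size_t image_size, uintptr_t entry_offset)
{
    if (current_base == planned_base)
        return;                       /* already running from the planned spot */
    memcpy((void *)planned_base, (void *)current_base, image_size);
    /* continue executing the copy at its planned address */
    ((void (*)(void))(planned_base + entry_offset))();
}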
The code in NOR flash must first initialize the SDRAM, then copy the code from NOR flash to SDRAM; the process copies itself. And because you may enable the MMU, virtual address mapping can then be set up.

On reset what happens in embedded system?

I have some questions about the reset that happens at power-up:
1. As I understand it, a microcontroller is hardwired to start at some particular memory location, say 0000H, on power-up. Is the interrupt service routine for reset (initialization of the stack pointer, program counter, etc.) written at 0000H, or does 0000H hold the reset address (say 7000H) so that the microcontroller jumps to 7000H, where the initialization of the stack and PC is written?
2. Who writes this reset service routine? Is it the manufacturer of the microcontroller chip (Intel, Microchip, etc.), or can any programmer change it (for example, changing the PC to 4000H instead of 7000H on power-up reset, so that the first instruction is fetched from 4000H instead of 7000H)?
3. How are the stack pointer and program counter initialized to their respective initial addresses, given that on power-up the microcontroller is not in a state to put addresses into the stack pointer and program counter registers (no initialization has been done until the reset service routine runs)?
4. What should be the steps in the reset service routine, considering all possibilities?
With reference to your numbering:
The hardware reset process is processor dependent and will be fully described in the data sheet or reference manual for the part, but your description is generally the case - different architectures may have subtle variations.
While some microcontrollers include a ROM-based boot loader that may contain start-up code, typically such bootloaders are only used to load code over a communications port, either to program flash memory directly or to load and execute a secondary bootloader in RAM that then programs the flash. As far as the C runtime start-up goes, this is either provided with the compiler/toolchain, or you write it yourself in assembler. Normally, even when start-up code is provided by the compiler vendor, it is supplied as source to be assembled and linked with your application. The compiler vendor cannot always know things like the memory map, SDRAM mapping and timing, or the processor clock speed or which oscillator crystal is used in your hardware, so the start-up code will generally need customisation or extension through initialisation stubs that you must implement for your hardware.
On ARM Cortex-M devices the initial PC and stack pointer are in fact loaded by hardware: they are stored at the reset address and loaded on power-up. In the general case, however, you are right: the reset address contains either the start-up code or a vector to the start-up code (on pre-Cortex ARM architectures, the reset address actually contains a jump instruction rather than a true vector address). Either way, the start-up code for a C/C++ runtime must at least initialise the stack pointer, initialise static data, perform any necessary C library initialisation, and jump to main(). In the case of C++, it must also execute the constructors of any global static objects before calling main().
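On Cortex-M that hardware-loaded table can be written directly in C. A minimal sketch (the .vectors section must be placed at the reset address by the linker script; _estack and Reset_Handler are illustrative names):

#include <stdint.h>

extern uint32_t _estack;       /* top of stack, defined in the linker script */
void Reset_Handler(void);      /* the start-up code entry point */

/* The core reads word 0 into SP and word 1 into PC before executing anything. */
__attribute__((section(".vectors")))
const uint32_t vector_table[] = {
    (uint32_t)&_estack,        /* initial stack pointer */
    (uint32_t)Reset_Handler,   /* reset vector: initial program counter */
    /* ... exception and interrupt vectors follow ... */
};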
The processor cores normally have, as you say, a starting address of some sort: either a table (a list of addresses) or, like ARM, a place where instructions are executed. What is wrapped around that core, but within the chip, can vary. Cores that are not specific to the chip vendor (8051, MIPS, ARM, XScale, etc.) are going to have a much wider range of different answers. Some microcontroller vendors, for example, will look at strap pins, and if a strap is wired a certain way when reset is released, the part executes from a special boot flash inside the chip: a bootloader that you can, for example, use to program the user boot flash. If the strap is not tied that way, sometimes it boots your user code instead. One vendor I know of always boots its bootloader flash; if the vector table has a valid checksum, it jumps to the reset vector in your vector table, otherwise it sits in bootloader mode waiting for you to talk to it.
When you get into the bigger processors (non-microcontrollers), software lives outside the processor, either in a boot flash (a separate chip from the processor) or in some RAM that is managed somehow before reset. Those usually follow the rule for the core: start at address 0xFFFFFFF0, or start at address 0x00000000. If there is garbage there, oh well, fire off the undefined-instruction vector; if that is garbage too, just hang there, or sit in an infinite loop calling the undefined-instruction vector. This works well for an ARM, for example: you can build a board with a boot flash that is erased from the factory (all 0xFFs), then use JTAG to stop the ARM and program the flash the first time, and you don't have to unsolder or socket or pre-program anything. So long as your bootloader doesn't hang the ARM, you can have an unbrickable design. (Actually, you can often hold the ARM in reset and still get at it with the JTAG debugger, and not worry about bad code messing with the JTAG pins or hanging the ARM core.)
The short answer: how many different processor chip vendors have there been? There are many different solutions; as many as you can think of, and more, have been deployed. Placing a reset handler address at a known place in memory is the most common, though.
EDIT:
Questions 2 and 3: if you are buying a chip, some of the microcontrollers have this protected bootloader, but even with that, normally you write the boot code that will be used by the product. And part of that boot code is to initialize the stack pointers, prepare memory, and bring up parts of the chip, all those good things. Sometimes chip vendors will provide examples. If you are buying a board-level product, then often you will find a board support package (BSP) which has working example code to bring up the board and perhaps do a few things. The BeagleBoard, for example, or the OpenRD or boards from embeddedarm.com come with a bootloader (u-boot or other), and some already have Linux pre-installed. With boards like that the user usually just writes some Linux apps/drivers and adds them to the BSP, but you are not limited to that; you are often welcome to completely re-write and replace the bootloader. And whoever writes the bootloader has to set up the stacks and bring up the hardware, etc.
On systems like the Game Boy Advance or the NDS or the like, the vendor has some start-up code that calls your start-up code. So the stack and such may be set up for you, but they are handing off to you; much of the system may already be up, and you just get to decide how to slice up the memories: where you want your stack, data, program, etc.
Some vendors want to keep this stuff controlled or secret; others do not. In some cases you may end up with a board or chip with no example code, just some data sheets and reference manuals.
If you want to get into this business, though, you need to be prepared to write this start-up code (in assembler), which may call some C code to bring up the rest of the system, which then starts up the main operating system or application or whatever. Microcontrollers sound like what you are playing with; the answers to your questions are in the chip vendors' user's guides. Some vendors are better than others; search for the word "reset" or "boot" in the document to try to figure out what their boot scheme is. I recommend you use "dollar votes" to choose the better vendors. A vendor with bad docs, secret docs, or bad support: don't give them your money. Spend your money on vendors with freely downloadable, well-written docs, well-written examples, and/or user forums with full-time employees trolling around answering questions. There are times when the docs are not available except to serious, paying customers; it depends on the market. Most general-purpose embedded systems, though, are openly documented. The quality varies widely, but the docs, etc., are there.
It depends completely on the controller/embedded system you use. The ones I've used in game development have the IP point at a starting address in RAM. The bootstrap code supplied by the compiler initializes static/const memory, sets the stack pointer, and then jumps execution to a main() routine of some sort. Older systems also started at a fixed address, but you manually had to set up the stack, the starting vector table, and other stuff in assembler. A common name for the starting assembler file, in the stuff I've done, is crt0.s.
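The C-visible half of a typical crt0 looks roughly like this sketch; the symbol names are linker-script conventions that vary between toolchains, and the stack pointer itself has to be set in assembler before any of this can run.

#include <stdint.h>

extern uint32_t _sidata;         /* load address of .data (in ROM/flash) */
extern uint32_t _sdata, _edata;  /* run address of .data (in RAM) */
extern uint32_t _sbss, _ebss;    /* bounds of .bss (in RAM) */
extern int main(void);

void startup(void)               /* called from the asm reset handler */
{
    uint32_t *src = &_sidata;
    uint32_t *dst = &_sdata;
    while (dst < &_edata)        /* copy initialized data from ROM to RAM */
        *dst++ = *src++;
    for (dst = &_sbss; dst < &_ebss; )
        *dst++ = 0;              /* zero the .bss section */
    main();                      /* hand off to the application */
    for (;;)                     /* trap here if main ever returns */
        ;
}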
So: 1. You are correct; the microprocessor has to start at some fixed address.
2. The ISR can be supplied by the manufacturer or the compiler creator, or you can write one yourself, depending on the complexity of the system in question.
3. The stack and initial program counter are usually handled via some sort of bootstrap routine that quite often can be overridden with your own code. See above.
Last: the steps will depend on the chip. If there is a power interruption of any sort, RAM may be scrambled, so all ISR vector tables and startup code should be rewritten to RAM and the app run as if it had just powered up. But read your documentation! I'm sure there is platform-specific material there that will answer these questions for your specific case.

How does one use dynamic recompilation?

It came to my attention that some emulators and virtual machines use dynamic recompilation. How do they do that? In C, I know how to call a function in RAM using typecasting (although I've never tried it), but how does one read opcodes and generate code for them? Does one need to have premade assembly chunks and copy/batch them together? Is the assembly written in C? If so, how do you find the length of the code? How do you account for system interrupts?
-edit-
System interrupts, and how to (re)compile the data, are what I am most interested in. Upon more research I heard of one person (no source available) who used JS: read the machine code, output JS source, and use eval to 'compile' the JS source. Interesting.
It sounds like I MUST have knowledge of the target platform's machine code to dynamically recompile.
Yes, absolutely. That is why parts of the Java Virtual Machine must be rewritten (namely, the JIT) for every architecture.
When you write a virtual machine, you have a particular host-architecture in mind, and a particular guest-architecture. A portable VM is better called an emulator, since you would be emulating every instruction of the guest-architecture (guest-registers would be represented as host-variables, rather than host-registers).
When the guest- and host-architectures are the same, like VMWare, there are a ton of (pretty neat) optimizations you can do to speed up the virtualization - today we are at the point that this type of virtual machine is BARELY slower than running directly on the processor. Of course, it is extremely architecture-dependent - you would probably be better off rewriting most of VMWare from scratch than trying to port it.
It's quite possible - though obviously not trivial - to disassemble code from a memory pointer, optimize the code in some way, and then write back the optimized code - either to the original location or to a new location with a jump patched into the original location.
Of course, emulators and VMs don't have to RE-write, they can do this at load-time.
This is a wide-open question, and I'm not sure where you want to go with it. Wikipedia covers the generic topic with a generic answer: the native code being emulated or virtualized is replaced with native code. The more the code runs, the more is replaced.
I think you need to do a few things. First, decide if you are talking about emulation or a virtual machine like VMware or VirtualBox. In an emulation, the processor and hardware are emulated using software: the next instruction is read by the emulator, the opcode is pulled apart by code, and you determine what to do with it. I have been doing some 6502 emulation and static binary translation, which is dynamic recompilation but pre-processed instead of real-time.
Say your emulator hits an LDA #10 (load A immediate). It sees the load-A-immediate instruction, knows it has to read the next byte (the immediate), has a variable in the code for the A register, and puts the immediate value in that variable. Before completing the instruction, the emulator needs to update the flags: in this case the Z flag is clear, the N flag is clear, and C and V are untouched. But what if the next instruction is a load X immediate? No big deal, right? Well, the load X will also modify the Z and N flags, so the next time you execute the load A instruction you may figure out that you don't have to compute the flags at all, because they will be destroyed anyway: it is dead code in the emulation.
You can continue with this kind of thinking. Say you see code that copies the X register to the A register, pushes A on the stack, then copies the Y register to A and pushes that on the stack: you could replace that chunk with simply pushing the X and Y registers on the stack. Or you may see a couple of add-with-carries chained together to perform a 16-bit add and store the result in adjacent memory locations. Basically, you look for operations that the processor being emulated couldn't do but that are easy to do in the emulation. Static binary translation, which I suggest you look into before dynamic recompilation, performs this analysis and translation in a static manner, i.e. before you run the code. Instead of emulating, you translate the opcodes to C, for example, and remove as much dead code as you can (a nice feature is that the C compiler can remove more dead code for you).
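To make the LDA example concrete, here is roughly what the emulation of those two load-immediate instructions looks like in C; reg_a, reg_x, and the flag variables are the emulator's own state, and the flag stores are exactly the kind of work a translator might later prove dead and delete.

#include <stdint.h>

static uint8_t  mem[65536];    /* emulated 6502 address space */
static uint16_t pc;
static uint8_t  reg_a, reg_x;
static int flag_z, flag_n;     /* C and V are untouched by loads */

static void step(void)
{
    switch (mem[pc++]) {
    case 0xA9:                           /* LDA #imm */
        reg_a = mem[pc++];
        flag_z = (reg_a == 0);
        flag_n = (reg_a & 0x80) != 0;
        break;
    case 0xA2:                           /* LDX #imm: clobbers Z and N too */
        reg_x = mem[pc++];
        flag_z = (reg_x == 0);
        flag_n = (reg_x & 0x80) != 0;
        break;
    }
}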
Once the concepts of emulation and translation are understood, you can try to do it dynamically; it is certainly not trivial. I would suggest first doing a static translation of a binary to the machine code of the target processor, which is a good exercise. I wouldn't attempt dynamic run-time optimizations until I had succeeded in performing them statically against a binary.
Virtualization is a different story: you are talking about running the same processor on the same processor, x86 on x86 for example. The beauty here is that, using non-ancient x86 processors, you can take the program being virtualized and run the actual opcodes on the actual processor, with no emulation. You set up traps built into the processor to catch things. So loading values into AX and adding BX, etc., all happen in real time on the processor. When AX wants to read or write memory, it depends on your trap mechanism: if the addresses are within the virtual machine's RAM space, no traps; but say the program writes to an address which is the virtualized UART. The processor traps, and then VMware (or whatever) decodes that write and emulates it, talking to a real serial port. That one instruction, though, wasn't real-time; it took quite a while to execute. What you could do, if you chose to, is replace the instruction (or set of instructions) that writes a value to the virtualized serial port, and have it write to a different address instead: one that could be the real serial port, or some other location that will not cause a fault forcing the VM manager to emulate the instruction. Or add some code in the virtual memory space that performs a write to the UART without a trap, and have that code branch to this UART-write routine. The next time you hit that chunk of code, it runs in real time.
Another thing you can do is, for example, emulate and, as you go, translate to a virtual intermediate byte code, like LLVM's. From there you can translate from the intermediate machine to the native machine, eventually replacing large sections of the program, if not the whole thing. You still have to deal with the peripherals and I/O.
Here's an explanation of how they are doing dynamic recompilation for the Rubinius Ruby interpreter:
http://www.engineyard.com/blog/2010/making-ruby-fast-the-rubinius-jit/
This approach is typically used by environments with an intermediate byte-code representation (like Java and .NET). The byte code contains enough "high-level" structure (high level in the sense of higher level than machine code) that the VM can take chunks of the byte code and replace them with a compiled memory block. The VM typically decides which parts get compiled by counting how many times the code has already been interpreted, since the compilation itself is a complex and time-consuming process; so it is useful to compile only the parts which get executed many times.
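A sketch of that counting dispatch in C; a real system would generate machine code inside compile_to_native(), which here is just a named stand-in to show the control flow.

#define HOT_THRESHOLD 1000

typedef void (*native_fn)(void);

struct method {
    const unsigned char *bytecode;
    unsigned call_count;
    native_fn compiled;              /* NULL until the method gets hot */
};

extern void interpret(const unsigned char *bytecode);      /* slow path */
extern native_fn compile_to_native(const unsigned char *); /* the JIT */

void invoke(struct method *m)
{
    if (m->compiled) {               /* hot path: run generated code */
        m->compiled();
        return;
    }
    if (++m->call_count >= HOT_THRESHOLD)
        m->compiled = compile_to_native(m->bytecode);
    interpret(m->bytecode);          /* cold path: keep interpreting */
}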
but how does one read opcodes and generate code for it?
The scheme of the opcodes is defined by the specification of the VM, so the VM opens the program file and interprets it according to the spec.
Does the person need to have premade assembly chunks and copy/batch them together? is the assembly written in C?
This process is an implementation detail of the VM; typically there is an embedded compiler which is capable of transforming the VM opcode stream into machine code.
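At the bottom, "generating machine code" just means writing instruction bytes into executable memory and jumping to them. A minimal POSIX/x86-64 sketch (the three bytes are the System V encoding of mov eax, edi; ret, i.e. a function that returns its first argument; note that many hardened systems forbid pages that are writable and executable at once, so production JITs remap instead):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    unsigned char code[] = { 0x89, 0xF8, 0xC3 }; /* mov eax, edi ; ret */

    /* get a page we may write to and then execute */
    void *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    memcpy(buf, code, sizeof code);

    /* cast the buffer to a function pointer and call it */
    int (*fn)(int) = (int (*)(int))buf;
    printf("%d\n", fn(42));  /* prints 42 */
    return 0;
}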
How do you account for system interrupts?
Very simple: you don't. The code in the VM can't interact with real hardware. The VM interacts with the OS and transfers OS events to the code by jumping/calling specific parts inside the interpreted code. Every event in the code or from the OS must pass through the VM.
Hardware virtualization products can also use some kind of JIT. A typical use case in the x86 world is the translation of 16-bit real-mode code to 32- or 64-bit protected-mode code, to avoid being forced to emulate a CPU in real mode. Also, a software-only VM replaces jump instructions in the executing code with jumps into the VM control software, which at each branch scans the following code path for jump instructions and replaces them before it jumps to the real code destination. But I doubt whether this jump replacement qualifies as JIT compilation.
IIS does this by shadow copying: after compilation it copies the assemblies to some temporary place and runs them from there.
Imagine that a user changes some files. Then IIS recompiles the assemblies in the following steps:
1. Recompile (all requests handled by the old code).
2. Copy the new assemblies into place (all requests still handled by the old code).
3. All new requests are handled by the new code; requests already in flight are finished by the old.
I hope this is helpful.
A virtual machine loads "byte code" or "intermediate language" rather than machine code; therefore, I suppose, it can recompile the byte code more efficiently once it has more runtime data.
http://en.wikipedia.org/wiki/Just-in-time_compilation