cortex a9 boot and memory

cortex a9 boot and memory - embedded

I am a newbie starting out in micro-controller programming. The chip of interest here is cortex-a9. At reset or power up there has to be code at 0x0000000 from my readings. My questions though they may sound too trivial will help me in putting some concepts in perspective.
Does the memory address 0x0000000 reside in ROM?
What happens right after the code is read from that address?
Should there be some sort of boot-loader present & if so at what address should this be in & Should it also be residing in ROM?
Finally, at what point does the kernel kick in & where does the kernel code reside?

ARM sells cores not chips, what resides at that address depends on the chip vendor that has bought the ARM core and put it in their chip. Implementations vary from vendor to vendor, chip to chip.
Traditionally an ARM will boot from address zero, more correctly the reset exception vector is at address zero. Unlike other processor families, the traditional ARM model is NOT a list of addresses for exception entry points but instead the ARM EXECUTES the instruction at that address, which means you need to use either a relative branch or a load pc instruction. The newer cortex-m series, which are thumb/thumb2 only (they cannot execute ARM (32 bit) instructions) uses the traditional (non-ARM) like list of addresses, also the zero address is not an exception vector, it is the address to load in the stack pointer, then the second entry is reset and so on. Also the cortex-m exception list is different, that family has like 128 individual interrupts, where the traditional ARM has two, fast and normal. There is a recent cortex-m based question or perhaps phrased as thumb2 question for running linux on a thumb2 ARM. I think the cortex-m implementations are all microcontroller class chips and only have on chip memory in the tens of kbytes, basically these dont fall into the category you are asking about. And you are asking about cortex-a9 anyway.
A number of cores or maybe all of them have a boot option where the boot address can be 0x00000000 or something like 0xFFFF0000 as an alternate address. using that would be very confusing for ARM users, but it provides the ability for example to have a rom at one address and a ram at another allowing you to boot on power up from a rom then switch the exception table into ram for runtime operation. You probably have a chip with a core that can do this but it is up to the chip vendor whether or not to use these edge of the core features or to hardwire them to some setting and not provide you that flexibility.
You need to look at the datasheet/docs for the chip in question. Find out what the name of the ARM core is, as you mentioned cortex-a9. Ideally you want to know the rev as well r0p0 kind of a thing, then go to ARM's website and find the TRM, technical reference manual for that core. You will also want to get a copy of the ARM ARM, ARM Architectural Reference Manual. The (traditional) ARM exception vectors are described in the ARM ARM as well as quite a ton more info. You also need the chip vendors documentation, and look into their boot scheme. Some will point address zero to the boot prom on power up, then the bootloader will need to do something, flip a bit in a register, and the memory controller will switch address 0 to ram. Some might have address 0 always configured as ram, and some other address always configured as rom, lets say 0x80000000 for example, and the chip will copy some items from rom to ram for you before boot, or the chip may simply have the power up setting for the reset vector to be a branch to rom, then it is up to the bootloader to patch up the vector table. As many different schemes as you can think of, it is likely someone has tried it, so you have to study the chip vendors documentation or sample code to understand Basically the answer to your rom question, is it depends and you have to check with the chip vendor.
The ARM TRM for the core should describe, if any, the strap options on the core (like being able to boot from an alternate address), connect those strap options, if any, that are implemented by the vendor. The ARM ARM is not really going to get into that like the TRM. A vendor worth buying from though will have some of their own documentation and/or code that shows what their rom based boot strategy is.
For a system destined to be a linux system you are going to have a bootloader, some non-linux code (very much like the bios on your desktop/laptop) that brings up the system and eventually launches linux. Linux is going to need a fair amount of memory (relative to microcontroller and other well known ARM implementations), that ram may end up being sram or dram and the bootloader may have to initialize the memory interface before it can start linux up. There are popular bootloaders like redboot and uboot. both are significant overkill, but provide features for developers and users like being able to re-flash linux, etc.
ARM linux has ATAGs (ARM TAGs). You can use both the traditional linux command line to tell linux boot information like what address to find the root file system, and ATAGs. Atags are structures in memory that I think r0 or something like that is set to when you branch from the bootloader to linux. The general concept though is the chip powers up, boots from rom or ram, if prepares ram so that it is ready to use, linux might want/need to be copied from rom to ram, the root file system, if separate, might want to be copied to somewhere else in ram. ATAGs are prepared to tell arm where to decompress linux if need be, as well as where to find the command line and or where to find things like the root file system, some registers are prepared as passed parameters to linux and lastly the bootloader branches to the address containing the entry point in the linux kernel.

You have to have boot code available at the address where the hardware starts executing.
This is usually accomplished by having the hardware map some sort of flash or boot ROM to the boot address and start running from there.
Note that in micro controllers the code that starts running at boot has a pretty tough life - no hardware is initialized yet, and by no hardware I mean that even the DDR controllers that control the access to RAM are not working yet... so your code needs to run without RAM.
After the initial boot code sets enough of the hardware (e.g. sets the RAM chips, set up TLBs etc, program MACs etc.) you have the bootloader run.
In some systems, the initial boot code is just the first part of the boot loader. In some systems, a dedicated boot code sets things up and then reads the boot loader from flash and runs it.
The job of the boot loader is to bring the image of the kernel/OS into RAM, usually from flash or network (but can also be shared memory with another board, PCI buses and the like although that is more rare). Once the boot loader has the image of the kernel/OS binary in RAM it might optionally uncompress it, and hand over control (call) the start address of the kernel/OS image.
Sometime, the kernel/OS image is actually a small decompressor and blob of compressed kernel.
At any rate the end result is that the kernel/OS is available in RAM and the boot loader, optionally through the piggy back decompressor, has passed control to it.
Then the kernel/OS starts running and the OS is up.

Related

how to port uclinux linux to any microcontroller

I have stellaris LM4f232 evaluation borad. I have ported free rtos , sysbios to stellaris lm4f232 board and successfully developed an gps tracking application . But I always wanted to port uc linux for my board . my question are
i) is there any material to port uclinux to any controller
ii)what are necessary knowledge I required to do the same
I have googled a lot . I did n't get the right information, but I have seen posts that its difficult ,but I cant able to realise the same .any help????
iii) what is the road map to achieve it , what are the knowledge I should need to achieve this

Linux, even uCLinux requires considerable memory resources; you'd want to start with at least 2Mb for the boot device and 16Mb of RAM (although a minimal system can be booted in as little as 4Mb). On a microcontroller, this means that you must have external memory.
Another issue is that Cortex-M devices are optimised to run code from on-chip Flash memory, having separate buses for ROM and RAM so that data and instructions can be fetched simultaneously. uClinux must run from external RAM, which has a detrimental effect on the performance, and you will be unlikely to achieve the 1.25MIPS per MHz figure the CM4 is otherwise capable of. It is possible to arrange for time critical code to be placed in on-chip flash is necessary, but it is of course a limited resource.
Some good advice on the issues of deploying Linux on a Cortex-M device can be found here

I would suggest to have a look on buildroot which as far as I know can be build for this board.

adding to #Clifford , you can use u-boot (bootloader) ,already configured for many boards ,if your board is not on list you can edit it.,

Bootloader Working

I am working on Uboot bootloader. I have some basic question about the functionality of Bootloader and the application it is going to handle:
Q1: As per my knowledge, bootloader is used to download the application into memory. Over internet I also found that bootloader copies the application to RAM and then the application runs from RAM. I am confused with the working of Bootloader...When application is provided to bootloader through serial or TFTP, What happens next, whether Bootloader copies it to RAM first or whether it writes directly to Flash.
Q2: Why there is a need for Bootloader to copy application to RAM and then run the application from RAM? What difficulties we will face if our application runs from FLASH?
Q3: What is the meaning of statement "My application is running from RAM/FLASH"? Is it mean that our application's .text segment or .code segment is in RAM/FLASH? And we are not concerned about .bss section because it is designed to be in RAM.
Thanks
Phogat

When any hardware system is designed, the designer must consider where the executable code will be located. The answer depends on the microcontroller, the included memory types, and the system requirements. So the answer varies from system to system. Some systems execute code located in RAM. Other systems execute code located in flash. You didn't tell us enough about your system to know what it is designed to do.
A system might be designed to execute code from RAM because RAM access times are faster than flash so code can execute faster. A system might be designed to execute code from flash because flash is plentiful and RAM may not be. A system might be designed to execute code from flash so that it boots more quickly. These are just some examples and there are other considerations as well.
RAM is volatile so it does not retain code through a power cycle. If the system executes code located in RAM then a bootloader is required to obtain and write the code to RAM at powerup. Flash is non-volatile so execution can start right away at powerup and a bootloader is not necessary (but may still be useful).
Regarding Q3, the answer is yes. If the system is running from RAM then the .text will be located in RAM (but not until after the bootloader has copied it to there). If the system is running from flash then the .text section will be located in flash. The .bss section is variables and will be in RAM regardless of where the .text section is.

Yes, in general a bootloader boots the system, but it might also provide a mechanism for interrupting the default boot path and allow alternate firmware to be downloaded and run instead, as well as other features (like flashing).
Traditional rom had a traditional ram like interface, address, data, chip select, read/write, etc. And you can still buy rom that way, but it is cheaper from a pin real estate perspective to use something spi or i2c based, which is slower. Not desireable to run from, but tolerable to read once then run from ram. newer flash technologies can/have had problems with read-disturb, where if your code is in a tight loop reading the same instructions or for any other reason the flash is being read too fast, the charge can drop such that a read returns the wrong data, potentially causing the program to change course or crash. Also your PC and other linux platforms are used to copying the kernel from NV storage (hard disk) to ram and then running from there so the copy from flash to ram and run from ram has a comfort level, and is often faster than flash. So there are many potential reasons to not use flash, but depending on the system it may be possible to run from flash just fine (some systems the flash in question is not accessible directly and not executable, of course SOME rom in that system needed to be executable/bootable).
It simplifies the coding challenges if you program the flash with something that is in ram. You can create and debug the code one time that reads from ram and writes to flash and reads from flash and writes to ram. DONE. Now you can work on separate code that receives data from serial to ram, or from ram to serial. DONE. Then work on code that does the same over ethernet or usb or whatever DONE. You dont have to deal with inventing a protocol or solving the problem of timing. Flash writing is very slow, and even xmodem at a moderate speed can be way too fast, so you have to buffer that data in ram anyway, might as well make the tasks completely separate, instead of an xmodem or any other serial based flash loader with a big ram based fifo, just move the data to ram, then separately go from ram to flash. Same for other interfaces. It is technically possible to buffer the data and give the illusion of going from the download interface straight to flash, and depending on the protocol it is technically possible to hold off the sender so that as little as one flash page is required in ram before programming flash. With the older parallel flashes you could do something pretty cool which I dont think most people figured out. When you stop writing to the flash page for some known period of time the flash would automatically start to program that page and you have to wait for 10ms or something like that before it is done. What folks assumed was you had to program sequential addresses and had to get the new data for the next address in that period of time and would demand high serial port speeds, etc, the reality is you can pound the same address over and over again with the same data and the flash wont start to program the page, and the download interface can be infinitely slow. Serial flashes work differently and either dont need tricks or have different tricks.
RAM/FLASH is not some industry term. It likely means that .text is in rom (flash) and .data and .bss are in ram. A copy of the initial state of .data will probably be on flash as well and copied to ram before main() is called, likewise .bss will be zeroed before main() is called. look at crt0.S for most platforms in gnu sources (glibc, or is it gcc, I dont know) to get the gist of how the bootstrap works in a generic fashion.
A bootloader is not required to run linux or other operating systems, you dont NEED uboot, but it is quite useful. Linux is pretty easy, you copy the kernel and root file system, either set some registers or some tags in memory or both then branch to the entry point in the kernel and linux takes over from there. Because linux is so complicated it is desireable to have a complicated bootloader that can capitalize on high speed interfaces like ethernet (rather than being limited to serial or slower).

I would add something regarding your question Q2.
Q2: Why there is a need for Bootloader to copy application to RAM and then run the application from RAM? What difficulties we will face if our application runs from FLASH?
It is not only about having SPI or similar serial external code memory (which is not that often anyway).
Even the external ROM/FLASH/EPROM/ connected to the usual high speed parallel bus will will prevent a system from running on a maximum clock (with zero wait state) even on the relatively slow MCUs due to the external memory access time. You would need 10 ns FLASH access time for the 100 MHz clock, which is not so easy to get (if economically possible at all). And you would agree that 100 MHz is not such a brain spinning speed any more :-)
That is why many MCU/CPU architectures are doing tricks with reading multiply instructions at once, or having internal cash, or doing whatever was needed to compensate for a slow external code memory. Only most older 8-bit architectures can execute the code directly from the flash memory ('in place').
Even if your only code memory was the internal Flash, something need to be done to speed it up. Take a look for example at this article:
http://www.iqmagazineonline.com/magazine/pdf/v_3_2_pdf/pg14-15-18-19-9Q6Phillips-Z.pdf
It desribes how the ARM7 has incorporated something they called MAM (Memory Accelerator Module). It is a good read, and you will find some measures there to speed up the code memory access for the specific ARM7 arhitecture (goes for most others):
Limit maximum clock frequency (from 80 MHz to about 20 MHz for the example in the article)
Insert wait-cycles during flash accesses
Use an instruction cache
Copy the program code from flash to RAM
Obviously, if the instruction cache was not an option (too small, or the clock too high) you are really left only with execution from the RAM, after relocating the code there at the start up.
There is an option also to run only specific section of code from the RAM, which could be specified to the linker. For the DSP (Digital System Processing) systems, there was really no option to run from the EPROM/FLASH even in the old days with clock around only few tens of MHz, let alone now.
Another issue is debugging, the options for debugging the code placed in ROM, or even Flash, are very limited (you have to move section of the code to RAM to be able to set a break point on most systems).

Regarding Q2, one of the difficulties you may face executing from Flash is another code update. If you are executing from the same block of Flash you are trying to update, the system will crash. This depends on your system architecture (how your application and bootloader are organized in Flash) but may be particularly hard to avoid if you are trying to update the bootloader itself.

On reset what happens in embedded system?

I have a doubt regarding the reset due to power up:
As I know that microcontroller is hardwired to start with some particular memory location say 0000H on power up. At 0000h, whether interrupt service routine is written for reset(initialization of stack pointer and program counter etc) or the reset address is there at 0000h(say 7000) so that micro controller jumps at 7000 address and there initialization of stack and PC is written.
Who writes this reset service routine? Is it the manufacturer of microcontroller chip(Intel or microchip etc) or any programmer can change this reset service routine(For example, programmer changed the PC to 4000h from 7000h on power up reset resulting into the first instruction to be fetched from 4000 instead of 7000).
How the stack pointer and program counter are initialized to the respective initial addresses as on power up microcontroller is not in the state to put the address into stack pointer and program counter registers(there is no initialization done till reset service routine).
What should be the steps in the reset service routine considering all possibilities?

With reference to your numbering:
The hardware reset process is processor dependent and will be fully described in the data sheet or reference manual for the part, but your description is generally the case - different architectures may have subtle variations.
While some microcontrollers include a ROM based boot-loader that may contain start-up code, typically such bootloaders are only used to load code over a communications port, either to program flash memory directly or to load and execute a secondary bootloader to RAM that then programs flash memory. As far as C runtime start-up goes, this is either provided with the compiler/toolchain, or you write it yourself in assembler. Normally even when start-up code is provided by the compiler vendor, it is supplied as source to be assembled and linked with your application. The compiler vendor cannot always know things like memory map, SDRAM mapping and timing, or processor clock speed or what oscillator crystal is used in your hardware, so the start-up code will generally need customisation or extension through initialisation stubs that you must implement for your hardware.
On ARM Cortex-M devices in fact the initial PC and stack-pointer are in fact loaded by hardware, they are stored at the reset address and loaded on power-up. However in the general case you are right, the reset address either contains the start-up code or a vector to the start-up code, on pre-Cortex ARM architectures, the reset address actually contains a jump instruction rather than a true vector address. Either way, the start-up code for a C/C++ runtime must at least initialise the stack pointer, initialise static data, perform any necessary C library initialisation and jump to main(). In the case of C++ it must also execute the constructors of any global static objects before calling main().

The processor cores normally have as you say a starting address of some sort of table either a list of addresses or like ARM a place where instructions are executed. Wrapped around that core but within the chip can vary. Cores that are not specific to the chip vendor like 8051, mips, arm, xscale, etc are going to have a much wider range of different answers. Some microcontroller vendors for example will look at strap pins and if the strap is wired a certain way when reset is released then it executes from a special boot flash inside the chip, a bootloader that you can for example use to program the user boot flash with. If the strap is not tied that certain way then sometimes it boots your user code. One vendor I know of still has it boot their bootloader flash, if the vector table has a valid checksum then they jump to the reset vector in your vector table otherwise they sit in their bootloader mode waiting for you to talk to them.
When you get into the bigger processors, non-microcontrollers, where software lives outside the processor either on a boot flash (separate chip from the processor) or some ram that is managed somehow before reset, etc. Those usually follow the rule for the core, start at address 0xFFFFFFF0 or start at address 0x00000000, if there is garbage there, oh well fire off the undefined instruction vector, if that is garbage just hang there or sit in an infinite loop calling the undefined instruction vector. this works well for an ARM for example you can build a board with a boot flash that is erased from the factory (all 0xFFs) then you can use jtag to stop the arm and program the flash the first time and you dont have to unsolder or socket or pre-program anything. So long as your bootloader doesnt hang the arm you can have an unbrickable design. (actually you can often hold the arm in reset and still get at it with the jtag debugger and not worry about bad code messing with jtag pins or hanging the arm core).
The short answer: How many different processor chip vendors have there been? There are many different solutions, as many as you can think of and more have been deployed. Placing a reset handler address in a known place in memory is the most common though.
EDIT:
Questions 2 and 3. if you are buying a chip, some of the microcontrollers have this protected bootloader, but even with that normally you write the boot code that will be used by the product. And part of that boot code is to initialize the stack pointers and prepare memory and bring up parts of the chip and all those good things. Sometimes chip vendors will provide examples. if you are buying a board level product, then often you will find a board support package (BSP) which has working example code to bring up the board and perhaps do a few things. Say the beagleboard for example or the open-rd or embeddedarm.com come with a bootloader (u-boot or other) and some already have linux pre-installed. boards like that the user usually just writes some linux apps/drivers and adds them to the bsp, but you are not limited to that, you are often welcome to completely re-write and replace the bootloader. And whoever writes the bootloader has to setup the stacks and bring up the hardware, etc.
systems like the gameboy advance or nds or the like, the vendor has some startup code that calls your startup code. so they may have the stack and such setup for them but they are handing off to you, so much of the system may be up, you just get to decide how to slice up the memorires, where you want your stack, data, program, etc.
some vendors want to keep this stuff controlled or a secret, others do not. in some cases you may end up with a board or chip with no example code, just some data sheets and reference manuals.
if you want to get into this business though you need to be prepared to write this startup code (in assembler) that may call some C code to bring up the rest of the system, then that might start up the main operating system or application or whatever. Microcotrollers sounds like what you are playing with, the answers to your questions are in the chip vendors users guides, some vendors are better than others. search for the word reset or boot in the document to try to figure out what their boot schemes are. I recommend you use "dollar votes" to choose the better vendors. A vendor with bad docs, secret docs, bad support, dont give them your money, spend your money on vendors with freely downloadable, well written docs, with well written examples and or user forums with full time employees trolling around answering questions. There are times where the docs are not available except to serious, paying customers, it depends on the market. most general purpose embedded systems though are openly documented. the quality varies widely, but the docs, etc are there.

Depends completely on the controller/embedded system you use. The ones I've used in game development have the IP point at a starting address in RAM. The boot strap code supplied from the compiler initializes static/const memory, sets the stack pointer, and then jumps execution to a main() routine of some sort. Older systems also started at a fixed address, but you manually had to set the stack, starting vector table, and other stuff in assembler. A common name for the starting assembler file is CRT0.s for the stuff I've done.
So 1. You are correct. The microprocessor has to start at some fixed address.
2. The ISR can be supplied by the manufacturer or compiler creator, or you can write one yourself, depending on the complexity of the system in question.
3. The stack and initial programmer counter are usually handled via some sort of bootstrap routine that quite often can be overriden with your own code. See above.
Last: The steps will depend on the chip. If there is a power interruption of any sort, RAM may be scrambled and all ISR vector tables and startup code should be rewritten, and the app should be run as if it just powered up. But, read your documentation! I'm sure there is platform specific stuff there that will answer these for your specific case.

How to find an embedded platform?

I am new to the locating hardware side of embedded programming and so after being completely overwhelmed with all the choices out there (pc104, custom boards, a zillion option for each board, volume discounts, devel kits, ahhh!!) I am asking here for some direction.
Basically, I must find a new motherboard and (most likely) re-implement the program logic. Rewriting this in C/C++/Java/C#/Pascal/BASIC is not a problem for me. so my real problem is finding the hardware. This motherboard will have several other devices attached to it. Here is a summary of what I need to do:
Required:
2 RS232 serial ports (one used all the time for primary UI, the second one not continuous)
1 modem (9600+ baud ok) [Modem will be in simultaneous use with only one of the serial port devices, so interrupt sharing with one serial port is OK, but not both]
Minimum permanent/long term storage: Whatever O/S requires + 1 MB (executable) + 512 KB (Data files)
RAM: Minimal, whatever the O/S requires plus maybe 1MB for executable.
Nice to have:
USB port(s)
Ethernet network port
Wireless network
Implementation languages (any O/S I will adapt to):
First choice Java/C# (Mono ok)
Second choice is C/Pascal
Third is BASIC
Ok, given all this, I am having a lot of trouble finding hardware that will support this that is low in cost. Every manufacturer site I visit has a lot of options, and it's difficult to see if their offering will even satisfy my must-have requirements (for example they sometimes list 3 "serial ports", but it appears that only one of the three is RS232, for example, and don't mention what the other two are). The #1 constraint is cost, #2 is size.
Can anyone help me with this? This little task has left me thinking I should have gone for EE and not CS :-).
EDIT: A bit of background: This is a system currently in production, but the original programmer passed away, and the current hardware manufacturer cannot find hardware to run the (currently) DOS system, so I need to reimplement this in a modern platform. I can only change the programming and the motherboard hardware.

I suggest buying a cheap Atom Mini-ITX board, some of which come with multi - 4+ RS232 ports.
But with Serial->USB converters, this isn't really an issue. Just get an Atom. And if you have code, port your software to Linux.
Here is a link to a Jetway Mini-Itx board, and a link to a 4 port RS232 expansion module for it. ~$170 total, some extra for memory, a disk, and a case and PSU. $250-$300 total.
Now here is an Intel Atom Board at $69 to which you could add flash storage instead of drives, and USB-serial converters for any data collection you need to do.
PC104 has a lot of value in maximizing the space used in 19" or 23" rackmount configurations - if you're not in that space, PC104 is a waste of your time and money, IMHO.

The BeagleBoard should have everything you need for $200 or so - it can run Linux so use whatever programming language you like.

A 'modern' system will run DOS so long as it is x86, I suggest that you look at an industrial PC board from a supplier such as Advantech, your existing system may well run unchanged if it adheres to PC/DOS/BIOS standards.
That said if your original system runs on DOS, the chances are that you do not need the horsepower of a modern x86 system, and can save money by using a microcontroller board using something fairly ubiquitous such as an ARM. Also if DOS was the OS, then you most likely do not need an OS at all, and could develop the system "bare-metal". The resources necessary just to support Linux are probably far greater than your existing application and OS together, and for little or no benefit unless you intend on extending the capability of the system considerably.
There are a number of resources available (free and commercial) for implementing a file system and USB on a bare-metal system or a system using a simple real-time kernel such as FreeRTOS or eCOS which have far smaller footprints than Linux.

The Windows embedded site ( http://www.microsoft.com/windowsembedded/en-us/default.mspx )
has a lot of resources and links to hardware partners, distributors and development kits. There's even a "Spark" incubation project ( http://www.microsoft.com/windowsembedded/en-us/community/spark/default.mspx )
What's also really nice about using windows ce is that it now supports Silverlight as a development environment.

I've used the jetway boards / daughter cards that Chris mentioned with success for various projects from embedded control, my home router, my HTPC front end.
You didn't mention what the actual application was but if you need something more industrial due to temperature or moisture constraints i've found http://www.logicsupply.com/ to be a good resource for mini-itx systems that can take a beating.
A tip for these board is that given your minimal storage requirements, don't use a hard drive. Use an IDE adapter for a compact flash card as the system storage or an SD card. No moving parts is usually a big plus in these applications. They also usually offer models with DC power input so you can use a laptop like or wall wart external supply which minimizes its final size.
This http://www.fit-pc.com/web/ is another option in the very small atom PC market, you'd likely need to use some USB converters to get to your desired connectivity.
The beagle board Paul mentioned is also a good choice, there are daughter cards for that as well that will add whatever ports you need and it has an on board SD card reader for whatever storage you need. This is also a substantially lower power option vs the atom systems.
There are a ton of single board computers that would fit your needs. When searching you'll normally find that they don't keep many interface connectors on the processor board itself but rather you need to look at the stackable daughter cards they offer which would provide whatever connections you need (RS-232, etc.). This is often why you see just "serial port" in the description as the final physical layer for the serial port will be defined on the daughter card.
There are a ton of arm based development boards you could also use, to many to list, these are similar to the beagle board. Googling for "System on module" is a good way to find many options. These again are usually a module with the processor/ram/flash on 1 card and then offer various carrier boards which the module plugs into which will provide the various forms of connectivity you need.
In terms of development, the atom boards will likely be the easiest if your more familiar with x86 development. ARM is strongly supported under linux though so there is little difficulty in getting these up and running.
Personally i would avoid windows for a headless design like your discussing, i rarely see a windows based embedded device that isn't just bad.

Take at look at one of the boards in the Arduino line, in particular the Arduino Mega. Very flexible boards at a low cost, and the Mega has enough I/O ports to do what you need it to do. There is no on-chip modem, but you can connect to something like a Phillips PCD3312C over the I2C connector or you can find an Arduino add-on board (called a "shield") to give you modem functionality (or Bluetooth, ethernet, etc etc). Also, these are very easy to connect to an external memory device (like a flash drive or an SD card) so you should have plenty of storage space.
For something more PC-like, look for an existing device that is powered by a VIA EPIA board. There are lot of devices out there that use these (set-top boxes, edge routers, network security devices etc) that you can buy and re-program. For example, I found a device that was supposed to be a network security device. It came with the EPIA board, RAM, a hard drive, and a power supply. All I had to do was format the hard drive, install Linux (Debian had all necessary drivers already included), and I had a complete mini-computer ready to go. It only cost me around $45 too (bought brand new, unopened on ebay).
Update: The particular device I found was an EdgeSecure i10 from Ingrian Networks.

How is the BIOS used by a modern OS?

What's the function of the BIOS in a modern OS? Is it still used after booting? And is there some kind of BIOS API?

The BIOS is still the first thing that runs on the just-started CPU and responsible for getting the motherboard hardware turned on, setting basic chipset modes and registers, initializing some hardware, and running the code that loads the kernel.
The BIOS is usually not used once the kernel is loaded, and depends on a 16-bit execution environment as opposed to the 32- or 64-bit protected mode environment that a modern kernel operates in.
The boot loader normally does require the BIOS IO calls to get the kernel into memory. The BIOS is being replaced even in this role by newer boot-time software such as Coreboot to provide faster boot times. EFI will one day replace the traditional BIOS, and hopefully the boot loader, passing control directly to the kernel after loading it from storage.
The discovered hardware configuration, memory range settings, and ACPI metadata tables are probably the only BIOS-based data used by the OS after the kernel is loaded. Any runnable ACPI code is encoded as ACPI Machine Language and is interpreted by the OS.
Any good traditional book on MS-DOS assembly programming will include information on the BIOS programming interface. Check out The Art of ASSEMBLY LANGUAGE PROGRAMMING

I wrote BIOS for notebook computers for several years. The BIOS does a lot of things while the OS is running.
A major task is to inform the OS when many events happen so the OS can look smart (as if it somehow figured these things out on its own). For example, the BIOS tells the OS when: the power button is pressed, batteries are inserted or removed, AC power comes or goes, the system connects to or disconnects from a docking station, hard drives and or certain types of optical drives are inserted or removed.
Most portable computers have features that you can access/control through Fn keys and through OS-level applications provided by the manufacturers. The BIOS responds to these hotkeys and has code to interface with the OS-level apps. Features like controlling screen brightness (which certain OSes want to appear to control) or controlling bling LEDs fall into this category.
Perhaps the most important task of the BIOS is to shut down the system when the power button is held down for more than 4 seconds (to recover from OS hangs!).

The biggest benefit to having OS control over BIOS now is to control hardware level variables such as fan speed, temperature gauges, etc.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas