Per my understanding, virtual memory is as follows:
Programs/applications/executables reside in a storage device. Storage device access is much slower than RAM. Hence, programs are copied from storage to main memory for execution. Since computers have limited main memory (RAM), when all of the RAM is being used (e.g., if there are many programs open simultaneously or if one very large program is in use), a computer with virtual memory enabled will swap data to the HDD and back to memory as needed, thus, in effect, increasing the total system memory.
As far as I know, most embedded devices do not have disk memory (like smartphones or in car infotainment systems). Code is directly executed from Flash memory. RAM is mainly used as a scratchpad area (local variables, return address etc).
So why do we need virtual memory in embedded systems? (e.g. WinCE and QNX support virtual memory)
Your understanding is completely wrong. You are confusing virtual memory with swapping or page files. There are systems that have virtual memory and no swap or page files and there are systems that swap without virtual memory.
Virtual memory just means that a process has a view of memory that is different from the physical mapping. Among other things, it allows processes to have their own virtual address space.
Storage device access is much slower than RAM. Hence programs are copied from storage memory to main memory for execution. Since computers have limited main memory (RAM), when all of the RAM is being used (e.g., if there are many programs open simultaneously or if one very large program is in use), a computer with virtual memory enabled will swap data to the HDD and back to memory as needed, thus, in effect, increasing the total system memory.
That's swapping (or paging). It has nothing to do with virtual memory except that most modern operating systems implement swapping using virtual memory. Swapping actually existed before virtual memory.
I think you're probably incorrect about these devices running code directly from flash memory. The read speed of flash is pretty low and RAM is very cheap. My bet is that most of the systems you mention don't run code directly from flash and instead use virtual memory to fault code into RAM as needed.
Embedded systems: the term itself covers a wide range of applications. You could call a small microcontroller with flash program space measured in kbytes or less and RAM measured in either bits or bytes (not enough to be kbytes) an embedded system. Likewise a TiVo running a full-blown operating system on a pretty much full-blown computer motherboard (replace TiVo with Xbox as another example) is an embedded system. So you need to be less vague in your question. Virtual memory has little to do with any of that; its applications cross those boundaries.
There are many answers above, David S has the best of course that virtual memory simply means the memory address on one side of the virtual memory boundary is different than the physical address that is used on the other side of that boundary. Where, how, why, etc is there a boundary varies.
A popular use for virtual memory, and I might argue the primary use case, is operating systems. One benefit is that all applications can be compiled for the same address space. All applications might be compiled such that, from the program's perspective, they all start at say address 0x8000, and when the program runs and accesses memory it does so based on that address. A combination of the hardware and the operating system changes the virtual address the program is using into a physical address. If the operating system allows multitasking, each task might think it is in the same address space, but the physical addresses are different for each task. I won't elaborate further on why an assumed, fixed address space is a benefit.
Another aspect that operating systems use is memory management. Many MMUs will let you segment the memory however you want. If a user wants to allocate 100 megabytes of memory, the program may access that 100 MB in its virtual address space as if it were linear, and in that address space it is linear, but the 100 MB might be broken down into say 4 Kbyte chunks that are scattered all about the physical address space. It is not always likely, but certainly technically possible, that no chunk of that physical memory is next to any other chunk of the 100 MB. Your memory management doesn't necessarily have to try to keep large physical chunks of memory available for applications to allocate. Note that not all MMUs are exactly the same, and 4 Kbytes is just an example.
A third major benefit of a virtual address space to an operating system is protection. If the application is bound to the virtual address space, it is often quite easy to prevent that application from touching the memory of any other application or the operating system. The application in this case would operate/execute at a protection level such that all accesses are considered virtual and have to go through a translation to physical, and the tables that define that virtual-to-physical translation can contain protection flags. If the application accesses an address in its virtual space that it has no business accessing, the hardware can trap that and let the operating system decide how to handle it (virtualize some hardware, pop up an error and kill the app, pop up a warning and not kill the app but at the same time feed the app bogus data for its transaction, etc).
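To make the translation-plus-protection idea concrete, here is a minimal toy sketch in Python. It is not how any particular MMU or OS lays out its tables; the page numbers, frame numbers and flag strings are made up for illustration, and `PageFault` simply stands in for the hardware trap.

```python
PAGE_SIZE = 4096  # 4 KB pages, the example size used above

class PageFault(Exception):
    """Raised where real hardware would trap to the operating system."""

# Hypothetical page table for one process: virtual page number -> (physical frame, flags).
# The frames are deliberately scattered; the process still sees a linear range.
page_table = {
    8: (300, "rx"),   # code at virtual 0x8000, read/execute only
    9: (17, "rw"),    # data
    10: (942, "rw"),  # more data, nowhere near frame 17 physically
}

def translate(vaddr, access):
    """Return the physical address for vaddr, or raise PageFault."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None:
        raise PageFault(f"no mapping for {vaddr:#x}")
    frame, flags = entry
    if access not in flags:
        raise PageFault(f"{access!r} access not permitted at {vaddr:#x}")
    return frame * PAGE_SIZE + offset
```

For example, `translate(0x9004, "w")` lands in frame 17, while `translate(0x8000, "w")` raises `PageFault` because the code page is not writable. What the OS does with that fault is the policy part described above.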
There are lots of ways this can be used in an embedded system. first off many embedded systems run operating systems, so all of the above, ease of compiling the program for the address space, relative ease of memory management, and protection of the other applications and operating system and other benefits not mentioned. (virtualization being one, being able to enable/disable instruction/data caching on a block by block basis is another)
The bottom line though is what David S pointed out. virtual memory simply means the virtual address is not necessarily equal to the physical address, it can be but doesnt have to be, there is some boundary, some hardware, usually table driven, that translates the virtual address into a physical address. Lots of reasons why you would want to do this, since some embedded systems are indistinguishable from non-embedded systems any reason that applies to a non-embedded system can apply to an embedded system.
As much as folks may want you to believe that a system has a flat address space, it is often an illusion. In a microcontroller for example you might have multiple flash banks and one or more ram banks. Each of these banks has a physical, generally zero based address. Even if there is no mmu or anything else like that there is a place somewhere between the address bus on the processor and the address bus on the flash or ram memory that decodes the address on the processor and uses that to address into the specific memory bank. Often the lower bits match and upper bits are responsible for the bank choices (this is often the case with an mmu as well) so in that sense the processor is living in a virtual address space. (not limited to microcontrollers, this is generally how processors address busses are treated) With microcontrollers depending on a pin being pulled high or low or some other mechanism you might have a chip feature that allows one flash bank to be used to boot the processor or another. You might tie an input pin high and the processors built in bootloader allows you to access and debug the system for example reprogram the application flash. Or perhaps tie that line low and boot the application flash instead of the vendors debugger/boot flash. some chips get even more complicated letting you boot one flash then the program writes a register somewhere instantly changing the memory architecture moving things around, for example allowing ram to be used for the interrupt vector table so your application can be changed after boot rather than a vector table in flash that is not as easy to change at will.
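A rough sketch of that address decoding, in Python. The memory map here is invented for illustration and is not taken from any real part; the point is only that the processor address gets turned into a bank choice plus a zero-based address within that bank.

```python
# Hypothetical memory map: (base address, size, bank name). Not from any real chip.
MEMORY_MAP = [
    (0x0000_0000, 0x8000, "boot_flash"),
    (0x0800_0000, 0x20000, "app_flash"),
    (0x2000_0000, 0x4000, "sram"),
]

def decode(cpu_addr):
    """Return (bank, bank-local address) the way an address decoder would."""
    for base, size, bank in MEMORY_MAP:
        if base <= cpu_addr < base + size:
            return bank, cpu_addr - base
    raise ValueError(f"bus fault: nothing decodes {cpu_addr:#x}")
```

So `decode(0x2000_0010)` gives `("sram", 0x10)`: the RAM chip itself only ever sees address 0x10.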
now when you talk about virtual memory as far as swapping to and from a disk, that is a trick often employed by operating systems to give the illusion of having more ram. I mentioned that above under the category of virtualization. virtual memory in the sense that it isnt really there, I have X bytes but will let the software think there are Y bytes (where Y is larger than X) available. The operating system through the virtual tables used by the hardware, manages which memory chunks are tied to physical ram and are allowed to complete as is by the hardware, or are marked as not available in some way, causing an exception to the operating system, upon inspection the operating system determines that this is a valid address for this application, but the data behind this address has been swapped to disk. The operating system then finds through some algorithm another chunk of ram belonging to whomever (part of the algorithm) and it copies that chunk of ram to disk, marks the table related to that virtual to physical as not valid, then copies the desired chunk from disk to ram, marks that chunk as valid and lets the hardware complete the memory cycle.
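The swap sequence just described can be sketched as a toy model in Python. Everything here is an assumption for illustration: the class name, the 16-byte "pages", a dict standing in for the disk, and plain FIFO as the victim-selection algorithm (real systems use something smarter).

```python
from collections import OrderedDict

class ToyPager:
    """Keeps at most `frames` pages resident; a dict stands in for the disk."""

    def __init__(self, frames):
        self.frames = frames
        self.resident = OrderedDict()  # page number -> data, oldest first
        self.disk = {}                 # swapped-out pages
        self.faults = 0

    def access(self, page):
        if page in self.resident:      # mapping valid: the access just completes
            return self.resident[page]
        self.faults += 1               # mapping not valid: exception to the "OS"
        if len(self.resident) >= self.frames:
            victim, data = self.resident.popitem(last=False)  # FIFO victim choice
            self.disk[victim] = data   # copy the victim out, it is no longer valid
        # Bring the wanted page in (pages never seen before start zero-filled).
        self.resident[page] = self.disk.pop(page, bytearray(16))
        return self.resident[page]
```

With two frames, touching pages 0, 1, 2 pushes page 0 out to the "disk"; touching page 0 again faults, evicts page 1, and brings page 0 back with its contents intact. The program sees three pages of memory while only two ever exist in "RAM".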
Not any different than say how vmware or other virtual machines work. You can execute instructions natively on the hardware using virtual memory until such time as you cause an exception, the virtual machine might think you have an xyz network interface and might have a driver that is accessing a register in that xyz network interface, but the reality is you have no xyz hardware and/or you dont want the virtual machine applications to access that hardware, so you virtualize it, you trap that register access, and using software that simulates the hardware you fake that access and let the program on the virtual machine continue. This obviously not the only way to do virtual machines, but it is one way if the hardware supports it, to let a virtual machine run very fast as a percentage of the time it is actually running instructions on the hardware. The slowest way to virtualize of course is to virtualize everything including the processor, every instruction in that case would be simulated, this is quite slow but has its own features (virtualizing an arm system on an x86 or x86 on an arm, xyz on an abc, fill in the blanks). And if that is the type of virtual memory you are talking about in an embedded system, well if the embedded system is for the most part indistinguishable from a non-embedded system (an xbox or tivo for example) then well for the same reasons you could allow such a thing. If you were on a microcontroller, well the use cases there would generally mean if you needed more memory you would buy a bigger microcontroller, or add more memory to the system ,or change the needs of the application such that it doesnt need as much memory. there may be exceptions, but it mostly depends on your application and requirements, a general purpose or general purpose like system which allows for applications or their data to be larger than the available ram, will require some sort of solution. 
the microcontroller in your keyless entry key fob thing or in your tv remote control or clock radio or whatever normally would not have a need to allow "applications" to require more resources than are physically there.
The more important benefit of using virtual memory is that every process gets its own address space which is isolated from every other process's. That way virtual memory helps keep faults contained and improves security and stability. I should note that it is still possible for two processes to share a bit of memory, to facilitate communication (shared mem IPC).
Also you can do other tricks, like conserving memory by mapping shared parts (libc comes to mind for embedded use) into more than one process's address space while having them only once in physical memory. This also gives a speed boost. You can take a similar idea further, the way Linux cheapens fork/clone: it copies only the in-kernel descriptors and leaves the memory image alone until the first write access happens.
As a last benefit, in modern systems, it's common to do file I/O via mapping the file into the process space (cf. mmap for example).
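As a small illustration of file I/O through a mapping, here is a sketch using Python's standard `mmap` module. The helper name `patch_file_via_mmap` and the throwaway demo file are just for this example.

```python
import mmap
import os
import tempfile

def patch_file_via_mmap(path, offset, new_bytes):
    """Overwrite part of a file through a memory mapping instead of read()/write()."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as view:   # length 0 maps the whole file
            view[offset:offset + len(new_bytes)] = new_bytes
            view.flush()

# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"hello world")
os.close(fd)
patch_file_via_mmap(demo_path, 0, b"HELLO")
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

The file is modified by ordinary slice assignment on the mapped view; the OS takes care of getting the changed pages back to storage.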
It's interesting to note that one can get some of the benefits of "virtual memory" without needing a full-fledged MMU. The hardware requirements can sometimes be amazingly light. The PIC 16C505 has a 5-bit address space and 40 bytes of RAM; addresses 0x10 to 0x1F can map to either of two groups of 16 bytes of RAM. When writing an application which needed to manage two different data streams, I arranged so that all the variables associated with one data stream would be in the first group of 16 "switchable" memory locations, and those associated with the other would be at the corresponding addresses in the second group. I could then use the same code to manage both data streams. Simply set the banking bit one way, call the routine, set it the other way, and call the routine again.
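The banking trick can be modelled in a few lines of Python. This is only a toy model of the idea, not of the real PIC register file; the variable names `COUNT` and `LAST` and the per-stream bookkeeping are invented for illustration.

```python
class BankedRam:
    """Toy model: addresses 0x10-0x1F reach one of two 16-byte groups, chosen by a bank bit."""

    def __init__(self):
        self.groups = [bytearray(16), bytearray(16)]
        self.bank = 0

    def read(self, addr):
        assert 0x10 <= addr <= 0x1F  # only the switchable window is modelled here
        return self.groups[self.bank][addr - 0x10]

    def write(self, addr, value):
        assert 0x10 <= addr <= 0x1F
        self.groups[self.bank][addr - 0x10] = value & 0xFF

COUNT = 0x10  # hypothetical variable: bytes seen on this stream
LAST = 0x11   # hypothetical variable: most recent byte

def handle_byte(ram, value):
    """One routine, written once against fixed addresses."""
    ram.write(COUNT, ram.read(COUNT) + 1)
    ram.write(LAST, value)

ram = BankedRam()
for bank, data in ((0, b"abc"), (1, b"z")):
    ram.bank = bank             # set the banking bit one way or the other
    for byte in data:
        handle_byte(ram, byte)  # same code, different variables
```

After this runs, each group holds its own stream's counter and last byte, even though `handle_byte` only ever names addresses 0x10 and 0x11.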
One of the reasons Virtual Memory exists is so that your device can multitask. It can also act as your RAM does, thus taking the load off of your physical RAM and swapping the load back and forth.
I'm currently researching topics such as RAM/ROM/Stack/Heap and data segments etc.
I was looking at the ARM Cortex-M3 memory map and saw the region labeled "External RAM".
According to the data sheet of a random Cortex-M3 STM32 MCU the external RAM region is mapped from 0x60000000- 0x9FFFFFFF, so it is quite large!
I couldn't find a definitive answer about how this region is actually used.
I imagine you would have an external SRAM and you would choose between two options.
(1) Read via the SPI interface and place into a local buffer(stack), then load that local buffer into the external ram region. This option seems to have a lot of negative consequences, such as hogging the CPU and increasing the stack temporarily if the requested data is very large.
(2) Utilize a DMA and transfer from the SPI interface into the external ram region.
Now I can't understand, why you would map the data to this specific address range, what are the advantages, why don't you just place the data directly in that huge memory region?
Now I'm asking this question because I have a slight feeling I have completely missed the point of what the External RAM region really is.
-Edit-
In the data sheet that is linking to the STM32 device, the memory region "External RAM" is marked as reserved. It is my conclusion that the memory regions listed by ARM is showing the full potential of a 32bit MCU, as I incorrectly state that the external RAM region "is quite large!" does not necessarily mean that this is "real" size of that region, if it is even used, it depends on what the vendor can physically achieve within the MCU hardware, and I imagine they would limit hardware capabilities to be competitive on price, power consumption etc.
I imagine you would have an external SRAM and you would choose between two options.
(1) Read via the SPI interface and place into a local buffer(stack), then load that local buffer into the external ram region. This option seems to have a lot of negative consequences, such as hogging the CPU and increasing the stack temporarily if the requested data is very large.
(2) Utilize a DMA and transfer from the SPI interface into the external ram region.
None of the above. External memory on an SPI bus is not memory mapped. If you have an SPI memory, it is not mapped to that region; it is simply an SPI device, and the "address" is simply an offset from the start of the memory device itself. The exception is MCUs with a Quad- or Octo-SPI controller, which can memory-map the attached device. QSPI RAM is not that common and is relatively expensive; QSPI is more commonly used for flash memory.
The external memory region can be used by STM32 parts with an FSMC (Flexible Static Memory Controller), an FMC (Flexible Memory Controller), or a QSPI interface. The FMC adds SDRAM support and is generally available on the higher-end parts. Apart from QSPI and NAND flash, these interfaces require using the GPIO EMIF (external memory interface) alternate function to create an address and data bus, so they generally require parts with a high pin count. The EMIF can be configured for an 8-, 16- or 32-bit data bus, with the narrower options reducing the pin count (at the cost of slower access).
Now I can't understand, why you would map the data to this specific address range, what are the advantages, why don't you just place the data directly in that huge memory region?
Since it was precipitated by your earlier misconception, this question is perhaps redundant, but memory that exists in the memory map can be used to store data accessed as regular variables rather than by transferring to and from internal buffers, and it can be used as an execution region: code can be loaded to and executed directly from such memory.
Now I'm asking this question because I have a slight feeling I have completely missed the point of what the External RAM region really is.
Self awareness is a skill. That is known as conscious incompetence and is a motivator for learning.
It is my conclusion that the memory regions listed by ARM is showing the full potential of a 32bit MCU, as I incorrectly state that the external RAM region "is quite large!" does not necessarily mean that this is "real" size of that region, if it is even used, it depends on what the vendor can physically achieve within the MCU hardware, and I imagine they would limit hardware capabilities to be competitive on price, power consumption etc.
No, it is largely about the number of pins available for an address bus (except for QSPI). The external memory is a matter for the board design - it is not something the MCU vendor decides must be present. The constraint is a maximum, not a required amount of physical memory. The STM32 FMC supports the following memory sizes/types:
So you can have up to 512Mb of SDRAM for example. The space available for static memories (NOR/PSRAM/SRAM) is significantly larger than the typical size of such memories.
I was doing a little research on 32-bit microprocessors and have learnt that:
1) A 32-bit microprocessor can only address 2^32 bits of memory which means that the memory pointer size should not exceed 32-bit range i.e. the pointer size should be equal to or less than 32-bit.
2) I also came to know that the CPU allocates multiple blocks of memory for things like storing numbers and text; that is up to the program and not related to the size of each address (Source: here). So is it possible for a CPU to use multiple blocks (registers) to store pointers more than 32 bits in size?
Processors can access an essentially unlimited amount of memory by using variations on a technique called bank switching. In a simple bank-switching scheme, the memory chips that are wired to a portion of the address space will have some address inputs fed by the processor and some from an external latching device. Historically, the IBM PC had a 1MB address space, but an expanded memory board would IIRC allow two 16KB regions of that space to be mapped to any of dozens or hundreds of 16KB blocks of memory contained thereon. Nowadays processors generally have a memory-management unit built-in, which maps 4KB or 64KB blocks of memory to any address within a much larger space, and additional circuitry may, with OS support, expand things further.
The big difficulty with bank switching is that any given address might identify many different places in memory depending upon how the bank-switching hardware is configured, so accessing data from memories in a banked region will generally be more complicated than accessing data in directly-accessible memory and will only be possible from code which knows how the bank-switching hardware works. Nowadays it's more common to simply use a processor which can access all the memory one needs, but historically bank-switching was often a useful technique for going beyond processor limitations.
You could store a 64-bit pointer using 2 separate locations in memory. But it probably wouldn't be useful, since your processor can only use 32-bit pointers.
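Just to show what "two separate locations" means in practice, here is a tiny Python sketch that splits a 64-bit value into two 32-bit words and puts it back together. The function names are made up for this example, and of course this only stores the value; it does not let a 32-bit processor dereference it.

```python
MASK32 = 0xFFFF_FFFF

def split64(value):
    """Split a 64-bit value into (low, high) 32-bit words."""
    return value & MASK32, (value >> 32) & MASK32

def join64(low, high):
    """Rebuild the 64-bit value from its two 32-bit words."""
    return (high << 32) | low
```

For instance, `split64(0x1_2345_6789)` gives `(0x2345_6789, 0x1)`, and `join64` of those two words returns the original value.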
I'm interested in the topic of operating systems and I have a dummy question. Standard PE executable files are linked to 0x400000. My question is how can the operating system load multiple executables with the same image base, when virtual memory just maps virtual addresses to physical ones? Is it storing the PDE and PTE index of a thread somewhere? Is there some addition to each address before execution starts? How does it work?
Each process gets its own virtual address space, and hence there's no conflict. All virtual address spaces that exist in any one time in the system get mapped into the physical address space. Virtual memory that can't or currently isn't mapped onto a particular physical memory is held in the swap file (swap partition, or alike) — this is called paging.
During thread switches, when the CPU is about to execute a thread from a different process than it was executing so far, the operating system's scheduler informs the CPU (sets the respective registers) about the new virtual address translation table to use. Thus the CPU thinks there's just one virtual address space at the given time, while the operating system can manage many more, one for each process.
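Here is a simplified Python sketch of that mechanism, assuming classic 32-bit x86 two-level paging with 4 KB pages (10-bit directory index, 10-bit table index, 12-bit offset). The `ToyCpu` class, the nested dicts used as tables, and the frame numbers are all invented for illustration; `switch_to` stands in for the scheduler reloading the register that points at the current page directory.

```python
def split_vaddr(vaddr):
    """Split a 32-bit virtual address into (PDE index, PTE index, offset)."""
    pde_index = (vaddr >> 22) & 0x3FF   # top 10 bits pick the page-directory entry
    pte_index = (vaddr >> 12) & 0x3FF   # next 10 bits pick the page-table entry
    offset = vaddr & 0xFFF              # low 12 bits are the byte offset in the page
    return pde_index, pte_index, offset

class ToyCpu:
    def __init__(self):
        self.directory = None           # the translation tables currently in use

    def switch_to(self, directory):
        """What the scheduler does on a switch to a thread of another process."""
        self.directory = directory

    def translate(self, vaddr):
        pde, pte, offset = split_vaddr(vaddr)
        frame = self.directory[pde][pte]
        return (frame << 12) | offset

# Two hypothetical processes, both linked at 0x400000, each with its own tables.
# Layout: {pde_index: {pte_index: physical_frame}}
proc_a = {1: {0: 0x1A0}}
proc_b = {1: {0: 0x7F3}}
```

Both images use the same indices (0x400000 splits into PDE 1, PTE 0, offset 0), so nothing is added to the addresses and no per-thread index is stored. The only thing that changes on a switch is which set of tables the CPU walks, so the same virtual address ends up in a different physical frame.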
Disclaimer: My answer may be thought of as a bit superficial or imprecise compared to the reality. This is for the sake of simplicity, given the nature of the OP's question. Also, these mechanisms are CPU-dependent and operating system-dependent.
I am working on Uboot bootloader. I have some basic question about the functionality of Bootloader and the application it is going to handle:
Q1: As per my knowledge, bootloader is used to download the application into memory. Over internet I also found that bootloader copies the application to RAM and then the application runs from RAM. I am confused with the working of Bootloader...When application is provided to bootloader through serial or TFTP, What happens next, whether Bootloader copies it to RAM first or whether it writes directly to Flash.
Q2: Why there is a need for Bootloader to copy application to RAM and then run the application from RAM? What difficulties we will face if our application runs from FLASH?
Q3: What is the meaning of the statement "My application is running from RAM/FLASH"? Does it mean that our application's .text segment or .code segment is in RAM/FLASH? And that we are not concerned about the .bss section because it is designed to be in RAM?
Thanks
Phogat
When any hardware system is designed, the designer must consider where the executable code will be located. The answer depends on the microcontroller, the included memory types, and the system requirements. So the answer varies from system to system. Some systems execute code located in RAM. Other systems execute code located in flash. You didn't tell us enough about your system to know what it is designed to do.
A system might be designed to execute code from RAM because RAM access times are faster than flash so code can execute faster. A system might be designed to execute code from flash because flash is plentiful and RAM may not be. A system might be designed to execute code from flash so that it boots more quickly. These are just some examples and there are other considerations as well.
RAM is volatile so it does not retain code through a power cycle. If the system executes code located in RAM then a bootloader is required to obtain and write the code to RAM at powerup. Flash is non-volatile so execution can start right away at powerup and a bootloader is not necessary (but may still be useful).
Regarding Q3, the answer is yes. If the system is running from RAM then the .text will be located in RAM (but not until after the bootloader has copied it to there). If the system is running from flash then the .text section will be located in flash. The .bss section is variables and will be in RAM regardless of where the .text section is.
Yes, in general a bootloader boots the system, but it might also provide a mechanism for interrupting the default boot path and allow alternate firmware to be downloaded and run instead, as well as other features (like flashing).
Traditional ROM had a traditional RAM-like interface: address, data, chip select, read/write, etc. You can still buy ROM that way, but it is cheaper from a pin real estate perspective to use something SPI or I2C based, which is slower. Not desirable to run from, but tolerable to read once and then run from RAM. Newer flash technologies can have problems with read disturb, where if your code is in a tight loop reading the same instructions, or for any other reason the flash is being read too fast, the charge can drop such that a read returns the wrong data, potentially causing the program to change course or crash. Also, your PC and other Linux platforms are used to copying the kernel from NV storage (hard disk) to RAM and then running from there, so copying from flash to RAM and running from RAM has a comfort level, and RAM is often faster than flash. So there are many potential reasons not to run from flash, but depending on the system it may be possible to run from flash just fine (in some systems the flash in question is not directly accessible and not executable; of course SOME ROM in that system needed to be executable/bootable).
It simplifies the coding challenges if you program the flash with something that is in ram. You can create and debug the code one time that reads from ram and writes to flash and reads from flash and writes to ram. DONE. Now you can work on separate code that receives data from serial to ram, or from ram to serial. DONE. Then work on code that does the same over ethernet or usb or whatever DONE. You dont have to deal with inventing a protocol or solving the problem of timing. Flash writing is very slow, and even xmodem at a moderate speed can be way too fast, so you have to buffer that data in ram anyway, might as well make the tasks completely separate, instead of an xmodem or any other serial based flash loader with a big ram based fifo, just move the data to ram, then separately go from ram to flash. Same for other interfaces. It is technically possible to buffer the data and give the illusion of going from the download interface straight to flash, and depending on the protocol it is technically possible to hold off the sender so that as little as one flash page is required in ram before programming flash. With the older parallel flashes you could do something pretty cool which I dont think most people figured out. When you stop writing to the flash page for some known period of time the flash would automatically start to program that page and you have to wait for 10ms or something like that before it is done. What folks assumed was you had to program sequential addresses and had to get the new data for the next address in that period of time and would demand high serial port speeds, etc, the reality is you can pound the same address over and over again with the same data and the flash wont start to program the page, and the download interface can be infinitely slow. Serial flashes work differently and either dont need tricks or have different tricks.
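The "keep the tasks completely separate" approach might look like this as a Python toy. The 256-byte page size, the function names and the bytearray standing in for flash are all assumptions for illustration; a real driver would erase, program and poll for completion where the comment indicates.

```python
PAGE = 256  # hypothetical flash page size in bytes

def receive(chunks):
    """Step 1: collect the whole download into RAM, whatever the transport was."""
    ram = bytearray()
    for chunk in chunks:
        ram += chunk
    return ram

def program_flash(flash, image):
    """Step 2: write the RAM-resident image to flash one page at a time."""
    pages = 0
    for start in range(0, len(image), PAGE):
        chunk = image[start:start + PAGE]
        # A real driver would erase/program the page here, then wait for it to finish.
        flash[start:start + len(chunk)] = chunk
        pages += 1
    return pages
```

`receive` does not care how slow flash is, and `program_flash` does not care whether the image came from serial, TFTP or USB, so each piece can be written and debugged once.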
RAM/FLASH is not some industry term. It likely means that .text is in rom (flash) and .data and .bss are in ram. A copy of the initial state of .data will probably be on flash as well and copied to ram before main() is called, likewise .bss will be zeroed before main() is called. look at crt0.S for most platforms in gnu sources (glibc, or is it gcc, I dont know) to get the gist of how the bootstrap works in a generic fashion.
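As a rough model of those two pre-main() steps, here is a Python sketch with bytearrays standing in for flash and RAM. The function name and the offsets passed in are hypothetical; in a real system they come from symbols the linker script defines, and the actual startup code is usually a few lines of assembly or C.

```python
def startup(flash, ram, data_load, data_start, data_size, bss_start, bss_size):
    """Mimic the two memory-initialisation steps run before main()."""
    # Copy the initial values of .data from their load address in flash to RAM.
    ram[data_start:data_start + data_size] = flash[data_load:data_load + data_size]
    # Zero .bss so uninitialised globals start at 0.
    ram[bss_start:bss_start + bss_size] = bytes(bss_size)
```

With a flash image of `b"CODE\x01\x02\x03"` and RAM full of power-up garbage, `startup(flash, ram, 4, 0, 3, 3, 4)` leaves the three .data bytes at the start of RAM followed by four zeroed .bss bytes, and touches nothing else.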
A bootloader is not required to run Linux or other operating systems. You don't NEED U-Boot, but it is quite useful. Linux is pretty easy: you copy the kernel and root file system, set some registers or some tags in memory or both, then branch to the entry point in the kernel and Linux takes over from there. Because Linux is so complicated, it is desirable to have a complicated bootloader that can capitalize on high-speed interfaces like Ethernet (rather than being limited to serial or slower).
I would add something regarding your question Q2.
Q2: Why there is a need for Bootloader to copy application to RAM and then run the application from RAM? What difficulties we will face if our application runs from FLASH?
It is not only about having SPI or similar serial external code memory (which is not that often anyway).
Even an external ROM/FLASH/EPROM connected to the usual high-speed parallel bus will prevent a system from running at its maximum clock (with zero wait states), even on relatively slow MCUs, due to the external memory access time. You would need a 10 ns FLASH access time for a 100 MHz clock, which is not so easy to get (if economically possible at all). And you would agree that 100 MHz is not such a brain spinning speed any more :-)
That is why many MCU/CPU architectures do tricks like reading multiple instructions at once, or having an internal cache, or doing whatever is needed to compensate for a slow external code memory. Only most older 8-bit architectures can execute the code directly from the flash memory ('in place').
Even if your only code memory was the internal Flash, something needs to be done to speed it up. Take a look for example at this article:
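The arithmetic behind that 10 ns figure can be sketched as below. This is a deliberately simple model (the access must fit in a whole number of clock periods, nothing else is counted), and the 50 ns flash in the examples is an assumed figure, not one from a datasheet.

```python
def wait_states(access_ns, clock_mhz):
    """Extra cycles per fetch when the memory access must fit in whole clock periods."""
    cycles = -(-access_ns * clock_mhz // 1000)   # ceil(access time / clock period)
    return max(cycles - 1, 0)
```

Under this model a 10 ns memory at 100 MHz needs `wait_states(10, 100) == 0`, while an assumed 50 ns flash needs 4 wait states at 100 MHz and only reaches zero wait states once the clock is down to 20 MHz.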
http://www.iqmagazineonline.com/magazine/pdf/v_3_2_pdf/pg14-15-18-19-9Q6Phillips-Z.pdf
It describes how the ARM7 has incorporated something they called MAM (Memory Accelerator Module). It is a good read, and you will find some measures there for dealing with slow code memory access on the specific ARM7 architecture (they apply to most others too):
Limit maximum clock frequency (from 80 MHz to about 20 MHz for the example in the article)
Insert wait-cycles during flash accesses
Use an instruction cache
Copy the program code from flash to RAM
Obviously, if the instruction cache was not an option (too small, or the clock too high) you are really left only with execution from the RAM, after relocating the code there at the start up.
There is also the option to run only specific sections of code from RAM, which can be specified to the linker. For DSP (Digital Signal Processing) systems, there was really no option to run from EPROM/FLASH even in the old days with clocks of only a few tens of MHz, let alone now.
Another issue is debugging, the options for debugging the code placed in ROM, or even Flash, are very limited (you have to move section of the code to RAM to be able to set a break point on most systems).
Regarding Q2, one of the difficulties you may face executing from Flash is another code update. If you are executing from the same block of Flash you are trying to update, the system will crash. This depends on your system architecture (how your application and bootloader are organized in Flash) but may be particularly hard to avoid if you are trying to update the bootloader itself.
I basically wanted to know what exactly a virtual processor is. At IBM's site they define it as:
"A virtual processor is a representation of a physical processor core to the operating system of a logical partition that uses shared processors. "
I understand that if there are x processors, each of which can simultaneously perform two operations, then the system can perform 2x operations simultaneously. But where does virtual processor fit into this. And i tried looking up the difference between a logical partition and other partitions such as primary but wasn't really sure.
I'd like to draw an analogy between virtual memory and virtual processors.
Start with expectations:
A user program is written against a set of expectations about what the memory looks like (a nice, flat, large, contiguous memory model is the best...)
An OS is written against a set of expectations about how the hardware performs (what CPU protection modes are available, how interrupts arrive and are blocked and handled, how to talk to IO devices, etc...)
Realize that these expectations can be met directly by the hardware, or by an abstraction layer
Virtual memory is a set of (specialized, not found in simple chips) hardware tools and OS services that fake a user program into thinking that it has that nice, flat, large, contiguous memory space, even while the OS is busily dividing the real memory into little pieces, storing some of them on disk, bringing others back, and otherwise making a real hash of it. But your code doesn't care. Everything just works.
A virtual processor system is a set of (specialized, not found in consumer CPUs) hardware tools and hypervisor services that allow your OS to believe it has direct access to one or more processors with the expected protection modes, interrupts, etc. even though the hypervisor is busily swapping whole OS contexts onto and off of one or more real processors, starting and stopping access to IO busses, and so on and so forth. But the OS doesn't care. Everything just works.
The hardware support to do this has only recently started to be available in "desktop" CPUs, but Big Iron has had it for ages. It is useful for a couple of reasons:
Protection. In a properly protected OS, it is tough for one process or user to spy on another. But since they can be resident in the same context, it may still be possible. Virtualizing the OSs separates them by another, even thinner channel, which makes it that much harder for data to leak and for malicious things to be done.
Robustness. If you can swap OS contexts in and out, you can migrate them from one machine to another, and checkpoint and restart them. This allows for computers that detect failures on their own processors and recover gracefully.
These are the things (aside from millions of LOC of heavily debugged, mission critical code) that have kept people paying for Big Iron.