Can a 32-bit processor load a 64-bit memory address using multiple blocks or registers? - 32-bit

I was doing a little on 32-bit microprocessors and have I have learnt that:
1) A 32-bit microprocessor can only address 2^32 bits of memory which means that the memory pointer size should not exceed 32-bit range i.e. the pointer size should be equal to or less than 32-bit.
2) I also came to know that CPU allocate multiple blocks of memory for things like storing numbers and text, that is up to the program and not related to the size of each address (Source:here).So is it possible that a CPU can use multiple blocks (registers) to store pointers more than 32-bit in size?

Processors can access an essentially unlimited amount of memory by using variations on a technique called bank switching. In a simple bank-switching scheme, the memory chips that are wired to a portion of the address space will have some address inputs fed by the processor and some from an external latching device. Historically, the IBM PC had a 1MB address space, but an expanded memory board would IIRC allow two 16KB regions of that space to be mapped to any of dozens or hundreds of 16KB blocks of memory contained thereon. Nowadays processors generally have a memory-management unit built-in, which maps 4KB or 64KB blocks of memory to any address within a much larger space, and additional circuitry may, with OS support, expand things further.
The big difficulty with bank switching is that any given address might identify many different places in memory depending upon how the bank-switching hardware is configured, so accessing data from memories in a banked region will generally be more complicated than accessing data in directly-accessible memory and will only be possible from code which knows how the bank-switching hardware works. Nowadays it's more common to simply use a processor which can access all the memory one needs, but historically bank-switching was often a useful technique for going beyond processor limitations.

You could store a 64 bit pointer using 2 separate locations in memeory. But it probably wouldn't be useful since your processor can only use 32 bit pointers.

Related

Cortex-M3 External RAM Region

I'm currently researching topics such as RAM/ROM/Stack/Heap and data segments etc.
I was looking at the ARM Cortex-M3 memory map and saw the region labeled "External RAM".
According to the data sheet of a random Cortex-M3 STM32 MCU the external RAM region is mapped from 0x60000000- 0x9FFFFFFF, so it is quite large!
I couldn't find a definitive answer about how this region is actually used.
I imagine you would have an external SRAM and you would choose between two options.
(1) Read via the SPI interface and place into a local buffer(stack), then load that local buffer into the external ram region. This option seems to have a lot of negative consequences, such as hogging the CPU and increasing the stack temporarily if the requested data is very large.
(2) Utilize a DMA and transfer from the SPI interface into the external ram region.
Now I can't understand, why you would map the data to this specific address range, what are the advantages, why don't you just place the data directly in that huge memory region?
Now I'm asking this question because I have a slight feeling I have completely missed the point of what the External RAM region really is.
-Edit-
In the data sheet that is linking to the STM32 device, the memory region "External RAM" is marked as reserved. It is my conclusion that the memory regions listed by ARM is showing the full potential of a 32bit MCU, as I incorrectly state that the external RAM region "is quite large!" does not necessarily mean that this is "real" size of that region, if it is even used, it depends on what the vendor can physically achieve within the MCU hardware, and I imagine they would limit hardware capabilities to be competitive on price, power consumption etc.
I imagine you would have an external [SRAM][3] and you would choose
between two options.
(1) Read via the SPI interface and place into a local buffer(stack), then load that local buffer into the external ram region. This option
seems to have a lot of negative consequences, such as hogging the CPU
and increasing the stack temporarily if the requested data is very
large.
(2) Utilize a DMA and transfer from the SPI interface into the external ram region.
None of the above. External memory on an SPI bus is not memory mapped. If you have an SPI memory, it is not mapped to that region, it is simply an SPI device, and the "address" is simply an offset from the start of the memory device itself. MCUs with a Quad or Octo-SPI controller are memory mapped. QSPI RAM is not that common and relatively expensive. QSPI is more commonly used for flash memory.
The external memory region can be used by STM32 parts with an FSMC (Flexible
Static Memory Controller) or an FMC (Flexible Memory Controller), or and mentions a QPSI interface. The latter FMC SDRAM, and is generally available on the higher end parts. Apart from the QSPI and NAND flash, these interfaces require using the GPIO EMIF (external memory interface) alternate function to create an address and data bus. So it generally requires parts with high pin count to accommodate. The EMIF can be configured for 8, 16 or 32bit data bus for reduced pin count (and slower access).
Now I can't understand, why you would map the data to this specific
address range, what are the advantages, why don't you just place the
data directly in that huge memory region?
Since it was precipitated by your earlier misconception this question is perhaps redundant, but memory that exists in the memory map can be used to store data accessed as regular variables rather than transferring to an from internal buffers and it can be used as an execution region - code can loaded to and be executed directly from such memory.
Now I'm asking this question because I have a slight feeling I have completely missed the point of what the External RAM region really is.
Self awareness is a skill. That is known as conscious incompetence and is a motivator for learning.
It is my conclusion that the memory regions listed by ARM is showing the full potential of a 32bit MCU, as I incorrectly state that the external RAM region "is quite large!" does not necessarily mean that this is "real" size of that region, if it is even used, it depends on what the vendor can physically achieve within the MCU hardware, and I imagine they would limit hardware capabilities to be competitive on price, power consumption etc.
No, it is largely about the number of pins available for an address bus (except for QSPI). The external memory is a matter for the board design - it is not something the MCU vendor decides must be present. The constraint is a maximum, not a required amount of physical memory. The STM32 FMC supports the following memory sizes/types:
So you can have up to 512Mb of SDRAM for example. The space available for static memories (NOR/PSRAM/SRAM) is significantly larger than the than the typical size of such memories.

Is there any connection between the segments created in memory by a microprocessor and the memory structure of a process in an operating system?

In 8086 microprocessor, we segment the memory into segments of 64K each because of the 16 bit registers (Since a 20 bit address cannot be stored in the 16 bit register). These segments are categorized as code segment, data segment, stack segment and extra segment. This structure is similar to that of created by a process in operating system. Does that mean each process takes up memory equivalent to 4 segments which will be equivalent to 4*64K in case of 8086 ? And if this is true then by doing some more math we can say that only 4 process will be handled by a 8086 microprocessor at a time (i.e. one of the process will be running state and others would be in block or ready state) since maximum of 16 segments are possible (Total memory size / size of each segment = 1MB/64K = 16).
I have just started studying this and saw this equivalence between process and segments. Does any such connection between the segments of the memory and the memory structure of the process actually exists or it's just my crazy imagination ?
A little history helps. Early UNIX(tm) ran on the Digital pdp minicomputer family. The first circulated versions were V6 & V7, which were exclusive to the pdp-11 family. That family could support a whopping 256K of RAM; but the gp register set (used for address formation) were 16bits wide. There was a limited memory protection scheme in the processor, which permitted the kernel (supervisor) to have a separate address space from user (user); and instructions (addresses generated by pc) to be separate from data (generated by other means). This will probably get edited into the dust by pdp-11 fanbois.
At around this time, intel was rolling what was to become the 8086. Current 8-bit CPUs were already straining at a 64K address space limitation, and were using a concept called bank switching to increase that. In bank switching, some sub-ranges of the 64K address space could be re-pointed into a larger memory bank; so although you could carefully address much more memory. The Hitachi 64180 was one of the CPUs that incorporated this into its silicon; most used external memory controllers.
The 8086 addressing scheme was an amalgamation of these notions. You could produce an Operating System which supported dynamically relocated processes and shared text with up to 64K Instruction + 64K Data. The general idea was you take the segment registers out of the programming model, thus if the OS has to relocate the process, it knows that the process had no saved copy of the old segment value. The commercial OS QNX 1.x, 2.x provided this as a model; the later using the 286 extensions to protect against programs that played with the segment registers.
For programs that didn't care about such subtleties (Lotus 123, ...), you could use the segment registers to effectively create a 2^20 address space on the 8086. It is an ugly programming model in this mode because address formation is A=Seg*16+Base, so Seg=1,Base=0 and Seg=0,Base=16 resolve to the same address.
So, you aren't hallucinating, it was quite intentional, if more than a little half-arsed.

How does malloc know where the first available block is in embedded systems?

I have read that malloc has multiple implementations which are platform depended.
How does it work in an embedded device in bare metal programming?
Let's suppose we have an mcu with 256KB FLASH memory and 64KB RAM.
How does it know how much available RAM there is from my program?
For bare metal systems, you'll have a specific segment allocated in the linker script, often called .heap. There is no such thing as memory sharing between processes, meaning that the heap must have a fixed maximum size and therefore is pretty useless in general. malloc doesn't know a thing about how much RAM your program uses since there is no desktop OS in sight.
Your RAM is divided into .stack, .data, .bss and .heap, each with its own fixed maximum size. More about these segments here: https://electronics.stackexchange.com/a/237759/6102. In a typical bare metal MCU application, most of the RAM will be reserved for .data and .bss. You will have something from 128 bytes up to several kb reserved for the stack. You will typically not have a heap at all - but if you do, it will sit there and take up a fixed amount of x kb no matter how much of it you actually use.
malloc in itself could be implemented in different ways indeed. Either you include a "header" together with each allocated segment, the header stating the allocated size and potentially the address of the next available free segment. Or you could implement it as a look-up table where each item is a pointer to the first element and the size.
None of this is particularly relevant, since you shouldn't be using heap allocation in embedded systems. The main reason being that it doesn't make any sense. You don't want arbitrary behavior, you want deteministic behavior. You want to allocate x amount of memory for the worst case and if a heap was to be used it would have to be at least that large anyway, so you gain nothing but bloat from using a heap. Then comes all the usual problems with allocation overhead, fragmentation and leaks.
For bare metal/RTOS applications, do yourself a favour and delete .heap from your linker script, then forget that you ever heard about malloc. A MCU is not a PC.

What does it mean when my CPU doesn't support unaligned memory access?

I just discovered that the ARM I'm writing code on (Cortex M0), doesn't support unaligned memory access.
Now in my code I use a lot of packed structures, and I never got any warnings or hardfaults, so how can the Cortex access members of these structures when it doesnt allow for unaligned access?
Compilers such as gcc understand about alignment and will issue the correct instructions to get around alignment issues. If you have a packed structure, you will have told the compiler about it so it know ahead of time how to do alignment.
Let's say you're on a 32 bit architecture but have a struct that is packed like this:
struct foo __attribute__((packed)) {
unsigned char bar;
int baz;
}
When an access to baz is made, it will do the memory loads on a 32 bit boundary, and shift all the bits into position.
In this case it will probably to a 32 bit load of the address of bar and a 32 bit load at the address of bar + 4. Then it will apply a sequence of logical operations such as shift and logical or/and to end up with the correct value of baz in a 32 bit register.
Have a look at the assembly output to see how this works. You'll notice that unaligned accesses will be less efficient than aligned accesses on these architectures.
On many older 8-bit microprocessors, there were instructions to load (and storing) registers which were larger than the width of the memory bus. Such an operation would be performed by loading half of the register from one address, and the other half from the next higher address. Even on systems where the memory bus is wider than 8 bits wide (say, 16 bits) it is often useful to regard memory as being an addressable collection of bytes. Loading a byte from any address will cause the processor to read half of a 16-bit memory location and ignore the other half. Reading a 16-bit value from an even address will cause the processor to read an entire 16-bit memory location and use the whole thing; the value will be the same as if one read two consecutive byte addresses and concatenated the result, but it will be read in one operation rather than two.
On some such systems, if one attempts to read a 16-bit value from an odd address, the processor will read two consecutive addresses, using half of one value and the other half of the other value, as though one had performed two single-byte reads and combined the results. This is called an unaligned memory access. On other systems, such an operation will result in a bus fault, which generally triggers some form of interrupt which may or may not be able to do something useful about it. Hardware to support unaligned accesses is rather complicated, and designing code to avoid unaligned accesses is generally not overly difficult. Thus, such hardware generally only exists either on processors that are already very complicated, or processors which will be running code that was designed for processors that would assembly multi-byte registers from single-byte reads (e.g. on the 8088, every 16-bit read required two 8-bit memory fetches, and a lot of 8088 code was run on later Intel processors).

Why is virtual memory needed in embedded systems? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
Per my understanding, virtual memory is as follows:
Programs/applications/executables reside in a storage device. Storage device access is much slower than RAM. Hence, programs is copied from storage memory to main memory for execution. Since computers have limited main memory (RAM), when all of the RAM is being used (e.g., if there are many programs open simultaneously or if one very large program is in use), a computer with virtual memory enabled will swap data to the HDD and back to memory as needed, thus, in effect, increasing the total system memory.
As far as I know, most embedded devices do not have disk memory (like smartphones or in car infotainment systems). Code is directly executed from Flash memory. RAM is mainly used as a scratchpad area (local variables, return address etc).
So why do we need virtual memory in embedded systems? (e.g. WinCE and QNX support virtual memory)
Your understanding is completely wrong. You are confusing virtual memory with swapping or page files. There are systems that have virtual memory and no swap or page files and there are systems that swap without virtual memory.
Virtual memory just means that a process has a view of memory that is different from the physical mapping. Among other things, it allows processes to have their own virtual address space.
Storage device access is much slower than RAM. Hence programs is copied from storage memory to main memory for execution. Since computers have limited main memory (RAM), when all of the RAM is being used (e.g., if there are many programs open simultaneously or if one very large program is in use), a computer with virtual memory enabled will swap data to the HDD and back to memory as needed, thus, in effect, increasing the total system memory.
That's swapping (or paging). It has nothing to do with virtual memory except that most modern operating systems implement swapping using virtual memory. Swapping actually existed before virtual memory.
I think you're probably incorrect about these devices running code directly from flash memory. The read speed of flash is pretty low and RAM is very cheap. My bet is that most of the systems you mention don't run code directly from flash and instead use virtual memory to fault code into RAM as needed.
embedded systems, the term itself has a wide range of applications. you could call a small microcontroller with flash program space measured in kbytes or less and ram measured in either bits or bytes (not enough to be kbytes) an embedded system. Likewise a tivo running a full blown operating system on a pretty much full blown computer motherboard (replace tivo with xbox as another example) as an embedded system. So you need to be less vague about your question. virtual memory has little to do with any of that its applications cross those boundaries.
There are many answers above, David S has the best of course that virtual memory simply means the memory address on one side of the virtual memory boundary is different than the physical address that is used on the other side of that boundary. Where, how, why, etc is there a boundary varies.
A popular use for virtual memory, and I might argue a primary use case is for operating systems. One benefit is that for example all applications could be compiled for the same address space, all applications might be compiled such that from the programs perspective they all start at say address 0x8000, and as far as that program when it runs and accesses memory it accesses stuff based on that address. A combination of the hardware and the operating system change that virtual address that the program is using to a physical address. If the operating system allows for multitasking, then each task might think they are in the same address space but the physical addresses are different for each of those tasks. I wont elaborate further on why using an assumed, fixed address space, is a benefit. Another aspect that operating systems use is memory management. Many MMU's will let you segment the memory however. If a user wants to allocate 100 Megabytes of memory the program may access in its virtual address space that 100 meg as if it were linear and in that address space it is linear, but that 100 meg might be broken down into say 4Kbyte chunks that are scattered all about the physical address space, not always likely but certainly technically possible that no two chunks of that physical memory is next to any other chunk of that 100 meg. your memory management doesnt necessarily have to try to keep large physical chunks of memory available for applications to allocate. Note not all MMUs are exactly the same and 4Kbytes is just an example. A third major benefit from virtual address space to an operating system is protection. If the application is bound to the virtual address space, it is often quite easy to prevent that application from touching the memory of any other application or the operating system. the application in this case would operate/execute at a proection level such that all accesses are considered virtual and have to go through a translation to physical, the tables that are used to define that virtual to physical can contain protection flags. If the application addresses a memory address in its virtual space that it has no business accessing, the hardware can trap that and let the operating system take action as to how to handle it (virtualize some hardware, pop up an error and kill the app, pop up a warning and not kill the app but at the same time feed the app bogus data for their transaction, etc).
There are lots of ways this can be used in an embedded system. first off many embedded systems run operating systems, so all of the above, ease of compiling the program for the address space, relative ease of memory management, and protection of the other applications and operating system and other benefits not mentioned. (virtualization being one, being able to enable/disable instruction/data caching on a block by block basis is another)
The bottom line though is what David S pointed out. virtual memory simply means the virtual address is not necessarily equal to the physical address, it can be but doesnt have to be, there is some boundary, some hardware, usually table driven, that translates the virtual address into a physical address. Lots of reasons why you would want to do this, since some embedded systems are indistinguishable from non-embedded systems any reason that applies to a non-embedded system can apply to an embedded system.
As much as folks may want you to believe that a system has a flat address space, it is often an illusion. In a microcontroller for example you might have multiple flash banks and one or more ram banks. Each of these banks has a physical, generally zero based address. Even if there is no mmu or anything else like that there is a place somewhere between the address bus on the processor and the address bus on the flash or ram memory that decodes the address on the processor and uses that to address into the specific memory bank. Often the lower bits match and upper bits are responsible for the bank choices (this is often the case with an mmu as well) so in that sense the processor is living in a virtual address space. (not limited to microcontrollers, this is generally how processors address busses are treated) With microcontrollers depending on a pin being pulled high or low or some other mechanism you might have a chip feature that allows one flash bank to be used to boot the processor or another. You might tie an input pin high and the processors built in bootloader allows you to access and debug the system for example reprogram the application flash. Or perhaps tie that line low and boot the application flash instead of the vendors debugger/boot flash. some chips get even more complicated letting you boot one flash then the program writes a register somewhere instantly changing the memory architecture moving things around, for example allowing ram to be used for the interrupt vector table so your application can be changed after boot rather than a vector table in flash that is not as easy to change at will.
now when you talk about virtual memory as far as swapping to and from a disk, that is a trick often employed by operating systems to give the illusion of having more ram. I mentioned that above under the category of virtualization. virtual memory in the sense that it isnt really there, I have X bytes but will let the software think there are Y bytes (where Y is larger than X) available. The operating system through the virtual tables used by the hardware, manages which memory chunks are tied to physical ram and are allowed to complete as is by the hardware, or are marked as not available in some way, causing an exception to the operating system, upon inspection the operating system determines that this is a valid address for this application, but the data behind this address has been swapped to disk. The operating system then finds through some algorithm another chunk of ram belonging to whomever (part of the algorithm) and it copies that chunk of ram to disk, marks the table related to that virtual to physical as not valid, then copies the desired chunk from disk to ram, marks that chunk as valid and lets the hardware complete the memory cycle.
Not any different than say how vmware or other virtual machines work. You can execute instructions natively on the hardware using virtual memory until such time as you cause an exception, the virtual machine might think you have an xyz network interface and might have a driver that is accessing a register in that xyz network interface, but the reality is you have no xyz hardware and/or you dont want the virtual machine applications to access that hardware, so you virtualize it, you trap that register access, and using software that simulates the hardware you fake that access and let the program on the virtual machine continue. This obviously not the only way to do virtual machines, but it is one way if the hardware supports it, to let a virtual machine run very fast as a percentage of the time it is actually running instructions on the hardware. The slowest way to virtualize of course is to virtualize everything including the processor, every instruction in that case would be simulated, this is quite slow but has its own features (virtualizing an arm system on an x86 or x86 on an arm, xyz on an abc, fill in the blanks). And if that is the type of virtual memory you are talking about in an embedded system, well if the embedded system is for the most part indistinguishable from a non-embedded system (an xbox or tivo for example) then well for the same reasons you could allow such a thing. If you were on a microcontroller, well the use cases there would generally mean if you needed more memory you would buy a bigger microcontroller, or add more memory to the system ,or change the needs of the application such that it doesnt need as much memory. there may be exceptions, but it mostly depends on your application and requirements, a general purpose or general purpose like system which allows for applications or their data to be larger than the available ram, will require some sort of solution. the microcontroller in your keyless entry key fob thing or in your tv remote control or clock radio or whatever normally would not have a need to allow "applications" to require more resources than are physically there.
The more important benefit of using virtual memory is that every process gets its own address space which is isolated from every other process's. That way virtual memory helps keep faults contained and improves security and stability. I should note that it is still possible for two processes to share a bit of memory, to facilitate communication (shared mem IPC).
Also you can do other tricks like conserving memory via mapping shared parts into more than one process's (libc comes to mind for embedded use) address space but only having it once in physical mem. Also this gives it a speed boost, you can even enhance it further the way linux does cheapen fork/clone by only copying the in kernel descriptors and leaving the memory image alone up until the first write access is done with a similar idea.
As a last benefit, in modern systems, it's common to do file I/O via mapping the file into the process space (cf. mmap for example).
It's interesting to note that one can get some of the benefits of "virtual memory" without needing a full-fledged MMU. The hardware requirements can sometimes be amazingly light. The PIC 16C505 has a 5-bit address space and 40 bytes of RAM; addresses 0x10 to 0x1F can map to either of two groups of 16 bytes of RAM. When writing an application which needed to manage two different data streams, I arranged so that all the variables associated with one data stream would be in the first group of 16 "switchable" memory locations, and those associated with the other would be at the corresponding addresses in the second group. I could then use the same code to manage both data streams. Simply set the banking bit one way, call the routine, set it the other way, and call the routine again.
One of the reasons Virtual Memory exists is so that your device can multitask. It can also act as your RAM does, thus taking the load off of your physical RAM and swapping the load back and forth.