How do functions access locals in stack frames? - embedded

I've read that stack frames contain return addresses, function arguments, and local variables for a function. Since functions don't know where their stack frame is in memory at compile time, how do they know the memory address of their local variables? Do they offset and dereference the stack pointer for every read or write of a local? In particular, how does this work on embedded devices without efficient support for pointer accesses, where load and store addresses have to be hardcoded into the firmware and pointer accesses go through reserved registers?

The way objects work is that the compiler or assembly programmer determines the layout of an object: the offset of each field relative to the start of the object (as well as the size of the object as a whole).  Then objects are passed and stored as references, which are generally pointers in C and machine code.  Given struct xy { int x; int y; }, we can reason that x is at offset 0 and y at offset 4 (with 4-byte ints) from an object reference to a struct xy.
The stack frame is like an object that contains a function's memory-based local variables (instead of struct members); it is accessed not through an object reference but through the stack or frame pointer, and it is allocated/deallocated by stack pointer decrement/increment instead of by malloc and free.
Both share the issue that we don't know, until runtime, the actual address of a given field (x or y) of a dynamically allocated object, or the stack-frame position of a memory-based local variable.  But when a function runs, it can compute the complete absolute address (of an object field or a memory-based local) quite simply: it adds the base (object reference or stack/frame pointer) to the relative position of the desired item, known from the predetermined layout.
Processors offer addressing modes that help to support this kind of access, usually something like base + displacement.
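For example, here is a minimal C sketch of both cases; the offsets and registers in the comments are illustrative (assuming 4-byte ints and a typical 32-bit target), not guaranteed by the C language:

/* struct fields and memory-based locals are both reached as base + displacement */
struct xy { int x; int y; };

int sum(const struct xy *p)     /* p (the base) arrives in a register */
{
    return p->x + p->y;         /* loads from [p + 0] and [p + 4]     */
}

int demo(void)
{
    struct xy v;                /* in this frame: at sp + 8, say      */
    v.x = 1;                    /* store to [sp + 8]                  */
    v.y = 2;                    /* store to [sp + 12]                 */
    return sum(&v);             /* &v = sp + 8, computed at runtime   */
}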
Let's also note that many local variables are assigned directly to CPU registers and so have no memory address at all.  Other local variables move between memory and CPU registers; keeping a recently loaded value in a register is an optimization that avoids touching memory when the variable's value is needed again soon.
In many ways, processors for embedded devices are like other processors, offering addressing modes to help with memory accesses, and with optimizing compilers that can make good decisions about where a variable lives.  As you can tell from the above, not all variables need live in memory, and some live both in memory and in CPU registers to help reduce memory access costs.

The answer is: it depends on the architecture. You will have a register that contains the address of the current stack frame (EBP on x86, for instance). Once you have that, individual variables are identified by their offsets into the stack frame, calculated from object sizes at compile time (hence the need for sizes to be known at compile time for local variables).
Even if a stack frame appears in different places in memory, the variables will have the same relative offset, so you can always calculate the address.
The size of the stack frame for each function is calculated at compile time and included in the code, so that each call can set up and clean up its own frame.
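As a hedged illustration (frame layouts are ABI- and compiler-specific; the offsets below are made up for a 32-bit machine that uses a frame pointer, fp):

void f(int arg)
{
    int a;                  /* at fp - 4, say                          */
    int b[4];               /* at fp - 20 .. fp - 5, say               */

    /* compile-time picture of the frame:
     *   fp + 8   arg (if passed on the stack)
     *   fp + 4   return address
     *   fp + 0   caller's saved frame pointer
     *   fp - 4   a
     *   fp - 20  b[0..3]
     * The prologue reserves the whole local area at once and the
     * epilogue releases it; every access below is fp + constant.      */
    b[0] = arg + 1;         /* store to [fp - 20]                      */
    a = b[3];               /* load from [fp - 8]                      */
    (void)a;
}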

Related

Process Instructions storage (Operating System)

I was learning about how a process looks inside memory from Operating System Concepts by Abraham Silberschatz.
So I came to know that it mainly has the following sections:
HEAP
STACK
DATA (read/write data, for global variables)
TEXT (read-only code)
Shared libraries
OS reserved space
(Diagram: "Figure 3.1 - A process in memory")
I have some questions regarding the process workflow.
Where does the PCB fit in this diagram?
People generally show the called functions getting pushed onto the process stack in memory, but in actuality three pieces of information get pushed (local variables, passed parameters, and the return address). Two sub-questions here:
2.1 Where are the actual instructions stored in this diagram (because the stack only has data, not instructions)? Is it in the text section?
2.2 If this stack data is pushed, then it must get popped when the function execution is completed. So how does the return address come into play while popping?
That "Figure 3.1 - A process in memory", shows the address space of a process.  Each process typically has its own address space (not depicted), and the kernel also typically has its own address space (also not depicted).
The PCBs usually live within the kernel, which manages the processes.
Functions, when they are not active, still have machine code instructions, which are located within the text segment/section of the process.
Functions that are invoked are said to be activated, and an activation record, also known as a stack frame or call frame, is created on the stack for some functions, depending on how complex the function is.  (The stack frame or activation record is effectively private data of the function, not generally intended for other functions to inspect, modulo exception (throw/catch) mechanisms.)
A function, X, that calls another function, Y, will suspend itself upon the invocation of Y, waiting for Y to return to X before X resumes.  In such a scenario, X uses an activation record on the stack to maintain its suspended state, which it will use upon resumption.
If/when function Y returns to X it must remove any data that it (Y) allocated on the stack, and restore the stack pointer, and other call-preserved registers, to their original value(s) in order for X to successfully resume.  The stack pointer register is the reference for a function to find its stack allocated data.
When X calls Y we can speak to the presence of two return addresses: X has a return address to get back to its caller, and Y has a return address to get back to X.  As this is the nature of calling, some architectures will provide instructions for calling that also push the return address directly onto the stack, and provide instructions for returning that pop the return address off the stack to resume the caller.
However, RISC architectures will generally leave the return address in a CPU register, requiring functions that make further calls to save their own return address in a place that will survive their own further calling (that place being stack memory).  RISC architectures also tend to engage in less pushing and popping, collecting all stack frame allocations into one (larger) allocation in prologue and all stack frame deallocations into one (larger) deallocation in epilogue.
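A small C sketch of the leaf/non-leaf distinction; the comments describe typical RISC compiler behavior, not anything the C code itself expresses:

int helper(int n)       /* leaf: the return address can stay in the
                           link register for the whole call            */
{
    return n + 1;
}

int outer(int n)        /* non-leaf: its prologue saves the link
                           register to the stack, because...           */
{
    int a = helper(n);  /* ...this call overwrites it...               */
    return helper(a);   /* ...as does this one; the epilogue reloads
                           the saved copy so outer can still return    */
}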
The suspension, invocation, returning, and resumption are logical, from the point of view of functions: the processor doesn't really see or care about functions, but rather sees a continuous stream of instructions that happens to include various forms of branching.

Does MIPS's LW instruction only work on arrays? Or does it work on variables, too?

I wonder if the lw MIPS instruction can work on a variable. I read a book that said the compiler will associate a register with a variable. Then don't I have to move the variable in memory to a register?
Algorithms in pseudocode and in high-level languages have logical variables, while the equivalent machine code algorithm in a program has physical storage.
Logical variables have names, types, scope, lifetime, and at runtime have a location and hold a value.  In a program/algorithm, a name typically refers to the content held by a variable; its scope goes to what variables are reachable from any given line of code (or data); lifetime goes to the duration of variables, e.g. global variables have full program lifetime, whereas function parameters live as long as the function is active but then cease to exist upon the function's return; some variables are referred to indirectly by the program, and certain variables can hold a value at runtime that can be changed and later recalled by the program.
Physical storage of the machine consists of CPU registers and main memory; both allow for storage and later retrieval of values.  Physical storage is essentially unnamed, has no real type or scope, and has permanent (full-program) lifetime.
CPU registers are fast and limited in count and cannot be indexed, whereas main memory is vast, and indexable or addressable.  Thus, for main memory there is a notion of address that is first-class: you can identify a memory location by a number (an address), and use that value (that address), say as a parameter. 
Addressing or indexing is not possible with the CPU registers; they can only be named in machine code instructions.
One job of the compiler or an assembly language writer is to map (to translate) the logical variables from our algorithms into the physical storage available on the processor.  Any mapping that works is acceptable, though some will be more efficient than others.  Logical variables that have overlapping lifetimes require separate physical storage.  Physical storage is frequently reused and repurposed for different logical variables: as logical variables' lifetimes end, their storage can be repurposed for other logical variables whose lifetimes are just beginning.
Data structures that require indexing must live in main memory; however, main memory can also be used any way the compiler or programmer likes, so it can be used for simple logical variables as well.  Data structures such as arrays, trees, and linked lists inherently require indexing/addressing: arrays because we are selecting one of many elements, and the others because we use pointers (references), so the items being pointed (referred) to must have addresses and thus must live in main memory.
Since the CPU registers are precious, fast resources, they are mostly used for logical variables that have short lifetimes.  Such variables are function parameters and local variables (locals).
Sometimes a logical variable has to be moved from one physical storage location to another.  Such is the case with some parameters passed in registers: if the associated register will be clobbered for some reason before the program's final use of the variable, then that variable will have to be mapped to another storage location and its value copied there (by the machine code program) before the original physical storage gets clobbered.
Assembly language is like machine code, though with named labels and separate sections.  Sections subdivide the program's main memory, separating code from global data; both code and data are initialized with values as per the program prior to program start.
Other, uninitialized main memory is also available to the program, configured as the stack and the heap.  While both the stack and heap refer to physical storage, each has a different usage model conventionally applied by programs and functions; the usage model allows for sharing of physical storage among multiple functions, with a notion of allocation, initialization (and usage), and eventual deallocation.
By convention stack memory allows for allocation upon function entry and deallocation upon function exit, and in this manner the stack memory is repeatedly repurposed and reused by one function after another.
Also by convention, heap memory allows for explicit allocation and deallocation of storage; this storage does not have to correspond to function activation and deactivation, so it is used for data structures that are to outlive the functions that create them.
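A hedged C sketch contrasting the two usage models (stack storage is reclaimed automatically on return; heap storage persists until explicitly freed):

#include <stdlib.h>

int *make_on_stack(void)
{
    int a[4] = {1, 2, 3, 4};        /* allocated in this frame          */
    return a;                       /* BUG: the frame is deallocated on
                                       return, so this pointer dangles  */
}

int *make_on_heap(void)
{
    int *a = malloc(4 * sizeof *a); /* explicit allocation              */
    if (a)
        a[0] = 1;
    return a;                       /* valid until the caller free()s it */
}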
Labels are used to identify locations in the code and data: locations in code for branching and calling; locations in data for the storage of global variables.  Labels in data identify the storage locations of global variables mapped to physical storage.  It is up to the programmer to reserve sufficient storage for each kind of global variable.  A label is equivalent to the constant value that is the address of the start of an item's physical storage (rather than to the logical variable as a whole, as a variable's name would imply in a high-level language).  It is up to the compiler and assembly programmer to access physical storage in a manner that is wholly consistent with the intent of the logical variables mapped there.  The processor does not read data declarations, and as the physical storage is constantly being repurposed, it is the machine code program's job to inform the processor how to treat storage.
Labels are removed during the build of (assembly) source code into machine code programs.  Labels are not seen by the processor and, in the program, do not separate what comes before from what comes after; they are just a convenience for the assembly programmer.  Labels alone, in code, do not affect flow of control, and in data they do not prevent or preclude access that goes past or before (by memory address) the intended logical variable there.
Then don't I have to move the variable in memory to a register?
As an assembly programmer you can map a logical variable to a CPU register alone without using main memory, with the caveats that this is inappropriate under some circumstances:
if the register will not survive the lifetime of the logical variable.  This can happen, for example, if the logical variable is a global variable, or if the register will be otherwise clobbered by code, such as with function calling.
if the logical variable is of a nature that it requires indexing, which is not possible among CPU registers.
If you use main memory for a variable (as is appropriate for a global variable), then to recall its value or store a new value there you will have to use loads and stores, respectively.  Global data will be initialized prior to program start, but other physical storage (such as CPU registers, stack memory, heap memory) needs to be initialized by execution of machine code in the program itself.
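A small C sketch of the mapping; the register and instruction names in the comments are typical of MIPS compiler output, not mandated by C:

int g;                  /* global: a labeled location in main memory   */

int bump(int x)         /* x: short-lived, typically arrives in $a0    */
{
    int t = x + 1;      /* t: can live in a register only (say $t0)    */
    g = g + t;          /* g: lw from its label, add, sw back to label */
    return t;
}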

vulkan compute shader direct access to CPU allocated memory

Is there any way in a Vulkan compute shader to bind a specific location in CPU memory, so that I can directly access it in the shader language?
For example, if I have a variable declaration int a[]={contents........};, can I bind the address of a to, say, binding location 0 and then access it in GLSL with something like this:
layout(std430, binding = 0) buffer Data {
    int a[];
};
I want to do this because I don't want to spend time on writing to and reading from a buffer.
Generally, you cannot make the GPU access memory that Vulkan did not allocate itself for the GPU. The exceptions to this are external allocations made by other APIs that themselves allocate GPU-accessible memory.
Just taking a random stack or global pointer and shoving it at Vulkan isn't going to work.
I want something like cudaHostGetDevicePointer in CUDA
What you're asking for here is not what that function does. That function takes a CPU pointer to CPU-accessible memory which CUDA allocated for you and which you previously mapped into a CPU address range. The pointer you give it must be within a mapped region of GPU memory.
You can't just shove a stack/global variable at it and expect it to work. The variable would have to be within the mapped allocation, and a global or stack variable can't be within such an allocation.
Vulkan doesn't have a way to reverse-engineer a pointer into a mapped range of device memory back to the VkDeviceMemory object it was mapped from. This is in part because Vulkan doesn't have pointers to allocations; you have to use VkDeviceMemory object, which you create and manage yourself. But if you need to know where a CPU-accessible pointer was mapped from, you can keep track of that yourself.
I want to do this because I don't want to spend time on writing to and reading from a buffer.
Vulkan is exactly for people that do want to spend time managing how the data flows. You might want to consider some rapid prototyping framework or math library instead.
Is there any way in a Vulkan compute shader to bind a specific location in CPU memory
Yes, but it won't save you any time.
Firstly, Vulkan does allow allocation of CPU-accessible memory via VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT|VK_MEMORY_PROPERTY_HOST_COHERENT_BIT. So you could allocate your stuff in that VkDeviceMemory, map it, do your CPU work in that address space, and then use it on the GPU.
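A minimal sketch of that first approach (error handling omitted; memoryTypeIndex is assumed to have been chosen via vkGetPhysicalDeviceMemoryProperties so that it names a HOST_VISIBLE|HOST_COHERENT memory type):

#include <vulkan/vulkan.h>
#include <string.h>

/* Allocate host-visible memory, map it, and copy CPU data in. */
VkDeviceMemory upload(VkDevice device, uint32_t memoryTypeIndex,
                      const void *data, VkDeviceSize bytes)
{
    VkMemoryAllocateInfo info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .allocationSize  = bytes,
        .memoryTypeIndex = memoryTypeIndex,
    };
    VkDeviceMemory mem;
    vkAllocateMemory(device, &info, NULL, &mem);

    void *ptr;
    vkMapMemory(device, mem, 0, VK_WHOLE_SIZE, 0, &ptr);
    memcpy(ptr, data, bytes);   /* CPU writes land in GPU-visible memory */
    vkUnmapMemory(device, mem); /* optional with HOST_COHERENT           */
    return mem;                 /* bind to a VkBuffer for the binding 0
                                   SSBO shown in the question            */
}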
The second way is via the VK_EXT_external_memory_host extension, which allows you to import your pointer into Vulkan as a VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT allocation. But it is involved in its own way, and the driver might say "nope", so you'd be back to square one.

Are JVM heap/stack different than virtual address space heap/stack?

Memory is divided into "segments" called heap, stack, bss, data, and text. However, the JVM also has these concepts of stack and heap. So how are these two reconciled?
Are they different levels of abstraction, where main memory is one or two levels below the JVM, and whose "segments" maps naturally to JVM's "segments"? Since JVM is supposed to be a virtual computer, it seems to me like they emulate what happens underneath but at a higher level of abstraction.
Sounds to me like you've been reading a textbook or similar. All these terms generally have very precise definitions in books/lectures, but a lot less precise definitions in reality. Therefore what people mean when they say heap is not necessarily exactly the same as what a book etc. says.
Memory is divided into "segments" called heap, stack, bss, data, and text.
This is only true for a typical user space process. In other words, this will be true for an everyday program written in C or similar; however, it is not true for all programs, and definitely not true for the entire memory space.
When a program is executed the OS allocates memory for the various segments listed, except the heap. The program can request memory from the OS while it is executing, which allows a program to use a different amount of memory depending on its needs. The heap refers to memory requested by the program, usually via a function like malloc. To clarify: the heap typically refers to a managed region of memory, usually managed with malloc/free. It is also possible to request memory directly from the OS, in an unmanaged fashion; most people (IMO) would say this doesn't count as part of the heap.
The stack is a data structure/segment which keeps track of local variables and function calls. It stores important information like where to return after a function call. In C or other "native" languages the stack is created by the OS and can grow or shrink as needed.
Java allows a program to request memory during execution using new. Memory allocated to a Java program using new is referred to as memory in the Java heap. One could imagine that if you were implementing a JVM you would use malloc behind the scenes of new. This would result in a Java heap within a regular native heap. In reality, "serious" JVMs do not do this and instead interact directly with the OS for memory.
In Java the stack is created by the JVM. One could imagine that it is allocated by malloc, but as with the heap, this is likely not how real-world JVMs do it.
Edit:
A JVM like HotSpot would likely allocate memory directly from the OS. This memory would then get put into some kind of pool, from which it would be removed as needed. Reasons memory would be needed include new, or a stack that needs to grow.
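A toy C sketch of that pool idea (illustrative only; explicitly not how HotSpot works, and alignment is ignored for brevity):

#include <stdlib.h>
#include <stddef.h>

static unsigned char *pool, *next, *end;

void vm_init(size_t bytes)
{
    pool = malloc(bytes);     /* one native-heap allocation, up front  */
    next = pool;
    end  = pool + bytes;
}

void *vm_new(size_t bytes)    /* the "VM"'s own new: bump allocation   */
{
    if (next + bytes > end)
        return NULL;          /* a real VM would GC or grow the pool   */
    void *obj = next;
    next += bytes;
    return obj;
}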

Estimating available RAM left with safety margin in C (STM32F4)

I am currently developing an application for the STM32F407 using STM32CubeMX and Keil uVision. I know that dynamic memory allocation in embedded systems is mostly discouraged, but here and there on the internet I can find some arguments in favor of it.
Due to my inventor's soul I wanted to try it, but do it safely. Let's assume I'm creating a dynamically allocated FIFO for incoming UART messages, holding structs composed of the msg itself and its length. However, I wouldn't like to consume the whole heap doing so, so I want to check how much of it I have left. My new(?) idea is to try temporarily allocating some big chunk of memory (say 100 chars): if it's successful, I accept the incoming msg; if not, it means that I'm running out of heap and I ignore the msg (or accept it and dequeue the oldest). After checking, I of course free the temp memory.
A few questions arise in my mind:
First of all, does it make sense at all? Do you think, based on your experience, that it could be useful and safe?
I couldn't find precise info about what exactly shares RAM in embedded systems (I know about the heap, the stack, and volatile vars), so my question is: provided the answer to 1 isn't "hell no, go home", what size of the temp memory checker would you pick for the mentioned controller?
About the micro itself: it has 192kB of RAM, however in the Drivers\CMSIS\Device\ST\STM32F4xx\Source\Templates\arm\startup_stm32f407xx.s file only 512B+1024B are allocated for the heap and stack. Isn't that very little, leaving the whopping remaining ~190kB for volatile vars? Would augmenting the heap size to, say, 50kB be sensible? If yes, do I do it directly in this file, or is it better practice to do it somewhere else?
Probably for some of you "safe dynamic memory" and "embedded" in one post is both shocking and dazzling, but keep in mind that this is experimenting and exploring new horizons :) Thanks and greetings.
Keil uVision describes only the IDE. If you are using Keil MDK-ARM, which implies ARM's RealView compiler, then you can get accurate heap information using the __heapstats() function.
__heapstats() is a little strange in that, rather than simply returning a value, it outputs heap information to a formatted output stream facilitated by a function pointer and file descriptor passed to it. The output function must have an fprintf()-like interface. You can use fprintf() of course, but that requires that you have correctly retargeted stdio.
For example the following:
typedef int (*__heapprt)(void *, char const *, ...);
__heapstats((__heapprt)fprintf, stdout);
outputs for example:
4180 bytes in 1 free blocks (avge size 4180)
1 blocks 2^11+1 to 2^12
Unfortunately that does not really achieve what you need, since it outputs text. You could however implement your own function to capture the data in memory and parse the result. You may only need to capture the decimal digit characters and discard anything else, except that the amount of free memory and the largest allocatable block are not necessarily the same thing, of course. Fragmentation is indicated by the number of free blocks and their average size. You can perhaps guarantee to be able to allocate at least an average-sized block.
The issues with dynamic allocation in embedded systems are to do with handling memory exhaustion and, in real-time systems, the non-deterministic timing of both allocation and deallocation using the default malloc/free implementations. In your case you might be better off using a fixed-block allocator. You can implement such an allocator by creating a static array of memory blocks (or by dynamically allocating them from the heap at start-up), and placing a pointer to each block on a queue, linked list, or stack structure. To allocate you simply remove a pointer from the queue/list/stack, and to free you place a pointer back. When the available-blocks structure is empty, memory is exhausted. It is entirely deterministic, and because it is your implementation, it can be easily monitored for performance and capacity.
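A minimal sketch of such a fixed-block allocator (block size and count are arbitrary; a production version would add interrupt safety):

#include <stddef.h>

#define BLOCK_SIZE   64
#define BLOCK_COUNT  32

typedef union block {
    union block *next;                 /* valid while on the free list */
    unsigned char payload[BLOCK_SIZE];
} block_t;

static block_t pool[BLOCK_COUNT];
static block_t *free_list;

void pool_init(void)
{
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = free_list;      /* push every block on the list */
        free_list = &pool[i];
    }
}

void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b)
        free_list = b->next;           /* pop: O(1), no fragmentation  */
    return b;                          /* NULL means pool exhausted    */
}

void pool_free(void *p)
{
    block_t *b = p;
    b->next = free_list;               /* push back: O(1)              */
    free_list = b;
}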
With respect to question 3: you are expected to adjust the heap and system stack size to suit your application. Most tools I have used have a linker script that automatically allocates to the heap all available memory that is not statically allocated, allocated to a stack, or reserved for other purposes. However, MDK-ARM does not do that in the default linker scripts, but rather allocates a fixed-size heap.
You can use the linker map file summary to determine how much space is unused and manually expand the heap. I usually do that, leaving a small amount of unused space to account for maintenance when the amount of statically allocated data may increase. At some point, however, you end up running out of memory, and the arcane error messages from the linker may not make it obvious that your heap is just too big. It is possible to override the default linker script and provide your own, and no doubt possible then to automatically size the heap, though I have never taken the trouble to try it.
Okay, I have tested my idea with dynamic heap free-space checking and it worked well (although I didn't perform long-run tests); however, Clifford's answer and this article convinced me to abandon the idea of dynamic allocation. Eventually I implemented my own static heap with pages (a 2-D array), an occupied-pages indicator (a 0/1 array sized to the number of pages), and a FIFO of structs consisting of a pointer to the msg on my static heap (actually just the index into the array) and the length of the message (to determine how many contiguous pages it occupies). 95% of the msgs I receive should take up only one page, 5% two or three pages, so fragmentation is still possible, but at least I keep a tight rein on it and it affects only the part of memory assigned to this module of the code (in other words, the fragmentation doesn't leak to other parts of the code). So far it has worked without any problems and it is certainly faster: the lookup time is O(n*m), with n the number of pages and m the longest possible message (in pages), but taking the laws of probability into consideration it comes down to O(n). Moreover, n is always a lot smaller than the number of all allocation units in memory, so there is far less to search.
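For illustration, a minimal C sketch of such a paged scheme (page size and count are made up; the real details are in the implementation described above):

#define PAGE_SIZE   32
#define PAGE_COUNT  64

static unsigned char pages[PAGE_COUNT][PAGE_SIZE];  /* static "heap"   */
static unsigned char used[PAGE_COUNT];              /* 0 = page free   */

/* Find n contiguous free pages; return the first index, or -1. */
int pages_alloc(int n)
{
    for (int i = 0; i + n <= PAGE_COUNT; i++) {
        int run = 0;
        while (run < n && !used[i + run])
            run++;
        if (run == n) {                 /* found a long-enough run     */
            for (int k = 0; k < n; k++)
                used[i + k] = 1;
            return i;
        }
        i += run;                       /* skip past the occupied page */
    }
    return -1;                          /* no room: drop or dequeue    */
}

void pages_free(int first, int n)
{
    for (int k = 0; k < n; k++)
        used[first + k] = 0;
}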