How does stack memory increase? - process

In a typical C program, the Linux kernel initially provides roughly 84K-100K of stack memory. How does the kernel allocate more memory for the stack when the process uses up what it has been given?
My understanding is that when the process has used all of the stack and then touches the next contiguous address, it should page fault, and the kernel then handles that page fault.
Is it at this point that the kernel provides more stack memory to the process? And which data structure in the Linux kernel tracks the size of the stack for a process?

There are a number of different methods used, depending on the OS (Linux real-time vs. normal) and the language runtime system underneath:
1) dynamic, by page fault
The OS typically preallocates a few real pages at the higher addresses and assigns the initial SP to that. The stack grows downward, the heap grows upward. If a page fault happens somewhat below the stack bottom, the missing intermediate pages are allocated and mapped, effectively growing the stack from the top towards the bottom automatically. There is typically a maximum up to which such automatic allocation is performed, which may or may not be specifiable in the environment (ulimit), in the exe header, or adjusted dynamically by the program via a system call (setrlimit). This adjustability in particular varies heavily between OSes. There is also typically a limit on how far away from the stack bottom a page fault is considered acceptable for automatic growth to happen. Note that not all systems' stacks grow downward: under HP-UX the stack grew (grows?) upward, so I am not sure what Linux on PA-RISC does (can someone comment on this?).
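As a hedged illustration of the adjustable limit mentioned above, here is a minimal Linux sketch using the POSIX getrlimit()/setrlimit() calls (RLIMIT_STACK governs how far the automatic page-fault growth may go):

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* Query the current stack limit (the "ulimit" mentioned above). */
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("stack soft limit: %ld bytes\n", (long)rl.rlim_cur);

    /* Raise the soft limit up to the hard limit; the kernel will then
       grow the main thread's stack on page fault up to this new value. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}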
2) fixed size
Other OSes (especially in embedded and mobile environments) either have fixed sizes by definition, or have them specified in the exe header, or specified when a program/thread is created. Especially in embedded real-time controllers, this is often a configuration parameter, and individual control tasks get fixed stacks (to avoid a runaway thread taking the memory of higher-priority control tasks). Of course, in this case too the memory might be allocated only virtually, until really needed.
3) pagewise, spaghetti and similar
Such mechanisms tend to be forgotten, but are still in use in some runtime systems (I know of Lisp/Scheme and Smalltalk systems). These allocate and grow the stack dynamically as required; however, not as a single contiguous segment, but as a linked chain of multi-page chunks. This requires different function entry/exit code to be generated by the compiler(s), in order to handle segment boundaries. Therefore such schemes are typically implemented by a language support system and not the OS itself (it used to be otherwise in earlier times; sigh). The reason is that when you have many (say, thousands of) threads in an interactive environment, preallocating say 1Mb each would simply fill your virtual address space, and you could not support a system where the stack needs of an individual thread are unknown in advance (which is typically the case in a dynamic environment, where the user might enter eval-code into a separate workspace). So dynamic allocation as in scheme 1 above is not possible, because there would be other threads with their own stacks in the way. Instead, the stack is made up of smaller segments (say, 8-64k) which are allocated and deallocated from a pool and linked into a chain of stack segments. Such a scheme may also be required for high-performance support of things like continuations, coroutines, etc.
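A hedged sketch of what one chunk in such a linked chain might look like (names and sizes are illustrative; real systems also rely on compiler-generated entry/exit code to hop between segments, which is omitted here):

#include <stdlib.h>

/* One chunk of a segmented ("spaghetti") stack: chunks are linked,
   and function prologues check whether the current chunk still has
   room, switching to a fresh one from the pool if not. */
typedef struct stack_segment {
    struct stack_segment *prev;  /* previously active chunk */
    char *sp;                    /* current top within this chunk */
    char *limit;                 /* end of this chunk's usable space */
    char  data[];                /* the stack space itself, e.g. 8-64k */
} stack_segment;

static stack_segment *segment_new(stack_segment *prev, size_t size)
{
    stack_segment *seg = malloc(sizeof *seg + size);
    if (seg != NULL) {
        seg->prev  = prev;
        seg->sp    = seg->data;
        seg->limit = seg->data + size;
    }
    return seg;
}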
Modern Unixes/Linuxes and (I guess, but am not 100% certain) Windows use scheme 1 for the main thread of your exe, and scheme 2 for additional (p-)threads, which need a fixed stack size given by the thread creator up front. Most embedded systems and controllers use fixed (but configurable) preallocation (even physically preallocated in many cases).
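For scheme 2, a minimal POSIX sketch: additional threads receive a fixed stack size at creation time via pthread_attr_setstacksize() (the 256 KiB figure is an arbitrary example):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    puts("running on a fixed-size stack");
    return NULL;
}

int main(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    /* The stack is fixed at creation time (scheme 2); it cannot
       grow later the way the main thread's stack can. */
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 256 * 1024);

    pthread_create(&tid, &attr, worker, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}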

The stack for a given process has a limited, fixed size. The reason you can't add more memory the way you (theoretically) describe is that the stack must be contiguous, and it grows toward the heap. So, when the stack reaches the heap, no extension is possible.
The stack size for a userland program is not determined by the kernel. The kernel stack size is a configuration option for the kernel (usually 4k or 8k).
Edit: if you already know this, and were merely asking about the allocation of physical pages for a process, then you have the procedure down already. But there's no need to keep track of the "stack size" like this: the virtual pages in the stack with no page-table entries are just normal overcommitted virtual pages; physical memory is granted on their first access. And the kernel does not have to overcommit memory, in which case a stack will probably have complete physical backing when the executable is first loaded.

The stack can only be used up to a certain length, because it has a fixed capacity in memory. If your question is about the direction in which the stack is used up, the answer is downwards: it is filled down in memory, towards the heap. The heap is the dynamic component of memory; it can actually grow from the bottom up, based on your program's data storage needs.
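A small demonstration of this layout (addresses, and even the growth direction, are platform-dependent, so treat this as an illustration for a typical Linux/x86-64 machine rather than a guarantee):

#include <stdio.h>
#include <stdlib.h>

static void nested(int depth)
{
    int local;  /* lives on the stack */
    printf("depth %d: stack local at %p\n", depth, (void *)&local);
    if (depth < 3)
        nested(depth + 1);  /* deeper calls usually print lower addresses */
}

int main(void)
{
    int *heap_obj = malloc(sizeof *heap_obj);  /* lives on the heap */
    printf("heap object at   %p\n", (void *)heap_obj);
    nested(0);  /* stack addresses start high and decrease */
    free(heap_obj);
    return 0;
}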

Related

How are virtual addresses corresponding to kernel stack mapped?

Each process's virtual address space comprises user space and kernel space. As pointed out by many articles, the kernel space of all processes is mapped to the same physical addresses in memory, i.e. there is only one kernel in physical memory. But each process has its own kernel stack, which is part of the kernel space. How does the same mapping work for all processes when each has a different kernel stack?
Note: This is the OS-agnostic answer. Details do vary slightly with the OS in question (e.g. Darwin and continuations), and possibly with the architecture (ARMv8, x86, etc.).
When a process performs a system call, the user mode state (registers) is saved, including the user mode stack pointer. At that point, a kernel mode stack pointer is loaded, which is usually maintained somewhere in the thread control block.
You are correct in saying that there is only one kernel space. It follows that (in theory) one thread in kernel space could easily see and/or tamper with any other's data in kernel space (just as threads of the same process can "see" each other in user space). This, however, is (almost always) theory only, since the kernel code presumably respects memory boundaries (as user-mode code is assumed to do, with thread-local storage, etc.). That said, "almost always", because if the kernel code can be exploited, then all of kernel memory will be laid bare to the exploiter, and potentially read and/or compromised.

Are JVM heap/stack different than virtual address space heap/stack?

Memory is divided into "segments" called heap, stack, bss, data, and text. However, the JVM also has these concepts of stack and heap. So how are these two reconciled?
Are they different levels of abstraction, where main memory is one or two levels below the JVM, and whose "segments" map naturally to the JVM's "segments"? Since the JVM is supposed to be a virtual computer, it seems to me like it emulates what happens underneath, but at a higher level of abstraction.
Sounds to me like you've been reading a textbook or similar. All these terms generally have very precise definitions in books/lectures, but a lot less precise definitions in reality. Therefore what people mean when they say heap is not necessarily exactly the same as what a book etc. says.
Memory is divided into "segments" called heap, stack, bss, data, and text.
This is only true for a typical user-space process. In other words, this will be true for an everyday program written in C or similar; however, it is not true for all programs, and definitely not true for the entire memory space.
When a program is executed, the OS allocates memory for the various segments listed, except the heap. The program can request memory from the OS while it is executing, which allows it to use a different amount of memory depending on its needs. The heap refers to memory requested by the program, usually via a function like malloc. To clarify: the heap typically refers to a managed region of memory, usually managed with malloc/free. It is also possible to request memory directly from the OS, in an unmanaged fashion; most people (IMO) would say this doesn't count as part of the heap.
The stack is a data structure/segment which keeps track of local variables and function calls. It stores important information like where to return after a function call. In C or other "native" languages the stack is created by the OS and can grow or shrink as needed.
Java allows a program to request memory during execution using new. Memory allocated to a Java program using new is referred to as memory in the Java heap. One could imagine that if you were implementing a JVM you would use malloc behind the scenes of new; this would result in a Java heap within a regular native heap. In reality, "serious" JVMs do not do this and interact directly with the OS for memory.
In Java the stack is created by the JVM. One could imagine that it is allocated by malloc, but as with the heap, this is likely not how real-world JVMs do it.
Edit:
A JVM like HotSpot would likely allocate memory directly from the OS. This memory would then be put into some kind of pool, from which it would be handed out as needed. Reasons for needing memory include new, or a stack that needs to grow.
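A hedged sketch of that idea in C: reserve a large region directly from the OS with mmap() and hand out pieces from it. Real JVMs are vastly more sophisticated; this only shows the "pool carved out of an OS mapping" shape:

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#define POOL_SIZE (64 * 1024 * 1024)  /* 64 MiB reserved up front */

static char  *pool;
static size_t pool_used;

/* Deliberately naive bump allocator over the OS-provided mapping. */
static void *pool_alloc(size_t n)
{
    n = (n + 15) & ~(size_t)15;  /* keep 16-byte alignment */
    if (pool_used + n > POOL_SIZE)
        return NULL;             /* pool exhausted */
    void *p = pool + pool_used;
    pool_used += n;
    return p;
}

int main(void)
{
    /* Ask the OS directly for memory, bypassing malloc entirely. */
    pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pool == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    void *obj = pool_alloc(128);  /* stand-in for a "new" allocation */
    printf("object at %p\n", obj);
    return 0;
}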

How to properly assign huge heap space for JVM

I'm trying to work around an issue which has been bugging me for a while. In a nutshell: on what basis should one assign max heap space for a resource-hogging application, and is there a downside to it being too large?
I have an application used to visualize huge medical data sets, which can eat up to several gigabytes of memory if several imaging volumes are opened side by side. Caching the data to be viewed is essential for a fluent workflow. The software is supported on Windows workstations and is started with a bootloader, which assigns the heap size and launches the main application. The actual memory needed by the main application is directly proportional to the data being viewed and cannot be determined by the bootloader, because that would require reading the data, which would, ultimately, consume too much time.
So, to ensure that the JVM has enough memory during launch we set -Xmx as large as we dare, based (by current design) on the maximum physical memory of the workstation. However, is there any downside to this? I've read (in a post from 2008) that it is possible for native processes to hog up excess heap space, which can lead to memory errors during runtime. Should I maybe also sniff for free virtual memory or paging file size prior to assigning heap space? How would you deal with this situation?
Oh, and this is my first post to these forums. Nice to meet you all and be gentle! :)
Update:
Thanks for all the answers. I'm not sure if I put my words right, but my problem arose from the fact that I have zero knowledge of the hardware this software will be run on, but would, nevertheless, like to assign as much heap space for the software as possible.
I came to a solution of assigning a heap of 70% of physical memory IF there is a sufficient amount of virtual memory available - less otherwise.
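Since the bootloader runs on Windows workstations, here is a minimal sketch of how it might compute that figure, using the Win32 GlobalMemoryStatusEx() call (the 70% policy and the fallback to available commit are the poster's own choices, not a standard):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    MEMORYSTATUSEX ms;
    ms.dwLength = sizeof ms;

    if (!GlobalMemoryStatusEx(&ms)) {
        fprintf(stderr, "GlobalMemoryStatusEx failed\n");
        return 1;
    }

    /* 70% of physical RAM, per the poster's chosen policy. */
    unsigned long long heap_mb =
        ms.ullTotalPhys * 70 / 100 / (1024 * 1024);

    /* Back off if the available commit limit is smaller. */
    unsigned long long commit_mb = ms.ullAvailPageFile / (1024 * 1024);
    if (commit_mb < heap_mb)
        heap_mb = commit_mb;

    /* The bootloader would then launch: java -Xmx<heap_mb>m ... */
    printf("-Xmx%llum\n", heap_mb);
    return 0;
}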
You can have heap sizes of around 28 GB with little impact on performance, especially if you have large objects (lots of small objects can affect GC pause times).
Heap sizes of 100 GB are possible but have downsides, mostly because they can have high pause times. If you use Azul Zing, it can handle much larger heap sizes significantly more gracefully.
The main limitation is the size of your physical memory. If your heap exceeds that, your application and your computer will run very slowly or become unusable.
A standard way around these issues in mapping software (which has to be able to map the whole world, for example) is to break your images into tiles. This way you only display the portion of the image which is on the screen. If you need to be able to zoom in and out, you might need to store the data at two to four levels of scale. Using this approach you can view a map of the whole world on your phone.
It's best not to set the JVM max memory to more than 60-70% of workstation memory, and in some cases even lower, for two main reasons. First, what the JVM consumes on the physical machine can be 20% or more greater than the heap, due to GC mechanics. Second, the representation of a particular data entity in the JVM heap may not be the only physical copy of that entity in the machine's RAM, as the OS has caches and buffers and so forth around the various IO devices from which it grabs these objects.

Estimating available RAM left with safety margin in C (STM32F4)

I am currently developing an application for the STM32F407 using STM32CubeMx and Keil uVision. I know that dynamic memory allocation in embedded systems is mostly discouraged, but here and there on the internet I can find some arguments in favor of it.
Due to my inventor's soul I wanted to try it, but do it safely. Let's assume I'm creating a dynamically allocated FIFO for incoming UART messages, holding structs composed of the msg itself and its length. However, I wouldn't like to consume the entire heap doing so, therefore I want to check how much of it I have left. My new(?) idea is to try temporarily allocating some big chunk of memory (say, 100 chars): if it's successful, I accept the incoming msg; if not, it means that I'm running out of heap and I ignore the msg (or accept it and dequeue the oldest). After the check I of course free the temporary memory.
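A minimal sketch of the probe the poster describes (the function name is hypothetical; the 100-byte probe size is the poster's own figure):

#include <stdbool.h>
#include <stdlib.h>

#define PROBE_SIZE 100  /* the poster's suggested test chunk */

/* Returns true if the heap can still supply PROBE_SIZE bytes. Note
   that this only proves one such block exists right now; it says
   nothing about fragmentation or future allocations. */
static bool heap_has_headroom(void)
{
    void *probe = malloc(PROBE_SIZE);
    if (probe == NULL)
        return false;
    free(probe);  /* give the test chunk straight back */
    return true;
}

The caveat in the comment is the core weakness the answers below point at: a successful probe does not guarantee that the next real allocation will succeed.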
A few questions arise in my mind:
First of all, does it make sense at all? Do you think, based on your experience, that it could be useful and safe?
I couldn't find precise info about what exactly shares RAM in embedded systems (I know about the heap, the stack and volatile vars), so my question is: provided the answer to 1 isn't "hell no, go home", what size of temp memory checker would you pick for the mentioned controller?
About the micro itself: it has 192kB RAM, yet in the Drivers\CMSIS\Device\ST\STM32F4xx\Source\Templates\arm\startup_stm32f407xx.s file only 512B+1024B are allocated for the heap and stack. Isn't that very little, leaving a whopping 190kB remaining for volatile vars? Would augmenting the heap size to, say, 50kB be sensible? If yes, do I do it directly in this file, or is it better practice to do it somewhere else?
Probably for some of you "safe dynamic memory" and "embedded" in one post is both shocking and dazzling, but keep in mind that this is experimenting and exploring new horizons :) Thanks and greetings.
Keil uVision describes only the IDE. If you are using Keil MDK-ARM, which implies ARM's RealView compiler, then you can get accurate heap information using the __heapstats() function.
__heapstats() is a little strange in that, rather than simply returning a value, it outputs heap information to a formatted output stream facilitated by a function pointer and file descriptor passed to it. The output function must have an fprintf()-like interface. You can use fprintf() of course, but that requires that you have correctly retargeted stdio.
For example the following:
#include <stdio.h>   /* for fprintf()/stdout; stdio must be retargeted */

typedef int (*__heapprt)(void *, char const *, ...);
__heapstats( (__heapprt)fprintf, stdout );
outputs for example:
4180 bytes in 1 free blocks (avge size 4180)
1 blocks 2^11+1 to 2^12
Unfortunately that does not really achieve what you need, since it outputs text. You could, however, implement your own function to capture the data in memory and parse the result. You may only need to capture the leading decimal digit characters and discard anything else, except that the amount of free memory and the largest allocatable block are not necessarily the same thing, of course. Fragmentation is indicated by the number of free blocks and their average size. You can perhaps guarantee to be able to allocate at least an average-sized block.
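One possible shape for such a capture function, matching the fprintf()-like interface __heapstats() expects (the buffer handling here is only a sketch):

#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Collects __heapstats() output into a buffer for later parsing,
   instead of printing it. */
static char heap_report[256];

static int heap_capture(void *fd, char const *fmt, ...)
{
    (void)fd;  /* the "file descriptor" argument is unused here */
    size_t len = strlen(heap_report);
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(heap_report + len, sizeof heap_report - len,
                      fmt, ap);
    va_end(ap);
    return n;
}

/* Usage: __heapstats((__heapprt)heap_capture, NULL);
   then scan heap_report for the leading decimal numbers. */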
The issues with dynamic allocation in embedded systems are to do with handling memory exhaustion and, in real-time systems, the non-deterministic timing of both allocation and deallocation using the default malloc/free implementations. In your case you might be better off using a fixed-block allocator. You can implement such an allocator by creating a static array of memory blocks (or by dynamically allocating them from the heap at start-up), and placing a pointer to each block on a queue, linked list, or stack structure. To allocate you simply remove a pointer from the queue/list/stack, and to free you place a pointer back. When the available-blocks structure is empty, memory is exhausted. It is entirely deterministic, and because it is your implementation, it can be easily monitored for performance and capacity.
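A minimal sketch of such a fixed-block allocator (block size and count are arbitrary placeholders; a real implementation would add whatever interrupt-safety its context demands):

#include <stddef.h>

#define BLOCK_SIZE  64   /* placeholder: size of every block */
#define BLOCK_COUNT 32   /* placeholder: number of blocks in the pool */

static unsigned char pool[BLOCK_COUNT][BLOCK_SIZE];
static void *free_list[BLOCK_COUNT];  /* stack of free-block pointers */
static int   free_top;

/* Push every block onto the free stack once at start-up. */
void pool_init(void)
{
    for (free_top = 0; free_top < BLOCK_COUNT; free_top++)
        free_list[free_top] = pool[free_top];
}

/* O(1) and deterministic: pop a pointer, or NULL when exhausted. */
void *block_alloc(void)
{
    return (free_top > 0) ? free_list[--free_top] : NULL;
}

/* O(1) and deterministic: push the pointer back. */
void block_free(void *p)
{
    free_list[free_top++] = p;
}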
With respect to question 3: you are expected to adjust the heap and system stack sizes to suit your application. Most tools I have used have a linker script that automatically allocates to the heap all available memory that is not statically allocated, assigned to a stack, or reserved for other purposes. MDK-ARM, however, does not do that in its default linker scripts, but rather allocates a fixed-size heap.
You can use the linker map file summary to determine how much space is unused and manually expand the heap. I usually do that, leaving a small amount of unused space to allow for maintenance, when the amount of statically allocated data may increase. At some point, however, you end up running out of memory, and the arcane error messages from the linker may not make it obvious that your heap is simply too big. It is possible to override the default linker script and provide your own, and no doubt possible then to size the heap automatically - though I have never taken the trouble to try it.
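For reference, in ST's Keil startup file the sizes are declared near the top as assembler constants; the values below match the 1024B stack + 512B heap the question mentions (exact symbol names may vary between template versions):

Stack_Size      EQU     0x00000400   ; 1024 bytes of stack

Heap_Size       EQU     0x00000200   ; 512 bytes of heap; raising this to,
                                     ; say, 0x0000C800 gives a 50kB heap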
Okay, I have tested my idea with dynamic heap free-space checking and it worked well (although I didn't perform long-run tests); however, Clifford's answer and this article convinced me to abandon the idea of dynamic allocation. Eventually I implemented my own static heap with pages (a 2D array), an occupied-pages indicator (a 0/1 array with one entry per page), and a FIFO of structs consisting of a pointer to the msg on my static heap (actually just an index into the array) and the length of the message (to determine how many contiguous pages it occupies). 95% of the msgs I receive should take up only one page, 5% two or three pages, so fragmentation is still possible, but at least I keep a tight rein on it and it affects only the part of memory assigned to this module of the code (in other words, the fragmentation doesn't leak into other parts of the code). So far it has worked without any problems, and it is surely faster: the lookup time is O(n*m), with n the number of pages and m the longest message possible, but taking the laws of probability into consideration it comes down to O(n). Moreover, n is always a lot smaller than the number of all allocation units in memory, so there is far less to search.

Handling stack overflows in embedded systems

In embedded software, how do you handle a stack overflow in a generic way?
I have come across some processors which provide protection in hardware, like recent AMD processors.
There are some techniques on Wikipedia, but are those real, practical approaches?
Can anybody give a clear, suggested approach which works in all cases on today's 32-bit embedded processors?
Ideally you write your code with static stack usage (no recursive calls). Then you can evaluate maximum stack usage by:
static analysis (using tools)
measurement of stack usage while running your code with complete code coverage (or as high coverage as possible, until you are reasonably confident you've established the extent of stack usage, provided your rarely-run code doesn't use noticeably more stack than the normal execution paths)
But even with that, you still want to have a means of detecting and then handling stack overflow if it occurs, if at all possible, for more robustness. This can be especially helpful during the project's development phase. Some methods to detect overflow:
If the processor supports a memory read/write interrupt (i.e. memory access breakpoint interrupt) then it can be configured to point to the furthest extent of the stack area.
In the memory map configuration, set up a small (or large) block of RAM that is a "stack guard" area. Fill it with known values. In the embedded software, regularly (as often as reasonably possible) check the contents of this area. If it ever changes, assume a stack overflow.
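A minimal sketch of the "stack guard" technique (the guard size and fill value are illustrative, and the linker script must actually place the guard just beyond the stack for the check to mean anything):

#include <stdbool.h>
#include <stdint.h>

#define GUARD_WORDS 16              /* illustrative guard size */
#define GUARD_FILL  0xDEADBEEFu     /* known fill pattern */

/* Assumed to be placed immediately beyond the stack's far end,
   e.g. via a dedicated section in the linker script. */
static volatile uint32_t stack_guard[GUARD_WORDS];

void guard_init(void)
{
    for (int i = 0; i < GUARD_WORDS; i++)
        stack_guard[i] = GUARD_FILL;
}

/* Call regularly, e.g. from the main loop or a timer tick. */
bool stack_overflowed(void)
{
    for (int i = 0; i < GUARD_WORDS; i++)
        if (stack_guard[i] != GUARD_FILL)
            return true;  /* something wrote into the guard area */
    return false;
}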
Once you've detected it, then you need to handle it. I don't know of many ways that code can gracefully recover from a stack overflow, because once it's happened, your program logic is almost certainly invalidated. So all you can do is
log the error
Logging the error is very useful, because otherwise the symptoms (unexpected reboots) can be very hard to diagnose.
Caveat: The logging routine must be able to run reliably even in a corrupted-stack scenario, so it should be simple. I.e. with a corrupted stack, you probably can't try to write to EEPROM using your fancy EEPROM-writing background task. Maybe just log the error into a struct reserved for this purpose, in non-init RAM, which can then be checked after reboot (a sketch follows after this list).
Reboot (or perhaps shut down, especially if the error recurs)
Possible alternative: restart just the particular task, if you're using an RTOS, and your system is designed so the stack corruption is isolated, and all the other tasks are able to handle that task restarting. This would take some serious design consideration.
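As promised above, a hedged sketch of logging into non-init RAM: a crash record placed in a section the C runtime does not zero at startup, so it survives a reboot. The section name ".noinit" and the GCC attribute are assumptions; adapt them to your toolchain and linker script:

#include <stdint.h>

struct crash_record {
    uint32_t magic;         /* known value distinguishes a real record
                               from power-on garbage */
    uint32_t fault_pc;      /* program counter captured at the fault */
    uint32_t reboot_count;
};

__attribute__((section(".noinit")))
static volatile struct crash_record g_crash;

void log_stack_overflow(uint32_t pc)
{
    /* Keep this path trivial: no heap, no drivers, no printf. */
    g_crash.magic    = 0xDEADBEEFu;
    g_crash.fault_pc = pc;
    g_crash.reboot_count++;
    /* ...then force a reset, e.g. via the watchdog. */
}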
While embedded stack overflow can be caused by recursive functions getting out of hand, it can also be caused by errant pointer usage (although this could be considered another type of error), and normal system operation with an undersized stack. In other words, if you don't profile your stack usage it can occur outside of a defect or bug situation.
Before you can "handle" stack overflow you have to identify it. A good method for doing this is to load the stack with a pattern during initialization and then monitor how much of the pattern disappears during run-time. In this fashion you can identify the highest point the stack has reached.
The pattern check algorithm should execute in the opposite direction of stack growth. So, if the stack grows from 0x1000 to 0x2000, then your pattern check can start at 0x2000 to increase efficiency. If your pattern was 0xAA and the value at 0x2000 contains something other than 0xAA, you know you've probably got some overflow.
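A minimal sketch of the watermark scan, assuming a downward-growing stack whose bounds are exposed by the linker symbols __stack_start__/__stack_end__ (hypothetical names; use whatever your linker script defines):

#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xAAu  /* the known fill pattern from the text */

/* Hypothetical linker-script symbols bounding the stack region. */
extern uint8_t __stack_start__[];  /* lowest stack address  */
extern uint8_t __stack_end__[];    /* highest stack address */

/* Scan against the direction of growth: the first byte that no longer
   holds the pattern marks the deepest point the stack has reached. */
size_t stack_max_usage(void)
{
    const uint8_t *p = __stack_start__;
    while (p < __stack_end__ && *p == STACK_PAINT)
        p++;
    return (size_t)(__stack_end__ - p);  /* bytes ever used */
}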
You should also consider placing an empty RAM buffer immediately after the stack so that if you do detect overflow you can shut down the system without losing data. If your stack is followed immediately by heap or SRAM data then identifying an overflow will mean that you have already suffered corruption. Your buffer will protect you for a little bit longer. On a 32-bit micro you should have enough RAM to provide at least a small buffer.
If you are using a processor with a Memory Management Unit, the hardware can do this for you with minimal software overhead. Most modern 32-bit processors have one, and more and more 32-bit microcontrollers feature them as well.
Set up a memory area in the MMU that will be used for the stack. It should be bordered by two memory areas where the MMU does not allow access. When your application is running, you will receive an exception/interrupt as soon as you overflow the stack.
Because you get an exception at the moment the error occurs, you know exactly where in your application the stack went bad. You can look at the call stack to see exactly how you got to where you are. This makes it a lot easier to find your problem than trying to figure out what is wrong by detecting the problem long after it happened.
I have used this successfully on PPC and AVR32 processors. When you start out using an MMU you feel like it is a waste of time, since you got along great without it for many years, but once you see the advantages of an exception at the exact spot where your memory problem occurs, you will never go back. An MMU can also detect null-pointer accesses if you disallow memory access to the bottom part of your RAM.
If you are using an RTOS, the MMU protects the memory and stacks of other tasks, so errors in one task should not affect them. This also means you could easily restart that task without affecting the other tasks.
In addition, a processor with an MMU usually also has lots of RAM, so your program is a lot less likely to overflow its stack in the first place, and you don't need to fine-tune everything to get your application to run correctly with a small memory footprint.
An alternative to this would be to use the processor's debug facilities to cause an interrupt on a memory access at the end of your stack. This will probably be very processor-specific.
A stack overflow occurs when the stack memory is exhausted by too large a call stack, e.g. a recursive function nested too many levels deep.
There are techniques to detect a stack overflow by placing known data after the stack, so that it can be detected whether the stack grew too far and overwrote it.
There are static source code analysis tools, such as GnatStack, StackAnalyzer from AbsInt, and Bound-T, which can be used to determine or estimate the maximum run-time stack size.