How are virtual addresses corresponding to the kernel stack mapped?

Each process's virtual address space comprises user space and kernel space. As many articles point out, the kernel space of all processes is mapped to the same physical memory, i.e. there is only one kernel in physical memory. But each process has its own kernel stack, which is part of the kernel space. How does the same mapping work for all processes with different kernel stacks?

Note: This is the OS-agnostic answer. Details do vary slightly with the OS in question (e.g. Darwin and continuations), and possibly with the architecture (ARMv8, x86, etc.).
When a process performs a system call, the user mode state (registers) is saved, including the user mode stack pointer. At that point, a kernel mode stack pointer is loaded, which is usually maintained somewhere in the thread control block.
You are correct in saying that there is only one kernel space. What follows is that, in theory, one thread in kernel space could easily see and/or tamper with any other thread's kernel memory (just as threads of the same process can "see" each other in user space). This, however, is (almost always) theory only, since the kernel code presumably respects memory boundaries (as user-mode code is assumed to do, with thread-local storage, etc.). That said, "almost always", because if the kernel code can be exploited, then all of kernel memory is laid bare to the exploiter, and can potentially be read and/or compromised.
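As a rough illustration of the per-thread kernel stack, here is a minimal sketch (the structure and field names are made up, not any particular kernel's): each thread carries its own kernel stack inside its control block, and system-call entry just switches the stack pointer to it, so one shared kernel mapping still gives every thread a private stack to run on.

```c
#include <stdint.h>

#define KSTACK_SIZE 8192            /* a typical kernel stack size: 8 KiB */

struct thread_control_block {
    uintptr_t user_sp;              /* saved user-mode stack pointer       */
    uintptr_t user_regs[16];        /* saved general-purpose registers     */
    uint8_t   kernel_stack[KSTACK_SIZE];
    uintptr_t kernel_sp;            /* top of this thread's kernel stack   */
};

/* Called on system-call entry; in reality this is arch-specific assembly. */
void enter_syscall(struct thread_control_block *tcb, uintptr_t user_sp)
{
    tcb->user_sp = user_sp;                                  /* save user state */
    tcb->kernel_sp = (uintptr_t)(tcb->kernel_stack + KSTACK_SIZE);
    /* ...switch the CPU stack pointer to tcb->kernel_sp and run the
     * kernel-side handler on this thread's private kernel stack...     */
}
```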

Related

Does a system call involve a context switch or not?

I am reading the Wikipedia page on system calls and I cannot reconcile a few of the statements made there.
At the bottom, it says that "A system call does not generally require a context switch to another process; instead, it is executed in the context of whichever process invoked it."
Yet, at the top, it says that "[...] applications to request services via system calls, which are often initiated via interrupts. An interrupt [...] passes control to the kernel [and then] the kernel executes a specific set of instructions over which the calling program has no direct control".
It seems to me that if the interrupt "passes control to the kernel," that means that the kernel, which is "another process," is executing, and therefore a context switch happened. So there seems to be a contradiction in the Wikipedia page. Where is my understanding wrong?
Your understanding is wrong because the kernel isn't a separate process. The kernel sits in RAM, in memory areas shared by every process; typically it occupies the top half of the virtual address space.
When the kernel is invoked with a system call, it is not necessarily using an interrupt. On x86-64, it is invoked directly using a specific processor instruction (syscall). This instruction makes the processor jump to the address stored in a special register.
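For illustration, this is roughly what a direct system-call invocation looks like on x86-64 Linux. The raw_write wrapper is made up, but the register convention (rax = syscall number, rdi/rsi/rdx = arguments, rcx and r11 clobbered by the instruction) is the standard one:

```c
/* Invoke write(2) directly with the x86-64 `syscall` instruction,
 * bypassing the libc wrapper. The kernel entry point it jumps to was
 * installed by the kernel in a special register (the LSTAR MSR). */
long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                       /* return value comes back in rax */
        : "a"(1),                         /* syscall number 1 = write       */
          "D"(fd), "S"(buf), "d"(len)     /* arguments in rdi, rsi, rdx     */
        : "rcx", "r11", "memory"          /* clobbered by `syscall`         */
    );
    return ret;
}

int main(void)
{
    raw_write(1, "hello\n", 6);
    return 0;
}
```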
Syscalls don't necessarily involve a full context switch, but they do involve a switch from user mode to kernel mode. Most often, kernels have one kernel stack per process. This stack is mostly unused and empty when no system call is active, since there is then nothing to store on it.
The registers also need to be saved, since the kernel may use them. I don't know about other processors, but x86-64 does have the TSS, which allows an automatic user-mode to kernel-mode stack switch. The registers still need to be saved manually.
In the end, there is a necessary partial context switch when entering the kernel through a system call, but it doesn't involve switching to a whole other process. Since the temporary storage for the saved registers and the kernel stack are already reserved, it involves much less overhead: the kernel doesn't need to touch the page tables. Swapping page tables often involves cache management and some cache flushing to keep things consistent.

Running multiple executables linked to 0x400000

I'm interested in operating systems and I have a dummy question. Standard PE executable files are linked at 0x400000. My question is: how can the operating system load multiple executables with the same image base, when virtual memory just maps virtual addresses to physical ones? Does it store the PDE and PTE index of the thread somewhere? Is there some addition applied to each address before execution starts? How does it work?
Each process gets its own virtual address space, and hence there's no conflict. All virtual address spaces that exist at any one time in the system get mapped onto the physical address space. Virtual memory that can't be, or currently isn't, mapped onto physical memory is held in the swap file (swap partition, or the like); this is called paging.
During thread switches, when the CPU is about to execute a thread from a different process than the one it was executing so far, the operating system's scheduler informs the CPU (sets the respective registers) about the new virtual-address translation table to use. Thus the CPU sees just one virtual address space at any given time, while the operating system can manage many more, one for each process.
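A minimal sketch of that hand-off, assuming an x86-64-style MMU (the structure and function names are hypothetical): the scheduler keeps one page-table root per process and loads it into CR3 when switching to a thread of a different process.

```c
#include <stdint.h>

struct process {
    uint64_t pml4_phys;   /* physical address of this process's page-table root */
    /* ...other bookkeeping... */
};

/* Privileged instruction; only runs inside the kernel. */
static inline void load_cr3(uint64_t pml4_phys)
{
    __asm__ volatile ("mov %0, %%cr3" :: "r"(pml4_phys) : "memory");
}

void switch_address_space(struct process *next)
{
    /* After this, virtual address 0x400000 resolves through *next*'s
     * page tables, so every process can be linked at the same base.  */
    load_cr3(next->pml4_phys);
}
```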
Disclaimer: My answer may be thought of as a bit superficial or imprecise compared to reality. This is for the sake of simplicity, given the nature of the OP's question. Also, these mechanisms are CPU-dependent and operating-system-dependent.

Why is virtual memory needed in embedded systems? [closed]

Per my understanding, virtual memory is as follows:
Programs/applications/executables reside on a storage device. Storage device access is much slower than RAM. Hence, programs are copied from storage to main memory for execution. Since computers have limited main memory (RAM), when all of the RAM is being used (e.g., if there are many programs open simultaneously or if one very large program is in use), a computer with virtual memory enabled will swap data to the HDD and back to memory as needed, thus, in effect, increasing the total system memory.
As far as I know, most embedded devices do not have disk storage (like smartphones or in-car infotainment systems). Code is executed directly from flash memory. RAM is mainly used as a scratchpad area (local variables, return addresses, etc.).
So why do we need virtual memory in embedded systems? (e.g. WinCE and QNX support virtual memory)
Your understanding is completely wrong. You are confusing virtual memory with swapping or page files. There are systems that have virtual memory and no swap or page files and there are systems that swap without virtual memory.
Virtual memory just means that a process has a view of memory that is different from the physical mapping. Among other things, it allows processes to have their own virtual address space.
"Storage device access is much slower than RAM. Hence, programs are copied from storage to main memory for execution. Since computers have limited main memory (RAM), when all of the RAM is being used (e.g., if there are many programs open simultaneously or if one very large program is in use), a computer with virtual memory enabled will swap data to the HDD and back to memory as needed, thus, in effect, increasing the total system memory."
That's swapping (or paging). It has nothing to do with virtual memory except that most modern operating systems implement swapping using virtual memory. Swapping actually existed before virtual memory.
I think you're probably incorrect about these devices running code directly from flash memory. The read speed of flash is pretty low and RAM is very cheap. My bet is that most of the systems you mention don't run code directly from flash and instead use virtual memory to fault code into RAM as needed.
Embedded systems: the term itself covers a wide range of applications. You could call a small microcontroller, with flash program space measured in kilobytes or less and RAM measured in bits or bytes (not enough to be kilobytes), an embedded system. Likewise, a TiVo running a full-blown operating system on a pretty much full-blown computer motherboard (replace TiVo with Xbox as another example) is an embedded system. So you need to be less vague in your question. Virtual memory has little to do with any of that; its applications cross those boundaries.
There are many answers above; David S has the best one, of course: virtual memory simply means that the memory address on one side of the virtual-memory boundary is different from the physical address used on the other side of that boundary. Where, how, and why there is a boundary varies.
A popular use for virtual memory, and arguably a primary use case, is for operating systems. One benefit is that all applications can be compiled for the same address space: all applications might be compiled such that, from the program's perspective, they all start at, say, address 0x8000, and as far as that program is concerned, when it runs it accesses memory based on that address. A combination of the hardware and the operating system translates the virtual address the program uses into a physical address. If the operating system allows multitasking, then each task might think it is in the same address space, but the physical addresses are different for each of those tasks. I won't elaborate further on why an assumed, fixed address space is a benefit.

Another aspect operating systems use is memory management. Many MMUs will let you segment the memory however you like. If a user wants to allocate 100 megabytes of memory, the program may access that 100 MB in its virtual address space as if it were linear, and in that address space it is linear, but the 100 MB might be broken down into, say, 4 KB chunks scattered all about the physical address space. It is not always likely, but certainly technically possible, that no two chunks of that physical memory are next to any other chunk of that 100 MB. Your memory management doesn't have to try to keep large physically contiguous chunks of memory available for applications to allocate. Note that not all MMUs are exactly the same, and 4 KB is just an example.

A third major benefit of a virtual address space to an operating system is protection. If the application is bound to its virtual address space, it is often quite easy to prevent that application from touching the memory of any other application or of the operating system. The application in this case operates/executes at a protection level such that all accesses are considered virtual and have to go through a translation to physical; the tables used to define that virtual-to-physical mapping can contain protection flags. If the application accesses a memory address in its virtual space that it has no business accessing, the hardware can trap that and let the operating system decide how to handle it (virtualize some hardware, pop up an error and kill the app, pop up a warning and not kill the app but feed it bogus data for that transaction, etc.).
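A toy sketch of the kind of table behind this (the layout and flag names are invented, not any particular MMU's): each entry maps a virtual page to a physical frame and carries permission bits the hardware can check before letting an access complete.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                 /* 4 KiB pages, as in the example above */
#define FLAG_PRESENT  (1u << 0)
#define FLAG_WRITABLE (1u << 1)
#define FLAG_USER     (1u << 2)

struct pte {
    uint64_t frame;                   /* physical frame number */
    uint32_t flags;                   /* protection bits       */
};

/* Hypothetical translation: returns true and fills *phys on success,
 * false when the access should trap to the operating system instead. */
bool translate(const struct pte *table, uint64_t vaddr, bool write, uint64_t *phys)
{
    const struct pte *e = &table[vaddr >> PAGE_SHIFT];
    if (!(e->flags & FLAG_PRESENT)) return false;            /* page fault      */
    if (write && !(e->flags & FLAG_WRITABLE)) return false;  /* protection trap */
    *phys = (e->frame << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
    return true;
}
```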
There are lots of ways this can be used in an embedded system. First off, many embedded systems run operating systems, so all of the above applies: ease of compiling the program for a known address space, relative ease of memory management, and protection of the other applications and the operating system, plus other benefits not mentioned (virtualization being one; being able to enable/disable instruction/data caching on a block-by-block basis is another).
The bottom line, though, is what David S pointed out: virtual memory simply means the virtual address is not necessarily equal to the physical address. It can be, but doesn't have to be; there is some boundary, some hardware, usually table-driven, that translates the virtual address into a physical address. There are lots of reasons why you would want to do this, and since some embedded systems are indistinguishable from non-embedded systems, any reason that applies to a non-embedded system can apply to an embedded system.
As much as folks may want you to believe that a system has a flat address space, it is often an illusion. In a microcontroller, for example, you might have multiple flash banks and one or more RAM banks. Each of these banks has a physical, generally zero-based, address. Even if there is no MMU or anything like it, there is logic somewhere between the address bus on the processor and the address bus on the flash or RAM that decodes the address from the processor and uses it to address into the specific memory bank. Often the lower bits match and the upper bits are responsible for the bank choice (this is often the case with an MMU as well), so in that sense the processor is living in a virtual address space. (This is not limited to microcontrollers; it is generally how processor address buses are treated.) With microcontrollers, depending on a pin being pulled high or low, or some other mechanism, you might have a chip feature that selects which flash bank is used to boot the processor. You might tie an input pin high so the processor's built-in bootloader lets you access and debug the system, for example to reprogram the application flash; or tie that line low and boot the application flash instead of the vendor's debugger/boot flash. Some chips get even more complicated, letting you boot one flash and then have the program write a register somewhere, instantly changing the memory architecture and moving things around, for example allowing RAM to be used for the interrupt vector table so your application can be changed after boot, rather than having a vector table in flash that is not as easy to change at will.
Now, when you talk about virtual memory in the sense of swapping to and from a disk, that is a trick often employed by operating systems to give the illusion of having more RAM. I mentioned it above under the category of virtualization: virtual memory in the sense that it isn't really there; I have X bytes but will let the software think there are Y bytes (where Y is larger than X) available. The operating system, through the translation tables used by the hardware, manages which memory chunks are tied to physical RAM and are allowed to complete as-is by the hardware, and which are marked as not available in some way, causing an exception to the operating system. Upon inspection, the operating system determines that this is a valid address for this application, but that the data behind this address has been swapped to disk. The operating system then, through some algorithm, finds another chunk of RAM belonging to whomever (part of the algorithm), copies that chunk of RAM to disk, marks the table entry for that virtual-to-physical mapping as not valid, then copies the desired chunk from disk to RAM, marks that chunk as valid, and lets the hardware complete the memory cycle.
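A heavily simplified sketch of that decision path; all names are hypothetical and the real machinery (disk I/O, page-table updates, locking) is stubbed out:

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified bookkeeping for one virtual page. */
struct vpage {
    bool valid;          /* does this address belong to the application? */
    bool present;        /* currently backed by a physical frame?        */
    bool swapped_out;    /* data currently lives on disk                 */
    int  frame;          /* physical frame number when present           */
};

/* Stubs standing in for the real machinery. */
static int  pick_victim_frame(void)                    { return 0; }
static void swap_out_frame(int frame)                  { (void)frame; }
static void swap_in_page(struct vpage *vp, int frame)  { vp->frame = frame; }

/* The decision described above: valid-but-swapped pages get brought
 * back in at the expense of some victim; invalid accesses are fatal. */
bool handle_page_fault(struct vpage *vp)
{
    if (!vp->valid)
        return false;                 /* bad access: kill or signal the task */
    if (vp->swapped_out) {
        int victim = pick_victim_frame();
        swap_out_frame(victim);       /* copy the victim's data to disk      */
        swap_in_page(vp, victim);     /* copy the wanted data back to RAM    */
        vp->swapped_out = false;
        vp->present = true;           /* mark the mapping valid again        */
    }
    return true;                      /* retry the faulting memory access    */
}
```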
Not any different from how VMware or other virtual machines work. You can execute instructions natively on the hardware using virtual memory until you cause an exception. The virtual machine might think you have an xyz network interface and might have a driver accessing a register in that xyz network interface, but the reality is you have no xyz hardware and/or you don't want the virtual machine's applications to access that hardware, so you virtualize it: you trap that register access and, using software that simulates the hardware, you fake the access and let the program on the virtual machine continue. This is obviously not the only way to do virtual machines, but it is one way, if the hardware supports it, to let a virtual machine run very fast as a percentage of the time it is actually executing instructions on the hardware. The slowest way to virtualize, of course, is to virtualize everything including the processor; every instruction in that case is simulated. This is quite slow but has its own features (virtualizing an ARM system on an x86, or x86 on an ARM, xyz on an abc, fill in the blanks).

And if that is the type of virtual memory you are talking about in an embedded system, well, if the embedded system is for the most part indistinguishable from a non-embedded system (an Xbox or TiVo, for example), then for the same reasons you could allow such a thing. If you were on a microcontroller, the usual answer is that if you needed more memory you would buy a bigger microcontroller, add more memory to the system, or change the needs of the application so that it doesn't need as much memory. There may be exceptions, but it mostly depends on your application and requirements: a general-purpose, or general-purpose-like, system which allows applications or their data to be larger than the available RAM will require some sort of solution, while the microcontroller in your keyless-entry key fob, or in your TV remote control, or clock radio, or whatever, normally has no need to let "applications" require more resources than are physically there.
The more important benefit of using virtual memory is that every process gets its own address space, isolated from every other process's. That way virtual memory helps keep faults contained and improves security and stability. I should note that it is still possible for two processes to share a bit of memory to facilitate communication (shared-memory IPC).
Also, you can do other tricks like conserving memory by mapping shared parts into more than one process's address space (libc comes to mind for embedded use) while keeping only one copy in physical memory. This also gives a speed boost. You can take it further the way Linux does to cheapen fork/clone: only the in-kernel descriptors are copied and the memory image is left alone until the first write access happens, which works on a similar idea.
As a last benefit, in modern systems it's common to do file I/O by mapping the file into the process's address space (cf. mmap, for example).
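As a user-space illustration, a minimal POSIX example of mapped file I/O; the file path here is arbitrary:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);   /* any readable file will do */
    if (fd < 0) return 1;

    struct stat st;
    fstat(fd, &st);

    /* The file's pages now appear directly in our virtual address space;
     * the kernel faults them in from the page cache on first access.    */
    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) return 1;

    fwrite(data, 1, st.st_size, stdout);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}
```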
It's interesting to note that one can get some of the benefits of "virtual memory" without needing a full-fledged MMU. The hardware requirements can sometimes be amazingly light. The PIC 16C505 has a 5-bit address space and 40 bytes of RAM; addresses 0x10 to 0x1F can map to either of two groups of 16 bytes of RAM. When writing an application that needed to manage two different data streams, I arranged for all the variables associated with one data stream to live in the first group of 16 "switchable" memory locations, and those associated with the other to sit at the corresponding addresses in the second group. I could then use the same code to manage both data streams: simply set the banking bit one way, call the routine, set it the other way, and call the routine again.
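The same idea can be simulated in plain C. This sketch stands in for the real bank-select bit with a pointer, so it is only an analogy of the hardware trick, but it shows how one copy of the code serves two sets of per-stream variables:

```c
#include <stdio.h>

/* The 16 "switchable" locations, one group per data stream. */
struct stream_vars {
    unsigned count;
    unsigned checksum;
    unsigned char buf[8];
};

static struct stream_vars bank0, bank1;   /* two groups of variables      */
static struct stream_vars *active;        /* stands in for the bank bit   */

static void select_bank(int b) { active = b ? &bank1 : &bank0; }

/* One copy of the code, with no knowledge of which stream it handles. */
static void handle_byte(unsigned char c)
{
    active->buf[active->count % sizeof active->buf] = c;
    active->checksum += c;
    active->count++;
}

int main(void)
{
    select_bank(0); handle_byte('A');     /* process a byte of stream 0 */
    select_bank(1); handle_byte('B');     /* process a byte of stream 1 */
    printf("stream0: %u bytes, stream1: %u bytes\n", bank0.count, bank1.count);
    return 0;
}
```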
One of the reasons virtual memory exists is so that your device can multitask. It can also act as an extension of your RAM, taking load off of physical RAM by swapping data back and forth.

System call without context switching?

I was just reading up on how Linux works in my OS book when I came across this:
[...] the kernel is created as a single, monolithic binary. The main reason is to improve performance. Because all kernel code and data structures are kept in a single address space, no context switches are necessary when a process calls an operating-system function or when a hardware interrupt is delivered.
That sounded quite amazing to me, surely it must store the process's context before running off into kernel mode to handle an interrupt.. But ok, I'll buy it for now. A few pages on, while describing a process's scheduling context, it said:
Both system calls and interrupts that occur while the process is executing will use this stack.
"this stack" being the place where the kernel stores the process's registers and such.
Isn't this a direct contradiction of the first quote? Am I misinterpreting it somehow?
I think the first quote is referring to the differences between a monolithic kernel and a microkernel.
Linux being monolithic, all its kernel components (device drivers, scheduler, VM manager) run at ring 0. Therefore, no context switch is necessary when performing system calls and handling interrupts.
Contrast microkernels, where components like device drivers and IPC providers run in user space, outside of ring 0. Therefore, this architecture requires additional context switches when performing system calls (because the performing module might reside in user space) and handling interrupts (to relay the interrupts to the device drivers).
"Context switch" could mean one of a couple of things, both relevant: (1) switching from user to kernel mode to process the system call, or an involuntary switch to kernel mode to process an interrupt against the interrupt stack, or (2) switching to run another user process in user space, with a jump to kernel space in between the two.
Any movement from user space to kernel space implies saving enough user-space state to return to it reliably. If the kernel-space code decides that, while you're no longer running the user code for that process, it's time to let another user process run, that switch happens there.
So at the least, you're talking about 2-3 stacks or places to store a "context": hardware interrupts need a kernel-level stack to record what to return to; user method/subroutine calls use a standard stack for getting that done; and so on.
The original Unix kernels - and the model isn't that different now for this part - ran system calls like a short-order cook processing breakfast orders: move this over on the stove to make room for the order of bacon that just arrived, start the bacon, go back to the first order. All of this happens in kernel context. It was not a huge monitoring application, which probably drove the IBM and DEC software folks mad.
When making a system call in Linux, a context switch is done from user space to kernel space (ring 3 to ring 0). Each process has an associated kernel-mode stack that is used by the system call. On entry to the system call, the CPU registers of the process are saved on that kernel-mode stack; this stack is different from the user-mode stack, which is the one the process uses for user-space execution.
When a process is in kernel mode (or user mode), calling functions of the same mode will not require a context switch. This is what is referred by the first quote.
The second quote refers to the kernel mode stack, and not the user-mode stack.
Having said this, I must mention Linux optimisations where no transition to kernel space is needed to execute a system call, i.e. all processing related to the system call is done in user space itself (thus no context switch). vsyscall and the vDSO are such techniques. The idea behind them is quite simple: the kernel shares with user space the data required to perform the corresponding system call. More info can be found in this LWN article.
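For example, on typical Linux/glibc systems the call below is usually serviced entirely in user space through the vDSO, with no kernel entry (whether it actually is depends on the architecture and the clock chosen):

```c
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec ts;

    /* On typical Linux/glibc setups this resolves through the vDSO:
     * the kernel shares the clock data with user space, so no mode
     * switch is needed just to read the time. */
    if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0)
        printf("%ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);

    return 0;
}
```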
In addition to this, there have been some research projects in which all execution happens in the same ring: user-space programs and the OS code both reside in the same ring. The idea is to get rid of the overhead of ring switches. Microsoft's Singularity OS is one such project.

How does a stack memory increase?

In a typical C program, the Linux kernel provides 84K to ~100K of memory. How does the kernel allocate more memory for the stack when the process uses up the given memory?
My understanding is that when the process has used up all the memory of the stack and then touches the next contiguous memory, it should page fault, and the kernel then handles the page fault.
Is it at that point that the kernel provides more stack memory for the given process, and which data structure in the Linux kernel tracks the size of the stack for the process?
There are a number of different methods used, depending on the OS (Linux real-time vs. normal) and the language runtime system underneath:
1) dynamic, by page fault
Typically a few real pages are preallocated at higher addresses and the initial SP is set to point there. The stack grows downward while the heap grows upward. If a page fault happens somewhat below the stack bottom, the missing intermediate pages are allocated and mapped, effectively growing the stack from the top towards the bottom automatically. There is typically a maximum up to which such automatic allocation is performed, which may or may not be specifiable in the environment (ulimit) or the exe header, or adjustable dynamically by the program via a system call (rlimit). This adjustability in particular varies heavily between OSes. There is also typically a limit on how far away from the stack bottom a page fault is still considered OK, so that an automatic grow happens. Notice that not all systems' stacks grow downward: under HP-UX it used to grow upward, so I am not sure what Linux on PA-RISC does (can someone comment on this?).
2) fixed size
Other OSes (especially in embedded and mobile environments) either have fixed sizes by definition, or take the size from the exe header, or have it specified when a program/thread is created. Especially in embedded real-time controllers, this is often a configuration parameter, and individual control tasks get fixed stacks (to avoid runaway threads eating the memory of higher-priority control tasks). Of course, also in this case the memory might be allocated only virtually until it is really needed.
3) pagewise, spaghetti and similar
Such mechanisms tend to be forgotten, but are still in use in some runtime systems (I know of Lisp/Scheme and Smalltalk systems). These allocate and grow the stack dynamically as required; however, not as a single contiguous segment, but as a linked chain of multi-page chunks. This requires different function entry/exit code to be generated by the compiler(s) in order to handle segment boundaries. Therefore such schemes are typically implemented by a language support system and not by the OS itself (that used to be different in earlier times - sigh). The reason is that when you have many (say thousands of) threads in an interactive environment, preallocating say 1 MB each would simply fill your virtual address space, and you could not support a system where the needs of an individual thread are unknown in advance (which is typically the case in a dynamic environment, where the user might enter eval code into a separate workspace). So dynamic allocation as in scheme 1 above is not possible, because other threads with their own stacks would be in the way. Instead, the stack is made up of smaller segments (say 8-64 KB) which are allocated and deallocated from a pool and linked into a chain of stack segments. Such a scheme may also be required for high-performance support of things like continuations, coroutines, etc.
Modern Unixes/Linuxes and (I guess, but am not 100% certain) Windows use scheme 1) for the main thread of your exe, and 2) for additional (p)threads, which get a fixed stack size from the thread creator at creation time. Most embedded systems and controllers use fixed (but configurable) preallocation (often even physically preallocated).
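A small user-space illustration of those two schemes as they appear to a program (a sketch, nothing OS-internal): RLIMIT_STACK bounds the main thread's growable stack, while an extra pthread gets a fixed stack size chosen at creation time.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/resource.h>

static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

int main(void)
{
    /* Scheme 1: the main stack grows on demand, up to this limit. */
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);
    printf("main stack limit: %ld bytes\n", (long)rl.rlim_cur);

    /* Scheme 2: an additional thread gets a fixed stack size up front. */
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 1 << 20);   /* 1 MiB, chosen at creation */

    pthread_t t;
    pthread_create(&t, &attr, worker, NULL);
    pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}
```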
The stack for a given process has a limited, fixed size. The reason you can't add more memory as you (theoretically) describe is that the stack must be contiguous, and it grows toward the heap. So, when the stack reaches the heap, no extension is possible.
The stack size for a userland program is not determined by the kernel. The kernel stack size is a configuration option for the kernel (usually 4k or 8k).
Edit: if you already know this, and were merely asking about the allocation of physical pages for a process, then you already have the procedure down. But there's no need to track the "stack size" like this: the virtual pages in the stack with no page-table entries are just normal overcommitted virtual pages. Physical memory will be granted on their first access. But the kernel does not have to overcommit memory, and thus a stack may have complete physical backing when the executable is first loaded.
The stack can only grow to a certain length, because it has a fixed maximum capacity in memory. If your question is in which direction the stack gets used up, the answer is downwards: it is filled down in memory, towards the heap. The heap is the dynamic component of memory, which can actually grow from the bottom up, based on your need for data storage.
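A quick way to see that direction from user space (the exact addresses are platform-dependent, but on a typical downward-growing stack the inner call's local ends up at a lower address):

```c
#include <stdio.h>

/* Print the address of a local in an outer and an inner call frame.
 * On typical downward-growing stacks, the inner address is lower.   */
static void inner(const int *outer_local)
{
    int inner_local = 0;
    printf("outer local at %p, inner local at %p\n",
           (void *)outer_local, (void *)&inner_local);
}

int main(void)
{
    int outer_local = 0;
    inner(&outer_local);
    return 0;
}
```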