What is a kernel stack used for? - process

The following is a description I read of a context switch between process A and process B. I don't understand what a kernel stack is used for. There is supposed to be a per-process kernel stack. The description speaks of saving the registers of A onto the kernel stack of A and also saving the registers of A to the process structure of A. What exactly is the point of saving the registers to both the kernel stack and the process structure, and why the need for both?
A context switch is conceptually simple: all the OS has to do is save
a few register values for the currently-executing process (onto its
kernel stack, for example) and restore a few for the
soon-to-be-executing process (from its kernel stack). By doing so, the
OS thus ensures that when the return-from-trap instruction is finally
executed, instead of returning to the process that was running, the
system resumes execution of another process...
Process A is running and then is interrupted by the timer interrupt.
The hardware saves its registers (onto its kernel stack) and enters
the kernel (switching to kernel mode). In the timer interrupt handler,
the OS decides to switch from running Process A to Process B. At that
point, it calls the switch() routine, which carefully saves current
register values (into the process structure of A), restores the
registers of Process B (from its process structure entry), and then
switches contexts, specifically by changing the stack pointer to use
B’s kernel stack (and not A’s). Finally, the OS returns-from-trap,
which restores B’s registers and starts running it.

I have a disagreement with the second paragraph.
Process A is running and then is interrupted by the timer interrupt. The hardware saves its registers (onto its kernel stack) and enters the kernel (switching to kernel mode).
I am not aware of a system that saves all the registers on the kernel stack on an interrupt. Normally, processors save the minimum necessary on the kernel stack after an interrupt: typically the Program Counter, Processor Status, and Stack Pointer (assuming the hardware does not have a separate Kernel Mode Stack Pointer). The interrupt handler then saves any additional registers it wants to use and restores them before exit. The processor's RETURN FROM INTERRUPT or EXCEPTION instruction then restores the registers that were stored automatically on interrupt entry.
That description assumes no change in the process.
If the interrupt handler decides to change the process, it saves the current register state (the "process context"; some processors have a single instruction for this, while in Intel land you might have to use multiple instructions) and then executes another instruction to load the process context of the new process.
To answer your heading question "What is a kernel stack used for?", it is used whenever the processor is in Kernel mode. If the kernel did not have a stack protected from user access, the integrity of the system could be compromised. The kernel stack tends to be very small.
To answer your second question, "What exactly is the point of saving the registers to both the kernel stack and the process structure and why the need for both?"
They serve two different purposes. The saved registers on the kernel stack are used to get out of kernel mode. The process context block saves the entire register set in order to change processes.
I think your misunderstanding comes from the wording of your source that suggests all registers are stored on the stack when entering kernel mode, rather than just the minimum number of registers needed to make the kernel mode switch. The system will usually only save what it needs to get back to user mode (and may use that same information to return back to the original process in another context switch, depending upon the system). The change in process context saves all the registers.
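To make the two save areas concrete, here is a minimal C sketch of the sequence described above. It is a toy model under stated assumptions, not code from any real kernel: the types trap_frame, context, proc and cpu and the functions trap_enter, switch_to and trap_return are all made up for illustration, and the kernel stack is reduced to a small array of trap frames.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KSTACK_FRAMES 4

/* Minimum state the (hypothetical) hardware pushes when a trap occurs. */
struct trap_frame {
    uint32_t pc;
    uint32_t ps;
    uint32_t user_sp;
};

/* Full general register set, saved only when the kernel changes processes. */
struct context {
    uint32_t regs[16];
};

struct proc {
    struct context ctx;                      /* the "process structure" copy */
    struct trap_frame kstack[KSTACK_FRAMES]; /* per-process kernel stack     */
    int kdepth;                              /* frames currently on kstack   */
};

struct cpu {
    uint32_t pc, ps, sp;
    uint32_t regs[16];
    struct proc *current;
};

/* Hardware side of a trap: push the minimum onto the current kernel stack. */
void trap_enter(struct cpu *c) {
    struct proc *p = c->current;
    assert(p->kdepth < KSTACK_FRAMES);
    p->kstack[p->kdepth++] = (struct trap_frame){ c->pc, c->ps, c->sp };
}

/* Kernel side: save the old process's registers, load the new process's. */
void switch_to(struct cpu *c, struct proc *next) {
    memcpy(c->current->ctx.regs, c->regs, sizeof c->regs);
    memcpy(c->regs, next->ctx.regs, sizeof c->regs);
    c->current = next; /* from here on, the active kernel stack is next's */
}

/* Return-from-trap: pop a frame from whichever kernel stack is now active. */
void trap_return(struct cpu *c) {
    struct proc *p = c->current;
    assert(p->kdepth > 0);
    struct trap_frame f = p->kstack[--p->kdepth];
    c->pc = f.pc;
    c->ps = f.ps;
    c->sp = f.user_sp;
}

/* A is interrupted, the kernel switches to B, and the return lands in B. */
int demo_switch_resumes_b(void) {
    struct proc a = {0}, b = {0};
    /* B was interrupted earlier at pc 0x2000 with R0 == 7. */
    b.kstack[b.kdepth++] = (struct trap_frame){ 0x2000, 0, 0x8000 };
    b.ctx.regs[0] = 7;

    struct cpu c = { .pc = 0x1000, .ps = 0, .sp = 0x7000, .current = &a };
    c.regs[0] = 3;

    trap_enter(&c);    /* timer interrupt while A runs  */
    switch_to(&c, &b); /* scheduler picks B             */
    trap_return(&c);   /* resumes B, not A              */

    return c.pc == 0x2000 && c.regs[0] == 7 &&
           a.ctx.regs[0] == 3 && a.kdepth == 1;
}
```

In this sketch demo_switch_resumes_b() returns 1: the return-from-trap pops B's frame because the switch changed which kernel stack is active, while A's frame stays parked on A's kernel stack until A is scheduled again.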
Edits to answer additional questions:
If the interrupt handler needs to use a register that the CPU does not save automatically on interrupt, it pushes it on the kernel stack on entry and pops it off on exit. The interrupt handler has to explicitly save and restore any [general] registers it uses. The Process Context Block does not get touched for this.
The Process Context Block only gets altered as part of an actual context switch.
Example:
Let's assume we have a processor with a program counter, stack pointer, processor status and 16 general registers (I know no such system really exists) and that the same SP is used for all modes.
Interrupt occurs.
The hardware pushes the PC, SP, and PS onto the stack, loads the SP with the address of the kernel mode stack, and loads the PC with the address of the interrupt handler (from the processor's dispatch table).
Interrupt handler gets called.
The writer of the handler decides he is going to use R0-R3. So the first lines of the handler have:
Push R0 ; on to the kernel mode stack
Push R1
Push R2
Push R3
The interrupt handler does whatever it wants to do.
Cleanup
The writer of the interrupt handler needs to do:
Pop R3
Pop R2
Pop R1
Pop R0
REI ; Whatever the system's return from interrupt or exception instruction is.
Hardware Takes over
Restores the PS, PC, and SP from the kernel mode stack, then resumes executing where it was before the interrupt.
I've made up my own processor for simplification. Some processors have lengthy instructions that are interruptible (e.g. block character moves). Such instructions often use registers to maintain their context. On such a system, the processor would have to automatically save any registers it uses to maintain context within the instruction.
An interrupt handler does not muck with the process context block unless it is changing processes.
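The walkthrough above can be simulated in plain C. This is only a sketch of the made-up processor from the example: the struct toy_cpu, the helper names, and the choice to model the kernel stack as a separate array (instead of reloading SP) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_SIZE 32

struct toy_cpu {
    uint32_t pc, sp, ps;
    uint32_t r[16];
    uint32_t kstack[KSTACK_SIZE]; /* kernel mode stack             */
    int ktop;                     /* number of words pushed so far */
};

static void kpush(struct toy_cpu *c, uint32_t v) {
    assert(c->ktop < KSTACK_SIZE);
    c->kstack[c->ktop++] = v;
}

static uint32_t kpop(struct toy_cpu *c) {
    assert(c->ktop > 0);
    return c->kstack[--c->ktop];
}

/* Hardware: save PC, SP and PS, then load PC with the handler address. */
void interrupt_entry(struct toy_cpu *c, uint32_t handler_pc) {
    kpush(c, c->pc);
    kpush(c, c->sp);
    kpush(c, c->ps);
    c->pc = handler_pc;
}

/* Software: the handler saves R0-R3, uses them, restores them in reverse. */
void handler_body(struct toy_cpu *c) {
    for (int i = 0; i <= 3; i++)
        kpush(c, c->r[i]);               /* Push R0 .. Push R3   */
    for (int i = 0; i <= 3; i++)
        c->r[i] = 0xDEAD0000u + (uint32_t)i; /* the handler's own work */
    for (int i = 3; i >= 0; i--)
        c->r[i] = kpop(c);               /* Pop R3 .. Pop R0     */
}

/* Hardware: REI pops PS, SP and PC in the reverse order of entry. */
void return_from_interrupt(struct toy_cpu *c) {
    c->ps = kpop(c);
    c->sp = kpop(c);
    c->pc = kpop(c);
}

/* Returns 1 if the interrupted code sees every register unchanged. */
int demo_registers_preserved(void) {
    struct toy_cpu c = {0};
    c.pc = 0x400;
    c.sp = 0x9000;
    c.ps = 0x1;
    for (int i = 0; i < 16; i++)
        c.r[i] = 100u + (uint32_t)i;

    interrupt_entry(&c, 0xF000);
    handler_body(&c);
    return_from_interrupt(&c);

    if (c.pc != 0x400 || c.sp != 0x9000 || c.ps != 0x1 || c.ktop != 0)
        return 0;
    for (int i = 0; i < 16; i++)
        if (c.r[i] != 100u + (uint32_t)i)
            return 0;
    return 1;
}
```

Note that nothing here touches a process context block: everything the interrupt needs lives on the kernel stack and is gone again by the time REI finishes.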

It's difficult to speak in general terms about how an OS works 'under the hood', because it's dependent on how the hardware works. Also, terminology isn't highly standardised.
My guess is that by the 'Process structure entry' the writer means what is commonly known as the 'context' of the process, and that contains a copy of every register. It's not possible for the interrupt code to immediately save registers to this structure, because it would have to use (and therefore modify) registers in doing so. That's why it has to save a few registers, enough so that it can do the job, somewhere immediately available, e.g. where the stack pointer is pointing, which the writer calls the 'kernel stack'.
Depending on the architecture, this could be a single stack or separate ones per process.

Related

Does a system call involve a context switch or not?

I am reading the wikipedia page on system calls and I cannot reconcile a few of the statements that are made there.
At the bottom, it says that "A system call does not generally require a context switch to another process; instead, it is executed in the context of whichever process invoked it."
Yet, at the top, it says that "[...] applications to request services via system calls, which are often initiated via interrupts. An interrupt [...] passes control to the kernel [and then] the kernel executes a specific set of instructions over which the calling program has no direct control".
It seems to me that if the interrupt "passes control to the kernel," that means that the kernel, which is "another process," is executing and therefore a context switch happened. Therefore, there seems to be a contradiction in the wikipedia page. Where is my understanding wrong?
Your understanding is wrong because the kernel isn't a separate process. The kernel is sitting in RAM in shared memory areas. Typically, it sits in the top half of the virtual address space.
When the kernel is invoked with a system call, it is not necessarily using an interrupt. On x86-64, it is invoked directly using a specific processor instruction (syscall). This instruction makes the processor jump to the address stored in a special register.
Syscalls don't necessarily involve a full context switch. They must involve a user mode to kernel mode context switch. Most often, kernels have a kernel stack per process. This stack is mostly unused and empty when no system call is active as it then makes no sense to have anything stored in it.
The registers also need to be saved since the kernel can use them. I don't know about other processors, but x86-64 has the TSS, which allows an automatic user-mode to kernel-mode stack switch when an interrupt or exception arrives (the syscall instruction itself does not switch stacks, so the kernel entry code does that). The registers still need to be saved manually.
In the end, there is a necessary partial context switch when entering the kernel through a system call, but it doesn't involve switching the whole process. Since the temporary storage for the saved registers and the kernel stack are already reserved, it involves much less overhead, as the kernel doesn't need to touch the page tables. Swapping page tables often involves cache management and some flushing (of the TLB in particular) to keep things consistent.
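As a rough illustration of that difference, the following C sketch models a system call as a change of mode and stack only, while a full process switch also loads a new address space. Everything in it (struct task, struct cpu_state, the integer standing in for a page-table root, the function names) is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stddef.h>

enum mode { USER_MODE, KERNEL_MODE };

struct task {
    int  id;
    char user_stack[64];
    char kernel_stack[64]; /* reserved per task, idle outside the kernel */
    int  page_table;       /* stand-in for the address-space root        */
};

struct cpu_state {
    enum mode mode;
    const char *stack;            /* which stack is active             */
    int page_table;               /* currently loaded address space    */
    unsigned address_space_loads; /* counts the expensive operation    */
    struct task *current;
};

/* Full process switch: new task, new address space. */
void load_task(struct cpu_state *c, struct task *t) {
    c->current = t;
    c->page_table = t->page_table;
    c->address_space_loads++;
    c->mode = USER_MODE;
    c->stack = t->user_stack;
}

/* System call entry: same task, same address space, other stack. */
void syscall_enter(struct cpu_state *c) {
    assert(c->current != NULL);
    c->mode = KERNEL_MODE;
    c->stack = c->current->kernel_stack;
}

void syscall_exit(struct cpu_state *c) {
    assert(c->current != NULL);
    c->mode = USER_MODE;
    c->stack = c->current->user_stack;
}

/* Returns 1 if a syscall round trip never reloads the address space. */
int demo_syscall_keeps_address_space(void) {
    struct task t = { .id = 1, .page_table = 42 };
    struct cpu_state c = {0};
    load_task(&c, &t);
    unsigned loads_before = c.address_space_loads;

    syscall_enter(&c);
    int in_kernel_ok = c.mode == KERNEL_MODE && c.stack == t.kernel_stack &&
                       c.page_table == 42 && c.current == &t;
    syscall_exit(&c);

    return in_kernel_ok && c.mode == USER_MODE && c.stack == t.user_stack &&
           c.address_space_loads == loads_before;
}
```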

OS Context Switch in ISR

I am eager to know how an OS actually does a context switch when some asynchronous event raises an ISR that makes a higher-priority task ready to run. As far as I know, when the CPU enters an ISR it puts some register values on the hardware stack, so how does the scheduler retrieve those values and put them on the task stack? Does it access the hardware stack in order to copy the values that are already preserved? I hope I was clear.
Thanks in advance.
On a Cortex-M3 processor you have the MSP (Main Stack Pointer - which is your hardware stack) and the PSP (Process Stack Pointer - which is your task stack).
On entry to an exception the stack frame is stored on the current PSP stack (in normal, non nested operation). The exception handler then switches to the MSP stack, however it can still access the PSP stack so it can store any remaining registers etc on that same PSP stack as well as any other task information it needs.
The exception handler can then select the new high-priority task, switch the PSP to this task's stack, and restore the registers that it needs. It then leaves the PSP in exactly the same state as when the task was suspended, so that on return from the exception the rest of the stack is correctly restored.
It is more complex than this in certain situations but that is the basic operation (On ARM Cortex-M). It will be different on other processors.
I would recommend downloading FreeRTOS and looking at the various different port layers. There is a port for pretty much everything there, and the low level task switching stuff in the "portable" directories is fairly small and straightforward.
As I'm not quite sure what the scope of your question is, I'll try and summarize some concepts of preemptive scheduling:
There's one stack per task. For each stack, there's a stack pointer pointing to it. So basically, for the task switch, the current stack pointer is saved and the next task's stack pointer is loaded. Interestingly, the return from OS to the task's code is then done via a RETURN instruction, and not a JUMP or CALL like one might expect.
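The "save the current stack pointer, load the next one" step can be sketched in portable C. This is a simulation, not a real port layer: struct tcb, struct scheduler and task_switch are hypothetical names, and the sp field only stands in for the CPU's stack pointer register.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TASKS   4
#define STACK_WORDS 32

struct tcb {
    uint32_t *saved_sp;          /* valid only while the task is not running */
    uint32_t  stack[STACK_WORDS];
};

struct scheduler {
    struct tcb tasks[MAX_TASKS];
    int current;
    uint32_t *sp;                /* stand-in for the CPU stack pointer */
};

/* Save the running task's SP in its TCB and load the next task's SP. */
void task_switch(struct scheduler *s, int next) {
    assert(next >= 0 && next < MAX_TASKS);
    s->tasks[s->current].saved_sp = s->sp;
    s->sp = s->tasks[next].saved_sp;
    s->current = next;
}

/* Returns 1 if each task finds its own stack untouched after a round trip. */
int demo_switch_round_trip(void) {
    struct scheduler s = {0};

    /* Each task starts with SP at the top of its own (descending) stack. */
    for (int i = 0; i < MAX_TASKS; i++)
        s.tasks[i].saved_sp = &s.tasks[i].stack[STACK_WORDS];
    s.current = 0;
    s.sp = s.tasks[0].saved_sp;

    *--s.sp = 0xAAAA;            /* task 0 pushes a word */
    uint32_t *task0_sp = s.sp;

    task_switch(&s, 1);
    int on_task1 = (s.sp == &s.tasks[1].stack[STACK_WORDS]);
    *--s.sp = 0xBBBB;            /* task 1 pushes onto its own stack */

    task_switch(&s, 0);
    return on_task1 && s.sp == task0_sp && *s.sp == 0xAAAA &&
           s.tasks[1].saved_sp[0] == 0xBBBB;
}
```

Nothing is copied between stacks here; only the pointer changes, which is why the saved registers on each task's stack can simply be popped when that task runs again.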
When an ISR interrupts a running task, it will not run another task itself. As you correctly said, it only makes a task runnable (taking it out of the waiting state), so that, in the next scheduling cycle, the OS can consider the now-ready task for further execution. (If and when that task runs depends on its assigned priority; if it has a very high priority, the OS may try to make sure it runs before any other, lower-priority task gets switched to.)
The actual task switching only occurs after the ISR finished and returned, so there's no need to copy anything from one stack to another.
In 'simple' implementations, the ISR may just return to the task it interrupted, so that no early, 'out-of-order' context switch will occur.
Another, more complex implementation can have the ISR return to the OS instead of the interrupted task. A function like yield() would thus be called, giving the OS the chance to do a task switch immediately if necessary.
This, however, may require that affected ISRs get special exit instructions appended replacing the normal compiler-generated ISR code.

Detect ISR method call in FreeRTOS

Is it possible to determine at runtime whether a method in FreeRTOS is being invoked from the context of an ISR (interrupt service routine) or a task? Maybe a function already exists for this, or maybe it is possible to write a method that examines the stack somehow?
There are two ways to do this. I'm using a Cortex-M7 microcontroller. So I'm not 100% sure this works for your Cortex-M3. But it's worth checking in your datasheets.
FIRST APPROACH
Check the CPU registers of your Cortex-M core. Normally you have the usual R0-R12 CPU registers, a SP (Stack Pointer), a LR (Link Register) and a PC (Program Counter). There are a few extra 'special' CPU registers, more specifically: PSR, PRIMASK, FAULTMASK, BASEPRI and CONTROL. That's it for the Cortex-M7 core.
Now consider the PSR register. The PSR register stands for "Program Status Register". There is a bitfield ISR_NUMBER[8:0] in it. If it has the value 0, the CPU is in "thread mode". Thread mode is the normal non-interrupt mode. If the value is nonzero, your CPU is executing an interrupt. What interrupt? The value in ISR_NUMBER[8:0] tells you the interrupt number.
Reading the value of the PSR register is not trivial. You need to use a specific assembly instruction to do that; there is no way to do it in plain C. MRS (move special to general reg) reads a special register, and MSR (move general to special reg) writes one. Of course, inline assembly will make it possible to put it smoothly in your C code :-)
SECOND APPROACH
There is a second approach. Unlike the previous one, you don't need to read out a CPU register. Instead, this second approach requires you to read out the value of a memory-mapped register (like the few thousand others in your microcontroller). The register I'm referring to is the ICSR (Interrupt Control and State Register). This register is located in the SCB, the "System Control Block". It has a bitfield named VECTACTIVE[8:0]. Again, this bitfield contains the number of the active interrupt. If the value is 0, the CPU is in thread mode, which means that no interrupt is currently running.
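Both approaches boil down to masking bits [8:0] of a register value. Here is a small, portable sketch of that decode step only; the function and macro names are my own, and the actual register read is hardware-specific, so it appears only in a comment. With CMSIS-Core the read would typically be __get_IPSR() or SCB->ICSR, and some FreeRTOS Cortex-M ports ship a helper called xPortIsInsideInterrupt(), so it is worth checking your port's portmacro.h before writing your own.

```c
#include <assert.h>
#include <stdint.h>

/* Both PSR.ISR_NUMBER and ICSR.VECTACTIVE occupy bits [8:0]. */
#define ACTIVE_EXCEPTION_MASK 0x1FFu

/* Address of the ICSR inside the SCB; only meaningful on the target. */
#define ICSR_ADDRESS 0xE000ED04u

/* Extract the active exception number from a raw PSR value. */
static inline uint32_t isr_number_from_psr(uint32_t psr) {
    return psr & ACTIVE_EXCEPTION_MASK;
}

/* Extract the active exception number from a raw ICSR value. */
static inline uint32_t vectactive_from_icsr(uint32_t icsr) {
    return icsr & ACTIVE_EXCEPTION_MASK;
}

/* Zero means thread mode; anything else means an exception is active. */
static inline int is_handler_mode(uint32_t active_exception) {
    return active_exception != 0;
}

/* Exception numbers 16 and up are external interrupts: IRQ n is n + 16. */
static inline uint32_t irq_from_exception(uint32_t exception) {
    assert(exception >= 16 && exception <= ACTIVE_EXCEPTION_MASK);
    return exception - 16;
}

/*
 * On real hardware the raw values would come from, for example:
 *   uint32_t psr  = __get_IPSR();                        // CMSIS-Core
 *   uint32_t icsr = *(volatile uint32_t *)ICSR_ADDRESS;  // or SCB->ICSR
 */
```

For instance, a PSR value of 0x0100000F decodes to exception 15 (SysTick), so code seeing it is running inside a handler; 0x01000000 decodes to 0, i.e. thread mode.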
Hope this helps.

System call without context switching?

I was just reading up on how Linux works in my OS book when I came across this:
[...] the kernel is created as a single, monolithic binary. The main reason is to improve performance. Because all kernel code and data structures are kept in a single address space, no context switches are necessary when a process calls an operating-system function or when a hardware interrupt is delivered.
That sounded quite amazing to me; surely it must store the process's context before running off into kernel mode to handle an interrupt. But OK, I'll buy it for now. A few pages on, while describing a process's scheduling context, it said:
Both system calls and interrupts that occur while the process is executing will use this stack.
"this stack" being the place where the kernel stores the process's registers and such.
Isn't this a direct contradiction of the first quote? Am I misinterpreting it somehow?
I think the first quote is referring to the differences between a monolithic kernel and a microkernel.
Linux being monolithic, all its kernel components (device drivers, scheduler, VM manager) run at ring 0. Therefore, no context switch is necessary when performing system calls and handling interrupts.
Contrast microkernels, where components like device drivers and IPC providers run in user space, outside of ring 0. Therefore, this architecture requires additional context switches when performing system calls (because the performing module might reside in user space) and handling interrupts (to relay the interrupts to the device drivers).
"Context switch" could mean one of a couple of things, both relevant: (1) switching from user to kernel mode to process the system call, or an involuntary switch to kernel mode to process an interrupt against the interrupt stack, or (2) switching to run another user process in user space, with a jump to kernel space in between the two.
Any movement from user space to kernel space implies saving enough user-space state to return to it reliably. If the kernel-space code decides, while the user code for that process is not running anyway, that it's time to let another user process run, that other process gets in.
So at the least, you're talking 2-3 stacks or places to store a "context": hardware-interrupts need a kernel-level stack to say what to return to; user method/subroutine calls use a standard stack for getting that done. Etc.
The original Unix kernels (and the model isn't that different now for this part) ran system calls like a short-order cook processing breakfast orders: move this over on the stove to make room for the order of bacon that just arrived, start the bacon, go back to the first order. All of that context switching happened inside the kernel. It was not a huge monitoring application, which probably drove the IBM and DEC software folks mad.
When making a system call in Linux, a switch is made from user space to kernel space (ring 3 to ring 0). Each process has an associated kernel-mode stack, which is used by the system call. On entry, the CPU registers of the process are saved on this kernel-mode stack; it is separate from the user-mode stack, which is the one the process uses for user-space execution.
When a process is in kernel mode (or user mode), calling functions of the same mode will not require a context switch. This is what is referred by the first quote.
The second quote refers to the kernel mode stack, and not the user-mode stack.
Having said this, I must mention Linux optimisations where no transition to kernel space is needed for executing a system call, i.e. all processing related to the system call is done in user space itself (thus no context switch). vsyscall and vDSO are such techniques. The idea behind them is quite simple: the kernel exposes to user space the data required to execute the corresponding system call. More info can be found in this LWN article.
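As a small example of a call that can take this path: on common Linux configurations (x86-64 with glibc, for instance) clock_gettime() is usually serviced through the vDSO, so the sketch below may never enter the kernel at all. Whether it does depends on the kernel, the C library and the clock source, and the calling code looks the same either way. The helper names monotonic_ns and demo_clock_is_monotonic are mine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Monotonic time in nanoseconds, or -1 if the clock cannot be read. */
long long monotonic_ns(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    assert(ts.tv_nsec >= 0 && ts.tv_nsec < 1000000000L);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Returns 1 if two consecutive readings succeed and do not go backwards. */
int demo_clock_is_monotonic(void) {
    long long first = monotonic_ns();
    long long second = monotonic_ns();
    return first >= 0 && second >= first;
}
```

Running a program like this under strace is one way to check on a given machine: if the vDSO path is taken, no clock_gettime system call shows up in the trace.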
In addition to this, there have been research projects in which all execution happens in the same ring: user-space programs and the OS code both reside in the same ring. The idea is to get rid of the overhead of ring switches. Microsoft's Singularity OS is one such project.

How is the 4 GB (VM) address space used while switching between user space and kernel space

I looked at a lot of online threads/tutorials on how the process address space is divided between the process and the kernel.
Ex:
I have some Helloworld program.
In it I have a call to printf (which in turn makes a write system call to enter kernel space).
My doubt is how the Helloworld program's stack is used by the kernel.
Can you tell me how the whole execution goes?
./helloworld -> printf() -> write system call -> display driver -> return from write -> back to helloworld
Thanks,
Amarender
The detailed answer to this question depends on the specific kernel and architecture. However, the general answer is that when userspace wants to call into the kernel, it executes a trap instruction, that causes the CPU to change privilege level and start executing kernel code. As part of the privilege level change, the CPU will also switch to a kernel stack. When the kernel is done, it will execute a return-from-trap sequence that restores the userspace stack and resumes execution where it left off.
In a nutshell: when the write system call is made, an int $0x80 trap is generated. The handler saves the current process registers on the kernel stack (present in the kernel address space). The CPL in the segment registers is changed, which enables access to the kernel's pages. Then the kernel looks up its table of system calls and finds the address of the desired routine. Execution then jumps to that routine, which in turn may call the device driver code.
After doing its work, the kernel returns to user mode by restoring the register content and CPL in the segment registers.
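The "look up the table of system calls" step can be sketched as a table of function pointers indexed by the system call number. This is a toy model only: the numbers, struct saved_regs, the console buffer standing in for a display driver, and all function names are invented for the example and do not match any real kernel's ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical syscall numbers for this sketch (not a real ABI). */
enum { TOY_SYS_WRITE = 0, TOY_SYS_GETPID = 1, TOY_SYS_COUNT };

/* Registers saved on the kernel stack at trap entry (toy layout). */
struct saved_regs {
    long nr;         /* syscall number placed in a register by user code */
    const char *buf; /* argument 1 (pointer)                             */
    long len;        /* argument 2                                       */
    long ret;        /* value handed back to user mode                   */
};

typedef long (*syscall_fn)(struct saved_regs *);

static char console[128]; /* stand-in for the display driver */
static size_t console_len;

static long toy_write(struct saved_regs *r) {
    if (r->len < 0 || console_len + (size_t)r->len > sizeof console)
        return -1;
    memcpy(console + console_len, r->buf, (size_t)r->len);
    console_len += (size_t)r->len;
    return r->len;
}

static long toy_getpid(struct saved_regs *r) {
    (void)r;
    return 1234;
}

static const syscall_fn syscall_table[TOY_SYS_COUNT] = {
    [TOY_SYS_WRITE]  = toy_write,
    [TOY_SYS_GETPID] = toy_getpid,
};

/* Trap handler: validate the number, look up the routine, run it. */
void toy_trap_handler(struct saved_regs *r) {
    if (r->nr < 0 || r->nr >= TOY_SYS_COUNT) {
        r->ret = -1;
        return;
    }
    assert(syscall_table[r->nr] != NULL);
    r->ret = syscall_table[r->nr](r);
}

/* Returns the byte count if "hello\n" reached the console, else -1. */
long demo_write_hello(void) {
    console_len = 0;
    struct saved_regs r = { .nr = TOY_SYS_WRITE, .buf = "hello\n", .len = 6 };
    toy_trap_handler(&r);
    return memcmp(console, "hello\n", 6) == 0 ? r.ret : -1;
}

/* An out-of-range number is rejected instead of indexing the table. */
long demo_bad_number(void) {
    struct saved_regs r = { .nr = 99 };
    toy_trap_handler(&r);
    return r.ret;
}
```

The return value travels back the same way it would in a real kernel: it is written into the saved register frame, and the return-from-trap restores that frame into the user-mode registers.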