I am reading the Wikipedia page on system calls and I cannot reconcile a few of the statements made there.
At the bottom, it says that "A system call does not generally require a context switch to another process; instead, it is executed in the context of whichever process invoked it."
Yet, at the top, it says that "[...] applications to request services via system calls, which are often initiated via interrupts. An interrupt [...] passes control to the kernel [and then] the kernel executes a specific set of instructions over which the calling program has no direct control".
It seems to me that if the interrupt "passes control to the kernel," then the kernel, which is "another process," is executing, and therefore a context switch has happened. So there seems to be a contradiction in the Wikipedia page. Where is my understanding wrong?
Your understanding is wrong because the kernel isn't a separate process. The kernel's code and data sit in RAM, in memory areas shared by every process; typically they occupy the top half of the virtual address space.
A system call does not necessarily go through an interrupt to invoke the kernel. On x86-64, the kernel is entered directly with a dedicated processor instruction (syscall), which makes the processor jump to the entry-point address stored in a special register (the IA32_LSTAR MSR).
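To make that concrete, here is a minimal sketch, assuming x86-64 Linux and GCC/Clang inline assembly, of invoking the write system call directly with the syscall instruction instead of going through the libc wrapper (the function name raw_write is just a placeholder):

#include <sys/syscall.h>   /* SYS_write */

/* Minimal sketch, assuming x86-64 Linux and GCC/Clang inline asm:
   call write(2) through the `syscall` instruction itself.
   rax holds the syscall number; rdi, rsi, rdx hold the arguments;
   the instruction clobbers rcx and r11. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    register long r_di __asm__("rdi") = fd;
    register long r_si __asm__("rsi") = (long)buf;
    register long r_dx __asm__("rdx") = len;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(SYS_write), "r"(r_di), "r"(r_si), "r"(r_dx)
                      : "rcx", "r11", "memory");
    return ret;   /* negative values are -errno */
}

int main(void)
{
    raw_write(1, "hello from a raw syscall\n", 25);
    return 0;
}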
System calls don't necessarily involve a full context switch, but they must involve a switch from user mode to kernel mode. Most often, kernels have one kernel stack per process. This stack is mostly unused and empty when no system call is active, as there is then nothing worth keeping on it.
The registers also need to be saved, since the kernel will use them. I don't know about other processors, but x86-64 does have the TSS, which allows an automatic switch from the user-mode stack to the kernel-mode stack on interrupt entry. The general-purpose registers still need to be saved manually.
In the end, there is a necessary partial context switch when entering the kernel through a system call, but it doesn't involve switching to a whole other process. Since the temporary storage for the saved registers and the kernel stack are already reserved, it involves much less overhead: the kernel doesn't need to touch the page tables. Swapping page tables often involves TLB and cache management, including some flushing to keep everything consistent.
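As a rough illustration of what gets saved on that per-process kernel stack (a hypothetical, simplified layout, not any real kernel's structure such as Linux's struct pt_regs):

/* Illustrative sketch only: a simplified register frame a kernel might
   save on the per-process kernel stack when entering via `syscall`.
   Real kernels differ in layout and detail. */
struct saved_regs {
    unsigned long r15, r14, r13, r12, r11, r10, r9, r8;
    unsigned long rbp, rbx, rdx, rcx, rax, rsi, rdi;
    unsigned long rip;      /* user return address (from rcx on syscall) */
    unsigned long rflags;   /* user flags (from r11 on syscall) */
    unsigned long rsp;      /* user stack pointer */
};
/* Restoring this frame and executing sysret/iret resumes the user program;
   switching to a *different* process additionally means switching page
   tables and the kernel stack pointer, which is the expensive part. */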
I'm learning operating system concepts. This is the part I've learned: the kernel is the key piece of the OS that does lots of critical things such as memory management, job scheduling, etc.
This is the part I'm thinking about and getting confused by: for the OS to operate as expected, the kernel in some sense needs to keep running, perhaps in the background, so it is always able to respond to system calls and interrupts. To achieve this, I can think of two completely different approaches:
1. The kernel actually spawns some processes purely on its own behalf, not user processes, and keeps them running in the background (like daemons). These background processes handle housekeeping without any involvement from the user or user processes. I call this approach "the kernel is running on its own".
2. There is no kernel process at all. Every process we can find in the OS is a user process. The kernel is nothing but a library (a piece of code, along with some key data structures like page tables) shared among all these user processes. Some portion of the kernel is mapped into each process's address space, so that when an interrupt or system call occurs, the mode is elevated to kernel mode and the kernel code mapped into that process's address space is executed to handle the event. When the kernel does that, it is still in the context of the current user process. In this approach there exist only user processes, but the kernel periodically runs within the context of each of them (in a different mode).
This is a conceptual question that has confused me for a while. Thanks in advance!
The answer to your question is mostly no. The kernel doesn't spawn kernel mode processes. At boot, the kernel might start some executables but they run in user mode as a privileged user. For example, the Linux kernel will start systemd as the first user mode process as the root user. This process will read configuration files (written by your distribution's developers like Ubuntu) and start some other processes like the X Server for graphics and basic input (from keyboard, mouse, etc).
Your #1 is wrong and your #2 is also somewhat wrong. The kernel isn't a library. It is code loaded in the top half of the virtual address space. The bottom half of the VAS is very big (about 128 TiB with 48-bit virtual addresses), so user-mode processes can become very big as long as you have physical RAM or swap space to back the memory they require. The top half of the VAS is shared between all processes; the bottom half is private to each process, which in principle has access to all of it.
The kernel is called on system calls and on interrupts. It doesn't run all the time like a process; it simply runs when an interrupt or syscall occurs. To make this work with more active processes than there are processor cores, timers are used. On x86-64, each core has one local APIC, and the local APIC has a timer that you can program to raise an interrupt after some time. The kernel thus gives a time slice to each process, chooses one process from the list, and starts the timer with that process's time slice. When the timer raises its interrupt, the kernel knows that the time slice of that process is over and that it might be time to let another process take its place on that core.
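A toy sketch of that time-slice idea, simulating the timer interrupt with a plain loop (none of these names come from a real kernel; it is purely illustrative):

#include <stdio.h>

/* Toy round-robin sketch: a "timer interrupt" fires at the end of each
   time slice and the scheduler picks the next runnable process.
   Real kernels track priorities, sleep states, etc. */
enum { NPROC = 3, TICKS = 6 };

int main(void)
{
    const char *proc[NPROC] = { "A", "B", "C" };
    int current = 0;

    for (int tick = 0; tick < TICKS; tick++) {
        printf("time slice %d: running process %s\n", tick, proc[current]);
        /* timer interrupt: current slice is over, pick the next process */
        current = (current + 1) % NPROC;
    }
    return 0;
}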
First of all, a library can have its own background threads.
Secondly, the answer is somewhere between these approaches.
Most Unix-like systems are built on a monolithic kernel (or a hybrid one). That means the kernel runs all of its background work in kernel threads, within a single address space. I wrote about this in more detail here.
On most Linux distributions, you can run
ps -ef | grep '\[.*\]'
and it will show you the kernel threads.
But it will not show you "the kernel process", because ps basically only shows threads. Multithreaded processes will be seen via their main thread. But the kernel doesn't have a main thread, it owns all the threads.
If you want to look at processes through the lens of address spaces rather than threads, there's not really a direct way to do it. However, an address space is useless if no thread can access it, so you can reach the actual address space of a thread (if you have permission) via /proc/<pid>/mem. So if you used the above ps command and found a kernel thread, you can inspect its address space that way.
But you don't have to search - you can also access the kernel's address space via /proc/kcore.
You will see, however, that these kernel threads are not, for the most part, running core kernel functionality such as scheduling and virtual memory management. In most Unix kernels, those things happen during a system call, executed by the thread that made the system call while it is running in kernel mode.
Windows, on the other hand, is built on a microkernel design (often described as a hybrid kernel). That means that the kernel launches other processes and delegates work to them.
On Windows, that microkernel's address space is represented by the "System" service. The other processes - file systems, drivers etc., and other parts of what a monolithic kernel would comprise e.g. virtual memory management - might run in user mode or kernel mode, but still in a different address space than the microkernel.
You can get more details on how this works on Wikipedia.
Thirdly, just to be clear, none of these concepts should be confused with "system daemons", the regular userspace daemons that an OS needs in order to function, e.g. systemd, syslog, cron, etc.
Those are generally created by the "init" process (PID 1 on Unix systems), e.g. systemd; systemd itself, however, is created by the kernel at boot time.
Uniform buffers and storage buffers are updated, on the CPU side, using memcpy. How does synchronization work for those descriptor types? Does using memcpy imply that the application waits for memcpy to upload the data to the GPU before continuing to the next statement? If so, does this mean that barriers are not needed for synchronizing these types of buffers?
Synchronization works the same way for any memory resource: with certain rare exceptions, if you've changed memory, you need a memory dependency to ensure visibility of those changes. The synchronization system doesn't care whether it's used as a UBO or whatever. It cares about the nature of the source operation (the host) and the destination operation (reading from certain shader stages).
For host-to-device memory operations, you need to perform a form of synchronization known as a "domain operation". Fortunately, vkQueueSubmit automatically performs a domain operation on any host writes made visible before the vkQueueSubmit call. So if you write stuff to GPU-visible memory, then call vkQueueSubmit (either in the same thread or via CPU-side inter-thread communication), any commands in that submit call (or later ones) will see the values you wrote.
Assuming you have made them visible. Writes to host-coherent memory are always visible to the GPU, but writes to non-coherent memory must be made visible via a call to vkFlushMappedMemoryRanges.
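A minimal sketch of that host-to-device sequence for a non-coherent allocation; it assumes device, memory, queue, cmdBuf, srcData, and dataSize already exist, and all of those names are only placeholders:

#include <string.h>
#include <vulkan/vulkan.h>

/* Sketch: update a mapped, non-host-coherent allocation and make the
   writes visible before submission. */
static void upload_and_submit(VkDevice device, VkDeviceMemory memory,
                              VkQueue queue, VkCommandBuffer cmdBuf,
                              const void *srcData, VkDeviceSize dataSize)
{
    void *ptr = NULL;
    vkMapMemory(device, memory, 0, dataSize, 0, &ptr);
    memcpy(ptr, srcData, dataSize);                /* plain CPU write */

    VkMappedMemoryRange range = {
        .sType  = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE,
        .memory = memory,
        .offset = 0,
        .size   = VK_WHOLE_SIZE,
    };
    vkFlushMappedMemoryRanges(device, 1, &range);  /* skip if HOST_COHERENT */

    VkSubmitInfo submit = {
        .sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .commandBufferCount = 1,
        .pCommandBuffers    = &cmdBuf,
    };
    /* vkQueueSubmit performs the domain operation: commands in this (or a
       later) submission will see the host writes made visible above. */
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
}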
If you want to write to memory asynchronously to the GPU process that reads it, you'll need to use an event. You write to the memory, make it visible if needs be, then set the event. The GPU commands that read from it would wait on the event, using VK_ACCESS_HOST_WRITE_BIT as the source access, and VK_PIPELINE_STAGE_HOST_BIT as the source stage. The destination access and stage are determined by how you plan to read from it.
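And a sketch of that event-based variant, again with placeholder handles and the command-buffer recording details omitted; the destination stage and access mask here are only examples:

#include <vulkan/vulkan.h>

/* Sketch: the GPU waits on a host-set event before reading host-written
   memory. `device` and the command buffer `cmdBuf` being recorded are
   assumed to exist. */
static VkEvent record_wait_for_host(VkDevice device, VkCommandBuffer cmdBuf)
{
    VkEvent ev;
    VkEventCreateInfo evInfo = { .sType = VK_STRUCTURE_TYPE_EVENT_CREATE_INFO };
    vkCreateEvent(device, &evInfo, NULL, &ev);

    VkMemoryBarrier barrier = {
        .sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_HOST_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_UNIFORM_READ_BIT,  /* e.g. read as a UBO */
    };
    vkCmdWaitEvents(cmdBuf, 1, &ev,
                    VK_PIPELINE_STAGE_HOST_BIT,           /* source stage */
                    VK_PIPELINE_STAGE_VERTEX_SHADER_BIT,  /* where it is read */
                    1, &barrier, 0, NULL, 0, NULL);
    return ev;
}
/* On the CPU, after writing the memory (and flushing it if non-coherent):
   vkSetEvent(device, ev); */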
Vulkan knows nothing about memcpy. It doesn't care how you modify the memory; it only cares that you do so in accord with its rules.
The following is a description I read of a context switch between process A and process B. I don't understand what a kernel stack is used for. There is supposed to be a per-process kernel stack. And the description I am reading speaks of saving the registers of A onto the kernel stack of A and also saving the registers of A to the process structure of A. What exactly is the point of saving the registers to both the kernel stack and the process structure, and why the need for both?
A context switch is conceptually simple: all the OS has to do is save a few register values for the currently-executing process (onto its kernel stack, for example) and restore a few for the soon-to-be-executing process (from its kernel stack). By doing so, the OS thus ensures that when the return-from-trap instruction is finally executed, instead of returning to the process that was running, the system resumes execution of another process...
Process A is running and then is interrupted by the timer interrupt. The hardware saves its registers (onto its kernel stack) and enters the kernel (switching to kernel mode). In the timer interrupt handler, the OS decides to switch from running Process A to Process B. At that point, it calls the switch() routine, which carefully saves current register values (into the process structure of A), restores the registers of Process B (from its process structure entry), and then switches contexts, specifically by changing the stack pointer to use B’s kernel stack (and not A’s). Finally, the OS returns-from-trap, which restores B’s registers and starts running it.
I have a disagreement with the second paragraph.
Process A is running and then is interrupted by the timer interrupt. The hardware saves its registers (onto its kernel stack) and enters the kernel (switching to kernel mode).
I am not aware of a system that saves all the registers on the kernel stack on an interrupt. Typically only the Program Counter, Processor Status, and Stack Pointer are saved (assuming the hardware does not have a separate kernel-mode stack pointer). Normally, processors save the minimum necessary on the kernel stack after an interrupt. The interrupt handler then saves any additional registers it wants to use and restores them before exit. The processor's RETURN FROM INTERRUPT or EXCEPTION instruction then restores the registers automatically stored by the interrupt.
That part of the description assumes the running process does not change.
If the interrupt handler decides to change the process, it saves the current register state (the "process context"; most processors have a single instruction for this, though in Intel land you may have to use multiple instructions), then executes another instruction or sequence to load the process context of the new process.
To answer your heading question "What is a kernel stack used for?", it is used whenever the processor is in Kernel mode. If the kernel did not have a stack protected from user access, the integrity of the system could be compromised. The kernel stack tends to be very small.
To answer your second question, "What exactly is the point of saving the registers to both the kernel stack and the process structure and why the need for both?":
They serve two different purposes. The saved registers on the kernel stack are used to get out of kernel mode. The process context block saves the entire register set in order to change processes.
I think your misunderstanding comes from the wording of your source, which suggests that all registers are stored on the stack when entering kernel mode, rather than just the minimum number of registers needed to make the switch into kernel mode. The system will usually save only what it needs to get back to user mode (and may use that same information to return to the original process in a later context switch, depending upon the system). A change of process context saves all the registers.
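As a rough sketch of those two purposes (hypothetical structures, not taken from any particular kernel):

/* Hypothetical sketch of the two structures and their different purposes.
   Layouts vary widely between real kernels and architectures. */

/* Pushed on the kernel stack at every interrupt/trap: just enough state
   to get back to user mode (plus whatever the handler saves itself). */
struct trap_frame {
    unsigned long pc;       /* program counter */
    unsigned long psw;      /* processor status word */
    unsigned long sp;       /* user stack pointer */
};

/* Lives in the process structure; filled in only on an actual context
   switch, because it must hold the *entire* register set of the process. */
struct process_context {
    unsigned long pc, psw, sp;
    unsigned long regs[16];     /* R0..R15 of a hypothetical 16-register CPU,
                                   like the made-up one in the example below */
};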
Edits to answer additional questions:
If the interrupt handler needs to use registers not saved automatically by the CPU on the interrupt, it pushes them onto the kernel stack on entry and pops them off on exit. The interrupt handler has to explicitly save and restore any [general] registers it uses. The Process Context Block does not get touched for this.
The Process Context Block only gets altered as part of an actual context switch.
Example:
Let's assume we have a processor with a program counter, stack pointer, processor status, and 16 general registers (I know no such system really exists), and that the same SP is used for all modes.
Interrupt occurs.
The hardware pushes the PC, SP, and PS onto the stack, loads the SP with the address of the kernel-mode stack, and loads the PC with the address of the interrupt handler (from the processor's dispatch table).
Interrupt handler gets called.
The writer of the handler decides he is going to use R0-R3, so the first lines of the handler are:
Push R0 ; on to the kernel mode stack
Push R1
Push R2
Push R3
The interrupt handler does whatever it wants to do.
Cleanup
The writer of the interrupt handler needs to do:
Pop R3
Pop R2
Pop R1
Pop R0
REI ; Whatever the system's return from interrupt or exception instruction is.
Hardware Takes over
Restores the PS, PC, and SP from the kernel mode stack, then resumes executing where it was before the interrupt.
I've made up my own processor for simplification. Some processors have lengthy instructions that are interruptible (e.g. block character moves). Such instructions often use registers to maintain their context. On such a system, the processor would have to automatically save any registers it uses to maintain context within the instruction.
An interrupt handler does not muck with the process context block unless it is changing processes.
It's difficult to speak in general terms about how an OS works 'under the hood', because it's dependent on how the hardware works. Also, terminology isn't highly standardised.
My guess is that by the 'Process structure entry' the writer means what is commonly known as the 'context' of the process, and that contains a copy of every register. It's not possible for the interrupt code to immediately save registers to this structure, because it would have to use (and therefore modify) registers in doing so. That's why it has to save a few registers, enough so that it can do the job, somewhere immediately available, e.g. where the stack pointer is pointing, which the writer calls the 'kernel stack'.
Depending on the architecture, this could be a single stack or separate ones per process.
I wanted to know exactly whose responsibility is it to set the mode bits during system calls to the kernel.
Does the job scheduler manage these bits, or is the whole Process Status Word (PSW) a part of the Process Control Block?
Or is it the responsibility of the interrupt handler to do this? If so, how does the interrupt service routine (being a routine itself) get to perform such a privileged task when no other user routine can? What if some user process tries to access the PSW? Is the behavior different for different operating systems?
A lot of the protection mechanisms you ask about are architecture specific. I believe that the Process Status Word refers to an IBM architecture, but I am not certain, and I don't know specifically how the Process Status Word is used in that architecture.
I can, however, give you an example of how this is done in the case of x86. In x86, privileged instructions can only be executed on ring 0, which is what the interrupt handlers and other kernel code execute in.
The way the CPU knows whether code is in kernel space or user space is via protection bits set on that particular page in the virtual memory system. That means when a process is created, certain areas of memory are marked as being user code and other areas, where the kernel is mapped, are marked as being kernel code, so the processor knows whether the code being executed should have privileged access based on where it is in the virtual memory space. Since only the kernel can modify this mapping, user code is unable to execute privileged instructions.
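For example, on x86 the user/supervisor distinction for a page comes down to a single bit in its page-table entry; a small sketch of the relevant flag bits (the helper function is just illustrative):

#include <stdbool.h>
#include <stdint.h>

/* Sketch of the x86 page-table-entry flags relevant here.
   Bit 2 is the User/Supervisor bit: if it is clear, the page is only
   accessible while the CPU runs at supervisor privilege (ring 0). */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)

static bool page_is_user_accessible(uint64_t pte)
{
    return (pte & PTE_PRESENT) && (pte & PTE_USER);
}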
The Process Control Block is not architecture specific, which means that it is entirely up to the operating system to determine how it is used to set up privileges and such. One thing is certain, however: the CPU does not read the Process Control Block as it exists in the operating system. Some architectures could have their own process control mechanism built in, but this is not strictly necessary. On x86, the Process Control Block would be used to know what sort of system calls the process can make, as well as the virtual memory mappings which tell the CPU its privilege level.
While different architectures have different mechanisms for protecting user code, they all share many common attributes in that when kernel code is executed via a system call, the system knows that only the code in that particular location can be privileged.
I was just reading up on how Linux works in my OS book when I came across this:
[...] the kernel is created as a single, monolithic binary. The main reason is to improve performance. Because all kernel code and data structures are kept in a single address space, no context switches are necessary when a process calls an operating-system function or when a hardware interrupt is delivered.
That sounded quite amazing to me; surely it must store the process's context before running off into kernel mode to handle an interrupt... But OK, I'll buy it for now. A few pages on, while describing a process's scheduling context, it said:
Both system calls and interrupts that occur while the process is executing will use this stack.
"this stack" being the place where the kernel stores the process's registers and such.
Isn't this a direct contradiction of the first quote? Am I misinterpreting it somehow?
I think the first quote is referring to the differences between a monolithic kernel and a microkernel.
Linux being monolithic, all its kernel components (device drivers, scheduler, VM manager) run at ring 0. Therefore, no context switch is necessary when performing system calls and handling interrupts.
Contrast microkernels, where components like device drivers and IPC providers run in user space, outside of ring 0. Therefore, this architecture requires additional context switches when performing system calls (because the performing module might reside in user space) and handling interrupts (to relay the interrupts to the device drivers).
"Context switch" could mean one of a couple of things, both relevant: (1) switching from user to kernel mode to process the system call, or an involuntary switch to kernel mode to process an interrupt against the interrupt stack, or (2) switching to run another user process in user space, with a jump to kernel space in between the two.
Any movement from user space to kernel space implies saving enough user-space state to return to it reliably. If the kernel-space code then decides that - while you're no longer running the user code for that process - it's time to let another user process run, that other process gets switched in.
So at the least you're talking about two or three stacks or places to store a "context": hardware interrupts need a kernel-level stack to record what to return to; user method/subroutine calls use a standard stack for the same purpose; and so on.
The original Unix kernels - and the model isn't that different now for this part - ran system calls like a short-order cook processing breakfast orders: move this over on the stove to make room for the order of bacon that just arrived, start the bacon, go back to the first order - all within the kernel, switching contexts as needed. It was not a huge monitoring application, which probably drove the IBM and DEC software folks mad.
When making a system call in Linux, a context switch is made from user space to kernel space (ring 3 to ring 0). Each process has an associated kernel-mode stack that is used during the system call; before the system call's work is executed, the CPU registers of the process are saved on this kernel-mode stack. This stack is separate from the user-mode stack, which the process uses for its user-space execution.
When a process is in kernel mode (or user mode), calling functions of the same mode does not require a context switch. This is what the first quote refers to.
The second quote refers to the kernel mode stack, and not the user-mode stack.
Having said this, I must mention Linux optimisations where no transition to kernel space is needed to execute a system call, i.e. all processing related to the system call is done in user space itself (thus no context switch). vsyscall and the vDSO are such techniques. The idea behind them is quite simple: export to user space the data (and code) required to service the corresponding system call. More info can be found in this LWN article.
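For instance, on typical Linux/glibc systems a call like the one below is normally serviced through the vDSO, so no switch to kernel mode happens at all (whether it does is an implementation detail the program cannot observe):

#include <stdio.h>
#include <time.h>

/* On typical Linux/glibc systems this call is serviced by the vDSO:
   the kernel exports the clock data and the reading code into user
   space, so no switch to kernel mode is needed. */
int main(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    printf("%ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
    return 0;
}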
In addition to this, there have been research projects in which all execution happens in the same ring: user-space programs and the OS code both reside in the same ring, the idea being to get rid of the overhead of ring switches. Microsoft's Singularity OS is one such project.