I wanted to know exactly whose responsibility is it to set the mode bits during system calls to the kernel.
Does the job scheduler manage these bits, or is the whole Process Status Word (PSW) a part of the Process Control Block?
Or is it the responsibility of the interrupt handler to do this? If so, how does the Interrupt Service routine (being a routine itself) get to perform such a privileged task and not any other user routine? What if some user process tries to address the PSW ?Is the behavior different for different Operating Systems?
Alot of the protection mechanisms you ask about are architecture specific. I believe that the Process Status Word refers to an IBM architecture, but I am not certain. I don't know specifically how the Process Status Word is used in that architecture
I can, however, give you an example of how this is done in the case of x86. In x86, privileged instructions can only be executed on ring 0, which is what the interrupt handlers and other kernel code execute in.
The way the CPU knows whether code is in kernel space or user space is via protection bits set on that particular page in the virtual memory system. That means when a process is created, certain areas of memory are marked as being user code and other areas, where the kernel is mapped to, is marked as being kernel code, so the processor knows whether code being executed should have privileged access based on where it is in the virtual memory space. Since only the kernel can modify this space, user code is unable to execute privileged instructions.
The Process Control Block is not architecture specific, which means that it is entirely up to the operating system to determine how it is used to set up privileges and such. One thing is for certain, however, the CPU does not read the Process Control Block as it exists in the operating system. Some architectures, however, could have their own process control mechanism built in, but this is not strictly necessary. On x86, the Process Control Block would be used to know what sort of system calls the process can make, as well as virtual memory mappings which tell the CPU it's privilege level.
While different architectures have different mechanisms for protecting user code, they all share many common attributes in that when kernel code is executed via a system call, the system knows that only the code in that particular location can be privileged.
Related
I'm learning the concepts of operating system. This is part I've learned: kernel is key piece of os that does lots of critical things such as memory management, job scheduling etc.
This is part what I'm thinking and get confused: to have os operating as expected, in a sense kernel needs to keep running, perhaps in the background, so it is always able to respond to different system calls and interrupts. In order to achieve this, I think of two completely different approaches:
kernel actually spawns some processes purely on its behalf, not user process, and keep them running in background (like daemon)? These background processes will handle housekeeping stuff without acknowledgement from user or user process. I call this approach as "kernel is running on its own"
There is no kernel process at all. Every process we can find in os are all user processes. Kernel is nothing but a library (piece of code, along with some key data structures like page tables etc) shared among all these user processes. In each process's address space, some portion of kernel will be loaded so that when any interrupt or system call occurs, mode is elevated to kernel mode. Pieces of kernel code loaded into user process's address space will be executed so that kernel can handle the event. When kernel does that, it is still in the context of current user process. In this approach, there exists only user processes, but kernel will periodically run within the context of each user process (but in a different mode).
This is a conceptual question that has confused me for a while. Thanks in advance!
The answer to your question is mostly no. The kernel doesn't spawn kernel mode processes. At boot, the kernel might start some executables but they run in user mode as a privileged user. For example, the Linux kernel will start systemd as the first user mode process as the root user. This process will read configuration files (written by your distribution's developers like Ubuntu) and start some other processes like the X Server for graphics and basic input (from keyboard, mouse, etc).
Your #1 is wrong and your #2 is also somewhat wrong. The kernel isn't a library. It is code loaded in the top half of the virtual address space. The bottom half of the VAS is very big (several tens of thousands of GB) so user mode processes can become very big as long as you have physical RAM or swap space to back the memory they require. The top half of the VAS is shared between processes. For the bottom half, every process has theoretical access to all of it.
The kernel is called on system call and on interrupt. It doesn't run all the time like a process. It simply is called when an interrupt or syscall occurs. To make it work with more active processes than there are processor cores, timers will be used. On x86-64, each core has one local APIC. The local APIC has a timer that you can program to throw an interrupt after some time. The kernel will thus give a time slice to each process, choose one process in the list and start the timer with its corresponding time slice. When the timer throws an interrupt, the kernel knows that the time slice of that process is over and that it might be time to let another process take its place on that core.
First of all, A library can have its own background threads.
Secondly, the answer is somewhere between these approaches.
Most Unix-like system are built on a monolithic kernel (or hybrid one). That means the kernel contains all its background work in kernel threads in a single address space. I wrote in more details about this here.
On most Linux distributions, you can run
ps -ef | grep '\[.*\]'
And it will show you kernel threads.
But it will not show you "the kernel process", because ps basically only shows threads. Multithreaded processes will be seen via their main thread. But the kernel doesn't have a main thread, it owns all the threads.
If you want to look at processes via the lens of address spaces rather than threads, there's not really a way to do it. However, address spaces are useless if no thread can access them, So you access the actual address space of a thread (if you have permission) via /proc/<pid>/mem. So if you used the above ps command and found a kernel thread, you can see its address space using this approach.
But you don't have to search - you can also access the kernel's address space via /proc/kcore.
You will see, however, that these kernel threads aren't, for the most part, core kernel functionality such as scheduling & virtual memory management. In most Unix kernels, these happen during a system call by the thread that made the system call while it's running in kernel mode.
Windows, on the other hand, is built on a microkernel. That means that the kernel launches other processes and delegates work to them.
On Windows, that microkernel's address space is represented by the "System" service. The other processes - file systems, drivers etc., and other parts of what a monolithic kernel would comprise e.g. virtual memory management - might run in user mode or kernel mode, but still in a different address space than the microkernel.
You can get more details on how this works on Wikipedia.
Thirdly, just to be clear, that none of these concepts is to be confused with "system daemon", which are the regular userspace daemons that an OS needs in order to function, e.g. systemd, syslog, cron, etc..
Those are generally created by the "init" process (PID 1 on Unix systems) e.g. systemd, however systemd itself is created by the kernel at boot time.
I am reading the wikipedia page on system calls and I cannot reconcile a few of the statements that are made there.
At the bottom, it says that "A system call does not generally require a context switch to another process; instead, it is executed in the context of whichever process invoked it."
Yet, at the top, it says that "[...] applications to request services via system calls, which are often initiated via interrupts. An interrupt [...] passes control to the kernel [and then] the kernel executes a specific set of instructions over which the calling program has no direct control".
It seems to me that if the interrupt "passes control to the kernel," that means that the kernel, which is "another process," is executing and therefore a context switch happened. Therefore, there seems to be a contradiction in the wikipedia page. Where is my understanding wrong?
Your understanding is wrong because the kernel isn't a separate process. The kernel is sitting in RAM in shared memory areas. Typically, it sits in the top half of the virtual address space.
When the kernel is invoked with a system call, it is not necessarily using an interrupt. On x86-64, it is invoked directly using a specific processor instruction (syscall). This instruction makes the processor jump to the address stored in a special register.
Syscalls don't necessarily involve a full context switch. They must involve a user mode to kernel mode context switch. Most often, kernels have a kernel stack per process. This stack is mostly unused and empty when no system call is active as it then makes no sense to have anything stored in it.
The registers also need to be saved since the kernel can use them. I don't know for other processors but x86-64 does have the TSS allowing for automated user mode to kernel mode stack switch. The registers still need to be saved manually.
In the end, there is actually a necessary partial context switch when entering the kernel through a system call but it doesn't involve switching the whole process. Since the temporary storage for swapped registers and the kernel stack are already reserved, it involves much less overhead as the kernel doesn't need to touch the page tables. Swapping page tables often involves cache managing and some cache flushing to make it consistent.
How does the operating system manage the program permissions? If you are writing a low-level program without using any system calls, directly controlling the processor, then how does an operating system put stakes if the program directly controls the processor?
Edit
It seems that my question is not very clear, I apologize, I can not speak English well and I use the translator. Anyway, what I wonder is how an operating system manages the permissions of the programs (for example the root user etc ...). If a program is written to really low-level without making system calls, then can it have full access to the processor? If you want to say that it can do everything you want and as a result the various users / permissions that the operating system does not have much importance. However, from the first answer I received I read that you can not make useful programs that work without system calls, so a program can not interact directly with a hardware (I mean how the bios interacts with the hardware for example)?
Depends on the OS. Something like MS-DOS that is barely an OS and doesn't stop a program from taking over the whole machine essentially lets programs run with kernel privilege.
An any OS with memory-protection that tries to keep separate processes from stepping on each other, the kernel doesn't allow user-space processes to talk directly to I/O hardware.
A privileged user-space process might be allowed to memory-map video RAM and/or I/O registers of a device into its own address space, and basically act like a device driver. (e.g. an X server under Linux.)
1) It is imposible to have a program that that does not make any system calls.
2) Instructions that control the operation of the processor must be executed in kernel mode.
3) The only way to get into kernel mode is through an exception (including system calls). By controlling how exceptions are handled, the operating system prevents malicious access.
If a program is written to really low-level without making system calls, then can it have full access to the processor?
On a modern system this is impossible. A system call is going to be made in the background whether you like it or not.
I am just now learning about OSes and I stumbled upon this question from my class' lecture notes. In our class, we define a process as a program in execution and I know that an OS is itself a program. So by this definition, an OS is a process.
At the same time processes can be switched in or out via a context switch, which is something that the OS manages and handles. But what would handle the OS itself when it isn't running?
Also if it is a process, does the OS have a process control block associated with it?
There was an older question on this site that I looked at, but I felt as if the answers weren't clear enough to really outline WHY the OS is/isn't a process so I thought I'd ask again here.
First of all, an OS is multiple parts. The core piece is the kernel, which is not a process. It is a framework for running processes. In practice, a process is more than just a "program in execution". On a system with an MMU, a process is usually run in its own virtual address space. The kernel however, is usually mapped into all processes. It's always there.
Other ancillary parts of the OS exist to make it usuable. The OS may have processes that it runs as part of its management. For example, Linux has many kernel threads that are independently scheduled tasks. But these are often not crucial to the OS's operation.
Short answer: No.
Here's as good a definition of "Operating System" as any:
https://en.wikipedia.org/wiki/Operating_system
An operating system (OS) is system software that manages computer
hardware and software resources and provides common services for
computer programs. The operating system is a component of the system
software in a computer system. Application programs usually require an
operating system to function.
Even "system-level processes" (like "init" on Linux, or "svchost.exe" on Windows) rely on the "operating system" ... but are not themselves the operating system.
Agreeing to some of the comments above/below.
OS is not a process. However there are few variants in design that give the opposite illusion.
For eg: If you are running a FreeRTOS, then there is no such thing as a separate OS address space and Process address space, every thing runs as a single process, the FreeRTOS framework provides API's that allow Synchronization of different tasks.
Operating System is just a set of API's (system calls) and utilities that help to achieve Multi-processing, Resource sharing etc. For eg: schedule() is a core OS function that handles the multi-processing capabilities of the OS.
In that sense, OS is not a process. Although it attaches to every process that runs on the CPU, otherwise how will the process make use of the OS's API.
It is more like soul for the body (hardware), if you will.
It is just not one process but a set of (kernel) processes required to run user processes in the system. PID 0 being the parent of all processes providing scheduler/swapping functionality to the rest of the kernel/user processes, but it is not the only process. These kernel processes (with the help of kernel drivers) provide accessor functionality (through system calls) to the user processes.
It depends upon what you are calling the "operating system".
It depends upon what operating system you are talking about.
That said and at the risk of gross oversimplification, most of what one calls "the operating system" is generally executed from user processes while in kernel mode. The entry into kernel occurs either through an interrupt, trap or fault.
To do a context switch usually either a process causes a fault entering kernel mode to so something (like write to the disk). While in kernel mode, the process realizes it would have to wait so it yields by switching the context to another process. The other common way is a timer causes an interrupt, that forces the process into kernel mode. The process then determines who should be executed next, and switches the process context.
Some operating systems do have their own kernel process that function but that is increasingly rare.
Most operating system have components that have their own processes.
I was just reading up on how linux works in my OS-book when I came across this..
[...] the kernel is created as a single, monolitic binary. The main reason is to improve performance. Because all kernel code and data structures are kept in a single address space, no context switches are necessary when a process calls an operating-system function or when a hardware interrup is delivered.
That sounded quite amazing to me, surely it must store the process's context before running off into kernel mode to handle an interrupt.. But ok, I'll buy it for now. A few pages on, while describing a process's scheduling context, it said:
Both system calls and interrups that occur while the process is executing will use this stack.
"this stack" being the place where the kernel stores the process's registers and such.
Isn't this a direct contradiction to the first quote? Am I missinterpreting it somehow?
I think the first quote is referring to the differences between a monolithic kernel and a microkernel.
Linux being monolithic, all its kernel components (device drivers, scheduler, VM manager) run at ring 0. Therefore, no context switch is necessary when performing system calls and handling interrupts.
Contrast microkernels, where components like device drivers and IPC providers run in user space, outside of ring 0. Therefore, this architecture requires additional context switches when performing system calls (because the performing module might reside in user space) and handling interrupts (to relay the interrupts to the device drivers).
"Context switch" could mean one of a couple of things, both relevant: (1) switching from user to kernel mode to process the system call, or an involuntary switch to kernel mode to process an interrupt against the interrupt stack, or (2) switching to run another user process in user space, with a jump to kernel space in between the two.
Any movement from user space to kernel space implies saving enough user-space to return to it reliably. If the kernel-space code decides that - while you're no longer running the user-code for that process - it's time to let another user-process run, it gets in.
So at the least, you're talking 2-3 stacks or places to store a "context": hardware-interrupts need a kernel-level stack to say what to return to; user method/subroutine calls use a standard stack for getting that done. Etc.
The original Unix kernels - and the model isn't that different now for this part - ran the system calls like a short-order cook processing breakfast orders: move this over on the stove to make room for the order of bacon that just arrived, start the bacon, go back to the first order. All in kernel switching context. Was not a huge monitoring application, which probably drove the IBM and DEC software folks mad.
When making a system call in Linux, a context switch is done from user-space to kernel space (ring3 to ring0). Each process has an associated kernel mode stack, that is used by the system call. Before the system call is executed, the CPU registers of the process are stored on its user-mode stack, this stack is different from the kernel mode stack, and is the one which the process uses for user-space executions.
When a process is in kernel mode (or user mode), calling functions of the same mode will not require a context switch. This is what is referred by the first quote.
The second quote refers to the kernel mode stack, and not the user-mode stack.
Having said this, I must mention Linux optimisations, where no transition is needed to the kernel space for executing a system call, i.e. all processing related to the system call is done in the user space itself (thus no context switch). vsyscall, and VDSO are such techniques. The idea behind them is quite simple. It is to send to the user space, the data that is required for execution of the corresponding system call. More info can be found in this LWN article.
In addition to this, there have been some research projects in which all the execution happens in the same ring. User space programs, and the OS code, both reside in the same ring. Idea is to get rid of the overhead of ring switches. Microsoft's [singularity][2] OS is one such project.