OS Concepts Terminology - process

I'm doing some fill in the blanks from a sample exam for my class and I was hoping you could double check my terminology.
The various scheduling queues used by the operating system would consist of lists of processes.
Interrupt handling is the technique of periodically checking to see if a condition (such as completion of some requested I/O operation) has been met.
When the CPU is in kernel mode, a running program has access to a restricted set of CPU functionality.
The job of the CPU scheduler is to select a process on the ready queue and change its state.
The CPU normally supports a vector of interrupts so the OS can respond appropriately when some event of interest occurs in the hardware.
Using traps, a device controller can use idle time on the bus to read from or write to main memory.
During a context switch, the state of one process is copied from the CPU and saved, and the state of a different process is restored.
An operating system consists of a kernel and a collection of application programs that run as user processes and either provide OS services to the user or work in the background to keep the computer running smoothly.
There are so many terms from our chapters, I am not quite sure if I am using the correct ones.

My thoughts:
1. Processes and/or threads. Jobs and tasks aren't unheard of either. There can be other things. E.g. in MS Windows there are also Deferred Procedure Calls (DPCs) that can be queued.
2. This must be polling.
4. Why CPU scheduler? Why not just scheduler?
6. I'm not sure about traps in the hardware/bus context.
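To illustrate the difference my #2 points at: polling means the software re-checks the condition itself on a schedule, instead of being notified by an interrupt. A minimal sketch in Python (the "I/O completion" here is just a made-up timestamp check):

```python
import time

def poll_until(condition, interval=0.01, timeout=1.0):
    """Polling: periodically re-check a condition instead of being notified."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)  # pause between checks so we don't spin flat out
    return False

# Pretend some "I/O operation" completes 50 ms from now.
done_at = time.monotonic() + 0.05
result = poll_until(lambda: time.monotonic() >= done_at)
print(result)  # True once the simulated operation has completed
```

Interrupt-driven I/O inverts this: the device signals the CPU, so no cycles are spent re-checking.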

Related

Does the kernel spawn some processes (not user processes) that keep running in the background?

I'm learning the concepts of operating systems. Here is part of what I've learned: the kernel is the key piece of the OS that does many critical things, such as memory management, job scheduling, etc.
Here is the part I'm thinking about and getting confused by: for the OS to operate as expected, the kernel in a sense needs to keep running, perhaps in the background, so that it is always able to respond to system calls and interrupts. I can think of two completely different approaches to achieve this:
The kernel actually spawns some processes purely on its own behalf (not user processes) and keeps them running in the background (like daemons). These background processes handle housekeeping without any involvement from users or user processes. I call this approach "the kernel running on its own".
There is no kernel process at all. Every process we can find in the OS is a user process. The kernel is nothing but a library (a piece of code, along with some key data structures like page tables) shared among all these user processes. Some portion of the kernel is mapped into each process's address space, so that when an interrupt or system call occurs, the mode is elevated to kernel mode and the kernel code mapped into the user process's address space is executed to handle the event. When the kernel does this, it is still in the context of the current user process. In this approach there are only user processes, but the kernel periodically runs within the context of each of them (in a different mode).
This is a conceptual question that has confused me for a while. Thanks in advance!
The answer to your question is mostly no. The kernel doesn't spawn kernel mode processes. At boot, the kernel might start some executables but they run in user mode as a privileged user. For example, the Linux kernel will start systemd as the first user mode process as the root user. This process will read configuration files (written by your distribution's developers like Ubuntu) and start some other processes like the X Server for graphics and basic input (from keyboard, mouse, etc).
Your #1 is wrong and your #2 is also somewhat wrong. The kernel isn't a library. It is code loaded in the top half of the virtual address space. The bottom half of the VAS is very big (128 TiB with the common 48-bit addressing on x86-64), so user-mode processes can grow very large as long as you have physical RAM or swap space to back the memory they require. The top half of the VAS is shared between processes; as for the bottom half, every process has theoretical access to all of its own.
The kernel is called on system calls and on interrupts. It doesn't run all the time like a process; it is simply called when an interrupt or syscall occurs. To make this work with more active processes than there are processor cores, timers are used. On x86-64, each core has one local APIC, and the local APIC has a timer that you can program to raise an interrupt after some time. The kernel thus gives a time slice to each process: it chooses one process in the list and starts the timer with that process's time slice. When the timer raises its interrupt, the kernel knows that the time slice of that process is over and that it might be time to let another process take its place on that core.
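The time-slice mechanism described above can be sketched as a toy round-robin simulation in Python. The process names and burst times are invented; decrementing `remaining` by one quantum stands in for the timer interrupt firing:

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Simulate round-robin scheduling; returns the order processes finish in."""
    ready = deque(burst_times.items())
    order = []
    while ready:
        name, remaining = ready.popleft()   # give this process the CPU
        remaining -= quantum                # the "timer interrupt" fires
        if remaining > 0:
            ready.append((name, remaining)) # preempted: back of the ready queue
        else:
            order.append(name)              # finished within its slice
    return order

order = round_robin({"A": 3, "B": 1, "C": 2}, quantum=1)
print(order)  # ['B', 'C', 'A']
```

The short job (B) finishes first even though it arrived second, which is exactly the fairness property the per-core APIC timer buys the kernel.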
First of all, a library can have its own background threads.
Secondly, the answer is somewhere between these approaches.
Most Unix-like systems are built on a monolithic kernel (or a hybrid one). That means the kernel does all its background work in kernel threads within a single address space. I wrote about this in more detail here.
On most Linux distributions, you can run
ps -ef | grep '\[.*\]'
And it will show you kernel threads.
But it will not show you "the kernel process", because ps basically only shows threads. Multithreaded processes will be seen via their main thread. But the kernel doesn't have a main thread, it owns all the threads.
If you want to look at processes through the lens of address spaces rather than threads, there's not really a way to do it. However, address spaces are useless if no thread can access them, so you access the actual address space of a thread (if you have permission) via /proc/<pid>/mem. So if you used the above ps command and found a kernel thread, you can see its address space using this approach.
But you don't have to search - you can also access the kernel's address space via /proc/kcore.
You will see, however, that these kernel threads aren't, for the most part, core kernel functionality such as scheduling & virtual memory management. In most Unix kernels, these happen during a system call by the thread that made the system call while it's running in kernel mode.
Windows, on the other hand, is built around a microkernel-style design (strictly speaking, NT is usually described as a hybrid kernel). That means the kernel launches other processes and delegates work to them.
On Windows, that kernel's address space is represented by the "System" process. The other components - file systems, drivers etc., and other parts of what a monolithic kernel would comprise, e.g. virtual memory management - might run in user mode or kernel mode, but still in a different address space than the kernel itself.
You can get more details on how this works on Wikipedia.
Thirdly, just to be clear, none of these concepts should be confused with "system daemons", which are the regular userspace daemons that an OS needs in order to function, e.g. systemd, syslog, cron, etc.
Those are generally created by the "init" process (PID 1 on Unix systems), e.g. systemd - though systemd itself is created by the kernel at boot time.

Operating System Basics

I am reading about process management, and I have a few doubts:
What is meant by an I/O request? For example, a process that is executing is in the running state, and it is in the waiting state if it is waiting for the completion of an I/O request. I don't understand what an I/O request is - can you please give an example to elaborate?
Another doubt: let's say a process is executing and suddenly an interrupt occurs; the process then stops executing and is put in the ready state. Is it possible that some other process begins its execution while the interrupt is still being processed?
Regarding the first question:
A simple way to think about it...
Your computer has lots of components: CPU, hard drive, network card, sound card, GPU, etc. All of those work in parallel and independently of each other, and they are also generally slower than the CPU.
This means that whenever a process makes a call that down the line (on the OS side) ends up communicating with an external device, there is no point for the OS to be stuck waiting for the result since the time it takes for that operation to complete is probably an eternity (in the CPU view point of things).
So, the OS fires up whatever communication the process requested (call it IO request), flags the process as waiting for IO, and switches execution to another process so the CPU can do something useful instead of sitting around blocked waiting for the IO request to complete.
When the external device finishes whatever operation was requested, it generates an interrupt, so the OS is informed the work is done, and it can then flag the blocked process as ready again.
This is all a very simplified view of course, but that's the main idea. It allows the CPU to do useful work instead of waiting for IO requests to complete.
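The state transitions described above (running, then waiting on an I/O request, then ready again when the completion interrupt arrives) can be sketched in Python; the class and method names here are made up purely for illustration:

```python
from enum import Enum

class State(Enum):
    RUNNING = "running"
    WAITING = "waiting"   # blocked on an outstanding I/O request
    READY = "ready"       # I/O done, eligible for the CPU again

class Process:
    def __init__(self, pid):
        self.pid, self.state = pid, State.RUNNING

    def request_io(self):
        self.state = State.WAITING      # the OS parks the process

    def io_interrupt(self):
        if self.state is State.WAITING:
            self.state = State.READY    # completion interrupt unblocks it

p = Process(1)
p.request_io()
print(p.state.name)   # WAITING - another process can use the CPU now
p.io_interrupt()
print(p.state.name)   # READY - the scheduler may pick it again
```

While `p` sits in WAITING, the scheduler is free to run any READY process, which is the whole point of the hand-off.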
Regarding the second question:
It's tricky, even for single CPU machines, and depends on how the OS handles interrupts.
For code simplicity, a simple OS might, for example, process each interrupt in one go whenever it happens, then resume whichever process it decides is appropriate once the interrupt handling is done. In this case, no other process would run until the interrupt handling is complete.
In practice, things get a bit more complicated for performance and latency reasons.
If you think about an interrupt lifetime as just another task for the CPU (From when the interrupt starts to the point the OS considers that handling complete), you can effectively code the interrupt handling to run in parallel with other things.
Just think of the interrupt as notification for the OS to start another task (that interrupt handling). It grabs whatever context it needs at the point the interrupt started, then keeps processing that task in parallel with other processes.
An I/O request generally just means a request to do input, output, or both. The exact meaning varies depending on your context: HTTP, networking, console operations, or perhaps a process on the CPU.
A process waiting for I/O: say, for example, you were writing a program in C to accept a user's name on the command line and then print 'Hello User' back. Your code will sit in a waiting state until the user enters their name and hits Enter. This is a high-level example, but even a very low-level process executing in your computer's processor works on the same basic principle.
Can the processor work on other processes while the current one is interrupted and waiting on something? Yes! You'd better hope it does; that's what scheduling algorithms and ready queues are for. The real answer, however, depends on what architecture you are on, whether it supports parallel or serial processing, and so on.

How does the operating system handle its responsibilities while a process is executing?

I have had this question in mind for a long time, and it may sound a little vacuous. We know that the operating system is responsible for handling memory allocation, process management, etc. The CPU can perform only one task at a time (assuming a single core). Suppose the operating system has allocated a CPU cycle to some user-initiated process and the CPU is executing it. Where is the operating system running at that moment? If some other process is using the CPU, is the operating system not running? After all, the OS itself needs the CPU to run. And if the OS is not running, then who is handling process management, device management, etc. during that period?
The question is mixing up who's in control of the memory and who's in control of the CPU. The wording “running” is imprecise: on a single CPU, a single task is running at any given time in the sense that the processor is executing its instructions; but many tasks are executing in the sense that their state is stored in memory and their execution can resume at any time.
While a process is executing on the CPU, the kernel is not executing. Its state is saved in memory. The execution of the kernel can resume:
if the process code makes a jump into kernel code — this is called a system call.
if an interrupt occurs.
If the operating system provides preemptive multitasking, it will schedule an interrupt to happen after an interval of time (called a time slice). On a non-preemptive operating system, the process will run forever if it doesn't yield the CPU. See What mechanisms prevent a process from taking over the processor forever? for an explanation of how preemption works.
Tasks such as process management and device management are triggered by some event. If the event is a request by the process, the request will take the form of a system call, which executes kernel code. If the event is triggered from hardware, it will take the form of an interrupt, which executes kernel code.
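The "kernel code runs only when an event enters it" idea can be sketched as a dispatch table in Python - a software analogue of the interrupt vector. The handler names and messages are invented for illustration:

```python
# The kernel as a passive event handler: it does nothing until a system call
# or an interrupt transfers control to it, then it dispatches to a handler.
def handle_syscall(name):
    return f"kernel: servicing syscall '{name}'"

def handle_interrupt(irq):
    return f"kernel: running handler for IRQ {irq}"

dispatch = {"syscall": handle_syscall, "interrupt": handle_interrupt}

def kernel_entry(event, arg):
    # control enters the kernel; look up the handler for this event and run it
    return dispatch[event](arg)

print(kernel_entry("syscall", "read"))
print(kernel_entry("interrupt", 14))
```

Between calls to `kernel_entry`, no "kernel" code runs at all, mirroring the point made above: the kernel's state simply sits in memory until an event resumes it.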
(Note: in this answer, I use “CPU” and “processor” synonymously, to mean a single execution thread: a single core, or whatever the hardware architecture is.)
The OS kernel does nothing at all until it is entered via an interrupt. It may be entered because of a hardware interrupt that causes a driver to run and the driver chooses to exit via the OS, or a running thread may make a syscall interrupt.
Unless an interrupt occurs, the OS kernel does nothing at all. It does not need to do anything.
Edit:
DMA is, (usually), used for bulk I/O and is handled by a hardware subsystem that handles requests issued by a system call, (software interrupt). When a DMA operation is complete, the DMA hardware raises a hardware interrupt, so running a driver that can further signal the OS of the completion, maybe changing the set of running threads, so DMA is managed by interrupts.
A new process/thread can only be loaded by an existing thread that has issued a system call, (software interrupt), and so new processes are initiated by interrupts.
It's interrupts, all the way down :)
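The DMA-completion-by-interrupt flow described in the edit can be loosely simulated in Python, with a background thread standing in for the DMA hardware and a threading.Event standing in for the completion interrupt (all names are illustrative):

```python
import threading

done = threading.Event()

def dma_transfer(buf, data):
    buf.extend(data)   # the "device" copies data into memory without the CPU
    done.set()         # raise the completion "interrupt"

buffer = []
threading.Thread(target=dma_transfer, args=(buffer, [1, 2, 3])).start()
done.wait()            # the "OS" sleeps until the interrupt arrives
print(buffer)  # [1, 2, 3]
```

The main thread does no copying and no polling; it is simply woken when the transfer signals completion, which is the essence of interrupt-driven DMA.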
It depends on which type of CPU scheduling you are using (in the single-core case):
If your process is executing under preemptive scheduling, the OS can interrupt it after some time and give the CPU to another process or to the OS itself; under non-preemptive scheduling, a process will not yield the CPU before completing its execution.
In the single-core case, if there is a single process it simply executes its instructions; if there are multiple processes, their states are stored in PCBs (process control blocks), which form the process queue, and they execute one after another if no interrupts occur.
The PCB holds the information needed for process management.
When a process starts, it calls library functions that issue system calls, and the kernel is invoked when a system call is made, a process fails during execution, or an interrupt occurs.

VB.NET multi threading and system architecture

I believe I have a reasonable understanding of threading from an Object Oriented perspective using the Thread class and the Runnable interface. In one of my applications there is a "download" button that allows the user to run a task in the background that takes about half an hour whilst continuing to use the VB.NET application.
However, I do not understand how Threading maps to the physical architecture of a computer. If you have a single threaded application that runs on a PC with a quadcore processor then does a .net program use all four processors?
If you have a multi threaded application (say four threads) on a quadcore processor then does each thread execute on different cores?
Do you have any control of this as a developer?
I have referenced a book I read at university called Operating System Concepts, but I have not found a specific answer.
If you have a single threaded application that runs on a PC with a quadcore processor then does a .net program use all four processors?
No, it can’t, at least not simultaneously. However, it’s theoretically possible that the operating system’s scheduler first executes your thread on one processor and later moves it to another processor. Such a scheduler is necessary to allow simultaneously running more applications/threads than there are physical processors present: execution of a thread is sliced into small pieces, which are fed to the processor(s) one after the other. Each thread gets some time slice allocated during which it can compute before usage of the CPU switches to another thread.
Do you have any control of this as a developer?
Not directly. What you can control is the priority of your thread to make it more important to the task scheduler.
On a more general note, you should not use threads in your use-case – at least not directly. Threads are actually pretty low-level primitives. For your specific use-case, there’s a component called BackgroundWorker which abstracts many of the low-level details of thread management for you.
If you have a multi threaded application (say four threads) on a quadcore processor then does each thread execute on different cores?
Not necessarily; again, the application has next to no control over how exactly its threads are executed; the operating system however tries really hard to schedule threads “smartly”. This means that in practice, if your application has several busy threads, they are spread out as evenly as possible across the available cores. In particular, if there are no more threads than cores then each thread gets its own core.
Generally, you do not need to worry about the mapping to the physical architecture; .NET and the OS will do their best to maximize the efficiency of your application. Just make it multi-threaded, even at the cost of being slower on a single-core computer. You can, however, limit your maximum number of threads (if your app theoretically scales to infinity) to the number of cores, or double that. Test performance for each case and decide which maximum works best for you.
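The "cap the thread count at the number of cores" suggestion can be sketched in Python with a thread pool (the workload function here is arbitrary, just something to keep a worker busy):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def work(n):
    return sum(i * i for i in range(n))

# Cap the worker count at the number of cores, as suggested above.
cores = os.cpu_count() or 1
with ThreadPoolExecutor(max_workers=cores) as pool:
    results = list(pool.map(work, [10_000] * 4))
print(len(results))  # 4
```

The pool queues any excess tasks rather than oversubscribing the cores, which is the same effect as manually limiting your thread count.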
Sometimes setting a core # can make your app's performance even worse. For example, if core #1 is currently scanning your PC for viruses, and your antivirus is single threaded. With the above scenario, assuming a quad-core PC, you do NOT want to run your 4-threaded app on a 1-per-core basis.
Having said that, if you really want to run a thread on specific core, it is possible - see this:
How to start a Thread at Specific Core
How Can I Set Processor Affinity in .NET?
Also check this out:
How to control which core a process runs on?
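For comparison with the .NET affinity links above, here is what pinning a process to one core looks like in Python. This is a Linux-specific sketch: os.sched_getaffinity/os.sched_setaffinity may not exist on other platforms, hence the guard:

```python
import os

# Linux-specific: pin the calling process (pid 0 = self) to a single core,
# then restore the original mask so the demo has no lasting effect.
if hasattr(os, "sched_setaffinity"):
    original = os.sched_getaffinity(0)        # cores we are allowed to run on
    os.sched_setaffinity(0, {min(original)})  # pin to one core
    pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, original)         # restore the original mask
    ok = pinned == {min(original)}
else:
    ok = True  # platform without the API; nothing to demonstrate
print(ok)  # True
```

As the answer above warns, forcing affinity like this is usually counterproductive; the scheduler's default placement is almost always good enough.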

What is the difference between a thread/process/task?

What is the difference between a thread/process/task?
Process:
A process is an instance of a computer program that is being executed.
It contains the program code and its current activity.
Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.
Process-based multitasking enables you to run the Java compiler at the same time that you are using a text editor.
When employing multiple processes with a single CPU, context switching between the various memory contexts is used.
Each process has a complete set of its own variables.
Thread:
A thread is a basic unit of CPU utilization, consisting of a program counter, a stack, and a set of registers.
A thread of execution results from a fork of a computer program into two or more concurrently running tasks.
The implementation of threads and processes differs from one operating system to another, but in most cases, a thread is contained inside a process. Multiple threads can exist within the same process and share resources such as memory, while different processes do not share these resources.
Example of threads in same process is automatic spell check and automatic saving of a file while writing.
Threads are basically processes that run in the same memory context.
Threads may share the same data while execution.
(Diagram: single thread vs. multiple threads)
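The "multiple threads share resources such as memory" point can be demonstrated in Python: several threads update one shared variable, with a lock providing the synchronization between them (the names are illustrative):

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        with lock:          # all threads see the same 'counter'; the lock
            counter += 1    # serializes the read-modify-write

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000: every thread updated the same shared variable
```

Four separate processes could not do this directly; each would get its own copy of `counter`, and they would need an explicit IPC mechanism to combine results.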
Task:
A task is a set of program instructions that are loaded in memory.
Short answer:
A thread is a scheduling concept; it's what the CPU actually 'runs' (you don't run a process). A process needs at least one thread that the CPU/OS executes.
A process is a data-organization concept. Resources (e.g. memory for holding state, an allowed address space, etc.) are allocated to a process.
To explain in simpler terms:
Process: a process is a set of instructions (code) that operates on related data, and it has its own state: sleeping, running, stopped, etc. When a program gets loaded into memory it becomes a process. Each process has at least one thread; with exactly one, it is called a single-threaded program.
Thread: a thread is a portion of a process, and more than one thread can exist within a process. Threads share the process's memory (its code and data), while each thread has its own stack and registers. Because threads can access the same data, the process has to synchronize its threads to achieve the desired behaviour.
Task: "task" is not a concept with one widely agreed meaning. When a program's instructions are loaded into memory, people call it either a process or a task; task and process are near-synonyms nowadays.
A process invokes or initiates a program. It is an instance of a program, and multiple processes can run the same application. A thread is the smallest unit of execution and lies within a process. A process can have multiple threads running, which is what makes a multithreading environment possible.
A program in execution is known as a process. A program can give rise to any number of processes, and every process has its own address space.
Threads use the address space of their process. The difference between a thread and a process is that when the CPU switches from one process to another, the current information must be saved in the process descriptor and the information of the new process loaded; switching from one thread to another within the same process is simple.
A task is simply a set of instructions loaded into memory. Threads can split themselves into two or more simultaneously running tasks.
For more understanding, refer to the link: http://www.careerride.com/os-thread-process-and-task.aspx
Wikipedia sums it up quite nicely:
Threads compared with processes
Threads differ from traditional multitasking operating system processes in that:
- processes are typically independent, while threads exist as subsets of a process
- processes carry considerable state information, whereas multiple threads within a process share state as well as memory and other resources
- processes have separate address spaces, whereas threads share their address space
- processes interact only through system-provided inter-process communication mechanisms
- context switching between threads in the same process is typically faster than context switching between processes
Systems like Windows NT and OS/2 are said to have "cheap" threads and "expensive" processes; in other operating systems there is not so great a difference except the cost of address space switch which implies a TLB flush.
Task and process are used synonymously.
From the wiki, a clear explanation:
1:1 (Kernel-level threading)
Threads created by the user are in 1-1 correspondence with schedulable entities in the kernel.[3] This is the simplest possible threading implementation. Win32 used this approach from the start. On Linux, the usual C library implements this approach (via the NPTL or older LinuxThreads). The same approach is used by Solaris, NetBSD and FreeBSD.
N:1 (User-level threading)
An N:1 model implies that all application-level threads map to a single kernel-level scheduled entity;[3] the kernel has no knowledge of the application threads. With this approach, context switching can be done very quickly and, in addition, it can be implemented even on simple kernels which do not support threading. One of the major drawbacks, however, is that it cannot benefit from the hardware acceleration on multi-threaded processors or multi-processor computers: there is never more than one thread being scheduled at the same time.[3] For example: if one of the threads needs to execute an I/O request, the whole process is blocked and the threading advantage cannot be utilized. GNU Portable Threads uses user-level threading, as does State Threads.
M:N (Hybrid threading)
M:N maps some M number of application threads onto some N number of kernel entities,[3] or "virtual processors." This is a compromise between kernel-level ("1:1") and user-level ("N:1") threading. In general, "M:N" threading systems are more complex to implement than either kernel or user threads, because changes to both kernel and user-space code are required. In the M:N implementation, the threading library is responsible for scheduling user threads on the available schedulable entities; this makes context switching of threads very fast, as it avoids system calls. However, this increases complexity and the likelihood of priority inversion, as well as suboptimal scheduling without extensive (and expensive) coordination between the userland scheduler and the kernel scheduler.
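The core idea behind user-level (N:1) scheduling - a userland scheduler multiplexing many logical threads onto one kernel entity, with no kernel involvement per switch - can be sketched with Python generators acting as cooperative "threads" (a toy illustration, not a real threading library):

```python
def task(name, steps):
    """A cooperative 'thread': it voluntarily yields after each step."""
    for i in range(steps):
        yield f"{name}:{i}"

def run(tasks):
    """A userland round-robin scheduler: no kernel threads involved."""
    trace = []
    ready = list(tasks)
    while ready:
        t = ready.pop(0)        # pick the next ready "thread"
        try:
            trace.append(next(t))  # run it until it yields
            ready.append(t)        # it yielded: back of the ready list
        except StopIteration:
            pass                   # it finished: drop it
    return trace

trace = run([task("A", 2), task("B", 2)])
print(trace)  # ['A:0', 'B:0', 'A:1', 'B:1']
```

Switches here are plain function calls, which is why user-level threading is so cheap; it also shows the N:1 drawback, since if one task blocked inside `next()` (say, on real I/O), every other task would stall with it.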