The operating systems notes from my university read:
The PCB is created when a process is born via fork, and is reclaimed
when a process is terminated. While system calls such as exec rewrite
the memory image of the process, the PCB (and the entities pointed to
by it, like the kernel stack) largely remains intact during exec or any
other system call, except for slight modifications (like changing the
page tables to point to the new memory image).
But my understanding is that during the fork system call, the memory image from the parent is wiped and a new memory image is initialized for the child process. Hence the PCB, located in the kernel stack of the memory image, is also wiped, and a completely new PCB is written for the process.
What concept have I understood wrong?
The process control block is located in kernel space in RAM. Kernel space also holds the page table. When the exec system call is made, the memory image of the process is wiped and a new memory image is written for the process without affecting that process's process control block in kernel space. However, the page table that maps the process's virtual addresses to physical addresses has to be changed, since the memory image has changed. Hence, to my knowledge, the PCB is not re-written.
Since a process in main memory is stored in the form of a stack, a heap, a data section and a static data section, where does the PCB lie in here? Or is it stored independently of these?
Is it stored at the bottom of the process's stack, or is it independent of the memory representation of the process?
PCBs (Process Control Blocks), which contain all the information about a process, are usually stored in specially reserved memory as part of the kernel space.
Kernel space, part of the logical address space in RAM, belongs to the core of the operating system, which has full access to the underlying hardware.
Each process's virtual address space comprises user space and kernel space. As pointed out by many articles, the kernel space of all processes is mapped to the same physical addresses in memory, i.e. there is only one kernel in physical memory. But each process has its own kernel stack, which is part of the kernel space. How does the same mapping work for all processes with different kernel stacks?
Note: This is the OS-agnostic answer. Details do vary slightly with the OS in question (e.g. Darwin and continuations), and possibly with the architecture (ARMv8, x86, etc.).
When a process performs a system call, the user mode state (registers) is saved, including the user mode stack pointer. At that point, a kernel mode stack pointer is loaded, which is usually maintained somewhere in the thread control block.
You are correct in saying that there is only one kernel space. It follows that, in theory, one thread in kernel space could easily see and/or tamper with any other's kernel-space data (just as threads of the same process can "see" each other in user space). This, however, is almost always theory only, since kernel code presumably respects memory boundaries (as user-mode code is assumed to, with thread-local storage, etc.). "Almost always", because if the kernel code can be exploited, then all of kernel memory is laid bare to the exploiter, and can potentially be read and/or compromised.
The address space of a process is the virtual address range that includes the static data, stack and heap memory for that particular process. The Process Control Block (PCB), on the other hand, is a data structure maintained by the operating system for each process it manages; the PCB holds a lot of information about the process, such as the process number, process state, program counter, list of open files, CPU scheduling info, and more.
Now this is the point where I got confused: the address space is also memory that stores information about a process, and the PCB seems to do something similar. How are these two connected to each other? I am not able to visualize this in my mind. Why do these two things exist simultaneously? Isn't it possible to achieve our goal just by using the PCB?
The process address space refers to the memory regions the process is using. It typically consists of the heap, stack, initialized data, uninitialized data and text. A process mainly has two address spaces -- logical and physical.
The PCB is a structure that resides in the kernel to track the state of a process. One of the things the PCB contains is memory information; in a typical system, the PCB may hold information about the pages the process owns.
To answer your question: the process address space is an idea built on top of the PCB and many other things (such as the page table).
I've watched the presentation and still have one question about how shared buffers work. As slide 16 shows, when the server handles an incoming request, the postmaster process calls fork() to create a child process to handle it. Here is a picture from there:
So we have an entire copy of the postmaster process, except for its pid. Now, if the child process updates some data belonging to shared memory (putting it in shared buffers, as shown on slide 17), we need the other processes to be aware of the changes. The picture:
The synchronization is what I don't understand. Each process owns a copy of the shared memory, and while copying it doesn't know whether another process will write something to its own copy of the shared memory. What if, after proc1 is created by calling fork(), another process proc2 is created a little later and starts writing something into its copy of the shared memory?
Question: How does proc1 know what to do with the part of the shared memory that are being modified by proc2?
The crucial thing to understand is that there are two different types of memory sharing used.
One is the copy-on-write sharing used by fork() (without exec()), where the child process inherits the parent process's memory and state. In this case when the child or parent modify anything, a new private copy of the modified memory page is allocated. So the child doesn't see changes made by the parent after fork() and the parent doesn't see changes made by the child after fork(). Peer children cannot see each other's changes either. They're all isolated as far as memory is concerned, they just share a common ancestor.
That memory is what's shown in the Program (text), data and stack sections of the diagram.
Because of that isolation, PostgreSQL also uses POSIX shared memory - or, in older versions, System V shared memory. These are explicitly shared memory segments that are mapped to a range of addresses. Each process sees the same memory, and it is not copy-on-write. It's fully read/write shared.
This is what is shown in the purple "shared memory" section of the diagram.
POSIX shared memory is used for inter-process communication: for locking, for shared_buffers, and so on - not the memory inherited from fork()ing.
While memory from fork is often shared copy-on-write, that's really an operating system implementation detail. The operating system could choose not to share it at all, and make an immediate copy of the parent's whole address space for the child at fork time. The only way the copy-on-write sharing is really relevant is when looking at top etc.
When PostgreSQL refers to "shared memory" it's always talking about the POSIX or System V shared memory block(s) that are mapped into each process's address space. Not copy-on-write sharing from fork().
I don't know about this special case, but generally in Linux and most other operating systems, to speed up process creation, when a process asks the operating system to create a new process, the OS creates the new one with minimal work (which matters particularly for DB applications) and shares most of the parent's memory space with the child. When the child wants to modify some part of that shared memory, the OS uses the COW (copy-on-write) concept and creates a new copy of that part of the memory for the child process's use. That part then becomes specific to the child process and is no longer shared with the parent process.
Context:
I don't really understand how the kernel saves the state of running code when that code exceeds its time slice.
I can't visualize what actually happens.
Question:
1) Where are the currently running code (and its stack?) stored?
2) When the kernel "sees" the code again, will it just follow an offset and keep going as if nothing happened?
It is not clear to me.
Thanks
The current code's instruction pointer and stack pointer are stored in task_struct->ip and task_struct->sp (on x86), and the new process's task_struct->ip and task_struct->sp are loaded back into the ip and sp registers when switch_to() is called in the Linux kernel.
The kernel's switch_to() does many things, such as re-setting up the EIP, stack, FPU state, segment descriptors and debug registers, while switching to the new process.
Then the kernel's switch_mm() switches the virtual memory mappings from the last process to the new process.
It depends on the OS, but as a general rule there is a block of storage which holds information about each process (usually called the Process Control Block, or PCB). This information includes a pointer to the current instruction being executed and the contents of the registers, etc., so the process can resume where it stopped last time.
This block of information is owned by the OS itself not the process so it lives beyond the suspension of the process.
The program code itself is not stored in the PCB - it simply exists in memory or on disk. It can even be shared between processes, for example several processes may be running the same program, each at a different point in the code at any given time and each with their own set of 'variables' or data unique to that process's run of the program. All the OS needs is the variables and the line number or pointer to know where a particular process was in the code when it was suspended, and it can start from that point again.
It is worth noting that any RAM the process was using may or may not be still there when it restarts. In general an OS will try to leave recently used or frequently used RAM chunks (or 'pages') in memory if possible. If it needs to free up space, however, it may swap the 'page' out to disk, but disk access is much, much slower, hence the desire to avoid swapping out memory which is likely to be used again if possible.
In the worst case an OS may find that it swaps out a process, and then very soon the new process needs to use some memory which has to be retrieved from disk. That process is suspended while this happens, as the retrieval takes a long time in CPU terms. The next process may then very soon find itself in the same situation. The OS is now spending a lot of its time swapping processes and memory in and out and much less of its time doing real work - this is commonly called 'thrashing'.