What does this statement mean in the context of process creation?
"In UNIX, the child's initial address space is a copy of the parent's, but there are definitely two distinct address spaces involved; no writable memory is shared".
I do understand that after the fork system call the parent process is cloned, which covers the "copy" part. What I find difficult to understand is the "distinct address spaces" part: in what sense are the address spaces different once one has been copied from the other?
Thank You
"Different address spaces" just means that the two processes have separate and independent copies of all their data in memory. Initially those copies are the same, but each process can change data in its own memory and the changes are not visible to the other process. For example, if the initial process has a variable called x stored at address 0x01234567, after the fork() both processes will have a variable at that address, but they're different variables that can hold different values despite having the same address. An address like 0x01234567 actually corresponds to different places in RAM in each process.
If both processes shared the same address space, they'd both be looking at the same memory (rather than separate and independent copies of it), so changes made by one process would be visible to the other. An address like 0x01234567 would refer to the same spot in RAM in both processes.
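A minimal C program makes this concrete (the variable name x and the output format are just for illustration): both processes typically print the same address for x, yet hold different values after the child's write.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int x = 1;                       /* inherited by the child at fork() */
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); exit(1); }

    if (pid == 0) {                  /* child */
        x = 42;                      /* modifies its own private copy    */
        printf("child:  &x = %p, x = %d\n", (void *)&x, x);
    } else {                         /* parent */
        wait(NULL);                  /* let the child finish first       */
        printf("parent: &x = %p, x = %d\n", (void *)&x, x);
    }
    return 0;
}
```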
(In principle, fork() makes a complete copy of all the calling process's memory. In practice, the copying is typically deferred, using a technique called "copy-on-write" that allows the system to avoid making duplicate copies of data that's the same in both processes. But that's an implementation detail that's basically invisible to applications; the system behaves as if fork() made a complete copy of everything.)
In Linux there is a concept called COW (copy-on-write). When fork() is called, a child process is created. The child and parent each have their own address space; the child's address space is a clone of the parent's (including the stack and heap). But it's worth knowing when the cloning actually happens. If the parent has, say, 100 bytes of stack memory and the child never modifies that memory (it only reads it) throughout its life span, those 100 bytes of stack memory are never cloned. Only when the child tries to write to that memory is the memory actually copied. This deferred copying is what Linux calls COW.
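If you want to observe COW on Linux, one rough sketch is to watch the child's dirty-page counters in /proc/self/smaps_rollup (available since kernel 4.14) before and after writing to inherited memory. The 64 MiB buffer size and the helper below are arbitrary choices of mine:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Print the Shared_Dirty/Private_Dirty totals for this process
 * (Linux >= 4.14 only; uses /proc/self/smaps_rollup). */
static void show_dirty(const char *tag) {
    char line[256];
    FILE *f = fopen("/proc/self/smaps_rollup", "r");
    if (!f) { perror("smaps_rollup"); return; }
    while (fgets(line, sizeof line, f))
        if (strncmp(line, "Shared_Dirty:", 13) == 0 ||
            strncmp(line, "Private_Dirty:", 14) == 0)
            printf("%-14s %s", tag, line);
    fclose(f);
}

int main(void) {
    size_t len = 64 * 1024 * 1024;       /* 64 MiB, an arbitrary size    */
    char *buf = malloc(len);
    memset(buf, 1, len);                 /* make the parent's pages real */

    if (fork() == 0) {                   /* child */
        show_dirty("before write:");     /* pages shared with the parent */
        memset(buf, 2, len);             /* writes force private copies  */
        show_dirty("after write:");      /* Private_Dirty grows ~64 MiB  */
        _exit(0);
    }
    wait(NULL);
    free(buf);
    return 0;
}
```

Before the write, the child's dirty pages show up as shared with the parent; after the write, roughly 64 MiB moves to Private_Dirty because the kernel has made private copies.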
The address space of a process is the virtual address range that holds the static data, stack, and heap memory for that particular process. The Process Control Block (PCB), on the other hand, is a data structure maintained by the operating system for each process it manages; it holds a lot of information about the process, such as the process number, process state, program counter, list of open files, CPU scheduling info, and more.
This is where I got confused: the address space is memory that stores information about a process, and the PCB seems to do something similar. So how are these two connected? I am not able to visualize this. Why do both exist simultaneously? Isn't it possible to achieve our goal just by using the PCB?
The process address space refers to the memory regions the process is using. It typically consists of the heap, stack, initialized data, uninitialized data, and text. There are really two address spaces for a process -- logical and physical.
The PCB is a structure that resides in the kernel to track the state of a process. One of the things the PCB contains is memory information; in a typical system, the PCB may record which pages the process has.
To answer your question: the process address space is an idea built on top of the PCB and many other things (such as the page table).
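To make that relationship concrete, here is a hypothetical, heavily simplified PCB in C (real kernels, such as Linux with its task_struct, track far more). The point is that the address space itself lives elsewhere in memory; the PCB merely points at the page table that describes it:

```c
#include <sys/types.h>

/* Hypothetical page-table handle; the real structure is
 * architecture-specific and managed by the kernel's memory code. */
struct page_table;

/* A heavily simplified, illustrative PCB. The address space itself
 * (text, data, heap, stack) is NOT stored here; the PCB just points
 * to the page table that maps it. */
struct pcb {
    pid_t              pid;          /* process number                   */
    int                state;        /* e.g. running, ready, blocked     */
    unsigned long      pc;           /* saved program counter            */
    unsigned long      sp;           /* saved stack pointer              */
    struct page_table *pgtbl;        /* maps the process's address space */
    int                open_fds[16]; /* list of open files (simplified)  */
    struct pcb        *next;         /* scheduler's ready-queue link     */
};
```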
This is a sentence from the slides of my systems lecture, but I don't understand why a context switch invalidates the MMU. I know it invalidates the cache, since the cache contains another process's data. But the MMU just maps virtual memory to physical memory. If a context switch invalidates it, does that mean the MMU uses a different mapping for each process?
Does this mean the MMU uses a different mapping for each process?
Your conclusion is essentially right.
Each process has its own mapping from virtual to physical addresses (called a context).
The address 0x401000, for example, can be translated to 0x01234567 for process A and to 0x89abcdef for process B.
Having a separate context per process allows for easy isolation of processes, easy demand paging, and simplified relocation.
So each context switch must invalidate the TLB, or the CPU would keep using the old translations.
Some pages, however, are global, meaning that they have the same translation independently of the current process address space.
For example, the kernel code is mapped in the same way for every process and thus doesn't need to be remapped.
So in the end only part of the TLB is invalidated.
You can read how Linux handles the process address space for a real example of applied theory.
What you are describing is entirely system specific.
First of all, what they are probably referring to is invalidating the MMU's cache (the TLB). That assumes the MMU has a cache (likely these days, but not guaranteed).
When a context switch occurs, the processor has to put the MMU into a state where leftovers from the previous process cannot screw up the new process. If it did not, the cache would map the new process's logical pages to the old process's physical page frames.
For example, some processors use one page table for system space and one or more other page tables for user space. After a context switch, it would be ideal for the processor to invalidate any caching of the user-space page tables but leave any caching of the system page table alone.
Note that on most processors all of this is done entirely behind the scenes. Even OS programmers do not need to deal with (or even be aware of) any flushing or invalidation of the MMU; there is a single switch-process-context instruction that handles everything. Other processors require the OS programmer to handle additional tasks as part of a context switch, which on some oddball processors includes explicitly flushing the MMU cache.
I've watched the presentation and still have one question about how shared buffers work. As slide 16 shows, when the server handles an incoming request, the postmaster process calls fork() to create a child process for handling that request. Here is a picture from the presentation:
So we have an entire copy of the postmaster process, except for its pid. Now, if the child process updates some data belonging to shared memory (placing it in shared buffers, as shown on slide 17), we need the other processes to be aware of the changes. The picture:
The synchronization is what I don't understand. Each process owns a copy of the shared memory, and at the moment of copying it cannot know whether another process will write something to its own copy of the shared memory. What if, after proc1 is created by calling fork(), another process proc2 is created a little later and starts writing something into its copy of the shared memory?
Question: how does proc1 know what to do with the parts of the shared memory that are being modified by proc2?
The crucial thing to understand is that there are two different types of memory sharing used.
One is the copy-on-write sharing used by fork() (without exec()), where the child process inherits the parent process's memory and state. In this case, when the child or parent modifies anything, a new private copy of the modified memory page is allocated. So the child doesn't see changes made by the parent after fork(), and the parent doesn't see changes made by the child after fork(). Peer children cannot see each other's changes either. They're all isolated as far as memory is concerned; they just share a common ancestor.
That memory is what's shown in the Program (text), data and stack sections of the diagram.
Because of that isolation, PostgreSQL also uses POSIX shared memory - or, in older versions, System V shared memory. These are explicitly shared memory segments that are mapped into a range of addresses in every process. Each process sees the same memory, and it is not copy-on-write; it's fully read/write shared.
This is what is shown in the purple "shared memory" section of the diagram.
POSIX shared memory is used for inter-process communication - for locking, for shared_buffers, etc. - not the memory inherited from fork()ing.
While memory from fork() is often shared copy-on-write, that's really an operating system implementation detail. The operating system could choose not to share it at all and make an immediate copy of the parent's whole address space for the child at fork() time. The only time the copy-on-write sharing really matters is when interpreting the output of top and similar tools.
When PostgreSQL refers to "shared memory" it's always talking about the POSIX or System V shared memory block(s) that are mapped into each process's address space. Not copy-on-write sharing from fork().
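A short C sketch shows both kinds of sharing side by side. shm_open()/mmap() is the standard POSIX shared memory API, but this is not PostgreSQL's actual code; the segment name "/demo_shm" is a placeholder of mine, and error handling is kept minimal:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int private_x = 1;               /* inherited copy-on-write at fork() */

    /* Create a POSIX shared memory segment and map it MAP_SHARED. */
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, sizeof(int));
    int *shared_x = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    *shared_x = 1;

    if (fork() == 0) {               /* child: modify both variables      */
        private_x = 42;              /* invisible to the parent (COW)     */
        *shared_x = 42;              /* visible to the parent (shared)    */
        _exit(0);
    }
    wait(NULL);
    printf("private_x = %d, *shared_x = %d\n", private_x, *shared_x);
    /* prints: private_x = 1, *shared_x = 42 */

    shm_unlink("/demo_shm");         /* remove the named segment          */
    return 0;
}
```

On older glibc you may need to link with -lrt.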
I don't know about this special case, but generally, in Linux and most other operating systems, in order to speed up process creation, when a process asks the operating system to create a new process, the OS creates the new one with minimal setup (which matters especially for DB applications) and shares most of the parent's memory space with the child. When the child wants to modify some part of that shared memory, the OS uses the COW (copy-on-write) mechanism and creates a new copy of that part of memory for the child's use. That part then becomes specific to the child process and is no longer shared with the parent.
Context:
I don't really understand how the kernel saves the state of running code when the process exceeds its time slice.
I can't visualize what actually happens.
Question:
1) Where are the currently running code (and its stack?) stored?
2) When the kernel "sees" the code again, will it just follow an offset and keep going as if nothing had happened?
It is not clear to me.
Thanks
The current instruction pointer and stack pointer are saved in the outgoing process's task_struct (->ip and ->sp on x86), and the new process's saved task_struct->ip and task_struct->sp are loaded back into the IP and SP registers when switch_to() is called in the Linux kernel.
The kernel's switch_to() does many things while switching to the new process, such as setting up the EIP, stack, FPU state, segment descriptors, and debug registers again.
Then the kernel's switch_mm() switches the virtual memory mappings from the last process to the new one.
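You can't call switch_to() from user space, but POSIX ucontexts are a reasonable user-space analogue for building intuition: swapcontext() saves the current instruction and stack pointers into one structure and loads another, which is conceptually what the kernel does with the task_struct fields above. A minimal sketch:

```c
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];      /* stack for the "other process" */

static void task(void) {
    printf("task: running\n");
    swapcontext(&task_ctx, &main_ctx);  /* save our state, resume main   */
    printf("task: resumed exactly where it left off\n");
}

int main(void) {
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp   = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link          = &main_ctx;  /* resume main when task() returns */
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);  /* "context switch" into task()  */
    printf("main: task yielded, switching back\n");
    swapcontext(&main_ctx, &task_ctx);  /* resume task() mid-function    */
    printf("main: done\n");
    return 0;
}
```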
It depends on the OS, but as a general rule there is a block of storage that holds information about each process (usually called the Process Control Block, or PCB). This information includes a pointer to the instruction currently being executed and the contents of the registers, etc., so the process can start again where it stopped last time.
This block of information is owned by the OS itself, not the process, so it lives beyond the suspension of the process.
The program code itself is not stored in the PCB - it simply exists in memory or on disk. It can even be shared between processes, for example several processes may be running the same program, each at a different point in the code at any given time and each with their own set of 'variables' or data unique to that process's run of the program. All the OS needs is the variables and the line number or pointer to know where a particular process was in the code when it was suspended, and it can start from that point again.
It is worth noting that any RAM the process was using may or may not still be there when it restarts. In general, an OS will try to leave recently used or frequently used chunks of RAM ('pages') in memory if possible. If it needs to free up space, however, it may swap a page out to disk; disk access is much, much slower, hence the desire to avoid swapping out memory that is likely to be used again.
In the worst case, an OS may find that it swaps out a process and then very soon the new process needs to use some memory that has to be retrieved from disk. The new process is suspended while this happens, since the retrieval takes a long time in CPU terms. It may then happen that the next process very soon finds itself in the same situation. The OS ends up spending a lot of its time swapping processes and memory in and out and much less of its time doing real work - this is commonly called 'thrashing'.
Let's focus on uniprocessor computer systems. When a process is created, as far as I know, a page table is set up that maps the virtual addresses to the physical memory address space. Each process gets its own page table, stored in the kernel address space. But how does the MMU choose the right page table for the process, given that there is more than one process running and many context switches happen?
Any help is appreciated!
Best,
Simon
Processors have a privileged register called the page table base register (PTBR), on x86 it is CR3. On a context switch, the OS changes the value of the PTBR so that the processor now knows which page table to use. In addition to the PTBR, many modern processors have a notion of an address space number (ASN). Processes are given an address space number (from a limited pool) and this ASN is set in a register on a context switch as well. This ASN is used as part of TLB matching and allows TLB entries from multiple address spaces to coexist. Only when an ASN is reused is it necessary to flush the TLB, and then only for entries matching that ASN. Most x86 implementations are more coarse grained than this and there is a notion of global pages (for shared libraries and shared data).
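On x86 the page-table switch itself boils down to a single privileged write to CR3. A hedged kernel-mode sketch (this cannot run in an ordinary user program, and the function and variable names are mine):

```c
/* Kernel-mode sketch only: writing CR3 is a privileged operation.
 * next_pgd_phys is the physical address of the next process's
 * top-level page table. */
static inline void load_page_table(unsigned long next_pgd_phys) {
    /* On x86, writing CR3 also flushes non-global TLB entries
     * (unless PCID/ASN-style tagging is in use). */
    asm volatile("mov %0, %%cr3" : : "r"(next_pgd_phys) : "memory");
}
```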
The MMU in this case is completely unaware of what a process is. The operating system, which keeps track of processes, generates a page table for each process as it is created, as you say. The procedure for context switching is as follows:
1. The operating system tells the MMU to use the page table located at physical address 0xFOO.
2. The operating system programs the programmable interval timer (PIT) to cause a hardware interrupt after BAR milliseconds.
3. The operating system restores the process state (CPU registers, program counter, etc.) and jumps to the correct address.
4. The process runs until the PIT triggers an interrupt.
5. The operating system's routine for handling the PIT interrupt saves the program state (registers etc.), uses a scheduling algorithm to determine the next process to run (in a simple case, a circular linked list), then starts over at step 1.
I hope that clears up any doubts you may have. The short answer: The MMU is process agnostic and doesn't know what a process is.
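As a toy illustration of step 5's circular linked list, the following self-contained C program simulates round-robin selection over a few fake PCBs. No real context switching happens here; everything is a stand-in:

```c
#include <stdio.h>

/* A fake PCB: just a pid and a link to the next entry in the ring. */
struct pcb {
    int pid;
    struct pcb *next;
};

int main(void) {
    /* Build a circular ready list of three "processes". */
    struct pcb c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
    c.next = &a;

    /* Round-robin: each timer "tick" picks the next PCB in the ring.
     * A real scheduler would restore its registers and page table here. */
    struct pcb *current = &a;
    for (int tick = 0; tick < 7; tick++) {
        printf("tick %d: dispatch pid %d\n", tick, current->pid);
        current = current->next;
    }
    return 0;
}
```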