How is the Address Space (of a process) and Process Control Block (PCB) are related in Operating System? - process

If we talk about Address Space of a process it is the virtual address range which includes static data, stack and heap memory for that particular process. And coming to Process Control Block (PCB) which is a data structure maintained by operating system for each process it manages, where PCB includes a lot of information about the process like process no., process state, program counter, list of open files, cpu scheduling info ...and more.
Now this is the point where I got confused that Address Space is also a memory which stores information about a process and similar thing is done by PCB too. Then how these are connected to each other. I am not able to visualize this in my mind. Why we have these two things existing simultaneously. Isn't it possible to achieve our goal just by using PCB?

Process Address space refer to memory regions the process is using. It typically consists of heap, stack, initialized data, uninitialized data and text. There are mainly two address spaces of a process -- logical and physical.
The PCB is a structure resides in kernel to track the state of process. One of the things PCB contain is memory information. In a typically system, PCB may contain information about pages the process has.
To answer your question, Process Address space is an idea built on top of PCB and many other things (such as page table).

Related

Where Does PCB of a process lies inside the main memory

Since Process in main memory is stored in the form of stack, heap, data section and static data section. Where does the PCB lies in here? Or is it stored independent from this?
Is it stored on the bottom of the stack of the process or is it independent of the memory representation of process
PCB (Process Control Blocks) which contains all the info about a process are usually stored in a specially reserved memory as part of the kernel space.
The kernel space, part of logical space in RAM, is the core of an Operating System which has full access to the underlying hardware.

Operating Systems: Process Creation

What does this statement in the context of process creation mean?
"In UNIX, the child's initial address space is a copy of the parent's, but there are definitely two distinct address spaces involved; no writable memory is shared".
I do understand that after the fork system call, the parent process is cloned and that clears the copied part. What I find difficult understanding is the "different address spaces" part, after the address space is copied.
Thank You
"Different address spaces" just means that the two processes have separate and independent copies of all their data in memory. Initially those copies are the same, but each process can change data in its own memory and the changes are not visible to the other process. For example, if the initial process has a variable called x stored at address 0x01234567, after the fork() both processes will have a variable at that address, but they're different variables that can hold different values despite having the same address. An address like 0x01234567 actually corresponds to different places in RAM in each process.
If both processes shared the same address space, they'd both be looking at the same memory (rather than separate and independent copies of it), so changes made by one process would be visible to the other. An address like 0x01234567 would refer to the same spot in RAM in both processes.
(In principle, fork() makes a complete copy of all the calling process's memory. In practice, the copying is typically deferred, using a technique called "copy-on-write" that allows the system to avoid making duplicate copies of data that's the same in both processes. But that's an implementation detail that's basically invisible to applications; the system behaves as if fork() made a complete copy of everything.)
In linux there is a concept of COW (Copy On Write). When a fork() is called the child process will be created. The child and parent process will have there own address space. The child process clones the address space of parent (including stack and heap). But it is required to know when the cloning will happen. If the parent has stack memory of say 100 bytes and the child process does not modify that memory (or simply reads) through out it's life span then the stack memory of 100 bytes will not be cloned. Only when the child process tries to write to that memory the memory will be cloned. This functionality in linux is called COW.

Running multiple executables linked to 0x400000

I'm interested in operating systems topic and I have a dummy question. Standard PE executable files are linked to 0x400000. My question is how can operating system load multiply executables with same image base, when virtual memory just maps virtual addresses to physical. Is it storing PDE and PTE index of thread somewhere? Is there some addition to each address before execution starts? How does it work?
Each process gets its own virtual address space, and hence there's no conflict. All virtual address spaces that exist in any one time in the system get mapped into the physical address space. Virtual memory that can't or currently isn't mapped onto a particular physical memory is held in the swap file (swap partition, or alike) — this is called paging.
During thread switches, when the CPU is about to execute a thread from a different process than it was executing so far, the operating system's scheduler informs the CPU (sets the respective registers) about the new virtual address translation table to use. Thus the CPU thinks there's just one virtual address space at the given time, while the operating system can manage many more, one for each process.
Disclaimer: My answer may be a thought of as a bit superficial or imprecise as opposed to the reality. This for the sake of simplicity in respect to the nature of the OPs question. Also, these mechanisms are CPU-dependent and operating system-dependent.

How does the system choose the right Page Table?

Let's focus on uniprocessor computer systems. When a process gets created, as far as I know, the page table gets set up which maps the virtual addresses to the physical memory address space. Each process gets its own page table, stored in the kernel address space. But how does the MMU choose the right page table for the process since there is not only one process running and there will be many context switches happening?
Any help is appreciated!
Best,
Simon
Processors have a privileged register called the page table base register (PTBR), on x86 it is CR3. On a context switch, the OS changes the value of the PTBR so that the processor now knows which page table to use. In addition to the PTBR, many modern processors have a notion of an address space number (ASN). Processes are given an address space number (from a limited pool) and this ASN is set in a register on a context switch as well. This ASN is used as part of TLB matching and allows TLB entries from multiple address spaces to coexist. Only when an ASN is reused is it necessary to flush the TLB, and then only for entries matching that ASN. Most x86 implementations are more coarse grained than this and there is a notion of global pages (for shared libraries and shared data).
The MMU in this case is unaware completely of what a process is. The operating system, which keeps tracks of processes, generates a page table for each process, as you say, as they are created. The process for context switching is as follows:
The operating system tells the MMU to use page table located at physical address 0xFOO
The operating system programs the programmable interrupt timer (PIT) to cause a hardware interrupt after BAR milliseconds.
The operating system restores the process state (CPU registers, program counter, etc) and jumps to the correct address.
The process runs until the PIT triggers an interrupt.
The Operating System routine for handling the PIT interrupt then saves the program state (registers etc), uses a scheduling algorithm for determining the next process to run (in a simple case, a circular linked list), then starts over at step 1.
I hope that clears up any doubts you may have. The short answer: The MMU is process agnostic and doesn't know what a process is.

Memory Map for RTOS

I am looking forward to understand, what purpose a memory map serves in embedded system.
How does the function stack differs here, from normal unix system.
Any insights that can help me debug few memory related crashes for embedded system will be helpful.
Embedded systems, especially real-time ones, often have a lot of statically-allocated data, and/or data placed at specific locations in memory. The memory map tells you where these things are, which can be helpful when you run into problems and need to examine the state of the system. For example, you might dump all of memory and then analyze it after the fact; in such a case, the memory map will be rather handy for finding the objects you suspect might be related to the problem.
On the code side, your system might log a hardware exception that points to the address of the instruction where the exception was detected. Looking up the memory locations of functions, combined with a disassembly of the function, can help you analyze such problems.
The details really depend on what kind of embedded system you're building. If you provide more details, people may be able to give better responses.
I am not sure that I understand the question. You seem to be suggesting that a "memory map" is something unique to embedded systems or that it is a tangible software component. It is neither; it is merely a description of the layout of an application's memory usage.
All applications will have a memory map regardless of platform, the difference is that typically on an embedded system the application is linked as a single monolithic entity, so that the resultant memory layout refers to the entire system rather than an individual process as it might in an application on a GPOS platform.
It is the linker and the linker script that determines memory mapping, and your linker will be able to output a map report file that describes the layout and allocation applied. This is true of embedded and desktop applications regardless of OS or architecture.
The memory map for a RTOS is not that much different than the memory map for any computer. It defines which hardware resides at which of the processor's addresses. That hardware may be RAM, ROM, Flash, serial ports, parallel ports, timers, interrupt vectors, or any number of other parts addressable by the processor.
The memory map also describes how you intend to budget for limited resources such as RAM, ROM, or Flash in your system design.
For instance, if there's multiple tasks running, RAM might be mapped so that each task has it's own specific area of RAM allocated to it.
In turn, each tasks's part of RAM would be mapped so that there are specific areas for the stack, another for static variables, and perhaps more again for heap(s).
When you have an operating system on the target, it looks after a lot of this dynamically. However, if your application is the only software on the device, you'll have to manage these decisions yourself, usually at compile/link time. Search "link scripts" for further clues,
The Memory map is a layout of memory of system. It is present in both embedded systems and normal applications. Though it is present in normal applications, it's usage is well appreciated in embedded systems due to system constraints.
Memory map is managed by means of linker scripts or linker command files. It maps resources like Flash or Internal RAM(L1P,L1D,L2,L3) or External RAM(DDR) or ROM or peripherals (ports,serial,parallel,USB etc) or specific device registers or I/O ports with appropriate fixed addresses in the memory space of the system.
In case of embedded systems, based on the memory configuration or constraints of board and performance requirements, the segments like text segment or data segment or BSS can also be placed in the appropriate memory of choice.
There are occasions where various versions of development boards will have different configurations of memory and peripherals. In that case, we may need to edit the linker scripts according to memory configuration and peripherals of the board as an essential check-point in board bring-up.
Memory map can help in defining the shared memory too that can play a key role in multi-threaded applications and also for multi-core applications.
Crashes can be debugged by back tracing the address of crash and mapping it to the memory of the system to get an high level idea of the possible library or object causing the problem.