Accessing an illegal address after sbrk - page-tables

From what I understand, sbrk, unlike mmap, does not work at page-size granularity. For example, I could increase the heap by 100 bytes, even if that means a page in the virtual address space is partly allocated and partly not.
My question is what stops me from accessing virtual addresses that are in the part of the page that was not allocated? The page table maps pages to frames, so if an address is in a page that was partly allocated by sbrk, what will prevent the translation from succeeding?

Related

Where is page table located?

I've been studying paging and page tables, but I don't seem to understand where page tables are located.
In one of the answers on Stack Exchange (https://unix.stackexchange.com/questions/487052/where-is-page-table-stored-in-linux), it is said that page tables are in kernel address space, which is in virtual memory (from what I understood).
However, in lecture slides from the University of Illinois (https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf), page tables seem to be in RAM, which is physical memory.
Can anyone tell me clearly where the page tables are stored?
Thank you in advance.
The answer to this question is too broad, and I think it belongs on the Super User Stack Exchange.
In x86 systems, page tables are structures used by the CPU, but they are too large to be held in registers, so they are kept in RAM.
Every process has a memory map with two big zones: user space and kernel space. Kernel space is the same for all processes; user space is private to each process. On 32-bit x86 Linux systems (with the default 3 GB/1 GB split), any logical address equal to or greater than 0xC0000000 belongs to the kernel. Below that address, it's user space.
The page table of the process is held in kernel space. The kernel may have several page tables in RAM, but only one is the active page table. On x86 CPUs, it's the page table pointed to by register CR3.
There is a more detailed explanation of how it works here: https://stackoverflow.com/a/20792205/3011009
I think you have a problem understanding virtual and physical memory.
As the name suggests, virtual memory is not real. The idea behind virtual memory is that each process sees the whole address space as memory available to it. For example, on a 64-bit system a process might see 2^64 bytes as the memory available to it, and another process may see the same thing. So with virtual memory every process sees a contiguous memory available to it, which might be much bigger than the memory actually available on the system. All the addresses in virtual memory then have to be translated to the equivalent physical memory addresses using something called page tables.
Pages are blocks of cells (addresses). For example, let's say that the available (physical) memory in a system is 2 GB, and the page size has been chosen as 4 KB. In that case a 4 KB page holds 4096 different cells or addresses, which we can address using 12 bits, since we have:
2^12 = 4096
If the overall memory is 2 GB, then we have:
2 GB / 4 KB = 524288
which means we could have 524288 different pages in physical memory. Some of these pages are assigned only to the operating system code, which means only the OS has access to them; these hold the code and instructions of the operating system, which supports the execution of every other program. The other pages are available for other processes.
Now let's say we have an address like this in virtual memory:
0x000075fe
First of all, we said that we need 12 bits to tell the position of an address within the page itself, since the page is 4 KB; here this position is 5fe. What the operating system (or any other memory management mechanism) does is leave this OFFSET untranslated: the position of an address within the virtual page is the same as within the physical page. I think this is one of the main features that makes translation beneficial. The rest of the address has to be translated to the corresponding page in physical memory; that part is:
0x00007
For this, the page table has to be consulted, which as we said is just a table in kernel memory that is not accessible from user space. For example, it could look something like this:
virtual page   physical page
0x00001        0x00004
0x00002        disk ----> means all of these addresses are on disk
0x00007        0x004fe
So page 0x00007 is translated to 0x004fe, and therefore the address
0x000075fe in virtual memory is translated to
0x004fe5fe in physical memory, which means this is the address at offset 5fe in page number 0x004fe (offsets start at zero, so it is the (5fe + 1)th cell of that page).
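To make the lookup above concrete, here is a minimal Python sketch of the same translation. It assumes a toy single-level table held in a dict; the names `page_table`, `translate`, `PageFault` and the `ON_DISK` marker are made up for illustration, and real hardware uses multi-level tables walked by the MMU.

```python
PAGE_SIZE = 4096                 # 4 KB pages, as in the example above
OFFSET_BITS = 12                 # 2^12 = 4096
ON_DISK = None                   # assumed marker for "not currently in RAM"

# Toy single-level page table: virtual page number -> physical page number.
page_table = {
    0x00001: 0x00004,
    0x00002: ON_DISK,
    0x00007: 0x004FE,
}


class PageFault(Exception):
    """Raised when the page has no physical page in RAM (unmapped or on disk)."""


def translate(vaddr, table=page_table):
    vpn = vaddr >> OFFSET_BITS            # upper bits select the virtual page
    offset = vaddr & (PAGE_SIZE - 1)      # lower 12 bits pass through untranslated
    frame = table.get(vpn, ON_DISK)
    if frame is ON_DISK:
        raise PageFault(hex(vaddr))
    return (frame << OFFSET_BITS) | offset


print(hex(translate(0x000075FE)))         # 0x4fe5fe
```

An address in page 0x00002 raises `PageFault` in this sketch, which is the point where a real OS would bring the page in from disk and retry.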

Major speed differences between static/stack and heap memory

I've encountered the problem that accessing data stored in heap memory is really slow when the memory is frequently reallocated.
in comparison to
What could explain this behaviour?
Possibly page fault issues. If you malloc a large block of RAM, the physical RAM will probably not be allocated straight away; only some page table entries will be set up. The physical RAM won't be allocated until you access a location in it for the first time. This involves:
a page fault,
finding a physical memory page,
zeroing every location on that page,
updating the page table.
This is an expensive operation in terms of time and will happen once per allocated page (550 x 4 KB pages for the RAM you are allocating).
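One way to observe the first-touch cost described above is to map some anonymous memory and time two passes over it. This is only a rough sketch in Python: the helper names (`pages_needed`, `touch_every_page`, `timed_touch`) are invented for illustration, interpreter overhead will blur the numbers, and whether the first pass is visibly slower depends on the OS and its settings.

```python
import mmap
import time

PAGE_SIZE = mmap.PAGESIZE        # typically 4096 on x86 Linux


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Number of pages needed to cover nbytes, rounded up."""
    return (nbytes + page_size - 1) // page_size


def touch_every_page(buf, page_size=PAGE_SIZE):
    """Write one byte into each page of buf; return how many pages were touched."""
    touched = 0
    for offset in range(0, len(buf), page_size):
        buf[offset] = 1
        touched += 1
    return touched


def timed_touch(npages):
    """Map npages of anonymous memory and time a first and a second pass over it."""
    buf = mmap.mmap(-1, npages * PAGE_SIZE)   # anonymous mapping
    try:
        t0 = time.perf_counter()
        touch_every_page(buf)                 # first touch: may fault on every page
        t1 = time.perf_counter()
        touch_every_page(buf)                 # second pass: pages were already touched
        t2 = time.perf_counter()
    finally:
        buf.close()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    first, second = timed_touch(550)
    print(f"first pass: {first:.6f}s, second pass: {second:.6f}s")
```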

Physical Memory and Virtual Memory data allocation behavior

I'm interested in understanding how a computer allocates variables in physical memory versus files in virtual memory (such as on a hard drive), in terms of how the computer determines where to put data. It almost seems random in both memory storage types, but it's not, because it simply can't put data at a memory address or sector (any location) of a hard drive that's occupied or allocated for another process already. When I was studying Norton's Speed Disk (a program that defragments files on hard drives) on my old W95 system, I noticed from the program's representation of the hard drive's data (a color-coded visual map of different data types, e.g. swap files were always first at the top) that it consisted of many files spread out all over the hard drive with empty unused areas. In addition, in some of these areas I saw what looked like a mix of data and empty space in a spotty pattern. I want to think it's random for that to happen. Likewise, when I was studying the memory addresses of a simple program I wrote in C, I noticed that each version of my program, after recompiling it following changes, showed different addresses for segments and offsets. I was expecting the computer to use the same address when I recompiled it. Sometimes the same address would be used, other times it was different. Again, I want to think memory locations are also chosen at random by programs. I thought that memory allocation or file writing was based on the first empty space available, written in a contiguous manner.
So my question is: how, and by what logic, does a common computer decide where it writes its data in such an arbitrary manner for either type of location (physical RAM or Dynamic)? What area of computer science (if not assembly language) would I need to study to explain this almost random behavior?
Thanks in Advance
Something broader and directly from computer science would be a linked list: http://en.wikipedia.org/wiki/Linked_list
Imagine you had a linked list and simply added items to the end; these items might live linearly in memory or on disk or wherever. But as you remove items from the middle of the list, say by having item number 7 point at item number 9 to eliminate item number 8, holes open up in the storage. As with memory allocation for allocs, virtual memory, hard drive sector allocation, etc., how fast you fragment your storage depends on the algorithm you use for allocating the next item.
File systems can and do use a linked-list type scheme to keep track of which sectors belong to a single file. It is fast and easy to use a linked list, but you then have to deal with fragmentation. A much slower method would be to have no fragmentation but be constantly copying and moving files around to keep them on linear sectors.
malloc() allocation schemes and MMU allocation schemes also fall under this category: basically, any time you take something, slice it up into fractions, and put a virtual interface in front of those fractions to give the programmer/user the appearance that they are linear. malloc() (not counting the virtual memory via the MMU) works the other way around: it allocates a number of linear chunks of those fractions to meet the alloc need, and has an alloc/free scheme that attempts to keep as many large chunks available as possible, just in case. A bad malloc system is one where you have half of your memory free but the largest malloc that works without an out-of-memory error is a small fraction of that memory; say you have a gig free and can only allocate 4096 bytes.
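The "half the memory is free but a small request fails" situation can be reproduced with a toy allocator. The sketch below is Python and assumes a heap of 16 equal blocks managed first-fit; `first_fit`, `free` and `largest_free_run` are illustrative names, not how any real malloc() is implemented.

```python
def largest_free_run(used):
    """Length of the longest run of free (False) blocks."""
    best = run = 0
    for block in used:
        run = 0 if block else run + 1
        best = max(best, run)
    return best


def first_fit(used, size):
    """Mark the first free run of `size` blocks as used; return its start or None."""
    run = 0
    for i, block in enumerate(used):
        run = 0 if block else run + 1
        if run == size:
            start = i - size + 1
            used[start:i + 1] = [True] * size
            return start
    return None


def free(used, start, size):
    """Mark `size` blocks starting at `start` as free again."""
    used[start:start + size] = [False] * size


heap = [False] * 16                                  # 16 equal blocks, all free
starts = [first_fit(heap, 1) for _ in range(16)]     # fill the heap one block at a time
for s in starts[::2]:                                # then free every other block
    free(heap, s, 1)

print(heap.count(False))          # 8 blocks are free in total
print(largest_free_run(heap))     # but the largest contiguous run is 1
print(first_fit(heap, 2))         # None: a 2-block request fails
```

Half of the heap is free, yet nothing larger than one block can be allocated, which is the fragmentation problem in miniature.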
You should look at virtual memory and TLB (translation lookaside buffer) or paging.
It is not trivial to implement virtual memory and paging. The performance of your whole system depends on it. If it's not done properly your system will thrash.
It is early morning here so Wikipedia will have to do for now: https://en.m.wikipedia.org/wiki/Translation_lookaside_buffer
EDIT:
Those coloured spots you saw in your defrag were chunks on your HDD. Each chunk is of some specified size. Depending on how fragmented your HDD is, you might have portions of your HDD that look like this:
*-*-***-***-*
where * means full, and - means empty
This (above) could be part of one application/file or multiple files; I will assume one file is split across those to simplify my example. At the end of each * there is a pointer to the next location where the next * chunk is (this is called a linked list). The more fragmented your HDD is (or memory) the more of these pointers to next chunk you will have. This in turn uses more space for next pointers instead of using space for data and the result is more overhead when reading that data. If this is a file on disk, you will have multiple seeks (which are bad because they're slow) if your data is not grouped together (locality principle). When you use defrag, it moves and groups all chunks together (as best as it can).
*-*-***-***-*
becomes
*********----
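The before/after picture can be modelled in a few lines of Python. This is only a sketch of the diagram, assuming a single file and using the same `*` and `-` notation; `fragments` and `defrag` are made-up helper names, and a real defragmenter also has to keep each file's chunks in order and update the file system's bookkeeping.

```python
def fragments(layout):
    """Count runs of consecutive full ('*') chunks; each separate run costs a seek."""
    return len([run for run in layout.split("-") if run])


def defrag(layout):
    """Move all full chunks to the front, leaving the free chunks at the end."""
    full = layout.count("*")
    return "*" * full + "-" * (len(layout) - full)


before = "*-*-***-***-*"
after = defrag(before)
print(after)                                   # *********----
print(fragments(before), fragments(after))     # 5 1
```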
The OS decides paging and virtual memory addressing (and such). The TLB is a piece of hardware (a cache) that aids this process: it caches recent virtual-to-physical address translations for fast lookup. The TLB is part of the MMU, which the CPU consults to translate addresses.
To answer your questions
You should study operating systems.
Yes, the locations where your files are placed on the HDD are decided by the OS. If you deleted a file and downloaded it again, there is no guarantee it would be placed in the same location; most likely it would not.
A nice summary of how all the components and principles I mentioned work: Click Here. It's a ppt with slides from a Real Time Operating Systems book (if I'm not mistaken, the exact same one I used).

Pure segmentation

I'm a bit confused about pure segmentation because the idea of virtual memory has always existed in my head.
But as I understand it, pure segmentation also means imagining a virtual address space, divided into segments that are ALL loaded in RAM.
The difference with virtual memory with segmentation is that there may be some segment that is not in RAM.
Is this correct?
ADDITIONAL QUESTION:
Is there a practical difference between segmentation combined with paging and two-level paging? It seems the same except for the "limit" protection of the segment method. Or is there another difference?
No, it's not correct. For example, on x86, segmentation uses "far" pointers that consist of two parts: the segment selector (loaded into a segment register, e.g., DS) and an offset into the segment. Segment offsets always begin at 0. The CPU uses the segment selector to find the segment descriptor which contains the segment's LINEAR base address, length and access rights. All accesses are length-checked; if you try to access memory outside of the segment limit or with invalid access (e.g., writing to a read-only segment), the CPU will generate a general protection fault.
Since segment addresses are always zero-based and the segment base is implicit in the segment selector, the OS can move segments around and defragment memory without affecting the programs using that data. (Contrast this with the "flat" memory model where if you move some data, you also have to update all pointers pointing to it.)
Now, when paging is disabled, the LINEAR segment base address is its physical memory address. When paging is enabled, all accesses to segment data are translated by the MMU as usual.
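As a rough illustration of the selector-plus-offset lookup described above, here is a Python sketch. It is heavily simplified: the `SegmentDescriptor` fields, the `descriptors` table and its values, and the `to_linear` helper are all hypothetical, and real x86 descriptors carry more fields (privilege levels, granularity, type bits) than shown here.

```python
from dataclasses import dataclass


class GeneralProtectionFault(Exception):
    """Stands in for the CPU's general protection fault."""


@dataclass
class SegmentDescriptor:
    base: int          # linear base address of the segment
    length: int        # valid offsets are 0 .. length - 1
    writable: bool     # simplified access rights


# Hypothetical descriptor table, indexed by segment selector.
descriptors = {
    0x08: SegmentDescriptor(base=0x0010_0000, length=0x4000, writable=False),
    0x10: SegmentDescriptor(base=0x0020_0000, length=0x1000, writable=True),
}


def to_linear(selector, offset, write=False):
    """Translate a far pointer (selector:offset) into a linear address."""
    seg = descriptors.get(selector)
    if seg is None:
        raise GeneralProtectionFault("invalid segment selector")
    if not 0 <= offset < seg.length:
        raise GeneralProtectionFault("offset outside segment limit")
    if write and not seg.writable:
        raise GeneralProtectionFault("write to read-only segment")
    return seg.base + offset


print(hex(to_linear(0x10, 0x0FFF, write=True)))   # 0x200fff
```

If the OS moves a segment, only `base` in its descriptor changes; the far pointers held by the program stay valid because their offsets are still relative to zero.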
If you're serious about understanding memory management at this level, an excellent explanation can be found by reading Operating System Concepts by Silberschatz, Galvin, and Gagne. You should be able to find an inexpensive, older edition.

How does a stack memory increase?

In a typical C program, the Linux kernel provides 84K - ~100K of memory for the stack. How does the kernel allocate more memory for the stack when the process uses up the given memory?
IMO, when the process takes up all the memory of the stack and then uses the next contiguous memory, it should ideally page fault, and the kernel then handles the page fault.
Is it here that the kernel provides more memory to the stack for the given process, and which data structure in the Linux kernel identifies the size of the stack for the process?
There are a number of different methods used, depending on the OS (Linux realtime vs. normal) and the language runtime system underneath:
1) dynamic, by page fault
Typically, a few real pages are preallocated at higher addresses and the initial sp is set to point there. The stack grows downward, the heap grows upward. If a page fault happens somewhat below the stack bottom, the missing intermediate pages are allocated and mapped, effectively growing the stack from the top towards the bottom automatically. There is typically a maximum up to which such automatic allocation is performed, which may or may not be specified in the environment (ulimit), in the exe header, or dynamically adjusted by the program via a system call (setrlimit). This adjustability in particular varies heavily between different OSes. There is also typically a limit on "how far away" from the stack bottom a page fault may be and still be considered OK and trigger automatic growth. Notice that not every system's stack grows downward: under HP-UX it grows (or used to grow?) upward, so I am not sure what Linux on PA-RISC does (can someone comment on this?).
2) fixed size
Other OSes (especially in embedded and mobile environments) either have fixed sizes by definition, or sizes specified in the exe header, or specified when a program/thread is created. Especially in embedded real-time controllers, this is often a configuration parameter, and individual control tasks get fixed stacks (to avoid runaway threads taking the memory of higher-priority control tasks). Of course, in this case too, the memory might be allocated only virtually until really needed.
3) pagewise, spaghetti and similar
Such mechanisms tend to be forgotten, but are still in use in some runtime systems (I know of Lisp/Scheme and Smalltalk systems). These allocate and grow the stack dynamically as required, not as a single contiguous segment but as a linked chain of multi-page chunks. This requires different function entry/exit code to be generated by the compiler(s) in order to handle segment boundaries. Therefore such schemes are typically implemented by a language support system and not by the OS itself (as used to be the case in earlier times - sigh). The reason is that when you have many (say 1000s of) threads in an interactive environment, preallocating say 1 MB each would simply fill your virtual address space, and you could not support a system where the stack needs of an individual thread are not known beforehand (which is typically the case in a dynamic environment, where the user might enter eval-code into a separate workspace). So dynamic allocation as in scheme 1 above is not possible, because there would be other threads with their own stacks in the way. The stack is made up of smaller segments (say 8-64k), which are allocated and deallocated from a pool and linked into a chain of stack segments. Such a scheme may also be required for high-performance support of things like continuations, coroutines, etc.
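The chained-segment idea in scheme 3 can be sketched in Python as follows. This is a toy model only: a Python list stands in for the linked chain, `chunk_slots` stands in for the 8-64k segment size, and the `SegmentedStack` class is invented for illustration; a real runtime does this in compiler-generated function entry/exit code.

```python
class SegmentedStack:
    """Stack built from a chain of fixed-size chunks instead of one contiguous region."""

    def __init__(self, chunk_slots=4):
        self.chunk_slots = chunk_slots     # slots per chunk (stand-in for e.g. 8-64k)
        self.chunks = [[]]                 # chain of chunks; the last one is active

    def push(self, value):
        if len(self.chunks[-1]) == self.chunk_slots:
            self.chunks.append([])         # active chunk is full: link in a new one
        self.chunks[-1].append(value)

    def pop(self):
        if not self.chunks[-1] and len(self.chunks) > 1:
            self.chunks.pop()              # active chunk is empty: give it back
        return self.chunks[-1].pop()       # raises IndexError if the stack is empty


stack = SegmentedStack(chunk_slots=4)
for i in range(10):
    stack.push(i)
print(len(stack.chunks))                   # 3 chunks hold 10 items (4 + 4 + 2)
```

Because each chunk can live anywhere, thousands of such stacks can coexist without reserving a large contiguous range of address space for each one.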
Modern Unixes/Linuxes and (I guess, but am not 100% certain) Windows use scheme 1) for the main thread of your exe, and 2) for additional (p-)threads, which need a fixed stack size given by the thread creator initially. Most embedded systems and controllers use fixed (but configurable) preallocation (even physically preallocated in many cases).
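For scheme 1, the decision the fault handler has to make can be modelled like this. It is a simplified Python sketch, not kernel code: the `GrowableStack` class, the 8 MB cap and the 64 KB "how far away" limit are all assumed values chosen for illustration.

```python
PAGE_SIZE = 4096


class GrowableStack:
    """Toy model of scheme 1: the stack region grows downward on nearby page faults."""

    def __init__(self, top, initial_pages=2,
                 max_bytes=8 * 1024 * 1024, max_gap=64 * 1024):
        self.top = top                               # highest address (exclusive)
        self.low = top - initial_pages * PAGE_SIZE   # lowest mapped address
        self.max_bytes = max_bytes                   # assumed ulimit-style cap
        self.max_gap = max_gap                       # assumed "how far away" limit

    def handle_fault(self, addr):
        """Return True if the access is OK, growing the stack if necessary."""
        if self.low <= addr < self.top:
            return True                              # already inside the stack region
        if addr >= self.top or self.low - addr > self.max_gap:
            return False                             # not treated as a stack access
        new_low = addr - addr % PAGE_SIZE            # round down to a page boundary
        if self.top - new_low > self.max_bytes:
            return False                             # would exceed the stack limit
        self.low = new_low                           # map the missing pages in between
        return True


stack = GrowableStack(top=0x8000_0000)
print(stack.handle_fault(0x7FFF_D123))   # True: just below the stack, so it grows
print(stack.handle_fault(0x7000_0000))   # False: too far away from the stack
```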
The stack for a given process has a limited, fixed size. The reason you can't add more memory as you (theoretically) describe is that the stack must be contiguous, and it grows toward the heap. So, when the stack reaches the heap, no extension is possible.
The stack size for a userland program is not determined by the kernel. The kernel stack size is a configuration option for the kernel (usually 4k or 8k).
Edit: if you already know this, and were merely talking about the allocation of physical pages for a process, then you have the procedure down already. But there's no need to keep track of the "stack size" like this: the virtual pages in the stack with no pagetable entries are just normal overcommitted virtual pages. Physical memory will be granted on their first access. But the kernel does not have to overcommit memory, and thus a stack will probably have complete physical realization when the executable is first loaded.
The stack can only be used up to a certain length, because it has a fixed storage capacity in memory. If your question is in which direction the stack is used up, the answer is downwards: it is filled down in memory towards the heap. The heap is a dynamic component of memory that can actually grow from the bottom up, based on your data storage needs.