How do we compare ASLR across different Operating Systems?

How do we compare ASLR across different Operating Systems? - aslr

Address space layout randomization (ASLR) is a computer security technique which involves randomly arranging the positions of key data areas, usually including the base of the executable and position of libraries, heap, and stack, in a process's address space.
This is the description from Wikipedia.
Is it fair to estimate effectiveness of ASLR in two different OS by estimating the "randomness" of the positions of the key data areas?
Are there any other measurable factors by which we can compare ASLR effectiveness?
Any tips on how to proceed?

I can think of several parameters that ASLR can implement in different degrees.
The variance of address randomization
The spatial "stickiness" of this randomization - how long do the address remain fixed
The temporal "stickiness" of this randomization - in how many objects e.g. processes does the address stay the same
The performance penalty during exe load
The runtime performance penality
The degree to which it break legacy coding practices (how ever bad they maybe)
It reliance on hardware features (memory protection, virtual memory)
The "wastage" of virtual address space
A requirement for recompilation of kernel or lack there of
A requirement for recompilation of other system support modules

Related

Can a 32-bit processor load a 64-bit memory address using multiple blocks or registers?

I was doing a little on 32-bit microprocessors and have I have learnt that:
1) A 32-bit microprocessor can only address 2^32 bits of memory which means that the memory pointer size should not exceed 32-bit range i.e. the pointer size should be equal to or less than 32-bit.
2) I also came to know that CPU allocate multiple blocks of memory for things like storing numbers and text, that is up to the program and not related to the size of each address (Source:here).So is it possible that a CPU can use multiple blocks (registers) to store pointers more than 32-bit in size?

Processors can access an essentially unlimited amount of memory by using variations on a technique called bank switching. In a simple bank-switching scheme, the memory chips that are wired to a portion of the address space will have some address inputs fed by the processor and some from an external latching device. Historically, the IBM PC had a 1MB address space, but an expanded memory board would IIRC allow two 16KB regions of that space to be mapped to any of dozens or hundreds of 16KB blocks of memory contained thereon. Nowadays processors generally have a memory-management unit built-in, which maps 4KB or 64KB blocks of memory to any address within a much larger space, and additional circuitry may, with OS support, expand things further.
The big difficulty with bank switching is that any given address might identify many different places in memory depending upon how the bank-switching hardware is configured, so accessing data from memories in a banked region will generally be more complicated than accessing data in directly-accessible memory and will only be possible from code which knows how the bank-switching hardware works. Nowadays it's more common to simply use a processor which can access all the memory one needs, but historically bank-switching was often a useful technique for going beyond processor limitations.

You could store a 64 bit pointer using 2 separate locations in memeory. But it probably wouldn't be useful since your processor can only use 32 bit pointers.

Challenges in using flat memory model

The flat memory model(linear memory model) provides maximum execution speed, occupies minimum CPU real estate and has direct access to memory without any segmentation / paging. It seems that flat memory model is ideal for small realtime application or single threaded realtime application.
However, is it possible to use real-time application that is multi-threaded/multi-tasking along with requirement of high resource allocation/protection in flat memory model ?
Thanks

I don't think the memory model has much to do here, except for the (RT)OS itself which you use to get multi-threading / multi-tasking done.
Paging or segmentation, if provided, is useful for the OS primarily for implementing memory protection features. It is only possible this way that the OS may protect itself and running user mode tasks against improperly written code in others which would accidentally write in memory out of their intended domain. (You can't get memory protection without some kind of paging or segmentation since you can't guard every single memory access)
In 32 bit AVR processors there is even a distinction between Memory management unit (MMU) and Memory protection unit (MPU). The first is the more complex unit supporting those kinds of paging features like modern PC processors (for example even making it possible to realize virtual memory), while the latter is a simpler subset only giving you tools for realizing memory protection (for example by the OS, to protect itself and tasks against each other), while it does not have any remapping capability (by a given address you always access the same cell of memory) like the MMU does. (Why the distinction? Because some cheaper AVR32's, where that's sufficient, only have an MPU)
So on a simple flat memory model what important thing you won't get are the protection features. If you can get by without those, it should go just fine.

Why is virtual memory needed in embedded systems? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
Per my understanding, virtual memory is as follows:
Programs/applications/executables reside in a storage device. Storage device access is much slower than RAM. Hence, programs is copied from storage memory to main memory for execution. Since computers have limited main memory (RAM), when all of the RAM is being used (e.g., if there are many programs open simultaneously or if one very large program is in use), a computer with virtual memory enabled will swap data to the HDD and back to memory as needed, thus, in effect, increasing the total system memory.
As far as I know, most embedded devices do not have disk memory (like smartphones or in car infotainment systems). Code is directly executed from Flash memory. RAM is mainly used as a scratchpad area (local variables, return address etc).
So why do we need virtual memory in embedded systems? (e.g. WinCE and QNX support virtual memory)

Your understanding is completely wrong. You are confusing virtual memory with swapping or page files. There are systems that have virtual memory and no swap or page files and there are systems that swap without virtual memory.
Virtual memory just means that a process has a view of memory that is different from the physical mapping. Among other things, it allows processes to have their own virtual address space.
Storage device access is much slower than RAM. Hence programs is copied from storage memory to main memory for execution. Since computers have limited main memory (RAM), when all of the RAM is being used (e.g., if there are many programs open simultaneously or if one very large program is in use), a computer with virtual memory enabled will swap data to the HDD and back to memory as needed, thus, in effect, increasing the total system memory.
That's swapping (or paging). It has nothing to do with virtual memory except that most modern operating systems implement swapping using virtual memory. Swapping actually existed before virtual memory.
I think you're probably incorrect about these devices running code directly from flash memory. The read speed of flash is pretty low and RAM is very cheap. My bet is that most of the systems you mention don't run code directly from flash and instead use virtual memory to fault code into RAM as needed.

embedded systems, the term itself has a wide range of applications. you could call a small microcontroller with flash program space measured in kbytes or less and ram measured in either bits or bytes (not enough to be kbytes) an embedded system. Likewise a tivo running a full blown operating system on a pretty much full blown computer motherboard (replace tivo with xbox as another example) as an embedded system. So you need to be less vague about your question. virtual memory has little to do with any of that its applications cross those boundaries.
There are many answers above, David S has the best of course that virtual memory simply means the memory address on one side of the virtual memory boundary is different than the physical address that is used on the other side of that boundary. Where, how, why, etc is there a boundary varies.
A popular use for virtual memory, and I might argue a primary use case is for operating systems. One benefit is that for example all applications could be compiled for the same address space, all applications might be compiled such that from the programs perspective they all start at say address 0x8000, and as far as that program when it runs and accesses memory it accesses stuff based on that address. A combination of the hardware and the operating system change that virtual address that the program is using to a physical address. If the operating system allows for multitasking, then each task might think they are in the same address space but the physical addresses are different for each of those tasks. I wont elaborate further on why using an assumed, fixed address space, is a benefit. Another aspect that operating systems use is memory management. Many MMU's will let you segment the memory however. If a user wants to allocate 100 Megabytes of memory the program may access in its virtual address space that 100 meg as if it were linear and in that address space it is linear, but that 100 meg might be broken down into say 4Kbyte chunks that are scattered all about the physical address space, not always likely but certainly technically possible that no two chunks of that physical memory is next to any other chunk of that 100 meg. your memory management doesnt necessarily have to try to keep large physical chunks of memory available for applications to allocate. Note not all MMUs are exactly the same and 4Kbytes is just an example. A third major benefit from virtual address space to an operating system is protection. If the application is bound to the virtual address space, it is often quite easy to prevent that application from touching the memory of any other application or the operating system. the application in this case would operate/execute at a proection level such that all accesses are considered virtual and have to go through a translation to physical, the tables that are used to define that virtual to physical can contain protection flags. If the application addresses a memory address in its virtual space that it has no business accessing, the hardware can trap that and let the operating system take action as to how to handle it (virtualize some hardware, pop up an error and kill the app, pop up a warning and not kill the app but at the same time feed the app bogus data for their transaction, etc).
There are lots of ways this can be used in an embedded system. first off many embedded systems run operating systems, so all of the above, ease of compiling the program for the address space, relative ease of memory management, and protection of the other applications and operating system and other benefits not mentioned. (virtualization being one, being able to enable/disable instruction/data caching on a block by block basis is another)
The bottom line though is what David S pointed out. virtual memory simply means the virtual address is not necessarily equal to the physical address, it can be but doesnt have to be, there is some boundary, some hardware, usually table driven, that translates the virtual address into a physical address. Lots of reasons why you would want to do this, since some embedded systems are indistinguishable from non-embedded systems any reason that applies to a non-embedded system can apply to an embedded system.
As much as folks may want you to believe that a system has a flat address space, it is often an illusion. In a microcontroller for example you might have multiple flash banks and one or more ram banks. Each of these banks has a physical, generally zero based address. Even if there is no mmu or anything else like that there is a place somewhere between the address bus on the processor and the address bus on the flash or ram memory that decodes the address on the processor and uses that to address into the specific memory bank. Often the lower bits match and upper bits are responsible for the bank choices (this is often the case with an mmu as well) so in that sense the processor is living in a virtual address space. (not limited to microcontrollers, this is generally how processors address busses are treated) With microcontrollers depending on a pin being pulled high or low or some other mechanism you might have a chip feature that allows one flash bank to be used to boot the processor or another. You might tie an input pin high and the processors built in bootloader allows you to access and debug the system for example reprogram the application flash. Or perhaps tie that line low and boot the application flash instead of the vendors debugger/boot flash. some chips get even more complicated letting you boot one flash then the program writes a register somewhere instantly changing the memory architecture moving things around, for example allowing ram to be used for the interrupt vector table so your application can be changed after boot rather than a vector table in flash that is not as easy to change at will.
now when you talk about virtual memory as far as swapping to and from a disk, that is a trick often employed by operating systems to give the illusion of having more ram. I mentioned that above under the category of virtualization. virtual memory in the sense that it isnt really there, I have X bytes but will let the software think there are Y bytes (where Y is larger than X) available. The operating system through the virtual tables used by the hardware, manages which memory chunks are tied to physical ram and are allowed to complete as is by the hardware, or are marked as not available in some way, causing an exception to the operating system, upon inspection the operating system determines that this is a valid address for this application, but the data behind this address has been swapped to disk. The operating system then finds through some algorithm another chunk of ram belonging to whomever (part of the algorithm) and it copies that chunk of ram to disk, marks the table related to that virtual to physical as not valid, then copies the desired chunk from disk to ram, marks that chunk as valid and lets the hardware complete the memory cycle.
Not any different than say how vmware or other virtual machines work. You can execute instructions natively on the hardware using virtual memory until such time as you cause an exception, the virtual machine might think you have an xyz network interface and might have a driver that is accessing a register in that xyz network interface, but the reality is you have no xyz hardware and/or you dont want the virtual machine applications to access that hardware, so you virtualize it, you trap that register access, and using software that simulates the hardware you fake that access and let the program on the virtual machine continue. This obviously not the only way to do virtual machines, but it is one way if the hardware supports it, to let a virtual machine run very fast as a percentage of the time it is actually running instructions on the hardware. The slowest way to virtualize of course is to virtualize everything including the processor, every instruction in that case would be simulated, this is quite slow but has its own features (virtualizing an arm system on an x86 or x86 on an arm, xyz on an abc, fill in the blanks). And if that is the type of virtual memory you are talking about in an embedded system, well if the embedded system is for the most part indistinguishable from a non-embedded system (an xbox or tivo for example) then well for the same reasons you could allow such a thing. If you were on a microcontroller, well the use cases there would generally mean if you needed more memory you would buy a bigger microcontroller, or add more memory to the system ,or change the needs of the application such that it doesnt need as much memory. there may be exceptions, but it mostly depends on your application and requirements, a general purpose or general purpose like system which allows for applications or their data to be larger than the available ram, will require some sort of solution. the microcontroller in your keyless entry key fob thing or in your tv remote control or clock radio or whatever normally would not have a need to allow "applications" to require more resources than are physically there.

The more important benefit of using virtual memory is that every process gets its own address space which is isolated from every other process's. That way virtual memory helps keep faults contained and improves security and stability. I should note that it is still possible for two processes to share a bit of memory, to facilitate communication (shared mem IPC).
Also you can do other tricks like conserving memory via mapping shared parts into more than one process's (libc comes to mind for embedded use) address space but only having it once in physical mem. Also this gives it a speed boost, you can even enhance it further the way linux does cheapen fork/clone by only copying the in kernel descriptors and leaving the memory image alone up until the first write access is done with a similar idea.
As a last benefit, in modern systems, it's common to do file I/O via mapping the file into the process space (cf. mmap for example).

It's interesting to note that one can get some of the benefits of "virtual memory" without needing a full-fledged MMU. The hardware requirements can sometimes be amazingly light. The PIC 16C505 has a 5-bit address space and 40 bytes of RAM; addresses 0x10 to 0x1F can map to either of two groups of 16 bytes of RAM. When writing an application which needed to manage two different data streams, I arranged so that all the variables associated with one data stream would be in the first group of 16 "switchable" memory locations, and those associated with the other would be at the corresponding addresses in the second group. I could then use the same code to manage both data streams. Simply set the banking bit one way, call the routine, set it the other way, and call the routine again.

One of the reasons Virtual Memory exists is so that your device can multitask. It can also act as your RAM does, thus taking the load off of your physical RAM and swapping the load back and forth.

Memory Map for RTOS

I am looking forward to understand, what purpose a memory map serves in embedded system.
How does the function stack differs here, from normal unix system.
Any insights that can help me debug few memory related crashes for embedded system will be helpful.

Embedded systems, especially real-time ones, often have a lot of statically-allocated data, and/or data placed at specific locations in memory. The memory map tells you where these things are, which can be helpful when you run into problems and need to examine the state of the system. For example, you might dump all of memory and then analyze it after the fact; in such a case, the memory map will be rather handy for finding the objects you suspect might be related to the problem.
On the code side, your system might log a hardware exception that points to the address of the instruction where the exception was detected. Looking up the memory locations of functions, combined with a disassembly of the function, can help you analyze such problems.
The details really depend on what kind of embedded system you're building. If you provide more details, people may be able to give better responses.

I am not sure that I understand the question. You seem to be suggesting that a "memory map" is something unique to embedded systems or that it is a tangible software component. It is neither; it is merely a description of the layout of an application's memory usage.
All applications will have a memory map regardless of platform, the difference is that typically on an embedded system the application is linked as a single monolithic entity, so that the resultant memory layout refers to the entire system rather than an individual process as it might in an application on a GPOS platform.
It is the linker and the linker script that determines memory mapping, and your linker will be able to output a map report file that describes the layout and allocation applied. This is true of embedded and desktop applications regardless of OS or architecture.

The memory map for a RTOS is not that much different than the memory map for any computer. It defines which hardware resides at which of the processor's addresses. That hardware may be RAM, ROM, Flash, serial ports, parallel ports, timers, interrupt vectors, or any number of other parts addressable by the processor.
The memory map also describes how you intend to budget for limited resources such as RAM, ROM, or Flash in your system design.
For instance, if there's multiple tasks running, RAM might be mapped so that each task has it's own specific area of RAM allocated to it.
In turn, each tasks's part of RAM would be mapped so that there are specific areas for the stack, another for static variables, and perhaps more again for heap(s).
When you have an operating system on the target, it looks after a lot of this dynamically. However, if your application is the only software on the device, you'll have to manage these decisions yourself, usually at compile/link time. Search "link scripts" for further clues,

The Memory map is a layout of memory of system. It is present in both embedded systems and normal applications. Though it is present in normal applications, it's usage is well appreciated in embedded systems due to system constraints.
Memory map is managed by means of linker scripts or linker command files. It maps resources like Flash or Internal RAM(L1P,L1D,L2,L3) or External RAM(DDR) or ROM or peripherals (ports,serial,parallel,USB etc) or specific device registers or I/O ports with appropriate fixed addresses in the memory space of the system.
In case of embedded systems, based on the memory configuration or constraints of board and performance requirements, the segments like text segment or data segment or BSS can also be placed in the appropriate memory of choice.
There are occasions where various versions of development boards will have different configurations of memory and peripherals. In that case, we may need to edit the linker scripts according to memory configuration and peripherals of the board as an essential check-point in board bring-up.
Memory map can help in defining the shared memory too that can play a key role in multi-threaded applications and also for multi-core applications.
Crashes can be debugged by back tracing the address of crash and mapping it to the memory of the system to get an high level idea of the possible library or object causing the problem.

Which kinds of low level facilities aren't typically supported on multi-core machines?

I'm looking at some optimized, low level, cross platform, concurrency code designed to run on multi-core machines, and want to check some of its assumptions.
Support for hardware optimizations of some kinds aren't, probably, supported on multi core designs (for example, Out of Order Execution support [wikipedia] seems like a good candidate - it takes a lot of surface area to implement, and can be a power hog). Does anyone have a list of other such facilities - ones typically available on single or small number of core machines, but typically left out from machines with larger number of cores on them?

Today, multicore machines are warmed-over die shrinks of uniprocessors. You could almost imagine sawing a 4-core die into 4 1-core dice. I exaggerate only a little bit.
In future, multicore machines will be more thoughtfully designed for energy efficiency and area efficiency. You may see the same ISA, but with different mixes of resources (more or fewer numbers of duplicated functional units), and even with some sharing of resources between cores (e.g. AMD Bulldozer). And, as you say, backing off from the complexity and energy overhead of no-holds-barred out-of-order execution. This will most likely be perceived as different instruction-per-clock (IPC) differences (more or less performance) on the same instruction set architecture.
Also as vendors have to juggle a hypothetical portfolio of big out-of-order serial performance optimized cores and small in-order or less-out-of-order (OoO) and narrower, more energy efficient "throughput" cores, they will be challenged to keep these different implementations in sync with the evolutions of their ISAs. Some cores may support new instructions, new state, new coprocessors, virtualization, security, etc. earlier than others. This leads to a challenge of coding to the common denominator while also lighting up the new facilities for better perf or energy efficiency (or whatever) on those cores that have the new capabilities.
So to answer your specific question, all the traditional computer architecture techniques for trading gates for expressive-power, or performance, or energy efficiency may be rethought and selectively removed in future small throughput-oriented cores.
Hardware multithreading
Aggressive OoO -> humble OoO or even in-order execution
High degrees of microarchitectural speculation
Fancy branch predictors
Big TLBs
Fancy memory prefetchers
Deep pipelines
Wide issue / many copies of functional units
Big caches, wide buses to caches
...
But it goes both ways. It may also be that the new small throughput-optimized energy-optimized cores have new features not present in the older OoO cores. For example, the Larrabee New Instructions (LRBni) (http://www.drdobbs.com/high-performance-computing/216402188) were proposed for a machine with dozens of simpler cores. As another example, the small cores may turn to hardware multithreading to afford better memory latency tolerance to compensate for smaller private caches.
Also, having lots of small energy frugal cores means you may be willing to dedicate and therefore customize some of the cores to optimize performance for particular valuable workloads. For example, the Tensilica custom processors and tools anticipate that some of your small cores will have additional instructions and custom problem-specific datapaths (accelerating an inner loop of video decoding, for example). So in these cases the little core may (counter-intuitively) have much better performance than the much larger core.
Makes sense?
Happy hacking!

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas