Basic ELF file toplogy - elf

I'm writing the structure of an Elf file, a sort of summary or topology. For a little report, I have, and I'm posting here because I am a bit confused. Have I understood it correctly?
To explain a bit what Im thinking
An ELF file consist of two parts
1: ELF header
2: File data
The file data consist of 3 parts, The program header, the section header and Data, now what confuses me is, is the arrows supposed to be like this? A file data consist of 3 parts, but after my understanding this is the process.
source: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#/media/File:Elf-layout--en.svg
] the figure Im trying to make, to make it easier to understand of the WHOLE ELF file rather than the only as it shown on wikipedia.

An ELF file consist of two parts 1: ELF header 2: File data
In my opinion it's wrong to think about ELF as having only two parts. An ELF file is structured, with many optional parts.
You can think of it as a folder with table of contents (ELF header) describing other subsections (section headers, program headers) which in turn describe file data. Many parts of this folder can be removed without invalidating the entire file, and some parts can't be present for some types of files, while others must be present.
For example, an ET_EXEC file must contain program headers, and may contain section headers (section headers can be removed with the strip command).
In your picture, I would call "File Data" what you called "Data". The other parts: program headers, section headers are all "ELF metadata", not very different from the ELF header.

Related

How to inject data in a .bin file in a post compilation script?

Purpose
I want my build system to produce one binary file that includes:
The bootloader
The application binary
The application header (for the bootloader)
Here's a small overview of the memory layout (nothing out of the ordinary here)
The build system already concatenates the bootloader and the application in a post-compilation script.
In other words, only the header is missing.
Problem
What's the best way to generate and inject the application header in the memory?
Possible solutions
Create a .bin file just for the header and use cat to inject it in my final binary
Use linker file to hardcode the header (is this possible?)
Use a script to read the final binary and hardcode the header
Other?
What is the best solution for injecting data in memory in a post compilation script?
SRecord is a great tool for doing all kinds of manipulation on binary and other file types used for embedded code images.
In this case, given a binary bootheader.bin to insert at offset 0x8000 in image.bin:
srec_cat bootheader.bin −binary −offset 0x8000 −o image.bin
The tool is somewhat arcane, but the documentaton includes numerous examples covering various common tasks.

Elf representation in HEX

I am working on understanding some ground concepts in embedded Systems. My question is similar to understand hexedit of an elf .
In order to burn compiler output to ROM, the .out file is converted to HEX (say intel-hex). I wonder how the following informations are preserved in HEX format:
Section header
Symbol tables, debug symbols, linker symbols etc.
Elf header.
If these are preserved in HEX file, how they can be read from hex file?
A bit out question but how the microcontroller on boot knows where .data .bss etc. exists in HEX and to be coppied to RAM?
None of that is preserved. A HEX file only contains the raw program and data. https://en.wikipedia.org/wiki/Intel_HEX
The microcontroller does not know where .data and .bss are located - it doesn't even know that they exist. The start-up code which is executed before main() is called contains the start addresses of those sections - the addresses are hard-coded into the program. This start-up code will be in the HEX file like everything else.
The elements in points 1 to 3 are not included in the raw binary since they serve no purpose in the application; rather they are used by the linker and the debugger on the development host, and are unnecessary for program execution where all you need is the byte values and the address to write them to, which is more or less all the hex file contains (may also contain a start address record).
Systems that have dynamic linking or self-hosted debug capabilities (such as VxWorks for example) use the object file file.
With respect to point 5, the microcontroller does not need to know; the linker uses that information when resolving absolute and relative addresses in the object code. Once filly resolved (linked), the addresses are embedded in the code directly. Again where dynamic loading/linking is used the object file meta-data is required and such systems do not normally load a raw hex file or binary.

Possible to add executable section/segment to ELF binary?

It's easy to add an empty section with objcopy --add-section.
However, I'd like the added section can be loaded as normal .text section and executable. Which means need modify segment header. Any suggestion?
Possible to add executable section/segment to ELF binary?
It's possible in theory, but not in practice: ELF files have complicated internal structures, which would all need to be rebuilt.
Which means need modify segment header
Modifying Phdr table is quite easy: it's just a fixed table. But you'll have to move other segments, and update all internal offsets that point into them, which is the hard part.

ELF files - What is a section and why do we need it?

I've been reading ELF standard here. From what I understand, each ELF contains ELF header, program headers (why more than one?) and section headers. Can anyone please explain:
How are ELF files generated? is it the compiler responsibility?
What are sections and why do we need them?
What are program headers and why do we need them?
Inside program headers, what's the meaning of the fields p_vaddr and p_paddr?
Does each section have it's own section header?
Alternatively, does any one have a link to a more friendly documenation of ELF?
How are ELF files generated? is it the compiler responsibility?
They can be generated by a compiler, an assembler, or any other tool that can generate them. Even your own program you wrote for generating ELF files ;) They're just streams of bytes after all, so they can be generated by just writing bytes into a file in binary mode. You can do that too.
What are sections and why do we need them?
ELF files are subdivided into sections. Sections are the smallest continuous regions in the file. You can think of them as pages in an organizer, each with its own name and type that describes what does it contain inside. Linkers use this information to combine different parts of the program coming from different modules into one executable file or a library, by merging sections of the same type (gluing pages together, if you will).
In executable files, sections are optional, but they're usually there to describe what's in the file and where does it begin, and how much bytes does it take.
What are program headers and why do we need them?
They're mostly for making executable files. In order to run a program, sections aren't enough, because you have to specify not only what's there in the file, but also where should it be loaded into memory in the running process. Program headers are just for that purpose: they describe segments, which are regions of memory in the running process, with different access privileges & stuff.
Each program header describes one segment. It tells the loader where should it load a certain region in the file into memory and what permissions should it set for that region (e.g. should it be allowed to execute code from it? should it be writable or just for reading?)
Segments can be further subdivided into sections. For example, if you have to specify that your code segment is further subdivided into code and static read-only strings for the messages the program displays. Or that your data segment is subdivided into funky data and hardcore data :J It's for you to decide.
In executable files, sections are optional, but it's nice to have them, because they describe what's in the file and allow for dumping selected parts of it (e.g. with the objdump tool). Sometimes they are needed, though, for storing dynamic linking information, symbol tables, debugging information, stuff like that.
Inside program headers, what's the meaning of the fields p_vaddr and p_paddr?
Those are the addresses at which the data in the file will be loaded. They map the contents of the file into their corresponding memory locations. The first one is a virtual address, the second one is physical address.
Physical addresses are the "raw" memory addresses. On modern operating systems, those are no longer used in the userland. Instead, userland programs use virtual addresses. The operating system deceives the userland program that it is alone in memory, and that the entire address space is available for it. Under the hood, the operating system maps those virtual addresses to physical ones in the actual memory, and it does it transparently to the program.
Of course, not every address in the virtual address space is available at the same time. There are limitations imposed by the actual physical memory available. So the operating system just maps the memory for the segments the program actually uses (here's where the "segments" part from the ELF file's program headers comes into play). If the process tries to access some unmapped memory, the operating system steps in and says, "sorry, chap, this memory doesn't belong to you". (The program can address it, but it cannot access it.)
Does each section have it's own section header?
Yes. If it doesn't have an entry in the Section Headers Table, it's not a section :q Because they only way to tell if some part of the file is a section, is by looking in to the Section Headers Table which tells you what sections are defined in the file and where you can find them.
You can think of the Section Headers Table as a table of contents in a book. Without the table of contents, there aren't any chapters after all, because they're not listed anywhere. The book may have headings, but the content is not subdivided into logical chapters that can be found through the table of contents. Same goes with sections in ELF files: there can be some regions of data, but you can't tell without the "table of contents" which is the SHT.
This link includes a better explaination.
How are ELF files generated? is it the compiler responsibility?*
It is architecture dependent.
What are sections and why do we need them?
Different section have different information such as code, initialized data, uninitialized data etc. These information will be used by the compiler and linker.
What are program headers and why do we need them?
Program headers are used by the operating system when it loads the executable. These headers contains information about the segments (contiguous memory block with some permissions) such as which parts needs to be loaded, interpreter infor etc.
Inside program headers, what's the meaning of the fields p_vaddr and p_paddr?
In general virtual address and the physical address are same. But could be different depends on the system.
Does each section have it's own section header?
yes. Each section have a section header entry at section header table.
This is the best documentation I've found: http://www.skyfree.org/linux/references/ELF_Format.pdf
Each section has only one section header, but there can be section headers without sections
2 - There are many different sections, ex: relocation section recoeds many infomation for relocation symbol. I use the infomation to load a elf object and run/relocate the object.
Antoher example: debug section records debug information, gdb use the data for showing debug message.
Symbol section records symbol information.
3 - programming header used by loader, loader loads a elf execute file by looking up programming header.

View/Edit File Headers

I'm conducting reserach on malformed archives and this one exploit ( CVE-2012-1459 ) talkes about manipulating the header information of the archive. I have a rough understanding of all of this and would like to clear it up.
Every file has header information that contains byte length and other simular stuff?
If the above is incorrect. What is a header?
How is it possible to view/edit this information?
( Any information about headers is helpfull. Information online is not clear )
Many file formats have headers. If you wanted to look at the headers for a specific file, the best option would be to grab a thorough documentation of the format, a hex editor, a calculator that does hex to dec and vice versa conversions, and a notepad for sketching stuff on. Oftentimes, they are fairly in depth and have many levels of indirection, and understanding exactly how they fit together is fairly daunting. Some files may have many nested headers. Archives are an example: they typically have one header with information about the entire archive, and headers inside, one for each file, with details about the contents.