Elf representation in HEX

Elf representation in HEX - embedded

I am working on understanding some ground concepts in embedded Systems. My question is similar to understand hexedit of an elf .
In order to burn compiler output to ROM, the .out file is converted to HEX (say intel-hex). I wonder how the following informations are preserved in HEX format:
Section header
Symbol tables, debug symbols, linker symbols etc.
Elf header.
If these are preserved in HEX file, how they can be read from hex file?
A bit out question but how the microcontroller on boot knows where .data .bss etc. exists in HEX and to be coppied to RAM?

None of that is preserved. A HEX file only contains the raw program and data. https://en.wikipedia.org/wiki/Intel_HEX
The microcontroller does not know where .data and .bss are located - it doesn't even know that they exist. The start-up code which is executed before main() is called contains the start addresses of those sections - the addresses are hard-coded into the program. This start-up code will be in the HEX file like everything else.

The elements in points 1 to 3 are not included in the raw binary since they serve no purpose in the application; rather they are used by the linker and the debugger on the development host, and are unnecessary for program execution where all you need is the byte values and the address to write them to, which is more or less all the hex file contains (may also contain a start address record).
Systems that have dynamic linking or self-hosted debug capabilities (such as VxWorks for example) use the object file file.
With respect to point 5, the microcontroller does not need to know; the linker uses that information when resolving absolute and relative addresses in the object code. Once filly resolved (linked), the addresses are embedded in the code directly. Again where dynamic loading/linking is used the object file meta-data is required and such systems do not normally load a raw hex file or binary.

Related

use program or section headers to load an ELF

I'm writing an EFI application that loads an ELF into memory and jumps to it, but I don't know what header I should analyse first (program or section header). I have a function that reads the program headers to load the ELF into memory (which works) and a function that reads the section headers to load the ELF into memory (which also works).

The program loader should look at the program header only. The section headers are for tools such as debuggers. I don't think this is spelled out explicitly in the original ELF specification or the System V ABI specification, but it is very much implied:
System V Application Binary Interface
Even today, when new features are defined which are used by the dynamic linker, references are added the dynamic to the dynamic section, even though in theory, the information could also be obtained from the section header (but there are probably some exceptions for certain architectures).

Trace32 command to read symbol contents from ELF file

Problem scenario:
In simple words, do we have a Trace32 command to read symbols (and its contents) from ELF file that was loaded on to target ? We have this special case where application specific debug symbols of the ELF file are made as part of '.noload' section in ELF, which means the symbols/contents are present part of the ELF file (available when read using readelf -a xxxx.elf_file_name) but are not part of the final binary image generated i.e. the '.noload' section in ELF file is stripped away when generating xxx.bin which is flashed to target memory.
Debug symbols in '.noload' section are statically assigned values and these values do not change during runtime.
When I tried to read the debug symbols part of the '.noload' section (after compiling into binary and loading onto Trace32), I see 'MMU fail' flagged on trace32 popup window which means trace32 is trying to read symbol contents from memory but is not accessible, since symbols part of the '.noload' section was not loaded at all though they have addresses mapped.
Any inputs:
- I need help with a trace32 command that can directly read symbol content from ELF file than from target memory.
- Also not sure if I can use 'readelf ' in practice scripts ? Any help in this direction if we do not have any solution for above query ?

Use command
Data.LOAD.Elf myfile.elf [<optional address offset>] /NoCODE
The option /NoCODE instructs TRACE32 to only load the debgug symbols from your ELF but not to load any code to your target. You can than view the symbols with command sYmbol.Browse.
However if you use TRACE32 to load your application to your target, you don't have to create a binary from you ELF first. With TRACE32 you can also load the PROGBITS sections of your ELF directly to your target.
In this case you would simply use the Data.LOAD.Elf command without the /NoCODE option (after enabling flash programming).
Since you are using an MMU you might want to activate logical memory space IDs with command SYStem.Option.MMUSPACES ON. Then load your symbols with
Data.LOAD.Elf myfile.elf <space-ID>:<offset> /NoCODE
where 'space-ID' matches with the space-ID used by you MMU for the Task and 'offset' is usually zero.
If you are debugging your application on an embedded Linux than you should use the TRACE32 OS awareness for Linux and the Linux symbol auto-loader to load the symbols to the correct addresses for you.
I don't think there is any reason why you should use 'readelf' from within TRACE32. Anyway you can invoke any command line program with commands OS.Area or OS.Command.

ELF files - What is a section and why do we need it?

I've been reading ELF standard here. From what I understand, each ELF contains ELF header, program headers (why more than one?) and section headers. Can anyone please explain:
How are ELF files generated? is it the compiler responsibility?
What are sections and why do we need them?
What are program headers and why do we need them?
Inside program headers, what's the meaning of the fields p_vaddr and p_paddr?
Does each section have it's own section header?
Alternatively, does any one have a link to a more friendly documenation of ELF?

How are ELF files generated? is it the compiler responsibility?
They can be generated by a compiler, an assembler, or any other tool that can generate them. Even your own program you wrote for generating ELF files ;) They're just streams of bytes after all, so they can be generated by just writing bytes into a file in binary mode. You can do that too.
What are sections and why do we need them?
ELF files are subdivided into sections. Sections are the smallest continuous regions in the file. You can think of them as pages in an organizer, each with its own name and type that describes what does it contain inside. Linkers use this information to combine different parts of the program coming from different modules into one executable file or a library, by merging sections of the same type (gluing pages together, if you will).
In executable files, sections are optional, but they're usually there to describe what's in the file and where does it begin, and how much bytes does it take.
What are program headers and why do we need them?
They're mostly for making executable files. In order to run a program, sections aren't enough, because you have to specify not only what's there in the file, but also where should it be loaded into memory in the running process. Program headers are just for that purpose: they describe segments, which are regions of memory in the running process, with different access privileges & stuff.
Each program header describes one segment. It tells the loader where should it load a certain region in the file into memory and what permissions should it set for that region (e.g. should it be allowed to execute code from it? should it be writable or just for reading?)
Segments can be further subdivided into sections. For example, if you have to specify that your code segment is further subdivided into code and static read-only strings for the messages the program displays. Or that your data segment is subdivided into funky data and hardcore data :J It's for you to decide.
In executable files, sections are optional, but it's nice to have them, because they describe what's in the file and allow for dumping selected parts of it (e.g. with the objdump tool). Sometimes they are needed, though, for storing dynamic linking information, symbol tables, debugging information, stuff like that.
Inside program headers, what's the meaning of the fields p_vaddr and p_paddr?
Those are the addresses at which the data in the file will be loaded. They map the contents of the file into their corresponding memory locations. The first one is a virtual address, the second one is physical address.
Physical addresses are the "raw" memory addresses. On modern operating systems, those are no longer used in the userland. Instead, userland programs use virtual addresses. The operating system deceives the userland program that it is alone in memory, and that the entire address space is available for it. Under the hood, the operating system maps those virtual addresses to physical ones in the actual memory, and it does it transparently to the program.
Of course, not every address in the virtual address space is available at the same time. There are limitations imposed by the actual physical memory available. So the operating system just maps the memory for the segments the program actually uses (here's where the "segments" part from the ELF file's program headers comes into play). If the process tries to access some unmapped memory, the operating system steps in and says, "sorry, chap, this memory doesn't belong to you". (The program can address it, but it cannot access it.)
Does each section have it's own section header?
Yes. If it doesn't have an entry in the Section Headers Table, it's not a section :q Because they only way to tell if some part of the file is a section, is by looking in to the Section Headers Table which tells you what sections are defined in the file and where you can find them.
You can think of the Section Headers Table as a table of contents in a book. Without the table of contents, there aren't any chapters after all, because they're not listed anywhere. The book may have headings, but the content is not subdivided into logical chapters that can be found through the table of contents. Same goes with sections in ELF files: there can be some regions of data, but you can't tell without the "table of contents" which is the SHT.

This link includes a better explaination.
How are ELF files generated? is it the compiler responsibility?*
It is architecture dependent.
What are sections and why do we need them?
Different section have different information such as code, initialized data, uninitialized data etc. These information will be used by the compiler and linker.
What are program headers and why do we need them?
Program headers are used by the operating system when it loads the executable. These headers contains information about the segments (contiguous memory block with some permissions) such as which parts needs to be loaded, interpreter infor etc.
Inside program headers, what's the meaning of the fields p_vaddr and p_paddr?
In general virtual address and the physical address are same. But could be different depends on the system.
Does each section have it's own section header?
yes. Each section have a section header entry at section header table.

This is the best documentation I've found: http://www.skyfree.org/linux/references/ELF_Format.pdf
Each section has only one section header, but there can be section headers without sections

2 - There are many different sections, ex: relocation section recoeds many infomation for relocation symbol. I use the infomation to load a elf object and run/relocate the object.
Antoher example: debug section records debug information, gdb use the data for showing debug message.
Symbol section records symbol information.
3 - programming header used by loader, loader loads a elf execute file by looking up programming header.

What is the difference in byte code like Java bytecode and files and machine code executables like ELF?

What are the differences between the byte code binary executables such as Java class files, Parrot bytecode files or CLR files and machine code executables such as ELF, Mach-O and PE.
what are the distinctive differences between the two?
such as the .text area in the ELF structure is equal to what part of the class file?
or they all have headers but the ELF and PE headers contain Architecture but the Class file does not
Java Class File
Elf file
PE File

Byte code is, as imulsion noted, an intermediate step, right before compilation into machine code. Because the last step is left to load time (and often runtime, as is the case with Just-In-Time (JIT) compilation, byte code is architecture independent: The runtime (CLR for .net or JVM for Java) is responsible for mapping the byte code opcodes to their underlying machine code representation.
By comparison, native code (Windows: PE, PE32+, OS X/iOS: Mach-O, Linux/Android/etc: ELF) is compiled code, suited for a particular architecture (Android/iOS: ARM, most else: Intel 32-bit (i386) or 64-bit). These are all very similar, but still require sections (or, in Mach-O parlance "Load Commands") to set up the memory structure of the executable as it becomes a process (Old DOS supported the ".com" format which was a raw memory image). In all the above, you can say , roughly, the following:
Sections with a "." are created by the compiler, and are "default" or expected to have default behavior
The executable has the main code section, usually called "text" or ".text". This is native code, which can run on the specific architecture
Strings are stored in a separate section. These are used for hard-coded output (what you print out) as well as symbol names.
Symbols - which are what the linker uses to put together the executable with its libraries (Windows: DLLs, Linux/Android: Shared Objects, OS X/iOS: .dylibs or frameworks) are stored in a separate section. Usually there is also a "PLT" (Procedure Linkage Table) which enables the compiler to simply put in stubs to the functions you call (printf, open, etc), that the linker can connect when the executable loads.
Import table (in Windows parlance.. In ELF this is a DYNAMIC section, in OS X this is a LC_LOAD_LIBRARY command) is used to declare additional libraries. If those aren't found when the executable is loaded, the load fails, and you can't run it.
Export table (for libraries/dylibs/etc) are the symbols which the library (or in Windows, even an .exe) can export so as to have others link with.
Constants are usually in what you see as the ".rodata".
Hope this helps. Really, your question was vague..
TG

Byte code is a 'halfway' step. So the Java compiler (javac) will turn the source code into byte code. Machine code is the next step, where the computer takes the byte code, turns it into machine code (which can be read by the computer) and then executes your program by reading the machine code. Computers cannot read source code directly, likewise compilers cannot translate immediately into machine code. You need a halfway step to make programs work.

Note that ELF binaries don't necessarily need to be machine/arch specific per se.
The interesting piece is the "interpreter" header field: it holds a path name to a loader program that's executed instead of the actual binary. This one then is responsible for loading the actual program, loading and linking libraries, etc. This is the way how eg. ld.so comes in.
Theoretically one could create an ELF binary that holds java bytecode (or a complete jar). This just needs some appropriate "interpreter" program which starts up a JVM and loads the code from the binary into it.
Not sure whether this actually has been done before, but certainly possible.
The same can be done w/ quite any non-native code.
It also could serve for direct multiarch support via some VM like qemu:
Let the target platform (libc+linker scripts) put the arch name into the interpreter program name (eg. /lib/ld.so.x86_64, /lib/ld.so.armhf, ...).
Then, on a particular arch (eg. x86_64), the one with native arch name will point to the original ld.so, while the others point to some special one that calls up something like qemu-system-XXX.

Changing embedded serial number

I have a Serial number string "1080910" embedded in a programmable device which has been downloaded to a binary file using the ALL-100 programmer. This is my Master file as it were. I need to change this serial number to that of the unit that I need to re-flash using the Master file - the ALL-100 programmer uses XACCESS User Interface which has Edit feature showing Address location, Hex data field and Ascii field. Somewhere in this file is the serial number string - can anybody assist me in how to locate and edit the serial number string as I have been unable to locate it using the search function and have not been able to visually pick up the sequence of numbers. Help !!!

If the data has a symbolic address in the source code, and is not a local variable, its address will appear in the map file generated by the linker. If it is a local variable initialised with a literal constant, then the data will exist in the static initialisation data the location of which should also be identified in the map file.
Another possibility is that your application image is compressed and the start-up code expands it into RAM at run-time. This will be obvious in the map file if the data and code addresses are in RAM rather than ROM. If this is the case then what you are attempting will be very difficult. You would have to know the compression algorithm used, and which part of the image is the commpressed part (part of it will be the decompression code that runs from ROM). You would then have to decompress the image, modify the string, and then recompress it. Further, if the decompression performs any kind of checksum on the compressed or decompressed data, you will have to recalculate and modify that too.
If this was a requirement from the outset, you would have done better to reserve the space in the linker script or use compiler specific extensions to absolutely locate the data at a specific location.

Maybe it is stored in Unicode, so alternate chars are 00.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas