Trace32 command to read symbol contents from ELF file

Problem scenario:
In simple words: is there a TRACE32 command to read symbols (and their contents) from the ELF file that was loaded onto the target? We have a special case where application-specific debug symbols are placed in a '.noload' section of the ELF, which means the symbols and their contents are present in the ELF file (visible when read using readelf -a xxxx.elf_file_name) but are not part of the final binary image, i.e. the '.noload' section is stripped away when generating the xxx.bin that is flashed to target memory.
Debug symbols in '.noload' section are statically assigned values and these values do not change during runtime.
When I tried to read the debug symbols in the '.noload' section (after compiling into a binary and loading it onto the target via TRACE32), an 'MMU fail' was flagged in a TRACE32 popup window. This means TRACE32 is trying to read the symbol contents from target memory, but the memory is not accessible, since the '.noload' section was never loaded even though its symbols have addresses mapped.
Any inputs:
- I need help with a TRACE32 command that can read symbol contents directly from the ELF file rather than from target memory.
- I am also not sure whether I can use 'readelf' in PRACTICE scripts. Any help in this direction, if there is no solution for the query above?

Use command
Data.LOAD.Elf myfile.elf [<optional address offset>] /NoCODE
The option /NoCODE instructs TRACE32 to load only the debug symbols from your ELF, but not to load any code to your target. You can then view the symbols with the command sYmbol.Browse.
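For example, a minimal PRACTICE sequence (the file name is a placeholder) could look like this:

; load debug symbols only, leave target memory untouched
Data.LOAD.Elf myapp.elf /NoCODE
; browse the symbols TRACE32 now knows about
sYmbol.Browse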
However, if you use TRACE32 to load your application to your target, you don't have to create a binary from your ELF first. TRACE32 can also load the PROGBITS sections of your ELF directly to your target.
In this case you would simply use the Data.LOAD.Elf command without the /NoCODE option (after enabling flash programming).
Since you are using an MMU you might want to activate logical memory space IDs with command SYStem.Option.MMUSPACES ON. Then load your symbols with
Data.LOAD.Elf myfile.elf <space-ID>:<offset> /NoCODE
where <space-ID> matches the space ID used by your MMU for the task, and <offset> is usually zero.
If you are debugging your application on embedded Linux, then you should use the TRACE32 OS awareness for Linux and the Linux symbol autoloader, which loads the symbols to the correct addresses for you.
I don't think there is any reason to use 'readelf' from within TRACE32. In any case, you can invoke any command-line program with the commands OS.Area or OS.Command.
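For example, something like this (file name is a placeholder) would run readelf from a PRACTICE script and show its output in the AREA window:

OS.Area readelf -a myapp.elf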

Related

Change where in ELF file code execution starts

I want to change where in the ELF file execution starts. For example, I have a basic hello world program in an ELF file. The actual code is located at an offset of 0x1000 bytes into the file. I want to move that code to, let's say, a 0x900 offset and modify the file so that it starts executing at 0x900. I know this sounds kind of useless, but it does serve a purpose.
First you compile/assemble (clang/as/...) your program into a hello.o ELF object file. At this point, you would normally let the compiler driver finish the job and emit an ELF executable.
You can instead use the linker (lld/ld/...) and specify the entry point with --entry 0x900. You can also do this with a linker script. Note that if you do this, you have to handle a bunch of stuff that the compiler driver normally handles for you. The warning from the Oracle linker manual says:
When you invoke the link-editor directly, you have to supply every object file and library required to create the intended output. The link-editor makes no assumptions about the object modules or libraries that you meant to use in creating the output.
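As a hedged sketch (object file and symbol names are placeholders), the direct link step could look like:

ld -o hello hello.o --entry 0x900

or, via a minimal linker script that places the code and the entry point together:

ENTRY(_start)
SECTIONS
{
    . = 0x900;            /* place the code at address 0x900 */
    .text : { *(.text) }
}

Note that --entry and ENTRY() set the entry address recorded in the ELF header; how that address maps to a byte offset within the file is determined by the program headers the linker emits.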

ELF representation in HEX

I am working on understanding some ground concepts in embedded systems. My question is similar to 'understanding hexedit of an ELF'.
In order to burn compiler output to ROM, the .out file is converted to HEX (say, Intel HEX). I wonder how the following pieces of information are preserved in the HEX format:
1. Section header
2. Symbol tables, debug symbols, linker symbols, etc.
3. ELF header
4. If these are preserved in the HEX file, how can they be read from it?
5. A bit of an off-topic question, but how does the microcontroller on boot know where .data, .bss etc. live in the HEX image and that they must be copied to RAM?
None of that is preserved. A HEX file only contains the raw program and data. https://en.wikipedia.org/wiki/Intel_HEX
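To illustrate (values invented), each line of an Intel HEX file is a record of the form :LLAAAATTdd...dCC, i.e. byte count, address, record type, data, checksum. A record placing the four bytes DE AD BE EF at address 0x0000, followed by the end-of-file record, looks like:

:04000000DEADBEEFC4
:00000001FF

Nothing in these records identifies sections, symbols or headers; only byte values and the addresses to write them to survive.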
The microcontroller does not know where .data and .bss are located - it doesn't even know that they exist. The start-up code which is executed before main() is called contains the start addresses of those sections - the addresses are hard-coded into the program. This start-up code will be in the HEX file like everything else.
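A minimal bare-metal start-up sketch in C (the symbol names follow common GNU linker-script conventions and are assumptions, not a fixed standard):

#include <string.h>

extern unsigned char _sidata;  /* load address of .data in ROM */
extern unsigned char _sdata;   /* start of .data in RAM        */
extern unsigned char _edata;   /* end of .data in RAM          */
extern unsigned char _sbss;    /* start of .bss in RAM         */
extern unsigned char _ebss;    /* end of .bss in RAM           */

extern int main(void);

void Reset_Handler(void)
{
    /* copy initialized data from ROM to RAM */
    memcpy(&_sdata, &_sidata, (size_t)(&_edata - &_sdata));
    /* zero-fill .bss */
    memset(&_sbss, 0, (size_t)(&_ebss - &_sbss));
    main();
}

The concrete addresses behind _sdata, _edata etc. are filled in by the linker, which is exactly the "hard-coded into the program" part described above.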
The elements in points 1 to 3 are not included in the raw binary since they serve no purpose in the application; rather, they are used by the linker and the debugger on the development host. They are unnecessary for program execution, where all you need is the byte values and the addresses to write them to - which is more or less all the HEX file contains (it may also contain a start-address record).
Systems that have dynamic linking or self-hosted debug capabilities (such as VxWorks, for example) use the object file itself.
With respect to point 5, the microcontroller does not need to know; the linker uses that information when resolving absolute and relative addresses in the object code. Once fully resolved (linked), the addresses are embedded in the code directly. Again, where dynamic loading/linking is used, the object file metadata is required, and such systems do not normally load a raw HEX file or binary.

When is ELFCLASSNONE used in an ELF file?

I am learning about ELF. The file class can be one of ELFCLASS32, ELFCLASS64 or ELFCLASSNONE.
However, I cannot find any example usage of ELFCLASSNONE.
What is it used for? And when? Is it actually used anywhere?
Is it actually used anywhere ?
No.
(It's only used to detect invalid ELF files.)
Used where?
Anywhere validity of the ELF file is verified. Here is an example from the Linux kernel tools.
Even there, ELFCLASSNONE is not used.
You don't know what parts of the ELF header readelf examined before it concluded that .bashrc is not an ELF file. It may have looked at ei_ident[EI_CLASS] and compared the value with ELFCLASSNONE (though likely it didn't).
If you make a copy of e.g. /bin/date and write a 0 byte into the 5th byte of the copy (e_ident[EI_CLASS], where EI_CLASS == 4) to corrupt it, then run readelf -h on that copy, you'll probably get an "invalid ELF class" or similar message.
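A minimal sketch of such a validity check, using the constants from <elf.h> (the function name is invented):

#include <elf.h>

/* Returns 1 if e_ident carries a known ELF class, 0 otherwise. */
static int valid_elf_class(const unsigned char *e_ident)
{
    switch (e_ident[EI_CLASS]) {
    case ELFCLASS32:
    case ELFCLASS64:
        return 1;
    default:
        /* ELFCLASSNONE (0) and anything else: invalid/unknown class */
        return 0;
    }
}

So ELFCLASSNONE only ever shows up as a value you compare against when rejecting a file, never as the class of a valid ELF.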

What is the difference between byte code files like Java bytecode and machine code executables like ELF?

What are the differences between byte-code binary executables, such as Java class files, Parrot bytecode files or CLR files, and machine-code executables such as ELF, Mach-O and PE?
What are the distinctive differences between the two?
For example, which part of the class file corresponds to the .text section in the ELF structure?
Or: they all have headers, but the ELF and PE headers contain the architecture, while the class file does not.
(Format references: Java class file, ELF file, PE file.)
Byte code is, as imulsion noted, an intermediate step, right before compilation into machine code. Because the last step is left to load time (and often runtime, as is the case with Just-In-Time (JIT) compilation), byte code is architecture-independent: the runtime (CLR for .NET or JVM for Java) is responsible for mapping the byte-code opcodes to their underlying machine-code representation.
By comparison, native code (Windows: PE, PE32+; OS X/iOS: Mach-O; Linux/Android/etc.: ELF) is compiled code, suited for a particular architecture (Android/iOS: ARM; most everything else: Intel 32-bit (i386) or 64-bit). These formats are all very similar, but still require sections (or, in Mach-O parlance, "load commands") to set up the memory structure of the executable as it becomes a process (old DOS supported the ".com" format, which was a raw memory image). In all the above, you can say, roughly, the following:
- Sections with a "." are created by the compiler, and are "default", i.e. expected to have default behavior.
- The executable has the main code section, usually called "text" or ".text". This is native code, which can run on the specific architecture.
- Strings are stored in a separate section. These are used for hard-coded output (what you print out) as well as symbol names.
- Symbols - which are what the linker uses to put together the executable with its libraries (Windows: DLLs; Linux/Android: shared objects; OS X/iOS: .dylibs or frameworks) - are stored in a separate section. Usually there is also a PLT (Procedure Linkage Table), which enables the compiler to simply put in stubs to the functions you call (printf, open, etc.) that the linker can connect when the executable loads.
- The import table (in Windows parlance; in ELF this is the DYNAMIC section, in OS X the LC_LOAD_DYLIB commands) is used to declare additional libraries. If those aren't found when the executable is loaded, the load fails and you can't run it.
- The export table (for libraries/dylibs/etc.) lists the symbols which the library (or, in Windows, even an .exe) exports so that others can link with it.
- Constants are usually in what you see as ".rodata".
Hope this helps. Really, your question was vague.
TG
Byte code is a "halfway" step. The Java compiler (javac) turns the source code into byte code. Machine code is the next step: the computer takes the byte code, turns it into machine code (which can be read by the computer) and then executes your program by reading the machine code. Computers cannot read source code directly; likewise, compilers cannot translate immediately into machine code. You need a halfway step to make programs work.
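For example (assuming a Hello.java containing a main method):

javac Hello.java    # source code -> Hello.class (byte code)
java Hello          # the JVM maps the byte code to machine code and runs it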
Note that ELF binaries don't necessarily need to be machine/arch specific per se.
The interesting piece is the "interpreter" header field: it holds a path name to a loader program that is executed instead of the actual binary. That loader is then responsible for loading the actual program, loading and linking libraries, etc. This is how e.g. ld.so comes in.
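You can see this field with readelf; the interpreter path below is just a typical value and varies by distribution:

readelf -l /bin/date | grep interpreter
    [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]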
Theoretically one could create an ELF binary that holds Java byte code (or a complete jar). It just needs an appropriate "interpreter" program which starts up a JVM and loads the code from the binary into it.
Not sure whether this has actually been done before, but it is certainly possible.
The same can be done with just about any non-native code.
It could also serve for direct multi-arch support via some VM like qemu:
- Let the target platform (libc + linker scripts) put the arch name into the interpreter program name (e.g. /lib/ld.so.x86_64, /lib/ld.so.armhf, ...).
- Then, on a particular arch (e.g. x86_64), the one with the native arch name will point to the original ld.so, while the others point to some special one that calls up something like qemu-system-XXX.

.h generated from .h.in?

There are struct definitions in the .h file that my library creates after I build it, but I cannot find these in the corresponding .h.in. Can somebody tell me how all this works and where it gets the extra information from?
To be specific: I am building Pth, the userspace threading library. It has pth_p.h.in, which doesn't contain the struct definition I am looking for, yet when I build the library, a pth_p.h appears and it has the definition I need.
In fact, I have searched every single file in the library before it is built and cannot find where it is generating the struct definition.
Pth uses GNU Autoconf, Automake, and Libtool. By running ./configure you'll be running a shell script which eventually runs m4 to detect the presence of a whole bunch of different system attributes and make changes to a number of files.
It looks like it boils down to ./configure generating Makefile from Makefile.in and then running something via make that triggers the shtool subcommand scpp:
pth_p.h: $(S)pth_p.h.in
	$(SHTOOL) scpp -o pth_p.h -t $(S)pth_p.h.in -Dcpp -Cintern -M '==#==' $(HSRCS)
An obscure link, but here's a shtool-scpp manpage, which describes it as:
This command is an additional ANSI C source file pre-processor for sharing cpp(1) code segments, internal variables and internal functions. The intention for this comes from writing libraries in ANSI C. Here a common shared internal header file is usually used for sharing information between the library source files.
The operation is to parse special constructs in files, generate a few things out of these constructs and insert them at the position mark in tfile by writing the output to ofile. Additionally the files are never touched or modified. Instead the constructs are removed later by the cpp(1) phase of the build process. The only prerequisite is that every file has a '#include "ofile"' at the top.
The .h.in is probably processed within the configure script (generated from configure.ac); look out for
AC_CONFIG_FILES([thatfile.h])
It replaces variables of the form @VAR@ in the .in file with their values.
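As an invented illustration of that substitution, a line like this in foo.h.in:

#define PACKAGE_VERSION "@VERSION@"

comes out of configure as:

#define PACKAGE_VERSION "2.0.7"

in the generated foo.h (file name and version string are assumptions).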
Edit: Just noticed - if I'm right, you should retag your question.