Determining symbol addresses using binutils/readelf - elf

I am working on a project where our verification test scripts need to locate symbol addresses within the build of the software being tested. These might be used for setting breakpoints or reading static data from memory. What I am after is a map file containing each symbol's name, base address in memory, and size. Our build outputs an ELF file which has the information I want. I've been trying to use the readelf, nm, and objdump tools to get the symbol addresses I need.
I originally tried readelf -s file.elf and that seemed to access some symbols, particularly those which were written in assembler. However, many of the symbols that I wanted were not in there - specifically those that originated within our Ada code.
I used readelf --debug-dump file.elf to dump all debug information. There I do see all symbols, including those from the Ada code. However, that output is in DWARF format. Does anyone know why these symbols would not be output by readelf when I ask it to list the symbol table? Perhaps there is simply an option I am missing.
Now I could go to the trouble of writing a custom DWARF parser to get the information, but if I can get it using one of the Binutils tools (nm, readelf, objdump) then I'd really prefer a standard solution.

DWARF is the debug information; it tries to describe the program in terms of the original source code, which is not the same thing as the ELF symbol table. Take the following code as an example:
static int one() {
    // something
    return 1;
}
int main(int ac, char **av) {
    return one();
}
After you compile it using gcc -O3 -g, the static function one will be inlined into main. So when you use readelf -s, you will never see the symbol one. However, when you use readelf --debug-dump, you can see one is a function which is inlined.
Note that the compiler does not prohibit combining optimization with -g, so you can still debug the executable. Even though the function has been optimized and inlined, gdb can still use the DWARF information to identify the function and the source file and line for the current code block inside the inlined function.
The above is just one case caused by compiler optimization. There may be plenty of other reasons for a mismatch between the symbols and addresses reported by readelf -s and those described in DWARF.

Related

Mismatch between IDs from minidump_stackwalk and dump_syms

I am trying to use Google Breakpad, but I am facing a strange issue.
I am working on Linux. I have my own library, my_lib.so, which I process with dump_syms; it generates this output:
$ dump_syms my_lib.so|head -2
MODULE Linux mips 3BB485681467218D36EB2FF02287096C0 my_lib.so
INFO CODE_ID 6885B43B67148D2136EB2FF02287096C
I create the symbols directory with the appropriate subdirectories. I then generate a minidump for the program that uses a stripped version of my_lib.so, but when I try to process it with minidump_stackwalk I get:
0x77dce000 - 0x77e23fff my_lib.so ??? (WARNING: No symbols, my_lib.so, AC40136B433E5A68F66CCE8C2C2E6C250)
It is searching for a different ID, AC40136B433E5A68F66CCE8C2C2E6C250, so it does not find the symbols. Why the mismatch?
Knowing that it searches for AC40136B433E5A68F66CCE8C2C2E6C250, I manually renamed the directory in the symbols tree to match that ID, just to test. I also changed the ID inside the my_lib.so.sym file. After that, minidump_stackwalk no longer complains about missing symbols, but I still can't see the stack trace.
Any ideas about this mismatch?
By the way, if I run readelf -n on the original library and on the stripped one, I get the same GNU build ID.

What is the meaning of 'set(CMAKE_REQUIRED_LIBRARIES "m")' in the CMake Tutorial?

I am learning CMake with the CMake Tutorial and found something that is not clear to me:
include(CheckSymbolExists)
set(CMAKE_REQUIRED_LIBRARIES "m")
So what is CheckSymbolExists? Is it a function or a library?
What is the meaning of the "m"? Is it a library name or some flag?
I have tried reading through the CMake documentation, but I just don't understand it.
Could somebody please help me understand these?
First, set(CMAKE_REQUIRED_LIBRARIES "m") tells the subsequent check commands to link their test programs against the math library. You do the same on the command line like this: gcc test.c -lm, which links against the library libm.so/.dll
CheckSymbolExists is a CMake Module which provides more functionality. You can include it with include(CheckSymbolExists)
After this you can use the function check_symbol_exists(...) in CMake to check the availability of symbols in header files.
The exact example from the tutorial:
check_symbol_exists(log "math.h" HAVE_LOG) checks whether the header file math.h provides a symbol (a function, constant, or whatever) called log. If it does, the CMake variable HAVE_LOG is set to 1; otherwise it is left with a false (empty) value.
If I understand the documentation correctly, when the symbol is not an enum, type, or intrinsic, this module also checks that the symbol links correctly.
So in that snippet, when the first run of check_symbol_exists did not define the two cache variables, the snippet checks whether a required library was missing and retries.

Is the ELF .notes section really needed?

On Linux, I'm trying to strip a statically linked ELF file to the bare essentials. When I run:
strip --strip-unneeded foo
or
strip --strip-all foo
The resulting file still has a fat .notes section that appears to be full of funky strings.
Is the .notes section really needed or can I safely force it out with --remove-section?
Thanks for any help.
From experience and from looking at the man page for strip, it looks like strip isn't supposed to get rid of any and all sections and strings that aren't needed; just symbols. Quoth the man page:
GNU strip discards all symbols from object files objfile.
That being said, from experience, strip, even without --strip-all, removes sections that are not needed for loading, such as .symtab and .strtab, and you can, as you note, remove any further sections you want gone with --remove-section.
As an example of a .notes section, I took /bin/ls from my Ubuntu 11.10 64-bit box:
$ readelf -Wn /bin/ls
Notes at offset 0x00000254 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.15
Notes at offset 0x00000274 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 3e6f3159144281f709c3c5ffd41e376f53b47952
That encompasses the .note.ABI-tag section and the .note.gnu.build-id section. It looks like they contain data that isn't necessary to load the program, but that also isn't standard, and that strip does not know to be unnecessary for the proper running of the program, since an ELF file can have any number of additional "unknown" sections that aren't safe to remove. So rather than using a whitelist of sections to keep (which would fail miserably), it uses a blacklist of sections that it knows it can get rid of, and removes only those.
Short version: these sections don't seem to be standard and could be used for various things, so strip can't know it's safe to remove them. But based on the info inside the one I took above, if it's your own program, it's almost certainly safe to remove it.

In an ELF file, how does the address for _start get determined?

I've been reading the ELF specification and cannot figure out where the program entry point and _start address come from.
It seems like they should have to be in a pretty consistent place, but I made a few trivial programs, and _start is always in a different place.
Can anyone clarify?
The _start symbol may be defined in any object file. Normally it is supplied automatically by the C runtime startup code, which in turn calls main in a C program. You can also define it yourself, for instance in an assembler source file:
.globl _start
_start:
// assembly here
When the linker has processed all object files, it looks up the _start symbol and puts its value in the e_entry field of the ELF header. The loader takes the address from this field and transfers control to it after it has finished loading all segments into memory and is ready to execute the file.
Take a look at the linker script ld is using:
ld -verbose
The format is documented at: https://sourceware.org/binutils/docs-2.25/ld/Scripts.html
It determines basically everything about how the executable will be generated.
On Binutils 2.24 Ubuntu 14.04 64-bit, it contains the line:
ENTRY(_start)
which sets the entry point to the _start symbol (goes to the ELF header as mentioned by ctn)
And then:
. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
which sets the location counter, and therefore the address of the first section placed after the headers, to 0x400000 + SIZEOF_HEADERS.
I have modified that address to 0x800000, passed my custom script with ld -T and it worked: readelf -s says that _start is at that address.
Another way to change it is to use the -Ttext-segment=0x800000 option.
The default of 0x400000 is 4 MiB, which is far larger than the usual 4 KiB value of getconf PAGE_SIZE on x86-64. Why the load address is not simply zero is discussed at: Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?
A question describes how to set _start from the command line: Why is the ELF entry point 0x8048000 not changeable with the "ld -e" option?
SIZEOF_HEADERS is the size of the ELF + program headers, which are at the beginning of the ELF file. That data gets loaded into the very beginning of the virtual memory space by Linux (TODO why?) In a minimal Linux x86-64 hello world with 2 program headers it is worth 0xb0, so that the _start symbol comes at 0x4000b0.
I'm not sure, but try this link: http://www.docstoc.com/docs/23942105/UNIX-ELF-File-Format
Page 8 shows where the entry point is stored if the file is an executable. Basically, you need to calculate the offset and you've got it.
Make sure to remember the little-endianness of x86 (I guess that is what you are using) and reorder the bytes if you read bytewise. Edit: or maybe not; I'm not quite sure about this, to be honest.

Does Xcode's Objective-C compiler optimize for bit shifts?

Does the Objective-C compiler in Xcode know better, or is it faster if I use bit shifts for multiplications and divisions by powers of 2?
NSInteger parentIndex = index >> 1; // integer division by 2
Isn't this a bit 1980's? Don't processors run these instructions in the same time these days? I remember back in my 68000 days when a div was 100+ cycles and a shift only 3 or 4... not sure this is the case any more as processors have moved on.
Why don't you get the compiler to generate the assembler file, have a look at what it's generating, and run some benchmarks?
I found this on the web which may help you... although it's for 'C' I think most of the options will be the same.
Q: How can I peek at the assembly code generated by GCC?
Q: How can I create a file where I can see the C code and its assembly translation together?
A: Use the -S (note: capital S) switch to GCC, and it will emit the assembly code to a file with a .s extension. For example, the following command:
gcc -O2 -S -c foo.c
will leave the generated assembly code on the file foo.s.
If you want to see the C code together with the assembly it was converted to, use a command line like this:
gcc -c -g -Wa,-a,-ad [other GCC options] foo.c > foo.lst
which will output the combined C/assembly listing to the file foo.lst.
If you need to both get the assembly code and to compile/link the program, you can either give the -save-temps option to GCC (which will leave all the temporary files, including the .s file, in the current directory), or use the -Wa,-aln=foo.s option, which instructs the assembler to output the assembly translation of the C code (together with the hex machine code and some additional info) to the file named after the =.