Is the ELF .notes section really needed?

On Linux, I'm trying to strip a statically linked ELF file to the bare essentials. When I run:
strip --strip-unneeded foo
or
strip --strip-all foo
The resulting file still has a fat .notes section that appears to be full of funky strings.
Is the .notes section really needed or can I safely force it out with --remove-section?
Thanks for any help.

From experience and from looking at the man page for strip, it looks like strip isn't supposed to get rid of any and all sections and strings that aren't needed; just symbols. Quoth the man page:
GNU strip discards all symbols from object files objfile.
That being said, from experience, strip (even without --strip-all) removes sections that aren't needed for loading, such as .symtab and .strtab, and, as you note, you can remove any other section you want with --remove-section.
As an example of note sections, I took /bin/ls from my Ubuntu 11.10 64-bit box:
$ readelf -Wn /bin/ls
Notes at offset 0x00000254 with length 0x00000020:
  Owner    Data size    Description
  GNU      0x00000010   NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 2.6.15
Notes at offset 0x00000274 with length 0x00000024:
  Owner    Data size    Description
  GNU      0x00000014   NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 3e6f3159144281f709c3c5ffd41e376f53b47952
That covers the .note.ABI-tag and .note.gnu.build-id sections. They appear to contain data that isn't needed to load the program, but that data also isn't standard, and strip has no way of knowing that it isn't needed for the program to run properly: an ELF file can have any number of additional "unknown" sections that aren't safe to remove. So rather than using a whitelist of sections to keep (which would fail miserably), strip uses a blacklist of sections it knows it can get rid of, and removes only those.
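Short version: these sections don't seem to be standard and could be used for various things, so strip can't know whether it's safe to remove them. But based on the contents of the ones I showed above, if it's your own program, it's almost certainly safe to remove them. (One caveat: debuggers such as gdb use the build ID to locate separate debug-info files, so keep .note.gnu.build-id if you rely on that.)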
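To see which note sections a binary actually carries before deciding what to pass to --remove-section, you can walk the section header table yourself (readelf -S shows the same information). A minimal sketch, assuming a 64-bit ELF file already read into memory in the host's byte order and the <elf.h> header from glibc; list_note_sections is a made-up name:

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Sketch: print the name and size of every SHT_NOTE section in an ELF64
 * image held in memory. Assumes native byte order (e.g. x86-64 file on an
 * x86-64 host) and does only basic bounds checks.
 * Returns the number of note sections found, or -1 if the image looks invalid. */
int list_note_sections(const unsigned char *img, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, img, sizeof eh);            /* copy to avoid unaligned access */
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shstrndx >= eh.e_shnum ||
        eh.e_shoff > len || eh.e_shnum > (len - eh.e_shoff) / sizeof(Elf64_Shdr))
        return -1;

    /* Section names live in the section header string table (.shstrtab). */
    Elf64_Shdr strsh;
    memcpy(&strsh, img + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof strsh, sizeof strsh);
    if (strsh.sh_offset > len || strsh.sh_size > len - strsh.sh_offset)
        return -1;
    const char *names = (const char *)img + strsh.sh_offset;

    int count = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, img + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_type != SHT_NOTE)
            continue;
        const char *name = sh.sh_name < strsh.sh_size ? names + sh.sh_name : "?";
        printf("%s (%llu bytes)\n", name, (unsigned long long)sh.sh_size);
        count++;
    }
    return count;
}
```

Each name it prints (for instance .note.ABI-tag) is what you would hand to strip --remove-section=NAME.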

Related

modify build-id in the notes section of the elf file

I need to modify the build-id in the notes section of an ELF file. I see there are plenty of tools to read ELF files but not to modify them. I found elfedit, but it doesn't seem to do what I need. Is it even possible?
Here is the output of readelf
$ readelf -n myelffile
Displaying notes found in: .note.ABI-tag
  Owner    Data size    Description
  GNU      0x00000010   NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 3.14.0
Displaying notes found in: .note.gnu.build-id
  Owner    Data size    Description
  GNU      0x00000014   NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: d75a086c288c582036b0562908304bc3a8033235
I'm trying to modify .note.gnu.build-id section.
Is it even possible?
Yes. This is one of the easier modifications, since the data in the note is completely arbitrary, and no other data refer to it.
All you have to do is find the note section (here .note.gnu.build-id), decode each note in turn until you find the one with type NT_GNU_BUILD_ID, and overwrite its data with the same number of bytes of your choosing.
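The walk described above could look like this in C. It is a sketch under a few assumptions: buf holds the raw bytes of the note section (for example read from the file at that section's sh_offset, sh_size bytes long), the notes use the 4-byte alignment that GNU build-ID notes use, the types and constants come from glibc's <elf.h>, and replace_build_id is a hypothetical name. You would write the modified bytes back at the same file offset afterwards.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Round up to the 4-byte alignment assumed for these GNU notes. */
static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Sketch: walk the notes in buf and overwrite the descriptor of the first
 * "GNU" note of type NT_GNU_BUILD_ID with new_id. The new ID must be exactly
 * as long as the old one, so no other offsets in the file change.
 * Returns 0 on success, -1 if no matching note of that size was found. */
int replace_build_id(unsigned char *buf, size_t len,
                     const unsigned char *new_id, size_t id_len)
{
    size_t off = 0;
    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;                        /* namesz, descsz, type */
        memcpy(&nh, buf + off, sizeof nh);
        size_t name_off = off + sizeof nh;
        size_t desc_off = name_off + align4(nh.n_namesz);
        size_t next = desc_off + align4(nh.n_descsz);
        if (desc_off > len || next > len)
            return -1;                        /* truncated or corrupt note */
        if (nh.n_type == NT_GNU_BUILD_ID && nh.n_namesz == 4 &&
            memcmp(buf + name_off, "GNU", 4) == 0) {   /* owner "GNU\0" */
            if (nh.n_descsz != id_len)
                return -1;                    /* refuse to change the size */
            memcpy(buf + desc_off, new_id, id_len);
            return 0;
        }
        off = next;                           /* skip to the next note */
    }
    return -1;
}
```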
Are you aware of the linker option --build-id=0x..., which lets you put whatever hex data you want there at link time? If you can relink your binary, you don't need to modify the build-id note at all, since the linker will happily put your data there during the initial link.

Filter the output of GNU nm by section

I'm trying to identify the largest symbols in an .elf file for each memory section (.text, .data, .bss). So far I'm using GNU nm to get the largest symbols:
nm foo.elf --size-sort --reverse-sort --radix=d --demangle --line-numbers
Is there a built-in way in nm to filter the output by section, or do I need to resort to text filtering?
nm outputs a symbol type for every symbol as a single-letter code (B: .bss, D: .data, T: .text), but there seems to be no way to filter by symbol type.
Background: The code runs on a microcontroller that can execute instructions directly from flash memory. The instructions from the .text section stay in flash during execution, while .bss and .data are placed in RAM. That's why I would like to be able to identify the largest symbols in each section independently.
there seems to be no way to filter by symbol type.
Just use grep to perform whatever filtering you need (for example, piping the nm output through grep ' [Bb] ' keeps only the .bss symbols).
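If you'd rather not depend on grep, the same filter is a few lines of C. A minimal sketch that assumes the default nm output layout (zero-padded number columns, then the one-letter type, then the name); nm_line_has_type and filter_nm_output are made-up names:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Return true if an nm output line has symbol type `type`, compared
 * case-insensitively so 'T' matches both global T and local t. Because the
 * value/size columns are zero-padded, the first one-character token is
 * assumed to be the type letter (undefined symbols have a blank value). */
bool nm_line_has_type(const char *line, char type)
{
    const char *p = line;
    while (*p != '\0') {
        while (*p == ' ' || *p == '\t')
            p++;                              /* skip separators */
        const char *start = p;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;                              /* find end of this token */
        if (p - start == 1)
            return toupper((unsigned char)*start) == toupper((unsigned char)type);
        if (*p == '\n')
            break;
    }
    return false;
}

/* Copy only the lines of the given type from `in` (e.g. nm output on a pipe)
 * to `out`. Lines longer than the buffer would be split; fine for a sketch. */
void filter_nm_output(FILE *in, FILE *out, char type)
{
    char line[4096];
    while (fgets(line, sizeof line, in) != NULL)
        if (nm_line_has_type(line, type))
            fputs(line, out);
}
```

Calling filter_nm_output(stdin, stdout, 'B') from a small main and piping the nm command from the question into it would keep only the .bss symbols, still in size order.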
You may also want to look at Bloaty McBloatface: a size profiler for binaries.

Determining symbol addresses using binutils/readelf

I am working on a project where our verification test scripts need to locate symbol addresses within the build of the software being tested. This might be used for setting breakpoints or reading static data from memory. What I am after is a map file containing symbol names, base addresses in memory, and sizes. Our build outputs an ELF file which has the information I want. I've been trying to use the readelf, nm, and objdump tools to obtain the symbol addresses I need.
I originally tried readelf -s file.elf and that seemed to list some symbols, particularly those which were written in assembler. However, many of the symbols that I wanted were not in there, specifically those that originated within our Ada code.
I used readelf --debug-dump file.elf to dump all debug information. In that output I do see all symbols, including those from the Ada code. However, it is in DWARF format. Does anyone know why these symbols are not output by readelf when I ask it to list the symbol table? Perhaps there is simply an option I am missing.
Now, I could go to the trouble of writing a custom DWARF parser to get the information, but if I can get it using one of the binutils (nm, readelf, objdump), I'd much prefer a standard solution.
DWARF is the debug information, and it tries to reflect the structure of the original source code. Take the following code as an example:
static int one(void) {
    // something
    return 1;
}
int main(int ac, char **av) {
    return one();
}
After you compile it with gcc -O3 -g, the static function one will be inlined into main, so readelf -s will not show a symbol for one at all. However, readelf --debug-dump shows that one is a function that has been inlined.
In this example, the compiler does not prevent you from combining optimization with -g, so you can still debug the executable: even though the function is optimized and inlined, gdb can still use the DWARF information to identify the function and to map the current code block inside the inlined function back to its source file and line.
The above is just one case of compiler optimization; there can be plenty of other reasons why the symbols and addresses listed by readelf -s don't match what is in the DWARF information.
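If all you need is a name/address/size map for the symbols that do end up in the symbol table, you can read .symtab directly, which is what readelf -s and nm do (nm -S --defined-only prints the same three pieces of information). A rough sketch, assuming a 64-bit ELF file read into memory in the host's byte order and glibc's <elf.h>; write_symbol_map is a hypothetical name, and anything that exists only in the DWARF data (such as the inlined one above) will not appear:

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Sketch: write "name 0xaddress size" for every function/object symbol in the
 * SHT_SYMTAB section(s) of an ELF64 image. Returns the number of lines
 * written, or -1 on a malformed image. Native byte order assumed. */
int write_symbol_map(const unsigned char *img, size_t len, FILE *out)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shoff > len ||
        eh.e_shnum > (len - eh.e_shoff) / sizeof(Elf64_Shdr))
        return -1;

    int written = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh, str;
        memcpy(&sh, img + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_type != SHT_SYMTAB || sh.sh_link >= eh.e_shnum)
            continue;
        /* sh_link of a symbol table is the index of its string table. */
        memcpy(&str, img + eh.e_shoff + (size_t)sh.sh_link * sizeof str, sizeof str);
        if (sh.sh_offset > len || sh.sh_size > len - sh.sh_offset ||
            str.sh_offset > len || str.sh_size > len - str.sh_offset)
            return -1;
        for (size_t off = 0; off + sizeof(Elf64_Sym) <= sh.sh_size; off += sizeof(Elf64_Sym)) {
            Elf64_Sym sym;
            memcpy(&sym, img + sh.sh_offset + off, sizeof sym);
            int type = ELF64_ST_TYPE(sym.st_info);
            if ((type != STT_FUNC && type != STT_OBJECT) || sym.st_name >= str.sh_size)
                continue;                     /* keep only code and data symbols */
            fprintf(out, "%s 0x%llx %llu\n",
                    (const char *)img + str.sh_offset + sym.st_name,
                    (unsigned long long)sym.st_value, (unsigned long long)sym.st_size);
            written++;
        }
    }
    return written;
}
```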

How do I delete a program header from an ELF binary

I want to write a utility to remove a program header from an ELF binary. For example, when I run readelf -l /my/elf I get a listing of all the program headers: PHDR INTERP ... GNU_STACK GNU_RELRO. When I run my utility, I would like to get all the same program headers back in the same order, minus the one I deleted. Is there any easier way to do this than recreating the entire ELF from scratch, skipping the unwanted header?
Is there any easier way to do this than recreating the entire ELF from scratch
Sure: program headers form a fixed-record table at an offset given by ehdr.e_phoff, containing .e_phnum entries of .e_phentsize bytes.
To delete one entry, simply copy the rest of entries over it, and decrement .e_phnum. That's all there is to it.
Beware: deleting some entries will likely cause the dynamic loader to crash. GNU_STACK is about the only header that can be deleted without too much harm (that I can think of).
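The table edit itself is short. A sketch, assuming a 64-bit ELF file read into memory in the host's byte order, glibc's <elf.h>, and a hypothetical delete_phdr helper; it does not adjust anything else, such as the size recorded in a PT_PHDR entry that covers the table:

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Sketch: remove program header `index` by shifting the later entries down
 * over it and decrementing e_phnum. The now-unused last slot is zeroed.
 * The caller writes the buffer back to the file.
 * Returns 0 on success, -1 on a bad image or index. */
int delete_phdr(unsigned char *img, size_t len, size_t index)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, img, sizeof eh);
    size_t entsize = eh.e_phentsize;
    if (entsize == 0 || index >= eh.e_phnum || eh.e_phoff > len ||
        eh.e_phnum > (len - eh.e_phoff) / entsize)
        return -1;

    unsigned char *table = img + eh.e_phoff;
    size_t tail = (eh.e_phnum - 1 - index) * entsize;   /* bytes after the entry */
    memmove(table + index * entsize, table + (index + 1) * entsize, tail);
    memset(table + (eh.e_phnum - 1) * entsize, 0, entsize);

    eh.e_phnum--;
    memcpy(img, &eh, sizeof eh);                        /* store updated header */
    return 0;
}
```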
Update:
Yes, setting .p_type to PT_NULL is another (and simpler) approach. But such entries are generally not expected to be present, and you may find some systems where PT_NULL will trigger an assertion in the loader (or in some other program).
Finally, adding a new Phdr might be tricky. Usually there is no space to expand the table (as it is immediately followed by some other data, e.g. .text). You can relocate the table to the end of the file, and set .e_phoff and .e_phnum to correspond to the new table, but many programs expect the entire Phdr table to be loaded and available at runtime, and that is not easy to arrange, as the new location at the end of the file will not be "covered" by any PT_LOAD segment.
The GNU Binary File Descriptor library (libbfd) may be helpful.

In an ELF file, how does the address for _start get determined?

I've been reading the ELF specification and cannot figure out where the program entry point and _start address come from.
It seems like they should have to be in a pretty consistent place, but I made a few trivial programs, and _start is always in a different place.
Can anyone clarify?
The _start symbol may be defined in any object file. Normally it is provided automatically by the C runtime startup code (crt1.o) that gcc links into every program; that code does some setup and then calls main. You can also define it yourself, for instance in an assembler source file:
.text
.globl _start
_start:
    mov $60, %eax      # x86-64 Linux: syscall number for exit
    xor %edi, %edi     # exit status 0
    syscall
When the linker has processed all the object files, it looks up the _start symbol and puts its value in the e_entry field of the ELF header. The loader takes the address from this field and jumps to it (it is not a normal call; _start never returns) once it has finished mapping the program's segments into memory and the program is ready to run.
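To check what the linker stored, readelf -h shows e_entry as "Entry point address", or you can read it yourself: it sits at offset 24 in both 32- and 64-bit ELF files, right after e_ident, e_type, e_machine and e_version. A small sketch (read_entry_point is a hypothetical name; <elf.h> is assumed to be glibc's) that honours the byte order recorded in e_ident[EI_DATA]:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assemble an n-byte unsigned integer from p in the given byte order. */
static uint64_t read_uint(const unsigned char *p, size_t n, int big_endian)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned shift = big_endian ? 8 * (unsigned)(n - 1 - i) : 8 * (unsigned)i;
        v |= (uint64_t)p[i] << shift;
    }
    return v;
}

/* Sketch: read e_entry from the start of a 32- or 64-bit ELF file in memory.
 * Returns 0 on success, -1 if the header is not recognized. */
int read_entry_point(const unsigned char *img, size_t len, uint64_t *entry)
{
    if (len < EI_NIDENT || memcmp(img, ELFMAG, SELFMAG) != 0)
        return -1;
    int big = img[EI_DATA] == ELFDATA2MSB;
    if (!big && img[EI_DATA] != ELFDATA2LSB)
        return -1;

    size_t off, width;
    if (img[EI_CLASS] == ELFCLASS32) {
        off = offsetof(Elf32_Ehdr, e_entry);   /* 24 */
        width = 4;
    } else if (img[EI_CLASS] == ELFCLASS64) {
        off = offsetof(Elf64_Ehdr, e_entry);   /* also 24 */
        width = 8;
    } else {
        return -1;
    }
    if (len < off + width)
        return -1;
    *entry = read_uint(img + off, width, big);
    return 0;
}
```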
Take a look at the linker script ld is using:
ld -verbose
The format is documented at: https://sourceware.org/binutils/docs-2.25/ld/Scripts.html
It determines basically everything about how the executable will be generated.
On Binutils 2.24 Ubuntu 14.04 64-bit, it contains the line:
ENTRY(_start)
which sets the entry point to the _start symbol (whose address goes into the ELF header, as mentioned by ctn)
And then:
. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
which sets the location counter, i.e. the address where the first section after the ELF and program headers is placed, to 0x400000 + SIZEOF_HEADERS.
I have modified that address to 0x800000, passed my custom script with ld -T and it worked: readelf -s says that _start is at that address.
Another way to change it is to use the -Ttext-segment=0x800000 option.
Note that 0x400000 is 4 MiB, not the page size (getconf PAGE_SIZE typically prints 4096 on x86-64). One reason to start well above 0 at all is to keep the lowest addresses, including 0, unmapped; see: Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?
A question describes how to set _start from the command line: Why is the ELF entry point 0x8048000 not changeable with the "ld -e" option?
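SIZEOF_HEADERS is the size of the ELF header plus the program headers, which are at the beginning of the ELF file. That data gets loaded at the very beginning of the virtual memory space because the first PT_LOAD segment created by the default linker script starts at file offset 0 and therefore includes the headers (which also keeps the program headers available in memory at run time). In a minimal Linux x86-64 hello world with 2 program headers it is 0xb0 (0x40 for the ELF header plus 2 * 0x38 for the program headers), so the _start symbol ends up at 0x4000b0.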
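The 0xb0 figure is just sizeof(Elf64_Ehdr) plus two times sizeof(Elf64_Phdr). A sketch of that arithmetic, assuming glibc's <elf.h> and that the program header table immediately follows the ELF header; the helper names are made up, and the result ignores any alignment padding the linker might insert before the first section:

```c
#include <elf.h>
#include <stddef.h>

/* Size of the ELF64 header plus `phnum` program headers: 0x40 + phnum * 0x38. */
size_t sizeof_headers64(size_t phnum)
{
    return sizeof(Elf64_Ehdr) + phnum * sizeof(Elf64_Phdr);
}

/* Address of the first section after the headers, e.g. where _start lands
 * when it is the very first thing in .text (hypothetical helper). */
unsigned long long first_section_addr(unsigned long long segment_start, size_t phnum)
{
    return segment_start + sizeof_headers64(phnum);
}
```

With SEGMENT_START defaulting to 0x400000 and two program headers, first_section_addr(0x400000, 2) gives 0x4000b0, matching the address quoted above.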
I'm not sure but try this link http://www.docstoc.com/docs/23942105/UNIX-ELF-File-Format
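On page 8 it shows where the entry point is stored if the file is an executable: the e_entry field of the ELF header. Note that e_entry is a virtual address, so to find the corresponding position in the file you need to calculate the offset using the program header of the segment that contains it.
Also make sure to take the byte order into account: it is given by e_ident[EI_DATA] and is little-endian on x86 (I guess that's what you're using), so if you read the field byte by byte, the least significant byte comes first. If you read it with a native integer load on an x86 host, no reordering is needed.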
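Calculating the offset amounts to finding the PT_LOAD segment whose virtual address range contains e_entry and applying offset = p_offset + (e_entry - p_vaddr). A sketch, assuming a 64-bit ELF file read into memory in the host's byte order (for instance an x86-64 executable on an x86-64 host) and glibc's <elf.h>; vaddr_to_offset is a hypothetical name:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: translate a virtual address such as e_entry into a file offset by
 * finding the PT_LOAD segment that contains it:
 *     offset = p_offset + (vaddr - p_vaddr)
 * Returns 0 on success, -1 if no file-backed segment maps the address. */
int vaddr_to_offset(const unsigned char *img, size_t len, uint64_t vaddr, uint64_t *offset)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh || memcmp(img, ELFMAG, SELFMAG) != 0 ||
        img[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phoff > len ||
        eh.e_phnum > (len - eh.e_phoff) / sizeof(Elf64_Phdr))
        return -1;

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, img + eh.e_phoff + i * sizeof ph, sizeof ph);
        /* Only bytes backed by the file (p_filesz, not p_memsz) have an offset. */
        if (ph.p_type == PT_LOAD && vaddr >= ph.p_vaddr &&
            vaddr - ph.p_vaddr < ph.p_filesz) {
            *offset = ph.p_offset + (vaddr - ph.p_vaddr);
            return 0;
        }
    }
    return -1;
}
```

For example, if the first PT_LOAD segment has p_offset 0 and p_vaddr 0x400000, as in the minimal hello world described above, an entry point of 0x4000b0 maps to file offset 0xb0.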