Segment mapping in an ELF file - elf

ELF files consist of sections based on their contents, such as .data, .text, .rodata, etc., and these sections are grouped into segments that guide how the ELF is mapped/loaded into memory (virtual/physical mappings). Each segment is formed by grouping a bunch of sections together.
Example:
Section to Segment mapping:
Segment Sections...
00 .hash .dynsym .dynstr .rela.dyn .rela.plt
01 .plt .text
02 .rodata
03 .data.rel.ro .dynamic .got .got.plt .data .version_section .bss
04 .dynamic
I want to know how this grouping of sections is determined. Is it possible to control this grouping into segments? For example, I would like to have .version_section as a separate segment altogether. Any idea how I would go about this?
If linker script commands can be used then it would be great to know which ones.
Thanks in advance. :)

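Assuming you are using the GNU toolchain (gcc and ld), you can pass ld a linker script with the --script option (short form -T) and describe the layout in ld's linker command language. In a linker script, the PHDRS command declares the segments, and a :phdr suffix on an output section description in SECTIONS assigns that section to a segment.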

Related

Comparing Object Files with ObjDump

I am trying to compare two ELF files, original/gen_twiddle_fft16x16_imre.oe674 and new/gen_twiddle_fft16x16_imre.oe674, to see if they are the same thing. I have a hunch that they are, but I can't tell for sure.
I can't just compare the code size and hope for the best, because their sizes are a few hundred bytes.
I run the disassembler:
/ti_packages/all_packages/ccs1200/ccs/tools/compiler/ti-cgt-armllvm_2.1.0.LTS/bin/tiarmobjdump -D directory/gen_twiddle_fft16x16_imre.oe674
and I get:
gen_twiddle_fft16x16_imre.oe674: file format elf32-unknown
/ti_packages/all_packages/ccs1200/ccs/tools/compiler/ti-cgt-armllvm_2.1.0.LTS/bin/tiarmobjdump: error: 'gen_twiddle_fft16x16_imre.oe674': can't find target: : error: unable to get target for 'unknown--', see --version and --triple
for both files. I look at the symbol table, and they are exactly the same, except for this part at the top:
original/:
00000000 l df *ABS* 00000000 .hidden **TIsUkimy7q4**
new/:
00000000 l df *ABS* 00000000 .hidden **TIgFVpK1DaG**
Questions:
What could be the reason for this difference / what does this difference mean?
Is there anything else I should try?
What could be the reason for this difference / what does this difference mean?
Looks like two randomly-generated symbol names. The difference likely doesn't mean anything.
Is there anything else I should try?
Comparing symbol tables isn't going to tell you anything useful.
You should compare disassembly of the two objects (objdump -dr output), and also compare .data and .rodata in them (which you can dump with objdump -sj.data ..., etc.).
If they have identical .text, .data and .rodata, it's likely that the two objects are effectively the same.
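The comparison suggested above could be scripted roughly as follows. This is only a sketch: it assumes a GNU-style objdump that is on PATH and supports the target (the tiarmobjdump in the question apparently does not for these files), and the function names are hypothetical. The "file format" banner is dropped before comparing because it embeds the file path, which differs between original/ and new/.

```python
import subprocess

def normalize_dump(text):
    """Drop blank lines and the 'file format' banner, which embeds the file path."""
    return [line.rstrip() for line in text.splitlines()
            if line.strip() and "file format" not in line]

def dump(path, args, objdump="objdump"):
    """Run objdump on one file; raises CalledProcessError if it exits non-zero."""
    result = subprocess.run([objdump, *args, path],
                            capture_output=True, text=True, check=True)
    return normalize_dump(result.stdout)

def same_contents(path_a, path_b, objdump="objdump"):
    """Compare the disassembly and the raw .data/.rodata dumps of two objects."""
    checks = (["-dr"], ["-s", "-j", ".data"], ["-s", "-j", ".rodata"])
    return all(dump(path_a, args, objdump) == dump(path_b, args, objdump)
               for args in checks)
```

A call such as same_contents("original/x.o", "new/x.o") then returns True only when all three dumps match line for line.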

Filter the output of GNU nm by section

I'm trying to identify the largest symbols in an .elf file for each memory section (.text, .data, .bss). So far I'm using GNU nm to get the largest symbols:
nm foo.elf --size-sort --reverse-sort --radix=d --demangle --line-numbers
Is there a built-in way in nm to filter the output by section, or do I need to resort to text filtering?
nm outputs a section type for every symbol as a single-letter code (B: .bss, D: .data, T: .text), but there seems to be no way to filter by symbol type.
Background: The code runs on a microcontroller that can execute instructions directly from flash memory. The instructions from the .text section stay in flash during execution, while .bss and .data are loaded into RAM. That's why I would like to be able to identify the largest symbols in each section independently.
there seems to be no way to filter by symbol type.
Just use grep to perform any filtering you may need.
You may also want to look at Bloaty McBloatface: a size profiler for binaries.
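If grep alone becomes awkward, the same text filtering can be done in a few lines of Python. The sketch below assumes the nm invocation from the question, where each line looks like "<decimal size> <type letter> <name>", optionally followed by a tab and file:line. The function name is hypothetical, and lower-case (local) type letters are folded into their upper-case counterparts.

```python
from collections import defaultdict

def largest_by_type(nm_output, top=3):
    """Group nm --size-sort lines by type letter and keep the largest symbols."""
    groups = defaultdict(list)
    for line in nm_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip lines that do not look like "<size> <type> <name>"
        size = int(parts[0])
        kind = parts[1].upper()            # treat local 't' like global 'T', etc.
        name = parts[2].split("\t")[0]     # drop the file:line suffix, if any
        groups[kind].append((size, name))
    return {kind: sorted(items, reverse=True)[:top]
            for kind, items in groups.items()}
```

Feeding it the captured output of the nm command and reading result["T"], result["D"] and result["B"] gives the per-section rankings.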

ELF program header segments sizes and offsets

I am trying to understand the ELF format, and right now there are some things that I don't get about the segments defined in the program header. I have this little program that I compile to an ELF file with g++ (x86_64 on Linux):
#include <stdlib.h>
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc == 1)
    {
        cout << "Hello world!" << endl;
    }
    return 0;
}
With g++ -c -m64 -D ACIS64 main.cpp -o main.o and g++ -s -O1 -o Main main.o.
Now, with readelf I get this list of segments:
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001f8 0x00000000000001f8  R E    8
  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000afc 0x0000000000000afc  R E    200000
  LOAD           0x0000000000000df8 0x0000000000600df8 0x0000000000600df8
                 0x0000000000000270 0x00000000000003a0  RW     200000
  DYNAMIC        0x0000000000000e18 0x0000000000600e18 0x0000000000600e18
                 0x00000000000001e0 0x00000000000001e0  RW     8
  NOTE           0x0000000000000254 0x0000000000400254 0x0000000000400254
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x00000000000009a4 0x00000000004009a4 0x00000000004009a4
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x0000000000000df8 0x0000000000600df8 0x0000000000600df8
                 0x0000000000000208 0x0000000000000208  R      1
With Bless Hex Editor I am looking at the file and trying to find each of these segments.
I find the PHDR segment just after the ELF header, with the size of the entire program header table. It has an alignment of 8 bytes and is readable/executable. I don't understand why it is executable.
I find the segment where the interpreter is declared, just after the PHDR. It has the size of the interpreter's path and an alignment of 1 byte. Correct
Now I have a segment that is readable and executable, which I suppose is the code segment. I don't understand why it starts at 0x0000000000000000. Shouldn't it start where the entry point is located? Why does it have a size of 0xafc bytes? Isn't the size only the size of the code? How much of the file is executable? Also, I don't understand why the alignment is 0x200000 bytes. Is that how much space is reserved for a LOAD segment in memory? This segment is followed by 764 bytes of 0x0.
The next one (readable and writable) I suppose is a segment where variables are stored. It ends just where something like the section headers might be starting.
Now the next one is a DYNAMIC header. It starts at 0xe18, which is inside the one above. I thought this was a segment where references to external functions and variables are stored, but I am not sure. It is readable and writable. I just don't know what segment this is and why it is "inside" the LOAD segment above.
A NOTE segment, containing some info that I suppose is not important right now
GNU-specific segments, one of them having all offsets and sizes equal to 0x0000000000000000, others overlapping other segments, which I don't get either.
I come from the PE world, where each thing has its own well defined offset and size and here I see these weird addresses and sizes and I am confused.
The readelf output displays the program header table. It contains the list of segments (which may be loadable or non-loadable) in the ELF file. It is common for a segment to contain other segments, as seen here.
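To see where readelf gets these values, here is a minimal sketch that reads the program header table directly. It assumes a little-endian ELF64 file such as the one in the question; the function name and the dictionary keys are hypothetical. Each Elf64 program header entry is 56 bytes, so the PHDR size of 0x1f8 bytes corresponds to the 9 entries listed.

```python
import struct

# Elf64_Phdr, little-endian: p_type, p_flags, p_offset, p_vaddr,
# p_paddr, p_filesz, p_memsz, p_align (56 bytes in total).
PHDR_FORMAT = "<IIQQQQQQ"
FIELDS = ("type", "flags", "offset", "vaddr", "paddr", "filesz", "memsz", "align")

def read_program_headers(image):
    """Return the program headers of a little-endian ELF64 image as dicts."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    # e_phoff lives at offset 0x20; e_phentsize and e_phnum at 0x36 and 0x38.
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    headers = []
    for i in range(e_phnum):
        values = struct.unpack_from(PHDR_FORMAT, image, e_phoff + i * e_phentsize)
        headers.append(dict(zip(FIELDS, values)))
    return headers
```

It can be tried with read_program_headers(open("Main", "rb").read()); a type value of 1 is PT_LOAD and 6 is PT_PHDR.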
I find the PHDR segment just after the ELF header and having the size
of this entire program header. It has an alignment of 8 bytes and is
readable/executable. I don't understand why executable.
If you read the readelf output carefully, you will notice that PHDR is actually a part of the code segment (notice the VirtAddr and the MemSiz fields). That explains why it shares the same permissions as the code segment (RX).
Now I have a segment that is readable and executable, which I
suppose is the code segment. I don't understand why does it start at
0x0000000000000000. Shouldn't this start where the entry point is
located? Why does it have a size of 0xafc bytes? Isn't the size only
the size of the code? How much of the file is executable? Also, I
don't understand why the alignment is 0x200000 bytes. Is that how much
space is reserved for a LOAD segment in memory? This is where this
segment ends and an amount of 764 0x0 bytes follows it.
Yes, this is the code segment. It begins at the start of the file (offset 0) and extends for 0xafc bytes. The header specifies that this part of the file is mapped at 0x0000000000400000 in memory when the ELF is loaded. The segment does not consist only of main() from the C++ file: since it starts at offset 0, it also covers the ELF header, the program header table, the INTERP, NOTE and GNU_EH_FRAME data listed above, and other executable code added by the toolchain. The entry point is normally an address somewhere inside this segment, not its start. Align does not give the size of the segment or the amount of memory reserved for it; it constrains where the segment may be placed. For loadable segments, VirtAddr and Offset must be congruent modulo the page size (or modulo Align, if Align != 0 && Align != 1). That explains why VirtAddr for the data segment is 0x0000000000600df8: (0x0000000000600df8 - 0x0000000000000df8) % 0x200000 == 0. The region of the file between the text segment and the data segment (between 0xafc and 0xdf8, which is 764 bytes) is filled with zeroes.
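The alignment rule and the size of the gap can be checked numerically with the values from the readelf output. This is a small sketch with hypothetical helper names; it compares the file offset and the virtual address of a loadable segment modulo its Align value.

```python
def is_congruent(offset, vaddr, align):
    """True if offset and vaddr leave the same remainder modulo align."""
    if align in (0, 1):
        return True  # 0 and 1 mean that no alignment is required
    return offset % align == vaddr % align

def padding_between(prev_offset, prev_filesz, next_offset):
    """Number of file bytes between the end of one segment and the next."""
    return next_offset - (prev_offset + prev_filesz)

# Values taken from the two LOAD entries in the question.
code = {"offset": 0x0, "vaddr": 0x400000, "filesz": 0xafc, "align": 0x200000}
data = {"offset": 0xdf8, "vaddr": 0x600df8, "filesz": 0x270, "align": 0x200000}

code_ok = is_congruent(code["offset"], code["vaddr"], code["align"])
data_ok = is_congruent(data["offset"], data["vaddr"], data["align"])
gap = padding_between(code["offset"], code["filesz"], data["offset"])
```

Both checks hold for this file, and gap works out to the 764 zero bytes seen in the hex editor.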
The next one (readable and writable) I suppose is a segment where
variables are stored. It ends just where something like the sections
header might be starting.
Correct, this is the data segment that stores the global and static variables (among other stuff). It ends just before the section headers.
Now the next one is a DYNAMIC header. It starts at 0xe18, which is
inside the one above. I thought this was a segment where references
to external functions and variables are stored but I am not sure. It
is readable and writable. I just don't know what segment is this and
why it is "inside" the LOAD segment above
Just as the PHDR segment is a part of the code segment, the DYNAMIC segment is a part of the data segment. That is why it has the same permissions (RW). It contains the .dynamic section, an array of tag/value entries that hold, among other things, the addresses of the dynamic symbol and string tables.
GNU specific segments, one of them having any offsets and sizes equal
to 0x0000000000000000, others interfering with other segments, which I
don't get, either.
GNU_EH_FRAME is a part of the code segment and GNU_RELRO is a part of the data segment (see the VirtAddr and MemSiz fields). GNU_STACK is just a program header whose flags tell the system which permissions the stack should have (in particular, whether it is executable) when the ELF is loaded into memory. It describes no file contents, so its FileSiz and MemSiz are 0.
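The "segment inside a segment" relationships described above can be verified from VirtAddr and MemSiz alone. The sketch below hard-codes the values from the readelf output in the question; the dictionary and function names are hypothetical.

```python
# (VirtAddr, MemSiz) pairs copied from the readelf output in the question.
SEGMENTS = {
    "LOAD_CODE":    (0x400000, 0xafc),
    "LOAD_DATA":    (0x600df8, 0x3a0),
    "PHDR":         (0x400040, 0x1f8),
    "INTERP":       (0x400238, 0x01c),
    "NOTE":         (0x400254, 0x044),
    "GNU_EH_FRAME": (0x4009a4, 0x044),
    "DYNAMIC":      (0x600e18, 0x1e0),
    "GNU_RELRO":    (0x600df8, 0x208),
}

def contains(outer, inner):
    """True if the address range of inner lies entirely within outer."""
    outer_start, outer_size = SEGMENTS[outer]
    inner_start, inner_size = SEGMENTS[inner]
    return (outer_start <= inner_start
            and inner_start + inner_size <= outer_start + outer_size)

in_code = [n for n in SEGMENTS if n != "LOAD_CODE" and contains("LOAD_CODE", n)]
in_data = [n for n in SEGMENTS if n != "LOAD_DATA" and contains("LOAD_DATA", n)]
```

Here in_code lists PHDR, INTERP, NOTE and GNU_EH_FRAME, while in_data lists DYNAMIC and GNU_RELRO, which matches the permissions each of them shares with its enclosing LOAD segment.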
References:
ELF File format specification
Linkers and Loaders, by John R. Levine

Strange variables with gdb

I am using gdb to debug a program in x86 assembly, but some variables behave strangely and I can't understand why.
This is how I define and view them:
section .data
CountDied: dd 0000
OnesFound: db 00
section .text
global _start
_start:
nop
... code
When I run gdb step by step, I check whether the variables have the correct values at the very first instruction and I get the following:
print CountDied
$1=0
print OnesFound
$2=167772672
Though in the next instructions OnesFound seems to behave in a correct way. I'm really puzzled. Thanks for your suggestions.
An assembly "variable" is just a label for a specific point in memory. GDB doesn't know how big it is supposed to be, it's just assuming that it's a 32-bit value.
The hex representation of the number you're getting is 0x0A000200. x86 is a little endian platform, so that will actually be stored in memory as 00 02 00 0A. Only the first byte is actually part of the value you set, and it is set correctly.
You can view just the specific byte you want by using the command x/b &OnesFound instead of using print.
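The byte-order argument can be reproduced outside gdb. This short sketch (the helper name is hypothetical) lays out the value that gdb printed as four little-endian bytes, which is how a 32-bit read starting at OnesFound would see memory.

```python
def le_bytes(value, width=4):
    """Return the little-endian byte layout of an unsigned integer."""
    return value.to_bytes(width, "little")

printed = 167772672            # what "print OnesFound" showed
layout = le_bytes(printed)     # bytes in increasing address order
ones_found = layout[0]         # the single byte that db 00 actually defined
```

layout is the byte sequence 00 02 00 0A, so ones_found is 0 as intended; the remaining three bytes belong to whatever follows OnesFound in memory.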

Is the ELF .notes section really needed?

On Linux, I'm trying to strip a statically linked ELF file to the bare essentials. When I run:
strip --strip-unneeded foo
or
strip --strip-all foo
The resulting file still has a fat .notes section that appears to be full of funky strings.
Is the .notes section really needed or can I safely force it out with --remove-section?
Thanks for any help.
From experience and from looking at the man page for strip, it looks like strip isn't supposed to get rid of any and all sections and strings that aren't needed; just symbols. Quoth the man page:
GNU strip discards all symbols from object files objfile.
That being said, from experience, strip, even without --strip-all, removes sections that are not needed for loading, such as .symtab and .strtab, and you can, as you note, remove any other section you want gone with --remove-section.
As an example of a .notes section, I took /bin/ls from my Ubuntu 11.10 64-bit box:
$ readelf -Wn /bin/ls
Notes at offset 0x00000254 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.15
Notes at offset 0x00000274 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 3e6f3159144281f709c3c5ffd41e376f53b47952
That encompasses the .note.ABI-tag section and the .note.gnu.build-id section. It looks like they contain data that isn't necessary to load the program, but also isn't standard, and strip does not know that it is unnecessary for the program to run properly, since an ELF can have any number of additional "unknown" sections that aren't safe to remove. So rather than using a whitelist of sections to keep (which would fail miserably), it uses a blacklist of sections that it knows it can get rid of, and removes those.
Short version: these sections don't seem to be standard and could be used for various things, so strip can't know it's safe to remove them. But based on the info inside the one I took above, if it's your own program, it's almost certainly safe to remove it.
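To look at what such a note region holds before deciding to remove it, the records can be decoded by hand. The sketch below assumes little-endian notes with 4-byte padding, which matches the sizes readelf reported above (0x20 and 0x24 bytes); the function name is hypothetical. Type 1 with owner GNU is NT_GNU_ABI_TAG and type 3 is NT_GNU_BUILD_ID.

```python
import struct

def parse_notes(blob):
    """Parse a little-endian ELF note region into (owner, type, desc) tuples."""
    notes, pos = [], 0
    while pos + 12 <= len(blob):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, note_type = struct.unpack_from("<III", blob, pos)
        pos += 12
        owner = blob[pos:pos + namesz].rstrip(b"\0").decode("ascii")
        pos += (namesz + 3) & ~3    # the name is padded to a 4-byte boundary
        desc = blob[pos:pos + descsz]
        pos += (descsz + 3) & ~3    # so is the descriptor
        notes.append((owner, note_type, desc))
    return notes
```

The bytes to feed it can be taken from the file at the offsets readelf prints, for example offset 0x254 with length 0x20 for the ABI tag note shown above.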