Location of DW_FORM_strp values - elf

I'm trying to understand where DW_FORM_strp attribute values are actually stored in an ELF file (can be found here: https://filebin.net/77bb8359o0ibqu67).
I've found sections .debug_info, .debug_abbrev and .debug_str. I've then parsed the compilation unit header in .debug_info, and found the abbreviation table entry for the compile unit and iterated over its abbreviations. The first abbreviation is DW_AT_producer with form DW_FORM_strp. What I'm wondering is how to find where this offset is located?
From the DWARF4 spec I read: Each debugging information entry begins with a code that represents an entry in a separate abbreviations table. This code is followed directly by a series of attribute values. My understanding of this is that if I go back to the compilation unit header, skip over its content, I should end up at the compilation unit. It starts with a ULEB128 (which I parse), after which the attribute values should come. However, in my ELF file those bytes are all 0. I've run readelf -w on the file, and I see the following:
Contents of the .debug_info section:
Compilation Unit # offset 0x0:
Length: 0xf6 (32-bit)
Version: 4
Abbrev Offset: 0x0
Pointer Size: 8
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<c> DW_AT_producer : (indirect string, offset: 0x62): GNU C11 7.5.0 -mtune=generic -march=x86-64 -g -O0 -fstack-protector-strong
<10> DW_AT_language : 12 (ANSI C99)
<11> DW_AT_name : (indirect string, offset: 0xd9): elf.c
<15> DW_AT_comp_dir : (indirect string, offset: 0xad): /home//struct_analyzer
<19> DW_AT_low_pc : 0x0
<21> DW_AT_high_pc : 0x39
<29> DW_AT_stmt_list : 0x0
This tells me that the offset into the string table is 0x62, and the name is at an offset 0xd9. However, after parsing the ULEB128 which is the first part of any DIE, the next 4 bytes (the first attribute's value) are 0x00 00 00 00. This I don't understand?
Edit to Employed Russian:
Yes, I understand that the offset 0x62 points into the .debug_str section. However, what I'm wondering is where I find this 0x62 value?
Each DIE starts with a ULEB128 value (the abbreviation table entry code), and is followed by the attributes. The first attribute in the corresponding abbreviation table entry is a DW_AT_producer of form DW_FORM_strp. This means that the next 4 bytes in the DIE are supposed to be the offset into .debug_str. However, the next 4 bytes are 0x00 00 00 00, and not 0x62 00 00 00, which is the value I'm looking for. 0x62 resides at offset 0x5c8 in the ELF file, whereas the DIE's attributes start at offset 0x85 as far as I can tell (see attached image for a hexdump (little endian) - highlighted byte is the ULEB128, and the following bytes are what I expect to be the offset into .debug_str).
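The parsing steps described above (a ULEB128 abbreviation code followed by a 4-byte DW_FORM_strp value) can be sketched in Python roughly as follows. The function names and the sample DIE bytes are illustrative assumptions, and the sketch assumes 32-bit DWARF in a little-endian file:

```python
import struct

def decode_uleb128(buf, pos):
    """Decode an unsigned LEB128 value starting at buf[pos].

    Returns (value, position of the first byte after the value).
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry data
        if (byte & 0x80) == 0:             # high bit clear: last byte
            return result, pos
        shift += 7

def read_strp(buf, pos):
    """Read a 4-byte little-endian DW_FORM_strp value (32-bit DWARF)."""
    (offset,) = struct.unpack_from("<I", buf, pos)
    return offset, pos + 4

# Hypothetical DIE bytes: abbreviation code 1, then a strp field that is
# still zero because no relocation has been applied yet.
die = bytes([0x01, 0x00, 0x00, 0x00, 0x00])
code, pos = decode_uleb128(die, 0)
strp, pos = read_strp(die, pos)
print(code, hex(strp))
```

In an unlinked object file the strp field read this way can legitimately be zero, which matches the bytes seen in the hexdump.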
Edit 2
I've been able to determine that the actual attribute values of form DW_FORM_strp are located in the .rela.debug_info section in the ELF file, so I'll have to read more about that.

The specific ELF file posted for this question also has a .rela.debug_info section, which contains relocation entries for the .debug_info section. From the ELF spec:
.relaNAME
This section holds relocation information as described below.
If the file has a loadable segment that includes relocation,
the section's attributes will include the SHF_ALLOC bit.
Otherwise, the bit will be off. By convention, "NAME" is
supplied by the section to which the relocations apply. Thus a
relocation section for .text normally would have the name
.rela.text. This section is of type SHT_RELA.
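Each relocation entry in this section (of type Elf64_Rela in this particular case) should be iterated over, and the value computed for each entry should be written into the .debug_info section at that entry's r_offset. For the R_X86_64_32 relocations typically used for DW_FORM_strp, that value is the referenced symbol's value plus the entry's r_addend, which is why the bytes stored in .debug_info itself are still zero.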
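A minimal Python sketch of that loop is shown here, under some loudly stated assumptions: the file is little-endian ELF64, only R_X86_64_32 is handled, and `sym_values` is a hypothetical pre-built list of st_value fields indexed by symbol number (in a relocatable file, section symbols such as the one for .debug_str normally have value 0). It is not a complete relocation processor.

```python
import struct

R_X86_64_32 = 10  # "Direct 32 bit zero extended": value = S + A

def apply_rela32(section, rela, sym_values):
    """Return a patched copy of `section`.

    section    -- raw bytes of the target section (e.g. .debug_info)
    rela       -- raw bytes of the matching SHT_RELA section
    sym_values -- hypothetical list mapping symbol index -> st_value
    Only R_X86_64_32 entries are applied; others are skipped.
    """
    out = bytearray(section)
    # Elf64_Rela: r_offset (u64), r_info (u64), r_addend (s64)
    for r_offset, r_info, r_addend in struct.iter_unpack("<QQq", rela):
        sym = r_info >> 32             # ELF64_R_SYM
        rtype = r_info & 0xFFFFFFFF    # ELF64_R_TYPE
        if rtype != R_X86_64_32:
            continue
        value = (sym_values[sym] + r_addend) & 0xFFFFFFFF
        struct.pack_into("<I", out, r_offset, value)
    return bytes(out)

# Synthetic example: one entry patching offset 0xc with addend 0x62,
# against symbol 1 whose value is 0.
section = bytes(16)
rela = struct.pack("<QQq", 0xC, (1 << 32) | R_X86_64_32, 0x62)
patched = apply_rela32(section, rela, [0, 0])
print(patched[0xC:0x10].hex())
```

With the synthetic entry above, the four bytes at offset 0xc become 62 00 00 00, the little-endian form of the 0x62 that readelf reports for DW_AT_producer.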

This tells me that the offset into the string table is 0x62, and the name is at an offset 0xd9.
Correct. These offsets are into the .debug_str section, which starts at offset 0x289 in the file.
readelf -WS elf.o | grep debug_str
[12] .debug_str PROGBITS 0000000000000000 000289 0000e4 01 MS 0 0 1
dd if=elf.o bs=1 skip=$((0x289+0x62)) count=75 2>/dev/null
GNU C11 7.5.0 -mtune=generic -march=x86-64 -g -O0 -fstack-protector-strong
dd if=elf.o bs=1 skip=$((0x289+0xd9)) count=5 2>/dev/null
elf.c
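The same lookup that the dd commands perform can be sketched in Python. The helper names below are made up for illustration, and the section offset has to be supplied by the caller (for example from the readelf -WS output):

```python
def read_cstring(data, offset):
    """Return the NUL-terminated string starting at data[offset]."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8", errors="replace")

def read_debug_str(path, section_offset, str_offset):
    """Read one string from a file, given .debug_str's file offset."""
    with open(path, "rb") as f:
        image = f.read()
    return read_cstring(image, section_offset + str_offset)

# Stand-in for a .debug_str section; a real one comes from the file.
demo = b"first\x00elf.c\x00"
print(read_cstring(demo, 6))   # elf.c
```

Under these assumptions, read_debug_str("elf.o", 0x289, 0xd9) would mirror the second dd command above, without needing to know the string length in advance.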
P.S.
I've found sections .dwarf_info, .dward_abbrev and .dwarf_str.
None of the above sections exist in your file. It helps to be precise when asking questions.

Related

Getting difference between virtual address and Offset in an ELF file

readelf -S of a particular binary gives the following output
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .hash HASH 0000000000400278 00000278
0000000000000a7c 0000000000000004 A 4 0 8
[ 4] .dynsym DYNSYM 0000000000400cf8 00000cf8
.
.
.
Difference between virtual address and offset of first section .interp is 0x400000. I am curious as to:
how is this calculated?
Is there a programmatic way of determining this?
how is this calculated?
You just calculated it yourself: 0x400238 - 0x238 == 0x400000. Your question is probably "why is this particular address selected?".
This is the default link-at address for Linux x86_64 position dependent binaries. You can change that address with -Ttext=... linker flag. The default is different for ix86 (32-bit) binaries: it's 0x8048000.
I am not sure why these particular defaults were chosen.
Is there a programmatic way of determining this?
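Sure: read the Elf64_Ehdr from the start of the file. It will tell you the offset to the start of the program headers (.e_phoff). Seek to that offset, and read the Elf64_Phdrs. Now iterate over them: the .p_vaddr and .p_offset of each PT_LOAD entry give you the same kind of address/offset pair, and for the first PT_LOAD segment their difference is the 0x400000 you computed.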
P.S. You are looking at sections, which are not used at runtime and are not guaranteed to be present in a fully-linked binary. You should be looking at segments (program headers) instead. Use readelf -Wl a.out to examine them.
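As a rough illustration of the programmatic approach, the following Python sketch walks the program headers of a little-endian ELF64 image and reports p_vaddr - p_offset for each PT_LOAD entry. The function name is an assumption, the sketch does no validation of the ELF magic or class, and the test image is synthetic rather than a real binary:

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, little-endian
PT_LOAD = 1

def load_biases(image):
    """Return [(p_offset, p_vaddr, p_vaddr - p_offset)] for PT_LOAD entries."""
    ehdr = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    result = []
    for i in range(e_phnum):
        fields = PHDR.unpack_from(image, e_phoff + i * e_phentsize)
        p_type, _flags, p_offset, p_vaddr = fields[:4]
        if p_type == PT_LOAD:
            result.append((p_offset, p_vaddr, p_vaddr - p_offset))
    return result

# Synthetic image: one ELF header followed by a single PT_LOAD header.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
image = EHDR.pack(ident, 2, 62, 1, 0x400430, 64, 0, 0, 64, 56, 1, 64, 0, 0)
image += PHDR.pack(PT_LOAD, 5, 0, 0x400000, 0x400000, 0x1000, 0x1000, 0x200000)
for off, vaddr, bias in load_biases(image):
    print(hex(off), hex(vaddr), hex(bias))
```

For a real file, the image would be read with open(path, "rb").read(); a 32-bit binary would need the Elf32 layouts instead.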

modified elf symbol but not reflecting in disassembly

I have used a symbol "__copy_start" inside my assembly code, which comes from the linker script. The symbol is defined as ABS in the symbol table.
This symbol is used inside a macro to copy data from one memory location to another.
After looking at various ways to modify this symbol directly in the ELF file, I decided to write C code of my own to modify the symbol value.
To do that, I traversed the entire symbol table and did a string match for the symbol I am interested in. When there is a symbol name match, I just assigned symbol_table.st_value = new value.
To make sure the new value is taken, I ran readelf -s and checked that it does show the new value I assigned.
Now, when I disassemble the modified ELF, I find that the new value has not taken effect, and I still see the assembly code copying from the old symbol value.
My question is:
Am I doing something wrong here? Is it possible to change symbol values in an ELF file? If yes, please let me know the correct way to do it. How do I achieve what I intend to do here?
Note: I don't have the source code so taking this approach.
Thanks in advance,
Gaurav
I wanted to add more information so that people can understand better.
Copying the ELF header below:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Ubicom32 32-bit microcontrollers
Version: 0x1
Entry point address: 0xb0000000
Start of program headers: 52 (bytes into file)
Start of section headers: 33548 (bytes into file)
Flags: 0x6
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 3
As you can see, the file is of type executable.
The output of readelf -S is copied below:
There are 6 section headers, starting at offset 0x830c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 3ffc0000 004000 000ebc 00 AX 0 0 1
[ 2] .sdram PROGBITS 50000000 008000 0002e4 00 WA 0 0 1
[ 3] .shstrtab STRTAB 00000000 0082e4 000028 00 0 0 1
[ 4] .symtab SYMTAB 00000000 0083fc 0001c0 10 5 20 4
[ 5] .strtab STRTAB 00000000 0085bc 00019a 00 0 0 1
I am using a symbol named "__copy_start" in an instruction to copy data from the .sdram section to the .text section. I was under the impression that I could change symbol_table.st_value and get the desired result. Unfortunately that is not the case: it seems the value is already compiled into the code and cannot be changed like this.
Any idea how this could be done would be really helpful.
Regards,
Gaurav
Are you sure that the object code actually uses a relocation to reference the data at the __copy_start symbol? Even for position-independent code, it is usually possible to turn section start addresses into relative addresses, which do not need a relocation. (That the symbol itself remains present with an absolute address does not change this.)
You can check this by using readelf -r or eu-readelf -r and examining the output. It is also visible in the objdump --disassemble --reloc output.
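To make the point concrete, here is a small Python sketch of what patching st_value actually touches, assuming a little-endian ELF32 symbol table as in the header shown in the question. The helper name and the sample symbol values are invented for illustration. The patch rewrites four bytes inside .symtab only; any address already encoded in the instructions of .text stays as it was unless a relocation is reapplied.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx
SYM32 = struct.Struct("<IIIBBH")

def patch_st_value(symtab, index, new_value):
    """Return a copy of the raw .symtab bytes with one st_value replaced."""
    out = bytearray(symtab)
    pos = index * SYM32.size
    fields = list(SYM32.unpack_from(out, pos))
    fields[1] = new_value              # st_value is the second field
    SYM32.pack_into(out, pos, *fields)
    return bytes(out)

# Synthetic table: the null symbol plus one SHN_ABS (0xfff1) symbol.
symtab = SYM32.pack(0, 0, 0, 0, 0, 0) + SYM32.pack(1, 0x50000000, 0, 0x10, 0, 0xFFF1)
patched = patch_st_value(symtab, 1, 0x50000100)
print(hex(SYM32.unpack_from(patched, SYM32.size)[1]))
```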

ELF unmapped region in a segment

Below is the output for my readelf -l test
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00008000 0x00008000 0x00148 0x00148 R E 0x8000
LOAD 0x000148 0x00010148 0x00010148 0x00000 0x00004 RW 0x8000
NOTE 0x0000b4 0x000080b4 0x000080b4 0x00024 0x00024 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00 .note.gnu.build-id .text
01 .bss
02 .note.gnu.build-id
03
My question is about the first LOAD segment. It encompasses [8000 - 8148], and is mapped to sections .note and .text. My readelf -S output shows that .note section starts from 80b4, and .text starts from 80d8. That means Loadable segment contains a region [8000-80b3] which is unmapped to any section, but still will be loaded to memory by loader.
My question is, if there is any harm if I create a new segment which ranges from[80b4-8148] deleting this segment?
That means Loadable segment contains a region [8000-80b3] which is unmapped to any section, but still will be loaded to memory by loader.
Correct. You will find the Elf32_Ehdr, and likely a set of Elf32_Phdrs in that segment.
Note: for the main binary, it's actually the kernel that does the loading, and not the dynamic linker. You are not wrong in calling it "loader", but usually people use "loader" for the dynamic linker, and not the "part of the kernel that maps in the main binary".
My question is, if there is any harm if I create a new segment which ranges from[80b4-8148] deleting this segment?
The segment has to be page-aligned. A segment with .p_vaddr that is not page-aligned (as I believe you are proposing) will be rejected by the kernel.
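For reference, the ELF specification words the alignment rule for loadable segments as a congruence: p_vaddr should equal p_offset modulo p_align. A small Python check of that rule against the two LOAD entries in the readelf output above might look like this (the function name is an assumption, and what a particular kernel accepts can be stricter or looser than this check):

```python
def is_congruent(p_offset, p_vaddr, p_align):
    """Check the ELF rule p_vaddr == p_offset (mod p_align).

    Per the specification, p_align values of 0 and 1 mean that no
    alignment is required.
    """
    if p_align in (0, 1):
        return True
    return p_offset % p_align == p_vaddr % p_align

# The two LOAD entries from the readelf -l output in the question.
for off, vaddr, align in [(0x000000, 0x00008000, 0x8000),
                          (0x000148, 0x00010148, 0x8000)]:
    print(hex(off), hex(vaddr), is_congruent(off, vaddr, align))
```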

relocation section header information in .elf file

Apologies for my poor English. I am having a hard time understanding what the sh_info field contains for a relocation section. The following is what I get from the ELF document:
It says
sh_info : contains the section header index of the section to which the relocation applies
sh_link: contains the section header index of the associated symbol table.
Clearly: sh_info is not about the symbol table section that the relocation section relates to, whose information is stored in sh_link.
Based on my understanding: when relocating a symbol, three sections are related: the relocation section, the symbol table section, and the section which contains symbols' definition for symbols in the symbol table.
Assumption 1: So I assume sh_info is about the third section mentioned ahead
However, when I go through the sample code for relocation, my assumption does not seem to match:
static int elf_do_reloc(Elf32_Ehdr *hdr, Elf32_Rel *rel, Elf32_Shdr *reltab) {
    Elf32_Shdr *target = elf_section(hdr, reltab->sh_info);
    int addr = (int)hdr + target->sh_offset;
    int *ref = (int *)(addr + rel->r_offset);
    // Symbol value
    int symval = 0;
    if(ELF32_R_SYM(rel->r_info) != SHN_UNDEF) {
        symval = elf_get_symval(hdr, reltab->sh_link, ELF32_R_SYM(rel->r_info));
        if(symval == ELF_RELOC_ERR) return ELF_RELOC_ERR;
    }
Since r_info is a field that only entries in a relocation section contain,
this would mean sh_info is the index of the relocation section itself. (Assumption 2)
What confuses me more is an example someone else posted (reading elf file example):
there, the sh_info field seems unrelated to either of my two assumptions.
Could anyone please help explain what does sh_info really contains?
About the "confusing example": the relocation part may have been deleted, but its only mention of sh_info is in the parsing of (dynamic) symbol names, and (as shown in the image in your question) that field has a different meaning for SHT_SYMTAB and SHT_DYNSYM sections: one greater than the symbol table index of the last local symbol.
Section #0A OFF: 0x000015F8
Name: .rela.plt (0x00000084)
Type: SHT_RELA (0x00000004)
Flags: -a-
Addr: 0x004003E8
Offset: 0x000003E8
Size: 0x00000090
Link: 0x00000005
Info 0x0000000C
Section #05 OFF: 0x000014B8
Name: .dynsym (0x0000004E)
Type: SHT_DYNSYM (0x0000000B)
Flags: -a-
Addr: 0x00400280
Offset: 0x00000280
Size: 0x000000A8
Link: 0x00000006
Info 0x00000001
Section #0C OFF: 0x00001678
Name: .plt (0x00000089)
Type: SHT_PROGBITS (0x00000001)
Flags: -ax
Addr: 0x004004A0
Offset: 0x000004A0
Size: 0x00000070
Link: 0x00000000
Info 0x00000000
You can see that sh_link points to .dynsym section and sh_info points to .plt section (which contains executable memory).
So sh_link is symbol table and sh_info is executable section that gets modified.
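The lookup just described can be sketched in Python using the three section headers from the dump above. The `Section` tuple and the function name are illustrative assumptions, and only the header fields needed for the lookup are modeled:

```python
from collections import namedtuple

SHT_RELA, SHT_REL = 4, 9
Section = namedtuple("Section", "name sh_type sh_link sh_info")

def describe_reloc_section(sections, index):
    """Return (symbol table name, target section name) for a reloc section."""
    sec = sections[index]
    if sec.sh_type not in (SHT_REL, SHT_RELA):
        raise ValueError(f"{sec.name} is not a relocation section")
    symtab = sections[sec.sh_link]   # sh_link: associated symbol table
    target = sections[sec.sh_info]   # sh_info: section the relocs apply to
    return symtab.name, target.name

# Header fields copied from the dump above, keyed by section index.
sections = {
    0x0A: Section(".rela.plt", 0x4, 0x5, 0xC),
    0x05: Section(".dynsym", 0xB, 0x6, 0x1),
    0x0C: Section(".plt", 0x1, 0x0, 0x0),
}
print(describe_reloc_section(sections, 0x0A))   # ('.dynsym', '.plt')
```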
Basically your document says it all already, but here are some more references.
Chapter on Sections [Figure 4-12: sh_link and sh_info Interpretation]:
sh_link - The section header index of the associated symbol table.
sh_info - The section header index of the section to which the relocation applies.
Also there's a chapter on relocation:
A relocation section references two other sections: a symbol table and a section to modify. The section header's sh_info and sh_link members, described in ``Sections'' above, specify these relationships. Relocation entries for different object files have slightly different interpretations for the r_offset member.
In relocatable files, r_offset holds a section offset. The relocation section itself describes how to modify another section in the file; relocation offsets designate a storage unit within the second section.
In executable and shared object files, r_offset holds a virtual address. To make these files' relocation entries more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation).
And just for fun, here are the relocation types for x86-64 on Linux:
#define R_X86_64_NONE 0 /* No reloc */
#define R_X86_64_64 1 /* Direct 64 bit */
#define R_X86_64_PC32 2 /* PC relative 32 bit signed */
#define R_X86_64_GOT32 3 /* 32 bit GOT entry */
#define R_X86_64_PLT32 4 /* 32 bit PLT address */
#define R_X86_64_COPY 5 /* Copy symbol at runtime */
#define R_X86_64_GLOB_DAT 6 /* Create GOT entry */
#define R_X86_64_JUMP_SLOT 7 /* Create PLT entry */
#define R_X86_64_RELATIVE 8 /* Adjust by program base */
#define R_X86_64_GOTPCREL 9 /* 32 bit signed pc relative
offset to GOT */
#define R_X86_64_32 10 /* Direct 32 bit zero extended */
#define R_X86_64_32S 11 /* Direct 32 bit sign extended */
#define R_X86_64_16 12 /* Direct 16 bit zero extended */
#define R_X86_64_PC16 13 /* 16 bit sign extended pc relative */
#define R_X86_64_8 14 /* Direct 8 bit sign extended */
#define R_X86_64_PC8 15 /* 8 bit sign extended pc relative */

Why are elf segments not page aligned?

readelf -l /bin/ls:
LOAD 0x000000 0x08048000 0x08048000 0x18ff8 0x18ff8 R E 0x1000
LOAD 0x019eec 0x08061eec 0x08061eec 0x003f4 0x01014 RW 0x1000
So the boundary page between the two segments is both read-only and read-writable, how is this possible?
Assuming a page size of 4096 (0x1000) bytes and rounding addresses to page granularities:
The first loadable segment would use the address range [0x8048000--0x8060FFF], both ends inclusive.
The second loadable segment would use the address range [0x8061000--0x8062FFF], of which 0x3F4 bytes starting at address 0x8061EEC would come from the executable, with the rest being zero-filled at load time.
There is no overlap.
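The rounding used above can be reproduced with a few lines of Python. This is a sketch assuming a 4096-byte page size and a non-zero memory size; the function name is made up for illustration:

```python
def page_range(vaddr, memsz, page_size=0x1000):
    """Return the inclusive (first, last) addresses of the pages covering
    [vaddr, vaddr + memsz). Assumes memsz > 0 and a power-of-two page size."""
    first = vaddr & ~(page_size - 1)               # round start down
    last = (vaddr + memsz - 1) | (page_size - 1)   # round end up
    return first, last

# VirtAddr and MemSiz of the two LOAD segments from readelf -l /bin/ls.
for vaddr, memsz in [(0x08048000, 0x18FF8), (0x08061EEC, 0x01014)]:
    lo, hi = page_range(vaddr, memsz)
    print(f"[{lo:#x}--{hi:#x}]")
# [0x8048000--0x8060fff]
# [0x8061000--0x8062fff]
```

The first range ends at 0x8060fff and the second begins at 0x8061000, so the two segments occupy adjacent but distinct pages.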