Getting difference between virtual address and Offset in an ELF file - elf

readelf -S of a particular binary gives the following output
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .hash HASH 0000000000400278 00000278
0000000000000a7c 0000000000000004 A 4 0 8
[ 4] .dynsym DYNSYM 0000000000400cf8 00000cf8
.
.
.
Difference between virtual address and offset of first section .interp is 0x400000. I am curious as to:
how is this calculated?
Is there a programmatic way of determining this?

how is this calculated?
You just calculated it yourself: 0x400238 - 0x238 == 0x400000. Your question is probably "why is this particular address selected?".
This is the default link-at address for Linux x86_64 position dependent binaries. You can change that address with -Ttext=... linker flag. The default is different for ix86 (32-bit) binaries: it's 0x8048000.
I am not sure why these particular defaults were chosen.
Is there a programmatic way of determining this?
Sure: read the Elf64_Ehdr from the start of the file. It will tell you offset to the start of program headers (.e_phoff). Seek to that offset, and read Elf64_Phdrs. Now iterate over them, and their .p_vaddr and .p_offset will have the same values.
P.S. You are looking at program sections which are not used and are not guaranteed to be present in a fully-linked binary. You should be looking at program segments instead. Use readelf -Wl a.out to examine them.

Related

Identify entries in a global offset table

I'm attempting to write some ELF parsing logic (in C). Specifically, I'm trying to identify which entries in the GOT correspond to which functions.
I've crafted a simple program which contains references to malloc and free. Some relevant excerpts from readelf -a a.out:
Relocation section '.rela.plt' at offset 0x630 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000003fc8 0000000100000007 R_X86_64_JUMP_SLOT 0000000000000000 free#GLIBC_2.2.5 + 0
0000000000003fd0 0000000500000007 R_X86_64_JUMP_SLOT 0000000000000000 malloc#GLIBC_2.2.5 + 0
No processor specific unwind information to decode
Symbol table '.dynsym' contains 8 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free#GLIBC_2.2.5 (2)
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main#GLIBC_2.34 (3)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTable
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND malloc#GLIBC_2.2.5 (2)
6: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
7: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize#GLIBC_2.2.5 (2)
I know how to use .dynstr to get the names of the symbols in .dynsym. However, how is readelf populating the symbol names in .rela.plt? I'm not seeing anything in the definitions of either Elf64_Sym or Elf64_Rel which would help. At first, I thought the st_shndx field in Elf64_Sym would be relevant but readelf is showing that value as SHN_UNDEF.
The information is contained in the Elf64_Rel structure. Specifically, the r_info field:
This member gives both the symbol table index with respect to which the relocation must be made and the type of relocation to apply.
The ELF64_R_SYM macro can be used to extract the offset from this field. As seen in the .rela.plt description in the OP, free, for example, has an index of 1 which corresponds to entry 1 in .dynsym.

Location of DW_FORM_strp values

I'm trying to understand where DW_FORM_strp attribute values are actually stored in an ELF file (can be found here: https://filebin.net/77bb8359o0ibqu67).
I've found sections .debug_info, .debug_abbrev and .debug_str. I've then parsed the compilation unit header in .debug_info, and found the abbreviation table entry for the compile unit and iterated over its abbreviations. The first abbreviation is DW_AT_producer with form DW_FORM_strp. What I'm wondering is how to find where this offset is located?
From the DWARF4 spec I read: Each debugging information entry begins with a code that represents an entry in a separate abbreviations table. This code is followed directly by a series of attribute values. My understanding of this is that if I go back to the compilation unit header, skip over its content, I should end up at the compilation unit. It starts with a ULEB128 (which I parse), after which the attribute values should come. However, in my ELF file those bytes are all 0. I've run readelf -w on the file, and I see the following:
Contents of the .debug_info section:
Compilation Unit # offset 0x0:
Length: 0xf6 (32-bit)
Version: 4
Abbrev Offset: 0x0
Pointer Size: 8
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<c> DW_AT_producer : (indirect string, offset: 0x62): GNU C11 7.5.0 -mtune=generic -march=x86-64 -g -O0 -fstack-protector-strong
<10> DW_AT_language : 12 (ANSI C99)
<11> DW_AT_name : (indirect string, offset: 0xd9): elf.c
<15> DW_AT_comp_dir : (indirect string, offset: 0xad): /home//struct_analyzer
<19> DW_AT_low_pc : 0x0
<21> DW_AT_high_pc : 0x39
<29> DW_AT_stmt_list : 0x0
This tells me that the offset into the string table is 0x62, and the name is at an offset 0xd9. However, after parsing the ULEB128 which is the first part of any DIE, the next 4 bytes (the first attribute's value) are 0x00 00 00 00. This I don't understand?
Edit to Employed Russian:
Yes, I understand that the offset 0x62 points into the .debug_str section. However, what I'm wondering is where I find this 0x62 value?
Each DIE starts with a ULEB128 value (the abbreviation table entry code), and is followed by the attributes. The first attribute in the corresponding abbreviation table entry is a DW_AT_producer of form DW_FORM_strp. This means that the next 4 bytes in the DIE are supposed to be the offset into .debug_str. However, the next 4 bytes are 0x00 00 00 00, and not 0x62 00 00 00 which is the value I'm looking. 0x62 is residing at offset 0x5c8 into the ELF file, whereas the DIE's attributes start at offset 0x85 as far as I can tell (see attached image for a hexdump (little endian) - highlighted byte is the ULEB128, and the following bytes are what I expect to be the offset into .debug_str).
Edit 2
I've been able to determine that the actual attribute values of form DW_FORM_strp are located in the .rela.debug_info section in the ELF file, so I'll have to read more about that.
The specific ELF file posted for this question also has a rela.debug_info section, which contains relocation entries for the .debug_info section. From the ELF spec:
.relaNAME
This section holds relocation information as described below.
If the file has a loadable segment that includes relocation,
the section's attributes will include the SHF_ALLOC bit. Oth‐
erwise, the bit will be off. By convention, "NAME" is sup‐
plied by the section to which the relocations apply. Thus a
relocation section for .text normally would have the name
.rela.text. This section is of type SHT_RELA.
Each relocation entry in this section (of type Elf64_Rela in this particular case) should be iterated over, and the value of each entry should be addended with the corresponding value in the .debug_info section.
This tells me that the offset into the string table is 0x62, and the name is at an offset 0xd9.
Correct. These offsets are into the .debug_str section, which starts at offset 0x289 in the file.
readelf -WS elf.o | grep debug_str
[12] .debug_str PROGBITS 0000000000000000 000289 0000e4 01 MS 0 0 1
dd if=elf.o bs=1 skip=$((0x289+0x62)) count=75 2>/dev/null
GNU C11 7.5.0 -mtune=generic -march=x86-64 -g -O0 -fstack-protector-strong
dd if=elf.o bs=1 skip=$((0x289+0xd9)) count=5 2>/dev/null
elf.c
P.S.
I've found sections .dwarf_info, .dward_abbrev and .dwarf_str.
None of above sections exit in your file. It helps to be precise when asking questions.

modified elf symbol but not reflecting in disassembly

I have used a symbol "__copy_start" inside my assembly code which is coming from linker script. symbol is defined as ABS in symbol table.
This symbol is used inside a macro to copy data from one memory location to another.
After looking at varenter code hereious ways to modify this symbol directly in elf i decided to write C code of my own to modify the symbol value.
To do that i traversed entire symbol table and did string match for the symbol i am interested in. When there is a symbol name match i just assigned symbol_table.st_value = new value.
To make sure the new value is taken i did readelf -s and checked that it does show the new value assigned by me.
Now, when i disassemble the modified elf i find that the new value has not taken effect and i still see the assembly code doing copy from old symbol value.
My question is:
Am i doing something wrong here? is it possible to change the symbol values in elf? If yes, please let me know the correct way to do it. How do i achieve what i intend to do here.
Note: I don't have the source code so taking this approach.
Thanks in advance,
Gaurav
wanted to add more information so that people can understand better.
copying the elf header below:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
**Type: EXEC (Executable file)**
Machine: Ubicom32 32-bit microcontrollers
Version: 0x1
Entry point address: 0xb0000000
Start of program headers: 52 (bytes into file)
Start of section headers: 33548 (bytes into file)
Flags: 0x6
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 3
Here as you can see that file is of type executable.
output of readelf -S copied below:
There are 6 section headers, starting at offset 0x830c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 3ffc0000 004000 000ebc 00 AX 0 0 1
[ 2] .sdram PROGBITS 50000000 008000 0002e4 00 WA 0 0 1
[ 3] .shstrtab STRTAB 00000000 0082e4 000028 00 0 0 1
[ 4] .symtab SYMTAB 00000000 0083fc 0001c0 10 5 20 4
[ 5] .strtab STRTAB 00000000 0085bc 00019a 00 0 0 1
I am using one of the symbol named "__copy_start" in an instruction to copy the data from .sdram section to .text section. I was under an impression that i could go and change the symbol_table.st_value and then get the desired work done. But unfortunately that is not the case. Seems like it is already compiled and cannot be changed like this.
Any idea how this could be done would be really helpful.
Regards,
Gaurav
Are you sure that the object code actually uses a relocation to reference the data at the __copy_start symbol? Even for position-independent code, it is usually possible to turn section start addresses into relative addresses, which do not need a relocation. (That the symbol itself remains present with an absolute address does not change this.)
You can check this by using readelf -r or eu-readelf -r and examining the output. It is also visible in the objdump --dissassemble --reloc output.

Make all pages readable/writable/executable

I would like to grant full permissions (read, write, and execute) to all memory pages in an ELF binary. Ideally, I'd like to be able to do this as a transformation on a binary or object file, in the same way that symbols can be changed with objcopy. I have not yet found a good way to do this. I would also be okay with a solution that involves running code at startup that calls mprotect on every page with the flags PROT_READ | PROT_WRITE | PROT_EXEC. I've tried this briefly, but I haven't found a good way to know which pages are mapped, and therefore which pages need to be mprotected.
It isn't required that dynamically allocated pages have all permissions, only the pages mapped at program startup.
The following script implements Employed Russian's answer in code:
sets the p_type of the RELRO segment to PT_NULL
sets Flags on LOAD segments to PF_X|PF_W|PF_R.
It depends on pyelftools for python3, which can be installed with pip3 install pyelftools.
#!/usr/bin/env python3
import sys
from elftools.elf.elffile import ELFFile
from elftools.elf.descriptions import describe_p_type
if len(sys.argv) != 2:
print("Usage: elf_rwe <elf>")
name = sys.argv[1]
with open(name, "rb") as f:
elf = ELFFile(f)
rwe_offsets = []
relro_offsets = []
for i in range(elf['e_phnum']):
program_offset = elf['e_phoff'] + i * elf['e_phentsize']
f.seek(program_offset)
program_header = elf.structs.Elf_Phdr.parse_stream(f)
if program_header['p_type'] == "PT_LOAD":
rwe_offsets.append(program_offset)
if program_header['p_type'] == "PT_GNU_RELRO":
relro_offsets.append(program_offset)
f.seek(0)
b = list(f.read())
# Zap RELRO
pt_null = 0
for off in relro_offsets:
b[off] = pt_null
# Fix permissions
p_flags_offset = 4
for off in rwe_offsets:
b[off + p_flags_offset] = 0x7 # PF_X|PF_W|PF_R
with open(name, "wb") as f:
f.write(bytes(b))
I would like to grant full permissions (read, write, and execute) to all memory pages in an ELF binary.
Note that some security policies, such as W^X in selinux will prevent your binary from running.
Ideally, I'd like to be able to do this as a transformation on a binary or object file
Run readelf -Wl on your binary. You'll see something similar to:
$ readelf -Wl /bin/date
Elf file type is EXEC (Executable file)
Entry point 0x4021cf
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
INTERP 0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x00dde4 0x00dde4 R E 0x200000
LOAD 0x00de10 0x000000000060de10 0x000000000060de10 0x0004e4 0x0006b0 RW 0x200000
DYNAMIC 0x00de28 0x000000000060de28 0x000000000060de28 0x0001d0 0x0001d0 RW 0x8
NOTE 0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R 0x4
GNU_EH_FRAME 0x00cb8c 0x000000000040cb8c 0x000000000040cb8c 0x0002f4 0x0002f4 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x00de10 0x000000000060de10 0x000000000060de10 0x0001f0 0x0001f0 R 0x1
What you want to do then is to change Flags on LOAD segments to have PF_X|PF_W|PF_R. The flags are part of Elf{32,64}_Phdr table, and the offset to the table is in stored in e_phoff of the Elf{32,64}_Ehdr (which is stored at the start of every ELF file).
Look in /usr/include/elf.h. Parsing fixed-sized ELF structures involved here isn't complicated.
You are unlikely to find any standard tool that would do this for you (given that this is such an unusual and unsecure thing to do), but a program to change flags is trivial to write in C, Python or Perl.
P.S. You may also need to "zap" the RELRO segment, which could be done by changing its p_type to PT_NULL.
I haven't found a good way to know which pages are mapped, and therefore which pages need to be mprotected.
On Linux, you could parse /proc/self/maps to get that info. Other OSes may offer a different way to achieve the same.

ELF unmapped region in a segment

Below is the output for my readelf -l test
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00008000 0x00008000 0x00148 0x00148 R E 0x8000
LOAD 0x000148 0x00010148 0x00010148 0x00000 0x00004 RW 0x8000
NOTE 0x0000b4 0x000080b4 0x000080b4 0x00024 0x00024 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00 .note.gnu.build-id .text
01 .bss
02 .note.gnu.build-id
03
My question is about the first LOAD segment. It encompasses [8000 - 8148], and is mapped to sections .note and .text. My readelf -S output shows that .note section starts from 80b4, and .text starts from 80d8. That means Loadable segment contains a region [8000-80b3] which is unmapped to any section, but still will be loaded to memory by loader.
My question is, if there is any harm if I create a new segment which ranges from[80b4-8148] deleting this segment?
That means Loadable segment contains a region [8000-80b3] which is unmapped to any section, but still will be loaded to memory by loader.
Correct. You will find the Elf32_Ehdr, and likely a set of Elf32_Phdrs in that segment.
Note: for the main binary, it's actually the kernel that does the loading, and not the dynamic linker. You are not wrong in calling it "loader", but usually people use "loader" for the dynamic linker, and not the "part of the kernel that maps in the main binary".
My question is, if there is any harm if I create a new segment which ranges from[80b4-8148] deleting this segment?
The segment has to be page-aligned. A segment with .p_vaddr that is not page-aligned (as I believe you are proposing) will be rejected by the kernel.