Why are elf segments not page aligned? - elf

readelf -l /bin/ls:
LOAD 0x000000 0x08048000 0x08048000 0x18ff8 0x18ff8 R E 0x1000
LOAD 0x019eec 0x08061eec 0x08061eec 0x003f4 0x01014 RW 0x1000
So the boundary page between the two segments is both read-only and read-writable, how is this possible?

Assuming a page size of 4096 (0x1000) bytes and rounding addresses to page granularities:
The first loadable segment would use the address range [0x8048000--0x8060FFF], both ends inclusive.
The second loadable segment would use the address range [0x8061000--0x8062FFF], of which 0x3F4 bytes starting at address 0x8061EEC would come from the executable, with the rest being zero-filled at load time.
There is no overlap.

Related

Location of DW_FORM_strp values

I'm trying to understand where DW_FORM_strp attribute values are actually stored in an ELF file (can be found here: https://filebin.net/77bb8359o0ibqu67).
I've found sections .debug_info, .debug_abbrev and .debug_str. I've then parsed the compilation unit header in .debug_info, and found the abbreviation table entry for the compile unit and iterated over its abbreviations. The first abbreviation is DW_AT_producer with form DW_FORM_strp. What I'm wondering is how to find where this offset is located?
From the DWARF4 spec I read: Each debugging information entry begins with a code that represents an entry in a separate abbreviations table. This code is followed directly by a series of attribute values. My understanding of this is that if I go back to the compilation unit header, skip over its content, I should end up at the compilation unit. It starts with a ULEB128 (which I parse), after which the attribute values should come. However, in my ELF file those bytes are all 0. I've run readelf -w on the file, and I see the following:
Contents of the .debug_info section:
Compilation Unit # offset 0x0:
Length: 0xf6 (32-bit)
Version: 4
Abbrev Offset: 0x0
Pointer Size: 8
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<c> DW_AT_producer : (indirect string, offset: 0x62): GNU C11 7.5.0 -mtune=generic -march=x86-64 -g -O0 -fstack-protector-strong
<10> DW_AT_language : 12 (ANSI C99)
<11> DW_AT_name : (indirect string, offset: 0xd9): elf.c
<15> DW_AT_comp_dir : (indirect string, offset: 0xad): /home//struct_analyzer
<19> DW_AT_low_pc : 0x0
<21> DW_AT_high_pc : 0x39
<29> DW_AT_stmt_list : 0x0
This tells me that the offset into the string table is 0x62, and the name is at an offset 0xd9. However, after parsing the ULEB128 which is the first part of any DIE, the next 4 bytes (the first attribute's value) are 0x00 00 00 00. This I don't understand?
Edit to Employed Russian:
Yes, I understand that the offset 0x62 points into the .debug_str section. However, what I'm wondering is where I find this 0x62 value?
Each DIE starts with a ULEB128 value (the abbreviation table entry code), and is followed by the attributes. The first attribute in the corresponding abbreviation table entry is a DW_AT_producer of form DW_FORM_strp. This means that the next 4 bytes in the DIE are supposed to be the offset into .debug_str. However, the next 4 bytes are 0x00 00 00 00, and not 0x62 00 00 00 which is the value I'm looking. 0x62 is residing at offset 0x5c8 into the ELF file, whereas the DIE's attributes start at offset 0x85 as far as I can tell (see attached image for a hexdump (little endian) - highlighted byte is the ULEB128, and the following bytes are what I expect to be the offset into .debug_str).
Edit 2
I've been able to determine that the actual attribute values of form DW_FORM_strp are located in the .rela.debug_info section in the ELF file, so I'll have to read more about that.
The specific ELF file posted for this question also has a rela.debug_info section, which contains relocation entries for the .debug_info section. From the ELF spec:
.relaNAME
This section holds relocation information as described below.
If the file has a loadable segment that includes relocation,
the section's attributes will include the SHF_ALLOC bit. Oth‐
erwise, the bit will be off. By convention, "NAME" is sup‐
plied by the section to which the relocations apply. Thus a
relocation section for .text normally would have the name
.rela.text. This section is of type SHT_RELA.
Each relocation entry in this section (of type Elf64_Rela in this particular case) should be iterated over, and the value of each entry should be addended with the corresponding value in the .debug_info section.
This tells me that the offset into the string table is 0x62, and the name is at an offset 0xd9.
Correct. These offsets are into the .debug_str section, which starts at offset 0x289 in the file.
readelf -WS elf.o | grep debug_str
[12] .debug_str PROGBITS 0000000000000000 000289 0000e4 01 MS 0 0 1
dd if=elf.o bs=1 skip=$((0x289+0x62)) count=75 2>/dev/null
GNU C11 7.5.0 -mtune=generic -march=x86-64 -g -O0 -fstack-protector-strong
dd if=elf.o bs=1 skip=$((0x289+0xd9)) count=5 2>/dev/null
elf.c
P.S.
I've found sections .dwarf_info, .dward_abbrev and .dwarf_str.
None of above sections exit in your file. It helps to be precise when asking questions.

required size of a configuration file for a HX1K (in "SPI slave" mode)

I am reworking the programmer for the Olimex iCE40HX1K board (targetted towards a STM32F103 ma) where I also would like to implement the "SPI Slave" mode to configure an image directly into RAM without using the serial flash.
Looking at the Lattice "programming and Configuration guide" (page 11), it is noted in table 8 that a EPROM for a ICE40-LP/LX1K must be at least 34112 bytes. (which -I guess- means that the configuration-files can be up to that size).
However, all images I have (sofar) created with the icestorm tools are 32220 octets.
I am a bit puzzled here.
Can somebody explain the difference between these two figures?
Does the HX1K need a configuration-file of 32220 or 34112 bytes?
I don't know how Lattice arrived at this number. A complete HX1K bin file with BRAM initialization but without comment and without multiboot header is 32220 bytes in size. The (optional) multiboot header would add another 160 bytes (32220 + 160 = 32380). The lattice tools usually add about 80 bytes to the comment field (32220 + 80 = 32300). Whatever I do, all numbers I have are more than 1000 short of 34112.
I don't know if there is a maximum length for the comment. Maybe there is and 34112 is the size of a bit stream with a comment of maximum length?
34112 - 32220 = 1892. Maybe someone decided to add 8kB (8192 bytes) just in case, but that person accidentally swapped the first two digits? Idk..
If you don't care about comments or multiboot headers, then iCE40 1K bit-streams have a fixed size, and that size is 32220 bytes.

Make all pages readable/writable/executable

I would like to grant full permissions (read, write, and execute) to all memory pages in an ELF binary. Ideally, I'd like to be able to do this as a transformation on a binary or object file, in the same way that symbols can be changed with objcopy. I have not yet found a good way to do this. I would also be okay with a solution that involves running code at startup that calls mprotect on every page with the flags PROT_READ | PROT_WRITE | PROT_EXEC. I've tried this briefly, but I haven't found a good way to know which pages are mapped, and therefore which pages need to be mprotected.
It isn't required that dynamically allocated pages have all permissions, only the pages mapped at program startup.
The following script implements Employed Russian's answer in code:
sets the p_type of the RELRO segment to PT_NULL
sets Flags on LOAD segments to PF_X|PF_W|PF_R.
It depends on pyelftools for python3, which can be installed with pip3 install pyelftools.
#!/usr/bin/env python3
import sys
from elftools.elf.elffile import ELFFile
from elftools.elf.descriptions import describe_p_type
if len(sys.argv) != 2:
print("Usage: elf_rwe <elf>")
name = sys.argv[1]
with open(name, "rb") as f:
elf = ELFFile(f)
rwe_offsets = []
relro_offsets = []
for i in range(elf['e_phnum']):
program_offset = elf['e_phoff'] + i * elf['e_phentsize']
f.seek(program_offset)
program_header = elf.structs.Elf_Phdr.parse_stream(f)
if program_header['p_type'] == "PT_LOAD":
rwe_offsets.append(program_offset)
if program_header['p_type'] == "PT_GNU_RELRO":
relro_offsets.append(program_offset)
f.seek(0)
b = list(f.read())
# Zap RELRO
pt_null = 0
for off in relro_offsets:
b[off] = pt_null
# Fix permissions
p_flags_offset = 4
for off in rwe_offsets:
b[off + p_flags_offset] = 0x7 # PF_X|PF_W|PF_R
with open(name, "wb") as f:
f.write(bytes(b))
I would like to grant full permissions (read, write, and execute) to all memory pages in an ELF binary.
Note that some security policies, such as W^X in selinux will prevent your binary from running.
Ideally, I'd like to be able to do this as a transformation on a binary or object file
Run readelf -Wl on your binary. You'll see something similar to:
$ readelf -Wl /bin/date
Elf file type is EXEC (Executable file)
Entry point 0x4021cf
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
INTERP 0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x00dde4 0x00dde4 R E 0x200000
LOAD 0x00de10 0x000000000060de10 0x000000000060de10 0x0004e4 0x0006b0 RW 0x200000
DYNAMIC 0x00de28 0x000000000060de28 0x000000000060de28 0x0001d0 0x0001d0 RW 0x8
NOTE 0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R 0x4
GNU_EH_FRAME 0x00cb8c 0x000000000040cb8c 0x000000000040cb8c 0x0002f4 0x0002f4 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x00de10 0x000000000060de10 0x000000000060de10 0x0001f0 0x0001f0 R 0x1
What you want to do then is to change Flags on LOAD segments to have PF_X|PF_W|PF_R. The flags are part of Elf{32,64}_Phdr table, and the offset to the table is in stored in e_phoff of the Elf{32,64}_Ehdr (which is stored at the start of every ELF file).
Look in /usr/include/elf.h. Parsing fixed-sized ELF structures involved here isn't complicated.
You are unlikely to find any standard tool that would do this for you (given that this is such an unusual and unsecure thing to do), but a program to change flags is trivial to write in C, Python or Perl.
P.S. You may also need to "zap" the RELRO segment, which could be done by changing its p_type to PT_NULL.
I haven't found a good way to know which pages are mapped, and therefore which pages need to be mprotected.
On Linux, you could parse /proc/self/maps to get that info. Other OSes may offer a different way to achieve the same.

ELF unmapped region in a segment

Below is the output for my readelf -l test
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00008000 0x00008000 0x00148 0x00148 R E 0x8000
LOAD 0x000148 0x00010148 0x00010148 0x00000 0x00004 RW 0x8000
NOTE 0x0000b4 0x000080b4 0x000080b4 0x00024 0x00024 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00 .note.gnu.build-id .text
01 .bss
02 .note.gnu.build-id
03
My question is about the first LOAD segment. It encompasses [8000 - 8148], and is mapped to sections .note and .text. My readelf -S output shows that .note section starts from 80b4, and .text starts from 80d8. That means Loadable segment contains a region [8000-80b3] which is unmapped to any section, but still will be loaded to memory by loader.
My question is, if there is any harm if I create a new segment which ranges from[80b4-8148] deleting this segment?
That means Loadable segment contains a region [8000-80b3] which is unmapped to any section, but still will be loaded to memory by loader.
Correct. You will find the Elf32_Ehdr, and likely a set of Elf32_Phdrs in that segment.
Note: for the main binary, it's actually the kernel that does the loading, and not the dynamic linker. You are not wrong in calling it "loader", but usually people use "loader" for the dynamic linker, and not the "part of the kernel that maps in the main binary".
My question is, if there is any harm if I create a new segment which ranges from[80b4-8148] deleting this segment?
The segment has to be page-aligned. A segment with .p_vaddr that is not page-aligned (as I believe you are proposing) will be rejected by the kernel.

Binary, produced by MSP430GCC, has strange start address for text segment

After compiling an exemplary C program with msp430-gcc (LTS 20120406 unpatched) for the MSPG2211 I got the following output using the readelf command:
section header
program header
The address space of the MSPG2211 microcontroller is structured as follows:
0x0000 - 0x01FF - Peripherals
0x0200 - 0x027F - RAM
0x1000 - 0x10FF - Flash (information memory)
0x1100 - 0xF7FF - ???
0xF800 - 0xFFFF - Flash (code memory + interrupt vectors)
The text section shown in the section header starts at 0xF800 which is the first address of the code memory.
The text segment, including only the text section, is bigger than the text section and starts already at 0xF76C.
As I understood, the loadable segments gets loaded to the shown physical addresses for program execution.
So why the start address of the text segment lies within an undefined memory region?
Some of the names sections (such as .text) contain data that is actually loaded into the MCU.
The ELF program headers, however, contain only metadata; their address does not matter.