My apology for my poor English, really having a hard time understanding what is the sh_info field contains for relocation section, following is what I get from the ELF document:
It says
sh_info : contains the section header index of the section to which the relocation applies
sh_link: contains the section header index of the associated symbol table.
Clearly: sh_info is not about the symbol table section that the relocation section relates to, whose information is stored in sh_link.
Based on my understanding: when relocating a symbol, three sections are related: the relocation section, the symbol table section, and the section which contains symbols' definition for symbols in the symbol table.
Assumption 1: So I assume sh_info is about the third section mentioned ahead
-----However, when I go through the sample code for relocation, my assumption seems not match
static int elf_do_reloc(Elf32_Ehdr *hdr, Elf32_Rel *rel, Elf32_Shdr *reltab) {
Elf32_Shdr *target = elf_section(hdr, reltab->sh_info);
int addr = (int)hdr + target->sh_offset;
int *ref = (int *)(addr + rel->r_offset);
// Symbol value
int symval = 0;
if(ELF32_R_SYM(rel->r_info) != SHN_UNDEF) {
symval = elf_get_symval(hdr, reltab->sh_link, ELF32_R_SYM(rel->r_info));
if(symval == ELF_RELOC_ERR) return ELF_RELOC_ERR;
}
-----Sicce r_info is a field only entry in relocation section contains
which means sh_info is the index of the relocation section itself. < Assumption 2
What confuses me more is the an example someone else posts, reading elf file example
it seems the sh_info field information is nothing related to my previous 2 assumptions
Could anyone please help explain what does sh_info really contains?
About the "confusing example", maybe relocation part got deleted but only mention of sh_info is related to parsing of (dynamic) symbols name and (as show in image in your question) that field has different meaning for SHT_SYMTAB and SHT_DYNSYM (number of items in section + 1).
Section #0A OFF: 0x000015F8
Name: .rela.plt (0x00000084)
Type: SHT_RELA (0x00000004)
Flags: -a-
Addr: 0x004003E8
Offset: 0x000003E8
Size: 0x00000090
Link: 0x00000005
Info 0x0000000C
Section #05 OFF: 0x000014B8
Name: .dynsym (0x0000004E)
Type: SHT_DYNSYM (0x0000000B)
Flags: -a-
Addr: 0x00400280
Offset: 0x00000280
Size: 0x000000A8
Link: 0x00000006
Info 0x00000001
Section #0C OFF: 0x00001678
Name: .plt (0x00000089)
Type: SHT_PROGBITS (0x00000001)
Flags: -ax
Addr: 0x004004A0
Offset: 0x000004A0
Size: 0x00000070
Link: 0x00000000
Info 0x00000000
You can see that sh_link points to .dynsym section and sh_info points to .plt section (which contains executable memory).
So sh_link is symbol table and sh_info is executable section that gets modified.
Basically your document says it all already, but here are some more references.
Chapter on Sections [Figure 4-12: sh_link and sh_info Interpretation]:
sh_link - The section header index of the associated symbol table.
sh_info - The section header index of the section to which the relocation applies.
Also there's a chapter on relocation:
A relocation section references two other sections: a symbol table and a section to modify. The section header's sh_info and sh_link members, described in ``Sections'' above, specify these relationships. Relocation entries for different object files have slightly different interpretations for the r_offset member.
In relocatable files, r_offset holds a section offset. The relocation section itself describes how to modify another section in the file; relocation offsets designate a storage unit within the second section.
In executable and shared object files, r_offset holds a virtual address. To make these files' relocation entries more useful for the dynamic linker, the section offset (file interpretation) gives way to a virtual address (memory interpretation).
And just for fun... Here are (search page) relocation types for x86 on Linux:
#define R_X86_64_NONE 0 /* No reloc */
#define R_X86_64_64 1 /* Direct 64 bit */
#define R_X86_64_PC32 2 /* PC relative 32 bit signed */
#define R_X86_64_GOT32 3 /* 32 bit GOT entry */
#define R_X86_64_PLT32 4 /* 32 bit PLT address */
#define R_X86_64_COPY 5 /* Copy symbol at runtime */
#define R_X86_64_GLOB_DAT 6 /* Create GOT entry */
#define R_X86_64_JUMP_SLOT 7 /* Create PLT entry */
#define R_X86_64_RELATIVE 8 /* Adjust by program base */
#define R_X86_64_GOTPCREL 9 /* 32 bit signed pc relative
offset to GOT */
#define R_X86_64_32 10 /* Direct 32 bit zero extended */
#define R_X86_64_32S 11 /* Direct 32 bit sign extended */
#define R_X86_64_16 12 /* Direct 16 bit zero extended */
#define R_X86_64_PC16 13 /* 16 bit sign extended pc relative */
#define R_X86_64_8 14 /* Direct 8 bit sign extended */
#define R_X86_64_PC8 15 /* 8 bit sign extended pc relative */
Related
I'm attempting to write some ELF parsing logic (in C). Specifically, I'm trying to identify which entries in the GOT correspond to which functions.
I've crafted a simple program which contains references to malloc and free. Some relevant excerpts from readelf -a a.out:
Relocation section '.rela.plt' at offset 0x630 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000003fc8 0000000100000007 R_X86_64_JUMP_SLOT 0000000000000000 free#GLIBC_2.2.5 + 0
0000000000003fd0 0000000500000007 R_X86_64_JUMP_SLOT 0000000000000000 malloc#GLIBC_2.2.5 + 0
No processor specific unwind information to decode
Symbol table '.dynsym' contains 8 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free#GLIBC_2.2.5 (2)
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main#GLIBC_2.34 (3)
3: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_deregisterTMCloneTable
4: 0000000000000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND malloc#GLIBC_2.2.5 (2)
6: 0000000000000000 0 NOTYPE WEAK DEFAULT UND _ITM_registerTMCloneTable
7: 0000000000000000 0 FUNC WEAK DEFAULT UND __cxa_finalize#GLIBC_2.2.5 (2)
I know how to use .dynstr to get the names of the symbols in .dynsym. However, how is readelf populating the symbol names in .rela.plt? I'm not seeing anything in the definitions of either Elf64_Sym or Elf64_Rel which would help. At first, I thought the st_shndx field in Elf64_Sym would be relevant but readelf is showing that value as SHN_UNDEF.
The information is contained in the Elf64_Rel structure. Specifically, the r_info field:
This member gives both the symbol table index with respect to which the relocation must be made and the type of relocation to apply.
The ELF64_R_SYM macro can be used to extract the offset from this field. As seen in the .rela.plt description in the OP, free, for example, has an index of 1 which corresponds to entry 1 in .dynsym.
I'm trying to understand where DW_FORM_strp attribute values are actually stored in an ELF file (can be found here: https://filebin.net/77bb8359o0ibqu67).
I've found sections .debug_info, .debug_abbrev and .debug_str. I've then parsed the compilation unit header in .debug_info, and found the abbreviation table entry for the compile unit and iterated over its abbreviations. The first abbreviation is DW_AT_producer with form DW_FORM_strp. What I'm wondering is how to find where this offset is located?
From the DWARF4 spec I read: Each debugging information entry begins with a code that represents an entry in a separate abbreviations table. This code is followed directly by a series of attribute values. My understanding of this is that if I go back to the compilation unit header, skip over its content, I should end up at the compilation unit. It starts with a ULEB128 (which I parse), after which the attribute values should come. However, in my ELF file those bytes are all 0. I've run readelf -w on the file, and I see the following:
Contents of the .debug_info section:
Compilation Unit # offset 0x0:
Length: 0xf6 (32-bit)
Version: 4
Abbrev Offset: 0x0
Pointer Size: 8
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<c> DW_AT_producer : (indirect string, offset: 0x62): GNU C11 7.5.0 -mtune=generic -march=x86-64 -g -O0 -fstack-protector-strong
<10> DW_AT_language : 12 (ANSI C99)
<11> DW_AT_name : (indirect string, offset: 0xd9): elf.c
<15> DW_AT_comp_dir : (indirect string, offset: 0xad): /home//struct_analyzer
<19> DW_AT_low_pc : 0x0
<21> DW_AT_high_pc : 0x39
<29> DW_AT_stmt_list : 0x0
This tells me that the offset into the string table is 0x62, and the name is at an offset 0xd9. However, after parsing the ULEB128 which is the first part of any DIE, the next 4 bytes (the first attribute's value) are 0x00 00 00 00. This I don't understand?
Edit to Employed Russian:
Yes, I understand that the offset 0x62 points into the .debug_str section. However, what I'm wondering is where I find this 0x62 value?
Each DIE starts with a ULEB128 value (the abbreviation table entry code), and is followed by the attributes. The first attribute in the corresponding abbreviation table entry is a DW_AT_producer of form DW_FORM_strp. This means that the next 4 bytes in the DIE are supposed to be the offset into .debug_str. However, the next 4 bytes are 0x00 00 00 00, and not 0x62 00 00 00 which is the value I'm looking. 0x62 is residing at offset 0x5c8 into the ELF file, whereas the DIE's attributes start at offset 0x85 as far as I can tell (see attached image for a hexdump (little endian) - highlighted byte is the ULEB128, and the following bytes are what I expect to be the offset into .debug_str).
Edit 2
I've been able to determine that the actual attribute values of form DW_FORM_strp are located in the .rela.debug_info section in the ELF file, so I'll have to read more about that.
The specific ELF file posted for this question also has a rela.debug_info section, which contains relocation entries for the .debug_info section. From the ELF spec:
.relaNAME
This section holds relocation information as described below.
If the file has a loadable segment that includes relocation,
the section's attributes will include the SHF_ALLOC bit. Oth‐
erwise, the bit will be off. By convention, "NAME" is sup‐
plied by the section to which the relocations apply. Thus a
relocation section for .text normally would have the name
.rela.text. This section is of type SHT_RELA.
Each relocation entry in this section (of type Elf64_Rela in this particular case) should be iterated over, and the value of each entry should be addended with the corresponding value in the .debug_info section.
This tells me that the offset into the string table is 0x62, and the name is at an offset 0xd9.
Correct. These offsets are into the .debug_str section, which starts at offset 0x289 in the file.
readelf -WS elf.o | grep debug_str
[12] .debug_str PROGBITS 0000000000000000 000289 0000e4 01 MS 0 0 1
dd if=elf.o bs=1 skip=$((0x289+0x62)) count=75 2>/dev/null
GNU C11 7.5.0 -mtune=generic -march=x86-64 -g -O0 -fstack-protector-strong
dd if=elf.o bs=1 skip=$((0x289+0xd9)) count=5 2>/dev/null
elf.c
P.S.
I've found sections .dwarf_info, .dward_abbrev and .dwarf_str.
None of above sections exit in your file. It helps to be precise when asking questions.
I'm writing a python script to delete files on MacOS, and I run into SIP protected files. I know the presence of st_flags more than likely mean I can't delete the file. Like here:
>>> os.stat(f).st_flags
524288
But I'm curious to know what that actually means. I looked in stat.h and see:
/*
* Definitions of flags stored in file flags word.
*
* Super-user and owner changeable flags.
*/
#define UF_SETTABLE 0x0000ffff /* mask of owner changeable flags */
#define UF_NODUMP 0x00000001 /* do not dump file */
#define UF_IMMUTABLE 0x00000002 /* file may not be changed */
#define UF_APPEND 0x00000004 /* writes to file may only append */
#define UF_OPAQUE 0x00000008 /* directory is opaque wrt. union */
/*
* The following bit is reserved for FreeBSD. It is not implemented
* in Mac OS X.
*/
/* #define UF_NOUNLINK 0x00000010 */ /* file may not be removed or renamed */
#define UF_COMPRESSED 0x00000020 /* file is compressed (some file-systems) */
/* UF_TRACKED is used for dealing with document IDs. We no longer issue
* notifications for deletes or renames for files which have UF_TRACKED set. */
#define UF_TRACKED 0x00000040
#define UF_DATAVAULT 0x00000080 /* entitlement required for reading */
/* and writing */
/* Bits 0x0100 through 0x4000 are currently undefined. */
#define UF_HIDDEN 0x00008000 /* hint that this item should not be */
/* displayed in a GUI */
/*
* Super-user changeable flags.
*/
#define SF_SUPPORTED 0x001f0000 /* mask of superuser supported flags */
#define SF_SETTABLE 0xffff0000 /* mask of superuser changeable flags */
#define SF_ARCHIVED 0x00010000 /* file is archived */
#define SF_IMMUTABLE 0x00020000 /* file may not be changed */
#define SF_APPEND 0x00040000 /* writes to file may only append */
#define SF_RESTRICTED 0x00080000 /* entitlement required for writing */
#define SF_NOUNLINK 0x00100000 /* Item may not be removed, renamed or mounted on */
I just dont quite see how it adds up to 524288. I mean I kinda get it, like permission bits, 1 or more in the 6th position from the right should mean SF_NOUNLINK is set, but where is the 5 coming from? 2 in the 5th position from the right means SF_IMMUTABLE is set, and 2nd position is 8, which is UF_DATAVAULT, which makes sense. The other values in positions 3-4, and the value of 5 (from right) I dont understand. Any pointers as how to read this?
After compiling an exemplary C program with msp430-gcc (LTS 20120406 unpatched) for the MSPG2211 I got the following output using the readelf command:
section header
program header
The address space of the MSPG2211 microcontroller is structured as follows:
0x0000 - 0x01FF - Peripherals
0x0200 - 0x027F - RAM
0x1000 - 0x10FF - Flash (information memory)
0x1100 - 0xF7FF - ???
0xF800 - 0xFFFF - Flash (code memory + interrupt vectors)
The text section shown in the section header starts at 0xF800 which is the first address of the code memory.
The text segment, including only the text section, is bigger than the text section and starts already at 0xF76C.
As I understood, the loadable segments gets loaded to the shown physical addresses for program execution.
So why the start address of the text segment lies within an undefined memory region?
Some of the names sections (such as .text) contain data that is actually loaded into the MCU.
The ELF program headers, however, contain only metadata; their address does not matter.
This question already has answers here:
Solution needed for building a static IDT and GDT at assemble/compile/link time
(1 answer)
How to do computations with addresses at compile/linking time?
(2 answers)
Closed 5 days ago.
I'm working on a project that has tight boot time requirements. The targeted architecture is an IA-32 based processor running in 32 bit protected mode. One of the areas identified that can be improved is that the current system dynamically initializes the processor's IDT (interrupt descriptor table). Since we don't have any plug-and-play devices and the system is relatively static, I want to be able to use a statically built IDT.
However, this proving to be troublesome for the IA-32 arch since the 8 byte interrupt gate descriptors splits the ISR address. The low 16 bits of the ISR appear in the first 2 bytes of the descriptor, some other bits fill in the next 4 bytes, and then finally the last 16 bits of the ISR appear in the last 2 bytes.
I wanted to use a const array to define the IDT and then simply point the IDT register at it like so:
typedef struct s_myIdt {
unsigned short isrLobits;
unsigned short segSelector;
unsigned short otherBits;
unsigned short isrHibits;
} myIdtStruct;
myIdtStruct myIdt[256] = {
{ (unsigned short)myIsr0, 1, 2, (unsigned short)(myIsr0 >> 16)},
{ (unsigned short)myIsr1, 1, 2, (unsigned short)(myIsr1 >> 16)},
etc.
Obviously this won't work as it is illegal to do this in C because myIsr is not constant. Its value is resolved by the linker (which can do only a limited amount of math) and not by the compiler.
Any recommendations or other ideas on how to do this?
You ran into a well known x86 wart. I don't believe the linker can stuff the address of your isr routines in the swizzled form expected by the IDT entry.
If you are feeling ambitious, you could create an IDT builder script that does something like this (Linux based) approach. I haven't tested this scheme and it probably qualifies as a nasty hack anyway, so tread carefully.
Step 1: Write a script to run 'nm' and capture the stdout.
Step 2: In your script, parse the nm output to get the memory address of all your interrupt service routines.
Step 3: Output a binary file, 'idt.bin' that has the IDT bytes all setup and ready for the LIDT instruction. Your script obviously outputs the isr addresses in the correct swizzled form.
Step 4: Convert his raw binary into an elf section with objcopy:
objcopy -I binary -O elf32-i386 idt.bin idt.elf
Step 5: Now idt.elf file has your IDT binary with the symbol something like this:
> nm idt.elf
000000000000000a D _binary_idt_bin_end
000000000000000a A _binary_idt_bin_size
0000000000000000 D _binary_idt_bin_start
Step 6: relink your binary including idt.elf. In your assembly stubs and linker scripts, you can refer to symbol _binary_idt_bin_start as the base of the IDT. For example, your linker script can place the symbol _binary_idt_bin_start at any address you like.
Be careful that relinking with the IDT section doesn't move anyting else in your binary, e.g. your isr routines. Manage this in your linker script (.ld file) by puting the IDT into it's own dedicated section.
---EDIT---
From comments, there seems to be confusion about the problem. The 32-bit x86 IDT expects the address of the interrupt service routine to be split into two different 16-bit words, like so:
31 16 15 0
+---------------+---------------+
| Address 31-16 | |
+---------------+---------------+
| | Address 15-0 |
+---------------+---------------+
A linker is thus unable to plug-in the ISR address as a normal relocation. So, at boot time, software must construct this split format, which slows boot time.