I'm working with elf64 files and i was wondering two things, the first is, in which segment the shstrtable is stored, because reviewing readelf -l doesn't appear. And the other question (coming from the first one) is it possible for a section not be inside a segment?
Also i noticed some 'gaps' between some segments. What is inside those gaps?
I am using the following example, that is an hello_world.c:
readelf -lW hello
El tipo del fichero elf es DYN (Fichero objeto compartido)
Entry point 0x1040
There are 11 program headers, starting at offset 64
Encabezados de Programa:
Tipo Desplaz DirVirt DirFísica TamFich TamMem Opt Alin
PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R 0x8
INTERP 0x0002a8 0x00000000000002a8 0x00000000000002a8 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000560 0x000560 R 0x1000
LOAD 0x001000 0x0000000000001000 0x0000000000001000 0x0001e5 0x0001e5 R E 0x1000
LOAD 0x002000 0x0000000000002000 0x0000000000002000 0x000118 0x000118 R 0x1000
LOAD 0x002de8 0x0000000000003de8 0x0000000000003de8 0x000248 0x000250 RW 0x1000
DYNAMIC 0x002df8 0x0000000000003df8 0x0000000000003df8 0x0001e0 0x0001e0 RW 0x8
NOTE 0x0002c4 0x00000000000002c4 0x00000000000002c4 0x000044 0x000044 R 0x4
GNU_EH_FRAME 0x00200c 0x000000000000200c 0x000000000000200c 0x000034 0x000034 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
GNU_RELRO 0x002de8 0x0000000000003de8 0x0000000000003de8 0x000218 0x000218 R 0x1
mapeo de Sección a Segmento:
Segmento Secciones...
00
01 .interp
02 .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .got.plt .data .bss
06 .dynamic
07 .note.gnu.build-id .note.ABI-tag
08 .eh_frame_hdr
09
10 .init_array .fini_array .dynamic .got
in which segment the shstrtable is stored, because reviewing readelf -l doesn't appear.
It doesn't appear in any segment.
And the other question (coming from the first one) is it possible for a section not be inside a segment?
Yes.
Also i noticed some 'gaps' between some segments. What is inside those gaps?
Nothing. Segments tell the kernel or the runtime loader how to mmap the on-disk binary into memory. Since mmap operates on whole pages (4096 bytes here), the contents of memory between 0x560 and 0xFFF will be "whatever happens to be in the file at offsets 0x560 through 0xFFF, but the program shouldn't access it and the contents is effectively undefined. See also this answer.
Related
The Lexibook TV Games/Yeno 200-in-1 Console has a large set of 32-bit games. On the MicroSD Card, the 32-bit games are a set of folders with one .elf file, and a folder of many many .bin files. My question is, is there any type of way to run these properly? Opening the .elf files in online viewers show the files called upon, and the beginning is as follows:
ELF header
==========
Name Offset NumValue Value
EI_MAG: 0x00000000 0x7F454C46 ELF
EI_CLASS 0x00000004 0x01 32 BIT
EI_DATA 0x00000005 0x01 DATA2LSB (Little-Endian)
EI_VERSION 0x00000006 0x01 EV_CURRENT
EI_OSABI 0x00000007 0x00 UNIX System V ABI
EI_OSABIVER 0x00000008 0x00
E_TYPE 0x00000010 0x0002 ET_EXEC (Executable file)
E_MACHINE 0x00000012 0x0087 Unknown
E_VERSION 0x00000014 0x00000001 EV_CURRENT
E_ENTRY 0x00000018 0xA0081000
E_PHOFF 0x0000001C 0x00000034
E_SHOFF 0x00000020 0x00266A18
E_FLAGS 0x00000024 0x00070000
E_EHSIZE 0x00000028 0x0034
E_PHENTSIZE 0x0000002A 0x0020
E_PHNUM 0x0000002C 0x0001
E_SHENTSIZE 0x0000002E 0x0028
E_SHNUM 0x00000030 0x001D
E_SHSTRNDX 0x00000032 0x001A
Program header tables
=====================
Type Offset VAddr PAddr FileSz MemSz Align Flags
PT_LOAD 0x00000000 0xA0080000 0xA0080000 0x001DE034 0x0025EB20 0x00008000 Execute|Write|Read
"UNIX System V ABI" Is written, which may be info that's important.
P.S. - Sorry if my question isn't very clear, I don't know much about this kind of stuff. Thank you!
I've been messing around with x86-16 assembly and running it with VirtualBox. For some reason when I read from memory and try to print it as a character, I get completely different results from what I was expecting. However when I hard-code the character as part of the instruction, it works fine.
Here's the code:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push cs
pop ds ; ds = cs
pop es ; es = 0xB800
jmp start
; input = di (position*2), ax (character and attributes)
putchar:
stosw
ret
; input = si (NUL-terminated string)
print:
cli
cld
.nextChar:
lodsb ; mov al, [ds:si] ; si += 1
test al, al
jz .finish
call putchar
jmp .nextChar
.finish:
sti
ret
start:
mov ah, 0x0E
mov di, 8
; should print P
mov al, byte [msg]
call putchar
; should print A
mov al, byte [msg + 1]
call putchar
; should print O
mov al, byte [msg + 2]
call putchar
; should print !
mov al, byte [msg + 3]
call putchar
; should print X
mov al, 'X'
call putchar
; should print Y
mov al, 'Y'
call putchar
cli
hlt
msg: db 'PAO!', 0
; Fill the rest of the bytes upto byte 510 with 0s
times 510 - ($ - $$) db 0
; Header
db 0x55
db 0xAA
The print label and instructions in it can be ignored since I haven't used it yet because of the problem I've been having trying to print a character stored in memory. I've assembled it with both FASM and NASM and have the same problem meaning it's obviously my fault.
It prints something like:
The ORG Directive
When you specify an ORG directive like ORG 0x0000 at the top of your assembler program, and use BITS 16 you are informing NASM that when resolving labels to Code and Data, that the absolute offsets that will be generated will be based on the starting offset specified in ORG (16-bit code will be limited to an offset being a WORD/2 bytes) .
If you have ORG 0x0000 at the start and place a label start: at the beginning of the code, start will have an absolute offset of 0x0000. If you use ORG 0x7C00 then the label start will have an absolute offset of 0x7c00. This will apply to any data labels and code labels.
We can simplify your example to see what is going on in the generated code when dealing with a data variable and a hard coded character. Although this code doesn't exactly perform the same actions as your code, it is close enough to show what works and what doesn't.
Example using ORG 0x0000:
BITS 16
ORG 0x0000
start:
push cs
pop ds ; DS=CS
push 0xb800
pop es ; ES = 0xB800 (video memory)
mov ah, 0x0E ; AH = Attribute (yellow on black)
mov al, byte [msg]
mov [es:0x00], ax ; This should print letter 'P'
mov al, byte [msg+1]
mov [es:0x02], ax ; This should print letter 'A'
mov al, 'O'
mov [es:0x04], ax ; This should print letter 'O'
mov al, '!'
mov [es:0x06], ax ; This should print letter '!'
cli
hlt
msg: db "PA"
; Bootsector padding
times 510-($-$$) db 0
dw 0xAA55
If you were to run this on VirtualBox the first 2 characters would be garbage while O! should display correctly. I will use this example through the rest of this answer.
VirtualBox / CS:IP / Segment:Offset Pairs
In the case of Virtual Box, it will effectively do the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) will not only jump to a given address, it sets CS and IP to the values specified. A FAR JMP to 0x0000:0x7c00 will set CS = 0x0000 and IP = 0x7c00 .
If one is unfamiliar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address then this document is a reasonably good starting point to understanding the concept. The general equation to get a physical memory address from a 16-bit segment:offset pair is (segment<<4)+offset = 20-bit physical address .
Since VirtualBox uses CS:IP of 0x0000:0x7c00 it would start executing code at a physical address of (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00 . Please be aware that this isn't guaranteed to be the case in all environments. Because of the nature of segment:offset pairs, there is more than one way to reference physical address 0x07c00. See the section at the end of this answer on ways to handle this properly.
What is Going Wrong with Your Bootloader?
Assuming we are using VirtualBox and the information above in the previous section is considered correct, then CS = 0x0000 and IP = 0x7c00 upon entry to our bootloader. If we take the example code (Using ORG 0x0000) I wrote in the first section of this answer and look at the disassembled information (I'll use objdump output) we'd see this:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin
00000000 <.data>:
0: 0e push cs
1: 1f pop ds
2: 68 00 b8 push 0xb800
5: 07 pop es
6: b4 0e mov ah,0xe
8: a0 24 00 mov al,ds:0x24
b: 26 a3 00 00 mov es:0x0,ax
f: a0 25 00 mov al,ds:0x25
12: 26 a3 02 00 mov es:0x2,ax
16: b0 4f mov al,0x4f
18: 26 a3 04 00 mov es:0x4,ax
1c: b0 21 mov al,0x21
1e: 26 a3 06 00 mov es:0x6,ax
22: fa cli
23: f4 hlt
24: 50 push ax ; Letter 'P'
25: 41 inc cx ; Letter 'A'
...
1fe: 55 push bp
1ff: aa stos BYTE PTR es:[di],al
Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000 so that the first column of values (memory address) start at 0x0000. I want to do this because I used ORG 0x0000 in the original assembler code. I have also added some comments in the code to show where our data section is (and where the letters P and A were placed after the code).
If you were to run this program in VirtualBox the first 2 characters will come out as gibberish. So why is that? First recall VirtualBox reached our code by setting CS to 0x0000 and IP to 0x7c00. This code then copied CS to DS:
0: 0e push cs
1: 1f pop ds
Since CS was zero, then DS is zero. Now let us look at this line:
8: a0 24 00 mov al,ds:0x24
ds:0x24 is actually the encoded address for the msg variable in our data section. The byte at offset 0x24 has the value P in it (0x25 has A). You might see where things might go wrong. Our DS = 0x0000 so mov al,ds:0x24 is really the same as mov al,0x0000:0x24. This syntax isn't valid but I'm replacing DS with 0x0000 to make a point. 0x0000:0x24 is where our code while executing will attempt to read our letter P from. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This memory address happens to be at the bottom of memory in the middle of the interrupt vector table. Clearly this is not what we intended!
There are a couple ways to tackle this issue. The easiest (and preferred method) is to actually place the proper segment into DS, and not rely on what CS might be when our program runs. Since we set an ORG of 0x0000 we need to have a Data Segment(DS) = 0x07c0 . A segment:offset pair of 0x07c0:0x0000 = physical address 0x07c00 . Which is what the address of our bootloader is at. So all we have to do is amend the code by replacing:
push cs
pop ds ; DS=CS
With:
push 0x07c0
pop ds ; DS=0x07c0
This change should provide the correct output when run in VirtualBox . Now let us see why. This code didn't change:
8: a0 24 00 mov al,ds:0x24
Now when executed DS=0x07c0. This would have been like saying mov al,0x07c0:0x24. 0x07c0:0x24, which would translate into a physical address of (0x07c0<<4)+0x24 = 0x07c24 . This is what we want since our bootloader was physically placed into memory by the BIOS starting at that location and so it should reference our msg variable correctly.
Moral of the story? What ever you use for ORG there should be an applicable value in the DS register when we start our program.We should set it explicitly, and not rely on what is in CS.
Why Do Immediate Values Print?
With the original code, the first 2 characters printed gibberish, but the last two didn't. As was discussed in the previous section there was a reason the first 2 character wouldn't print, but what about the last 2 characters that did?
Let us examine the disassembly of the 3rd character O more carefully:
16: b0 4f mov al,0x4f ; 0x4f = 'O'
Since we used an immediate (constant) value and moved it into register AL, the character itself is encoded as part of the instruction. It doesn't rely on a memory access via the DS register. Because of this the last 2 characters displayed properly.
Ross Ridge's Suggestion and Why it Works in VirtualBox
Ross Ridge suggested we use ORG 0x7c00, and you observed that it worked. Why did that happen? And is that solution ideal?
Using my very first example and modify ORG 0x0000 to ORG 0x7c00, and then assemble it. objdump would have provided this disassembly:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x7c00 boot.bin
boot.bin: file format binary
Disassembly of section .data:
00007c00 <.data>:
7c00: 0e push cs
7c01: 1f pop ds
7c02: 68 00 b8 push 0xb800
7c05: 07 pop es
7c06: b4 0e mov ah,0xe
7c08: a0 24 7c mov al,ds:0x7c24
7c0b: 26 a3 00 00 mov es:0x0,ax
7c0f: a0 25 7c mov al,ds:0x7c25
7c12: 26 a3 02 00 mov es:0x2,ax
7c16: b0 4f mov al,0x4f
7c18: 26 a3 04 00 mov es:0x4,ax
7c1c: b0 21 mov al,0x21
7c1e: 26 a3 06 00 mov es:0x6,ax
7c22: fa cli
7c23: f4 hlt
7c24: 50 push ax ; Letter 'P'
7c25: 41 inc cx ; Letter 'A'
...
7dfe: 55 push bp
7dff: aa stos BYTE PTR es:[di],al
VirtualBox set CS to 0x0000 when it jumped to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7c00 directive has done to our generated code:
7c08: a0 24 7c mov al,ds:0x7c24
Notice how we are now using an offset of 0x7c24! This would be like mov al,0x0000:0x7c24 which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location where the bootloader was loaded, and is the proper position of our msg string. So it works.
Is using an ORG 0x7c00 a bad idea? No. It is fine. But we have a subtle issue to contend with. What happens if another Virtual PC environment or real hardware doesn't FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000. That too is physical address 0x07c00 as we have already seen. In that environment, when our code runs CS = 0x07c0. If we use the original code that copies CS to DS, DS now has 0x07c0 too. Now observe what would happen to this code in that situation:
7c08: a0 24 7c mov al,ds:0x7c24
DS=0x07c0 in this scenario. We now have something resembling mov al,0x07c0:0x7c24 when the program actually runs. Ut-oh, that looks bad. What does that translate to as a physical address? (0x07c0<<4)+0x7c24 = 0x0F824. That is somewhere above our bootloader and it will contain whatever happens to be there after the computer boots. Likely zeros, but it should be assumed to be garbage. Clearly not where our msg string was loaded!
So how do we resolve this? To amend what Ross Ridge suggested, and to heed the advice I previously gave about explicitly setting DS to the segment we really want (don't assume CS is correct and then blindly copy to DS) we should place 0x0000 into DS when our bootloader starts if we use ORG 0x7c00. So we can change this code:
ORG 0x7c00
start:
push cs
pop ds ; DS=CS
to:
ORG 0x7c00
start:
xor ax, ax ; ax=0x0000
mov ds, ax ; DS=0x0000
Here we don't rely on an untrusted value in CS. We simply set DS to the segment value that makes sense given the ORG we used. You could have pushed 0x0000 and popped it into DS as you have been doing. I am more accustomed to zeroing out a register and moving that to DS.
By taking this approach, it doesn't matter what value in CS might have been used to reach our bootloader, the code would still reference the appropriate memory location for our data.
Don't Assume 1st Stage is Invoked by BIOS with CS:IP=0x0000:0x7c00
In my General Bootloader Tips that I wrote in a previous StackOverflow answer, tip #1 is very important:
When the BIOS jumps to your code you can't rely on CS,DS,ES,SS,SP registers having valid or expected values. They should be set up appropriately when your bootloader starts. You can only be guaranteed that your bootloader will be loaded and run from physical address 0x07c00 and that the boot drive number is loaded into the DL register.
The BIOS could have FAR JMP'ed (or equivalent) to our code with jmp 0x07c0:0x0000, and some emulators and real hardware do it this way. Others use jmp 0x0000:0x7c00 like VirtualBox does.
We should account for this by setting DS explicitly to what we need, and set it to what makes sense for the value we use in our ORG directive.
Summary
Don't assume CS is a value we expect, and don't blindly copy CS to DS . Set DS explicitly.
Your code could be fixed to use either ORG 0x0000 as you originally had it, if we set DS appropriately to 0x07c0 as previously discussed. That could look like:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x07c0
pop ds ; DS=0x07c0 since we use ORG 0x0000
pop es
Alternatively we could have used ORG 0x7c00 like this:
ORG 0x7c00
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x0000
pop ds ; DS=0x0000 since we use ORG 0x7c00
pop es
Is the following a valid way to define variables in asm?
.globl main
.globl a, b, c, d
a: .byte 4
b: .value 7
c: .long 0x0C # 11
d: .quad 9
main:
mov $0, %eax
add a(%rip), %eax
add b(%rip), %eax
add c(%rip), %eax
add d(%rip), %eax
ret
For example, is it required/suggested to have a .TEXT section? How exactly does a(%rip) resolve to the value of $4? That seems almost like magic to me.
The default section is .text; lines before any section directive assemble into the .text section. So you do have one, and in fact you put everything in it, including your data. (Or read-only constants.) Normally you should put static constants in .rodata (or .rdata on Windows), not in .text, for performance reasons. (Mixing code and data wastes space in the I-cache and D-cache, and in TLBs.)
It doesn't resolve to an immediate $4 at assemble time, it resolves to an address. In this case, using a RIP-relative addressing mode. See what does "mov offset(%rip), %rax" do? / How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for more about the fact that it means "address of symbol a with respect to RIP", not actually RIP + absolute address of the symbol.
In other cases, the symbol a does generally resolve to its (32-bit) absolute address when used as add $a, %rdi or something.
It's only at runtime that the CPU loads data (which you put there with directives like .long) from that static storage. If you changed what was in memory (e.g. with a debugger, or by running other instructions) before add c(%rip), %eax executed, it would load a different value.
You put your constant data in .text, along with the code, which is generally not what you want for performance reasons. But it means the assembler can resolve the RIP-relative addressing at assemble time instead of only using a relocation that the linker has to fill in. Although it seems GAS chooses not to resolve the references and still leaves it for the linker:
$ gcc -c foo.s
$ objdump -drwC -Matt foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
0: 04 .byte 0x4
0000000000000001 <b>:
1: 07 (bad)
...
0000000000000003 <c>:
3: 0c 00 or $0x0,%al
...
0000000000000007 <d>:
7: 09 00 or %eax,(%rax)
9: 00 00 add %al,(%rax)
b: 00 00 add %al,(%rax)
...
000000000000000f <main>:
f: b8 00 00 00 00 mov $0x0,%eax
14: 03 05 00 00 00 00 add 0x0(%rip),%eax # 1a <main+0xb> 16: R_X86_64_PC32 a-0x4
1a: 03 05 00 00 00 00 add 0x0(%rip),%eax # 20 <main+0x11> 1c: R_X86_64_PC32 b-0x4
20: 03 05 00 00 00 00 add 0x0(%rip),%eax # 26 <main+0x17> 22: R_X86_64_PC32 c-0x4
26: 03 05 00 00 00 00 add 0x0(%rip),%eax # 2c <main+0x1d> 28: R_X86_64_PC32 d-0x4
2c: c3 retq
(Attempted disassembly of your data as instructions happens because you put them in .text. objdump -d only disassembles .text, and non-immediate constants are normally placed in .rodata.)
Linking it into a executable resolves those symbol references:
$ gcc -nostdlib -static foo.s # not a working executable, just link it without extra stuff
$ objdump -drwC -Matt a.out
... (bogus data omitted)
000000000040100f <main>:
40100f: b8 00 00 00 00 mov $0x0,%eax
401014: 03 05 e6 ff ff ff add -0x1a(%rip),%eax # 401000 <a>
40101a: 03 05 e1 ff ff ff add -0x1f(%rip),%eax # 401001 <b>
401020: 03 05 dd ff ff ff add -0x23(%rip),%eax # 401003 <c>
401026: 03 05 db ff ff ff add -0x25(%rip),%eax # 401007 <d>
40102c: c3 retq
Note the 32-bit little-endian 2's complement encoding of the relative offsets in the RIP+rel32 addressing modes. (And the comment with the absolute address, added by objdump for convenience in this disassembly output.)
BTW, most assemblers including GAS have macro facilities, so you could have used a = 4 or .equ a, 4 to define it as an assemble-time constant, instead of emitting data into the output there. Then you'd use it as add $a, %eax, which would assemble to an add $sign_extended_imm8, r/m32 opcode.
Also, all your loads are dword sized (determined by the register operand), so only 1 of them matches the size of the data directives you used. Single-step through your code and look at the high bits of EAX.
Assembly language doesn't really have variables. It has tools you can use to implement the high-level-language concept of variables, including variables with static storage class. (A label and some space in .data or .bss. Or .rodata for const "variables".)
But if you use the tools differently, you can do things like load 4 bytes that span the .byte, the .value (16-bit), and the first byte of the .long. So after the first instruction, you'll have EAX += 0x0c000704 (because x86 is little-endian). This is totally legal to write in assembler, and nothing is checking to enforce the concept of a variable as ending before the next label.
(Unless you use MASM, which does have variables; in that case you'd have had to write add eax, dword ptr [a]; without the size override MASM would complain about the mismatch between a dword register and a byte variable. Other flavours of asm syntax, like NASM and AT&T, assume you know what you're doing and don't try to be "helpful".)
I have used a symbol "__copy_start" inside my assembly code which is coming from linker script. symbol is defined as ABS in symbol table.
This symbol is used inside a macro to copy data from one memory location to another.
After looking at varenter code hereious ways to modify this symbol directly in elf i decided to write C code of my own to modify the symbol value.
To do that i traversed entire symbol table and did string match for the symbol i am interested in. When there is a symbol name match i just assigned symbol_table.st_value = new value.
To make sure the new value is taken i did readelf -s and checked that it does show the new value assigned by me.
Now, when i disassemble the modified elf i find that the new value has not taken effect and i still see the assembly code doing copy from old symbol value.
My question is:
Am i doing something wrong here? is it possible to change the symbol values in elf? If yes, please let me know the correct way to do it. How do i achieve what i intend to do here.
Note: I don't have the source code so taking this approach.
Thanks in advance,
Gaurav
wanted to add more information so that people can understand better.
copying the elf header below:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
**Type: EXEC (Executable file)**
Machine: Ubicom32 32-bit microcontrollers
Version: 0x1
Entry point address: 0xb0000000
Start of program headers: 52 (bytes into file)
Start of section headers: 33548 (bytes into file)
Flags: 0x6
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 3
Here as you can see that file is of type executable.
output of readelf -S copied below:
There are 6 section headers, starting at offset 0x830c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 3ffc0000 004000 000ebc 00 AX 0 0 1
[ 2] .sdram PROGBITS 50000000 008000 0002e4 00 WA 0 0 1
[ 3] .shstrtab STRTAB 00000000 0082e4 000028 00 0 0 1
[ 4] .symtab SYMTAB 00000000 0083fc 0001c0 10 5 20 4
[ 5] .strtab STRTAB 00000000 0085bc 00019a 00 0 0 1
I am using one of the symbol named "__copy_start" in an instruction to copy the data from .sdram section to .text section. I was under an impression that i could go and change the symbol_table.st_value and then get the desired work done. But unfortunately that is not the case. Seems like it is already compiled and cannot be changed like this.
Any idea how this could be done would be really helpful.
Regards,
Gaurav
Are you sure that the object code actually uses a relocation to reference the data at the __copy_start symbol? Even for position-independent code, it is usually possible to turn section start addresses into relative addresses, which do not need a relocation. (That the symbol itself remains present with an absolute address does not change this.)
You can check this by using readelf -r or eu-readelf -r and examining the output. It is also visible in the objdump --dissassemble --reloc output.
Below is the output for my readelf -l test
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00008000 0x00008000 0x00148 0x00148 R E 0x8000
LOAD 0x000148 0x00010148 0x00010148 0x00000 0x00004 RW 0x8000
NOTE 0x0000b4 0x000080b4 0x000080b4 0x00024 0x00024 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00 .note.gnu.build-id .text
01 .bss
02 .note.gnu.build-id
03
My question is about the first LOAD segment. It encompasses [8000 - 8148], and is mapped to sections .note and .text. My readelf -S output shows that .note section starts from 80b4, and .text starts from 80d8. That means Loadable segment contains a region [8000-80b3] which is unmapped to any section, but still will be loaded to memory by loader.
My question is, if there is any harm if I create a new segment which ranges from[80b4-8148] deleting this segment?
That means Loadable segment contains a region [8000-80b3] which is unmapped to any section, but still will be loaded to memory by loader.
Correct. You will find the Elf32_Ehdr, and likely a set of Elf32_Phdrs in that segment.
Note: for the main binary, it's actually the kernel that does the loading, and not the dynamic linker. You are not wrong in calling it "loader", but usually people use "loader" for the dynamic linker, and not the "part of the kernel that maps in the main binary".
My question is, if there is any harm if I create a new segment which ranges from[80b4-8148] deleting this segment?
The segment has to be page-aligned. A segment with .p_vaddr that is not page-aligned (as I believe you are proposing) will be rejected by the kernel.