I would be grateful if can explain me what's happening in the following example using printf, compiling with nasm and gcc.
Why is "sud" only printed on the screen? I don't understand, also, why is "sudobor" printed on the screen when I exchange "push 'sud'" with "push 'sudo'"?
Can someone, also explain why do i need to push esp? Is it a null, that is required to be at the end of the string in printf?
Thank you advance.
This is string.s file:
section .data
section .text
global start
extern printf
start:
push ebp
mov ebp, esp
push 'bor'
push 'sud'
push esp
call printf
mov esp, ebp
pop dword ebp
ret
this is c file:
#include <stdio.h>
#include <stdlib.h>
extern void start();
int main(void) {
start();
}
First off, thanks for blowing my mind. When I first looked at your code, I didn't believe it would work at all. Then I tried it and reproduced your results. Now it makes perfect sense to me, albeit in a twisted way. :-) I'll try to explain it.
First, let's look at the more sane way to achieve this. Define a string in the data portion of the ASM file:
section .data
string: db "Hey, is this thing on?", 0
Then push the address of that string on the stack before calling printf:
push string
call printf
So, that first parameter to printf (last parameter pushed on the stack before the call) is the pointer to the format string. What your code did was push the string on the stack, followed by the stack pointer which then pointed to the string.
Next, I'm going to replace your strings so that they are easier to track in disassembly:
push '567'
push '123'
push esp
call printf
Assemble with nasm, and then disassemble with objdump:
nasm string.s -f elf32 -o string.o
objdump -d -Mintel string.o
When you push, e.g., '123', that gets converted to a 32-bit hex digit-- 0x333231 in this case. Note that the full 32 bits are 0x00333231.
3: 68 35 36 37 00 push 0x373635
8: 68 31 32 33 00 push 0x333231
d: 54 push esp
Pushing onto the stack decrements the stack pointer. Assuming an initial stack pointer of 0x70 (contrived for simplicity), this is the state of the stack before calling printf:
64: 68: 6c: 70:
68 00 00 00 31 32 33 00 35 36 37 00 ...
So, when print is called, it uses the first parameter as the string pointer and starts printing characters until it sees a NULL (0x00).
That's why this example only prints "123" ("sud" in your original).
So let's push "1234" instead of "123". This means we are pushing the value 0x34333231. When calling printf the stack now looks like:
64: 68: 6c: 70:
68 00 00 00 31 32 33 34 35 36 37 00 ...
Now there is no NULL gap between those 2 strings on the stack and this example will print "1234567" (or "sudobor" in your original).
Implications: Try pushing "5678" instead of "567". You will probably get a segmentation fault because printf will just keep reading characters to print until it tries to read memory it doesn't have permission to read. Also, try pushing a string that is longer than 4 characters (e.g., "push '12345'"). The assembler won't let you because it can't convert that to a 32-bit number.
Related
I've been messing around with x86-16 assembly and running it with VirtualBox. For some reason when I read from memory and try to print it as a character, I get completely different results from what I was expecting. However when I hard-code the character as part of the instruction, it works fine.
Here's the code:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push cs
pop ds ; ds = cs
pop es ; es = 0xB800
jmp start
; input = di (position*2), ax (character and attributes)
putchar:
stosw
ret
; input = si (NUL-terminated string)
print:
cli
cld
.nextChar:
lodsb ; mov al, [ds:si] ; si += 1
test al, al
jz .finish
call putchar
jmp .nextChar
.finish:
sti
ret
start:
mov ah, 0x0E
mov di, 8
; should print P
mov al, byte [msg]
call putchar
; should print A
mov al, byte [msg + 1]
call putchar
; should print O
mov al, byte [msg + 2]
call putchar
; should print !
mov al, byte [msg + 3]
call putchar
; should print X
mov al, 'X'
call putchar
; should print Y
mov al, 'Y'
call putchar
cli
hlt
msg: db 'PAO!', 0
; Fill the rest of the bytes upto byte 510 with 0s
times 510 - ($ - $$) db 0
; Header
db 0x55
db 0xAA
The print label and instructions in it can be ignored since I haven't used it yet because of the problem I've been having trying to print a character stored in memory. I've assembled it with both FASM and NASM and have the same problem meaning it's obviously my fault.
It prints something like:
The ORG Directive
When you specify an ORG directive like ORG 0x0000 at the top of your assembler program, and use BITS 16 you are informing NASM that when resolving labels to Code and Data, that the absolute offsets that will be generated will be based on the starting offset specified in ORG (16-bit code will be limited to an offset being a WORD/2 bytes) .
If you have ORG 0x0000 at the start and place a label start: at the beginning of the code, start will have an absolute offset of 0x0000. If you use ORG 0x7C00 then the label start will have an absolute offset of 0x7c00. This will apply to any data labels and code labels.
We can simplify your example to see what is going on in the generated code when dealing with a data variable and a hard coded character. Although this code doesn't exactly perform the same actions as your code, it is close enough to show what works and what doesn't.
Example using ORG 0x0000:
BITS 16
ORG 0x0000
start:
push cs
pop ds ; DS=CS
push 0xb800
pop es ; ES = 0xB800 (video memory)
mov ah, 0x0E ; AH = Attribute (yellow on black)
mov al, byte [msg]
mov [es:0x00], ax ; This should print letter 'P'
mov al, byte [msg+1]
mov [es:0x02], ax ; This should print letter 'A'
mov al, 'O'
mov [es:0x04], ax ; This should print letter 'O'
mov al, '!'
mov [es:0x06], ax ; This should print letter '!'
cli
hlt
msg: db "PA"
; Bootsector padding
times 510-($-$$) db 0
dw 0xAA55
If you were to run this on VirtualBox the first 2 characters would be garbage while O! should display correctly. I will use this example through the rest of this answer.
VirtualBox / CS:IP / Segment:Offset Pairs
In the case of Virtual Box, it will effectively do the equivalent of a FAR JMP to 0x0000:0x7c00 after loading the boot sector at physical address 0x00007c00. A FAR JMP (or equivalent) will not only jump to a given address, it sets CS and IP to the values specified. A FAR JMP to 0x0000:0x7c00 will set CS = 0x0000 and IP = 0x7c00 .
If one is unfamiliar with the calculations behind 16-bit segment:offset pairs and how they map to a physical address then this document is a reasonably good starting point to understanding the concept. The general equation to get a physical memory address from a 16-bit segment:offset pair is (segment<<4)+offset = 20-bit physical address .
Since VirtualBox uses CS:IP of 0x0000:0x7c00 it would start executing code at a physical address of (0x0000<<4)+0x7c00 = 20-bit physical address 0x07c00 . Please be aware that this isn't guaranteed to be the case in all environments. Because of the nature of segment:offset pairs, there is more than one way to reference physical address 0x07c00. See the section at the end of this answer on ways to handle this properly.
What is Going Wrong with Your Bootloader?
Assuming we are using VirtualBox and the information above in the previous section is considered correct, then CS = 0x0000 and IP = 0x7c00 upon entry to our bootloader. If we take the example code (Using ORG 0x0000) I wrote in the first section of this answer and look at the disassembled information (I'll use objdump output) we'd see this:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x0000 boot.bin
00000000 <.data>:
0: 0e push cs
1: 1f pop ds
2: 68 00 b8 push 0xb800
5: 07 pop es
6: b4 0e mov ah,0xe
8: a0 24 00 mov al,ds:0x24
b: 26 a3 00 00 mov es:0x0,ax
f: a0 25 00 mov al,ds:0x25
12: 26 a3 02 00 mov es:0x2,ax
16: b0 4f mov al,0x4f
18: 26 a3 04 00 mov es:0x4,ax
1c: b0 21 mov al,0x21
1e: 26 a3 06 00 mov es:0x6,ax
22: fa cli
23: f4 hlt
24: 50 push ax ; Letter 'P'
25: 41 inc cx ; Letter 'A'
...
1fe: 55 push bp
1ff: aa stos BYTE PTR es:[di],al
Since the ORG information is lost when assembling to a binary file, I use --adjust-vma=0x0000 so that the first column of values (memory address) start at 0x0000. I want to do this because I used ORG 0x0000 in the original assembler code. I have also added some comments in the code to show where our data section is (and where the letters P and A were placed after the code).
If you were to run this program in VirtualBox the first 2 characters will come out as gibberish. So why is that? First recall VirtualBox reached our code by setting CS to 0x0000 and IP to 0x7c00. This code then copied CS to DS:
0: 0e push cs
1: 1f pop ds
Since CS was zero, then DS is zero. Now let us look at this line:
8: a0 24 00 mov al,ds:0x24
ds:0x24 is actually the encoded address for the msg variable in our data section. The byte at offset 0x24 has the value P in it (0x25 has A). You might see where things might go wrong. Our DS = 0x0000 so mov al,ds:0x24 is really the same as mov al,0x0000:0x24. This syntax isn't valid but I'm replacing DS with 0x0000 to make a point. 0x0000:0x24 is where our code while executing will attempt to read our letter P from. But wait! That is physical address (0x0000<<4)+0x24 = 0x00024. This memory address happens to be at the bottom of memory in the middle of the interrupt vector table. Clearly this is not what we intended!
There are a couple ways to tackle this issue. The easiest (and preferred method) is to actually place the proper segment into DS, and not rely on what CS might be when our program runs. Since we set an ORG of 0x0000 we need to have a Data Segment(DS) = 0x07c0 . A segment:offset pair of 0x07c0:0x0000 = physical address 0x07c00 . Which is what the address of our bootloader is at. So all we have to do is amend the code by replacing:
push cs
pop ds ; DS=CS
With:
push 0x07c0
pop ds ; DS=0x07c0
This change should provide the correct output when run in VirtualBox . Now let us see why. This code didn't change:
8: a0 24 00 mov al,ds:0x24
Now when executed DS=0x07c0. This would have been like saying mov al,0x07c0:0x24. 0x07c0:0x24, which would translate into a physical address of (0x07c0<<4)+0x24 = 0x07c24 . This is what we want since our bootloader was physically placed into memory by the BIOS starting at that location and so it should reference our msg variable correctly.
Moral of the story? What ever you use for ORG there should be an applicable value in the DS register when we start our program.We should set it explicitly, and not rely on what is in CS.
Why Do Immediate Values Print?
With the original code, the first 2 characters printed gibberish, but the last two didn't. As was discussed in the previous section there was a reason the first 2 character wouldn't print, but what about the last 2 characters that did?
Let us examine the disassembly of the 3rd character O more carefully:
16: b0 4f mov al,0x4f ; 0x4f = 'O'
Since we used an immediate (constant) value and moved it into register AL, the character itself is encoded as part of the instruction. It doesn't rely on a memory access via the DS register. Because of this the last 2 characters displayed properly.
Ross Ridge's Suggestion and Why it Works in VirtualBox
Ross Ridge suggested we use ORG 0x7c00, and you observed that it worked. Why did that happen? And is that solution ideal?
Using my very first example and modify ORG 0x0000 to ORG 0x7c00, and then assemble it. objdump would have provided this disassembly:
objdump -Mintel -mi8086 -D -b binary --adjust-vma=0x7c00 boot.bin
boot.bin: file format binary
Disassembly of section .data:
00007c00 <.data>:
7c00: 0e push cs
7c01: 1f pop ds
7c02: 68 00 b8 push 0xb800
7c05: 07 pop es
7c06: b4 0e mov ah,0xe
7c08: a0 24 7c mov al,ds:0x7c24
7c0b: 26 a3 00 00 mov es:0x0,ax
7c0f: a0 25 7c mov al,ds:0x7c25
7c12: 26 a3 02 00 mov es:0x2,ax
7c16: b0 4f mov al,0x4f
7c18: 26 a3 04 00 mov es:0x4,ax
7c1c: b0 21 mov al,0x21
7c1e: 26 a3 06 00 mov es:0x6,ax
7c22: fa cli
7c23: f4 hlt
7c24: 50 push ax ; Letter 'P'
7c25: 41 inc cx ; Letter 'A'
...
7dfe: 55 push bp
7dff: aa stos BYTE PTR es:[di],al
VirtualBox set CS to 0x0000 when it jumped to our bootloader. Our original code then copied CS to DS, so DS = 0x0000. Now observe what the ORG 0x7c00 directive has done to our generated code:
7c08: a0 24 7c mov al,ds:0x7c24
Notice how we are now using an offset of 0x7c24! This would be like mov al,0x0000:0x7c24 which is physical address (0x0000<<4)+0x7c24 = 0x07c24. That is the right memory location where the bootloader was loaded, and is the proper position of our msg string. So it works.
Is using an ORG 0x7c00 a bad idea? No. It is fine. But we have a subtle issue to contend with. What happens if another Virtual PC environment or real hardware doesn't FAR JMP to our bootloader using a CS:IP of 0x0000:0x7c00? This is possible. There are many physical PCs with a BIOS that actually does the equivalent of a far jump to 0x07c0:0x0000. That too is physical address 0x07c00 as we have already seen. In that environment, when our code runs CS = 0x07c0. If we use the original code that copies CS to DS, DS now has 0x07c0 too. Now observe what would happen to this code in that situation:
7c08: a0 24 7c mov al,ds:0x7c24
DS=0x07c0 in this scenario. We now have something resembling mov al,0x07c0:0x7c24 when the program actually runs. Ut-oh, that looks bad. What does that translate to as a physical address? (0x07c0<<4)+0x7c24 = 0x0F824. That is somewhere above our bootloader and it will contain whatever happens to be there after the computer boots. Likely zeros, but it should be assumed to be garbage. Clearly not where our msg string was loaded!
So how do we resolve this? To amend what Ross Ridge suggested, and to heed the advice I previously gave about explicitly setting DS to the segment we really want (don't assume CS is correct and then blindly copy to DS) we should place 0x0000 into DS when our bootloader starts if we use ORG 0x7c00. So we can change this code:
ORG 0x7c00
start:
push cs
pop ds ; DS=CS
to:
ORG 0x7c00
start:
xor ax, ax ; ax=0x0000
mov ds, ax ; DS=0x0000
Here we don't rely on an untrusted value in CS. We simply set DS to the segment value that makes sense given the ORG we used. You could have pushed 0x0000 and popped it into DS as you have been doing. I am more accustomed to zeroing out a register and moving that to DS.
By taking this approach, it doesn't matter what value in CS might have been used to reach our bootloader, the code would still reference the appropriate memory location for our data.
Don't Assume 1st Stage is Invoked by BIOS with CS:IP=0x0000:0x7c00
In my General Bootloader Tips that I wrote in a previous StackOverflow answer, tip #1 is very important:
When the BIOS jumps to your code you can't rely on CS,DS,ES,SS,SP registers having valid or expected values. They should be set up appropriately when your bootloader starts. You can only be guaranteed that your bootloader will be loaded and run from physical address 0x07c00 and that the boot drive number is loaded into the DL register.
The BIOS could have FAR JMP'ed (or equivalent) to our code with jmp 0x07c0:0x0000, and some emulators and real hardware do it this way. Others use jmp 0x0000:0x7c00 like VirtualBox does.
We should account for this by setting DS explicitly to what we need, and set it to what makes sense for the value we use in our ORG directive.
Summary
Don't assume CS is a value we expect, and don't blindly copy CS to DS . Set DS explicitly.
Your code could be fixed to use either ORG 0x0000 as you originally had it, if we set DS appropriately to 0x07c0 as previously discussed. That could look like:
ORG 0
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x07c0
pop ds ; DS=0x07c0 since we use ORG 0x0000
pop es
Alternatively we could have used ORG 0x7c00 like this:
ORG 0x7c00
BITS 16
push word 0xB800 ; Address of text screen video memory in real mode for colored monitors
push 0x0000
pop ds ; DS=0x0000 since we use ORG 0x7c00
pop es
Is the following a valid way to define variables in asm?
.globl main
.globl a, b, c, d
a: .byte 4
b: .value 7
c: .long 0x0C # 11
d: .quad 9
main:
mov $0, %eax
add a(%rip), %eax
add b(%rip), %eax
add c(%rip), %eax
add d(%rip), %eax
ret
For example, is it required/suggested to have a .TEXT section? How exactly does a(%rip) resolve to the value of $4? That seems almost like magic to me.
The default section is .text; lines before any section directive assemble into the .text section. So you do have one, and in fact you put everything in it, including your data. (Or read-only constants.) Normally you should put static constants in .rodata (or .rdata on Windows), not in .text, for performance reasons. (Mixing code and data wastes space in the I-cache and D-cache, and in TLBs.)
It doesn't resolve to an immediate $4 at assemble time, it resolves to an address. In this case, using a RIP-relative addressing mode. See what does "mov offset(%rip), %rax" do? / How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for more about the fact that it means "address of symbol a with respect to RIP", not actually RIP + absolute address of the symbol.
In other cases, the symbol a does generally resolve to its (32-bit) absolute address when used as add $a, %rdi or something.
It's only at runtime that the CPU loads data (which you put there with directives like .long) from that static storage. If you changed what was in memory (e.g. with a debugger, or by running other instructions) before add c(%rip), %eax executed, it would load a different value.
You put your constant data in .text, along with the code, which is generally not what you want for performance reasons. But it means the assembler can resolve the RIP-relative addressing at assemble time instead of only using a relocation that the linker has to fill in. Although it seems GAS chooses not to resolve the references and still leaves it for the linker:
$ gcc -c foo.s
$ objdump -drwC -Matt foo.o
foo.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <a>:
0: 04 .byte 0x4
0000000000000001 <b>:
1: 07 (bad)
...
0000000000000003 <c>:
3: 0c 00 or $0x0,%al
...
0000000000000007 <d>:
7: 09 00 or %eax,(%rax)
9: 00 00 add %al,(%rax)
b: 00 00 add %al,(%rax)
...
000000000000000f <main>:
f: b8 00 00 00 00 mov $0x0,%eax
14: 03 05 00 00 00 00 add 0x0(%rip),%eax # 1a <main+0xb> 16: R_X86_64_PC32 a-0x4
1a: 03 05 00 00 00 00 add 0x0(%rip),%eax # 20 <main+0x11> 1c: R_X86_64_PC32 b-0x4
20: 03 05 00 00 00 00 add 0x0(%rip),%eax # 26 <main+0x17> 22: R_X86_64_PC32 c-0x4
26: 03 05 00 00 00 00 add 0x0(%rip),%eax # 2c <main+0x1d> 28: R_X86_64_PC32 d-0x4
2c: c3 retq
(Attempted disassembly of your data as instructions happens because you put them in .text. objdump -d only disassembles .text, and non-immediate constants are normally placed in .rodata.)
Linking it into a executable resolves those symbol references:
$ gcc -nostdlib -static foo.s # not a working executable, just link it without extra stuff
$ objdump -drwC -Matt a.out
... (bogus data omitted)
000000000040100f <main>:
40100f: b8 00 00 00 00 mov $0x0,%eax
401014: 03 05 e6 ff ff ff add -0x1a(%rip),%eax # 401000 <a>
40101a: 03 05 e1 ff ff ff add -0x1f(%rip),%eax # 401001 <b>
401020: 03 05 dd ff ff ff add -0x23(%rip),%eax # 401003 <c>
401026: 03 05 db ff ff ff add -0x25(%rip),%eax # 401007 <d>
40102c: c3 retq
Note the 32-bit little-endian 2's complement encoding of the relative offsets in the RIP+rel32 addressing modes. (And the comment with the absolute address, added by objdump for convenience in this disassembly output.)
BTW, most assemblers including GAS have macro facilities, so you could have used a = 4 or .equ a, 4 to define it as an assemble-time constant, instead of emitting data into the output there. Then you'd use it as add $a, %eax, which would assemble to an add $sign_extended_imm8, r/m32 opcode.
Also, all your loads are dword sized (determined by the register operand), so only 1 of them matches the size of the data directives you used. Single-step through your code and look at the high bits of EAX.
Assembly language doesn't really have variables. It has tools you can use to implement the high-level-language concept of variables, including variables with static storage class. (A label and some space in .data or .bss. Or .rodata for const "variables".)
But if you use the tools differently, you can do things like load 4 bytes that span the .byte, the .value (16-bit), and the first byte of the .long. So after the first instruction, you'll have EAX += 0x0c000704 (because x86 is little-endian). This is totally legal to write in assembler, and nothing is checking to enforce the concept of a variable as ending before the next label.
(Unless you use MASM, which does have variables; in that case you'd have had to write add eax, dword ptr [a]; without the size override MASM would complain about the mismatch between a dword register and a byte variable. Other flavours of asm syntax, like NASM and AT&T, assume you know what you're doing and don't try to be "helpful".)
I have used a symbol "__copy_start" inside my assembly code which is coming from linker script. symbol is defined as ABS in symbol table.
This symbol is used inside a macro to copy data from one memory location to another.
After looking at varenter code hereious ways to modify this symbol directly in elf i decided to write C code of my own to modify the symbol value.
To do that i traversed entire symbol table and did string match for the symbol i am interested in. When there is a symbol name match i just assigned symbol_table.st_value = new value.
To make sure the new value is taken i did readelf -s and checked that it does show the new value assigned by me.
Now, when i disassemble the modified elf i find that the new value has not taken effect and i still see the assembly code doing copy from old symbol value.
My question is:
Am i doing something wrong here? is it possible to change the symbol values in elf? If yes, please let me know the correct way to do it. How do i achieve what i intend to do here.
Note: I don't have the source code so taking this approach.
Thanks in advance,
Gaurav
wanted to add more information so that people can understand better.
copying the elf header below:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
**Type: EXEC (Executable file)**
Machine: Ubicom32 32-bit microcontrollers
Version: 0x1
Entry point address: 0xb0000000
Start of program headers: 52 (bytes into file)
Start of section headers: 33548 (bytes into file)
Flags: 0x6
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 3
Here as you can see that file is of type executable.
output of readelf -S copied below:
There are 6 section headers, starting at offset 0x830c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 3ffc0000 004000 000ebc 00 AX 0 0 1
[ 2] .sdram PROGBITS 50000000 008000 0002e4 00 WA 0 0 1
[ 3] .shstrtab STRTAB 00000000 0082e4 000028 00 0 0 1
[ 4] .symtab SYMTAB 00000000 0083fc 0001c0 10 5 20 4
[ 5] .strtab STRTAB 00000000 0085bc 00019a 00 0 0 1
I am using one of the symbol named "__copy_start" in an instruction to copy the data from .sdram section to .text section. I was under an impression that i could go and change the symbol_table.st_value and then get the desired work done. But unfortunately that is not the case. Seems like it is already compiled and cannot be changed like this.
Any idea how this could be done would be really helpful.
Regards,
Gaurav
Are you sure that the object code actually uses a relocation to reference the data at the __copy_start symbol? Even for position-independent code, it is usually possible to turn section start addresses into relative addresses, which do not need a relocation. (That the symbol itself remains present with an absolute address does not change this.)
You can check this by using readelf -r or eu-readelf -r and examining the output. It is also visible in the objdump --dissassemble --reloc output.
I'm looking at some disassembly code and see something like 0x01c8f09b <+0015> mov 0x8(%edx),%edi and I am wondering what the value of %edx or %edi is.
Is there a way to print the value of %edx or other assembly variables? Is there a way to print the value at the memory address that %edx points at (I'm assuming edx is a register containing a pointer to ... something here).
For example, you can print an objet by typing po in the console, so is there a command or syntax for printing registers/variables in the assembly?
Background:
I'm getting EXC_BAD_ACCESS on this line and I would like to debug what is going on. I'm aware this error is related to memory management and I'm looking at figuring out where I may be missing/too-many retain/release/autorelease calls.
Additional Info:
This is on IOS, and my application is running in the iPhone simulator.
You can print a register (e.g, eax) using:
print $eax
Or for short:
p $eax
To print it as hexadecimal:
p/x $eax
To display the value pointed to by a register:
x $eax
Check the gdb help for more details:
help print
help x
Depends up which Xcode compiler/debugger you are using. For gcc/gdb it's
info registers
but for clang/lldb it's
register read
(gdb) info reg
eax 0xe 14
ecx 0x2844e0 2639072
edx 0x285360 2642784
ebx 0x283ff4 2637812
esp 0xbffff350 0xbffff350
ebp 0xbffff368 0xbffff368
esi 0x0 0
edi 0x0 0
eip 0x80483f9 0x80483f9 <main+21>
eflags 0x246 [ PF ZF IF ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
From Debugging with gdb:
You can refer to machine register contents, in expressions, as variables with names
starting with `$'. The names of registers are different for each machine; use info
registers to see the names used on your machine.
info registers
Print the names and values of all registers except floating-point
registers (in the selected stack frame).
info all-registers
Print the names and values of all registers, including floating-point
registers.
info registers regname ...
Print the relativized value of each specified register regname.
regname may be any register name valid on the machine you are using,
with or without the initial `$'.
If you are using LLDB instead of GDB you can use register read
Those are not variables, but registers.
In GDB, you can see the values of standard registers by using the following command:
info registers
Note that a register contains integer values (32bits in your case, as the register name is prefixed by e). What it represent is not known. It can be a pointer, an integer, mostly anything.
If po crashes when you try to print a register's value as a pointer, it's likely that the value is not a pointer (or an invalid one).
I would like to come up with the byte code in assembler (assembly?) for Windows machines to add two 32-bit longs and throw away the carry bit. I realize the "Windows machines" part is a little vague, but I'm assuming that the bytes for ADD are pretty much the same in all modern Intel instruction sets.
I'm just trying to abuse VB a little and make some things faster. So as an example of running direct assembly in VB, the hex string "8A4C240833C0F6C1E075068B442404D3E0C20800" is the assembly code for SHL that can be "injected" into a VB6 program for a fast SHL operation expecting two Long parameters (we're ignoring here that 32-bit longs in VB6 are signed, just pretend they are unsigned).
Along those same lines, what is the hex string of bytes representing assembler instructions that will do the same thing to return the sum of two 32-bit unsigned integers?
The hex code above for SHL is, according to the author:
mov eax, [esp+4]
mov cl, [esp+8]
shl eax, cl
ret 8
I spit those bytes into a file and tried unassembling them in a windows command prompt using the old debug utility, but I figured out it's not working with the newer instruction set because it didn't like EAX when I tried assembling something but it was happy with AX.
I know from comments in the source code that SHL EAX, CL is D3E0, but I don't have any reference to know what the bytes are for instruction ADD EAX, CL or I'd try it. (Though I know now that the operands have to be the same size.)
I tried flat assembler and am not getting anything I can figure out how to use. I used it to assemble the original SHL code and got a very different result, not the same bytes. Help?
I disassembled the bytes you provided and got the following code:
(__TEXT,__text) section
f:
00000000 movb 0x08(%esp),%cl
00000004 xorl %eax,%eax
00000006 testb $0xe0,%cl
00000009 jne 0x00000011
0000000b movl 0x04(%esp),%eax
0000000f shll %cl,%eax
00000011 retl $0x0008
Which is definitely more complicated than the source code the author provided. It checks that the second operand isn't too large, for example, which isn't in the code you showed at all (see Edit 2, below, for a more complete analysis). Here's a simple stdcall function that adds two arguments together and returns the result:
mov 4(%esp), %eax
add 8(%esp), %eax
ret $8
Assembling that gives me this output:
(__TEXT,__text) section
00000000 8b 44 24 04 03 44 24 08 c2 08 00
I hope those bytes do what you want them to!
Edit: Perhaps more usefully, I just did the same in C:
__attribute__((__stdcall__))
int f(int a, int b)
{
return a + b;
}
Compiled with -Oz and -fomit-frame-pointer it generates exactly the same code (well, functionally equivalent, anyway):
$ gcc -arch i386 -fomit-frame-pointer -Oz -c -o example.o example.c
$ otool -tv example.o
example.o:
(__TEXT,__text) section
_f:
00000000 movl 0x08(%esp),%eax
00000004 addl 0x04(%esp),%eax
00000008 retl $0x0008
The machine code output:
$ otool -t example.o
example.o:
(__TEXT,__text) section
00000000 8b 44 24 08 03 44 24 04 c2 08 00
Sure beats hand-writing assembly code!
Edit 2:
#ErikE asked in the comments below what would happen if a shift of 32 bits or greater was attempted. The disassembled code at the top of this answer (for the bytes provided in the original question) can be represented by the following higher-level code:
unsigned int shift_left(unsigned int a, unsigned char b)
{
if (b > 32)
return 0;
else
return a << b;
}
From this logic it's pretty easy to see that if you pass a value greater than 32 as the second parameter to the shift function, you'll just get 0 back.