LLDB: Disassemble functions at pointer location (Obj-C/macOS)

When trying to debug applications written in Objective-C I often see that the registers contain a pointer to a function. The problem is, I cannot seem to get to the actual implementation.
Example:
(lldb) register read
rbx = 0x00007fffd1b326b8 (void *)0x001dffffd1b32731
(lldb) x/2xw $rbx
0x7fffd1b326b8: 0xd1b32731 0x001dffff (flipped endian)
(lldb) x/2xw 0x001dffffd1b32731
error: memory read failed for 0x1dffffd1b32600
Obviously, I cannot set a breakpoint at that address either, which eliminates that option. So my question is: is it possible to get to the instructions that are supposedly at that memory address?

If you know the address of a function, you can see its assembly using the disassemble -a command.
(lldb) disassemble -a 0x1234
or
(lldb) disas -a 0x1234
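The "flipped endian" output in the question is x/2xw printing one 64-bit value as two 32-bit words, lowest word first, which is what a little-endian machine stores. A minimal Python sketch of how the two words recombine into the value that register read displayed (the helper name qword_from_words is hypothetical, not an LLDB API):

```python
import struct


def qword_from_words(low_word, high_word):
    """Combine two 32-bit words, in the order x/2xw prints them, into one 64-bit value."""
    # Pack the words in memory order (little-endian), then reread them as one quadword.
    raw = struct.pack("<II", low_word, high_word)
    return struct.unpack("<Q", raw)[0]


# Words copied from the x/2xw output in the question.
value = qword_from_words(0xD1B32731, 0x001DFFFF)
print(hex(value))
```

The result matches the (void *) value shown next to rbx, so nothing was corrupted by the read; the bytes are only displayed in word-sized pieces.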

Related

How do I set up instruction & data memory address when using "riscv32-unknown-elf-gcc"?

I designed a RISCV32IM processor, and I used "riscv32-unknown-elf-gcc" to generate code for testing.
However, the PC (instruction memory address) values and data memory addresses in the generated code were arbitrary. I used this command:
riscv32-unknown-elf-gcc -march=rv32im -mabi=ilp32 -nostartfiles test.c
Is it possible to set the instruction and data memory addresses I want?
Thanks.
Thank you for the answer.
I designed only HW, and this is my first time using the SW tool chain.
Even if my question is rudimentary, please understand.
The figure is the result of the "-v" option.
[Image: output of the "-v" option; not reproduced]
I can't modify the script file in place because I use the RISC-V toolchain in a Docker environment.
So I tried copying the script file (elf32lriscv.x) and modifying the copy.
I changed 0x10000 to 0x00000.
The copied script is named "test5.x".
I then ran it as follows.
What am I doing wrong?
[Image: the command as executed; not reproduced]
The RISC-V compiler uses the default linker script to place the text and data sections.
If you add the -v option to your command line (riscv32-unknown-elf-gcc -v -march=rv32im -mabi=ilp32 -nostartfiles test.c), you will see the linker script used by collect2 (normally it will be -melf32lriscv). You can find the linker script in ${path_to_toolchain}/riscv32-unknown-elf/lib/ldscripts/ (the default one has the .x extension).
You can also use riscv32-unknown-elf-ld --verbose, as explained by @Frant. However, you need to be careful if the toolchain was compiled with multilib enabled and you compile for rv64 while the default is rv32, or vice versa. That is probably not the case here, but to be sure you can specify the architecture with -A elf32riscv for rv32.
To set the addresses, you can create your own linker script or copy and modify the default one. You can modify only the executable start, as explained by @Frant, or make more modifications and place whatever you want wherever you want.
Once your own linker script is ready, you can pass it to the linker with -Wl,-T,${own_linker_script}. Your command will be riscv32-unknown-elf-gcc -march=rv32im -mabi=ilp32 -nostartfiles test.c -Wl,-T,${own_linker_script}
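The "copy and modify the default one" step can be scripted. The following is a minimal Python sketch, assuming the copied script sets its base address through SEGMENT_START("text-segment", 0x10000) as GNU ld default scripts commonly do; the sample text is an assumed excerpt, not the verified contents of elf32lriscv.x, and rebase_linker_script is a hypothetical helper name.

```python
import re


def rebase_linker_script(script_text, old_base, new_base):
    """Replace the text-segment base address inside SEGMENT_START("text-segment", ...)."""
    pattern = re.compile(
        r'(SEGMENT_START\s*\(\s*"text-segment"\s*,\s*)(0x[0-9a-fA-F]+)(\s*\))'
    )

    def swap(match):
        if int(match.group(2), 16) != old_base:
            return match.group(0)  # leave any other base address untouched
        return f"{match.group(1)}{new_base:#x}{match.group(3)}"

    return pattern.sub(swap, script_text)


# Assumed excerpt for illustration; check the real script shipped with your toolchain.
sample = (
    'PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x10000)); '
    '. = SEGMENT_START("text-segment", 0x10000) + SIZEOF_HEADERS;'
)
print(rebase_linker_script(sample, 0x10000, 0x0))
```

The rewritten text would be saved under a new name (test5.x in the question) and passed with -Wl,-T as described above. If the script keeps the "+ SIZEOF_HEADERS" term, the first section will likely still start slightly above the new base rather than exactly at it.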

How to resolve function name in elf

I wanted to write an ELF parser and disassemble the .text section, so I parsed the ELF file and gave the .text section to Capstone to disassemble it for me. Unfortunately, Capstone doesn't resolve function names.
According to the assembly code below from my ELF file, there is a call to a function whose name I want to resolve.
call 8048380
I checked the .symtab section, but functions that need relocation, like printf, have a 0 address in the table because their address is unknown until load time.
So how am I gonna resolve its name?
"I checked .symtab section but functions that need relocation like printf"
The function you are interested in (the one at address 0x8048380) is not like printf and doesn't require runtime relocation.
It's unclear from your question how you obtained this disassembly:
call 8048380
Chances are you need to use a better tool, or you pointed your tool at a stripped binary (don't do that).
Here is an example of what the reasonable output should look like:
int foo() { return 42; }
int main() { return foo(); }
$ gcc t.c
$ gdb -q ./a.out
(gdb) disas main
Dump of assembler code for function main:
0x08048410 <+0>: push %ebp
0x08048411 <+1>: mov %esp,%ebp
0x08048413 <+3>: call 0x8048406 <foo> // GDB resolves the address
0x08048418 <+8>: pop %ebp
0x08048419 <+9>: ret
End of assembler dump.
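What GDB does for the call above can be approximated in a custom parser: collect the function symbols from .symtab, sort them by address, and look up each call target. A minimal Python sketch follows; the two addresses come from the GDB listing above, while the sizes and the resolve helper are assumptions made for illustration.

```python
from bisect import bisect_right

# (start address, size in bytes, name); the sizes are assumed for illustration.
SYMBOLS = sorted([
    (0x08048406, 10, "foo"),
    (0x08048410, 10, "main"),
])
STARTS = [start for start, _, _ in SYMBOLS]


def resolve(address):
    """Return 'name' or 'name+offset' for an address, or None if no symbol covers it."""
    index = bisect_right(STARTS, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    start, size, name = SYMBOLS[index]
    if address >= start + size:
        return None  # address falls in a gap after the symbol ends
    offset = address - start
    return name if offset == 0 else f"{name}+{offset}"


print(resolve(0x08048406))  # target of the call in main
print(resolve(0x08048413))  # address of the call instruction itself
print(resolve(0x08048380))  # not covered by this illustrative table
```

A lookup that returns None is a hint that the symbol table is incomplete for that address, for example because the binary was stripped.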

Determining symbol addresses using binutils/readelf

I am working on a project where our verification test scripts need to locate symbol addresses within the build of software being tested. This might be used for setting breakpoints or reading static data from memory. What I am after is to create a map file containing symbol names, base address in memory, and size. Our build outputs an ELF file which has the information I want. I've been trying to use the readelf, nm, and objdump tools to get the symbol addresses I need.
I originally tried readelf -s file.elf and that seemed to access some symbols, particularly those which were written in assembler. However, many of the symbols that I wanted were not in there - specifically those that originated within our Ada code.
I used readelf --debug-dump file.elf to dump all debug information. From that I do see all symbols, including those that were in the Ada code. However, the format seems to be in the DWARF format. Does anyone know why these symbols would not be output by readelf when I ask it to list the symbolic information? Perhaps there is simply an option I am missing.
Now I could go to the trouble of writing a custom DWARF parser to get the information, but if I can get it using one of the Binutils (nm, readelf, objdump) then I'd really prefer a standard solution.
DWARF is the debug information and tries to reflect the structure of the original source code. Take the following code as an example:
static int one() {
// something
return 1;
}
int main(int ac, char **av) {
return one();
}
After you compile it using gcc -O3 -g, the static function one will be inlined into main. So when you use readelf -s, you will never see the symbol one. However, when you use readelf --debug-dump, you can see that one is a function which was inlined.
In this example, the compiler does not prohibit combining optimization with -g, so you can still debug the executable. Even though the function is optimized and inlined, gdb can still use the DWARF information to identify the function and the source line for the current code block inside the inlined function.
The above is just one case of compiler optimization. There might be plenty of other reasons for a mismatch in symbols and addresses between readelf -s and DWARF.
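For the symbols that do appear in the symbol table, the name/address/size map the question asks for can be built from nm -S output. A minimal Python sketch, assuming the text comes from something like nm -S --defined-only file.elf; the sample lines are made up for illustration and parse_nm_sizes is a hypothetical helper. Inlined functions such as one above will still be missing, for the reason just described.

```python
def parse_nm_sizes(nm_output):
    """Parse `nm -S` style text into {name: (address, size)}."""
    symbols = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # symbols without a recorded size print only three fields
        address, size, _kind, name = fields
        symbols[name] = (int(address, 16), int(size, 16))
    return symbols


# Made-up sample in the "address size type name" layout.
sample = """\
08048406 0000000a T foo
08048410 0000000a T main
08049f08 d _DYNAMIC
"""

for name, (address, size) in sorted(parse_nm_sizes(sample).items()):
    print(f"{name} {address:#010x} {size}")
```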

In an ELF file, how does the address for _start get determined?

I've been reading the ELF specification and cannot figure out where the program entry point and _start address come from.
It seems like they should have to be in a pretty consistent place, but I made a few trivial programs, and _start is always in a different place.
Can anyone clarify?
The _start symbol may be defined in any object file. Normally it is supplied automatically by the C runtime startup code, which eventually calls main. You can also define it yourself, for instance in an assembler source file:
.globl _start
_start:
// assembly here
When the linker has processed all object files, it looks for the _start symbol and puts its value in the e_entry field of the ELF header. The loader takes the address from this field and transfers control to it after it has finished loading all segments into memory and is ready to execute the file.
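To see the value the linker stored, e_entry can be read straight from the ELF header. This is a minimal Python sketch based on the standard header layout (16 bytes of e_ident, then e_type, e_machine and e_version, then e_entry at offset 24); the function name and the hand-built header used for the demonstration are illustrative assumptions.

```python
import struct


def read_entry_point(elf_bytes):
    """Return e_entry from an ELF header held in a bytes object."""
    if elf_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = elf_bytes[4] == 2                 # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if elf_bytes[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    # e_entry follows e_ident (16), e_type (2), e_machine (2), e_version (4).
    fmt = endian + ("Q" if is_64bit else "I")
    return struct.unpack_from(fmt, elf_bytes, 24)[0]


# Hand-built start of a little-endian ELF64 header, for demonstration only.
e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
header = e_ident + struct.pack("<HHIQ", 2, 62, 1, 0x4000B0)
print(hex(read_entry_point(header)))
```

For a real file, the first 64 bytes read with open(path, "rb") are enough as input; readelf -h reports the same field as "Entry point address".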
Take a look at the linker script ld is using:
ld -verbose
The format is documented at: https://sourceware.org/binutils/docs-2.25/ld/Scripts.html
It determines basically everything about how the executable will be generated.
On Binutils 2.24 Ubuntu 14.04 64-bit, it contains the line:
ENTRY(_start)
which sets the entry point to the _start symbol (goes to the ELF header as mentioned by ctn)
And then:
. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
which sets the location counter, and therefore the address of the first section placed after the headers, to 0x400000 + SIZEOF_HEADERS.
I have modified that address to 0x800000, passed my custom script with ld -T and it worked: readelf -s says that _start is at that address.
Another way to change it is to use the -Ttext-segment=0x800000 option.
The reason for using 0x400000 = 4 MiB is to start at the beginning of the second 4 MiB page, as asked at: Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?
A question describes how to set _start from the command line: Why is the ELF entry point 0x8048000 not changeable with the "ld -e" option?
SIZEOF_HEADERS is the size of the ELF + program headers, which are at the beginning of the ELF file. That data gets loaded into the very beginning of the virtual memory space by Linux (TODO why?) In a minimal Linux x86-64 hello world with 2 program headers it is worth 0xb0, so that the _start symbol comes at 0x4000b0.
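The 0xb0 figure can be checked by hand: on x86-64 the ELF header is 64 bytes and each program header is 56 bytes. A short Python sketch of that arithmetic (the constant and function names are chosen here for illustration):

```python
ELF64_EHDR_SIZE = 64  # sizeof(Elf64_Ehdr)
ELF64_PHDR_SIZE = 56  # sizeof(Elf64_Phdr)


def first_address_after_headers(base, program_header_count):
    """Address of the first byte after the ELF header and program headers."""
    return base + ELF64_EHDR_SIZE + program_header_count * ELF64_PHDR_SIZE


# 64 + 2 * 56 = 176 = 0xb0, so _start lands at 0x400000 + 0xb0.
print(hex(first_address_after_headers(0x400000, 2)))  # 0x4000b0
```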
I'm not sure, but try this link: http://www.docstoc.com/docs/23942105/UNIX-ELF-File-Format
On page 8 it shows where the entry point is for an executable. Basically, you need to calculate the offset and you've got it.
Make sure to remember the little-endianness of x86 (I guess you use it) and reorder the bytes if you read bytewise. Edit: or maybe not; I'm not quite sure about this, to be honest.

Objective-C Code Obfuscation

Is there any way to obfuscate Objective-C code?
Thanks
The selectors are still plaintext - otool -o will dump out all your objects and the methods they define. You can also dump out all internal and external selectors accessed in the code with a one-liner that follows. Obfuscating method and parameter names at the source level would probably be easiest, though doing it at the object level will also obfuscate in a language-independent way at the expense of some linker table manipulation.
otool -s __TEXT __objc_methname yourapp.app/executable_file |expand -8 | cut -c17- | sed -n '3,$p' | perl -n -e 'print join("\n",split(/\x00/,scalar reverse (reverse unpack("(a4)*",pack("(H8)*",split(/\s/,$_))))))'|less
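The one-liner above undoes otool's hex dump and then splits the section on NUL bytes. The splitting step alone looks like the following minimal Python sketch, which assumes the raw bytes of __objc_methname have already been extracted by some other means; the sample bytes and the selector_names helper are made up for illustration.

```python
def selector_names(section_bytes):
    """Split raw __objc_methname bytes into selector strings."""
    return [
        chunk.decode("utf-8", errors="replace")
        for chunk in section_bytes.split(b"\x00")
        if chunk  # skip empty pieces produced by consecutive or trailing NULs
    ]


# Made-up section contents: NUL-terminated selector names stored back to back.
sample = b"init\x00doSomethingInnocent:withObject:\x00dealloc\x00"
for name in selector_names(sample):
    print(name)
```

Every name printed this way is readable by anyone with the binary, which is why renaming methods at the source level is the part of obfuscation that matters here.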
Objective-C is a strict superset of C, therefore all normal C obfuscation techniques work. If you want to work with Cocoa, however, you're going to have a bit of an obstacle because the method names are fairly self-documenting.
For your own methods, you just have to self-document the methods incorrectly. e.g.
-(void) doSomethingInnocent:(BOOL)animated withObject:passwords;
when you would normally have written:
-(void) sendObjectToMyServer:(BOOL)coverupAnimation;