extract source code from ELF with debug symbols - elf

I'm trying to extract the original source code from an ELF with debug symbols.
$ sed -n "14,16p" main.c
for (int p=start;p<end;p++)
if (isPrime(p))
printf("%d\n",p);
I want the "closest" preceding source line for a given address:
$ gcc -g -O0 main.c -o main
$ objdump -D -S main > main.asm
$ sed -n "451,457p" main.asm
if (isPrime(p))
6ba: 8b 45 f4 mov -0xc(%rbp),%eax
6bd: 89 c7 mov %eax,%edi
6bf: e8 86 ff ff ff callq 64a <isPrime>
6c4: 85 c0 test %eax,%eax
6c6: 74 16 je 6de <main+0x49>
printf("%d\n",p);
So given the 0x6bf call instruction, I would like to extract if (isPrime(p)).
This seems possible since objdump does it (right?)

I'm trying to extract the original source code from an ELF with debug symbols.
That's impossible: the ELF with debug symbols contains no original source code.
What you appear to be after is source code location, i.e. file and line. Armed with file name, line number, and the original source, you can trivially print that line.
To recover file/line info, you could use addr2line -e main 0x6bf.
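Here is a minimal Python sketch of those two steps: ask addr2line for file:line, then read that line from the original source. It assumes GNU addr2line is on the PATH and that the source file still exists at the path recorded in the debug info. The helper names (parse_addr2line, read_source_line, source_line_for) are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Optional, Tuple


def parse_addr2line(output: str) -> Tuple[str, int]:
    """Split the first line of addr2line output ("file:line") into (file, line).

    GNU addr2line prints "??:0" or "??:?" when it has no information and may
    append " (discriminator N)"; both cases are handled. Line 0 means unknown.
    """
    first = output.strip().splitlines()[0] if output.strip() else "??:0"
    first = first.split(" (discriminator")[0]
    path, _, lineno = first.rpartition(":")
    return path, (int(lineno) if lineno.isdigit() else 0)


def read_source_line(path: str, lineno: int) -> Optional[str]:
    """Return line `lineno` (1-based) of a text file, or None if unavailable."""
    p = Path(path)
    if lineno < 1 or not p.is_file():
        return None
    lines = p.read_text(errors="replace").splitlines()
    return lines[lineno - 1] if lineno <= len(lines) else None


def source_line_for(elf: str, addr: int) -> Optional[str]:
    """Map addr to file:line with addr2line, then fetch that line of source."""
    out = subprocess.run(["addr2line", "-e", elf, hex(addr)],
                         capture_output=True, text=True, check=True).stdout
    path, lineno = parse_addr2line(out)
    return read_source_line(path, lineno)
```

For the example in the question, source_line_for("main", 0x6bf) should return the `if (isPrime(p))` line, provided main.c is still where the debug info says it is.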

It turns out that it can be quite easily done with pyelftools:
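from elftools.elf.elffile import ELFFile as ELF

def addr_2_line(ELFname: str, addr: int):
    """Return the source line number whose code contains addr, or None."""
    with open(ELFname, "rb") as fl:
        elf = ELF(fl)
        dwarf_info = elf.get_dwarf_info()
        for cu in dwarf_info.iter_CUs():
            line_prog = dwarf_info.line_program_for_CU(cu)
            if line_prog is None:
                continue  # this CU has no line-number program
            prev = None  # previous row of the current sequence
            for entry in line_prog.get_entries():
                state = entry.state
                if state is None:
                    continue  # entry only updates registers, emits no row
                # a row covers addresses from its own address up to the next row's
                if prev is not None and prev.address <= addr < state.address:
                    return prev.line
                # an end_sequence row only marks where the sequence ends
                prev = None if state.end_sequence else state
    return None

line = addr_2_line("main", 0x6bf)
print(f"src code[ 0x6bf ] = {line}")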
When I run it, it indeed gives the desired line:
src code[ 0x6bf ] = 15
It is probably worth checking whether adjustments are needed when there is more than one compilation unit (CU).
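The heart of such a lookup is a search over the rows of one DWARF line-table sequence. Each row applies from its own address up to, but not including, the next row's address, and the final end_sequence row only closes the range. The sketch below runs that search with a binary search instead of a linear scan. It assumes the rows have already been pulled out of the line program. The Row type is hypothetical, and so are the example rows, which are loosely modeled on the objdump listing above (line 15 starting at 0x6ba, line 16 at 0x6c8).

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    address: int
    line: int
    end_sequence: bool = False


def lookup_line(rows: List[Row], addr: int) -> Optional[int]:
    """rows: one sequence, sorted by address, terminated by an end_sequence row."""
    addrs = [r.address for r in rows]
    i = bisect_right(addrs, addr) - 1  # last row whose address is <= addr
    if i < 0 or rows[i].end_sequence:
        return None  # addr is before the sequence or at/after its end
    return rows[i].line


# Hypothetical rows for a small part of main().
ROWS = [Row(0x6BA, 15), Row(0x6C8, 16), Row(0x6DE, 14), Row(0x6E8, 14, True)]
```

With these rows, lookup_line(ROWS, 0x6bf) gives 15, matching the output above. An address exactly at the start of a row (0x6ba) also maps to that row's line.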

Writing lines to a binary file

I'm experimenting further with Raku in the Comma IDE, and I want to print a binary file line by line.
I've tried this, but it doesn't work:
for "G.txt".IO.lines -> $line {
say $_;
}
How should I fix it? It's obviously incorrect.
EDIT
This doesn't work either; see the snippet below:
for "G.txt".IO.lines -> $line {
say $line;
}
You're showing us h.raku but Comma is giving you an error regarding c.raku, which is some other file in your Comma project.
It looks like you're working with a text file, not binary. Raku makes a clear distinction here: a text file is treated as text, regardless of encoding. If it's UTF-8, using .lines as you are now should work just fine because that's the default. If it's some other encoding, you can call .lines(:enc<some-other-encoding>). If it's truly binary, then the concept of "lines" really has no meaning, and you want something more like .slurp(:bin), which will give you a Buf[uint8] for working on the byte level.
The question specifically refers to reading a binary file, for which reading line-wise may (or may not) make sense, depending on the file.
Here's code to read a binary file straight from the docs (using class IO::CatHandle):
~$ raku -e '(my $f1 = "foo".IO).spurt: "A\nB\nC\n"; (my $f2 = "foo"); with IO::CatHandle.new: $f2 {.encoding: Nil; .slurp.say;};'
Buf[uint8]:0x<41 0A 42 0A 43 0A>
Compare to reading the file with default encoding (utf8):
~$ raku -e '(my $f1 = "foo".IO).spurt: "A\nB\nC\n"; (my $f2 = "foo"); with IO::CatHandle.new: $f2 {.slurp.say;};'
A
B
C
See:
https://docs.raku.org/routine/encoding
Note: the read method uses class IO::Handle which reads binary by default. So the code is simply:
~$ raku -e '(my $file1 = "foo".IO).spurt: "A\nB\nC\n"; my $file2 = "foo".IO; given $file2.open { .read.say; .close;};'
Buf[uint8]:0x<41 0A 42 0A 43 0A>
See:
https://docs.raku.org/type/IO::Handle#method_read
For further reading, see the discussion of the Raku equivalent of Perl 5's <> (diamond) operator:
https://docs.raku.org/language/5to6-nutshell#while_until
...and some (older) mailing-list discussion of the same:
https://www.nntp.perl.org/group/perl.perl6.users/2018/11/msg6295.html
Finally, the docs refer to writing a mixed utf8/binary file here (useful for further testing):
https://docs.raku.org/routine/encoding#Examples

modified elf symbol but not reflecting in disassembly

I have used a symbol "__copy_start" inside my assembly code; it comes from the linker script. The symbol is defined as ABS in the symbol table.
This symbol is used inside a macro to copy data from one memory location to another.
After looking at various ways to modify this symbol directly in the ELF file, I decided to write my own C code to modify the symbol value.
To do that, I traversed the entire symbol table and did a string match for the symbol I am interested in. When a symbol name matched, I just assigned symbol_table.st_value = new value.
To make sure the new value was taken, I ran readelf -s and checked that it does show the new value I assigned.
Now, when I disassemble the modified ELF, I find that the new value has not taken effect, and I still see the assembly code copying from the old symbol value.
My question is:
Am I doing something wrong here? Is it possible to change symbol values in an ELF file? If yes, please let me know the correct way to do it. How do I achieve what I intend to do here?
Note: I don't have the source code, so I am taking this approach.
Thanks in advance,
Gaurav
I wanted to add more information so that people can understand the problem better.
Here is the ELF header:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
**Type: EXEC (Executable file)**
Machine: Ubicom32 32-bit microcontrollers
Version: 0x1
Entry point address: 0xb0000000
Start of program headers: 52 (bytes into file)
Start of section headers: 33548 (bytes into file)
Flags: 0x6
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 3
As you can see, the file is of type EXEC (executable).
Here is the output of readelf -S:
There are 6 section headers, starting at offset 0x830c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .text PROGBITS 3ffc0000 004000 000ebc 00 AX 0 0 1
[ 2] .sdram PROGBITS 50000000 008000 0002e4 00 WA 0 0 1
[ 3] .shstrtab STRTAB 00000000 0082e4 000028 00 0 0 1
[ 4] .symtab SYMTAB 00000000 0083fc 0001c0 10 5 20 4
[ 5] .strtab STRTAB 00000000 0085bc 00019a 00 0 0 1
I am using the symbol "__copy_start" in an instruction to copy data from the .sdram section to the .text section. I was under the impression that I could change symbol_table.st_value and get the desired result. Unfortunately, that is not the case: it seems the value is already compiled in and cannot be changed like this.
Any idea how this could be done would be really helpful.
Regards,
Gaurav
Are you sure that the object code actually uses a relocation to reference the data at the __copy_start symbol? Even for position-independent code, it is usually possible to turn section start addresses into relative addresses, which do not need a relocation. (That the symbol itself remains present with an absolute address does not change this.)
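You can check this by running readelf -r or eu-readelf -r and examining the output. It is also visible in the objdump --disassemble --reloc output. If there is no relocation, the linker has already written the resolved value of __copy_start into the instruction bytes of this EXEC file. Changing st_value afterwards only edits the symbol table entry, which is why readelf -s shows the new value while the disassembly does not change.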
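If the value was baked into the instructions at link time, patching the file means rewriting those instruction bytes, not the symbol. The rough Python sketch below helps locate them. It makes two assumptions. First, it uses the section table from the readelf -S output above. Second, it assumes the old __copy_start value appears as a literal little-endian 32-bit word (the header says little endian). Many instruction sets, possibly including Ubicom32, split an address across several immediates. In that case the search finds nothing and you have to work from the disassembly instead. The helper names are hypothetical.

```python
import struct
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str
    addr: int    # sh_addr: virtual address when loaded
    offset: int  # sh_offset: position of the section's bytes in the file
    size: int    # sh_size


# Allocated PROGBITS sections from the readelf -S output in the question.
SECTIONS = [
    Section(".text", 0x3FFC0000, 0x004000, 0x000EBC),
    Section(".sdram", 0x50000000, 0x008000, 0x0002E4),
]


def vaddr_to_offset(sections: List[Section], vaddr: int) -> Optional[int]:
    """File offset of the byte that gets loaded at vaddr, or None."""
    for s in sections:
        if s.addr <= vaddr < s.addr + s.size:
            return s.offset + (vaddr - s.addr)
    return None


def find_le32(data: bytes, value: int) -> List[int]:
    """All offsets in data where value occurs as a little-endian 32-bit word."""
    needle = struct.pack("<I", value)
    hits, pos = [], data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return hits
```

As a usage example, read the file, slice out .text with data[0x4000:0x4000 + 0xebc], and call find_le32 on that slice with the old symbol value. Any hit is relative to .text's sh_offset, and disassembling around it shows whether it really is the operand of the copy instruction.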

Import class-dump info into GDB

Is there a way to import the output from class-dump into GDB?
Example code:
$ cat > test.m
#include <stdio.h>
#import <Foundation/Foundation.h>
@interface TestClass : NSObject
+ (int)randomNum;
@end
@implementation TestClass
+ (int)randomNum {
return 4; // chosen by fair dice roll.
// guaranteed to be random.
}
@end
int main(void) {
printf("num: %d\n", [TestClass randomNum]);
return 0;
}
^D
$ gcc test.m -lobjc -o test
$ ./test
num: 4
$ gdb test
...
(gdb) b +[TestClass randomNum]
Breakpoint 1 at 0x100000e5c
(gdb) ^D
$ strip test
$ gdb test
...
(gdb) b +[TestClass randomNum]
Function "+[TestClass randomNum]" not defined.
(gdb) ^D
$ class-dump -A test
...
@interface TestClass : NSObject
{
}
+ (int)randomNum; // IMP=0x0000000100000e50
@end
I know I can now use b *0x0000000100000e50 in gdb, but is there a way of modifying GDB's symbol table to make it accept b +[TestClass randomNum]?
Edit: It would be preferable if it worked with GDB v6 and not only GDB v7, as GDB v6 is the latest version with Apple's patches.
It’s possible to load a symbol file in gdb with the add-symbol-file command. The hardest part is to produce this symbol file.
With the help of libMachObjC (which is part of class-dump), it’s very easy to dump all addresses and their corresponding Objective-C methods. I have written a small tool, objc-symbols, which does exactly this.
Let’s use Calendar.app as an example. If you try to list the symbols with the nm tool, you will notice that the Calendar app has been stripped:
$ nm -U /Applications/Calendar.app/Contents/MacOS/Calendar
0000000100000000 T __mh_execute_header
0000000005614542 - 00 0000 OPT radr://5614542
But with objc-symbols you can easily retrieve the addresses of all the missing Objective-C methods:
$ objc-symbols /Applications/Calendar.app
00000001000c774c +[CALCanvasAttributedText textWithPosition:size:text:]
00000001000c8936 -[CALCanvasAttributedText createTextureIfNeeded]
00000001000c8886 -[CALCanvasAttributedText bounds]
00000001000c883b -[CALCanvasAttributedText updateBezierRepresentation]
...
00000001000309eb -[CALApplication applicationDidFinishLaunching:]
...
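To show what such a dump involves, here is a rough Python sketch that produces lines of the same shape from the class-dump -A output in the question (class headers start with @interface, and each method carries an IMP= comment). This is not how objc-symbols works internally, since that tool uses libMachObjC. The function names are hypothetical, and the parsing only covers plain method declarations.

```python
import re
from typing import List, Optional

_INTERFACE = re.compile(r"^@interface\s+(\w+)")  # class or category header
_IMP = re.compile(r"//\s*IMP=0x([0-9A-Fa-f]+)")  # address comment from -A


def _strip_types(decl: str) -> str:
    """Drop (possibly nested) parenthesised types such as (int) or (void (^)(id))."""
    out, depth = [], 0
    for ch in decl:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def _selector(decl: str) -> str:
    """'(void)setFoo:(id)a bar:(int)b' -> 'setFoo:bar:'; '(int)randomNum' -> 'randomNum'."""
    words = _strip_types(decl).split()
    keywords = [w[: w.index(":") + 1] for w in words if ":" in w]
    return "".join(keywords) if keywords else words[0]


def objc_symbols_from_class_dump(text: str) -> List[str]:
    """Turn 'class-dump -A' output into 'ADDRESS +[Class selector]' lines."""
    result: List[str] = []
    cls: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        header = _INTERFACE.match(line)
        if header:
            cls = header.group(1)
            continue
        if line.startswith("@end"):
            cls = None
            continue
        imp = _IMP.search(line)
        if cls and imp and line.startswith(("+", "-")) and ";" in line:
            decl = line[1:line.index(";")]  # drop the +/- and everything after ';'
            addr = int(imp.group(1), 16)
            result.append(f"{addr:016x} {line[0]}[{cls} {_selector(decl)}]")
    return result
```

For the TestClass example, this should yield the single line 0000000100000e50 +[TestClass randomNum].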
Then, with SymTabCreator, you can create a symbol file, which is actually just an empty dylib with all the symbols.
Using objc-symbols and SymTabCreator together is straightforward:
$ objc-symbols /Applications/Calendar.app | SymTabCreator -o Calendar.stabs
You can check that Calendar.stabs contains all the symbols:
$ nm Calendar.stabs
000000010014a58b T +[APLCALSource printingCachedTextSize]
000000010013e7c5 T +[APLColorSource alternateGenerator]
000000010013e780 T +[APLColorSource defaultColorSource]
000000010013e7bd T +[APLColorSource defaultGenerator]
000000010011eb12 T +[APLConstraint constraintOfClass:withProperties:]
...
00000001000309eb T -[CALApplication applicationDidFinishLaunching:]
...
Now let’s see what happens in gdb:
$ gdb --silent /Applications/Calendar.app
Reading symbols for shared libraries ................................. done
Without the symbol file:
(gdb) b -[CALApplication applicationDidFinishLaunching:]
Function "-[CALApplication applicationDidFinishLaunching:]" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
And after loading the symbol file:
(gdb) add-symbol-file Calendar.stabs
add symbol table from file "Calendar.stabs"? (y or n) y
Reading symbols from /Users/0xced/Calendar.stabs...done.
(gdb) b -[CALApplication applicationDidFinishLaunching:]
Breakpoint 1 at 0x1000309f2
You will notice that the breakpoint address does not exactly match the symbol address (0x1000309f2 vs 0x1000309eb, a difference of 7 bytes). This is because gdb automatically recognizes the function prologue and sets the breakpoint just after it.
GDB script
You can use this GDB script to automate this, given that the stripped executable is the current target.
Add the script from below to your .gdbinit, target the stripped executable and run the command objc_symbols in gdb:
$ gdb test
...
(gdb) b +[TestClass randomNum]
Function "+[TestClass randomNum]" not defined.
(gdb) objc_symbols
(gdb) b +[TestClass randomNum]
Breakpoint 1 at 0x100000ee1
(gdb) ^D
define objc_symbols
shell rm -f /tmp/gdb-objc_symbols
set logging redirect on
set logging file /tmp/gdb-objc_symbols
set logging on
info target
set logging off
shell target="$(head -1 /tmp/gdb-objc_symbols | awk -F '"' '{ print $2 }')"; objc-symbols "$target" | SymTabCreator -o /tmp/gdb-symtab
set logging on
add-symbol-file /tmp/gdb-symtab
set logging off
end
There is no direct way to do this (that I know of), but it seems like a great idea.
And now there is a way to do it... nice answer, 0xced!
The DWARF file format is well documented, IIRC, and, as the lldb source is available, you have a working example of a parser.
Since the source to class-dump is also available, it shouldn't be too hard to modify it to spew DWARF output that could then be loaded into the debugger.
Obviously, you wouldn't be able to dump symbols with full fidelity, but this would probably be quite useful.
You can use DSYMCreator.
With DSYMCreator, you can create a symbol file from an iOS executable binary.
It's a toolchain; you can use it like this:
$ ./main.py --only-objc /path/to/binary/xxx
Then a file /path/to/binary/xxx.symbol will be created, which is a symbol file in DWARF format. You can then import it into lldb yourself.
Apart from that, DSYMCreator also supports exporting symbols from IDA Pro; you can use it like this:
$ ./main.py /path/to/binary/xxx
Yes, just omit the --only-objc flag. IDA Pro will then run automatically, and a file /path/to/binary/xxx.symbol will be created, which is the symbol file.
Thanks 0xced for creating objc-symbols, which is a part of DSYMCreator toolchain.
BTW, https://github.com/tobefuturer/restore-symbol is another choice.

Contiki compile error, " ERROR: address 0x820003 out of range at line 1740 of..."

I started using the Contiki operating system with an Atmel ATmega128RFA1.
I can compile my example, but the hex file is bad. The error is:
ERROR: address 0x820003 out of range at line 1740 of ipso.hex (I am not using IPSO; I just kept this name).
When I compile on Linux, the program size is 27804 bytes and the data size is 4809 bytes.
When I compile on Windows, the program size is 28292 bytes and the data size is 4791 bytes.
I use only one process and one etimer; I would just like to turn one LED on and off.
The makefile consists of:
TARGET=avr-atmega128rfa1
CONTIKI = ../..
include $(CONTIKI)/Makefile.include
all:
	make -f Makefile.ipso TARGET=avr-atmega128rfa1 ipso.elf
	avr-objcopy -O ihex -R .eeprom ipso.elf ipso.hex
	avr-size -C --mcu=atmega128rfa1 ipso.elf
I can't program the controller. What is the problem?
Thank you.
Special sections in the .elf file start at 0x810000 and must be removed when generating a hex file for programming a particular memory. For example:
$ avr-objdump -h webserver6.avr-atmega128rfa1
webserver6.avr-atmega128rfa1: file format elf32-avr
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00001bda 00800200 0000e938 0000ea2c 2**0
CONTENTS, ALLOC, LOAD, DATA
1 .text 0000e938 00000000 00000000 000000f4 2**1
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .bss 000031a6 00801dda 00801dda 00010606 2**0
ALLOC
3 .eeprom 00000029 00810000 00810000 00010606 2**0
CONTENTS, ALLOC, LOAD, DATA
4 .fuse 00000003 00820000 00820000 0001062f 2**0
CONTENTS, ALLOC, LOAD, DATA
5 .signature 00000003 00840000 00840000 00010632 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
So:
avr-objcopy -O ihex -R .eeprom -R .fuse -R .signature ipso.elf ipso.hex
Alternatively, copy only the desired sections:
avr-objcopy -O ihex -j .text -j .data ipso.elf ipso.hex
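Both commands follow one rule. With avr-gcc's address layout (SRAM at 0x800000, EEPROM at 0x810000, fuses at 0x820000, lock bits at 0x830000, signature at 0x840000), any section whose load address (LMA) is 0x800000 or above does not belong in a flash image. The Python sketch below applies that rule to the avr-objdump -h table above. The Sec type and function name are made up. .data is kept because its LMA (0xe938) lies in flash even though its VMA (0x800200) is in SRAM. .bss is skipped because it has no CONTENTS.

```python
from typing import List, NamedTuple

# avr-gcc maps non-flash memories to 0x800000 and above in the ELF address
# space, so flash content has an LMA below this limit.
FLASH_LIMIT = 0x800000


class Sec(NamedTuple):
    name: str
    lma: int
    has_contents: bool  # "CONTENTS" flag in the avr-objdump -h listing


def objcopy_remove_args(sections: List[Sec]) -> List[str]:
    """Build '-R name' options for every non-flash section that has data."""
    args: List[str] = []
    for s in sections:
        if s.has_contents and s.lma >= FLASH_LIMIT:
            args += ["-R", s.name]
    return args


# Values from the avr-objdump -h output above.
SECTIONS = [
    Sec(".data", 0x0000E938, True),
    Sec(".text", 0x00000000, True),
    Sec(".bss", 0x00801DDA, False),
    Sec(".eeprom", 0x00810000, True),
    Sec(".fuse", 0x00820000, True),
    Sec(".signature", 0x00840000, True),
]
```

For this table, the result corresponds to -R .eeprom -R .fuse -R .signature, the same options as the avr-objcopy command above.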
avr-objcopy --change-section-lma .eeprom=0
this works for me

Add two 32-bit integers in Assembler for use in VB6

I would like to come up with the byte code in assembler (assembly?) for Windows machines to add two 32-bit longs and throw away the carry bit. I realize the "Windows machines" part is a little vague, but I'm assuming that the bytes for ADD are pretty much the same in all modern Intel instruction sets.
I'm just trying to abuse VB a little and make some things faster. So as an example of running direct assembly in VB, the hex string "8A4C240833C0F6C1E075068B442404D3E0C20800" is the assembly code for SHL that can be "injected" into a VB6 program for a fast SHL operation expecting two Long parameters (we're ignoring here that 32-bit longs in VB6 are signed, just pretend they are unsigned).
Along those same lines, what is the hex string of bytes representing assembler instructions that will do the same thing to return the sum of two 32-bit unsigned integers?
The hex code above for SHL is, according to the author:
mov eax, [esp+4]
mov cl, [esp+8]
shl eax, cl
ret 8
I spit those bytes into a file and tried unassembling them at a Windows command prompt with the old DEBUG utility. However, it doesn't support the newer 32-bit instruction set: it rejected EAX when I tried to assemble something but was happy with AX.
I know from comments in the source code that SHL EAX, CL is D3E0, but I don't have any reference to know what the bytes are for instruction ADD EAX, CL or I'd try it. (Though I know now that the operands have to be the same size.)
I tried flat assembler and am not getting anything I can figure out how to use. I used it to assemble the original SHL code and got a very different result, not the same bytes. Help?
I disassembled the bytes you provided and got the following code:
(__TEXT,__text) section
f:
00000000 movb 0x08(%esp),%cl
00000004 xorl %eax,%eax
00000006 testb $0xe0,%cl
00000009 jne 0x00000011
0000000b movl 0x04(%esp),%eax
0000000f shll %cl,%eax
00000011 retl $0x0008
Which is definitely more complicated than the source code the author provided. It checks that the second operand isn't too large, for example, which isn't in the code you showed at all (see Edit 2, below, for a more complete analysis). Here's a simple stdcall function that adds two arguments together and returns the result:
mov 4(%esp), %eax
add 8(%esp), %eax
ret $8
Assembling that gives me this output:
(__TEXT,__text) section
00000000 8b 44 24 04 03 44 24 08 c2 08 00
I hope those bytes do what you want them to!
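To check what those bytes compute and how they look as a VB6-style hex string, here is a small Python model. It is not part of the original answer, and the function names are hypothetical. It assumes the two Longs arrive as stdcall arguments, so [esp+4] and [esp+8] hold a and b, and that VB6 reads the EAX result back as a signed Long.

```python
# Machine code for the stdcall function assembled above:
#   8b 44 24 04   mov eax, [esp+4]   ; first argument
#   03 44 24 08   add eax, [esp+8]   ; add second argument, carry is lost
#   c2 08 00      ret 8              ; stdcall: callee pops 8 bytes of arguments
ADD_CODE = bytes.fromhex("8b442404 03442408 c20800")


def vb6_hex_string(code: bytes) -> str:
    """Upper-case hex string in the same style as the SHL string in the question."""
    return code.hex().upper()


def add32(a: int, b: int) -> int:
    """What the add instruction leaves in EAX: the sum modulo 2**32."""
    return (a + b) & 0xFFFFFFFF


def as_vb6_long(x: int) -> int:
    """Reinterpret a 32-bit pattern as VB6's signed Long (two's complement)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x
```

vb6_hex_string(ADD_CODE) gives "8B44240403442408C20800". Overflow wraps as intended: add32(0x7FFFFFFF, 1) is 0x80000000, which VB6 would see as -2147483648.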
Edit: Perhaps more usefully, I just did the same in C:
__attribute__((__stdcall__))
int f(int a, int b)
{
return a + b;
}
Compiled with -Oz and -fomit-frame-pointer it generates exactly the same code (well, functionally equivalent, anyway):
$ gcc -arch i386 -fomit-frame-pointer -Oz -c -o example.o example.c
$ otool -tv example.o
example.o:
(__TEXT,__text) section
_f:
00000000 movl 0x08(%esp),%eax
00000004 addl 0x04(%esp),%eax
00000008 retl $0x0008
The machine code output:
$ otool -t example.o
example.o:
(__TEXT,__text) section
00000000 8b 44 24 08 03 44 24 04 c2 08 00
Sure beats hand-writing assembly code!
Edit 2:
@ErikE asked in the comments below what would happen if a shift of 32 bits or greater was attempted. The disassembled code at the top of this answer (for the bytes provided in the original question) can be represented by the following higher-level code:
unsigned int shift_left(unsigned int a, unsigned char b)
{
    if (b >= 32)   /* testb $0xe0,%cl: any of bits 5-7 set means b >= 32 */
        return 0;
    else
        return a << b;
}
From this logic it's easy to see that if you pass a value of 32 or greater as the second parameter to the shift function, you'll just get 0 back.
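The disassembly can also be turned into an executable Python model for trying edge cases. This is a sketch, not from the original answer, and the function name is hypothetical. It shows that the cutoff is 32 itself. It also shows that, because the code loads only the low byte of the second argument (movb), a VB6 Long such as 257 (0x101) behaves like a shift by 1.

```python
def shl_model(a: int, b: int) -> int:
    """Model of: movb 8(%esp),%cl; xorl %eax,%eax; testb $0xe0,%cl; jne done;
    movl 4(%esp),%eax; shll %cl,%eax; done: retl $8"""
    cl = b & 0xFF                   # only the low byte of the second argument is read
    if cl & 0xE0:                   # any of bits 5..7 set, i.e. cl >= 32
        return 0                    # eax was zeroed by xorl and is returned as-is
    return (a << cl) & 0xFFFFFFFF   # 32-bit shift; cl is 0..31 here
```

For instance, shl_model(1, 31) is 0x80000000, while shl_model(1, 32) is 0.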