Extract Objective-c binary

Extract Objective-c binary - objective-c

Is it possible to extract a binary, to get the code that is behind the binary? With Class-dump you can see the implementation addresses, but is it possible to also see the code thats IN the implementation addresses? Is there ANY way to do it?

All your code compiles to single instructions, placed in the text section of your executable. The compiler is responsible for translating your higher level language to the processor specific instructions, which are simpler. Reverting this process would be nearly impossible, unless the code is quite simple. Some problems are ambiguity of statements, and the overall readability: local variables, for instance, will be nothing but an offset address.
If you want to read the disassembled code (the instructions of which the higher level code was compiled to) use this command in an executable:
otool -tV file

You can decompile (more accurately, disassemble) a binary and get it's assembly, but there is no way to get back the original Objective-C.
My curiosity begs me to ask why you want to do this!?

otx http://otx.osxninja.com/ is a good tool for symbolicating the otool based disassembly
It will handle both x86_64 and i386 disassembly.
and
Mach-O-Scope https://github.com/smorr/Mach-O-Scope is a a tool built on top of otx to dump it all into a sqlite3 database for browsing and annotating.
It won't give you the original source -- but it will get you pretty close providing you with the messages that are being sent around in methods.

Related

What does the -specs argument do in arm-none-eabi-gcc?

I was having trouble with the linker for the embedded arm gcc compiler, and I found a tutorial somewhere online saying that I could fix my linker errors in arm-none-eabi-gcc by including the argument -specs=nosys.specs, which worked for me, and it was able to compile my code.
My chip is an ATSAM7SE256 microcontroller, which to my understanding is an arm7tdmi processor using the armv4t and thumb instruction sets, and I've been compiling my code using:
arm-none-eabi-gcc -march=armv4t -mtune=arm7tdmi -specs=nosys.specs -o <exe_name>.elf <input_files>
And the code compiles with no issue, but I have no idea if it's doing what I think it's doing.
What is the significance of a spec file? What other values can you set with -specs=, and in what situations would you want to? Is nosys.specs the value I want for a completely embedded arm microcontroller?

It is documented at: https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Overall-Options.html#Overall-Options
It is a file containing switches to override standard defaults for various build components such as the compiler, assembler and linker. For example it can be used to replace the default C library.
I have never seen it used; typically bare-metal embedded system builds explicitly specify --nostdlib then explicitly link the required library. It could be used for environment specific build environments to link other default code such as an RTOS I guess. Personally I'd rather make all that explicit on the command line that hiding it in a file somewhere.
Essentially it applies the switches specified in the file as if they were defaults, so can be used to define defaults for specific build and execution environments.
The format of the specs file is documented at https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Spec-Files.html#Spec-Files
Without seeing both the linker errors and the content of the nosys.specs file in this case it is difficult to say how or why it solved your linker problem. The alternative solution of course would be to apply whatever switches are in the specs file directly.

Can different file extension executables be disassembled into the same instruction set OpCode?

This is a question from someone clueless about disassembly and decompiling in general, so bear with me. I am curious to know if executable file extensions (for example, listed in http://pcsupport.about.com/od/tipstricks/a/execfileext.htm ) can be disassembled into assembly language so then I can analyze opcode patterns across files.
My logic is that once all these different file extensions are in opcode form, they are all on the same level, regardless of language barriers, etc, so it would be easier to analyze them.
How feasible is this?
EDIT: Example. I have an .exe file and an .app file. If I disassembled both, could I compare them across opcode on the same OS? If not, how about executable files from the same OS. For example, for all executable files on Windows, if I disassembled both, could I compare opcode across each?
EDIT2: How will obfuscators affect my efforts?

In short, no.
The problem is that there is no practical universal instruction set. In practice, every computer architecture has its own instruction set (or sometimes several instruction sets). A native executable format like .exe is compiled to the machine's instruction set, which will differ based on the ISA targeted.
I'm not familiar with the .app format, but it appears to be some sort of archive containing executable code. So if you have an exe and app targeting the same ISA, you could conceivably diassemble and compare.
Obfuscation makes things much harder because it is difficult to get a reliable disassembly, let alone deal with stuff like self modifying code.

How do I add a prefix to all symbols in an elf object file but so that debugging still works?

I want to add a prefix to every symbol in an elf object file, how do you do that using Linux (eg debian)?
I need the debug information to still work (ie, gdb can still debug effectively albeit using the new names for all the symbols).
The elf object is relocatable.
A solution for a non-relocatable object would also be welcome.
A solution for which code-coverage stats continues to work would also be welcome but is not necessary.

I don't know of any canned way to do this.
I think it could be done by rewriting the ELF symbol table and the DWARF information as well. This is not trivial, though perhaps you could implement it using the various libraries in elfutils.

In which language is the proto compiler (of google protocol buffers) written?

I would like to know in which language the "proto compiler" (the compiler used to generate source files from Java, Python or c++) is written? Is it maybe a mix of languages?
Any help would be appreciated.
Thanks in Advance
Horace

It appears to be written in C++. There's also documentation on Java and Python APIs, but those don't appear to contain the compiler itself (at least I don't see anything that's obviously the compiler in either case, though I didn't spend a whole lot of time looking for it either).
That said, I'm almost tempted to vote to close -- for most practical purposes, the language used to implement the compiler is basically a trivia question, irrelevant to actual use. There is, however, an entirely legitimate exception: if you're going to download and modify the compiler, knowing the language you'd need to work with could be quite useful.

The protoc compiler is written in C or C++ (its a native program anyway).
When I want to process proto files in java files, I
I use the protoc command to convert them to a Protocol Buffer File ie
protoc protofile.proto --descriptor_set_out=OutputFile
Read the new protocol buffer file (its a FileDescriptorSet) and use it
An over complicated example is example, is compileProto method in
http://code.google.com/p/protobufeditor/source/browse/trunk/%20protobufeditor/Source/ProtoBufEditor/src/net/sf/RecordEditor/ProtoBuf/re/display/ProtoLayoutSelection.java
its compilcated because options because the protoc command and options can be stored in a properties file.
Note: The getFileDescriptor method reads the newly created protocol buffer

Process for reducing the size of an executable

I'm producing a hex file to run on an ARM processor which I want to keep below 32K. It's currently a lot larger than that and I wondered if someone might have some advice on what's the best approach to slim it down?
Here's what I've done so far
So I've run 'size' on it to determine how big the hex file is.
Then 'size' again to see how big each of the object files are that link to create the hex files. It seems the majority of the size comes from external libraries.
Then I used 'readelf' to see which functions take up the most memory.
I searched through the code to see if I could eliminate calls to those functions.
Here's where I get stuck, there's some functions which I don't call directly (e.g. _vfprintf) and I can't find what calls it so I can remove the call (as I think I don't need it).
So what are the next steps?
Response to answers:
As I can see there are functions being called which take up a lot of memory. I cannot however find what is calling it.
I want to omit those functions (if possible) but I can't find what's calling them! Could be called from any number of library functions I guess.
The linker is working as desired, I think, it only includes the relevant library files. How do you know if only the relevant functions are being included? Can you set a flag or something for that?
I'm using GCC

General list:
Make sure that you have the compiler and linker debug options disabled
Compile and link with all size options turned on (-Os in gcc)
Run strip on the executable
Generate a map file and check your function sizes. You can either get your linker to generate your map file (-M when using ld), or you can use objdump on the final executable (note that this will only work on an unstripped executable!) This won't actually fix the problem, but it will let you know of the worst offenders.
Use nm to investigate the symbols that are called from each of your object files. This should help in finding who's calling functions that you don't want called.
In the original question was a sub-question about including only relevant functions. gcc will include all functions within every object file that is used. To put that another way, if you have an object file that contains 10 functions, all 10 functions are included in your executable even if one 1 is actually called.
The standard libraries (eg. libc) will split functions into many separate object files, which are then archived. The executable is then linked against the archive.
By splitting into many object files the linker is able to include only the functions that are actually called. (this assumes that you're statically linking)
There is no reason why you can't do the same trick. Of course, you could argue that if the functions aren't called the you can probably remove them yourself.
If you're statically linking against other libraries you can run the tools listed above over them too to make sure that they're following similar rules.

Another optimization that might save you work is -ffunction-sections, -Wl,--gc-sections, assuming you're using GCC. A good toolchain will not need to be told that, though.
Explanation: GNU ld links sections, and GCC emits one section per translation unit unless you tell it otherwise. But in C++, the nodes in the dependecy graph are objects and functions.

On deeply embedded projects I always try to avoid using any standard library functions. Even simple functions like "strtol()" blow up the binary size. If possible just simply avoid those calls.
In most deeply embedded projects you don't need a versatile "printf()" or dynamic memory allocation (many controllers have 32kb or less RAM).
Instead of just using "printf()" I use a very simple custom "printf()", this function can only print numbers in hexadecimal or decimal format not more. Most data structures are preallocated at compile time.

Andrew EdgeCombe has a great list, but if you really want to scrape every last byte, sstrip is a good tool that is missing from the list and and can shave off a few more kB.
For example, when run on strip itself, it can shave off ~2kB.
From an old README (see the comments at the top of this indirect source file):
sstrip is a small utility that removes the contents at the end of an
ELF file that are not part of the program's memory image.
Most ELF executables are built with both a program header table and a
section header table. However, only the former is required in order
for the OS to load, link and execute a program. sstrip attempts to
extract the ELF header, the program header table, and its contents,
leaving everything else in the bit bucket. It can only remove parts of
the file that occur at the end, after the parts to be saved. However,
this almost always includes the section header table, and occasionally
a few random sections that are not used when running a program.
Note that due to some of the information that it removes, a sstrip'd executable is rumoured to have issues with some tools. This is discussed more in the comments of the source.
Also... for an entertaining/crazy read on how to make the smallest possible executable, this article is worth a read.

Just to double-check and document for future reference, but do you use Thumb instructions? They're 16 bit versions of the normal instructions. Sometimes you might need 2 16 bit instructions, so it won't save 50% in code space.
A decent linker should take just the functions needed. However, you might need compiler & linke settings to package functions for individual linking.

Ok so in the end I just reduced the project to it's simplest form, then slowly added files one by one until the function that I wanted to remove appeared in the 'readelf' file. Then when I had the file I commented everything out and slowly add things back in until the function popped up again. So in the end I found out what called it and removed all those calls...Now it works as desired...sweet!
Must be a better way to do it though.

To answer this specific need:
•I want to omit those functions (if possible) but I can't find what's
calling them!! Could be called from any number of library functions I
guess.
If you want to analyze your code base to see who calls what, by whom a given function is being called and things like that, there is a great tool out there called "Understand C" provided by SciTools.
https://scitools.com/
I have used it very often in the past to perform static code analysis. It can really help to determine library dependency tree. It allows to easily browse up and down the calling tree among other things.
They provide a limited time evaluation, then you must purchase a license.

You could look at something like executable compression.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Extract Objective-c binary - objective-c

Is it possible to extract a binary, to get the code that is behind the binary? With Class-dump you can see the implementation addresses, but is it possible to also see the code thats IN the implementation addresses? Is there ANY way to do it?

You can decompile (more accurately, disassemble) a binary and get it's assembly, but there is no way to get back the original Objective-C. My curiosity begs me to ask why you want to do this!?

Related

What does the -specs argument do in arm-none-eabi-gcc?

Can different file extension executables be disassembled into the same instruction set OpCode?

How do I add a prefix to all symbols in an elf object file but so that debugging still works?

In which language is the proto compiler (of google protocol buffers) written?

Process for reducing the size of an executable

Categories

Resources