I'm an iOS developer and I know Objective-C. I want to create a standalone Mac app whose sole function is to patch another app installed on the same Mac.
Let's say I have an app called X in my Applications folder. This app X has some undesired behaviour, so I tried to modify it. I analysed the app's executable with the Hopper disassembler and found that I have to change the assembly instructions starting at 00000001003e3790. I changed those instructions and produced a new executable. Then I replaced the old executable with the new one, and the undesired behaviour now seems to be gone.
As most people would love to remove this undesired behaviour, I decided to write a patcher and distribute that patcher to them.
So how can my patcher app modify the assembly instructions inside the executable of app X and then replace the original with the modified version?
It would be great if someone could point me in the right direction.
1. In general, you should ask the user for the location of the app bundle in case it can't be found in /Applications/.
2. Check that the target executable inside that bundle has the same hash (CRC, MD5, SHA, whichever you prefer) as the executable you had before patching it.
3. If the hashes match, open the file for writing and seek to the hard-coded offset where the unwanted instructions are stored; you can determine that offset by searching the patched file in a hex editor for a long enough byte string beginning with your patched bytes.
4. Finally, overwrite (a.k.a. patch) the target bytes with yours and close the file.
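The hash check can be sketched in plain C as well. This is only an illustration, assuming CRC-32 (the zlib/PNG variant) is good enough to tell the known-good executable from any other build; the names crc32_update and crc32_file are made up for this example, and a cryptographic hash would be a stronger choice if tampering is a concern.

```c
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib and PNG).
 * Pass 0 as crc for the first chunk and the previous result for later ones. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Computes the CRC-32 of a whole file.
 * Returns 1 on success (result stored in *out), 0 on I/O error. */
int crc32_file(const char *path, uint32_t *out) {
    unsigned char buf[4096];
    size_t n;
    uint32_t crc = 0;
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        crc = crc32_update(crc, buf, n);
    int ok = !ferror(f);
    fclose(f);
    if (ok) *out = crc;
    return ok;
}
```

The patcher would call crc32_file on the target executable and refuse to continue unless the result equals the value recorded from the unpatched original.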
[UPD.] Example code for [3].
This does not require any ObjC-related mechanisms, and can be built and run using only the plain libc:
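#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if all bytes were written, 0 otherwise. */
int PatchSomething(const char *name, const void *data, off_t offs, size_t size) {
    int ok = 0;
    int file = open(name, O_WRONLY);
    if (file != -1) {
        /* Move to the patch offset, then overwrite exactly size bytes. */
        if (lseek(file, offs, SEEK_SET) != (off_t)-1)
            ok = write(file, data, size) == (ssize_t)size;
        close(file);
    }
    return ok;
}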
where:
name is the name of the file to patch
data is the data to be written
offs is the file offset where the data shall be put
size is the data size; exactly size bytes of the old file contents get overwritten
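As a cheaper safety net than hashing the whole file, the patcher can also confirm that the bytes at the target offset are the ones it expects before touching them. The following is a sketch under that assumption; PatchIfMatches and its 64-byte limit are invented for this example.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrites size bytes at offs only if they currently equal expect.
 * Returns 1 if patched, 0 if the bytes differ, -1 on I/O error.
 * This sketch handles patches of at most 64 bytes. */
int PatchIfMatches(const char *name, off_t offs,
                   const void *expect, const void *data, size_t size) {
    unsigned char cur[64];
    int result = -1;
    if (size > sizeof cur) return -1;
    int fd = open(name, O_RDWR);
    if (fd == -1) return -1;
    if (lseek(fd, offs, SEEK_SET) != (off_t)-1 &&
        read(fd, cur, size) == (ssize_t)size) {
        if (memcmp(cur, expect, size) != 0)
            result = 0;                          /* unexpected contents */
        else if (lseek(fd, offs, SEEK_SET) != (off_t)-1 &&
                 write(fd, data, size) == (ssize_t)size)
            result = 1;                          /* patched */
    }
    close(fd);
    return result;
}
```

A return value of 0 usually means a different version of the app (or an already patched copy), in which case the file is left untouched.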
What is supposed to happen to an executable that deletes itself as part of its execution? Are the rules different for different OSes? Does it depend on the executable format (eg PE, Mach-O, etc) or on something else?
Specifically, I want to know about the expected behavior for a self-deleting executable in OS X, Linux, and Windows. If they are different, I want to know why.
motivation:
I work on a project that has a "nuclear" build clean up command:
jlpm clean:slate
The above command completely cleans up and uninstalls everything related to the project, including the jlpm executable itself. On OS X/Linux the clean:slate command works fine, but I've been told it doesn't work on Windows. I'm curious as to why, and how I should go about fixing it
Are the rules different for different OSes?
Yes.
Does it depend on the executable format (eg PE, Mach-O, etc)
No, executable format is irrelevant.
Traditional UNIX implementations keep a reference count on the file inode. When a regular file is on disk and no program has opened it, it has a reference count of 1 (assuming there are no hard links to it). The 1 comes from the directory in which the file appears.
If you then rm the file, the inode reference count drops to 0, which signals to the OS that it is no longer needed, and all data associated with it can be discarded.
When some program opens the file (or the file is executing), the inode reference count is incremented (now 2). If you now remove the file from the directory, the inode reference count drops to 1, but the file is still there, so there is no problem.
(This is how you could hog disk space on a machine in a way that is "invisible" to the system administrator.)
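The UNIX behaviour described above can be observed with a few POSIX calls. This is a small sketch, not taken from any of the projects mentioned; the function name links_after_unlink is made up. Note that fstat only reports the directory link count (st_nlink); the reference held by the open descriptor is kept inside the kernel and is not visible this way.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates path, unlinks it while it is still open, and then keeps using
 * the descriptor. Returns the link count reported by fstat after the
 * unlink (0 on typical local UNIX file systems), or -1 on error. */
long links_after_unlink(const char *path) {
    struct stat st;
    char back[5] = {0};
    long links = -1;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd == -1) return -1;
    if (unlink(path) == 0 &&              /* name is gone from the directory */
        write(fd, "data", 4) == 4 &&      /* but the inode is still usable */
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, back, 4) == 4 &&
        fstat(fd, &st) == 0)
        links = (long)st.st_nlink;
    close(fd);                            /* last reference: storage is freed */
    return links;
}
```

Some network file systems emulate this differently, so the exact link count may vary there.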
Windows does not have such reference counting, and attempts to remove a file that is open (or executing) normally fail. This causes no end of problems for UNIX programmers.
how I should go about fixing it
See answers to this question.
What are the differences between byte code binary executables such as Java class files, Parrot bytecode files or CLR files and machine code executables such as ELF, Mach-O and PE?
What are the distinctive differences between the two?
For example, which part of the class file corresponds to the .text area in the ELF structure?
Or: they all have headers, but the ELF and PE headers contain the architecture while the class file does not.
Java Class File
Elf file
PE File
Byte code is, as imulsion noted, an intermediate step, right before compilation into machine code. Because the last step is left to load time (and often runtime, as is the case with Just-In-Time (JIT) compilation), byte code is architecture independent: the runtime (CLR for .NET or JVM for Java) is responsible for mapping the byte code opcodes to their underlying machine code representation.
By comparison, native code (Windows: PE, PE32+; OS X/iOS: Mach-O; Linux/Android/etc.: ELF) is compiled code, suited for a particular architecture (Android/iOS: ARM; most else: Intel 32-bit (i386) or 64-bit). These formats are all very similar, but still require sections (or, in Mach-O parlance, "Load Commands") to set up the memory structure of the executable as it becomes a process (old DOS supported the ".com" format, which was a raw memory image). For all of the above you can say, roughly, the following:
Sections with a "." are created by the compiler, and are "default" or expected to have default behavior
The executable has the main code section, usually called "text" or ".text". This is native code, which can run on the specific architecture
Strings are stored in a separate section. These are used for hard-coded output (what you print out) as well as symbol names.
Symbols, which are what the linker uses to put together the executable with its libraries (Windows: DLLs, Linux/Android: shared objects, OS X/iOS: .dylibs or frameworks), are stored in a separate section. Usually there is also a "PLT" (Procedure Linkage Table), which lets the compiler simply put in stubs for the functions you call (printf, open, etc.) that the linker can connect when the executable loads.
The import table (in Windows parlance; in ELF this is the DYNAMIC section, in OS X an LC_LOAD_DYLIB load command) is used to declare additional libraries. If those aren't found when the executable is loaded, the load fails, and you can't run it.
The export table (for libraries/dylibs/etc.) holds the symbols which the library (or in Windows, even an .exe) can export so that others can link with it.
Constants are usually in what you see as the ".rodata".
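One concrete thing all of these formats share is a magic number at the start of the file. As a rough illustration (guess_format is a made-up helper and the check is deliberately shallow), the formats discussed here can be told apart from their first bytes. One caveat: 0xCAFEBABE is used both by Java class files and by Mach-O universal ("fat") binaries, so that case is reported as ambiguous.

```c
#include <stddef.h>
#include <string.h>

/* Guesses a container format from the first bytes of a file image. */
const char *guess_format(const unsigned char *p, size_t len) {
    /* "\x7f" "ELF" is split so that 'E' is not parsed as a hex digit. */
    if (len >= 4 && memcmp(p, "\x7f" "ELF", 4) == 0)
        return "ELF";
    /* PE files begin with a DOS stub header, hence "MZ". */
    if (len >= 2 && memcmp(p, "MZ", 2) == 0)
        return "MZ/PE";
    /* Little-endian Mach-O: 0xFEEDFACF (64-bit) or 0xFEEDFACE (32-bit). */
    if (len >= 4 && (memcmp(p, "\xcf\xfa\xed\xfe", 4) == 0 ||
                     memcmp(p, "\xce\xfa\xed\xfe", 4) == 0))
        return "Mach-O";
    if (len >= 4 && memcmp(p, "\xca\xfe\xba\xbe", 4) == 0)
        return "Java class or Mach-O universal";
    return "unknown";
}
```

A real tool would go on to read the header fields that follow the magic number, such as the machine type in ELF and PE.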
Hope this helps. Really, your question was vague..
TG
Byte code is a 'halfway' step. The Java compiler (javac) turns the source code into byte code. Machine code is the next step: the runtime on the computer takes the byte code, turns it into machine code (which the processor can execute) and then runs your program. Computers cannot execute source code directly, and javac does not translate straight into machine code, so the byte code is the halfway step that makes this model work.
Note that ELF binaries don't necessarily need to be machine/arch specific per se.
The interesting piece is the "interpreter" header field: it holds the path name of a loader program that is executed instead of the actual binary. This loader is then responsible for loading the actual program, loading and linking libraries, etc. This is how e.g. ld.so comes in.
Theoretically one could create an ELF binary that holds java bytecode (or a complete jar). This just needs some appropriate "interpreter" program which starts up a JVM and loads the code from the binary into it.
I'm not sure whether this has actually been done before, but it is certainly possible.
The same can be done with almost any non-native code.
It also could serve for direct multiarch support via some VM like qemu:
Let the target platform (libc+linker scripts) put the arch name into the interpreter program name (eg. /lib/ld.so.x86_64, /lib/ld.so.armhf, ...).
Then, on a particular arch (eg. x86_64), the one with native arch name will point to the original ld.so, while the others point to some special one that calls up something like qemu-system-XXX.
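To make the "interpreter" field less abstract, here is a sketch that pulls the interpreter path out of an ELF image held in memory. It assumes a 64-bit little-endian file and reads the fields by their offsets in the ELF64 layout rather than including a system header; the function names rd and elf64_interp are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a little-endian unsigned field of the given width in bytes. */
static uint64_t rd(const unsigned char *p, int bytes) {
    uint64_t v = 0;
    for (int i = bytes - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Returns a pointer to the interpreter path inside a 64-bit little-endian
 * ELF image, or NULL if there is no PT_INTERP entry or the image is
 * malformed. */
const char *elf64_interp(const unsigned char *img, size_t len) {
    if (len < 64 || memcmp(img, "\x7f" "ELF", 4) != 0) return NULL;
    if (img[4] != 2 || img[5] != 1) return NULL;  /* ELFCLASS64, ELFDATA2LSB */
    uint64_t phoff = rd(img + 0x20, 8);           /* e_phoff */
    uint64_t phentsize = rd(img + 0x36, 2);       /* e_phentsize */
    uint64_t phnum = rd(img + 0x38, 2);           /* e_phnum */
    if (phoff > len || phentsize < 56) return NULL;
    for (uint64_t i = 0; i < phnum; i++) {
        uint64_t at = phoff + i * phentsize;
        if (at > len || len - at < 56) return NULL;
        const unsigned char *ph = img + at;
        if (rd(ph, 4) != 3) continue;             /* p_type == PT_INTERP */
        uint64_t off = rd(ph + 8, 8);             /* p_offset */
        uint64_t size = rd(ph + 32, 8);           /* p_filesz */
        if (size == 0 || off > len || len - off < size) return NULL;
        if (img[off + size - 1] != '\0') return NULL;  /* needs a NUL */
        return (const char *)(img + off);
    }
    return NULL;
}
```

On a typical dynamically linked Linux binary this returns the path of the dynamic loader; a statically linked binary has no PT_INTERP entry, so the function returns NULL.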
I'm working on porting a PC OpenGL application to Android. I've chosen to use the NDK's android_native_app_glue framework. As I understand it, this would allow me to stay in C++ and not write a single line of Java code, which sounded promising.
The first thing that is unclear to me is data saving/loading. My application's data format is binary, and my original PC code uses plain "stdio.h" FILE operations (fopen, fread, fwrite, etc.) to create, write and read the "mygame.bin" file. How do I port this to Android?
Having googled it, I found that I must store and then use a "java environment" variable:
JNIEnv *g_jniEnv = 0;
void android_main(struct android_app* state) {
g_jniEnv = state->activity->env;
}
However, I still do not understand how to use this g_jniEnv to perform file operations.
Update:
Okay, I found that in Java, data saving can be performed as follows:
String string = "hello world!";
FileOutputStream fos = openFileOutput("mygame.bin", Context.MODE_PRIVATE);
fos.write(string.getBytes());
fos.close();
So my question is: given a pointer to JNIEnv, how do I rewrite this code fragment in C++?
You can still use stdio.h in an NDK project. #include <stdio.h> and pound away.
The only trick is getting the path to your application's private data folder. It's available as state->activity->internalDataPath in android_main. Outside of that path, the app has no filesystem rights.
You will probably need your JNIEnv eventually one way or another. The entirety of Java API is not mirrored for the native subsystem; so hold on to that env. But native file I/O is right there.
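A minimal sketch of what that looks like, assuming the directory string comes from state->activity->internalDataPath on Android. The helpers save_blob and load_blob are made up for this example, and they take the directory as a plain string so they can be tried on a desktop first.

```c
#include <stdio.h>

/* Writes size bytes to <dir>/<name>. Returns 1 on success, 0 on failure. */
int save_blob(const char *dir, const char *name, const void *data, size_t size) {
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path) return 0;   /* path too long */
    FILE *f = fopen(path, "wb");
    if (!f) return 0;
    size_t written = fwrite(data, 1, size, f);
    int closed = (fclose(f) == 0);
    return closed && written == size;
}

/* Reads up to cap bytes from <dir>/<name>.
 * Returns the number of bytes read, or -1 on failure. */
long load_blob(const char *dir, const char *name, void *out, size_t cap) {
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path) return -1;
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t got = fread(out, 1, cap, f);
    int failed = ferror(f);
    fclose(f);
    return failed ? -1 : (long)got;
}
```

Inside android_main the call would look something like save_blob(state->activity->internalDataPath, "mygame.bin", buffer, length), with buffer and length being whatever the game already writes on the PC.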
Does a small embedded system without an RTOS/OS use dynamic/shared libraries? My understanding is that they are very tough to use there and would not be productive.
If we call an API that lives in a static library multiple times, is the API code placed at every call location, like a macro expansion, or is the code/text common to all calls? I think the code/text will be common.
If I have made a static library from .c files that contain multiple APIs, and I statically link it with the main file, and the main file calls only one API, is the whole library included in the final .bin or only that particular API's code?
From the above questions you can assume that I am missing the fundamentals, so can anyone please provide related links to brush up on these?
Regards
[edit]
I have tried following things
addition.c module
`int addition(int a,int b)`
`{`
`int result;`
`result = a + b;`
`return result;`
`}`
`size addition.o`
23 0 0 23 17 addition.o
multiplication.c module
`int multiplication(int a, int b)`
`{`
`int result;`
`result = a * b;`
`return result;`
`}`
`size multiplication.o`
21 0 0 21 15 multiplication.o
created object files of both and put them in an archive
ar cr libarith.a addition.o multiplication.o
then statically linked to my main application
example.c module
`#include "header.h"`
`#include <stdio.h>`
`1:int main()`
`2:{`
`3:int result;`
`4:result = addition(1,2);`
`5:printf("addition result is : %d\n",result);`
`6:result = multiplication(3,2);`
`7:printf("multiplication result is : %d\n",result);`
`8:return 0;`
`9:}`
gcc -static example.c -L. -larith -o example
size of example
511141 1928 7052 520121 7efb9 example
commented line number 6 of example.c
and again linked
gcc -static example.c -L. -larith -o example
size of example
511109 1928 7052 520089 7ef99 example
32 bytes of difference between the above two
That means multiplication.o is not included in example
merged both modules addition.c and multiplication.c as addmult.c as below
int addition(int a,int b)
{
int result;
result = a + b;
return result;
}
int multiplication(int a, int b)
{
int result;
result = a * b;
return result;
}
created the object file and put it in the archive
before doing that I deleted the previous archive
ar cr libarith.a addmult.o
now commented line number 6 of example.c
gcc -static example.c -L. -larith -o example
size example
511093 1928 7052 520073 7ef89 example
uncommented line number 6 of example.c
size example
511141 1928 7052 520121 7efb9 example
My question is: in both cases, if both functions are called, the final text size is the same, but if only one function is called then there is a difference of 16,
but the size of multiplication.o is 23, so definitely it has not been included. How do we justify the 16?
Am I missing some fundamental here?
To dynamically load and link a library at runtime requires code to perform the load/link operation. That capability is normally part of an operating system. Moreover in a system without mass-storage of some kind, dynamic linking would not have any benefits since the dynamically linked code would have to exist in memory in any case so may as well have been statically linked.
To answer the second part of your question, a static library is simply a collection of object files in an archive. The linker will only extract and link the object code necessary to resolve symbols referenced in the executable as a whole. Some smart linkers can discard unused functions from within an object file, but you should not rely on that.
So by linking a static library you are not including all the unused code in the library. You can probably tell that by comparing the size of all your library files with the size of the executable binary - you will probably see that your executable is far smaller than the sum of the sizes of the libraries linked. Also your linker will have an option to create a map file which will tell you exactly what code has been included, and if it has a cross-reference output facility, what code references or is referenced by what.
If you are building your own static libraries, or even your own non-library code, it will pay to ensure good granularity at the object file level. For example if an object file contains two functions, one used and one unused, most linkers will have no choice but to include both, whereas if the functions are defined in separate compilation units (source files), then they will be in separate object files (even when collated into a library) and can be separately linked.
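If splitting every function into its own source file is impractical, GCC with GNU ld offers a way to get similar granularity from a single file. The sketch below reuses the addition/multiplication example from the question; the build commands in the comment assume a GCC/binutils toolchain, so check your own toolchain's manual for equivalents.

```c
/* addmult.c: both functions live in one object file.
 *
 * Assumed GCC / GNU ld build:
 *
 *   gcc -c -ffunction-sections -fdata-sections addmult.c
 *   ar cr libarith.a addmult.o
 *   gcc -static example.c -L. -larith \
 *       -Wl,--gc-sections -Wl,-Map=example.map -o example
 *
 * -ffunction-sections places each function in its own section
 * (.text.addition, .text.multiplication), --gc-sections lets the linker
 * drop sections that nothing references, and the map file lists what was
 * actually kept.
 */
int addition(int a, int b) {
    return a + b;
}

int multiplication(int a, int b) {
    return a * b;
}
```

Comparing the map file with and without --gc-sections is a more reliable check than comparing the totals printed by size, since alignment and padding also affect those totals.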
If you really have an embedded system without any operating system, then your hardware essentially has fixed software, which you can change only by physical means (e.g. a soldering iron, or plugging something in, etc.). In that case, that software runs on the "bare iron" and somehow does what an OS would provide (it manages the physical resources and interacts directly with the I/O ports through appropriate machine instructions).
In particular, an embedded system without any OS cannot have any kind of dynamic libraries, because by definition these libraries need to be inside some files (on the embedded processor), and to have files you need an operating system.
The exact definition of an operating system is debatable and fuzzy; I believe that providing a file system is one of the roles of most current OSes.
Since shared libraries (or static libraries) are libraries sitting inside some files, you cannot have them without an OS. Something which provides files is by definition an operating system.
Perhaps you are using a cross-development chain to develop your embedded software. If you want to get something which runs on the bare metal, your chain has to ultimately give a single binary image which you can flash into a ROM, then solder or plug that ROM -or transfer somehow physically- in your embedded hardware (some tools enable you to flash an entire self contained processor).
I believe you might be confused, and you should read more about operating systems, kernels, the linux kernel, file systems, syscalls, RTOS, linkers & loaders, cross-compilers, microcontrollers, shared libraries, dynamic linkers ....
As Clifford suggested in comments, you could have an embedded system with some file system and some dynamic linker; in my view that would make an embryonic operating system, but it is a debatable matter of definition.
Notice that making a dynamic linker might not be an easy task (you'll need to do relocation); you could either make a generic ELF dynamic loader, or you could restrict the form of the dynamically loaded modules, and perhaps use your specific ld script to generate them.
You already have all the fundamentals you need. Without an operating system, mass storage (disc, filesystem, etc.) and multiple different programs that can take advantage of the shared library, it doesn't make any sense. You don't save anything, and it probably costs you a little more if you were to fake it enough to use a shared library in a fixed bare metal environment.
You mentioned having codesourcery; how do you learn these things? You disassemble your binaries and see what the compiler did. Does it link the entire gcc library because you used one divide? Does it link the entire C library because you used one function? (Does it even work to try to link a C library function? Many make system calls to an operating system, which you have to resolve.) Start by using a simple divide in a very simple function (it needs to be generic):
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a/b);
}
DO NOT call that function with fixed constants and do not call it from the same .c file; the best thing would be to simply add that function as is and do nothing else with it, just have it sit there. You may hit problems even trying to compile it; once it builds, disassemble and see what the compiler did with it, and see whether the entire gcc library was added or just the code for that one function.
You can't trust any old web page or resource, as it may not be about the same tools you are using and may be outdated; the compiler you are using right now is the one that matters, right now, no other. And the answers are all right there in front of you.
No, they don't use dynamic libraries; the functions needed are linked in as needed. The optimizer may choose to inline some code, but in general the code for each function is in one place and each call to it is a call; it is not like a macro, in general. Again, the optimizer may choose otherwise for performance reasons (functions small enough that inlining doesn't consume too much memory, and small enough that the code required to make a function call is excessive compared to the function itself). Also, that function needs to be in the same optimization space: for gcc this is the same .c file, for llvm this could be any code in the project.
I have some examples, cortex-m and others, bare metal: http://github.com/dwelch67. You may find some that help answer your questions. Examine, for example, that the compiler will implement a public function like the one above AND inline it when used. If you declare the function as static, then the optimizer, if it inlines, doesn't need to implement the function in the binary. If you make a call to a function like that in the same .c file, for example
c = fun(10,5);
there is a good chance that the optimizer if used, will replace that code with
c = 2;
and not perform the divide at all.
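Putting those pieces into one file gives something you can compile and disassemble yourself. This is only a sketch; the wrapper names folded and not_folded are invented here, and what the optimizer really does depends on your compiler and flags.

```c
/* static gives fun internal linkage: if every call is inlined, the
 * compiler may leave a standalone copy of it out of the object file. */
static unsigned int fun(unsigned int a, unsigned int b) {
    return a / b;
}

/* With optimization enabled, a compiler will typically fold this call
 * to a constant and emit no divide at all. */
unsigned int folded(void) {
    return fun(10, 5);
}

/* Here the operands are only known at run time, so a real divide (or a
 * call into the compiler's support library on cores without a divide
 * instruction) has to remain. */
unsigned int not_folded(unsigned int a, unsigned int b) {
    return fun(a, b);
}
```

Compile it with and without optimization and compare the disassembly of the two object files to see the difference.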
I have a Serial number string "1080910" embedded in a programmable device which has been downloaded to a binary file using the ALL-100 programmer. This is my Master file as it were. I need to change this serial number to that of the unit that I need to re-flash using the Master file - the ALL-100 programmer uses XACCESS User Interface which has Edit feature showing Address location, Hex data field and Ascii field. Somewhere in this file is the serial number string - can anybody assist me in how to locate and edit the serial number string as I have been unable to locate it using the search function and have not been able to visually pick up the sequence of numbers. Help !!!
If the data has a symbolic address in the source code, and is not a local variable, its address will appear in the map file generated by the linker. If it is a local variable initialised with a literal constant, then the data will exist in the static initialisation data the location of which should also be identified in the map file.
Another possibility is that your application image is compressed and the start-up code expands it into RAM at run-time. This will be obvious in the map file if the data and code addresses are in RAM rather than ROM. If this is the case then what you are attempting will be very difficult. You would have to know the compression algorithm used, and which part of the image is the compressed part (part of it will be the decompression code that runs from ROM). You would then have to decompress the image, modify the string, and then recompress it. Further, if the decompression performs any kind of checksum on the compressed or decompressed data, you will have to recalculate and modify that too.
If this was a requirement from the outset, you would have done better to reserve the space in the linker script or use compiler specific extensions to absolutely locate the data at a specific location.
Maybe it is stored as Unicode (UTF-16), so every other byte is 00.
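If the programmer's built-in search cannot handle that, the master file can be scanned on a PC instead. The sketch below assumes the serial number is either plain ASCII or ASCII characters widened to UTF-16LE (each character followed by a 00 byte); find_bytes and find_utf16le are made-up helper names.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of pat in buf, or -1. */
long find_bytes(const unsigned char *buf, size_t len,
                const unsigned char *pat, size_t plen) {
    if (plen == 0 || plen > len) return -1;
    for (size_t i = 0; i + plen <= len; i++)
        if (memcmp(buf + i, pat, plen) == 0) return (long)i;
    return -1;
}

/* Looks for an ASCII string stored as UTF-16LE.
 * This sketch supports strings of up to 64 characters. */
long find_utf16le(const unsigned char *buf, size_t len, const char *ascii) {
    unsigned char wide[128];
    size_t n = strlen(ascii);
    if (n == 0 || n > sizeof wide / 2) return -1;
    for (size_t i = 0; i < n; i++) {
        wide[2 * i] = (unsigned char)ascii[i];   /* low byte: the character */
        wide[2 * i + 1] = 0;                     /* high byte: always 00 */
    }
    return find_bytes(buf, len, wide, 2 * n);
}
```

If neither search finds "1080910", the number may be stored in another form (binary, BCD, or inside a compressed image), in which case the map file is the place to look.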