Process for reducing the size of an executable - embedded

I'm producing a hex file to run on an ARM processor which I want to keep below 32K. It's currently a lot larger than that and I wondered if someone might have some advice on what's the best approach to slim it down?
Here's what I've done so far
So I've run 'size' on it to determine how big the hex file is.
Then 'size' again to see how big each of the object files are that link to create the hex files. It seems the majority of the size comes from external libraries.
Then I used 'readelf' to see which functions take up the most memory.
I searched through the code to see if I could eliminate calls to those functions.
Here's where I get stuck, there's some functions which I don't call directly (e.g. _vfprintf) and I can't find what calls it so I can remove the call (as I think I don't need it).
So what are the next steps?
Response to answers:
As I can see there are functions being called which take up a lot of memory. I cannot however find what is calling it.
I want to omit those functions (if possible) but I can't find what's calling them! Could be called from any number of library functions I guess.
The linker is working as desired, I think, it only includes the relevant library files. How do you know if only the relevant functions are being included? Can you set a flag or something for that?
I'm using GCC

General list:
Make sure that you have the compiler and linker debug options disabled
Compile and link with all size options turned on (-Os in gcc)
Run strip on the executable
Generate a map file and check your function sizes. You can either get your linker to generate your map file (-M when using ld), or you can use objdump on the final executable (note that this will only work on an unstripped executable!) This won't actually fix the problem, but it will let you know of the worst offenders.
Use nm to investigate the symbols that are called from each of your object files. This should help in finding who's calling functions that you don't want called.
In the original question was a sub-question about including only relevant functions. gcc will include all functions within every object file that is used. To put that another way, if you have an object file that contains 10 functions, all 10 functions are included in your executable even if one 1 is actually called.
The standard libraries (eg. libc) will split functions into many separate object files, which are then archived. The executable is then linked against the archive.
By splitting into many object files the linker is able to include only the functions that are actually called. (this assumes that you're statically linking)
There is no reason why you can't do the same trick. Of course, you could argue that if the functions aren't called the you can probably remove them yourself.
If you're statically linking against other libraries you can run the tools listed above over them too to make sure that they're following similar rules.

Another optimization that might save you work is -ffunction-sections, -Wl,--gc-sections, assuming you're using GCC. A good toolchain will not need to be told that, though.
Explanation: GNU ld links sections, and GCC emits one section per translation unit unless you tell it otherwise. But in C++, the nodes in the dependecy graph are objects and functions.

On deeply embedded projects I always try to avoid using any standard library functions. Even simple functions like "strtol()" blow up the binary size. If possible just simply avoid those calls.
In most deeply embedded projects you don't need a versatile "printf()" or dynamic memory allocation (many controllers have 32kb or less RAM).
Instead of just using "printf()" I use a very simple custom "printf()", this function can only print numbers in hexadecimal or decimal format not more. Most data structures are preallocated at compile time.

Andrew EdgeCombe has a great list, but if you really want to scrape every last byte, sstrip is a good tool that is missing from the list and and can shave off a few more kB.
For example, when run on strip itself, it can shave off ~2kB.
From an old README (see the comments at the top of this indirect source file):
sstrip is a small utility that removes the contents at the end of an
ELF file that are not part of the program's memory image.
Most ELF executables are built with both a program header table and a
section header table. However, only the former is required in order
for the OS to load, link and execute a program. sstrip attempts to
extract the ELF header, the program header table, and its contents,
leaving everything else in the bit bucket. It can only remove parts of
the file that occur at the end, after the parts to be saved. However,
this almost always includes the section header table, and occasionally
a few random sections that are not used when running a program.
Note that due to some of the information that it removes, a sstrip'd executable is rumoured to have issues with some tools. This is discussed more in the comments of the source.
Also... for an entertaining/crazy read on how to make the smallest possible executable, this article is worth a read.

Just to double-check and document for future reference, but do you use Thumb instructions? They're 16 bit versions of the normal instructions. Sometimes you might need 2 16 bit instructions, so it won't save 50% in code space.
A decent linker should take just the functions needed. However, you might need compiler & linke settings to package functions for individual linking.

Ok so in the end I just reduced the project to it's simplest form, then slowly added files one by one until the function that I wanted to remove appeared in the 'readelf' file. Then when I had the file I commented everything out and slowly add things back in until the function popped up again. So in the end I found out what called it and removed all those calls...Now it works as desired...sweet!
Must be a better way to do it though.

To answer this specific need:
•I want to omit those functions (if possible) but I can't find what's
calling them!! Could be called from any number of library functions I
guess.
If you want to analyze your code base to see who calls what, by whom a given function is being called and things like that, there is a great tool out there called "Understand C" provided by SciTools.
https://scitools.com/
I have used it very often in the past to perform static code analysis. It can really help to determine library dependency tree. It allows to easily browse up and down the calling tree among other things.
They provide a limited time evaluation, then you must purchase a license.

You could look at something like executable compression.

Related

How do I add a prefix to all symbols in an elf object file but so that debugging still works?

I want to add a prefix to every symbol in an elf object file, how do you do that using Linux (eg debian)?
I need the debug information to still work (ie, gdb can still debug effectively albeit using the new names for all the symbols).
The elf object is relocatable.
A solution for a non-relocatable object would also be welcome.
A solution for which code-coverage stats continues to work would also be welcome but is not necessary.
I don't know of any canned way to do this.
I think it could be done by rewriting the ELF symbol table and the DWARF information as well. This is not trivial, though perhaps you could implement it using the various libraries in elfutils.

Where is the implementation part of frameworks' header files (objective-C)?

I am learning Objective-C, but can't understand one thing with the frameworks. Each framework in objective-C contains header files which contain only #interface part. That means that header files only declare difference methods and do not implement them. Is this implementation part hidden in the frameworks or something, because I can't get how it works.
Thank you in advance for your answers!
Is this implementation part hidden in the frameworks or something
Well, sort of. It's compiled (the actual source code is not present neither in the SDK nor in the OS) and only the binary executable code is contained within the dynamic library that resides inside the framework.
It is still possible to use them (i. e. link against them), obviously (see this for an explanation), but you cannot edit the source code. In theory, you could try binpatching them (i. e. disassembling, analyzing and editing the executable file using a hex editor or something), but that's neither recommended (you can screw up your entire system if you do one slight thing wrong), nor easy.

static versus shared libraries in small embedded systems using C without OS (assuming XIP)

Does small embedded system without RTOS/OS uses dynamic/shared libraries. my understanding is that its very tough to use it and will be not productive.
If we are calling an API multiple times which is present in a static library. Does API code will be placed at every call location like macro expansion or code/text will be common for all calls. I think code/text will be common.
If I have made a static library for a .c files which has multiple API's and I am statically linking it with main file and in main file only one API has been called so my question is does whole library is included in final .bin or only particular API code.
from above questions you can assume that I am missing fundamentals itself so can anyone please provide the related links to brush up these.
Regards
[edit]
I have tried following things
addition.c module
`int addition(int a,int b)`
`{`
`int result;`
`result = a + b;`
`return result;`
`}`
`size addition.o`
23 0 0 23 17 addition.o
multiplication.c module
`int multiplication(int a, int b)`
`{`
`int result;`
`result = a * b;`
`return result;`
`}`
`size multiplication.o`
21 0 0 21 15 multiplication.o
created object file of both and put in archieve
ar cr libarith.a addition.o multiplication.o
then statically linked to my main application
example.c module
`#include "header.h"`
`#include <stdio.h>`
`1:int main()`
`2:{`
`3:int result;`
`4:result = addition(1,2);`
`5:printf("addition result is : %d\n",result);`
`6:result = multiplication(3,2);`
`7:printf("multiplication result is : %d\n",result);`
`8:return 0;`
`9:}`
gcc -static example.c -L. -larith -o example
size of example
511141 1928 7052 520121 7efb9 example
commented line number 6 of example.c
and again linked
gcc -static example.c -L. -larith -o example
size of example
511109 1928 7052 520089 7ef99 example
32 bytes of difference between above two
thats mean addition.o is not included in example
merged both modules addition.c and multiplication.c as addmult.c as below
int addition(int a,int b)
{
int result;
result = a + b;
return result;
}
int multiplication(int a, int b)
{
int result;
result = a * b;
return result;
}
created object file and put in archieve
before doing that i have deleted previous archieve
ar cr libarith.a addmult.o
now commented line number 6 of example.c
gcc -static example.c -L. -larith -o example
size example
511093 1928 7052 520073 7ef89 example
uncommented line nmber 6 of example.c
size example
511141 1928 7052 520121 7efb9 example
My question is in both cases if both functions are called final text size is same but if only one function is called then there is difference of 16
but multiplication.o size is 23 so definitly it has been not included but how we will justify 16.
If i am missing some fundamental itself ?
To dynamically load and link a library at runtime requires code to perform the load/link operation. That capability is normally part of an operating system. Moreover in a system without mass-storage of some kind, dynamic linking would not have any benefits since the dynamically linked code would have to exist in memory in any case so may as well have been statically linked.
To answer the second part of your question, a static library is simply a collection of object files in an archive. The linker will only extract and link the object code necessary to resolve symbols referenced in the executable as a whole. Some smart linkers can discard unused functions from within an object file, but you should not rely on that.
So by linking a static library you are not including all the unused code in the library. You can probably tell that by comparing the size of all your library files with the size of the executable binary - you will probably see that your executable is far smaller than the sum of the sizes of the libraries linked. Also your linker will have an option to create a map file which will tell you exactly what code has been included, and if it has a cross-reference output facility, what code references or is referenced by what.
If you are building your own static libraries, or even your own non-library code, it will pay to ensure good granularity at the object file level. For example if an object file contains two functions, one used and one unused, most linkers will have no choice but to include both, whereas if the functions are defined in separate compilation units (source files), then they will be in separate object files (even when collated into a library) and can be separately linked.
If you really have a embedded system without any operating system, then your hardware has essentially a fixed software, which you can change only by physical means (e.g. a soldering iron, or plugging something, etc...). In that case, that software runs on the "bare iron" and is doing somehow what an OS is providing (it is managing the physical resources and interacts directly with the I/O ports by appropriate machine instruction).
In particular, an embedded system without any OS cannot have any kind of dynamic libraries, because by definition these libraries need to be inside some files (on the embedded processor), and to have files you need an operating system.
The exact definition of what exactly is an operating system is debatable and fuzzy; I believe that providing a file system is one of the roles of most current OSes
Since shared libraries (or static libraries) are libraries sitting inside some files, you cannot have them without an OS. Something which provide files is by definition an operating system.
Perhaps you are using a cross-development chain to develop your embedded software. If you want to get something which runs on the bare metal, your chain has to ultimately give a single binary image which you can flash into a ROM, then solder or plug that ROM -or transfer somehow physically- in your embedded hardware (some tools enable you to flash an entire self contained processor).
I believe you might be confused, and you should read more about operating systems, kernels, the linux kernel, file systems, syscalls, RTOS, linkers & loaders, cross-compilers, microcontrollers, shared libraries, dynamic linkers ....
As Clifford suggested in comments, you could have an embedded system with some file system and some dynamic linker; in my view that would make an embryonic operating system, but it is a debatable matter of definition.
Notice that making a dynamic linker might not be an easy task (you'll need to do relocation); you could either make a generic ELF dynamic loader, or you could restrict the form of the dynamically loaded modules, and perhaps use your specific ld script to generate them.
You already have all the fundamentals you need. Without an operating system, mass storage (disc, filesystem, etc) and mulitple/many different programs that can take advantage of the shared library it doesnt make any sense. You dont save anything and it probably costs you a little more if you were to fake it enough to use a shared library in a fixed bare metal environment.
You mentioned having codesourcery, how do you learn these things? You disassemble your binaries and see what the compiler did. Does it link the entire gcc library because you used one divide? Does it link the entire C library because you used one function (does it even work to try to link a C library function, many have system calls to an operating system which you have to resolve). Start by using a simple divide in a very simple function (needs to be generic)
unsigned int fun ( unsigned int a, unsigned int b )
{
return(a/b);
}
DO NOT call that function with fixed constants and do not call it from the same .c file, the best thing would be to simply add that function as is, and do nothing else with it just have it sit there. You may hit problems even trying to compile it, once you do, disassemble and see what the compiler did with it, see if the entire gcc library was added or just the code for that one function.
You cant trust any old web page or resource as it may not be the same tools you are using and may be out dated, the compiler you are using right now is the one that matters, right now, no other. And the answers are all right there in front of you.
No, they dont use dynamic libraries, the functions needed are linked in as needed. The optimizer may choose to inline some code, but in general the code for each function is in one place and each call to it is a call, it is not like a macro, in general. Again the optimizer may choose otherwise for performance reasons (small enough functions that dont consume too much memory and are small enough that the code required to make a function call is excessive compared to the function itself. Also that function needs to be in the same optimization space, for gcc this is the same .c file, for llvm this could be any code in the project.
I have some examples, cortex-m and others, bare metal. http://github.com/dwelch67 you may find some that may help answer your questions, examine for example that the compiler will implement a public function like the one above AND inline it when used. If you declare the function as static, then the optimizer, if it inlines, doesnt need to implement the function in the binary. if you make a call to a function like that in the same .c file, for example
c = fun(10,5);
there is a good chance that the optimizer if used, will replace that code with
c = 2;
and not perform the divide at all.

Extract Objective-c binary

Is it possible to extract a binary, to get the code that is behind the binary? With Class-dump you can see the implementation addresses, but is it possible to also see the code thats IN the implementation addresses? Is there ANY way to do it?
All your code compiles to single instructions, placed in the text section of your executable. The compiler is responsible for translating your higher level language to the processor specific instructions, which are simpler. Reverting this process would be nearly impossible, unless the code is quite simple. Some problems are ambiguity of statements, and the overall readability: local variables, for instance, will be nothing but an offset address.
If you want to read the disassembled code (the instructions of which the higher level code was compiled to) use this command in an executable:
otool -tV file
You can decompile (more accurately, disassemble) a binary and get it's assembly, but there is no way to get back the original Objective-C.
My curiosity begs me to ask why you want to do this!?
otx http://otx.osxninja.com/ is a good tool for symbolicating the otool based disassembly
It will handle both x86_64 and i386 disassembly.
and
Mach-O-Scope https://github.com/smorr/Mach-O-Scope is a a tool built on top of otx to dump it all into a sqlite3 database for browsing and annotating.
It won't give you the original source -- but it will get you pretty close providing you with the messages that are being sent around in methods.

Refactoring dissassembled code

You write a function and, looking at the resulting assembly, you see it can be improved.
You would like to keep the function you wrote, for readability, but you would like to substitute your own assembly for the compiler's. Is there any way to establish a relationship between your high-livel language function and the new assembly?
If you are looking at the assembly, then its fair to assume that you have a good understanding about how code gets compiled down. If you have this knowledge, then its sometimes possible to 'reverse enginer' the changes back up into the original language but its often better not to bother.
The optimisations that you make are likely to be very small in comparison to the time and effort required in first making these changes. I would suggest that you leave this kind of work to the compiler and go have a cup of tea. If the changes are significant, and the performance is critical, (as say in the embedded world) then you might want to mix the normal code with the assemblar in some fashion, however, on most computers and chips the performance is usually sufficient to avoid this headache.
If you really need more performance, then optimise the code not the assembly.
None, I suppose. You've rejected the compiler's work in favor of your own. You might as well throw out the function you wrote in the compiled language, because now all you have is your assembler in that platform.
I would highly advise against engaging in this kind of optimization because unless you're sure, via profiling and analysis, that you truly are making a difference.
It depends on the language you wrote your function in. Some languages like C are very low-level, translating each function call or statement to specific assembly statements. If you did use C, you can replace your function with inline assembly to improve performance.
Other high-level languages may convert each statement into macro routines or other more complex calls on the assembly side. Certain optimizations (like tail recursion, loop unrolling, etc) can be implemented easily on the source side, but others (like making more efficient use of the register file) may be impossible (again, depending on the language and the compiler you're using).
Its tough to say there is any relationship between modified assembly and the source which generated the unmodified version. It will certainly confuse debugging tools: register contents will no longer match the source variables they were supposed to correspond to.
There are a number of places in packet processing code where I've examined the generated assembly and gone back to change the original source code in order to improve the result. Re-arranging source can reduce the number of branches, __attribute__ and compiler arguments can align branch points and functions to reduce I$ misses. In desperate cases a little inline assembly can be used, so that the binary can still be compiled from source.
Something you could try is to separate your original function into its own file, and provide a make rule to build the assembler from there. Then update the assembler file with your improved version, and provide a make rule to build an object file from the assembler file. Then change your link rules to include that object file.
If you only ever change the assembler file, that will keep on being used. If you ever change the original higher-level language file, the assembler file will be rebuilt and the object file built from the new (unimproved) version.
This gives you a relationship between the two; you probably want to add a warning comment at the top of the higher-level language file to warn about the behaviour. Using some form of VCS will give you the ability to recover the improved assembler file if you make a mistake here.
If you're writing a native compiled app in Visual C++, there are two methods:
Use the __asm { } block and write your assembler in there.
Write your functions in MASM assembler, assemble to .obj, and link it as an static library. In your C/C++ code, declare the function with an extern "C" declaration.
Other C/C++ compilers have similar approaches.
In this situation, you generally have two options: optimize the code or rewrite the compiler. I can't see where breaking the link between source and op is ever going to be the correct solution.