How do I best determine if a binary contains STAB or DWARF debug information? - elf

When it comes to ELF, two accompanying debugging formats are overwhelmingly more popular than the others, namely STAB and DWARF. I'd like an easy way to ascertain whether a given binary contains debug information of one form or the other, preferably without having to inspect section names (.stab, etc.).
What are good ways of accomplishing this?

"When it comes to ELF, two accompanying debugging formats are overwhelmingly popular"
The STABS format has not been used by default by any current compilers on ELF platforms for the last 10 years. It's definitely not "overwhelmingly popular".
The way to tell:
readelf -WS ./a.out | egrep '\.(stab |debug)'
You will see a .stab section if the binary has STABS. You will see .debug_info, .debug_line, etc. if you have DWARF. You'll see nothing if you don't have debug info at all.
"preferably without having to inspect section names"
Can't be done without looking at sections: it's the presence or absence of these sections that makes the binary contain or not contain debug info.
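If readelf is not available, the same section check can be sketched in a few lines of Python. This is only a minimal sketch: it assumes a 64-bit little-endian ELF file, and the function names elf64_section_names and debug_formats are made up for this illustration. It reads the section header table, looks up each name in the section name string table, and classifies the result in the same way as the egrep pattern above.

```python
import struct


def elf64_section_names(data):
    """Return the section names of an ELF64 little-endian image (sketch only)."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    # e_shoff is at offset 40; e_shentsize, e_shnum, e_shstrndx start at offset 58.
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 58)

    def header(index):
        # Return sh_name (offset 0), sh_offset (24) and sh_size (32) of one header.
        base = e_shoff + index * e_shentsize
        (sh_name,) = struct.unpack_from("<I", data, base)
        sh_offset, sh_size = struct.unpack_from("<QQ", data, base + 24)
        return sh_name, sh_offset, sh_size

    # The section name string table holds NUL-terminated names.
    _, str_offset, str_size = header(e_shstrndx)
    strtab = data[str_offset:str_offset + str_size]

    names = []
    for index in range(e_shnum):
        start = header(index)[0]
        end = strtab.index(b"\0", start)
        names.append(strtab[start:end].decode("ascii"))
    return names


def debug_formats(names):
    """Classify section names as STABS and/or DWARF debug information."""
    found = set()
    for name in names:
        if name == ".stab":
            found.add("STABS")
        elif name.startswith((".debug_", ".zdebug_")):
            found.add("DWARF")
    return found
```

To use it, read the binary with open("./a.out", "rb").read() and pass the bytes to elf64_section_names, then pass the names to debug_formats. An empty set means neither kind of section was found.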

Related

Reading DIEs in ELF file

Hello, I am fairly new to the DWARF standard and the ELF format, and I have a few questions. I am using the DWARF 2 standard and have a pretty basic understanding of how DIEs work, but I need more clarity on how they are represented in bytes.
The ELF Wiki provides a good table showing the order in which the bytes go in the program header, sections, and segments. But what is the correct way to represent DIEs in bytes for the DWARF 2 standard?
I have tried to dive deep into the DWARF standard PDF documents to understand how DIEs are represented in bytes. Perhaps there is a section I am missing?
I would like to use this information to be able to delete certain DIEs to save space in the debugging section. I am only interested in the DIEs that provide variable addresses.
I recommend that anyone starting out in DWARF begin with the Introduction to the DWARF Debugging Format. It's a very concise overview that provides an excellent foundation for exploring the format in further depth. Armed with this background, compile a debug version of a very simple program and compare a hex dump of the two ELF sections .debug_abbrev and .debug_info with the output of dwarfdump or readelf.
Once you are broadly familiar with the encoding of a DIE you will see that simply deleting its corresponding bytes from .debug_info would corrupt the entire file — in terms of both DWARF and ELF. For example, each DIE is identified by its relative file offset; deleting one DIE's bytes would alter the offsets of all subsequent DIEs and any references to them would therefore be broken. A robust solution would require parsing the DWARF to create an internal representation of the tree before eliminating unwanted nodes and writing out new DWARF. After modifying .debug_info you'd then need to edit the fabric of the ELF itself: at the very least, this would involve updating the section header table to reflect the new offsets for any shifted sections and updating any relocations.
If your principal concern is indeed space saving then I suggest you instead investigate what compiler options you have. The Oracle Studio Compilers, for example, allow fine control over the content included in the DWARF. Depending on your compiler and OS it may also be possible to emit files with compressed DWARF sections (e.g. .zdebug_info) or even leave the DWARF in different files altogether. The problem of DWARF bloat is well known and, if you are interested in tackling it at a low level yourself, you will find other suggestions in Michael Eager's introduction and in later versions of the standard.
The format is explained on page 66, in sections 7.5.2 and 7.5.3.
The example in appendix 2, page 93, is much clearer.
Each DIE references a corresponding entry in .debug_abbrev which defines a given DIE "signature", i.e.
its type (DW_TAG_*);
whether it has child DIEs;
its attributes (DW_AT_*) and their forms (DW_FORM_*).
The format of the DIE is:
a reference to an abbreviation (LEB128, i.e. variable length);
0 is used for ending a list of children;
one value per attribute (using the encoding associated with the given form).
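Since every DIE starts with an abbreviation code stored as an unsigned LEB128 number, decoding that encoding is the first step when reading .debug_info by hand. The following is a minimal sketch of it; the function name decode_uleb128 is chosen for this illustration only. Each byte contributes its low 7 bits, least significant group first, and a set high bit means another byte follows.

```python
def decode_uleb128(data, offset=0):
    """Decode one unsigned LEB128 value; return (value, offset after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:            # high bit clear: last byte
            return result, offset
        shift += 7


# The three bytes E5 8E 26 encode the value 624485.
value, next_offset = decode_uleb128(bytes([0xE5, 0x8E, 0x26]))
```

A decoded abbreviation code of 0 is the null entry that ends a list of children. Any other value is looked up in .debug_abbrev to find out which attribute values follow.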

Extract Objective-c binary

Is it possible to extract a binary, to get the code that is behind the binary? With class-dump you can see the implementation addresses, but is it possible to also see the code that's IN the implementation addresses? Is there ANY way to do it?
All your code compiles to single instructions, placed in the text section of your executable. The compiler is responsible for translating your higher level language to the processor specific instructions, which are simpler. Reverting this process would be nearly impossible, unless the code is quite simple. Some problems are ambiguity of statements, and the overall readability: local variables, for instance, will be nothing but an offset address.
If you want to read the disassembled code (the instructions to which the higher-level code was compiled), use this command on an executable:
otool -tV file
You can decompile (more accurately, disassemble) a binary and get its assembly, but there is no way to get back the original Objective-C.
My curiosity begs me to ask why you want to do this!?
otx http://otx.osxninja.com/ is a good tool for symbolicating the otool based disassembly
It will handle both x86_64 and i386 disassembly.
and
Mach-O-Scope https://github.com/smorr/Mach-O-Scope is a tool built on top of otx to dump it all into a sqlite3 database for browsing and annotating.
It won't give you the original source -- but it will get you pretty close providing you with the messages that are being sent around in methods.

Autodocumentation type functionality for Fortran?

In the past I've used Doxygen for C and C++, but now I've been thrown onto a Fortran project and I would like to get a quick, all-encompassing look at the architecture.
In the past I've found reverse engineering tools to be useful where no documentation of the architecture exists.
So, is there a tool out there that will reverse engineer Fortran code?
I tried to use Doxygen, but didn't have any luck. I will be working with two different projects: one is in Fortran 90 and the other, I think, is in Fortran 77.
Thanks for any insights and feedback.
Tools which may help with reverse engineering:
SciTools Understand
Link with some more tools (search "fortran")
Also, maybe some of these unit testing frameworks will be helpful (I haven't used them, so I cannot comment on the pros and cons of any of them):
FUnit
FRUIT
Ftnunit
(these links link to fortranwiki, where you can find a tidbit on every one of them, and from there there are links to their home sites).
Doxygen 1.6.1 will generate documentation, call graphs, etc. for Fortran source code in free-format (F90) format. You are out of luck for auto-documenting fixed-format (F77) code with doxygen.
All is not lost, however. The conversion from fixed to free format is straightforward and can be automated to a great degree: change comment characters to '!', change continuation characters to '&', and append '&' to lines to be continued. In fact, if the appended continuation character is placed in column 73, it should be ignored by standard F77 compilers (which still only recognize code in columns 1 through 72) but will be recognized by F9x/F2003/F2008 compilers. This allows the same code to be recognized as both fixed and free format, which lets you gracefully migrate from one format to the other.
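The three changes described above can be sketched as a small Python function. This is a rough illustration, not one of the existing conversion programs: the name to_intersection_format is made up here, and the sketch assumes space-indented fixed-format source without tab-formatted lines or sequence numbers that matter beyond column 72.

```python
def to_intersection_format(lines):
    """Rewrite fixed-format Fortran lines so they also parse as free format."""
    out = []
    last_code = None  # index in out of the most recent statement line
    for line in lines:
        line = line.rstrip("\n")
        if line[:1] in ("C", "c", "*", "!"):
            # A comment marker in column 1 becomes '!'.
            out.append("!" + line[1:])
            continue
        if not line.strip():
            out.append(line)
            continue
        # A continuation line has blank columns 1-5 and a non-blank,
        # non-zero character in column 6.
        is_continuation = (
            len(line) > 5 and line[:5].strip() == "" and line[5] not in " 0"
        )
        if is_continuation and last_code is not None:
            # Put '&' in column 73 of the line being continued ...
            out[last_code] = out[last_code][:72].ljust(72) + "&"
            # ... and use '&' as the continuation character in column 6.
            line = line[:5] + "&" + line[6:]
        last_code = len(out)
        out.append(line)
    return out
```

Called on the lines of a source file, it returns the converted lines in the same order. Checking the result with both a fixed-format and a free-format compiler run is still advisable, since the sketch does not look inside statements.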
Conveniently, there are about a thousand small programs that will do this format adjustment to some degree or another. Realistically, if you're going to be maintaining the code, you might as well move it away from the 1928 spec for Hollerith (IBM) punched cards. :)

Batch source-code aware spell check

What is a tool or technique that can be used to perform spell checks upon a whole source code base and its associated resource files?
The spell check should be source code aware meaning that it would stick to checking string literals in the code and not the code itself. Bonus points if the spell checker understands common resource file formats, for example text files containing name-value pairs (only check the values). Super-bonus points if you can tell it which parts of an XML DTD or Schema should be checked and which should be ignored.
Many IDEs can do this for the file you are currently working with. The difference in what I am looking for is something that can operate upon a whole source code base at once.
Something like a Findbugs or PMD type tool for mis-spellings would be ideal.
As you mentioned, many IDEs have this functionality already, and one such IDE is Eclipse. However, unlike many other IDEs Eclipse is:
A) open source
B) designed to be programmable
For instance, here's an article on using Eclipse's code formatting functionality from the command line:
http://www.peterfriese.de/formatting-your-code-using-the-eclipse-code-formatter/
In theory, you should be able to do something similar with its spell-checking mechanism. I know this isn't exactly what you're looking for, and if there is a program for doing spell-checking in code then obviously that'd be better, but if not then Eclipse may be the next best thing.
This seems a little old but appears to do a good job:
Source Code Spell Checker
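If no ready-made tool fits, the "only check string literals" part of the question can be approximated with a short script. The sketch below is an illustration under stated assumptions, not a description of any of the tools mentioned: it only recognizes double-quoted literals that stay on one line, the dictionary is a plain set of lowercase words supplied by the caller, and the name misspelled_in_literals is made up.

```python
import re

# A double-quoted literal, allowing backslash escapes inside it.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def misspelled_in_literals(source, dictionary):
    """Return (line_number, word) pairs for unknown words in string literals."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in STRING_LITERAL.finditer(line):
            # Blank out escapes such as \n so they do not glue onto words.
            text = re.sub(r"\\.", " ", match.group(1))
            # Only look at runs of three or more letters.
            for word in re.findall(r"[A-Za-z]{3,}", text):
                if word.lower() not in dictionary:
                    findings.append((lineno, word))
    return findings
```

Running it over a whole code base is then a matter of walking the directory tree and calling the function on each file's contents. Identifiers outside the quotes are never inspected, so a misspelled variable name is not reported.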

Process for reducing the size of an executable

I'm producing a hex file to run on an ARM processor which I want to keep below 32K. It's currently a lot larger than that and I wondered if someone might have some advice on what's the best approach to slim it down?
Here's what I've done so far
So I've run 'size' on it to determine how big the hex file is.
Then 'size' again to see how big each of the object files is that link to create the hex file. It seems the majority of the size comes from external libraries.
Then I used 'readelf' to see which functions take up the most memory.
I searched through the code to see if I could eliminate calls to those functions.
Here's where I get stuck: there are some functions which I don't call directly (e.g. _vfprintf) and I can't find what calls them so I can remove the call (as I think I don't need it).
So what are the next steps?
Response to answers:
As far as I can see, there are functions being called which take up a lot of memory. I cannot, however, find what is calling them.
I want to omit those functions (if possible) but I can't find what's calling them! Could be called from any number of library functions I guess.
The linker is working as desired, I think, it only includes the relevant library files. How do you know if only the relevant functions are being included? Can you set a flag or something for that?
I'm using GCC
General list:
Make sure that you have the compiler and linker debug options disabled
Compile and link with all size options turned on (-Os in gcc)
Run strip on the executable
Generate a map file and check your function sizes. You can either get your linker to generate your map file (-M when using ld), or you can use objdump on the final executable (note that this will only work on an unstripped executable!) This won't actually fix the problem, but it will let you know of the worst offenders.
Use nm to investigate the symbols that are called from each of your object files. This should help in finding who's calling functions that you don't want called.
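The nm step can be automated with a short script. The sketch below assumes the text comes from running nm -A over the object files or archives (each line then starts with the file name and a colon), and the function name objects_referencing is made up for this illustration. It reports every file that lists the symbol as undefined (type U), which means that file expects the symbol to be provided from elsewhere.

```python
def objects_referencing(nm_output, symbol):
    """Return the files whose `nm -A` output lists `symbol` as undefined."""
    users = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        sym_type, name = fields[-2], fields[-1]
        if sym_type == "U" and name == symbol:
            # The first field is "file:" or "file:address"; drop the last part.
            users.add(fields[0].rsplit(":", 1)[0])
    return sorted(users)


# Hypothetical nm -A output for two object files.
sample = """\
main.o:0000000000000000 T main
main.o:                 U log_message
log.o:0000000000000000 T log_message
log.o:                 U vfprintf
"""
callers = objects_referencing(sample, "vfprintf")
```

Repeating the lookup for each symbol that a reported file defines walks the chain back towards your own code. Depending on the target, the symbol may carry a leading underscore in the nm output.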
The original question included a sub-question about including only relevant functions. gcc will include all functions within every object file that is used. To put that another way, if you have an object file that contains 10 functions, all 10 functions are included in your executable even if only one is actually called.
The standard libraries (eg. libc) will split functions into many separate object files, which are then archived. The executable is then linked against the archive.
By splitting into many object files the linker is able to include only the functions that are actually called. (this assumes that you're statically linking)
There is no reason why you can't do the same trick. Of course, you could argue that if the functions aren't called then you can probably remove them yourself.
If you're statically linking against other libraries you can run the tools listed above over them too to make sure that they're following similar rules.
Another optimization that might save you work is -ffunction-sections, -Wl,--gc-sections, assuming you're using GCC. A good toolchain will not need to be told that, though.
Explanation: GNU ld links sections, and GCC emits one section per translation unit unless you tell it otherwise. But in C++, the nodes in the dependency graph are objects and functions.
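The effect of combining the two options can be pictured as a reachability walk over sections. The following is a simplified model for illustration only, not GNU ld's actual implementation; the section names and the function live_sections are hypothetical. With one section per function, everything that cannot be reached from the entry point can be dropped.

```python
def live_sections(references, roots):
    """Return the sections reachable from the root sections.

    references maps a section name to the sections it refers to.
    """
    live = set()
    pending = list(roots)
    while pending:
        section = pending.pop()
        if section in live:
            continue
        live.add(section)
        pending.extend(references.get(section, ()))
    return live


# Hypothetical layout with one section per function.
references = {
    ".text.main": [".text.print_hex"],
    ".text.print_hex": [],
    ".text.unused_helper": [".text.vfprintf"],
    ".text.vfprintf": [],
}
kept = live_sections(references, [".text.main"])
```

In this model .text.unused_helper and .text.vfprintf are discarded because nothing reachable from .text.main refers to them. If all four functions shared one .text section, as with one section per translation unit, the whole section would have to be kept.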
On deeply embedded projects I always try to avoid using any standard library functions. Even simple functions like "strtol()" blow up the binary size. If possible just simply avoid those calls.
In most deeply embedded projects you don't need a versatile "printf()" or dynamic memory allocation (many controllers have 32kb or less RAM).
Instead of using the standard "printf()" I use a very simple custom "printf()". This function can only print numbers in hexadecimal or decimal format, nothing more. Most data structures are preallocated at compile time.
Andrew EdgeCombe has a great list, but if you really want to scrape every last byte, sstrip is a good tool that is missing from the list and can shave off a few more kB.
For example, when run on strip itself, it can shave off ~2kB.
From an old README (see the comments at the top of this indirect source file):
sstrip is a small utility that removes the contents at the end of an
ELF file that are not part of the program's memory image.
Most ELF executables are built with both a program header table and a
section header table. However, only the former is required in order
for the OS to load, link and execute a program. sstrip attempts to
extract the ELF header, the program header table, and its contents,
leaving everything else in the bit bucket. It can only remove parts of
the file that occur at the end, after the parts to be saved. However,
this almost always includes the section header table, and occasionally
a few random sections that are not used when running a program.
Note that due to some of the information that it removes, a sstrip'd executable is rumoured to have issues with some tools. This is discussed more in the comments of the source.
Also... for an entertaining/crazy read on how to make the smallest possible executable, this article is worth a read.
Just to double-check and document for future reference: do you use Thumb instructions? They're 16-bit versions of the normal instructions. Sometimes you might need two 16-bit instructions where one normal instruction would do, so it won't save a full 50% in code space.
A decent linker should take just the functions needed. However, you might need compiler and linker settings to package functions for individual linking.
OK, so in the end I just reduced the project to its simplest form, then slowly added files one by one until the function that I wanted to remove appeared in the 'readelf' output. Once I had found the file, I commented everything out and slowly added things back in until the function popped up again. So in the end I found out what called it and removed all those calls. Now it works as desired. Sweet!
There must be a better way to do it, though.
To answer this specific need:
"I want to omit those functions (if possible) but I can't find what's calling them!! Could be called from any number of library functions I guess."
If you want to analyze your code base to see who calls what, by whom a given function is being called and things like that, there is a great tool out there called "Understand C" provided by SciTools.
https://scitools.com/
I have used it very often in the past to perform static code analysis. It can really help to determine the library dependency tree. Among other things, it lets you easily browse up and down the call tree.
They provide a limited time evaluation, then you must purchase a license.
You could look at something like executable compression.