How can the --add-section switch of objcopy be used?

There are really two questions that revolve around the use of --add-section. The simple one is in the title. Based on my reading, I haven't been able to figure out how one could use --add-section.
To use --add-section, I have to pass a section name. If I use an existing section name, the program responds with "can't add section '.data': File in wrong format." Perhaps I just need to pass another parameter. If I use a new section name, which I would prefer to do, I'm warned that "allocated section '.blob' not in segment."
Now, I have gotten my feature to work as I need it to aside from the "not in segment" warning. I'd like to figure out if there is a legitimate way to put this blob into the executable. I would link it in, but that isn't so easy because the data I'm adding is generated from the contents of the executable itself.
The second question is really what I care about. Is there a way to do the following, given that the blob cannot be computed until after the link is complete?
1. Link ELF file
2. Generate blob from ELF file and other data
3. Add blob to ELF file so that it is loaded at run time to the correct location in memory
objcopy --add-section .blob=blob.o \
--set-section-flags .blob=alloc,contents,load,readonly \
--change-section-address .blob=ADDRESS \
program.elf program.blobbed.elf
I'd be happy to add a section and/or segment to the ELF file as part of the link and insert this blob there. I'm not sure how to do that.
It has occurred to me that I could accomplish this feat with a second link, but objcopy would be cleaner.
1. Link ELF file
2. Generate blob from ELF file and other data
3. Re-link ELF file including new blob.o
UPDATE: This last strategy may be workable as long as the relink doesn't change something in the portion of the program that was produced by the first link. It doesn't on first attempts, but it may be possible to work around it. Hence, the desire to use --add-section to add in this blob instead of going through a second link.

You may add that section, fill it with, say, NULs, and then compute your blob. Then patch that blob into this section. Later, when you check the integrity of the ELF, treat that section as if it were full of NULs and compute the blob again. Finally, compare the recomputed blob with the blob stored in the section.
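The procedure above can be sketched in Python. This is a minimal illustration, not a finished tool, and it assumes: a 64-bit little-endian ELF; a placeholder section named .blob that already exists (for example, added with the objcopy command from the question); and, purely as a stand-in for the real blob, a SHA-256 digest of the whole file. The function names find_section, seal and verify are hypothetical.

```python
import hashlib
import struct

ELF_MAGIC = b"\x7fELF"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def find_section(image: bytes, name: str) -> tuple[int, int]:
    """Return (file offset, size) of a named section in a 64-bit little-endian ELF."""
    # Assumption: ELFCLASS64 / ELFDATA2LSB; 32-bit or big-endian files need other formats.
    if image[:4] != ELF_MAGIC or image[4] != 2 or image[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    (e_shoff,) = struct.unpack_from("<Q", image, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", image, 0x3A)

    def shdr(index: int) -> tuple:
        # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
        return struct.unpack_from("<IIQQQQIIQQ", image, e_shoff + index * e_shentsize)

    names_offset = shdr(e_shstrndx)[4]  # file offset of the section-name string table
    for i in range(e_shnum):
        sh_name, _, _, _, sh_offset, sh_size, *_ = shdr(i)
        start = names_offset + sh_name
        if image[start:image.index(b"\x00", start)].decode("ascii") == name:
            return sh_offset, sh_size
    raise KeyError(name)


def _zeroed(image: bytes, offset: int, size: int) -> bytes:
    """Copy of the image with the placeholder section filled with NULs."""
    buf = bytearray(image)
    buf[offset:offset + size] = bytes(size)
    return bytes(buf)


def seal(image: bytes, offset: int, size: int) -> bytes:
    """Hash the image with the section zeroed, then store the digest in the section."""
    if size < DIGEST_SIZE:
        raise ValueError("placeholder section is too small for the digest")
    out = bytearray(_zeroed(image, offset, size))
    out[offset:offset + DIGEST_SIZE] = hashlib.sha256(out).digest()
    return bytes(out)


def verify(image: bytes, offset: int, size: int) -> bool:
    """Recompute the digest as if the section were all NULs and compare."""
    expected = hashlib.sha256(_zeroed(image, offset, size)).digest()
    return image[offset:offset + DIGEST_SIZE] == expected
```

Typical use would be to read program.elf, call find_section(image, ".blob"), pass the result to seal, and write the sealed bytes back. Because the digest never covers its own bytes, verify can be run on the finished file without any extra bookkeeping.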

Not specifically answering your question, but one approach I used for this sort of thing was to link in a placeholder block and then just patch the correct value in afterwards.
I know this isn't what you want to do, but it is a pretty simple and reliable way to do it, and it has the major advantage of being toolchain- and platform-agnostic.
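The patch step can be as simple as searching the linked image for a distinctive marker and overwriting the block in place. Here is a minimal Python sketch. It assumes the firmware defines a fixed-size array (16 bytes here) that starts with a unique marker, such as the hypothetical "PATCHME!". The array must be referenced so the linker keeps it, and it should be read through a volatile so the compiler cannot fold its initial contents into the code. The patch is also assumed to be applied to the ELF or raw binary before any checksum or hex conversion. patch_placeholder is a hypothetical name.

```python
def patch_placeholder(image: bytes, marker: bytes, size: int, value: bytes) -> bytes:
    """Overwrite the size-byte placeholder that begins with marker by value, NUL-padded."""
    if len(value) > size:
        raise ValueError("value does not fit in the placeholder")
    start = image.find(marker)
    if start < 0:
        raise ValueError("marker not found; was the placeholder optimised away?")
    if image.find(marker, start + 1) >= 0:
        raise ValueError("marker occurs more than once; pick a longer, rarer marker")
    if start + size > len(image):
        raise ValueError("placeholder runs past the end of the file")
    out = bytearray(image)  # same length as the input, so no addresses move
    out[start:start + size] = value.ljust(size, b"\x00")
    return bytes(out)
```

The overwrite never changes the file's length or moves anything, so everything produced by the first link stays exactly as it was. That is the guarantee the re-link approach in the question could not give.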

Related

How to read an executable (.EXE) file in OpenVMS

When I try to open any .EXE file, I get the information in encoded form. Any idea how to see the content of an .EXE file?
I need to know what Database tables are used in the particular .EXE.
Ah, now we are getting closer to the real question.
It is probably much more productive to ask the targeted databases about the SQL queries being executed during the run, or for a top-ten list shortly afterwards.
The table-names might not be hard-coded recognizably as such in the executable.
They might be obtained by a lookup, and some fun pre-fixing or other transformation might be in place.
Admittedly, they are likely clear text.
The easiest approach is probably to just transfer the image to a Unix server and use STRINGS on it.
I wanted to include the source here, but that failed, and I cannot find how to attach a file. Below you'll find a link to the OpenVMS MACRO program source for a STRINGS-like tool. Not sure how long the link will survive.
Just read it for instructions, save it (strings.mar), compile it ($ MACRO strings), link it ($ link strings), and activate it ($ mcr sys$login:strings image_to_test.exe).
OpenVMS Macro String program text
Good luck!
Hein
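If copying the image to a Unix box is awkward, the core of a STRINGS-like scan fits in a few lines. Below is a minimal sketch in Python, assuming Python is available wherever the file ends up. strings() is a hypothetical helper, and the default minimum run of 4 characters mirrors GNU strings.

```python
import re


def strings(data: bytes, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (offset, text) for every run of at least min_len printable ASCII bytes."""
    # Printable ASCII is 0x20 (space) through 0x7E (~), as in the classic strings tool.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
```

For example, read the whole image with open("image_to_test.exe", "rb").read(), pass it to strings(), and print each offset and text pair. Then scan the output for anything that looks like a table name.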
Use analyze/image to view the contents of an executable image file.
I'm guessing you are trying to look in the EXE because you do not have access to the source. I do something like this:
$ dump/record/byte/hex/out=a.a myexe.exe
Then look at a.a with any text editor (132 columns). The linker groups string literals together, and they are mostly near the beginning of the EXE, so you don't have to look too far into the file. Of course, this only helps if the database references are string literals.
The string literal might be broken across a block (512 byte) boundary, so if you use search in your editor, try looking for substrings.
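If the file is searched as one contiguous byte string instead of block by block, a name split across a 512-byte boundary is still found. Here is a minimal Python sketch, assuming the image has been copied somewhere Python runs. find_all is a hypothetical helper. It also reports the 0-based 512-byte block a hit starts in and whether the hit straddles a boundary, which tells you where to look in the dump.

```python
BLOCK_SIZE = 512  # disk block size mentioned in the answer above


def find_all(data: bytes, needle: bytes, ignore_case: bool = False) -> list[tuple[int, int, bool]]:
    """Return (offset, 0-based block index, crosses_block_boundary) for each match."""
    if not needle:
        raise ValueError("needle must not be empty")
    if ignore_case:
        # bytes.upper() only folds ASCII letters, which is what table names usually are
        data, needle = data.upper(), needle.upper()
    hits = []
    start = data.find(needle)
    while start >= 0:
        last = start + len(needle) - 1
        hits.append((start, start // BLOCK_SIZE, start // BLOCK_SIZE != last // BLOCK_SIZE))
        start = data.find(needle, start + 1)
    return hits
```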
Aksh - you are chasing your tail on this one. It's a false dawn. Even if you could (and you can't) find the database tables, you will need the source of the .exe to do anything sensible with it, or with the problem you are trying to solve. It's possible to write a program which just lists all the tables in a database without reading any of 'em. So you could spend an awful lot of effort and get nowhere. Hope this helps.

Get file type of given file - based on contents

OK, it may sound fairly straightforward but I'm still not sure how to go about it.
I know it's possible to check file type based on file extensions, using UTIs (e.g. Get the type of a file in Cocoa).
However, I need to be able to get the file type (in more general terms, like "text", "image", "else"), depending on the content.
Is that possible?
Any ideas?
One route forward is to call the file command and parse its output, but that is fairly horrible, and I wouldn't do that as it's slow and you are susceptible to changes in the output.
The file command uses a pretty extensive database of byte patterns to test the contents of the file and I would be tempted to implement my own internal version of it, or use this library (which I think might need some work before it works under OSX).
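Doing this in-process boils down to comparing the first bytes of the file against known signatures. The question is about Cocoa, so treat this minimal Python sketch as an illustration of the logic only. It knows just a handful of image signatures, and its "text" test is a heuristic that assumes UTF-8 or ASCII. sniff is a hypothetical name.

```python
import codecs

# A few well-known magic numbers; the file(1) database has thousands more.
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF
    b"GIF89a",
    b"II*\x00",            # TIFF, little-endian
    b"MM\x00*",            # TIFF, big-endian
)


def sniff(head: bytes) -> str:
    """Classify the first bytes of a file as "image", "text" or "else"."""
    if head.startswith(IMAGE_SIGNATURES):
        return "image"
    if b"\x00" in head:
        return "else"  # NUL bytes essentially never occur in UTF-8 or ASCII text
    try:
        # final=False tolerates a multi-byte character cut off at the end of head
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return "else"
    return "text"
```

In use, read only the start of each file, e.g. sniff(f.read(4096)). UTF-16 text contains NULs and will be reported as "else" by this sketch.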

Find duplicate PDFs

I'm looking for a utility that will help me find duplicate PDFs. The problem: I have thousands of PDF files. Some are duplicates. They are not easy to detect due to differing file names and small differences in file size. Is there a utility/algorithm/library that can help me find the duplicates or show me files that are very similar (or their degree of difference)?
Create an MD5 hash for each file and store it in a database. Identical files will then sort next to each other, or you can quickly search for a pre-existing key.
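Here is a minimal Python sketch of the hashing approach, grouping files by digest in a dictionary instead of a database. The function names are hypothetical. This only catches byte-for-byte identical files. PDFs that differ by even one byte, for example through different metadata or a re-save, hash differently.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large PDFs need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths) -> list[list[Path]]:
    """Group paths whose contents hash identically; only groups of 2+ are returned."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in paths:
        groups[md5_of(path)].append(Path(path))
    return [group for group in groups.values() if len(group) > 1]
```

For example, find_duplicates(Path("~/papers").expanduser().rglob("*.pdf")) returns one list per set of identical files.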
The problem is not yet solved in any general way. What I do is use fdupes http://premium.caribe.net/~adrian2/fdupes.html to find exact duplicates.
But most of all, I use a workflow which minimizes duplicates. Every document that enters my system gets indexed with this Perl script I wrote: http://seegras.discordia.ch/Programs/fileindex which puts the name and an MD5 sum of it into ~/.fileindex.md5. Now I can change the metadata of the local PDF files or whatever (and run fileindex again), and whenever I accidentally download the same file again, I will still have the MD5 sum of the original file, and thus can detect whether it's a duplicate.
There's also exif-meta and exif-rename on http://seegras.discordia.ch/Programs/ which help with setting PDF metadata and with renaming PDF-files according to metadata; and if you're tagging all the files correctly, you will end up with duplicate filenames, indicating that they might be the same document within a different file.
If the files were created by different tools, they could look the same but generate very different results because they are structured totally differently. I made some suggestions in a blog article at https://blog.idrsolutions.com/2010/09/comparing-2-pdf-files/
DiffPDF looks like something that might help you.
I remember that there is a UNIX utility called pdftotext (in the poppler-utils package). You can try to extract the text from the files and make a textual diff.
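Once the text has been extracted (for example with pdftotext file.pdf out.txt), a rough degree of difference can be computed from the two texts. This minimal Python sketch uses the standard difflib module and compares word sequences, so layout and line-break differences from the extraction are ignored. similarity is a hypothetical name, and any cut-off such as 0.9 is an arbitrary choice. Comparing every pair of thousands of files is quadratic, so it is worth narrowing the candidates first.

```python
import difflib


def similarity(text_a: str, text_b: str) -> float:
    """Word-level similarity in [0, 1]; whitespace and line breaks are ignored."""
    words_a, words_b = text_a.split(), text_b.split()
    # autojunk=False keeps frequent words such as "the" from being treated as junk
    return difflib.SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()
```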

Modifying elf file

I would like to add a new flag to an elf file. This flag should then be available to the kernel in the process descriptor. My first idea was to use libelf, but unfortunately there seems to be a bug with it on Ubuntu. Elfedit would have probably been a nice tool but I have not found a version for Linux, in particular Ubuntu.
So, I am wondering if anyone can suggest to me if there is any other useful tool out there to add a custom flag to an elf file?
Many thanks for your help!
People who are able to modify the kernel to take advantage of the new flag probably wouldn't be asking how to add the flag to the ELF libraries.
So, how do you plan to have the kernel use this new flag? What is the purpose of the flag?
Since you are adding to the standard libelf, can't you fix the bug for Ubuntu and let them know that you've done so (make the fix available to them - though they'll probably need to relay it back up the chain).
Please look at ELFIO library. It contains WriteObj and Writer examples. By using the library, you will be able to create and/or modify ELF binary files.
(Although this is an old question, for reference I am writing an answer based on my own experience.)
I suggest reading the ELF file into an in-memory struct, making the changes to the flags, and loading the process memory from that in-memory struct. This method needs less effort than fixing the bug. To start, check elf.h for the ELF header, program header and section header structs. You can read the file into your own struct, which should have three members: the ELF header, the program headers and the section headers. Start by reading the ELF header into your struct, then read the program headers at the offset given in the ELF header (iterating over all of them). In the same way, you can read all sections through the section headers.
Encapsulating the three header structs in your own struct also gives you the opportunity to keep any extra data you need in other members.
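To show what reading the headers involves, here is a minimal Python sketch. It decodes the ELF file header for 32- or 64-bit files of either byte order, lists the program-header offsets by walking from e_phoff, and rewrites e_flags. Storing the new flag in e_flags is an assumption made for illustration. That word is processor-specific (ARM, for instance, keeps its EABI version there), so only bits your ABI leaves unused are candidates. Getting the kernel to act on the flag is a separate change. The function names are hypothetical.

```python
import struct

# Elf32_Ehdr / Elf64_Ehdr fields that follow the 16-byte e_ident array
EHDR_FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
               "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
               "e_shentsize", "e_shnum", "e_shstrndx")
EHDR_FORMAT = {1: "HHIIIIIHHHHHH", 2: "HHIQQQIHHHHHH"}  # keyed by EI_CLASS
E_FLAGS_OFFSET = {1: 36, 2: 48}                        # byte offset of e_flags


def _layout(image: bytes) -> tuple[str, int]:
    """Return (struct byte-order prefix, EI_CLASS) after basic sanity checks."""
    if image[:4] != b"\x7fELF" or image[4] not in (1, 2) or image[5] not in (1, 2):
        raise ValueError("not a recognisable ELF file")
    return ("<" if image[5] == 1 else ">"), image[4]


def read_ehdr(image: bytes) -> dict[str, int]:
    """Decode the ELF file header into a dict keyed by the elf.h field names."""
    order, ei_class = _layout(image)
    values = struct.unpack_from(order + EHDR_FORMAT[ei_class], image, 16)
    return dict(zip(EHDR_FIELDS, values))


def program_header_offsets(ehdr: dict[str, int]) -> list[int]:
    """File offsets of each program header, walked from e_phoff."""
    return [ehdr["e_phoff"] + i * ehdr["e_phentsize"] for i in range(ehdr["e_phnum"])]


def set_e_flags(image: bytes, flags: int) -> bytes:
    """Return a copy of the image with e_flags replaced, in the file's byte order."""
    order, ei_class = _layout(image)
    out = bytearray(image)
    struct.pack_into(order + "I", out, E_FLAGS_OFFSET[ei_class], flags)
    return bytes(out)
```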

Process for reducing the size of an executable

I'm producing a hex file to run on an ARM processor which I want to keep below 32K. It's currently a lot larger than that and I wondered if someone might have some advice on what's the best approach to slim it down?
Here's what I've done so far:
I ran 'size' on it to determine how big the hex file is.
Then I ran 'size' again to see how big each of the object files that link to create the hex file is. It seems the majority of the size comes from external libraries.
Then I used 'readelf' to see which functions take up the most memory.
I searched through the code to see if I could eliminate calls to those functions.
Here's where I get stuck: there are some functions which I don't call directly (e.g. _vfprintf), and I can't find what calls them, so I can't remove the call (I think I don't need it).
So what are the next steps?
Response to answers:
I can see that some of the functions being called take up a lot of memory. However, I cannot find what is calling them.
I want to omit those functions (if possible) but I can't find what's calling them! They could be called from any number of library functions, I guess.
I think the linker is working as desired: it only includes the relevant library files. How do you know if only the relevant functions are being included? Can you set a flag or something for that?
I'm using GCC
General list:
Make sure that you have the compiler and linker debug options disabled
Compile and link with all size options turned on (-Os in gcc)
Run strip on the executable
Generate a map file and check your function sizes. You can either get your linker to generate your map file (-M when using ld), or you can use objdump on the final executable (note that this will only work on an unstripped executable!) This won't actually fix the problem, but it will let you know of the worst offenders.
Use nm to investigate the symbols that are called from each of your object files. This should help in finding who's calling functions that you don't want called.
The original question included a sub-question about including only relevant functions. gcc will include all functions within every object file that is used. To put that another way, if you have an object file that contains 10 functions, all 10 functions are included in your executable even if only one is actually called.
The standard libraries (e.g. libc) will split functions into many separate object files, which are then archived. The executable is then linked against the archive.
By splitting into many object files the linker is able to include only the functions that are actually called. (this assumes that you're statically linking)
There is no reason why you can't do the same trick. Of course, you could argue that if the functions aren't called, then you can probably remove them yourself.
If you're statically linking against other libraries you can run the tools listed above over them too to make sure that they're following similar rules.
Another optimization that might save you work is -ffunction-sections, -Wl,--gc-sections, assuming you're using GCC. A good toolchain will not need to be told that, though.
Explanation: GNU ld links sections, and GCC emits one section per translation unit unless you tell it otherwise. But in C++, the nodes in the dependency graph are objects and functions.
On deeply embedded projects I always try to avoid using any standard library functions. Even simple functions like "strtol()" blow up the binary size. If possible just simply avoid those calls.
In most deeply embedded projects you don't need a versatile "printf()" or dynamic memory allocation (many controllers have 32kb or less RAM).
Instead of the standard "printf()", I use a very simple custom "printf()" that can only print numbers in hexadecimal or decimal format, nothing more. Most data structures are preallocated at compile time.
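The heart of such a cut-down printf is a loop that turns an integer into digits by repeated division. It is sketched here in Python for readability. On the target, the same loop would be a few lines of C writing into a small fixed buffer, with no heap and no floating point. format_uint and its parameters are hypothetical.

```python
DIGITS = "0123456789ABCDEF"


def format_uint(value: int, base: int = 10, width: int = 0) -> str:
    """Render a non-negative integer in base 10 or 16, zero-padded to width."""
    if base not in (10, 16) or value < 0:
        raise ValueError("only non-negative numbers in base 10 or 16")
    digits = []  # built least-significant first, like filling a C buffer from the end
    while True:
        value, remainder = divmod(value, base)
        digits.append(DIGITS[remainder])
        if value == 0:
            break
    digits.extend("0" * (width - len(digits)))
    return "".join(reversed(digits))
```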
Andrew EdgeCombe has a great list, but if you really want to scrape every last byte, sstrip is a good tool that is missing from the list and can shave off a few more kB.
For example, when run on strip itself, it can shave off ~2kB.
From an old README (see the comments at the top of this indirect source file):
sstrip is a small utility that removes the contents at the end of an
ELF file that are not part of the program's memory image.
Most ELF executables are built with both a program header table and a
section header table. However, only the former is required in order
for the OS to load, link and execute a program. sstrip attempts to
extract the ELF header, the program header table, and its contents,
leaving everything else in the bit bucket. It can only remove parts of
the file that occur at the end, after the parts to be saved. However,
this almost always includes the section header table, and occasionally
a few random sections that are not used when running a program.
Note that due to some of the information that it removes, a sstrip'd executable is rumoured to have issues with some tools. This is discussed more in the comments of the source.
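The truncation described in that README can be sketched in a few lines of Python. This is not sstrip itself, only an illustration that assumes a 64-bit little-endian ELF. It keeps everything up to the end of the furthest header or segment, drops the rest, and clears the section-header fields if that table was cut off. sstrip_image is a hypothetical name, so use the real tool on real binaries. Bear in mind that this shrinks the file on disk. The bytes that end up in flash come from the loadable segments, which it keeps.

```python
import struct


def sstrip_image(image: bytes) -> bytes:
    """Keep only the ELF header, program header table and segment contents."""
    # Assumption: 64-bit little-endian ELF; other layouts need different formats.
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch handles 64-bit little-endian ELF only")
    e_phoff, e_shoff = struct.unpack_from("<QQ", image, 0x20)
    e_ehsize, e_phentsize, e_phnum = struct.unpack_from("<HHH", image, 0x34)
    end = max(e_ehsize, e_phoff + e_phnum * e_phentsize)
    for i in range(e_phnum):
        # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align
        fields = struct.unpack_from("<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        p_offset, p_filesz = fields[2], fields[5]
        end = max(end, p_offset + p_filesz)
    out = bytearray(image[:end])  # like sstrip, only bytes after the last kept part go
    if e_shoff >= end:
        # The section header table was cut off, so stop the header pointing at it.
        struct.pack_into("<Q", out, 0x28, 0)       # e_shoff
        struct.pack_into("<HH", out, 0x3C, 0, 0)   # e_shnum, e_shstrndx
    return bytes(out)
```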
Also... for an entertaining/crazy read on how to make the smallest possible executable, this article is worth a read.
Just to double-check and document for future reference: do you use Thumb instructions? They're 16-bit versions of the normal instructions. Sometimes you might need two 16-bit instructions, so it won't save 50% in code space.
A decent linker should take just the functions needed. However, you might need compiler and linker settings to package functions for individual linking.
Ok, so in the end I just reduced the project to its simplest form, then slowly added files one by one until the function that I wanted to remove appeared in the 'readelf' output. Then, when I had the file, I commented everything out and slowly added things back in until the function popped up again. So in the end I found out what called it and removed all those calls... Now it works as desired... sweet!
Must be a better way to do it though.
To answer this specific need:
• I want to omit those functions (if possible) but I can't find what's calling them!! Could be called from any number of library functions I guess.
If you want to analyze your code base to see who calls what, by whom a given function is being called and things like that, there is a great tool out there called "Understand C" provided by SciTools.
https://scitools.com/
I have used it very often in the past to perform static code analysis. It can really help to determine the library dependency tree. It lets you easily browse up and down the calling tree, among other things.
They provide a limited time evaluation, then you must purchase a license.
You could look at something like executable compression.