How do I delete a program header from an ELF binary?

I want to write a utility to remove a program header from an ELF binary. For example, when I run readelf -l /my/elf I get a listing of all the program headers: PHDR INTERP ... GNU_STACK GNU_RELRO. When I run my utility, I would like to get all the same program headers back in the same order, minus the one I deleted. Is there any easier way to do this than recreating the entire ELF from scratch, skipping the unwanted header?

Is there any easier way to do this than recreating the entire ELF from scratch
Sure: program headers form a fixed-record table at an offset given by ehdr.e_phoff, containing .e_phnum entries of .e_phentsize bytes.
To delete one entry, simply copy the rest of the entries over it, and decrement .e_phnum. That's all there is to it.
Beware: deleting some entries will likely cause the dynamic loader to crash. GNU_STACK is about the only header that can be deleted without too much harm (that I can think of).
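A minimal sketch of that edit, assuming a 64-bit little-endian ELF; the input file name and the index of the entry to drop are hypothetical placeholders, and error handling is omitted:

# Sketch: drop program header entry `idx` by shifting the following entries
# over it and decrementing e_phnum (64-bit little-endian ELF assumed).
import struct

path, idx = "myelf", 7          # hypothetical input file and Phdr index

with open(path, "rb") as f:
    data = bytearray(f.read())

assert data[:4] == b"\x7fELF" and data[4] == 2      # ELFCLASS64 only
e_phoff     = struct.unpack_from("<Q", data, 0x20)[0]
e_phentsize = struct.unpack_from("<H", data, 0x36)[0]
e_phnum     = struct.unpack_from("<H", data, 0x38)[0]
assert 0 <= idx < e_phnum

start = e_phoff + idx * e_phentsize
end   = e_phoff + e_phnum * e_phentsize
# copy the remaining entries over the deleted one, zero the now-unused last slot
data[start:end - e_phentsize] = data[start + e_phentsize:end]
data[end - e_phentsize:end] = bytes(e_phentsize)
struct.pack_into("<H", data, 0x38, e_phnum - 1)     # decrement e_phnum

with open(path + ".patched", "wb") as f:
    f.write(data)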
Update:
Yes, setting .p_type to PT_NULL is another (and simpler) approach. But such entries are generally not expected to be present, and you may find some systems where PT_NULL will trigger an assertion in the loader (or in some other program).
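Building on the sketch above, the PT_NULL variant is a one-word patch, since p_type is the first 4-byte field of each entry and PT_NULL is 0:

# Alternative to removal (see the caveat above): mark the entry as unused.
struct.pack_into("<I", data, e_phoff + idx * e_phentsize, 0)   # p_type = PT_NULL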
Finally, adding a new Phdr might be tricky. Usually there is no space to expand the table (as it is immediately followed by some other data, e.g. .text). You can relocate the table to the end of the file, and set .e_phoff and .e_phnum to correspond to the new table, but many programs expect the entire Phdr table to be loaded and available at runtime, and that is not easy to arrange, as the new location at the end of the file will not be "covered" by any PT_LOAD segment.

The GNU Binary File Descriptor library (libbfd) may be helpful.

Related

Scheme Macro - Adding a Header to a file

I am writing a scheme macro for a simulation tool. I create thousands of files and I want to add a header (6 lines) to each file. The code for my header runs well and the header gets created in the right way.
But adding the header to my file is buggy. It does not add the 6 header lines to the top of my files without touching the rest; instead it deletes the beginning of the file's existing content. How much is deleted depends on the total length of the header.
(let* ((out (open-input-output-file filename0)))
  (display header out)
  (newline out)
  (close-output-port out))
This is how my file looks without the header:
TracePro Release: 20 6 0
Irradiance Map Data for D:****\TracePro\Aktive\sim_mod_09.oml
Linear Units in millimeters
Data for absorbing_area_focuscircle Surface 0
Data generated at 10:55:56 May 28, 2021
This is how my file looks with the header:
axle x y z a b c
pyra 0 0 0 0 0 0
lens 0 0 0 0 0 0
coll 0 0 0 0 0 0
mir1 0 0 0 0 0 0
glass1 0 0 0 0 0 0
ing_area_focuscircle Surface 0
Data generated at 10:57:29 May 28, 2021
Raytrace Time: mins: 0, secs: 0*
Projected Plane Extent from surface geometry
TopLeft:(-1.05125,-214.755,-1.05125)
TopRight:(1.05125,-214.755,-1.05125)
BottomLeft:(-1.05125,-214.755,1.05125)
BottomRight:(1.05125,-214.755,1.05125)
This isn't really an answer, especially as my knowledge of the Scheme language standards isn't good enough to even know if this is possible within a strictly-defined Scheme. However I'll show why it's hard, and then give an example of how to solve it in Racket, first by cheating to make a probably-correct answer and then by trying to do it the hard way to make a probably-not-correct answer.
Why it is hard
No modern filesystem I know of allows you to open a file for 'insertion' where new content is inserted into the file, pushing existing content 'down' the file. Instead you can open a file for writing, conceptually, in two ways:
for appending, which will append new content at the end;
for overwriting, which will overwrite existing content with new.
(Actually these may be the same: opening for appending may just open for overwriting and then move the current location to the end of the file.)
So what you're doing in your sample is opening for overwriting, and then clobbering the content of the file with the header.
How to do it, in outline
The way to do what you need to do, in outline, is:
create and open a temporary file in the same directory as the file you care about;
write the new content to the temporary file;
copy all the content of the existing file to the temporary file;
close the temporary file;
if all is well, rename the temporary file on top of the existing file, if all is not OK, delete it.
If you do this carefully it is safe, because file renames are (or should be) atomic in the filesystem when the two files are in the same directory. That means the rename should either completely succeed or completely fail, even if the system crashes part way through or the filesystem fills up or something like that. If the filesystem doesn't guarantee that then you're pretty much stuck.
But doing it carefully is not easy (I should admit here that some of my background is doing things like this to system-critical files, so I've spent too long thinking about how to make this safe in a context where getting it wrong is very serious indeed).
Solving this in Racket by cheating
As I said, getting the above process right is hard, and it is therefore something you often want to rely on a battle-tested library for. Racket has such a thing: call-with-atomic-output-file. This seems to be designed to solve exactly this problem: it deals with creating and opening the temporary file for you, deals with the renaming at the end and cleans up appropriately. All you need is a function which copies things around.
So here is a function, prepend-to-file which uses call-with-atomic-output-file to try and do what you want. This is Racket-specific, in many ways, and it is also somewhat overengineered.
(define (prepend-to-file file content #:buffer-size (buffer-size 40960))
  ;; prepend content to file
  ;;
  ;; Try to be a bit too clever about whether we're copying strings or bytes
  ;; based on the argument
  (let-values ([(read-it! write-it make-it)
                (if (bytes? content)
                    (values read-bytes! write-bytes make-bytes)
                    (values read-string! write-string make-string))])
    (call-with-atomic-output-file file
      (λ (out path)
        ;; out is open for writing to the temporary file, path is the
        ;; temporary file's pathname which we don't care about
        (call-with-input-file file
          (λ (in)
            ;; in is now open for reading from the real file
            (write-it content out)
            (let ([buffer (make-it buffer-size)])
              ;; copy in to out using a buffer
              (for ([nread (in-producer (thunk (read-it! buffer in))
                                        eof-object?)])
                (write-it buffer out 0 nread)))
            ;; OK just return the file for want of anything better
            file))))))
I think it's reasonably likely that the above code actually works in most reasonable cases.
Solving this in Racket without cheating
If we could write call-with-atomic-output-file then we could solve the problem without cheating. But getting this right is hard. Here is an attempt to do this, which is almost certainly incorrect:
(define (call/temporary-output-file file proc)
  (let ([tmpname (string-append file
                                "-"
                                (number->string (random (expt 2 24))))]
        [managed #f]
        [once #t])
    ;; tmpname is the name of the temporary file: this assumes pathnames are
    ;; only strings, which is wrong.  managed is a flag which tells us if
    ;; proc returned normally, once is a flag which attempts to prevent any
    ;; continuation nasties so the whole thing can only happen once.
    (call-with-output-file tmpname
      (λ (out)
        (dynamic-wind
          (thunk
           (when (not once)
             ;; if this is the case we're getting back in, and this
             ;; is not OK
             (error 'call/temporary-output-file
                    "this is hopeless")))
          (thunk
           ;; call proc and if it returns normally note that
           (begin0 (proc out tmpname)
             (set! managed #t)))
          (thunk
           ;; close the output port regardless
           (close-output-port out)
           (if managed
               ;; We did OK, so rename the file in place
               (rename-file-or-directory tmpname file #t)
               ;; failed, nuke the temporary file
               (when (file-exists? tmpname)
                 (delete-file tmpname)))
           ;; finally set once false to prevent shenanigans
           (set! once #f)))))))
Notes:
this is still Racket-specific, but it now depends only on simpler functions which have, probably, more obvious counterparts in other implementations (or in the standard);
it tries to deal with some of the edge cases, but almost certainly misses some;
it certainly does not cope in cases such as the rename failing and so on;
Again: don't use this: it's almost certainly buggy.
However if you did use this, then you could simply splice it in instead of call-with-atomic-output-file in the above code and it will, often but probably not always, work.

Modify build-id in the notes section of the ELF file

I need to modify a build-id in the notes section of an ELF file. I see there are plenty of tools to read ELF files but not to modify them. I found elfedit but it doesn't seem to do what I need. Is it even possible?
Here is the output of readelf
$ readelf -n myelffile
Displaying notes found in: .note.ABI-tag
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 3.14.0
Displaying notes found in: .note.gnu.build-id
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: d75a086c288c582036b0562908304bc3a8033235
I'm trying to modify .note.gnu.build-id section.
Is it even possible?
Yes. This is one of the easier modifications, since the data in the note is completely arbitrary, and no other data refer to it.
All you have to do is find the .note section, decode each note in turn until you find the one with NT_GNU_BUILD_ID type, and overwrite its data with same-length bytes of your choosing.
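A sketch of that approach, assuming a 64-bit little-endian ELF, a hypothetical file name, and a same-length replacement ID; it walks the PT_NOTE segments rather than the section headers, which avoids parsing the section name table:

# Sketch: find the NT_GNU_BUILD_ID note inside a PT_NOTE segment and
# overwrite its descriptor in place (64-bit little-endian ELF assumed).
import struct, sys

path   = "myelffile"                                     # hypothetical file
new_id = bytes.fromhex("00112233445566778899aabbccddeeff00112233")

with open(path, "r+b") as f:
    data = f.read()
    assert data[:4] == b"\x7fELF" and data[4] == 2       # ELFCLASS64 only
    e_phoff     = struct.unpack_from("<Q", data, 0x20)[0]
    e_phentsize = struct.unpack_from("<H", data, 0x36)[0]
    e_phnum     = struct.unpack_from("<H", data, 0x38)[0]
    for i in range(e_phnum):
        ph = e_phoff + i * e_phentsize
        p_type   = struct.unpack_from("<I", data, ph)[0]
        p_offset = struct.unpack_from("<Q", data, ph + 8)[0]
        p_filesz = struct.unpack_from("<Q", data, ph + 32)[0]
        if p_type != 4:                                  # PT_NOTE
            continue
        pos, end = p_offset, p_offset + p_filesz
        while pos + 12 <= end:
            namesz, descsz, n_type = struct.unpack_from("<III", data, pos)
            name_off = pos + 12
            desc_off = name_off + ((namesz + 3) & ~3)
            if n_type == 3 and data[name_off:name_off + namesz] == b"GNU\x00":
                assert len(new_id) == descsz             # same-length overwrite only
                f.seek(desc_off)
                f.write(new_id)                          # patch the descriptor in place
                sys.exit(0)
            pos = desc_off + ((descsz + 3) & ~3)
print("NT_GNU_BUILD_ID note not found")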
Are you aware of the linker --build-id 0x.... option which allows you to put in whatever hex data you desire at link time? If you can relink your binary, then you wouldn't need to modify the build-id note, as the linker will happily put your data there during the initial link.

Determining if two rar files are part of the same set

Let's say I have two files, (name).n.rar and (name).n+1.rar, which appear to be part of the same set (same size, etc). Is there any easy way to tell if they're actually part of the same set, without first downloading the full set? Currently the only way I can tell is by downloading an instance of every file and then seeing if WinRAR gives me an error when I try to extract them.
(And on a related note, assuming there is such a method, can I do the same without having adjacent parts?)
Ideally there's an existing program that can do this, but I can code my own if necessary.
Further notes: These are two sets of archives of the same file. They appear identical to obvious checks: filenames are subsequent, contents are sane, sizes are identical, same number of parts. I then receive a full set of files. If they're not from the same set, I can't unrar them - though it seems that WinRAR will proceed to 100% before giving me the CRC error (file corrupt.)
New Answer
All tests were made using WinRAR 5.01 32-bit. Since the algorithm should remain the same, the following statements should also be valid for previous versions. Feel free to comment if you know that's not true.
I'll give a short briefing about the chat. I tried to pack a file larger than 1 GB several times; then I mixed up the files and tried to extract the archives: it worked. So the problem was not the size of the file.
I thought about three possible solutions to the problem:
The architecture influenced the packaging process: different people packed the files on different machines, and mixing them up would result in an error;
Different people tried to pack the files, giving a slightly different size file (for example 250 MB and 250000 KB). This would have been noticed in the file properties, though;
Files were corrupted during the download: re-downloading them would confirm this hypothesis.
I was most curious about the first one: could the architecture influence the packaging process?
I found out the answer is yes, it can. Here are the steps to repeat the experiment:
Pack your files into an archive, giving a precise part size, on computer A;
Pack the same exact files, giving the same exact part size, on computer B (TODO: check whether this experiment still holds with similar architectures, e.g. Intel i7 with Intel i5), which has a different architecture (e.g. an Intel processor vs. an AMD processor);
Transfer one part (or more, if you wish, but of course not all of them!) from computer B to computer A. Remember to delete those files from computer A before the transfer;
Place all the files in the same directory, check if they all have the same name (e.g. "AAA part1", "AAA part2"...);
Extract them;
Enjoy your CRC error!
Tests were made using an Intel i7-3632QM and an AMD FX 6300.
I suspect that the compressed data is actually the same, but the CRC codes are different.
Old Answer
There is a way indeed. During my Computer Science studies, we had a Computer Forensics class. I learned that every file has a fixed beginning (a header, we could say) that lets a program recognize its type and how to decode it. To see it, you just have to open the file with a text editor (Notepad++ is the best so far, I guess).
For example, jpeg images begin with ÿØÿá.
I tried to store a video in some split .rar files, and telling whether they are part of the same archive was simpler than I thought.
Every rar file begins with Rar!. On the second or third line, the name of the file stored in the archive should appear: in my case, myVideo.mp4. If all your archives contain that filename, they're probably part of the same archive.
Things get trickier if there are several files in the archive and you don't know their names. In fact, if there is more than one file, the RAR file structure is as follows:
File 1:
Rar!
NUL NUL NUL //Random things here
NUL NUL NUL NUL NUL myVideo.mp4 NUL NUL NUL NUL
//Random things here. If the dimensions of the file exceed the archive,
//the next file will begin with the same name.
//Let's assume that this is happening.
EOF
File 2:
Rar!
NUL NUL NUL //Random things here
NUL NUL myVideo.mp4 NUL NUL NUL
//This time the file is complete. Since there is still space in the archive,
//it will add another file
NUL NUL NUL NUL mySecondVideo.mp4 NUL NUL NUL NUL
EOF
Let's assume that at the end of the second archive, mySecondVideo hasn't been fully compressed yet.
File 3:
Rar!
NUL NUL NUL
NUL NUL NUL NUL mySecondVideo.mp4 NUL
NUL NUL NUL
NUL myTextFile.txt
NUL NUL NUL mySecondTextFile.txt NUL
EOF
If mySecondTextFile.txt isn't yet fully compressed, my fourth file will begin with its name.
I hope this is clear; I tried to keep it as simple as possible. In the case of more files, I would start from the last archive, write down the first filename found in that file, and search for it in the previous one. If I found that name, I'd repeat the process back to the first archive.
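A rough sketch of that manual check, not a real RAR parser: it verifies the Rar! signature and looks for a known archived file name near the start of each part. The file names and the amount of data scanned are assumptions:

# Heuristic check: every part should start with a RAR signature and mention
# the expected archived file name somewhere in its leading headers.
RAR_SIGNATURES = (b"Rar!\x1a\x07\x00", b"Rar!\x1a\x07\x01\x00")  # RAR4 / RAR5

def looks_like_same_set(parts, inner_name=b"myVideo.mp4", scan=64 * 1024):
    for part in parts:
        with open(part, "rb") as f:
            head = f.read(scan)
        if not head.startswith(RAR_SIGNATURES):
            return False          # not a RAR volume at all
        if inner_name not in head:
            return False          # expected archived file name not found
    return True

print(looks_like_same_set(["set.part1.rar", "set.part2.rar"]))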
I'm not that familiar with the RAR format, but in case you decide to write your program in Java I can recommend using 7-Zip-JBinding.
http://sevenzipjbind.sourceforge.net/
http://sevenzipjbind.sourceforge.net/basic_snippets.html#open-multipart-rar-archives
You can download the first n+1 parts of the archive and then call the extract() method, ignoring the output data and only caring about the
IArchiveExtractCallback.setOperationResult(ExtractOperationResult)
calls (checking that the CRC was OK) and monitoring which files get opened through
IArchiveOpenVolumeCallback.getStream(java.lang.String)
If volume n+2 gets requested, you can conclude that volume n+1 was the right one.
(I'm not 100% sure about this conclusion, but I would give it a try)

Is the ELF .notes section really needed?

On Linux, I'm trying to strip a statically linked ELF file to the bare essentials. When I run:
strip --strip-unneeded foo
or
strip --strip-all foo
The resulting file still has a fat .notes section that appears to be full of funky strings.
Is the .notes section really needed or can I safely force it out with --remove-section?
Thanks for any help.
From experience and from looking at the man page for strip, it looks like strip isn't supposed to get rid of any and all sections and strings that aren't needed; just symbols. Quoth the man page:
GNU strip discards all symbols from object files objfile.
That being said, from experience, strip, even without --strip-all, removes sections unneeded for loading, such as .symtab and .strtab, and you can, as you note, remove the sections you want with --remove-section.
As an example of a .notes section, I took /bin/ls from my Ubuntu 11.10 64-bit box:
$ readelf -Wn /bin/ls
Notes at offset 0x00000254 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.15
Notes at offset 0x00000274 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 3e6f3159144281f709c3c5ffd41e376f53b47952
That encompasses the .note.ABI-tag section and the .note.gnu.build-id section. It looks like they contain data that isn't necessary for loading the program, but they also aren't standard, and strip can't know they aren't needed for the program to run properly, since an ELF can have any number of additional "unknown" sections that aren't safe to remove. So rather than using a whitelist (which would fail miserably), it uses a blacklist of sections that it knows it can get rid of, and removes those.
Short version: these sections don't seem to be standard and could be used for various things, so strip can't know it's safe to remove them. But based on the info inside the one I took above, if it's your own program, it's almost certainly safe to remove it.
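For example, the two note sections shown above could be removed explicitly like this (worth confirming the stripped binary still runs afterwards):
strip --remove-section=.note.ABI-tag --remove-section=.note.gnu.build-id foo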

Inconsistent Behavior In A Batch File's For Statement

I've done very little with batch files but I'm trying to track down a strange bug I've been encountering on a legacy system.
I have a number of .exe files in particular folder. This script is supposed to duplicate them with a different file name.
Code From Batch File
for %%i in (*.exe) do copy \\networkpath\folder\%%i \\networkpath\folder\%%i.backup.exe
(Note: The source and destination folders are THE SAME)
Example Of Desired Behavior:
File1.exe --> Becomes --> File1.exe.backup.exe
File2.exe --> Becomes --> File2.exe.backup.exe
Now first, let me say that this is not the approach I would take. I know there are other (potentially more straightforward) ways to do this. I also know that you might wonder WHY on earth we care about creating a FileX.exe.backup.exe. But this script has been running for years and I'm told the problem only started recently. I'm trying to pinpoint the problem, not rewrite the code (even if that would be trivial).
Example Buggy Output:
File1.exe.backup.exe
File1.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
File1.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
etc...
File2.exe.backup.exe
File2.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
File2.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe.backup.exe
Not knowing anything about batch files, I looked at this and figured that the condition of the for statement was being re-evaluated after each iteration - creating a (near) infinite loop of copying (I can see that, eventually, the copy will fail when the names get too long).
This would explain the behaviour I'm seeing. And when I cleaned the directory in question so that it had only the original File1.exe file and ran the script, it produced the buggy output. The problem is that I CANNOT replicate the behaviour anywhere else!?!
When I create a folder locally with a few .exe files and run the script - I get the expected output. And yes, if I run it again, I get one instance of 'File1.exe.backup.exe.backup.exe' (and each time I run it again, it increases in length by one). But I cannot get it to enter the near-infinite loop case.
It's been driving me crazy.
The bug is occurring on a networked location - so I've tried to recreate it on one - but again, no success. Because it's a shared network location, I wondered if it could have something to do with other people accessing or modifying files in the folder and even introduced delays and wrote a tiny program to perform actions in the same folder - but without any success.
The documentation I can find on the 'for' statement doesn't really help, but all of the tests I've run seem to suggest that the in (*.exe) section is only evaluated once at the beginning of execution.
Does anyone have any suggestions for what might be going on here?
I agree with Andriy M's comment - it looks to be related to Windows 7 Batch Script 'For' Command Error/Bug
The following change should fix the problem:
for /f "eol=: delims=" %%i in ('dir /b *.exe') do copy \\networkpath\folder\%%i \\networkpath\folder\%%i.backup.exe
Any file that starts with a semicolon (highly unlikely, but it can happen) would be skipped with the default EOL of semicolon. To be safe you should set EOL to some character that could never start a file name (or any path). That is why I chose the colon - it cannot appear in a folder or file name, and can only appear after a drive letter. So it should always be safe.
Copy also supports wildcard characters in the target path. You can use
copy \\networkpath\folder\*.exe \\networkpath\folder\*.backup.exe