ELF segments mem size vs. file size

ELF segments mem size vs. file size - elf

I have read a couple of ELF specification documents but haven't found answers for the below questions yet
1) When segment memory size is greater than segment file size, should the ELF segment downloader fill the segment in memory with zeros as specified by memsize?
2) Can there be a case where a section should be filled with a constant other than zero, i.e. a general case "constant fill" section?
3) What is the right way to identify a .const segment in elf executable file?
The per-section flags value does not have such information and seems to be limited.I have seen implementations of ELF segment downloader where they don't download segments with file size of zero at all.
Thanks!

It's a long over-due answer, but anyway..
When segment memory size is greater than segment file size, should the ELF segment downloader fill the segment in memory with zeros as specified by memsize?
==> I think so. Some sections like .BSS (unintialized data) don't have space in the elf file but should have space in memory when loaded uninitialized. But C run-time initializes the data with zero before going into main() I understand.
Can there be a case where a section should be filled with a constant other than zero, i.e. a general case "constant fill" section?
==> Yes, I remember we can set fill pattern in a section. Search shows me FILL(expression) attribute.
What is the right way to identify a .const segment in elf executable file?
==> I think you could do something like
unsigned int __attribute__((__section__ (".const") FILE("0x1234"))) data[0x1000];
?

Related

Trying to replicate a CRC made with ielftool in srec_cat

So I'm trying to figure out a way to calculate a CRC with srec_cat before putting the code on a microcontroller. Right now, my post-build script uses the ielftool from IAR to do the calculation and insert it into the correct spot in the hex file.
I'm wondering how I can produce the same CRC with srec_cat, using the same hex file of course.
Here is the ielftool command that produces the CRC32 that I want to replicate:
--checksum APP_SYS_ApplicationCrc:4,crc32:1mi,0xffffffff;0x08060000-0x081fffff
APP_SYS_ApplactionCrc is the symbol where the checksum will be stored with a 4 byte offset added
crc32is the algorithm
1 specifies one’s complement
m reverses the input bytes and the final checksum
i initializes the checksum value with the start value
0xffffffff is the start value
And finally, 0x08060000-0x081fffff is the memory range for which the checksum will be calculated
I've tried a lot of things, but this, I think, is the closest I've gotten to the same command so far with srec_cat:
-crop 0x08060000 0x081ffffc -Bit_Reverse -crc32_b_e 0x081ffffc -CCITT -Bit_Reverse
-crop 0x08060000 0x081ffffc In a way specifies the memory range for which the CRC will be calculated
-Bit_Reverse should do the same thing as m in the ielftool when put in the right spot
-crc32_b_e is the algorithm. (I'm not sure yet if I need big endian _b_e or little endian _l_e)
0x081ffffc is the location in memory to place the CRC
-CCITT The initial seed (start value in ielftool) is all one bits (it's the default, but I figured I'd throw it in there)
Does anyone have ideas of how I can replicate the ielftool's CRC? Or am I just trying in vain?
I'm new to CRCs and don't know much more than the basics. Does it even matter anyway if I have exactly the same algorithm? Won't the CRC still work when I put the code on a board?
Note: I'm currently using ielftool 10.8.3.1326 and srec_cat 1.63

After many days of trying to figure out how to get the CRCs from each tool to match (and to make sure I was giving both tools the same data), I finally found a solution.
Based on Mark Adler's comment above I was trying to figure out how to get the CRC of a small amount of data such as an unsigned int. I finally had a lightbulb moment this morning and I realized that I simply needed to put a uint32_t with the value 123456789 in the code for the project I was already work on. Then I would place the variable at a specific location in memory using:
#pragma location=0x08060188
__root const uint32_t CRC_Data_Test = 123456789; //IAR specific pragma and keyword
This way I knew the variable location and length so could then tell the ielftool and srec_cat to only calculate the CRC over the area of that variable in memory.
I then took the elf file from the compiled project and created an intel hex file, so I could more easily look and make sure the correct variable data was at the correct address.
Next I sent the elf file through ielftool with this command:
ielftool proj.elf --checksum APP_SYS_ApplicationCrc:4,crc32:1mi,0xffffffff;0x08060188-0x0806018b proj.elf
And I sent the hex file through srec_cat with this command:
srec_cat proj.hex -intel -crop 0x08060188 0x0806018c -crc32_b_e 0x081ffffc -o proj_srec.hex -intel
After converting the elf with the CRC to a hex file and comparing two hex files I saw that the CRCs were very similar. The only difference was the endianness. Changing -crc32_b_e to -crc32_l_e got both tools to give me 9E 6C DF 18 as the CRC.
I then changed the memory address ranges for the CRC calculation to what they originally were (see the question) and I once again got the same CRC with both ielftool and srec_cat.

GNU Radio text file sink

I'm trying to teach myself basics of GNU Radio and DSP. I created a flowchart in GNU Radio Companion that takes a vector that is the binary representation of a single character (the character "1" as "00110001"), modulates, demodulates, and writes to a file sink.
The scope sink after demodulation looks like the values are returned (see below; appears to be correct pattern of 0s and 1s), but the file sink, although its size is 19 bytes, appears empty, or at least is not returning the correct values (I've looked at it in ASCII and Hex text editors). I assumed the single character transferred would result in 1 byte (or 8 bits) -- not 19 bytes. Changing some of the settings in the Polyphase Sync and adding a Repack Bits block after the binary slicer results in some characters in the output file, but never the right character.
My questions are:
Can GNU Radio take a single character, modulate/demodulate it, and return the same character?
Are there errors in my flowchart?
I'd appreciate any insights or suggestions, thank you.

How to manually construct a gzip so that compressed file is larger than original?

Suppose a 1KB file called data.bin, If it's possible to construct a gzip of it data.bin.gz, but much larger, how to do it?
How much larger could we theoretically get in GZIP format?

You can make it arbitrarily large. Take any gzip file and insert as many repetitions as you like of the five bytes: 00 00 00 ff ff after the gzip header and before the deflate data.

Summary:
With header fields/general structure: effect is unlimited unless it runs into software limitations
Empty blocks: unlimited effect by format specification
Uncompressed blocks: effect is limited to 6x
Compressed blocks: with apparent means, the maximum effect is estimated at 1.125x and is very hard to achieve
Take the gzip format (RFC1952 (metadata), RFC1951 (deflate format), additional notes for GNU gzip) and play with it as much as you like.
Header
There are a whole bunch of places to exploit:
use optional fields (original file name, file comment, extra fields)
bluntly append garbage (GNU gzip will issue a warning when decompressing)
concatenate multiple gzip archives (the format allows that, the resulting uncompressed data is, likewise, the concatenation or all chunks).
An interesting side effect (a bug in GNU gzip, apparently): gzip -l takes the reported uncompressed size from the last chunk only (even if it's garbage) rather than adding up values from all. So you can make it look like the archive is (absurdly) larger/smaller than raw data.
These are the ones that are immediately apparent, you may be able to find yet other ways.
Data
The general layout of "deflate" format is (RFC1951):
A compressed data set consists of a series of blocks, corresponding to
successive blocks of input data. The block sizes are arbitrary,
except that non-compressible blocks are limited to 65,535 bytes.
<...>
Each block consists of two parts: a pair of Huffman code trees that
describe the representation of the compressed data part, and a
compressed data part. (The Huffman trees themselves are compressed
using Huffman encoding.) The compressed data consists of a series of
elements of two types: literal bytes (of strings that have not been
detected as duplicated within the previous 32K input bytes), and
pointers to duplicated strings, where a pointer is represented as a
pair <length, backward distance>. The representation used in the
"deflate" format limits distances to 32K bytes and lengths to 258
bytes, but does not limit the size of a block, except for
uncompressible blocks, which are limited as noted above.
Full blocks
The 00 00 00 ff ff that Mark Adler suggests is essentially an empty, non-final block (RFC1951 section 3.2.3. for the 1st byte, 3.2.4. for the uncompressed block itself).
Btw, according to gzip overview at the official site and the source code, Mark is the author of the decompression part...
Uncompressed blocks
Using non-empty uncompressed blocks (see prev. section for references), you can at most create one for each symbol. The effect is thus limited to 6x.
Compressed blocks
In a nutshell: some inflation is achievable but it's very hard and the achievable effect is limited. Don't waste your time on them unless you have a very good reason.
Inside compressed blocks (section 3.2.5.), each chunk is [<encoded character(8-9 bits>|<encoded chunk length (7-11 bits)><distance back to data(5-18 bits)>], with lengths starting at 3. A 7-9-bit code unambiguously resolves to a literal character or a specific range of lengths. Longer codes correspond to larger lengths/distances. No space/meaningless stuff is allowed between chunks.
So the maximum for raw byte chunks is 9/8 (1.125x) - if all the raw bytes are with codes 144 - 255.
Playing with reference chunks isn't going to do any good for you: even a reference to a 3-byte sequence gives 25/24 (1.04x) at most.
That's it for static Huffman tables. Looking through the docs on dynamic ones, it optimizes the aforementioned encoding for the specific data or something. So, it should allow to make the ratio for the given data closer to the achievable maximum, but that's it.

Locating ELF shared library exports at runtime

It is possible to extract exported symbols of a loaded shared library using only its memory image?
I'm talking about the symbols listed in .dynsym section. As I understand, we can go this way:
Locate the base address of the library.For example, by reading /proc/<pid>/maps it is possible to find memory areas which are mapped from the library on disk, and then we can look for ELF magic bytes to find the ELF header which gives us the base address.
Find the PT_DYNAMIC segment from the program headers.Parse the ELF header, then iterate over the program headers to find the segment which contains the .dynamic section.
Extract the location of the dynamic symbol table.Iterate over the ElfN_Dyn structs to find the ones with d_tags DT_STRTAB and DT_SYMTAB. These will give us addresses of the string table (with symbol names) and the dynamic symbol table itself.
And this is where I stumbled. .dynamic section has a tag for the size of the string table (DT_STRSZ), but there is no indication of the symbol table size. It only contains the size of a single entry (DT_SYMENT). How can I retrieve the number of symbol entries in the table?
It should be possible to infer that from the size of the .dynsym section, but ELF files are represented as segments in memory. The section table is not required to be loaded into memory and can only be (reliably) accessed by reading the corresponding file.
I believe it is possible because the dynamic linker has to know the size of the symbol table. However, the dynamic loader may have stored it somewhere when the file had been loaded, and the linker is just using the cached value. Though it seems somewhat stupid to load the symbol table into memory, but to not load a handful of bytes with its size alongside.

The size of the dynamic symbol table must be inferred from the symbol hash table (DT_HASH or DT_GNU_HASH): this answer gives some code which does that.
The standard hash table (which is not used on GNU systems anymore) is quite simple. The first entry is nchain which is:
The number of symbol table entries should equal nchain
The GNU hash table is more complicated.

Bzip2 block header: 1AY&SY

This is the question about bzip2 archive format. Any Bzip2 archive consists of file header, one or more blocks and tail structure. All blocks should start with "1AY&SY", 6 bytes of BCD-encoded digits of the Pi number, 0x314159265359. According to the source of bzip2:
/*--
A 6-byte block header, the value chosen arbitrarily
as 0x314159265359 :-). A 32 bit value does not really
give a strong enough guarantee that the value will not
appear by chance in the compressed datastream. Worst-case
probability of this event, for a 900k block, is about
2.0e-3 for 32 bits, 1.0e-5 for 40 bits and 4.0e-8 for 48 bits.
For a compressed file of size 100Gb -- about 100000 blocks --
only a 48-bit marker will do. NB: normal compression/
decompression do *not* rely on these statistical properties.
They are only important when trying to recover blocks from
damaged files.
--*/
The question is: Is it true, that all bzip2 archives will have blocks with start aligned to byte boundary? I mean all archives created by reference implementation of bzip2, the bzip2-1.0.5+ utility.
I think that bzip2 may parse the stream not as byte stream but as bit stream (the block itself is encoded by huffman, which is not byte-aligned by design).
So, in other words: If grep -c 1AY&SY greater (huffman may generate 1AY&SY inside block) or equal to count of bzip2 blocks in the file?

BZIP2 looks at a bit stream.
From http://blastedbio.blogspot.com/2011/11/random-access-to-bzip2.html:
Anyway, the important bits are that a BZIP2 file contains one or more
"streams", which are byte aligned, each containing one (zero?) or more
"blocks", which are not byte aligned, followed by an end of stream
marker (the six bytes 0x177245385090 which is the square root of pi as
a binary coded decimal (BCD), a four byte checksum, and empty bits for
byte alignment).
The bzip2 wikipedia article also alludes to bit-block alignment (see the File Format section), which seems to be inline from what I remember from school (had to implement the algorithm...).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas