7z zip format header specifications - header

The 7z archive format has two headers: a start header and an end header. The start header is 32 bytes long and has 6 fields. The end header also has fields, but they are not clearly explained in the documentation available on the 7-zip.org site. Can anyone share clear documentation of the 7z end header format?
For example, something like this description of the ZIP format: https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html
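For reference, the 32-byte start header described above can be decoded with a few lines of Python. This is a minimal sketch based on the layout given in 7zFormat.txt (shipped in the DOC folder of the 7-Zip source and the LZMA SDK), assuming little-endian fields and the standard CRC-32 that `zlib.crc32` computes. The function and dictionary key names are illustrative only. The "end header" in the question appears to be what 7zFormat.txt simply calls the Header, which the start header points to via NextHeaderOffset.

```python
import struct
import zlib

# 6-byte signature at the start of every 7z archive: '7', 'z', BC, AF, 27, 1C
SIGNATURE = b"7z\xbc\xaf\x27\x1c"


def parse_start_header(data: bytes) -> dict:
    """Decode the fixed 32-byte signature/start header of a .7z file."""
    if len(data) < 32:
        raise ValueError("need at least 32 bytes")
    # Layout (little-endian): Signature[6], Major, Minor, StartHeaderCRC (u32),
    # NextHeaderOffset (u64), NextHeaderSize (u64), NextHeaderCRC (u32)
    (sig, major, minor, start_crc,
     next_off, next_size, next_crc) = struct.unpack_from("<6sBBIQQI", data, 0)
    if sig != SIGNATURE:
        raise ValueError("not a 7z archive")
    # StartHeaderCRC is the CRC-32 of the 20 bytes that follow it (offsets 12..31)
    if zlib.crc32(data[12:32]) != start_crc:
        raise ValueError("start header CRC mismatch")
    return {
        "version": (major, minor),
        "next_header_offset": next_off,    # counted from the end of these 32 bytes
        "next_header_size": next_size,
        "next_header_crc": next_crc,
        "next_header_pos": 32 + next_off,  # absolute file offset of the "end" header
    }
```

To reach the end header, seek to `next_header_pos`, read `next_header_size` bytes, and compare their `zlib.crc32` with `next_header_crc`. According to 7zFormat.txt, that block starts with a property ID (kHeader, or kEncodedHeader when the header itself is compressed) followed by the variable-length property records documented in the same file.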

Related

Does Zip use a dictionary for compression? Is it possible to extract that dictionary and dump it to a text file?

I would like to extract the dictionary of any compression algorithm (zip would be the one I would go for since it is widely used) and dump this dictionary to a text file.
I looked at the Wikipedia page to try to find the answer in the header description, but I didn't find an explicit answer to my question.
Zip can use several compression methods, with one method chosen per compressed file.
For instance, the Deflate and LZMA formats use a dictionary that is empty at the beginning and has a length of min(m, n), where m is the number of uncompressed bytes already processed and n is a preset value (32 KB for Deflate).
So, for those formats, the dictionary is just a sliding window over the uncompressed data already processed, not a separate structure stored in the file.
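One way to see that the Deflate "dictionary" is only the last 32 KB of input is to measure how cheaply a repeated block compresses. The sketch below uses Python's `zlib` (a Deflate implementation) and random data, which is incompressible on its own, so the second copy of a block is cheap only if it can be encoded as back-references into the window. The helper name `repeat_cost` and the sizes are arbitrary choices for illustration; exact byte counts will vary with the zlib build and level.

```python
import random
import zlib


def repeat_cost(block_len: int, gap: int, seed: int = 0) -> int:
    """Extra compressed bytes needed to encode a second copy of a random block.

    Layout: block + `gap` bytes of random filler + the same block again.
    The second copy is cheap only if Deflate can still "see" the first copy
    inside its sliding window (the dictionary), i.e. if it is close enough.
    """
    rng = random.Random(seed)       # fixed seed so runs are repeatable
    block = rng.randbytes(block_len)  # Python 3.9+
    filler = rng.randbytes(gap)
    without_repeat = len(zlib.compress(block + filler))
    with_repeat = len(zlib.compress(block + filler + block))
    return with_repeat - without_repeat


# Second copy starts 5,000 bytes after the first: inside the 32 KB window.
print("near:", repeat_cost(4000, 1000))
# Second copy starts 44,000 bytes after the first: outside the window.
print("far: ", repeat_cost(4000, 40000))
```

Expect the "near" figure to be small (the copy becomes a handful of length/distance pairs) and the "far" figure to be close to the full 4,000 bytes, because by then the first copy has already slid out of the dictionary.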

PDF format: function of a %-started sequence

What is the function of the hex sequence "25 E2 E3 CF D3" found at the beginning of some documents? As far as I understand, it should be a comment, but its content is not meaningful text, and the same sequence occurs in many documents.
It identifies the PDF file as containing binary data.
From the freely available PDF Reference (section 7.5.2, p. 40):
If a PDF file contains binary data, as most do (see 7.2, "Lexical Conventions"), the header line shall be
immediately followed by a comment line containing at least four binary characters—that is, characters whose
codes are 128 or greater. This ensures proper behaviour of file transfer applications that inspect data near the
beginning of a file to determine whether to treat the file’s contents as text or as binary.
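A minimal sketch of checking for that marker, assuming the `%PDF-` header sits at byte 0 (real readers are more lenient about leading junk). The function name `has_binary_marker` is hypothetical; it splits on any of the three PDF end-of-line forms (CR, LF, CR+LF) and counts comment bytes with codes of 128 or greater.

```python
import re


def has_binary_marker(data: bytes) -> bool:
    """True if the header line is followed by a comment with >= 4 bytes >= 128."""
    # PDF end-of-line markers may be CR, LF or CR+LF; only the first lines matter
    lines = re.split(rb"\r\n|\r|\n", data[:1024], maxsplit=2)
    if len(lines) < 2 or not lines[0].startswith(b"%PDF-"):
        return False
    comment = lines[1]
    # Iterating over bytes yields ints, so b >= 128 tests the raw byte value
    return comment.startswith(b"%") and sum(b >= 128 for b in comment[1:]) >= 4


print(has_binary_marker(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"))  # True
print(has_binary_marker(b"%PDF-1.4\n1 0 obj\n"))                     # False
```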

The PDF file starts with the header %PDF. Can a valid PDF file have more than one such header?

The PDF specification says that a file can have more than one trailer, but it does not say anything about multiple headers. Can it have multiple headers?
It can have as many as you want, since the % symbol introduces a comment inside a PDF file, so the line is not actually a "header" in the syntactic sense.
From the PDF specification:
7.2.3 Comments
Any occurrence of the PERCENT SIGN (25h) outside a string or stream
introduces a comment. The comment consists of all characters after the
PERCENT SIGN and up to but not including the end of the line,
including regular, delimiter, SPACE (20h), and HORIZONTAL TAB
characters (09h). A conforming reader shall ignore comments, and treat
them as single white-space characters. That is, a comment separates
the token preceding it from the one following it.
EXAMPLE:
The PDF fragment in this example is syntactically equivalent to just the tokens abc and 123.
abc% comment ( /%) blah blah blah
123
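The rule above can be sketched as a small filter that replaces every comment with a single space while leaving `%` inside literal strings untouched. This is an illustrative simplification: the name `strip_comments` is hypothetical, and stream data (which may contain arbitrary bytes, including `%`) is not handled, so it should only be run on the non-stream parts of a file. Note that a `%PDF-1.x` line is treated like any other comment.

```python
def strip_comments(src: bytes) -> bytes:
    """Replace each PDF comment with a single space, leaving literal strings alone.

    Simplification: stream data is not recognised, so apply this only to
    the non-stream parts of a file.
    """
    out = bytearray()
    depth = 0  # nesting level of ( ... ) literal strings; 0 = outside a string
    i = 0
    while i < len(src):
        c = src[i]
        if depth:
            if c == 0x5C:  # backslash: copy the escape and the escaped byte as-is
                out += src[i:i + 2]
                i += 2
                continue
            if c == 0x28:    # '(' nests inside a string
                depth += 1
            elif c == 0x29:  # ')' closes one level
                depth -= 1
            out.append(c)
            i += 1
        elif c == 0x28:      # '(' opens a literal string
            depth = 1
            out.append(c)
            i += 1
        elif c == 0x25:      # '%' outside a string starts a comment
            while i < len(src) and src[i] not in (0x0A, 0x0D):
                i += 1       # skip up to, but not including, the end of line
            out.append(0x20)  # the comment counts as one white-space character
        else:
            out.append(c)
            i += 1
    return bytes(out)


fragment = b"abc% comment ( /%) blah blah blah\n123"
print(strip_comments(fragment).split())  # [b'abc', b'123']
```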

How do I parse a .eml file using C/C++?

How do I parse .eml files using C/C++ (or preferably Objective-C)?
I looked around but couldn't find any details about the structure of the file.
EML files are plain ASCII (7-bit) files. The file format is specified in RFC 2822. The header section is separated from the body by an empty line.
If you will be dealing with emails that contain attachments, or characters whose values are greater than 127, you will need a base64 decoder (and often a quoted-printable decoder as well). Maybe this link will help.
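The structure described above (header fields, an empty line, then the body) is simple enough to split by hand. The sketch below is in Python rather than C/C++ or Objective-C, but the logic translates directly. The name `split_eml` is hypothetical, and it makes simplifying assumptions: bare LF line endings are accepted alongside CRLF, folded header lines are joined with a single space, and the body is returned undecoded.

```python
def split_eml(raw: bytes) -> tuple[list[tuple[str, str]], bytes]:
    """Split an RFC 2822 message into (header fields, raw body)."""
    # RFC 2822 lines end in CRLF; files on disk often use bare LF, so normalise.
    text = raw.replace(b"\r\n", b"\n")
    # The first empty line separates the header section from the body.
    head, _, body = text.partition(b"\n\n")
    headers: list[tuple[str, str]] = []
    for line in head.decode("ascii", errors="replace").split("\n"):
        if line[:1] in (" ", "\t") and headers:
            # Folded line: continuation of the previous field. Simplification:
            # surrounding whitespace is collapsed to one space when unfolding.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, value = line.split(":", 1)
            headers.append((name.strip(), value.strip()))
    return headers, body


raw = b"From: a@example.com\r\nSubject: Hello\r\n world\r\n\r\nBody line\r\n"
headers, body = split_eml(raw)
print(headers)  # [('From', 'a@example.com'), ('Subject', 'Hello world')]
print(body)     # b'Body line\n'
```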
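For attachments, the decoding does not have to be written by hand if Python is an option: the standard `email` package parses the MIME structure and undoes base64 or quoted-printable transfer encodings. A minimal sketch, where `extract_attachments` and the sample message with boundary `XX` are made up for illustration:

```python
import email
from email import policy


def extract_attachments(raw: bytes) -> dict[str, bytes]:
    """Map attachment filename -> decoded bytes for every attachment part."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    found = {}
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            # decode=True undoes the Content-Transfer-Encoding (e.g. base64)
            found[part.get_filename()] = part.get_payload(decode=True)
    return found


raw = (
    b'Content-Type: multipart/mixed; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--XX\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b'Content-Disposition: attachment; filename="blob.bin"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"AP9kYXRh\r\n"
    b"--XX--\r\n"
)
print(extract_attachments(raw))  # {'blob.bin': b'\x00\xffdata'}
```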

dll files compared to gzip files

Okay, the title isn't very clear.
Given a byte array (read from a database blob) that represents EITHER the sequence of bytes contained in a .dll or the sequence of bytes representing the gzip'd version of that dll, is there a (relatively) simple signature that I can look for to differentiate between the two?
I'm trying to puzzle this out on my own, but I've discovered I can save a lot of time by asking for help. Thanks in advance.
Check whether its first two bytes are the gzip magic number 0x1f 0x8b (see RFC 1952). Or just try to gunzip it; the operation will fail if the DLL is not gzipped.
A gzip file should be fairly straightforward to identify, since it consists of a header, a footer, and some other distinguishable elements in between.
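Both suggestions from the answer above can be sketched in a few lines. The helper names are hypothetical. As an extra assumption beyond the answer, the sketch also checks for `MZ`, the DOS header signature that Windows PE files such as DLLs begin with, so an unrecognised blob can be reported as neither.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # ID1, ID2 from RFC 1952
MZ_MAGIC = b"MZ"          # DOS header signature at the start of PE files such as DLLs


def classify_blob(blob: bytes) -> str:
    """Guess whether a blob is gzip data, a raw DLL, or neither."""
    if blob[:2] == GZIP_MAGIC:
        return "gzip"
    if blob[:2] == MZ_MAGIC:
        return "dll"
    return "unknown"


def maybe_gunzip(blob: bytes) -> bytes:
    """The 'just try to gunzip it' approach: return the input unchanged on failure."""
    try:
        return gzip.decompress(blob)
    except (OSError, EOFError, zlib.error):
        # OSError (gzip.BadGzipFile on 3.8+) for a wrong magic number,
        # EOFError for truncated data, zlib.error for a corrupt payload
        return blob


dll_bytes = b"MZ\x90\x00"  # hypothetical first bytes of a DLL
print(classify_blob(dll_bytes), classify_blob(gzip.compress(dll_bytes)))  # dll gzip
```

The magic-number check is cheap and never touches the payload; the try-to-decompress route is more conclusive but does the full decompression work.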
From Wikipedia:
"gzip" is often also used to refer to the gzip file format, which is:
- a 10-byte header, containing a magic number, a version number and a time stamp
- optional extra headers, such as the original file name
- a body, containing a DEFLATE-compressed payload
- an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data
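The fixed header and footer in that list can be read directly with `struct`. In RFC 1952 terms, the 10-byte header is ID1, ID2, CM (compression method), FLG, MTIME (4 bytes), XFL and OS, and the footer is CRC32 followed by ISIZE (the uncompressed length modulo 2^32), all little-endian. This sketch assumes a single-member file, so the last 8 bytes are that member's footer; `parse_gzip_member` is a hypothetical name.

```python
import gzip
import struct
import zlib

FNAME = 0x08  # FLG bit: an original file name follows the 10-byte header


def parse_gzip_member(data: bytes) -> dict:
    """Read the fixed header and footer of a single-member gzip file (RFC 1952)."""
    # 10-byte header, little-endian: ID1 ID2 CM FLG MTIME(4) XFL OS
    id1, id2, cm, flg, mtime, xfl, os_id = struct.unpack_from("<BBBBIBB", data, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not gzip data")
    # 8-byte footer: CRC-32 of the uncompressed data, then ISIZE.
    # Assumes one member; with several members this is only the last one's footer.
    crc32, isize = struct.unpack_from("<II", data, len(data) - 8)
    return {
        "method": cm,  # 8 = Deflate
        "mtime": mtime,
        "has_name": bool(flg & FNAME),
        "extra_flags": xfl,
        "os": os_id,
        "crc32": crc32,
        "isize": isize,
    }


blob = gzip.compress(b"hello")
info = parse_gzip_member(blob)
print(info["method"], info["isize"], info["crc32"] == zlib.crc32(b"hello"))  # 8 5 True
```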
You might also check whether the gzip data contains more than one member (record), since each member has its own header.
You can find specific information on this file format (specifically the member header which is linked) here.