What are my md5 bits? - cryptography

I'm trying to code an md5 hashing function in Python but it doesn't seem to work. I've isolated the problem to the message bits that are to be hashed. Yes, I'm actually converting each byte to bits and forming a bit message (I want to study the algorithm on a bit level). And this is where things are falling apart; my bit string is not correctly formed.
The simplest message would be "": it's 0 bytes long, so the padding would be a "1" followed by 511 "0"s (the last 64 bits encode the message length, which, as already said, is just 0).
10000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
00000000000000000000000000000000
I'm feeding 32-bit chunks of data into the transform function at a time. I've tried manually positioning the 1 in every position of the first chunk, as well as the last chunk (little endian). Where should the "1" be?
Thank you.
Update: The correct first 32-bit word fed into the transform should in fact be 00000000000000000000000010000000, which int(x,2) evaluates to 128. This mess was due to my transform format, A = rotL((A+F(B,C,D)+int(messageBits[0],2)+sinList[0]), s11)+B, using int() to interpret the bit strings as integer data: int(x,2) treats the leftmost character as the most significant bit, so 100.... came out as a very huge number.
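A quick Python check of the two candidate placements shows the difference (just int() on the two bit strings, nothing MD5-specific):

# int(x, 2) reads the leftmost character as the most significant bit.
first_word_wrong = "10000000000000000000000000000000"
first_word_right = "00000000000000000000000010000000"
print(int(first_word_wrong, 2))  # 2147483648 -- the "very huge number"
print(int(first_word_right, 2))  # 128, the value M0 should have for the empty message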

MD5 uses big-endian convention at bit level, then little-endian convention at byte level.
The input is an ordered sequence of bits. Eight consecutive bits are a byte. A byte has a numerical value between 0 and 255; each bit in a byte has value 128, 64, 32, 16, 8, 4, 2 or 1, in that order (that's what "big-endian at bit level" means).
Four consecutive bytes are a 32-bit word. The numerical value of the word is between 0 and 4294967295. The first byte is least significant in that word ("little-endian at byte level"). Hence, if the four bytes are a, b, c and d in that order, then the word numerical value is a+256*b+65536*c+16777216*d.
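As a quick illustration (my own check, using Python's struct module), the formula agrees with a little-endian 32-bit unpack:

import struct
a, b, c, d = 0x01, 0x02, 0x03, 0x04                 # four consecutive bytes
print(a + 256*b + 65536*c + 16777216*d)             # 67305985
print(struct.unpack("<I", bytes([a, b, c, d]))[0])  # 67305985, "<I" = little-endian 32-bit word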
In software applications, input is almost always a sequence of bytes (its length, in bits, is a multiple of 8). The aggregation of bits into bytes is assumed to have already taken place. Thus, the extra '1' padding bit will be the first bit of the next byte, and, since the bit-level convention is big-endian, that next byte will have numerical value 128 (0x80).
For an empty message, the very first bit will be the '1' padding bit, followed by a whole bunch of zeros. The message length is also zero, which encodes to yet more zeros. Therefore, the padded message block will be a single '1' followed by 511 '0', as you show. When bits are assembled into bytes, the first byte will have value 128, followed by 63 bytes of value 0. When bytes are grouped into 32-bit words, the first word (M0) will have numerical value 128, and the 15 other words (M1 to M15) will have numerical value 0.
Refer to the MD5 specification for details. What I describe above is what is explained in the first paragraph of section 2 of RFC 1321. The same encoding is used for the message bit length (at the end of the padding), and for writing out the final hash result.
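To make the grouping concrete, here is a small Python sketch (mine, not from the RFC, but following the rules above) that pads the empty message and splits the single 512-bit block into the words M0..M15:

import struct

message = b""               # the empty message
bit_len = 8 * len(message)  # 0 bits

# A '1' bit followed by seven '0' bits is the byte 0x80; pad with zero bytes
# until the length is 56 mod 64, then append the 64-bit length, little-endian.
padded = message + b"\x80"
padded += b"\x00" * ((56 - len(padded)) % 64)
padded += struct.pack("<Q", bit_len)

words = struct.unpack("<16I", padded)  # M0..M15 of the only block
print(words[0])   # 128
print(words[1:])  # fifteen zeros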

Related

What is the bias exponent?

My question concerns the IEEE 754 Standard
So I'm going through the steps necessary to convert denary numbers into a floating point number (IEEE 754 standard), but I don't understand the purpose of determining the biased exponent. I can't get my head around that step: what is it exactly, and why is it done?
Could anyone explain what this is? Please keep in mind that I have just started a computer science conversion masters, so I won't completely understand certain choices of terminology!
If you think it's too long to explain, please point me in the right direction!
The exponent in an IEEE 32-bit floating point number is 8 bits.
In most cases where we want to store a signed quantity in an 8-bit field, we use a signed representation. 0x80 is -128, 0xFF is -1, 0x00 is 0, up to 0x7F is 127.
But that's not how the exponent is represented. The exponent field is treated as if it were an unsigned 8-bit number that is 127 too large: take the unsigned value in the exponent field and subtract 127 to get the actual exponent. So 0x00 represents -127 and 0x7F represents 0.
For 64-bit floats, the field is 11 bits, with a bias of 1023, but it works the same.
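A quick Python sketch (my own illustration) that pulls the raw exponent field out of a 32-bit float and subtracts the bias of 127:

import struct

def exponent_fields(x):
    # Reinterpret the float's bits as an unsigned 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    raw = (bits >> 23) & 0xFF  # the 8-bit exponent field
    return raw, raw - 127      # (stored field, actual exponent)

print(exponent_fields(1.0))   # (127, 0)
print(exponent_fields(0.25))  # (125, -2)
print(exponent_fields(6.0))   # (129, 2)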
A floating point number could be represented (but is not) as a sign bit s, an exponent field e, and a mantissa field m, where e is a signed integer and m is an unsigned fractional value. The value of that number would then be computed as (-1)^s · 2^e · m. But this would not allow important special cases to be represented.
Note that one could increase the exponent by n and shift the mantissa right by n (or vice versa) without changing the value of the number. This allows nearly every number to have its exponent adjusted so that the mantissa starts with a 1 (one exception is of course 0, a special FP number). If one does so, one has normalized FP numbers, and since the mantissa now always starts with 1, the leading 1 does not have to be stored in memory; the saved bit is used to increase the precision of the FP number. Thus it is not the mantissa m itself that is stored, but a mantissa field mf.
But how is 0 now represented? And what about FP numbers that already have the maximum or minimum exponent field and, due to their normalization, cannot be made larger or smaller? And what about "not a number" values, such as the result of 0/0?
Here comes the idea of a biased exponent: a bias of roughly half the maximum field value (127 for single precision) is added to the actual exponent, giving the biased exponent be that is stored in the field. To compute the value of the FP number, this bias of course has to be subtracted again. But all normalized FP numbers now have 0 < be < (all 1s), so the special biased exponents 0 and (all 1s) can be reserved for special purposes.
be = 0, mf = 0: Exact 0.
be = 0, mf ≠ 0: A denormalized number, i.e. mf is the real mantissa that does not have a leading 1.
be = (all 1), mf = 0: Infinity
be = (all 1), mf ≠ 0: Not a number (NaN)
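These four cases can be checked directly; here is a small Python sketch (my own, using 32-bit floats) that extracts the biased exponent and mantissa fields:

import math
import struct

def fields(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    be = (bits >> 23) & 0xFF     # biased exponent field
    mf = bits & ((1 << 23) - 1)  # mantissa field
    return be, mf

print(fields(0.0))       # (0, 0)         -> exact zero
print(fields(1e-45))     # (0, nonzero)   -> denormalized number
print(fields(math.inf))  # (255, 0)       -> infinity
print(fields(math.nan))  # (255, nonzero) -> NaN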

CRC of input data shorter than poly width

I'm in the process of writing a paper during my studies on implementing CRC in Excel with VBA.
I've created a fairly straightforward, modular algorithm that uses Ross's parametrized model.
It works flawlessly for any length polynomial and any combination of parameters except one: when the input data is shorter than the width of the polynomial and an initial value ("INIT") is chosen that has bits set "past" the length of the input data.
Example:
Input Data: 0x4C
Poly: 0x1021
Xorout: 0x0000
Refin: False
Refout: False
If I choose no INIT or any INIT like 0x##00, I get the same checksum as any of the online CRC generators. If any bit of the last two hex characters is set - like 0x0001 - my result is invalid.
I believe the question boils down to "How is the register initialized if only one byte of input data is present for a two byte INIT parameter?"
It turns out I was misled by (or may very well have misinterpreted) the explanation of how to use the INIT parameter on the sunshine2k website.
The INIT value must not be XORed with the first n bits of the input per se (n being the width of the register / cropped poly / checksum), but must only be XORed in after the n 0-bits have been appended to the input data.
This distinction does not matter when the input data is n bits or longer, but it does matter when the input data is shorter than that.
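To illustrate, here is a small bit-by-bit sketch in Python (my own, using the parameters from the question: poly 0x1021, width 16, no reflection, XOROUT 0). It compares the usual register-preload form of INIT with the zero-augmented form where INIT is XORed in only after the n zero bits have been appended; the two agree even for a single input byte:

def crc_direct(data, poly=0x1021, width=16, init=0x0001):
    # Conventional MSB-first bitwise CRC: register preloaded with INIT.
    reg = init
    mask = (1 << width) - 1
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            feedback = ((reg >> (width - 1)) & 1) ^ bit
            reg = (reg << 1) & mask
            if feedback:
                reg ^= poly
    return reg

def crc_augmented(data, poly=0x1021, width=16, init=0x0001):
    # Zero-augmented variant: append `width` zero bits, XOR INIT into the
    # first `width` bits of the augmented message, then do a plain division
    # with a zero-initialized register.
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    bits += [0] * width
    for i in range(width):
        bits[i] ^= (init >> (width - 1 - i)) & 1
    reg = 0
    mask = (1 << width) - 1
    for bit in bits:
        msb = (reg >> (width - 1)) & 1
        reg = ((reg << 1) & mask) | bit
        if msb:
            reg ^= poly
    return reg

print(hex(crc_direct(b"\x4C")), hex(crc_augmented(b"\x4C")))  # identical results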

What's the proper way to get a fixed-length bytes representation of an ECDSA Signature?

I'm using python and cryptography.io to sign and verify messages. I can get a DER-encoded bytes representation of a signature with:
cryptography_priv_key.sign(message, hash_function)
...per this document: https://cryptography.io/en/latest/hazmat/primitives/asymmetric/ec/
A DER-encoded ECDSA Signature from a 256-bit curve is, at most, 72 bytes; see: ECDSA signature length
However, depending on the values of r and s, it can also be 70 or 71 bytes. Indeed, if I examine the length of the output of this function, it varies from 70 to 72. Do I have that right so far?
I can decode the signature to ints r and s. These are both apparently 32 bytes, but it's not clear to me whether that will always be so.
Is it safe to cast these two ints to bytes and send them over the wire, with the intention of encoding them again on the other side?
The simple answer is: yes, they will always fit in 32 bytes.
The more complete answer is that it depends on the curve. For example, a 256-bit curve has an order of 256 bits; similarly, a 128-bit curve only has an order of 128 bits.
Divide that number by eight to find the size of r and s in bytes.
It gets slightly more complicated when the order isn't divisible by eight, as with secp521r1, where the order is a 521-bit number.
In that case, we round up: 521 / 8 is 65.125, so 66 bytes are needed to fit the number.
It is safe to send them over the wire and encode them again, as long as you keep track of which is r and which is s (and pad each to the fixed length, since an individual r or s can occasionally be a byte shorter).
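For reference, a short sketch using the cryptography package (recent versions; decode_dss_signature / encode_dss_signature handle the DER side) showing one way to get a fixed 64-byte wire format by zero-padding r and s to the curve's byte size:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.asymmetric.utils import (
    decode_dss_signature,
    encode_dss_signature,
)

private_key = ec.generate_private_key(ec.SECP256R1())
message = b"example message"

der_sig = private_key.sign(message, ec.ECDSA(hashes.SHA256()))  # 70-72 bytes
r, s = decode_dss_signature(der_sig)

# Fixed-length wire format: pad each integer to the curve's byte size.
n_bytes = (private_key.curve.key_size + 7) // 8                 # 32 for SECP256R1
wire = r.to_bytes(n_bytes, "big") + s.to_bytes(n_bytes, "big")  # always 64 bytes

# Receiving side: rebuild the DER signature and verify it.
r2 = int.from_bytes(wire[:n_bytes], "big")
s2 = int.from_bytes(wire[n_bytes:], "big")
private_key.public_key().verify(
    encode_dss_signature(r2, s2), message, ec.ECDSA(hashes.SHA256())
)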

How to write integer value "60" in 16bit binary, 32bit binary & 64bit binary

How to write integer value "60" in other binary formats?
8bit binary code of 60 = 111100
16bit binary code of 60 = ?
32bit binary code of 60 = ?
64bit binary code of 60 = ?
is it 111100000000 for 16 bit binary?
why does the 8-bit binary code contain 6 bits instead of 8?
I googled but wasn't able to find the answers. Please help, as I'm still a beginner in this area.
Imagine you're writing the decimal value 60. You can write it using 2 digits, 4 digits or 8 digits:
1. 60
2. 0060
3. 00000060
In our decimal notation, the most significant digits are to the left, so increasing the number of digits for representation, without changing the value, means just adding zeros to the left.
Now, in most binary representations, this would be the same. The decimal 60 needs only 6 bits to represent it, so an 8bit or 16bit representation would be the same, except for the left-padding of zeros:
1. 00111100
2. 00000000 00111100
Note: Some OSs, software, hardware or storage devices might have different endianness, which means they might store 16-bit values with the least significant byte first, then the most significant byte. Binary notation is still MSB-on-the-left, as above, but reading the memory of such little-endian devices will show that any 16-bit chunk is internally byte-reversed:
1. 00111100 - 8bit - still the same.
2. 00111100 00000000 - 16bit, bytes are flipped.
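In Python terms (just an illustration), widening the representation is nothing more than zero-padding the same bit string:

n = 60
print(format(n, "b"))    # 111100 -- minimal form, leading zeros dropped
print(format(n, "08b"))  # 00111100
print(format(n, "016b")) # 0000000000111100
print(format(n, "032b")) # 00000000000000000000000000111100
print(format(n, "064b")) # same idea, 64 digits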
Every number has just one binary representation; widening it only adds zeros.
On a 16/32/64-bit system, 111100 (60) would just look the same with more 0s added in front of the number (usually not shown),
so on 16 bit it would be 0000000000111100,
on 32 bit 00000000000000000000000000111100,
and so on.
For storage, endianness matters ... otherwise, zeros are always prefixed up to the bit width, so 60 would be...
8bit: 00111100
16bit: 0000000000111100
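To see the storage-endianness point, a quick Python sketch (my own) that packs 60 into 16/32/64-bit fields:

import struct

n = 60
print(struct.pack(">H", n).hex())  # 003c -- 16-bit, big-endian storage
print(struct.pack("<H", n).hex())  # 3c00 -- 16-bit, little-endian (bytes flipped)
print(struct.pack("<I", n).hex())  # 3c000000 -- 32-bit, little-endian
print(struct.pack("<Q", n).hex())  # 3c00000000000000 -- 64-bit, little-endian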

What does alignment to 16-byte boundary mean in x86

Intel's official optimization guide has a chapter on converting from MMX code to SSE, where they state the following:
Computation instructions which use a memory operand that may not be aligned to a 16-byte boundary must be replaced with an unaligned 128-bit load (MOVDQU) followed by the same computation operation that uses instead register operands.
(chapter 5.8 Converting from 64-bit to 128-bit SIMD Integers, pg. 5-43)
I can't understand what they mean by "may not be aligned to a 16-byte boundary", could you please clarify it and give some examples?
Certain SIMD instructions, which perform the same operation on multiple data elements, require that the memory address of the data be aligned to a certain byte boundary. This effectively means that the address of the memory your data resides in needs to be divisible by the number of bytes required by the instruction.
So in your case the alignment is 16 bytes (128 bits), which means the memory address of your data needs to be a multiple of 16. E.g. 0x00010 would be 16 byte aligned, while 0x00011 would not be.
How to get your data to be aligned depends on the programming language (and sometimes compiler) you are using. Most languages that have the notion of a memory address will also provide you with means to specify the alignment.
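As a small illustration in Python (my own sketch with ctypes, just to show what "address divisible by 16" means in practice):

import ctypes

# Over-allocate a raw buffer, then pick the first offset inside it whose
# address is a multiple of 16, i.e. a 16-byte aligned address.
raw = (ctypes.c_char * (64 + 16))()
base = ctypes.addressof(raw)
aligned = base + ((-base) % 16)

print(hex(base), base % 16 == 0)        # the raw address may or may not be aligned
print(hex(aligned), aligned % 16 == 0)  # True: this address is 16-byte aligned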
I'm guessing here, but could it be that "may not be aligned to a 16-byte boundary" means that this memory location has previously been aligned to a smaller value (4 or 8 bytes) for some other purpose, and now, to execute SSE instructions on this memory, you need to load it into a register explicitly?
Data that's aligned on a 16-byte boundary has a memory address that's a multiple of 16, which means the lowest four bits of the address are zero. Each byte is 8 bits, so a 16-byte boundary falls at every 128 bits of memory.
Similarly, memory aligned on a 32-bit (4-byte) boundary has a memory address that's a multiple of four, because you group four bytes together to form a 32-bit word.