How to fold the pseudo-header checksum into the csum field when using partial checksum offloading - UDP

I'm trying to understand something related to checksum offloading.
When using partial checksum offloading, the checksum over the pseudo header is calculated on the host and placed into the UDP checksum field, to be used in the full calculation inside the NIC.
But the pseudo-header checksum can be wider than 16 bits, so how is it supposed to be folded into the 16-bit checksum field? When the entire calculation is done on the host, the checksum value from the pseudo header is used as is, which suggests that regular folding (like the one done at the end of the full checksum: add the top half to the bottom half) is not right here, as it would change the value.
It looks like there should be a way to fold the checksum from 32/64 bits down to 16 without losing its "value" with respect to the entire calculation, but I can't find anything about it online.
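For reference, here is a minimal sketch (in Python; the function names are mine, not from any particular driver or stack) of the usual end-around-carry folding used with one's-complement checksums. Because 2^16 is congruent to 1 modulo 2^16 - 1, adding the high half back into the low half does not change the one's-complement value, so a sum folded this way contributes exactly the same amount when the NIC continues the calculation over the header and payload.

# Illustrative sketch: fold a wide one's-complement sum to 16 bits
# with end-around carries; this preserves the value mod (2**16 - 1).
def csum_fold(total):
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

# Sum the IPv4/UDP pseudo-header fields and fold the result; the folded
# 16-bit value is what goes into the UDP checksum field for partial offload.
def udp_pseudo_header_sum(src_ip, dst_ip, udp_length):
    # src_ip/dst_ip as 32-bit integers, udp_length = UDP header + payload bytes,
    # 17 = the UDP protocol number carried in the pseudo header.
    total = (src_ip >> 16) + (src_ip & 0xFFFF)
    total += (dst_ip >> 16) + (dst_ip & 0xFFFF)
    total += 17 + udp_length
    return csum_fold(total)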

How to decide when to reflect or XOR CRC data?

I found multiple optimal CRC-32 polynomials on the CRC Polynomial Zoo site of Philip Koopman. Now I want to generate a CRC lookup table for one of the polynomials, by using the software pycrc.
To generate a CRC lookup table you have to provide the following information for the chosen polynomial:
Reflected in (boolean)
Reflected out (boolean)
XOR in (hex value)
XOR out (hex value)
For some polynomials I found the above parameters in a specification (for instance an AUTOSAR specification for the polynomial "F4ACFB13"), but what parameters should I choose if there is no specification for a certain polynomial? The Koopman site doesn't seem to provide the recommended parameters to use.
I have already tried to find an explanation of how to choose these parameters, but I could only find explanations of how to implement them, not how to choose them. Most websites recommend searching for specifications describing "common CRC polynomials", because they provide the optimal parameters.
Generally you are trying to match the CRC used in some existing protocol. In that case you need to do the same thing you did for the AUTOSAR CRC: find the specification for the CRC. Or you need to get several examples of messages and correct CRCs and try to reverse-engineer the CRC parameters.
You can find over a hundred CRC definitions here.
If you are creating your own protocol from scratch, then you can select any polynomial, reflection, initial value, and final exclusive-or you like, as well as any byte order of the CRC in the message. I would recommend that the polynomial be chosen with good properties for your message length from Phil's data, and that the initial value of the CRC register, init, not be zero. (If it is zero, then the CRC of any string of zeros will be the same value, that final exclusive-or, regardless of the length.) Also there is no detriment, and it is more aesthetic to pick the initial value and the final exclusive-or to be equal, so that the CRC of an empty sequence is zero.
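To make the role of these parameters concrete, here is a minimal bit-at-a-time sketch in Python following the common Rocksoft-style parameter model (the function names and the CRC-32 sanity check are mine, not pycrc output), showing exactly where reflect-in, reflect-out, init and xor-out enter the computation:

# Generic bit-at-a-time CRC, parameterized the same way pycrc-style models are.
def reflect(value, width):
    out = 0
    for i in range(width):
        if value & (1 << i):
            out |= 1 << (width - 1 - i)
    return out

def crc(data, width, poly, init, refin, refout, xorout):
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    reg = init                              # "init": starting register value
    for byte in data:
        if refin:                           # "reflect in": reverse each input byte
            byte = reflect(byte, 8)
        reg ^= byte << (width - 8)
        for _ in range(8):
            reg = ((reg << 1) ^ poly) & mask if reg & top else (reg << 1) & mask
    if refout:                              # "reflect out": reverse the final register
        reg = reflect(reg, width)
    return reg ^ xorout                     # "xor out": final exclusive-or

# Sanity check against the standard reflected CRC-32 ("123456789" -> 0xCBF43926):
assert crc(b"123456789", 32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF) == 0xCBF43926

Whichever combination you pick for your own protocol, documenting it in these terms is what makes the CRC reproducible by others.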

How do I reverse engineer a known Cyclic Redundancy Check value?

So I am having trouble finding the CRC value for a series of commands. I already have the CRC values for some of the commands, but I need to figure out how they were calculated. After carefully going over the data stream and attempting to calculate the CRC, I cannot get the known CRC value and the calculated one to match. I have never calculated a CRC before, but I have read multiple papers on it and it seems easy enough, except for the fact that it's not working. The manual I have says the polynomial generator is (x^8 + x^7 + x^2 + x^0) and it gives me a unique non-zero value of B1 (hex). The full command is A9E40401 (hex) with a CRC of 1E (hex). The process I am currently using involves converting the data stream from hex into binary, applying the LSB-first rule, inserting FF (hex) into the command to detect extraneous zeros, adding 00 (hex) to the end as a placeholder, then performing mod-2 division, and then inverting the result and applying it to the data stream. Either I'm doing something wrong or I missed a step. I am assuming the polynomial I was provided is correct. Any help would be greatly appreciated.
You can try RevEng, which is for exactly this, to determine the CRC parameters from a set of examples.
Then you can use crcany to generate the code.
I feel like an idiot.
When I calculated my CRC I did every step properly except the very last step, where you must apply the LSB-first rule and then invert the remainder.
I only applied the LSB-first rule (thinking that's what was meant by invert).
So with a remainder of 00011101 I got the wrong CRC value (10111000), when I should have gotten 01000111.
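For the record, a tiny sketch (Python, purely illustrative) of those two final steps applied to the remainder above:

# Apply the LSB-first rule (bit reflection) and then invert the 8-bit remainder.
def reflect8(b):
    out = 0
    for i in range(8):
        if b & (1 << i):
            out |= 1 << (7 - i)
    return out

remainder = 0b00011101            # the remainder from the mod-2 division
reflected = reflect8(remainder)   # 0b10111000 -- the value I wrongly stopped at
crc_value = reflected ^ 0xFF      # 0b01000111 -- the value I should have gotten
print(format(reflected, "08b"), format(crc_value, "08b"))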

Erlang binary protocol serialization

I'm currently using Erlang for a big project, but I have a question about the proper way to proceed.
I receive bytes over a TCP socket. The bytes follow a fixed protocol; the sender is a Python client. The Python client uses class inheritance to create the bytes from its objects.
Now I would like to take the bytes and, in Erlang, convert them to their equivalent messages; they all have a common message header.
How can I do this as generically as possible in Erlang?
Kind Regards,
Me
Pattern matching/binary header consumption using Erlang's binary syntax. But you will need to know either exactly what bytes or bits you are expecting to receive, or the field sizes in bytes or bits.
For example, let's say that you are expecting a string of bytes that will either begin with the equivalent of the ASCII strings "PUSH" or "PULL", followed by some other data you will place somewhere. You can create a function head that matches those, and captures the rest to pass on to a function that does "push()" or "pull()" based on the byte header:
operation_type(<<"PUSH", Rest/binary>>) -> push(Rest);
operation_type(<<"PULL", Rest/binary>>) -> pull(Rest).
The bytes after the first four will now be in Rest, leaving you free to interpret whatever subsequent headers or data remain in turn. You could also match on the whole binary:
operation_type(Bin = <<"PUSH", _/binary>>) -> push(Bin);
operation_type(Bin = <<"PULL", _/binary>>) -> pull(Bin).
In this case the "_" variable works like it always does -- you're just checking for the lead, essentially peeking the buffer and passing the whole thing on based on the initial contents.
You could also skip around in it. Say you knew you were going to receive a binary with 4 bytes of fluff at the front, 6 bytes of type data, and then the rest you want to pass on:
filter_thingy(<<_:4/binary, Type:6/binary, Rest/binary>>) ->
    % Do stuff with Rest based on Type, e.g. dispatch to a handler you define:
    handle_type(Type, Rest).
It becomes very natural to split binaries in function headers (whether the data equates to character strings or not), letting the "Rest" fall through to appropriate functions as you go along. If you are receiving Python pickle data or something similar, you would want to write the parsing routine in a recursive way, so that the conclusion of each data type returns you to the top to determine the next type, with an accumulated tree that represents the data read so far.
I only covered 8-bit bytes above, but there is also a pure bitstring syntax, which lets you go as far into the weeds with bits and bytes as you need with the same ease of syntax. Matching is a real lifesaver here.
Hopefully this informed more than confused. Binary syntax in Erlang makes this the most pleasant binary parsing environment in a general programming language I've yet encountered.
http://www.erlang.org/doc/programming_examples/bit_syntax.html

Trying to understand nbits value from stratum protocol

I'm looking at the stratum protocol and I'm having a problem with the nbits value of the mining.notify method. I have trouble calculating it; I assume it's the currency's difficulty.
I pulled a notify from a Dogecoin pool and it returned 1b3cc366; at the time the difficulty was 1078.52975077.
I'm assuming here that 1b3cc366 should give me 1078.52975077 when converted. But I can't seem to do the conversion right.
I've looked here, here and also tried the .NET function BitConverter.Int64BitsToDouble.
Can someone help me understand what the nbits value signifies?
You are right, nbits is the current network difficulty.
Difficulty encoding is thoroughly described here.
A hexadecimal representation like 0x1b3cc366 consists of two parts:
0x1b -- number of bytes in a target
0x3cc366 -- target prefix
This means that valid hash should be less than 0x3cc366000000000000000000000000000000000000000000000000 (it is exactly 0x1b = 27 bytes long).
The floating-point representation of difficulty shows how much harder the current target is than the one used in the genesis block.
Satoshi decided to use 0x1d00ffff as the nbits for the genesis block, so the target was
0x00ffff0000000000000000000000000000000000000000000000000000.
And 1078.52975077 is how many times greater the initial target is than the current one:
$ echo 'ibase=16;FFFF0000000000000000000000000000000000000000000000000000 / 3CC366000000000000000000000000000000000000000000000000' | bc -l
1078.52975077482646448605
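The same arithmetic as a short sketch (the function and variable names are mine), decoding the compact nbits form and dividing the genesis target by the current one:

# Decode a compact "nbits" value into its full target, then compute the
# difficulty as genesis_target / current_target.
def nbits_to_target(nbits):
    exponent = nbits >> 24            # number of bytes in the target
    coefficient = nbits & 0x00FFFFFF  # target prefix
    return coefficient * 256 ** (exponent - 3)

genesis_target = nbits_to_target(0x1D00FFFF)
current_target = nbits_to_target(0x1B3CC366)
print(genesis_target / current_target)   # ~1078.5297507748...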

Collision Attacks, Message Digests and a Possible solution

I've been doing some preliminary research in the area of message digests. Specifically, collision attacks on cryptographic hash functions such as MD5 and SHA-1, such as the PostScript example and the X.509 certificate duplicates.
From what I can tell, in the case of the PostScript attack, specific data was generated and embedded within the header of the PostScript file (which is ignored during rendering), bringing the internal state of the MD5 computation to a point where the modified wording of the document would lead to a final MD value equal to that of the original PostScript file.
The X.509 attack took a similar approach, whereby data was injected within the comment/whitespace sections of the certificate.
Ok so here is my question, and I can't seem to find anyone asking this question:
Why isn't the length of ONLY the data being consumed added as a final block to the MD calculation?
In the case of X.509: why are the whitespace and comments being taken into account as part of the MD?
Wouldn't a simple process such as one of the following be enough to resolve the proposed collision attacks:
MD(M + |M|) = xyz
MD(M + |M| + |M| * magicseed_0 +...+ |M| * magicseed_n) = xyz
where :
M : is the message
|M| : size of the message
MD : is the message digest function (e.g. MD5, SHA, Whirlpool, etc.)
xyz : is the pairing of the actual message digest value for the message M and |M|, i.e. <M,|M|>
magicseed_{i} : is a set of random values generated with a seed based on the internal state prior to the size being added.
This technique should work, as to date all such collision attacks rely on adding more data to the original message.
In short, generating a collision message that:
not only produces the same MD,
but is also comprehensible/parsable/compliant,
and is also the same size as the original message,
is immensely difficult, if not nearly impossible. Has this approach ever been discussed? Any links to papers etc. would be nice.
Further Question: What is the lower bound for collisions of messages of common length for a hash function H chosen randomly from U, where U is the set of universal hash functions ?
Is it 1/N (where N is 2^(|M|)) or is it greater? If it is greater, that implies there is more than one message of length |M| that will map to the same MD value for a given H.
If that is the case, how practical is it to find these other messages? Brute force would be O(2^N); is there a method with time complexity less than brute force?
I can't speak for the rest of the questions, but the first one is fairly simple: adding length data to the input of the MD5, at any stage of the hashing process (1st block, Nth block, final block), just changes the output hash. You couldn't retrieve that length from the output hash string afterwards. It's also not inconceivable that a collision could be produced from another string of the exact same length in the first place, so saying "the original string was 17 bytes" is meaningless, because the colliding string could also be 17 bytes.
e.g.
md5("abce(17bytes)fghi") = md5("abdefghi<long sequence of text to produce collision>")
is still possible.
In the case of X.509 certificates specifically, the "comments" are not comments in the programming language sense: they are simply additional attributes with an OID that indicates they are to be interpreted as comments. The signature on a certificate is defined to be over the DER representation of the entire tbsCertificate ('to be signed' certificate) structure which includes all the additional attributes.
Hash function design is pretty deep theory, though, and might be better served on the Theoretical CS Stack Exchange.
As @Marc points out, though, as long as more bits can be modified than the output of the hash function contains, then by the pigeonhole principle a collision must exist for some pair of inputs. Because cryptographic hash functions are in general designed to behave pseudo-randomly over their inputs, collisions will tend to be uniformly distributed over possible inputs.
EDIT: Incorporating the message length into the final block of the hash function would be equivalent to appending the length of everything that has gone before to the input message, so there's no real need to modify the hash function to do this itself; rather, specify it as part of the usage in a given context. I can see where this would make some types of collision attacks harder to pull off, since if you change the message length there's a changed field "downstream" of the area modified by the attack. However, this wouldn't necessarily impede the X.509 intermediate CA forgery attack since the length of the tbsCertificate is not modified.
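As a concrete illustration of the construction being discussed, appending the message length before hashing could look like the sketch below (the 8-byte little-endian length encoding is an arbitrary choice of mine, not part of any standard):

import hashlib

# Sketch of MD(M + |M|): append the message length to the message, then hash.
def md_with_length(message: bytes) -> str:
    return hashlib.md5(message + len(message).to_bytes(8, "little")).hexdigest()

m1 = b"original document"
m2 = b"original document<colliding junk appended by an attacker>"
# The appended length field now differs "downstream" of the attacker's edit,
# but, as noted above, a collision search could still target equal-length messages.
print(md_with_length(m1))
print(md_with_length(m2))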