How to decide when to reflect or XOR CRC data?

I found multiple optimal CRC-32 polynomials on the CRC Polynomial Zoo site of Philip Koopman. Now I want to generate a CRC lookup table for one of the polynomials, using the pycrc software.
To generate a CRC lookup table you have to provide the following information for the chosen polynomial:
Reflected in (boolean)
Reflected out (boolean)
XOR in (hex value)
XOR out (hex value)
For some polynomials I found the above parameters in a specification (for instance an AUTOSAR specification for the polynomial "F4ACFB13"), but what parameters should I choose if there is no specification for a certain polynomial? The Koopman site doesn't seem to provide recommended parameters.
I already tried to find an explanation of how to choose these parameters, but I could only find explanations of how to implement them, not how to choose them. Most websites recommend searching for specifications describing "common CRC polynomials", because those provide the optimal parameters.

Generally you are trying to match the CRC used in some existing protocol. In that case you need to do the same thing you did for the AUTOSAR CRC: find the specification for the CRC. Or you need to get several examples of messages and correct CRCs and try to reverse-engineer the CRC parameters.
You can find over a hundred CRC definitions in Greg Cook's RevEng catalog.
If you are creating your own protocol from scratch, then you can select any polynomial, reflection, initial value, and final exclusive-or you like, as well as any byte order of the CRC in the message. I would recommend that the polynomial be chosen with good properties for your message length from Phil's data, and that the initial value of the CRC register, init, not be zero. (If it is zero, then the CRC of any string of zeros will be the same value, that final exclusive-or, regardless of the length.) Also, there is no detriment, and it is more aesthetically pleasing, to pick the initial value and the final exclusive-or to be equal, so that the CRC of an empty sequence is zero.
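To make the effect of these parameters concrete, here is a minimal bit-at-a-time sketch in Python (the function and its defaults are illustrative, not pycrc's API; it assumes the reflected CRC-32 algorithm, whose reflected polynomial is 0xEDB88320):

def crc32(data, poly=0xEDB88320, init=0xFFFFFFFF, xorout=0xFFFFFFFF):
    # Bit-at-a-time, reflected (LSB-first) CRC computation.
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
    return crc ^ xorout

assert crc32(b"123456789") == 0xCBF43926  # the standard CRC-32 check value
# With init=0, every all-zero message gives the same CRC, whatever its length:
assert crc32(b'\x00' * 1, init=0) == crc32(b'\x00' * 100, init=0)
# With a nonzero init, zero runs of different lengths are distinguished:
assert crc32(b'\x00' * 1) != crc32(b'\x00' * 100)
# With init == xorout, the CRC of the empty message is zero:
assert crc32(b'') == 0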

Related

check FCS ethernet frame CRC-32 online tools

I'm trying to check the FCS of an Ethernet frame using tools on different websites.
I first used this website:
http://depa.usst.edu.cn/chenjq/www2/software/crc/CRC_Javascript/CRCcalculation.htm and found the following FCS: 0xD4C3C62F (for the frame below).
Then I tried this one: http://www.scadacore.com/field-applications/programming-calculators/online-checksum-calculator/ and found the correct CRC: 0x7AD56BB3, but none of the CRC-32 variants (normal, reversed, ...) matched the CRC found on the first website.
Is there any link between algorithms?
Thank you!
Here is the hexadecimal frame (no start-of-frame delimiter):
000AE6F005A3001234567890080045000030B3FE0000801172BA0A0000030A00000204000400001C894D000102030405060708090A0B0C0D0E0F10111213
Beware of online CRC calculators.
The Ethernet CRC of your string is actually 0xb36bd57a. It is stored in reverse order in the stream, which is why you wrote it incorrectly as 0x7AD56BB3.
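You can verify this with Python's binascii.crc32, which implements exactly this CRC-32 definition (a quick sketch):

import binascii

frame = bytes.fromhex(
    "000AE6F005A3001234567890080045000030B3FE0000801172BA0A0000030A00"
    "000204000400001C894D000102030405060708090A0B0C0D0E0F10111213")
fcs = binascii.crc32(frame)
print(hex(fcs))                           # 0xb36bd57a
# On the wire, the FCS is transmitted least-significant byte first:
print(fcs.to_bytes(4, "little").hex())    # 7ad56bb3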
There are many CRC definitions, including many 32-bit CRC definitions. See the RevEng catalog for examples. The one you want happens to be called "CRC-32", with this definition.
The "CCITT-32" (a name I have not seen before) being calculated in your first link is some other definition. It does not even appear in the RevEng catalog.
A more descriptive and clear update to @Mark Adler's answer (I am new here, so I can't edit or comment):
The CRC you are searching for is called CRC-32/ISO-HDLC.
You can use the following online calculator and look at the result named "CRC-32":
https://crccalc.com/
Each CRC-32 algorithm has its own generation parameters: polynomial, init, etc.
The IEEE 802.3 standard defines the CRC-32 parameters for the FCS field as: polynomial 0x04C11DB7, init/xorIn 0xFFFFFFFF, xorOut 0xFFFFFFFF, with reflected input and output.

How do I reverse engineer a known Cyclic Redundancy Check value?

So I am having trouble finding the CRC value for a series of commands. I already have the CRC values for some of the commands, but I need to figure out how they were calculated. After carefully going over the data stream and attempting to calculate the CRC, I cannot get the known CRC value and the calculated one to match. I have never calculated a CRC before, but I have read multiple papers on it and it seems easy enough, except for the fact that it's not working. The manual I have says the generator polynomial is x^8 + x^7 + x^2 + x^0, and it gives me a unique non-zero value of B1 (hex). The full command is A9E40401 (hex) with a CRC of 1E (hex). The process I am currently using involves converting the data stream from hex into binary using the LSB-first rule, inserting FF (hex) into the command to detect extraneous zeros, adding 00 (hex) to the end as a placeholder, performing mod-2 division, and then inverting the remainder and applying it to the data stream. Either I'm doing something wrong or I missed a step. I am assuming the polynomial I was provided is correct. Any help would be greatly appreciated.
You can try CRC RevEng, which is designed for exactly this: determining the CRC parameters from a set of example messages and their CRCs.
Then you can use crcany to generate the code.
I feel like an idiot.
When I calculated my CRC, I did every step properly except the very last step, where you must apply the LSB-first rule and then invert the remainder.
I only applied the LSB-first rule (thinking that's what was meant by invert).
So with a remainder of 00011101 I got the wrong CRC value (10111000), when I should have gotten 01000111.
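In other words, the final steps are a bit reflection followed by a ones-complement. A small Python sketch of those two steps (the helper name is illustrative):

def reflect8(x):
    # Reverse the bit order of an 8-bit value (the LSB-first rule).
    r = 0
    for i in range(8):
        if x & (1 << i):
            r |= 1 << (7 - i)
    return r

rem = 0b00011101
reflected = reflect8(rem)      # 0b10111000 -- reflection alone is wrong
crc = reflected ^ 0xFF         # 0b01000111 -- reflect, then invert
print(f"{crc:08b}")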

Erlang binary protocol serialization

I'm currently using Erlang for a big project, but I have a question regarding the proper way to proceed.
I receive bytes over a TCP socket. The bytes follow a fixed protocol; the sender is a Python client. The Python client uses class inheritance to create the bytes from its objects.
Now I would like to (in Erlang) take the bytes and convert them to their equivalent messages; they all have a common message header.
How can I do this as generically as possible in Erlang?
Kind Regards,
Me
Pattern matching/binary header consumption using Erlang's binary syntax. But you will need to know either exactly what bytes or bits you are expecting to receive, or the field sizes in bytes or bits.
For example, let's say that you are expecting a string of bytes that will either begin with the equivalent of the ASCII strings "PUSH" or "PULL", followed by some other data you will place somewhere. You can create a function head that matches those, and captures the rest to pass on to a function that does "push()" or "pull()" based on the byte header:
operation_type(<<"PUSH", Rest/binary>>) -> push(Rest);
operation_type(<<"PULL", Rest/binary>>) -> pull(Rest).
The bytes after the first four will now be in Rest, leaving you free to interpret whatever subsequent headers or data remain in turn. You could also match on the whole binary:
operation_type(Bin = <<"PUSH", _/binary>>) -> push(Bin);
operation_type(Bin = <<"PULL", _/binary>>) -> pull(Bin).
In this case the "_" variable works like it always does -- you're just checking for the lead, essentially peeking the buffer and passing the whole thing on based on the initial contents.
You could also skip around in it. Say you knew you were going to receive a binary with 4 bytes of fluff at the front, 6 bytes of type data, and then the rest you want to pass on:
filter_thingy(<<_:4/binary, Type:6/binary, Rest/binary>>) ->
    % Do stuff with Rest based on Type...
    handle(Type, Rest).
It becomes very natural to split binaries in function headers (whether the data equates to character strings or not), letting the "Rest" fall through to appropriate functions as you go along. If you are receiving Python pickle data or something similar, you would want to write the parsing routine in a recursive way, so that the conclusion of each data type returns you to the top to determine the next type, with an accumulated tree that represents the data read so far.
I only covered 8-bit bytes above, but there is also a pure bitstring syntax, which lets you go as far into the weeds with bits and bytes as you need with the same ease of syntax. Matching is a real lifesaver here.
Hopefully this informed more than confused. Binary syntax in Erlang makes this the most pleasant binary parsing environment in a general programming language I've yet encountered.
http://www.erlang.org/doc/programming_examples/bit_syntax.html

Collision Attacks, Message Digests and a Possible solution

I've been doing some preliminary research in the area of message digests. Specifically collision attacks of cryptographic hash functions such as MD5 and SHA-1, such as the Postscript example and X.509 certificate duplicate.
From what I can tell, in the case of the Postscript attack, specific data was generated and embedded within the header of the Postscript file (which is ignored during rendering), bringing the internal state of the MD5 computation to a point where the modified wording of the document would lead to a final MD value equivalent to that of the original Postscript file.
The X.509 attack took a similar approach, whereby data was injected within the comment/whitespace sections of the certificate.
Ok so here is my question, and I can't seem to find anyone asking this question:
Why isn't the length of ONLY the data being consumed added as a final block to the MD calculation?
In the case of X.509 - Why is the whitespace and comments being taken into account as part of the MD?
Wouldn't a simple processes such as one of the following be enough to resolve the proposed collision attacks:
MD(M + |M|) = xyz
MD(M + |M| + |M| * magicseed_0 +...+ |M| * magicseed_n) = xyz
where :
M : is the message
|M| : size of the message
MD : is the message digest function (eg: md5, sha, whirlpool etc)
xyz : is the pairing of the actual message digest value for the message M and |M|. <M,|M|>
magicseed_{i}: Is a set of random values generated with seed based on the internal-state prior to the size being added.
This technique should work, as to date all such collision attacks rely on adding more data to the original message.
In short, the level of difficulty involved in generating a collision message such that:
It not only generates the same MD
But is also comprehensible/parsable/compliant
and is also the same size as the original message,
is immensely difficult if not near impossible. Has this approach ever been discussed? Any links to papers etc would be nice.
Further Question: What is the lower bound for collisions of messages of common length for a hash function H chosen randomly from U, where U is the set of universal hash functions?
Is it 1/N (where N is 2^(|M|)) or is it greater? If it is greater, that implies there is more than 1 message of length N that will map to the same MD value for a given H.
If that is the case, how practical is it to find these other messages? Brute force would be O(2^N); is there a method with time complexity less than brute force?
Can't speak for the rest of the questions, but the first one is fairly simple: adding length data to the input of the MD5, at any stage of the hashing process (1st block, Nth block, final block), just changes the output hash. You couldn't retrieve that length from the output hash string afterwards. (In fact, MD5 and SHA-1 already append the bit length of the message in their final padding block, so-called Merkle-Damgård strengthening.) It's also entirely conceivable that a collision could be produced from another string with the exact same length, so saying "the original string was 17 bytes" is meaningless, because the colliding string could also be 17 bytes.
e.g.
md5("abce(17bytes)fghi") = md5("abdefghi<long sequence of text to produce collision>")
is still possible.
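For concreteness, here is a minimal sketch of the proposed MD(M + |M|) construction (the 8-byte little-endian length encoding is an illustrative choice, not from the question):

import hashlib
import struct

def md_with_length(message: bytes) -> str:
    # Append |M| (the message length) before hashing, as the question proposes.
    suffix = struct.pack("<Q", len(message))
    return hashlib.md5(message + suffix).hexdigest()

print(md_with_length(b"hello"))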
In the case of X.509 certificates specifically, the "comments" are not comments in the programming language sense: they are simply additional attributes with an OID that indicates they are to be interpreted as comments. The signature on a certificate is defined to be over the DER representation of the entire tbsCertificate ('to be signed' certificate) structure which includes all the additional attributes.
Hash function design is pretty deep theory, though, and might be better served on the Theoretical CS Stack Exchange.
As @Marc points out, though, as long as more bits can be modified than the output of the hash function contains, then by the pigeonhole principle a collision must exist for some pair of inputs. Because cryptographic hash functions are in general designed to behave pseudo-randomly over their inputs, collisions will tend toward being uniformly distributed over possible inputs.
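As a toy illustration of the pigeonhole argument, truncating MD5 to 8 bits makes a collision among equal-length inputs appear almost immediately:

import hashlib
from itertools import product

seen = {}
for candidate in product(b"abcdefghijklmnop", repeat=2):
    m = bytes(candidate)
    digest = hashlib.md5(m).digest()[0]   # keep only 8 bits of the hash
    if digest in seen and seen[digest] != m:
        print(seen[digest], m, hex(digest))  # two distinct 2-byte messages collide
        break
    seen[digest] = m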
EDIT: Incorporating the message length into the final block of the hash function would be equivalent to appending the length of everything that has gone before to the input message, so there's no real need to modify the hash function to do this itself; rather, specify it as part of the usage in a given context. I can see where this would make some types of collision attacks harder to pull off, since if you change the message length there's a changed field "downstream" of the area modified by the attack. However, this wouldn't necessarily impede the X.509 intermediate CA forgery attack since the length of the tbsCertificate is not modified.

How can I use SYNCSORT to format a Packed Decimal field with a specific sign value?

I want to use SYNCSORT to force all Packed Decimal fields to a negative sign value. The critical requirement is the 2nd nibble must be Hex 'D'. I have a method that works but it seems much too complex. In keeping with the KISS principle, I'm hoping someone has a better method. Perhaps using a bit mask on the last 4 bits? Here is the code I have come up with. Is there a better way?
*
* This sort logic is intended to force all Packed Decimal amounts to
* have a negative sign with a B'....1101' value (Hex 'xD').
*
SORT FIELDS=COPY
OUTFIL FILES=1,
INCLUDE=(8,1,BI,NE,B'....1..1',OR, * POSITIVE PACKED DECIMAL
8,1,BI,EQ,B'....1111'), * UNSIGNED PACKED DECIMAL
OUTREC=(1:1,7, * INCLUDING +0
8:(-1,MUL,8,1,PD),PD,LENGTH=1,
9:9,72)
OUTFIL FILES=2,
INCLUDE=(8,1,BI,EQ,B'....1..1',AND, * NEGATIVE PACKED DECIMAL
8,1,BI,NE,B'....1111'), * NOT UNSIGNED PACKED DECIMAL
OUTREC=(1:1,7, * INCLUDING -0
8:(+1,MUL,8,1,PD),PD,LENGTH=1,
9:9,72)
In the code that processes the VSAM file, can you change the read logic to GET with KEY GTEQ and check for < 0 on the result instead of doing a specific keyed read?
If you did that, you could accept all three negative packed values xA, xB and xD.
Have you considered writing an E15 user exit? The E15 user exit lets you manipulate records as they are input to the sort process. In this case you would have a REXX, COBOL or other LE-compatible language subroutine patch the packed decimal sign field as it is input to the sort process. No need to split into multiple files to be merged later on.
Here is a link to example JCL for invoking an E15 exit from DFSORT (the same JCL works for SYNCSORT). Chapter 4 of this reference describes how to develop user exit routines; again, this is a DFSORT manual, but I believe SyncSort is fully compatible in this respect. Writing a user exit is no different than writing any other subroutine: get the linkage right and the rest is easy.
This is a very general outline, but I hope it helps.
Okay, it took some digging but NEALB's suggestion to seek help on MVSFORUMS.COM paid off... here is the final result. The OUTREC logic used with SORT/MERGE replaces OUTFIL and takes advantage of new capabilities (IFTHEN, WHEN and OVERLAY) in Syncsort 1.3 that I didn't realize existed. It pays to have current documentation available!
*
* This MERGE logic is intended to assert that the Packed Decimal
* field has a negative sign with a B'....1101' value (Hex X'.D').
*
*
MERGE FIELDS=(27,5.4,BI,A),EQUALS
SUM FIELDS=NONE
OUTREC IFTHEN=(WHEN=(32,1,BI,NE,B'....1..1',OR,
32,1,BI,EQ,B'....1111'),
OVERLAY=(32:(-1,MUL,32,1,PD),PD,LENGTH=1)),
IFTHEN=(WHEN=(32,1,BI,EQ,B'....1..1',AND,
32,1,BI,NE,B'....1111'),
OVERLAY=(32:(+1,MUL,32,1,PD),PD,LENGTH=1))
Looking at the last byte of a packed field is possible. You want to turn positive/unsigned values negative, so if the field is greater than -1, subtract it from zero.
From a short-lived answer by MikeC, it is now known that the data contains non-preferred signs (that is, it can contain A through F in the low-order half-byte, whereas a preferred sign would be C (positive) or D (negative); F is unsigned, treated as positive).
This is tested with DFSORT. It should work with SyncSORT. Turns out that DFSORT can understand a negative packed-decimal zero, but it will not create a negative packed-decimal zero (it will allow a zoned-decimal negative zero to be created from a negative zero packed-decimal).
The idea is that a non-preferred sign is valid and will be accurately signed for input to a decimal machine instruction, but the result will always be a preferred sign, and will be correct. So by adding zero first, the field gets turned into a preferred sign and then the test for -1 will work as expected. With data in the sign-nybble for packed-decimal fields, SORT has some specific and documented behaviours, which just don't happen to help here.
Since there is only one value to deal with to become the negative zero, X'0C', after the normalisation of signs already done, there is a simple test and replacement with a constant of X'0D' for the negative zero. Since the negative zero will not work, the second test is changed from the original minus one to zero.
With non-preferred signs in the data:
SORT FIELDS=COPY
INREC IFTHEN=(WHEN=INIT,
OVERLAY=(32:+0,ADD,32,1,PD,TO=PD,LENGTH=1)),
IFTHEN=(WHEN=(32,1,CH,EQ,X'0C'),
OVERLAY=(32:X'0D')),
IFTHEN=(WHEN=(32,1,PD,GT,0),
OVERLAY=(32:+0,SUB,32,1,PD,TO=PD,LENGTH=1))
With preferred signs in the data:
SORT FIELDS=COPY
INREC IFTHEN=(WHEN=(32,1,CH,EQ,X'0C'),
OVERLAY=(32:X'0D')),
IFTHEN=(WHEN=(32,1,PD,GT,0),
OVERLAY=(32:+0,SUB,32,1,PD,TO=PD,LENGTH=1))
Note: If non-preferred signs are stuffed through a COBOL program not using compiler option NUMPROC(NOPFD) then results will be "interesting".
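Outside of SORT, the bit-mask idea from the original question is simple to express. A hypothetical Python sketch of forcing the sign nibble of the last byte to X'D' (the function name is illustrative):

def force_negative(packed: bytes) -> bytes:
    # Replace the sign nibble (low-order half of the last byte) with X'D'.
    last = packed[-1]
    if last & 0x0F < 0x0A:
        raise ValueError("invalid packed-decimal sign nibble")
    return packed[:-1] + bytes([(last & 0xF0) | 0x0D])

print(force_negative(bytes.fromhex("12345C")).hex())   # 12345d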