Bsd correction possible? - error-handling

I am looking at using the BSD checksum described here at wiki BSD does anyone know if you can use it for basic error correction?

Consider an 8 bit or 16 bit left rotating checksum where all the message bytes are supposed to be zero, but one them has a single bit error. The checksum will detect the error, but you'd get the same checksum for message[0] = 0x01, or message[1] = 0x02, ... , or message[7] = 0x80. The checksum can't determine which of these 8 (or more) possible error cases occurred, so it can't be used for error correction.
You'd need at least something like a Hamming code, BCH code or RS code to be able to correct one more bit errors. Since you have CRC as a tag, a single bit correcting binary BCH code is essentially the same as a CRC using a "primitive" polynomial that is the basis for a finite field, if the message length (including the CRC) is shorter than the number of possible values in the finite field. For example, a 15 bit message would have 11 data bits and 4 "parity" bits, based on a finite field of GF(2^4) (GF(16)).

Related

Compact-u16 - what is the purpose of this?

I was doing some research over the weekend on some blockchain dev in the Solana blockchain and came across a construct called Compact-u16. The definition of this in the documentation says the following: "A compact-u16 is a multi-byte encoding of 16 bits. The first byte contains the lower 7 bits of the value in its lower 7 bits. If the value is above 0x7f, the high bit is set and the next 7 bits of the value are placed into the lower 7 bits of a second byte. If the value is above 0x3fff, the high bit is set and the remaining 2 bits of the value are placed into the lower 2 bits of a third byte.".
I have been coding for 30+ years. Maybe I'm just old school on this, but why is there a construct to store 16 bits of data in 3 bytes? This is just vastly inefficient from my standpoint. Is there a reason for this? On further research, I found a doc related to assembly instruction pointers, which referenced 7 instruction pointers that are useful for caching values when context switching in and out of the processor stack. But this construct is used for a web app platform. Like, literally, there is no reason that I have been able to find that justifies using 3 bytes to store 16 bits of data. If the developers wanted to use an elegant bit mapping solution to compress space, why not just use a semaphore? Why create a brand new construct that increases the storage requirements for the data by 33%.
What am I missing?
I had some similar confusion when reading the compact-u16 description. Based on the code for parsing them in the solana python module I believe they're doing something conceptually similar to UTF-8, and storing the number in 1-3 bytes depending on its size.
Basically instead of each byte having 8 bits of a number, it has 7 bits of the number and a flag (the most significant bit) that indicates whether the number continues in the next byte. For the largest numbers they need an extra byte, but for numbers less than 128 they need only one byte. Since Solana seems to use these for storing the length of arrays, if it's common that the length of the arrays is less than 128 then they will end up with fewer total bytes to transfer across all transactions.
Some examples I worked out for myself:
hex | compact-u16
--------+------------
0x0000 | [0x00]
0x0001 | [0x01]
0x007f | [0x7f]
0x0080 | [0x80 0x01]
0x3fff | [0xff 0x7f]
0x4000 | [0x80 0x80 0x01]
0xc000 | [0x80 0x80 0x03]
0xffff | [0xff 0xff 0x03])

DEFLATE: how to handle "no distance codes" case?

I mostly get RFC 1951, however I'm not too clear on how to manage the case where (when using dynamic Huffman tables) no distance codes are needed or present. For example, let's take the input:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890987654321ZYXWVUTSR
where no backreference is possible since there are no repetitions of length >= 3.
According to RFC 1951, at least one distance code must be present regardless, otherwise it wouldn't be possible to encode HDIST - 1. I understand, according to the reference, that such code should be of zero bits to signal "no distance codes".
One distance code of zero bits means that there are no distance codes
used at all (the data is all literals).
In infgen symbols, I'd expect to see a dist 0 0.
Analyzing what gzip does with infgen, however, I see that TWO distance codes are emitted (each 1 bit long) for the above input (even though none is actually used then):
! infgen 2.4 output
!
gzip
!
last
dynamic
litlen 48 6
litlen 49 6
litlen 50 6
...cut...
litlen 121 6
litlen 122 6
litlen 256 6
dist 0 1
dist 1 1
literal 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890987654321Z
literal 'YXWVUTSR
end
!
crc
length
So what's the correct behavior in these cases?
If there are no matches in the deflate block, there will be no lengths from the length/literal code, and so the decoder will never look for a distance code. In that case, what would make the most sense is to provide no information at all about a distance code.
However the format does not permit that, since the 5-bit HDIST value in the header is interpreted as 1 to 32 distance codes, for which lengths must be provided for in the header. You must provide at least one distance code length in the header, even though it will never be used.
There are several valid things you can do in that case. RFC 1951 notes you can provide a single distance code (HDIST == 0, meaning one length), with length zero, which would be just one zero in the list of lengths.
It is also permitted to provide a single code of length one, or you could do as zlib is doing, which is to provide two codes of length one. You can actually put any valid distance code description you like there, and it will still be accepted.
As to why zlib's deflate is choosing to define two codes there, I can only guess that Jean-loup was being conservative, writing something he knew that even an over-simplified inflator would have to accept. Both gzip and zopfli do the same thing. They all do the same thing when there is only one distance code used. They could emit just the single one-bit distance code, per the RFC, but they emit two single-bit distance codes, one of which is never used.
Really the right thing to do would be to write a single zero length as noted in the RFC, which would take the fewest number of bits in the header. I will consider updating zlib to do that, to eke out a few more bits of compression.

How do i reverse engineer a known Cyclic Redundancy Check value?

So I have problem finding the CRC value for a series of commands, I already have the CRC values for some of the commands but I need to figure out how they were calculated. After carefully going over the data stream and attempting to calculate the CRC we cannot get the known CRC value and the calculated one to match. I have never calculated a CRC before but I have read multiple papers on it and it seems easy enough, except for the fact that its not working. The manual I have says the polynomial generator is (x^8 + x^7 + X^2 + X^0) and it gives me a unique non zero value of B1 (hex). The full command is A9E40401 (hex) with a CRC of 1E (hex). The process I am currently using involves converting the data stream from hex into binary, using the LSB first rule, inserting FF(hex) into the command to detect extraneous zeros, adding 00(hex) to the end as a place holder and than performing mod2 division, and than I invert it and apply it to the data stream. Either I'm doing something wrong or I missed a step. I am assuming the polynomial I was provided is correct. Any help would be greatly appreciated.
You can try RevEng, which is for exactly this, to determine the CRC parameters from a set of examples.
Then you can use crcany to generate the code.
I feel like an idiot.
When I calculated my CRC I did every step properly except the very last step where you must perform the LSB first rule and than invert the remainder.
I only performed the LSB first rule (thinking that's what was meant by invert).
So with a remainder of 00011101 I got the wrong CRC value (10111000). when I should have gotten 01000111.

Trying to understand nbits value from stratum protocol

I'm looking at the stratum protocol and I'm having a problem with the nbits value of the mining.notify method. I have trouble calculating it, I assume it's the currency difficulty.
I pull a notify from a dogecoin pool and it returned 1b3cc366 and at the time the difficulty was 1078.52975077.
I'm assuming here that 1b3cc366 should give me 1078.52975077 when converted. But I can't seem to do the conversion right.
I've looked here, here and also tried the .NET function BitConverter.Int64BitsToDouble.
Can someone help me understand what the nbits value signify?
You are right, nbits is current network difficulty.
Difficulty encoding is throughly described here.
Hexadecimal representation like 0x1b3cc366 consists of two parts:
0x1b -- number of bytes in a target
0x3cc366 -- target prefix
This means that valid hash should be less than 0x3cc366000000000000000000000000000000000000000000000000 (it is exactly 0x1b = 27 bytes long).
Floating point representation of difficulty shows how much current target is harder than the one used in the genesis block.
Satoshi decided to use 0x1d00ffff as a difficulty for the genesis block, so the target was
0x00ffff0000000000000000000000000000000000000000000000000000.
And 1078.52975077 is how much current target is greater than the initial one:
$ echo 'ibase=16;FFFF0000000000000000000000000000000000000000000000000000 / 3CC366000000000000000000000000000000000000000000000000' | bc -l
1078.52975077482646448605

Interoperability of AES CTR mode?

I use AES128 crypto in CTR mode for encryption, implemented for different clients (Android/Java and iOS/ObjC). The 16 byte IV used when encrypting a packet is formated like this:
<11 byte nonce> | <4 byte packet counter> | 0
The packet counter (included in a sent packet) is increased by one for every packet sent. The last byte is used as block counter, so that packets with fewer than 256 blocks always get a unique counter value. I was under the assumption that the CTR mode specified that the counter should be increased by 1 for each block, using the 8 last bytes as counter in a big endian way, or that this at least was a de facto standard. This also seems to be the case in the Sun crypto implementation.
I was a bit surprised when the corresponding iOS implementation (using CommonCryptor, iOS 5.1) failed to decode every block except the first when decoding a packet. It seems that CommonCryptor defines the counter in some other way. The CommonCryptor can be created in both big endian and little endian mode, but some vague comments in the CommonCryptor code indicates that this is not (or at least has not been) fully supported:
http://www.opensource.apple.com/source/CommonCrypto/CommonCrypto-60026/Source/API/CommonCryptor.c
/* corecrypto only implements CTR_BE. No use of CTR_LE was found so we're marking
this as unimplemented for now. Also in Lion this was defined in reverse order.
See <rdar://problem/10306112> */
By decoding block by block, each time setting the IV as specified above, it works nicely.
My question: is there a "right" way of implementing the CTR/IV mode when decoding multiple blocks in a single go, or can I expect it to be interoperability problems when using different crypto libs? Is CommonCrypto bugged in this regard, or is it just a question of implementing the CTR mode differently?
The definition of the counter is (loosely) specified in NIST recommendation sp800-38a Appendix B. Note that NIST only specifies how to use CTR mode with regards to security; it does not define one standard algorithm for the counter.
To answer your question directly, whatever you do you should expect the counter to be incremented by one each time. The counter should represent a 128 bit big endian integer according to the NIST specifications. It may be that only the least significant (rightmost) bits are incremented, but that will usually not make a difference unless you pass the 2^32 - 1 or 2^64 - 1 value.
For the sake of compatibility you could decide to use the first (leftmost) 12 bytes as random nonce, and leave the latter ones to zero, then let the implementation of the CTR do the increments. In that case you simply use a 96 bit / 12 byte random at the start, in that case there is no need for a packet counter.
You are however limited to 2^32 * 16 bytes of plaintext until the counter uses up all the available bits. It is implementation specific if the counter returns to zero or if the nonce itself is included in the counter, so you may want to limit yourself to messages of 68,719,476,736 = ~68 GB (yes that's base 10, Giga means 1,000,000,000).
because of the birthday problem you've got a 2^48 chance (48 = 96 / 2) of creating a collision for the nonce (required for each message, not each block), so you should limit the amount of messages;
if some attacker tricks you into decrypting 2^32 packets for the same nonce, you run out of counter.
In case this is still incompatible (test!) then use the initial 8 bytes as nonce. Unfortunately that does mean that you need to limit the number of messages because of the birthday problem.
Further investigations sheds some light on the CommonCrypto problem:
In iOS 6.0.1 the little endian option is now unimplemented. Also, I have verified that CommonCrypto is bugged in that the CCCryptorReset method does not in fact change the IV as it should, instead using pre-existing IV. The behaviour in 6.0.1 is different from 5.x.
This is potentially a security risc, if you initialize CommonCrypto with a nulled IV, and reset it to the actual IV right before encrypting. This would lead to all your data being encrypted with the same (nulled) IV, and multiple streams (that perhaps should have different IV but use same key) would leak data via a simple XOR of packets with corresponding ctr.