What is the difference between “SHA-2” and “SHA-256” - sha256

I'm a bit confused on the difference between SHA-2 and SHA-256 and often hear them used interchangeably. I think SHA-2 a "family" of hash algorithms and SHA-256 a specific algorithm in that family. Can anyone clear up the confusion.

The SHA-2 family consists of multiple closely related hash functions. It is essentially a single algorithm in which a few minor parameters are different among the variants.
The initial spec only covered 224, 256, 384 and 512 bit variants.
The most significant difference between the variants is that some are 32 bit variants and some are 64 bit variants. In terms of performance this is the only difference that matters.
On a 32 bit CPU SHA-224 and SHA-256 will be a lot faster than the other variants because they are the only 32 bit variants in the SHA-2 family. Executing the 64 bit variants on a 32 bit CPU will be slow due to the added complexity of performing 64 bit operations on a 32 bit CPU.
On a 64 bit CPU SHA-224 and SHA-256 will be a little slower than the other variants. This is because due to only processing 32 bits at a time, they will have to perform more operations in order to make it through the same number of bytes. You do not get quite a doubling in speed from switching to a 64 bit variant because the 64 bit variants do have a larger number of rounds than the 32 bit variants.
The internal state is 256 bits in size for the two 32 bit variants and 512 bits in size for all four 64 bit variants. So the number of possible sizes for the internal state is less than the number of possible sizes for the final output. Going from a large internal state to a smaller output can be good or bad depending on your point of view.
If you keep the output size fixed it can in general be expected that increasing the size of the internal state will improve security. If you keep the size of the internal state fixed and decrease the size of the output, collisions become more likely, but length extension attacks may become easier. Making the output size larger than the internal state would be pointless.
Due to the 64 bit variants being both faster (on 64 bit CPUs) and likely to be more secure (due to larger internal state), two new variants were introduced using 64 bit words but shorter outputs. Those are the ones known as 512/224 and 512/256.
The reasons for wanting variants with output that much shorter than the internal state is usually either that for some usages it is impractical to use such a long output or that the output need to be used as key for some algorithm that takes an input of a certain size.
Simply truncating the final output to your desired length is also possible. For example a HMAC construction specify truncating the final hash output to the desired MAC length. Due to HMAC feeding the output of one invocation of the hash as input to another invocation it means that using a hash with shorter output results in a HMAC with less internal state. For this reason it is likely to be slightly more secure to use HMAC-SHA-512 and truncate the output to 384 bits than to use HMAC-SHA-384.
The final output of SHA-2 is simply the internal state (after processing length extended input) truncated to the desired number of output bits. The reason SHA-384 and SHA-512 on the same input look so different is that a different IV is specified for each of the variants.

Wikipedia:
The SHA-2 family consists of six hash functions with digests (hash
values) that are 224, 256, 384 or 512 bits: SHA-224, SHA-256, SHA-384,
SHA-512, SHA-512/224, SHA-512/256.

Related

Homomorphic encryption using Palisade library

To all homomorphic encryption experts out there:
I'm using the PALISADE library:
int plaintextModulus = 65537;
float sigma = 3.2;
SecurityLevel securityLevel = HEStd_128_classic;
uint32_t depth = 2;
//Instantiate the crypto context
CryptoContext<DCRTPoly> cc = CryptoContextFactory<DCRTPoly>::genCryptoContextBFVrns(
plaintextModulus, securityLevel, sigma, 0, depth, 0, OPTIMIZED);
could you please explain (all) the parameters especially intrested in ptm, depth and sigma.
Secondly I am trying to make a Packed Plaintext with the cc above.
cc->MakePackedPlaintext(array);
What is the maximum size of the array? On my local machine (8GB RAM) when the array is larger than ~8000 int64 I get an free(): invalid next size (normal) error
Thank you for asking the question.
Plaintext modulus t (denoted as t here) is a critical parameter for BFV as all operations are performed mod t. In other words, when you choose t, you have to make sure that all computations do not wrap around, i.e., do not exceed t. Otherwise you will get an incorrect answer unless your goal is to compute something mod t.
sigma is the distribution parameter (used for the underlying Learning with Errors problem). You can just set to 3.2. No need to change it.
Depth is the multiplicative depth of the circuit you are trying to compute. It has nothing to with the size of vectors. Basically, if you have AxBxCxD, you have a depth 3 with a naive approach. BFV also supports more efficient binary tree evaluation, i.e., (AxB)x(CxD) - this option will reduce the depth to 2.
BFV is a scheme that supports packing. By default, the size of packed ciphertext is equal to the ring dimension (something like 8192 for the example you mentioned). This means you can pack up to 8192 integers in your case. To support larger arrays/vectors, you would need to break them into batches of 8192 each and encrypt each one separately.
Regarding your application, the CKKS scheme would probably be a much better option (I will respond on the application in more detail in the other thread).
I have some experience with the SEAL library which also uses the BFV encryption scheme. The BFV scheme uses modular arithmetic and is able to encrypt integers (not real numbers).
For the parameters you're asking about:
The Plaintext Modulus is an upper bound for the input integers. If this parameter is too low, it might cause your integers to overflow (depending on how large they are of course)
The Sigma is the distribution parameter for Gaussian noise generation
The Depth is the circuit depth which is the maximum number of multiplications on a path
Also for the Packed Plaintext, you should use vectors not arrays. Maybe that will fix your problem. If not, try lowering the size and make several vectors if necessary.
You can determine the ring dimension (generated by the crypto context based on your parameter settings) by using cc->GetRingDimension() as shown in line 113 of https://gitlab.com/palisade/palisade-development/blob/master/src/pke/examples/simple-real-numbers.cpp

Is there a CRC or criptographic function for generating smaller size unique results from unique inputs?

I have a manufacturer unique number ID of 128 bits that I cannot change and it's size is just too long for our purpose (2^128). This is on some embedded micro controller.
One idea is to compute a (run time) CRC32 or hash for narrowing the results but I am not sure for unicity CRC32 as a example: this can be unique for 2^32
Or what king of cryptography function I can use for guarantee unicity of 32 bits output based on unique input?
Thanks for clarifications,
If you know all these ID values in advance, then you can check them using a hash table. You can save space by storing only as many bits of each hash value as are necessary to tell them apart if them happen to land in the same bucket.
If not, then you're going to have a hard time, I'm afraid.
Let's assume these 128-bit IDs are produced as the output of a cryptographic hash function (e.g., MD5), so each ID resembles 128 bits chosen uniformly at random.
If you reduce these to 32-bit values, then the best you can hope to achieve is a set of 32-bit numbers where each bit is 0 or 1 with uniform probability. You could do this by calculating the CRC32 checksum, or by simply discarding 96 bits — it makes no difference.
32 bits is not enough enough to avoid collisions. The collision probability exceeds 1 in a million after just 93 inputs, and 1 in a thousand after 2,900 inputs. After 77,000 inputs, the collision probability reaches 50%. (Source).
So instead, your only real options are to somehow reverse-engineer the ID values into something smaller, or implement some external means of replacing these IDs with sequential integers (e.g., using a hash table).

Does Camellia have a 256 bit block size?

Wikipedia says Camellia comes with a block size of 128 and a variable key size (128, 192, 256). Another site lists it as a 256 Bit cipher.
The OpenSSL API has a function named EVP_camellia_256_cbc. Does this refer to the key size or the block size? And does Camellia support 256 Bit block sizes at all?
The information on the Wikipedia page is correct: Camellia has a fixed block size of 128 bit and a variable key size of 128, 192 and 256 bit. You can compare that with other authoritative sources like its specification, e.g. found in RFC 3713.
The "256 bit" in "256 bit cipher" usually refers to its security level and that is determined by its key size (and potential attack vectors that might decrease it).
Therefore, EVP_camellia_256_cbc means Camellia with a 256 bit key size, so you should supply keys of that size. Supplying keys of the correct key size is important, because some implementations may behave differently than others and you will lose a lot of time debugging when trying to connect different implementations.
For example, if you define that you want to use Camellia-256, but you're passing a key of 192 bit, it may happen that
one implementation fills the passed key with 0x00 byte up to the specified key size,
another implementation doesn't care about the specification and only looks at the actual supplied key to then run Camellia-192 or
a broken implementation (for non-standard key sizes) that calculates the number of rounds (12 or 14 for Camellia) that need to be used and arrives at a non-standard number of rounds which makes the result non-compatible with all other implementations.

Encoding - Efficiently send sparse boolean array

I have a 256 x 256 boolean array. These array is constantly changing and set bits are practically randomly distributed.
I need to send a current list of the set bits to many clients as they request them.
Following numbers are approximations.
If I send the coordinates for each set bit:
set bits data transfer (bytes)
0 0
100 200
300 600
500 1000
1000 2000
If I send the distance (scanning from left to right) to the next set bit:
set bits data transfer (bytes)
0 0
100 256
300 300
500 500
1000 1000
The typical number of bits that are set in this sparse array is around 300-500, so the second solution is better.
Is there a way I can do better than this without much added processing overhead?
Since you say "practically randomly distributed", let's assume that each location is a Bernoulli trial with probability p. p is chosen to get the fill rate you expect. You can think of the length of a "run" (your option 2) as the number of Bernoulli trials necessary to get a success. It turns out this number of trials follows the Geometric distribution (with probability p). http://en.wikipedia.org/wiki/Geometric_distribution
What you've done so far in option #2 is to recognize the maximum length of the run in each case of p, and reserve that many bits to send all of them. Note that this maximum length is still just a probability, and the scheme will fail if you get REALLY REALLY unlucky, and all your bits are clustered at the beginning and end.
As #Mike Dunlavey recommends in the comment, Huffman coding, or some other form of entropy coding, can redistribute the bits spent according to the frequency of the length. That is, short runs are much more common, so use fewer bits to send those lengths. The theoretical limit for this encoding efficiency is the "entropy" of the distribution, which you can look up on that Wikipedia page, and evaluate for different probabilities. In your case, this entropy ranges from 7.5 bits per run (for 1000 entries) to 10.8 bits per run (for 100).
Actually, this means you can't do much better than you're currently doing for the 1000 entry case. 8 bits = 1 byte per value. For the case of 100 entries, you're currently spending 20.5 bits per run instead of the theoretically possible 10.8, so that end has the highest chance for improvement. And in the case of 300: I think you haven't reserved enough bits to represent these sequences. The entropy comes out to 9.23 bits per pixel, and you're currently sending 8. You will find many cases where the space between true exceeds 256, which will overflow your representation.
All of this, of course, assumes that things really are random. If they're not, you need a different entropy calculation. You can always compute the entropy right out of your data with a histogram, and decide if it's worth pursuing a more complicated option.
Finally, also note that real-life entropy coders only approximate the entropy. Huffman coding, for example, has to assign an integer number of bits to each run length. Arithmetic coding can assign fractional bits.

What does alignment to 16-byte boundary mean in x86

Intel's official optimization guide has a chapter on converting from MMX commands to SSE where they state the fallowing statment:
Computation instructions which use a memory operand that may not be aligned to a 16-byte boundary must be replaced with an unaligned 128-bit load (MOVDQU) followed by the same computation operation that uses instead register operands.
(chapter 5.8 Converting from 64-bit to 128-bit SIMD Integers, pg. 5-43)
I can't understand what they mean by "may not be aligned to a 16-byte boundary", could you please clarify it and give some examples?
Certain SIMD instructions, which perform the same instruction on multiple data, require that the memory address of this data is aligned to a certain byte boundary. This effectively means that the address of the memory your data resides in needs to be divisible by the number of bytes required by the instruction.
So in your case the alignment is 16 bytes (128 bits), which means the memory address of your data needs to be a multiple of 16. E.g. 0x00010 would be 16 byte aligned, while 0x00011 would not be.
How to get your data to be aligned depends on the programming language (and sometimes compiler) you are using. Most languages that have the notion of a memory address will also provide you with means to specify the alignment.
I'm guessing here, but could it be that "may not be aligned to a 16-byte boundary" means that this memory location has been aligned to a smaller value (4 or 8 bytes) before for some other purposes and now to execute SSE instructions on this memory you need to load it into a register explicitly?
Data that's aligned on a 16 byte boundary will have a memory address that's an even number — strictly speaking, a multiple of two. Each byte is 8 bits, so to align on a 16 byte boundary, you need to align to each set of two bytes.
Similarly, memory aligned on a 32 bit (4 byte) boundary would have a memory address that's a multiple of four, because you group four bytes together to form a 32 bit word.