CRC of input data shorter than poly width - VBA

I'm in the process of writing a paper during my studies on implementing CRC in Excel with VBA.
I've created a fairly straightforward, modular algorithm that uses Ross's parametrized model.
It works flawlessly for any length polynomial and any combination of parameters except for one case: when the input data is shorter than the width of the polynomial and an initial value ("INIT") is chosen that has any bits set "past" the end of the input data.
Example:
Input Data: 0x4C
Poly: 0x1021
Xorout: 0x0000
Refin: False
Refout: False
If I choose no INIT or any INIT like 0x##00, I get the same checksum as any of the online CRC generators. If any bit of the last two hex characters is set - like 0x0001 - my result is invalid.
I believe the question boils down to "How is the register initialized if only one byte of input data is present for a two byte INIT parameter?"

It turns out I was misled by (or may very well have misinterpreted) the explanation of how to use the INIT parameter on the sunshine2k website.
The INIT value must not be XORed with the first n input bits per se (n being the width in bits of the register / cropped poly / checksum), but must be XORed in only after the n zero bits have been appended to the input data.
This distinction does not matter when the input data is at least n bits long, but it does matter when the input data is shorter.
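To make this concrete, here is a minimal sketch in Python (not the VBA from the paper) of the bit-at-a-time, augmented-message formulation. Parameter names follow Ross's model, and the defaults are just one example configuration (CRC-16/CCITT-FALSE); the point is that INIT is XORed into the first n bits of the zero-augmented stream, which works even for a one-byte message:

def crc_msb_first(data, width=16, poly=0x1021, init=0xFFFF, xorout=0x0000):
    # Build the message bitstream MSB-first (Refin = False) and append
    # `width` zero bits to it.
    bits = []
    for byte in data:
        for i in range(7, -1, -1):
            bits.append((byte >> i) & 1)
    bits.extend([0] * width)
    # XOR INIT into the first `width` bits of the *augmented* stream;
    # this is what stays correct when the message is shorter than the register.
    for i in range(width):
        bits[i] ^= (init >> (width - 1 - i)) & 1
    # Plain polynomial long division, one bit at a time.
    reg, top, mask = 0, 1 << (width - 1), (1 << width) - 1
    for b in bits:
        msb = reg & top
        reg = ((reg << 1) | b) & mask
        if msb:
            reg ^= poly
    return reg ^ xorout

print(hex(crc_msb_first(b"123456789")))  # 0x29b1, the CRC-16/CCITT-FALSE check value
print(hex(crc_msb_first(b"\x4c")))       # a message shorter than the 16-bit register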

Homomorphic encryption using Palisade library

To all homomorphic encryption experts out there:
I'm using the PALISADE library:
int plaintextModulus = 65537;
float sigma = 3.2;
SecurityLevel securityLevel = HEStd_128_classic;
uint32_t depth = 2;
//Instantiate the crypto context
CryptoContext<DCRTPoly> cc = CryptoContextFactory<DCRTPoly>::genCryptoContextBFVrns(
plaintextModulus, securityLevel, sigma, 0, depth, 0, OPTIMIZED);
Could you please explain (all) the parameters? I'm especially interested in ptm, depth, and sigma.
Secondly, I am trying to make a Packed Plaintext with the cc above.
cc->MakePackedPlaintext(array);
What is the maximum size of the array? On my local machine (8 GB RAM), when the array is larger than ~8000 int64 values I get a free(): invalid next size (normal) error.
Thank you for asking the question.
Plaintext modulus t (denoted as t here) is a critical parameter for BFV as all operations are performed mod t. In other words, when you choose t, you have to make sure that all computations do not wrap around, i.e., do not exceed t. Otherwise you will get an incorrect answer unless your goal is to compute something mod t.
sigma is the distribution parameter (used for the underlying Learning With Errors problem). You can just set it to 3.2. There is no need to change it.
Depth is the multiplicative depth of the circuit you are trying to compute. It has nothing to do with the size of vectors. Basically, if you have AxBxCxD, you have a depth of 3 with a naive approach. BFV also supports more efficient binary tree evaluation, i.e., (AxB)x(CxD) - this option reduces the depth to 2.
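As a toy illustration of that point (plain Python, nothing PALISADE-specific): a balanced binary-tree product of n factors needs only ceil(log2(n)) sequential multiplications, while the naive left-to-right chain needs n - 1.

from math import ceil, log2

def balanced_mult_depth(n_factors):
    # Multiplicative depth of a balanced binary-tree product of n factors.
    return ceil(log2(n_factors))

print(balanced_mult_depth(4))  # 2, i.e. (AxB)x(CxD); the naive chain needs 4 - 1 = 3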
BFV is a scheme that supports packing. By default, the size of packed ciphertext is equal to the ring dimension (something like 8192 for the example you mentioned). This means you can pack up to 8192 integers in your case. To support larger arrays/vectors, you would need to break them into batches of 8192 each and encrypt each one separately.
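The batching itself is just chunking. Here is a minimal, scheme-agnostic sketch in Python, where encrypt_chunk is a hypothetical stand-in for cc->MakePackedPlaintext(...) followed by cc->Encrypt(...):

def encrypt_in_batches(values, ring_dimension, encrypt_chunk):
    # Split a long vector into chunks of at most ring_dimension slots
    # and encrypt each chunk separately.
    ciphertexts = []
    for start in range(0, len(values), ring_dimension):
        ciphertexts.append(encrypt_chunk(values[start:start + ring_dimension]))
    return ciphertexts

# e.g. 20000 values with ring dimension 8192 -> 3 ciphertexts
print(len(encrypt_in_batches(list(range(20000)), 8192, lambda chunk: chunk)))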
Regarding your application, the CKKS scheme would probably be a much better option (I will respond on the application in more detail in the other thread).
I have some experience with the SEAL library which also uses the BFV encryption scheme. The BFV scheme uses modular arithmetic and is able to encrypt integers (not real numbers).
For the parameters you're asking about:
The Plaintext Modulus is an upper bound for the input integers. If this parameter is too low, it might cause your integers to overflow (depending on how large they are, of course).
The Sigma is the distribution parameter for Gaussian noise generation.
The Depth is the circuit depth, i.e. the maximum number of multiplications on any path.
Also, for the Packed Plaintext, you should use vectors, not arrays. Maybe that will fix your problem. If not, try lowering the size and making several vectors if necessary.
You can determine the ring dimension (generated by the crypto context based on your parameter settings) by using cc->GetRingDimension() as shown in line 113 of https://gitlab.com/palisade/palisade-development/blob/master/src/pke/examples/simple-real-numbers.cpp

Different usage of <PAD>, <EOS> and <GO> tokens

I found that there are many different usages of <PAD>, <EOS>, and <GO> tokens.
Personally, I separate those three tokens and assign different embeddings to them, specifically assigning an all-zero embedding vector to the <PAD> token (with an RNN-based seq2seq model).
The majority of codebases, however, represent <PAD>, <EOS>, and <GO> all as the <PAD> token.
I want to know if there is an optimal usage of those tokens (for RNN-based models or transformer-based models).
These are the special tokens used in seq2seq:
GO - the first token fed to the decoder, along with the thought vector, in order to start generating tokens of the answer.
EOS - "end of sentence" - as soon as the decoder generates this token, we consider the answer complete (you can't use ordinary punctuation marks for this purpose, because their meaning can be different).
UNK - "unknown token" - used to replace rare words that did not fit in your vocabulary. So your sentence My name is guotong1988 will be translated into My name is unk.
PAD - your GPU (or CPU at worst) processes your training data in batches, and all the sequences in one batch should have the same length. If the max length of your sequence is 8, your sentence My name is guotong1988 will be padded to this length: My name is guotong1988 pad pad pad pad
Reference: https://github.com/nicolas-ivanov/tf_seq2seq_chatbot/issues/15
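A minimal sketch in plain Python of how these four tokens typically come together when preparing decoder inputs and targets; the token ids and the tiny vocabulary are made up for illustration:

PAD, GO, EOS, UNK = 0, 1, 2, 3
vocab = {"my": 4, "name": 5, "is": 6}

def encode(words, max_len):
    ids = [vocab.get(w, UNK) for w in words]   # rare word -> UNK
    decoder_input = [GO] + ids                 # decoder starts from GO
    target = ids + [EOS]                       # decoder should finish with EOS
    pad = lambda seq: seq + [PAD] * (max_len - len(seq))
    return pad(decoder_input), pad(target)

print(encode(["my", "name", "is", "guotong1988"], max_len=8))
# ([1, 4, 5, 6, 3, 0, 0, 0], [4, 5, 6, 3, 2, 0, 0, 0])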

How to Make a Uniform Random Integer Generator from a Random Boolean Generator?

I have a hardware-based boolean generator that generates either 1 or 0 uniformly. How can I use it to make a uniform 8-bit integer generator? I'm currently using the collected booleans to create the binary string for the 8-bit integer. The generated integers aren't uniformly distributed; they follow the distribution explained on this page. Integers with the same number of 1's and 0's, such as 85 (01010101) and -86 (10101010), have the highest chance of being generated, and integers with many repeating bits, such as 0 (00000000) and -1 (11111111), have the lowest chance.
Here's the page that I've annotated with probabilities for each possible 4-bit integer. We can see that they're not uniform. 3, 5, 6, -7, -6, and -4, which have the same number of 1's and 0's, have a 6/16 probability, while 0 and -1, whose bits are all the same, have only a 1/16 probability.
And here's my implementation in Kotlin.
Based on your edit, there appears to be a misunderstanding here. By "uniform 4-bit integers", you seem to have the following in mind:
Start at 0.
Generate a random bit. If it's 1, add 1, and otherwise subtract 1.
Repeat step 2 three more times.
Output the resulting number.
Although the random bit generator may generate bits where each outcome is as likely as the other, and each 4-bit chunk may be just as likely as any other, the number of 1-bits in each chunk - and therefore the endpoint of the walk above - follows a binomial distribution, not a uniform one.
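A quick empirical check of this in Python, with random.getrandbits standing in for the hardware generator: the endpoint of the 4-step walk lands on {-4, -2, 0, 2, 4} with binomial, not uniform, probabilities.

import random
from collections import Counter

def walk4():
    # Start at 0; add 1 for a 1-bit, subtract 1 for a 0-bit, four times.
    return sum(1 if random.getrandbits(1) else -1 for _ in range(4))

print(sorted(Counter(walk4() for _ in range(100000)).items()))
# 0 shows up about 6/16 of the time, +/-2 about 4/16 each, +/-4 only about 1/16 each.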
What range of integers do you want? Say you're generating 4-bit integers. Do you want a range of [-4, 4], as in the 4-bit random walk in your question, or do you want a range of [-8, 7], which is what you get when you treat a 4-bit chunk of bits as a two's complement integer?
If the former, the random walk won't generate a uniform distribution, and you will need to tackle the problem in a different way.
In this case, to generate a uniform random number in the range [-4, 4], do the following:
Take 4 bits from the random bit generator and treat them as an integer in [0, 16);
If the integer is greater than 8, go to step 1.
Subtract 4 from the integer and output it.
This algorithm uses rejection sampling, but it is variable-time (and thus not appropriate whenever timing differences can be exploited in a security attack). Numbers in other ranges are generated similarly, but the details are too involved to describe in this answer. See my article on random number generation methods for details.
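A sketch of that rejection loop in Python; random_bit() is a placeholder for the asker's hardware boolean generator:

import random

def random_bit():
    return random.getrandbits(1)  # stand-in for the hardware source

def uniform_minus4_to_4():
    while True:
        value = 0
        for _ in range(4):              # 4 bits -> an integer in [0, 16)
            value = (value << 1) | random_bit()
        if value <= 8:                  # accept only [0, 8], otherwise retry
            return value - 4            # shift into [-4, 4]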
Based on the code you've shown me, your approach to building up bytes, ints, and longs is highly error-prone. For example, a better way to build up an 8-bit byte to achieve what you want is as follows (keeping in mind that I am not very familiar with Kotlin, so the syntax may be wrong):
var b = 0
for (i in 0 until 8) {
    b = b shl 1                        // shift old bits left
    if (bitStringBuilder[i] == '1') {
        b = b or 1                     // set the new low bit
    }                                  // otherwise the new bit stays 0
}
value = b.toByte() as T
Also, if MediatorLiveData is not thread safe, then neither is your approach to gathering bits using a StringBuilder (especially because StringBuilder is not thread safe).
The approach you suggest, combining eight bits of the boolean generator to make one uniform integer, will work in theory. However, in practice there are several issues:
You don't mention what kind of hardware it is. In most cases, the hardware is unlikely to generate uniformly random Boolean bits unless it is a so-called true random number generator designed for this purpose. For example, the hardware might generate uniformly distributed bits yet still show periodic behavior.
Entropy means how hard it is to predict the values a generator produces, compared to ideal random values. For example, a 64-bit data block with 32 bits of entropy is as hard to predict as an ideal random 32-bit data block. Characterizing a hardware device's entropy (or ability to produce unpredictable values) is far from trivial. Among other things, this involves entropy tests that have to be done across the full range of operating conditions suitable for the hardware (e.g., temperature, voltage).
Most hardware cannot produce uniform random values, so usually an additional step, called randomness extraction, entropy extraction, unbiasing, whitening, or deskewing, is done to transform the values the hardware generates into uniformly distributed random numbers. However, it works best if the hardware's entropy is characterized first (see previous point).
Finally, you still have to test whether the whole process delivers numbers that are "adequately random" for your purposes. There are several statistical tests that attempt to do so, such as NIST's Statistical Test Suite or TestU01.
For more information, see "Nondeterministic Sources and Seed Generation".
After your edits to this page, it seems you're going about the problem the wrong way. To produce a uniform random number, you don't add uniformly distributed random bits (e.g., bit() + bit() + bit()), but concatenate them (e.g., (bit() << 2) | (bit() << 1) | bit()). However, again, this will work in theory, but not in practice, for the reasons I mention above.
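A quick way to see the difference in Python, with random.getrandbits as a stand-in source: summing three bits gives a bell-shaped binomial distribution over 0..3, while concatenating them gives a flat distribution over 0..7.

import random
from collections import Counter

bit = lambda: random.getrandbits(1)

summed = Counter(bit() + bit() + bit() for _ in range(100000))
concat = Counter((bit() << 2) | (bit() << 1) | bit() for _ in range(100000))

print(sorted(summed.items()))  # peaked at 1 and 2
print(sorted(concat.items()))  # roughly flat over 0..7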

How to slow down a file source in GNU Radio?

I'm attempting to unpack bytes from an input file in GNU Radio Companion into a binary bitstream. My problem is that the Unpack K Bits block works at the same sample rate as the file source. So by the time the first bit of byte 1 is clocked out, byte 2 has already been loaded. How do I either slow down the file source or speed up the Unpack K Bits block? Is there a way I can tell GNU Radio Companion to repeat each byte from the file source 8 times?
Note that "after pack" is displaying 4 times as much data as "before pack".
My problem is that the Unpack K Bits block works at the same sample rate as the file source
No, it doesn't. Unpack K Bits is an interpolator block. In your case the interpolation is 8: for every input byte, 8 new bytes are produced.
The result is right, but the time scale of your sink is wrong. You have to change the sampling rate of the second GUI Time Sink to match the true sampling rate of the flowgraph after the Unpack K Bits block.
So instead of 32e3 it should be 8*32e3.
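A minimal headless sketch of this in the GNU Radio Python API (filenames are placeholders): Unpack K Bits emits k items per input item, so the stream after it runs at 8x the nominal rate, and any sink placed there has to be configured accordingly.

from gnuradio import gr, blocks

class UnpackFlow(gr.top_block):
    def __init__(self):
        gr.top_block.__init__(self)
        src = blocks.file_source(gr.sizeof_char, "input.bin", False)
        unpack = blocks.unpack_k_bits_bb(8)   # 1 byte in -> 8 bit-bytes out
        sink = blocks.file_sink(gr.sizeof_char, "bits.bin")
        self.connect(src, unpack, sink)

tb = UnpackFlow()
tb.run()   # bits.bin ends up 8x the size of input.bin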
Manos' answer is very good, but I want to add to this:
This is a common misunderstanding among people who have just started doing digital signal processing down at the sample level:
GNU Radio doesn't have a notion of sampling rate itself. The term sampling rate is only used by certain blocks, e.g. to generate a sine of the right frequency (in the case of the signal source, the number of samples per period is f_sample/f_signal), or to calculate the times or frequencies written on display axes (as in your case).
"Slowing down" would mean "making the computer process samples more slowly", but that doesn't change the signal itself.
All you need to do is configure the displaying sink with the sample rate of the stream it actually receives.

Mutable array (Objective-C) or variable array (C): any difference in performance?

I have a multidimensional array (3D matrix) of unknown size, where each element in this matrix is of type short int.
The size of the matrix can be approximated to be around 10 x 10 x 1,000,000.
As I see it, I have two options: a mutable array (Objective-C) or a variable-length array (C).
Is there any difference in reading from and writing to these arrays?
How large will the resulting file become when I save the array to disk?
Any advice would be gratefully accepted.
Provided you know the size of the array at the point of creation, i.e. you don't need to dynamically change the bounds, then a C array of short int with these dimensions will win easily - for reasons such as no encoding of values as objects and direct indexing.
If you write the array to a file in binary, it will just be the number of elements multiplied by sizeof(short int), without any overhead - for the matrix above, roughly 10 x 10 x 1,000,000 x 2 bytes = 200 MB. If you also need to store the dimensions, that is an extra 3 * sizeof(int) - 12 or 24 bytes.
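A quick sanity check of that size claim, using Python's array module to mimic a raw dump of C short ints (typically 2 bytes each); a smaller 10 x 10 x 1,000 matrix keeps the example fast:

from array import array

data = array('h', [0] * (10 * 10 * 1000))   # 'h' = signed short
with open("matrix.bin", "wb") as f:
    data.tofile(f)                           # raw dump, no per-element overhead
# matrix.bin is 10 * 10 * 1000 * 2 = 200,000 bytes.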
The mutable array will be slower (albeit not by much) since it's built on a C array. How large will the file be when you save this array?
It will take more than 10 x 10 x 1,000,000 bytes, because you'll have to encode it in a way that lets you reconstruct the matrix. This part is really up to you. For a 3D array, you'll have to use a special character/format to denote the third dimension. It depends on how you want to do this, but as text it will take 1 byte for every digit of every number + 1 character for the space between elements in the same row + (1 newline for every row of the 2nd dimension x n) + (1 other character to separate the 3D slices x n x n).
It might be easier to stick each row into its own file, and then stack the columns below it like you normally would. Then, in a new file, I would start putting the 3D elements such that each line lines up with the column number of the 2nd dimension. That's just me though; it's up to you.