Limit of multiplicative depth in leveled fully homomorphic encryption

Is there any practical estimate of the maximum multiplicative depth D that is supported by leveled FHE schemes?

Related

Homomorphic encryption using Palisade library

To all homomorphic encryption experts out there:
I'm using the PALISADE library:
#include "palisade.h"
using namespace lbcrypto;

int plaintextModulus = 65537;                    // t: all homomorphic arithmetic is done mod t
float sigma = 3.2;                               // error-distribution parameter for the underlying LWE problem
SecurityLevel securityLevel = HEStd_128_classic;
uint32_t depth = 2;                              // multiplicative depth the context must support
// Instantiate the crypto context
CryptoContext<DCRTPoly> cc = CryptoContextFactory<DCRTPoly>::genCryptoContextBFVrns(
    plaintextModulus, securityLevel, sigma, 0, depth, 0, OPTIMIZED);
Could you please explain (all) the parameters? I'm especially interested in the plaintext modulus, depth, and sigma.
Secondly I am trying to make a Packed Plaintext with the cc above.
cc->MakePackedPlaintext(array);
What is the maximum size of the array? On my local machine (8 GB RAM), when the array holds more than ~8000 int64 values I get a free(): invalid next size (normal) error.
Thank you for asking the question.
The plaintext modulus t is a critical parameter for BFV, as all operations are performed mod t. In other words, when you choose t, you have to make sure that no computation wraps around, i.e., exceeds t; otherwise you will get an incorrect answer (unless your goal is to compute something mod t). For example, with t = 65537, homomorphically computing 300 × 300 = 90000 returns 90000 mod 65537 = 24463.
Sigma is the parameter of the error distribution used for the underlying Learning With Errors (LWE) problem. You can just set it to 3.2; there is no need to change it.
Depth is the multiplicative depth of the circuit you are trying to compute. It has nothing to do with the size of the vectors. Basically, if you compute A×B×C×D sequentially, i.e., ((A×B)×C)×D, you have depth 3 with this naive approach. BFV also supports more efficient binary-tree evaluation, i.e., (A×B)×(C×D), which reduces the depth to 2.
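To make the depth arithmetic concrete, here is a tiny plain-Python sketch (not PALISADE code) comparing the two strategies:

from math import ceil, log2

def naive_depth(n):
    # ((A*B)*C)*D ... : every multiplication stacks on top of the previous one
    return n - 1

def tree_depth(n):
    # (A*B)*(C*D): pair up intermediate products, so depth grows logarithmically
    return ceil(log2(n))

print(naive_depth(4), tree_depth(4))  # prints: 3 2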
BFV is a scheme that supports packing. By default, the number of slots in a packed ciphertext equals the ring dimension (something like 8192 for the example you mentioned). This means you can pack up to 8192 integers in your case. To support larger arrays/vectors, you would need to break them into batches of 8192 each and encrypt each batch separately.
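As a rough illustration (plain Python, not the PALISADE API), supporting larger inputs just means splitting them into ring-dimension-sized chunks, each of which is packed and encrypted separately:

def batch(values, ring_dim=8192):
    # Split the input into chunks of at most ring_dim slots;
    # each chunk fits into one packed plaintext.
    return [values[i:i + ring_dim] for i in range(0, len(values), ring_dim)]

chunks = batch(list(range(20000)))
print([len(c) for c in chunks])  # [8192, 8192, 3616]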
Regarding your application, the CKKS scheme would probably be a much better option (I will respond on the application in more detail in the other thread).
I have some experience with the SEAL library which also uses the BFV encryption scheme. The BFV scheme uses modular arithmetic and is able to encrypt integers (not real numbers).
For the parameters you're asking about:
The plaintext modulus is an upper bound on the input integers. If this parameter is too low, it may cause your integers to overflow (depending on how large they are, of course).
Sigma is the distribution parameter for Gaussian noise generation.
The depth is the circuit depth, i.e., the maximum number of multiplications along any path.
Also, for the packed plaintext you should use vectors, not arrays. Maybe that will fix your problem; if not, try lowering the size and splitting the data across several vectors if necessary.
You can determine the ring dimension (generated by the crypto context based on your parameter settings) by using cc->GetRingDimension() as shown in line 113 of https://gitlab.com/palisade/palisade-development/blob/master/src/pke/examples/simple-real-numbers.cpp

What are the advantages and disadvantages of using the crossover genetic operator?

For example, we have this problem:
Maximise the function f(X) = X^2, with 0 ≤ X ≤ 31
Using binary encoding we can represent individuals using 5 bits. After undergoing a selection method, we get to the genetic operators.
For this problem (or any optimisation problem), what are the advantages and disadvantages of the following:
High or Low crossover rate
Using 1-Point crossover
Using multi-point crossover
Using Uniform crossover
Here's what I came up with so far:
High crossover rates and multi-point crossover can disrupt parents with good fitness and produce worse offspring
Low crossover rates mean the search will take longer to converge to an optimum
It's hard to give a good answer, as more information is needed about what exactly the 5 bits represent, but I gave it a try:
A high crossover rate causes the genomes in the next generation to be more random, as there will be more genomes that are a mix of previous generation genomes
A low crossover rate keeps fit genomes from the previous generation, although it decreases the chance that a very fit genome will be produced by crossover operation
Uniform crossover will create offspring that are very different from their parents if the parents are not similar; if the parents are similar, the offspring will be similar to them too.
Using 1-point crossover means that offspring genomes will be less diverse; they will be quite similar to their parents.
Using multi-point crossover is basically a mix between 1-point and uniform crossover, depending on the number of points.
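To make the operators concrete, here is a small Python sketch (assuming parents are equal-length bit strings, as in the 5-bit encoding above):

import random

def one_point(p1, p2):
    # One random cut point; swap the tails.
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def multi_point(p1, p2, k=2):
    # k random cut points; alternate which parent supplies each segment.
    cuts = sorted(random.sample(range(1, len(p1)), k)) + [len(p1)]
    c1, c2, prev = "", "", 0
    for i, cut in enumerate(cuts):
        a, b = (p1, p2) if i % 2 == 0 else (p2, p1)
        c1, c2, prev = c1 + a[prev:cut], c2 + b[prev:cut], cut
    return c1, c2

def uniform(p1, p2):
    # Each bit independently comes from either parent with probability 1/2.
    return "".join(random.choice(pair) for pair in zip(p1, p2))

p1, p2 = "11111", "00000"  # X = 31 and X = 0 in the 5-bit encoding
print(one_point(p1, p2), multi_point(p1, p2), uniform(p1, p2))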

Proof that k-means always converges?

I understand the steps of the k-means algorithm.
However, I'm not sure whether the algorithm always converges, or whether the observations can keep switching from one centroid to another.
The algorithm always converges, but not necessarily to the global optimum: each assignment and update step can only decrease the within-cluster sum of squares, and there are finitely many possible assignments, so the objective cannot decrease forever.
Observations may switch from one centroid to another for a while; this is sometimes referred to as "cycling". How long this is allowed to go on is controlled by parameters of the algorithm (a precision threshold, or delta). There are two standard stopping rules (which can both be used at the same time):
Precision parameter: if the centroids change by less than a threshold delta, stop the algorithm.
Maximum number of iterations: if the algorithm reaches that number of iterations, stop.
Note that the above stopping rules do not spoil the convergence characteristics of the algorithm: it will still converge, but not necessarily to the global optimum (this holds regardless of the rule used, as with many optimisation algorithms).
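For illustration, here is a minimal NumPy sketch of the standard (Lloyd's) iteration with both stopping rules (not a production implementation; empty clusters are crudely handled by keeping the old centroid):

import numpy as np

def kmeans(X, k, delta=1e-6, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):  # stopping rule 2: iteration cap
        # Assignment step: each point joins its nearest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.abs(new - centroids).max() < delta:  # stopping rule 1: precision
            centroids = new
            break
        centroids = new
    return centroids, labels

Each iteration can only decrease the within-cluster sum of squares, which is why adding these stopping rules does not affect convergence.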
You may be interested in the related question on stats.SE, "Cycling in k-means algorithm", and the proof of convergence referenced there.

How to calculate modulus of large numbers with large prime?

I am working with elliptic curve cryptography in a software environment. I wish to ask how to efficiently implement the modulo operation on large numbers with respect to a large prime,
e.g. (192-bit number) mod (192-bit Mersenne prime).
If there are any tricks or algorithms you can point to, that would be very helpful, as I am working with resource-constrained sensor nodes.
There is no 192-bit Mersenne prime, as considered in the question.
Implementing modular reduction of a 192-bit integer x modulo another 192-bit prime p is very straightforward: the result is x when x<p, or x-p otherwise.
Perhaps the question really is about efficient modular reduction, modulo a 192-bit prime p, of some larger quantity, for a prime p as commonly used in elliptic curve cryptography. Such primes are often chosen in a way that allows efficient modular reduction. For example, for P-192, the prime modulus p is specified to be 6277101735386680763835789423207666416083908700390324961279, which is fffffffffffffffffffffffffffffffeffffffffffffffffh, or 2^192 - 2^64 - 1. This p is so near (2^32)^6 that, when working in base 2^32, estimating a quotient digit in modular reduction modulo p is very easy, much like estimating a new digit when performing schoolbook Euclidean division by 999899 in base 10: much of the time, the leftmost digit of what remains of the dividend is the new digit of the quotient.
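As an illustration, here is a big-integer sketch in Python of this reduction for p = 2^192 - 2^64 - 1 (a sensor-node implementation would do the same folding on 32- or 64-bit limbs), using the identity 2^192 ≡ 2^64 + 1 (mod p):

import random

P = 2**192 - 2**64 - 1  # the P-192 prime

def reduce_p192(x):
    # Fold the bits above position 192 back down using
    # 2^192 ≡ 2^64 + 1 (mod P); a couple of passes suffice
    # even for the 384-bit product of two 192-bit values.
    while x >> 192:
        hi, lo = x >> 192, x & ((1 << 192) - 1)
        x = lo + (hi << 64) + hi
    while x >= P:  # at most a few conditional subtractions remain
        x -= P
    return x

# Sanity check against Python's built-in reduction.
v = random.getrandbits(384)
assert reduce_p192(v) == v % P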

What are the downsides of convolution by FFT compared to realspace convolution?

So I am aware that a convolution by FFT has a lower computational complexity than a convolution in real space. But what are the downsides of an FFT convolution?
Does the kernel size always have to match the image size, or are there functions that take care of this, for example in Python's numpy and scipy packages? And what about anti-aliasing effects?
FFT convolutions are based on the convolution theorem, which states that, given two functions f and g, if Fd() and Fi() denote the direct and inverse Fourier transform, and * and . denote convolution and multiplication, then:
f*g = Fi(Fd(f).Fd(g))
To apply this to a signal f and a kernel g, there are some things you need to take care of:
f and g have to be of the same size for the multiplication step to be possible, so you need to zero-pad the kernel (or input, if the kernel is longer than it).
When doing a DFT, which is what the FFT computes, the resulting frequency-domain representation of the function is periodic. This means that, by default, your kernel wraps around the edge when doing the convolution. If you want this, then all is great. But if not, you have to add extra zero-padding the size of the kernel to avoid it (see the small demo after this list).
Most (all?) FFT packages only work well (performance-wise) with sizes that do not have any large prime factors. Rounding the signal and kernel size up to the next power of two is a common practice that may result in a (very) significant speed-up.
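A small NumPy demo of the wrap-around issue and the zero-padding fix (a length-4 signal and a length-2 kernel, for illustration):

import numpy as np

f = np.array([1., 2., 3., 4.])  # signal
g = np.array([1., 1.])          # kernel

# Multiplying same-length DFTs gives *circular* convolution:
# the kernel wraps around and corrupts the first sample.
circ = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g, len(f))).real

# Zero-padding both to len(f) + len(g) - 1 recovers linear convolution.
n = len(f) + len(g) - 1
lin = np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)).real

print(circ)  # [5. 3. 5. 7.]     -- wrap-around at index 0
print(lin)   # [1. 3. 5. 7. 4.]  == np.convolve(f, g)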
If your signal and kernel sizes are f_l and g_l, doing a straightforward convolution in time domain requires g_l * (f_l - g_l + 1) multiplications and (g_l - 1) * (f_l - g_l + 1) additions.
For the FFT approach, you have to do 3 FFTs of size at least f_l + g_l - 1, as well as f_l + g_l - 1 pointwise multiplications.
For large sizes of both f and g, the FFT is clearly superior with its n*log(n) complexity. For small kernels, the direct approach may be faster.
scipy.signal has both convolve and fftconvolve methods for you to play around with, and fftconvolve handles all the padding described above transparently for you.
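For example, as a quick sanity check (assuming scipy is installed):

import numpy as np
from scipy.signal import convolve, fftconvolve

sig = np.random.randn(10000)  # long signal
ker = np.random.randn(51)     # short kernel

direct = convolve(sig, ker, method='direct')  # time-domain convolution
fast = fftconvolve(sig, ker)                  # FFT-based, padding handled internally
print(np.allclose(direct, fast))              # True, up to floating-point error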
While fast convolution has better "big O" complexity than direct-form convolution, there are a few drawbacks or caveats. I did some thinking about this topic for an article I wrote a while back.
Better "big O" complexity is not always better. Direct form convolution can be faster than using FFTs for filters smaller than a certain size. The exact size depends on the platform and implementations used. The crossover point is usually in the 10-40 coefficient range.
Latency. Fast convolution is inherently a blockwise algorithm. Queueing up hundreds or thousands of samples at a time before transforming them may be unacceptable for some real-time applications.
Implementation complexity. Direct form is simpler in terms of memory, code space, and the theoretical background required of the writer/maintainer.
On a fixed-point DSP platform (not a general-purpose CPU): the limited word size of fixed-point FFTs makes large fixed-point FFTs nearly useless. At the other end of the size spectrum, these chips have specialized MAC instructions that are well suited to direct-form FIR computation, increasing the range over which the O(N^2) direct form is faster than O(N log N). These factors tend to create a limited "sweet spot" where fixed-point FFTs are useful for fast convolution.