Need information on Bilinear Maps - cryptography

I've googled the following URL's and need more simple information regarding Bilinear Maps
Intro to Bilinear Maps -- by Bethencourt and
http://en.wikipedia.org/wiki/Bilinear_map
Lecture 25: Pairing-Based Cryptography -- MIT course
I would like to know in a simple to understand framework
1) what is a bilinear pairing -- an example would be great
2) how is it useful say in CP-ABE -- ciphertext policy attribute based encryption schema
Thanks

A pairing, in cryptography, is a way to make three-party operations.
Suppose that you have three groups G1, G2 and G3, in which discrete logarithm is hard. Let's note group operation additively (with a '+' sign) in G1 and G2, and multiplicatively in G3. A pairing e is a function which takes one element of G1 and one element of G2, and outputs an element of G3, such that, for all integers a and b, and all group elements X1 and Y1 (from G1) and X2 and Y2 (from G2), you get:
e(X1 + X2, Y1) = e(X1, Y1) e(X2, Y1) (pairing is linear in the first parameter)
e(X1, Y1 + Y2) = e(X1, Y1) e(X1, Y2) (pairing is linear in the second parameter)
e(aX, bY) = e(X, Y)ab (actually a consequence of the bilinearity explained above)
An example of a very weak pairing is the following: let p and q be two prime numbers such that q divides p-1. Let g be a multiplicative generator of a sub-group or order q modulo p (i.e. g is not 1, but gq = 1 mod p). Define G1 and G2 to be the integers modulo q, with addition as group operation. Define G3 to be the subgroup generated by g. Then, define e as: e(X, Y) = gXY mod p. This gives you a non-degenerate pairing ("non-degenerate" means that the pairing can return values other than 1). But it is useless for cryptography because "discrete logarithm" in G1 and G2 is a matter of a simple modular division, i.e. very easy to compute efficiently (because we used integer addition as group law).
A non-weak pairing can be used for identity-based cryptography (where the public key of someone is their email address, not some mathematical object linked to the address through a signed certificate -- the point being, precisely, to avoid a PKI). It can also be used for three-party Diffie-Hellman, or more generally protocols which involve three entities at a time (e.g. protocols for "electronic cash" or some voting systems). See this page for some details and links.
The currently only known cryptographically strong pairings, but still usable in practice, are based on specially crafted elliptic curves. See Ben Lynn's PhD dissertation for a mathematical introduction, and PBC for the implementation. An "easy" variant will make G1 and G2 an elliptic curve over a field GF(p) (for a prime integer p), and G3 will be a multiplicative subgroup of the invertible elements in GF(p2). Be warned that it is somewhat higher-level mathematics than "plain" elliptic curves (you have to know how field extensions work).

Related

Public key Exceeding Mod P? A Clarification Request On The Discrete Logarithm Problem

I tried to observe / implement the discrete logarithm problem but I noticed something about it; but before I get into it let me give some clarification which is open to correction.
a = b^x mod P
Where as
a = the public key of the address;
b = the generator point of the secp256k1 koblitz curve (this is the
curve in context);
x = the discrete log;
P = the modular integer.
I coupled all parameters below:
A =
044f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa385b6b1b8ead809ca67454d9683fcf2ba03456d6fe2c4abe2b07f0fbdbb2f1c1
(uncompressed public key)
034f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa :
(compressed public key)
B = 04 79BE667E F9DCBBAC 55A06295 CE870B07 029BFCDB 2DCE28D9 59F2815B
16F81798 483ADA77 26A3C465 5DA4FBFC 0E1108A8 FD17B448 A6855419
9C47D08F FB10D4B8 (uncompressed generator point)
02 79BE667E F9DCBBAC 55A06295 CE870B07 029BFCDB 2DCE28D9 59F2815B
16F81798 (compress generator point)
X = ?
P = FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFE
FFFFFC2F
I don't actually know what part of the parameters I should use ( compressed or uncompressed)
N. B : I tried the uncompressed public key to Mod P but the uncompressed public key exceeded the Mod P in size.
What should I do about this?
a = b^x mod P Where as
a = the public key of the address;
b = the generator point of the secp256k1 koblitz curve (this is the
curve in context);
x = the discrete log;
P = the modular integer.
We are given a discrete logarithm problem (DLOG) ( also called the index calculus ); That is given a, b, and P find x such that a = b^x mod P is held. The above is actually the multiplicative notation for finite field DLOG as used by OP. ECC DLOG is additive and has different notation as;
That is given points A and the base B find x such that A = [x]B is held on the curve E(FP). [x]B simply means that add the point B x-times to itself.
Compression
The beginning byte provides information about compression.
02 compression and choose y
03 compression and choose -y
04 No compression
To find the y, put the x into the curve equation and solve the quadratic residue by the Tonelli-Shanks algorithm.
In your case, both are given, no problem. Use the uncompressed public key.
The current record for secp256k1 is 114-bit On 16 June 2020 by Aleksander Zieniewic and they provided their software. So, if you don't have a low target, you cannot break the discrete log.
I tried the uncompressed public key to Mod P but the uncompressed public key exceeded the Mod P in size.
A point Q in the Elliptic curve when used affine coordinate system it has two coordinates as Q=(x,y) where x,y from the defining field (P in your case). When checking a point Q is either on the curve or not put x and y into the curve equation y^2 = x^3+ax+b and check the equality.
To uncompress insert the value of x into the equation x^3+ax+b mod P to get let say value a, then use the Tonelli-Shanks algorithm to find the square root of a in this equation y^2 = a mod P to find y and -y. According to compression value choose y or -y.
update per comment
I tried using the compressed public key but it was still bigger than mod p.
Compression a point requires information about what is the compression. Now you have given two forms of the public key a;
No compression: since the beginning starts with 04
Compression but choose the -y since starts wiht 03
Using capitals here not to confuse with hex a;
A = 04
4f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa
385b6b1b8ead809ca67454d9683fcf2ba03456d6fe2c4abe2b07f0fbdbb2f1c1
A = 03
4f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa
You can use the curve equation to derive the second part with chosen -y
And you can compare the coordinate values with
p = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
a = 0x4f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa
if a>p:
print("a")
or use yours eye and mind;
P = FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
x(A)= 4f355bdcb7cc0af728ef3cceb9615d90684bb5b2ca5f859ab0f0b704075871aa
y(A)= 385b6b1b8ead809ca67454d9683fcf2ba03456d6fe2c4abe2b07f0fbdbb2f1c1

Comparing complexity of O(n+m) and O(max(n,m))

I had a job interview today. And was asked about complexity of std:set_intersection. When I was answering I mentioned that
O(n+m)
is equal to:
O(max(n,m))
I was told that this is incorrect. I was unsuccessfully trying to show equivalence with:
O(0.5*(n+m)) ≤ O(max(n,m)) ≤ O(n+m)
My question is: am I really incorrect?
For all m, n ≥ 0 it is valid that max(m, n) ≤ m + n → max(m, n) in O(m + n), and m + n ≤ 2max(m, n) → m + n in O(max(m, n)).
Thus O(max(m, n)) = O(m + n).
ADDENDUM: If f belongs O(m + n) then a constant D > 0 exists, that f(n, m) < D * (m + n) for m and n large enough. Thus f(n, m) < 2 D * max(m, n), and O(m + n) must be a subset of O(max(m, n)). The proof of O(max(m, n)) is a subset of O(m + n) is made analogously.
Well you have totally right about O(n+m) is equal to O(max(n,m)),even more precise we can prove Θ(n+m)=Θ(max(n,m) which is more tight and proves your sentence. The mathematical proof is (for both big-O and Θ) very simple and easy to understand with common sense. So since we have a mathematical proof which is a way to say something but in a more well defined and strict way which doesn't leaves any ambiguity.
Though you was (wrongly) told that this is incorrect because if we want to be very precise this is not the appropriate - mathematical way to express that order of max(m,n) is same as m+n. You used the words "is equal" referring to big-O notation but what is the definition of big-O notation?
It is referred to Sets. Saying max(n+m) belongs to O(m+n) is the
most correct way and vice versa m+n belongs to O(max(m,n)). In big O
notation is commonly used and accepted to say m+n = O(max(n,m)).
The problem caused is that you didn't try to refer to the order of a function like f is O(g) but you tried to compare Sets O(f) and O(g).But proving two infinite sets are equal is not easy (and that may confused the interviewer).
We can say Sets A and B are identical(or equal) when contain same elements (we do not try to compare but instead refer to elements they contain so they must be finite). And even identification can't be easily applied when talking about Big O Sets.
Big O of F is used to notate that we are talking about the Set that
contains all functions with order greater or equal than F. How many
functions are there??
Infinite since F+c is contained and c can take infinite values.
How could you say two different Sets are identical(or equal) when they are
infinite ,well it is not that simple.
So I understand what you are thinking that n+m and max(n,m) have same
order but **the right way to express that** is by saying n+m is
O(max(n,m)) and max(n,m)is O(m+n) ( O(m+n) is equal to O(max(m,n))
may better requires a proof).
One more thing, we said that these functions have same order and this is absolutely mathematically correct but when trying to do optimization of an algorithm and you may need to take into account some lower order factors then maybe they give you slightly different results but the asymptotic behavior is proved to be the same.
CONCLUSION
As you can read in wikipedia (and in all cs courses in every university or in every algorithm book) Big O/θ/Ω/ω/ο notations helps us compare functions and find their order of growth and not for Sets of Functions and this is why you were told you were wrong. Though is easy to say O(n^2) is subset of O(n) it is very difficult to compare infinite to say if two sets are identical. Cantor have worked on categorizing infinite sets, for example we know that natural numbers are countable infinite and real numbers are uncountable infinite so real numbers are more than natural numbers even though both are infinite. It is getting very complicating when trying t order and categorize infinite sets and this would be more of a research in maths than a way of comparing functions.
UPDATE
It turns out you could simply prove O(n+m) equals to O(max(n,m)):
for every function F which belongs to O(n+m) this means that there are constant c and k such:
F <= c(n+m) for every n>=k and m>=k
then also stands:
F <= c(n+m)<= 2c*max(n,m)
so F belongs to O(max(n,m)) and as a result O(m+n) is subset of O(max(n,m)).
Now consider F belongs to O(max(n,m)) then there are constants c and k such:
F <= c*max(n+m) for every n>=k and m>=k
and we also have:
F <= c*max(n+m)<=2c(m+n) for every n>=k and m>=k
so there is c'=2c and with same k by definition: F is O(m+n) and as a result O(max(n,m)) is subset of O(n+m). Because we proved O(m+n) is subset of O(max(n,m)) we proved that O(max(m,n)) and O(m+n) are equal and this mathematical proof proves you had totally right without any doubt.
Finally note that proving that m+n is O(max(n,m)) and max(n,m) is O(m+n) doesn't proves immediately that sets are equal (we need a proof for that) as your saying it just proves that functions have same order but we didn't examine the sets. Though it is easy to see (in general case) that if f is O(g) and g is O(F) then you can easily prove in that case the big O sets equality like we did in the previous paragraph.
We'll show by rigorous Big-O analysis that you are indeed correct, given one possible choice of parameter of growth in your analysis. However, this does not necessarily mean that the viewpoint of the interviewer is incorrect, rather that his/her choice of parameter of growth differs. His/her prompt that your answer was outright incorrect, however, is questionable: you've possibly simply used two slightly different approaches to analyzing the asymptotic complexity of std::set_intersection, both leading to the general consensus that the algorithm runs in linear time.
Preparations
Lets start by looking at the reference of std::set_intersection at cppreference (emphasis mine)
http://en.cppreference.com/w/cpp/algorithm/set_intersection
Parameters
first1, last1 - the first range of elements to examine
first2, last2 - the second range of elements to examine
Complexity
At most 2·(N1+N2-1) comparisons, where
N1 = std::distance(first1, last1)
N2 = std::distance(first2, last2)
std::distance itself is naturally linear (worst case: no random access)
std::distance
...
Returns the number of elements between first and last.
We'll proceed to briefly recall the basic of Big-O notation.
Big-O notation
We loosely state the definition of a function or algorithm f being in O(g(n)) (to be picky, O(g(n)) being a set of functions, hence f ∈ O(...), rather than the commonly misused f(n) ∈ O(...)).
If a function f is in O(g(n)), then c · g(n) is an upper
bound on f(n), for some non-negative constant c such that f(n) ≤ c · g(n)
holds, for sufficiently large n (i.e. , n ≥ n0 for some constant
n0).
Hence, to show that f ∈ O(g(n)), we need to find a set of (non-negative) constants (c, n0) that fulfils
f(n) ≤ c · g(n), for all n ≥ n0, (+)
We note, however, that this set is not unique; the problem of finding the constants (c, n0) such that (+) holds is degenerate. In fact, if any such pair of constants exists, there will exist an infinite amount of different such pairs.
We proceed with the Big-O analysis of std::set_intersection, based on the already known worst case number of comparisons of the algorithm (we'll consider one such comparison a basic operation).
Applying Big-O asymptotic analysis to the set_intersection example
Now consider two ranges of elements, say range1 and range2, and assume that the number of elements contained in these two ranges are m and n, respectively.
Note! Already at this initial stage of the analys do we make a choice: we choose to study the problem in terms of two different parameters of growth (or rather, focusing on the largest one of these two). As we shall see ahead, this will lead to the same asymptotic complexity as the one stated by the OP. However, we could just as well choose to let k = m+n be the parameter of choice: we would still conclude that std::set_intersection is of linear-time complexity, but rather in terms of k (which is m+n which is not max(m, n)) than the largest of m and n. These are simply the preconditions we freely choose to set prior to proceeding with our Big-O notation/asymptotic analysis, and it's quite possibly that the interviewer had a preference of choosing to analyze the complexity using k as parameter of growth rather than the largest of its two components.
Now, from above we know that as worst case, std::set_intersection will run 2 · (m + n - 1) comparisons/basic operations. Let
h(n, m) = 2 · (m + n - 1)
Since the goal is to find an expression of the asymptotic complexity in terms of Big-O (upper bound), we may, without loss of generality, assume that n > m, and define
f(n) = 2 · (n + n - 1) = 4n - 2 > h(n,m) (*)
We proceed to analyze the asymptotic complexity of f(n), in terms of Big-O notation. Let
g(n) = n
and note that
f(n) = 4n - 2 < 4n = 4 · g(n)
Now (choose to) let c = 4 and n0 = 1, and we can state the fact that:
f(n) < 4 · g(n) = c · g(n), for all n ≥ n0, (**)
Given (**), we know from (+) that we've now shown that
f ∈ O(g(n)) = O(n)
Furthermore, since `(*) holds, naturally
h ∈ O(g(n)) = O(n), assuming n > m (i)
holds.
If we switch our initial assumption and assume that m > n, re-tracing the analysis above will, conversely, yield the similar result
h ∈ O(g(m)) = O(m), assuming m > n (ii)
Conclusion
Hence, given two ranges range1 and range2 holding m and n elements, respectively, we've shown that the asymptotic complexity of std::set_intersection applied two these two ranges is indeed
O(max(m, n))
where we're chosen the largest of m and n as the parameter of growth of our analysis.
This is, however, not really valid annotation (at least not common) when speaking about Big-O notation. When we use Big-O notation to describe the asymptotic complexity of some algorithm or function, we do so with regard to some single parameter of growth (not two of them).
Rather than answering that the complexity is O(max(m, n)) we may, without loss of generality, assume that n describes the number of elements in the range with the most elements, and given that assumption, simply state than an upper bound for the asymptotic complexity of std::set_intersection is O(n) (linear time).
A speculation as to the interview feedback: as mentioned above, it's possible that the interviewer simply had a firm view that the Big-O notation/asymptotic analysis should've been based on k = m+n as parameter of growth rather than the largest of its two components. Another possibility could, naturally, be that the interviewer simply confusingly queried about the worst case of actual number of comparisons of std::set_intersection, while mixing this with the separate matter of Big-O notation and asymptotic complexity.
Final remarks
Finally note that the analysis of worst case complexity of std::set_intersect is not at all representative for the commonly studied non-ordered set intersection problem: the former is applied to ranges that are already sorted (see quote from Boost's set_intersection below: the origin of std::set_intersection), whereas in the latter, we study the computation of the intersection of non-ordered collections.
Boost: set_intersection
Description
set_intersection constructs a sorted range that is the intersection
of the sorted ranges rng1 and rng2. The return value is the
end of the output range.
As an example of the latter, the Intersection set method of Python applies to non-ordered collections, and is applied to say sets s and t, it has an average case and a worst-case complexity of O(min(len(s), len(t)) and O(len(s) * len(t)), respectively. The huge difference between average and worst case in this implementation stems from the fact that hash based solutions generally works very well in practice, but can, for some applications, theoretically have a very poor worst-case performance.
For additional details of the latter problem, see e.g.
Intersection of two unsorted sets or lists # SE-CSTheory

Generating Matched Pairs for Statistical Analysis

In my study, a person is represented as a pair of real numbers (x, y). x is on [30, 80] and y is [60, 120]. There are two types of people, A and B. I have ~300 of each type. How can I generate the largest (or even a large) set of pairs of one person from A with one from B: ((xA, yA), (xB, yB)) such that each pair of points is close? Two points are close if abs(x1-x2) < dX and abs(y1 - y2) < dY. Similar constraints are acceptable. (That is, this constraint is roughly a Manhattan metric, but euclidean/etc is ok too.) Not all points need be used, but no point can be reused.
You're looking for the Hungarian Algorithm.
Suggested formulation: A are rows, B are columns, each cell contains a distance metric between Ai and Bi, e.g. abs(X(Ai)-X(Bi)) + abs(Y(Ai)-Y(Bi)). (You can normalize the X and Y values to [0,1] if you want distances to be proportional to the range of each variable)
Then use the Hungarian Algorithm to minimize matching weight.
You can filter out matches with distances over your threshold. If you're worried that this filtering might cause the approach to be sub-optimal, you could set distances over your threshold to a very high number.
There are many implementations of this algorithm. A short search found one in any conceivable language, including VBA for Excel and some online solvers (not sure about matching 300x300 matrix with them, though)
Hungarian algorithm did it, thanks Etov.
Source code available here: http://www.filedropper.com/stackoverflow1

Elliptic curves for ECDSA: choosing "generator"

G(Gx, Gy) -- which is also called generator -- is a point on Elliptic Curve (EC) on Finite field. Finite field size = p -- prime modulus.
Say we have an EC(Fp): y**2 = x**3 + ax + b (mod p)
How should the point G be selected on it?
Does every point has to be found on EC(Fp) and then chosen one of those?
Or Gx\ Gy have to be somehow specific?
The only thing i know: G's order must be a prime number.
P.S.
Sorry for my English and Thank you.

How can I convert a ECDSA curve specification from the SEC2 form into the form needed by Go?

I`m trying to implement ECDSA in the curve secp256k1 in Google Go.
Secp256k1 is defined by the SECG standard (SEC 2, part 2, Recommended Elliptic Curve Domain Parameters over 𝔽p, page 15) in terms of parameters p, a, b, G compressed, G uncompressed, n and h.
In Go's crypto library, the curves are defined by parameters P, N, B, Gx, Gy and BitSize. How do I convert parameters given by SECG into the ones needed by Go?
In the elliptic package of Go,
A Curve represents a short-form Weierstrass curve with a=-3.
So, we have curves of the form y² = x³ - 3·x + B (where both x and y take values in 𝔽P). P and B thus are the parameters to identify a curve, the others are only necessary for the operations on the curve elements which will be used for cryptography.
The SECG standard SEC 2 defines the secp256k1 curve as y² = x³ + a·x + b with a = 0, i.e. effectively y² = x³ + b.
These curves are not the same, independent of which b and B are selected here.
Your conversion is not possible with the elliptic package's Curve class, as it only supports some special class of curves (these with a = -3), while SEC 2 recommends curves from other classes (a = 0 for the ...k1 curves).
On the other hand, the curves with ...r1 in the name seem to have a = -3. And actually, secp256r1 seems to be the same curve which is available in elliptic as p256(). (I didn't prove this, but at least some the hex digits of the uncompressed form of the base point in SEC 2 are the coordinates of the base point in elliptic.)