NP verifier-based definition - definition

I'm a computer science student and I'm having some trouble understanding the verifier-based definition of NP problems.
The definition says that a problem is in NP if it can be verified in polynomial time by a deterministic Turing machine, given a "certificate".
But what happens if the certificate is exactly the problem solution? It's only a bit, so it's obviously polynomially bounded by the input size, and it's obviously verifiable in constant, thus polynomial, time.
Therefore, every decision problem would belong to NP.
Where am I wrong?

"But what happens if the certificate is exactly the problem solution? It's only a bit, so it's obviously polynomially bounded by the input size, and it's obviously verifiable in constant, thus polynomial, time."
Why "obviously"? You might have to spend an exponential amount of time verifying the solution, in which case the problem need not be in NP. The point is that, even though the certificate is a single bit for a decision problem, you don't know whether that bit should be zero or one to solve the problem. (If you always did know that, then any decision problem in P or in NP would be solvable in constant time.)
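To make the point concrete, here is a minimal sketch of what a verifier looks like, using subset sum as a stand-in NP problem (the problem choice and the names verify_subset_sum and one_bit_verifier are illustrative assumptions, not part of the question). A real verifier checks the certificate against the instance; a "verifier" that is handed the answer bit has nothing to check it against, so it would accept NO instances as well.

```python
from typing import Sequence


def verify_subset_sum(numbers: Sequence[int], target: int,
                      certificate: Sequence[int]) -> bool:
    """Polynomial-time verifier for subset sum.

    The certificate is a list of distinct indices into `numbers`;
    it is accepted only if the selected numbers add up to `target`.
    """
    if len(set(certificate)) != len(certificate):
        return False  # indices must be distinct
    if any(i < 0 or i >= len(numbers) for i in certificate):
        return False  # indices must be valid
    return sum(numbers[i] for i in certificate) == target


def one_bit_verifier(numbers: Sequence[int], target: int, bit: int) -> bool:
    """Hypothetical 'verifier' that trusts a one-bit certificate.

    It cannot check the bit against the instance, so it accepts
    whenever it is told 'yes', even on NO instances.
    """
    return bit == 1


numbers = [3, 5, 8]
yes_instance_ok = verify_subset_sum(numbers, 11, [0, 2])  # 3 + 8 == 11
# No subset of [3, 5, 8] sums to 4, so every certificate is rejected.
no_instance_rejected = not any(
    verify_subset_sum(numbers, 4, cert)
    for cert in ([], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2])
)
# The one-bit scheme wrongly accepts the same NO instance.
one_bit_fooled = one_bit_verifier(numbers, 4, 1)
```

The definition requires that for a NO instance no certificate at all makes the verifier accept, which is exactly what the one-bit scheme fails to guarantee.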

Not all problems can be verified in polynomial time, even when the solution is polynomial in length. Let's consider the Travelling Salesman Problem. Given a solution, you can only verify that it is a tour of the cities; you cannot tell whether it is the minimum-length tour without, as far as anyone knows, exploring the other possible tours.
Hence, in most cases the decision problem is NP-complete (e.g. deciding whether the set of cities contains a tour of at most a given length), while the optimization problem is NP-hard (e.g. finding the minimum-length tour).

Related

Optimization has found the global minimum but converges to a local one

I am using the stochastic optimization algorithm CMA-ES. Although it finds the global minimum in the first cycles (I know because it is a made-up benchmark test), after some cycles the algorithm converges to another minimum (a local one, since it has a higher cost function value).
Does anyone have experience with this?
Do I have to care that it converges to a local minimum, given that it has found the global one? Is it wrong to just use the global minimum and not care about where the algorithm has converged?
My interpretation of the results is that this happens because of the normal distribution: the global minimum attracts only a few solutions, while the local one attracts a large percentage of them. (I have tried many different population sizes, but the result is the same.)
Thank you in advance for your help!
It is common to keep a global "best" solution when running evolutionary algorithms, especially if they are the kind that is allowed to move to worse results from a better one.
If you are running the algorithm with an approximate fitness function and getting a good-enough result is okay, you can go with what it converges to. Depending on the problem you are solving, it might be very good or very bad to overfit a solution.
If your fitness function is not an approximation and is the correct metric to optimize, just keep the best performer and use it when you finish running the algorithm.
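A minimal sketch of the "keep the best performer" idea follows. It is not CMA-ES: it uses a deliberately simple random-walk search that always moves to the new candidate, even a worse one, so that the final point and the best point seen can differ. The function names, the sphere cost function, and all parameter values are assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple


def sphere(x: List[float]) -> float:
    """Toy cost function with its global minimum at the origin."""
    return sum(v * v for v in x)


def search_with_best_tracking(
    cost: Callable[[List[float]], float],
    start: List[float],
    steps: int = 200,
    sigma: float = 0.5,
    seed: int = 0,
) -> Tuple[List[float], float, List[float], float]:
    """Random-walk search that records the best point ever visited.

    Returns (best_point, best_cost, final_point, final_cost).
    """
    rng = random.Random(seed)
    current = list(start)
    best, best_cost = list(current), cost(current)
    for _ in range(steps):
        # Non-elitist move: the walk always accepts the new candidate.
        current = [v + rng.gauss(0.0, sigma) for v in current]
        current_cost = cost(current)
        # Independent bookkeeping: remember the best point seen so far.
        if current_cost < best_cost:
            best, best_cost = list(current), current_cost
    return best, best_cost, current, cost(current)


best, best_cost, final, final_cost = search_with_best_tracking(sphere, [2.0, -1.5])
```

The same bookkeeping can wrap any optimizer loop: evaluate each candidate, compare it with the stored best, and report the stored best at the end regardless of where the search distribution has settled.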

Confusion about NP-hard and NP-Complete in Traveling Salesman problems

Traveling Salesman Optimization (TSP-OPT) is an NP-hard problem and Traveling Salesman Search (TSP) is NP-complete. However, TSP-OPT can be reduced to TSP, since if TSP can be solved in polynomial time, then so can TSP-OPT (1). I thought that for A to be reduced to B, B has to be at least as hard as A. As I can see in the references below, TSP-OPT can be reduced to TSP, yet TSP-OPT is supposed to be harder than TSP. I am confused...
References: (1)Algorithm, Dasgupta, Papadimitriou, Vazirani Exercise 8.1 http://algorithmics.lsi.upc.edu/docs/Dasgupta-Papadimitriou-Vazirani.pdf https://cseweb.ucsd.edu/classes/sp08/cse101/hw/hw6soln.pdf
http://cs170.org/assets/disc/dis10-sol.pdf
I took a quick look at the references you gave, and I must admit there's one thing I really dislike in your textbook (1st pdf) : they address NP-completeness while barely mentioning decision problems. The provided definition of an NP-complete problem also somewhat deviates from what I'd expect from a textbook. I assume that was a conscious decision to make the introduction more appealing...
I'll provide a short answer, followed by a more detailed explanation about related notions.
Short version
Intuitively (and informally), a problem is in NP if it is easy to verify its solutions.
On the other hand, a problem is NP-hard if it is difficult to solve, or find a solution.
Now, a problem is NP-complete if it is both in NP, and NP-hard. Therefore you have two key, intuitive properties to NP-completeness. Easy to verify, but hard to find solutions.
Although they may seem similar, verifying and finding solutions are two different things. When you use reduction arguments, you're looking at whether you can find a solution. In that regard, both TSP and TSP-OPT are NP-hard, and it is difficult to find their solutions. In fact, the third pdf provides a solution to exercise 8.1 of your textbook, which directly shows that TSP and TSP-OPT are equivalently hard to solve.
Now, one major distinction between TSP and TSP-OPT is that the former is (what your textbook calls) a search problem, whereas the latter is an optimization problem. The notion of verifying the solution of a search problem makes sense, and in the case of TSP, it is easy to do, therefore it is NP-complete. For optimization problems, the notion of verifying a solution becomes weird, because I can't think of any way to do that without first computing the size of an optimal solution, which is roughly equivalent to solving the problem itself. Since we do not know an efficient way to verify a solution for TSP-OPT, we cannot say that it is in NP, thus we cannot say that it is NP-complete. (More on this topic in the detailed explanation.)
The tl;dr is that for TSP-OPT, it is both hard to verify and hard to find solutions, while for TSP it is easy to verify and hard to find solutions.
Reduction arguments only help when it comes to finding solutions, so you need other arguments to distinguish them when it comes to verifying solutions.
More details
One thing your textbook is very brief about is what a decision problem is.
Formally, the notion of NP-completeness, NP-hardness, NP, P, etc, were developed in the context of decision problems, and not optimization or search problems.
Here's a quick example of the differences between these different types of problems.
A decision problem is a problem whose answer is either YES or NO.
TSP decision problem
Input: a graph G, a budget b
Output: Does G admit a tour of weight at most b ? (YES/NO)
Here is the search version
TSP search problem
Input: a graph G, a budget b
Output: Find a tour of G of weight at most b, if it exists.
And now the optimization version
TSP optimization problem
Input: a graph G
Output: Find a tour of G with minimum weight.
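As a rough illustration of why the decision and search versions are easy to verify, here is a sketch of a polynomial-time check for a claimed tour against a budget b. The graph representation (a dict of undirected edge weights over vertices 0..n-1) and the function name are assumptions made for this example.

```python
from typing import Dict, Sequence, Tuple

Weights = Dict[Tuple[int, int], float]


def verify_tsp_decision(n: int, weights: Weights, budget: float,
                        tour: Sequence[int]) -> bool:
    """Check that `tour` visits each of the n vertices exactly once,
    uses only existing edges, and has total weight at most `budget`.

    Runs in time polynomial in n: one sort plus one pass over the tour.
    """
    if sorted(tour) != list(range(n)):
        return False  # not a permutation of the vertices
    total = 0.0
    for i in range(n):
        u, v = tour[i], tour[(i + 1) % n]  # wrap around to close the tour
        if (u, v) in weights:
            total += weights[(u, v)]
        elif (v, u) in weights:
            total += weights[(v, u)]
        else:
            return False  # the tour uses an edge that is not in the graph
    return total <= budget


# Small example: a 4-cycle of cheap edges plus two expensive diagonals.
weights = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1, (0, 2): 5, (1, 3): 5}
cheap_tour_ok = verify_tsp_decision(4, weights, 4, [0, 1, 2, 3])
cheap_tour_over_budget = verify_tsp_decision(4, weights, 3, [0, 1, 2, 3])
expensive_tour = verify_tsp_decision(4, weights, 4, [0, 2, 1, 3])
```

Note what the check does not do: it says nothing about whether a cheaper tour exists, which is what a verifier for the optimization version would have to establish.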
Out of context, the TSP problem could refer to any of these. What I personally refer to as the TSP is the decision version. Here your textbook uses TSP for the search version, and TSP-OPT for the optimization version.
The problem here is that those various problems are strictly distinct. The decision version only asks for existence, while the search version asks for more: it needs one example of such a solution. In practice, we often want to have the actual solution, so more practical approaches may omit to mention decision problems.
Now what about it? The definition of an NP-complete problem was meant for decision problems, so it technically does not apply directly to search or optimization problems. But because the theory behind it is well defined and useful, it is handy to still apply the term NP-complete/NP-hard to search/optimization problem, so that you have an idea of how hard these problems are to solve. So when someone says the travelling salesman problem is NP-complete, formally it should be the decision problem version of the problem.
Obviously, many notions can be extended so that they also cover search problems, and that is how it is presented in your textbook. The differences between decision, search, and optimization, are precisely what the exercises 8.1 and 8.2 try to cover in your textbook. Those exercises are probably meant to get you interested in the relationship between these different types of problems, and how they relate to one another.
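The relationship that exercise 8.1 asks about can be sketched as a binary search on the budget: given any solver for the search version, the optimization version needs only logarithmically many calls to it. In the sketch below the search solver is a brute-force stand-in (so the whole thing is not actually polynomial), distances are assumed to be non-negative integers given as a full matrix, and all function names are hypothetical.

```python
from itertools import permutations
from typing import Callable, List, Optional, Tuple

Matrix = List[List[int]]
Tour = Tuple[int, ...]


def tour_cost(dist: Matrix, tour: Tour) -> int:
    """Total length of the closed tour."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def tsp_search(dist: Matrix, budget: int) -> Optional[Tour]:
    """Stand-in for a TSP search solver: return some tour of cost at most
    `budget`, or None if there is none. Brute force, for illustration only."""
    n = len(dist)
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        if tour_cost(dist, tour) <= budget:
            return tour
    return None


def tsp_opt_via_search(
    dist: Matrix,
    search: Callable[[Matrix, int], Optional[Tour]] = tsp_search,
) -> Tuple[Tour, int]:
    """Solve TSP-OPT with O(log(sum of distances)) calls to `search`."""
    lo, hi = 0, sum(sum(row) for row in dist)  # hi bounds every tour's cost
    best = search(dist, hi)  # succeeds on a complete distance matrix
    while lo < hi:
        mid = (lo + hi) // 2
        tour = search(dist, mid)
        if tour is not None:
            best, hi = tour, mid  # a tour within mid exists: tighten budget
        else:
            lo = mid + 1  # no tour within mid: optimum is larger
    return best, lo


dist = [
    [0, 1, 5, 1],
    [1, 0, 1, 5],
    [5, 1, 0, 1],
    [1, 5, 1, 0],
]
best_tour, best_cost = tsp_opt_via_search(dist)
```

The number of calls is logarithmic in the sum of the distances, hence polynomial in the bit length of the input, which is why a polynomial-time solver for the search version would yield one for the optimization version.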
Short Version
The decision problem is NP-complete because it has a polynomial-time verifier for a claimed solution, and because the Hamiltonian cycle problem is reducible to TSP_DECIDE in polynomial time.
However, the optimization problem is only known to be NP-hard: even though the Hamiltonian cycle (HAM) problem is reducible to TSP_OPTIMIZE in polynomial time, no polynomial-time verifier is known for checking whether a claimed Hamiltonian cycle C is the shortest one. The obvious check enumerates all possible cycles, which takes factorial time.
What the given reference defines is the bottleneck TSP:
The Bottleneck traveling salesman problem (bottleneck TSP) is a problem in discrete or combinatorial optimization. The problem is to find the Hamiltonian cycle in a weighted graph which minimizes the weight of the most weighty edge of the cycle.
The problem is known to be NP-hard. The decision problem version of this, "for a given length x is there a Hamiltonian cycle in a graph G with no edge longer than x?", is NP-complete. NP-completeness follows immediately by a reduction from the problem of finding a Hamiltonian cycle.
This problem can be solved by performing a binary search or sequential search for the smallest x such that the subgraph of edges of weight at most x has a Hamiltonian cycle. This method leads to solutions whose running time is only a logarithmic factor larger than the time to find a Hamiltonian cycle.
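The binary search described in the quoted passage can be sketched as follows. The Hamiltonian cycle test here is a brute-force stand-in for whatever solver is available, the graph is assumed undirected with vertices 0..n-1, and the function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Optional, Set, Tuple


def has_hamiltonian_cycle(n: int, edges: Set[FrozenSet[int]]) -> bool:
    """Brute-force stand-in: does the undirected graph on vertices
    0..n-1 with the given edge set contain a Hamiltonian cycle?"""
    if n < 3:
        return False
    for rest in permutations(range(1, n)):
        cycle = (0,) + rest
        if all(frozenset((cycle[i], cycle[(i + 1) % n])) in edges
               for i in range(n)):
            return True
    return False


def bottleneck_tsp_value(n: int,
                         weighted_edges: Dict[Tuple[int, int], float]
                         ) -> Optional[float]:
    """Smallest x such that the edges of weight at most x contain a
    Hamiltonian cycle, or None if the graph has no Hamiltonian cycle."""
    thresholds = sorted(set(weighted_edges.values()))

    def feasible(x: float) -> bool:
        allowed = {frozenset(e) for e, w in weighted_edges.items() if w <= x}
        return has_hamiltonian_cycle(n, allowed)

    if not thresholds or not feasible(thresholds[-1]):
        return None
    lo, hi = 0, len(thresholds) - 1
    while lo < hi:  # feasibility is monotone in x, so binary search applies
        mid = (lo + hi) // 2
        if feasible(thresholds[mid]):
            hi = mid
        else:
            lo = mid + 1
    return thresholds[lo]


edges = {(0, 1): 1, (1, 2): 2, (2, 3): 3, (0, 3): 4, (0, 2): 10, (1, 3): 10}
bottleneck = bottleneck_tsp_value(4, edges)
```

Only the distinct edge weights need to be tried as thresholds, so the search makes O(log m) calls to the Hamiltonian cycle test for a graph with m edges, matching the logarithmic overhead mentioned above.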
Long Version
The mistake is to say that the TSP is NP complete. Truth is that TSP is NP hard. Let me explain a bit:
The TSP is a problem defined by a set of cities and the distances
between each city pair. The problem is to find a circuit that goes
through each city once and that ends where it starts. This in itself
isn't difficult. What makes the problem interesting is to find the
shortest circuit among all those that are possible.
Solving this problem is quite simple. One merely needs to compute the length of all possible circuits, then keep the shortest one. The issue is that the number of such circuits grows very quickly with the number of cities. If there are n cities, then this number is the factorial of n-1: (n-1)! = (n-1) x (n-2) x ... x 3 x 2.
A problem is NP if one can easily (in polynomial time) check that a proposed solution is indeed a solution.
Here is the trick.
In order to check that a proposed tour is a solution of the TSP, we need to check two things, namely:
1. That each city is visited exactly once
2. That there is no shorter tour than the one we are checking
We didn't check the second condition! The second condition is what makes the problem difficult to solve. As of today, no one has found a way to check condition 2 in polynomial time. It means that the TSP isn't in NP, as far as we know.
Therefore, TSP isn't NP complete as far as we know. We can only say that TSP is NP hard.
When they write that TSP is NP complete, they mean that the following decision problem (yes/no question) is NP complete:
TSP_DECISION: Given a number L, a set of cities, and the distances between all city pairs, is there a tour visiting each city exactly once of length less than L?
This problem is indeed NP-complete: it is NP-hard, and it is in NP because it is easy (polynomial time) to check that a given tour leads to a yes answer to TSP_DECISION.

P, NP and NP-complete clarification?

This is an answer I found on Stack Overflow:
NP is a complexity class that represents the set of all decision problems for which the instances where the answer is "yes" have proofs that can be verified in polynomial time.
This means that if someone gives us an instance of the problem and a certificate (sometimes called a witness) to the answer being yes, we can check that it is correct in polynomial time.
My question is who is the "we" that checks to see if the solution is correct in polynomial time? Is it a program or does it literally mean a human sitting down and working it out on paper?
In the classical definition, it is a Turing machine. I believe it has been shown that the computers we use today are more or less equivalent to Turing machines in the complexity-theory sense (polynomial time on one is polynomial time on the other), see: https://en.wikipedia.org/wiki/Turing_machine#Comparison_with_real_machines

Can variance be replaced by absolute value in this objective function?

Initially I modeled my objective function as follows:
argmin var(f(x),g(x))+var(c(x),d(x))
where f,g,c,d are linear functions
in order to be able to use linear solvers I modeled the problem as follows
argmin abs(f(x),g(x))+abs(c(x),d(x))
Is it correct to change variance to absolute value in this context? I'm pretty sure they carry the same meaning, namely having the least difference between two functions.
You haven't given enough context to answer the question. Even though your question doesn't seem to be about regression, in many ways it is similar to the question of choosing between least squares and least absolute deviations approaches to regression. If that term in your objective function is in any sense an error term then the most appropriate way to model the error depends on the nature of the error distribution. Least squares is better if there is normally distributed noise. Least absolute deviations is better in the nonparametric setting and is less sensitive to outliers. If the problem has nothing to do with probability at all then other criteria need to be brought in to decide between the two options.
Having said all this, the two ways of measuring distance are broadly similar. One will be fairly small if and only if the other is -- though they won't be equally small. If they are similar enough for your purposes then the fact that absolute values can be linearized could be a good motivation to use it. On the other hand -- if the variance-based one is really a better expression of what you are interested in then the fact that you can't use LP isn't sufficient justification to adopt absolute values. After all -- quadratic programming is not all that much harder than LP, at least below a certain scale.
To sum up -- they don't imply the same meaning, but they do imply similar meanings; and, whether or not they are similar enough depends upon your purposes.
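For reference, the linearization of absolute values mentioned above can be sketched as follows. This assumes that abs(f(x),g(x)) in the question means |f(x) - g(x)|, which the question does not state, and it introduces auxiliary variables t_1 and t_2 that are not part of the original model; any other constraints on x carry over unchanged.

```latex
\begin{aligned}
\min_{x,\,t_1,\,t_2}\quad & t_1 + t_2 \\
\text{s.t.}\quad & -t_1 \le f(x) - g(x) \le t_1, \\
                 & -t_2 \le c(x) - d(x) \le t_2 .
\end{aligned}
```

Because f, g, c and d are linear, every constraint is linear, and at an optimum t_1 = |f(x) - g(x)| and t_2 = |c(x) - d(x)|, so the program is an LP with the same minimizers in x as the absolute-value objective.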

Are there public key cryptography algorithms that are provably NP-hard to defeat? [closed]

Should practical quantum computing become a reality, I am wondering if there are any public key cryptographic algorithms that are based on NP-complete problems, rather than integer factorization or discrete logarithms.
Edit:
Please check out the "Quantum computing in computational complexity theory" section of
the wiki article on quantum computers. It points out that the class of problems quantum computers can answer (BQP) is believed to be strictly easier than NP-complete.
Edit 2:
'Based on NP-complete' is a bad way of expressing what I'm interested in.
What I intended to ask is for a Public Key encryption algorithm with the property that any method for breaking the encryption can also be used to break the underlying NP-complete problem. This means breaking the encryption proves P=NP.
I am responding to this old thread because it is a very common and important question, and all of the answers here are inaccurate.
The short answer to the original question is an unequivocal "NO". There are no known encryption schemes (let alone public-key ones) that are based on an NP-complete problem (and hence all of them, under polynomial-time reductions). Some are "closer" than others, though, so let me elaborate.
There is a lot to clarify here, so let's start with the meaning of "based on an NP-complete problem." The generally agreed upon interpretation of this is: "can be proven secure in a particular formal model, assuming that no polynomial-time algorithms exist for NP-complete problems". To be even more precise, we assume that no algorithm exists that always solves an NP-complete problem. This is a very safe assumption, because that's a really hard thing for an algorithm to do - it's seemingly a lot easier to come up with an algorithm that solves random instances of the problem with good probability.
No encryption schemes have such a proof, though. If you look at the literature, with very few exceptions (see below), the security theorems read like the following:
Theorem: This encryption scheme is provably secure, assuming that no
polynomial-time algorithm exists for
solving random instances of some problem X.
Note the "random instances" part. For a concrete example, we might assume that no polynomial-time algorithm exists for factoring the product of two random n-bit primes with some good probability. This is very different (less safe) from assuming that no polynomial-time algorithm exists for always factoring all products of two random n-bit primes.
The "random instances" versus "worst case instances" issue is what has tripped up several responders above. The McEliece-type encryption schemes are based on a very special random version of decoding linear codes, and not on the actual worst-case version, which is NP-complete.
Pushing beyond this "random instances" issue has required some deep and beautiful research in theoretical computer science. Starting with the work of Miklós Ajtai, we have found cryptographic algorithms where the security assumption is a "worst case" (safer) assumption instead of a random case one. Unfortunately, the worst case assumptions are for problems that are not known to be NP complete, and some theoretical evidence suggests that we can't adapt them to use NP-complete problems. For the interested, look up "lattice based cryptography".
Some cryptosystems based on NP-hard problems have been proposed (such as the Merkle-Hellman cryptosystem based on the subset-sum problem, and the Naccache-Stern knapsack cryptosystem based on the knapsack problem), but they have all been broken. Why is this? Lecture 16 of Scott Aaronson's Great Ideas in Theoretical Computer Science says something about this, which I think you should take as definitive. What it says is the following:
Ideally, we would like to construct a [Cryptographic Pseudorandom Generator] or cryptosystem whose security was based on an NP-complete problem. Unfortunately, NP-complete problems are always about the worst case. In cryptography, this would translate to a statement like “there exists a message that’s hard to decode”, which is not a good guarantee for a cryptographic system! A message should be hard to decrypt with overwhelming probability. Despite decades of effort, no way has yet been discovered to relate worst case to average case for NP-complete problems. And this is why, if we want computationally-secure cryptosystems, we need to make stronger assumptions than P≠NP.
This was an open question in 1998:
On the possibility of basing Cryptography on the assumption that P != NP
by Oded Goldreich, Rehovot Israel, Shafi Goldwasser
From the abstract: "Our conclusion is that the question remains open".
--I wonder if that's changed in the last decade?
Edit:
As far as I can tell the question is still open, with recent progress pointing toward the answer that no such algorithm exists.
Adi Akavia, Oded Goldreich, Shafi Goldwasser, and Dana Moshkovitz published this paper in the ACM in 2006: On basing one-way functions on NP-hardness "Our main findings are the following two negative results"
The Stanford site Complexity Zoo is helpful in deciphering what those two negative results mean.
While many forms have been broken, check out Merkle-Hellman, based on a form of the NP-complete 'Knapsack Problem'.
Lattice cryptography offers the (over)generalized take-home message that indeed one can design cryptosystems where breaking the average case is as hard as solving a particular NP-hard problem (typically the Shortest Vector Problem or the Closest Vector Problem).
I can recommend reading the introduction section of http://eprint.iacr.org/2008/521 and then chasing references to the cryptosystems.
Also, see the lecture notes at http://www.cs.ucsd.edu/~daniele/CSE207C/, and chase links for a book if you want.
Googling for NP-complete and public key encryption finds false positives ... that are actually insecure. This cartoonish pdf appears to show a public key encryption algorithm based on the minimum dominating set problem. Reading further, it then admits to lying that the algorithm is secure ... the underlying problem is NP-complete, but its use in the PK algorithm does not preserve the difficulty.
Another false positive Google find: Cryptanalysis of the Goldreich-Goldwasser-Halevi cryptosystem from Crypto '97. From the abstract:
At Crypto '97, Goldreich, Goldwasser and Halevi proposed a public-key cryptosystem based on the closest vector problem in a lattice, which is known to be NP-hard. We show that there is a major flaw in the design of the scheme which has two implications: any ciphertext leaks information on the plaintext, and the problem of decrypting ciphertexts can be reduced to a special closest vector problem which is much easier than the general problem.
There is a web site that may be relevant to your interests: Post-Quantum Cryptography.
Here is my reasoning. Correct me if I'm wrong.
(i) "Breaking" a cryptosystem is necessarily a problem in NP and co-NP. (Breaking a cryptosystem involves inverting the encryption function, which is one-to-one and computable in polynomial time. So, given the ciphertext, the plaintext is a certificate that can be verified in polynomial time. Thus querying the plaintext based on the ciphertext is in NP and in co-NP.)
(ii) If there is an NP-hard problem in NP and co-NP, then NP = co-NP. (This problem would be NP-complete and in co-NP. Since any NP language is reducible to this co-NP language, NP is a subset of co-NP. Now use symmetry: any language L in co-NP has -L (its complement) in NP, whence -L is in co-NP, that is, L = --L is in NP.)
(iii) I think that it is generally believed that NP != co-NP, as otherwise there are polynomial-sized proofs that boolean formulas are not satisfiable.
Conclusion: Complexity-theoretic conjectures imply that NP-hard cryptosystems don't exist.
(Otherwise, you have an NP-hard problem in NP and co-NP, whence NP = co-NP---which is believed to be false.)
While RSA and other widely-used cryptographic algorithms are based on the difficulty of integer factorization (which is not known to be NP-complete), there are some public key cryptography algorithms based on NP-complete problems too. A google search for "public key" and "np-complete" will reveal some of them.
(I incorrectly said before that quantum computers would speed up NP-complete problems, but this is not true. I stand corrected.)
As pointed out by many other posters, it is possible to base cryptography on NP-hard or NP-complete problems.
However, the common methods for cryptography are going to be based on difficult mathematics (difficult to crack, that is). The truth is that it is easier to serialize numbers as a traditional key than to create a standardized string that solves an NP-hard problem. Therefore, practical crypto is based on mathematical problems that are not yet proven to be NP-hard or NP-complete (so it is conceivable that some of these problems are in P).
Breaking ElGamal encryption requires solving the discrete logarithm problem (RSA rests instead on the related problem of integer factorization), so look at this Wikipedia article:
No efficient algorithm for computing general discrete logarithms log_b g is known. The naive algorithm is to raise b to higher and higher powers k until the desired g is found; this is sometimes called trial multiplication. This algorithm requires running time linear in the size of the group G and thus exponential in the number of digits in the size of the group. There exists an efficient quantum algorithm due to Peter Shor however (http://arxiv.org/abs/quant-ph/9508027).
Computing discrete logarithms is apparently difficult. Not only is no efficient algorithm known for the worst case, but the average-case complexity can be shown to be at least as hard as the worst case using random self-reducibility.
At the same time, the inverse problem of discrete exponentiation is not (it can be computed efficiently using exponentiation by squaring, for example). This asymmetry is analogous to the one between integer factorization and integer multiplication. Both asymmetries have been exploited in the construction of cryptographic systems.
The widespread belief is that these are NP-complete, but maybe can't be proven so. Note that quantum computers may break crypto efficiently!
Since nobody really answered the question I have to give you the hint: "McEliece". Do some searches on it. It's a proven NP-Hard encryption algorithm. It needs O(n^2) encryption and decryption time. It has a public key of size O(n^2) too, which is bad. But there are improvements which lower all these bounds.