How does SCIP calculate vanillafullstrongbranching-scores? - scip

I want to do some experiments on branchingrules in SCIP (using the python interface).
To be sure some basics of my code work, I tried to mirror the vanillafullstrongbranching of SCIP using the dive functionality.
This works mostly as expected, but I get weird results for nodes that have at least one infeasible child. I dug through PySCIPOpt and the C-code of SCIP and would have expected a big number as the sb score in case (at least) one child is infeasible, as the LP-solver returns a big number for infeasible problems and SCIP is using the product-score by default. Instead of big numbers I get numbers just big enough to be bigger than the other scores.
My questions is: What am I missing in the code? Where and how does SCIP treat the score for infeasible children differently.
As quite some code is required to reproduce this, I created a gist, that makes use of a set-cover instance I also uploaded to github.

I'm pretty sure the strongbranching score never exceeds the cutoffbound of the lp, so infeasible nodes would get a score of the cutoffbound.

Related

GUROBI only uses single core to setup problem with cvxpy (python)

I have a large MILP that I build with cvxpy and want to solve with GUROBI. When I give use the solve() function of cvxpy it take a really really really long time to setup and does not start solving for hours. Whilest doing that only 1 core of my cluster is being used. It is used for 100%. I would like to use multiple cores to build the model so that the process of building the model does not take so long. Running grbprobe also shows that gurobi knows about the other cores and for solving the problem it uses multiple cores.
I have tried to run with different flags i.e. turning presolve off and on or giving the number of Threads to be used (this seemed like i didn't even for the solving.
I also have reduce the number of constraints in the problem and it start solving much faster which means that this is definitively not a problem of the model itself.
The problem in it's normal state should have 2200 constraints i reduce it to 150 and it took a couple of seconds until it started to search for a solution.
The problem is that I don't see anything since it takes so long to get the ""set username parameters"" flag and I don't get any information on what the computer does in the mean time.
Is there a way to tell GUROBI or CVXPY that it can take more cpus for the build-up?
Is there another way to solve this problem?
Sorry. The first part of the solve (cvxpy model generation, setup, presolving, scaling, solving the root, preprocessing) is almost completely serial. The parallel part is when it really starts working on the branch-and-bound tree. For many problems, the parallel part is by far the most expensive, but not for all.
This is not only the case for Gurobi. Other high-end solvers have the same behavior.
There are options to do less presolving and preprocessing. That may get you earlier in the B&B. However, usually, it is better not to touch these options.
Running things with verbose=True may give you more information. If you have more detailed questions, you may want to share the log.

Confusion about NP-hard and NP-Complete in Traveling Salesman problems

Traveling Salesman Optimization(TSP-OPT) is a NP-hard problem and Traveling Salesman Search(TSP) is NP-complete. However, TSP-OPT can be reduced to TSP since if TSP can be solved in polynomial time, then so can TSP-OPT(1). I thought for A to be reduced to B, B has to be as hard if not harder than A. As I can see in the below references, TSP-OPT can be reduced to TSP. TSP-OPT is supposed to be harder than TSP. I am confused...
References: (1)Algorithm, Dasgupta, Papadimitriou, Vazirani Exercise 8.1 http://algorithmics.lsi.upc.edu/docs/Dasgupta-Papadimitriou-Vazirani.pdf https://cseweb.ucsd.edu/classes/sp08/cse101/hw/hw6soln.pdf
http://cs170.org/assets/disc/dis10-sol.pdf
I took a quick look at the references you gave, and I must admit there's one thing I really dislike in your textbook (1st pdf) : they address NP-completeness while barely mentioning decision problems. The provided definition of an NP-complete problem also somewhat deviates from what I'd expect from a textbook. I assume that was a conscious decision to make the introduction more appealing...
I'll provide a short answer, followed by a more detailed explanation about related notions.
Short version
Intuitively (and informally), a problem is in NP if it is easy to verify its solutions.
On the other hand, a problem is NP-hard if it is difficult to solve, or find a solution.
Now, a problem is NP-complete if it is both in NP, and NP-hard. Therefore you have two key, intuitive properties to NP-completeness. Easy to verify, but hard to find solutions.
Although they may seem similar, verifying and finding solutions are two different things. When you use reduction arguments, you're looking at whether you can find a solution. In that regard, both TSP and TSP-OPT are NP-hard, and it is difficult to find their solutions. In fact, the third pdf provides a solution to excercise 8.1 of your textbook, which directly shows that TSP and TSP-OPT are equivalently hard to solve.
Now, one major distinction between TSP and TSP-OPT is that the former is (what your textbook call) a search problem, whereas the latter is an optimization problem. The notion of verifying the solution of a search problem makes sense, and in the case of TSP, it is easy to do, therefore it is NP-complete. For optimization problems, the notion of verifying a solution becomes weird, because I can't think of any way to do that without first computing the size of an optimal solution, which is roughly equivalent to solving the problem itself. Since we do not know an efficient way to verify a solution for TSP-OPT, we cannot say that it is in NP, thus we cannot say that it is NP-complete. (More on this topic in the detailed explanation.)
The tl;dr is that for TSP-OPT, it is both hard to verify and hard to find solutions, while for TSP it is easy to verify and hard to find solutions.
Reductions arguments only help when it comes to finding solutions, so you need other arguments to distinguish them when it comes to verifying solutions.
More details
One thing your textbook is very brief about is what a decision problem is.
Formally, the notion of NP-completeness, NP-hardness, NP, P, etc, were developed in the context of decision problems, and not optimization or search problems.
Here's a quick example of the differences between these different types of problems.
A decision problem is a problem whose answer is either YES or NO.
TSP decision problem
Input: a graph G, a budget b
Output: Does G admit a tour of weight at most b ? (YES/NO)
Here is the search version
TSP search problem
Input: a graph G, a budget b
Output: Find a tour of G of weight at most b, if it exists.
And now the optimization version
TSP optimization problem
Input: a graph G
Output: Find a tour of G with minimum weight.
Out of context, the TSP problem could refer to any of these. What I personally refer to as the TSP is the decision version. Here your textbook use TSP for the search version, and TSP-OPT for the optimization version.
The problem here is that those various problems are strictly distinct. The decision version only ask for existence, while the search version asks for more, it needs one example of such a solution. In practice, we often want to have the actual solution, so more practical approaches may omit to mention decision problems.
Now what about it? The definition of an NP-complete problem was meant for decision problems, so it technically does not apply directly to search or optimization problems. But because the theory behind it is well defined and useful, it is handy to still apply the term NP-complete/NP-hard to search/optimization problem, so that you have an idea of how hard these problems are to solve. So when someone says the travelling salesman problem is NP-complete, formally it should be the decision problem version of the problem.
Obviously, many notions can be extended so that they also cover search problems, and that is how it is presented in your textbook. The differences between decision, search, and optimization, are precisely what the exercises 8.1 and 8.2 try to cover in your textbook. Those exercises are probably meant to get you interested in the relationship between these different types of problems, and how they relate to one another.
Short Version
The decision problem is NP-complete because you can both have a polynomial time verifier for the solution, as well as the fact that the hamiltonian cycle problem is reducible to TSP_DECIDE in polynomial time.
However, the optimization problem is strictly NP-hard, because even though TSP_OPTIMIZE is reducible from the hamiltonian (HAM) cycle problem in polynomial time, you don't have a poly time verifier for a claimed hamiltonian cycle C, whether it is the shortest or not, because you simply have to enumerate all possibilities (which consumes the factorial order space & time).
What the given reference define is, bottleneck TSP
The Bottleneck traveling salesman problem (bottleneck TSP) is a problem in discrete or combinatorial optimization. The problem is to find the Hamiltonian cycle in a weighted graph which minimizes the weight of the most weighty edge of the cycle.
The problem is known to be NP-hard. The decision problem version of this, "for a given length x is there a Hamiltonian cycle in a graph G with no edge longer than x?", is NP-complete. NP-completeness follows immediately by a reduction from the problem of finding a Hamiltonian cycle.
This problem can be solved by performing a binary search or sequential search for the smallest x such that the subgraph of edges of weight at most x has a Hamiltonian cycle. This method leads to solutions whose running time is only a logarithmic factor larger than the time to find a Hamiltonian cycle.
Long Version
The mistake is to say that the TSP is NP complete. Truth is that TSP is NP hard. Let me explain a bit:
The TSP is a problem defined by a set of cities and the distances
between each city pair. The problem is to find a circuit that goes
through each city once and that ends where it starts. This in itself
isn't difficult. What makes the problem interesting is to find the
shortest circuit among all those that are possible.
Solving this problem is quite simple. One merely need to compute the length of all possible circuits, then keep the shortest one. Issue is that the number of such circuits grows very quickly with the number of cities. If there are n cities then this number is factorial of n-1 = (n-1)(n-2)...3.2.
A problem is NP if one can easily (in polynomial time) check that a proposed solution is indeed a solution.
Here is the trick.
In order to check that a proposed tour is a solution of the TSP we need to check two things, namely
That each city is is visited only once
That there is no shorter tour than the one we are checking
We didn't check the second condition! The second condition is what makes the problem difficult to solve. As of today, no one has found a way to check condition 2 in polynomial time. It means that the TSP isn't in NP, as far as we know.
Therefore, TSP isn't NP complete as far as we know. We can only say that TSP is NP hard.
When they write that TSP is NP complete, they mean that the following decision problem (yes/no question) is NP complete:
TSP_DECISION : Given a number L, a set of cities, and distance between all city pairs, is there a tour visiting each city exactly once of length less than L?
This problem is indeed NP complete, as it is easy (polynomial time) to check that a given tour leads to a yes answer to TSPDECISION.

Finding best path trought strongly connected component

I have a directed graph which is strongly connected and every node have some price(plus or negative). I would like to find best (highest score) path from node A to node B. My solution is some kind of brutal force so it takes ages to find that path. Is any algorithm for this or any idea how can I do it?
Have you tried the A* algorithm?
It's a fairly popular pathfinding algorithm.
The algorithm itself is not to difficult to implement, but there are plenty of implementations available online.
Dijkstra's algorithm is a special case for the A* (in which the heuristic function h(x) = 0).
There are other algorithms who can outperform it, but they usually require graph pre-processing. If the problem is not to complex and you're looking for a quick solution, give it a try.
EDIT:
For graphs containing negative edges, there's the Bellman–Ford algorithm. Detecting the negative cycles comes at the cost of performance, though (worse than the A*). But it still may be better than what you're currently using.
EDIT 2:
User #templatetypedef is right when he says the Bellman-Ford algorithm may not work in here.
The B-F works with graphs where there are edges with negative weight. However, the algorithm stops upon finding a negative cycle. I believe that is a useful behavior. Optimizing the shortest path in a graph that contains a cycle of negative weights will be like going down a Penrose staircase.
What should happen if there's the possibility of reaching a path with "minus infinity cost" depends on the problem.

Determining hopeless branches early in branch-and-bound algorithms

I have to design a branch-and bound algorithm that solves the optimal tour of a graph on the cartesian plane every time. I have been given the hint that identifying hopeless branches earlier in the runtime will compound into a program that runs "a hundred times faster". I had the idea of assuming that the shortest edge connected to the starting/ending node will be either the first or last edge in the tour but a thin diamond shaped graph proves otherwise. Does any one have ideas for how to eliminate these hopeless branches or a reference that talks about this?
Basically, is there a better way to branch to subsets of solutions better than just lexicographically, eg. first branch is including and excluding edge a-b, second branch includes and excludes branch a-c
So somewhere in your branch-and-bound algorithm, you look at possible places to go, and then somehow keep track of them to do later.
To make this more efficient, you can do a couple things:
Write a better bound calculator. In other words, come up with an algorithm that determines the bound more accurately. This will result in less time spent on paths that turn out to be poor.
Instead of using a stack to keep track of things to do, use a queue. Instead of using a queue, use a priority queue (heap) ordered by bound, e.g. the things that seem best are put at the top of the heap, and the things that seem bad are put on the bottom.
Nearest-neighbor is a simple algorithm. Branch-and-Bound is just an optimizing loop and additionally you need a sub-problem solver. I think nearest-neighbor is also a branch-and-bound algorithm. Instead I would look into the simplex algorithm. It's a linear programming algorithm. Also cutting-plane algorithm to solve tsp.

Generating a sudoku of a desired difficulty?

So, I've done a fair bit of reading into generation of a Sudoku puzzle. From what I can tell, the standard way to have a Sudoku puzzle of a desired difficulty is to generate a puzzle, and then grade it afterwards, and repeat until you have one of an acceptable rating. This can be refined by generating via backtracing using some of the more complex solving patterns (XY-wing, swordfish, etc.), but that's not quite what I'm wanting to do here.
What I want to do, but have been unable to find any real resource on, is generate a puzzle from a "difficulty value" (0-1.0 value, 0 being the easiest, and 1.0 being the hardest).
For example, I want create a moderately difficult puzzle, so the value .675 is selected. Now using that value I want to be able to generate a moderately difficult puzzle.
Anyone know of something like this? Or perhaps something with a similar methodology?
Adding another answer for generating a sudoku of desired difficulty on-the-fly.
This means that unlike other approaches the algorithm runs only once and returns a sudoku configuration matching the desired difficulty (with high probability within a range or with probability=1)
Various solutions for generating (and rating) a sudoku difficulty have to do with human-based techniques and approaches, which can be easily rated.
Then one (after having generated a sudoku configuration) re-solves the sudoku with the human-like solver and depending on the techniques the solver used (e.g pairs, x-wing, swordfish etc.) a difficulty rate is also assigned.
Problems with this approach
(and requirements for the use case i had)
In order to generate a sudoku with given difficulty, with previous method one needs to solve a sudoku twice (once with the basic algorithm and once with the human-like solver).
One has to (pre-)generate many sudokus which can only be rated as to difficulty after being solved by the human-like solver. So one cannot generate a desired sudoku on-the-fly once.
The human-like solver can be complicated and in most cases (if not all) is tightly coupled to 9x9 sudoku grids. So no easy generalisation to other sudokus (e.g 4x4, 16x16, 6x6 etc.)
The difficulty rating of the human-like techniques is very subjective. For example why x-wing is taken to be more difficult than hidden singles? (personaly have solved many difficult published sudoku puzzles manualy and never used such techniques)
Another approach was used which has the following benefits:
Generalises well to arbitrary sudokus (9x9, 4x4, 6x6, 16x16 etc..)
The sudoku configuration, with desired difficulty, is generated once and on-the-fly
The difficulty rating is objective.
How it works?
First of all, the simple fact that the more difficult the puzzle, the more time it needs to be solved.
But time to be solved is intimately correlated to both number of clues (givens) and average alternatives to be investigated per empty cell.
Extending my previous answer, it was mentioned that for any sudoku puzzle the minimum number of clues is an objective property of the puzzle (for example for 9x9 grids the minimum number of clues for having a valid sudoku is 17)
One can start from there and compute minimum number of clues per difficulty level (linear correlation).
Furthermore at each step of the sudoku generation process, one can make sure the average alternatives (to be investigated) per empty cell is within given bounds (as a function of desired difficulty)
Depending on whether the algorithm uses backtrack or not (for the use case discussed the algorithm does no backtracking) the desired difficulty can be reached either with probability=1 or with high probability within bounds (respectively).
Tests of the sudokus generated with this algorithm and difficulty rating based on the previous approaches (human-like solver), show a correlation of desired and estimated difficulty rates, plus a greater ability for generalisation to arbitrary sudoku configurations.
(have used this online sudoku solver (and also this one) to correlate the difficulty rates of the test sudokus)
The code is available free on github sudoku.js (along with sample demo application), a scaled-down version of CrossWord.js a professional crossword builder in JavaScript, by same author
The sudoku difficulty is related in an interesting way to the (minimum) amount of information needed to specify a unique solution for a given grid.
Sounds like information theory, yes it has applications here too.
Sudoku puzzles should have a unique solution. Furthermore sudoku puzzles have certain symmetries, i.e by row, by column and by sub-square.
These symmetries specify the minimum number of clues (and their position more or less) needed so that the solution would be unique (i.e using a sudoku compiler or an algorithm like backtrack-search).
This would be the most difficult/hard sudoku puzzle level (i.e minimum needed number of clues). Then all other difficulty levels from less hard to easy are generated by allowing more clues than the minimum amount needed.
It should be noted that sudoku difficulty levels are not standard, as explained above, one can have as many or as few difficulty levels as one wants. What is standard is the minimum number (and position) of clues (which is the hardest level and which is relatd to the sudoku symmetries), then one can generate as many difficulty levels as one wants simply by allowing extra/redundant clues to be visible as well.
It's not as elegant as what you ask, but you can simulate this behavior with caching:
Decide how many "buckets" you want for puzzles. For example, let's say you choose 20. Thus, your buckets will contain puzzles of different difficulty ranges: 0-.05, .05-.1, .1-.15, .. , .9-.95, .95-1
Generate a puzzle
Grade the puzzle
Put it in the appropriate bucket (or throw it away when the bucket is full)
Repeat till your buckets are "filled". The size of the buckets and where they are stored will be based on the needs of your application.
Then when a user requests a certain difficulty puzzle, give them a cached one from the bucket they choose. You might also want to consider swapping numbers and changing orientation of puzzles with known difficulties to generate similar puzzles with the same level of difficulty. Then repeat the above as needed when you need to refill your buckets with new puzzles.
Well, you can't know how complicated it is, before you know how to solve it. And Sudoku solving (and therefore also the difficulty rating) belongs to the NP-C complexity class, that means it's (most likely) logically impossible to find an algorithm that is (asymptotically) faster than the proposed randomly-guess-and-check.
However, if you can find one, you have solved the P versus NP problem and should clear a cupboard for the Fields Medal... :)