Solving a Mixed Integer Quadratic Program using SCIP - scip

I have a mixed integer quadratic program (MIQP) which I would like to solve using SCIP. The program is in the form such that on fixing the integer variables, the problem turns out to be a linear program. And on fixing the the continuous variables it becomes a Integer Program. A simple example :
max. \Sigma_{i} n_i * f_i(x_i)
such that.
n_1 * x_1 + n2 * x_2 < t
n_3 * x_1 + n2 * x_2 < m
.
.
many random quadratic constraints in n_i's and x_i's
so on
Here f_i is a concave piecewise linear function.
x_i's are continuous variables ( they take real values )
n_i's are integer variables
I am able to solve the problem using SCIP. But on problems with a large number of variables SCIP takes a lot of time to find the solution. I have particularly noticed that it does not find many primal solutions. Thus the rate at which the upper bound reduces is very slow. However, I could get better results by doing set heuristics emphasis aggressive.
It would be great if anyone can guide me on the following questions :
1) Is there any particular algorithm/ Software package which solves problems that fit perfectly into the model as described above ?
2) Suggestions on how to improve the rate at which primal solutions are found.
3) What type of branching can I use to get better results ?
4) Any guidance on improving performance would be really helpful.
I am okay with relaxing the integer constraints as well.
Thanks

1) The algorithm in SCIP should fit your problem. There are other software packages that implement similar algorithms, e.g., BARON and ANTIGONE.
2) Have a look which primal heuristics were successful in your run and change their parameters to run them more frequently.
3) No idea. Default should be ok.
4) Make sure that your variables have good bounds. Tighter bounds allow for a tighter relaxation to be constructed.
If you can post an instance of your problem somewhere, or a log of a SCIP run, including the detailed statistics at the end, maybe someone can give more hints on what to improve.

Related

SLSQP in ScipyOptimizeDriver only executes one iteration, takes a very long time, then exits

I'm trying to use SLSQP to optimise the angle of attack of an aerofoil to place the stagnation point in a desired location. This is purely as a test case to check that my method for calculating the partials for the stagnation position is valid.
When run with COBYLA, the optimisation converges to the correct alpha (6.04144912) after 47 iterations. When run with SLSQP, it completes one iteration, then hangs for a very long time (10, 20 minutes or more, I didn't time it exactly), and exits with an incorrect value. The output is:
Driver debug print for iter coord: rank0:ScipyOptimize_SLSQP|0
--------------------------------------------------------------
Design Vars
{'alpha': array([0.5])}
Nonlinear constraints
None
Linear constraints
None
Objectives
{'obj_cmp.obj': array([0.00023868])}
Driver debug print for iter coord: rank0:ScipyOptimize_SLSQP|1
--------------------------------------------------------------
Design Vars
{'alpha': array([0.5])}
Nonlinear constraints
None
Linear constraints
None
Objectives
{'obj_cmp.obj': array([0.00023868])}
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.0002386835700364719
Iterations: 1
Function evaluations: 1
Gradient evaluations: 1
Optimization Complete
-----------------------------------
Finished optimisation
Why might SLSQP be misbehaving like this? As far as I can tell, there are no incorrect analytical derivatives when I look at check_partials().
The code is quite long, so I put it on Pastebin here:
core: https://pastebin.com/fKJpnWHp
inviscid: https://pastebin.com/7Cmac5GF
aerofoil coordinates (NACA64-012): https://pastebin.com/UZHXEsr6
You asked two questions whos answers ended up being unrelated to eachother:
Why is the model so slow when you use SLSQP, but fast when you use COBYLA
Why does SLSQP stop after one iteration?
1) Why is SLSQP so slow?
COBYLA is a gradient free method. SLSQP uses gradients. So the solid bet was that slow down happened when SLSQP asked for the derivatives (which COBYLA never did).
Thats where I went to look first. Computing derivatives happens in two steps: a) compute partials for each component and b) solve a linear system with those partials to compute totals. The slow down has to be in one of those two steps.
Since you can run check_partials without too much trouble, step (a) is not likely to be the culprit. So that means step (b) is probably where we need to speed things up.
I ran the summary utility (openmdao summary core.py) on your model and saw this:
============== Problem Summary ============
Groups: 9
Components: 36
Max tree depth: 4
Design variables: 1 Total size: 1
Nonlinear Constraints: 0 Total size: 0
equality: 0 0
inequality: 0 0
Linear Constraints: 0 Total size: 0
equality: 0 0
inequality: 0 0
Objectives: 1 Total size: 1
Input variables: 87 Total size: 1661820
Output variables: 44 Total size: 1169614
Total connections: 87 Total transfer data size: 1661820
Then I generated an N2 of your model and saw this:
So we have an output vector that is 1169614 elements long, which means your linear system is a matrix that is about 1e6x1e6. Thats pretty big, and you are using a DirectSolver to try and compute/store a factorization of it. Thats the source of the slow down. Using DirectSolvers is great for smaller models (rule of thumb, is that the output vector should be less than 10000 elements). For larger ones you need to be more careful and use more advanced linear solvers.
In your case we can see from the N2 that there is no coupling anywhere in your model (nothing in the lower triangle of the N2). Purely feed-forward models like this can use a much simpler and faster LinearRunOnce solver (which is the default if you don't set anything else). So I turned off all DirectSolvers in your model, and the derivatives became effectively instant. Make your N2 look like this instead:
The choice of best linear solver is extremely model dependent. One factor to consider is computational cost, another is numerical robustness. This issue is covered in some detail in Section 5.3 of the OpenMDAO paper, and I won't cover everything here. But very briefly here is a summary of the key considerations.
When just starting out with OpenMDAO, using DirectSolver is both the simplest and usually the fastest option. It is simple because it does not require consideration of your model structure, and it's fast because for small models OpenMDAO can assemble the Jacobian into a dense or sparse matrix and provide that for direct factorization. However, for larger models (or models with very large vectors of outputs), the cost of computing the factorization is prohibitively high. In this case, you need to break the solver structure down more intentionally, and use other linear solvers (sometimes in conjunction with the direct solver--- see Section 5.3 of OpenMDAO paper, and this OpenMDAO doc).
You stated that you wanted to use the DirectSolver to take advantage of the sparse Jacobian storage. That was a good instinct, but the way OpenMDAO is structured this is not a problem either way. We are pretty far down in the weeds now, but since you asked I'll give a short summary explanation. As of OpenMDAO 3.7, only the DirectSolver requires an assembled Jacobian at all (and in fact, it is the linear solver itself that determines this for whatever system it is attached to). All other LinearSolvers work with a DictionaryJacobian (which stores each sub-jac keyed to the [of-var, wrt-var] pair). Each sub-jac can be stored as dense or sparse (depending on how you declared that particular partial derivative). The dictionary Jacobian is effectively a form of a sparse-matrix, though not a traditional one. The key takeaway here is that if you use the LinearRunOnce (or any other solver), then you are getting a memory efficient data storage regardless. It is only the DirectSolver that changes over to a more traditional assembly of an actual matrix object.
Regarding the issue of memory allocation. I borrowed this image from the openmdao docs
2) Why does SLSQP stop after one iteration?
Gradient based optimizations are very sensitive to scaling. I ploted your objective function inside your allowed design space and got this:
So we can see that the minimum is at about 6 degrees, but the objective values are TINY (about 1e-4).
As a general rule of thumb, getting your objective to around order of magnitude 1 is a good idea (we have a scaling report feature that helps with this). I added a reference that was about the order of magnitude of your objective:
p.model.add_objective('obj', ref=1e-4)
Then I got a good result:
Optimization terminated successfully (Exit mode 0)
Current function value: [3.02197589e-11]
Iterations: 7
Function evaluations: 9
Gradient evaluations: 7
Optimization Complete
-----------------------------------
Finished optimization
alpha = [6.04143334]
time: 2.1188600063323975 seconds
Unfortunately, scaling is just hard with gradient based optimization. Starting by scaling your objective/constraints to order-1 is a decent rule of thumb, but its common that you need to adjust things beyond that for more complex problems.

scipy.sparse.linalg: what's the difference between splu and factorized?

What's the difference between using
scipy.sparse.linalg.factorized(A)
and
scipy.sparse.linalg.splu(A)
Both of them return objects with .solve(rhs) method and for both it's said in the documentation that they use LU decomposition. I'd like to know the difference in performance for both of them.
More specificly, I'm writing a python/numpy/scipy app that implements dynamic FEM model. I need to solve an equation Au = f on each timestep. A is sparse and rather large, but doesn't depend on timestep, so I'd like to invest some time beforehand to make iterations faster (there may be thousands of them). I tried using scipy.sparse.linalg.inv(A), but it threw memory exceptions when the size of matrix was large. I used scipy.linalg.spsolve on each step until recently, and now am thinking on using some sort of decomposition for better performance. So if you have other suggestions aside from LU, feel free to propose!
They should both work well for your problem, assuming that A does not change with each time step.
scipy.sparse.linalg.inv(A) will return a dense matrix that is the same size as A, so it's no wonder it's throwing memory exceptions.
scipy.linalg.solve is also a dense linear solver, which isn't what you want.
Assuming A is sparse, to solve Au=f and you only want to solve Au=f once, you could use scipy.sparse.linalg.spsolve. For example
u = spsolve(A, f)
If you want to speed things up dramatically for subsequent solves, you would instead use scipy.sparse.linalg.factorized or scipy.sparse.linalg.splu. For example
A_inv = splu(A)
for t in range(iterations):
u_t = A_inv.solve(f_t)
or
A_solve = factorized(A)
for t in range(iterations):
u_t = A_solve(f_t)
They should both be comparable in speed, and much faster than the previous options.
As #sascha said, you will need to dig into the documentation to see the differences between splu and factorize. But, you can use 'umfpack' instead of the default 'superLU' if you have it installed and set up correctly. I think umfpack will be faster in most cases. Keep in mind that if your matrix A is too large or has too many non-zeros, an LU decomposition / direct solver may take too much memory on your system. In this case, you might be stuck with using an iterative solver such as this. Unfortunately, you wont be able to reuse the solve of A at each time step, but you might be able to find a good preconditioner for A (approximation to inv(A)) to feed the solver to speed it up.

Worhp: Local point of infeasibility

I have a problem that is solved successfully with ipopt and fmincon. worhp terminates on local infeasibility. My x0 (init) is feasible.
This may happen with the interior point algorithm, but I expect sqp to always stay in the feasible zone?
Maybe also check the derivatives with WORHP by enabling CheckValuesDF, CheckValuesDG, CheckValuesHM, CheckStructureDF, CheckStructureDG and CheckStructureHM if you provide them. What I am pointing at is that WORHP requires a very special coordinate storage format (in particular for the Hessian). Mistakes here lead to false search directions.
Due to the approximation error of the QP subproblem this is not something you can expect in general. Consider the problem
which will have the QP subproblems
for a current x and Lagrangian multiplier lambda, as can be seen by determining the necessary derivatives. With initial values x_0 = 0 and lambda_0 = 1 we have a feasible initial guess. The first QP to be solved is then
which has the unique solution d = 2. Now, depending on the implemented linesearch, the full step might be taken, i.e. the next iterate is x_1 = x_0 + d. That means x_1 = 2 which is not a feasible point anymore. In fact, WORHP's SQP algorithm will iterate like this if you disable the par.InitialLMest and eventually find the global optimum at x = 1.
Apart from this fundamental property there can also be other effects leading to iterates leaving the feasible set, that will very much be specific to the actual solver implementation. For example numerical inaccuracies, difficulties during the solution of a QP or certain recovery strategies. As to why your problem is not solved successfully using the SQP algorithm of WORHP, I am unable to say much without knowing anything about the problem itself.

Practical solver for convex QCQP?

I am working with a convex QCQP as the following:
Min e'Ie
z'Iz=n
[some linear equalities and inequalities that contain variables w,z, and e]
w>=0, z in [0,1]^n
So the problem has only one quadratic constraint, except the objective, and some variables are nonnegative. The matrices of both quadratic forms are identity matrices, thus are positive definite.
I can move the quadratic constraint to the objective but it must have the negative sign so the problem will be nonconvex:
min e'Ie-z'Iz
The size of the problem can be up to 10000 linear constraints, with 100 nonnegative variables and almost the same number of other variables.
The problem can be rewritten as an MIQP as well, as z_i can be binary, and z'Iz=n can be removed.
So far, I have been working with CPLEX via AIMMS for MIQP and it is very slow for this problem. Using the QCQP version of the problem with CPLEX, MINOS, SNOPT and CONOPT is hopeless as they either cannot find a solution, or the solution is not even close to an approximation that I know a priori.
Now I have three questions:
Do you know any method/technique to get rid of the quadratic constraint as this form without going to MIQP?
Is there any "good" solver for this QCQP? by good, I mean a solver that efficiently finds the global optimum in a resonable time.
Do you think using SDP relaxation can be a solution to this problem? I have never solved an SDP problem in reallity, so I do not know how efficient SDP version can be. Any advice?
Thanks.

backtracking line search parameter

I am reading/practicing a bit with optimization using Nocedal&Wright, when I got the the simple backtracking algorithm, where if d is my line direction and a is the step size the algorithm looks for a such that
for some 0 < c < 1. They advised to use a very small c, order of 10^-4.
That seemed very odd to me, as a very loss demand.
I did some experimenting with c = 0.3 and it seemed to work much better then the sugested 10^-4 ( for a simple quadratic problem and steepest descent).
Any intuition as to why such a low value should work and why didn't it do well for me?
Thanks.
∇ f() may have completely different scales for different problems;
one stepsize cannot fit all.
Consider f(x) = sin( ω . x ): the right c will depend on ω,
which may be on the order of 1, or 1e-6, or ...
Thus it's a good idea to scale ∇ f() to about norm 1, then play with c.
(People who recommend "c = ...", please describe your problem size and scales.)
Add some noise to your quadratic, see what happens as you increase the noise.
Try quadratic + noise in 2d, 10d.
In machine learning, there seems to be quite a lot of folklore on c a.k.a. learning rate;
google
learning-rate on stackexchange.com ,
also gradient-descent step-size
and adagrad adaptive gradient.