Large scale linearly-constrained convex quadratic optimization - R/Python/Gurobi - large-data

I have a series of linearly constrained convex quadratic optimization problems with around 100,000 variables, 1 linear constraint, and 100,000 bound constraints (one per variable; the solution has to be positive). I am planning to use Gurobi in R and/or Python. I have noticed that, although the solver finds a solution quite quickly for small problems, it takes forever for medium-to-large problems like mine (some benchmarks are shown here - credits to Stéphane Caron).
I know that QP methods do not scale very well, but I'd like to know whether you are aware of any solver/technique/tool that solves medium-to-large QP problems faster.
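For illustration, here is a minimal gurobipy sketch with the same structure (the Q, c, and constraint data below are random placeholders, not my actual problem; the real objective is a general convex quadratic):

    # Sketch of the structure: convex quadratic objective, one linear
    # constraint, and nonnegativity bounds on every variable.
    # All data below are random placeholders.
    import numpy as np
    import scipy.sparse as sp
    import gurobipy as gp

    n = 100_000
    rng = np.random.default_rng(0)
    Q = sp.diags(rng.uniform(1.0, 2.0, n)).tocsr()  # placeholder PSD matrix (diagonal just to keep the sketch cheap)
    c = rng.standard_normal(n)

    m = gp.Model()
    x = m.addMVar(n, lb=0.0)                     # the 100,000 bound constraints: x >= 0
    m.setObjective(0.5 * (x @ Q @ x) + c @ x)    # convex quadratic objective
    m.addConstr(np.ones(n) @ x == 1.0)           # the single linear constraint
    m.Params.Method = 2                          # barrier, the usual choice for large QPs
    m.optimize()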
Thanks!

Related

Performance of SCIP: how many variables and constraints can SCIP deal with, and how much time will it take to solve?

I'm new to SCIP, and I have a large-scale MINLP with about 500,000 integer variables, 500,000 linear constraints, and 100,000 nonlinear constraints.
I have read a lot of papers about the performance of SCIP, but I can't find how many variables and constraints SCIP can deal with.
One of the papers I found, listed below, shows the number of solved problems but not the number of variables and constraints.
https://link.springer.com/content/pdf/10.1007%2Fs11081-018-9411-8.pdf
Is there any experience or paper I can refer to on how many variables and constraints SCIP can deal with, and how much time SCIP will take to solve?
There is hardly a limit on the size of the instances you can pass to SCIP (if we ignore some limits imposed by the programming languages) - or to any other MIP solver, for that matter. Whether you can solve an instance in acceptable time and without exceeding your memory is mainly a question of the computing resources at your disposal.
So, I'd say: Just give it a try!
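If it helps to get started, a tiny PySCIPOpt sketch (the toy variables and constraints below are placeholders; swap in your own model data):

    # Toy MINLP-style model, just to show how little code a first attempt needs.
    from pyscipopt import Model

    model = Model("toy")
    x = model.addVar(vtype="I", lb=0, ub=10, name="x")  # integer variable
    y = model.addVar(vtype="C", lb=0, name="y")         # continuous variable

    model.addCons(2 * x + y <= 15)   # a linear constraint
    model.addCons(x * x + y >= 4)    # a nonlinear (quadratic) constraint
    model.setObjective(x + 3 * y, "maximize")

    model.optimize()
    if model.getStatus() == "optimal":
        print("x =", model.getVal(x), "y =", model.getVal(y))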

Use PyTorch to speed up linear least squares optimization with bounds?

I'm using scipy.optimize.lsq_linear to run some linear least squares optimizations, and all is well, but a little slow. My A matrix is typically about 100 x 10,000 in size and sparse (sparsity usually ~50%). The bounds on the solution are critical. Given my tolerance, lsq_linear typically solves the problems in about 10 seconds, and speeding this up would be very helpful for running many optimizations.
I've read about speeding up linear algebra operations using GPU acceleration in PyTorch. It looks like PyTorch handles sparse arrays (torch calls them tensors), which is good. However, I've been digging through the PyTorch documentation, particularly the torch.optim and torch.linalg packages, and I haven't found anything that appears to be able to do a linear least squares optimization with bounds.
Is there a torch method that can do linear least squares optimization with bounds like scipy.optimize.lsq_linear?
Is there another way to speed up lsq_linear or to perform the optimization in a faster way?
For what it's worth, I think I've pushed lsq_linear pretty far. I don't think I can decrease the number of matrix elements, increase sparsity, or relax the optimization tolerances much further without sacrificing the results.
Not easily, no.
I'd try to profile lsq_linear on your problem to see if it's pure Python overhead (which can probably be trimmed somewhat) or linear algebra. In the latter case, I'd start by vendoring the lsq_linear code and swapping out the relevant linear algebra routines. YMMV, though.
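For example, a quick profiling sketch along those lines (the A and b below are a random stand-in at roughly the size you describe):

    # Profile scipy.optimize.lsq_linear on a synthetic ~100 x 10,000 sparse problem
    # with nonnegativity bounds, sorted by cumulative time to see whether Python
    # overhead or the underlying linear algebra dominates.
    import cProfile
    import numpy as np
    from scipy import sparse
    from scipy.optimize import lsq_linear

    rng = np.random.default_rng(0)
    A = sparse.random(100, 10_000, density=0.5, random_state=0, format="csr")
    b = rng.standard_normal(100)

    cProfile.run("lsq_linear(A, b, bounds=(0, np.inf), tol=1e-6)", sort="cumtime")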

Heuristics / solver for a high-dimensional planning problem

To optimize a production system by planning ~1,000 timesteps ahead, I am trying to solve an optimization problem with around 20,000 variables, both binary and continuous, and several complex constraints.
I know this is not much information to go on, but can someone give a hint as to which approach would be suitable for such big problems? Would you recommend a metaheuristic or a commercial solver?

Why does GLPSOL (GLPK) take a long time to solve a large MIP?

I have a large MIP problem, and I use GLPSOL in GLPK to solve it. However, solving the LP relaxation takes many iterations, and the obj and infeas values printed at each iteration stay the same. I think it has found the optimal solution, but it won't stop and has continued running for many hours. Will this happen for every large-scale MIP/LP problem? How can I deal with such cases? Can anyone give me any suggestions? Thanks!
Solving MIPs is NP-hard in general, which means there are instances that can't be solved efficiently. But our problems often have enough structure that heuristics can help to solve these models; this has allowed huge gains in solving capabilities over the last decades (overview).
To understand the basic approach and to pin down what exactly the problem is in your case (no progress in the upper bound, no progress in the lower bound, ...), read Practical Guidelines for Solving Difficult Mixed Integer Linear Programs.
Keep in mind that, in general, there are huge gaps between commercial solvers like Gurobi/CPLEX and non-commercial ones (especially in MIP solving). There are extensive benchmarks here.
There are also a lot of parameters to tune. Gurobi, for example, has different parameter templates: one targets finding feasible solutions quickly; another targets proving the bounds.
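As a hedged example, one such knob in Gurobi's Python API is the MIPFocus parameter (the model file name below is a placeholder):

    # Bias Gurobi toward finding feasible solutions quickly (MIPFocus=1),
    # proving optimality (MIPFocus=2), or moving the best bound (MIPFocus=3).
    import gurobipy as gp

    model = gp.read("my_model.mps")   # placeholder file name
    model.Params.MIPFocus = 1
    model.Params.TimeLimit = 3600     # give up after an hour either way
    model.optimize()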
My personal opinion: compared to CBC (open source) and SCIP (open source, but not free for commercial usage), GLPK is quite bad.

When to switch from Dynamic Programming (2D table) to Branch & Bound algorithm?

I'm working on a knapsack optimization problem involving dynamic programming and branch & bound. I noticed that when the capacity and the number of items get large, filling up the 2D table for the dynamic programming algorithm gets dramatically slower. I assume that at some point I'm supposed to switch algorithms depending on the size of the problem (since the lecture gave us two types of optimization)?
I've tried to google at what point (what size) I should switch from dynamic programming to branch & bound, but I couldn't find the answer I wanted.
Or is there another way of looking at the knapsack problem, in which I can combine dynamic programming and branch & bound into one algorithm instead of switching algorithms depending on the size of the problem?
Thanks.
Often when you have several algorithms that solve a problem but whose runtimes have different characteristics, you determine (empirically, not theoretically) when one algorithm becomes faster than the other. This is highly implementation- and machine-dependent. So measure both the DP algorithm and the B&B algorithm and figure out which one is better when.
A couple of hints:
You know that DP's runtime is proportional to the number of objects times the size of the knapsack (a small sketch of this is given after these hints).
You know that B&B's runtime can be as bad as 2^(number of objects), but it's typically much better. Try to figure out the worst case.
Caches and stuff are going to matter for DP, so you'll want to estimate its runtime piecewise. Figure out where the breakpoints are.
DP takes up exorbitant amounts of memory. If it's going to take up too much, you don't really have a choice between DP and B&B.
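A small sketch of the DP from the first hint (0/1 knapsack; the classic 2D table is collapsed to a single row, so memory grows with the capacity rather than with items times capacity):

    # 0/1 knapsack DP: runtime is proportional to (number of items) * capacity.
    def knapsack_dp(values, weights, capacity):
        best = [0] * (capacity + 1)   # best[c] = max value achievable with capacity c
        for v, w in zip(values, weights):
            # Iterate capacities downward so each item is used at most once.
            for c in range(capacity, w - 1, -1):
                best[c] = max(best[c], best[c - w] + v)
        return best[capacity]

    # Tiny usage example: expected answer is 220.
    print(knapsack_dp(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))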