Pyomo-IPOPT: solver falls into local minima, how to avoid that?

Pyomo-IPOPT: solver falls into local minima, how to avoid that? - optimization

I am trying to solve an optimisation problem consisting in finding the global maximum of a high dimensional (10+) monotonic function (as in monotonic in every direction). The constraints are such that they cut the search space with planes.
I have coded the whole thing in pyomo and I am using the ipopt solver. In most cases, I am confident it converges successfully to the global optimal. But if I play a bit with the constraints I see that it sometimes converges to a local minima.
It looks like a exploration-exploitation trade-off.
I have looked into the options that can be passed to ipopt and the list is so long that I cannot understand which parameters to play with to help with the convergence to the global minima.
edit:
Two hints of a solution:
my variables used to be defined with very infinite bounds, e.g. bounds=(0,None) to move on the infinite half-line. I have enforced two finite bounds on them.
I am now using multiple starts with:
opt = SolverFactory('multistart')
results = opt.solve(self.model, solver='ipopt', strategy='midpoint_guess_and_bound')
So far this has made my happy with the convergence.

Sorry, IPOPT is a local solver. If you really want to find global solutions, you can use a global solver such as Baron, Couenne or Antigone. There is a trade-off: global solvers are slower and may not work for large problems.
Alternatively, you can help local solvers with a good initial point. Be aware that active set methods are often better in this respect than interior point methods. Sometimes multistart algorithms are used to prevent bad local optima: use a bunch of different starting points. Pyomo has some facilities to do this (see the documentation).

Related

On the iterative implementation of mosekopt for large linear programs

I have to solve a linear program with a very large number of constraints. I use MOSEK (mosekopt, with MSK_IPAR_INTPNT_BASIS set equal to MSK_BI_NEVER to save time).
The solver takes time to solve the program due to the large dimension.
I thought about manually coding the following iterative procedure:
Take a random subset of constraints and solve the restricted linear program.
If a solution of the restricted linear program does not exist, stop.
If a solution of the restricted linear program exists, check if it is a solution of the original linear program. If yes, stop. If not, repeat from 1. with a larger set of constraints that includes the constraints of this iteration.
The procedure does not seem to produce a notable saving of time. I wonder whether this is because 1.,2.,3. are essentially what the solver does without needing my input. Could you advise?
Could I do improve things if, when moving from 3. to 1., I supply to mosekopt the old solution of the restricted linear program?

This may or may not be faster, than using Mosek on the complete problem. At least theoretical your approach is inferior.
You say nothing of the dimension of the problem that would be interesting to know.
Or how long it takes to solve the complete problem.
One issue tricky is how many and which constraints you are adding in 3. That will be very important.

Optimization has found the global minimum but converges to a local one

I am using the stochastic optimization algorithm CMA-ES. Although it finds the global minimum in the first cycles ( I know because it is a made-up benchmark test) the algorithm after some cycles converge to another minimum (a local one since it has a bigger cost function value).
Does everyone have experience in the matter?
Do I have to care that it converges to a local minimum since it has found the global one? Is it wrong to just use the global minimum like that and not to care about where the algorithm has converged?
My opinion from the results is that this is happening due to the normal distribution, the global minimum has only a few solutions but the local one has a great percentage of solutions. ( I have tried a lot of different populations values but the result is the same)
Thank you in advance for your help!

It is common to keep a global "best" solution when running evolutionary algorithms, especially if they are the kind that is allowed to move to worse results from a better one.
If you are running the algorithm with an approximate fitness function and getting a good-enough result is okay, you can go with what it converges to. Depending on the problem you are solving, it might be very good or very bad to overfit a solution.
If your fitness function is not an approximation and is the correct metric to optimize, just keep the best performer and use it when you finish running the algorithm.

Algebraic/implicit loops handling by Gekko

I have got a specific question with regards to algebraic / implicit loops handling by Gekko.
I will give examples in the field of Chemical Engineering, as this is how I found the project and its other libraries.
For example, when it comes to multicomponent chemical equilibrium calculations, it is not possible to explicitly work out the equations, because the concentration of one specie may be present in many different equations.
I have been using other paid software in the past and it would automatically propose a resolution procedure based on how the system is solvable (by analyzing dependency and creating automatic algebraic loops).
My question would be:
Does Gekko do that automatically?
It is a little bit tricky because sometimes one needs to add tear variables and iterate from a good starting value.
I know this message may be a little bit abstract, but I am trying to decide which software to use for my work and this is a pragmatic bottle neck that I have happened to find.
Thanks in advance for your valuable insight.

Python Gekko uses a simultaneous solution strategy so that all units are solved together instead of sequentially. Therefore, tear variables are not needed but large flowsheet problems with recycle can be difficult to converge to a feasible solution. Below are three methods that are in Python Gekko to assist in efficient solutions and initialization.
Method 1: Intermediate Variables
Intermediate variables are useful to decrease the complexity of the model. In many models, the temporary variables outnumber the regular variables. This model reduction often aides the solver in finding a solution by reducing the problem size. Intermediate variables are declared with m.Intermediates() in Python Gekko. The intermediate variables may be defined in one section or in multiple declarations throughout the model. Intermediate variables are parsed sequentially, from top to bottom. To avoid inadvertent overwrites, intermediate variable can be defined once. In the case of intermediate variables, the order of declaration is critical. If an intermediate is used before the definition, an error reports that there is an uninitialized value. Here is additional information on Intermediates with an example problem.
Method 2: Lower Block Triangular Decomposition
For large problems that have trouble with initialization, there is a mode that is activated with the option m.options.COLDSTART=2. This mode performs a lower block triangular decomposition to automatically identify independent blocks that are then solved independently and sequentially.
This decomposition method for initialization is discussed in the PhD dissertation (chapter 2) of Mostafa Safdarnejad or also in Safdarnejad, S.M., Hedengren, J.D., Lewis, N.R., Haseltine, E., Initialization Strategies for Optimization of Dynamic Systems, Computers and Chemical Engineering, 2015, Vol. 78, pp. 39-50, DOI: 10.1016/j.compchemeng.2015.04.016.
Method 3: Automatic Model Reduction
Model reduction requires more pre-processing time but can help to significantly reduce the solver time. There is additional documentation on m.options.REDUCE.
Overall Strategy for Initialization
The overall strategy that we use for initializing hard problems, such as flowsheets with recycle, is shown in this flowchart.
Sometimes it does mean breaking recycles to get an initialized solution. Other times, the initialization strategies detailed above work well and no model rearrangement is necessary. The advantage of working with a simultaneous solution strategy is degree of freedom swapping such as downstream variables can be fixed and upstream variables calculated to meet that value.

using Bonmin Counne and Ipopt for NLP

I want to just be sure that I am eligible to use Bonmin and Couenne for solving just the NLP problem (Still I do not have integer variable) and I am eager to obtain global optimum not local. I also read that Ipopt first search for the global answer and if it does not find that it will provide a local answer. How I can understand my answer is a global answer when I using Ipopt. Also, I want to what is the best NLP and MINLP open source pythonic solvers for these issues that can be merged with Pyomo?
The main reason for my question is the following output using Bonmin:
NOTE: You are using Ipopt by default with the MUMPS linear solver.
Other linear solvers might be more efficient (see Ipopt documentation).
Regards

Some notes:
(1) "Ipopt first search for the global answer and if it does not find that it will provide a local answer" This is probably not how I would phrase it. IPOPT finds local solutions. For some problems these will be the global solution. For convex problems, this is always the case (except for numerical issues).
(2) Bonmin is a local MINLP solver, Couenne is a global NLP/MINLP solver. Typically Bonmin can solve larger problems than Couenne, but you get local solutions.
(3) "NOTE: You are using Ipopt by default with the MUMPS linear solver. Other linear solvers might be more efficient (see Ipopt documentation)." This is just a notification that you are using IPOPT with linear algebra routines from MUMPS. There are other linear sub-solvers that IPOPT can use and that may perform better on large problems. Often the HARWELL routines (typically called MAnn) give better performance. MUMPS is free while the Harwell routines require a license.
In a follow-up answer (well it is not answer at all) it is stated:
Regarding Ipopt how I can understand that it is finding the global
solution or local optimum? the code will notify that? Regarding to
Bonmin according to AMPL page AMPL It provides the global solution for
the convex problem " Finds globally optimal solutions to convex
nonlinear problems in continuous and discrete variables, and may be
applied heuristically to nonconvex problems." And you were saying that
it is obtained the local solution, I am a bit confused on this part.
But the general question about all those codes is that how I can find
out that the answer is global optimum?
(a) Ipopt does not know if a solution is a local or a global optimal solution. For convex problems a local optimum is a global optimal solution. You will need to convince yourself the problem you pass on to Ipopt is convex (Ipopt will not do this for you).
(b) Bonmin: the same: if the problem is convex it will find global solutions. Otherwise you will get a local solution. You will get no notification whether a solution is a global solution: Bonmin does not know if a solution is a global optimum.
(c) When looking for guaranteed global solutions you can use a local solver only when the problem is convex. For other problems you need a global solver. Another approach is to use a multi-start algorithm with a local solver. That gives you confidence that you are not ending up with a bad local optimum.
If possible, I suggest to discuss this with your teacher. These concepts are important to understand (and most solver manuals assume you know about them).

Genetic/Evolutionary algorithms and local minima/maxima

I have run across several posts and articles that suggests using things like simulated annealing to avoid the local minima/maxima problem.
I don't understand why this would be necessary if you started out with a sufficiently large random population.
Is it just another check to insure that the initial population was, in fact, sufficiently large and random? Or are those techniques just an alternative to producing a "good" initial population?

Simulated annealing is a probabilistic optimization technique -- it's not supposed to give you more precise answers, it's supposed to give you approximations faster.

Simulated annealing is probabilistic technique where chance of getting trapped in local minima/maxima depends on scheduling of temperature. Scheduling temperature is different for different types of problems. Evolutionary Algorithm is much more robust and less likely to get trapped in local minima/maxima. SA is probabilistic. On the other hand, EA uses mutation which introduces random walk in search space, that's why EA has higher probability of getting global optima.

First of all, simulated annealing is a last resort method. There are far better, more efficient, and more effective methods of discovering where the local minima are found.
A better check would be to use a statistical method to uncover information about your data set such as variance or standard deviation.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas