Numerical Instability in Optim.jl - optimization

I'm currently working on a project in Julia where I am starting with an input beta which is assumed to be incorrect. I'm running through a sequence of code that updates this beta to be the correct value and checking the error. As beta gets larger, I expect this error to reach 100%. This code ultimately does a minimization of some parameter chi which is why I've chosen to employ the optimize function from Optim.jl. The output I'm getting is below.
When I perform this calculation by hand (using 1st and 2nd derivative to update) I get this
I see that this still has some numerical instability, but it holds up longer than the Optim way does. I would expect it to behave the other way around. My optimize function is set up as
result = optimize(β -> TEfunc(E,nc,onecut,β,pcutoff,μcutoff,N),β/2,2.2*β,Brent(),abs_tol=tempcutoff,rel_tol=sqrt(tempcutoff))
βstar=Optim.minimizer(result)
Is there an argument that I'm missing in the optimize call? I just want to figure out why I have numerical instability so quickly.

Related

SciPy Basinhopping not returning lowest-found minimum

I know there is a very similar question, but mine is different. I am running an optimization using Basinhopping, with the Powell method. Within the function I am optimizing, I also store to an external array the parameters and the resulting cost function value for each iteration, so I can afterwards check the results. I've noticed repeatedly that the lowest minimization result which the basinhopping function returns is not actually the set of parameters which resulted in the lowest overall error. I assume this is not an error, but maybe me misunderstanding how the technique works. For example, in an optimization I just ran, I found the result which was returned was actually the 35th-best option, when I check my arrays after completion. The difference in cost is very small (I'm using RMSE as a metric, and the difference is 0.02), but I still don't understand how it selected the minimum.
My first thought was maybe these parameters somehow exceeded the bounds I set, but I checked and that isn't the case.
I don't yet have a shareable reproducible version since I'm using some internal modules in the function call, but I figured I would post my question since it is more about the conceptual aspect of how basinhopping selects its result.
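Since the real cost function is not shareable, here is a minimal sketch of the logging described above: a wrapper that records every evaluation so the lowest value ever seen can be compared with what the optimizer reports. `EvaluationLog` and `toy_cost` are hypothetical names, and a plain random search stands in for the optimizer.

```python
import math
import random


class EvaluationLog:
    """Wrap an objective so every call is recorded (hypothetical helper)."""

    def __init__(self, func):
        self.func = func
        self.history = []  # (cost, params) for every evaluation

    def __call__(self, params):
        cost = self.func(params)
        self.history.append((cost, tuple(params)))
        return cost

    def best(self):
        # Lowest cost seen across all evaluations, regardless of which
        # point the optimizer finally reports.
        return min(self.history, key=lambda entry: entry[0])


def toy_cost(params):
    # Stand-in for the real RMSE-based cost, which is not shown here.
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + 0.1 * math.sin(5.0 * x)


logged = EvaluationLog(toy_cost)
rng = random.Random(0)
for _ in range(200):  # random search as a placeholder for the optimizer
    logged([rng.uniform(-5, 5), rng.uniform(-5, 5)])

best_cost, best_params = logged.best()
```

With SciPy, `logged` would be passed to `basinhopping` in place of the raw function, and `logged.best()[0]` compared with the `fun` field of the returned result.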

z3 minimization and timeout

I am trying to use the z3 solver for a minimization problem. I want it to time out and return the best solution found so far. I use the Python API and set the timeout option "smt.timeout" with
set_option("smt.timeout", 1000) # 1s timeout
This actually times out after about 1 second. However a larger timeout does not provide a smaller objective. I ended up turning on the verbosity with
set_option("verbose", 2)
And I think that z3 successively evaluates larger values of my objective, until the problem is satisfiable:
(opt.maxres [0:6117664])
(opt.maxres [175560:6117664])
(opt.maxres [236460:6117664])
(opt.maxres [297360:6117664])
...
(opt.maxres [940415:6117664])
(opt.maxres [945805:6117664])
...
I thus have two questions:
Can I instead tell z3 to start from the upper bound and successively return models with a smaller value for my objective function (as, for instance, the MiniZinc annotation indomain_max does: http://www.minizinc.org/2.0/doc-lib/doc-annotations-search.html)?
It still looks like the solver returns a satisfiable instance of my problem. How is it found? If it is trying successively larger values of my objective, it should not have found a satisfiable instance yet when the timeout occurs...
edit: In the opt.maxres log, the upper bound never shrinks.
For the record, I found a more verbose description of the options in the source here opt_params.pyg
Edit: Sorry to bother, I've been diving into this once again recently. Anyway, I think this might be useful to others. I've found that I actually have to call the Optimize.upper method in order to get the upper bound, and the model is still not the one that corresponds to this upper bound. I've been able to add it as a new constraint and call a solver (without optimization, just SAT), but that's probably not the best idea. By reading this I feel like I should call Optimize.update_upper after the solver times out, but the Python interface has no such method (?). At least I can get the upper bound and the corresponding model now (at the cost of unnecessary computations, I guess).
Z3 finds solutions for the hard constraints and records the current values for the objectives and soft constraints. The last model that was found (the last model with the so-far best value for the objectives) is returned if you ask for a model. The maxres strategy mainly improves the lower bounds on the soft constraints (e.g., any solution must have cost at least xx) and whenever possible improves the upper bound (the optimal solution has cost at most yy). The lower bounds don't tell you too much other than narrowing the range of possible optimal values. The upper bounds are available when you time out.
You could try one of the other strategies, such as the one called "wmax", which performs a branch-and-prune. Typically maxres does significantly better, but you may have better experience (depending on the problems) with wmax for improving upper bounds.
I don't have a mode where you get a stream of models. It is in principle possible, but it would require some (non-trivial) reorganization. For Pareto fronts you make successive invocations to Optimize.check() to get the successive fronts.

Using pymc.potential to prevent evaluation of function at meaningless parameters values

I am building a pymc model which must evaluate a very CPU-expensive function (up to 1 sec per call on very decent hardware). I am trying to limit the explored parameter space to meaningful solutions by means of a potential (the sum of a list of my variables has to stay within a given range). This works, but I noticed that even when my potential returns an infinite value and forbids the parameter choice, this function still gets evaluated. Is there a way to prevent that? Can one force the sampler to use a given evaluation sequence (pick up the necessary variables, check whether the potential is OK, and proceed only if allowed)?
I thought of using the potential inside the function itself and use it to determine whether it must proceed or immediately return, but is there a better way?
Jean-François
I am not aware of a way of ordering the evaluation of the potentials. This might not be the best way of doing so, but you might be able to check whether the parameters are within reasonable bounds at the beginning of the simulation. If they are not, you can return a value that will cause your posterior to be zero.
Another option is to create a function for your likelihood. At the beginning of this function you could check if the parameters are within reasonable limits. If they are not you can return -inf without running your simulation. If they are reasonable you can run your model and calculate the log(p).
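A minimal sketch of that guard in plain Python, not tied to any pymc API. `expensive_model`, the bounds `low` and `high`, and the Gaussian error model are all assumptions made for illustration:

```python
import math

calls = {"expensive": 0}  # counts how often the costly model actually runs


def expensive_model(params):
    # Hypothetical stand-in for the costly simulation (about 1 s per call).
    calls["expensive"] += 1
    return sum(p * p for p in params)


def guarded_log_likelihood(params, observed, low=0.0, high=1.0, sigma=1.0):
    # Cheap constraint first: the sum of the parameters must lie in [low, high].
    total = sum(params)
    if not (low <= total <= high):
        return -math.inf  # rejected without running the expensive model
    predicted = expensive_model(params)
    # Gaussian log-likelihood up to an additive constant (assumed error model).
    return -0.5 * ((observed - predicted) / sigma) ** 2


inside = guarded_log_likelihood([0.2, 0.3], observed=0.13)   # model is run
outside = guarded_log_likelihood([2.0, 3.0], observed=0.13)  # model is skipped
```

Only the first call reaches `expensive_model`; the second returns `-inf` straight from the bounds check.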
This is definitely not an elegant solution but it should work.
Full disclosure - I am not by any means a pymc expert.

Optimization through machine learning

I've got a system that takes 15 points out of a 17 by 17 grid as input (order doesn't matter), and generates a single scalar as output. The system is not representable by a formal function.
The goal is to find the optimal 15 points so that the output scalar is minimum. Solving this problem exhaustively simply takes too much time to be practical as each run takes 14 seconds.
I've started taking a machine learning course online. But this problem does seem to be rather unsophisticated and I wonder if anyone can point me to the right direction. Any help is greatly appreciated!
Use simulated annealing. I guess this will be close to optimal here.
Therefore, start with a random distribution of the 15 points. Then, in each iteration, change one point and accept the new state if the resulting scalar value is lower. If it is larger, accept it with a certain probability (a Boltzmann factor). Finally, repeat this from a small number of randomly chosen initial states and keep the lowest value found.
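The steps above can be sketched roughly as follows. `toy_system` is a hypothetical cheap stand-in for the real 14-second system, and the geometric cooling schedule and its temperatures are assumptions that would need tuning:

```python
import math
import random

GRID = 17
N_POINTS = 15
ALL_CELLS = [(r, c) for r in range(GRID) for c in range(GRID)]


def toy_system(points):
    # Hypothetical stand-in for the black box: total squared distance
    # of the chosen points from the grid centre.
    centre = (GRID - 1) / 2
    return sum((r - centre) ** 2 + (c - centre) ** 2 for r, c in points)


def anneal(evaluate, steps=5000, t_start=50.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    current = rng.sample(ALL_CELLS, N_POINTS)  # random initial state
    current_cost = evaluate(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose: move one chosen point to a cell that is currently unused.
        used = set(current)
        free = [cell for cell in ALL_CELLS if cell not in used]
        candidate = list(current)
        candidate[rng.randrange(N_POINTS)] = rng.choice(free)
        cost = evaluate(candidate)
        delta = cost - current_cost
        # Always accept improvements; accept worse states with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


best_points, best_value = anneal(toy_system)
```

With 14 seconds per evaluation, `steps` would have to be far smaller than the default used here, and calling `anneal` with a few different `seed` values gives the restarts from different initial states.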

optimizing a function to find global and local peaks with R

I have 6 parameters for which I know the maximum and minimum values. I have a complex function that takes the 6 parameters and returns a 7th value (say Y). I say complex because Y is not directly related to the 6 parameters; there are many embedded functions in between.
I would like to find the combination of the 6 parameters which returns the highest Y value. I first tried to calculate Y for every combination by constructing a hypercube, but I do not have enough memory in my computer. So I am looking for some kind of Markov chain that progresses through the bounded parameter space and is able to escape local peaks.
When I give one combination of the 6 parameters, I would also like to know the highest local Y value. I tried to write code with an iterative chain like a Markov chain, but I am not sure how to proceed when the chain reaches an edge of the parameter space. Obviously, some algorithms should already exist for this.
Question: Does anybody know what the best functions in R are to do these two things? I read that optim() could be appropriate to find the global peak, but I am not sure that it can deal with complex functions (I prefer asking before engaging in a long (for me) process of code writing). And for the local peaks? optim() should not be able to do this.
In advance, thank you for any lead
Julien from France
Take a look at the Optimization and Mathematical Programming Task View on CRAN. I've personally found the differential evolution algorithm to be very fast and robust. It's implemented in the DEoptim package. The rgenoud package is another good candidate.
I like to use the Metropolis-Hastings algorithm. Since you are limiting each parameter to a range, the simple thing to do is let your proposal distribution simply be uniform over the range. That way, you won't run off the edges. It won't be fast, but if you let it run long enough, it will do a good job of sampling your space. The samples will congregate at each peak, and will spread out around them in a way that reflects the local curvature.
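A rough sketch of that idea, written in Python rather than R to keep it dependency-free. The bounds, `toy_y`, and the `temperature` that turns Y into a target density proportional to exp(Y / temperature) are all assumptions for illustration:

```python
import math
import random

# Assumed bounds for the 6 parameters (the real min/max values are not given).
BOUNDS = [(0.0, 1.0)] * 6


def toy_y(params):
    # Hypothetical stand-in for the complex function; peak at 0.7 in each coordinate.
    return -sum((p - 0.7) ** 2 for p in params)


def metropolis_uniform(func, bounds, n_steps=20000, temperature=0.05, seed=1):
    rng = random.Random(seed)
    current = [rng.uniform(lo, hi) for lo, hi in bounds]
    current_y = func(current)
    best, best_y = list(current), current_y
    samples = []
    for _ in range(n_steps):
        # Independent uniform proposal over the whole box: it cannot leave the bounds.
        proposal = [rng.uniform(lo, hi) for lo, hi in bounds]
        proposal_y = func(proposal)
        # The proposal density is constant, so the acceptance ratio reduces to
        # exp((Y_new - Y_old) / temperature) for the assumed target density.
        log_ratio = (proposal_y - current_y) / temperature
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_y = proposal, proposal_y
            if current_y > best_y:
                best, best_y = list(current), current_y
        samples.append(list(current))
    return samples, best, best_y


samples, best_params, best_y = metropolis_uniform(toy_y, BOUNDS)
```

The `samples` list is what clusters around the peaks, while `best_params` simply tracks the highest Y seen along the chain. Lower `temperature` values concentrate the samples more tightly but make the chain accept fewer proposals.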