Should I transform constrained optimization to unconstrained optimization?

I have a two part question based on the optimization problem,
max f(x) s.t. a <= x <= b
where f is a nonlinear function and a and b are finite.
(1) I have heard that, if possible, one should try to transform this constrained optimization problem into an unconstrained one (my main interest is in avoiding local maxima, but this could also be to speed up the optimization). Is this generally true?
For the specific problem at hand, I am using the "optim" function in R with "Nelder-Mead", which is a derivative-free method.
(2) Is there a "best" transformation to use to transform the constrained to unconstrained problem?
I am using a + (b-a)*(sin(x)+1)/2 because it is onto and continuous (and so, since the entire interval can be reached, I am hoping not to end up at a local maximum).
See https://math.stackexchange.com/questions/75077/mapping-the-real-line-to-the-unit-interval for some transformations. The unconstrained problem is then,
max f(a +(b-a)*(sin(x)+1)/2)
Also in the case of a one-sided constraint a < x, I have seen people use the exponential function a + exp(x). Is this the best thing to do?
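For what it's worth, here is a minimal sketch of the transformed problem in Python (the question uses R's optim, but the idea carries over directly); f, a and b are placeholders for your own function and bounds:

import numpy as np
from scipy.optimize import minimize

a, b = 0.0, 5.0                       # placeholder finite bounds
def f(x):                             # placeholder objective to be maximized
    return -(x - 2.0)**2 + 3.0

def to_interval(y):
    # the sine transform from the question: maps any real y into [a, b]
    return a + (b - a) * (np.sin(y) + 1.0) / 2.0

# maximize f(transform(y)) over unconstrained y == minimize its negative
res = minimize(lambda y: -f(to_interval(y[0])), x0=[0.0], method="Nelder-Mead")
print(to_interval(res.x[0]))          # candidate maximizer back in [a, b]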

Related

Can I use a lookup table instead of a 5 degree polynomial equation between three variables in a non-linear optimization model?

I have a non-linear optimization model with several variables, and a certain function between three of them should be defined as a constraint. (Let us say that the efficiency of a machine depends on the inlet and outlet temperatures.) I have calculated some values in a table to visualize the dependency for the T_inlet and T_outlet values. It gives back a pretty ugly surface. A good fit would be something like a fifth-degree polynomial if I wanted to define a function directly, but I do not think that would boost my computation speed... So instead I am considering simply keeping the created table and using it as a lookup table. Is a non-linear solver able to interpret this? I am using Ipopt in a Pyomo environment.
Another idea would be to limit my feasible temperature range and simplify the relationship... maybe using piecewise linearization. Is that doable with 3D surfaces?
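Not an answer, but a rough sketch of the polynomial-surrogate route in Python, assuming the tabulated values live in a hypothetical efficiency_table.csv; the fitted closed-form expression could then be written out as an ordinary algebraic constraint that Ipopt can differentiate, unlike a raw lookup table:

import numpy as np

# assumed columns: T_inlet, T_outlet, efficiency
T_in, T_out, eta = np.loadtxt("efficiency_table.csv", delimiter=",", unpack=True)

deg = 3                                              # try something lower than degree 5 first
terms = [(i, j) for i in range(deg + 1) for j in range(deg + 1 - i)]
A = np.column_stack([T_in**i * T_out**j for i, j in terms])
coeffs, *_ = np.linalg.lstsq(A, eta, rcond=None)     # least-squares surface fit

def eta_fit(ti, to):
    # usable on plain numbers, or term by term inside a Pyomo constraint expression
    return sum(c * ti**i * to**j for c, (i, j) in zip(coeffs, terms))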
Thanks in advance!

Usage of scipy.optimize.fmin_slsqp for Integer design variable

I'm trying to use scipy.optimize.fmin_slsqp for an industrial constrained optimization problem. A highly non-linear FE model is used to generate the objective and constraint functions, and their derivatives/sensitivities.
The objective function is in the form:
obj=a number calculated from the FE model
A series of constraint functions are set, and most of them are in the form:
cons = real number i - real number j (calculated from the FE model)
I would like to try to restrict the design variables to integers, as that is what would be input into the plant machinery.
Another consideration is to keep a log file recording which design variables have been tried; if a set of (integer) design variables has already been tried, skip the calculation, perturb the design variables, and try again. By limiting the design variables to integers, we are able to limit the number of trials (whereas if the design variables are left real, a change in, say, the 8th decimal place would be regarded as an untried value). A sketch of this idea follows.
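Something like the following (hypothetical) wrapper is probably a safer home for that idea than the Fortran call itself: round the design vector, cache results keyed by the rounded tuple, and skip the FE run on repeats; run_fe_model is a placeholder name for your expensive evaluation:

evaluated = {}   # integer design tuple -> previously computed objective value

def objective(x):
    key = tuple(int(round(xi)) for xi in x)
    if key not in evaluated:
        evaluated[key] = run_fe_model(key)   # placeholder for the FE model evaluation
    return evaluated[key]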
I'm using SLSQP as it is an SQP method (please correct me if I am wrong), and it is said to be powerful for nonlinear problems. I understand the SLSQP algorithm is a gradient-based optimizer and there is no way I can enforce the integer restriction on the design variables inside the algorithm coded in FORTRAN. So instead, I modified the slsqp.py file to the following (where it calls the Python extension built from the FORTRAN algorithm):
slsqp(m, meq, x, xl, xu, fx, c, g, a, acc, majiter, mode, w, jw)
for i in range(len(x)):
    x[i] = int(x[i])
The code stops at the 2nd iteration and outputs the following:
Optimization terminated successfully. (Exit mode 0)
Current function value: -1.286621577077517
Iterations: 7
Function evaluations: 0
Gradient evaluations: 0
However, one of the constraint functions is violated (its value is about -5.2, while the default convergence tolerance of the optimization code is 10^-6).
Questions:
1. Since the FE model is highly nonlinear, I think it's safe to assume the objective and constraint functions will be highly nonlinear too (regardless of their mathematical form). Is that correct?
2. The convergence criterion of the slsqp algorithm (please see below) requires, among other things, that the sum of the absolute constraint violations be less than a very small value (10^-6). How could the optimization exit with a successful termination message?
IF ((ABS(f-f0).LT.acc .OR. dnrm2_(n,s,1).LT.acc).AND. h3.LT.acc)
Any help or advice is appreciated. Thank you.

Determine the running time of an algorithm with two parameters

I have implemented an algorithm that uses two other algorithms for calculating the shortest path in a graph: Dijkstra and Bellman-Ford. Based on the time complexity of these algorithms, I can calculate the running time of my implementation, which is easy given the code.
Now, I want to experimentally verify my calculation. Specifically, I want to plot the running time as a function of the size of the input (I am following the method described here). The problem is that I have two parameters - number of edges and number of vertices.
I have tried to fix one parameter and change the other, but this approach results in two plots - one for varying number of edges and the other for varying number of vertices.
This leads me to my question - how can I determine the order of growth based on two plots? In general, how can one experimentally determine the running time complexity of an algorithm that has more than one parameter?
It's very difficult in general.
The usual way you would experimentally gauge the running time in the single-variable case is: insert a counter that increments when your data structure does a fundamental (putatively O(1)) operation, then take data for many different input sizes, and plot it on a log-log plot, that is, log T vs. log n. If the running time is of the form n^k you should see a straight line of slope k, or something approaching it. If the running time is like T(n) = n^{k log n}, then you should see a parabola. And if T is exponential in n, you should still see exponential growth.
You can only hope to get information about the highest order term when you do this -- the low order terms get filtered out, in the sense of having less and less impact as n gets larger.
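As a concrete (if simplified) illustration in Python, assuming measure(n) is your own timing or operation-counting routine, the exponent of the leading term is just the slope of a straight-line fit in log-log space:

import numpy as np

ns = np.array([1000, 2000, 4000, 8000, 16000])
times = np.array([measure(n) for n in ns])          # measure(n) is a placeholder

slope, intercept = np.polyfit(np.log(ns), np.log(times), 1)
print("estimated exponent:", slope)                  # ~k if T(n) grows like n^k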
In the two variable case, you could try to do a similar approach -- essentially, take 3 dimensional data, do a log-log-log plot, and try to fit a plane to that.
However this will only really work if there's really only one leading term that dominates in most regimes.
Suppose my actual function is T(n, m) = n^4 + n^3 * m^3 + m^4.
When m = O(1), then T(n) = O(n^4).
When n = O(1), then T(m) = O(m^4).
When n = m, then T(n) = O(n^6).
In each of these regimes, i.e. "slices" through the plane of possible (n, m) values, a different term is the dominant one.
So there's no way to determine the function just from taking some points with fixed m, and some points with fixed n. If you did that, you wouldn't get the right answer for n = m -- you wouldn't be able to discover "middle" leading terms like that.
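A quick numeric illustration of that point, using the example T(n, m) = n^4 + n^3 m^3 + m^4 above: the slope you fit depends entirely on which slice of (n, m) you sample.

import numpy as np

T = lambda n, m: n**4 + n**3 * m**3 + m**4
ns = 2.0 ** np.arange(8, 21)

for label, m_of_n in [("m fixed at 2", lambda n: 2.0), ("m = n", lambda n: n)]:
    slope = np.polyfit(np.log(ns), np.log(T(ns, m_of_n(ns))), 1)[0]
    print(label, "-> fitted slope of about", round(slope, 2))   # roughly 4, then roughly 6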
I would say the best way to predict asymptotic growth when you have lots of variables or complicated data structures is with pencil and paper, doing traditional algorithmic analysis. Or possibly a hybrid approach: try to break the question of efficiency into different parts; if you can split it up into a sum or product of a few different functions, maybe some of them you can determine in the abstract and some you can estimate experimentally.
Luckily, two input parameters are still easy to visualize in a 3D scatter plot (the third dimension is the measured running time), and you can check whether it looks like a plane (in log-log-log scale) or is curved. Naturally, random variation in the measurements plays a role here as well.
In Matlab I typically calculate a least-squares solution to a two-variable model like this (it just concatenates different powers and combinations of x and y horizontally; .* is an element-wise product):
x = log(parameter_x);   % parameter_x, parameter_y and time are column vectors of measurements
y = log(parameter_y);
% Find a least-squares fit (backslash solves the overdetermined system)
p = [x.^2, x.*y, y.^2, x, y, ones(length(x),1)] \ log(time)
Then this can be used to estimate running times for larger problem instances; ideally those estimates would be confirmed experimentally to check that the fitted model holds.
This approach also works in higher dimensions but gets tedious to set up; maybe there is a more general way to achieve it and this is just a work-around for my lack of knowledge.
I was going to write my own explanation but it wouldn't be any better than this.

How to optimize non-negative constraints with gradient descent

I have an optimization in the following form,
argmin_W f(W)
s.t. W_i > 0, for all i
where W is a vector, and f(W) is a function on W.
I know how to optimize without the non-negative constraints. But I am unsure about how to optimize this with gradient descent.
Optimization over an open set is quite tricky, so let us assume that W_i >= 0; then you can use many methods:
optimize f(|W|) on the whole domain
use GD for f(W), but after each iteration project your solution back onto the domain, i.e. set W = max(W, 0) (or, as a cruder fix, W = |W|); see the sketch after this list
use constrained optimization techniques, such as L-BFGS-B
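The second option (projected gradient descent) might look something like this minimal sketch, assuming grad_f computes the gradient of f:

import numpy as np

def projected_gradient_descent(grad_f, W0, step=1e-2, iters=1000):
    W = np.asarray(W0, dtype=float)
    for _ in range(iters):
        W = W - step * grad_f(W)        # ordinary gradient step
        W = np.maximum(W, 0.0)          # project back onto W_i >= 0
    return W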
I don't think there is a general and simple way of doing it. You will have to do some sort of search at each point to make sure the constraints are met (techniques like line search, trust regions).
Or perhaps f has some structure you can exploit.

approximating log10[x^k0 + k1]

Greetings. I'm trying to approximate the function
Log10[x^k0 + k1], where 0.21 < k0 < 21, 0 < k1 < ~2000, and x is an integer < 2^14.
k0 and k1 are constants. For practical purposes, you can assume k0 = 2.12, k1 = 2660. The desired accuracy is a relative error of 5*10^-4.
This function is virtually identical to Log[x], except near 0, where it differs a lot.
I have already come up with a SIMD implementation that is ~1.15x faster than a simple lookup table, but I would like to improve it if possible, which I think is very hard due to the lack of efficient instructions.
My SIMD implementation uses 16-bit fixed-point arithmetic to evaluate a 3rd-degree polynomial (fit by least squares). The polynomial uses different coefficients for different input ranges. There are 8 ranges, and range i spans (64)2^i to (64)2^(i + 1).
The rationale behind this is that the derivatives of Log[x] drop rapidly with x, meaning a polynomial will fit it more accurately, since polynomials are an exact fit for functions whose derivatives are 0 beyond a certain order.
SIMD table lookups are done very efficiently with a single _mm_shuffle_epi8(). I use SSE's float-to-int conversion to get the exponent and significand used for the fixed-point approximation. I also software-pipelined the loop to get a ~1.25x speedup, so further code optimizations are probably not worthwhile.
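For reference, a rough Python/NumPy re-creation of that fitting procedure (not the SIMD code itself): one cubic least-squares fit per octave range, reporting the worst relative error over the integers in each range, using the example constants k0 = 2.12, k1 = 2660:

import numpy as np

k0, k1 = 2.12, 2660.0
f = lambda x: np.log10(x**k0 + k1)

for i in range(8):                                    # range i spans 64*2^i .. 64*2^(i+1)
    lo, hi = 64 * 2**i, 64 * 2**(i + 1)
    x = np.arange(lo, hi, dtype=float)
    t = (x - lo) / (hi - lo)                          # rescale to [0, 1) for a well-conditioned fit
    c = np.polyfit(t, f(x), 3)                        # cubic least-squares fit for this range
    rel_err = np.abs(np.polyval(c, t) / f(x) - 1.0)
    print(i, lo, hi, rel_err.max())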
What I'm asking is whether there's a more efficient approximation at a higher level.
For example:
Can this function be decomposed into functions with a limited domain, like
log2((2^x) * significand) = x + log2(significand)
hence eliminating the need to deal with different ranges (table lookups)? The main problem, I think, is that adding the k1 term kills all those nice log properties that we know and love, making it impossible. Or is it?
An iterative method? I don't think so, because the Newton iteration for Log[x] is already a complicated expression.
Exploiting locality of neighboring pixels? If the 8 inputs all fall in the same approximation range, then I can look up a single set of coefficients instead of separate coefficients for each element, and use this as a fast common case with a slower, general code path otherwise. But for my data, the range needs to be ~2000 before this property holds 70% of the time, which doesn't seem to make this method competitive.
Please, give me some opinion, especially if you're an applied mathematician, even if you say it can't be done. Thanks.
You should be able to improve on least-squares fitting by using Chebyshev approximation. (The idea is, you're looking for the approximation whose worst-case deviation in a range is least; least-squares instead looks for the one whose summed squared difference is least.) I would guess this doesn't make a huge difference for your problem, but I'm not sure -- hopefully it could reduce the number of ranges you need to split into, somewhat.
If there's already a fast implementation of log(x), maybe compute P(x) * log(x), where P(x) is a polynomial chosen by Chebyshev approximation, instead of trying to do the whole function as a polynomial approximation; that way you need less range reduction.
I'm an amateur here -- just dipping my toe in as there aren't a lot of answers already.
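To make the Chebyshev suggestion above a bit more concrete, here is a rough sketch that interpolates at Chebyshev nodes on one of the ranges (close to, though not exactly, the minimax polynomial), again with the example constants k0 = 2.12, k1 = 2660:

import numpy as np

k0, k1 = 2.12, 2660.0
f = lambda x: np.log10(x**k0 + k1)

lo, hi, deg = 64.0, 128.0, 3
nodes = np.cos((2 * np.arange(deg + 1) + 1) * np.pi / (2 * (deg + 1)))  # Chebyshev nodes on [-1, 1]
x_nodes = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)                     # mapped onto [lo, hi]
c = np.polyfit(x_nodes, f(x_nodes), deg)              # interpolate exactly through deg+1 nodes

x = np.linspace(lo, hi, 1000)
print("worst relative error:", np.abs(np.polyval(c, x) / f(x) - 1.0).max())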
One observation:
You can find an expression for how large x needs to be, as a function of k0 and k1, such that the term x^k0 dominates k1 enough for the approximation x^k0 + k1 ~= x^k0 to hold, allowing you to approximately evaluate the function as k0*Log10[x]. This would take care of all x above some value.
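For the example constants this threshold is easy to find numerically; a quick check (assuming the 5*10^-4 relative-error budget is spent entirely on dropping k1):

import numpy as np

k0, k1, tol = 2.12, 2660.0, 5e-4
x = np.arange(1, 2**14, dtype=float)
rel_err = np.abs(k0 * np.log10(x) / np.log10(x**k0 + k1) - 1.0)
ok = x[rel_err < tol]
print("k0*Log10[x] is within tolerance for x >=", int(ok.min()) if ok.size else "never")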
I recently read how the sRGB model compresses physical tristimulus values into stored RGB values.
It is basically very similar to the function I am trying to approximate, except that it is defined piecewise:
k0 x, x < 0.0031308
k1 x^0.417 - k2 otherwise
I was told the constant addition in Log[x^k0 + k1] is there to make the beginning of the function more linear. But that can easily be achieved with a piecewise approximation instead, which would make the approximation a lot more "uniform", with only 2 approximation ranges. This should be cheaper to compute because there is no longer any need to compute an approximation-range index (integer log) and do a SIMD coefficient lookup.
For now, I conclude this will be the best approach, even though it doesn't approximate the function precisely. The hard part will be proposing this change and convincing people to use it.