Optim.jl univariate bounded optimization confusing output when using Inf as bound - optimization

The following is a self-contained example illustrating my problem.
using Optim, Plots
χI = 3
ψI = 0.5
ϕI(z) = z^-ψI
λ = 1.0532733
V0 = 0.8522423425
zE = 0.5986
wRD = 0.72166623555
objective1(z) = -(z * χI * ϕI(z + zE) * (λ-1) * V0 - z * ( wRD ))
objective2(z) = -1 * objective1(z)
lower = 0.01
upper = Inf
plot(0:0.01:0.1,objective1,title = "objective1")
png("/home/nico/Desktop/objective1.png")
plot(0:0.01:0.1,objective2, title = "objective2")
png("/home/nico/Desktop/objective2.png")
results1 = optimize(objective1,lower,upper)
results2 = optimize(objective2,lower,upper)
The plots are [plot: objective1] and [plot: objective2].
Both objective1(z) and objective2(z) return NaN at z = 0 and finite values everywhere else, with an optimum for some z > 0.
However the output of results1 is
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.010000, Inf]
* Minimizer: Inf
* Minimum: NaN
* Iterations: 1000
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): false
* Objective Function Calls: 1001
and the output of results2 is
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.010000, Inf]
* Minimizer: Inf
* Minimum: NaN
* Iterations: 1000
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): false
* Objective Function Calls: 1001
I believe the problem is with upper = Inf. If I change that to upper = 100, for example, the output of results1 is
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.010000, 100.000000]
* Minimizer: 1.000000e-02
* Minimum: 5.470728e-03
* Iterations: 55
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
* Objective Function Calls: 56
and results2 returns
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.010000, 100.000000]
* Minimizer: 1.000000e+02
* Minimum: -7.080863e+01
* Iterations: 36
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
* Objective Function Calls: 37
as expected.

As you note in your question, you are using a bounded optimization algorithm but passing it an unbounded interval.
The documentation (https://julianlsolvers.github.io/Optim.jl/latest/#user/minimization/) is precise about this: the univariate optimize method is for "Minimizing a univariate function on a bounded interval".
To give more detail about the problem you encounter: the optimize method searches points inside your interval. There are two algorithms implemented: Brent (the default) and Golden Section. The point they first check is:
new_minimizer = x_lower + golden_ratio*(x_upper-x_lower)
and you see that new_minimizer will be Inf, so the optimization routine is not even able to find a valid interior point. On top of that, your functions return NaN for an Inf argument:
julia> objective1(Inf)
NaN
julia> objective2(Inf)
NaN
Together, these two facts explain why the minimizer found is Inf and the objective is NaN in the output.
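The first trial point can be reproduced outside Julia. The following is a minimal Python sketch, not Optim's actual code: the constant 0.5*(3 - sqrt(5)) is an assumption for the golden-ratio factor, first_trial_point is a hypothetical helper name, and objective1 is a direct port of the function from the question.

```python
import math

# Assumed golden-section factor (about 0.382); the constant used by Optim may differ.
GOLDEN_RATIO = 0.5 * (3.0 - math.sqrt(5.0))

chi_I, psi_I = 3, 0.5
lam, V0, zE, wRD = 1.0532733, 0.8522423425, 0.5986, 0.72166623555

def phi_I(z):
    return z ** -psi_I

def objective1(z):
    # Python port of objective1 from the question
    return -(z * chi_I * phi_I(z + zE) * (lam - 1) * V0 - z * wRD)

def first_trial_point(x_lower, x_upper):
    # Same formula as the new_minimizer line quoted above
    return x_lower + GOLDEN_RATIO * (x_upper - x_lower)

x0 = first_trial_point(0.01, math.inf)
print(x0)              # inf: no finite interior point is generated
print(objective1(x0))  # nan: inf * 0.0 appears inside the objective
```

With a finite upper bound such as 100.0, the same helper returns a point strictly inside the interval, which is what the algorithm needs to get started.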
The second point is that Float64 numbers have finite precision and range, so you should choose the interval such that the method can actually evaluate the objective accurately everywhere in it. For example, even this fails:
julia> optimize(objective1, 0.0001, 1.0e308)
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.000100, 100000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336.000000]
* Minimizer: 1.000005e+308
* Minimum: -Inf
* Iterations: 1000
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): false
* Objective Function Calls: 1001
The reason is that objective1 becomes numerically unstable for very large arguments: Float64 has a finite range, and at z = 1.0e308 the intermediate product z * χI already overflows to Inf. See:
julia> objective1(1.0e307)
7.2166623555e306
julia> objective1(1.0e308)
-Inf
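The overflow depends on the order of operations, not on the mathematical value of the objective. The Python sketch below ports objective1 and compares it with an algebraically identical form in which z is factored out; objective1_factored is a hypothetical name introduced only for this illustration, and Python floats are assumed to behave like Julia's Float64 here (both are IEEE 754 doubles).

```python
import math

chi_I, psi_I = 3, 0.5
lam, V0, zE, wRD = 1.0532733, 0.8522423425, 0.5986, 0.72166623555

def objective1(z):
    # Same operation order as in the question: z * chi_I is evaluated first
    return -(z * chi_I * (z + zE) ** -psi_I * (lam - 1) * V0 - z * wRD)

def objective1_factored(z):
    # Algebraically identical, but z multiplies a bracket of order one,
    # so no intermediate product leaves the double-precision range here
    return -z * (chi_I * (z + zE) ** -psi_I * (lam - 1) * V0 - wRD)

print(objective1(1.0e307))           # finite
print(objective1(1.0e308))           # -inf, because 1.0e308 * 3 overflows
print(objective1_factored(1.0e308))  # finite
```

Rewriting the objective this way only moves the point at which overflow occurs; it does not make an unbounded search interval valid.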
The last point is that Optim actually tells you that something went wrong and that you should not rely on the results:
julia> results1.converged
false
julia> results2.converged
false
(This is for the initial specification of the problem, with upper = Inf.)
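The role of the convergence flag can be imitated with a small hand-written golden-section search. This is a sketch under stated assumptions, not the Optim implementation: golden_section, its tolerance and its return convention are all made up for this example.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618

chi_I, psi_I = 3, 0.5
lam, V0, zE, wRD = 1.0532733, 0.8522423425, 0.5986, 0.72166623555

def objective1(z):
    return -(z * chi_I * (z + zE) ** -psi_I * (lam - 1) * V0 - z * wRD)

def golden_section(f, lower, upper, tol=1e-8, max_iter=1000):
    """Plain golden-section search; returns (minimizer, converged)."""
    a, b = lower, upper
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if abs(b - a) <= tol:
            return 0.5 * (a + b), True
        if fc < fd:
            # the minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # the minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b), False

x_fin, ok_fin = golden_section(objective1, 0.01, 100.0)
x_inf, ok_inf = golden_section(objective1, 0.01, math.inf)
print(x_fin, ok_fin)  # close to the lower bound 0.01, converged
print(x_inf, ok_inf)  # not converged: the iterates are inf or nan
```

Checking the flag before using the minimizer is the design point: with an infinite bound the loop runs out of iterations without ever shrinking the interval.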

Related

minimum-difference constrained sparse least squares problem (called from python)

I'm struggling a bit finding a fast algorithm that's suitable.
I just want to minimize:
norm2(x-s)
st
G.x <= h
x >= 0
sum(x) = R
G is sparse and contains only 1s (and zeros obviously).
In the case of iterative algorithms, it would be nice to get the interim solutions to show to the user.
The context is that s is a vector of current results, and the user is saying "well the sum of these few entries (entries indicated by a few 1.0's in a row in G) should be less than this value (a row in h). So we have to remove quantities from the entries the user specified (indicated by 1.0 entries in G) in a least-squares optimal way, but since we have a global constraint on the total (R) the values removed need to be allocated in a least-squares optimal way amongst the other entries. The entries can't go negative.
All the algorithms I'm looking at are much more general, and as a result are much more complex. Also, they seem quite slow. I don't see this as a complex problem, although mixes of equality and inequality constraints always seem to make things more complex.
This has to be called from Python, so I'm looking at Python libraries like qpsolvers and scipy.optimize. But I suppose Java or C++ libraries could be used and called from Python, which might be good since multithreading is better in Java and C++.
Any thoughts on what library/package/approach to use to best solve this problem?
The size of the problem is about 150,000 rows in s, and a few dozen rows in G.
Thanks!
Your problem is a linear least squares:
minimize_x norm2(x-s)
such that G x <= h
x >= 0
1^T x = R
Thus it fits the bill of the solve_ls function in qpsolvers.
Here is an instance of how I imagine your problem matrices would look, given what you specified. Since the problem is sparse, we should use SciPy CSC matrices for the matrices and regular NumPy arrays for the vectors:
import numpy as np
import scipy.sparse as spa
n = 150_000
# minimize 1/2 || x - s ||^2
R = spa.eye(n, format="csc")
s = np.array(range(n), dtype=float)
# such that G * x <= h
G = spa.diags(
    diagonals=[
        [1.0 if i % 2 == 0 else 0.0 for i in range(n)],
        [1.0 if i % 3 == 0 else 0.0 for i in range(n - 1)],
        [1.0 if i % 5 == 0 else 0.0 for i in range(n - 1)],
    ],
    offsets=[0, 1, -1],
    format="csr",  # the default DIA format cannot be row-indexed; CSR can
)
a_dozen_rows = np.linspace(0, n - 1, 12, dtype=int)
G = G[a_dozen_rows].tocsc()  # keep a dozen rows, stored as CSC for the solver
h = np.ones(12)
# such that sum(x) == 42
A = spa.csc_matrix(np.ones((1, n)))
b = np.array([42.0]).reshape((1,))
# such that x >= 0
lb = np.zeros(n)
Next, we can solve this problem with:
from qpsolvers import solve_ls
x = solve_ls(R, s, G, h, A, b, lb, solver="osqp", verbose=True)
Here I picked OSQP, but there are other open-source solvers you can install, such as CVXOPT, ProxQP or SCS. You can install a set of open-source solvers with: pip install qpsolvers[open_source_solvers]. Once some solvers are installed, you can list those that support sparse matrices with:
import qpsolvers
print(qpsolvers.sparse_solvers)
Finally, here is some code to check that the solution returned by the solver satisfies our constraints:
tol = 1e-6 # tolerance for checks
print(f"- Objective: {0.5 * (x - s).dot(x - s):.1f}")
print(f"- G * x <= h: {(G.dot(x) <= h + tol).all()}")
print(f"- x >= 0: {(x + tol >= 0.0).all()}")
print(f"- sum(x) = {x.sum():.1f}")
I just tried it with OSQP (adding the eps_rel=1e-5 keyword argument when calling solve_ls, otherwise the returned solution would be less accurate than the tol = 1e-6 tolerance) and it found a solution in 737 milliseconds on my (rather old) CPU with:
- Objective: 562494373088866.8
- G * x <= h: True
- x >= 0: True
- sum(x) = 42.0
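As a sanity check on the formulation, the special case with no G rows at all (only x >= 0 and sum(x) = R) can be solved directly by a sort-based projection. The sketch below is pure Python and uses a hypothetical helper name, project_onto_scaled_simplex; it assumes R > 0 and is not a substitute for the QP solver once rows of G are active.

```python
def project_onto_scaled_simplex(s, total):
    """Least-squares closest point to s with x >= 0 and sum(x) == total.

    Assumes total > 0 and ignores the G x <= h constraints entirely.
    """
    u = sorted(s, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - total) / k
        if u_k - candidate > 0:
            # the k largest entries stay positive after shifting by candidate
            tau = candidate
    # shift every entry by tau and clip at zero
    return [max(v - tau, 0.0) for v in s]

x_small = project_onto_scaled_simplex([3.0, 1.0, 0.5], 2.0)
print(x_small)  # [2.0, 0.0, 0.0]
```

On 150,000 entries the cost is dominated by the sort, which makes this a cheap baseline to compare interim solver results against.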
Hoping this helps. Happy solving!

Gurobi objective function

I am trying to convert an objective function from scipy to Gurobi as follows but getting "unsupported operand type(s) for ** or pow(): 'gurobipy.LinExpr' and 'float'".
Any idea how I could re-write the below? Thanks in advance.
from gurobipy import *
import scipy.optimize as optimize
price = 95.0428
par = 100.0
T = 1.5
coup = 5.75
freq = 2
guess = 0.05
freq = float(freq)
periods = T * freq
coupon = coup / 100. * par / freq
dt = [(i + 1) / freq for i in range(int(periods))]
# converting the scipy.optimize code below to Gurobi
#ytm_func = lambda y: sum([coupon / (1 + y / freq) ** (freq * t) for t in dt]) + (par / (1 + y / freq) ** (freq * T)) - price
#optimize.newton(ytm_func, guess)
m = Model()
y = m.addVar(vtype=GRB.CONTINUOUS, name='y')
m.setObjective(quicksum([coupon / (1 + y / freq) ** (freq * t) for t in dt]) + (par / (1 + y / freq) ** (freq * T)) - price, GRB.MINIMIZE)
m.optimize()
m.printAttr('X')
Hi, I think what you are trying to do is not supported by Gurobi yet, at least not as a quadratic program.
First, your variable appears in the denominator, which is not supported directly.
Second, what you are defining is not a quadratic problem. It is a polynomial problem. As far as I know, Gurobi currently supports only quadratic expressions such as y*y.
This is an unconstrained problem, so I wonder why you need Gurobi. Scientific solvers deal with these problems pretty well using methods such as gradient descent or Newton's method.
I hope this helps.
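For reference, the commented-out optimize.newton call in the question finds a root of ytm_func rather than minimizing it, and that root can be found without any solver library. The following pure-Python sketch reuses the inputs from the question; the newton helper and the analytic derivative ytm_deriv are written for this example and are not part of SciPy or Gurobi.

```python
price, par, T, coup, freq = 95.0428, 100.0, 1.5, 5.75, 2.0
coupon = coup / 100.0 * par / freq
dt = [(i + 1) / freq for i in range(int(T * freq))]

def ytm_func(y):
    # Present value of coupons and principal, minus the market price
    pv_coupons = sum(coupon / (1 + y / freq) ** (freq * t) for t in dt)
    return pv_coupons + par / (1 + y / freq) ** (freq * T) - price

def ytm_deriv(y):
    # Analytic derivative of ytm_func with respect to y
    d_coupons = sum(-coupon * t * (1 + y / freq) ** (-freq * t - 1) for t in dt)
    return d_coupons - par * T * (1 + y / freq) ** (-freq * T - 1)

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    # Basic Newton iteration; stops when the update step is below tol
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

ytm = newton(ytm_func, ytm_deriv, 0.05)
print(ytm)  # yield to maturity as a decimal rate
```

The starting guess 0.05 is the one from the question; for other bonds a different guess or a bracketing method may be needed.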

Objective C vs Swift math operations

Performing the following operation in Objective-C and Swift returns different results:
With an input of 23492.4852,
Objective-C function:
+ (double)funTest:(double)a {
return a - (int) a / 360 * 360;
}
returns 92.48521
Swift function:
class func funTest(a: Double) -> Double {
return a - Double(Int(a)) / 360 * 360
}
returns 0.48521
Does anybody know why the difference?
The difference is integer vs. floating-point division. In integer division, the fractional part is discarded. Some quick examples: 1 / 2 = 0 and 2 / 3 = 0, but 1.0 / 2.0 = 0.5 and 2.0 / 3.0 ≈ 0.67.
Let's break down how your code works in both languages:
Objective-C
Assuming a = 23492.4852:
a - (int) a / 360 * 360 = a - ((int) a) / 360 * 360
= a - 23492 / 360 * 360 // integer division
= a - 65 * 360
= 23492.4852 - 23400
= 92.4852
Objective-C inherits type promotion rules from C, which can be a lot to remember.
Swift
Assuming a = 23492.4852:
a - Double(Int(a)) / 360 * 360 = a - Double(23492) / 360 * 360 // floating point division
= a - 65.2556 * 360
= a - 23492
= 23492.4852 - 23492
= 0.4852
In Swift, the compiler has some leeway in interpreting the literal constant 360: it can be seen as an Int or a Double, depending on context.
In Objective-C, as in C, the literal 360 is always an int, so (int) a / 360 is an integer division. You just have to be careful when mixing numeric types in C.
Swift tries to prevent this confusion by forcing all operands to be of the same data type. Since a is Double, the only way to interpret 360 is that it must also be a Double.
Does anybody know why the difference?
You've just made a simple grouping error.
As you figured out, both Objective-C and Swift require a cast when converting from a floating-point value to an integer one, so you have written (int)a for the former and Int(a) for the latter.
You have also understood that converting from an integer to a floating-point value differs between the two languages: in Objective-C (and C, and lots of other languages) the conversion is implicit, whereas in Swift it is explicit.
The only mistake you have made is in parsing the Objective-C, and hence producing the wrong Swift; or you've simply mistyped the Swift.
In arithmetic expressions, operators are evaluated according to precedence. Relevant to your problem: casts bind tightly to the expression that follows them, multiplication and division are done next, and then addition and subtraction. What this means is that your Objective-C:
a - (int) a / 360 * 360
is parsed as:
a - (double) ( (int) a / 360 * 360 )
Note that the (double) cast applies to the result of the expression (int) a / 360 * 360. What you've written in Swift is:
a - Double(Int(a)) / 360 * 360
which isn't the same: here the cast only applies to Int(a). What you should have written is:
a - Double(Int(a) / 360 * 360)
which applies the cast to Int(a) / 360 * 360 just as the Objective-C does.
With that correction in both languages the multiplication and division all operate on integers, and integer division is truncating (e.g. 9 / 4 is 2 not 2.25). With the misplaced parenthesis in Swift the multiplication and division all operate on floating-point values.
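The three groupings discussed in this thread can be compared side by side. This is a Python sketch, with // standing in for C-style integer division (the two agree for the positive values used here); the variable names are chosen for this illustration only.

```python
a = 23492.4852

# Objective-C grouping: (int)a / 360 is an integer division
objc_result = a - (int(a) // 360) * 360

# Swift as written in the question: the division is done on floating-point values
swift_result = a - float(int(a)) / 360 * 360

# Swift with the whole integer expression converted at the end
swift_fixed = a - float(int(a) // 360 * 360)

print(objc_result, swift_result, swift_fixed)
```

The first and third values agree (about 92.4852), while the second is about 0.4852, matching the two results reported in the question.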
TL;DR: You just misplaced a parenthesis.
HTH
It's due to how the compilers see the numbers. Notice that in Swift you had to explicitly convert a back into a Double after converting it to an Int? The Swift compiler sees the entire expression as Doubles, so when you do Double(Int(a)) / 360 * 360 you're getting 23492 / 360 * 360 = 65.25555... * 360 = 23492. However, C/C++/Obj-C etc. see 23492 / 360 as an int division, giving 23492 / 360 * 360 = 65 * 360 = 23400. And that's where the extra 92 comes from (the truncation when dividing two ints in C).

Is there a more concise way to calculate the P value for the Anderson-Darling A-squared statistic with VBA?

I have two bits of code in VBA for Excel. One calculates the A-squared statistic for the Anderson-Darling test; the other, shown below, calculates the P value of the A-squared statistic. I am curious whether there is a more concise or more efficient way to calculate this value in VBA:
Function AndDarP(AndDar, Elements)
'Calculates P value for level of significance for the
'Anderson-Darling Test for Normality
'AndDar is the Anderson-Darling Test Statistic
'Elements is the count of elements used in the
'Anderson-Darling test statistic.
'based on calculations at
'http://www.kevinotto.com/RSS/Software/Anderson-Darling%20Normality%20Test%20Calculator.xls
'accessed 21 May 2010
'www.kevinotto.com
'kevin_n_otto#yahoo.com
'Version 6.0
'Permission to freely distribute and modify when properly
'referenced and contact information maintained.
'
'"Keep in mind the test assumes normality, and is looking for sufficient evidence to reject normality.
'That is, a large p-value (often p > alpha = 0.05) would indicate normality.
' * * *
'Test Hypotheses:
'Ho: Data is sampled from a population that is normally distributed
'(no difference between the data and normal data).
'Ha: Data is sampled from a population that is not normally distributed"
Dim M As Double
M = AndDar * (1 + 0.75 / Elements + 2.25 / Elements ^ 2)
Select Case M
Case Is < 0.2
AndDarP = 1 - Exp(-13.436 + 101.14 * M - 223.73 * M ^ 2)
Case Is < 0.34
AndDarP = 1 - Exp(-8.318 + 42.796 * M - 59.938 * M ^ 2)
Case Is < 0.6
AndDarP = Exp(0.9177 - 4.279 * M - 1.38 * M ^ 2)
Case Is < 13
AndDarP = Exp(1.2937 - 5.709 * M + 0.0186 * M ^ 2)
Case Else
AndDarP = 0
End Select
End Function
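One way to shorten the branching is to store the thresholds and coefficients in a table and loop over it. The sketch below shows the idea in Python rather than VBA; and_dar_p and _BRANCHES are hypothetical names, and the coefficients are copied unchanged from the Select Case block above.

```python
import math

# (upper bound on M, use 1 - exp(...)?, (c0, c1, c2)) taken from the VBA branches
_BRANCHES = [
    (0.2,  True,  (-13.436, 101.14, -223.73)),
    (0.34, True,  (-8.318, 42.796, -59.938)),
    (0.6,  False, (0.9177, -4.279, -1.38)),
    (13.0, False, (1.2937, -5.709, 0.0186)),
]

def and_dar_p(and_dar, elements):
    """P value for the Anderson-Darling statistic, same formulas as AndDarP."""
    m = and_dar * (1 + 0.75 / elements + 2.25 / elements ** 2)
    for upper, complement, (c0, c1, c2) in _BRANCHES:
        if m < upper:
            value = math.exp(c0 + c1 * m + c2 * m * m)
            return 1 - value if complement else value
    return 0.0  # corresponds to Case Else

print(and_dar_p(0.5, 20))
```

The same table could be kept as arrays in VBA and walked with a For loop; whether that is clearer than the four-way Select Case is a matter of taste, and it is unlikely to be measurably faster.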

Optim Julia Univariate Minimization with Initial Condition

Is there a way to specify an initial condition (which I would hope improves speed) for univariate optimization using Optim in Julia? It seems like this isn't possible reading the documentation as only multivariate optimizations seem to accept an initial condition. I guess I could just specify my problem as a multivariate one, and ignore one of the variables but that's not particularly elegant.
If you don't want to use either Brent or Golden Section search, you can simply use the gradient or Hessian based methods, since R^n includes the case n = 1 for most of the algorithms in Optim. You do have to follow the syntax for multivariate methods and pass a vector.
julia> using Optim, Plots
julia> f(x) = -2*x[1]+3*x[1]^2+sin(x[1]*3)
f (generic function with 1 method)
julia> plot(x->f([x,]), lab = "Univariate Function")
julia> optimize(f, [2.5,], GradientDescent())
Results of Optimization Algorithm
* Algorithm: Gradient Descent
* Starting Point: [2.5]
* Minimizer: [-0.12943993754432737]
* Minimum: -6.948989e-02
* Iterations: 5
* Convergence: true
* |x - x'| < 1.0e-32: false
|x - x'| = 3.35e-08
* |f(x) - f(x')| / |f(x)| < 1.0e-32: false
|f(x) - f(x')| / |f(x)| = NaN
* |g(x)| < 1.0e-08: true
|g(x)| = 4.58e-12
* stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 12
* Gradient Calls: 12
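The same idea, treating a univariate function as a function of a one-element vector so that a starting point can be supplied, can be sketched in Python. This is an illustration only: gradient_descent is a hand-written helper with a fixed step size of 0.1 (an assumption; Optim uses a line search), and the gradient is supplied analytically.

```python
import math

def f(x):
    # x is a one-element list, mirroring the one-element vector in the Julia call
    return -2 * x[0] + 3 * x[0] ** 2 + math.sin(3 * x[0])

def grad_f(x):
    # Analytic gradient of f, returned as a one-element list
    return [-2 + 6 * x[0] + 3 * math.cos(3 * x[0])]

def gradient_descent(grad, x0, step=0.1, tol=1e-10, max_iter=10_000):
    """Fixed-step gradient descent; returns (minimizer, converged)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            return x, True
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, False

minimizer, converged = gradient_descent(grad_f, [2.5])
print(minimizer, converged, f(minimizer))
```

Starting from [2.5], as in the Julia call, the iteration settles near the minimizer reported above.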
You mean for Brent's method and Golden section search? I think the initial condition in these methods is determined by initial lower and upper bounds you set. So providing an initial guess for x_minimum would be redundant/wrong from the viewpoint of these algorithms.
For example, in Brent's method the initial value of the estimated minimum is computed to be:
x_minimum = x_lower + golden_ratio*(x_upper-x_lower)
See the source code