Is there a way to specify an initial condition (which I would hope improves speed) for univariate optimization using Optim in Julia? Reading the documentation, it seems this isn't possible, as only multivariate optimizations appear to accept an initial condition. I guess I could specify my problem as a multivariate one and ignore one of the variables, but that's not particularly elegant.
If you don't want to use either Brent's method or golden-section search, you can simply use the gradient- or Hessian-based methods, since R^n includes the case n = 1 for most of the algorithms in Optim. You do have to follow the syntax for multivariate methods and pass a vector.
julia> using Optim, Plots
julia> f(x) = -2*x[1]+3*x[1]^2+sin(x[1]*3)
f (generic function with 1 method)
julia> plot(x -> f([x]), lab = "Univariate Function")
julia> optimize(f, [2.5,], GradientDescent())
Results of Optimization Algorithm
* Algorithm: Gradient Descent
* Starting Point: [2.5]
* Minimizer: [-0.12943993754432737]
* Minimum: -6.948989e-02
* Iterations: 5
* Convergence: true
* |x - x'| < 1.0e-32: false
|x - x'| = 3.35e-08
* |f(x) - f(x')| / |f(x)| < 1.0e-32: false
|f(x) - f(x')| / |f(x)| = NaN
* |g(x)| < 1.0e-08: true
|g(x)| = 4.58e-12
* stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 12
* Gradient Calls: 12
You mean for Brent's method and golden-section search? I think the initial condition in these methods is determined by the initial lower and upper bounds you set, so providing an initial guess for x_minimum would be redundant (or wrong) from the viewpoint of these algorithms.
For example, in Brent's method the initial estimate of the minimizer is computed as:
x_minimum = x_lower + golden_ratio*(x_upper-x_lower)
See the source code.
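For reference, a minimal sketch of that first evaluation point (in Python; the exact constant is my reading of the Optim.jl source, so treat it as an assumption):

import math

golden_ratio = 0.5 * (3.0 - math.sqrt(5.0))  # ≈ 0.381966

def brent_initial_point(x_lower, x_upper):
    # First interior point Brent's method evaluates; no user-supplied guess involved.
    return x_lower + golden_ratio * (x_upper - x_lower)

print(brent_initial_point(0.0, 1.0))  # ≈ 0.382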
Related
I am trying to use PyTorch for non-convex optimisation, trying to maximise my objective (so minimise in SGD). I would like to bound my dependent variable x > 0, and also have the sum of my x values be less than 1000.
I think I have the penalty implemented correctly in the form of a ramp penalty, but am struggling with the bounding of the x variable. In PyTorch you can bound values using clamp, but it doesn't seem appropriate in this case; I think this is because optim needs the gradients free under the hood. Full working example:
import torch
from torch.autograd import Variable
import numpy as np

def objective(x, a, b, c):  # Want to maximise this quantity (so minimise in SGD)
    d = 1 / (1 + torch.exp(-a * x))
    # Checking constraint
    exceeded_limit = constraint(x).item()
    obj = torch.sum(d * (b * c - x))
    # If over the limit, apply a ramp penalty
    if exceeded_limit < 0:
        obj = obj + (exceeded_limit * 10)  # exceeded_limit is negative here, so this lowers obj
        print("Exceeded limit")
    return -obj

def constraint(x, limit=1000):  # Must be > 0
    return limit - x.sum()

N = 1000
# x is the variable to optimise for
x = Variable(torch.Tensor([1 for ii in range(N)]), requires_grad=True)
a = Variable(torch.Tensor(np.random.uniform(0, 100, N)), requires_grad=True)
b = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
c = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)

# Would like to include the clamp
# x = torch.clamp(x, min=0)

# Non-convex method
opt = torch.optim.SGD([x], lr=.01)

for i in range(10000):
    # Zeroing gradients
    opt.zero_grad()
    # Evaluating the objective
    obj = objective(x, a, b, c)
    # Calculate gradients
    obj.backward()
    opt.step()
    if i % 1000 == 0:
        print("Objective: %.1f" % -obj.item())

print("\nObjective: {}".format(-obj.item()))
print("Limit: {}".format(constraint(x).item()))
if torch.sum(x < 0) > 0:
    print("Bounds not met")
if constraint(x).item() < 0:
    print("Constraint not met")
Any suggestions as to how to impose the bounds would be appreciated, either using clamp or otherwise. Or generally advice on non-convex optimisation using Pytorch. This is a much simpler and scaled down version of the problem I'm working so am trying to find a lightweight solution if possible. I am considering using a workaround such as transforming the x variable using an exponential function but then you'd have to scale the function to avoid the positive values becoming infinite, and I want some flexibility with being able to set the constraint.
I ran into the same problem: I also wanted to apply bounds to a variable in PyTorch, and I solved it with Way 3 below.
Your example is a little complex, so I give a simpler one.
Suppose there is a trainable variable v whose bounds are (-1, 1):

v = torch.tensor((0.5,), requires_grad=True)
v_loss = xxxx  # some loss that depends on v
optimizer.zero_grad()
v_loss.backward()
optimizer.step()

Way 1 raises RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

v.clamp_(-1, 1)

Way 2 raises RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.

v = torch.clamp(v, -1, +1)  # equal to v = v.clamp(-1, +1)

Way 3 works; this is how I solved the problem:

with torch.no_grad():
    v[:] = v.clamp(-1, +1)  # You must use v[:] = ... instead of v = ...
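For the original question's loop, a minimal sketch of how Way 3 would slot in (assuming the x, opt and objective defined in the question; the clamp stands in for the commented-out torch.clamp):

for i in range(10000):
    opt.zero_grad()
    obj = objective(x, a, b, c)
    obj.backward()
    opt.step()
    # Project x back into the feasible region x >= 0 after each step
    with torch.no_grad():
        x[:] = x.clamp(min=0)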
The following is a self-contained example illustrating my problem.
using Optim, Plots
χI = 3
ψI = 0.5
ϕI(z) = z^-ψI
λ = 1.0532733
V0 = 0.8522423425
zE = 0.5986
wRD = 0.72166623555
objective1(z) = -(z * χI * ϕI(z + zE) * (λ-1) * V0 - z * ( wRD ))
objective2(z) = -1 * objective1(z)
lower = 0.01
upper = Inf
plot(0:0.01:0.1,objective1,title = "objective1")
png("/home/nico/Desktop/objective1.png")
plot(0:0.01:0.1,objective2, title = "objective2")
png("/home/nico/Desktop/objective2.png")
results1 = optimize(objective1,lower,upper)
results2 = optimize(objective2,lower,upper)
The plots (images omitted) show objective1 and objective2 over 0:0.01:0.1.
Both objective1(z) and objective2(z) return NaN at z = 0 and finite values everywhere else, with an optimum for some z > 0.
However the output of results1 is
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.010000, Inf]
* Minimizer: Inf
* Minimum: NaN
* Iterations: 1000
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): false
* Objective Function Calls: 1001
and the output of results2 is
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.010000, Inf]
* Minimizer: Inf
* Minimum: NaN
* Iterations: 1000
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): false
* Objective Function Calls: 1001
I believe the problem is with upper = Inf. If I change that to upper = 100, for example, the output of results1 is
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.010000, 100.000000]
* Minimizer: 1.000000e-02
* Minimum: 5.470728e-03
* Iterations: 55
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
* Objective Function Calls: 56
and results2 returns
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.010000, 100.000000]
* Minimizer: 1.000000e+02
* Minimum: -7.080863e+01
* Iterations: 36
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
* Objective Function Calls: 37
as expected.
As you note in your question, you use a bounded optimization algorithm but you pass an unbounded interval to it.
Citing the documentation (https://julianlsolvers.github.io/Optim.jl/latest/#user/minimization/), which is precise about this: the optimize function is for "minimizing a univariate function on a bounded interval".
To give more detail about the problem you encounter: the optimize method searches points inside your interval. There are two algorithms implemented: Brent's method (the default) and golden-section search. The first point they check is:
new_minimizer = x_lower + golden_ratio*(x_upper-x_lower)
and you can see that new_minimizer will be Inf when x_upper is Inf. So the optimization routine is not even able to find a valid interior point. Then you see that your functions return NaN for an Inf argument:
julia> objective1(Inf)
NaN
julia> objective2(Inf)
NaN
Combined, this explains why the minimizer found is Inf and the objective is NaN in the produced output.
The second point is that you should remember that Float64 numbers have finite precision, so you should choose the interval so that the method is actually able to accurately evaluate the objective inside it. For example, even this fails:
julia> optimize(objective1, 0.0001, 1.0e308)
Results of Optimization Algorithm
* Algorithm: Brent's Method
* Search Interval: [0.000100, 100000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336.000000]
* Minimizer: 1.000005e+308
* Minimum: -Inf
* Iterations: 1000
* Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): false
* Objective Function Calls: 1001
The reason is that objective1 actually starts to behave in a numerically unstable way for very large arguments (because Float64 has finite precision); see:
julia> objective1(1.0e307)
7.2166623555e306
julia> objective1(1.0e308)
-Inf
The last point is that optimize actually tells you that something went wrong and that you should not rely on the results, as:
julia> results1.converged
false
julia> results2.converged
false
both return false for the initial specification of the problem (with upper = Inf).
In order to make the case simple and intuitive, I will use binary (0 and 1) classification for illustration.
Loss function
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY)) #cross entropy
cost = -np.sum(loss)/m #num of examples in batch is m
Probability of Y
predY is computed using the sigmoid, and the logits can be thought of as the output of a neural network before the classification step:
predY = sigmoid(logits) #binary case
def sigmoid(X):
    return 1/(1 + np.exp(-X))
Problem
Suppose we are running a feed-forward net.
Inputs: [3, 5]: 3 is number of examples and 5 is feature size (fabricated data)
Num of hidden units: 100 (only 1 hidden layer)
Iterations: 10000
Such an arrangement is set up to overfit. When it overfits, we can perfectly predict the probabilities for the training examples; in other words, the sigmoid outputs exactly 1 or 0 because the exponential saturates. In that case we would have np.log(0), which is undefined. How do you usually handle this issue?
If you don't mind the dependency on scipy, you can use scipy.special.xlogy. You would replace the expression
np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
with
xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)
If you expect predY to contain very small values, you might get better numerical results using scipy.special.xlog1py in the second term:
xlogy(Y, predY) + xlog1py(1 - Y, -predY)
Alternatively, knowing that the values in Y are either 0 or 1, you can compute the cost in an entirely different way:
Yis1 = Y == 1
cost = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum())/m
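A quick self-contained check (fabricated Y and predY, with predY saturated to exact 0 and 1 as in the question):

import numpy as np
from scipy.special import xlogy

Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.7])  # first two entries are saturated

naive = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
safe = xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)
print(naive)  # nan in the saturated entries (0 * log(0))
print(safe)   # finite everywhere, since xlogy(0, 0) is defined as 0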
How do you usually handle this issue?
Add a small number (something like 1e-15) to predY; this doesn't throw the predictions off by much, and it solves the log(0) issue.
BTW, if your algorithm outputs zeros and ones, it might be useful to check the histogram of the returned probabilities: when an algorithm is that sure something is happening, it can be a sign of overfitting.
One common way to deal with log(x) and y / x, where x is always non-negative but can become 0, is to add a small constant (as Jakub wrote).
You can also clip the value (e.g. tf.clip_by_value or np.clip).
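A minimal sketch of the clipping variant in NumPy (assuming Y and predY as in the question; the epsilon value is an arbitrary choice):

import numpy as np

eps = 1e-15
predY_safe = np.clip(predY, eps, 1 - eps)  # keeps both log() calls finite
loss = np.multiply(np.log(predY_safe), Y) + np.multiply((1 - Y), np.log(1 - predY_safe))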
I have a time series (more specifically a correlation function). I want to bandpass-filter this signal using a Gaussian function H:
H(w) = e^(-alpha((w-wn)/wn)^2),
where wn is the central frequency in my bandpass filter and alpha is a certain constant value that I know.
I apply an (inverse) FFT to my H function:
H = np.exp(-alpha * ((w - wn) / wn) ** 2)
H = np.fft.ifft(H)
# swap the two halves to centre the kernel (note the integer division //)
HH = np.asarray([i1 for i1 in itertools.chain(H[len(H)//2:len(H)], H[0:len(H)//2])])
Then I use fftconvolve:
filtered = fftconvolve(data, HH.real, mode='same')
but the "filtered signal" that I see seems to be filtering frequencies centred at 2 times wn.
What is the correct way of doing this? Is there a restriction in the length of my filter with respect to the length of my time series?
Perhaps what you are looking for is the Gaussian filter from SciPy:
from scipy.ndimage import gaussian_filter
output = gaussian_filter(input, sigma)
where sigma is the standard deviation of the Gaussian kernel. See the SciPy documentation for more details: https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.gaussian_filter.html
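Alternatively, the bandpass H from the question can be applied by multiplying directly in the frequency domain instead of convolving in time. A sketch, assuming data, alpha and wn from the question plus a sampling interval dt; note that a real-valued bandpass also needs a mirrored lobe at -wn:

import numpy as np

freqs = np.fft.fftfreq(len(data), d=dt)
H = (np.exp(-alpha * ((freqs - wn) / wn) ** 2)
     + np.exp(-alpha * ((freqs + wn) / wn) ** 2))  # lobes at +wn and -wn
filtered = np.fft.ifft(np.fft.fft(data) * H).real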
My inner loop contains a calculation that profiling shows to be problematic.
The idea is to take a greyscale pixel x (0 <= x <= 1), and "increase its contrast". My requirements are fairly loose, just the following:
for x < .5, 0 <= f(x) < x
for x > .5, x < f(x) <= 1
f(0) = 0
f(x) = 1 - f(1 - x), i.e. it should be "symmetric"
Preferably, the function should be smooth.
So the graph must look something like a smooth S-curve (image omitted).
I have two implementations (their results differ but both are conformant):
float cosContrastize(float x) {
    return .5 - cos(x * pi) / 2;
}

float mulContrastize(float x) {
    if (x < .5) return x * x * 2;
    x = 1 - x;
    return 1 - x * x * 2;
}
So I request either a microoptimization for one of these implementations, or an original, faster formula of your own.
Maybe one of you can even twiddle the bits ;)
Consider the following sigmoid-shaped functions (properly translated to the desired range):
error function
normal CDF
tanh
logit
I generated the above figure using MATLAB. If interested here's the code:
x = -3:.01:3;
plot( x, 2*(x>=0)-1, ...
x, erf(x), ...
x, tanh(x), ...
x, 2*normcdf(x)-1, ...
x, 2*(1 ./ (1 + exp(-x)))-1, ...
x, 2*((x-min(x))./range(x))-1 )
legend({'hard' 'erf' 'tanh' 'normcdf' 'logit' 'linear'})
Trivially you could simply threshold, but I imagine this is too dumb:
return i < 0.5 ? 0.0 : 1.0;
Since you mention 'increasing contrast' I assume the input values are luminance values. If so, and they are discrete (perhaps it's an 8-bit value), you could use a lookup table to do this quite quickly.
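A sketch of the lookup-table idea (in Python for brevity; assumes 8-bit input and fills the table from the cosine variant above):

import math

# Built once: maps each 8-bit luminance value to its contrastized value.
LUT = [round(255 * (0.5 - math.cos(v / 255 * math.pi) / 2)) for v in range(256)]

def contrastize_lut(pixel):  # pixel in 0..255
    return LUT[pixel]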
Your 'mulContrastize' looks reasonably quick. One optimization would be to use integer math. Let's say, again, your input values could actually be passed as an 8-bit unsigned value in [0..255]. (Again, possibly a fine assumption?) You could do something roughly like...
int mulContrastize(int i) {
    if (i < 128) return (i * i) >> 7;
    // The shift is really: * 2 / 256
    i = 255 - i;
    return 255 - ((i * i) >> 7);
}
A piecewise interpolation can be fast and flexible. It requires only a few decisions followed by a multiplication and addition, and can approximate any curve. It also avoids the coarseness that can be introduced by lookup tables (or the additional cost of two lookups followed by an interpolation to smooth it out), though the LUT might work perfectly fine for your case.
With just a few segments, you can get a pretty good match. There will be coarseness in the color gradients, which is much harder to detect than coarseness in the absolute colors.
As Eamon Nerbonne points out in the comments, segmentation can be optimized by "choos[ing] your segmentation points based on something like the second derivative to maximize detail", that is, where the slope is changing the most. Clearly, in my posted example, having three segments in the middle of the five-segment case doesn't add much more detail.
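A minimal sketch of the piecewise idea (in Python; the breakpoints are illustrative, chosen to satisfy the symmetry requirement f(x) = 1 - f(1 - x), and production code would precompute the per-segment slopes):

# Breakpoints (x, f(x)), symmetric about (0.5, 0.5).
POINTS = [(0.0, 0.0), (0.25, 0.1), (0.75, 0.9), (1.0, 1.0)]

def contrastize_piecewise(x):
    # Find the segment containing x, then interpolate linearly within it.
    for (x0, y0), (x1, y1) in zip(POINTS, POINTS[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return 1.0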