How to make the optimization of a mean squared error problem faster in Python - optimization

(I'm new to Stack Overflow, but I will try to describe my problem as well as I can.)
For my thesis, I need to solve a mean squared error minimization problem as fast as possible. For this problem, I have been using the scipy.optimize.minimize method (with and without the Jacobian). However, the optimization is still too slow for what we want to do. (This program runs on a Mac with Python 3.9.)
First, this is the function to minimize (I already tried to simplify the formula, but it didn't change the speed of the program):
def _residuals_mse(coef, unshimmed_vec, coil_mat, factor):
    """ Objective function to minimize the mean squared error (MSE)

    Args:
        coef (numpy.ndarray): 1D array of channel coefficients
        unshimmed_vec (numpy.ndarray): 1D flattened array (point)
        coil_mat (numpy.ndarray): 2D flattened array (point, channel) of masked coils
                                  (axis 0 must align with unshimmed_vec)
        factor (float): Divide the result by 'factor'. This allows scaling the output
                        for the minimize function to avoid positive directional linesearch

    Returns:
        scalar: Residual for least squares optimization
    """
    # MSE regularized to minimize currents
    return np.mean((unshimmed_vec + np.sum(coil_mat * coef, axis=1, keepdims=False)) ** 2) / factor + \
        (self.reg_factor * np.mean(np.abs(coef) / self.reg_factor_channel))
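For reference, in matrix form (my reading of the code above, writing A for coil_mat, u for unshimmed_vec, c for coef, n = unshimmed_vec.size and m = coef.size), this objective is

$$f(c) = \frac{\lVert u + Ac \rVert_2^2}{n \cdot \mathrm{factor}} + \frac{\mathrm{reg\_factor}}{m} \sum_{j=1}^{m} \frac{\lvert c_j \rvert}{\mathrm{reg\_factor\_channel}_j}$$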
This is the Jacobian of the function (there may be a way to make it faster, but I didn't manage to find one):
def _residuals_mse_jacobian(coef, unshimmed_vec, coil_mat, factor):
    """ Jacobian of the function that we want to minimize; note that normally b is calculated somewhere else

    Args:
        coef (numpy.ndarray): 1D array of channel coefficients
        unshimmed_vec (numpy.ndarray): 1D flattened array (point) of the masked unshimmed map
        coil_mat (numpy.ndarray): 2D flattened array (point, channel) of masked coils
                                  (axis 0 must align with unshimmed_vec)
        factor (float): integer

    Returns:
        jacobian (numpy.ndarray): 1D array of the gradient of the MSE function to minimize
    """
    b = 2 / (unshimmed_vec.size * factor)
    jacobian = np.array([
        b * np.sum((unshimmed_vec + np.matmul(coil_mat, coef)) * coil_mat[:, j]) +
        np.sign(coef[j]) * (self.reg_factor / (9 * self.reg_factor_channel[j]))
        for j in range(coef.size)
    ])
    return jacobian
And this is the "main" program:
import numpy as np
import scipy.optimize as opt
from numpy.random import default_rng

rand = default_rng(seed=0)
reg_factor_channel = rand.integers(1, 10, size=9)
coef = np.zeros(9)
unshimmed_vec = np.random.randint(100, size=(150))
coil_mat = np.random.randint(100, size=(150, 9))
factor = 2
self.reg_factor = 5

currents_sp = opt.minimize(_residuals_mse, coef,
                           args=(unshimmed_vec, coil_mat, factor),
                           method='SLSQP',
                           jac=_residuals_mse_jacobian,
                           options={'maxiter': 1000})
On my computer, the optimization takes around 40 ms for a dataset of this size.
The matrices in the example are usually obtained after some modifications and can be much bigger, but to keep things clear and easy to test, I chose some arbitrary ones. In addition, this optimization is done many times (sometimes up to 50 times), so we are already using multiprocessing (to run different optimizations at the same time). However, on Mac, multiprocessing is slow to start because of the spawn start method (fork is not stable on Python 3.9). For this reason, I am trying to make the optimization as fast as possible, so that I can maybe remove multiprocessing.
Does anyone know how to make this code faster in Python? Also, this code will be released as open source, so I can only use free solvers (unlike MOSEK).
Edit: I tried to run the code using a CVXPY model, with this code placed after the one just above:
import cvxpy as cp

m = currents_0.size
n = unshimmed_vec.size
coef = cp.Variable(m)
unshimmed_vec2 = cp.Parameter((n))
coil_mat2 = cp.Parameter((n, m))
unshimmed_vec2.value = unshimmed_vec
coil_mat2.value = coil_mat

x1 = unshimmed_vec2 + cp.matmul(coil_mat2, coef)
x2 = cp.sum_squares(x1) / (factor * n)
x3 = self.reg_factor / self.reg_factor_channel @ cp.abs(coef) / m

obj = cp.Minimize(x2 + x3)
prob = cp.Problem(obj)
prob.solve(solver=cp.SCS)
However, this slows my code down even more, and it gives a different value than scipy.optimize.minimize, so does anyone see a problem in this code?

I will make some sweeping assumptions:
that we can ignore _criteria_func and instead optimize _residuals_mse;
that none of this needs to be in a class;
that, unlike in your example, reg_factor_channel will never have zeros; and
that your bounds and constraints can all be ignored (though you have not made this clear).
Recognize that your inner expressions can be simplified:
np.sum(coil_mat * coef, axis=1), since it uses a broadcast, is really just the matrix product coil_mat @ coef;
np.mean(x ** 2) on a vector is really just a self-dot-product divided by the length; and
some of your scalar factors and vector coefficients can be combined outside of the function (a quick check of the first two equivalences is sketched below).
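For instance, a minimal sanity check of those equivalences (hypothetical stand-in data, not the question's):

import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(150)        # stand-in for the residual vector
A = rng.standard_normal((150, 9))   # stand-in for coil_mat
c = rng.standard_normal(9)          # stand-in for coef

# Broadcast-multiply-and-sum is a matrix product:
assert np.allclose(np.sum(A * c, axis=1), A @ c)
# Mean of squares is a self-dot-product divided by the length:
assert np.isclose(np.mean(v ** 2), v.dot(v) / len(v))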
This leaves us with the following, starting without the Jacobian:
import numpy as np
from numpy.random import default_rng
from scipy import optimize as opt
from timeit import timeit

rand = default_rng(seed=0)
reg_factor = 5
reg_factor_channel = rand.integers(1, 10, size=9)
reg_vector = reg_factor / len(reg_factor_channel) / reg_factor_channel


def residuals_mse(
    coef: np.ndarray,
    unshimmed_vec: np.ndarray,
    coil_mat: np.ndarray,
    factor: float,
) -> float:
    inner = unshimmed_vec + coil_mat @ coef
    return inner.dot(inner)/len(inner)/factor + np.abs(coef).dot(reg_vector)


def old_residuals_mse(coef, unshimmed_vec, coil_mat, factor):
    return np.mean(
        (unshimmed_vec + np.sum(coil_mat * coef, axis=1, keepdims=False)) ** 2) / factor + (
        reg_factor * np.mean(np.abs(coef) / reg_factor_channel))


def main() -> None:
    unshimmed_vec = rand.integers(100, size=150)
    coil_mat = rand.integers(100, size=(150, 9))
    factor = 2
    args = unshimmed_vec, coil_mat, factor

    currents_sp = None

    def run():
        nonlocal currents_sp
        currents_sp = opt.minimize(
            fun=residuals_mse,
            x0=np.zeros_like(reg_factor_channel),
            args=args,
            method='SLSQP',
        )

    t = timeit(run, number=1)
    print(currents_sp)
    print(t, 'seconds')

    r_old = old_residuals_mse(currents_sp.x, *args)
    assert np.isclose(r_old, currents_sp.fun)


if __name__ == '__main__':
    main()
with output
message: Optimization terminated successfully
success: True
status: 0
fun: 435.166150155064
x: [-1.546e-01 -8.305e-02 -1.637e-01 -1.106e-01 -1.033e-01
-8.792e-02 -9.908e-02 -8.666e-02 -1.217e-01]
nit: 7
jac: [-1.179e-01 -1.621e-01 -1.112e-01 -1.765e-01 -1.678e-01
-1.570e-01 -1.456e-01 -1.722e-01 -1.299e-01]
nfev: 94
njev: 7
0.012324300012551248 seconds
The Jacobian does indeed help, but it has been written in a way that is not properly vectorised.
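In the same notation as above (A = coil_mat, u = unshimmed_vec, c = coef, n = unshimmed_vec.size, r = reg_vector), the gradient of the simplified objective is

$$\nabla f(c) = \frac{2}{n \cdot \mathrm{factor}} A^\top (u + Ac) + \operatorname{sign}(c) \odot r$$

which is two matrix-vector products and no per-channel loop. Once vectorised, the code looks like: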
import numpy as np
from numpy.random import default_rng
from scipy import optimize as opt
from timeit import timeit

rand = default_rng(seed=0)
reg_factor = 5
reg_factor_channel = rand.integers(1, 10, size=9)
reg_vector = reg_factor / len(reg_factor_channel) / reg_factor_channel


def residuals_mse(
    coef: np.ndarray,
    unshimmed_vec: np.ndarray,
    coil_mat: np.ndarray,
    factor: float,
) -> float:
    inner = unshimmed_vec + coil_mat @ coef
    return inner.dot(inner)/len(inner)/factor + np.abs(coef).dot(reg_vector)


def old_residuals_mse(coef, unshimmed_vec, coil_mat, factor):
    return np.mean(
        (unshimmed_vec + np.sum(coil_mat * coef, axis=1, keepdims=False)) ** 2) / factor + (
        reg_factor * np.mean(np.abs(coef) / reg_factor_channel))


def residuals_mse_jacobian(
    coef: np.ndarray,
    unshimmed_vec: np.ndarray,
    coil_mat: np.ndarray,
    factor: float,
) -> np.ndarray:
    b = 2 / unshimmed_vec.size / factor
    return b * (
        unshimmed_vec + coil_mat @ coef
    ) @ coil_mat + np.sign(coef) * reg_vector


def old_residuals_mse_jacobian(coef, unshimmed_vec, coil_mat, factor):
    b = 2 / (unshimmed_vec.size * factor)
    jacobian = np.array([
        b * np.sum((unshimmed_vec + np.matmul(coil_mat, coef)) * coil_mat[:, j]) +
        np.sign(coef[j]) * (reg_factor / (9 * reg_factor_channel[j]))
        for j in range(coef.size)
    ])
    return jacobian


def main() -> None:
    unshimmed_vec = rand.integers(100, size=150)
    coil_mat = rand.integers(100, size=(150, 9))
    factor = 2
    args = unshimmed_vec, coil_mat, factor

    currents_sp = None

    def run():
        nonlocal currents_sp
        currents_sp = opt.minimize(
            fun=residuals_mse,
            x0=np.zeros_like(reg_factor_channel),
            args=args,
            method='SLSQP',
            jac=residuals_mse_jacobian,
        )

    t = timeit(run, number=1)
    print(currents_sp)
    print(t, 'seconds')

    r_old = old_residuals_mse(currents_sp.x, *args)
    assert np.isclose(r_old, currents_sp.fun)
    j_new = residuals_mse_jacobian(currents_sp.x, *args)
    j_old = old_residuals_mse_jacobian(currents_sp.x, *args)
    assert np.allclose(j_old, j_new)


if __name__ == '__main__':
    main()
message: Optimization terminated successfully
success: True
status: 0
fun: 435.1661470650057
x: [-1.546e-01 -8.305e-02 -1.637e-01 -1.106e-01 -1.033e-01
-8.792e-02 -9.908e-02 -8.666e-02 -1.217e-01]
nit: 7
jac: [-4.396e-02 -8.791e-02 -3.385e-02 -9.817e-02 -9.516e-02
-8.223e-02 -7.154e-02 -9.907e-02 -5.939e-02]
nfev: 31
njev: 7
0.005059799994342029 seconds

I would suggest trying the library NLopt. It also has SLSQP as a nonlinear solver (among many others), and I have found it to be faster than SciPy optimize in many instances.
However, if you're talking 50 ms per run, you won't get down to 5 ms.
If you're looking to squeeze out as much performance as possible, I would probably go to the metal and re-implement the objective function and Jacobian in Fortran (or C) and then use f2py (or Cython) to bridge them to Python. That looks like overkill to me, though.
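For reference, a minimal sketch of the NLopt route (reusing the simplified objective and Jacobian from the accepted answer; the data shapes are the question's, the values hypothetical). NLopt uses a single callback that returns the objective and writes the gradient in place:

import numpy as np
import nlopt

rng = np.random.default_rng(0)
unshimmed_vec = rng.integers(100, size=150).astype(float)
coil_mat = rng.integers(100, size=(150, 9)).astype(float)
factor = 2.0
reg_vector = 5 / 9 / rng.integers(1, 10, size=9)

def objective(coef, grad):
    inner = unshimmed_vec + coil_mat @ coef
    if grad.size > 0:
        # NLopt expects the gradient to be written into `grad` in place.
        grad[:] = (2 / (len(inner) * factor)) * (inner @ coil_mat) + np.sign(coef) * reg_vector
    return inner.dot(inner) / len(inner) / factor + np.abs(coef).dot(reg_vector)

opt = nlopt.opt(nlopt.LD_SLSQP, 9)  # gradient-based SLSQP, 9 unknowns
opt.set_min_objective(objective)
opt.set_xtol_rel(1e-8)
x_opt = opt.optimize(np.zeros(9))
print(x_opt, opt.last_optimum_value())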

Related

tfp.mcmc.HamiltonianMonteCarlo Not working in Tensorflow Probability

I have the following code, which basically tries to fit a simple regression model using TensorFlow Probability. The code runs without error, but the MCMC sampler doesn't seem to be doing anything, in that it returns a trace of the initial states.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.compat.v2 as tf
import tensorflow_probability as tfp
from tensorflow_probability import distributions as tfd
import warnings

tf.enable_v2_behavior()
plt.style.use("ggplot")
warnings.filterwarnings('ignore')

ru = 4
N = 102  # number of data points
t = np.linspace(0, 4*np.pi, N)
data = 3 + np.sin(t + 0.001) + 0.5 + np.random.randn(N)
media_1 = (data - min(data)) / (max(data) - min(data))  # + np.random.normal(0, .05, N)
y = np.repeat(ru, N) + np.random.normal(.3, .01, N) * media_1 + np.random.normal(0, .005, N)
# model
model = tfd.JointDistributionNamed(dict(
    beta1=tfd.Normal(0, 1),
    intercept=tfd.Normal(0, 5),
    var=tfd.Normal(0.05, 0.0005),
    y=lambda intercept, beta1, var:
        tfd.Independent(tfd.Normal(loc=intercept + beta1 * media_1, scale=var),
                        reinterpreted_batch_ndims=1
                        )
))

def target_log_prob_fn(intercept, beta1, var):
    return model.log_prob({'intercept': intercept, 'beta1': beta1, 'var': var, 'y': y})

s = model.sample()
init_states = [tf.fill([1], s['intercept'].numpy(), name='init_intercept'),
               tf.fill([1], s['beta1'].numpy(), name='init_beta1'),
               tf.fill([1], s['var'].numpy(), name='init_var'), ]

num_results = 5000
num_burnin_steps = 3000

# Improve performance by tracing the sampler using `tf.function`
# and compiling it using XLA.
@tf.function(autograph=False, experimental_compile=True)
def do_sampling():
    return tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=init_states,
        kernel=tfp.mcmc.HamiltonianMonteCarlo(
            target_log_prob_fn=target_log_prob_fn,
            step_size=0.1,
            num_leapfrog_steps=3)
    )

states, kernel_results = do_sampling()
The trace that is returned in states is exactly the same as the initial values in init_states... Any ideas?
I can confirm that this MCMC sampler is not mixing by printing the acceptance rate with a snippet I found here
print("Acceptance rate:", kernel_results.is_accepted.numpy().mean())
That same page provides some hints about how to make your HMC kernel adaptive, which means it will automatically reduce the step size if too many proposed steps are rejected (and increase it if too many are accepted, too):
# Apply a simple step size adaptation during burn-in
@tf.function
def do_sampling():
    adaptive_kernel = tfp.mcmc.SimpleStepSizeAdaptation(
        tfp.mcmc.HamiltonianMonteCarlo(
            target_log_prob_fn=target_log_prob_fn,
            step_size=0.1,
            num_leapfrog_steps=3),
        num_adaptation_steps=int(.8 * num_burnin_steps),
        target_accept_prob=np.float64(.65))

    return tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=init_states,
        kernel=adaptive_kernel,
        trace_fn=lambda cs, kr: kr)

samples, kernel_results = do_sampling()

print("Acceptance rate:", kernel_results.inner_results.is_accepted.numpy().mean())
This produces a non-zero acceptance rate for me.

Evaluating the squared term of a gaussian kernel for having a covariance matrix for multi-dimensional inputs [duplicate]

I have the following code. It is taking forever in Python. There must be a way to translate this calculation into a broadcast...
def euclidean_square(a, b):
    squares = np.zeros((a.shape[0], b.shape[0]))
    for i in range(squares.shape[0]):
        for j in range(squares.shape[1]):
            diff = a[i, :] - b[j, :]
            sqr = diff**2.0
            squares[i, j] = np.sum(sqr)
    return squares
You can use np.einsum after calculating the differences in a broadcasted way, like so -
ab = a[:,None,:] - b
out = np.einsum('ijk,ijk->ij',ab,ab)
Or use scipy's cdist with its optional metric argument set as 'sqeuclidean' to give us the squared euclidean distances as needed for our problem, like so -
from scipy.spatial.distance import cdist
out = cdist(a,b,'sqeuclidean')
I collected the different methods proposed here and in two other questions, and measured their speed:
import numpy as np
import scipy.spatial
import sklearn.metrics

def dist_direct(x, y):
    d = np.expand_dims(x, -2) - y
    return np.sum(np.square(d), axis=-1)

def dist_einsum(x, y):
    d = np.expand_dims(x, -2) - y
    return np.einsum('ijk,ijk->ij', d, d)

def dist_scipy(x, y):
    return scipy.spatial.distance.cdist(x, y, "sqeuclidean")

def dist_sklearn(x, y):
    return sklearn.metrics.pairwise.pairwise_distances(x, y, "sqeuclidean")

def dist_layers(x, y):
    res = np.zeros((x.shape[0], y.shape[0]))
    for i in range(x.shape[1]):
        res += np.subtract.outer(x[:, i], y[:, i])**2
    return res

# inspired by the excellent https://github.com/droyed/eucl_dist
def dist_ext1(x, y):
    nx, p = x.shape
    x_ext = np.empty((nx, 3*p))
    x_ext[:, :p] = 1
    x_ext[:, p:2*p] = x
    x_ext[:, 2*p:] = np.square(x)
    ny = y.shape[0]
    y_ext = np.empty((3*p, ny))
    y_ext[:p] = np.square(y).T
    y_ext[p:2*p] = -2*y.T
    y_ext[2*p:] = 1
    return x_ext.dot(y_ext)

# https://stackoverflow.com/a/47877630/648741
def dist_ext2(x, y):
    return np.einsum('ij,ij->i', x, x)[:, None] + np.einsum('ij,ij->i', y, y) - 2 * x.dot(y.T)
I use timeit to compare the speed of the different methods. For the comparison, I use vectors of length 10, with 100 vectors in the first group, and 1000 vectors in the second group.
import timeit

p = 10
x = np.random.standard_normal((100, p))
y = np.random.standard_normal((1000, p))

for method in dir():
    if not method.startswith("dist_"):
        continue
    t = timeit.timeit(f"{method}(x, y)", number=1000, globals=globals())
    print(f"{method:12} {t:5.2f}ms")
On my laptop, the results are as follows:
dist_direct 5.07ms
dist_einsum 3.43ms
dist_ext1 0.20ms <-- fastest
dist_ext2 0.35ms
dist_layers 2.82ms
dist_scipy 0.60ms
dist_sklearn 0.67ms
While the two methods dist_ext1 and dist_ext2, both based on the idea of writing (x-y)**2 as x**2 - 2*x*y + y**2, are very fast, there is a downside: when the distance between x and y is very small, the numerical result can sometimes be (very slightly) negative due to cancellation error.
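If a true square root is taken afterwards, a cheap guard is to clip at zero first (a sketch, reusing dist_ext2 from the benchmark above):

x = np.random.standard_normal((100, 10))
y = x.copy()  # identical points, so the exact distances are all zero

d2 = dist_ext2(x, y)            # may contain tiny negative entries
d = np.sqrt(np.maximum(d2, 0))  # clip before taking the square root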
Another solution besides using cdist is the following:
difference_squared = np.zeros((a.shape[0], b.shape[0]))
for dimension_iterator in range(a.shape[1]):
    difference_squared = difference_squared + np.subtract.outer(a[:, dimension_iterator], b[:, dimension_iterator])**2.

`scipy.optimize` functions hang even with `maxiter=0`

I am trying to train the MNIST data (which I downloaded from Kaggle) with simple multi-class logistic regression, but the scipy.optimize functions hang.
Here's the code:
import csv
from math import exp
from numpy import *
from scipy.optimize import fmin, fmin_cg, fmin_powell, fmin_bfgs

# Prepare the data

def getIiter(ifname):
    """
    Get the iterator from a csv file with filename ifname
    """
    ifile = open(ifname, 'r')
    iiter = csv.reader(ifile)
    iiter.__next__()
    return iiter

def parseRow(s):
    y = [int(x) for x in s]
    lab = y[0]
    z = y[1:]
    return (lab, z)

def getAllRows(ifname):
    iiter = getIiter(ifname)
    x = []
    l = []
    for row in iiter:
        lab, z = parseRow(row)
        x.append(z)
        l.append(lab)
    return x, l

def cutData(x, y):
    """
    70% training
    30% testing
    """
    m = len(x)
    t = int(m * .7)
    return [(x[:t], y[:t]), (x[t:], y[t:])]

def num2IndMat(l):
    t = array(l)
    tt = [vectorize(int)((t == i)) for i in range(10)]
    return array(tt).T

def readData(ifname):
    x, l = getAllRows(ifname)
    t = [[1] + y for y in x]
    return array(t), num2IndMat(l)

# Calculate the cost function

def sigmoid(x):
    return 1 / (1 + exp(-x))

vSigmoid = vectorize(sigmoid)
vLog = vectorize(log)

def costFunction(theta, x, y):
    sigxt = vSigmoid(dot(x, theta))
    cm = (- y * vLog(sigxt) - (1 - y) * vLog(1 - sigxt)) / m / N
    return sum(cm)

def unflatten(flatTheta):
    return [flatTheta[i * N : (i + 1) * N] for i in range(n + 1)]

def costFunctionFlatTheta(flatTheta):
    return costFunction(unflatten(flatTheta), trainX, trainY)

def costFunctionFlatTheta1(flatTheta):
    return costFunction(flatTheta.reshape(785, 10), trainX, trainY)

x, y = readData('train.csv')
[(trainX, trainY), (testX, testY)] = cutData(x, y)

m = len(trainX)
n = len(trainX[0]) - 1
N = len(trainY[0])

initTheta = zeros(((n + 1), N))
flatInitTheta = ndarray.flatten(initTheta)
flatInitTheta1 = initTheta.reshape(1, -1)
In the last two lines we flatten initTheta because the fmin{,_cg,_bfgs,_powell} functions seem to only take vectors as the initial value argument x0. I also flattened initTheta using reshape, in the hope that this answer could be of help.
There is no problem computing the cost function which takes up less than 2 seconds on my computer:
print(costFunctionFlatTheta(flatInitTheta), costFunctionFlatTheta1(flatInitTheta1))
# 0.69314718056 0.69314718056
But all the fmin functions hang, even if I set maxiter=0.
e.g.
newFlatTheta = fmin(costFunctionFlatTheta, flatInitTheta, maxiter=0)
or
newFlatTheta1 = fmin(costFunctionFlatTheta1, flatInitTheta1, maxiter=0)
When I interrupt the program, it seems to me it all hangs at lines in optimize.py calling the cost functions, lines like this:
return function(*(wrapper_args + args))
For example, if I use fmin_cg, this would be line 292 in optimize.py (Version 0.5).
How do I solve this problem?
OK I found a way to stop fmin_cg from hanging.
Basically I just need to write a function that computes the gradient of the cost function, and pass it to the fprime parameter of fmin_cg.
def gradient(theta, x, y):
    return dot(x.T, vSigmoid(dot(x, theta)) - y) / m / N

def gradientFlatTheta(flatTheta):
    return ndarray.flatten(gradient(flatTheta.reshape(785, 10), trainX, trainY))
Then
newFlatTheta = fmin_cg(costFunctionFlatTheta, flatInitTheta, fprime=gradientFlatTheta, maxiter=0)
terminates within seconds, and by setting maxiter to a higher number (say 100) one can train the model within a reasonable amount of time.
The documentation of fmin_cg says the gradient would be numerically computed if no fprime is given, which is what I suspect caused the hanging: with 785 × 10 = 7850 parameters, a single numerical gradient alone needs thousands of the roughly 2-second cost-function evaluations, so even maxiter=0 spends ages before returning.
Thanks to this notebook by zgo2016@Kaggle which helped me find the solution.

how to use Apache Commons Math Optimization in Jython?

I want to transfer Matlab code to a Jython version, and found that fminsearch in Matlab might be replaced by Apache Commons Math optimization.
I'm coding in the Mango Medical Image script manager, which uses Jython 2.5.3 as its coding language. The Commons Math version is 3.6.1.
Here is my code:
def f(x, y):
    return x^2 + y^2

sys.path.append('/home/shujian/APPs/Mango/lib/commons-math3-3.6.1.jar')

sys.add_package('org.apache.commons.math3.analysis')
from org.apache.commons.math3.analysis import MultivariateFunction
sys.add_package('org.apache.commons.math3.optim.nonlinear.scalar.noderiv')
from org.apache.commons.math3.optim.nonlinear.scalar.noderiv import NelderMeadSimplex, SimplexOptimizer
sys.add_package('org.apache.commons.math3.optim.nonlinear.scalar')
from org.apache.commons.math3.optim.nonlinear.scalar import ObjectiveFunction
sys.add_package('org.apache.commons.math3.optim')
from org.apache.commons.math3.optim import MaxEval, InitialGuess
sys.add_package('org.apache.commons.math3.optimization')
from org.apache.commons.math3.optimization import GoalType

initialSolution = [2.0, 2.0]
simplex = NelderMeadSimplex([2.0, 2.0])
opt = SimplexOptimizer(2**(-6), 2**(-10))

solution = opt.optimize(MaxEval(300), ObjectiveFunction(f), simplex, GoalType.MINIMIZE, InitialGuess([2.0, 2.0]))

skewParameters2 = solution.getPointRef()
print skewParameters2;
And I got the error below:
TypeError: optimize(): 1st arg can't be coerced to
I'm quite confused about how to use the optimization in Jython, and the examples are all in Java.
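A note on the TypeError: SimplexOptimizer.optimize expects an ObjectiveFunction wrapping a MultivariateFunction, whose value method takes a single double[] argument, so a plain two-argument Python function cannot be coerced. A minimal Jython sketch of that wrapping (untested, assuming the same imports as above):

class SquaredSum(MultivariateFunction):
    # In Jython, a Python class can implement a Java interface directly.
    def value(self, point):
        return point[0]**2 + point[1]**2

solution = opt.optimize(MaxEval(300),
                        ObjectiveFunction(SquaredSum()),
                        simplex,
                        GoalType.MINIMIZE,
                        InitialGuess([2.0, 2.0]))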
I've given up on this plan and found another way to perform the fminsearch in Jython. Below is the Jython version of the code:
import sys
sys.path.append('.../jnumeric-2.5.1_ra0.1.jar')  # add the jnumeric path
import Numeric as np

def nelder_mead(f, x_start,
                step=0.1, no_improve_thr=10e-6,
                no_improv_break=10, max_iter=0,
                alpha=1., gamma=2., rho=-0.5, sigma=0.5):
    '''
        @param f (function): function to optimize, must return a scalar score
            and operate over a numpy array of the same dimensions as x_start
        @param x_start (float list): initial position
        @param step (float): look-around radius in initial step
        @no_improve_thr, no_improv_break (float, int): break after no_improv_break iterations with
            an improvement lower than no_improve_thr
        @max_iter (int): always break after this number of iterations.
            Set it to 0 to loop indefinitely.
        @alpha, gamma, rho, sigma (floats): parameters of the algorithm
            (see Wikipedia page for reference)

        return: tuple (best parameter array, best score)
    '''

    # init
    dim = len(x_start)
    prev_best = f(x_start)
    no_improv = 0
    res = [[np.array(x_start), prev_best]]

    for i in range(dim):
        x = np.array(x_start)
        x[i] = x[i] + step
        score = f(x)
        res.append([x, score])

    # simplex iter
    iters = 0
    while 1:
        # order
        res.sort(key=lambda x: x[1])
        best = res[0][1]

        # break after max_iter
        if max_iter and iters >= max_iter:
            return res[0]
        iters += 1

        # break after no_improv_break iterations with no improvement
        print '...best so far:', best

        if best < prev_best - no_improve_thr:
            no_improv = 0
            prev_best = best
        else:
            no_improv += 1

        if no_improv >= no_improv_break:
            return res[0]

        # centroid
        x0 = [0.] * dim
        for tup in res[:-1]:
            for i, c in enumerate(tup[0]):
                x0[i] += c / (len(res)-1)

        # reflection
        xr = x0 + alpha*(x0 - res[-1][0])
        rscore = f(xr)
        if res[0][1] <= rscore < res[-2][1]:
            del res[-1]
            res.append([xr, rscore])
            continue

        # expansion
        if rscore < res[0][1]:
            xe = x0 + gamma*(x0 - res[-1][0])
            escore = f(xe)
            if escore < rscore:
                del res[-1]
                res.append([xe, escore])
                continue
            else:
                del res[-1]
                res.append([xr, rscore])
                continue

        # contraction
        xc = x0 + rho*(x0 - res[-1][0])
        cscore = f(xc)
        if cscore < res[-1][1]:
            del res[-1]
            res.append([xc, cscore])
            continue

        # reduction
        x1 = res[0][0]
        nres = []
        for tup in res:
            redx = x1 + sigma*(tup[0] - x1)
            score = f(redx)
            nres.append([redx, score])
        res = nres
And the test example is as below:
def f(x):
    return x[0]**2 + x[1]**2 + x[2]**2

print nelder_mead(f, [3.4, 2.3, 2.2])
Actually, the original version is for Python, and the link below is the source:
https://github.com/fchollet/nelder-mead

Avoiding optimization pitfalls when modeling an ordinal predicted variable in PyMC3

I am trying to model an ordinal predicted variable using PyMC3 based on the approach in chapter 23 of Doing Bayesian Data Analysis. I would like to determine a good starting value using find_MAP, but am receiving an optimization error.
The model:
import pymc3 as pm
import numpy as np
import theano
import theano.tensor as tt

# Some helper functions

def cdf(x, location=0, scale=1):
    epsilon = np.array(1e-32, dtype=theano.config.floatX)

    location = tt.cast(location, theano.config.floatX)
    scale = tt.cast(scale, theano.config.floatX)

    div = tt.sqrt(2 * scale ** 2 + epsilon)
    div = tt.cast(div, theano.config.floatX)

    erf_arg = (x - location) / div
    return .5 * (1 + tt.erf(erf_arg + epsilon))

def percent_to_thresh(idx, vect):
    return 5 * tt.sum(vect[:idx + 1]) + 1.5

def full_thresh(thresh):
    idxs = tt.arange(thresh.shape[0] - 1)
    thresh_mod, updates = theano.scan(fn=percent_to_thresh,
                                      sequences=[idxs],
                                      non_sequences=[thresh])
    return tt.concatenate([[-1 * np.inf, 1.5], thresh_mod, [6.5, np.inf]])

def compute_ps(thresh, location, scale):
    f_thresh = full_thresh(thresh)
    return cdf(f_thresh[1:], location, scale) - cdf(f_thresh[:-1], location, scale)

# Generate data
real_ps = [0.05, 0.05, 0.1, 0.1, 0.2, 0.3, 0.2]
data = np.random.choice(7, size=1000, p=real_ps)

# Run model
with pm.Model() as model:
    mu = pm.Normal('mu', mu=4, sd=3)
    sigma = pm.Uniform('sigma', lower=0.1, upper=70)
    thresh = pm.Dirichlet('thresh', a=np.ones(5))

    cat_p = compute_ps(thresh, mu, sigma)

    results = pm.Categorical('results', p=cat_p, observed=data)

with model:
    start = pm.find_MAP()
    trace = pm.sample(2000, start=start)
When running this, I receive the following error:
Applied interval-transform to sigma and added transformed sigma_interval_ to model.
Applied stickbreaking-transform to thresh and added transformed thresh_stickbreaking_ to model.
Traceback (most recent call last):
File "cm_net_log.v1-for_so.py", line 53, in <module>
start = pm.find_MAP()
File "/usr/local/lib/python3.5/site-packages/pymc3/tuning/starting.py", line 133, in find_MAP
specific_errors)
ValueError: Optimization error: max, logp or dlogp at max have non-finite values. Some values may be outside of distribution support. max: {'thresh_stickbreaking_': array([-1.04298465, -0.48661088, -0.84326554, -0.44833646]), 'sigma_interval_': array(-2.220446049250313e-16), 'mu': array(7.68422528308479)} logp: array(-3506.530143064723) dlogp: array([ 1.61013190e-06, nan, -6.73994118e-06,
-6.93873894e-06, 6.03358122e-06, 3.18954680e-06])Check that 1) you don't have hierarchical parameters, these will lead to points with infinite density. 2) your distribution logp's are properly specified. Specific issues:
My questions:
How can I determine why dlogp is nan at certain points?
Is there a different way that I can express this model to avoid dlogp being nan?
Also worth noting:
This model runs fine if I don't find_MAP and use a Metropolis sampler. However, I'd like to have the flexibility of using other samplers as this model becomes more complex.
I have a suspicion that the issue is due to the relationship between the thresholds and the normal distribution, but I don't know how to disentangle them for the optimization.
Regarding question 2: I expressed the model for the ordinal predicted variable (single group) differently; I used the Theano @as_op decorator for a function that calculates the probabilities for the outcomes. That also explains why I cannot use find_MAP() or gradient-based samplers: Theano cannot calculate a gradient for the custom function. (http://pymc-devs.github.io/pymc3/notebooks/getting_started.html#Arbitrary-deterministics)
from scipy.stats import norm
from theano.compile.ops import as_op

# Number of outcomes
nYlevels = df.Y.cat.categories.size

thresh = [k + .5 for k in range(1, nYlevels)]
thresh_obs = np.ma.asarray(thresh)
thresh_obs[1:-1] = np.ma.masked

@as_op(itypes=[tt.dvector, tt.dscalar, tt.dscalar], otypes=[tt.dvector])
def outcome_probabilities(theta, mu, sigma):
    out = np.empty(nYlevels)
    n = norm(loc=mu, scale=sigma)
    out[0] = n.cdf(theta[0])
    out[1] = np.max([0, n.cdf(theta[1]) - n.cdf(theta[0])])
    out[2] = np.max([0, n.cdf(theta[2]) - n.cdf(theta[1])])
    out[3] = np.max([0, n.cdf(theta[3]) - n.cdf(theta[2])])
    out[4] = np.max([0, n.cdf(theta[4]) - n.cdf(theta[3])])
    out[5] = np.max([0, n.cdf(theta[5]) - n.cdf(theta[4])])
    out[6] = 1 - n.cdf(theta[5])
    return out

with pm.Model() as ordinal_model_single:
    theta = pm.Normal('theta', mu=thresh, tau=np.repeat(.5**2, len(thresh)),
                      shape=len(thresh), observed=thresh_obs, testval=thresh[1:-1])

    mu = pm.Normal('mu', mu=nYlevels/2.0, tau=1.0/(nYlevels**2))
    sigma = pm.Uniform('sigma', nYlevels/1000.0, nYlevels*10.0)

    pr = outcome_probabilities(theta, mu, sigma)

    y = pm.Categorical('y', pr, observed=df.Y.cat.codes.as_matrix())
http://nbviewer.jupyter.org/github/JWarmenhoven/DBDA-python/blob/master/Notebooks/Chapter%2023.ipynb