Implementing minimization in SciPy - optimization

I am trying to implement the 'Iterative Hessian Sketch' algorithm from https://arxiv.org/abs/1411.0347 (page 12). However, I am struggling with step two, which requires minimizing a matrix-vector function.
Imports and basic data generating function
import numpy as np
import scipy as sp
from sklearn.datasets import make_regression
from scipy.optimize import minimize
import matplotlib.pyplot as plt
%matplotlib inline
from numpy.linalg import norm
def generate_data(nsamples, nfeatures, variance=1):
    '''Generates a data matrix of size (nsamples, nfeatures)
    which defines a linear relationship on the variables.'''
    X, y = make_regression(n_samples=nsamples, n_features=nfeatures,
                           n_informative=nfeatures, noise=variance)
    X[:, 0] = np.ones(shape=(nsamples))  # add bias terms
    return X, y
To minimize the matrix-vector function, I have tried implementing a function which computes the quantity I would like to minimise:
def f2min(x, data, target, offset):
    A = data
    S = np.eye(A.shape[0])
    #S = gaussian_sketch(nrows=A.shape[0]//2, ncols=A.shape[0])
    y = target
    xt = np.ravel(offset)
    # (1/(2m)) * ||S A (x - xt)||^2 minus the linear term <A.T (y - A xt), x>
    norm_val = (1/(2*S.shape[0]))*norm(S @ A @ (x - xt))**2
    inner_prod = (y - A @ xt).T @ A @ x
    return norm_val - inner_prod
I would eventually like to replace S with some random matrices which can reduce the dimensionality of the problem; however, first I need to be confident that this optimisation method is working.
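As a point of reference, here is a minimal sketch of what the gaussian_sketch helper referenced in the commented-out line above might look like; the question does not show its implementation, so this hypothetical version just draws a dense Gaussian sketch scaled so that E[S.T @ S] is the identity:

def gaussian_sketch(nrows, ncols):
    '''Hypothetical dense Gaussian sketching matrix of shape (nrows, ncols),
    with entries drawn i.i.d. from N(0, 1/nrows) so that E[S.T @ S] = I.'''
    return np.random.randn(nrows, ncols) / np.sqrt(nrows)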
def grad_f2min(x, data, target, offset):
    A = data
    y = target
    S = np.eye(A.shape[0])
    xt = np.ravel(offset)
    S_A = S @ A
    grad = (1/S.shape[0])*S_A.T @ S_A @ (x - xt) - A.T @ (y - A @ xt)
    return grad
X, y = generate_data(nsamples=100, nfeatures=2)  # assumed: the data generation step is not shown in the original snippet
x0 = np.zeros((X.shape[0], 1))
xt = np.zeros((2, 1))
x_new = np.zeros((2, 1))
for it in range(1):
    result = minimize(f2min, x0=xt, args=(X, y, x_new),
                      method='CG', jac=False)
    print(result)
    x_new = result.x
I don't think that this loop is correct at all because at the very least there should be some local convergence before moving on to the next step. The output is:
fun: 0.0
jac: array([ 0.00745058, 0.00774882])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 416
nit: 0
njev: 101
status: 2
success: False
x: array([ 0., 0.])
Does anyone have an idea:
(1) why I'm not achieving convergence at each step, and
(2) whether I can implement step 2 in a better way?
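For what it's worth, one way the outer loop could be structured with this parameterisation of f2min is sketched below: each minimize call is warm-started at the current iterate, the analytic gradient is passed via jac, and the minimiser is taken as the next iterate. The data generation call, the iteration count, and the update rule are assumptions on my part, not something prescribed by the question or checked against the paper.

# Hedged sketch of the outer loop, assuming f2min/grad_f2min as defined above.
X, y = generate_data(nsamples=100, nfeatures=2)   # assumed problem size
x_t = np.zeros(X.shape[1])
for it in range(5):                               # number of outer iterations is arbitrary here
    result = minimize(f2min, x0=x_t, args=(X, y, x_t),
                      jac=grad_f2min, method='CG')
    x_t = result.x    # with this parameterisation the minimiser is taken as the new iterate
    print(it, result.fun, norm(x_t))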


Decision boundary in perceptron not correct

I was preparing some code for a lecture and re-implemented a simple perceptron: 2 inputs and 1 output. Aim: a linear classifier.
Here's the code that creates the data, sets up the perceptron, and trains it:
from ipywidgets import interact
import numpy as np
import matplotlib.pyplot as plt
# Two random clouds
x = [(1,3)]*10+[(3,1)]*10
x = np.asarray([(i+np.random.rand(), j+np.random.rand()) for i,j in x])
# Colors
cs = "m"*10+"b"*10
# classes
y = [0]*10+[1]*10
class Perceptron:
    def __init__(self):
        self.w = np.random.randn(3)
        self.lr = 0.01

    def train(self, x, y, verbose=False):
        errs = 0.
        for xi, yi in zip(x, y):
            x_ = np.insert(xi, 0, 1)
            r = self.w @ x_
            ######## HERE IS THE MAGIC HAPPENING #####
            r = r >= 0
            ##########################################
            err = float(yi) - float(r)
            errs += np.abs(err)
            if verbose:
                print(yi, r)
            self.w = self.w + self.lr * err * x_
        return errs

    def predict(self, x):
        return np.round(self.w @ np.insert(x, 0, 1, 1).T)

    def decisionLine(self):
        w = self.w
        slope = -(w[0]/w[2]) / (w[0]/w[1])
        intercept = -w[0]/w[2]
        return slope, intercept
p = Perceptron()
line_properties = []
errs = []
for i in range(20):
    errs.append(p.train(x, y, True if i == 999 else False))
    line_properties.append(p.decisionLine())
print(p.predict(x))  # works like a charm!

@interact
def showLine(i:(0, len(line_properties)-1, 1)=0):
    xs = np.linspace(1, 4)
    a, b = line_properties[i]
    ys = a * xs + b
    plt.scatter(*x.T)
    plt.plot(xs, ys, "k--")
At the end, I am calculating the decision boundary, i.e. the linear eq. separating class 0 and 1. However, it seems to be off. I tried inversion etc but have no clue what is wrong. Interestingly, if I change the learning rule to
self.w = self.w + self.lr * err / x_
i.e. dividing by x_, it works properly. I am totally confused - does anyone have an idea?
Solved for real
Now I have added one small but very important part to the Perceptron that I had simply forgotten (and maybe others forget it as well): you have to apply the thresholded activation, r = r >= 0, so that everything is centered on 0 - and then it works; this is basically the answer below. If you don't do this, you have to change the classes to get the centering at 0 again. Currently, I prefer having the classes -1 and 1, as this gives a better (centered) decision line instead of a line that is very close to one of the data clouds.
Before:
Now:
You are creating a linear regression (not logistic regression!) with targets 0 and 1. And the line you plot is the line where the model predicts 0, so it should ideally cut through the cloud of points labeled 0, as in your first plot.
If you don't want to implement the sigmoid for logistic regression, then at least you will want to display a boundary line that corresponds to a value of 0.5 rather than 0.
As for inverting the weights providing a plot that looks like what you want, I think that's just a coincidence of this data.
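To make that concrete, here is a small sketch of the 0.5-level line for a linear model with weights ordered [bias, w1, w2], as in the Perceptron above (the helper name is mine, not part of the original code):

def boundary_line(w, level=0.5):
    # Solve w0 + w1*x1 + w2*x2 = level for x2: this is the line where the
    # linear model outputs `level` (use level=0.5 for 0/1 targets).
    slope = -w[1] / w[2]
    intercept = (level - w[0]) / w[2]
    return slope, intercept

# e.g. a, b = boundary_line(p.w, level=0.5), then plot ys = a*xs + b as before.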

Error when using tensorflow HMC to marginalise GPR hyperparameters

I would like to use tensorflow (version 2) to fit some data with Gaussian process regression, and I found the Google Colab example online here [1].
I have turned some of this notebook into a minimal example, which is below.
Sometimes the code fails with the following error when using MCMC to marginalize the hyperparameters, and I was wondering if anyone has seen this before or knows how to get around it:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input matrix is not invertible.
[[{{node mcmc_sample_chain/trace_scan/while/body/_168/smart_for_loop/while/body/_842/dual_averaging_step_size_adaptation___init__/_one_step/transformed_kernel_one_step/mh_one_step/hmc_kernel_one_step/leapfrog_integrate/while/body/_1244/leapfrog_integrate_one_step/maybe_call_fn_and_grads/value_and_gradients/value_and_gradient/gradients/leapfrog_integrate_one_step/maybe_call_fn_and_grads/value_and_gradients/value_and_gradient/PartitionedCall_grad/PartitionedCall/gradients/JointDistributionNamed/log_prob/JointDistributionNamed_log_prob_GaussianProcess/log_prob/JointDistributionNamed_log_prob_GaussianProcess/get_marginal_distribution/Cholesky_grad/MatrixTriangularSolve}}]] [Op:__inference_do_sampling_113645]
Function call stack:
do_sampling
[1] https://colab.research.google.com/github/tensorflow/probability/blob/master/tensorflow_probability/examples/jupyter_notebooks/Gaussian_Process_Regression_In_TFP.ipynb#scrollTo=jw-_1yC50xaM
Note that some of the code below is a bit redundant in some sections, but it should be able to reproduce the error.
Thanks!
import time
import numpy as np
import tensorflow.compat.v2 as tf
import tensorflow_probability as tfp
tfb = tfp.bijectors
tfd = tfp.distributions
tfk = tfp.math.psd_kernels
tf.enable_v2_behavior()
import matplotlib
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
#%pylab inline
# Configure plot defaults
plt.rcParams['axes.facecolor'] = 'white'
plt.rcParams['grid.color'] = '#666666'
#%config InlineBackend.figure_format = 'png'
def sinusoid(x):
    return np.sin(3 * np.pi * x[..., 0])

def generate_1d_data(num_training_points, observation_noise_variance):
    """Generate noisy sinusoidal observations at a random set of points.
    Returns:
        observation_index_points, observations
    """
    index_points_ = np.random.uniform(-1., 1., (num_training_points, 1))
    index_points_ = index_points_.astype(np.float64)
    # y = f(x) + noise
    observations_ = (sinusoid(index_points_) +
                     np.random.normal(loc=0,
                                      scale=np.sqrt(observation_noise_variance),
                                      size=(num_training_points)))
    return index_points_, observations_

# Generate training data with a known noise level (we'll later try to recover
# this value from the data).
NUM_TRAINING_POINTS = 100
observation_index_points_, observations_ = generate_1d_data(
    num_training_points=NUM_TRAINING_POINTS,
    observation_noise_variance=.1)
def build_gp(amplitude, length_scale, observation_noise_variance):
    """Defines the conditional dist. of GP outputs, given kernel parameters."""
    # Create the covariance kernel, which will be shared between the prior (which we
    # use for maximum likelihood training) and the posterior (which we use for
    # posterior predictive sampling)
    kernel = tfk.ExponentiatedQuadratic(amplitude, length_scale)
    # Create the GP prior distribution, which we will use to train the model
    # parameters.
    return tfd.GaussianProcess(
        kernel=kernel,
        index_points=observation_index_points_,
        observation_noise_variance=observation_noise_variance)

gp_joint_model = tfd.JointDistributionNamed({
    'amplitude': tfd.LogNormal(loc=0., scale=np.float64(1.)),
    'length_scale': tfd.LogNormal(loc=0., scale=np.float64(1.)),
    'observation_noise_variance': tfd.LogNormal(loc=0., scale=np.float64(1.)),
    'observations': build_gp,
})
x = gp_joint_model.sample()
lp = gp_joint_model.log_prob(x)
print("sampled {}".format(x))
print("log_prob of sample: {}".format(lp))
# Create the trainable model parameters, which we'll subsequently optimize.
# Note that we constrain them to be strictly positive.
constrain_positive = tfb.Shift(np.finfo(np.float64).tiny)(tfb.Exp())
amplitude_var = tfp.util.TransformedVariable(
    initial_value=1.,
    bijector=constrain_positive,
    name='amplitude',
    dtype=np.float64)
length_scale_var = tfp.util.TransformedVariable(
    initial_value=1.,
    bijector=constrain_positive,
    name='length_scale',
    dtype=np.float64)
observation_noise_variance_var = tfp.util.TransformedVariable(
    initial_value=1.,
    bijector=constrain_positive,
    name='observation_noise_variance_var',
    dtype=np.float64)
trainable_variables = [v.trainable_variables[0] for v in
                       [amplitude_var,
                        length_scale_var,
                        observation_noise_variance_var]]

# Use `tf.function` to trace the loss for more efficient evaluation.
@tf.function(autograph=False, experimental_compile=False)
def target_log_prob(amplitude, length_scale, observation_noise_variance):
    return gp_joint_model.log_prob({
        'amplitude': amplitude,
        'length_scale': length_scale,
        'observation_noise_variance': observation_noise_variance,
        'observations': observations_
    })
# Now we optimize the model parameters.
num_iters = 1000
optimizer = tf.optimizers.Adam(learning_rate=.01)
# Store the likelihood values during training, so we can plot the progress
lls_ = np.zeros(num_iters, np.float64)
for i in range(num_iters):
    with tf.GradientTape() as tape:
        loss = -target_log_prob(amplitude_var, length_scale_var,
                                observation_noise_variance_var)
    grads = tape.gradient(loss, trainable_variables)
    optimizer.apply_gradients(zip(grads, trainable_variables))
    lls_[i] = loss
print('Trained parameters:')
print('amplitude: {}'.format(amplitude_var._value().numpy()))
print('length_scale: {}'.format(length_scale_var._value().numpy()))
print('observation_noise_variance: {}'.format(observation_noise_variance_var._value().numpy()))
num_results = 100
num_burnin_steps = 50
sampler = tfp.mcmc.TransformedTransitionKernel(
    tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=target_log_prob,
        step_size=tf.cast(0.1, tf.float64),
        num_leapfrog_steps=8),
    bijector=[constrain_positive, constrain_positive, constrain_positive])
adaptive_sampler = tfp.mcmc.DualAveragingStepSizeAdaptation(
    inner_kernel=sampler,
    num_adaptation_steps=int(0.8 * num_burnin_steps),
    target_accept_prob=tf.cast(0.75, tf.float64))
initial_state = [tf.cast(x, tf.float64) for x in [1., 1., 1.]]
# Speed up sampling by tracing with `tf.function`.
@tf.function(autograph=False, experimental_compile=False)
def do_sampling():
    return tfp.mcmc.sample_chain(
        kernel=adaptive_sampler,
        current_state=initial_state,
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        trace_fn=lambda current_state, kernel_results: kernel_results)
t0 = time.time()
samples, kernel_results = do_sampling()
t1 = time.time()
print("Inference ran in {:.2f}s.".format(t1-t0))
This can happen if you have multiple index points that are very close, so you might consider using np.linspace or just doing some post filtering of your random draw. I would also suggest a bit bigger epsilon, maybe 1e-6.
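For example, here is a minimal sketch of those two workarounds for generating the index points; the minimum-spacing tolerance is an arbitrary illustrative value, not a recommendation from the answer:

# Option 1: evenly spaced index points, so no two rows of the kernel matrix are nearly identical.
index_points_ = np.linspace(-1., 1., NUM_TRAINING_POINTS)[:, np.newaxis].astype(np.float64)

# Option 2: keep the random draw but filter out points closer than some minimum spacing.
raw = np.sort(np.random.uniform(-1., 1., (NUM_TRAINING_POINTS, 1)), axis=0)
keep = np.concatenate([[True], np.diff(raw[:, 0]) > 1e-3])
index_points_ = raw[keep]

In either case the observations would then need to be regenerated from the new index points before rebuilding the GP.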

Using scipy.odr to fit curve

I'm trying to fit a set of data points via a fit function that depends on two variables, let's call these xdata and sdata. Problem is my curve is rather flat I want it to more or less "follow the points".
I've tried using scipy.odr to fit the curve it works rather well except that the curve is too flat:
import numpy as np
from math import pi
from math import sqrt
from math import log
from scipy import optimize
import scipy.optimize
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.odr import *
mudr=np.array([ 57.43708609, 46.26119205, 55.60688742, 33.21615894,
28.27072848, 22.54649007, 21.80662252, 11.21483444, 5.80211921])
#xdata points
dme=np.array([ 128662.54890776, 105265.32915726, 128652.56835434,
77968.67019573, 66273.56542068, 58464.58559543,
54570.66624991, 27286.90038703, 19480.92689266]) #xdata error
dmss22=np.array([ 4.90050000e+17, 4.90050000e+17, 4.90050000e+17,
4.90050000e+17, 4.90050000e+17, 4.90050000e+17,
4.90050000e+17, 4.90050000e+17, 4.90050000e+17]) #sdata points
dmse=np.array([ 1.09777592e+21, 1.11512117e+21, 1.13381702e+21,
1.15033267e+21, 1.14883089e+21, 1.27076265e+21,
1.22637165e+21, 1.19237598e+21, 1.64539205e+21]) # sdata error
F=np.array([ 115.01944248, 110.24354867, 112.77812389, 104.81830088,
104.35746903, 101.32016814, 100.54513274, 96.94226549,
93.00424779]) #ydata points
dF=np.array([ 72710.75386699, 72590.6256987 , 176539.40403673,
130555.27503081, 124299.52080164, 176426.64340597,
143013.52848306, 122117.93022746, 157547.78395513])#ydata error
def Ffitsso(p, X, B=2.58, Fc=92.2, mu=770, Za=0.9468):  # fit function
    # afij: array of expansion coefficients, defined elsewhere in the original code
    temp1 = (2*B*X[0])/(4*pi*Fc)**2
    temp2 = temp1*(afij[0]+afij[1]*np.log((2*B*X[0])/mu**2))
    temp3 = temp1**2*(afij[2]+afij[3]*np.log((2*B*X[0])/mu**2)+
                      afij[4]*(np.log((2*B*X[0])/mu**2))**2)
    temp4 = temp1**3*(afij[5]+afij[6]*np.log((2*B*X[0])/mu**2)+
                      afij[7]*(np.log((2*B*X[0])/mu**2))**2+
                      afij[8]*(np.log((2*B*X[0])/mu**2))**3)
    return Fc/Za*(1+p[0]*X[1])*(1+temp2+temp3+temp4)+p[1]
#fitting using scipy.odr
xtot = np.row_stack((mudr, dmss22))
etot = np.row_stack((dme, dmse))  # x- and s-data errors (assuming these are the arrays intended here)
fitting = Model(Ffitsso)
mydata = RealData(xtot, F, sx=etot, sy=dF)
myodr = ODR(mydata, fitting, beta0=[0, 100])
myoutput = myodr.run()
myoutput.pprint()
bet = myoutput.beta
plt.plot(mudr, F, "b^")
plt.plot(mudr, Ffitsso(bet, [mudr, dmss22]))
p[0]*X[1] in the fit function is supposed to be small compared to 1, but the fitted value of p[0] is of order 1e-18 whilst the dmss22 values are of order 1e17, so the product is not small enough.
Even worse, it's negative, meaning the function decreases, which is not supposed to happen; it's supposed to increase like the plotted data points.
Edit: I fixed it - I didn't know that it was so sensitive to the initial beta values. Setting beta0[0] = 1.5*10**(-15) makes it work!
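In code, that fix amounts to passing the explicit starting values to ODR; a sketch based on the call above (keeping the second starting value at 100 is an assumption on my part):

myodr = ODR(mydata, fitting, beta0=[1.5e-15, 100])
myoutput = myodr.run()
myoutput.pprint()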
Here is a graphical fitter with both curve_fit and ODR fitters using scipy's Differential Evolution (DE) genetic algorithm to supply initial parameter estimates for the non-linear solvers. The scipy implementation of DE uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and this requires parameter bounds within which to search - in this example, these bounds are taken from the data maximum and minimum values. Note that it is much easier to give bounds for the initial parameter estimates rather than individual specific values.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.odr
from scipy.optimize import differential_evolution
import warnings
xData = numpy.array([1.1, 2.2, 3.3, 4.4, 5.0, 6.6, 7.7, 0.0])
yData = numpy.array([1.1, 20.2, 30.3, 40.4, 50.0, 60.6, 70.7, 0.1])
def func(x, a, b, c, d, offset):  # curve fitting function for curve_fit()
    return a*numpy.exp(-(x-b)**2/(2*c**2)+d) + offset

def func_wrapper_for_ODR(parameters, x):  # parameter order for ODR
    return func(x, *parameters)

# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)

def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([minY, maxY])  # search bounds for a
    parameterBounds.append([minX, maxX])  # search bounds for b
    parameterBounds.append([minX, maxX])  # search bounds for c
    parameterBounds.append([minY, maxY])  # search bounds for d
    parameterBounds.append([0.0, maxY])   # search bounds for Offset

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
geneticParameters = generate_Initial_Parameters()
##########################
# curve_fit section
##########################
fittedParameters_curvefit, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters curve_fit:', fittedParameters_curvefit)
print()
modelPredictions_curvefit = func(xData, *fittedParameters_curvefit)
absError_curvefit = modelPredictions_curvefit - yData
SE_curvefit = numpy.square(absError_curvefit) # squared errors
MSE_curvefit = numpy.mean(SE_curvefit) # mean squared errors
RMSE_curvefit = numpy.sqrt(MSE_curvefit) # Root Mean Squared Error, RMSE
Rsquared_curvefit = 1.0 - (numpy.var(absError_curvefit) / numpy.var(yData))
print()
print('RMSE curve_fit:', RMSE_curvefit)
print('R-squared curve_fit:', Rsquared_curvefit)
print()
##########################
# ODR section
##########################
data = scipy.odr.odrpack.Data(xData,yData)
model = scipy.odr.odrpack.Model(func_wrapper_for_ODR)
odr = scipy.odr.odrpack.ODR(data, model, beta0=geneticParameters)
# Run the regression.
odr_out = odr.run()
print('Fitted parameters ODR:', odr_out.beta)
print()
modelPredictions_odr = func(xData, *odr_out.beta)
absError_odr = modelPredictions_odr - yData
SE_odr = numpy.square(absError_odr) # squared errors
MSE_odr = numpy.mean(SE_odr) # mean squared errors
RMSE_odr = numpy.sqrt(MSE_odr) # Root Mean Squared Error, RMSE
Rsquared_odr = 1.0 - (numpy.var(absError_odr) / numpy.var(yData))
print()
print('RMSE ODR:', RMSE_odr)
print('R-squared ODR:', Rsquared_odr)
print()
##########################################################
# graphics output section
def ModelsAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plots
    xModel = numpy.linspace(min(xData), max(xData))
    yModel_curvefit = func(xModel, *fittedParameters_curvefit)
    yModel_odr = func(xModel, *odr_out.beta)

    # now the models as line plots
    axes.plot(xModel, yModel_curvefit)
    axes.plot(xModel, yModel_odr)

    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelsAndScatterPlot(graphWidth, graphHeight)

Fit sigmoid curve in python

Thanks ahead! I am trying to fit a sigmoid curve over some data, below is my code
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
====== some code in between =======
plt.scatter(drag[0].w,drag[0].s, s = 10, label = 'drag%d'%0)
def sigmoid(x, x0, k):
    y = 1.0/(1.0+np.exp(-x0*(x-k)))
    return y
popt,pcov = curve_fit(sigmoid, drag[0].w, drag[0].s)
xx = np.linspace(10,1000,10)
yy = sigmoid(xx, *popt)
plt.plot(xx,yy,'r-', label='fit')
plt.legend(loc='upper left')
plt.xlabel('weight(kg)', fontsize=12)
plt.ylabel('wing span(m)', fontsize=12)
plt.show()
This is now showing the graph below, which is not quite right; the fitted curve is the red one at the bottom.
What are the possible solutions?
Also I am open to other methods of fitting logistic curves on this set of data
Thanks again!
Here is an example graphical fitter using your equation with an amplitude scaling factor for my test data. This code uses scipy's Differential Evolution genetic algorithm to provide initial parameter estimates for curve_fit(), as the scipy default initial parameter estimates of all 1.0 are not always optimal. The scipy implementation of Differential Evolution uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and this requires bounds within which to search. In this example those bounds are taken from the example data I provide, when using your own data please check that the bounds seem reasonable. Note that ranges on the parameters are much easier to provide than specific values for the initial parameter estimates.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
xData = numpy.array([19.1647, 18.0189, 16.9550, 15.7683, 14.7044, 13.6269, 12.6040, 11.4309, 10.2987, 9.23465, 8.18440, 7.89789, 7.62498, 7.36571, 7.01106, 6.71094, 6.46548, 6.27436, 6.16543, 6.05569, 5.91904, 5.78247, 5.53661, 4.85425, 4.29468, 3.74888, 3.16206, 2.58882, 1.93371, 1.52426, 1.14211, 0.719035, 0.377708, 0.0226971, -0.223181, -0.537231, -0.878491, -1.27484, -1.45266, -1.57583, -1.61717])
yData = numpy.array([0.644557, 0.641059, 0.637555, 0.634059, 0.634135, 0.631825, 0.631899, 0.627209, 0.622516, 0.617818, 0.616103, 0.613736, 0.610175, 0.606613, 0.605445, 0.603676, 0.604887, 0.600127, 0.604909, 0.588207, 0.581056, 0.576292, 0.566761, 0.555472, 0.545367, 0.538842, 0.529336, 0.518635, 0.506747, 0.499018, 0.491885, 0.484754, 0.475230, 0.464514, 0.454387, 0.444861, 0.437128, 0.415076, 0.401363, 0.390034, 0.378698])
def sigmoid(x, amplitude, x0, k):
    return amplitude * 1.0/(1.0+numpy.exp(-x0*(x-k)))

# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    val = sigmoid(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)

def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([minY, maxY])  # search bounds for amplitude
    parameterBounds.append([minX, maxX])  # search bounds for x0
    parameterBounds.append([minX, maxX])  # search bounds for k

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# by default, differential_evolution polishes its result with a local minimizer, still within the parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(sigmoid, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = sigmoid(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = sigmoid(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

Fitting to Poisson histogram

I am trying to fit a curve over the histogram of a Poisson distribution that looks like this
I have modified the fit function so that it resembles a Poisson distribution, with the parameter t as a variable. But the curve_fit function can not be plotted and I am not sure why.
def histo(bsize):
    # dt: array of event time differences, defined elsewhere in the question
    N = bsize
    # binwidth
    bw = (dt.max()-dt.min())/(N-1.)
    bin1 = dt.min() + bw*np.arange(N)
    # define the array to hold the occurrence count
    bincount = np.array([])
    for bin in bin1:
        count = np.where((dt>=bin)&(dt<bin+bw))[0].size
        bincount = np.append(bincount, count)
    # bin center
    binc = bin1 + 0.5*bw
    plt.figure()
    plt.plot(binc, bincount, drawstyle='steps-mid')
    plt.xlabel("Interval[ticks]")
    plt.ylabel("Frequency")

histo(30)
plt.xlim(0,.5e8)
plt.ylim(0,25000)
import numpy as np
from scipy.optimize import curve_fit
delta_t = 1.42e7
def func(x, t):
    return t * np.exp(- delta_t/t)
popt, pcov = curve_fit(func, np.arange(0,.5e8),histo(30))
plt.plot(popt)
The problem with your code is that you do not know what the return values of curve_fit are. It is the parameters for the fit-function and their covariance matrix - not something you can plot directly.
Binned Least-Squares Fit
In general you can get everything much, much more easily:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.special import factorial
from scipy.stats import poisson
# get poisson deviated random numbers
data = np.random.poisson(2, 1000)
# the bins should be of integer width, because poisson is an integer distribution
bins = np.arange(11) - 0.5
entries, bin_edges, patches = plt.hist(data, bins=bins, density=True, label='Data')
# calculate bin centers
bin_centers = 0.5 * (bin_edges[1:] + bin_edges[:-1])
def fit_function(k, lamb):
    '''poisson function, parameter lamb is the fit parameter'''
    return poisson.pmf(k, lamb)
# fit with curve_fit
parameters, cov_matrix = curve_fit(fit_function, bin_centers, entries)
# plot poisson-deviation with fitted parameter
x_plot = np.arange(0, 15)
plt.plot(
    x_plot,
    fit_function(x_plot, *parameters),
    marker='o', linestyle='',
    label='Fit result',
)
plt.legend()
plt.show()
This is the result:
Unbinned Maximum-Likelihood fit
An even better possibility would be to not use a histogram at all and instead carry out a maximum-likelihood fit.
But on closer examination even this is unnecessary, because the maximum-likelihood estimator for the parameter of the Poisson distribution is the arithmetic mean.
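For the Poisson case that estimator is just one line of code (a small illustration, reusing the same kind of simulated data as above):

# Maximum-likelihood estimate of the Poisson rate: just the sample mean.
data = np.random.poisson(2, 1000)
lamb_mle = data.mean()   # should come out close to 2 for this example
print(lamb_mle)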
However, if you have other, more complicated PDFs, you can use this as example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.special import factorial
from scipy import stats
def poisson(k, lamb):
    """poisson pdf, parameter lamb is the fit parameter"""
    return (lamb**k/factorial(k)) * np.exp(-lamb)

def negative_log_likelihood(params, data):
    """
    The negative log-Likelihood-Function
    """
    lnl = - np.sum(np.log(poisson(data, params[0])))
    return lnl

def negative_log_likelihood(params, data):
    '''better alternative using scipy'''
    return -stats.poisson.logpmf(data, params[0]).sum()
# get poisson deviated random numbers
data = np.random.poisson(2, 1000)
# minimize the negative log-Likelihood
result = minimize(negative_log_likelihood,  # function to minimize
                  x0=np.ones(1),            # start value
                  args=(data,),             # additional arguments for function
                  method='Powell',          # minimization method, see docs
                  )
# result is a scipy optimize result object, the fit parameters
# are stored in result.x
print(result)
# plot poisson-distribution with fitted parameter
x_plot = np.arange(0, 15)
plt.plot(
    x_plot,
    stats.poisson.pmf(x_plot, result.x),
    marker='o', linestyle='',
    label='Fit result',
)
plt.legend()
plt.show()
Thank you for the wonderful discussion!
You might want to consider the following:
1) Instead of computing "poisson", compute "log poisson", for better numerical behavior
2) Instead of using "lamb", use the logarithm (let me call it "log_mu"), to avoid the fit "wandering" into negative values of "mu".
So
def log_poisson(k, log_mu): return k*log_mu - loggamma(k+1) - math.exp(log_mu)
Where "loggamma" is the scipy.special.loggamma function.
Actually, in the above fit, the "loggamma" term only adds a constant offset to the function being minimized, so one can just do:
def log_poisson_(k, log_mu): return k*log_mu - math.exp(log_mu)
NOTE: log_poisson_() is not the same as log_poisson(), but when used for minimization in the manner above, it will give the same fitted minimum (the same value of mu, up to numerical issues). The value of the function being minimized will have been offset, but one doesn't usually care about that anyway.
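A minimal sketch of how those two suggestions could be plugged into the minimize() call from the earlier answer; the function and variable names below are mine, not from the discussion above:

import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood_logmu(params, data):
    '''Negative log-likelihood parameterised by log_mu = log(mu); the
    loggamma(k+1) term is dropped because it is constant in log_mu.'''
    log_mu = params[0]
    return -np.sum(data * log_mu - np.exp(log_mu))

data = np.random.poisson(2, 1000)
result = minimize(negative_log_likelihood_logmu, x0=np.zeros(1),
                  args=(data,), method='Powell')
print(np.exp(result.x))   # back-transform to mu; should be close to data.mean()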