Trouble writing OptimizationFunction for automatic forward differentiation during Parameter Estimation of an ODEProblem

I am trying to learn Julia for its potential use in parameter estimation. I am interested in estimating kinetic parameters of chemical reactions, which usually involves optimizing reaction parameters with multiple independent batches of experiments. I have successfully optimized a single batch, but need to expand the problem to use many different batches. In developing a sample problem, I am trying to optimize using two toy batches. I know there are probably smarter ways to do this (subject of a future question), but my current workflow involves calling an ODEProblem for each batch, calculating its loss against the data, and minimizing the sum of the residuals for the two batches. Unfortunately, I get an error when initiating the optimization with Optimization.jl. The current code and error are shown below:
using DifferentialEquations, Plots, DiffEqParamEstim
using Optimization, ForwardDiff, OptimizationOptimJL, OptimizationNLopt
using Ipopt, OptimizationGCMAES, Optimisers
using Random
#Experimental data, species B is NOT observed in the data
times = [0.0, 0.071875, 0.143750, 0.215625, 0.287500, 0.359375, 0.431250,
0.503125, 0.575000, 0.646875, 0.718750, 0.790625, 0.862500,
0.934375, 1.006250, 1.078125, 1.150000]
A_obs = [1.0, 0.552208, 0.300598, 0.196879, 0.101175, 0.065684, 0.045096,
0.028880, 0.018433, 0.011509, 0.006215, 0.004278, 0.002698,
0.001944, 0.001116, 0.000732, 0.000426]
C_obs = [0.0, 0.187768, 0.262406, 0.350412, 0.325110, 0.367181, 0.348264,
0.325085, 0.355673, 0.361805, 0.363117, 0.327266, 0.330211,
0.385798, 0.358132, 0.380497, 0.383051]
P_obs = [0.0, 0.117684, 0.175074, 0.236679, 0.234442, 0.270303, 0.272637,
0.274075, 0.278981, 0.297151, 0.297797, 0.298722, 0.326645,
0.303198, 0.277822, 0.284194, 0.301471]
#Create additional data sets for a multi data set optimization
#Simple noise added to data for testing
times_2 = times[2:end] .+ rand(range(-0.05,0.05,100))
P_obs_2 = P_obs[2:end] .+ rand(range(-0.05,0.05,100))
A_obs_2 = A_obs[2:end].+ rand(range(-0.05,0.05,100))
C_obs_2 = C_obs[2:end].+ rand(range(-0.05,0.05,100))
#ki = [2.78E+00, 1.00E-09, 1.97E-01, 3.04E+00, 2.15E+00, 5.27E-01] #Target optimized parameters
ki = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1] #Initial guess of parameters
IC = [1.0, 0.0, 0.0, 0.0] #Initial condition for each species
tspan1 = (minimum(times),maximum(times)) #tuple timespan of data set 1
tspan2 = (minimum(times_2),maximum(times_2)) #tuple timespan of data set 2
# data = VectorOfArray([A_obs,C_obs,P_obs])'
data = vcat(A_obs',C_obs',P_obs') #Make multidimensional array containing all observed data for dataset1, transpose to match shape of ODEProblem output
data2 = vcat(A_obs_2',C_obs_2',P_obs_2') #Make multidimensional array containing all observed data for dataset2, transpose to match shape of ODEProblem output
#make dictionary containing data, time, and initial conditions
keys1 = ["A","B"]
keys2 = ["time","obs","IC"]
entryA =[times,data,IC]
entryB = [times_2, data2,IC]
nest=[Dict(zip(keys2,entryA)),Dict(zip(keys2,entryB))]
exp_dict = Dict(zip(keys1,nest)) #data dictionary
#rate equations in power law form r = k [A][B]
function rxn(x, k)
A = x[1]
B = x[2]
C = x[3]
P = x[4]
k1 = k[1]
k2 = k[2]
k3 = k[3]
k4 = k[4]
k5 = k[5]
k6 = k[6]
r1 = k1 * A
r2 = k2 * A * B
r3 = k3 * C * B
r4 = k4 * A
r5 = k5 * A
r6 = k6 * A * B
return [r1, r2, r3, r4, r5, r6] #returns reaction rate of each equation
end
#Mass balance differential equations
function mass_balances(di,x,args,t)
k = args
r = rxn(x, k)
di[1] = - r[1] - r[2] - r[4] - r[5] - r[6] #Species A
di[2] = + r[1] - r[2] - r[3] - r[6] #Species B
di[3] = + r[2] - r[3] + r[4] #Species C
di[4] = + r[3] + r[5] + r[6] #Species P
end
function ODESols(time,uo,parms)
time_init = (minimum(time),maximum(time))
prob = ODEProblem(mass_balances,uo,time_init,parms)
sol = solve(prob, Tsit5(), reltol=1e-8, abstol=1e-8,save_idxs = [1,3,4],saveat=time) #Integrate prob
return sol
end
function cost_function(data_dict,parms)
res_dict = Dict(zip(keys(data_dict),[0.0,0.0]))
for key in keys(data_dict)
pred = ODESols(data_dict[key]["time"],data_dict[key]["IC"],parms)
loss = L2Loss(data_dict[key]["time"],data_dict[key]["obs"])
err = loss(pred)
res_dict[key] = err
end
residual = sum(res_dict[key] for key in keys(res_dict))
#show typeof(residual)
return residual
end
lb = [0.0,0.0,0.0,0.0,0.0,0.0] #parameter lower bounds
ub = [10.0,10.0,10.0,10.0,10.0,10.0] #parameter upper bounds
optfun = Optimization.OptimizationFunction(cost_function,Optimization.AutoForwardDiff())
optprob = Optimization.OptimizationProblem(optfun,exp_dict, ki,lb=lb,ub=ub,reltol=1E-8) #Set up optimization problem
optsol=solve(optprob, BFGS(),maxiters=10000) #Solve optimization problem
println(optsol.u) #print solution
When I call optsol I get the error:
ERROR: MethodError: no method matching ForwardDiff.GradientConfig(::Optimization.var"#89#106"{OptimizationFunction{true, Optimization.AutoForwardDiff{nothing}, typeof(cost_function), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Vector{Float64}}, ::Dict{String, Dict{String, Array{Float64}}}, ::ForwardDiff.Chunk{2})
Searching online suggests that my cost_function may not be generic enough for ForwardDiff to handle. However, I am not sure how to identify where the issue is in this function, or whether it is related to the functions (mass_balances and rxn) that are called within cost_function. Another potential issue is that I am not calling the functions appropriately when building the OptimizationFunction or the OptimizationProblem, but I cannot identify the issue here either.
Thank you for any suggestions and your help in troubleshooting this application!

res_dict = Dict(zip(keys(data_dict),[0.0,0.0]))
This dictionary is declared with the wrong element type: its values are fixed to Float64, so it cannot hold the dual numbers that ForwardDiff passes through the function. Use
zerotype = zero(parms[1])
res_dict = Dict(zip(keys(data_dict),[zerotype ,zerotype]))
or
res_dict = Dict(zip(keys(data_dict),zeros(eltype(parms),2)))
Either way, you want your intermediate calculations to match the type of parms when using AutoForwardDiff().
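The type problem described above can be sketched outside Julia as well. The following is a minimal illustration in Python, not ForwardDiff itself: the `Dual` class is a hypothetical stand-in for a forward-mode dual number, and `array("d")` plays the role of a container that only accepts plain floats.

```python
from array import array

class Dual:
    """Hypothetical minimal dual number: a value plus its derivative."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)
    __rmul__ = __mul__

def loss_typed(p):
    acc = array("d", [0.0, 0.0])  # float-only storage, like a Float64-valued Dict
    acc[0] = p * p                # raises TypeError when p is a Dual
    acc[1] = 3.0 * p
    return acc[0] + acc[1]

def loss_generic(p):
    acc = [0 * p, 0 * p]          # zeros of the same type as p
    acc[0] = p * p
    acc[1] = 3.0 * p
    return acc[0] + acc[1]

x = Dual(2.0, 1.0)                # seed derivative dp/dp = 1
try:
    loss_typed(x)
    typed_ok = True
except TypeError:
    typed_ok = False

result = loss_generic(x)          # carries p^2 + 3p and its derivative 2p + 3
```

With plain floats both versions agree; only the generic one survives when the input carries derivative information.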

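In addition to the variable type fix suggested by Chris, my model also had an issue with the order of the arguments of cost_function and the order in which I passed them to OptimizationProblem. Optimization.jl calls the objective as f(u, p), with the optimization variables u first and the fixed parameters or data p second, and OptimizationProblem(optfun, u0, p) takes them in the same order. So the signature should be cost_function(parms, data_dict), and the problem should be built with ki before exp_dict. This solution was pointed out by Contradict.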
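The "variables first, data second" convention can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `golden_section` is a hypothetical one-dimensional solver, and the two batches are synthetic, noise-free samples of exp(-k t) rather than the kinetic data from the question.

```python
import math

def cost(u, p):
    """Sum of squared residuals over all batches; u = parameters, p = data."""
    k = u[0]
    total = 0.0
    for batch in p.values():
        for t, obs in zip(batch["time"], batch["obs"]):
            total += (math.exp(-k * t) - obs) ** 2
    return total

def golden_section(objective, lo, hi, p, tol=1e-8):
    """Hypothetical 1-D solver that always calls objective(u, p)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if objective([c], p) < objective([d], p):
            b = d
        else:
            a = c
    return [(a + b) / 2.0]

# Two synthetic batches generated with the same rate constant.
k_true = 1.5
times_a = [0.0, 0.5, 1.0]
times_b = [0.25, 0.75, 1.25]
data = {
    "A": {"time": times_a, "obs": [math.exp(-k_true * t) for t in times_a]},
    "B": {"time": times_b, "obs": [math.exp(-k_true * t) for t in times_b]},
}

u_opt = golden_section(cost, 0.0, 10.0, data)
```

Swapping the two arguments of `cost` would hand the data dictionary to the solver as if it were the parameter vector, which is the same kind of mismatch the error message in the question reports.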

Is non-identical not enough to be considered 'distinct' for kmeans centroids?

I have an issue with kmeans clustering when I provide the centroids. I saw the same problem already asked (K-means: Initial centers are not distinct), but the solution in that post is not working in my case.
I selected the centroids using ClusterR::KMeans_arma. I confirmed that my centroids are not identical using mgcv::uniquecombs, but still got the "initial centers are not distinct" error.
> dim(t(dat))
[1] 13540 11553
> centroids = ClusterR::KMeans_arma(data = t(dat), centers = 561,
n_iter = 50, seed_mode = "random_subset",
verbose = FALSE, CENTROIDS = NULL)
> dim(centroids)
[1] 561 11553
> x = mgcv::uniquecombs(centroids)
> dim(x)
[1] 561 11553
> res = kmeans(t(dat), centers = centroids, iter.max = 200)
Error in kmeans(t(dat), centers = centroids, iter.max = 200) :
initial centers are not distinct
Any suggestion to resolve this? Thanks!
I replicated the issue you've mentioned with the following data:
cols = 13540
rows = 11553
set.seed(1)
vec_dat = runif(rows * cols)
dat = matrix(vec_dat, nrow = rows, ncol = cols)
dim(dat)
dat = t(dat)
dim(dat)
There is no 'centers' parameter in the 'ClusterR::KMeans_arma()' function, therefore I've assumed you actually mean 'clusters',
centroids = ClusterR::KMeans_arma(data = dat,
clusters = 561,
n_iter = 50,
seed_mode = "random_subset",
verbose = TRUE,
CENTROIDS = NULL)
str(centroids)
dim(centroids)
The 'centroids' object is a matrix of class "k-means clustering". If your intention is to obtain the cluster assignments, you can use
clust = ClusterR::predict_KMeans(data = dat,
CENTROIDS = centroids,
threads = 6)
length(unique(clust)) # 561
class(centroids) # "k-means clustering"
If you want to pass the 'centroids' to the base R 'kmeans' function, you have to set the 'class' of the 'centroids' object to NULL. The base R 'kmeans' function internally uses the base R 'duplicated()' function (you can see this with print(kmeans) in the R console). 'duplicated()' does not recognize the 'centroids' object as a matrix or data.frame (it is an object of class "k-means clustering"), so it does not perform the check row-wise. Therefore, the following should work for your case:
class(centroids) = NULL
dups = duplicated(centroids)
sum(dups) # this should actually give 0
res = kmeans(dat, centers = centroids, iter.max = 200)
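The difference between a row-wise and an element-wise duplicate check can be sketched in a few lines of Python. This only illustrates the idea and is not how R implements 'duplicated()'; the function names and the tiny centroid matrix are made up for the example.

```python
def has_duplicate_rows(matrix):
    """Row-wise check: two centroids are duplicates only if every coordinate matches."""
    seen = set()
    for row in matrix:
        key = tuple(row)
        if key in seen:
            return True
        seen.add(key)
    return False

def has_duplicate_elements(matrix):
    """Element-wise check: any repeated value anywhere counts as a duplicate."""
    flat = [value for row in matrix for value in row]
    return len(set(flat)) < len(flat)

# Three distinct centroids that nevertheless share individual coordinate values.
centroids = [[0.0, 1.0],
             [1.0, 0.0],
             [0.5, 0.5]]

rows_duplicated = has_duplicate_rows(centroids)
elements_duplicated = has_duplicate_elements(centroids)
```

A check that does not treat the object as a matrix can therefore flag duplicates even when all centroid rows are distinct.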
I've made a few adjustments to "ClusterR::predict_KMeans()": in particular, I've added the "threads" parameter and a check for duplicates. If you want to compute the cluster assignments using multiple cores, you have to install the package from GitHub using
remotes::install_github('mlampros/ClusterR',
upgrade = 'always',
dependencies = TRUE,
repos = 'https://cloud.r-project.org/')
The changes will take effect in the next version of the CRAN package which will be "1.2.2"
UPDATE regarding output and performance (based on your comment):
data(dietary_survey_IBS, package = 'ClusterR')
kmeans_arma = function(data) {
km_cl = ClusterR::KMeans_arma(data,
clusters = 2,
n_iter = 10,
seed_mode = "random_subset",
seed = 1)
pred_cl = ClusterR::predict_KMeans(data = data,
CENTROIDS = km_cl,
threads = 1)
return(pred_cl)
}
km_arma = kmeans_arma(data = dietary_survey_IBS)
km_algos = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")
for (algo in km_algos) {
cat('base-kmeans-algo:', algo, '\n')
km_base = kmeans(dietary_survey_IBS,
centers = 2,
iter.max = 10,
nstart = 1, # can be set to 5 or 10 etc.
algorithm = algo)
km_cl = as.vector(km_base$cluster)
print(table(km_arma, km_cl))
cat('--------------------------\n')
}
microbenchmark::microbenchmark(kmeans(dietary_survey_IBS,
centers = 2,
iter.max = 10,
nstart = 1, # can be set to 5 or 10 etc.
algorithm = algo), kmeans_arma(data = dietary_survey_IBS), times = 100)
I don't see any significant difference in the output clusters between the base R 'kmeans' and the 'kmeans_arma' function for any of the available base R 'kmeans' algorithms (you can also test this on your own data sets). I am not sure which algorithm the 'armadillo' library uses internally. Moreover, the base R 'kmeans' includes the 'nstart' parameter (consult the documentation for more information). Regarding performance, you won't see any substantial difference for small to medium data sets. However, because the armadillo library uses OpenMP internally, I think that on a computer with more than one core the 'ClusterR::KMeans_arma' function will return the 'centroids' faster for big data sets.
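When comparing two clusterings through a contingency table, as the loop above does with table(km_arma, km_cl), the label numbers themselves are arbitrary, so identical partitions may appear with swapped labels. A minimal Python sketch of that comparison, with made-up label vectors and hypothetical function names:

```python
from collections import Counter

def contingency(labels_a, labels_b):
    """Count how often each pair of labels occurs together."""
    return Counter(zip(labels_a, labels_b))

def same_partition(labels_a, labels_b):
    """True if both labelings group the points identically, up to renaming."""
    table = contingency(labels_a, labels_b)
    rows = {a for a, _ in table}
    cols = {b for _, b in table}
    # Identical partitions: every label in A pairs with exactly one label in B,
    # and vice versa, so the table has one non-zero cell per row and column.
    return len(table) == len(rows) == len(cols)

arma = [1, 1, 2, 2, 1]
base = [2, 2, 1, 1, 2]      # same grouping, labels swapped
other = [1, 2, 1, 2, 1]     # a genuinely different grouping

table = contingency(arma, base)
```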

Is there a wrapper library for solving optimisation problems by declaring known and unknown variables?

cvxpy has a very neat way to write out the optimisation form without worrying too much about converting it into a "standard" matrix form as this is done internally somehow. Best to explain with an example:
import cvxpy as cp
import numpy as np
from scipy.optimize import LinearConstraint, minimize

def cvxpy_implementation():
    var1 = cp.Variable()
    var2 = cp.Variable()
    constraints = [
        var1 <= 3,
        var2 >= 2
    ]
    obj_fun = cp.Minimize(var1**2 + var2**2)
    problem = cp.Problem(obj_fun, constraints)
    problem.solve()
    return var1.value, var2.value

def scipy_implementation1():
    A = np.diag(np.ones(2))
    lb = np.array([-np.inf, 2])
    ub = np.array([3, np.inf])
    con = LinearConstraint(A, lb, ub)
    def obj_fun(x):
        return (x**2).sum()
    result = minimize(obj_fun, [0, 0], constraints=con)
    return result.x

def scipy_implementation2():
    con = [
        {'type': 'ineq', 'fun': lambda x: 3 - x[0]},
        {'type': 'ineq', 'fun': lambda x: x[1] - 2},]
    def obj_fun(x):
        return (x**2).sum()
    result = minimize(obj_fun, [0, 0], constraints=con)
    return result.x
All of the above give the correct result, but the cvxpy implementation is much "easier" to write out. Specifically, I don't have to worry about the form of the inequalities, and I can give variables useful names when writing the inequalities out. Compare that to the scipy1 and scipy2 implementations: in the first case I have to write out these extra infs, and in the second case I have to remember which variable is which. You can imagine a case where I have 100 variables; while concatenating them will ultimately need to be done, I'd like to be able to write the problem out as in cvxpy.
Question:
Has anyone implemented this for scipy? Or is there an alternative library that could make this work?
Thank you.
I wrote something up that does this and seems to cover the main issues I had in mind.
The general idea is that you define variables, build an expression from them as you would normally write it out, and the solver class then optimises over the defined variables.
https://github.com/evan54/optimisation/blob/master/var.py
The example below illustrates a simple use case
# fake data
a = 2
m = 3
x = np.linspace(0, 10)
y = a * x + m + np.random.randn(len(x))
a_ = Variable()
m_ = Variable()
y_ = a_ * x + m_
error = y_ - y
prob = Problem((error**2).sum(), None)
prob.minimize()
print(f'a = {a}, a_ = {a_}')
print(f'm = {m}, m_ = {m_}')
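Independently of the linked code, the core of such a wrapper is the mapping between named variables and positions in the flat vector that scipy-style solvers expect. The following is a minimal sketch of that mapping only; `VarSpace` and its methods are hypothetical names invented here, and no solver is called.

```python
class VarSpace:
    """Hypothetical registry that maps variable names to vector positions."""

    def __init__(self):
        self.names = []

    def add(self, name):
        """Register a named variable; its index is its registration order."""
        self.names.append(name)
        return name

    def unpack(self, x):
        """Turn a flat vector into a {name: value} dictionary."""
        return dict(zip(self.names, x))

    def wrap(self, fn):
        """Turn fn(**named_values) into a function of a flat vector."""
        return lambda x: fn(**self.unpack(x))

space = VarSpace()
space.add("var1")
space.add("var2")

# The problem from the question, written with names instead of indices.
objective = space.wrap(lambda var1, var2: var1**2 + var2**2)
constraints = [
    space.wrap(lambda var1, var2: 3 - var1),   # var1 <= 3, in ">= 0" form
    space.wrap(lambda var1, var2: var2 - 2),   # var2 >= 2, in ">= 0" form
]

value_at_point = objective([3.0, 2.0])
slack_at_point = [c([3.0, 2.0]) for c in constraints]
```

The wrapped callables take a flat vector, so they have the shape that the 'fun' entries in scipy_implementation2 use, while the problem itself is written with variable names.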

Scipy Optimize minimize returns the initial value

I am building machine learning models for a certain data set. Then, based on the constraints and bounds for the outputs and inputs, I am trying to find the input parameters for the most minimized answer.
The problem which I am facing is that, when the model is a linear regression model or something like lasso, the minimization works perfectly fine.
However, when the model is "Decision Tree", it constantly returns the very initial value that is given to it. So basically, it does not enforce the constraints.
import numpy as np
import pandas as pd
from scipy.optimize import minimize
I am using the very first sample from the input data set for the optimization. As it is only one sample, I need to reshape it to (1,-1) as well.
x = df_in.iloc[0,:]
x = np.array(x)
x = x.reshape(1,-1)
This is my Objective function:
def objective(x):
    x = np.array(x)
    x = x.reshape(1,-1)
    y = 0
    for n in range(df_out.shape[1]):
        y = Model[n].predict(x)
    Y = y[0]
    return Y
Here I am defining the bounds of inputs:
range_max = pd.DataFrame(range_max)
range_min = pd.DataFrame(range_min)
B_max=[]
B_min =[]
for i in range(range_max.shape[0]):
    b_max = range_max.iloc[i]
    b_min = range_min.iloc[i]
    B_max.append(b_max)
    B_min.append(b_min)
B_max = pd.DataFrame(B_max)
B_min = pd.DataFrame(B_min)
bnds = pd.concat([B_min, B_max], axis=1)
These are my constraints:
con_min = pd.DataFrame(c_min)
con_max = pd.DataFrame(c_max)
Here I am defining the constraint function:
def const(x):
    x = np.array(x)
    x = x.reshape(1,-1)
    Y = []
    for n in range(df_out.shape[1]):
        y = Model[n].predict(x)[0]
        Y.append(y)
    Y = pd.DataFrame(Y)
    a4 =[]
    for k in range(Y.shape[0]):
        a1 = Y.iloc[k,0] - con_min.iloc[k,0]
        a2 = con_max.iloc[k, 0] - Y.iloc[k,0]
        a3 = [a2,a1]
        a4 = np.concatenate([a4, a3])
    return a4
c = const(x)
con = {'type': 'ineq', 'fun': const}
This is where I try to minimize. I do not pick a method, as the automatically picked method has worked so far.
sol = minimize(fun = objective, x0=x,constraints=con, bounds=bnds)
So the actual constraints are:
c_min = [0.20,1000]
c_max = [0.3,1600]
and the max and min range for the boundaries are:
range_max = [285,200,8,85,0.04,1.6,10,3.5,20,-5]
range_min = [215,170,-1,60,0,1,6,2.5,16,-18]
I think you should check the output of 'sol'. At times, the algorithm is not able to complete its line search, and in that case the optimizer returns the initial parameters. To check for this, look at the message associated with 'sol' (sol.message). There may be various reasons for this behavior, so check the output of 'sol' and act accordingly.
Arad,
If you have not yet resolved your issue, try using scipy.optimize.differential_evolution instead of scipy.optimize.minimize. I ran into similar issues, particularly with decision trees, because their step-like predictions make the gradient zero almost everywhere and undefined at the jumps.
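The effect of a step-like model on a gradient-based method can be seen in a small Python sketch. The `step_model` function is a made-up stand-in for a decision-tree prediction, and the grid search is only the simplest possible derivative-free method, not differential evolution.

```python
def step_model(x):
    """Stand-in for a decision-tree prediction: piecewise constant in x."""
    if x < 1.0:
        return 5.0
    if x < 2.0:
        return 3.0
    return 4.0

def fd_gradient(f, x, h=1e-8):
    """Forward finite-difference estimate of the derivative."""
    return (f(x + h) - f(x)) / h

x0 = 0.5
grad = fd_gradient(step_model, x0)   # both evaluations fall on the same plateau
x_after_step = x0 - 0.1 * grad       # a gradient step leaves x0 unchanged

# A derivative-free search only compares function values, so it can leave the plateau.
candidates = [i * 0.25 for i in range(13)]   # 0.0, 0.25, ..., 3.0
best = min(candidates, key=step_model)
```

Because the estimated gradient is exactly zero at the starting point, a gradient-based optimizer sees no direction of improvement and reports the initial value back.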

LoadError using approximate bayesian criteria

I am getting an error that is confusing me.
using DifferentialEquations
using RecursiveArrayTools # for VectorOfArray
using DiffEqBayes
f2 = @ode_def_nohes LotkaVolterraTest begin
dx = x*(1 - x - A*y)
dy = rho*y*(1 - B*x - y)
end A B rho
u0 = [1.0;1.0]
tspan = (0.0,10.0)
p = [0.2,0.5,0.3]
prob = ODEProblem(f2,u0,tspan,p)
sol = solve(prob,Tsit5())
t = collect(linspace(0,10,200))
randomized = VectorOfArray([(sol(t[i]) + .01randn(2)) for i in 1:length(t)])
data = convert(Array,randomized)
priors = [Uniform(0.0, 2.0), Uniform(0.0, 2.0), Uniform(0.0, 2.0)]
bayesian_result_abc = abc_inference(prob, Tsit5(), t, data,
priors;num_samples=500)
Returns the error
ERROR: LoadError: DimensionMismatch("first array has length 400 which does not match the length of the second, 398.")
while loading..., in expression starting on line 20.
I have not been able to locate any array of size 400 or 398.
Thanks for your help.
Take a look at https://github.com/JuliaDiffEq/DiffEqBayes.jl/issues/52; the error was due to a bug in how t was passed. This has been fixed on master, so you can use that, or wait for the new release coming soon with the 1.0 upgrades, which will include the fix too.
Thanks!

Julia: Error when trying to minimize a function with optimize

I have the following function with multiple arguments that I would like to minimize with Optim.jl:
function post(parm,y,x,n)
# Evaluate the log of the marginal posterior for parm at a point
fgamma=zeros(n,1);
for ii = 1:2
fgamma = fgamma + parm[ii+1]*(x[:,ii+1].^parm[4]);
end
fgamma = fgamma.^(1/parm[4]);
fgamma = fgamma + parm[1]*ones(n,1);
lpost = .5*n*log.((y - fgamma)'*(y-fgamma));
end
However, when I try to use optimize, Julia returns an error.
Old error (with parm):
MethodError: no method matching finite_difference!(::##1#2, ::Array{Float64,2}, ::Array{Float64,2}, ::Symbol)
New error(with parm2):
MethodError: Cannot `convert` an object of type Array{Float64,2} to an object of type Float64
The complete script with data and optimize call I am using is this:
using Distributions
using Optim
n = 200;
k = 3;
x = ones(n,k);
fgamma=zeros(n,1);
gam = [1.01; 0.6; 0.8; 1.5];
x[:,2] = rand(Chisq(10),n);
x[:,3] = rand(Chisq(5),n);
epsl = rand(Normal(0,1),n);
y = zeros(n,1);
for i = 1:n
y[i,1] = gam[1] + (gam[2]*x[i,2]^gam[4] + gam[3]*x[i,3]^gam[4])^(1/gam[4]) + epsl[i];
end
# Sim
bols = inv(x'x)x'y;
s2 = (y-x*bols)'*(y-x*bols)/(n-k);
sse=(n-k)*s2;
bolscov = s2.*inv(x'*x);
bolssd=zeros(k,1);
for i = 1:k
bolssd[i,1]=sqrt(bolscov[i,i]);
end
# Calculate posterior mode and Hessian at mode
nparam=k+1;
parm = ones(nparam,1);
parm[1:k,1]=bols;
parm2 = vec(parm);
opt = Optim.Options(f_tol = 1e-8, iterations = 1000);
Optim.after_while!{T}(d, state::Optim.BFGSState{T}, method::BFGS, options) = global invH = state.invH
res = optimize(p -> post(p,y,x,n), parm2, BFGS(), opt)
Does anyone know what I am doing wrong? I think that there is a problem with the type of lpost in the function post, since it returns a 1x1 Array{Float64,2}. Unfortunately, I couldn't work out how to handle it.
The error message
MethodError: Cannot `convert` an object of type Array{Float64,2} to an object of type Float64
is caused by an attempt to convert a matrix into a scalar. In general this is not possible, but when the matrix is a 1x1 matrix (as the question pointed out), there is a natural transformation: scalar = matrix[1,1].
optimize wants a scalar value returned because it is a scalar non-linear optimization routine. Optimizing a vector-valued function is hard even to define unambiguously (concepts such as Pareto optima are an attempt to do so).
So, after this prelude, the fix is simple, together with a fix for a complex-number issue that @fst (the poster) tackled later. Again, a one-dimensional scalar is required, so real(...) was used to make a scalar out of a complex value (more precisely an ordered scalar, as complex numbers are scalars too). The resulting post function is:
function post(parm,y,x,n)
# Evaluate the log of the marginal posterior for parm at a point
fgamma=zeros(n,1);
for ii = 1:2
fgamma = fgamma + parm[ii+1]*(x[:,ii+1].^parm[4]);
end
fgamma = fgamma.^Complex(1/parm[4]);
fgamma = fgamma + parm[1]*ones(n,1);
lpost = .5*n*log.((y - fgamma)'*(y-fgamma));
return real(lpost[1,1])
end
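The two conversions used in the fix, unwrapping a 1x1 matrix and taking the real part of a complex value, can be sketched in Python. This is an analogy under stated assumptions rather than a translation of the Julia code: `to_real_scalar` is a hypothetical helper, a nested list stands in for the 1x1 matrix, and the toy `objective` only reproduces the shape of the problem.

```python
def to_real_scalar(value):
    """Unwrap a 1x1 nested list and drop any imaginary part."""
    if isinstance(value, list):
        if len(value) != 1 or len(value[0]) != 1:
            raise ValueError("expected a 1x1 matrix")
        value = value[0][0]
    return complex(value).real

def objective(p):
    # In Python, a negative float raised to a fractional power is complex.
    powered = (-8.0) ** (1.0 / p)
    return [[powered]]          # mimic a 1x1 matrix result

result = to_real_scalar(objective(3.0))
```

Wrapping the objective this way gives the optimizer the ordered, real-valued scalar it needs to compare candidate points.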