Julia: Error when trying to minimize a function with optimize - optimization

I have the following function with multiple arguments that I would like to minimize with Optim.jl:
function post(parm,y,x,n)
# Evaluate the log of the marginal posterior for parm at a point
fgamma=zeros(n,1);
for ii = 1:2
fgamma = fgamma + parm[ii+1]*(x[:,ii+1].^parm[4]);
end
fgamma = fgamma.^(1/parm[4]);
fgamma = fgamma + parm[1]*ones(n,1);
lpost = .5*n*log.((y - fgamma)'*(y-fgamma));
end
However, when i try to use optimize, Julia returns an error.
Old error (with parm):
MethodError: no method matching finite_difference!(::##1#2, ::Array{Float64,2}, ::Array{Float64,2}, ::Symbol)
New error(with parm2):
MethodError: Cannot `convert` an object of type Array{Float64,2} to an object of type Float64
The complete script with data and optimize call I am using is this:
using Distributions
using Optim
n = 200;
k = 3;
x = ones(n,k);
fgamma=zeros(n,1);
gam = [1.01; 0.6; 0.8; 1.5];
x[:,2] = rand(Chisq(10),n);
x[:,3] = rand(Chisq(5),n);
epsl = rand(Normal(0,1),n);
y = zeros(n,1);
for i = 1:n
y[i,1] = gam[1] + (gam[2]*x[i,2]^gam[4] + gam[3]*x[i,3]^gam[4])^(1/gam[4]) + epsl[i];
end
# Sim
bols = inv(x'x)x'y;
s2 = (y-x*bols)'*(y-x*bols)/(n-k);
sse=(n-k)*s2;
bolscov = s2.*inv(x'*x);
bolssd=zeros(k,1);
for i = 1:k
bolssd[i,1]=sqrt(bolscov[i,i]);
end
# Calculate posterior mode and Hessian at mode
nparam=k+1;
parm = ones(nparam,1);
parm[1:k,1]=bols;
parm2 = vec(parm);
opt = Optim.Options(f_tol = 1e-8, iterations = 1000);
Optim.after_while!{T}(d, state::Optim.BFGSState{T}, method::BFGS, options) = global invH = state.invH
res = optimize(p -> post(p,y,x,n), parm2, BFGS(), opt)
Does anyone knows what I am doing wrong? I think that the there is a problem with the type of lpost in the function post, since it returns a 1x1 Array{Float64,2}. Unfortunately, i couldn't handle it well.

The error message
MethodError: Cannot `convert` an object of type Array{Float64,2} to an object of type Float64
is caused by an attempt to convert a matrix into a scalar. In general this is not possible, but when the matrix is a 1x1 matrix (as the question pointed out), there is a natural transformation: scalar = matrix[1,1].
optimize wants a scalar value returned because it is a scalar non-linear optimization routine. Optimizing a vector value is even hard to unambiguously define (concepts such as Pareto optima is an attempt to do so).
So, after this prelude, the fix is simple, together with an issue with Complex optimization #fst (the poster) later tackled. Again, a single dimensional scalar is required, so real(...) was used to make a scalar out of a complex value (more precisely an ordered scalar, as complex numbers are scalars too). The resulting post function is:
function post(parm,y,x,n)
# Evaluate the log of the marginal posterior for parm at a point
fgamma=zeros(n,1);
for ii = 1:2
fgamma = fgamma + parm[ii+1]*(x[:,ii+1].^parm[4]);
end
fgamma = fgamma.^Complex(1/parm[4]);
fgamma = fgamma + parm[1]*ones(n,1);
lpost = .5*n*log.((y - fgamma)'*(y-fgamma));
return real(lpost[1,1])
end

Related

Trouble writing OptimizationFunction for automatic forward differentiation during Parameter Estimation of an ODEProblem

I am trying to learn Julia for its potential use in parameter estimation. I am interested in estimating kinetic parameters of chemical reactions, which usually involves optimizing reaction parameters with multiple independent batches of experiments. I have successfully optimized a single batch, but need to expand the problem to use many different batches. In developing a sample problem, I am trying to optimize using two toy batches. I know there are probably smarter ways to do this (subject of a future question), but my current workflow involves calling an ODEProblem for each batch, calculating its loss against the data, and minimizing the sum of the residuals for the two batches. Unfortunately, I get an error when initiating the optimization with Optimization.jl. The current code and error are shown below:
using DifferentialEquations, Plots, DiffEqParamEstim
using Optimization, ForwardDiff, OptimizationOptimJL, OptimizationNLopt
using Ipopt, OptimizationGCMAES, Optimisers
using Random
#Experimental data, species B is NOT observed in the data
times = [0.0, 0.071875, 0.143750, 0.215625, 0.287500, 0.359375, 0.431250,
0.503125, 0.575000, 0.646875, 0.718750, 0.790625, 0.862500,
0.934375, 1.006250, 1.078125, 1.150000]
A_obs = [1.0, 0.552208, 0.300598, 0.196879, 0.101175, 0.065684, 0.045096,
0.028880, 0.018433, 0.011509, 0.006215, 0.004278, 0.002698,
0.001944, 0.001116, 0.000732, 0.000426]
C_obs = [0.0, 0.187768, 0.262406, 0.350412, 0.325110, 0.367181, 0.348264,
0.325085, 0.355673, 0.361805, 0.363117, 0.327266, 0.330211,
0.385798, 0.358132, 0.380497, 0.383051]
P_obs = [0.0, 0.117684, 0.175074, 0.236679, 0.234442, 0.270303, 0.272637,
0.274075, 0.278981, 0.297151, 0.297797, 0.298722, 0.326645,
0.303198, 0.277822, 0.284194, 0.301471]
#Create additional data sets for a multi data set optimization
#Simple noise added to data for testing
times_2 = times[2:end] .+ rand(range(-0.05,0.05,100))
P_obs_2 = P_obs[2:end] .+ rand(range(-0.05,0.05,100))
A_obs_2 = A_obs[2:end].+ rand(range(-0.05,0.05,100))
C_obs_2 = C_obs[2:end].+ rand(range(-0.05,0.05,100))
#ki = [2.78E+00, 1.00E-09, 1.97E-01, 3.04E+00, 2.15E+00, 5.27E-01] #Target optimized parameters
ki = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1] #Initial guess of parameters
IC = [1.0, 0.0, 0.0, 0.0] #Initial condition for each species
tspan1 = (minimum(times),maximum(times)) #tuple timespan of data set 1
tspan2 = (minimum(times_2),maximum(times_2)) #tuple timespan of data set 2
# data = VectorOfArray([A_obs,C_obs,P_obs])'
data = vcat(A_obs',C_obs',P_obs') #Make multidimensional array containing all observed data for dataset1, transpose to match shape of ODEProblem output
data2 = vcat(A_obs_2',C_obs_2',P_obs_2') #Make multidimensional array containing all observed data for dataset2, transpose to match shape of ODEProblem output
#make dictionary containing data, time, and initial conditions
keys1 = ["A","B"]
keys2 = ["time","obs","IC"]
entryA =[times,data,IC]
entryB = [times_2, data2,IC]
nest=[Dict(zip(keys2,entryA)),Dict(zip(keys2,entryB))]
exp_dict = Dict(zip(keys1,nest)) #data dictionary
#rate equations in power law form r = k [A][B]
function rxn(x, k)
A = x[1]
B = x[2]
C = x[3]
P = x[4]
k1 = k[1]
k2 = k[2]
k3 = k[3]
k4 = k[4]
k5 = k[5]
k6 = k[6]
r1 = k1 * A
r2 = k2 * A * B
r3 = k3 * C * B
r4 = k4 * A
r5 = k5 * A
r6 = k6 * A * B
return [r1, r2, r3, r4, r5, r6] #returns reaction rate of each equation
end
#Mass balance differential equations
function mass_balances(di,x,args,t)
k = args
r = rxn(x, k)
di[1] = - r[1] - r[2] - r[4] - r[5] - r[6] #Species A
di[2] = + r[1] - r[2] - r[3] - r[6] #Species B
di[3] = + r[2] - r[3] + r[4] #Species C
di[4] = + r[3] + r[5] + r[6] #Species P
end
function ODESols(time,uo,parms)
time_init = (minimum(time),maximum(time))
prob = ODEProblem(mass_balances,uo,time_init,parms)
sol = solve(prob, Tsit5(), reltol=1e-8, abstol=1e-8,save_idxs = [1,3,4],saveat=time) #Integrate prob
return sol
end
function cost_function(data_dict,parms)
res_dict = Dict(zip(keys(data_dict),[0.0,0.0]))
for key in keys(data_dict)
pred = ODESols(data_dict[key]["time"],data_dict[key]["IC"],parms)
loss = L2Loss(data_dict[key]["time"],data_dict[key]["obs"])
err = loss(pred)
res_dict[key] = err
end
residual = sum(res_dict[key] for key in keys(res_dict))
#show typeof(residual)
return residual
end
lb = [0.0,0.0,0.0,0.0,0.0,0.0] #parameter lower bounds
ub = [10.0,10.0,10.0,10.0,10.0,10.0] #parameter upper bounds
optfun = Optimization.OptimizationFunction(cost_function,Optimization.AutoForwardDiff())
optprob = Optimization.OptimizationProblem(optfun,exp_dict, ki,lb=lb,ub=ub,reltol=1E-8) #Set up optimization problem
optsol=solve(optprob, BFGS(),maxiters=10000) #Solve optimization problem
println(optsol.u) #print solution
when I call optsol I get the error:
ERROR: MethodError: no method matching ForwardDiff.GradientConfig(::Optimization.var"#89#106"{OptimizationFunction{true, Optimization.AutoForwardDiff{nothing}, typeof(cost_function), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED_NO_TIME), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Vector{Float64}}, ::Dict{String, Dict{String, Array{Float64}}}, ::ForwardDiff.Chunk{2})
Searching online suggests that the issue may be that my cost_function function is not generic enough for ForwardDiff to handle, however I am not sure how to identify where the issue is in this function, or whether it is related to the functions (mass_balances and rxn) that are called within cost_function. Another potential issue is that I am not calling the functions appropriately when building the OptimizationFunction or the OpptimizationProblem, but I cannot identify the issue here either.
Thank you for any suggestions and your help in troubleshooting this application!
res_dict = Dict(zip(keys(data_dict),[0.0,0.0]))
This dictionary is declared to the wrong type.
zerotype = zero(params[1])
res_dict = Dict(zip(keys(data_dict),[zerotype ,zerotype]))
or
res_dict = Dict(zip(keys(data_dict),zeros(eltype(params),2)))
Either way, you want your intermediate calculations to match the type of params when using AutoForwardDiff().
In addition to the variable type specification suggested by Chris, my model also had an issue with the order of the arguments of cost_function and how I passed the arguments to the problem in optprob. This solution was shown by Contradict here

Domain error when using Nelder Mead algorithm in Julia

I am struggling with optimization in Julia.
I used to use Matlab but I am trying to work on Julia instead.
The following is the code I wrote.
using Optim
V = fill(1.0, (18,14,5))
agrid = range(-2, stop=20, length=18)
dgrid = range(0.01, stop=24, length=14)
#zgrid = [0.5; 0.75; 1.0; 1.25; 1.5]
zgrid = [0.7739832502827438; 0.8797631785217791; 1.0; 1.1366695315439874; 1.2920176239404275]
# function
function adj_utility(V,s_a,s_d,s_z,i_z,c_a,c_d)
consumption = s_z + 1.0125*s_a + (1-0.018)*s_d - c_a - c_d - 0.05*(1-0.018)*s_d
if consumption >= 0
return (1/(1-2)) * (( (consumption^0.88) * (c_d^(1-0.88)) )^(1-2))
end
if consumption < 0
return -99999999
end
end
# Optimization
i_a = 1
i_d = 3
i_z = 1
utility_adj(x) = -adj_utility(V,agrid[i_a],dgrid[i_d],zgrid[i_z],i_z,x[1],x[2])
result1 = optimize(utility_adj, [1.0, 1.0], NelderMead())
If I use zgrid = [0.5; 0.75; 1.0; 1.25; 1.5], then the code works.
However, if I use zgrid = [0.7739832502827438; 0.8797631785217791; 1.0; 1.1366695315439874; 1.2920176239404275], I got an error message "DomainError with -0.3781249999999996"
In the function, if the consumption is less than 0 then the value should be -9999999 so I am not sure why I am getting this message.
Any help would be appreciated.
Thank you.
Raising negative numbers to non-integer powers returns complex numbers, which is where your error is coming from.
julia> (-0.37)^(1-0.88)
ERROR: DomainError with -0.37:
Exponentiation yielding a complex result requires a complex argument.
Replace x^y with (x+0im)^y, Complex(x)^y, or similar.
Stacktrace:
[1] throw_exp_domainerror(::Float64) at ./math.jl:37
[2] ^(::Float64, ::Float64) at ./math.jl:888
[3] top-level scope at REPL[5]:1
You have a constraint that consumption must be strictly positive, but if you want consumption to be a real number you will need constraints that c_d is positive as well. You can either add this directly to your objective function as above, or you can use one of the constrained optimization algorithms in NLopt, which is available in Julia via the NLopt package.

Multi-GPU TFF simulation errors "Detected dataset reduce op in multi-GPU TFF simulation"

I ran my code for an emotion detection model using Tensorflow Federated simulation. My code work perfectly fine using CPUs only. However, I received this error when trying to run TFF with GPU.
ValueError: Detected dataset reduce op in multi-GPU TFF simulation: `use_experimental_simulation_loop=True` for `tff.learning`; or use `for ... in iter(dataset)` for your own dataset iteration.Reduce op will be functional after b/159180073.
What is this error about and how can I fix it? I tried to search many places but found no answer.
Here is the call stack if it help. It is very long so I pasted into this link: https://pastebin.com/b1R93gf1
EDIT:
Here is the code containing iterative_process
def startTraining(output_file):
iterative_process = tff.learning.build_federated_averaging_process(
model_fn,
client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.01),
server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=1.0),
use_experimental_simulation_loop=True
)
flstate = iterative_process.initialize()
evaluation = tff.learning.build_federated_evaluation(model_fn)
output_file.write(
'round,available_users,loss,sparse_categorical_accuracy,val_loss,val_sparse_categorical_accuracy,test_loss,test_sparse_categorical_accuracy\n')
curr_round_result = [0,0,100,0,100,0]
min_val_loss = 100
for round in range(1,ROUND_COUNT + 1):
available_users = fetch_available_users_and_increase_time(ROUND_DURATION_AVERAGE + random.randint(-ROUND_DURATION_VARIATION, ROUND_DURATION_VARIATION + 1))
if(len(available_users) == 0):
write_to_file(curr_round_result)
continue
train_data = make_federated_data(available_users, 'train')
flstate, metrics = iterative_process.next(flstate, train_data)
val_data = make_federated_data(available_users, 'val')
val_metrics = evaluation(flstate.model, val_data)
curr_round_result[0] = round
curr_round_result[1] = len(available_users)
curr_round_result[2] = metrics['train']['loss']
curr_round_result[3] = metrics['train']['sparse_categorical_accuracy']
curr_round_result[4] = val_metrics['loss']
curr_round_result[5] = val_metrics['sparse_categorical_accuracy']
write_to_file(curr_round_result)
Here is the code for make_federated_data
def make_federated_data(users, dataset_type):
offset = 0
if(dataset_type == 'val'):
offset = train_size
elif(dataset_type == 'test'):
offset = train_size + val_size
global LOADED_USER
for id in users:
if(id + offset not in LOADED_USER):
LOADED_USER[id + offset] = getDatasetFromFilePath(filepaths[id + offset])
return [
LOADED_USER[id + offset]
for id in users
]
TFF does support Multi-GPU, and as the error message says one of two things is happening:
The code is using tff.learning but using the default use_experimental_simulation_loop argument value of False. With multiple GPUs, this must be set to True when using APIs including tff.learning.build_federated_averaging_process. For example, calling with:
training_process = tff.learning.build_federated_averaging_process(
..., use_experimental_simulation_loop=True)
The code contains a custom tf.data.Dataset.reduce(...) call somewhere. This must be replaced with Python code that iterates over the dataset. For example:
result = dataset.reduce(initial_state=0, reduce_func=lambda s, x: s + x)
becomes
s = 0
for x in iter(dataset):
s += x
I realized that TFF has not yet supported multi-GPUs. Therefore, we need to limit number visible of GPUs to just 1, using:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

LoadError using approximate bayesian criteria

I am getting an error that is confusing me.
using DifferentialEquations
using RecursiveArrayTools # for VectorOfArray
using DiffEqBayes
f2 = #ode_def_nohes LotkaVolterraTest begin
dx = x*(1 - x - A*y)
dy = rho*y*(1 - B*x - y)
end A B rho
u0 = [1.0;1.0]
tspan = (0.0,10.0)
p = [0.2,0.5,0.3]
prob = ODEProblem(f2,u0,tspan,p)
sol = solve(prob,Tsit5())
t = collect(linspace(0,10,200))
randomized = VectorOfArray([(sol(t[i]) + .01randn(2)) for i in 1:length(t)])
data = convert(Array,randomized)
priors = [Uniform(0.0, 2.0), Uniform(0.0, 2.0), Uniform(0.0, 2.0)]
bayesian_result_abc = abc_inference(prob, Tsit5(), t, data,
priors;num_samples=500)
Returns the error
ERROR: LoadError: DimensionMismatch("first array has length 400 which does not match the length of the second, 398.")
while loading..., in expression starting on line 20.
I have not been able to locate any array of size 400 or 398.
Thanks for your help.
Take a look at https://github.com/JuliaDiffEq/DiffEqBayes.jl/issues/52, that was due to an error in passing the t. This has been fixed on master so you can use that or wait some time, we will have a new release soon with the 1.0 upgrades which will have this fixed too.
Thanks!

Gaussian Log Likelyhood loss function in Tensorflow

I need to implement a gaussian log likelihood loss function in Tensorflow, however I am not sure if what I wrote is correct. I think this is the correct definition of the loss function.
I went around implementing it like this:
two_pi = 2*np.pi
def gaussian_density_function(x, mean, stddev):
stddev2 = tf.pow(stddev, 2)
z = tf.multiply(two_pi, stddev2)
z = tf.pow(z, 0.5)
arg = -0.5*(x-mean)
arg = tf.pow(arg, 2)
arg = tf.div(arg, stddev2)
return tf.divide(tf.exp(arg), z)
mean_x, var_x = tf.nn.moments(dae_output_tensor, [0])
stddev_x = tf.sqrt(var_x)
loss_op_AE = -gaussian_density_function(inputs, mean_x, stddev_x)
loss_op_AE = tf.reduce_mean(loss_op_AE)
I want to use this as the loss function for an autoencoder, however, I am not sure this implementation is correct, since I get a NaN out of loss_op_AE.
EDIT: I also tried using:
mean_x, var_x = tf.nn.moments(autoencoder_output, axes=[1,2])
stddev_x = tf.sqrt(var_x)
dist = tf.contrib.distributions.Normal(mean_x, stddev_x)
loss_op_AE = -dist.pdf(inputs)
and I get the same NaN values.
Model the stddev as log stddev, this should fix the nan issue. So instead of pretending stddev is sigma^2, pretend it is the natural logarithm of sigma^2.