PyMC3 PK modelling. Model can't resolve the parameters used to create the data set - Bayesian

I am new to PK modelling and pymc3, but I have been playing around with pymc3 and trying to implement a simple PK model as part of my own learning. Specifically, a model that captures this relationship:
C(t) = Dose / V * exp(-(CL / V) * t)
where C(t) (Cpred) is the concentration at time t, Dose is the dose given, V is the volume of distribution, and CL is clearance.
I have generated some test data (30 subjects) with values of CL = 2 and V = 10, for 3 doses (100, 200, 300), at time points 0, 1, 2, 4, 8, 12. I also included some random error on CL (normal distribution, mean 0, OMEGA = 0.66) and residual unexplained error on the observations, DV = Cpred + error, where the error is normally distributed with SD = 0.33 (SIGMA). In addition I included a transformation of CL and V with respect to weight (random integer weights from 50 to 99): CLi = CL * WT/70; Vi = V * WT/70.
import numpy as np
import pandas as pd

# Create data for modelling
np.random.seed(0)
# Subject ID's
data = pd.DataFrame(np.arange(1,31), columns=['subject'])
# Dose
data['dose'] = np.array([100,100,100,100,100,100,100,100,100,100,
200,200,200,200,200,200,200,200,200,200,
300,300,300,300,300,300,300,300,300,300])
# Random Body Weight
data['WT'] = np.random.randint(50,100, size =30)
# Fixed Clearance and Volume for the population
data['CLpop'] =2
data['Vpop']=10
# Error rate for individual clearance rate
OMEGA = 0.66
# Individual clearance rate as a function of weight and omega
data['CLi'] = data['CLpop']*(data['WT']/70)+ np.random.normal(0, OMEGA )
# Individual Volume as a function of weight
data['Vi'] = data['Vpop']*(data['WT']/70)
# Expand dataframe to account for time points
data = pd.concat([data]*6,ignore_index=True )
data = data.sort_values('subject')
# Add in time points
data['time'] = np.tile(np.array([0,1,2,4,8,12]), 30)
# Create concentration values using equation
data['Cpred'] = data['dose']/data['Vi'] *np.exp(-1*data['CLi']/data['Vi']*data['time'])
# Error rate for DV
SIGMA = 0.33
# Create dependent variable from Cpred + error
data['DV']= data['Cpred'] + np.random.normal(0, SIGMA )
# Create new df with only data for modelling...
df = data[['subject','dose','WT', 'time', 'DV']]
Create arrays ready for model...
# Prepare data from df to model specific arrays
time = np.array(df['time'])
dose = np.array(df['dose'])
DV = np.array(df['DV'])
WT = np.array(df['WT'])
n_patients = len(data['subject'].unique())
subject = data['subject'].values-1
I have built a simple model in pymc3 ....
from pymc3 import Model, Lognormal, Normal
import theano.tensor as tt

pk_model = Model()
with pk_model:
    # Hyperparameter priors
    sigma = Lognormal('sigma', mu=0, tau=0.01)
    V = Lognormal('V', mu=2, tau=0.01)
    CL = Lognormal('CL', mu=1, tau=0.01)
    # Transformation with respect to weight
    CLi = CL*(WT)/70
    Vi = V*(WT)/70
    # Expected value of outcome (tt.exp keeps the expression symbolic)
    pred = dose/Vi*tt.exp(-1*(CLi/Vi)*time)
    # Likelihood (sampling distribution) of observations
    conc = Normal('conc', mu=pred, tau=sigma, observed=DV)
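The sampling step is not shown here; a call along these lines (the number of draws is just illustrative) is what produces the trace referred to below:
from pymc3 import sample, traceplot

# Draw posterior samples (2000 draws is an illustrative choice)
with pk_model:
    trace = sample(2000)
# Plot marginal posteriors and sampling traces
traceplot(trace)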
My expectation was that I should be able to recover from the data the constants and error rates that were originally used to generate it, but I have not been able to do this, although I can get close. In this example...
data['CLi'].mean()
> 2.322473543135788
data['Vi'].mean()
> 10.147619047619049
And the trace shows....
So my questions are..
Is my code structured correctly and are there any glaring mistakes that I have overlooked that might account for this difference?
Can I structure the pymc3 model to better reflect the relationship from which I have generated the data?
What would be your suggestions to improve the model?
Thanks in advance!

I'm going to answer my own question!
I implemented a hierarchical model following the example found here...
GLM - hierarchical
and it works a treat. I also noticed an error in the way I was applying the random error in the dataframe; I should use
data['CLer'] = np.random.normal(scale=OMEGA, size=30)
so that each subject gets a different draw of the error (my original code added the same single draw to every subject).
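For anyone interested, here is a minimal sketch of the hierarchical structure I mean, in the spirit of that example. The priors (HalfNormal on the variability terms) and variable names are illustrative choices, not necessarily what the linked example uses:
import pymc3 as pm

with pm.Model() as hier_model:
    # Population-level parameters
    CL_pop = pm.Lognormal('CL_pop', mu=1, tau=0.01)
    V_pop = pm.Lognormal('V_pop', mu=2, tau=0.01)
    # Between-subject variability on clearance
    omega_CL = pm.HalfNormal('omega_CL', sd=1)
    # One clearance deviation per subject
    eta_CL = pm.Normal('eta_CL', mu=0, sd=omega_CL, shape=n_patients)
    # Individual parameters, with the same weight scaling used to simulate the data
    CLi = CL_pop * (WT / 70) + eta_CL[subject]
    Vi = V_pop * (WT / 70)
    # Residual error and likelihood
    sigma = pm.HalfNormal('sigma', sd=1)
    pred = dose / Vi * pm.math.exp(-CLi / Vi * time)
    conc = pm.Normal('conc', mu=pred, sd=sigma, observed=DV)
With the per-subject eta_CL terms the model partially pools clearance across subjects instead of forcing a single population value.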


Scipy Optimization for Inventory allocation

I am new to optimization with SciPy in Python and very enthusiastic about it!
I am trying to optimize inventory allocation for spare parts (very low demand) using SciPy.
The goal is to minimize the inventory value while obtaining a target service rate (otif for on-time in full).
I defined a function otif that returns a scalar (the achieved service level).
There is also a function stock_val to evaluate the value of the stock (taking the allocation and the price)
Finally I have minimize_stock in which I define an initial guess, provide bounds and constraint.
The constraint is simply to reach the targeted otif.
My issue is that minimize fails and just returns the initial guess.
As entry:
- part_demand is a dataframe with a column with material part number and columns of ordered quantity per months.
- part_price is a dataframe with a column with material part number and price.
import numpy as np
from scipy.optimize import minimize

def otif(part_demand, stocks):
    """
    Returns the OnTimeInFull covered with stock using parts demand and stocks
    """
    df_occ = part_demand[(part_demand != 0)].count(axis=1)
    df_cov = part_demand[(part_demand != 0) & (part_demand.le(stocks, axis=0))].count(axis=1)
    return df_cov.sum() / df_occ.sum()

def stock_val(stocks, part_price):
    """
    Returns the stock value of stock using parts prices
    """
    return np.dot(stocks, part_price)

def minimize_stock(target_otif, part_demand, part_price):
    """
    Returns the optimal stock value and distribution for a part demand and targeted OnTimeInFull
    """
    n = part_demand.shape[0]
    init_guess = np.repeat(10, n)
    bounds = ((0, 10),) * n  # an N-tuple of 2-tuples
    # construct the constraints
    return_is_target = {'type': 'eq',
                        'fun': lambda stocks: target_otif - otif(part_demand, stocks)}
    stocks = minimize(stock_val, init_guess,
                      args=(part_price,), method='SLSQP',
                      options={'disp': True},
                      constraints=(return_is_target),
                      bounds=bounds)
    return stocks
# Define entry Variables
material_items = pnMonthly[dataMthColMask]
#display(material_items)
df_groupby = PNtable.groupby("Material").last()
material_price = df_groupby['UnitPrice']
n = material_items.shape[0]
init_guess = np.repeat(10000,n)
print(stock_val(init_guess, material_price))
sol = minimize_stock(target_otif=0.89, part_demand=material_items, part_price=material_price)
display(sol)
print('OTIF is ', format(otif(material_items, sol.x)))
print('Stock value is ', format(stock_val(sol.x, material_price)))
Thank you in advance for your help!
I tried different algorithms apart from SLSQP and still get failure.
I am not sure about which algorithm is best for this kind of optimization problem.
I am also not sure about the syntax for minimize or what the conditions are for the algorithm to work.
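For reference, here is a minimal sketch of the calling convention I am following, on made-up toy data and with a smooth placeholder constraint instead of my real otif (names and numbers here are purely illustrative):
import numpy as np
from scipy.optimize import minimize

# Toy data, made up for illustration only
demand = np.array([3.0, 1.0, 5.0])
price = np.array([10.0, 25.0, 4.0])
target = 0.6

def stock_value(stocks, price):
    # Objective: total value of the allocated stock
    return np.dot(stocks, price)

def service(stocks, demand):
    # Smooth placeholder for a service-level measure
    return np.mean(stocks / (stocks + demand))

cons = {'type': 'eq', 'fun': lambda s: target - service(s, demand)}
bounds = ((0, 10),) * len(demand)
res = minimize(stock_value, x0=np.repeat(5.0, len(demand)),
               args=(price,), method='SLSQP',
               bounds=bounds, constraints=[cons],
               options={'disp': True})
print(res.success, res.x, res.fun)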

Nonlinear programming APOPT solver for optimal EV charging infeasible with variable boundary <= 0 (GEKKO Python)

I have tried to do optimal EV charging scheduling using the GEKKO package. However, my code gets stuck on a variable boundary condition when the lower bound is set to less than or equal to zero, i.e., x = m.Array(m.Var, n_var, value=0, lb=0, ub=1.0). The error message is 'Unsuccessful with error code 0'.
Below is my Python script. If you have any advice on this problem, please don't hesitate to let me know.
Thanks,
Chitchai
#------------------------------
import numpy as np
import pandas as pd
import math
import os
from gekko import GEKKO
print('...... Preparing data for optimal EV charging ........')
#---------- Read data from csv files -----------
Main_path = os.path.dirname(os.path.realpath(__file__))
Baseload_path = Main_path + '/baseload_data.csv'
TOU_path = Main_path + '/TOUprices.csv'
EV_path = Main_path + '/EVtravel.csv'
df_Baseload = pd.read_csv(Baseload_path, index_col= 'Time')
df_TOU = pd.read_csv(TOU_path, index_col= 'Time')
df_EVtravel = pd.read_csv(EV_path, index_col= 'EV_no')
#Create and change run directory
rd= r'.\RunDir'
if not os.path.isdir(os.path.abspath(rd)):
os.mkdir(os.path.abspath(rd))
#--------------------EV charging optimization function----------------#
def OptEV(tou_rate, EV_data, P_baseload):
    """This function runs the EV charging optimization for each household"""
    #Initialize EV model and traveling data in 96 intervals (interval = 15 min)
    s_time = 4*(EV_data[0]//1) + math.ceil(100*(EV_data[0]%1)/15)   #starting time
    d_time = 4*(EV_data[1]//1) + math.floor(100*(EV_data[1]%1)/15)  #departure time
    Pch_rating = EV_data[2]   #charging rated power (kW)
    kWh_bat = EV_data[3]      #battery rated capacity (kWh)
    int_SoC = EV_data[4]      #initial battery SoC (p.u.)
    #Calculate the charging period
    if d_time <= s_time:
        ch_period = 96 + d_time - s_time
    else:
        ch_period = d_time - s_time
    Np = int(ch_period)
    print('charging period = %d intervals' % (Np))
    #Construct relevant data lists based on the charging period
    ch_time = [0]*Np       #charging time step list
    price_rate = [0]*Np    #electricity price list
    kW_baseload = [0]*Np   #kW house baseload power list
    #Re-arrange charging time, electricity price rate and baseload
    for i in range(Np):
        t_step = int(s_time) + i
        if t_step <= 95:   #before midnight
            ch_time[i] = t_step
            price_rate[i] = tou_rate[t_step]
            kW_baseload[i] = P_baseload[t_step]/1000   #active house baseload
        else:              #after midnight
            ch_time[i] = t_step - 96
            price_rate[i] = tou_rate[t_step-96]
            kW_baseload[i] = P_baseload[t_step-96]/1000
    #Initialize model
    m = GEKKO(remote=False)   # remote=False solves locally
    m.path = os.path.abspath(rd)   # change run directory
    #Define parameters
    ch_eff = m.Const(value=0.90)     #charging/discharging efficiency
    alpha = m.Const(value=0.00005)   #regularization constant for the battery profile
    net_load = [None]*Np     #net metering household load power array
    elec_price = [None]*Np   #purchased electricity price array
    SoC = [None]*(Np+1)      #SoC of batteries array
    #Initialize variables
    n_var = Np   #number of decision variables
    x = m.Array(m.Var, n_var, value=0, lb=0, ub=1.0)   #decision charging variables
    #Calculate relevant evaluated parameters
    #x[0] = m.Intermediate(-1.025)
    SoC[0] = m.Intermediate(int_SoC)   #initial battery SoC
    for i in range(Np):
        #Net-load metering evaluation
        net_load[i] = m.Intermediate(kW_baseload[i] + x[i]*Pch_rating)
        #Electricity cost price evaluation (cent/kWh)
        Neg_pr = (1/4)*net_load[i]*price_rate[i]   # Reverse power cost
        Pos_pr = (1/4)*net_load[i]*price_rate[i]   # Purchased power cost
        elec_price[i] = m.Intermediate(m.if3(net_load[i], Neg_pr, Pos_pr))
        #Current battery SoC evaluation
        j = i + 1
        SoC_dch = (1/4)*(x[i]*Pch_rating/ch_eff)/kWh_bat   #Discharging (V2G)
        SoC_ch = (1/4)*ch_eff*x[i]*Pch_rating/kWh_bat      #Charging
        SoC[j] = m.Intermediate(m.if3(x[i], SoC[j-1]+SoC_dch, SoC[j-1]+SoC_ch))
    #m.solve(disp=False)
    #-------Constraint functions--------#
    #EV battery constraints
    m.Equation(SoC[-1] >= 0.80)   #required departure SoC (minimum = 80%)
    for i in range(Np):
        j = i + 1
        m.Equation(SoC[j] >= 0.20)   #lower SoC limit = 20%
    for i in range(Np):
        j = i + 1
        m.Equation(SoC[j] <= 0.95)   #upper SoC limit = 95%
    #Household net-power constraints
    for i in range(Np):
        m.Equation(net_load[i] >= -10.0)   #lower net-load power limit
    for i in range(Np):
        m.Equation(net_load[i] <= 10.0)    #upper net-load power limit
    #Objective functions
    elec_cost = m.Intermediate(m.sum(elec_price))   #electricity cost
    #Battery degradation cost
    bat_cost = m.Intermediate(m.sum([alpha*xi**2 for xi in x]))
    #bat_cost = 0   #do not consider battery degradation cost
    m.Minimize(elec_cost + bat_cost)   # objective
    #Set global options
    m.options.IMODE = 3   #steady-state optimization
    #Solve simulation
    try:
        m.solve(disp=True)   # solve
        print('--------------Results---------------')
        print('Objective Function= ' + str(m.options.objfcnval))
        print('x= ', x)
        print('price_rate= ', price_rate)
        print('net_load= ', net_load)
        print('elec_price= ', elec_price)
        print('SoC= ', SoC)
        print('Charging time= ', ch_time)
    except:
        print('*******Not successful*******')
        print('--------------No convergence---------------')
        # from gekko.apm import get_file
        # print(m._server)
        # print(m._model_name)
        # f = get_file(m._server, m._model_name, 'infeasibilities.txt')
        # f = f.decode().replace('\r', '')
        # with open('infeasibilities.txt', 'w') as fl:
        #     fl.write(str(f))
    Pcharge = x
    return ch_time, Pcharge
#---------------------- Run scripts ---------------------#
TOU= df_TOU['Prices'] #electricity TOU prices rate (c/kWh)
Load1= df_Baseload['Load1']
EV_data = [17.15, 8.15, 3.3, 24, 0.50] #[start,final,kW_rate,kWh_bat,int_SoC]
OptEV(TOU, EV_data, Load1)
#--------------------- End of a script --------------------#
When the solver fails to find a solution and reports "Solution Not Found", there is a troubleshooting method to diagnose the problem. The first thing to do is to look at the solver output with m.solve(disp=True). The solver may have identified either an infeasible problem or it reached the maximum number of iterations without converging to a solution. In your case, it identified the problem as infeasible.
Infeasible Problem
If the solver failed because of infeasible equations then it found that the combination of variables and equations is not solvable. You can try to relax the variable bounds or identify which equation is infeasible with the infeasibilities.txt file in the run directory. Retrieve the infeasibilities.txt file from the local run directory that you can view with m.open_folder() when m=GEKKO(remote=False).
Maximum Iteration Limit
If the solver reached the default iteration limit (m.options.MAX_ITER=250) then you can either try to increase this limit or else try the strategies below.
Try a different solver by setting m.options.SOLVER=1 for APOPT, m.options.SOLVER=2 for BPOPT, m.options.SOLVER=3 for IPOPT, or m.options.SOLVER=0 to try all the available solvers.
Find a feasible solution first by solving a square problem where the number of variables is equal to the number of equations. Gekko has a couple of options to help with this, including m.options.COLDSTART=1 (sets STATUS=0 for all FVs and MVs) or m.options.COLDSTART=2 (sets STATUS=0 and performs a block diagonal triangular decomposition to find possibly infeasible equations).
Once a feasible solution is found, try optimizing with this solution as the initial guess.
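For example, these options can be set on the model before solving; the values below are only a starting point:
m.options.SOLVER = 3       # 1=APOPT, 2=BPOPT, 3=IPOPT, 0=try all available solvers
m.options.MAX_ITER = 500   # raise the default iteration limit of 250 if needed
m.options.COLDSTART = 2    # solve a square problem first to locate infeasible equations
m.solve(disp=True)
m.open_folder()            # open the local run directory to inspect infeasibilities.txt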

how to get better Kriging result graphs in openturns?

I performed spherical Kriging, but I can't seem to get good output graphs.
The coordinates (x and y) are around 51 (latitude) and around 6.5 (longitude), and my observations range from -70 to +10.
Here is my code:
import openturns as ot
import pandas as pd
# your input / output data can be easily formatted as samples for openturns
df = pd.read_csv("kreuzkerpenutm.csv")
inputdata = ot.Sample(df[['x','y']].values)
outputdata = ot.Sample(df[['z']].values)
dimension = 2 # dimension of your input (x,y)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={'bbox_to_anchor':(1,1), 'loc':"upper left"})
view.getFigure().tight_layout()
Here is my output:
kriging metamodel graph
I don't know why my graph won't show my inputs as well as my kriging results.
Thanks for ideas and help.
If the input data is not scaled in [-1,1]^d, the kriging metamodel may have trouble identifying the scale parameters using maximum likelihood optimization. In order to help with this, we may:
provide a better starting point for the scale parameters of the covariance model (this is trick "A" below),
set the bounds of the optimization algorithm so that the interval where the parameters are searched for correspond to the data at hand (this is trick "B" below).
This is what the following script does, using simulated data instead of a csv data file. In the script, I create the data using a g function which is scaled so that it produces results in the [-10, 70] range, as in your problem. Please look carefully at the setScale() method, which sets the initial value of the covariance model: this is the starting point of the optimization algorithm. Then look at the setOptimizationBounds() method, which sets the bounds of the optimization algorithm.
import openturns as ot
dimension = 2 # dimension of your input (x,y)
distribution = ot.ComposedDistribution([ot.Uniform(-10.0, 50.0)] * dimension)
inputdata = distribution.getSample(100)
g = ot.SymbolicFunction(["x", "y"], ["30 + 3.0 * sin(x / 10.0) * (y / 10.0) ^ 2"])
outputdata = g(inputdata)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
covarianceModel.setScale(inputdata.getMax()) # Trick A
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
# Trick B, v2
x_range = inputdata.getMax() - inputdata.getMin()
scale_max_factor = 2.0 # Must be > 1, tune this to match your problem
scale_min_factor = 0.1 # Must be < 1, tune this to match your problem
maximum_scale_bounds = scale_max_factor * x_range
minimum_scale_bounds = scale_min_factor * x_range
scaleOptimizationBounds = ot.Interval(minimum_scale_bounds, maximum_scale_bounds)
algo.setOptimizationBounds(scaleOptimizationBounds)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
metamodel.setInputDescription(["x", "y"])
metamodel.setOutputDescription(["z"])
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={"bbox_to_anchor": (1, 1), "loc": "upper left"})
view.getFigure().tight_layout()
The previous script produces the following figure.
There are other ways to implement trick B. Here is one provided by J.Pelamatti:
# Trick B, v3
for d in range(X_train.getDimension()):
    dist = scipy.spatial.distance.pdist(X_train[:, d])
    scale_max_factor = 2.0  # Must be > 1, tune this to match your problem
    scale_min_factor = 0.1  # Must be < 1, tune this to match your problem
    maximum_scale_bounds = scale_max_factor * np.max(dist)
    minimum_scale_bounds = scale_min_factor * np.min(dist)
This topic is discussed in this particular thread in OT's forum.
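To use those per-dimension values, they can be gathered into an Interval and passed to setOptimizationBounds. Here is a sketch, where X_train is assumed to be the input sample (inputdata above) and algo the KrigingAlgorithm:
import numpy as np
import scipy.spatial
import openturns as ot

scale_min_factor = 0.1   # Must be < 1, tune this to match your problem
scale_max_factor = 2.0   # Must be > 1, tune this to match your problem
lower_bounds = []
upper_bounds = []
for d in range(X_train.getDimension()):
    # pairwise distances between the training points along dimension d
    dist = scipy.spatial.distance.pdist(np.array(X_train.getMarginal(d)))
    lower_bounds.append(scale_min_factor * np.min(dist))
    upper_bounds.append(scale_max_factor * np.max(dist))
algo.setOptimizationBounds(ot.Interval(lower_bounds, upper_bounds))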
Sorry for the late answer.
Which version of openturns are you using?
You probably have an embedded transformation of the (input) data, which makes the data range between approximately (-3, 3) (standard scaling). In that case the kriging result should contain the transformation.
In more recent openturns implementations, this feature has been removed.
Hope this can help.
Cheers

How to apply bounds on a variable when performing optimisation in Pytorch?

I am trying to use Pytorch for non-convex optimisation, trying to maximise my objective (so minimise in SGD). I would like to bound my dependent variable x > 0, and also have the sum of my x values be less than 1000.
I think I have the penalty implemented correctly in the form of a ramp penalty, but I am struggling with the bounding of the x variable. In PyTorch you can set bounds using clamp, but it doesn't seem appropriate in this case; I think this is because optim needs the gradients to stay intact under the hood. Full working example:
import torch
from torch.autograd import Variable
import numpy as np
def objective(x, a, b, c):  # Want to maximise this quantity (so minimise in SGD)
    d = 1 / (1 + torch.exp(-a * (x)))
    # Checking constraint
    exceeded_limit = constraint(x).item()
    #print(exceeded_limit)
    obj = torch.sum(d * (b * c - x))
    # If over the limit, add ramp penalty
    if exceeded_limit < 0:
        obj = obj - (exceeded_limit * 10)
        print("Exceeded limit")
    return - obj

def constraint(x, limit = 1000):  # Must be > 0
    return limit - x.sum()

N = 1000
# x is the variable to optimise for
x = Variable(torch.Tensor([1 for ii in range(N)]), requires_grad=True)
a = Variable(torch.Tensor(np.random.uniform(0,100,N)), requires_grad=True)
b = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
c = Variable(torch.Tensor(np.random.rand(N)), requires_grad=True)
# Would like to include the clamp
# x = torch.clamp(x, min=0)
# Non-convex method
opt = torch.optim.SGD([x], lr=.01)
for i in range(10000):
    # Zeroing gradients
    opt.zero_grad()
    # Evaluating the objective
    obj = objective(x, a, b, c)
    # Calculate gradients
    obj.backward()
    opt.step()
    if i%1000==0: print("Objective: %.1f" % -obj.item())

print("\nObjective: {}".format(-obj))
print("Limit: {}".format(constraint(x).item()))
if torch.sum(x<0) > 0: print("Bounds not met")
if constraint(x).item() < 0: print("Constraint not met")
Any suggestions on how to impose the bounds would be appreciated, either using clamp or otherwise, as would general advice on non-convex optimisation with PyTorch. This is a much simpler, scaled-down version of the problem I am working on, so I am trying to find a lightweight solution if possible. I am considering a workaround such as transforming the x variable with an exponential function, but then you would have to scale the function to avoid the positive values becoming infinite, and I want some flexibility in setting the constraint.
I ran into the same problem: I also wanted to apply bounds to a variable in PyTorch, and I solved it with Way 3 below.
Your example is a little complex and I am still learning English, so I give a simpler example.
For example, suppose there is a trainable variable v whose bounds are (-1, 1):
v = torch.tensor((0.5, ), requires_grad=True)
v_loss = xxxx
optimizer.zero_grad()
v_loss.backward()
optimizer.step()
Way 1 raises RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
v.clamp_(-1, 1)
Way 2 raises RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed.
v = torch.clamp(v, -1, +1) # equal to v = v.clamp(-1, +1)
Way 3: no error. This is how I solved the problem:
with torch.no_grad():
    v[:] = v.clamp(-1, +1)  # you must use v[:] = ... instead of v = ...
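Applied to the question's loop, the same pattern looks roughly like this; this is a sketch, where objective, a, b and c are the ones defined in the question:
import torch

x = torch.ones(1000, requires_grad=True)   # decision variable, must stay >= 0
opt = torch.optim.SGD([x], lr=0.01)
for i in range(10000):
    opt.zero_grad()
    obj = objective(x, a, b, c)   # objective from the question (already returns -obj)
    obj.backward()
    opt.step()
    # project back onto the feasible region x >= 0 after each update
    with torch.no_grad():
        x[:] = x.clamp(min=0)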

Dummy Variables in Julia

In R there is nice functionality for running a regression with dummy variables for each level of a categorical variable. e.g. Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level
Is there an equivalent way to do this in Julia?
using DataFrames, GLM

x = randn(1000)
group = repmat(1:25, 40)
groupMeans = randn(25)
y = 3*x + groupMeans[group]
data = DataFrame(x=x, y=y, g=group)
for i in levels(group)
    data[parse("I$i")] = data[:g] .== i
end
lm(y~x+I1+I2+I3+I4+I5+I6+I7+I8+I9+I10+
   I11+I12+I13+I14+I15+I16+I17+I18+I19+I20+
   I21+I22+I23+I24, data)
If you are using the DataFrames package, after you pool the data, the package will take care of the rest:
Pooling columns is important for working with the GLM package. When fitting regression models, PooledDataArray columns in the input are translated into 0/1 indicator columns in the ModelMatrix, with one column for each of the levels of the PooledDataArray.
You can see the rest of the documentation on pooled data here.
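For example, with the DataArrays-era DataFrames/GLM API used in the question (newer DataFrames versions use categorical columns and @formula instead), a minimal sketch looks like this:
using DataFrames, GLM

data = DataFrame(x=x, y=y, g=group)   # as constructed in the question
pool!(data, :g)                       # make :g a PooledDataArray (categorical)
fit = lm(y ~ x + g, data)             # :g is expanded into 0/1 indicator columns
coef(fit)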