How to simulate from priors with pymc3

I'd like to simulate y from the prior (not from the posterior) with pymc3.
I first defined the model:
import pymc3 as pm

with pm.Model() as m:
    mu = pm.Normal('mu', mu=0, sd=10)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    y = pm.Normal('y', mu=mu, sd=sigma)
    trace = pm.sample(1000, tune=1000)
Then I tried to get 10 simulated y from the model with:
y_pred = pm.sample_ppc(trace, 10, m, size=10)
But the result comes out empty. I searched through the documentation but didn't find a relevant example. Is it possible to do this with pymc3?

When no observed data are associated with the model definition, the trace already contains samples from the prior. However, this can sometimes fail. We are currently working on a sample_prior function that will make this process easier and more straightforward: https://github.com/pymc-devs/pymc3/pull/2876
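For reference, once that PR landed the function was exposed as pm.sample_prior_predictive (in PyMC3 >= 3.5 or so); here is a minimal sketch using the model from the question:

import pymc3 as pm

with pm.Model() as m:
    mu = pm.Normal('mu', mu=0, sd=10)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    y = pm.Normal('y', mu=mu, sd=sigma)
    # draw 10 samples of mu, sigma, and y directly from the prior,
    # without running MCMC at all
    prior = pm.sample_prior_predictive(samples=10)

print(prior['y'])  # 10 simulated y values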

Is there a way to cache OpenMDAO component outputs to avoid duplicate executions?

I am writing a model that consists of five subsystems. The first subsystem generates data for the other subsystems from the inputs; it does not solve anything iteratively, so its outputs do not change during the computation. I want its compute method to be called only once, as at initialization. How can I write a model in which this component runs once during run_model and only once during run_driver?
It's a little hard to be sure without more details, but you mention "iterative," so I'm guessing you have a solver at the top level of your model, and there is a component that is not involved in that solver loop but is getting called each time the solver iterates.
The solution to this is to make a sub-group in your model that has just the components that need to iterate. Put your only-run-once component at the top of the model, along with that group. Put the iterative solver on the sub-group.
An alternative solution is to add a bit of caching to your component, so it checks its inputs to see if they have changed. If they have, re-run. If they have not, just keep the old answer.
Here is an example that includes both features (note: the solver in this example does not converge because it's a toy problem that doesn't have a valid physical solution; I just threw it together to illustrate the model structure and the caching).
import openmdao.api as om

class StingyComp(om.ExplicitComponent):

    def setup(self):
        self.add_input('x1', val=2.)
        self.add_input('x2', val=3.)
        self.add_output('x')
        self._input_hash = None

    def compute(self, inputs, outputs):
        x1 = inputs['x1'][0]  # pull the scalar out so you can hash it
        x2 = inputs['x2'][0]
        print("running StingyComp")
        current_input_hash = hash((x1, x2))
        if self._input_hash != current_input_hash:
            print(' ran compute')
            outputs['x'] = 2*x1 + x2**2
            self._input_hash = current_input_hash
        else:
            print(' skipped compute')

class NormalComp(om.ExplicitComponent):

    def setup(self):
        self.add_input('x1', val=2.)
        self.add_input('x2', val=3.)
        self.add_output('y')

    def compute(self, inputs, outputs):
        x1 = inputs['x1']
        x2 = inputs['x2']
        print("running normal Comp")
        outputs['y'] = x1 + x2

p = om.Problem()
p.model.add_subsystem('run_once1', NormalComp(), promotes=['*'])
p.model.add_subsystem('run_once2', StingyComp(), promotes=['*'])

# transparent group that holds the sub-solver
sub_group = p.model.add_subsystem('sub_group', om.Group(), promotes=['*'])
sub_group.add_subsystem('C1', om.ExecComp('f1 = f2**2 + 1.5 * x - y**2.5'), promotes=['*'])
sub_group.add_subsystem('C2', om.ExecComp('f2 = f1**2 + x**1.5 - 2.5*y'), promotes=['*'])
sub_group.nonlinear_solver = om.NewtonSolver(solve_subsystems=False)
sub_group.linear_solver = om.DirectSolver()

p.setup()

print('first run')
p.run_model()

print('second run, same inputs')
p.run_model()

p['x1'] = 10
p['x2'] = 27.5
print('third run, new inputs')
p.run_model()

How does GEKKO optimization with bounded variables work?

I am using GEKKO to estimate the parameters of a differential equation and I have bounded one of the variables between 0 and 1. However, when I solve the ODE, I get values outside of the bounds for this variable, so I was wondering if somebody knew how GEKKO finds the solution, as this might help me resolve the issue.
Here is the code I use to fit the data. This gives me a solution for x and u where u is between 0 and 1.
However, afterwards I try to solve the ODE using scipy.integrate.solve_ivp with the initial value of u that I got, and the solution I get for u is not within these bounds. Since the solution should be unique, I am wondering what process GEKKO follows to find it (does it project the values onto the bounds, or how does it deal with this?). Any comment is very appreciated.
Here is an MVCE. If you run it, you can see that with GEKKO I get a solution between the bounds, and then, when I solve the ODE with solve_ivp, I don't get the same solution. Can you explain why this happens and how I can deal with it? I want to use solve_ivp to predict the next values.
from scipy.integrate import solve_ivp
from gekko import GEKKO
import matplotlib.pyplot as plt

time = [0.0, 0.11784511784511785, 0.18855218855218855, 0.2356902356902357]

m = GEKKO(remote=False)
m.time = time
x_data = [0.0003777630481280617, 0.002024573836061331,
          0.0008954383363035536, 0.005331749410182463]
x = m.CV(value=x_data, lb=0); x.FSTATUS = 1  # fit to measurement
x.SPLO = 0
sigma = m.FV(value=0.5, lb=0, ub=1); sigma.STATUS = 1
d = m.Param(0.05)
k = m.Param(0.001)
b = m.Param(0.5)
r = m.FV(value=0.5, lb=0); r.STATUS = 1
m_param = m.Param(1)
u = m.Var(value=0.1, lb=0, ub=1)
m.free(u)
a = m.Param(0.999)
Kmax = m.Param(100000)
m.Equations([x.dt() == x*(r*(1-a*u**2)*(1-x/(Kmax*(1-a*u**2))) - m_param/(k+b*u) - d),
             u.dt() == sigma*((-2*a*(b**2)*r*(u**3) + 4*a*b*k*r*(u**2)
                               + 2*a*(k**2)*r*u - b*m_param)/((b*u+k)**2))])
m.options.IMODE = 5    # dynamic estimation
m.options.NODES = 5    # collocation nodes
m.options.EV_TYPE = 1  # linear error (2 for squared)
m.solve(disp=False, debug=False)

def model_case_3(t, z, r, k, b, Kmax, sigma):
    m = 1
    a = 0.999
    x, u = z
    dxdt = x*(r*(1-a*u**2)*(1-x/(Kmax*(1-a*u**2))) - m/(k+b*u) - 0.05)
    dudt = sigma*((-2*a*(b**2)*r*(u**3) + 4*a*b*k*r*(u**2)
                   + 2*a*(k**2)*r*u - b*m)/((b*u+k)**2))
    return [dxdt, dudt]

sol = solve_ivp(fun=model_case_3, t_span=[0.0, 0.2356902356902357],
                y0=[0.0003777630481280617, u.value[0]],
                t_eval=time,
                args=(r.value[0], 0.001, 0.5, 1000000, sigma.value[0]))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3), constrained_layout=True)
ax1.set_title('x')
ax1.plot(time, x.value, time, sol['y'][0])
ax2.set_title('u')
ax2.plot(time, u.value, time, sol['y'][1])
plt.show()
It is not an issue with the version of Gekko, as I have Gekko 0.2.8, so I am wondering if it has anything to do with the initialization of variables. I ran the example I posted in Spyder (I was previously using Google Colab) and got the correct solution, but when I ran the rest of my cases I again got negative values for u (when solving with solve_ivp), which is quite strange.
You can add a bound to a variable when it is created by setting lb (lower bound) and ub (upper bound).
z = m.Var(lb=0,ub=10)
After you create the variable, the bounds can be adjusted with .LOWER and .UPPER.
z.LOWER = 1
z.UPPER = 9
Here is an example problem that shows the use of bounds where x is constrained to be greater than 0.5.
from gekko import GEKKO
t_data = [0, 0.1, 0.2, 0.4, 0.8, 1]
x_data = [2.0, 1.6, 1.2, 0.7, 0.3, 0.15]
m = GEKKO(remote=False)
m.time = t_data
x = m.CV(value=x_data,lb=0.5,ub=3); x.FSTATUS = 1 # fit to measurement
k = m.FV(); k.STATUS = 1 # adjustable parameter
m.Equation(x.dt()== -k * x) # differential equation
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 5 # collocation nodes
m.solve(disp=False) # display solver output
k = k.value[0]; print(k)
A plot of the results shows that the bounds are enforced but the model prediction does not fit the data because of the lower bound constraint (x>=0.5).
import numpy as np
import matplotlib.pyplot as plt # plot solution
plt.plot(m.time,x.value,'bo',\
label='Predicted (k='+str(np.round(k,2))+')')
plt.plot(m.time,x_data,'rx',label='Measured')
# plot exact solution
t = np.linspace(0,1); xe = 2*np.exp(-k*t)
plt.plot(t,xe,'k:',label='Exact Solution')
plt.legend()
plt.xlabel('Time'), plt.ylabel('Value')
plt.show()
Without the restrictive lower bound, the solver optimizes to best fit the points.
x = m.CV(value=x_data,lb=0.0,ub=3)
Response 1 to Question Edit
The only way that a variable (such as u) can end up outside of its bounds is if the solver did not report a successful solution. To report a successful solution, the solver must satisfy the Karush-Kuhn-Tucker conditions for optimality. I recommend verifying that it satisfied all of the equations by checking that m.options.APPSTATUS==1 after the m.solve() command. If you can include an MVCE (https://stackoverflow.com/help/minimal-reproducible-example) that has sample data so the script can run, we can help you check it.
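A minimal sketch of that check (m.options.APPSTATUS is a standard Gekko option; it is 1 only when the solver converged):

m.solve(disp=False, debug=False)
if m.options.APPSTATUS != 1:
    # the reported variable values are not a converged solution,
    # so they may violate the bounds
    raise RuntimeError('Gekko did not find a successful solution')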
Response 2 to Question Edit
Thanks for including a minimal reproducible example. Here are the results that I get with Gekko 0.2.8. If you are using an earlier version, I recommend that you upgrade with pip install gekko --upgrade.
The solver reports a successful solution.
EXIT: Optimal Solution Found.
The solution was found.
The final value of the objective function is 0.03164650667928192
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 0.23339999999999997 sec
Objective : 0.0316473666078486
Successful solution
---------------------------------------------------
The constraints x>=0 and 0<=u<=1 are satisfied. Could it just be an issue with an older version of Gekko?
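As a quick numeric sanity check on the returned solution (u.value and x.value are the time series Gekko reports; this is just something one could add after m.solve()):

import numpy as np
u_arr = np.array(u.value)
x_arr = np.array(x.value)
print('0 <= u <= 1 :', u_arr.min() >= 0 and u_arr.max() <= 1)
print('x >= 0      :', x_arr.min() >= 0)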

Why does keras (SGD) optimizer.minimize() not reach global minimum in this example?

I'm in the process of completing a TensorFlow tutorial via DataCamp and am transcribing/replicating the code examples I am working through in my own Jupyter notebook.
Here are the original instructions from the coding problem:
I'm running the following snippet of code and am not able to arrive at the same result generated within the tutorial, which I have confirmed to be the correct values via a connected scatterplot of x vs. loss_function(x), seen a bit further below.
# imports
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import Variable, keras

def loss_function(x):
    import math
    return 4.0*math.cos(x-1) + np.divide(math.cos(2.0*math.pi*x), x)

# Initialize x_1 and x_2
x_1 = Variable(6.0, np.float32)
x_2 = Variable(0.3, np.float32)

# Define the optimization operation
opt = keras.optimizers.SGD(learning_rate=0.01)

for j in range(100):
    # Perform minimization using the loss function and x_1
    opt.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Perform minimization using the loss function and x_2
    opt.minimize(lambda: loss_function(x_2), var_list=[x_2])

# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
I drew a quick connected scatterplot to confirm (successfully) that the loss function I am using gets me back to the same graph provided by the example (seen in the screenshot above):
# Generate loss_function(x) values for a given range of x-values
losses = []
for p in np.linspace(0.1, 6.0, 60):
    losses.append(loss_function(p))

# Define x,y coordinates
x_coordinates = list(np.linspace(0.1, 6.0, 60))
y_coordinates = losses

# Plot
plt.scatter(x_coordinates, y_coordinates)
plt.plot(x_coordinates, y_coordinates)
plt.title('Plot of Input values (x) vs. Losses')
plt.xlabel('x')
plt.ylabel('loss_function(x)')
plt.show()
Here are the resulting global and local minima, respectively, as per the DataCamp environment:
4.38 is the correct global minimum, and 0.42 indeed corresponds to the first local minimum on the graph's RHS (when starting from x_2 = 0.3).
And here are the results from my environment, both of which move in the opposite direction from the one they should be heading in to minimize the loss value:
I've spent the better part of the last 90 minutes trying to sort out why my results disagree with those of the DataCamp console, and why the optimizer fails to minimize this loss for this simple toy example.
I appreciate any suggestions that you might have after you've run the provided code in your own environments, many thanks in advance!!!
As it turned out, the difference in outputs arose from the default precision of tf.divide() (vs. np.divide()) and tf.cos() (vs. math.cos()), the operations used in my transcribed, "custom" definition of loss_function().
The loss_function() had been predefined in the body of the tutorial, and when I inspected it using the inspect package (via inspect.getsourcelines(loss_function)) in order to redefine it in my own environment, the output of that inspection didn't clearly indicate that tf.divide and tf.cos had been used instead of their NumPy/math counterparts (which my version of the code had used).
The actual difference is quite small, but is apparently sufficient to push the optimizer in the opposite direction (away from the two respective minima).
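A quick way to see the size of this gap is to evaluate both formulations at a single point in eager mode; math.cos computes in float64, while tf.cos defaults to float32, so the two values differ slightly:

import math
import numpy as np
import tensorflow as tf

x = 6.0
# NumPy/math version of the loss (as in my transcription)
loss_np = 4.0*math.cos(x - 1) + np.divide(math.cos(2.0*math.pi*x), x)
# TensorFlow version of the loss (as in the tutorial)
loss_tf = 4.0*tf.cos(x - 1.0) + tf.divide(tf.cos(2.0*math.pi*x), x)
print(loss_np, float(loss_tf), loss_np - float(loss_tf))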
After swapping in tf.divide() and tf.cos() (as seen below), I was able to arrive at the same results as seen in the DC console.
Here is the code for the loss_function that backs into the same results as seen in the console (screenshot):
def loss_function(x):
    import math
    return 4.0*tf.cos(x-1) + tf.divide(tf.cos(2.0*math.pi*x), x)

Simple Hidden Markov Model in PyMC3 throws Theano error

I'm new to PyMC3, Theano, and numpy. I was just trying to duplicate the first 'hidden' Markov model in the Stan manual -- the one in which the states are actually observed. But I keep running into errors having to do with Theano, numpy, and perhaps what is going on behind PyMC3 distributions, which seem a bit mysterious to me. My code for the model is below:
import pandas as pd
import numpy as np
from pymc3 import Model, Dirichlet, Categorical

dat_hmm = pd.read_csv('hmmVals.csv')
emission = dat_hmm.emission.values
state = dat_hmm.state.values

basic_model = Model()
with basic_model:
    # Model constants:
    # num unique hidden states, num unique emissions, num instances
    K = 3; V = 9; T = 10
    alpha = np.ones(K); beta = np.ones(V)
    # Priors for unknown model parameters
    theta = np.empty(K, dtype=object)  # theta=transmission
    phi = np.empty(K, dtype=object)    # phi=emission
    # observed emission, state:
    w = np.empty(T, dtype=object); z = np.empty(T, dtype=object)
    for k in range(K):
        theta[k] = Dirichlet('theta' + str(k), alpha)
        phi[k] = Dirichlet('phi' + str(k), beta)
    # Likelihood (sampling distribution) of observations
    for t in range(T):
        w[t] = Categorical('w' + str(t), theta[state[t]], shape=1, observed=emission[t])
    for t in range(2, T):
        z[t] = Categorical('z' + str(t), phi[state[t-1]], shape=1, observed=state[t])
The line "w[t]=Categorical('w'+str(t),theta[state[t]], shape=1, observed=emission[t])" generates the error, but not on t=0, which fills in w0, but on t=1 which generates an index out of bound error. There is no index out of bound in the code line itself because state[1], theta[state[t]], and emission[t] all exist. The error messages are:
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/pymc3/distributions/distribution.py", line 25, in __new__
    return model.Var(name, dist, data)
  File "/usr/local/lib/python3.4/dist-packages/pymc3/model.py", line 306, in Var
    var = ObservedRV(name=name, data=data, distribution=dist, model=self)
  File "/usr/local/lib/python3.4/dist-packages/pymc3/model.py", line 581, in __init__
    self.logp_elemwiset = distribution.logp(data)
  File "/usr/local/lib/python3.4/dist-packages/pymc3/distributions/discrete.py", line 400, in logp
    a = tt.log(p[value])
  File "/usr/local/lib/python3.4/dist-packages/theano/tensor/var.py", line 532, in __getitem__
    lambda entry: isinstance(entry, Variable)))
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/op.py", line 668, in __call__
    required = thunk()
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/op.py", line 883, in rval
    fill_storage()
  File "/usr/local/lib/python3.4/dist-packages/theano/gof/cc.py", line 1707, in __call__
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python3.4/dist-packages/six.py", line 686, in reraise
    raise value
IndexError: index out of bounds
I don't know about the wisdom of sticking numpy object arrays into PyMC3 distributions, or of using the result to parameterize another distribution, but I have seen somewhat similar code on the web, minus the last part. Is there perhaps no good way to code such a hidden Markov model in PyMC3 yet?
I have found a way to fix the above error. The following code works -- no errors, and I'm able to get correct parameter estimates, at least with Metropolis.
I made two mistakes, and I didn't realize they were so simple because I expected something complicated to be happening in Theano. One is that my data was set up for Stan and so was indexed to start at 1 rather than 0; Python indexes everything from 0. I changed the data file by subtracting 1 from every value. The other error was that I used theta, the transmission matrix, to calculate individual emissions, and vice versa for the phi matrix. Theta was too short for the emissions.
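As a side note, instead of editing the file by hand, the same shift can be done when loading the data, assuming every column of hmmVals.csv holds 1-based Stan indices:

import pandas as pd
dat_hmm = pd.read_csv('hmmVals.csv') - 1  # shift 1-based Stan indices to 0-based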
What I wish I understood now is why the NUTS sampler keeps telling me I have a non-positive definite scaling, even though I'm feeding it MAP estimates. Metropolis works but is slow: about 11 minutes for these 300 observations and 1000 samples. The other mystery is why PyMC3 thinks it only took a couple of seconds to compute the samples.
import pandas as pd
import numpy as np
from pymc3 import Model, Dirichlet, Categorical

dat_hmm = pd.read_csv('hmmVals.csv')
emission = dat_hmm.emission.values
state = dat_hmm.state.values

basic_model = Model()
with basic_model:
    # Model constants:
    K = 3; V = 9; T = 300  # num unique hidden states, num unique emissions, num instances
    alpha = np.ones(K); beta = np.ones(V)
    # Priors for unknown model parameters
    theta = np.empty(K, dtype=object)  # theta=transmission
    phi = np.empty(K, dtype=object)    # phi=emission
    w = np.empty(T, dtype=object); z = np.empty(T, dtype=object)  # observed emission, state
    for k in range(K):
        theta[k] = Dirichlet('theta' + str(k), alpha)
        phi[k] = Dirichlet('phi' + str(k), beta)
    # Likelihood (sampling distribution) of observations
    for t in range(2, T):
        z[t] = Categorical('z' + str(t), theta[state[t-1]], shape=1, observed=state[t])
    for t in range(T):
        w[t] = Categorical('w' + str(t), phi[state[t]], shape=1, observed=emission[t])
I have also tried to implement HMMs in pymc3, and I ran into similar problems. I just found a way to implement a two-level HMM in a vectorized fashion (to be perfectly honest, my model is not hidden, but the hidden part can be added easily -- I vectorized the description of the state variable). I am not sure whether this is the most efficient way, but I tested this code against a simple for loop for defining the states. The code below runs in less than a minute with 1000 data points, whereas the for loop took several hours.
Here is the code:
import numpy as np
import theano.tensor as tt
import pymc3 as pm

class HMMStates(pm.Discrete):
    """
    Hidden Markov Model States

    Parameters
    ----------
    PA : tensor
        probability of the initial state being state 1
    P1 : tensor
        probability to remain in state 1
    P2 : tensor
        probability to move from state 2 to state 1
    """
    def __init__(self, PA=None, P1=None, P2=None, *args, **kwargs):
        super(HMMStates, self).__init__(*args, **kwargs)
        self.PA = PA
        self.P1 = P1
        self.P2 = P2
        self.mean = 0.

    def logp(self, x):
        PA = self.PA
        P1 = self.P1
        P2 = self.P2
        # now we need to create an array with probabilities
        # so that for x=A: PA=P1, PB=(1-P1)
        # and for x=B: PA=P2, PB=(1-P2)
        length = x.shape[0]
        P1T = tt.tile(P1, (length-1, 1)).T
        P2T = tt.tile(P2, (length-1, 1)).T
        P = tt.switch(x[:-1], P1T, P2T).T
        x_i = x[1:]
        ou_like = pm.Categorical.dist(P).logp(x_i)
        return pm.Categorical.dist(PA).logp(x[0]) + tt.sum(ou_like)
This class creates the states of the HMM. To use it, you can do the following:
theta = np.ones(2)  # prior for probabilities

with pm.Model() as model:
    # 2-state model:
    # P1 is the probability to stay in state 1
    # P2 is the probability to move from state 2 to state 1
    P1 = pm.Dirichlet('P1', a=theta)
    P2 = pm.Dirichlet('P2', a=theta)
    PA = pm.Deterministic('PA', P2/(P2 + 1 - P1))
    states = HMMStates('states', PA, P1, P2, observed=data)
    start = pm.find_MAP()
    trace = pm.sample(5000, start=start)
And just to show what the old code looks like:
with pm.Model() as model:
    # 2-state model:
    # P1 is the probability to stay in state 1
    # P2 is the probability to move from state 2 to state 1
    P1 = pm.Dirichlet('P1', a=np.ones(2))
    P2 = pm.Dirichlet('P2', a=np.ones(2))
    PA = pm.Deterministic('PA', P2/(P2 + 1 - P1))
    state = pm.Categorical('state0', PA, observed=data[0])
    for i in range(1, N_chain):
        state = pm.Categorical('state' + str(i), tt.switch(data[i-1], P1, P2), observed=data[i])
    start = pm.find_MAP()
    trace = pm.sample(5000, start=start)

Derivative check with scalers

I have a problem in which I want to scale the design variables. I have added the scaler, but I want to check the derivatives to make sure the scaling does what I want. Is there a way to check the scaled derivative? I have tried check_total_derivatives(), but the derivative is exactly the same regardless of what value I use for the scaler:
from openmdao.api import Component, Group, Problem, IndepVarComp, ExecComp
from openmdao.drivers.pyoptsparse_driver import pyOptSparseDriver

class Scaling(Component):
    def __init__(self):
        super(Scaling, self).__init__()
        self.add_param('x', shape=1)
        self.add_output('y', shape=1)

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['y'] = 1000. * params['x']**2 + 2

    def linearize(self, params, unknowns, resids):
        J = {}
        J['y', 'x'] = 2000. * params['x']
        return J

class ScalingGroup(Group):
    def __init__(self):
        super(ScalingGroup, self).__init__()
        self.add('x', IndepVarComp('x', 0.0), promotes=['*'])
        self.add('g', Scaling(), promotes=['*'])

p = Problem()
p.root = ScalingGroup()

# p.driver = pyOptSparseDriver()
# p.driver.options['optimizer'] = 'SNOPT'

p.driver.add_desvar('x', lower=0.005, upper=100., scaler=1000)
p.driver.add_objective('y')
p.setup()
p['x'] = 3.
p.run()

total = p.check_total_derivatives()
# Derivative is the same regardless of what the scaler is.
The scalers and adders are consistent in their behavior, so the check derivatives routines give results in unscaled terms to be more intuitive.
If you really want to see the impact the scaler has on the value the NLP sees, and you are using SNOPT, you can turn on SNOPT's derivative check capability:
p.driver.opt_settings['Verify level'] = 3
SNOPT_print.out will contain, with the scaler set to 1:
Column x(j) dx(j) Element no. Row Derivative Difference approxn
1 3.00000000E+00 2.19E-06 Objective 6.00000000E+03 6.00000219E+03 ok
Or if we change the x scaler to 1000:
Column x(j) dx(j) Element no. Row Derivative Difference approxn
1 3.00000000E+03 1.64E-03 Objective 6.00000000E+00 6.00000164E+00 ok
So in the units of the problem, which check_total_derivatives uses, the derivative doesn't change. But the scaled value as seen by the optimizer is changing.
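A quick back-of-the-envelope check of those two tables: at x = 3 the unscaled derivative is dy/dx = 2000*x = 6000, which is what SNOPT reports with the scaler at 1. With scaler=1000 the optimizer works with x_s = 1000*x = 3000, so the derivative it sees is dy/dx_s = 6000/1000 = 6, exactly the second pair of values above.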
Another way to see exactly what the optimizer receives is to mimic the driver's call to calc_gradient. This is not necessarily easy to figure out, but I thought I would paste it here for reference.
print p.calc_gradient(list(p.driver.get_desvars().keys()),
                      list(p.driver.get_objectives().keys()) + list(p.driver.get_constraints().keys()),
                      dv_scale=p.driver.dv_conversions,
                      cn_scale=p.driver.fn_conversions)