predicting p of binomial with beta prior in edward2 & tensorflow2 - tensorflow

The following code predicts the p of the binomial distribution by using beta as prior. Somehow, sometimes, I get meaningless results (acceptance rate = 0). When I write the same logic with pymc3, I have no issue.
I couldn't see what I am missing here.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
import edward2 as ed
from pymc3.stats import hpd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
p_true = .15
N = [10, 100, 1000]
successN = np.random.binomial(p=p_true, n=N)
print(N)
print(successN)
def beta_binomial(N):
p = ed.Beta(
concentration1=tf.ones( len(N) ),
concentration0=tf.ones( len(N) ),
name='p'
)
return ed.Binomial(total_count=N, probs=p, name='obs')
log_joint = ed.make_log_joint_fn(beta_binomial)
def target_log_prob_fn(p):
return log_joint(N=N, p=p, obs=successN)
#kernel = tfp.mcmc.HamiltonianMonteCarlo(
# target_log_prob_fn=target_log_prob_fn,
# step_size=0.01,
# num_leapfrog_steps=5)
kernel = tfp.mcmc.NoUTurnSampler(
target_log_prob_fn=target_log_prob_fn,
step_size=.01
)
trace, kernel_results = tfp.mcmc.sample_chain(
num_results=1000,
kernel=kernel,
num_burnin_steps=500,
current_state=[
tf.random.uniform(( len(N) ,))
],
trace_fn=(lambda current_state, kernel_results: kernel_results),
return_final_kernel_results=False)
p, = trace
p = p.numpy()
print(p.shape)
print('acceptance rate ', np.mean(kernel_results.is_accepted))
def printSummary(name, v):
print(name, v.shape)
print(np.mean(v, axis=0))
print(hpd(v))
printSummary('p', p)
for data in p.T:
print(data.shape)
seaborn.distplot(data, kde=False)
plt.savefig('p.png')
Libraries:
pip install -U pip
pip install -e git+https://github.com/google/edward2.git#4a8ed9f5b1dac0190867c48e816168f9f28b5129#egg=edward2
pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl#egg=tensorflow
pip install tensorflow-probability
Sometimes I see the following (when acceptance rate=0):
And, sometimes I see the following (when acceptance rate>.9):

When I get unstable results in Bayesian inference (I use mc-stan, but it's also using NUTS), it's usually because either the priors and likelihood are mis-specified, or the hyperparameters are not good for the problem.
That first graph shows that the sampler never moved away from the initial guess at the answers (hence the 0 acceptance rate). It also worries me that the green distribution seems to be right on 0. The beta(1,1) has positive probability at 0 but a p=0 might be an unstable solution here? (as in, the sampler may not be able to calculate the derivative at that point and returns a NaN, so doesn't know where to sample next?? Complete guess there).
Can you force the initial condition to be 0 and see if that always creates a failed sampling?
Other than that, I would try tweaking the hyperparameters, such as step size, number of iterations, etc...
Also, you may want to simplify the example by only using one N. Might help you diagnose. Good luck!

random.uniform's maxval default value is None. I changed it to 1, the result became stable.
random.uniform(( len(N) ,), minval=0, maxval=1)

Related

TensorFlow Probability (tfp) equivalent of np.quantile()

I am trying to find a TensorFlow equivalent of np.quantile(). I have found tfp.stats.quantiles() (tfp stands for TensorFlow Probability). However, its constructs are a bit different from that of np.quantile().
Consider the following example:
import tensorflow_probability as tfp
import tensorflow as tf
import numpy as np
inputs = tf.random.normal((1, 4096, 4))
print("NumPy")
print(np.quantile(inputs.numpy(), q=0.9, axis=1, keepdims=False))
I am not sure from the TFP docs how I could write the above using tfp.stats.quantile(). I tried checking out the source code of both methods, but it didn't help.
Let me try to be more helpful here than I was on GitHub.
There is a difference in behavior between np.quantile and tfp.stats.quantiles. The key difference here is that numpy.quantile will
Compute the q-th quantile of the data along the specified axis.
where q is the
Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive.
and tfp.stats.quantiles
Given a vector x of samples, this function estimates the cut points by returning num_quantiles + 1 cut points
So you need to tell tfp.stats.quantiles how many quantiles you want and then select out the qth quantile. If it isn't clear how to do this just from the API, if you look at the source for tfp.stats.quantiles (for v0.19.0) we can see that it shows us how we can get a similar return structure as NumPy.
For completeness, setting up a virtual environment with
$ cat requirements.txt
numpy==1.24.2
tensorflow==2.11.0
tensorflow-probability==0.19.0
allows us to run
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
inputs = tf.random.normal((1, 4096, 4), dtype=tf.float64)
q = 0.9
numpy_quantiles = np.quantile(inputs.numpy(), q=q, axis=1, keepdims=False)
tfp_quantiles = tfp.stats.quantiles(
inputs, num_quantiles=100, axis=1, interpolation="linear"
)[int(q * 100)]
assert np.allclose(numpy_quantiles, tfp_quantiles.numpy())
print(f"{numpy_quantiles=}")
# numpy_quantiles=array([[1.31727661, 1.2699167 , 1.28735237, 1.27137588]])
print(f"{tfp_quantiles=}")
# tfp_quantiles=<tf.Tensor: shape=(1, 4), dtype=float64, numpy=array([[1.31727661, 1.2699167 , 1.28735237, 1.27137588]])>
You could also use tfp.stats.percentile(inputs, 90., axis=1, keepdims=False) -- the only difference from quantile is the 90. replacing .90.

numpy.random.multinomial at version 1.16.6 is 10x faster than later version

Here are codes and result:
python -c "import numpy as np; from timeit import timeit; print('numpy version {}: {:.1f} seconds'.format(np.__version__, timeit('np.random.multinomial(1, [0.1, 0.2, 0.3, 0.4])', number=1000000, globals=globals())))"
numpy version 1.16.6: 1.5 seconds # 10x faster
numpy version 1.18.1: 15.5 seconds
numpy version 1.19.0: 17.4 seconds
numpy version 1.21.4: 15.1 seconds
It is noted that with fixed random seed, the output are the same with different numpy version
python -c "import numpy as np; np.random.seed(0); print(np.__version__); print(np.random.multinomial(1, [0.1, 0.2, 0.3, 0.4], size=10000))" /tmp/tt
Any advice on why numpy version after 1.16.6 is 10x slower?
We have upgraded pandas to latest version 1.3.4, which needs numpy version after 1.16.6
TL;DR: this is a local performance regression caused by the overhead of additional checks in the numpy.random.multinomial function. Very small arrays are strongly impacted due to the relative execution time of the required checks.
Under the hood
A binary search on the Git commits of the Numpy code shows that the performance regression appear the first time in mid April 2019. It can be reproduced in the commit dd77ce3cb but not 7e8e19f9a. There are some build issues for the commit in-between, but with some quick fix we can show that the commit 0f3dd0650 is the first to cause the issue. The commit says that it:
Extend multinomial to allow broadcasting
Fix zipf changes missed in NumPy
Enable 0 as valid input for hypergeometric
A deeper analysis of this commit shows that it modifies the multinomial function defined in Cython file mtrand.pyx to perform two additional following checks:
def multinomial(self, np.npy_intp n, object pvals, size=None):
cdef np.npy_intp d, i, sz, offset
cdef np.ndarray parr, mnarr
cdef double *pix
cdef int64_t *mnix
cdef int64_t ni
d = len(pvals)
parr = <np.ndarray>np.PyArray_FROM_OTF(pvals, np.NPY_DOUBLE, np.NPY_ALIGNED)
pix = <double*>np.PyArray_DATA(parr)
check_array_constraint(parr, 'pvals', CONS_BOUNDED_0_1) # <==========[HERE]
if kahan_sum(pix, d-1) > (1.0 + 1e-12):
raise ValueError("sum(pvals[:-1]) > 1.0")
if size is None:
shape = (d,)
else:
try:
shape = (operator.index(size), d)
except:
shape = tuple(size) + (d,)
multin = np.zeros(shape, dtype=np.int64)
mnarr = <np.ndarray>multin
mnix = <int64_t*>np.PyArray_DATA(mnarr)
sz = np.PyArray_SIZE(mnarr)
ni = n
check_constraint(ni, 'n', CONS_NON_NEGATIVE) # <==========[HERE]
offset = 0
with self.lock, nogil:
for i in range(sz // d):
random_multinomial(self._brng, ni, &mnix[offset], pix, d, self._binomial)
offset += d
return multin
These two checks are required for the code to be robust. However, they are currently pretty expensive considering their purpose.
Indeed, on my machine, the first check is responsible for ~75% of the overhead and the second for ~20%. The checks takes few micro-seconds but since your input is very small, the overhead is huge compared to the computation time.
One workaround to fix this issue is to write a specific Numba function for this since your input array is very small. On my machine, np.random.multinomial in a trivial Numba function results in good performance.
I checked some generators that are under the hood and saw no much change in the timings.
I guessed difference may be due to some overhead, because you are sampling only single value. And it seems to be good hypothesis. When I increased size of the generated random samples to 1000, difference between 1.16.6 and 1.19.2 (my current Numpy version) diminished to ~20%.
python -c "import numpy as np; from timeit import timeit; print('numpy version {}: {:.1f} seconds'.format(np.__version__, timeit('np.random.
multinomial(1, [0.1, 0.2, 0.3, 0.4], size=1000)', number=10000, globals=globals())))"
numpy version 1.16.6: 1.1 seconds
numpy version 1.19.2: 1.3 seconds
Note that both versions have this overhead, just newer version has it much larger. In both versions it is much faster to sample 1000 values once than sample 1 value 1000 times.
They changed by much the code between 1.16.6 and 1.17.0, see for example this commit, it's hard to analyse. Sorry that can't help you better - I propose to make an issue on Numpy's github.

How does GEKKO optimization with bounded variables work?

I am using GEKKO to estimate the parameters of a differential equation and I have bounded one of the variables between 0 and 1. However, when I solve the ODE, I get values outside of the bounds for this variable, so I was wondering if somebody knew how GEKKO finds the solution, as this might help me resolve the issue.
Here is the code I use to fit the data. This gives me a solution x and u where x is between 0 and 1.
However, afterwards, I try to solve the ODE using scipy.integrate.solve_ivp, with the initial value of u that I got, and the solution I get for u is not between this bounds. Since it should be unique, I am wondering what is the process that GEKKO follows to find the solution (does it proyect the values to the bound or how does it deal with this?) Any comment is very appreciated.
Here is an MVCE. If you run it you can see that with GEKKO, I get a solution between the bounds and then, when I solve the ODE with solve_ivp, I don't get the same solution. Can you explain why this happens and how can I deal with it? I want to use solve_ivp to predict the next values.
from scipy.integrate import solve_ivp
from gekko import GEKKO
import matplotlib.pyplot as plt
time=[0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357]
m = GEKKO(remote=False)
m.time= [0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357]
x_data= [0.0003777630481280617, 0.002024573836061331,\
0.0008954383363035536, 0.005331749410182463]
x = m.CV(value=x_data, lb=0); x.FSTATUS = 1 # fit to measurement
x.SPLO = 0
sigma = m.FV(value=0.5, lb= 0, ub=1); sigma.STATUS=1
d = m.Param(0.05)
k = m.Param(0.001)
b = m.Param(0.5)
r = m.FV(value=0.5, lb= 0); r.STATUS=1
m_param = m.Param(1)
u = m.Var(value=0.1, lb=0, ub=1)
m.free(u)
a = m.Param(0.999)
Kmax= m.Param(100000)
m.Equations([x.dt()==x*(r*(1-a*u**2)*(1-x/(Kmax*(1-a*u**2)))-\
m_param/(k+b*u)-d), u.dt() == \
sigma*((-2*a*(b**2)*r*(u**3)+4*a*b*k*r*(u**2)\
+2*a*(k**2)*r*u-b*m_param)/((b*u+k)**2))])
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 5 # collocation nodes
m.options.EV_TYPE = 1 # linear error (2 for squared)
m.solve(disp=False, debug=False) # display solver output
def model_case_3(t, z, r, k, b, Kmax, sigma):
m=1
a=0.999
x, u= z
dxdt = x*(r*(1-a*u**2)*(1-x/(Kmax*(1-a*u**2)))-m/(k+b*u)-0.05)
dudt = sigma*((-2*a*(b**2)*r*(u**3)+4*a*b*k*r*(u**2)\
+2*a*(k**2)*r*u-b*m)/((b*u+k)**2))
return [dxdt, dudt]
sol = solve_ivp(fun=model_case_3, t_span=[0.0, 0.2356902356902357],\
y0=[0.0003777630481280617, u.value[0]],\
t_eval=[0.0, 0.11784511784511785, 0.18855218855218855,\
0.2356902356902357], \
args=(r.value[0], 0.001, 0.5,1000000 , sigma.value[0]))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,3), constrained_layout=True)
ax1.set_title('x')
ax1.plot(time, x.value, time, sol['y'][0])
ax2.set_title('u')
ax2.plot(time, u.value, time, sol['y'][1])
plt.show()
It is not an issue with the version of Gekko as I have Gekko 0.2.8, so I am wondering if it has anything to do with the initialization of variables. I run the example I posted on spyder (I was using google colab) and I got the correct solution, but when I run the rest of the cases I got again negative values for u (solving with solve_ivp), which is quite strange.
You can add a bound to a variable when it is created by setting lb (lower bound) and ub (upper bound).
z = m.Var(lb=0,ub=10)
After you create the variable, the bound is adjusted with .lower and .upper.
z.LOWER = 1
z.UPPER = 9
Here is an example problem that shows the use of bounds where x is constrained to be greater than 0.5.
from gekko import GEKKO
t_data = [0, 0.1, 0.2, 0.4, 0.8, 1]
x_data = [2.0, 1.6, 1.2, 0.7, 0.3, 0.15]
m = GEKKO(remote=False)
m.time = t_data
x = m.CV(value=x_data,lb=0.5,ub=3); x.FSTATUS = 1 # fit to measurement
k = m.FV(); k.STATUS = 1 # adjustable parameter
m.Equation(x.dt()== -k * x) # differential equation
m.options.IMODE = 5 # dynamic estimation
m.options.NODES = 5 # collocation nodes
m.solve(disp=False) # display solver output
k = k.value[0]; print(k)
A plot of the results shows that the bounds are enforced but the model prediction does not fit the data because of the lower bound constraint (x>=0.5).
import numpy as np
import matplotlib.pyplot as plt # plot solution
plt.plot(m.time,x.value,'bo',\
label='Predicted (k='+str(np.round(k,2))+')')
plt.plot(m.time,x_data,'rx',label='Measured')
# plot exact solution
t = np.linspace(0,1); xe = 2*np.exp(-k*t)
plt.plot(t,xe,'k:',label='Exact Solution')
plt.legend()
plt.xlabel('Time'), plt.ylabel('Value')
plt.show()
Without the restrictive lower bound, the solver optimizes to best fit the points.
x = m.CV(value=x_data,lb=0.0,ub=3)
Response 1 to Question Edit
The only way that a variable (such as u) is outside of the bounds is if the solver did not report a successful solution. To report a successful solution, the solver must satisfy the Karush Kuhn Tucker conditions for optimality. I recommend that you check that it satisfied all of the equations by checking that m.options.APPSTATUS==1 after the m.solve() command. If you can include an MVCE (https://stackoverflow.com/help/minimal-reproducible-example) that has sample data so the script can run, we can help you check it.
Response 2 to Question Edit
Thanks for including a minimal reproducible example. Here are the results that I get with Gekko 0.2.8. If you are using an earlier version, I recommend that you upgrade with pip install gekko --upgrade.
The solver reports a successful solution.
EXIT: Optimal Solution Found.
The solution was found.
The final value of the objective function is 0.03164650667928192
---------------------------------------------------
Solver : IPOPT (v3.12)
Solution time : 0.23339999999999997 sec
Objective : 0.0316473666078486
Successful solution
---------------------------------------------------
The constraints x>=0 and 0<=u<=1 are satisfied. Could it just be an issue with an older version of Gekko?

PyMC3 sample() function does not accept the "start" value to generate a trace

I am new to PyMC3 and Bayesian inference methods. I have a simple code that tries to infer the value of some decay constant (=1) from the artificial data generated using a truncated exponential distribution:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import pymc3 as pm
import arviz as az
T = stats.truncexpon(b = 10.)
t = T.rvs(1000)
#Bayesian Inference
with pm.Model() as model:
#Define Priors
lam = pm.Gamma('$\lambda$', alpha=1, beta=1)
#Define Likelihood
time = pm.Exponential('time', lam = lam, observed = t)
#Inference
trace = pm.sample(20, start = {'lam': 10.}, \
step=pm.Metropolis(), chains=1, cores=1, \
progressbar = True)
az.plot_trace(trace)
plt.show()
This code produces a trace like below
I am really confused as to why the starting value of 10. is not accepted by the sampler. The trace above should start at 10. I am using python 3.7 to run the code.
Thank you.
Few things going on:
when the sampler first starts it has a tuning phase; samples during this phase are discarded by default, but this can be controlled with the discard_tuned_samples argument
the keys in the start argument dictionary need to correspond to the name given to the RandomVariable ('$\lambda$') not the Python variable
Incorporating those two, one can try
trace = pm.sample(20, start = {'$\lambda$': 10.},
step=pm.Metropolis(), chains=1, cores=1,
discard_tuned_samples=False)
However, the other possible issue is that
the starting value isn't guaranteed to be emitted in the first draw; only if the first proposal sample is rejected, which is down to chance.
Fixing the game (setting a random seed), though, we can get glimpse:
trace = pm.sample(20, start = {'$\lambda$': 10.},
step=pm.Metropolis(), chains=1, cores=1,
discard_tuned_samples=False, random_seed=1)
...
trace.get_values(varname='$\lambda$')[:10]
# array([10. , 5.42397358, 3.19841997, 1.09383329, 1.09383329,
# 1.09383329, 1.09383329, 1.09383329, 1.09383329, 1.09383329])

Why does keras (SGD) optimizer.minimize() not reach global minimum in this example?

I'm in the process of completing a TensorFlow tutorial via DataCamp and am transcribing/replicating the code examples I am working through in my own Jupyter notebook.
Here are the original instructions from the coding problem :
I'm running the following snippet of code and am not able to arrive at the same result that I am generating within the tutorial, which I have confirmed are the correct values via a connected scatterplot of x vs. loss_function(x) as seen a bit further below.
# imports
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import Variable, keras
def loss_function(x):
import math
return 4.0*math.cos(x-1)+np.divide(math.cos(2.0*math.pi*x),x)
# Initialize x_1 and x_2
x_1 = Variable(6.0, np.float32)
x_2 = Variable(0.3, np.float32)
# Define the optimization operation
opt = keras.optimizers.SGD(learning_rate=0.01)
for j in range(100):
# Perform minimization using the loss function and x_1
opt.minimize(lambda: loss_function(x_1), var_list=[x_1])
# Perform minimization using the loss function and x_2
opt.minimize(lambda: loss_function(x_2), var_list=[x_2])
# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
I draw a quick connected scatterplot to confirm (successfully) that the loss function that I using gets me back to the same graph provided by the example (seen in screenshot above)
# Generate loss_function(x) values for given range of x-values
losses = []
for p in np.linspace(0.1, 6.0, 60):
losses.append(loss_function(p))
# Define x,y coordinates
x_coordinates = list(np.linspace(0.1, 6.0, 60))
y_coordinates = losses
# Plot
plt.scatter(x_coordinates, y_coordinates)
plt.plot(x_coordinates, y_coordinates)
plt.title('Plot of Input values (x) vs. Losses')
plt.xlabel('x')
plt.ylabel('loss_function(x)')
plt.show()
Here are the resulting global and local minima, respectively, as per the DataCamp environment :
4.38 is the correct global minimum, and 0.42 indeed corresponds to the first local minima on the graphs RHS (when starting from x_2 = 0.3)
And here are the results from my environment, both of which move opposite the direction that they should be moving towards when seeking to minimize the loss value:
I've spent the better part of the last 90 minutes trying to sort out why my results disagree with those of the DataCamp console / why the optimizer fails to minimize this loss for this simple toy example...?
I appreciate any suggestions that you might have after you've run the provided code in your own environments, many thanks in advance!!!
As it turned out, the difference in outputs arose from the default precision of tf.division() (vs np.division()) and tf.cos() (vs math.cos()) -- operations which were specified in (my transcribed, "custom") definition of the loss_function().
The loss_function() had been predefined in the body of the tutorial and when I "inspected" it using the inspect package ( using inspect.getsourcelines(loss_function) ) in order to redefine it in my own environment, the output of said inspection didn't clearly indicate that tf.division & tf.cos had been used instead of their NumPy counterparts (which my version of the code had used).
The actual difference is quite small, but is apparently sufficient to push the optimizer in the opposite direction (away from the two respective minima).
After swapping in tf.division() and tf.cos (as seen below) I was able to arrive at the same results as seen in the DC console.
Here is the code for the loss_function that will back in to the same results as seen in the console (screenshot) :
def loss_function(x):
import math
return 4.0*tf.cos(x-1)+tf.divide(tf.cos(2.0*math.pi*x),x)