PyMC3 sample() function does not accept the "start" value to generate a trace

I am new to PyMC3 and Bayesian inference methods. I have some simple code that tries to infer the value of a decay constant (=1) from artificial data generated using a truncated exponential distribution:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import pymc3 as pm
import arviz as az

T = stats.truncexpon(b=10.)
t = T.rvs(1000)

# Bayesian inference
with pm.Model() as model:
    # Define priors
    lam = pm.Gamma('$\lambda$', alpha=1, beta=1)
    # Define likelihood
    time = pm.Exponential('time', lam=lam, observed=t)
    # Inference
    trace = pm.sample(20, start={'lam': 10.},
                      step=pm.Metropolis(), chains=1, cores=1,
                      progressbar=True)

az.plot_trace(trace)
plt.show()
This code produces a trace like the one below.
I am really confused as to why the starting value of 10. is not accepted by the sampler. The trace above should start at 10. I am using Python 3.7 to run the code.
Thank you.

A few things are going on. First, when the sampler starts it has a tuning phase; samples drawn during this phase are discarded by default, but this can be controlled with the discard_tuned_samples argument. Second, the keys in the start dictionary need to correspond to the name given to the random variable ('$\lambda$'), not the Python variable (lam).
Incorporating those two, one can try
trace = pm.sample(20, start={'$\lambda$': 10.},
                  step=pm.Metropolis(), chains=1, cores=1,
                  discard_tuned_samples=False)
However, another possible issue is that the starting value isn't guaranteed to be emitted in the first draw; that only happens if the first proposed sample is rejected, which is down to chance.
Fixing the game (setting a random seed), though, we can get a glimpse:
trace = pm.sample(20, start={'$\lambda$': 10.},
                  step=pm.Metropolis(), chains=1, cores=1,
                  discard_tuned_samples=False, random_seed=1)
...
trace.get_values(varname='$\lambda$')[:10]
# array([10.        ,  5.42397358,  3.19841997,  1.09383329,  1.09383329,
#         1.09383329,  1.09383329,  1.09383329,  1.09383329,  1.09383329])


Why does keras (SGD) optimizer.minimize() not reach global minimum in this example?

I'm in the process of completing a TensorFlow tutorial via DataCamp and am transcribing/replicating the code examples I am working through in my own Jupyter notebook.
Here are the original instructions from the coding problem:
I'm running the following snippet of code and am not able to arrive at the same result that I am generating within the tutorial, which I have confirmed are the correct values via a connected scatterplot of x vs. loss_function(x) as seen a bit further below.
# imports
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import Variable, keras

def loss_function(x):
    import math
    return 4.0*math.cos(x-1)+np.divide(math.cos(2.0*math.pi*x),x)

# Initialize x_1 and x_2
x_1 = Variable(6.0, np.float32)
x_2 = Variable(0.3, np.float32)

# Define the optimization operation
opt = keras.optimizers.SGD(learning_rate=0.01)

for j in range(100):
    # Perform minimization using the loss function and x_1
    opt.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Perform minimization using the loss function and x_2
    opt.minimize(lambda: loss_function(x_2), var_list=[x_2])

# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
I draw a quick connected scatterplot to confirm (successfully) that the loss function I am using gets me back to the same graph provided by the example (seen in the screenshot above):
# Generate loss_function(x) values for the given range of x-values
losses = []
for p in np.linspace(0.1, 6.0, 60):
    losses.append(loss_function(p))

# Define x,y coordinates
x_coordinates = list(np.linspace(0.1, 6.0, 60))
y_coordinates = losses

# Plot
plt.scatter(x_coordinates, y_coordinates)
plt.plot(x_coordinates, y_coordinates)
plt.title('Plot of Input values (x) vs. Losses')
plt.xlabel('x')
plt.ylabel('loss_function(x)')
plt.show()
Here are the resulting global and local minima, respectively, as per the DataCamp environment:
4.38 is the correct global minimum, and 0.42 indeed corresponds to the first local minimum on the graph's RHS (when starting from x_2 = 0.3).
And here are the results from my environment, both of which move in the opposite direction from where they should be heading when minimizing the loss:
I've spent the better part of the last 90 minutes trying to sort out why my results disagree with those of the DataCamp console, and why the optimizer fails to minimize this loss for this simple toy example.
I appreciate any suggestions you might have after you've run the provided code in your own environments. Many thanks in advance!
As it turned out, the difference in outputs arose from the default precision of tf.divide() (vs. np.divide()) and tf.cos() (vs. math.cos()), the operations used in my transcribed, "custom" definition of loss_function().
The loss_function() had been predefined in the body of the tutorial, and when I inspected it using the inspect package (via inspect.getsourcelines(loss_function)) in order to redefine it in my own environment, the output of that inspection didn't clearly indicate that tf.divide and tf.cos had been used instead of the NumPy/math counterparts that my version of the code had used.
The actual difference is quite small, but it is apparently sufficient to push the optimizer in the opposite direction (away from the two respective minima).
After swapping in tf.divide() and tf.cos() (as seen below), I was able to arrive at the same results as seen in the DC console.
Here is the code for the loss_function that backs into the same results as seen in the console (screenshot):
def loss_function(x):
    import math
    return 4.0*tf.cos(x-1)+tf.divide(tf.cos(2.0*math.pi*x),x)
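If you want to double-check which way SGD will step with the corrected loss, here is a small sketch using tf.GradientTape (x_check is just an illustrative name, not part of the tutorial):
x_check = tf.Variable(0.3)
with tf.GradientTape() as tape:
    loss = loss_function(x_check)
# SGD updates x via x -= learning_rate * gradient, so the sign of the gradient
# tells you which direction the optimizer will move x_check.
print(tape.gradient(loss, x_check).numpy())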

Why does pymc3 run even when I don't include any observations?

Even when I don't include any observed values, pymc3 will still run and give me results. Is this just sampling from the prior without the likelihood?
import pymc3 as pm

model = pm.Model()

with model:
    # Define the prior of the parameter lambda.
    lam = pm.Gamma('lambda', alpha=3.5, beta=2)

with model:
    trace = pm.sample(draws=20, chains=3)

pm.traceplot(trace)
Yes, you're just sampling from the prior. If you want to, you can check that by plotting the samples as a histogram or kernel density estimate and comparing them to the pdf you get from scipy.stats.gamma.
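For example, here is a rough sketch of that check, assuming the model above (note that scipy parameterizes the gamma distribution by shape a and scale = 1/beta):
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

samples = trace['lambda']          # prior draws of lambda from pm.sample
grid = np.linspace(0, samples.max(), 200)
plt.hist(samples, bins=30, density=True, label='sampled prior')
plt.plot(grid, stats.gamma.pdf(grid, a=3.5, scale=1/2), label='Gamma(3.5, 2) pdf')
plt.legend()
plt.show()
With only 20 draws per chain the histogram will be rough; increase draws for a smoother comparison.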

predicting p of binomial with beta prior in edward2 & tensorflow2

The following code infers the p of a binomial distribution using a Beta prior. Somehow, sometimes, I get meaningless results (acceptance rate = 0). When I write the same logic with pymc3, I have no issues.
I can't see what I am missing here.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
import edward2 as ed
from pymc3.stats import hpd
import seaborn
import matplotlib.pyplot as plt

p_true = .15
N = [10, 100, 1000]
successN = np.random.binomial(p=p_true, n=N)
print(N)
print(successN)

def beta_binomial(N):
    p = ed.Beta(
        concentration1=tf.ones( len(N) ),
        concentration0=tf.ones( len(N) ),
        name='p'
    )
    return ed.Binomial(total_count=N, probs=p, name='obs')

log_joint = ed.make_log_joint_fn(beta_binomial)

def target_log_prob_fn(p):
    return log_joint(N=N, p=p, obs=successN)

#kernel = tfp.mcmc.HamiltonianMonteCarlo(
#    target_log_prob_fn=target_log_prob_fn,
#    step_size=0.01,
#    num_leapfrog_steps=5)
kernel = tfp.mcmc.NoUTurnSampler(
    target_log_prob_fn=target_log_prob_fn,
    step_size=.01
)
trace, kernel_results = tfp.mcmc.sample_chain(
    num_results=1000,
    kernel=kernel,
    num_burnin_steps=500,
    current_state=[
        tf.random.uniform(( len(N) ,))
    ],
    trace_fn=(lambda current_state, kernel_results: kernel_results),
    return_final_kernel_results=False)

p, = trace
p = p.numpy()
print(p.shape)
print('acceptance rate ', np.mean(kernel_results.is_accepted))

def printSummary(name, v):
    print(name, v.shape)
    print(np.mean(v, axis=0))
    print(hpd(v))

printSummary('p', p)

for data in p.T:
    print(data.shape)
    seaborn.distplot(data, kde=False)

plt.savefig('p.png')
Libraries:
pip install -U pip
pip install -e git+https://github.com/google/edward2.git#4a8ed9f5b1dac0190867c48e816168f9f28b5129#egg=edward2
pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl#egg=tensorflow
pip install tensorflow-probability
Sometimes I see the following (when acceptance rate=0):
And, sometimes I see the following (when acceptance rate>.9):
When I get unstable results in Bayesian inference (I use mc-stan, but it's also using NUTS), it's usually because either the priors and likelihood are mis-specified, or the hyperparameters are not good for the problem.
That first graph shows that the sampler never moved away from the initial guess at the answers (hence the 0 acceptance rate). It also worries me that the green distribution sits right at 0. Beta(1,1) has positive density at 0, but p=0 might be an unstable solution here (as in, the sampler may not be able to calculate the derivative at that point, gets a NaN, and so doesn't know where to sample next; a complete guess on my part).
Can you force the initial condition to be 0 and see if that always creates a failed sampling?
Other than that, I would try tweaking the hyperparameters, such as step size, number of iterations, etc...
Also, you may want to simplify the example by only using one N. Might help you diagnose. Good luck!
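For instance, to test the initial-condition-at-0 idea, one could swap the initial state in the existing sample_chain call (a sketch, untested):
trace, kernel_results = tfp.mcmc.sample_chain(
    num_results=1000,
    kernel=kernel,
    num_burnin_steps=500,
    current_state=[tf.zeros(( len(N) ,))],  # force every dimension of p to start at 0
    trace_fn=(lambda current_state, kernel_results: kernel_results))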
tf.random.uniform's maxval defaults to None. I changed it to 1 and the result became stable:
tf.random.uniform(( len(N) ,), minval=0, maxval=1)

How to receive a finite number of samples at a future time using UHD/GNURadio?

I'm using the GNURadio Python interface to UHD, and I'm trying to set a specific time to start collecting samples and then either collect a specific number of samples or stop the collection at a specific time. Essentially, I want a timed snapshot of samples, similar to the C++ Ettus UHD example 'rx_timed_samples'.
I can get a flowgraph to start at a specific time, but I can't seem to get it to stop at a specific time (at least without causing overflows). I've also tried doing a finite acquisition, which works, but I can't get it to start at a specific time. So I'm kind of lost as to what to do next.
Here is my try at the finite acquisition (seems to just ignore the start time and collects 0 samples):
num_samples = 1000
usrp = uhd.usrp_source(
    ",".join(("", "")),
    uhd.stream_args(
        cpu_format="fc32",
        channels=range(1),
    ),
)
...
usrp.set_start_time(absolute_start_time)
samples = usrp.finite_acquisition(num_samples)
I've also tried some combinations of the following without success (TypeError: in method 'usrp_source_sptr_issue_stream_cmd', argument 2 of type '::uhd::stream_cmd_t const &'):
usrp.set_command_time(absolute_start_time)
usrp.issue_stream_cmd(uhd.stream_cmd.STREAM_MODE_NUM_SAMPS_AND_DONE)
I also tried the following in a flowgraph:
...
usrp = flowgraph.uhd_usrp_source_0
absolute_start_time = uhd.uhd_swig.time_spec_t(start_time)
usrp.set_start_time(absolute_start_time)
flowgraph.start()

stop_cmd = uhd.stream_cmd(uhd.stream_cmd.STREAM_MODE_STOP_CONTINUOUS)
absolute_stop_time = absolute_start_time + uhd.uhd_swig.time_spec_t(collection_time)
usrp.set_command_time(absolute_stop_time)
usrp.issue_stream_cmd(stop_cmd)
For whatever reason, the flowgraph approach consistently overflows for anything greater than a 0.02 s collection time.
I was running into a similar issue and solved it by using the head block.
Here's a simple example which saves 10,000 samples from a sine wave source then exits.
#!/usr/bin/env python
# Evan Widloski - 2017-09-03
# Logging test in gnuradio

from gnuradio import gr
from gnuradio import blocks
from gnuradio import analog

class top_block(gr.top_block):
    def __init__(self, output):
        gr.top_block.__init__(self)

        sample_rate = 32e3
        num_samples = 10000
        ampl = 1

        source = analog.sig_source_f(sample_rate, analog.GR_SIN_WAVE, 100, ampl)
        head = blocks.head(4, num_samples)
        sink = blocks.file_sink(4, output)

        self.connect(source, head)
        self.connect(head, sink)

if __name__ == '__main__':
    try:
        top_block('/tmp/out').run()
    except KeyboardInterrupt:
        pass
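Building on that, here is a rough, untested sketch of a timed snapshot that combines the head block with set_start_time from the question, reusing the same uhd_usrp_source arguments and uhd.uhd_swig.time_spec_t construction the question already uses (the output path and 5-second offset are just examples):
from gnuradio import gr, blocks, uhd

num_samples = 1000
usrp = uhd.usrp_source(
    ",".join(("", "")),
    uhd.stream_args(cpu_format="fc32", channels=range(1)),
)
head = blocks.head(gr.sizeof_gr_complex, num_samples)
sink = blocks.file_sink(gr.sizeof_gr_complex, '/tmp/snapshot.dat')

tb = gr.top_block()
tb.connect(usrp, head)
tb.connect(head, sink)

# Start streaming 5 seconds from the current device time; the head block then
# stops the flowgraph after num_samples have passed through it.
start_time = usrp.get_time_now().get_real_secs() + 5.0
usrp.set_start_time(uhd.uhd_swig.time_spec_t(start_time))
tb.run()   # run() returns once head has copied num_samples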

sklearn's `RandomizedSearchCV` not working with `np.random.RandomState`

I am trying to optimize a pipeline and wanted to try giving RandomizedSearchCV a np.random.RandomState object. I can't get it to work, but I can give it other distributions.
Is there special syntax I can use to give RandomizedSearchCV something like np.random.RandomState(0).uniform(0.1, 1.0)?
from scipy import stats
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.grid_search import RandomizedSearchCV
# Generate data
x = np.random.normal(5,1,size=int(1e3))
# Make model
model = KernelDensity()
# Gridsearch for best params
# This one works
search_params = RandomizedSearchCV(model, param_distributions={"bandwidth":stats.uniform(0.1, 1)}, n_iter=30, n_jobs=2)
search_params.fit(x[:, None])
# RandomizedSearchCV(cv=None, error_score='raise',
# estimator=KernelDensity(algorithm='auto', atol=0, bandwidth=1.0, breadth_first=True,
# kernel='gaussian', leaf_size=40, metric='euclidean',
# metric_params=None, rtol=0),
# fit_params={}, iid=True, n_iter=30, n_jobs=2,
# param_distributions={'bandwidth': <scipy.stats._distn_infrastructure.rv_frozen object at 0x106ab7da0>},
# pre_dispatch='2*n_jobs', random_state=None, refit=True,
# scoring=None, verbose=0)
# This one doesn't work :(
search_params = RandomizedSearchCV(model, param_distributions={"bandwidth":np.random.RandomState(0).uniform(0.1, 1)}, n_iter=30, n_jobs=2)
# TypeError: object of type 'float' has no len()
What you observe is expected: the uniform method of an np.random.RandomState object immediately draws a sample at the time of the call, so you are passing a single float rather than a distribution.
By contrast, your use of scipy's stats.uniform() creates a distribution that is only sampled from later. (Although I'm not sure it's doing what you expect in your case; be careful with the parameters.)
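On that parameter caveat: scipy's stats.uniform takes (loc, scale) and samples from [loc, loc + scale], so stats.uniform(0.1, 1) covers [0.1, 1.1]. For [0.1, 1.0] you would want:
from scipy import stats

bandwidth_dist = stats.uniform(loc=0.1, scale=0.9)   # samples from [0.1, 1.0]
print(bandwidth_dist.rvs(3))                         # three candidate bandwidths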
If you want to incorporate something based on np.random.RandomState(), you have to build your own class, as mentioned in the docs:
This example uses the scipy.stats module, which contains many useful distributions for sampling parameters, such as expon, gamma, uniform or randint. In principle, any function can be passed that provides a rvs (random variate sample) method to sample a value. A call to the rvs function should provide independent random samples from possible parameter values on consecutive calls.
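A minimal sketch of such a wrapper (the class name UniformRVS is hypothetical, not a sklearn or NumPy API); it exposes the rvs method RandomizedSearchCV looks for while drawing from a seeded np.random.RandomState:
import numpy as np

class UniformRVS:
    """Hypothetical helper: draws from [low, high) using a seeded RandomState."""
    def __init__(self, low, high, seed=0):
        self.low, self.high = low, high
        self.rng = np.random.RandomState(seed)

    def rvs(self, random_state=None, size=None):
        # Consecutive calls yield independent draws, which is all the search needs.
        return self.rng.uniform(self.low, self.high, size=size)

search_params = RandomizedSearchCV(
    model, param_distributions={"bandwidth": UniformRVS(0.1, 1.0)},
    n_iter=30, n_jobs=2)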