What are the interval transforms in pymc3 for uniform distributions? - bayesian

I've noticed that when using uniform distributions in pymc3, the sampler also scans over an _interval parameter unless a transform is specified. For example:
with fitModel6:
    normMu = pm.Uniform('normMu', lower=0, upper=1000)
results in sampling not only normMu but also normMu_interval:
(Trace plots of normMu_interval and normMu)
Normally, when I use a uniform prior for a scale parameter such as a normalization, I would of course sample over the log of the interval. Is pymc3 handling this for me somehow?
Cheers

PyMC3 automatically applies transformations to bounded variables to put them on an unconstrained scale. The code for each transformation is here and a very brief discussion of the automatic transformation of variables is found in the official PyMC3 Tutorial.
Edit
In case the link breaks or moves again, here's the bulk of the info from the Tutorial:
In order to sample models more efficiently, PyMC3 automatically transforms bounded RVs to be unbounded.
import pymc3 as pm
with pm.Model() as model:
    x = pm.Uniform('x', lower=0, upper=1)
When we look at the RVs of the model, we would expect to find x there. However:
In [16]: model.free_RVs
Out[16]: [x_interval__]
x_interval__ represents x transformed to accept parameter values between -inf and +inf. In the case of an upper and a lower bound, a LogOdds transform is applied. Sampling in this transformed space makes it easier for the sampler...
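To make the quoted description concrete, here is a small NumPy sketch of an interval (log-odds) transform for bounds a < x < b. It assumes the form y = log(x - a) - log(b - x) with inverse x = a + (b - a) * sigmoid(y); the function names are hypothetical and this is not PyMC3's own code. The log-Jacobian term is what gets added to the log-density when sampling in y-space. Because of that correction, the prior on x stays uniform: the transform is a reparameterization for the sampler, not a switch to a uniform prior on log(x).

```python
import numpy as np


def interval_forward(x, a, b):
    """Map x in (a, b) onto the whole real line (log-odds of the rescaled value)."""
    return np.log(x - a) - np.log(b - x)


def interval_backward(y, a, b):
    """Map an unconstrained y back into (a, b) with a logistic sigmoid."""
    return a + (b - a) / (1.0 + np.exp(-y))


def interval_log_jacobian(y, a, b):
    """log |dx/dy| of the backward map, added to the log-density in y-space."""
    # log|dx/dy| = log(b - a) + log(sigmoid(y)) + log(1 - sigmoid(y))
    #            = log(b - a) - 2 * softplus(-y) - y
    # np.logaddexp(0, -y) == log(1 + exp(-y)) == softplus(-y), computed stably.
    return np.log(b - a) - 2.0 * np.logaddexp(0.0, -y) - y


# Round trip for the asker's bounds (0, 1000).
y = interval_forward(250.0, 0.0, 1000.0)
x_back = interval_backward(y, 0.0, 1000.0)
```

A value of 250 maps to log(250 / 750) = log(1/3), and the backward map returns it to 250.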

Related

taking the gradient in TensorFlow, tf.gradients

I am using this TensorFlow function to get the Jacobian of my function and came across several problems:
1. If I am not mistaken, the TensorFlow documentation contradicts itself in the following two paragraphs:
   > gradients() adds ops to the graph to output the partial derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.
   > Returns: A list of sum(dy/dx) for each x in xs.
   According to my test, it does in fact return a vector of length len(ys), which is the sum(dy/dx) for each x in xs.
2. I do not understand why it was designed so that the return value is the sum of the columns (or rows, depending on how you define your Jacobian).
3. How can I actually get the Jacobian?
4. In the loss, I need the partial derivative of my function with respect to the input x. When I optimize with respect to the network weights, I define x as a placeholder whose value is fed later, and the weights are variables. In this case, can I still define the symbolic derivative of the function with respect to the input x and put it in the loss? (When we later optimize with respect to the weights, this brings in second-order derivatives of the function.)
1. I think you are right and there is a typo there; it was probably meant to be "of length len(ys)".
2. For efficiency. I can't explain the reasoning exactly, but this seems to be a fairly fundamental characteristic of how TensorFlow handles automatic differentiation. See issue #675.
3. There is no straightforward way to get the Jacobian matrix in TensorFlow. Take a look at this answer and, again, issue #675. Basically, you need one call to tf.gradients per column/row.
4. Yes, of course. You can compute whatever gradients you want; there is no real difference between a placeholder and any other operation. A few operations have no gradient because it is either not well defined or not implemented (in which case it will generally return 0), but that's all.
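The "sum" behaviour in points 1 and 2 and the one-call-per-row recipe in point 3 can be illustrated without TensorFlow. The NumPy sketch below uses a hypothetical function f(x) = W @ tanh(x) and a hand-written vector-Jacobian product `vjp` as a stand-in for one reverse-mode pass. It assumes, as tf.gradients does by default (its grad_ys argument defaults to ones), that the pass is seeded with a vector of ones, which sums the Jacobian over the outputs. Seeding with one-hot vectors instead recovers the Jacobian one row at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # hypothetical weights: 3 outputs, 4 inputs


def f(x):
    """Hypothetical vector function y = W @ tanh(x)."""
    return W @ np.tanh(x)


def vjp(x, v):
    """Vector-Jacobian product v^T J(x): what a single reverse-mode pass computes."""
    return (W.T @ v) * (1.0 - np.tanh(x) ** 2)


x = rng.normal(size=4)

# Analogue of tf.gradients(y, x) with the default all-ones seed:
# sum over outputs i of dy_i/dx, one entry per element of x.
summed = vjp(x, np.ones(3))

# Full Jacobian J[i, j] = dy_i/dx_j: one reverse pass per output, seeded one-hot.
J = np.stack([vjp(x, e) for e in np.eye(3)])
```

The cost is one backward pass per output component, which is why building a full Jacobian this way gets expensive for large outputs.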

pymc python change point detection for small probabilities. ZeroProbability Error

I am trying to use pymc to find a change point in a time series. The value I am looking at over time is the probability of a "conversion", which is very small: 0.009 on average, with a range of 0.001-0.016.
I give the two probabilities a uniform distribution as a prior between zero and the max observation.
alpha = df.cnvrs.max() # Set upper uniform
center_1_c = pm.Uniform("center_1_c", 0, alpha)
center_2_c = pm.Uniform("center_2_c", 0, alpha)
day_c = pm.DiscreteUniform("day_c", lower=1, upper=n_days)
@pm.deterministic
def lambda_(day_c=day_c, center_1_c=center_1_c, center_2_c=center_2_c):
    out = np.zeros(n_days)
    out[:day_c] = center_1_c
    out[day_c:] = center_2_c
    return out
observation = pm.Uniform("obs", lambda_, value=df.cnvrs.values, observed=True)
When I run this code I get:
ZeroProbability: Stochastic obs's value is outside its support,
or it forbids its parents' current values.
I'm pretty new to pymc so not sure if I'm missing something obvious. My guess is I might not have appropriate distributions for modelling small probabilities.
It's impossible to tell where you've introduced this bug—and programming is off-topic here, in any case—without more of your output. But there is a statistical issue here: You've somehow constructed a model that cannot produce either the observed variables or the current sample of latent ones.
To give a simple example, say you have a dataset with negative values, and you've assumed it to be gamma distributed; this will produce an error, because the data has zero probability under a gamma. Similarly, an error will be thrown if an impossible value is sampled during an MCMC chain.
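A minimal NumPy sketch of how such a zero-probability state arises, assuming the observed node uses lambda_ as its lower bound (as `pm.Uniform("obs", lambda_, ...)` suggests). `uniform_logpdf` is a hypothetical helper, not a PyMC function, and the observations and chain state are made up for illustration. PyMC raises ZeroProbability when a stochastic's log-probability is -inf, which is what happens as soon as one observation falls below its current center.

```python
import numpy as np


def uniform_logpdf(x, lower, upper):
    """Log-density of Uniform(lower, upper); -inf outside [lower, upper]."""
    x = np.asarray(x, dtype=float)
    inside = (x >= lower) & (x <= upper)
    return np.where(inside, -np.log(upper - lower), -np.inf)


# Made-up conversion rates in the asker's 0.001-0.016 range.
obs = np.array([0.004, 0.009, 0.012, 0.016])

# Hypothetical chain state: change point after day 2, centers 0.006 and 0.010.
day, center_1, center_2 = 2, 0.006, 0.010
lower = np.where(np.arange(len(obs)) < day, center_1, center_2)

# 0.004 lies below its lower bound 0.006, so the total log-likelihood is -inf.
logp = uniform_logpdf(obs, lower, 0.016).sum()
print(logp)  # -inf

# With a fixed lower bound of 0, every observation is inside the support.
ok = uniform_logpdf(obs, 0.0, 0.016).sum()
```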

Elementwise Sampling with map_fn Slow

Say that I want to sample a matrix with each entry sampled from a distribution defined by an entry in another matrix. I unroll my matrix and apply map_fn to each element. With a relatively small matrix (128 x 128), the following gives me several PoolAllocator warnings (GTX TITAN Black) and does not train in any reasonable amount of time.
def sample(x):
    samples = tf.map_fn(lambda z:
                        tf.random_normal([1], mean=z,
                                         stddev=tf.sqrt(z * (1 - z))),
                        tf.reshape(x, [-1]))  # apply to each element
    return tf.cond(is_training, lambda: tf.reshape(samples, shape=tf.shape(x)),
                   lambda: tf.tanh(x))
Is there a better way to apply an elementwise operation like this?
Your code will run much faster if you can use Tensor-at-a-time operations instead of elementwise operations like tf.map_fn.
Here it looks like you want to sample from a normal distribution for each element, where the parameters of the distribution are different for each value in an input tensor. Try something like this:
def sample(x):
    # Standard normals shaped like x, scaled by sqrt(x * (1 - x)) and shifted by x
    samples = tf.random_normal(shape=tf.shape(x)) * tf.sqrt(x * (1 - x)) + x
    return samples
tf.random_normal() generates a normal distribution with mean 0.0 and standard deviation 1.0 by default. You can use point-wise tensor operations to fix up the standard deviation (by multiplying) and the mean (by adding) for each element. In fact, if you look at how tf.random_normal() is implemented, that's precisely what it does internally.
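The same draw-standard-normals-then-scale-and-shift trick, written as a NumPy sketch so it can be checked without a GPU. NumPy's `Generator.standard_normal` stands in for tf.random_normal here, and the input range (0.1, 0.9) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample(x, rng):
    """One draw per element from Normal(mean=x, std=sqrt(x * (1 - x))), fully vectorized."""
    eps = rng.standard_normal(x.shape)        # standard normals, same shape as x
    return x + np.sqrt(x * (1.0 - x)) * eps   # scale by per-element std, shift by mean


x = rng.uniform(0.1, 0.9, size=(128, 128))   # illustrative input in (0, 1)
samples = sample(x, rng)
```

No per-element loop is involved: the whole 128 x 128 matrix is handled by two elementwise array operations.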
(You would probably also do better using a Python conditional to distinguish training from test time.)
If you plan to do this sort of thing a lot, you might file a feature request on github asking to generalize tf.random_normal to accept Tensors with more general shapes for mean and stddev. I see no reason why that shouldn't be supported.
Hope that helps!
See the tensorflow.contrib.distributions module, which has a Normal class with a sample method that does this for you.

Error when computing eigenvalues of a scipy LinearOperator: "gmres did not converge"

I'm trying to solve a large eigenvalue problem with Scipy where the matrix A is dense but I can compute its action on a vector without having to assemble A explicitly. So in order to avoid memory issues when the matrix A gets big I'd like to use the sparse solver scipy.sparse.linalg.eigs with a LinearOperator that implements this action.
Applying eigs to an explicit numpy array A works fine. However, if I apply eigs to a LinearOperator instead then the iterative solver fails to converge. This is true even if the matvec method of the LinearOperator is simply matrix-vector multiplication with the given matrix A.
A minimal example illustrating the failure is attached below (I'm using shift-invert mode because I am interested in the smallest few eigenvalues). This computes the eigenvalues of a random matrix A just fine, but fails when applied to a LinearOperator that is directly converted from A. I tried to fiddle with the parameters for the iterative solver (v0, ncv, maxiter) but to no avail.
Am I missing something obvious? Is there a way to make this work? Any suggestions would be highly appreciated. Many thanks!
Edit: I should clarify what I mean by "make this work" (thanks, Dietrich). The example below uses a random matrix for illustration. However, in my application I know that the eigenvalues are almost purely imaginary (or almost purely real if I multiply the matrix by 1j). I'm interested in the 10-20 smallest-magnitude eigenvalues, but the algorithm doesn't behave well (i.e., never stops even for small-ish matrix sizes) if I specify which='SM'. Therefore I'm using shift-invert mode by passing the parameters sigma=0.0, which='LM'. I'm happy to try a different approach so long as it allows me to compute a bunch of smallest-magnitude eigenvalues.
from scipy.sparse.linalg import eigs, LinearOperator, aslinearoperator
import numpy as np
# Set a seed for reproducibility
np.random.seed(0)
# Size of the matrix
N = 100
# Generate a random matrix of size N x N
# and compute its eigenvalues
A = np.random.random_sample((N, N))
eigvals = eigs(A, sigma=0.0, which='LM', return_eigenvectors=False)
print(eigvals)
# Convert the matrix to a LinearOperator
A_op = aslinearoperator(A)
# Try to solve the same eigenproblem again.
# This time it produces an error:
#
# ValueError: Error in inverting M: function gmres did not converge (info = 1000).
eigvals2 = eigs(A_op, sigma=0.0, which='LM', return_eigenvectors=False)
I tried running your code without passing the sigma parameter to eigs() (see the eigs() docs for its meaning), and it ran without problems. I didn't see the benefit of using it in your example.
eigs can already find the smallest eigenvalues first: set which='SM'.
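Some background on the error, as a sketch: in shift-invert mode eigs has to apply (A - sigma*I)^-1. For a dense ndarray SciPy factorizes that matrix directly, but for a generic LinearOperator it falls back to an iterative GMRES solve, which is where "gmres did not converge" comes from. eigs also accepts an OPinv argument, a LinearOperator that applies the inverse itself. The example below builds OPinv from an LU factorization of the explicit test matrix only to demonstrate that mechanism; in the asker's matrix-free setting, OPinv would have to wrap whatever solver for (A - sigma*I) x = b can actually be made to converge, which is an assumption this sketch does not address.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, aslinearoperator, eigs

np.random.seed(0)
N = 100
A = np.random.random_sample((N, N))
sigma = 0.0

# Factorize (A - sigma*I) once and apply its inverse with triangular solves,
# mirroring what eigs does internally when given a dense array.
lu_piv = lu_factor(A - sigma * np.eye(N))
OPinv = LinearOperator((N, N), matvec=lambda b: lu_solve(lu_piv, b), dtype=A.dtype)

A_op = aslinearoperator(A)
# With OPinv supplied, eigs uses it instead of running GMRES on A_op.
vals = eigs(A_op, k=6, sigma=sigma, which='LM', OPinv=OPinv,
            return_eigenvectors=False)
print(np.sort_complex(vals))
```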

PyMC: How can I describe a state space model?

I used to code my MCMC using C. But I'd like to give PyMC a try.
Suppose X_n is the underlying state, whose dynamics follow a Markov chain, and Y_n is the observed data. In particular:
- Y_n has a Poisson distribution whose mean depends on X_n and on a multidimensional unknown parameter theta;
- X_n | X_{n-1} has a distribution that depends on theta.
How should I describe this model using PyMC?
Another question: I can find conjugate priors for theta but not for X_n. Is it possible to specify which posteriors are updated using conjugate priors and which using MCMC?
Here is an example of a state-space model in PyMC on the PyMC wiki. It basically involves populating a list and allowing PyMC to treat it as a container of PyMC nodes.
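To show what such a list of nodes encodes, here is a plain NumPy sketch of the model's joint log-density, with one conditional term per state and per observation. The specific dynamics are hypothetical choices for illustration: X_n is an AR(1) process on the log-rate with theta = (rho, sigma), and Y_n | X_n ~ Poisson(exp(X_n)). This is not PyMC code; a PyMC model built from a list of nodes would sum the same kind of terms.

```python
import numpy as np
from math import lgamma


def simulate(theta, n_steps, rng):
    """Draw (x, y): AR(1) latent log-rate x_n and Poisson counts y_n with mean exp(x_n)."""
    rho, sigma = theta
    x = np.empty(n_steps)
    x[0] = rng.normal(0.0, sigma)
    for n in range(1, n_steps):
        x[n] = rng.normal(rho * x[n - 1], sigma)  # X_n | X_{n-1} depends on theta
    y = rng.poisson(np.exp(x))                     # Y_n | X_n ~ Poisson(exp(X_n))
    return x, y


def normal_logpdf(v, mu, s):
    return -0.5 * np.log(2.0 * np.pi * s ** 2) - (v - mu) ** 2 / (2.0 * s ** 2)


def log_joint(x, y, theta):
    """log p(x, y | theta): a sum of one term per node in the chain."""
    rho, sigma = theta
    lp = normal_logpdf(x[0], 0.0, sigma)
    lp += np.sum(normal_logpdf(x[1:], rho * x[:-1], sigma))
    # Poisson log-pmf with log-mean x_n: y_n * x_n - exp(x_n) - log(y_n!)
    lp += np.sum(y * x - np.exp(x)) - sum(lgamma(k + 1) for k in y)
    return float(lp)


rng = np.random.default_rng(1)
x, y = simulate((0.9, 0.3), 50, rng)
```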
As for the second part of the question, you could certainly calculate some of your conjugate posteriors ahead of time and put them into the model. For example, if you observed binomial data x=4, n=10 and started from a uniform Beta(1, 1) prior, you could insert a Beta node p = Beta('p', 5, 7) to represent that posterior (it's really just a prior as far as the model is concerned, but it is the posterior given data x). PyMC would then draw a sample from this posterior at every iteration, to be used wherever it is needed in the model.
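The Beta(5, 7) above comes from the standard Beta-binomial conjugate update: starting from Beta(a, b), observing x successes in n trials gives Beta(a + x, b + n - x). A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def beta_binomial_update(a, b, x, n):
    """Posterior Beta parameters after x successes in n binomial trials, prior Beta(a, b)."""
    return a + x, b + (n - x)


print(beta_binomial_update(1, 1, 4, 10))  # (5, 7)
```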