pymc python change point detection for small probabilities. ZeroProbability Error - bayesian

I am trying to use pymc to find a change point in a time-series. The value I am looking at over time is probability to "convert" which is very small, 0.009 on average with a range of 0.001-0.016.
I give the two probabilities a uniform distribution as a prior between zero and the max observation.
alpha = df.cnvrs.max() # Set upper uniform
center_1_c = pm.Uniform("center_1_c", 0, alpha)
center_2_c = pm.Uniform("center_2_c", 0, alpha)
day_c = pm.DiscreteUniform("day_c", lower=1, upper=n_days)
#pm.deterministic
def lambda_(day_c=day_c, center_1_c=center_1_c, center_2_c=center_2_c):
out = np.zeros(n_days)
out[:day_c] = center_1_c
out[day_c:] = center_2_c
return out
observation = pm.Uniform("obs", lambda_, value=df.cnvrs.values, observed=True)
When I run this code I get:
ZeroProbability: Stochastic obs's value is outside its support,
or it forbids its parents' current values.
I'm pretty new to pymc so not sure if I'm missing something obvious. My guess is I might not have appropriate distributions for modelling small probabilities.

It's impossible to tell where you've introduced this bug—and programming is off-topic here, in any case—without more of your output. But there is a statistical issue here: You've somehow constructed a model that cannot produce either the observed variables or the current sample of latent ones.
To give a simple example, say you have a dataset with negative values, and you've assumed it to be gamma distributed; this will produce an error, because the data has zero probability under a gamma. Similarly, an error will be thrown if an impossible value is sampled during an MCMC chain.

Related

Parameters for numpy.random.lognormal function

I need to create a fictitious log-normal distribution of household income in a particular area. The data I have are: Average: 13,600 and Standard Deviation 7,900.
What should be the parameters in the function numpy.random.lognormal?
When i set the mean and the standard deviation as they are most of the values in the distribution are "inf", and the values also doesn't make sense when i set the parameters as the log of the mean and standard deviation.
If someone can help me to figure out what the parameters are it would be great.
Thanks!
This is indeed a nontrivial task as the moments of the log-normal distribution should be solved for the unknown parameters. By looking at say [Wikipedia][1], you will find the mean and variance of the log-normal distribution to be exp(mu + sigma2) and [exp(sigma2)-1]*exp(2*mu+sigma**2), respectively.
The choice of mu and sigma should solve exp(mu + sigma**2) = 13600 and [exp(sigma**2)-1]*exp(2*mu+sigma**2)= 7900**2. This can be solved analytically because the first equation squared provides exactly exp(2*mu+sigma**2) thus eliminating the variable mu from the second equation.
A sample code is provided below. I took a large sample size to explicitly show that the mean and standard deviation of the simulated data are close to the desired numbers.
import numpy as np
# Input characteristics
DataAverage = 13600
DataStdDev = 7900
# Sample size
SampleSize = 100000
# Mean and variance of the standard lognormal distribution
SigmaLogNormal = np.sqrt( np.log(1+(DataStdDev/DataAverage)**2))
MeanLogNormal = np.log( DataAverage ) - SigmaLogNormal**2/2
print(MeanLogNormal, SigmaLogNormal)
# Obtain draw from log-normal distribution
Draw = np.random.lognormal(mean=MeanLogNormal, sigma=SigmaLogNormal, size=SampleSize)
# Check
print( np.mean(Draw), np.std(Draw))

Question about inconsistency between tensorflow lite quantization code, paper and documentation

In this paper (Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference) published by google, quantization scheme is described as follows:
Where,
M = S1 * S2 / S3
S1, S2 and S3 are scales of inputs and output respectively.
Both S1 (and zero point Z1) and S2 (and zero point Z2) can be determined easily, whether "offline" or "online". But what about S3 (and zero point Z3)? These parameters are dependent on "actual" output scale (i.e., the float value without quantization). But output scale is unknown before it is computed.
According to the tensorflow documentation:
At inference, weights are converted from 8-bits of precision to floating point and computed using floating-point kernels. This conversion is done once and cached to reduce latency.
But the code below says something different:
tensor_utils::BatchQuantizeFloats(
input_ptr, batch_size, input_size, quant_data, scaling_factors_ptr,
input_offset_ptr, params->asymmetric_quantize_inputs);
for (int b = 0; b < batch_size; ++b) {
// Incorporate scaling of the filter.
scaling_factors_ptr[b] *= filter->params.scale;
}
// Compute output += weight * quantized_input
int32_t* scratch = GetTensorData<int32_t>(accum_scratch);
tensor_utils::MatrixBatchVectorMultiplyAccumulate(
filter_data, num_units, input_size, quant_data, scaling_factors_ptr,
batch_size, GetTensorData<float>(output), /*per_channel_scale=*/nullptr,
input_offset_ptr, scratch, row_sums_ptr, &data->compute_row_sums,
CpuBackendContext::GetFromContext(context));
Here we can see:
scaling_factors_ptr[b] *= filter->params.scale;
I think this means:
S1 * S2 is computed.
The weights are still integers. Just the final results are floats.
It seems S3 and Z3 don't have to be computed. But if so, how can the final float results be close to the unquantized results?
This inconsistency between paper, documentation and code makes me very confused. I can't tell what I miss. Can anyone help me?
Let me answer my own question. All of a sudden I saw what I missed when I was
riding bicycle. The code in the question above is from the function
tflite::ops::builtin::fully_connected::EvalHybrid(). Here the
name has explained everything! Value in the output of matrix multiplication is
denoted as r3 in section 2.2 of the paper. In terms of equation
(2) in section 2.2, we have:
If we want to get the float result of matrix multiplication, we can use equation (4) in section 2.2, then convert the result back to floats, OR we can use equation (3) with the left side replaced with r3, as in:
If we choose all the zero points to be 0, then the formula above becomes:
And this is just what EvalHybrid() does (ignoring the bias for the moment). Turns out the paper gives an outline of the quantization algorithm, while the implementation uses different variants.

Tensorflow Tensorboard - should I follow the "smooth" value or the "Value"?

I am using TF tensorboard to monitor the training progress for a model. I am getting a bit confused because I am seeing the two points that represent the validation loss value showing a different direction:
Time=13:30 Smoothed=18.33 Value=15.41..........
Time=13:45 Smoothed=17.76 Value=16.92
In this case, is the validation loss increasing or decreasing? thanks!
As I cannot put figures in the comments, have a look at this graph.
If you watch the falling slope between x = 50 and x = 100, you will see that locally, the real values increase at some points (usually after downward spikes). So you could conclude that your function values are increasing. But at a larger scope you will see that the function values are decreasing. The smoothing helps you to get make the interpretation easier, but does not return exact values.
Coming back to the local example, it would give you the insight that the overall trend is a decreasing function, but it does not provide accurate loss values.

Tensorflow: What exact formula is applied in `tf.nn.sparse_softmax_cross_entropy_with_logits`?

I tried to manually recompute the outputs of this function so I created a minimal example:
logits = tf.pack(np.array([[[[0,1,2]]]],dtype=np.float32)) # img of shape (1, 1, 1, 3)
labels = tf.pack(np.array([[[1]]],dtype=np.int32)) # gt of shape (1, 1, 1)
softmaxCrossEntropie = tf.nn.sparse_softmax_cross_entropy_with_logits(logits,labels)
softmaxCrossEntropie.eval() # --> output is [1.41]
Now according to my own calculation I only get [1.23]
When manually calculating, I'm simply applying softmax
and cross-entropy:
where q(x) = sigma(x_j) or (1-sigma(x_j)) depending whether j is the correct ground truth class or not and p(x) = labels which are then one-hot-encoded
I'm not sure where the difference might originate from. I cannot really imagine that some epsilon causes such a big difference. Does someone know where I can lookup, which exact formula is used by tensorflow?
Is the source code of that exact part available?
I could only find nn_ops.py, but it only uses another function called gen_nn_ops._sparse_softmax_cross_entropy_with_logits which I couldn't find on github...
Well, usually p(x) in cross-entropy equation is true distribution, while q(x) is the distribution obtained from softmax. So, if p(x) is one-hot (and this is so, otherwise sparse cross-entropy could not be applied), cross entropy is just negative log for probability of true category.
In your example, softmax(logits) is a vector with values [0.09003057, 0.24472847, 0.66524096], so the loss is -log(0.24472847) = 1.4076059 which is exactly what you got as output.

PyMC: How can I describe a state space model?

I used to code my MCMC using C. But I'd like to give PyMC a try.
Suppose X_n is the underlying state whose dynamics following a Markov chain and Y_n is the observed data. In particular,
Y_n has Poisson distribution with mean depending on X_n and a multidimensional unknown parameter theta
X_n | X_{n-1} has distribution depending on theta
How should I describe this model using PyMC?
Another question: I can find conjugate priors for theta but not for X_n. Is it possible to specify which posteriors are updated using conjugate priors and which using MCMC?
Here is an example of a state-space model in PyMC on the PyMC wiki. It basically involves populating a list and allowing PyMC to treat it as a container of PyMC nodes.
As for the second part of the question, you could certainly calculate some of your conjugate posteriors ahead of time and put them into the model. For example, if you observed binomial data x=4, n=10 you could insert a Beta node p = Beta('p', 5, 7) to represent that posterior (its really just a prior, as far as the model is concerned, but it is the posterior given data x). Then PyMC would draw a sample for this posterior at every iteration to be used wherever it is needed in the model.