bridgesampling with data from R2OpenBugs - bridge

I am working on Bayesian Model Averaging and need to calculate posterior model probabilities.
I want to use `bridge sampling' method recommended by Meng and Wong (1996) and Lopes and West (2004). I have posterior sample as mcmc.list class that I obtained using R2OPenBugs (90,000 samples, each sample has four values). To approximate the marginal likelihood, an approximation function g() to the joint posterior density is chosen. For simplicity, we
took g() as the tetravariate normal density function with mean set to the empirical mean
vector and variance set to the empirical variance-covariance matrix from the mcmc sample.
I dont know how to code in R this function nor how to use existing bridge_sampler function.
Bridge_sampler function has arguments that I dont know what they mean. Tutorial gives example how to use bridge_sampler function with jugs data but not with R2OpenBugs data.
Any help is appreciated!

Related

the covariance matrix returned by function ‘numpy.polyfit’

when I set ‘cov=True’,the function ‘numpy.polyfit’ will return a covariance matrix,I wonder what does it mean and how to calculate it,Thanks a lot !!!
the author of ‘ZSSR’ super resolution algorithm adopted a learning rate adjustment strategy,which first take a linear fit of reconstruction error,and then compare the standard deviation and the slope to guide the adjustment of learning rate。I can‘t see the relationship between the learning rate and the standard deviation(in fact I don’t know what does the covariance matrix returned by the funciton mean),so I wonder know the detail of this covariance matrix。

Get covariance best-fit parameters obtained by lmfit using non-"Leastsq"methods

I have some observational data and I want to fit some model parameters by using lmfit.Minimizer() to minimize an error function which, for reasons I won't get into here, must return a float instead of an array of residuals. This means that I cannot use the Leastsq method to minimize the function. In practice, methods nelder, BFGS and powell converge fine, but these methods do not provide the covariance of the best-fit parameters (MinimizerResult.covar).
I would like to know if thee is a simple way to compute this covariance when using any of the non-Leastsq methods.
It is true that the leastsq method is the only method that can calculate error bars and that this requires a residual array (with more elements than variables!).
It turns out that some work has been done in lmfit toward the goal of being able to compute uncertainties for scalar minimizers, but it is not complete. See https://github.com/lmfit/lmfit-py/issues/169 and https://github.com/lmfit/lmfit-py/pull/481. If you're interested in helping, that would be great!
But, yes, you could compute the covariance by hand. For each variable, you would need to make a small perturbation to its value (ideally around 1-sigma, but since that is what you're trying to calculate, you probably don't know it) and then fix that value and optimize all the other values. In this way you can compute the Jacobian matrix (derivative of the residual array with respect to the variables).
From the Jacobian matrix, the covariance matrix is (assuming there are no singularities):
covar = numpy.inv(numpy.dot(numpy.transpose(jacobian), jacobian))

Error propagation in a Bayesian analysis of a Markov chain

I'm analysing longitudinal panel data, in which individuals transition between different states in a Markov chain. I'm modelling the transition rates between states using a series of multinomial logistic regressions. This means that I end up with a very large number of regression slopes.
For each regression slope, I obtain a posterior distribution (using WinBUGS). From the posterior distribution, we get the mean, standard deviation, and 95% credible interval associated with the slope in question.
The value I am ultimately interested in is the expected first passage time ('hitting time') through the Markov chain. This is a function of all the different predictor variables, and so is built from the many regression slopes produced by the multinomial logistic regressions.
A simple approach would be to take the mean of each posterior distribution as a point-estimate for each regression slope, and solve for the expected first passage time at a series of different values of the predictor variables. I have now done this, but it is potentially misleading because it doesn't show the uncertainty around the predicted values of expected first passage time.
My question is: how can I calculate a credible interval for the expected first passage time?
My first thought was to approximate the error via simulation, by sampling individual values for the regression slopes from each posterior distribution, obtaining the expected first passage time given those values, and then plotting the standard deviation of all these simulated values. However, I feel like (a) this would make a statistician scream and (b) it doesn't take into account the fact that different posterior distributions will be correlated (it samples from each one independently).
In WinBUGS, you can actually obtain the correlations between the posterior distributions. So if the simulation idea is appropriate, I could in theory simulate the regression slope coefficients incorporating these correlations.
Is there a more direct and less approximate way to find the uncertainty? Could I, for instance, use WinBUGS to find the posterior distribution of the expected first passage time for a given set of values of the predictor variables? Rather like the answer to this question: define a new node and monitor it. I would imagine defining a series of new nodes, where each one is for a different set of actual predictor values, and monitoring each one. Does this make good statistical sense?
Any thoughts about this would be really appreciated!

taking the gradient in Tensorflow, tf.gradient

I am using this function of tensorflow to get my function jacobian. Came across two problems:
The tensorflow documentation is contradicted to itself in the following two paragraph if I am not mistaken:
gradients() adds ops to the graph to output the partial derivatives of ys with respect to xs. It returns a list of Tensor of length len(xs) where each tensor is the sum(dy/dx) for y in ys.
Blockquote
Blockquote
Returns:
A list of sum(dy/dx) for each x in xs.
Blockquote
According to my test, it is, in fact, return a vector of len(ys) which is the sum(dy/dx) for each x in xs.
I do not understand why they designed it in a way that the return is the sum of the columns(or row, depending on how you define your Jacobian).
How can I really get the Jacobian?
4.In the loss, I need the partial derivative of my function with respect to input (x), but when I am optimizing with respect to the network weights, I define x as a placeholder whose value is fed later, and weights are variable, in this case, can I still define the symbolic derivative of function with respect to input (x)? and put it in the loss? ( which later when we optimize with respect to weights will bring second order derivative of the function.)
I think you are right and there is a typo there, it was probably meant to be "of length len(ys)".
For efficiency. I can't explain exactly the reasoning, but this seems to be a pretty fundamental characteristic of how TensorFlow handles automatic differentiation. See issue #675.
There is no straightforward way to get the Jacobian matrix in TensorFlow. Take a look at this answer and again issue #675. Basically, you need one call to tf.gradients per column/row.
Yes, of course. You can compute whatever gradients you want, there is no real difference between a placeholder and any other operation really. There are a few operations that do not have a gradient because it is not well defined or not implemented (in which case it will generally return 0), but that's all.

Non-Convex Loss Function

I am trying to understand gradient descent algorithm by plotting the error vs value of parameters in the function. What would be an example of a simple function of the form y = f(x) with just just one input variable x and two parameters w1 and w2 such that it has a non-convex loss function ? Is y = w1.tanh(w2.x) an example ? What i am trying to achieve is this :
How does one know if the function has a non-convex loss function without plotting the graph ?
In iterative optimization algorithms such as gradient descent or Gauss-Newton, what matters is whether the function is locally convex. This is correct (on a convex set) if and only if the Hessian matrix (Jacobian of gradient) is positive semi-definite. As for a non-convex function of one variable (see my Edit below), a perfect example is the function you provide. This is because its second derivative, i.e Hessian (which is of size 1*1 here) can be computed as follows:
first_deriv=d(w1*tanh(w2*x))/dx= w1*w2 * sech^2(w2*x)
second_deriv=d(first_deriv)/dx=some_const*sech^2(w2*x)*tanh(w2*x)
The sech^2 part is always positive, so the sign of second_deriv depends on the sign of tanh, which can vary depending on the values you supply as x and w2. Therefore, we can say that it is not convex everywhere.
Edit: It wasn't clear to me what you meant by one input variable and two parameters, so I assumed that w1 and w2 were fixed beforehand, and computed the derivative w.r.t x. But I think that if you want to optimize w1 and w2 (as I suppose it makes more sense if your function is from a toy neural net), then you can compute the 2*2 Hessian in a similar way.
The same way as in high-school algebra: the second derivative tells you the direction of flex. If that's negative in all orientations, then the function is convex.