Choosing prior distribution in logit model - error-handling

I am fitting a logit regression in WinBUGS, called from R. All coefficients of the model must be positive, so I used uniform priors for them, but WinBUGS rejects this and pops up an error window. When I used dnorm(0.0, 1.0E-4) as the prior for all coefficients instead, the error went away. What can I do to obtain positive betas in the model given below?
model
{
for (i in 1:m) {
# Linear regression on logit
logit(p[i]) <- beta.concern2*DCEconcern2[i] + beta.concern3*DCEconcern3[i] + beta.concern4*DCEconcern4[i] + beta.concern5*DCEconcern5[i] +
beta.breath2*DCEbreath2[i] + beta.breath3*DCEbreath3[i] + beta.breath4*DCEbreath4[i] + beta.breath5*DCEbreath5[i] +
beta.weath2*DCEweath2[i] +beta.weath3*DCEweath3[i] +beta.weath4*DCEweath4[i] +beta.weath5*DCEweath5[i] +
beta.sleep2*DCEsleep2[i] +beta.sleep3*DCEsleep3[i] +beta.sleep4*DCEsleep4[i] +beta.sleep5*DCEsleep5[i] +
beta.act2*DCEact2[i] +beta.act3*DCEact3[i] +beta.act4*DCEact4[i] +beta.act5*DCEact5[i]
y2[i] ~ dbern(p[i])
}
beta.concern2 ~ dunif(0,100)
beta.concern3 ~ dunif(0,100)
beta.concern4 ~ dunif(0,100)
beta.concern5 ~ dunif(0,100)
beta.breath2 ~ dunif(0,100)
beta.breath3 ~ dunif(0,100)
beta.breath4 ~ dunif(0,100)
beta.breath5 ~ dunif(0,100)
beta.weath2 ~ dunif(0,100)
beta.weath3 ~ dunif(0,100)
beta.weath4 ~ dunif(0,100)
beta.weath5 ~ dunif(0,100)
beta.sleep2 ~ dunif(0,100)
beta.sleep3 ~ dunif(0,100)
beta.sleep4 ~ dunif(0,100)
beta.sleep5 ~ dunif(0,100)
beta.act2 ~ dunif(0,100)
beta.act3 ~ dunif(0,100)
beta.act4 ~ dunif(0,100)
beta.act5 ~ dunif(0,100)
}

Try
dnorm(0, 1.0E-8)I(0, 1.0E8)
Notice the 1.0 instead of the 10, which was causing the "expected right parenthesis" error.

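In your case, I would prefer a flat half-normal prior, i.e. something like
dnorm(0, 1.0E-8)I(0, 1.0E8)
Give it a shot.
EDIT: the added I(a, b) just limits the distribution to the interval from a to b. Keep in mind that the second argument of dnorm is a precision, so 1.0E-8 corresponds to a very large variance.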
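To make the suggested prior concrete outside BUGS, here is a minimal Python sketch (not WinBUGS code) of what it amounts to. It assumes the BUGS convention that dnorm takes a precision, and it mimics the interval restriction with plain rejection sampling; the function names are made up for illustration.

```python
import math
import random

def precision_to_sd(tau):
    """BUGS parameterizes dnorm by precision: tau = 1 / sd**2."""
    return 1.0 / math.sqrt(tau)

def draw_truncated_normal(mean, tau, lower, upper, rng):
    """Draw from normal(mean, precision tau) restricted to [lower, upper].

    Uses rejection sampling, which is cheap here because roughly half of
    the draws from a zero-mean normal already land on the positive side.
    """
    sd = precision_to_sd(tau)
    while True:
        x = rng.gauss(mean, sd)
        if lower <= x <= upper:
            return x

rng = random.Random(1)
prior_sd = precision_to_sd(1.0e-8)  # precision 1.0E-8 -> sd of about 1.0E4
prior_draws = [draw_truncated_normal(0.0, 1.0e-8, 0.0, 1.0e8, rng)
               for _ in range(1000)]
```

Every draw is non-negative, which is the property wanted for the betas, while the spread stays very wide.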

Related

Specify logit function explicitly in WinBUGS/OpenBUGS

I'm new to OpenBUGS and I have a problem fitting a model with the logit() function.
Reading around, I found that one possible solution is to specify the logit transformation explicitly instead of using WinBUGS’ own logit function:
In more complex models, we have fairly often experienced problems when
using WinBUGS’ own logit function, for instance with achieving
convergence (actually, problems may arise even with fairly simple
models). Therefore, it is often better to specify that transformation
explicitly by logit.p[i] <- log(p[i] / (1 - p[i])), p[i] <-
exp(logit.p[i]) / (1 + exp(logit.p[i])) or p[i] <-
1 / (1 + exp(-logit.p[i])).
(more information here http://www.mbr-pwrc.usgs.gov/software/kerybook/AppendixA_list_of_WinBUGS_tricks.pdf at point 14.).
The problem is that I don't understand how to do that. Suppose my original likelihood, using the built-in WinBUGS logit function, was:
for (i in 1:n){
y[i] ~ dbern(p[i])
logit(p[i]) <- beta[1] + beta[2]*x1[i] + beta[3]*x2[i] + beta[4]*x3[i]
}
How do I write that explicitly?
Thank you very much.
Vincenzo
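Thanks to a colleague, I found a way to specify the logit function explicitly in OpenBUGS. The working code is below; note that the names are swapped relative to the quoted tip: here p[i] is the linear predictor and logit.p[i] is the resulting probability.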
for (i in 1:n){
y[i] ~ dbern(logit.p[i])
logit.p[i] <- 1 / (1 + exp(-p[i]))
p[i] <- beta[1] + beta[2]*x1[i] + beta[3]*x2[i] + beta[4]*x3[i]
}
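For anyone who wants to check the transformation numerically outside BUGS, here is a small Python sketch of the same idea. The coefficient and covariate values are made up, and the sign branch in inv_logit is an extra safeguard against overflow that the BUGS code above does not have.

```python
import math

def inv_logit(eta):
    """Map a linear predictor to a probability: 1 / (1 + exp(-eta)).

    Branches on the sign of eta so that exp() is only ever called on a
    non-positive argument and cannot overflow.
    """
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def logit(p):
    """Inverse of inv_logit: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def linear_predictor(beta, x1, x2, x3):
    """beta[0] is the intercept (BUGS indexes from 1, Python from 0)."""
    return beta[0] + beta[1] * x1 + beta[2] * x2 + beta[3] * x3

beta = [-0.5, 1.0, 0.25, -2.0]               # hypothetical coefficients
eta = linear_predictor(beta, 1.2, 0.0, 0.3)  # hypothetical covariate values
prob = inv_logit(eta)                        # what gets passed to dbern()
```

Applying logit() to prob gives back eta, so the explicit form and the built-in logit() link describe the same model.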

WinBugs error Trap -undefined real result

I am writing WinBUGS code for the following Bayesian statistics question:
Consider the following model that takes into account the fact that VIX (first variable) provides information for the variance of SP500 (second variable) and the fact that $Y_t^S$ and $Y_t^V$ may be correlated:
The model is at http://i.stack.imgur.com/qMHdq.png
for $t = 1, \ldots, 200$, where $\rho$ reflects the correlation between the increments of $Y_t^S$ and $Y_t^V$, $\alpha$ is a parameter taking values in the real line and $N_2(M,V)$ denotes a bivariate normal distribution with mean $M$ and covariance matrix $V$.
(The question is:)
Assign suitable priors to the parameters $\mu_s$, $\mu_v$, $\sigma$, $\omega$, $\rho$, $\alpha$ and write a WinBugs script to fit this model to your data. Implement it to sample from the posterior distribution of this model's parameters.
The WinBUGS code is:
model{for(i in 1:200){
y[i+1,1:2] ~ dnorm(mean[i,1:2],tau[i,1:2,1:2])
mean[i,1] <- y[i,1]+mu[1]+alpha*exp(y[i,2])
mean[i,2]<- y[i,2]+mu[2]
tau[i,1,1]<-exp(y[i,2])/prec[1]
tau[i,1,2]<-exp(y[i,2]/2)*rho/sqrt(prec[1]*prec[2])
tau[i,2,1]<-exp(y[i,2]/2)*rho/sqrt(prec[1]*prec[2])
tau[i,2,2]<-(1/(prec[2]))
}
mu[1] ~ dnorm (0, 0.0001)
mu[2] ~ dnorm (0, 0.0001)
prec[1] ~ dgamma (0.001, 0.001)
prec[2] ~ dgamma (0.001, 0.001)
alpha~dnorm(1,10000)
rho~dnorm(0,10)
}
list(y =structure(.Data= c(3.291839303,3.296274588,3.295265738,3.297438773,3.298200053,3.298412011,3.296300932,3.296426043,3.294455203,3.294481658,3.285708048,3.284464574,3.287575569,3.283348727,3.283355512,3.280935583,3.285914948,3.287111684,3.286400327,3.289303491,3.291186746,3.29116009,3.294849647,3.297015994,3.298090756,3.299369994,3.298503754,3.300578094,3.301034339,3.301056053,3.300321518,3.301761166,3.301524809,3.301186314,3.3005194,3.302700982,3.301364274,3.298512491,3.300093081,3.300475917,3.297878641,3.297570124,3.300808449,3.301370783,3.303489809,3.303282476,3.299788312,3.297272339,3.300660688,3.293581304,3.297289862,3.296182373,3.294970773,3.289178542,3.289180774,3.294003026,3.29332277,3.286703413,3.294221453,3.285154331,3.280152517,3.272941046,3.273626206,3.27009395,3.270156904,3.27571666,3.279669225,3.28808818,3.284906505,3.290217199,3.293269718,3.292617095,3.29777145,3.297169381,3.299866701,3.304931922,3.30488027,3.303649561,3.306118232,3.307754826,3.307906605,3.309259582,3.309562037,3.309257451,3.309487508,3.309591846,3.309911091,3.312135025,3.311482607,3.312336061,3.314604473,3.315846543,3.31534678,3.316563686,3.315458122,3.312482018,3.315245917,3.316877848,3.316372983,3.317095535,3.31393257,3.313829271,3.30666945,3.308634834,3.301535654,3.298772321,3.295069851,3.303820042,3.314126455,3.316106697,3.317758387,3.318516185,3.318455693,3.319890621,3.320264714,3.318136407,3.313635254,3.313487574,3.30547605,3.30159638,3.306618004,3.314318146,3.31065296,3.307123626,3.306002323,3.303470376,3.299435382,3.305226653,3.305899267,3.30794935,3.314530804,3.312139259,3.313253293,3.307399755,3.301498781,3.305620033,3.299940723,3.305534079,3.311760217,3.309951512,3.314398169,3.312911143,3.311062677,3.315674421,3.315661824,3.319830321,3.321596359,3.322289603,3.322153111,3.321691617,3.324344199,3.324212469,3.325408924,3.325076221,3.32443474,3.32314893,3.325800858,3.323825279,3.321915182,3.322434321,3.316234618,3.317944305,3.310514886,3.309681258,3.315119807,3.312473558
,3.31831173,3.31686738,3.322115879,3.319994568,3.323891208,3.323132421,3.320457869,3.314088528,3.313054794,3.314082206,3.319364268,3.315527433,3.31380186,3.315332072,3.318192769,3.317296379,3.318459865,3.320391417,3.322645108,3.320650938,3.321358125,3.323588265,3.323250037,3.318309644,3.32230201,3.321658486,3.323862366,3.324885109,3.325862386,3.324060105,3.325261087,3.323633617,3.319212277,3.323930349,3.325205636,-1.674871187,-1.837305384,-1.784901741,-1.824437164,-1.877095042,-1.853296595,-1.793076756,-1.802020721,-1.75360385,-1.750339701,-1.541660595,-1.537570704,-1.640896418,-1.545769835,-1.571902641,-1.556650006,-1.604336613,-1.6935902,-1.699715676,-1.778820579,-1.811756808,-1.762148494,-1.818778584,-1.826568672,-1.857709419,-1.859185357,-1.880873164,-1.863628277,-1.868840571,-1.857709419,-1.838025906,-1.843086364,-1.823727823,-1.815963058,-1.796505852,-1.835147398,-1.795132589,-1.739332463,-1.780168274,-1.785580061,-1.751643889,-1.700330607,-1.790343193,-1.795818949,-1.839468745,-1.833711714,-1.727193104,-1.651880385,-1.754258154,-1.611526503,-1.656547093,-1.59284645,-1.575092078,-1.5540471,-1.583117287,-1.674274013,-1.621581021,-1.528943106,-1.641471071,-1.453534332,-1.345690975,-1.216718593,-1.28451135,-1.161741385,-1.197198918,-1.315549541,-1.462376193,-1.587427911,-1.495750895,-1.563454293,-1.585808919,-1.589591272,-1.683878412,-1.639174734,-1.676066767,-1.705884658,-1.663594506,-1.654210604,-1.6972603,-1.728462971,-1.76413233,-1.79444677,-1.777474973,-1.770778032,-1.720871468,-1.751643889,-1.708364571,-1.716473539,-1.710229163,-1.73420046,-1.778820579,-1.79788129,-1.823727823,-1.83658546,-1.750339701,-1.689935542,-1.782193745,-1.808267093,-1.814558711,-1.854765047,-1.694811844,-1.654210604,-1.464249161,-1.394472583,-1.352258787,-1.379888524,-1.255280835,-1.422607479,-1.548864573,-1.565558689,-1.633460313,-1.659476569,-1.685086464,-1.677263996,-1.644350056,-1.596113873,-1.433397543,-1.499648104,-1.401421332,-1.350612172,-1.428435452,-1.538591373,-1.51144575
8,-1.415487857,-1.373953779,-1.335931446,-1.299891813,-1.357631945,-1.402730434,-1.449377291,-1.570312304,-1.556650006,-1.618216566,-1.527933706,-1.379038217,-1.453534332,-1.356803139,-1.423054399,-1.522402875,-1.47367507,-1.54680019,-1.524410013,-1.463312172,-1.527429445,-1.541148304,-1.628349281,-1.665956408,-1.602685826,-1.622143032,-1.631185029,-1.689327925,-1.67367725,-1.727193104,-1.71772782,-1.71334574,-1.749688341,-1.769444817,-1.716473539,-1.6935902,-1.705265784,-1.636312824,-1.644350056,-1.555087327,-1.545769835,-1.623831253,-1.591760035,-1.613194194,-1.610416485,-1.709607188,-1.703411805,-1.770778032,-1.745142444,-1.731645785,-1.622705408,-1.602685826,-1.643773495,-1.676665175,-1.631185029,-1.641471071,-1.667139772,-1.663005033,-1.660651132,-1.708985657,-1.766120707,-1.800638718,-1.711474452,-1.728462971,-1.782869953,-1.79925891,-1.714595509,-1.752296718,-1.755568243,-1.791708899,-1.807570829,-1.820896234,-1.76413233,-1.812456437,-1.746438846,-1.674274013,-1.792392558,-1.782193745),
.Dim=c(201,2))
)
list( mu=c(0,0), prec=c(1,1),alpha=1,rhi=0.5)
I get an error "multivariate node expected" while compiling the model. What is wrong in the code?
You cannot pass a vector of means and a matrix to dnorm, which is what you are currently doing. The left-hand side y[i+1,1:2] is a multivariate node, but dnorm is a univariate distribution. The model you specify is actually multivariate normal, which in BUGS and JAGS is dmnorm; it takes a mean vector and a precision matrix (the inverse of the variance-covariance matrix). Try changing dnorm to dmnorm at the top of your model. Because dmnorm expects a precision matrix, check that tau[i,1:2,1:2] really is one: if the matrix you build is a covariance matrix, invert it first, for example with inverse().
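Since the covariance versus precision distinction is easy to get wrong, here is a minimal Python sketch of the 2x2 case. It is only an illustration: the standard deviations and correlation are made up, and the function name is hypothetical. The determinant check also shows why the correlation has to stay strictly between -1 and 1.

```python
def precision_from_covariance_2x2(v11, v12, v22):
    """Invert a symmetric 2x2 covariance matrix [[v11, v12], [v12, v22]].

    Returns the precision matrix as nested lists. Raises ValueError when
    the input is not positive definite, e.g. when |correlation| >= 1.
    """
    det = v11 * v22 - v12 * v12
    if v11 <= 0 or det <= 0:
        raise ValueError("covariance matrix is not positive definite")
    return [[v22 / det, -v12 / det],
            [-v12 / det, v11 / det]]

# Hypothetical values: two standard deviations and a correlation.
sd1, sd2, rho = 0.5, 2.0, 0.3
cov = [[sd1 * sd1, rho * sd1 * sd2],
       [rho * sd1 * sd2, sd2 * sd2]]
prec = precision_from_covariance_2x2(cov[0][0], cov[0][1], cov[1][1])

# Multiplying covariance by precision should give the identity matrix.
identity_check = [[sum(cov[i][k] * prec[k][j] for k in range(2))
                   for j in range(2)] for i in range(2)]
```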

Fit a bayesian linear regression and predict unobservable values

I'd like to use JAGS plus R to adjust a linear model with observable quantities and make inferences about unobservable ones. I found lots of examples on the internet about how to adjust the model, but nothing on how to extrapolate from its coefficients after fitting the model in JAGS. I'd appreciate any help with this.
My data looks like the following:
ngroups <- 2
group <- 1:ngroups
nobs <- 100
dta <- data.frame(group=rep(group,each=nobs),y=rnorm(nobs*ngroups),x=runif(nobs*ngroups))
head(dta)
JAGS has powerful ways to make inference about missing data, and once you get the hang of it, it's easy! I strongly recommend that you check out Marc Kéry's excellent book which provides a wonderful introduction to BUGS language programming (JAGS is close enough to BUGS that almost everything transfers).
The easiest way to do this involves, as you say, adjusting the model. Below I provide a complete worked example of how this works. But you seem to be asking for a way to get the prediction interval without re-running the model (is your model very large and computationally expensive?). This can also be done.
How to predict--the hard way (without re-running the model)
For each iteration of the MCMC, simulate the response for the desired x-value based on that iteration's posterior draws for the model parameters. So imagine you want to predict a value for X=10. Then if iteration 1 (post burn-in) has slope=2, intercept=1, and standard deviation=0.5, draw a Y-value from
Y=rnorm(1, 1+2*10, 0.5)
And repeat for iteration 2, 3, 4, 5...
These will be your posterior draws for the response at X=10. Note: if you did not monitor the standard deviation in your JAGS model, you are out of luck and need to fit the model again.
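The loop described above can be sketched in a few lines of Python. This is only an illustration: the three lists stand in for monitored MCMC draws and are simulated here with made-up values, and posterior_predictive_draws is a hypothetical helper name.

```python
import random

def posterior_predictive_draws(x_new, intercepts, slopes, sigmas, rng):
    """One simulated response per MCMC iteration at covariate value x_new.

    Each iteration contributes its own intercept, slope and residual
    standard deviation, so parameter uncertainty carries over.
    """
    return [rng.gauss(a + b * x_new, s)
            for a, b, s in zip(intercepts, slopes, sigmas)]

rng = random.Random(42)
n_draws = 2000
# Made-up stand-ins for the saved posterior draws of each parameter.
intercepts = [rng.gauss(1.0, 0.05) for _ in range(n_draws)]
slopes = [rng.gauss(2.0, 0.05) for _ in range(n_draws)]
sigmas = [abs(rng.gauss(0.5, 0.02)) for _ in range(n_draws)]

y_new = sorted(posterior_predictive_draws(10.0, intercepts, slopes, sigmas, rng))
# Empirical 95% prediction interval from the sorted draws.
lower = y_new[int(0.025 * n_draws)]
upper = y_new[int(0.975 * n_draws) - 1]
```

In practice the three lists would come from the saved chains (for example, the columns of the coda output exported from R) rather than from random.gauss.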
How to predict--the easy way--with worked example
The basic idea is to insert (into your data) the x-values whose responses you want to predict, with the associated y-values NA. For example, if you want a prediction interval for X=10, you just have to include the point (10, NA) in your data, and set a trace monitor for the y-value.
I use JAGS from R with the rjags package. Below is a complete worked example that begins by simulating the data, then adds some extra x-values to the data, specifies and runs the linear model in JAGS via rjags, and summarizes the results. Y[101:105] contains draws from the posterior predictive distributions for X[101:105]. Notice that Y[1:100] just contains the y-values for X[1:100]. These are the observed data that we fed to the model, and they never change as the model updates.
library(rjags)
# Simulate data (100 observations)
my.data <- as.data.frame(matrix(data=NA, nrow=100, ncol=2))
names(my.data) <- c("X", "Y")
# the linear model will predict Y based on the covariate X
my.data$X <- runif(100) # values for the covariate
int <- 2 # specify the true intercept
slope <- 1 # specify the true slope
sigma <- .5 # specify the true residual standard deviation
my.data$Y <- rnorm(100, slope*my.data$X+int, sigma) # Simulate the data
#### Extra data for prediction of unknown Y-values from known X-values
y.predict <- as.data.frame(matrix(data=NA, nrow=5, ncol=2))
names(y.predict) <- c("X", "Y")
y.predict$X <- c(-1, 0, 1.3, 2, 7)
mydata <- rbind(my.data, y.predict)
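set.seed(333) # seeds the random initial values drawn below (the data above were simulated before this call)
# setwd("INSERT YOUR WORKING DIRECTORY HERE") # uncomment and fill in your own path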
sink("mymodel.txt")
cat("model{
# Priors
int ~ dnorm(0, .001)
slope ~ dnorm(0, .001)
tau <- 1/(sigma * sigma)
sigma ~ dunif(0,10)
# Model structure
for(i in 1:R){
Y[i] ~ dnorm(m[i],tau)
m[i] <- int + slope * X[i]
}
}", fill=TRUE)
sink()
jags.data <- list(R=dim(mydata)[1], X=mydata$X, Y=mydata$Y)
inits <- function(){list(int=rnorm(1, 0, 5), slope=rnorm(1,0,5),
sigma=runif(1,0,10))}
params <- c("Y", "int", "slope", "sigma")
nc <- 3
n.adapt <-1000
n.burn <- 1000
n.iter <- 10000
thin <- 10
my.model <- jags.model('mymodel.txt', data = jags.data, inits=inits, n.chains=nc, n.adapt=n.adapt)
update(my.model, n.burn)
my.model_samples <- coda.samples(my.model,params,n.iter=n.iter, thin=thin)
summary(my.model_samples)

Stan version of a JAGS model which includes a sum of discrete values - Is it possible?

I was trying to run this model in Stan. I have a running JAGS version of it (that returns highly autocorrelated parameters) and I know how to formulate it as CDF of a double exponential (with two rates), which would probably run without problems. However, I would like to use this version as a starting point for similar but more complex models.
By now I suspect that a model like this is not possible in Stan. Because of the discreteness introduced by taking the sum of Boolean values, Stan may not be able to calculate gradients.
Does anyone know whether this is the case, or whether I am doing something else wrong in this model? I paste the errors I get below the model code.
Many thanks in advance
Jan
Model:
data {
int y[11];
int reps[11];
real soas[11];
}
parameters {
real<lower=0.001,upper=0.200> v1;
real<lower=0.001,upper=0.200> v2;
}
model {
real dif[11,96];
real cf[11];
real p[11];
real t1[11,96];
real t2[11,96];
for (i in 1:11){
for (r in 1:reps[i]){
t1[i,r] ~ exponential(v1);
t2[i,r] ~ exponential(v2);
dif[i,r] <- (t1[i,r]+soas[i]<=(t2[i,r]));
}
cf[i] <- sum(dif[i]);
p[i] <-cf[i]/reps[i];
y[i] ~ binomial(reps[i],p[i]);
}
}
Here is some dummy data:
psy_dat = {
'soas' : numpy.array(range(-100,101,20)),
'y' : [47, 46, 62, 50, 59, 47, 36, 13, 7, 2, 1],
'reps' : [48, 48, 64, 64, 92, 92, 92, 64, 64, 48, 48]
}
And here are the errors:
DIAGNOSTIC(S) FROM PARSER:
Warning (non-fatal): Left-hand side of sampling statement (~) contains a non-linear transform of a parameter or local variable.
You must call increment_log_prob() with the log absolute determinant of the Jacobian of the transform.
Sampling Statement left-hand-side expression:
get_base1(get_base1(t1,i,"t1",1),r,"t1",2) ~ exponential_log(...)
Warning (non-fatal): Left-hand side of sampling statement (~) contains a non-linear transform of a parameter or local variable.
You must call increment_log_prob() with the log absolute determinant of the Jacobian of the transform.
Sampling Statement left-hand-side expression:
get_base1(get_base1(t2,i,"t2",1),r,"t2",2) ~ exponential_log(...)
And at runtime:
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
stan::prob::exponential_log(N4stan5agrad3varE): Random variable is nan:0, but must not be nan!
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Rejecting proposed initial value with zero density.
Initialization between (-2, 2) failed after 100 attempts.
Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model
Here is a working JAGS version of this model:
model {
for ( n in 1 : N ) {
for (r in 1 : reps[n]){
t1[r,n] ~ dexp(v1)
t2[r,n] ~ dexp(v2)
c[r,n] <- (1.0*((t1[r,n]+durs[n])<=t2[r,n]))
}
p[n] <- max(min(sum(c[,n]) / reps[n], 0.99999999999999), 1 - 0.99999999999999)
y[n] ~ dbin(p[n],reps[n])
}
v1 ~ dunif(0.0001,0.2)
v2 ~ dunif(0.0001,0.2)
}
With regard to the min() and max(): See this post https://stats.stackexchange.com/questions/130978/observed-node-inconsistent-when-binomial-success-rate-exactly-one?noredirect=1#comment250046_130978.
I'm still not sure what model you are trying to estimate (it would be best if you post the JAGS code) but what you have above cannot work in Stan. Stan is closer to C++ in the sense that you have to declare and then define objects. In your Stan program, you have the two declarations
real t1[11,96];
real t2[11,96];
but no definitions of t1 or t2. Consequently, they are initialized to NaN and when you do
t1[i,r] ~ exponential(v1);
that gets parsed as something like
for(i in 1:11) for(r in 1:reps[i]) lp__ += log(v1) - v1 * NaN
where lp__ is an internal symbol that holds the value of the log-posterior, which becomes NaN, so Stan cannot do Metropolis-style updates of the parameters.
Perhaps you meant for t1 and t2 to be unknown parameters, in which case they should be declared in the parameters block. The following [EDITED] Stan program is valid and should work, but it might not be the program you had in mind (it does not make a lot of sense to me and the discontinuity in dif will probably preclude Stan from sampling from this posterior distribution efficiently).
data {
int<lower=1> N;
int y[N];
int reps[N];
real soas[N];
}
parameters {
real<lower=0.001,upper=0.200> v1;
real<lower=0.001,upper=0.200> v2;
real<lower=0> t1[N,max(reps)]; // exponential variates must be non-negative
real<lower=0> t2[N,max(reps)];
}
model {
for (i in 1:N) {
real dif[reps[i]];
for (r in 1:reps[i]) {
dif[r] <- (t1[i,r]+soas[i]) <= t2[i,r];
}
y[i] ~ binomial(reps[i], (1.0 + sum(dif)) / (1.0 + reps[i]));
}
to_array_1d(t1) ~ exponential(v1);
to_array_1d(t2) ~ exponential(v2);
}
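As a side note on the discontinuity: the question mentions that the model can also be written with a closed-form probability instead of summing indicators. Here is a hedged Python sketch of that idea, not of the Stan program itself. It assumes t1 and t2 are independent exponentials with rates v1 and v2, and the closed form is my own derivation under that assumption, checked against simulation below. The function names and example rates are made up.

```python
import math
import random

def prob_t1_plus_soa_before_t2(soa, v1, v2):
    """P(t1 + soa <= t2) for independent t1 ~ Exp(v1), t2 ~ Exp(v2).

    Integrating over t1 gives a smooth function of v1 and v2, so no
    indicator variables are needed.
    """
    if soa >= 0:
        return v1 / (v1 + v2) * math.exp(-v2 * soa)
    return 1.0 - v2 / (v1 + v2) * math.exp(v1 * soa)

def monte_carlo_check(soa, v1, v2, n, rng):
    """Estimate the same probability by simulating the indicators."""
    hits = sum(rng.expovariate(v1) + soa <= rng.expovariate(v2)
               for _ in range(n))
    return hits / n

rng = random.Random(7)
v1, v2 = 0.02, 0.05  # hypothetical rates inside the (0.001, 0.2) bounds
comparison = [(soa,
               prob_t1_plus_soa_before_t2(soa, v1, v2),
               monte_carlo_check(soa, v1, v2, 20000, rng))
              for soa in (-40.0, 0.0, 40.0)]
```

A probability of this form could be passed straight to the binomial likelihood with only v1 and v2 as parameters, which keeps the log-posterior differentiable.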

Can we get the residuals from JAGS output?

Say we fit a Bayesian linear mixed model using JAGS (or WinBUGS), does the output object include the model residuals? How can we find the residuals?
Thanks!
A JAGS (BUGS) model simply outputs the values of the nodes in the model you have told it to monitor. To get residuals you need to define those in the model and then monitor them. For example
model {
bResponse ~ dnorm(0, 5^-2)
sResponse ~ dunif(0, 5)
for (i in 1:length(Response)) {
eResponse[i] <- bResponse
Response[i] ~ dlnorm(eResponse[i], sResponse^-2)
Residual[i] <- (log(Response[i]) - eResponse[i]) / sResponse
}
}
where Residual[i] defines a residual for each of the i values of the Response. Note that the above example does not deal with specifying which values to monitor.
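If the parameters were monitored but the residuals were not, the same quantity can be computed afterwards from the saved draws. Below is a minimal Python sketch under the assumption that the first argument of dlnorm is the mean on the log scale; the responses and posterior draws are made-up numbers and the helper name is hypothetical.

```python
import math

def standardized_log_residuals(responses, log_mean, log_sd):
    """Residuals on the log scale, scaled by the log-scale sd."""
    return [(math.log(y) - log_mean) / log_sd for y in responses]

responses = [1.8, 2.4, 3.1, 2.0]  # made-up observed data
# Made-up posterior draws, one (bResponse, sResponse) pair per iteration.
draws = [(0.80, 0.30), (0.85, 0.28), (0.78, 0.33)]

# One residual vector per MCMC iteration.
residual_draws = [standardized_log_residuals(responses, b, s)
                  for b, s in draws]

# Posterior mean residual for each observation (average over iterations).
posterior_mean_residuals = [sum(col) / len(col)
                            for col in zip(*residual_draws)]
```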