How to avoid overdispersed Poisson regression overfitting? - bayesian

I have a dataset with three variables: company id (there are 96 companies), expert id (there are 38 experts), and the points given by experts to companies. Points are discrete values from 0 to 100. I tried fitting an overdispersed Poisson model to the points given by the experts, but I don't know why the model overfits although the predictor is linear. Here is my JAGS code:
model_code <- "
model
{
# Likelihood
for (i in 1:N) {
y[i] ~ dpois(exp(mu[i]))
mu[i] ~ dnorm(alpha[company[i]] + beta[expert[i]] , sigma^-2)
}
# Priors
for (j in 1:J){
alpha[j] ~ dnorm (mu.a, sigma.a^-2)
}
for (k in 1:K){
beta[k] ~ dnorm (mu.a, sigma.a^-2)
}
mu.a ~ dunif (0, 100)
sigma.a ~ dunif (0, 100)
sigma ~ dunif(0, 100)
}
"
Does anyone know why this model overfits and how to fix it?
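To make explicit what the observation-level noise term in the model does, here is a minimal Python sketch of the marginal mean and variance of y when y ~ Poisson(exp(mu)) and mu ~ Normal(eta, sigma^2). The function name and the example values (a typical score of 50, three values of sigma) are assumptions for illustration and are not taken from the data.

```python
import math


def poisson_lognormal_moments(eta, sigma):
    """Marginal mean and variance of y for y ~ Poisson(exp(mu)), mu ~ Normal(eta, sigma^2)."""
    # Mean of the log-normal rate exp(mu).
    mean = math.exp(eta + 0.5 * sigma ** 2)
    # Variance of the log-normal rate exp(mu).
    var_rate = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * eta + sigma ** 2)
    # Law of total variance: Poisson variance plus variance of the rate.
    return mean, mean + var_rate


# Hypothetical example: linear predictor centred on a score of 50.
for sigma in (0.0, 0.5, 1.0):
    mean, var = poisson_lognormal_moments(math.log(50.0), sigma)
    print(f"sigma={sigma}: mean={mean:.1f}, variance={var:.1f}")
```

With sigma = 0 the variance equals the mean (plain Poisson). Any sigma > 0 adds extra variance, and because each mu[i] is its own latent value, a large sigma lets every observation be matched almost exactly.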

Related

How to add an interaction term in JAGS between one categorical and one continuous variable?

I have a mark-recapture model in JAGS and I want to code an interaction between a categorical variable and a continuous variable.
ngr is the number of groups
nind is the number of individuals in my mark-recapture model
gr.sp[ind] looks up in my database which group individual ind belongs to
Some priors:
phi.precip ~ dnorm(0,0.01)
for(groups in 1:ngr) {
phi.gr[groups] ~ dnorm(0, 0.01)
}
Here is a small part of the likelihood of my model:
...
for(ind in 1:nind) {
for(yr in 1:nyear) {
logit(phi[ind,yr]) <- e.phi[ind,yr]
e.phi[ind,yr] <-
phi.gr[gr.sp[ind]] + # Categorical variable telling how much belonging to a certain group changes your fitness
phi.precip * sum.rainfall[yr] + # Effect of rain on my individuals
phi.gr.precip * phi.gr[gr.sp[ind]] * sum.rainfall[yr] # This is the interaction between the categorical and the continuous I'm trying to code.
}
...}
First, how do you define the prior for the phi.gr.precip? Should it be something resembling this:
for(groups in 1:ngr) {
phi.gr.precip[groups] ~ dnorm(0, 0.01)
}
But then, I don't know how to implement it in the likelihood.
Second, how is phi.gr.precip supposed to be coded to include the interaction between the group an individual is in (gr.sp[ind]) and the climate (sum.rainfall[yr], which represent the amount of rain in a year)?
Coding an interaction like this seems to require the same number of parameters in phi.gr.precip as there are levels in the categorical variable. But that would require me to loop inside the likelihood:
...
for(ind in 1:nind) {
for(yr in 1:nyear) {
logit(phi[ind,yr]) <- e.phi[ind,yr]
e.phi[ind,yr] <-
phi.gr[gr.sp[ind]] +
phi.precip * sum.rainfall[yr] +
for(groups in 1:ngr) {
phi.gr.precip[groups] * phi.gr[gr.sp[ind]] * sum.rainfall[yr]
}
}
...}
Which is not working when I run the model.
Your choice of prior looks reasonable.
Your likelihood is almost correct, but JAGS can't add a for-loop to a number, and wrapping the whole assignment in a loop over groups would define phi[ind,yr] more than once. Instead, index the interaction coefficient by the individual's group, exactly as you already do for phi.gr, so no extra loop is needed:
...
for(ind in 1:nind) {
for(yr in 1:nyear) {
logit(phi[ind,yr]) <- e.phi[ind,yr]
e.phi[ind,yr] <-
phi.gr[gr.sp[ind]] +
phi.precip * sum.rainfall[yr] +
phi.gr.precip[gr.sp[ind]] * sum.rainfall[yr] # group-specific rainfall slope
}
...}
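One common way to code a categorical-by-continuous interaction is a group-specific slope, looked up through each individual's group index. The following is a minimal Python sketch of that linear predictor, not the JAGS model itself. The function and argument names are hypothetical, the example numbers are invented, and indices are 0-based here whereas JAGS uses 1-based indices.

```python
import math


def survival_prob(group_effect, rain_effect, group_rain_effect,
                  group_of, rainfall, ind, yr):
    """Inverse-logit of the linear predictor for individual `ind` in year `yr`."""
    g = group_of[ind]  # group index of this individual (plays the role of gr.sp[ind])
    eta = (group_effect[g]                         # group intercept
           + rain_effect * rainfall[yr]            # shared rainfall slope
           + group_rain_effect[g] * rainfall[yr])  # group-specific extra slope
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical values: two groups, three individuals, two years.
group_of = [0, 1, 1]
rainfall = [0.5, -1.0]
p = survival_prob([0.2, -0.1], 0.3, [0.0, 0.4], group_of, rainfall, ind=1, yr=0)
print(round(p, 3))
```

The interaction needs one coefficient per group, but the lookup through the group index selects the right one for each individual, so no loop over groups appears inside the sum.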

Using numpy roll in Keras

I'm trying to make a custom regularizer in Keras and I need to be able to roll the coefficient array.
I know this may be impossible; however, any mechanism that can replicate this roll function would be greatly appreciated.
```
def __call__(self, x):
    regularization = 0.
    # Add components if they are given
    if self.l1:
        # \lambda ||x||
        regularization += self.l1 * K.sum(K.abs(x))
    if self.fuse:
        # \lambda \sum{ |x - x_+1| }
        regularization += self.fuse * K.sum(K.abs(x - np.roll(x, 1)))
    if self.abs_fuse:
        # \lambda \sum{ ||x| - |x_+1|| }
        regularization += self.abs_fuse * K.sum(K.abs(K.abs(x) - K.abs(np.roll(x, 1))))
    return regularization
```
Given that x is of shape (m, 1), a possible solution is to use tile:
def roll_reg(x):
    length = K.int_shape(x)[0]
    x_tile = K.tile(x, [2, 1])
    x_roll = x_tile[length - 1:-1]
    return K.sum(K.abs(x - x_roll))
It will result in some extra memory usage, but if x is a 1-dim vector, I guess the overhead won't be too bad.
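The index arithmetic behind the tile-and-slice trick can be checked without Keras. This is a minimal sketch using plain Python lists in place of tensors; the helper names are hypothetical and it only illustrates why the slice `[length - 1:-1]` of the doubled sequence is a roll by one position.

```python
def roll_by_one(values):
    """Shift a list right by one with wrap-around, via doubling and slicing."""
    n = len(values)
    tiled = values * 2       # same idea as K.tile(x, [2, 1]) along the first axis
    return tiled[n - 1:-1]   # n elements, starting at the last original element


def fused_penalty(values):
    """Sum of |x_i - x_{i-1}| with wrap-around, the quantity roll_reg computes."""
    rolled = roll_by_one(values)
    return sum(abs(a - b) for a, b in zip(values, rolled))


print(roll_by_one([1, 2, 3, 4]))    # [4, 1, 2, 3]
print(fused_penalty([1, 2, 3, 4]))  # 6
```

Note that the wrap-around term (first element against last) is part of the penalty, just as it would be with np.roll.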

WinBUGS Examples Vol 1, Dyes example returns error

Currently going through examples volume 1 and came across an error with the dyes example.
When I try to load the inits from the example, it returns "this chain contains uninitialized variables". I am not sure which part is wrong, since at first sight theta, tau.btw and tau.with are all specified and nothing is left out.
I am using the code directly from Examples Vol 1 under help tab. The same error happened to all three choices of priors for between-variation.
I would really appreciate any advice on the problem. Thanks in advance.
Below is the code I copied directly from the dyes example.
model
{
for( i in 1 : batches ) {
mu[i] ~ dnorm(theta, tau.btw)
for( j in 1 : samples ) {
y[i , j] ~ dnorm(mu[i], tau.with)
}
}
theta ~ dnorm(0.0, 1.0E-10)
# prior for within-variation
sigma2.with <- 1 / tau.with
tau.with ~ dgamma(0.001, 0.001)
# Choice of priors for between-variation
# Prior 1: uniform on SD
#sigma.btw~ dunif(0,100)
#sigma2.btw<-sigma.btw*sigma.btw
#tau.btw<-1/sigma2.btw
# Prior 2: Uniform on intra-class correlation coefficient,
# ICC=sigma2.btw / (sigma2.btw+sigma2.with)
ICC ~ dunif(0,1)
sigma2.btw <- sigma2.with *ICC/(1-ICC)
tau.btw<-1/sigma2.btw
# Prior 3: gamma(0.001, 0.001) NOT RECOMMENDED
#tau.btw ~ dgamma(0.001, 0.001)
#sigma2.btw <- 1 / tau.btw
}
Data
list(batches = 6, samples = 5,
y = structure(
.Data = c(1545, 1440, 1440, 1520, 1580,
1540, 1555, 1490, 1560, 1495,
1595, 1550, 1605, 1510, 1560,
1445, 1440, 1595, 1465, 1545,
1595, 1630, 1515, 1635, 1625,
1520, 1455, 1450, 1480, 1445), .Dim = c(6, 5)))
Inits1
list(theta=1500, tau.with=1, sigma.btw=1)
Inits2
list(theta=1500, tau.with=1,ICC=0.5)
Inits3
list(theta=1500, tau.with=1, tau.btw=1)
That is not an error per se. Yes you have provided the inits for the parameters of interest.
However there are the six mu[i] variables that are not data, but are variables drawn from mu[i] ~ dnorm(theta, tau.btw).
You could provide initial values for these as well, but it is best imo to just click on gen inits if you are using WinBUGS from the GUI - this will provide initial values for those.
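If you prefer to supply the mu[i] values yourself, the following is a minimal Python sketch that draws six starting values and prints an inits list in BUGS syntax for the ICC prior. The function name and the seed are hypothetical. The draw from Normal(theta, sd.btw) simply mirrors the model line mu[i] ~ dnorm(theta, tau.btw) and is one choice among many.

```python
import math
import random


def bugs_inits_with_mu(theta, tau_with, icc, batches, seed=1):
    """Build a BUGS inits list that also initializes the batch means mu[i]."""
    rng = random.Random(seed)
    # Same relation as in the model: sigma2.btw = sigma2.with * ICC / (1 - ICC).
    sigma2_btw = (1.0 / tau_with) * icc / (1.0 - icc)
    sd_btw = math.sqrt(sigma2_btw)
    mu = [rng.gauss(theta, sd_btw) for _ in range(batches)]
    mu_text = ", ".join(f"{m:.1f}" for m in mu)
    return f"list(theta={theta}, tau.with={tau_with}, ICC={icc}, mu=c({mu_text}))"


print(bugs_inits_with_mu(1500, 1, 0.5, batches=6))
```

The printed text can be pasted in place of Inits2. Clicking gen inits remains the simpler route when working in the GUI.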

Gamma distribution in JAGS - Error in node

I'm trying to parameterise a gamma distribution in JAGS - with a piecewise linear predictor but my model fails to run with the following error message:
Error: Error in node (ashape/(aexp(mu[59]))) Invalid parent values
The model works when timber.recovery is drawn from a normal distribution, but then the lower quantile predictions are less than zero, which is not biologically possible. I've tried a few tricks, like adding 0.001 to the "mu" parameter in case it was drawing a zero and setting initial values based on outputs from a glm, but neither resolves the error message. Any insights would be greatly appreciated (I'm using R2jags). My model:
cat (
"model {
# UNINFORMATIVE PRIORS
sd_plot ~ dunif(0, 100)
tau_plot <- 1/(sd_plot * sd_plot)
# precision for plot level variance
alpha ~ dnorm(0, 1e-06)
# normal prior for intercept term
shape ~ dunif(0, 100)
# shape parameter for gamma
log_intensity ~ dnorm(0, 1e-06)
# uninformative prior for logging intensity
beta_1 ~ dnorm (0, 1e-06)
# uninformative prior; change in slope for first segment : <=3.6 years
beta_2 ~ dnorm (0, 1e-06)
# uninformative prior; change in slope for second segment : >3.6 years
InX_1 ~ dnorm (0, 1e-06)
# uninformative prior for interaction between tsl and log_intensity : <=3.6 years
InX_2 ~ dnorm (0, 1e-06)
# uninformative prior for interaction between tsl and log_intensity : >3.6 years
# PLOT LEVEL RANDOM EFFECTS
for (i in 1:nplots) {
plot_Eff[i] ~ dnorm(0,tau_plot)
}
for (i in 1:Nobs) {
# PIECEWISE LINEAR PREDICTOR
mu[i] <-
alpha +
beta_1 * (time.since.logged[i] * tsl.DUM1[i]) +
log_intensity * log.volume [i] +
beta_2 * (time.since.logged[i] * tsl.DUM2[i] - 3.6) +
beta_1 * (time.since.logged[i] * tsl.DUM2[i]) +
plot_Eff[plot.id[i]] +
InX_1 * (time.since.logged[i] * tsl.DUM1[i]) * log.volume [i] +
InX_2 * (time.since.logged[i] * tsl.DUM2[i] - 3.6) * log.volume[i] +
InX_1 * (time.since.logged[i] * tsl.DUM2[i]) * log.volume[i]
timber.recovery[i] ~ dgamma(shape,shape/exp(mu[i]))
# observed recovery
pred_timber_recovery[i] ~ dgamma(shape,shape/exp(mu[i]))
# posterior predictive distribution
pearson.residual[i] <-
(timber.recovery[i] - pred_timber_recovery[i]) / (sqrt(timber.recovery[i]))
}
}",
fill = TRUE,
file = "outputs/piecewise_TIMBER_MODEL_FINAL_GAMMA.txt")
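One possible cause, stated as an assumption because the data and initial values are not shown: the rate shape/exp(mu[i]) must be a finite positive number, and priors with precision 1e-06 (standard deviation 1000) allow coefficient values for which exp(mu[i]) overflows or underflows to zero. The following minimal Python sketch, with a hypothetical helper name, shows how quickly that happens in double precision.

```python
import math


def gamma_rate(shape, mu):
    """Return shape / exp(mu), or None when that is not a finite positive number."""
    try:
        mean = math.exp(mu)   # raises OverflowError for large mu (roughly above 709)
    except OverflowError:
        return None
    if mean == 0.0 or shape <= 0.0:   # exp underflowed to zero, or invalid shape
        return None
    rate = shape / mean
    return rate if rate > 0.0 and math.isfinite(rate) else None


for mu in (2.0, 800.0, -800.0):
    print(mu, gamma_rate(3.0, mu))
```

If this is what is happening, tighter priors on the coefficients, standardized covariates, or explicit initial values close to the glm estimates for every coefficient would keep mu[i] in a range where the rate is valid.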

OpenBUGS: Initializing the Model

I am having a problem in initializing the following model in OpenBUGS
model
{
#likelihood
for (t in 1:n) { yisigma2[t] <- 1/exp(theta[t]);
y[t] ~ dnorm(0,yisigma2[t]);
}
#Priors
mu ~ dnorm(0,0.1);
phistar ~ dbeta(20,1.5);
itau2 ~ dgamma(2.5,0.025);
beta <- exp(mu/2);
phi <- 2*phistar-1;
tau <- sqrt(1/itau2);
theta0~dnorm(mu, itau2)
thmean[1] <- mu + phi*(theta0-mu);
theta[1] ~ dnorm(thmean[1],itau2);
for (t in 2:n) { thmean[t] <- mu + phi*(theta[t-1]-mu);
theta[t] ~ dnorm(thmean[t],itau2);
}
}
This is my data
list(y=c(-0.0383 , 0.0019 ,......-0.0094),n=945)
And this is the list of my initials
list(phistar= 0.98, mu=0, itau2=50)
The checking of model, loading of data and compilation steps are ok. When loading initials, OpenBUGS says initial values are loaded but chain contains uninitialized variables. I then tried to initialize theta0 also but the problem persists. Could someone please help me regarding this?
Thanks
Khalid
I am a newbie at OpenBUGS, but shouldn't you be specifying a distribution for the inits rather than a single point value? Something like:
inits <- function(){ list(alpha=rnorm(1), beta=rnorm(1), sigma = rlnorm(1))}
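Another possibility, offered as an assumption: the nodes reported as uninitialized may be the latent states theta0 and theta[1..n], which the supplied list does not cover. The following is a minimal Python sketch that draws starting values for them from the model's own AR(1) prior, using phistar = 0.98, mu = 0 and itau2 = 50 from the question. The function name and seed are hypothetical.

```python
import math
import random


def simulate_theta_inits(n, mu, phistar, itau2, seed=1):
    """Draw theta0 and theta[1..n] from the AR(1) prior defined in the model."""
    rng = random.Random(seed)
    phi = 2.0 * phistar - 1.0      # phi <- 2*phistar - 1
    tau = math.sqrt(1.0 / itau2)   # standard deviation implied by precision itau2
    theta0 = rng.gauss(mu, tau)
    theta = []
    prev = theta0
    for _ in range(n):
        # theta[t] ~ dnorm(mu + phi*(theta[t-1] - mu), itau2)
        prev = rng.gauss(mu + phi * (prev - mu), tau)
        theta.append(prev)
    return theta0, theta


theta0, theta = simulate_theta_inits(n=945, mu=0.0, phistar=0.98, itau2=50.0)
print(len(theta), round(min(theta), 3), round(max(theta), 3))
```

The resulting values could be added to the inits list as theta0 and theta=c(...). Letting OpenBUGS generate the remaining initial values is an alternative that avoids the extra list entries.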