Predicting from the full posterior distribution using stan_glmer - predict

Could I ask for some help please?
I have fit a binomial model using stan_glmer and have picked the model which I think best fits the data. I have used posterior_predict to compare my observed data to data simulated by the model, and they look very similar.
I now want to predict the probability of an event for different levels of the predictors. I would usually use the predict function with glmer, but I know I should use posterior_predict with stan_glmer to take into account the full uncertainty in the model. If x1 and x2 are continuous predictors for a binary event and I want a random intercept on group, the model formula would be:
model <- stan_glmer(binary_event ~ x1 + x2 + (1 | group),
                    family = "binomial", data = original_data)
My question: I want to vary the predictors (x1 and x2) to see how the model predicts the observed data (and the variability in those predictions), perhaps as a plot, but I'm not sure how. Any help or guidance would be greatly appreciated.

In short, posterior_predict has a newdata argument that expects a data.frame with values of x1, x2, and group. This argument is similar to that in many other prediction functions, and there is an example of its use that can be run via example(posterior_predict, package = "rstanarm").
In your case, it might be something like
nd <- with(original_data,
           expand.grid(x1 = seq(from = min(x1), to = max(x1), length.out = 20),
                       x2 = seq(from = min(x2), to = max(x2), length.out = 20),
                       group = levels(group)))
PPD <- posterior_predict(model, newdata = nd)
but you could choose the values of x1 and x2 in various other ways.
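For the plot you mention, one option is to summarise the draws column by column. A minimal sketch, assuming the model, nd, and PPD objects above (each column of PPD holds 0/1 draws for one row of nd, so its mean is the posterior predictive event probability for that covariate setting):
library(ggplot2)
nd$pr <- colMeans(PPD)  # Pr(event) per newdata row
ggplot(subset(nd, x2 == min(x2)),  # hold x2 fixed so the plot stays readable
       aes(x = x1, y = pr, colour = group)) +
  geom_line() +
  labs(y = "Posterior predictive Pr(event)")
If you want uncertainty bands on the probability itself rather than on the 0/1 outcome, posterior_linpred(model, newdata = nd, transform = TRUE) returns draws of the event probability, which you can summarise with apply(, 2, quantile).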

Related

How to set bounds and constraints on Tensorflow Variables (tf.Variable)

I am using Tensorflow to minimize a function. The function takes about 10 parameters. Every single parameter has bounds, e.g. a minimum and a maximum value the parameter is allowed to take. For example, the parameter x1 needs to be between 1 and 10.
I also have a pair of parameters that need to satisfy the constraint x2 > x3; in other words, x2 must always be bigger than x3. (In addition, x2 and x3 also have bounds, similar to the example of x1 above.)
I know that tf.Variable has a "constraint" argument, however I can't really find any examples or documentation on how to use this to achieve the bounds and constraints as mentioned above.
Thank you!
It seems to me (I may be mistaken) that constrained optimization (you can google for it in TensorFlow) is not exactly the use case TensorFlow was designed for. You may want to take a look at this repo; it may satisfy your needs, but as far as I understand it still doesn't solve arbitrary constrained optimization, just some classification problems with labels and features, compatible with precision/recall scores.
If you want to use the constraint argument on a TensorFlow variable (i.e., a function applied after each gradient step, which you could also do manually by taking the variable's values, manipulating them, and reassigning them), it means you will be clipping the variables after each step taken along the gradient in the unconstrained space. It's an open question whether you will successfully reach the right optimum this way, or whether your variables will get stuck at the boundaries because the unconstrained gradient points somewhere outside.
My approach 1
If your problem is simple enough, you can try to parametrize x2 and x3 as x2 = x3 + t and then clip in the graph:
x3 = tf.get_variable('x3',
                     dtype=tf.float32,
                     shape=(1,),
                     initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
                     constraint=lambda z: tf.clip_by_value(z, 1, 10))  # keep x3 in [1, 10]
t = tf.get_variable('t',
                    dtype=tf.float32,
                    shape=(1,),
                    initializer=tf.random_uniform_initializer(minval=1., maxval=10.),
                    constraint=lambda z: tf.clip_by_value(z, 1, 10))   # t >= 1 > 0 guarantees x2 > x3
x2 = x3 + t
Then, after each training step, additionally clip so that x2 stays within its bounds. Note that x2 = x3 + t is a tensor, not a variable, so you cannot assign to it directly; instead reassign t (tf.clip_by_value accepts tensors as bounds):
sess.run(tf.assign(t, tf.clip_by_value(t, 0., 10. - x3)))
But my opinion is that it won't work well.
My approach 2
I would also try to invent some loss terms to keep the variables within their constraints, which is more likely to work. For example, a penalty keeping x2 in the interval [1, 10] could be:
loss += alpha * tf.abs(tf.math.tan(((x2 - 5.5) / 4.5) * np.pi / 2))
Here the expression under tan maps [1, 10] onto (-pi/2, pi/2), and the tan function then grows very rapidly as x2 approaches the boundaries. With this approach I think you're more likely to find your optimum, but if the required value of x2 lies near a boundary, the loss weight alpha might be too big and training will get stuck nearby; in that case, try a smaller alpha.
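A quick way to see how this penalty behaves (a TF1-style sketch; alpha and the probe values are illustrative assumptions):
import numpy as np
import tensorflow as tf

x2 = tf.placeholder(tf.float32)
alpha = 1.0
penalty = alpha * tf.abs(tf.math.tan(((x2 - 5.5) / 4.5) * np.pi / 2))

with tf.Session() as sess:
    for v in [1.5, 5.5, 9.0, 9.9]:
        print(v, sess.run(penalty, feed_dict={x2: v}))
# roughly zero in the middle of the interval, exploding as x2 approaches 1 or 10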
In addition to the answer by Slowpoke, reparameterization is another option. E.g., say you have a parameter p that should be bounded in [lower_bound, upper_bound]; you could write:
p_inner = tf.Variable(...) # unbounded
p = tf.sigmoid(p_inner) * (upper_bound - lower_bound) + lower_bound
However, this will change the behavior of gradient descent.
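A self-contained sketch of this reparameterization (TF1-style; the bounds and the toy objective are illustrative assumptions, not from the question):
import tensorflow as tf

lower_bound, upper_bound = 1.0, 10.0
p_inner = tf.get_variable('p_inner', shape=(), dtype=tf.float32,
                          initializer=tf.zeros_initializer())           # unbounded
p = tf.sigmoid(p_inner) * (upper_bound - lower_bound) + lower_bound    # always in (1, 10)

loss = tf.square(p - 12.0)  # toy objective whose unconstrained optimum lies outside the bounds
train_op = tf.train.AdamOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(2000):
        sess.run(train_op)
    print(sess.run(p))  # approaches the upper bound, ~10
The behavioral change mentioned above comes from the sigmoid's saturation: as p approaches either bound, the gradient with respect to p_inner vanishes, so convergence near the bounds slows down.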

How to shift a tensor using an API in TensorFlow, just like numpy.roll() or shift? [duplicate]

Let's say we want to process images (or n-dim vectors) using Keras/TensorFlow, and, for fancy regularization, to shift each input by a random number of positions to the left (with overflowed portions reappearing on the right side).
How could this be viewed and solved?
1)
Is there any variation of numpy's roll function for TensorFlow?
2)
x - a 2D tensor
ri - a random integer
concatenate(x[:, ri:], x[:, 0:ri], axis=1)  # executed for each single input to the layer, with ri being random again and again (I can live with random only per batch)
In TensorFlow v1.15.0 and up, you can use tf.roll, which works just like numpy.roll: https://github.com/tensorflow/tensorflow/pull/14953
To improve on the answer above you can do:
# size of x dimension
x_len = tensor.get_shape().as_list()[1]
# random roll amount
i = tf.random_uniform(shape=[1], maxval=x_len, dtype=tf.int32)
output = tf.roll(tensor, shift=i, axis=[1])
For older versions, starting from v1.6.0, you will have to use tf.manip.roll:
# size of x dimension
x_len = tensor.get_shape().as_list()[1]
# random roll amount
i = tf.random_uniform(shape=[1], maxval=x_len, dtype=tf.int32)
output = tf.manip.roll(tensor, shift=i, axis=[1])
I just had to do this myself, and I don't think there is a TensorFlow op for np.roll, unfortunately. Your code above looks basically correct, except that it doesn't roll by ri but by (x.shape[1] - ri).
Also, you need to be careful to choose your random integer from range(1, x.shape[1]+1) rather than range(0, x.shape[1]), because if ri were 0, then x[:, 0:ri] would be empty.
So what I would suggest would be something more like (for rolling along dimension 1):
x_len = x.get_shape().as_list()[1]
i = np.random.randint(1, x_len + 1)  # the amount to roll by, drawn from range(1, x_len+1) as noted above
y = tf.concat([x[:, x_len - i:], x[:, :x_len - i]], axis=1)
EDIT: added missing colon after hannes' correct comment.

Define x and y limits for regression line in ggplot2

I am using ggplot2 to plot graphs. The basic aim: the graph has two layers; the lower layer (a scatter plot) uses data gathered from a public database, and on top of it I add the data from my study, plus a regression line for my data. You can get a brief idea of what I have from this picture:
The problem is that, due to the different ranges of the two data sets, the regression lines are too long (full range), which makes the picture look strange. I want to limit the x and y extent of the regression line to my data's layer, but I just cannot achieve this.
For the regression, I use geom_abline to set the slope, intercept, etc., instead of geom_lm, which I see can take the argument fullrange = FALSE.
Use stat_smooth with method = "lm" and se = FALSE (which turns off the confidence-interval shading). By default stat_smooth uses fullrange = FALSE, so the fitted line spans only the range of the data in its own layer:
ggplot(mpg, aes(displ, cty, color = as.factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  labs(color = "Cylinders",
       x = "Displacement in Liters",
       y = "Miles per Gallon")

Fit a bayesian linear regression and predict unobservable values

I'd like to use JAGS with R to fit a linear model on observable quantities and make inference about unobservable ones. I found lots of examples on the internet about how to fit the model, but nothing on how to extrapolate from its coefficients after fitting the model in the JAGS environment. I'd appreciate any help with this.
My data looks like the following:
ngroups <- 2
group <- 1:ngroups
nobs <- 100
dta <- data.frame(group=rep(group,each=nobs),y=rnorm(nobs*ngroups),x=runif(nobs*ngroups))
head(dta)
JAGS has powerful ways to make inference about missing data, and once you get the hang of it, it's easy! I strongly recommend that you check out Marc Kéry's excellent book which provides a wonderful introduction to BUGS language programming (JAGS is close enough to BUGS that almost everything transfers).
The easiest way to do this involves, as you say, modifying the model (or rather its data). Below I provide a complete worked example of how this works. But you seem to be asking for a way to get the prediction interval without re-running the model (is your model very large and computationally expensive?). This can also be done.
How to predict--the hard way (without re-running the model)
For each iteration of the MCMC, simulate the response for the desired x-value based on that iteration's posterior draws for the parameter values. So imagine you want to predict a value for X=10. If iteration 1 (post burn-in) has slope=2, intercept=1, and standard deviation=0.5, draw a Y-value from
Y=rnorm(1, 1+2*10, 0.5)
And repeat for iteration 2, 3, 4, 5...
These will be your posterior draws for the response at X=10. Note: if you did not monitor the standard deviation in your JAGS model, you are out of luck and need to fit the model again.
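A sketch of that loop, vectorised over the draws (assuming samps is a matrix of posterior draws with columns int, slope, and sigma, e.g. from as.matrix() on a coda object):
x_new <- 10
y_draws <- rnorm(nrow(samps),
                 mean = samps[, "int"] + samps[, "slope"] * x_new,
                 sd   = samps[, "sigma"])
quantile(y_draws, c(0.025, 0.975))  # 95% posterior prediction interval at X = 10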
How to predict--the easy way--with worked example
The basic idea is to insert (into your data) the x-values whose responses you want to predict, with the associated y-values set to NA. For example, if you want a prediction interval for X=10, you just have to include the point (10, NA) in your data and set a trace monitor for the y-value.
I use JAGS from R with the rjags package. Below is a complete worked example that begins by simulating the data, then adds some extra x-values to the data, specifies and runs the linear model in JAGS via rjags, and summarizes the results. Y[101:105] contains draws from the posterior prediction intervals for X[101:105]. Notice that Y[1:100] just contains the y-values for X[1:100]. These are the observed data that we fed to the model, and they never change as the model updates.
library(rjags)
# Simulate data (100 observations)
my.data <- as.data.frame(matrix(data=NA, nrow=100, ncol=2))
names(my.data) <- c("X", "Y")
# the linear model will predict Y based on the covariate X
my.data$X <- runif(100) # values for the covariate
int <- 2 # specify the true intercept
slope <- 1 # specify the true slope
sigma <- .5 # specify the true residual standard deviation
my.data$Y <- rnorm(100, slope*my.data$X+int, sigma) # Simulate the data
#### Extra data for prediction of unknown Y-values from known X-values
y.predict <- as.data.frame(matrix(data=NA, nrow=5, ncol=2))
names(y.predict) <- c("X", "Y")
y.predict$X <- c(-1, 0, 1.3, 2, 7)
mydata <- rbind(my.data, y.predict)
set.seed(333)
setwd(INSERT YOUR WORKING DIRECTORY HERE)
sink("mymodel.txt")
cat("model{
# Priors
int ~ dnorm(0, .001)
slope ~ dnorm(0, .001)
tau <- 1/(sigma * sigma)
sigma ~ dunif(0,10)
# Model structure
for(i in 1:R){
Y[i] ~ dnorm(m[i],tau)
m[i] <- int + slope * X[i]
}
}", fill=TRUE)
sink()
jags.data <- list(R=dim(mydata)[1], X=mydata$X, Y=mydata$Y)
inits <- function(){list(int=rnorm(1, 0, 5), slope=rnorm(1,0,5),
sigma=runif(1,0,10))}
params <- c("Y", "int", "slope", "sigma")
nc <- 3
n.adapt <-1000
n.burn <- 1000
n.iter <- 10000
thin <- 10
my.model <- jags.model('mymodel.txt', data = jags.data, inits=inits, n.chains=nc, n.adapt=n.adapt)
update(my.model, n.burn)
my.model_samples <- coda.samples(my.model,params,n.iter=n.iter, thin=thin)
summary(my.model_samples)
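To pull the prediction intervals out of the fitted object, you can summarise the monitored Y[101:105] columns directly. A sketch assuming the objects above:
samps <- as.matrix(my.model_samples)            # stack the chains into one matrix
pred_cols <- paste0("Y[", 101:105, "]")         # monitors for the five new X-values
t(apply(samps[, pred_cols], 2, quantile, probs = c(0.025, 0.5, 0.975)))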

How to estimate parameters in a Bayes net using PyMC

I would like to estimate the parameters of a directed Bayes net using PyMC. I came across one particular example that implements the sprinkler network, which has 3 random variables and a conditional probability distribution (CPD) defined for each node.
However, this example has the CPD encoded using deterministic variables.
Is it possible to provide the joint or marginal distribution over 2 or 3 random variables as the observed data to a deterministic PyMC variable?
In other words, if my network is of the form X -> Z <- Y, is it possible to provide a set of tuples of the form (x1, y1, z1) as observed data, to learn the parameters of the CPD P(Z|X,Y)?
The sprinkler example is really setting static probability values. In this line:
p_G = mc.Lambda('p_G',
                lambda S=S, R=R: pl.where(S, pl.where(R, .99, .9),
                                          pl.where(R, .8, 0.)),
                doc='Pr[G|S,R]')
To my understanding, we would need to learn one parameter set for each combination of the parents' values. So if we want to learn P(Z|X,Y), then for each combination of values of X and Y we learn one parameter set for Z.
So let's say X and Y take boolean values and Z is a Bernoulli distribution.
For each value of (X,Y), i.e. (0,0), (0,1), (1,0), (1,1), we have parameters p1, p2, p3, p4, and Z has 4 PyMC observed variables: Z1 with parameter p1, Z2 with parameter p2, Z3 with parameter p3, and Z4 with parameter p4.
Thus:
P(Z=1|X=0,Y=0) is the MCMC-estimated mean of p1,
P(Z=0|X=0,Y=0) = 1 - p1,
P(Z=1|X=0,Y=1) is the mean of p2, and so on.
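A minimal sketch of this idea in PyMC2 (the variable names and the tiny toy dataset are illustrative assumptions, not from the question):
import numpy as np
import pymc as mc

# observed (x, y, z) tuples
data = np.array([(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0),
                 (1, 1, 1), (1, 1, 1), (1, 0, 1), (0, 1, 0)])
x, y, z = data[:, 0], data[:, 1], data[:, 2]

# one Bernoulli parameter per (x, y) combination, each with a flat Beta prior
params, likelihoods = {}, []
for xv in (0, 1):
    for yv in (0, 1):
        p = mc.Beta('p_%d%d' % (xv, yv), alpha=1, beta=1)
        mask = (x == xv) & (y == yv)
        zobs = mc.Bernoulli('Z_%d%d' % (xv, yv), p=p,
                            value=z[mask], observed=True)
        params[(xv, yv)] = p
        likelihoods.append(zobs)

model = mc.MCMC(list(params.values()) + likelihoods)
model.sample(iter=10000, burn=1000)
# the posterior mean of p_00 estimates P(Z=1 | X=0, Y=0)
print(model.stats()['p_00']['mean'])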
I have a related question here:
How to use pymc to parameterize a probabilistic graphical model?