Replicate stargazer output, as for linear models, with spatial models via a data frame

I am trying to reproduce the nice stargazer output for lm models with a model that is not supported by stargazer.
Can the linear-model stargazer output be produced by hand? Since we can create a data frame from every model, we could then insert the created data frame into stargazer:
library(spdep)
library(spatialreg)  # lagsarlm() now lives in spatialreg rather than spdep
library(stargazer)
library(dplyr)
data(afcon, package = "spData")
afcon$Y <- rnorm(42, 50, 20)
# build the spatial weights from the 7 nearest neighbours
cns <- knearneigh(cbind(afcon$x, afcon$y), k = 7, longlat = TRUE)
scnsn <- knn2nb(cns, row.names = NULL, sym = TRUE)
W <- nb2listw(scnsn, zero.policy = TRUE)
ols <- lm(totcon ~ Y, data = afcon)
spatial.lag <- lagsarlm(totcon ~ Y, data = afcon, W)
summary(ols)
stargazer(ols, type = "text")
summary(spatial.lag)
# assemble coefficients and standard errors by hand
data.frame(
  spatial.lag$coefficients,
  spatial.lag$rest.se
) %>%
  rename(coeffs = spatial.lag.coefficients,
         se = spatial.lag.rest.se) %>%
  stargazer(type = "text", summary = FALSE)
When we do stargazer(ols), the output is very nice. I would like to reproduce the same output by hand for spatial.lag. Is there a way to do so, e.g. for the significance superscripts etc.?

Do you mean ^{*}? If so, it's not possible in stargazer! I've already tried it, so I recommend you check the xtable package, like I did here.
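For reference, a minimal sketch of that xtable route, reusing the spatial.lag object from the question (the star cutpoints and formatting here are my own choices, not anything xtable mandates):
library(xtable)
est <- spatial.lag$coefficients
se  <- spatial.lag$rest.se
p   <- 2 * pnorm(abs(est / se), lower.tail = FALSE)
stars <- as.character(symnum(p, cutpoints = c(0, 0.01, 0.05, 0.1, 1),
                             symbols = c("***", "**", "*", " ")))
tab <- data.frame(Estimate = sprintf("%.3f%s", est, stars),
                  SE = sprintf("(%.3f)", se),
                  row.names = names(est))
# identity stops xtable from escaping the stars in the LaTeX output
print(xtable(tab), sanitize.text.function = identity)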

I will show one approach that can be used: stargazer is really nice, and you CAN create a table like the one above even with model objects that are not yet supported. For example, let's pretend that a quantile regression model is not supported by stargazer (even though it is).
The trick is that you need to be able to obtain the coefficients and standard errors, e.g. as vectors. Then supply stargazer with a model object that IS supported (e.g. lm) as a template, and mechanically specify which coefficients and standard errors should be used:
library(stargazer)
library(tidyverse)
library(quantreg)
df <- mtcars
model1 <- lm(hp ~ factor(gear) + qsec + disp, data = df)
quantreg <- rq(hp ~ factor(gear) + qsec + disp, data = df)
summary_qr <- summary(quantreg, se = "boot")
# bootstrap standard errors for the quantile regression
# (hard-coded from summary_qr; the exact values vary with the bootstrap seed)
se_qr <- c(211.78266, 29.17307, 58.61105, 9.70908, 0.12090)
stargazer(model1, model1,
          coef = list(NULL, summary_qr$coefficients[, 1]),  # first column holds the estimates
          se = list(NULL, se_qr),
          type = "text")
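Applied back to the spatial lag model from the question, the same template trick would look like this (a sketch assuming the ols and spatial.lag objects defined earlier; note that rho is reported separately by lagsarlm and would need its own row):
stargazer(ols, ols,
          coef = list(NULL, spatial.lag$coefficients),
          se   = list(NULL, spatial.lag$rest.se),
          type = "text")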


tidymodels: "following required column is missing from `new_data` in step..."

I'm creating and fitting a workflow for a lasso regression model in {tidymodels}. The model fits fine, but when I go to predict the test set I get an error saying "the following required column is missing from `new_data`". That column ("price") is in both the train and test sets. Is this a bug? What am I missing?
Any help would be greatly appreciated.
library(tidymodels)
# split the data (target variable in house_sales_df is "price")
split <- initial_split(house_sales_df, prop = 0.8)
train <- split %>% training()
test <- split %>% testing()
# create and fit workflow
lasso_prep_recipe <-
  recipe(price ~ ., data = train) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric())
lasso_model <-
  linear_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")
lasso_workflow <- workflow() %>%
  add_recipe(lasso_prep_recipe) %>%
  add_model(lasso_model)
lasso_fit <- lasso_workflow %>%
  fit(data = train)
# predict test set
predict(lasso_fit, new_data = test)
predict() results in this error:
Error in `step_normalize()`:
! The following required column is missing from `new_data` in step 'normalize_MXQEf': price.
Backtrace:
1. stats::predict(lasso_fit, new_data = test, type = "numeric")
2. workflows:::predict.workflow(lasso_fit, new_data = test, type = "numeric")
3. workflows:::forge_predictors(new_data, workflow)
5. hardhat:::forge.data.frame(new_data, blueprint = mold$blueprint)
7. hardhat:::run_forge.default_recipe_blueprint(...)
8. hardhat:::forge_recipe_default_process(...)
10. recipes:::bake.recipe(object = rec, new_data = new_data)
12. recipes:::bake.step_normalize(step, new_data = new_data)
13. recipes::check_new_data(names(object$means), object, new_data)
14. cli::cli_abort(...)
You are getting the error because all_numeric() in step_normalize() selects the outcome price, which isn't available at predict time. Use all_numeric_predictors() and you should be good:
# split the data (target variable in house_sales_df is "price")
split <- initial_split(house_sales_df, prop = 0.8)
train <- split %>% training()
test <- split %>% testing()
# create and fit workflow
lasso_prep_recipe <-
  recipe(price ~ ., data = train) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())
lasso_model <-
  linear_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet")
lasso_workflow <- workflow() %>%
  add_recipe(lasso_prep_recipe) %>%
  add_model(lasso_model)
lasso_fit <- lasso_workflow %>%
  fit(data = train)
# predict test set
predict(lasso_fit, new_data = test)
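Equivalently, you can keep all_numeric() and exclude the outcome explicitly with all_outcomes(); only the recipe step changes:
lasso_prep_recipe <-
  recipe(price ~ ., data = train) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric(), -all_outcomes())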

Is there a method for converting a wimids object to a mids object?

Suppose I create 10 multiply-imputed datasets and use the (wonderful) MatchThem package in R to create weights for my exposure variable. The MatchThem package takes a mids object and converts it to an object of class wimids.
My desired output is a mids object, but with weights. I hope to pass this mids object to brms as follows:
library(brms)
m0 <- brm_multiple(Y|weights(weights) ~ A, data = mids_data)
Open to suggestions.
EDIT: Noah's solution below will unfortunately not work.
The package's first author, Farhad Pishgar, sent me the following elegant solution. It creates a mids object from a wimids object. Thank you, Farhad!
library(mice)
library(MatchThem)
#"weighted.dataset" is our .wimids object
#Extracting the original dataset with missing value
maindataset <- complete(weighted.datasets, action = 0)
#Some spit-and-polish
maindataset <- data.frame(.imp = 0, .id = seq_len(nrow(maindataset)), maindataset)
#Extracting imputed-weighted datasets in the long format
alldataset <- complete(weighted.datasets, action = "long")
#Binding them together
alldataset <- rbind(maindataset, alldataset)
#Converting to .mids
newmids <- as.mids(alldataset)
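Since the weights travel along in the long-format data, the resulting newmids object should then be usable directly in the call from the question (Y, A, and the weights column are the question's placeholder names):
library(brms)
m0 <- brm_multiple(Y | weights(weights) ~ A, data = newmids)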
Additionally, for brms, I worked out the following solution, which instead creates a list of data frames. It works in fewer steps.
library("mice")
library("dplyr")
library("MatchThem")
library("brms") # for bayesian estimation.
# Note, I realise that my approach here is not fully Bayesian, but that is a good thing! I need to ensure balance in the exposure.
# impute missing data
data("nhanes2")
imp <- mice(nhanes2, printFlag = FALSE, seed = 0, m = 10)
# MatchThem. This is just a fast method
w_imp <- weightthem(hyp ~ chl + age, data = imp,
                    approach = "within",
                    estimand = "ATE",
                    method = "ps")
# get individual data frames with weights
out <- complete(w_imp, action = "long", include = FALSE, mild = TRUE)
# assemble individual data frames into a list
m <- 10
listdat <- list()
for (i in 1:m) {
  listdat[[i]] <- as.data.frame(out[[i]])
}
# pass the list to brms, and it runs as it should!
fit_1 <- brm_multiple(bmi | weights(weights) ~ age + hyp + chl,
                      data = listdat,
                      backend = "cmdstanr",
                      family = "gaussian",
                      prior = set_prior('normal(0, 1)',
                                        class = 'b'))
brm_multiple() can take a list of data frames for its data argument. You can produce this from the wimids object using complete(). The output of complete() with action = "all" is a mild object, which is a list of data frames, but it is not recognized by brm_multiple() as such. So you can just convert it to a plain list. This should look like the following:
df_list <- complete(mids_data, "all")
class(df_list) <- "list"
m0 <- brm_multiple(Y|weights(weights) ~ A, data = df_list)
Using complete() automatically adds a weights column to the resulting imputed data frames.

How can effects from fixed-effects model in plm be plotted in R?

I cannot get a plot of the effects from a fixed-effects model in plm. I tried using effect(), predict(), and all kinds of packages like sjPlot, etc.
Is there a way of plotting it, especially also with interactions?
I always get error messages like:
Error in mod.matrix %*% scoef : non-conformable arguments
Try fixef? For instance, see below:
library(plm)
plm_2 <- plm(wealth ~ Volatility, data = ds_panel, index = c("rho"), model = "within")
y1 <- fixef(plm_2)
x1 <- as.numeric(names(y1))
plot(y1~x1, pch = 20, ylab = "FE", xlab = expression(rho))
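Because ds_panel isn't shown, here is a self-contained sketch of the same idea using the Grunfeld data that ships with plm (my substitution, not the asker's data):
library(plm)
data("Grunfeld", package = "plm")
plm_g <- plm(inv ~ value, data = Grunfeld, index = c("firm", "year"), model = "within")
fe <- fixef(plm_g)  # one estimated fixed effect per firm
plot(as.numeric(names(fe)), fe, pch = 20, xlab = "firm", ylab = "FE")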

Smoothing geom_ribbon

I've created a plot with geom_line and geom_ribbon (image 1), and the result is okay, but for the sake of aesthetics I'd like the line and ribbon to be smoother. I know I can use geom_smooth for the line (image 2), but I'm not sure whether it's possible to smooth the ribbon. I could create a geom_smooth line for the top and bottom lines of the ribbon (image 3), but is there any way to fill in the space between those two lines?
A principled way to achieve what you want is to fit a GAM to your data using the gam() function in mgcv, and then apply the predict() function to that model over a finer grid of values for your predictor variable. The grid can cover the span defined by the range of observed predictor values. The R code below illustrates this process with a concrete example.
# load R packages
library(ggplot2)
library(mgcv)
# simulate some x and y data
# x = predictor; y = response
set.seed(123)  # added so the simulated data are reproducible
x <- seq(-10, 10, by = 1)
y <- 1 - 0.5*x - 2*x^2 + rnorm(length(x), mean = 0, sd = 20)
d <- data.frame(x,y)
# plot the simulated data
ggplot(data = d, aes(x, y)) +
  geom_point(size = 3)
# fit GAM model
m <- gam(y ~ s(x), data = d)
# define finer grid of predictor values
xnew <- seq(-10, 10, by = 0.1)
# apply predict() function to the fitted GAM model
# using the finer grid of x values
p <- predict(m, newdata = data.frame(x = xnew), se = TRUE)
str(p)
# plot the estimated mean values of y (fit) at given x values
# over the finer grid of x values;
# superimpose approximate 95% confidence band for the true
# mean values of y at given x values in the finer grid
g <- data.frame(x = xnew,
                fit = p$fit,
                lwr = p$fit - 1.96*p$se.fit,
                upr = p$fit + 1.96*p$se.fit)
head(g)
theme_set(theme_bw())
ggplot(data = g, aes(x, fit)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "lightblue") +
  geom_line() +
  geom_point(data = d, aes(x, y), shape = 1)
This same principle would apply if you were to fit a polynomial regression model to your data using the lm() function.
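For instance, a quadratic fit via lm() on the same simulated data would go like this (a sketch reusing the d and xnew objects defined above):
m2 <- lm(y ~ poly(x, 2), data = d)
p2 <- predict(m2, newdata = data.frame(x = xnew), se.fit = TRUE)
g2 <- data.frame(x = xnew,
                 fit = p2$fit,
                 lwr = p2$fit - 1.96*p2$se.fit,
                 upr = p2$fit + 1.96*p2$se.fit)
ggplot(data = g2, aes(x, fit)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "lightblue") +
  geom_line() +
  geom_point(data = d, aes(x, y), shape = 1)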

Hessian matrix with optim() or the numDeriv package?

I am doing maximum likelihood estimation for the log-likelihood function of a Poisson distribution. After the estimation I want to compute the standard errors of the coefficients. For that I need the Hessian matrix. Now I don't know which function I should use to get the Hessian: optim() or hessian() from the numDeriv package.
Both functions give me different solutions. And if I try to compute the standard errors from the Hessian that I get from optim, I get one NaN entry in the result.
What's the difference between these two functions for the computation of the Hessian matrix?
# negative log-likelihood of the Poisson regression
logLikePois <- function(parameter, y, z) {
  betaKoef <- parameter
  lambda <- exp(betaKoef %*% t(z))
  logLikeliHood <- -(sum(-lambda + y*log(lambda) - log(factorial(y))))
  return(logLikeliHood)
}
# analytic gradient; 'parameter' comes first so that optim() can call gr(par, ...)
grad <- function(parameter, y, z) {
  betaKoef <- parameter
  # lambda of the Poisson regression
  lambda <- exp(betaKoef %*% t(z))
  gradient <- -((y - lambda) %*% z)
  return(gradient)
}
library(numDeriv)  # hessian() comes from numDeriv
data(discoveries)
disc <- data.frame(count = as.numeric(discoveries),
                   year = seq(0, (length(discoveries) - 1), 1))
yearSqr <- disc$year^2
formula <- count ~ year + yearSqr
form <- formula(formula)
model <- model.frame(formula, data = disc)
z <- model.matrix(formula, data = disc)
y <- model.response(model)
parFullModell <- rep(0, ncol(z))
optimierung <- optim(par = parFullModell, gr = grad, fn = logLikePois,
                     z = z, y = y, method = "BFGS", hessian = TRUE)
optimHessian <- optimierung$hessian
numderivHessian <- hessian(func = logLikePois, x = optimierung$par, y = y, z = z)
sqrt(diag(solve(optimHessian)))
sqrt(diag(solve(numderivHessian)))
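(Not from the original post, but as a sanity check for either Hessian: fitting the same model with glm() gives reference standard errors to compare against.)
glm_fit <- glm(count ~ year + yearSqr, data = data.frame(disc, yearSqr), family = poisson)
summary(glm_fit)$coefficients[, "Std. Error"]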