tbl_regression, mfx, and p values - tidyverse

This simple example throws an error for tbl_regression with mfx, but tbl_merge still works.
library("gtsummary")
library("mfx")
data(sleep)
m1 <- tbl_regression(glm(data=sleep, group~extra, family = "binomial"))
m2 <- tbl_regression(logitmfx(data=sleep, group~extra, atmean = TRUE))
tbl_merge(list(m1,m2), tab_spanner=c("",""))
However, tbl_merge breaks if I label the variables
library("labelled")
var_label(sleep$extra) <- "Hrs of sleep"
m1 <- tbl_regression(glm(data=sleep, group~extra, family = "binomial"))
m2 <- tbl_regression(logitmfx(data=sleep, group~extra, atmean = TRUE))
tbl_merge(list(m1,m2), tab_spanner=c("",""))
How do I add the marginal effects into a single table with labelled vars?
AND, how do I drop the CI's and change the p-values to asterisks?

Related

plot gam results with original x values (not scaled and centred)

I have a dataset that I am modeling with a gam. Because there are two continuous varaibles in the gam, I have centred and scaled these variables before adding them to the model. Therefore, when I use the built-in features in gratia to show the results, the x values are not the same as the original scale. I'd like to plot the results using the scale of the original data.
An example:
library(tidyverse)
library(mgcv)
library(gratia)
set.seed(42)
df <- data.frame(
doy = sample.int(90, 300, replace = TRUE),
year = sample(c(1980:2020), size = 300, replace = TRUE),
site = c(rep("A", 150), rep("B", 80), rep("C", 70)),
sex = sample(c("F", "M"), size = 300, replace = TRUE),
mass = rnorm(300, mean = 500, sd = 50)) %>%
mutate(doy.s = scale(doy, center = TRUE, scale = TRUE),
year.s = scale(year, center = TRUE, scale = TRUE),
across(c(sex, site), as.factor))
m1 <- gam(mass ~
s(year.s, site, bs = "fs", by = sex, k = 5) +
s(doy.s, site, bs = "fs", by = sex, k = 5) +
s(sex, bs = "re"),
data = df, method = "REML", family = gaussian)
draw(m1)
How do I re-plot the last two panels in this figure to show the relationship between year and mass with ggplot?
You can't do this with gratia::draw automatically (unless I'm mistaken).* But you can use gratia::smooth_estimates to get a dataframe which you can then do whatever you like with.
To answer your specific question: to re-plot the last two panels of the plot you provided, but with year unscaled, you can do the following
# Get a tibble of smooth estimates from the model
sm <- gratia::smooth_estimates(m1)
# Add a new column for the unscaled year
sm <- sm %>% mutate(year = mean(df$year) + (year.s * sd(df$year)))
# Plot the smooth s(year.s,site) for sex=F with year unscaled
pF <- sm %>% filter(smooth == "s(year.s,site):sexF" ) %>%
ggplot(aes(x = year, y = est, color=site)) +
geom_line() +
theme(legend.position = "none") +
labs(y = "Partial effect", title = "s(year.s,site)", subtitle = "By: sex; F")
# Plot the smooth s(year.s,site) for sex=M with year unscaled
pM <- sm %>% filter(smooth == "s(year.s,site):sexM" ) %>%
ggplot(aes(x = year, y = est, color=site)) +
geom_line() +
theme(legend.position = "none") +
labs(y = "Partial effect", title = "s(year.s,site)", subtitle = "By: sex; M")
library(patchwork) # use `patchwork` just for easy side-by-side plots
pF + pM
to get:
EDIT: If you also want to shift result on the y-axis as #GavinSimpson (who is the author and maintainer of gratia) mentioned, you can do this with add_constant, adding this code before plotting above:
sm <- sm %>%
add_constant(coef(m1)["(Intercept)"]) %>%
transform_fun(inv_link(m1))
[You should also in general untransform the smooth by the inverse of the model's link function. In your case this is just the identity, so it is not necessary, but in general it would be. That's what the second step above is doing.]
In your example, this results in:
*As mentioned in the custom-plotting vignette for gratia, the goal of draw not to be fully customizable, but just to be useful default. See there for recommendations about custom plots.

Is there a method for converting a winmids object to a mids object?

Suppose I create 10 multiply-imputed datasets and use the (wonderful) MatchThem package in R to create weights for my exposure variable. The MatchThem package takes a mids object and converts it to an object of the class winmids.
My desired output is a mids object - but with weights. I hope to pass this mids object to BRMS as follows:
library(brms)
m0 <- brm_multiple(Y|weights(weights) ~ A, data = mids_data)
Open to suggestions.
EDIT: Noah's solution below will unfortunately not work.
The package's first author, Farhad Pishgar, sent me the following elegant solution. It will create a mids object from a winmidsobject. Thank you Farhad!
library(mice)
library(MatchThem)
#"weighted.dataset" is our .wimids object
#Extracting the original dataset with missing value
maindataset <- complete(weighted.datasets, action = 0)
#Some spit-and-polish
maindataset <- data.frame(.imp = 0, .id = seq_len(nrow(maindataset)), maindataset)
#Extracting imputed-weighted datasets in the long format
alldataset <- complete(weighted.datasets, action = "long")
#Binding them together
alldataset <- rbind(maindataset, alldataset)
#Converting to .mids
newmids <- as.mids(alldataset)
Additionally, for BRMS, I worked out this solution which instead creates a list of dataframes. It will work in fewer steps.
library("mice")
library("dplyr")
library("MatchThem")
library("brms") # for bayesian estimation.
# Note, I realise that my approach here is not fully Bayesian, but that is a good thing! I need to ensure balance in the exposure.
# impute missing data
data("nhanes2")
imp <- mice(nhanes2, printFlag = FALSE, seed = 0, m = 10)
# MathThem. This is just a fast method
w_imp <- weightthem(hyp ~ chl + age, data = imp,
approach = "within",
estimand = "ATE",
method = "ps")
# get individual data frames with weights
out <- complete(w_imp, action ="long", include = FALSE, mild = TRUE)
# assemble individual data frames into a list
m <- 10
listdat<- list()
for (i in 1:m) {
listdat[[i]] <- as.data.frame(out[[i]])
}
# pass the list to brms, and it runs as it should!
fit_1 <- brm_multiple(bmi|weights(weights) ~ age + hyp + chl,
data = listdat,
backend = "cmdstanr",
family = "gaussian",
set_prior('normal(0, 1)',
class = 'b'))
brm_multiple() can take in a list of data frames for its data argument. You can produce this from the wimids object using complete(). The output of complete() with action = "all" is a mild object, which is a list of data frames, but this is not recognized by brm_multiple() as such. So, you can just convert it to a list. This should look like the following:
df_list <- complete(mids_data, "all")
class(df_list) <- "list"
m0 <- brm_multiple(Y|weights(weights) ~ A, data = df_list)
Using complete() automatically adds a weights column to the resulting imputed data frames.

How to solve "Error in knn: 'train' and 'class' have different lengths"

I'm trying to use the knn function (from the class package) on my dataset. It has 12 columns of features, and the 13th is what I want to be able to predict. I'm doing a 67/33 split.
This is my code so far:
nrow(Company_bankruptcy_papernorm)
random <- sample(nrow(Company_bankruptcy_papernorm),
size = 0.33*nrow(Company_bankruptcy_papernorm),replace = FALSE)
Company_bankruptcy_test <- Company_bankruptcy_papernorm[random,]
Company_bankruptcy_train <- Company_bankruptcy_papernorm[-random,]
able(Company_bankruptcy_paper$`Bankrupt?`)
table(Company_bankruptcy_paper$`Bankrupt?`[random]) *-> length 2250*
table(Company_bankruptcy_paper$`Bankrupt?`[-random]) *-> length 4569*
Bankruptcy_train_labels <- Company_bankruptcy_paper[-random,13]
Bankruptcy_test_labels <- Company_bankruptcy_paper[random,13]
length(Bankruptcy_train_labels) -> Answer: NULL
For KNN I tried
KNN_pred1 <- knn(train = Company_bankruptcy_train,
test = Company_bankruptcy_test,
cl = Bankruptcy_train_labels, k=83)
KNN_pred1 <- knn(train = Company_bankruptcy_train,
test = Company_bankruptcy_test,
cl = Bankruptcy_train_labels$`Bankrupt?`, k=83)
But both don't work.
What can I do?
Thank you in advance!
I got the data from: https://www.kaggle.com/datasets/fedesoriano/company-bankruptcy-prediction

replicate stargazer output like for linear models with spatial models

I am trying to reproduce nice stargazer model (lm) output for model that is not supperted by stargazer.
can linear model stargazer output be produced by hand? Since we can create a dataframe from every model and than insert the created dataframe to stargazer:
library(spdep)
data(afcon, package="spData")
afcon$Y = rnorm(42, 50, 20)
cns <- knearneigh(cbind(afcon$x, afcon$y), k=7, longlat=T)
scnsn <- knn2nb(cns, row.names = NULL, sym = T)
W <- nb2listw(scnsn, zero.policy = TRUE)
ols <- lm(totcon ~ Y, data = afcon)
spatial.lag <- lagsarlm(totcon ~ Y, data = afcon, W)
summary(model)
stargazer(ols, type = "text")
summary(spatial.lag)
data.frame(
spatial.lag$coefficients,
spatial.lag$rest.se
) %>%
rename(coeffs = spatial.lag.coefficients,
se = spatial.lag.rest.se) %>%
stargazer(type = "text", summary = F)
when we do stargazer(ols) output is very nice, I woud like to reproduce same output by hand for spatial.lag is there a way how to do so, how superscript etc...
You mean ^{*}? If so it's not possible in stargazer!! I've already tried it so I recommend you to check the xtable package like I did here.
I will show one approach that can be used: stargazer is really nice and you CAN even create table like above even with the model objects that are not yet supported, e.g. lets say that quantile regression model is not supported by stargazer (even thought is is):
Trick is, you need to be able to obtain coefficients and standart error e.g. as vector. Then supply stargazer with model object that is suppoerted e.g. lm as a template and then mechanically specify which coefficients and standart errors should be used:
library(stargazer)
library(tidyverse)
library(quantreg)
df <- mtcars
model1 <- lm(hp ~ factor(gear) + qsec + disp, data = df)
quantreg <- rq(hp ~ factor(gear) + qsec + disp, data = df)
summary_qr <- summary(quantreg, se = "boot")
# Standart Error for quant reg
se_qr = c(211.78266, 29.17307, 58.61105, 9.70908, 0.12090)
stargazer(model1, model1,
coef = list(NULL, summary_qr$coefficients),
se = list(NULL, se_qr),
type = "text")

shiny sliderInput range minimum and maximum values

I need to adjust a histogram output in shiny with a range values:
(where m is an arbitrary matrix)
#ui.R
sliderInput(inputId="adjust", label="Choose adjacency threshold", value=c(0.001, 0.9), min=0.0001, max=1),
plotOutput("hist")
#server.R
df<-reactive({
idx = m > min(input$adjust) & m < max(input$adjust)
data.frame(
id = row(m)[idx],
value = m[idx])
})
output$hist<-renderPlot({hist(df()$values)})
However this doesn't seem to affect the histogram- it re-renders when I toggle the slider but its the same each time.... it takes a very long time and it just seems to take all of the values in into account?
Does anyone know how to make this work?
when I try to print the slider's min/max value- nothing comes to the page:
#ui.R
verbatimTextOutput("x")
#server
output$x<-renderPrint({min(input$adjust)})
Hence I might be approaching this the completely wrong way... does anyone know how to do this?
FULL EXAMPLE
library(shiny)
runApp(list(ui = fluidPage(sliderInput(inputId="adjust", label="Choose adjacency threshold", value=c(0.001, 0.9), min=0.0001, max=1),
plotOutput("hist")
server=function(input, output){
adjacentmat<-reactive({adjacency(dat)})
data<-reactive({
adj_mat<-adjacentmat()
adj_mat[adj_mat < input$adjust] <- 0
m<-adj_mat
idx = m > min(input$adjust) & m < max(input$adjust)
data.frame(
source = row(m)[idx],
target = col(m)[idx],
corr = m[idx])
})
output$hist<-renderPlot({hist(data()$corr)})
}
)
generate the dat variable with the following code:
library('dplyr')
set.seed(1)
# generate a couple clusters
nodes_per_cluster <- 30
n <- 10
nvals <- nodes_per_cluster * n
# cluster 1 (increasing)
cluster1 <- matrix(rep((1:n)/4, nodes_per_cluster) +
rnorm(nvals, sd=1),
nrow=nodes_per_cluster, byrow=TRUE)
# cluster 2 (decreasing)
cluster2 <- matrix(rep((n:1)/4, nodes_per_cluster) +
rnorm(nvals, sd=1),
nrow=nodes_per_cluster, byrow=TRUE)
# noise cluster
noise <- matrix(sample(1:2, nvals, replace=TRUE) +
rnorm(nvals, sd=1.5),
nrow=nodes_per_cluster, byrow=TRUE)
dat <- rbind(cluster1, cluster2, noise)
colnames(dat) <- paste0('n', 1:n)
rownames(dat) <- c(paste0('cluster1_', 1:nodes_per_cluster),
paste0('cluster2_', 1:nodes_per_cluster),
paste0('noise_', 1:nodes_per_cluster))
This works for me:
library(shiny)
runApp(list(ui = fluidPage(
mainPanel(sliderInput("test", "Select values", value= c(.001,.9), min= 0.0001, max= 1)),
verbatimTextOutput("test2")
),
server = function(input, output, session) {
output$test2 <- renderPrint(min(input$test))
}))
I'm guessing your problem is somewhere in the code you haven't shown us. Can you give the code for the entire running example of your problem?