ggsurvplot returns error: arguments imply differing number of rows - ggplot2

I use ggsurvplot to generate survival plots and ran into a problem when the Surv() call in my fit treats more than one value as censored.
This is the data link: testdata
My fit call and my ggsurvplot call:
library(survival)
library(survminer)
cnsr_values <- c(1,2)
fit <- surv_fit(Surv(AVAL, !CNSR %in% cnsr_values) ~ TRTKEY, data = testdata)
myplot <- ggsurvplot(fit,data = testdata,conf.int = FALSE)
When cnsr_values contains a single number, for example 1, it works. When cnsr_values contains two numbers, for example cnsr_values = c(1, 2), it returns an error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1169, 2
But if I put c(1,2) in the fit function as
fit <- surv_fit(Surv(AVAL, !CNSR %in% c(1,2)) ~ TRTKEY, data = testdata)
then it also works.
I have compared the two fit objects and cannot find any differences, and the data passed to ggsurvplot is the same in both cases.
However, something must be causing the error.
I also tried passing testdata[1:1168,] to ggsurvplot, and that works.
Can anyone please help?
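A possible explanation, offered as an assumption about survminer's internals rather than a confirmed diagnosis: if ggsurvplot collects every variable named in the formula (including cnsr_values) into a data frame, a length-2 vector can only be recycled when the number of rows is a multiple of 2. That would match both the "1169, 2" message and the fact that testdata[1:1168,] works. The base-R sketch below reproduces the recycling rule and shows a hypothetical workaround that precomputes the event indicator as a column (the toy data and the EVENT column name are made up for illustration).

```r
cnsr_values <- c(1, 2)

# 1168 is a multiple of 2, so data.frame() silently recycles the short vector
ok <- data.frame(AVAL = seq_len(1168), cnsr_values = cnsr_values)

# 1169 is not a multiple of 2, so data.frame() refuses to recycle
err <- tryCatch(
  data.frame(AVAL = seq_len(1169), cnsr_values = cnsr_values),
  error = function(e) conditionMessage(e)
)
print(err)

# Hypothetical workaround: store the event indicator in the data itself,
# so the formula only refers to columns of the data frame
toy <- data.frame(
  AVAL   = c(5, 8, 12, 3),
  CNSR   = c(0, 1, 2, 0),
  TRTKEY = c("A", "A", "B", "B")
)
toy$EVENT <- !(toy$CNSR %in% cnsr_values)
print(toy)
```

With the indicator stored in the data, the fit formula would become Surv(AVAL, EVENT) ~ TRTKEY and no longer mentions cnsr_values.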

I'm having problems with the X axis when making an animated line chart

I'm trying to animate a plot I have where the X axis is non-numeric. The plot itself looks great, but I get a few error messages trying to animate it using the transition_reveal function.
I've got a data set called df100m that tracks the times/speeds of 10 meter splits of the 100 meter dash for various Olympic runners. It looks like this.
splits   runners   times(s)   speed(mph)
10-20    Bolt_08   1.070      21.93
20-30    Bolt_08   0.910      24.58
84 more rows of different splits and runners omitted for space.
Plotting the average speed for this data set using stat_smooth looks great. I removed the reaction time (RT), the final time (TOTAL), and the starting 10m (Start-10), so that it only shows the numeric splits. Here is the code for the plot I have so far:
df100m %>%
  filter(!grepl("RT", splits)) %>%
  filter(!grepl("TOTAL", splits)) %>%
  filter(!grepl("Start-10", splits)) %>%
  ggplot(mapping = aes(x = splits, y = speed, col = runner, group = runner)) +
  stat_smooth(method = loess, se = F, fullrange = F) +
  theme(axis.text.x = element_text(angle = 90)) +
  theme(aspect.ratio = 3/7) +
  theme_solarized_2(light=F)
However when I add +transition_reveal(~splits) I get the following error message:
Error in seq.default(range[1], range[2], length.out = nframes) :
'from' must be a finite number
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Playing around with it, I sometimes also get the "invalid 'times' argument" error.
I know there are a few problems with the X axis (splits): it is character rather than numeric, and the values contain a dash (-). I've seen a few posts that attempt to fix this error, but as a beginner I have not been able to apply them. Could someone point me in the right direction?
Using some minimal made-up data, here is one possible approach: compute the smoothed lines before plotting, then base transition_reveal on the splits converted to integers (as splits_int).
library(tidyverse)
library(gganimate)
library(broom)
tribble(~splits, ~speed, ~runner,
        "10-20", 20.0, "A",
        "20-30", 21.0, "A",
        "30-40", 22.0, "A",
        "10-20", 19.0, "B",
        "20-30", 20.0, "B",
        "30-40", 21.0, "B"
) %>%
  mutate(splits_int = factor(splits) %>% as.integer()) %>%
  nest(data = -runner) %>%
  mutate(
    lm_model = map(data, ~loess(speed ~ splits_int, data = .x)),
    augmented = map(lm_model, augment) %>% map(select, .fitted)
  ) %>%
  unnest(c(augmented, data)) %>%
  ggplot(aes(splits, .fitted, col = runner, group = runner)) +
  geom_line() +
  transition_reveal(splits_int)
Created on 2022-12-10 with reprex v2.0.2
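The factor-to-integer step above relies on the alphabetical order of the split labels happening to match their order along the track. As a hedged alternative, here is a small base-R sketch that derives the ordering from the number before the dash instead. The example labels and the split_start name are made up for illustration, and it assumes every label has the form "start-end".

```r
# Example labels, deliberately out of order
splits <- c("20-30", "10-20", "90-100", "30-40")

# Keep only the part before the dash and convert it to a number
split_start <- as.numeric(sub("-.*$", "", splits))

# Rank the starting distances to get consecutive integers for the animation
splits_int <- match(split_start, sort(unique(split_start)))

print(data.frame(splits, split_start, splits_int))
```

Either split_start or splits_int could then be passed to transition_reveal in place of the factor-based column.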

Is there a method for converting a wimids object to a mids object?

Suppose I create 10 multiply-imputed datasets and use the (wonderful) MatchThem package in R to create weights for my exposure variable. The MatchThem package takes a mids object and converts it to an object of the class wimids.
My desired output is a mids object, but with weights. I hope to pass this mids object to brms as follows:
library(brms)
m0 <- brm_multiple(Y|weights(weights) ~ A, data = mids_data)
Open to suggestions.
EDIT: Noah's solution below will unfortunately not work.
The package's first author, Farhad Pishgar, sent me the following elegant solution. It will create a mids object from a wimids object. Thank you Farhad!
library(mice)
library(MatchThem)
#"weighted.datasets" is our wimids object
#Extracting the original dataset with missing value
maindataset <- complete(weighted.datasets, action = 0)
#Some spit-and-polish
maindataset <- data.frame(.imp = 0, .id = seq_len(nrow(maindataset)), maindataset)
#Extracting imputed-weighted datasets in the long format
alldataset <- complete(weighted.datasets, action = "long")
#Binding them together
alldataset <- rbind(maindataset, alldataset)
#Converting to .mids
newmids <- as.mids(alldataset)
Additionally, for brms, I worked out this solution, which instead creates a list of data frames. It takes fewer steps.
library("mice")
library("dplyr")
library("MatchThem")
library("brms") # for bayesian estimation.
# Note, I realise that my approach here is not fully Bayesian, but that is a good thing! I need to ensure balance in the exposure.
# impute missing data
data("nhanes2")
imp <- mice(nhanes2, printFlag = FALSE, seed = 0, m = 10)
# MatchThem. This is just a fast method
w_imp <- weightthem(hyp ~ chl + age, data = imp,
approach = "within",
estimand = "ATE",
method = "ps")
# get individual data frames with weights
out <- complete(w_imp, action ="long", include = FALSE, mild = TRUE)
# assemble individual data frames into a list
m <- 10
listdat<- list()
for (i in 1:m) {
listdat[[i]] <- as.data.frame(out[[i]])
}
# pass the list to brms, and it runs as it should!
fit_1 <- brm_multiple(bmi|weights(weights) ~ age + hyp + chl,
                      data = listdat,
                      backend = "cmdstanr",
                      family = "gaussian",
                      # name the argument explicitly rather than relying on position
                      prior = set_prior('normal(0, 1)',
                                        class = 'b'))
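If you end up with a single long-format data frame instead (the stacked layout in which mice marks each imputation with an .imp column), a base-R split() gives the same kind of list without a loop. This is only a sketch on made-up data: the long object and its values are hypothetical stand-ins for the real output of complete().

```r
# Toy stand-in for long-format imputed data; .imp indexes the imputation
long <- data.frame(
  .imp    = rep(1:3, each = 2),
  .id     = rep(1:2, times = 3),
  bmi     = c(22.1, 27.4, 22.1, 26.9, 22.1, 27.0),
  weights = c(1.2, 0.8, 1.1, 0.9, 1.3, 0.7)
)

# One data frame per imputation, returned as a plain unnamed list
listdat <- unname(split(long, long$.imp))

length(listdat)
```

The resulting plain list of data frames is the shape that the data argument of brm_multiple() is given in the call above.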
brm_multiple() can take in a list of data frames for its data argument. You can produce this from the wimids object using complete(). The output of complete() with action = "all" is a mild object, which is a list of data frames, but this is not recognized by brm_multiple() as such. So, you can just convert it to a list. This should look like the following:
df_list <- complete(mids_data, "all")
class(df_list) <- "list"
m0 <- brm_multiple(Y|weights(weights) ~ A, data = df_list)
Using complete() automatically adds a weights column to the resulting imputed data frames.

stat_cor function in ggplot2: Print R and p-values on two lines

Is it possible to print the correlation value and the p-value on two lines instead of comma-separated, as is the default?
default:
R=0.8, p=0.004
want:
R=0.8
p=0.004
The stat_cor function is from the ggpubr library (not base ggplot2). Regardless, the documentation for the function has your answer, which is to use the label.sep= argument in stat_cor. You can set that to "\n" to add a new line character as a separation and get the label over two lines. See the example in the documentation with the adjustment:
library(ggplot2)
library(ggpubr)
data("mtcars")
df <- mtcars
df$cyl <- as.factor(df$cyl)
sp <- ggscatter(df, x = "wt", y = "mpg",
add = "reg.line", # Add regression line
add.params = list(color = "blue", fill = "lightgray"), # Customize reg. line
conf.int = TRUE # Add confidence interval
)
# Add correlation coefficient
sp + stat_cor(method = "pearson", label.x = 3, label.y = 30, label.sep='\n')

R dbWriteTable doesn't work with a column of all NA's and class not logical

I am trying to upload a table to SQL from R using dbWriteTable, but I am having issues because some of my columns contain only NAs. I've learned that if the column's class is logical then it works, but if it is anything else, it throws an error.
Does anybody have a solution? Which columns are all NA is random, so I can't just hard-code as.logical() on a particular column, and I can't figure out a way to do it using lapply. I also do not want to drop these columns.
Works
test <- data.frame(Name = c("Fred","Wilma","George"), Villians = c(2,4,3), Information = c(NA,NA,NA), stringsAsFactors = FALSE)
if (dbExistsTable(con, "test")) {dbRemoveTable(con, "test")}
dbWriteTable(con, name = "test", value = test, row.names = FALSE)
> sapply(test,class)
Name Villians Information
"character" "numeric" "logical"
Throws an error
test <- data.frame(Name = c("Fred","Wilma","George"), Villians = c(2,4,3), Information = c(NA,NA,NA), stringsAsFactors = FALSE)
test$Information <- as.character(test$Information)
if (dbExistsTable(con, "test")) {dbRemoveTable(con, "test")}
dbWriteTable(con, name = "test", value = test, row.names = FALSE)
Warning message:
In max(nchar(as.character(x)), na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
> sapply(test,class)
Name Villians Information
"character" "numeric" "character"
I am using a company server so if there is any configuration that needs to be made on that end, I may or may not be able to get it done.
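One possible approach, sketched in base R: detect the all-NA columns at run time and convert only those to logical before writing. Whether this avoids the problem on a particular driver is an assumption based on the observation above that logical all-NA columns upload fine. The all_na name is made up for illustration.

```r
test <- data.frame(
  Name = c("Fred", "Wilma", "George"),
  Villians = c(2, 4, 3),
  Information = as.character(c(NA, NA, NA)),
  stringsAsFactors = FALSE
)

# TRUE for every column in which all values are NA, whatever its class
all_na <- vapply(test, function(col) all(is.na(col)), logical(1))

# Convert only those columns to logical; the other columns are untouched
test[all_na] <- lapply(test[all_na], as.logical)

sapply(test, class)
```

After this step the data frame can be passed to dbWriteTable as in the working example.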

How to change colors in stat_summary()

I am trying to plot two columns of raw data (I have used melt to combine them into one data frame) and then add separate error bars for each. However, I want to make the raw data for each column one pair of colors and the error bars another set of colors, but I can't seem to get it to work. The plot I am getting is at the link below. I want to have different color pairs for the raw data and for the error bars. A simple reproducible example is coded below, for illustrative purposes.
dat2.m<-data.frame(obs=c(2,4,6,8,12,16,2,4,6),variable=c("raw","raw","raw","ip","raw","ip","raw","ip","ip"),value=runif(9,0,10))
c <- ggplot(dat2.m, aes(x=obs, y=value, color=variable,fill=variable,size = 0.02)) +geom_jitter(size=1.25) + scale_colour_manual(values = c("blue","Red"))
c<- c+stat_summary(fun.data="median_hilow",fun.args=list(conf.int=0.95),aes(color=variable), position="dodge",geom="errorbar", size=0.5,lty=1)
print(c)
[1]: http://i.stack.imgur.com/A5KHk.jpg
For the record: I think that this is a really, really bad idea. Unless you have a use case where this is crucial, I think you should re-examine your plan.
However, you can get around it by adding a new set of variables, padded with a space at the end. You will want/need to play around with the legends, but this should work (though it is definitely ugly):
dat2.m<- data.frame(obs=c(2,4,6,8,12,16,2,4,6),variable=c("raw","raw","raw","ip","raw","ip","raw","ip","ip"),value=runif(9,0,10))
c <- ggplot(dat2.m, aes(x=obs, y=value, color=variable,fill=variable,size = 0.02)) +geom_jitter(size=1.25) + scale_colour_manual(values = c("blue","Red","green","purple"))
c<- c+stat_summary(fun.data="median_hilow",fun.args=list(conf.int=0.95),aes(color=paste(variable," ")), position="dodge",geom="errorbar", size=0.5,lty=1)
print(c)
One way around this would be to use repetitive calls to geom_point and stat_summary. Use the data argument of those functions to feed subsets of your dataset into each call, and set the color attribute outside of aes(). It's repetitive and somewhat defeats the compactness of ggplot, but it'd do.
c <- ggplot(dat2.m, aes(x = obs, y = value, size = 0.02)) +
  geom_jitter(data = subset(dat2.m, variable == 'raw'), color = 'blue', size=1.25) +
  geom_jitter(data = subset(dat2.m, variable == 'ip'), color = 'red', size=1.25) +
  stat_summary(data = subset(dat2.m, variable == 'raw'), fun.data="median_hilow", fun.args=list(conf.int=0.95), color = 'pink', position="dodge",geom="errorbar", size=0.5,lty=1) +
  stat_summary(data = subset(dat2.m, variable == 'ip'), fun.data="median_hilow", fun.args=list(conf.int=0.95), color = 'green', position="dodge",geom="errorbar", size=0.5,lty=1)
print(c)