How to replace all spaces in a data frame with underscores?

Example_df <- tibble(
Site = c("Oyster Bed", "Rock Harbour"),
Species = c("Rhizophora mangle", "Avecinnia germinans"),
Colour = c("Red", "Black"))
I would like to change it to this for the whole data set:
Example_df <- tibble(
Site = c("Oyster_Bed", "Rock_Harbour"),
Species = c("Rhizophora_mangle", "Avecinnia_germinans"),
Colour = c("Red", "Black"))
Replace the spaces with underscores
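A minimal sketch of one way to do this, assuming the dplyr and stringr packages are available and that only character columns should be changed (factor columns would need their levels edited instead):

```r
library(dplyr)
library(stringr)

Example_df <- tibble(
  Site = c("Oyster Bed", "Rock Harbour"),
  Species = c("Rhizophora mangle", "Avecinnia germinans"),
  Colour = c("Red", "Black"))

# Replace every space with an underscore in all character columns
Example_df <- Example_df %>%
  mutate(across(where(is.character), ~ str_replace_all(.x, " ", "_")))
```

Columns without spaces, such as Colour, pass through unchanged.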

Related

plot gam results with original x values (not scaled and centred)

I have a dataset that I am modeling with a gam. Because there are two continuous variables in the gam, I centred and scaled them before adding them to the model. As a result, when I use the built-in features in gratia to show the results, the x values are on the scaled axis rather than the original one. I'd like to plot the results using the scale of the original data.
An example:
library(tidyverse)
library(mgcv)
library(gratia)
set.seed(42)
df <- data.frame(
doy = sample.int(90, 300, replace = TRUE),
year = sample(c(1980:2020), size = 300, replace = TRUE),
site = c(rep("A", 150), rep("B", 80), rep("C", 70)),
sex = sample(c("F", "M"), size = 300, replace = TRUE),
mass = rnorm(300, mean = 500, sd = 50)) %>%
mutate(doy.s = scale(doy, center = TRUE, scale = TRUE),
year.s = scale(year, center = TRUE, scale = TRUE),
across(c(sex, site), as.factor))
m1 <- gam(mass ~
s(year.s, site, bs = "fs", by = sex, k = 5) +
s(doy.s, site, bs = "fs", by = sex, k = 5) +
s(sex, bs = "re"),
data = df, method = "REML", family = gaussian)
draw(m1)
How do I re-plot the last two panels in this figure to show the relationship between year and mass with ggplot?
You can't do this with gratia::draw automatically (unless I'm mistaken).* But you can use gratia::smooth_estimates to get a dataframe which you can then do whatever you like with.
To answer your specific question: to re-plot the last two panels of the plot you provided, but with year unscaled, you can do the following
# Get a tibble of smooth estimates from the model
sm <- gratia::smooth_estimates(m1)
# Add a new column for the unscaled year
sm <- sm %>% mutate(year = mean(df$year) + (year.s * sd(df$year)))
# Plot the smooth s(year.s,site) for sex=F with year unscaled
pF <- sm %>% filter(smooth == "s(year.s,site):sexF" ) %>%
ggplot(aes(x = year, y = est, color=site)) +
geom_line() +
theme(legend.position = "none") +
labs(y = "Partial effect", title = "s(year.s,site)", subtitle = "By: sex; F")
# Plot the smooth s(year.s,site) for sex=M with year unscaled
pM <- sm %>% filter(smooth == "s(year.s,site):sexM" ) %>%
ggplot(aes(x = year, y = est, color=site)) +
geom_line() +
theme(legend.position = "none") +
labs(y = "Partial effect", title = "s(year.s,site)", subtitle = "By: sex; M")
library(patchwork) # use `patchwork` just for easy side-by-side plots
pF + pM
to get:
EDIT: If you also want to shift the result on the y-axis, as @GavinSimpson (the author and maintainer of gratia) mentioned, you can do this with add_constant by adding this code before the plotting code above:
sm <- sm %>%
add_constant(coef(m1)["(Intercept)"]) %>%
transform_fun(inv_link(m1))
[You should also in general untransform the smooth by the inverse of the model's link function. In your case this is just the identity, so it is not necessary, but in general it would be. That's what the second step above is doing.]
In your example, this results in:
*As mentioned in the custom-plotting vignette for gratia, the goal of draw is not to be fully customizable, but just to be a useful default. See there for recommendations about custom plots.
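The back-transformation used for year above (mean plus scaled value times standard deviation) can be checked in isolation. The sketch below uses only base R; the names centre, spread and year_back are illustrative. It reads the constants from the attributes that scale() stores instead of recomputing them:

```r
set.seed(42)
year <- sample(1980:2020, size = 300, replace = TRUE)

# scale() returns a one-column matrix that carries the centring and
# scaling constants as attributes
year.s <- scale(year, center = TRUE, scale = TRUE)
centre <- attr(year.s, "scaled:center")
spread <- attr(year.s, "scaled:scale")

# Back-transform: original = centre + scaled * spread
year_back <- centre + as.numeric(year.s) * spread

all.equal(year_back, as.numeric(year))
```

Reading the attributes avoids a mismatch if the data frame is filtered after scaling, since mean(df$year) and sd(df$year) would then no longer match the constants used in the model.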

ggplot2: combining line and barplot in one graph

Let's say I'm creating the grouped barplot by something like this:
data <- data.frame(time = factor(1:3), type = LETTERS[1:4], values = runif(24)*10)
ggplot(data, aes(x = type, y = values, fill = time)) +
stat_summary(fun=mean, geom='bar', width=0.55, size = 1, position=position_dodge(0.75))
Within each type, I want to connect all the bar tops with a line (that is, connect the 3 bars for A, the 3 bars for B, and so on).
I'd like to get something like this as a result:
Is there a way to do that?
Thank you!
I changed the code to follow a logic I prefer: prepare (summarise) the data first, then pass it to ggplot().
Code
library(dplyr)
library(ggplot2)
data <- data.frame(time = factor(1:3), type = LETTERS[1:4], values = runif(24)*10)
pdata <- data %>% group_by(type,time) %>% summarise(values = mean(values,na.rm = TRUE)) %>% ungroup()
pdata %>%
ggplot(aes(x = type, y = values)) +
geom_col(
mapping = aes(fill = time, group = time),
width = 0.55,
size = 1,
position = position_dodge(0.75)
)+
geom_line(
mapping = aes(group = type),
size = 1,
position = position_dodge2(.75)
)
Output
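An alternative sketch in the same "prepare the data first" spirit computes the x position of each bar centre explicitly and draws the line through those points. It assumes that position_dodge(width) places bar i of n at an offset of width * ((i - 0.5) / n - 0.5) from the group centre. That is an assumption about ggplot2's dodging, not something stated in the answer above. The names dodge_width and x_line are hypothetical.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
data <- data.frame(time = factor(1:3), type = LETTERS[1:4], values = runif(24) * 10)

dodge_width <- 0.75

pdata <- data %>%
  group_by(type, time) %>%
  summarise(values = mean(values, na.rm = TRUE), .groups = "drop") %>%
  mutate(
    # Assumed centre of each dodged bar: group position plus a symmetric offset
    x_line = as.numeric(factor(type)) +
      dodge_width * ((as.numeric(time) - 0.5) / nlevels(time) - 0.5)
  )

p <- ggplot(pdata, aes(x = type, y = values)) +
  geom_col(aes(fill = time), width = 0.55, position = position_dodge(dodge_width)) +
  # The line layer uses the precomputed numeric positions, one line per type
  geom_line(aes(x = x_line, group = type))
```

Because the line positions are ordinary columns in pdata, they can be inspected or adjusted before plotting.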

Add space argument to facet_wrap

facet_wrap() is known to lack a space = "free" argument (https://github.com/tidyverse/ggplot2/issues/2933). This can cause spacing issues on the y-axis of plots.
Create the above figure using the following code:
library(tidyverse)
p <-
mtcars %>%
rownames_to_column() %>%
ggplot(aes(x = disp, y = rowname)) + geom_point() +
facet_wrap(~ carb, ncol = 1, scales = "free_y")
facet_grid(), on the other hand, has a space = "free" argument, which allows for nice y-axis spacing.
Create the above figure using the following code:
p <-
mtcars %>%
rownames_to_column() %>%
ggplot(aes(x = disp, y = rowname)) + geom_point() +
facet_grid(carb ~ ., scales = "free_y", space = "free_y")
The issue with this is that the label is on the side, not the top. I sometimes have longer facet labels and few rows in the facet. This means the facet label gets cut off.
There is a solution from the ggforce package (comment by ilarischeinin on https://github.com/tidyverse/ggplot2/issues/2933).
p <-
mtcars %>%
rownames_to_column() %>%
ggplot(aes(x = disp, y = rowname)) + geom_point()
p + ggforce::facet_col(vars(carb), scales = "free_y", space = "free")
But there are limitations to leaving ggplot2. For example, I ultimately want a two-column figure, and this does not seem possible with ggforce. Is there any way to produce the same result using facet_wrap() so that I can use the ncol argument?
Here is a potential workaround based on https://stackoverflow.com/a/29022188/12957340 :
library(tidyverse)
library(gtable)
library(grid)
p1 <- mtcars %>%
rownames_to_column() %>%
ggplot(aes(x = disp, y = rowname)) + geom_point() +
facet_grid(carb ~ ., scales = "free_y", space = "free_y") +
theme(panel.spacing = unit(1, 'lines'),
strip.text.y = element_text(angle = 0))
gt <- ggplotGrob(p1)
panels <-c(subset(gt$layout, grepl("panel", gt$layout$name), se=t:r))
for(i in rev(panels$t-1)) {
gt = gtable_add_rows(gt, unit(0.5, "lines"), i)
}
panels <-c(subset(gt$layout, grepl("panel", gt$layout$name), se=t:r))
strips <- c(subset(gt$layout, grepl("strip-r", gt$layout$name), se=t:r))
stripText = gtable_filter(gt, "strip-r")
for(i in 1:length(strips$t)) {
gt = gtable_add_grob(gt, stripText$grobs[[i]]$grobs[[1]], t=panels$t[i]-1, l=5)
}
gt = gt[,-6]
for(i in panels$t) {
gt$heights[i-1] = unit(0.8, "lines")
gt$heights[i-2] = unit(0.2, "lines")
}
grid.newpage()
grid.draw(gt)
Created on 2021-12-15 by the reprex package (v2.0.1)
It's not clear to me what you mean by "I ultimately want a two column figure", but if you can come up with an example to illustrate your 'ultimate' expected outcome I can try to adapt this approach and see if it will work or not.

How can I make the line appear besides the dots?

I tried to plot my data but I only get the points; even if I add "linetype" to geom_line, the line does not appear. Besides, my data set has other columns, called SD, SD.1 and SD.2, which hold standard deviation values I calculated previously; they appear as extra series at the bottom of the plot. I would like to remove them from the plot and show them as error bars on the lines instead.
library(tidyr)
library(ggplot2)
long_data <- tidyr::pivot_longer(
data=OD,
cols=-Days,
names_to="Strain",
values_to="OD")
ggplot(long_data, aes(x=Days, y=OD, color=Strain)) +
geom_line() + geom_point(shape=16, size=1.5) +
scale_color_manual(values=c("Wildtype"="darkorange2", "Winter"="cadetblue3", "Flagella_less"="olivedrab3"))+
labs(title="Growth curve",x="Days",y="OD750",color="Legend")+
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5,color="black",size=8),
axis.text.y=element_text(angle=0,hjust=1,vjust=0.5,color="black",size=8),
plot.title=element_text(hjust=0.5, size=13,face = "bold",margin = margin(t=0, r=10,b=10,l=10)),
axis.title.y =element_text(size=10, margin=margin(t=0,r=10,b=0,l=0)),
axis.title.x =element_text(size=10, margin=margin(t=10,r=10,b=0,l=0)),
axis.line = element_line(size = 0.5, linetype = "solid",colour = "black"))
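One possible approach, sketched with made-up data because the original OD data frame is not shown. The values below are hypothetical, and the pairing of SD with Wildtype, SD.1 with Winter and SD.2 with Flagella_less is an assumption. The idea is to pivot the measurements and the standard deviations separately, join them by day and strain, and draw the error bars with geom_errorbar(). Setting group = Strain is also worth trying for the missing line: when the x variable is a factor or character, geom_line() treats each x value as its own group and draws nothing unless a group is given.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for OD: three strain columns plus their SD columns
OD <- data.frame(
  Days = c(0, 2, 4, 6),
  Wildtype = c(0.10, 0.35, 0.80, 1.20),
  SD = c(0.01, 0.03, 0.05, 0.08),
  Winter = c(0.10, 0.30, 0.65, 1.00),
  SD.1 = c(0.01, 0.02, 0.06, 0.07),
  Flagella_less = c(0.10, 0.25, 0.55, 0.90),
  SD.2 = c(0.02, 0.02, 0.04, 0.06)
)

# Assumed mapping from each SD column to its strain
sd_lookup <- c(SD = "Wildtype", SD.1 = "Winter", SD.2 = "Flagella_less")

# Measurements in long format, without the SD columns
means <- OD %>%
  select(-starts_with("SD")) %>%
  pivot_longer(cols = -Days, names_to = "Strain", values_to = "OD")

# Standard deviations in long format, relabelled by strain
sds <- OD %>%
  select(Days, starts_with("SD")) %>%
  pivot_longer(cols = -Days, names_to = "sd_col", values_to = "SD") %>%
  mutate(Strain = unname(sd_lookup[sd_col])) %>%
  select(-sd_col)

long_data <- left_join(means, sds, by = c("Days", "Strain"))

p <- ggplot(long_data, aes(x = Days, y = OD, color = Strain, group = Strain)) +
  geom_line() +
  geom_point(shape = 16, size = 1.5) +
  geom_errorbar(aes(ymin = OD - SD, ymax = OD + SD), width = 0.2)
```

The colour scale, labels and theme settings from the original code can be added to p unchanged.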

ggplot - line ordering, one line on top of the other

Here is my example:
library(ggplot2)
library(tidyr)
forecast <- c(2,2,1,2,2,3,2,3,3,3,3)
actual <- c(2,2,1,2,2,3,2,3,2,2,1)
my_df <- data.frame(forecast = forecast, actual = actual)
my_df$seq_order <- as.factor(1:NROW(my_df))
my_df <-gather(my_df, "line_type", "value", -seq_order)
ggplot(data=my_df, aes(x=seq_order, y = value,
colour = line_type, group=line_type))+geom_line()+theme(legend.position="bottom")
Here is how it looks:
I would like the red line to be on top of the blue line everywhere they coincide. I tried scale_color_manual(values = c("forecast" = "red" ,"actual" = "blue")), but it did not work.
Change the factor level order. Don't forget to change the group too.
See this related thread for why I used scales::hue_pal() etc.
library(tidyverse)
forecast <- c(2,2,1,2,2,3,2,3,3,3,3)
actual <- c(2,2,1,2,2,3,2,3,2,2,1)
my_df <- data.frame(forecast = forecast, actual = actual, seq_order = 1:11)
my_df <-gather(my_df, line_type, value, -seq_order) %>% mutate(type = factor(line_type, levels = c('forecast','actual')))
ggplot(data=my_df, aes(x=seq_order, y = value,
colour = type, group = type)) +
geom_line()+
theme(legend.position="bottom") +
scale_color_manual(values = rev(scales::hue_pal()(2)))
Created on 2020-03-24 by the reprex package (v0.3.0)
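The reordering step can be looked at on its own. The sketch below assumes, as the answer implies, that ggplot2 draws groups in factor-level order, so the last level ends up on top. The names default_levels and cols are illustrative.

```r
library(scales)

line_type <- c("forecast", "actual", "forecast", "actual")

# Default factor levels are alphabetical: "actual" first, "forecast" second
default_levels <- levels(factor(line_type))

# Reordered as in the answer: "forecast" first, so "actual" is drawn last
type <- factor(line_type, levels = c("forecast", "actual"))

# Reverse the default two-colour palette so each series keeps the colour
# it had under the default (alphabetical) ordering
cols <- setNames(rev(hue_pal()(2)), levels(type))
```

Passing cols to scale_color_manual(values = cols) should give the same result as the unnamed reversed palette, with the colour assignment made explicit by name.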