Automatically assigning p-value position in ggplot loop - ggplot2

I am running an mapply loop on a huge set of data to graph 13 parameters for 19 groups. This is working great except the p-value position. Due to the data varying for each plot I cannot assign position using label.y = 125 for example, in some plots it is in the middle of the bar/error bar. However, I can't assign it higher without having it way to high on other graphs. Is there a way to adjust to the data and error bars?
This is my graphing function and like I said the graph is great, except p-value position. Specifically, the stat compare means anova line.
ANOVA_plotter <- function(Variable, treatment, Grouping, df){
Inputdf <- df %>%
filter(Media == treatment, Group == Grouping) %>%
ggplot(aes_(x = ~ID, y = as.name(Variable))) +
geom_bar(aes(fill = ANOVA_Status), stat = "summary", fun = "mean", width = 0.9) +
stat_summary(geom = "errorbar", fun.data = "mean_sdl", fun.args = list(mult = 1), size = 1) +
labs(title = paste(Variable, "in", treatment, "in Group", Grouping, sep = " ")) +
theme(legend.position = "none",axis.title.x=element_blank(), axis.text = element_text(face="bold", size = 18 ), axis.text.x = element_text(angle = 45, hjust = 1)) +
stat_summary(geom = "errorbar", fun.data = "mean_sdl", fun.args = list(mult = 1), width = 0.2) +
stat_compare_means(method = "anova", label.y = 125) +
stat_compare_means(label = "p.signif", method = "t.test", paired = FALSE, ref.group = "Control")
}
I get graphs that look like this
(https://i.stack.imgur.com/hV9Ad.jpg)
But I can't assign it to label.y = 200 because of plots like this
(https://i.stack.imgur.com/uStez.jpg)

Related

Why are Y axis labels offset in ggplot?

The Y axis labels are offset
The Y axis labels from 100 to the top are aligned to the right, while from 90 to the bottom are aligned to the left. I've looked at many paramaters and I couldn't find one to be causing this. Also, I haven't found anyone else with this same issue.
Here's my code:
test <- ggplot(data,
aes(x=Month, y=Value, color=Name, group=Name, fill=Name))+
geom_line(size=3)+
geom_point(size=5)+
scale_color_manual(values=c("#FBE785","#0F5F00","#FFC300","#1BFFAA"))+
ylab ("")+
xlab ("")+
labs(caption = paste("Fonte: Fred e IBGE.")) +
scale_x_date(date_labels = "%b/%y", breaks = "6 month", expand=c(0,0))+
coord_cartesian(clip = "off")+
theme_minimal() +
guides(fill=guide_legend())+
theme(panel.background = element_rect(fill= "#122929",color = "#122929"),
plot.background = element_rect(fill = "#122929"),
panel.grid.major = element_line(color = "#4D4B55", size =0.1),
panel.grid.minor = element_line(color= "#4D4B55", size =0.1),
panel.grid = element_blank(),
axis.text.y = element_text(vjust = 1, hjust=-1),
axis.text.x = element_text(vjust = -1, hjust=0),
legend.title = element_blank(),
legend.position = "bottom",
legend.key.width = unit(1.5, "cm"),
plot.caption = element_text(family = "Abel",vjust = -1, hjust = 0,colour="#4D4B55", size= 30),
text = element_text(family = "Abel", color = "#4D4B55",size = 35),
plot.margin = margin(1,1,1.5,1.2, "cm"))
ggsave("./test.png", width = 21, height = 15, dpi = 300)
PS: Not sharing the data itself because I guess that's not where the problem is.
Thanks!
Your text is misaligned because of your axis.text.y argument. Change hjust to 1 and it will be properly aligned. I have provided a minimal reproducible example below.
library(tidyverse)
Month <- rep(x = month.abb, times = 10)
Value <- sample(x = 10:120, size = 120, replace = TRUE)
Name <- sample(x = LETTERS[1:4], size = 120, replace = TRUE)
data <- data.frame(Month, Value, Name)
ggplot(data, aes(x = Month, y = Value, color = Name, group = Name, fill = Name)) +
geom_line(size = 1) + geom_point(size = 2) +
theme(axis.text.y = element_text(vjust = 1, hjust = 1)) # <- problem here

Creating graph on ggplot2 with facte_wrap using multiple seperate variables

For my thesis I am using R-studio. I want to make a graph on ggplot2 with x= age(H2_lft) and y = IMT value (Mean_IMT_alg). I want to plot a graph with multiple variables(cardiovascular risk factors) to see the relationship between a certain variable/cardiovascular risk factor (e.g. smoking(H2_roken)/gender(H1_geslacht)/ethnicity(H1_EtnTotaal) and the IMT value on a certain age.
First, I plotted multiple lines (each line represented a variable) in a graph. But I think this is a little too messy. I actually want to have multiple 'pannels/graphs' with x= age and y = IMT value. And in every graph I want to have a different variable.
I hope my explanation is clear enough and someone can help me :)
My first code (multiple lines in same plot) is:
t <- ggplot(data = Dataset, aes(x = H2_lft, y = MeanIMT_alg)) +
geom_smooth(se = FALSE, aes(group = H1_EtnTotaal, colour = H1_EtnTotaal)) +
geom_smooth(se = FALSE, aes(group = H2_Roken, colour = H2_Roken)) +
geom_smooth(se = FALSE, aes(group = H1_geslacht, colour = H1_geslacht)) +
stat_smooth(method = lm, se=FALSE) +
theme_classic()
t + labs(x = "Age (years)", y = "Mean IMT (mm)", title ="IMT", caption = "Figure 2: mean IMT", color = "cardiovascular risk factors", fil = "cardiovascular risk factors")
To accomplish multiple panels i used 'facet_wrap'. The problem however is that when using 'groups' in facet_Wrap, R makes groups that proceed on each other. But i want the groups to be unrelated of eachother. For example: I want one graph with a line for Marroccan ethnicity, one line with current smoking and one line with Male participants. I do not want a graph with: morroccan women that currently smoke or: Dutch men that never smoked. So, I want the graph with all the lines but split into several graphs.
The code that I used to accomplish this is:
t <- ggplot(data = Dataset, aes(x = H2_lft, y = MeanIMT_alg)) +
geom_smooth(se = FALSE, aes(group = H1_EtnTotaal, colour = H1_EtnTotaal)) +
geom_smooth(se = FALSE, aes(group = H2_Roken, colour = H2_Roken)) +
geom_smooth(se = FALSE, aes(group = H1_geslacht, colour = H1_geslacht)) +
stat_smooth(method = lm, se=FALSE)+
facet_wrap(~H1_EtnTotaal + ~H2_Roken + ~H1_geslacht, scales = "free_y") +
theme_classic()
t + labs(x = "Age (years)", y = "Mean IMT (mm)", title ="IMT", caption = "Figure 2: mean IMT", color = "cardiovascular risk factors", fil = "cardiovascular risk factors")
I think it might be generally easier to reshape the data to a long format for plotting with ggplot2. If you want seperate legends for each of the categories, you can use the {ggnewscale} package to do so. Is this (approximately) what you're looking for?
library(ggnewscale)
library(ggplot2)
# Dummy data
Dataset <- data.frame(
H2_lft = runif(100, 18, 90),
MeanIMT_alg = rnorm(100),
H1_EtnTotaal = sample(LETTERS[1:5], 100, replace = TRUE),
H2_Roken = sample(LETTERS[6:8], 100, replace = TRUE),
H1_geslacht = sample(c("M", "F"), 100, replace = TRUE)
)
# Reshape data to long format
new <- tidyr::pivot_longer(Dataset, c(H1_EtnTotaal, H2_Roken, H1_geslacht))
ggplot(new, aes(H2_lft, MeanIMT_alg, group = value)) +
geom_smooth(
data = ~ subset(.x, name == "H1_EtnTotaal"),
aes(colour = value),
se = FALSE
) +
scale_colour_discrete(name = "EtnTotaal") +
new_scale_colour() +
geom_smooth(
data = ~ subset(.x, name == "H1_geslacht"),
aes(colour = value),
se = FALSE
) +
scale_colour_discrete(name = "geslacht") +
new_scale_colour() +
geom_smooth(
data = ~ subset(.x, name == "H2_Roken"),
aes(colour = value),
se = FALSE
) +
scale_colour_discrete(name = "Roken") +
geom_smooth(
method = lm, se = FALSE,
aes(group = NULL)
) +
facet_wrap(~ name)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
Created on 2022-11-04 by the reprex package (v2.0.0)

Start ggplot continuous axis with a squiggly line break? [duplicate]

I have a dataframe (dat) with two columns 1) Month and 2) Value. I would like to highlight that the x-axis is not continuous in my boxplot by interrupting the x-axis with two angled lines on the x-axis that are empty between the angled lines.
Example Data and Boxplot
library(ggplot2)
set.seed(321)
dat <- data.frame(matrix(ncol = 2, nrow = 18))
x <- c("Month", "Value")
colnames(dat) <- x
dat$Month <- rep(c(1,2,3,10,11,12),3)
dat$Value <- rnorm(18,20,2)
ggplot(data = dat, aes(x = factor(Month), y = Value)) +
geom_boxplot() +
labs(x = "Month") +
theme_bw() +
theme(panel.grid = element_blank(),
text = element_text(size = 16),
axis.text.x = element_text(size = 14, color = "black"),
axis.text.y = element_text(size = 14, color = "black"))
The ideal figure would look something like below. How can I make this discontinuous axis in ggplot?
You could make use of the extended axis guides in the ggh4x package. Alas, you won't easily be able to create the "separators" without a hack similar to the one suggested by user Zhiqiang Wang
guide_axis_truncated accepts vectors to define lower and upper trunks. This also works for units, by the way, then you have to pass the vector inside the unit function (e.g., trunc_lower = unit(c(0,.45), "npc") !
library(ggplot2)
library(ggh4x)
set.seed(321)
dat <- data.frame(matrix(ncol = 2, nrow = 18))
x <- c("Month", "Value")
colnames(dat) <- x
dat$Month <- rep(c(1,2,3,10,11,12),3)
dat$Value <- rnorm(18,20,2)
# this is to make it slightly more programmatic
x1end <- 3.45
x2start <- 3.55
p <-
ggplot(data = dat, aes(x = factor(Month), y = Value)) +
geom_boxplot() +
labs(x = "Month") +
theme_classic() +
theme(axis.line = element_line(colour = "black"))
p +
guides(x = guide_axis_truncated(
trunc_lower = c(-Inf, x2start),
trunc_upper = c(x1end, Inf)
))
Created on 2021-11-01 by the reprex package (v2.0.1)
The below is taking user Zhiqiang Wang's hack a step further. You will see I am using simple trigonometry to calculate the segment coordinates. in order to make the angle actually look as it is defined in the function, you would need to set coord_equal.
# a simple function to help make the segments
add_separators <- function(x, y = 0, angle = 45, length = .1){
add_y <- length * sin(angle * pi/180)
add_x <- length * cos(angle * pi/180)
## making the list for your segments
myseg <- list(x = x - add_x, xend = x + add_x,
y = rep(y - add_y, length(x)), yend = rep(y + add_y, length(x)))
## this function returns an annotate layer with your segment coordinates
annotate("segment",
x = myseg$x, xend = myseg$xend,
y = myseg$y, yend = myseg$yend)
}
# you will need to set limits for correct positioning of your separators
# I chose 0.05 because this is the expand factor by default
y_sep <- min(dat$Value) -0.05*(min(dat$Value))
p +
guides(x = guide_axis_truncated(
trunc_lower = c(-Inf, x2start),
trunc_upper = c(x1end, Inf)
)) +
add_separators(x = c(x1end, x2start), y = y_sep, angle = 70) +
# you need to set expand to 0
scale_y_continuous(expand = c(0,0)) +
## to make the angle look like specified, you would need to use coord_equal()
coord_cartesian(clip = "off", ylim = c(y_sep, NA))
I think it is possible to get what you want. It may take some work.
Here is your graph:
library(ggplot2)
set.seed(321)
dat <- data.frame(matrix(ncol = 2, nrow = 18))
x <- c("Month", "Value")
colnames(dat) <- x
dat$Month <- rep(c(1,2,3,10,11,12),3)
dat$Value <- rnorm(18,20,2)
p <- ggplot(data = dat, aes(x = factor(Month), y = Value)) +
geom_boxplot() +
labs(x = "Month") +
theme_bw() +
theme(panel.grid = element_blank(),
text = element_text(size = 16),
axis.text.x = element_text(size = 14, color = "black"),
axis.text.y = element_text(size = 14, color = "black"))
Here is my effort:
p + annotate("segment", x = c(3.3, 3.5), xend = c(3.6, 3.8), y = c(14, 14), yend = c(15, 15))+
coord_cartesian(clip = "off", ylim = c(15, 25))
Get something like this:
If you want to go further, it may take several tries to get it right:
p + annotate("segment", x = c(3.3, 3.5), xend = c(3.6, 3.8), y = c(14, 14), yend = c(15, 15))+
annotate("segment", x = c(0, 3.65), xend = c(3.45, 7), y = c(14.55, 14.55), yend = c(14.55, 14.55)) +
coord_cartesian(clip = "off", ylim = c(15, 25)) +
theme_classic()+
theme(axis.line.x = element_blank())
Just replace axis with two new lines. This is a rough idea, it may take some time to make it perfect.
You could use facet_wrap. If you assign the first 3 months to one group, and the other months to another, then you can produce two plots that are side by side and use a single y axis.
It's not exactly what you want, but it will show the data effectively, and highlights the fact that the x axis is not continuous.
dat$group[dat$Month %in% c("1", "2", "3")] <- 1
dat$group[dat$Month %in% c("10", "11", "12")] <- 2
ggplot(data = dat, aes(x = factor(Month), y = Value)) +
geom_boxplot() +
labs(x = "Month") +
theme_bw() +
theme(panel.grid = element_blank(),
text = element_text(size = 16),
axis.text.x = element_text(size = 14, color = "black"),
axis.text.y = element_text(size = 14, color = "black")) +
facet_wrap(~group, scales = "free_x")
* Differences in the plot are likely due to using different versions of R where the set.seed gives different result

Colours don't show up in bar plot using ggplot2

When I try to assign colours using colour = "" or fill = """, the graph changes its colour always to the same colour (some kind of weird orange tone). The specific code I used is this:
Plot <- ggplot(data, aes(ymin = 0)) + geom_rect(aes(xmin = left, xmax = right, ymax = a, colour = "#FDFEFE")).
Has anyone had this problem before? It doesn`t seem to matter whether I use the colour names or the HTML codes, the result stays the same.
Thank you!
I just wanted to add some explanation because this also tripped me up when I started using ggplot.
Some toy data:
data <- tibble(a = 1:10, left = 1:10 * 20, right = 1:10 * 20 + 10)
As explained by #stefan, you need to set the hard-coded color outside of the aestetics:
ggplot(data, aes(ymin = 0)) +
geom_rect(aes(xmin = left, xmax = right, ymax = a), fill = "#FDFEFE")
The aestetics are meant to link the plot to your data table. When you write this:
ggplot(data, aes(ymin = 0)) +
geom_rect(aes(xmin = left, xmax = right, ymax = a, fill = "#FDFEFE"))
It is like having the following:
data <- tibble(a = 1:10, left = 1:10 * 20, right = 1:10 * 20 + 10, col = "#FDFEFE")
ggplot(data, aes(ymin = 0)) +
geom_rect(aes(xmin = left, xmax = right, ymax = a, fill = col))
Except that ggplot understands "#FDFEFE" as a categorical value, not as a color. Having "#FDFEFE" or "banana" is the same to ggplot:
ggplot(data, aes(ymin = 0)) +
geom_rect(aes(xmin = left, xmax = right, ymax = a, fill = "banana"))
or
data <- tibble(a = 1:10, left = 1:10 * 20, right = 1:10 * 20 + 10, col = "banana")
ggplot(data, aes(ymin = 0)) +
geom_rect(aes(xmin = left, xmax = right, ymax = a, fill = col))
assign default colours to the categorical data.
As an extra, if you want to assign specific colors to different entries in the table, it is best to use a scale_*_manual layer:
data <- tibble(a = 1:10, left = 1:10 * 20, right = 1:10 * 20 + 10,
col = sample(c("banana", "orange", "coconut"), 10, replace = T))
ggplot(data, aes(ymin = 0)) +
geom_rect(aes(xmin = left, xmax = right, ymax = a, fill = col)) +
scale_fill_manual(values = c("banana" = "yellow2", "orange" = "orange2", "coconut" = "burlywood4"))
If you wanted to hard-code the colors in the table, you would have to use this column outside to the aestetics:
data <- tibble(a = 1:10, left = 1:10 * 20, right = 1:10 * 20 + 10,
col = sample(c("yellow2", "orange2", "burlywood4"), 10, replace = T))
ggplot(data, aes(ymin = 0)) +
geom_rect(aes(xmin = left, xmax = right, ymax = a), fill = data$col)
But it is best to use meaningful categorical values and assign the colors in the scale layer. This is what the grammar of graphics is all about!

geom_vline() dateRangeInput()

I have set up a line graph in shiny. The x axis has dates covering 2014 to current date.
I have set up various vertical lines using geom_vline() to highlight points in the data.
I'm using dateRangeInput() so the user can choose the start/end date range to look at on the graph.
One of my vertical lines is in Feb 2014. If the user uses the dateRangeInput() to look at dates from say Jan 2016 the vertical line for Feb 2014 is still showing on the graph. This is also causing the x axis to go from 2014 even though the data line goes from Jan 2016 to current date.
Is there a way to stop this vertical line showing on the graph when it's outside of the dataRangeInput()? Maybe there's an argument in geom_vline() to deal with this?
library(shiny)
library(tidyr)
library(dplyr)
library(ggplot2)
d <- seq(as.Date("2014-01-01"),Sys.Date(),by="day")
df <- data.frame(date = d , number = seq(1,length(d),by=1))
lines <- data.frame(x = as.Date(c("2014-02-07","2017-10-31", "2017-08-01")),
y = c(2500,5000,7500),
lbl = c("label 1", "label 2", "label 3"))
#UI
ui <- fluidPage(
#date range select:
dateRangeInput(inputId = "date", label = "choose date range",
start = min(df$date), end = max(df$date),
min = min(df$date), max = max(df$date)),
#graph:
plotOutput("line")
)
#SERVER:
server <- function(input, output) {
data <- reactive({ subset(df, date >= input$date[1] & date <= input$date[2])
})
#graph:
output$line <- renderPlot({
my_graph <- ggplot(data(), aes(date, number )) + geom_line() +
geom_vline(data = lines, aes(xintercept = x, color = factor(x) )) +
geom_label(data = lines, aes(x = x, y = y,
label = lbl, colour = factor(x),
fontface = "bold" )) +
scale_color_manual(values=c("#CC0000", "#6699FF", "#99FF66")) +
guides(colour = "none", size = "none")
return(my_graph)
})
}
shinyApp(ui = ui, server = server)
As mentioned by Aimée in a different thread:
In a nutshell, ggplot2 will always plot all of the data that you provide and the axis limits are based on that unless you specify otherwise. So because you are telling it to plot the line & label, they will appear on the plot even though the rest of the data doesn't extend that far.
You can resolve this by telling ggplot2 what you want the limits of your x axis to be, using the coord_cartesian function.
# Set the upper and lower limit for the x axis
dateRange <- c(input$date[1], input$date[2])
my_graph <- ggplot(df, aes(date, number)) + geom_line() +
geom_vline(data = lines, aes(xintercept = x, color = factor(x) )) +
geom_label(data = lines, aes(x = x, y = y,
label = lbl, colour = factor(x),
fontface = "bold" )) +
scale_color_manual(values=c("#CC0000", "#6699FF", "#99FF66")) +
guides(colour = "none", size = "none") +
coord_cartesian(xlim = dateRange)