ggplot facet different Y axis order based on value - ggplot2

I have a faceted plot wherein I'd like to have the Y-axis labels and the associated values appear in descending order of values (and thereby changing the order of the labels) for each facet. What I have is this, but the order of the labels (and the corresponding values) is the same for each facet.
ggplot(rf,
       aes(x = revenues,
           y = reorder(AgencyName, revenues))) +
  geom_point(stat = "identity",
             aes(color = AgencyName),
             show.legend = FALSE) +
  xlab(NULL) +
  ylab(NULL) +
  scale_x_continuous(labels = scales::comma) +
  facet_wrap(~year, ncol = 3, scales = "free_y") +
  theme_minimal()
Can someone point me to the solution?

The functions reorder_within and scale_*_reordered from the tidytext package might come in handy.
reorder_within recodes the values into a factor with strings in the form of "VARIABLE___WITHIN". This factor is ordered by the values in each group of WITHIN.
scale_*_reordered removes the "___WITHIN" suffix when plotting the axis labels.
Add scales = "free_y" in facet_wrap to make it work as expected.
Here is an example with generated data:
library(tidyverse)

# Generate data
df <- expand.grid(
  year = 2019:2021,
  group = paste("Group", toupper(letters[1:8]))
)
set.seed(123)
df$value <- rnorm(nrow(df), mean = 10, sd = 2)

# Reorder the groups within each year, then strip the suffix on the axis
df %>%
  mutate(group = tidytext::reorder_within(group, value, within = year)) %>%
  ggplot(aes(value, group)) +
  geom_point() +
  tidytext::scale_y_reordered() +
  facet_wrap(vars(year), scales = "free_y")
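If you would rather not depend on tidytext, the mechanism described above can be sketched by hand. This is only an illustration of the idea (combine the label with the facet variable, order the combined factor, strip the suffix on the axis). The column name group_year and the helper strip_suffix are made up for this example.

```r
library(ggplot2)

# Same generated data as above, with plain character labels
set.seed(123)
df <- expand.grid(
  year = 2019:2021,
  group = paste("Group", toupper(letters[1:8])),
  stringsAsFactors = FALSE
)
df$value <- rnorm(nrow(df), mean = 10, sd = 2)

# One label per (group, year) pair, e.g. "Group A___2019"
df$group_year <- paste(df$group, df$year, sep = "___")

# Order the combined factor by value. Each combined label occurs once,
# so within every facet the levels end up sorted by that facet's values.
df$group_year <- reorder(df$group_year, df$value)

# Hypothetical helper: drop the "___<year>" suffix when drawing axis labels
strip_suffix <- function(x) sub("___.+$", "", x)

p <- ggplot(df, aes(value, group_year)) +
  geom_point() +
  scale_y_discrete(labels = strip_suffix) +
  facet_wrap(vars(year), scales = "free_y") +
  ylab(NULL)
p
```

As with the tidytext version, scales = "free_y" is what lets each facet show only its own levels.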

Related

plot gam results with original x values (not scaled and centred)

I have a dataset that I am modeling with a gam. Because there are two continuous variables in the gam, I have centred and scaled these variables before adding them to the model. As a result, when I use the built-in features in gratia to show the results, the x values are on the scaled axis rather than the original one. I'd like to plot the results using the scale of the original data.
An example:
library(tidyverse)
library(mgcv)
library(gratia)
set.seed(42)
df <- data.frame(
  doy = sample.int(90, 300, replace = TRUE),
  year = sample(c(1980:2020), size = 300, replace = TRUE),
  site = c(rep("A", 150), rep("B", 80), rep("C", 70)),
  sex = sample(c("F", "M"), size = 300, replace = TRUE),
  mass = rnorm(300, mean = 500, sd = 50)) %>%
  mutate(doy.s = scale(doy, center = TRUE, scale = TRUE),
         year.s = scale(year, center = TRUE, scale = TRUE),
         across(c(sex, site), as.factor))

m1 <- gam(mass ~
            s(year.s, site, bs = "fs", by = sex, k = 5) +
            s(doy.s, site, bs = "fs", by = sex, k = 5) +
            s(sex, bs = "re"),
          data = df, method = "REML", family = gaussian)
draw(m1)
How do I re-plot the last two panels in this figure to show the relationship between year and mass with ggplot?
You can't do this with gratia::draw automatically (unless I'm mistaken).* But you can use gratia::smooth_estimates to get a dataframe which you can then do whatever you like with.
To answer your specific question: to re-plot the last two panels of the plot you provided, but with year unscaled, you can do the following:
# Get a tibble of smooth estimates from the model
sm <- gratia::smooth_estimates(m1)

# Add a new column for the unscaled year
sm <- sm %>% mutate(year = mean(df$year) + (year.s * sd(df$year)))

# Plot the smooth s(year.s,site) for sex=F with year unscaled
pF <- sm %>% filter(smooth == "s(year.s,site):sexF") %>%
  ggplot(aes(x = year, y = est, color = site)) +
  geom_line() +
  theme(legend.position = "none") +
  labs(y = "Partial effect", title = "s(year.s,site)", subtitle = "By: sex; F")

# Plot the smooth s(year.s,site) for sex=M with year unscaled
pM <- sm %>% filter(smooth == "s(year.s,site):sexM") %>%
  ggplot(aes(x = year, y = est, color = site)) +
  geom_line() +
  theme(legend.position = "none") +
  labs(y = "Partial effect", title = "s(year.s,site)", subtitle = "By: sex; M")

library(patchwork) # use `patchwork` just for easy side-by-side plots
pF + pM
to get:
EDIT: If you also want to shift the result on the y-axis, as @GavinSimpson (who is the author and maintainer of gratia) mentioned, you can do this with add_constant by adding this code before the plotting code above:
sm <- sm %>%
  add_constant(coef(m1)["(Intercept)"]) %>%
  transform_fun(inv_link(m1))
[You should also in general untransform the smooth by the inverse of the model's link function. In your case this is just the identity, so it is not necessary, but in general it would be. That's what the second step above is doing.]
In your example, this results in:
*As mentioned in the custom-plotting vignette for gratia, the goal of draw is not to be fully customizable, but just to provide a useful default. See there for recommendations about custom plots.
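As a sanity check on the back-transformation used for year above, here is a small base-R sketch. It relies on the "scaled:center" and "scaled:scale" attributes that scale() attaches to its result, which spares you from recomputing the mean and standard deviation by hand. The variable names are just for illustration.

```r
set.seed(42)
year <- sample(1980:2020, size = 300, replace = TRUE)

# scale() returns a one-column matrix and stores the centring and
# scaling constants it used as attributes
year_s <- scale(year, center = TRUE, scale = TRUE)
ctr <- attr(year_s, "scaled:center")  # equals mean(year)
scl <- attr(year_s, "scaled:scale")   # equals sd(year)

# Undo the transformation: original = scaled * sd + mean
year_back <- as.numeric(year_s) * scl + ctr

all.equal(year_back, as.numeric(year))
```

The same two constants can be applied to the year.s column returned by smooth_estimates, which is what the mutate() call above does with mean() and sd().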

How to add count (n) / summary statistics as a label to ggplot2 boxplots?

I am new to R and trying to add count labels to my boxplots, so the sample size per boxplot shows in the graph.
This is my code:
bp_east_EC <- total %>%
  filter(year %in% c(1977, 2020, 2021, 1992),
         sampletype == "groundwater",
         East == 1,
         #EB == 1,
         #N59 == 1,
         variable %in% c("EC_uS")) %>%
  ggplot(., aes(x = as.character(year), y = value, colour = as.factor(year))) +
  theme_ipsum() +
  ggtitle("Groundwater EC, eastern Curacao") +
  theme(plot.title = element_text(hjust = 0.5, size = 14)) +
  theme(legend.position = "none") +
  labs(x = "", y = "uS/cm") +
  geom_jitter(color = "grey", size = 0.4, alpha = 0.9) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 2) # shows mean; `fun` replaces `fun.y`, deprecated since ggplot2 3.3.0
I have googled a lot and tried different things (with annotate, with return functions, mtext, etc), but it keeps giving different errors. I think I am such a beginner I cannot figure out how to integrate such suggestions into my own code.
Does anybody have an idea what the best way would be for me to approach this?
I would create a new variable that contains the sample size per group and plot that number with geom_label. Since your example isn't fully reproducible, here is an example of adding counts/sample sizes to a boxplot using the iris dataset.
library(tidyverse)
data(iris)
# boxplot with no label
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot()

# boxplot with label
iris %>%
  group_by(Species) %>%
  mutate(count = n()) %>%
  mutate(mean = mean(Sepal.Length)) %>%
  ggplot(aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  geom_label(aes(label = count, y = mean + 0.75), # <- change this to move label up and down
             size = 4, position = position_dodge(width = 0.75)) +
  geom_jitter(alpha = 0.35, aes(color = Species)) +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 6)
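One variation you might consider: the code above draws the label once per row, so each box gets 50 identical labels stacked on top of each other. A possible alternative, sketched below, is to summarise to one row per group first and pass that small data frame to the label layer. The name label_df and the 0.3 offset are arbitrary choices for this example.

```r
library(dplyr)
library(ggplot2)

# One row per group: the sample size and a y position just above the maximum
label_df <- iris %>%
  group_by(Species) %>%
  summarise(n = n(),
            y = max(Sepal.Length) + 0.3,
            .groups = "drop")

p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  # The label layer uses its own data, so each label is drawn exactly once
  geom_text(data = label_df,
            aes(x = Species, y = y, label = paste0("n = ", n)),
            inherit.aes = FALSE) +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 4)
p
```

For your own data you would group by the same variable you put on the x-axis (year, in your plot) after applying the filter.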

How to assign unique point colors in faceted ggplot

I am making a facet plot and trying to select the color I want for each facet (darkolivegreen and darkolivegreen3 in this case).
I need to plot temperature versus latitude, faceted by year.
library(tidyverse)
d.f. <- data.frame( year = c(rep(2013, 10),rep(2015,10)), temperature = rep(c(1:10),2), lat = rep(c(70:79),2))
ggplot(data = d.f., aes(x = temperature, y = lat)) +
  geom_point(show.legend = FALSE, size = 3) +
  facet_wrap(~year) +
  xlab("Surface Temperature") +
  ylab("Latitude")
I am sure it is a simple fix, and perhaps I'm tired this Friday but I cannot figure out how to make the points in each facet different colors of my choosing. When I do color = year in the aes() argument that makes them unique colors, but automatically picked.
By default, year will be a numeric vector, so you'll need to coerce year to a factor. Then you can configure the color scale manually:
ggplot(data = d.f., aes(x = temperature, y = lat, color = factor(year))) +
  geom_point(show.legend = FALSE, size = 3) +
  scale_color_manual(values = c("darkolivegreen", "darkolivegreen3")) +
  facet_wrap(~year) +
  xlab("Surface Temperature") +
  ylab("Latitude")
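If you want the mapping from year to color to be explicit rather than dependent on the order of the factor levels, scale_color_manual also accepts a named vector. A small sketch with the same data (the names d and year_cols are just for this example):

```r
library(ggplot2)

d <- data.frame(
  year = rep(c(2013, 2015), each = 10),
  temperature = rep(1:10, 2),
  lat = rep(70:79, 2)
)

# Names must match the factor levels, i.e. the years as character strings
year_cols <- c("2013" = "darkolivegreen", "2015" = "darkolivegreen3")

p <- ggplot(d, aes(x = temperature, y = lat, color = factor(year))) +
  geom_point(show.legend = FALSE, size = 3) +
  scale_color_manual(values = year_cols) +
  facet_wrap(~year) +
  xlab("Surface Temperature") +
  ylab("Latitude")
p
```

With a named vector, adding or reordering years later does not silently shift which color goes with which facet.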

Spacing between x-axis groups bigger than within group spacing geom_col

I am trying to get double the space between the groups Automatic and Manual on the x-axis compared to the spaces within these groups. I am using geom_col() and have experimented with different arguments, such as position_dodge, width and preserve = "single", but I can't get this to work. What I am aiming for is a graph like the one I have added as an image.
library(ggplot2)
library(ggthemes)
library(plyr)
#dataset
df <- mtcars
df$cyl <- as.factor(df$cyl)
df$am <- as.factor(df$am)
df$am <- revalue(df$am, c("0"="Automatic", "1"="Manual"))
ggplot(df, aes(fill = cyl, x = am, y = mpg)) +
  geom_col(position = position_dodge(width = 0.9)) +
  theme_bw()
Try using a combination of position=position_dodge(width=...) and width=...
For example:
ggplot(df, aes(fill = cyl, x = am, y = mpg)) +
  geom_col(position = position_dodge(width = 0.9), width = 0.8) +
  theme_bw()
The width argument of geom_col() controls the displayed width of the bars, while the width argument of position_dodge() controls the space that is reserved for each group of bars.
The difference between the two values determines the space between bars within a group, while 1 minus the position_dodge() width gives, roughly, the space between the groups.
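To put numbers on this, here is a rough sketch of the arithmetic. It assumes that dodging splits both the reserved width and the bar width evenly among the n bars at each x position, and that neighbouring x positions are 1 unit apart. That is my understanding of the layout, not something guaranteed by the explanation above. dodge_gaps is a made-up helper, not a ggplot2 function.

```r
# Hypothetical helper: approximate gaps (in x-axis units) for dodged bars
dodge_gaps <- function(dodge_width, bar_width, n_bars) {
  slot <- dodge_width / n_bars   # space reserved per bar
  bar  <- bar_width / n_bars     # drawn width per bar
  within  <- slot - bar          # gap between neighbouring bars in a group
  between <- 1 - dodge_width + within  # gap between adjacent groups
  c(within = within, between = between)
}

# Values from the example above: three cyl levels per am group
dodge_gaps(dodge_width = 0.9, bar_width = 0.8, n_bars = 3)

# Under these assumptions, between == 2 * within requires
# (dodge_width - bar_width) / n_bars == 1 - dodge_width,
# for example dodge_width = 0.9 and bar_width = 0.6 with three bars
dodge_gaps(dodge_width = 0.9, bar_width = 0.6, n_bars = 3)
```

If the numbers look right for your case, plug the two widths into position_dodge(width = ...) and geom_col(width = ...) and check the result visually, since the assumptions here are a simplification.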

Using 2nd variable to label axis ticks in ggplot2 facets

I am attempting to make a faceted t-chart using ggplot2, where the x-axis represents a sequence of events and the y-axis represents the number of days between those events. The x-axis should be labelled with the event date, but it is not a time series, since the distance between x-axis ticks should be uniform regardless of the real time between events.
Adding a faceting layer has been confusing me. Here's some sample data:
df <- data.frame(EventDT = as.POSIXct(c("2014-11-22 07:41:00", "2015-02-24 08:10:00",
                                        "2015-06-10 13:54:00", "2015-07-11 02:43:00",
                                        "2015-08-31 19:08:00", "2014-11-18 14:06:00",
                                        "2015-06-09 23:10:00", "2016-02-29 07:55:00",
                                        "2016-05-22 04:30:00", "2016-05-25 21:46:00",
                                        "2014-12-22 16:19:00", "2015-05-13 16:38:00",
                                        "2015-06-01 09:05:00", "2016-02-21 02:30:00",
                                        "2016-05-13 01:36:00")),
                 EventNBR = rep(1:5, 3),
                 Group = c(rep('A', 5), rep('B', 5), rep('C', 5)),
                 Y = c(15.818750, 94.020139, 106.238889, 30.534028, 51.684028,
                       187.670139, 203.377778, 264.364583, 82.857639, 3.719444,
                       169.829861, 142.013194, 18.685417, 264.725694, 81.962500))
Ignoring the date of the event, I can produce this:
g <- ggplot(df, aes(x = EventNBR, y = Y)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ Group, scales = 'free_x')
Plot should show EventDT along X-axis, not EventNBR
I have tried to use the labels parameter to scale_x_discrete without success:
xaxis.ticks <- function(x) {
df[which(df$EventNBR) == x] }
g + scale_x_discrete(labels = xaxis.ticks)
But that's wrong in a way I can't describe, because it cuts off my tick labels altogether.
Because there is a 1-1 correspondence between EventNBR and EventDT by Group for this dataset, it seems like there should be an easy solution, but I can't figure it out. Am I overlooking something painfully easy?
In general, this is a very problematic thing as mentioned here and there are several other topics on this.
But luckily in your case it is possible since you use scales='free_x'.
What you need to do is add a unique index column, like
df$id <- 1:nrow(df)
and afterwards you can overwrite these indexes with your column containing the correct labels.
g <- ggplot(df, aes(x = id, y = Y)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ Group, scales = 'free_x')
g + scale_x_continuous(breaks = df$id, labels = df$EventDT) +
  theme(axis.text.x = element_text(angle = 90, vjust = .5))
There might be easier solutions but this is working in your example.
Also, your labels disappeared because the x-axis variable is numeric, not discrete, so scale_x_discrete does not apply to it. Using scale_x_continuous produces the correct labels.
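If the full timestamps are too long for the axis, one option is to format the dates into character labels yourself before passing them to the scale. A minimal sketch on a cut-down version of the data (two groups, two events each). The "%Y-%m-%d" format, the UTC time zone and the name date_labels are choices made for this example only.

```r
library(ggplot2)

df <- data.frame(
  EventDT = as.POSIXct(c("2014-11-22 07:41:00", "2015-02-24 08:10:00",
                         "2014-11-18 14:06:00", "2015-06-09 23:10:00"),
                       tz = "UTC"),
  Group = rep(c("A", "B"), each = 2),
  Y = c(15.8, 94.0, 187.7, 203.4)
)

# Unique index across all rows; rows must already be sorted by group and event
df$id <- seq_len(nrow(df))

# Character labels, one per break, in the same order as df$id
date_labels <- format(df$EventDT, "%Y-%m-%d")

p <- ggplot(df, aes(x = id, y = Y)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ Group, scales = "free_x") +
  scale_x_continuous(breaks = df$id, labels = date_labels) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
p
```

Passing a plain character vector also avoids relying on how the scale converts date-time objects into label text.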
EDIT:
So a full example looks like this
library(ggplot2)
df <- data.frame(EventDT = as.POSIXct(c("2014-11-22 07:41:00", "2015-02-24 08:10:00",
                                        "2015-06-10 13:54:00", "2015-07-11 02:43:00",
                                        "2015-08-31 19:08:00", "2014-11-18 14:06:00",
                                        "2015-06-09 23:10:00", "2016-02-29 07:55:00",
                                        "2016-05-22 04:30:00", "2016-05-25 21:46:00",
                                        "2014-12-22 16:19:00", "2015-05-13 16:38:00",
                                        "2015-06-01 09:05:00", "2016-02-21 02:30:00",
                                        "2016-05-13 01:36:00")),
                 EventNBR = rep(1:5, 3),
                 Group = c(rep('A', 5), rep('B', 5), rep('C', 5)),
                 Y = c(15.818750, 94.020139, 106.238889, 30.534028, 51.684028,
                       187.670139, 203.377778, 264.364583, 82.857639, 3.719444,
                       169.829861, 142.013194, 18.685417, 264.725694, 81.962500))
df$id <- 1:nrow(df)
g <- ggplot(df, aes(x = id, y = Y)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ Group, scales = 'free_x')
g + scale_x_continuous(breaks = df$id, labels = df$EventDT) +
  theme(axis.text.x = element_text(angle = 90, vjust = .5))
and produces the following output: