Stack bars with percentages and values shown - ggplot2

Here is my data frame, data_long1:
data_long1 <- data.frame(
value = c(88, 22, 100, 12, 55, 17, 10, 2, 2),
Subtype = as.factor(c("lung","prostate",
"oesophagus","lung","prostate","oesophagus","lung",
"prostate","oesophagus")),
variable = as.factor(c("alive","alive",
"alive","dead","dead","dead","uncertain","uncertain",
"uncertain"))
)
The following code gives me the graph I want, with all the values displayed, but none as percentages.
ggplot(data_long1, aes(x = Subtype, y = value, fill = variable)) + geom_bar(stat = "identity") +
geom_text(aes(label= value), size = 3, hjust = 0.1, vjust = 2, position = "stack")
What I am looking for is a stacked bar chart with the actual values on the y-axis, not percentages (as in the previous graph), but with a percentage label on each segment of the bars. I tried this code and got a meaningless graph in which every segment is labelled 33.3%.
data_long1 %>%
count(Subtype, variable) %>%
group_by(Subtype) %>%
mutate(pct = prop.table(n) * 100) %>%
ggplot() +
aes(x = Subtype, y = variable, fill = variable) +
geom_bar(stat = "identity") +
ylab("Number of Patients") +
geom_text(aes(label = paste0(sprintf("%1.1f", pct), "%")), position = position_stack(vjust = 0.5)) +
ggtitle("My Tumour Sites") +
theme_bw()
I cannot seem to find a way to use the mutate function to resolve this problem. Please help.

I would pre-compute the summaries you want. Here is the proportion within each subtype:
library(dplyr)
library(ggplot2)
data_long2 <- data_long1 %>%
group_by(Subtype) %>%
mutate(proportion = value / sum(value)) %>%
ungroup()
ggplot(data_long2, aes(x = Subtype, y = value, fill = variable)) +
geom_bar(stat = "identity") +
geom_text(aes(label= sprintf('%0.0f%%', proportion * 100)), size = 3, hjust = 0.1, vjust = 2, position = "stack")
You can also get the proportion across all groups and types simply by removing the group_by statement:
data_long2 <- data_long1 %>%
mutate(proportion = value / sum(value))
ggplot(data_long2, aes(x = Subtype, y = value, fill = variable)) +
geom_bar(stat = "identity") +
geom_text(aes(label= sprintf('%0.0f%%', proportion * 100)), size = 3, hjust = 0.1, vjust = 2, position = "stack")
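
For context on the 33.3% labels in the question: count(Subtype, variable) tallies rows, and each Subtype/variable pair occurs exactly once in data_long1, so n is 1 everywhere and every share within a subtype comes out as one third. The sketch below, which assumes dplyr and ggplot2 are installed, weights the count by value instead and centres each label in its segment with position_stack(vjust = 0.5). The names naive, data_pct and p are illustrative and not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# Same data as in the question, written with rep() for brevity.
data_long1 <- data.frame(
  value = c(88, 22, 100, 12, 55, 17, 10, 2, 2),
  Subtype = factor(rep(c("lung", "prostate", "oesophagus"), times = 3)),
  variable = factor(rep(c("alive", "dead", "uncertain"), each = 3))
)

# Unweighted count: one row per Subtype/variable pair, so n is always 1.
naive <- data_long1 %>% count(Subtype, variable)

# Weighted count: n is the summed value; pct is the share within each Subtype.
data_pct <- data_long1 %>%
  count(Subtype, variable, wt = value) %>%
  group_by(Subtype) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

# Bars keep the raw values on the y-axis; labels show the percentages.
p <- ggplot(data_pct, aes(x = Subtype, y = n, fill = variable)) +
  geom_col() +
  geom_text(aes(label = sprintf("%.1f%%", pct)), size = 3,
            position = position_stack(vjust = 0.5)) +
  ylab("Number of Patients") +
  theme_bw()
print(p)
```

Using position_stack(vjust = 0.5) on the text layer keeps each label tied to its own segment, so no manual hjust/vjust tuning is needed.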

Related

Adding stat = count on top of histogram in ggplot

I've seen some other examples (especially using geom_col() and stat_bin()) to add frequency or count numbers on top of bars. I'm trying to get this to work with geom_histogram() where I have a discrete (string), not continuous, x variable.
library(tidyverse)
d <- cars |>
mutate( discrete_var = factor(speed))
ggplot(d, aes(x = discrete_var)) +
geom_histogram(stat = "count") +
stat_bin(binwidth=1, geom='text', color='white', aes(label=..count..),
position=position_stack(vjust = 0.5))
This gives me an error because StatBin requires a continuous x variable. Any quick fix ideas?
The error message gives you the answer: ! StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?
So instead of stat_bin(), use stat_count().
And for further reference here is a reproducible example:
library(tidyverse)
d <- cars |>
mutate( discrete_var = factor(speed))
ggplot(data = d,
aes(x = discrete_var)) +
geom_histogram(stat = "count") +
# stat_count() has no binwidth argument; after_stat(count) replaces the deprecated ..count..
stat_count(geom = 'text',
color = 'white',
aes(label = after_stat(count)),
position = position_stack(vjust = 0.5))
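
As an alternative sketch: a histogram drawn with stat = "count" is effectively a bar chart, so the same plot can be written with geom_bar() plus a geom_text() layer that uses the count stat. This assumes ggplot2 3.3.0 or later for after_stat(); the name p is illustrative.

```r
library(ggplot2)

# Treat speed as a discrete variable, as in the question.
d <- transform(cars, discrete_var = factor(speed))

p <- ggplot(d, aes(x = discrete_var)) +
  geom_bar() +
  # stat = "count" computes the bar heights; after_stat(count) uses them as labels.
  geom_text(stat = "count", aes(label = after_stat(count)),
            colour = "white", position = position_stack(vjust = 0.5))
print(p)
```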

How to add count (n) / summary statistics as a label to ggplot2 boxplots?

I am new to R and trying to add count labels to my boxplots, so the sample size per boxplot shows in the graph.
This is my code:
bp_east_EC <-total %>% filter(year %in% c(1977, 2020, 2021, 1992),
sampletype == "groundwater",
East == 1,
#EB == 1,
#N59 == 1,
variable %in% c("EC_uS")) %>%
ggplot(.,aes(x = as.character(year), y = value, colour = as.factor(year))) +
theme_ipsum() +
ggtitle("Groundwater EC, eastern Curacao") +
theme(plot.title = element_text(hjust = 0.5, size=14)) +
theme(legend.position = "none") +
labs(x="", y="uS/cm") +
geom_jitter(color="grey", size=0.4, alpha=0.9) +
geom_boxplot() +
stat_summary(fun.y=mean, geom="point", shape=23, size=2) #shows mean
I have googled a lot and tried different things (with annotate, with return functions, mtext, etc), but it keeps giving different errors. I think I am such a beginner I cannot figure out how to integrate such suggestions into my own code.
Does anybody have an idea what the best way would be for me to approach this?
I would create a new variable that contained your sample sizes per group and plot that number with geom_label. I've generated an example of how to add count/sample sizes to a boxplot using the iris dataset since your example isn't fully reproducible.
library(tidyverse)
data(iris)
# boxplot with no label
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_boxplot()
# boxplot with label
iris %>%
group_by(Species) %>%
mutate(count = n()) %>%
mutate(mean = mean(Sepal.Length)) %>%
ggplot(aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_boxplot() +
geom_label(aes(label= count , y = mean + 0.75), # <- change this to move label up and down
size = 4, position = position_dodge(width = 0.75)) +
geom_jitter(alpha = 0.35, aes(color = Species)) +
stat_summary(fun = mean, geom = "point", shape = 23, size = 6)
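
A variation on the same idea, offered as a sketch rather than the answerer's method: summarising to one row per group before plotting draws a single label per box instead of one label per observation. The names n_labels and y_pos, and the 0.3 offset, are assumptions chosen for this example.

```r
library(dplyr)
library(ggplot2)

# One row per species: the sample size and a y position just above the tallest point.
n_labels <- iris %>%
  group_by(Species) %>%
  summarise(n = n(), y_pos = max(Sepal.Length) + 0.3, .groups = "drop")

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(aes(fill = Species)) +
  # The label layer uses the summary table, so each box gets exactly one label.
  geom_text(data = n_labels, aes(y = y_pos, label = paste0("n = ", n)))
print(p)
```

The same pattern should carry over to the filtered data in the question by grouping on year instead of Species.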

Adjust binwidth size for faceted dotplot with free y axis

I would like to adjust the binwidth of a faceted geom_dotplot while keeping the dot sizes the same.
Using the default binwidth (1/30 of the data range), I get the following plot:
library(ggplot2)
df = data.frame(
t = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
x = 1,
y = c(1, 2, 3, 4, 5, 100, 200, 300, 400, 500)
)
ggplot(df, aes(x=x, y=y)) +
geom_dotplot(binaxis="y", stackdir="center") +
facet_wrap(~t, scales="free_y")
However, if I change the binwidth value, the new value is taken as an absolute value (and not the ratio of the data range), so the two facets get differently sized dots:
geom_dotplot(binaxis="y", stackdir="center", binwidth=2) +
Is there a way to adjust binwidth so it is relative to its facet's data range?
One option is to use multiple geom_dotplot layers, which lets you set the binwidth for each facet separately. This requires some manual work to compute the binwidths so that the dots are the same size in every facet:
library(ggplot2)
y_ranges <- tapply(df$y, factor(df$t), function(x) diff(range(x)))
binwidth1 <- 2
scale2 <- binwidth1 / (y_ranges[[1]] / 30)
binwidth2 <- scale2 * y_ranges[[2]] / 30
ggplot(df, aes(x=x, y=y)) +
geom_dotplot(data = ~subset(.x, t == 1), binaxis="y", stackdir="center", binwidth = binwidth1) +
geom_dotplot(data = ~subset(.x, t == 2), binaxis="y", stackdir="center", binwidth = binwidth2) +
facet_wrap(~t, scales="free_y")
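
The arithmetic above amounts to giving every facet the same binwidth as a fraction of its own data range. A sketch that generalises this to any number of facets is shown below; rel_width, binwidths and layers are hypothetical names, and 0.5 is chosen only because it reproduces binwidth1 = 2 for the first facet (range 4).

```r
library(ggplot2)

df <- data.frame(
  t = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  x = 1,
  y = c(1, 2, 3, 4, 5, 100, 200, 300, 400, 500)
)

# Data range of y within each facet (named by the facet value).
ranges <- sapply(split(df$y, df$t), function(v) diff(range(v)))

# Binwidth expressed as a fraction of each facet's range.
rel_width <- 0.5
binwidths <- ranges * rel_width

# One dotplot layer per facet, each with its own absolute binwidth.
layers <- lapply(names(binwidths), function(lvl) {
  geom_dotplot(data = df[df$t == as.numeric(lvl), ],
               binaxis = "y", stackdir = "center",
               binwidth = binwidths[[lvl]])
})

p <- ggplot(df, aes(x = x, y = y)) +
  layers +
  facet_wrap(~t, scales = "free_y")
print(p)
```

With the example data the ranges are 4 and 400, so the two binwidths are 2 and 200, matching binwidth1 and binwidth2 in the answer.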

ggplot facet different Y axis order based on value

I have a faceted plot wherein I'd like to have the Y-axis labels and the associated values appear in descending order of values (and thereby changing the order of the labels) for each facet. What I have is this, but the order of the labels (and the corresponding values) is the same for each facet.
ggplot(rf,
aes(x = revenues,
y = reorder(AgencyName, revenues))) +
geom_point(stat = "identity",
aes(color = AgencyName),
show.legend = FALSE) +
xlab(NULL) +
ylab(NULL) +
scale_x_continuous(label = scales::comma) +
facet_wrap(~year, ncol = 3, scales = "free_y") +
theme_minimal()
Can someone point me to the solution?
The functions reorder_within and scale_*_reordered from the tidytext package might come in handy.
reorder_within recodes the values into a factor with strings in the form of "VARIABLE___WITHIN". This factor is ordered by the values in each group of WITHIN.
scale_*_reordered removes the "___WITHIN" suffix when plotting the axis labels.
Add scales = "free_y" in facet_wrap to make it work as expected.
Here is an example with generated data:
library(tidyverse)
# Generate data
df <- expand.grid(
year = 2019:2021,
group = paste("Group", toupper(letters[1:8]))
)
set.seed(123)
df$value <- rnorm(nrow(df), mean = 10, sd = 2)
df %>%
mutate(group = tidytext::reorder_within(group, value, within = year)) %>%
ggplot(aes(value, group)) +
geom_point() +
tidytext::scale_y_reordered() +
facet_wrap(vars(year), scales = "free_y")
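
To make the mechanism concrete, here is a hand-rolled sketch of the same idea without tidytext. It is not the package's source code, only an illustration of the encoding described above; key and group_in_year are hypothetical names.

```r
library(ggplot2)

# Generated data, as in the example above.
df <- expand.grid(year = 2019:2021, group = paste("Group", LETTERS[1:8]))
set.seed(123)
df$value <- rnorm(nrow(df), mean = 10, sd = 2)

# Encode group and year into one key, then order the factor levels by value.
key <- paste(df$group, df$year, sep = "___")
df$group_in_year <- reorder(key, df$value)

p <- ggplot(df, aes(value, group_in_year)) +
  geom_point() +
  # Strip the "___year" suffix again when drawing the axis labels.
  scale_y_discrete(labels = function(x) sub("___.+$", "", x)) +
  facet_wrap(vars(year), scales = "free_y") +
  ylab(NULL)
print(p)
```

Because every key is unique to one year, each facet with a free y scale shows only its own levels, already sorted by value.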

Stacked Bar Chart Labels-- Using geom_text to label % on a value based y-axis

I am looking to create a stacked bar chart where the y-axis measures the value but the labels show each segment's % of the total bar.
I think I need to add a pct column to my table and then use that, but I am not sure how to compute the pct column either.
An example of the data frame:
date, type, value, pct
Jan 1, A, 5, 45% (5/11)
Jan 1, B, 6, 55% (6/11)
Maybe something like this?
library(dplyr)
library(ggplot2)
test.df <- data.frame(date = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"),
type = c("A", "B", "A", "B"),
val = c(5:6, 1, 7))
test.df <- test.df %>%
group_by(date) %>%
mutate(
type.num = as.numeric(factor(type)), # type is character, so convert via factor
prop = val/sum(val),
y_text_pos = ifelse(type=="B", val, sum(val))) %>%
ungroup()
ggplot(data = test.df, aes(x = as.Date(date), y = val, fill = type)) +
geom_col() +
geom_text(aes(y = y_text_pos, label = paste0(round(prop*100,1), "%")), color = "black", vjust = 1.1)
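
As a variation on the code above, the label positions can be left to position_stack() rather than computed by hand, which should also work when there are more than two types per date. This is a sketch under that assumption; pct_df and p are illustrative names.

```r
library(dplyr)
library(ggplot2)

test.df <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02")),
  type = c("A", "B", "A", "B"),
  val = c(5, 6, 1, 7)
)

# Share of each type within its date, e.g. 5 / (5 + 6) for type A on 2020-01-01.
pct_df <- test.df %>%
  group_by(date) %>%
  mutate(prop = val / sum(val)) %>%
  ungroup()

p <- ggplot(pct_df, aes(x = date, y = val, fill = type)) +
  geom_col() +
  # position_stack(vjust = 1) anchors each label at the top of its own segment;
  # vjust = 1.1 then nudges the text just below that edge.
  geom_text(aes(label = sprintf("%.0f%%", 100 * prop)),
            position = position_stack(vjust = 1), vjust = 1.1)
print(p)
```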