Column colors not matching custom scale - ggplot2

I've put together a barplot that gives counts for phenotypes in the maxilla and mandible. I'd like the maxilla bars to be black and the mandible bars grey. I've created a color vector for the fill option, yet no luck: the colors are still out of order.
I've already used this code successfully on a number of other barplots. I believe the problem is that I have combined 3 data frames into one using rbind; however, the structure of the combined data frame is no different from the uncombined ones, which do work.
The first four bars should be black, the last four bars should be grey.
### 3 data sets
a<-data.frame(
row.names = c("RI2_MAX_E1","ri2_mand_E1","rc1_mand_E1"),
count = c(2,2,2),
labels = c("RI2", "ri2", "rc1")
)
b<-data.frame(
row.names = c("RP3_MAX_E1","RP4_MAX_E1","rp3_mand_E1"),
count = c(3,3,2),
labels = c("RP3", "RP4", "rp3")
)
c<-data.frame(
row.names = c("RM3_MAX_E1","rm3_mand_E1"),
count = c(5,6),
labels = c("RM3", "rm3")
)
### Bind datasets into 1
E1.bind<-rbind(a,b,c)
### order variables
E1.bind$labels<-factor(E1.bind$labels, levels =c("RI2","RP3","RP4","RM3","ri2","rc1","rp3","rm3"))
### Custom scale color
E1.color<-c("black","black","black","black","grey","grey","grey","grey")
### plot
ggplot(data= E1.bind, aes(x=E1.bind$labels, y=E1.bind$count,fill=E1.color)) +
geom_bar(stat="identity") +
xlab("Teeth") + ylab("Phenotype counts") +
ggtitle("Teeth With Greatest Number of Phenotypes - Element 1")+
scale_fill_manual(name="",
labels = c("Maxilla","Mandible"),
values = c("black","grey"))+
scale_x_discrete(labels = c("RI2","RP3","RP4","RM3","ri2","rc1","rp3","rm3")) +
scale_y_continuous(breaks = seq(0,6,1)) +
theme_classic()+
theme(legend.position="top")+
theme(plot.title = element_text(hjust = 0.5))

Your E1.bind looks like this:
E1.bind
count labels
RI2_MAX_E1 2 RI2
ri2_mand_E1 2 ri2
rc1_mand_E1 2 rc1
RP3_MAX_E1 3 RP3
RP4_MAX_E1 3 RP4
rp3_mand_E1 2 rp3
RM3_MAX_E1 5 RM3
rm3_mand_E1 6 rm3
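Note the order of the labels: the rows are still in rbind order, not in the order of the factor levels. Then you are using this as fill:
E1.color<-c("black","black","black","black","grey","grey","grey","grey")
This vector is matched to the data row by row, so the first four rows (RI2, ri2, rc1, RP3) get "black" and the last four (RP4, rp3, RM3, rm3) get "grey", regardless of jaw. A better way is to add a Type column to your data frame and use it to define the fill color. That way it's also more scalable: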
library(dplyr)
library(ggplot2)
a<-data.frame(
row.names = c("RI2_MAX_E1","ri2_mand_E1","rc1_mand_E1"),
count = c(2,2,2),
labels = c("RI2", "ri2", "rc1")
)
b<-data.frame(
row.names = c("RP3_MAX_E1","RP4_MAX_E1","rp3_mand_E1"),
count = c(3,3,2),
labels = c("RP3", "RP4", "rp3")
)
c<-data.frame(
row.names = c("RM3_MAX_E1","rm3_mand_E1"),
count = c(5,6),
labels = c("RM3", "rm3")
)
### Bind datasets into 1
E1.bind<-rbind(a,b,c)
### Maxilla labels start with an uppercase "R"; mandible labels are lowercase
E1.bind$Type <- ifelse(grepl('^R', E1.bind$labels), "Maxilla", "Mandible")
### Sort by Type
E1.bind <- arrange(E1.bind, desc(Type))
### plot
ggplot(data= E1.bind, aes(x=labels, y=count, fill=Type)) +
geom_bar(stat="identity") +
xlab("Teeth") + ylab("Phenotype counts") +
ggtitle("Teeth With Greatest Number of Phenotypes - Element 1") +
scale_y_continuous(breaks = seq(0,6,1)) +
scale_x_discrete(limits=E1.bind$labels) +
scale_fill_manual(values = c("grey", "black")) +
theme_classic()+
theme(legend.position="top")+
theme(plot.title = element_text(hjust = 0.5))
This results in a plot with the four maxilla bars (black) first, followed by the four mandible bars (grey).
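To make the colour assignment independent of the order of the factor levels, one option is to pass a named vector to scale_fill_manual. The following is only a minimal sketch on a reduced subset of the data; the names df and fill_values are illustrative and do not appear in the original post.

```r
library(ggplot2)

# Illustrative subset of the combined data (not the full E1.bind)
df <- data.frame(
  labels = c("RI2", "ri2", "rc1", "RP3"),
  count  = c(2, 2, 2, 3)
)

# Uppercase labels are maxillary teeth, lowercase labels are mandibular
df$Type <- ifelse(grepl("^R", df$labels), "Maxilla", "Mandible")

# Named values tie each colour to a Type, whatever the level order is
fill_values <- c(Maxilla = "black", Mandible = "grey")

p <- ggplot(df, aes(x = labels, y = count, fill = Type)) +
  geom_col() +
  scale_fill_manual(values = fill_values)
```

Calling print(p) draws the plot. With named values, swapping the order of the entries in fill_values does not change which jaw gets which colour.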


How to add count (n) / summary statistics as a label to ggplot2 boxplots?

I am new to R and trying to add count labels to my boxplots, so the sample size per boxplot shows in the graph.
This is my code:
bp_east_EC <-total %>% filter(year %in% c(1977, 2020, 2021, 1992),
sampletype == "groundwater",
East == 1,
#EB == 1,
#N59 == 1,
variable %in% c("EC_uS")) %>%
ggplot(.,aes(x = as.character(year), y = value, colour = as.factor(year))) +
theme_ipsum() +
ggtitle("Groundwater EC, eastern Curacao") +
theme(plot.title = element_text(hjust = 0.5, size=14)) +
theme(legend.position = "none") +
labs(x="", y="uS/cm") +
geom_jitter(color="grey", size=0.4, alpha=0.9) +
geom_boxplot() +
stat_summary(fun.y=mean, geom="point", shape=23, size=2) #shows mean
I have googled a lot and tried different things (with annotate, with return functions, mtext, etc), but it keeps giving different errors. I think I am such a beginner I cannot figure out how to integrate such suggestions into my own code.
Does anybody have an idea what the best way would be for me to approach this?
I would create a new variable that contained your sample sizes per group and plot that number with geom_label. I've generated an example of how to add count/sample sizes to a boxplot using the iris dataset since your example isn't fully reproducible.
library(tidyverse)
data(iris)
# boxplot with no label
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_boxplot()
# boxplot with label
iris %>%
group_by(Species) %>%
mutate(count = n()) %>%
mutate(mean = mean(Sepal.Length)) %>%
ggplot(aes(x = Species, y = Sepal.Length, fill = Species)) +
geom_boxplot() +
geom_label(aes(label= count , y = mean + 0.75), # <- change this to move label up and down
size = 4, position = position_dodge(width = 0.75)) +
geom_jitter(alpha = 0.35, aes(color = Species)) +
stat_summary(fun = mean, geom = "point", shape = 23, size = 6)
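A possible variation, sketched here under the assumption that one label per group is preferred: summarise the counts into a small data frame first and pass it to geom_text, so each label is drawn once instead of once per observation. The names label_df and y_pos are made up for this example, and the 0.3 offset is an arbitrary choice.

```r
library(dplyr)
library(ggplot2)

# One row per species: the sample size and a y position just above the maximum
label_df <- iris %>%
  group_by(Species) %>%
  summarise(n = n(), y_pos = max(Sepal.Length) + 0.3, .groups = "drop")

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  # x = Species is inherited from the plot; y and label come from label_df
  geom_text(data = label_df, aes(y = y_pos, label = paste0("n = ", n)))
```

The same pattern should carry over to the original data by grouping on the x variable (year) instead of Species.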

how to color different datasets separately when overlapping them using geom_smooth and color settings

I have 2 datasets that span full genomes, separated by chromosomes (scaffolds), for 2 group comparisons, and I want to overlay them in a single graph.
The way I was doing it was as follows:
ggplot(NULL, aes(color = as_factor(scaffold))) +
geom_smooth(data = windowStats_SBvsOC, aes(x = mid2, y = Fst_group1_group5), se=F) +
geom_smooth(data = windowStats_SCLvsSCU, aes(x = mid2, y = Fst_group3_group4), se=F) +
scale_y_continuous(expand = c(0,0), limits = c(0, 1)) +
scale_x_continuous(labels = chrom$chrID, breaks = axis_set$center) +
scale_color_manual(values = rep(c("#276FBF", "#183059"), unique(length(chrom$chrID)))) +
scale_size_continuous(range = c(0.5,3)) +
labs(x = NULL,
y = "Fst (smoothed means)") +
theme_minimal() +
theme(
legend.position = "none",
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.title.y = element_text(),
axis.text.x = element_text(angle = 60, size = 8, vjust = 0.5))
This way, I get each chromosome with alternating colors, and the smoothing is done per chromosome. But I want the colors to differ between the 2 groups so I can distinguish them when they overlap like this. Is there a way to do it? I can only do it if I remove the color by scaffold, but then the smoothing is done across the whole genome, and I don't want that!
My dataset is big, so I'm attaching it here!
I'm running this in RStudio 2022.02.3, R v.3.6.2, with the ggplot2 package.
EDIT: I've figured it out! I just needed to change color = as_factor(scaffold) to group = as_factor(scaffold), and then add aes(color) to each geom_smooth() call.
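The fix described in the edit can be sketched roughly as follows. The data here is randomly generated and only stands in for the real window statistics; make_windows, comparison_a, comparison_b and the column name fst are hypothetical, and the colour labels are just one way to name the two comparisons.

```r
library(ggplot2)

set.seed(1)

# Hypothetical stand-in for one table of per-window Fst values
make_windows <- function(shift) {
  data.frame(
    scaffold = rep(c("chr1", "chr2", "chr3"), each = 50),
    mid2     = seq_len(150),
    fst      = pmin(pmax(runif(150, 0, 0.5) + shift, 0), 1)
  )
}
comparison_a <- make_windows(0)
comparison_b <- make_windows(0.2)

# group = scaffold keeps the smoothing per chromosome;
# a constant colour inside each layer's aes() separates the two comparisons
p <- ggplot(NULL, aes(x = mid2, y = fst, group = scaffold)) +
  geom_smooth(data = comparison_a, aes(color = "SB vs OC"),
              method = "loess", formula = y ~ x, se = FALSE) +
  geom_smooth(data = comparison_b, aes(color = "SCL vs SCU"),
              method = "loess", formula = y ~ x, se = FALSE) +
  scale_color_manual(values = c("SB vs OC" = "#276FBF", "SCL vs SCU" = "#183059"))
```

Because group is set explicitly, the colour mapping no longer determines how the data is split for smoothing, so each layer gets a single colour while still being fitted chromosome by chromosome.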

ggplot facet different Y axis order based on value

I have a faceted plot wherein I'd like to have the Y-axis labels and the associated values appear in descending order of values (and thereby changing the order of the labels) for each facet. What I have is this, but the order of the labels (and the corresponding values) is the same for each facet.
ggplot(rf,
aes(x = revenues,
y = reorder(AgencyName, revenues))) +
geom_point(stat = "identity",
aes(color = AgencyName),
show.legend = FALSE) +
xlab(NULL) +
ylab(NULL) +
scale_x_continuous(label = scales::comma) +
facet_wrap(~year, ncol = 3, scales = "free_y") +
theme_minimal()
Can someone point me to the solution?
The functions reorder_within and scale_*_reordered from the tidytext package might come in handy.
reorder_within recodes the values into a factor with strings in the form of "VARIABLE___WITHIN". This factor is ordered by the values in each group of WITHIN.
scale_*_reordered removes the "___WITHIN" suffix when plotting the axis labels.
Add scales = "free_y" in facet_wrap to make it work as expected.
Here is an example with generated data:
library(tidyverse)
# Generate data
df <- expand.grid(
year = 2019:2021,
group = paste("Group", toupper(letters[1:8]))
)
set.seed(123)
df$value <- rnorm(nrow(df), mean = 10, sd = 2)
df %>%
mutate(group = tidytext::reorder_within(group, value, within = year)) %>%
ggplot(aes(value, group)) +
geom_point() +
tidytext::scale_y_reordered() +
facet_wrap(vars(year), scales = "free_y")

Connect observations (dots and lines) without using ggpaired

I created a bar chart using geom_bar with "Group" on the x-axis (Female, Male), and "Values" on the y-axis. Group is further subdivided into "Session" such that there is "Session 1" and "Session 2" for both Male and Female (i.e. four bars in total).
Since all participants participated in Session 1 and 2, I overlaid a dotplot (geom_dotplot) over each of the four bars to represent the individual data.
I am now trying to connect the observations for each participant ("PID") between Session 1 and 2. In other words, there should be lines connecting several sets of two points on the "Male" portion of the x-axis (i.e. one line per participant), and likewise on the "Female" portion.
I tried this with "geom_line" (below) but to no avail (instead, it created a single vertical line in the middle of "Male" and another in the middle of "Female"). I'm not too sure how to fix this.
See code below:
ggplot(data_foo, aes(x=factor(Group),y=Values, colour = factor(Session), fill = factor(Session))) +
geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 1.0, position = "dodge", fill = "black") +
geom_line(aes(group = PID), colour="dark grey") +
labs(title='My Data',x='Group',y='Values') +
theme_light()
Sample data (.txt)
data_foo <- readr::read_csv("PID,Group,Session,Values
P1,F,1,14
P2,F,1,13
P3,F,1,16
P4,M,1,18
P5,F,1,20
P6,M,1,27
P7,M,1,19
P8,M,1,11
P9,F,1,28
P10,F,1,20
P11,F,1,24
P12,M,1,10
P1,F,2,26
P2,F,2,21
P3,F,2,19
P4,M,2,13
P5,F,2,26
P6,M,2,15
P7,M,2,23
P8,M,2,23
P9,F,2,30
P10,F,2,21
P11,F,2,11
P12,M,2,19")
The trouble is that you want to dodge by several groups. Your geom_line does not know how to split the Group variable by Session. Here are two ways to address this problem:
1) Use facets.
2) Use interaction to split Session for each Group, and define the factor levels to get the right bar order.
Method 1 is probably the most "ggploty" way, and a neat way of adding another grouping without making the visualisation too overcrowded. For method 2 you need to change your x variable.
I have also used geom_point instead, because geom_dotplot is really a specific type of histogram.
I would generally recommend boxplots for plots of values like these, because bars are more appropriate for specific measures such as counts.
Method 1: Facets
library(ggplot2)
ggplot(data_foo, aes(x = Session, y = Values, fill = as.character(Session))) +
geom_bar(stat = "summary", fun = "mean", position = "dodge") + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
geom_line(aes(group = PID)) +
geom_point(aes(group = PID), shape = 21, color = 'black') +
facet_wrap(~Group)
Created on 2020-01-20 by the reprex package (v0.3.0)
Method 2: Create an interaction term in your x variable. Note that you need to order the factor levels manually.
library(dplyr) # provides %>% and mutate
library(ggplot2)
data_foo <- data_foo %>% mutate(new_x = factor(interaction(Group,Session), levels = c('F.1','F.2','M.1','M.2')))
ggplot(data_foo, aes(x = new_x, y = Values, fill = as.character(Session))) +
geom_bar(stat = "summary", fun = "mean", position = "dodge") + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
geom_line(aes(group = PID)) +
geom_point(aes(group = PID), shape = 21, color = 'black')
But the result is still not very visually compelling.
I suggest a few tweaks to make the chart more informative. For example, giving each PID its own color helps track how each participant changes across the levels of the other variables. Something like:
library(ggplot2)
ggplot(data_foo, aes(x = factor(Session), y = Values, fill = factor(Session))) +
geom_bar(stat = "summary", fun = "mean", position = "dodge") + # fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
geom_line(aes(group = factor(PID), colour=factor(PID)), size=2, alpha=0.7) +
geom_point(aes(group = factor(PID), colour=factor(PID)), shape = 21, size=2,show.legend = F) +
theme_bw() +
labs(x='Session',fill='Session',colour='PID')+
theme(legend.position="right") +
facet_wrap(~Group)+
scale_colour_discrete(breaks=paste0('P',1:12))
And we have the following plot:
Hope it helps.

Is it possible to have 2 legends for variables when one is continuous and the other is discrete?

I checked a few examples online and I am not sure that it can be done because every plot with 2 different variables (continuous and discrete) has one of 2 options:
legend regarding the continuous variable
legend regarding the discrete variable
Just for visualization, I put here an example. Imagine that I want to have a legend for the blue line. Is it possible to do that??
The easiest approach would be to map it to a different aesthetic than you already use:
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
geom_smooth(method = "loess", aes(linetype = "fit"))
There are also specialised packages for adding additional colour legends:
library(ggplot2)
library(ggnewscale)
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
new_scale_colour() +
geom_smooth(method = "loess", aes(colour = "fit"))
Beware that if you want to tweak colours via a colour scale, you must add that scale before calling new_scale_colour(), i.e.:
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
scale_colour_manual(values = c("red", "green", "blue")) +
new_scale_colour() +
geom_smooth(method = "loess", aes(colour = "fit")) +
scale_colour_manual(values = "purple")
EDIT: To address the comment: yes, it is possible with a line that is independent of the data; I was just re-using the data for brevity. See below for an arbitrary line (this should also work with the ggnewscale approach):
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
geom_line(data = data.frame(x = 1:30, y = rnorm(30, 200, 10)), # one y value per x, no recycling
aes(x, y, linetype = "arbitrary line"))