Colouring line of best fit by shape whilst accounting for a separate coloured variable in ggplot? - ggplot2

I have a technical issue with my attempts to plot group differences whilst accounting for 3 variables. This all works fine until I attempt to plot the line of best fit for each group, which results in a plot that makes it difficult to distinguish between groups (as seen below):
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color = Petal.Length, shape = Species)) + geom_point() +
scale_color_viridis_c() +
geom_smooth(method = "lm", se = FALSE, show.legend = TRUE)
I would like to provide a manual discrete colour for each best fit line, so that readers can distinguish between groups more easily (for example, a red line for setosa, a white line for versicolor and a black line for virginica). Below are examples of what I have tried so far, with their associated error messages.
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color = Petal.Length, shape = Species)) + geom_point() +
scale_color_viridis_c() +
geom_smooth(method = "lm", se = FALSE, show.legend = TRUE, aes(color = Species))
"Error: Discrete value supplied to continuous scale"
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, color = Petal.Length, shape = Species)) + geom_point() +
scale_color_viridis_c() +
geom_smooth(method = "lm", se = FALSE, show.legend = TRUE , aes(color = Species)) +
scale_color_discrete()
"Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale.
geom_smooth() using formula 'y ~ x'
Error: Continuous value supplied to discrete scale"
Any recommendations on how to manually assign a colour to each line (whilst leaving the scatter plot colours unchanged) would be much appreciated.
Many thanks in advance,
Rhys
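One possible way around the two errors above, sketched under the assumption that drawing the points with an outline is acceptable: ggplot2 keeps a single scale per aesthetic, so colour cannot be continuous for the points and discrete for the lines in the same plot. Filled point shapes (21 to 25) take a separate fill aesthetic, which leaves colour free for the lines. The shape codes, the grey outline and the object name p are illustrative choices, not part of the question. The ggnewscale package, which adds a second colour scale to a plot, may be another route.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  # Points: continuous variable goes on fill, species on (fillable) shapes
  geom_point(aes(fill = Petal.Length, shape = Species),
             colour = "grey30", size = 2) +
  scale_shape_manual(values = c(setosa = 21, versicolor = 22, virginica = 24)) +
  scale_fill_viridis_c() +
  # Lines: colour is now unused by the points, so it can be discrete
  geom_smooth(aes(colour = Species), method = "lm", formula = y ~ x,
              se = FALSE) +
  scale_colour_manual(values = c(setosa = "red",
                                 versicolor = "white",
                                 virginica = "black"))

p
```

With the default grey panel background the white line remains visible; on a white theme a different colour would be needed.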

Related

geom_point with shape, fill and color

I have created a ggplot of points that show the mean and sd of the variable "y-axis" in each level of x_axis, and have different shapes according to cat.1 and different colors according to cat.2. There are 3 panels according to "time"
the dataframe "example" can be downloaded from here:
https://drive.google.com/file/d/1fJWp6qoSYgegivA5PgNsQkVFkVlT4qcC/view?usp=sharing
plot1<-ggplot(example,aes(x=x_axis,y=mean , shape = cat.1)) + theme_bw() +
facet_wrap(~time,dir = "h")+
geom_point(aes(color=cat.2), position = position_jitter(0), size=4)+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
geom_errorbar(aes(x_axis, ymin=mean-sd, ymax=mean+sd),
position = position_jitter(0), width=0.1)
The plot is like this:
plot1
Since I preferred the points to have a black border, I have added color="black", and have replaced the previous "color= cat.2", by "fill=cat.2". I realize that the correct way is to use "fill" instead of "color", but the fill function does not seem to work! All the points are black:
plot2<-ggplot(example,aes(x=x_axis,y=mean , shape = cat.1)) + theme_bw() +
facet_wrap(~time,dir = "h")+
geom_point(aes(fill=cat.2), position = position_jitter(0), size=4, color="black")+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
geom_errorbar(aes(x_axis, ymin=mean-sd, ymax=mean+sd),
position = position_jitter(0), width=0.1)
plot2
I have tried adding "shape=21" to the geom_point layer, and it gives the dots filled according to cat.2 and with the black border, but the plot does not show the shapes according to cat.1.
How can I create the scatterplot with shapes and fills according to two factors, and also add a black border to the points?
Now with scale_shape_manual, as indicated by @erc, it worked:
plot3<-ggplot(example,aes(x=x_axis,y=mean , shape = cat.1)) + theme_bw() +
facet_wrap(~time,dir = "h")+
geom_jitter(aes(fill=cat.2), position = position_jitter(0), size=4, color="black")+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
geom_errorbar(aes(x_axis, ymin=mean-sd, ymax=mean+sd),
position = position_jitter(0), width=0.1) +
scale_shape_manual( values =c("x"=24,"y"=21))
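The reason scale_shape_manual helps is that only shapes 21 to 25 have both an outline (colour) and an interior (fill); the default shapes ignore fill entirely. Below is a minimal sketch of the same idea on the built-in mtcars data, since the "example" data frame is not reproduced here. The override.aes step is an addition of mine: without it, the fill legend keys are drawn with a shape that has no fill and can all look black.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, shape = factor(am))) +
  # fill varies by one factor, the border is a fixed black
  geom_point(aes(fill = factor(cyl)), colour = "black", size = 4) +
  # both shapes are from the fillable range 21-25
  scale_shape_manual(values = c("0" = 24, "1" = 21)) +
  # draw the fill legend keys with a fillable shape as well
  guides(fill = guide_legend(override.aes = list(shape = 21)))

p
```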

How can I make the line appear besides the dots?

I tried to plot my data but I can only get the points; even if I add "linetype" to geom_line, the line does not appear. Besides, I have other columns in my data set, called SD, SD.1 and SD.2, which are standard deviation values I calculated previously, and they appear at the bottom of the plot. I would like to remove them from the plot and show them as error bars on the lines instead.
library(tidyr)
long_data <- tidyr::pivot_longer(
data=OD,
cols=-Days,
names_to="Strain",
values_to="OD")
ggplot(long_data, aes(x=Days, y=OD, color=Strain)) +
geom_line() + geom_point(shape=16, size=1.5) +
scale_color_manual(values=c("Wildtype"="darkorange2", "Winter"="cadetblue3", "Flagella_less"="olivedrab3"))+
labs(title="Growth curve",x="Days",y="OD750",color="Legend")+
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5,color="black",size=8),
axis.text.y=element_text(angle=0,hjust=1,vjust=0.5,color="black",size=8),
plot.title=element_text(hjust=0.5, size=13,face = "bold",margin = margin(t=0, r=10,b=10,l=10)),
axis.title.y =element_text(size=10, margin=margin(t=0,r=10,b=0,l=0)),
axis.title.x =element_text(size=10, margin=margin(t=10,r=10,b=0,l=0)),
axis.line = element_line(size = 0.5, linetype = "solid",colour = "black"))
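A rough sketch of one way to approach this, with several assumptions stated up front: the OD table below is a made-up stand-in, and the pairing SD = Wildtype, SD.1 = Winter, SD.2 = Flagella_less is a guess at how the columns correspond. The idea is to reshape the means and the standard deviations separately and join them, so that the SD columns no longer become "strains" of their own. Adding group = Strain is a common fix when lines fail to appear because the x variable is a factor or character.

```r
library(ggplot2)
library(tidyr)
library(dplyr)

# Hypothetical data with the same column names as in the question
OD <- data.frame(
  Days          = c(0, 2, 4, 6),
  Wildtype      = c(0.10, 0.35, 0.80, 1.20),
  Winter        = c(0.10, 0.30, 0.65, 1.00),
  Flagella_less = c(0.10, 0.25, 0.55, 0.90),
  SD            = c(0.01, 0.03, 0.05, 0.08),
  SD.1          = c(0.01, 0.02, 0.04, 0.06),
  SD.2          = c(0.01, 0.02, 0.05, 0.07)
)

# Means in long format, one row per day and strain
means <- OD %>%
  select(Days, Wildtype, Winter, Flagella_less) %>%
  pivot_longer(cols = -Days, names_to = "Strain", values_to = "OD")

# Standard deviations, renamed to the strain they are assumed to belong to
sds <- OD %>%
  select(Days, Wildtype = SD, Winter = SD.1, Flagella_less = SD.2) %>%
  pivot_longer(cols = -Days, names_to = "Strain", values_to = "SD")

long_data <- left_join(means, sds, by = c("Days", "Strain"))

p <- ggplot(long_data, aes(x = Days, y = OD, colour = Strain, group = Strain)) +
  geom_line() +
  geom_point(shape = 16, size = 1.5) +
  geom_errorbar(aes(ymin = OD - SD, ymax = OD + SD), width = 0.2)

p
```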

Adding numeric label to geom_hline in ggplot2

I have produced the graph pictured using the following code -
ggboxplot(xray50g, x = "SupplyingSite", y = "PercentPopAff",
fill = "SupplyingSite", legend = "none") +
geom_point() +
rotate_x_text(angle = 45) +
# ADD HORIZONTAL LINE AT BASE MEAN
geom_hline(yintercept = mean(xray50g$PercentPopAff), linetype = 2)
What I would like to do is label the horizontal geom_hline with its numeric value so that it appears on the y axis.
I have provided an example of what I would like to achieve in the second image.
Could somebody please help with the code to achieve this for my plot?
Thanks!
There's a really great answer that should help you out posted here. As long as you are okay with formatting the "extra tick" to match the existing axis, the easiest solution is to just create your axis breaks manually and specify within scale_y_continuous. See below where I use an example to label a vertical dotted line on the x-axis using this method.
df <- data.frame(x=rnorm(1000, mean = 0.5))
ggplot(df, aes(x)) +
geom_histogram(binwidth = 0.1) +
geom_vline(xintercept = 0.5, linetype=2) +
scale_x_continuous(breaks=c(seq(from=-4,to=4,by=2), 0.5))
Again, for other methods, including those where you want the extra tick mark formatted differently than the rest of the axis, check the top answer here.
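The same trick carries over to a horizontal line and the y axis, which is the case in the question. The following is only a sketch on the built-in iris data with plain ggplot2 boxplots (the xray50g data and the ggboxplot call are not reproduced); the names m and brks are illustrative.

```r
library(ggplot2)

m <- mean(iris$Petal.Length)        # value of the horizontal line
brks <- c(seq(1, 7, by = 2), m)     # regular breaks plus one extra tick at the mean

p <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot() +
  geom_hline(yintercept = m, linetype = 2) +
  # round the labels so the extra tick does not print many decimals
  scale_y_continuous(breaks = brks,
                     labels = function(b) as.character(round(b, 2)))

p
```

If the mean falls very close to a regular break, the two labels may overlap, so the regular breaks may need adjusting.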

Connect observations (dots and lines) without using ggpaired

I created a bar chart using geom_bar with "Group" on the x-axis (Female, Male), and "Values" on the y-axis. Group is further subdivided into "Session" such that there is "Session 1" and "Session 2" for both Male and Female (i.e. four bars in total).
Since all participants took part in Sessions 1 and 2, I overlaid a dotplot (geom_dotplot) on each of the four bars to represent the individual data.
I am now trying to connect the observations for each participant ("PID") between Session 1 and Session 2. In other words, there should be lines connecting several pairs of points (one pair per participant) on the "Male" portion of the x-axis, and likewise on the "Female" portion.
I tried this with "geom_line" (below) but to no avail (instead, it created a single vertical line in the middle of "Male" and another in the middle of "Female"). I'm not too sure how to fix this.
See code below:
ggplot(data_foo, aes(x=factor(Group),y=Values, colour = factor(Session), fill = factor(Session))) +
geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 1.0, position = "dodge", fill = "black") +
geom_line(aes(group = PID), colour="dark grey") +
labs(title='My Data',x='Group',y='Values') +
theme_light()
Sample data (.txt)
data_foo <- readr::read_csv("PID,Group,Session,Values
P1,F,1,14
P2,F,1,13
P3,F,1,16
P4,M,1,18
P5,F,1,20
P6,M,1,27
P7,M,1,19
P8,M,1,11
P9,F,1,28
P10,F,1,20
P11,F,1,24
P12,M,1,10
P1,F,2,26
P2,F,2,21
P3,F,2,19
P4,M,2,13
P5,F,2,26
P6,M,2,15
P7,M,2,23
P8,M,2,23
P9,F,2,30
P10,F,2,21
P11,F,2,11
P12,M,2,19")
The trouble is that you want to dodge by several groups at once: your geom_line does not know how to split the Group variable by Session. Here are two ways to address this. Method 1 is probably the most "ggplot-y" way, and a neat way of adding another grouping without overcrowding the visualisation. For method 2 you need to change your x variable.
1) Use facets.
2) Use interaction() to split Session within each Group, and define the factor levels to get the right bar order.
I have also used geom_point instead, because geom_dotplot is really a specific type of histogram.
I would generally recommend boxplots for values like these, because bars are more appropriate for specific measures such as counts.
Method 1: Facets
library(ggplot2)
ggplot(data_foo, aes(x = Session, y = Values, fill = as.character(Session))) +
geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
geom_line(aes(group = PID)) +
geom_point(aes(group = PID), shape = 21, color = 'black') +
facet_wrap(~Group)
Created on 2020-01-20 by the reprex package (v0.3.0)
Method 2: Create an interaction term for your x variable. Note that you need to order the factor levels manually.
library(dplyr) # provides mutate() and the %>% pipe
data_foo <- data_foo %>% mutate(new_x = factor(interaction(Group,Session), levels = c('F.1','F.2','M.1','M.2')))
ggplot(data_foo, aes(x = new_x, y = Values, fill = as.character(Session))) +
geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
geom_line(aes(group = PID)) +
geom_point(aes(group = PID), shape = 21, color = 'black')
But all of this is still not very compelling visually.
I suggest a few visualization tweaks to make the chart more informative. For example, giving each PID its own color helps track how each participant changes across the levels of the other variables. Something like:
library(ggplot2)
ggplot(data_foo, aes(x = factor(Session), y = Values, fill = factor(Session))) +
geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
geom_line(aes(group = factor(PID), colour=factor(PID)), size=2, alpha=0.7) +
geom_point(aes(group = factor(PID), colour=factor(PID)), shape = 21, size=2,show.legend = F) +
theme_bw() +
labs(x='Session',fill='Session',colour='PID')+
theme(legend.position="right") +
facet_wrap(~Group)+
scale_colour_discrete(breaks=paste0('P',1:12))
And we have the following plot:
Hope it helps.
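The boxplot recommendation made in the first answer could be sketched roughly as follows. This is an illustration rather than code from either answer: the data are the sample values from the question rebuilt as a plain data frame, and the styling choices (hidden outlier markers, grey lines) are assumptions.

```r
library(ggplot2)

# Sample data from the question, rebuilt without readr
data_foo <- data.frame(
  PID     = rep(paste0("P", 1:12), times = 2),
  Group   = rep(c("F", "F", "F", "M", "F", "M", "M", "M", "F", "F", "F", "M"),
                times = 2),
  Session = rep(1:2, each = 12),
  Values  = c(14, 13, 16, 18, 20, 27, 19, 11, 28, 20, 24, 10,
              26, 21, 19, 13, 26, 15, 23, 23, 30, 21, 11, 19)
)

p <- ggplot(data_foo, aes(x = factor(Session), y = Values)) +
  # outlier markers are hidden because every observation is drawn below
  geom_boxplot(aes(fill = factor(Session)), outlier.shape = NA) +
  # one line per participant, connecting Session 1 to Session 2
  geom_line(aes(group = PID), colour = "grey40") +
  geom_point() +
  facet_wrap(~Group) +
  labs(x = "Session", fill = "Session")

p
```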

ggplot2 - add manual legend to multiple layers

I have a ggplot in which I am using color for my geom_points as a function of one of my columns (my treatment), and then I am using scale_color_manual to choose the colors.
My legend is generated automatically and looks right.
The problem is that I need to draw some vertical lines that relate to the experimental set-up, which I am doing with geom_vline, but I don't know how to manually add a separate legend that states what those lines are without interfering with the one I already have.
I have the following code
ggplot(dcons.summary, aes(x = meters, y = ymean, color = treatment, shape = treatment)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = ymin, ymax = ymax)) +
scale_color_manual(values=c("navy","seagreen3"))+
theme_classic() +
geom_vline(xintercept = c(0.23,3.23, 6.23,9.23), color= "bisque3", size=0.4) +
scale_x_continuous(limits = c(-5, 25)) +
labs(title= "Sediment erosion", subtitle= "-5 -> 25 meters; standard deviation; consistent measurements BESE & Control", x= "distance (meters)", y="erosion (cm)", color="Treatment", shape="Treatment")
So I would just need an extra legend beneath the "Treatment" one that says "BESE PLOTS LOCATION" and that relates to the gray lines.
I have been searching for a solution. I've tried using "scale_linetype_manual" and also "guides", but I'm not getting there.
As you provided no reproducible example, I used data from the mtcars dataset.
In addition, I modified this similar answer a little. As you have already used the colour aesthetic, and fill does not work here, you can map linetype as a second aesthetic within aes, which can then be shown in the legend:
library(ggplot2)
library(dplyr) # for the %>% pipe
xid <- data.frame(xintercept = c(15,20,30), lty=factor(1))
mtcars %>%
ggplot(aes(mpg ,cyl, col=factor(gear))) +
geom_point() +
geom_vline(data=xid, aes(xintercept=xintercept, lty=lty) , col = "red", size=0.4) +
scale_linetype_manual(values = 1, name="", labels="BESE PLOTS LOCATION")
Or without the second data.frame:
ggplot() +
geom_point(data = mtcars,aes(mpg ,cyl, col=factor(gear))) +
geom_vline(aes(xintercept=c(15,20,30), lty=factor(1) ), col = "red", size=0.4)+
scale_linetype_manual(values = 1, name="", labels="BESE PLOTS LOCATION")