Plotting lines, alphas and facet_grids - ggplot2

I'd like to compare a number of curves which are very similar. Plotting them on the same axis gets very confusing (because they are so close together) but plotting them on different axes is no better because they are almost indistinguishable.
I'd like to do a facet_wrap for each curve and have the curves which do not belong to that facet set to alpha = 0.25 while the faceted curve is alpha = 1.
For reference, my data looks like
tibble::tribble(
~METHOD, ~mean, ~lower, ~upper, ~width, ~n, ~p,
"ASYMPTOTIC", 0.939365, 0.938319047477885, 0.940410952522115, 0.0445455136166802, 500L, "p = 0.07",
"CLOGLOG", 0.946005, 0.945005622467146, 0.946986730333604, 0.0445611267179592, 500L, "p = 0.07",
"EXACT", 0.964695, 0.963876926502612, 0.96549956881645, 0.0465924279807666, 500L, "p = 0.07",
"LOGIT", 0.95672, 0.955819372211766, 0.957603082736301, 0.0450897716879822, 500L, "p = 0.07",
"WILSON", 0.95672, 0.95581938978958, 0.957603065836674, 0.044862320426846, 500L, "p = 0.07",
"ASYMPTOTIC", 0.93936, 0.938314007137612, 0.940405992862388, 0.0445885063907574, 499L, "p = 0.07"
)
I'd like to facet METHOD and plot n vs width.
Is this at all possible?

Can't do much with your sample data (need more rows), so I cooked up an example using some generated data. Basically, if we add two grouping variables, we can add the lines to each facet and control the alpha, and then use the normal facetting.
Basically, I would add a second column METHOD2 (which equals METHOD), and use that to draw lines on each facet. In the call to geom_line using METHOD2, make sure to specify geom_line(data = select(dat, -METHOD), aes(group = METHOD2))
library(tidyverse)
set.seed(123)
dat <- data_frame(x = rnorm(50),
y = 2 * x + rnorm(50),
g = sample(letters[1:3], 50, T))
dat$g2 <- dat$g
ggplot(dat, aes(x = x, y = y))+
geom_line(data = select(dat, -g),
aes(group = g2), alpha = .25)+
geom_line()+
facet_wrap(~g)
Specific code for your case:
dat$METHOD2 <- dat$METHOD
ggplot(dat, aes(x = width, y = n))+
geom_line(data = select(dat, -METHOD), aes(group = METHOD2), alpha = 0.25)+
geom_line()

Related

Adding stat = count on top of histogram in ggplot

I've seen some other examples (especially using geom_col() and stat_bin()) to add frequency or count numbers on top of bars. I'm trying to get this to work with geom_histogram() where I have a discrete (string), not continuous, x variable.
library(tidyverse)
d <- cars |>
mutate( discrete_var = factor(speed))
ggplot(d, aes(x = discrete_var)) +
geom_histogram(stat = "count") +
stat_bin(binwidth=1, geom='text', color='white', aes(label=..count..),
position=position_stack(vjust = 0.5)) +
Gives me an error because StatBin requires a continuous x variable. Any quick fix ideas?
The error message gives you the answer: ! StatBin requires a continuous x variable: the x variable is discrete.Perhaps you want stat="count"?
So instead of stat_bin() use stat_count()
And for further reference here is a reproducible example:
library(tidyverse)
d <- cars |>
mutate( discrete_var = factor(speed))
ggplot(data = d,
aes(x = discrete_var)) +
geom_histogram(stat = "count") +
stat_count(binwidth = 1,
geom = 'text',
color = 'white',
aes(label = ..count..),
position = position_stack(vjust = 0.5))

Connect observations (dots and lines) without using ggpaired

I created a bar chart using geom_bar with "Group" on the x-axis (Female, Male), and "Values" on the y-axis. Group is further subdivided into "Session" such that there is "Session 1" and "Session 2" for both Male and Female (i.e. four bars in total).
Since all participants participated in Session 1 and 2, I overlayed a dotplot (geom_dot) over each of the four bars, to represent the individual data.
I am now trying to connect the observations for all participants ("PID"), between session 1 and 2. In other words, there should be lines connecting several sets of two-points on the "Male" portion of the x-axis (i.e. per participant), and "Female portion".
I tried this with "geom_line" (below) but to no avail (instead, it created a single vertical line in the middle of "Male" and another in the middle of "Female"). I'm not too sure how to fix this.
See code below:
ggplot(data_foo, aes(x=factor(Group),y=Values, colour = factor(Session), fill = factor(Session))) +
geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 1.0, position = "dodge", fill = "black") +
geom_line(aes(group = PID), colour="dark grey") +
labs(title='My Data',x='Group',y='Values') +
theme_light()
Sample data (.txt)
data_foo <- readr::read_csv("PID,Group,Session,Values
P1,F,1,14
P2,F,1,13
P3,F,1,16
P4,M,1,18
P5,F,1,20
P6,M,1,27
P7,M,1,19
P8,M,1,11
P9,F,1,28
P10,F,1,20
P11,F,1,24
P12,M,1,10
P1,F,2,26
P2,F,2,21
P3,F,2,19
P4,M,2,13
P5,F,2,26
P6,M,2,15
P7,M,2,23
P8,M,2,23
P9,F,2,30
P10,F,2,21
P11,F,2,11
P12,M,2,19")
The trouble you have is that you want to dodge by several groups. Your geom_line does not know how to split the Group variable by session. Here are two ways to address this problem. Method 1 is probably the most "ggploty way", and a neat way of adding another grouping without making the visualisation too overcrowded. for method 2 you need to change your x variable
1) Use facet
2) Use interaction to split session for each Group. Define levels for the right bar order
I have also used geom_point instead, because geom_dot is more a specific type of histogram.
I would generally recommend to use boxplots for such plots of values like that, because bars are more appropriate for specific measures such as counts.
Method 1: Facets
library(ggplot2)
ggplot(data_foo, aes(x = Session, y = Values, fill = as.character(Session))) +
geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
geom_line(aes(group = PID)) +
geom_point(aes(group = PID), shape = 21, color = 'black') +
facet_wrap(~Group)
Created on 2020-01-20 by the reprex package (v0.3.0)
Method 2: create an interaction term in your x variable. note that you need to order the factor levels manually.
data_foo <- data_foo %>% mutate(new_x = factor(interaction(Group,Session), levels = c('F.1','F.2','M.1','M.2')))
ggplot(data_foo, aes(x = new_x, y = Values, fill = as.character(Session))) +
geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
geom_line(aes(group = PID)) +
geom_point(aes(group = PID), shape = 21, color = 'black')
Created on 2020-01-20 by the reprex package (v0.3.0)
But everything gets visually not very compelling.
I suggest doing a few visualization tips to have a more informative chart. For example, I feel like having a differentiation of colors for PID will help us track the changes of each participant for different levels of other variables. Something like:
library(ggplot2)
ggplot(data_foo, aes(x = factor(Session), y = Values, fill = factor(Session))) +
geom_bar(stat = "summary", fun.y = "mean", position = "dodge") +
geom_line(aes(group = factor(PID), colour=factor(PID)), size=2, alpha=0.7) +
geom_point(aes(group = factor(PID), colour=factor(PID)), shape = 21, size=2,show.legend = F) +
theme_bw() +
labs(x='Session',fill='Session',colour='PID')+
theme(legend.position="right") +
facet_wrap(~Group)+
scale_colour_discrete(breaks=paste0('P',1:12))
And we have the following plot:
Hope it helps.

Depth Profiling visualization

I'm trying to create a depth profile graph with the variables depth, distance and temperature. The data collected is from 9 different points with known distances between them (distance 5m apart, 9 stations, 9 different sets of data). The temperature readings are according to these 9 stations where a sonde was dropped directly down, taking readings of temperature every 2 seconds. Max depth at each of the 9 stations were taken from the boat also.
So the data I have is:
Depth at each of the 9 stations (y axis)
Temperature readings at each of the 9 stations, at around .2m intervals vertical until the bottom was reached (fill area)
distance between the stations, (x axis)
Is it possible to create a depth profile similar to this? (obviously without the greater resolution in this graph)
I've already tried messing around with ggplot2 and raster but I just can't seem to figure out how to do this.
One of the problems I've come across is how to make ggplot2 distinguish between say 5m depth temperature reading at station 1 and 5m temperature reading at station 5 since they have the same depth value.
Even if you can guide me towards another program that would allow me to create a graph like this, that would be great
[ REVISION ]
(Please comment me if you know more suitable interpolation methods, especially not needing to cut under bottoms data.)
ggplot() needs long data form.
library(ggplot2)
# example data
max.depths <- c(1.1, 4, 4.7, 7.7, 8.2, 7.8, 10.7, 12.1, 14.3)
depth.list <- sapply(max.depths, function(x) seq(0, x, 0.2))
temp.list <- list()
set.seed(1); for(i in 1:9) temp.list[[i]] <- sapply(depth.list[[i]], function(x) rnorm(1, 20 - x*0.5, 0.2))
set.seed(1); dist <- c(0, sapply(seq(5, 40, 5), function(x) rnorm(1, x, 1)))
dist.list <- sapply(1:9, function(x) rep(dist[x], length(depth.list[[x]])))
main.df <- data.frame(dist = unlist(dist.list), depth = unlist(depth.list) * -1, temp = unlist(temp.list))
# a raw graph
ggplot(main.df, aes(x = dist, y = depth, z = temp)) +
geom_point(aes(colour = temp), size = 1) +
scale_colour_gradientn(colours = topo.colors(10))
# a relatively raw graph (don't run with this example data)
ggplot(main.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp)) + # geom_contour() +
scale_fill_gradientn(colours = topo.colors(10))
If you want a graph such like you showed, you have to do interpolation. Some packages give you spatial interpolation methods. In this example, I used akima package but you should think seriously that which interpolation methods to use.
I used nx = 300 and ny = 300 in below code but I think it would be better to decide those values carefully. Large nx and ny gives a high resolution graph, but don't foreget real nx and ny (in this example, real nx is only 9 and ny is 101).
library(akima); library(dplyr)
interp.data <- interp(main.df$dist, main.df$depth, main.df$temp, nx = 300, ny = 300)
interp.df <- interp.data %>% interp2xyz() %>% as.data.frame()
names(interp.df) <- c("dist", "depth", "temp")
# draw interp.df
ggplot(interp.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp)) + # geom_contour() +
scale_fill_gradientn(colours = topo.colors(10))
# to think appropriateness of interpolation (raw and interpolation data)
ggplot(interp.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp), alpha = 0.3) + # interpolation
scale_fill_gradientn(colours = topo.colors(10)) +
geom_point(data = main.df, aes(colour = temp), size = 1) + # raw
scale_colour_gradientn(colours = topo.colors(10))
Bottoms don't match !!I found ?interp says "interpolation only within convex hull!", oops... I'm worrid about the interpolation around the problem-area, is it OK ? If no problem, you need only cut the data under the bottoms. If not, ... I can't answer immediately (below is an example code to cut).
bottoms <- max.depths * -1
# calculate bottom values using linear interpolation
approx.bottoms <- approx(dist, bottoms, n = 300) # n must be the same value as interp()'s nx
# change temp values under bottom into NA
library(dplyr)
interp.cut.df <- interp.df %>% cbind(bottoms = approx.bottoms$y) %>%
mutate(temp = ifelse(depth >= bottoms, temp, NA)) %>% select(-bottoms)
ggplot(interp.cut.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp)) +
scale_fill_gradientn(colours = topo.colors(10)) +
geom_point(data = main.df, size = 1)
If you want to use stat_contour
It is harder to use stat_contour than geom_raster because it needs a regular grid form. As far as I see your graph, your data (depth and distance) don't form a regular grid, it means it is much difficult to use stat_contour with your raw data. So I used interp.cut.df to draw a contour plot. And stat_contour have a endemic problem (see How to fill in the contour fully using stat_contour), so you need to expand your data.
library(dplyr)
# 1st: change NA into a temp's out range value (I used 0)
interp.contour.df <- interp.cut.df
interp.contour.df[is.na(interp.contour.df)] <- 0
# 2nd: expand the df (It's a little complex, so please use this function)
contour.support.func <- function(df) {
colname <- names(df)
names(df) <- c("x", "y", "z")
Range <- as.data.frame(sapply(df, range))
Dim <- as.data.frame(t(sapply(df, function(x) length(unique(x)))))
arb_z = Range$z[1] - diff(Range$z)/20
df2 <- rbind(df,
expand.grid(x = c(Range$x[1] - diff(Range$x)/20, Range$x[2] + diff(Range$x)/20),
y = seq(Range$y[1], Range$y[2], length = Dim$y), z = arb_z),
expand.grid(x = seq(Range$x[1], Range$x[2], length = Dim$x),
y = c(Range$y[1] - diff(Range$y)/20, Range$y[2] + diff(Range$y)/20), z = arb_z))
names(df2) <- colname
return(df2)
}
interp.contour.df2 <- contour.support.func(interp.contour.df)
# 3rd: check the temp range (these values are used to define contour's border (breaks))
range(interp.cut.df$temp, na.rm=T) # 12.51622 20.18904
# 4th: draw ... the bottom border is dirty !!
ggplot(interp.contour.df2, aes(x = dist, y = depth, z = temp)) +
stat_contour(geom="polygon", breaks = seq(12.51622, 20.18904, length = 11), aes(fill = ..level..)) +
coord_cartesian(xlim = range(dist), ylim = range(bottoms), expand = F) + # cut expanded area
scale_fill_gradientn(colours = topo.colors(10)) # breaks's length is 11, so 10 colors are needed
# [Note]
# You can define the contour's border values (breaks) and colors.
contour.breaks <- c(12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5)
# = seq(12.5, 20.5, 1) or seq(12.5, 20.5, length = 9)
contour.colors <- c("darkblue", "cyan3", "cyan1", "green3", "green", "yellow2","pink", "darkred")
# breaks's length is 9, so 8 colors are needed.
# 5th: vanish the bottom border by bottom line
approx.df <- data.frame(dist = approx.bottoms$x, depth = approx.bottoms$y, temp = 0) # 0 is dummy value
ggplot(interp.contour.df2, aes(x = dist, y = depth, z = temp)) +
stat_contour(geom="polygon", breaks = contour.breaks, aes(fill = ..level..)) +
coord_cartesian(xlim=range(dist), ylim=range(bottoms), expand = F) +
scale_fill_gradientn(colours = contour.colors) +
geom_line(data = approx.df, lwd=1.5, color="gray50")
bonus: legend technic
library(dplyr)
interp.contour.df3 <- interp.contour.df2 %>% mutate(temp2 = cut(temp, breaks = contour.breaks))
interp.contour.df3$temp2 <- factor(interp.contour.df3$temp2, levels = rev(levels(interp.contour.df3$temp2)))
ggplot(interp.contour.df3, aes(x = dist, y = depth, z = temp)) +
stat_contour(geom="polygon", breaks = contour.breaks, aes(fill = ..level..)) +
coord_cartesian(xlim=range(dist), ylim=range(bottoms), expand = F) +
scale_fill_gradientn(colours = contour.colors, guide = F) + # add guide = F
geom_line(data = approx.df, lwd=1.5, color="gray50") +
geom_point(aes(colour = temp2), pch = 15, alpha = 0) + # add
guides(colour = guide_legend(override.aes = list(colour = rev(contour.colors), alpha = 1, cex = 5))) + # add
labs(colour = "temp") # add
You want to treat this as a 3-D surface with temperature as the z dimension. The given plot is a contour plot and it looks like ggplot2 can do that with stat_contour.
I'm not sure how the contour lines are computed (often it's linear interpolation along a Delaunay triangulation). If you want more control over how to interpolate between your x/y grid points, you can calculate a surface model first and feed those z coordinates into ggplot2.

Legend not showing. Error in strwidth(legend, units = "user", cex = cex, font = text.font) : plot.new has not been called yet

I have the code below that is a combination of two boxplots and dot plots in one. It is a representation of barring density in 4 different species. The grey depicts the males and the tan the females.
data<-read.csv("C:/Users/Jeremy/Documents/A_Trogon rufus/Black-and-White/BARDATA_boxplots_M.csv")
datF<-read.csv("C:/Users/Jeremy/Documents/A_Trogon rufus/FEMALES_BW&Morphom.csv")
cleandataM<-subset(data, data$Age=="Adult" & data$White!="NA", select=(OTU:Density))
cleandatF<-subset(datF, datF$Age=="Adult", select=(OTU:Density))
dataM<- as.data.frame(cleandataM)
dataF<- as.data.frame(cleandatF)
library(ggplot2)
ggplot(dataM, aes(factor(OTU), Density))+
geom_boxplot(data=dataF,aes(factor(OTU),Density), fill="AntiqueWhite")+
geom_boxplot(fill="lightgrey", alpha=0.5)+
geom_point(data=dataF,position = position_jitter(width = 0.1), colour="tan")+
geom_point(data=dataM, position = position_jitter(width = 0.1), color="DimGrey")+ scale_x_discrete(name="",limits=order)+
scale_y_continuous(name="Bar Density (bars/cm)")+
theme(panel.background = element_blank(),panel.grid.minor=element_blank(),
panel.grid.major=element_blank(),axis.line = element_line(colour = "black"),
axis.title.y = element_text(colour="black", size=14),
axis.text.y = element_text(colour="black", size=12),
axis.text.x = element_text(colour="black", size=14))
This works just fine.
However, when I try to add a legend as:
legend("topright", inset=.01, bty="n", cex=.75, title="Sex",
c("Male", "Female"), fill=c("lightgrey", "black")
It returns the following Error:
Error in strwidth(legend, units = "user", cex = cex, font = text.font) :
plot.new has not been called yet
Please, is there someone who could suggest how to correct this?

Is it possible to have 2 legends for variables when one is continuous and the other is discrete?

I checked a few examples online and I am not sure that it can be done because every plot with 2 different variables (continuous and discrete) has one of 2 options:
legend regarding the continuous variable
legend regarding the discrete variable
Just for visualization, I put here an example. Imagine that I want to have a legend for the blue line. Is it possible to do that??
The easiest approach would be to map it to a different aesthetic than you already use:
library(ggplot2)
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
geom_smooth(method = "loess", aes(linetype = "fit"))
There area also specialised packages for adding additional colour legends:
library(ggplot2)
library(ggnewscale)
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
new_scale_colour() +
geom_smooth(method = "loess", aes(colour = "fit"))
Beware that if you want to tweak colours via a colourscale, you must first add these before calling the new_scale_colour(), i.e.:
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
scale_colour_manual(values = c("red", "green", "blue")) +
new_scale_colour() +
geom_smooth(method = "loess", aes(colour = "fit")) +
scale_colour_manual(values = "purple")
EDIT: To adress comment: yes it is possible with a line that is data independent, I was just re-using the data for brevity of example. See below for arbitrary line (also should work with the ggnewscale approach):
ggplot(mtcars, aes(x = mpg, y = hp)) +
geom_point(aes(colour = as.factor(gear), size = cyl)) +
geom_line(data = data.frame(x = 1:30, y = rnorm(10, 200, 10)),
aes(x, y, linetype = "arbitrary line"))