Multiple Relative frequency histogram in R, ggplot - ggplot2

I'm trying to plot multiple histograms of relative frequencies in R. ggplot

Below are some basic example with the build-in iris dataset. The relative part is obtained by multiplying the density with the binwidth.
library(ggplot2)
ggplot(iris, aes(Sepal.Length, fill = Species)) +
geom_histogram(aes(y = after_stat(density * width)),
position = "identity", alpha = 0.5)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(iris, aes(Sepal.Length)) +
geom_histogram(aes(y = after_stat(density * width))) +
facet_wrap(~ Species)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2022-03-07 by the reprex package (v2.0.1)

Related

ggplot facet different Y axis order based on value

I have a faceted plot wherein I'd like to have the Y-axis labels and the associated values appear in descending order of values (and thereby changing the order of the labels) for each facet. What I have is this, but the order of the labels (and the corresponding values) is the same for each facet.
ggplot(rf,
aes(x = revenues,
y = reorder(AgencyName, revenues))) +
geom_point(stat = "identity",
aes(color = AgencyName),
show.legend = FALSE) +
xlab(NULL) +
ylab(NULL) +
scale_x_continuous(label = scales::comma) +
facet_wrap(~year, ncol = 3, scales = "free_y") +
theme_minimal()
Can someone point me to the solution?
The functions reorder_within and scale_*_reordered from the tidytext package might come in handy.
reorder_within recodes the values into a factor with strings in the form of "VARIABLE___WITHIN". This factor is ordered by the values in each group of WITHIN.
scale_*_reordered removes the "___WITHIN" suffix when plotting the axis labels.
Add scales = "free_y" in facet_wrap to make it work as expected.
Here is an example with generated data:
library(tidyverse)
# Generate data
df <- expand.grid(
year = 2019:2021,
group = paste("Group", toupper(letters[1:8]))
)
set.seed(123)
df$value <- rnorm(nrow(df), mean = 10, sd = 2)
df %>%
mutate(group = tidytext::reorder_within(group, value, within = year)) %>%
ggplot(aes(value, group)) +
geom_point() +
tidytext::scale_y_reordered() +
facet_wrap(vars(year), scales = "free_y")

How to outline a histogram with a color and add a bell curve on ggplot2

I have been trying to add a bell curve to my histogram an outline it with a color so that it is more pleasing. enter image description here
I have added what my histogram looks like to give someone an idea on what I am working with, also here is my code thus far, thank you in advance.
ggplot(data = mammal.data.22.select2)+
geom_histogram(aes(x=Time, fill=Species))+
scale_fill_manual(values=c("paleturquoise4", "turquoise2"))+
facet_wrap(~Species, nrow=1)+
ylab("Observations")+
xlab("Time of Day")+
theme(strip.text.x = element_blank())
Let's build a histogram with a build-in dataset that seems similar-ish to your data structure.
library(ggplot2)
binwidth <- 0.25
p <- ggplot(iris, aes(Petal.Length)) +
geom_histogram(
aes(fill = Species),
binwidth = binwidth,
alpha = 0.5
) +
facet_wrap(~ Species)
You can use stat_bin() + geom_step() to give an outline to the histogram, without colouring the edge of every rectangle in the histogram. The only downside is that the first and last bins don't touch the x-axis.
p + stat_bin(
geom = "step", direction = "mid",
aes(colour = Species), binwidth = binwidth
)
To overlay a density function with a histogram, you could calculate the relevant parameters yourself and use stat_function() with fun = dnorm repeatedly. Alternatively, you can use ggh4x::stat_theodensity() to achieve a similar thing. Note that whether you use stat_function() or stat_theodensity(), you should scale the density back to the counts that your histogram uses (or scale histogram to density). In the example below, we do that by using after_stat(count * binwidth).
p + ggh4x::stat_theodensity(
aes(colour = Species,
y = after_stat(count * binwidth))
)
Created on 2022-04-15 by the reprex package (v2.0.1)
(disclaimer: I'm the author of ggh4x)

Percentage labels in pie chart with ggplot

I'm working now in a statistics project and recently started with R. I have some problems with the visualization. I found a lot of different tutorials about how to add percentage labels in pie charts, but after one hour of trying I still don't get it. Maybe something is different with my data frame so that this doesn't work?
It's a data frame with collected survey answers, so I'm not allowed to publish them here. The column in question (geschäftliche_lage) is a factor with three levels ("Gut", "Befriedigend", "Schlecht"). I want to add percentage labels for each level.
I used the following code in order to create the pie chart:
dataset %>%
ggplot(aes(x= "", fill = geschäftliche_lage)) +
geom_bar(stat= "count", width = 1, color = "white") +
coord_polar("y", start = 0, direction = -1) +
scale_fill_manual(values = c("#00BA38", "#619CFF", "#F8766D")) +
theme_void()
This code gives me the desired pie chart, but without percentage labels. As soon as a I try to add percentage labels, everything is messed up. Do you know a clean code for adding percentage labels?
If you need more information or data, just let me know!
Greetings
Using mtcars as example data. Maybe this what your are looking for:
library(ggplot2)
ggplot(mtcars, aes(x = "", fill = factor(cyl))) +
geom_bar(stat= "count", width = 1, color = "white") +
geom_text(aes(label = scales::percent(..count.. / sum(..count..))), stat = "count", position = position_stack(vjust = .5)) +
coord_polar("y", start = 0, direction = -1) +
scale_fill_manual(values = c("#00BA38", "#619CFF", "#F8766D")) +
theme_void()
Created on 2020-05-25 by the reprex package (v0.3.0)

ggplot2 - add manual legend to multiple layers

I have a ggplot in which I am using color for my geom_points as a function of one of my columns(my treatment) and then I am using the scale_color_manual to choose the colors.
I automatically get my legend right
The problem is I need to graph some horizontal lines that have to do with the experimental set up, which I am doing with geom_vline, but then I don't know how to manually add a separate legend that doesn't mess with the one I already have and that states what those lines are.
I have the following code
ggplot(dcons.summary, aes(x = meters, y = ymean, color = treatment, shape = treatment)) +
geom_point(size = 4) +
geom_errorbar(aes(ymin = ymin, ymax = ymax)) +
scale_color_manual(values=c("navy","seagreen3"))+
theme_classic() +
geom_vline(xintercept = c(0.23,3.23, 6.23,9.23), color= "bisque3", size=0.4) +
scale_x_continuous(limits = c(-5, 25)) +
labs(title= "Sediment erosion", subtitle= "-5 -> 25 meters; standard deviation; consistent measurements BESE & Control", x= "distance (meters)", y="erosion (cm)", color="Treatment", shape="Treatment")
So I would just need an extra legend beneath the "treatment" one that says "BESE PLOTS LOCATION" and that is related to the gray lines
I have been searching for a solution, I've tried using "scale_linetype_manual" and also "guides", but I'm not getting there
As you provided no reproducible example, I used data from the mtcars dataset.
In addition I modified this similar answer a little bit. As you already specified the color and in addition the fill factor is not working here, you can use the linetype as a second parameter within aes wich can be shown in the legend:
xid <- data.frame(xintercept = c(15,20,30), lty=factor(1))
mtcars %>%
ggplot(aes(mpg ,cyl, col=factor(gear))) +
geom_point() +
geom_vline(data=xid, aes(xintercept=xintercept, lty=lty) , col = "red", size=0.4) +
scale_linetype_manual(values = 1, name="",label="BESE PLOTS LOCATION")
Or without the second data.frame:
ggplot() +
geom_point(data = mtcars,aes(mpg ,cyl, col=factor(gear))) +
geom_vline(aes(xintercept=c(15,20,30), lty=factor(1) ), col = "red", size=0.4)+
scale_linetype_manual(values = 1, name="",label="BESE PLOTS LOCATION")

Depth Profiling visualization

I'm trying to create a depth profile graph with the variables depth, distance and temperature. The data collected is from 9 different points with known distances between them (distance 5m apart, 9 stations, 9 different sets of data). The temperature readings are according to these 9 stations where a sonde was dropped directly down, taking readings of temperature every 2 seconds. Max depth at each of the 9 stations were taken from the boat also.
So the data I have is:
Depth at each of the 9 stations (y axis)
Temperature readings at each of the 9 stations, at around .2m intervals vertical until the bottom was reached (fill area)
distance between the stations, (x axis)
Is it possible to create a depth profile similar to this? (obviously without the greater resolution in this graph)
I've already tried messing around with ggplot2 and raster but I just can't seem to figure out how to do this.
One of the problems I've come across is how to make ggplot2 distinguish between say 5m depth temperature reading at station 1 and 5m temperature reading at station 5 since they have the same depth value.
Even if you can guide me towards another program that would allow me to create a graph like this, that would be great
[ REVISION ]
(Please comment me if you know more suitable interpolation methods, especially not needing to cut under bottoms data.)
ggplot() needs long data form.
library(ggplot2)
# example data
max.depths <- c(1.1, 4, 4.7, 7.7, 8.2, 7.8, 10.7, 12.1, 14.3)
depth.list <- sapply(max.depths, function(x) seq(0, x, 0.2))
temp.list <- list()
set.seed(1); for(i in 1:9) temp.list[[i]] <- sapply(depth.list[[i]], function(x) rnorm(1, 20 - x*0.5, 0.2))
set.seed(1); dist <- c(0, sapply(seq(5, 40, 5), function(x) rnorm(1, x, 1)))
dist.list <- sapply(1:9, function(x) rep(dist[x], length(depth.list[[x]])))
main.df <- data.frame(dist = unlist(dist.list), depth = unlist(depth.list) * -1, temp = unlist(temp.list))
# a raw graph
ggplot(main.df, aes(x = dist, y = depth, z = temp)) +
geom_point(aes(colour = temp), size = 1) +
scale_colour_gradientn(colours = topo.colors(10))
# a relatively raw graph (don't run with this example data)
ggplot(main.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp)) + # geom_contour() +
scale_fill_gradientn(colours = topo.colors(10))
If you want a graph such like you showed, you have to do interpolation. Some packages give you spatial interpolation methods. In this example, I used akima package but you should think seriously that which interpolation methods to use.
I used nx = 300 and ny = 300 in below code but I think it would be better to decide those values carefully. Large nx and ny gives a high resolution graph, but don't foreget real nx and ny (in this example, real nx is only 9 and ny is 101).
library(akima); library(dplyr)
interp.data <- interp(main.df$dist, main.df$depth, main.df$temp, nx = 300, ny = 300)
interp.df <- interp.data %>% interp2xyz() %>% as.data.frame()
names(interp.df) <- c("dist", "depth", "temp")
# draw interp.df
ggplot(interp.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp)) + # geom_contour() +
scale_fill_gradientn(colours = topo.colors(10))
# to think appropriateness of interpolation (raw and interpolation data)
ggplot(interp.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp), alpha = 0.3) + # interpolation
scale_fill_gradientn(colours = topo.colors(10)) +
geom_point(data = main.df, aes(colour = temp), size = 1) + # raw
scale_colour_gradientn(colours = topo.colors(10))
Bottoms don't match !!I found ?interp says "interpolation only within convex hull!", oops... I'm worrid about the interpolation around the problem-area, is it OK ? If no problem, you need only cut the data under the bottoms. If not, ... I can't answer immediately (below is an example code to cut).
bottoms <- max.depths * -1
# calculate bottom values using linear interpolation
approx.bottoms <- approx(dist, bottoms, n = 300) # n must be the same value as interp()'s nx
# change temp values under bottom into NA
library(dplyr)
interp.cut.df <- interp.df %>% cbind(bottoms = approx.bottoms$y) %>%
mutate(temp = ifelse(depth >= bottoms, temp, NA)) %>% select(-bottoms)
ggplot(interp.cut.df, aes(x = dist, y = depth, z = temp)) +
geom_raster(aes(fill = temp)) +
scale_fill_gradientn(colours = topo.colors(10)) +
geom_point(data = main.df, size = 1)
If you want to use stat_contour
It is harder to use stat_contour than geom_raster because it needs a regular grid form. As far as I see your graph, your data (depth and distance) don't form a regular grid, it means it is much difficult to use stat_contour with your raw data. So I used interp.cut.df to draw a contour plot. And stat_contour have a endemic problem (see How to fill in the contour fully using stat_contour), so you need to expand your data.
library(dplyr)
# 1st: change NA into a temp's out range value (I used 0)
interp.contour.df <- interp.cut.df
interp.contour.df[is.na(interp.contour.df)] <- 0
# 2nd: expand the df (It's a little complex, so please use this function)
contour.support.func <- function(df) {
colname <- names(df)
names(df) <- c("x", "y", "z")
Range <- as.data.frame(sapply(df, range))
Dim <- as.data.frame(t(sapply(df, function(x) length(unique(x)))))
arb_z = Range$z[1] - diff(Range$z)/20
df2 <- rbind(df,
expand.grid(x = c(Range$x[1] - diff(Range$x)/20, Range$x[2] + diff(Range$x)/20),
y = seq(Range$y[1], Range$y[2], length = Dim$y), z = arb_z),
expand.grid(x = seq(Range$x[1], Range$x[2], length = Dim$x),
y = c(Range$y[1] - diff(Range$y)/20, Range$y[2] + diff(Range$y)/20), z = arb_z))
names(df2) <- colname
return(df2)
}
interp.contour.df2 <- contour.support.func(interp.contour.df)
# 3rd: check the temp range (these values are used to define contour's border (breaks))
range(interp.cut.df$temp, na.rm=T) # 12.51622 20.18904
# 4th: draw ... the bottom border is dirty !!
ggplot(interp.contour.df2, aes(x = dist, y = depth, z = temp)) +
stat_contour(geom="polygon", breaks = seq(12.51622, 20.18904, length = 11), aes(fill = ..level..)) +
coord_cartesian(xlim = range(dist), ylim = range(bottoms), expand = F) + # cut expanded area
scale_fill_gradientn(colours = topo.colors(10)) # breaks's length is 11, so 10 colors are needed
# [Note]
# You can define the contour's border values (breaks) and colors.
contour.breaks <- c(12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5)
# = seq(12.5, 20.5, 1) or seq(12.5, 20.5, length = 9)
contour.colors <- c("darkblue", "cyan3", "cyan1", "green3", "green", "yellow2","pink", "darkred")
# breaks's length is 9, so 8 colors are needed.
# 5th: vanish the bottom border by bottom line
approx.df <- data.frame(dist = approx.bottoms$x, depth = approx.bottoms$y, temp = 0) # 0 is dummy value
ggplot(interp.contour.df2, aes(x = dist, y = depth, z = temp)) +
stat_contour(geom="polygon", breaks = contour.breaks, aes(fill = ..level..)) +
coord_cartesian(xlim=range(dist), ylim=range(bottoms), expand = F) +
scale_fill_gradientn(colours = contour.colors) +
geom_line(data = approx.df, lwd=1.5, color="gray50")
bonus: legend technic
library(dplyr)
interp.contour.df3 <- interp.contour.df2 %>% mutate(temp2 = cut(temp, breaks = contour.breaks))
interp.contour.df3$temp2 <- factor(interp.contour.df3$temp2, levels = rev(levels(interp.contour.df3$temp2)))
ggplot(interp.contour.df3, aes(x = dist, y = depth, z = temp)) +
stat_contour(geom="polygon", breaks = contour.breaks, aes(fill = ..level..)) +
coord_cartesian(xlim=range(dist), ylim=range(bottoms), expand = F) +
scale_fill_gradientn(colours = contour.colors, guide = F) + # add guide = F
geom_line(data = approx.df, lwd=1.5, color="gray50") +
geom_point(aes(colour = temp2), pch = 15, alpha = 0) + # add
guides(colour = guide_legend(override.aes = list(colour = rev(contour.colors), alpha = 1, cex = 5))) + # add
labs(colour = "temp") # add
You want to treat this as a 3-D surface with temperature as the z dimension. The given plot is a contour plot and it looks like ggplot2 can do that with stat_contour.
I'm not sure how the contour lines are computed (often it's linear interpolation along a Delaunay triangulation). If you want more control over how to interpolate between your x/y grid points, you can calculate a surface model first and feed those z coordinates into ggplot2.