Barplot of percentages by groups in ggplot2

I've searched but cannot find a solution to this problem I have with a bar plot in ggplot2.
I'm trying to make the bars show the percentage of the total number of cases in each group of grouping variable 2.
Right now it visualises the raw counts.
Data frame = ASAP
Grouping variable 1 - cc_groups (shown at the top of the graph)
(the number of cases within each range (steps of 20) of a score from 0-100)
Grouping variable 2 - asap
(a binary variable, either intervention or control; the numbers of controls and interventions are not the same)
Initial code
``` r
library(ggplot2)

ggplot(ASAP, aes(x = asap, fill = asap)) +
  geom_bar(position = "dodge") +
  facet_grid(. ~ cc_groups) +
  scale_fill_manual(values = c("red", "darkgray"))
```
Created on 2020-05-19 by the reprex package (v0.3.0)
This gives me a graph that visualises the counts in each subgroup.
I have manually calculated the percentages that actually need to be visualised:
table_groups <- matrix(c(66/120, 128/258,
                         34/120,  67/258,
                         10/120,  30/258,
                          2/120,   4/258,
                          0,       1/258,
                          8/120,  28/258),
                       ncol = 2, byrow = TRUE)
colnames(table_groups) <- c("ASAP", "Control")
rownames(table_groups) <- c("0-10", "20-39", "40-59", "60-79", "80-99", "100")
         ASAP  Control
0-10  0.55000 0.496124
20-39 0.28333 0.259690
40-59 0.08333 0.116279
60-79 0.01667 0.015504
80-99 0.00000 0.003876
100   0.06667 0.108527
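As a cross-check, the manual table above can be reproduced from the raw counts with base R's prop.table(). This is only a sketch: the count matrix is rebuilt from the numerators quoted in the question, and the object names counts and pct are assumptions, not part of the original data.

```r
# Counts per score range, rebuilt from the fractions in the question
# (hypothetical object; the real ASAP data frame is not available here)
counts <- matrix(
  c(66, 128,
    34,  67,
    10,  30,
     2,   4,
     0,   1,
     8,  28),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("0-10", "20-39", "40-59", "60-79", "80-99", "100"),
                  c("ASAP", "Control"))
)

# margin = 2 divides each column by its own total (120 and 258),
# i.e. percentages within each asap group, not within each score range
pct <- prop.table(counts, margin = 2)
round(pct, 4)
```

The choice of margin is the whole point here: margin = 1 would instead give percentages within each score range, which is what the faceted plot in the question ended up showing.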
When I use the solution provided by Stefan below (which was an excellent answer but didn't quite do the trick), I get the following output:
``` r
library(ggplot2)
library(dplyr)
ASAP %>%
  count(cc_groups, asap) %>%
  group_by(cc_groups) %>%
  mutate(pct = n / sum(n)) %>%
  ggplot(aes(x = asap, y = pct, fill = asap)) +
  geom_col(position = "dodge") +
  facet_grid(~cc_groups) +
  scale_fill_manual(values = c("red", "darkgray"))
```
Created on 2020-05-19 by the reprex package (v0.3.0)
whereas (going analogue) I'd like it to show the percentages calculated above, as in my hand drawing.
I'm so sorry about that drawing :) and reprex kept feeding me errors; I'm sure I'm using it incorrectly.

The easiest way to achieve this is via aggregating the data before plotting, i.e. manually computing counts and percentages:
library(ggplot2)
library(dplyr)
ASAP %>%
  count(cc_groups, asap) %>%
  group_by(asap) %>%
  mutate(pct = n / sum(n)) %>%
  ggplot(aes(x = asap, y = pct, fill = asap)) +
  geom_col(position = "dodge") +
  facet_grid(~cc_groups) +
  scale_fill_manual(values = c("red", "darkgray"))
Using ggplot2::mpg as example data:
library(ggplot2)
library(dplyr)
# example data
mpg2 <- mpg %>%
  filter(cyl %in% c(4, 6)) %>%
  mutate(cyl = factor(cyl))
# Manually compute counts and percentages.
# Group by cyl (the analogue of asap) so the percentages are taken
# within each cyl group, matching the ASAP code above
mpg3 <- mpg2 %>%
  count(class, cyl) %>%
  group_by(cyl) %>%
  mutate(pct = n / sum(n))
# Plot
ggplot(mpg3, aes(x = cyl, y = pct, fill = cyl)) +
  geom_col(position = "dodge") +
  facet_grid(~ class) +
  scale_fill_manual(values = c("red", "darkgray"))
Created on 2020-05-18 by the reprex package (v0.3.0)
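The sketch below groups by cyl, the analogue of asap in the question, so that each cyl group sums to 100 %, and labels the y-axis as percentages. The percent axis labels and the object name mpg_pct are illustrative additions; the scales package is assumed to be installed.

```r
library(ggplot2)
library(dplyr)

mpg_pct <- mpg %>%
  filter(cyl %in% c(4, 6)) %>%
  mutate(cyl = factor(cyl)) %>%
  count(class, cyl) %>%
  # Group by the fill variable so each cyl group sums to 100 %
  group_by(cyl) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

p <- ggplot(mpg_pct, aes(x = cyl, y = pct, fill = cyl)) +
  geom_col(position = "dodge") +
  facet_grid(~ class) +
  # Show the y-axis as percentages instead of proportions
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("red", "darkgray"))
p
```

A quick sanity check before plotting is to sum pct within each cyl group; both sums should come out as 1.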

Related

plot gam results with original x values (not scaled and centred)

I have a dataset that I am modeling with a GAM. Because there are two continuous variables in the GAM, I have centred and scaled these variables before adding them to the model. Therefore, when I use the built-in features in gratia to show the results, the x values are not on the original scale. I'd like to plot the results using the scale of the original data.
An example:
library(tidyverse)
library(mgcv)
library(gratia)
set.seed(42)
df <- data.frame(
  doy = sample.int(90, 300, replace = TRUE),
  year = sample(c(1980:2020), size = 300, replace = TRUE),
  site = c(rep("A", 150), rep("B", 80), rep("C", 70)),
  sex = sample(c("F", "M"), size = 300, replace = TRUE),
  mass = rnorm(300, mean = 500, sd = 50)) %>%
  mutate(doy.s = scale(doy, center = TRUE, scale = TRUE),
         year.s = scale(year, center = TRUE, scale = TRUE),
         across(c(sex, site), as.factor))
m1 <- gam(mass ~
            s(year.s, site, bs = "fs", by = sex, k = 5) +
            s(doy.s, site, bs = "fs", by = sex, k = 5) +
            s(sex, bs = "re"),
          data = df, method = "REML", family = gaussian)
draw(m1)
How do I re-plot the last two panels in this figure to show the relationship between year and mass with ggplot?
You can't do this with gratia::draw automatically (unless I'm mistaken).* But you can use gratia::smooth_estimates to get a dataframe which you can then do whatever you like with.
To answer your specific question: to re-plot the last two panels of the plot you provided, but with year unscaled, you can do the following
# Get a tibble of smooth estimates from the model
sm <- gratia::smooth_estimates(m1)
# Add a new column for the unscaled year
sm <- sm %>% mutate(year = mean(df$year) + (year.s * sd(df$year)))
# Plot the smooth s(year.s,site) for sex=F with year unscaled
pF <- sm %>%
  filter(smooth == "s(year.s,site):sexF") %>%
  ggplot(aes(x = year, y = est, color = site)) +
  geom_line() +
  theme(legend.position = "none") +
  labs(y = "Partial effect", title = "s(year.s,site)", subtitle = "By: sex; F")
# Plot the smooth s(year.s,site) for sex=M with year unscaled
pM <- sm %>%
  filter(smooth == "s(year.s,site):sexM") %>%
  ggplot(aes(x = year, y = est, color = site)) +
  geom_line() +
  theme(legend.position = "none") +
  labs(y = "Partial effect", title = "s(year.s,site)", subtitle = "By: sex; M")
library(patchwork) # use `patchwork` just for easy side-by-side plots
pF + pM
to get:
EDIT: If you also want to shift the result on the y-axis, as @GavinSimpson (who is the author and maintainer of gratia) mentioned, you can do this with add_constant, adding this code before the plotting code above:
sm <- sm %>%
  add_constant(coef(m1)["(Intercept)"]) %>%
  transform_fun(inv_link(m1))
[You should also in general untransform the smooth by the inverse of the model's link function. In your case this is just the identity, so it is not necessary, but in general it would be. That's what the second step above is doing.]
In your example, this results in:
*As mentioned in the custom-plotting vignette for gratia, the goal of draw is not to be fully customizable, but to be a useful default. See there for recommendations about custom plots.
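The unscaling step used for the year column above is just the inverse of scale(): original = centre + scaled * spread. A minimal base R sketch of that round trip, using a stand-alone year vector as a stand-in for df$year (the variable names here are illustrative):

```r
set.seed(42)
year <- sample(1980:2020, size = 300, replace = TRUE)

# scale() stores the centring and scaling constants as attributes
year_s <- scale(year, center = TRUE, scale = TRUE)
centre <- attr(year_s, "scaled:center")
spread <- attr(year_s, "scaled:scale")

# Invert the transformation: original = centre + scaled * spread
year_back <- centre + as.numeric(year_s) * spread

# Should report TRUE if the round trip recovers the original values
all.equal(year_back, as.numeric(year))
```

Reading the constants from the attributes of the scaled column, rather than recomputing mean() and sd(), can be a little safer if the data frame has been filtered between scaling and plotting.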

Add space argument to facet_wrap

facet_wrap() is known not to have a space = "free" argument (https://github.com/tidyverse/ggplot2/issues/2933). This can cause spacing issues on the y-axis of plots.
Create the above figure using the following code:
library(tidyverse)
p <-
  mtcars %>%
  rownames_to_column() %>%
  ggplot(aes(x = disp, y = rowname)) + geom_point() +
  facet_wrap(~ carb, ncol = 1, scales = "free_y")
facet_grid(), on the other hand, has a space = "free" argument, which allows for nice y-axis spacing.
Create the above figure using the following code:
p <-
  mtcars %>%
  rownames_to_column() %>%
  ggplot(aes(x = disp, y = rowname)) + geom_point() +
  facet_grid(carb ~ ., scales = "free_y", space = "free_y")
The issue with this is that the label is on the side, not the top. I sometimes have longer facet labels and few rows in the facet. This means the facet label gets cut off.
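For context, space = "free_y" sizes each panel in proportion to the length of its y scale, which for a discrete axis like this one roughly tracks the number of rows in the panel. A small base R sketch of those relative sizes follows; the heights ggplot2 actually draws also depend on scale expansion, so treat the shares as approximate.

```r
# Number of cars (rows on the y-axis) in each carb panel
rows_per_panel <- table(mtcars$carb)

# Approximate share of the total plot height each panel receives
round(rows_per_panel / sum(rows_per_panel), 2)
```

With facet_wrap() every panel gets the same height regardless of these counts, which is why the panels with a single car look stretched.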
There is a solution from the ggforce package (comment by ilarischeinin on https://github.com/tidyverse/ggplot2/issues/2933).
p <-
  mtcars %>%
  rownames_to_column() %>%
  ggplot(aes(x = disp, y = rowname)) + geom_point()
p + ggforce::facet_col(vars(carb), scales = "free_y", space = "free")
But there are limitations to leaving ggplot2. For example, I ultimately want a two-column figure, and this does not seem possible with ggforce. Is there any way to produce the same result using facet_wrap() so that I can use the ncol argument?
Here is a potential workaround based on https://stackoverflow.com/a/29022188/12957340 :
library(tidyverse)
library(gtable)
library(grid)
p1 <- mtcars %>%
  rownames_to_column() %>%
  ggplot(aes(x = disp, y = rowname)) + geom_point() +
  facet_grid(carb ~ ., scales = "free_y", space = "free_y") +
  theme(panel.spacing = unit(1, 'lines'),
        strip.text.y = element_text(angle = 0))
gt <- ggplotGrob(p1)
# Add an empty row above each panel to hold its strip label
panels <- c(subset(gt$layout, grepl("panel", gt$layout$name), select = t:r))
for (i in rev(panels$t - 1)) {
  gt <- gtable_add_rows(gt, unit(0.5, "lines"), i)
}
# Copy the right-hand strip text into the new rows above the panels
panels <- c(subset(gt$layout, grepl("panel", gt$layout$name), select = t:r))
strips <- c(subset(gt$layout, grepl("strip-r", gt$layout$name), select = t:r))
stripText <- gtable_filter(gt, "strip-r")
for (i in seq_along(strips$t)) {
  gt <- gtable_add_grob(gt, stripText$grobs[[i]]$grobs[[1]], t = panels$t[i] - 1, l = 5)
}
# Drop the original strip column and tidy the row heights
gt <- gt[, -6]
for (i in panels$t) {
  gt$heights[i - 1] <- unit(0.8, "lines")
  gt$heights[i - 2] <- unit(0.2, "lines")
}
grid.newpage()
grid.draw(gt)
Created on 2021-12-15 by the reprex package (v2.0.1)
It's not clear to me what you mean by "I ultimately want a two column figure", but if you can come up with an example to illustrate your 'ultimate' expected outcome I can try to adapt this approach and see if it will work or not.

For loop to read in multiple tables from SQLite database

I would like to create a for loop that reads in multiple tables from a SQLite database. It could read the first 300 tables, but ideally I would like it to read 300 random tables from my database into R.
For each table read in, I would like it to run the code below, save the graph at the end, and then start over with a new table. If possible, I would like all of the tables to be on the same graph. I have written the code for a single table, but I am unsure how to proceed from here.
# dbReadTable() looks tables up by name, so list the names first
tables <- dbListTables(mydb)
for (i in 1:300) {
  # Reads the selected table in database
  ind1 <- dbReadTable(mydb, tables[i])
  # Formats the SQL data to appropriate R data structure
  cols <- c("Mortality", "AnimalID", "Species", "Sex", "CurrentCohort",
            "BirthYear", "CaptureUnit", "CaptureSubunit",
            "CaptureArea", "ProjectName")
  ind1[cols] <- lapply(ind1[cols], factor) ## as.factor() could also be used
  ind1$DateAndTime <- as.POSIXct(ind1$DateAndTime, tz = "UTC",
                                 origin = '1970-01-01')
  # Converts the Longitude and Latitude to UTMs
  ind <- convert_utm(ind1)
  ind_steps <- ind %>%
    # It's always a good idea to *double check* that your data are sorted
    # properly before using lag() or lead() to get the previous/next value.
    arrange(AnimalID, DateAndTime) %>%
    # If we group_by() AnimalID, lead() will insert NAs in the proper
    # places when we get to the end of one individual's data and the beginning
    # of the next
    group_by(AnimalID) %>%
    # Now rename our base columns to reflect that they are the step's start point
    rename(x1 = utm_x,
           y1 = utm_y,
           t1 = DateAndTime) %>%
    # Attach the step's end point
    mutate(x2 = lead(x1),
           y2 = lead(y1),
           t2 = lead(t1)) %>%
    # Calculate differences in space and time
    mutate(dx = x2 - x1,
           dy = y2 - y1,
           DateAndTime = as.numeric(difftime(t2, t1, units = "hours"))) %>%
    # Calculate step length
    mutate(sl = sqrt(dx^2 + dy^2)) %>%
    # Calculate absolute angle
    mutate(abs_angle = (pi/2 - atan2(dy, dx)) %% (2*pi)) %>%
    # Calculate relative angle
    mutate(rel_diff = (abs_angle - lag(abs_angle)) %% (2*pi),
           rel_angle = ifelse(rel_diff > pi, rel_diff - 2*pi, rel_diff)) %>%
    # Drop this unnecessary column
    select(-rel_diff) %>%
    # Drop incomplete final step
    filter(!is.na(x2))
  ind_steps <- ind_steps %>%
    mutate(NSD = (x2 - x1[1])^2 + (y2 - y1[1])^2)
  # Plot NSD
  ind_steps %>%
    ggplot(aes(x = t2, y = NSD)) +
    geom_line() +
    theme_bw()
}
Any help would be greatly appreciated!
If there are, say, 1000 tables, you can use sample() on the table names to pick 300 of them at random, create a list of length 300 to store the plots, and, if you want to plot them together, use cowplot::plot_grid.
# dbReadTable() expects a table name, so sample from the names
tables <- dbListTables(mydb)
# sample() without replacement, so no table is picked twice
random_tables <- sample(tables, 300)
plot_list <- vector('list', length(random_tables))
for (i in seq_along(random_tables)) {
  # Reads the selected table in database
  ind1 <- dbReadTable(mydb, random_tables[i])
  #...Rest of the code
  #....
  #....
  # Plot NSD
  plot_list[[i]] <- ggplot(ind_steps, aes(x = t2, y = NSD)) +
    geom_line() + theme_bw()
}
cowplot::plot_grid(plotlist = plot_list, nrow = 30, ncol = 10)
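A self-contained sketch of the same idea, using an in-memory SQLite database so it can be run anywhere. The table names, columns, and sizes are made up for illustration, and the DBI and RSQLite packages are assumed to be installed. It samples table names rather than positions, and it stacks the sampled tables so they can share one graph instead of a grid of 300 panels.

```r
library(DBI)
library(ggplot2)

# Hypothetical database with 20 small tables (stand-in for the real one)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
for (k in 1:20) {
  dbWriteTable(con, paste0("animal_", k),
               data.frame(t2 = 1:10, NSD = cumsum(runif(10))))
}

# Pick 5 distinct tables at random, by name
picked <- sample(dbListTables(con), 5)

# Read each table and record which table every row came from
tracks <- lapply(picked, function(tbl) {
  d <- dbReadTable(con, tbl)
  d$table <- tbl
  d
})
all_tracks <- do.call(rbind, tracks)
dbDisconnect(con)

# One graph, one line per table
p <- ggplot(all_tracks, aes(x = t2, y = NSD, colour = table)) +
  geom_line() +
  theme_bw()
p
```

With 300 tables the colour legend would be unreadable, so in that case it may be better to map group = table instead of colour and drop the legend.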

double geom_bar, how to get the values for each bar

I have a ggplot of countries (X axis) over two different time periods (Y axis), so double bar for each country.
I would like to see the values of each bar. I used geom_text, but the values for both bars end up on the same line, so they are not in the right place. How can I use geom_text for this type of plot?
Rcountry %>%
  gather("Type", "Value", -Country) %>%
  ggplot(aes(Country, Value, fill = Type)) +
  geom_bar(position = "dodge", stat = "identity") +
  coord_flip() +
  theme_minimal() + scale_fill_grey() +
  theme(legend.position = "bottom") +
  theme(legend.title = element_blank()) +
  scale_fill_manual(values = c("darkslategray4", "darkslategrey")) +
  labs(x = "Country", y = "Stock of robots per thousands worker in '000") +
  geom_text(aes(label = c(X2010, X2018)), size = 3.5)
Thank you
This can be achieved by adding position = position_dodge(.9) to geom_text, i.e. you have to apply the same positioning used in geom_bar to geom_text to get the labels in the right place. Using mtcars as example data, try this:
library(ggplot2)
library(dplyr)
mtcars2 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(mpg = mean(mpg)) %>%
  ungroup()
ggplot(mtcars2, aes(x = factor(cyl), mpg, fill = factor(gear))) +
  geom_bar(position = "dodge", stat = "identity") +
  theme_minimal() +
  scale_fill_grey() +
  theme(legend.position = "bottom") +
  theme(legend.title = element_blank()) +
  labs(x = "Country", y = "Stock of robots per thousands worker in '000") +
  geom_text(aes(label = mpg), position = position_dodge(.9), size = 3.5) +
  coord_flip()
Created on 2020-04-15 by the reprex package (v0.3.0)
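Closer to the shape of the data in the question, a sketch with a small made-up long-format data frame (the robots object and its values are hypothetical). The label comes from the Value column itself, and both layers use the same dodge width so each label sits on its own bar.

```r
library(ggplot2)

# Hypothetical long-format data: one row per country and period
robots <- data.frame(
  Country = rep(c("A", "B"), each = 2),
  Type    = rep(c("X2010", "X2018"), times = 2),
  Value   = c(1.2, 2.5, 0.8, 1.9)
)

p <- ggplot(robots, aes(Country, Value, fill = Type)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # Same dodge width as the bars; hjust nudges labels past the bar ends
  geom_text(aes(label = Value),
            position = position_dodge(width = 0.9),
            hjust = -0.2, size = 3.5) +
  coord_flip()
p
```

Because of coord_flip(), the value axis is horizontal, so hjust (not vjust) is the setting that moves the labels away from the bar ends.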

Adding percentage labels to a barplot with y-axis 'count' in R

I'd like to add percentage labels per gear to the bars but keep the count y-scale.
E.g. 10% of all 'gear 3' are '4 cyl'
library(ggplot2)
ds <- mtcars
ds$gear <- as.factor(ds$gear)
p1 <- ggplot(ds, aes(gear, fill = gear)) +
  geom_bar() +
  facet_grid(cols = vars(cyl), margins = TRUE)
p1
Ideally only in ggplot, without adding dplyr or tidy. I found some of these solutions, but then I get other issues with my original data.
EDIT: It was suggested that this is a duplicate of another question. I saw that one earlier as well, but wasn't able to integrate its code into what I want:
# I just copy-pasted some of the code bits to try to reconstruct what I had earlier
ggplot(ds, aes(gear, fill = gear)) +
  facet_grid(cols = vars(cyl), margins = TRUE) +
  # ..prop.. meaning %, but I want to keep the y-axis as count
  geom_bar(aes(y = ..prop.., fill = factor(..x..)), stat = "count") +
  # not sure why, but I only get 100%
  geom_text(aes(label = scales::percent(..prop..),
                y = ..prop..), stat = "count", vjust = -.5)
The issue is that ggplot doesn't know that each facet is one group. This very useful tutorial helps with a nice solution: just add aes(group = 1).
P.S. At the beginning, I was often quite reluctant, even afraid, to manipulate my data and pre-calculate data frames for plotting. But there is no need to fret! It is often much easier (and safer!) to first shape or aggregate your data into the right form and then plot or analyse the new data.
library(tidyverse)
library(scales)
ds <- mtcars
ds$gear <- as.factor(ds$gear)
First solution:
ggplot(ds, aes(gear, fill = gear)) +
  geom_bar() +
  facet_grid(cols = vars(cyl), margins = TRUE) +
  geom_text(aes(label = scales::percent(..prop..), group = 1), stat = "count")
edit to reply to comment
Showing percentages across facets is quite confusing to the reader of the figure and I would probably recommend against such a visualization. You won't get around data manipulation here. The challenge is here to include your "facet margin". I create two summary data frames and bind them together.
ds_count <-
  ds %>%
  count(cyl, gear) %>%
  group_by(gear) %>%
  mutate(perc = n / sum(n)) %>%
  ungroup() %>%
  mutate(cyl = as.character(cyl))
ds_all <-
  ds %>%
  count(cyl, gear) %>%
  group_by(gear) %>%
  summarise(n = sum(n)) %>%
  mutate(cyl = 'all', perc = 1)
ds_new <- bind_rows(ds_count, ds_all)
ggplot(ds_new, aes(gear, fill = gear)) +
  geom_col(aes(y = n)) +
  facet_grid(cols = vars(cyl)) +
  # Place each label at the top of its bar (y = n) rather than at a row count
  geom_text(aes(y = n, label = scales::percent(perc)), vjust = -0.5)
IMO, a better way would be to simply swap the x and facetting variables. Then you can use ggplot's summarising function as above.
ggplot(ds, aes(as.character(cyl), fill = gear)) +
  geom_bar() +
  facet_grid(cols = vars(gear), margins = TRUE) +
  geom_text(aes(label = scales::percent(..prop..), group = 1), stat = "count")
Created on 2020-02-07 by the reprex package (v0.3.0)
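In ggplot2 3.3.0 and later, the ..prop.. notation used above has a documented replacement, after_stat(prop). A sketch of the first solution in that style follows. The facet margins are left out for brevity, fill is mapped only in geom_bar() so the text layer has a single group, and the scales package is assumed to be installed.

```r
library(ggplot2)

ds <- mtcars
ds$gear <- as.factor(ds$gear)

p <- ggplot(ds, aes(gear)) +
  geom_bar(aes(fill = gear)) +
  facet_grid(cols = vars(cyl)) +
  # group = 1 makes prop the share of each bar within its facet,
  # while the y-axis of the bars stays on the count scale
  geom_text(aes(label = scales::percent(after_stat(prop)), group = 1),
            stat = "count", vjust = -0.5)
p
```

Within each cyl facet the labels add up to 100 %, which is the "percentage within facet" reading; percentages per gear across facets still need the pre-aggregated data frame approach shown earlier.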