Sorting a dataframe to create a "quilt" chart using geom_tile in ggplot2 - ggplot2

I am looking to create a "quilt" chart that would show the best performing category in each year. This is a fairly common chart, but I have no idea how to produce this in R. Below is a simple reproducible example.
fruit <- c("Apples", "Oranges", "Pears", "Peaches")
yield2020 <- c("200", "100", "250", "125")
yield2021 <- c("40", "90", "85", "100")
yield2022 <- c("150", "110", "150", "170")
DF <- data.frame(fruit, yield2020, yield2021, yield2022)
As you can see in DF, each year, Apples, Oranges, Peaches, and Pears have different output levels. I'm looking to create a geom_tile chart that would show Pears as the top performer in 2020, Peaches in 2021, and Peaches again in 2022, with the other fruit groups shown, color coded, below it. Any advice is greatly appreciated!
I found the below example of a different quilt chart, which might help frame my ultimate goal here

One option to achieve your desired result would be to reshape your data to long format and add a column with the rank per year, which can then be mapped to the y aesthetic. Additionally, I use scale_y_reverse() to put the top performers at the top of the chart:
library(dplyr)
library(tidyr)
library(ggplot2)
df <- DF |>
  pivot_longer(-fruit, names_to = "year", names_prefix = "yield") |>
  mutate(across(c(year, value), as.numeric)) |>
  group_by(year) |>
  mutate(rank = rank(-value, ties.method = "first"))

ggplot(df, aes(year, rank, fill = fruit)) +
  geom_tile() +
  geom_text(aes(label = paste(fruit, value, sep = "\n"))) +
  scale_y_reverse() +
  guides(fill = "none")
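As a quick sanity check on the ranking step, the sketch below rebuilds the example data with the lowercase column name fruit (which is what the pipeline above expects) and pulls out the rank-1 row for each year. The object names ranked and top are only illustrative. Note that ties.method = "first" breaks the 2022 tie between Apples and Pears by row order, so Apples ends up above Pears.

```r
library(dplyr)
library(tidyr)

# Example data from the question; yields are stored as strings on purpose
DF <- data.frame(
  fruit     = c("Apples", "Oranges", "Pears", "Peaches"),
  yield2020 = c("200", "100", "250", "125"),
  yield2021 = c("40", "90", "85", "100"),
  yield2022 = c("150", "110", "150", "170")
)

# Reshape to long format and rank within each year (1 = highest yield)
ranked <- DF |>
  pivot_longer(-fruit, names_to = "year", names_prefix = "yield") |>
  mutate(across(c(year, value), as.numeric)) |>
  group_by(year) |>
  mutate(rank = rank(-value, ties.method = "first")) |>
  ungroup()

# Keep only the top performer per year, ordered by year
top <- ranked |>
  filter(rank == 1) |>
  arrange(year)

print(top)
```

The rows in top should match the top tile of each column in the chart.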

Related

Use geom_label_repel only for certain observations?

I'm a football data analyst using NFL team logos as my points on a scatterplot. However, these images will sometimes cover each other up. I want to find a way to repel a label for those images that are overlapping with one another. However, I only want to have a repelled label for points where the team image is not fully visible. Is there a way to have R only insert labels for a few datapoints? I've attached an image below in which all datapoints have a label attached. My current call to geom_label_repel is:
ggplot + geom_label_repel(label.size = 0.1)
Any advice is greatly appreciated!
You could do something like this to choose the labels you want:
library(tidyverse)
library(ggrepel)
df <- tribble(~team, ~aay, ~epa,
"LA", 8, 5,
"PIT", 6, -2,
"KC", 7, 5,
"DAL", 7, 5
)
# Select desired labels
labels <- df |> filter(team %in% c("KC", "DAL"))
df |>
  ggplot(aes(aay, epa)) +
  geom_point() +
  geom_label_repel(aes(label = team), data = labels, force = 20) +
  xlim(c(0, 10)) +
  ylim(c(-8, 8))
Created on 2022-05-25 by the reprex package (v2.0.1)
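If you would rather not pick the teams by hand, one possible approach is to flag points that sit close to another point and label only those. The sketch below uses base R only. The threshold value and the crowded column are assumptions made for illustration, and the right threshold depends on your axis scales and logo size.

```r
# Same toy data as above, as a plain data frame
df <- data.frame(
  team = c("LA", "PIT", "KC", "DAL"),
  aay  = c(8, 6, 7, 7),
  epa  = c(5, -2, 5, 5)
)

# Assumed cut-off: points closer than this (in data units) count as overlapping
threshold <- 0.5

# Pairwise Euclidean distances between all points
d <- as.matrix(dist(df[, c("aay", "epa")]))
diag(d) <- Inf  # ignore each point's distance to itself

# A point is "crowded" if its nearest neighbour is within the threshold
df$crowded <- apply(d, 1, min) < threshold

# Subset to pass as data = labels to geom_label_repel()
labels <- df[df$crowded, ]
print(labels)
```

Here KC and DAL share the same coordinates, so they are the only rows selected. If the x and y axes are on very different scales, you would probably want to rescale both columns before computing distances.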

Grouping boxplot together

I have the following data and would like to create a group boxplot. I created a bargraph in excel and would like to create the boxplot in the exact same way using R (see bargraph here). I tried using ggplot but was unsuccessful. Any help will be appreciated. Thank you.
Fruit spring summer fall
Banana 19.36 91.51 49.99
Apple 65.27 51.55 42.83
orange 16.21 94.71 62.33
It's not clear what exactly you are looking for. To have the bar graph, you can do something like this:
library(tidyverse)
df %>%
  pivot_longer(!Fruit) %>%
  ggplot(aes(x = Fruit, y = value, fill = name)) +
  geom_bar(position = "dodge", stat = "identity")
As for a boxplot, you would need more than one data point per group. If you wanted a box plot by Fruit, pooling the three seasons, you could do something like this:
df %>%
  pivot_longer(!Fruit) %>%
  ggplot(aes(x = Fruit, y = value, fill = Fruit)) +
  geom_boxplot()
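The snippets above assume a data frame called df that is never shown. A minimal sketch of how it could be built from the table in the question, and what pivot_longer() does to it, might look like this (the name long is only illustrative):

```r
library(tidyr)

# Data typed in from the question's table
df <- data.frame(
  Fruit  = c("Banana", "Apple", "orange"),
  spring = c(19.36, 65.27, 16.21),
  summer = c(91.51, 51.55, 94.71),
  fall   = c(49.99, 42.83, 62.33)
)

# One row per Fruit/season pair; the default column names are "name" and "value"
long <- pivot_longer(df, !Fruit)
print(long)
```

The long table has three values per fruit, which is why a box plot by Fruit is possible while a box plot per fruit and season is not.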

Grouping the factors in ggplot

I am trying to create a graph based on matrix similar to one below... I am trying to group the Erosion values based on "Slope"...
library(ggplot2)
new_mat<-matrix(,nrow = 135, ncol = 7)
colnames(new_mat)<-c("Scenario","Runoff (mm)","Erosion (t/ac)","Slope","Soil","Tillage","Rotation")
for ( i in 1:nrow(new_mat)){
new_mat[i,2]<-sample(10:50, 1)
new_mat[i,3]<-sample(0.1:20, 1)
new_mat[i,4]<-sample(c("S2","S3","S4","S5","S1"),1)
new_mat[i,5]<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,7]<-sample(c("WBP","WBF","WF"),1)
new_mat[i,6]<-sample(c("Intense","Reduced","Notill"),1)
new_mat[i,1]<-paste0(new_mat[i,4],"_",new_mat[i,5],"_",new_mat[i,6],"_",new_mat[i,7],"_")
}
#### Graph part ########
grphs_mat<-as.data.frame(new_mat)
grphs_mat$`Runoff (mm)`<-as.numeric(as.character(grphs_mat$`Runoff (mm)`))
grphs_mat$`Erosion (t/ac)`<-as.numeric(as.character(grphs_mat$`Erosion (t/ac)`))
ggplot(grphs_mat, aes(Scenario, `Erosion (t/ac)`,group=Slope, colour = Slope))+
scale_y_continuous(limits=c(0,max(as.numeric((grphs_mat$`Erosion (t/ac)`)))))+
geom_point()+geom_line()
But when I run this code, the values are spread along the x-axis across all 135 scenarios. What I want is for the grouping to be done by slope, while the other common factors (Soil + Rotation + Tillage) are picked up and placed on the x-axis. For example:
For these five scenarios:
S1_Deep_Intense_WBF_
S2_Deep_Intense_WBF_
S3_Deep_Intense_WBF_
S4_Deep_Intense_WBF_
S5_Deep_Intense_WBF_
The plot should separate S1, S2, S3, S4, and S5 but also recognize that the other factors are the same and put them on the x-axis, so that the slope lines are stacked on top of each other at 135/5 = 27 x-axis points. The final figure should look like this (refer to image). Apologies for not being able to explain it better.
I think I am making a mistake in grouping or in assigning the x-axis values.
I would appreciate your suggestions.
In the example you give, I didn't get every possible factor combination represented so the plots looked a bit weird. What I did instead was start with the following:
set.seed(42)
new_mat <- matrix(,nrow = 1000, ncol = 7)
I then deduplicated this by summarising the values. A possibly relevant step for your analysis is that I made a new variable with the interaction() function, which combines the three other factors.
library(tidyverse)
df <- grphs_mat
df$x <- with(df, interaction(Rotation, Soil, Tillage))
# The simulation did not yield unique combinations
df <- df %>% group_by(x, Slope) %>%
summarise(n = sum(`Erosion (t/ac)`))
Next, I plotted this new x variable on the x-axis and used "stack" positions for the lines and points.
g <- ggplot(df, aes(x, y = n, colour = Slope, group = Slope)) +
geom_line(position = "stack") +
geom_point(position = "stack")
To make the x-axis slightly more readable, you can replace the . separators that the interaction() function inserted with newlines.
g + scale_x_discrete(labels = function(x){gsub("\\.", "\n", x)})
Another option is to simply rotate the x axis labels:
g + theme(axis.text.x.bottom = element_text(angle = 90))
There are a few additional options for the x-axis if you go into ggplot2 extension packages.
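To see what interaction() produces without running the full simulation, here is a small base R sketch. It enumerates every Rotation/Soil/Tillage combination with expand.grid() (a choice made here for illustration, not part of the answer above) and applies the same gsub() relabelling.

```r
# All combinations of the three non-slope factors from the question
combos <- expand.grid(
  Rotation = c("WBP", "WBF", "WF"),
  Soil     = c("Deep", "Moderate", "Shallow"),
  Tillage  = c("Intense", "Reduced", "Notill"),
  stringsAsFactors = TRUE
)

# One combined factor; levels look like "WBP.Deep.Intense"
combos$x <- with(combos, interaction(Rotation, Soil, Tillage))

# 3 * 3 * 3 = 27 levels, matching the 135 / 5 = 27 x-axis positions
print(nlevels(combos$x))

# Same relabelling as in scale_x_discrete(): dots become newlines
axis_labels <- gsub("\\.", "\n", levels(combos$x))
cat(axis_labels[1], "\n")
```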

plot only those histograms with n number of counts

My data looks like this. With this code I get lots of histograms for all my different Space.Groups.
massaged <- read.delim("~/Downloads/dummy.tsv")
library(tidyverse)
df <- na.omit(massaged)
ggplot() +
geom_histogram(data = df, mapping = aes(x = pH.Value)) +
facet_grid(. ~ Space.Group)
As you can see, most of my histograms contain only a small total number of counts.
I am interested only in those histograms with a total count greater than 100. For this particular plot I was trying to do something like
ggplot(data = filter(df, Uniprot.Recommended.Name == "Beta-2-microglobulin", Space.Group == %in% c("P 1 21 1", "P 1", "P 21 21 21", "C 1 2 1"))) +
geom_histogram(mapping = aes(x = pH.Value)) +
facet_grid(. ~ Space.Group)
This, by the way, does not work, and it is not very automatic either, because I have to make the first plot and then check by hand which Space.Groups are the interesting ones.
My question: is it possible to tell ggplot2 to plot only those histograms with a total number of counts greater than some number (100, in this case)?
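One possible way to do this is to filter the data before it reaches ggplot2, keeping only the groups with more than 100 rows. The sketch below uses simulated data, since dummy.tsv is not available. The column names follow the question, but the values, the group sizes, and the name big_groups are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for the real data: three groups of 150, 120 and 30 rows
set.seed(1)
df <- data.frame(
  Space.Group = rep(c("P 1 21 1", "P 1", "C 1 2 1"), times = c(150, 120, 30)),
  pH.Value    = runif(300, min = 4, max = 9)
)

# Keep only the rows whose Space.Group has more than 100 observations
big_groups <- df |>
  group_by(Space.Group) |>
  filter(n() > 100) |>
  ungroup()

# Facet as before; only the large groups are left
p <- ggplot(big_groups, aes(x = pH.Value)) +
  geom_histogram(bins = 30) +
  facet_grid(. ~ Space.Group)
```

Printing p draws the plot. This counts rows per group after na.omit(), which should equal the histogram's total count as long as no values fall outside the axis limits.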

How to show higher values in ggplot2 within facet_grid

I have just found the function facet_grid in ggplot2, it's awesome. The question is: I have a list with 6 countries (column HC) and destination of flights all around the world. My data look like this:
HC Reason Destination freq Perc
<chr> <chr> <chr> <int> <dbl>
1 Germany Study Germany 9 0.3651116
2 Germany Work Germany 3 0.1488095
3 Germany Others Germany 3 0.4901961
4 Hungary Study Germany 105 21.4285714
5 Hungary Work Germany 118 17.6382661
6 Hungary Others Germany 24 5.0955414
7 Luxembourg Study Germany 362 31.5056571
Is there a way to show only the top ten destinations for each country while using facet_grid? I'm trying to make a scatter plot this way:
Geograp %>%
gather(key=Destination, value=freq, -Reason, -Qcountry) %>%
rename(HC = Qcountry) %>%
group_by(HC,Reason) %>%
mutate(Perc=freq*100/sum(freq)) %>%
ggplot(aes(x=Perc, y=reorder(Destination,Perc))) +
geom_point(size=3) +
theme_bw() +
facet_grid(HC~Reason) +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(colour = "grey60", linetype = "dashed"))
This produces the graph below. I want to avoid the overplotting on the y-axis. Thanks in advance!
You could create a variable indicating the rank of each destination by country and then in the ggplot call select rows with ranking <= 10, e.g.
ggplot(data = mydata[rank <= 10, ], ....)
PS: Currently you create data and plot data all in one line using pipes. I would separate the data creation and plotting step.
As you have not posted your data in a reproducible format (check out dput()), I have used some sample data. Using the dplyr package, I grouped by the grp variable (group_by(grp); in your case this would be the country), selected the top 10 rows per group (top_n(n = 10, ...)) ordered by the x variable (wt = x; in your case this would be freq), and then plotted the result (a scatter plot in this case):
library(dplyr)
set.seed(123)
d <- data.frame(x = runif(90),grp = gl(3, 30))
d %>%
  group_by(grp) %>%
  top_n(n = 10, wt = x) %>%
  ggplot(aes(x = x, y = grp)) + geom_point()
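In dplyr 1.0.0 and later, top_n() is superseded by slice_max(), so the same selection can be sketched as follows. The data is the same toy example as above, and the name top10 is only illustrative.

```r
library(dplyr)

set.seed(123)
d <- data.frame(x = runif(90), grp = gl(3, 30))

# Ten largest x values within each group
top10 <- d |>
  group_by(grp) |>
  slice_max(x, n = 10) |>
  ungroup()

# How many rows were kept per group
print(count(top10, grp))
```

By default slice_max() keeps ties, so a group can return more than ten rows if several rows share the tenth-largest value. Set with_ties = FALSE if you need exactly ten.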