How to show higher values in ggplot2 within facet_grid - ggplot2

I have just discovered the function facet_grid in ggplot2, and it's awesome. My question: I have a data set with 6 countries (column HC) and flight destinations all around the world. My data look like this:
HC Reason Destination freq Perc
<chr> <chr> <chr> <int> <dbl>
1 Germany Study Germany 9 0.3651116
2 Germany Work Germany 3 0.1488095
3 Germany Others Germany 3 0.4901961
4 Hungary Study Germany 105 21.4285714
5 Hungary Work Germany 118 17.6382661
6 Hungary Others Germany 24 5.0955414
7 Luxembourg Study Germany 362 31.5056571
Is there a way to show only the top ten destinations for each country while still using facet_grid? I'm trying to make a scatter plot this way:
Geograp %>%
  gather(key=Destination, value=freq, -Reason, -Qcountry) %>%
  rename(HC = Qcountry) %>%
  group_by(HC,Reason) %>%
  mutate(Perc=freq*100/sum(freq)) %>%
  ggplot(aes(x=Perc, y=reorder(Destination,Perc))) +
  geom_point(size=3) +
  theme_bw() +
  facet_grid(HC~Reason) +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(colour = "grey60", linetype = "dashed"))
This produces a graph in which every destination is drawn in every panel. I want to avoid the overplotting on the y-axis. Thanks in advance!

You could create a variable holding the rank of each destination within its country and then, in the ggplot call, select only the rows with rank <= 10, e.g.
ggplot(data = mydata[mydata$rank <= 10, ], ....)
PS: Currently you create the data and plot it in a single pipeline. I would separate the data creation step from the plotting step.

As you have not posted your data in a reproducible format (check out dput()), I have used sample data. Using the dplyr package, I grouped by the grp variable (group_by(grp); in your case this is the country), selected the top 10 rows per group (top_n(n = 10, ...)) ranked by the x variable (wt = x; in your case this would be freq), and then plotted the result (a scatter plot in this example):
library(dplyr)
library(ggplot2)
set.seed(123)
d <- data.frame(x = runif(90),grp = gl(3, 30))
d %>%
group_by(grp) %>%
top_n(n = 10, wt = x) %>%
ggplot(aes(x=x, y=grp)) + geom_point()
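Applied to data shaped like the question's, the same idea can be sketched as follows. This is only a sketch under assumptions: the `flights` data frame is made up for illustration, "top ten" is taken to mean the ten destinations with the highest total freq per country, and slice_max() (dplyr 1.0.0 or later) stands in for top_n().

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the asker's data:
# 3 countries x 2 reasons x 15 destinations
set.seed(1)
flights <- expand.grid(
  HC = c("Germany", "Hungary", "Luxembourg"),
  Reason = c("Study", "Work"),
  Destination = paste0("Dest", 1:15),
  stringsAsFactors = FALSE
)
flights$freq <- sample(1:500, nrow(flights), replace = TRUE)

# Ten destinations with the highest total freq within each country
top_dest <- flights %>%
  group_by(HC, Destination) %>%
  summarise(total = sum(freq), .groups = "drop") %>%
  group_by(HC) %>%
  slice_max(total, n = 10, with_ties = FALSE) %>%
  ungroup()

# Keep only the rows that belong to those country/destination pairs
top10 <- semi_join(flights, top_dest, by = c("HC", "Destination"))

# scales = "free_y" lets each country row show only its own destinations
p <- ggplot(top10, aes(x = freq, y = Destination)) +
  geom_point(size = 3) +
  theme_bw() +
  facet_grid(HC ~ Reason, scales = "free_y")
```

Without scales = "free_y", every panel would still list the union of all selected destinations on its y-axis, which is most of the overplotting the question describes.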

Related

forecasting in time series in R with dplyr and ggplot

Hope all goes well.
I have a data set that I can share a small piece of it:
date=c("2022-08-01","2022-08-02","2022-08-03","2022-08-04",
"2022-08-05","2022-08-6")
sold_items=c(12,18,9,31,19,10)
df <- data.frame(date=as.Date(date),sold_items)
df %>% sample_n(5)
date sold_items
1 2022-08-04 31
2 2022-08-03 9
3 2022-08-01 12
4 2022-08-06 10
5 2022-08-02 18
I need to forecast the number of sold items in the next two weeks (14 days after the last available date in the data).
I also need to show the forecasted data along with the current data on one graph using ggplot.
I have been looking into the forecast package to use ARIMA, but I am lost and could not convert this data to a time series object.
I wonder if someone can provide a solution with dplyr to my problem.
Thank you very much.
# first create df (fpp3 attaches tibble, dplyr, tsibble and fable)
library(fpp3)
df =
  tibble(
    sold = c(12, 18, 9, 31, 19, 10),
    date = seq(as.Date("2022-08-01"),
               by = "day",
               length.out = length(sold))) %>%
  relocate(date)
# then coerce to a tsibble object and model:
df %>%
  as_tsibble(index = date) %>%
  model(ARIMA(sold)) %>%
  forecast(h = 14)
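To show the forecast together with the observed data on one graph, a sketch along the following lines may work. It assumes the fpp3 meta-package is installed; the names `history`, `fc` and `fc_plot` are only illustrative. With just six observations the automatic ARIMA selection has very little to work with, so the fitted model may be very simple or come with warnings.

```r
library(fpp3)  # attaches tsibble, fable, dplyr, ggplot2 and friends

# Observed data as a tsibble indexed by date
history <- tibble(
  date = seq(as.Date("2022-08-01"), by = "day", length.out = 6),
  sold = c(12, 18, 9, 31, 19, 10)
) %>%
  as_tsibble(index = date)

# Fit an automatically selected ARIMA model and forecast 14 days ahead
fc <- history %>%
  model(arima = ARIMA(sold)) %>%
  forecast(h = 14)

# autoplot() on the forecast draws the prediction intervals;
# passing the original tsibble adds the observed series to the same plot
fc_plot <- autoplot(fc, history) +
  labs(x = "Date", y = "Items sold")
```

Since the result is an ordinary ggplot object, further layers and themes can be added to `fc_plot` in the usual way.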

extract individual panels of facet plot to re-arrange

I would like to rearrange a facet plot with 3 panels to have them fit better in a poster presentation. Currently, I have A over B over C (one column), and it is important to keep B over C.
What I would like is to have a square (2x2) presentation, with A over nothing, and B over C.
Can I either extract the individual panels of the plot, or create a facet with no axes or other graphics (like plotgrid with a NULL panel)?
A second option would be the ggh4x package which via facet_manual adds some flexibility to place the panels:
library(ggplot2)
library(ggh4x)
design <- "
AB
#C"
ggplot(mtcars, aes(mpg, disp)) +
geom_point() +
facet_manual(~cyl, design = design)
One approach could be to create separate plots using nest() and map() from {tidyverse} and then use the {patchwork} package to arrange them as we want.
(Since the OP didn't provide any data or code, I am using the built-in mtcars dataset to show how to do this.) Suppose we have a faceted plot with 3 panels in a 3 x 1 format.
library(tidyverse)
# 3 x 1 faceted plot
mtcars %>%
ggplot(aes(mpg, disp)) +
geom_point() +
facet_wrap(~cyl, nrow = 3)
Now, to match the question, let's suppose the panel for cyl 4 is plot A, the panel for cyl 6 is plot B, and the panel for cyl 8 is plot C.
To do this, we first create a nested dataset with respect to the facet variable using group_by(facet_var) %>% nest(), and then map ggplot over the nested data to get one plot (gg object) per group.
library(tidyverse)
library(patchwork)
# Say, plotA is cyl 4
# plotB is cyl 6
# plotC is cyl 8
# 2 x 2 facet plot
plot_data <- mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(
plots = map2(
.x = data,
.y = cyl,
.f = ~ ggplot(data = .x, mapping = aes(mpg, disp)) +
geom_point() +
ggtitle(paste0("cyl is ",.y))
)
)
plot_data
#> # A tibble: 3 × 3
#> # Groups: cyl [3]
#> cyl data plots
#> <dbl> <list> <list>
#> 1 6 <tibble [7 × 10]> <gg>
#> 2 4 <tibble [11 × 10]> <gg>
#> 3 8 <tibble [14 × 10]> <gg>
Then simply align the plots using {patchwork} syntax as we wanted. I have used plot_spacer() to create blank space.
plot_data
plots <- plot_data$plots
plots[[2]] + plots[[1]] + plot_spacer() + plots[[3]] +
plot_annotation(
title = "A 2 X 2 faceted plot"
)
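Because the row order of the nested tibble depends on the order in which the groups first appear in the data, indexing the plots by position can be fragile. A possible alternative, sketched here with a hypothetical helper `make_panel()`, is to name the plots and hand patchwork a textual design in which `#` marks an empty cell:

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: one scatter plot per cyl value
make_panel <- function(cyl_value) {
  ggplot(mtcars[mtcars$cyl == cyl_value, ], aes(mpg, disp)) +
    geom_point() +
    ggtitle(paste0("cyl is ", cyl_value))
}

# A (cyl 4) sits above an empty cell, B (cyl 6) sits above C (cyl 8)
design <- "AB
#C"

combined <- wrap_plots(
  A = make_panel(4),
  B = make_panel(6),
  C = make_panel(8),
  design = design
)
```

The named arguments are matched to the letters in the design string, so the layout no longer depends on the order of the plot list.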

Plotly is not reading ggplot output well

I am using the following code to plot some data points, and it works well in ggplot. However, when I feed the plot into ggplotly, the visualization and the y-axis labels change completely: the y-axis labels shift to the right and get flipped, and the lines in the center get thinner.
Code
library(ggplot2)
library(tidyverse)
library(plotly)
file2 <- read.csv( text = RCurl::getURL("https://gist.githubusercontent.com/gireeshkbogu/806424c1777ff721a046b3e30e85af5a/raw/50ac0b4696f514677b4987b90305fdf879fbcd84/reproducible.examples.txt"), sep="\t")
p <- ggplot(data=subset(file2,!is.na(datetime)),
aes(x=datetime, y=Count,
color=Type,
group=Subject)) +
geom_point(size=4, alpha=0.6) +
scale_y_continuous(breaks=c(0,1))+
theme(axis.text.x=element_text(angle=90, size = 5))+
facet_grid(Subject ~ ., switch = "y") +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
theme(strip.text.y.left = element_text(angle = 0, size=5)) +
scale_color_manual(values=c("red", "#990000", "#330000", "#00CC99", "#0099FF"))
ggplotly(p)
Reproducible Example
Subject  datetime       Type  Count
user1    4/16/20 15:00  A1    1
user1    3/28/20 13:00  A1    1
user2    4/29/20 15:00  A1    1
user2    5/02/20 09:00  A1    1
user1    2/19/20 18:00  A2    1
user1    4/20/20 16:00  A2    1
Converting ggplot to plotly turns out to be surprisingly complicated! Many ggplot features are silently dropped or incorrectly translated over to plotly.
If I am not mistaken, switch = "y" within your facet_grid is being silently dropped.
In addition, you have too many facets in your plot. It looks like "Subject" is creating 30+ facets. I know it is tempting to fit as much data as possible into one plot, but you are really pushing the limits of what facets can do here.
I made some modifications. See if this is something you can work with:
library(ggplot2)
library(tidyverse)
library(plotly)
library(RCurl)
# your original file
file2 <- read.csv( text = RCurl::getURL("https://gist.githubusercontent.com/gireeshkbogu/806424c1777ff721a046b3e30e85af5a/raw/50ac0b4696f514677b4987b90305fdf879fbcd84/reproducible.examples.txt"), sep="\t")
head(file2)
# scaling down the dataframe so that you have fewer facets per plot
file3 <- file2 %>%
as_tibble() %>%
na.omit() %>%
filter(Subject %in% c("User1", "User2", "User3", "User4")) %>%
arrange(Subject, datetime)
head(file3)
# sending the smaller data frame to ggplot
p_2 <- ggplot(data=file3,
aes(x=datetime, y=Count, color=Type, group=Subject)) +
geom_point(size=4, alpha=0.6) +
scale_y_continuous(breaks=c(0,1))+
theme(axis.text.x=element_text(angle=90, size = 5)) +
facet_grid(Subject ~ .) + # removing "Switch" ; it is being dropped by plotly
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
legend.position = "left") + # move legend to left on ggplot
theme(strip.text.y.left = element_text(angle = 0, size=5)) +
scale_color_manual(values=c("red", "#990000", "#330000", "#00CC99", "#0099FF"))
p_2
ggplotly(p_2) %>%
layout(title = "Modified & Scaled Down Plot",
legend = list(orientation = "v", # fine-tune legend directly in plotly,
y = 1, x = -0.1)) # you may need to fiddle with these
The modified code yields a plot of just these four subjects. You will probably need to split "Subject" into a few small groups and create one plot per group.
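Splitting the subjects into small groups and building one plot per group could be sketched like this. The `demo` data frame is made up for illustration (it only mimics the column names of the posted sample), and the chunk size of 4 is an arbitrary choice.

```r
library(ggplot2)
library(plotly)

# Hypothetical data with the same columns as the posted sample
set.seed(42)
demo <- data.frame(
  Subject = rep(paste0("user", 1:10), each = 5),
  datetime = as.POSIXct("2020-04-01 00:00", tz = "UTC") +
    runif(50, 0, 30 * 24 * 3600),
  Type = sample(c("A1", "A2"), 50, replace = TRUE),
  Count = 1
)

# Split the subject names into chunks of at most 4
subjects <- unique(demo$Subject)
subject_groups <- split(subjects, ceiling(seq_along(subjects) / 4))

# One interactive plot per chunk of subjects
plots <- lapply(subject_groups, function(ids) {
  p <- ggplot(demo[demo$Subject %in% ids, ],
              aes(x = datetime, y = Count, color = Type)) +
    geom_point(size = 4, alpha = 0.6) +
    facet_grid(Subject ~ .)
  ggplotly(p)
})
```

Each element of `plots` is a separate plotly widget, for example `plots[[1]]` shows the first four subjects.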

plot only those histograms with n number of counts

My data looks like this. With this code I get lots of histograms for all my different Space.Groups.
massaged <- read.delim("~/Downloads/dummy.tsv")
library(tidyverse)
df <- na.omit(massaged)
ggplot() +
geom_histogram(data = df, mapping = aes(x = pH.Value)) +
facet_grid(. ~ Space.Group)
As you can see, most of my histograms contain only a small total number of counts.
I am only interested in the histograms with a total count greater than 100. For this particular plot I was trying to do something like
ggplot(data = filter(df, Uniprot.Recommended.Name == "Beta-2-microglobulin", Space.Group == %in% c("P 1 21 1", "P 1", "P 21 21 21", "C 1 2 1"))) +
geom_histogram(mapping = aes(x = pH.Value)) +
facet_grid(. ~ Space.Group)
This does not work, and it is also not very automatic, because I have to make the first plot and then check by hand which Space.Groups are the interesting ones.
My question: is it possible to tell ggplot2 to plot only those histograms with a total number of counts greater than some number (100, in this case)?
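One way to approach this is to filter the data before it reaches ggplot, keeping only the groups with more than 100 rows. The sketch below assumes that "total number of counts" means the number of rows per Space.Group; the `df` here is made up for illustration and only reuses the question's column names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: two large groups and one small group
set.seed(1)
df <- data.frame(
  Space.Group = rep(c("P 1 21 1", "P 1", "C 1 2 1"), times = c(150, 120, 20)),
  pH.Value = runif(290, 4, 9)
)

# Keep only the groups with more than 100 observations
frequent <- df %>%
  group_by(Space.Group) %>%
  filter(n() > 100) %>%
  ungroup()

p <- ggplot(frequent, aes(x = pH.Value)) +
  geom_histogram(bins = 30) +
  facet_grid(. ~ Space.Group)
```

Because the filtering happens inside the pipeline, the threshold can be changed in one place and no manual inspection of the first plot is needed.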

Match fields within one data frame with column names in another data frame

I have two data frames. In the last column ("Bill") of the first data frame, I want to apply a function (fixed price + Quantity*price/qty). To apply the function, R should match the values in the first column of df1 to the column names of df2.
I have solved the problem by creating a function and several ifelse statements, but I would like to use a statement that automatically matches the values in df1 with the column names in df2. My data set contains more than 2 million rows, and I will need to apply the same rationale to building other similar functions. It would be nice to use something that does not require a loop or take too long to run.
### Set up your data frames like so ###
Code <- c("a1", "a2", "c3", "a1")
Name <- c("Dan", "David", "Anna", "Lisa")
Quantity <- c(30, 12, 10, 10)
df1 <- as.data.frame(cbind("Code" = Code, "Name" = Name, "Quantity" = Quantity), stringsAsFactors = F)
df1$Quantity <- as.numeric(df1$Quantity)
fixed_price <- c(12, 5, 23)
price_per_qty <- c(1, 4, 7)
df2 <- as.data.frame(rbind("fixed_price" = fixed_price, "price_per_qty" = price_per_qty))
colnames(df2) <- c("a1", "a2", "c3")
### Combine dataframe 1 and 2 into a single dataframe ###
# Code below pulls individual columns from df2 based on the
# index provided by the "Code" column in df1, transposes them
# so they'll line up with df1, then column binds them to df1
df3 <- cbind(df1, t(df2[,df1$Code]))
# the bill is calculated simply enough
bill <- df3[4] + df3[3] * df3[5]
colnames(bill) <- "bill"
# Finally, output the results as you wanted
cbind(df3, bill)
I have a fairly similar answer to graggsd's, but here is what worked for me. I merged the two data frames on the key column "Code" into one data frame, combined_data. I then defined a function, which I think is the one you described above, and passed the respective columns through it.
df2 <- t(data.frame(c(12,1),c(5,4),c(23,7)))
rownames(df2) <- c("a1","a2","c3")
test <- rownames(df2)
df2 <- cbind.data.frame(df2,test)
colnames(df2) <- c("fixed price","price/qty","Code")
df1 <- data.frame(c("a1","a2","c3","a1"), c("Dan","David","Anna","Lisa"),c(30,12,10,10))
colnames(df1) <- c("Code","Name","Quantity")
combined_data <- dplyr::inner_join(df1,df2, by = "Code")
f1 <- function(x,y,z){
x + y * z
}
bill <- f1(combined_data[,4],combined_data[,3],combined_data[,5])
finalDataSet <- cbind.data.frame(combined_data,bill)
The final data set:
  Code  Name Quantity fixed price price/qty bill
1   a1   Dan       30          12         1   42
2   a2 David       12           5         4   53
3   c3  Anna       10          23         7   93
4   a1  Lisa       10          12         1   22
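For a table with millions of rows, another option is to skip the join and look the prices up directly by name. The sketch below uses only base R and the example data from the question; converting df2 to a matrix (the `prices` name is illustrative) lets each Code pick its column in one vectorized step.

```r
# Example data from the question
df1 <- data.frame(
  Code = c("a1", "a2", "c3", "a1"),
  Name = c("Dan", "David", "Anna", "Lisa"),
  Quantity = c(30, 12, 10, 10),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  a1 = c(12, 1), a2 = c(5, 4), c3 = c(23, 7),
  row.names = c("fixed_price", "price_per_qty")
)

# A numeric matrix can be indexed by row name and a vector of column names
prices <- as.matrix(df2)

# fixed price + Quantity * price per quantity, one lookup per row of df1
df1$Bill <- unname(
  prices["fixed_price", df1$Code] +
    df1$Quantity * prices["price_per_qty", df1$Code]
)

df1$Bill
#> [1] 42 53 93 22
```

No loop or ifelse chain is involved, and the same pattern can be reused for other formulas by swapping the row names.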