In ggplot2/plotly, how to set a default series - ggplot2

In the ggplot2/plotly code below, there are three series 'A/B/C'. How can I select 'A' as the default series so that 'B/C' are not shown in the final plot? (I mean, they are all in the plot, but only 'A' is selected by default; you can add 'B/C' to the plot by clicking the legend.)
library(tidyverse)
library(plotly)
mdate <- rep(as.Date(c('2022-1-1','2022-1-2','2022-1-3')),3)
cat <- c("A","A","A","B","B","B","C","C","C")
amount <- c(22,27,12,14,4,10,9,6,4)
plot_data <- data.frame(mdate,cat,amount)
p1 <- plot_data %>% ggplot(aes(mdate,amount,color=cat))+
geom_line()
ggplotly(p1)
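One possible way to get this behaviour (a sketch, not something confirmed in this thread): plotly traces accept visible = "legendonly", which keeps a series in the legend but hides it until it is clicked, and plotly's style() can set that on selected traces. The snippet assumes the traces come out in the order A, B, C, so that B and C are traces 2 and 3; inspect p$x$data if your ordering differs.

```r
library(ggplot2)
library(plotly)

plot_data <- data.frame(
  mdate  = rep(as.Date(c("2022-01-01", "2022-01-02", "2022-01-03")), 3),
  cat    = rep(c("A", "B", "C"), each = 3),
  amount = c(22, 27, 12, 14, 4, 10, 9, 6, 4)
)

p1 <- ggplot(plot_data, aes(mdate, amount, color = cat)) +
  geom_line()

# Assumption: traces 2 and 3 correspond to series B and C.
p <- style(ggplotly(p1), visible = "legendonly", traces = c(2, 3))
p
```

Clicking 'B' or 'C' in the legend should then toggle those lines back on.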

Related

extract individual panels of facet plot to re-arrange

I would like to rearrange a facet plot with 3 panels to have them fit better in a poster presentation. Currently, I have A over B over C (one column), and it is important to keep B over C.
What I would like is to have a square (2x2) presentation, with A over nothing, and B over C.
Can I either extract the individual panels of the plot, or create a facet with no axes or other graphics (like plot_grid with a NULL panel)?
A second option would be the ggh4x package which via facet_manual adds some flexibility to place the panels:
library(ggplot2)
library(ggh4x)
design <- "
AB
#C"
ggplot(mtcars, aes(mpg, disp)) +
geom_point() +
facet_manual(~cyl, design = design)
One approach could be to create separate plots using nest() and map() from {tidyverse} and then use the {patchwork} package to align them as we want.
(Since OP didn't provide any data or code, I am using the built-in mtcars dataset to show how to do this.) Suppose we have a faceted plot with 3 panels in a 3 x 1 format.
library(tidyverse)
# 3 x 1 faceted plot
mtcars %>%
ggplot(aes(mpg, disp)) +
geom_point() +
facet_wrap(~cyl, nrow = 3)
Now, to match the question, let's suppose the panel for cyl 4 is plot A, the panel for cyl 6 is plot B and the panel for cyl 8 is plot C.
To do this, we first create a nested dataset with respect to the facet variable using group_by(facet_var) %>% nest() and then map ggplot over the nested data to get a plot (gg object) for each nested data set.
library(tidyverse)
library(patchwork)
# Say, plotA is cyl 4
# plotB is cyl 6
# plotC is cyl 8
# 2 x 2 facet plot
plot_data <- mtcars %>%
group_by(cyl) %>%
nest() %>%
mutate(
plots = map2(
.x = data,
.y = cyl,
.f = ~ ggplot(data = .x, mapping = aes(mpg, disp)) +
geom_point() +
ggtitle(paste0("cyl is ",.y))
)
)
plot_data
#> # A tibble: 3 × 3
#> # Groups: cyl [3]
#> cyl data plots
#> <dbl> <list> <list>
#> 1 6 <tibble [7 × 10]> <gg>
#> 2 4 <tibble [11 × 10]> <gg>
#> 3 8 <tibble [14 × 10]> <gg>
Then simply align the plots using {patchwork} syntax as we wanted. I have used plot_spacer() to create blank space.
plots <- plot_data$plots
# Row order of plot_data is cyl 6, 4, 8, so plots[[2]] is A, plots[[1]] is B and plots[[3]] is C
plots[[2]] + plots[[1]] + plot_spacer() + plots[[3]] +
plot_annotation(
title = "A 2 X 2 faceted plot"
)

Scatter plot with ggplot, using indexing to plot subsets of the same variable on x and y axis

I'm working with a subset of weather data for Heathrow downloaded from the Met Office. This data set contains no missing values.
Using ggplot, I'd like to create a scatter plot for the maximum temperature (tmax) for Heathrow, with 2018 data plotted against 2019 data (see below for example). There are 12 data points for both 2018 and 2019.
I've attempted this with the code below, but it does not work. This appears to be due to the indexing, as the code works fine when I do not use the indexes within the aes() function.
How can I get this to work?
idx2018 <- which(HeathrowData$Year == 2018)
idx2019 <- which(HeathrowData$Year == 2019)
scatter <- ggplot(HeathrowData, aes(tmax[idx2018], tmax[idx2019]))
scatter + geom_point()
scatter + geom_point(size = 2) + labs(x = "2018", y = "2019")
As your data is in long format, you need some data wrangling to put the values for each year in separate columns, i.e. you have to reshape your data to wide:
Using some random fake data:
library(dplyr)
library(tidyr)
library(ggplot2)
# Example data
set.seed(123)
HeathrowData <- data.frame(
Year = rep(2017:2019, each = 12),
tmax = runif(36)
)
# Select, Filter, Convert to Wide
HeathrowData <- HeathrowData %>%
select(Year, tmax) %>%
filter(Year %in% c(2018, 2019)) %>%
group_by(Year) %>%
mutate(id = row_number()) %>%
ungroup() %>%
pivot_wider(names_from = Year, values_from = tmax, names_prefix = "y")
ggplot(HeathrowData, aes(y2018, y2019)) +
geom_point(size = 2) +
labs(x = "2018", y = "2019")
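To make the reshape concrete, here is a minimal base R sketch on the same fake data. ggplot2 expects each aesthetic to have one value per row of the data (or a single value), which is why two 12-value subsets of a longer column cannot be used directly inside aes(). The sketch assumes both years have the same number of rows in the same month order; wide is a hypothetical name.

```r
set.seed(123)
HeathrowData <- data.frame(
  Year = rep(2017:2019, each = 12),
  tmax = runif(36)
)

# One row per month, one column per year: the shape aes() needs.
wide <- data.frame(
  y2018 = HeathrowData$tmax[HeathrowData$Year == 2018],
  y2019 = HeathrowData$tmax[HeathrowData$Year == 2019]
)

nrow(HeathrowData) # number of rows in the long data
nrow(wide)         # one row per 2018/2019 pair
```

A data frame like wide can be passed to ggplot() with aes(y2018, y2019), just like the pivot_wider() result.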

Draw ggplots in the plot grid by column - plot_grid() function of the cowplot package

I am using the plot_grid() function of the cowplot package to draw ggplots in a grid and would like to know if there is a way to draw plots by column instead of by row?
library(ggplot2)
library(cowplot)
df <- data.frame(
x = c(3,1,5,5,1),
y = c(2,4,6,4,2)
)
# Create plots: say two each of path plot and polygon plot
p <- ggplot(df)
p1 <- p + geom_path(aes(x,y)) + ggtitle("Path 1")
p2 <- p + geom_polygon(aes(x,y)) + ggtitle("Polygon 1")
p3 <- p + geom_path(aes(y,x)) + ggtitle("Path 2")
p4 <- p + geom_polygon(aes(y,x)) + ggtitle("Polygon 2")
plots <- list(p1,p2,p3,p4)
plot_grid(plotlist=plots, ncol=2) # plots are drawn by row
I would like to have plots p1 and p2 in the first column and p3 and p4 in the second column, something like:
plots <- list(p1, p3, p2, p4) # plot sequence changed
plot_grid(plotlist=plots, ncol=2)
Actually I could have 4, 6, or 8 plots. The number of rows in the plot grid will vary, but it will always have 2 columns. In each case I would like to fill the plot grid by column (vertically) so that my first 2, 3, or 4 plots, as the case may be, appear over each other. I would like to avoid hardcoding these different permutations if I can specify something like par(mfcol = c(n,2)).
As you have observed, plot_grid() draws plots by row. I don't believe there's any way to change that, so if you want to keep using plot_grid() (which would probably be most convenient), then one approach could be to change the order of the items in your list of plots to match what you need for plot_grid(), given knowledge of the number of columns.
Here's a function I have written that does that. The basic idea is to:
create a vector of indexes for the items in your list (i.e. 1:length(your_list)),
put the index numbers into a matrix with the specified number of columns, filled column by column,
read that matrix back into another vector of indexes row by row (the order plot_grid() draws in),
reorder your list according to the newly ordered indexes.
I've tried to build in a way to make this work even if the number of items in your list is not divisible by the intended number of columns (like a list of 8 items arranged in 3 columns).
reorder_by_col <- function(myData, col_num) {
  x <- seq_along(myData) # create index vector
  row_num <- ceiling(length(x) / col_num) # rows needed to hold every item
  length(x) <- row_num * col_num # pads with NAs as necessary
  temp_matrix <- matrix(x, ncol = col_num, byrow = FALSE) # fill column by column
  new_x <- as.vector(t(temp_matrix)) # read back row by row
  return(myData[new_x]) # NA indexes become NULL (empty) slots
}
This all was written with a little help from Google and specifically answers to questions posted here and here.
You can now see the difference without reordering:
plots <- list(p1,p2,p3,p4)
plot_grid(plotlist=plots, ncol=2)
... and with reordering using the new method:
newPlots <- reorder_by_col(myData=plots, col_num=2)
plot_grid(plotlist=newPlots, ncol=2)
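To see the index shuffle on its own, here is a small worked sketch for six plots in two columns. It only manipulates the index vector, independently of any helper function; n_plots, n_cols and new_order are illustrative names.

```r
n_plots <- 6
n_cols  <- 2

idx <- seq_len(n_plots)
# Fill a matrix column by column: column 1 holds 1,2,3 and column 2 holds 4,5,6.
m <- matrix(idx, ncol = n_cols, byrow = FALSE)
# Read it back row by row, which is the order plot_grid() fills its cells.
new_order <- as.vector(t(m))
new_order
#> [1] 1 4 2 5 3 6
```

Passing the plots in that order to a row-wise grid puts plots 1 to 3 in the first column and plots 4 to 6 in the second.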
The argument, byrow, has now been added to plot_grid.
In the case where you would like to have num_plots < nrow * ncol the remaining spots will be empty.
You can now call:
library(ggplot2)
df <- data.frame(
x = 1:10, y1 = 1:10, y2 = (1:10)^2, y3 = (1:10)^3, y4 = (1:10)^4
)
p1 <- ggplot(df, aes(x, y1)) + geom_point()
p2 <- ggplot(df, aes(x, y2)) + geom_point()
p3 <- ggplot(df, aes(x, y3)) + geom_point()
cowplot::plot_grid(p1, p2, p3, byrow = FALSE)
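Applied to the four plots from the question, the same argument can be combined with plotlist and ncol. This is a sketch that assumes a cowplot version recent enough to include byrow; grid_by_col is a hypothetical name.

```r
library(ggplot2)
library(cowplot)

df <- data.frame(x = c(3, 1, 5, 5, 1), y = c(2, 4, 6, 4, 2))
p <- ggplot(df)
plots <- list(
  p + geom_path(aes(x, y)) + ggtitle("Path 1"),
  p + geom_polygon(aes(x, y)) + ggtitle("Polygon 1"),
  p + geom_path(aes(y, x)) + ggtitle("Path 2"),
  p + geom_polygon(aes(y, x)) + ggtitle("Polygon 2")
)

# byrow = FALSE fills the grid column by column, so no manual reordering is needed.
grid_by_col <- plot_grid(plotlist = plots, ncol = 2, byrow = FALSE)
grid_by_col
```

With this call, "Path 1" and "Polygon 1" should end up stacked in the first column.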

Plotly is not reading ggplot output well

I am using the following code to plot some data points and it works well in ggplot. However, when I feed this into ggplotly, the visualization and Y-axis labels change completely. The Y-axis labels shift to the right and get flipped, and the lines in the center get thinner.
Code
library(ggplot2)
library(tidyverse)
library(plotly)
file2 <- read.csv( text = RCurl::getURL("https://gist.githubusercontent.com/gireeshkbogu/806424c1777ff721a046b3e30e85af5a/raw/50ac0b4696f514677b4987b90305fdf879fbcd84/reproducible.examples.txt"), sep="\t")
p <- ggplot(data=subset(file2,!is.na(datetime)),
aes(x=datetime, y=Count,
color=Type,
group=Subject)) +
geom_point(size=4, alpha=0.6) +
scale_y_continuous(breaks=c(0,1))+
theme(axis.text.x=element_text(angle=90, size = 5))+
facet_grid(Subject ~ ., switch = "y") +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
theme(strip.text.y.left = element_text(angle = 0, size=5)) +
scale_color_manual(values=c("red", "#990000", "#330000", "#00CC99", "#0099FF"))
ggplotly(p)
Ggplot image
Ggplotly image
Reproducible Example
Subject datetime Type Count
user1 4/16/20 15:00 A1 1
user1 3/28/20 13:00 A1 1
user2 4/29/20 15:00 A1 1
user2 5/02/20 09:00 A1 1
user1 2/19/20 18:00 A2 1
user1 4/20/20 16:00 A2 1
Converting ggplot to plotly turns out to be surprisingly complicated! Many ggplot features are silently dropped or incorrectly translated over to plotly.
If I am not mistaken, switch = "y" within your facet_grid is being silently dropped.
In addition, you have too many facets in your plot. Looks like "Subject" is creating 30+ facets. I know that it is tempting to try and fit as much data into one plot, but you are really pushing the limits of what you can do with facets here.
I made some modifications. See if this is something you can work with:
library(ggplot2)
library(tidyverse)
library(plotly)
library(RCurl)
# your original file
file2 <- read.csv( text = RCurl::getURL("https://gist.githubusercontent.com/gireeshkbogu/806424c1777ff721a046b3e30e85af5a/raw/50ac0b4696f514677b4987b90305fdf879fbcd84/reproducible.examples.txt"), sep="\t")
head(file2)
# scaling down the dataframe so that you have fewer facets per plot
file3 <- file2 %>%
as_tibble() %>%
na.omit() %>%
# Subject values must match the capitalisation in your data (the sample above shows "user1")
filter(Subject %in% c("User1", "User2", "User3", "User4")) %>%
arrange(Subject, datetime)
head(file3)
# sending the smaller data frame to ggplot
p_2 <- ggplot(data=file3,
aes(x=datetime, y=Count, color=Type, group=Subject)) +
geom_point(size=4, alpha=0.6) +
scale_y_continuous(breaks=c(0,1))+
theme(axis.text.x=element_text(angle=90, size = 5)) +
facet_grid(Subject ~ .) + # removing "Switch" ; it is being dropped by plotly
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
legend.position = "left") + # move legend to left on ggplot
theme(strip.text.y = element_text(angle = 0, size=5)) + # strips sit on the right once switch is removed
scale_color_manual(values=c("red", "#990000", "#330000", "#00CC99", "#0099FF"))
p_2
ggplotly(p_2) %>%
layout(title = "Modified & Scaled Down Plot",
legend = list(orientation = "v", # fine-tune legend directly in plotly,
y = 1, x = -0.1)) # you may need to fiddle with these
The modified code gives me this plot. You will probably need to split "Subject" into a few small groups and draw a plot for each group.

Unique values in categorical variables in RStudio

How can I find how many unique values each categorical variable takes in a data frame and then represent it with a graph? All of this in RStudio.
We'll use the tidyverse here.
library(tidyverse)
You can apply the unique() function to a dataframe to remove any repeat rows.
df <- iris %>% unique()
The group_by(), summarise() and n() functions let you count the number of instances of a variable in a dataframe.
df2 <- df %>% group_by(Species) %>% summarise(n = n())
## alternatively use count() which does the same thing
df2 <- df %>% count(Species)
Finally we can use the ggplot2 package to create a graph.
ggplot() + geom_col(data = df2, aes(x = Species, y = n))
If you're not interested in having a separate dataframe with the data in it and want to jump straight to the graph, you can ignore the step with group_by() and summarise() and just use geom_bar().
ggplot() + geom_bar(data = df, aes(Species))
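If the goal is instead the number of distinct values per categorical column (rather than how often each value occurs), a base R sketch along these lines may help. It treats factor and character columns as categorical, which is an assumption; is_categorical and n_unique are illustrative names.

```r
# Keep only factor or character columns (iris has one: Species).
is_categorical <- vapply(iris, function(col) is.factor(col) || is.character(col), logical(1))

# Number of distinct values in each categorical column.
n_unique <- vapply(iris[is_categorical], function(col) length(unique(col)), integer(1))
n_unique
#> Species
#>       3

# A quick base R bar chart of those counts.
barplot(n_unique, ylab = "Number of unique values")
```

On a data frame with several categorical columns, this gives one bar per column.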