Aligning side-by-side related plots with ggplot2 - ggplot2

I have the following two-part plot whose panels are not aligned:
Side-by-side plots not aligned
These plots are produced by the following code:
require(ggplot2)
require(gridExtra)
set.seed(0)
data <- data.frame(x=rpois(30,5),y=rpois(30,11),z=rpois(300,25))
left.plot <- ggplot(data,aes(x,y)) +
geom_bin2d(binwidth=1)
margin.data <- as.data.frame( margin.table(table(data),1))
right.plot <- ggplot(margin.data, aes(x=x,y=Freq)) +
geom_bar(stat="identity")+coord_flip()
grid.arrange(left.plot, right.plot, ncol=2)
How can I align the rows in the left plot to the bars in the right plot?

Your issues are simple, albeit twofold.
Ultimately you need to use scale_y_continuous() and scale_x_continuous() to set matching axis limits on each figure. That's impeded by the fact that the x value in margin.data is a factor. Convert it to a numeric, throw in some scaling, and you're good to go.
left.plot <- ggplot(data,aes(x,y)) +
geom_bin2d(binwidth=1) +
scale_y_continuous(limits = c(1, 16))
margin.data <- as.data.frame( margin.table(table(data),1))
right.plot <- ggplot(margin.data, aes(x=as.numeric(as.character(x)),y=Freq)) +
geom_bar(stat="identity") +
scale_x_continuous(limits = c(1, 16)) +
xlab("x") +
coord_flip()
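The factor conversion is the step that is easiest to get wrong. as.data.frame() on a table stores x as a factor, and calling as.numeric() directly on a factor returns the internal level codes rather than the original values. Here is a minimal base R sketch of the difference. It uses a small made-up table rather than the question's data, and the one-unit padding of the limits is an assumption, not something taken from the answer:

```r
# Hypothetical counts; the values 3, 5 and 9 stand in for the question's x values
counts <- as.data.frame(table(x = c(3, 5, 5, 9, 9, 9)))

is_fac <- is.factor(counts$x)                  # table() output is stored as a factor
codes  <- as.numeric(counts$x)                 # level codes, not the original values
values <- as.numeric(as.character(counts$x))   # the original x values

# Limits shared by both plots can then be derived from the numeric values
# (padded by one unit here so that bars at the edges are not dropped)
shared_limits <- range(values) + c(-1, 1)
```

The same shared_limits vector could then be passed to scale_y_continuous() in the left plot and scale_x_continuous() in the right plot.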

Using the ggExtra package I was able to get close to a solution:
require(ggplot2)
require(ggExtra)
set.seed(0)
data <- data.frame(x=rpois(30,5),y=rpois(30,11),z=rpois(300,25))
left.plot <- ggplot(data,aes(x,y)) + geom_bin2d(binwidth=1)
ggMarginal(left.plot, margins="y", type = "histogram", size=2,bins=(max(data$y)-min(data$y)+1),binwidth=1.06)
I say "close" because I had to manually set binwidth=1.06 to align the bars with the counts.
Manually aligned plots using ggExtra::ggMarginal

How can I define and add a legend to this ggplot2 script?

I came up with the following script to bin my data on X values, and plot the means of those bins in overlapping bar graphs. It works fine, but I can't seem to get a legend to generate, probably due to poor understanding of aesthetic mapping.
Here is the script. Note that "MOI" and "T_cell_contacts" are two data columns in each DF.
ggplot(mapping=aes(MOI, T_cell_contacts)) +
stat_summary_bin(data = Cleaned24hr4, fun = "mean", geom="bar", bins= 100, fill = "#FF6666", alpha = 0.3) +
stat_summary_bin(data = cleaned24hr8, fun = "mean", geom="bar", bins= 100, fill = "#3733FF", alpha = 0.3) +
ylab("mean")
I also added the graph that it plots.
Full disclosure: I was in the middle of writing this when @schumacher posted their response :). Decided to finish anyway.
There are two ways to approach this. One way (more complicated) is to keep the dataframes separate and ask ggplot2 to create a legend via mapping. The second (simpler) way is to combine them into one dataset, similar to what @schumacher posted, and map the fill color to the extra id column created.
I'll show you both, but first, here's a sample dataset:
library(ggplot2)
set.seed(8675309)
df1 <- data.frame(my_x=rep(1:100, 3), my_y=rnorm(300, 40, 4))
df2 <- data.frame(my_x=rep(11:110, 3), my_y=rnorm(300, 110, 10))
# and the plot code similar to OP's question
ggplot(mapping=aes(x = my_x, y = my_y)) +
stat_summary_bin(data=df1, fun="mean", geom="bar", bins=40, fill="blue", alpha=0.3) +
stat_summary_bin(data=df2, fun="mean", geom="bar", bins=40, fill="red", alpha=0.3)
Method 1 : Combine Dataframes
This is the preferred method for a variety of reasons I can't list completely here. There are a lot of options you can use for combining datasets. One is using union() or rbind() after adding some sort of ID column to your data, but you can do it all in one shot using bind_rows() from dplyr:
df <- dplyr::bind_rows(list(dataset1 = df1, dataset2 = df2), .id="id")
This binds the rows together and, because the .id argument is specified, creates a new column in the dataset called "id" that uses the name of each dataset in the list as its value. In this case, the value in the df$id column is either "dataset1" if the row originated from df1 or "dataset2" if it originated from df2.
Then you use aes(fill=...) to map the fill color to the column "id" in the combined dataset.
p <- ggplot(df, aes(x=my_x, y=my_y)) +
stat_summary_bin(aes(fill=id), fun="mean", geom="bar", bins=40, alpha=0.3)
p
This creates a plot with the default colors for fill, so if you want to supply your own, just use scale_fill_manual(values=...) to specify the particular colors. Using a named vector for values= ensures that each color is applied to the dataset you intend, but you can also supply an unnamed vector of color names.
p + scale_fill_manual(values = c("dataset1" = "blue", "dataset2" = "red"))
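For readers who would rather not depend on dplyr, the combining step of Method 1 can be approximated in base R. This is only a sketch with tiny made-up stand-ins for df1 and df2. The names datasets and tagged are hypothetical, and the result mimics what bind_rows(..., .id="id") produces rather than reproducing it exactly:

```r
# Hypothetical stand-ins for df1 and df2 from the example above
df1 <- data.frame(my_x = 1:3,   my_y = c(40, 41, 39))
df2 <- data.frame(my_x = 11:13, my_y = c(110, 108, 112))

datasets <- list(dataset1 = df1, dataset2 = df2)

# Tag every data frame with its list name, then stack the tagged frames
tagged <- Map(
  function(d, nm) data.frame(id = nm, d, stringsAsFactors = FALSE),
  datasets, names(datasets)
)
df <- do.call(rbind, tagged)
rownames(df) <- NULL   # drop the "dataset1.1"-style row names rbind creates
```

Because the data frames live in a named list, the same code works unchanged for 5, 10, or 20 datasets.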
Method 2 : Use mapping to add the legend
While Method 1 is preferred, there is another way that does not force you to combine your dataframes. This is also useful to illustrate a bit about how ggplot2 decides to create and draw legends. The legend is created automatically via the mapping= argument, specifically via aes(). If you put any aesthetic inside of aes() that would normally affect appearance rather than location (so not x or y, with some exceptions such as label), then this initiates the creation of a legend. You can map either a column in your dataset (like above), or you can just supply a single value and that will be applied to the entire dataset used for the geom. In this case, see what happens when you move the fill= argument for each geom call inside aes() and assign it a character value:
p1 <- ggplot(mapping = aes(x=my_x, y=my_y)) +
stat_summary_bin(aes(fill="first"), data=df1, fun="mean", geom="bar", bins=40, alpha=0.3) +
stat_summary_bin(aes(fill="second"), data=df2, fun="mean", geom="bar", bins=40, alpha=0.3) +
scale_fill_manual(values = c("first" = "blue", "second" = "red"))
p1
It works! When you provide a character value for the fill= aesthetic inside aes(), it's basically labeling every observation in that data to have the value "first" or "second" and using that to make the legend. Cool, right?
You may notice a problem, though: the alpha value in the legend is not correct. This is because of overplotting. It's one of the reasons why you shouldn't really do it this way, but it sort of works. It is only noticeable if you have an alpha value. You can get the legend to look normal, but you need to use guide_legend() to override the aesthetics. Since the code effectively causes the legend key to be drawn once for each geom, you have to cut the alpha value in half for it to display correctly.
p1 + guides(fill=guide_legend(override.aes = list(alpha=0.15)))
Oh, and the real reason why not to use Method 2 is.... just think about doing that again for 5 datasets... how about 10?... how about 20?.....
I think the difficulty has to do with building a single legend out of two different geoms. My approach was to combine your data into a single data frame, with the records from each source set apart by a new category column, which I'll call "cat" for short.
With the popular dplyr package:
library(dplyr)
Cleaned24hr4 <- mutate(Cleaned24hr4, cat = "hr4")
Cleaned24hr8 <- mutate(Cleaned24hr8, cat = "hr8")
Then put them together:
Cleaned <- union(Cleaned24hr4,Cleaned24hr8)
Define your colors:
colorcode <- c("hr4" = "#FF6666", "hr8" = "#3733FF")
Here's my ggplot statement:
ggplot(Cleaned, mapping=aes(MOI, T_cell_contacts)) +
stat_summary_bin(fun = "mean", geom="bar", bins= 100, aes(fill = cat), alpha = 0.3) +
scale_fill_manual(values = colorcode) +
ylab("mean")
Output using some dummy data.
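One detail worth checking in the combining step: with dplyr loaded, union() on data frames follows set semantics and drops duplicate rows, whereas bind_rows() keeps every row. Since the plot shows bin means, dropped duplicates could shift the result. A small sketch with made-up values (the data frames a and b are hypothetical, not the asker's data):

```r
library(dplyr)

# Hypothetical data: b contains the same measurement twice
a <- data.frame(MOI = c(1, 2), T_cell_contacts = c(5, 7), cat = "hr4")
b <- data.frame(MOI = c(1, 1), T_cell_contacts = c(5, 5), cat = "hr8")

n_union <- nrow(dplyr::union(a, b))      # duplicate rows are removed
n_bind  <- nrow(dplyr::bind_rows(a, b))  # all rows are kept
```

If repeated measurements are legitimate observations, bind_rows() is probably the safer choice for this use case.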

Combining shape & color legends in ggplot2

When I plot the data below, I get 2 separate legends: factor(Type), relating to color, & factor(Category), relating to shape. I would like to have one legend (with no title) that represents both color & shape. Other StackOverflow solutions have not worked for me, please help!
library(sp)
library(sf)
library(ggplot2)
library(ggmap)
library(dplyr)
Retrieve & format NYC area basemap
region.bb = c(left=-74.25,bottom=40.55,right=-73.7,top=40.97)
nyc.stamen <- get_stamenmap(bbox=region.bb,zoom=10,maptype="terrain-background")
Create data frame of coordinate data
Longitude <- c(-73.950311,-73.964482,-73.953678,-73.893522,-73.815856,-74.148499,-73.9465,-73.9585,-73.9223,-73.877744,-73.8796,-73.873983,-73.7781,-74.1745,-74.193432,-74.116770,-73.816316,-74.099108,-73.765924,-73.916045)
Latitude <- c(40.815313,40.767544,40.631762,40.872481,40.734335,40.604014,40.7315,40.8217,40.7905,40.837525,40.8105,40.776969,40.6413,40.6895,40.580011,40.773013,40.857311,40.744994,40.610648,40.799044)
Category <- c(0,1,1,1,1,1,2,2,2,3,4,5,5,5,6,6,6,7,7,7)
Type <- c(1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2)
coordinate.data <- data.frame(Longitude,Latitude,Type,Category,stringsAsFactors=F)
rownames(coordinate.data) <- c("METER","MANH","BKLN","BRON","QUEE","STAT","NEWTOWN_CREEK","NORTH_RIVER","WARDS_ISLAND","BUS_DEPOT","HUNTS_POINT","LGA","JFK","NJT","SITS","ERIE_NJ","BRONX_PELHAM","HUSDON","BAYSWATER","PP")
Plot points over NYC basemap
map.plot <- ggmap(nyc.stamen) +
xlab("Longitude") +
ylab("Latitude") +
geom_point(data=coordinate.data,aes(x=Longitude,y=Latitude,color=factor(Type),shape=factor(Category)),size=3) +
scale_shape_manual(values=c(8,4,0,1,2,5,6,10)) +
scale_color_manual(values=c("red","black")) +
theme(legend.background=element_rect(fill="white"),legend.key=element_rect(fill="white",color=NA))
print(map.plot)
Normally, you combine the color and shape legends when they both use the same variable as factor. In your case, color is a factor of Type and shape is a factor of Category. Combining them does not make any sense. My suggestion is to leave them with no title. While in your example there is a clear distinction of colors, you could have a red square and a black square, and in such a situation what should the legend display?
You can eliminate the legend titles with the statement
labs(x="Longitude" , y="Latitude" ,col=NULL, shape=NULL ) +
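Also, you can combine the legends with the following statement, which hides the color legend and recolors the shape keys. The override needs one color per Category level; in your data, categories 0 and 1 belong to Type 1 (red) and categories 2 to 7 belong to Type 2 (black):
guides(color="none", shape= guide_legend(override.aes=list(color=c("red","red",rep("black",6)) ))) +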
My suggestion is not to do so.
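If a single legend is still wanted, another common ggplot2 approach is to map both color and shape to the same variable and give both scales the same name, so that ggplot2 merges them into one guide. The sketch below assumes that every Category belongs to exactly one Type, as in the question's data. It uses a small made-up data frame (pts) instead of the map data, and the names type_cols, lookup, cat_cols, and cat_shapes are hypothetical:

```r
library(ggplot2)

# Hypothetical points; each Category belongs to exactly one Type
pts <- data.frame(
  x = 1:6,
  y = c(2, 4, 1, 5, 3, 6),
  Category = factor(c(0, 1, 1, 2, 3, 3)),
  Type = factor(c(1, 1, 1, 2, 2, 2))
)

type_cols <- c("1" = "red", "2" = "black")

# Look up the color of each Category through the Type it belongs to
lookup <- unique(pts[, c("Category", "Type")])
cat_cols <- setNames(type_cols[as.character(lookup$Type)],
                     as.character(lookup$Category))
cat_shapes <- setNames(c(8, 4, 0, 1), levels(pts$Category))

# Same variable and same (empty) title on both scales, so the legends merge
p <- ggplot(pts, aes(x, y, colour = Category, shape = Category)) +
  geom_point(size = 3) +
  scale_colour_manual(name = NULL, values = cat_cols) +
  scale_shape_manual(name = NULL, values = cat_shapes)
```

The trade-off is that the legend no longer says which Type a color stands for, which is part of why the answer above recommends keeping the legends separate.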

Draw ggplots in the plot grid by column - plot_grid() function of the cowplot package

I am using the plot_grid() function of the cowplot package to draw ggplots in a grid and would like to know if there is a way to draw plots by column instead of by row?
library(ggplot2)
library(cowplot)
df <- data.frame(
x = c(3,1,5,5,1),
y = c(2,4,6,4,2)
)
# Create plots: say two each of path plot and polygon plot
p <- ggplot(df)
p1 <- p + geom_path(aes(x,y)) + ggtitle("Path 1")
p2 <- p + geom_polygon(aes(x,y)) + ggtitle("Polygon 1")
p3 <- p + geom_path(aes(y,x)) + ggtitle("Path 2")
p4 <- p + geom_polygon(aes(y,x)) + ggtitle("Polygon 2")
plots <- list(p1,p2,p3,p4)
plot_grid(plotlist=plots, ncol=2) # plots are drawn by row
I would like to have plots p1 and p2 in the first column and p3 and p4 in the second column, something like:
plots <- list(p1, p3, p2, p4) # plot sequence changed
plot_grid(plotlist=plots, ncol=2)
In practice I could have 4, 6, or 8 plots. The number of rows in the plot grid will vary, but there will always be 2 columns. In each case I would like to fill the plot grid by column (vertically), so that my first 2, 3, or 4 plots, as the case may be, appear above one another. I would like to avoid hardcoding these different permutations if I can instead specify something like par(mfcol = c(n,2)).
As you have observed, plot_grid() draws plots by row. I don't believe there's any way to change that, so if you want to maintain using plot_grid() (which would be probably most convenient), then one approach could be to change the order of the items in your list of plots to match what you need for plot_grid(), given knowledge of the number of columns.
Here's a function I have written that does that. The basic idea is to:
create a vector of indexes for the items in your list (i.e. 1:length(your_list)),
put the index numbers into a matrix with the specified number of columns, filled column by column,
read that matrix back into another vector of indexes row by row,
reorder your list according to the newly ordered indexes.
I've tried to build in a way to make this work even if the number of items in your list is not divisible by the intended number of columns (like a list of 8 items arranged in 3 columns).
reorder_by_col <- function(myData, col_num) {
x <- seq_along(myData) # create index vector
length(x) <- ceiling(length(x) / col_num) * col_num # pad with NAs to fill the grid
temp_matrix <- matrix(x, ncol=col_num, byrow = FALSE) # fill column by column
new_x <- as.vector(t(temp_matrix)) # read the indexes back row by row
return(myData[new_x]) # NA indexes give NULL entries, which plot_grid() leaves empty
}
This all was written with a little help from Google and specifically answers to questions posted here and here.
You can now see the difference without reordering:
plots <- list(p1,p2,p3,p4)
plot_grid(plotlist=plots, ncol=2)
... and with reordering using the new method:
newPlots <- reorder_by_col(myData=plots, col_num=2)
plot_grid(plotlist=newPlots, ncol=2)
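To see what the reordering does to the indexes, here is a small base R illustration for the uneven case mentioned above (8 items in 3 columns). The variable names are hypothetical and the snippet only shows the index arithmetic, not the plotting:

```r
n_items <- 8   # hypothetical number of plots
n_cols  <- 3

idx <- seq_len(n_items)
length(idx) <- ceiling(n_items / n_cols) * n_cols   # pad with NA up to 9 entries

by_col    <- matrix(idx, ncol = n_cols)   # columns hold 1:3, 4:6 and 7, 8, NA
row_order <- as.vector(t(by_col))         # read the same matrix row by row
# row_order is 1 4 7 2 5 8 3 6 NA
```

Handing the plots to a row-filling layout in row_order puts items 1, 2, 3 in the first column, 4, 5, 6 in the second, and 7, 8 plus one empty cell in the third.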
The argument, byrow, has now been added to plot_grid.
If you have fewer plots than nrow * ncol, the remaining spots will be empty.
You can now call:
library(ggplot2)
df <- data.frame(
x = 1:10, y1 = 1:10, y2 = (1:10)^2, y3 = (1:10)^3, y4 = (1:10)^4
)
p1 <- ggplot(df, aes(x, y1)) + geom_point()
p2 <- ggplot(df, aes(x, y2)) + geom_point()
p3 <- ggplot(df, aes(x, y3)) + geom_point()
cowplot::plot_grid(p1, p2, p3, byrow = FALSE)

Grouping the factors in ggplot

I am trying to create a graph based on matrix similar to one below... I am trying to group the Erosion values based on "Slope"...
library(ggplot2)
new_mat<-matrix(,nrow = 135, ncol = 7)
colnames(new_mat)<-c("Scenario","Runoff (mm)","Erosion (t/ac)","Slope","Soil","Tillage","Rotation")
for ( i in 1:nrow(new_mat)){
new_mat[i,2]<-sample(10:50, 1)
new_mat[i,3]<-sample(0.1:20, 1)
new_mat[i,4]<-sample(c("S2","S3","S4","S5","S1"),1)
new_mat[i,5]<-sample(c("Deep","Moderate","Shallow"),1)
new_mat[i,7]<-sample(c("WBP","WBF","WF"),1)
new_mat[i,6]<-sample(c("Intense","Reduced","Notill"),1)
new_mat[i,1]<-paste0(new_mat[i,4],"_",new_mat[i,5],"_",new_mat[i,6],"_",new_mat[i,7],"_")
}
#### Graph part ########
grphs_mat<-as.data.frame(new_mat)
grphs_mat$`Runoff (mm)`<-as.numeric(as.character(grphs_mat$`Runoff (mm)`))
grphs_mat$`Erosion (t/ac)`<-as.numeric(as.character(grphs_mat$`Erosion (t/ac)`))
ggplot(grphs_mat, aes(Scenario, `Erosion (t/ac)`,group=Slope, colour = Slope))+
scale_y_continuous(limits=c(0,max(as.numeric((grphs_mat$`Erosion (t/ac)`)))))+
geom_point()+geom_line()
But when I run this code, the values are spread along the x-axis across all 135 scenarios. What I want is for the grouping to be done by slope, while the other common factors (Soil + Rotation + Tillage) are picked up and placed on the x-axis. For example:
For these five scenarios:
S1_Deep_Intense_WBF_
S2_Deep_Intense_WBF_
S3_Deep_Intense_WBF_
S4_Deep_Intense_WBF_
S5_Deep_Intense_WBF_
The plot should separate S1, S2, S3, S4, and S5, but also recognize that the other factors are the same and put them on the x-axis, so that the slope lines are stacked on top of each other at 135/5 = 27 x-axis points. The final figure should look like this (refer to the image). Apologies for not being able to explain it better.
I think I am making a mistake in grouping or in assigning the x-axis values.
I would appreciate your suggestions.
With the example you give, not every possible factor combination was represented, so the plots looked a bit weird. What I did instead was start with the following:
set.seed(42)
new_mat <- matrix(,nrow = 1000, ncol = 7)
I then deduplicated this by summarising the values. A possibly relevant step here for your analysis is that I made a new variable with the interaction() function that is the combination of the three other factors.
library(tidyverse)
df <- grphs_mat
df$x <- with(df, interaction(Rotation, Soil, Tillage))
# The simulation did not yield unique combinations
df <- df %>% group_by(x, Slope) %>%
summarise(n = sum(`Erosion (t/ac)`))
Next, I plotted this new x variable on the x-axis and used "stack" positions for the lines and points.
g <- ggplot(df, aes(x, y = n, colour = Slope, group = Slope)) +
geom_line(position = "stack") +
geom_point(position = "stack")
To make the x-axis slightly more readable, you can replace the . that the interaction() function placed by newlines.
g + scale_x_discrete(labels = function(x){gsub("\\.", "\n", x)})
Another option is to simply rotate the x axis labels:
g + theme(axis.text.x.bottom = element_text(angle = 90))
There are a few additional options for the x-axis if you go into ggplot2 extension packages.
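For reference, this is roughly what interaction() produces and how the label clean-up works, shown on two made-up scenarios in base R. The vectors rotation, soil, and tillage are hypothetical stand-ins for the columns of grphs_mat:

```r
# Two hypothetical scenarios
rotation <- c("WBP", "WF")
soil     <- c("Deep", "Shallow")
tillage  <- c("Intense", "Notill")

combo <- interaction(rotation, soil, tillage)
combo_chr <- as.character(combo)   # values joined with "."

# By default every possible combination becomes a level, observed or not;
# drop = TRUE keeps only the combinations present in the data
n_all      <- nlevels(combo)
n_observed <- nlevels(interaction(rotation, soil, tillage, drop = TRUE))

# The same substitution used in scale_x_discrete(labels = ...) above
axis_labels <- gsub("\\.", "\n", combo_chr)
```

Whether unused combinations should appear as empty positions on the x-axis is a design choice; with drop = TRUE they would not be created as levels in the first place.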

Plotly is not reading ggplot output well

I am using the following code to plot some data points, and it works well in ggplot. However, when I feed the plot into ggplotly, the visualization and the Y-axis labels change completely. The Y-axis labels shift to the right and get flipped, and the lines in the center get thinner.
Code
library(ggplot2)
library(tidyverse)
library(plotly)
file2 <- read.csv( text = RCurl::getURL("https://gist.githubusercontent.com/gireeshkbogu/806424c1777ff721a046b3e30e85af5a/raw/50ac0b4696f514677b4987b90305fdf879fbcd84/reproducible.examples.txt"), sep="\t")
p <- ggplot(data=subset(file2,!is.na(datetime)),
aes(x=datetime, y=Count,
color=Type,
group=Subject)) +
geom_point(size=4, alpha=0.6) +
scale_y_continuous(breaks=c(0,1))+
theme(axis.text.x=element_text(angle=90, size = 5))+
facet_grid(Subject ~ ., switch = "y") +
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
theme(strip.text.y.left = element_text(angle = 0, size=5)) +
scale_color_manual(values=c("red", "#990000", "#330000", "#00CC99", "#0099FF"))
ggplotly(p)
Ggplot image
Ggplotly image
Reproducible Example
Subject datetime Type Count
user1 4/16/20 15:00 A1 1
user1 3/28/20 13:00 A1 1
user2 4/29/20 15:00 A1 1
user2 5/02/20 09:00 A1 1
user1 2/19/20 18:00 A2 1
user1 4/20/20 16:00 A2 1
Converting ggplot to plotly turns out to be surprisingly complicated! Many ggplot features are silently dropped or incorrectly translated over to plotly.
If I am not mistaken, switch = "y" within your facet_grid is being silently dropped.
In addition, you have too many facets in your plot. It looks like "Subject" is creating 30+ facets. I know that it is tempting to try to fit as much data as possible into one plot, but you are really pushing the limits of what you can do with facets here.
I made some modifications. See if this is something you can work with:
library(ggplot2)
library(tidyverse)
library(plotly)
library(RCurl)
# your original file
file2 <- read.csv( text = RCurl::getURL("https://gist.githubusercontent.com/gireeshkbogu/806424c1777ff721a046b3e30e85af5a/raw/50ac0b4696f514677b4987b90305fdf879fbcd84/reproducible.examples.txt"), sep="\t")
head(file2)
# scaling down the dataframe so that you have fewer facets per plot
file3 <- file2 %>%
as_tibble() %>%
na.omit() %>%
filter(Subject %in% c("User1", "User2", "User3", "User4")) %>%
arrange(Subject, datetime)
head(file3)
# sending the smaller data frame to ggplot
p_2 <- ggplot(data=file3,
aes(x=datetime, y=Count, color=Type, group=Subject)) +
geom_point(size=4, alpha=0.6) +
scale_y_continuous(breaks=c(0,1))+
theme(axis.text.x=element_text(angle=90, size = 5)) +
facet_grid(Subject ~ .) + # removing "Switch" ; it is being dropped by plotly
theme(axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank(),
legend.position = "left") + # move legend to left on ggplot
theme(strip.text.y.left = element_text(angle = 0, size=5)) +
scale_color_manual(values=c("red", "#990000", "#330000", "#00CC99", "#0099FF"))
p_2
ggplotly(p_2) %>%
layout(title = "Modified & Scaled Down Plot",
legend = list(orientation = "v", # fine-tune legend directly in plotly,
y = 1, x = -0.1)) # you may need to fiddle with these
The modified code yields me this plot. You will probably need to make a few small groups by "Subject" and call a plot for each group.
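The suggestion to make a few small groups by "Subject" could be sketched like this in base R. The subject IDs and the group size of 4 are assumptions for illustration; in practice the IDs would come from something like unique(file2$Subject):

```r
# Hypothetical subject IDs
subjects <- paste0("user", 1:10)
group_size <- 4   # assumed number of facets per plot

# Assign consecutive subjects to chunks of at most group_size
groups <- split(subjects, ceiling(seq_along(subjects) / group_size))
group_sizes <- lengths(groups)
```

Each element of groups could then be used to filter the data (for example with Subject %in% groups[[1]]) before building one ggplot and one ggplotly call per group.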