How to adjust plot areas in ggplot? - ggplot2

I am trying to use grid.arrange to display multiple graphs generated by ggplot on the same page. Each subplot has different x and y scales, and the two subplots share one legend. My goal is to make the plot areas the same size. Is there a parameter to adjust the plot area (excluding the legend area)? Faceting is not adequate for this arrangement.
library(ggplot2)
library(gridExtra)  # provides grid.arrange()
df <- data.frame(class=paste0('a',1:20),
                 x1=runif(20),
                 x2=runif(20),
                 y1=runif(20),
                 y2=runif(20))
# p1 carries the shared legend; p2 hides it
p1 <- ggplot(df,aes(x=x1,y=y1))+
  geom_point(aes(color=class),size=2,show.legend=TRUE)+
  stat_smooth(method='lm',color='black')+
  theme_bw()
p2 <- ggplot(df,aes(x=x2,y=y2))+
  geom_point(aes(color=class),size=2,show.legend=FALSE)+
  stat_smooth(method='lm',color='black')+
  theme_bw()
grid.arrange(p1,p2,nrow=2)

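Using the patchwork package, which aligns the panel areas of the combined plots:
# install.packages("patchwork")  # also available from CRAN
# install.packages("devtools", dependencies = TRUE)
# devtools::install_github("thomasp85/patchwork")
library(patchwork)
# p1 / p2 stacks the two plots vertically
p1 / p2 + plot_annotation(title = "Plot title",
                          subtitle = "Plot subtitle",
                          tag_levels = 'A',
                          tag_suffix = ')')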
Created on 2018-11-20 by the reprex package (v0.2.1.9000)

Related

Plotly does not properly show axis numbers with math_format

The code below uses math_format in the 'labels' argument of a ggplot scale. The axis labels display correctly when the plot is drawn by ggplot, but not when the same plot is displayed through plotly. I need to use plotly in my code. Does anybody have a suggestion?
library(tidyverse)
library(scales)
library(plotly)
p <- mtcars %>% ggplot(aes(x=mpg, y=disp))+
  geom_point() +
  scale_x_continuous(trans = log_trans(),
                     breaks = trans_breaks("log", function(x) exp(x), n.breaks = 5),
                     labels = trans_format("log", math_format(e^.x, format = function(x) number(x, accuracy = 0.01, decimal.mark = ','))))
p
ggplotly(p)

Legend title for ggplot with several options

I am plotting density plots by groups with different colors and linetypes.
ggplot(data = data,
aes(x=X, group=G, color=factor(G), linetype = factor(G))) +
geom_density()
How can I change the legend title while keeping it as one legend?
Your issue comes from the fact that when you add a legend title (e.g., using scale_color_discrete), you're only doing it for color and not linetype. The first plot is fine because the legends have identical arguments (i.e., neither is specified). You need to provide identical specifications for each legend in order to combine them. See this post for more information.
There may be other ways around this issue, but we can't say for certain since we can't access your dataset (data).
library(tidyverse)
data(mtcars)
# this is ok; one legend
ggplot(data = mtcars,
       aes(x=mpg, group=cyl, color=factor(cyl), linetype = factor(cyl))) +
  geom_density()
# this is not ok; now two legends
ggplot(data = mtcars,
       aes(x=mpg, group=cyl, color=factor(cyl), linetype = factor(cyl))) +
  geom_density() + scale_color_discrete("New Name?")
# this is ok again; back to one legend
ggplot(data = mtcars,
       aes(x=mpg, group=cyl, color=factor(cyl), linetype = factor(cyl))) +
  geom_density() +
  scale_colour_manual(name = "New!",
                      labels = c("4", "6", "8"),
                      values = c("red", "blue", "pink")) +
  scale_linetype_manual(name = "New!",
                        labels = c("4", "6", "8"),
                        values = c(2, 4, 6))

plotting 'rworldmap' shapefile in ggplot2 across pacific ocean

I'm having issues making a plot in ggplot using some raster data I've gathered. I'll simulate the raster data as a dataframe here:
# Set up coordinates #
lon <- seq(120, 290, 1)
lat <- seq(-30, 30, 1)
r1 <- data.frame(
"lon" = rep(lon, length(lat)),
"lat" = rep(lat, each = length(lon))
)
# Add variable #
set.seed(2022)
r1$var <- rnorm(n = nrow(r1), 0, 1)
# Plot raster #
library(ggplot2)
p1 <- ggplot(r1, aes(x = lon, y = lat, fill = var))+
geom_raster()
p1
The problem appears when I try to add a shapefile (specifically from rworldmap) to this plot. Because the shapefile's longitudes run from -180 to 180 (rather than 0 to 360, as in my raster data), nothing east of 180°E is drawn.
library(rworldmap)
library(sf)
# Download Shapefile #
world.shp <- getMap(resolution = 'low')
world.shp <- st_as_sf(world.shp)
# Plot shapefile on top of raster data #
p2 <- ggplot()+
geom_raster(data = r1, aes(x = lon, y = lat, fill = var))+
geom_sf(data = world.shp)+
coord_sf(xlim = c(120, 290), ylim = c(-30, 30), expand = TRUE)
p2
Notice that only Australia is drawn, when South America, Latin America, and North America should also appear.
I've tried many different strategies to reproject the rworldmap shapefile (world.shp), from defining a crs in st_as_sf() to specifying a crs in the coord_sf() argument. However, I've had no success. The solution seems very simple, but I can't seem to find it. Any help with this would be greatly appreciated.
Cheers,
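The mismatch described in this question comes down to two longitude conventions. As a minimal sketch in Python (an illustration of the arithmetic only, not a fix for the R code above; the helper names `to_0_360` and `to_pm180` are hypothetical), the conversion between them can be written as:

```python
def to_0_360(lon):
    """Map a longitude in degrees to the [0, 360) convention."""
    return lon % 360.0


def to_pm180(lon):
    """Map a longitude in degrees to the [-180, 180) convention."""
    return (lon + 180.0) % 360.0 - 180.0


# A point at 70 degrees West is -70 in one convention and 290 in the other.
print(to_0_360(-70.0))   # 290.0
print(to_pm180(290.0))   # -70.0
# Longitudes west of the antimeridian are the same in both conventions.
print(to_pm180(120.0))   # 120.0
```

Under this reading, the raster (120 to 290) and the shapefile (-180 to 180) only overlap between 120 and 180, which would be consistent with only Australia appearing. One of the two layers would presumably need its longitudes shifted into the other's convention before they line up.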

How to generate several legends for single plot matplotlib

I was making a plot of f(x,y,z) and wanted to display it in a 2D plane. To avoid cluttering the legend, I decided to use different linestyles for y and different colors for z, and to put the two in separate legends. I couldn't find out how to do this even after a lot of digging, so I'm posting the solution I came up with here :) If anyone has a more elegant solution, I'm all ears :)
Basically, the solution was to make three subplots, set two of them to size (0,0), and place those two where I wanted the legends. It feels like an ugly way to do it, but it gave a nice plot and I didn't find any other way :) The code that produced the plot:
import numpy as np
import matplotlib.pyplot as plt

def plot_alt(style = 'log'):
    cmap = plt.get_cmap('inferno') #plt.get_cmap works across Matplotlib versions
    color_scale = 1.2 #Variable to get colors from a certain part of the colormap
    #Making grids for delta T and average concentration
    D_T_axis = -np.logspace(np.log10(400), np.log10(1), 7)
    C_bar_list = np.linspace(5,10,4)
    ST_list = np.logspace(-3,-1,100)
    # f(x,y,z)
    DC_func = lambda C_bar, ST, DT: 2*C_bar * (1 - np.exp(ST*DT))/(1 + np.exp(ST*DT))
    #Some different linestyles
    styles = ['-', '--', '-.', ':']
    fig, ax = plt.subplots(1,3, figsize = (10,5))
    plt.sca(ax[0])
    for i, C_bar in enumerate(C_bar_list): #See plot_c_rel_av_DT() for 'enumerate'
        for j, DT in enumerate(D_T_axis):
            plt.plot(ST_list, DC_func(C_bar, ST_list, DT), color = cmap(np.log10(-DT)/(color_scale*np.log10(-D_T_axis[0]))),
                     linestyle = styles[i])
    # Generating separate legends by plotting lines in the two other subplots
    # Basically: to get two separate legends I make two plots, place them where I want the legends
    # and set their size to zero, then display their legends.
    plt.sca(ax[1]) #Set current axes to ax[1]
    for i, C_bar in enumerate(C_bar_list):
        # Plotting the different linestyles
        plt.plot(C_bar_list, linestyle = styles[i], color = 'black', label = str(round(C_bar, 2)))
    plt.sca(ax[2])
    for DT in D_T_axis:
        #plotting the different colors
        plt.plot(D_T_axis, color = cmap(np.log10(-DT)/(color_scale*np.log10(-D_T_axis[0]))), label = str(int(-DT)))
    #Placing legend
    #This is where I move and scale the three plots to make one plot and two legends
    box0 = ax[0].get_position() #box0 is an object that contains the position and dimensions of the ax[0] subplot
    box2 = ax[2].get_position()
    ax[0].set_position([box0.x0, box0.y0, box2.x0 + 0.4*box2.width, box0.height])
    box0 = ax[0].get_position()
    ax[1].set_position([box0.x0 + box0.width, box0.y0 + box0.height + 0.015, 0,0])
    ax[1].set_axis_off()
    ax[2].set_position([box0.x0 + box0.width ,box0.y0 + box0.height - 0.25, 0,0])
    ax[2].set_axis_off()
    #Displaying plot
    plt.sca(ax[0])
    plt.xscale('log')
    plt.xlim(0.001, 0.1)
    plt.ylim(0, 5)
    plt.xlabel(r'$S_T$')
    plt.ylabel(r'$\Delta C$')
    ax[1].legend(title = r'$\langle c \rangle$ [mol/L]',
                 bbox_to_anchor = (1,1), loc = 'upper left')
    ax[2].legend(title = r'$-\Delta T$ [K]', bbox_to_anchor = (1,1), loc = 'upper left')
    #Suptitle is the title of the figure. You can also have titles for the individual subplots
    plt.suptitle('Steady state concentration gradient as a function of Soret-coefficient\n'
                 'for different temperature gradients and total concentrations')
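The zero-size-axes trick above is the author's approach. As one possible alternative, here is a minimal sketch that keeps a single Axes and adds the first legend back with `Axes.add_artist` before creating the second. The data, the labels `y0`, `z0`, ... and the legend titles are made up for illustration and are not taken from the plot above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np

styles = ['-', '--', '-.', ':']                    # one linestyle per "y" value
colors = ['tab:blue', 'tab:orange', 'tab:green']   # one colour per "z" value
x = np.linspace(0, 1, 50)

fig, ax = plt.subplots()
for i, ls in enumerate(styles):
    for j, c in enumerate(colors):
        # Placeholder curves; any family of lines would do here
        ax.plot(x, (i + 1) * x ** (j + 1), linestyle=ls, color=c)

# Proxy handles: black lines show the linestyles, solid lines show the colours
style_handles = [Line2D([], [], color='black', linestyle=ls, label=f'y{i}')
                 for i, ls in enumerate(styles)]
color_handles = [Line2D([], [], color=c, label=f'z{j}')
                 for j, c in enumerate(colors)]

first = ax.legend(handles=style_handles, title='linestyle', loc='upper left')
ax.add_artist(first)  # keep the first legend; a second ax.legend() call would replace it
second = ax.legend(handles=color_handles, title='colour', loc='lower right')
```

Calling `plt.show()` or `fig.savefig(...)` afterwards should draw both legends on the same axes. Whether this is neater than the three-subplot layout is a matter of taste; it avoids repositioning axes by hand, at the cost of building the legend handles manually.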

Scatterplot with marginal KDE plots and multiple categories in Matplotlib

I'd like a function in Matplotlib similar to the Matlab 'scatterhist' function which takes continuous values for 'x' and 'y' axes, plus a categorical variable as input; and produces a scatter plot with marginal KDE plots and two or more categorical variables in different colours as output:
I've found examples of scatter plots with marginal histograms in Matplotlib, marginal histograms in Seaborn jointplot, overlapping histograms in Matplotlib, and marginal KDE plots in Matplotlib; but I haven't found any examples that combine scatter plots with marginal KDE plots and are colour coded to indicate different categories.
If possible, I'd like a solution which uses 'vanilla' Matplotlib without Seaborn, as this will avoid dependencies and allow complete control and customisation of the plot appearance using standard Matplotlib commands.
I was going to try to write something based on the above examples; but before doing so wanted to check whether a similar function was already available, and if not then would be grateful for any guidance on the best approach to use.
@ImportanceOfBeingEarnest: Many thanks for your help.
Here's my first attempt at a solution.
It's a bit hacky but achieves my objectives, and is fully customisable using standard matplotlib commands. I'm posting the code here with annotations in case anyone else wishes to use it or develop it further. If there are any improvements or neater ways of writing the code I'm always keen to learn and would be grateful for guidance.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from scipy import stats
label = ['Setosa','Versicolor','Virginica'] # List of labels for categories
cl = ['b','r','y'] # List of colours for categories
categories = len(label)
sample_size = 20 # Number of samples in each category
# Create numpy arrays for dummy x and y data:
x = np.zeros(shape=(categories, sample_size))
y = np.zeros(shape=(categories, sample_size))
# Generate random data for each categorical variable:
for n in range(0, categories):
    x[n,:] = np.array(np.random.randn(sample_size)) + 4 + n
    y[n,:] = np.array(np.random.randn(sample_size)) + 6 - n
# Set up 4 subplots as axis objects using GridSpec:
gs = gridspec.GridSpec(2, 2, width_ratios=[1,3], height_ratios=[3,1])
# Add space between scatter plot and KDE plots to accommodate axis labels:
gs.update(hspace=0.3, wspace=0.3)
# Set background canvas colour to White instead of grey default
fig = plt.figure()
fig.patch.set_facecolor('white')
ax = plt.subplot(gs[0,1]) # Instantiate scatter plot area and axis range
ax.set_xlim(x.min(), x.max())
ax.set_ylim(y.min(), y.max())
ax.set_xlabel('x')
ax.set_ylabel('y')
axl = plt.subplot(gs[0,0], sharey=ax) # Instantiate left KDE plot area
axl.get_xaxis().set_visible(False) # Hide tick marks and spines
axl.get_yaxis().set_visible(False)
axl.spines["right"].set_visible(False)
axl.spines["top"].set_visible(False)
axl.spines["bottom"].set_visible(False)
axb = plt.subplot(gs[1,1], sharex=ax) # Instantiate bottom KDE plot area
axb.get_xaxis().set_visible(False) # Hide tick marks and spines
axb.get_yaxis().set_visible(False)
axb.spines["right"].set_visible(False)
axb.spines["top"].set_visible(False)
axb.spines["left"].set_visible(False)
axc = plt.subplot(gs[1,0]) # Instantiate legend plot area
axc.axis('off') # Hide tick marks and spines
# Plot data for each categorical variable as scatter and marginal KDE plots:
for n in range(0, categories):
    ax.scatter(x[n],y[n], color='none', label=label[n], s=100, edgecolor= cl[n])
    kde = stats.gaussian_kde(x[n,:])
    xx = np.linspace(x.min(), x.max(), 1000)
    axb.plot(xx, kde(xx), color=cl[n])
    kde = stats.gaussian_kde(y[n,:])
    yy = np.linspace(y.min(), y.max(), 1000)
    axl.plot(kde(yy), yy, color=cl[n])
# Copy legend object from scatter plot to lower left subplot and display:
# NB 'scatterpoints = 1' customises legend box to show only 1 handle (icon) per label
handles, labels = ax.get_legend_handles_labels()
axc.legend(handles, labels, scatterpoints = 1, loc = 'center', fontsize = 12)
plt.show()
Version 2, using Pandas to import 'real' data from a csv file, with a different number of entries in each category. (csv file format: row 0 = headers; col 0 = x values, col 1 = y values, col 2 = category labels). Scatterplot axis and legend labels are generated from column headers.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from scipy import stats
import pandas as pd
"""
Create scatter plot with marginal KDE plots
from csv file with 3 cols of data
formatted as following example (first row of
data are headers):
'x_label', 'y_label', 'category_label'
4,5,'virginica'
3,6,'setosa'
4,6, 'virginica' etc...
"""
df = pd.read_csv('iris_2.csv') # enter filename for csv file to be imported (within current working directory)
cl = ['b','r','y', 'g', 'm', 'k'] # Custom list of colours for each categories - increase as needed...
headers = list(df.columns) # Extract list of column headers
# Find min and max values for all x (= col [0]) and y (= col [1]) in dataframe:
xmin, xmax = df.iloc[:,0].min(), df.iloc[:,0].max()
ymin, ymax = df.iloc[:,1].min(), df.iloc[:,1].max()
# Create a list of all unique categories which occur in the right hand column (ie index '2'):
# (df.ix was removed from pandas; iloc selects by position)
category_list = df.iloc[:,2].unique()
# Set up 4 subplots and aspect ratios as axis objects using GridSpec:
gs = gridspec.GridSpec(2, 2, width_ratios=[1,3], height_ratios=[3,1])
# Add space between scatter plot and KDE plots to accommodate axis labels:
gs.update(hspace=0.3, wspace=0.3)
fig = plt.figure() # Set background canvas colour to White instead of grey default
fig.patch.set_facecolor('white')
ax = plt.subplot(gs[0,1]) # Instantiate scatter plot area and axis range
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
ax.set_xlabel(headers[0], fontsize = 14)
ax.set_ylabel(headers[1], fontsize = 14)
ax.yaxis.labelpad = 10 # adjust space between x and y axes and their labels if needed
axl = plt.subplot(gs[0,0], sharey=ax) # Instantiate left KDE plot area
axl.get_xaxis().set_visible(False) # Hide tick marks and spines
axl.get_yaxis().set_visible(False)
axl.spines["right"].set_visible(False)
axl.spines["top"].set_visible(False)
axl.spines["bottom"].set_visible(False)
axb = plt.subplot(gs[1,1], sharex=ax) # Instantiate bottom KDE plot area
axb.get_xaxis().set_visible(False) # Hide tick marks and spines
axb.get_yaxis().set_visible(False)
axb.spines["right"].set_visible(False)
axb.spines["top"].set_visible(False)
axb.spines["left"].set_visible(False)
axc = plt.subplot(gs[1,0]) # Instantiate legend plot area
axc.axis('off') # Hide tick marks and spines
# For each category in the list...
for n in range(0, len(category_list)):
    # Create a sub-table containing only entries matching current category:
    st = df.loc[df[headers[2]] == category_list[n]]
    # Select first two columns of sub-table as x and y values to be plotted:
    x = st[headers[0]]
    y = st[headers[1]]
    # Plot data for each categorical variable as scatter and marginal KDE plots:
    ax.scatter(x,y, color='none', s=100, edgecolor= cl[n], label = category_list[n])
    kde = stats.gaussian_kde(x)
    xx = np.linspace(xmin, xmax, 1000)
    axb.plot(xx, kde(xx), color=cl[n])
    kde = stats.gaussian_kde(y)
    yy = np.linspace(ymin, ymax, 1000)
    axl.plot(kde(yy), yy, color=cl[n])
# Copy legend object from scatter plot to lower left subplot and display:
# NB 'scatterpoints = 1' customises legend box to show only 1 handle (icon) per label
handles, labels = ax.get_legend_handles_labels()
axc.legend(handles, labels, title = headers[2], scatterpoints = 1, loc = 'center', fontsize = 12)
plt.show()
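For anyone who wants to try version 2 without a file to hand, the CSV layout it expects (header row, then x, y, category) can be sketched with an in-memory string. The column names and numbers below are invented for illustration, and `groupby` is used here as an assumed alternative to the boolean filtering in the loop above.

```python
import io
import pandas as pd

# Made-up data in the layout described above: x values, y values, category labels
csv_text = """length,width,group
5.0,3.5,alpha
4.5,3.0,alpha
7.0,3.2,beta
6.5,2.8,beta
6.0,3.3,gamma
"""

df = pd.read_csv(io.StringIO(csv_text))
headers = list(df.columns)  # axis labels and legend title come from here

# Overall axis ranges, selected by column position rather than by name
xmin, xmax = df.iloc[:, 0].min(), df.iloc[:, 0].max()
ymin, ymax = df.iloc[:, 1].min(), df.iloc[:, 1].max()

# One sub-table per category, keyed by the category label
groups = {name: sub for name, sub in df.groupby(headers[2])}
for name, sub in groups.items():
    print(name, len(sub))
```

Each sub-table in `groups` plays the role of `st` in the plotting loop, so the scatter and KDE calls could be applied to it unchanged.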