How can I use matplotlib.pyplot to customize geopandas plots?

What is the difference between geopandas plots and matplotlib plots? Why aren't all keywords available?
In geopandas there is markersize, but not markeredgecolor...
In the example below I plot a pandas df with some styling, then transform the pandas df to a geopandas df. Simple plotting works, but additional styling does not.
This is just an example. In my geopandas plots I would like to customize markers, legends, etc. How can I access the relevant matplotlib objects?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
X = np.linspace(-6, 6, 1024)
Y = np.sinc(X)
df = pd.DataFrame(Y, X)
plt.plot(X,Y,linewidth = 3., color = 'k', markersize = 9, markeredgewidth = 1.5, markerfacecolor = '.75', markeredgecolor = 'k', marker = 'o', markevery = 32)
# alternatively:
# df.plot(linewidth = 3., color = 'k', markersize = 9, markeredgewidth = 1.5, markerfacecolor = '.75', markeredgecolor = 'k', marker = 'o', markevery = 32)
plt.show()
# create GeoDataFrame from df (the index holds the X values, column 0 holds the Y values)
df.reset_index(inplace=True)
df.rename(columns={'index': 'x', 0: 'y'}, inplace=True)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['x'], df['y']))
gdf.plot(linewidth=3., color='k', markersize=9)  # working
gdf.plot(linewidth=3., color='k', markersize=9, markeredgecolor='k')  # not working: raises an AttributeError
plt.show()

You're probably confused by the fact that both libraries named the method .plot(). In Matplotlib, plt.plot specifically creates a mpl.lines.Line2D object, which also holds the markers and their styling.
Geopandas assumes you want to plot geographic data and draws it as a collection of paths (mpl.collections.PathCollection). For point geometries it goes through ax.scatter, so that collection has, for example, face and edge colors, but none of the Line2D marker keywords such as markeredgecolor.
The "trick" geopandas uses for points is that the collection holds a single Path, the marker shape itself: for the default "o" marker that is a closed circle drawn with "CURVE4" codes (cubic Bézier), which is then stamped at each point's position.
You can explore what's happening if you capture the axes that geopandas returns:
ax = gdf.plot(...)
Using ax.get_children() you get all artists that have been added to the axes. Since this is a simple plot, it's easy to see that the PathCollection is the actual data; the other artists draw the axis, spines, etc.
[<matplotlib.collections.PathCollection at 0x1c05d5879d0>,
<matplotlib.spines.Spine at 0x1c05d43c5b0>,
<matplotlib.spines.Spine at 0x1c05d43c4f0>,
<matplotlib.spines.Spine at 0x1c05d43c9d0>,
<matplotlib.spines.Spine at 0x1c05d43f1c0>,
<matplotlib.axis.XAxis at 0x1c05d036590>,
<matplotlib.axis.YAxis at 0x1c05d43ea10>,
Text(0.5, 1.0, ''),
Text(0.0, 1.0, ''),
Text(1.0, 1.0, ''),
<matplotlib.patches.Rectangle at 0x1c05d351b10>]
If you reduce the number of points a lot (say 5 instead of 1024), the printed output stays short. Retrieving the Path that is drawn shows its coordinates and also the codes used:
pcoll = ax.get_children()[0] # the first artist is the PathCollection
path = pcoll.get_paths()[0] # it only contains 1 Path
print(path.codes) # show the codes used.
# array([ 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
# 4, 4, 4, 4, 4, 4, 4, 4, 79], dtype=uint8)
Some more info about how these paths work can be found at:
https://matplotlib.org/stable/tutorials/advanced/path_tutorial.html
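To check where the circle codes in the output above come from, here is a minimal sketch that needs only NumPy and Matplotlib. It assumes (based on my reading of the geopandas source, not on its documentation) that geopandas draws point geometries with ax.scatter, so calling ax.scatter directly should produce an equivalent PathCollection:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PathCollection
from matplotlib.path import Path

# 5 points instead of 1024 keeps the printed offsets short
x = np.linspace(-6, 6, 5)
y = np.sinc(x)

fig, ax = plt.subplots()
# assumption: roughly what gdf.plot() does internally for point geometries
pcoll = ax.scatter(x, y, marker='o')

path = pcoll.get_paths()[0]  # the single Path: the marker shape, not the data
code_names = {Path.MOVETO: 'MOVETO', Path.LINETO: 'LINETO', Path.CURVE3: 'CURVE3',
              Path.CURVE4: 'CURVE4', Path.CLOSEPOLY: 'CLOSEPOLY'}
names = [code_names[c] for c in path.codes]
print(names[0], names.count('CURVE4'), names[-1])  # MOVETO 24 CLOSEPOLY
print(pcoll.get_offsets())  # one (x, y) row per point: the positions live here
```

The Path stores the marker shape once, while get_offsets() holds the point positions, which is why the codes (1 = MOVETO, 4 = CURVE4, 79 = CLOSEPOLY) stay the same however many points you plot.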
So, long story short: you do have the same kind of keywords as when using Matplotlib, but they are the keywords for a PathCollection (what ax.scatter returns), not for the Line2D object you might expect.
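As an illustration of that mapping, here is a sketch that reproduces the styled markers from the question with plain Matplotlib: the line is a Line2D from ax.plot, and the markers are a separate ax.scatter call, so they take PathCollection keywords. The keyword correspondence in the comments is my own summary; I would expect the same scatter-style keywords (edgecolor, facecolor, linewidths) to be forwarded by gdf.plot() for point data, but I have not checked every combination.

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.linspace(-6, 6, 1024)
Y = np.sinc(X)

fig, ax = plt.subplots()
ax.plot(X, Y, color='k', linewidth=3.)  # the line itself (a Line2D)

# Marker styling with scatter/PathCollection keywords instead of Line2D ones:
#   markerfacecolor -> facecolor, markeredgecolor -> edgecolor,
#   markeredgewidth -> linewidths, markersize (points) -> s (area in points**2)
every = slice(None, None, 32)  # plays the role of markevery=32
coll = ax.scatter(X[every], Y[every], marker='o', s=9**2,
                  facecolor='.75', edgecolor='k', linewidths=1.5,
                  zorder=3)  # zorder=3 draws the markers above the line
plt.show()
```

Passing a Line2D-only keyword such as markeredgecolor to ax.scatter fails in the same way as in the question, because PathCollection has no such property.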
You can always flip the order around and start with a Matplotlib figure/axes you created yourself, then pass that axes to geopandas (via the ax= keyword) whenever you want to plot something. That can make things easier or more intuitive when you also want to plot other things in the same axes. It does require a bit more discipline to make sure the (spatial) coordinates etc. match.
I personally almost always do that, because it lets me do most of the plotting with the same Matplotlib APIs, which admittedly have a slightly steeper learning curve. Overall, though, I find it easier than dealing with each package's slightly different interpretation of Matplotlib under the hood (e.g. geopandas, seaborn, xarray). But that really depends on where you're coming from.

Thank you for your detailed answer. Based on this I came up with this simplified code from my real project.
I have a shapefile shp and some point data df which I want to plot. shp is plotted with geopandas, df with matplotlib.pyplot. There is no need to convert the point data into a GeoDataFrame gdf as I did initially.
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

# read marker data (places with coordinates)
df = pd.read_csv("../obese_pct_by_place.csv")
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['sweref99_lng'], df['sweref99_lat']))
# read shapefile
shp = gpd.read_file("../../SWEREF_Shapefiles/KommunSweref99TM/Kommun_Sweref99TM_region.shp")
fig, ax = plt.subplots(figsize=(10, 8))
ax.set_aspect('equal')
shp.plot(ax=ax)
# plot obesity markers
# geopandas, no edgecolor here
# gdf.plot(ax=ax, marker='o', c='r', markersize=gdf['obese'] * 25)
# matplotlib.pyplot with edgecolor
ax.scatter(df['sweref99_lng'], df['sweref99_lat'], c='r', edgecolor='k', s=df['obese'] * 25)
plt.show()

Related

Seaborn jointplot link x-axis to Matplotlib subplots

Is there a way to add additional subplots created with vanilla Matplotlib to (below) a Seaborn jointplot, sharing the x-axis? Ideally I'd like to control the ratio between the jointplot and the additional plots (similar to gridspec_kw={'height_ratios': [3, 1, 1]}).
I tried to fake it by tuning figsize in the Matplotlib subplots, but obviously it doesn't work well when the KDE curves in the marginal plot change. While I could manually resize the output PNG to shrink/grow one of the figures, I'd like to have everything aligned automatically.
I know this is tricky with the way the joint grid is set up, but maybe it is reasonably simple for someone fluent in the underpinnings of Seaborn.
Here is a minimal working example, but there are two separate figures:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Figure 1
diamonds = sns.load_dataset('diamonds')
g = sns.jointplot(
    data=diamonds,
    x="carat",
    y="price",
    hue="cut",
    xlim=(1, 2),
)
g.ax_marg_x.remove()
# Figure 2
fig, (ax1, ax2) = plt.subplots(2,1,sharex=True)
ax1.scatter(x=diamonds["carat"], y=diamonds["depth"], color="gray", edgecolor="black")
ax1.set_xlim([1, 2])
ax1.set_ylabel("depth")
ax2.scatter(x=diamonds["carat"], y=diamonds["table"], color="gray", edgecolor="black")
ax2.set_xlabel("carat")
ax2.set_ylabel("table")
Desired output:
I think this is a case where setting up the figure using matplotlib functions is going to be better than working backwards from a seaborn figure layout that doesn't really match the use-case.
If you have a non-full subplot grid, you'll have to decide whether you want to (A) set up all the subplots and then remove the ones you don't want or (B) explicitly add each of the subplots you do want. Let's go with option A here.
import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset("diamonds")

figsize = (6, 8)
gridspec_kw = dict(
    nrows=3, ncols=2,
    width_ratios=[5, 1],
    height_ratios=[4, 1, 1],
)
subplot_kw = dict(sharex="col", sharey="row")
fig = plt.figure(figsize=figsize, constrained_layout=True)
axs = fig.add_gridspec(**gridspec_kw).subplots(**subplot_kw)

sns.kdeplot(data=df, y="price", hue="cut", legend=False, ax=axs[0, 1])
sns.scatterplot(data=df, x="carat", y="price", hue="cut", ax=axs[0, 0])
sns.scatterplot(data=df, x="carat", y="depth", color=".2", ax=axs[1, 0])
sns.scatterplot(data=df, x="carat", y="table", color=".2", ax=axs[2, 0])
axs[0, 0].set(xlim=(1, 2))
# remove the unused cells of the 3x2 grid (option A)
axs[1, 1].remove()
axs[2, 1].remove()
BTW, this is almost easier with plt.subplot_mosaic, but at the time of writing it did not support axis sharing.
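Regarding the axis-sharing limitation of plt.subplot_mosaic, one workaround is to share the axes after creation with Axes.sharex / Axes.sharey (available since Matplotlib 3.3). A minimal sketch; the data below is made-up stand-in data instead of the diamonds dataset, so it needs only NumPy and Matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data (the answer above uses seaborn's diamonds dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.5, 2.5, 500)
price = 4000 * carat + rng.normal(0, 500, 500)
depth = rng.normal(62, 1.5, 500)
table = rng.normal(57, 2, 500)

# "." marks an empty cell, so no Axes is created there
fig, axd = plt.subplot_mosaic(
    [["joint", "marg"],
     ["depth", "."],
     ["table", "."]],
    gridspec_kw=dict(width_ratios=[5, 1], height_ratios=[4, 1, 1]),
    figsize=(6, 8), constrained_layout=True,
)

# Share axes after creation instead of at creation time
axd["depth"].sharex(axd["joint"])
axd["table"].sharex(axd["joint"])
axd["marg"].sharey(axd["joint"])

axd["joint"].scatter(carat, price, s=5)
axd["marg"].hist(price, bins=30, orientation="horizontal")
axd["depth"].scatter(carat, depth, s=5, color=".2")
axd["table"].scatter(carat, table, s=5, color=".2")
axd["joint"].set_xlim(1, 2)  # propagates to the shared depth/table axes
plt.show()
```

Unlike sharex="col" in the gridspec version, nothing here hides the redundant inner tick labels automatically; you would do that yourself if needed.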
You could take the figure created by jointplot(), move its padding (with subplots_adjust()) and add 2 extra axes.
The example code will need some tweaking for each particular situation.
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import seaborn as sns
diamonds = sns.load_dataset('diamonds')
g = sns.jointplot(data=diamonds, x="carat", y="price", hue="cut",
                  xlim=(1, 2), height=12)
g.ax_marg_x.remove()
g.fig.subplots_adjust(left=0.08, right=0.97, top=1.05, bottom=0.45)
axins1 = inset_axes(g.ax_joint, width="100%", height="30%",
                    bbox_to_anchor=(0, -0.4, 1, 1),
                    bbox_transform=g.ax_joint.transAxes, loc=3, borderpad=0)
axins2 = inset_axes(g.ax_joint, width="100%", height="30%",
                    bbox_to_anchor=(0, -0.75, 1, 1),
                    bbox_transform=g.ax_joint.transAxes, loc=3, borderpad=0)
# share the x-axis with the joint plot; Axes.sharex (matplotlib >= 3.3) replaces
# get_shared_x_axes().join(...), which newer matplotlib versions no longer allow
axins1.sharex(g.ax_joint)
axins2.sharex(g.ax_joint)
axins1.scatter(x=diamonds["carat"], y=diamonds["depth"], color="grey", edgecolor="black")
axins1.set_ylabel("depth")
axins2.scatter(x=diamonds["carat"], y=diamonds["table"], color="grey", edgecolor="black")
axins2.set_xlabel("carat")
axins2.set_ylabel("table")
g.ax_joint.set_xlim(1, 2)
plt.setp(axins1.get_xticklabels(), visible=False)
plt.show()
PS: The question "How to share x axes of two subplots after they have been created" has more information about sharing axes (although here you get the same effect simply by setting the xlims of each subplot).
The code to position the new axes has been adapted from this tutorial example.

Pandas: plot a dataframe with a rectangle on its right side colored according to an array's values

I have a dataframe with 100 rows and 4 columns. I have an array (size 100,1) filled with values between 0 and 1. I would like to plot my dataframe with, on its right side, a rectangle whose color depends on the value of the array at each row (see the rough drawing I made; the array values are written on it to help explain what I want). I would like the colors to be a gradient, where 0 = dark blue and 1 = bright red.
I know how to create a colormap, but this is slightly different.
Which function would you advise me to use?
Here is some code I use for the plotting:
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
rectangle_values = np.random.rand(100)
plt.figure(figsize=(15, 15))
ax = sns.heatmap(df, cbar=None)
My solution would be to use plt.subplots to create two plots, with the width_ratios argument set to something like 19:1. On the left-hand side you plot the data frame as usual; on the right-hand side you plot the vector. Notice that I am using vmin and vmax to pin the color scale of the vector to (0, 1). For the requested colors I'm using Matplotlib's RdBu (red-blue) colormap, reversed so that 0 maps to blue and 1 to red. You can confirm the colors against the values: on this run the generated random values were [0.74, 0.96, 0.87, 0.50, 0.26].
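import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

df = pd.DataFrame(np.random.randint(0, 5, size=(5, 4)), columns=list('ABCD'))
rectangle_values = pd.DataFrame(np.random.rand(5), columns=['foo'])
# 19:1 width ratio: wide heatmap for the data, narrow strip for the vector
fig, (ax_data, ax_vec) = plt.subplots(1, 2, gridspec_kw={'width_ratios': [19, 1]})
sns.heatmap(df, cbar=None, ax=ax_data)
# 'RdBu_r' is RdBu reversed (0 -> blue, 1 -> red); vmin/vmax pin the scale to [0, 1]
sns.heatmap(rectangle_values, cbar=None, cmap='RdBu_r', vmin=0, vmax=1, ax=ax_vec)
plt.show()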
And the output is:
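To see what vmin, vmax and the reversed colormap do to the vector, here is a small sketch of the value-to-color mapping that sns.heatmap performs, done by hand with Matplotlib. It uses the values quoted in the answer and assumes Matplotlib 3.5 or newer for the matplotlib.colormaps registry:

```python
import numpy as np
import matplotlib
from matplotlib.colors import Normalize

rectangle_values = np.array([0.74, 0.96, 0.87, 0.50, 0.26])  # values quoted in the answer
cmap = matplotlib.colormaps["RdBu_r"]  # RdBu reversed: low values blue, high values red
norm = Normalize(vmin=0, vmax=1)       # the same role vmin/vmax play in sns.heatmap
rgba = cmap(norm(rectangle_values))    # one RGBA row per value
print(np.round(rgba, 2))
```

Because the normalization is fixed to [0, 1] rather than derived from the data, the same value always gets the same color, even across runs with different random values.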

mouse-over only on actual data points

Here's a really simple line chart.
%matplotlib notebook
import matplotlib.pyplot as plt
lines = plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
plt.setp(lines,marker='D')
plt.ylabel('foo')
plt.xlabel('bar')
plt.show()
If I move my mouse over the chart, I get the x and y values for wherever the pointer is. Is there any way to get values only when I'm actually over a data point?
I understand you want to change the coordinates displayed in the status bar at the bottom right of the plot; is that right?
If so, you can "hijack" the Axes.format_coord() function to make it display whatever you want. You can see an example of this in Matplotlib's example gallery.
In your case, something like this seems to do the trick:
import numpy as np
import matplotlib.pyplot as plt

my_x = np.array([1, 2, 3, 4])
my_y = np.array([1, 4, 9, 16])
eps = 0.1  # tolerance in data units

def format_coord(x, y):
    # a point counts as "hit" only if BOTH its x and its y are within eps of the cursor
    hit = np.isclose(my_x, x, atol=eps) & np.isclose(my_y, y, atol=eps)
    if np.any(hit):
        i = np.flatnonzero(hit)[0]
        return 'x=%s y=%s' % (ax.format_xdata(my_x[i]), ax.format_ydata(my_y[i]))
    else:
        return ''

fig, ax = plt.subplots()
ax.plot(my_x, my_y, 'D-')
ax.set_ylabel('foo')
ax.set_xlabel('bar')
ax.format_coord = format_coord
plt.show()
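One limitation of eps is that it is measured in data units, so the "hit" area changes with the axis scales and the zoom level (here y spans 1 to 16 while x spans only 1 to 4). A variant sketch that measures the distance in screen pixels instead, using ax.transData to convert data coordinates to display coordinates; PIXEL_TOL is a hypothetical tolerance you would tune:

```python
import numpy as np
import matplotlib.pyplot as plt

my_x = np.array([1, 2, 3, 4])
my_y = np.array([1, 4, 9, 16])
PIXEL_TOL = 5  # hypothetical tolerance, in screen pixels

fig, ax = plt.subplots()
ax.plot(my_x, my_y, 'D-')

def format_coord(x, y):
    # Convert the data points and the cursor position to display (pixel) coordinates
    pts = ax.transData.transform(np.column_stack([my_x, my_y]))
    cursor = ax.transData.transform((x, y))
    dist = np.hypot(*(pts - cursor).T)  # pixel distance from the cursor to each point
    i = int(np.argmin(dist))
    if dist[i] <= PIXEL_TOL:
        return 'x=%s y=%s' % (ax.format_xdata(my_x[i]), ax.format_ydata(my_y[i]))
    return ''

ax.format_coord = format_coord
plt.show()
```

Since the transform is evaluated on every mouse move, the tolerance stays the same on screen after zooming or panning.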

Not able to add 'map.drawcoastlines' to 3D figure using 'ax.add_collection3d(map.drawcoastlines())'

I want to plot a 3D map using Matplotlib's Basemap, but an error message pops up.
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.basemap import Basemap
from matplotlib.collections import PolyCollection
import numpy as np
map = Basemap(llcrnrlon=-20,llcrnrlat=0,urcrnrlon=15,urcrnrlat=50,)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # Axes3D(fig) no longer adds itself to the figure in newer matplotlib
#ax.set_axis_off()
ax.azim = 270
ax.dist = 7
polys = []
for polygon in map.landpolygons:
    polys.append(polygon.get_coords())
lc=PolyCollection(polys,edgecolor='black',facecolor='#DDDDDD',closed=False)
ax.add_collection3d(lc)
ax.add_collection3d(map.drawcoastlines(linewidth=0.25))
ax.add_collection3d(map.drawcountries(linewidth=0.35))
lons = np.array([-13.7, -10.8, -13.2, -96.8, -7.99, 7.5, -17.3, -3.7])
lats = np.array([9.6, 6.3, 8.5, 32.7, 12.5, 8.9, 14.7, 40.39])
cases = np.array([1971, 7069, 6073, 4, 6, 20, 1, 1])
deaths = np.array([1192, 2964, 1250, 1, 5, 8, 0, 0])
places = np.array(['Guinea', 'Liberia', 'Sierra Leone','United States','Mali','Nigeria', 'Senegal', 'Spain'])
x, y = map(lons, lats)
ax.bar3d(x, y, np.zeros(len(x)), 2, 2, deaths, color= 'r', alpha=0.8)
plt.show()
I got an error message on the line ax.add_collection3d(map.drawcoastlines(linewidth=0.25)), saying:
'It is not currently possible to manually set the aspect '
NotImplementedError: It is not currently possible to manually set the aspect on 3D axes
I found this question because I had exactly the same problem.
I later chanced upon some documentation that revealed the workaround: if setting the aspect is not implemented, then don't set it, by passing fix_aspect=False:
map = Basemap(fix_aspect=False)
EDIT:
I suppose I should add a little more to my previous answer to make it a little easier to understand what to do.
The NotImplementedError is a deliberate addition by the matplotlib team, as can be seen here. What the error is saying is that we are trying to fix the aspect ratio of the plot, but this is not implemented in 3d plots.
This error occurs when using mpl_toolkits.basemap.Basemap with 3D plots, because Basemap sets fix_aspect=True by default.
Therefore, to get rid of the NotImplementedError, pass fix_aspect=False when creating the Basemap. For example:
map = Basemap(llcrnrlon=-20,llcrnrlat=0,urcrnrlon=15,urcrnrlat=50,fix_aspect=False)

Matplotlib: combining two bar charts

I'm trying to generate 'violin'-like bar charts, but I'm running into several difficulties, described below:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
# init data
label = ['aa', 'b', 'cc', 'd']
data1 = [5, 7, 6, 9]
data2 = [7, 3, 6, 1]
data1_minus = np.array(data1)*-1
pos = np.arange(len(label))  # y positions of the bars
gs = gridspec.GridSpec(1, 2, top=0.95, bottom=0.07,)
fig = plt.figure(figsize=(7.5, 4.0))
# adding left bar chart
ax1 = fig.add_subplot(gs[0])
ax1.barh(pos, data1_minus)
ax1.yaxis.tick_right()
ax1.yaxis.set_label(label)
# adding right bar chart
ax2 = fig.add_subplot(gs[1], sharey=ax1)
ax2.barh(pos, data2)
Trouble adding 'label' as labels for both charts to share.
Centering the labels between both plots (as well as vertically in the center of each bar).
Keeping just the ticks on the outer y-axis (not the inner one, where the labels would go).
If I understand the question correctly, I believe these changes accomplish what you're looking for:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
# init data
label = ['aa', 'b', 'cc', 'd']
data1 = [5, 7, 6, 9]
data2 = [7, 3, 6, 1]
data1_minus = np.array(data1)*-1
gs = gridspec.GridSpec(1, 2, top=0.95, bottom=0.07,)
fig = plt.figure(figsize=(7.5, 4.0))
pos = np.arange(4)
# adding left bar chart
ax1 = fig.add_subplot(gs[0])
ax1.barh(pos, data1_minus, align='center')
# set tick positions and labels appropriately
ax1.yaxis.tick_right()
ax1.set_yticks(pos)
ax1.set_yticklabels(label)
ax1.tick_params(axis='y', pad=15)
# adding right bar chart
ax2 = fig.add_subplot(gs[1], sharey=ax1)
ax2.barh(pos, data2, align='center')
# turn off the second axis tick labels without disturbing the originals
plt.setp(ax2.get_yticklabels(), visible=False)
plt.show()
This yields this plot:
As for keeping the actual numerical ticks (if you want those), the normal matplotlib interface ties the ticks of shared (or twinned) axes closely together. The axes_grid1 toolkit gives you more control, so if you want some numerical ticks you can replace the entire ax2 section above with the following:
from mpl_toolkits.axes_grid1 import host_subplot
ax2 = host_subplot(gs[1], sharey=ax1)
ax2.barh(pos, data2, align='center')
par = ax2.twin()
par.set_xticklabels('')
par.set_yticks(pos)
par.set_yticklabels([str(x) for x in pos])
plt.setp(ax2.get_yticklabels(), visible=False)
which yields:
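A side note on the mirrored left chart: because data1 is negated to draw its bars to the left, its x tick labels read as negative numbers. One way to display them as positive values is a FuncFormatter on the left axes' x-axis. A minimal sketch reusing the data and label placement from the first version of the answer (the formatter is my addition, not part of the original answer):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

label = ['aa', 'b', 'cc', 'd']
data1 = [5, 7, 6, 9]
data2 = [7, 3, 6, 1]
pos = np.arange(len(label))

fig = plt.figure(figsize=(7.5, 4.0))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2, sharey=ax1)
ax1.barh(pos, -np.array(data1), align='center')  # negated only to mirror the bars
ax2.barh(pos, data2, align='center')

# Show the mirrored bar lengths as positive numbers on the left x-axis
ax1.xaxis.set_major_formatter(FuncFormatter(lambda v, _pos: f"{abs(v):g}"))

# Category labels on the inner (right) side of the left chart
ax1.yaxis.tick_right()
ax1.set_yticks(pos)
ax1.set_yticklabels(label)
ax1.tick_params(axis='y', pad=15)
plt.setp(ax2.get_yticklabels(), visible=False)
plt.show()
```

The formatter only changes how tick values are printed; the underlying data stays negative, so limits and bar positions are unaffected.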