Matplotlib: Boxplot and bar chart shifted when overlaid using twinx

When I create a boxplot and overlay a bar chart using twinx, the boxes appear shifted one position to the right compared to the bars.
This problem has been identified before (Python pandas plotting shift x-axis if twinx two y-axes), but the solution no longer seems to work. (I am using Matplotlib 3.1.0)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
li_str = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']
df = pd.DataFrame([[i]+j[k] for i,j in {li_str[i]:np.random.randn(j,2).tolist() for i,j in \
enumerate(np.random.randint(5, 15, len(li_str)))}.items() for k in range(len(j))]
, columns=['A', 'B', 'C'])
fig, ax = plt.subplots(figsize=(16,6))
ax2 = ax.twinx()
df_gb = df.groupby('A').count()
p1 = df.boxplot(ax=ax, column='B', by='A', sym='')
p2 = df_gb['B'].plot(ax=ax2, kind='bar', figsize=(16,6)
, colormap='Set2', alpha=0.3, secondary_y=True)
plt.ylim([0, 20])
The output shows the boxes shifted to the right by one compared to the bars. The respondent to the previous post rightly pointed out that the tick locations of the bars are zero-based and the tick locations of the boxes are one-based, which causes the shift. However, the plt.bar() call that the respondent used as a fix now raises an error, because the x parameter has become mandatory. Even if x is provided, the call still fails because the 'left' parameter no longer exists.
df.boxplot(column='B', by='A')
plt.twinx()
plt.bar(left=plt.xticks()[0], height=df.groupby('A').count()['B'],
align='center', alpha=0.3)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-186-e257461650c1> in <module>
26 plt.twinx()
27 plt.bar(left=plt.xticks()[0], height=df.groupby('A').count()['B'],
---> 28 align='center', alpha=0.3)
TypeError: bar() missing 1 required positional argument: 'x'
In addition, I would much prefer a fix using the object-oriented approach with reference to the axes, because I want to place the chart into an interactive ipywidget.
Here is the ideal chart:
Many thanks.

You can use the following trick: Provide the x-values for placing your bars starting at x=1. To do so, use range(1, len(df_gb['B'])+1) as the x-values.
fig, ax = plt.subplots(figsize=(8, 4))
ax2 = ax.twinx()
df_gb = df.groupby('A').count()
df.boxplot(column='B', by='A', ax=ax)
ax2.bar(range(1, len(df_gb['B'])+1), height=df_gb['B'],align='center', alpha=0.3)
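A self-contained sketch of the same idea, in case it helps to check the alignment numerically. The small synthetic DataFrame, the Agg backend and the names counts, positions and bar_centers are my own assumptions for illustration, not part of the original question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display is needed here)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: 5 groups with 8 rows each.
rng = np.random.default_rng(0)
groups = ['one', 'two', 'three', 'four', 'five']
df = pd.DataFrame({
    'A': np.repeat(groups, 8),
    'B': rng.standard_normal(len(groups) * 8),
})

fig, ax = plt.subplots(figsize=(8, 4))
ax2 = ax.twinx()

# Boxes are drawn at x = 1, 2, ..., n (one-based positions).
df.boxplot(column='B', by='A', ax=ax)

# Place the bars at the same one-based positions instead of 0, 1, ..., n-1.
counts = df.groupby('A')['B'].count()
positions = range(1, len(counts) + 1)
bars = ax2.bar(positions, counts.to_numpy(), align='center', alpha=0.3)

# Centre of each bar, to compare against the box positions.
bar_centers = [bar.get_x() + bar.get_width() / 2 for bar in bars]
print(bar_centers)
```

Everything goes through ax and ax2 rather than plt, so the same code should drop into a widget callback without relying on the "current" axes.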

Related

How can I use matplotlib.pyplot to customize geopandas plots?

What is the difference between geopandas plots and matplotlib plots? Why are not all keywords available?
In geopandas there is markersize, but not markeredgecolor...
In the example below I plot a pandas df with some styling, then transform the pandas df to a geopandas df. Simple plotting is working, but no additional styling.
This is just an example. In my geopandas plots I would like to customize, markers, legends, etc. How can I access the relevant matplotlib objects?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
X = np.linspace(-6, 6, 1024)
Y = np.sinc(X)
df = pd.DataFrame(Y, X)
plt.plot(X,Y,linewidth = 3., color = 'k', markersize = 9, markeredgewidth = 1.5, markerfacecolor = '.75', markeredgecolor = 'k', marker = 'o', markevery = 32)
# alternatively:
# df.plot(linewidth = 3., color = 'k', markersize = 9, markeredgewidth = 1.5, markerfacecolor = '.75', markeredgecolor = 'k', marker = 'o', markevery = 32)
plt.show()
# create GeoDataFrame from df
df.reset_index(inplace=True)
df.rename(columns={'index': 'Y', 0: 'X'}, inplace=True)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Y'], df['X']))
gdf.plot(linewidth = 3., color = 'k', markersize = 9) # working
gdf.plot(linewidth = 3., color = 'k', markersize = 9, markeredgecolor = 'k') # not working
plt.show()

You're probably confused by the fact that both libraries name the method .plot(). In Matplotlib, that call creates a mpl.lines.Line2D object, which also contains the markers and their styling.
Geopandas assumes you want to plot geographic data and uses paths for this (mpl.collections.PathCollection). A PathCollection has, for example, face and edge colors, but no marker properties. The facecolor comes into play whenever your path closes and forms a polygon (your example doesn't, making it "just" a line).
Geopandas seems to use a bit of a trick for points/markers: it appears to draw a "path" using the "CURVE4" code (cubic Bézier).
You can explore what's happening if you capture the axes that geopandas returns:
ax = gdf.plot(...
Using ax.get_children() you'll get all artists that have been added to the axes. Since this is a simple plot, it's easy to see that the PathCollection is the actual data; the other artists draw the axis, spines, etc.
[<matplotlib.collections.PathCollection at 0x1c05d5879d0>,
<matplotlib.spines.Spine at 0x1c05d43c5b0>,
<matplotlib.spines.Spine at 0x1c05d43c4f0>,
<matplotlib.spines.Spine at 0x1c05d43c9d0>,
<matplotlib.spines.Spine at 0x1c05d43f1c0>,
<matplotlib.axis.XAxis at 0x1c05d036590>,
<matplotlib.axis.YAxis at 0x1c05d43ea10>,
Text(0.5, 1.0, ''),
Text(0.0, 1.0, ''),
Text(1.0, 1.0, ''),
<matplotlib.patches.Rectangle at 0x1c05d351b10>]
If you reduce the number of points a lot, say to 5 instead of 1024, retrieving the drawn paths shows the coordinates and also the codes used:
pcoll = ax.get_children()[0] # the first artist is the PathCollection
path = pcoll.get_paths()[0] # it only contains 1 Path
print(path.codes) # show the codes used.
# array([ 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
# 4, 4, 4, 4, 4, 4, 4, 4, 79], dtype=uint8)
Some more info about how these paths work can be found at:
https://matplotlib.org/stable/tutorials/advanced/path_tutorial.html
So, long story short, you do have all the same keywords as when using Matplotlib, but they're the keywords for paths and not for the Line2D object that you might expect.
You can always flip the order around: start with a Matplotlib figure/axes that you create yourself, and pass that axes to Geopandas when you want to plot something. That might be easier or more intuitive when you (also) want to plot other things in the same axes. It does perhaps require a bit more discipline to make sure the (spatial) coordinates etc. match.
I personally almost always do that, because it lets me do most of the plotting with the same Matplotlib API, which admittedly has a slightly steeper learning curve. Overall I find it easier than dealing with the slightly different interpretation of every package that uses Matplotlib under the hood (e.g. geopandas, seaborn, xarray). But that really depends on where you're coming from.
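To make the keyword difference concrete, here is a small sketch that uses plain Matplotlib's ax.scatter as a stand-in, since it also returns a PathCollection. It assumes (as described above) that the geopandas point plot ends up as the same artist type; the 5-point data and the Agg backend are only there to keep the example small and headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display is needed here)
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PathCollection
from matplotlib.path import Path

X = np.linspace(-6, 6, 5)
Y = np.sinc(X)

fig, ax = plt.subplots()

# Collection-style keywords: edgecolors/linewidths instead of
# markeredgecolor/markeredgewidth, and s (area) instead of markersize.
pcoll = ax.scatter(X, Y, s=81, c='0.75', edgecolors='k', linewidths=1.5)

# The marker itself is stored as a Path made of Bezier segments.
path = pcoll.get_paths()[0]
print(type(pcoll).__name__)
print(path.codes)
print(pcoll.get_edgecolor())
```

So for a point plot the edge styling is set with the collection keywords (edgecolor, linewidth) rather than with the Line2D marker keywords.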

Thank you for your detailed answer. Based on this I came up with this simplified code from my real project.
I have a shapefile shp and some point data df which I want to plot. shp is plotted with geopandas, df with matplotlib.pyplot. There is no need to convert the point data into a GeoDataFrame gdf as I did initially.
# read marker data (places with coordinates)
df = pd.read_csv("../obese_pct_by_place.csv")
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['sweref99_lng'], df['sweref99_lat']))
# read shapefile
shp = gpd.read_file("../../SWEREF_Shapefiles/KommunSweref99TM/Kommun_Sweref99TM_region.shp")
fig, ax = plt.subplots(figsize=(10, 8))
ax.set_aspect('equal')
shp.plot(ax=ax)
# plot obesity markers
# geopandas, no edgecolor here
# gdf.plot(ax=ax, marker='o', c='r', markersize=gdf['obese'] * 25)
# matplotlib.pyplot with edgecolor
plt.scatter(df['sweref99_lng'], df['sweref99_lat'], c='r', edgecolor='k', s=df['obese'] * 25)
plt.show()

Draw various plots in one figure

The image below shows what I want: three different plots produced in one call, using a function.
I used the following code:
def box_hist_plot(data):
    sns.set()
    ax, fig = plt.subplots(1,3, figsize=(20,5))
    sns.boxplot(x=data, linewidth=2.5, ax=fig[0])
    plt.hist(x=data, bins=50, density=True, ax = fig[1])
    sns.violinplot(x = data, ax=fig[2])
and I got this error:
inner() got multiple values for argument 'ax'

Besides the fact that you should not call a Figure object ax and an array of Axes objects fig, your problem comes from the line plt.hist(..., ax=...). plt.hist() does not take an ax= parameter; it acts on the "current" axes. If you want to specify which Axes to plot on, use Axes.hist() instead.
import matplotlib.pyplot as plt
import seaborn as sns
def box_hist_plot(data):
    sns.set()
    # plt.subplots returns the Figure first, then the array of Axes
    fig, axs = plt.subplots(1,3, figsize=(20,5))
    sns.boxplot(x=data, linewidth=2.5, ax=axs[0])
    axs[1].hist(x=data, bins=50, density=True)
    sns.violinplot(x = data, ax=axs[2])

Debugging interactive matplotlib figures in Jupyter notebooks

The example below should highlight the point that you click on, and change the title of the graph to show the label associated with that point.
If I run this as a Python script, clicking on a point raises the error "line 15, in onpick: TypeError: only integer scalar arrays can be converted to a scalar index", which is expected: event.ind is an array of indices, not a single integer, so I need to use ind = event.ind[0] instead.
However, when I run this in a Jupyter notebook, the figure appears, but the error is silently ignored, so it just appears that the code does not work. Is there a way to get Jupyter to show me that an error has occurred?
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
x = [0, 1, 2, 3, 4, 5]
labels = ['a', 'b', 'c', 'd', 'e', 'f']
ax.plot(x, 'bo', picker=5)
# this is the transparent marker for the selected data point
marker, = ax.plot([0], [0], 'yo', visible=False, alpha=0.8, ms=15)
def onpick(event):
    ind = event.ind
    ax.set_title('Data point {0} is labeled "{1}"'.format(ind, labels[ind]))
    marker.set_visible(True)
    marker.set_xdata(x[ind])
    marker.set_ydata(x[ind])
    ax.figure.canvas.draw() # this line is critical to change the linewidth
fig.canvas.mpl_connect('pick_event', onpick)
plt.show()
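A minimal sketch of the fix mentioned in the question (ind = event.ind[0]), together with one possible way to make callback errors visible regardless of the front end: catch the exception inside the callback and write it into the axes title. report_errors is a hypothetical helper of my own, not a Matplotlib or Jupyter API, and the Agg backend plus the hand-built event object are assumptions so the example runs without clicking on anything.

```python
from types import SimpleNamespace

import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display is needed here)
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = [0, 1, 2, 3, 4, 5]
labels = ['a', 'b', 'c', 'd', 'e', 'f']
ax.plot(x, 'bo', picker=True, pickradius=5)
# transparent marker that highlights the selected data point
marker, = ax.plot([0], [0], 'yo', visible=False, alpha=0.8, ms=15)

def report_errors(func):
    """Hypothetical helper: show a callback's exception in the axes title."""
    def wrapper(event):
        try:
            func(event)
        except Exception as exc:
            ax.set_title('{0}: {1}'.format(type(exc).__name__, exc), color='red')
            ax.figure.canvas.draw_idle()
    return wrapper

@report_errors
def onpick(event):
    ind = event.ind[0]  # event.ind is an array of indices; use the first hit
    ax.set_title('Data point {0} is labeled "{1}"'.format(ind, labels[ind]),
                 color='black')
    marker.set_visible(True)
    marker.set_data([ind], [x[ind]])  # the line plots x against its index
    ax.figure.canvas.draw_idle()

fig.canvas.mpl_connect('pick_event', onpick)

# Simulate a click on the third point with a stand-in event object.
onpick(SimpleNamespace(ind=np.array([2])))
print(ax.get_title())
```

With the wrapper in place, a mistake such as indexing labels with the whole event.ind array shows up as a red title on the figure instead of disappearing.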

Matplotlib: Assign legend to different figures

Inside a loop I am calculating some things and then I want to plot them in two different figures. I have set up the figures as
susc_comp, (ax1,ax2) = plt.subplots( 2, 1, sharex=True, sharey='none', figsize=(8.3,11.7))
cole_cole, (ax3) = plt.subplots( 1, 1, sharex='none', sharey='none', figsize=(8.3,11.7))
for j,temp in enumerate(indexes_T[i]):
    # calculate and plot in the corresponding ax1, ax2, ax3
plt.legend(loc=0, fontsize='small', numpoints = 1, ncol=(len(indexes_T[i]))/2, frameon=False)
susc_comp.savefig('suscp_components'+str(field)+'Oe.png', dpi=300)
cole_cole.savefig('Cole_Cole'+str(field)+'Oe.png', dpi=300)
But I get the legend only in the susc_comp figure (the legend should be the same for both figures). How can I select each figure and add the legend to it?
Thank you very much!

You can call figure.legend directly (although I think this may have less functionality than plt.legend). Therefore, I would do this a different way.
The question states that both legends are the same. In addition, the second figure only has 1 axes in it. Therefore one solution would be to get the handles and labels from ax3, then manually apply those to both figures. A simplified example is below:
import matplotlib.pyplot as plt
susc_comp, (ax1, ax2) = plt.subplots(1,2)
cole_cole, ax3 = plt.subplots()
ax1.plot([1,2,3], label="Test1")
ax2.plot([3,2,1], label="Test2")
ax3.plot([1,2,3], label="Test1")
ax3.plot([3,2,1], label="Test2")
handles, labels = ax3.get_legend_handles_labels()
ax2.legend(handles, labels, loc=1, fontsize='small', numpoints = 1)
ax3.legend(handles, labels, loc=1, fontsize='small', numpoints = 1)
plt.show()
This gives the following 2 figures:
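For completeness, a sketch of the figure-level alternative mentioned at the start of this answer: Figure.legend accepts the same handles and labels, so each figure can get its own copy. The plotted data is a made-up stand-in, and the Agg backend is only an assumption to keep the example headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display is needed here)
import matplotlib.pyplot as plt

susc_comp, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
cole_cole, ax3 = plt.subplots()

# Stand-in data; the same two labelled series appear in both figures.
for axes in (ax1, ax2, ax3):
    axes.plot([1, 2, 3], label="Test1")
    axes.plot([3, 2, 1], label="Test2")

# Collect handles and labels once, then attach a legend to each figure.
handles, labels = ax3.get_legend_handles_labels()
susc_legend = susc_comp.legend(handles, labels, loc='upper right',
                               fontsize='small', numpoints=1, frameon=False)
cole_legend = cole_cole.legend(handles, labels, loc='upper right',
                               fontsize='small', numpoints=1, frameon=False)

print([text.get_text() for text in susc_legend.get_texts()])
```

Calling legend on the figure (or axes) object avoids depending on whichever figure happens to be "current" when plt.legend is called.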

Labels on Gridspec [duplicate]

I'm facing a problem in showing the legend in the correct format using matplotlib.
EDIT: I have 4 subplots in a figure in a 2 by 2 layout and I want a legend only on the first subplot, which has two lines plotted on it. The legend that I got using the code attached below contained endless entries and extended vertically throughout the figure. When I use the same code with linspace to generate fake data, the legend works absolutely fine.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import os
#------------------set default directory, import data and create column output vectors---------------------------#
path="C:/Users/Pacman/Data files"
os.chdir(path)
data =np.genfromtxt('vrp.txt')
x=np.array([data[:,][:,0]])
y1=np.array([data[:,][:,6]])
y2=np.array([data[:,][:,7]])
y3=np.array([data[:,][:,9]])
y4=np.array([data[:,][:,11]])
y5=np.array([data[:,][:,10]])
nrows=2
ncols=2
tick_l=6 #length of ticks
fs_axis=16 #font size of axis labels
plt.rcParams['axes.linewidth'] = 2 #Sets global line width of all the axis
plt.rcParams['xtick.labelsize']=14 #Sets global font size for x-axis labels
plt.rcParams['ytick.labelsize']=14 #Sets global font size for y-axis labels
plt.subplot(nrows, ncols, 1)
ax=plt.subplot(nrows, ncols, 1)
l1=plt.plot(x, y2, 'yo',label='Flow rate-fan')
l2=plt.plot(x,y3,'ro',label='Flow rate-discharge')
plt.title('(a)')
plt.ylabel('Flow rate ($m^3 s^{-1}$)',fontsize=fs_axis)
plt.xlabel('Rupture Position (ft)',fontsize=fs_axis)
# This part is not working
plt.legend(loc='upper right', fontsize='x-large')
#Same code for rest of the subplots
I tried to implement a fix suggested in the following link, however, could not make it work:
how do I make a single legend for many subplots with matplotlib?
Any help in this regard will be highly appreciated.

If I understand correctly, you need to tell plt.legend what to put in the legend; at this point it is being called without any entries. What you get must be coming from another source. I quickly put together the following, and of course when I run fig.legend the way you do, I get nothing.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_axes([0.1, 0.1, 0.4, 0.7])
ax2 = fig.add_axes([0.55, 0.1, 0.4, 0.7])
x = np.arange(0.0, 2.0, 0.02)
y1 = np.sin(2*np.pi*x)
y2 = np.exp(-x)
l1, l2 = ax1.plot(x, y1, 'rs-', x, y2, 'go')
y3 = np.sin(4*np.pi*x)
y4 = np.exp(-2*x)
l3, l4 = ax2.plot(x, y3, 'yd-', x, y4, 'k^')
fig.legend(loc='upper right', fontsize='x-large')
#fig.legend((l1, l2), ('Line 1', 'Line 2'), 'upper left')
#fig.legend((l3, l4), ('Line 3', 'Line 4'), 'upper right')
plt.show()
I'd suggest getting the legend right on one subplot first, and then applying the same approach to the others.

It is useful to work with the axes directly (ax in your case) when working with subplots. So if you set up two plots in a figure and only wish to have a legend in your second plot:
import numpy as np
import matplotlib.pyplot as plt
t = np.linspace(0, 10, 100)
plt.figure()
ax1 = plt.subplot(2, 1, 1)
ax1.plot(t, t * t)
ax2 = plt.subplot(2, 1, 2)
ax2.plot(t, t * t * t)
# pass the labels as a list; a bare string would be treated as a sequence of one-character labels
ax2.legend(['Cubic Function'])
Note that when creating the legend, I am doing so on ax2 as opposed to plt. If you wish to create a second legend for the first subplot, you can do so in the same way but on ax1.
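One possible cause of the "endless entries" worth checking, assuming the loading code posted in the question is what actually runs: np.array([data[:, 0]]) wraps the column in an extra pair of brackets, giving shape (1, N). Matplotlib treats each column of a 2-D input as a separate data set, so every point becomes its own line with its own legend entry. The sketch below uses random numbers as a stand-in for vrp.txt.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display is needed here)
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the contents of vrp.txt (assumption: a 2-D float table).
data = np.random.rand(50, 12)

x_2d = np.array([data[:, 0]])  # shape (1, 50), as in the question
y_2d = np.array([data[:, 7]])
x_1d = data[:, 0]              # shape (50,)
y_1d = data[:, 7]

fig, (ax_a, ax_b) = plt.subplots(1, 2)
lines_2d = ax_a.plot(x_2d, y_2d, 'yo', label='Flow rate-fan')
lines_1d = ax_b.plot(x_1d, y_1d, 'yo', label='Flow rate-fan')

# Count the legend entries each axes would produce.
n_entries_2d = len(ax_a.get_legend_handles_labels()[0])
n_entries_1d = len(ax_b.get_legend_handles_labels()[0])
print(x_2d.shape, len(lines_2d), n_entries_2d)
print(x_1d.shape, len(lines_1d), n_entries_1d)
```

If this is what is happening, indexing the columns directly (x = data[:, 0] and so on) would give one line and one legend entry per plot call, which would also explain why the linspace version behaves.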
