How to add labels to sets of seaborn boxplot - matplotlib

I have 2 sets of boxplots, one set in blue color and another in red color. I want the legend to show the label for each set of boxplots, i.e.
Legend:
-blue box- A, -red box- B
I added labels='A' and labels='B' within sns.boxplot(), but that didn't work and gave the error message "No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument". How do I add the labels?
code for the inserted image:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
x = list(range(1,13))
n = 40
index = [item for item in x for i in range(n)]
np.random.seed(123)
df = pd.DataFrame({'A': np.random.normal(30, 2, len(index)),
'B': np.random.normal(10, 2, len(index))},
index=index)
red_diamond = dict(markerfacecolor='r', marker='D')
blue_dot = dict(markerfacecolor='b', marker='o')
plt.figure(figsize=[10,5])
ax = plt.gca()
ax1 = sns.boxplot( x=df.index, y=df['A'], width=0.5, color='red', \
boxprops=dict(alpha=.5), flierprops=red_diamond, labels='A')
ax2 = sns.boxplot( x=df.index, y=df['B'], width=0.5, color='blue', \
boxprops=dict(alpha=.5), flierprops=blue_dot, labels='B')
plt.ylabel('Something')
plt.legend(loc="center", fontsize=8, frameon=False)
plt.show()
Here are the software versions I am using: seaborn version 0.11.2. matplotlib version 3.5.1. python version 3.10.1

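The following approach sets a label via boxprops and builds the legend from the handles returned by ax.get_legend_handles_labels(), keeping only the first handle for each label (every box in a set carries the same label). (Note that ax, ax1 and ax2 in the question's code all point to the same subplot, so only ax is used here.)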
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
x = np.arange(1, 13)
index = np.repeat(x, 40)
np.random.seed(123)
df = pd.DataFrame({'A': np.random.normal(30, 2, len(index)),
                   'B': np.random.normal(10, 2, len(index))},
                  index=index)
red_diamond = dict(markerfacecolor='r', marker='D')
blue_dot = dict(markerfacecolor='b', marker='o')
plt.figure(figsize=[10, 5])
ax = sns.boxplot(data=df, x=df.index, y='A', width=0.5, color='red',
                 boxprops=dict(alpha=.5, label='A'), flierprops=red_diamond)
sns.boxplot(data=df, x=df.index, y='B', width=0.5, color='blue',
            boxprops=dict(alpha=.5, label='B'), flierprops=blue_dot, ax=ax)
ax.set_ylabel('Something')
handles, labels = ax.get_legend_handles_labels()
handles = [h for h, lbl, prev in zip(handles, labels, [None] + labels) if lbl != prev]
ax.legend(handles=handles, loc="center", fontsize=8, frameon=False)
plt.show()
Alternative approaches could be:
pd.melt the dataframe to long form, so hue could be used; a problem here is that then the legend wouldn't take the alpha from the boxprops into account; also setting different fliers wouldn't be supported
create a legend from custom handles
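The last alternative, a legend built from custom handles, could look roughly like the sketch below. It is an illustration under assumptions: the boxes are drawn with plain matplotlib (ax.boxplot) on made-up data so the example stays self-contained, and the names data_a, data_b and legend_handles are invented here. The legend call itself does not depend on how the boxes were drawn, so the same two Patch handles should also work on the seaborn axes from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Patch

# Made-up sample data (assumption): three boxes per set
rng = np.random.default_rng(123)
data_a = [rng.normal(30, 2, 40) for _ in range(3)]
data_b = [rng.normal(10, 2, 40) for _ in range(3)]

fig, ax = plt.subplots(figsize=(6, 4))
# patch_artist=True draws filled boxes, so a face color can be set
bp_a = ax.boxplot(data_a, patch_artist=True)
bp_b = ax.boxplot(data_b, patch_artist=True)
for box in bp_a['boxes']:
    box.set(facecolor='red', alpha=0.5)
for box in bp_b['boxes']:
    box.set(facecolor='blue', alpha=0.5)

# Custom legend handles: one colored patch per set of boxes,
# with the same alpha as the boxes so the legend matches the plot
legend_handles = [Patch(facecolor='red', alpha=0.5, label='A'),
                  Patch(facecolor='blue', alpha=0.5, label='B')]
legend = ax.legend(handles=legend_handles, loc='center',
                   fontsize=8, frameon=False)
```

Because the handles are created by hand, the legend no longer depends on which artists carry labels; the cost is that colors and alpha have to be kept in sync with the plot manually.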

Related

Trying to place text in mpl just above the first yticklabel

I am having difficulties moving the text "Rank" exactly one line above the first label without resorting to guesswork, as I have different chart types with variable sizes, widths and also paddings between the labels and bars.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pylab import rcParams
rcParams['figure.figsize'] = 8, 6
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame.from_records(zip(np.arange(1,30)))
df.plot.barh(width=0.8,ax=ax,legend=False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
ax.set_title("Rankings")
ax.text(-5,30,"Rank")
plt.show()
Using transData.transform didn't get me any further. The problem seems to be that ax.text() with the position params of (0,0) aligns with the start of the bars and not the yticklabels which I need, so getting the exact position of yticklabels relative to the axis would be helpful.
The following approach creates an offset_copy transform, using "axes coordinates". The top left corner of the main plot is at position 0, 1 in axes coordinates. The ticks have a "pad" (between label and tick mark) and a "padding" (length of the tick mark), both measured in "points".
The text can be right aligned, just as the ticks. With "bottom" as vertical alignment, it will be just above the main plot. If that distance is too low, you could try ax.text(0, 1.01, ...) to have it a bit higher.
import matplotlib.pyplot as plt
from matplotlib.transforms import offset_copy
import pandas as pd
import numpy as np
from matplotlib import rcParams
rcParams['figure.figsize'] = 8, 6
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame.from_records(zip(np.arange(1, 30)))
df.plot.barh(width=0.8, ax=ax, legend=False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
ax.set_title("Rankings")
tick = ax.yaxis.get_major_ticks()[-1] # get information of one of the ticks
padding = tick.get_pad() + tick.get_tick_padding()
trans_offset = offset_copy(ax.transAxes, fig=fig, x=-padding, y=0, units='points')
ax.text(0, 1, "Rank", ha='right', va='bottom', transform=trans_offset)
# optionally also use tick.label1.get_fontproperties()
plt.tight_layout()
plt.show()
I answered my own question while Johan was posting his answer, which is pretty good and what I wanted. However, I am posting mine anyway as it uses an entirely different approach. Here I add a "ghost" row to the dataframe and label it appropriately, which solves the problem:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pylab import rcParams
rcParams['figure.figsize'] = 8, 6
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
df = pd.DataFrame.from_records(zip(np.arange(1,30)),columns=["val"])
#add a temporary header
new_row = pd.DataFrame({"val":0}, index=[0])
df = pd.concat([df[:],new_row]).reset_index(drop = True)
df.plot.barh(width=0.8,ax=ax,legend=False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.tick_params(left=False, bottom=False)
ax.tick_params(axis='y', which='major', pad=36)
ax.set_title("Rankings")
# Set the top label to "Rank"
yticklabels = [t for t in ax.get_yticklabels()]
yticklabels[-1]="Rank"
# Left align all labels
[t.set_ha("left") for t in ax.get_yticklabels()]
ax.set_yticklabels(yticklabels)
# effectively delete the top bar by setting its height to 0
ax.patches[-1].set_height(0)
plt.show()
Perhaps the advantage is that it is always a constant distance above the top label, but with the disadvantage that this is a bit "patchy" in the most literal sense to transform your dataframe for this task.

How to turn seaborn boxplot fliers on/off with buttons

I want to implement buttons to turn the fliers on/off in a set of seaborn boxplots. I tried to follow the method of changing them through the artists mentioned in this link: https://stackoverflow.com/a/36893152/18193150 but was unsuccessful. I would appreciate it if someone could show me how to do it. Cheers.
This is the code I tried with:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from matplotlib.widgets import Button
x = np.arange(1, 13)
index = np.repeat(x, 40)
np.random.seed(123)
df = pd.DataFrame({'A': np.random.normal(30, 2, len(index)),
'B': np.random.normal(10, 2, len(index))},
index=index)
red_diamond = dict(markerfacecolor='r', marker='D')
blue_dot = dict(markerfacecolor='b', marker='o')
fig=plt.figure(figsize=[10, 5])
ax = sns.boxplot(data=df, x=df.index, y='A', width=0.5, color='red',
boxprops=dict(alpha=.5, label='A'), flierprops=red_diamond)
sns.boxplot(data=df, x=df.index, y='B', width=0.5, color='blue',
boxprops=dict(alpha=.5, label='B'), flierprops=blue_dot, ax=ax)
# button to off boxplot fliers
resetax_off = plt.axes([0.8, 0.02, 0.08, 0.035])
button_off = Button(resetax_off, 'Flier off', color='red',
hovercolor='lightslategrey')
# button to on boxplot fliers
resetax_on = plt.axes([0.6, 0.02, 0.08, 0.035])
button_on = Button(resetax_on, 'Flier on', color='gold',
hovercolor='lightslategrey')
def click_off(event):
    for i,artist in enumerate(ax.artists):
        line = ax1.line[i+4] #trying to get Line2D for the fliers, 4th in the list of 6
        line.set(alpha=0)
    fig.canvas.draw_idle()
button_off.on_clicked(click_off)
def click_on(event):
    for i,artist in enumerate(ax.artists):
        line = ax1.line[i+4] #trying to get Line2D for the fliers
        line.set(alpha=1)
    fig.canvas.draw_idle()
button_on.on_clicked(click_on)
plt.show()
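One possible direction, sketched under assumptions rather than offered as a tested fix for the seaborn code above: matplotlib's own ax.boxplot returns a dictionary whose 'fliers' entry holds one Line2D per box, so the fliers can be toggled with set_visible without indexing into a list of lines. The sketch uses plain matplotlib and made-up data, and the helper set_fliers_visible is a name invented here. For a seaborn plot the corresponding Line2D objects would have to be picked out of ax.lines, and their position there may depend on the seaborn and matplotlib versions; that part is not shown.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import Button

# Made-up sample data (assumption): twelve boxes of 40 points each
rng = np.random.default_rng(123)
data = [rng.normal(30, 2, 40) for _ in range(12)]

fig, ax = plt.subplots(figsize=(10, 5))
fig.subplots_adjust(bottom=0.15)  # leave room for the buttons
bp = ax.boxplot(data)
fliers = bp['fliers']  # one Line2D per box, holding that box's outlier markers


def set_fliers_visible(visible):
    """Show or hide all flier markers and request a redraw."""
    for line in fliers:
        line.set_visible(visible)
    fig.canvas.draw_idle()


# Keep references to the buttons, otherwise they may be garbage collected
ax_off = fig.add_axes([0.8, 0.02, 0.08, 0.035])
button_off = Button(ax_off, 'Flier off', color='red',
                    hovercolor='lightslategrey')
button_off.on_clicked(lambda event: set_fliers_visible(False))

ax_on = fig.add_axes([0.6, 0.02, 0.08, 0.035])
button_on = Button(ax_on, 'Flier on', color='gold',
                   hovercolor='lightslategrey')
button_on.on_clicked(lambda event: set_fliers_visible(True))
```

Calling plt.show() afterwards opens the interactive window. Using set_visible instead of alpha=0 keeps the original alpha of the markers intact when they are switched back on.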

pandas plot multiple df in canvas

I am having difficulty drawing two different data plots inside one axis in a tkinter canvas. Currently only the last plot is displayed; the other one is hidden.
Update:
The test example works, but my original setup does not.
The problem has been moved here.
Test example:
from tkinter import Tk, Canvas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
window = Tk()
window.geometry('800x600')
df = pd.DataFrame()
df['x'] = np.linspace(1, 10, 10)
df['y'] = np.random.randint(1, 10, 10)
df2 = pd.DataFrame()
df2['x'] = np.linspace(1, 10, 50)
df2['y'] = np.random.randint(1, 10, 50)
fig, ax = plt.subplots()
df.plot(x='x', y='y', kind='bar', ax=ax, width=1., figsize=(3, 2.5), legend=None)
df2.plot(x='x', y='y', kind='line', ax=ax, legend=None)
Canvas(window, background='white') # create canvas field
canvas_plot = FigureCanvasTkAgg(fig, window) # Draw area
canvas_plot.get_tk_widget().grid(column=1, row=6, padx=1, pady=10, rowspan=2, columnspan=3)
canvas_plot.draw() # draw canvas
window.mainloop()
Question:
How can I get both plots displayed in one figure?
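One thing that may be worth checking, offered as a guess rather than a diagnosis of the original setup: pandas draws bar plots at integer positions 0, 1, 2, ... and uses the x values only as tick labels, whereas a line plot uses the actual x values, so the two can end up on different x ranges. The sketch below sidesteps this by calling matplotlib's ax.bar and ax.plot directly so that both share the same numeric x axis. The data is made up in the same shape as the test example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data (assumption), shaped like the test example
rng = np.random.default_rng(0)
df = pd.DataFrame({'x': np.linspace(1, 10, 10),
                   'y': rng.integers(1, 10, 10)})
df2 = pd.DataFrame({'x': np.linspace(1, 10, 50),
                    'y': rng.integers(1, 10, 50)})

fig, ax = plt.subplots(figsize=(3, 2.5))
# Both calls use the real x values, so bars and line share one x scale
bars = ax.bar(df['x'], df['y'], width=1.0)
(line,) = ax.plot(df2['x'], df2['y'], color='tab:orange')
```

The resulting fig can be handed to FigureCanvasTkAgg exactly as in the test example.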

Plotting Pandas dataframe subplots with different linestyles

I am plotting a figure with 6 sets of axes, each with a series of 3 lines from one of 2 Pandas dataframes (1 line per column).
I have been using matplotlib .plot:
import pandas as pd
import matplotlib.pyplot as plt
idx = pd.date_range(start='2013-01-01 00:00', periods=24, freq='H')
df1 = pd.DataFrame(index = idx, columns = ['line1','line2','line3'])
df1['line1']= df1.index.hour
df1['line2'] = 24 - df1['line1']
df1['line3'] = df1['line1'].mean()
df2 = df1*2
df3= df1/2
df4= df2+df3
fig, ax = plt.subplots(2,2,squeeze=False,figsize = (10,10))
ax[0,0].plot(df1.index, df1, marker='', linewidth=1, alpha=1)
ax[0,1].plot(df2.index, df2, marker='', linewidth=1, alpha=1)
ax[1,0].plot(df3.index, df3, marker='', linewidth=1, alpha=1)
ax[1,1].plot(df4.index, df4, marker='', linewidth=1, alpha=1)
fig.show()
It's all good, and matplotlib automatically cycles through a different colour for each line, but uses the same colours for each plot, which is what I wanted.
However, now I want to specify more details for the lines: choosing specific colours for each line, and / or changing the linestyle for each line.
This link shows how to pass multiple linestyles to a Pandas plot. e.g. using
ax = df.plot(kind='line', style=['-', '--', '-.'])
So I need to either:
pass lists of styles to my subplot command above, but style is not recognised and it doesn't accept a list for linestyle or color. Is there a way to do this?
or
Use df.plot:
fig, ax = plt.subplots(2,2,squeeze=False,figsize = (10,10))
ax[0,0] = df1.plot(style=['-','--','-.'], marker='', linewidth=1, alpha=1)
ax[0,1] = df2.plot(style=['-','--','-.'],marker='', linewidth=1, alpha=1)
ax[1,0] = df3.plot( style=['-','--','-.'],marker='', linewidth=1, alpha=1)
ax[1,1] = df4.plot(style=['-','--','-.'], marker='', linewidth=1, alpha=1)
fig.show()
...but then each plot is plotted as a separate figure. I can't see how to put multiple Pandas plots on the same figure.
How can I make either of these approaches work?
using matplotlib
Using matplotlib, you may define a cycler for the axes to loop over color and linestyle automatically. (See this answer).
import numpy as np; np.random.seed(1)
import pandas as pd
import matplotlib.pyplot as plt
f = lambda i: pd.DataFrame(np.cumsum(np.random.randn(20,3),0))
dic1= dict(zip(range(3), [f(i) for i in range(3)]))
dic2= dict(zip(range(3), [f(i) for i in range(3)]))
dics = [dic1,dic2]
rows = range(3)
def set_cycler(ax):
    ax.set_prop_cycle(plt.cycler('color', ['limegreen', '#bc15b0', 'indigo'])+
                      plt.cycler('linestyle', ["-","--","-."]))
fig, ax = plt.subplots(3,2,squeeze=False,figsize = (8,5))
for x in rows:
    for i,dic in enumerate(dics):
        set_cycler(ax[x,i])
        ax[x,i].plot(dic[x].index, dic[x], marker='', linewidth=1, alpha=1)
plt.show()
using pandas
Using pandas you can indeed supply a list of possible colors and linestyles to the df.plot() method. Additionally you need to tell it in which axes to plot (df.plot(ax=ax[i,j])).
import numpy as np; np.random.seed(1)
import pandas as pd
import matplotlib.pyplot as plt
f = lambda i: pd.DataFrame(np.cumsum(np.random.randn(20,3),0))
dic1= dict(zip(range(3), [f(i) for i in range(3)]))
dic2= dict(zip(range(3), [f(i) for i in range(3)]))
dics = [dic1,dic2]
rows = range(3)
color = ['limegreen', '#bc15b0', 'indigo']
linestyle = ["-","--","-."]
fig, ax = plt.subplots(3,2,squeeze=False,figsize = (8,5))
for x in rows:
    for i,dic in enumerate(dics):
        dic[x].plot(ax=ax[x,i], style=linestyle, color=color, legend=False)
plt.show()
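Applied to the 2x2 layout from the question, the pandas variant might look like the following sketch. It rebuilds four dataframes similar to the question's df1 to df4. The index construction via pd.to_timedelta and the names frames, styles and colors are choices made here for illustration, not taken from the question or the answer.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hourly index for one day, built without relying on a frequency alias
idx = pd.Timestamp('2013-01-01') + pd.to_timedelta(np.arange(24), unit='h')
df1 = pd.DataFrame({'line1': np.arange(24)}, index=idx)
df1['line2'] = 24 - df1['line1']
df1['line3'] = df1['line1'].mean()
frames = [df1, df1 * 2, df1 / 2, df1 * 2 + df1 / 2]

styles = ['-', '--', '-.']
colors = ['limegreen', '#bc15b0', 'indigo']

fig, axes = plt.subplots(2, 2, squeeze=False, figsize=(10, 10))
# Passing ax= makes pandas draw into the existing subplot
# instead of creating a new figure for every dataframe
for ax, df in zip(axes.flat, frames):
    df.plot(ax=ax, style=styles, color=colors, linewidth=1, legend=False)
```

The key difference from the question's second attempt is that the axes are passed in through ax= rather than overwritten with the return value of df.plot().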

Matplotlib: Don't show errorbars in legend

I'm plotting a series of data points with x and y error but do NOT want the errorbars to be included in the legend (only the marker). Is there a way to do so?
Example:
import matplotlib.pyplot as plt
import numpy as np
subs=['one','two','three']
x=[1,2,3]
y=[1,2,3]
yerr=[2,3,1]
xerr=[0.5,1,1]
fig,(ax1)=plt.subplots(1,1)
for i in np.arange(len(x)):
    ax1.errorbar(x[i],y[i],yerr=yerr[i],xerr=xerr[i],label=subs[i],ecolor='black',marker='o',ls='')
ax1.legend(loc='upper left', numpoints=1)
fig.savefig('test.pdf', bbox_inches=0)
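You can modify the legend handles. See the legend guide of matplotlib. Each handle returned for an errorbar plot is a container whose first element is the line with the markers, so keeping only h[0] drops the error bars from the legend.
Adapting your example, this could read: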
import matplotlib.pyplot as plt
import numpy as np
subs=['one','two','three']
x=[1,2,3]
y=[1,2,3]
yerr=[2,3,1]
xerr=[0.5,1,1]
fig,(ax1)=plt.subplots(1,1)
for i in np.arange(len(x)):
    ax1.errorbar(x[i],y[i],yerr=yerr[i],xerr=xerr[i],label=subs[i],ecolor='black',marker='o',ls='')
# get handles
handles, labels = ax1.get_legend_handles_labels()
# remove the errorbars
handles = [h[0] for h in handles]
# use them in the legend
ax1.legend(handles, labels, loc='upper left',numpoints=1)
plt.show()
Here is an ugly patch:
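import matplotlib.pyplot as plt
# sample data (assumed here; the original answer did not define it)
x = [1, 2, 3]
ys = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
yerrs = [[0.2, 0.3, 0.1]] * 3
labels = ['one', 'two', 'three']
pp = []
colors = ['r', 'b', 'g']
for i, (y, yerr) in enumerate(zip(ys, yerrs)):
    p = plt.plot(x, y, '-', color=colors[i])  # plain line, used as legend handle
    pp.append(p[0])
    plt.errorbar(x, y, yerr, color=colors[i])  # error bars drawn on top, not in legend
plt.legend(pp, labels, numpoints=1)
plt.show()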
The accepted solution works in simple cases but not in general. In particular, it did not work in my own more complex situation.
I found a more robust solution, which tests each handle for ErrorbarContainer and did work for me. It was proposed by Stuart W D Grieve and I copy it here for completeness:
import matplotlib.pyplot as plt
from matplotlib import container
label = ['one', 'two', 'three']
color = ['red', 'blue', 'green']
x = [1, 2, 3]
y = [1, 2, 3]
yerr = [2, 3, 1]
xerr = [0.5, 1, 1]
fig, (ax1) = plt.subplots(1, 1)
for i in range(len(x)):
    ax1.errorbar(x[i], y[i], yerr=yerr[i], xerr=xerr[i], label=label[i], color=color[i], ecolor='black', marker='o', ls='')
handles, labels = ax1.get_legend_handles_labels()
handles = [h[0] if isinstance(h, container.ErrorbarContainer) else h for h in handles]
ax1.legend(handles, labels)
plt.show()
This produces a legend showing only the markers (tested on Matplotlib 3.1).
It works for me if I set the label argument to None.
plt.errorbar(x, y, yerr, label=None)